JP2015118493A

JP2015118493A - Trace device and trace program

Info

Publication number: JP2015118493A
Application number: JP2013260733A
Authority: JP
Inventors: Ryuichi Sato; Akira Hirata; Masahiro Abukawa; Mitsuo Shitaya; Nobuteru Okada; Keito Yoshimura
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2013-12-18
Filing date: 2013-12-18
Publication date: 2015-06-25

Abstract

PROBLEM TO BE SOLVED: To provide a trace device that can suppress an increase in CPU load during output processing of program operation information without adding dedicated hardware, and that can record the program operation information without being affected by the system going down (or by the operating status of a system operation system). SOLUTION: A trace device 1001 incorporates a multi-core processor including a plurality of CPUs 101-1 to 101-3 and 201. The trace device 1001 comprises: a system operation system 100 that runs a program using the CPUs 101-1 to 101-3 and collects program operation information indicating an operation history of the program; and a log control system 200 that uses the CPU 201 to operate independently of the system operation system 100, without being affected by its operating status, and outputs the program operation information collected by the system operation system 100 to an IO device 1.

Description

The present invention relates to a trace device that is equipped with a multi-core processor and performs tracing (collection of program operation information).

Conventional methods of collecting program operation information (traces) include outputting the date, time, operation content, and so on to a UART (Universal Asynchronous Receiver Transmitter) while the computer system is running, and outputting the program operation information to memory and then writing it to a nonvolatile area. However, outputting program operation information to a UART places a load on the CPU (Central Processing Unit), which affects the CPU's operation timing. Depending on the amount of program operation information output, this can cause side effects in the operation of the system, so the system must be operated with the processing load of that output in mind. The method of writing program operation information to a nonvolatile area, on the other hand, may leave the collected information unretrievable if the system, including the OS (operating system), goes down or hangs.

To solve these problems, the techniques of Patent Documents 1 and 2 below have been disclosed. Patent Document 1 collects program operation information without affecting CPU operation timing by adding dedicated hardware: the program operation information is written to the main storage device and transferred from there to an external storage device by the dedicated hardware, so the CPU load is reduced because the CPU is not involved. Because dedicated hardware is added, program operation information can also be collected through it even when the system is down. Patent Document 2 monitors the load on each CPU of a computer with a plurality of CPUs and schedules the tracer to run on a core with a low CPU load. This prevents the load on any particular CPU from becoming high and reduces side effects on the system.

Patent Document 1: JP 2004-54532 A
Patent Document 2: JP 2012-18438 A

Patent Document 1 reduces the load on the system and enables collection of program operation information even when the system is down by adding dedicated hardware, but this requires adding dedicated hardware to the computer system, which increases cost and size.

Patent Document 2 implements a resource monitoring mechanism and related functions in software on an existing computer system, so no dedicated hardware is required and the problems of higher cost and larger size do not arise. However, when the entire system is down, it may become impossible to write program operation information to the UART or the nonvolatile area. Furthermore, implementing the resource monitoring mechanism in software requires CPU resources to run it, and the system must be operated so that this does not cause side effects.

An object of the present invention is to provide a trace device that suppresses the increase in CPU load caused by output processing of program operation information without adding dedicated hardware, and that can record program operation information without being affected by the system going down.

A trace device of the present invention is a trace device equipped with a multi-core processor having a plurality of CPUs (Central Processing Units), the trace device comprising:
a system operation unit that executes a program using one or more of the plurality of CPUs of the multi-core processor and collects program operation information indicating a history of the processing status of the program; and
a log control unit that uses one or more of the plurality of CPUs of the multi-core processor other than those used by the system operation unit, runs independently of the system operation unit without being affected by the operating status of the system operation unit, and outputs the program operation information collected by the system operation unit to an IO (Input/Output) device.

The present invention makes it possible to provide a trace device that suppresses the increase in CPU load caused by output processing of program operation information without adding dedicated hardware, and that can record program operation information without being affected by the system going down.

FIG. 1 is a configuration diagram showing a trace device according to Embodiment 1.
FIG. 2 is an explanatory diagram showing the internal configuration of the shared memory of the trace device according to Embodiment 1.
FIG. 3 is a flowchart showing the operation of the trace device according to Embodiment 1 (system operation system).
FIG. 4 is a flowchart showing the operation of the trace device according to Embodiment 1 (log control system).
FIG. 5 is a sequence diagram of monitoring the operating state of the system operation system of the trace device according to Embodiment 1.
FIG. 6 is a sequence diagram for the case where the log control system of the trace device according to Embodiment 1 detects an abnormality in the system operation system.
FIG. 7 is a sequence diagram for the case where an abnormality is detected within the system operation system of the trace device according to Embodiment 1.
FIG. 8 is a configuration diagram showing a trace device according to Embodiment 2.
FIG. 9 is a configuration diagram showing a trace device according to Embodiment 3.
FIG. 10 is a flowchart showing the operation of the trace device according to Embodiment 3.
FIG. 11 is a configuration diagram showing a trace device according to Embodiment 4.

Embodiment 1
The trace device 1001 according to Embodiment 1 will be described with reference to FIGS. 1 to 7.
FIG. 1 is a configuration diagram showing the hardware and software configuration of the trace device 1001 according to Embodiment 1. The trace device 1001 shown in FIG. 1 is an example in which a system operation system 100 (system operation unit) and a log control system 200 (log control unit) are built on a multi-core processor with four CPUs (Central Processing Units): CPU 101-1, CPU 101-2, CPU 101-3, and CPU 201.

(System operation system 100)
The system operation system 100 consists of the CPUs 101-1 to 101-3, an OS (operating system) 102, an inter-CPU communication unit 103, an application program 104, an IO (Input/Output) access control unit 105, an abnormality detection processing unit 106, and a program information writing unit 107. The program information writing unit 107 is realized by a CPU such as the CPU 101-1 executing a process containing a program that writes out program operation information.
Here, "program operation information" means information on the operation history of a program (a trace).

(Log control system 200)
The log control system 200 consists of a CPU 201, an inter-CPU communication unit 203, an IO access control unit 205, an abnormality detection processing unit 206, and a program information reading unit 207.

The trace device 1001 also includes an IO (Input/Output) device 1, a memory 2 (for example, a main storage device), and a communication path 3, which are used in common by the system operation system 100 and the log control system 200. The system operation system 100 and the log control system 200 are realized by the programs that provide the functions of the respective "units", the OS 102, and the CPUs 101-1 to 101-3 and 201 that execute these programs and the OS 102. As shown in FIG. 1, the trace device 1001 divides the multi-core processor into a plurality of systems (two in FIG. 1) to form the system operation system 100 and the log control system 200. Three CPUs are allocated to the system operation system 100 and one CPU to the log control system 200. The four CPUs, CPUs 101-1 to 101-3 and CPU 201, share the same architecture. They are connected by the communication path 3 and can exchange information with one another and with the IO device 1 and the memory 2, which are peripheral hardware also connected to the communication path 3.

The system operation system 100 runs the OS 102. On top of it, the system operation system 100 is built from the inter-CPU communication unit 103, the application program 104, the IO access control unit 105, the abnormality detection processing unit 106, and the program information writing unit 107, all of which run on the OS 102. The log control system 200, in contrast, has no OS and is kept simple: it consists of the inter-CPU communication unit 203, the IO access control unit 205, the abnormality detection processing unit 206, and the program information reading unit 207. The IO device 1 shared by the two systems is accessed by only one of them at a time: either the system operation system 100 or the log control system 200. To prevent access contention for the IO device 1 between them, the IO access control units 105 and 205 perform exclusive control.
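The exclusive control performed by the IO access control units 105 and 205 can be pictured as an ownership token for the IO device 1. The following is a minimal Python sketch under stated assumptions: the class name `IoAccessController` and the owner strings are hypothetical, and on real hardware the token would presumably live in the shared memory 23 and be updated with an atomic compare-and-swap instruction, for which a `threading.Lock` merely stands in here.

```python
import threading
from typing import Optional

# Hypothetical owner identifiers for the two systems.
SYSTEM_OPERATION = "system_operation_100"
LOG_CONTROL = "log_control_200"


class IoAccessController:
    """Ownership token for the shared IO device 1 (illustrative only)."""

    def __init__(self) -> None:
        self._owner: Optional[str] = None
        # Stands in for an atomic compare-and-swap on a shared-memory flag.
        self._atomic = threading.Lock()

    def try_acquire(self, requester: str) -> bool:
        # Succeeds only if the device is free or already held by the requester.
        with self._atomic:
            if self._owner is None or self._owner == requester:
                self._owner = requester
                return True
            return False

    def release(self, requester: str) -> None:
        # A release by a non-owner is ignored.
        with self._atomic:
            if self._owner == requester:
                self._owner = None

    @property
    def owner(self) -> Optional[str]:
        return self._owner
```

Each side calls `try_acquire` before touching the device and `release` afterwards; a failed attempt simply means "try again later" rather than blocking, which suits a log control side that has no OS scheduler to sleep on.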

As shown in FIG. 1, the memory 2 is divided into a system operation system memory area 21 used by the system operation system 100, a log control system memory area 22 used by the log control system 200, and a shared memory 23 used by both.

The system operation system 100 is the system that runs the normal workload. Its program information writing unit 107 collects program operation information not only for the application program 104 running on the system operation system 100 but also for the OS 102, and writes the collected program operation information to the shared memory 23.

The program information reading unit 207 of the log control system 200 reads from the shared memory 23 the program operation information written by the program information writing unit 107 and outputs it to the IO device 1.

(Abnormality detection of system operation system 100)
The inter-CPU communication units 103 and 203 and the abnormality detection processing units 106 and 206 monitor the operating state of the system operation system 100 and detect abnormalities in it. When an abnormality of the system operation system 100 is detected, the abnormality detection processing unit 206 of the log control system 200 stops the CPUs 101-1 to 101-3 of the system operation system 100 and causes the program information reading unit 207 to output the program operation information stored in the shared memory 23 to the IO device 1.

FIG. 2 shows how the shared memory 23 is used. As shown in FIG. 2, the shared memory 23 is divided into areas according to the type of IO device to which program operation information is output, and the system operation system 100 writes program operation information into the shared memory area corresponding to the type of the destination IO device. FIG. 2 shows the case of output to two IO devices: for example, the output area 231 for IO device A is allocated for output to a UART, and the output area 232 for IO device B is allocated for output to nonvolatile storage such as an HDD (Hard Disk Drive) or an SD memory card (SD is a registered trademark).

<Activation of Trace Device 1001 and Collection of Program Operation Information>
Next, the operation of the trace device 1001 according to Embodiment 1 will be described.
FIG. 3 is a flowchart of the process in which the program information writing unit 107 of the system operation system 100 writes program operation information to the shared memory 23. When a system activation trigger such as power-on is detected, the log control system 200 starts first, and then the system operation system 100 starts. Once started, the system operation system 100 operates normally, and the program information writing unit 107 writes the operation content to the shared memory 23 as program operation information (ST1). Only the system operation system 100 writes to the area of the shared memory 23 that stores program operation information, and only the program information reading unit 207 reads from it. Consequently no access contention occurs, and, as shown in FIG. 3, the writer can write to the shared memory 23 without any check. Starting the system operation system 100 after the log control system 200 also makes it possible to capture program operation information from the moment the system operation system 100 starts. Alternatively, to shorten system start-up time, the system operation system 100 and the log control system 200 may be started in parallel.
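Because each trace area in the shared memory 23 has exactly one writer (program information writing unit 107) and one reader (program information reading unit 207), it maps naturally onto a single-producer/single-consumer ring buffer, one per IO-device area as in FIG. 2. The Python sketch below is an illustration, not the patent's data structure: the ring-buffer layout, the region names and capacities, and the choice to reject new records when an area is full are all assumptions.

```python
from typing import Dict, List, Optional


class TraceRing:
    """Single-writer / single-reader ring buffer for one shared-memory area.

    Only the writer advances `head` and only the reader advances `tail`, so
    neither side takes a lock. On real multi-core hardware a memory barrier
    would still be needed so that a record is visible before `head` moves;
    that detail is omitted in this sketch.
    """

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[bytes]] = [None] * capacity
        self._capacity = capacity
        self.head = 0  # next slot to write; modified only by the writer
        self.tail = 0  # next slot to read; modified only by the reader

    def write(self, record: bytes) -> bool:
        # Writer side (ST1). Assumed policy: refuse the record when full.
        if self.head - self.tail == self._capacity:
            return False
        self._slots[self.head % self._capacity] = record
        self.head += 1
        return True

    def read(self) -> Optional[bytes]:
        # Reader side: returns None when no trace data is pending.
        if self.tail == self.head:
            return None
        record = self._slots[self.tail % self._capacity]
        self.tail += 1
        return record

    def pending(self) -> int:
        return self.head - self.tail


# One area per output IO device type, as in FIG. 2 (sizes are hypothetical).
shared_memory_23: Dict[str, TraceRing] = {
    "io_device_a_uart": TraceRing(capacity=64),
    "io_device_b_storage": TraceRing(capacity=256),
}
```

Whether a full area should drop the newest record, overwrite the oldest, or stall the writer is a design choice the source leaves open; refusing the write keeps the writer non-blocking.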

<Output of program operation information>
FIG. 4 is a flowchart of the process in which the program information reading unit 207 outputs the program operation information stored in the shared memory 23 to the IO device 1. If the destination IO device 1 is also accessed by the system operation system 100, access contention can occur, so the IO access control unit 205 performs exclusive control as shown in FIG. 4. This is described below.

The program information reading unit 207 of the log control system 200 reads the program operation information from the area of the shared memory 23 that corresponds to the type of the destination IO device (ST10) and determines whether any program operation information is present (ST11). If there is program operation information to output to the IO device 1, the IO access control unit 205 determines whether the IO device 1 can be accessed (ST12). If the IO access control unit 205 determines that access is possible, the program information reading unit 207 outputs the program operation information to the IO device 1 (ST13). To use the communication path 3, which is shared with the system operation system 100, efficiently, the program information reading unit 207 waits until a certain amount of program operation information has accumulated in the shared memory 23 and outputs it to the IO device 1 each time that amount accumulates. Data transfer to the IO device 1 may be performed by the CPU 201 of the log control system 200 or by DMA (Direct Memory Access).
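Steps ST10 to ST13, including the wait for a certain amount of data to accumulate, can be sketched as one pass of a polling loop on the log control side. This is a hypothetical Python illustration: `output_cycle`, the `io_available`/`io_write` callbacks, and measuring the threshold in records rather than bytes are assumptions, and a `deque` stands in for one shared-memory area.

```python
from collections import deque
from typing import Callable, Deque, List


def output_cycle(
    region: Deque[bytes],
    io_available: Callable[[], bool],
    io_write: Callable[[bytes], None],
    batch_threshold: int,
) -> int:
    """One pass of the log control side's output loop (ST10 to ST13).

    Returns the number of records written to the IO device in this pass.
    """
    if batch_threshold < 1:
        raise ValueError("batch_threshold must be at least 1")
    # ST10/ST11: is there enough trace data in this device's area to make a
    # transfer over the shared communication path worthwhile?
    if len(region) < batch_threshold:
        return 0
    # ST12: exclusive control; skip this pass if the device is in use.
    if not io_available():
        return 0
    # ST13: drain one batch and hand it to the device as a single transfer.
    batch = [region.popleft() for _ in range(batch_threshold)]
    io_write(b"".join(batch))
    return len(batch)


# Usage: two of the three pending records go out as one batched transfer.
uart_output: List[bytes] = []
uart_area = deque([b"t0;", b"t1;", b"t2;"])
output_cycle(uart_area, lambda: True, uart_output.append, batch_threshold=2)
# uart_output == [b"t0;t1;"]; b"t2;" waits for the next full batch.
```

Batching trades latency for fewer transactions on the communication path 3; the forced-output path after a crash has to bypass the threshold so the last partial batch is not lost.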

<Abnormality detection>
The abnormality detection processing unit 206 of the log control system 200 monitors the operating state of the system operation system 100 and detects abnormalities in it. A specific detection method is described below.
FIG. 5 shows an example sequence in which the log control system 200 monitors the operating state of the system operation system 100. The CPUs 101-1 to 101-3 and the CPU 201 communicate over the communication path 3 using the inter-CPU communication units 103 and 203 and the shared memory 23. The system operation system 100 periodically updates the operation information in the shared memory 23 (ST100) and sends an update notification for the operation information to the log control system 200 through communication between the inter-CPU communication units 103 and 203 (ST111). The log control system 200 has set a timer in advance (ST112; not shown, realized by the CPU 201) and is counting it down (ST113). When the log control system 200 receives the update notification from the system operation system 100, the abnormality detection processing unit 206 checks whether the operation information in the shared memory 23 has been updated to the expected value (ST114). If it has, the abnormality detection processing unit 206 judges that the system operation system 100 is operating normally, resets the timer (ST115), and starts counting down again (ST116). In the same way, the abnormality detection processing unit 206 continues to monitor whether the system operation system 100 is operating normally.

<Abnormality detection of system operation system 100 by log control system 200>
FIG. 6 shows an example sequence in which the log control system 200 detects that the system operation system 100 is not operating normally. The log control system 200 monitors the operating state of the system operation system 100 as shown in FIG. 5: it sets a timer in advance (ST122) and counts it down (ST123). Suppose the system operation system 100 updates the contents of the shared memory 23 (ST120) but then goes down and cannot send the update notification for the operation information (ST121). Because the update notification (ST121) never arrives, the abnormality detection processing unit 206 of the log control system 200 detects the timer timeout (ST124), determines that the operating state of the system operation system 100 is abnormal, and stops the CPUs 101 of the system operation system 100 via the communication path 3 (ST125). The abnormality detection processing unit 206 then causes the program information reading unit 207 to perform a forced output of the program operation information (ST126) and an output of the cache information of the system operation system 100 (ST127). The forced output in ST126 and the cache information output in ST127 are described later.

Even when the system operation system 100 has sent the update notification for the operation information to the log control system 200, if the operation information in the shared memory 23 has not been updated to the expected value, the abnormality detection processing unit 206 likewise determines that the operating state of the system operation system 100 is abnormal.
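The monitoring in FIGS. 5 and 6, together with the expected-value check, amounts to a watchdog timer that is reset only by a notification whose shared-memory value is correct. A minimal Python sketch follows, assuming (this is not specified in the source) that the operation information is a counter incremented once per period and that the timer is counted in abstract ticks; `HeartbeatMonitor` is a hypothetical name.

```python
class HeartbeatMonitor:
    """Log-control-side watchdog for the system operation system (sketch).

    Assumed protocol: each period, the system operation side increments a
    counter in shared memory (ST100) and then sends an update notification
    over inter-CPU communication (ST111).
    """

    def __init__(self, timeout_ticks: int) -> None:
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks  # ST112: timer set in advance
        self.expected = 1               # next counter value we expect
        self.abnormal = False

    def on_update_notification(self, shared_counter: int) -> None:
        # ST114: confirm that shared memory really holds the expected value.
        if shared_counter != self.expected:
            # Notified, but not updated as expected: also an abnormality.
            self.abnormal = True
            return
        self.expected += 1
        self.remaining = self.timeout_ticks  # ST115: reset the timer

    def tick(self) -> None:
        # ST113/ST116: count down; reaching zero is a timeout (ST124).
        if self.abnormal:
            return
        self.remaining -= 1
        if self.remaining <= 0:
            self.abnormal = True
```

Checking the shared-memory value in addition to the notification guards against a system operation side that still sends notifications but is no longer doing its periodic work correctly.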

<Detection of abnormalities by the system operation system itself>
FIG. 7 shows an example sequence in which the system operation system 100 detects an abnormality in itself. The abnormality detection processing unit 106 of the system operation system 100 can detect abnormalities within its own system and performs abnormality detection processing (ST130). Specific detection methods are listed below.
(1) Acquire error detection information, such as exceptions, from the CPUs 101-1 to 101-3.
(2) Monitor memory contents using a checksum to detect memory corruption and data write failures.
(3) Monitor whether processing that the application program 104 executes periodically runs within a fixed period, to detect processing delays.
(4) Store confirmation data in a specific part of the system operation system memory area 21 and monitor whether it is rewritten to an unexpected value, to detect memory corruption and stack overflow.
(5) Refer to the scheduling information of the OS 102 and monitor, from the scheduling queue operations, whether the same queue has remained in the RUN state for longer than a fixed time and whether periodically operating processes regularly transition to the RUN state.
When an abnormality is detected by any of methods (1) to (5), the abnormality detection processing unit 106 of the system operation system 100 sends an abnormality detection notification to the log control system 200 through communication between the inter-CPU communication units 103 and 203 (ST131). When the log control system 200 receives the notification, the abnormality detection processing unit 206 stops the CPUs 101-1 to 101-3 of the system operation system 100 via the communication path 3 (ST132). The abnormality detection processing unit 206 then causes the program information reading unit 207 to perform a forced output of the program operation information (ST133) and an output of the cache information of the system operation system 100 (ST134).
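Checks (2) to (4) of the self-diagnosis list lend themselves to short predicate functions that the abnormality detection processing unit 106 could run periodically. The Python sketch below is illustrative only: CRC-32 (`zlib.crc32`) is substituted for the unspecified checksum, and `GUARD_PATTERN`, the deadline margin, and the function names are hypothetical. Checks (1) and (5) depend on CPU and OS internals and are not modelled.

```python
import zlib
from typing import Optional

# Hypothetical confirmation data for check (4).
GUARD_PATTERN = b"\xde\xad\xbe\xef" * 4


def checksum_ok(data: bytes, stored_crc: int) -> bool:
    # Check (2): recompute a CRC-32 over the region and compare it with the
    # value stored when the region was last written legitimately.
    return zlib.crc32(data) == stored_crc


def guard_ok(guard_area: bytes) -> bool:
    # Check (4): confirmation data must never change; a change suggests that
    # something (for example a stack overflow) wrote past its bounds.
    return guard_area == GUARD_PATTERN


def period_ok(last_run: float, now: float, period: float, margin: float) -> bool:
    # Check (3): a periodic task must have run within its period plus a margin.
    return now - last_run <= period + margin


def self_check(data: bytes, stored_crc: int, guard_area: bytes,
               last_run: float, now: float, period: float,
               margin: float) -> Optional[str]:
    """Return the name of the first failed check, or None if all pass."""
    if not checksum_ok(data, stored_crc):
        return "checksum"
    if not guard_ok(guard_area):
        return "guard"
    if not period_ok(last_run, now, period, margin):
        return "deadline"
    return None
```

A non-`None` result would correspond to the point at which the abnormality detection notification (ST131) is sent to the log control system 200.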

<Forced output of program operation information (ST126, ST133)>
The abnormality detection processing unit 206 of the log control system 200 causes the program information reading unit 207 to forcibly output to the IO device 1 the program operation information that the program information writing unit 107 of the system operation system 100 wrote to the shared memory 23. If the system operation system 100 goes down while it is using the communication path 3 and the shared IO device 1, the abnormality detection processing unit 206 forcibly assigns the access right to the communication path 3 or to the shared IO device to the log control system 200, and then has the program information reading unit 207 output the program operation information to the IO device 1.

Forced output makes it possible to obtain the program operation information up to immediately before the system operation system 100 went down. Moreover, because output to the IO device 1 is handled by the log control system 200, output does not stop partway through even if the system operation system 100 goes down, so the program operation information can still be obtained. In the conventional technique, by contrast, where the system operation system itself performs output to the IO device 1, output stops partway through when the system operation system goes down, and the program operation information may not be obtained in full.
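The forced-output path differs from normal output (ST10 to ST13) in two ways: it does not wait for a batch to accumulate, and it does not wait for the IO device to become free, because the halted system operation side can never release it. A minimal Python sketch under stated assumptions: `SharedIo`, `forced_output`, and the callbacks are hypothetical stand-ins for hardware operations, and for simplicity the access rights are always reassigned rather than only when the system operation system held them.

```python
from collections import deque
from typing import Callable, Deque


class SharedIo:
    """Hypothetical record of who holds communication path 3 and IO device 1."""

    def __init__(self) -> None:
        self.bus_owner = "system_operation_100"
        self.io_owner = "system_operation_100"


def forced_output(region: Deque[bytes], shared: SharedIo,
                  stop_system_cpus: Callable[[], None],
                  io_write: Callable[[bytes], None]) -> int:
    """Sketch of ST125/ST132 followed by ST126/ST133.

    Returns the number of records flushed to the IO device.
    """
    stop_system_cpus()                    # halt CPUs 101-1 to 101-3 first
    shared.bus_owner = "log_control_200"  # forcibly take the access rights
    shared.io_owner = "log_control_200"
    written = 0
    while region:                         # flush everything, no batch threshold
        io_write(region.popleft())
        written += 1
    return written
```

Halting the system operation CPUs before seizing the access rights prevents a partially alive system operation side from contending for the device while the log is being flushed.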

<Cache information output of system operation system 100 (ST127, ST134)>
The abnormality detection processing unit 206 of the log control system 200 causes the program information reading unit 207 to acquire the secondary cache information of the CPUs 101-1 to 101-3 of the system operation system 100 and output it to the IO device 1. In this way, in ST127 and ST134, the log control system 200 collects not only the program operation information but also the secondary cache information of the CPUs 101-1 to 101-3 of the system operation system 100 in which the abnormality occurred, accessing it from the log control system 200 side to capture the state at the time of the abnormality.
Further, the abnormality detection processing unit 206 of the log control system 200 acquires the primary cache information of the CPUs 101-1 to 101-3 of the system operation system 100 through the program information reading unit 207.
The abnormality detection processing unit 206 of the log control system 200 restarts the stopped CPUs 101-1 to 101-3, sends a primary cache read request over the communication path 3, acquires the primary cache information held by the CPUs 101-1 to 101-3 through the program information reading unit 207, and thereby collects the state at the time of the abnormality. After acquiring the primary cache information, the log control system 200 stops the CPUs 101-1 to 101-3 again as necessary. The log control system 200 need not acquire the primary and secondary cache information of the CPUs 101-1 to 101-3 only when an abnormality occurs in the system operation system 100: the abnormality detection processing unit 206 may also stop the CPUs 101 and perform the acquisition at any time when fault investigation or confirmation of operation is desired.
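The order of operations for collecting cache state can be sketched as a small simulation. This is not a hardware procedure: `SimCpu`, `collect_caches` and the dictionaries standing in for cache contents are hypothetical, and the sketch assumes, as the text implies, that secondary cache contents can be read from the log control side while a core is stopped, whereas primary cache contents are returned only by a running core answering a read request.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SimCpu:
    """Hypothetical model of one CPU 101-x of the system operation system."""
    cpu_id: str
    l1: Dict[int, int] = field(default_factory=dict)  # address -> value
    l2: Dict[int, int] = field(default_factory=dict)
    running: bool = True

    def answer_l1_read_request(self) -> Dict[int, int]:
        # Assumption: only a running core can answer a primary cache read request.
        if not self.running:
            raise RuntimeError(f"CPU {self.cpu_id} is stopped")
        return dict(self.l1)


def collect_caches(cpus: List[SimCpu], keep_stopped: bool = True) -> List[dict]:
    """Snapshot L2, then L1, of each core in the order described in the text."""
    snapshots = []
    for cpu in cpus:
        cpu.running = False                             # freeze the core's state
        snap = {"cpu": cpu.cpu_id, "l2": dict(cpu.l2)}  # read from the log control side
        cpu.running = True                              # restart to service the L1 request
        snap["l1"] = cpu.answer_l1_read_request()
        if keep_stopped:
            cpu.running = False                         # stop again "as necessary"
        snapshots.append(snap)
    return snapshots
```

Passing `keep_stopped=False` models the case where the cores are left running after the snapshot, for example when the dump is taken for routine operation checks rather than after a failure.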

<IO device access control>
Depending on the hardware configuration of the multi-core processor on which the trace device 1001 is built, the system operation system 100 and the log control system 200 may access a common IO device 1. In SoCs equipped with multi-core processors, IO devices are commonly shared in this way. In this case, to prevent access contention on the shared IO device, the IO access control unit 105 of the system operation system 100 and the IO access control unit 205 of the log control system 200 perform exclusive control. The IO access control units 105 and 205 notify each other of the start and end of access to the shared IO device 1 over the communication path 3, through the inter-CPU communication units 103 and 203 respectively. Because the system operation system 100 and the log control system 200 share the access status of the IO device 1 and check it before accessing the device, contention can be prevented.
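The start/end notification scheme amounts to a shared access-status record that each side checks before touching the device. The Python sketch below is an assumption-laden model, not the patent's implementation: `SharedIoStatus` and its methods are hypothetical, and a `threading.Lock` stands in for whatever atomicity the inter-CPU communication units 103 and 203 would provide.

```python
import threading
from typing import List, Optional, Tuple


class SharedIoStatus:
    """Hypothetical access-status record both sides consult before using IO device 1."""

    def __init__(self) -> None:
        self._guard = threading.Lock()   # stands in for atomic inter-CPU messaging
        self.owner: Optional[str] = None
        self.history: List[Tuple[str, str]] = []  # notifications sent over "path 3"

    def notify_start(self, who: str) -> bool:
        """Announce the start of an access; refuse if the device is already in use."""
        with self._guard:
            if self.owner is not None:
                return False             # contention: caller must retry later
            self.owner = who
            self.history.append(("start", who))
            return True

    def notify_end(self, who: str) -> None:
        """Announce the end of an access and free the device."""
        with self._guard:
            if self.owner != who:
                raise RuntimeError(f"{who} does not own the IO device")
            self.owner = None
            self.history.append(("end", who))
```

A side that gets `False` from `notify_start` simply retries; how long it waits, and whether one side is favoured, is left open here.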

<Shutdown of trace device 1001>
When the trace device 1001 is shut down, the log control system 200 is terminated after the system operation system 100 has terminated. The system operation system 100 notifies the log control system 200 of its shutdown via inter-CPU communication. On receiving this notification, the log control system 200 forcibly outputs the program operation information stored in the shared memory 23 to the IO device 1 and then shuts itself down.
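The shutdown ordering can be sketched with two threads and a message queue. This is a hedged model: `run_shutdown_sequence` and the `"shutdown"` message are hypothetical, a `queue.Queue` stands in for inter-CPU communication, and a Python list stands in for the shared memory 23.

```python
import queue
import threading
from typing import List


def run_shutdown_sequence(pending_records: List[str]) -> List[str]:
    """Return the order of events: system side terminates first, log side last."""
    mailbox: "queue.Queue[str]" = queue.Queue()  # stands in for inter-CPU communication
    shared_memory = list(pending_records)        # records still held in shared memory 23
    events: List[str] = []

    def log_control_main() -> None:
        while mailbox.get() != "shutdown":       # block until the shutdown notice
            pass                                 # other notifications ignored here
        for record in shared_memory:             # forced output of remaining records
            events.append(f"io: {record}")
        shared_memory.clear()
        events.append("log control system terminated")

    log_cpu = threading.Thread(target=log_control_main)
    log_cpu.start()
    events.append("system operation system terminated")
    mailbox.put("shutdown")                      # notify only after terminating
    log_cpu.join()
    return events
```

Because the notification is sent only after the system side has recorded its termination, the remaining records are always flushed after the last point at which the system side could have written them.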

In the above example, a multi-core processor with four CPUs was described; however, the number of CPUs is not limited to four, and any multi-core processor with two or more CPUs may be used. That is, the trace device 1001 according to the first embodiment divides the CPUs of a multi-core processor having two or more CPUs into groups of arbitrary size to construct the system operation system 100 and the log control system 200. Alternatively, the system operation system 100 and the log control system 200 may be constructed using only some of the CPUs of a multi-core processor. For example, in a multi-core processor with five CPUs, three CPUs may be used for the system operation system 100 and one for the log control system 200, as in the trace device 1001.
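The partitioning rule above (at least one CPU per side, and possibly some CPUs left unused) can be expressed as a small helper. The function name and its arguments are hypothetical, and the sketch only computes the split; it does not pin any code to the CPUs.

```python
from typing import List, Sequence, Tuple


def partition_cpus(cpu_ids: Sequence[int], n_system: int, n_log: int) -> Tuple[List[int], List[int]]:
    """Split CPU ids into a system-operation group and a log-control group.

    CPUs beyond n_system + n_log are left unused, matching the five-CPU example.
    """
    if n_system < 1 or n_log < 1:
        raise ValueError("each side needs at least one CPU")
    if n_system + n_log > len(cpu_ids):
        raise ValueError("not enough CPUs for the requested split")
    system = list(cpu_ids[:n_system])
    log = list(cpu_ids[n_system:n_system + n_log])
    return system, log
```

With five CPUs, `partition_cpus(range(5), 3, 1)` gives three CPUs to the system operation system, one to the log control system, and leaves one unused. On Linux, a process could then be restricted to its group with `os.sched_setaffinity`, though the patent does not prescribe any particular mechanism.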
In the first embodiment, the case where the OS 102 is installed in the system operation system 100 has been described; however, the first embodiment can also be applied to a system in which no OS 102 is installed in the system operation system 100.
Further, no OS is installed in the log control system 200 in order to keep its configuration simple; however, a configuration with a highly robust OS is also possible.
Furthermore, the system operation system 100 is not limited to an SMP configuration; an AMP multi-OS configuration or a multi-CPU configuration is also possible.

Embodiment 2.
A trace device 1002 according to the second embodiment will be described with reference to FIG.
FIG. 8 is a configuration diagram of the trace device 1002 according to the second embodiment. In the trace device 1002 of FIG. 8, the trace device built on the multi-core processor of the first embodiment is instead constructed using a hypervisor, one of the computer virtualization technologies. Compared with the trace device 1001, the trace device 1002 installs an OS 202 in the log control system 200 and implements access control to the IO device 1 and inter-CPU communication via the hypervisor 110. That is, in the trace device 1002, the system operation system 100 and the log control system 200 are constructed with the hypervisor 110 added.

The differences from the first embodiment are that the OS 202 is installed in the log control system 200, as described above, and that the hypervisor 110 is used in place of the inter-CPU communication units and the IO device access control units; operation is otherwise the same as in the first embodiment. Alternatively, the hypervisor 110 may assign priorities to accesses to the IO device 1 from the system operation system 100 and the log control system 200, giving precedence to IO device accesses from the system operation system 100. Prioritized IO device access reduces the impact on the system operation system 100, that is, the waiting time when an IO device access conflict occurs.
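Priority arbitration of this kind is often modelled as a priority queue. The sketch below is a hypothetical stand-in for the hypervisor 110's arbitration, not its actual mechanism: `PriorityIoArbiter`, the `PRIORITY` table and its numeric encoding are assumptions. Requests from the system operation system are always dispatched before queued requests from the log control system, with first-in, first-out order inside each class.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Assumed encoding: a lower value is served first.
PRIORITY: Dict[str, int] = {"system": 0, "log": 1}


class PriorityIoArbiter:
    """Hypothetical hypervisor-side queue of pending IO device requests."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, str]] = []
        self._seq = itertools.count()   # FIFO tie-break within one priority class

    def submit(self, requester: str, request: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[requester], next(self._seq), requester, request))

    def dispatch_all(self) -> List[Tuple[str, str]]:
        """Serve every queued request in priority order."""
        served = []
        while self._heap:
            _, _, requester, request = heapq.heappop(self._heap)
            served.append((requester, request))
        return served
```

The sequence counter keeps the ordering stable and also prevents the heap from ever comparing request payloads. A strict priority scheme like this can starve the log side under sustained load, which a real design would need to bound.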

Embodiment 3.
A trace device 1003 according to the third embodiment will be described with reference to FIGS. 9 and 10.
FIG. 9 is a configuration diagram of the trace device 1003 according to the third embodiment. FIG. 9 builds on the example of constructing a trace device within the multi-core processor described in the first embodiment, and uses a monitor program 120, a program that monitors and controls the input/output of a computer, to protect the area of the shared memory 23 in which program operation information is stored. The trace device 1003 of FIG. 9 adds the monitor program 120 to the configuration of the first embodiment.
FIG. 10 is an example flowchart of accesses by the application program 104 of the system operation system 100 to the shared memory 23. The monitor program 120 prevents unauthorized access to the shared memory 23. Only the process containing the program that writes program operation information (the program information writing unit 107) may access the program operation information area of the shared memory 23. As shown in FIG. 10, the monitor program 120 monitors accesses to the shared memory 23 and determines whether the accessing process is the one that writes program operation information (ST20); that is, in ST20 the monitor program 120 determines whether the writer is the program information writing unit 107. If it is, the monitor program 120 permits access to the shared memory 23 (ST21); if it is not, the monitor program 120 prohibits access to the shared memory 23 (ST22).

The monitor program 120 (monitoring program) runs on at least one of the CPUs 101-1 to 101-3 and the CPU 201 of the multi-core processor, monitors writes to the shared memory 23, and prohibits writes to the shared memory 23 from any program other than the writing program.

The process that writes program operation information may be identified, for example, by its process ID. The monitor program 120 prevents unnecessary access to the shared memory 23 from third-party applications or malicious programs, which improves the robustness of the trace device.
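The ST20 to ST22 decision, using the process-ID check suggested above, can be sketched as follows. `SharedMemoryMonitor`, its attributes and the example PIDs are hypothetical; a real monitor program would intercept writes at the memory-protection level rather than through a method call.

```python
from typing import List, Set


class SharedMemoryMonitor:
    """Hypothetical model of monitor program 120 guarding the trace area."""

    def __init__(self, writer_pids: Set[int]) -> None:
        self.writer_pids = set(writer_pids)  # PIDs of program information writing unit 107
        self.trace_area: List[str] = []
        self.denied_pids: List[int] = []

    def write(self, pid: int, record: str) -> bool:
        if pid in self.writer_pids:          # ST20: is this the trace-writing process?
            self.trace_area.append(record)   # ST21: access permitted
            return True
        self.denied_pids.append(pid)         # ST22: access prohibited
        return False
```

Recording denied PIDs is an addition of this sketch; the text only requires that such writes be blocked.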

Embodiment 4.
A trace device 1004 according to the fourth embodiment will be described with reference to FIG. 11. The first embodiment showed an example in which the trace device is built within a multi-core processor, but the log control system can also be built on a separate processor that can access the shared memory 23.
In FIG. 11, the system operation system 100 is constructed on a multi-core processor having four CPUs 101-1 to 101-4, and the log control system 200 is constructed on a separate processor with a CPU 201 that can access the shared memory 23 via the communication path 3. If the separate processor hosting the log control system 200 has its own dedicated IO device 4, the system operation system 100 and the log control system 200 never contend for an IO device, so the IO access control units are unnecessary. Apart from building the log control system 200 on a separate processor, and omitting the IO access control units in a hardware configuration with no IO device access contention, this embodiment is the same as the first embodiment.

The trace devices 1001 to 1004 according to the first to fourth embodiments described above have the following effects.
(1) The trace devices 1001 to 1004 perform the output processing to the IO device 1, which consumes CPU resources and processing time, on a dedicated CPU separate from the CPUs on which the system runs. The system side (system operation system 100) can therefore operate without being affected by the processing load of outputting program operation information.
(2) The system operation system 100 and the log control system 200 run as separate systems. Therefore, even if the system operation system 100, whose program operation information is being collected, goes down, the log control system 200 can still acquire the program operation information from the shared memory 23 and output it to the IO device 1.
(3) When the log control system 200, which monitors the operating state of the system operation system 100, detects an abnormality in the system operation system 100, it forcibly outputs the program operation information written in the shared memory 23 to the IO device 1. This prevents situations in which the program operation information cannot be analyzed because it was never output.
(4) When an abnormality occurs in the system operation system 100, the log control system 200 stops the CPUs of the system operation system 100 and acquires their cache information. This makes it possible to analyze the cause from the state immediately before the system operation system 100 entered the abnormal state.
(5) In the shared memory 23, the system operation system 100 only writes to, and the log control system 200 only reads from, the area that stores the program operation information. This enhances the robustness of the trace devices 1001 to 1004.
(6) Because the trace devices 1001 to 1004 of the first to fourth embodiments are realized with existing multi-core technology, they affect neither system size, part count, nor cost. They can therefore be introduced, without hardware changes, into existing systems with multi-core processors, and into systems using an SoC (System-on-a-chip) or SiP (System in Package), which have been widely used in embedded devices in recent years.
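Effect (5), one side only writing and the other only reading, is the property of a classic single-producer/single-consumer ring buffer. The Python sketch below illustrates one possible layout of the trace area under that assumption; `TraceRing` is hypothetical, the drop-when-full policy is a design choice of this sketch, and a real shared-memory version would also need memory barriers between the two CPUs.

```python
from typing import List, Optional


class TraceRing:
    """Single-writer / single-reader ring buffer (illustrative trace-area layout).

    The writer (system operation system) only ever advances `head`;
    the reader (log control system) only ever advances `tail`.
    """

    def __init__(self, capacity: int) -> None:
        # One slot is always kept empty so that "full" and "empty" differ.
        self.slots: List[Optional[str]] = [None] * (capacity + 1)
        self.head = 0  # next slot to write; modified by the writer only
        self.tail = 0  # next slot to read; modified by the reader only

    def write(self, record: str) -> bool:
        nxt = (self.head + 1) % len(self.slots)
        if nxt == self.tail:
            return False  # full: drop the record rather than touch the reader's index
        self.slots[self.head] = record
        self.head = nxt
        return True

    def read(self) -> Optional[str]:
        if self.tail == self.head:
            return None  # empty
        record = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        return record
```

Overwriting the oldest record when full would keep the newest trace data, but it would require the writer to move `tail`, breaking the strict write-only/read-only split; dropping new records preserves it.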

In the above description of the embodiments, each element described as a “... unit” may instead be a “... means”, “... step”, “... procedure”, or “... process”. That is, what has been described as a “... unit” may be implemented by software alone, by a combination of software and hardware, or further in combination with firmware. The firmware and software are stored as programs on a recording medium such as a magnetic disk, and are read and executed by a CPU. That is, a program causes a computer to function as the “... units” described above, or causes a computer to execute the procedures and methods of the “... units” described above.

Although the above embodiments describe a trace device, it is evident from the above description that the operation of the trace device can also be understood as a program for causing a computer to function as the trace device, and that the operation of each “... unit” of the trace device can also be understood as a trace method.

The embodiments have been described above. Two or more of these embodiments may be combined; one of them may be implemented partially; or two or more of them may be partially combined. The present invention is not limited to these embodiments, and various modifications are possible as needed.

1 IO device, 2 memory, 3 communication path, 21 system operation system memory area, 22 log control system memory area, 23 shared memory, 101-1, 101-2, 101-3, 201 CPUs of the multi-core processor, 102 operating system, 103, 203 inter-CPU communication unit, 104 application program, 105, 205 IO access control unit, 106, 206 abnormality detection processing unit, 100 system operation system, 200 log control system, 1001, 1002, 1003, 1004 trace device.

Claims

A trace device equipped with a multi-core processor having a plurality of CPUs (Central Processing Units), the trace device comprising:
a system operation unit that executes a program using one or more of the plurality of CPUs of the multi-core processor and collects program operation information indicating an operation history of the program; and
a log control unit that uses one or more of the plurality of CPUs of the multi-core processor different from the CPUs used by the system operation unit, operates independently of the system operation unit without being affected by the operating status of the system operation unit, and outputs the program operation information collected by the system operation unit to an IO (Input/Output) device.

The trace device according to claim 1, further comprising a storage device, wherein
the system operation unit writes the collected program operation information to the storage device, and
the log control unit reads the program operation information written by the system operation unit from the storage device and outputs the read program operation information to the IO device.

The trace device according to claim 2, wherein
the storage device is a shared memory shared by the system operation unit and the log control unit.

The trace device according to claim 3, wherein
only the system operation unit can write to the area of the shared memory to which the program operation information is written.

The trace device according to any one of claims 2 to 4, wherein
the log control unit monitors the operating state of the system operation unit and, if it determines as a result of the monitoring that an abnormality has occurred in the system operation unit, forcibly outputs the program operation information in the storage device to the IO device.

The trace device, wherein
the system operation unit monitors its own operating state and, if it determines as a result of the monitoring that an abnormality has occurred in the system operation unit, notifies the log control unit of the occurrence of the abnormality, and
the log control unit, when notified of the occurrence of an abnormality by the system operation unit, forcibly outputs the program operation information in the storage device to the IO device.

The trace device according to claim 5, wherein
when forcibly outputting the program operation information in the storage device to the IO device, the log control unit acquires the cache information cached in the CPUs used by the system operation unit and stores the acquired cache information in the storage device.

The trace device according to claim 3, wherein
the system operation unit includes a program information writing unit that writes the program operation information to the shared memory, and
the trace device further comprises a write monitoring unit that, using at least one of the plurality of CPUs of the multi-core processor, monitors writes to the shared memory and prohibits writes to the shared memory that are not made through the program information writing unit.

A trace program for causing a trace device, which is a computer equipped with a multi-core processor having a plurality of CPUs (Central Processing Units), to execute:
a collection process of executing a program using one or more of the plurality of CPUs of the multi-core processor and collecting program operation information indicating a history of the processing status of the program; and
an output process of using one or more of the plurality of CPUs of the multi-core processor different from the CPUs used for the collection process, operating independently of the collection process without being affected by its operating status, and outputting the program operation information collected by the collection process to an IO (Input/Output) device.