WO2015177927A1 - Information processing device - Google Patents

Information processing device

Info

Publication number
WO2015177927A1
Authority
WO
WIPO (PCT)
Prior art keywords
processor
processors
information
information processing
processing apparatus
Prior art date
Application number
PCT/JP2014/063715
Other languages
French (fr)
Japanese (ja)
Inventor
谷口 斉
伊部 英史
巧 上薗
Original Assignee
株式会社日立製作所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社日立製作所
Priority to PCT/JP2014/063715
Publication of WO2015177927A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/18 Error detection or correction of the data by redundancy in hardware using passive fault-masking of the redundant circuits

Definitions

  • The present invention relates to the technology of an information processing apparatus.
  • Soft error tolerance can be improved by operating two processors synchronously.
  • Depending on the required soft error tolerance and the allowable system suspension time, it may be better to use a technique such as mutual monitoring, triplication, or monitoring by an external CPU.
  • An object of the present invention is to increase the degree of freedom of hardware design by realizing soft error tolerance in software.
  • An information processing apparatus includes two or more processors, a memory that each processor associated with it in advance can read and write, and a result comparison unit that compares the operation results of the processors and identifies an operation result.
  • Each processor stores at least information identifying its processing state in a communication area of the memory to which the other processors cannot write.
  • After executing a predetermined process, the processor reads the information identifying the processing state from the communication area associated with any other processor that performs the same process and, if that state matches its own processing state, instructs the result comparison unit to compare the operation results.
  • FIG. 1 is a configuration diagram of an information processing apparatus according to a first embodiment of the present invention. FIG. 2 is a configuration diagram of an information processing apparatus according to a second embodiment of the present invention. FIG. 3 is a configuration diagram of an information processing apparatus according to a third embodiment of the present invention. FIG. 4 is a diagram showing an example of the hardware configuration of an information processing apparatus to which an embodiment of the present invention is applied. FIG. 5 is a diagram showing an example of a flowchart of the software processing of an information processing apparatus to which an embodiment of the present invention is applied. FIG. 6 is a diagram showing an example of the software hierarchy of an information processing apparatus to which an embodiment of the present invention is applied.
  • The information processing apparatus is a communication device such as a hub, router, gateway, or firewall, but is not limited to these; it may be any of various information processing apparatuses such as a server, mobile phone, tablet terminal, personal computer, or smart meter.
  • FIG. 1 is a configuration diagram of an information processing apparatus according to the first embodiment of the present invention.
  • The information processing apparatus according to the first embodiment includes two or more processors (a first processor 1 to an nth processor 1n), a first local memory 2 to an nth local memory 2n provided for each processor, a result comparison unit / error processor detection unit 3, and a shared storage unit 40.
  • The shared storage unit 40 has banks, each of which is a storage area associated in advance with one processor.
  • A bank associated with a processor can be both read and written by that processor.
  • Processors other than the associated processor have read-only access to the bank. That is, the shared storage unit 40 can function as a communication area to other processors.
  • Because the shared storage unit 40 is read-only for processors that are not associated with a bank, even if any one of the first processor 1 to the nth processor 1n operates unexpectedly because of a soft error, it cannot write erroneous data into the part of the shared storage unit 40 associated with another processor.
  • TMR: Triple Modular Redundant
  • The first local memory 2 to the nth local memory 2n, each associated with one of the first processor 1 to the nth processor 1n, store the programs to be executed and their data.
  • In the shared storage unit 40, an internal state, a calculation state, a calculation result, and the like are stored as information identifying the processing state of each processor.
  • Note that the programs executed by the processors are substantially the same.
  • Each of the first processor 1 to the nth processor 1n reads the state of the other processors from the banks 4n associated with them and, based on user designation and/or the information in the shared storage unit 40, switches among the DMR (Double Modular Redundant), TMR, and nMR (n greater than 3) configurations according to a predetermined criterion.
  • Each of the first processor 1 to the nth processor 1n autonomously makes decisions such as waiting before the next process when a comparison with its own calculation shows that another processor has not finished its calculation. This comparison and waiting is performed when a cache such as SRAM is flushed to the local memory. That is, the program runs loosely synchronized with the other processors.
  • When the calculation results of two or more processors, including its own, are available, each processor outputs its calculation result to the result comparison unit / error processor detection unit 3 and instructs it to compare the results, in order to verify whether a correct calculation result has been obtained.
  • The result comparison unit / error processor detection unit 3 compares the calculation results of the processors and, as necessary, instructs that the calculation result be stored in the shared storage unit 40. It also detects a processor that has not produced a calculation result within the expected predetermined period (for example, by the time the cache is flushed to the local memory) and issues a reset instruction 5 to that processor. For example, if the operation results do not match when there are two processors, both processors are instructed to recalculate (not shown). Furthermore, to deal with the vulnerability of TMR, the processors are reset at an appropriate timing (not shown).
  • DMR, TMR, and nMR (n greater than 3), which differ in soft error tolerance and in tolerance for system suspension, can be realized on a single hardware architecture.
  • The hardware can therefore be shared, which reduces development man-hours and cost.
  • FIG. 2 is a configuration diagram of an information processing apparatus according to the second embodiment of the present invention.
  • The information processing apparatus according to the second embodiment includes two or more processors (a first processor 1 to an nth processor 1n), a first local memory 2 to an nth local memory 2n provided for each processor, and a result comparison unit / error processor detection unit 3. In addition, each of the first local memory 2 to the nth local memory 2n contains one of the shared storage units 7 to 7n, which is shared with the processors other than the one associated with it in advance. Each of the shared storage units 7 to 7n is a partial storage area of the corresponding local memory among the first local memory 2 to the nth local memory 2n.
  • Each of the shared storage units 7 to 7n can be both read and written by the processor associated with it in advance. Processors other than the associated processor have read-only access to the storage unit. That is, each of the shared storage units 7 to 7n can function as a communication area to other processors.
  • Because the shared storage units 7 to 7n are read-only for the processors not associated with them, even if any one of the first processor 1 to the nth processor 1n operates unexpectedly because of a soft error, it cannot write erroneous data into the shared storage units 7 to 7n associated with other processors.
  • TMR: Triple Modular Redundant
  • The first local memory 2 to the nth local memory 2n, each associated with one of the first processor 1 to the nth processor 1n, store the programs to be executed and their data.
  • In the shared storage units 7 to 7n, an internal state, a calculation state, a calculation result, and the like are stored as information identifying the processing state of each processor.
  • The programs executed by the processors are substantially the same.
  • Each of the first processor 1 to the nth processor 1n reads the state of the other processors from the shared storage units 7 to 7n associated with them and, based on user designation and/or the information in the shared storage units 7 to 7n, switches among the DMR, TMR, and nMR (n greater than 3) configurations according to a predetermined criterion.
  • Each of the first processor 1 to the nth processor 1n autonomously makes decisions such as waiting before the next process when a comparison with its own calculation shows that another processor has not finished its calculation. This comparison and waiting is performed when a cache such as SRAM is flushed to the local memory. That is, the program runs loosely synchronized with the other processors.
  • When the calculation results of two or more processors, including its own, are available, each processor outputs its calculation result to the result comparison unit / error processor detection unit 3 and instructs it to compare the results, in order to verify whether a correct calculation result has been obtained.
  • The result comparison unit / error processor detection unit 3 compares the calculation results of the processors and, as necessary, stores the calculation result 6 in the shared storage units 7 to 7n. It also detects a processor that has not produced a calculation result within the expected predetermined period (for example, by the time the cache is flushed to the local memory) and issues a reset instruction 5 to that processor. For example, if the operation results do not match when there are two processors, both processors are instructed to recalculate (not shown). Furthermore, to deal with the vulnerability of TMR, the processors are reset at an appropriate timing (not shown).
  • DMR, TMR, and nMR (n greater than 3), which differ in soft error tolerance and in tolerance for system suspension, can be realized on a single hardware architecture.
  • The hardware can therefore be shared, which reduces development man-hours and cost.
  • FIG. 3 is a configuration diagram of an information processing apparatus according to the third embodiment of the present invention.
  • The information processing apparatus according to the third embodiment includes two or more processors (a first processor 1 to an nth processor 1n), a first local memory 2 to an nth local memory 2n provided for each processor, and an error processor detection unit 8.
  • Each of the first local memory 2 to the nth local memory 2n contains one of the shared storage units 7 to 7n, which is shared with the processors other than the one associated with it in advance.
  • Each of the shared storage units 7 to 7n is a partial storage area of the corresponding local memory among the first local memory 2 to the nth local memory 2n.
  • Each of the shared storage units 7 to 7n can be both read and written by the processor associated with it in advance.
  • Processors other than the associated processor have read-only access to the storage unit. That is, each of the shared storage units 7 to 7n can function as a communication area to other processors.
  • Because the shared storage units 7 to 7n are read-only for the processors not associated with them, even if any one of the first processor 1 to the nth processor 1n operates unexpectedly because of a soft error, it cannot write erroneous data into the shared storage units 7 to 7n associated with other processors.
  • TMR: Triple Modular Redundant
  • The first local memory 2 to the nth local memory 2n, each associated with one of the first processor 1 to the nth processor 1n, store the programs to be executed and their data.
  • In the shared storage units 7 to 7n, an internal state, a calculation state, a calculation result, and the like are stored as information identifying the processing state of each processor.
  • The programs executed by the processors are substantially the same.
  • Each of the first processor 1 to the nth processor 1n reads the state of the other processors from the shared storage units 7 to 7n associated with them and, based on user designation and/or the information in the shared storage units 7 to 7n, switches among the DMR, TMR, and nMR (n greater than 3) configurations according to a predetermined criterion.
  • Each of the first processor 1 to the nth processor 1n autonomously makes decisions such as waiting before the next process when a comparison with its own calculation shows that another processor has not finished its calculation. This comparison and waiting is performed when a cache such as SRAM is flushed to the local memory. That is, the program runs loosely synchronized with the other processors. Then, when the operation results of two or more processors, including its own, are available, each processor compares the operation results and identifies the operation result by majority decision or the like, in order to verify whether a correct operation result has been obtained. The first processor 1 to the nth processor 1n then issue an instruction to store the calculation result in the shared storage units 7 to 7n.
  • The error processor detection unit 8 detects a processor that has not produced an operation result within the expected predetermined period (for example, by the time the cache is flushed to the local memory) and issues a reset instruction 5 to that processor. For example, if the operation results do not match when there are two processors, both processors are instructed to recalculate (not shown). Furthermore, to deal with the vulnerability of TMR, the processors are reset at an appropriate timing (not shown).
  • DMR, TMR, and nMR (n greater than 3), which differ in soft error tolerance and in tolerance for system suspension, can be realized on a single hardware architecture.
  • The hardware can therefore be shared, which reduces development man-hours and cost.
  • FIG. 4 shows a configuration example of a soft-error-tolerant information processing apparatus 9 that includes an arithmetic unit 10 having the configuration of any one of the first to third embodiments.
  • The soft-error-tolerant information processing apparatus 9 includes the arithmetic unit 10 having the configuration of any one of the first to third embodiments, a ROM (Read Only Memory) 11 that records various programs and the like, and an input/output interface unit 12 (hereinafter abbreviated as input/output IF) that performs input and output with external devices.
  • The input/output IF 12 is connected to an input device 13 such as a keyboard, an output device 14 such as a display, and an input/output device 15 that exchanges various information about the targets of information processing.
  • The soft-error-tolerant information processing apparatus 9 can be restored simply by resetting the affected processor. Furthermore, because the arithmetic unit 10 can be operated while switching as appropriate among DMR, TMR, and nMR, an appropriate operation result can be obtained and information processing can continue even when such an error occurs. In addition, the hardware of the arithmetic unit 10 can be shared, which reduces development man-hours and cost.
  • FIG. 5 is a diagram showing an example of a flowchart of software processing of the information processing apparatus to which the first embodiment or the second embodiment of the present invention is applied.
  • A flowchart of the software processing of the information processing apparatus to which the third embodiment is applied is described later. Because the first processor 1 to the nth processor 1n all perform the same processing, the description below takes the first processor 1 as representative for simplicity.
  • The first processor 1 performs initialization (step S001). Specifically, the first processor 1 initializes itself and the corresponding first local memory 2.
  • The first processor 1 initializes the shared storage unit (step S002). Specifically, the first processor 1 initializes the start PC (program counter) of the calculation program, the start PC of the next calculation program, and the processor status information (waiting for calculation, calculation in progress, waiting for calculation end, waiting for next calculation, and so on) stored in the bank of the shared storage unit 40 or in the shared storage unit 7.
  • The first processor 1 loads information such as predetermined calculation conditions into the shared storage unit 40 or the shared storage unit 7 and turns on the calculation waiting flag in the shared storage unit (step S003).
  • The first processor 1 reads the calculation waiting flags in the shared storage units of the other processors and decides whether to start the calculation (step S004). Specifically, the first processor 1 reads the status information of the other processors in the shared storage unit 40 or the shared storage units 7n, autonomously judges the overall system state from the states of the other processors, and starts the calculation. As a result, the program runs loosely synchronized across all the processors.
  • The first processor 1 performs the calculation (step S005). Specifically, the first processor 1 executes its own program.
  • During the calculation, the first processor 1 may store its own survival signal in the bank 4 of the shared storage unit 40 or in the shared storage unit 7 associated with it, and check the survival signals of the other processors, so that an abnormal processor can be detected and reset.
  • The first processor 1 stores the calculation result in the shared storage unit 40 or the shared storage unit 7 and turns on the calculation end flag (step S006).
  • The first processor 1 checks the status of the other processors (step S007). Specifically, the first processor 1 reads the calculation end flags in the banks 4n of the shared storage unit 40 or in the shared storage units 7n of the other processors, and waits until the calculation end flags of the other processors are set. When the calculations are complete, it instructs the result comparison unit / error processor detection unit 3 to perform the comparison processing.
  • The result comparison unit / error processor detection unit 3 performs a majority decision on the calculation results (step S008). Specifically, it reads the calculation results stored in the shared storage units of all the processors constituting the DMR, TMR, or nMR configuration, and identifies the calculation result either by majority decision (performed, for example, when there are three or more processors) or by error determination (performed when there are two processors).
  • The above is the software processing flow of the information processing apparatus to which the first embodiment or the second embodiment of the present invention is applied.
  • A calculation result can be obtained while avoiding soft errors.
  • Even when a meaningful calculation result cannot be obtained, for example because the results differ among the processors, one can be obtained by re-execution.
  • Because the DMR, TMR, and nMR configurations can be set as appropriate for the situation, dependency on the hardware configuration can be further reduced.
  • In step S007, while checking the calculation end flags in the shared storage units of the other processors and waiting until all processors have finished their calculations, the processor may also check the survival signals of the other processors to detect an abnormality.
  • The abnormal processor may be reset (mutual monitoring during calculation).
  • The processor may be reset as an abnormal processor.
  • When this processing is performed, the number of processors on which it was performed may be subtracted from the total number of processors, and the calculation of the other processors may be regarded as complete once the remaining processors have finished their calculations.
  • In step S008, if the calculation results differ in the error determination, all the processors are reset and the calculation is redone. In the majority decision, the calculation result is taken as correct when the results of two or more processors match. Processors whose results differ may then be reset on the assumption that an abnormality has occurred.
  • The flow of software processing of the information processing apparatus in the third embodiment is basically the same, except that the majority decision on the calculation results in step S008 is performed by one of the processors.
  • FIG. 6 is a diagram showing an example of the software hierarchy of the information processing apparatus to which the embodiment of the present invention is applied.
  • The hardware layer 601 is the hardware layer of the information processing apparatus.
  • Above the hardware layer 601 is a driver layer 602 that operates the hardware.
  • A soft error tolerance layer 603 is provided on the hardware layer 601 and/or the driver layer 602.
  • The soft error tolerance layer 603 abstracts the programs related to soft error tolerance, including at least switching among tolerance methods such as DMR, TMR, and nMR, loosely synchronized operation based on the status information of other processors in the shared storage unit, mutual monitoring during calculation, majority decision, detection and reset of abnormal processors, and recovery and resynchronization of abnormal processors.
  • An OS (Operating System)
  • A middleware/library layer 605; it is not essential.
  • The OS may not be installed.
  • A user application 606 (an application 606 created by the user)
  • The man-hours for producing the software are almost the same as for software without soft error tolerance, so the increase in software development man-hours due to soft error tolerance can be suppressed.
  • Some or all of the configurations, functions, processing units, processing means, and the like of the above embodiments may be realized in hardware, for example by designing them as an integrated circuit.
  • Each of the above configurations, functions, and the like may also be realized in software by having a processor interpret and execute a program that realizes each function.
  • Information such as programs, tables, and files that realize each function can be stored in a memory, a hard disk, a recording device such as an SSD (Solid State Drive), or a recording medium such as an IC card, an SD card, or a DVD.
  • The control lines and information lines shown are those considered necessary for the explanation; not all the control lines and information lines in a product are necessarily shown. In practice, almost all the components may be considered to be connected to each other.
  • The information processing apparatus has been described above based on the first to third embodiments.
  • The information processing apparatus is not limited to a so-called general-purpose computer; the invention may also be applied to a communication apparatus such as a router or a gateway.
  • The present invention may also be applied to mobile devices and to devices such as smart meters that measure household power consumption, water consumption, gas consumption, and the like.
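
The result identification in step S008 and the reset rules that follow it can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function name identify_result, the dictionary of results, and the handling of a tied vote (treated here like a mismatch, so every processor is reset) are all hypothetical and do not appear in the source.

```python
from collections import Counter


def identify_result(results):
    """Identify the calculation result from per-processor results.

    `results` maps a processor id to its (hashable) calculation result.
    Returns (result, processors_to_reset); result is None when the
    calculation has to be redone. Names and tie handling are assumptions.
    """
    if len(results) < 2:
        raise ValueError("at least two processors are required")

    if len(results) == 2:
        # DMR: error determination. A mismatch cannot tell which side is
        # wrong, so both processors are reset and the calculation is redone.
        first, second = results.values()
        if first == second:
            return first, []
        return None, sorted(results)

    # TMR / nMR: majority decision over all reported results.
    ranked = Counter(results.values()).most_common()
    value, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes < 2 or tied:
        # Assumption: no agreement, or a tie, is treated like a mismatch.
        return None, sorted(results)

    # Processors that disagree with the majority are presumed abnormal.
    disagreeing = sorted(pid for pid, r in results.items() if r != value)
    return value, disagreeing
```

For example, identify_result({1: 10, 2: 10, 3: 99}) yields the result 10 and marks processor 3 for reset, while identify_result({1: 10, 2: 11}) yields no result and marks both processors for reset and recalculation.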

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Hardware Redundancy (AREA)

Abstract

A software-based implementation technology regarding soft error tolerance is provided. An information processing device is provided with: two or more processors; a memory that can be read or written from each of the processors that are associated therewith in advance; and a result comparison unit that compares the results of operation by the processors and identifies an operation result, wherein each processor stores information identifying at least a processing state in a contact region of the memory that cannot be written by the other processors. The processor, after executing a predetermined process, reads the processing state identifying information from the contact region associated with any of the other processors that perform the same process as the predetermined process, and, if the information matches the processing state of the processor, instructs the result comparison unit to compare the results of operation.

Description

Information processing device
The present invention relates to the technology of an information processing apparatus.
As background art in this technical field, there is lockstep technology, in which two processors operate synchronously.
With the above technology, soft error tolerance can be improved by operating two processors synchronously. However, depending on the required soft error tolerance and the allowable system suspension time, it may be better to use a technique such as mutual monitoring, triplication, or monitoring by an external CPU. In other words, because the optimum hardware design differs for each device to be developed, the development period and cost may increase.
An object of the present invention is to increase the degree of freedom of hardware design by realizing soft error tolerance in software.
The present application includes a plurality of means for solving at least part of the above problems; one example is as follows. To solve the above problems, an information processing apparatus according to the present invention includes two or more processors, a memory that each processor associated with it in advance can read and write, and a result comparison unit that compares the operation results of the processors and identifies an operation result. Each processor stores at least information identifying its processing state in a communication area of the memory to which the other processors cannot write. After executing a predetermined process, the processor reads the information identifying the processing state from the communication area associated with any other processor that performs the same process and, if that state matches its own processing state, instructs the result comparison unit to compare the operation results.
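
The state check that precedes the comparison can be illustrated with a small Python sketch. It is a sketch under assumptions rather than the claimed implementation: the State values, the CommunicationArea class, and the function should_request_comparison are hypothetical names chosen for illustration, and write protection of the communication area is not modeled here.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical state names, loosely modeled on the processing states.
    WAITING = "waiting for calculation"
    RUNNING = "calculation in progress"
    DONE = "calculation finished"


@dataclass
class CommunicationArea:
    """Communication area of one processor; only its owner should write it."""
    owner: int
    state: State = State.WAITING


def should_request_comparison(own_id, areas):
    """Return True when some other processor reports the caller's own state.

    `areas` maps each processor id to its communication area. The caller
    reads the other areas without writing to them and, on a match, would
    instruct the result comparison unit to compare the operation results.
    """
    own_state = areas[own_id].state
    return any(
        area.state == own_state
        for pid, area in areas.items()
        if pid != own_id
    )
```

With two processors, the function returns False while the peer is still RUNNING and the caller is DONE, and True once the peer has also stored DONE in its own communication area.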
 本発明によると、ソフトエラー耐性について、特定のハードウェア構造に依存せずソフトウェアにより実現することでハードウェア設計の自由度をより高めることが可能となる。上記した以外の課題、構成及び効果は、以下の実施形態の説明により明らかにされる。 According to the present invention, it is possible to increase the degree of freedom in hardware design by realizing soft error tolerance by software without depending on a specific hardware structure. Problems, configurations, and effects other than those described above will be clarified by the following description of embodiments.
本発明の第1の実施形態に係る情報処理装置の構成図である。1 is a configuration diagram of an information processing apparatus according to a first embodiment of the present invention. 本発明の第2の実施形態に係る情報処理装置の構成図である。It is a block diagram of the information processing apparatus which concerns on the 2nd Embodiment of this invention. 本発明の第3の実施形態に係る情報処理装置の構成図である。It is a block diagram of the information processing apparatus which concerns on the 3rd Embodiment of this invention. 本発明の実施形態を適用した情報処理装置のハードウェア構成の例を示す図である。It is a figure which shows the example of the hardware constitutions of the information processing apparatus to which embodiment of this invention is applied. 本発明の実施形態を適用した情報処理装置のソフトウェア処理の流れ図の例を示す図である。It is a figure which shows the example of the flowchart of the software processing of the information processing apparatus to which embodiment of this invention is applied. 本発明の実施形態を適用した情報処理装置のソフトウェア階層構成の例を示す図である。It is a figure which shows the example of the software hierarchy structure of the information processing apparatus to which embodiment of this invention is applied.
Examples of an information processing apparatus hardened against soft errors according to the present invention, and of a communication apparatus using it, are described below with reference to the drawings. The information processing apparatus is, for example, a communication apparatus such as a hub, router, gateway, or firewall, but is not limited to these; it may be any of various information processing apparatuses such as a server, mobile phone, tablet terminal, personal computer, or smart meter.
FIG. 1 is a configuration diagram of the information processing apparatus according to the first embodiment of the present invention. The apparatus comprises two or more processors (first processor 1 to nth processor 1n), a local memory provided for each processor (first local memory 2 to nth local memory 2n), a result comparison unit / error processor detection unit 3, and a shared storage unit 40.
The shared storage unit 40 has banks, which are storage areas associated in advance with the respective processors. The bank associated with a processor can be both read and written by that processor. Processors other than the associated one have read-only access to the bank. The shared storage unit 40 can therefore function as a communication area to the other processors.
Because a processor can only read the parts of the shared storage unit 40 that are not associated with it, even if one of the first processor 1 to nth processor 1n behaves unexpectedly because of a soft error, it cannot write erroneous data into the part of the shared storage unit 40 associated with another processor.
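The access rule for the banks can be illustrated with a small model. This is a minimal sketch, not the patented implementation: the class `SharedStorage` and its `write`/`read` methods are hypothetical names, and in a real device the protection would presumably be enforced by the memory hardware rather than by a software check.

```python
class SharedStorage:
    """Hypothetical model of shared storage unit 40: one bank per processor."""

    def __init__(self, processor_ids):
        # Each processor gets its own bank (modelled here as a dict).
        self._banks = {pid: {} for pid in processor_ids}

    def write(self, writer_id, bank_id, key, value):
        # Only the processor associated with a bank may write to it.
        if writer_id != bank_id:
            raise PermissionError(
                f"processor {writer_id} may not write to bank {bank_id}")
        self._banks[bank_id][key] = value

    def read(self, reader_id, bank_id, key):
        # Every processor may read every bank.
        return self._banks[bank_id].get(key)


storage = SharedStorage([1, 2, 3])
storage.write(1, 1, "status", "calculating")   # processor 1 writes its own bank
print(storage.read(2, 1, "status"))            # processor 2 reads bank 1
```

A faulty processor that tries `storage.write(2, 1, ...)` is rejected, so its error stays out of the other processors' communication areas.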
This embodiment requires two or more processors. Providing three processors makes TMR (Triple Modular Redundancy) possible, which gives better error tolerance. Providing four processors is even more preferable, because it makes it possible to overcome the vulnerability of TMR.
The first local memory 2 to nth local memory 2n, associated with the first processor 1 to nth processor 1n respectively, store the program to be executed and its data. The first bank 4 to nth bank 4n, provided in the shared storage unit 40 for the respective processors, store the internal state, calculation state, calculation result, and the like as information specifying the processing state of each processor. The programs executed by the processors are substantially identical. Each of the first processor 1 to nth processor 1n reads the states of the other processors from the banks 4n associated with them, and is configured to switch among DMR (Double Modular Redundancy), TMR, and nMR (n greater than 3) according to a predetermined criterion, based on user designation and/or the information in the shared storage unit 40.
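The text does not state the criterion used to switch between DMR, TMR, and nMR. As a purely illustrative sketch, assume the mode is chosen from the number of processors currently usable, optionally capped by a user-designated number; the function `select_mode` and both of its parameters are hypothetical.

```python
def select_mode(usable_processors, user_limit=None):
    """Pick a redundancy mode from the usable processor count.

    Assumption: the user designation, if given, caps the number of
    processors that take part in the redundant calculation.
    """
    n = usable_processors if user_limit is None else min(usable_processors, user_limit)
    if n < 2:
        raise ValueError("redundant operation needs at least two processors")
    if n == 2:
        return "DMR"
    if n == 3:
        return "TMR"
    return f"{n}MR"  # nMR with n greater than 3


print(select_mode(4))                # four usable processors
print(select_mode(4, user_limit=3))  # user restricts the system to TMR
```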
Furthermore, each of the first processor 1 to nth processor 1n compares the other processors' results with its own operation result and autonomously makes decisions such as waiting before the next process when another processor has not finished its calculation. Such comparison and waiting are performed at, for example, the timing at which a cache such as SRAM is flushed to the local memory. In other words, each processor executes the program in loose synchronization with the other processors. Once the operation results of two or more processors, including its own, are available, each processor instructs the result comparison unit / error processor detection unit 3 to compare the results in order to verify whether a correct calculation result was obtained.
The result comparison unit / error processor detection unit 3 compares the operation results of the processors and, as necessary, issues a storage instruction 6 to store the operation result in the shared storage unit 40. It also detects any processor that does not produce an operation result within an expected predetermined period (for example, the period until the cache is flushed to the local memory) and issues a reset instruction 5 to that processor. For example, when there are two processors and their operation results do not match, it instructs both processors to recalculate (not shown). In addition, to deal with the vulnerability of TMR, it resets the processors at appropriate timing (not shown).
In this way, DMR, TMR, and nMR (n greater than 3), which differ in the degree of soft error tolerance and in the acceptable level of temporary system suspension, can be realized with a single hardware architecture. According to the first embodiment, realizing soft error tolerance in software increases the degree of freedom in hardware design. That is, the hardware can be made common, which reduces development man-hours and cost.
FIG. 2 is a configuration diagram of the information processing apparatus according to the second embodiment of the present invention. The apparatus comprises two or more processors (first processor 1 to nth processor 1n), a local memory provided for each processor (first local memory 2 to nth local memory 2n), and a result comparison unit / error processor detection unit 3. Inside each of the first local memory 2 to nth local memory 2n there is a shared storage unit (shared storage unit 7 to shared storage unit 7n) that is shared with processors other than the one associated with that memory in advance. Each of the shared storage units 7 to 7n is a partial storage area of the corresponding local memory 2 to 2n. Each shared storage unit can be both read and written by the processor associated with it in advance, while processors other than the associated one have read-only access. Each of the shared storage units 7 to 7n can therefore function as a communication area to the other processors.
Because a processor can only read the shared storage units 7 to 7n that are not associated with it, even if one of the first processor 1 to nth processor 1n behaves unexpectedly because of a soft error, it cannot write erroneous data into a shared storage unit 7 to 7n associated with another processor.
This embodiment requires two or more processors. Providing three processors makes TMR possible, which gives better error tolerance. Providing four processors is even more preferable, because it makes it possible to overcome the vulnerability of TMR.
The first local memory 2 to nth local memory 2n, associated with the first processor 1 to nth processor 1n respectively, store the program to be executed and its data. The shared storage units 7 to 7n store the internal state, calculation state, calculation result, and the like as information specifying the processing state of each processor. The programs executed by the processors are substantially identical. Each of the first processor 1 to nth processor 1n reads the states of the other processors from the shared storage units 7 to 7n associated with them, and is configured to switch among DMR, TMR, and nMR (n greater than 3) according to a predetermined criterion, based on user designation and/or the information in the shared storage units 7 to 7n.
Furthermore, each of the first processor 1 to nth processor 1n compares the other processors' results with its own operation result and autonomously makes decisions such as waiting before the next process when another processor has not finished its calculation. Such comparison and waiting are performed at, for example, the timing at which a cache such as SRAM is flushed to the local memory. In other words, each processor executes the program in loose synchronization with the other processors. Once the operation results of two or more processors, including its own, are available, each processor instructs the result comparison unit / error processor detection unit 3 to compare the results in order to verify whether a correct calculation result was obtained.
The result comparison unit / error processor detection unit 3 compares the operation results of the processors and, as necessary, issues a storage instruction 6 to store the operation result in the shared storage units 7 to 7n. It also detects any processor that does not produce an operation result within an expected predetermined period (for example, the period until the cache is flushed to the local memory) and issues a reset instruction 5 to that processor. For example, when there are two processors and their operation results do not match, it instructs both processors to recalculate (not shown). In addition, to deal with the vulnerability of TMR, it resets the processors at appropriate timing (not shown).
In this way, DMR, TMR, and nMR (n greater than 3), which differ in the degree of soft error tolerance and in the acceptable level of temporary system suspension, can be realized with a single hardware architecture. According to the second embodiment, as in the first, realizing soft error tolerance in software increases the degree of freedom in hardware design. That is, the hardware can be made common, which reduces development man-hours and cost.
FIG. 3 is a configuration diagram of the information processing apparatus according to the third embodiment of the present invention. The apparatus comprises two or more processors (first processor 1 to nth processor 1n), a local memory provided for each processor (first local memory 2 to nth local memory 2n), and an error processor detection unit 8.
Inside each of the first local memory 2 to nth local memory 2n there is a shared storage unit (shared storage unit 7 to shared storage unit 7n) that is shared with processors other than the one associated with that memory in advance. Each of the shared storage units 7 to 7n is a partial storage area of the corresponding local memory 2 to 2n. Each shared storage unit can be both read and written by the processor associated with it in advance, while processors other than the associated one have read-only access. Each of the shared storage units 7 to 7n can therefore function as a communication area to the other processors.
Because a processor can only read the shared storage units 7 to 7n that are not associated with it, even if one of the first processor 1 to nth processor 1n behaves unexpectedly because of a soft error, it cannot write erroneous data into a shared storage unit 7 to 7n associated with another processor.
This embodiment requires two or more processors. Providing three processors makes TMR possible, which gives better error tolerance. Providing four processors is even more preferable, because it makes it possible to overcome the vulnerability of TMR.
The first local memory 2 to nth local memory 2n, associated with the first processor 1 to nth processor 1n respectively, store the program to be executed and its data. The shared storage units 7 to 7n store the internal state, calculation state, calculation result, and the like as information specifying the processing state of each processor. The programs executed by the processors are substantially identical. Each of the first processor 1 to nth processor 1n reads the states of the other processors from the shared storage units 7 to 7n associated with them, and is configured to switch among DMR, TMR, and nMR (n greater than 3) according to a predetermined criterion, based on user designation and/or the information in the shared storage units 7 to 7n.
Furthermore, each of the first processor 1 to nth processor 1n compares the other processors' results with its own operation result and autonomously makes decisions such as waiting before the next process when another processor has not finished its calculation. Such comparison and waiting are performed at, for example, the timing at which a cache such as SRAM is flushed to the local memory. In other words, each processor executes the program in loose synchronization with the other processors. Once the operation results of two or more processors, including its own, are available, each processor compares the results itself and identifies the operation result by majority decision or the like, in order to verify whether a correct operation result was obtained. The first processor 1 to nth processor 1n then issue the instruction to store the operation result in the shared storage units 7 to 7n.
The error processor detection unit 8 detects any processor that does not produce an operation result within an expected predetermined period (for example, the period until the cache is flushed to the local memory) and issues a reset instruction 5 to that processor. For example, when there are two processors and their operation results do not match, it instructs both processors to recalculate (not shown). In addition, to deal with the vulnerability of TMR, it resets the processors at appropriate timing (not shown).
In this way, DMR, TMR, and nMR (n greater than 3), which differ in the degree of soft error tolerance and in the acceptable level of temporary system suspension, can be realized with a single hardware architecture. According to the third embodiment, as in the first, realizing soft error tolerance in software increases the degree of freedom in hardware design. That is, the hardware can be made common, which reduces development man-hours and cost.
FIG. 4 shows a configuration example of a soft-error-tolerant information processing apparatus 9 equipped with an arithmetic unit 10 configured according to any one of the first to third embodiments. The soft-error-tolerant information processing apparatus 9 has the arithmetic unit 10, a ROM (Read Only Memory) 11 that stores various programs, and an input/output interface unit 12 (hereinafter abbreviated as input/output IF) that performs input and output with external devices. Connected to the input/output IF 12 are an input device 13 such as a keyboard, an output device 14 such as a display, and an input/output device 15 that exchanges various information related to the information processing target.
With this configuration, even if a soft error occurs in one of the processors in the arithmetic unit 10, the effect of the error can be contained within that processor and its local memory. The soft-error-tolerant information processing apparatus 9 can therefore be recovered simply by resetting that processor. Furthermore, because the arithmetic unit 10 can be operated while switching appropriately among DMR, TMR, and nMR, a correct operation result is still obtained and information processing can continue even when such an error occurs.
In addition, the hardware of the arithmetic unit 10 can be made common, which reduces development man-hours and cost.
FIG. 5 shows an example of a flowchart of the software processing of an information processing apparatus to which the first or second embodiment of the present invention is applied. The flow for the third embodiment is described later. Because the first processor 1 to nth processor 1n all perform the same processing, the description below takes the first processor 1 as the representative for simplicity.
The first processor 1 performs initialization (step S001). Specifically, the first processor 1 initializes itself and the corresponding first local memory 2.
Next, the first processor 1 initializes the shared storage unit (step S002). Specifically, the first processor 1 initializes the items stored in its bank of the shared storage unit 40, or in the shared storage unit 7: the start PC (program counter) of the calculation program, the start PC of the next calculation program, and the processor status information (waiting for calculation, calculating, calculation finished and waiting, waiting for next calculation, and so on).
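The contents of a communication area and their initialization in step S002 can be pictured as a small record. This is a minimal sketch under stated assumptions: the names `Status` and `CommunicationArea` are hypothetical, and the initial values (program counters set to 0, status set to waiting for calculation) are assumed, since the text does not specify them.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # The four states named in the description of step S002.
    WAITING_FOR_CALCULATION = "waiting for calculation"
    CALCULATING = "calculating"
    FINISHED_WAITING = "calculation finished, waiting"
    WAITING_FOR_NEXT = "waiting for next calculation"


@dataclass
class CommunicationArea:
    """Hypothetical layout of one processor's bank / shared storage unit."""
    start_pc: int = 0        # start PC of the calculation program
    next_start_pc: int = 0   # start PC of the next calculation program
    status: Status = Status.WAITING_FOR_CALCULATION

    def initialize(self):
        # Step S002: return every field to its (assumed) initial value.
        self.start_pc = 0
        self.next_start_pc = 0
        self.status = Status.WAITING_FOR_CALCULATION


area = CommunicationArea(start_pc=0x1000, next_start_pc=0x2000,
                         status=Status.CALCULATING)
area.initialize()
print(area)
```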
Next, the first processor 1 loads information such as predetermined calculation conditions into the shared storage unit 40 or the shared storage unit 7 and turns on the calculation-wait flag in the shared storage unit (step S003).
Next, the first processor 1 reads the calculation-wait flags in the shared storage units of the other processors and decides whether to start the calculation (step S004). Specifically, the first processor 1 reads the status information of the other processors from the shared storage unit 40 or the shared storage units 7n, autonomously judges the overall state of the system from the states of the other processors, and starts the calculation. As a result, all processors execute the program in loose synchronization.
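The start decision in step S004 can be sketched as a polling loop over the other processors' calculation-wait flags. The rule used here, starting only once every flag is ON, is an assumption; the text says only that each processor judges the overall system state autonomously. The function `wait_for_start` and the callable `read_wait_flags` are hypothetical.

```python
import time


def wait_for_start(read_wait_flags, poll_interval=0.0, max_polls=1000):
    """Poll the calculation-wait flags until all processors are ready.

    read_wait_flags: callable returning {processor_id: flag} as read from
    the communication areas. Returns True when the calculation may start,
    or False if the flags never all turned ON within max_polls reads.
    """
    for _ in range(max_polls):
        if all(read_wait_flags().values()):
            return True
        time.sleep(poll_interval)
    return False


# Simulated reads: processor 2 becomes ready on the second poll.
snapshots = iter([{1: True, 2: False}, {1: True, 2: True}])
started = wait_for_start(lambda: next(snapshots))
print(started)
```

Because each processor runs this loop independently against read-only data, no processor has to be driven in lockstep with another, which is what keeps the synchronization loose.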
Next, the first processor 1 performs the calculation (step S005). Specifically, the first processor 1 executes the program assigned to it. During the calculation, the first processor 1 may store its own liveness signal in the bank 4 of the shared storage unit 40 associated with it, or in the shared storage unit 7, so that the other processors can check the signal; this makes it possible to detect and reset an abnormal processor.
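One way to picture the liveness check is with a counter that each processor increments in its own communication area while it is calculating. The counter format is an assumption, as the text does not say what the liveness signal looks like, and `LivenessMonitor` is a hypothetical name. A processor whose counter has not advanced between two checks is reported as stalled.

```python
class LivenessMonitor:
    """Hypothetical checker run by a processor on its peers' heartbeats."""

    def __init__(self):
        self._last_seen = {}

    def check(self, heartbeats):
        """heartbeats: {processor_id: counter} read from the other banks.

        Returns the ids whose counter is unchanged since the previous
        check; the first check only records a baseline.
        """
        stalled = sorted(pid for pid, count in heartbeats.items()
                         if self._last_seen.get(pid) == count)
        self._last_seen = dict(heartbeats)
        return stalled


monitor = LivenessMonitor()
first = monitor.check({2: 5, 3: 9})    # baseline read
second = monitor.check({2: 6, 3: 9})   # processor 3 did not advance
print(first, second)
```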
Next, the first processor 1 stores the calculation result in the shared storage unit 40 or the shared storage unit 7 and turns on the calculation-end flag (step S006).
Next, the first processor 1 determines the state of the other processors (step S007). Specifically, the first processor 1 reads the calculation-end flags from the banks 4n of the shared storage unit 40, or from the shared storage units 7n, of the other processors and waits until those flags indicate that the calculation is complete. When they do, it instructs the result comparison unit / error processor detection unit 3 to perform the comparison.
Next, the result comparison unit / error processor detection unit 3 decides the calculation result by majority (step S008). Specifically, it reads the calculation results stored in the shared storage units of all processors that make up the DMR, TMR, or nMR configuration and either performs a majority decision (for example, when there are three or more processors) or determines whether one of the processors is in error (when there are two processors), and thereby identifies the operation result.
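The decision in step S008 can be sketched as a single vote function that covers both cases. This is a minimal sketch, not the patent's implementation: `identify_result` is a hypothetical name, results are assumed to be hashable values, and the treatment of a tie (for example two against two with four processors) as "no result" is an assumption the text does not address.

```python
from collections import Counter


def identify_result(results):
    """results: {processor_id: calculation result}.

    Returns the agreed result, or None when no usable agreement exists
    (two processors that disagree, or a tie for the most common value).
    """
    if len(results) < 2:
        raise ValueError("at least two results are needed")
    ranked = Counter(results.values()).most_common()
    value, votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == votes
    if votes < 2 or tied:
        return None
    return value


print(identify_result({1: 42, 2: 42, 3: 7}))  # TMR with one faulty processor
print(identify_result({1: 42, 2: 7}))         # DMR mismatch: no agreed result
```

With two processors the vote degenerates into the error determination: either both results match or no result is identified and a recalculation is needed.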
The above is the flow of the software processing of an information processing apparatus to which the first or second embodiment of the present invention is applied. With this flow, an operation result that is free of soft errors can be obtained from the results calculated by the processors. When no meaningful operation result can be obtained, for example because the results of the processors disagree, a meaningful result can be obtained by re-execution. In addition, because the DMR, TMR, or nMR configuration can be set appropriately for the situation, dependence on the hardware configuration is further reduced.
In step S007, while checking the calculation-end flags in the shared storage units of the other processors and waiting for all processors to finish, a processor may also check the liveness signals of the other processors to detect abnormalities and, if one is found, reset the abnormal processor (mutual monitoring during calculation).
If the calculation of another processor has not finished when a predetermined calculation period (for example, the period until the cache is flushed to memory) has elapsed, that processor may be regarded as abnormal and reset. In that case, the calculation of the other processors may be regarded as finished once the number of processors that have finished equals the total number of processors minus the number of processors that were reset.
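The timeout rule can be sketched as a function evaluated each time the calculation-end flags are polled. The names `completion_state`, `elapsed`, and `deadline` are hypothetical; the sketch assumes time is measured in the same unit for both arguments.

```python
def completion_state(end_flags, elapsed, deadline):
    """end_flags: {processor_id: calculation-end flag}.

    Returns (finished, to_reset): whether the wait in step S007 is over,
    and which processors are regarded as abnormal and should be reset.
    """
    pending = sorted(pid for pid, done in end_flags.items() if not done)
    if not pending:
        return True, []       # everyone finished in time
    if elapsed < deadline:
        return False, []      # keep waiting
    # Deadline passed: pending processors are regarded as abnormal, and
    # the remaining (total minus reset) processors count as "all finished".
    return True, pending


print(completion_state({1: True, 2: True, 3: False}, elapsed=1.0, deadline=5.0))
print(completion_state({1: True, 2: True, 3: False}, elapsed=6.0, deadline=5.0))
```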
In step S008, if the calculation results differ in the error determination, all processors are reset and the calculation is repeated. In the majority decision, a result on which two or more processors agree is taken as the correct operation result. Processors whose results differ from it may be regarded as having suffered an abnormality and reset.
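The reset policy that follows the comparison can be sketched as below. The function `processors_to_reset` is a hypothetical name, and using `None` to mean "no agreed result" is an assumption made for the example; it presumes `None` is never a legitimate calculation result.

```python
def processors_to_reset(results, agreed):
    """results: {processor_id: calculation result}.
    agreed: the identified result, or None if no agreement was reached.
    """
    if agreed is None:
        # No agreement (e.g. a DMR mismatch): reset everyone and recalculate.
        return sorted(results)
    # Otherwise only the processors that disagree with the majority are reset.
    return sorted(pid for pid, value in results.items() if value != agreed)


print(processors_to_reset({1: 42, 2: 42, 3: 7}, agreed=42))  # only processor 3
print(processors_to_reset({1: 42, 2: 7}, agreed=None))       # both processors
```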
The flow of the software processing in the third embodiment is basically the same, except that in step S008 the majority decision on the calculation results is made by one of the processors.
FIG. 6 shows an example of the software layer structure of an information processing apparatus to which an embodiment of the present invention is applied. The hardware layer 601 is the hardware of the information processing apparatus. Above the hardware layer 601 is a driver layer 602 that operates the hardware. Above the hardware layer 601 and/or the driver layer 602 is a soft error tolerance layer 603. The soft error tolerance layer 603 abstracts the programs involved in soft error tolerance, including at least: switching between tolerance schemes such as DMR, TMR, and nMR; loosely synchronized operation based on, for example, the status information of the other processors in the shared storage unit; mutual monitoring during calculation; majority decision after calculation; detection and reset of abnormal processors; and recovery and resynchronization of abnormal processors.
It is desirable, but not essential, to place an OS (Operating System) layer 604 and a middleware/library layer 605 above the soft error tolerance layer 603. For example, when the information processing apparatus is an embedded control device, it may have no OS. A user application 606 (an application created by the user) is placed above the soft error tolerance layer 603, or above the OS layer 604 and the middleware/library layer 605. With this software layer structure, the user application layer does not need to be aware of soft error tolerance. The software development effort is almost the same as without soft error tolerance, so the increase in software development man-hours caused by adding soft error tolerance is suppressed.
 The configurations, functions, processing units, processing means, and the like of the above embodiments may be implemented partly or wholly in hardware, for example by designing them as integrated circuits. They may also be implemented in software, with a processor interpreting and executing programs that realize the respective functions. Information such as the programs, tables, and files that realize each function can be stored in memory, in a recording device such as a hard disk or an SSD (solid state drive), or on a recording medium such as an IC card, SD card, or DVD.
 The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines in a product are necessarily shown. In practice, almost all components may be considered to be interconnected.
 The information processing apparatus according to the present invention has been described above on the basis of the first through third embodiments. The information processing apparatus is not limited to a so-called general-purpose computer; it may also be applied to communication devices such as routers and gateways.
 In addition, the invention may be applied to mobile devices and to devices such as smart meters that measure household consumption of electricity, water, gas, and the like. The present invention has been described above with a focus on the embodiments.
DESCRIPTION OF SYMBOLS: 1 ... first processor, 2 ... first local memory, 3 ... result comparison unit / error processor detection unit, 4 ... first bank, 5 ... reset instruction, 6 ... storage instruction, 7 ... shared storage unit, 8 ... error processor detection unit

Claims (7)

  1.  An information processing apparatus comprising:
     two or more processors;
     a memory that is readable and writable by each of the processors associated with it in advance; and
     a result comparison unit that compares the computation results of the processors and determines a computation result,
     wherein each processor stores at least information specifying its processing state in a communication area on the memory to which writing by any processor other than that processor is disabled, and
     wherein, upon executing a predetermined process, each processor reads the information specifying the processing state from the communication area associated with any of the other processors that perform the same process, and, when that processing state matches the processor's own processing state, instructs the result comparison unit to compare the computation results.
  2.  The information processing apparatus according to claim 1,
     wherein, when the read information specifying the processing state does not match within a predetermined period, the processor requests the result comparison unit to perform processing for re-executing the predetermined process.
  3.  The information processing apparatus according to claim 1 or 2,
     wherein, when the read information specifying the processing state is information specifying that an error has occurred, the processor requests the result comparison unit to perform processing for re-executing the predetermined process.
  4.  The information processing apparatus according to any one of claims 1 to 3,
     wherein the memory is provided in association with each of the processors.
  5.  The information processing apparatus according to any one of claims 1 to 4,
     wherein the predetermined period is an interval at which a cache is flushed.
  6.  An information processing apparatus comprising:
     two or more processors; and
     a memory that is readable and writable by each of the processors associated with it in advance,
     wherein each processor stores at least information specifying its processing state in a communication area on the memory to which writing by any processor other than that processor is disabled, and
     wherein, upon executing a predetermined process, each processor reads the information specifying the processing state from the communication area associated with any of the other processors that perform the same process, and, when that processing state matches the processor's own processing state, compares the computation results of the processors and determines a computation result.
  7.  A communication apparatus comprising:
     two or more processors;
     a memory that is readable and writable by each of the processors associated with it in advance; and
     a result comparison unit that compares the computation results of the processors and determines a computation result,
     wherein each processor stores at least information specifying its processing state in a communication area on the memory to which writing by any processor other than that processor is disabled, and
     wherein, upon executing a predetermined process, each processor reads the information specifying the processing state from the communication area associated with any of the other processors that perform the same process, and, when that processing state matches the processor's own processing state, instructs the result comparison unit to compare the computation results.
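The handshake recited in claims 1 to 3 can be illustrated with a minimal simulation. Every name below (`CommunicationArea`, `Action`, `next_action`, `ERROR_STATE`, `max_polls`, `on_poll`) is hypothetical and chosen for this sketch; a bounded number of polls stands in for the predetermined period, and a Python ownership check stands in for whatever write protection the memory actually provides.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Action(Enum):
    COMPARE = "compare"      # claim 1: states match, request result comparison
    REEXECUTE = "reexecute"  # claims 2 and 3: timeout or reported error


ERROR_STATE = "ERROR"  # hypothetical marker meaning "an error occurred"


@dataclass
class CommunicationArea:
    """Status slot that only its owning processor may write; any may read."""

    owner: int
    state: Optional[str] = None

    def write(self, writer: int, state: str) -> None:
        if writer != self.owner:
            raise PermissionError(
                f"processor {writer} may not write the area of {self.owner}"
            )
        self.state = state

    def read(self) -> Optional[str]:
        return self.state


def next_action(
    own: CommunicationArea,
    peer: CommunicationArea,
    max_polls: int,
    on_poll: Callable[[int], None] = lambda i: None,
) -> Action:
    """Poll the peer's area up to max_polls times and choose the next step."""
    for i in range(max_polls):
        on_poll(i)  # hook that lets a simulation advance the peer between polls
        peer_state = peer.read()
        if peer_state == ERROR_STATE:
            return Action.REEXECUTE  # peer reported an error (claim 3)
        if peer_state is not None and peer_state == own.read():
            return Action.COMPARE  # processing states match (claim 1)
    return Action.REEXECUTE  # no match within the period (claim 2)
```

Because each processor only ever writes its own area and merely reads the others, a faulty processor cannot overwrite the status that its peers publish, which is the property the write-disabled communication area provides in the claims.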
PCT/JP2014/063715 2014-05-23 2014-05-23 Information processing device WO2015177927A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/063715 WO2015177927A1 (en) 2014-05-23 2014-05-23 Information processing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2014/063715 WO2015177927A1 (en) 2014-05-23 2014-05-23 Information processing device

Publications (1)

Publication Number Publication Date
WO2015177927A1 (en) 2015-11-26

Family

ID=54553618

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2014/063715 WO2015177927A1 (en) 2014-05-23 2014-05-23 Information processing device

Country Status (1)

Country Link
WO (1) WO2015177927A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05189325A (en) * 1992-01-16 1993-07-30 Railway Technical Res Inst Double system electronic computer
JPH06250867A (en) * 1993-03-01 1994-09-09 Nippon Telegr & Teleph Corp <Ntt> Failure resisting computer and failure resisting calculation processing method
JP2001526422A (en) * 1997-12-10 2001-12-18 テレフオンアクチーボラゲツト エル エム エリクソン(パブル) Processor-related methods and processors adapted for functions based on the methods
JP2005049967A (en) * 2003-07-30 2005-02-24 Toshiba Corp Failsafe processor and protection control unit for railroad
JP2006338094A (en) * 2005-05-31 2006-12-14 Daido Signal Co Ltd Multiple-system electronic computer
JP2011095837A (en) * 2009-10-27 2011-05-12 Toshiba Corp Fail-safe system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14892717

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14892717

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP