KR20160098757A - Embedded System and error recovery method thereof

Info

Publication number: KR20160098757A
Application number: KR1020150020747A
Authority: KR
Inventors: 김익균; 변경진; 엄낙웅
Original assignee: 한국전자통신연구원
Priority date: 2015-02-11
Filing date: 2015-02-11
Publication date: 2016-08-19

Abstract

The present invention relates to an embedded system and a method for repairing error therein, which employ error correction and synchronization restoration methods based on triple modular redundancy and partial reconfiguration, thereby enabling high reliability of a sequential circuit. The embedded system comprises: memory; a processor which includes a plurality of cores individually processing assigned tasks; a memory controller which is disposed between the memory and the processor, and is responsible for data transmission between the memory and the processor; a peripheral device which is connected to the processor via an on-chip peripheral bus (OPB), and supports task processing operation of the processor; an input/output port which is connected to the processor via the on-chip peripheral bus, and is responsible for communication with an external device; and a first error processing unit which is disposed between the memory controller and the processor, detects error in the processor, and stores register information about the cores in the memory.

Description

Embedded system and error recovery method thereof

The present invention relates to an embedded system and an error recovery method thereof, and more particularly, to an embedded system, and an error recovery method thereof, that can achieve high reliability in sequential circuits by using error correction and synchronization recovery based on triple redundancy and partial reconfiguration.

As CMOS processes continue to scale down, system-on-chip (SoC) devices are becoming more highly integrated and higher in performance. As a result, reconfigurable SRAM-based FPGAs are attracting attention as a new class of mass-production device. In some cases, an embedded processor core is implemented in the FPGA to perform system management, which lowers the cost of building the system.

However, an SRAM-based FPGA may malfunction because of a single event upset (SEU). An SEU is a soft error in which the value held in a memory cell or flip-flop is inverted under the influence of radiation or neutrons.

An FPGA stores its circuit configuration information in configuration memory. Consequently, when an SEU occurs in an FPGA, the circuit information held in the configuration memory may be changed unintentionally.
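To illustrate why a single flipped configuration bit matters, the following is a minimal sketch under a simplifying assumption: a 2-input lookup table (LUT) whose truth table is held in four configuration bits. The function name `lut2` and the bit layout are hypothetical and are not taken from any particular FPGA.

```python
def lut2(config: int, a: int, b: int) -> int:
    """Evaluate a toy 2-input LUT: bit (2*a + b) of `config` is the output."""
    return (config >> ((a << 1) | b)) & 1


# Configuration bits that implement a AND b (only entry a=1, b=1 is set).
AND_CONFIG = 0b1000

# Hypothetical SEU: the configuration bit for inputs a=0, b=1 is inverted.
upset_config = AND_CONFIG ^ (1 << 1)

# The upset LUT no longer behaves as an AND gate for a=0, b=1, and the
# change persists until the configuration memory is rewritten.
before = lut2(AND_CONFIG, 0, 1)
after = lut2(upset_config, 0, 1)
```

In this toy model the logic function itself changes, which is why the effect lasts until the configuration is rewritten.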

Until recently, SEUs occurred so rarely at ground level that they were considered only in fields where the consequences of an error are severe, such as automotive and space applications. However, as LSI feature sizes shrink and transistor threshold voltages fall, the effects of SEUs are expected to become apparent at ground level as well.

Moreover, SEUs are one reason FPGAs have not been widely adopted in fields where reliability is critical, such as automotive electronics and social infrastructure systems. For this reason, improving the reliability of FPGAs is becoming increasingly important.

Triple modular redundancy (TMR) is a representative method of making an FPGA highly reliable. In TMR, a circuit is triplicated and a majority vote is taken over the three outputs, so that an error in any one of the circuits is masked.
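The majority vote described above can be sketched in a few lines. This is an illustrative software model, not circuitry from the specification; the function name `majority_vote` and the sample values are assumptions.

```python
def majority_vote(r0: int, r1: int, r2: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two inputs."""
    return (r0 & r1) | (r1 & r2) | (r0 & r2)


# Three replicas of the same circuit should produce the same word.
correct = 0b10110010
upset = correct ^ 0b00001000  # one replica suffers a single-bit flip

# The voter masks the error in the single faulty replica.
voted = majority_vote(correct, upset, correct)
```

The vote masks the fault only while at most one replica is wrong for a given bit; it does not repair the faulty replica.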

For combinational circuits, high reliability can be achieved by masking errors with TMR and correcting them through reconfiguration. In particular, partial reconfiguration allows an error to be corrected without stopping the operation of the system.

However, the error correction methods currently available initialize the circuit information at the time of reconfiguration, so flip-flop and register contents are initialized as well. This makes it difficult to achieve high reliability for sequential circuits, including processors. Simply applying TMR to a sequential circuit and partially reconfiguring it does not restore operation, because the internal states of the replicas no longer match.

When the entire circuit is reconfigured, the register information must, in addition to the reconfiguration itself, be saved in an external FPGA and restored after reconfiguration.

Errors that affect FPGA devices are classified as soft errors and hard errors. A hard error is physical damage such as a broken wire, whereas soft errors include the SEU, which causes a 1-bit inversion in a memory cell or flip-flop.

When radiation, neutrons, or the like strike silicon, they cause a drift of charge that disturbs the transistor current. An SEU occurs when this noise exceeds the threshold voltage of the transistor. Therefore, as transistor threshold voltages fall with the scaling of integrated-circuit line widths, SEUs are expected to occur more frequently.

In an ASIC the circuit configuration is fixed, so the effect of an SEU is temporary. Storage memory is affected persistently, but this can be handled with an error correcting code (ECC). An FPGA, however, keeps its circuit configuration information in SRAM and can therefore be seriously affected by an SEU.

Errors caused in an FPGA by SEUs can be classified as transient errors or permanent errors according to where they occur. A transient error occurs in a flip-flop or latch and affects the output; unless the output is fed back, its effect is temporary. A permanent error, in contrast, alters the circuit configuration or the program being executed and causes malfunction, and its effect persists.

For this reason, the FPGA's embedded memory must be protected with ECC, as in an ASIC, while the configuration memory requires error correction by reconfiguration.

Accordingly, the present invention has been devised to solve the problems of the prior art described above, and an object of the present invention is to provide an embedded system, and an error recovery method thereof, that can achieve high reliability in sequential circuits by using error correction and synchronization recovery based on triple redundancy and partial reconfiguration.

To achieve the above object, an embedded system according to one aspect of the present invention includes: a memory; a processor unit including a plurality of cores that individually process assigned tasks; a memory controller located between the memory and the processor unit and responsible for data transfer between the memory and the processor unit; a peripheral unit connected to the processor unit via an on-chip peripheral bus (OPB) to assist the task processing operation of the processor unit; an input/output port connected to the processor unit via the on-chip peripheral bus and responsible for communication with external devices; and a first error processing unit located between the memory controller and the processor unit, which detects errors occurring in the processor unit and stores register information for the plurality of cores in the memory.

The first error processing unit includes: a first detection circuit that detects errors occurring in the processor unit; and a first majority circuit that masks errors detected by the first detection circuit and stores register information transmitted from the plurality of cores in the memory.

The embedded system further includes a second error processing unit that is connected to the processor unit and the peripheral unit and detects and handles errors occurring in the processor unit.

The second error processing unit includes: a second detection circuit that detects errors occurring in the processor unit; and a second majority circuit that masks errors detected by the second detection circuit.

The embedded system further includes a third error processing unit that is connected between the input/output port and the peripheral unit and detects and handles errors occurring in the processor unit.

The third error processing unit includes: a third detection circuit that detects errors occurring in the processor unit; and a third majority circuit that masks errors detected by the third detection circuit.

An error recovery method for an embedded system according to an embodiment of the present invention includes: processing assigned tasks by a plurality of cores of a processor unit; detecting whether an error occurs while the plurality of cores process the tasks; partially reconfiguring the core in which the error occurred, if the detecting step determines that an error has occurred; and generating an interrupt to all cores and performing synchronization by sharing the register information of a normally operating core.

Here, the step of performing synchronization includes: generating an interrupt to the plurality of cores; storing the register information of the plurality of cores in a memory; and restoring, to all cores, the register information of a normally operating core from among the register information stored in the memory, so that the cores are synchronized.

According to the present invention as described above, an embedded system and an error recovery method thereof are provided that can achieve high reliability in sequential circuits by using error correction and synchronization recovery based on triple redundancy and partial reconfiguration.

Therefore, FPGAs can be applied even in fields where reliability is critical, such as automotive electronics and social infrastructure systems, making it possible to reduce system cost.

FIG. 1 is a block diagram showing the configuration of an embedded system according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the sequence of the error recovery operation of the embedded system according to an embodiment of the present invention.
FIG. 3 is a state diagram showing the error recovery operation of the embedded system according to an embodiment of the present invention.
FIG. 4 is a flowchart showing the synchronization process of the embedded system according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention pertains are fully informed of the scope of the invention; the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the embodiments of the present invention and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

There are two approaches to making an FPGA highly reliable: error mitigation and error correction. Error mitigation uses a redundant circuit, such as TMR or duplication with comparison (DWC), to keep an error from propagating to the external output. This provides a countermeasure against transient errors.
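As a point of comparison with TMR, duplication with comparison can be modeled as follows. This is a hedged sketch: the function name `dwc` and the choice to pass through the first copy's output are assumptions made for illustration.

```python
def dwc(out_a: int, out_b: int) -> tuple:
    """Duplication with comparison: return copy A's output and a mismatch flag.

    With only two copies, a mismatch shows that an error exists but not
    which copy is wrong, so the flag is used to block or retry the output.
    """
    mismatch = out_a != out_b
    return out_a, mismatch


value, error_flag = dwc(0x5A, 0x5A)      # copies agree
_, error_flag_upset = dwc(0x5A, 0x5B)    # one copy has a flipped bit
```

Unlike a three-way vote, the comparison can only flag the disagreement; the consumer of `error_flag` has to decide how to withhold the output.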

Error correction uses reconfiguration to fix permanent errors; scrubbing based on full reconfiguration and error correction based on partial reconfiguration are both in use. For combinational circuits, error mitigation and error correction can be combined to achieve high reliability. In particular, when TMR is combined with partial reconfiguration, errors can be corrected without stopping the operation of the system.
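The difference between rewriting everything and rewriting only the damaged region can be sketched with a toy model in which configuration memory is a list of frames and a known-good ("golden") copy is available. The names `scrub_full`, `repair_partial`, and the frame values are hypothetical; real devices expose this through vendor-specific configuration interfaces that are not modeled here.

```python
def scrub_full(frames: list, golden: list) -> None:
    """Full reconfiguration: overwrite every frame with the golden copy."""
    frames[:] = golden


def repair_partial(frames: list, golden: list) -> list:
    """Partial reconfiguration: rewrite only frames that differ from the golden copy.

    Returns the indices of the frames that were rewritten.
    """
    repaired = []
    for i, (current, good) in enumerate(zip(frames, golden)):
        if current != good:
            frames[i] = good
            repaired.append(i)
    return repaired


golden_frames = [0xA1, 0xB2, 0xC3, 0xD4]
live_frames = list(golden_frames)
live_frames[2] ^= 0x01  # hypothetical SEU in frame 2

rewritten = repair_partial(live_frames, golden_frames)
```

In this model only frame 2 is touched, which mirrors the point that the rest of the system can keep running while the damaged region is rewritten.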

For sequential circuits, however, research on error mitigation is progressing, but error correction has not been sufficiently addressed. The reconfiguration methods currently in use reconfigure a circuit by initializing its circuit information. Because the internal state of a sequential circuit is initialized as well, a process for restoring the state that existed just before the error is also needed. Even when TMR and partial reconfiguration are used, the internal state of the reconfigured circuit differs from that of the other replicas, so it must be synchronized with the redundant circuits.

The present invention focuses on the reconfigurability of FPGAs and achieves error correction and synchronization recovery by means of TMR and partial reconfiguration. Synchronization recovery is realized by using the FPGA's embedded memory: a majority vote is taken over each item of register information, the result is stored in the memory, and it is then restored to all processors.

Hereinafter, an embedded system and an error recovery method thereof according to an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram showing the configuration of an embedded system according to an embodiment of the present invention. The configuration shown in FIG. 1 is an example in which the cores are designated as partial reconfiguration regions, for the purpose of error correction in the embedded processor cores, and are treated as the targets of SEUs.

Referring to FIG. 1, the embedded system 100 according to an embodiment of the present invention may include a memory 110, a memory controller 120, a processor unit 130, a peripheral unit 140, an input/output port 150, and first to third error processing units 160, 170, and 180.

The embedded system 100 according to this embodiment includes one memory controller but may be configured to include a plurality of memory controllers. In that case, a plurality of first error processing units may also be provided.

The memory 110 stores data transmitted from the memory controller 120 and transmits the data requested by the memory controller 120.

The memory controller 120 is located between the memory 110 and the processor unit 130 and, under the control of the processor unit 130, stores data in the memory 110 or requests data from it.

The memory controller 120 and the processor unit 130 are connected via a local memory bus (LMB) to exchange signals; specifically, data is exchanged via a data local memory bus (DLMB) and instructions are exchanged via an instruction local memory bus (ILMB).

The processor unit 130 includes a plurality of cores that individually process assigned tasks. In this embodiment it includes first to third cores 131, 132, and 133, although the number of cores constituting the processor unit 130 may vary depending on the system. The first to third cores 131, 132, and 133 may be RISC-type cores with a Harvard architecture.

The processor unit 130 stores data generated during task processing in the memory 110 and reads data required for task processing from the memory 110.

The processor unit 130 is connected to the memory controller 120 via the local memory bus (LMB) and to the peripheral unit 140 via the on-chip peripheral bus (OPB) to exchange signals.

The peripheral unit 140 assists the task processing operation of the processor unit 130 or provides time information. The peripheral unit 140 may consist of various kinds of modules depending on the system specification; in this embodiment it is assumed to consist of a timer 141 and an OPB interrupt controller 142.

The peripheral units 140 are provided in a number corresponding to the number of cores constituting the processor unit 130. Since the processor unit 130 in this embodiment consists of three cores 131, 132, and 133, the embedded system 100 according to this embodiment is provided with three peripheral units 141, 142, and 143.

The peripheral unit 140 is connected to the processor unit 130 via the on-chip peripheral bus (OPB) to exchange signals.

The input/output port 150 is responsible for communication with devices outside the embedded system 100.

Each of the first to third error processing units 160, 170, and 180 includes a majority circuit and a detection circuit.

That is, the first error processing unit 160 includes a first majority circuit 161 and a first detection circuit 162, the second error processing unit 170 includes a second majority circuit 171 and a second detection circuit 172, and the third error processing unit 180 includes a third majority circuit 181 and a third detection circuit 182.

The first error processing unit 160 is connected between the memory controller 120 and the processor unit 130; in particular, it is connected to the processor unit 130 via the local memory bus (LMB).

The first majority circuit 161 of the first error processing unit 160 masks errors and, by allowing the memory 110 to be shared, reduces memory consumption.

The first majority circuit 161 also ensures that correct register information is stored in the memory 110.

The second error processing unit 170 is connected between the processor unit 130 and the peripheral unit 140; in particular, it is connected to the peripheral unit 140 via the on-chip peripheral bus (OPB).

The second majority circuit 171 of the second error processing unit 170 prevents the peripheral unit 140 from being affected by an error even if one occurs in the processor unit 130.

The third error processing unit 180 is connected between the input/output port 150 and the on-chip peripheral bus (OPB).

The third majority circuit 181 of the third error processing unit 180 is placed to mask errors relating to the input/output port 150.

The first to third majority circuits 161, 171, and 181 perform different functions depending on where they are placed, whereas the first to third detection circuits 162, 172, and 182 are all configured to detect errors occurring in the cores 131, 132, and 133 of the processor unit 130.

Here, the first to third majority circuits 161, 171, and 181 detect where an error has occurred and output a signal so that partial reconfiguration is performed.

The logical expressions of the first to third detection circuits 162, 172, and 182 are as follows.

Here, R0, R1, and R2 are the outputs from the triple-redundant modules, and E0, E1, and E2 are signals indicating which module is in error.
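The sketch below shows one common way to derive such error flags from three redundant outputs: a module is flagged when its output differs from the majority value. This formulation is an assumption chosen to match the signal description (R0 to R2 as inputs, E0 to E2 as flags); it should not be read as the logical expressions of the detection circuits 162, 172, and 182 themselves. The function name `detect` is hypothetical.

```python
def detect(r0: int, r1: int, r2: int) -> tuple:
    """Return (voted, (e0, e1, e2)) for three redundant outputs.

    voted is the bitwise 2-of-3 majority of r0, r1, r2.
    e_i is True when module i disagrees with the majority (assumed definition).
    """
    voted = (r0 & r1) | (r1 & r2) | (r0 & r2)
    flags = (r0 != voted, r1 != voted, r2 != voted)
    return voted, flags


# Module 2 produces a wrong word; the other two agree.
voted, (e0, e1, e2) = detect(0b101, 0b101, 0b111)
```

Under a single-fault assumption exactly one flag is raised, which is enough to select the module to be partially reconfigured.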

The embedded system according to an embodiment of the present invention has been described above. The error recovery operation of the embedded system according to an embodiment of the present invention is described step by step below with reference to the accompanying drawings.

FIG. 2 is a flowchart showing the sequence of the error recovery operation of the embedded system according to an embodiment of the present invention, and FIG. 3 is a state diagram showing that error recovery operation.

The error recovery operation of the embedded system according to an embodiment of the present invention is described below with reference to FIGS. 2 and 3, on the assumption that an error has occurred in the third core 133.

First, when the embedded system starts operating, the multiple cores 131, 132, and 133 run and process their assigned tasks (S210). It is assumed that the cores 131, 132, and 133 initially operate normally.

While the cores 131, 132, and 133 are operating, the detection circuits 162, 172, and 182 check whether an error has occurred in any core (S220).

If no error is detected in step S220 (S220-No), error detection continues. If an error in the third core 133 is detected (S220-Yes) (t1 in FIG. 3), the detection circuit that detected the error outputs a control signal for partial reconfiguration to the third core 133, and the core is partially reconfigured (S230) (t2 in FIG. 3).

In step S230 the core in which the error occurred is partially reconfigured, but the internal states of the cores are not yet synchronized.

Therefore, after the partial reconfiguration of step S230, an interrupt is generated to all cores (t3 in FIG. 3), and synchronization is performed by sharing the register information of the normally operating cores (S240) (t4 in FIG. 3).

Once all cores have been synchronized in step S240, all cores resume normal operation.
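The overall flow of steps S210 to S240 can be modeled in software as follows. This is a simplified sketch, not the hardware described in the specification: the `Core` class, its four-register state, the `step` workload, and the helper names are all hypothetical, and synchronization is reduced to copying one healthy core's registers.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    regs: list = field(default_factory=lambda: [0, 0, 0, 0])

    def step(self) -> None:
        """S210: stand-in for task processing (increments register 0)."""
        self.regs[0] += 1

    def reconfigure(self) -> None:
        """S230: partial reconfiguration resets the core's internal state."""
        self.regs = [0] * len(self.regs)


def faulty_index(cores: list):
    """S220: index of the one core that disagrees with the other two, else None."""
    for i, core in enumerate(cores):
        others = [c.regs for j, c in enumerate(cores) if j != i]
        if others[0] == others[1] and core.regs != others[0]:
            return i
    return None


def recover(cores: list):
    """Run detection, partial reconfiguration, and synchronization once."""
    bad = faulty_index(cores)                      # S220 (t1)
    if bad is None:
        return None
    cores[bad].reconfigure()                       # S230 (t2)
    healthy = cores[(bad + 1) % len(cores)].regs   # any normally operating core
    for core in cores:                             # S240 (t3, t4)
        core.regs = list(healthy)
    return bad


cores = [Core() for _ in range(3)]
for _ in range(3):
    for core in cores:
        core.step()
cores[2].regs[1] ^= 0x4          # hypothetical SEU in the third core
recovered = recover(cores)
```

After `recover` returns, all three cores hold the same register values again, which corresponds to the return to normal operation following S240.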

The core synchronization process of step S240 is described in more detail below with reference to FIG. 4.

FIG. 4 is a flowchart showing the synchronization process of the embedded system according to an embodiment of the present invention.

Referring to FIG. 4, an interrupt is first generated to each core (S241). The register information of the normally operating cores, which accept the interrupt, is stored in the memory, whereas the core that was partially reconfigured because of the error does not accept the interrupt and instead outputs instructions for executing its startup process. The startup instructions output from that core are masked by the majority circuit, and the register-store instruction requests from the error-free cores are adopted.

이에 의해, 모든 코어는 메모리로 레지스터 정보 저장 명령을 전송하고(S242), 이 시점에서 명령 단위의 동기가 완료된다. As a result, all the cores transmit the register information storage command to the memory (S242), and the synchronization on a command-by-command basis is completed at this point.

Then, the register information is read from all cores and stored in the memory through the majority circuit (S243); as a result, the memory holds the register information of the normally operating cores.
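
One common way to realize such a voter in triple modular redundancy is a bitwise majority function applied to each register word. The sketch below assumes that approach and three cores; the names `vote_word` and `vote_register_file` and the register values are illustrative and do not come from the patent.

```python
def vote_word(a, b, c):
    """Bitwise 2-of-3 majority: each output bit follows at least two inputs."""
    return (a & b) | (b & c) | (a & c)


def vote_register_file(core_a, core_b, core_c):
    """Vote register by register across three register files."""
    return [vote_word(a, b, c) for a, b, c in zip(core_a, core_b, core_c)]


# Illustrative values: two healthy cores agree, the reconfigured core is reset.
healthy = [0x1000, 0x00FF, 0xDEAD]
reset = [0x0000, 0x0000, 0x0000]

# The image written to memory matches the healthy cores' registers.
memory_image = vote_register_file(healthy, reset, healthy)
```

Because the two healthy cores agree on every bit, the reset values of the reconfigured core have no effect on the stored image.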

The register information stored in step S243 is then restored to all cores (S244). As a result, the internal states of the cores are synchronized, and the recovery and synchronization processing for the core in which the error occurred is complete.
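
Steps S241 to S244 can be put together in a small software model. This is a sketch under stated assumptions, not the patented hardware: the `Core` class, its `reconfigured` flag, the command strings, and word-level voting are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Core:
    registers: list
    reconfigured: bool = False  # True right after partial reconfiguration

    def on_interrupt(self):
        # A freshly reconfigured core ignores the interrupt and asks to boot.
        return "START_PROCESS" if self.reconfigured else "STORE_REGISTERS"


def vote(values):
    """Strict-majority vote over a sequence; raises if no value wins."""
    winner, count = Counter(values).most_common(1)[0]
    if count <= len(values) // 2:
        raise RuntimeError("no majority")
    return winner


def synchronize(cores):
    # S241: interrupt every core and collect the command each one issues.
    commands = [core.on_interrupt() for core in cores]
    # S242: the voter masks the minority command, so the store request wins.
    if vote(commands) != "STORE_REGISTERS":
        raise RuntimeError("unexpected command")
    # S243: read every register file and keep the voted value per register.
    memory = [vote(column) for column in zip(*(c.registers for c in cores))]
    # S244: restore the stored image to all cores.
    for core in cores:
        core.registers = list(memory)
        core.reconfigured = False
    return memory


cores = [Core([7, 42, 99]), Core([0, 0, 0], reconfigured=True), Core([7, 42, 99])]
saved = synchronize(cores)
```

After `synchronize` returns, all three modeled cores hold the same register values, which corresponds to the synchronized internal state described above.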

Although the embedded system and the error recovery method thereof according to the present invention have been described with reference to embodiments, the scope of the present invention is not limited to the specific embodiments, and various alternatives, modifications, and changes may be made within a scope obvious to those of ordinary skill in the art.

Therefore, the embodiments described in the present invention and the accompanying drawings are intended to explain, not to limit, the technical idea of the present invention, and the scope of the technical idea of the present invention is not limited by these embodiments and drawings. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.

100: embedded system 110: memory
120: memory controller 130: processor unit
131, 132, 133: core 140: peripheral unit
141: timer 142: OPB controller
150: input/output port 160, 170, 180: error processing unit
161, 171, 181: majority circuit 162, 172, 182: detection circuit

Claims

1. An embedded system comprising:
a memory;
a processor unit including a plurality of cores that individually process assigned tasks;
a memory controller located between the memory and the processor unit and responsible for data transfer between the memory and the processor unit;
a peripheral unit connected to the processor unit via an on-chip peripheral bus (OPB) and supporting the task processing operation of the processor unit;
an input/output port connected to the processor unit via the on-chip peripheral bus and responsible for communication with an external device; and
a first error processing unit located between the memory controller and the processor unit, the first error processing unit detecting an error occurring in the processor unit and storing register information of the plurality of cores in the memory.

2. The embedded system according to claim 1,
wherein the first error processing unit comprises:
a first detection circuit for detecting an error occurring in the processor unit; and
a first majority circuit for masking the error detected by the first detection circuit and storing the register information transmitted from the plurality of cores in the memory.

3. The embedded system according to claim 1,
further comprising a second error processing unit connected to the processor unit and the peripheral unit, for detecting and processing an error occurring in the processor unit.

4. The embedded system according to claim 3,
wherein the second error processing unit comprises:
a second detection circuit for detecting an error occurring in the processor unit; and
a second majority circuit for masking the error detected by the second detection circuit.

5. The embedded system according to claim 1,
further comprising a third error processing unit connected between the input/output port and the peripheral unit, for detecting and processing an error occurring in the processor unit.

6. The embedded system according to claim 5,
wherein the third error processing unit comprises:
a third detection circuit for detecting an error occurring in the processor unit; and
a third majority circuit for masking the error detected by the third detection circuit.

7. An error recovery method for an embedded system, the method comprising:
processing, by a plurality of cores of a processor unit, assigned tasks;
detecting whether an error occurs while the plurality of cores process the tasks;
partially reconfiguring the core in which the error occurred when the detecting determines that an error has occurred; and
generating an interrupt in all of the cores and performing synchronization by sharing register information of a normally operating core.

8. The method of claim 7,
wherein the performing of the synchronization comprises:
generating an interrupt in the plurality of cores;
storing register information of the plurality of cores in a memory; and
restoring, to all of the cores, the register information of a normally operating core from among the register information stored in the memory, so that the cores are synchronized.