CN105607974A - High-reliability multicore processing system - Google Patents

Info

Publication number
CN105607974A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510955823.4A
Other languages
Chinese (zh)
Inventor
张犁
李娟伟
殷赞
李甫
石光明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-18
Filing date
2015-12-18
Publication date
2016-05-25
Application filed by Xidian University
Priority to CN201510955823.4A
Publication of CN105607974A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2041 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant with more than one idle spare processing component
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/16 Error detection or correction of the data by redundancy in hardware
    • G06F11/20 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F11/202 Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
    • G06F11/2023 Failover techniques
    • G06F11/2033 Failover techniques switching over of hardware resources

Abstract

The invention discloses a high-reliability multicore processing system, and aims at solving the problem that the existing multicore processing systems are low in reliability and high in resource consumption. The high-reliability multicore processing system comprises N data storage units, N program storage units, N processing cores, a connection network, switching control circuits, a switching controller and n redundant cores, wherein n is less than or equal to N; an error correction encoding unit is arranged at the input end of each data storage unit and an error correction decoding unit is arranged at the output end of each data storage unit; an error correction decoding unit is arranged at the output end of each program storage unit; a built-in self-testing circuit is arranged in each processing core; the data storage units and the program storage units are connected with the n redundant cores through the first switching control circuit (1); the processing cores are connected with the connection network through the second switching control circuit (2); and the switching controller is connected with the switching control circuits and the processing cores. The system disclosed in the invention is reliable in performance and low in resource consumption, and can be applied to the fields of digital signal processing and the like.

Description

High-reliability multicore processing system
Technical field
The invention belongs to the technical field of integrated circuits and relates to a high-reliability multicore processing system that can be used in fields such as digital signal processing.
Background technology
With the advance of technology, single-core processing systems, with their limited data-handling capacity and low processing performance, are increasingly unable to meet demand. Moreover, if a single-core system fails during operation, the whole system fails. Designers therefore sought to integrate multiple cores that work together, and multicore processing systems gradually emerged. A multicore processing system can exploit data-level or task-level parallelism and has strong data-processing capability.
As the performance of multicore processing systems improves, their circuit structures become increasingly complex and require more and more resources. This makes a multicore system prone to faults under external disturbance and internal noise, which degrades its operational reliability. Researchers have proposed a variety of solutions to this problem. From the process perspective, a more reliable silicon-on-insulator process can be used; it offers small parasitic capacitance, weak short-channel effects, high speed, and high integration density, but it requires changes to the manufacturing process and the design flow and is highly complex. From the circuit perspective, triple modular redundancy (TMR) is commonly used. Its basic idea is that three identical modules perform the same operation simultaneously, and a majority voter added at the output port selects the result, achieving fault tolerance. In practice the probability that different modules fail at the same time is low, so hardware redundancy raises system reliability.
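The majority-voting idea behind triple modular redundancy can be illustrated with a minimal Python sketch. The function name and the 8-bit example values are assumptions made for illustration only; the prior art discussed here describes a hardware voter, not software.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


# Three identical modules compute the same result; one suffers a single-bit upset.
good = 0b10110010
upset = good ^ 0b00001000

# The voter masks the fault as long as at most one module is wrong per bit.
print(majority_vote(good, upset, good) == good)
```

The sketch also shows the limitation noted in this background: the voter itself is a single point of failure, since nothing checks its output.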
In a TMR-based multicore processing system, each original processing core is replaced by three identical processing cores and a majority voter. This improves the reliability of the system, but the large number of processing cores and the multiple majority voters increase the resource overhead of the whole system, and power consumption rises accordingly. In addition, the voters themselves have no error-detection or error-correction capability, which can affect the reliability of the whole system.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art described above by proposing a high-reliability multicore processing system that solves the technical problem that existing multicore processing systems are insufficiently reliable and consume large amounts of resources.
To achieve this object, the invention adopts the following technical solution:
A high-reliability multicore processing system comprises N data storage units, N program storage units, N processing cores, and an interconnection network, wherein: the data storage units receive, store, and output the data to be processed; the program storage units store and output the binary machine code required for system operation; the processing cores read the binary machine code output by the program storage units and process the data output by the data storage units; and the interconnection network provides communication and data exchange between processing cores. An error-correction encoding unit is provided at the input of each data storage unit and an error-correction decoding unit at its output, to detect and correct bit errors. An error-correction decoding unit is provided at the output of each program storage unit, to detect and correct bit errors in the pre-encoded binary machine code that the program storage unit outputs. A built-in self-test circuit is provided in each processing core to detect whether the core is faulty. The outputs of every data storage unit and program storage unit are connected through a first switching control circuit 1 to n redundant cores, where n ≤ N; the redundant cores stand in for processing cores under test and replace faulty cores. The processing cores and redundant cores are connected to the interconnection network through a second switching control circuit 2.
In the above high-reliability multicore processing system, the error-correction encoding unit and the error-correction decoding unit use a Hamming code structure.
In the above high-reliability multicore processing system, the built-in self-test circuit comprises a test vector generation unit, a test response analysis unit, a comparison unit, a signature ROM, and a built-in self-test control module, wherein: the test vector generation unit generates the test vectors required by the processing core; the test response analysis unit compresses the test results produced by the processing core into an actual signature; the comparison unit compares the actual signature with the ideal signature to determine whether the circuit is faulty; the signature ROM stores the ideal signature; and the built-in self-test control module controls the operation of the test vector generation unit, the test response analysis unit, the comparison unit, and the signature ROM.
In the above high-reliability multicore processing system, the redundant cores have the same structure as the processing cores and are used to stand in for processing cores under test and to replace faulty cores.
In the above high-reliability multicore processing system, the first switching control circuit 1 and the second switching control circuit 2 switch between redundant cores and processing cores under the control of a switching controller.
Compared with the prior art, the invention has the following advantages:
1. The outputs of the data storage units and program storage units are all connected through a switching control circuit to n redundant cores. A redundant core can stand in for a processing core under test, preserving the continuity and real-time behavior of system operation; when a processing core is found to be faulty, a redundant core permanently replaces it, which preserves correct system function and effectively improves reliability. In addition, because each data storage unit has a Hamming encoding unit at its input and a Hamming decoding unit at its output, and each program storage unit has a Hamming decoding unit at its output, the data output by the data storage units and program storage units can be checked and any detected bit errors corrected. This ensures the correctness of the data fed into the processing cores and further improves reliability.
2. The outputs of the data storage units and program storage units are all connected through a switching control circuit to n redundant cores, which amounts to equipping N processing cores with n redundant cores. Compared with existing TMR-based multicore processing systems, this reduces the number of processing cores used and requires no majority voters, so resource consumption is effectively reduced while system reliability is maintained.
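The resource comparison in advantage 2 can be made concrete with a short count. The following Python sketch is illustrative only: the function and key names are hypothetical, and it counts cores and voters alone, ignoring the cost of the switching control circuits, switching controller, Hamming units, and built-in self-test circuits that the invention adds.

```python
def core_budget(n_cores: int, n_spares: int) -> dict:
    """Count cores and voters for N logical cores under each scheme."""
    if not 0 <= n_spares <= n_cores:
        raise ValueError("the source requires n <= N")
    return {
        "tmr_cores": 3 * n_cores,                 # each core is triplicated
        "tmr_voters": n_cores,                    # one majority voter per triple
        "spare_scheme_cores": n_cores + n_spares, # N processing cores + n redundant cores
    }


print(core_budget(8, 2))
```

Because n ≤ N, the spare-based scheme never uses more than 2N cores, against 3N cores plus N voters for triple modular redundancy.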
Brief description of the drawings
Fig. 1 is a schematic diagram of the overall structure of the invention;
Fig. 2 is a schematic diagram of the built-in self-test structure of a processing core of the invention.
Detailed description of the invention
The invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, the invention comprises N data storage units, N program storage units, N processing cores, an interconnection network, a first switching control circuit 1, a second switching control circuit 2, a switching controller, and n redundant cores.
The data storage unit receives, stores, and outputs the data to be processed. An error-correction encoding unit is provided at its input and an error-correction decoding unit at its output, both using a Hamming code structure. Data written to the data storage unit is first Hamming-encoded; data read from it is first Hamming-decoded to recover the original data. When a single-bit error occurs in the stored data, this structure automatically detects and corrects the 1-bit error.
The program storage unit stores and outputs the binary machine code required for system operation. An error-correction decoding unit using a Hamming code structure is provided at its output. Because the program storage unit is read-only, the binary machine code is Hamming-encoded in advance and then stored in the program memory. When the program storage unit reads out binary machine code, the code is first Hamming-decoded to recover the original binary machine code. When a single-bit error occurs in the stored binary machine code, this structure automatically detects and corrects the 1-bit error.
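The encode-on-write, decode-on-read behavior described for the storage units can be sketched with a Hamming(7,4) code in Python. The source only says that a Hamming code structure is used; the (7,4) parameters, the bit layout, and the function names below are assumptions chosen to keep the example small.

```python
def hamming74_encode(nibble: int) -> int:
    """Encode 4 data bits into a 7-bit codeword.

    Codeword positions are numbered 1..7; parity bits sit at positions 1, 2, 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8                      # index 0 is unused
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    return sum(bits[p] << (p - 1) for p in range(1, 8))


def hamming74_decode(word: int) -> tuple:
    """Return (nibble, syndrome); syndrome is the corrected position, 0 if clean."""
    bits = [0] + [(word >> (p - 1)) & 1 for p in range(1, 8)]
    syndrome = 0
    for p in range(1, 8):
        if bits[p]:
            syndrome ^= p               # XOR of the positions of all set bits
    if syndrome:
        bits[syndrome] ^= 1             # flip the single erroneous bit
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return nibble, syndrome


stored = hamming74_encode(0b1011)       # write path: encode before storing
corrupted = stored ^ (1 << 4)           # a single-bit upset at position 5
print(hamming74_decode(corrupted))      # read path: decode and correct
```

A plain Hamming code of this kind corrects one bit error per codeword; two errors in the same codeword would be miscorrected, which matches the 1-bit correction capability stated above.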
The processing core reads the binary machine code output by the program storage unit and processes the data output by the data storage unit. A built-in self-test circuit is provided in the processing core, which has two operating modes: normal mode and test mode. In normal mode the processing core performs the system function; in test mode the built-in self-test circuit tests the processing core to detect whether it is faulty.
The interconnection network provides communication and data exchange between processing cores.
The redundant core temporarily takes over normal system operation from a processing core that is in test mode, preserving the continuity and real-time behavior of system operation. When a processing core is found to be faulty, the redundant core permanently replaces it.
The first switching control circuit 1 and the second switching control circuit 2 switch between processing cores and redundant cores. When a processing core enters test mode, a redundant core is switched in through the first switching control circuit 1 and the second switching control circuit 2; the processing core runs its built-in self-test while the redundant core performs the normal system function in its place. If the test finds no fault, the processing core is switched back into the system in place of the redundant core, and the redundant core is switched to the next processing core to be tested. If the test finds a fault, the redundant core permanently replaces the faulty processing core.
The switching controller controls the first switching control circuit 1 and the second switching control circuit 2. It decides their next switching operation according to the result of the processing core's built-in self-test.
Referring to Fig. 2, the built-in self-test circuit comprises a test vector generation unit, a test response analysis unit, a comparison unit, a signature ROM, and a built-in self-test control module. The built-in self-test operation is triggered by the switching controller. The test vector generation unit first generates test vectors, which are applied to the processing core. The processing core feeds the resulting test results into the test response analysis unit, which compresses them into an actual signature. The comparison unit compares the actual signature with the ideal signature stored in the signature ROM to determine whether the circuit is faulty. The test vector generation unit, test response analysis unit, comparison unit, and signature ROM all operate under the control of the built-in self-test control module.
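The self-test flow of Fig. 2 (generate vectors, apply them, compress the responses into a signature, compare with the stored signature) can be sketched in Python. The source does not say how vectors are generated or how responses are compressed; the linear-feedback shift register, the shift-and-XOR signature register, the tap positions, the seed, the test length, and the two toy "cores" below are all assumptions made for illustration.

```python
SEED = 0x5A        # hypothetical non-zero seed for the vector generator
N_VECTORS = 32     # hypothetical test length


def _feedback(state: int) -> int:
    """XOR of bits 7, 5, 4 and 3 of an 8-bit register (illustrative taps)."""
    return ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1


def generate_vectors(seed: int, count: int):
    """Test vector generation unit: pseudo-random 8-bit vectors from an LFSR."""
    state = seed & 0xFF
    for _ in range(count):
        yield state
        state = ((state << 1) | _feedback(state)) & 0xFF


def signature(responses) -> int:
    """Test response analysis unit: compress all responses into one 8-bit signature."""
    sig = 0
    for r in responses:
        sig = (((sig << 1) | _feedback(sig)) ^ (r & 0xFF)) & 0xFF
    return sig


def healthy_core(x: int) -> int:
    """Toy stand-in for a fault-free processing core."""
    return (3 * x + 1) & 0xFF


def stuck_core(x: int) -> int:
    """Same core with output bit 2 stuck at 1."""
    return healthy_core(x) | 0x04


def self_test(core, golden: int) -> bool:
    """Comparison unit: True when the actual signature equals the stored one."""
    actual = signature(core(v) for v in generate_vectors(SEED, N_VECTORS))
    return actual == golden


# Signature ROM content: computed once from a known-good core.
GOLDEN = signature(healthy_core(v) for v in generate_vectors(SEED, N_VECTORS))

print("healthy core passes:", self_test(healthy_core, GOLDEN))
print("stuck-at core passes:", self_test(stuck_core, GOLDEN))
```

Signature compression is lossy, so a faulty response sequence can in principle alias to the stored signature; a longer signature register or a longer test reduces that risk.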
The invention operates as follows:
When the system starts running, the first processing core is switched with the first redundant core. The redundant core performs the system function in place of the switched-out processing core, which enters test mode and starts its built-in self-test to detect whether it is faulty. If no fault is found, the processing core and the redundant core are switched back so that the processing core returns to the original circuit; the redundant core then switches with the second processing core and the same operation is repeated. If a fault is found, the redundant core permanently replaces the processing core and thereby becomes an ordinary processing core; the second redundant core then takes over the role of the first, that is, it switches with a processing core, temporarily performs normal system operation in its place, and lets the processing core run its built-in self-test. All of these switching operations are carried out by the first switching control circuit 1 and the second switching control circuit 2 under the control of the switching controller. This is the workflow of the high-reliability multicore processing system. In this system, N processing cores are equipped with n redundant cores (n ≤ N), so at most n processing cores are allowed to fail.
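The workflow above can be sketched as one patrol pass of the switching controller over all processing cores. The following Python sketch is a simplified model, not the patented circuit: the function name and return format are hypothetical, self-test verdicts are supplied as inputs, and stopping the patrol once no spare is left is an assumption (the source only states that at most n faulty cores are tolerated).

```python
def patrol(bist_passes: list, n_spares: int) -> tuple:
    """Simulate one pass of the roving redundant core.

    bist_passes[i] is the self-test verdict for processing core i.
    Returns (assignment, spares_left), where assignment[i] names the core
    that executes slot i after the pass.
    """
    assignment = [f"core {i}" for i in range(len(bist_passes))]
    next_spare = 0                      # spare currently acting as the substitute
    for slot, passes in enumerate(bist_passes):
        if next_spare == n_spares:
            break                       # assumption: no spare left, so no core can go offline
        # The spare temporarily runs this slot while the core self-tests.
        if not passes:
            assignment[slot] = f"spare {next_spare}"   # permanent replacement
            next_spare += 1             # the next spare takes over the roving role
        # On a pass, the core is switched back and the same spare moves on.
    return assignment, n_spares - next_spare


print(patrol([True, False, True, True], n_spares=2))
```

In this model the system keeps executing all N slots throughout the pass, which is the continuity property the description attributes to the redundant cores.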
The above is only one example of the invention. Clearly, a person skilled in the art who understands the content and principle of the invention may make various modifications and changes in form and detail without departing from its principle and structure, but such modifications and changes based on the inventive concept remain within the scope of protection of the claims.

Claims (5)

1. A high-reliability multicore processing system, comprising N data storage units, N program storage units, N processing cores, and an interconnection network, wherein:
the data storage units are used to receive, store, and output the data to be processed;
the program storage units are used to store and output the binary machine code required for system operation;
the processing cores are used to read the binary machine code output by the program storage units and to process the data output by the data storage units;
the interconnection network is used to provide communication and data exchange between processing cores;
characterized in that: an error-correction encoding unit is provided at the input of each data storage unit and an error-correction decoding unit at its output, to detect and correct bit errors; an error-correction decoding unit is provided at the output of each program storage unit, to detect and correct bit errors in the pre-encoded binary machine code output by the program storage unit; a built-in self-test circuit is provided in each processing core, to detect whether the processing core is faulty; the outputs of each data storage unit and program storage unit are connected through a first switching control circuit (1) to n redundant cores, where n ≤ N, the redundant cores being used to stand in for processing cores under test and to replace faulty cores; and the processing cores and redundant cores are connected to the interconnection network through a second switching control circuit (2).
2. The high-reliability multicore processing system according to claim 1, characterized in that the error-correction encoding unit and the error-correction decoding unit use a Hamming code structure.
3. The high-reliability multicore processing system according to claim 1, characterized in that the built-in self-test circuit comprises a test vector generation unit, a test response analysis unit, a comparison unit, a signature ROM, and a built-in self-test control module, wherein: the test vector generation unit is used to generate the test vectors required by the processing core; the test response analysis unit is used to compress the test results produced by the processing core into an actual signature; the comparison unit is used to compare the actual signature with the ideal signature to determine whether the circuit is faulty; the signature ROM is used to store the ideal signature; and the built-in self-test control module is used to control the operation of the test vector generation unit, the test response analysis unit, the comparison unit, and the signature ROM.
4. The high-reliability multicore processing system according to claim 1, characterized in that the redundant cores have the same structure as the processing cores and are used to stand in for processing cores under test and to replace faulty cores.
5. The high-reliability multicore processing system according to claim 1, characterized in that the first switching control circuit (1) and the second switching control circuit (2) switch between the redundant cores and the processing cores under the control of a switching controller.
CN201510955823.4A 2015-12-18 2015-12-18 High-reliability multicore processing system Pending CN105607974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510955823.4A CN105607974A (en) 2015-12-18 2015-12-18 High-reliability multicore processing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510955823.4A CN105607974A (en) 2015-12-18 2015-12-18 High-reliability multicore processing system

Publications (1)

Publication Number Publication Date
CN105607974A true CN105607974A (en) 2016-05-25

Family

ID=55987927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510955823.4A Pending CN105607974A (en) 2015-12-18 2015-12-18 High-reliability multicore processing system

Country Status (1)

Country Link
CN (1) CN105607974A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1971536A (en) * 2005-11-24 2007-05-30 鸿富锦精密工业(深圳)有限公司 Correcting system and method of basic in-out system
CN101136729A (en) * 2007-09-20 2008-03-05 华为技术有限公司 Method, system and device for implementing high usability
CN101221520A (en) * 2006-11-29 2008-07-16 松下电器产业株式会社 Memory control device, computer system and data reproducing and recording device
CN101258471A (en) * 2005-07-15 2008-09-03 Gsip有限责任公司 Flash error correction
CN102567168A (en) * 2010-12-27 2012-07-11 北京国睿中数科技股份有限公司 BIST (Built-in Self-test) automatic test circuit and test method aiming at PHY (Physical Layer) high-speed interface circuit

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106708655A (en) * 2017-02-16 2017-05-24 深圳前海生生科技有限公司 Memory strengthening method and circuit based on two-dimension error correcting code
CN106708655B (en) * 2017-02-16 2021-07-16 中云信安(深圳)科技有限公司 Memory reinforcing method and circuit based on two-dimensional error correcting code
CN110750391A (en) * 2019-10-16 2020-02-04 中国电子科技集团公司第五十八研究所 TMR monitoring type-based high-performance anti-irradiation reinforcing system and method
CN110750391B (en) * 2019-10-16 2022-08-02 中国电子科技集团公司第五十八研究所 TMR monitoring type-based high-performance anti-irradiation reinforcing system and method
CN112881887A (en) * 2021-01-15 2021-06-01 深圳比特微电子科技有限公司 Chip testing method, computing chip and digital currency mining machine
CN112881887B (en) * 2021-01-15 2023-02-17 深圳比特微电子科技有限公司 Chip testing method and computing chip
CN115563019A (en) * 2022-12-05 2023-01-03 天津哈威克科技有限公司 UVM and C combined verification method and system

Similar Documents

Publication Publication Date Title
US8732532B2 (en) Memory controller and information processing system for failure inspection
CN100555235C (en) The N-modular redundancy voting system
US10078567B2 (en) Implementing fault tolerance in computer system memory
CN105607974A (en) High-reliability multicore processing system
US9952579B2 (en) Control device
CN112667450B (en) Dynamically configurable fault-tolerant system with multi-core processor
CN102804146A (en) System And Method Of Tracking Error Data Within A Storage Device
CN101615147A (en) The skin satellite is based on the fault-tolerance approach of the memory module of FPGA
CN103247345A (en) Quick-flash memory and detection method for failure memory cell of quick-flash memory
CN101236433A (en) Hardware redundancy based highly reliable A/D collection system and failure testing method
WO2014047225A1 (en) Substitute redundant memory
EP2409231A1 (en) Fault tolerance in integrated circuits
CN104881544A (en) Multi-data triple modular redundancy judgment module based on FPGA (Field Programmable Gate Array)
US20120284586A1 (en) Controller of Memory Device and Method for Operating the Same
US9037948B2 (en) Error correction for memory systems
CN104750577A (en) Random multi-bit fault-tolerant method and device for on-chip large-capacity buffer memory
CN112201378A (en) Hot standby switching method, system, terminal and medium based on nuclear power plant DCS platform
JP5174603B2 (en) Memory error correction method, error detection method, and controller using the same
CN110854826B (en) Fault diagnosis and processing method for two-out-of-three protection system of flexible direct converter valve
CN104134464A (en) System and method for testing address line
CN103177771A (en) Repairable multi-layer memory chip stack and method thereof
US8516336B2 (en) Latch arrangement for an electronic digital system, method, data processing program, and computer program product for implementing a latch arrangement
CN102204099A (en) Resetting device
CN105700996A (en) Log output method and apparatus
Matsumoto et al. Stateful TMR for transient faults

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160525