CN102750194A - Large-scale integrated circuit level error recording and responding method - Google Patents

Info

Publication number
CN102750194A
Authority
CN
China
Prior art keywords
error
register
mistake
level
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012102087119A
Other languages
Chinese (zh)
Inventor
王恩东
胡雷钧
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN2012102087119A priority Critical patent/CN102750194A/en
Publication of CN102750194A publication Critical patent/CN102750194A/en
Pending legal-status Critical Current

Abstract

The invention discloses a hierarchical error recording and response method for large-scale integrated circuits, comprising the following steps: a module-level register set in each component module detects, collects, and records all errors generated inside that module and maps them to an error level register according to error level, thereby grading all of the module's errors; a chip global-level register set collects the error information reported by each module and maps it to different system events; and finally the chip global-level register set maps the system events, triggering the system to respond to the errors. The hierarchical error handling mechanism provided by the invention collects, classifies, and reports errors through module layering, error filtering, and error grading, which greatly improves the efficiency of system error handling and avoids the resource waste caused by chip-wide error collection and processing.

Description

A hierarchical error recording and response method for large-scale integrated circuits
Technical field
The invention belongs to the technical field of integrated circuit design and relates to a hierarchical error recording and response method for large-scale integrated circuits.
Background
With the rapid development of integrated circuit technology, large-scale integrated circuit design has increasingly become a defining feature of the field. Chips are growing in physical size and integrating ever more transistors, and a functionally complex chip integrates dozens of functional components, which poses a challenge for the design of the chip's internal error handling mechanism. Computer systems built on large-scale integrated circuits face the same problem: integrating dozens of functional components on a chip makes complex computer system functions achievable, but it also makes the on-chip error handling mechanism harder to design. Because the functions of the computer system are complex, the mechanisms for error detection, recording, correction, and reporting in each functional component are also extremely complex to implement. A well-designed error handling mechanism lets the chip carry out its functions efficiently, ensures that system errors are corrected and repaired, and avoids large-scale error information collection and processing, thereby preserving system performance and execution efficiency.
Therefore, in view of the shortcomings of the prior art in the design of on-chip error handling mechanisms, it is necessary to provide a hierarchical error recording and response method for large-scale integrated circuits that improves the processing efficiency of graded errors such as severe errors, improves the handling of graded errors such as minor errors, and safeguards the efficient operation of the system.
Summary of the invention
To solve the above problems, the object of the present invention is to provide a hierarchical error recording and response method for large-scale integrated circuits that improves the processing efficiency of graded errors such as severe errors, improves the handling of graded errors such as minor errors, and safeguards the efficient operation of the system.
To achieve the above object, the technical solution of the present invention is as follows:
A hierarchical error recording and response method for large-scale integrated circuits comprises the following steps:
The component-module-level register set detects, collects, and records all errors occurring inside its module, and maps all of the module's errors to the error level register according to error level, thereby grading all of the module's errors;
The chip global-level register set collects the error information reported by each module and maps it to different system events;
The chip global-level register set finally maps the system events, triggering the system to respond to the errors.
Further, the errors collected and recorded by the chip global-level registers are all those errors of each module that remain after minor errors have been masked and that may cause a system event.
Further, errors can be graded into the following three levels: correctable errors, recoverable errors, and uncorrectable errors.
Further, the component-module-level registers are local registers. They include a local error log register, which records information about every error occurring inside the module; a local error status register, which covers all errors that can occur inside the module and classifies the error currently occurring; a local error control register, which controls whether the error currently occurring is reported; and an error level register, which maps errors to the three error severity levels.
Further, the chip global-level registers form a global register set, which collects the higher-severity error information reported by each module. The global register set includes a global status register, which records and classifies the error information reported by each module, and a global error severity level register set, which classifies all reported errors by severity level and gives priority to higher-severity errors, thereby serializing error handling.
The hierarchical error recording and response method of the present invention collects and classifies errors effectively through a multi-level error recording and reporting mechanism covering module-level and chip-level errors, masks minor errors, and achieves fast error handling through efficient mapping. This hierarchical error handling mechanism collects, classifies, and reports errors through module layering, error filtering, and error grading, which greatly improves the efficiency of system error handling and avoids the resource waste caused by chip-wide error collection and processing. It therefore has broad application prospects and high technical value.
Description of the drawings
Fig. 1 is a hierarchical structure diagram of the error handling mechanism of the present invention;
Fig. 2 is a flow chart of one embodiment of the present invention;
Fig. 3 is a diagram of the composition of the local registers of the present invention;
Fig. 4 is a diagram of the composition of the global register set of the present invention.
Detailed description of the embodiments
To make the object, technical solution, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
The hierarchical error recording and response method of the present invention is motivated by the complexity of designing error handling mechanisms for large-scale integrated circuits. Errors are detected, collected, and recorded locally within each module, and errors of a severe level are aggregated according to the error grading mechanism to generate the system response, which improves the system's error handling efficiency. The collected errors of each module are mapped to the error severity level registers by severity grading, which realizes severity classification of errors; different error reporting and response mechanisms then generate system events specific to each error level. This greatly improves the processing efficiency of graded errors such as severe errors, improves the handling of graded errors such as minor errors, and safeguards the efficient operation of the system.
Referring to Fig. 1 and Fig. 2, the hierarchical error recording and response method of the present invention mainly comprises the following steps:
The component-module-level register set detects, collects, and records all errors occurring inside its module, and maps all of the module's errors to the error level register according to error level, thereby grading all of the module's errors;
The chip global-level register set collects the error information reported by each module and maps it to different system events;
The global register set finally maps the system events, triggering the system to respond to the errors.
Here, a register set designed inside each component module collects, records, classifies, and grades the errors occurring inside that module, so that when an error occurs it can be judged quickly within a small scope. A register set designed at the chip global level likewise collects, records, classifies, and grades the errors of each module. The errors it collects and records are all those errors of each module that remain after minor errors have been masked and that may cause a system event, and they are finally mapped to system events according to error level.
In this embodiment, errors can be graded into the following three levels: correctable errors (Correctable Error), recoverable errors (Recoverable Error), and uncorrectable errors (Fatal Error). These correspond to different error severities: a Correctable Error is an error that hardware can correct without software involvement; a Recoverable Error is an error that software can repair; and a Fatal Error is an error that neither software nor hardware can correct.
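The three levels can be sketched as an ordered enumeration. This is a minimal illustration, not the patent's encoding: the numeric values, the `Severity` type, and the `classify` helper are assumptions introduced here; only the three level names and their definitions come from the text.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Three error levels named in the text; the numeric order is an assumption."""
    CORRECTABLE = 0   # hardware corrects it without software involvement
    RECOVERABLE = 1   # software can repair it
    FATAL = 2         # neither hardware nor software can correct it


def classify(hw_can_correct: bool, sw_can_repair: bool) -> Severity:
    """Map 'who can fix the error' to a severity level (hypothetical helper)."""
    if hw_can_correct:
        return Severity.CORRECTABLE
    if sw_can_repair:
        return Severity.RECOVERABLE
    return Severity.FATAL


if __name__ == "__main__":
    for flags in [(True, False), (False, True), (False, False)]:
        print(flags, classify(*flags).name)
```

Using an integer-backed enumeration means levels can be compared directly, which is convenient when higher-severity errors must be handled first.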
Fig. 2 shows the detailed flow of one embodiment of the present invention, which comprises the following steps:
Step 1: an error occurs inside a local module;
Step 2: the local registers record the information related to the error;
Step 3: the corresponding error state is looked up in the local error status register set, and the corresponding bit is set;
Step 4: the severity level of the error is looked up and classified in the local error severity level register set; error types of lower severity are masked, and those of higher severity are reported to the global level;
Step 5: the global register set collects the higher-severity error information reported by each module;
Step 6: the global status register records and classifies the error information reported by each module;
Step 7: the global error severity level register set classifies all reported errors by severity level and gives priority to higher-severity errors, thereby serializing error handling;
Step 8: the serialized errors trigger the corresponding system events one by one.
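The eight steps above can be sketched in software as follows. This is a simplified model under stated assumptions, not the hardware design: `LocalModule`, `handle_globally`, the numeric severity values, the choice to mask only correctable errors, and the event name `software_interrupt` are all hypothetical. A system reset is the only event the text itself gives as an example.

```python
from dataclasses import dataclass, field

# Severity levels from the text; the numeric ordering is an assumption.
CORRECTABLE, RECOVERABLE, FATAL = 0, 1, 2


@dataclass
class LocalModule:
    name: str
    grade: dict                                # error type -> severity (step 4 lookup)
    log: list = field(default_factory=list)    # step 2: recorded error information
    status: set = field(default_factory=set)   # step 3: error types currently set

    def on_error(self, err_type, info=""):
        """Steps 1-4: record, set status, grade, then mask or report."""
        self.log.append((err_type, info))
        self.status.add(err_type)
        severity = self.grade[err_type]
        if severity == CORRECTABLE:
            return None                        # masked: the error stays local
        return (severity, self.name, err_type) # reported to the global level


def handle_globally(reports, event_for):
    """Steps 5-8: collect reports, order by severity, trigger events in turn."""
    collected = [r for r in reports if r is not None]     # steps 5-6
    collected.sort(key=lambda r: r[0], reverse=True)      # step 7 (stable sort)
    return [event_for[sev] for sev, _, _ in collected]    # step 8


if __name__ == "__main__":
    link = LocalModule("link", {"crc": CORRECTABLE, "link_down": FATAL})
    mem = LocalModule("mem", {"timeout": RECOVERABLE})
    reports = [link.on_error("crc"), mem.on_error("timeout"),
               link.on_error("link_down")]
    events = {RECOVERABLE: "software_interrupt", FATAL: "system_reset"}
    print(handle_globally(reports, events))
```

The masked error is still present in the module's log and status, so it remains available for later status checks even though it never reaches the global level.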
Referring to Fig. 3 and Fig. 4, this embodiment uses a hierarchical error recording and report-response mechanism divided into two levels: the component module level and the chip global level. The component-module-level register set detects, collects, and records all errors occurring inside its module and maps all of the module's errors to the error level register according to error level, thereby grading all of the module's errors; masking errors of very low level improves the execution efficiency of error handling. After the local register set has mapped the severity level of an error, it decides whether the error is reported. The local registers mainly include the local error log register (Local Error Log Register), which records information about every error occurring inside the module; the local error status register (Local Error Status Register), which covers all errors that can occur inside the module and classifies the error currently occurring; the local error control register (Local Error Control Register), which controls whether the error currently occurring is reported; and the error level register (Error Grade Register), which maps errors to the three error severity levels.
The chip global-level register set collects the error information reported by each module and maps it to different system events. Its registers likewise collect, record, classify, and grade errors; among them, the System Event Register records the severity of the errors that have occurred and issues one or more system events according to error type and severity.
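A bit-level view of the four local registers might look like the sketch below. The register names follow the text, but the layout is an assumption made for illustration: one status bit and one control bit per error type, two grade bits per error type, and the `LocalRegisterSet` class itself are all hypothetical. The polarity of the control bit (1 means masked) follows the CRC example in the description.

```python
class LocalRegisterSet:
    """Toy model of one module's local registers as plain integers."""

    def __init__(self, error_names, grades, masked):
        # Assumed layout: error type i owns status/control bit i and grade bits 2i..2i+1.
        self.bit = {name: i for i, name in enumerate(error_names)}
        self.log = []      # Local Error Log Register: details of each error
        self.status = 0    # Local Error Status Register: one bit per error type
        self.control = 0   # Local Error Control Register: 1 = masked (not reported)
        self.grade = 0     # Error Grade Register: 2 bits of severity per error type
        for name, sev in grades.items():
            self.grade |= (sev & 0b11) << (2 * self.bit[name])
        for name in masked:
            self.control |= 1 << self.bit[name]

    def severity(self, name):
        """Read the 2-bit severity field for this error type."""
        return (self.grade >> (2 * self.bit[name])) & 0b11

    def raise_error(self, name, info=""):
        """Log the error and set its status bit; return severity if reported, else None."""
        i = self.bit[name]
        self.log.append((name, info))
        self.status |= 1 << i
        if (self.control >> i) & 1:
            return None
        return self.severity(name)


if __name__ == "__main__":
    regs = LocalRegisterSet(["crc", "link_down"],
                            {"crc": 0, "link_down": 2}, masked=["crc"])
    print(regs.raise_error("crc"), regs.raise_error("link_down"))
    print(bin(regs.status), bin(regs.control), bin(regs.grade))
```

Keeping the status bit set even for a masked error mirrors the description's point that a masked error is still recorded for status checking.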
In this embodiment, when an error occurs in an internal component module of the large-scale integrated circuit, the hierarchical design first detects, records, and classifies the error information within a small scope and performs error filtering and severity classification. Error information that may trigger a system event is then handed to the global error registers, which likewise record, classify, and grade the error by severity and finally trigger one or more system events.
During operation of a computer system in which large-scale integrated circuits interconnect multiple processors, the complexity of the multiprocessor design leads the integrated circuit to implement dozens of functional components to meet the system's functional requirements. Errors of various types and of several error levels may occur in each functional component, for example Correctable Error, Recoverable Error, and Fatal Error. Suppose Module1 is the interconnect interface component of the chip. When a link CRC check failure occurs in this component, the Local Error Log Register records the information related to the error. The Local Error Status Register covers all error types of the interconnect interface component; the bit corresponding to the CRC error is now set to "1", indicating that a CRC error has occurred. Because a CRC error can be corrected by the retransmission mechanism designed into the hardware, it is a Correctable Error. The analysis logic therefore sets the control bit of the Local Error Control Register to "1", indicating that the error level is very low and that the error can be corrected by hardware on its own without software involvement. Setting the Local Error Control Register control bit to "1" masks the error so that it does not consume further resources, while the error remains recorded for status checking.
When a link connection failure occurs in this component, the Local Error Log Register likewise records the information related to the error. The Local Error Status Register covers all error types of the interconnect interface component; the bit corresponding to the link connection failure is now set to "1", indicating that a link connection failure has occurred. Because a link connection failure can cause data transfers to fail and bring the system down, this fault is a high-severity Fatal Error and is handled with priority. After the global error register (Global Error Register) has collected, classified, and ordered the errors, the high-severity error bit of the global error control register (Global Error Control Register) is set to "1", indicating that a high-severity Fatal Error has occurred in the system, and the global error status register (Global Error Status Register) records that the error type is a link connection failure. This triggers the corresponding system event, for example a system reset. In addition, when several errors occur in the system in parallel, they are serialized according to the severity level of each error, and errors of higher severity are handled first, which safeguards system reliability and execution efficiency.
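Serializing errors that arrive in parallel can be illustrated with a priority queue. This is a software sketch under stated assumptions, not the patent's register logic: `GlobalErrorQueue`, the numeric severities, the module and error names, and the `software_interrupt` event are hypothetical, and only the system reset response to a fatal error is taken from the text.

```python
import heapq
from itertools import count

FATAL, RECOVERABLE = 2, 1   # numeric severities are an assumption

# Hypothetical severity-to-event mapping; the text names only a system reset
# as an example response to a fatal error.
EVENT_FOR = {FATAL: "system_reset", RECOVERABLE: "software_interrupt"}


class GlobalErrorQueue:
    """Serialize errors reported in parallel: highest severity first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()   # tie-breaker keeps arrival order within a level

    def report(self, module, err_type, severity):
        # heapq is a min-heap, so negate severity to pop the most severe first.
        heapq.heappush(self._heap,
                       (-severity, next(self._arrival), module, err_type))

    def drain(self):
        """Yield (module, error, event) one at a time in priority order."""
        while self._heap:
            neg_sev, _, module, err_type = heapq.heappop(self._heap)
            yield module, err_type, EVENT_FOR[-neg_sev]


if __name__ == "__main__":
    q = GlobalErrorQueue()
    q.report("mem", "timeout", RECOVERABLE)
    q.report("link", "link_down", FATAL)
    q.report("io", "parity", RECOVERABLE)
    for item in q.drain():
        print(item)
```

The arrival counter is a design choice made here so that errors of equal severity are handled in the order they were reported; the description does not specify how ties are broken.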
The present invention collects and classifies errors effectively through a multi-level error recording and reporting mechanism covering module-level and chip-level errors, masks minor errors, and achieves fast error handling through efficient mapping. The local component module register set is a register set designed inside each component module that collects, records, classifies, and grades the errors occurring inside that module, so that when an error occurs it can be judged quickly within a small scope. The chip global register set is a register set designed at the chip global level that collects, records, classifies, and grades the errors of each module; the errors it collects and records are all those errors of each module that remain after minor errors have been masked and that may cause a system event, and they are finally mapped to system events according to error level. The error grading and filtering mechanism classifies errors into different levels according to the consequences they can produce and filters out errors of minor level to improve the system's error handling efficiency. This hierarchical error handling mechanism collects, classifies, and reports errors through module layering, error filtering, and error grading, which greatly improves the efficiency of system error handling and avoids the resource waste caused by chip-wide error collection and processing. It therefore has broad application prospects and high technical value.
The above is only a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (5)

1. A hierarchical error recording and response method for large-scale integrated circuits, characterized in that it comprises the following steps:
The component-module-level register set detects, collects, and records all errors occurring inside its module, and maps all of the module's errors to the error level register according to error level, thereby grading all of the module's errors;
The chip global-level register set collects the error information reported by each module and maps it to different system events;
The chip global-level register set finally maps the system events, triggering the system to respond to the errors.
2. The hierarchical error recording and response method for large-scale integrated circuits according to claim 1, characterized in that: the errors collected and recorded by the chip global-level registers are all those errors of each module that remain after minor errors have been masked and that may cause a system event.
3. The hierarchical error recording and response method for large-scale integrated circuits according to claim 2, characterized in that: errors can be graded into the following three levels: correctable errors, recoverable errors, and uncorrectable errors.
4. The hierarchical error recording and response method for large-scale integrated circuits according to claim 3, characterized in that: the component-module-level registers are local registers, which include a local error log register that records information about every error occurring inside the module; a local error status register that covers all errors that can occur inside the module and classifies the error currently occurring; a local error control register that controls whether the error currently occurring is reported; and an error level register that maps errors to the three error severity levels.
5. The hierarchical error recording and response method for large-scale integrated circuits according to claim 4, characterized in that: the chip global-level registers form a global register set, which collects the higher-severity error information reported by each module; the global register set includes a global status register, which records and classifies the error information reported by each module, and a global error severity level register set, which classifies all reported errors by severity level and gives priority to higher-severity errors, thereby serializing error handling.
CN2012102087119A 2012-06-25 2012-06-25 Large-scale integrated circuit level error recording and responding method Pending CN102750194A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012102087119A CN102750194A (en) 2012-06-25 2012-06-25 Large-scale integrated circuit level error recording and responding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012102087119A CN102750194A (en) 2012-06-25 2012-06-25 Large-scale integrated circuit level error recording and responding method

Publications (1)

Publication Number Publication Date
CN102750194A true CN102750194A (en) 2012-10-24

Family

ID=47030411

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012102087119A Pending CN102750194A (en) 2012-06-25 2012-06-25 Large-scale integrated circuit level error recording and responding method

Country Status (1)

Country Link
CN (1) CN102750194A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080316013A1 (en) * 2004-07-08 2008-12-25 Andrew Corporation Supervising Arrangement
CN1740986A (en) * 2004-08-24 2006-03-01 华为技术有限公司 Method for warning correlation mask
US20060150009A1 (en) * 2004-12-21 2006-07-06 Nec Corporation Computer system and method for dealing with errors
CN101145681A (en) * 2007-02-09 2008-03-19 湖南科技大学 Multi-function integrated relay protection device for mine

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104407952A (en) * 2014-11-12 2015-03-11 浪潮(北京)电子信息产业有限公司 Method and system for debugging through multi-CPU (central processing unit) node controller chip
CN110413490A (en) * 2019-08-01 2019-11-05 北京百度网讯科技有限公司 Determine method, the error message code classification method and device of type of error message
CN110413490B (en) * 2019-08-01 2023-07-14 阿波罗智能技术(北京)有限公司 Method for determining error information type, error information code classification method and device
CN111682966A (en) * 2020-05-26 2020-09-18 中国人民解放军国防科技大学 Network communication device with fault active reporting function, system and method thereof
CN111682966B (en) * 2020-05-26 2022-08-19 中国人民解放军国防科技大学 Network communication device with fault active reporting function, system and method thereof

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121024