CN102681930A - Chip-level error recording method - Google Patents
- Publication number: CN102681930A
- Authority: CN (China)
- Prior art keywords: error, register, local, global
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Landscapes: Debugging And Monitoring (AREA)
Abstract
The invention provides a chip-level error recording method. A chip designed with this method records errors in a hierarchically organized set of error registers and reports them to the external system, for example by sending an interrupt to other components in the system or by asserting an error pin. When recording, errors of different severity levels are distinguished and recorded, and recording of a given error type can be enabled or masked as required. The local error register sets and the global error register set that record the errors are organized hierarchically: each local error register set records the errors of one specific component inside the chip, and the global error register set summarizes the error records of all local error register sets and reports them to the external system.
Description
Technical field
The present invention relates to the field of computer chip design, and in particular to a chip-level error recording method.
Background
With the rapid development of applications such as scientific computing, commercial services, and e-government, users place ever higher demands on the performance, capacity, density, availability, and security of ICT infrastructure such as servers, storage, and network equipment. Chips are the basic building blocks of this infrastructure, so their importance is self-evident. During chip design, development, and tape-out, chip yield is the main factor affecting the development cycle, design cost, and production cost. Repeated tape-outs not only lengthen the design, verification, and test cycles; the high production cost of each tape-out also adds to the cost of the whole development and production process. A prototype system should therefore be built quickly at an early stage of chip logic design to carry out chip-level and system-level verification and testing, so as to secure first-pass tape-out success. This calls for an efficient and flexible chip-level error recording method that helps locate chip-level problems during test and verification.
Summary of the invention
The object of the present invention is to provide a chip-level error recording method.
This object is achieved as follows. The local error register sets and the global error register set record errors in a hierarchical structure: a local error register set records the errors of a specific component inside the chip, and the global error register set summarizes the error records of all local error register sets and reports them to the external system;
Each local error register set comprises 1) a local error status register, 2) a local error control register, 3) a local error severity register, 4) a local first-error log register, and 5) a local subsequent-error log register; wherein:
1) the local error status register identifies each kind of error that occurs in the corresponding component; each kind of error is represented by 1 bit, and when an error of a given type occurs, the corresponding bit in the register is set to 1;
2) the local error control register controls whether each type of error produced by the corresponding component is recorded; its bits correspond one-to-one to the bits of the local error status register; if a control bit in the local error control register is set, the corresponding detected error is masked and is neither recorded nor processed;
3) the local error severity register provides a mechanism for mapping each error to one of several error severity levels; when an error occurs, it is reported according to the error type to severity mapping defined in this register. Suppose the following 3 error severity levels are to be supported: (1) correctable errors, from which the system recovers without loss of information and without software involvement, such as a link CRC error that is corrected by link-layer retransmission; (2) recoverable errors, which cannot be corrected by hardware mechanisms and must be recovered by upper-layer software, and which may make a specific transaction unreliable while the system can still run normally, such as an ECC-uncorrectable error that affects only the data portion of a transaction; (3) fatal errors, which cannot be corrected or recovered by either hardware or software and may require a system reset to return to a reliable state, such as a cache multi-bit tag error or a permanent PCI-E link failure;
The severity level of each error type is then represented with two bits: 00b for a correctable error, 01b for a recoverable error, 10b for a fatal error, and 11b reserved;
4) the local first-error log register records information captured when a given error of the corresponding component is detected for the first time, including the message content and the error address;
5) the local subsequent-error log register records information about occurrences of a given error of the corresponding component after the first one, including the error count;
The global error register set comprises 1) a global error status register, 2) a global error control register, 3) a global first-error log register, 4) a global subsequent-error log register, 5) a system event status register, and 6) a system event control register, wherein:
1) the global error status register identifies whether an error has occurred in each component of the chip; the error state of each component is represented by 1 bit, and when a component produces an error, the corresponding bit in the register is set to 1;
2) the global error control register controls whether the errors produced by each component are recorded; its bits correspond one-to-one to the bits of the global error status register; if a control bit in the global error control register is set, detected errors of the corresponding component are masked and are neither recorded nor processed;
3) and 4) the global first-error log register and the global subsequent-error log register record on-site information captured when each component produces its first error and when it produces subsequent errors, respectively;
5) the system event status register records the severity level of the errors that occur in each component of the chip;
6) the system event control register defines the mapping from severity level to reporting method; for errors of each severity level, the way they are reported to other components of the system can be configured, including sending an interrupt and asserting an error pin;
The specific steps are as follows:
1) a component produces an error of a certain type;
2) check whether this type of error is masked in the "local error control register"; if it is masked, do not record the error and end; otherwise, set the corresponding bit in the "local error status register";
3) determine whether this is the first occurrence of this type of error; if so, update the "local first-error log register"; otherwise, update the "local subsequent-error log register";
4) report the error to the global level according to the configuration in the "local error severity register";
5) check whether errors from this component are masked in the "global error control register"; if they are masked, do not record the error and end; otherwise, set the corresponding bit in the "global error status register";
6) determine whether this is the first error from this component; if so, update the "global first-error log register"; otherwise, update the "global subsequent-error log register";
7) update the corresponding error severity level in the "system event status register";
8) according to the severity to reporting-method mapping configured in the "system event control register", report the error to the external system by sending an interrupt, asserting an error pin, or similar means. This completes the error recording and reporting process.
The beneficial effects of the invention are as follows: this hierarchically organized and flexibly configurable chip-level error recording method remedies the low efficiency and poor flexibility of traditional chip error recording methods. It can shorten the chip design, verification, and test cycles and can effectively secure first-pass tape-out success, and therefore has broad development prospects and high technical value.
Description of drawings
Figure 1 is a schematic diagram of the hierarchically organized error register sets;
Figure 2 is a flow chart of chip-level error recording.
Embodiment
The method is described below with reference to the accompanying drawings, using a concrete example of the process that implements the chip-level error recording method of the invention.
The present invention proposes a chip-level error recording method. A chip designed with this method records errors in a hierarchically organized set of error registers and reports them to the external system, for example by sending an interrupt to other components in the system or by asserting an error pin. When recording errors, it distinguishes and records errors of different severity levels, and recording of a given error type can be enabled or masked as needed.
The two kinds of error register sets used to record errors, local and global, are organized hierarchically: a local error register set records the errors of a specific component inside the chip, and the global error register set summarizes the error records of all local error register sets and reports them to the external system, as shown in Figure 1:
Each local error register set comprises a local error status register, a local error control register, a local error severity register, a local first-error log register, and a local subsequent-error log register. Wherein:
The local error status register identifies each kind of error that occurs in the corresponding component, with 1 bit per kind of error. When an error of a given type occurs, the corresponding bit in the register is set to 1.
The local error control register controls whether each type of error produced by the corresponding component is recorded. Its bits correspond one-to-one to the bits of the local error status register. If a control bit in the local error control register is set, the corresponding detected error is masked and is neither recorded nor processed.
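The interplay of the status and control registers can be illustrated with a minimal sketch. It assumes registers are modelled as plain integers and that error type i occupies bit i; the function name `record_local_error` is hypothetical and does not come from the patent.

```python
from typing import Tuple


def record_local_error(status: int, control: int, error_type: int) -> Tuple[int, bool]:
    """Return the updated status register and whether the error was recorded.

    Assumption: error type i maps to bit i in both registers, and a set
    control bit means "mask this error type".
    """
    bit = 1 << error_type
    if control & bit:
        # Masked: the status register is left untouched.
        return status, False
    # Not masked: set the matching status bit.
    return status | bit, True


# Error type 1 is masked, error type 2 is not.
control = 0b010
print(record_local_error(0, control, 1))
print(record_local_error(0, control, 2))
```

Because the two registers share one bit layout, a single shifted mask serves both the masking test and the status update.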
The local error severity register provides a mechanism for mapping each error to one of several error severity levels; when an error occurs, it is reported according to the error type to severity mapping defined in this register. Suppose the following 3 error severity levels are to be supported: correctable errors (the system recovers without loss of information and without software involvement, for example a link CRC error that is corrected by link-layer retransmission), recoverable errors (errors that cannot be corrected by hardware mechanisms and must be recovered by upper-layer software; they may make a specific transaction unreliable, but the system can still run normally, for example an ECC-uncorrectable error that affects only the data portion of a transaction), and fatal errors (errors that cannot be corrected or recovered by either hardware or software and may require a system reset to return to a reliable state, for example a cache multi-bit tag error or a permanent PCI-E link failure). The severity level of each error type is then represented with two bits, for example 00b for a correctable error, 01b for a recoverable error, 10b for a fatal error, and 11b reserved.
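The two-bit encoding can be sketched as follows. The encoding values come from the text above; the packing (error type i in bits 2i and 2i+1 of one integer) and the helper names `set_severity` and `get_severity` are assumptions made for illustration only.

```python
from enum import IntEnum


class Severity(IntEnum):
    CORRECTABLE = 0b00
    RECOVERABLE = 0b01
    FATAL = 0b10
    # 0b11 is reserved and deliberately has no member.


def set_severity(reg: int, error_type: int, sev: Severity) -> int:
    """Write the 2-bit severity field of one error type (assumed packing)."""
    shift = 2 * error_type
    return (reg & ~(0b11 << shift)) | (int(sev) << shift)


def get_severity(reg: int, error_type: int) -> Severity:
    """Read the 2-bit severity field; a reserved 0b11 value raises ValueError."""
    return Severity((reg >> (2 * error_type)) & 0b11)


reg = 0
reg = set_severity(reg, 1, Severity.RECOVERABLE)  # e.g. an ECC-uncorrectable data error
reg = set_severity(reg, 2, Severity.FATAL)        # e.g. a permanent link failure
print(bin(reg), get_severity(reg, 2).name)
```

Leaving 11b without an enum member means a misconfigured field is rejected on read rather than silently treated as a valid level.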
The local first-error log register records information captured when a given error of the corresponding component is detected for the first time, such as the message content and the error address.
The local subsequent-error log register records information about occurrences of a given error of the corresponding component after the first one, such as the error count.
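The split between the two log registers can be sketched for a single error type as below. The field names (`message`, `address`, `subsequent_count`) and the class names are hypothetical; the text only says that the first-error log holds items such as message content and error address, and that the subsequent-error log holds items such as an error count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FirstErrorLog:
    message: int   # assumed: raw message content captured at first detection
    address: int   # assumed: address associated with the error


@dataclass
class LocalErrorLogs:
    first: Optional[FirstErrorLog] = None
    subsequent_count: int = 0

    def log(self, message: int, address: int) -> None:
        if self.first is None:
            # First occurrence: capture the details once and keep them.
            self.first = FirstErrorLog(message, address)
        else:
            # Later occurrences: only count them, preserving the first record.
            self.subsequent_count += 1


logs = LocalErrorLogs()
logs.log(0xAB, 0x1000)
logs.log(0xCD, 0x2000)
print(logs.first, logs.subsequent_count)
```

Keeping the first record intact preserves the context of the original failure even if the same error then repeats many times.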
The global error register set comprises a global error status register, a global error control register, a global first-error log register, a global subsequent-error log register, a system event status register, and a system event control register. Wherein:
The global error status register identifies whether an error has occurred in each component of the chip, with 1 bit per component. When a component produces an error, the corresponding bit in the register is set to 1.
The global error control register controls whether the errors produced by each component are recorded. Its bits correspond one-to-one to the bits of the global error status register. If a control bit in the global error control register is set, detected errors of the corresponding component are masked and are neither recorded nor processed.
The global first-error log register and the global subsequent-error log register record on-site information captured when each component produces its first error and when it produces subsequent errors, respectively.
The system event status register records the severity level of the errors that occur in each component of the chip.
The system event control register defines the mapping from severity level to reporting method: for errors of each severity level, the way they are reported to other components of the system can be configured, for example sending an interrupt or asserting an error pin.
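One way to picture the severity to reporting-method mapping is a control register with a small field per severity level. The layout below (two bits per severity, bit 0 for interrupt and bit 1 for error pin) is an assumption for illustration; the patent does not specify the field layout, and the names `Report` and `report_methods` are hypothetical.

```python
from enum import IntEnum, IntFlag


class Severity(IntEnum):
    CORRECTABLE = 0b00
    RECOVERABLE = 0b01
    FATAL = 0b10


class Report(IntFlag):
    NONE = 0
    INTERRUPT = 1   # send an interrupt to another system component
    ERROR_PIN = 2   # assert the chip's error pin


def report_methods(control_reg: int, sev: Severity) -> Report:
    """Look up the reporting methods configured for one severity level."""
    return Report((control_reg >> (2 * int(sev))) & 0b11)


# Assumed configuration: correctable -> interrupt, recoverable -> error pin,
# fatal -> both.
control_reg = (
    (int(Report.INTERRUPT) << 0)
    | (int(Report.ERROR_PIN) << 2)
    | (int(Report.INTERRUPT | Report.ERROR_PIN) << 4)
)
print(report_methods(control_reg, Severity.FATAL))
```

Because the mapping lives in a register rather than in fixed logic, the reporting policy can be changed during bring-up without altering the error detection itself.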
As stated in the summary of the invention, the error recording process of the invention is shown in detail in Figure 2. The specific steps are as follows:
1) A component produces an error of a certain type;
2) Check whether this type of error is masked in the "local error control register"; if it is masked, do not record the error and end; otherwise, set the corresponding bit in the "local error status register";
3) Determine whether this is the first occurrence of this type of error; if so, update the "local first-error log register"; otherwise, update the "local subsequent-error log register";
4) Report the error to the global level according to the configuration in the "local error severity register";
5) Check whether errors from this component are masked in the "global error control register"; if they are masked, do not record the error and end; otherwise, set the corresponding bit in the "global error status register";
6) Determine whether this is the first error from this component; if so, update the "global first-error log register"; otherwise, update the "global subsequent-error log register";
7) Update the corresponding error severity level in the "system event status register";
8) According to the severity to reporting-method mapping configured in the "system event control register", report the error to the external system by sending an interrupt, asserting an error pin, or similar means. This completes the error recording and reporting process.
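The eight steps can be sketched end to end as a small software model. This is an illustration under stated assumptions, not the hardware implementation: registers are modelled as integers and dictionaries, "first occurrence" is detected from the state already recorded, the system event status keeps only the latest severity per component, and all names (`LocalRegs`, `GlobalRegs`, `handle_error`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

CORRECTABLE, RECOVERABLE, FATAL = 0b00, 0b01, 0b10


@dataclass
class LocalRegs:
    status: int = 0
    control: int = 0    # set bit = mask that error type
    severity: int = 0   # assumed packing: 2 bits per error type
    first_log: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    subsequent_count: Dict[int, int] = field(default_factory=dict)


@dataclass
class GlobalRegs:
    status: int = 0
    control: int = 0    # set bit = mask that component
    first_log: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    subsequent_count: Dict[int, int] = field(default_factory=dict)
    event_status: Dict[int, int] = field(default_factory=dict)
    event_control: Dict[int, List[str]] = field(default_factory=dict)


def handle_error(local: LocalRegs, glob: GlobalRegs, component: int,
                 err_type: int, message: int, address: int) -> List[str]:
    """Run steps 2 to 8 for one error; return the reporting actions taken."""
    bit = 1 << err_type
    if local.control & bit:                      # step 2: masked locally
        return []
    local.status |= bit
    if err_type not in local.first_log:          # step 3: first or subsequent
        local.first_log[err_type] = (message, address)
    else:
        local.subsequent_count[err_type] = local.subsequent_count.get(err_type, 0) + 1
    sev = (local.severity >> (2 * err_type)) & 0b11   # step 4: look up severity
    cbit = 1 << component
    if glob.control & cbit:                      # step 5: masked globally
        return []
    first_for_component = not (glob.status & cbit)
    glob.status |= cbit
    if first_for_component:                      # step 6: first or subsequent
        glob.first_log[component] = (err_type, sev)
    else:
        glob.subsequent_count[component] = glob.subsequent_count.get(component, 0) + 1
    glob.event_status[component] = sev           # step 7: record severity
    return list(glob.event_control.get(sev, []))  # step 8: choose report methods


local = LocalRegs(control=0b100, severity=FATAL << 2)  # type 2 masked, type 1 fatal
glob = GlobalRegs(event_control={RECOVERABLE: ["interrupt"],
                                 FATAL: ["interrupt", "error_pin"]})
print(handle_error(local, glob, component=3, err_type=1, message=0xAA, address=0x1000))
```

In this model a locally masked error never reaches the global level, which mirrors the early exit in step 2.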
Everything other than the technical features described in the specification is known to those skilled in the art.
Claims (1)
1. A chip-level error recording method, characterized in that local error register sets and a global error register set record errors in a hierarchical structure: a local error register set records the errors of a specific component inside the chip, and the global error register set summarizes the error records of all local error register sets and reports them to the external system;
Each local error register set comprises 1) a local error status register, 2) a local error control register, 3) a local error severity register, 4) a local first-error log register, and 5) a local subsequent-error log register; wherein:
1) the local error status register identifies each kind of error that occurs in the corresponding component; each kind of error is represented by 1 bit, and when an error of a given type occurs, the corresponding bit in the register is set to 1;
2) the local error control register controls whether each type of error produced by the corresponding component is recorded; its bits correspond one-to-one to the bits of the local error status register; if a control bit in the local error control register is set, the corresponding detected error is masked and is neither recorded nor processed;
3) the local error severity register provides a mechanism for mapping each error to one of several error severity levels; when an error occurs, it is reported according to the error type to severity mapping defined in this register. Suppose the following 3 error severity levels are to be supported: (1) correctable errors, from which the system recovers without loss of information and without software involvement, including a link CRC error that is corrected by link-layer retransmission; (2) recoverable errors, which cannot be corrected by hardware mechanisms and must be recovered by upper-layer software, and which may make a specific transaction unreliable while the system can still run normally, including an ECC-uncorrectable error that affects only the data portion of a transaction; (3) fatal errors, which cannot be corrected or recovered by either hardware or software and may require a system reset to return to a reliable state, including a cache multi-bit tag error and a permanent PCI-E link failure;
The severity level of each error type is represented with two bits: 00b for a correctable error, 01b for a recoverable error, 10b for a fatal error, and 11b reserved;
4) the local first-error log register records information captured when a given error of the corresponding component is detected for the first time, including the message content and the error address;
5) the local subsequent-error log register records information about occurrences of a given error of the corresponding component after the first one, including the error count;
The global error register set comprises 1) a global error status register, 2) a global error control register, 3) a global first-error log register, 4) a global subsequent-error log register, 5) a system event status register, and 6) a system event control register, wherein:
1) the global error status register identifies whether an error has occurred in each component of the chip; the error state of each component is represented by 1 bit, and when a component produces an error, the corresponding bit in the register is set to 1;
2) the global error control register controls whether the errors produced by each component are recorded; its bits correspond one-to-one to the bits of the global error status register; if a control bit in the global error control register is set, detected errors of the corresponding component are masked and are neither recorded nor processed;
3) and 4) the global first-error log register and the global subsequent-error log register record on-site information captured when each component produces its first error and when it produces subsequent errors, respectively;
5) the system event status register records the severity level of the errors that occur in each component of the chip;
6) the system event control register defines the mapping from severity level to reporting method; for errors of each severity level, the way they are reported to other components of the system can be configured, including sending an interrupt and asserting an error pin;
The specific steps are as follows:
1) a component produces an error of a certain type;
2) check whether this type of error is masked in the "local error control register"; if it is masked, do not record the error and end; otherwise, set the corresponding bit in the "local error status register";
3) determine whether this is the first occurrence of this type of error; if so, update the "local first-error log register"; otherwise, update the "local subsequent-error log register";
4) report the error to the global level according to the configuration in the "local error severity register";
5) check whether errors from this component are masked in the "global error control register"; if they are masked, do not record the error and end; otherwise, set the corresponding bit in the "global error status register";
6) determine whether this is the first error from this component; if so, update the "global first-error log register"; otherwise, update the "global subsequent-error log register";
7) update the corresponding error severity level in the "system event status register";
8) according to the severity to reporting-method mapping configured in the "system event control register", report the error to the external system by sending an interrupt, asserting an error pin, or similar means; this completes the error recording and reporting process.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210149211.2A CN102681930B (en) | 2012-05-15 | 2012-05-15 | Chip-level error recording method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210149211.2A CN102681930B (en) | 2012-05-15 | 2012-05-15 | Chip-level error recording method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102681930A (en) | 2012-09-19 |
CN102681930B (en) | 2016-08-17 |
Family
ID=46813895
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210149211.2A Active CN102681930B (en) | Chip-level error recording method | 2012-05-15 | 2012-05-15 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102681930B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN102681930B (en) | 2016-08-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |