CN102681930B - Chip-level error logging method - Google Patents

Chip-level error logging method

Info

Publication number
CN102681930B
CN102681930B CN201210149211.2A CN201210149211A CN102681930B CN 102681930 B CN102681930 B CN 102681930B CN 201210149211 A CN201210149211 A CN 201210149211A CN 102681930 B CN102681930 B CN 102681930B
Authority
CN
China
Prior art keywords
error
register
local
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210149211.2A
Other languages
Chinese (zh)
Other versions
CN102681930A (en)
Inventor
乔英良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201210149211.2A
Publication of CN102681930A
Application granted
Publication of CN102681930B
Legal status: Active

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention proposes a chip-level error logging method. A chip designed with this method records errors in a hierarchically organized set of error registers and reports them to the external system by sending interrupts to other components in the system, asserting an error pin, or similar means. Errors of different severity levels are distinguished as they are recorded, and the logging of any given error type can be enabled or masked as required. The error registers are organized in two levels, local and global: the local error register set records the errors of a specific component inside the chip, and the global error register set collects the error records from every local error register set and reports them to the external system.

Description

A chip-level error logging method
Technical field
The invention relates to the field of computer chip design, and in particular to a chip-level error logging method.
Background art
As scientific computing, commercial services, government functions and other applications move rapidly onto electronic systems, users place ever higher demands on the performance, capacity, density, availability and security of ICT infrastructure such as servers, storage and network equipment. Chips are the basic building blocks of this infrastructure, so their importance is self-evident. In chip design, development and tape-out, chip yield is the main factor affecting the development cycle, design cost and production cost. Repeated tape-outs lengthen the design, verification and test cycles, and the high production cost of each tape-out adds to the overall cost of developing the chip. A prototype system should therefore be built quickly at an early stage of chip logic design so that chip-level and system-level verification and testing can be carried out, to ensure first-pass tape-out success. This calls for an efficient and flexible chip-level error logging method that helps locate chip-level problems during testing and verification.
Summary of the invention
The object of the invention is to provide a chip-level error logging method.
The object of the invention is achieved as follows. Local error registers and global error registers are combined in a hierarchical structure to organize and record errors: the local error registers record the errors of a specific component inside the chip, and the global error registers collect the error records from each set of local error registers and report them to the external system.
The local error registers comprise 1) a local error status register, 2) a local error control register, 3) a local error severity register, 4) a local first-error log register and 5) a local subsequent-error log register, where:
1) The local error status register identifies each type of error that has occurred in the corresponding component. Each error type is represented by 1 bit; when an error of a given type occurs, the corresponding bit in the register is set to 1.
2) The local error control register controls whether a given type of error produced by the corresponding component is recorded. Its bits correspond one-to-one with the bits of the local error status register. If a control bit in the local error control register is set, the corresponding detected error is masked and is neither recorded nor processed.
3) The local error severity register provides a mechanism for mapping each error to one of several severity levels. When an error occurs, it is reported according to the mapping from error type to severity level defined in this register. Suppose the following three severity levels are to be supported: (1) correctable errors, from which the system can recover with no loss of information and without software involvement, for example a link CRC error that is corrected by link-layer retransmission; (2) recoverable errors, which cannot be corrected by hardware mechanisms and must be recovered by upper-layer software, and which may make a specific transaction unreliable while the system continues to run normally, for example an uncorrectable ECC error that affects only the data portion of a transaction; (3) fatal errors, which cannot be corrected or recovered by hardware or software and may require a system reset to return to a reliable state, for example a cache multi-bit tag error or a permanent PCI-E link failure.
The severity level of each error type is then represented with two bits, for example 00b for correctable errors, 01b for recoverable errors and 10b for fatal errors, with 11b reserved.
4) The local first-error log register records information captured when an error of the corresponding component is detected for the first time, including the message content and the error address.
5) The local subsequent-error log register records information about occurrences of an error of the corresponding component after the first, including the error count.
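As a rough illustration of how the local status, control and log registers interact, the following Python sketch models them in software for one component. It is a minimal sketch under stated assumptions, not the circuit described in the patent: the class name `LocalErrorRegisters`, the method `record`, and the choice to keep one log entry per error type are hypothetical, since the text does not specify the layout of the log registers. The severity register is left out here to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class LocalErrorRegisters:
    """Illustrative software model of one component's local error registers."""
    status: int = 0       # local error status register: one bit per error type
    control: int = 0      # local error control register: a set bit masks that type
    first_log: dict = field(default_factory=dict)         # assumed: one entry per type
    subsequent_count: dict = field(default_factory=dict)  # assumed: count per type

    def record(self, err_type: int, message: str, address: int) -> bool:
        """Record an error of `err_type`; return False if the type is masked."""
        bit = 1 << err_type
        if self.control & bit:
            # Masked errors are neither recorded nor processed.
            return False
        first = not (self.status & bit)   # status bit still clear: first occurrence
        self.status |= bit
        if first:
            self.first_log[err_type] = {"message": message, "address": address}
        else:
            self.subsequent_count[err_type] = self.subsequent_count.get(err_type, 0) + 1
        return True
```

A caller would create one such object per component, set bits in `control` for the error types to be masked, and call `record` whenever the component detects an error.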
The global error registers comprise 1) a global error status register, 2) a global error control register, 3) a global first-error log register, 4) a global subsequent-error log register, 5) a system event status register and 6) a system event control register, where:
1) The global error status register identifies whether an error has occurred in each component of the chip. The error state of each component is represented by 1 bit; when a component produces an error, the corresponding bit in the register is set to 1.
2) The global error control register controls whether the errors produced by a given component are recorded. Its bits correspond one-to-one with the bits of the global error status register. If a control bit in the global error control register is set, detected errors from the corresponding component are masked and are neither recorded nor processed.
3) and 4) The global first-error log register and the global subsequent-error log register record the on-site information captured when a component produces its first error and its subsequent errors, respectively.
5) The system event status register records the severity level of the errors that have occurred in each component of the chip.
6) The system event control register defines the mapping from severity level to reporting method. For each severity level it configures how errors of that level are reported to other components in the system, including sending an interrupt and asserting the error pin.
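The mapping from severity level to reporting method held in the system event control register can be sketched as a small bit field. The following Python example is only an illustration under assumptions: the text does not give the register layout, so the two-bit field per severity level and the names `REPORT_INTERRUPT`, `REPORT_ERROR_PIN`, `set_report_mode` and `report_actions` are hypothetical. The severity encodings 00b, 01b and 10b are the ones suggested in the text.

```python
CORRECTABLE, RECOVERABLE, FATAL = 0b00, 0b01, 0b10  # encodings suggested in the text

REPORT_INTERRUPT = 0b01  # hypothetical flag: send an interrupt
REPORT_ERROR_PIN = 0b10  # hypothetical flag: assert the error pin


def set_report_mode(event_control: int, severity: int, mode: int) -> int:
    """Return a new control value with `mode` stored in the field for `severity`."""
    shift = 2 * severity  # assumed layout: one 2-bit field per severity level
    return (event_control & ~(0b11 << shift)) | ((mode & 0b11) << shift)


def report_actions(event_control: int, severity: int) -> list:
    """List the reporting methods configured for errors of `severity`."""
    mode = (event_control >> (2 * severity)) & 0b11
    actions = []
    if mode & REPORT_INTERRUPT:
        actions.append("interrupt")
    if mode & REPORT_ERROR_PIN:
        actions.append("error_pin")
    return actions
```

With this layout, a configuration that sends an interrupt and asserts the pin for fatal errors but only sends an interrupt for recoverable errors takes two calls to `set_report_mode`; correctable errors are then logged without being reported.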
The specific steps are as follows:
1) A component produces an error of some type.
2) Check whether this error type is masked in the local error control register. If it is masked, do not record the error, and stop. Otherwise, set the corresponding bit in the local error status register.
3) Determine whether this is the first occurrence of this type of error. If so, update the contents of the local first-error log register; otherwise, update the contents of the local subsequent-error log register.
4) Report the error to the global error control register according to the configuration in the local error severity register.
5) Check whether errors from this component are masked in the global error control register. If they are masked, do not record the error, and stop. Otherwise, set the corresponding bit in the global error status register.
6) Determine whether this is the first error from this component. If so, update the contents of the global first-error log register; otherwise, update the contents of the global subsequent-error log register.
7) Update the corresponding error severity level in the system event status register.
8) According to the mapping from severity level to reporting method configured in the system event control register, report the error to the external system by sending an interrupt, asserting the error pin, or similar means. This completes the error logging and reporting process.
The beneficial effects of the invention are as follows. This hierarchically organized, flexibly configurable chip-level error logging method makes up for the low efficiency and poor flexibility of traditional chip error logging methods. It can shorten the chip design, verification and test cycle and help ensure first-pass tape-out success, and it therefore has broad development prospects and high technical value.
Brief description of the drawings
Fig. 1 is a schematic diagram of the hierarchically organized error register set;
Fig. 2 is a flow diagram of chip-level error logging.
Detailed description of the invention
With reference to the accompanying drawings, a specific example is used below to describe how the chip-level error logging method of the invention is implemented.
The invention proposes a chip-level error logging method. A chip designed with this method records errors in a hierarchically organized set of error registers and reports them to the external system by sending interrupts to other components in the system, asserting an error pin, or similar means. Errors of different severity levels are distinguished as they are recorded, and the logging of any given error type can be enabled or masked as required.
The invention uses two sets of error registers for recording errors, local and global, organized in a hierarchical structure. The local error registers record the errors of a specific component inside the chip; the global error registers collect the error records from each set of local error registers and report them to the external system, as shown in Fig. 1:
The local error registers comprise a local error status register, a local error control register, a local error severity register, a local first-error log register and a local subsequent-error log register, where:
The local error status register identifies each type of error that has occurred in the corresponding component. Each error type is represented by 1 bit; when an error of a given type occurs, the corresponding bit in the register is set to 1.
The local error control register controls whether a given type of error produced by the corresponding component is recorded. Its bits correspond one-to-one with the bits of the local error status register. If a control bit in the local error control register is set, the corresponding detected error is masked and is neither recorded nor processed.
The local error severity register provides a mechanism for mapping each error to one of several severity levels. When an error occurs, it is reported according to the mapping from error type to severity level defined in this register. Suppose the following three severity levels are to be supported: correctable errors (the system can recover with no loss of information and without software involvement; for example, a link CRC error can be corrected by link-layer retransmission), recoverable errors (errors that cannot be corrected by hardware mechanisms and must be recovered by upper-layer software; they may make a specific transaction unreliable, but the system continues to run normally; for example, an uncorrectable ECC error that affects only the data portion of a transaction), and fatal errors (errors that cannot be corrected or recovered by hardware or software and may require a system reset to return to a reliable state; for example, a cache multi-bit tag error or a permanent PCI-E link failure). The severity level of each error type is then represented with two bits: 00b can denote correctable errors, 01b recoverable errors and 10b fatal errors, with 11b reserved.
The local first-error log register records information captured when an error of the corresponding component is detected for the first time, such as the message content and the error address.
The local subsequent-error log register records information about occurrences of an error of the corresponding component after the first, such as the error count.
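The two-bit severity encoding can be made concrete with a short Python sketch. It assumes, purely for illustration, that the field for error type n occupies bits 2n and 2n+1 of the local error severity register; the text gives the encodings 00b, 01b, 10b and 11b but not the field layout. The helper names `set_severity` and `get_severity` are hypothetical.

```python
CORRECTABLE, RECOVERABLE, FATAL, RESERVED = 0b00, 0b01, 0b10, 0b11

SEVERITY_NAMES = {
    CORRECTABLE: "correctable",
    RECOVERABLE: "recoverable",
    FATAL: "fatal",
    RESERVED: "reserved",
}


def set_severity(severity_reg: int, err_type: int, level: int) -> int:
    """Return the register value with `level` stored for error type `err_type`."""
    shift = 2 * err_type  # assumed layout: two bits per error type
    return (severity_reg & ~(0b11 << shift)) | ((level & 0b11) << shift)


def get_severity(severity_reg: int, err_type: int) -> int:
    """Read back the 2-bit severity level configured for `err_type`."""
    return (severity_reg >> (2 * err_type)) & 0b11
```

For example, if error type 0 is a link CRC error, type 1 an uncorrectable ECC error and type 2 a permanent PCI-E link failure (the type numbers are made up for this example), three calls to `set_severity` configure them as correctable, recoverable and fatal respectively, and `SEVERITY_NAMES[get_severity(reg, 2)]` then yields "fatal".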
The global error registers comprise a global error status register, a global error control register, a global first-error log register, a global subsequent-error log register, a system event status register and a system event control register, where:
The global error status register identifies whether an error has occurred in each component of the chip. The error state of each component is represented by 1 bit; when a component produces an error, the corresponding bit in the register is set to 1.
The global error control register controls whether the errors produced by a given component are recorded. Its bits correspond one-to-one with the bits of the global error status register. If a control bit in the global error control register is set, detected errors from the corresponding component are masked and are neither recorded nor processed.
The global first-error log register and the global subsequent-error log register record the on-site information captured when a component produces its first error and its subsequent errors, respectively.
The system event status register records the severity level of the errors that have occurred in each component of the chip.
The system event control register defines the mapping from severity level to reporting method. For each severity level it configures how errors of that level are reported to other components in the system, such as sending an interrupt or asserting the error pin.
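One way debugging software might use this hierarchy is to read the global error status register first and then inspect only the local error status registers of the components it flags. The Python sketch below illustrates that idea. It is an assumption about usage rather than something the text prescribes: the function names `set_bits` and `locate_errors` are hypothetical, and bit i of the global status register is assumed to correspond to component i.

```python
def set_bits(value: int):
    """Yield the indices of the bits set in `value`, lowest first."""
    index = 0
    while value:
        if value & 1:
            yield index
        value >>= 1
        index += 1


def locate_errors(global_status: int, local_status: list) -> list:
    """Return (component, error_type) pairs for every recorded error.

    `local_status[i]` is the local error status register value of component i.
    Only components flagged in `global_status` are inspected.
    """
    found = []
    for component in set_bits(global_status):
        for err_type in set_bits(local_status[component]):
            found.append((component, err_type))
    return found
```

Note that a component whose errors are masked in the global error control register never has its global status bit set, so this walk would skip it even if its local status register is non-zero.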
As described in the summary of the invention, the error logging process of the invention is shown in Fig. 2. The specific steps are as follows:
1) A component produces an error of some type.
2) Check whether this error type is masked in the local error control register. If it is masked, do not record the error, and stop. Otherwise, set the corresponding bit in the local error status register.
3) Determine whether this is the first occurrence of this type of error. If so, update the contents of the local first-error log register; otherwise, update the contents of the local subsequent-error log register.
4) Report the error to the global error control register according to the configuration in the local error severity register.
5) Check whether errors from this component are masked in the global error control register. If they are masked, do not record the error, and stop. Otherwise, set the corresponding bit in the global error status register.
6) Determine whether this is the first error from this component. If so, update the contents of the global first-error log register; otherwise, update the contents of the global subsequent-error log register.
7) Update the corresponding error severity level in the system event status register.
8) According to the mapping from severity level to reporting method configured in the system event control register, report the error to the external system by sending an interrupt, asserting the error pin, or similar means. This completes the error logging and reporting process.
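The eight steps above can be traced end to end in a small software model. The following Python sketch is a simplified illustration under assumptions, not the hardware design: the classes `Component` and `Chip`, the per-type and per-component log dictionaries, the 2-bit report-mode field per severity level and the `reports` list standing in for interrupts and the error pin are all hypothetical choices made for this example.

```python
CORRECTABLE, RECOVERABLE, FATAL = 0b00, 0b01, 0b10
REPORT_INTERRUPT, REPORT_ERROR_PIN = 0b01, 0b10  # hypothetical report-mode flags


class Component:
    """Local error registers of one component (illustrative layout)."""
    def __init__(self):
        self.status = 0        # local error status register
        self.control = 0       # local error control register (set bit = masked)
        self.severity = 0      # local error severity register (2 bits per type)
        self.first_log = {}    # first-error log, keyed by error type (assumed)
        self.later_count = {}  # subsequent-error log: count per error type


class Chip:
    """Global error registers plus the components they summarize."""
    def __init__(self, n_components):
        self.components = [Component() for _ in range(n_components)]
        self.global_status = 0
        self.global_control = 0
        self.global_first_log = {}
        self.global_later_count = {}
        self.event_status = {}  # system event status: severity per component
        self.event_control = 0  # assumed: 2-bit report mode per severity level
        self.reports = []       # stands in for interrupts and the error pin

    def raise_error(self, comp_id, err_type, info):
        """Step 1: component `comp_id` produces an error of `err_type`."""
        comp = self.components[comp_id]
        ebit, cbit = 1 << err_type, 1 << comp_id
        # Step 2: local mask check, then set the local status bit.
        if comp.control & ebit:
            return "masked locally"
        first_local = not (comp.status & ebit)
        comp.status |= ebit
        # Step 3: first-error log or subsequent-error log.
        if first_local:
            comp.first_log[err_type] = info
        else:
            comp.later_count[err_type] = comp.later_count.get(err_type, 0) + 1
        # Step 4: look up the configured severity and pass the error upward.
        level = (comp.severity >> (2 * err_type)) & 0b11
        # Step 5: global mask check, then set the global status bit.
        if self.global_control & cbit:
            return "masked globally"
        first_global = not (self.global_status & cbit)
        self.global_status |= cbit
        # Step 6: global first-error log or subsequent-error log.
        if first_global:
            self.global_first_log[comp_id] = (err_type, info)
        else:
            self.global_later_count[comp_id] = self.global_later_count.get(comp_id, 0) + 1
        # Step 7: record the severity in the system event status register.
        self.event_status[comp_id] = level
        # Step 8: report according to the system event control register.
        mode = (self.event_control >> (2 * level)) & 0b11
        if mode & REPORT_INTERRUPT:
            self.reports.append(("interrupt", comp_id, level))
        if mode & REPORT_ERROR_PIN:
            self.reports.append(("error_pin", comp_id, level))
        return "reported" if mode else "logged"
```

In this model an error that is masked globally is still recorded in the local registers, because the local steps run before the global mask check, which matches the order of steps 2 to 5.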
Except for the technical features described in this specification, everything is known art to those skilled in the field.

Claims (1)

1. A chip-level error logging method, characterized in that local error registers and global error registers are combined in a hierarchical structure to organize and record errors: the local error registers record the errors of a specific component inside the chip; the global error registers collect the error records from each set of local error registers and report them to the external system;
the local error registers comprise 1) a local error status register, 2) a local error control register, 3) a local error severity register, 4) a local first-error log register and 5) a local subsequent-error log register; wherein:
1) the local error status register identifies each type of error that has occurred in the corresponding component, each error type being represented by 1 bit; when an error of a given type occurs, the corresponding bit in the register is set to 1;
2) the local error control register controls whether a given type of error produced by the corresponding component is recorded, its bits corresponding one-to-one with the bits of the local error status register; if a control bit in the local error control register is set, the corresponding detected error is masked and is neither recorded nor processed;
3) the local error severity register provides a mechanism for mapping each error to one of several severity levels; when an error occurs, it is reported according to the mapping from error type to severity level defined in the local error severity register; the following three error severity levels are supported: (1) correctable errors, being errors from which the system can recover with no loss of information and without software involvement, including link CRC errors, which can be corrected by link-layer retransmission; (2) recoverable errors, being errors that cannot be corrected by hardware mechanisms and must be recovered by upper-layer software, and that may make a specific transaction unreliable while the system continues to run normally, including uncorrectable ECC errors that affect only the data portion of a transaction; (3) fatal errors, being errors that cannot be corrected or recovered by hardware or software and that may require a system reset to return to a reliable state, including cache multi-bit tag errors and permanent PCI-E link failures;
the severity level of each error type is represented with two bits, with 00b denoting correctable errors, 01b recoverable errors and 10b fatal errors, and 11b reserved;
4) the local first-error log register records information captured when an error of the corresponding component is detected for the first time, including the message content and the error address;
5) the local subsequent-error log register records information about occurrences of an error of the corresponding component after the first, including the error count;
the global error registers comprise a global error status register, a global error control register, a global first-error log register, a global subsequent-error log register, a system event status register and a system event control register, wherein:
the global error status register identifies whether an error has occurred in each component of the chip, the error state of each component being represented by 1 bit; when a component produces an error, the corresponding bit in the register is set to 1;
the global error control register controls whether the errors produced by a given component are recorded, its bits corresponding one-to-one with the bits of the global error status register; if a control bit in the global error control register is set, detected errors from the corresponding component are masked and are neither recorded nor processed;
the global first-error log register and the global subsequent-error log register record the on-site information captured when a component produces its first error and its subsequent errors, respectively;
the system event status register records the severity level of the errors that have occurred in each component of the chip;
the system event control register defines the mapping from severity level to reporting method, and configures how errors of each severity level are reported to other components in the system, including sending an interrupt and asserting the error pin;
the specific steps are as follows:
1) a component produces an error of some type;
2) check whether this error type is masked in the local error control register; if it is masked, do not record the error, and stop; otherwise, set the corresponding bit in the local error status register;
3) determine whether this is the first occurrence of this type of error; if so, update the contents of the local first-error log register; otherwise, update the contents of the local subsequent-error log register;
4) report the error to the global error control register according to the configuration in the local error severity register;
5) check whether errors from this component are masked in the global error control register; if they are masked, do not record the error, and stop; otherwise, set the corresponding bit in the global error status register;
6) determine whether this is the first error from this component; if so, update the contents of the global first-error log register; otherwise, update the contents of the global subsequent-error log register;
7) update the corresponding error severity level in the system event status register;
8) according to the mapping from severity level to reporting method configured in the system event control register, report the error to the external system by sending an interrupt or asserting the error pin, which completes the error logging and reporting process.
CN201210149211.2A 2012-05-15 2012-05-15 Chip-level error logging method Active CN102681930B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210149211.2A CN102681930B (en) 2012-05-15 2012-05-15 Chip-level error logging method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201210149211.2A CN102681930B (en) 2012-05-15 2012-05-15 Chip-level error logging method

Publications (2)

Publication Number Publication Date
CN102681930A CN102681930A (en) 2012-09-19
CN102681930B true CN102681930B (en) 2016-08-17

Family

ID=46813895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210149211.2A Active CN102681930B (en) 2012-05-15 2012-05-15 Chip-level error logging method

Country Status (1)

Country Link
CN (1) CN102681930B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104077375B * 2014-06-24 2017-09-12 华为技术有限公司 Method and node for processing an error directory of a node in a CC-NUMA system
CN104133751A (en) * 2014-08-06 2014-11-05 浪潮(北京)电子信息产业有限公司 Chip debugging method and chip
CN104407952A (en) * 2014-11-12 2015-03-11 浪潮(北京)电子信息产业有限公司 Method and system for debugging through multi-CPU (central processing unit) node controller chip
US10922161B2 (en) * 2018-11-28 2021-02-16 Intel Corporation Apparatus and method for scalable error detection and reporting
CN110399317B (en) * 2019-07-15 2020-12-25 西安微电子技术研究所 Software self-adaptive multifunctional controller of embedded system
CN113832663B (en) * 2021-09-18 2022-08-16 珠海格力电器股份有限公司 Control chip fault recording method and device and control chip fault reading method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599812A (en) * 2008-06-04 2009-12-09 富士通株式会社 Data transmission set
CN102073533A (en) * 2011-01-14 2011-05-25 中国人民解放军国防科学技术大学 Multicore architecture supporting dynamic binary translation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3889391B2 (en) * 2003-11-06 2007-03-07 ローム株式会社 Memory device and display device

Also Published As

Publication number Publication date
CN102681930A (en) 2012-09-19


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant