CN1288882C - Solution to local failure of memory - Google Patents

Info

Publication number
CN1288882C
Authority
CN
China
Prior art keywords
buffering area
memory
failure
self check
buffer area
Prior art date
Legal status
Expired - Fee Related
Application number
CNB011350881A
Other languages
Chinese (zh)
Other versions
CN1422048A (en)
Inventor
涂君 (Tu Jun)
雷春 (Lei Chun)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
2001-11-27
Filing date
2001-11-27
Publication date
2006-12-06
Application filed by Huawei Technologies Co Ltd
Priority to CNB011350881A
Publication of CN1422048A
Application granted
Publication of CN1288882C
Anticipated expiration
Expired - Fee Related (current legal status)

Abstract

The present invention relates to a method for solving partial failure of a memory. A logic IC, or the logic circuit of an ASIC chip, self-checks the memory buffer by buffer. If all memory cells of a buffer pass the self-check, the buffer's first address is written into the free buffer queue once the self-check is complete, so that the buffer can be used later. If a failed memory cell is detected in a buffer, the buffer's first address is not written into the free buffer queue, so the buffer is never accessed during normal operation of the logic IC or ASIC chip. The failed buffers are also counted during the self-check. After the self-check, the count is read and handled according to the number of failed buffers: if there are none, or so few that they have very little effect on system function or performance, the board continues to operate as usual; if there are more, an alarm signal is sent to the system to request maintenance or board replacement.

Description

A method for solving partial failure of a memory
Technical field
The present invention relates to a method for tolerating partial memory failure and thereby improving the operational reliability and fault tolerance of the whole system. The method is particularly valuable where memory is used in blocks, for example in the logic circuit design of message store-and-forward or ATM cell segmentation/reassembly applications. The invention belongs to the field of logic IC and ASIC chip circuit design.
Background art
In the circuit design of logic ICs or ASIC chips for applications such as message store-and-forward or ATM cell segmentation/reassembly, a large-capacity memory is often needed to hold messages temporarily. The memory is generally divided into a number of buffers, each of which can hold one message.
Figure 1 shows a block diagram of how the logic circuit is currently implemented in such applications. Its basic working principle is as follows:
(1) After a system reset, a memory self-check is performed first. The self-check can be carried out either by the logic circuit of the logic IC or ASIC chip itself, or by a CPU connected to the chip through a memory access channel that the chip provides. Because the memory capacity is large, a CPU-driven self-check is too slow, so the self-check is usually performed by the chip's own logic circuit. The usual method is to write data into a memory cell, read the cell back, and compare the two values; if they match, the cell is considered normal. If every cell of the memory passes, the memory self-check is judged normal. When the self-check finishes, a status flag indicating whether it failed must be output so that the CPU can evaluate it and act accordingly. For example, if the self-check finds a partial failure in the memory, the CPU must raise an alarm to notify maintenance staff to take corrective action such as replacing the board. In Figure 1, thick solid lines indicate the path of message data, and thin solid lines indicate the path of buffer first addresses.
(2) After the memory self-check passes, the free buffer queue is initialized: the first (start) address of each buffer is written into the free buffer queue. Both the free buffer queue and the send buffer queue are first-in-first-out memories (FIFOs). The free buffer queue holds the first addresses of free buffers, and the send buffer queue holds the first addresses of buffers containing messages or other data waiting to be sent.
(3) After receiving a message, the receive state machine reads the first address of a free buffer from the free buffer queue and stores the message in the corresponding buffer of the large-capacity memory. Once the whole message has been received, it writes that buffer's first address into the send buffer queue.
(4) When the transmit state machine detects that the send buffer queue is not empty, it first reads the first address of the buffer holding the message from the send buffer queue, then reads the message from the large-capacity memory at that address, processes it as required, and sends it. When the message has been sent, the transmit state machine writes the buffer's first address back into the free buffer queue, releasing the buffer. This workflow completes the store-and-forward handling of messages.
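The store-and-forward workflow in steps (1) to (4) can be modeled in software as two FIFOs of buffer first addresses. The Python sketch below is a behavioral model under stated assumptions, not the hardware design: the buffer size, buffer count and the names `free_queue`, `send_queue`, `receive` and `transmit` are hypothetical, and the message length is carried in the send queue next to the address purely for simplicity.

```python
from collections import deque

# Hypothetical parameters: 4 buffers of 64 bytes each, starting at address 0.
BASE_ADDR = 0x0000
BUF_SIZE = 64
NUM_BUFS = 4

memory = bytearray(BASE_ADDR + BUF_SIZE * NUM_BUFS)  # simulated large-capacity memory

# Step (2): initialize the free buffer queue with every buffer's first address.
free_queue = deque(BASE_ADDR + i * BUF_SIZE for i in range(NUM_BUFS))
# Send buffer queue: (first address, length) of buffers waiting to be sent.
# Keeping the length here is a simplification assumed for this sketch.
send_queue = deque()


def receive(message: bytes) -> bool:
    """Step (3): receive state machine. Returns False if no free buffer fits."""
    if not free_queue or len(message) > BUF_SIZE:
        return False
    addr = free_queue.popleft()                  # take a free buffer's first address
    memory[addr:addr + len(message)] = message   # store the message in that buffer
    send_queue.append((addr, len(message)))      # hand the buffer to the sender
    return True


def transmit():
    """Step (4): transmit state machine. Returns the sent message, or None if idle."""
    if not send_queue:
        return None
    addr, length = send_queue.popleft()
    message = bytes(memory[addr:addr + length])  # read the message back
    free_queue.append(addr)                      # release the buffer
    return message


receive(b"hello")
receive(b"world")
print(transmit())        # b'hello'
print(list(free_queue))  # [128, 192, 0]
```

Because every transmitted buffer's address goes back into `free_queue`, the same fixed set of buffers is recycled indefinitely; a buffer whose address never enters the queue is simply never used, which is the property the invention relies on.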
With the rapid development of microelectronics, the capacity of memory chips keeps growing: a single chip can now integrate several hundred million transistors, and memory chip sizes continue to increase rapidly. At the same time, the process technology used to manufacture integrated circuits keeps advancing and line widths keep shrinking. Together these trends greatly increase the likelihood that some memory cells in a memory will fail.
If some memory cells in a buffer fail, messages temporarily stored in that buffer are easily corrupted when they are sent. In this situation the board is generally considered faulty: the whole memory chip must be replaced and the board returned to the manufacturer for repair. The total repair cost far exceeds the cost of the memory chip itself, even though only a few cells in the chip have failed. Worse, users may conclude that the product's quality cannot be guaranteed, which seriously damages the product's image.
Summary of the invention
The object of the present invention is therefore to provide a method that tolerates partial memory failure and thereby improves the operational reliability and fault tolerance of the whole system. The method addresses the problem that a partial memory failure causes message transmission errors and a high board failure rate. From the system's point of view, the memory chip behaves as if no partial failure had occurred, except that its capacity is slightly smaller. This greatly improves the operational reliability and fault tolerance of the whole system and reduces the board failure rate and repair rate.
The object of the present invention is achieved as follows. A method for solving partial memory failure is characterized in that the logic circuit of the logic IC or ASIC chip itself self-checks the memory buffer by buffer. The self-check sequentially writes data to each memory cell of the buffer, then reads the data back from the buffer and compares it with what was written; if they match, the tested cell of the buffer is considered normal. If all memory cells of a buffer pass, the buffer's first address is written into the free buffer queue once its self-check is complete, so the buffer can be used in the subsequent operation of the logic IC or ASIC chip. If one or more memory cells of a buffer are found to have failed, the buffer's first address is not written into the free buffer queue, so the buffer containing the failed cells is never accessed during normal operation of the logic IC or ASIC chip. In addition, while performing the memory self-check, the logic circuit uses a counter to count the failed buffers. After the self-check, the CPU reads the value of this failed-buffer counter and acts on it: if the number of failed buffers is very small and has little effect on the board's function and performance, the board is considered normal and allowed to continue working; if the number of failed buffers is large and significantly affects the board's function and performance, an alarm signal is sent to request maintenance or board replacement.
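The buffer-granular self-check can be sketched as follows. This is a minimal behavioral model, assuming byte-wide cells, a simulated memory with injected stuck-at faults, and four hypothetical test patterns; the class `FaultyMemory` and the functions `buffer_ok` and `self_check` are illustrative names, not part of the patent. As in the text, each buffer is written completely, then read back and compared, and only buffers whose cells all pass have their first address enqueued, while failed buffers increment a counter.

```python
from collections import deque

BUF_SIZE = 8                          # hypothetical: cells (bytes) per buffer
NUM_BUFS = 6                          # hypothetical: number of buffers
PATTERNS = (0x00, 0xFF, 0x55, 0xAA)   # hypothetical test patterns


class FaultyMemory:
    """Simulated memory in which some cells have bits stuck at 1 or 0."""

    def __init__(self, size, stuck_at_one=None, stuck_at_zero=None):
        self.cells = bytearray(size)
        self.stuck_at_one = stuck_at_one or {}    # address -> bits forced to 1
        self.stuck_at_zero = stuck_at_zero or {}  # address -> bits forced to 0

    def write(self, addr, value):
        value |= self.stuck_at_one.get(addr, 0)
        value &= ~self.stuck_at_zero.get(addr, 0) & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def buffer_ok(mem, first_addr):
    """Write the whole buffer with a pattern, then read back and compare."""
    for pattern in PATTERNS:
        for off in range(BUF_SIZE):
            mem.write(first_addr + off, pattern)
        for off in range(BUF_SIZE):
            if mem.read(first_addr + off) != pattern:
                return False
    return True


def self_check(mem):
    """Return (free buffer queue, failed-buffer count)."""
    free_queue = deque()
    failed = 0
    for i in range(NUM_BUFS):
        first_addr = i * BUF_SIZE
        if buffer_ok(mem, first_addr):
            free_queue.append(first_addr)  # healthy: usable later
        else:
            failed += 1                    # counter read by the CPU afterwards
    return free_queue, failed


# Inject faults into buffer 1 (addresses 8..15) and buffer 5 (40..47).
mem = FaultyMemory(NUM_BUFS * BUF_SIZE, stuck_at_one={9: 0x01}, stuck_at_zero={42: 0x80})
free_queue, failed = self_check(mem)
print(list(free_queue), failed)  # [0, 16, 24, 32] 2
```

Buffers 1 and 5 never reach the free buffer queue, so nothing downstream ever addresses them; the usable capacity simply drops from six buffers to four.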
With the method of the present invention, when some memory cells are detected to have failed, the buffers containing those cells are no longer used, while the remaining healthy buffers continue to work normally, so the whole memory chip does not need to be replaced. From the system's point of view, the memory chip behaves as if no partial failure had occurred, except that its capacity is slightly smaller, which is entirely acceptable in the vast majority of systems. The invention can therefore greatly improve the reliability and fault tolerance of the whole system and reduce the board failure and repair rates, which is especially important in applications that demand high operational reliability and continuous operation.
Description of the drawings
Fig. 1 is a block diagram of the hardware logic circuit currently used in applications such as message store-and-forward or ATM cell segmentation/reassembly.
Embodiment
The present invention is a method for solving partial memory failure and improving the operational reliability and fault tolerance of the whole system. Specifically, the logic circuit of the logic IC or ASIC chip itself self-checks the memory buffer by buffer. The self-check sequentially writes data to each memory cell of each buffer, then reads the data back from the buffer and compares it with what was written; if they match, the tested cell of the buffer is considered normal. If all memory cells of a buffer pass, its first address is written into the free buffer queue once the buffer's self-check is complete, and the buffer can then be used in the subsequent operation of the logic IC or ASIC chip. If one or more memory cells of the buffer are found to have failed, the buffer's first address is not written into the free buffer queue, and the buffer containing the failed cells is never again accessed during normal operation of the logic IC or ASIC chip. In addition, while performing the memory self-check, the memory self-check circuit uses a counter to count the failed buffers. After the self-check, the CPU reads the value of this failed-buffer counter and acts on it: if the number of failed buffers is very small and has little effect on the board's function and performance, the board can be considered normal and allowed to continue working; if the number of failed buffers is larger and may significantly affect the board's function and performance, an alarm signal should be sent to request maintenance or to prompt the user to replace this data storage board promptly.
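The CPU-side decision is a policy on the counter value. The patent only distinguishes "very few" from "larger" numbers of failed buffers; the sketch below assumes a hypothetical threshold expressed as a fraction of all buffers (`max_failed_ratio`, default 1%) and a placeholder `send_alarm` function standing in for the board's real alarm path.

```python
def send_alarm(message: str) -> None:
    # Placeholder for the board's actual alarm mechanism (hypothetical).
    print("ALARM:", message)


def handle_self_check_result(failed_count: int, total_buffers: int,
                             max_failed_ratio: float = 0.01):
    """Return (board_ok, usable_buffers) after reading the failed-buffer counter.

    max_failed_ratio is an assumed policy parameter, not taken from the patent.
    """
    usable = total_buffers - failed_count
    if failed_count <= max_failed_ratio * total_buffers:
        # Few failures: keep the board in service with slightly less capacity.
        return True, usable
    send_alarm(f"{failed_count} of {total_buffers} buffers failed self-check; "
               "request maintenance or board replacement")
    return False, usable


print(handle_self_check_result(3, 4096))    # (True, 4093)
print(handle_self_check_result(100, 4096))  # prints the ALARM line, then (False, 3996)
```

A ratio-based threshold is only one option; a system could equally compare `usable` against the minimum number of buffers it needs to sustain its traffic.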
When self-checking each buffer of the memory, a memory test method can be chosen according to its implementation complexity and the system's required probability of detecting memory errors, for example the scan pattern method, the checkerboard pattern method, the MATS algorithm, the stride pattern method, the nine-step algorithm, the extended nine-step algorithm, the thirteen-step algorithm, or the March C algorithm. Details of these algorithms can be found in the relevant literature and are not repeated here.
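As one example of the listed tests, the March C algorithm can be applied to a single buffer. The sketch below uses the commonly published 11-operation form {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇕(r0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}, applied to byte-wide cells with solid 0x00/0xFF data backgrounds (an assumption; word-oriented memories usually need several backgrounds). The `read`/`write` callbacks and the injected coupling fault are hypothetical test scaffolding.

```python
# March elements: (address order, operations). "up" = ascending, "down" =
# descending, "any" = either order (ascending is used here).
MARCH_C = [
    ("any",  ["w0"]),
    ("up",   ["r0", "w1"]),
    ("up",   ["r1", "w0"]),
    ("any",  ["r0"]),
    ("down", ["r0", "w1"]),
    ("down", ["r1", "w0"]),
    ("any",  ["r0"]),
]

ZERO, ONE = 0x00, 0xFF  # solid data backgrounds for byte-wide cells (assumption)


def march_test(read, write, first_addr, size, algorithm=MARCH_C):
    """Run a march test over one buffer; return True if no read mismatches."""
    for order, ops in algorithm:
        addrs = range(first_addr, first_addr + size)
        if order == "down":
            addrs = reversed(addrs)
        for addr in addrs:
            for op in ops:
                value = ONE if op[1] == "1" else ZERO
                if op[0] == "w":
                    write(addr, value)
                elif read(addr) != value:
                    return False
    return True


cells = bytearray(32)  # two hypothetical 16-byte buffers


def read(addr):
    return cells[addr]


def faulty_write(addr, value):
    cells[addr] = value
    if addr == 3 and value == ONE:  # injected coupling fault: cell 3 drags cell 10 to 1
        cells[10] = ONE


print(march_test(read, faulty_write, 0, 16))   # False: buffer 0 holds the victim cell
print(march_test(read, faulty_write, 16, 16))  # True: buffer 1 is unaffected
```

Because `march_test` returns a per-buffer pass/fail result, it could stand in for the simple write-then-compare check when deciding whether a buffer's first address enters the free buffer queue; a plain write/read of each cell in isolation would not catch this coupling fault.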
The applicant has emulated and simulated the method on computers and in several devices and systems, and has tested it in real projects. Practical testing shows that the method achieves the object of the invention, is simple to implement and reliable in operation, and has good application prospects.

Claims (1)

1. A method for solving partial failure of a memory, characterized in that: the logic circuit of a logic IC or ASIC chip itself self-checks the memory buffer by buffer; if all memory cells of a buffer pass the self-check, the buffer's first address is written into a free buffer queue once the buffer's self-check is complete, so that the buffer can be used in the subsequent operation of the logic IC or ASIC chip; if one or more memory cells of a buffer are found to have failed, the buffer's first address is not written into the free buffer queue, so that the buffer containing the failed memory cells is never accessed during normal operation of the logic IC or ASIC chip; and, while performing the memory self-check, the logic circuit uses a counter to count the failed buffers, and after the self-check the CPU reads the value of the failed-buffer counter and acts on it: if the number of failed buffers is very small and has little effect on the function and performance of the board, the board is considered normal and allowed to continue working; if the number of failed buffers is large and significantly affects the function and performance of the board, an alarm signal is sent to request maintenance or board replacement.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB011350881A CN1288882C (en) 2001-11-27 2001-11-27 Solution to local failure of memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB011350881A CN1288882C (en) 2001-11-27 2001-11-27 Solution to local failure of memory

Publications (2)

Publication Number Publication Date
CN1422048A CN1422048A (en) 2003-06-04
CN1288882C true CN1288882C (en) 2006-12-06

Family

ID=4672943

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB011350881A Expired - Fee Related CN1288882C (en) 2001-11-27 2001-11-27 Solution to local failure of memory

Country Status (1)

Country Link
CN (1) CN1288882C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4651704B2 (en) * 2008-11-14 2011-03-16 シャープ株式会社 Image processing device
CN115292114B (en) * 2022-10-09 2022-12-09 中科声龙科技发展(北京)有限公司 Data storage method, device, equipment and storage medium based on ETHASH algorithm

Also Published As

Publication number Publication date
CN1422048A (en) 2003-06-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20061206

Termination date: 20161127