CN107451051A - A method for diagnosing server memory under Linux - Google Patents
A method for diagnosing server memory under Linux
- Publication number
- CN107451051A (application CN201710518822.2A)
- Authority
- CN
- China
- Prior art keywords
- diagnosis
- error
- data
- row
- under linux
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/366—Software debugging using diagnostics
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Techniques For Improving Reliability Of Storages (AREA)
Abstract
The present invention relates to the field of server fault detection, and in particular to a method for diagnosing server memory under Linux. The method locates a detected memory fault without powering off the Linux system, which shortens fault diagnosis time and reduces the impact of memory faults on the services running on the server. The method can be used in all kinds of server systems and adapts well to them.
Description
Technical field
The present invention relates to the field of server fault detection, and in particular to a method for diagnosing server memory under Linux. The method locates a detected memory fault without powering off the Linux system, which shortens fault diagnosis time and reduces the impact of memory faults on the services running on the server. The method can be used in all kinds of server systems and adapts well to them.
Background
With the rapid development of the Internet, demand for servers keeps growing and servers are used ever more widely, which in turn raises the requirements on every server performance metric. Servers are expected to run for long periods with stable performance, but the longer a server runs, the more likely a fault becomes. Once a fault occurs, it must be located and cleared promptly. Traditionally, locating a server memory fault requires shutting the server down, then removing the memory modules and checking them one by one. Powering off, however, interrupts customer services and causes economic loss.
To address this problem, the present application provides a method that locates the faulty memory DQ (data line) directly, without interrupting running services.
Summary of the invention
Specifically, the present application claims a method for diagnosing and locating memory faults in a Linux environment, characterized in that the method comprises the following steps:
1) read the SAD information and identify the socket reporting the error;
2) read the TAD information and identify the channel reporting the error;
3) read the RIR information and identify the rank reporting the error;
4) use the address mapping table to identify the column, row, bank group, and bank;
5) write data to the address and read it back, XOR the data read with the data written to obtain the failing data line (DQ), and generate a LOG file.
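The comparison in step 5 can be sketched in Python as follows. This is only an illustration of the XOR step, not the EccMon implementation: the function name `failing_dq_lines`, the 64-bit word width, and the test pattern are assumptions, and the sketch leaves out how the data is actually written to and read back from the physical address.

```python
from typing import List


def failing_dq_lines(written: int, read_back: int, width: int = 64) -> List[int]:
    """Return the bit positions where read_back differs from written.

    Assumption: bit n of the data word corresponds to data line DQn.
    """
    # XOR leaves a 1 in every bit position where the two words disagree.
    diff = (written ^ read_back) & ((1 << width) - 1)
    return [bit for bit in range(width) if (diff >> bit) & 1]


# Hypothetical example: the pattern comes back with bit 17 flipped.
pattern = 0xAAAAAAAAAAAAAAAA
observed = pattern ^ (1 << 17)
print(failing_dq_lines(pattern, observed))
```

An empty list means the word was read back unchanged; more than one entry would point to several failing data lines in the same word.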
The method for diagnosing and locating memory faults in a Linux environment described above is further characterized in that the diagnosis and locating steps can be repeated multiple times.
The method described above is further characterized in that the diagnosis and locating steps run automatically under the Linux system.
The method described above is further characterized in that, by analyzing the LOG file, a memory error can be located to a specific channel, CPU, Home Agent, and DIMM rank.
Embodiment
The present invention provides a method for diagnosing and locating memory faults in a Linux environment, implemented as follows:
1. Read the SAD (source address decoder) information and identify the socket reporting the error;
2. Read the TAD (target address decoder) information and identify the channel reporting the error;
3. Read the RIR (rank interleave range) information and identify the rank reporting the error;
4. Use the address mapping table to identify the col (column), row, bank group, and bank;
5. Write data to the address and read it back, XOR the data read with the data written to obtain the failing DQ, and generate a LOG file.
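Step 4 can be pictured as slicing bit fields out of an address. The Python sketch below is an illustration under assumed values: the table `EXAMPLE_MAP`, its bit positions and widths, and the function name `decode_dram_address` are hypothetical, because the real mapping depends on the platform and the memory configuration and is not given in this document. It also assumes the socket, channel, and rank have already been resolved in steps 1 to 3.

```python
from typing import Dict, Tuple

# Hypothetical address mapping table: field name -> (lowest bit, width in bits).
# These positions are made up for illustration only.
EXAMPLE_MAP: Dict[str, Tuple[int, int]] = {
    "col": (3, 10),
    "bank": (13, 2),
    "bank_group": (15, 2),
    "row": (17, 16),
}


def decode_dram_address(addr: int,
                        mapping: Dict[str, Tuple[int, int]] = EXAMPLE_MAP) -> Dict[str, int]:
    """Extract each field by shifting it down and masking it to its width."""
    return {
        name: (addr >> low) & ((1 << width) - 1)
        for name, (low, width) in mapping.items()
    }


# Build an address from known field values, then decode it again.
addr = (9 << 17) | (2 << 15) | (1 << 13) | (5 << 3)
print(decode_dram_address(addr))
```

Keeping the layout in a table rather than in code makes it possible to swap in a different mapping per platform without touching the decoding logic.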
The specific steps for running the diagnostic test are as follows:
1. Copy the EccMon program to the Linux system.
The operating instruction is as follows:
EccMon EFI help license.rtf src syslinux
2. Run the EccMon program.
The command is as follows:
./EccMon i=1000/f=error.xml
i=1000 sets the number of loops to 1000.
f=error.xml saves the test result as an XML file.
After the program finishes, it generates the error log file error.xml:
<ErrorData PhysAddress="0000000000000000" UnkMask="3FFFFFFFFFE0" ErrBits="0000000000020000" Node="0" HA="0" Chan="0" Rank="1" Count="128" Overflow="0"/>
<ErrorData PhysAddress="0000000000000008" UnkMask="3FFFFFFFFFE0" ErrBits="0000000000020000" Node="0" HA="0" Chan="0" Rank="1" Count="128" Overflow="0"/>
<ErrorData PhysAddress="0000000000000010" UnkMask="3FFFFFFFFFE0" ErrBits="0000000000020000" Node="0" HA="0" Chan="0" Rank="1" Count="128" Overflow="0"/>
Parsing the error.xml file:
Convert ErrBits to a binary number; its bits correspond to DQ0-DQ63. Node is the CPU, HA is the home agent, Chan is the channel, and Rank is the DIMM rank.
For example, the error information in the LOG file above is DQ 17, CPU 0, HA 0, Channel 0, Rank 1.
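The parsing described above can be sketched in Python with the standard library. This is a hedged illustration, not part of EccMon: the wrapping `<Errors>` root element, the `SAMPLE` string, and the function name `summarize_errors` are assumptions (the document shows only the `ErrorData` elements), and the sketch assumes that the least significant bit of ErrBits is DQ0, which matches the DQ 17 example.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Two records from the log, wrapped in a hypothetical root element so the
# text is well-formed XML.
SAMPLE = """<Errors>
<ErrorData PhysAddress="0000000000000000" UnkMask="3FFFFFFFFFE0" ErrBits="0000000000020000" Node="0" HA="0" Chan="0" Rank="1" Count="128" Overflow="0"/>
<ErrorData PhysAddress="0000000000000008" UnkMask="3FFFFFFFFFE0" ErrBits="0000000000020000" Node="0" HA="0" Chan="0" Rank="1" Count="128" Overflow="0"/>
</Errors>"""


def summarize_errors(xml_text: str) -> List[Dict[str, object]]:
    """Turn each ErrorData record into DQ, CPU, HA, channel, and rank values."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("ErrorData"):
        err_bits = int(rec.get("ErrBits"), 16)  # the attribute is hexadecimal
        records.append({
            "dq": [bit for bit in range(64) if (err_bits >> bit) & 1],
            "cpu": int(rec.get("Node")),
            "ha": int(rec.get("HA")),
            "channel": int(rec.get("Chan")),
            "rank": int(rec.get("Rank")),
        })
    return records


for record in summarize_errors(SAMPLE):
    print(record)
```

Here 0x20000 has only bit 17 set, so each sample record resolves to DQ 17 on CPU 0, home agent 0, channel 0, rank 1.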
Obviously, the above is only one embodiment of the present invention. Other technical solutions that a person of ordinary skill in the art can derive from this embodiment without creative effort also fall within the scope of protection of the present invention.
The present invention provides a method for diagnosing and locating memory faults in a Linux environment. It locates a detected memory fault without powering off the Linux system, which shortens fault diagnosis time and reduces the impact of memory faults on the services running on the server. With adjustment, the technical solution can also be applied to ordinary personal computers. The method is simple and feasible, has a clear technical effect, and can be widely applied in practice.
Claims (4)
1. A method for diagnosing and locating memory faults in a Linux environment, characterized in that the method comprises the following steps:
1) read the SAD information and identify the socket reporting the error;
2) read the TAD information and identify the channel reporting the error;
3) read the RIR information and identify the rank reporting the error;
4) use the address mapping table to identify the column, row, bank group, and bank;
5) write data to the address and read it back, XOR the data read with the data written to obtain the failing data line (DQ), and generate a LOG file.
2. The method for diagnosing and locating memory faults in a Linux environment according to claim 1, further characterized in that the diagnosis and locating steps can be repeated multiple times.
3. The method for diagnosing and locating memory faults in a Linux environment according to claim 2, further characterized in that the diagnosis and locating steps run automatically under the Linux system.
4. The method for diagnosing and locating memory faults in a Linux environment according to claim 3, further characterized in that, by analyzing the LOG file, a memory error can be located to a specific channel, CPU, Home Agent, and DIMM rank.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710518822.2A CN107451051A (en) | 2017-06-29 | 2017-06-29 | A method for diagnosing server memory under Linux |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710518822.2A CN107451051A (en) | 2017-06-29 | 2017-06-29 | A method for diagnosing server memory under Linux |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107451051A true CN107451051A (en) | 2017-12-08 |
Family
ID=60488164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710518822.2A Pending CN107451051A (en) | A method for diagnosing server memory under Linux | 2017-06-29 | 2017-06-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107451051A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6671837B1 (en) * | 2000-06-06 | 2003-12-30 | Intel Corporation | Device and method to test on-chip memory in a production environment |
CN102135925A (en) * | 2010-12-27 | 2011-07-27 | 西安锐信科技有限公司 | Method and device for detecting error check and correcting memory |
CN104391753A (en) * | 2014-12-16 | 2015-03-04 | 浪潮电子信息产业股份有限公司 | Fault-free operation method for server mainboard memory system |
CN105589770A (en) * | 2015-07-20 | 2016-05-18 | 杭州昆海信息技术有限公司 | Fault detection method and apparatus |
CN106126368A (en) * | 2016-08-22 | 2016-11-16 | 浪潮电子信息产业股份有限公司 | Method for analyzing memory fault address under LINUX |
-
2017
- 2017-06-29 CN CN201710518822.2A patent/CN107451051A/en active Pending
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804252A (en) * | 2018-06-15 | 2018-11-13 | 郑州云海信息技术有限公司 | A kind of server memory fault detection method, device, equipment and storage medium |
CN109669830A (en) * | 2018-12-25 | 2019-04-23 | 上海创功通讯技术有限公司 | A kind of physical detection methods and terminal device for memory |
CN109669830B (en) * | 2018-12-25 | 2022-04-22 | 上海创功通讯技术有限公司 | Physical detection method for memory and terminal equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105589762B (en) | Memory device, memory module and method for error correction | |
US8977905B2 (en) | Method and system for detecting abnormality of network processor | |
US8880961B2 (en) | System and method of computation by signature analysis | |
US10078567B2 (en) | Implementing fault tolerance in computer system memory | |
CN102135925B (en) | Method and device for detecting error check and correcting memory | |
CN103703447B (en) | MRAM field disturb detection and recovery | |
CN106294222A (en) | A kind of method and device determining PCIE device and slot corresponding relation | |
CN109918226A (en) | A kind of silence error-detecting method, device and storage medium | |
CN107451051A (en) | A kind of method that server memory diagnosis is carried out under Linux | |
US10963395B2 (en) | Memory system | |
US9424164B2 (en) | Memory error tracking in a multiple-user development environment | |
CN107562565A (en) | A kind of method for verifying internal memory Patrol Scurb functions | |
CN106803036B (en) | Safety detection and fault tolerance method for data stream in system operation | |
CN104781790A (en) | Signaling software recoverable errors | |
CN115729477A (en) | Distributed storage IO path data writing and reading method, device and equipment | |
CN115114066A (en) | Memory fault monitoring method, system, storage medium and equipment | |
US12019579B2 (en) | Data transmission method, apparatus, and device, and storage medium | |
US20220038559A1 (en) | Method and system for sharing multi-protocol port, and server | |
US10140186B2 (en) | Memory error recovery | |
US9015522B2 (en) | Implementing DRAM failure scenarios mitigation by using buffer techniques delaying usage of RAS features in computer systems | |
US10268418B1 (en) | Accessing multiple data snapshots via one access point | |
US20090019309A1 (en) | Method and computer program product for determining a minimally degraded configuration when failures occur along connections | |
CN105183390B (en) | Data access method and device | |
CN109683053A (en) | A kind of interface test system and test method based on interface equipment factory test | |
CN103544072A (en) | Method and device for recovering data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171208 |