CN105513645B - Fault detection method and device for random access memory (RAM) - Google Patents


Publication number
CN105513645B
Authority
CN
China
Prior art keywords
test
ram
link
information cells
test information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410495434.3A
Other languages
Chinese (zh)
Other versions
CN105513645A (en
Inventor
王媛媛
郝涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201410495434.3A (CN105513645B)
Priority to PCT/CN2014/094143 (WO2015131613A1)
Publication of CN105513645A
Application granted
Publication of CN105513645B
Legal status: Active


Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C11/00 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor
    • G11C11/21 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements
    • G11C11/34 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices
    • G11C11/40 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors
    • G11C11/401 Digital stores characterised by the use of particular electric or magnetic storage elements; Storage elements therefor using electric elements using semiconductor devices using transistors forming cells needing refreshing or charge regeneration, i.e. dynamic cells

Abstract

The present invention provides a fault detection method and device for random access memory (RAM). The method includes: a test-end device sends a first test cell on the link where the RAM to be detected is located; the test-end device receives a second test cell obtained after the first test cell has traversed the link; the test-end device compares whether the first test data in the first test cell is consistent with the second test data in the second test cell; and the test-end device judges, according to the comparison result, whether the RAMs to be detected on the link have failed. The technical solution provided by the embodiments of the present invention solves the problem that the related art offers no simple and effective technical solution for detecting whether RAM has failed on routers in a cluster environment. By sending test cells, RAM faults in the switching chips of large routers can be detected in batches, which greatly improves the efficiency of RAM fault troubleshooting.

Description

Fault detection method and device for random access memory (RAM)
Technical field
The present invention relates to the field of communications, and more specifically to a fault detection method and device for random access memory (Random-Access Memory, RAM).
Background
With the rapid development of network technology, more and more large routers are in use on the market. In some special cases, cluster environments formed by connecting multiple single-stage chassis through optical fiber are also widely deployed. These routers necessarily contain a certain number of switch boards, and the key switching chips on those boards use a large amount of RAM. These RAMs are distributed throughout the chip and are critical to its normal operation. If a RAM block fails, the impact is significant. For example, when the RAM that stores the routing table fails, the fault is hard to diagnose: troubleshooting usually starts from the software, and only after a great deal of time and effort has been spent is the cause found to be a likely hardware fault, so that time and effort is largely wasted. If the RAM faults of certain boards could be detected when the boards are first put into use, some unnecessary failures could be avoided and the time and effort spent on them by the personnel involved could be reduced.
There are currently many RAM detection methods. The most basic are simple parity checking and ECC checking. One RAM detection method in the related art proposes a way of detecting and handling RAM failure in a CPU/DSP. It mainly includes: reading the program content in the RAM; comparing the program content read with the correct program content and, when the two are inconsistent, judging that the RAM has failed and repairing the data; or verifying the program content read using a set checking method, comparing the result with the correct check result and, when the two are inconsistent, judging that the RAM has failed and raising an alarm. This solution detects CPU/DSP RAM failure promptly, so that corresponding measures can be taken in time and the impact of the failure is minimized. On large routers, however, and especially on routers that form a cluster from single-stage chassis, there are too many such RAMs. Testing the RAM on each chip of each board one by one in this way not only wastes a great deal of time and effort, but also requires saving the correct content of every RAM space, which is a considerable overhead for a cluster system.
The related art also mentions another technical solution, in which the RAM space is segmented in advance into one region segment that stores important data and other region segments. When the operating system starts, RAM detection is performed on the region segment that stores important data; when the periodic task currently run by the operating system is a preset low-priority periodic task, RAM detection is performed on the other region segments. Specifically, the blank segments of the other region segments, which store no data, are detected first, and then the non-blank segments, which store data. For a blank segment, first data is written to its start address and the data at that address is read back. If the data read differs from the first data, the address space of the blank segment is determined to be abnormal. Otherwise, second data is written to the same address and read back; if it matches, the RAM space of the blank segment is normal, and otherwise an abnormality is reported. This solution judges whether the RAM space is normal by partitioning it and then writing and reading data, so the RAM must be partitioned carefully and tested segment by segment. That is acceptable for small devices or products that use little RAM space, but it is unsuitable for large routers, especially routers in a cluster environment, which carry many switch boards, each with several key switching chips, each of which uses a lot of RAM.
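The write-and-read-back check of a blank segment described above can be sketched in Python as follows. This is only an illustration of the related-art idea under stated assumptions: the RAM is modelled as a `bytearray`, the two data patterns `0x55` and `0xAA` are chosen here for illustration (the source only speaks of "first data" and "second data"), and `StuckBitRam` and `check_blank_segment` are hypothetical names.

```python
class StuckBitRam(bytearray):
    """Hypothetical faulty RAM: bit 0 of every cell is stuck at 1."""

    def __setitem__(self, index, value):
        super().__setitem__(index, value | 0x01)


def check_blank_segment(ram: bytearray, start: int,
                        first: int = 0x55, second: int = 0xAA) -> bool:
    """Write each pattern to the start address and read it back.

    Returns True when both read-backs match, False as soon as one differs.
    """
    for pattern in (first, second):
        ram[start] = pattern          # write the test data
        if ram[start] != pattern:     # read it back and compare
            return False              # address space is abnormal
    return True                       # blank segment is normal
```

With the stuck bit simulated here, the first pattern happens to read back correctly and only the second pattern exposes the fault, which is one reason for writing two different values.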
For the problem that the related art offers no simple and effective technical solution for detecting whether RAM has failed on routers in a cluster environment, no effective solution has yet been proposed.
Summary of the invention
The present invention provides a RAM fault detection method and device, to solve at least the above problem.
According to one aspect of the present invention, a fault detection method for random access memory (RAM) is provided, comprising: a test-end device sends a first test cell on the link where the RAM to be detected is located; the test-end device receives a second test cell obtained after the first test cell has traversed the link; the test-end device compares whether first test data in the first test cell is consistent with second test data in the second test cell; and the test-end device judges, according to the comparison result, whether the RAMs to be detected on the link have failed.
Preferably, the test-end device judging, according to the comparison result, whether the RAMs to be detected on the link have failed includes at least one of the following: when the comparison result indicates consistency, determining that all the RAMs to be detected on the link are normal; when the comparison result indicates inconsistency, judging whether the link is faulty; and when the link is not faulty, determining that at least one RAM to be detected on the link has failed.
Preferably, the test-end device judging, according to the comparison result, whether the RAMs to be detected on the link have failed includes: when a fault exists on the link, dividing the link into stages, wherein the sub-links obtained by the division together form the link; checking each sub-link according to a preset priority to determine the faulty sub-link; and, when the current link state of the faulty sub-link is determined to be normal, determining that the RAM to be detected on that sub-link has failed.
Preferably, the test-end device includes a line card. Before the test-end device sends the first test cell on the link where the RAM to be detected is located, the method includes: configuring a specified line card as the starting point of the test cell and, according to the link connection relationship of the device where the RAM to be detected is located, configuring the specified line card as the end point of the test cell.
Preferably, before the test-end device compares whether the first test data in the first test cell is consistent with the second test data in the second test cell, the method further includes: judging whether the test-end device receives the test cell within a predetermined time, wherein, when the judgment result is yes, the comparison of whether the first test data is consistent with the second test data is triggered.
Preferably, the first test cell and/or the second test cell carry the following information: the port number of the link on which the first test cell is sent and the port number of the next-hop link.
According to another aspect of the present invention, a fault detection device for random access memory (RAM) is further provided, applied to a test-end device and comprising: a sending module, configured to send a first test cell on the link where the RAM to be detected is located; a receiving module, configured to receive a second test cell obtained after the first test cell has traversed the link; a comparison module, configured to compare whether first test data in the first test cell is consistent with second test data in the second test cell; and a judgment module, configured to judge, according to the comparison result, whether the RAMs to be detected on the link have failed.
Preferably, the judgment module includes at least one of the following: a first determining unit, configured to determine, when the comparison result indicates consistency, that all the RAMs to be detected on the link are normal; a judging unit, configured to judge, when the comparison result indicates inconsistency, whether the link is faulty; and a second determining unit, configured to determine, when the link is not faulty, that at least one RAM to be detected on the link has failed.
Preferably, the judgment module includes: a staging unit, configured to divide the link into stages when a fault exists on the link, wherein the sub-links obtained by the division together form the link; a checking unit, configured to check each sub-link according to a preset priority; a determination unit, configured to determine the faulty sub-link; and a third determining unit, configured to determine, when the current link state of the faulty sub-link is determined to be normal, that the RAM to be detected on that sub-link has failed.
Preferably, the device further includes: a configuration module, configured to, when the test-end device includes a line card, configure a specified line card as the starting point of the test cell and, according to the link connection relationship of the device where the RAM to be detected is located, configure the specified line card as the end point of the test cell.
Through the present invention, a first test cell is sent on the link where the RAM to be detected is located, and whether the RAM to be detected has failed is judged by comparing whether the test data in the second test cell, obtained after the first test cell has traversed the link, has changed relative to the test data in the first test cell. This solves the problem that the related art offers no simple and effective technical solution for detecting whether RAM has failed on routers in a cluster environment. By sending test cells, RAM faults in the switching chips of large routers can be detected in batches: the RAM faults of the switching chips on a single-chassis router can be checked quickly, and the switching chips on multiple chassis in a cluster environment can be tested at the same time, which greatly improves the efficiency of RAM fault troubleshooting.
Brief description of the drawings
The drawings described here provide a further understanding of the present invention and form part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of the RAM fault detection method according to an embodiment of the present invention;
Fig. 2 is a flowchart of the basic RAM test according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of the RAM fault detection device according to an embodiment of the present invention;
Fig. 4 is another structural block diagram of the RAM fault detection device according to an embodiment of the present invention;
Fig. 5 is an expanded schematic diagram of the single-stage router topology according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of switch access 1 being selected as the test initiation module and sending N test cells in the first round according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of the first-round test paths of the single-stage chassis according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of the first-round test results of the single-stage chassis according to an embodiment of the present invention;
Fig. 9 is a schematic diagram of the second-round test results of the single-stage chassis according to an embodiment of the present invention;
Fig. 10 is an expanded schematic diagram of the cluster environment link topology according to an embodiment of the present invention;
Fig. 11 is a schematic diagram of a test in the cluster environment with switch access 1 selected as the test initiation module according to an embodiment of the present invention;
Fig. 12 is a schematic diagram of the first-round test results in the cluster environment according to an embodiment of the present invention;
Fig. 13 is another schematic diagram of the first-round test results in the cluster environment according to an embodiment of the present invention;
Fig. 14 is a schematic diagram of the second-round test results in the cluster environment according to an embodiment of the present invention;
Fig. 15 is another schematic diagram of the second-round test results in the cluster environment according to an embodiment of the present invention.
Detailed description of the embodiments
The present invention is described in detail below with reference to the drawings and in combination with the embodiments. It should be noted that, where no conflict arises, the embodiments of this application and the features in them may be combined with each other.
Other features and advantages of the present invention will be set out in the following description, and will in part become apparent from the description or be understood by implementing the invention. The objectives and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the written description, the claims and the drawings.
An embodiment of the present invention provides a RAM fault detection method. Fig. 1 is a flowchart of the RAM fault detection method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: the test-end device sends a first test cell on the link where the RAM to be detected is located;
Step S104: the test-end device receives a second test cell obtained after the first test cell has traversed the link;
Step S106: the test-end device compares whether the first test data in the first test cell is consistent with the second test data in the second test cell;
Step S108: the test-end device judges, according to the comparison result, whether the RAMs to be detected on the link have failed.
Through the above steps, a first test cell is sent on the link where the RAM to be detected is located, and whether the RAM to be detected has failed is judged by comparing whether the test data in the second test cell, obtained after the first test cell has traversed the link, has changed relative to the test data in the first test cell. This solves the problem that the related art offers no simple and effective technical solution for detecting whether RAM has failed on routers in a cluster environment. By sending test cells, RAM faults in the switching chips of large routers can be detected in batches: the RAM faults of the switching chips on a single-chassis router can be checked quickly, and the switching chips on multiple chassis in a cluster environment can be tested at the same time, which greatly improves the efficiency of RAM fault troubleshooting.
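Steps S102 to S108 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the link is modelled as a plain function that returns the payload after it has passed through the RAMs on the path, and the names `run_loopback_test`, `healthy_link` and `faulty_link` are hypothetical.

```python
from typing import Callable

# Hypothetical model: a link is a function that carries a payload through
# every RAM on the path and returns what arrives back at the test end.
Link = Callable[[bytes], bytes]


def healthy_link(payload: bytes) -> bytes:
    """All RAMs on the path store and forward the data unchanged."""
    return bytes(payload)


def faulty_link(payload: bytes) -> bytes:
    """Simulates one RAM on the path flipping the lowest bit of byte 0."""
    if not payload:
        return b""
    return bytes([payload[0] ^ 0x01]) + payload[1:]


def run_loopback_test(link: Link, first_test_data: bytes) -> bool:
    """Return True when the returned data matches the data sent."""
    second_test_data = link(first_test_data)           # S102 send, S104 receive
    consistent = second_test_data == first_test_data   # S106 compare
    return consistent                                  # S108 basis for the judgment
```

A single mismatch only shows that something on the path altered the data; deciding whether a RAM or the link itself is responsible needs the further checks described in the text.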
In an optional implementation of the embodiment of the present invention, the test-end device judging, according to the comparison result, whether the RAMs to be detected on the link have failed includes at least one of the following: when the comparison result indicates consistency, determining that all the RAMs to be detected on the link are normal; when the comparison result indicates inconsistency, judging whether the link is faulty; and when the link is not faulty, determining that at least one RAM to be detected on the link has failed.
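The three-way judgment in this paragraph can be written as a small decision function. The sketch below assumes that the comparison result and the link state are already available as booleans; the `Verdict` enum and the function name `judge` are hypothetical and only serve the illustration.

```python
from enum import Enum


class Verdict(Enum):
    ALL_RAM_NORMAL = "all RAMs to be detected on the link are normal"
    RAM_FAULT = "at least one RAM to be detected on the link has failed"
    LINK_FAULT = "the link itself is faulty and needs further checking"


def judge(data_consistent: bool, link_faulty: bool) -> Verdict:
    """Map the comparison result and the link state to a verdict."""
    if data_consistent:
        return Verdict.ALL_RAM_NORMAL   # data came back unchanged
    if link_faulty:
        return Verdict.LINK_FAULT       # mismatch may be caused by the link
    return Verdict.RAM_FAULT            # link is fine, so a RAM is to blame
```

The link state is only consulted when the data differs, which mirrors the order of the checks in the text.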
In a specific implementation, if the link itself is not currently faulty, an embodiment of the present invention further provides a technical solution for judging which RAM has failed. That is, the test-end device judging, according to the comparison result, whether the RAMs to be detected on the link have failed includes: when a fault exists on the link, dividing the link into stages, wherein the sub-links obtained by the division together form the link; checking each sub-link according to a preset priority to determine the faulty sub-link; and, when the current link state of the faulty sub-link is determined to be normal, determining that the RAM to be detected on that sub-link has failed.
Optionally, the test-end device includes a line card. Before the test-end device sends the first test cell on the link where the RAM to be detected is located, the method includes: configuring a specified line card as the starting point of the test cell and, according to the link connection relationship of the device where the RAM to be detected is located, configuring the specified line card as the end point of the test cell.
A further improvement of the above technical solution is that, before the test-end device compares whether the first test data in the first test cell is consistent with the second test data in the second test cell, the method further includes: judging whether the test-end device receives the test cell within a predetermined time, wherein, when the judgment result is yes, the comparison of whether the first test data is consistent with the second test data is triggered. The first test cell and/or the second test cell carry the following information: the port number of the link on which the first test cell is sent and the port number of the next-hop link.
In summary, the embodiments of the present invention propose an OAM-based method for detecting RAM faults in batches on large routers, to solve the problem that detecting RAM faults is time-consuming and laborious on today's routers, which use RAM on a large scale. The timeliness, convenience and stability of Operations, Administration and Maintenance (OAM) make it the best platform for realizing this function.
To better understand the above RAM fault detection process, it is described below with reference to a preferred embodiment, as shown in Fig. 2:
Preferred embodiment one
Step A: at the OAM interface of the main control, a switch access board is selected as the test initiation source and sends a test cell.
Step B: the header of the cell specifies the port number of the sending link and the port number of the next-hop link for the test, so that after reaching the next-hop switch the cell returns to the receiving port of the test initiation module. These can be specified according to the test requirements and the link topology. It should be noted that the payload of the cell can be filled with suitable data according to the user's needs; the embodiments of the present invention do not limit this.
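A test cell with the two header fields named in Step B and a user-chosen payload could be modelled as below. The source does not give the cell format, so the byte layout here (two big-endian 16-bit port numbers followed by the payload) is purely an assumption, as are the names `ProbeCell`, `send_port` and `next_hop_port`.

```python
import struct
from dataclasses import dataclass

# Hypothetical wire layout: two 16-bit big-endian port numbers, then payload.
_HEADER = struct.Struct(">HH")


@dataclass(frozen=True)
class ProbeCell:
    send_port: int       # port number of the link the cell is sent on
    next_hop_port: int   # port number of the next-hop link
    payload: bytes       # test data chosen by the user

    def pack(self) -> bytes:
        """Serialize header and payload into one byte string."""
        return _HEADER.pack(self.send_port, self.next_hop_port) + self.payload

    @classmethod
    def unpack(cls, raw: bytes) -> "ProbeCell":
        """Rebuild a cell from bytes produced by pack()."""
        send_port, next_hop_port = _HEADER.unpack_from(raw)
        return cls(send_port, next_hop_port, raw[_HEADER.size:])
```

Keeping the payload separate from the routing header makes the later comparison a plain equality check on `payload`.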
Step C: when the test cell is sent, a timer is set to judge whether the previously sent test cell is received within the specified time. If it is received, the payload of the received test cell is compared with the payload of the cell sent earlier to see whether they are consistent.
Step D: if the result is consistent, the links on the path are all normal and the RAMs of the chips on the path are all normal. If the data is inconsistent, there may be a RAM fault in a chip the path passes through; after checking that the links have no problem, it is confirmed that there is a RAM fault on a switching chip the path passes through, and the corresponding board is isolated.
Step E: if the specified time is exceeded and the returned test cell has not been received, first check the link state on the path; in general, an abnormal link may cause a timeout. If the link is abnormal, ignore this test result and judge the switching unit again together with the links from the other switch access modules in the system; the board can also be replaced and tested again.
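The timer behaviour of Steps C to E can be sketched with a blocking queue standing in for the receiving port. This is an illustrative model only: `rx_queue`, `Result` and `await_and_compare` are hypothetical names, and a real test-end device would receive cells from hardware rather than from a Python queue.

```python
import queue
from enum import Enum


class Result(Enum):
    OK = "payload returned unchanged within the time limit"
    ERROR = "payload returned but differs from what was sent"
    TIMEOUT = "no cell returned within the time limit"


def await_and_compare(rx_queue: "queue.Queue[bytes]",
                      sent_payload: bytes,
                      timeout_s: float) -> Result:
    """Wait up to timeout_s for the returned payload, then compare it."""
    try:
        received = rx_queue.get(timeout=timeout_s)   # Step C: timer running
    except queue.Empty:
        return Result.TIMEOUT                        # Step E: check link state
    if received == sent_payload:
        return Result.OK                             # Step D: path is normal
    return Result.ERROR                              # Step D: suspect a RAM
```

A `TIMEOUT` result says nothing about the RAMs by itself; as Step E notes, the link state on the path has to be checked first.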
Step F: compile statistics on the final test results.
The core test method of the above test solution provided by the embodiments of the present invention is highly scalable and can easily be extended to a cluster environment. After the extension, the original characteristics of the embodiments are preserved and the operating logic of the original modules is reused to a high degree. This is described below with reference to another preferred embodiment:
Preferred embodiment two
First, the working modes of the switching chips in the cluster environment are introduced. The switching chips on the cluster line-card chassis work in SF13 mode, and the switching chips on the main subrack work in SF2 mode. A switching chip configured in SF13 mode is divided by port-number range into two parts, SF1 and SF3: SF1 forms the first stage of the switching network architecture and SF3 forms the third stage. The SF1 and SF3 parts of the same chip are physically and logically independent of each other and occupy different link ranges. In terms of data flow, the input of SF1 is connected to the switch access, and its output is connected to the input of a switching chip working in SF2 mode on the main subrack. SF3 receives data cells from the switching chip working in SF2 mode on the main subrack and sends them through its output to the switch access chip. A switching chip configured as SF2 is used as the second stage in the cluster environment; it is an omnidirectional switching fabric that receives data from the upstream SF1 and relays it to the downstream SF3.
Step A: as in the single-stage environment, a switch access board is selected at the OAM interface as the test initiation module, and the header of the test cell specifies its test path according to the test requirements and the link topology, so that the cell finally returns to the receiving end of the test initiation board. In a cluster environment, the routing information in the header of each test cell is extended from the 2 hops of the single-stage case to 4 hops, so that the cell can return to the receiving end of the test initiation board.
Step B: the payload of the test cell then carries test data of the tester's own choosing; the principle for choosing the data is to make the later comparison easy. When the test cell is sent, a timer is set. If the returned test cell is received within the specified time and its payload is consistent with that of the cell originally sent, the path is entirely normal; otherwise there is an ERROR on the path. If the timer expires and the test cell still has not been received from the path, the link test has timed out and the specific cause needs to be checked.
Step C: if the test result of a path is ERROR or TIME OUT, there is a problem on the path. Because a path contains four stages of links, it is not yet possible to tell where exactly the problem lies. To solve this technical problem, the embodiments of the present invention provide an elimination method for fast localization:
Specifically, the whole faulty path is divided into 4 stages, and the 4 link stages are checked one by one. The 4 stages are: switch access to the cluster line-card chassis SF1; the chip where the cluster line-card chassis SF1 is located to the main subrack SF2; the main subrack SF2 to the SF3 that corresponds to SF1 on the cluster line-card chassis; and the cluster line-card chassis SF3 back to the receiving end of the switch access. Elimination starts from the first stage: a test cell is sent from the switch access to the cluster line-card chassis SF1, and whether this stage is normal is judged from the cell received by SF1. If it is normal, the second stage is judged next, that is, a test cell is sent from the cluster line-card chassis SF1 to the corresponding SF2 and the judgment is made from the test cell information received by SF2, and so on until the faulty link is found.
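The stage-by-stage elimination can be sketched as a loop over the four link stages that stops at the first stage whose probe fails. The sketch assumes a caller-supplied `probe` function that sends a test cell across one stage and reports whether it arrived intact; `STAGES`, `probe` and `locate_faulty_stage` are hypothetical names.

```python
from typing import Callable, Optional

# The four link stages of one cluster test path, in checking order.
STAGES = (
    "switch access -> cluster line-card chassis SF1",
    "cluster line-card chassis SF1 -> main subrack SF2",
    "main subrack SF2 -> cluster line-card chassis SF3",
    "cluster line-card chassis SF3 -> switch access receiving end",
)


def locate_faulty_stage(probe: Callable[[int], bool]) -> Optional[int]:
    """Return the index of the first stage whose probe fails, else None.

    probe(i) is assumed to send a test cell across stage i only and to
    return True when the cell received at the far end is as expected.
    """
    for index in range(len(STAGES)):
        if not probe(index):
            return index      # first failing stage: stop the elimination
    return None               # every stage passed
```

For example, `locate_faulty_stage(lambda i: i != 2)` returns `2`, which corresponds to `STAGES[2]`, the stage from the main subrack SF2 to the cluster line-card chassis SF3.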
Step D: after the faulty links of all faulty paths have been identified by the above method, the first-round test results are statistically analysed. The first-round results are again recorded in the form of matrix statistics tables. Because the cluster environment has more cascade stages and greater system complexity, two matrix statistics tables are used when analysing the first-round results: one table shows the test status on the cluster line-card chassis and the other shows the test status on the main subrack. The table for the cluster line-card chassis has the same meaning as for a single-stage chassis: the horizontal direction indicates switch slots and the vertical direction indicates line-card slots. In the table for the main subrack, the horizontal direction indicates the switches on the main subrack and the vertical direction indicates the switches on the cluster line-card chassis connected to them, ordered by chassis and slot number from small to large.
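One way to hold such a matrix statistics table is a list of rows, with line-card slots as rows and switch slots as columns. The sketch below is an assumption about the data layout, not the table format used in the patent figures; the result strings, the `-` placeholder for untested paths, and the helper names `build_matrix` and `failures_per_switch_slot` are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_matrix(results: Dict[Tuple[int, int], str],
                 line_card_slots: Sequence[int],
                 switch_slots: Sequence[int]) -> List[List[str]]:
    """Rows are line-card slots, columns are switch slots.

    results maps (line_card_slot, switch_slot) to a result string;
    paths that were not tested are shown as "-".
    """
    return [[results.get((lc, sw), "-") for sw in switch_slots]
            for lc in line_card_slots]


def failures_per_switch_slot(matrix: List[List[str]],
                             switch_slots: Sequence[int]) -> Dict[int, int]:
    """Count, per column, the entries that are neither "OK" nor untested."""
    return {sw: sum(1 for row in matrix if row[col] not in ("OK", "-"))
            for col, sw in enumerate(switch_slots)}
```

A column in which many line cards report failures points at the switch slot they share, which is the kind of pattern a matrix view makes easy to spot; how the patent itself interprets the tables is shown in its figures rather than here.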
Step E: the second-round precise test in the cluster environment determines what actually caused a faulty link, so as to rule out the influence of the link. What the precise test procedure has to do, therefore, is rule out link influence and detect the RAM fault of the specific board.
Step F: the final test results are then compiled.
As the above technical solution shows, the embodiments of the present invention can detect RAM faults in the switching chips of large routers in batches by sending test cells, rather than picking individual RAM blocks and testing them by reading and writing as conventional methods do. Testing a switching chip that contains a large amount of RAM by conventional methods would involve an enormous workload. This solution can not only quickly check for RAM faults in the switching chips of a single-chassis router, but can also test the switching chips on multiple chassis of a cluster environment at the same time, which greatly improves the efficiency of RAM fault troubleshooting.
In addition, the detection is carried out at the OAM interface, so RAM fault detection of switching chips can be performed conveniently from slot to slot, slot to chassis, chassis to chassis and so on. It is simple to carry out, the test results can be analysed intuitively, and the timeliness and convenience of OAM are fully used. The method is also simple in design: it uses the test functions and the output-to-input loopback function that existing large-scale switching systems already have, so the switching system does not need large-scale changes, which makes the method practical. Moreover, the complexity of the method grows only linearly with the number of inputs, outputs and switching units, so test complexity does not surge as switching system capacity increases.
This embodiment also provides a RAM fault detection apparatus, applied to test end equipment, for implementing the above embodiments and preferred implementations; what has already been described is not repeated. The modules involved in the apparatus are described below. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 3 is a structural block diagram of the RAM fault detection apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes:
a sending module 30, configured to send a first test cell on the link where the RAM to be detected is located;
a receiving module 32, connected to the sending module 30 and configured to receive a second test cell obtained after the first test cell has traversed the link;
a comparison module 34, connected to the receiving module 32 and configured to compare whether the first test data in the first test cell is consistent with the second test data in the second test cell;
a judgment module 36, connected to the comparison module 34 and configured to judge, according to the comparison result, whether the RAMs to be detected on the link have failed.
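The send, receive, compare, and judge flow of modules 30 to 36 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the link is modeled as a chain of Python functions standing in for the RAM-buffered stages a cell passes through, and the names `healthy_ram`, `faulty_ram`, `loop_back`, and `link_passes` are hypothetical.

```python
from typing import Callable, Iterable

# A stage models one RAM-buffered hop on the link: it takes the cell payload
# in and hands the (possibly corrupted) payload on. Hypothetical model only.
Stage = Callable[[bytes], bytes]


def healthy_ram(cell: bytes) -> bytes:
    """A RAM that stores and returns the payload unchanged."""
    return cell


def faulty_ram(cell: bytes) -> bytes:
    """A RAM that flips the lowest bit of the first payload byte."""
    data = bytearray(cell)
    data[0] ^= 0x01
    return bytes(data)


def loop_back(first_cell: bytes, stages: Iterable[Stage]) -> bytes:
    """Send the first test cell through every stage and return the second test cell."""
    cell = first_cell
    for stage in stages:
        cell = stage(cell)
    return cell


def link_passes(first_cell: bytes, stages: Iterable[Stage]) -> bool:
    """Compare the first and second test data, as the comparison module does."""
    second_cell = loop_back(first_cell, stages)
    return first_cell == second_cell
```

A consistent comparison only says that every RAM on the traversed link behaved correctly for this payload; an inconsistent one still needs the link check described in the text before a RAM is blamed.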
Optionally, as shown in Fig. 4, the judgment module 36 includes at least one of the following: a first judging unit 360, configured to determine, when the comparison result indicates consistency, that all RAMs to be detected on the link are normal;
a judging unit 362, configured to judge, when the comparison result indicates inconsistency, whether the link is faulty;
a second judging unit 364, connected to the judging unit 362 and configured to determine, when the link is fault-free, that at least one RAM to be detected on the link has failed.
In an optional implementation of the embodiment of the present invention, the judgment module 36 includes: a grading unit 366, configured to divide the link into levels when the link is faulty, wherein the sublinks obtained by the division together form the link;
an investigation unit 368, connected to the grading unit 366 and configured to investigate each sublink according to a preset priority;
a determination unit 370, connected to the investigation unit 368 and configured to determine the sublink that has failed;
a third judging unit 372, connected to the determination unit 370 and configured to determine that the RAM to be detected on the failed sublink has failed when the current link state of that sublink is determined to be normal.
As an improvement to the above technical solution, the apparatus further includes: a configuration module 38, configured to, when the test end equipment includes a line card, configure a specified line card as the starting point of the test cell, and configure the specified line card as the end point of the test cell according to the link connection relationship of the device where the RAM to be detected is located.
For a better understanding of the above RAM fault detection method and apparatus, a further preferred embodiment is described below. It mainly includes the following modules: a link topology parsing module, a test cell OAM sending module, a timeout judgment module, a test cell receiving module, a link state judgment module, and a test result statistics module.
The link topology parsing module performs the preparation before testing: it parses the link connection relationships between the switch access boards and the switching boards, for use in routing test cells during the test.
The test cell OAM sending module (equivalent to the configuration module 38 in the above embodiment) selects, on the OAM interface, a specific line card or switching board as the test initiation source and sends the test cells.
The timeout judgment module sets a timer when a test cell is sent, in order to judge whether the test cell that was sent is received within the specified time.
The test cell receiving module (equivalent to the comparison module 34 in the above embodiment) is the receiving end of the link on which the test cell was sent. It receives the test cell that was sent, extracts the payload from the cell, and compares it with the payload of the cell as sent to obtain the test result.
The link state judgment module (equivalent to the judgment module 36 in the above embodiment) checks and judges the state of the links on the path taken by a test cell when reception of that test cell is abnormal.
The test result statistics module statistically analyzes the test results. Statistics are kept as follows: the test initiation module creates and maintains a local matrix statistics table that records the test result of each path in the switching system. The columns of the table represent the slot numbers of the destination switching boards; the rows represent the slot numbers of the boards sending the test cells.
The embodiment of the present invention can be used on a single-stage chassis router and also in a cluster environment connected by optical fiber. In both scenarios, the input unit and output unit on a switch access board are connected through the internal circuitry of the line card (this connection is not shown in the figures). In an actual system, the input unit and output unit are located on the same line card, so their numbers are equal. Each switching unit is connected to each input unit through inter-port links of the switching network, and each switching unit is connected to each output unit through inter-port links of the switching network.
Embodiment 1
This embodiment is applied to a single-stage router chassis. Fig. 5 shows the connection topology of the single-stage chassis in this example.
The RAM detection method of this embodiment is divided into two rounds. The steps of the first round are as follows:
Step 1: First, the connection relationships between the switch access boards and the switching boards of the single chassis are parsed and stored in a global variable.
Step 2: Switch access 1 is selected as the test initiation module. When the test is initiated, the parsed serdes connection relationships between the switch access board and the switching units on the switching boards are filled into the header of the test cell. In the single-stage environment, the first-hop route of the test cell traverses all links from the switch access unit to the switching units, and the second-hop route goes from the switching unit to the receiving end of the switch access unit that sent the test cell. The topology is shown schematically in Fig. 6.
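The two-hop routing of Step 2 can be sketched as follows. This is an illustration only: the real header carries serdes connection details whose layout the text does not give, so the `CellHeader` fields and the function `single_stage_headers` are hypothetical names, and switching units are simply numbered 1 to N.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CellHeader:
    """Hypothetical test cell header for the single-stage chassis."""
    source_access: int      # switch access unit that initiates the test
    first_hop_unit: int     # switching unit reached on the first hop
    second_hop_access: int  # access unit whose receiving end the cell returns to


def single_stage_headers(source_access: int, num_switch_units: int) -> List[CellHeader]:
    """Build one header per switching unit, so the first hop traverses them all.

    The second hop always points back at the access unit that sent the cell.
    """
    return [
        CellHeader(source_access, unit, source_access)
        for unit in range(1, num_switch_units + 1)
    ]
```

With N switching units, one initiating access unit produces N test paths per payload pattern, each looping back to its own receiving end.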
Step 3: Test data is filled into the 128-byte payload of the test cell. For convenience and accuracy, either 128 bytes of all 0s or 128 bytes of all 1s is chosen for the test.
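The two payload patterns of Step 3 can be written down directly. The sketch below also illustrates one plausible reason for testing with both patterns, which the text itself does not state: a bit that is stuck at 0 cannot be seen with the all-0 payload but shows up with the all-1 payload. The helper names `stuck_at_zero` and `mismatched_bytes` are hypothetical.

```python
from typing import List

PAYLOAD_LEN = 128
ALL_ZEROS = bytes(PAYLOAD_LEN)       # 128 bytes of 0x00
ALL_ONES = b"\xff" * PAYLOAD_LEN     # 128 bytes of 0xFF


def stuck_at_zero(payload: bytes, byte_index: int, bit: int) -> bytes:
    """Model a RAM in which one bit always reads back as 0 (assumed fault model)."""
    data = bytearray(payload)
    data[byte_index] &= ~(1 << bit) & 0xFF
    return bytes(data)


def mismatched_bytes(sent: bytes, received: bytes) -> List[int]:
    """Return the byte offsets at which the received payload differs from the sent one."""
    return [i for i, (a, b) in enumerate(zip(sent, received)) if a != b]
```

Comparing fixed patterns keeps the check trivial: any non-empty result from `mismatched_bytes` means the cell was altered somewhere on the path.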
Step 4: Test cells are sent to the N switching units in turn, and a 5 ms timer is set at the same time.
Step 5: If, within the timer period, the receiving end of the test initiation module receives the test cell routed back from the switching unit, the payload of the received test cell is compared with the value written into the test cell before the test was initiated. If the two are consistent, the RAMs on the switching unit that the link passes through have no problem. If the received test cell differs from the original, either a RAM on this link or the link state may be faulty; the test result of this path is first recorded as ERROR, pending further confirmation. If the timer expires without the test initiation module receiving the returned test cell, either a board RAM or the physical link may be faulty; the test result of this path is first recorded as a receive timeout (TIME OUT), pending the subsequent precise test.
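The three first-round outcomes of Steps 4 and 5 can be sketched as a single classification function. This is a minimal sketch, assuming the looped-back cell arrives on an in-memory queue; in a real system the receive path is hardware, and `classify_path` and `rx_queue` are hypothetical names.

```python
import queue

TIMEOUT_S = 0.005  # the 5 ms timer from Step 4


def classify_path(sent_payload: bytes, rx_queue: "queue.Queue[bytes]",
                  timeout_s: float = TIMEOUT_S) -> str:
    """Return "OK", "ERROR", or "TIME OUT" for one test path."""
    try:
        # Wait for the looped-back cell until the timer expires.
        received_payload = rx_queue.get(timeout=timeout_s)
    except queue.Empty:
        # Nothing came back in time: RAM or physical link may be at fault.
        return "TIME OUT"
    # A cell came back: compare its payload with what was written before sending.
    return "OK" if received_payload == sent_payload else "ERROR"
```

Both "ERROR" and "TIME OUT" are provisional results; neither identifies a RAM fault until the link itself has been checked in the second round.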
Step 6: The above process is then repeated for switch access 2, ..., switch access M. The test paths are shown in Fig. 7.
Step 7: The test results are then analyzed, counted, and organized into a table as shown in Fig. 8. The rows are the switch access boards in ascending order, and the columns are the switching boards. If the test result for a path shows no problem, the RAMs of the chips traversed by the link between the two boards are free of problems; if the result is an error or a timeout, the path must await the precise test.
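The result table of Step 7 can be sketched as a small matrix keyed by slot numbers. The layout follows the description (rows are switch access boards in ascending order, columns are switching boards), but the function names and the dictionary input format are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

Path = Tuple[int, int]  # (switch access slot, switching board slot)


def build_table(results: Dict[Path, str],
                access_slots: Iterable[int],
                switch_slots: Iterable[int]) -> List[List[str]]:
    """Arrange per-path results as rows of access slots by columns of switching slots."""
    columns = sorted(switch_slots)
    table = [["access/switch"] + [str(s) for s in columns]]
    for access in sorted(access_slots):
        # "-" marks a path that was not tested.
        table.append([str(access)] + [results.get((access, s), "-") for s in columns])
    return table


def needs_precise_test(results: Dict[Path, str]) -> List[Path]:
    """List the paths whose first-round result is not OK."""
    return sorted(path for path, result in results.items() if result != "OK")
```

The second helper yields exactly the paths that the second-round precise test has to revisit.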
It should be noted in advance that in Figs. 8-9 and 12-15, "OK" indicates that the RAM on the switching unit that the link passes through is normal, "ERROR" indicates that there is a problem on that link segment, and "TIME OUT" indicates that no feedback was received on that link segment within the predetermined time.
Based on the above test results, the second-round precise test of the single-stage chassis begins. The steps are as follows:
Step 1: For a link whose test result is ERROR, the cause may be a RAM failure or a link failure. The system's internal debug commands are used to confirm whether the current link state has a problem, for example whether CRC errors are present on the link. If the link has a problem, the link is debugged and the test is repeated. If the link has no problem, the switching chip RAM is preliminarily judged to be faulty, and the switching board needs to be isolated.
Step 2: For a link whose test result is TIME OUT, the debugging facilities built into the router are first used to confirm whether the physical link has a problem, for example a broken physical link or a mismatched connection relationship. If the link has a problem, the link is debugged or the test initiation board is replaced and the test is repeated, and the test result is recorded as a link failure. If the link has no problem, the switching chip RAM is preliminarily judged to be faulty, and the board needs to be isolated.
Step 3: The above results are compiled; an example is shown in Fig. 9. By investigating each path in the precise test phase, the faulty path and the faulty switching chip RAM can be located accurately in the vast majority of cases. If the link state is still uncertain, the test initiation board can be replaced and a confirmation test carried out.
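The second-round decision described in Steps 1 and 2 reduces to a small rule. The sketch below assumes the link check (debug commands, CRC counters, physical link inspection) has already been performed and its outcome is passed in as a boolean; `precise_verdict` and its return strings are hypothetical labels, not terms from the figures.

```python
def precise_verdict(first_round: str, link_has_problem: bool) -> str:
    """Combine a first-round result with the outcome of the link check."""
    if first_round == "OK":
        return "OK"
    if first_round not in ("ERROR", "TIME OUT"):
        raise ValueError(f"unknown first-round result: {first_round!r}")
    if link_has_problem:
        # The link explains the failure: debug it (or swap the initiating
        # board) and retest before drawing any conclusion about the RAM.
        return "LINK FAULT: debug link and retest"
    # The link is clean, so the switching chip RAM is the preliminary suspect.
    return "RAM FAULT: isolate board"
```

The verdict for a clean link is only preliminary, as the text notes; a confirmation test with a different initiating board can follow when the link state remains uncertain.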
Embodiment 2
This embodiment of the present invention is applied to a cluster environment connected by optical fiber.
Because cluster environments can be built in different ways, their link topologies also differ, so many variations exist. Embodiment 2 takes the topology in Fig. 10 as an example. The steps of the first-round test are as follows:
Step 1: As in the single-stage environment, at the start of the test the switching topology of the central chassis is first parsed onto the line card, so that test paths can later be specified for the test cells.
Step 2: Then, as shown in Fig. 11, switch access 1 is selected as the test initiation module. The routing path is specified in the header of the test cell so that the cell eventually returns to the receiving module of the test initiation board. The first-hop route filled into the test cell header goes from the input unit where the test initiation module is located to a first-level switching unit connected to that input unit. Because there are N first-level switching units, the first-hop route of the test cells must traverse all N of them. The second hop of the cell indicates the route from the first-level switching unit to the second level. As the connection topology of the cluster environment shows, each first-level switching unit has links to the second-level switching units, so the second-hop route from a first-level switching unit traverses each of its links to the second-level switching units once. For the third-hop route of the test cell, the links connected to the second-level switching unit, as parsed from the cluster topology, are filled in, so that all third-level switching units are covered. The fourth-hop route of the test cell is simpler: in short, the cell goes back to where it came from, returning to the receiving end of the test initiation module's link.
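The four-hop traversal of Step 2 can be sketched as a nested enumeration over the parsed topology. The dictionaries below are a hypothetical stand-in for the parsed cluster topology (the text does not describe its storage format), and `enumerate_paths` is an illustrative name.

```python
from typing import Dict, List, Sequence, Tuple

CellPath = Tuple[str, str, str, str, str]  # source, level 1, level 2, level 3, source


def enumerate_paths(source: str,
                    level1_units: Sequence[str],
                    l1_to_l2: Dict[str, Sequence[str]],
                    l2_to_l3: Dict[str, Sequence[str]]) -> List[CellPath]:
    """List every four-hop test path that starts and ends at `source`."""
    paths = []
    for sf1 in level1_units:               # hop 1: traverse all first-level units
        for sf2 in l1_to_l2[sf1]:          # hop 2: every link up to the second level
            for sf3 in l2_to_l3[sf2]:      # hop 3: every link down to the third level
                paths.append((source, sf1, sf2, sf3, source))  # hop 4: loop back
    return paths
```

Each tuple corresponds to one test cell header; sending a cell along every tuple covers all switching units reachable from the initiating access unit.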
Step 3: The chosen test data is filled into the 128-byte payload of the test cell; it is selected so as to make the later comparison easy. In general, either 128 bytes of all 0s or 128 bytes of all 1s can be chosen for the test.
Step 4: As in the single-stage environment, a 5 ms timer is set when the test cell is sent. If the test cell is received within the timer period, the received cell is compared with the original cell. If they are consistent, the test result of the path is OK; otherwise, the test result of the path is ERROR and a troubleshooting test must be carried out. If the looped-back test cell is not received within the specified time, the test result of the link is recorded as TIME OUT, and a troubleshooting test must likewise be carried out.
Step 5: In the same way, switch access 2, switch access 3, ..., switch access N are tested in turn.
Step 6: If the test result of a path is ERROR or TIME OUT, there is a problem on that path. Because a path comprises four levels of links, it cannot yet be determined which level is at fault, so the fault is located by elimination.
Specifically, the whole faulty path is divided into four levels, which are investigated one level at a time. The four levels are: from switch access to line-card chassis SF1; from line-card chassis SF1 to central chassis SF2; from central chassis SF2 to the line-card chassis SF3 that corresponds to SF1; and from line-card chassis SF3 back to the switch access receiving end. Elimination starts from the first level: a test cell is sent from switch access to line-card chassis SF1, and whether this level of link is normal is judged from the cell received at SF1. If it is normal, the second level follows: a test cell is sent from line-card chassis SF1 to the corresponding SF2, and the judgment is made from the test cell information received at SF2. The procedure continues in this way until the faulty link is found.
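The level-by-level elimination described above can be sketched as a loop that probes each sublink in order and stops at the first one that fails. The stage labels and the `probe` callback are hypothetical; in practice the probe would send a test cell over one sublink and inspect what the far end received.

```python
from typing import Callable, Optional, Sequence

# The four sublinks of one cluster test path, in investigation order.
STAGES = ("access->SF1", "SF1->SF2", "SF2->SF3", "SF3->access")


def first_failing_stage(stages: Sequence[str],
                        probe: Callable[[str], bool]) -> Optional[str]:
    """Probe sublinks in order and return the first one whose probe fails.

    `probe(stage)` is assumed to return True when the cell received at the
    far end of that sublink is intact. Returns None if every sublink passes.
    """
    for stage in stages:
        if not probe(stage):
            return stage
    return None
```

Because the loop stops at the first failure, later sublinks are not probed until the located fault has been dealt with and the path retested.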
Step 7: The first-round test results are recorded in matrix statistics tables. Because the cluster environment has more cascade levels, the test results are shown in two tables. Figs. 12 and 13 show the first-round test results for the cluster environment.
The second-round precise elimination test then begins. This process mainly eliminates the influence of link failures on the test. The test steps are as follows:
Step 1: For a link whose test result is ERROR, the cause may be a RAM failure or a link failure. The system's internal debug commands are used to confirm whether the current link state has a problem, for example whether CRC errors are present on the link. If the link has a problem, the link is debugged and the test is repeated. If the link has no problem, the switching chip RAM is preliminarily judged to be faulty, and the board needs to be isolated.
Step 2: For a link whose test result is TIME OUT, the debugging facilities built into the router are first used to confirm whether the physical link has a problem, for example a broken physical link or a mismatched connection relationship. If the link has a problem, the link is debugged or the test initiation board is replaced and the test is repeated, and the test result is recorded as a link failure. If the link has no problem, the switching chip RAM is preliminarily judged to be faulty, and the board needs to be isolated.
Step 3: The above results are compiled; examples are shown in Figs. 14 and 15. By investigating each path in the precise test phase, the faulty path and the faulty switching chip RAM can be located accurately in the vast majority of cases. If the link state is still uncertain, the test initiation board can be replaced and a confirmation test carried out.
To rule out all RAM failures on the switching chips, all links on the switching chips must be traversed, so the test initiation module should preferably also traverse all switch access slots on the line-card chassis.
Those of ordinary skill in the art will understand that all or part of the steps of the above method can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or part of the steps of the above embodiments can also be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments can be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
In summary, the embodiments of the present invention achieve the following beneficial effects: they solve the problem in the related art that no simple and effective technical solution has yet been proposed for detecting whether RAM has failed on routers in a cluster environment. By sending test cells, RAM failures of the switching chips on a large router can be detected in batches; RAM failures of the switching chips on a single-chassis router can be checked quickly, and the switching chips on multiple chassis of a cluster environment can be tested at the same time, greatly improving the efficiency of RAM fault investigation.
Obviously, those skilled in the art should understand that the modules or steps of the present invention described above can be implemented with general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network formed by multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described can be performed in an order different from that given here, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present invention and are not intended to limit it. For those skilled in the art, the present invention may have various modifications and changes. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (8)

1. A fault detection method for a random access memory (RAM), characterized by comprising:
test end equipment sending a first test cell on the link where the RAM to be detected is located;
the test end equipment receiving a second test cell obtained after the first test cell has traversed the link;
the test end equipment comparing whether the first test data in the first test cell is consistent with the second test data in the second test cell;
the test end equipment judging, according to the comparison result, whether the RAMs to be detected on the link have failed;
wherein the test end equipment judging, according to the comparison result, whether the RAMs to be detected on the link have failed comprises: when the link is faulty, dividing the link into levels, wherein the sublinks obtained by the division together form the link; investigating each sublink according to a preset priority and determining the sublink that has failed; and, when the current link state of the failed sublink is determined to be normal, determining that the RAM to be detected on that sublink has failed.
2. The method according to claim 1, characterized in that the test end equipment judging, according to the comparison result, whether the RAMs to be detected on the link have failed comprises at least one of the following:
when the comparison result indicates consistency, determining that all RAMs to be detected on the link are normal;
when the comparison result indicates inconsistency, judging whether the link is faulty; and, when the link is fault-free, determining that at least one RAM to be detected on the link has failed.
3. The method according to claim 1, characterized in that the test end equipment comprises a line card, and before the test end equipment sends the first test cell on the link where the RAM to be detected is located, the method further comprises:
configuring a specified line card as the starting point of the test cell, and configuring the specified line card as the end point of the test cell according to the link connection relationship of the device where the RAM to be detected is located.
4. The method according to claim 1, characterized in that, before the test end equipment compares whether the first test data in the first test cell is consistent with the second test data in the second test cell, the method further comprises:
judging whether the test end equipment receives the test cell within a predetermined time, wherein, if the judgment result is yes, the comparison of whether the first test data is consistent with the second test data is triggered.
5. The method according to claim 1, characterized in that the first test cell and/or the second test cell carries the following information:
the port number of the link from which the first test cell is sent and the port number of the next-hop link.
6. A fault detection apparatus for a random access memory (RAM), applied to test end equipment, characterized by comprising:
a sending module, configured to send a first test cell on the link where the RAM to be detected is located;
a receiving module, configured to receive a second test cell obtained after the first test cell has traversed the link;
a comparison module, configured to compare whether the first test data in the first test cell is consistent with the second test data in the second test cell;
a judgment module, configured to judge, according to the comparison result, whether the RAMs to be detected on the link have failed;
wherein the judgment module comprises: a grading unit, configured to divide the link into levels when the link is faulty, wherein the sublinks obtained by the division together form the link; an investigation unit, configured to investigate each sublink according to a preset priority; a determination unit, configured to determine the sublink that has failed; and a third judging unit, configured to determine that the RAM to be detected on the failed sublink has failed when the current link state of that sublink is determined to be normal.
7. The apparatus according to claim 6, characterized in that the judgment module comprises at least one of the following:
a first judging unit, configured to determine, when the comparison result indicates consistency, that all RAMs to be detected on the link are normal;
a judging unit, configured to judge, when the comparison result indicates inconsistency, whether the link is faulty;
a second judging unit, configured to determine, when the link is fault-free, that at least one RAM to be detected on the link has failed.
8. The apparatus according to claim 6, characterized in that the apparatus further comprises:
a configuration module, configured to, when the test end equipment includes a line card, configure a specified line card as the starting point of the test cell, and configure the specified line card as the end point of the test cell according to the link connection relationship of the device where the RAM to be detected is located.
CN201410495434.3A 2014-09-24 2014-09-24 The fault detection method and device of random access memory ram Active CN105513645B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410495434.3A CN105513645B (en) 2014-09-24 2014-09-24 The fault detection method and device of random access memory ram
PCT/CN2014/094143 WO2015131613A1 (en) 2014-09-24 2014-12-17 Method and device for detecting random-access memory (ram) malfunction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410495434.3A CN105513645B (en) 2014-09-24 2014-09-24 The fault detection method and device of random access memory ram

Publications (2)

Publication Number Publication Date
CN105513645A CN105513645A (en) 2016-04-20
CN105513645B true CN105513645B (en) 2019-04-23

Family

ID=54054485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410495434.3A Active CN105513645B (en) 2014-09-24 2014-09-24 The fault detection method and device of random access memory ram

Country Status (2)

Country Link
CN (1) CN105513645B (en)
WO (1) WO2015131613A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110045210A (en) * 2019-05-15 2019-07-23 深圳市英威腾电气股份有限公司 Functional safety detection method, device, functional safety module and detection system
CN112751688B (en) * 2019-10-30 2023-08-01 中兴通讯股份有限公司 Flow control processing method of OTN (optical transport network) equipment, electronic equipment and storage medium
CN111611119B (en) * 2020-05-27 2023-04-07 合肥工大高科信息科技股份有限公司 Method and system for realizing on-line self-check of RAM (random Access memory) under real-time operating system

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6530052B1 (en) * 1999-12-29 2003-03-04 Advanced Micro Devices, Inc. Method and apparatus for looping back a current state to resume a memory built-in self-test
CN1835458A (en) * 2005-03-14 2006-09-20 华为技术有限公司 On-line detector and method of communicator service
CN101127650A (en) * 2007-09-29 2008-02-20 中兴通讯股份有限公司 A method and testing backboard for single board production test

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7836386B2 (en) * 2006-09-27 2010-11-16 Qimonda Ag Phase shift adjusting method and circuit

Also Published As

Publication number Publication date
CN105513645A (en) 2016-04-20
WO2015131613A1 (en) 2015-09-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant