WO2017012460A1 - Method and apparatus for detecting failure of random memory, and processor - Google Patents

Method and apparatus for detecting failure of random memory, and processor

Publication number
WO2017012460A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
random access
access memory
entry
data message
Prior art date
Application number
PCT/CN2016/088142
Other languages
French (fr)
Chinese (zh)
Inventor
潘静
安康
石金锋
许建文
Original Assignee
深圳市中兴微电子技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2017012460A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11C STATIC STORES
    • G11C29/00 Checking stores for correct operation; Subsequent repair; Testing stores during standby or offline operation
    • G11C29/04 Detection or location of defective memory elements, e.g. cell construction details, timing of test signals
    • G11C29/08 Functional testing, e.g. testing during refresh, power-on self testing [POST] or distributed testing
    • G11C29/12 Built-in arrangements for testing, e.g. built-in self testing [BIST] or interconnection details
    • G11C29/38 Response verification devices
    • G11C29/42 Response verification devices using error correcting codes [ECC] or parity check

Definitions

  • the present invention relates to a storage device failure detection technology, and in particular, to a method, device, and processor for detecting a random memory failure.
  • DDR3 SDRAM DDR3 Synchronous Dynamic Random Access Memory
  • The traditional detection scheme for random access memory is as follows: a central processing unit (CPU) constructs test data and writes it into the random access memory; then, under CPU control, the data stored in the random access memory is read out and compared with the previously written data to obtain information about the failed random access memory addresses, such as the failed DDR3 units.
  • This detection scheme is very time consuming when scanning large-capacity random access memory: the scanning period is long, failures are detected slowly, and the CPU must participate to complete the self-test.
  • When the forwarding plane of the network processor carries a large amount of data traffic, the CPU must manage and deliver forwarding table entries and perform protocol interaction and management operations, so the CPU resources consumed by the above detection scheme are even less affordable.
  • an embodiment of the present invention is directed to a method, apparatus, and processor for detecting a random memory failure, which can improve the efficiency of detecting a random memory failure.
  • Embodiments of the present invention provide a method for detecting a random memory failure, including:
  • ECC Error Correcting Code
  • the failure detection result of the random access memory is determined by comparing an ECC check result of the data message before writing to the random access memory and an ECC check result of the read data message.
  • the acquiring the data packet includes: acquiring data packets of multiple entries in sequence;
  • Performing an ECC check on the data message and writing the data message into the random access memory includes: performing an ECC check on the data message of each entry, and writing the data message of each entry into the random access memory;
  • Reading the data message from the random access memory and performing the ECC check on the read data message includes: reading the data message of at least one entry from the random access memory, and performing an ECC check on the data message of each entry read;
  • Determining the failure detection result of the random access memory by comparing the ECC check result of the data message before writing to the random access memory with the ECC check result of the read data message includes: comparing the ECC check result of the data message of each entry read with the ECC check result of the data message of the corresponding entry before it was written into the random access memory, and thereby determining the failure detection result of the random access memory corresponding to the data message of the corresponding entry.
  • The data message of each entry includes a write table index and data to be written, wherein the write table index carries the address information at which the data to be written is to be stored in the random access memory;
  • the writing the data message of each entry to the random access memory comprises: writing, to the random access memory, the data to be written of the data message of the corresponding entry based on the write table index of the data message of the corresponding entry.
  • Reading the data message of the at least one entry from the random access memory comprises: receiving, by the network processor, at least one scan message from the packetizer, wherein each scan message sent by the packetizer includes a lookup table index for looking up the data of the corresponding entry in the random access memory;
  • the reading the data message of the at least one entry from the random access memory further comprises: the network processor reading the data of the corresponding entry from the random access memory based on each received scan message.
  • The bandwidth of the scan messages sent by the packetizer is determined according to the traffic of data being forwarded by the network processor at the current time, as obtained by the packetizer.
  • Comparing the ECC check result of the data message of each entry read with the ECC check result of the data message of the corresponding entry before it was written into the random access memory, and determining the failure detection result of the random access memory corresponding to the data message of the corresponding entry, includes: if the ECC check result of the data message of any entry read differs from the ECC check result of the data message of the corresponding entry before it was written into the random access memory, determining the fault information of the random access memory corresponding to the data message of the corresponding entry.
  • the ECC check algorithm of the data message before being written into the random access memory is the same as the ECC check algorithm of the read data message.
  • The method further includes: storing the ECC check result of the data message obtained before it is written into the random access memory.
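  • The write-then-compare idea summarized above can be illustrated with a minimal Python sketch. The source does not specify the ECC algorithm, so `ecc_check` below is an assumed Hamming-style stand-in (XOR of the 1-based positions of the set bits, plus an overall parity bit), and `StuckBitMemory` and `detect_fault` are hypothetical names used only for illustration.

```python
def ecc_check(word: int, width: int = 64) -> int:
    """Assumed stand-in check code, not the actual controller algorithm.

    XORs the 1-based positions of all set bits (a Hamming-style syndrome)
    and appends an overall parity bit as the lowest bit of the result.
    """
    syndrome = 0
    parity = 0
    for pos in range(1, width + 1):
        if (word >> (pos - 1)) & 1:
            syndrome ^= pos
            parity ^= 1
    return (syndrome << 1) | parity


class StuckBitMemory(dict):
    """Hypothetical memory model: one bit at one address is stuck at 0."""

    def __init__(self, bad_address: int, bad_bit: int):
        super().__init__()
        self.bad_address = bad_address
        self.bad_bit = bad_bit

    def __setitem__(self, address: int, word: int) -> None:
        if address == self.bad_address:
            word &= ~(1 << self.bad_bit)  # the faulty cell loses this bit
        super().__setitem__(address, word)


def detect_fault(memory, address: int, word: int) -> bool:
    """Return True if the ECC result after read-back differs from the one before writing."""
    ecc_before = ecc_check(word)   # check result before the write
    memory[address] = word         # write the data
    read_back = memory[address]    # read it out again
    return ecc_check(read_back) != ecc_before
```

  • For example, with `mem = StuckBitMemory(bad_address=0x10, bad_bit=3)`, calling `detect_fault(mem, 0x10, 0xFF)` reports a fault, while the same call on a healthy address such as `0x20` does not.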
  • An embodiment of the present invention further provides an apparatus for detecting a random memory fault, comprising: an obtaining module, a first verifying module, a second verifying module, and a fault determining module; wherein
  • a first verification module configured to perform an error check and correction (ECC) check on the data message, and write the data message into a random access memory
  • a second check module configured to read the data packet from the random access memory, and perform ECC check on the read data packet
  • the fault determining module is configured to determine a fault detection result of the random access memory by comparing an ECC check result of the read data message with an ECC check result of the data message before writing to the random access memory.
  • the acquiring module is further configured to sequentially acquire data packets of multiple entries
  • the first check module is further configured to perform ECC check on the data packet of each entry, and write the data packet of each entry into the random access memory;
  • the second verification module is further configured to: read a data packet of the at least one entry from the random access memory, and perform an ECC check on the read data packet of each entry;
  • the fault determining module is further configured to determine the failure detection result of the random access memory corresponding to the data message of the corresponding entry by comparing the ECC check result of the data message of each entry read with the ECC check result of the data message of the corresponding entry before it was written into the random access memory.
  • The data message of each entry includes a write table index and data to be written, wherein the write table index carries the address information at which the data to be written is to be stored in the random access memory;
  • the first verification module is further configured to write, to the random access memory, the data to be written of the data message of the corresponding entry based on the write table index of the data message of the corresponding entry.
  • the second verification module is further configured to receive at least one scan message from the packetizer in sequence, wherein each scan message sent by the packetizer includes a lookup table index for looking up the data of the corresponding entry in the random access memory;
  • the second verification module is further configured to read data of the corresponding entry from the random access memory based on each received scan message.
  • The bandwidth of the scan messages sent by the packetizer is determined according to the traffic of data being forwarded by the network processor at the current time, as obtained by the packetizer.
  • the fault determining module is further configured to determine the fault information of the random access memory corresponding to the data message of the corresponding entry when the ECC check result of the data message of any entry read differs from the ECC check result of the data message of the corresponding entry before it was written into the random access memory.
  • the embodiment of the present invention further provides a network processor, including any one of the foregoing devices for detecting a random memory failure.
  • The embodiments of the present invention provide a method, an apparatus, and a processor for detecting a random access memory failure, in which the error check and correction (ECC) check result of a data message before it is written into the random access memory is compared with the ECC check result of the data message read out, to obtain the failure detection result of the random access memory. Since a random access memory failure can be detected without the participation of the CPU, CPU resource consumption can be reduced. In addition, compared with the existing scheme of comparing the written and read test data, the embodiments only need to compare the ECC check results of the corresponding data, which can reduce the amount of computation, improve the efficiency of detecting random access memory failures, and shorten the time needed to locate a failure.
  • FIG. 1 is a flow chart of a first embodiment of a method for detecting a random memory failure according to the present invention
  • FIG. 2 is a flow chart of a second embodiment of a method for detecting a random memory failure according to the present invention
  • FIG. 3 is a schematic structural diagram of a device for detecting a random memory failure according to an embodiment of the present invention.
  • FIG. 1 is a flow chart of a first embodiment of a method for detecting a random memory failure according to the present invention. As shown in FIG. 1, the method includes:
  • Step 100 Obtain a data packet.
  • The data message includes a write table index and data to be written; the write table index carries the address information at which the data to be written is to be stored in the random access memory, and the write table index may be located in the data message.
  • The length of the data to be written may be 64 bits, 128 bits, or 256 bits, and the data to be written may be all-ones data, data composed of random numbers, data arranged in ascending or descending order, and so on; the random access memory may be DDR3.
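  • The kinds of test data listed above can be sketched as a small Python generator. This is only an illustration: the function name `make_pattern`, the pattern names, and the reading of "ascending or descending order" as a sequence of byte values are assumptions, not details taken from the source.

```python
import random


def make_pattern(kind: str, width_bits: int = 64, seed: int = 0) -> int:
    """Build one data word to be written (hypothetical helper).

    kind is one of "all_ones", "random", "incrementing", "decrementing";
    width_bits is one of the lengths mentioned in the text (64, 128, 256).
    """
    if width_bits not in (64, 128, 256):
        raise ValueError("width_bits must be 64, 128 or 256")
    n_bytes = width_bits // 8
    if kind == "all_ones":
        return (1 << width_bits) - 1
    if kind == "random":
        # Seeded so that the same pattern can be regenerated later.
        return random.Random(seed).getrandbits(width_bits)
    if kind == "incrementing":
        # Byte values 0x00, 0x01, 0x02, ... from the most significant byte down.
        return int.from_bytes(bytes(range(n_bytes)), "big")
    if kind == "decrementing":
        # Byte values counting down to 0x00 in the least significant byte.
        return int.from_bytes(bytes(range(n_bytes - 1, -1, -1)), "big")
    raise ValueError(f"unknown pattern kind: {kind}")
```

  • Under these assumptions, `make_pattern("incrementing", 64)` yields `0x0001020304050607` and `make_pattern("all_ones", 64)` yields `0xFFFFFFFFFFFFFFFF`.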
  • The network processor may sequentially receive data messages of multiple entries from the packetizer; further, the parameters of the data messages sent by the packetizer may be pre-configured, and these parameters include: the number of data messages sent, the message type, the message length, the message format, the bandwidth, and the sending rate.
  • The rules by which the packetizer generates data messages on any two occasions may be the same or different; the bandwidth of the data messages sent by the packetizer on any two occasions may be the same or different.
  • The network processor may also sequentially receive data of multiple entries from the CPU; here, the CPU controls the write table index and the data of the data messages sent to the network processor.
  • When the packetizer is used to send data messages to the network processor, the parameters of the data messages can be pre-configured, so this mode is better suited to fault checking of random access memory chips such as DDR3; the mode in which the CPU writes data entries to the network processor typically occurs in actual forwarding scenarios.
  • Step 101 Perform an ECC check on the data message, and write the data message into the random access memory.
  • Performing an ECC check on the data message and writing the data message into the random access memory includes: performing an ECC check on the data message of each entry, and writing the data message of each entry into the random access memory.
  • Whether the data message of a given entry needs an ECC check may be pre-configured.
  • Performing an ECC check on the data message includes: the network processor performs an ECC check on the data message of each entry that requires one.
  • The data messages that require an ECC check may be the data messages of all entries received by the network processor, or only some of them.
  • Writing the data message of each entry into the random access memory includes: writing the data to be written of the data message of the corresponding entry into the random access memory based on the write table index of the data message of the corresponding entry.
  • The ECC check result of the data message before it is written into the random access memory may be stored; for example, the ECC check result obtained before the data message is written into the random access memory may itself be written into the random access memory.
  • Step 102 Read the data packet from the random access memory, and perform ECC check on the read data packet.
  • Reading the data message from the random access memory and performing an ECC check on the read data message includes: reading the data message of at least one entry from the random access memory, and performing an ECC check on the data message of each entry read.
  • Reading the data message of the at least one entry from the random access memory includes: receiving, by the network processor, at least one scan message from the packetizer, where each scan message corresponds to the data message of one entry; each scan message includes a lookup table index, located in a certain field of the scan message, that is used to look up the data message of the corresponding entry in the random access memory; and, based on each scan message, reading the data message of the corresponding entry from the random access memory.
  • The parameters with which the packetizer sends scan messages may be pre-configured. These parameters include the bandwidth of the scan messages sent by the packetizer, the number of scan messages sent, the message format, the message length, and the sending rate.
  • The bandwidth of the scan messages sent by the packetizer is determined according to the traffic of data being forwarded by the network processor at the current time, as obtained by the packetizer, to ensure that the sum of the bandwidth of the scan messages received by the network processor and the bandwidth of the data forwarded by the network processor at the current time stays within a set threshold.
  • the network processor can simultaneously perform data forwarding and scan packet reception to ensure that the network processor can work normally.
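  • The bandwidth rule above (scan bandwidth plus current forwarding bandwidth must stay within a set threshold) can be sketched as follows. The function and parameter names are hypothetical, the unit (Mbit/s) is an assumption, and the cap `configured_scan_mbps` stands for the pre-configured scan bandwidth parameter; the source does not give a formula.

```python
def scan_bandwidth(forwarding_mbps: float,
                   threshold_mbps: float,
                   configured_scan_mbps: float) -> float:
    """Pick a scan-message bandwidth that keeps the total within the threshold.

    forwarding_mbps: traffic currently being forwarded by the network processor.
    threshold_mbps: set limit on forwarding + scan bandwidth.
    configured_scan_mbps: pre-configured upper bound for scan traffic (assumed).
    """
    headroom = threshold_mbps - forwarding_mbps
    # Never exceed the configured rate, and never go negative when the
    # forwarding traffic alone already reaches the threshold.
    return max(0.0, min(configured_scan_mbps, headroom))
```

  • With a threshold of 1000 and a configured scan rate of 500, a forwarding load of 800 leaves 200 for scanning, a load of 100 allows the full 500, and a load of 1200 pauses scanning.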
  • the sender that sends the scan message is independent of the CPU, and thus does not consume CPU resources when transmitting the scan message.
  • The packetizer may send scan messages at a certain time frequency, and the number of messages sent, the sending mode, and the message length used when the packetizer sends scan messages can be flexibly configured. The rules by which the packetizer generates scan messages on any two occasions may be the same or different.
  • The bandwidth of the scan messages sent by the packetizer on any two occasions may be the same or different.
  • The ECC check algorithm applied to the data message of any entry read in this step is the same as the ECC check algorithm applied to the data message of the corresponding entry before it was written into the random access memory.
  • Step 103 Determine a fault detection result of the random access memory by comparing an ECC check result of the data message before writing to the random access memory and an ECC check result of the read data message.
  • Specifically, this step includes: comparing the ECC check result of the data message of each entry read with the ECC check result of the data message of the corresponding entry before it was written into the random access memory, and thereby determining the failure detection result of the random access memory corresponding to the data message of the corresponding entry.
  • The failure detection result of the random access memory corresponding to the data message of the corresponding entry may be that the storage address of the random access memory corresponding to the data message of the corresponding entry has not failed, or may be the specific fault information when that storage address has failed.
  • If the two ECC check results differ, this indicates that the storage address of the random access memory corresponding to the data message of the corresponding entry is faulty.
  • In that case, the fault information of the random access memory corresponding to the data message of the corresponding entry is determined; this fault information may be the storage address of the data message of the corresponding entry in the random access memory. By comparing the ECC check result of the data message of each entry read with the ECC check result of the data message of the corresponding entry before it was written into the random access memory, the specific fault information in the random access memory can be learned.
  • If no mismatch is found for any entry read, the failure detection result determined in this step is that the random access memory has no failure.
  • After the network processor stores the statistical results and the determined fault information of the random access memory, the data message of each entry read, the stored statistical results, and the determined fault information are uploaded together to the control plane. By analyzing the received data, the control plane can learn the specific circumstances of the failure in the random access memory; for example, it can quickly locate the address of the failed storage unit in the random access memory based on the received data.
  • One way of detecting random access memory failures is referred to as the fully automatic detection mode. In this mode, both the data messages and the scan messages obtained by the network processor are sent by the packetizer; specifically, the packetizer sends the scan messages to the network processor immediately after sending the data messages.
  • Another way of detecting random access memory failures is referred to as the semi-automatic detection mode. In this mode, the CPU sends the data messages to the network processor, and the packetizer is used to send the scan messages to the network processor.
  • The other implementation processes are completed on the forwarding plane, so the process of detecting the random access memory not only consumes no CPU resources but also speeds up the detection of random access memory failures and shortens the time needed to locate them.
  • the failure of the random access memory can be detected at any time regardless of whether or not the forwarding surface of the network processor has forwarding traffic.
  • the random access memory is DDR3, and the fault in the DDR3 is detected by the fully automatic detection mode.
  • FIG. 2 is a flow chart of a second embodiment of a method for detecting a random memory failure according to the present invention. As shown in FIG. 2, the method includes:
  • Step 201 The DDR3 controller in the network processor turns on the ECC check function.
  • Step 202 Send the data message of the i-th entry to the DDR3 controller in the network processor by using the packetizer, and the initial value of i is 1.
  • Step 203 The DDR3 controller extracts X bits of the data to be written from the data message of the i-th entry according to the read/write data bit width of X bits, where X is a natural number greater than 1 and the read/write data bit width may be pre-configured; it performs an ECC check on the extracted X-bit data and, based on the write table index of the data message of the i-th entry, writes the ECC check result of the extracted X-bit data together with the data to be written of the data message of the i-th entry into the storage space of DDR3.
  • Step 204 Determine whether the value of i is equal to N, N represents the total number of entries of the data packet that the packetizer needs to send to the DDR3 controller; if i is not equal to N, repeat steps 202 to 203 until the value of i is equal to N; If i is equal to N, the process of writing a data message to the DDR3 by the network processor is completed, and step 205 is performed.
  • Step 205 The DDR3 controller in the network processor receives the scan message of the j-th entry from the packetizer; the initial value of j is 1.
  • Step 206 The DDR3 controller reads the data message of the corresponding entry from the DDR3 based on the lookup table index in the scan message of the j-th entry, extracts the X-bit data from the data message read, and performs an ECC check on the extracted X-bit data.
  • Step 207 Compare the ECC check result of the data message of the corresponding entry read with the ECC check result of the data message of the corresponding entry before it was written into DDR3 to obtain a comparison result; the comparison result is that the two ECC check results are either the same or different.
  • If the ECC check result of the data message of the corresponding entry read is the same as the ECC check result of the data message of the corresponding entry before it was written into DDR3, this indicates that the storage address of the random access memory corresponding to the data message of the corresponding entry has not failed; the data message of the corresponding entry is recorded as a data message with no random access memory failure, and step 208 is performed.
  • If the two ECC check results differ, step 209 is performed.
  • Step 208 Count the data messages for which no random access memory failure occurred, then perform step 210.
  • Step 209 Count the data messages for which a random access memory failure occurred, and determine the fault information of the random access memory corresponding to the data message of the corresponding entry, which may be the storage address of the data message of the corresponding entry in the random access memory; then perform step 210.
  • Step 210 Determine whether the value of j is equal to M, where M represents the total number of entries of the scan message sent by the packetizer to the DDR3 controller; if j is not equal to M, repeat steps 205 to 209 until the value of j is equal to M; If j is equal to M, the process of detecting a random memory failure is completed, and the process ends.
  • M may be equal to N or less than N.
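  • Steps 202 to 210 can be sketched as two loops in Python. This is a simplified illustration under stated assumptions: `ecc_check` is a stand-in (one parity bit per byte of the X-bit word) because the source does not name the ECC algorithm; the ECC results are kept in a dictionary rather than written to DDR3 alongside the data; and `ScanReport`, `run_detection`, and the toy memory with one flipped bit are hypothetical.

```python
from dataclasses import dataclass, field

X_BITS = 64  # assumed read/write data bit width X


def ecc_check(word: int) -> int:
    """Stand-in check code: one parity bit per byte of the X-bit word."""
    code = 0
    for byte_index in range(X_BITS // 8):
        byte = (word >> (8 * byte_index)) & 0xFF
        code |= (bin(byte).count("1") & 1) << byte_index
    return code


@dataclass
class ScanReport:
    ok_count: int = 0                  # step 208 counter
    fault_count: int = 0               # step 209 counter
    faulty_addresses: list = field(default_factory=list)


def run_detection(entries, scan_indexes, memory_write, memory_read) -> ScanReport:
    """entries: N (write_index, data) pairs; scan_indexes: M lookup indexes, M <= N."""
    stored_ecc = {}
    # Steps 202-204: write all N entries, saving the ECC result computed before each write.
    for write_index, data in entries:
        word = data & ((1 << X_BITS) - 1)  # extract X bits of the data to be written
        stored_ecc[write_index] = ecc_check(word)
        memory_write(write_index, word)
    report = ScanReport()
    # Steps 205-210: read back each scanned entry and compare the ECC results.
    for lookup_index in scan_indexes:
        word = memory_read(lookup_index)
        if ecc_check(word) == stored_ecc[lookup_index]:
            report.ok_count += 1
        else:
            report.fault_count += 1
            report.faulty_addresses.append(lookup_index)
    return report


# Toy memory in which address 2 flips bit 0 of whatever is written to it.
cells = {}


def faulty_write(address: int, word: int) -> None:
    cells[address] = word ^ 1 if address == 2 else word


demo_entries = [(i, 0xA5A5A5A5A5A5A5A5) for i in range(4)]
demo_report = run_detection(demo_entries, [0, 1, 2, 3], faulty_write, cells.__getitem__)
```

  • In this toy run, `demo_report` counts three healthy entries and one fault, and `demo_report.faulty_addresses` contains only address 2, which mirrors how the faulty storage address is reported in step 209.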
  • the method for detecting a random memory fault provided by the above two embodiments of the present invention can quickly locate the random access memory by comparing the ECC check result when the data message is written with the ECC check result when the data message is read. The location of the fault.
  • an embodiment of the present invention further provides an apparatus for detecting a random memory failure.
  • FIG. 3 is a schematic structural diagram of a device for detecting a random memory failure according to an embodiment of the present invention. As shown in FIG. 3, the device includes an obtaining module 300, a first verifying module 301, a second verifying module 302, and a fault determining module 303, wherein:
  • the obtaining module 300 is configured to obtain a data message.
  • the first verification module 301 is configured to perform an error check and correction (ECC) check on the data message, and write the data message into a random access memory.
  • the second verification module 302 is configured to read the data message from the random access memory, and perform an ECC check on the read data message.
  • the fault determining module 303 is configured to determine a failure detection result of the random access memory by comparing the ECC check result of the read data message with the ECC check result of the data message before it was written into the random access memory.
  • the acquiring module 300 is configured to sequentially acquire data messages of multiple entries.
  • the first check module 301 is configured to perform ECC check on the data packet of each entry, and write the data packet of each entry into the random access memory.
  • the second verification module 302 is configured to read a data message of at least one entry from the random access memory, and perform ECC check on the read data packet of each entry.
  • the fault determining module 303 is configured to determine the failure detection result of the random access memory corresponding to the data message of the corresponding entry by comparing the ECC check result of the data message of each entry read with the ECC check result of the data message of the corresponding entry before it was written into the random access memory.
  • The data message of each entry includes a write table index and data to be written, wherein the write table index carries the address information at which the data to be written is to be stored in the random access memory.
  • the first check module 301 is configured to write the data to be written of the data message of the corresponding entry into the random access memory based on the write table index of the data message of the corresponding entry.
  • the second verification module 302 is further configured to receive at least one scan message and, based on each scan message, read the data message of the corresponding entry from the random access memory; each scan message corresponds to the data message of one entry and includes a lookup table index for looking up the data message of the corresponding entry in the random access memory.
  • the second verification module 302 is configured to receive, by using a network processor, at least one scan message from the packetizer; the bandwidth of the scan messages sent by the packetizer is determined according to the traffic of data being forwarded by the network processor at the current time, as obtained by the packetizer.
  • the fault determining module 303 is configured to determine the fault information of the random access memory corresponding to the data message of the corresponding entry when the ECC check result of the data message of any entry read differs from the ECC check result of the data message of the corresponding entry before it was written into the random access memory.
  • the device for detecting a random memory fault provided by the embodiment of the present invention can quickly locate the location where the random memory fails by comparing the ECC check result when the data message is written and the ECC check result when the data message is read. Improve the efficiency of locating random access memory failures.
  • the obtaining module 300, the first verification module 301, the second verification module 302, and the fault determining module 303 may all be implemented by a central processing unit (CPU), a micro processor unit (MPU), a digital signal processor (DSP), or a field programmable gate array (FPGA) located in a network processor.
  • the embodiment of the invention further provides a network processor, which comprises any device for detecting a random memory failure in the third embodiment of the invention.
  • embodiments of the present invention can be provided as a method, system, or computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
  • The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device; the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • The embodiments of the present invention compare the error check and correction (ECC) check result of a data message before it is written into the random access memory with the ECC check result of the data message read out, to obtain the failure detection result of the random access memory. Since the detection of a random access memory failure can be completed without the participation of the CPU, CPU resource consumption can be reduced; in addition, compared with the existing scheme of comparing the written and read test data, the embodiments only need to compare the ECC check results of the corresponding data, which can reduce the amount of computation, improve the efficiency of detecting random access memory failures, and shorten the time needed to locate a failure.

Abstract

A method and apparatus for detecting a failure of a random memory, and a processor, comprising: acquiring a data message (100); performing an error check and correction (ECC) check on the data message, and writing the data message into a random memory (101); reading the data message from the random memory, and performing the ECC check on the read data message (102); and by comparing the ECC check result before the data message is written into the random memory and the ECC check result of the read data message, determining a failure detection result of the random memory (103).

Description

Method and apparatus for detecting random access memory failure, and processor
Technical field
The present invention relates to storage device fault detection technology, and in particular to a method and an apparatus for detecting a random access memory failure, and a processor.
Background
Random access memories such as DDR3 synchronous dynamic random access memory (DDR3 SDRAM, hereinafter DDR3) are widely used in network processors to store the large volume of table entries and data of network forwarding devices.
As modern networks demand ever higher performance and capacity from network processing equipment, high-capacity, high-rate forwarding devices have emerged. Because the internal storage space of a network processor is limited, a large amount of random access memory such as DDR3 must be attached to store the forwarding data and information of the network. Owing to manufacturing process issues, however, a random access memory may contain defective internal memory cells. In addition, after prolonged use with repeated writes and reads, the stored user data may become corrupted, which affects the processing and transmission of some user services. Monitoring and testing the random access memory is therefore particularly important.
The conventional scheme for testing a random access memory is as follows: a central processing unit (CPU) constructs test data and controls writing it into the random access memory; the CPU then controls reading the stored data back and compares it with the data previously written, obtaining the address information of the faulty random access memory, such as DDR3 unit information. However, because CPU bandwidth is limited, this scheme is very time-consuming when scanning a large-capacity random access memory: the scan period is long, faults are detected slowly, and the self-test cannot be completed without the CPU. When the forwarding plane of the network processor carries heavy data traffic, the CPU must manage and deliver forwarding-plane table entries and handle protocol interaction and management, so the CPU consumption of this scheme becomes even less acceptable.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide a method and an apparatus for detecting a random access memory failure, and a processor, which can improve the efficiency of detecting random access memory failures.
The technical solutions of the embodiments of the present invention are implemented as follows:
An embodiment of the present invention provides a method for detecting a random access memory failure, including:
acquiring a data message;
performing an error checking and correction (Error Correcting Code, ECC) check on the data message, and writing the data message into a random access memory;
reading the data message from the random access memory, and performing an ECC check on the read data message;
determining a fault detection result of the random access memory by comparing the ECC check result of the data message before it was written into the random access memory with the ECC check result of the read data message.
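The four steps above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the text does not specify which ECC algorithm is used, so `ecc_check_bits` assumes Hamming-style check bits, and `StuckBitRam`, `detect_fault`, and the stuck-bit defect are hypothetical names and conditions introduced for the example.

```python
def ecc_check_bits(data: int, width: int = 64) -> int:
    """Hamming-style check bits (assumed algorithm, not taken from the patent).

    Data bits occupy the codeword positions that are not powers of two;
    the check value is the XOR of the positions of all set data bits.
    """
    check, pos = 0, 1
    for i in range(width):
        while pos & (pos - 1) == 0:   # skip power-of-two positions
            pos += 1
        if (data >> i) & 1:
            check ^= pos
        pos += 1
    return check


class StuckBitRam(dict):
    """Hypothetical RAM model: bit 5 of address 0x10 always reads as 0."""

    def __getitem__(self, address):
        value = super().__getitem__(address)
        return value & ~(1 << 5) if address == 0x10 else value


def detect_fault(ram, address: int, data: int) -> bool:
    """Return True when the location at `address` appears faulty."""
    ecc_before = ecc_check_bits(data)      # ECC check before the write
    ram[address] = data                    # write the data message
    read_back = ram[address]               # read the data message back
    ecc_after = ecc_check_bits(read_back)  # ECC check on the read data
    return ecc_before != ecc_after         # compare the two check results


ALL_ONES = (1 << 64) - 1
print(detect_fault(StuckBitRam(), 0x10, ALL_ONES))  # defective location
print(detect_fault(StuckBitRam(), 0x20, ALL_ONES))  # healthy location
```

With all-ones data the stuck-at-0 bit changes the check value, so the first call reports a fault and the second does not. An all-zeros pattern would not expose this particular defect, which is one reason to vary the test data.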
In the above solution, acquiring the data message includes: sequentially acquiring the data messages of multiple entries;
performing the ECC check on the data message and writing the data message into the random access memory includes: performing an ECC check on the data message of each entry, and writing the data message of each entry into the random access memory;
reading the data message from the random access memory and performing the ECC check on the read data message includes: reading the data message of at least one entry from the random access memory, and performing an ECC check on the read data message of each entry;
determining the fault detection result of the random access memory by comparing the two ECC check results includes: for each entry read, comparing the ECC check result of the read data message with the ECC check result of the same entry's data message before it was written into the random access memory, and thereby determining the fault detection result of the random access memory location corresponding to that entry's data message.
In the above solution, the data message of each of the sequentially acquired entries includes a write-table index and data to be written, where the write-table index carries the address information at which the data to be written is to be stored in the random access memory;
writing the data message of each entry into the random access memory includes: writing the data to be written of the entry's data message into the random access memory based on the write-table index of that data message.
In the above solution, reading the data message of at least one entry from the random access memory includes: a network processor sequentially receiving at least one scan message from a packet generator, where each scan message sent by the packet generator includes a lookup-table index used to look up the data of the corresponding entry in the random access memory;
reading the data message of at least one entry from the random access memory further includes: the network processor reading the data of the corresponding entry from the random access memory based on each received scan message.
In the above solution, the bandwidth at which the packet generator sends scan messages each time is determined according to the traffic that the network processor is forwarding at the current moment, as obtained by the packet generator.
In the above solution, determining the fault detection result of the random access memory location corresponding to an entry's data message includes: if the ECC check result of the read data message of any entry differs from the ECC check result of that entry's data message before it was written into the random access memory, determining the fault information of the random access memory location corresponding to that entry's data message.
In the above solution, the ECC check algorithm applied to the data message before it is written into the random access memory is the same as the ECC check algorithm applied to the read data message.
In the above solution, while the data message is written into the random access memory, the method further includes: storing the ECC check result of the data message obtained before it was written into the random access memory.
An embodiment of the present invention further provides an apparatus for detecting a random access memory failure, including an acquisition module, a first check module, a second check module, and a fault determination module, where:
the acquisition module is configured to acquire a data message;
the first check module is configured to perform an error checking and correction (ECC) check on the data message and write the data message into a random access memory;
the second check module is configured to read the data message from the random access memory and perform an ECC check on the read data message;
the fault determination module is configured to determine a fault detection result of the random access memory by comparing the ECC check result of the read data message with the ECC check result of the data message before it was written into the random access memory.
In the above solution, the acquisition module is further configured to sequentially acquire the data messages of multiple entries;
the first check module is further configured to perform an ECC check on the data message of each entry and write the data message of each entry into the random access memory;
the second check module is further configured to read the data message of at least one entry from the random access memory and perform an ECC check on the read data message of each entry;
the fault determination module is further configured to determine, for each entry read, the fault detection result of the random access memory location corresponding to that entry's data message by comparing the ECC check result of the read data message with the ECC check result of the same entry's data message before it was written into the random access memory.
In the above solution, the data message of each of the sequentially acquired entries includes a write-table index and data to be written, where the write-table index carries the address information at which the data to be written is to be stored in the random access memory;
the first check module is further configured to write the data to be written of the entry's data message into the random access memory based on the write-table index of that data message.
In the above solution, the second check module is further configured to sequentially receive at least one scan message from a packet generator, where each scan message sent by the packet generator includes a lookup-table index used to look up the data of the corresponding entry in the random access memory;
the second check module is further configured to read the data of the corresponding entry from the random access memory based on each received scan message.
In the above solution, the bandwidth at which the packet generator sends scan messages each time is determined according to the traffic that the network processor is forwarding at the current moment, as obtained by the packet generator.
In the above solution, the fault determination module is further configured to determine the fault information of the random access memory location corresponding to an entry's data message when the ECC check result of the read data message of that entry differs from the ECC check result of the entry's data message before it was written into the random access memory.
An embodiment of the present invention further provides a network processor including any one of the above apparatuses for detecting a random access memory failure.
The method and apparatus for detecting a random access memory failure, and the processor, provided by the embodiments of the present invention obtain the fault detection result of the random access memory by comparing the error checking and correction (ECC) check result of a data message before it is written into the random access memory with the ECC check result of the data message when it is read out. Because detection can be completed without the participation of the CPU, CPU resource consumption is reduced. In addition, compared with the existing scheme of comparing the written and read test data, the embodiments only need to compare the ECC check results of the corresponding data, which reduces the amount of computation, improves the efficiency of detecting random access memory failures, and shortens the time needed to locate a fault.
Brief description of the drawings
FIG. 1 is a flowchart of a first embodiment of the method for detecting a random access memory failure according to the present invention;
FIG. 2 is a flowchart of a second embodiment of the method for detecting a random access memory failure according to the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for detecting a random access memory failure according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
First embodiment
FIG. 1 is a flowchart of the first embodiment of the method for detecting a random access memory failure according to the present invention. As shown in FIG. 1, the method includes:
Step 100: Acquire a data message.
Here, the data message includes a write-table index and data to be written. The write-table index carries the address information at which the data to be written is to be stored in the random access memory, and may be located in a field of the data message. The data to be written may be 64, 128, or 256 bits long, and may be all-ones data, data composed of random numbers, incrementing data, or decrementing data. The random access memory may be DDR3.
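As a rough illustration of the data message described here, the sketch below builds the four payload patterns for a given length. The `DataMessage` layout, the function name `make_payload`, and the reading of "incrementing" and "decrementing" data as byte sequences are assumptions made for the example, not details given in the text.

```python
import random
from dataclasses import dataclass


@dataclass
class DataMessage:
    write_index: int   # address information in the random access memory
    payload: int       # data to be written
    width: int         # payload length in bits (64, 128 or 256)


def make_payload(pattern: str, width: int, seed: int = 0) -> int:
    """Build one test payload; the pattern names are hypothetical labels."""
    n_bytes = width // 8
    if pattern == "all_ones":
        return (1 << width) - 1
    if pattern == "random":
        return random.Random(seed).getrandbits(width)
    if pattern == "incrementing":      # bytes 00, 01, 02, ...
        return int.from_bytes(bytes(i % 256 for i in range(n_bytes)), "big")
    if pattern == "decrementing":      # bytes FF, FE, FD, ...
        return int.from_bytes(bytes(255 - i % 256 for i in range(n_bytes)), "big")
    raise ValueError(f"unknown pattern: {pattern}")


message = DataMessage(write_index=0x100, payload=make_payload("incrementing", 64), width=64)
print(hex(message.payload))
```

Using several patterns matters because a cell stuck at one value is only exposed by data that tries to store the opposite value there.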
In this step, the network processor may sequentially receive the data messages of multiple entries from a packet generator. Further, the parameters with which the packet generator sends data messages can be preconfigured; they include the number of data messages, the sending pattern, the message length, the message format, the bandwidth, and the sending rate. Any two transmissions by the packet generator may use the same or different generation rules for the data messages, and the same or different bandwidths.
In this step, the network processor may alternatively receive the data of multiple entries sequentially from the CPU; in this case the CPU delivers the write-table index and data of each data message to the network processor through the control plane.
In the first embodiment of the present invention, having the packet generator send the data messages to the network processor is well suited to fault checking of the granular units of a random access memory such as DDR3, because the sending parameters can be preconfigured; having the CPU write data entries to the network processor is what usually occurs in actual forwarding scenarios.
Step 101: Perform an ECC check on the data message, and write the data message into the random access memory.
Specifically, this includes: performing an ECC check on the data message of each entry, and writing the data message of each entry into the random access memory.
Further, whether the data message of a given entry requires an ECC check can be preconfigured. In that case, performing the ECC check on the data message means that the network processor performs an ECC check on the data message of each entry that requires one. The data messages requiring an ECC check may be the data messages of all entries received by the network processor, or only some of them.
Here, writing the data message of each entry into the random access memory includes: writing the data to be written of the entry's data message into the random access memory based on the write-table index of that data message.
While the data message is being written into the random access memory, the ECC check result obtained before the write can be stored, for example by writing that ECC check result into the random access memory as well.
Step 102: Read the data message from the random access memory, and perform an ECC check on the read data message.
Specifically, this includes: reading the data message of at least one entry from the random access memory, and performing an ECC check on the read data message of each entry. Whether the read data message of each entry requires an ECC check can be preconfigured; the data messages checked on read normally match those checked in step 101.
Here, reading the data message of at least one entry from the random access memory includes: the network processor sequentially receiving at least one scan message from the packet generator, each scan message corresponding to the data message of one entry. Each scan message includes a lookup-table index, located in a field of the scan message, that is used to look up the data message of the corresponding entry in the random access memory. Based on each scan message, the data message of the corresponding entry is read from the random access memory.
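The scan-driven read can be pictured with the short sketch below. The text only says that the lookup-table index sits in some field of the scan message, so the header layout (`SCAN_HEADER`), the type value `SCAN_TYPE`, and both function names are purely hypothetical.

```python
import struct

# Hypothetical layout: 16-bit message type followed by a 32-bit lookup-table index.
SCAN_HEADER = struct.Struct(">HI")
SCAN_TYPE = 0x5CA0  # made-up type value marking a scan message


def build_scan_message(lookup_index: int) -> bytes:
    """What a packet generator might emit for one entry."""
    return SCAN_HEADER.pack(SCAN_TYPE, lookup_index)


def handle_scan_message(message: bytes, ram: dict):
    """Extract the lookup-table index and read the corresponding entry."""
    msg_type, lookup_index = SCAN_HEADER.unpack_from(message)
    if msg_type != SCAN_TYPE:
        raise ValueError("not a scan message")
    return lookup_index, ram[lookup_index]


ram = {7: 0xABCD}  # one stored entry, keyed by its table index
print(handle_scan_message(build_scan_message(7), ram))
```

One scan message maps to exactly one entry, so scanning a whole table amounts to sending one message per index.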
Further, the parameters with which the packet generator sends scan messages can be preconfigured. These parameters include the bandwidth at which the packet generator sends scan messages each time, as well as the number of messages, the sending pattern, the message length, the message format, and the sending rate.
Specifically, the bandwidth at which the packet generator sends scan messages each time is determined according to the traffic that the network processor is forwarding at the current moment, as obtained by the packet generator, so that the sum of the bandwidth of the scan messages received by the network processor and the bandwidth of the data it forwards stays within a set threshold. The network processor can thus forward data and receive scan messages at the same time while continuing to work normally. The packet generator that sends the scan messages is independent of the CPU, so sending scan messages consumes no CPU resources.
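One way to read the bandwidth rule is sketched below: the scan bandwidth is whatever headroom remains under the threshold, optionally capped. The function name, the cap `max_scan_bw`, and the units are assumptions for illustration; the text only requires that the sum of scan and forwarding bandwidth stay within the set threshold.

```python
def scan_bandwidth(forwarding_bw: float, threshold: float, max_scan_bw: float) -> float:
    """Bandwidth to use for scan messages so that
    forwarding_bw + scan bandwidth <= threshold (all values in the same unit)."""
    headroom = max(0.0, threshold - forwarding_bw)  # never negative
    return min(headroom, max_scan_bw)               # hypothetical configured cap


# Light, heavy, and over-threshold forwarding load (numbers are illustrative).
for load in (60.0, 90.0, 120.0):
    print(load, scan_bandwidth(load, threshold=100.0, max_scan_bw=25.0))
```

Under light load the scan runs at its configured cap, under heavy load it shrinks to the remaining headroom, and when forwarding alone reaches the threshold the scan pauses.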
The packet generator can send scan messages at a set frequency, and the number of messages, the sending pattern, the message length, and so on are all flexibly configurable. Any two transmissions by the packet generator may use the same or different generation rules for the scan messages, and the same or different bandwidths.
Note that in this step the ECC check algorithm applied to the read data message of any entry is the same as the ECC check algorithm applied to that entry's data message before it was written into the random access memory.
Step 103: Determine the fault detection result of the random access memory by comparing the ECC check result of the data message before it was written into the random access memory with the ECC check result of the read data message.
This step specifically includes: for each entry read, comparing the ECC check result of the read data message with the ECC check result of the same entry's data message before it was written into the random access memory, and thereby determining the fault detection result of the random access memory location corresponding to that entry's data message. The fault detection result may be either that the storage address corresponding to the entry's data message is not faulty, or the specific fault information when that storage address is faulty.
Specifically, if the ECC check result of the read data message of any entry differs from the ECC check result of that entry's data message before it was written into the random access memory, the storage address corresponding to that data message is faulty. The fault information of the corresponding random access memory location is then determined; it may be the storage address of the entry's data message in the random access memory. In this way, comparing the two ECC check results for each entry read reveals the specific faults in the random access memory.
If the ECC check result of the read data message of any entry is the same as the ECC check result of that entry's data message before it was written into the random access memory, the storage address corresponding to that data message is not faulty.
Note that if the two ECC check results are the same for every entry read, the fault detection result determined in this step is that the random access memory is not faulty.
After the ECC check result of the read data message of each entry has been compared with the ECC check result of that entry's data message before it was written into the random access memory, the data messages whose two ECC check results differ and the data messages whose two ECC check results match are counted separately. The network processor then stores the statistics and the determined fault information of the random access memory.
Further, after the network processor has stored the statistics and the determined fault information, the read data messages of the entries, the stored statistics, and the determined fault information are uploaded together to the control plane. By analyzing the received data, the control plane can determine the specifics of the failure in the random access memory; for example, it can quickly locate the faulty addresses.
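The per-entry comparison, the two counters, and the fault information passed to the control plane can be sketched as follows. `ScanReport` and `compare_entries` are hypothetical names, and the ECC results are taken as precomputed inputs keyed by storage address.

```python
from dataclasses import dataclass, field


@dataclass
class ScanReport:
    matched: int = 0                      # entries whose two ECC results agree
    mismatched: int = 0                   # entries whose two ECC results differ
    faulty_addresses: list = field(default_factory=list)

    @property
    def memory_ok(self) -> bool:
        """True only when every scanned entry matched."""
        return self.mismatched == 0


def compare_entries(ecc_before: dict, ecc_after: dict) -> ScanReport:
    """Compare pre-write and post-read ECC results for each scanned address."""
    report = ScanReport()
    for address, after in ecc_after.items():
        if ecc_before[address] == after:
            report.matched += 1
        else:
            report.mismatched += 1
            report.faulty_addresses.append(address)  # fault information: the address
    return report


report = compare_entries({0: 3, 1: 5, 2: 6}, {0: 3, 1: 4, 2: 6})
print(report.matched, report.mismatched, report.faulty_addresses, report.memory_ok)
```

Only entries that were actually scanned appear in `ecc_after`, which matches the text's allowance for reading "at least one" entry rather than the whole table.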
In the first embodiment of the present invention, if the data messages received by the network processor in step 100 come from the packet generator, the detection is called fully automatic. In fully automatic mode, both the data messages and the scan messages acquired by the network processor are sent by the packet generator; specifically, the packet generator sends scan messages to the network processor immediately after it finishes sending the data messages.
If the data messages received by the network processor in step 100 come from the CPU, the detection is called semi-automatic. In semi-automatic mode, the CPU sends the data messages to the network processor, while the packet generator sends the scan messages.
In the first embodiment of the present invention, apart from the CPU delivering the write-table index of the data messages to the network processor and the control plane's final analysis of the failure, the entire process is carried out on the forwarding plane. Detection therefore not only avoids consuming CPU resources but also speeds up fault detection and shortens the time needed to locate a fault. In addition, in fully automatic mode, failures can be detected at any time, regardless of whether the forwarding plane of the network processor is carrying forwarding traffic.
Second embodiment
To better illustrate the purpose of the present invention, a further example is given on the basis of the first embodiment. In the second embodiment, the random access memory is DDR3, and faults in the DDR3 are detected in fully automatic mode.
FIG. 2 is a flowchart of the second embodiment of the method for detecting a random access memory failure according to the present invention. As shown in FIG. 2, the method includes:
Step 201: The DDR3 controller in the network processor enables the ECC check function.
Step 202: The packet generator sends the data message of the i-th entry to the DDR3 controller in the network processor; the initial value of i is 1.
Step 203: According to the read/write data bit width of X bits, the DDR3 controller extracts X bits of the data to be written from the data message of the i-th entry, where X is a natural number greater than 1 and the read/write data bit width can be preconfigured. The controller performs an ECC check on the extracted X bits and, based on the write-table index of the i-th entry's data message, writes the ECC check result of the extracted X bits together with the data to be written of the i-th entry's data message into the DDR3 storage space.
Step 204: Determine whether i equals N, where N is the total number of data message entries that the packet generator needs to send to the DDR3 controller. If i is not equal to N, repeat steps 202 to 203 for the next entry until i equals N. If i equals N, the network processor has finished writing the data messages into the DDR3, and step 205 is performed.
Step 205: The DDR3 controller in the network processor receives the scan message of the j-th entry from the packet generator; the initial value of j is 1.
Step 206: Based on the lookup table index in the scan message of the j-th entry, the DDR3 controller reads the data message of the corresponding entry from the DDR3. From the data message read out, it extracts the X bits of data of the corresponding entry and performs an ECC check on the extracted X bits of data.
Step 207: Compare the ECC check result of the data message read for the corresponding entry with the ECC check result obtained for that entry's data message before it was written to the DDR3. The comparison result is either that the two ECC check results are the same or that they differ.
If the ECC check result of the data message read for the corresponding entry is the same as the ECC check result obtained before the data message was written to the DDR3, the storage address in the random access memory corresponding to that entry's data message is not faulty. The data message read out is recorded as a data message without a random access memory fault, and step 208 is performed.
If the ECC check result of the data message read for the corresponding entry differs from the ECC check result obtained before the data message was written to the DDR3, the data message read out is recorded as a data message with a random access memory fault, and step 209 is performed.
Step 208: Count the data messages without a random access memory fault, then perform step 210.
Step 209: Count the data messages with a random access memory fault and determine the fault information of the random access memory corresponding to the data message of that entry. This fault information may be the storage address of the entry's data message in the random access memory. Then perform step 210.
Step 210: Determine whether j equals M, where M is the total number of scan message entries that the packet generator sends to the DDR3 controller. If j is not equal to M, repeat steps 205 to 209 until j equals M. If j equals M, the detection of random access memory faults is complete and the procedure ends.
It should be noted that, in step 210, M may be equal to N or less than N.
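As a rough illustration, steps 202 to 210 can be sketched in software as follows. This is only a sketch under stated assumptions: `SimulatedDdr3`, `run_detection`, and `ecc_check_code` are hypothetical names, X is assumed to be 64, and the check code (XOR of the positions of the set bits plus an overall parity bit) is a simple stand-in, since the text does not specify which ECC algorithm the DDR3 controller uses.

```python
X = 64  # assumed read/write bit width in bits; the text only requires X > 1


def ecc_check_code(word, width=X):
    """Stand-in check code (an assumption, not the controller's real ECC):
    XOR of the 1-based positions of all set bits, plus an overall parity bit."""
    syndrome = 0
    parity = 0
    for pos in range(width):
        if (word >> pos) & 1:
            syndrome ^= pos + 1
            parity ^= 1
    return (parity << 7) | syndrome


class SimulatedDdr3:
    """Hypothetical DDR3 model: reads from addresses listed in flip_bit_at
    return the stored data with one bit inverted, imitating a faulty cell."""

    def __init__(self, flip_bit_at=None):
        self.cells = {}
        self.flip_bit_at = flip_bit_at or {}

    def write(self, index, data, ecc):
        self.cells[index] = (data, ecc)

    def read(self, index):
        data, ecc = self.cells[index]
        if index in self.flip_bit_at:
            data ^= 1 << self.flip_bit_at[index]
        return data, ecc


def run_detection(ddr3, data_messages, scan_indexes):
    # Steps 202-204: write each entry together with its ECC check result.
    for write_index, payload in data_messages:
        ddr3.write(write_index, payload, ecc_check_code(payload))

    ok_count, fault_count, fault_addresses = 0, 0, []
    # Steps 205-210: one scan message per lookup table index.
    for lookup_index in scan_indexes:
        data, ecc_at_write = ddr3.read(lookup_index)   # step 206
        if ecc_check_code(data) == ecc_at_write:       # step 207
            ok_count += 1                              # step 208
        else:
            fault_count += 1                           # step 209
            fault_addresses.append(lookup_index)
    return ok_count, fault_count, fault_addresses


N = 8
messages = [(index, 0xA5A5000000000000 | index) for index in range(N)]
ddr3 = SimulatedDdr3(flip_bit_at={3: 17})  # address 3 has one bad bit
result = run_detection(ddr3, messages, scan_indexes=range(N))
print(result)  # (7, 1, [3])
```

Here M equals N because every written index is scanned; passing a shorter `scan_indexes` sequence would correspond to the case where M is less than N.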
The methods for detecting random access memory faults provided by the two embodiments above compare the ECC check result obtained when a data message is written with the ECC check result obtained when it is read, which makes it possible to quickly locate where the random access memory has failed.
Third embodiment
Corresponding to the method of the first embodiment, an embodiment of the present invention further provides an apparatus for detecting random access memory faults. FIG. 3 is a schematic structural diagram of the apparatus for detecting random access memory faults according to an embodiment of the present invention. As shown in FIG. 3, the apparatus includes an obtaining module 300, a first check module 301, a second check module 302, and a fault determining module 303, where:
The obtaining module 300 is configured to obtain a data message.
The first check module 301 is configured to perform an error checking and correction (ECC) check on the data message and write the data message into the random access memory.
The second check module 302 is configured to read the data message from the random access memory and perform an ECC check on the data message read out.
The fault determining module 303 is configured to determine the fault detection result of the random access memory by comparing the ECC check result of the data message read out with the ECC check result of the data message before it was written into the random access memory.
Specifically, the obtaining module 300 is configured to sequentially obtain the data messages of multiple entries.
The first check module 301 is configured to perform an ECC check on the data message of each entry and write the data message of each entry into the random access memory.
The second check module 302 is configured to read the data message of at least one entry from the random access memory and perform an ECC check on the data message of each entry read out.
The fault determining module 303 is configured to determine the fault detection result of the random access memory corresponding to the data message of each entry by comparing the ECC check result of the data message read out for that entry with the ECC check result of the same entry's data message before it was written into the random access memory.
Specifically, among the sequentially obtained data messages of multiple entries, the data message of each entry includes a write table index and data to be written, where the write table index carries the address information of where the data to be written needs to be stored in the random access memory.
The first check module 301 is configured to write the data to be written of the data message of the corresponding entry into the random access memory based on the write table index of that data message.
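The relationship between the write table index and the data to be written can be sketched as below. The text does not define a wire format, so the 4-byte index followed by an 8-byte payload in big-endian order, as well as the names `DataMessage` and `write_entry`, are assumptions made purely for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DataMessage:
    write_index: int  # write table index: where the payload must be stored
    payload: int      # data to be written

    @classmethod
    def from_bytes(cls, raw):
        # Hypothetical wire layout: 4-byte index, then 8-byte payload, big-endian.
        write_index, payload = struct.unpack(">IQ", raw)
        return cls(write_index, payload)


def write_entry(ram, message):
    """Store the payload at the address named by the write table index."""
    ram[message.write_index] = message.payload


ram = {}
raw_messages = [struct.pack(">IQ", 0x10, 0xDEAD), struct.pack(">IQ", 0x20, 0xBEEF)]
for raw in raw_messages:
    write_entry(ram, DataMessage.from_bytes(raw))
print(ram)  # {16: 57005, 32: 48879}
```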
Specifically, the second check module 302 is further configured to receive at least one scan message and, based on each scan message, read the data message of the corresponding entry from the random access memory. Each scan message corresponds to the data message of one entry and includes a lookup table index used to look up the data message of the corresponding entry in the random access memory.
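A minimal sketch of how scan messages drive the reads might look like this. `ScanMessage` and `read_entries` are hypothetical names, and the dictionary stands in for the random access memory; the point is only that each scan message selects exactly one previously written entry through its lookup table index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanMessage:
    lookup_index: int  # lookup table index of the entry to read back


def read_entries(ram, scan_messages):
    """Return (lookup_index, stored entry) pairs in scan-message arrival order."""
    return [(scan.lookup_index, ram[scan.lookup_index]) for scan in scan_messages]


ram = {0: "entry-0", 1: "entry-1", 2: "entry-2"}   # N = 3 written entries
scans = [ScanMessage(2), ScanMessage(0)]           # M = 2 scan messages
print(read_entries(ram, scans))  # [(2, 'entry-2'), (0, 'entry-0')]
```

Because each scan message names its own entry, the number of scan messages does not have to match the number of written entries.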
Specifically, the second check module 302 is configured to use the network processor to sequentially receive at least one scan message from the packet generator. The bandwidth at which the packet generator sends scan messages each time is determined according to the traffic of data being forwarded by the network processor at the current moment, as obtained by the packet generator.
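The text only states that the scan bandwidth depends on the current forwarding traffic; it does not give a formula. One plausible policy, shown here strictly as an assumption, is to give scanning whatever capacity forwarding leaves unused while keeping a small floor so that scanning never stops. The function name `scan_bandwidth` and all parameters are hypothetical.

```python
def scan_bandwidth(link_capacity_mbps, forwarding_traffic_mbps, floor_mbps=1.0):
    """Assumed policy: scan messages use the spare capacity left by forwarding
    traffic, but never less than floor_mbps."""
    spare = link_capacity_mbps - forwarding_traffic_mbps
    return max(floor_mbps, spare)


print(scan_bandwidth(10000, 9500))   # 500
print(scan_bandwidth(10000, 10000))  # 1.0
print(scan_bandwidth(10000, 0))      # 10000
```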
Specifically, the fault determining module 303 is configured to determine the fault information of the random access memory corresponding to the data message of an entry when the ECC check result of the data message read out for that entry differs from the ECC check result of the same entry's data message before it was written into the random access memory.
The apparatus for detecting random access memory faults provided by this embodiment of the present invention compares the ECC check result obtained when a data message is written with the ECC check result obtained when it is read, which makes it possible to quickly locate where the random access memory has failed and improves the efficiency of locating random access memory faults.
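The division of work among modules 300 to 303 can be pictured with the following sketch. It is not the apparatus itself: the class `RamFaultDetector`, its method names, the separate `written_ecc` table, and the single-parity-bit `parity` function used as the check are all assumptions chosen to keep the example short.

```python
class RamFaultDetector:
    def __init__(self, ecc):
        self.ecc = ecc          # check function, same one for write and read
        self.ram = {}           # stands in for the random access memory
        self.written_ecc = {}   # ECC check results saved at write time

    def acquire(self, messages):              # obtaining module 300
        return list(messages)

    def check_and_write(self, index, data):   # first check module 301
        self.written_ecc[index] = self.ecc(data)
        self.ram[index] = data

    def read_and_check(self, index):          # second check module 302
        return self.ecc(self.ram[index])

    def is_faulty(self, index):               # fault determining module 303
        return self.read_and_check(index) != self.written_ecc[index]


def parity(data):
    """Simplest possible stand-in check: overall parity of the data bits."""
    return bin(data).count("1") & 1


detector = RamFaultDetector(parity)
for index, data in detector.acquire([(0, 0b1011), (1, 0b0110)]):
    detector.check_and_write(index, data)
detector.ram[1] ^= 0b0001   # simulate a single-bit fault at address 1
faults = [index for index in (0, 1) if detector.is_faulty(index)]
print(faults)  # [1]
```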
In practical applications, the obtaining module 300, the first check module 301, the second check module 302, and the fault determining module 303 may each be implemented by a central processing unit (CPU), a microprocessor unit (MPU), a digital signal processor (DSP), a field programmable gate array (FPGA), or the like located in the network processor.
Fourth embodiment
An embodiment of the present invention further provides a network processor that includes any of the apparatuses for detecting random access memory faults described in the third embodiment of the present invention.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above are merely preferred embodiments of the present invention and are not intended to limit the scope of protection of the present invention.
Industrial applicability
Embodiments of the present invention obtain the fault detection result of the random access memory by comparing the error checking and correction (ECC) check result of a data message before it is written into the random access memory with the ECC check result of the data message when it is read out. Because the detection of random access memory faults can be completed without the participation of the CPU, the consumption of CPU resources is reduced. In addition, compared with existing schemes that compare the written and read test data themselves, embodiments of the present invention only need to compare the ECC check results of the corresponding data, which reduces the amount of computation, improves the efficiency of detecting random access memory faults, and shortens the time needed to locate the fault position in the random access memory.
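The reduction in comparison work can be made concrete with a small calculation. The figures below are assumptions, not values from the text: 1024 scanned entries, 64 data bits per entry, and an 8-bit check code per 64-bit word (the ratio used by common 72-bit ECC memory layouts).

```python
def bits_compared(entries, data_bits, check_bits):
    """Return (bits compared when comparing raw data,
               bits compared when comparing only ECC check results)."""
    return entries * data_bits, entries * check_bits


full, ecc_only = bits_compared(entries=1024, data_bits=64, check_bits=8)
print(full, ecc_only, full // ecc_only)  # 65536 8192 8
```

Under these assumed figures, comparing check results touches one eighth of the bits that a full data comparison would.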

Claims (15)

  1. A method for detecting a random access memory fault, comprising:
    obtaining a data message;
    performing an error checking and correction (ECC) check on the data message, and writing the data message into a random access memory;
    reading the data message from the random access memory, and performing an ECC check on the data message read out;
    determining a fault detection result of the random access memory by comparing the ECC check result of the data message before it was written into the random access memory with the ECC check result of the data message read out.
  2. The method according to claim 1, wherein obtaining the data message comprises: sequentially obtaining data messages of multiple entries;
    performing the ECC check on the data message and writing the data message into the random access memory comprises: performing an ECC check on the data message of each entry, and writing the data message of each entry into the random access memory;
    reading the data message from the random access memory and performing the ECC check on the data message read out comprises: reading the data message of at least one entry from the random access memory, and performing an ECC check on the data message of each entry read out;
    determining the fault detection result of the random access memory by comparing the ECC check result of the data message before it was written into the random access memory with the ECC check result of the data message read out comprises: determining the fault detection result of the random access memory corresponding to the data message of each entry by comparing the ECC check result of the data message read out for that entry with the ECC check result of the same entry's data message before it was written into the random access memory.
  3. The method according to claim 2, wherein, among the sequentially obtained data messages of multiple entries, the data message of each entry includes a write table index and data to be written, the write table index carrying address information of where the data to be written needs to be stored in the random access memory;
    writing the data message of each entry into the random access memory comprises: writing the data to be written of the data message of the corresponding entry into the random access memory based on the write table index of that data message.
  4. The method according to claim 2, wherein reading the data message of at least one entry from the random access memory comprises: sequentially receiving, by a network processor, at least one scan message from a packet generator, each scan message sent by the packet generator including a lookup table index used to look up the data of the corresponding entry in the random access memory;
    reading the data message of at least one entry from the random access memory further comprises: reading, by the network processor, the data of the corresponding entry from the random access memory based on each received scan message.
  5. The method according to claim 4, wherein the bandwidth at which the packet generator sends scan messages each time is determined according to the traffic of data being forwarded by the network processor at the current moment, as obtained by the packet generator.
  6. The method according to claim 2, wherein determining the fault detection result of the random access memory corresponding to the data message of each entry by comparing the ECC check result of the data message read out for that entry with the ECC check result of the same entry's data message before it was written into the random access memory comprises: if the ECC check result of the data message read out for any entry differs from the ECC check result of that entry's data message before it was written into the random access memory, determining the fault information of the random access memory corresponding to the data message of that entry.
  7. The method according to any one of claims 1 to 6, wherein the ECC check algorithm applied to the data message before it is written into the random access memory is the same as the ECC check algorithm applied to the data message read out.
  8. The method according to any one of claims 1 to 6, wherein, while the data message is written into the random access memory, the method further comprises: storing the ECC check result of the data message obtained before it is written into the random access memory.
  9. An apparatus for detecting a random access memory fault, comprising an obtaining module, a first check module, a second check module, and a fault determining module, wherein:
    the obtaining module is configured to obtain a data message;
    the first check module is configured to perform an error checking and correction (ECC) check on the data message and write the data message into a random access memory;
    the second check module is configured to read the data message from the random access memory and perform an ECC check on the data message read out;
    the fault determining module is configured to determine a fault detection result of the random access memory by comparing the ECC check result of the data message read out with the ECC check result of the data message before it was written into the random access memory.
  10. The apparatus according to claim 9, wherein the obtaining module is further configured to sequentially obtain data messages of multiple entries;
    the first check module is further configured to perform an ECC check on the data message of each entry and write the data message of each entry into the random access memory;
    the second check module is further configured to read the data message of at least one entry from the random access memory and perform an ECC check on the data message of each entry read out;
    the fault determining module is further configured to determine the fault detection result of the random access memory corresponding to the data message of each entry by comparing the ECC check result of the data message read out for that entry with the ECC check result of the same entry's data message before it was written into the random access memory.
  11. The apparatus according to claim 10, wherein, among the sequentially obtained data messages of multiple entries, the data message of each entry includes a write table index and data to be written, the write table index carrying address information of where the data to be written needs to be stored in the random access memory;
    the first check module is further configured to write the data to be written of the data message of the corresponding entry into the random access memory based on the write table index of that data message.
  12. The apparatus according to claim 10, wherein the second check module is further configured to sequentially receive at least one scan message from a packet generator, each scan message sent by the packet generator including a lookup table index used to look up the data of the corresponding entry in the random access memory;
    the second check module is further configured to read the data of the corresponding entry from the random access memory based on each received scan message.
  13. The apparatus according to claim 12, wherein the bandwidth at which the packet generator sends scan messages each time is determined according to the traffic of data being forwarded by the network processor at the current moment, as obtained by the packet generator.
  14. The apparatus according to claim 10, wherein the fault determining module is further configured to determine the fault information of the random access memory corresponding to the data message of an entry when the ECC check result of the data message read out for that entry differs from the ECC check result of the same entry's data message before it was written into the random access memory.
  15. A network processor, comprising the apparatus according to any one of claims 9 to 14.
PCT/CN2016/088142 2015-07-23 2016-07-01 Method and apparatus for detecting failure of random memory, and processor WO2017012460A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510437309.1 2015-07-23
CN201510437309.1A CN106373616B (en) 2015-07-23 2015-07-23 Method and device for detecting faults of random access memory and network processor

Publications (1)

Publication Number Publication Date
WO2017012460A1 true WO2017012460A1 (en) 2017-01-26

Family

ID=57833703

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/088142 WO2017012460A1 (en) 2015-07-23 2016-07-01 Method and apparatus for detecting failure of random memory, and processor

Country Status (2)

Country Link
CN (1) CN106373616B (en)
WO (1) WO2017012460A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110442298B (en) * 2018-05-02 2021-01-12 杭州海康威视系统技术有限公司 Storage equipment abnormality detection method and device and distributed storage system
CN109545268A (en) * 2018-11-05 2019-03-29 西安智多晶微电子有限公司 A method of test RAM
CN111586349B (en) * 2020-04-16 2022-01-11 浙江大华技术股份有限公司 Data outage and continuous transmission method and system for monitoring equipment
CN112420114B (en) * 2020-11-04 2023-07-18 深圳市宏旺微电子有限公司 Fault detection method and device for memory chip
CN117079703B (en) * 2023-10-17 2024-02-02 紫光同芯微电子有限公司 Method and device for testing embedded memory of chip and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060107175A1 (en) * 2004-10-29 2006-05-18 International Business Machines Corporation System, method and storage medium for providing fault detection and correction in a memory subsystem
CN101681283A (en) * 2007-06-28 2010-03-24 国际商业机器公司 System and method for error correction and detection in a memory system
CN102135925A (en) * 2010-12-27 2011-07-27 西安锐信科技有限公司 Method and device for detecting error check and correcting memory
CN102646453A (en) * 2011-02-18 2012-08-22 安凯(广州)微电子技术有限公司 Method and system for testing error correcting code module in NandFlash controller
CN104240768A (en) * 2013-06-13 2014-12-24 英飞凌科技股份有限公司 Method for testing a memory and memory system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103200048B (en) * 2013-04-02 2018-07-13 中兴通讯股份有限公司 A kind of network processing unit method for detecting abnormality, device and network processing device
CN104519516B (en) * 2013-09-29 2018-11-09 华为技术有限公司 The method and device that memory is tested
CN104317525B (en) * 2014-09-23 2017-08-11 天津国芯科技有限公司 The extended method and device of a kind of random access memory

Also Published As

Publication number Publication date
CN106373616A (en) 2017-02-01
CN106373616B (en) 2020-02-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16827151

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16827151

Country of ref document: EP

Kind code of ref document: A1