WO2020087956A1 - Method, apparatus, device and system for capturing trace of NVME hard disk - Google Patents

Method, apparatus, device and system for capturing trace of NVME hard disk

Info

Publication number
WO2020087956A1
Authority
WO
WIPO (PCT)
Prior art keywords
hard disk
nvme hard
trace
error
register
Prior art date
Application number
PCT/CN2019/093360
Other languages
English (en)
French (fr)
Inventor
孙一心
Original Assignee
郑州云海信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 郑州云海信息技术有限公司
Priority to US17/275,827 (US11442831B2)
Publication of WO2020087956A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2284 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by power-on test, e.g. power-on self test [POST]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0766 Error or fault reporting or storing
    • G06F11/0772 Means for error signaling, e.g. using interrupts, exception flags, dedicated error registers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3466 Performance evaluation by tracing or monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221 Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/42 Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282 Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • G06F13/4295 Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus using an embedded synchronisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026 PCI express

Definitions

  • The embodiments of the present invention relate to the technical field of server applications, and in particular to a method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces.
  • NVME: Non-Volatile Memory Express
  • PCIe: Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard
  • Common faults of NVME hard disks in server systems include uncorrectable fatal errors, correctable non-fatal errors, correctable errors, etc.
  • The symptoms are usually a dropped disk, a system crash, reduced speed, etc.
  • The related technology uses the trigger function of the PCIe protocol analyzer to capture the actual operating data of the interface protocol of the NVME hard disk (that is, the trace of the NVME hard disk) to analyze various incompatibility errors.
  • BIOS: Basic Input Output System
  • PCH: Platform Controller Hub, commonly known as the integrated south bridge
  • GPIO: General Purpose Input Output, a general-purpose input/output or bus expander
  • This method requires a BIOS customized for each fault type.
  • Alternatively, the protocol analyzer manufacturer's original software trigger is used.
  • With that trigger, the trigger types are limited: it can fire only on a few fixed error types and does not adapt well to the complex errors that occur in complex server systems. Because the PCIe interface has a high transfer rate and the protocol analyzer's own buffer capacity is limited, if the trigger cannot fire effectively on the specific error type, it is often impossible to capture a useful PCIe trace for analysis. Worse still, some errors are hard to reproduce. Once missed, a long wait is needed, wasting a great deal of manpower and material resources.
  • The embodiments of the present disclosure provide a method, apparatus, device, system and computer-readable storage medium for capturing NVME hard disk traces, which capture the PCIe trace corresponding to an NVME hard disk fault accurately and effectively without customizing a BIOS or trigger for each fault type.
  • The embodiments of the present invention provide the following technical solutions:
  • An aspect of an embodiment of the present invention provides a system for capturing NVME hard disk traces, including a BMC, a BIOS, a jig board, and a protocol analyzer.
  • The BMC is connected to the jig board and to the BIOS.
  • The jig board is connected to the protocol analyzer;
  • The BIOS is used to collect register error information of the PCIe link when an error occurs on the PCIe link where the NVME hard disk is located, and to send the register error information to the BMC; the BMC is used to send the register error information to the jig board;
  • The jig board includes a processor and a DIP switch, and is used to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type currently selected on the DIP switch is consistent with the error type that the processor parses from the register error information.
  • The GPIO pin of the jig board is connected to the trigger connector of the protocol analyzer, and the protocol analyzer is triggered to capture the PCIe trace of the NVME hard disk by setting the GPIO pin of the jig board.
  • The RS-232 interface of the BMC on the server motherboard is connected to the input of the jig board through a cable, and the jig board and the protocol analyzer are connected through a coaxial cable.
  • The BIOS sends the register error information to the BMC through a KCS link.
  • Another aspect of the embodiments of the present invention provides a method for capturing NVME hard disk traces, which is applied to a jig board and includes:
  • Acquiring register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located;
  • The jig board is connected to the protocol analyzer, and the register error information is sent by the BIOS to the jig board through the BMC; the address information of the register error information has a corresponding relationship with the error type.
  • The GPIO pin of the jig board is connected to the trigger connector of the protocol analyzer, and triggering the protocol analyzer to capture the PCIe trace of the NVME hard disk includes sending an instruction to set the GPIO pin.
  • The BIOS sends the register error information to the BMC through the KCS link, and the BMC sends the received register error information to the jig board through the RS-232 serial port.
  • An embodiment of the present invention also provides an apparatus for capturing NVME hard disk traces, which is applied to a jig board and includes:
  • An information acquisition module, used to acquire register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located; the register error information is sent by the BIOS to the jig board through the BMC; the jig board is connected to the protocol analyzer;
  • An error type analysis module, used to parse the address information carried by the register error information to obtain the corresponding error type; the address information of the register error information has a corresponding relationship with the error type;
  • An error type matching result judgment module, used to judge whether the error type is consistent with the error type currently selected on the DIP switch;
  • A trigger module, used to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type is consistent with the error type currently selected on the DIP switch.
  • An embodiment of the present invention also provides a device for capturing NVME hard disk traces, including a processor which, when executing a computer program stored in a memory, implements the steps of the method for capturing NVME hard disk traces described in any of the preceding items.
  • An embodiment of the present invention finally provides a computer-readable storage medium, which stores a program for capturing the NVME hard disk trace.
  • When executed by a processor, the program for capturing the NVME hard disk trace implements the steps of the method for capturing the NVME hard disk trace described in any of the preceding items.
  • An embodiment of the present invention provides a system for capturing NVME hard disk traces, including a BMC, a BIOS, a protocol analyzer, and a jig board including a processor and a DIP switch.
  • When an error occurs on the PCIe link where the NVME hard disk is located, the BIOS collects the register error information of the PCIe link and sends it to the BMC, and the BMC then sends the received information to the jig board; when the error type currently selected on the DIP switch is consistent with the error type that the processor parses from the register error information, the jig board triggers the protocol analyzer to capture the PCIe trace of the NVME hard disk.
  • The advantage of the technical solution provided by this application is that the BIOS monitors the operating status of the PCIe link where the NVME hard disk is located and promptly captures the register error information when a fault occurs.
  • The processor of the jig board parses the type of this register error information.
  • The embodiments of the present invention also provide a corresponding apparatus, device, and computer-readable storage medium for the method of capturing NVME hard disk traces, which makes the method more practical.
  • The apparatus, device, and computer-readable storage medium have the corresponding advantages.
  • FIG. 1 is a schematic structural diagram of a system for capturing an NVME hard disk trace according to an exemplary embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of a method for capturing a trace of a NVME hard disk provided by an embodiment of the present invention
  • FIG. 3 is a structural diagram of a specific implementation manner of an apparatus for capturing NVME hard disk traces according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a system for capturing a NVME hard disk trace according to an embodiment of the present invention.
  • the embodiment of the present invention may include the following:
  • The system for capturing NVME hard disk traces may include a BMC (Baseboard Management Controller) 1, a BIOS 2, a jig board 3 and a protocol analyzer 4.
  • BMC: Baseboard Management Controller
  • BMC 1 is connected to the jig board 3 and to BIOS 2, the jig board 3 is connected to the protocol analyzer 4, and BIOS 2 is connected to the NVME hard disk.
  • BIOS 2 can communicate with BMC 1 via a KCS link.
  • The RS-232 interface of BMC 1 on the server motherboard can be connected to the input of the jig board 3 through a cable.
  • The jig board 3 can use its GPIO pin as an output, which can be connected to the trigger connector of the protocol analyzer 4 through a coaxial cable.
  • BIOS 2 monitors the PCIe link connected to the NVME hard disk in real time. When an error occurs on the PCIe link where the NVME hard disk is located, that is, when a PCIe error is reported on the link, BIOS 2 collects the register error information of the PCIe link and can send it to BMC 1 through the KCS link.
  • How BIOS 2 detects errors on the PCIe link and collects the register error information of the PCIe link is described in the related art and is not repeated here.
  • After receiving the information sent by BIOS 2, BMC 1 can send the received register error information to the jig board 3 through the serial port (RS-232).
  • The jig board 3 includes a processor and a DIP switch, and each position of the DIP switch corresponds to one register error type, such as unsupported request, bad TLP, bad DLLP, malformed TLP, and so on.
  • The user can select one or more register error types to trigger on at the same time through the DIP switch; that is, the user can select, through the DIP switch, the test options for incompatibility between the NVME hard disk and the server system.
  • The processor of the jig board 3 can parse the received register error information to obtain the corresponding error type.
  • The processor can obtain the error type from the address information carried in the register error information.
  • The address information carried in the register error information can be a custom address, and each address corresponds uniquely to one error type. Note that the address information here is different from the register's address in the configuration space.
  • The address information carried by the register error information is a set of custom addresses. For example, when the carried address is 11122, the corresponding register error information is of type A, and when the carried address is 11221, it is of type B.
  • The processor triggers the protocol analyzer 4 to capture the PCIe trace of the NVME hard disk, for example by setting an output pin.
  • The BIOS is used to monitor the operating status of the PCIe link where the NVME hard disk is located and to promptly capture the register error information when a fault occurs.
  • The processor of the jig board parses the type of this register error information and, by comparing it with the error type selected on the DIP switch, controls the protocol analyzer's trigger to capture the trace. This achieves accurate and efficient capture of the PCIe trace of the NVME hard disk and removes the need in the related art to customize a BIOS or trigger for each error type.
  • Users can also freely choose, through the DIP switch, which error types to capture so that the corresponding PCIe trace can be analyzed, which helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
  • FIG. 2 is a schematic flowchart of a method for capturing a NVME hard disk trace provided by an embodiment of the present invention. For example, it can be used for the jig board of the foregoing embodiment.
  • the register error information is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located.
  • the address information of the register error information has a corresponding relationship with the error type.
  • S203 Determine whether the error type is consistent with the current corresponding error type of the DIP switch, and if so, execute S204.
  • S204 Trigger the protocol analyzer to grab the PCIe trace of the NVME hard disk.
  • The jig board is connected to the protocol analyzer, and the register error information is sent by the BIOS to the jig board through the BMC.
  • The BIOS sends the register error information to the BMC through the KCS link.
  • The BMC sends the received register error information to the jig board through the RS-232 serial port.
  • The GPIO pin of the jig board is connected to the trigger connector of the protocol analyzer.
  • S204 can be sending an instruction to set the GPIO pin, so as to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
  • the embodiments of the present invention do not need to customize the corresponding BIOS or trigger for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • the embodiment of the present invention also provides a corresponding implementation device for the method of capturing the NVME hard disk trace, which further makes the method more practical.
  • the following describes an apparatus for capturing an NVME hard disk trace provided by an embodiment of the present invention.
  • the apparatus for capturing an NVME hard disk trace described below and the method for capturing an NVME hard disk trace described above may refer to each other.
  • FIG. 3 is a structural diagram of an apparatus for capturing an NVME hard disk trace according to an embodiment of the present invention, and the apparatus may include:
  • The information obtaining module 301, used to obtain register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located; the register error information is sent by the BIOS to the jig board through the BMC; the jig board is connected to the protocol analyzer.
  • The error type analysis module 302, used to parse the address information carried by the register error information to obtain the corresponding error type; the address information of the register error information has a corresponding relationship with the error type.
  • The error type matching result judgment module 303, used to judge whether the error type is consistent with the error type currently selected on the DIP switch.
  • The trigger module 304, used to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type is consistent with the error type currently selected on the DIP switch.
  • The trigger module 304 may, for example, be a module that, with the GPIO pin of the jig board connected to the trigger connector of the protocol analyzer, sends an instruction to set the GPIO pin so as to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
  • The function of each functional module of the apparatus for capturing NVME hard disk traces may be implemented according to the method in the above method embodiments.
  • For the specific implementation process, refer to the related descriptions in the above method embodiments; details are not repeated here.
  • the embodiments of the present invention do not need to customize the corresponding BIOS or trigger for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • An embodiment of the present invention also provides a device for capturing NVME hard disk traces, which may specifically include:
  • A memory, used to store a computer program;
  • A processor, used to execute the computer program to implement the steps of the method for capturing the NVME hard disk trace described in any one of the above embodiments.
  • The embodiments of the present invention do not need to customize the corresponding BIOS or trigger for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • An embodiment of the present invention also provides a computer-readable storage medium that stores a program for capturing an NVME hard disk trace.
  • When the program for capturing an NVME hard disk trace is executed by a processor, the steps of the method for capturing the NVME hard disk trace described in any of the above embodiments are implemented.
  • the embodiments of the present invention do not need to customize the corresponding BIOS or trigger for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • RAM: random access memory
  • ROM: read-only memory
  • Electrically programmable ROM, electrically erasable programmable ROM
  • Registers, hard disks, removable disks, CD-ROMs, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

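A method, apparatus, device, system and computer-readable storage medium for capturing an NVME hard disk trace. The system includes a BMC, a BIOS, a protocol analyzer, and a jig board containing a processor and a DIP switch. When an error occurs on the PCIe link where the NVME hard disk is located, the BIOS collects the register error information of the PCIe link and sends it to the BMC, and the BMC then sends the received information to the jig board. When the error type currently selected on the DIP switch is consistent with the error type that the processor parses from the register error information, the jig board triggers the protocol analyzer to capture the PCIe trace of the NVME hard disk. The technical solution provided by this application does not need a BIOS or trigger customized for each fault type, captures the PCIe trace corresponding to an NVME hard disk fault accurately and effectively, and helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.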

Description

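Method, apparatus, device and system for capturing trace of NVME hard disk
This application claims priority to Chinese patent application No. 201811295906.5, entitled "Method, apparatus, device and system for capturing trace of NVME hard disk", filed with the Chinese Patent Office on November 1, 2018, the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present invention relate to the technical field of server applications, and in particular to a method, apparatus, device, system and computer-readable storage medium for capturing an NVME hard disk trace.
Background
The NVME (Non-Volatile Memory Express, a non-volatile memory host controller interface specification) hard disk is currently a relatively high-end, high-performance type of hard disk in the server field. Its interface is based on the PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) protocol and features a high interface rate and fast IO throughput. Because NVME hard disks perform well, they also place relatively high demands on server system compatibility.
Common faults of NVME hard disks in server systems include uncorrectable fatal errors, correctable non-fatal errors and correctable errors, which usually show up as a dropped disk, a system crash, reduced speed, and so on. When testing the compatibility of an NVME hard disk in a server system, it is often necessary to run long tests and debugging sessions with various models to find the cause of incompatibility between the NVME hard disk and the server system, so as to ensure the availability of the server system.
When debugging an NVME hard disk, the related art uses the trigger function built into a PCIe protocol analyzer to capture the actual running data of the NVME hard disk's interface protocol (that is, to capture the trace of the NVME hard disk) in order to analyze various incompatibility errors. For a specific fault type, the BIOS (Basic Input Output System) can be used to detect a PCIe error and parse out the specific fault, and then fire the external trigger of the protocol analyzer through a GPIO (General Purpose Input Output, a general-purpose input/output or bus expander) on the PCH (Platform Controller Hub, commonly known as the integrated south bridge). This method requires a BIOS customized for each fault type.
With the analyzer manufacturer's original software trigger, the trigger types are limited: triggering is possible only on a few fixed error types, which does not adapt well to the complex errors that occur in complex server systems. Because the PCIe interface has a high transfer rate and the protocol analyzer's own buffer capacity is limited, if triggering cannot be done effectively for the specific error type, it is often impossible to capture a useful PCIe trace for analysis. Worse still, some errors are hard to reproduce; once missed, a long wait is needed, wasting a great deal of manpower and material resources.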
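To put rough numbers on the buffer problem, the sketch below estimates how quickly a saturated link fills an analyzer's capture buffer. The PCIe 3.0 line rate (8 GT/s per lane) and 128b/130b encoding are standard figures; the x4 link width, the 8 GiB buffer and the function names are assumptions chosen only for illustration, and packet overhead and analyzer-side filtering are ignored.

```python
# Back-of-the-envelope estimate: how long a capture buffer lasts on a busy link.
# Link width and buffer size are illustrative assumptions, not figures from
# the application.


def lane_bytes_per_second(gigatransfers: float, payload_bits: int, encoded_bits: int) -> float:
    """Usable bytes per second on one lane, one direction, after line encoding."""
    return gigatransfers * 1e9 * payload_bits / encoded_bits / 8


def seconds_until_full(buffer_bytes: int, lanes: int, per_lane: float, directions: int = 2) -> float:
    """Time for a fully loaded link to fill the capture buffer."""
    return buffer_bytes / (per_lane * lanes * directions)


GEN3_LANE = lane_bytes_per_second(8.0, 128, 130)  # PCIe 3.0: 8 GT/s, 128b/130b
BUFFER_BYTES = 8 * 2**30                           # hypothetical 8 GiB buffer
FILL_SECONDS = seconds_until_full(BUFFER_BYTES, lanes=4, per_lane=GEN3_LANE)

if __name__ == "__main__":
    print(f"An x4 Gen3 link fills the buffer in about {FILL_SECONDS:.2f} s")
```

Under these assumptions the buffer holds only about one second of traffic, which is why a trigger that fires close to the error matters.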
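Summary
The embodiments of the present disclosure provide a method, apparatus, device, system and computer-readable storage medium for capturing an NVME hard disk trace, which capture the PCIe trace corresponding to an NVME hard disk fault accurately and effectively without customizing a BIOS or trigger for each fault type.
To solve the above technical problem, the embodiments of the present invention provide the following technical solutions:
In one aspect, an embodiment of the present invention provides a system for capturing an NVME hard disk trace, including a BMC, a BIOS, a jig board and a protocol analyzer, where the BMC is connected to the jig board and to the BIOS, and the jig board is connected to the protocol analyzer;
the BIOS is used to collect register error information of the PCIe link when an error occurs on the PCIe link where the NVME hard disk is located, and to send the register error information to the BMC; the BMC is used to send the register error information to the jig board;
the jig board includes a processor and a DIP switch, and is used to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type currently selected on the DIP switch is consistent with the error type that the processor parses from the register error information.
Optionally, a GPIO pin of the jig board is connected to the trigger connector of the protocol analyzer, and triggering the protocol analyzer to capture the PCIe trace of the NVME hard disk is:
setting the GPIO pin of the jig board to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
Optionally, the RS-232 interface of the BMC on the server motherboard is connected to the input of the jig board through a cable, and the jig board is connected to the protocol analyzer through a coaxial cable.
Optionally, the BIOS sends the register error information to the BMC through a KCS link.
In another aspect, an embodiment of the present invention provides a method for capturing an NVME hard disk trace, applied to a jig board, including:
acquiring register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located;
parsing the address information carried by the register error information to obtain the corresponding error type;
judging whether the error type is consistent with the error type currently selected on the DIP switch;
if so, triggering the protocol analyzer to capture the PCIe trace of the NVME hard disk;
where the jig board is connected to the protocol analyzer, and the register error information is sent by the BIOS to the jig board through the BMC; the address information of the register error information has a corresponding relationship with the error type.
Optionally, a GPIO pin of the jig board is connected to the trigger connector of the protocol analyzer, and triggering the protocol analyzer to capture the PCIe trace of the NVME hard disk includes:
sending an instruction to set the GPIO pin, so as to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
Optionally, the BIOS sends the register error information to the BMC through a KCS link, and the BMC sends the received register error information to the jig board through an RS-232 serial port.
An embodiment of the present invention also provides an apparatus for capturing an NVME hard disk trace, applied to a jig board, including:
an information acquisition module, used to acquire register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located; the register error information is sent by the BIOS to the jig board through the BMC; the jig board is connected to the protocol analyzer;
an error type parsing module, used to parse the address information carried by the register error information to obtain the corresponding error type; the address information of the register error information has a corresponding relationship with the error type;
an error type matching result judgment module, used to judge whether the error type is consistent with the error type currently selected on the DIP switch;
a trigger module, used to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type is consistent with the error type currently selected on the DIP switch.
An embodiment of the present invention also provides a device for capturing an NVME hard disk trace, including a processor which, when executing a computer program stored in a memory, implements the steps of the method for capturing an NVME hard disk trace described in any of the preceding items.
Finally, an embodiment of the present invention provides a computer-readable storage medium storing a program for capturing an NVME hard disk trace; when executed by a processor, the program implements the steps of the method for capturing an NVME hard disk trace described in any of the preceding items.
An embodiment of the present invention provides a system for capturing an NVME hard disk trace, including a BMC, a BIOS, a protocol analyzer, and a jig board containing a processor and a DIP switch. When an error occurs on the PCIe link where the NVME hard disk is located, the BIOS collects the register error information of the PCIe link and sends it to the BMC, and the BMC then sends the received information to the jig board. When the error type currently selected on the DIP switch is consistent with the error type that the processor parses from the register error information, the jig board triggers the protocol analyzer to capture the PCIe trace of the NVME hard disk.
The advantage of the technical solution provided by this application is that the BIOS monitors the running status of the PCIe link where the NVME hard disk is located and promptly captures the register error information when a fault occurs. The processor of the jig board parses the type of this register error information and, by comparing it with the error type selected on the DIP switch, controls the protocol analyzer's trigger to capture the trace. This achieves accurate and efficient capture of the PCIe trace of the NVME hard disk. It removes the need in the related art to customize a BIOS or trigger for each error type, and it lets the user freely choose, through the DIP switch, which error types to capture so that the corresponding PCIe trace can be analyzed, which helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
In addition, the embodiments of the present invention provide a corresponding apparatus, device and computer-readable storage medium for the method of capturing an NVME hard disk trace, which makes the method more practical; the apparatus, device and computer-readable storage medium have the corresponding advantages.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit the present disclosure.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of a system for capturing an NVME hard disk trace according to an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a method for capturing an NVME hard disk trace provided by an embodiment of the present invention;
FIG. 3 is a structural diagram of a specific implementation of an apparatus for capturing an NVME hard disk trace provided by an embodiment of the present invention.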
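Detailed Description
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth" and the like in the specification, claims and drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
Having introduced the technical solutions of the embodiments of the present invention, various non-limiting implementations of this application are described in detail below.
Referring first to FIG. 1, FIG. 1 is a schematic structural diagram of a system for capturing an NVME hard disk trace provided by an embodiment of the present invention. The embodiment may include the following:
The system for capturing an NVME hard disk trace may include a BMC (Baseboard Management Controller) 1, a BIOS 2, a jig board 3 and a protocol analyzer 4.
BMC 1 is connected to the jig board 3 and to BIOS 2, the jig board 3 is connected to the protocol analyzer 4, and BIOS 2 is connected to the NVME hard disk.
BMC 1 and BIOS 2 are both located on the server motherboard. Optionally, BIOS 2 can communicate with BMC 1 through a KCS link.
Optionally, the RS-232 interface of BMC 1 on the server motherboard can be connected to the input of the jig board 3 through a cable; the jig board 3 can use its GPIO pin as an output, which can be connected to the trigger connector of the protocol analyzer 4 through a coaxial cable.
BIOS 2 monitors the PCIe link connected to the NVME hard disk in real time. When an error occurs on the PCIe link where the NVME hard disk is located, that is, when a PCIe error is reported on the link, BIOS 2 collects the register error information of the PCIe link and can send it to BMC 1 through the KCS link. How BIOS 2 detects an error on the PCIe link and collects the register error information of the PCIe link is described in the related art and is not repeated here.
After receiving the information sent by BIOS 2, BMC 1 can send the received register error information to the jig board 3 through the serial port (RS-232).
The jig board 3 includes a processor and a DIP switch. Each position of the DIP switch corresponds to one register error type, such as unsupported request, bad TLP, bad DLLP, malformed TLP, and so on. The user can select one or more register error types to trigger on at the same time through the DIP switch; that is, the user can select, through the DIP switch, the test options for incompatibility between the NVME hard disk and the server system.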
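The DIP switch selection can be pictured as a bitmask with one bit per position. The following Python sketch is only an illustration: the bit order, the four-position width and the names `ErrorType`, `selected_types` and `is_armed` are assumptions, not details given in the application.

```python
from enum import IntFlag


class ErrorType(IntFlag):
    """One bit per DIP switch position (this bit assignment is hypothetical)."""
    UNSUPPORTED_REQUEST = 0x1
    BAD_TLP = 0x2
    BAD_DLLP = 0x4
    MALFORMED_TLP = 0x8


ALL_POSITIONS = 0xF  # four switch positions in this sketch


def selected_types(switch_bits: int) -> ErrorType:
    """Turn a raw switch reading into the set of error types the user armed."""
    return ErrorType(switch_bits & ALL_POSITIONS)


def is_armed(switch_bits: int, error: ErrorType) -> bool:
    """True if the given error type is among the selected positions."""
    return bool(selected_types(switch_bits) & error)


if __name__ == "__main__":
    reading = 0b0101  # positions 1 and 3 switched on
    print(is_armed(reading, ErrorType.BAD_DLLP))
```

A flag set keeps "one or more types selected at once" cheap to test: checking a parsed error against the selection is a single bitwise AND.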
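The processor of the jig board 3 can parse the received register error information to obtain the corresponding error type. The processor can obtain the error type from the address information carried in the register error information; this address information can be a custom address, and each address corresponds uniquely to one error type. Note that the address information here is different from the register's address in the configuration space: it is a set of custom addresses. For example, when the carried address is 11122, the corresponding register error information is of type A, and when the carried address is 11221, it is of type B.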
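The address-to-type lookup can be sketched as a small table. The two addresses and the type labels A and B are the example from the text; treating addresses as strings, the "address:value" message layout and the name `parse_error_type` are assumptions made for illustration, since the application does not specify a message format.

```python
from typing import Optional

# Custom address -> error type. The two entries come from the example in the
# text; the string representation and the "address:value" layout of a message
# are assumptions made for this sketch.
ADDRESS_TO_TYPE = {
    "11122": "A",
    "11221": "B",
}


def parse_error_type(message: str) -> Optional[str]:
    """Return the error type for one register error message, or None if unknown."""
    address, _, _register_value = message.strip().partition(":")
    return ADDRESS_TO_TYPE.get(address)
```

Because each custom address maps to exactly one type, an unknown address can simply be ignored rather than guessed at, which is what returning `None` models here.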
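When the error type currently selected on the DIP switch is consistent with the error type that the processor parses from the register error information, the processor triggers the protocol analyzer 4 to capture the PCIe trace of the NVME hard disk, for example by setting an output pin.
For example, when a GPIO (General Purpose Input Output) pin of the jig board 3 is connected to the trigger connector of the protocol analyzer 4, setting that GPIO pin triggers the protocol analyzer 4 to capture the PCIe trace of the NVME hard disk.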
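Setting the trigger pin might look like the sketch below. The text only says that the GPIO pin is set; the active-high polarity, the brief hold time, releasing the pin afterwards, and the `OutputPin` stand-in class are all assumptions for illustration, and a real jig board would write to its own hardware registers instead.

```python
import time
from typing import List


class OutputPin:
    """Stand-in for a GPIO output pin; real firmware would write a port register."""

    def __init__(self) -> None:
        self.level = 0
        self.history: List[int] = []

    def write(self, level: int) -> None:
        """Drive the pin high (1) or low (0) and record the change."""
        self.level = 1 if level else 0
        self.history.append(self.level)


def fire_trigger(pin: OutputPin, hold_seconds: float = 0.001) -> None:
    """Set the pin, hold it briefly, then release it so it can fire again."""
    pin.write(1)
    time.sleep(hold_seconds)
    pin.write(0)
```

Releasing the pin after a short hold is one possible design choice: it lets the same pin signal a later error without a manual reset. Whether the analyzer expects an edge or a level is something to confirm against its trigger-input documentation.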
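In the technical solution provided by the embodiments of the present invention, the BIOS monitors the running status of the PCIe link where the NVME hard disk is located and promptly captures the register error information when a fault occurs. The processor of the jig board parses the type of this register error information and, by comparing it with the error type selected on the DIP switch, controls the protocol analyzer's trigger to capture the trace. This achieves accurate and efficient capture of the PCIe trace of the NVME hard disk. It removes the need in the related art to customize a BIOS or trigger for each error type, and it lets the user freely choose, through the DIP switch, which error types to capture so that the corresponding PCIe trace can be analyzed, which helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a method for capturing an NVME hard disk trace provided by an embodiment of the present invention, which can be used, for example, on the jig board of the above embodiment. The embodiment may include the following:
S201: Obtain register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located.
S202: Parse the address information carried by the register error information to obtain the corresponding error type.
The address information of the register error information has a corresponding relationship with the error type.
S203: Judge whether the error type is consistent with the error type currently selected on the DIP switch; if so, execute S204.
S204: Trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
The jig board is connected to the protocol analyzer. The register error information is sent by the BIOS to the jig board through the BMC: the BIOS sends it to the BMC through the KCS link, and the BMC forwards the received register error information to the jig board through the RS-232 serial port.
Optionally, a GPIO pin of the jig board is connected to the trigger connector of the protocol analyzer, and S204 can be sending an instruction to set the GPIO pin, so as to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.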
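Steps S201 to S204 can be sketched as one loop on the jig board. This is a minimal illustration, not the applicant's firmware: the function name `run_capture_steps`, the use of plain callables for parsing and triggering, and the string message format are assumptions; only the step order and the example addresses come from the description.

```python
from typing import Callable, Iterable, Optional, Set


def run_capture_steps(
    messages: Iterable[str],
    parse: Callable[[str], Optional[str]],
    armed_types: Set[str],
    trigger: Callable[[], None],
) -> int:
    """Apply S201-S204 to each incoming message; return how often the trigger fired."""
    fired = 0
    for message in messages:            # S201: obtain register error information
        error_type = parse(message)     # S202: address information -> error type
        if error_type in armed_types:   # S203: compare with the DIP switch selection
            trigger()                   # S204: trigger the protocol analyzer
            fired += 1
    return fired


if __name__ == "__main__":
    table = {"11122": "A", "11221": "B"}  # example addresses from the description
    count = run_capture_steps(
        ["11122", "11221", "00000"],
        parse=table.get,
        armed_types={"B"},
        trigger=lambda: print("trigger asserted"),
    )
    print(count)
```

Passing the parser and the trigger in as callables keeps the decision logic testable on a workstation, away from the serial port and GPIO hardware.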
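As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault accurately and effectively without customizing a BIOS or trigger for each fault type.
The embodiments of the present invention also provide a corresponding apparatus for the method of capturing an NVME hard disk trace, which makes the method more practical. The apparatus for capturing an NVME hard disk trace provided by the embodiments of the present invention is introduced below; the apparatus described below and the method described above may be referred to in correspondence with each other.
Referring to FIG. 3, FIG. 3 is a structural diagram of a specific implementation of an apparatus for capturing an NVME hard disk trace provided by an embodiment of the present invention. The apparatus may include:
an information acquisition module 301, used to acquire register error information, which is the register information collected by the BIOS when an error occurs on the PCIe link where the NVME hard disk is located; the register error information is sent by the BIOS to the jig board through the BMC; the jig board is connected to the protocol analyzer.
an error type parsing module 302, used to parse the address information carried by the register error information to obtain the corresponding error type; the address information of the register error information has a corresponding relationship with the error type.
an error type matching result judgment module 303, used to judge whether the error type is consistent with the error type currently selected on the DIP switch.
a trigger module 304, used to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type is consistent with the error type currently selected on the DIP switch.
Optionally, in some implementations of this embodiment, the trigger module 304 may, for example, be a module that, with a GPIO pin of the jig board connected to the trigger connector of the protocol analyzer, sends an instruction to set the GPIO pin so as to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
The functions of the functional modules of the apparatus for capturing an NVME hard disk trace described in the embodiments of the present invention can be implemented according to the method in the above method embodiments; for the specific implementation process, refer to the related description of the method embodiments, which is not repeated here.
As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault accurately and effectively without customizing a BIOS or trigger for each fault type.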
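An embodiment of the present invention also provides a device for capturing an NVME hard disk trace, which may specifically include:
a memory, used to store a computer program;
a processor, used to execute the computer program to implement the steps of the method for capturing an NVME hard disk trace described in any of the above embodiments.
The functions of the functional modules of the device for capturing an NVME hard disk trace described in the embodiments of the present invention can be implemented according to the method in the above method embodiments; for the specific implementation process, refer to the related description of the method embodiments, which is not repeated here.
As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault accurately and effectively without customizing a BIOS or trigger for each fault type.
An embodiment of the present invention also provides a computer-readable storage medium storing a program for capturing an NVME hard disk trace; when executed by a processor, the program implements the steps of the method for capturing an NVME hard disk trace described in any of the above embodiments.
The functions of the functional modules of the computer-readable storage medium described in the embodiments of the present invention can be implemented according to the method in the above method embodiments; for the specific implementation process, refer to the related description of the method embodiments, which is not repeated here.
As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault accurately and effectively without customizing a BIOS or trigger for each fault type.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant points, refer to the description of the method.
Professionals will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method, apparatus, device, system and computer-readable storage medium for capturing an NVME hard disk trace provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art can make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.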

Claims (10)

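  1. A system for capturing an NVME hard disk trace, characterized by comprising a BMC, a BIOS, a fixture board, and a protocol analyzer, wherein the BMC is connected to the fixture board and to the BIOS respectively, and the fixture board is connected to the protocol analyzer;
    the BIOS is configured to, when an error occurs on the PCIe link on which the NVME hard disk is located, collect register error information of the PCIe link and send the register error information to the BMC; the BMC is configured to send the register error information to the fixture board;
    the fixture board comprises a processor and a DIP switch, and is configured to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type currently selected by the DIP switch is consistent with the error type obtained by the processor from parsing the register error information.
  2. The system for capturing an NVME hard disk trace according to claim 1, characterized in that a GPIO pin of the fixture board is connected to the trigger connector of the protocol analyzer, and the triggering of the protocol analyzer to capture the PCIe trace of the NVME hard disk is:
    setting the GPIO pin of the fixture board to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
  3. The system for capturing an NVME hard disk trace according to claim 2, characterized in that the RS-232 interface of the BMC on the server motherboard is connected to the input of the fixture board by a cable, and the fixture board is connected to the protocol analyzer by a coaxial cable.
  4. The system for capturing an NVME hard disk trace according to claim 3, characterized in that the BIOS sends the register error information to the BMC over a KCS link.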
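  5. A method for capturing an NVME hard disk trace, characterized in that the method is applied to a fixture board and comprises:
    acquiring register error information, wherein the register error information is register information collected by a BIOS when an error occurs on the PCIe link on which the NVME hard disk is located;
    parsing the address information carried in the register error information to obtain the corresponding error type;
    judging whether the error type is consistent with the error type currently selected by a DIP switch;
    if so, triggering a protocol analyzer to capture the PCIe trace of the NVME hard disk;
    wherein the fixture board is connected to the protocol analyzer, the register error information is sent by the BIOS to the fixture board via a BMC, and the address information of the register error information has a correspondence with the error type.
  6. The method for capturing an NVME hard disk trace according to claim 5, characterized in that a GPIO pin of the fixture board is connected to the trigger connector of the protocol analyzer, and the triggering of the protocol analyzer to capture the PCIe trace of the NVME hard disk comprises:
    sending an instruction to set the GPIO pin, so as to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk.
  7. The method for capturing an NVME hard disk trace according to claim 6, characterized in that the BIOS sends the register error information to the BMC over a KCS link, and the BMC sends the received register error information to the fixture board through an RS-232 serial port.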
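  8. An apparatus for capturing an NVME hard disk trace, characterized in that the apparatus is applied to a fixture board and comprises:
    an information acquisition module, configured to acquire register error information, wherein the register error information is register information collected by a BIOS when an error occurs on the PCIe link on which the NVME hard disk is located; the register error information is sent by the BIOS to the fixture board via a BMC; and the fixture board is connected to a protocol analyzer;
    an error type parsing module, configured to parse the address information carried in the register error information to obtain the corresponding error type, wherein the address information of the register error information has a correspondence with the error type;
    an error type matching result judgment module, configured to judge whether the error type is consistent with the error type currently selected by a DIP switch;
    a trigger module, configured to trigger the protocol analyzer to capture the PCIe trace of the NVME hard disk when the error type is consistent with the error type currently selected by the DIP switch.
  9. A device for capturing an NVME hard disk trace, characterized by comprising a processor, wherein the processor, when executing a computer program stored in a memory, implements the steps of the method for capturing an NVME hard disk trace according to any one of claims 5 to 7.
  10. A computer-readable storage medium, characterized in that a program for capturing an NVME hard disk trace is stored on the computer-readable storage medium, and the program, when executed by a processor, implements the steps of the method for capturing an NVME hard disk trace according to any one of claims 5 to 7.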
PCT/CN2019/093360 2018-11-01 2019-06-27 Method, apparatus, device and system for capturing trace of NVME hard disc WO2020087956A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/275,827 US11442831B2 (en) 2018-11-01 2019-06-27 Method, apparatus, device and system for capturing trace of NVME hard disc

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811295906.5A CN109471763B (zh) 2018-11-01 2018-11-01 Method, apparatus, device and system for capturing trace of NVME hard disc
CN201811295906.5 2018-11-01

Publications (1)

Publication Number Publication Date
WO2020087956A1 true WO2020087956A1 (zh) 2020-05-07

Family

ID=65672566

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093360 WO2020087956A1 (zh) 2018-11-01 2019-06-27 Method, apparatus, device and system for capturing trace of NVME hard disc

Country Status (3)

Country Link
US (1) US11442831B2 (zh)
CN (1) CN109471763B (zh)
WO (1) WO2020087956A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471763B (zh) 2018-11-01 2022-02-18 郑州云海信息技术有限公司 Method, apparatus, device and system for capturing trace of NVME hard disc
CN112463490B (zh) * 2020-12-01 2022-07-19 苏州浪潮智能科技有限公司 Link state diagnosis system and method with PCIe retimer
CN116582471B (zh) * 2023-07-14 2023-09-19 珠海星云智联科技有限公司 PCIe device, PCIe data capture system, and server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106502814A (zh) * 2016-10-19 2017-03-15 杭州迪普科技股份有限公司 Method and apparatus for recording PCIe device error information
CN107122277A (zh) * 2017-05-09 2017-09-01 郑州云海信息技术有限公司 PCIe RAS error injection test system and method based on a PCIe protocol analyzer
US20180300111A1 (en) * 2017-04-17 2018-10-18 International Business Machines Corporation Preserving dynamic trace purity
CN109471763A (zh) * 2018-11-01 2019-03-15 郑州云海信息技术有限公司 Method, apparatus, device and system for capturing trace of NVME hard disc

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7480303B1 (en) * 2005-05-16 2009-01-20 Pericom Semiconductor Corp. Pseudo-ethernet switch without ethernet media-access-controllers (MAC's) that copies ethernet context registers between PCI-express ports
US7594144B2 (en) * 2006-08-14 2009-09-22 International Business Machines Corporation Handling fatal computer hardware errors
CN102081562A (zh) * 2009-11-30 2011-06-01 华为技术有限公司 Device diagnosis method and system
US8693208B2 (en) * 2010-08-06 2014-04-08 Ocz Technology Group, Inc. PCIe bus extension system, method and interfaces therefor
CN103748562B (zh) * 2010-12-23 2019-03-29 英特尔公司 Test, validation, and debug architecture
US8589722B2 (en) * 2011-05-09 2013-11-19 Lsi Corporation Methods and structure for storing errors for error recovery in a hardware controller
US9954727B2 (en) * 2015-03-06 2018-04-24 Quanta Computer Inc. Automatic debug information collection
US9768952B1 (en) * 2015-09-22 2017-09-19 Seagate Technology Llc Removable circuit for unlocking self-encrypting data storage devices
CN107729220B (zh) * 2017-09-27 2019-06-18 郑州云海信息技术有限公司 Design method for implementing LED lighting on a multi-NVMe hard disk backplane

Also Published As

Publication number Publication date
US20220043728A1 (en) 2022-02-10
CN109471763A (zh) 2019-03-15
US11442831B2 (en) 2022-09-13
CN109471763B (zh) 2022-02-18

Similar Documents

Publication Publication Date Title
WO2020087954A1 (zh) Method, apparatus, device and system for capturing trace of NVME hard disc
US10680921B2 (en) Virtual intelligent platform management interface for hardware components
WO2020087956A1 (zh) Method, apparatus, device and system for capturing trace of NVME hard disc
US9569325B2 (en) Method and system for automated test and result comparison
CN108768730B (zh) Method and apparatus for operating a smart network interface card
US20070097872A1 (en) Network connection apparatus testing method
US9710255B1 (en) Updating system of firmware of complex programmable logic device and updating method thereof
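CN104268076A (zh) Test method for automatically testing memory bandwidth, applicable to various processor platforms
TW201616356A (zh) System and method for debugging firmware/software to generate trace data, recording medium, and computer program product
CN115525490A (zh) Memory eye diagram test method, hardware debugging device, and storage medium
CN104239174A (zh) BMC remote debugging system and method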
US10929261B1 (en) Device diagnosis
US7925728B2 (en) Facilitating detection of hardware service actions
WO2021056913A1 (zh) Fault locating method, apparatus, and system based on I2C communication
US8880956B2 (en) Facilitating processing in a communications environment using stop signaling
US20230089389A1 (en) Transaction analyzer for communication bus traffic
US9483331B1 (en) Notifying a multipathing driver of fabric events and performing multipathing management operations in response to such fabric events
CN112148537A (zh) Bus monitoring apparatus and method, storage medium, and electronic device
US10534688B2 (en) Trace hub logic with automatic event triggering
CN111008098A (zh) Monitoring system and method
CN109491846B (zh) Method and system for capturing a SATA hard disk trace for a server
US10216525B1 (en) Virtual disk carousel
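CN116382968B (zh) Fault detection method and apparatus for an external device
CN113535490B (zh) Debugging apparatus and operating method thereof
CN116431453A (zh) Method, apparatus, and device for system fault detection via BIOS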

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19877712

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19877712

Country of ref document: EP

Kind code of ref document: A1