WO2020087954A1 - Method, apparatus, device, and system for capturing NVME hard disk trace - Google Patents

Method, apparatus, device, and system for capturing NVME hard disk trace

Info

Publication number
WO2020087954A1
WO2020087954A1 PCT/CN2019/093354 CN2019093354W WO2020087954A1 WO 2020087954 A1 WO2020087954 A1 WO 2020087954A1 CN 2019093354 W CN2019093354 W CN 2019093354W WO 2020087954 A1 WO2020087954 A1 WO 2020087954A1
Authority
WO
WIPO (PCT)
Prior art keywords
hard disk
nvme hard
trace
capturing
error type
Prior art date
Application number
PCT/CN2019/093354
Other languages
English (en)
French (fr)
Inventor
孙一心
Original Assignee
郑州云海信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 郑州云海信息技术有限公司
Publication of WO2020087954A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available

Definitions

  • The embodiments of the present invention relate to the technical field of server applications, and in particular to a method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces.
  • NVME: Non-Volatile Memory Express
  • PCIe: Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard
  • Common NVME hard disk failures in a server include uncorrectable fatal errors, correctable non-fatal errors, correctable errors, and so on.
  • The symptoms are usually a dropped disk, downtime, reduced speed, and so on.
  • The related art uses the built-in trigger function of a PCIe protocol analyzer to capture the actual operating data of the NVME hard disk's interface protocol (that is, the trace of the NVME hard disk) in order to analyze the various incompatibility errors.
  • When the protocol analyzer manufacturer's original software trigger is used, the trigger types are limited.
  • It can trigger only on a few fixed types of error and cannot adapt well to the complex errors that occur while a server is running. Because the PCIe interface has a high transmission rate and the protocol analyzer's own buffer capacity is limited, a valid PCIe trace often cannot be captured for analysis unless the trigger fires effectively on the specific error type. Worse, some errors are hard to reproduce; once one is missed, it takes a long wait for it to recur, wasting considerable manpower and material resources.
  • Embodiments of the present disclosure provide a method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces, which capture the PCIe trace corresponding to an NVME hard disk fault error accurately and effectively without customizing a trigger for each fault error type.
  • the embodiments of the present invention provide the following technical solutions:
  • An aspect of an embodiment of the present invention provides a method for capturing NVME hard disk trace, which is applied to BIOS and includes:
  • wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type; the PCH is connected to the protocol analyzer.
  • Optionally, the method further includes sending the register error information and the corresponding error type to the server operating system and the BMC for archiving.
  • Optionally, the GPIO signal pin of the PCH is connected to the trigger connector of the protocol analyzer through a coaxial cable.
  • Optionally, monitoring the operating status information of the PCIe link where the NVME hard disk is located means monitoring that information in real time.
  • Another aspect of an embodiment of the present invention provides a device for capturing NVME hard disk trace, which is applied to BIOS and includes:
  • a monitoring module, used to monitor the operating status information of the PCIe link where the NVME hard disk is located;
  • a failure judgment module, used to judge whether the PCIe link where the NVME hard disk is located has failed;
  • an error capture module, used to capture the register error information of the PCIe link when the PCIe link where the NVME hard disk is located fails;
  • an error type analysis module, used to call a pre-stored debug version set to parse the register error information and obtain the corresponding error type;
  • wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type;
  • an error type matching result judgment module, used to judge whether the error type is consistent with the error type currently selected by the DIP switch;
  • a trigger module, used to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk, the PCH being connected to the protocol analyzer.
  • An embodiment of the present invention also provides a system for capturing NVME hard disk traces, including BIOS, PCH, and a protocol analyzer connected to the PCH;
  • The BIOS is used to monitor the operating status information of the PCIe link where the NVME hard disk is located and, when that PCIe link fails, to capture the register error information of the PCIe link; to call the pre-stored debug version set to parse the register error information and obtain the corresponding error type; and to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk;
  • wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type.
  • Optionally, the GPIO signal pin of the PCH is connected to the trigger connector of the protocol analyzer through a coaxial cable.
  • Optionally, the BIOS is also used to send the register error information and the corresponding error type to the server operating system and the BMC.
  • An embodiment of the present invention also provides a device for capturing NVME hard disk traces, including a processor that, when executing a computer program stored in a memory, implements the steps of the method for capturing NVME hard disk traces described in any of the preceding items.
  • Finally, an embodiment of the present invention provides a computer-readable storage medium that stores a program for capturing NVME hard disk traces.
  • When executed by a processor, the program for capturing NVME hard disk traces implements the steps of the method for capturing NVME hard disk traces described in any of the preceding items.
  • Embodiments of the present invention provide a method for capturing NVME hard disk traces.
  • When the BIOS detects a failure of the PCIe link where the NVME hard disk is located, it captures the register error information of the PCIe link; it calls the pre-stored debug version set to parse the register error information and obtain the corresponding error type; finally, it sets the GPIO signal pin of the PCH corresponding to the error type, thereby triggering the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk.
  • The advantage of the technical solution provided by this application is that the BIOS promptly captures the register error information produced when the PCIe link where the NVME hard disk is located fails, and parses its error type. Because the debug versions in the debug version set correspond one-to-one with the register error types, the error type can be parsed accurately and quickly. The signal pin of the PCH is then set according to the error type to control the protocol analyzer. This captures the PCIe trace of the NVME hard disk accurately and efficiently, removes the related art's need to customize a trigger for each error type, and helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
  • In addition, the embodiments of the present invention provide a corresponding apparatus, device, and computer-readable storage medium for the method of capturing NVME hard disk traces, which makes the method more practical.
  • The apparatus, device, and computer-readable storage medium have corresponding advantages.
  • FIG. 1 is a schematic flowchart of a method for capturing a trace of a NVME hard disk provided by an embodiment of the present invention
  • FIG. 2 is a structural diagram of a specific implementation manner of an apparatus for capturing NVME hard disk traces according to an embodiment of the present invention
  • FIG. 3 is a structural diagram of another specific implementation manner of an apparatus for capturing NVME hard disk traces according to an embodiment of the present invention
  • FIG. 4 is a structural diagram of a specific implementation manner of a system for capturing NVME hard disk traces according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a method for capturing an NVME hard disk trace according to an embodiment of the present invention, which is applied to a BIOS.
  • the embodiment of the present invention may include the following:
  • S101: Monitor the operating status information of the PCIe link where the NVME hard disk is located.
  • The BIOS may monitor the operating status information of the PCIe link where the NVME hard disk is located in real time, or at a fixed frequency, for example once every 1 s; neither choice affects the implementation of this application. To capture error information promptly and accurately, real-time monitoring may be used.
  • The BIOS monitors in real time the PCIe link connected to the NVME hard disk. When an error occurs on that link, that is, when a PCIe error is reported on the relevant link, the BIOS collects the register error information of the PCIe link. For how the BIOS detects a PCIe link error and collects the register error information, refer to the description of the related art; the details are not repeated here.
  • S104: Call the pre-stored debug version set to parse the register error information and obtain the corresponding error type.
  • The debug version set includes multiple debug versions, and each debug version corresponds to one register error type.
  • A debug version is a program written in advance for one error type, such as unsupported request, bad TLP, bad DLLP, or malformed TLP.
  • The debug program can accurately detect its corresponding register error type; that is, the BIOS can parse the specific fault type from the register information collected when the PCIe error occurred.
  • The user may also choose which type of fault to capture: select the target debug version that corresponds to the required fault type and use it to parse the register error information. When the fault type of the register error information matches the target debug version, the protocol analyzer is triggered to capture the PCIe trace of the corresponding NVME hard disk.
  • S105: Set the GPIO signal pin of the PCH corresponding to the error type, to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk.
  • The protocol analyzer can be triggered by setting the pin at the connection between the protocol analyzer and the PCH, so that the protocol analyzer's trigger captures the corresponding trace.
  • Optionally, a GPIO signal pin reserved on the PCH may be connected to the protocol analyzer through a coaxial cable.
  • In addition, the register error information and the corresponding error type may be packaged and sent to the server operating system and the BMC as a record for archiving.
  • In the technical solution provided by the embodiments of the present invention, the BIOS promptly captures the register error information produced when the PCIe link where the NVME hard disk is located fails, and parses its error type. Because the debug versions in the debug version set correspond one-to-one with the register error types, the error type can be parsed accurately and quickly. The signal pin of the PCH is then set according to the error type to control the protocol analyzer. This captures the PCIe trace of the NVME hard disk accurately and efficiently, removes the related art's need to customize a trigger for each error type, and helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
  • the embodiment of the present invention also provides a corresponding implementation device for the method of capturing the NVME hard disk trace, which further makes the method more practical.
  • the following describes an apparatus for capturing an NVME hard disk trace provided by an embodiment of the present invention.
  • the apparatus for capturing an NVME hard disk trace described below and the method for capturing an NVME hard disk trace described above may refer to each other.
  • FIG. 2 is a structural diagram of a device for capturing NVME hard disk trace according to an embodiment of the present invention in a specific implementation manner.
  • the device may include:
  • the monitoring module 201 is used to monitor the running status information of the PCIe link where the NVME hard disk is located.
  • the failure judgment module 202 is used to judge whether the PCIe link where the NVME hard disk is located has a failure.
  • the error capture module 203 is used to capture the register error information of the PCIe link when the PCIe link where the NVME hard disk is located fails.
  • the error type analysis module 204 is used to call a pre-stored debug version set to resolve register error information to obtain a corresponding error type; the debug version set includes multiple debug versions, and each debug version corresponds to a register error type.
  • The error type matching result judgment module 205 is used to judge whether the error type is consistent with the error type currently selected by the DIP switch.
  • The trigger module 206 is used to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk; the PCH is connected to the protocol analyzer.
  • Optionally, in some implementations of this embodiment (see FIG. 3), the apparatus may further include a sending module 207, used to send the register error information and the corresponding error type to the server operating system and the BMC for archiving.
  • The function of each functional module of the apparatus for capturing NVME hard disk traces may be implemented according to the method in the above method embodiments.
  • For the specific implementation process, refer to the related descriptions in the above method embodiments; the details are not repeated here.
  • the embodiments of the present invention do not need to customize corresponding triggers for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • An embodiment of the present invention also provides a device for capturing NVME hard disk traces, which may specifically include:
  • a memory, used to store a computer program;
  • a processor, used to execute the computer program to implement the steps of the method for capturing NVME hard disk traces described in any one of the above embodiments.
  • the embodiments of the present invention do not need to customize corresponding triggers for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • An embodiment of the present invention also provides a computer-readable storage medium that stores a program for capturing NVME hard disk traces.
  • When executed by a processor, the program for capturing NVME hard disk traces implements the steps of the method for capturing NVME hard disk traces described in any one of the above embodiments.
  • the embodiments of the present invention do not need to customize corresponding triggers for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • An embodiment of the present invention also provides a system for capturing NVME hard disk traces.
  • Referring to FIG. 4, it may include a BIOS 41, a PCH 42, and a protocol analyzer 43 connected to the PCH 42.
  • The BIOS 41 is used to monitor the operating status information of the PCIe link where the NVME hard disk is located and, when that PCIe link fails, to capture the register error information of the PCIe link; to call the pre-stored debug version set to parse the register error information and obtain the corresponding error type; and to set the GPIO signal pin of the PCH 42 corresponding to the error type so as to trigger the protocol analyzer 43 to capture the PCIe trace of the corresponding NVME hard disk;
  • the debug version set includes multiple debug versions, and each debug version corresponds to a register error type.
  • the GPIO signal pin of the PCH42 can be connected to the trigger connector of the protocol analyzer 43 through a coaxial cable.
  • BIOS 41 is also used to send register error information and corresponding error types to the server operating system and BMC.
  • the embodiments of the present invention do not need to customize corresponding triggers for different fault error types, and accurately and effectively capture the PCIe trace corresponding to the NVME hard disk fault error.
  • The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

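A method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces. In the method, when the BIOS detects a failure on the PCIe link where an NVME hard disk is located, it captures the register error information of the PCIe link; it calls a pre-stored debug version set to parse the register error information and obtain the corresponding error type; finally, it sets the GPIO signal pin of the PCH that corresponds to the error type, thereby triggering a protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk. The technical solution provided by this application does not require a trigger to be customized for each fault error type, captures the PCIe trace of an NVME hard disk accurately and efficiently, removes the related art's need to customize a trigger for each error type, and helps find the cause of incompatibility between an NVME hard disk and a server system efficiently and accurately.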

Description

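Method, apparatus, device, and system for capturing NVME hard disk trace
This application claims priority to Chinese patent application No. 201811295890.8, filed with the Chinese Patent Office on November 1, 2018 and entitled "Method, apparatus, device, and system for capturing NVME hard disk trace", the entire contents of which are incorporated herein by reference.
Technical Field
The embodiments of the present invention relate to the technical field of server applications, and in particular to a method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces.
Background
The NVME (Non-Volatile Memory Express, a host controller interface specification for non-volatile memory) hard disk is a relatively high-end, high-performance type of hard disk in the current server field. Its interface is based on the PCIe (Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard) protocol and features a high interface rate and fast IO throughput. Because NVME hard disks perform well, they place correspondingly high demands on server compatibility.
Common NVME hard disk failures in a server include uncorrectable fatal errors, correctable non-fatal errors, correctable errors, and so on; the symptoms are usually a dropped disk, downtime, reduced speed, and so on. Compatibility testing of an NVME hard disk in a server often requires long periods of testing and debugging under various models to find the cause of incompatibility between the NVME hard disk and the server, and thereby ensure the availability of the server.
When debugging an NVME hard disk, the related art uses the built-in trigger function of a PCIe protocol analyzer to capture the actual operating data of the NVME hard disk's interface protocol (that is, to capture the trace of the NVME hard disk) in order to analyze the various incompatibility errors.
With the protocol analyzer manufacturer's original software trigger, the trigger types are limited: it can trigger only on a few fixed types of error and cannot adapt well to the complex errors that occur while a server is running. Because the PCIe interface has a high transmission rate and the protocol analyzer's own buffer capacity is limited, a valid PCIe trace often cannot be captured for analysis unless the trigger fires effectively on the specific error type. Worse, some errors are hard to reproduce; once one is missed, it takes a long wait for it to recur, wasting considerable manpower and material resources.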
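To make the buffer constraint concrete, the following back-of-the-envelope sketch estimates how quickly a capture buffer fills when an analyzer records everything on a saturated link. The PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) are standard; the x4 link width, the 8 GiB buffer size, and the function names are assumptions made for illustration and do not come from the source.

```python
# Rough arithmetic sketch; the buffer size and link width are assumed values.

def lane_bytes_per_second(gt_per_s: float, payload_bits: int, total_bits: int) -> float:
    """Usable bytes per second on one PCIe lane after line encoding."""
    return gt_per_s * 1e9 * payload_bits / total_bits / 8


def seconds_to_fill(buffer_bytes: float, lanes: int, lane_rate: float,
                    directions: int = 2) -> float:
    """Time until the buffer is full if every byte on a saturated link is recorded."""
    return buffer_bytes / (lanes * lane_rate * directions)


GEN3_LANE = lane_bytes_per_second(8.0, 128, 130)  # PCIe 3.0: 8 GT/s, 128b/130b
ASSUMED_BUFFER = 8 * 2**30                        # hypothetical 8 GiB capture buffer

if __name__ == "__main__":
    t = seconds_to_fill(ASSUMED_BUFFER, lanes=4, lane_rate=GEN3_LANE)
    print(f"A saturated x4 Gen3 link fills the assumed buffer in about {t:.2f} s")
```

Under these assumptions the buffer lasts on the order of a second, which is why a capture that is not triggered close to the error is unlikely to contain it.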
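Summary of the Invention
Embodiments of the present disclosure provide a method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces, which capture the PCIe trace corresponding to an NVME hard disk fault error accurately and effectively without customizing a trigger for each fault error type.
To solve the above technical problem, the embodiments of the present invention provide the following technical solutions:
One aspect of the embodiments of the present invention provides a method for capturing NVME hard disk traces, applied to a BIOS and including:
monitoring the operating status information of the PCIe link where an NVME hard disk is located, and determining whether the PCIe link where the NVME hard disk is located has failed;
if so, capturing the register error information of the PCIe link;
calling a pre-stored debug version set to parse the register error information and obtain the corresponding error type;
setting the GPIO signal pin of the PCH corresponding to the error type, so as to trigger a protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk;
wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type; the PCH is connected to the protocol analyzer.
Optionally, the method further includes:
sending the register error information and the corresponding error type to the server operating system and the BMC for archiving.
Optionally, the GPIO signal pin of the PCH is connected to the trigger connector of the protocol analyzer through a coaxial cable.
Optionally, monitoring the operating status information of the PCIe link where the NVME hard disk is located means monitoring that information in real time.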
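Another aspect of the embodiments of the present invention provides an apparatus for capturing NVME hard disk traces, applied to a BIOS and including:
a monitoring module, used to monitor the operating status information of the PCIe link where an NVME hard disk is located;
a failure judgment module, used to judge whether the PCIe link where the NVME hard disk is located has failed;
an error capture module, used to capture the register error information of the PCIe link when the PCIe link where the NVME hard disk is located fails;
an error type analysis module, used to call a pre-stored debug version set to parse the register error information and obtain the corresponding error type, wherein the debug version set includes multiple debug versions and each debug version corresponds to one register error type;
an error type matching result judgment module, used to judge whether the error type is consistent with the error type currently selected by the DIP switch;
a trigger module, used to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger a protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk, the PCH being connected to the protocol analyzer.
An embodiment of the present invention also provides a system for capturing NVME hard disk traces, including a BIOS, a PCH, and a protocol analyzer connected to the PCH;
the BIOS is used to monitor the operating status information of the PCIe link where an NVME hard disk is located and, when that PCIe link fails, to capture the register error information of the PCIe link; to call a pre-stored debug version set to parse the register error information and obtain the corresponding error type; and to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk;
wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type.
Optionally, the GPIO signal pin of the PCH is connected to the trigger connector of the protocol analyzer through a coaxial cable.
Optionally, the BIOS is also used to send the register error information and the corresponding error type to the server operating system and the BMC.
An embodiment of the present invention also provides a device for capturing NVME hard disk traces, including a processor that, when executing a computer program stored in a memory, implements the steps of the method for capturing NVME hard disk traces described in any of the preceding items.
Finally, an embodiment of the present invention provides a computer-readable storage medium storing a program for capturing NVME hard disk traces; when executed by a processor, the program implements the steps of the method for capturing NVME hard disk traces described in any of the preceding items.
Embodiments of the present invention provide a method for capturing NVME hard disk traces. When the BIOS detects a failure of the PCIe link where the NVME hard disk is located, it captures the register error information of the PCIe link; it calls the pre-stored debug version set to parse the register error information and obtain the corresponding error type; finally, it sets the GPIO signal pin of the PCH corresponding to the error type, thereby triggering the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk.
The advantage of the technical solution provided by this application is that the BIOS promptly captures the register error information produced when the PCIe link where the NVME hard disk is located fails, and parses its error type. Because the debug versions in the debug version set correspond one-to-one with the register error types, the error type can be parsed accurately and quickly. The signal pin of the PCH is then set according to the error type to control the protocol analyzer. This captures the PCIe trace of the NVME hard disk accurately and efficiently, removes the related art's need to customize a trigger for each error type, and helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
In addition, the embodiments of the present invention provide a corresponding apparatus, device, and computer-readable storage medium for the method of capturing NVME hard disk traces, which makes the method more practical; the apparatus, device, and computer-readable storage medium have corresponding advantages.
It should be understood that the above general description and the following detailed description are only exemplary and do not limit the present disclosure.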
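Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a method for capturing NVME hard disk traces provided by an embodiment of the present invention;
FIG. 2 is a structural diagram of one specific implementation of an apparatus for capturing NVME hard disk traces provided by an embodiment of the present invention;
FIG. 3 is a structural diagram of another specific implementation of an apparatus for capturing NVME hard disk traces provided by an embodiment of the present invention;
FIG. 4 is a structural diagram of one specific implementation of a system for capturing NVME hard disk traces provided by an embodiment of the present invention.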
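Detailed Description
To help those skilled in the art better understand the solution of the present invention, the present invention is described in further detail below with reference to the drawings and specific implementations. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of this application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have", and any variations of them, are intended to cover non-exclusive inclusion. For example, a process, method, product, or device that contains a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
Having introduced the technical solutions of the embodiments of the present invention, the various non-limiting implementations of this application are described in detail below.
Referring first to FIG. 1, FIG. 1 is a schematic flowchart of a method for capturing NVME hard disk traces provided by an embodiment of the present invention. The method is applied to a BIOS, and the embodiment may include the following:
S101: Monitor the operating status information of the PCIe link where the NVME hard disk is located.
The BIOS may monitor the operating status information of the PCIe link where the NVME hard disk is located in real time, or at a fixed frequency, for example once every 1 s; neither choice affects the implementation of this application. To capture error information promptly and accurately, real-time monitoring may be used.
S102: Determine whether the PCIe link where the NVME hard disk is located has failed; if so, perform S103.
S103: Capture the register error information of the PCIe link.
The BIOS monitors in real time the PCIe link connected to the NVME hard disk. When an error occurs on that link, that is, when a PCIe error is reported on the relevant link, the BIOS collects the register error information of the PCIe link. For how the BIOS detects a PCIe link error and collects the register error information, refer to the description of the related art; the details are not repeated here.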
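Steps S101 to S103 can be pictured as a simple polling loop. The sketch below is only an illustration under stated assumptions: the source leaves the detection mechanism to the related art, so the `read_error_status` callback, the rule that any non-zero status counts as a failure, and the `max_polls` limit are all hypothetical.

```python
import time
from typing import Callable, Optional


def monitor_link(read_error_status: Callable[[], int],
                 interval_s: float = 1.0,
                 max_polls: Optional[int] = None,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Poll the link and return the raw register error value once a failure is seen.

    Returns None if max_polls samples pass without any failure.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        status = read_error_status()   # S101: sample the link status
        if status != 0:                # S102: assumed rule, any set bit is a failure
            return status              # S103: hand over the register error information
        polls += 1
        sleep(interval_s)              # fixed-frequency monitoring, e.g. every 1 s
    return None


if __name__ == "__main__":
    samples = iter([0, 0, 0x40])       # simulated register reads
    print(hex(monitor_link(lambda: next(samples), interval_s=0.0)))
```

Passing `interval_s=0.0` approximates the real-time variant, while a larger interval corresponds to the fixed-frequency variant mentioned in the text.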
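S104: Call the pre-stored debug version set to parse the register error information and obtain the corresponding error type.
The debug version set includes multiple debug versions, and each debug version corresponds to one register error type. A debug version is a program written in advance for one error type, such as unsupported request, bad TLP, bad DLLP, or malformed TLP. The debug program can accurately detect its corresponding register error type; that is, the BIOS can parse the specific fault type from the register information collected when the PCIe error occurred.
Of course, the user may also choose which type of fault to capture: select the target debug version that corresponds to the required fault type and use it to parse the register error information. When the fault type of the register error information matches the target debug version, the protocol analyzer is triggered to capture the PCIe trace of the corresponding NVME hard disk.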
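One way to picture a debug version set is as a table of small detectors, one per error type. The sketch below is not the patent's implementation: it assumes the register error information consists of the PCIe Advanced Error Reporting (AER) correctable and uncorrectable status values, which the source does not state, and the names `RegisterErrorInfo`, `DEBUG_VERSION_SET`, and `parse_error_types` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Assumed bit positions, following the PCIe AER status register layout.
COR_BAD_TLP = 1 << 6
COR_BAD_DLLP = 1 << 7
UNC_MALFORMED_TLP = 1 << 18
UNC_UNSUPPORTED_REQUEST = 1 << 20


@dataclass(frozen=True)
class RegisterErrorInfo:
    """Hypothetical container for the captured register error information."""
    correctable: int = 0
    uncorrectable: int = 0


DebugVersion = Callable[[RegisterErrorInfo], bool]

# One "debug version" per register error type.
DEBUG_VERSION_SET: Dict[str, DebugVersion] = {
    "bad TLP": lambda r: bool(r.correctable & COR_BAD_TLP),
    "bad DLLP": lambda r: bool(r.correctable & COR_BAD_DLLP),
    "malformed TLP": lambda r: bool(r.uncorrectable & UNC_MALFORMED_TLP),
    "unsupported request": lambda r: bool(r.uncorrectable & UNC_UNSUPPORTED_REQUEST),
}


def parse_error_types(info: RegisterErrorInfo,
                      target: Optional[str] = None) -> List[str]:
    """Run every debug version, or only the user-selected target, against info."""
    if target is None:
        versions = DEBUG_VERSION_SET
    else:
        versions = {target: DEBUG_VERSION_SET[target]}
    return [name for name, detect in versions.items() if detect(info)]


if __name__ == "__main__":
    info = RegisterErrorInfo(correctable=COR_BAD_TLP)
    print(parse_error_types(info))
```

Passing `target` models the case where the user selects a single target debug version, so that only a match on that error type leads to a trigger.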
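S105: Set the GPIO signal pin of the PCH corresponding to the error type, to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk.
The protocol analyzer can be triggered by setting the pin at the connection between the protocol analyzer and the PCH, so that the protocol analyzer's trigger captures the corresponding trace. Optionally, a GPIO signal pin reserved on the PCH may be connected to the protocol analyzer through a coaxial cable.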
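As a rough illustration of S105, the sketch below maps each error type to a GPIO pin and drives that pin high. Everything in it is assumed for the example: real firmware would write the PCH's GPIO registers, whereas `SimulatedGpioBank` only records levels in memory, and the pin numbers in `PIN_FOR_ERROR_TYPE` are invented.

```python
from typing import Dict


class SimulatedGpioBank:
    """In-memory stand-in for the PCH GPIO block (hypothetical)."""

    def __init__(self) -> None:
        self.levels: Dict[int, int] = {}

    def set_high(self, pin: int) -> None:
        # Setting the pin is what the analyzer's external trigger input would observe.
        self.levels[pin] = 1


# Hypothetical assignment of reserved GPIO pins to error types.
PIN_FOR_ERROR_TYPE: Dict[str, int] = {
    "bad TLP": 10,
    "bad DLLP": 11,
    "malformed TLP": 12,
    "unsupported request": 13,
}


def trigger_analyzer(gpio: SimulatedGpioBank, error_type: str) -> bool:
    """Set the pin assigned to error_type; return False if no pin is assigned."""
    pin = PIN_FOR_ERROR_TYPE.get(error_type)
    if pin is None:
        return False
    gpio.set_high(pin)
    return True


if __name__ == "__main__":
    bank = SimulatedGpioBank()
    trigger_analyzer(bank, "malformed TLP")
    print(bank.levels)
```

Keeping the mapping in a table means supporting a new error type only requires a new entry, not a new analyzer-side trigger definition, which is the benefit the text claims.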
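In addition, the register error information and the corresponding error type may be packaged and sent to the server operating system and the BMC as a record for archiving.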
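The source does not specify how the record is packaged. As one possible shape, the sketch below serializes the register values and parsed error types to JSON; the field names, the `package_record` helper, and the example link address are all assumptions, and the transport to the operating system and BMC is deliberately left out.

```python
import json
from typing import List


def package_record(link: str, correctable: int, uncorrectable: int,
                   error_types: List[str]) -> str:
    """Bundle register error information and error types into one JSON string."""
    record = {
        "link": link,                                     # e.g. a bus:device.function label
        "correctable_status": f"0x{correctable:08x}",     # raw register value as hex text
        "uncorrectable_status": f"0x{uncorrectable:08x}",
        "error_types": sorted(error_types),
    }
    return json.dumps(record, sort_keys=True)


if __name__ == "__main__":
    print(package_record("0000:3b:00.0", 0x40, 0x0, ["bad TLP"]))
```

A text format such as this keeps the archived record readable on both the operating system side and the BMC side without sharing binary structure definitions.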
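In the technical solution provided by the embodiments of the present invention, the BIOS promptly captures the register error information produced when the PCIe link where the NVME hard disk is located fails, and parses its error type. Because the debug versions in the debug version set correspond one-to-one with the register error types, the error type can be parsed accurately and quickly. The signal pin of the PCH is then set according to the error type to control the protocol analyzer. This captures the PCIe trace of the NVME hard disk accurately and efficiently, removes the related art's need to customize a trigger for each error type, and helps find the cause of incompatibility between the NVME hard disk and the server system efficiently and accurately.
The embodiments of the present invention also provide a corresponding apparatus for the method of capturing NVME hard disk traces, which makes the method more practical. The apparatus for capturing NVME hard disk traces provided by the embodiments of the present invention is introduced below; the apparatus described below and the method described above may be read with reference to each other.
Referring to FIG. 2, FIG. 2 is a structural diagram of one specific implementation of the apparatus for capturing NVME hard disk traces provided by an embodiment of the present invention. The apparatus may include:
a monitoring module 201, used to monitor the operating status information of the PCIe link where the NVME hard disk is located;
a failure judgment module 202, used to judge whether the PCIe link where the NVME hard disk is located has failed;
an error capture module 203, used to capture the register error information of the PCIe link when the PCIe link where the NVME hard disk is located fails;
an error type analysis module 204, used to call a pre-stored debug version set to parse the register error information and obtain the corresponding error type, wherein the debug version set includes multiple debug versions and each debug version corresponds to one register error type;
an error type matching result judgment module 205, used to judge whether the error type is consistent with the error type currently selected by the DIP switch;
a trigger module 206, used to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk, the PCH being connected to the protocol analyzer.
Optionally, in some implementations of this embodiment (see FIG. 3), the apparatus may further include a sending module 207, used to send the register error information and the corresponding error type to the server operating system and the BMC for archiving.
The function of each functional module of the apparatus for capturing NVME hard disk traces described in the embodiments of the present invention may be implemented according to the method in the above method embodiments. For the specific implementation process, refer to the related descriptions of the above method embodiments; the details are not repeated here.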
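The matching check performed by module 205 can be sketched as a lookup from DIP switch positions to a selected error type, followed by a comparison with the parsed type. The source does not describe how the switch encodes an error type, so the binary encoding, the `SWITCH_TABLE` ordering, and the function names below are purely hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical encoding: the switch positions form a binary index into this table.
SWITCH_TABLE = ("unsupported request", "bad TLP", "bad DLLP", "malformed TLP")


def selected_error_type(switch_bits: Sequence[int]) -> Optional[str]:
    """Decode the switch positions (most significant first) into an error type."""
    index = 0
    for bit in switch_bits:
        index = (index << 1) | (1 if bit else 0)
    return SWITCH_TABLE[index] if index < len(SWITCH_TABLE) else None


def should_trigger(parsed_type: str, switch_bits: Sequence[int]) -> bool:
    """Module 205 sketch: trigger only when parsed type and selected type agree."""
    return parsed_type == selected_error_type(switch_bits)


if __name__ == "__main__":
    print(selected_error_type([1, 0]), should_trigger("bad DLLP", [1, 0]))
```

With a gate like this in front of the trigger module, an operator can choose which error type reaches the analyzer by changing the switch rather than rebuilding firmware.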
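As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault error accurately and effectively without customizing a trigger for each fault error type.
An embodiment of the present invention also provides a device for capturing NVME hard disk traces, which may specifically include:
a memory, used to store a computer program;
a processor, used to execute the computer program to implement the steps of the method for capturing NVME hard disk traces described in any one of the above embodiments.
The function of each functional module of the device for capturing NVME hard disk traces described in the embodiments of the present invention may be implemented according to the method in the above method embodiments. For the specific implementation process, refer to the related descriptions of the above method embodiments; the details are not repeated here.
As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault error accurately and effectively without customizing a trigger for each fault error type.
An embodiment of the present invention also provides a computer-readable storage medium storing a program for capturing NVME hard disk traces; when executed by a processor, the program implements the steps of the method for capturing NVME hard disk traces described in any one of the above embodiments.
The function of each functional module of the computer-readable storage medium described in the embodiments of the present invention may be implemented according to the method in the above method embodiments. For the specific implementation process, refer to the related descriptions of the above method embodiments; the details are not repeated here.
As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault error accurately and effectively without customizing a trigger for each fault error type.
An embodiment of the present invention also provides a system for capturing NVME hard disk traces. Referring to FIG. 4, it may include a BIOS 41, a PCH 42, and a protocol analyzer 43 connected to the PCH 42.
The BIOS 41 is used to monitor the operating status information of the PCIe link where the NVME hard disk is located and, when that PCIe link fails, to capture the register error information of the PCIe link; to call the pre-stored debug version set to parse the register error information and obtain the corresponding error type; and to set the GPIO signal pin of the PCH 42 corresponding to the error type so as to trigger the protocol analyzer 43 to capture the PCIe trace of the corresponding NVME hard disk;
the debug version set includes multiple debug versions, and each debug version corresponds to one register error type.
Optionally, the GPIO signal pin of the PCH 42 may be connected to the trigger connector of the protocol analyzer 43 through a coaxial cable.
In some specific implementations, the BIOS 41 is also used to send the register error information and the corresponding error type to the server operating system and the BMC.
The function of each functional module of the system for capturing NVME hard disk traces described in the embodiments of the present invention may be implemented according to the method in the above method embodiments. For the specific implementation process, refer to the related descriptions of the above method embodiments; the details are not repeated here.
As can be seen from the above, the embodiments of the present invention capture the PCIe trace corresponding to an NVME hard disk fault error accurately and effectively without customizing a trigger for each fault error type.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be read with reference to each other. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
Professionals will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of function. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
The method, apparatus, device, system, and computer-readable storage medium for capturing NVME hard disk traces provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help in understanding the method of the present invention and its core idea. It should be noted that a person of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present invention.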

Claims (10)

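  1. A method for capturing NVME hard disk traces, characterized in that it is applied to a BIOS and includes:
    monitoring the operating status information of the PCIe link where an NVME hard disk is located, and determining whether the PCIe link where the NVME hard disk is located has failed;
    if so, capturing the register error information of the PCIe link;
    calling a pre-stored debug version set to parse the register error information and obtain the corresponding error type;
    setting the GPIO signal pin of the PCH corresponding to the error type, so as to trigger a protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk;
    wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type; the PCH is connected to the protocol analyzer.
  2. The method for capturing NVME hard disk traces according to claim 1, characterized in that it further includes:
    sending the register error information and the corresponding error type to the server operating system and the BMC for archiving.
  3. The method for capturing NVME hard disk traces according to claim 2, characterized in that the GPIO signal pin of the PCH is connected to the trigger connector of the protocol analyzer through a coaxial cable.
  4. The method for capturing NVME hard disk traces according to claim 3, characterized in that monitoring the operating status information of the PCIe link where the NVME hard disk is located means monitoring that information in real time.
  5. An apparatus for capturing NVME hard disk traces, characterized in that it is applied to a BIOS and includes:
    a monitoring module, used to monitor the operating status information of the PCIe link where an NVME hard disk is located;
    a failure judgment module, used to judge whether the PCIe link where the NVME hard disk is located has failed;
    an error capture module, used to capture the register error information of the PCIe link when the PCIe link where the NVME hard disk is located fails;
    an error type analysis module, used to call a pre-stored debug version set to parse the register error information and obtain the corresponding error type, wherein the debug version set includes multiple debug versions and each debug version corresponds to one register error type;
    an error type matching result judgment module, used to judge whether the error type is consistent with the error type currently selected by the DIP switch;
    a trigger module, used to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger a protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk, the PCH being connected to the protocol analyzer.
  6. A device for capturing NVME hard disk traces, characterized in that it includes a processor that, when executing a computer program stored in a memory, implements the steps of the method for capturing NVME hard disk traces according to any one of claims 1 to 4.
  7. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a program for capturing NVME hard disk traces, and the program, when executed by a processor, implements the steps of the method for capturing NVME hard disk traces according to any one of claims 1 to 4.
  8. A system for capturing NVME hard disk traces, characterized in that it includes a BIOS, a PCH, and a protocol analyzer connected to the PCH;
    the BIOS is used to monitor the operating status information of the PCIe link where an NVME hard disk is located and, when that PCIe link fails, to capture the register error information of the PCIe link; to call a pre-stored debug version set to parse the register error information and obtain the corresponding error type; and to set the GPIO signal pin of the PCH corresponding to the error type so as to trigger the protocol analyzer to capture the PCIe trace of the corresponding NVME hard disk;
    wherein the debug version set includes multiple debug versions, and each debug version corresponds to one register error type.
  9. The system for capturing NVME hard disk traces according to claim 8, characterized in that the GPIO signal pin of the PCH is connected to the trigger connector of the protocol analyzer through a coaxial cable.
  10. The system for capturing NVME hard disk traces according to claim 9, characterized in that the BIOS is also used to send the register error information and the corresponding error type to the server operating system and the BMC.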
PCT/CN2019/093354 2018-11-01 2019-06-27 Method, apparatus, device, and system for capturing NVME hard disk trace WO2020087954A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811295890.8 2018-11-01
CN201811295890.8A CN109408338B (zh) 2018-11-01 2018-11-01 Method, apparatus, device, and system for capturing NVME hard disk trace

Publications (1)

Publication Number Publication Date
WO2020087954A1 (zh) 2020-05-07

Family

ID=65471130

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/093354 WO2020087954A1 (zh) 2018-11-01 2019-06-27 Method, apparatus, device, and system for capturing NVME hard disk trace

Country Status (2)

Country Link
CN (1) CN109408338B (zh)
WO (1) WO2020087954A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
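CN111722966A (zh) * 2020-06-19 2020-09-29 广东浪潮大数据研究有限公司 PCIe Switch detection method, system, device, and medium
CN113900718A (zh) * 2021-09-30 2022-01-07 苏州浪潮智能科技有限公司 Method, system, and apparatus for decoupling BMC and BIOS asset information
CN116028291A (zh) * 2023-03-29 2023-04-28 北京象帝先计算技术有限公司 Debug signal output system, PCIe device, electronic device, and method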

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
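CN109408338B (zh) * 2018-11-01 2022-02-18 郑州云海信息技术有限公司 Method, apparatus, device, and system for capturing NVME hard disk trace
CN110502394A (zh) * 2019-08-08 2019-11-26 苏州浪潮智能科技有限公司 Server fault handling method and apparatus, readable storage medium, and BMC
CN111274169B (zh) * 2020-01-19 2021-08-20 苏州浪潮智能科技有限公司 Method and system for automatically assigning NVME hard disk serial numbers
CN111884856B (zh) * 2020-07-29 2022-05-24 苏州浪潮智能科技有限公司 Transmission error locating method for an FC card and related apparatus
CN112463490B (zh) * 2020-12-01 2022-07-19 苏州浪潮智能科技有限公司 Link state diagnosis system and method with PCIe retimer
CN113127285B (zh) * 2021-06-17 2021-10-08 北京燧原智能科技有限公司 Erroneous data debugging method, apparatus, chip, and computer device
CN113535450A (zh) * 2021-07-09 2021-10-22 深圳忆联信息系统有限公司 Anomaly analysis method and apparatus based on solid-state drive testing, and computer device
CN118468361A (zh) * 2024-05-06 2024-08-09 中国电子科技集团公司第十五研究所 Hard disk firmware detection and analysis method and apparatus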

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160062652A1 (en) * 2014-08-29 2016-03-03 Dell Products, Lp System and Method for Providing Personality Switching in a Solid State Drive Device
CN106462498A (zh) * 2014-06-23 2017-02-22 利奇德股份有限公司 Modular switched fabric for data storage systems
CN106970866A (zh) * 2017-03-13 2017-07-21 郑州云海信息技术有限公司 Disk monitoring system and method
CN109408338A (zh) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 Method, apparatus, device, and system for capturing NVME hard disk trace

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7657692B2 (en) * 1999-08-04 2010-02-02 Super Talent Electronics, Inc. High-level bridge from PCIE to extended USB
KR101581702B1 (ko) * 2010-12-23 2016-01-11 인텔 코포레이션 Test, validation, and debug architecture
CN103198000A (zh) * 2013-04-02 2013-07-10 浪潮电子信息产业股份有限公司 Method for locating a faulty memory position in a Linux system
US10180889B2 (en) * 2014-06-23 2019-01-15 Liqid Inc. Network failover handling in modular switched fabric based data storage systems
CN106155826B (zh) * 2015-04-16 2019-10-18 伊姆西公司 Method and system for detecting and handling errors in a bus structure
CN107122277A (zh) * 2017-05-09 2017-09-01 郑州云海信息技术有限公司 PCIe RAS error injection test system and method based on a PCIe protocol analyzer

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106462498A (zh) * 2014-06-23 2017-02-22 利奇德股份有限公司 Modular switched fabric for data storage systems
US20160062652A1 (en) * 2014-08-29 2016-03-03 Dell Products, Lp System and Method for Providing Personality Switching in a Solid State Drive Device
CN106970866A (zh) * 2017-03-13 2017-07-21 郑州云海信息技术有限公司 Disk monitoring system and method
CN109408338A (zh) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 Method, apparatus, device, and system for capturing NVME hard disk trace

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
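CN111722966A (zh) * 2020-06-19 2020-09-29 广东浪潮大数据研究有限公司 PCIe Switch detection method, system, device, and medium
CN111722966B (zh) * 2020-06-19 2024-01-23 广东浪潮大数据研究有限公司 PCIe Switch detection method, system, device, and medium
CN113900718A (zh) * 2021-09-30 2022-01-07 苏州浪潮智能科技有限公司 Method, system, and apparatus for decoupling BMC and BIOS asset information
CN113900718B (zh) * 2021-09-30 2023-08-15 苏州浪潮智能科技有限公司 Method, system, and apparatus for decoupling BMC and BIOS asset information
CN116028291A (zh) * 2023-03-29 2023-04-28 北京象帝先计算技术有限公司 Debug signal output system, PCIe device, electronic device, and method
CN116028291B (zh) * 2023-03-29 2023-07-21 北京象帝先计算技术有限公司 Debug signal output system, PCIe device, electronic device, and method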

Also Published As

Publication number Publication date
CN109408338B (zh) 2022-02-18
CN109408338A (zh) 2019-03-01

Similar Documents

Publication Publication Date Title
WO2020087954A1 (zh) Method, apparatus, device, and system for capturing NVME hard disk trace
TWI229796B (en) Method and system to implement a system event log for system manageability
EP3140960B1 (en) Methods, systems, and computer readable media for providing fuzz testing functionality
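WO2020087956A1 (zh) Method, apparatus, device, and system for capturing NVME hard disk trace
CN109254864A (zh) Application fault repair method, apparatus, and electronic device
WO2016045353A1 (zh) Fault diagnosis and analysis method, apparatus, system, and storage medium
WO2018006702A1 (zh) Exception handling method, apparatus, and system in automated testing
WO2021056913A1 (zh) Fault locating method, apparatus, and system based on I2C communication
CN106559288A (zh) Fast fault detection method based on ICMP messages
CN109710479B (zh) Processing method, first device, and second device
CN114003416B (zh) Dynamic memory error handling method, system, terminal, and storage medium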
US20210334153A1 (en) Remote error detection method adapted for a remote computer device to detect errors that occur in a service computer device
US8880957B2 (en) Facilitating processing in a communications environment using stop signaling
WO2020259339A1 (zh) 总线监控装置及方法、存储介质、电子装置
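JP6168628B1 (ja) Failure analysis device, failure analysis system, failure analysis method, and failure analysis program
CN109446002B (zh) Fixture board, system, and method for a server to capture a SATA hard disk
US7925728B2 (en) Facilitating detection of hardware service actions
CN115022163B (zh) Log collection method, apparatus, computer device, and storage medium
CN115766526A (zh) Test method and apparatus for a switch physical layer chip, and electronic device
CN109491846B (zh) Method and system for a server to capture SATA hard disk trace
WO2016041387A1 (zh) Method and apparatus for debugging network element configuration by a network management system
CN116382968B (zh) Fault detection method and apparatus for an external device
CN115955416B (zh) Method, apparatus, device, and storage medium for testing UPI bandwidth reduction
CN118550747A (zh) Method, system, electronic device, and medium for quickly locating PCIe fatal errors
CN107438259B (zh) Method for locating faults in the performance module of a network management system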

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19879801

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19879801

Country of ref document: EP

Kind code of ref document: A1