CN116521496A - Method, system, computer device and storage medium for verifying server performance - Google Patents
- Publication number
- CN116521496A (application number CN202310432839.1A)
- Authority
- CN
- China
- Prior art keywords
- error
- hardware
- real
- scheme
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
- G06F11/3476—Data logging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A10/00—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
- Y02A10/40—Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of server testing, and in particular to a server performance verification method, system, computer device and storage medium.
Background
In the prior art, server performance is usually verified by running error injection tests through the XDP (extended debug port) interface on the server motherboard. The injected errors are simulated, or occur only at the hardware interface, and are not errors that actually occur in the hardware. In addition, the PEI card and MEI card provided by Intel can inject only one error at a time and only a single type of error. The accuracy of such error injection testing is therefore limited and its efficiency is low.
Summary of the Invention
The purpose of the present invention is to provide a server performance verification method, system, computer device and storage medium.
The technical solution of the present invention is as follows. In a first aspect, the present invention provides a server performance verification method, the method comprising:
receiving an error injection instruction and generating a real hardware error occurrence scheme based on the error injection instruction;
sending the real hardware error occurrence scheme to target hardware, so that the target hardware produces a real error based on the real hardware error occurrence scheme;
obtaining log information of the real error, and obtaining the real hardware error occurrence scheme and register information;
processing the real error based on the log information of the real error, the real hardware error occurrence scheme and the register information, so as to verify the server performance.
In a preferred embodiment, receiving the error injection instruction and generating the real hardware error occurrence scheme based on the error injection instruction comprises:
receiving the error injection instruction;
reading server hardware configuration information, the server hardware configuration information including at least: hardware model, hardware quantity, memory capacity and memory location;
generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information, the real hardware error occurrence scheme including at least an error list, error priorities and error types.
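As an illustration only, the scheme described above could be represented as a small data structure. The following is a minimal sketch, not the applicant's implementation: the class names, field names and the rule that a lower priority value is handled first are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HardwareConfig:
    # Fields mirror the configuration items named in the text.
    model: str
    count: int
    memory_capacity_gb: int
    memory_slots: List[str]


@dataclass
class PlannedError:
    hardware: str    # e.g. "CPU", "memory", "PCIE"
    error_type: str  # hypothetical label, e.g. "correctable"
    priority: int    # assumption: lower value is handled first


@dataclass
class ErrorScheme:
    errors: List[PlannedError] = field(default_factory=list)

    def error_list(self) -> List[str]:
        """Return the hardware categories in the order they will be handled."""
        return [e.hardware for e in self.errors]


def build_scheme(instruction: List[dict], config: HardwareConfig) -> ErrorScheme:
    """Combine an injection instruction with the hardware configuration."""
    scheme = ErrorScheme()
    for item in instruction:
        # Skip memory errors when the server reports no memory slots.
        if item["hardware"] == "memory" and not config.memory_slots:
            continue
        scheme.errors.append(
            PlannedError(item["hardware"], item["error_type"], item["priority"])
        )
    scheme.errors.sort(key=lambda e: e.priority)
    return scheme
```

In this sketch the error list, the priorities and the error types all live in one ordered list, so later stages can simply iterate over it.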
In a preferred embodiment, before receiving the error injection instruction and generating the real hardware error occurrence scheme based on the error injection instruction, the method further comprises:
displaying preset error injection schemes on a human-computer interaction interface for the user to select from in order to input the error injection instruction, the preset error injection schemes including any one of custom errors, random errors and a fault flood.
In a preferred embodiment, in response to the error injection scheme selected by the user being custom errors, generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information comprises:
receiving custom hardware type information, custom error type information, custom error count information and error location information input by the user;
generating the real hardware error occurrence scheme based on the custom hardware type information, the custom error type information, the custom error count information, the error location information and the server hardware configuration information.
In a preferred embodiment, in response to the error injection scheme selected by the user being random errors, generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information comprises:
randomly generating the real hardware error occurrence scheme based on the server hardware configuration information.
In a preferred embodiment, in response to the error injection scheme selected by the user being a fault flood, generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information comprises:
randomly generating errors continuously for a preset duration based on the server hardware configuration information, so as to generate the real hardware error occurrence scheme.
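The three selectable schemes (custom errors, random errors, fault flood) can be sketched as one dispatch function. This is a hedged illustration under stated assumptions: the `ERROR_TYPES` catalogue, the dictionary keys and the fixed injection interval used to model the preset duration are hypothetical and do not come from the application.

```python
import random
from typing import Dict, List, Optional

# Hypothetical catalogue of injectable error types per hardware category.
ERROR_TYPES: Dict[str, List[str]] = {
    "CPU": ["cache_error", "machine_check"],
    "memory": ["correctable_ecc", "uncorrectable_ecc"],
    "PCIE": ["link_error", "completion_timeout"],
}


def random_error(available: List[str], rng: random.Random) -> dict:
    """Pick a random error on hardware that the configuration reports as present."""
    hw = rng.choice(available)
    return {"hardware": hw, "error_type": rng.choice(ERROR_TYPES[hw])}


def generate_scheme(
    mode: str,
    available: List[str],
    rng: random.Random,
    custom: Optional[dict] = None,
    duration_s: float = 0.0,
    interval_s: float = 1.0,
) -> List[dict]:
    if mode == "custom":
        # The user supplies hardware type, error type, location and count.
        entry = {k: custom[k] for k in ("hardware", "error_type", "location")}
        return [dict(entry) for _ in range(custom["count"])]
    if mode == "random":
        return [random_error(available, rng)]
    if mode == "fault_flood":
        # Assumption: one random error per interval until the preset duration ends.
        steps = int(duration_s / interval_s)
        return [dict(random_error(available, rng), at_s=i * interval_s) for i in range(steps)]
    raise ValueError(f"unknown error injection scheme: {mode}")
```

Passing a seeded `random.Random` keeps a random or fault-flood run reproducible, which helps when the generated scheme is later compared against the captured logs.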
In a preferred embodiment, before sending the real hardware error occurrence scheme to the target hardware so that the target hardware produces a real error based on the real hardware error occurrence scheme, the method further comprises:
parsing the real hardware error occurrence scheme to obtain real hardware error category information, the real hardware error category information including at least one of CPU hardware errors, memory hardware errors, PCIE hardware errors and other errors;
parsing the real hardware error occurrence scheme to obtain target hardware layer information corresponding to the real hardware error category information;
parsing the real hardware error occurrence scheme to obtain target error type information corresponding to the target hardware layer information;
sending the real hardware error occurrence scheme to the target hardware comprises:
sending the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information.
In a preferred embodiment, parsing the real hardware error occurrence scheme to obtain the real hardware error category information comprises:
parsing the real hardware error occurrence scheme based on the error priorities and the error list to obtain the real hardware error category information.
In a preferred embodiment, a first error random number is also generated when the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information;
before parsing the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information, the method further comprises:
correcting the target hardware layer information based on the first error random number.
In a preferred embodiment, a second error random number is also generated when the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information.
Before sending the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information, the method further comprises:
correcting the target error type information based on the second error random number.
In a preferred embodiment, sending the real hardware error occurrence scheme to the target hardware so that the target hardware produces a real error based on the real hardware error occurrence scheme comprises:
delivering the target hardware layer information and the target error type information to the target hardware in the form of error packets or an error stream, so as to produce the real error.
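The application does not specify the layout of an error packet or error stream. Purely as a sketch, the following assumes a hypothetical 4-byte little-endian packet holding a hardware layer identifier, an error type identifier and a repeat count, and models an error stream as a sequence of such packets.

```python
import struct
from typing import Iterable, Iterator, Tuple

# Hypothetical layout: layer id (1 byte), error type id (1 byte), repeat count (2 bytes).
PACKET_FORMAT = "<BBH"


def encode_packet(layer_id: int, error_type_id: int, count: int = 1) -> bytes:
    """Pack one error description into a fixed-size packet."""
    return struct.pack(PACKET_FORMAT, layer_id, error_type_id, count)


def decode_packet(data: bytes) -> Tuple[int, int, int]:
    """Inverse of encode_packet, as the receiving side might apply it."""
    return struct.unpack(PACKET_FORMAT, data)


def error_stream(items: Iterable[Tuple[int, int]]) -> Iterator[bytes]:
    """Yield packets one after another to model delivery as a stream."""
    for layer_id, error_type_id in items:
        yield encode_packet(layer_id, error_type_id)
```

A fixed-size packet keeps the receiving side simple; a real transport over XDP or another interface would define its own framing.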
In a preferred embodiment, obtaining the log information of the real error comprises:
capturing the log information of the real error through a preset interface, the preset interface including at least one of a serial port, an XDP interface, an IPMI interface, a Redfish interface and an SSH interface; the log information of the real error is generated after the target hardware, once the real error has occurred, reports it through the UEFI system to the OS kernel and the BMC.
In a preferred embodiment, obtaining the real hardware error occurrence scheme and the register information comprises:
obtaining the real hardware error occurrence scheme;
obtaining register information of the registers whose bits are set as the error is propagated and handled, the register information including at least SMI link information and CSMI link information.
In a preferred embodiment, obtaining the register information of the registers whose bits are set as the error is propagated and handled comprises:
obtaining the register information of the registers whose bits are set when the error, after occurring, is passed to the UEFI system.
In a preferred embodiment, processing the real error based on the log information of the real error, the real hardware error occurrence scheme and the register information so as to verify the server performance comprises:
determining whether the log of the real error is consistent with the real hardware error occurrence scheme, and determining whether the register information is consistent with the real hardware error occurrence scheme;
if the log of the real error is consistent with the real hardware error occurrence scheme and the register information is consistent with the real hardware error occurrence scheme, determining that the server performance is qualified.
In a preferred embodiment, determining whether the log of the real error is consistent with the real hardware error occurrence scheme comprises:
determining whether the log of the real error is consistent with the error information in the real hardware error occurrence scheme, the error information including the error type, the error location, the number of errors and the time of error occurrence.
In a preferred embodiment, determining whether the register information is consistent with the real hardware error occurrence scheme comprises:
determining whether the error type in the register information is consistent with the error type in the real hardware error occurrence scheme.
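The two consistency checks above can be sketched as follows. This is a simplified illustration: the `ErrorRecord` type is hypothetical, and exact equality on the occurrence time is an assumption made for brevity (a practical comparison would likely allow a tolerance).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True, order=True)
class ErrorRecord:
    # The four fields the text lists as error information.
    error_type: str
    location: str
    count: int
    time_s: int


def log_matches(planned: List[ErrorRecord], logged: List[ErrorRecord]) -> bool:
    """The log is consistent if it holds the same records, in any order."""
    return sorted(planned) == sorted(logged)


def registers_match(planned: List[ErrorRecord], register_types: List[str]) -> bool:
    """Register information is compared on error type only."""
    return sorted(e.error_type for e in planned) == sorted(register_types)


def verify(
    planned: List[ErrorRecord],
    logged: List[ErrorRecord],
    register_types: List[str],
) -> bool:
    """The server passes only when both checks succeed."""
    return log_matches(planned, logged) and registers_match(planned, register_types)
```

Sorting before comparing makes the check independent of the order in which the logs were captured.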
In a second aspect, the present invention further provides a server performance verification system, the system comprising:
a receiving and generating module, configured to receive an error injection instruction and generate a real hardware error occurrence scheme based on the error injection instruction;
a sending module, configured to send the real hardware error occurrence scheme to target hardware, so that the target hardware produces a real error based on the real hardware error occurrence scheme;
an obtaining module, configured to obtain log information of the real error, and to obtain the real hardware error occurrence scheme and register information;
a processing module, configured to process the real error based on the log information of the real error, the real hardware error occurrence scheme and the register information, so as to verify the server performance.
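The way the four modules cooperate can be sketched as a thin pipeline. This is only an illustration: each module is modelled as a plain callable, and the `VerificationSystem` class and its parameter names are assumptions rather than anything defined in the application.

```python
from typing import Any, Callable, Tuple


class VerificationSystem:
    """Wires the four modules together in the order the method describes."""

    def __init__(
        self,
        generate: Callable[[Any], Any],         # receiving and generating module
        send: Callable[[Any], None],            # sending module
        acquire: Callable[[], Tuple[Any, Any]], # obtaining module: (logs, registers)
        process: Callable[[Any, Any, Any], bool],  # processing module
    ) -> None:
        self.generate = generate
        self.send = send
        self.acquire = acquire
        self.process = process

    def run(self, instruction: Any) -> bool:
        scheme = self.generate(instruction)  # build the error occurrence scheme
        self.send(scheme)                    # deliver it to the target hardware
        logs, registers = self.acquire()     # capture logs and register info
        return self.process(logs, scheme, registers)  # verdict
```

Keeping the modules as injected callables lets each one be replaced by a stub when the rest of the flow is tested without real hardware.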
In a third aspect, the present invention further provides a computer device, the computer device comprising:
one or more processors;
and a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the server performance verification method according to any one of the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium storing computer instructions, the computer instructions causing a computer to perform the server performance verification method according to any one of the first aspect.
The advantages of the present invention are as follows. A server performance verification method, system, computer device and storage medium are provided. The method comprises: receiving an error injection instruction and generating a real hardware error occurrence scheme based on the error injection instruction; sending the real hardware error occurrence scheme to target hardware, so that the target hardware produces a real error based on the real hardware error occurrence scheme; obtaining log information of the real error, and obtaining the real hardware error occurrence scheme and register information; and processing the real error based on the log information of the real error, the real hardware error occurrence scheme and the register information, so as to verify the server performance. Because real hardware errors are produced, a variety of real hardware errors occur and are recognized by the server. Capturing the real hardware error occurrence scheme together with the register and log information improves the efficiency of server performance testing.
Brief Description of the Drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the architecture of server performance verification in the present application;
FIG. 2 is a flowchart of the server performance verification method provided by the present application;
FIG. 3 is a framework flowchart of the server performance verification provided by the present application;
FIG. 4 is an architecture diagram of the server performance verification system provided by the present application;
FIG. 5 is an architecture diagram of the computer device provided by the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
As described in the Background, in the prior art the server motherboard has an XDP (extended debug port) interface, which is a JTAG (Joint Test Action Group) type interface. The interface follows an international standard test protocol (IEEE 1149.1 compatible) and is mainly used for error injection testing. In the prior art, the injected errors are simulated, or occur only at the hardware interface, and are not errors that actually occur in the hardware. The physical PEI card or MEI card provided by Intel can inject only one error at a time and only errors of a single type, and cannot continuously and randomly produce a large number of hardware errors.
To solve the above problems, the present application creatively proposes a server performance verification method, system, computer device and storage medium. The method comprises: receiving an error injection instruction and generating a real hardware error occurrence scheme based on the error injection instruction; sending the real hardware error occurrence scheme to target hardware, so that the target hardware produces a real error based on the real hardware error occurrence scheme; obtaining log information of the real error, and obtaining the real hardware error occurrence scheme and register information; and processing the real error based on the log information of the real error, the real hardware error occurrence scheme and the register information, so as to verify the server performance. Because real hardware errors are produced, a variety of real hardware errors occur and are recognized by the server. Capturing the real hardware error occurrence scheme together with the register and log information improves the efficiency of server performance testing.
The solutions of the present application are described in detail below with reference to the drawings and the embodiments.
Embodiment 1: This embodiment introduces the architecture of server performance verification in the present application.
Specifically, referring to FIG. 1, the architecture includes:
Server hardware and hardware interfaces. These include the CPU (Central Processing Unit) hardware interface, the memory hardware interface, the PCIE (PCI-Express, a bus and interface standard) hardware interface and the PCH (Platform Controller Hub, Intel's integrated south bridge) hardware interface that are integrated into the server, together with the corresponding hardware, and may also include other hardware interfaces provided according to actual needs. The CPU hardware interface, memory hardware interface, PCIE hardware interface, PCH hardware interface and any other hardware interfaces provided according to actual needs are all visible and interactive. The hardware receives the errors issued by the error verification system through the hardware interfaces and executes them;
The architecture further includes a human-computer interaction interface and a log input module. The human-computer interaction interface is connected to the error verification system and is used by the user to input error injection instructions; it also displays the error injection schemes for the user to select from, monitors the progress of error occurrence, shows the error occurrence history and the error occurrence scheme, and displays the verification result after verification. The log capture interfaces include, but are not limited to, a serial port, an XDP interface, an IPMI interface, a Redfish interface and an SSH interface; they are used to capture logs after error injection and provide the log information to the error handling mechanism;
The architecture further includes an error generating device in which the error verification system is deployed. The error verification system contains an error generating module and an error diagnosis module. The error generating module generates the real hardware error occurrence scheme according to the error injection instruction input by the user and parses it into error occurrence instructions that are passed to the hardware. The error diagnosis module parses the logs captured through the log capture interfaces and determines whether the real errors that occurred conform to the real hardware error occurrence scheme.
Embodiment 2: Based on the architecture of server performance verification introduced in Embodiment 1, this embodiment describes the server performance verification process of the present application with reference to FIG. 2 and FIG. 3.
Specifically, referring to FIG. 2 and FIG. 3, the present application provides a server performance verification method, the method comprising:
S210. Receive an error injection instruction and generate a real hardware error occurrence scheme based on the error injection instruction.
Specifically, the system is connected to the human-computer interaction interface. The user inputs an error injection instruction on the human-computer interaction interface, and the system generates the real hardware error occurrence scheme after receiving the error injection instruction transmitted by the human-computer interaction interface.
In one embodiment, receiving the error injection instruction and generating the real hardware error occurrence scheme based on the error injection instruction comprises:
S211. Display preset error injection schemes on the human-computer interaction interface for the user to select from in order to input the error injection instruction, the preset error injection schemes including any one of custom errors, random errors and a fault flood.
Specifically, to improve ease of use, when the user inputs an error injection instruction through the input and output functions of the human-computer interaction interface, the interface offers several error injection schemes, including but not limited to: custom errors (the user can define the hardware type, error type, number of error occurrences, error location and so on), random errors (errors are generated randomly according to the server hardware configuration) and a fault flood (a large number of hardware errors are generated continuously and randomly according to the server hardware configuration).
S212. Receive the error injection instruction.
The system receives the preset error injection scheme selected by the user and its specific content. The user can select any one of custom errors, random errors and a fault flood. If the preset error injection schemes also include other schemes, the user can select those as well; the selection is not limited to custom errors, random errors and a fault flood.
S213. Read server hardware configuration information, the server hardware configuration information including at least: hardware model, hardware quantity, memory capacity and memory location.
Specifically, the system reads the server hardware configuration, such as the hardware model, hardware quantity, memory capacity and location, and confirms which configured hardware is currently available, for use in drawing up the real hardware error occurrence scheme.
S214. Generate the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information, the real hardware error occurrence scheme including at least an error list, error priorities and error types.
Specifically, (1) in response to the error injection scheme selected by the user being custom errors, generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information comprises:
S2141. Receive custom hardware type information, custom error type information, custom error count information and error location information input by the user;
S2142. Generate the real hardware error occurrence scheme based on the custom hardware type information, the custom error type information, the custom error count information, the error location information and the server hardware configuration information.
If the custom hardware type information, custom error type information, custom error count information or error location information input by the user conflicts with the server hardware configuration information that was read, the conflicting content is sent to the human-computer interaction interface and displayed as a reminder. For example, if the custom hardware type input by the user is the CPU hardware type but the server hardware configuration information shows that CPU hardware is currently unavailable, the user's injection instruction conflicts with the server hardware configuration information and the error cannot be executed; the conflict information "CPU hardware unavailable" is then displayed on the human-computer interaction interface as a reminder.
(2) In response to the error injection scheme selected by the user being random errors, generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information includes:
S2143. Randomly generate the real hardware error occurrence scheme based on the server hardware configuration information.
(3) In response to the error injection scheme selected by the user being a fault flood, generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information includes:
S2144. Randomly generate errors continuously for a preset duration based on the server hardware configuration information, so as to generate the real hardware error occurrence scheme.
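As a rough sketch of steps S2143 and S2144, the random mode can pick errors only from hardware that the configuration reports as available, and the fault-flood mode can spread such random errors over a preset duration. The error type names, the `hw_config` layout, and the `rate_per_s` parameter are assumptions made for illustration; the source does not specify them.

```python
import random

# Illustrative error types only; the source mentions bit flips, CRC errors
# and read/write process errors among many others.
ERROR_TYPES = ["bit_flip", "crc_error", "read_write_error"]


def random_scheme(hw_config, rng, n_errors=1):
    """Randomly pick n_errors errors restricted to available hardware."""
    available = sorted(name for name, entry in hw_config.items() if entry["available"])
    if not available:
        raise ValueError("no available hardware to inject errors into")
    errors = []
    for _ in range(n_errors):
        hardware = rng.choice(available)
        location = rng.choice(sorted(hw_config[hardware]["locations"]))
        errors.append({
            "hardware": hardware,
            "location": location,
            "error_type": rng.choice(ERROR_TYPES),
        })
    return errors


def fault_flood_scheme(hw_config, rng, duration_s, rate_per_s):
    """Random errors with time offsets spread over the preset duration."""
    errors = random_scheme(hw_config, rng, int(duration_s * rate_per_s))
    for error in errors:
        error["offset_s"] = rng.uniform(0, duration_s)
    errors.sort(key=lambda e: e["offset_s"])  # inject in chronological order
    return errors
```

Passing a seeded `random.Random` instance makes a generated scheme reproducible, which helps when the scheme is later compared with logs.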
SA10. Parse the real hardware error occurrence scheme to obtain real hardware error category information, the real hardware error category information including at least one of CPU hardware errors, memory hardware errors, PCIe hardware errors, and other errors.
Preferably, this step includes: parsing the real hardware error occurrence scheme based on the error priority and the error list to obtain the real hardware error category information.
Specifically, once the real hardware error occurrence scheme has been generated, the system captures it and parses it level by level. The server integrates a CPU hardware interface, a memory hardware interface, a PCIe hardware interface, a PCH hardware interface, and the corresponding hardware, as well as other hardware interfaces provided as needed, and the system deploys a hardware parsing module for each kind of hardware and hardware interface. After capturing the real hardware error occurrence scheme, the system parses it in the corresponding hardware parsing modules according to the error priority and the error list it contains.
For example, if the error list in the real hardware error occurrence scheme includes a CPU hardware error and a memory hardware error, and the error priority ranks the CPU hardware error above the memory hardware error, the scheme is parsed first in the CPU hardware parsing module and then in the memory parsing module.
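The priority-ordered dispatch in the example above could look like the following sketch. The dictionary layout of the scheme, the convention that a smaller number means higher priority, and the parser callables are all assumptions for illustration.

```python
def parse_in_priority_order(scheme, parsers):
    """Run the per-category parsing modules in error-priority order.

    scheme["error_list"] lists the error categories present in the scheme and
    scheme["priority"] maps each category to a rank (smaller = parsed first).
    parsers maps a category name to a callable that parses the scheme.
    """
    ordered = sorted(scheme["error_list"], key=lambda cat: scheme["priority"][cat])
    return [parsers[cat](scheme) for cat in ordered]


# Usage: CPU errors outrank memory errors, so the CPU module runs first.
scheme = {"error_list": ["memory", "cpu"], "priority": {"cpu": 0, "memory": 1}}
parsers = {
    "cpu": lambda s: "parsed by CPU module",
    "memory": lambda s: "parsed by memory module",
}
results = parse_in_priority_order(scheme, parsers)
```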
SA20. Parse the real hardware error occurrence scheme to obtain the target hardware layer information corresponding to the real hardware error category information.
Specifically, after the error category to which the real hardware error occurrence scheme belongs has been identified, parsing continues to obtain the hardware layer information of the error.
For example, after the real hardware error occurrence scheme has been parsed as belonging to the CPU hardware error category, the corresponding hardware layer information is parsed further. The hardware layers include a transport layer, a data layer, and a processing layer. The transport layer is responsible for transferring data and for receiving and outputting it; taking PCIe hardware as an example, these directions are conventionally called RX (receive) and TX (transmit). The data layer stores large amounts of data, and the processing layer handles data processing and computation. The deeper the hardware layer, the more severe a data error is.
When the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information, a first error random number is also generated.
SA30. Correct the target hardware layer information based on the first error random number.
The hardware layer parsing process generates the first error random number, which is used to correct the error after the target hardware layer information has been parsed. The purpose is to prevent the error distribution, when the error is parsed down to the next level, from being shaped by the algorithm into a fixed distribution, such as the common normal distribution. The random number lets the errors achieve full coverage within a controllable range and reduces the risk of uneven coverage.
SA40. Parse the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information.
When the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information, a second error random number is also generated.
Specifically, after the error has been parsed down to the hardware layer, it must be parsed further and refined within each hardware layer to obtain the target error type information. One example is the single-bit error, also called a bit flip, which can occur at any hardware layer: a single bit or several bits in the binary data are inverted. Other errors include CRC errors, read/write process errors, and many other types.
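To make the bit-flip error type concrete, the following sketch inverts chosen bits in a byte string in software. It only illustrates what a single-bit or multi-bit flip does to data; it is not how the system described here injects errors into hardware. The bit numbering (bit 0 is the least significant bit of the first byte) is a convention chosen for this example.

```python
def flip_bits(data: bytes, bit_positions):
    """Return a copy of data with the given bit positions inverted.

    Position p refers to bit (p % 8) of byte (p // 8), counting from the
    least significant bit of each byte.
    """
    buf = bytearray(data)
    for pos in bit_positions:
        byte_index, bit_index = divmod(pos, 8)
        buf[byte_index] ^= 1 << bit_index  # XOR with a one-bit mask inverts that bit
    return bytes(buf)
```

Because XOR is its own inverse, applying the same flip twice restores the original data.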
SA50. Correct the target error type information based on the second error random number.
The error type parsing process generates the second error random number, which is used to correct the error after the target error type information has been parsed. The purpose is to prevent the error distribution from being shaped by the algorithm into a fixed distribution, such as the common normal distribution. The random number lets the errors achieve full coverage within a controllable range and reduces the risk of uneven coverage.
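The source does not say how the random number is applied. One possible reading, shown here purely as an assumption, is a modular shift: the index chosen by the parser is offset by a random number drawn uniformly from the same range. Whatever bias the parser's own choice has, the shifted result is then uniformly distributed over the allowed choices, which matches the stated goal of even coverage. The layer names come from the description of the hardware layers; the function name is hypothetical.

```python
import random

# Hardware layers named in the description of step SA20.
LAYERS = ["transport", "data", "processing"]


def correct_with_random(parsed_index, random_number, n_choices):
    """Shift a parsed choice by a random offset, wrapping within n_choices."""
    return (parsed_index + random_number) % n_choices


# Usage: even if the parser always picks index 0, the corrected layer varies.
rng = random.Random()
corrected_layer = LAYERS[correct_with_random(0, rng.randrange(len(LAYERS)), len(LAYERS))]
```

The same helper could be reused for the second random number, with the list of error types in place of the list of layers.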
S220. Send the real hardware error occurrence scheme to the target hardware, so that the target hardware generates a real error based on the scheme.
Specifically, sending the real hardware error occurrence scheme to the target hardware includes:
S221. Send the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information.
More specifically, the target hardware layer information and the target error type information are delivered to the target hardware in the form of error packets or an error stream to generate the real error.
After the error type has been parsed, the error is delivered to the hardware in the form of error packets or an error stream. Hardware errors manifest in different ways, which fall roughly into the following groups: hardware physical layer setting, in which a physical layer of the hardware is completely paralyzed and cannot work; link failure, in which part or all of a data transmission link fails so that data cannot be read or cannot be added to the computation sequence; data errors, in which transposition, loss, compilation errors, and the like during reading, writing, and transmission make the data unusable; and other types of errors that cause the hardware to malfunction.
After receiving the target hardware layer information and the target error type information delivered by the system, the target hardware executes them and generates the real error.
S230. Obtain the log information of the real error, and obtain the real hardware error occurrence scheme and the register information.
In one implementation, obtaining the log information of the real error includes:
S231. Capture the log information of the real error through a preset interface, the preset interface including at least one of a serial port, an XDP (eXtended Debug Port) interface, an IPMI (Intelligent Platform Management Interface) interface, a Redfish interface, and an SSH interface; the log information of the real error is generated when the target hardware, after the real error occurs, reports it via the UEFI (Unified Extensible Firmware Interface) system to the OS kernel and the BMC (Baseboard Management Controller).
Specifically, the target hardware executes the scheme and produces an error, which is passed to the UEFI system. UEFI handles the error: it may reject it and let the hardware correct it again, or, once the errors have accumulated to a certain count, treat them as a genuine error and trigger a register to be set. The target hardware reports the error to the UEFI system, and the UEFI system reports it to the OS kernel and the BMC, where log records are generated (including but not limited to messages, dmesg, and the like).
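The accumulate-then-escalate behaviour described for UEFI can be modelled with a simple counter. This is a toy model under stated assumptions: the class name, the fixed threshold, and the absence of any decay or reset are inventions for illustration, and real firmware policies are not described in the source.

```python
class ErrorAccumulator:
    """Counts reported errors and sets a register flag once a threshold is reached."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.count = 0
        self.register_set = False

    def report(self):
        """Record one error; return True once it is treated as a genuine error."""
        self.count += 1
        if self.count >= self.threshold:
            # Enough errors have accumulated: escalate by setting the register.
            self.register_set = True
        return self.register_set


# Usage: with a threshold of 3, the first two errors are not escalated.
acc = ErrorAccumulator(threshold=3)
outcomes = [acc.report() for _ in range(4)]
```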
In one implementation, obtaining the real hardware error occurrence scheme and the register information includes:
S232. Obtain the real hardware error occurrence scheme.
S233. Obtain the register information of the registers set as a result of error propagation and handling, the register information including at least SMI (Serial Management Interface) link information and CSMI link information.
Specifically, obtaining the register information of the registers set as a result of error propagation and handling includes:
Obtaining the register information of the registers that are set in the UEFI system after the error occurs and is passed to it.
After the error occurs it is passed to the UEFI system. Once the error has been propagated and handled, it triggers a register to be set, and the register information (including but not limited to the SMI link, the CSMI link, and the like) is passed to the system for error handling.
S240. Process the real error based on the log information of the real error, the real hardware error occurrence scheme, and the register information to verify the server performance.
In one implementation, processing the real error based on the log information of the real error, the real hardware error occurrence scheme, and the register information to verify the server performance includes:
Determining whether the log of the real error is consistent with the real hardware error occurrence scheme, and determining whether the register information is consistent with the real hardware error occurrence scheme;
If the log of the real error is consistent with the real hardware error occurrence scheme, and the register information is consistent with the real hardware error occurrence scheme, the server performance is verified as qualified.
Preferably, determining whether the log of the real error is consistent with the real hardware error occurrence scheme includes:
Determining whether the log of the real error is consistent with the error information in the real hardware error occurrence scheme, the error information including the error type, the location where the error occurred, the number of errors, and the time at which the error occurred.
Determining whether the register information is consistent with the real hardware error occurrence scheme includes:
Determining whether the error type in the register information is consistent with the error type in the real hardware error occurrence scheme. The register information contains an error type list; whether the register information is consistent with the scheme is determined by checking whether the error type list in the register information matches the error type list in the scheme.
That is, the check is whether the error types in the actual scheme agree with the register information.
The error handling flow pulls the real hardware error occurrence scheme, the register information, and the log information and then runs the fault handling flow. Error handling is divided into manual checking and automatic checking. Its main purpose is to compare whether the real hardware error occurrence scheme matches the log of the real error and whether it matches the register information, thereby verifying whether the RAS function of the server works properly. Automatic checking executes automated test cases, and the final result of the execution is presented on the human-machine interface. Manual checking displays the captured information directly to the operator for manual testing.
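The automatic check can be sketched as two comparisons: the logged errors against the scheme on the four fields named above, and the register's error type list against the scheme's error types. The record layout and the function name `verify` are assumptions for illustration; the source does not define a data format.

```python
def verify(scheme_errors, log_errors, register_error_types):
    """Return True only if both the log and the register info match the scheme.

    scheme_errors and log_errors are lists of dicts with the keys
    "error_type", "location", "count" and "time".
    register_error_types is the error type list read from the registers.
    """
    fields = ("error_type", "location", "count", "time")

    def key(error):
        return tuple(error[f] for f in fields)

    # Log check: same errors, regardless of the order in which they were logged.
    log_ok = sorted(map(key, scheme_errors)) == sorted(map(key, log_errors))
    # Register check: only the error types are compared.
    reg_ok = sorted(e["error_type"] for e in scheme_errors) == sorted(register_error_types)
    return log_ok and reg_ok
```

This sketch compares occurrence times exactly; a practical check would more likely allow a tolerance window between the scheduled and the logged time.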
The server performance verification method provided in this embodiment causes real hardware errors to occur, so that a variety of real hardware errors arise and are recognized by the server; capturing the real hardware error occurrence scheme together with the register and log information improves the efficiency of server performance testing;
Further, multiple kinds of real hardware are integrated and deployed on the server, so that multiple kinds of real hardware errors occur and are recognized by the server.
Further, an error occurrence scheme that fits the configuration of the server under test is drawn up according to the server configuration information that was read, which gives excellent adaptability to all kinds of servers.
Further, large-scale, continuous injection of real hardware errors is achieved;
Further, by parsing the errors and refining the hardware errors step by step, a large number of diverse and complex hardware errors are generated, which closely approximates the real environment in which customers use the server.
Embodiment 3: Corresponding to Embodiments 1 and 2 above, the server performance verification system provided by this application is described below with reference to FIG. 4. The system may be implemented in hardware, in software, or in a combination of the two; this application does not limit the implementation.
In one example, this application provides a server performance verification system, which includes:
A receiving and generating module 410, configured to receive an error injection instruction and generate a real hardware error occurrence scheme based on the error injection instruction;
A sending module 420, configured to send the real hardware error occurrence scheme to the target hardware, so that the target hardware generates a real error based on the scheme;
An obtaining module 430, configured to obtain the log information of the real error, and to obtain the real hardware error occurrence scheme and the register information;
A processing module 440, configured to process the real error based on the log information of the real error, the real hardware error occurrence scheme, and the register information to verify the server performance.
In one implementation, the receiving and generating module 410 includes:
A receiving unit 411, configured to receive the error injection instruction;
A reading unit 412, configured to read the server hardware configuration information, the server hardware configuration information including at least: hardware model, hardware quantity, memory capacity, and memory location;
A generating unit 413, configured to generate the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information, the scheme including at least an error list, an error priority, and an error type.
Preferably, the system further includes:
A display module 450, configured to display preset error injection schemes on the human-machine interface, before the receiving and generating module 410 receives the error injection instruction and generates the real hardware error occurrence scheme based on it, so that the user can select one and thereby input the error injection instruction, the preset error injection schemes including any one of user-defined errors, random errors, and a fault flood.
More preferably, the generating unit 413 is specifically configured to: in response to the error injection scheme selected by the user being user-defined errors, receive the user-defined hardware type information, user-defined error type information, user-defined error count information, and error location information entered by the user;
And generate the real hardware error occurrence scheme based on the user-defined hardware type information, user-defined error type information, user-defined error count information, and error location information, together with the server hardware configuration information.
More preferably, the generating unit 413 is specifically configured to: in response to the error injection scheme selected by the user being random errors, randomly generate the real hardware error occurrence scheme based on the server hardware configuration information.
More preferably, the generating unit 413 is specifically configured to: in response to the error injection scheme selected by the user being a fault flood, randomly generate errors continuously for a preset duration based on the server hardware configuration information, so as to generate the real hardware error occurrence scheme.
Preferably, the system further includes:
A first parsing module 460, configured to parse the real hardware error occurrence scheme to obtain real hardware error category information, before the sending module 420 sends the scheme to the target hardware so that the target hardware generates a real error based on it, the real hardware error category information including at least one of CPU hardware errors, memory hardware errors, PCIe hardware errors, and other errors;
A second parsing module 470, configured to parse the real hardware error occurrence scheme to obtain the target hardware layer information corresponding to the real hardware error category information;
A third parsing module 480, configured to parse the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information;
The sending module 420 is specifically configured to: send the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information.
More preferably, the first parsing module 460 is specifically configured to:
Parse the real hardware error occurrence scheme based on the error priority and the error list to obtain the real hardware error category information.
More preferably, the first parsing module 460 is further configured to generate a first error random number when parsing the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information;
The system further includes:
A correction module 490, configured to correct the target hardware layer information based on the first error random number, before the second parsing module 470 parses the real hardware error occurrence scheme to obtain the target hardware layer information corresponding to the real hardware error category information.
More preferably, the second parsing module 470 is further configured to generate a second error random number when parsing the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information;
The correction module 490 is further configured to correct the target error type information based on the second error random number, before the sending module 420 sends the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information.
More preferably, the sending module 420 is specifically configured to: deliver the target hardware layer information and the target error type information to the target hardware in the form of error packets or an error stream to generate the real error.
More preferably, the obtaining module 430 includes:
A capturing unit 431, configured to capture the log information of the real error through a preset interface, the preset interface including at least one of a serial port, an XDP interface, an IPMI interface, a Redfish interface, and an SSH interface; the log information of the real error is generated when the target hardware, after the real error occurs, reports it via the UEFI system to the OS kernel and the BMC.
More preferably, the obtaining module 430 further includes:
A first obtaining unit 432, configured to obtain the real hardware error occurrence scheme;
A second obtaining unit 433, configured to obtain the register information of the registers set as a result of error propagation and handling, the register information including at least SMI link information and CSMI link information.
More preferably, the second obtaining unit 433 is specifically configured to: obtain the register information of the registers that are set in the UEFI system after the error occurs and is passed to it.
More preferably, the processing module 440 is specifically configured to determine whether the log of the real error is consistent with the real hardware error occurrence scheme, and to determine whether the register information is consistent with the real hardware error occurrence scheme;
If the log of the real error is consistent with the real hardware error occurrence scheme, and the register information is consistent with the real hardware error occurrence scheme, the processing module 440 verifies the server performance as qualified.
More preferably, the processing module 440 includes:
A first determining unit 441, configured to determine whether the log of the real error is consistent with the error information in the real hardware error occurrence scheme, the error information including the error type, the location where the error occurred, the number of errors, and the time at which the error occurred.
More preferably, the processing module 440 further includes:
A second determining unit 442, configured to determine whether the error type in the register information is consistent with the error type in the real hardware error occurrence scheme.
Embodiment 4: Corresponding to Embodiments 1 to 3 above, the computer device provided by this application is described below with reference to FIG. 5. In one example, as shown in FIG. 5, this application provides a computer device, which includes:
One or more processors;
And a memory associated with the one or more processors, the memory being configured to store program instructions which, when read and executed by the one or more processors, perform the following operations:
Receiving an error injection instruction and generating a real hardware error occurrence scheme based on the error injection instruction;
Sending the real hardware error occurrence scheme to the target hardware, so that the target hardware generates a real error based on the scheme;
Obtaining the log information of the real error, and obtaining the real hardware error occurrence scheme and the register information;
Processing the real error based on the log information of the real error, the real hardware error occurrence scheme, and the register information to verify the server performance.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Receiving the error injection instruction;
Reading the server hardware configuration information, the server hardware configuration information including at least: hardware model, hardware quantity, memory capacity, and memory location;
Generating the real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information, the scheme including at least an error list, an error priority, and an error type.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Displaying preset error injection schemes on the human-machine interface so that the user can select one and thereby input the error injection instruction, the preset error injection schemes including any one of user-defined errors, random errors, and a fault flood.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Receiving the user-defined hardware type information, user-defined error type information, user-defined error count information, and error location information entered by the user;
Generating the real hardware error occurrence scheme based on the user-defined hardware type information, user-defined error type information, user-defined error count information, and error location information, together with the server hardware configuration information.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Randomly generating the real hardware error occurrence scheme based on the server hardware configuration information.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Randomly generating errors continuously for a preset duration based on the server hardware configuration information, so as to generate the real hardware error occurrence scheme.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Parse the real hardware error occurrence scheme to obtain real hardware error category information, which includes at least one of CPU hardware errors, memory hardware errors, PCIe hardware errors, and other errors;
Parse the real hardware error occurrence scheme to obtain the target hardware layer information corresponding to the real hardware error category information;
Parse the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information;
Sending the real hardware error occurrence scheme to the target hardware includes:
Send the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Parse the real hardware error occurrence scheme based on the error priority and the error list to obtain the real hardware error category information.
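The parsing and dispatch operations above can be sketched as follows. The dictionary keys, the rule that a lower number means a higher priority, and the `senders` mapping of per-category callables are assumptions; the source does not define the scheme's data format or the transport.

```python
from typing import Callable, Dict, List, Tuple

CATEGORIES = ("cpu", "memory", "pcie", "other")


def parse_scheme(error_list: List[Dict]) -> List[Tuple[str, str, str]]:
    """Walk the error list in priority order and extract, per entry,
    the error category, target hardware layer, and target error type."""
    ordered = sorted(error_list, key=lambda e: e["priority"])  # assumed: lower = first
    parsed = []
    for entry in ordered:
        category = entry["category"] if entry["category"] in CATEGORIES else "other"
        parsed.append((category, entry["layer"], entry["error_type"]))
    return parsed


def dispatch(parsed: List[Tuple[str, str, str]],
             senders: Dict[str, Callable[[str, str], None]]) -> None:
    """Route each (layer, error type) pair to the sender for its category."""
    for category, layer, error_type in parsed:
        senders[category](layer, error_type)
```

Routing by category lets each hardware class (CPU, memory, PCIe, other) use its own injection path while the parsing step stays shared.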
When the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information, a first error random number is also generated;
When read and executed by the one or more processors, the program instructions further perform the following operations:
Correct the target hardware layer information based on the first error random number.
When the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information, a second error random number is also generated.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Correct the target error type information based on the second error random number.
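The source does not say how the two error random numbers correct the parsed values. One possible reading, sketched here purely as an assumption, is that each random number selects a concrete value from the set of valid candidates for the target hardware layer or the target error type. The function name `correct_with_random` and the modulo selection rule are hypothetical.

```python
import random
from typing import List


def correct_with_random(nominal: str, candidates: List[str], error_random: int) -> str:
    """Replace the nominal value with the candidate selected by the
    random number; keep the nominal value if there are no candidates."""
    if not candidates:
        return nominal
    return candidates[error_random % len(candidates)]


# Example: two independent random numbers, one per corrected field.
rng = random.Random(42)
first_random = rng.randrange(1 << 16)   # corrects the target hardware layer
second_random = rng.randrange(1 << 16)  # corrects the target error type
layer = correct_with_random("core0", ["core0", "core1", "core2"], first_random)
error_type = correct_with_random("correctable", ["correctable", "uncorrectable"], second_random)
```

Under this reading, the corrected layer and error type always remain valid for the server, because they are drawn from candidate lists derived from the hardware configuration.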
When read and executed by the one or more processors, the program instructions further perform the following operations:
Deliver the target hardware layer information and the target error type information to the target hardware in the form of error packets or error streams to generate real errors.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Capture the log information of the real error through a preset interface, the preset interface including at least one of a serial port, an XDP interface, an IPMI interface, a Redfish interface, and an SSH interface; the log information of the real error is generated after the target hardware, once the real error has occurred, reports it through the UEFI system to the OS kernel and the BMC.
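The log-capture operation above can be sketched as a small registry of per-interface collectors. The collector callables are placeholders supplied by the caller; this sketch does not model any real serial, XDP, IPMI, Redfish, or SSH client, and the names `SUPPORTED_INTERFACES` and `capture_logs` are assumptions.

```python
from typing import Callable, Dict, List

SUPPORTED_INTERFACES = {"serial", "xdp", "ipmi", "redfish", "ssh"}


def capture_logs(collectors: Dict[str, Callable[[], List[str]]]) -> Dict[str, List[str]]:
    """Run each configured collector and return its log lines,
    keyed by the interface they were captured through."""
    if not collectors:
        raise ValueError("at least one preset interface is required")
    unknown = set(collectors) - SUPPORTED_INTERFACES
    if unknown:
        raise ValueError(f"unsupported interfaces: {sorted(unknown)}")
    return {name: list(collect()) for name, collect in collectors.items()}
```

Keeping the logs keyed by interface makes it possible to compare what the OS kernel side and the BMC side each recorded for the same injected error.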
When read and executed by the one or more processors, the program instructions further perform the following operations:
Obtain the real hardware error occurrence scheme;
Obtain register information on the registers that are set during error propagation and handling, the register information including at least SMI link information and CSMI link information.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Obtain register information on the registers that are set in the UEFI system once the error has occurred and been propagated to it.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Determine whether the log of the real error is consistent with the real hardware error occurrence scheme, and determine whether the register information is consistent with the real hardware error occurrence scheme;
If the log of the real error is consistent with the real hardware error occurrence scheme, and the register information is consistent with the real hardware error occurrence scheme, the server performance is verified as qualified.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Determine whether the log of the real error is consistent with the error information in the real hardware error occurrence scheme, the error information including the error type, the error location, the error count, and the error occurrence time.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Determine whether the error type in the register information is consistent with the error type in the real hardware error occurrence scheme.
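The consistency checks above can be sketched as follows. The record layout and the time tolerance are assumptions: the source lists the compared fields (error type, location, count, and occurrence time for the log; error type for the registers) but does not say how closely the times must agree.

```python
from typing import Dict


def log_matches(planned: Dict, logged: Dict, time_tolerance_s: float = 1.0) -> bool:
    """Compare the logged error with the planned error field by field."""
    return (
        logged["error_type"] == planned["error_type"]
        and logged["location"] == planned["location"]
        and logged["count"] == planned["count"]
        and abs(logged["time_s"] - planned["time_s"]) <= time_tolerance_s
    )


def register_matches(planned: Dict, register_info: Dict) -> bool:
    """Compare only the error type recorded in the register information."""
    return register_info["error_type"] == planned["error_type"]


def server_qualified(planned: Dict, logged: Dict, register_info: Dict) -> bool:
    """The server passes only if both the log and the registers agree
    with the planned error."""
    return log_matches(planned, logged) and register_matches(planned, register_info)
```

Requiring both checks to pass means a correct log entry cannot mask a register that was never set, and vice versa.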
When read and executed by the one or more processors, the program instructions may also perform operations corresponding to the steps of the method embodiments above; refer to the description above, which is not repeated here. FIG. 5 shows an exemplary architecture of the computer device, which may specifically include a processor 510, a video display adapter 511, a disk drive 512, an input/output interface 513, a network interface 514, and a memory 520. The processor 510, video display adapter 511, disk drive 512, input/output interface 513, network interface 514, and memory 520 may be communicatively connected through a communication bus 530.
The processor 510 may be implemented as a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is used to execute the relevant programs to implement the technical solution provided by this application.
The memory 520 may be implemented as read-only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), a static storage device, a dynamic storage device, or the like. The memory 520 may store an operating system 521 for controlling the operation of the computer device 500 and a basic input/output system (BIOS) 522 for controlling low-level operations of the computer device 500. It may also store a web browser 523, data storage management 524, an icon font processing system 525, and so on. The icon font processing system 525 may be the application program that implements the operations of the foregoing steps in the embodiments of this application. In short, when the technical solution provided by this application is implemented in software or firmware, the relevant program code is stored in the memory 520 and invoked and executed by the processor 510.
The input/output interface 513 is used to connect an input/output module to implement information input and output. The input/output module may be configured in the device as a component (not shown in the figure) or connected externally to the device to provide the corresponding functions. Input devices may include a keyboard, mouse, touch screen, microphone, and various sensors; output devices may include a display, speaker, vibrator, and indicator lights.
The network interface 514 is used to connect a communication module (not shown in the figure) to implement communication between this device and other devices. The communication module may communicate by wired means (for example USB or a network cable) or by wireless means (for example a mobile network, Wi-Fi, or Bluetooth).
The bus 530 includes a path that carries information between the components of the device (for example the processor 510, video display adapter 511, disk drive 512, input/output interface 513, network interface 514, and memory 520).
In addition, the computer device 500 may obtain information on specific collection conditions from the virtual resource object collection condition information database 541 for use in condition judgment, and so on.
It should be noted that although only the processor 510, video display adapter 511, disk drive 512, input/output interface 513, network interface 514, memory 520, bus 530, and so on are shown for the computer device 500, in a specific implementation the computer device may also include other components necessary for normal operation. Furthermore, those skilled in the art will understand that the device may include only the components necessary to implement the solution of this application, and need not include all the components shown in the figure.
From the description of the embodiments above, those skilled in the art will clearly understand that this application can be implemented by means of software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application in essence, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a cloud server, a network device, or the like) to execute the methods described in the embodiments of this application or in certain parts of the embodiments.
Embodiment 5: Corresponding to Embodiments 1 to 4 above, the computer-readable storage medium provided by this application is described below. In one example, this application provides a computer-readable storage medium that stores computer instructions, and the computer instructions cause the computer to perform the following operations:
Receive an error injection instruction and generate a real hardware error occurrence scheme based on the error injection instruction;
Send the real hardware error occurrence scheme to the target hardware, so that the target hardware generates a real error based on the real hardware error occurrence scheme;
Obtain the log information of the real error, and obtain the real hardware error occurrence scheme and the register information;
Process the real error based on the log information of the real error, the real hardware error occurrence scheme, and the register information to verify the server performance.
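The four operations above form a pipeline, which can be sketched as a single driver function. Every stage is passed in as a callable, because the source does not define concrete interfaces for them; the name `run_verification` and all parameter names are assumptions.

```python
from typing import Any, Callable


def run_verification(
    instruction: Any,
    generate_scheme: Callable[[Any], Any],
    send_to_hardware: Callable[[Any], None],
    fetch_logs: Callable[[], Any],
    fetch_registers: Callable[[], Any],
    check: Callable[[Any, Any, Any], bool],
) -> bool:
    """Generate the scheme, inject it, collect the evidence, and verify."""
    scheme = generate_scheme(instruction)   # step 1: instruction -> scheme
    send_to_hardware(scheme)                # step 2: hardware produces real errors
    logs = fetch_logs()                     # step 3a: log information
    registers = fetch_registers()           # step 3b: register information
    return check(scheme, logs, registers)   # step 4: consistency verdict
```

Injecting the stages as callables keeps the driver the same whether the scheme comes from the custom, random, or fault-flood mode.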
The computer instructions further cause the computer to perform the following operations:
Receive an error injection instruction;
Read the server hardware configuration information, which includes at least: hardware model, hardware quantity, content capacity, and memory location;
Generate a real hardware error occurrence scheme based on the error injection instruction and the server hardware configuration information, the real hardware error occurrence scheme including at least an error list, error priorities, and error types.
The computer instructions further cause the computer to perform the following operations:
Display preset error injection schemes on a human-machine interface for the user to select from in order to input an error injection instruction, the preset error injection schemes including any one of custom errors, random errors, and a fault flood.
The computer instructions further cause the computer to perform the following operations:
Receive user-entered custom hardware type information, custom error type information, custom error occurrence count information, and error location information;
Generate a real hardware error occurrence scheme based on the custom hardware type information, the custom error type information, the custom error occurrence count information, the error location information, and the server hardware configuration information.
The computer instructions further cause the computer to perform the following operations:
Randomly generate a real hardware error occurrence scheme based on the server hardware configuration information.
When read and executed by the one or more processors, the program instructions further perform the following operations:
Randomly generate errors continuously for a preset duration, based on the server hardware configuration information, to generate a real hardware error occurrence scheme.
The computer instructions further cause the computer to perform the following operations:
Parse the real hardware error occurrence scheme to obtain real hardware error category information, which includes at least one of CPU hardware errors, memory hardware errors, PCIe hardware errors, and other errors;
Parse the real hardware error occurrence scheme to obtain the target hardware layer information corresponding to the real hardware error category information;
Parse the real hardware error occurrence scheme to obtain the target error type information corresponding to the target hardware layer information;
Sending the real hardware error occurrence scheme to the target hardware includes:
Send the target hardware layer information and the target error type information to the target hardware based on the real hardware error category information.
The computer instructions further cause the computer to perform the following operations:
Parse the real hardware error occurrence scheme based on the error priority and the error list to obtain the real hardware error category information.
When the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information, a first error random number is also generated;
The computer instructions further cause the computer to perform the following operations:
Correct the target hardware layer information based on the first error random number.
When the real hardware error occurrence scheme is parsed to obtain the target error type information corresponding to the target hardware layer information, a second error random number is also generated.
The computer instructions further cause the computer to perform the following operations:
Correct the target error type information based on the second error random number.
The computer instructions further cause the computer to perform the following operations:
Deliver the target hardware layer information and the target error type information to the target hardware in the form of error packets or error streams to generate real errors.
The computer instructions further cause the computer to perform the following operations:
Capture the log information of the real error through a preset interface, the preset interface including at least one of a serial port, an XDP interface, an IPMI interface, a Redfish interface, and an SSH interface; the log information of the real error is generated after the target hardware, once the real error has occurred, reports it through the UEFI system to the OS kernel and the BMC.
The computer instructions further cause the computer to perform the following operations:
Obtain the real hardware error occurrence scheme;
Obtain register information on the registers that are set during error propagation and handling, the register information including at least SMI link information and CSMI link information.
The computer instructions further cause the computer to perform the following operations:
Obtain register information on the registers that are set in the UEFI system once the error has occurred and been propagated to it.
The computer instructions further cause the computer to perform the following operations:
Determine whether the log of the real error is consistent with the real hardware error occurrence scheme, and determine whether the register information is consistent with the real hardware error occurrence scheme;
If the log of the real error is consistent with the real hardware error occurrence scheme, and the register information is consistent with the real hardware error occurrence scheme, the server performance is verified as qualified.
The computer instructions further cause the computer to perform the following operations:
Determine whether the log of the real error is consistent with the error information in the real hardware error occurrence scheme, the error information including the error type, the error location, the error count, and the error occurrence time.
The computer instructions further cause the computer to perform the following operations:
Determine whether the error type in the register information is consistent with the error type in the real hardware error occurrence scheme.
The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the device embodiments are essentially similar to the method embodiments, they are described relatively briefly; for the relevant details, refer to the corresponding description of the method embodiments. The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
In addition, it should be understood that the terms "first", "second", "third", and "fourth" in this application are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. Thus, a feature qualified by "first", "second", "third", or "fourth" may explicitly or implicitly include one or more of that feature.
The embodiments above serve only to illustrate the technical concept and features of the present invention. Their purpose is to enable those familiar with the art to understand the content of the invention and implement it accordingly; they do not limit the scope of protection of the invention. All modifications made in accordance with the spirit and essence of the main technical solution of the invention shall fall within its scope of protection.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310432839.1A CN116521496A (en) | 2023-04-21 | 2023-04-21 | Method, system, computer device and storage medium for verifying server performance |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310432839.1A CN116521496A (en) | 2023-04-21 | 2023-04-21 | Method, system, computer device and storage medium for verifying server performance |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116521496A (en) | 2023-08-01 |
Family
ID=87393405
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310432839.1A Pending CN116521496A (en) | 2023-04-21 | 2023-04-21 | Method, system, computer device and storage medium for verifying server performance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116521496A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118838769A (en) * | 2024-09-20 | 2024-10-25 | 山东云海国创云计算装备产业创新中心有限公司 | Fault tolerance function test method, device and equipment of hardware equipment and storage medium |
- 2023-04-21: CN application CN202310432839.1A, patent CN116521496A, status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||