CN113886165B - Verification method, device and equipment for firmware diagnosis function and readable medium - Google Patents

Verification method, device and equipment for firmware diagnosis function and readable medium

Info

Publication number
CN113886165B
Authority
CN
China
Prior art keywords
error
data
module
function
diagnosis
Prior art date
Legal status
Active
Application number
CN202111116013.1A
Other languages
Chinese (zh)
Other versions
CN113886165A (en)
Inventor
罗鹏芳
王兵
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202111116013.1A
Publication of CN113886165A
Application granted
Publication of CN113886165B
Active legal status (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26 Functional testing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for verifying a firmware diagnosis function, comprising: triggering a system management interrupt (SMI) in system management mode and determining whether a BIOS setup menu option is enabled; if the option is enabled, enabling the write function of the machine check module; obtaining the data to be error-injected and the corresponding error source, and executing diagnostic code based on that data to obtain diagnostic data identifying an erroneous component; sending the diagnostic data to the BMC (baseboard management controller) and reporting it to the OS, to determine whether the erroneous component in the diagnostic data is consistent with the corresponding error source; and, if it is consistent, confirming that the diagnosis function is normal. The invention also discloses a device for verifying the firmware diagnosis function, a computer device and a readable storage medium.

Description

A verification method, device, equipment and readable medium for a firmware diagnosis function

Technical Field

The present invention relates to the technical field of computer servers, and in particular to a verification method, device, equipment and readable medium for a firmware diagnosis function.

Background

With the development of the Internet era in recent years, demand for massive data-processing capability has grown rapidly, placing higher requirements on servers. As the primary driving force of the server industry, the adoption of advanced technology plays a decisive role in customer purchasing decisions. With the rapid development of networking, virtualization and distributed applications, the requirements on server availability, reliability and serviceability keep rising. Financial and telecommunications services depend heavily on the continuous, stable operation of their information systems and therefore demand high server availability, requiring the server system to reach 99.99% availability.

When a fault occurs while a server is running, it must be promptly collected, parsed and diagnosed, reported to the out-of-band monitoring and management system, and signalled to the operating system's fault-handling service, so that users can learn the current health of the server hardware from the fault logs in time. Customers can then shut the machine down at a convenient time to replace any component that has raised an alarm.

Existing general-purpose servers based on Intel chips only support tool-based simulation of memory error injection and PCIe error injection. The CPU, however, also contains sub-modules such as the PCU, IFU, DFU, UPI and UBOX, and there are no dedicated test tools for errors in these modules. Intel provides the AMEI tool for error injection, but it is inconvenient to use in practice: script commands must be entered manually to write the error data into the MC banks, and a further command must then be issued to trigger a CMCI or MCE.

The MCA uses several 64-bit registers (MC_CTL, MC_STATUS, MC_ADDR, MC_MISC). Testers must assemble the 64-bit values by hand, write them to the registers one command at a time through the Intel tool, and then execute an interrupt-triggering command to raise a CMCI or MCE that notifies the OS. This test process can only verify the OS-first handling flow of legacy MCA, in which an MCA error is handled directly by the OS; it cannot trigger an SMI and therefore cannot verify the firmware-first handling flow.
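To make "assembling the 64-bit values by hand" concrete, the sketch below packs an MCi_STATUS value from its fields. The bit positions (VAL=63, OVER=62, UC=61, EN=60, MISCV=59, ADDRV=58, PCC=57, model-specific error code in bits 31:16, MCA error code in bits 15:0) follow the IA32_MCi_STATUS layout in the Intel SDM; the helper name `pack_mci_status` and the sample error code are hypothetical and chosen only for illustration.

```python
# Bit positions of IA32_MCi_STATUS flags (Intel SDM, machine-check architecture).
VAL, OVER, UC, EN, MISCV, ADDRV, PCC = 63, 62, 61, 60, 59, 58, 57


def pack_mci_status(mca_code: int, model_code: int = 0, *, uc: bool = False,
                    en: bool = True, pcc: bool = False, addrv: bool = False,
                    miscv: bool = False, over: bool = False) -> int:
    """Assemble a 64-bit MCi_STATUS value (hypothetical helper, for illustration)."""
    if not (0 <= mca_code <= 0xFFFF and 0 <= model_code <= 0xFFFF):
        raise ValueError("MCA and model-specific error codes are 16-bit fields")
    # VAL marks the bank as holding a valid error; codes occupy the low 32 bits.
    value = (1 << VAL) | (model_code << 16) | mca_code
    flags = {OVER: over, UC: uc, EN: en, MISCV: miscv, ADDRV: addrv, PCC: pcc}
    for bit, is_set in flags.items():
        if is_set:
            value |= 1 << bit
    return value


# Corrected error (UC=0), reporting enabled (EN=1), MCi_ADDR valid, arbitrary sample code.
status = pack_mci_status(0x0150, addrv=True)
print(f"{status:#018x}")  # 0x9400000000000150
```

Writing such values one register at a time through a tool is exactly the manual step the invention aims to replace with data fetched from the BMC.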

Summary of the Invention

In view of this, the embodiments of the present invention aim to provide a verification method, device, equipment and readable medium for a firmware diagnosis function. Error information is written into registers from within the BIOS to simulate a possible or previously observed error scenario, and the BIOS diagnostic code is then executed to verify whether the BIOS correctly sends valid data to the BMC and the OS, and whether the BMC and the OS correctly receive and process the signal.

Based on the above purpose, one aspect of the embodiments of the present invention provides a method for verifying a firmware diagnosis function, comprising the following steps: triggering a system management interrupt in system management mode, and determining whether the BIOS setup menu option is enabled; if the BIOS setup menu option is enabled, enabling the write function of the machine check module; obtaining the data to be injected and the corresponding error source, and executing diagnostic code based on the data to be injected to obtain diagnostic data containing the erroneous component; sending the diagnostic data to the BMC and reporting it to the OS, to determine whether the erroneous component in the diagnostic data is consistent with the corresponding error source; and, if they are consistent, confirming that the diagnosis function is normal.

In some implementations, enabling the write function of the machine check module includes: determining whether the machine check module supports the write function; and, if it does, enabling the write function of the machine check module.

In some implementations, the method further includes: if the machine check module does not support the write function, issuing an error alarm.

In some implementations, the method further includes: encoding the CPU sub-modules based on a preset encoding rule, and storing the encoding rule in the BMC and in the error-injection database.

In some implementations, obtaining the data to be injected and the corresponding error source includes: obtaining the data to be injected and the corresponding error source from the error-injection database.

In some implementations, the method further includes: if the erroneous component in the diagnostic data is inconsistent with the corresponding error source, determining that the diagnosis function is abnormal.

Another aspect of the embodiments of the present invention further provides a device for verifying a firmware diagnosis function, including: a first module configured to trigger a system management interrupt in system management mode and determine whether the BIOS setup menu option is enabled; a second module configured to enable the write function of the machine check module if the BIOS setup menu option is enabled; a third module configured to obtain the data to be injected and the corresponding error source, and to execute diagnostic code based on the data to be injected to obtain diagnostic data containing the erroneous component; a fourth module configured to send the diagnostic data to the BMC and report it to the OS, to determine whether the erroneous component in the diagnostic data is consistent with the corresponding error source; and a fifth module configured to confirm that the diagnosis function is normal if the erroneous component in the diagnostic data is consistent with the corresponding error source.

In some implementations, the second module is further configured to: determine whether the machine check module supports the write function; and, if it does, enable the write function of the machine check module.

In some implementations, the second module is further configured to issue an error alarm if the machine check module does not support the write function.

In some implementations, the third module is further configured to encode the CPU sub-modules based on a preset encoding rule, and to store the encoding rule in the BMC and in the error-injection database.

In some implementations, the third module is further configured to obtain the data to be injected and the corresponding error source from the error-injection database.

In some implementations, the fifth module is further configured to determine that the diagnosis function is abnormal if the erroneous component in the diagnostic data is inconsistent with the corresponding error source.

Yet another aspect of the embodiments of the present invention further provides a computer device, including: at least one processor; and a memory storing computer instructions executable on the processor, where the instructions, when executed by the processor, implement a method comprising: triggering a system management interrupt in system management mode, and determining whether the BIOS setup menu option is enabled; if the BIOS setup menu option is enabled, enabling the write function of the machine check module; obtaining the data to be injected and the corresponding error source, and executing diagnostic code based on the data to be injected to obtain diagnostic data containing the erroneous component; sending the diagnostic data to the BMC and reporting it to the OS, to determine whether the erroneous component in the diagnostic data is consistent with the corresponding error source; and, if they are consistent, confirming that the diagnosis function is normal.

In some implementations, enabling the write function of the machine check module includes: determining whether the machine check module supports the write function; and, if it does, enabling the write function of the machine check module.

In some implementations, the steps of the method further include: if the machine check module does not support the write function, issuing an error alarm.

In some implementations, the steps of the method further include: encoding the CPU sub-modules based on a preset encoding rule, and storing the encoding rule in the BMC and in the error-injection database.

In some implementations, obtaining the data to be injected and the corresponding error source includes: obtaining the data to be injected and the corresponding error source from the error-injection database.

In some implementations, the steps of the method further include: if the erroneous component in the diagnostic data is inconsistent with the corresponding error source, determining that the diagnosis function is abnormal.

Yet another aspect of the embodiments of the present invention further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the above method.

The present invention has at least the following beneficial technical effects. It simulates realistic MCA fault scenarios to verify that firmware-first MCA handling is correct, solving the problem that the accuracy of the firmware's MCA fault-handling flow could not be verified because MCA fault scenarios could not be simulated. In addition, because no faulty components or error-injection tools are required, the invention can be applied to test automation, with short test times and comprehensive coverage of fault cases. The invention can also serve as a verification scheme for the firmware reliability and fault-diagnosis functions of any general-purpose computer system product that supports firmware-first handling.

The out-of-band system sets the error-injection data in a predefined format through IPMI commands; a GPIO designed into the hardware allows the out-of-band side to trigger an SMI; and an error-injection module added to the in-band SMI handling obtains the fault data through IPMI and writes it into the target module, thereby implementing MCA error injection. The injection process requires no faulty components or error-injection tools: only historical cases and theoretical data need to be collected as input, which allows fault scenarios to be covered more comprehensively.

Brief Description of the Drawings

To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe them are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other embodiments from them without creative effort.

FIG. 1 is a schematic diagram of an embodiment of the method for verifying a firmware diagnosis function provided by the present invention;

FIG. 2 is a schematic diagram of an embodiment of the device for verifying a firmware diagnosis function provided by the present invention;

FIG. 3 is a schematic diagram of an embodiment of the computer device provided by the present invention;

FIG. 4 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention.

Detailed Description

To make the objectives, technical solutions and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to specific embodiments and the accompanying drawings.

It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or non-identical parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the subsequent embodiments.

In the prior art, Intel CPUs support MCA for detecting and reporting errors, and hundreds of error types can be recorded and reported. During development it is difficult to verify whether the software's error-handling flow is correct and whether the fault logs are recorded correctly, because real fault scenarios are scarce and the methods for simulating them are very limited. It is therefore necessary to simulate fault scenarios by injecting fault data in order to verify MCA error handling, so that functional defects do not slip through for lack of a test method and the product's fault-diagnosis function for MCA errors remains robust.

Based on the above purpose, a first aspect of the embodiments of the present invention proposes an embodiment of a method for verifying a firmware diagnosis function. FIG. 1 is a schematic diagram of an embodiment of the method for verifying a firmware diagnosis function provided by the present invention. As shown in FIG. 1, the method includes the following steps:

001. Trigger a system management interrupt in system management mode, and determine whether the BIOS setup menu option is enabled;

002. If the BIOS setup menu option is enabled, enable the write function of the machine check module;

003. Obtain the data to be injected and the corresponding error source, and execute diagnostic code based on the data to be injected to obtain diagnostic data containing the erroneous component;

004. Send the diagnostic data to the BMC and report it to the OS, to determine whether the erroneous component in the diagnostic data is consistent with the corresponding error source; and

005. If the erroneous component in the diagnostic data is consistent with the corresponding error source, confirm that the diagnosis function is normal.
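Steps 001 to 005 can be sketched as a small test driver. This is a minimal model under stated assumptions, not the patented firmware: the BIOS, BMC and OS are replaced by plain Python objects, and `InjectionCase`, `verify_diagnosis` and the `diagnose` callback (standing in for the BIOS diagnostic code under test) are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class InjectionCase:
    error_source: str   # CPU sub-module the error is injected into, e.g. "IFU"
    status: int         # MCi_STATUS value to write into that module's bank


def verify_diagnosis(setup_enabled: bool, bank_write_supported: bool,
                     case: InjectionCase,
                     diagnose: Callable[[Dict[str, int]], Optional[str]],
                     bmc_log: List[str], os_log: List[str]) -> str:
    # Step 001: the SMI has fired; act only when the BIOS setup option is on.
    if not setup_enabled:
        return "skipped"
    # Step 002: MC bank writes must be supported before they can be enabled.
    if not bank_write_supported:
        return "alarm: MC bank write not supported"
    # Step 003: write the injection data, then run the diagnostic code under test.
    banks = {case.error_source: case.status}
    component = diagnose(banks)
    # Step 004: report the diagnosed component to the BMC and the OS.
    bmc_log.append(str(component))
    os_log.append(str(component))
    # Step 005: normal only if the diagnosis names the injected error source.
    return "normal" if component == case.error_source else "abnormal"


def correct_diagnose(banks: Dict[str, int]) -> Optional[str]:
    # Report the first bank whose VAL bit (63) is set.
    return next((name for name, st in banks.items() if (st >> 63) & 1), None)


bmc, os_ = [], []
case = InjectionCase("IFU", 0x9400000000000150)
print(verify_diagnosis(True, True, case, correct_diagnose, bmc, os_))      # normal
print(verify_diagnosis(True, True, case, lambda banks: "DCU", bmc, os_))  # abnormal
```

Passing the diagnostic routine in as a callback keeps the driver independent of the code under test, so the same case can expose a faulty diagnosis (the second call) as easily as confirm a correct one.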

In this embodiment, the out-of-band system sets the error-injection data in a predefined format through IPMI (Intelligent Platform Management Interface) commands; a GPIO (general-purpose input/output) designed into the hardware supports out-of-band triggering of an SMI (system management interrupt); and an error-injection module added to the in-band SMI handling obtains the fault data through IPMI and writes it into the target module, thereby implementing MCA (Machine Check Architecture) error injection. The injection process requires no faulty components or error-injection tools: only historical cases and theoretical data need to be collected as input, which allows fault scenarios to be covered more comprehensively.

In this embodiment, the function is intended only for verifying server firmware during development. Once the server goes into mass production and the functions have been verified, this function should be disabled in or removed from the firmware.

In this embodiment, a BIOS setup menu option is defined to turn the function on or off. The hardware reserves a south-bridge GPIO connected to the BMC (Baseboard Management Controller), configured as a GPI (general-purpose input) that can trigger an SMI, so that the BMC can trigger an SMI under the control of an IPMI command. The CPU sub-modules are encoded, for example DCU as 1, IFU as 2, MLC as 3, and so on. An IPMI command sent to the BMC sets the module to be injected and the MCBank data for that module, and the BIOS can retrieve the configured module and MCBank data through an IPMI command. The MCBank data can be taken from a case database distilled from historical fault cases, or can be error data inferred theoretically as possible.
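The sub-module encoding and the "predefined data format" exchanged over IPMI might be modelled as follows. Only the codes DCU=1, IFU=2 and MLC=3 come from the text; the 25-byte layout (one code byte followed by little-endian MCi_STATUS, MCi_ADDR and MCi_MISC values) is an assumption made for illustration, not a format defined by the patent or by the IPMI specification, and `encode_record`/`decode_record` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Sub-module codes given in the description (the text says "and so on" for the rest).
SUBMODULE_CODES = {"DCU": 1, "IFU": 2, "MLC": 3}
CODE_TO_SUBMODULE = {code: name for name, code in SUBMODULE_CODES.items()}

# Hypothetical payload layout: code byte + STATUS, ADDR, MISC as little-endian u64.
PAYLOAD_FMT = "<BQQQ"


class InjectionRecord(NamedTuple):
    module: str
    status: int
    addr: int
    misc: int


def encode_record(rec: InjectionRecord) -> bytes:
    """Serialize a record as the BMC side might store it (assumed format)."""
    return struct.pack(PAYLOAD_FMT, SUBMODULE_CODES[rec.module],
                       rec.status, rec.addr, rec.misc)


def decode_record(payload: bytes) -> InjectionRecord:
    """Parse a payload as the BIOS side might after an IPMI read (assumed format)."""
    code, status, addr, misc = struct.unpack(PAYLOAD_FMT, payload)
    if code not in CODE_TO_SUBMODULE:
        raise ValueError(f"unknown sub-module code {code}")
    return InjectionRecord(CODE_TO_SUBMODULE[code], status, addr, misc)


rec = InjectionRecord("IFU", 0x9400000000000150, 0x1000, 0)
payload = encode_record(rec)
print(len(payload), decode_record(payload) == rec)  # 25 True
```

Storing the same encoding table on both sides (BMC and injection database) is what lets a single code byte identify the target sub-module unambiguously.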

When the verification flow is executed, an IPMI command sets the module to be injected and its MCBank data, which are stored in the BMC. Because the registers involved in error injection can only be written in SMM (system management mode), an error-injection module is added to the BIOS SMI code and registered as an SMI handler ahead of the error-handling code.

The BIOS first checks whether the setup menu option is enabled. If it is, it checks the MSR bit indicating whether the MCA banks support write operations; if they do, it sets the corresponding MSR bit to enable MC bank writes. It then obtains the error-injection data from the BMC through an IPMI command and writes it into the MC bank of the corresponding sub-module, i.e. it sets MCi_STATUS, MCi_ADDR and MCi_MISC, and sets the bit for that bank in the SMI error-source field of the error-injection control register to 1. After the injection code has run, the error-handling code executes: it checks the SMI error sources of the MC banks and, when an error source is detected, reads the MC bank data, parses and diagnoses it, sends the resulting data containing the erroneous component information to the BMC, and triggers a CMCI or MCA exception to notify the OS to collect the error information. On receiving the information, the BMC diagnoses and records the faulty component, and the OS records the error information in mcelog. Testers then verify whether the BMC has diagnosed the correct faulty component and whether mcelog on the OS shows the fault information.
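The requirement that the error-injection module run before the error-handling code within the same SMI can be illustrated with a toy dispatcher. Everything here is a hypothetical model (`SmmContext`, the handler names, and a Python set standing in for the SMI error-source bits of the injection control register); real SMM handler registration is firmware-specific.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple


class SmmContext:
    """Toy model of state visible to SMI handlers (not a real BIOS interface)."""

    def __init__(self, pending: Optional[Tuple[str, int]] = None):
        self.pending = pending                   # (bank, MCi_STATUS) fetched from the BMC
        self.banks: Dict[str, int] = {}          # bank name -> MCi_STATUS
        self.smi_error_source: Set[str] = set()  # banks flagged as SMI error sources
        self.reports: List[str] = []             # components reported to the BMC


def injection_handler(ctx: SmmContext) -> None:
    # Write the injection data and flag the bank as an SMI error source.
    if ctx.pending is None:
        return
    bank, status = ctx.pending
    ctx.banks[bank] = status
    ctx.smi_error_source.add(bank)
    ctx.pending = None


def error_handler(ctx: SmmContext) -> None:
    # Report every flagged bank that holds a valid error (VAL, bit 63).
    for bank in sorted(ctx.smi_error_source):
        if (ctx.banks.get(bank, 0) >> 63) & 1:
            ctx.reports.append(bank)
    ctx.smi_error_source.clear()


def dispatch_smi(ctx: SmmContext, handlers: List[Callable[[SmmContext], None]]) -> None:
    # Handlers run in registration order on each SMI.
    for handler in handlers:
        handler(ctx)


good = SmmContext(("IFU", 1 << 63))
dispatch_smi(good, [injection_handler, error_handler])
print(good.reports)  # ['IFU']

bad = SmmContext(("IFU", 1 << 63))
dispatch_smi(bad, [error_handler, injection_handler])
print(bad.reports)   # [] (the error handler ran before anything was injected)
```

With the order reversed, the injected error is left pending until some later SMI, so the verification run would not exercise the error-handling path it is meant to test.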

In some embodiments of the present invention, enabling the write function of the machine check module includes: determining whether the machine check module supports the write function; and, if it does, enabling the write function of the machine check module.

In this embodiment, when the verification flow is executed, an IPMI command sets the module to be injected and its MCBank data, which are stored in the BMC. Because the registers involved in error injection can only be written in SMM, an error-injection module is added to the BIOS SMI code and registered as an SMI handler ahead of the error-handling code. The BIOS first checks whether the setup menu option is enabled; if it is, it checks the MSR bit indicating whether the MCA banks support write operations, and if they do, it sets the corresponding MSR bit to enable MC bank writes.

In some embodiments of the present invention, the method further includes: if the machine check module does not support the write function, issuing an error alarm.
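The check-then-enable logic, including the alarm when bank writes are unsupported, might look like the sketch below. The description does not name the MSRs or bit positions, so `WRITE_SUPPORTED_BIT`, `WRITE_ENABLE_BIT` and the two-entry `msrs` dictionary standing in for MSR reads and writes are placeholders, not real Intel register definitions.

```python
from typing import Dict, List

# Placeholder bit positions: the description does not identify the real MSR bits.
WRITE_SUPPORTED_BIT = 0   # hypothetical capability bit: "MC bank writes supported"
WRITE_ENABLE_BIT = 0      # hypothetical control bit: "MC bank writes enabled"


def enable_bank_writes(msrs: Dict[str, int], alarms: List[str]) -> bool:
    """Enable MC bank writes if supported; otherwise record an alarm and refuse."""
    if not (msrs["cap"] >> WRITE_SUPPORTED_BIT) & 1:
        alarms.append("MC bank write not supported; error injection aborted")
        return False
    msrs["ctl"] |= 1 << WRITE_ENABLE_BIT  # read-modify-write keeps other control bits
    return True


alarms: List[str] = []
supported = {"cap": 0b1, "ctl": 0b100}
print(enable_bank_writes(supported, alarms), bin(supported["ctl"]))  # True 0b101
unsupported = {"cap": 0b0, "ctl": 0}
print(enable_bank_writes(unsupported, alarms), alarms)
```

Returning early on the unsupported path ensures no injection data is written when the platform cannot accept it, which matches the alarm branch described above.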

In some embodiments of the present invention, the method further includes: encoding the CPU sub-modules based on a preset encoding rule, and storing the encoding rule in the BMC and in the error-injection database.

In this embodiment, the CPU sub-modules are encoded, for example DCU as 1, IFU as 2, MLC as 3, and so on. An IPMI command sent to the BMC sets the module to be injected and the MCBank data for that module, and the BIOS can retrieve the configured module and MCBank data through an IPMI command. The MCBank data can be taken from a case database distilled from historical fault cases, or can be error data inferred theoretically as possible.

In some embodiments of the present invention, obtaining the data to be injected and the corresponding error source includes: obtaining the data to be injected and the corresponding error source from the error-injection database.

In some embodiments of the present invention, the method further includes: if the erroneous component in the diagnostic data is inconsistent with the corresponding error source, determining that the diagnosis function is abnormal.

In this embodiment, the data finally containing the erroneous component information is sent to the BMC, and a CMCI or MCA exception is triggered to notify the OS to collect the error information. On receiving the information, the BMC diagnoses and records the faulty component, and the OS records the error information in mcelog. Testers verify whether the BMC has diagnosed the correct faulty component and whether mcelog on the OS shows the fault information.
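The tester's final check compares what the BMC diagnosed, and what the OS logged, against the injected error source. In this sketch the OS side is reduced to a list of MCi_STATUS values assumed to have already been extracted from mcelog output; `judge` and its arguments are hypothetical, and parsing mcelog's actual text output is deliberately left out.

```python
from typing import List, Optional


def judge(injected_source: str, injected_status: int,
          bmc_component: Optional[str], os_statuses: List[int]) -> str:
    """Return 'normal' only if both the BMC and the OS saw the injected error."""
    bmc_ok = bmc_component == injected_source  # BMC named the right component
    os_ok = injected_status in os_statuses     # OS logged the injected status
    if bmc_ok and os_ok:
        return "normal"
    problems = []
    if not bmc_ok:
        problems.append(f"BMC diagnosed {bmc_component!r}, expected {injected_source!r}")
    if not os_ok:
        problems.append("OS log has no matching MCi_STATUS")
    return "abnormal: " + "; ".join(problems)


print(judge("IFU", 0x9400000000000150, "IFU", [0x9400000000000150]))  # normal
print(judge("IFU", 0x9400000000000150, "DCU", []))
```

Listing each failed check separately makes it clear whether a defect lies in the firmware-to-BMC path, the firmware-to-OS path, or both.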

In this embodiment, realistic MCA fault scenarios are simulated to verify that firmware-first MCA handling is correct, solving the problem that the accuracy of the firmware's MCA fault-handling flow could not be verified because MCA fault scenarios could not be simulated. In addition, because no faulty components or error-injection tools are required, the invention can be applied to test automation, with short test times and comprehensive coverage of fault cases. The invention can also serve as a verification scheme for the firmware reliability and fault-diagnosis functions of any general-purpose computer system product that supports firmware-first handling.

It should be particularly noted that the steps in the embodiments of the above verification method for a firmware diagnosis function can be interleaved, replaced, added or deleted. Such reasonable permutations and combinations applied to the verification method should therefore also fall within the protection scope of the present invention, and the protection scope of the present invention should not be limited to the embodiments.

Based on the above purpose, a second aspect of the embodiments of the present invention provides a verification device for a firmware diagnosis function. FIG. 2 is a schematic diagram of an embodiment of the verification device for a firmware diagnosis function provided by the present invention. As shown in FIG. 2, the verification device of this embodiment includes the following modules: a first module 011, configured to trigger a system management interrupt in system management mode and determine whether the BIOS setup menu is enabled; a second module 012, configured to enable the write function of the machine check module if the BIOS menu is enabled; a third module 013, configured to obtain the error data to be injected and the corresponding error source, and to execute diagnostic code based on the error data to be injected to obtain diagnosis data containing the faulty component; a fourth module 014, configured to send the diagnosis data to the BMC and report it to the OS, to determine whether the faulty component in the diagnosis data is consistent with the corresponding error source; and a fifth module 015, configured to confirm that the diagnosis function is normal if the faulty component in the diagnosis data is consistent with the corresponding error source.

In some embodiments of the present invention, the second module 012 is further configured to: determine whether the machine check module supports the write function; and if the machine check module supports the write function, enable the write function of the machine check module.

In some embodiments of the present invention, the second module 012 is further configured to: issue an error alarm if the machine check module does not support the write function.
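The gating performed by the first and second modules (BIOS setup option, then write-capability check, then enable or alarm) can be sketched as follows. This is a minimal sketch: `MachineCheckModule` is a hypothetical stand-in for the MCA bank registers, and the alarm is modeled as a Python exception rather than any real firmware or BMC alert mechanism.

```python
from dataclasses import dataclass


@dataclass
class MachineCheckModule:
    """Hypothetical stand-in for the MCA bank registers the SMI handler will write."""
    write_supported: bool
    write_enabled: bool = False


class WriteUnsupportedAlarm(Exception):
    """Models the 'error alarm' issued when the module cannot be written."""


def enable_mca_write(bios_menu_enabled: bool, module: MachineCheckModule) -> bool:
    """Run inside the (simulated) SMI handler; return True if injection may proceed."""
    if not bios_menu_enabled:
        # BIOS setup option is off: leave the module untouched and skip injection.
        return False
    if not module.write_supported:
        # Write function unsupported: raise the error alarm instead of injecting.
        raise WriteUnsupportedAlarm("machine check module does not support the write function")
    module.write_enabled = True
    return True
```

Returning `False` for a disabled BIOS option, rather than raising, reflects the description: only an unsupported write function is treated as an error.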

In some embodiments of the present invention, the third module 013 is further configured to: encode the CPU sub-modules based on a preset encoding rule, and save the encoding rule in the BMC and in the error-injection database.

In some embodiments of the present invention, the third module 013 is further configured to: obtain the error data to be injected and the corresponding error source from the error-injection database.
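The patent does not specify the encoding rule or the database schema. The sketch below assumes one purely for illustration: a 24-bit code packing socket, sub-module type, and instance, shared by the error-injection database and the BMC so both sides decode components identically. The sub-module names, the `error_data` values, and the record layout are hypothetical placeholders, not real MCA signatures.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding rule: socket in bits 16-23, sub-module type in 8-15, instance in 0-7.
SUBMODULE_IDS: Dict[str, int] = {"core": 0x01, "imc": 0x02, "upi": 0x03, "pcie": 0x04}


def encode_component(socket: int, submodule: str, instance: int) -> int:
    """Pack socket, sub-module type and instance into a 24-bit component code."""
    return (socket << 16) | (SUBMODULE_IDS[submodule] << 8) | instance


def decode_component(code: int) -> Tuple[int, str, int]:
    """Inverse of encode_component, as the BMC would apply the same saved rule."""
    names = {v: k for k, v in SUBMODULE_IDS.items()}
    return (code >> 16) & 0xFF, names[(code >> 8) & 0xFF], code & 0xFF


# Hypothetical error-injection database; error_data values are placeholders only.
INJECTION_DB: List[Dict[str, int]] = [
    {"error_data": 0x1111, "source": encode_component(0, "imc", 1)},
    {"error_data": 0x2222, "source": encode_component(1, "core", 3)},
]


def fetch_cases(db: List[Dict[str, int]]) -> List[Tuple[int, int]]:
    """Return (error data to be injected, expected error source) pairs."""
    return [(case["error_data"], case["source"]) for case in db]
```

Storing the expected source as an encoded component makes the later consistency check a plain integer comparison against what the BMC reports.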

In some embodiments of the present invention, the fifth module 015 is further configured to: confirm that the diagnosis function is abnormal if the faulty component in the diagnosis data is inconsistent with the corresponding error source.

Based on the above purpose, a third aspect of the embodiments of the present invention provides a computer device. FIG. 3 is a schematic diagram of an embodiment of the computer device provided by the present invention. As shown in FIG. 3, the computer device of this embodiment includes: at least one processor 021; and a memory 022 storing computer instructions 023 executable on the processor, where the instructions, when executed by the processor, implement the following method steps: triggering a system management interrupt in system management mode, and determining whether the BIOS setup menu is enabled; if the BIOS menu is enabled, enabling the write function of the machine check module; obtaining the error data to be injected and the corresponding error source, and executing diagnostic code based on the error data to be injected to obtain diagnosis data containing the faulty component; sending the diagnosis data to the BMC and reporting it to the OS, to determine whether the faulty component in the diagnosis data is consistent with the corresponding error source; and, if the faulty component in the diagnosis data is consistent with the corresponding error source, confirming that the diagnosis function is normal.

In some embodiments of the present invention, enabling the write function of the machine check module includes: determining whether the machine check module supports the write function; and if the machine check module supports the write function, enabling the write function of the machine check module.

In some embodiments of the present invention, the method steps further include: issuing an error alarm if the machine check module does not support the write function.

In some embodiments of the present invention, the method steps further include: encoding the CPU sub-modules based on a preset encoding rule, and saving the encoding rule in the BMC and in the error-injection database.

In some embodiments of the present invention, obtaining the error data to be injected and the corresponding error source includes: obtaining the error data to be injected and the corresponding error source from the error-injection database.

In some embodiments of the present invention, the method steps further include: if the faulty component in the diagnosis data is inconsistent with the corresponding error source, confirming that the diagnosis function is abnormal.

The present invention also provides a computer-readable storage medium. FIG. 4 is a schematic diagram of an embodiment of the computer-readable storage medium provided by the present invention. As shown in FIG. 4, the computer-readable storage medium 031 stores a computer program 032 that, when executed by a processor, performs the above method.

Finally, it should be noted that a person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program of the verification method for the firmware diagnosis function can be stored in a computer-readable storage medium, and when executed, the program may include the processes of the embodiments of the above methods. The storage medium of the program may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The above computer program embodiments can achieve the same or similar effects as any of the corresponding method embodiments above.

In addition, the method disclosed in the embodiments of the present invention can also be implemented as a computer program executed by a processor, and the computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, it performs the above functions defined in the methods disclosed in the embodiments of the present invention.

In addition, the above method steps and system units can also be implemented using a controller and a computer-readable storage medium storing a computer program that enables the controller to implement the functions of the above steps or units.

Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the particular application and the design constraints imposed on the overall system. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope disclosed in the embodiments of the present invention.

In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage devices, magnetic disk storage devices or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

The above are exemplary embodiments disclosed by the present invention, but it should be noted that various changes and modifications can be made without departing from the scope of the disclosed embodiments of the present invention as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although elements disclosed in the embodiments of the present invention may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.

It should be understood that, as used herein, the singular forms "a" and "an" are intended to include the plural forms as well, unless the context clearly supports an exception. It should also be understood that "and/or" as used herein refers to any and all possible combinations of one or more of the associated listed items.

The serial numbers of the embodiments disclosed above are for description only and do not indicate that any embodiment is better or worse than another.

A person of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be completed by hardware or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

A person of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope disclosed by the embodiments of the present invention (including the claims) is limited to these examples. Within the concept of the embodiments of the present invention, technical features in the above embodiments or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, equivalent replacements, improvements, and the like made within the spirit and principles of the embodiments of the present invention shall fall within the protection scope of the embodiments of the present invention.

Claims (8)

1. A verification method for a firmware diagnosis function, characterized by comprising the following steps:
triggering a system management interrupt in system management mode, and determining whether the BIOS setup menu is enabled;
if the BIOS menu is enabled, enabling the write function of the machine check module;
obtaining the error data to be injected and the corresponding error source, and executing diagnostic code based on the error data to be injected to obtain diagnosis data containing the faulty component;
sending the diagnosis data to the BMC and reporting it to the OS, to determine whether the faulty component in the diagnosis data is consistent with the corresponding error source; and
if the faulty component in the diagnosis data is consistent with the corresponding error source, confirming that the diagnosis function is normal;
wherein, out of band, the error data to be injected is set via IPMI commands according to a predefined data format, and, in band, an error-injection module is added to the SMI, which obtains the fault data via IPMI and writes it into the faulty module, so as to implement the MCA error-injection process;
encoding the CPU sub-modules based on a preset encoding rule, and saving the encoding rule in the BMC and in the error-injection database, wherein the step of obtaining the error data to be injected and the corresponding error source comprises obtaining the error data to be injected and the corresponding error source from the error-injection database.

2. The verification method for a firmware diagnosis function according to claim 1, characterized in that enabling the write function of the machine check module comprises:
determining whether the machine check module supports the write function; and
if the machine check module supports the write function, enabling the write function of the machine check module.

3. The verification method for a firmware diagnosis function according to claim 2, characterized by further comprising:
if the machine check module does not support the write function, issuing an error alarm.

4. The verification method for a firmware diagnosis function according to claim 1, characterized by further comprising:
if the faulty component in the diagnosis data is inconsistent with the corresponding error source, confirming that the diagnosis function is abnormal.

5. A verification device for a firmware diagnosis function, characterized by comprising:
a first module, configured to trigger a system management interrupt in system management mode and determine whether the BIOS setup menu is enabled;
a second module, configured to enable the write function of the machine check module if the BIOS menu is enabled;
a third module, configured to obtain the error data to be injected and the corresponding error source, and to execute diagnostic code based on the error data to be injected to obtain diagnosis data containing the faulty component;
a fourth module, configured to send the diagnosis data to the BMC and report it to the OS, to determine whether the faulty component in the diagnosis data is consistent with the corresponding error source; and
a fifth module, configured to confirm that the diagnosis function is normal if the faulty component in the diagnosis data is consistent with the corresponding error source;
and a module that performs the following steps:
out of band, setting the error data to be injected via IPMI commands according to a predefined data format, and, in band, adding an error-injection module to the SMI, which obtains the fault data via IPMI and writes it into the faulty module, so as to implement the MCA error-injection process;
encoding the CPU sub-modules based on a preset encoding rule, and saving the encoding rule in the BMC and in the error-injection database, wherein the step of obtaining the error data to be injected and the corresponding error source comprises obtaining the error data to be injected and the corresponding error source from the error-injection database.

6. The verification device for a firmware diagnosis function according to claim 5, characterized in that the second module is further configured to:
determine whether the machine check module supports the write function; and
if the machine check module supports the write function, enable the write function of the machine check module.

7. A computer device, characterized by comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, wherein the instructions, when executed by the processor, implement the steps of the method according to any one of claims 1 to 4.

8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 4.
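The IPMI step in claims 1 and 5 only says the error data is set "according to a predefined data format". The sketch below assumes one such format purely for illustration: a little-endian record of format version, MCA bank number, encoded component, and three 64-bit register values. The layout, field names, and version check are hypothetical; the actual OEM command, NetFn, and byte layout are not given in the source. The resulting bytes could, for example, be sent as the data portion of an `ipmitool raw` request.

```python
import struct

# Hypothetical predefined layout (little-endian): version (u8), MCA bank (u8),
# encoded component (u32), then MCi_STATUS, MCi_ADDR, MCi_MISC values (u64 each).
PAYLOAD_FORMAT = "<BBIQQQ"
PAYLOAD_VERSION = 1


def pack_injection_payload(bank: int, component: int, status: int, addr: int, misc: int) -> bytes:
    """Out-of-band side: serialize one error case for an IPMI request."""
    return struct.pack(PAYLOAD_FORMAT, PAYLOAD_VERSION, bank, component, status, addr, misc)


def unpack_injection_payload(payload: bytes) -> dict:
    """In-band side: what the SMI error-injection module would decode before writing the bank."""
    version, bank, component, status, addr, misc = struct.unpack(PAYLOAD_FORMAT, payload)
    if version != PAYLOAD_VERSION:
        raise ValueError(f"unsupported payload version {version}")
    return {"bank": bank, "component": component, "status": status, "addr": addr, "misc": misc}
```

A fixed, versioned layout lets the BMC and the SMI handler agree on the record without sharing code; the 30-byte size stays well within a single IPMI request.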
CN202111116013.1A 2021-09-23 2021-09-23 Verification method, device and equipment for firmware diagnosis function and readable medium Active CN113886165B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111116013.1A CN113886165B (en) 2021-09-23 2021-09-23 Verification method, device and equipment for firmware diagnosis function and readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111116013.1A CN113886165B (en) 2021-09-23 2021-09-23 Verification method, device and equipment for firmware diagnosis function and readable medium

Publications (2)

Publication Number Publication Date
CN113886165A CN113886165A (en) 2022-01-04
CN113886165B true CN113886165B (en) 2023-08-11

Family

ID=79010387

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111116013.1A Active CN113886165B (en) 2021-09-23 2021-09-23 Verification method, device and equipment for firmware diagnosis function and readable medium

Country Status (1)

Country Link
CN (1) CN113886165B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102707710A (en) * 2012-06-01 2012-10-03 浙江吉利汽车研究院有限公司杭州分公司 Diagnosis function verification method and system for automobile electronic control unit
CN102890494A (en) * 2012-06-19 2013-01-23 浙江吉利汽车研究院有限公司杭州分公司 Functional verification method of automobile diagnosis instrument

Also Published As

Publication number Publication date
CN113886165A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
CN108683562B (en) Anomaly detection and positioning method, device, computer equipment and storage medium
CN110992992B (en) A hard disk testing method, device and storage medium
CN110543420B (en) A software testing method, system, terminal and storage medium
CN105677572B (en) Based on self organizing maps model cloud software performance exception error diagnostic method and system
CN105912460A (en) Software test method and system based on QTP
CN111522725A (en) SSD performance automatic evaluation method, device, equipment and medium
WO2024250776A1 (en) Fault detection method and apparatus for external device
CN118550747A (en) PCIe fatal error quick positioning method, system, electronic equipment and medium
CN103645963A (en) Storage system and data consistency verification method thereof
CN114996127A (en) Intelligent test method and system for solid state disk firmware module
CN113886165B (en) Verification method, device and equipment for firmware diagnosis function and readable medium
CN116820946B (en) Method and device for automatically testing compatibility of target software
CN117608952B (en) Detection device and detection method
CN116860341A (en) Equipment identification method, device and medium under same bus
CN116185826A (en) Test method, device, equipment and storage medium
CN115454704A (en) A server fault diagnosis test method, device, terminal and storage medium
CN115114097A (en) Hard disk injection medium error test method, system, terminal and storage medium
CN107145422B (en) A software fault alarm monitoring method
CN117312174B (en) Program error path detection method, device, equipment and readable storage medium
CN113760696A (en) Program problem positioning method and device, electronic equipment and storage medium
CN110781042A (en) Method, device and medium for detecting UBM (Universal boot Module) backboard based on BMC (baseboard management controller)
CN117076183B (en) Error reporting method, system on chip, computer equipment and storage medium
US20240418775A1 (en) Method and system for tracking and managing activities of testbench components in a test environment
CN116627730A (en) Smart log acquisition and analysis method, system, device and medium
CN119479749A (en) NVMe system testing method, device, system and solid state drive

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant