CN118132358A - Error injection method, system, upper computer, controller, equipment and storage medium - Google Patents

Error injection method, system, upper computer, controller, equipment and storage medium

Info

Publication number
CN118132358A
Authority
CN
China
Prior art keywords
error
target component
error injection
management controller
operation information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410532594.4A
Other languages
Chinese (zh)
Other versions
CN118132358B (en)
Inventor
张传玺
张秀波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202410532594.4A priority Critical patent/CN118132358B/en
Publication of CN118132358A publication Critical patent/CN118132358A/en
Application granted granted Critical
Publication of CN118132358B publication Critical patent/CN118132358B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273Test methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2284Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by power-on test, e.g. power-on self test [POST]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/22Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/26Functional testing
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06Management of faults, events, alarms or notifications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/50Testing arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

An embodiment of the invention provides an error injection method. An upper computer determines the error type of a target component that requires error injection, so that an initialization link can be created between a baseboard management controller and the target component according to that error type. After the initialization link is created, the upper computer sends an error injection instruction to the baseboard management controller to perform error injection on the target component. After the error injection is completed, the upper computer sends an operation information query request to the baseboard management controller, receives the operation information corresponding to the target component from the baseboard management controller, and determines from the operation information whether a fault corresponding to the error type has occurred; if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for every component can be carried out through the upper computer, which improves error injection verification efficiency and greatly reduces system test cost.

Description

Error injection method, system, upper computer, controller, equipment and storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to an error injection method, system, upper computer, baseboard management controller, device, and storage medium.
Background
With the development of science and technology, computer systems are becoming more and more complex, and system faults vary widely across application scenarios. To ensure stable operation, each component of a server must undergo batch fault injection verification before the server leaves the factory. However, fault injection for each component is performed independently and lacks unified management, the fault injection means are often limited to single-point fault injection, and the cost of fault injection is high.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention have been developed to provide an error injection method, system, upper computer, baseboard management controller, device, and storage medium that overcome or at least partially solve the foregoing problems.
In order to solve the above problems, the present invention discloses an error injection method applied to an upper computer, wherein the upper computer is connected with a baseboard management controller, and the method comprises:
determining the error type of a target component that requires error injection, and generating an error injection starting command according to the error type of the target component;
sending the error injection starting command to the baseboard management controller to create an initialization link between the baseboard management controller and the target component;
if an initialization link creation completion signal sent by the baseboard management controller is received, generating an error injection instruction according to the error type, and sending the error injection instruction to the baseboard management controller so as to perform error injection on the target component;
Receiving an error injection completion signal sent by the baseboard management controller, and sending an operation information inquiry request to the baseboard management controller;
and receiving operation information corresponding to the target component, which is sent by the baseboard management controller, judging whether the fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generating and displaying alarm information.
Optionally, the operation information query request includes a register identifier corresponding to an error type of the target component, and the determining, according to the operation information, whether the fault corresponding to the error type occurs includes:
if the operation information of the register corresponding to the error type of the target component comprises a fault corresponding to the error type, determining that the fault corresponding to the error type occurs;
and if the operation information of the register corresponding to the error type of the target component does not comprise the fault corresponding to the error type, determining that the fault corresponding to the error type does not occur.
Optionally, the upper computer includes an intelligent platform management interface, and the method further includes:
and sending a reset command to a baseboard management controller through an intelligent platform management interface so that the baseboard management controller triggers a reset interrupt of the target component, and restoring the target component to an initial state.
Optionally, the target component includes at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, and a network card.
The invention also discloses an error injection method which is applied to the baseboard management controller, wherein the baseboard management controller is connected with the upper computer, and the method comprises the following steps:
receiving an error injection starting command sent by the upper computer, wherein the error injection starting command comprises an error type of a target component needing error injection;
establishing an initialization link with the target component according to the error injection starting command;
Transmitting an initialization link creation completion signal to the upper computer, and receiving an error injection instruction transmitted by the upper computer;
performing error injection on the target component according to the error injection instruction, and sending an error injection completion signal to the upper computer after the error injection is completed;
receiving an operation information inquiry request sent by the upper computer;
and acquiring the operation information of the target component according to the operation information query request, and sending the operation information to the upper computer.
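The controller-side steps above can be illustrated with a minimal sketch of a message handler. It is not the patented implementation: the message format (dictionaries with a "kind" field), the class name BmcHandler, the reply names, and the strict ordering check are all assumptions made for illustration, and the register store is a plain in-memory dictionary rather than real hardware.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()      # no initialization link yet
    LINKED = auto()    # initialization link created
    INJECTED = auto()  # error injection completed


class BmcHandler:
    """Hypothetical controller-side handler; all names are illustrative."""

    def __init__(self):
        self.state = State.IDLE
        self.target = None
        self.registers = {}  # simulated registers of the target component

    def handle(self, message):
        kind = message["kind"]
        if kind == "start" and self.state is State.IDLE:
            # Receive the starting command and "establish" the link.
            self.target = message["component"]
            self.state = State.LINKED
            return {"kind": "link_ready"}
        if kind == "inject" and self.state is State.LINKED:
            # Simulate error injection by setting a flag for the error type.
            self.registers[message["error_type"]] = 1
            self.state = State.INJECTED
            return {"kind": "inject_done"}
        if kind == "query" and self.state is State.INJECTED:
            # Return the operation information of the target component.
            return {
                "kind": "operation_info",
                "component": self.target,
                "registers": dict(self.registers),
            }
        # Assumption: out-of-order messages are rejected.
        return {"kind": "rejected", "reason": f"{kind} not valid in {self.state.name}"}


handler = BmcHandler()
print(handler.handle({"kind": "start", "component": "CPLD"}))
print(handler.handle({"kind": "inject", "error_type": "Drop"}))
print(handler.handle({"kind": "query"}))
```

Modelling the sequence as three states makes the required ordering explicit: an injection request is only honoured after the link exists, and a query is only answered after an injection has completed.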
Optionally, the target component includes a central processing unit, the baseboard management controller includes a Joint Test Action Group (JTAG) interface controller and a JTAG interface, and the establishing an initialization link with the target component according to the error injection starting command includes:
initializing the JTAG interface controller, and configuring, through the JTAG interface controller, an initialization link between the JTAG interface and the central processing unit.
Optionally, the target component includes a programmable logic device, the error type includes a power-down type and a timeout type, and the performing the error injection on the target component according to the error injection instruction includes:
configuring a power-down register of the programmable logic device according to the power-down type;
And configuring a timer of the programmable logic device according to the timeout type.
Optionally, the target component includes a power supply unit, the error type includes a fan fault and an input voltage fault, and the error injection is performed on the target component according to the error injection instruction, including:
Configuring a fan register of the power supply unit according to the fan fault;
And configuring an input voltage register of the power supply unit according to the input voltage fault.
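The mapping from error type to the register that is configured, as described for the programmable logic device and the power supply unit, can be sketched as a lookup table. The register names, the values written, and the function name apply_injection are placeholders chosen for illustration; the source does not specify register addresses or values, and a real controller would write to hardware rather than to a dictionary.

```python
# (component, error type) -> (register name, value to write).
# Every name and value here is a placeholder, not taken from the source.
INJECTION_TABLE = {
    ("CPLD", "Drop"): ("power_down_register", 0x1),
    ("CPLD", "Timeout"): ("timer", 0x0),
    ("PSU", "Fan Fault"): ("fan_register", 0x1),
    ("PSU", "Vin Fault"): ("input_voltage_register", 0x1),
}


def apply_injection(registers, component, error_types):
    """Configure the simulated register for each requested error type."""
    for error_type in error_types:
        name, value = INJECTION_TABLE[(component, error_type)]
        registers.setdefault(component, {})[name] = value
    return registers


simulated = apply_injection({}, "CPLD", ["Drop", "Timeout"])
print(simulated)
```

Keeping the mapping in one table means a new component or error type is added by extending the table rather than by adding a new branch to the injection logic.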
Optionally, the operation information query request includes a number of a fan register, and the obtaining, according to the operation information query request, the operation information of the target component includes:
and acquiring the operation information of the fan register through a bidirectional serial bus according to the serial number of the fan register.
Optionally, the target component includes at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, and a network card.
The invention also discloses an error injection system, which comprises an upper computer and a baseboard management controller, wherein the upper computer is connected with the baseboard management controller;
The upper computer is used for determining the error type of a target component needing error injection, packaging the error type of the target component to obtain an error injection starting command, and sending the error injection starting command to the baseboard management controller; receiving an initialization link creation completion signal sent by the baseboard management controller, generating an error injection instruction according to the error type, sending the error injection instruction to the baseboard management controller, and sending an operation information inquiry request to the baseboard management controller; receiving operation information corresponding to the target component sent by the baseboard management controller, judging whether a fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generating and displaying alarm information;
The baseboard management controller is used for receiving the error injection starting command sent by the upper computer, and establishing an initialization link with the target component according to the error injection starting command; transmitting an initialization link creation completion signal to the upper computer, receiving an error injection instruction transmitted by the upper computer, performing error injection on the target component according to the error injection instruction, and transmitting an error injection completion signal to the upper computer after error injection is completed; receiving an operation information inquiry request sent by the upper computer; and acquiring the operation information of the target component according to the operation information query request, and sending the operation information to the upper computer.
The invention also discloses an upper computer, which is connected with the baseboard management controller, and comprises:
the determining module is used for determining the error type of a target component that requires error injection, and generating an error injection starting command according to the error type of the target component;
A transmitting module, configured to transmit the error injection start command to the baseboard management controller, so as to create an initialization link between the baseboard management controller and the target component;
The first sending module is used for generating an error injection instruction according to the error type if receiving an initialization link creation completion signal sent by the baseboard management controller, and sending the error injection instruction to the baseboard management controller so as to perform error injection on the target component;
the second sending module is used for receiving the error injection completion signal sent by the baseboard management controller and sending an operation information inquiry request to the baseboard management controller;
And the display module is used for receiving the operation information corresponding to the target component sent by the baseboard management controller, judging whether the fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generating and displaying alarm information.
The invention also discloses a baseboard management controller, which is connected with the upper computer, and comprises:
the first receiving module is used for receiving an error injection starting command sent by the upper computer, wherein the error injection starting command comprises an error type of a target component needing error injection;
The establishing module is used for establishing an initialization link with the target component according to the error injection starting command;
The second receiving module is used for sending an initialization link creation completion signal to the upper computer and receiving an error injection instruction sent by the upper computer;
The error injection module is used for performing error injection on the target component according to the error injection instruction, and sending an error injection completion signal to the upper computer after the error injection is completed;
the third receiving module is used for receiving an operation information inquiry request sent by the upper computer;
and the acquisition module is used for acquiring the operation information of the target component according to the operation information query request and sending the operation information to the upper computer.
The invention also discloses an electronic device, comprising: a processor, a memory and a computer program stored on the memory and capable of running on the processor, which when executed by the processor performs the steps of the error injection method as described above.
The invention also discloses a nonvolatile readable storage medium, wherein the nonvolatile readable storage medium stores a computer program, and the computer program realizes the steps of the error injection method when being executed by a processor.
The embodiment of the invention has the following advantages:
The invention discloses an error injection method in which the upper computer determines the error type of a target component that requires error injection, so that an initialization link can be established between the baseboard management controller and the target component according to that error type. After the initialization link is established, an error injection instruction can be sent to the baseboard management controller to perform error injection on the target component. After the error injection is completed, an operation information query request can be sent to the baseboard management controller, the operation information corresponding to the target component is received from the baseboard management controller, and whether a fault corresponding to the error type has occurred is determined according to the operation information; if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for every component can be carried out through the upper computer, which improves error injection verification efficiency and greatly reduces system test cost. In addition, alarm information is displayed when the fault corresponding to the injected error type does not occur, which provides convenience for the user.
Drawings
FIG. 1 is a flow chart of steps of an error injection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a display interface of an upper computer according to an embodiment of the present invention;
FIG. 3 is a flow chart of steps of another error injection method according to an embodiment of the present invention;
FIG. 4 is a schematic flow chart of an error injection method according to an embodiment of the present invention;
FIG. 5 is a block diagram of an upper computer according to an embodiment of the present invention;
FIG. 6 is a block diagram of a baseboard management controller according to an embodiment of the present invention;
FIG. 7 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 8 is a block diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
One of the core ideas of the embodiment of the invention is that the upper computer determines the error type of a target component that requires error injection, so that an initialization link can be established between the baseboard management controller and the target component according to that error type. After the initialization link is established, an error injection instruction can be sent to the baseboard management controller to perform error injection on the target component. After the error injection is completed, an operation information query request can be sent to the baseboard management controller, the operation information corresponding to the target component is received from the baseboard management controller, and whether a fault corresponding to the error type has occurred is determined according to the operation information; if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for every component can be carried out through the upper computer, which improves error injection verification efficiency and greatly reduces system test cost. In addition, alarm information is displayed when the fault corresponding to the injected error type does not occur, which provides convenience for the user.
Referring to fig. 1, a flow chart of the steps of an error injection method provided by an embodiment of the present invention is shown. The method is applied to an upper computer, the upper computer is connected with a baseboard management controller, and the method includes:
Step 101, determining the error type of the target component that requires error injection, and generating an error injection starting command according to the error type of the target component.
In the embodiment of the invention, a plurality of components can be displayed in the display interface of the upper computer, and each component corresponds to a plurality of error types. From these components and error types, a user can select the target component that requires error injection and its error type. There may be one or more target components and one or more error types, which is not limited herein.
After the error type of the target component selected by the user for error injection is determined, the error type of the target component may be encapsulated into the error injection starting command.
In one embodiment of the present invention, the target component is at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, and a network card.
In the embodiment of the present invention, fig. 2 shows a schematic diagram of a display interface of the upper computer provided in the embodiment of the present invention. A plurality of components may be displayed in the upper computer, including: CPU (Central Processing Unit), CPLD (Complex Programmable Logic Device), PSU (Power Supply Unit), MEM (memory), and network card. The error types that can be selected for the CPU may include: UPI (Ultra Path Interconnect), PCU (Power Control Unit), IMC (Integrated Memory Controller), and CHA (Caching and Home Agent, the cache coherence unit). The error types that can be selected for the memory may include: ADDDC (Adaptive Double Device Data Correction), PPR (Post Package Repair), and UCE (Uncorrectable Error). The error types that can be selected for the programmable logic device may include: Drop (power down), Timeout, and Alert. The error types that can be selected for the network card may include: Fatal (severe fault), PERR (Parity Error), and SERR (System Error). The error types that can be selected for the power supply unit may include: Fan Fault, Vin Fault (input voltage fault), and Predictive (predictive pre-alarm).
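As an illustration of how the selectable components and error types could be represented on the upper computer, the following minimal sketch stores them in a lookup table and encapsulates a validated selection into a start command object. The dictionary keys, the StartCommand class, and the build_start_command function are hypothetical names introduced here; the source does not describe the command format.

```python
from dataclasses import dataclass

# Component -> selectable error types, transcribed from the description.
# The key spellings (for example "NIC" for the network card) are illustrative.
ERROR_CATALOG = {
    "CPU": ("UPI", "PCU", "IMC", "CHA"),
    "MEM": ("ADDDC", "PPR", "UCE"),
    "CPLD": ("Drop", "Timeout", "Alert"),
    "NIC": ("Fatal", "PERR", "SERR"),
    "PSU": ("Fan Fault", "Vin Fault", "Predictive"),
}


@dataclass(frozen=True)
class StartCommand:
    """Hypothetical container for an error injection starting command."""
    component: str
    error_types: tuple


def build_start_command(component, error_types):
    """Validate the user's selection and encapsulate it in a start command."""
    allowed = ERROR_CATALOG.get(component)
    if allowed is None:
        raise ValueError(f"unknown component: {component}")
    unknown = [e for e in error_types if e not in allowed]
    if unknown:
        raise ValueError(f"{component} has no error type(s): {unknown}")
    return StartCommand(component, tuple(error_types))


cmd = build_start_command("NIC", ["PERR", "SERR"])
print(cmd)
```

Validating the selection before the command is built keeps an unsupported combination, such as a CPU error type chosen for the power supply unit, from ever reaching the baseboard management controller.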
Step 102, sending the error injection starting command to the baseboard management controller to create an initialization link between the baseboard management controller and the target component.
In the embodiment of the invention, after the upper computer generates the error injection starting command, the error injection starting command can be sent to the baseboard management controller, so that the baseboard management controller creates an initialization link between the baseboard management controller and the target component after receiving the error injection starting command.
Step 103, if the initialization link creation completion signal sent by the baseboard management controller is received, generating an error injection instruction according to the error type, and sending the error injection instruction to the baseboard management controller to perform error injection on the target component.
In the embodiment of the invention, the initialization link creation completion signal indicates that creation of the initialization link between the baseboard management controller and the target component is complete. After receiving this signal from the baseboard management controller, the upper computer can generate the error injection instruction according to the error type of the target component selected by the user, and then send the error injection instruction to the baseboard management controller. The error injection instruction may include the error type of the target component; for example, if the target component selected by the user is a network card, the error types may include a parity error and a system error.
Step 104, receiving the error injection completion signal sent by the baseboard management controller, and sending an operation information inquiry request to the baseboard management controller.
In the embodiment of the invention, the error injection completion signal indicates that the baseboard management controller has completed the error injection on the target component after receiving the error injection instruction. After receiving the error injection completion signal, the upper computer can send an operation information query request to the baseboard management controller in order to obtain the operation information of the target component.
Step 105, receiving the operation information corresponding to the target component sent by the baseboard management controller, judging whether a fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generating and displaying alarm information.
In the embodiment of the invention, the upper computer can receive the operation information corresponding to the target component, which the baseboard management controller sends in response to the operation information query request, and then judge from the operation information whether the fault corresponding to the error type of the target component has occurred. If so, this indicates that the component works normally; if not, alarm information can be generated and displayed in the interface to alert the user that the component is faulty, which provides convenience for the user.
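Steps 101 to 105 can be summarised from the upper computer's point of view with the following minimal sketch. It runs against an in-memory stand-in for the baseboard management controller, so the FakeBMC class, its method names, the signal strings, and the wording of the alarm are all assumptions for illustration. Raising an exception when a completion signal is missing is also an assumption, since the description only specifies the alarm for a fault that is not observed.

```python
class FakeBMC:
    """In-memory stand-in for a baseboard management controller (hypothetical)."""

    def __init__(self, injection_works=True):
        self.injection_works = injection_works
        self.faults = set()

    def start(self, component, error_type):
        # Pretend the initialization link to the component was created.
        return "LINK_READY"

    def inject(self, component, error_type):
        # Record the fault only if the simulated injection takes effect.
        if self.injection_works:
            self.faults.add((component, error_type))
        return "INJECT_DONE"

    def query(self, component):
        # "Operation information": faults currently recorded for the component.
        return {e for (c, e) in self.faults if c == component}


def run_injection(bmc, component, error_type):
    """Host-side flow; returns an alarm string, or None if the fault was observed."""
    # Steps 101-102: send the starting command and wait for the link signal.
    if bmc.start(component, error_type) != "LINK_READY":
        raise RuntimeError("no initialization link")  # assumption, see lead-in
    # Step 103: send the error injection instruction.
    if bmc.inject(component, error_type) != "INJECT_DONE":
        raise RuntimeError("injection did not complete")  # assumption
    # Steps 104-105: query the operation information and judge the result.
    info = bmc.query(component)
    if error_type not in info:
        return f"ALARM: {error_type} injected into {component} but not observed"
    return None


print(run_injection(FakeBMC(injection_works=True), "PSU", "Fan Fault"))
print(run_injection(FakeBMC(injection_works=False), "PSU", "Fan Fault"))
```

The second call shows the alarm path: the injection is reported as complete, but the queried operation information does not contain the fault, so alarm information is produced.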
In one embodiment of the present invention, the operation information query request includes a register identifier corresponding to an error type of a target component, and determining whether a fault corresponding to the error type occurs according to operation information includes: if the operation information of the register corresponding to the error type of the target component comprises a fault corresponding to the error type, determining that the fault corresponding to the error type occurs; if the operation information of the register corresponding to the error type of the target component does not comprise the fault corresponding to the error type, determining that the fault corresponding to the error type does not occur.
In the embodiment of the invention, the upper computer can send the operation information query request to the baseboard management controller. After receiving the request, the baseboard management controller can parse out the register identifier corresponding to the error type of the target component, acquire the operation information of that register, and send it to the upper computer. The upper computer can then judge whether the operation information of the register includes the fault corresponding to the error type: if it does, the fault corresponding to the error type has occurred; if it does not, the fault corresponding to the error type has not occurred. By acquiring the operation information of the register corresponding to the error type, the invention can determine whether the fault has occurred, thereby improving the efficiency of fault detection.
In one example, the target component is a power supply unit, the error type is a fan fault, if the upper computer determines that the operation information of the fan register includes the fan fault after acquiring the operation information of the fan register, the fault corresponding to the error type is determined to occur, and if the operation information of the fan register does not include the fan fault, the fault corresponding to the error type is determined to not occur.
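The fan fault example can be sketched as a register check. The source does not give register identifiers or bit layouts, so the register name "fan_register", the bit mask FAN_FAULT_BIT, and the function fault_occurred are assumptions made only to show the shape of the comparison.

```python
# Hypothetical layout: the source does not specify bit positions.
FAN_FAULT_BIT = 0x01  # assumed bit that signals a fan fault

# (component, error type) -> (register identifier, fault mask); illustrative only.
REGISTER_FAULT_MASKS = {
    ("PSU", "Fan Fault"): ("fan_register", FAN_FAULT_BIT),
}


def fault_occurred(component, error_type, operation_info):
    """Check whether the queried register reports the injected fault.

    operation_info maps register identifiers to the raw values returned
    by the baseboard management controller.
    """
    register_id, mask = REGISTER_FAULT_MASKS[(component, error_type)]
    return bool(operation_info.get(register_id, 0) & mask)


print(fault_occurred("PSU", "Fan Fault", {"fan_register": 0x01}))
print(fault_occurred("PSU", "Fan Fault", {"fan_register": 0x00}))
```

Because the query request carries the register identifier for the chosen error type, the upper computer only has to inspect that one register rather than scan all of the component's status information.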
In an embodiment of the present invention, the upper computer includes an intelligent platform management interface, and the method further includes: and sending a reset command to the baseboard management controller through the intelligent platform management interface so that the baseboard management controller triggers a reset interrupt of the target component to restore the target component to an initial state.
In the embodiment of the invention, the upper computer may include an IPMI (Intelligent Platform Management Interface) and may send a reset command to the baseboard management controller through it. After receiving the reset command, the baseboard management controller may trigger a reset interrupt of the target component so as to restore the target component to its initial state. In this way, the target component is returned to its initial state after error injection has been performed for its error type, which avoids affecting the operation of the target component.
The invention discloses an error injection method in which the upper computer determines the error type of a target component that requires error injection, so that an initialization link can be established between the baseboard management controller and the target component according to that error type. After the initialization link is established, an error injection instruction can be sent to the baseboard management controller to perform error injection on the target component. After the error injection is completed, an operation information query request can be sent to the baseboard management controller, the operation information corresponding to the target component is received from the baseboard management controller, and whether a fault corresponding to the error type has occurred is determined according to the operation information; if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for every component can be carried out through the upper computer, which improves error injection verification efficiency and greatly reduces system test cost. In addition, alarm information is displayed when the fault corresponding to the injected error type does not occur, which provides convenience for the user.
Referring to fig. 3, a flowchart of the steps of another error injection method according to an embodiment of the present invention is shown. The method is applied to a baseboard management controller, where the baseboard management controller is connected to an upper computer, and the method includes:
Step 201, receiving an error injection starting command sent by the upper computer, where the error injection starting command includes an error type of a target component to be subjected to error injection;
In one embodiment of the present invention, the target component is at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, and a network card.
Step 202, establishing an initialization link with a target component according to the error injection starting command;
In one embodiment of the present invention, the target component includes a central processing unit, the baseboard management controller includes a joint test workgroup interface controller and a joint test workgroup interface, and establishing the initialization link with the target component according to the error injection starting command includes: initializing the joint test workgroup interface controller, and configuring, through the joint test workgroup interface controller, an initialization link between the joint test workgroup interface and the central processing unit.
In this embodiment of the invention, the baseboard management controller may include a JTAG (Joint Test Action Group, i.e. joint test workgroup) interface controller and a joint test workgroup interface. When the target component includes a central processing unit, the baseboard management controller may first initialize the joint test workgroup interface controller and then, through that controller, configure the initialization link between the joint test workgroup interface and the central processing unit, so that a communication link is established between the baseboard management controller and the central processing unit.
Step 203, sending an initialization link creation completion signal to an upper computer, and receiving an error injection instruction sent by the upper computer;
Step 204, performing error injection on the target component according to the error injection instruction, and sending an error injection completion signal to the upper computer after the error injection is completed;
In one embodiment of the present invention, the target component includes a programmable logic device, the error type includes a power-down type and a timeout type, and the performing the error injection on the target component according to the error injection instruction includes: configuring a power-down register of the programmable logic device according to the power-down type; and configuring a timer of the programmable logic device according to the timeout type.
In this embodiment of the invention, when the target component includes a programmable logic device and the error types selected by the user are a power-down type and a timeout type, the baseboard management controller may, after receiving the error injection instruction, parse it to obtain the programmable logic device, the power-down type and the timeout type. It may then configure the power-down register of the programmable logic device according to the power-down type and configure the timer of the programmable logic device according to the timeout type, that is, construct a power-down type fault in the power-down register and a timeout type fault in the timer.
In one embodiment of the present invention, the target component includes a power supply unit, the error type includes a fan fault and an input voltage fault, and performing the error injection on the target component according to the error injection instruction includes: configuring a fan register of the power supply unit according to the fan fault; and configuring an input voltage register of the power supply unit according to the input voltage fault.
In this embodiment of the invention, when the target component includes a power supply unit and the error types selected by the user are a fan fault and an input voltage fault, the baseboard management controller may, after receiving the error injection instruction, parse it to obtain the power supply unit, the fan fault and the input voltage fault. It may then configure the fan register of the power supply unit according to the fan fault and configure the input voltage register according to the input voltage fault, that is, construct the fan fault in the fan register and the input voltage fault in the input voltage register.
Step 205, receiving an operation information inquiry request sent by an upper computer;
Step 206, acquiring the operation information of the target component according to the operation information query request, and sending the operation information to the upper computer.
In one embodiment of the present invention, the operation information query request includes a number of a fan register, and acquiring the operation information of the target component according to the operation information query request includes: acquiring the operation information of the fan register through a bidirectional serial bus according to the number of the fan register.
In this embodiment of the invention, when the target component includes a power supply unit and the error type selected by the user is a fan fault, the upper computer may send an operation information query request to the baseboard management controller in order to obtain the operation information of the fan register. The operation information query request may include the number of the fan register. In one example, the number of the fan register is Fan; the baseboard management controller may parse the operation information query request to obtain Fan, and then acquire the operation information of the fan register through a bidirectional serial bus.
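As an illustration only, the register configuration and register query described for steps 204 and 206 might be sketched as follows. This is a minimal simulation under stated assumptions: the class `SimulatedBmc`, the message fields, the error type labels and all register names other than `Fan` are hypothetical, and a dictionary stands in for registers that a real baseboard management controller would reach over a bus.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from (component, error type) to the register in which
# the fault is constructed. Only the register number "Fan" comes from the text.
FAULT_REGISTER = {
    ("pld", "power_down"): "power_down_reg",
    ("pld", "timeout"): "timer",
    ("psu", "fan_fault"): "Fan",
    ("psu", "input_voltage_fault"): "input_voltage_reg",
}


@dataclass
class SimulatedBmc:
    # Stands in for component registers; a real controller would access them
    # over a bus (the transport is not modeled here).
    registers: dict = field(default_factory=dict)

    def inject(self, instruction: dict) -> str:
        """Parse an error injection instruction and construct each fault."""
        component = instruction["component"]
        for error_type in instruction["error_types"]:
            register = FAULT_REGISTER[(component, error_type)]
            self.registers[register] = error_type
        return "error_injection_complete"

    def query(self, request: dict) -> dict:
        """Return the operation information of the register named in the request."""
        register = request["register"]
        return {"register": register, "fault": self.registers.get(register)}


bmc = SimulatedBmc()
done = bmc.inject({"component": "psu",
                   "error_types": ["fan_fault", "input_voltage_fault"]})
info = bmc.query({"register": "Fan"})
```

In this sketch the query carries only the register number, mirroring the example in which the controller parses the request to obtain Fan before reading the register.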
Fig. 4 shows a flowchart of an error injection method provided by an embodiment of the invention, in which the target component is a central processing unit and the error type is an Ultra Path Interconnect (UPI) error. The upper computer may send an error injection starting command to the baseboard management controller. After receiving the command, the baseboard management controller may create an initialization link with the central processing unit and, once the link is created, send an initialization link creation completion signal to the upper computer. After receiving that signal, the upper computer may send an error injection instruction to the baseboard management controller. After the error injection is completed, the baseboard management controller may send an error injection completion signal to the upper computer. After receiving the error injection completion signal, the upper computer may send an operation information query request to the baseboard management controller, which may obtain the operation information of the central processing unit and send it to the upper computer. The upper computer may then judge, according to the operation information, whether the fault corresponding to the error type has occurred; if the fault has not occurred, alarm information is generated.
The invention discloses an error injection method. The upper computer determines the error type of the target component that requires error injection, so that an initialization link can be created between the baseboard management controller and the target component according to that error type. After the initialization link is created, an error injection instruction can be sent to the baseboard management controller to perform error injection on the target component. After the error injection is completed, an operation information query request can be sent to the baseboard management controller, and the operation information corresponding to the target component sent by the baseboard management controller is received. Whether the fault corresponding to the error type has occurred is then determined according to the operation information, and if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for each component can be carried out from the upper computer, which improves error injection verification efficiency and greatly reduces system test cost. In addition, alarm information can be displayed when the fault corresponding to the injected error type does not occur, which is convenient for the user.
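The ordering of the message exchange in Fig. 4 can be sketched roughly as below. This is a simplified simulation, not the patented implementation: `FakeBmc`, `run_error_injection`, the signal strings and the error type label `upi_error` are all hypothetical names chosen for illustration, and the two sides call each other directly instead of communicating over a real interface.

```python
class FakeBmc:
    """Hypothetical stand-in for the baseboard management controller."""

    def __init__(self, fault_takes_effect=True):
        self.fault_takes_effect = fault_takes_effect
        self.link_ready = False
        self.injected = None

    def start(self, component, error_type):
        # Create the initialization link with the target component.
        self.link_ready = True
        return "init_link_created"

    def inject(self, component, error_type):
        if not self.link_ready:
            raise RuntimeError("initialization link has not been created")
        if self.fault_takes_effect:
            self.injected = error_type
        return "error_injection_complete"

    def query(self, component):
        # Report the operation information of the target component.
        return {"component": component, "fault": self.injected}


def run_error_injection(bmc, component, error_type):
    """Upper-computer side: each step waits for the previous completion signal."""
    log = []
    if bmc.start(component, error_type) != "init_link_created":
        return log, "initialization link creation failed"
    log.append("init_link_created")
    if bmc.inject(component, error_type) != "error_injection_complete":
        return log, "error injection failed"
    log.append("error_injection_complete")
    info = bmc.query(component)
    log.append("operation_info_received")
    if info["fault"] == error_type:
        return log, None  # the injected fault occurred, so no alarm
    return log, f"ALARM: {error_type} fault did not occur on {component}"


log, alarm = run_error_injection(FakeBmc(), "cpu", "upi_error")
_, missing_alarm = run_error_injection(FakeBmc(fault_takes_effect=False),
                                       "cpu", "upi_error")
```

The second call models the case in which the injected fault is not observed, which is the case that produces alarm information.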
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
The invention also discloses an error injection system, which comprises an upper computer and a baseboard management controller, wherein the upper computer is connected with the baseboard management controller;
The upper computer is used for determining the error type of the target component needing to be subjected to error injection, packaging the error type of the target component to obtain an error injection starting command, and sending the error injection starting command to the baseboard management controller; receiving an initialization link creation completion signal sent by the baseboard management controller, generating an error injection instruction according to the error type, sending the error injection instruction to the baseboard management controller, and sending an operation information inquiry request to the baseboard management controller; receiving operation information corresponding to the target component sent by the baseboard management controller, judging whether a fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generating and displaying alarm information;
The baseboard management controller is used for receiving the error injection starting command sent by the upper computer and establishing an initialization link with the target component according to the error injection starting command; transmitting an initialization link creation completion signal to the upper computer, receiving an error injection instruction transmitted by the upper computer, performing error injection on the target component according to the error injection instruction, and transmitting an error injection completion signal to the upper computer after the error injection is completed; receiving an operation information inquiry request sent by the upper computer; and acquiring the operation information of the target component according to the operation information query request, and sending the operation information to the upper computer.
Referring to fig. 5, a block diagram of an upper computer 30 according to an embodiment of the present invention is shown, where the upper computer is connected to a baseboard management controller, and the upper computer includes:
the determining module 301 is configured to determine an error type of a target component that needs to be error-injected, and generate an error-injection start command according to the error type of the target component;
A first sending module 302, configured to send an error injection start command to the baseboard management controller, so as to create an initialization link between the baseboard management controller and the target component;
the second sending module 303 is configured to generate an error injection instruction according to the error type if receiving the initialization link creation completion signal sent by the baseboard management controller, and send the error injection instruction to the baseboard management controller to perform error injection on the target component;
A third sending module 304, configured to receive an error injection completion signal sent by the baseboard management controller, and send an operation information query request to the baseboard management controller;
And the display module 305 is configured to receive the operation information corresponding to the target component sent by the baseboard management controller, determine whether a fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generate and display alarm information.
In one embodiment of the present invention, the operation information query request includes a register identifier corresponding to an error type of the target component, and the display module 305 may include:
the first determining submodule is used for determining that the fault corresponding to the error type occurs if the operation information of the register corresponding to the error type of the target component comprises the fault corresponding to the error type;
and the second determining submodule is used for determining that the fault corresponding to the error type does not occur if the operation information of the register corresponding to the error type of the target component does not comprise the fault corresponding to the error type.
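The judgment made by the two determining submodules can be sketched as a simple membership check. The sketch assumes, purely for illustration, that the operation information is a dictionary mapping each register identifier to the list of faults it reports; the function names and the data layout are hypothetical.

```python
from typing import Optional


def fault_occurred(operation_info: dict, register_id: str, error_type: str) -> bool:
    """True if the register's operation information includes the injected fault."""
    return error_type in operation_info.get(register_id, [])


def check_and_alarm(operation_info: dict, register_id: str,
                    error_type: str) -> Optional[str]:
    """Return alarm text when the injected fault is absent, otherwise None."""
    if fault_occurred(operation_info, register_id, error_type):
        return None
    return f"ALARM: register {register_id} does not report {error_type}"


operation_info = {"Fan": ["fan_fault"]}
no_alarm = check_and_alarm(operation_info, "Fan", "fan_fault")
alarm = check_and_alarm(operation_info, "input_voltage_reg", "input_voltage_fault")
```

Here the alarm is raised on the absence of the expected fault, since the purpose of the check is to verify that the injection took effect.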
In an embodiment of the present invention, the upper computer includes an intelligent platform management interface, and the upper computer further includes:
And the reset module is used for sending a reset command to the baseboard management controller through the intelligent platform management interface so that the baseboard management controller triggers the reset interrupt of the target component and restores the target component to an initial state.
In one embodiment of the present invention, the target component is at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, and a network card.
The invention discloses an upper computer. The upper computer determines the error type of the target component that requires error injection, so that an initialization link can be created between the baseboard management controller and the target component according to that error type. After the initialization link is created, an error injection instruction can be sent to the baseboard management controller to perform error injection on the target component. After the error injection is completed, an operation information query request can be sent to the baseboard management controller, and the operation information corresponding to the target component sent by the baseboard management controller is received. Whether the fault corresponding to the error type has occurred is then determined according to the operation information, and if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for each component can be carried out from the upper computer, which improves error injection verification efficiency and greatly reduces system test cost. In addition, alarm information can be displayed when the fault corresponding to the injected error type does not occur, which is convenient for the user.
Referring to fig. 6, there is shown a block diagram of a baseboard management controller 40 according to an embodiment of the present invention, where the baseboard management controller is connected to an upper computer, and includes:
The first receiving module 401 is configured to receive an error injection start command sent by the upper computer, where the error injection start command includes an error type of a target component that needs to be subjected to error injection;
An establishing module 402, configured to establish an initialization link with the target component according to the error injection start command;
The second receiving module 403 is configured to send an initialization link creation completion signal to the upper computer, and receive an error injection instruction sent by the upper computer;
The error injection module 404 is configured to perform error injection on the target component according to the error injection instruction, and send an error injection completion signal to the upper computer after the error injection is completed;
a third receiving module 405, configured to receive an operation information query request sent by an upper computer;
And the obtaining module 406 is configured to obtain the operation information of the target component according to the operation information query request, and send the operation information to the upper computer.
In one embodiment of the present invention, the target component comprises a central processing unit, the baseboard management controller comprises a joint test workgroup interface controller and a joint test workgroup interface, and the establishing module 402 comprises:
And the initialization submodule is used for initializing the joint test workgroup interface controller and configuring an initialization link between the joint test workgroup interface and the central processing unit through the joint test workgroup interface controller.
In one embodiment of the present invention, the target component includes a programmable logic device, the error type includes a power down type, a timeout type, and the error injection module 404 includes:
The first configuration submodule is used for configuring a power-down register of the programmable logic device according to the power-down type;
And the second configuration submodule is used for configuring the timer of the programmable logic device according to the timeout type.
In one embodiment of the present invention, the target component includes a power supply unit, the error type includes a fan failure, an input voltage failure, and the error injection module 404 includes:
a third configuration submodule for configuring a fan register of the power supply unit according to a fan fault;
and the fourth configuration submodule is used for configuring an input voltage register of the power supply unit according to the input voltage fault.
In one embodiment of the present invention, the operation information query request includes a number of a fan register, and the obtaining module 406 includes:
and the acquisition sub-module is used for acquiring the operation information of the fan register through the bidirectional serial bus according to the serial number of the fan register.
In one embodiment of the present invention, the target component is at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, and a network card.
The invention discloses a baseboard management controller. The upper computer determines the error type of the target component that requires error injection, so that an initialization link can be created between the baseboard management controller and the target component according to that error type. After the initialization link is created, an error injection instruction can be sent to the baseboard management controller to perform error injection on the target component. After the error injection is completed, an operation information query request can be sent to the baseboard management controller, and the operation information corresponding to the target component sent by the baseboard management controller is received. Whether the fault corresponding to the error type has occurred is then determined according to the operation information, and if the fault has not occurred, alarm information is generated and displayed. The upper computer thus provides server system testers with a unified error injection platform, that is, error injection for each component can be carried out from the upper computer, which improves error injection verification efficiency and greatly reduces system test cost. In addition, alarm information can be displayed when the fault corresponding to the injected error type does not occur, which is convenient for the user.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
Referring to fig. 7, a block diagram of an electronic device 50 according to an embodiment of the present invention is provided, including:
the processor 501, the memory 502, and the computer program 5021 stored in the memory 502 and capable of running on the processor 501, where the computer program 5021 when executed by the processor 501 implements the processes of the foregoing embodiments of the error injection method and achieves the same technical effects, and is not repeated herein.
Referring to fig. 8, a block diagram of a computer readable storage medium 60 according to an embodiment of the present invention is shown, where a computer program 601 is stored on the computer readable storage medium 60, and when the computer program 601 is executed by a processor, the processes of the foregoing error injection method embodiment are implemented, and the same technical effects can be achieved, so that repetition is avoided and redundant description is omitted.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal device that comprises the element.
The error injection method, system, upper computer, baseboard management controller, device and storage medium provided by the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the invention, and the description of the above embodiments is only intended to help in understanding the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (15)

1. An error injection method, characterized in that the method is applied to an upper computer, the upper computer is connected with a baseboard management controller, and the method comprises the following steps:
determining the error type of a target component needing error injection, and generating an error injection starting command according to the error type of the target component;
sending the error injection starting command to the baseboard management controller to create an initialization link between the baseboard management controller and the target component;
if an initialization link creation completion signal sent by the baseboard management controller is received, generating an error injection instruction according to the error type, and sending the error injection instruction to the baseboard management controller so as to perform error injection on the target component;
Receiving an error injection completion signal sent by the baseboard management controller, and sending an operation information inquiry request to the baseboard management controller;
and receiving operation information corresponding to the target component, which is sent by the baseboard management controller, judging whether the fault corresponding to the error type occurs according to the operation information, and if the fault corresponding to the error type does not occur, generating and displaying alarm information.
2. The method according to claim 1, wherein the operation information query request includes a register identifier corresponding to an error type of the target component, and the determining whether the fault corresponding to the error type occurs according to the operation information includes:
if the operation information of the register corresponding to the error type of the target component comprises a fault corresponding to the error type, determining that the fault corresponding to the error type occurs;
and if the operation information of the register corresponding to the error type of the target component does not comprise the fault corresponding to the error type, determining that the fault corresponding to the error type does not occur.
3. The method of claim 1, wherein the host computer includes an intelligent platform management interface, the method further comprising:
and sending a reset command to a baseboard management controller through an intelligent platform management interface so that the baseboard management controller triggers a reset interrupt of the target component, and restoring the target component to an initial state.
4. The method of claim 1, wherein the target component comprises at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, a network card.
5. An error injection method is applied to a baseboard management controller, wherein the baseboard management controller is connected with an upper computer, and the method comprises the following steps:
receiving an error injection starting command sent by the upper computer, wherein the error injection starting command comprises an error type of a target component needing error injection;
establishing an initialization link with the target component according to the error injection starting command;
Transmitting an initialization link creation completion signal to the upper computer, and receiving an error injection instruction transmitted by the upper computer;
performing error injection on the target component according to the error injection instruction, and sending an error injection completion signal to the upper computer after the error injection is completed;
receiving an operation information inquiry request sent by the upper computer;
and acquiring the operation information of the target component according to the operation information query request, and sending the operation information to the upper computer.
6. The method of claim 5, wherein the target component comprises a central processing unit, the baseboard management controller comprises a joint test workgroup interface controller and a joint test workgroup interface, and the establishing an initialization link with the target component according to the error injection starting command comprises:
Initializing the joint test working group interface controller, and configuring an initialization link between the joint test working group interface and the central processing unit through the joint test working group interface controller.
7. The method of claim 5, wherein the target component comprises a programmable logic device, the error type comprises a power-down type and a timeout type, and the performing error injection on the target component according to the error injection instruction comprises:
configuring a power-down register of the programmable logic device according to the power-down type;
And configuring a timer of the programmable logic device according to the timeout type.
8. The method of claim 5, wherein the target component comprises a power supply unit, the error type comprises a fan fault and an input voltage fault, and the performing error injection on the target component according to the error injection instruction comprises:
Configuring a fan register of the power supply unit according to the fan fault;
And configuring an input voltage register of the power supply unit according to the input voltage fault.
9. The method of claim 8, wherein the operation information query request includes a number of a fan register, and the acquiring the operation information of the target component according to the operation information query request comprises:
acquiring the operation information of the fan register through a bidirectional serial bus according to the number of the fan register.
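Claims 8 and 9 together describe writing a power supply register to inject a fault and later reading the fan register back by its number over a serial bus. The sketch below simulates that round trip. `FakeSerialBus` is an in-memory stand-in, not a real bus driver, and the bus address, register numbers, fault value `0x01`, and function names are hypothetical.

```python
class FakeSerialBus:
    """In-memory stand-in for a bidirectional serial bus; not a real driver."""

    def __init__(self):
        self._devices = {}   # device address -> {register number: byte value}

    def write_register(self, address: int, register: int, value: int) -> None:
        self._devices.setdefault(address, {})[register] = value & 0xFF

    def read_register(self, address: int, register: int) -> int:
        # Unwritten registers read back as 0 in this simulation.
        return self._devices.get(address, {}).get(register, 0)


PSU_ADDRESS = 0x58              # hypothetical bus address of the power supply unit
FAN_REGISTER = 0x01             # hypothetical fan register number
INPUT_VOLTAGE_REGISTER = 0x02   # hypothetical input voltage register number


def inject_psu_fault(bus: FakeSerialBus, error_type: str) -> None:
    """Configure the register that corresponds to the requested fault."""
    register = {
        "fan_fault": FAN_REGISTER,
        "input_voltage_fault": INPUT_VOLTAGE_REGISTER,
    }[error_type]
    bus.write_register(PSU_ADDRESS, register, 0x01)


def query_operation_info(bus: FakeSerialBus, register_number: int) -> dict:
    """Read one register by number, as the query request in claim 9 specifies."""
    value = bus.read_register(PSU_ADDRESS, register_number)
    return {"register": register_number, "value": value}
```

Carrying the register number in the query lets the upper computer read back exactly the register the injection touched.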
10. The method of claim 5, wherein the target component comprises at least one of a central processing unit, a programmable logic device, a power supply unit, a memory, a network card.
11. An error injection system, comprising an upper computer and a baseboard management controller, wherein the upper computer is connected to the baseboard management controller;
the upper computer is configured to: determine the error type of a target component needing error injection, package the error type of the target component to obtain an error injection starting command, and send the error injection starting command to the baseboard management controller; receive an initialization link creation completion signal sent by the baseboard management controller, generate an error injection instruction according to the error type, send the error injection instruction to the baseboard management controller, and send an operation information query request to the baseboard management controller; and receive operation information corresponding to the target component sent by the baseboard management controller, judge whether a fault corresponding to the error type occurs according to the operation information, and, if the fault corresponding to the error type does not occur, generate and display alarm information;
the baseboard management controller is configured to: receive the error injection starting command sent by the upper computer, and establish an initialization link with the target component according to the error injection starting command; send an initialization link creation completion signal to the upper computer, receive an error injection instruction sent by the upper computer, perform error injection on the target component according to the error injection instruction, and send an error injection completion signal to the upper computer after the error injection is completed; receive an operation information query request sent by the upper computer; and acquire the operation information of the target component according to the operation information query request, and send the operation information to the upper computer.
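The exchange between the two sides can be summarized as an ordered message sequence. The sketch below encodes that order and checks a recorded trace against it. The enum `Msg`, its member names, and the helper `first_violation` are hypothetical; the order follows the sequence of steps as the claims list them, with the query sent after the injection completion signal.

```python
from enum import Enum, auto


class Msg(Enum):
    START_COMMAND = auto()        # upper computer -> controller
    LINK_READY = auto()           # controller -> upper computer
    INJECT_INSTRUCTION = auto()   # upper computer -> controller
    INJECT_DONE = auto()          # controller -> upper computer
    QUERY_REQUEST = auto()        # upper computer -> controller
    OPERATION_INFO = auto()       # controller -> upper computer


# Enum members iterate in definition order, which is the claimed order.
EXPECTED_ORDER = list(Msg)


def first_violation(trace: list):
    """Return the index of the first message that breaks the order, or None."""
    for index, (got, want) in enumerate(zip(trace, EXPECTED_ORDER)):
        if got is not want:
            return index
    if len(trace) != len(EXPECTED_ORDER):
        # The trace is too short or too long; report where it diverges.
        return min(len(trace), len(EXPECTED_ORDER))
    return None
```

A checker like this is handy in a test harness: a log of the exchange can be replayed to confirm that the injection instruction was never sent before the link was reported ready.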
12. An upper computer connected to a baseboard management controller, the upper computer comprising:
a determining module, configured to determine the error type of a target component needing error injection, and generate an error injection starting command according to the error type of the target component;
a transmitting module, configured to transmit the error injection starting command to the baseboard management controller, so as to create an initialization link between the baseboard management controller and the target component;
a first sending module, configured to generate an error injection instruction according to the error type if an initialization link creation completion signal sent by the baseboard management controller is received, and send the error injection instruction to the baseboard management controller so as to perform error injection on the target component;
a second sending module, configured to receive the error injection completion signal sent by the baseboard management controller, and send an operation information query request to the baseboard management controller;
and a display module, configured to receive the operation information corresponding to the target component sent by the baseboard management controller, judge whether the fault corresponding to the error type occurs according to the operation information, and, if the fault corresponding to the error type does not occur, generate and display alarm information.
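Two of the upper computer's duties lend themselves to a short sketch: packaging the error type into a starting command, and deciding whether to raise an alarm from the returned operation information. The JSON encoding, the field names, the alarm text, and the function names below are assumptions made for illustration; the claims do not specify a message format.

```python
import json


def build_start_command(component: str, error_type: str) -> bytes:
    """Package the error type into a starting command (JSON is an assumed encoding)."""
    payload = {
        "cmd": "start_injection",
        "component": component,
        "error_type": error_type,
    }
    return json.dumps(payload).encode("utf-8")


def check_injection(error_type: str, operation_info: dict):
    """Return an alarm message if the expected fault is absent, otherwise None."""
    if operation_info.get(error_type):
        return None   # the component reports the injected fault, so no alarm
    return f"ALARM: injected '{error_type}' but no matching fault was reported"
```

Note the direction of the check: the alarm fires when the injected fault does not show up, since that means the injection or the component's fault reporting did not behave as expected.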
13. A baseboard management controller connected to an upper computer, the baseboard management controller comprising:
a first receiving module, configured to receive an error injection starting command sent by the upper computer, wherein the error injection starting command comprises an error type of a target component needing error injection;
an establishing module, configured to establish an initialization link with the target component according to the error injection starting command;
a second receiving module, configured to send an initialization link creation completion signal to the upper computer and receive an error injection instruction sent by the upper computer;
an error injection module, configured to perform error injection on the target component according to the error injection instruction, and send an error injection completion signal to the upper computer after the error injection is completed;
a third receiving module, configured to receive an operation information query request sent by the upper computer;
and an acquisition module, configured to acquire the operation information of the target component according to the operation information query request and send the operation information to the upper computer.
14. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and capable of running on the processor, wherein the computer program, when executed by the processor, performs the steps of the error injection method according to any one of claims 1-10.
15. A non-transitory readable storage medium, wherein a computer program is stored on the non-transitory readable storage medium, which when executed by a processor, implements the steps of the error injection method according to any one of claims 1-10.
CN202410532594.4A 2024-04-29 2024-04-29 Error injection method, system, upper computer, controller, equipment and storage medium Active CN118132358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410532594.4A CN118132358B (en) 2024-04-29 2024-04-29 Error injection method, system, upper computer, controller, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410532594.4A CN118132358B (en) 2024-04-29 2024-04-29 Error injection method, system, upper computer, controller, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN118132358A (en) 2024-06-04
CN118132358B (en) 2024-08-30

Family

ID=91242903

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410532594.4A Active CN118132358B (en) 2024-04-29 2024-04-29 Error injection method, system, upper computer, controller, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN118132358B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170102985A1 (en) * 2014-06-24 2017-04-13 Huawei Technologies Co.,Ltd. Fault processing method, related apparatus, and computer
CN115794524A (en) * 2022-12-06 2023-03-14 苏州浪潮智能科技有限公司 Verification method, verification device, electronic equipment and readable storage medium
CN115904846A (en) * 2022-12-15 2023-04-04 苏州浪潮智能科技有限公司 Method, device and system for testing system emergency fault

Also Published As

Publication number Publication date
CN118132358B (en) 2024-08-30

Similar Documents

Publication Publication Date Title
US9569325B2 (en) Method and system for automated test and result comparison
CN111488233A (en) Method and system for processing bandwidth loss problem of PCIe device
US10552242B2 (en) Runtime failure detection and correction
CN106610712B (en) Substrate management controller resetting system and method
CN112256507B (en) Chip fault diagnosis method and device, readable storage medium and electronic equipment
CN102571498A (en) Fault injection control method and device
CN111858122A (en) Fault detection method, device, equipment and storage medium of storage link
CN111897697B (en) Server hardware fault repairing method and device
CN111988196B (en) Bandwidth detection method and device, electronic equipment and storage medium
US20220043728A1 (en) Method, apparatus, device and system for capturing trace of nvme hard disc
CN109710479B (en) Processing method, first device and second device
CN118132358B (en) Error injection method, system, upper computer, controller, equipment and storage medium
CN105912414A (en) Method and system for server management
CN112905445B (en) Log-based test method, device and computer system
CN109885420B (en) PCIe link fault analysis method, BMC and storage medium
CN112817883A (en) Method, device and system for adapting interface platform and computer readable storage medium
CN112463446B (en) PCIe device recovery method and system, electronic device and storage medium
CN116010158A (en) Verification device, verification system and chip device of configuration register
CN110399258B (en) Stability testing method, system and device for server system
CN112732486A (en) Redundant firmware switching method, device, equipment and storage medium
CN114168396B (en) Fault positioning method and related assembly
CN116382968B (en) Fault detection method and device for external equipment
CN113392026A (en) Interface automation test method, system, electronic equipment and storage medium
CN111258833A (en) High-speed bus stability detection method, system and related components
CN116541211A (en) Keyboard controller style interface fault redundancy method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant