CN115220971A - GPUBOX test method, system, electronic device and storage medium - Google Patents

Publication number
CN115220971A
Authority
CN
China
Prior art keywords
gpubox
host
test
order
hosts
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210601602.7A
Other languages
Chinese (zh)
Inventor
张含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210601602.7A
Publication of CN115220971A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/22: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2289: Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing by configuration test
    • G06F11/2268: Logging of test results

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Test And Diagnosis Of Digital Computers (AREA)

Abstract

The invention provides a GPUBOX testing method, which comprises the following steps: constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host; powering on and starting up the host, which enters an operating system; the host reading the stored order long text and judging whether associated GPUBOX order information exists; and, if it exists, the host automatically testing the associated GPUBOX according to the associated GPUBOX order information and generating a test result, wherein the automated test comprises a GPUBOX whole machine test and a GPUBOX part test. The host and the GPUBOX are associated first; when the host obtains the associated GPUBOX order information, the host carries out a fully automatic pre-delivery test of the GPUBOX, including the part test and the whole machine test, which guarantees the factory quality of GPUBOX products and reduces the defect rate of GPUBOX production.

Description

GPUBOX test method, system, electronic device and storage medium
Technical Field
The invention relates to the field of complete machine testing, in particular to a GPUBOX testing method, a GPUBOX testing system, electronic equipment and a storage medium.
Background
With the development of Internet, cloud computing and big data technologies, data centers have a growing demand for high-density servers of the accelerated-computing type. However, the internal space of a common high-density server chassis is limited, and heat dissipation and power supply constraints allow only 1-2 GPU accelerator cards to be fitted, so the host + GPUBOX mode is increasingly used. A GPUBOX is an independent server used for GPU expansion; it contains only a GPU board, a fan board and a power board, and the GPU board comprises a BMC chip and N GPU slots for connecting GPU accelerator cards. Depending on the use scenario and customer requirements, several pairing modes currently exist, such as 1-to-1 (one host paired with one GPUBOX), 2-to-1 and 1-to-2. At present, a factory assembles the host and the GPUBOX separately. Because the GPUBOX has no CPU or memory, it cannot run an automated test under an operating system on its own, so the factory quality of the GPUBOX cannot be guaranteed.
Disclosure of Invention
In view of the above, it is necessary to provide a GPUBOX testing method, system, electronic device and storage medium that can implement automated testing of a GPUBOX and acquire whole-machine data of the GPUBOX.
In a first aspect, a method of GPUBOX testing is provided, the method comprising:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is powered on and started up and enters an operating system;
the host reads the stored order long text and judges whether relevant GPUBOX order information exists or not;
and if so, the host machine carries out automatic test on the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automatic test comprises GPUBOX whole machine test and GPUBOX part test.
In one embodiment, before the host is powered on and started, the method further includes:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, before the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method further comprises:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
In one embodiment, the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
In one embodiment, the host automatically testing the associated GPUBOX according to the associated GPUBOX order information includes:
when the number of the hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of the hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts allocate the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
In one embodiment, after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method includes:
the host checks whether the product information and the firmware version of the associated GPUBOX are correct;
if not, the host sends alarm information to the background;
and if so, the host uploads the test result to the background and shuts down.
In another aspect, a GPUBOX testing system is provided, the system comprising:
a background, used for constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is used for powering on and starting up and entering an operating system;
the host is used for reading the stored order long text and judging whether associated GPUBOX order information exists or not;
and if the GPUBOX order information exists, the host is used for carrying out automatic testing on the associated GPUBOX according to the associated GPUBOX order information and generating a testing result, wherein the automatic testing comprises GPUBOX whole machine testing and GPUBOX part testing.
In one embodiment, before the host is powered on and booted, the method further includes:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, before the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method further includes:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
In one embodiment, the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
In one embodiment, the host automatically testing the associated GPUBOX according to the associated GPUBOX order information comprises:
when the number of the hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of the hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts allocate the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
In one embodiment, after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method includes:
the host checks whether the product information and the firmware version of the associated GPUBOX are correct;
if not, the host sends alarm information to the background;
and if so, the host uploads the test result to the background and shuts down.
In another aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor executes the computer program to implement the following steps:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is powered on and started up and enters an operating system;
the host reads the stored order long text and judges whether associated GPUBOX order information exists or not;
and if so, the host machine carries out automatic test on the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automatic test comprises GPUBOX whole machine test and GPUBOX part test.
In one embodiment, the processor, when executing the computer program, performs the steps of:
before the host computer is powered on and started up, the method further comprises the following steps:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, the processor, when executing the computer program, performs the steps of:
before the host performs the automated test on the associated GPUBOX according to the associated GPUBOX order information, the method further comprises the following steps:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
In one embodiment, the processor, when executing the computer program, performs the steps of:
the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
In one embodiment, the processor, when executing the computer program, performs the steps of:
the automated testing of the associated GPUBOX by the host according to the associated GPUBOX order information comprises:
when the number of the hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of the hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the processor, when executing the computer program, performs the steps of:
the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts allocate the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
In one embodiment, the processor, when executing the computer program, performs the steps of:
after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method comprises the following steps:
the host checks whether the product information and the firmware version of the associated GPUBOX are correct;
if not, the host sends alarm information to the background;
and if so, the host uploads the test result to the background and shuts down.
In yet another aspect, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is powered on and started up and enters an operating system;
the host reads the stored order long text and judges whether associated GPUBOX order information exists or not;
and if so, the host machine carries out automatic test on the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automatic test comprises GPUBOX whole machine test and GPUBOX part test.
In one embodiment, the computer program when executed by a processor performs the steps of:
before the host computer is powered on and started up, the method further comprises the following steps:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, the computer program when executed by a processor implements the steps of:
before the host performs the automated test on the associated GPUBOX according to the associated GPUBOX order information, the method further comprises the following steps:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
In one embodiment, the computer program when executed by a processor implements the steps of:
the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
In one embodiment, the computer program when executed by a processor implements the steps of:
the automated testing of the associated GPUBOX by the host according to the associated GPUBOX order information comprises:
when the number of the hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of the hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the computer program when executed by a processor implements the steps of:
the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts allocate the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
In one embodiment, the computer program when executed by a processor implements the steps of:
after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method comprises the following steps:
the host checks whether the product information and the firmware version of the associated GPUBOX are correct;
if not, the host sends alarm information to the background;
and if so, the host uploads the test result to the background and shuts down.
An order long text is constructed according to a host order and an associated GPUBOX order and written into the host; the host is powered on and started up and enters an operating system; the host reads the stored order long text and judges whether associated GPUBOX order information exists; and if so, the host automatically tests the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automated test comprises a GPUBOX whole machine test and a GPUBOX part test. Because the host and the GPUBOX are associated, once the host obtains the associated GPUBOX order information it carries out a comprehensive, automated pre-delivery test of the GPUBOX, including the part test and the whole machine test, which guarantees the factory quality of GPUBOX products and reduces the defect rate of GPUBOX production.
Drawings
FIG. 1 is a schematic topology of a GPUBOX test method;
FIG. 2 is a schematic representation of the steps of a GPUBOX test method;
FIG. 3 is a schematic flow chart of a GPUBOX test method;
FIG. 4 is an internal structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The GPUBOX test method provided by the application can be applied to the topology shown in FIG. 1. The background determines the association between a host order and a GPUBOX order; a host order comprises one or more hosts, and a GPUBOX order comprises one or more GPUBOXes. Once an assembler learns from the background which GPUBOX is associated with a host, the assembler connects the two with a Mini-SAS cable. After the host is powered on and started, it checks whether the order long text contains an associated GPUBOX order. If there is no associated GPUBOX order, the host performs only the host test; if there is one, the host can perform the host test first and then automatically test the GPUBOX.
In one embodiment, as shown in FIG. 2, the present invention provides a GPUBOX testing method, the method comprising:
S201, constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
S202, powering on and starting up the host and entering an operating system;
S203, the host reads the stored order long text and judges whether associated GPUBOX order information exists;
S204, if it exists, the host automatically tests the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automated test comprises a GPUBOX whole machine test and a GPUBOX part test.
Specifically, as shown in FIG. 3, after determining the current host order and the associated GPUBOX order from the plurality of host orders and the plurality of GPUBOX orders, the background constructs the order long text and writes it to the current host. The hosts are powered on, start up and read their order long texts. When a host determines that its order long text contains no associated GPUBOX order information, it performs only the host test, uploads the test result to the background after the test is completed, and shuts down. When a host determines that its order long text contains associated GPUBOX order information, it automatically tests the associated GPUBOX through the GPUBOX test method.
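The decision made in steps S203 and S204 can be sketched as follows. This is only an illustrative sketch: the patent does not define the encoding of the order long text, so the JSON layout, the key names (`host_order`, `gpubox_orders`, `peer_hosts`) and the stage names are all assumptions made for the example, as is the relative order of the part test and the whole machine test.

```python
import json
from dataclasses import dataclass, field


@dataclass
class OrderLongText:
    """Hypothetical in-memory form of the order long text."""
    host_order: str
    gpubox_orders: list = field(default_factory=list)  # associated GPUBOX orders
    peer_hosts: list = field(default_factory=list)     # other hosts sharing a GPUBOX


def parse_order_long_text(raw: str) -> OrderLongText:
    # Assumption: a JSON encoding with these key names; the real format is not specified.
    data = json.loads(raw)
    return OrderLongText(
        host_order=data["host_order"],
        gpubox_orders=list(data.get("gpubox_orders", [])),
        peer_hosts=list(data.get("peer_hosts", [])),
    )


def plan_tests(order: OrderLongText) -> list:
    """Return the ordered test stages for this host (S203/S204)."""
    stages = ["host_test"]                     # the host test runs first
    for gpubox in order.gpubox_orders:         # S203: any associated GPUBOX order?
        stages.append(f"gpubox_part_test:{gpubox}")
        stages.append(f"gpubox_whole_machine_test:{gpubox}")
    stages.append("upload_result_and_shutdown")
    return stages


if __name__ == "__main__":
    order = parse_order_long_text('{"host_order": "H1", "gpubox_orders": ["G1"]}')
    for stage in plan_tests(order):
        print(stage)
```

A host whose order long text lists no GPUBOX order gets only the host test followed by the upload and shutdown stage, which matches the host-only branch described above.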
In one embodiment, before the host is powered on and started, the method further includes:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
Specifically, the assembler assembles the hosts and the GPUBOXes separately; on learning that a host and its associated GPUBOX need to be automatically tested, the assembler connects that host to the associated GPUBOX with a cable.
In one embodiment, before the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method further comprises:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
Specifically, to acquire the product information of the associated GPUBOX and configure it, the host may use a BMC (Baseboard Management Controller) OEM (Original Equipment Manufacturer) command to obtain the BMC IP address of the GPUBOX, and then write the FRU (Field Replaceable Unit) information of the GPUBOX out of band through that BMC IP address. A section of EEPROM (electrically erasable programmable read-only memory) on the GPUBOX mainboard records the FRU information, which may include the manufacturer, product model, product serial number, asset serial number and similar information of the mainboard and the small boards in the GPUBOX. To acquire the firmware version of the associated GPUBOX and configure it, the host may update the BMC FW (firmware) of the GPUBOX with a script that uses curl (a command-line file transfer tool that works with URL syntax) to simulate the web-page firmware update.
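As a rough illustration of the out-of-band access described above, the sketch below builds an `ipmitool` command line for a BMC IP address and parses `Key : Value` output such as that printed by `ipmitool fru print`. It assumes the GPUBOX BMC speaks standard IPMI over LAN; the patent itself relies on vendor OEM commands that are not disclosed, so this is a stand-in rather than the patented procedure. The IP address, credentials and sample text are placeholders, and the commands are only constructed, not executed.

```python
def ipmi_oob_command(bmc_ip, user, password, *subcommand):
    """Build (but do not run) an out-of-band ipmitool command line."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_ip,
            "-U", user, "-P", password, *subcommand]


def parse_key_value_output(text):
    """Parse 'Key : Value' lines into a dict, splitting on the first colon."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


# Sample text only; real output depends on the BMC and its FRU contents.
SAMPLE_FRU = """\
 Product Manufacturer  : ExampleVendor
 Product Name          : ExampleBox
 Product Serial        : SN0001
"""

if __name__ == "__main__":
    # Placeholder address and credentials, for illustration only.
    print(ipmi_oob_command("192.0.2.10", "admin", "secret", "fru", "print"))
    print(parse_key_value_output(SAMPLE_FRU))
```

Keeping command construction separate from execution lets the parsing logic be checked on a workstation that has no BMC attached.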
In one embodiment, the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
Specifically, the host performs the GPUBOX part tests through BMC OEM out-of-band commands, including the GPU test (quantity, model, FW, parameters and the like), the power supply test (quantity, model, FW and the like), the fan test (quantity, rotating speed and the like), and the substrate test of the BMC MAC (Media Access Control) address. The host performs the GPUBOX whole machine test, which includes the GPUBOX pressure (stress) test: using the BMC IP out-of-band mode, the host can run an AC pressure test (power-off restart through an intelligent PDU (Power Distribution Unit)), a DC pressure test and a reboot pressure test on the GPUBOX. After each restart the host performs the GPUBOX health check, which includes a BMC SEL (System Event Log) check for warnings and a check of the BMC sensor information (temperature, voltage, power, fan speed and the like).
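The restart-then-check loop of the whole machine test might look like the sketch below. The data shapes are assumptions: SEL entries are modelled as dictionaries with `severity` and `message` keys, sensor limits are supplied by the caller, and `restart` and `collect` are hypothetical callbacks standing in for the AC, DC or reboot cycle and for the out-of-band BMC queries.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    name: str
    value: float
    low: float    # lowest acceptable value (assumed to come from the test spec)
    high: float   # highest acceptable value


def health_check(sel_entries, sensors):
    """Return a list of problems; an empty list means the GPUBOX looks healthy."""
    problems = []
    for entry in sel_entries:
        # Assumption: each SEL entry is a dict with 'severity' and 'message' keys.
        if entry.get("severity", "").lower() in ("warning", "critical"):
            problems.append(f"SEL: {entry.get('message', '')}")
    for s in sensors:
        if not (s.low <= s.value <= s.high):
            problems.append(f"sensor {s.name}={s.value} outside [{s.low}, {s.high}]")
    return problems


def run_pressure_test(restart, collect, cycles=3):
    """Restart up to `cycles` times, running the health check after every restart.

    Returns (cycles_run, problems); stops at the first unhealthy cycle.
    """
    for i in range(cycles):
        restart()                      # AC, DC or reboot cycle, supplied by the caller
        sel_entries, sensors = collect()
        problems = health_check(sel_entries, sensors)
        if problems:
            return i + 1, problems
    return cycles, []
```

Stopping at the first failing cycle is a design choice of this sketch; a production script could equally record every cycle and report at the end.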
In one embodiment, the host automatically testing the associated GPUBOX according to the associated GPUBOX order information includes:
when the number of the hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of the hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
Specifically, among the hosts and GPUBOXes about to leave the factory, each host order and its associated GPUBOX order undergo the GPUBOX automated test. The association may be 1 host order with 1 GPUBOX order, 1 host order with 2 GPUBOX orders, 2 or more host orders with 1 GPUBOX order, and so on. When 1 host order is associated with 2 GPUBOX orders, the current host automatically tests each of the 2 GPUBOXes; when 2 host orders are associated with 1 GPUBOX order, each of the 2 associated hosts automatically tests the associated GPUBOX according to the associated GPUBOX order information.
In one embodiment, the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts allocate the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
Specifically, take 2 host orders associated with 1 GPUBOX order as an example. One host performs the GPUBOX whole machine test on the current GPUBOX together with the substrate test of the BMC MAC address, and the other host also performs the GPUBOX whole machine test on the current GPUBOX. The parts in the GPUBOX, such as fans, power supplies and GPU chips, are divided equally among the hosts: with 2 hosts each host is allocated half of the parts, and with 3 hosts each is allocated one third. After the allocation, each host performs the GPUBOX part test on the parts allocated to it.
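One simple way to realise the equal division described above is a round-robin deal per part type, sketched below. The patent does not say how the division is computed, so round-robin is an assumption, as are the host names, part identifiers and test labels.

```python
def assign_host_tests(hosts):
    """First host: whole machine test plus substrate test; other hosts: whole machine test."""
    plan = {}
    for i, host in enumerate(hosts):
        plan[host] = ["whole_machine_test"]
        if i == 0:
            plan[host].append("substrate_test")
    return plan


def allocate_parts(parts_by_type, hosts):
    """Deal the parts of each type out to the hosts round-robin."""
    if not hosts:
        raise ValueError("at least one host is required")
    allocation = {host: [] for host in hosts}
    for part_type, parts in parts_by_type.items():
        for i, part in enumerate(parts):
            allocation[hosts[i % len(hosts)]].append((part_type, part))
    return allocation


if __name__ == "__main__":
    # Hypothetical GPUBOX contents: 8 GPUs, 4 fans, 2 power supplies.
    parts = {
        "gpu": [f"gpu{i}" for i in range(8)],
        "fan": [f"fan{i}" for i in range(4)],
        "psu": ["psu0", "psu1"],
    }
    for host, assigned in allocate_parts(parts, ["hostA", "hostB"]).items():
        print(host, assigned)
```

When a part count is not divisible by the number of hosts, round-robin leaves the per-host counts differing by at most one for each part type.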
In one embodiment, after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method includes:
the host checks whether the product information and the firmware version of the associated GPUBOX are correct;
if not, the host sends alarm information to the background;
and if so, the host uploads the test result to the background and shuts down.
Specifically, after the GPUBOX pressure test is finished, the host checks whether the FRU information and BMC FW version of the GPUBOX are consistent with the FRU information and BMC FW version acquired from the background before the test. If they are inconsistent, the host sends alarm information to the background to alert the operation and maintenance personnel. If they are consistent, the host collects the BMC MAC address, FRU information, SEL (System Event Log) information, SDR (Sensor Data Record) information and the like of the GPUBOX to build a whole-machine delivery file, uploads the file to the background, and shuts down.
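The final consistency check and the two outcomes it leads to can be sketched as below. The dictionary keys (`fru`, `bmc_fw_version`, `bmc_mac`, `sel`, `sdr`) and the returned action names are hypothetical; the patent does not specify the layout of the whole-machine delivery file.

```python
def finish_gpubox_test(expected, actual, collected):
    """Compare post-test FRU/firmware data with the values obtained before the test.

    expected:  values acquired from the background before the test
    actual:    values read back from the GPUBOX after the pressure test
    collected: BMC MAC, SEL and SDR data gathered from the GPUBOX
    """
    checked_keys = ("fru", "bmc_fw_version")
    mismatches = [k for k in checked_keys if expected.get(k) != actual.get(k)]
    if mismatches:
        # Inconsistent: raise an alarm to the background instead of uploading.
        return {"action": "alarm", "mismatches": mismatches}
    delivery_file = {
        "bmc_mac": collected.get("bmc_mac"),
        "fru": actual.get("fru"),
        "bmc_fw_version": actual.get("bmc_fw_version"),
        "sel": collected.get("sel", []),
        "sdr": collected.get("sdr", []),
    }
    # Consistent: upload the delivery file, then shut down.
    return {"action": "upload_and_shutdown", "delivery_file": delivery_file}
```

Returning a plain description of the action, rather than performing the upload or shutdown directly, keeps the decision logic testable without a background service.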
The scheme of this application has following beneficial effect:
1) (ii) a When a factory respectively produces and assembles the host and the GPUBOX, the order of the host and the order of the GPUBOX are associated in an order system, after the assembly is finished, a Minisas cable is used for connecting the host and the associated GPUBOX and carrying out power-up test, so that the GPUBOX which cannot be independently tested automatically because the GPUBOX does not have a CPU and a memory can be tested before leaving a factory, and the production quality of the GPUBOX is ensured;
2) The method comprises the steps of performing GPUBOX part test on GPUBOX to realize independent test and inspection of all parts in the GPUBOX, and recording detailed parameters of all parts in the GPUBOX in detail through records of complete machine field-leaving files;
3) The GPUBOX whole machine test including the GPUBOX pressure test and the GPUBOX health check is carried out on the GPUBOX, so that the quality of the GPUBOX to be delivered from a factory is ensured, and the reject ratio of production is reduced.
It should be understood that, although the steps in the flowchart of FIG. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in FIG. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, there is provided a GPUBOX testing system, the system comprising:
a background, used for constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is used for powering on and starting up and entering an operating system;
the host is used for reading the stored order long text and judging whether associated GPUBOX order information exists or not;
and if the GPUBOX order information exists, the host computer is used for carrying out automatic testing on the related GPUBOX according to the related GPUBOX order information and generating a testing result, wherein the automatic testing comprises GPUBOX whole machine testing and GPUBOX part testing.
In one embodiment, before the host is powered on and booted, the method further includes:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, before the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method further includes:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
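One way to picture the configuration step is a lookup keyed by product information and firmware version. The sketch below is an assumption about how such a lookup might be organized; the product names, version strings, and profile fields are invented for illustration and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuboxIdentity:
    """Product information and firmware version read from the associated GPUBOX."""
    product_name: str
    firmware_version: str


# Hypothetical table: (product, firmware) -> configuration applied before testing.
TEST_PROFILES = {
    ("GPUBOX-A", "1.0.2"): {"gpu_count": 8, "stress_minutes": 30},
    ("GPUBOX-A", "1.1.0"): {"gpu_count": 8, "stress_minutes": 45},
}


def select_profile(identity):
    """Return the configuration matching the GPUBOX identity, or raise LookupError."""
    key = (identity.product_name, identity.firmware_version)
    try:
        return TEST_PROFILES[key]
    except KeyError:
        raise LookupError(f"no test profile for {key}") from None
```

Raising on an unknown combination, rather than falling back to a default, keeps an unrecognized firmware version from being tested with the wrong settings.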
In one embodiment, the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
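The two test groups named above can be written down as a small table, with a per-group verdict derived from the individual results. The identifiers below and the rule that a group passes only when every member test passes are assumptions made for this sketch; the source lists the tests but does not state how their results are combined.

```python
# Test groups as listed in the embodiment; the string identifiers are illustrative.
TEST_GROUPS = {
    "whole_machine": ("pressure_test", "health_check"),
    "parts": ("gpu_test", "power_supply_test", "fan_test", "substrate_test"),
}


def summarize(results):
    """Map each group to True only if every test in it is present and passed.

    `results` maps a test name to a bool; a missing test counts as a failure.
    """
    return {
        group: all(results.get(name, False) for name in names)
        for group, names in TEST_GROUPS.items()
    }
```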
In one embodiment, the host automatically testing the associated GPUBOX according to the associated GPUBOX order information includes:
when the number of hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts divide the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
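The division of work between hosts can be sketched as below. Every host runs the whole machine test, the first host additionally runs the substrate test, and the remaining parts are shared out. The round-robin allocation and the part identifiers are assumptions; the source says only that the hosts divide the parts, not how.

```python
def assign_tests(hosts, parts):
    """Build a per-host test plan.

    hosts[0] is treated as the primary host; `parts` lists fan, power supply,
    and GPU identifiers. Round-robin allocation is an assumption of this sketch.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    # Every host takes part in the whole machine test.
    plan = {host: ["whole_machine_test"] for host in hosts}
    # Only the primary host runs the substrate test.
    plan[hosts[0]].append("substrate_test")
    # Share the remaining parts across all hosts in turn.
    for index, part in enumerate(parts):
        plan[hosts[index % len(hosts)]].append(f"part_test:{part}")
    return plan
```

With a single host the same function places every test on that host, which matches the one-host case described above.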
In one embodiment, after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method comprises:
the host detects whether the product information and the firmware version of the associated GPUBOX generate errors;
if not, the host sends alarm information to a background;
and if so, the host uploads the test result to a background and shuts down.
For specific limitations of the GPUBOX test system, reference may be made to the above limitations of the GPUBOX test method, which are not described in detail herein.
In one embodiment, an electronic device is provided, which may be a terminal, and its internal structure may be as shown in fig. 4. The electronic device comprises a processor, a memory, a network interface, a display screen and an input device, which are connected through a system bus. The processor of the electronic device provides computing and control capabilities. The memory of the electronic device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the electronic device is used to connect to and communicate with an external terminal through a network. The computer program is executed by the processor to implement a GPUBOX testing method. The display screen of the electronic device may be a liquid crystal display or an electronic ink display, and the input device of the electronic device may be a touch layer covering the display screen, a key, trackball or touch pad arranged on the housing of the electronic device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 4 is a block diagram of only a portion of the structure associated with the present application, and does not constitute a limitation on the electronic device to which the present application applies, and that a particular electronic device may include more or fewer components than shown in the drawings, or may combine certain components, or have a different arrangement of components.
In one embodiment, an electronic device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is powered on and started up and enters an operating system;
the host reads the stored order long text and judges whether associated GPUBOX order information exists or not;
and if the GPUBOX order information exists, the host machine carries out automatic test on the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automatic test comprises a GPUBOX complete machine test and a GPUBOX part test.
In one embodiment, the processor, when executing the computer program, performs the steps of:
before the host computer is powered on and started up, the method further comprises the following steps:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, the processor, when executing the computer program, performs the steps of:
before the host performs the automated test on the associated GPUBOX according to the associated GPUBOX order information, the method further comprises the following steps:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
In one embodiment, the processor, when executing the computer program, performs the steps of:
the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
In one embodiment, the processor, when executing the computer program, performs the steps of:
the automated testing of the associated GPUBOX by the host according to the associated GPUBOX order information comprises:
when the number of hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the processor, when executing the computer program, performs the steps of:
the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts divide the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
In one embodiment, the processor, when executing the computer program, performs the steps of:
after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method comprises the following steps:
the host detects whether the product information and the firmware version of the associated GPUBOX have errors;
if not, the host sends alarm information to the background;
and if so, the host uploads the test result to a background and shuts down.
In one embodiment, a computer-readable storage medium is provided, having stored thereon a computer program which, when executed by a processor, performs the steps of:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is powered on and started up and enters an operating system;
the host reads the stored order long text and judges whether associated GPUBOX order information exists or not;
and if the GPUBOX order information exists, the host machine carries out automatic test on the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automatic test comprises a GPUBOX complete machine test and a GPUBOX part test.
In one embodiment, the computer program when executed by a processor implements the steps of:
before the host computer is powered on and started up, the method further comprises the following steps:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
In one embodiment, the computer program when executed by a processor performs the steps of:
before the host performs the automated test on the associated GPUBOX according to the associated GPUBOX order information, the method further comprises the following steps:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
In one embodiment, the computer program when executed by a processor implements the steps of:
the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
In one embodiment, the computer program when executed by a processor implements the steps of:
the automated testing of the associated GPUBOX by the host according to the associated GPUBOX order information comprises:
when the number of hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
In one embodiment, the computer program when executed by a processor implements the steps of:
the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts divide the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
In one embodiment, the computer program when executed by a processor performs the steps of:
after the host automatically tests the associated GPUBOX according to the associated GPUBOX order information, the method comprises the following steps:
the host detects whether the product information and the firmware version of the associated GPUBOX generate errors;
if not, the host sends alarm information to a background;
and if so, the host uploads the test result to a background and shuts down.
It will be understood by those skilled in the art that all or part of the processes of the methods in the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), SyncLink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A GPUBOX testing method, characterized in that it comprises:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is powered on and started up and enters an operating system;
the host reads the stored order long text and judges whether associated GPUBOX order information exists or not;
and if the GPUBOX order information exists, the host machine carries out automatic test on the associated GPUBOX according to the associated GPUBOX order information and generates a test result, wherein the automatic test comprises a GPUBOX complete machine test and a GPUBOX part test.
2. The method of claim 1, further comprising, prior to powering up and booting the host:
an assembler assembles a host and a GPUBOX and connects the host with the associated GPUBOX via a cable according to the host order and the associated GPUBOX order.
3. The method of claim 1, wherein prior to the host performing automated testing of the associated GPUBOX in accordance with the associated GPUBOX order information, further comprising:
the host obtains product information and firmware versions of the associated GPUBOX;
the host configures the associated GPUBOX in accordance with the product information and the firmware version.
4. The method of claim 1, wherein the GPUBOX whole machine test comprises a GPUBOX pressure test and a GPUBOX health check, and the GPUBOX part test comprises a GPU test, a power supply test, a fan test and a substrate test.
5. The method of claim 4, wherein the host performing automated testing of the associated GPUBOX according to the associated GPUBOX order information comprises:
when the number of hosts is 1, the host performs the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX;
and when the number of hosts is not 1, the host and the other hosts associated in the order long text jointly perform the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX.
6. The method of claim 5, wherein the host and the other hosts associated in the order long text jointly performing the GPUBOX whole machine test and the GPUBOX part test on the associated GPUBOX comprises:
the host performs the GPUBOX whole machine test and the substrate test on the associated GPUBOX;
the associated other hosts perform the GPUBOX whole machine test on the associated GPUBOX;
the host and the associated other hosts divide the parts in the associated GPUBOX among themselves and perform the GPUBOX part tests, wherein the parts include one or more of a fan, a power supply, and a GPU.
7. The method of claim 6, further comprising, after the host performs automated testing of the associated GPUBOX according to the associated GPUBOX order information:
the host detects whether the product information and the firmware version of the associated GPUBOX have errors;
if not, the host sends alarm information to a background;
and if so, the host uploads the test result to a background and shuts down.
8. A GPUBOX testing system, the system comprising:
constructing an order long text according to a host order and an associated GPUBOX order and writing the order long text into the host;
the host is used for powering on and starting up and entering an operating system;
the host is used for reading the stored order long text and judging whether associated GPUBOX order information exists or not;
and if the GPUBOX order information exists, the host is used for carrying out automatic testing on the associated GPUBOX according to the associated GPUBOX order information and generating a testing result, wherein the automatic testing comprises GPUBOX whole machine testing and GPUBOX part testing.
9. An electronic device, comprising:
one or more processors; and memory associated with the one or more processors for storing program instructions which, when read and executed by the one or more processors, perform the method of any of claims 1-7.
10. A computer storage medium, characterized in that a computer program is stored thereon, wherein the program, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202210601602.7A 2022-05-30 2022-05-30 GPUBOX test method, system, electronic device and storage medium Pending CN115220971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210601602.7A CN115220971A (en) 2022-05-30 2022-05-30 GPUBOX test method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210601602.7A CN115220971A (en) 2022-05-30 2022-05-30 GPUBOX test method, system, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN115220971A true CN115220971A (en) 2022-10-21

Family

ID=83607046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210601602.7A Pending CN115220971A (en) 2022-05-30 2022-05-30 GPUBOX test method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN115220971A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination