CN110175096B - GPU (graphics processing Unit) pressurization test method, system, terminal and storage medium - Google Patents

Info

Publication number
CN110175096B
Authority
CN
China
Prior art keywords
nbody
command
gpu
duration
refreshing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910425730.9A
Other languages
Chinese (zh)
Other versions
CN110175096A (en)
Inventor
张瑞丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Wave Intelligent Technology Co Ltd
Original Assignee
Suzhou Wave Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Wave Intelligent Technology Co Ltd
Priority to CN201910425730.9A
Publication of CN110175096A
Application granted
Publication of CN110175096B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2236 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Abstract

The invention provides a GPU pressurization test method, a system, a terminal and a storage medium, comprising the following steps: setting a refresh period of an nbody command; collecting the number of GPU cards and generating a corresponding number of nbody commands according to the number of the GPU cards; acquiring the duration of the nbody command; and refreshing the nbody command according to the duration and the refreshing period. The invention can automatically generate the nbody commands which are in one-to-one correspondence with the GPU cards to simultaneously pressurize the GPU cards, and can automatically update the nbody commands, thereby avoiding the interruption of pressurization and simultaneously saving human resources and test time.

Description

GPU (graphics processing Unit) pressurization test method, system, terminal and storage medium
Technical Field
The invention belongs to the technical field of server testing, and particularly relates to a GPU (graphics processing unit) pressurization testing method, a system, a terminal and a storage medium.
Background
With the development of artificial intelligence, GPU servers are becoming increasingly popular. To test the stability and reliability of GPU cards, the cards must undergo a long (generally >24 h), heavy-load pressurization test. The nbody tool bundled with CUDA can apply a heavy load to a GPU card, but a typical nbody pressurization run lasts only about 30-40 min. In addition, one nbody command can test only one GPU card. As a result, pressurization has to be started and restarted manually while the GPU is being tested, so GPU testing has a low degree of automation and consumes time and human resources.
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, the present invention provides a GPU pressure test method, system, terminal and storage medium to solve the above-mentioned technical problems.
In a first aspect, the present invention provides a GPU pressurization test method, including:
setting an nbody command refresh period, the refresh period being set to 30 min;
collecting the number of GPU cards and generating a corresponding number of nbody commands according to the number of GPU cards, comprising: collecting the identification codes of all GPU cards; and generating nbody commands in one-to-one correspondence with the GPU cards according to the identification codes;
acquiring the duration of the nbody command;
refreshing the nbody command according to the duration and the refresh period, comprising: judging whether the duration of the nbody command has reached the refresh period; if so, regenerating the nbody command for the identification code of the GPU card to which that nbody command belongs; otherwise, continuing to cyclically acquire and monitor the duration of the nbody command.
The method further comprises the following steps: starting a GPU state monitoring program, and monitoring error reporting information; and outputting the monitoring result in the form of a test log.
In a second aspect, the present invention provides a GPU pressurization test system, comprising:
the period setting unit is configured for setting a refresh period of the nbody command;
the command generation unit is configured to collect the number of GPU cards and generate a corresponding number of nbody commands according to the number of the GPU cards, and comprises: the information acquisition module is configured for acquiring all GPU card identification codes; the command generation module is configured to generate the nbody commands corresponding to the GPU cards one by one according to the identification codes;
the time acquisition unit is configured to acquire the duration of the nbody command;
the command refreshing unit is configured to refresh the nbody command according to the duration and the refreshing period, and comprises: the time judgment module is configured to judge whether the duration time of the nbody command reaches the refresh period; the regeneration module is configured to regenerate the nbody command corresponding to the identification code according to the identification code of the GPU card to which the nbody command belongs; and the cyclic acquisition module is configured for cyclically monitoring the acquisition of the duration of the nbody command.
In a third aspect, a terminal is provided, including:
a processor and a memory, wherein
the memory is used for storing a computer program, and
the processor is used for calling and running the computer program from the memory, so that the terminal executes the method described above.
In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
The beneficial effects of the invention are as follows.
According to the GPU pressurization test method, system, terminal and storage medium provided by the invention, an nbody command refresh period is set, corresponding nbody commands are generated according to the number of GPU cards, the duration of each nbody command is then acquired in real time, and any nbody command whose duration has reached the refresh period is updated in time, which avoids the problem of a pressurization interruption affecting the test result. The invention can automatically generate nbody commands in one-to-one correspondence with multiple GPU cards so as to pressurize the GPU cards simultaneously, and can automatically update the nbody commands, thereby avoiding interruption of pressurization while saving human resources and test time.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following explains key terms appearing in the present invention.
The GPU, a Graphics Processing Unit (abbreviated as GPU), also called a display core, a visual processor, and a display chip, is a microprocessor that is specially used for image operation on a personal computer, a workstation, a game machine, and some mobile devices (such as a tablet computer, a smart phone, etc.).
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution body of fig. 1 may be a GPU pressurization test system.
As shown in fig. 1, the method 100 includes:
step 110, setting a refresh period of an nbody command;
step 120, collecting the number of GPU cards and generating a corresponding number of nbody commands according to the number of the GPU cards;
step 130, obtaining the duration of the nbody command;
and step 140, refreshing the nbody command according to the duration and the refreshing period.
Optionally, as an embodiment of the present invention, the setting an nbody command refresh period includes:
the nbody command refresh period is set to 30 min.
Optionally, as an embodiment of the present invention, the collecting the number of GPU cards and generating a corresponding number of nbody commands according to the number of GPU cards includes:
collecting all GPU card identification codes;
and generating the nbody commands corresponding to the GPU cards one by one according to the identification codes.
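The collection and one-to-one generation steps above can be illustrated with a minimal Python sketch. This is not the patent's script. It assumes that GPU identification codes are the device indices printed by `nvidia-smi --query-gpu=index --format=csv,noheader`, and that the nbody binary accepts the `-benchmark`, `-numbodies=<N>` and `-device=<d>` options of the CUDA sample; check these against the installed version. The binary path and body count are hypothetical placeholders.

```python
import subprocess


def parse_gpu_indices(smi_output):
    """Parse one GPU index per line from nvidia-smi CSV output (no header)."""
    return [line.strip() for line in smi_output.splitlines() if line.strip()]


def list_gpu_indices():
    """Ask nvidia-smi for the index of every GPU card in the server."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_indices(result.stdout)


def build_nbody_commands(gpu_ids, nbody_path="./nbody", num_bodies=1024000):
    """Return one nbody argv list per GPU id (one command per card).

    nbody_path and num_bodies are illustrative values, not taken from the patent.
    """
    return {
        gpu_id: [nbody_path, "-benchmark",
                 f"-numbodies={num_bodies}", f"-device={gpu_id}"]
        for gpu_id in gpu_ids
    }
```

On a real server, `build_nbody_commands(list_gpu_indices())` would yield as many commands as there are GPU cards; keeping the parser separate from the `nvidia-smi` call lets the mapping be checked without GPU hardware.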
Optionally, as an embodiment of the present invention, the refreshing the nbody command according to the duration and the refresh period includes:
judging whether the duration of the nbody command reaches the refresh period:
if yes, regenerating an nbody command corresponding to the identification code according to the identification code of the GPU card to which the nbody command belongs;
otherwise, continuing to cyclically acquire and monitor the duration of the nbody command.
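The judgment above reduces to comparing each command's elapsed time with the refresh period. A minimal sketch of that comparison follows, assuming start times are kept in a dictionary keyed by GPU identification code (the function and variable names are hypothetical):

```python
REFRESH_PERIOD_S = 30 * 60  # 30 min refresh period, as in the embodiment


def commands_due_for_refresh(start_times, now, period_s=REFRESH_PERIOD_S):
    """Return the GPU ids whose nbody command has run for at least period_s.

    start_times maps GPU id -> time (in seconds) at which its command was generated.
    """
    return sorted(
        gpu_id for gpu_id, started in start_times.items()
        if now - started >= period_s
    )
```

If every command was started at the same moment, the returned list contains all GPU ids at once; if the start times differ, only the cards whose period has elapsed are returned, and the rest continue to be monitored.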
Optionally, as an embodiment of the present invention, after the refreshing the nbody command according to the duration and the refresh period, the method further includes:
starting a GPU state monitoring program, and monitoring error reporting information;
and outputting the monitoring result in the form of a test log.
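As an illustration of the monitoring step, the sketch below scans log lines for error messages and appends any hits to a result file. The patent does not say which strings count as errors, so the keyword pattern here is an assumption, and the function names are hypothetical.

```python
import re

# Assumed error keywords; the patent does not specify the matching rule.
ERROR_PATTERN = re.compile(r"\b(error|fail(ed|ure)?|xid)\b", re.IGNORECASE)


def find_error_lines(log_lines):
    """Return (line_number, text) for every log line that looks like an error."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


def write_test_log(hits, result_path):
    """Append the detected error lines to the result file."""
    with open(result_path, "a", encoding="utf-8") as handle:
        for number, line in hits:
            handle.write(f"line {number}: {line}\n")
```

Calling `find_error_lines` on the lines of the pressurization log and passing the result to `write_test_log` gives testers a single file to inspect after the run.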
In order to facilitate understanding of the present invention, the GPU pressurization test method provided by the present invention is further described below with reference to the principle of the GPU pressurization test method of the present invention and the process of pressurizing the GPU in the embodiment.
Specifically, the GPU pressurization test method includes:
S1. Since a typical nbody command applies load for only about 30-40 min, this embodiment sets the refresh period of the nbody command to 30 min in order to avoid a possible pressurization interruption.
S2. A script reads the number of GPU cards in the test server and the identification code of each GPU card, and nbody commands are established in one-to-one correspondence with all the GPU cards according to their identification codes; the automatic test script controls nbody to generate the nbody commands.
S3. The duration of each nbody command is acquired cyclically: the generation time of each nbody command is recorded when the command is generated, and the duration of the command is calculated from the current time and the generation time. The nbody command duration is updated every 2 s.
S4. It is judged whether the duration of each nbody command acquired in step S3 has reached the refresh period (30 min). If the durations of all the nbody commands are the same and have reached the refresh period, all the nbody commands are updated, that is, re-established. If the durations of the nbody commands are not synchronized, then once the duration of a given nbody command reaches the refresh period, the nbody command is re-established for the identification code of the GPU card to which it belongs, which realizes directed refresh of that nbody command.
S5. While each GPU card of the test server is being pressurized, a monitoring program is started to monitor whether the log file generated during pressurization contains an error message; if it does, the error message is immediately output to a result file, so that testers can conveniently perform subsequent error analysis.
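Steps S3 and S4 can be tied together in a small supervisor that records the generation time of each command, polls every 2 s, and restarts only the commands whose period has elapsed. The following is a sketch under stated assumptions, not the script used in this embodiment: the class name, the `launch` and `stop` callbacks and the injectable clock are all hypothetical.

```python
import time


class NbodyRefresher:
    """Keep one nbody job per GPU id and refresh it when its period elapses."""

    def __init__(self, gpu_ids, launch, stop, period_s=30 * 60,
                 clock=time.monotonic):
        self.launch = launch      # callable: gpu_id -> job handle
        self.stop = stop          # callable: job handle -> None
        self.period_s = period_s
        self.clock = clock
        self.jobs = {}            # gpu_id -> (handle, generation time)
        for gpu_id in gpu_ids:
            self.restart(gpu_id)

    def restart(self, gpu_id):
        """Stop the old job for this GPU (if any) and record a new start time."""
        old = self.jobs.get(gpu_id)
        if old is not None:
            self.stop(old[0])
        self.jobs[gpu_id] = (self.launch(gpu_id), self.clock())

    def duration(self, gpu_id):
        """Duration = current time minus the recorded generation time."""
        return self.clock() - self.jobs[gpu_id][1]

    def poll_once(self):
        """Refresh every job whose duration has reached the period."""
        refreshed = []
        for gpu_id in list(self.jobs):
            if self.duration(gpu_id) >= self.period_s:
                self.restart(gpu_id)
                refreshed.append(gpu_id)
        return refreshed

    def run(self, total_s, poll_s=2.0, sleep=time.sleep):
        """Poll every poll_s seconds until total_s seconds have passed."""
        deadline = self.clock() + total_s
        while self.clock() < deadline:
            self.poll_once()
            sleep(poll_s)
```

In a real test, `launch` could wrap `subprocess.Popen` with the nbody command for that GPU and `stop` could call the handle's `terminate()` method; passing them in as callbacks keeps the timing logic testable without GPU hardware. Using a monotonic clock avoids false refreshes if the system time is adjusted during a 24 h run.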
The specific contents of the automatic test script used in this embodiment are as follows (taking 8 GPU cards in the test server as an example):
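[Script listing: published as images (Figure BDA0002067425960000061 and Figure BDA0002067425960000071) in the original document; the script text is not reproduced here.]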
As shown in fig. 2, the system 200 includes:
a period setting unit 210, configured to set an nbody command refresh period;
a command generating unit 220, configured to collect the number of GPU cards and generate a corresponding number of nbody commands according to the number of GPU cards;
a time obtaining unit 230, configured to obtain the duration of the nbody command;
a command refresh unit 240, configured to refresh the nbody command according to the duration and the refresh period.
Optionally, as an embodiment of the present invention, the command generating unit includes:
the information acquisition module is configured for acquiring all GPU card identification codes;
and the command generation module is configured to generate the nbody commands corresponding to the GPU cards one by one according to the identification codes.
Optionally, as an embodiment of the present invention, the command refresh unit includes:
the time judgment module is configured to judge whether the duration time of the nbody command reaches the refresh period;
the regeneration module is configured to regenerate the nbody command corresponding to the identification code according to the identification code of the GPU card to which the nbody command belongs;
and the cyclic acquisition module is configured to cyclically acquire and monitor the duration of the nbody command.
Fig. 3 is a schematic structural diagram of a terminal system 300 according to an embodiment of the present invention, where the terminal system 300 may be used to execute the GPU stress test method according to the embodiment of the present invention.
The terminal system 300 may include: a processor 310, a memory 320, and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the server architecture shown in the figure does not limit the invention: it may be a bus architecture or a star architecture, and may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 320 may be used for storing instructions executed by the processor 310, and may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk. The executable instructions in the memory 320, when executed by the processor 310, enable the terminal 300 to perform some or all of the steps in the method embodiments described above.
The processor 310 is the control center of the terminal; it connects the various parts of the entire electronic terminal using various interfaces and lines, and performs the functions of the electronic terminal and/or processes data by running or executing the software programs and/or modules stored in the memory 320 and calling the data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or a plurality of packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may have a single operation core or multiple operation cores.
The communication unit 330 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.
The present invention also provides a computer storage medium, wherein the computer storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM).
Therefore, the method and the device avoid the problem that the test result is influenced by pressurization interruption by setting the refreshing period of the nbody commands, generating the corresponding nbody commands according to the number of the GPU cards, then acquiring the duration of the nbody commands in real time, and updating the nbody commands with the duration reaching the refreshing period in time. The invention can automatically generate the nbody command which is in one-to-one correspondence with the multiple GPU cards to simultaneously pressurize the multiple GPU cards, and can automatically update the nbody command, thereby avoiding pressurization interruption, simultaneously saving human resources and test time.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product. The computer software product is stored in a storage medium capable of storing program code, such as a USB disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, and the like) to perform all or part of the steps of the method in the embodiments of the present invention.
The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, for the terminal embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.
In the embodiments provided by the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, systems or units, and may be electrical, mechanical or in another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and these modifications or substitutions fall within the scope of the present invention; any change or substitution that a person skilled in the art can easily conceive of within the technical scope of the present invention likewise falls within its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A GPU pressurization test method is characterized by comprising the following steps:
setting a refresh period of an nbody command;
collecting the number of GPU cards and generating a corresponding number of nbody commands according to the number of the GPU cards;
acquiring the duration of the nbody command;
refreshing the nbody command according to the duration and the refresh period; the refreshing the nbody command according to the duration and the refresh period comprises: judging whether the duration of the nbody command reaches the refresh period: if so, regenerating an nbody command corresponding to the identification code according to the identification code of the GPU card to which the nbody command belongs; if not, continuing to cyclically acquire the duration of the nbody command.
2. The GPU pressurization test method of claim 1, wherein said setting an nbody command refresh period comprises:
the nbody command refresh period is set to 30 min.
3. A GPU pressure test method according to claim 1, wherein the collecting the number of GPU cards and generating a corresponding number of nbody commands according to the number of GPU cards comprises:
collecting all GPU card identification codes;
and generating the nbody commands corresponding to the GPU cards one by one according to the identification codes.
4. A GPU stress testing method according to claim 1, wherein after refreshing the nbody commands according to the duration and refresh period, the method further comprises:
starting a GPU state monitoring program, and monitoring error reporting information;
and outputting the monitoring result in the form of a test log.
5. The GPU pressurization test system of claim 1, comprising:
the period setting unit is configured for setting a refresh period of the nbody command;
the command generation unit is configured to collect the number of the GPU cards and generate the number of the nbody commands according to the number of the GPU cards;
the time acquisition unit is configured to acquire the duration of the nbody command;
the command refreshing unit is configured to refresh the nbody command according to the duration and the refreshing period; the command refresh unit includes: the time judgment module is configured to judge whether the duration time of the nbody command reaches the refresh period; the regeneration module is configured to regenerate the nbody command corresponding to the identification code according to the identification code of the GPU card to which the nbody command belongs if the duration of the nbody command reaches the refresh period; and the cycle acquisition module is configured for acquiring the duration time of the nbody command in a cycle manner if the duration time of the nbody command does not reach the refreshing period.
6. The GPU pressurization test system of claim 5, wherein the command generation unit comprises:
the information acquisition module is configured for acquiring all GPU card identification codes;
and the command generation module is configured to generate the nbody commands corresponding to the GPU cards one by one according to the identification codes.
7. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-4.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910425730.9A CN110175096B (en) 2019-05-21 2019-05-21 GPU (graphics processing Unit) pressurization test method, system, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910425730.9A CN110175096B (en) 2019-05-21 2019-05-21 GPU (graphics processing Unit) pressurization test method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN110175096A (en) 2019-08-27
CN110175096B (en) 2020-02-07

Family

ID=67691787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910425730.9A Active CN110175096B (en) 2019-05-21 2019-05-21 GPU (graphics processing Unit) pressurization test method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN110175096B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111338862B (en) * 2020-02-16 2022-07-19 苏州浪潮智能科技有限公司 GPU mode switching stability test method, system, terminal and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8539438B2 (en) * 2009-09-11 2013-09-17 International Business Machines Corporation System and method for efficient creation and reconciliation of macro and micro level test plans
CN102063354A (en) * 2009-11-18 2011-05-18 英业达股份有限公司 Pressure test method of server
CN102279782B (en) * 2011-04-01 2014-02-19 奇智软件(北京)有限公司 Pressure and device for testing hardware pressure
CN104679615A (en) * 2013-11-26 2015-06-03 英业达科技有限公司 Bus pressure test system and method thereof
CN103984612B (en) * 2014-05-28 2017-11-10 浪潮电子信息产业股份有限公司 A kind of method of the unattended pressure test based on HPL instruments
CN104375914A (en) * 2014-11-24 2015-02-25 浪潮电子信息产业股份有限公司 Automatic testing method for internal pressure changes of server
CN104615523A (en) * 2015-03-05 2015-05-13 浪潮电子信息产业股份有限公司 Fatigue testing method of BMC management module based on IPMI protocol
CN107423183A (en) * 2017-04-25 2017-12-01 郑州云海信息技术有限公司 A kind of GTX series video card calculates the applied voltage test method of performance
CN109086184A (en) * 2018-07-18 2018-12-25 郑州云海信息技术有限公司 The monitoring method of GPU pressure test under a kind of server Linux system
CN109522173A (en) * 2018-11-02 2019-03-26 郑州云海信息技术有限公司 A kind of OPA network card testing method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN110175096A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN111338862B (en) GPU mode switching stability test method, system, terminal and storage medium
CN110175096B (en) GPU (graphics processing Unit) pressurization test method, system, terminal and storage medium
CN111966551A (en) Method, system, terminal and storage medium for verifying remote command execution result
CN111475106A (en) RAID customization creating method, system, terminal and storage medium
CN111147331A (en) Server network card interaction test method, system, terminal and storage medium
CN110569154A (en) Chip interface function testing method, system, terminal and storage medium
CN110554917A (en) method, system, terminal and storage medium for efficiently traversing large data volume set
CN111176917B (en) Method, system, terminal and storage medium for testing stability of CPU SST-BF function
CN109992420B (en) Parallel PCIE-SSD performance optimization method and system
CN111176924A (en) GPU card dropping simulation method, system, terminal and storage medium
CN112214384A (en) Hard disk serial number management method, system, terminal and storage medium
CN109117406B (en) PCIE hot plug test method, device, terminal and storage medium
CN111949518A (en) Method, system, terminal and storage medium for generating fault detection script
CN110543394A (en) server sensor information consistency testing method, system, terminal and storage medium
CN112463504B (en) Double-control storage product testing method, system, terminal and storage medium
CN112463195B (en) Method, system, terminal and storage medium for cluster grouping online upgrade
CN115129249A (en) SAS link topology identification management method, system, terminal and storage medium
CN110703988B (en) Storage pool creating method, system, terminal and storage medium for distributed storage
CN109800114B (en) BMC visual test method, device, terminal and storage medium
CN113076111A (en) Customized cluster configuration method, system, terminal and storage medium
CN110543459A (en) Method, system, terminal and storage medium for acquiring file lock state under NFS
CN111858198A (en) Multi-scheme memory plugging test method, system, terminal and storage medium
CN112463473B (en) Method, system, terminal and storage medium for testing storage data stream unit
CN111475349B (en) Method, system, terminal and storage medium for testing stability of cluster DPDK
CN109920466B (en) Hard disk test data analysis method, device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant