CN111338862A - GPU mode switching stability test method, system, terminal and storage medium - Google Patents


Info

Publication number
CN111338862A
Authority
CN
China
Prior art keywords
gpu
bandwidth
setting
rate
pcie
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010094748.8A
Other languages
Chinese (zh)
Other versions
CN111338862B (en)
Inventor
刘瑞雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2205 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested
    • G06F11/2236 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing using arrangements specific to the hardware being tested to test CPU or processors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/22 Detection or location of defective computer hardware by testing during standby operation or during idle time, e.g. start-up testing
    • G06F11/2273 Test methods

Abstract

The invention provides a method, a system, a terminal and a storage medium for testing GPU mode switching stability. The method comprises: setting a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate; reducing the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero; limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register; applying a stress load to the GPU, comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and outputting a test failure if the two are inconsistent; and cancelling the rate setting of the CPU register, periodically applying a stress load to the GPU, comparing the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and outputting a test failure if the two are inconsistent. The invention switches the PCIE rate in both active and passive modes to test GPU compatibility and stability, improves the test coverage of compatibility between the GPU and the machine, and provides a means of verifying low-probability events, thereby safeguarding product quality.

Description

GPU mode switching stability test method, system, terminal and storage medium
Technical Field
The invention relates to the technical field of server testing, in particular to a method, a system, a terminal and a storage medium for testing GPU mode switching stability.
Background
A GPU (Graphics Processing Unit), also called a graphics processor or visual processor, is a microprocessor dedicated to image- and graphics-related operations. The GPU reduces the graphics card's dependence on the CPU during 3D graphics processing and takes over part of the work originally done by the CPU. With the arrival of Web 2.0, every user of the various video and image sharing apps (Weibo, WeChat, Douyin, Huoshan Video, and the like) has become an author, and all of this do-it-yourself image and video content requires a large amount of image processing. As a result, GPUs have been used and developed ever more widely in recent years. The stable operation of the GPUs used by app vendors directly affects the satisfaction of their users, so stability testing of GPUs has become exceptionally important. The current mainstream GPU vendors are NVIDIA and ATI.
Because GPUs consume a great deal of power, most GPUs have an energy-saving mode: when the GPU is idle, it automatically lowers its operating rate through a speed change and enters the energy-saving mode. Frequent rate switching may cause GPU faults, so the stability of the GPU must be tested in a frequent rate-switching scenario.
However, during the development of a typical GPU BOX, the retimer card used with it (which corrects the signal for long-distance transmission and similar purposes) shows a voltage jump of 20-30 mV on its VDD18 supply. As a result, a PCIe error is reported with some probability while the PCIe link changes speed, and the system crashes and restarts. Stability testing under frequent rate switching is therefore difficult to carry out.
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, the present invention provides a method, a system, a terminal and a storage medium for testing GPU mode switching stability, so as to solve the above-mentioned technical problems.
In a first aspect, the present invention provides a method for testing GPU mode switching stability, including:
setting a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate;
reducing the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero;
limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register;
applying a stress load to the GPU, comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and outputting a test failure if the two are inconsistent;
and cancelling the rate setting of the CPU register, periodically applying a stress load to the GPU, comparing the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and outputting a test failure if the two are inconsistent.
Further, reducing the sensitivity of the CPU side to PCIE error reporting by setting the PCIE error reporting register to zero includes:
reading the tree structure of the PCIE devices with the lspci command, finding in the tree structure the mapping information of the PCIE buses on which all the GPUs sit, and writing the mapping information to a mapping information file;
reading each id in the mapping information file in a for loop, and passing the id to the setpci command through the variable bus;
setting the Surprise_Down_Error_Severity register to zero with the setpci command.
Further, limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register includes:
setting the C0.B value of the CPU register Target_Link_Speed to 41 with the setpci command, so that the PCIE link rate becomes the reduced rate;
setting the A0.B value of the CPU register Retrain_Link to 60 with the setpci command, so that the PCIE link rate rises to the device's own maximum rate.
Further, applying a stress load to the GPU and comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate includes:
setting the GPU loading time to 1 h;
when the PCIE link rate is the reduced rate, capturing the GPU bandwidth under load in real time and comparing it with the reduced-rate bandwidth reference;
and when the PCIE link rate has risen to the maximum rate, capturing the GPU bandwidth under load in real time and comparing it with the maximum-rate bandwidth reference.
Further, periodically applying a stress load to the GPU includes:
setting the loading time, the interval time and the number of loading rounds;
and periodically loading the GPU according to the loading time, the interval time and the number of loading rounds.
Further, the method further comprises:
acquiring the GPU bandwidth during the periodic loading in real time, and marking the GPU-PCIE link rate corresponding to the bandwidth.
In a second aspect, the present invention provides a GPU mode switching stability testing system, including:
a reference setting unit configured to set a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate;
an error reporting setting unit configured to reduce the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero;
a rate limiting unit configured to limit the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register;
a passive comparison unit configured to apply a stress load to the GPU, compare the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and output a test failure if the two are inconsistent;
and an active comparison unit configured to cancel the rate setting of the CPU register, periodically apply a stress load to the GPU, compare the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and output a test failure if the two are inconsistent.
Further, the error reporting setting unit includes:
a file import module configured to read the tree structure of the PCIE devices with the lspci command, find in the tree structure the mapping information of the PCIE buses on which all the GPUs sit, and write the mapping information to a mapping information file;
a variable assignment module configured to read each id in the mapping information file in a for loop and pass the id to the setpci command through the variable bus;
a zero-setting execution module configured to set the Surprise_Down_Error_Severity register to zero with the setpci command.
In a third aspect, a terminal is provided, including:
a processor, a memory, wherein,
the memory is used for storing a computer program which,
the processor is used for calling and running the computer program from the memory, so that the terminal executes the method described above.
In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
The beneficial effects of the invention are as follows.
According to the method, system, terminal and storage medium for testing GPU mode switching stability provided by the invention, the PCIE error reporting register is set to zero, which reduces the sensitivity of the CPU side to PCIE error reporting. The GPU is passively slowed down or sped up by setting a rate limit in a CPU register, which creates a scenario in which the GPU switches frequently between the passive speed-up and speed-reduction modes; the GPU bandwidth under load is then compared with the preset bandwidth reference to verify the stability of the GPU. In addition, the CPU register setting is cancelled and the GPU is loaded periodically, so that the GPU switches frequently between its automatic energy-saving mode and its high-speed mode; the GPU bandwidth during the periodic loading is then compared with the corresponding reference bandwidth to test the stability of the GPU. The invention switches the PCIE rate in both active and passive modes to test GPU compatibility and stability, improves the test coverage of compatibility between the GPU and the machine, and provides a means of verifying low-probability events, thereby safeguarding product quality.
In addition, the invention has a reliable design principle and a simple structure, and has very broad application prospects.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
FIG. 2 is a schematic block diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject in fig. 1 may be a GPU mode switching stability test system.
As shown in fig. 1, the method 100 includes:
step 110, setting a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate;
step 120, reducing the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero;
step 130, limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register;
step 140, applying a stress load to the GPU, comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and outputting a test failure if the two are inconsistent;
step 150, cancelling the rate setting of the CPU register, periodically applying a stress load to the GPU, comparing the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and outputting a test failure if the two are inconsistent.
In order to facilitate understanding of the present invention, the GPU mode switching stability test method provided by the present invention is further described below with reference to the principle of the GPU mode switching stability test method of the present invention and in combination with the process of performing the mode switching stability test on the GPU in the embodiment.
Specifically, the GPU mode switching stability testing method includes:
S1, setting a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate.
First, the GPU driver and the stress-test tool nvvs are installed on a Linux system, and a reference table of GPU link rates and bandwidths is compiled and saved in the file base.txt.
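As a minimal sketch of how such a reference table might be represented and checked, the Python snippet below assumes a hypothetical plain-text layout for base.txt (one line per link rate, followed by an expected bandwidth in GB/s) and a hypothetical relative tolerance. The source specifies neither the file format nor how "consistent" is measured, so the function names, the layout, the tolerance and the sample numbers are all illustrative assumptions.

```python
from typing import Dict

# Hypothetical base.txt layout (an assumption, not given in the source):
#   <link rate> <expected bandwidth in GB/s>
# The numbers are made-up placeholders, not measurements.
SAMPLE_BASE_TXT = """\
# rate   bandwidth
2.5GT/s  3.0
8GT/s    11.5
"""


def parse_reference_table(text: str) -> Dict[str, float]:
    """Map each link rate (e.g. '8GT/s') to its reference bandwidth."""
    table: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rate, bandwidth = line.split()
        table[rate] = float(bandwidth)
    return table


def is_consistent(measured: float, reference: float, tolerance: float = 0.10) -> bool:
    """True if `measured` lies within +/- `tolerance` (relative) of `reference`.

    The 10 % default is an assumed threshold for illustration only.
    """
    return abs(measured - reference) <= tolerance * reference
```

A measured bandwidth would then be looked up by the link rate in force at the time and passed to `is_consistent`; a `False` result corresponds to the "test failure" outcome of the method.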
S2, reducing the sensitivity of the CPU side to PCIE error reporting by setting the PCIE error reporting register to zero.
The tree structure of the PCIE devices is read with the command lspci -tv | grep -i NVIDIA (taking an NVIDIA GPU as an example), the root complex of the PCIE bus on which each GPU sits is located, and the results are written to busid.txt.
To avoid a system crash and restart caused by a PCIE error, a for loop reads each id in busid.txt into the variable bus(i), and the command setpci -s bus(i) 154.B=10 is used to set the Surprise_Down_Error_Severity register to 0. This reduces the sensitivity of PCIE error reporting on the CPU side, so that a PCIE error does not crash the system during the test.
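The loop over busid.txt can be sketched in Python as follows. This is a dry-run illustration only: it assumes busid.txt holds one PCI address per line (the source does not state the file layout), the function names and sample addresses are hypothetical, and the commands are built as argument lists without being executed. The register offset and value are the ones quoted in the text; setpci interprets numbers as hexadecimal.

```python
from typing import List

# Register write quoted in the text (offset 0x154, byte access, value 0x10).
AER_SEVERITY_WRITE = "154.B=10"


def read_bus_ids(text: str) -> List[str]:
    """Return the non-empty lines of busid.txt (assumed: one PCI address per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_severity_commands(bus_ids: List[str]) -> List[List[str]]:
    """Build one setpci argument list per bus id; nothing is executed here."""
    return [["setpci", "-s", bus, AER_SEVERITY_WRITE] for bus in bus_ids]
```

Each returned list could be handed to `subprocess.run` on the machine under test (root privileges are normally needed for configuration-space writes); keeping command construction separate from execution makes the loop easy to review before anything is written to hardware.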
S3, limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register, applying a stress load to the GPU, comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and outputting a test failure if the two are inconsistent.
The passive speed-reduction test mode is entered. (This mode must be run first: by design, the GPU automatically drops into the idle reduced-rate state once a load has been applied and then removed, but if the system has never been loaded at all, the GPU does not enter the energy-saving mode on its own and simply stays as it is.) The passive speed-reduction test proceeds as follows:
The CPU register Target_Link_Speed is set with the command setpci -s bus(i) C0.B=41, which limits the PCIE link rate capability to 2.5 GT/s. The GPU is then stress-tested with nvvs for 1 hour. The LnkSta and bandwidth of the GPU device are captured in real time and compared with the reduced-rate reference in base.txt; if they are inconsistent, fail is printed on the screen and the test ends.
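One way the captured LnkSta value might be checked is sketched below in Python. The snippet parses the `LnkSta:` line that `lspci -vv` prints for a PCIe device and compares the negotiated speed with the expected rate. The function names are hypothetical and the regular expression is an assumption about the output layout (it tolerates annotations such as "(ok)" or "(downgraded)" after the speed); the source does not say how the capture is implemented.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "LnkSta: Speed 8GT/s (ok), Width x16 (ok)"
# and the older form "LnkSta: Speed 8GT/s, Width x16, TrErr- ...".
LNKSTA_RE = re.compile(r"LnkSta:\s*Speed\s+([\d.]+GT/s)[^,]*,\s*Width\s+x(\d+)")


def parse_lnksta(lspci_vv_output: str) -> Optional[Tuple[str, int]]:
    """Return (speed, width) from lspci -vv text, or None if no LnkSta line is found."""
    match = LNKSTA_RE.search(lspci_vv_output)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def link_rate_matches(lspci_vv_output: str, expected_rate: str) -> bool:
    """True only if a LnkSta line exists and reports exactly `expected_rate`."""
    status = parse_lnksta(lspci_vv_output)
    return status is not None and status[0] == expected_rate
```

In the passive speed-reduction phase the expected rate would be "2.5GT/s"; a `False` result corresponds to printing fail and ending the test.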
The passive speed-up test mode is then entered. The CPU register Retrain_Link is set with the command setpci -s bus(i) A0.B=60, so that the PCIE link retrains to the maximum rate of the device itself; for current GPUs, the link normally runs at 8 GT/s. The GPU is stress-tested with nvvs for 5 minutes. The LnkSta and bandwidth of the GPU device are captured in real time and compared with the maximum-rate reference in base.txt.
The passive tests above are repeated until the number of cycles reaches the preset 1000.
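The passive cycle described above can be laid out as a simple plan, sketched here in Python. The register writes, durations, rates and cycle count are the values quoted in the text; the `PassiveStep` type and `passive_plan` function are hypothetical names, and the plan only describes the steps without running setpci or nvvs.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class PassiveStep:
    setpci_args: Tuple[str, ...]  # command to issue before loading the GPU
    stress_seconds: int           # how long to run the stress tool afterwards
    expected_rate: str            # link rate the captured LnkSta should show


def passive_plan(bus: str, cycles: int = 1000,
                 slow_seconds: int = 3600,
                 fast_seconds: int = 300) -> Iterator[PassiveStep]:
    """Yield the speed-reduction and speed-up steps for each passive cycle."""
    for _ in range(cycles):
        # Speed reduction: Target_Link_Speed write, then 1 h of load at 2.5 GT/s.
        yield PassiveStep(("setpci", "-s", bus, "C0.B=41"), slow_seconds, "2.5GT/s")
        # Speed-up: Retrain_Link write, then 5 min of load at the maximum rate.
        yield PassiveStep(("setpci", "-s", bus, "A0.B=60"), fast_seconds, "8GT/s")
```

The durations and cycle count are left as parameters because, taken literally, 1000 cycles of 1 hour plus 5 minutes add up to roughly 1083 hours of load, so a shorter plan may be useful when trying the procedure out.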
S4, cancelling the rate setting of the CPU register, periodically applying a stress load to the GPU, comparing the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and outputting a test failure if the two are inconsistent.
The active speed-reduction test mode is entered, as follows:
The loading time is preset to 2 minutes, the interval time to 2 minutes, and the number of loading rounds to 1000.
The GPU is loaded with nvvs for 2 minutes; its LnkSta and bandwidth are captured in real time and compared with the maximum-rate bandwidth reference in base.txt.
The GPU is then left idle for 2 minutes (the loading interval), during which it automatically enters the energy-saving mode; its LnkSta and bandwidth are captured in real time and compared with the reduced-rate bandwidth reference in base.txt.
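The alternating load and idle phases of the active mode can be sketched in Python as a schedule plus a per-phase check. The names `active_schedule`, `check_phase` and `RATES` are hypothetical. The 2-minute phases and 1000 rounds come from the text; the assumption that the idle (energy-saving) rate is 2.5 GT/s is carried over from the passive test and is not stated for the active mode.

```python
from typing import Dict, Iterator, Tuple

# Expected link rate per phase. "reduced" = 2.5GT/s is an assumption borrowed
# from the passive test; "max" = 8GT/s is the normal rate quoted in the text.
RATES: Dict[str, str] = {"max": "8GT/s", "reduced": "2.5GT/s"}


def active_schedule(load_s: int = 120, idle_s: int = 120,
                    repeats: int = 1000) -> Iterator[Tuple[str, int, str]]:
    """Yield (phase, seconds, expected-rate key) for each load/idle phase."""
    for _ in range(repeats):
        yield ("load", load_s, "max")      # GPU under load: maximum rate expected
        yield ("idle", idle_s, "reduced")  # GPU idle: energy-saving rate expected


def check_phase(expected_key: str, measured_rate: str,
                rates: Dict[str, str] = RATES) -> str:
    """Return 'pass' if the measured link rate equals the expected one, else 'fail'."""
    return "pass" if rates[expected_key] == measured_rate else "fail"
```

Tagging each captured sample with the phase and link rate in force, as the schedule does, also gives the marking of bandwidth against link rate that the method calls for.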
As shown in fig. 2, the system 200 includes:
a reference setting unit 210 configured to set a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate;
an error reporting setting unit 220 configured to reduce the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero;
a rate limiting unit 230 configured to limit the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register;
a passive comparison unit 240 configured to apply a stress load to the GPU, compare the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and output a test failure if the two are inconsistent;
and an active comparison unit 250 configured to cancel the rate setting of the CPU register, periodically apply a stress load to the GPU, compare the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and output a test failure if the two are inconsistent.
Optionally, as an embodiment of the present invention, the error reporting setting unit includes:
a file import module configured to read the tree structure of the PCIE devices with the lspci command, find in the tree structure the mapping information of the PCIE buses on which all the GPUs sit, and write the mapping information to a mapping information file;
a variable assignment module configured to read each id in the mapping information file in a for loop and pass the id to the setpci command through the variable bus;
a zero-setting execution module configured to set the Surprise_Down_Error_Severity register to zero with the setpci command.
Fig. 3 is a schematic structural diagram of a terminal system 300 according to an embodiment of the present invention, where the terminal system 300 may be used to execute the GPU mode switching stability test method according to the embodiment of the present invention.
The terminal system 300 may include a processor 310, a memory 320, and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the structure of the server shown in the figure does not limit the invention: it may be a bus structure or a star structure, and it may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 320 may be used to store instructions executed by the processor 310. The memory 320 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The executable instructions in the memory 320, when executed by the processor 310, enable the terminal 300 to perform some or all of the steps in the method embodiments described above.
The processor 310 is the control center of the terminal. It connects the various parts of the entire terminal using various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory. The processor may be composed of integrated circuits (ICs), for example a single packaged IC, or several packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may have a single computing core or multiple computing cores.
The communication unit 330 is configured to establish a communication channel so that the terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to them.
The present invention also provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments provided by the present invention when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
Therefore, by setting the PCIE error reporting register to zero, the invention reduces the sensitivity of the CPU side to PCIE error reporting and prevents the test from being interrupted. By setting a rate limit in a CPU register, it passively slows down or speeds up the GPU, creating a scenario of frequent switching between the passive speed-up and speed-reduction modes, and then compares the GPU bandwidth under load with the preset bandwidth reference to verify the stability of the GPU. In addition, by cancelling the CPU register setting and loading the GPU periodically, it makes the GPU switch frequently between its automatic energy-saving mode and its high-speed mode, and then compares the GPU bandwidth during the periodic loading with the corresponding reference bandwidth to test the stability of the GPU. The invention switches the PCIE rate in both active and passive modes to test GPU compatibility and stability, improves the test coverage of compatibility between the GPU and the machine, and provides a means of verifying low-probability events, thereby safeguarding product quality. For the technical effects achieved by this embodiment, reference may be made to the description above, which is not repeated here.
Those skilled in the art will readily appreciate that the techniques of the embodiments of the present invention may be implemented as software plus a required general purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be embodied in the form of a software product, where the computer software product is stored in a storage medium, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and the like, and the storage medium can store program codes, and includes instructions for enabling a computer terminal (which may be a personal computer, a server, or a second terminal, a network terminal, and the like) to perform all or part of the steps of the method in the embodiments of the present invention.
The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, for the terminal embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.
In the embodiments provided in the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
Although the present invention has been described in detail with reference to the drawings and in connection with the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A GPU mode switching stability test method is characterized by comprising the following steps:
setting a bandwidth reference for the GPU at the reduced rate and a bandwidth reference at the maximum rate;
reducing the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero;
limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register;
applying a stress load to the GPU, comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate, and outputting a test failure if the two are inconsistent;
and cancelling the rate setting of the CPU register, periodically applying a stress load to the GPU, comparing the GPU bandwidth during the periodic loading with the corresponding reference bandwidth, and outputting a test failure if the two are inconsistent.
2. The method of claim 1, wherein reducing the sensitivity of the CPU side to PCIE error reporting by setting the PCIE error reporting register to zero comprises:
reading the tree structure of the PCIE devices with the lspci command, finding in the tree structure the mapping information of the PCIE buses on which all the GPUs sit, and writing the mapping information to a mapping information file;
reading each id in the mapping information file in a for loop, and passing the id to the setpci command through the variable bus;
setting the Surprise_Down_Error_Severity register to zero with the setpci command.
3. The method of claim 1, wherein limiting the GPU-PCIE link rate to the reduced rate by setting a rate limit in a CPU register comprises:
setting the C0.B value of the CPU register Target_Link_Speed to 41 with the setpci command, so that the PCIE link rate becomes the reduced rate;
setting the A0.B value of the CPU register Retrain_Link to 60 with the setpci command, so that the PCIE link rate rises to the device's own maximum rate.
4. The method of claim 3, wherein applying a stress load to the GPU and comparing the GPU bandwidth under load with the bandwidth reference for the corresponding rate comprises:
setting the GPU loading time to 1 h;
when the PCIE link rate is the reduced rate, capturing the GPU bandwidth under load in real time and comparing it with the reduced-rate bandwidth reference;
and when the PCIE link rate has risen to the maximum rate, capturing the GPU bandwidth under load in real time and comparing it with the maximum-rate bandwidth reference.
5. The method of claim 1, wherein periodically applying a stress load to the GPU comprises:
setting the loading time, the interval time and the number of loading rounds;
and periodically loading the GPU according to the loading time, the interval time and the number of loading rounds.
6. The method of claim 1, further comprising:
acquiring the GPU bandwidth during the periodic loading in real time, and marking the GPU-PCIE link rate corresponding to the bandwidth.
7. A GPU mode switching stability test system, characterized by comprising:
a reference setting unit, configured to set a bandwidth reference of the GPU at the reduced rate and a bandwidth reference at the maximum rate;
an error reporting setting unit, configured to reduce the sensitivity of the CPU side to PCIE error reporting by setting a PCIE error reporting register to zero;
a rate limiting unit, configured to limit the GPU-PCIE link rate to the reduced rate by setting a limited rate in the CPU register;
a passive comparison unit, configured to pressurize the GPU and compare the bandwidth of the GPU under pressurization with the bandwidth reference at the corresponding rate, and to output a test failure if the two are inconsistent;
and an active comparison unit, configured to cancel the rate setting of the CPU register, periodically pressurize the GPU, compare the bandwidth of the GPU during the periodic pressurization with the corresponding bandwidth reference, and output a test failure if the two are inconsistent.
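The way the passive and active comparison units fit together might be pictured as in the sketch below. It is an illustration only: the `Phase` structure, the `run_phases` function and the injected callables are all hypothetical, and the match rule is left to the caller because the claims do not define it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class Phase:
    name: str                      # e.g. "passive" or "active"
    setup: Callable[[], None]      # e.g. limit the rate, or cancel the limit
    samples: Callable[[], Iterable[Tuple[float, float]]]  # (measured, reference) pairs


def run_phases(phases, matches):
    """Run each phase in order and stop at the first inconsistent sample."""
    for phase in phases:
        phase.setup()
        for measured, reference in phase.samples():
            if not matches(measured, reference):
                return f"test failed in {phase.name} phase"
    return "test passed"
```

In this picture the passive phase's `setup` would apply the rate limit and the active phase's `setup` would cancel it before the periodic pressurization starts.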
8. The system of claim 7, wherein the error reporting setting unit comprises:
a file import module, configured to read the tree structure of the PCIE devices with the lspci command, find in the tree structure the mapping information of the PCIE buses on which all GPUs are located, and import the mapping information into a mapping information file;
a variable assignment module, configured to read the ids in the mapping information file by for-loop polling and import each read id into the variable bus of the setpci command;
a zero-setting execution module, configured to zero the Surprise_Down_Error_setting register with the setpci command.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-6.
10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method of any one of claims 1-6.

Priority Applications (1)

Application Number: CN202010094748.8A
Priority Date: 2020-02-16
Filing Date: 2020-02-16
Title: GPU mode switching stability test method, system, terminal and storage medium
Status: Active, CN111338862B (en)

Publications (2)

Publication Number Publication Date
CN111338862A (en) 2020-06-26
CN111338862B (en) 2022-07-19

Family

ID=71185792

Country Status (1)

Country Link
CN (1) CN111338862B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908490A (en) * 2017-11-09 2018-04-13 郑州云海信息技术有限公司 GPU registers reliability verification method and system in a kind of server DC tests
CN109783378A (en) * 2019-01-02 2019-05-21 郑州云海信息技术有限公司 GPU is in the compatibility test method of Ubnutu system, device, terminal and storage medium
CN110175096A (en) * 2019-05-21 2019-08-27 苏州浪潮智能科技有限公司 A kind of GPU applied voltage test method, system, terminal and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416672A (en) * 2020-11-12 2021-02-26 宁畅信息产业(北京)有限公司 PCIE link stability test method, device, computer equipment and medium
CN112416672B (en) * 2020-11-12 2024-02-23 宁畅信息产业(北京)有限公司 PCIE link stability testing method, PCIE link stability testing device, computer equipment and medium
CN114185603A (en) * 2021-11-08 2022-03-15 深圳云天励飞技术股份有限公司 Control method of intelligent accelerator card, server and intelligent accelerator card
CN114185603B (en) * 2021-11-08 2024-01-05 深圳云天励飞技术股份有限公司 Control method of intelligent accelerator card, server and intelligent accelerator card
CN114255155A (en) * 2022-02-24 2022-03-29 荣耀终端有限公司 Graphics processor testing method and electronic equipment

Also Published As

Publication number Publication date
CN111338862B (en) 2022-07-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant