CN107590037A - Method for performing EDPP tests on server GPUs - Google Patents
- Publication number
- CN107590037A (application CN201710756669.7A)
- Authority
- CN
- China
- Prior art keywords
- test
- edpp
- gpu
- tests
- carried out
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a method for performing EDPP tests on server GPUs. The method comprises the following steps: building a test environment; running multiple rounds of EDPP tests on multiple GPUs simultaneously, restarting the test program for every round; and recording the test process and results. Compared with the prior art, the test process requires no manual intervention, which improves the reliability of the test results. Because the EDPP test runs on multiple GPUs at once, the test time is greatly shortened and working efficiency is improved.
Description
Technical field
The present invention relates to the technical field of server GPU testing, and specifically to a method for performing EDPP tests on server GPUs.
Background
The compute server PCIE-SWITCH (PCIE stands for Peripheral Component Interconnect Express, a high-speed serial computer expansion bus standard; SWITCH denotes a type of server) is a server with a new architecture. It meets customer demand for high-performance computing servers in China's whole-rack server market.
In the R&D and production stages of such a server, testing its graphics processing unit (GPU) is an important step. The test process contains many test items and each item takes a long time. A single cycle of the EDPP test (electrical data peak processing, that is, peak processing of power data) takes about 3 hours, and multiple nodes must be tested simultaneously and continuously, so the workload is very large.
The existing approach is to test the GPUs manually, one at a time. This takes a long time and requires manual selection, which adds to the staff's workload, lowers efficiency, and inevitably introduces mistakes and omissions.
Summary of the invention
To overcome the above shortcomings of the prior art, the object of the invention is to provide a highly automated and highly efficient method for performing EDPP tests on server GPUs.
The technical solution adopted by the invention to solve the technical problem is a method for performing EDPP tests on server GPUs, comprising the following steps:
Build a test environment;
Run multiple rounds of EDPP tests on multiple GPUs simultaneously, restarting the test program for every round;
Record the test process and results.
Further, the test environment is built as follows: the GPUBOX is fully populated, the GPUBOX of the PCIE-SWITCH is connected to the test server through Mini SAS HD, and the graphics driver and the test tool are copied to the operating system on the server.
Further, running multiple rounds of EDPP tests on multiple GPUs simultaneously, with the test program restarted for every round, proceeds as follows:
Enter the test node IPs and the test count variable;
Determine whether the test count variable is within the preset range of test counts;
If the test count variable is not within the preset range, end the test;
If the test count variable is within the preset range, determine whether the test count variable is 1;
If the test count variable is not 1, close the running test program and proceed to the next step; if the test count variable is 1, load the graphics driver that matches the test tool and proceed to the next step;
Log in to the test nodes, have every test node start the test program at the same time, select the EDPP test item, output the test result, add 1 to the test count variable, and repeat the test.
Further, loading the graphics driver that matches the test tool comprises the following steps:
Run the driver version check command and decide from the driver's version number whether the existing graphics driver is the one required for the test;
If it is the required driver, use that driver directly;
If it is not the required driver, uninstall it and use the copied driver.
Further, the test nodes are logged into through the ssh service and the output information is recorded. The output information includes one or more of each GPU's load, index, GPU utilization, power consumption, frequency, temperature, and state, and is recorded in a log file.
Further, the test result is recorded in the same log file as the output information.
The beneficial effects of the invention are as follows:
1. The invention runs the EDPP test on the GPUs of a fully populated PCIE-SWITCH whole rack across multiple nodes simultaneously. A single cycle tests 32 GPUs at once, which greatly reduces the test time and improves working efficiency.
2. The invention records the test results and the GPU information in the same log file, which makes it easy to review GPU performance and pinpoint problems later, and avoids the errors of manual testing and manual GPU selection.
3. The method sets the number of test rounds to more than one and restarts the test program for every round, so that the previous round cannot affect the current one, which ensures that each test result is accurate and reliable.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a flow chart of running multiple rounds of EDPP tests simultaneously on the fully populated GPUs in the invention.
Detailed description
To explain the technical features of this solution clearly, the invention is described in detail below through a specific embodiment with reference to the accompanying drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. In addition, the invention may repeat reference numerals and/or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the embodiments and/or arrangements discussed. Note that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention.
The invention fully populates a PCIE-SWITCH whole rack, connects it to the test servers, and invokes the GPUs through the test tool on the test servers, so that the GPUs of the whole rack run the EDPP test simultaneously.
The test tool is a tool that runs EDPP tests on GPUs. The Nvqual test tool selected in the invention is NVQual_P40_v4_18. The test tool must be used together with a matching driver; the driver selected in the invention is P40gpu:driver_367.57.
As shown in Fig. 1, the method proceeds as follows:
S1, build a test environment;
S2, run multiple rounds of EDPP tests on multiple GPUs simultaneously, restarting the test program for every round;
S3, record the test process and results.
In step S1, building the test environment covers both the hardware test environment and the software test environment. For the hardware environment, the fully populated GPUBOXes of two PCIE-SWITCH whole racks are connected to the test servers with Mini SAS HD cables. Fully populated means that a single GPUBOX holds 16 GPUs. Each GPUBOX corresponds to two servers. A GPUBOX is the enclosure in which the GPUs are installed. The software environment is built by copying the graphics driver and the test tool to the operating system on the servers. When the test starts, the GPUBOX is powered on first and the corresponding servers are powered on 1 minute later, which keeps the test system stable.
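The power-on order in step S1 can be sketched as follows. This is only an illustration under stated assumptions: the function name `power_up_in_order`, the `power_on` callback, and the device names are hypothetical and do not come from the description, which fixes only the order (GPUBOX first, servers second) and the wait of about 1 minute.

```python
import time
from typing import Callable, Iterable, List


def power_up_in_order(
    gpu_boxes: Iterable[str],
    servers: Iterable[str],
    power_on: Callable[[str], None],
    delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Power on every GPUBOX first, wait, then power on the servers.

    `power_on` is a hypothetical callback (for example a BMC or PDU call).
    Returns the devices in the order they were switched on.
    """
    order: List[str] = []
    for box in gpu_boxes:
        power_on(box)
        order.append(box)
    sleep(delay_s)  # give the GPUBOX time to settle before its servers start
    for server in servers:
        power_on(server)
        order.append(server)
    return order
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually waiting a minute.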
As shown in Fig. 2, step S2 runs multiple rounds of EDPP tests simultaneously on the fully populated GPUs as follows:
S21, enter the test node IPs and the test count variable;
S22, determine whether the test count variable is within the preset range of test counts;
S23, if the test count variable is not within the preset range, end the test;
S24, if the test count variable is within the preset range, determine whether the test count variable is 1;
S25, if the test count variable is not 1, close the running test program;
S26, log in to the test nodes, have every test node start the test program at the same time, select the EDPP test item, output the test result, add 1 to the test count variable, and go to step S22;
S27, if the test count variable is 1, determine whether the existing graphics driver is the one required for the test;
S28, if it is the required driver, use that driver directly and go to step S25;
S29, if it is not the required driver, uninstall it, use the copied driver, and go to step S25.
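The control flow of steps S21 to S29 can be sketched as a small loop. This is a minimal sketch, not the patent's actual test program: every function and parameter name below is hypothetical, and the driver check, the clean-up, and the test round itself are passed in as callbacks so that only the branching logic is shown.

```python
from typing import Callable, List, Sequence


def run_edpp_rounds(
    node_ips: Sequence[str],
    max_rounds: int,
    driver_is_required: Callable[[], bool],
    reinstall_driver: Callable[[], None],
    stop_running_tests: Callable[[Sequence[str]], None],
    run_round: Callable[[Sequence[str], int], str],
    start_round: int = 1,
) -> List[str]:
    """Repeat the EDPP round while the test count stays in the preset range."""
    results: List[str] = []
    count = start_round                       # S21: test count variable
    while 1 <= count <= max_rounds:           # S22/S23: stop when out of range
        if count == 1:                        # S24: only the first round checks the driver
            if not driver_is_required():      # S27
                reinstall_driver()            # S29: swap in the copied driver
        stop_running_tests(node_ips)          # S25: close any running test program
        results.append(run_round(node_ips, count))  # S26: run on all nodes
        count += 1                            # S26: add 1, back to S22
    return results
```

With `max_rounds=10` and four node IPs this mirrors the embodiment's 10 rounds on 32 GPUs; the callbacks are where ssh access and the test tool would be plugged in.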
In step S21, the test node IPs and the test count variable are entered on the command line of the test program. The invention enters 4 test node IPs, and each test node IP corresponds to 8 GPUs, so the EDPP test runs on 32 GPUs simultaneously. In step S22, the preset count is greater than one; the invention uses 10. The preset count is the number of times the test is repeated on each set of 32 GPUs. Repeated testing gives more accurate and more reliable results. One round of the test takes about 2.5 hours, compared with the conventional approach that needs 3 hours to test a single GPU, which greatly improves working efficiency.
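The figures above can be checked with a short calculation. The node count, GPUs per node, round count, and per-round times come from the description; the serial baseline (every GPU tested one after another for every round) is an assumption made here for comparison and is not stated in the source.

```python
NODES = 4                        # test node IPs entered in step S21
GPUS_PER_NODE = 8                # GPUs behind each node IP
ROUNDS = 10                      # preset test count used in the embodiment
PARALLEL_HOURS_PER_ROUND = 2.5   # one round on all GPUs at once
SERIAL_HOURS_PER_GPU = 3.0       # conventional one-by-one test of a single GPU


def total_gpus(nodes: int = NODES, gpus_per_node: int = GPUS_PER_NODE) -> int:
    """Number of GPUs tested simultaneously."""
    return nodes * gpus_per_node


def parallel_hours(rounds: int = ROUNDS, hours: float = PARALLEL_HOURS_PER_ROUND) -> float:
    """Wall-clock time for all rounds when every GPU runs at once."""
    return rounds * hours


def serial_hours(rounds: int = ROUNDS, hours: float = SERIAL_HOURS_PER_GPU) -> float:
    """Assumed baseline: every GPU tested one at a time, for every round."""
    return total_gpus() * rounds * hours


if __name__ == "__main__":
    print(total_gpus(), parallel_hours(), serial_hours())
```

Under that assumed baseline the full 10-round campaign drops from 960 hours to 25 hours.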
In step S24, checking whether the test count is 1 serves to check whether the graphics driver bundled with the server's operating system is still present. If it is, the bundled driver is uninstalled and the driver freshly copied while building the test environment is used instead. This is necessary because the driver bundled with the operating system cannot drive the test tool, so the GPU EDPP test could not run with it.
In step S25, to keep the test process stable, any running test tool is closed first, so that the test nodes restart the test program for every round. This prevents a test program left running from the previous round from affecting the current round and ensures that each test result is accurate.
In step S26, the test nodes are logged into through the ssh service and the output information is recorded. The output information includes each GPU's load, index, GPU utilization, power consumption, frequency, temperature, and state, and is recorded in a log file. The following is an example of the output for 8 GPUs:
Current Application Clocks:[1531MHz]
sgemm Workload-M:3584N:1024K:4096L:100000I:10000P:X:T:60-PASS
Index, timestamp, utilization.gpu [%], power.draw [W], clocks.current.graphics [MHz], temperature.gpu, pstate, clocks_throttle_reasons.hw_slowdown
0, 2017/04/01 16:27:42.062, 0%, 14.00W, 544MHz, 31, P8, Not Active
1, 2017/04/01 16:27:42.077, 0%, 13.19W, 544MHz, 30, P8, Not Active
2, 2017/04/01 16:27:42.091, 0%, 12.39W, 544MHz, 30, P8, Not Active
3, 2017/04/01 16:27:42.106, 0%, 13.38W, 544MHz, 30, P8, Not Active
4, 2017/04/01 16:27:42.121, 0%, 12.89W, 544MHz, 31, P8, Not Active
5, 2017/04/01 16:27:42.136, 0%, 12.99W, 544MHz, 30, P8, Not Active
6, 2017/04/01 16:27:42.152, 0%, 13.40W, 544MHz, 29, P8, Not Active
7, 2017/04/01 16:27:42.167, 0%, 12.79W, 544MHz, 33, P8, Not Active
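For later review of such a log, the per-GPU rows can be turned into records. The sketch below is an illustration only: the `GpuSample` type and the function names are hypothetical, and the parser assumes the eight comma-separated columns shown in the example above, skipping the header and any other line.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GpuSample:
    index: int
    timestamp: str
    utilization_pct: float
    power_w: float
    clock_mhz: int
    temperature: int
    pstate: str
    hw_slowdown: str


def _number(text: str, unit: str) -> float:
    """Strip an optional unit suffix such as '%', 'W' or 'MHz'."""
    text = text.strip()
    if text.endswith(unit):
        text = text[: -len(unit)]
    return float(text.strip())


def parse_gpu_samples(log_text: str) -> List[GpuSample]:
    """Extract one GpuSample per data row; other lines are ignored."""
    samples: List[GpuSample] = []
    for line in log_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 8 or not fields[0].isdigit():
            continue  # header, banner, or workload line
        samples.append(
            GpuSample(
                index=int(fields[0]),
                timestamp=fields[1],
                utilization_pct=_number(fields[2], "%"),
                power_w=_number(fields[3], "W"),
                clock_mhz=int(_number(fields[4], "MHz")),
                temperature=int(fields[5]),
                pstate=fields[6],
                hw_slowdown=fields[7].rstrip("."),
            )
        )
    return samples
```

A caller could then, for instance, flag any sample whose `hw_slowdown` is not "Not Active" when locating a problem GPU.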
In step S26, the EDPP test item of the Nvqual test tool is selected. It tests the GPU's ability to withstand the supply power reaching its peak under a fluctuating load. The tool outputs Pass if the test is passed and Fail otherwise. The test result and the GPU output information are recorded in the same log file, which makes it easy to review GPU performance and pinpoint problems later.
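Starting the test on every node at once over ssh and writing everything to one log file might look like the sketch below. It is a hedged illustration, not the patent's program: the function names, the log layout, and the PASS/FAIL text heuristic are assumptions, and the remote command is left to the caller because the actual test tool invocation is not given in the description.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def ssh_run(node_ip: str, command: str) -> str:
    """Run `command` on `node_ip` through the ssh client and return its output."""
    done = subprocess.run(
        ["ssh", node_ip, command],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=False,
    )
    return done.stdout


def run_on_all_nodes(
    node_ips: Sequence[str],
    command: str,
    log_path: str,
    runner: Callable[[str, str], str] = ssh_run,
) -> Dict[str, str]:
    """Run the command on all nodes concurrently and log output plus verdict."""
    with ThreadPoolExecutor(max_workers=max(1, len(node_ips))) as pool:
        outputs = list(pool.map(lambda ip: runner(ip, command), node_ips))
    verdicts: Dict[str, str] = {}
    with open(log_path, "a", encoding="utf-8") as log:
        for ip, out in zip(node_ips, outputs):
            upper = out.upper()
            # Assumed heuristic: the tool prints Pass or Fail somewhere in its output.
            verdict = "PASS" if "PASS" in upper and "FAIL" not in upper else "FAIL"
            log.write("===== %s =====\n%s\n" % (ip, out))
            log.write("%s EDPP result: %s\n" % (ip, verdict))
            verdicts[ip] = verdict
    return verdicts
```

Keeping the GPU output and the verdict in the same appended file follows the description's point that one log should serve both performance review and fault location.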
In step S27, whether the existing graphics driver is the one required for the test is decided by running the driver version check command and comparing the driver's version number. The driver version check command is check_driver="nvidia-smi | grep -i version", where nvidia-smi is the command that captures the current GPU information and grep -i version extracts the driver version from that information. In the invention, the graphics driver that matches the GPU test is the Tesla P40 driver, version 367.57.
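The version comparison in step S27 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes the text returned by the check command contains a fragment of the form "Driver Version: 367.57", which should be confirmed against the output on the actual system.

```python
import re
from typing import Optional

REQUIRED_DRIVER = "367.57"  # driver version named in the description

_VERSION_RE = re.compile(r"Driver Version:\s*(\d+(?:\.\d+)+)", re.IGNORECASE)


def installed_driver_version(nvidia_smi_output: str) -> Optional[str]:
    """Return the driver version found in the command output, or None."""
    match = _VERSION_RE.search(nvidia_smi_output)
    return match.group(1) if match else None


def driver_matches(nvidia_smi_output: str, required: str = REQUIRED_DRIVER) -> bool:
    """True when the installed driver is exactly the required version."""
    return installed_driver_version(nvidia_smi_output) == required
```

A result of False, including the case where no version can be read at all, corresponds to step S29: uninstall the existing driver and use the copied one.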
During the EDPP test, pressure is applied to all test nodes at the same time so that they reach their maximum simultaneously. Here pressure means the GPU's capacity per unit time to handle the transfer of data blocks of various sizes.
In step S3, the test process and results are recorded by writing the test results and the GPU output information to a log file.
The above is only a preferred embodiment of the invention. Those of ordinary skill in the art can make improvements and modifications without departing from the principle of the invention, and such improvements and modifications are also regarded as falling within the scope of protection of the invention.
Claims (6)
1. A method for performing EDPP tests on server GPUs, characterized in that it comprises the following steps:
building a test environment;
running multiple rounds of EDPP tests on multiple GPUs simultaneously, restarting the test program for every round;
recording the test process and results.
2. The method for performing EDPP tests on server GPUs according to claim 1, characterized in that the test environment is built as follows: the GPUBOX is fully populated, the GPUBOX of the PCIE-SWITCH is connected to the test server through Mini SAS HD, and the graphics driver and the test tool are copied to the operating system on the server.
3. The method for performing EDPP tests on server GPUs according to claim 1, characterized in that running multiple rounds of EDPP tests on multiple GPUs simultaneously, with the test program restarted for every round, proceeds as follows:
entering the test node IPs and the test count variable;
determining whether the test count variable is within the preset range of test counts;
if the test count variable is not within the preset range, ending the test;
if the test count variable is within the preset range, determining whether the test count variable is 1;
if the test count variable is not 1, closing the running test program and proceeding to the next step; if the test count variable is 1, loading the graphics driver that matches the test tool and proceeding to the next step;
logging in to the test nodes, having every test node start the test program at the same time, selecting the EDPP test item, outputting the test result, adding 1 to the test count variable, and repeating the test.
4. The method for performing EDPP tests on server GPUs according to claim 3, characterized in that loading the graphics driver that matches the test tool comprises the following steps:
running the driver version check command and deciding from the driver's version number whether the existing graphics driver is the one required for the test;
if it is the required driver, using that driver directly;
if it is not the required driver, uninstalling it and using the copied driver.
5. The method for performing EDPP tests on server GPUs according to claim 4, characterized in that the test nodes are logged into through the ssh service and the output information is recorded, the output information including one or more of each GPU's load, index, GPU utilization, power consumption, frequency, temperature, and state, and being recorded in a log file.
6. The method for performing EDPP tests on server GPUs according to claim 5, characterized in that the test result is recorded in the same log file as the output information.
Priority Applications (1)
Application Number | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201710756669.7A | CN107590037A (en) | 2017-08-29 | 2017-08-29 | Method for performing EDPP tests on server GPUs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107590037A (en) | 2018-01-16 |
Family ID: 61051311
Family Applications (1)
Application Number | Status | Publication Number | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201710756669.7A | Pending | CN107590037A (en) | 2017-08-29 | 2017-08-29 | Method for performing EDPP tests on server GPUs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107590037A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101369244A (en) * | 2007-08-14 | 2009-02-18 | 鸿富锦精密工业(深圳)有限公司 | Graphic display card test method |
CN102479127A (en) * | 2010-11-30 | 2012-05-30 | 英业达股份有限公司 | System for performing power supply cycle period test on multiple servers |
CN102541679A (en) * | 2011-12-30 | 2012-07-04 | 曙光信息产业股份有限公司 | Method and system for testing GPU (graphic processing unit) cards |
CN104268046A (en) * | 2014-10-17 | 2015-01-07 | 浪潮电子信息产业股份有限公司 | Linux-based man-machine interaction NVIDIA GPU (Graphics Processing Unit) automatic testing method |
CN106649014A (en) * | 2016-12-28 | 2017-05-10 | 郑州云海信息技术有限公司 | Automatic testing method of calculating type server which supports multiple GPUs |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108958999A (en) * | 2018-06-13 | 2018-12-07 | 郑州云海信息技术有限公司 | Method and system for testing GPU floating-point operation performance |
CN109408312A (en) * | 2018-11-01 | 2019-03-01 | 郑州云海信息技术有限公司 | Server operating temperature test system and equipment |
CN109408312B (en) * | 2018-11-01 | 2021-10-29 | 郑州云海信息技术有限公司 | Server operating temperature test system and equipment |
CN109684144A (en) * | 2018-12-26 | 2019-04-26 | 郑州云海信息技术有限公司 | Method and device for testing GPU-BOX system |
CN109684144B (en) * | 2018-12-26 | 2021-11-02 | 郑州云海信息技术有限公司 | Method and device for testing GPU-BOX system |
CN112416672A (en) * | 2020-11-12 | 2021-02-26 | 宁畅信息产业(北京)有限公司 | PCIE link stability test method, device, computer equipment and medium |
CN112416672B (en) * | 2020-11-12 | 2024-02-23 | 宁畅信息产业(北京)有限公司 | PCIE link stability testing method, PCIE link stability testing device, computer equipment and medium |
CN114255155A (en) * | 2022-02-24 | 2022-03-29 | 荣耀终端有限公司 | Graphics processor testing method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180116 |