CN107590037A - Method for performing EDPP tests on server GPUs - Google Patents

Method for performing EDPP tests on server GPUs

Info

Publication number
CN107590037A
Authority
CN
China
Prior art keywords
test
edpp
gpu
tests
carried out
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710756669.7A
Other languages
Chinese (zh)
Inventor
韩超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou Yunhai Information Technology Co Ltd
Original Assignee
Zhengzhou Yunhai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhengzhou Yunhai Information Technology Co Ltd
Priority to CN201710756669.7A
Publication of CN107590037A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method for performing EDPP tests on server GPUs. The method comprises the following steps: building a test environment; performing multiple EDPP tests on multiple GPUs simultaneously, restarting the test program for every test; and recording the test process and results. Compared with the prior art, the test process of the invention requires no manual participation, which strengthens the reliability of the test results. EDPP tests are run on multiple GPUs at the same time, which greatly shortens the time the tests take and improves work efficiency.

Description

Method for performing EDPP tests on server GPUs
Technical field
The present invention relates to the technical field of server GPU testing, and specifically to a method for performing EDPP tests on server GPUs.
Background
The PCIE-SWITH computing server (PCIE, in full peripheral component interconnect express, is a high-speed serial computer expansion bus standard; SWITH is a type of server) is a server with a brand-new architecture. It can meet customers' demand for high-performance computing servers in China's whole-rack server market.
In the R&D and production stages of a server, however, testing its graphics processing unit (GPU) is an important step. The test process contains many test items and each item takes a long time: a single-loop EDPP test (EDPP, in full electrical date peak processing, i.e. power data peak processing) takes about 3 hours from start to finish, and it is a mandatory item that has to be run on multiple nodes, so the workload is very large.
The existing approach is to test manually, one GPU at a time. This takes a long time and requires manual selection, which adds to the staff's workload and lowers efficiency. Manual selection also inevitably leads to mistakes and omissions.
Summary of the invention
To overcome the above shortcomings of the prior art, the object of the invention is to provide a method for performing EDPP tests on server GPUs that is highly automated and highly efficient.
The technical solution adopted by the invention to solve this technical problem is a method for performing EDPP tests on server GPUs, comprising the following steps:
Building a test environment;
Performing multiple EDPP tests on multiple GPUs simultaneously, restarting the test program for every test;
Recording the test process and results.
Further, the specific process of building the test environment is: fully populate the GPUBOX, connect the PCIE-SWITCH GPUBOX to the test server through Mini SAS HD, and copy the graphics driver and the test tool to the operating system of the server.
Further, the specific process of performing multiple EDPP tests on multiple GPUs simultaneously, restarting the test program for every test, is:
Input the test node IPs and the test-count variable;
Determine whether the test-count variable is within the preset range of test counts;
If the test-count variable is not within the preset range, end the test;
If the test-count variable is within the preset range, determine whether the test-count variable is 1;
If the test-count variable is not 1, close the running test program and proceed to the next step; if the test-count variable is 1, invoke the graphics driver that matches the test tool and proceed to the next step;
Log in to the test nodes, have all test nodes start the test program at the same time, select the EDPP test item to run, output the test result, add 1 to the test-count variable, and repeat the test.
Further, the specific steps of invoking the graphics driver that matches the test tool are:
Run the driver version check command and determine from the driver's version number whether the existing graphics driver is the one required for the test;
If it is the driver required for the test, invoke that graphics driver directly;
If it is not the driver required for the test, uninstall that graphics driver and invoke the copied graphics driver.
Further, the test nodes are logged in to through the ssh service and the output information is recorded. The output information includes one or more of each GPU's load, index, GPU utilization, power consumption, frequency, temperature, and state, and is recorded in a log file.
Further, the test result and the output information are recorded in the same log file.
The beneficial effects of the invention are:
1. The invention fully populates the PCIE-SWITH whole rack and runs the GPU EDPP test on multiple nodes at the same time. 32 GPUs are tested simultaneously in a single cycle, which greatly reduces the test time and improves work efficiency.
2. The invention records the test result and the GPU information in the same log file, which makes it easy to review GPU performance and pinpoint problems afterwards, and avoids the errors of manual testing and manual GPU selection.
3. The method of the invention sets the number of tests to more than one and restarts the test program for every test, which prevents the previous test from affecting the current one and ensures that each test result is accurate and reliable.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 is a flow chart of performing multiple EDPP tests simultaneously on the fully populated GPUs in the invention.
Embodiments
To make the technical features of this solution clear, the invention is described in detail below through specific embodiments and with reference to the accompanying drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. In addition, the invention may repeat reference numerals and/or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the embodiments and/or arrangements discussed. Note that the components illustrated in the drawings are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the invention.
The invention fully populates the PCIE-SWITH whole rack, connects it to the test server, and invokes the GPUs through the test tool on the test server, so that all GPUs in the rack run the EDPP test at the same time.
The test tool is one that performs EDPP tests on GPUs. The Nvqual test tool selected in the invention is NVQual_P40_v4_18. This tool must be used together with a matching driver; the driver selected in the invention is P40gpu:driver_367.57.
As shown in Fig. 1, the specific steps of the method are:
S1, build a test environment;
S2, perform multiple EDPP tests on multiple GPUs simultaneously, restarting the test program for every test;
S3, record the test process and results.
In step S1, building the test environment covers both the hardware test environment and the software test environment. Building the hardware environment means fully populating the GPUBOXes of two PCIE-SWITCH whole racks and connecting them to the test servers with Mini SAS HD cables. Fully populated means that a single GPUBOX holds 16 GPUs. One GPUBOX corresponds to two servers. The GPUBOX is the enclosure in which the GPUs are installed. The software environment is built by copying the graphics driver and the test tool to the operating system of the server. When a test starts, the GPUBOX is powered on first and the corresponding servers are powered on 1 min later, to ensure the stability of the test system.
As shown in Fig. 2 concretely comprising the following steps for multiple EDPP tests is carried out simultaneously to the GPU of full configuration in step S2:
S21, input test node IP and testing time variable;
S22, judge testing time variable whether in the range of default testing time;
S23, if testing time variable not in the range of preset times, terminates to test;
S24, if testing time variable in the range of preset times, judges whether testing time variable is 1;
S25, if testing time variable is not 1, close the test program being currently running;
S26, test node is logined, makes each test node while start test program, select EDPP test items to be tested, And output test result, testing time variable is added 1, go to step S22;
S27, if testing time variable is 1, judge whether existing video driver is to be driven needed for test;
S28, if being driven needed for test, then the video driver is directly invoked, goes to step S25;
S29, if not driving needed for test, then the video driver is unloaded, the video driver of copy is called, goes to step S25。
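The control flow of steps S21 to S29 can be sketched as a short loop. This is only an illustrative sketch, not the patent's actual script: the function name `run_edpp_rounds` and the three callbacks (`ensure_driver`, `stop_running_tests`, `run_round`) are hypothetical placeholders for the driver check, the shutdown of a running test program, and one simultaneous EDPP round on all nodes.

```python
from typing import Callable, List


def run_edpp_rounds(
    node_ips: List[str],
    start_count: int,
    max_rounds: int,
    ensure_driver: Callable[[], None],
    stop_running_tests: Callable[[], None],
    run_round: Callable[[List[str], int], str],
) -> List[str]:
    """Drive repeated EDPP rounds following the order of steps S21 to S29.

    All callbacks are hypothetical stand-ins supplied by the caller.
    """
    results: List[str] = []
    count = start_count  # S21: value entered by the operator
    # S22: keep going while the counter is inside the preset range;
    # leaving the range ends the test (S23).
    while 1 <= count <= max_rounds:
        if count == 1:  # S24: only the first round checks the driver
            ensure_driver()  # S27 to S29: keep or replace the driver
        stop_running_tests()  # S25: close any test program still running
        results.append(run_round(node_ips, count))  # S26: one round on all nodes
        count += 1  # S26: increment, then return to S22
    return results
```

With a preset range of 10, as in the embodiment, calling the function with `start_count=1` and `max_rounds=10` would run ten rounds and check the driver only before the first one.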
In step S21, the test node IPs and the test-count variable are entered on the command line of the test program. The invention inputs 4 test node IPs, and each test node IP corresponds to 8 GPUs, so the invention runs the EDPP test on 32 GPUs simultaneously. In step S22, the preset count is greater than one; the invention uses 10. The preset count is the number of times the test is repeated on each group of 32 GPUs. Results obtained from repeated tests are more accurate and more reliable. One test round in the invention takes about 2.5 hours. Compared with the traditional approach, which needs 3 hours to test a single GPU, this greatly improves work efficiency.
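The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text (4 nodes, 8 GPUs per node, about 2.5 hours per parallel round, about 3 hours per GPU in the traditional approach). The assumption that the traditional approach tests all 32 GPUs strictly one after another is ours, so the resulting ratio is an estimate rather than a figure stated in the patent.

```python
NODES = 4                        # test node IPs entered in step S21
GPUS_PER_NODE = 8                # GPUs behind each test node
HOURS_PER_PARALLEL_ROUND = 2.5   # one round on all GPUs at once
HOURS_PER_GPU_SEQUENTIAL = 3.0   # one GPU tested on its own


def total_gpus(nodes: int, gpus_per_node: int) -> int:
    """Number of GPUs covered by one parallel round."""
    return nodes * gpus_per_node


def sequential_hours(gpu_count: int, hours_per_gpu: float) -> float:
    """Time for one pass if GPUs are tested strictly one by one (assumed)."""
    return gpu_count * hours_per_gpu


def estimated_speedup(gpu_count: int, hours_per_gpu: float, round_hours: float) -> float:
    """Ratio of the assumed sequential time to the parallel round time."""
    return sequential_hours(gpu_count, hours_per_gpu) / round_hours


if __name__ == "__main__":
    gpus = total_gpus(NODES, GPUS_PER_NODE)
    print(gpus, sequential_hours(gpus, HOURS_PER_GPU_SEQUENTIAL),
          estimated_speedup(gpus, HOURS_PER_GPU_SEQUENTIAL, HOURS_PER_PARALLEL_ROUND))
```

Under that assumption, one pass over 32 GPUs drops from roughly 96 hours to about 2.5 hours.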
In step S24, the check of whether the test count is 1 serves to further determine whether the graphics driver bundled with the server's operating system is still present. If it is, the bundled driver is uninstalled and the driver newly copied when the test environment was built is used instead. This is necessary because the driver bundled with the operating system cannot drive the current test tool, so the GPU EDPP test could not be run with it.
In step S25, to keep the test process stable, the running test tool is closed first, so that the test nodes restart the test program for every test. This prevents a test program left open from the previous test from affecting the current test and ensures the accuracy of each test result.
In step S26, the test nodes are logged in to through the ssh service, and the output information is recorded and output. The output information includes each GPU's load, index, GPU utilization, power consumption, frequency, temperature, and state, and is recorded in a log file. The following is an example of the output for 8 GPUs:
Current Application Clocks:[1531MHz]
sgemm Workload-M:3584N:1024K:4096L:100000I:10000P:X:T:60-PASS
Index, timestamp, utilization.gpu [%], power.draw [W], clocks.current.graphics [MHz], temperature.gpu, pstate, clocks_throttle_reasons.hw_slowdown
0,2017/04/01 16:27:42.062,0%, 14.00W, 544MHz, 31, P8, Not Active
1,2017/04/01 16:27:42.077,0%, 13.19W, 544MHz, 30, P8, Not Active
2,2017/04/01 16:27:42.091,0%, 12.39W, 544MHz, 30, P8, Not Active
3,2017/04/01 16:27:42.106,0%, 13.38W, 544MHz, 30, P8, Not Active
4,2017/04/01 16:27:42.121,0%, 12.89W, 544MHz, 31, P8, Not Active
5,2017/04/01 16:27:42.136,0%, 12.99W, 544MHz, 30, P8, Not Active
6,2017/04/01 16:27:42.152,0%, 13.40W, 544MHz, 29, P8, Not Active
7,2017/04/01 16:27:42.167,0%, 12.79W, 544MHz, 33, P8, Not Active
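For later analysis, each per-GPU line of this log can be turned into a typed record. The following is a minimal sketch that assumes the eight comma-separated fields appear in the order given by the header line above; the `GpuSample` class and the `parse_sample` function are illustrative names, not part of the patent or of any NVIDIA tool.

```python
import re
from dataclasses import dataclass


@dataclass
class GpuSample:
    index: int
    timestamp: str
    utilization_pct: float
    power_w: float
    clock_mhz: float
    temperature_c: int
    pstate: str
    hw_slowdown: str


def _number(field: str) -> float:
    """Return the leading number of a field such as '14.00W' or '0 %'."""
    match = re.match(r"\s*(-?\d+(?:\.\d+)?)", field)
    if match is None:
        raise ValueError(f"no numeric value in field: {field!r}")
    return float(match.group(1))


def parse_sample(line: str) -> GpuSample:
    """Parse one per-GPU log line into a GpuSample record."""
    parts = [part.strip() for part in line.split(",")]
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}: {line!r}")
    return GpuSample(
        index=int(parts[0]),
        timestamp=parts[1],
        utilization_pct=_number(parts[2]),
        power_w=_number(parts[3]),
        clock_mhz=_number(parts[4]),
        temperature_c=int(parts[5]),
        pstate=parts[6],
        hw_slowdown=parts[7],
    )
```

Parsing the records this way makes it straightforward to flag, for example, any GPU whose `hw_slowdown` field is not `Not Active` during a round.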
In step S26, the EDPP test item of the Nvqual test tool is selected to test the GPU's ability to withstand power reaching its peak under a fluctuating load. If the test passes, Pass is output; otherwise Fail is output. The test result and the GPU output information are recorded in the same log file, which makes it easy to review GPU performance and pinpoint problems afterwards.
In step S27, whether the existing graphics driver is the one required for the test is determined by running the driver version check command and examining the driver's version number. The driver version check command is check_driver="nvidia-smi | grep -i version", where nvidia-smi is the command that retrieves the current GPU information and grep -i version extracts the driver version from that information. In the invention, the graphics driver that matches the GPU test is the one for the Tesla P40, version 367.57.
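The decision in steps S27 to S29 comes down to comparing the reported driver version with the required one. Below is a minimal sketch of that comparison, assuming the `nvidia-smi` banner contains a `Driver Version: <number>` field; the function names and the returned action strings are hypothetical, and the actual uninstall and install commands are deliberately left out.

```python
import re
import subprocess
from typing import Optional

REQUIRED_DRIVER = "367.57"  # version named in the embodiment


def read_smi_output() -> str:
    """Run nvidia-smi and return its text output (requires an NVIDIA driver)."""
    completed = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=True
    )
    return completed.stdout


def extract_driver_version(smi_output: str) -> Optional[str]:
    """Find the driver version in nvidia-smi output, or None if absent."""
    match = re.search(
        r"Driver Version:\s*([0-9]+(?:\.[0-9]+)+)", smi_output, re.IGNORECASE
    )
    return match.group(1) if match else None


def driver_action(smi_output: str, required: str = REQUIRED_DRIVER) -> str:
    """Return which branch to take: S28 (keep) or S29 (replace)."""
    if extract_driver_version(smi_output) == required:
        return "use_installed"        # S28: driver already matches
    return "replace_with_copied"      # S29: uninstall, then use the copied driver
```

Keeping the parsing separate from the `subprocess` call lets the version logic be exercised on captured text without a GPU present.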
When the EDPP test is run, load is applied to all test nodes at the same time so that they reach maximum load simultaneously. Here, load refers to the GPU's capacity per unit time to process transfers of data blocks of various sizes.
In step S3, the test process and results are recorded by writing the test results and the GPU output information to the log file.
The above is only the preferred embodiment of the invention. Those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications are also regarded as falling within the scope of protection of the invention.
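One way to start the load on every node at nearly the same moment is to launch one worker per node and hold all workers at a barrier until every one is ready. The sketch below illustrates that idea with Python's standard library only. The `start_test` callback is a hypothetical placeholder for whatever opens the ssh session and launches the test tool on a node; the patent does not specify that command, so it is not reproduced here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def start_on_all_nodes(
    node_ips: List[str], start_test: Callable[[str], str]
) -> Dict[str, str]:
    """Call start_test(ip) for every node, releasing all calls together."""
    if not node_ips:
        return {}
    # One barrier party per node: no worker proceeds until all have arrived.
    barrier = threading.Barrier(len(node_ips))

    def worker(ip: str) -> str:
        barrier.wait()          # block until every node's worker is ready
        return start_test(ip)   # hypothetical per-node launch, supplied by caller

    # max_workers equals the node count so every worker can reach the barrier.
    with ThreadPoolExecutor(max_workers=len(node_ips)) as pool:
        results = list(pool.map(worker, node_ips))
    return dict(zip(node_ips, results))
```

Sizing the pool to the number of nodes matters here: with fewer workers than barrier parties, the waiting workers would never be released.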

Claims (6)

1. A method for performing EDPP tests on server GPUs, characterized in that it comprises the following steps:
Building a test environment;
Performing multiple EDPP tests on multiple GPUs simultaneously, restarting the test program for every test;
Recording the test process and results.
2. The method for performing EDPP tests on server GPUs according to claim 1, characterized in that the specific process of building the test environment is: fully populating the GPUBOX, connecting the PCIE-SWITCH GPUBOX to the test server through Mini SAS HD, and copying the graphics driver and the test tool to the operating system of the server.
3. The method for performing EDPP tests on server GPUs according to claim 1, characterized in that the specific process of performing multiple EDPP tests on multiple GPUs simultaneously, restarting the test program for every test, is:
Inputting the test node IPs and the test-count variable;
Determining whether the test-count variable is within the preset range of test counts;
If the test-count variable is not within the preset range, ending the test;
If the test-count variable is within the preset range, determining whether the test-count variable is 1;
If the test-count variable is not 1, closing the running test program and proceeding to the next step; if the test-count variable is 1, invoking the graphics driver that matches the test tool and proceeding to the next step;
Logging in to the test nodes, having all test nodes start the test program at the same time, selecting the EDPP test item to run, outputting the test result, adding 1 to the test-count variable, and repeating the test.
4. The method for performing EDPP tests on server GPUs according to claim 3, characterized in that the specific steps of invoking the graphics driver that matches the test tool are:
Running the driver version check command and determining from the driver's version number whether the existing graphics driver is the one required for the test;
If it is the driver required for the test, invoking that graphics driver directly;
If it is not the driver required for the test, uninstalling that graphics driver and invoking the copied graphics driver.
5. The method for performing EDPP tests on server GPUs according to claim 4, characterized in that the test nodes are logged in to through the ssh service and the output information is recorded, the output information including one or more of each GPU's load, index, GPU utilization, power consumption, frequency, temperature, and state, and being recorded in a log file.
6. The method for performing EDPP tests on server GPUs according to claim 5, characterized in that the test result and the output information are recorded in the same log file.
CN201710756669.7A 2017-08-29 2017-08-29 Method for performing EDPP tests on server GPUs Pending CN107590037A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710756669.7A CN107590037A (en) 2017-08-29 2017-08-29 Method for performing EDPP tests on server GPUs

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710756669.7A CN107590037A (en) 2017-08-29 2017-08-29 Method for performing EDPP tests on server GPUs

Publications (1)

Publication Number Publication Date
CN107590037A true CN107590037A (en) 2018-01-16

Family

ID=61051311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710756669.7A Pending CN107590037A (en) 2017-08-29 2017-08-29 Method for performing EDPP tests on server GPUs

Country Status (1)

Country Link
CN (1) CN107590037A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108958999A (en) * 2018-06-13 2018-12-07 郑州云海信息技术有限公司 A kind of method and system for testing GPU floating-point operation performance
CN109408312A (en) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 A kind of server running temperature test macro and equipment
CN109684144A (en) * 2018-12-26 2019-04-26 郑州云海信息技术有限公司 A kind of method and device of GPU-BOX system testing
CN112416672A (en) * 2020-11-12 2021-02-26 宁畅信息产业(北京)有限公司 PCIE link stability test method, device, computer equipment and medium
CN114255155A (en) * 2022-02-24 2022-03-29 荣耀终端有限公司 Graphics processor testing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101369244A (en) * 2007-08-14 2009-02-18 鸿富锦精密工业(深圳)有限公司 Graphic display card test method
CN102479127A (en) * 2010-11-30 2012-05-30 英业达股份有限公司 System for performing power supply cycle period test on multiple servers
CN102541679A (en) * 2011-12-30 2012-07-04 曙光信息产业股份有限公司 Method and system for testing GPU (graphic processing unit) cards
CN104268046A (en) * 2014-10-17 2015-01-07 浪潮电子信息产业股份有限公司 Linux-based man-machine interaction NVIDIA GPU (Graphics Processing Unit) automatic testing method
CN106649014A (en) * 2016-12-28 2017-05-10 郑州云海信息技术有限公司 Automatic testing method of calculating type server which supports multiple GPUs

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108958999A (en) * 2018-06-13 2018-12-07 郑州云海信息技术有限公司 A kind of method and system for testing GPU floating-point operation performance
CN109408312A (en) * 2018-11-01 2019-03-01 郑州云海信息技术有限公司 A kind of server running temperature test macro and equipment
CN109408312B (en) * 2018-11-01 2021-10-29 郑州云海信息技术有限公司 Server operating temperature test system and equipment
CN109684144A (en) * 2018-12-26 2019-04-26 郑州云海信息技术有限公司 A kind of method and device of GPU-BOX system testing
CN109684144B (en) * 2018-12-26 2021-11-02 郑州云海信息技术有限公司 Method and device for testing GPU-BOX system
CN112416672A (en) * 2020-11-12 2021-02-26 宁畅信息产业(北京)有限公司 PCIE link stability test method, device, computer equipment and medium
CN112416672B (en) * 2020-11-12 2024-02-23 宁畅信息产业(北京)有限公司 PCIE link stability testing method, PCIE link stability testing device, computer equipment and medium
CN114255155A (en) * 2022-02-24 2022-03-29 荣耀终端有限公司 Graphics processor testing method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180116