CN109491871A - Method and device for acquiring equipment information of GPU - Google Patents

Method and device for acquiring equipment information of GPU

Info

Publication number
CN109491871A
Authority
CN
China
Prior art keywords
gpu
devices
physical layer
equipment
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811385729.XA
Other languages
Chinese (zh)
Inventor
徐伟超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN201811385729.XA priority Critical patent/CN109491871A/en
Publication of CN109491871A publication Critical patent/CN109491871A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3051Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a method for acquiring equipment information of a GPU, which comprises the following steps: respectively acquiring the number of GPU equipment and the ID of the GPU from a system layer and a physical layer; judging whether the number of GPU equipment and the ID of the GPU which are obtained from the system layer are the same as the number of GPU equipment and the ID of the GPU which are obtained from the physical layer; and when the judgment is the same, respectively acquiring the equipment information of the GPU from the system layer and the physical layer. The method for acquiring the equipment information of the GPU is simple to operate and high in practicability, and time for manual operation and intervention can be reduced. The application also provides a device for acquiring the equipment information of the GPU.

Description

Method and device for acquiring equipment information of a GPU
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for acquiring device information of a GPU.
Background
As AI technology is applied ever more widely, GPU devices have become an indispensable hardware component of server products. GPUs, originally used for graphics and image processing, can process massive amounts of data in parallel, which makes them well suited to the highly parallel, highly localized data scenarios of deep learning; they are the mainstream computing architecture for AI today.
NVIDIA GPUs currently dominate server products. In practical designs a server often carries 8 or 16 GPUs, and future designs may carry more. During server product development there is at present no effective, intuitive tool or method for capturing and comparing GPU information in this situation; comparing the information of each GPU device manually is inefficient and prone to omissions and errors. In addition, during development the GPU count and GPU information reported under the system have been observed to differ abnormally between environments.
Summary of the invention
To solve the above technical problems in the prior art, the present application provides a method and device for acquiring device information of a GPU, applied to capturing the number and information of GPU devices in a server. The method is simple to operate and highly practical, and can reduce the time spent on manual operation and intervention.
The present application provides a method for acquiring information of GPU devices, the method comprising:
acquiring the number of GPU devices and the GPU IDs from a system layer and from a physical layer, respectively;
judging whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as the number of GPU devices and the GPU IDs acquired from the physical layer;
when they are judged to be the same, acquiring the device information of the GPUs from the system layer and from the physical layer, respectively.
Optionally, if the number of GPU devices and the GPU IDs acquired from the system layer differ from those acquired from the physical layer, the method further comprises:
interrupting the process and promptly issuing an error prompt.
Optionally, the device information of the GPU includes at least one of the following:
the bandwidth and rate of the GPU device, the number of GPUs, and the video card basic input/output system (VBIOS).
Optionally, after the device information of the GPUs is acquired from the system layer and from the physical layer, the method further comprises:
saving the device information acquired from the system layer and the device information acquired from the physical layer in different log files.
Optionally, the method further comprises sorting the GPU IDs.
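Sorting only makes the two ID lists comparable if both use the same notation. The following is a minimal sketch of one way to normalize and sort PCI addresses before comparison. The function names `normalize_bus_id` and `normalize_and_sort` and the sample addresses are assumptions made for illustration, and the sketch assumes that one tool may print a leading PCI domain and lower-case hex digits while the other does not.

```shell
# Hypothetical helper: reduce a PCI address to upper-case BB:DD.F form,
# dropping a leading domain such as "00000000:" or "0000:" if present.
normalize_bus_id() {
    printf '%s\n' "$1" | tr 'a-z' 'A-Z' |
        sed -E 's/^[0-9A-F]+:([0-9A-F]{2}:[0-9A-F]{2}\.[0-9A-F])$/\1/'
}

# Normalize every ID read from stdin, then sort so two lists can be compared.
normalize_and_sort() {
    while IFS= read -r id; do
        normalize_bus_id "$id"
    done | LC_ALL=C sort
}

# Sample input (made up): one address with a domain, one without.
printf '00000000:86:00.0\n3b:00.0\n' | normalize_and_sort
```

Fixing the locale with `LC_ALL=C` keeps the sort order identical for both lists regardless of the environment the script runs in.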
An embodiment of the present application also provides a device for acquiring information of GPU devices, the device comprising: a first acquisition module, a judgment module and a second acquisition module;
the first acquisition module is configured to acquire the number of GPU devices and the GPU IDs from a system layer and from a physical layer, respectively;
the judgment module is configured to judge whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as the number of GPU devices and the GPU IDs acquired from the physical layer;
the second acquisition module is configured to acquire, when they are judged to be the same, the device information of the GPUs from the system layer and from the physical layer, respectively.
Optionally, if the number of GPU devices and the GPU IDs acquired from the system layer differ from those acquired from the physical layer, the judgment module is further configured to:
interrupt the process and promptly issue an error prompt.
Optionally, the device information of the GPU includes at least one of the following:
the bandwidth and rate of the GPU device, the number of GPUs, and the video card basic input/output system (VBIOS).
Optionally, the device further comprises a storage module;
the storage module is configured to save the device information acquired from the system layer and the device information acquired from the physical layer in different log files.
Optionally, the first acquisition module and the second acquisition module are further configured to sort the GPU IDs.
Compared with the prior art, the present invention has at least the following advantages:
The present application provides a method for acquiring information of GPU devices, the method comprising: acquiring the number of GPU devices and the GPU IDs from a system layer and from a physical layer, respectively; judging whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as those acquired from the physical layer; and, if they are the same, acquiring the GPU information from the system layer and from the physical layer, respectively. The method is simple to operate and highly practical, and can reduce the time spent on manual operation and intervention.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments recorded in the present application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for acquiring information of GPU devices provided by embodiment one of the present application;
Fig. 2 is a schematic diagram of a device for acquiring information of GPU devices provided by embodiment two of the present application.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment one:
Embodiment one of the present application provides a method for acquiring information of GPU devices, which is described in detail below with reference to the drawings.
Referring to Fig. 1, which is a flow chart of a method for acquiring information of GPU devices provided by embodiment one of the present application.
The method comprises the following steps:
S101: acquire the number of GPU devices and the GPU IDs from the system layer and from the physical layer, respectively.
S102: judge whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as the number of GPU devices and the GPU IDs acquired from the physical layer.
S103: when they are judged to be the same, acquire the device information of the GPUs from the system layer and from the physical layer, respectively.
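The control flow of steps S101 to S103 can be sketched as a single shell function. This is a minimal illustration rather than the script of the embodiment: the function name `check_gpu_inventory` and the sample bus IDs are assumptions, and the two ID lists are passed in as strings instead of being collected from real tools.

```shell
# Hypothetical helper: compare the GPU IDs seen by the system layer ($1)
# with those seen by the physical layer ($2); one ID per line in each.
check_gpu_inventory() {
    local sys_ids phys_ids
    sys_ids=$(printf '%s\n' "$1" | LC_ALL=C sort)
    phys_ids=$(printf '%s\n' "$2" | LC_ALL=C sort)

    # S102: identical sorted lists imply the same count and the same IDs.
    if [ "$sys_ids" != "$phys_ids" ]; then
        echo "GPU count or IDs differ between system and physical layer" >&2
        return 1    # interrupt the flow and prompt the error
    fi

    # S103: only now is it worth collecting detailed device information.
    echo "match: $(printf '%s\n' "$sys_ids" | wc -l | tr -d ' ') GPU(s)"
}

# Sample IDs (made up for illustration), listed in different orders.
check_gpu_inventory "$(printf '3B:00.0\n86:00.0')" "$(printf '86:00.0\n3B:00.0')"
```

Comparing the sorted lists as a whole covers both conditions of S102 at once, since two lists with different lengths can never be equal.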
The implementation of the method of this embodiment is described in detail below:
1. Acquiring the number and IDs of GPU devices
The number and IDs of the GPU devices recognized by the server are acquired from the system layer with the nvidia-smi tool of the NVIDIA driver, and the number and bus IDs of the GPU devices recognized by the server are acquired from the physical layer with lspci. The two results are then compared.
If the two sets of device information differ, the program flow is interrupted and an error message is promptly output to alert the development or test engineer, who needs to locate the problem in time. If the compared information is identical, the program continues.
#!/bin/bash
Cur_Dir=$(dirname "$0")
# Delete the logs generated by earlier runs of the script
rm -rf "$Cur_Dir/GPU_pci_info.log" "$Cur_Dir/GPU_smi_info.log"
# System layer: get the GPU device IDs with the driver's nvidia-smi tool
nvidia-smi -a | grep -B1 "Product Name" | grep GPU | cut -d ":" -f 2- | tr 'a-z' 'A-Z' > "$Cur_Dir/smi_list_tmp"
# Physical layer: get the bus IDs of the GPU devices with lspci
lspci | grep NVIDIA | grep "VGA compatible controller" | awk '{print $1}' | tr 'a-z' 'A-Z' > "$Cur_Dir/pci_list_tmp"
# Sort the IDs obtained above
sort "$Cur_Dir/smi_list_tmp" > "$Cur_Dir/smi_list"
sort "$Cur_Dir/pci_list_tmp" > "$Cur_Dir/pci_list"
# Compare the number and addresses of the GPU IDs obtained in the two ways and respond accordingly
if ! diff "$Cur_Dir/smi_list" "$Cur_Dir/pci_list"; then
    echo "GPU Device Quality is Different in NVIDIA-SMI and PCI Device, Please Check It!!!!!"
    exit 1
fi
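When the comparison fails, it can help to know which IDs are missing on which side. The sketch below is one possible way to report this with the standard `comm` utility; the function name `report_id_mismatch`, the message wording, and the sample IDs are assumptions and are not part of the script above.

```shell
# Hypothetical helper: list IDs present in only one of two ID files.
# $1 = file with IDs from the system layer, $2 = file from the physical layer.
report_id_mismatch() {
    local sys_sorted phys_sorted status=0
    sys_sorted=$(mktemp) && phys_sorted=$(mktemp) || return 2
    LC_ALL=C sort "$1" > "$sys_sorted"
    LC_ALL=C sort "$2" > "$phys_sorted"

    # comm needs sorted input; -23 keeps lines unique to the first file,
    # -13 keeps lines unique to the second file.
    LC_ALL=C comm -23 "$sys_sorted" "$phys_sorted" | sed 's/^/only in system layer: /'
    LC_ALL=C comm -13 "$sys_sorted" "$phys_sorted" | sed 's/^/only in physical layer: /'

    cmp -s "$sys_sorted" "$phys_sorted" || status=1
    rm -f "$sys_sorted" "$phys_sorted"
    return "$status"
}

# Example with made-up IDs: the physical layer sees one extra device.
tmp_sys=$(mktemp); tmp_pci=$(mktemp)
printf '3B:00.0\n' > "$tmp_sys"
printf '3B:00.0\n86:00.0\n' > "$tmp_pci"
report_id_mismatch "$tmp_sys" "$tmp_pci" || echo "mismatch detected"
rm -f "$tmp_sys" "$tmp_pci"
```

The function returns a non-zero status on any difference, so a caller can still stop the flow and prompt the error as the method requires.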
2. Acquiring GPU device information
This part again captures the GPU device information through two routes: from the system layer with the nvidia-smi tool of the NVIDIA driver, and from the physical layer with lspci. The device information obtained through the two routes is saved in separate logs for later review and confirmation.
# Count the GPU devices
CNT=$(wc -l < "$Cur_Dir/smi_list")
# Save the IDs obtained through the two routes in separate arrays
for ((i=0; i<CNT; i++))
do
    smi_list_all[$i]=$(sed -n "$((i+1))p" "$Cur_Dir/smi_list")
    pci_list_all[$i]=$(sed -n "$((i+1))p" "$Cur_Dir/pci_list")
done
# Physical layer: lspci obtains the link rate and bandwidth information of each GPU device and appends it to GPU_pci_info.log
for ((i=0; i<CNT; i++))
do
    lspci -s "${pci_list_all[$i]}" -vvvv | sed -n 1p >> "$Cur_Dir/GPU_pci_info.log"
    lspci -s "${pci_list_all[$i]}" -vvvv | grep -E -w "LnkCap|LnkSta" >> "$Cur_Dir/GPU_pci_info.log"
done
# System layer: nvidia-smi obtains the product name and VBIOS software information of each GPU device and appends it to GPU_smi_info.log
nvidia-smi -a > "$Cur_Dir/GPU_tmp"
for ((i=0; i<CNT; i++))
do
    grep -i -A20 "${pci_list_all[$i]}" "$Cur_Dir/GPU_tmp" | grep -E "${pci_list_all[$i]}|Product Name|VBIOS" >> "$Cur_Dir/GPU_smi_info.log"
done
# Remove the temporary files
rm -rf "$Cur_Dir/smi_list_tmp" "$Cur_Dir/pci_list_tmp" "$Cur_Dir/GPU_tmp"
Note that the server test platform can use an Intel x86 processor, must boot normally, and must be able to run a RHEL 7.4 64-bit OS. Copy the shell script GPU_check.sh to any directory on the system and run "sh GPU_check.sh" to execute the test script. Check whether the error prompt "GPU Device Quality is Different in NVIDIA-SMI and PCI Device, Please Check It!" appears on the screen; if it does, analysis and inspection are required. Then check the result logs GPU_pci_info.log and GPU_smi_info.log to confirm that they are consistent with the actual GPU count and device information.
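The LnkCap and LnkSta lines saved in GPU_pci_info.log can also be compared automatically instead of by eye. The following is a minimal sketch under stated assumptions: the function names `link_speed_width` and `link_is_degraded` are hypothetical, and the sample lines are made up to resemble the "Speed ..., Width ..." layout that lspci prints for PCIe links.

```shell
# Hypothetical helper: pull "Speed" and "Width" out of one LnkCap/LnkSta line.
link_speed_width() {
    printf '%s\n' "$1" |
        sed -n -E 's/.*Speed ([0-9.]+GT\/s).*Width (x[0-9]+).*/\1 \2/p'
}

# Succeeds (exit 0) when the negotiated link differs from the capability.
link_is_degraded() {
    [ "$(link_speed_width "$1")" != "$(link_speed_width "$2")" ]
}

# Sample lines (made up, shaped like lspci -vv output).
cap='LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported'
sta='LnkSta: Speed 8GT/s, Width x8, TrErr- Train-'
if link_is_degraded "$cap" "$sta"; then
    echo "link degraded: capable of $(link_speed_width "$cap"), running at $(link_speed_width "$sta")"
fi
```

A link that trains at a lower speed or narrower width than its capability is one of the conditions a tester would otherwise have to spot manually in the log.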
This embodiment of the present application provides a method for acquiring information of GPU devices, the method comprising: acquiring the number of GPU devices and the GPU IDs from a system layer and from a physical layer, respectively; judging whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as those acquired from the physical layer; and, if they are the same, acquiring the GPU information from the system layer and from the physical layer, respectively. The method is simple to operate and highly practical, and can reduce the time spent on manual operation and intervention.
Embodiment two:
Based on the method for acquiring information of GPU devices provided by the above embodiment, embodiment two of the present application further provides a device for acquiring information of GPU devices, which is described in detail below with reference to the drawings.
Referring to Fig. 2, which is a schematic diagram of a device for acquiring information of GPU devices provided by embodiment two of the present application.
The device comprises: a first acquisition module 201, a judgment module 202 and a second acquisition module 203.
The first acquisition module 201 is configured to acquire the number of GPU devices and the GPU IDs from the system layer and from the physical layer, respectively.
The judgment module 202 is configured to judge whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as the number of GPU devices and the GPU IDs acquired from the physical layer.
If the number of GPU devices and the GPU IDs acquired from the system layer differ from those acquired from the physical layer, the judgment module 202 is further configured to:
interrupt the process and promptly issue an error prompt.
The second acquisition module 203 is configured to acquire, when they are judged to be the same, the device information of the GPUs from the system layer and from the physical layer, respectively.
The device information of the GPU includes at least one of the following:
the bandwidth and rate of the GPU device, the number of GPUs, and the video card basic input/output system (VBIOS).
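As an illustration of how one item of device information, such as the VBIOS version, might be extracted on the system-layer side, the sketch below reads text in the "Label : value" layout used by nvidia-smi query output. The function name `smi_field` is hypothetical and the sample text, including the product name and version string, is made up for illustration.

```shell
# Hypothetical helper: print the value of one "Label : value" field from
# nvidia-smi -q style text on stdin. $1 = exact label, e.g. "VBIOS Version".
smi_field() {
    awk -v key="$1" '
        {
            label = $0
            sub(/[ \t]*:.*$/, "", label)      # drop everything from the first colon
            sub(/^[ \t]+/, "", label)         # drop leading indentation
        }
        label == key {
            value = $0
            sub(/^[^:]*:[ \t]*/, "", value)   # keep what follows the first colon
            print value
        }
    '
}

# Sample text (made up) in the layout of a per-GPU section.
sample='GPU 00000000:3B:00.0
    Product Name                    : Tesla V100-SXM2-16GB
    VBIOS Version                   : 88.00.4F.00.09'
printf '%s\n' "$sample" | smi_field "VBIOS Version"
```

Matching on the exact label rather than a substring avoids picking up unrelated lines that merely contain the same word.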
Optionally, the device further comprises a storage module.
The storage module is configured to save the device information acquired from the system layer and the device information acquired from the physical layer in different log files.
The first acquisition module 201 and the second acquisition module 203 are further configured to sort the GPU IDs.
This embodiment of the present application provides a device for acquiring information of GPU devices. The device uses the first acquisition module to acquire the number of GPU devices and the GPU IDs from the system layer and from the physical layer, respectively; uses the judgment module to judge whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as those acquired from the physical layer; and, if they are the same, uses the second acquisition module to acquire the GPU information from the system layer and from the physical layer, respectively. The device is simple to operate and highly practical, and can reduce the time spent on manual operation and intervention.
The descriptions of the above embodiments each have their own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality of" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of those items, including any combination of a single item or multiple items. For example, at least one of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can each be single or multiple.
The above are only preferred embodiments of the present invention and do not limit the present invention in any form. Although the present invention has been disclosed above through preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the present invention, use the methods and technical content disclosed above to make many possible changes and modifications to the technical solution, or to modify it into equivalent embodiments. Therefore, any simple modification, equivalent change or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the present invention.

Claims (10)

1. A method for acquiring information of GPU devices, characterized in that the method comprises:
acquiring the number of GPU devices and the GPU IDs from a system layer and from a physical layer, respectively;
judging whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as the number of GPU devices and the GPU IDs acquired from the physical layer;
when they are judged to be the same, acquiring the device information of the GPUs from the system layer and from the physical layer, respectively.
2. The method for acquiring information of GPU devices according to claim 1, characterized in that, if the number of GPU devices and the GPU IDs acquired from the system layer differ from the number of GPU devices and the GPU IDs acquired from the physical layer, the method further comprises:
interrupting the process and promptly issuing an error prompt.
3. The method for acquiring information of GPU devices according to claim 1, characterized in that the device information of the GPU includes at least one of the following:
the bandwidth and rate of the GPU device, the number of GPUs, and the video card basic input/output system (VBIOS).
4. The method for acquiring information of GPU devices according to claim 1, characterized in that, after the device information of the GPUs is acquired from the system layer and from the physical layer, the method further comprises:
saving the device information acquired from the system layer and the device information acquired from the physical layer in different log files.
5. The method for acquiring information of GPU devices according to claim 1, characterized in that the method further comprises sorting the GPU IDs.
6. A device for acquiring information of GPU devices, characterized in that the device comprises: a first acquisition module, a judgment module and a second acquisition module;
the first acquisition module is configured to acquire the number of GPU devices and the GPU IDs from a system layer and from a physical layer, respectively;
the judgment module is configured to judge whether the number of GPU devices and the GPU IDs acquired from the system layer are the same as the number of GPU devices and the GPU IDs acquired from the physical layer;
the second acquisition module is configured to acquire, when they are judged to be the same, the device information of the GPUs from the system layer and from the physical layer, respectively.
7. The device for acquiring information of GPU devices according to claim 6, characterized in that, if the number of GPU devices and the GPU IDs acquired from the system layer differ from the number of GPU devices and the GPU IDs acquired from the physical layer, the judgment module is further configured to:
interrupt the process and promptly issue an error prompt.
8. The device for acquiring information of GPU devices according to claim 6, characterized in that the device information of the GPU includes at least one of the following:
the bandwidth and rate of the GPU device, the number of GPUs, and the video card basic input/output system (VBIOS).
9. The device for acquiring information of GPU devices according to claim 6, characterized in that the device further comprises a storage module;
the storage module is configured to save the device information acquired from the system layer and the device information acquired from the physical layer in different log files.
10. The device for acquiring information of GPU devices according to claim 6, characterized in that the first acquisition module and the second acquisition module are further configured to sort the GPU IDs.
CN201811385729.XA 2018-11-20 2018-11-20 Method and device for acquiring equipment information of GPU Pending CN109491871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811385729.XA CN109491871A (en) 2018-11-20 2018-11-20 Method and device for acquiring equipment information of GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811385729.XA CN109491871A (en) 2018-11-20 2018-11-20 Method and device for acquiring equipment information of GPU

Publications (1)

Publication Number Publication Date
CN109491871A true CN109491871A (en) 2019-03-19

Family

ID=65696406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811385729.XA Pending CN109491871A (en) 2018-11-20 2018-11-20 Method and device for acquiring equipment information of GPU

Country Status (1)

Country Link
CN (1) CN109491871A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150074668A1 (en) * 2013-09-09 2015-03-12 Apple Inc. Use of Multi-Thread Hardware For Efficient Sampling
CN104111886A (en) * 2014-06-25 2014-10-22 曙光信息产业(北京)有限公司 Management system compatible with different GPUs and design method thereof
CN107423183A (en) * 2017-04-25 2017-12-01 郑州云海信息技术有限公司 A kind of GTX series video card calculates the applied voltage test method of performance
CN108319539A (en) * 2018-02-28 2018-07-24 郑州云海信息技术有限公司 A kind of method and system generating GPU card slot position information
CN108776595A (en) * 2018-06-11 2018-11-09 郑州云海信息技术有限公司 A kind of recognition methods, device, equipment and the medium of the video card of GPU servers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANGLEI1006: "linux系统下查看显卡信息", 《HTTPS://BLOG.CSDN.NET/WANGLEIWAVESHARP》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502399A (en) * 2019-08-23 2019-11-26 广东浪潮大数据研究有限公司 Fault detection method and device
CN110502399B (en) * 2019-08-23 2023-09-01 广东浪潮大数据研究有限公司 Fault detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190319