CN111580976A - VASP resource calling method, system, equipment and medium - Google Patents


Info

Publication number
CN111580976A
CN111580976A
Authority
CN
China
Prior art keywords: called, gpus, cpus, gpu, vasp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010388254.0A
Other languages
Chinese (zh)
Inventor
王倩
刘羽
于占乐
杨振宇
李龙翔
崔坤磊
张敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010388254.0A
Publication of CN111580976A
Legal status: Withdrawn (current)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a VASP resource-calling method comprising the following steps: obtaining the architecture information of the GPUs of the platform where VASP is located and the number of GPUs to be called; performing environment initialization on the platform according to the architecture information and the number; acquiring the interconnection mode of the GPUs and the binding relationship between the GPUs and the CPUs; calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called; determining the number of CPUs bound to the called GPUs according to the binding relationship, and determining the number of CPUs to be called according to the number of GPUs to be called; and, in response to the number of CPUs to be called being less than the number of bound CPUs, calling the corresponding number of CPUs to be called from the bound CPUs. The invention also discloses a system, a computer device, and a readable storage medium. The scheme provided by the invention greatly reduces software redeployment time and the user's learning cost for the hardware platform, and achieves optimal scheduling of the CPU and GPU when running the VASP software.

Description

VASP resource calling method, system, equipment and medium
Technical Field
The invention relates to the field of VASP, and in particular to a method, system, device, and storage medium for calling VASP resources.
Background
Some scientific computing software, such as VASP and LAMMPS, has been successfully ported to Nvidia GPUs using the CUDA programming language through programmers' effort. Test results show that when VASP runs on a hybrid CPU+GPU heterogeneous supercomputer its performance improves greatly, but optimal scheduling of the CPU and GPU is not achieved.
Disclosure of Invention
In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a resource-calling method for VASP, comprising the following steps:
obtaining the architecture information of the GPUs of the platform where VASP is located and the number of GPUs to be called;
performing environment initialization on the platform according to the architecture information and the number;
acquiring the interconnection mode of the GPUs and the binding relationship between the GPUs and the CPUs;
calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called;
determining the number of CPUs bound to the called GPUs according to the binding relationship, and determining the number of CPUs to be called according to the number of GPUs to be called;
and, in response to the number of CPUs to be called being less than the number of bound CPUs, calling the corresponding number of CPUs to be called from the bound CPUs.
In some embodiments, further comprising:
and, in response to the number of CPUs to be called being not less than the number of bound CPUs, calling all the bound CPUs, and then calling other CPUs that have no binding relationship with the called GPUs until the number of called CPUs reaches the number of CPUs to be called.
In some embodiments, further comprising:
comparing the IBZKPT parameter in the VASP input file with the number of called GPUs;
in response to the IBZKPT parameter being not less than the number of called GPUs, adjusting the KPAR parameter to equal the number of called GPUs;
in response to the IBZKPT parameter being less than the number of called GPUs, determining the parity of the IBZKPT parameter;
in response to the IBZKPT parameter being odd, adjusting the KPAR parameter to one half of the number of called GPUs;
in response to the IBZKPT parameter being even, adjusting the KPAR parameter to equal the IBZKPT parameter.
In some embodiments, further comprising:
the NCORE parameter is adjusted to 1.
In some embodiments, calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called further comprises:
enabling persistence mode and the multi-process service (MPS) on the called GPUs;
and adjusting the clock frequency of each called GPU to its maximum according to that GPU's low-level information.
In some embodiments, obtaining the architecture information of the GPUs of the platform where VASP is located and the number of GPUs to be called further includes:
acquiring the number of all GPUs, their compute capability, the CUDA version, and the absolute path of the CUDA math library.
In some embodiments, further comprising:
determining whether a user-input number of GPUs has been received;
in response to receiving a user-input number of GPUs, taking that number as the number of GPUs to be called;
and in response to not receiving a user-input number of GPUs, taking the total number of GPUs as the number of GPUs to be called.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a resource invocation system of a VASP, including:
the first acquisition module is configured to acquire architecture information of a GPU of a platform where the VASP is located and the number of GPUs to be called;
an initialization module configured to perform environment initialization on the platform according to the architecture information and the number;
the second acquisition module is configured to acquire the interconnection mode of the GPU and the binding relationship between the GPU and the CPU;
the GPU calling module is configured to sequentially call the corresponding GPUs from high to low according to the priority order of the interconnection mode until the number of the called GPUs reaches the number of the GPUs to be called;
the determining module is configured to determine the number of the CPUs bound by the called GPUs according to the binding relation and determine the number of the CPUs to be called according to the number of the GPUs to be called;
a first response module configured to, in response to the number of CPUs to be called being less than the number of bound CPUs, call a corresponding number of CPUs to be called from the bound CPUs.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of any of the VASP resource invocation methods described above.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any of the VASP resource invocation methods described above.
The invention has the following beneficial technical effect: the scheme provided by the invention greatly reduces software redeployment time and the user's learning cost for the hardware platform, and achieves optimal scheduling of CPU and GPU resources when running the VASP software.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other embodiments from them without creative effort.
Fig. 1 is a schematic flowchart of a resource calling method of a VASP according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a resource invocation system of a VASP according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.
It should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used to distinguish two entities or parameters that share the same name. "First" and "second" are merely for convenience of description and should not be construed as limiting the embodiments of the present invention; subsequent embodiments do not repeat this note.
In the present embodiments, VASP (Vienna Ab-initio Simulation Package) is commercial software for materials simulation and computational materials science research. CUDA (Compute Unified Device Architecture) is Nvidia's unified computing device architecture. KPAR, IBZKPT and NCORE are input parameters in INCAR, the VASP input file: KPAR is the k-space parallelization parameter, IBZKPT represents the number of non-degenerate (irreducible) k-points in the Brillouin zone, and NCORE means that NCORE compute cores process the same energy band.
According to an aspect of the present invention, an embodiment provides a resource-calling method for VASP which, as shown in fig. 1, may include the steps of: S1, obtaining the architecture information of the GPUs of the platform where VASP is located and the number of GPUs to be called; S2, performing environment initialization on the platform according to the architecture information and the number; S3, acquiring the interconnection mode of the GPUs and the binding relationship between the GPUs and the CPUs; S4, calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called; S5, determining the number of CPUs bound to the called GPUs according to the binding relationship, and determining the number of CPUs to be called according to the number of GPUs to be called; S6, in response to the number of CPUs to be called being less than the number of bound CPUs, calling the corresponding number of CPUs to be called from the bound CPUs.
The scheme provided by the invention achieves a seamless migration of VASP computation from a CPU platform to a GPU platform with one-click, foolproof operation: configuration of the GPU computing environment variables, optimized scheduling of computing resources, and optimization of the VASP software through to direct output of the computed results. Even a user unfamiliar with GPU platforms can migrate a computing task directly to a computing platform equipped with GPU accelerator cards for simulation, which greatly reduces software redeployment time and the user's learning cost for the hardware platform.
In some embodiments, step S1, acquiring the architecture information of the GPUs of the platform where VASP is located and the number of GPUs to be called, further includes:
acquiring the number of all GPUs, their compute capability, the CUDA version, and the absolute path of the CUDA math library.
Specifically, the GPU architecture of the computing platform may be detected with the nvidia-smi tool. The detected content may include the number of GPU cards, their compute capability, the CUDA version, and the absolute path of the CUDA math library on each CPU host; these are added to the Linux environment variables so that step S2 can initialize the platform environment according to the architecture information and the number.
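The detection step above can be sketched as follows. This is a minimal illustration, not code from the patent: the helper name is hypothetical, and it assumes the `--query-gpu=name,compute_cap` fields of recent nvidia-smi drivers; an optional `raw_csv` argument lets the parsing be exercised without a GPU present.

```python
import subprocess

def detect_gpu_info(raw_csv=None):
    """Detect GPU name and compute capability via nvidia-smi.

    `raw_csv` injects canned output for testing; in real use the function
    shells out to nvidia-smi (query field names are an assumption about
    the installed driver version).
    """
    if raw_csv is None:
        raw_csv = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=name,compute_cap",
             "--format=csv,noheader"], text=True)
    gpus = []
    for line in raw_csv.strip().splitlines():
        name, cap = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "compute_cap": cap})
    return gpus

# Canned output resembling what nvidia-smi might print on a 2-GPU node:
sample = "Tesla V100-SXM2-32GB, 7.0\nTesla V100-SXM2-32GB, 7.0\n"
info = detect_gpu_info(sample)
```

The GPU count is then simply `len(info)`; the CUDA version and math-library path would come from the environment rather than this query.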
In some embodiments, the method further comprises:
determining whether a user-input number of GPUs has been received;
in response to receiving a user-input number of GPUs, taking that number as the number of GPUs to be called;
and in response to not receiving a user-input number of GPUs, taking the total number of GPUs as the number of GPUs to be called.
Specifically, the default number of GPUs to be called is the total number of GPUs, so if the user does not input a number, all GPUs are called. In other words, the number of GPUs to be called may be the detected number of GPU cards configured on each CPU host (i.e., all GPUs are called), or the user may input a number according to actual need so that only some of the GPUs are called.
It should be noted that if the user-input number of GPUs exceeds the actual total number of GPUs, a corresponding prompt is returned.
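The default-versus-user-input rule above fits in a few lines. A hypothetical sketch (the function name and the choice of raising an exception for the over-request prompt are ours, not the patent's):

```python
def gpus_to_call(total_gpus, user_request=None):
    """Return the number of GPUs to call: the user's request if given and
    valid, otherwise all detected GPUs (the default described above)."""
    if user_request is None:
        return total_gpus            # default: call every GPU
    if user_request > total_gpus:
        # corresponds to the "corresponding prompt" in the text
        raise ValueError(
            f"requested {user_request} GPUs but only {total_gpus} present")
    return user_request
```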
In some embodiments, after initialization completes, the GPU version of VASP is compiled. Specifically, a suitable C/C++ compiler is first selected; then optimization options are set according to the underlying system architecture and compiler type to optimize the VASP application software at the code level; next, the fast Fourier transform (FFTW) math library is configured; then the GPU information detected in step S1 is added, the CUDA options of the build files are configured, and the relevant nvcc compiler and CUDA math library optimization options are added according to the GPU architecture. With this information, the automatic optimized compilation of the VASP software on the current platform can be completed.
In some embodiments, in step S3, to obtain the interconnection mode of the GPUs and the binding relationship between the GPUs and the CPUs, the interconnection topology of the GPU cards may be detected with the nvidia-smi tool, and the interconnection mode among the GPUs and the binding relationship between GPUs and CPUs determined, so that computing resources can be optimally scheduled.
In some embodiments, in step S4, the corresponding GPUs are called in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called. Specifically, the priority order of GPU interconnections, from high to low, may be: interconnection through NVSwitch; interconnection through NVLink; interconnection through PCIe; and interconnection through PCIe that can communicate only by traversing the CPU host. GPUs interconnected through NVSwitch are scheduled first: they can exchange data directly point-to-point (P2P), so data transmission is fastest and performance is best. Next, GPUs performing point-to-point data transmission through NVLink are scheduled; then GPUs interconnected through PCIe; and finally GPUs whose PCIe path must traverse the CPU host, a level that is avoided as much as possible.
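The four-tier priority can be sketched as a ranked selection. The tier labels below are our own; mapping them onto the link-type codes that `nvidia-smi topo -m` prints (NV#, PIX, PHB, SYS) is an assumption, not something the patent specifies:

```python
# Interconnect priority, highest first, mirroring the order in the text:
# NVSwitch > NVLink > same-switch PCIe > PCIe across the CPU/host bridge.
LINK_PRIORITY = {"NVSWITCH": 0, "NVLINK": 1, "PCIE": 2, "PCIE_HOST": 3}

def pick_gpus(links, wanted):
    """links: {gpu_id: interconnect type}. Return `wanted` GPU ids,
    best-connected first, as in the scheduling order described above."""
    ranked = sorted(links, key=lambda gpu: LINK_PRIORITY[links[gpu]])
    return ranked[:wanted]

# Example topology: GPU 2 sits on an NVSwitch, GPU 1 must cross the host.
links = {0: "NVLINK", 1: "PCIE_HOST", 2: "NVSWITCH", 3: "PCIE"}
```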
In some embodiments, step S4, calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called, further includes:
enabling persistence mode and the multi-process service (MPS) on the called GPUs;
and adjusting the clock frequency of each called GPU to its maximum according to that GPU's low-level information.
Specifically, the nvidia-smi tool can be used to turn on GPU persistence mode and to start the GPU's MPS multi-process service. The low-level information of the GPU architecture is also detected with nvidia-smi; the supported clock frequencies of each GPU are read from that information, and the clock frequencies are automatically raised to their maxima.
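As one possible concretization of this step, the commands below use the public `nvidia-smi` flags (`-pm` for persistence mode, `-ac` for application clocks, `-i` for device selection) and the standard MPS control daemon; the helper only assembles the command strings, and the clock values would in practice be read from `nvidia-smi -q -d SUPPORTED_CLOCKS`. The function name is hypothetical.

```python
def tuning_commands(gpu_ids, max_mem_clock, max_sm_clock):
    """Build the shell commands that enable persistence mode, start the
    MPS daemon, and pin application clocks to their maxima for the
    called GPUs. Clock values are assumed to come from the supported-
    clocks query beforehand."""
    ids = ",".join(str(i) for i in gpu_ids)
    return [
        f"nvidia-smi -i {ids} -pm 1",              # persistence mode on
        "nvidia-cuda-mps-control -d",              # start the MPS daemon
        f"nvidia-smi -i {ids} -ac {max_mem_clock},{max_sm_clock}",
    ]

cmds = tuning_commands([0, 1], 877, 1530)
```

Running these would require root (or appropriately delegated) privileges on the GPU node.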
In some embodiments, in step S5, the number of CPUs bound to the called GPUs is determined from the binding relationship, and the number of CPUs to be called is determined from the number of called GPUs. Specifically, each GPU is bound to several CPUs, so the called GPUs determine how many bound CPUs are available to call; to achieve optimal resource scheduling, the number of called GPUs in turn determines the number of CPUs to be called.
In some embodiments, when the number of CPUs to be called is determined from the number of called GPUs, an optimal number can be obtained by test runs of the VASP input file INCAR. To speed up the test, the input parameter PREC = Low can be set (PREC is a VASP input parameter) and the number of electronic steps limited to 1 for the performance test. The first test may call 1 CPU; the test is then repeated, increasing the CPU count each time, and exits once the current run is no longer faster than the previous run. The CPU count of the last (fastest) run is taken as the number to be called.
In some embodiments, the optimal CPU count for each possible GPU count may also be obtained by testing in advance, so that the optimal number of CPUs is called directly after the user inputs the number of GPUs to be called.
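The incremental timing test can be expressed as a short loop. This is a sketch of the search strategy only; `benchmark(n)` stands in for an actual PREC = Low, single-electronic-step VASP run with n CPUs, and all names are illustrative:

```python
def tune_cpu_count(benchmark, max_cpus):
    """Start from 1 CPU and keep increasing the count while the test run
    gets faster; return the last count that still improved, as described
    above. `benchmark(n)` returns the wall time of a run with n CPUs."""
    best_n, best_t = 1, benchmark(1)
    for n in range(2, max_cpus + 1):
        t = benchmark(n)
        if t >= best_t:      # no longer faster: stop, keep previous count
            break
        best_n, best_t = n, t
    return best_n

# Stubbed timings: runtime improves up to 4 CPUs, then degrades.
times = {1: 100.0, 2: 60.0, 3: 45.0, 4: 40.0, 5: 41.0, 6: 43.0}
```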
In some embodiments, the method further comprises:
and, in response to the number of CPUs to be called being not less than the number of bound CPUs, calling all the bound CPUs, and then calling other CPUs that have no binding relationship with the called GPUs until the number of called CPUs reaches the number of CPUs to be called.
Specifically, when the CPUs are called, CPUs that have a binding relationship with the called GPUs are called first; an MPI tool can then be used to pin processes to these specific CPUs, realizing targeted CPU binding so that data exchange between the GPUs and CPUs is fastest.
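The two response branches (bound CPUs first, then unbound ones) reduce to a simple selection. A hypothetical sketch under the assumption that the binding map has already been flattened into two id lists:

```python
def pick_cpus(bound, others, wanted):
    """Pick `wanted` CPU ids, preferring CPUs bound to the called GPUs
    and falling back to unbound CPUs, per the two branches above."""
    if wanted <= len(bound):
        return bound[:wanted]
    return bound + others[: wanted - len(bound)]

# Example: GPUs are bound to CPUs 0-7; CPUs 8-15 sit on another socket.
bound, others = list(range(8)), list(range(8, 16))
```

The returned ids would then be handed to an MPI launcher's binding options to pin ranks accordingly.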
In some embodiments, the method further comprises:
comparing the IBZKPT parameter in the VASP input file with the number of called GPUs;
in response to the IBZKPT parameter being not less than the number of called GPUs, adjusting the KPAR parameter to equal the number of called GPUs;
in response to the IBZKPT parameter being less than the number of called GPUs, determining the parity of the IBZKPT parameter;
in response to the IBZKPT parameter being odd, adjusting the KPAR parameter to one half of the number of called GPUs;
in response to the IBZKPT parameter being even, adjusting the KPAR parameter to equal the IBZKPT parameter.
Specifically, after resource scheduling, the input parameters KPAR and IBZKPT in the VASP input file INCAR can be tuned to achieve higher concurrency and faster computation.
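The KPAR branching described above can be written directly as a function. This is a sketch of the patent's rule only, not VASP code, and the function name is ours:

```python
def tune_kpar(ibzkpt, n_gpus):
    """Adjust KPAR from IBZKPT and the number of called GPUs, following
    the rules stated above."""
    if ibzkpt >= n_gpus:
        return n_gpus            # IBZKPT >= GPUs: KPAR = GPU count
    if ibzkpt % 2 == 1:          # IBZKPT < GPUs and odd
        return n_gpus // 2       # KPAR = half the GPU count
    return ibzkpt                # IBZKPT < GPUs and even: KPAR = IBZKPT
```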
In some embodiments, the NCORE parameter may also be adjusted to 1; otherwise the program exits with an error.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a resource invocation system 400 of a VASP, as shown in fig. 2, including:
a first obtaining module 401, where the first obtaining module 401 is configured to obtain architecture information of a GPU of a platform where the VASP is located and the number of GPUs to be called;
an initialization module 402, the initialization module 402 configured to perform environment initialization on the platform according to the architecture information and the quantity;
a second obtaining module 403, where the second obtaining module 403 is configured to obtain an interconnection manner of the GPU and a binding relationship between the GPU and the CPU;
a GPU calling module 404, where the GPU calling module 404 is configured to call the corresponding GPUs in sequence from high to low according to the priority order of the interconnection manner until the number of called GPUs reaches the number of GPUs to be called;
a determining module 405, where the determining module 405 is configured to determine the number of CPUs bound by the called GPU according to the binding relationship and determine the number of CPUs to be called according to the number of GPUs to be called;
a first response module 406, where the first response module 406 is configured to, in response to the number of CPUs to be called being less than the number of bound CPUs, call a corresponding number of CPUs to be called from the bound CPUs.
In some embodiments, further comprising a second response module configured to:
and, in response to the number of CPUs to be called being not less than the number of bound CPUs, call all the bound CPUs, and then call other CPUs that have no binding relationship with the called GPUs until the number of called CPUs reaches the number of CPUs to be called.
In some embodiments, further comprising a parameter adjustment module configured to:
compare the IBZKPT parameter in the VASP input file with the number of called GPUs;
in response to the IBZKPT parameter being not less than the number of called GPUs, adjust the KPAR parameter to equal the number of called GPUs;
in response to the IBZKPT parameter being less than the number of called GPUs, determine the parity of the IBZKPT parameter;
in response to the IBZKPT parameter being odd, adjust the KPAR parameter to one half of the number of called GPUs;
in response to the IBZKPT parameter being even, adjust the KPAR parameter to equal the IBZKPT parameter.
In some embodiments, the GPU call module is further configured to:
enable persistence mode and the multi-process service (MPS) on the called GPUs;
and adjust the clock frequency of each called GPU to its maximum according to that GPU's low-level information.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:
at least one processor 520; and
a memory 510, the memory 510 storing a computer program 511 executable on the processor, the processor 520 executing the program to perform the steps of any of the above resource invocation methods of the VASP.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, where the computer-readable storage medium 601 stores computer program instructions 610, and the computer program instructions 610, when executed by a processor, perform the steps of the resource calling method of the VASP as any one of the above.
Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program to instruct related hardware to implement the methods. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the above-described method embodiments.
In addition, the apparatuses, devices, and the like disclosed in the embodiments of the present invention may be various electronic terminal devices, such as a mobile phone, a Personal Digital Assistant (PDA), a tablet computer (PAD), a smart television, and the like, or may be a large terminal device, such as a server, and the like, and therefore the scope of protection disclosed in the embodiments of the present invention should not be limited to a specific type of apparatus, device. The client disclosed by the embodiment of the invention can be applied to any one of the electronic terminal devices in the form of electronic hardware, computer software or a combination of the electronic hardware and the computer software.
Furthermore, the method disclosed according to an embodiment of the present invention may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. The computer program, when executed by the CPU, performs the above-described functions defined in the method disclosed in the embodiments of the present invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory. By way of example, and not limitation, nonvolatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM), which can act as external cache memory. By way of example and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The storage devices of the disclosed aspects are intended to comprise, without being limited to, these and other suitable types of memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
The various illustrative logical blocks, modules, and circuits described in connection with the disclosure herein may be implemented or performed with the following components designed to perform the functions herein: a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination of these components. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP, and/or any other such configuration.
The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium. Computer-readable media includes both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer, or a general-purpose or special-purpose processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The sequence numbers of the embodiments disclosed in the embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disk.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the embodiments of the invention, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments exist that are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (10)

1. A VASP resource calling method, comprising the following steps:
obtaining architecture information of the GPUs of the platform on which the VASP is deployed and the number of GPUs to be called;
performing environment initialization on the platform according to the architecture information and the number;
acquiring the interconnection mode of the GPUs and the binding relationship between the GPUs and the CPUs;
calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called;
determining the number of CPUs bound to the called GPUs according to the binding relationship, and determining the number of CPUs to be called according to the number of GPUs to be called;
and in response to the number of CPUs to be called being smaller than the number of bound CPUs, calling the required number of CPUs from the bound CPUs.
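As an illustration of the GPU-selection step above, a minimal Python sketch, assuming the interconnection modes carry the labels reported by `nvidia-smi topo -m` (NV*, PIX, PXB, PHB, SYS, from fastest to slowest); the label set, its ordering, and the dictionary layout are assumptions for illustration, not part of the claim:

```python
def pick_gpus(gpus_by_link, n_gpus_needed):
    """Call GPUs in descending order of interconnect priority until
    the requested count is reached (or all GPUs are exhausted)."""
    # Assumed priority, fastest link first: NVLink, same PCIe switch,
    # multiple PCIe bridges, host bridge, cross-socket/system.
    priority = ["NV", "PIX", "PXB", "PHB", "SYS"]
    called = []
    for link in priority:
        for gpu in gpus_by_link.get(link, []):
            if len(called) == n_gpus_needed:
                return called
            called.append(gpu)
    return called  # fewer GPUs available than requested
```

A launcher would build `gpus_by_link` once from the topology query, then pass the result to the CPU-binding step.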
2. The method of claim 1, further comprising:
and in response to the number of CPUs to be called being not less than the number of bound CPUs, calling all the bound CPUs, and then calling other CPUs having no binding relationship with the called GPUs until the number of called CPUs reaches the number of CPUs to be called.
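The CPU-allocation rule of claims 1 and 2 can be sketched as a single function; the function name and the list-based bookkeeping are illustrative assumptions:

```python
def allocate_cpus(bound_cpus, other_cpus, n_cpus_needed):
    """Pick CPUs for the called GPUs: prefer CPUs bound (NUMA-affine)
    to those GPUs, spilling over to unbound CPUs only when necessary."""
    if n_cpus_needed < len(bound_cpus):
        # Claim 1: fewer CPUs needed than bound -> take them from the bound set.
        return bound_cpus[:n_cpus_needed]
    # Claim 2: bound set is not enough (or exactly enough) -> call all bound
    # CPUs, then fill the remainder from CPUs with no binding to the called GPUs.
    extra = n_cpus_needed - len(bound_cpus)
    return bound_cpus + other_cpus[:extra]
```

Keeping GPU-bound CPUs first preserves locality between each GPU and the host cores feeding it.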
3. The method of claim 1, further comprising:
comparing the IBZKPT parameter in the input file of the VASP with the number of called GPUs;
in response to the IBZKPT parameter being not less than the number of called GPUs, adjusting the KPAR parameter to equal the number of called GPUs;
in response to the IBZKPT parameter being less than the number of called GPUs, determining the parity of the IBZKPT parameter;
in response to the IBZKPT parameter being odd, adjusting the KPAR parameter to one half of the number of called GPUs;
and in response to the IBZKPT parameter being even, adjusting the KPAR parameter to equal the IBZKPT parameter.
4. The method of claim 3, further comprising:
the NCORE parameter is adjusted to 1.
5. The method of claim 1, wherein calling the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called further comprises:
enabling persistence mode and the multi-process service on the called GPUs;
and adjusting the clock frequency of each called GPU to its maximum value according to the underlying information of that GPU.
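One plausible realization of claim 5 is to issue `nvidia-smi` and CUDA MPS commands for each called GPU. The helper below only builds the command strings (it does not run them); the specific flags (`-pm` for persistence mode, `-lgc` for locking GPU clocks) follow the public `nvidia-smi` CLI, and using them here is an assumption about how the claimed step might be implemented:

```python
def gpu_setup_commands(gpu_ids, max_sm_clock_mhz):
    """Return the shell commands a launcher might issue per claim 5:
    start the CUDA Multi-Process Service, then enable persistence mode
    and pin each called GPU's SM clock to its reported maximum."""
    cmds = ["nvidia-cuda-mps-control -d"]  # start the multi-process service daemon
    for gpu, mhz in zip(gpu_ids, max_sm_clock_mhz):
        cmds.append(f"nvidia-smi -i {gpu} -pm 1")             # persistence mode on
        cmds.append(f"nvidia-smi -i {gpu} -lgc {mhz},{mhz}")  # lock SM clock to max
    return cmds
```

The maximum clock per GPU would come from the "bottom layer information" query (e.g., `nvidia-smi -q -d CLOCK`) performed earlier.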
6. The method of claim 1, wherein obtaining architecture information of the GPUs of the platform on which the VASP is deployed and the number of GPUs to be called further comprises:
acquiring the number of all GPUs, the computing power of the GPUs, the CUDA version, and the absolute path of the CUDA math library.
7. The method of claim 6, further comprising:
determining whether a number of GPUs input by a user has been received;
in response to receiving the number of GPUs input by the user, taking the number of GPUs input by the user as the number of GPUs to be called;
and in response to not receiving a number of GPUs input by the user, taking the number of all GPUs as the number of GPUs to be called.
8. A VASP resource calling system, comprising:
a first acquisition module configured to acquire architecture information of the GPUs of the platform on which the VASP is deployed and the number of GPUs to be called;
an initialization module configured to perform environment initialization on the platform according to the architecture information and the number;
a second acquisition module configured to acquire the interconnection mode of the GPUs and the binding relationship between the GPUs and the CPUs;
a GPU calling module configured to call the corresponding GPUs in descending order of interconnection-mode priority until the number of called GPUs reaches the number of GPUs to be called;
a determining module configured to determine the number of CPUs bound to the called GPUs according to the binding relationship and determine the number of CPUs to be called according to the number of GPUs to be called;
and a first response module configured to, in response to the number of CPUs to be called being smaller than the number of bound CPUs, call the required number of CPUs from the bound CPUs.
9. A computer device, comprising:
at least one processor; and
a memory storing a computer program operable on the processor, wherein the processor, when executing the program, performs the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 7.
CN202010388254.0A 2020-05-09 2020-05-09 VASP resource calling method, system, equipment and medium Withdrawn CN111580976A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010388254.0A CN111580976A (en) 2020-05-09 2020-05-09 VASP resource calling method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010388254.0A CN111580976A (en) 2020-05-09 2020-05-09 VASP resource calling method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN111580976A true CN111580976A (en) 2020-08-25

Family

ID=72123767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010388254.0A Withdrawn CN111580976A (en) 2020-05-09 2020-05-09 VASP resource calling method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN111580976A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860228A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 Eigenvalue solving method and device, computer equipment and storage medium
CN112860228B (en) * 2021-02-09 2024-02-02 山东英信计算机技术有限公司 Eigenvalue solving method, eigenvalue solving device, computer equipment and storage medium
CN113704070A (en) * 2021-07-26 2021-11-26 苏州浪潮智能科技有限公司 Performance test method, device and equipment of GPU (graphics processing Unit) server and readable storage medium
CN113704070B (en) * 2021-07-26 2023-07-25 苏州浪潮智能科技有限公司 Performance test method, device and equipment of GPU server and readable storage medium

Similar Documents

Publication Publication Date Title
US10606738B2 (en) Application testing on a blockchain
CN110704037B (en) Rule engine implementation method and device
CN110599183B (en) Intelligent contract calling method and device and storage medium
CN108717374B (en) Method and device for preheating during starting of Java virtual machine and computer equipment
WO2016204865A1 (en) Generating object code from intermediate code that includes hierarchical sub-routine information
CN111580976A (en) VASP resource calling method, system, equipment and medium
CN108334340B (en) Cross-environment data communication method, device and system
CN112181378B (en) Method and device for realizing business process
JP2021111313A (en) Information processing method and apparatus
CN113986402A (en) Function calling method and device, electronic equipment and storage medium
CN111949297B (en) Block chain intelligent contract upgrading method and device and electronic equipment
CN111506904B (en) Method and device for online bug repair
CN113821194A (en) Micro front-end system
CN111143258B (en) Method, system, device and medium for accessing FPGA (field programmable Gate array) by system based on Opencl
CN110968339B (en) Method and device for front-end building tool and electronic equipment
CN116976432A (en) Chip simulation method and device supporting task parallel processing and chip simulator
CN112988604B (en) Object testing method, testing system, electronic device and readable storage medium
CN114327941A (en) Service providing method and device
CN108804221B (en) Embedded system based on XIP mode and resource optimization method thereof
CN111984247A (en) Service processing method and device and electronic equipment
CN110688430A (en) Method and device for obtaining data bypass and electronic equipment
CN111796865A (en) Byte code file modification method and device, terminal equipment and medium
CN112463798B (en) Cross-database data extraction method and device, electronic equipment and storage medium
CN117215966B (en) Test method and test device for chip SDK interface and electronic equipment
CN112685097B (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200825)