CN116800808A - GPU resource calling method and device, electronic equipment and storage medium - Google Patents

GPU resource calling method and device, electronic equipment and storage medium

Info

Publication number
CN116800808A
Authority
CN
China
Prior art keywords
service
gpu
container
gpu resource
identifier
Prior art date
Legal status
Pending
Application number
CN202310739140.XA
Other languages
Chinese (zh)
Inventor
Wu Huimin (吴慧敏)
Lu Zhaoxu (卢照旭)
Fan Huiyang (范会杨)
Zhao Jian (赵健)
Guo Xiaochun (过晓春)
Current Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Unicom Cloud Data Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Unicom Digital Technology Co Ltd
Unicom Cloud Data Co Ltd
Priority date
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd, Unicom Digital Technology Co Ltd, Unicom Cloud Data Co Ltd
Priority to CN202310739140.XA
Publication of CN116800808A

Abstract

The application provides a GPU resource calling method and device, an electronic device, and a storage medium, applied to a first electronic device of a kubernetes cloud platform, where the first electronic device includes a local GPU resource. The method includes: at a first moment, receiving a request of a first artificial intelligence (AI) service, where the request of the first AI service includes a first identifier, and the first identifier indicates that the first AI service is a service that needs to call the local GPU resource; calling the local GPU resource through a first AI service container according to the first identifier; at a second moment, receiving a request of a second AI service, where the request of the second AI service includes a second identifier, and the second identifier indicates that the second AI service is a service that needs to remotely call a GPU resource; and remotely calling a target remote GPU resource through a second AI service container according to the second identifier. In this way, the utilization rate of GPU resources is improved, and the efficiency of GPU resource calling is improved.

Description

GPU resource calling method and device, electronic equipment and storage medium
Technical Field
The application relates to the technical field of kubernetes cloud platforms, and in particular to a GPU resource calling method and device, an electronic device, and a storage medium.
Background
The Kubernetes (K8S) cloud platform, as a distributed operating system, can reduce IT operation and maintenance costs and shorten service delivery cycles; as a PaaS platform integrating agile cloud development, application management, elastic scaling, resource monitoring, micro-service governance, and other functions, it has gradually become an important platform for carrying services on the cloud. The Kubernetes cloud platform is therefore widely applied in various fields, in particular in the technical field of artificial intelligence (Artificial Intelligence, abbreviated as AI).
In a kubernetes cloud platform, an ordinary service container can meet its demand with the computing power of a central processing unit (Central Processing Unit, abbreviated as CPU), whereas an AI service container may require the high computing power of a graphics processor (Graphics Processing Unit, abbreviated as GPU). Because the service scenarios of AI services are highly complex, they place high demands on the efficiency of GPU resource calling. How to efficiently manage and call GPU resources is therefore a problem to be solved for AI service containers.
Disclosure of Invention
The embodiments of the present application provide a GPU resource calling method and device, an electronic device, and a storage medium, so that GPU resources can be called efficiently for different AI service scenarios.
In a first aspect, an embodiment of the present application provides a GPU resource calling method, which is applied to a first electronic device, where the first electronic device is a device in a kubernetes cloud platform and includes a local graphics processor (GPU) resource, and the GPU resource calling method includes:
at a first moment, a request of a first Artificial Intelligence (AI) service is received, wherein the request of the first AI service comprises a first identifier, and the first identifier is used for indicating that the first AI service is a service needing to call the local GPU resource;
according to the first identifier, the local GPU resource is called through a first AI service container, wherein the first AI service container is a container running an application program for processing the first AI service;
at a second moment, a request of a second AI service is received, wherein the request of the second AI service comprises a second identifier, and the second identifier is used for indicating that the second AI service is a service requiring remote call of GPU resources;
determining a target remote GPU resource from a plurality of remote GPU resources according to the second identifier;
and remotely calling the target remote GPU resource through a second AI service container, wherein the second AI service container is a container for running an application program for processing the second AI service.
In a possible implementation manner, the calling, according to the first identifier, the local GPU resource through the first AI service container includes:
acquiring a hijacked first interface in a kubernetes device plugin component through the first AI service container according to the first identifier, wherein the kubernetes device plugin component comprises the correspondence between a plurality of AI service identifiers and the hijacked interfaces;
and calling the hijacked first interface through the first AI service container to call the local GPU resource.
In one possible implementation, the calling the local GPU resource through the first AI service container includes:
and calling the local GPU resources in a time division multiplexing mode through the first AI service container.
In a possible implementation manner, the remotely calling, according to the second identifier, the target remote GPU resource through a second AI service container includes:
acquiring a hijacked second interface in the kubernetes device plugin component through the second AI service container according to the second identifier;
and calling the hijacked second interface in a target kubernetes server through the second AI service container by utilizing a network so as to remotely call the target remote GPU resource, wherein the target kubernetes server is the server where the target remote GPU resource is located.
In a possible implementation manner, the remotely calling the target remote GPU resource through the second AI service container further includes:
and remotely calling the target remote GPU resource by adopting a time division multiplexing mode through the second AI service container.
In a possible implementation manner, the method further includes:
receiving a first operation of the first AI service;
responding to the first operation, and setting a first identifier for the first AI service;
receiving a second operation on the second AI service;
and responding to the second operation, and setting a second identifier for the second AI service.
In a possible implementation manner, the method further includes:
and at a third moment, receiving a remote call of a third AI service container of a second electronic device to the local GPU resource, so as to provide the local GPU resource remotely to the third AI service container, where the second electronic device is a device in the kubernetes cloud platform, an application program for processing a third AI service runs in the third AI service container, and the second electronic device does not include GPU resources.
In a second aspect, an embodiment of the present application provides a GPU resource calling device, where the GPU resource calling device includes:
The receiving module is used for receiving a request of a first AI service at a first moment, wherein the request of the first AI service comprises a first identifier, and the first identifier is used for indicating that the first AI service is a service needing to call local GPU resources;
and the calling module is used for calling the local GPU resource through a first AI service container according to the first identifier, wherein the first AI service container is a container for running an application program for processing the first AI service.
The receiving module is further configured to receive a request for a second AI service at a second moment, where the request for the second AI service includes a second identifier, and the second identifier is used to indicate that the second AI service is a service that needs to remotely invoke GPU resources.
And the processing module is used for determining target remote GPU resources from the plurality of remote GPU resources according to the second identification.
The calling module is further configured to remotely call the target remote GPU resource through a second AI service container, where the second AI service container is a container running an application program for processing the second AI service.
In a possible implementation manner, the calling module is specifically configured to obtain, according to the first identifier, a hijacked first interface in a kubernetes device plugin component through the first AI service container, where the kubernetes device plugin component includes correspondence between identifiers of a plurality of AI services and the hijacked interface; and calling the hijacked first interface through the first AI service container to call the local GPU resource.
In a possible implementation manner, the calling module is specifically configured to call the local GPU resource by using a time division multiplexing manner through the first AI service container.
In a possible implementation manner, the calling module is specifically configured to obtain, according to the second identifier, a hijacked second interface in the kubernetes device plugin component through the second AI service container; and calling the hijacked second interface in a target kubernetes server through the second AI service container by utilizing a network so as to remotely call the target remote GPU resource, wherein the target kubernetes server is the server where the target remote GPU resource is located.
In a possible implementation manner, the calling module is specifically configured to remotely call the target remote GPU resource by using a time division multiplexing manner through the second AI service container.
In a possible implementation manner, the receiving module is further configured to receive a first operation on the first AI service.
The processing module is further configured to set a first identifier for the first AI service in response to the first operation.
The receiving module is further configured to receive a second operation on the second AI service.
The processing module is further configured to set a second identifier for the second AI service in response to the second operation.
In a possible implementation manner, the receiving module is further configured to receive, at a third moment, a remote call from a third AI service container of a second electronic device to the local GPU resource, so as to provide the local GPU resource remotely to the third AI service container, where the second electronic device is a device in the kubernetes cloud platform, an application for processing a third AI service is running in the third AI service container, and the second electronic device does not include GPU resources.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method as described in any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer readable storage medium, where computer executable instructions are stored, and when executed by a processor, implement the method described in any one of the possible implementation manners of the first aspect.
In a fifth aspect, embodiments of the present application further provide a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any one of the possible implementations of the first aspect.
It can be seen that the embodiments of the present application provide a GPU resource calling method and device, an electronic device, and a storage medium, applied to a first electronic device, where the first electronic device is a device in a kubernetes cloud platform and includes a local graphics processor (GPU) resource. The method includes: at a first moment, receiving a request of a first artificial intelligence (AI) service, where the request of the first AI service includes a first identifier, and the first identifier indicates that the first AI service is a service that needs to call the local GPU resource; calling the local GPU resource through a first AI service container according to the first identifier, where the first AI service container is a container running an application program for processing the first AI service; at a second moment, receiving a request of a second AI service, where the request of the second AI service includes a second identifier, and the second identifier indicates that the second AI service is a service that needs to remotely call GPU resources; determining a target remote GPU resource from a plurality of remote GPU resources according to the second identifier; and remotely calling the target remote GPU resource through a second AI service container, where the second AI service container is a container running an application program for processing the second AI service. In this way, an AI service that needs to call the local GPU resource can call the local GPU resource, and an AI service that needs to call GPU resources remotely can call a remote GPU resource, which improves the utilization rate of the GPU resources of the kubernetes cloud platform and the efficiency of GPU resource calling.
Drawings
FIG. 1 is a flowchart of a GPU resource calling method according to an embodiment of the present application;
fig. 2 is a flowchart of a method for setting an identifier of an AI service request by a user according to an embodiment of the present application;
FIG. 3 is a schematic diagram of partitioning of GPU resources for AI service invocation provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a method for invoking local GPU resources according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a method for remotely calling GPU resources according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a GPU resource calling device according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an electronic device according to the present application.
Specific embodiments of the present disclosure have been shown by way of the above drawings and will be described in more detail below. These drawings and the written description are not intended to limit the scope of the disclosed concepts in any way, but rather to illustrate the disclosed concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
In the embodiments of the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A alone, both A and B, or B alone, where A and B may be singular or plural. In the text description of the present application, the character "/" generally indicates an "or" relationship between the associated objects.
With the rapid development of cloud technology, various container orchestration systems have emerged, and kubernetes cloud platforms are widely used in industry to build private container clouds. The kubernetes cloud platform supports the operation of service containers by managing various computing, network, and storage resources. An ordinary service container can meet its demands with CPU computing power, but the increasingly demanded AI service containers need the high computing power of GPUs and the like.
There are multiple electronic devices in the kubernetes cloud platform, and each electronic device may be regarded as a node of the kubernetes cloud platform. Some nodes have local GPU resources and some do not. How to call GPU resources efficiently in the kubernetes cloud platform, so that both nodes with GPU resources and nodes without them can call GPU resources, therefore becomes a problem to be solved.
Based on this, the embodiment of the present application provides a GPU resource calling method, which can determine in advance, according to the situation of an AI service, whether the AI service needs to call local GPU resources, and mark different AI services accordingly. Thus, when the electronic device determines that a received AI service requires local GPU resources, it can call the local resource, and when it receives an AI service that does not require local GPU resources, it can remotely call a remote GPU resource. The electronic device can thereby call GPU resources flexibly according to the situation of the AI service. Because a remote call to a remote GPU resource may be limited by the network and by the state of that resource, special AI services can preferentially call local GPU resources while common AI services remotely call remote GPU resources, which improves both the utilization rate of GPU resources and the efficiency of GPU resource calling.
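To make the branch concrete, the following Go sketch is illustrative only: the identifier values, the gpuCallMode type, and the dispatch function are assumptions of this description and are not disclosed by the patent itself.

package main

import "fmt"

// gpuCallMode stands in for the first/second identifier carried in an AI
// service request: it marks whether the service must call the local GPU
// resource or may remotely call a remote GPU resource.
type gpuCallMode string

const (
	localGPU  gpuCallMode = "gpu-local"  // first identifier: special AI service
	remoteGPU gpuCallMode = "gpu-remote" // second identifier: common AI service
)

// aiRequest is a minimal stand-in for an AI service request.
type aiRequest struct {
	service string
	mode    gpuCallMode
}

// dispatch routes a request to the matching call path.
func dispatch(req aiRequest) {
	switch req.mode {
	case localGPU:
		fmt.Printf("%s: call local GPU resource through the hijacked driver interface\n", req.service)
	case remoteGPU:
		fmt.Printf("%s: determine a target remote GPU resource and call it over the network\n", req.service)
	default:
		fmt.Printf("%s: no GPU identifier, CPU-only path\n", req.service)
	}
}

func main() {
	dispatch(aiRequest{service: "image-recognition", mode: localGPU})
	dispatch(aiRequest{service: "text-embedding", mode: remoteGPU})
}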
The GPU resource calling method provided by the application will be described in detail by a specific embodiment. It is to be understood that the following embodiments may be combined with each other and that some embodiments may not be repeated for the same or similar concepts or processes.
In the embodiment of the present application, the kubernetes cloud platform may include a plurality of electronic devices, and the plurality of electronic devices may include a first electronic device, a second electronic device, and possibly a third electronic device.
For example, the plurality of electronic devices in the kubernetes cloud platform include electronic devices with GPU resources and electronic devices without GPU resources.
In the embodiment of the application, the first electronic device in the kubernetes cloud platform is an electronic device comprising a local GPU resource.
Fig. 1 is a flowchart of a GPU resource calling method according to an embodiment of the present application. The GPU resource calling method may be performed by software and/or hardware means, for example, the hardware means may be GPU resource calling means, which may be the first electronic device or a processing chip in the first electronic device. For example, referring to fig. 1, the GPU resource calling method may include:
s101, at a first moment, a request of a first artificial intelligence AI service is received, wherein the request of the first AI service comprises a first identifier, and the first identifier is used for indicating that the first AI service is a service needing to call a local GPU resource.
The request of the first AI service may be submitted by the user through an application program of the terminal device, or may be submitted by other manners, which is not limited in the embodiment of the present application.
The first identifier may be at least one of letters, numbers, special symbols, or the like, and the embodiment of the application is not limited to the form of the first identifier.
S102, according to the first identifier, the local GPU resource is called through a first AI service container, where the first AI service container is a container running an application program for processing the first AI service.
For example, when the first electronic device invokes the local GPU resource through the first AI service container according to the first identifier, the first electronic device may obtain the hijacked first interface in the kubernetes device plugin component through the first AI service container according to the first identifier, and invoke the hijacked first interface through the first AI service container to invoke the local GPU resource.
The kubernetes device plugin component includes correspondence between the identifiers of the plurality of AI services and the hijacked interfaces. For example, the kubernetes device plugin component may include a correspondence between the first identifier of the first AI service and the hijacked first interface, and the embodiment of the present application does not specifically limit the correspondence included in the kubernetes device plugin component.
It should be noted that, the content in the kubernetes device plugin component is reported by each electronic device in the kubernetes cloud platform.
For example, for an electronic device with GPU resources, a GPU resource management module is present in the electronic device, where the GPU resource management module may acquire information of the GPU resources in the electronic device, and may report the acquired information of the GPU resources to the kubernetes device plugin component, so that the service container may acquire a call interface of the GPU resources, that is, a hijacked interface, in the kubernetes device plugin component.
It may be appreciated that the first electronic device includes a first GPU resource management module, where the first GPU resource management module may obtain information of a local GPU resource of the first electronic device, and report the information of the local GPU resource to the kubernetes device plugin component.
The first hijacked interface may be a cuda driver api, or may be another interface, which is not limited in the embodiment of the present application.
In this way, unified pooling of GPU resources in the kubernetes cloud platform can be achieved through the kubernetes device plugin component, and the pooled resources are provided to nodes that need to call local GPU resources, so that the local GPU resources can be called. When the first AI service is a special AI service, the efficiency of local GPU resource calling can be improved.
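The patent does not disclose the internals of the kubernetes device plugin component. As a hedged illustration, the Go sketch below shows how a standard Kubernetes device plugin (using the public k8s.io/kubelet device plugin API) could advertise pooled GPUs and mount a hijacked driver library into an allocated container; the struct name, device IDs, and file paths are hypothetical.

package gpupool

import (
	"context"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// gpuPlugin is a minimal device-plugin sketch. A real plugin also implements
// GetDevicePluginOptions, GetPreferredAllocation, and PreStartContainer and
// registers itself with the kubelet over a unix socket; those parts are omitted.
type gpuPlugin struct {
	deviceIDs []string // e.g. "GPU-0", "GPU-1", reported by the GPU resource management module
}

// ListAndWatch advertises every local GPU to the kubelet, which in turn
// makes the pooled resource visible to the whole cloud environment.
func (p *gpuPlugin) ListAndWatch(_ *pluginapi.Empty, s pluginapi.DevicePlugin_ListAndWatchServer) error {
	devs := make([]*pluginapi.Device, 0, len(p.deviceIDs))
	for _, id := range p.deviceIDs {
		devs = append(devs, &pluginapi.Device{ID: id, Health: pluginapi.Healthy})
	}
	if err := s.Send(&pluginapi.ListAndWatchResponse{Devices: devs}); err != nil {
		return err
	}
	select {} // block; a real plugin re-sends on health changes
}

// Allocate mounts the hijacked CUDA driver library into the AI service
// container so that every driver call goes through the interception layer.
func (p *gpuPlugin) Allocate(_ context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Mounts: []*pluginapi.Mount{{
				// hypothetical host path of the hijacked libcuda shim
				ContainerPath: "/usr/lib/libcuda.so.1",
				HostPath:      "/var/lib/vgpu/libcuda-hijack.so",
				ReadOnly:      true,
			}},
		})
	}
	return resp, nil
}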
In an exemplary embodiment, when the first electronic device invokes the local GPU resource through the first AI service container, the first electronic device may invoke the local GPU resource through the first AI service container in a time division multiplexing manner.
In this way, calling the local GPU resource in a time division multiplexing manner allows other AI services that call the local GPU resource to be called normally as well, and the utilization rate of the local GPU resource can be improved.
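The arbitration policy behind the time division multiplexing is not specified by the patent. One assumed possibility, sketched in Go below with hypothetical names, serializes GPU work from different containers and bounds each turn by a fixed slice.

package gpupool

import (
	"sync"
	"time"
)

// timeSlicer grants the GPU to one caller at a time for a bounded slice,
// one simple way to realize the time division multiplexing mentioned above.
// A real hijacked driver would throttle or re-queue pending kernels when a
// slice expires; this sketch simply waits for the work to finish.
type timeSlicer struct {
	mu    sync.Mutex    // serializes access to the single GPU
	slice time.Duration // nominal length of one time slice
}

// run executes one container's batch of GPU work within its turn.
func (t *timeSlicer) run(work func()) {
	t.mu.Lock()
	defer t.mu.Unlock()

	done := make(chan struct{})
	go func() { work(); close(done) }()

	select {
	case <-done:
		// finished inside the slice
	case <-time.After(t.slice):
		<-done // slice expired; see the comment on the type
	}
}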
S103, at a second moment, a request of a second AI service is received, wherein the request of the second AI service comprises a second identifier, and the second identifier is used for indicating that the second AI service is a service which needs to remotely call GPU resources.
The request of the second AI service may be submitted by the user through an application program of the terminal device, or may be submitted by other manners, which is not limited in the embodiment of the present application.
The second identifier may be at least one of letters, numbers, special symbols, or the like, and the embodiment of the present application is not limited to the form of the second identifier.
It should be noted that, the first identifier and the second identifier are different identifiers, so as to distinguish whether the AI service corresponding to the identifier is a special service requiring local GPU resource call.
S104, determining the target remote GPU resource from the plurality of remote GPU resources according to the second identifier.
In the embodiment of the present application, the kubernetes device plugin component in the kubernetes cloud platform includes correspondences between identifiers of a plurality of AI services and hijacked interfaces. For example, the kubernetes device plugin component may include a correspondence between the second identifier of the second AI service and the hijacked second interface; the embodiment of the present application does not specifically limit the correspondences included in the kubernetes device plugin component.
In the embodiment of the present application, the target remote GPU resource may be a GPU resource of the third electronic device in the kubernetes cloud platform. The third electronic device includes a second GPU resource management module, which may acquire information of the GPU resources of the third electronic device and report it to the kubernetes device plugin component, so that a correspondence between the second identifier of the second AI service and the hijacked second interface is generated in the kubernetes device plugin component.
The embodiment of the present application only takes generating the correspondence between the second identifier of the second AI service and the hijacked second interface as an example; this does not constitute any limitation on the application.
Thus, when the first electronic device determines that the second identifier is included in the request of the second AI service, the target remote GPU resource corresponding to the second identifier may be determined in the kubernetes device plugin component according to the second identifier.
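A minimal Go sketch of this lookup follows; the structure and field names are assumptions of this description, not from the patent.

package gpupool

// gpuRegistry models the pooled correspondence reported to the kubernetes
// device plugin component: identifier -> address of the target kubernetes
// server that hosts the matching remote GPU resource (hypothetical layout).
type gpuRegistry struct {
	targets map[string]string // e.g. "gpu-remote" -> "gpu-node-1:9999"
}

// targetFor resolves the target remote GPU resource for a second identifier.
func (r *gpuRegistry) targetFor(identifier string) (addr string, ok bool) {
	addr, ok = r.targets[identifier]
	return addr, ok
}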
S105, remotely calling the target remote GPU resource through a second AI service container, wherein the second AI service container is a container running an application program for processing the second AI service.
For example, when the first electronic device remotely calls the target remote GPU resource through the second AI service container according to the second identifier, the hijacked second interface may be obtained in the kubernetes device plugin component through the second AI service container according to the second identifier; and the hijacked second interface in the target kubernetes server is called through the second AI service container by using a network to remotely call the target remote GPU resource, where the target kubernetes server is the server where the target remote GPU resource is located.
By way of example, the network may be a network over which the electronic devices in the kubernetes cloud platform communicate. The hijacked second interface may be a hijacked cuda runtime api, or may be another interface, which is not limited in the embodiment of the present application.
In this way, a common AI service that needs to call GPU resources remotely can call a remote GPU resource, while local GPU resources are used preferentially for special AI services. Special AI services can thus obtain GPU resources more quickly while common AI services can still obtain GPU resources, and the calling efficiency of both remote and local GPU resources can be improved.
The first electronic device may remotely invoke the target remote GPU resource by using a time division multiplexing manner through the second AI service container when remotely invoking the target remote GPU resource through the second AI service container.
In this way, calling the target remote GPU resource in a time division multiplexing manner allows the AI services that call the target remote GPU resource to be called normally, and the utilization rate of the target remote GPU resource can be improved.
Therefore, in the GPU resource calling method provided by the embodiment of the present application, for a first AI service that needs to call the local GPU resource, the first electronic device calls the local GPU resource, and for a second AI service that needs a remote call, the first electronic device remotely calls the target remote GPU resource. Different GPU resources can thus be called according to the actual situation of the AI service, so that special AI services preferentially call local GPU resources and obtain GPU resources quickly, while common AI services can still call remote GPU resources. This improves the efficiency of GPU resource calling and the utilization rate of each GPU resource in the kubernetes cloud platform.
In the embodiment of the present application, the first identifier in the request of the first AI service and the second identifier in the request of the second AI service may be preset. Fig. 2 is a flowchart of a method for setting an identifier of an AI service request by a user according to an embodiment of the present application.
As shown in fig. 2, the method for setting the identifier of the AI service request by the user may include:
s201, a first operation of a first AI service is received.
The first operation may be an operation input by the user through a device in the kubernetes cloud platform, which is not limited by the embodiment of the present application. For example, the first operation may be an operation that marks the first AI traffic as requiring invocation of a local GPU resource.
In one possible implementation, the first operation may also be an operation in which the user marks the first AI service as a node including GPU resources.
S202, a first identifier is set for a first AI service in response to a first operation.
For example, the first electronic device may set the first identifier for the first AI service such that the first identifier is identified when the first AI service is received.
S203, a second operation on the second AI service is received.
The second operation may be an operation input by the user through a device in the kubernetes cloud platform, which is not limited in the embodiment of the present application. For example, the second operation may be an operation that marks the second AI service as requiring a remote call to a remote GPU resource.
In one possible implementation, when the electronic device that processes the second AI service does not include GPU resources, the second operation may also be an operation in which the user assigns the second AI service to a node that does not include GPU resources.
S204, in response to the second operation, a second identifier is set for the second AI service.
For example, the first electronic device may set the second identifier for the second AI service, so that when a request of the second AI service is received, the second identifier can be recognized and a remote GPU resource can be called for the second AI service.
In this way, because the user sets the identifier for the AI service in advance, the electronic device can call GPU resources according to the identifier, which further improves the efficiency of GPU resource calling.
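For illustration, the identifier setting of S201 to S204 could be recorded as a label on the AI service's pod template, as in the hedged Go sketch below; the label key and values are assumptions of this description.

package gpupool

// setIdentifier records the user's choice (S201-S204) as a label on the AI
// service's pod template so the platform can recognize it later.
func setIdentifier(labels map[string]string, needsLocalGPU bool) {
	if needsLocalGPU {
		labels["ai.example.com/gpu-mode"] = "local" // first identifier
	} else {
		labels["ai.example.com/gpu-mode"] = "remote" // second identifier
	}
}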
In the embodiment of the present application, because not every node in the kubernetes cloud platform includes GPU resources, a user can partition the GPU resources called by AI services in the kubernetes cloud platform according to whether a node includes GPU resources and whether the AI services the node can process are special AI services.
Fig. 3 is a schematic diagram of partitioning of GPU resources for AI service invocation according to an embodiment of the present application.
As shown in fig. 3, the AI services include common AI services and special AI services. The common AI services may include a first common AI service, a second common AI service, a third common AI service, …, and an Nth common AI service. The special AI services may include a first special AI service, a second special AI service, …, and an Mth special AI service, where M and N are natural numbers whose sizes are not limited in the embodiment of the present application.
The first common AI service, the second common AI service, the third common AI service, and the Nth common AI service are AI services processed by electronic devices that do not include GPU resources. The first special AI service, the second special AI service, and the Mth special AI service are AI services processed by electronic devices that include GPU resources.
As shown in fig. 3, when the electronic device receives the first special AI service, the GPU resource of the electronic device may be invoked by the first special AI service container of the first special AI service. And when the electronic equipment receives the second special AI service, the GPU resource of the electronic equipment can be called through a second special AI service container of the second special AI service. When the electronic equipment receives the Mth special AI service, the GPU resource of the electronic equipment can be called through an Mth special AI service container of the Mth special AI service. When the electronic equipment receives the first common AI service, the GPU resource of the electronic equipment corresponding to the second special AI service can be called through a first common AI service container of the first common AI service. When the electronic equipment receives the second common AI service, the GPU resource of the electronic equipment corresponding to the first special AI service can be called through a second common AI service container of the second common AI service. When the electronic equipment receives the third common AI service, the GPU resource of the electronic equipment corresponding to the second special AI service can be called through a third common AI service container of the third common AI service. When the electronic equipment receives the N common AI service, the GPU resource of the electronic equipment corresponding to the M special AI service can be called through an N common AI service container of the N common AI service.
Therefore, in the kubernetes cloud platform, ordinary nodes without a GPU can be denoted as nogpu nodes, and special nodes with a GPU can be denoted as gpu nodes. Most AI service containers of common grade can be reasonably scheduled to nogpu nodes by the algorithm, and a small number of high-grade special AI service containers can be reasonably scheduled to gpu nodes, so that the special high-grade AI service containers do not need to pass through the network when using a GPU. An AI service container on a nogpu node uses a GPU remotely by selecting a suitable gpu node through the rcuda (remote cuda runtime) method, while an AI service container on a gpu node only needs to use the GPU locally through the vcuda (virtual cuda driver) method. The rcuda method and the vcuda method are described in the embodiments below.
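One assumed way to realize this nogpu/gpu division with standard Kubernetes scheduling is a node selector on the pod spec, as sketched below in Go; the gpu label key and the container name are hypothetical.

package gpupool

import corev1 "k8s.io/api/core/v1"

// podSpecFor pins a special AI service container to a gpu node and a common
// one to a nogpu node, mirroring the division described above.
func podSpecFor(image string, special bool) corev1.PodSpec {
	selector := map[string]string{"gpu": "false"} // nogpu node, remote rcuda path
	if special {
		selector = map[string]string{"gpu": "true"} // gpu node, local vcuda path
	}
	return corev1.PodSpec{
		NodeSelector: selector,
		Containers:   []corev1.Container{{Name: "ai-service", Image: image}},
	}
}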
Based on the above embodiments, when the second electronic device in the kubernetes cloud platform does not include the GPU resource, the first electronic device in the embodiment of the present application may further receive, at a third moment, a remote call of the local GPU resource by a third AI service container of the second electronic device, so as to remotely call the local GPU resource to the third AI service container, where an application for processing the third AI service is running in the third AI service container.
It can be understood that, a method for calling the local GPU resource of the first electronic device by the third AI service container of the second electronic device is similar to the method for calling the target remote GPU resource by the second AI service container in the above embodiment, and will not be described herein.
In this way, the local GPU resources of the first electronic device can also be called by the AI service containers of other devices, which can improve the utilization rate of the local GPU resources and the efficiency of GPU resource calling.
In order to facilitate understanding of the GPU resource calling method provided by the embodiments of the present application, two cases of calling a local GPU resource and calling a remote GPU resource will be described below.
Fig. 4 is a schematic diagram of a method for calling a local GPU resource according to an embodiment of the present application.
It should be noted that, the electronic device shown in fig. 4 is an electronic device in the kubernetes cloud platform, which may be the first electronic device described in the foregoing embodiment, or may be other electronic devices including GPU resources. The embodiment of the present application is not limited thereto.
Illustratively, the AI service container shown in FIG. 4 may invoke local GPU resources via the vcuda method.
In fig. 4, when the electronic device is the first electronic device, the AI service container may be the first AI service container described in the foregoing embodiment. The interface may be a hijacked cuda driver api. The GPU resource management module may be a module for obtaining local GPU resources. The driver may be an nvidia driver.
As shown in fig. 4, the AI service container calling local GPU resources may include the following steps:
step 1, a GPU resource management module manages all GPU hardware information of a local machine and mainly obtains the GPU hardware information by calling a cuda driver api.
The GPU resource management module may be, for example, a kubernetes device plugin of the kubernetes cloud platform.
Step 2, the GPU resource management module acquires the GPU information and reports it to the whole cloud environment of the kubernetes cloud platform.
Illustratively, the reporting object of the GPU resource management module is the kubernetes device plugin of the kubernetes cloud platform.
In the embodiment of the present application, the vcuda scheme may further include a GPU resource control module, which provides the algorithm for scheduling calls to all GPU resource information and may also provide a target host address to the rcuda module. Providing the target host address to the rcuda module is described in the embodiments below.
Step 3, kubernetes device plugin component requires the AI service container to mount the required file system directory.
kubernetes device plugin component requests the AI service container to mount the required file system directory through the kubernetes device plugin mechanism. The file system directory may embody the correspondence between the interfaces and identifiers described in the above embodiments, and the directory may include the hijacked cuda driver api and the underlying related driver.
The underlying related driver may be nvidia driver in fig. 4, which is used to invoke GPU resources.
Step 4, when the AI service container runs, it calls the hijacked cuda driver api to realize time-division multiplexing of the GPU.
The specific process of invoking the hijacked cuda driver api to implement the time division multiplexing of the GPU in the AI service container during running may be described in the above embodiment, and will not be described herein again.
Step 5, the hijacked cuda driver api reports the usage of the AI service container to the GPU resource management module.
For example, when the electronic device includes multiple GPUs, the AI service container may call one or more GPU resources, so that the hijacked cuda driver api reports the AI service container's use of any one of those GPUs to the GPU resource management module.
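The vcuda shim itself is a native library that replaces the cuda driver symbols, and the patent gives no source for it. Purely as a conceptual Go sketch, the interception can be modeled as a wrapper that delegates each call to the real driver and reports usage to the GPU resource management module, as in step 5; all names here are assumptions.

package gpupool

// driverAPI models the slice of the CUDA driver API that the hijacked
// library forwards.
type driverAPI interface {
	LaunchKernel(name string) error
}

// hijacked wraps the real driver: every call is accounted to the owning
// AI service container and reported to the GPU resource management module.
type hijacked struct {
	real      driverAPI
	container string                       // owning AI service container
	report    func(container, call string) // sink to the management module
}

func (h *hijacked) LaunchKernel(name string) error {
	err := h.real.LaunchKernel(name) // delegate to the real cuda driver api
	h.report(h.container, "cuLaunchKernel:"+name)
	return err
}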
Fig. 5 is a schematic diagram of a method for remotely calling GPU resources according to an embodiment of the present application.
It should be noted that, in fig. 5, the electronic device 1 and the electronic device 2 are both electronic devices in the kubernetes cloud platform, and the electronic device 1 may be the second electronic device described in the foregoing embodiment, or may be other electronic devices that do not include GPU resources. The electronic device 2 may be the first electronic device described in the foregoing embodiment, and may be other electronic devices including GPU resources. The embodiment of the present application is not limited thereto.
Illustratively, the AI service container of the electronic device 1 shown in fig. 5 may call the GPU resource of the electronic device 2 through the rcuda method, i.e., the AI service container calls a remote GPU resource.
Rcuda provides remote use of a GPU: the electronic device 1 is the rcuda client and has no GPU resources, while the electronic device 2 is the rcuda server and has GPU resources. In the embodiment of the present application, the GPU resource of the server is accessed through the hijacked cuda runtime api, and the server side is carried and realized by the vcuda scheme.
In fig. 5, when the electronic device 1 is the second electronic device, the AI-service container may be the third AI-service container described in the above embodiment. The interface may be a hijacked cuda driver api. The GPU resource management module may be a module for obtaining local GPU resources. The driver may be an nvidia driver.
As shown in fig. 5, the AI service container remotely calling GPU resources may include the following steps:
step 1, a GPU resource management module of the electronic device 2 (server end) manages all GPU hardware information of the local machine, and the GPU hardware information is mainly obtained by calling a cuda driver api.
This step can be described in the above embodiments, and will not be described in detail herein.
Step 2, the GPU resource management module acquires the GPU information and reports it to the whole cloud environment of the kubernetes cloud platform.
This step can be described in the above embodiments, and will not be described in detail herein.
Step 3, the AI service container of the electronic device 1 (client) mounts the related file system directory or sets environment variables as required.
By way of example, the environment variables may include configuration such as the target host address. The file system directory may contain the hijacked cuda driver api and the underlying related driver nvidia driver.
Step 4, when the AI service container of the electronic device 1 (client) calls the hijacked cuda runtime api, the call is forwarded to the target rcuda-server service through the network.
This step can be described in the above embodiments, and will not be described in detail herein.
Step 5, while running, the rcuda-server program calls the hijacked cuda driver api to realize time-division multiplexing of the GPU.
The rcuda-server program is a program in the electronic device 2 for calling GPU resources.
Step 6, the hijacked cuda driver api reports the usage of the rcuda-server program to the GPU resource management module of the electronic device 2.
In this way, the GPU resource management module of the electronic device 2 can learn the calling status of the GPU resources in time so as to schedule GPU resource calls reasonably.
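The rcuda wire protocol is not specified by the patent. The Go sketch below assumes a line-oriented TCP exchange and an RCUDA_SERVER environment variable (set at step 3) naming the target host; a real implementation would marshal call arguments and device memory handles.

package gpupool

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"strings"
)

// callRemote forwards one intercepted runtime-API call to the rcuda-server
// on the GPU node and waits for the result (step 4).
func callRemote(api string) (string, error) {
	addr := os.Getenv("RCUDA_SERVER") // e.g. "gpu-node-1:9999", set at step 3
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, api); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

// serve is the rcuda-server side: it executes each received call against the
// local hijacked driver api and writes the result back (steps 5-6).
func serve(ln net.Listener, exec func(api string) string) error {
	for {
		conn, err := ln.Accept()
		if err != nil {
			return err
		}
		go func(c net.Conn) {
			defer c.Close()
			if line, err := bufio.NewReader(c).ReadString('\n'); err == nil {
				fmt.Fprintln(c, exec(strings.TrimSpace(line)))
			}
		}(conn)
	}
}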
In summary, the method provided by the embodiments of the present application manages the GPU resources on different nodes uniformly to form a resource pool, which is then provided to local or remote service containers for use. By hijacking the CUDA driver API, GPU virtualization is realized, so that different programs (i.e., service containers) can time-division multiplex a given GPU. The method can pool GPU resources uniformly and provide the pooled GPU resources remotely to nodes without GPU resources, while also supporting local use of the pooled GPU resources; that is, special-grade service containers can be scheduled directly to nodes with GPU resources for efficient local GPU use.
Fig. 6 is a schematic structural diagram of a GPU resource calling device 60 according to an embodiment of the present application, for example, please refer to fig. 6, the GPU resource calling device 60 may include:
the receiving module 601 is configured to receive a request for a first AI service at a first time, where the request for the first AI service includes a first identifier, and the first identifier is used to indicate that the first AI service is a service that needs to invoke a local GPU resource;
and the calling module 602 is configured to call the local GPU resource through a first AI service container according to the first identifier, where the first AI service container is a container running an application program for processing the first AI service.
The receiving module 601 is further configured to receive, at a second moment, a request for a second AI service, where the request for the second AI service includes a second identifier, and the second identifier is used to indicate that the second AI service is a service that needs to remotely invoke GPU resources.
The processing module 603 is configured to determine a target remote GPU resource from the plurality of remote GPU resources according to the second identifier.
The calling module 602 is further configured to remotely call the target remote GPU resource through a second AI service container, where the second AI service container is a container running an application for processing the second AI service.
In a possible implementation manner, the calling module 602 is specifically configured to obtain, according to the first identifier, a hijacked first interface in the kubernetes device plugin component through the first AI service container, where the kubernetes device plugin component includes correspondence between identifiers of a plurality of AI services and the hijacked interface; and calling the hijacked first interface through the first AI service container to call the local GPU resource.
In a possible implementation manner, the calling module 602 is specifically configured to call the local GPU resource by using a time division multiplexing manner through the first AI service container.
In a possible implementation manner, the calling module 602 is specifically configured to obtain, according to the second identifier, the hijacked second interface in the kubernetes device plugin component through the second AI service container; and call the hijacked second interface in the target kubernetes server through the second AI service container by using a network to remotely call the target remote GPU resource, where the target kubernetes server is the server where the target remote GPU resource is located.
In a possible implementation manner, the calling module 602 is specifically configured to remotely call the target remote GPU resource in a time division multiplexing manner through the second AI service container.
In a possible implementation manner, the receiving module 601 is further configured to receive a first operation on the first AI service.
The processing module 603 is further configured to set a first identifier for the first AI service in response to the first operation.
The receiving module 601 is further configured to receive a second operation on a second AI service.
The processing module 603 is further configured to set a second identifier for a second AI service in response to a second operation.
In a possible implementation manner, the receiving module 601 is further configured to receive, at a third moment, a remote call from a third AI service container of a second electronic device to the local GPU resource, so as to provide the local GPU resource remotely to the third AI service container, where the second electronic device is a device in the kubernetes cloud platform, an application for processing a third AI service runs in the third AI service container, and the second electronic device does not include GPU resources.
The GPU resource calling device provided by the embodiment of the application can execute the technical scheme of the GPU resource calling method in any embodiment, and the implementation principle and beneficial effects of the GPU resource calling device are similar to those of the GPU resource calling method, and can be seen from the implementation principle and beneficial effects of the GPU resource calling method, and the description is omitted here.
Fig. 7 is a schematic structural diagram of an electronic device according to the present application. As shown in fig. 7, the electronic device 700 may include: at least one processor 701 and a memory 702.
A memory 702 for storing programs. In particular, the program may include program code including computer-operating instructions.
The memory 702 may comprise high-speed RAM memory or may further comprise non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor 701 is configured to execute computer-executable instructions stored in the memory 702 to implement the GPU resource calling method described in the foregoing method embodiment. The processor 701 may be a central processing unit (Central Processing Unit, abbreviated as CPU), or an application specific integrated circuit (Application Specific Integrated Circuit, abbreviated as ASIC), or one or more integrated circuits configured to implement embodiments of the present application. Specifically, when implementing the GPU resource calling method described in the foregoing method embodiment, the electronic device may be, for example, an electronic device having a processing function, such as a terminal, a server, or the like.
Optionally, the electronic device 700 may also include a communication interface 703. In a specific implementation, if the communication interface 703, the memory 702, and the processor 701 are implemented independently, they may be connected to one another and communicate with one another through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be divided into an address bus, a data bus, a control bus, and the like, but this does not mean that there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the communication interface 703, the memory 702, and the processor 701 are implemented on a single chip, the communication interface 703, the memory 702, and the processor 701 may complete communication through internal interfaces.
The present application also provides a computer-readable storage medium, which may include: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other media in which program code may be stored; in particular, the computer-readable storage medium stores program instructions for the methods in the above embodiments.
The present application also provides a program product comprising execution instructions stored in a readable storage medium. The at least one processor of the electronic device may read the execution instructions from the readable storage medium, and execution of the execution instructions by the at least one processor causes the electronic device to implement the GPU resource invoking methods provided by the various embodiments described above.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the application.

Claims (10)

1. A GPU resource calling method, applied to a first electronic device, wherein the first electronic device is a device in a kubernetes cloud platform and comprises a local graphics processor (GPU) resource, characterized in that the method comprises the following steps:
at a first moment, a request of a first Artificial Intelligence (AI) service is received, wherein the request of the first AI service comprises a first identifier, and the first identifier is used for indicating that the first AI service is a service needing to call the local GPU resource;
according to the first identifier, the local GPU resource is called through a first AI service container, wherein the first AI service container is a container running an application program for processing the first AI service;
at a second moment, a request of a second AI service is received, wherein the request of the second AI service comprises a second identifier, and the second identifier is used for indicating that the second AI service is a service requiring remote call of GPU resources;
determining a target remote GPU resource from a plurality of remote GPU resources according to the second identifier;
and remotely calling the target remote GPU resource through a second AI service container, wherein the second AI service container is a container for running an application program for processing the second AI service.
2. The method of claim 1, wherein the calling the local GPU resource through the first AI service container according to the first identifier comprises:
acquiring a hijacked first interface in a kubernetes device plugin component through the first AI service container according to the first identifier, wherein the kubernetes device plugin component comprises the correspondence between a plurality of AI service identifiers and the hijacked interfaces;
and calling the hijacked first interface through the first AI service container to call the local GPU resource.
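
The correspondence table of claim 2 can be read as a lookup from an AI service identifier to an intercepted ("hijacked") GPU entry point held by the device plugin component. The sketch below, with invented table contents and names, illustrates only that lookup:

```go
package main

import "fmt"

// HijackedInterface stands in for an intercepted GPU driver entry point.
type HijackedInterface func(payload string) (string, error)

// devicePluginTable models the correspondence, held by the kubernetes device
// plugin component, between AI service identifiers and hijacked interfaces.
// Its contents here are illustrative assumptions.
var devicePluginTable = map[string]HijackedInterface{
	"gpu-call=local": func(p string) (string, error) {
		return "ran " + p + " on the node's own GPU", nil
	},
	"gpu-call=remote": func(p string) (string, error) {
		return "forwarded " + p + " to a remote GPU node", nil
	},
}

// lookup resolves an identifier to its hijacked interface, as the service
// container does before calling the GPU.
func lookup(identifier string) (HijackedInterface, error) {
	iface, ok := devicePluginTable[identifier]
	if !ok {
		return nil, fmt.Errorf("no hijacked interface registered for %q", identifier)
	}
	return iface, nil
}

func main() {
	iface, err := lookup("gpu-call=local")
	if err != nil {
		panic(err)
	}
	out, _ := iface("inference batch #1")
	fmt.Println(out)
}
```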
3. The method of claim 2, wherein calling the local GPU resource through the first AI service container comprises:
calling the local GPU resource in a time division multiplexing manner through the first AI service container.
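
Time division multiplexing here means the containers take turns on the one physical GPU in bounded time slices rather than holding it exclusively. A minimal sketch, assuming made-up slice lengths and container names:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// gpuMu serializes access so only one container's work runs in any slice.
var gpuMu sync.Mutex

// runSlice gives one container the GPU for at most `slice` of wall time,
// then releases it so the next container can take a turn.
func runSlice(container string, slice time.Duration) {
	gpuMu.Lock()
	defer gpuMu.Unlock()
	deadline := time.Now().Add(slice)
	for time.Now().Before(deadline) {
		// Stand-in for launching one bounded chunk of GPU work.
		time.Sleep(2 * time.Millisecond)
	}
	fmt.Printf("%s used its %v slice\n", container, slice)
}

func main() {
	var wg sync.WaitGroup
	for _, c := range []string{"ai-svc-a", "ai-svc-b", "ai-svc-c"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < 2; i++ { // each container gets repeated turns
				runSlice(name, 10*time.Millisecond)
			}
		}(c)
	}
	wg.Wait()
}
```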
4. The method according to claim 2 or 3, wherein remotely calling the target remote GPU resource through the second AI service container according to the second identifier comprises:
acquiring, through the second AI service container according to the second identifier, a hijacked second interface in the kubernetes device plugin component; and
calling, through the second AI service container over a network, the hijacked second interface on a target kubernetes server so as to remotely call the target remote GPU resource, wherein the target kubernetes server is the server where the target remote GPU resource is located.
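
The remote path of claim 4 amounts to resolving the hijacked second interface to an endpoint on the target kubernetes server and invoking it over the network. The sketch below uses plain HTTP with an invented path and JSON shape; the patent does not specify a wire protocol.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// gpuCall is a hypothetical wire format for a remote GPU invocation.
type gpuCall struct {
	Service string `json:"service"`
	Payload string `json:"payload"`
}

// callRemoteGPU posts the GPU work to the target server's hijacked interface.
func callRemoteGPU(targetServer, service, payload string) error {
	body, err := json.Marshal(gpuCall{Service: service, Payload: payload})
	if err != nil {
		return err
	}
	url := fmt.Sprintf("http://%s/hijacked/gpu-call", targetServer) // path is an assumption
	resp, err := http.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("remote GPU call failed: %s", resp.Status)
	}
	return nil
}

func main() {
	if err := callRemoteGPU("node-b:8080", "asr", "audio chunk"); err != nil {
		fmt.Println(err)
	}
}
```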
5. The method of claim 4, wherein remotely calling the target remote GPU resource through the second AI service container further comprises:
remotely calling the target remote GPU resource in a time division multiplexing manner through the second AI service container.
6. The method according to claim 1, further comprising:
receiving a first operation on the first AI service;
setting, in response to the first operation, the first identifier for the first AI service;
receiving a second operation on the second AI service; and
setting, in response to the second operation, the second identifier for the second AI service.
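
Claim 6 ties each identifier to an operator action. As a sketch, the operation can be modelled as writing the chosen identifier into a per-service record; the registry and all names below are assumptions:

```go
package main

import "fmt"

// CallMode distinguishes the two operations of claim 6.
type CallMode int

const (
	CallLocal  CallMode = iota // first operation -> first identifier
	CallRemote                 // second operation -> second identifier
)

// serviceIdentifiers is a stand-in for wherever the platform persists the
// identifier (e.g. a label on the service's deployment).
var serviceIdentifiers = map[string]string{}

// applyOperation records the identifier chosen for the named AI service.
func applyOperation(service string, mode CallMode) {
	switch mode {
	case CallLocal:
		serviceIdentifiers[service] = "gpu-call=local"
	case CallRemote:
		serviceIdentifiers[service] = "gpu-call=remote"
	}
}

func main() {
	applyOperation("ocr", CallLocal)  // first operation on the first AI service
	applyOperation("asr", CallRemote) // second operation on the second AI service
	fmt.Println(serviceIdentifiers)
}
```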
7. The method according to claim 1, further comprising:
receiving, at a third moment, a remote call from a third AI service container of a second electronic device to the local GPU resource, so as to provide the local GPU resource to the third AI service container by remote call, wherein the second electronic device is a device in the kubernetes cloud platform, an application program for processing a third AI service runs in the third AI service container, and the second electronic device does not comprise a GPU resource.
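
Claim 7 is the mirror image of claim 4: the device that owns the GPU also serves calls arriving from GPU-less nodes. A server-side sketch, matching the hypothetical endpoint used in the client sketch under claim 4:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// gpuCall matches the hypothetical wire format used by the remote caller.
type gpuCall struct {
	Service string `json:"service"`
	Payload string `json:"payload"`
}

func main() {
	// Expose the node's local GPU at the assumed hijacked-interface path.
	http.HandleFunc("/hijacked/gpu-call", func(w http.ResponseWriter, r *http.Request) {
		var call gpuCall
		if err := json.NewDecoder(r.Body).Decode(&call); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Stand-in for running the work on this node's local GPU.
		fmt.Printf("serving remote GPU call from service %q\n", call.Service)
		w.WriteHeader(http.StatusOK)
	})
	// Listen on the port that containers on GPU-less nodes target.
	if err := http.ListenAndServe(":8080", nil); err != nil {
		fmt.Println(err)
	}
}
```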
8. A GPU resource calling device, comprising:
a receiving module, configured to receive, at a first moment, a request for a first AI service, wherein the request for the first AI service comprises a first identifier, and the first identifier is used for indicating that the first AI service is a service that needs to call a local GPU resource;
a calling module, configured to call the local GPU resource through a first AI service container according to the first identifier, wherein the first AI service container is a container running an application program for processing the first AI service;
the receiving module being further configured to receive, at a second moment, a request for a second AI service, wherein the request for the second AI service comprises a second identifier, and the second identifier is used for indicating that the second AI service is a service that needs to remotely call a GPU resource;
a processing module, configured to determine a target remote GPU resource from a plurality of remote GPU resources according to the second identifier; and
the calling module being further configured to remotely call the target remote GPU resource through a second AI service container, wherein the second AI service container is a container running an application program for processing the second AI service.
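
The module split in claim 8 maps naturally onto one type per module. The following sketch keeps only the division of labour from the claim; the types, method names, and bodies are placeholders:

```go
package main

import "fmt"

// ReceivingModule accepts AI service requests carrying an identifier.
type ReceivingModule struct{}

// ProcessingModule selects a target remote GPU resource.
type ProcessingModule struct{}

// CallingModule performs the local or remote GPU call.
type CallingModule struct{}

func (ReceivingModule) Receive(id string) string       { return id }
func (ProcessingModule) SelectRemote(id string) string { return "node-b:8080" }
func (CallingModule) Call(target string)               { fmt.Println("calling", target) }

// GPUResourceCallingDevice groups the three modules named in the claim.
type GPUResourceCallingDevice struct {
	Recv ReceivingModule
	Proc ProcessingModule
	Call CallingModule
}

func main() {
	d := GPUResourceCallingDevice{}
	id := d.Recv.Receive("gpu-call=remote")
	d.Call.Call(d.Proc.SelectRemote(id))
}
```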
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
wherein the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory to implement the GPU resource calling method according to any one of claims 1 to 7.
10. A computer-readable storage medium having computer-executable instructions stored therein, wherein, when the computer-executable instructions are executed by a processor, the GPU resource calling method according to any one of claims 1 to 7 is implemented.

Priority Applications (1)

Application Number: CN202310739140.XA
Priority Date / Filing Date: 2023-06-20
Title: GPU resource calling method and device, electronic equipment and storage medium
Status: Pending

Publications (1)

Publication Number: CN116800808A
Publication Date: 2023-09-22

Family

ID: 88039748


Country Status (1)

CN: CN116800808A


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination