CN115658330B - WebAssembly-oriented cross-platform GPU virtualization method - Google Patents


Info

Publication number
CN115658330B
CN115658330B
Authority
CN
China
Prior art keywords
cuda
webassembly
gpu
virtual machine
application
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211659810.9A
Other languages
Chinese (zh)
Other versions
CN115658330A (en)
Inventor
许封元
吴昊
杨博
何思怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202211659810.9A
Publication of CN115658330A
Application granted
Publication of CN115658330B
Legal status: Active

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a WebAssembly-oriented cross-platform GPU virtualization method comprising the following steps: compiling and linking existing CUDA application source code with a compilation toolchain to generate bytecode; loading the bytecode into a virtual machine for execution; after the WebAssembly virtual machine receives a GPU request from the CUDA application, performing a validity check, modifying the maintained virtual GPU state, and sending a corresponding request to modify the physical GPU state to the physical GPU; and, after receiving the execution result returned by the physical GPU, returning it to the CUDA application running in the virtual machine. The method enables an ordinary WebAssembly virtual machine to access the GPU independently, without depending on a JavaScript engine, and provides transparent access with almost zero performance loss for CUDA applications.

Description

WebAssembly-oriented cross-platform GPU virtualization method
Technical Field
The invention relates to a WebAssembly-oriented cross-platform GPU virtualization method, and belongs to the technical field of virtualization in edge computing and cloud computing.
Background
With the rapid development of mobile communication technology in recent years, driven by the flourishing of Internet-of-Things technology and cloud services, edge computing has also spread rapidly. Edge computing mainly aims to solve the high latency, unstable networks, and low bandwidth of the traditional cloud computing model. In edge and cloud computing environments, system virtualization is a key technology: it creates elastic, isolated virtual environments, and sharing given hardware among multiple users improves resource efficiency, reduces cost, and increases system stability.
Graphics Processing Units (GPUs) improve the execution throughput of parallel programs through thousands of simple computing cores integrated on a chip and a high-memory-bandwidth architecture. With strong parallel computing capability, they handle graphics workloads well and have clear advantages in high-performance computing. GPUs are now widely used to accelerate compute-intensive tasks such as scientific computing, intelligent video stream processing, speech recognition, and natural language processing. The Compute Unified Device Architecture (CUDA) is currently the most common parallel computing platform and Application Programming Interface (API); through the CUDA interface, a GPU can be accessed directly for high-performance parallel computing. The combined advantages of GPUs and virtualization encourage cloud providers to equip cloud products with GPUs and provide virtualized GPU environments, so that more compute-intensive applications can also use GPUs for high-performance computing in virtualized environments.
WebAssembly is a fast, secure, portable low-level binary instruction format, designed as an abstraction of a low-level virtual machine over modern hardware. WebAssembly bytecode implements a strongly memory-isolated sandbox based on software fault isolation and control-flow integrity. WebAssembly achieves millisecond-level cold start and extremely low resource consumption, giving it great advantages over existing system virtualization schemes such as virtual machines and containers. An increasing number of cloud providers are turning to WebAssembly-based lightweight virtualization technologies.
Currently, WebAssembly virtual machines have virtualized system resources such as the file system and the network through the standardized WebAssembly System Interface (WASI), but there is no virtualization scheme specifically for GPU device resources. An unmodified WebAssembly virtual machine cannot provide a virtualized GPU environment; at present, the only option is integration with a JavaScript engine, as shown in FIG. 1, which forwards GPU requests so that WebAssembly accesses the GPU indirectly.
The existing scheme for forwarding GPU requests by means of JavaScript engines has three disadvantages:
1. Dependence on a JavaScript engine. Only WebAssembly virtual machines integrated with a JavaScript engine (such as Node.js) have this capability; WebAssembly virtual machines designed for non-Web environments (such as Wasmer) cannot be used. A JavaScript engine also occupies substantial resources and starts slowly. A scheme decoupled from the JavaScript engine is therefore needed, so that the WebAssembly virtual machine can provide a virtual GPU environment on its own.
2. Insufficient performance. Interaction between the WebAssembly virtual machine and the GPU goes through JavaScript glue code, which is interpreted and therefore slow to execute; forwarding GPU requests consumes large amounts of compute and I/O resources and cannot meet the performance requirements of high-performance computing.
3. No corresponding ecosystem support. The GPU access interface provided by JavaScript engines is WebGPU, a new-generation API standard specified by the W3C that supports general-purpose GPU computation. WebGPU's specified device programming language is the WebGPU Shading Language (WGSL), while most current high-performance computing applications are written against the CUDA interface; porting existing applications to the WebGPU standard would require a great deal of manual effort.
Disclosure of Invention
The invention aims to address the above problems and defects by providing a WebAssembly-oriented cross-platform GPU virtualization method that lets a WebAssembly virtual machine access the GPU independently of a JavaScript engine and realizes transparent access with almost zero performance loss, while also developing support in a corresponding CUDA compilation toolchain, thereby solving the three technical problems above.
The technical scheme is as follows: in order to realize the purpose, the invention adopts the following technical scheme:
the WebAssembly-oriented cross-platform GPU virtualization method comprises the following steps:
step 1: compiling and linking the existing CUDA application source code by using the compiling tool chain provided by the invention to generate a WebAssembly bytecode;
step 2: loading the WebAssembly bytecode obtained in the step 1 into a WebAssembly virtual machine provided by the invention for operation, after receiving a GPU request sent by a CUDA application, carrying out validity check on the request, then changing the maintained virtual GPU state, and generating a corresponding request for modifying the physical GPU state;
Step 3: sending the modification request obtained in step 2 to the physical GPU, and waiting for the physical GPU to return an execution result;
Step 4: returning the execution result from step 3 to the CUDA application running in the WebAssembly virtual machine provided by the invention, thereby realizing virtualization of the GPU.
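The request path of steps 2 through 4 can be pictured with a minimal sketch. This is not the patent's implementation: `VirtualGpu`, `GpuRequest`, and the memory-allocation request are illustrative stand-ins, and the physical GPU is simulated here by simply granting the allocation.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Illustrative request sent by the guest CUDA application.
struct GpuRequest {
    std::string op;    // e.g. "memAlloc"
    uint64_t    size;  // requested bytes
};

class VirtualGpu {
public:
    explicit VirtualGpu(uint64_t capacity) : free_(capacity) {}

    // Step 2: validity check plus virtual-state update, then "forwarding" to
    // the physical GPU (simulated); steps 3-4: the result handle is returned
    // to the application.
    uint64_t handleRequest(const GpuRequest& req) {
        if (req.op != "memAlloc" || req.size == 0 || req.size > free_)
            throw std::runtime_error("invalid GPU request");
        free_ -= req.size;                // update the virtual GPU state
        uint64_t handle = nextHandle_++;  // result returned to the guest
        allocations_[handle] = req.size;
        return handle;
    }

    uint64_t freeBytes() const { return free_; }

private:
    uint64_t free_;
    uint64_t nextHandle_ = 1;
    std::unordered_map<uint64_t, uint64_t> allocations_;
};
```

An invalid request (wrong operation, zero size, or more memory than remains) is rejected before anything reaches the simulated physical GPU, mirroring the validity check of step 2.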
Further, the specific steps of compiling and linking in step 1 are: compiling with the LLVM toolchain while using the LLVM pass plugin developed by the invention to process the LLVM IR during compilation, handling the generated CUDA device functions and other parts that cannot be compiled into WebAssembly, and then, after processing, generating LLVM IR that is compiled for the WebAssembly target.
The specific steps of generating the WebAssembly bytecode in step 1 are: the LLVM IR for the WebAssembly target is generated for a 64-bit target; if the target bytecode is 64-bit, it is compiled directly and then linked with the 64-bit LIBC; if the target is 32-bit, the LLVM pass plugin provided by the system processes the LLVM IR during compilation, rewriting calls into wrapper functions for LIBC routines with 64-bit parameters, which are then linked with the existing 32-bit LIBC so that the WebAssembly is generated correctly.
Further, the specific steps of the WebAssembly virtual machine receiving the GPU request sent by the CUDA application in step 2 are: and when the WebAssembly virtual machine executes the CUDA call, the CUDA call is forwarded to the CUDA environment interface, and then a request for checking the GPU and subsequent steps are processed.
Further, after receiving the CUDA related call from the WebAssembly virtual machine, the CUDA environment interface performs logic processing such as validity check and permission check, then forwards the CUDA related call to the physical GPU for further calculation, and finally returns the calculation result to the CUDA application in the WebAssembly virtual machine, so that virtualization of the GPU on the WebAssembly virtual machine is realized.
Further, in the CUDA application described in step 1, CUDA resources are stored in the form of unique handles: each actually allocated CUDA resource is assigned a unique handle. If the pointer of the CUDA application running in the WebAssembly virtual machine is 32-bit, a unique 32-bit integer represents an allocated handle; if it is 64-bit, a unique 64-bit integer represents an allocated handle.
Further, when applying for a CUDA resource, the CUDA application stores the corresponding unique handle and uses that handle for subsequent accesses; when the WebAssembly virtual machine receives an access request, it performs a validity check on the handle, and if the handle is invalid or the checked device memory access is out of bounds, it immediately stops processing the request and returns an error message.
Further, a CUDA resource is a resource allocated by the CUDA application through CUDA-related calls, which comprise the CUDA Driver API and the CUDA Runtime API; the CUDA Runtime API is built on the CUDA Driver API and automatically manages CUDA contexts, CUDA device functions, and other CUDA resources, and a CUDA application may be written against either API. The WebAssembly virtual machine is built on the CUDA Driver API and supports the CUDA Runtime API by simulating the automatic management behavior of the CUDA Runtime.
Further, when the CUDA application applies for a CUDA resource through the CUDA environment interface, the WebAssembly virtual machine receives the call request of the CUDA Driver API or CUDA Runtime API, judges whether it exceeds a specified limit, then applies to the physical GPU for the CUDA resource, allocates a unique handle to the resource upon success, stores the handle and the allocated CUDA resource in a data dictionary, and returns the unique handle to the application. When the application needs to access a CUDA resource, the WebAssembly virtual machine likewise receives the call request, parses the handle passed as a parameter, and performs security checks on it; if a check fails, processing stops immediately and an error message is returned, and if the checks pass, a corresponding operation request is sent to the physical GPU and the result is returned after processing. When the application crashes or terminates normally, the WebAssembly virtual machine automatically releases the CUDA resources that were allocated but not yet released; specifically, it releases the stored handles and the corresponding CUDA resources, returning them to the physical GPU.
Further, context management for the CUDA application comprises the following steps: each thread in the WebAssembly virtual machine stores the CUDA context information of the current thread in thread-local storage, and the WebAssembly virtual machine simulates the corresponding operations when a CUDA application creates or switches CUDA contexts; when the CUDA application destroys a CUDA context, the WebAssembly virtual machine removes the context and, following the behavior specified by the CUDA-related calls, destroys the CUDA resources belonging to it, while the automatically managed part of the CUDA Runtime API is handled according to the behavior specified by the CUDA Runtime.
Has the advantages that: compared with the prior art, the invention has the following advantages:
1. Existing WebAssembly-based virtualization can handle GPU device resources only by relying on a JavaScript engine to forward GPU requests, a scheme with problems in performance, ecosystem support, and more. The invention provides a new solution for virtualizing GPU device resources that offers virtualized GPU access independently, without a JavaScript engine.
2. GPU virtualization usually incurs I/O and other losses, so a virtualized GPU performs worse than native execution. The invention solves the performance-loss problem of GPU virtualization and achieves almost zero performance loss.
3. Most applications that use a GPU for high-performance computation are currently written with CUDA. WebAssembly-based GPU virtualization should therefore provide corresponding ecosystem support, and the invention allows existing applications to be compiled to a WebAssembly target and run in a WebAssembly environment without being modified or rewritten.
Drawings
FIG. 1 shows the conventional scheme of relying on a JavaScript engine, as discussed in the background of the present invention;
FIG. 2 shows the relationship among the CUDA Driver API, the CUDA Runtime API, and a CUDA application in the present invention;
FIG. 3 is the architecture of a WebAssembly virtual machine provided by the present invention.
Detailed Description
The present invention is further illustrated by the following figures and specific examples, which are to be understood as illustrative only and not as limiting the scope of the invention, which is to be given the full breadth of the appended claims and any and all equivalent modifications thereof which may occur to those skilled in the art upon reading the present specification.
Briefly, the system described in this patent has two core parts:
1. CUDA environment interface of WebAssembly virtual machine
2. CUDA supplementation of WebAssembly compilation toolchain
These two parts are analyzed in detail below:
1. CUDA environment interface of WebAssembly virtual machine
The method provides a CUDA environment interface for CUDA applications running in the WebAssembly virtual machine by adding native function calls: when the WebAssembly virtual machine executes a CUDA call, it forwards the call to the CUDA environment interface provided by the invention, which then checks the GPU request and processes the subsequent steps. After receiving a CUDA-related call from the WebAssembly virtual machine, the CUDA environment interface performs logic processing such as validity checks and permission checks, forwards the call to the physical GPU for further computation, and finally returns the computation result to the CUDA application in the WebAssembly virtual machine, thereby virtualizing the GPU on the WebAssembly virtual machine.
The CUDA interface of the virtual machine is realized by the following key technologies:
1) Representation of CUDA resources
In a CUDA application compiled to a native binary, a CUDA resource handle is originally saved in the form of a pointer, but a CUDA application running in the WebAssembly virtual machine cannot use a pointer to save a CUDA resource handle. The reason is that when the WebAssembly virtual machine runs on a 64-bit system while the WebAssembly instance is 32-bit, a 32-bit pointer inside the instance cannot hold a 64-bit CUDA resource handle. Using raw pointers would also make security checks harder.
Therefore, a unique handle is allocated to each actually allocated CUDA resource (such as device memory or a device function). Because the CUDA application uses pointer-sized values to store CUDA resource handles, if the pointer is 32-bit in the WebAssembly instance, a unique 32-bit integer represents an allocated handle; if it is 64-bit, a unique 64-bit integer is used. Thus, within the CUDA application, a unique integer stores the corresponding CUDA resource handle.
2) Maintenance of CUDA resources
The handle of a CUDA resource is represented by a unique integer, and a data dictionary maps each allocated handle to the corresponding allocated CUDA resource. When a CUDA application requests access to a CUDA resource, it uses the unique handle already allocated to that resource. The system checks the validity of the handle; if the handle is invalid or the checked device memory access is out of bounds, it immediately stops processing the request and returns an error message.
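A minimal sketch of this handle scheme, assuming a 32-bit guest: handles are plain integers, a dictionary maps them to the real 64-bit resources, and lookups enforce the validity and bounds checks described above. All type and function names (`ResourceTable`, `DeviceMemory`, and so on) are illustrative, not from the patent.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

using Handle32 = uint32_t;  // handle width for a 32-bit WebAssembly instance

// The real resource lives on the host side, where pointers are 64-bit.
struct DeviceMemory {
    uint64_t devicePtr;   // 64-bit device pointer (opaque to the guest)
    uint64_t sizeBytes;   // allocation size, used for bounds checking
};

class ResourceTable {
public:
    // Allocate a fresh unique handle for a newly created resource.
    Handle32 insert(DeviceMemory mem) {
        Handle32 h = next_++;
        table_[h] = mem;
        return h;
    }

    // Validity check: reject unknown handles and out-of-bounds accesses.
    std::optional<DeviceMemory> lookup(Handle32 h, uint64_t offset,
                                       uint64_t len) const {
        auto it = table_.find(h);
        if (it == table_.end()) return std::nullopt;            // invalid handle
        if (offset + len > it->second.sizeBytes) return std::nullopt;  // OOB
        return it->second;
    }

private:
    Handle32 next_ = 1;
    std::unordered_map<Handle32, DeviceMemory> table_;
};
```

The guest only ever sees the 32-bit integer; the mapping back to the 64-bit resource, and every check on it, stays inside the virtual machine.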
3) Support of CUDA Driver API and CUDA Runtime API
The relationship between the CUDA Driver API and the CUDA Runtime API is shown in FIG. 2: the CUDA Runtime API is built on the CUDA Driver API and automatically manages resources such as CUDA contexts and CUDA device functions, and a CUDA application can be written with either the CUDA Runtime API or the CUDA Driver API. The WebAssembly virtual machine provided by the invention is built on the CUDA Driver API and supports the CUDA Runtime API by simulating the CUDA Runtime's management of contexts, device functions, and so on.
The detailed description of this section is as follows:
The place of the CUDA environment interface within the overall WebAssembly virtual machine provided by the invention is shown in FIG. 3. As a supplementary part of the virtual machine, it provides support for CUDA-related calls to CUDA applications running in the virtual machine without invasive modification.
When a CUDA application applies for a CUDA resource, the WebAssembly virtual machine provided by the invention receives a call request of CUDA related call, firstly judges whether the specified limit is exceeded or not, then applies for the resource to a physical GPU, allocates a unique handle to the resource after the application is successful, saves the handle and the applied resource in a data dictionary, and returns the unique handle to the CUDA application.
When the CUDA application needs to access a CUDA resource, the WebAssembly virtual machine provided by the invention receives the CUDA-related call request, parses the handle passed as a parameter, and performs security checks on it: whether the handle corresponds to a CUDA resource, whether that resource is valid, whether the access is authorized, and, for device memory accesses, whether the access is within bounds. If a check fails, processing stops immediately and an error message is returned; if the checks pass, a corresponding operation request is sent to the physical GPU and the result is returned after processing.
When the CUDA application crashes or returns normally, the system automatically releases the resources the application allocated but has not yet released. Specifically, the WebAssembly virtual machine releases the stored handles and the corresponding CUDA resources and returns them to the physical GPU, preventing resource leaks and related problems.
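The automatic-release behavior can be sketched as a guard object that tracks live handles and hands everything still unreleased back at teardown. The names and the callback-based release are assumptions for illustration; a real virtual machine would call the driver to free each resource.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

class CudaResourceGuard {
public:
    // The release callback stands in for returning a resource to the GPU.
    using Release = std::function<void(uint64_t)>;
    explicit CudaResourceGuard(Release r) : release_(std::move(r)) {}

    // Record a newly allocated resource and hand back its handle.
    uint64_t track(uint64_t resource) {
        uint64_t h = next_++;
        live_[h] = resource;
        return h;
    }

    // Explicit release requested by the application.
    void free(uint64_t h) {
        auto it = live_.find(h);
        if (it == live_.end()) return;  // unknown handle: nothing to do
        release_(it->second);
        live_.erase(it);
    }

    // Whether the guest crashed or exited normally, everything still live
    // is returned to the (simulated) physical GPU at teardown.
    ~CudaResourceGuard() {
        for (auto& kv : live_) release_(kv.second);
    }

private:
    Release release_;
    uint64_t next_ = 1;
    std::unordered_map<uint64_t, uint64_t> live_;
};
```

Tying the cleanup to the guard's destructor means a crash path that unwinds the instance frees the same resources as a normal exit.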
For CUDA context management, the system maintains the set of currently allocated CUDA contexts, and each thread stores the CUDA context information of the current thread in thread-local storage. When a CUDA application creates or switches contexts, the system performs the corresponding create and switch operations; when the application destroys a context, the system removes it from the context set and also destroys the CUDA resources allocated within it, following the behavior specified by the CUDA-related calls. Meanwhile, the automatically managed parts of the CUDA Runtime API, such as implicitly creating a device context, are implemented according to the behavior specified by the CUDA Runtime, to support applications written with the CUDA Runtime.
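A sketch of the per-thread context bookkeeping described above, assuming contexts are plain integer IDs: a global set tracks allocated contexts while thread-local storage holds the one currently bound to each thread. The function names are illustrative, not the patent's.

```cpp
#include <cstdint>
#include <mutex>
#include <set>

static std::mutex g_mu;
static std::set<uint64_t> g_contexts;    // currently allocated contexts
thread_local uint64_t t_currentCtx = 0;  // 0 = no context bound to this thread

// Create a context and bind it to the calling thread.
uint64_t ctxCreate() {
    std::lock_guard<std::mutex> lk(g_mu);
    static uint64_t next = 1;
    uint64_t ctx = next++;
    g_contexts.insert(ctx);
    t_currentCtx = ctx;
    return ctx;
}

// Switch the calling thread to an existing context; fails validity check
// if the context was never created or has been destroyed.
bool ctxSetCurrent(uint64_t ctx) {
    std::lock_guard<std::mutex> lk(g_mu);
    if (!g_contexts.count(ctx)) return false;
    t_currentCtx = ctx;
    return true;
}

// Destroy a context; in the real system the resources allocated within it
// would be destroyed here as well.
void ctxDestroy(uint64_t ctx) {
    std::lock_guard<std::mutex> lk(g_mu);
    g_contexts.erase(ctx);
    if (t_currentCtx == ctx) t_currentCtx = 0;
}
```

After a context is destroyed, a later attempt to switch to it fails the validity check, matching the behavior described above.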
2. CUDA supplementation of the WebAssembly compilation toolchain
The WebAssembly compilation toolchain is supplemented with CUDA support so that an existing CUDA application can be compiled into WebAssembly bytecode without modifying its source code and executed on the WebAssembly virtual machine provided by the invention.
The implementation of CUDA supplementation of a compilation toolchain involves the following key technologies:
1) Substitution of CUDA built-in functions
Since the CUDA device functions, CUDA Runtime built-in functions, and similar code generated by the toolchain cannot be compiled into WebAssembly bytecode, they are replaced during compilation. First, an LLVM pass plugin is developed with the LLVM toolchain to process the LLVM IR during compilation, transforming the parts that cannot be compiled into WebAssembly, such as generated CUDA device functions and CUDA Runtime built-in functions. Then, during linking, the CUDA Runtime static library developed for the system is substituted, so that the CUDA source code can be compiled into WebAssembly bytecode.
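The substitution step can be pictured as redirecting call targets in the IR from CUDA built-ins to host imports supplied by the virtual machine. A real implementation would be an LLVM pass walking call instructions; this toy string rewrite over textual IR only illustrates the mapping, and all symbol names here are assumptions.

```cpp
#include <string>
#include <unordered_map>

// Rewrite every occurrence of "@<from>" in a textual-IR snippet to "@<to>".
// (A real LLVM pass would instead replace the callee of each CallInst.)
std::string rewriteCalls(
    std::string ir,
    const std::unordered_map<std::string, std::string>& repl) {
    for (const auto& [from, to] : repl) {
        std::string needle = "@" + from;
        std::string subst  = "@" + to;
        size_t pos = 0;
        while ((pos = ir.find(needle, pos)) != std::string::npos) {
            ir.replace(pos, needle.size(), subst);
            pos += subst.size();
        }
    }
    return ir;
}
```

After the rewrite, the calls resolve at link time against the substituted static library (here, the hypothetical `wasm_cuda_*` host functions) instead of against symbols that cannot exist in WebAssembly.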
2) Matching of LIBC parameters
The processed LLVM IR targets 64 bits. If the generated target bytecode is 64-bit, it can be linked directly with the 64-bit LIBC; if the target is 32-bit, the LLVM pass plugin provided by the system processes the LLVM IR during compilation, rewriting calls into wrapper functions for LIBC routines with 64-bit parameters, which are then linked with the existing 32-bit LIBC so that the WebAssembly is generated correctly.
The detailed description of this section is as follows:
A CUDA application can only be compiled for a 64-bit target, and during compilation there is code, such as inline assembly, that the WebAssembly target cannot support, so the LLVM toolchain is supplemented accordingly.
If the generated target bytecode is 64-bit, it can be linked directly with the 64-bit LIBC; if the target is 32-bit, a LIBC incompatibility arises, so the system provides a corresponding LLVM IR processing plugin for this case that rewrites calls into wrapper functions for LIBC routines with 64-bit parameters, which are then linked with the 32-bit LIBC, successfully generating the WebAssembly.
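The parameter-matching idea can be sketched as a wrapper that accepts the 64-bit argument the IR passes, checks that it fits, and forwards to the 32-bit ABI. `wrap_memset_` and `memset32` are illustrative names, not symbols from the patent's toolchain.

```cpp
#include <cstdint>
#include <cstring>

// A 32-bit-ABI memset, standing in for the routine in the 32-bit LIBC.
void* memset32(void* dst, int c, uint32_t n) {
    return std::memset(dst, c, n);
}

// Wrapper inserted by the pass: the 64-bit IR still passes a 64-bit size,
// which is narrowed (with a range check) to the 32-bit ABI.
void* wrap_memset_(void* dst, int c, uint64_t n) {
    if (n > UINT32_MAX) return nullptr;  // cannot represent in the 32-bit ABI
    return memset32(dst, c, static_cast<uint32_t>(n));
}
```

The rewritten calls in the IR target the wrapper, and only the wrapper's 32-bit forwarding call is resolved against the existing 32-bit LIBC at link time.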
This concludes the two core parts of the system: the CUDA environment interface of the WebAssembly virtual machine and the CUDA supplementation of the WebAssembly compilation toolchain. The beneficial effects achieved by the system are described next.
This patent proposes a WebAssembly-based GPU virtualization scheme covering typical GPU use cases such as cross-platform deep learning training and inference. The system designs a CUDA environment interface as a supplement to the WebAssembly virtual machine: it receives CUDA-related call requests at runtime, processes the calls through validity checks, permission checks, and so on, invokes the physical GPU for computation, and finally returns the result to the CUDA application in the WebAssembly virtual machine, thereby virtualizing the GPU on the WebAssembly virtual machine. This virtualization does not modify the core of the original WebAssembly virtual machine (it is non-invasive); the CUDA environment interface of the system only needs to be supplied when WebAssembly starts. Meanwhile, an original CUDA application can be compiled into WebAssembly with the compilation toolchain provided by the invention without modifying its source code. To the running CUDA application, every operation appears to occur locally, as if on a real physical GPU. With this system, WebAssembly-based edge and cloud environments can support GPU virtualization and efficiently perform cross-platform general-purpose parallel computing and deep learning training and inference.
The system of this patent was implemented and its performance evaluated. The system achieves GPU virtualization with almost zero overhead: the PolyBench GPU results are almost identical to native performance, and the model-inference experiments show the system is more than 3 times faster than the original scheme that relies on a JavaScript engine and the WebGPU interface.
The PolyBench GPU comparison shows that the performance of the system is almost equal to native execution.
The model-inference experiments show that the system's performance is essentially equal to native execution and more than 3 times faster than the original scheme that relies on a JavaScript engine and the WebGPU interface.
In particular, the system described in this patent has several important meanings:
1. Lightweight virtualization technology. The system provided by the invention does not depend on JavaScript; the WebAssembly virtual machine provided by the invention accesses the GPU transparently with low performance loss, making the virtual machine lighter. Edge servers generally have limited computing resources, and lightweight virtualization is a key technology of edge computing.
2. Extremely high usability. The method supplements the WebAssembly compilation toolchain so that users can run CUDA applications on the WebAssembly virtual machine provided by the invention without modifying the applications' source code.
3. Support for intelligent tasks. The system supports acceleration of deep learning inference and training tasks, promoting on-site deployment of various intelligent tasks.
The flow of the user using the system is as follows:
1. Compile and link the existing CUDA application source code with the compilation toolchain provided by the system to generate WebAssembly bytecode.
2. The generated WebAssembly bytecode runs in the WebAssembly virtual machine supported by the system. After the virtual machine receives a GPU request from the application, it checks the request's validity, modifies the maintained virtual GPU state, and sends a computation request to the physical GPU; finally, it returns the computation result to the CUDA application in the WebAssembly virtual machine, thereby realizing virtualization of the GPU.

Claims (10)

1. A WebAssembly-oriented cross-platform GPU virtualization method, characterized by comprising the following steps:
step 1: compiling and linking the existing CUDA application source code by using a compiling tool chain to generate a WebAssembly bytecode;
step 2: loading the WebAssembly bytecode obtained in the step 1 into a WebAssembly virtual machine for operation, after receiving a GPU request sent by a CUDA application, carrying out validity check on the request, then changing a maintained virtual GPU state, and generating a corresponding request for modifying a physical GPU state;
Step 3: sending the modification request obtained in step 2 to the physical GPU, and waiting for the physical GPU to return an execution result;
Step 4: returning the execution result from step 3 to the CUDA application running in the WebAssembly virtual machine, thereby realizing virtualization of the GPU.
2. The WebAssembly-oriented cross-platform GPU virtualization method according to claim 1, wherein the specific steps of compiling and linking in step 1 are: compiling with an LLVM toolchain while using the developed LLVM pass plugin to process the LLVM IR during compilation, handling the generated CUDA device functions and the parts that cannot be compiled into WebAssembly, and then, after processing, generating LLVM IR that is compiled for the WebAssembly target.
3. The WebAssembly-oriented cross-platform GPU virtualization method according to claim 2, wherein generating the WebAssembly bytecode in step 1 specifically comprises: the LLVM IR for the WebAssembly target is written against a 64-bit target; if the generated target bytecode is 64-bit, it is compiled directly and then linked with a 64-bit LIBC; if the generation target is 32-bit, the LLVM Pass plugin provided by the system processes the LLVM IR during compilation, rewriting calls into wrapper functions for the LIBC interfaces with 64-bit parameters, which are then linked with the existing 32-bit LIBC so that the WebAssembly is generated correctly.
4. The WebAssembly-oriented cross-platform GPU virtualization method according to claim 3, wherein the WebAssembly virtual machine receives the GPU request sent by the CUDA application in step 2 as follows: when the WebAssembly virtual machine executes a CUDA call, it forwards the call to the CUDA environment interface, which then checks the GPU request and processes the subsequent steps.
5. The WebAssembly-oriented cross-platform GPU virtualization method of claim 4, wherein: after receiving a CUDA-related call from the WebAssembly virtual machine, the CUDA environment interface performs the corresponding logic processing, forwards the call to the physical GPU for further computation, and finally returns the computation result to the CUDA application in the WebAssembly virtual machine, thereby realizing GPU virtualization on top of the WebAssembly virtual machine.
6. The WebAssembly-oriented cross-platform GPU virtualization method of claim 5, wherein: in the CUDA application of step 1, CUDA resources are stored in the form of unique handles, and each CUDA resource actually allocated is assigned a unique handle; if the pointers of the CUDA application running in the WebAssembly virtual machine are 32-bit, a unique 32-bit integer represents an allocated handle; if the pointers are 64-bit, a unique 64-bit integer represents an allocated handle.
7. The WebAssembly-oriented cross-platform GPU virtualization method of claim 6, wherein: when the CUDA application allocates a CUDA resource, it stores the corresponding unique handle, and when it requests access to a CUDA resource, it accesses it through that handle; when the WebAssembly virtual machine receives the access request, it checks the validity of the handle, and if the handle is invalid or the checked device memory access is out of range, it immediately stops processing the access request and returns error information.
8. The WebAssembly-oriented cross-platform GPU virtualization method of claim 7, wherein: CUDA resources are allocated by the CUDA application through CUDA-related calls, which comprise the CUDA Driver API and the CUDA Runtime API; the CUDA Runtime API is built on top of the CUDA Driver API and automatically manages CUDA contexts and CUDA device functions; a CUDA application is compiled against either the CUDA Driver API or the CUDA Runtime API; the WebAssembly virtual machine is built on the CUDA Driver API and achieves support for the CUDA Runtime API by simulating the automatic management behavior of the CUDA Runtime.
9. The WebAssembly-oriented cross-platform GPU virtualization method of claim 8, wherein: when the CUDA environment interface allocates a CUDA resource, the WebAssembly virtual machine receives the call request of the CUDA Driver API or the CUDA Runtime API, judges whether the request exceeds the specified limit, then requests the CUDA resource from the physical GPU; after the request succeeds, it assigns a unique handle to the CUDA resource, stores the handle together with the allocated CUDA resource in a data dictionary, and returns the handle to the WebAssembly application; when the application needs to access a CUDA resource, the WebAssembly virtual machine likewise receives the call request of the CUDA Driver API or the CUDA Runtime API, parses the handle passed as a parameter, and performs a security check on it; if the check fails, it immediately stops processing the request and returns error information, and if the check passes, it sends the corresponding operation request to the physical GPU and returns the result once processing completes; when the application crashes or exits normally, the WebAssembly virtual machine automatically releases the CUDA resources that were allocated but not yet released, specifically by removing the stored handles and their corresponding CUDA resources and returning the CUDA resources to the physical GPU.
10. The WebAssembly-oriented cross-platform GPU virtualization method of claim 9, wherein the context management of the CUDA application comprises: each thread in the WebAssembly virtual machine stores the CUDA context information of the current thread in thread-local storage, and the WebAssembly virtual machine simulates the corresponding operations when the CUDA application requests to create or switch a CUDA context; when the CUDA application destroys a CUDA context, the WebAssembly virtual machine removes the context and, following the behavior specified by the CUDA-related calls, destroys the associated CUDA resources; the automatically managed part of the CUDA Runtime API is handled according to the behavior specified by the CUDA Runtime.
CN202211659810.9A 2022-12-23 2022-12-23 WebAssembly-oriented cross-platform GPU virtualization method Active CN115658330B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211659810.9A CN115658330B (en) 2022-12-23 2022-12-23 WebAssembly-oriented cross-platform GPU virtualization method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211659810.9A CN115658330B (en) 2022-12-23 2022-12-23 WebAssembly-oriented cross-platform GPU virtualization method

Publications (2)

Publication Number Publication Date
CN115658330A CN115658330A (en) 2023-01-31
CN115658330B true CN115658330B (en) 2023-03-28

Family

ID=85022643

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211659810.9A Active CN115658330B (en) 2022-12-23 2022-12-23 WebAssembly-oriented cross-platform GPU virtualization method

Country Status (1)

Country Link
CN (1) CN115658330B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955394A (en) * 2014-04-03 2014-07-30 北京大学 GPU (Graphic Processing Unit) virtualization optimization method based on delayed submitting
CN113791870A (en) * 2021-09-24 2021-12-14 上海交通大学 Fine-grained migration method and system for distributed system of WebAssembly virtual machine

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364937A1 (en) * 2019-05-16 2020-11-19 Subvrsive, Inc. System-adaptive augmented reality
US10909764B2 (en) * 2019-06-25 2021-02-02 8th Wall Inc. Providing augmented reality target images in a web browser
CN111881401B (en) * 2020-08-04 2023-12-26 浪潮云信息技术股份公司 WebAssemble-based browser deep learning method and system
CN113986466A (en) * 2021-11-01 2022-01-28 北京计算机技术及应用研究所 Cloud computing-oriented GPU virtualization system and method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955394A (en) * 2014-04-03 2014-07-30 北京大学 GPU (Graphic Processing Unit) virtualization optimization method based on delayed submitting
CN113791870A (en) * 2021-09-24 2021-12-14 上海交通大学 Fine-grained migration method and system for distributed system of WebAssembly virtual machine

Also Published As

Publication number Publication date
CN115658330A (en) 2023-01-31

Similar Documents

Publication Publication Date Title
US10318322B2 (en) Binary translator with precise exception synchronization mechanism
CN107077337B (en) System and method for executing application code compiled from two instruction set architectures
US10846101B2 (en) Method and system for starting up application
CN108139921B (en) Performance optimization method and system of simulator
US20070006178A1 (en) Function-level just-in-time translation engine with multiple pass optimization
US9201635B2 (en) Just-in-time dynamic translation for translation, compilation, and execution of non-native instructions
US7792666B2 (en) Translation block invalidation prehints in emulation of a target system on a host system
US9213563B2 (en) Implementing a jump instruction in a dynamic translator that uses instruction code translation and just-in-time compilation
US8615735B2 (en) System and method for blurring instructions and data via binary obfuscation
US10417023B2 (en) GPU simulation method
US9529610B2 (en) Updating compiled native instruction paths
CN112163195B (en) Virtual machine software protection method based on stack hiding
US9524178B2 (en) Defining an instruction path to be compiled by a just-in-time (JIT) compiler
US9183018B2 (en) Dynamic on/off just-in-time compilation in a dynamic translator using instruction code translation
Cifuentes et al. Experience in the design, implementation and use of a retargetable static binary translation framework
KR920003044B1 (en) Control system for guest execution of virtual computer system
CN115658330B (en) WebAssembly-oriented cross-platform GPU virtualization method
US20030093258A1 (en) Method and apparatus for efficient simulation of memory mapped device access
CN116308990A (en) Simulation and integration method and system of GPU supporting OpenGL
WO2019190890A1 (en) Techniques for native runtime of hypertext markup language graphics content
US11429358B2 (en) Representing asynchronous state machine in intermediate code
CN113176928B (en) Running method and device of heterogeneous virtual machine
US20240111518A1 (en) Embedding code from modules across versioning boundaries
Le et al. Criteria and approaches for virtualization on modern FPGAs
US20150186168A1 (en) Dedicating processing resources to just-in-time compilers and instruction processors in a dynamic translator

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant