CN115641249A - Performance optimization method based on domestic platform GPU - Google Patents
- Publication number
- CN115641249A
- Authority
- CN
- China
- Prior art keywords
- gpu
- performance
- layer
- computing
- floating point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Debugging And Monitoring (AREA)
Abstract
The invention discloses a performance optimization method based on a domestic platform GPU, in the technical field of GPU optimization. All OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU. Based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer: the GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer, which integrates the data and transmits them to the CPU layer, which performs the overall computation and displays the data. Compilation and porting are carried out on the domestic platform using OpenCL, GPU parallel computing is used to improve GPU floating-point computing performance, and the floating-point computing performance is tested.
Description
Technical Field
The invention relates to the technical field of GPU optimization, and in particular to a performance optimization method based on a domestic platform GPU.
Background
In networked environments, graphics processing is becoming increasingly important in modern computers, and a dedicated graphics processor is needed to handle display tasks and to meet the demand for high-performance GPUs in fields such as aerospace, navigation, satellite data processing and biomedical research. An independently controllable high-performance GPU avoids security risks such as backdoors and vulnerabilities, safeguards information systems in China, and makes national defense information systems independently controllable.
Current domestically developed GPUs are functionally usable, but their performance still has room for improvement, particularly in 2D/3D graphics processing, floating-point computing and display capability.
Disclosure of Invention
To address these problems in the prior art, the invention provides a performance optimization method based on a domestic platform GPU that optimizes the performance of domestic GPUs.
The specific scheme provided by the invention is as follows:
the invention provides a performance optimization method based on a domestic platform GPU, in which all OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU;
based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer; GPU operations are performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs the overall computation on the data and displays them;
compilation and porting are carried out on the domestic platform using OpenCL, GPU floating-point computing performance is improved by GPU parallel computing, and the floating-point computing performance is tested.
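The data flow through the three layers can be illustrated with a minimal sketch. It assumes a thread pool as a stand-in for the GPU computing acceleration layer (a real system would dispatch kernels through OpenCL), and every function name below is hypothetical rather than taken from the invention. Because pure-Python work is serialized by the interpreter lock, the sketch shows the structure of the hand-off between layers, not an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def gpu_kernel(chunk):
    """Stand-in for a GPU kernel: element-wise multiply-add on one chunk."""
    return [2.0 * x + 1.0 for x in chunk]


def gpu_acceleration_layer(data, n_workers=4):
    """Split the input into chunks and process the chunks concurrently.

    The thread pool plays the role of the GPU here; it illustrates the data
    flow only (pure-Python work is serialized by the GIL).
    """
    size = max(1, -(-len(data) // n_workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() returns results in chunk order, so no reordering is needed.
        return list(pool.map(gpu_kernel, chunks))


def task_management_layer(partial_results):
    """Integrate the per-chunk results into a single list for the CPU layer."""
    merged = []
    for part in partial_results:
        merged.extend(part)
    return merged


def cpu_layer(merged):
    """Perform the overall computation and build a line of display text."""
    return f"{len(merged)} values, sum = {sum(merged):.1f}"


def run_three_layers(data, n_workers=4):
    """GPU acceleration layer, then task management layer, then CPU layer."""
    partial = gpu_acceleration_layer(data, n_workers)
    return cpu_layer(task_management_layer(partial))


if __name__ == "__main__":
    print(run_three_layers([float(i) for i in range(10)]))
```

Run as a script, this prints `10 values, sum = 100.0`. Keeping the integration step in its own layer means the GPU layer can change its chunking without the CPU layer noticing.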
Further, in the performance optimization method based on the domestic platform GPU, an interception module for OpenGL interface functions is constructed; the interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver, the OpenGL driver exports the OpenGL interface functions to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU.
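The capture-and-forward behaviour of the interception module can be sketched as a wrapper object. This is an illustration under stated assumptions: `FakeGLDriver` and `GLInterceptor` are hypothetical names, and the "driver" is a placeholder object, not a real OpenGL binding. On Linux, real OpenGL interception is commonly done instead with a shim shared library that exports the same `gl*` symbols, is loaded via `LD_PRELOAD`, and forwards each call to the real driver through `dlsym(RTLD_NEXT, ...)`; the source does not say which technique the invention uses.

```python
class FakeGLDriver:
    """Hypothetical stand-in for a vendor OpenGL driver (not a real API)."""

    def glClearColor(self, red, green, blue, alpha):
        return None

    def glDrawArrays(self, mode, first, count):
        return None


class GLInterceptor:
    """Records every call made through it, then forwards it to the driver."""

    def __init__(self, driver):
        self._driver = driver
        self.calls = []  # (function name, argument tuple), in call order

    def __getattr__(self, name):
        # Only reached for names not defined on the interceptor itself,
        # i.e. the OpenGL entry points being intercepted.
        target = getattr(self._driver, name)

        def wrapper(*args):
            self.calls.append((name, args))  # capture the call
            return target(*args)  # forward it unchanged to the driver

        return wrapper


if __name__ == "__main__":
    GL_TRIANGLES = 0x0004  # value defined in the OpenGL headers
    gl = GLInterceptor(FakeGLDriver())
    gl.glClearColor(0.0, 0.0, 0.0, 1.0)
    gl.glDrawArrays(GL_TRIANGLES, 0, 3)
    print(gl.calls)
```

The recorded `calls` list is the raw material that a later export step could turn into a code file.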
Further, in the performance optimization method based on the domestic platform GPU, different heterogeneous modes of the heterogeneous computing architecture are set according to the computing tasks of different applications and task scenes, and data computation is performed through the GPU computing acceleration layer, the task management layer and the CPU layer.
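The source does not state how a heterogeneous mode is chosen for a given task. As a purely hypothetical illustration, a selector might look at whether the task has several dependent stages and whether its input items are independent. The `ComputeTask` fields and the decision rule are assumptions; only the mode names (parallel, pipeline, hybrid) come from the description.

```python
from dataclasses import dataclass


@dataclass
class ComputeTask:
    """Hypothetical description of a computing task."""
    name: str
    n_stages: int        # number of dependent processing stages
    data_parallel: bool  # can input items be processed independently?


def choose_mode(task: ComputeTask) -> str:
    """Illustrative rule (an assumption, not from the source) for picking a mode."""
    if task.n_stages > 1 and task.data_parallel:
        return "hybrid"    # pipeline the stages and parallelize within them
    if task.n_stages > 1:
        return "pipeline"  # overlap successive stages on a stream of inputs
    if task.data_parallel:
        return "parallel"  # spread independent items over compute units
    return "serial"        # nothing to overlap or split; keep it on the CPU


if __name__ == "__main__":
    for t in [ComputeTask("matrix multiply", 1, True),
              ComputeTask("decode then render", 2, False),
              ComputeTask("filter then reduce", 2, True)]:
        print(t.name, "->", choose_mode(t))
```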
Further, in the performance optimization method based on the domestic platform GPU, carrying out compilation and porting on the domestic platform using OpenCL, improving GPU floating-point computing performance by GPU parallel computing, and testing the floating-point computing performance comprise:
installing the drivers related to the OpenCL interface and checking whether the related programs have been installed successfully;
compiling and installing the versions of MPICH and OpenCL that correspond to the domestic platform architecture, and building a floating-point performance testing tool library;
and using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU by running the executable file.
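The "check whether the related programs are installed" step can be approximated by probing for the OpenCL loader library and the MPI tools before building the test tool library. This is a minimal sketch under assumptions: it only checks presence, not version or architecture match, and the tool names checked (`mpicc`, `mpiexec` from MPICH, and the separately packaged `clinfo` diagnostic) are common choices rather than ones named in the source.

```python
import ctypes.util
import shutil


def check_environment():
    """Report whether the OpenCL loader and MPI tools appear to be installed."""
    return {
        # find_library searches the standard library paths for libOpenCL.
        "opencl_library": ctypes.util.find_library("OpenCL") is not None,
        # MPICH installs a compiler wrapper and a launcher on PATH.
        "mpicc": shutil.which("mpicc") is not None,
        "mpiexec": shutil.which("mpiexec") is not None,
        # clinfo lists OpenCL platforms and devices when a driver is present.
        "clinfo": shutil.which("clinfo") is not None,
    }


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:15s} {'found' if ok else 'MISSING'}")
```

A missing entry suggests which install step to revisit before compiling the floating-point performance test tool.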
The invention also provides a performance optimization system based on a domestic platform GPU, comprising an interception module, a heterogeneous operation module and a compiling test module, wherein
the interception module intercepts all OpenGL interface functions called in an application scene, exports them to a code file, and compiles and runs the code file according to the performance optimization requirements of the GPU;
the heterogeneous operation module is divided, based on a heterogeneous computing architecture, into a CPU layer, a task management layer and a GPU computing acceleration layer; GPU operations are performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs the overall computation on the data and displays them;
the compiling test module carries out compilation and porting on the domestic platform using OpenCL, improves GPU floating-point computing performance by GPU parallel computing, and tests the floating-point computing performance.
Further, in the performance optimization system based on the domestic platform GPU, an interception module for OpenGL interface functions is constructed; the interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver, the OpenGL driver exports the OpenGL interface functions to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU.
Further, the heterogeneous operation module in the performance optimization system based on the domestic platform GPU sets different heterogeneous modes of the heterogeneous computing architecture according to the computing tasks of different applications and task scenes, and performs data computation through the GPU computing acceleration layer, the task management layer and the CPU layer.
Further, in the performance optimization system based on the domestic platform GPU, the compiling test module carries out compilation and porting on the domestic platform using OpenCL, improves GPU floating-point computing performance by GPU parallel computing, and tests the floating-point computing performance, which comprises:
installing the drivers related to the OpenCL interface and checking whether the related programs have been installed successfully;
compiling and installing the versions of MPICH and OpenCL that correspond to the domestic platform architecture, and building a floating-point performance testing tool library;
and using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU by running the executable file.
The invention has the advantages that:
the performance optimization method based on a domestic platform GPU provided by the invention optimizes the graphics processing performance of the GPU, accelerates multi-level GPU computation, develops a GPU floating-point computing performance test tool based on the OpenCL library, optimizes the performance of the domestic CPU, and solves the problem that the domestic platform lacks a GPU computing performance test tool.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic view of the process flow of optimizing an image according to the method of the present invention.
FIG. 2 is a schematic diagram illustrating the operation acceleration process implemented by the method of the present invention.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
The invention provides a performance optimization method based on a domestic platform GPU, in which all OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU;
based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer; GPU operations are performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs computation on the data and displays them;
compilation and porting are carried out on the domestic platform using OpenCL, GPU floating-point computing performance is improved by GPU parallel computing, and the floating-point computing performance is tested.
By intercepting and compiling OpenGL interface functions, the method provided by the invention avoids relying on software middleware and optimizes the performance of the graphics processing part. It also provides multi-level GPU computing acceleration through a heterogeneous CPU-GPU acceleration design, which further improves communication and the distribution of parallel computation among GPUs. By compiling and porting OpenCL on a domestic platform, the invention can test the floating-point computing performance of the GPU.
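The source does not specify how parallel computation is distributed among several GPUs. One common approach, shown here only as a hypothetical sketch, is static proportional partitioning: each device receives a contiguous block of work sized by an integer weight such as its relative throughput, and the per-device partial results are then reduced. The function names and weights are assumptions.

```python
def weighted_partition(n_items, weights):
    """Split range(n_items) into contiguous [start, end) blocks, one per device,
    with sizes proportional to positive integer weights (e.g. relative
    throughput). Integer arithmetic makes the last bound exactly n_items,
    so every item is assigned to exactly one device.
    """
    if not weights or any(w <= 0 for w in weights):
        raise ValueError("weights must be a non-empty list of positive integers")
    total = sum(weights)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        bounds.append(n_items * acc // total)
    return list(zip(bounds[:-1], bounds[1:]))


def distributed_sum(data, weights):
    """Each 'device' sums its own block; the partial sums are then reduced."""
    blocks = weighted_partition(len(data), weights)
    partials = [sum(data[start:end]) for start, end in blocks]
    return sum(partials)


if __name__ == "__main__":
    print(weighted_partition(100, [1, 3]))  # a device 3x faster gets 3x the items
    print(distributed_sum(list(range(100)), [1, 3]))
```

Static partitioning keeps inter-GPU communication to one scatter and one reduction; dynamic work stealing would adapt better to uneven workloads at the cost of more coordination.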
In a specific application, in some embodiments of the method of the invention, an interception module for OpenGL interface functions is constructed; the interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver, the OpenGL driver exports the OpenGL interface functions to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU. Intercepting and compiling the OpenGL interface functions in this way avoids software middleware and optimizes the performance of the graphics processing part.
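The export step, turning captured calls into a code file that can be compiled and run, might look like the following sketch. It assumes calls were recorded as `(function name, argument tuple)` pairs; `format_arg`, `export_trace` and the generated `replay_frame` function are hypothetical names, and the source does not describe the actual code-file format. Only scalar arguments are handled; calls that pass pointers (for example buffer uploads) would also need their data serialized.

```python
def format_arg(arg):
    """Render one captured scalar argument as a C literal."""
    if isinstance(arg, bool):  # check bool before int (bool subclasses int)
        return "GL_TRUE" if arg else "GL_FALSE"
    if isinstance(arg, float):
        return f"{arg!r}f"  # e.g. 0.5 -> "0.5f" (finite values only)
    return str(arg)  # ints, or enum names already given as strings


def export_trace(calls, func_name="replay_frame"):
    """Turn a list of (name, args) calls into the text of a C source file."""
    lines = ["#include <GL/gl.h>", "", f"void {func_name}(void)", "{"]
    for name, args in calls:
        arg_text = ", ".join(format_arg(a) for a in args)
        lines.append(f"    {name}({arg_text});")
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    captured = [("glClearColor", (0.0, 0.0, 0.0, 1.0)),
                ("glDrawArrays", ("GL_TRIANGLES", 0, 3))]
    print(export_trace(captured))
```

Generating plain C calls lets the replayed sequence be compiled with the platform's own toolchain and profiled without the original application or any middleware.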
The invention also sets different heterogeneous modes of the heterogeneous computing architecture according to the computing tasks of different applications and task scenes, and performs data computation through the GPU computing acceleration layer, the task management layer and the CPU layer. GPU computation is performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs the overall computation and displays the data; this optimizes the use of system resources and improves system performance and efficiency. The parallel, pipeline and hybrid acceleration modes of heterogeneous computing are implemented, which reduces processing wait time and further improves system performance.
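The difference between the three acceleration modes named above can be sketched with two hypothetical processing stages. Threads stand in for the heterogeneous devices, so the sketch shows how work is arranged in each mode rather than a real speedup; `stage_a`, `stage_b` and the queue size are assumptions.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_END = object()  # sentinel marking the end of the stage-A output stream


def stage_a(x):
    """Hypothetical first stage (e.g. data preparation)."""
    return x + 1


def stage_b(x):
    """Hypothetical second stage (e.g. the main computation)."""
    return x * x


def parallel_mode(items, n_workers=4):
    """Parallel: independent items go through both stages concurrently."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda x: stage_b(stage_a(x)), items))


def _produce(items, q):
    """Run stage A in its own thread and stream its outputs into q."""
    try:
        for x in items:
            q.put(stage_a(x))
    finally:
        q.put(_END)  # always unblock the consumer, even after an error


def _consume(q):
    """Yield stage-A outputs from q until the sentinel arrives."""
    while True:
        y = q.get()
        if y is _END:
            return
        yield y


def pipeline_mode(items):
    """Pipeline: stage B works on item i while stage A produces item i + 1."""
    q = queue.Queue(maxsize=8)
    producer = threading.Thread(target=_produce, args=(items, q))
    producer.start()
    results = [stage_b(y) for y in _consume(q)]
    producer.join()
    return results


def hybrid_mode(items, n_workers=4):
    """Hybrid: pipelined stages, with stage B itself spread over a worker pool."""
    q = queue.Queue(maxsize=8)
    producer = threading.Thread(target=_produce, args=(items, q))
    producer.start()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(stage_b, y) for y in _consume(q)]
        results = [f.result() for f in futures]  # input order is preserved
    producer.join()
    return results


if __name__ == "__main__":
    data = list(range(5))
    print(parallel_mode(data), pipeline_mode(data), hybrid_mode(data))
```

All three modes produce the same results; they differ only in where waiting happens. The bounded queue keeps the producer from running arbitrarily far ahead of the consumer.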
In the method, OpenCL schedules the computations that must be performed on the GPU so that the GPU's computing resources execute them in parallel, which improves operating efficiency and floating-point computing performance. When the GPU performs parallel computation, the parallel computing program can be written with the MPI parallel computing library, and both the MPI parallel computing library and the OpenCL library are used when building the GPU floating-point performance test environment. Further, carrying out compilation and porting on the domestic platform using OpenCL, improving GPU floating-point computing performance by GPU parallel computing, and testing it comprise the following steps:
installing the drivers related to the OpenCL interface and checking whether the related programs have been installed successfully;
compiling and installing the versions of MPICH and OpenCL that correspond to the domestic platform architecture, and building a floating-point performance testing tool library;
and using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU by running the executable file.
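Floating-point performance is usually reported as achieved FLOP/s, the number of floating-point operations divided by the elapsed time, and is often compared with a theoretical peak estimated as compute units × processing elements per unit × clock rate × FLOPs per element per cycle (2 when a fused multiply-add is counted as two operations). The sketch below shows only that arithmetic and timing pattern: it times a pure-Python multiply-add loop on the CPU as a stand-in, whereas the test tool described above would time an OpenCL kernel on the GPU. All names and the example device figures are hypothetical.

```python
import time


def fma_loop(n_iters, a=1.000001, b=1e-7):
    """Run n_iters multiply-add steps; each step counts as 2 FLOPs."""
    x = 1.0
    for _ in range(n_iters):
        x = x * a + b  # one multiply and one add
    return x


def measure_gflops(n_iters=1_000_000, repeats=3):
    """Return the best observed GFLOP/s over several timed runs."""
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        fma_loop(n_iters)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            best = max(best, 2 * n_iters / elapsed / 1e9)
    return best


def peak_gflops(n_compute_units, pes_per_unit, clock_ghz, flops_per_cycle=2):
    """Theoretical peak: units x PEs per unit x clock (GHz) x FLOPs per cycle."""
    return n_compute_units * pes_per_unit * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    measured = measure_gflops()
    peak = peak_gflops(8, 64, 1.0)  # hypothetical device: 8 CUs x 64 PEs at 1 GHz
    print(f"measured {measured:.3f} GFLOP/s, hypothetical peak {peak:.0f} GFLOP/s")
```

Taking the best of several repeats reduces noise from warm-up and scheduling. The ratio of measured to peak throughput is the figure a GPU optimization would aim to raise.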
The invention also provides a performance optimization system based on a domestic platform GPU, comprising an interception module, a heterogeneous operation module and a compiling test module, wherein
the interception module intercepts all OpenGL interface functions called in an application scene, exports them to a code file, and compiles and runs the code file according to the performance optimization requirements of the GPU;
the heterogeneous operation module is divided, based on a heterogeneous computing architecture, into a CPU layer, a task management layer and a GPU computing acceleration layer; GPU operations are performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs the overall computation on the data and displays them;
the compiling test module carries out compilation and porting on the domestic platform using OpenCL, improves GPU floating-point computing performance by GPU parallel computing, and tests the floating-point computing performance.
Because the information exchange and execution processes among the modules of the system are based on the same concept as the method embodiment of the invention, refer to the description of the method embodiment for details; they are not repeated here.
Similarly, the system optimizes the graphics processing performance of the GPU, accelerates multi-level GPU computation, develops a GPU floating-point computing performance test tool based on the OpenCL library, optimizes the performance of the domestic CPU, and solves the problem that the domestic platform lacks a GPU computing performance test tool.
It should be noted that not all steps and modules in the above flows and system structures are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted according to the needs. The system structures described in the above embodiments may be physical structures or logical structures, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities separately, or some components may be implemented together in a plurality of independent devices.
The above embodiments are merely preferred embodiments given to fully illustrate the invention, and the protection scope of the invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the invention all fall within the protection scope of the invention. The protection scope of the invention is defined by the claims.
Claims (8)
1. A performance optimization method based on a domestic platform GPU, characterized in that all OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU;
based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer; GPU operations are performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs the overall computation on the data and displays them;
compilation and porting are carried out on the domestic platform using OpenCL, GPU floating-point computing performance is improved by GPU parallel computing, and the floating-point computing performance is tested.
2. The performance optimization method based on the domestic platform GPU according to claim 1, characterized in that an interception module for OpenGL interface functions is constructed; the interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver; the OpenGL driver exports the OpenGL interface functions to a code file; and the code file is compiled and run according to the performance optimization requirements of the GPU.
3. The performance optimization method based on the domestic platform GPU according to claim 1, wherein different heterogeneous modes of the heterogeneous computing architecture are set according to the computing tasks of different applications and task scenes, and data computation is performed through the GPU computing acceleration layer, the task management layer and the CPU layer.
4. The performance optimization method based on the domestic platform GPU according to claim 1, wherein carrying out compilation and porting on the domestic platform using OpenCL, improving GPU floating-point computing performance by GPU parallel computing, and testing the floating-point computing performance comprise the following steps:
installing the drivers related to the OpenCL interface and checking whether the related programs have been installed successfully;
compiling and installing the versions of MPICH and OpenCL that correspond to the domestic platform architecture, and building a floating-point performance testing tool library;
and using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU by running the executable file.
5. A performance optimization system based on a domestic platform GPU, characterized by comprising an interception module, a heterogeneous operation module and a compiling test module, wherein
the interception module intercepts all OpenGL interface functions called in an application scene, exports them to a code file, and compiles and runs the code file according to the performance optimization requirements of the GPU;
the heterogeneous operation module is divided, based on a heterogeneous computing architecture, into a CPU layer, a task management layer and a GPU computing acceleration layer; GPU operations are performed in parallel in the GPU computing acceleration layer, the resulting data are transmitted to the task management layer, which integrates them and transmits them to the CPU layer, and the CPU layer performs the overall computation on the data and displays them;
the compiling test module carries out compilation and porting on the domestic platform using OpenCL, improves GPU floating-point computing performance by GPU parallel computing, and tests the floating-point computing performance.
6. The performance optimization system based on the domestic platform GPU according to claim 5, wherein an interception module for OpenGL interface functions is constructed; the interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver; the OpenGL driver exports the OpenGL interface functions to a code file; and the code file is compiled and run according to the performance optimization requirements of the GPU.
7. The performance optimization system based on the domestic platform GPU according to claim 5, wherein the heterogeneous operation module sets different heterogeneous modes of the heterogeneous computing architecture according to the computing tasks of different applications and task scenes, and performs data computation through the GPU computing acceleration layer, the task management layer and the CPU layer.
8. The performance optimization system based on the domestic platform GPU according to claim 5, wherein the compiling test module carries out compilation and porting on the domestic platform using OpenCL, improves GPU floating-point computing performance by GPU parallel computing, and tests the floating-point computing performance, which comprises:
installing the drivers related to the OpenCL interface and checking whether the related programs have been installed successfully;
compiling and installing the versions of MPICH and OpenCL that correspond to the domestic platform architecture, and building a floating-point performance testing tool library;
and using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU by running the executable file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211370874.7A (CN115641249A) | 2022-11-03 | 2022-11-03 | Performance optimization method based on domestic platform GPU |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211370874.7A (CN115641249A) | 2022-11-03 | 2022-11-03 | Performance optimization method based on domestic platform GPU |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115641249A | 2023-01-24 |
Family
ID=84947119
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211370874.7A (CN115641249A, pending) | 2022-11-03 | 2022-11-03 | Performance optimization method based on domestic platform GPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115641249A |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117910525A * | 2024-01-19 | 2024-04-19 | 上海算法创新研究院 | Large model conversion and training system based on domestic GPU deep learning |
- 2022-11-03: application CN202211370874.7A filed in CN, published as CN115641249A (status: pending)
Similar Documents
Publication | Title |
---|---|
CN110096338B | Intelligent contract execution method, device, equipment and medium |
US10942716B1 | Dynamic computational acceleration using a heterogeneous hardware infrastructure |
Sun et al. | Hetero-mark, a benchmark suite for CPU-GPU collaborative computing |
US11354159B2 | Method, a device, and a computer program product for determining a resource required for executing a code segment |
US9721092B2 | Monitoring an application in a process virtual machine |
US10768916B2 | Dynamic generation of CPU instructions and use of the CPU instructions in generated code for a softcore processor |
US20160085546A1 | Source Code Separation and Generation for Heterogeneous Central Processing Unit (CPU) Computational Devices |
CN109032706A | Intelligent contract execution method, apparatus, equipment and storage medium |
US20080216064A1 | Method, Architecture and Software of Meta-Operating System, Operating Systems and Applications For Parallel Computing Platforms |
JP2015524126A | Adaptively portable library |
WO2013192236A1 | Profiling application code to identify code portions for fpga implementation |
CN111625289B | Method and device for quickly starting application program and electronic equipment |
CN111563253B | Intelligent contract operation method, device, equipment and storage medium |
CN112394938A | Method and device for configuring heterogeneous components in accelerator |
CN115641249A | Performance optimization method based on domestic platform GPU |
CN117546139A | Deterministic replay of multi-threaded traces on a multi-threaded processor |
US10620916B2 | Read-only communication operator |
US20190324782A1 | Class splitting in object-oriented environments |
Kjorveziroski et al. | WebAssembly as an enabler for next generation serverless computing |
US10620980B2 | Techniques for native runtime of hypertext markup language graphics content |
CN111176663A | Data processing method, device and equipment of application program and storage medium |
CN116051031A | Project scheduling system, medium and electronic equipment |
CN115390986A | Intelligent contract parallel execution system based on state cryptographic chip |
US11061703B2 | Managed runtime data marshaling for native code access using a thread local native buffer |
CN114153433B | Method for carrying out operator acceleration by using OCaml functional language to call GPU |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||