CN115641249A - Performance optimization method based on domestic platform GPU - Google Patents

Performance optimization method based on domestic platform GPU

Info

Publication number
CN115641249A
Authority
CN
China
Prior art keywords
gpu
performance
layer
computing
floating point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211370874.7A
Other languages
Chinese (zh)
Inventor
李艳
吴登勇
孙志杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Chaoyue Shentai Information Technology Co Ltd
Original Assignee
Xian Chaoyue Shentai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Chaoyue Shentai Information Technology Co Ltd
Priority to CN202211370874.7A
Publication of CN115641249A
Legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a performance optimization method based on a domestic platform GPU and relates to the technical field of GPU optimization. All OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU. Based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer: the GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer, the task management layer integrates the data and transmits it to the CPU layer, and the CPU layer performs the overall operation on the data and displays it. OpenCL is compiled and ported on the domestic platform, GPU parallel computing is used to improve GPU floating-point computing performance, and the floating-point computing performance is tested.

Description

Performance optimization method based on domestic platform GPU
Technical Field
The invention relates to the technical field of GPU optimization, and in particular to a performance optimization method based on a domestic platform GPU.
Background
In a networked environment, graphics processing is becoming increasingly important in modern computers, and a dedicated graphics processor is needed to take on display tasks and to meet the demand for high-performance GPUs in fields such as aerospace, navigation, satellite data processing and biomedical research. A high-performance GPU that is independently controllable can avoid potential security hazards such as backdoors and vulnerabilities, safeguard the security of information systems in China, and make national defense information systems independently controllable.
Current independently developed GPUs function normally, but their performance still has room for improvement, particularly in 2D/3D graphics processing capability, floating-point computing capability and display capability.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a performance optimization method based on a domestic platform GPU, which optimizes the performance of the domestic GPU.
The specific scheme provided by the invention is as follows:
The invention provides a performance optimization method based on a domestic platform GPU. All OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU.
Based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer. The GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer; the task management layer integrates the data and transmits it to the CPU layer; and the CPU layer performs the overall operation on the data and displays it.
OpenCL is compiled and ported on the domestic platform, GPU parallel computing is used to improve GPU floating-point computing performance, and the floating-point computing performance is tested.
Further, in the performance optimization method based on the domestic platform GPU, an interception module for OpenGL interface functions is constructed. The interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver; the OpenGL interface functions are exported to a code file through the OpenGL driver, and the code file is compiled and run according to the performance optimization requirements of the GPU.
Further, in the performance optimization method based on the domestic platform GPU, different heterogeneous modes of the heterogeneous computing architecture are set according to the computing task for different applications and task scenes, and data computation is performed through the GPU computing acceleration layer, the task management layer and the CPU layer.
Further, in the performance optimization method based on the domestic platform GPU, compiling and porting OpenCL on the domestic platform, improving GPU floating-point computing performance by using GPU parallel computing, and testing the floating-point computing performance comprises:
installing the drivers related to the OpenCL interface and checking whether the related programs are installed successfully;
compiling and installing the corresponding versions of mpich and OpenCL according to the domestic platform architecture, and building a floating-point performance testing tool library; and
using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU after running the executable file.
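The first step above (installing the OpenCL-related drivers and checking the installation) could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the tool described in the patent: it uses only the Python standard library, the function name `check_environment` is hypothetical, and the default names `OpenCL` (runtime library) and `mpicc` (the compiler wrapper that MPICH normally installs) are assumptions that may differ on a given domestic platform.

```python
import ctypes.util
import shutil


def check_environment(library="OpenCL", mpi_compiler="mpicc"):
    """Report whether an OpenCL runtime library and an MPI compiler wrapper are visible.

    Both default names are assumptions for illustration; a given platform
    may install them under different names or paths.
    """
    # find_library returns a library name or path if found, otherwise None.
    opencl_lib = ctypes.util.find_library(library)
    # shutil.which returns the full path of an executable on PATH, otherwise None.
    mpicc_path = shutil.which(mpi_compiler)
    return {
        "opencl_found": opencl_lib is not None,
        "opencl_library": opencl_lib,
        "mpi_compiler_found": mpicc_path is not None,
        "mpi_compiler_path": mpicc_path,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A check like this only confirms that the files can be located; it does not prove that the driver can actually enumerate a GPU device, which still has to be verified by running an OpenCL program.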
The invention also provides a performance optimization system based on a domestic platform GPU, which comprises an interception module, a heterogeneous operation module and a compiling test module.
The interception module intercepts all OpenGL interface functions called in an application scene, exports them to a code file, and compiles and runs the code file according to the performance optimization requirements of the GPU.
The heterogeneous operation module is divided, based on a heterogeneous computing architecture, into a CPU layer, a task management layer and a GPU computing acceleration layer. The GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer; the task management layer integrates the data and transmits it to the CPU layer; and the CPU layer performs the overall operation on the data and displays it.
The compiling test module compiles and ports OpenCL on the domestic platform, uses GPU parallel computing to improve GPU floating-point computing performance, and tests the floating-point computing performance.
Further, in the performance optimization system based on the domestic platform GPU, an interception module for OpenGL interface functions is constructed. The interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver; the OpenGL interface functions are exported to a code file through the OpenGL driver, and the code file is compiled and run according to the performance optimization requirements of the GPU.
Further, the heterogeneous operation module in the performance optimization system based on the domestic platform GPU sets different heterogeneous modes of the heterogeneous computing architecture according to the computing task for different applications and task scenes, and performs data computation through the GPU computing acceleration layer, the task management layer and the CPU layer.
Further, in the performance optimization system based on the domestic platform GPU, the compiling test module compiling and porting OpenCL on the domestic platform, improving GPU floating-point computing performance by using GPU parallel computing, and testing the floating-point computing performance comprises:
installing the drivers related to the OpenCL interface and checking whether the related programs are installed successfully;
compiling and installing the corresponding versions of mpich and OpenCL according to the domestic platform architecture, and building a floating-point performance testing tool library; and
using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU after running the executable file.
The invention has the advantages that:
The invention provides a performance optimization method based on a domestic platform GPU, which optimizes the graphics processing performance of the GPU, accelerates multi-level GPU computation, develops a GPU floating-point computing performance test tool through the OpenCL library, optimizes the performance of the domestic GPU, and solves the problem that the domestic platform lacks a GPU computing performance test tool.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic view of the process flow of optimizing an image according to the method of the present invention.
FIG. 2 is a schematic diagram illustrating the operation acceleration process implemented by the method of the present invention.
Detailed Description
The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.
The invention provides a performance optimization method based on a domestic platform GPU. All OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU.
Based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer. The GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer; the task management layer integrates the data and transmits it to the CPU layer; and the CPU layer performs the overall operation on the data and displays it.
OpenCL is compiled and ported on the domestic platform, GPU parallel computing is used to improve GPU floating-point computing performance, and the floating-point computing performance is tested.
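The data flow between the three layers can be illustrated with a small sketch. It is not the patented implementation: a thread pool stands in for the GPU computing acceleration layer, and every function name here (`gpu_acceleration_layer`, `task_management_layer`, `cpu_layer`, `square_all`) is a hypothetical label chosen to mirror the layers named in the text.

```python
from concurrent.futures import ThreadPoolExecutor


def gpu_acceleration_layer(chunks, kernel, workers=4):
    """Stand-in for the GPU computing acceleration layer: run `kernel` on each chunk in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map preserves the order of the input chunks.
        return list(pool.map(kernel, chunks))


def task_management_layer(partial_results):
    """Integrate the partial results returned by the acceleration layer."""
    merged = []
    for part in partial_results:
        merged.extend(part)
    return merged


def cpu_layer(data):
    """Perform the overall operation on the integrated data and format it for display."""
    total = sum(data)
    return f"{len(data)} values, sum = {total}"


def square_all(chunk):
    # Hypothetical per-chunk "kernel"; a real system would launch GPU work here.
    return [x * x for x in chunk]


if __name__ == "__main__":
    chunks = [[1, 2], [3, 4], [5, 6]]
    partials = gpu_acceleration_layer(chunks, square_all)
    print(cpu_layer(task_management_layer(partials)))  # 6 values, sum = 91
```

Keeping the integration step separate from both the parallel work and the display step is what lets the task management layer change how results are merged without touching the other two layers.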
By intercepting the OpenGL interface functions and compiling them, the method provided by the invention avoids the use of software middleware and optimizes the performance of the graphics processing part. Multi-level GPU computing acceleration is performed: heterogeneous acceleration between the CPU and the GPU is designed, and communication and the distribution of parallel computation among GPUs are further improved. By compiling and porting OpenCL on the domestic platform, the invention can test the floating-point computing performance of the GPU.
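The text does not say how parallel computation is distributed among GPUs. One common scheme, shown here purely as an assumption, is contiguous block partitioning, in which each device (or MPI rank) receives a block of work items and block sizes differ by at most one. The function name `block_partition` is hypothetical.

```python
def block_partition(n_items, n_devices):
    """Split range(n_items) into n_devices contiguous blocks whose sizes differ by at most one."""
    if n_devices <= 0:
        raise ValueError("n_devices must be positive")
    base, extra = divmod(n_items, n_devices)
    blocks = []
    start = 0
    for device in range(n_devices):
        # The first `extra` devices take one additional item each.
        size = base + (1 if device < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Each tuple is a half-open (start, stop) index range for one device.
    print(block_partition(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```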
In a specific application, in some embodiments of the method of the present invention, an interception module for OpenGL interface functions is constructed. The interception module intercepts all OpenGL interface functions called in an application scene and forwards them to the corresponding OpenGL driver; the OpenGL interface functions are exported to a code file through the OpenGL driver, and the code file is compiled and run according to the performance optimization requirements of the GPU. By intercepting and compiling the OpenGL interface functions, the invention avoids the use of software middleware and optimizes the performance of the graphics processing part.
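The intercept, forward and export sequence described above can be sketched as follows. This is an illustrative model only: a real interception module would hook the native OpenGL library rather than a Python object, and the classes `RecordingDriver` and `Interceptor`, as well as the generated function name `replay`, are hypothetical. `glClearColor` and `glClear` are real OpenGL functions used here only as example call names.

```python
class RecordingDriver:
    """Hypothetical stand-in for the real OpenGL driver; it only remembers the calls it receives."""

    def __init__(self):
        self.received = []

    def call(self, name, args):
        self.received.append((name, args))


class Interceptor:
    """Intercepts calls by attribute name, records them, and forwards them to the driver."""

    def __init__(self, driver):
        self._driver = driver
        self.trace = []

    def __getattr__(self, name):
        # Invoked only for attributes not found normally, i.e. the intercepted function names.
        def forward(*args):
            self.trace.append((name, args))
            self._driver.call(name, args)
        return forward

    def export_source(self):
        """Render the recorded trace as the body of a C function, one call per line."""
        lines = ["void replay(void) {"]
        for name, args in self.trace:
            rendered = ", ".join(str(a) for a in args)
            lines.append(f"    {name}({rendered});")
        lines.append("}")
        return "\n".join(lines)


if __name__ == "__main__":
    gl = Interceptor(RecordingDriver())
    gl.glClearColor(0.0, 0.0, 0.0, 1.0)
    gl.glClear("GL_COLOR_BUFFER_BIT")
    print(gl.export_source())
```

The exported text is a plain call sequence, so it can be compiled and run on its own without the original application, which is the property the method relies on when it compiles and runs the code file.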
For different applications and task scenes, the invention also sets different heterogeneous modes of the heterogeneous computing architecture according to the computing task, and performs data computation through the GPU computing acceleration layer, the task management layer and the CPU layer. The GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer; the task management layer integrates the data and transmits it to the CPU layer; and the CPU layer performs the overall operation on the data and displays it. This optimizes the use of system resources and improves the performance and efficiency of the system. The parallel processing, pipeline processing and mixed processing acceleration modes of heterogeneous computing are implemented, which reduces processing wait time and further improves system performance.
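Of the three acceleration modes named above, pipeline processing is the one that most directly reduces waiting: while one item is being integrated, the next can already be computed. The sketch below shows a generic thread-and-queue pipeline, with each stage standing in for one layer. The patent does not describe how its pipeline is implemented, so `stage`, `run_pipeline` and the sentinel `_DONE` are assumptions made for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Apply func to every item from inbox and pass the result on to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Run items through the stage functions in order, one thread per stage."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)
    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Stage 1: compute, stage 2: integrate (here, add an offset), stage 3: format for display.
    print(run_pipeline([1, 2, 3], [lambda x: x * x, lambda x: x + 1, str]))
```

Because each stage has a single thread and the queues are first-in first-out, results come out in input order. The sketch omits error handling: a stage function that raises would stall the pipeline.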
In the method, OpenCL takes the computation that needs to be performed on the GPU and schedules the computing resources so that it is carried out in parallel, which improves operating efficiency and floating-point computing performance. When the GPU performs parallel computation, the MPI parallel computing library can be used to write the parallel program. When the GPU floating-point computing performance environment is built, both the MPI parallel computing library and the OpenCL library are used. Further, compiling and porting OpenCL on the domestic platform, improving GPU floating-point computing performance by using GPU parallel computing, and testing the floating-point computing performance comprises the following steps:
installing the drivers related to the OpenCL interface and checking whether the related programs are installed successfully;
compiling and installing the corresponding versions of mpich and OpenCL according to the domestic platform architecture, and building a floating-point performance testing tool library; and
using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU after running the executable file.
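The final step reports a floating-point computing performance figure. The arithmetic behind such a figure is simply the number of floating-point operations divided by the elapsed time. The sketch below shows that measurement on a CPU-side reference workload in pure Python; it is not the patent's GPU testing tool, a real test would time an OpenCL kernel instead, and the names `gflops`, `saxpy` and `benchmark` are hypothetical.

```python
import time


def gflops(flop_count, seconds):
    """Convert a floating-point operation count and an elapsed time into GFLOPS."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return flop_count / seconds / 1e9


def saxpy(a, x, y):
    """Reference workload: a*x[i] + y[i], i.e. 2 floating-point operations per element."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def benchmark(n=100_000, repeats=5):
    """Time the workload several times and report the best rate observed."""
    x = [1.0] * n
    y = [2.0] * n
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        saxpy(3.0, x, y)
        best = min(best, time.perf_counter() - start)
    return gflops(2 * n, best)


if __name__ == "__main__":
    print(f"{benchmark():.4f} GFLOPS (pure-Python reference, not a GPU figure)")
```

Taking the best of several runs reduces the influence of one-off delays such as scheduling noise, which is a common convention in throughput benchmarks.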
The invention also provides a performance optimization system based on a domestic platform GPU, which comprises an interception module, a heterogeneous operation module and a compiling test module.
The interception module intercepts all OpenGL interface functions called in an application scene, exports them to a code file, and compiles and runs the code file according to the performance optimization requirements of the GPU.
The heterogeneous operation module is divided, based on a heterogeneous computing architecture, into a CPU layer, a task management layer and a GPU computing acceleration layer. The GPU computing acceleration layer performs GPU operations in parallel and transmits the resulting data to the task management layer; the task management layer integrates the data and transmits it to the CPU layer; and the CPU layer performs the overall operation on the data and displays it.
The compiling test module compiles and ports OpenCL on the domestic platform, uses GPU parallel computing to improve GPU floating-point computing performance, and tests the floating-point computing performance.
The information interaction, execution process and other contents between the modules in the system are based on the same concept as the method embodiment of the present invention, and specific contents can be referred to the description in the method embodiment of the present invention, and are not described herein again.
Similarly, the system optimizes the graphics processing performance of the GPU, accelerates multi-level GPU computation, develops a GPU floating-point computing performance test tool through the OpenCL library, optimizes the performance of the domestic GPU, and solves the problem that the domestic platform lacks a GPU computing performance test tool.
It should be noted that not all steps and modules in the above flows and system structures are necessary, and some steps or modules may be omitted according to actual needs. The execution sequence of the steps is not fixed and can be adjusted according to the needs. The system structures described in the above embodiments may be physical structures or logical structures, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities separately, or some components may be implemented together in a plurality of independent devices.
The above embodiments are merely preferred embodiments given to fully illustrate the present invention, and the scope of protection of the present invention is not limited to them. Equivalent substitutions or changes made by those skilled in the art on the basis of the present invention all fall within the scope of protection of the present invention. The scope of protection of the invention is defined by the claims.

Claims (8)

1. A performance optimization method based on a domestic platform GPU, characterized in that all OpenGL interface functions called in an application scene are intercepted and exported to a code file, and the code file is compiled and run according to the performance optimization requirements of the GPU;
based on a heterogeneous computing architecture, the system is divided into a CPU layer, a task management layer and a GPU computing acceleration layer, GPU operations are performed in parallel by the GPU computing acceleration layer, the data resulting from the GPU operations is transmitted to the task management layer, the task management layer integrates the data and transmits it to the CPU layer, and the CPU layer performs the overall operation on the data and displays it; and
OpenCL is compiled and ported on the domestic platform, GPU parallel computing is used to improve GPU floating-point computing performance, and the floating-point computing performance is tested.
2. The performance optimization method based on the domestic platform GPU of claim 1, characterized in that an interception module for OpenGL interface functions is constructed, all OpenGL interface functions called in an application scene are intercepted by the interception module and forwarded to the corresponding OpenGL driver, the OpenGL interface functions are exported to a code file through the OpenGL driver, and the code file is compiled and run according to the performance optimization requirements of the GPU.
3. The performance optimization method based on the domestic platform GPU of claim 1, wherein different heterogeneous modes of the heterogeneous computing architecture are set according to the computing task for different applications and task scenes, and data computation is performed through the GPU computing acceleration layer, the task management layer and the CPU layer.
4. The performance optimization method based on the domestic platform GPU of claim 1, wherein compiling and porting OpenCL on the domestic platform, improving GPU floating-point computing performance by using GPU parallel computing, and testing the floating-point computing performance comprises:
installing the drivers related to the OpenCL interface and checking whether the related programs are installed successfully;
compiling and installing the corresponding versions of mpich and OpenCL according to the domestic platform architecture, and building a floating-point performance testing tool library; and
using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU after running the executable file.
5. A performance optimization system based on a domestic platform GPU, characterized by comprising an interception module, a heterogeneous operation module and a compiling test module, wherein
the interception module intercepts all OpenGL interface functions called in an application scene, exports them to a code file, and compiles and runs the code file according to the performance optimization requirements of the GPU;
the heterogeneous operation module is divided, based on a heterogeneous computing architecture, into a CPU layer, a task management layer and a GPU computing acceleration layer, GPU operations are performed in parallel by the GPU computing acceleration layer, the data resulting from the GPU operations is transmitted to the task management layer, the task management layer integrates the data and transmits it to the CPU layer, and the CPU layer performs the overall operation on the data and displays it; and
the compiling test module compiles and ports OpenCL on the domestic platform, uses GPU parallel computing to improve GPU floating-point computing performance, and tests the floating-point computing performance.
6. The performance optimization system based on the domestic platform GPU of claim 5, wherein an interception module for OpenGL interface functions is constructed, all OpenGL interface functions called in an application scene are intercepted by the interception module and forwarded to the corresponding OpenGL driver, the OpenGL interface functions are exported to a code file through the OpenGL driver, and the code file is compiled and run according to the performance optimization requirements of the GPU.
7. The performance optimization system based on the domestic platform GPU of claim 5, wherein the heterogeneous operation module sets different heterogeneous modes of the heterogeneous computing architecture according to the computing task for different applications and task scenes, and performs data computation through the GPU computing acceleration layer, the task management layer and the CPU layer.
8. The performance optimization system based on the domestic platform GPU of claim 5, wherein the compiling test module compiling and porting OpenCL on the domestic platform, improving GPU floating-point computing performance by using GPU parallel computing, and testing the floating-point computing performance comprises:
installing the drivers related to the OpenCL interface and checking whether the related programs are installed successfully;
compiling and installing the corresponding versions of mpich and OpenCL according to the domestic platform architecture, and building a floating-point performance testing tool library; and
using a GPU testing tool to call the OpenCL library files in the successfully installed floating-point performance testing tool library, starting the test after successful compilation, and testing the computing performance of the GPU after running the executable file.
CN202211370874.7A 2022-11-03 2022-11-03 Performance optimization method based on domestic platform GPU Pending CN115641249A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211370874.7A CN115641249A (en) 2022-11-03 2022-11-03 Performance optimization method based on domestic platform GPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211370874.7A CN115641249A (en) 2022-11-03 2022-11-03 Performance optimization method based on domestic platform GPU

Publications (1)

Publication Number Publication Date
CN115641249A (en) 2023-01-24

Family

ID=84947119

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211370874.7A Pending CN115641249A (en) 2022-11-03 2022-11-03 Performance optimization method based on domestic platform GPU

Country Status (1)

Country Link
CN (1) CN115641249A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117910525A (en) * 2024-01-19 2024-04-19 上海算法创新研究院 Large model conversion and training system based on domestic GPU deep learning

Similar Documents

Publication Publication Date Title
CN110096338B (en) Intelligent contract execution method, device, equipment and medium
US10942716B1 (en) Dynamic computational acceleration using a heterogeneous hardware infrastructure
Sun et al. Hetero-mark, a benchmark suite for CPU-GPU collaborative computing
US11354159B2 (en) Method, a device, and a computer program product for determining a resource required for executing a code segment
US9721092B2 (en) Monitoring an application in a process virtual machine
US10768916B2 (en) Dynamic generation of CPU instructions and use of the CPU instructions in generated code for a softcore processor
US20160085546A1 (en) Source Code Separation and Generation for Heterogeneous Central Processing Unit (CPU) Computational Devices
CN109032706A (en) Intelligent contract executes method, apparatus, equipment and storage medium
US20080216064A1 (en) Method, Architecture and Software of Meta-Operating System, Operating Systems and Applications For Parallel Computing Platforms
JP2015524126A (en) Adaptively portable library
WO2013192236A1 (en) Profiling application code to identify code portions for fpga implementation
CN111625289B (en) Method and device for quickly starting application program and electronic equipment
CN111563253B (en) Intelligent contract operation method, device, equipment and storage medium
CN112394938A (en) Method and device for configuring heterogeneous components in accelerator
CN115641249A (en) Performance optimization method based on domestic platform GPU
CN117546139A (en) Deterministic replay of multi-threaded traces on a multi-threaded processor
US10620916B2 (en) Read-only communication operator
US20190324782A1 (en) Class splitting in object-oriented environments
Kjorveziroski et al. WebAssembly as an enabler for next generation serverless computing
US10620980B2 (en) Techniques for native runtime of hypertext markup language graphics content
CN111176663A (en) Data processing method, device and equipment of application program and storage medium
CN116051031A (en) Project scheduling system, medium and electronic equipment
CN115390986A (en) Intelligent contract parallel execution system based on state cryptographic chip
US11061703B2 (en) Managed runtime data marshaling for native code access using a thread local native buffer
CN114153433B (en) Method for carrying out operator acceleration by using OCaml functional language to call GPU

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination