CN106709861B - Shader driver static reconstruction method - Google Patents
- Publication number
- CN106709861B (application CN201611140690.6A)
- Authority
- CN
- China
- Prior art keywords
- module
- optimization
- atomic
- shader
- program
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Image Generation (AREA)
- Devices For Executing Special Programs (AREA)
Abstract
The invention belongs to the field of computer graphics and relates to a static reconstruction method for a shader driver, comprising the following steps: the driver atomic segment division module (1) divides the driver code into atomic segments and sends them to the atomic segment program reconstruction module (3); the atomic segment program reconstruction module (3) extracts the atomic segments corresponding to the configured function parameters, reconstructs them into the required software code, and sends that code to the instruction optimization module (4); the instruction optimization module (4) optimizes the received software code and sends the optimized driver to the machine code generation module (5); the machine code generation module (5) generates the corresponding machine code. By dividing the shader driver code into atomic segments and statically reconstructing only the code required by the scenarios the user configures, the method eliminates unused code, optimizes the shader driver, and improves the running performance of the shader.
Description
Technical Field
The invention belongs to the field of computer graphics and relates in particular to a static reconstruction method for a shader driver.
Background
The shader driver is a core part of the graphics processor, and its running efficiency directly determines the performance of the graphics processor. Most existing graphics processors are implemented as large-scale programmable shader arrays whose drivers have been neither divided into modules nor optimized accordingly, so the shader driver is complex and redundant and becomes a bottleneck for graphics processor performance.
Disclosure of Invention
The purpose of the invention is:
The invention provides a static reconstruction method for a shader driver that optimizes the shader driver and thereby improves the performance of the graphics processor.
The solution of the invention is:
A static reconstruction method for a shader driver, comprising:
step 1, the driver atomic segment division module (1) divides the driver code into its most basic atomic segments and sends the generated atomic segments to the atomic segment program reconstruction module (3);
step 2, in the user function configuration module (2), the user statically specifies the function parameters to be used;
step 3, the atomic segment program reconstruction module (3) extracts, from the atomic segments produced by the driver atomic segment division module (1), those corresponding to the function parameters configured in the user function configuration module (2), reconstructs them into the required software code, and sends the software code to the instruction optimization module (4);
step 4, the instruction optimization module (4) receives the software code sent by the atomic segment program reconstruction module (3), performs data dependency optimization and structural dependency optimization, and sends the optimized driver to the machine code generation module (5);
step 5, the machine code generation module (5) receives the driver from the instruction optimization module (4) and generates the corresponding machine code.
In step 4, data dependency optimization in the instruction optimization module (4) means that adjacent instructions are free of write-after-write, read-after-write, and write-after-read dependencies, so that under a multi-issue mechanism they can be executed at the same time.
In step 4, structural dependency optimization in the instruction optimization module (4) means that multiple adjacent instructions can be executed at the same time in different arithmetic units.
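Steps 1 to 3 can be illustrated with a minimal Python sketch. The patent does not specify how atomic segments are represented or matched to function parameters, so everything here is an assumption made for illustration: the `AtomicSegment` record, the `feature` tag attached to each segment, the toy `DRIVER` contents, and the function names `divide_driver` and `reconstruct` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicSegment:
    name: str      # hypothetical label for the segment
    feature: str   # function parameter that requires this segment (assumed tagging scheme)
    code: tuple    # instructions belonging to the segment


def divide_driver(driver):
    """Step 1 (sketch): turn (feature, name, instructions) records into atomic segments."""
    return [AtomicSegment(name, feature, tuple(instrs)) for feature, name, instrs in driver]


def reconstruct(segments, enabled_features):
    """Step 3 (sketch): keep only segments whose feature is enabled, in original order."""
    program = []
    for seg in segments:
        if seg.feature in enabled_features:
            program.extend(seg.code)
    return program


# Toy driver, invented for this example only.
DRIVER = [
    ("core", "init", ["load r0, ctx", "mov r1, 0"]),
    ("fog", "fog_setup", ["load r2, fog_params", "mul r3, r2, r0"]),
    ("lighting", "light_setup", ["load r4, light0", "add r5, r4, r0"]),
    ("core", "dispatch", ["store r1, out"]),
]

segments = divide_driver(DRIVER)
user_config = {"core", "lighting"}  # Step 2 (sketch): statically chosen function parameters
program = reconstruct(segments, user_config)
print(program)
```

With this configuration the fog segment is never emitted, which is the sense in which static reconstruction removes code the configured scenario does not need.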
The invention has the following advantages: the shader driver static reconstruction method divides the shader driver into atomic segments and extracts the atomic segments corresponding to the function parameters statically configured by the user to complete static reconstruction, thereby eliminating redundant code, optimizing the shader driver, and improving the performance of the graphics processor.
Drawings
FIG. 1 is a block diagram of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The technical solution of the present invention is further described in detail with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, a static reconstruction method for a shader driver according to an embodiment of the present invention includes:
step 1, the driver atomic segment division module (1) divides the driver code into its most basic atomic segments and sends the generated atomic segments to the atomic segment program reconstruction module (3);
step 2, in the user function configuration module (2), the user statically specifies the function parameters to be used;
step 3, the atomic segment program reconstruction module (3) extracts, from the atomic segments produced by the driver atomic segment division module (1), those corresponding to the function parameters configured in the user function configuration module (2), reconstructs them into the required software code, and sends the software code to the instruction optimization module (4);
step 4, the instruction optimization module (4) receives the software code sent by the atomic segment program reconstruction module (3), performs data dependency optimization and structural dependency optimization, and sends the optimized driver to the machine code generation module (5);
step 5, the machine code generation module (5) receives the driver from the instruction optimization module (4) and generates the corresponding machine code.
In step 4, data dependency optimization in the instruction optimization module (4) means that adjacent instructions are free of write-after-write, read-after-write, and write-after-read dependencies, so that under a multi-issue mechanism they can be executed at the same time.
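The three dependency types named above correspond to the standard WAW, RAW, and WAR data hazards. The following Python sketch shows one way to test whether two adjacent instructions are free of them. The `Instr` record with a single destination register and the function names are assumptions for illustration; the patent does not describe an instruction format or a checking procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str           # register written (assumed: exactly one destination)
    srcs: tuple = ()   # registers read


def data_hazards(first, second):
    """Return the hazards that prevent `second` from issuing together with `first`."""
    hazards = set()
    if first.dst in second.srcs:
        hazards.add("RAW")  # second reads a register that first writes
    if second.dst in first.srcs:
        hazards.add("WAR")  # second overwrites a register that first still reads
    if first.dst == second.dst:
        hazards.add("WAW")  # both write the same register
    return hazards


def can_issue_together(first, second):
    """True when the pair has no RAW, WAR, or WAW dependency."""
    return not data_hazards(first, second)


a = Instr("add", "r1", ("r2", "r3"))
b = Instr("mul", "r4", ("r1", "r5"))   # reads r1, written by a
c = Instr("sub", "r6", ("r7", "r8"))   # independent of a
d = Instr("mov", "r2", ("r9",))        # writes r2, read by a
e = Instr("mov", "r1", ("r9",))        # writes r1, also written by a

print(data_hazards(a, b), data_hazards(a, d), data_hazards(a, e), can_issue_together(a, c))
```

An optimizer in the spirit of step 4 would try to arrange the code so that adjacent pairs look like `a` and `c` rather than `a` and `b`.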
In step 4, structural dependency optimization in the instruction optimization module (4) means that multiple adjacent instructions can be executed at the same time in different arithmetic units.
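As a rough illustration of how both conditions combine, the sketch below greedily packs consecutive instructions into issue groups. An instruction joins the current group only if it uses a different arithmetic unit from every instruction already in the group and has no data dependency on any of them. This in-order greedy grouping is an assumed stand-in, not the patent's algorithm, and the `unit` names and sample program are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    unit: str          # arithmetic unit the instruction needs (hypothetical names)
    dst: str           # register written
    srcs: tuple = ()   # registers read


def conflicts(earlier, later):
    """True if `later` cannot issue in the same cycle as `earlier`."""
    same_unit = earlier.unit == later.unit   # structural conflict
    raw = earlier.dst in later.srcs          # read after write
    war = later.dst in earlier.srcs          # write after read
    waw = earlier.dst == later.dst           # write after write
    return same_unit or raw or war or waw


def group_for_issue(program):
    """Pack instructions, in program order, into groups that can issue together."""
    groups, current = [], []
    for instr in program:
        if any(conflicts(prev, instr) for prev in current):
            groups.append(current)   # close the group at the first conflict
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("add", "alu", "r1", ("r2", "r3")),
    Instr("load", "mem", "r4", ("r5",)),
    Instr("mul", "mul", "r6", ("r1", "r4")),   # depends on add and load
    Instr("sub", "alu", "r7", ("r8", "r9")),
]

groups = group_for_issue(program)
print([[i.op for i in g] for g in groups])
```

Because the grouping never reorders instructions, program semantics are preserved; a real optimizer would additionally reorder independent instructions to produce larger groups.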
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (1)
1. A static reconstruction method for a shader driver, comprising:
step 1, the driver atomic segment division module (1) divides the driver code into its most basic atomic segments and sends the generated atomic segments to the atomic segment program reconstruction module (3);
step 2, in the user function configuration module (2), the user statically specifies the function parameters to be used;
step 3, the atomic segment program reconstruction module (3) extracts, from the atomic segments produced by the driver atomic segment division module (1), those corresponding to the function parameters configured in the user function configuration module (2), reconstructs them into the required software code, and sends the software code to the instruction optimization module (4);
step 4, the instruction optimization module (4) receives the software code sent by the atomic segment program reconstruction module (3), performs data dependency optimization and structural dependency optimization, and sends the optimized driver to the machine code generation module (5);
step 5, the machine code generation module (5) receives the driver from the instruction optimization module (4) and generates the corresponding machine code;
wherein, in step 4, data dependency optimization in the instruction optimization module (4) means that adjacent instructions are free of write-after-write, read-after-write, and write-after-read dependencies, so that under a multi-issue mechanism they can be executed at the same time;
and wherein, in step 4, structural dependency optimization in the instruction optimization module (4) means that multiple adjacent instructions can be executed at the same time in different arithmetic units.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611140690.6A CN106709861B (en) | 2016-12-12 | 2016-12-12 | Shader driver static reconstruction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611140690.6A CN106709861B (en) | 2016-12-12 | 2016-12-12 | Shader driver static reconstruction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106709861A (en) | 2017-05-24 |
CN106709861B (en) | 2020-08-11 |
Family
ID=58936875
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611140690.6A Active CN106709861B (en) | Shader driver static reconstruction method | 2016-12-12 | 2016-12-12 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106709861B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800088B (en) * | 2018-11-14 | 2023-06-20 | 西安翔腾微电子科技有限公司 | Training-based GPU configuration management method and device, storage medium and GPU |
CN109726816A (en) * | 2018-12-12 | 2019-05-07 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of assembly level shader program chains optimization method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8595439B1 (en) * | 2007-09-28 | 2013-11-26 | The Mathworks, Inc. | Optimization of cache configuration for application design |
CN105549932A (en) * | 2015-12-11 | 2016-05-04 | 中国航空工业集团公司西安航空计算技术研究所 | Graphic processor host driver software structure |
CN105574807A (en) * | 2015-12-11 | 2016-05-11 | 中国航空工业集团公司西安航空计算技术研究所 | Development platform of programmable shader embedded into graphics processor |
- 2016-12-12: CN application CN201611140690.6A filed (CN106709861B, status Active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8595439B1 (en) * | 2007-09-28 | 2013-11-26 | The Mathworks, Inc. | Optimization of cache configuration for application design |
CN105549932A (en) * | 2015-12-11 | 2016-05-04 | 中国航空工业集团公司西安航空计算技术研究所 | Graphic processor host driver software structure |
CN105574807A (en) * | 2015-12-11 | 2016-05-11 | 中国航空工业集团公司西安航空计算技术研究所 | Development platform of programmable shader embedded into graphics processor |
Non-Patent Citations (1)
Title |
---|
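Design of the MIGPU-9 multi-core interactive graphics processor; Deng Junyong et al.; Journal of Computer-Aided Design & Computer Graphics; 2014-09-30; Vol. 26 (No. 9); 1468-1478 * |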
Also Published As
Publication number | Publication date |
---|---|
CN106709861A (en) | 2017-05-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fuller et al. | Computing performance: Game over or next level? | |
CN110032395B (en) | Unified register file for improving resource utilization | |
WO2013123405A1 (en) | Profiling and sequencing operators executable in an emulated computing system | |
CN106709861B (en) | Shader driver static reconstruction method | |
DE102021104561A1 (en) | ASYNCHRONOUS DATA MOVEMENT PIPELINE | |
CN105700956A (en) | Distributed job processing method and system | |
Yadav et al. | Performance analysis for Android runtime environment | |
Yamashina et al. | Proposal of ROS-compliant FPGA component for low-power robotic systems | |
US20150227350A1 (en) | Multi-dimensional, multi-configuration compilation phase output visualization technique | |
Greengard | GPUs reshape computing | |
CN106683033B (en) | Out-of-order OpenGL interface processing method | |
EP3752914B1 (en) | Techniques for native runtime of hypertext markup language graphics content | |
Patchett et al. | Parallel multi-layer ghost cell generation for distributed unstructured grids | |
US20220197615A1 (en) | Data parallel programming task graph optimization through device telemetry | |
CN118043773A (en) | Operating on matrix operands without restriction of storage locations of the operands in memory | |
CN116521254A (en) | Graph-based memory storage | |
CN112799724B (en) | Stable control device strategy table analysis and calculation method and device | |
DE112022002258T5 (en) | TENSOR MODIFICATION BASED ON RESOURCE PROCESSING | |
DE112021003991T5 (en) | TECHNIQUES FOR GENERATION OF INTERPOLATED VIDEO IMAGES | |
CN106708594B (en) | Implementation method of hierarchical OpenGL runtime compiling software | |
CN105867847A (en) | Memory access control method, device and system | |
CN107918958B (en) | Visualization and customizable three-dimensional rendering system and method | |
CN109671013B (en) | High-performance graphic instruction storage and distribution method supporting multiple GPUs | |
CN116821008B (en) | Processing device with improved cache hit rate and cache device thereof | |
Hu et al. | Hardware/software co-design for high performance computing: Challenges and opportunities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||