CN113721899A - GPDSP-oriented lightweight efficient assembly code programming method and system - Google Patents


Info

Publication number
CN113721899A
Authority
CN
China
Prior art keywords
instruction
assembly code
beat
gpdsp
register
Prior art date
Legal status
Granted
Application number
CN202111028130.2A
Other languages
Chinese (zh)
Other versions
CN113721899B (en)
Inventor
陈照云
文梅
马奕民
时洋
孔玺畅
扈啸
王耀华
孙海燕
邓灿
赵宵磊
Current Assignee
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN202111028130.2A
Publication of CN113721899A
Application granted
Publication of CN113721899B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/314 Parallel programming languages
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/441 Register allocation; Assignment of physical memory space to logical memory space
    • G06F8/445 Exploiting fine grain parallelism, i.e. parallelism at instruction level
    • G06F8/70 Software maintenance or management
    • G06F8/76 Adapting program code to run in a different environment; Porting
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a GPDSP-oriented lightweight, high-efficiency assembly code programming method and system. The method comprises: inputting serial assembly code, in which instructions are written one after another, the order of the instructions represents the order in which they execute and take effect, and no instruction carries a valid parallel symbol or functional unit information; automatically assigning a functional unit to each instruction in the serial assembly code and determining the instruction's earliest start time and the beats during which it occupies the read and write ports of the assigned functional unit; and performing parallel instruction scheduling on the instructions whose functional units and start clock cycles have been determined, together with automatic register allocation, to obtain the final assembly code. The invention greatly reduces the difficulty of assembly code development, improves development efficiency, and shortens the development cycle; it is applicable to a variety of GPDSP platform architectures, supports cross-platform use, and offers good portability.

Description

GPDSP-oriented lightweight efficient assembly code programming method and system
Technical Field
The invention relates to the technical field of assembly code programming, and in particular to a GPDSP-oriented lightweight, high-efficiency assembly code programming method and system.
Background
GPDSPs (general-purpose digital signal processors) are widely used in fields such as signal processing, wireless communication, and radar image processing, and the continual growth of application requirements places new demands on their computing performance. To meet new application scenarios and computing requirements, mainstream GPDSPs generally support both fixed-point and floating-point computation, provide vector processing, Single Instruction Multiple Data (SIMD) support, and a Very Long Instruction Word (VLIW) architecture, and exploit the parallelism of upper-layer applications as fully as possible to improve computing efficiency. The domestically developed FT-Matrix series DSPs and the DSPs of Texas Instruments are typical representatives of GPDSPs.
To improve hardware utilization and maximize application performance, core application code segments and high-performance algorithm libraries on mainstream GPDSPs are still developed in assembly language. Assembly language is close to the underlying architecture and can express the intended operations directly and efficiently. Because GPDSPs use a VLIW architecture, their assembly instructions usually have to be arranged statically, that is, all instructions are generated and arranged at compile time. When assembly code is written by hand, the constraints of static arrangement are very demanding: the programmer must take into account instruction beats, delay-slot hiding, register allocation, read/write port conflicts, parallel instruction scheduling, and more. Typically, developing an efficient assembly algorithm library takes a skilled programmer at least one month, and complex applications take longer. Moreover, once manually optimized and scheduled, assembly code is tightly coupled to the instruction set and the chip's hardware structure and is therefore not portable at all. Even within the same GPDSP product series, a change to the execution beats or the assigned functional unit of a single instruction can affect the entire assembly code and may even require it to be completely rewritten.
In summary, when writing assembly code for a VLIW-based GPDSP, a programmer must handle many programming constraints, including instruction beats, delay-slot hiding, register allocation, read/write port conflicts, and parallel instruction scheduling. These constraints make it costly in labor and time to produce an efficient core code segment or algorithm library, and the resulting assembly code targets only one specific architecture: once the instruction set information changes, the code cannot be migrated directly.
Disclosure of Invention
Technical problem to be solved: the invention provides a GPDSP-oriented lightweight, high-efficiency assembly code programming method and system that greatly reduce the difficulty of assembly code development, improve development efficiency, and shorten the development cycle, while being applicable to a variety of GPDSP platform architectures, supporting cross-platform use, and offering good portability.
To solve the above technical problem, the invention adopts the following technical solution:
A GPDSP-oriented lightweight, high-efficiency assembly code programming method, comprising the following steps:
1) inputting serial assembly code, in which instructions are written one after another, the order of the instructions represents the order in which they execute and take effect, and no instruction contains a valid parallel symbol or functional unit information;
2) automatically assigning a functional unit to each instruction in the serial assembly code, and determining the earliest start time of the instruction and the beats during which it occupies the read port and the write port of the assigned functional unit;
3) performing parallel instruction scheduling on the instructions whose functional units and start clock cycles have been determined, to obtain the final assembly code.
Optionally, step 2) comprises:
2.1) taking the next instruction to be scheduled from the serial assembly code, in order of appearance, as the current instruction x; if all instructions in the serial assembly code have been traversed, jumping to step 3);
2.2) querying, from a preset instruction set description file, the read/write port occupation beats j of the current instruction x, its execution beats t, and the set Y of functional units that can execute it; and initializing the beat i;
2.3) determining whether beat i exceeds the preset maximum execution cycle limit; if so, determining that no valid placement can be found for the current instruction x and jumping to step 2.1); otherwise, proceeding to the next step;
2.4) taking the next functional unit from the set Y as the current functional unit y_k; if all functional units in Y have been traversed, incrementing beat i by 1 and jumping to step 2.3) (the traversal of Y then restarts from its first unit); otherwise, proceeding to the next step;
2.5) determining that the current instruction x executes from beat i to beat i + t - 1, occupies the read port of the current functional unit y_k from beat i to beat i + j - 1, and occupies the write port of y_k from beat i + t - j to beat i + t - 1;
2.6) determining whether the read-port and write-port occupation beats of the current instruction x on y_k conflict with those of previous instructions; if there is no conflict, a valid placement for x has been found: its functional unit is determined to be y_k, its earliest start time is beat i, and its read- and write-port occupation beats on y_k are recorded; otherwise, jumping to step 2.4) to continue traversing the remaining functional units.
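Steps 2.1) to 2.6) amount to a first-fit search over (beat, functional unit) pairs with per-unit port reservation. The Python sketch below illustrates that search under stated assumptions and is not the patented implementation: the `InstrInfo` record, the unit names `S1`, `S2`, `M1`, the instruction names, and the choice of 0 as the initial beat are all hypothetical, and data dependences are deliberately ignored. As in step 2.2), a single occupation value `j` is used for both ports.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrInfo:
    """Hypothetical instruction-set entry (a subset of the description-file fields)."""
    units: tuple[str, ...]  # executable functional-unit set Y
    exec_beats: int         # t
    port_beats: int         # j, used for both the read and the write port


@dataclass(frozen=True)
class Placement:
    unit: str
    start: int              # earliest start beat i
    read_beats: range       # beats i .. i+j-1
    write_beats: range      # beats i+t-j .. i+t-1


def assign_units(program: list[str], isa: dict[str, InstrInfo],
                 max_beat: int = 64) -> list[Placement | None]:
    """First-fit unit assignment following steps 2.1)-2.6); None marks a failed placement."""
    read_busy: set[tuple[str, int]] = set()   # (unit, beat) pairs holding a read port
    write_busy: set[tuple[str, int]] = set()  # (unit, beat) pairs holding a write port
    result: list[Placement | None] = []
    for name in program:                                  # step 2.1
        info = isa[name]                                  # step 2.2
        t, j = info.exec_beats, info.port_beats
        i = 0                                             # assumed initial beat
        placed = None
        while placed is None and i <= max_beat:           # step 2.3
            for unit in info.units:                       # step 2.4
                reads = range(i, i + j)                   # step 2.5
                writes = range(i + t - j, i + t)
                free = (all((unit, b) not in read_busy for b in reads)
                        and all((unit, b) not in write_busy for b in writes))
                if free:                                  # step 2.6: no conflict, record it
                    read_busy.update((unit, b) for b in reads)
                    write_busy.update((unit, b) for b in writes)
                    placed = Placement(unit, i, reads, writes)
                    break
            else:
                i += 1                                    # every unit conflicted: next beat
        result.append(placed)
    return result


EXAMPLE_ISA = {  # hypothetical entries
    "ADD": InstrInfo(units=("S1", "S2"), exec_beats=1, port_beats=1),
    "MUL": InstrInfo(units=("M1",), exec_beats=3, port_beats=1),
}

if __name__ == "__main__":
    for p in assign_units(["ADD", "ADD", "ADD", "MUL", "MUL"], EXAMPLE_ISA):
        print(p.unit, p.start)
```

With these entries the five instructions land on S1 at beat 0, S2 at 0, S1 at 1, M1 at 0 and M1 at 1: the third ADD waits one beat because both S units' read ports are taken, while the second MUL can start at beat 1 because its write-port beat (3) does not collide with the first MUL's (2).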
Optionally, each instruction entry in the preset instruction set description file contains the following fields: instruction name, functional unit, execution beats, read operands, write operands, read port occupation beats, and write port occupation beats.
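The fields are listed but no concrete file syntax is given. As a hypothetical illustration, the sketch below assumes one comma-separated entry per line, `|` between multiple values inside a field, and `#` comments; the instruction names, unit names, and operand type letters in the sample (S scalar, V vector, B base, O offset) are invented.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrSpec:
    name: str
    units: tuple[str, ...]           # functional units that can execute the instruction
    exec_beats: int                  # execution beats
    read_operands: tuple[str, ...]   # register type of each read operand
    write_operands: tuple[str, ...]  # register type of each write operand
    read_port_beats: int
    write_port_beats: int


def _split_list(field: str) -> tuple[str, ...]:
    """Split a '|'-separated field, dropping empty items."""
    return tuple(item.strip() for item in field.split("|") if item.strip())


def parse_isa(text: str) -> dict[str, InstrSpec]:
    """Parse the hypothetical description format into name -> InstrSpec."""
    specs: dict[str, InstrSpec] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()   # drop comments; skip blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 7:
            raise ValueError(f"line {lineno}: expected 7 fields, got {len(fields)}")
        name, units, beats, reads, writes, rp, wp = fields
        specs[name] = InstrSpec(name, _split_list(units), int(beats),
                                _split_list(reads), _split_list(writes), int(rp), int(wp))
    return specs


SAMPLE = """
# name, units,     exec, reads, writes, read-port, write-port
SADD,   SAU0|SAU1, 1,    S|S,   S,      1,         1
SMUL,   SMU,       3,    S|S,   S,      1,         1
VLDW,   LSU0|LSU1, 5,    B|O,   V,      1,         1
"""

if __name__ == "__main__":
    for spec in parse_isa(SAMPLE).values():
        print(spec)
```

Keeping this table outside the compiler means that retargeting to a chip with different units or beats is a data edit rather than a code change.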
Optionally, step 3) comprises: for the instructions whose functional units, earliest start times, and read/write port occupation beats have been automatically determined, packing the instructions that use different functional units in the same clock cycle into one VLIW instruction packet so that they execute concurrently, and, when outputting the code, prefixing every instruction in a VLIW packet from the second one onward with a specified parallel symbol.
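A minimal sketch of the packing and output in step 3), assuming the schedule is already known. The `||` parallel symbol, the `MNEMONIC.UNIT operands` line layout, and the `NOP` filler line for empty cycles are assumptions modeled on common VLIW assembly conventions, not details given in the text.

```python
from __future__ import annotations
from collections import defaultdict


def emit_packets(scheduled: list[tuple[int, str, str, str]],
                 parallel_symbol: str = "||", nop: str = "NOP") -> list[str]:
    """Render (cycle, unit, mnemonic, operands) tuples as VLIW packets.

    Instructions sharing a cycle form one packet; every instruction after the
    first in a packet is prefixed with the parallel symbol. Empty cycles are
    filled with a single no-op line (hypothetical mnemonic).
    """
    by_cycle: dict[int, list[tuple[str, str, str]]] = defaultdict(list)
    for cycle, unit, mnemonic, operands in scheduled:
        by_cycle[cycle].append((unit, mnemonic, operands))
    if not by_cycle:
        return []
    lines: list[str] = []
    for cycle in range(max(by_cycle) + 1):
        packet = by_cycle.get(cycle, [])
        if not packet:
            lines.append(nop)
            continue
        units = [unit for unit, _, _ in packet]
        if len(units) != len(set(units)):
            raise ValueError(f"cycle {cycle}: a functional unit is used twice")
        for k, (unit, mnemonic, operands) in enumerate(packet):
            prefix = "" if k == 0 else parallel_symbol + " "
            lines.append(f"{prefix}{mnemonic}.{unit} {operands}")
    return lines


if __name__ == "__main__":
    print("\n".join(emit_packets([
        (0, "S1", "SADD", "R1, R2, R3"),
        (0, "S2", "SADD", "R4, R5, R6"),
        (2, "M1", "SMUL", "R7, R1, R4"),
    ])))
```

The check for a repeated unit inside a cycle mirrors the packing rule: only instructions on different functional units may share a packet.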
Optionally, in the serial assembly code input in step 1), no instruction contains valid condition-field or operand-list information; instead, both the condition field and the operand list are written with register variables. Step 2) further comprises performing automatic register allocation on the register variables in the serial assembly code, where the register type assigned to each register variable is consistent with the type description of the corresponding read/write operand of the instruction in the preset instruction set description file.
Optionally, automatic register allocation comprises: reading the register variable declaration file of the platform targeted by the serial assembly code, which describes the register types of that platform and whether each register is used singly or in pairs; and replacing the register variables in each instruction with the corresponding registers based on the register types and the single or paired usage recorded in the declaration file.
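The declaration format is not specified. The sketch below assumes a hypothetical platform table (register class to name prefix and count), declaration lines of the form `<class> <single|pair> var, var`, and odd:even notation for pairs such as `R3:R2`. It binds each variable to its own register for the whole program, the simplest possible policy; an allocator that reuses registers according to live ranges would be needed in practice.

```python
from __future__ import annotations
import re

# Hypothetical platform register file: class -> (name prefix, register count).
PLATFORM = {"scalar": ("R", 8), "vector": ("VR", 8)}


def parse_decls(lines: list[str]) -> list[tuple[str, str, str]]:
    """Parse '<class> <single|pair> v1, v2' lines into (var, class, mode) triples."""
    decls = []
    for line in lines:
        cls, mode, rest = line.split(maxsplit=2)
        if mode not in ("single", "pair"):
            raise ValueError(f"unknown usage mode {mode!r}")
        decls.extend((var.strip(), cls, mode) for var in rest.split(","))
    return decls


def bind_registers(decls, platform=PLATFORM) -> dict[str, str]:
    """Give every variable a distinct register (or even-aligned pair) of its class."""
    next_free = {cls: 0 for cls in platform}
    mapping: dict[str, str] = {}
    for var, cls, mode in decls:
        prefix, count = platform[cls]
        k = next_free[cls]
        if mode == "pair":
            k += k % 2                      # pairs start on an even register
            if k + 1 >= count:
                raise RuntimeError(f"out of {cls} register pairs for {var}")
            mapping[var] = f"{prefix}{k + 1}:{prefix}{k}"
            next_free[cls] = k + 2
        else:
            if k >= count:
                raise RuntimeError(f"out of {cls} registers for {var}")
            mapping[var] = f"{prefix}{k}"
            next_free[cls] = k + 1
    return mapping


def substitute(code: str, mapping: dict[str, str]) -> str:
    """Replace whole-word variable names in the code with their registers."""
    if not mapping:
        return code
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)


if __name__ == "__main__":
    regs = bind_registers(parse_decls(["scalar single a, b", "scalar pair acc"]))
    print(substitute("SADD a, b, acc", regs))
```

Matching on word boundaries keeps a variable such as `a` from being substituted inside a longer identifier.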
Optionally, steps 1) to 3) are executed by an assembly code compiler, and step 3) is followed by a step in which the assembly code compiler further assembles and links the final assembly code to obtain binary code.
Optionally, steps 1) to 3) are executed by a lightweight compiler that performs assembly code optimization, and step 3) is followed by a step in which the lightweight compiler outputs the final assembly code, or sends it to an assembly code compiler that assembles and links it to obtain binary code.
In addition, this embodiment further provides a GPDSP-oriented lightweight, high-efficiency assembly code programming device, comprising a microprocessor and a memory connected to each other, wherein the microprocessor is programmed or configured to execute the steps of the above GPDSP-oriented lightweight, high-efficiency assembly code programming method.
Furthermore, this embodiment provides a computer-readable storage medium storing a computer program that is programmed or configured to execute the above GPDSP-oriented lightweight, high-efficiency assembly code programming method.
Compared with the prior art, the invention has the following advantages:
1. Users write serial assembly code without dealing with functional unit assignment or parallel scheduling of instructions, and write operands as variables instead of allocating (selecting) registers by hand. Functional unit assignment, parallel instruction scheduling, and register allocation (selection) are carried out automatically by the program. This lowers the difficulty of assembly programming, lets programmers focus on the algorithm alone, and still ensures that the generated assembly code is efficient.
2. The invention is suitable for various GPDSP platform architectures, can provide support for cross-platform architectures, and has good portability.
Drawings
FIG. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the present invention.
FIG. 2 is a detailed flowchart of step 2) in the embodiment of the present invention.
FIG. 3 is a diagram illustrating a format of an instruction set description file according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a format of an instruction according to an embodiment of the present invention.
Detailed Description
Embodiment 1:
As shown in Fig. 1, the GPDSP-oriented lightweight, high-efficiency assembly code programming method of this embodiment comprises:
1) inputting serial assembly code, in which instructions are written one after another, the order of the instructions represents the order in which they execute and take effect, and no instruction contains a valid parallel symbol or functional unit information;
2) automatically assigning a functional unit to each instruction in the serial assembly code, and determining the earliest start time of the instruction and the beats during which it occupies the read port and the write port of the assigned functional unit;
3) performing parallel instruction scheduling on the instructions whose functional units and start clock cycles have been determined, to obtain the final assembly code. This simplifies GPDSP assembly code development while ensuring the execution efficiency of the generated assembly code.
As shown in Fig. 2, step 2) comprises:
2.1) taking the next instruction to be scheduled from the serial assembly code, in order of appearance, as the current instruction x; if all instructions in the serial assembly code have been traversed, jumping to step 3);
2.2) querying, from a preset instruction set description file, the read/write port occupation beats j of the current instruction x, its execution beats t, and the set Y of functional units that can execute it; and initializing the beat i;
2.3) determining whether beat i exceeds the preset maximum execution cycle limit; if so, determining that no valid placement can be found for the current instruction x and jumping to step 2.1); otherwise, proceeding to the next step;
2.4) taking the next functional unit from the set Y as the current functional unit y_k; if all functional units in Y have been traversed, incrementing beat i by 1 and jumping to step 2.3) (the traversal of Y then restarts from its first unit); otherwise, proceeding to the next step;
2.5) determining that the current instruction x executes from beat i to beat i + t - 1, occupies the read port of the current functional unit y_k from beat i to beat i + j - 1, and occupies the write port of y_k from beat i + t - j to beat i + t - 1;
2.6) determining whether the read-port and write-port occupation beats of the current instruction x on y_k conflict with those of previous instructions; if there is no conflict, a valid placement for x has been found: its functional unit is determined to be y_k, its earliest start time is beat i, and its read- and write-port occupation beats on y_k are recorded; otherwise, jumping to step 2.4) to continue traversing the remaining functional units.
As shown in Fig. 3, each instruction entry in the preset instruction set description file contains the following fields: instruction name, functional unit, execution beats, read operands, write operands, read port occupation beats, and write port occupation beats.
In this embodiment, step 3) comprises: for the instructions whose functional units, earliest start times, and read/write port occupation beats have been automatically determined, packing the instructions that use different functional units in the same clock cycle into one VLIW instruction packet so that they execute concurrently, and, when outputting the code, prefixing every instruction in a VLIW packet from the second one onward with a specified parallel symbol.
Note that Fig. 1 also includes "register allocation", meaning that the method of this embodiment further includes an optional automatic register allocation step. As shown in Figs. 1 and 4, in the serial assembly code input in step 1) of this embodiment, no instruction contains valid condition-field or operand-list information; both the condition field and the operand list are written with register variables. Step 2) further comprises performing automatic register allocation on the register variables in the serial assembly code, where the register type assigned to each register variable is consistent with the type description of the corresponding read/write operand of the instruction in the preset instruction set description file.
In this embodiment, automatic register allocation comprises: reading the register variable declaration file of the platform targeted by the serial assembly code, which describes the register types of that platform and whether each register is used singly or in pairs; and replacing the register variables in each instruction with the corresponding registers based on the register types and the single or paired usage recorded in the declaration file.
As an optional implementation, steps 1) to 3) are executed by a lightweight compiler that performs assembly code optimization, and, provided that scheduling does not fail, step 3) is followed by a step in which the lightweight compiler outputs the final assembly code or sends it to an assembly code compiler, which assembles and links it to obtain binary code.
The GPDSP-oriented lightweight, high-efficiency assembly code programming method aims to reduce the difficulty of GPDSP assembly programming while ensuring that the generated assembly code is efficient. The programmer writes serial code, ignores instruction beats and parallel packing, and uses declared variables in place of physical registers; functional unit assignment, parallel scheduling, and register allocation are left to a dedicated compiler, which improves the productivity of assembly code development. At the same time, because instruction set information is supplied through a description file, the programming framework is decoupled from any particular architecture, which greatly widens its range of application. Achieving these goals involves three core technical problems: serial assembly writing and parsing for general-purpose GPDSPs, variable-based instruction operand writing and register replacement, and automatic generation of assembly code.
A. Serial assembly writing and parsing for general-purpose GPDSPs. Although assembly code is tedious to write, its format is fairly fixed: a line consists mainly of a label, a parallel symbol, a condition field, the instruction, the functional unit, the operand list, and a comment. Because GPDSPs are usually based on VLIW architectures, their assembly code typically has to be arranged statically. When arranging instructions, the programmer must, on the one hand, account for the exact beat at which each instruction starts executing and the beat at which its result takes effect, ordering instructions by hand so that the original dependences among them are not broken; on the other hand, the programmer must hide instruction delay slots as much as possible to improve the execution efficiency of the code. The assembly programming framework of this embodiment hides these constraints from the programmer: the programmer simply writes serial assembly code and may assume that every instruction executes and completes in one beat, while delay-slot hiding and parallel instruction scheduling are performed automatically by the compiler. Under this convention, the compiler can treat the textual order of the instructions as their order of execution and effect when parsing user code, which simplifies dependence analysis between instructions and makes the processing lighter than in a general-purpose compiler. This simplified model frees programmers from complex assembly-level optimization so that they need only ensure that the algorithm's semantics are correct, which can greatly increase code productivity.
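Because the convention makes textual order equal execution order, dependence analysis reduces to a single forward pass that tracks, for each variable, its last writer and the readers since that write. The sketch below shows this for read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW) dependences; the `Instr` record and the variable-level operands are assumptions for illustration, not the compiler's actual data structures.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: tuple[str, ...] = ()
    writes: tuple[str, ...] = ()


def dependences(program: list[Instr]) -> list[tuple[int, int, str]]:
    """Return sorted (src, dst, kind) edges; textual order is execution order."""
    edges: set[tuple[int, int, str]] = set()
    last_write: dict[str, int] = {}      # variable -> index of its latest writer
    readers: dict[str, list[int]] = {}   # variable -> readers since that write
    for k, ins in enumerate(program):
        for v in ins.reads:
            if v in last_write:
                edges.add((last_write[v], k, "RAW"))
        for v in ins.writes:
            for r in readers.get(v, []):
                edges.add((r, k, "WAR"))
            if v in last_write:
                edges.add((last_write[v], k, "WAW"))
        for v in ins.reads:
            readers.setdefault(v, []).append(k)
        for v in ins.writes:             # a new write starts a fresh reader list
            last_write[v] = k
            readers[v] = []
    return sorted(edges)


PROGRAM = [
    Instr("LD  a",       writes=("a",)),
    Instr("MUL b, a, 2", reads=("a",), writes=("b",)),
    Instr("MOV a, 5",    writes=("a",)),
    Instr("ADD c, a, b", reads=("a", "b"), writes=("c",)),
]

if __name__ == "__main__":
    for edge in dependences(PROGRAM):
        print(edge)
```

The WAR edge from instruction 1 to instruction 2 is what stops a scheduler from hoisting `MOV a, 5` above the multiply that still needs the old value of `a`.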
Moreover, serial assembly programming can be applied to any GPDSP platform: by extracting each platform's instruction set information into an external information file (including instruction names, functional units, execution beats, operand lists and their specifications, and read/write port occupation), the compiler can generate platform-specific assembly code from the same user serial assembly code. However, because instruction names differ considerably across platforms, porting is supported only between different generations of products in the same series, unless an explicit mapping between instructions is provided.
B. Variable-based instruction operand writing and register replacement. In assembly programming, instruction operands must name registers directly, and register allocation is the hardest part of assembly programming. Different architectures differ in the number and types of their registers, so when writing assembly the programmer needs a detailed understanding of the register resources and must respect both the dependences between registers and the limit on their number. This differs greatly from, and is considerably harder than, the way programmers use variables in high-level languages. This embodiment therefore introduces, by analogy with high-level languages, variable-based instruction operand writing. When writing assembly instructions, the user can use variables in place of operands, which greatly simplifies programming, and register allocation is carried out automatically by the compiler. Specifically, both the condition field and the operand list of an assembly instruction must name registers; under the programming framework of this embodiment, the user simply declares variables at the beginning of the code and then uses them in place of registers. Given the architectural features of GPDSPs, register classes usually include scalar registers, vector registers, base address registers, and offset registers; registers are usually used singly, although some architectures also use them in pairs. Register types, register counts, pairing modes, and other details tightly coupled to the architecture are decoupled from the user code by recording them in an external information file. Once operands are written as variables, the programmer only needs to attend to each variable's type, not to its concrete allocation.
C. Automatic generation of assembly code. Assembly code is generated automatically by a dedicated lightweight compiler whose main functions are functional unit assignment, parallel instruction scheduling, and register allocation. Assembly code for a VLIW architecture requires the programmer to specify the functional unit that executes each instruction, and two instructions can be packed into one instruction packet only if they have no data dependence and their functional units do not conflict. However, functional unit assignment is tightly coupled to the hardware resources of the architecture; once those resources change, the assembly code either cannot be ported or no longer makes full use of the hardware. Parallel instruction scheduling, in turn, requires the programmer to arrange instructions manually when optimizing assembly code, yet it is constrained by data dependences among instructions and by limited functional unit resources, so searching for an arrangement by hand is impractical, especially when there are many instructions. This embodiment therefore has the compiler assign functional units to instructions automatically, based on the serial code written by the user and an externally supplied instruction set description file, which decouples the code from the architecture. When hardware resources are adjusted, new assembly code can be regenerated simply by modifying the external instruction set description file and recompiling, which is efficient and simple for programmers.
In addition, data dependence analysis can be performed on the instructions according to the execution beats given in the instruction set, and the assembly instructions can be scheduled in parallel by an algorithm such as heuristic list scheduling. Automatically searching for an optimized parallel arrangement overcomes the limits of manually exploring the instruction arrangement space, and the resulting parallel assembly code can achieve high execution efficiency. Furthermore, by summarizing register types and usage modes and analyzing them together with the specific instruction formats, the compiler can automatically replace variables with registers. When the number of simultaneously live variables exceeds the number of available registers of the corresponding class, the compiler can also insert push and pop instructions automatically, further easing the programmer's task.
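Heuristic list scheduling is named only as one option. Below is a minimal cycle-by-cycle list scheduler that uses critical-path height as its priority, a common heuristic but not necessarily the one the inventors used. It assumes each functional unit accepts one new instruction per cycle (fully pipelined) and takes a dependence graph whose edges carry an explicit minimum distance in beats, for example the producer's execution beats on a RAW edge. The latencies, unit names, and edges in the usage are hypothetical.

```python
from __future__ import annotations


def list_schedule(latency: list[int], units: list[tuple[str, ...]],
                  edges: list[tuple[int, int, int]]) -> tuple[list, list]:
    """Critical-path list scheduling.

    latency[i]: execution beats of instruction i
    units[i]:   functional units that can execute instruction i
    edges:      (src, dst, delay) meaning start[dst] >= start[src] + delay
    Returns (start cycle, chosen unit) per instruction.
    """
    n = len(latency)
    if any(not u for u in units):
        raise ValueError("every instruction needs at least one functional unit")
    succs: list[list[tuple[int, int]]] = [[] for _ in range(n)]
    preds: list[list[tuple[int, int]]] = [[] for _ in range(n)]
    for src, dst, delay in edges:
        if src >= dst:
            raise ValueError("serial order must be a topological order")
        succs[src].append((dst, delay))
        preds[dst].append((src, delay))

    # Priority: longest latency-weighted path from i to the end of the program.
    height = [0] * n
    for i in reversed(range(n)):
        height[i] = max([latency[i]] + [d + height[s] for s, d in succs[i]])

    start = [None] * n
    unit_of = [None] * n
    remaining = set(range(n))
    cycle = 0
    while remaining:
        ready = [i for i in remaining
                 if all(start[p] is not None and start[p] + d <= cycle for p, d in preds[i])]
        ready.sort(key=lambda i: (-height[i], i))   # highest priority first
        busy: set[str] = set()
        for i in ready:
            free = [u for u in units[i] if u not in busy]
            if free:                                  # issue i on the first free unit
                start[i], unit_of[i] = cycle, free[0]
                busy.add(free[0])
                remaining.discard(i)
        cycle += 1
    return start, unit_of


if __name__ == "__main__":
    # Two 4-beat loads feeding a multiply, plus an independent add.
    start, unit = list_schedule(
        latency=[4, 4, 3, 1],
        units=[("D1",), ("D1",), ("M1",), ("S1", "S2")],
        edges=[(0, 2, 4), (1, 2, 4)],
    )
    print(start, unit)
```

In the usage, the two loads compete for `D1` and issue at cycles 0 and 1, the add fills cycle 0 on `S1`, and the multiply waits until cycle 5, when the second load's result is available.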
This embodiment also provides a GPDSP-oriented lightweight, high-efficiency assembly code programming framework. Under this framework, a programmer writes only serial code when programming in assembly, without considering instruction scheduling or register allocation; everything else is done automatically by a dedicated lightweight compiler. The framework consists of three parts: serial assembly writing and parsing for general-purpose GPDSPs, variable-based instruction operand writing and register replacement, and automatic assembly code generation. The specific implementation is as follows:
(1) Serial assembly writing and parsing for general-purpose GPDSPs. GPDSPs usually adopt a VLIW architecture, which means that assembly code must be arranged fully statically: instruction dependence analysis, functional unit assignment, and parallel packing and scheduling must all be done by hand. Because assembly instructions typically have multi-beat latencies, the beat at which each result takes effect must be considered during dependence analysis and scheduling, which makes instruction arrangement a significant challenge for programmers. The assembly programming framework of this embodiment frees programmers from this difficulty. The programmer only needs to think about the algorithm and write simple serial assembly instructions, and may regard each instruction as completing and taking effect in one beat, which greatly reduces the effort of writing assembly code. Parallel instruction scheduling is left to a dedicated lightweight compiler. Based on the convention agreed with the programmer, the compiler performs dependence analysis directly in the order in which the code is written, which also simplifies instruction parsing and dependence analysis. The actual execution beats of the instructions are then taken into account during parallel scheduling to generate efficient assembly code.
In addition, to make the programming framework more general, and taking into account the differences between the instruction sets of different GPDSP platforms, this embodiment provides an architecture-decoupled way of describing the instruction set: the instruction set information required by the compiler is extracted and stored in an external file, and this instruction set description file is imported before the user code is analyzed. As shown in fig. 4, the instruction set description file specifies, for each instruction, the instruction name, functional unit, execution beats, read operands, write operands, and read/write port occupation beats. With this information, the compiler parses the instruction formats in the user code and uses the functional units and instruction beats to schedule and generate the assembly code. This approach greatly widens the range of GPDSPs to which this embodiment applies, and in particular addresses portability across different DSP chips of the same series. If two DSP chips differ only in their functional unit configuration or in the execution beats of individual instructions, efficient assembly code can be regenerated automatically by adjusting only the external instruction set description file, without modifying the user code. This greatly reduces repetitive assembly writing and improves code output efficiency while maintaining the quality of the generated code.
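The patent does not publish the concrete syntax of the description file, so the format below is an assumption: one whitespace-separated line per instruction carrying the fields listed above (name, candidate functional units separated by `|`, execution beats, read and write operand types separated by `,` with `-` for none, read-port beats, write-port beats). The instruction names, unit names and register type names in `SAMPLE` are hypothetical placeholders, not FT-Matrix definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrSpec:
    name: str
    units: tuple[str, ...]        # functional units able to execute the instruction
    exec_beats: int               # execution latency in beats
    read_types: tuple[str, ...]   # register type of each read operand
    write_types: tuple[str, ...]  # register type of each write operand
    read_port_beats: int          # beats the read port is occupied
    write_port_beats: int         # beats the write port is occupied


def _types(col: str) -> tuple[str, ...]:
    """Split a comma-separated operand-type column; '-' means no operands."""
    return () if col == "-" else tuple(col.split(","))


def parse_isa_description(text: str) -> dict[str, InstrSpec]:
    """Parse the (assumed) instruction set description format into a lookup table."""
    table: dict[str, InstrSpec] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, units, exe, reads, writes, rport, wport = line.split()
        table[name] = InstrSpec(name, tuple(units.split("|")), int(exe),
                                _types(reads), _types(writes), int(rport), int(wport))
    return table


# Hypothetical description file contents.
SAMPLE = """
# name  units   exec  reads   writes  rport  wport
VMUL    M1|M2   4     VR,VR   VR      1      1
SADD    S1|S2   1     R,R     R       1      1
SST     D1      2     R,AR    -       1      0
"""
```

With this separation, retargeting to a sibling chip whose `VMUL` takes one more beat would mean editing the `4` in the description text, while the user code and the parser stay unchanged.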
(2) Instruction operand writing and register replacement based on variable descriptions. Hand-written assembly requires the programmer to allocate registers, which adds to the programming difficulty: the programmer must be familiar with the types and number of registers available on the architecture, and must also track data dependences while implementing the algorithm and carry out register allocation and replacement. To address these problems, and borrowing from programming in high-level languages, this embodiment proposes a variable-description-based method of writing instruction operands in place of manual register allocation. Specifically, as shown in fig. 3, in the condition field and operand list of an assembly instruction the user can write operands directly as variables, and variables with the same name are treated as the same operand. To let the compiler recognize the variables and later replace them with registers, a variable declaration form is agreed upon: the type and usage of each variable are declared before the user code begins. Taking the FT-Matrix DSP as an example, the register types include general scalar registers, general vector registers, base registers and offset registers, and the usage is either a single register or a register pair. So that the user can write operands as variables instead of registers while preserving a degree of generality, register allocation must be performed during the subsequent automatic code generation. To this end, this embodiment abstracts the architecture's register information (register types, number of registers, pairing modes and so on) and records it in an external information file. On this basis, the user is freed from register allocation when writing code.
With operands written as variable descriptions, the complex task of register allocation can be carried out automatically by the compiler, which simplifies assembly programming for the user without affecting program correctness.
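As an illustration of the variable-description convention, the sketch below assumes a hypothetical declaration directive of the form `.var <type> <single|pair> name, name, ...` placed before the user code; the directive, the register type names and the register text used in the example are all assumptions. It separates the declarations from the instruction lines and then rewrites each variable name with whatever register an allocator has chosen for it. Since identical names denote the same operand, one whole-word substitution per name is enough.

```python
import re

# Hypothetical declaration directive: .var <register type> <single|pair> name, name, ...
DECL = re.compile(r"^\s*\.var\s+(\w+)\s+(single|pair)\s+(.+?)\s*$")
WORD = re.compile(r"\b\w+\b")


def parse_declarations(lines):
    """Split user code into {variable: (register_type, usage)} and the instruction lines."""
    variables, body = {}, []
    for line in lines:
        m = DECL.match(line)
        if m is None:            # not a declaration: keep it as an instruction line
            body.append(line)
            continue
        reg_type, usage, names = m.groups()
        for name in names.split(","):
            variables[name.strip()] = (reg_type, usage)
    return variables, body


def substitute(line, assignment):
    """Replace each whole-word variable name by its allocated register text.

    Words not in `assignment` (mnemonics, immediates) are left unchanged.
    """
    return WORD.sub(lambda m: assignment.get(m.group(0), m.group(0)), line)
```

A variable declared with `pair` usage would simply map to whatever pair syntax the target assembler expects (for example a hypothetical `VR1:VR0`); choosing which physical registers form a legal pair belongs to the allocator and the external register information file, not to this substitution step.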
(3) Automatic assembly code generation. With the lightweight, efficient assembly programming framework provided by this embodiment, the user writes serial assembly code with reference to the instruction set and can concentrate on implementing the algorithm logic, ignoring instruction execution beats, instruction scheduling and register allocation. Specifically, as shown in fig. 3, parallel symbols and functional unit assignments can be omitted entirely when the assembly instructions are written, and register allocation is performed automatically by the dedicated compiler. This frees the programmer from complex implementation details and lets them focus on choosing and implementing algorithms. The framework relies on this dedicated compiler: after dependence analysis of the user's serial assembly code, the compiler automatically assigns functional units to the instructions according to the instruction information in the instruction set description file. For an instruction that can execute on several functional units, the unit that allows the instruction to start earliest is selected. During instruction scheduling, to improve the execution efficiency of the whole program, the compiler topologically sorts the instructions according to their dependences and, using an algorithm similar to list scheduling, selects instructions one by one and determines their starting beats; instructions with no data dependence and no functional unit conflict between them can be executed concurrently and packed into the same VLIW instruction packet.
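A compact list scheduler in the spirit of this paragraph might look as follows. The assumptions are ours, not the patent's: dependence edges are given as `(pred, succ, delay)` triples with `pred < succ`, so the serial order is already a topological order; every functional unit is fully pipelined and busy only in its issue cycle; and ready instructions are prioritized by critical-path length, a common list-scheduling heuristic that the embodiment may or may not use. Read and write port occupancy is not modeled here.

```python
def list_schedule(units, edges):
    """Cycle-by-cycle list scheduling.

    units[i]: candidate functional units of instruction i
    edges:    (pred, succ, delay) meaning succ may start no earlier than start[pred] + delay
    Returns (start_cycle, chosen_unit) lists indexed by instruction.
    """
    n = len(units)
    if any(not u for u in units):
        raise ValueError("every instruction needs at least one candidate unit")
    preds = [[] for _ in range(n)]
    succs = [[] for _ in range(n)]
    for p, s, d in edges:
        preds[s].append((p, d))
        succs[p].append((s, d))

    # Critical-path priority: longest delay-weighted path to any sink.
    # Walking backwards is valid because every edge goes from a lower to a higher index.
    prio = [0] * n
    for i in reversed(range(n)):
        prio[i] = max((d + prio[s] for s, d in succs[i]), default=0)

    start = [None] * n
    chosen = [None] * n
    cycle = 0
    while any(s is None for s in start):
        busy = set()  # units already issuing an instruction in this cycle
        ready = [i for i in range(n) if start[i] is None and
                 all(start[p] is not None and start[p] + d <= cycle for p, d in preds[i])]
        for i in sorted(ready, key=lambda k: (-prio[k], k)):
            free = [u for u in units[i] if u not in busy]
            if free:  # issue on the first free candidate unit
                busy.add(free[0])
                start[i], chosen[i] = cycle, free[0]
        cycle += 1
    return start, chosen
```

Instructions that end up with the same start cycle are exactly the ones that may share a VLIW packet, since the scheduler never places two of them on the same unit in one cycle.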
The advantage of having the compiler perform parallel scheduling automatically is that a machine can search a scheduling space far larger than a human can, and do so quickly. The programming framework therefore not only reduces the programmer's workload but can also improve the quality of parallel instruction scheduling and hence program execution efficiency. Based on the user's variable declarations, the lightweight compiler of this embodiment parses each operand list against the instruction format requirements and thereby determines the live interval of each variable over the whole code. Variables can then be replaced with registers of the specified type using a commonly used register allocation algorithm. If the number of simultaneously live variables exceeds the number of available registers, the compiler automatically inserts push and pop instructions to free registers for the current variables, which speeds up code production.
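The embodiment only states that a commonly used register allocation algorithm is applied to the live intervals. As one such algorithm, the sketch below uses classic linear-scan allocation (Poletto and Sarkar); this is a swapped-in choice, not the method stated in the patent. It assumes straight-line code, so a variable's live interval is simply its first to last occurrence. Variables that cannot be given a register are reported as spilled; inserting the corresponding push and pop instructions is not shown.

```python
def live_intervals(code):
    """code: list of (reads, writes) variable-name tuples in serial order.

    Straight-line code is assumed, so a variable is live from its first to its last occurrence.
    """
    spans = {}
    for idx, (reads, writes) in enumerate(code):
        for var in (*reads, *writes):
            first = spans[var][0] if var in spans else idx
            spans[var] = (first, idx)
    return spans


def linear_scan(spans, registers):
    """Linear-scan allocation over {var: (start, end)}; returns (assignment, spilled)."""
    free = list(registers)
    active = []                      # (end, var) for intervals currently holding a register
    assignment, spilled = {}, []
    for var, (start, end) in sorted(spans.items(), key=lambda kv: kv[1][0]):
        # Release registers of intervals that ended strictly before this one starts.
        for e, v in [a for a in active if a[0] < start]:
            active.remove((e, v))
            free.append(assignment[v])
        if free:
            assignment[var] = free.pop(0)
            active.append((end, var))
            continue
        # No register left: spill whichever interval ends last.
        if active and max(active)[0] > end:
            far_end, far_var = max(active)
            assignment[var] = assignment.pop(far_var)   # take over its register
            active.remove((far_end, far_var))
            active.append((end, var))
            spilled.append(far_var)
        else:
            spilled.append(var)
    return assignment, spilled
```

The strict `<` test is conservative: a register is never reused by an instruction that performs another variable's last read. In a small example where `c = a + b` is followed by `d = f(c)`, two registers therefore force `c` to be spilled, while three registers suffice; the spilled variable is the point at which the compiler described above would insert push and pop instructions.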
In summary, the GPDSP-oriented lightweight, efficient assembly code programming method of this embodiment has the following advantages. First, the assembly programming framework of this embodiment is applicable to GPDSP platforms: through a decoupled instruction set description file, a programmer can write simplified instructions and have assembly code for the target platform generated automatically. Second, the serial assembly programming mode of this embodiment allows programmers to ignore the execution beats, functional unit assignment and parallel scheduling of assembly instructions, which greatly reduces the difficulty of assembly programming: programmers only need to consider the algorithm itself, while the parallel scheduling and generation of efficient assembly code are delegated to a dedicated lightweight compiler. Third, the GPDSP assembly programming framework of this embodiment lets programmers write operands as variable descriptions, which is closer to the style of a high-level language, and the compiler automatically performs register allocation and generates the push and pop instructions required when registers overflow, which improves assembly code output efficiency. The method can greatly reduce the difficulty of developing assembly code, improve development efficiency and shorten development cycles; it is applicable to a variety of GPDSP architectures, supports cross-platform use, and offers good portability.
In addition, the embodiment also provides a GPDSP-oriented lightweight and efficient assembly code programming device, which includes a microprocessor and a memory connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the aforementioned GPDSP-oriented lightweight and efficient assembly code programming method.
Furthermore, the present embodiment also provides a computer-readable storage medium in which a computer program is stored, the computer program being programmed or configured to execute the aforementioned GPDSP-oriented lightweight efficient assembly code programming method.
Example two:
This embodiment is substantially the same as the first embodiment, the main difference being that the first embodiment includes a step of performing automatic register allocation, whereas this embodiment does not.
In addition, the embodiment also provides a GPDSP-oriented lightweight and efficient assembly code programming device, which includes a microprocessor and a memory connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the aforementioned GPDSP-oriented lightweight and efficient assembly code programming method.
Furthermore, the present embodiment also provides a computer-readable storage medium in which a computer program is stored, the computer program being programmed or configured to execute the aforementioned GPDSP-oriented lightweight efficient assembly code programming method.
Example three:
This embodiment is substantially the same as the first embodiment, the main difference being as follows. In the first embodiment, steps 1) to 3) are executed by a lightweight compiler for assembly code optimization, and step 3) is followed by a step in which the lightweight compiler outputs the final assembly code or sends it to an assembly code compiler, which compiles and links it to obtain binary code. In this embodiment, steps 1) to 3) are executed by the assembly code compiler itself, and step 3) is followed by a step in which the assembly code compiler continues to compile and link the final assembly code to obtain binary code; that is, the functions of the method are integrated directly into the assembly code compiler, so that binary code can be produced directly.
In addition, the embodiment also provides a GPDSP-oriented lightweight and efficient assembly code programming device, which includes a microprocessor and a memory connected with each other, wherein the microprocessor is programmed or configured to execute the steps of the aforementioned GPDSP-oriented lightweight and efficient assembly code programming method.
Furthermore, the present embodiment also provides a computer-readable storage medium in which a computer program is stored, the computer program being programmed or configured to execute the aforementioned GPDSP-oriented lightweight efficient assembly code programming method.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is described with reference to flowcharts and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application, in which computer program instructions executed by a processor create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above embodiments, and all technical solutions belonging to the idea of the present invention belong to the protection scope of the present invention. It should be noted that modifications and embellishments within the scope of the invention may occur to those skilled in the art without departing from the principle of the invention, and are considered to be within the scope of the invention.

Claims (10)

1. A GPDSP-oriented lightweight and efficient assembly code programming method is characterized by comprising the following steps:
1) inputting serial assembly code, wherein the serial assembly code consists of a serial sequence of instructions, the order of the instructions represents the order in which they execute and take effect, and no instruction contains valid parallel symbols or functional unit information;
2) automatically assigning a functional unit to each instruction in the serial assembly code, and determining the earliest start time of the instruction and the beats it occupies on the read port and write port of the assigned functional unit;
3) performing parallel instruction scheduling on the instructions whose functional units have been assigned and whose start clock cycles have been determined, to obtain the final assembly code.
2. The GPDSP-oriented lightweight and efficient assembly code programming method according to claim 1, wherein step 2) comprises:
2.1) taking out, in order of appearance, the next instruction to be scheduled from the serial assembly code as the current instruction x; if all instructions in the serial assembly code have been traversed, jumping to step 3);
2.2) querying, from a preset instruction set description file, the read/write port occupation beats j of the current instruction x, its execution beats t, and the set Y of functional units that can execute it; and initializing the beat i;
2.3) judging whether beat i exceeds the preset maximum execution cycle limit; if so, determining that no valid placement scheme can be found for the current instruction x and jumping to step 2.1); otherwise, proceeding to the next step;
2.4) taking out one functional unit from the functional unit set Y as the current functional unit y_k; if all functional units in Y have been traversed, incrementing beat i by 1 and jumping to step 2.3); otherwise, proceeding to the next step;
2.5) determining that the execution cycle of the current instruction x runs from beat i to beat (i + t - 1); that the beats occupied by x on the read port of the current functional unit y_k run from beat i to beat (i + j - 1); and that the beats occupied by x on the write port of y_k run from beat (i + t - j) to beat (i + t - 1);
2.6) judging whether the read-port and write-port occupation beats of the current instruction x on the current functional unit y_k conflict with those of previous instructions; if there is no conflict, determining that the placement scheme of x is valid, that the functional unit finally assigned to x is y_k and its earliest start time is beat i, and recording the read-port and write-port beats occupied by x on y_k; otherwise, jumping to step 2.4) to continue traversing the remaining functional units.
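Steps 2.1) to 2.6) can be traced with the sketch below. It follows claim 2 in using a single port-occupation value j for both the read port and the write port; the dictionary-based ISA table, the `max_beat` limit and the instruction names are hypothetical. How the initial beat i is obtained (for example from the beats at which producer results become available) is left to the caller.

```python
def assign_units(instrs, isa, max_beat=64):
    """Functional-unit assignment following steps 2.1)-2.6) of claim 2.

    instrs: list of (name, initial_beat) in serial order
    isa:    name -> (t: execution beats, j: port-occupation beats, Y: candidate units)
    Returns one (unit, start_beat) per instruction, or None if no valid placement exists.
    """
    read_busy, write_busy = {}, {}   # unit -> set of beats its read/write port is taken
    placements = []
    for name, i in instrs:                        # 2.1) next instruction x in serial order
        t, j, units = isa[name]                   # 2.2) look up j, t and Y
        placed = None
        while placed is None and i <= max_beat:   # 2.3) give up beyond the beat limit
            for y in units:                       # 2.4) try each unit y_k in Y
                reads = set(range(i, i + j))              # 2.5) read port: beats i..i+j-1
                writes = set(range(i + t - j, i + t))     #      write port: beats i+t-j..i+t-1
                no_conflict = (reads.isdisjoint(read_busy.get(y, set()))
                               and writes.isdisjoint(write_busy.get(y, set())))
                if no_conflict:                   # 2.6) valid: record the occupancy
                    read_busy.setdefault(y, set()).update(reads)
                    write_busy.setdefault(y, set()).update(writes)
                    placed = (y, i)
                    break
            else:
                i += 1                            # every unit conflicts: try the next beat
        placements.append(placed)
    return placements
```

With a hypothetical table `{"VMUL": (4, 1, ("M1", "M2")), "SADD": (1, 1, ("S1",))}`, three `VMUL`s requested at beat 0 land on M1 at beat 0, M2 at beat 0 and M1 at beat 1, because the third finds both read ports busy at beat 0.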
3. The GPDSP-oriented lightweight and efficient assembly code programming method according to claim 2, wherein the field of each instruction in the preset instruction set description file comprises: instruction name, functional unit, execution beat, read operand, write operand, read port occupation beat, and write port occupation beat.
4. The GPDSP-oriented lightweight and efficient assembly code programming method according to claim 1, wherein step 3) comprises: for the instructions whose functional units have been automatically assigned and whose earliest start times and read/write port occupation beats on the assigned functional units have been determined, packing the instructions that use different functional units in the same clock cycle into one VLIW instruction packet for concurrent execution, and, when outputting the code, adding a specified parallel symbol in front of each instruction of the VLIW instruction packet starting from the second instruction.
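The packing and output rule of claim 4 can be sketched as below. The parallel symbol `||` is only an assumed default, since the claim merely speaks of "a specified parallel symbol"; likewise the optional `nop` text used to fill empty cycles is a hypothetical placeholder, because whether and how a target needs explicit idle cycles depends on its assembler.

```python
from collections import defaultdict


def emit_packets(schedule, parallel_symbol="||", nop=None):
    """Pack scheduled instructions into VLIW packets and render them as text.

    schedule: iterable of (cycle, unit, instruction_text)
    Instructions issued in the same cycle form one packet; every instruction after
    the first in a packet is prefixed with the parallel symbol.
    If `nop` is given, one such line is emitted for each empty cycle between packets.
    """
    packets = defaultdict(list)
    for cycle, unit, text in schedule:
        if any(u == unit for u, _ in packets[cycle]):
            raise ValueError(f"functional unit {unit} used twice in cycle {cycle}")
        packets[cycle].append((unit, text))

    lines, prev = [], None
    for cycle in sorted(packets):
        if nop is not None and prev is not None:
            lines.extend([nop] * (cycle - prev - 1))   # fill idle cycles, if requested
        for k, (_, text) in enumerate(packets[cycle]):
            lines.append(text if k == 0 else f"{parallel_symbol} {text}")
        prev = cycle
    return lines
```

The unit check mirrors the claim's condition that only instructions on different functional units share a packet; a scheduler that already enforces this will never trigger the error.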
5. The GPDSP-oriented lightweight and efficient assembly code programming method according to claim 1, wherein, in the serial assembly code input in step 1), the condition field and operand list of each instruction do not contain valid register information but are instead written with register variables; and step 2) further comprises a step of performing automatic register allocation on the register variables in the serial assembly code, wherein the register type allocated to each register variable is consistent with the type description of the corresponding read/write operand of the instruction in the preset instruction set description file.
6. A GPDSP-oriented lightweight and efficient assembly code programming method according to claim 5, wherein said step of performing automatic register allocation comprises: reading a register variable declaration file of a platform corresponding to the serial assembly code, wherein the register variable declaration file comprises register types of the platform corresponding to the serial assembly code and single or paired use modes of all registers; and replacing the register variable in each instruction with the corresponding register based on the register type in the register variable declaration file and the single or paired use mode of each register.
7. The GPDSP-oriented lightweight and efficient assembly code programming method according to claim 1, wherein an execution subject of the steps 1) to 3) is an assembly code compiler, and the method further comprises a step of the assembly code compiler continuing to compile and link final assembly codes to obtain binary codes after the step 3).
8. The GPDSP-oriented lightweight and efficient assembly code programming method according to claim 1, wherein the execution subject of steps 1) to 3) is a lightweight compiler for implementing assembly code optimization, and step 3) is followed by a step in which the lightweight compiler outputs the final assembly code or sends it to an assembly code compiler, which compiles and links the final assembly code to obtain binary code.
9. A GPDSP-oriented lightweight high-efficiency assembly code programming device comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the steps of the GPDSP-oriented lightweight high-efficiency assembly code programming method according to any one of claims 1 to 8.
10. A computer readable storage medium having stored thereon a computer program programmed or configured to perform the GPDSP-oriented lightweight, efficient assembly code programming method of any of claims 1-8.
CN202111028130.2A 2021-09-02 2021-09-02 GPDSP-oriented lightweight high-efficiency assembly code programming method and system Active CN113721899B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111028130.2A CN113721899B (en) 2021-09-02 2021-09-02 GPDSP-oriented lightweight high-efficiency assembly code programming method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111028130.2A CN113721899B (en) 2021-09-02 2021-09-02 GPDSP-oriented lightweight high-efficiency assembly code programming method and system

Publications (2)

Publication Number Publication Date
CN113721899A true CN113721899A (en) 2021-11-30
CN113721899B CN113721899B (en) 2023-08-15

Family

ID=78681133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111028130.2A Active CN113721899B (en) 2021-09-02 2021-09-02 GPDSP-oriented lightweight high-efficiency assembly code programming method and system

Country Status (1)

Country Link
CN (1) CN113721899B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954122A (en) * 2023-07-27 2023-10-27 上海鲸鱼机器人科技有限公司 Programming device and method based on serial identification code instruction

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108776586A (en) * 2018-05-29 2018-11-09 中国人民解放军国防科技大学 Large-point-number FFT vectorization assembly code generation method based on GPDSP
CN108845795A (en) * 2018-05-29 2018-11-20 中国人民解放军国防科技大学 GPDSP-based dense matrix multiplication vectorization assembly code generation method
CN113157318A (en) * 2021-04-21 2021-07-23 中国人民解放军国防科技大学 GPDSP assembly transplanting optimization method and system based on countdown buffering

Also Published As

Publication number Publication date
CN113721899B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
US8806463B1 (en) Feedback-directed inter-procedural optimization
US8448150B2 (en) System and method for translating high-level programming language code into hardware description language code
KR101702651B1 (en) Solution to divergent branches in a simd core using hardware pointers
Kim et al. Efficient SIMD code generation for irregular kernels
US9830164B2 (en) Hardware and software solutions to divergent branches in a parallel pipeline
JP2021501949A (en) Programming flow for multiprocessor systems
JPH11306026A (en) Code optimization device and method and computer readable recording medium recording code optimization program
US20220214866A1 (en) Merged machine-level intermediate representation optimizations
Lorenz et al. Energy aware compilation for DSPs with SIMD instructions
CN113721899B (en) GPDSP-oriented lightweight high-efficiency assembly code programming method and system
US7673296B2 (en) Method and system for optional code scheduling
Hohenauer et al. A SIMD optimization framework for retargetable compilers
CN116382700A (en) Automatic debugging method and system for VLIW and SIMD architecture-oriented compiler
US20210182041A1 (en) Method and apparatus for enabling autonomous acceleration of dataflow ai applications
US20100077384A1 (en) Parallel processing of an expression
US11662988B2 (en) Compiler for RISC processor having specialized registers
Goossens et al. Retargetable Compilation
Chen et al. ORC2DSP: Compiler Infrastructure Supports for VLIW DSP Processors
Manilov Analysis and transformation of legacy code
Lin et al. Effective code generation for distributed and ping-pong register files: a case study on PAC VLIW DSP cores
Ju et al. Develop and prototype code generation techniques for a clause-based GPU
CN116450138A (en) Code optimization generation method and system oriented to SIMD and VLIW architecture
JP2022140995A (en) Information processing device, compile program, and compile method
Pister et al. Generic software pipelining at the assembly level
JP3634712B2 (en) Compiler device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant