CN110134437B

CN110134437B - Software pipeline optimization method and device

Info

Publication number: CN110134437B
Application number: CN201910395467.3A
Authority: CN
Inventors: 方志红; 肖晶; 郭怡冉; 顾庆远; 梁之勇; 竺红伟
Original assignee: CETC 38 Research Institute
Current assignee: CETC 38 Research Institute
Priority date: 2019-05-13
Filing date: 2019-05-13
Publication date: 2022-12-16
Anticipated expiration: 2039-05-13
Also published as: CN110134437A

Abstract

The invention discloses a software pipeline optimization method and device. The method comprises the following steps: according to the serial processing relation among the execution codes, modularizing the serial execution codes in the software to be optimized to obtain a plurality of functional modules, wherein the execution codes comprise a loop body; for each functional module, sequentially expanding the functional module according to the running order of the instructions it contains to obtain an instruction column, and arranging the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid; judging whether the instructions in the two-dimensional instruction grid meet preset rule constraints, wherein the rule constraints comprise one or a combination of computational resources, register resources, and instruction delays; if not, adjusting the position of each instruction that does not meet the rule constraints, and returning to the judging step until the instructions in every row meet the rule constraints. Applying the embodiment of the invention can improve the efficiency of software pipeline optimization.

Description

Software pipeline optimization method and device

Technical Field

The invention relates to a software optimization method and device, in particular to a software pipeline optimization method and device.

Background

The DSP (Digital Signal Processor) was developed to handle the large number of mathematical operations that arise in digital signal processing. Digital signal processing can implement complex mathematical operations as a series of multiplications and additions. A significant feature of digital signal processors is the ability to perform at least one multiplication or multiply-accumulate calculation in a single clock cycle. To meet the performance requirements of many applications, modern DSPs have a plurality of execution units such as multipliers and adders. How to make full use of these internal execution units, and thereby realize the performance of the DSP, is the key to improving the performance of the whole application system.

For a loop in software, the statements within one loop body are strongly time-correlated, whereas the statements of consecutive loop bodies are independent of one another. The loop bodies execute the same statements on different data, so executing several loop bodies in parallel reduces the number of loop passes several-fold. Software pipelining divides a loop body into several operation stages, and operations belonging to different loop bodies can execute in parallel. It is a technique for restructuring loops to exploit instruction-level parallelism: it speeds up a loop by executing several consecutive loop bodies in parallel. Software pipelining can make full use of the computational resources in the processor and effectively reduce instruction pipeline delay, and is an important means of improving the execution efficiency of DSP programs. It works by overlapping the execution of different loop bodies: instructions from different loop bodies are interleaved and distributed to the execution units for parallel execution, while the instructions within a single loop body are still executed serially. This preserves the dependences among the instructions within a loop body while increasing parallelism. However, overlapping different loop bodies increases the demand on the processor's internal processing units and registers, which raises the processing pressure in a single instruction cycle and places higher demands on assembly-level programming that seeks higher performance.
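The overlap described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `pipeline_schedule`, the three-stage loop body, and the idealized assumptions (every stage takes one cycle, each stage uses a different execution unit, and a new loop body starts every cycle) are all hypothetical and chosen only for illustration.

```python
def pipeline_schedule(stages, iterations):
    """List, for every cycle, which stage of which loop body is running.

    Idealized assumption: loop body i starts at cycle i, so stage s of
    loop body i runs at cycle i + s.
    """
    total_cycles = iterations + len(stages) - 1
    rows = []
    for cycle in range(total_cycles):
        active = []
        for it in range(iterations):
            s = cycle - it
            if 0 <= s < len(stages):
                active.append(f"{stages[s]}[{it}]")
        rows.append(active)
    return rows


# Hypothetical three-stage loop body executed four times.
schedule = pipeline_schedule(["LOAD", "MULSP", "SAVE"], iterations=4)
for cycle, active in enumerate(schedule):
    print(cycle, " | ".join(active))
```

Under these assumptions, four loop bodies of three stages finish in six cycles instead of the twelve that strictly serial execution would need, while the stages of any single loop body still run in their original order.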

In view of the importance of software pipelining to DSP performance optimization, some DSP manufacturers provide corresponding software and hardware support. Currently, software support mainly takes the form of improvements to the compilation system. However, an improved compilation system still cannot guarantee that the resulting software pipeline is efficient, and sometimes cannot produce a software pipeline at all. Many software pipelines therefore have to be adjusted manually afterwards, which leads to the technical problem that existing software pipeline optimization is inefficient.

Disclosure of Invention

The invention aims to provide a method and a device for optimizing software pipeline, so as to improve the efficiency of software pipeline optimization.

The invention solves the technical problems through the following technical scheme:

An embodiment of the invention provides a software pipeline optimization method, which comprises the following steps:

according to the serial processing relation among the execution codes, performing modular processing on the serial execution codes in the software to be optimized to obtain a plurality of functional modules, wherein the execution codes comprise: a loop body;

for each functional module, sequentially expanding the functional modules according to the running sequence of the instructions contained in the functional modules to obtain instruction columns, and processing the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid;

acquiring an adjusting command aiming at an instruction in a two-dimensional instruction grid, and judging whether the instruction in the two-dimensional instruction grid adjusted according to the adjusting command meets a preset rule constraint, wherein the rule constraint comprises the following steps: one or a combination of computational resources, register resources, and instruction delays;

if not, adjusting the position of the instruction which does not accord with the rule constraint, and returning to execute the step of judging whether the instruction in the two-dimensional instruction grid accords with the preset rule constraint or not until the instructions in each row accord with the rule constraint.

Optionally, the processing the instruction columns corresponding to the functional modules side by side includes:

the first instructions in the instruction columns contained in the respective functional modules are aligned to the same row.

Optionally, when the functional modules are sequentially expanded according to the running order of the instructions contained in the functional modules, the method further includes:

according to the types of the instructions contained in the functional modules, performing differential display on the instructions of each type by using visual elements, wherein the visual elements comprise: one or a combination of fonts, font colors, fill colors and special character marks.

Optionally, the determining whether the instructions in the two-dimensional instruction grid meet preset rule constraints includes:

and judging whether the instructions in each row of the two-dimensional instruction grid meet preset rule constraints or not.

Optionally, the adjusting the position of the instruction that does not meet the rule constraint includes:

the position of instructions that do not meet the rule constraints is adjusted to the next or previous line.

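Optionally, the determining whether the instructions in the two-dimensional instruction grid meet a preset rule constraint includes:

judging whether the instructions in each column of the two-dimensional instruction grid meet preset rule constraints or not.

Optionally, the adjusting the position of the instruction that does not meet the rule constraint includes:

adjusting the position of the instruction which does not meet the rule constraint to the left column or the right column.

Optionally, the adjusting the position of the instruction that does not meet the rule constraint includes:

receiving an operation command for adjusting the instruction which does not meet the rule constraint, and executing the operation command to adjust the position of the instruction which does not meet the rule constraint.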

An embodiment of the invention provides a software pipeline optimization device, which comprises:

the processing module is used for modularly processing the serial execution codes in the software to be optimized according to the serial processing relation among the execution codes to obtain a plurality of functional modules, wherein the execution codes comprise: a loop body;

the expansion module is used for sequentially expanding each functional module according to the operation sequence of the instructions contained in the functional module to obtain an instruction column, and processing the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid;

the judging module is used for judging whether the instructions in the two-dimensional instruction grid meet preset rule constraints or not, wherein the rule constraints comprise: one or a combination of computational resources, register resources, and instruction latency;

and the adjusting module is used for adjusting the position of the instruction which does not accord with the rule constraint under the condition that the judgment result of the judging module is negative, and returning to the step of judging whether the instruction in the two-dimensional instruction grid accords with the preset rule constraint until the instructions in each row accord with the rule constraint.

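Optionally, the expansion module is configured to:

align the first instructions in the instruction columns contained in the respective functional modules to the same row.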

Optionally, the apparatus further comprises: a display module to:

according to the types of the instructions contained in the functional modules, performing differential display on the instructions of the types by using visual elements, wherein the visual elements comprise: font, font color, fill color, special character mark.

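Optionally, the determining module is configured to:

judge whether the instructions in each row of the two-dimensional instruction grid meet preset rule constraints or not.

Optionally, the adjusting module is configured to:

adjust the position of the instruction that does not meet the rule constraint to the next or previous line.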

Optionally, the determining module is configured to:

and judging whether the instructions in each column of the two-dimensional instruction grid meet preset rule constraints or not.

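Optionally, the adjusting module is configured to:

adjust the position of the instruction that does not meet the rule constraint to the left column or the right column.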

Compared with the prior art, the invention has the following advantages:

By applying the embodiment of the invention, the two-dimensional instruction grid is constructed from the instruction columns corresponding to the functional modules, and the instructions are then adjusted within the two-dimensional instruction grid under the rule constraints, so that the efficiency of software pipeline optimization can be improved compared with the manual adjustment and judgment of the prior art.

Drawings

Fig. 1 is a schematic flow chart of a software pipeline optimization method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a multi-core processor in the software pipeline optimization method according to the embodiment of the present invention;

Fig. 3 is a schematic diagram illustrating the splitting of functional modules in a software pipeline optimization method according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a two-dimensional instruction grid constructed in a software pipeline optimization method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram illustrating adjustment of instruction positions in a software pipeline optimization method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a two-dimensional instruction grid after position adjustment of an instruction in a software pipeline optimization method according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a software pipeline optimization apparatus according to an embodiment of the present invention.

Detailed Description

The following examples are given for the detailed implementation and the specific operation procedures, but the scope of the present invention is not limited to the following examples.

The embodiment of the invention provides a method and a device for optimizing software pipeline, and firstly, the method for optimizing software pipeline provided by the embodiment of the invention is introduced below.

Fig. 1 is a schematic flow chart of a software pipeline optimization method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:

S101: according to the serial processing relation among the execution codes, performing modular processing on the serial execution codes in the software to be optimized to obtain a plurality of functional modules, wherein the execution codes comprise: a loop body.

The following takes software to be optimized that contains several loop bodies as an example.

Fig. 2 is a schematic structural diagram of a multi-core processor in the software pipeline optimization method according to an embodiment of the present invention. As shown in Fig. 2, a processor xxx includes a multiplication unit, a shifter unit, an arithmetic unit, and a store/fetch unit, where:

the instruction code that the multiplication unit can execute is MULSP;

the instruction codes that the arithmetic unit can execute are ADDSP and SUBSP;

the instruction codes that the shifter unit can execute are SHR and SHL;

the instruction codes that the store/fetch unit can execute are LOAD and SAVE.
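The unit-to-instruction assignment listed above can be captured in a small lookup table. The following Python sketch is only one possible representation; the names `Unit`, `OPCODE_UNIT`, and `unit_of` are hypothetical, and treating NOP as occupying no unit is an assumption rather than something the description states.

```python
from enum import Enum


class Unit(Enum):
    MULTIPLY = "multiplication unit"
    ARITHMETIC = "arithmetic unit"
    SHIFTER = "shifter unit"
    MEMORY = "store/fetch unit"


# Instruction codes taken from the description of Fig. 2.
OPCODE_UNIT = {
    "MULSP": Unit.MULTIPLY,
    "ADDSP": Unit.ARITHMETIC,
    "SUBSP": Unit.ARITHMETIC,
    "SHR": Unit.SHIFTER,
    "SHL": Unit.SHIFTER,
    "LOAD": Unit.MEMORY,
    "SAVE": Unit.MEMORY,
}


def unit_of(opcode):
    """Return the unit that executes `opcode`, or None for NOP and unknown codes."""
    return OPCODE_UNIT.get(opcode.upper())


print(unit_of("LOAD"), unit_of("NOP"))
```

A table of this kind is what a resource check would consult to decide whether two instructions in the same cycle compete for the same unit.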

Because the execution codes among the loop bodies are mutually independent, and the execution codes in each loop body have a logical precedence relationship, the execution codes with the logical precedence can be modularized as a whole, and further the software to be optimized can be modularized, and each loop body corresponds to one functional module.

It should be noted that one piece of software to be optimized may include several functional modules, and one functional module may include several execution instructions executed in series or in parallel; an execution instruction may include several lines of execution code.

Finally, it should be emphasized that "MULSP", "ADDSP", "SUBSP", etc. above are merely the codes of the instructions. In practical applications, the codes that each operation unit can execute include, but are not limited to, those listed above, and any code that the corresponding operation unit can execute can be used with the method according to the embodiment of the present invention.

S102: and for each functional module, sequentially expanding the functional modules according to the running sequence of the instructions contained in the functional modules to obtain an instruction column, and processing the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid.

Fig. 3 is a schematic diagram illustrating the splitting of functional modules in a software pipeline optimization method according to an embodiment of the present invention. As shown in Fig. 3, the left side of Fig. 3 is the execution code sequence of the software to be optimized; the loop bodies obtained by modularizing the software to be optimized comprise:

loop body 1, whose execution instruction codes are LOAD, NOP;

loop body 2, whose execution instruction codes are SHR, MULSP, NOP;

loop body 3, whose execution instruction codes are NOP, MULSP, NOP;

loop body 4, whose execution instruction codes are NOP, ADDSP, NOP;

loop body 5, whose execution instruction codes are SAVE, NOP.

Then, in order to compress the execution time, the first instruction in the instruction column of each functional module is aligned to the same row; as shown in Fig. 3, the first execution instructions of loop bodies 1-5 are aligned to the same row.

Fig. 4 is a schematic structural diagram of a two-dimensional instruction grid constructed in the software pipeline optimization method according to the embodiment of the present invention. As shown in Fig. 4, the first execution instructions of loop bodies 1 to 5 are aligned to the same row, and loop bodies 1 to 5 are arranged in sequence from left to right. The execution instructions in each functional module, i.e. each loop body, are arranged in sequence from top to bottom.
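The construction of the grid can be sketched in Python as follows. The five instruction columns are the loop bodies listed for Fig. 3; the function name `build_grid`, the choice to pad shorter columns with NOP, and the zero-based row and column indices (which need not match the numbering used in the figures) are assumptions made for this illustration.

```python
from itertools import zip_longest

# Instruction columns of loop bodies 1-5, in running order.
LOOP_BODIES = [
    ["LOAD", "NOP"],
    ["SHR", "MULSP", "NOP"],
    ["NOP", "MULSP", "NOP"],
    ["NOP", "ADDSP", "NOP"],
    ["SAVE", "NOP"],
]


def build_grid(columns, pad="NOP"):
    """Place the columns side by side with their first instructions on row 0.

    Shorter columns are padded at the bottom (an assumption of this sketch).
    """
    return [list(row) for row in zip_longest(*columns, fillvalue=pad)]


grid = build_grid(LOOP_BODIES)
for row in grid:
    print("  ".join(f"{op:<6}" for op in row))
```

Each row of the result then corresponds to one instruction cycle, and each column to one loop body, which is the layout that the row-wise and column-wise checks operate on.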

In practical applications, in order to distinguish and display different execution instructions, when the different execution instructions are sequentially expanded according to the operation sequence of the instructions included in the functional module, the different types of instructions may be displayed differently by using a visual element according to the types of the instructions included in the functional module, where the visual element includes: one or a combination of fonts, font colors, fill colors and special character marks.

As shown in Figs. 3-4, different types of execution instructions may be displayed using different codes; the label text of different types of execution instructions may be given different colors or different fonts; the grid cells corresponding to different types of execution instructions may be filled with different colors; or different special characters may be used to mark different types of execution instructions. The ways of displaying different types of execution instructions differently in the embodiment of the present invention include, but are not limited to, the above; any way of displaying different types of execution instructions differently may be used, and they are not listed one by one here.

It will be appreciated that execution instructions are typed by the unit that executes them: for example, the execution instructions executed by the multiplication unit may be one type of execution instruction, and the execution instructions executed by the shifter unit may be another type.
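Of the visual elements mentioned above, special character marks are the easiest to show in plain text. The following Python sketch prefixes each instruction with a mark chosen by instruction type; the particular marks in `MARKS` and the function name `render` are hypothetical, and fonts or colors would need a graphical front end that is not shown here.

```python
# Hypothetical marks: one character per instruction type (executing unit).
MARKS = {
    "MULSP": "*",
    "ADDSP": "+", "SUBSP": "+",
    "SHR": ">", "SHL": ">",
    "LOAD": "@", "SAVE": "@",
}


def render(grid):
    """Return the grid as text, prefixing every instruction with its type mark."""
    lines = []
    for row in grid:
        cells = [f"{MARKS.get(op, ' ')}{op:<5}" for op in row]
        lines.append(" ".join(cells))
    return "\n".join(lines)


out = render([["LOAD", "MULSP"], ["NOP", "ADDSP"]])
print(out)
```

With such marks, two instructions that share a mark in the same row are immediately visible as candidates for a resource conflict.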

S103: acquiring an adjusting command aiming at an instruction in a two-dimensional instruction grid, and judging whether the instruction in the two-dimensional instruction grid after being adjusted according to the adjusting command meets a preset rule constraint, wherein the rule constraint comprises the following steps: one or a combination of computational resources, register resources, and instruction delays; if not, S104 is executed.

Illustratively, an adjustment command input by a user for the instructions in the two-dimensional instruction grid is first received, for example: instruction A in the first row, the second column and the third column is adjusted to the position of the third row, the third column and the fourth column; or instruction C in the second row and the second column exchanges positions with the instruction in the third row and the fourth column. The adjustment command may be input as a character command, a mouse operation, a gesture operation, and the like; the embodiment of the invention limits neither the content of the adjustment command nor the way it is input.
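One way to represent such adjustment commands is sketched below in Python. The tuple format `(kind, source, target)`, the function name `apply_command`, the zero-based coordinates, and the rule that a "move" is only allowed onto a NOP cell and leaves a NOP behind are all assumptions of this sketch; the description deliberately leaves the command content open.

```python
import copy


def apply_command(grid, command):
    """Return a new grid with one adjustment command applied.

    command = (kind, (row1, col1), (row2, col2)), where kind is
    "swap" (exchange two cells) or "move" (relocate onto a NOP cell).
    """
    kind, (r1, c1), (r2, c2) = command
    new = copy.deepcopy(grid)
    if kind == "swap":
        new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    elif kind == "move":
        if new[r2][c2] != "NOP":
            raise ValueError("target cell is occupied")
        new[r2][c2], new[r1][c1] = new[r1][c1], "NOP"
    else:
        raise ValueError(f"unknown command: {kind}")
    return new


grid = [["LOAD", "SAVE"], ["NOP", "NOP"]]
moved = apply_command(grid, ("move", (0, 1), (1, 1)))
print(moved)
```

Returning a new grid instead of modifying the old one makes it straightforward to check the adjusted grid against the rule constraints and to discard the adjustment if the check fails.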

In a first aspect, it may be determined whether the instructions in each row of the two-dimensional instruction grid meet a preset rule constraint.

As shown in Fig. 4, since the store/fetch unit cannot perform two access operations simultaneously, the execution instruction LOAD in the first row and first column of Fig. 4 conflicts with the execution instruction SAVE in the first row and sixth column, so these two instructions do not meet the rule constraint on computational resources.

It can be understood that, when register resources or instruction delays are used as the rule constraint in judging whether the instructions in each row of the two-dimensional instruction grid meet the preset rule constraint, execution instructions that do not meet the rule constraint are adjusted in the same way as described for computational resources.
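The row-wise check on computational resources can be sketched in Python as follows. The description states that the store/fetch unit and the multiplication unit each handle one operation at a time; giving every unit a capacity of one, and the names `UNIT_OF`, `CAPACITY`, `row_conflicts`, and `check_rows`, are assumptions of this sketch. Row indices are zero-based.

```python
from collections import Counter

UNIT_OF = {
    "MULSP": "multiply",
    "ADDSP": "arithmetic", "SUBSP": "arithmetic",
    "SHR": "shifter", "SHL": "shifter",
    "LOAD": "memory", "SAVE": "memory",
}
# Assumed capacity: each unit can start one instruction per row (cycle).
CAPACITY = {"multiply": 1, "arithmetic": 1, "shifter": 1, "memory": 1}


def row_conflicts(row):
    """Return the units that one row asks for more often than they are available."""
    used = Counter(UNIT_OF[op] for op in row if op in UNIT_OF)
    return {unit: n for unit, n in used.items() if n > CAPACITY[unit]}


def check_rows(grid):
    """Map each offending row index to its oversubscribed units."""
    report = {}
    for r, row in enumerate(grid):
        conflicts = row_conflicts(row)
        if conflicts:
            report[r] = conflicts
    return report


grid = [
    ["LOAD", "SHR", "NOP", "NOP", "SAVE"],
    ["NOP", "MULSP", "MULSP", "ADDSP", "NOP"],
    ["NOP", "NOP", "NOP", "NOP", "NOP"],
]
print(check_rows(grid))
```

For the five loop bodies of the example, this reports the LOAD/SAVE conflict in the first row and the MULSP/MULSP conflict in the second row, which are the two conflicts the description resolves.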

In the second aspect, it may also be determined whether the instructions in each column of the two-dimensional instruction grid meet a preset rule constraint.

Specifically, the manner of determining whether the instructions in each column of the two-dimensional instruction grid conform to the preset rule constraint is the same as in the first aspect, except that the execution instructions in each column, rather than each row, are checked against the preset rule constraint.
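The description does not spell out what a column-wise check looks like. One plausible reading, sketched below in Python, applies the instruction-delay constraint within a column: because the instructions of one loop body run serially, each instruction is assumed to depend on the previous non-NOP instruction in the same column and must start no earlier than that instruction's latency allows. The latency values in `LATENCY`, the dependence assumption, and the function name `column_violations` are all hypothetical.

```python
# Hypothetical latencies in rows (cycles); any other instruction defaults to 1.
LATENCY = {"LOAD": 2, "MULSP": 2}


def column_violations(column):
    """Return (producer_row, consumer_row) pairs that are scheduled too close.

    Assumes each non-NOP instruction consumes the result of the previous
    non-NOP instruction in the same column.
    """
    violations = []
    prev_row, prev_op = None, None
    for row, op in enumerate(column):
        if op == "NOP":
            continue
        if prev_op is not None and row - prev_row < LATENCY.get(prev_op, 1):
            violations.append((prev_row, row))
        prev_row, prev_op = row, op
    return violations


print(column_violations(["MULSP", "ADDSP"]))
print(column_violations(["MULSP", "NOP", "ADDSP"]))
```

Under these assumptions, a NOP between a producer and its consumer acts as a delay slot, so the second column above passes while the first does not.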

S104: and adjusting the positions of the instructions which do not accord with the rule constraint, and returning to execute the step of judging whether the instructions in the two-dimensional instruction grid accord with the preset rule constraint or not until the instructions in each row accord with the rule constraint.

Exemplarily, corresponding to the first aspect in step S103, Fig. 5 is a schematic diagram illustrating adjustment of instruction positions in a software pipeline optimization method according to an embodiment of the present invention. As shown in Fig. 5, the execution instruction SAVE in the first row and sixth column of the two-dimensional instruction grid exchanges positions with the execution instruction in the next grid position.

It is emphasized that the positions can be exchanged only if there is no logical precedence between the SAVE instruction and the NOP instruction that follows it; if there is a logical precedence between them, the execution instructions of the sixth column are shifted down by one grid position as a whole.

Similarly, since the multiplication unit can only perform one multiplication operation at a time, the MULSP in the second row and second column of the two-dimensional instruction grid conflicts with the MULSP in the second row and third column.

As shown in Fig. 5, the execution instruction MULSP in the second row and third column of the two-dimensional instruction grid exchanges positions with the execution instruction in the next grid position.

Specifically, the position of an instruction that does not comply with the rule constraint may be adjusted to the next line or the previous line.
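The two downward adjustments described above (exchanging an instruction with the next grid position, or shifting the whole column down by one) can be sketched in Python on a single column. The function name `push_down` and the boolean `depends_on_next`, which stands in for a real analysis of logical precedence, are hypothetical; falling back to a whole-column shift when the next cell is not a NOP is also a choice made only for this sketch.

```python
def push_down(column, row, depends_on_next=False):
    """Return a new column in which the instruction at `row` runs one row later.

    If the next cell is a NOP and there is no logical precedence between the
    two, the cells are exchanged; otherwise the whole column is shifted down.
    """
    col = list(column)
    nxt = row + 1
    if nxt < len(col) and col[nxt] == "NOP" and not depends_on_next:
        col[row], col[nxt] = col[nxt], col[row]   # exchange with the next grid position
    else:
        col.insert(0, "NOP")                      # shift the entire column down by one
    return col


print(push_down(["SAVE", "NOP"], 0))
print(push_down(["SAVE", "NOP"], 0, depends_on_next=True))
print(push_down(["NOP", "MULSP", "NOP"], 1))
```

The exchange keeps the column length unchanged, whereas the whole-column shift lengthens it by one row, which is why the exchange is preferred whenever it is permitted.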

Similarly, when the executed instructions in each column do not meet the preset rule constraint, the executed instructions in conflict may be adjusted from one column to another column, corresponding to the second aspect in step S103.

It should be emphasized that the adjustment when the instructions in each column of the two-dimensional instruction grid do not meet the preset rule constraint is the same as the adjustment when the instructions in each row of the two-dimensional instruction grid do not meet the preset rule constraint, and the difference is only that the adjustment between columns is performed.

After the execution instructions have been adjusted, it is necessary to determine whether the execution instructions in each row and/or each column of the adjusted two-dimensional instruction grid meet the preset rule constraint, that is, to return to step S103.

Specifically, the positions of the instructions that do not meet the rule constraint may be adjusted to the left column or the right column.
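The judge-and-adjust cycle of steps S103 and S104 can be sketched as a loop that repeats until no row violates the constraint. The Python sketch below checks only computational resources with an assumed capacity of one instruction per unit per row, and it resolves each conflict by inserting a NOP above the offending instruction so that it and the rest of its column run one row later. This greedy strategy, the names `first_conflict` and `resolve`, and the `max_rounds` safety limit are assumptions of the sketch, not the procedure claimed in the patent.

```python
from itertools import zip_longest

UNIT_OF = {
    "MULSP": "multiply",
    "ADDSP": "arithmetic", "SUBSP": "arithmetic",
    "SHR": "shifter", "SHL": "shifter",
    "LOAD": "memory", "SAVE": "memory",
}


def first_conflict(columns):
    """Return (row, col) of the first instruction whose unit is already taken in its row."""
    for r, row in enumerate(zip_longest(*columns, fillvalue="NOP")):
        seen = set()
        for c, op in enumerate(row):
            unit = UNIT_OF.get(op)
            if unit is None:
                continue
            if unit in seen:
                return r, c
            seen.add(unit)
    return None


def resolve(columns, max_rounds=100):
    """Repeat judge (S103) and adjust (S104) until every row is conflict-free."""
    cols = [list(col) for col in columns]
    for _ in range(max_rounds):
        hit = first_conflict(cols)
        if hit is None:
            return cols
        r, c = hit
        cols[c].insert(r, "NOP")  # delay this instruction and those below it by one row
    raise RuntimeError("no conflict-free arrangement found within max_rounds")


loop_bodies = [
    ["LOAD", "NOP"],
    ["SHR", "MULSP", "NOP"],
    ["NOP", "MULSP", "NOP"],
    ["NOP", "ADDSP", "NOP"],
    ["SAVE", "NOP"],
]
result = resolve(loop_bodies)
for col in result:
    print(col)
```

For the five loop bodies of the example, two rounds of adjustment suffice: SAVE moves from the first row to the second, and the second MULSP moves from the second row to the third.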

By applying the embodiment shown in fig. 1 of the invention, the two-dimensional instruction grid is constructed by using the instruction columns corresponding to the functional modules and the rule constraint, and then the instructions are adjusted in the two-dimensional instruction grid, so that compared with the manual adjustment and judgment in the prior art, the software pipeline optimization efficiency can be improved.

In addition, existing programming software generally lacks programmer-oriented aids for software pipelining, which greatly increases the difficulty of optimizing software pipelines manually.

In practical applications, an operation command for adjusting the instructions which do not comply with the rule constraint can be received, and the operation command is executed to adjust the position of the instructions which do not comply with the rule constraint.

The manual adjusting instruction can be received in the instruction adjusting process, so that the flexibility and the simplicity of the adjusting process are improved.

Corresponding to the embodiment shown in fig. 1 of the present invention, the embodiment of the present invention further provides a software pipeline optimization apparatus.

Fig. 7 is a schematic structural diagram of a software pipeline optimization apparatus according to an embodiment of the present invention, and as shown in fig. 7, the apparatus includes:

a processing module 701, configured to perform modular processing on serial execution codes in software to be optimized according to a serial processing relationship between the execution codes to obtain a plurality of functional modules, where the execution codes include: a loop body;

an expansion module 702, configured to sequentially expand, for each function module, according to an operation sequence of instructions included in the function module to obtain an instruction column, and process the instruction columns corresponding to the function modules side by side to obtain a two-dimensional instruction grid;

the determining module 703 is configured to obtain an adjustment command for an instruction in a two-dimensional instruction grid, and determine whether the instruction in the two-dimensional instruction grid adjusted according to the adjustment command meets a preset rule constraint, where the rule constraint includes: one or a combination of computational resources, register resources, and instruction delays;

an adjusting module 704, configured to, if the determination result of the determining module 703 is negative, adjust the position of the instruction that does not meet the rule constraint, and return to the step of determining whether the instruction in the two-dimensional instruction grid meets the preset rule constraint until the instructions in each row all meet the rule constraint.

By applying the embodiment shown in Fig. 7 of the invention, the two-dimensional instruction grid is constructed from the instruction columns corresponding to the functional modules, and the instructions are then adjusted within the two-dimensional instruction grid under the rule constraints, so that the efficiency of software pipeline optimization can be improved compared with the manual adjustment and judgment of the prior art.

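In a specific implementation manner of the embodiment of the present invention, the expansion module 702 is configured to:

align the first instructions in the instruction columns contained in the respective functional modules to the same row.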

In a specific implementation manner of the embodiment of the present invention, the apparatus further includes: a display module to:

according to the types of the instructions contained in the functional modules, performing differential display on the instructions of the types by using visual elements, wherein the visual elements comprise: one or a combination of fonts, font colors, fill colors and special character marks.

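In a specific implementation manner of the embodiment of the present invention, the determining module 703 is configured to:

judge whether the instructions in each row of the two-dimensional instruction grid meet preset rule constraints or not.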

In a specific implementation manner of the embodiment of the present invention, the adjusting module 704 is configured to:

the position of the instruction that does not comply with the rule constraint is adjusted to the next or previous line.

The above description is intended to be illustrative of the preferred embodiment of the present invention and should not be taken as limiting the invention, but rather, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims

1. A method for optimizing software pipeline, the method comprising:

for each functional module, sequentially expanding the functional modules according to the running sequence of instructions contained in the functional modules to obtain instruction columns, and processing the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid; the parallel processing of the instruction sequences corresponding to the functional modules comprises: aligning the first instructions in the instruction columns contained in the functional modules to the same row;

acquiring an adjusting command aiming at an instruction in a two-dimensional instruction grid, and judging whether the instruction in the two-dimensional instruction grid adjusted according to the adjusting command meets a preset rule constraint, wherein the rule constraint comprises the following steps: one or a combination of computational resources, register resources, and instruction latency;

2. The method according to claim 1, wherein when the functional modules are sequentially expanded according to the running order of the instructions contained in the functional modules, the method further comprises:

3. The method according to claim 1, wherein the determining whether the instructions in the two-dimensional instruction grid conform to a preset rule constraint includes:

4. The method of claim 3, wherein the adjusting the position of the instruction that does not comply with the rule constraint comprises:

5. The method according to claim 1, wherein the determining whether the instructions in the two-dimensional instruction grid conform to a preset rule constraint includes:

6. The method of claim 1, wherein the adjusting the position of the instruction that does not comply with the rule constraint comprises:

7. The method of claim 1, wherein the adjusting the position of the instruction that does not comply with the rule constraint comprises:

8. An apparatus for software pipelining optimization, the apparatus comprising:

the processing module is used for modularizing the serial execution codes in the software to be optimized according to the serial processing relation among the execution codes to obtain a plurality of functional modules, wherein the execution codes comprise: a loop body;

the expansion module is used for sequentially expanding each function module according to the running sequence of the instructions contained in the function module to obtain an instruction column, and processing the instruction columns corresponding to the function modules side by side to obtain a two-dimensional instruction grid; the expansion module is configured to: aligning the first instructions in the instruction columns contained in the functional modules to the same row;

the judging module is used for acquiring an adjusting command aiming at the instructions in the two-dimensional instruction grid and judging whether the instructions in the two-dimensional instruction grid adjusted according to the adjusting command meet preset rule constraints or not, wherein the rule constraints comprise: one or a combination of computational resources, register resources, and instruction latency;

9. The software pipelining optimization apparatus of claim 8, wherein the apparatus further comprises: a display module for:

according to the types of the instructions contained in the functional modules, performing differential display on the instructions of each type by using visual elements, wherein the visual elements comprise: font, font color, fill color, special character mark.

10. The software pipelining optimization apparatus of claim 8, wherein the determining module is configured to:

11. The software pipelining optimization apparatus of claim 10, wherein the adjusting module is configured to:

12. The software pipelining optimization apparatus of claim 8, wherein the determining module is configured to:

13. The software pipeline optimization apparatus of claim 12, wherein the adjustment module is configured to:

14. The software pipeline optimization device of claim 8, wherein the adjustment module is configured to: