CN110134437B - Software flow optimization method and device - Google Patents

Info

Publication number
CN110134437B
Authority
CN
China
Prior art keywords
instruction
instructions
grid
adjusting
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910395467.3A
Other languages
Chinese (zh)
Other versions
CN110134437A (en)
Inventor
方志红
肖晶
郭怡冉
顾庆远
梁之勇
竺红伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 38 Research Institute
Original Assignee
CETC 38 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 38 Research Institute
Priority to CN201910395467.3A
Publication of CN110134437A
Application granted
Publication of CN110134437B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3885 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units
    • G06F9/3889 Concurrent instruction execution, e.g. pipeline, look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute

Abstract

The invention discloses a software pipeline optimization method and device. The method comprises the following steps: according to the serial processing relation among execution codes, modularizing the serially executed code in the software to be optimized to obtain a plurality of functional modules, wherein the execution codes comprise loop bodies; for each functional module, expanding the module in the running order of the instructions it contains to obtain an instruction column, and placing the instruction columns of the functional modules side by side to obtain a two-dimensional instruction grid; judging whether the instructions in the two-dimensional instruction grid meet preset rule constraints, wherein the rule constraints comprise one or a combination of computational resources, register resources, and instruction delays; if not, adjusting the position of each instruction that does not meet the rule constraints, and returning to the judging step until the instructions in every row meet the rule constraints. Applying the embodiment of the invention can improve the efficiency of software pipeline optimization.

Description

Software pipeline optimization method and device
Technical Field
The invention relates to a software optimization method and device, and in particular to a software pipeline optimization method and device.
Background
The DSP (Digital Signal Processor) was developed to handle the large number of mathematical operations that arise in digital signal processing. Digital signal processing can implement complex mathematical operations as a series of multiplications and additions. A defining feature of a DSP is its ability to perform at least one multiplication or multiply-accumulate operation in a single clock cycle. To meet the performance demands of many applications, modern DSPs contain multiple execution units such as multipliers and adders. Making full use of these internal execution units is therefore the key to improving the performance of the whole application system.
Within a loop body in software, the statements depend strongly on one another in time, but the statements of successive loop bodies are independent of each other. The loop bodies apply the same statements to different data, so executing them in parallel can reduce the number of loop cycles severalfold. Software pipelining divides a loop body into several operation stages so that operations belonging to different loop bodies can execute in parallel. It is a technique for restructuring loops to exploit instruction-level parallelism: it speeds up a loop by executing several consecutive loop bodies in parallel. Software pipelining can make full use of the operation resources in the processor and effectively reduce instruction pipeline delay, and it is an important means of improving the execution efficiency of DSP programs. It works by overlapping the execution of different loop bodies: instructions from different loop bodies are interleaved and distributed to the execution units for parallel execution, while the instructions within a single loop body still execute serially. The dependences among instructions within a loop body are thereby preserved while parallelism increases. However, overlapping different loop bodies increases the demand on the processor's internal processing units and registers, which raises the processing pressure in each instruction cycle and places higher demands on assembly-level programming in pursuit of higher performance.
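The benefit of overlapping loop bodies can be illustrated with a small cycle-count sketch. It assumes, purely for illustration, that every stage takes one cycle, that a new loop body can start every `interval` cycles, and that no resource conflicts occur; the function names are hypothetical and not part of the invention.

```python
def serial_cycles(iterations: int, stages: int) -> int:
    """Cycles needed when each loop body finishes before the next one starts."""
    return iterations * stages


def pipelined_cycles(iterations: int, stages: int, interval: int = 1) -> int:
    """Cycles needed when a new loop body starts every `interval` cycles."""
    if iterations == 0:
        return 0
    # The first loop body needs all of its stages; each later one adds `interval`.
    return stages + (iterations - 1) * interval


print(serial_cycles(100, 3))     # 300
print(pipelined_cycles(100, 3))  # 102
```

When `interval` equals the number of stages, nothing overlaps and the pipelined count falls back to the serial count.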
Given the importance of software pipelining to DSP performance optimization, some DSP manufacturers provide corresponding software and hardware support. At present, the software support consists mainly of improvements to the compilation system. However, an improved compilation system still cannot guarantee that the resulting software pipeline is efficient, and sometimes it cannot produce a software pipeline at all. Many software pipelines therefore have to be adjusted manually afterwards, which makes existing software pipeline optimization inefficient.
Disclosure of Invention
The invention aims to provide a method and a device for optimizing software pipeline, so as to improve the efficiency of software pipeline optimization.
The invention solves the technical problems through the following technical scheme:
An embodiment of the invention provides a software pipeline optimization method, which comprises the following steps:
according to the serial processing relation among the execution codes, modularizing the serially executed code in the software to be optimized to obtain a plurality of functional modules, wherein the execution codes comprise loop bodies;
for each functional module, expanding the module in the running order of the instructions it contains to obtain an instruction column, and placing the instruction columns of the functional modules side by side to obtain a two-dimensional instruction grid;
acquiring an adjustment command for an instruction in the two-dimensional instruction grid, and judging whether the instructions in the grid, as adjusted according to the adjustment command, meet preset rule constraints, wherein the rule constraints comprise one or a combination of computational resources, register resources, and instruction delays;
if not, adjusting the position of each instruction that does not meet the rule constraints, and returning to the step of judging whether the instructions in the two-dimensional instruction grid meet the preset rule constraints, until the instructions in every row meet the rule constraints.
Optionally, placing the instruction columns of the functional modules side by side includes:
aligning the first instruction of the instruction column of each functional module to the same row.
Optionally, when the functional modules are expanded in the running order of the instructions they contain, the method further includes:
displaying instructions of different types differently by means of visual elements, according to the types of the instructions contained in the functional modules, wherein the visual elements comprise one or a combination of fonts, font colors, fill colors, and special character marks.
Optionally, judging whether the instructions in the two-dimensional instruction grid meet the preset rule constraints includes:
judging whether the instructions in each row of the two-dimensional instruction grid meet the preset rule constraints.
Optionally, adjusting the position of an instruction that does not meet the rule constraints includes:
moving the instruction that does not meet the rule constraints to the next row or the previous row.
Optionally, judging whether the instructions in the two-dimensional instruction grid meet the preset rule constraints includes:
judging whether the instructions in each column of the two-dimensional instruction grid meet the preset rule constraints.
Optionally, adjusting the position of an instruction that does not meet the rule constraints includes:
moving the instruction that does not meet the rule constraints to the column on its left or right.
Optionally, adjusting the position of an instruction that does not meet the rule constraints includes:
receiving an operation command for adjusting the instruction that does not meet the rule constraints, and executing the operation command to adjust the position of that instruction.
An embodiment of the invention provides a software pipeline optimization device, which comprises:
a processing module, configured to modularize the serially executed code in the software to be optimized according to the serial processing relation among the execution codes to obtain a plurality of functional modules, wherein the execution codes comprise loop bodies;
an expansion module, configured to expand each functional module in the running order of the instructions it contains to obtain an instruction column, and to place the instruction columns of the functional modules side by side to obtain a two-dimensional instruction grid;
a determining module, configured to judge whether the instructions in the two-dimensional instruction grid meet preset rule constraints, wherein the rule constraints comprise one or a combination of computational resources, register resources, and instruction delays;
an adjusting module, configured to adjust the position of each instruction that does not meet the rule constraints when the result of the determining module is negative, and to return to the step of judging whether the instructions in the two-dimensional instruction grid meet the preset rule constraints, until the instructions in every row meet the rule constraints.
Optionally, the expansion module is configured to:
align the first instruction of the instruction column of each functional module to the same row.
Optionally, the device further comprises a display module configured to:
display instructions of different types differently by means of visual elements, according to the types of the instructions contained in the functional modules, wherein the visual elements comprise one or a combination of fonts, font colors, fill colors, and special character marks.
Optionally, the determining module is configured to:
judge whether the instructions in each row of the two-dimensional instruction grid meet the preset rule constraints.
Optionally, the adjusting module is configured to:
move the instruction that does not meet the rule constraints to the next row or the previous row.
Optionally, the determining module is configured to:
judge whether the instructions in each column of the two-dimensional instruction grid meet the preset rule constraints.
Optionally, the adjusting module is configured to:
move the instruction that does not meet the rule constraints to the column on its left or right.
Optionally, the adjusting module is configured to:
receive an operation command for adjusting the instruction that does not meet the rule constraints, and execute the operation command to adjust the position of that instruction.
Compared with the prior art, the invention has the following advantages:
By applying the embodiment of the invention, a two-dimensional instruction grid is constructed from the instruction columns of the functional modules and checked against the rule constraints, and the instructions are then adjusted within the grid. Compared with the purely manual adjustment and judgment of the prior art, this can improve the efficiency of software pipeline optimization.
Drawings
Fig. 1 is a schematic flow chart of a software pipeline optimization method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a multi-core processor in the software pipeline optimization method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating the splitting of functional modules in the software pipeline optimization method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a two-dimensional instruction grid constructed in the software pipeline optimization method according to an embodiment of the present invention;
Fig. 5 is a schematic diagram illustrating the adjustment of instruction positions in the software pipeline optimization method according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the two-dimensional instruction grid after the positions of instructions have been adjusted in the software pipeline optimization method according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a software pipeline optimization device according to an embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and the specific operation procedures, but the scope of the present invention is not limited to the following examples.
The embodiment of the invention provides a software pipeline optimization method and device. The method is introduced first.
Fig. 1 is a schematic flow chart of a software pipeline optimization method according to an embodiment of the present invention, and as shown in fig. 1, the method includes:
S101: According to the serial processing relation among the execution codes, modularize the serially executed code in the software to be optimized to obtain a plurality of functional modules, wherein the execution codes comprise loop bodies.
The following description takes software to be optimized that contains several loop bodies as an example.
Fig. 2 is a schematic structural diagram of a multi-core processor in the software pipeline optimization method according to an embodiment of the present invention. As shown in Fig. 2, a processor xxx includes a multiplication unit, a shifter unit, an arithmetic unit, and a store/fetch (access) unit, where:
the instruction code that the multiplication unit can execute is MULSP;
the instruction codes that the arithmetic unit can execute are ADDSP and SUBSP;
the instruction codes that the shifter unit can execute are SHR and SHL;
the instruction codes that the store/fetch unit can execute are LOAD and SAVE.
Because the execution codes of different loop bodies are mutually independent, while the execution codes within each loop body have a logical order, the logically ordered execution codes can be treated as a whole. The software to be optimized can thus be modularized, with each loop body corresponding to one functional module.
It should be noted that one piece of software to be optimized may include several functional modules, and one functional module may include several execution instructions executed in series or in parallel; an execution instruction may include several lines of execution code.
Finally, it should be emphasized that "MULSP", "ADDSP", "SUBSP", and the like are only names of instruction codes. In practical applications, the instruction codes that each operation unit can execute include, but are not limited to, the names above; any code that the corresponding operation unit can execute can be used with the method of the embodiment of the present invention.
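The modularization of step S101 can be sketched as follows. The opcode-to-unit table follows the description of Fig. 2; the input format (a serial list of `(loop_id, opcode)` pairs) and the names `UNIT_OF` and `modularize` are assumptions made only for this illustration, since the text does not specify how the serial code is represented.

```python
# Operation unit that executes each instruction code, as described for Fig. 2.
UNIT_OF = {
    "MULSP": "multiply",
    "ADDSP": "arithmetic",
    "SUBSP": "arithmetic",
    "SHR": "shifter",
    "SHL": "shifter",
    "LOAD": "access",
    "SAVE": "access",
}


def modularize(serial_code):
    """Group serially listed instructions into one functional module per loop body.

    `serial_code` is a hypothetical list of (loop_id, opcode) pairs in execution
    order. The order of instructions inside each module is preserved.
    """
    modules = {}
    for loop_id, opcode in serial_code:
        modules.setdefault(loop_id, []).append(opcode)
    return list(modules.values())


serial_code = [
    (1, "LOAD"), (1, "NOP"),
    (2, "SHR"), (2, "MULSP"), (2, "NOP"),
]
print(modularize(serial_code))  # [['LOAD', 'NOP'], ['SHR', 'MULSP', 'NOP']]
```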
S102: For each functional module, expand the module in the running order of the instructions it contains to obtain an instruction column, and place the instruction columns of the functional modules side by side to obtain a two-dimensional instruction grid.
Fig. 3 is a schematic diagram illustrating the splitting of functional modules in the software pipeline optimization method according to an embodiment of the present invention. As shown in Fig. 3, the left side of Fig. 3 is the execution code sequence of the software to be optimized. The loop bodies obtained by modularizing the software to be optimized are:
loop body 1, whose execution instructions are LOAD, NOP;
loop body 2, whose execution instructions are SHR, MULSP, NOP;
loop body 3, whose execution instructions are NOP, MULSP, NOP;
loop body 4, whose execution instructions are NOP, ADDSP, NOP;
loop body 5, whose execution instructions are SAVE, NOP.
Then, to compress the execution time, the first instruction of the instruction column of each functional module is aligned to the same row; as shown in Fig. 3, the first execution instructions of loop bodies 1-5 are aligned to the same row.
Fig. 4 is a schematic structural diagram of the two-dimensional instruction grid constructed in the software pipeline optimization method according to an embodiment of the present invention. In Fig. 4, the first execution instructions of loop bodies 1-5 are aligned to the same row, and loop bodies 1-5 are arranged from left to right. The execution instructions within each functional module, i.e. each loop body, are arranged from top to bottom in order.
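A minimal sketch of this grid construction, using the five loop bodies listed for Fig. 3. Representing the grid as a list of rows and padding shorter columns with `None` are assumptions of this sketch; the function name `build_grid` is hypothetical.

```python
from itertools import zip_longest


def build_grid(columns, fill=None):
    """Place instruction columns side by side with their first instructions on row 0.

    Shorter columns are padded at the bottom with `fill`.
    """
    return [list(row) for row in zip_longest(*columns, fillvalue=fill)]


loop_bodies = [
    ["LOAD", "NOP"],
    ["SHR", "MULSP", "NOP"],
    ["NOP", "MULSP", "NOP"],
    ["NOP", "ADDSP", "NOP"],
    ["SAVE", "NOP"],
]
grid = build_grid(loop_bodies)
for row in grid:
    print(row)
```

Each row of the result then holds the instructions that would issue in the same cycle, and each column holds one loop body in execution order.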
In practical applications, to make different execution instructions distinguishable, instructions of different types may be displayed differently by means of visual elements when the functional modules are expanded in the running order of their instructions. The display depends on the types of the instructions contained in the functional modules, and the visual elements comprise one or a combination of fonts, font colors, fill colors, and special character marks.
As shown in Figs. 3-4, different types of execution instructions may be displayed using different codes; the label text of different types of execution instructions may be given different colors or different fonts; the grid cells of different types of execution instructions may be filled with different colors; or different special characters may be used to mark different types of execution instructions. The ways of differentiating the display of different types of execution instructions in the embodiment of the present invention include, but are not limited to, those above; any way of displaying different types of execution instructions differently may be used, and they are not listed exhaustively here.
It will be appreciated that the type of an execution instruction is determined by the operation unit that executes it: execution instructions executed by the multiplication unit may form one type, and execution instructions executed by the shifter unit may form another type.
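As one illustration of the "special character mark" option, the sketch below prefixes each instruction with a character chosen by the operation unit that executes it. The specific marks, the cell width, and the names `MARK`, `render_cell`, and `render` are hypothetical choices for this sketch, not something the text prescribes.

```python
UNIT_OF = {
    "MULSP": "multiply", "ADDSP": "arithmetic", "SUBSP": "arithmetic",
    "SHR": "shifter", "SHL": "shifter", "LOAD": "access", "SAVE": "access",
}
# Hypothetical special-character mark for each instruction type.
MARK = {"multiply": "*", "arithmetic": "+", "shifter": ">", "access": "@"}


def render_cell(opcode):
    """Return the text for one grid cell; NOPs and empty cells get no mark."""
    if opcode is None:
        return ""
    return MARK.get(UNIT_OF.get(opcode), "") + opcode


def render(grid, width=8):
    """Render the grid as fixed-width text, one line per grid row."""
    lines = []
    for row in grid:
        line = "".join(render_cell(op).ljust(width) for op in row)
        lines.append(line.rstrip())
    return "\n".join(lines)


print(render([["LOAD", "SHR"], ["NOP", "MULSP"]]))
```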
S103: Acquire an adjustment command for an instruction in the two-dimensional instruction grid, and judge whether the instructions in the grid, as adjusted according to the adjustment command, meet preset rule constraints, wherein the rule constraints comprise one or a combination of computational resources, register resources, and instruction delays; if not, execute S104.
Illustratively, an adjustment command entered by the user for the instructions in the two-dimensional instruction grid is received first. For example, instruction A at the first row, second and third columns is moved to the third row, third and fourth columns; or instruction C in the second row, second column exchanges positions with the instruction in the third row, fourth column. The adjustment command may be entered as a character command, a mouse operation, a gesture operation, and so on; the embodiment of the invention limits neither the content of the adjustment command nor the way it is entered.
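A character command of this kind could be handled as in the sketch below. The command syntax (`move r1 c1 r2 c2` and `swap r1 c1 r2 c2`, with 1-based rows and columns), the rule that a moved instruction leaves a NOP behind, and the function name `apply_command` are all assumptions of this sketch; the text leaves the content and form of the command open.

```python
def apply_command(grid, command):
    """Return a new grid after applying 'move r1 c1 r2 c2' or 'swap r1 c1 r2 c2'.

    Rows and columns in the command are 1-based. The input grid is not modified.
    """
    verb, *numbers = command.split()
    r1, c1, r2, c2 = (int(n) - 1 for n in numbers)
    new = [row[:] for row in grid]
    if verb == "swap":
        new[r1][c1], new[r2][c2] = new[r2][c2], new[r1][c1]
    elif verb == "move":
        if new[r2][c2] not in (None, "NOP"):
            raise ValueError("target cell is occupied")
        # Assumption: the vacated cell is filled with a NOP.
        new[r2][c2], new[r1][c1] = new[r1][c1], "NOP"
    else:
        raise ValueError(f"unknown command: {verb}")
    return new


grid = [["A", "NOP"], ["NOP", "C"]]
print(apply_command(grid, "swap 1 1 2 2"))  # [['C', 'NOP'], ['NOP', 'A']]
print(apply_command(grid, "move 1 1 1 2"))  # [['NOP', 'A'], ['NOP', 'C']]
```

Returning a new grid rather than editing in place lets the adjusted grid be checked against the rule constraints before it replaces the current one.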
In a first aspect, it may be determined whether the instructions in each row of the two-dimensional instruction grid meet a preset rule constraint.
As shown in Fig. 4, because the access unit cannot perform more than one access operation at a time, the execution instruction LOAD in the first row, first column of Fig. 4 conflicts with the execution instruction SAVE in the first row, sixth column.
It can be understood that when register resources or instruction delays are used as the rule constraints for judging whether the instructions in each row of the two-dimensional instruction grid conform, execution instructions that do not meet the rule constraints are adjusted in the same manner as described above.
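The row check for the computational-resource constraint can be sketched as follows. The capacity of one operation per unit per row follows the examples in the text (one access, one multiplication at a time); treating every unit this way, and the names `CAPACITY`, `row_conflicts`, and `nonconforming_rows`, are assumptions of this sketch.

```python
from collections import Counter

UNIT_OF = {
    "MULSP": "multiply", "ADDSP": "arithmetic", "SUBSP": "arithmetic",
    "SHR": "shifter", "SHL": "shifter", "LOAD": "access", "SAVE": "access",
}
# Assumption: each operation unit can start one instruction per row (cycle).
CAPACITY = {"multiply": 1, "arithmetic": 1, "shifter": 1, "access": 1}


def row_conflicts(row):
    """Return the sorted list of units that the row uses more often than allowed."""
    used = Counter(UNIT_OF[op] for op in row if op in UNIT_OF)
    return sorted(unit for unit, n in used.items() if n > CAPACITY[unit])


def nonconforming_rows(grid):
    """Return the indices of rows that violate the computational-resource rule."""
    return [i for i, row in enumerate(grid) if row_conflicts(row)]


grid = [
    ["LOAD", "SHR", "NOP", "NOP", "SAVE"],
    ["NOP", "MULSP", "MULSP", "ADDSP", "NOP"],
    [None, "NOP", "NOP", "NOP", None],
]
print(row_conflicts(grid[0]))    # ['access']
print(nonconforming_rows(grid))  # [0, 1]
```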
In the second aspect, it may also be determined whether the instructions in each column of the two-dimensional instruction grid meet a preset rule constraint.
Specifically, the manner of judging whether the instructions in each column of the two-dimensional instruction grid conform to the preset rule constraints is the same as in the first aspect, except that the execution instructions in each column, rather than in each row, are checked.
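The text does not spell out what a column check looks like. One possible reading, sketched below, applies the instruction-delay constraint within a column: each real instruction is assumed to depend on the previous real instruction of the same loop body and must be placed at least that instruction's latency later. The latency values, the dependence assumption, and the names `LATENCY` and `column_violations` are all hypothetical.

```python
# Hypothetical latencies in rows (cycles); instructions not listed take 1.
LATENCY = {"LOAD": 2, "MULSP": 2}


def column_violations(column):
    """Return (producer_row, consumer_row) pairs that are placed too close together.

    Assumption: every real instruction depends on the previous real (non-NOP)
    instruction in the same column.
    """
    violations = []
    previous = None  # (row, opcode) of the last real instruction seen
    for row, op in enumerate(column):
        if op in (None, "NOP"):
            continue
        if previous is not None and row - previous[0] < LATENCY.get(previous[1], 1):
            violations.append((previous[0], row))
        previous = (row, op)
    return violations


print(column_violations(["SHR", "MULSP", "NOP"]))   # []
print(column_violations(["LOAD", "ADDSP"]))         # [(0, 1)]
print(column_violations(["LOAD", "NOP", "ADDSP"]))  # []
```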
S104: Adjust the positions of the instructions that do not meet the rule constraints, and return to the step of judging whether the instructions in the two-dimensional instruction grid meet the preset rule constraints, until the instructions in every row meet the rule constraints.
Exemplarily, corresponding to the first aspect in step S103, Fig. 5 is a schematic diagram illustrating the adjustment of instruction positions in the software pipeline optimization method according to an embodiment of the present invention. As shown in Fig. 5, the execution instruction SAVE in the first row, sixth column of the two-dimensional instruction grid is swapped with the execution instruction in the next grid cell.
It is emphasized that the positions can be swapped only if there is no logical order between the SAVE instruction and the NOP instruction that follows it; if the SAVE instruction and the NOP instruction that follows it do have a logical order, the execution instructions of the sixth column are shifted down by one grid cell as a whole.
Similarly, because the multiplication unit can perform only one multiplication at a time, the MULSP in the second row, second column of the two-dimensional instruction grid conflicts with the MULSP in the second row, third column. As shown in Fig. 5, the execution instruction MULSP in the second row, third column of the two-dimensional instruction grid is swapped with the execution instruction in the next grid cell.
Specifically, an instruction that does not meet the rule constraints may be moved to the next row or the previous row.
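The two ways of moving an instruction down one row, swapping it with the next cell or shifting the whole column, can be sketched as below. The flag `independent_of_next` stands in for the "no logical order" test, which the text does not define; it and the function name `move_down` are assumptions of this sketch.

```python
def move_down(grid, row, col, independent_of_next=True):
    """Return a new grid in which the instruction at (row, col) sits one row lower.

    If the instruction has no logical order with the next one, the two are
    swapped; otherwise a NOP is inserted above it so that it and everything
    below it in the column shift down together.
    """
    column = [r[col] for r in grid]
    nxt = row + 1
    if independent_of_next and nxt < len(column) and column[nxt] is not None:
        column[row], column[nxt] = column[nxt], column[row]
    else:
        column.insert(row, "NOP")
        if column[-1] is None:
            column.pop()  # reuse an empty cell at the bottom of the column
    new = [r[:] for r in grid]
    while len(new) < len(column):
        new.append([None] * len(grid[0]))  # grow the grid when the column got longer
    for i, op in enumerate(column):
        new[i][col] = op
    return new


grid = [
    ["LOAD", "SHR", "NOP", "NOP", "SAVE"],
    ["NOP", "MULSP", "MULSP", "ADDSP", "NOP"],
    [None, "NOP", "NOP", "NOP", None],
]
swapped = move_down(grid, 0, 4)  # SAVE trades places with the NOP below it
shifted = move_down(grid, 0, 4, independent_of_next=False)  # whole column moves down
print([r[4] for r in swapped])   # ['NOP', 'SAVE', None]
print([r[4] for r in shifted])   # ['NOP', 'SAVE', 'NOP']
```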
Similarly, corresponding to the second aspect in step S103, when the execution instructions in a column do not meet the preset rule constraints, the conflicting execution instruction may be moved from one column to another.
Specifically, an instruction that does not meet the rule constraints may be moved to the column on its left or right.
It should be emphasized that the adjustment made when the instructions in a column of the two-dimensional instruction grid do not meet the preset rule constraints is the same as the adjustment made when the instructions in a row do not meet them; the only difference is that the adjustment takes place between columns.
After an execution instruction has been adjusted, it is necessary to judge whether the execution instructions in each row and/or each column of the new two-dimensional instruction grid meet the preset rule constraints, that is, to execute step S103 again.
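The repeated judge-and-adjust cycle of steps S103 and S104 can be sketched as a simple loop. The text leaves the choice of adjustment to the user or the tool; the greedy policy here (push the rightmost conflicting instruction down by shifting its column), the capacity of one instruction per unit per row, and the names `oversubscribed`, `push_down`, and `resolve` are assumptions of this sketch rather than the method claimed.

```python
from collections import Counter

UNIT_OF = {
    "MULSP": "multiply", "ADDSP": "arithmetic", "SUBSP": "arithmetic",
    "SHR": "shifter", "SHL": "shifter", "LOAD": "access", "SAVE": "access",
}
CAPACITY = 1  # assumption: one instruction per operation unit per row


def oversubscribed(row):
    """Return the set of units that the row uses more than CAPACITY times."""
    used = Counter(UNIT_OF[op] for op in row if op in UNIT_OF)
    return {unit for unit, n in used.items() if n > CAPACITY}


def push_down(grid, row, col):
    """Insert a NOP at (row, col) in place, shifting that column down one cell."""
    column = [r[col] for r in grid]
    column.insert(row, "NOP")
    if column[-1] is None:
        column.pop()
    else:
        grid.append([None] * len(grid[0]))
    for i, op in enumerate(column):
        grid[i][col] = op


def resolve(grid):
    """Judge each row and adjust until every row meets the resource rule."""
    grid = [r[:] for r in grid]
    row = 0
    while row < len(grid):
        units = oversubscribed(grid[row])
        if not units:
            row += 1  # this row conforms; judge the next one
            continue
        # Adjust: move the rightmost conflicting instruction down, then re-judge.
        col = max(c for c, op in enumerate(grid[row]) if UNIT_OF.get(op) in units)
        push_down(grid, row, col)
    return grid


grid = [
    ["LOAD", "SHR", "NOP", "NOP", "SAVE"],
    ["NOP", "MULSP", "MULSP", "ADDSP", "NOP"],
    [None, "NOP", "NOP", "NOP", None],
]
for row in resolve(grid):
    print(row)
```

Shifting a whole column keeps the order of instructions inside each loop body unchanged, at the cost of a possibly longer schedule than a hand-tuned swap would give.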
By applying the embodiment shown in fig. 1 of the invention, the two-dimensional instruction grid is constructed by using the instruction columns corresponding to the functional modules and the rule constraint, and then the instructions are adjusted in the two-dimensional instruction grid, so that compared with the manual adjustment and judgment in the prior art, the software pipeline optimization efficiency can be improved.
In addition, existing programming software generally lacks programmer-oriented aids for software pipelining, which greatly increases the difficulty of optimizing a software pipeline by hand.
In practical applications, an operation command for adjusting the instructions which do not comply with the rule constraint can be received, and the operation command is executed to adjust the position of the instructions which do not comply with the rule constraint.
Because manual adjustment commands can be received during the adjustment process, the adjustment process becomes more flexible and simpler.
Corresponding to the embodiment shown in fig. 1 of the present invention, the embodiment of the present invention further provides a software pipeline optimization apparatus.
Fig. 7 is a schematic structural diagram of a software pipeline optimization apparatus according to an embodiment of the present invention, and as shown in fig. 7, the apparatus includes:
a processing module 701, configured to modularize the serially executed code in the software to be optimized according to the serial processing relation among the execution codes to obtain a plurality of functional modules, wherein the execution codes comprise loop bodies;
an expansion module 702, configured to expand each functional module in the running order of the instructions it contains to obtain an instruction column, and to place the instruction columns of the functional modules side by side to obtain a two-dimensional instruction grid;
a determining module 703, configured to acquire an adjustment command for an instruction in the two-dimensional instruction grid, and to judge whether the instructions in the grid, as adjusted according to the adjustment command, meet preset rule constraints, wherein the rule constraints comprise one or a combination of computational resources, register resources, and instruction delays;
an adjusting module 704, configured to adjust the position of each instruction that does not meet the rule constraints when the result of the determining module 703 is negative, and to return to the step of judging whether the instructions in the two-dimensional instruction grid meet the preset rule constraints, until the instructions in every row meet the rule constraints.
By applying the embodiment shown in Fig. 7 of the invention, a two-dimensional instruction grid is constructed from the instruction columns of the functional modules and checked against the rule constraints, and the instructions are then adjusted within the grid. Compared with the purely manual adjustment and judgment of the prior art, this can improve the efficiency of software pipeline optimization.
In a specific implementation manner of the embodiment of the present invention, the unfolding module 702 is configured to:
the first instructions in the instruction columns contained in the respective functional modules are aligned to the same row.
In a specific implementation of the embodiment of the present invention, the apparatus further includes a display module, configured to:
display the instructions of each type distinctly by means of visual elements, according to the types of the instructions contained in the functional modules, where the visual elements include one or a combination of fonts, font colors, fill colors, and special character marks.
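As a rough illustration of type-based display, the sketch below renders a grid as plain text and uses a special character mark per instruction type. The type names, the marks in `MARKS`, and the `(name, type)` instruction model are hypothetical; the text above only states that types are distinguished by visual elements.

```python
# Hypothetical instruction types and their special character marks.
MARKS = {"mem": "@", "alu": "+", "mul": "*"}

def render(grid, width=10):
    """Return the grid as text, one row per line. Each instruction is
    prefixed with the mark of its type; empty slots are shown as '.'."""
    lines = []
    for row in grid:
        cells = []
        for ins in row:
            text = "." if ins is None else MARKS[ins[1]] + ins[0]
            cells.append(text.ljust(width))
        lines.append("".join(cells).rstrip())
    return "\n".join(lines)
```

A graphical tool would more likely map each type to a font color or fill color; character marks are used here only because they survive in plain text.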
In a specific implementation of the embodiment of the present invention, the determining module 703 is configured to:
determine whether the instructions in each row of the two-dimensional instruction grid satisfy the preset rule constraint.
In a specific implementation of the embodiment of the present invention, the adjusting module 704 is configured to:
move the instruction that does not satisfy the rule constraint to the next row or the previous row.
In a specific implementation of the embodiment of the present invention, the determining module 703 is configured to:
determine whether the instructions in each row of the two-dimensional instruction grid satisfy the preset rule constraint.
In a specific implementation of the embodiment of the present invention, the adjusting module 704 is configured to:
move the instruction that does not satisfy the rule constraint to the column on its left or the column on its right.
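The two kinds of position adjustment described in these implementations (to the next or previous row, and to the left or right column) can be sketched with a single helper. This is an assumed model, not the patented implementation: the grid is a list of rows, empty slots are `None`, an instruction may only move into an empty slot, rows are appended on demand, and the number of columns is treated as fixed.

```python
def move_instruction(grid, row, col, d_row=0, d_col=0):
    """Move the instruction at (row, col) by one step into an empty slot.

    d_row=+1/-1 selects the next/previous row; d_col=+1/-1 selects the
    right/left column. Assumption: the column count never changes.
    """
    if grid[row][col] is None:
        raise ValueError("no instruction at the source position")
    new_row, new_col = row + d_row, col + d_col
    if new_row < 0 or not 0 <= new_col < len(grid[0]):
        raise ValueError("target position is outside the grid")
    while new_row >= len(grid):
        grid.append([None] * len(grid[0]))  # extend the grid downwards
    if grid[new_row][new_col] is not None:
        raise ValueError("target slot is occupied")
    grid[new_row][new_col], grid[row][col] = grid[row][col], None
    return grid
```

Refusing to overwrite an occupied slot is a design choice of this sketch; an alternative would be to shift the occupants of the target column as well.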
In a specific implementation of the embodiment of the present invention, the adjusting module 704 is configured to:
receive an operation command for adjusting the instruction that does not satisfy the rule constraint, and execute the operation command to adjust the position of that instruction.
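A command-driven adjustment followed by a rule check might look like the sketch below. The command syntax (`"<name> <direction>"`), the latency table `LATENCY`, and the dependency map `DEPENDS_ON` are all hypothetical. The sketch also simply rejects a command whose result violates the latency constraint, which is a simplification: the embodiment instead describes adjusting the offending instruction again and re-checking.

```python
import copy

# Assumed latency model: a consumer must sit at least LATENCY[producer]
# rows below its producer. All names and numbers are illustrative only.
LATENCY = {"load": 2, "mul": 3}
DEPENDS_ON = {"mul": "load", "store": "mul"}  # consumer -> producer
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def find(grid, name):
    """Return the (row, col) position of the named instruction."""
    for r, row in enumerate(grid):
        for c, ins in enumerate(row):
            if ins == name:
                return r, c
    raise KeyError(name)

def latency_ok(grid):
    """Check every producer/consumer pair against the assumed latencies."""
    for consumer, producer in DEPENDS_ON.items():
        gap = find(grid, consumer)[0] - find(grid, producer)[0]
        if gap < LATENCY[producer]:
            return False
    return True

def apply_command(grid, command):
    """Apply '<name> <direction>' to a copy of the grid. Return
    (new_grid, True) if the result is valid, else (grid, False)."""
    name, direction = command.split()
    d_row, d_col = MOVES[direction]
    r, c = find(grid, name)
    nr, nc = r + d_row, c + d_col
    if nr < 0 or not 0 <= nc < len(grid[0]):
        return grid, False
    trial = copy.deepcopy(grid)
    while nr >= len(trial):
        trial.append([None] * len(trial[0]))
    if trial[nr][nc] is not None:
        return grid, False
    trial[nr][nc], trial[r][c] = name, None
    return (trial, True) if latency_ok(trial) else (grid, False)
```

Working on a copy keeps the original grid intact when a command is rejected, so an interactive tool could report the violation and let the user try a different adjustment.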
The above describes only preferred embodiments of the present invention and is not intended to limit the invention; all modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be covered.

Claims (14)

1. A software pipelining optimization method, the method comprising:
modularizing serially executed code in software to be optimized, according to the serial processing relationships within the executed code, to obtain a plurality of functional modules, wherein the executed code comprises: a loop body;
for each functional module, expanding the functional module in the execution order of the instructions it contains to obtain an instruction column, and arranging the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid; wherein arranging the instruction columns corresponding to the functional modules side by side comprises: aligning the first instructions of the instruction columns of the functional modules to the same row;
obtaining an adjustment command for an instruction in the two-dimensional instruction grid, and determining whether the instructions in the two-dimensional instruction grid adjusted according to the adjustment command satisfy a preset rule constraint, wherein the rule constraint comprises: one or a combination of computational resources, register resources, and instruction latency;
if not, adjusting the position of the instruction that does not satisfy the rule constraint, and returning to the step of determining whether the instructions in the two-dimensional instruction grid satisfy the preset rule constraint, until the instructions in every row satisfy the rule constraint.
2. The method according to claim 1, wherein, when the functional modules are expanded in the execution order of the instructions they contain, the method further comprises:
displaying the instructions of each type distinctly by means of visual elements, according to the types of the instructions contained in the functional modules, wherein the visual elements comprise: one or a combination of fonts, font colors, fill colors, and special character marks.
3. The method according to claim 1, wherein determining whether the instructions in the two-dimensional instruction grid satisfy the preset rule constraint comprises:
determining whether the instructions in each row of the two-dimensional instruction grid satisfy the preset rule constraint.
4. The method according to claim 3, wherein adjusting the position of the instruction that does not satisfy the rule constraint comprises:
moving the instruction that does not satisfy the rule constraint to the next row or the previous row.
5. The method according to claim 1, wherein determining whether the instructions in the two-dimensional instruction grid satisfy the preset rule constraint comprises:
determining whether the instructions in each row of the two-dimensional instruction grid satisfy the preset rule constraint.
6. The method according to claim 1, wherein adjusting the position of the instruction that does not satisfy the rule constraint comprises:
moving the instruction that does not satisfy the rule constraint to the column on its left or the column on its right.
7. The method according to claim 1, wherein adjusting the position of the instruction that does not satisfy the rule constraint comprises:
receiving an operation command for adjusting the instruction that does not satisfy the rule constraint, and executing the operation command to adjust the position of that instruction.
8. A software pipelining optimization apparatus, the apparatus comprising:
a processing module, configured to modularize serially executed code in software to be optimized, according to the serial processing relationships within the executed code, to obtain a plurality of functional modules, wherein the executed code comprises: a loop body;
an expansion module, configured to, for each functional module, expand the functional module in the execution order of the instructions it contains to obtain an instruction column, and to arrange the instruction columns corresponding to the functional modules side by side to obtain a two-dimensional instruction grid; the expansion module being configured to: align the first instructions of the instruction columns of the functional modules to the same row;
a determining module, configured to obtain an adjustment command for an instruction in the two-dimensional instruction grid, and to determine whether the instructions in the two-dimensional instruction grid adjusted according to the adjustment command satisfy a preset rule constraint, wherein the rule constraint comprises: one or a combination of computational resources, register resources, and instruction latency;
an adjusting module, configured to, if the determination result of the determining module is negative, adjust the position of the instruction that does not satisfy the rule constraint, and return to the step of determining whether the instructions in the two-dimensional instruction grid satisfy the preset rule constraint, until the instructions in every row satisfy the rule constraint.
9. The software pipelining optimization apparatus of claim 8, wherein the apparatus further comprises a display module, configured to:
display the instructions of each type distinctly by means of visual elements, according to the types of the instructions contained in the functional modules, wherein the visual elements comprise: font, font color, fill color, and special character mark.
10. The software pipelining optimization apparatus of claim 8, wherein the determining module is configured to:
determine whether the instructions in each row of the two-dimensional instruction grid satisfy the preset rule constraint.
11. The software pipelining optimization apparatus of claim 10, wherein the adjusting module is configured to:
move the instruction that does not satisfy the rule constraint to the next row or the previous row.
12. The software pipelining optimization apparatus of claim 8, wherein the determining module is configured to:
determine whether the instructions in each row of the two-dimensional instruction grid satisfy the preset rule constraint.
13. The software pipelining optimization apparatus of claim 12, wherein the adjusting module is configured to:
move the instruction that does not satisfy the rule constraint to the column on its left or the column on its right.
14. The software pipelining optimization apparatus of claim 8, wherein the adjusting module is configured to:
receive an operation command for adjusting the instruction that does not satisfy the rule constraint, and execute the operation command to adjust the position of that instruction.
CN201910395467.3A 2019-05-13 2019-05-13 Software flow optimization method and device Active CN110134437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910395467.3A CN110134437B (en) 2019-05-13 2019-05-13 Software flow optimization method and device

Publications (2)

Publication Number Publication Date
CN110134437A (en) 2019-08-16
CN110134437B (en) 2022-12-16

Family

ID=67573630

Country Status (1)

Country Link
CN (1) CN110134437B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant