CN112527393A

CN112527393A - Instruction scheduling optimization device and method for master-slave fusion architecture processor

Info

Publication number: CN112527393A
Application number: CN201910879804.6A
Authority: CN
Inventors: 吴伟; 朱琪; 管茂林; 沈莉; 钱宏; 武文浩
Original assignee: Wuxi Jiangnan Computing Technology Institute
Current assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2021-03-19

Abstract

The invention discloses an instruction scheduling optimization device and method for a master-slave fusion architecture processor, based on the following modules: an instruction scheduling module, which receives code containing target machine information and an instruction sequence and schedules the received instruction sequence according to the instruction template provided by an instruction template selector; the instruction template selector, which receives the target machine information in the code, selects a master core instruction template or a slave core instruction template according to that information, and sends the selected template to the instruction scheduling module; the master core instruction template, which describes the instruction type of each master core instruction, the target information of the instruction, the pipeline on which the instruction can execute, and the instruction delay information; and the slave core instruction template, which is configured at the back end of the compiler. The invention further reduces the probability of pipeline blockage, optimizes the instruction scheduling process of the processor, and improves the accuracy and performance of instruction scheduling.

Description

Instruction scheduling optimization device and method for master-slave fusion architecture processor

Technical Field

The invention relates to an instruction scheduling optimization device and method for a master-slave fusion architecture processor, and belongs to the technical field of compiler optimization.

Background

Instruction scheduling is an important compiler optimization technique. On a RISC pipeline, all instructions share a consistent format and the same instruction cycle, so reordering the instruction sequence can reduce the number of pipeline blockages (stalls), lower the cache miss rate, improve data access locality, and hide memory access overhead, thereby improving program performance. Instruction scheduling relies on an instruction template: the instruction types, pipeline bindings, instruction delays and other information defined by the template guide the scheduler. The more complete the instruction template, the finer the instruction scheduling it can support.
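The effect of reordering on pipeline blockage can be illustrated with a minimal sketch. It assumes a hypothetical single-issue, in-order pipeline in which an instruction waits until all of its source registers are available; the `Instr` type, the register names and the delay values are illustrative assumptions and are not taken from the invention.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    name: str
    dest: str        # register written by the instruction
    srcs: tuple      # registers read by the instruction
    delay: int       # cycles until the result can be used (assumed value)


def count_stalls(seq):
    """Count stall cycles on a hypothetical single-issue, in-order pipeline."""
    ready = {}   # register -> first cycle at which its value is available
    cycle = 0    # next cycle in which an instruction could issue
    stalls = 0
    for ins in seq:
        # Issue no earlier than `cycle` and no earlier than every source is ready.
        earliest = max([ready.get(r, 0) for r in ins.srcs] + [cycle])
        stalls += earliest - cycle
        ready[ins.dest] = earliest + ins.delay
        cycle = earliest + 1
    return stalls


load_a = Instr("load a", "r1", (), 3)
add_a = Instr("add", "r2", ("r1",), 1)
load_b = Instr("load b", "r3", (), 3)
mul_b = Instr("mul", "r4", ("r3",), 1)

original = [load_a, add_a, load_b, mul_b]
reordered = [load_a, load_b, add_a, mul_b]

print(count_stalls(original))   # 4
print(count_stalls(reordered))  # 1
```

Moving the second load ahead of the first dependent instruction overlaps the two load delays, which is the kind of gain that depends on the delay values recorded in the instruction template.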

The many-core processor adopts a master-slave fusion architecture in which the master core and the slave core use different pipeline structures and different instruction sets, and master core and slave core instructions differ in instruction fetch, issue and execution. In terms of pipeline structure, both the master core and the slave core use a multi-pipeline design, but they differ in the number of pipelines and in their support for SIMD, floating-point, integer and special instructions. In terms of instruction set, the same instruction has different latencies on the master core and the slave core, and the two cores support different numbers and types of instructions.

The traditional organization, in which a processor has a single set of instruction templates, cannot fully reflect the differences between the master core and the slave core in pipeline structure, instruction set and so on, and cannot properly support their respective instruction scheduling mechanisms, which limits further improvement and optimization of instruction scheduling. The master core instruction sequence and the slave core instruction sequence therefore need to be handled separately during instruction scheduling.

Disclosure of Invention

The invention aims to provide an instruction scheduling optimization device and method for a master-slave fusion architecture processor that further reduce the probability of pipeline blockage, optimize the instruction scheduling process of the processor, and improve the accuracy and performance of instruction scheduling.

To achieve this purpose, the invention adopts the following technical scheme: an instruction scheduling optimization device for a master-slave fusion architecture processor, based on the following modules:

the instruction scheduling module, which receives code containing target machine information and an instruction sequence, does not distinguish whether the instruction sequence is executed on the master core or the slave core, and schedules the received instruction sequence according to the instruction template provided by the instruction template selector;

the instruction template selector, which receives the target machine information in the code, selects the master core instruction template or the slave core instruction template according to the target machine information, and sends the selected instruction template to the instruction scheduling module;

the master core instruction template, which describes the instruction type of each master core instruction, the parameter information of the instruction, the pipeline structure information of the instruction and the instruction delay information by means of an md-type configuration file;

the slave core instruction template, which describes the instruction type of each slave core instruction, the parameter information of the instruction, the pipeline structure information of the instruction and the instruction delay information by means of an md-type configuration file.
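The relationship between the selector and the two templates can be pictured as a simple lookup keyed on the target machine information. The sketch below is only an illustration: the dictionary layout of the code unit, the `"target"` field, the opcode names and the delay values are all assumptions, and the real templates are md-type configuration files whose format is not given here.

```python
# Hypothetical, heavily simplified templates: opcode -> delay in cycles.
MASTER_TEMPLATE = {"name": "host.md", "delays": {"ldw": 3, "fadd": 4}}
SLAVE_TEMPLATE = {"name": "slave.md", "delays": {"ldw": 4, "fadd": 6}}

_TEMPLATES = {"master": MASTER_TEMPLATE, "slave": SLAVE_TEMPLATE}


def select_template(code):
    """Return the template matching the target machine information in `code`.

    `code` is an assumed representation: a dict with a "target" field and an
    "instructions" list. Only "target" is inspected by the selector.
    """
    target = code["target"]
    try:
        return _TEMPLATES[target]
    except KeyError:
        raise ValueError(f"unknown target machine: {target!r}") from None


def lookup_delays(code):
    """What the scheduling module sees: only the template chosen by the selector."""
    template = select_template(code)
    return template["name"], [template["delays"][op] for op in code["instructions"]]


unit = {"target": "slave", "instructions": ["ldw", "fadd"]}
print(lookup_delays(unit))  # ('slave.md', [4, 6])
```

Keeping the choice inside the selector means the scheduling module itself never needs to know which core the instruction sequence will run on.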

The instruction scheduling optimization method for the master-slave fusion architecture processor based on the instruction scheduling optimization device comprises the following steps:

S1, instruction template separation: the instruction templates of the master core and the slave core are separated at the back end of the compiler, generating a master core instruction template file host.md and a slave core instruction template file slave.md;

S2, instruction template optimization and configuration: the instruction templates of the master core and the slave core are described accurately according to the architecture and instruction set information of the target machine, covering the instruction type, instruction delay, instruction parameters and pipeline structure information;

S3, instruction scheduling optimization: in the instruction scheduling stage, the compiler calls the instruction template selector to select the newly generated master core instruction template or slave core instruction template, and performs fine-grained scheduling optimization on the instructions according to the accurate description in the selected template.
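Steps S1 and S2 can be sketched as follows, under stated assumptions. The sketch supposes that the starting point is a mixed description in which each entry is tagged with its core; the field names, opcodes and values are hypothetical, and the tables are kept in memory under the names host.md and slave.md rather than written in the actual md file format, which is not specified here.

```python
REQUIRED_FIELDS = ("type", "params", "pipelines", "delay")

# Hypothetical mixed description in which every entry is tagged with its core.
COMBINED = [
    {"core": "host", "opcode": "fadd", "type": "float",
     "params": ("dst", "src1", "src2"), "pipelines": ("P0", "P1"), "delay": 4},
    {"core": "slave", "opcode": "fadd", "type": "float",
     "params": ("dst", "src1", "src2"), "pipelines": ("P0",), "delay": 6},
    {"core": "host", "opcode": "ldw", "type": "load",
     "params": ("dst", "addr"), "pipelines": ("P1",), "delay": 3},
    # Deliberately incomplete entry: the delay has not been configured yet.
    {"core": "slave", "opcode": "ldw", "type": "load",
     "params": ("dst", "addr"), "pipelines": ("P1",)},
]


def separate_templates(entries):
    """S1: split the mixed description into one table per core."""
    files = {"host.md": {}, "slave.md": {}}
    for entry in entries:
        fields = {k: v for k, v in entry.items() if k not in ("core", "opcode")}
        files[f"{entry['core']}.md"][entry["opcode"]] = fields
    return files


def check_template(table):
    """S2: return the opcodes whose description is incomplete or implausible."""
    bad = []
    for opcode, fields in table.items():
        missing = [f for f in REQUIRED_FIELDS if f not in fields]
        if missing or fields["delay"] < 1 or not fields["pipelines"]:
            bad.append(opcode)
    return bad


files = separate_templates(COMBINED)
print(sorted(files))                      # ['host.md', 'slave.md']
print(check_template(files["host.md"]))   # []
print(check_template(files["slave.md"]))  # ['ldw']
```

A completeness check of this kind is one possible way to catch a template entry that would otherwise leave the scheduler without a delay or pipeline binding in step S3.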

By applying the above technical scheme, the invention has the following advantages over the prior art:

The invention separates the master core and the slave core at the back end of the compiler, optimizes and configures the relevant parameters of the master core and slave core instructions (instruction delay and pipeline structure) separately, and generates a separate instruction template for each core, so that the master core and slave core instructions are described accurately. In the instruction scheduling stage, the master core and slave core instruction templates together support fine-grained, precise instruction scheduling on a complex multi-pipeline processor structure. This further reduces the probability of pipeline blockage, optimizes the instruction scheduling process of the processor, and improves the accuracy and performance of instruction scheduling.

Drawings

FIG. 1 is a schematic block diagram of an instruction scheduling optimization apparatus for a master-slave fusion architecture processor according to the present invention;

FIG. 2 is a flowchart of an instruction scheduling optimization method for a master-slave fusion architecture processor according to the present invention.

Detailed Description

Embodiment: an instruction scheduling optimization device for a master-slave fusion architecture processor is based on the following modules:

An instruction scheduling optimization method for a master-slave fusion architecture processor based on the instruction scheduling optimization device comprises the following steps:

The embodiment is further explained below:

The schematic block diagram of the invention is shown in FIG. 1 and comprises four parts: the instruction scheduling module, the instruction template selector, the master core instruction template and the slave core instruction template.

1. The instruction scheduling module mainly performs three operations:

(1) the code enters the instruction scheduling module of the compiler;

(2) the instruction scheduling module receives the instruction sequence without distinguishing whether the instruction sequence is executed on the master core or the slave core;

(3) the instruction scheduling module schedules the instructions according to the instruction template provided by the instruction template selector.

2. The instruction template selector mainly performs three operations:

(1) the instruction template selector receives the target machine information in the code;

(2) the instruction template selector acts as a multiplexer and selects the master core instruction template or the slave core instruction template according to the target machine information;

(3) the instruction template selector sends the selected instruction template to the instruction scheduling module.

3. The master core instruction template mainly describes four kinds of information:

(1) the instruction type;

(2) the target information of the instruction;

(3) the pipeline on which the instruction may execute;

(4) the instruction delay information.

4. The slave core instruction template is similar to the master core instruction template and mainly describes four kinds of information:

(1) the instruction type;

(2) the target information of the instruction;

(3) the pipeline on which the instruction may execute;

(4) the instruction delay information.
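The four kinds of information held by each template can be modelled as one record per instruction. The following is a minimal sketch under assumptions: the field names, the reading of "target information" as the operand list, the pipeline names and the delay values are all hypothetical and are not taken from the actual host.md or slave.md files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemplateEntry:
    instr_type: str        # (1) instruction type, e.g. "float" or "load"
    operands: tuple        # (2) target information (assumed here to be operands)
    pipelines: frozenset   # (3) pipelines on which the instruction may execute
    delay: int             # (4) instruction delay in cycles


# The same opcode described separately for each core (illustrative values).
MASTER = {
    "fadd": TemplateEntry("float", ("dst", "src1", "src2"), frozenset({"P0", "P1"}), 4),
}
SLAVE = {
    "fadd": TemplateEntry("float", ("dst", "src1", "src2"), frozenset({"P0"}), 6),
}


def can_issue(template, opcode, pipeline):
    """Check the pipeline binding recorded in the template for one opcode."""
    return pipeline in template[opcode].pipelines


print(can_issue(MASTER, "fadd", "P1"))  # True
print(can_issue(SLAVE, "fadd", "P1"))   # False
print(SLAVE["fadd"].delay - MASTER["fadd"].delay)  # 2
```

Because the two tables are independent, an opcode may bind to different pipelines and carry a different delay on each core without either description constraining the other.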

With the above instruction scheduling optimization device and method, the master core and the slave core are separated at the back end of the compiler, the relevant parameters of the master core and slave core instructions (instruction delay and pipeline structure) are optimized and configured separately, and a separate instruction template is generated for each core, so that the master core and slave core instructions are described accurately. In the instruction scheduling stage, the master core and slave core instruction templates together support fine-grained, precise instruction scheduling on a complex multi-pipeline processor structure. This further reduces the probability of pipeline blockage, optimizes the instruction scheduling process of the processor, and improves the accuracy and performance of instruction scheduling.
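How one scheduler can produce different orders from different templates may be sketched with a greedy list scheduler. This is a generic textbook technique used here for illustration, not the scheduling algorithm of the invention; it assumes a single-issue machine, tracks only true (read-after-write) dependences, assumes each register is written once, and uses hypothetical opcodes and delay values.

```python
def list_schedule(instrs, delays):
    """Greedy list scheduling on a hypothetical single-issue machine.

    instrs: list of (name, opcode, dest, srcs) in program order.
    delays: opcode -> delay in cycles, as supplied by the selected template.
    Returns the instruction names in issue order.
    """
    n = len(instrs)
    # Producers of each instruction's sources (true dependences only).
    last_def, deps = {}, []
    for i, (_, _, dest, srcs) in enumerate(instrs):
        deps.append({last_def[r] for r in srcs if r in last_def})
        last_def[dest] = i
    delay = [delays[op] for _, op, _, _ in instrs]
    succs = [[] for _ in range(n)]
    for i, producers in enumerate(deps):
        for p in producers:
            succs[p].append(i)
    # Priority: longest delay path from the instruction to the end of the block.
    prio = [0] * n
    for i in reversed(range(n)):
        prio[i] = delay[i] + max((prio[s] for s in succs[i]), default=0)
    done_at, order, remaining, cycle = {}, [], set(range(n)), 0
    while remaining:
        ready = [i for i in remaining
                 if all(p in done_at and done_at[p] <= cycle for p in deps[i])]
        if ready:
            # Highest priority first; ties broken by original program order.
            pick = max(ready, key=lambda i: (prio[i], -i))
            done_at[pick] = cycle + delay[pick]
            order.append(instrs[pick][0])
            remaining.remove(pick)
        cycle += 1  # an empty `ready` list corresponds to a stall cycle
    return order


block = [
    ("i0", "ldw", "r1", ()),
    ("i1", "add", "r2", ("r1",)),
    ("i2", "fmul", "r3", ()),
    ("i3", "add", "r4", ("r3",)),
]
master_delays = {"ldw": 3, "fmul": 4, "add": 1}  # assumed values
slave_delays = {"ldw": 5, "fmul": 2, "add": 1}   # assumed values

print(list_schedule(block, master_delays))  # ['i2', 'i0', 'i1', 'i3']
print(list_schedule(block, slave_delays))   # ['i0', 'i2', 'i3', 'i1']
```

The scheduler code is identical in both calls; only the delay table changes, which mirrors the idea that the scheduling module does not distinguish the cores and relies entirely on the selected template.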

To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:

Heterogeneous: central processing units or dedicated hardware acceleration units with different architectures are integrated on one chip according to the relevant technical standards and specifications, enabling cooperative computing among the different heterogeneous cores.

Processor pipeline: a technique that decomposes each instruction into multiple steps and overlaps the steps of different instructions, so that several instructions are processed in parallel and program execution is accelerated.

RISC: short for Reduced Instruction Set Computer; all instructions have a consistent format and the same instruction cycle, and pipelining is used.

Instruction scheduling: rearranging the execution order of machine code to minimize the cost of executing a particular instruction sequence.

Pipeline blockage: a situation in which the execution of an instruction in the pipeline is delayed by a structural dependence, a data dependence or a control dependence.
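The pipeline and pipeline blockage definitions above can be related by a standard idealized timing model. It assumes a single-issue pipeline with k stages of one cycle each, n instructions, and s total blockage cycles; these symbols are introduced here for illustration only.

```latex
T_{\text{seq}} = n\,k, \qquad
T_{\text{pipe}} = k + (n - 1) + s, \qquad
\text{speedup} = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1 + s}
```

For example, with k = 5 and n = 100, the speedup is 500/104 ≈ 4.81 when s = 0 and falls to 500/124 ≈ 4.03 when s = 20, which is why reducing blockage through scheduling improves performance.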

The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims

1. An instruction scheduling optimization device for a master-slave fusion architecture processor, characterized in that it is based on the following modules:

2. An instruction scheduling optimization method for a master-slave fusion architecture processor based on the instruction scheduling optimization device is characterized in that: the method comprises the following steps: