CN112527393A - Instruction scheduling optimization device and method for master-slave fusion architecture processor - Google Patents

Instruction scheduling optimization device and method for master-slave fusion architecture processor

Info

Publication number
CN112527393A
Authority
CN
China
Prior art keywords
instruction
core
template
slave
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910879804.6A
Other languages
Chinese (zh)
Inventor
吴伟
朱琪
管茂林
沈莉
钱宏
武文浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN201910879804.6A
Publication of CN112527393A
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • G06F9/3869 Implementation aspects, e.g. pipeline latches; pipeline synchronisation and clocking

Abstract

The invention discloses an instruction scheduling optimization device and method for a master-slave fusion architecture processor, built on the following modules. The instruction scheduling module receives code containing target machine information and an instruction sequence, and schedules the received instruction sequence according to the instruction template supplied by the instruction template selector. The instruction template selector receives the target machine information in the code, selects the master core instruction template or the slave core instruction template according to that information, and sends the selected template to the instruction scheduling module. The master core instruction template describes the instruction type of each master core instruction, the target information of the instruction, the pipelines on which the instruction can execute, and the instruction delay. The slave core instruction template is configured at the back end of the compiler. The invention reduces the probability of pipeline stalls and improves the accuracy and performance of instruction scheduling, thereby optimizing the processor's instruction scheduling process.

Description

Instruction scheduling optimization device and method for master-slave fusion architecture processor
Technical Field
The invention relates to an instruction scheduling optimization device and method for a master-slave fusion architecture processor, and belongs to the technical field of compiler optimization.
Background
Instruction scheduling is an important technique in compiler optimization. On a RISC pipeline, where all instructions share a consistent format and the same instruction cycle, reordering the instruction sequence can reduce the number of pipeline stalls, lower the cache miss rate, improve data access locality, and hide memory access overhead, thereby improving program performance. Instruction scheduling relies on an instruction template: the instruction types, pipeline bindings, instruction delays, and other information defined in the template guide the scheduler. The more complete the instruction template, the finer the instruction scheduling it can support.
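The effect of reordering on pipeline stalls can be illustrated with a small sketch. It assumes a single-issue, in-order pipeline that stalls only on data dependences; the opcodes, register names, and latencies are hypothetical and are not taken from the patent.

```python
# Hypothetical latencies: cycles until a result becomes usable.
LATENCY = {"load": 3, "add": 1}

def total_cycles(seq):
    """Cycles needed by a single-issue, in-order pipeline.

    Each instruction is (opcode, destination, sources). An instruction
    stalls until every source register it reads is available.
    """
    ready = {}   # register -> cycle at which its value is available
    cycle = 0    # earliest cycle at which the next instruction may issue
    for op, dst, srcs in seq:
        issue = max([cycle] + [ready.get(r, 0) for r in srcs])
        ready[dst] = issue + LATENCY[op]
        cycle = issue + 1
    return max([cycle] + list(ready.values()))

original = [
    ("load", "r1", ["a"]),
    ("add", "r2", ["r1", "r1"]),   # waits for the first load
    ("load", "r3", ["b"]),
    ("add", "r4", ["r3", "r3"]),   # waits for the second load
]
# Same instructions, with the two independent loads issued back to back.
reordered = [original[0], original[2], original[1], original[3]]

print(total_cycles(original), total_cycles(reordered))
```

Under these assumed latencies the reordered sequence finishes in fewer cycles because the second load overlaps with the wait for the first. The latency table plays the role that the instruction template plays for a real scheduler.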
The many-core processor adopts a master-slave fusion architecture in which the master core and the slave core use different pipeline structures and different instruction sets, so master core instructions and slave core instructions differ in instruction fetch, issue, and execution. In terms of pipeline structure, both the master core and the slave core use a multi-pipeline design, but they differ in the number of pipelines and in their support for SIMD, floating-point, integer, and special instructions. In terms of instruction set, the same instruction has different latencies on the master core and the slave core, and the number and types of instructions each core supports also differ.
The traditional organization, in which a processor has a single set of instruction templates, cannot fully reflect the differences between the master core and the slave core in pipeline structure, instruction set, and so on, and cannot properly support their respective instruction scheduling mechanisms, which limits further improvement and optimization of instruction scheduling. The master core and slave core instruction sequences therefore need to be handled separately during instruction scheduling.
Disclosure of Invention
The invention aims to provide an instruction scheduling optimization device and method for a master-slave fusion architecture processor that reduce the probability of pipeline stalls and improve the accuracy and performance of instruction scheduling, thereby optimizing the processor's instruction scheduling process.
To achieve this purpose, the invention adopts the following technical scheme: an instruction scheduling optimization device for a master-slave fusion architecture processor, based on the following modules:
the instruction scheduling module receives code containing target machine information and an instruction sequence, without distinguishing whether the instruction sequence is to be executed on the master core or the slave core, and schedules the received instruction sequence according to the instruction template provided by the instruction template selector;
the instruction template selector receives the target machine information in the code, selects the master core instruction template or the slave core instruction template according to the target machine information, and sends the selected instruction template to the instruction scheduling module;
the master core instruction template describes, in an md-type configuration file, the instruction type, instruction parameter information, pipeline structure information, and instruction delay information of the master core instructions;
the slave core instruction template describes, in an md-type configuration file, the instruction type, instruction parameter information, pipeline structure information, and instruction delay information of the slave core instructions.
The instruction scheduling optimization method for a master-slave fusion architecture processor, based on the above instruction scheduling optimization device, comprises the following steps:
S1, instruction template separation: the instruction templates of the master core and the slave core are separated at the back end of the compiler, generating a master core instruction template file host.md and a slave core instruction template file slave.md;
S2, instruction template optimization and configuration: the instruction templates of the master core and the slave core are described accurately according to the architecture and instruction set of the target machine, covering instruction type, instruction delay, instruction parameters, and pipeline structure information;
S3, instruction scheduling optimization: in the instruction scheduling stage, the compiler calls the instruction template selector to select the newly generated master core instruction template or slave core instruction template, and performs fine-grained scheduling optimization on the instructions according to the accurate description in the selected template.
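Steps S1 and S2 can be sketched as follows. This is an illustration only: the patent names the files host.md and slave.md but does not give their syntax, so the templates are modeled here as plain dictionaries, and the record layout, field names, and values are all assumptions.

```python
# Hypothetical combined description, one record per (core, opcode).
COMBINED = [
    {"core": "host", "opcode": "fadd", "type": "float", "operands": 3,
     "pipelines": ["P0", "P1"], "latency": 4},
    {"core": "slave", "opcode": "fadd", "type": "float", "operands": 3,
     "pipelines": ["P0"], "latency": 6},
    {"core": "host", "opcode": "ldw", "type": "memory", "operands": 2,
     "pipelines": ["P1"], "latency": 3},
    {"core": "slave", "opcode": "ldw", "type": "memory", "operands": 2,
     "pipelines": ["P1"], "latency": 4},
]

# The four kinds of information the method requires per instruction.
REQUIRED_FIELDS = ("type", "operands", "pipelines", "latency")

def separate_templates(entries):
    """S1: split one combined description into per-core templates."""
    templates = {"host.md": {}, "slave.md": {}}
    for entry in entries:
        fields = {k: v for k, v in entry.items() if k not in ("core", "opcode")}
        templates[entry["core"] + ".md"][entry["opcode"]] = fields
    return templates

def check_template(template):
    """S2: report entries whose description is incomplete or implausible."""
    problems = []
    for opcode, fields in template.items():
        missing = [f for f in REQUIRED_FIELDS if f not in fields]
        if missing:
            problems.append(f"{opcode}: missing {', '.join(missing)}")
        elif fields["latency"] < 1 or not fields["pipelines"]:
            problems.append(f"{opcode}: invalid latency or pipeline list")
    return problems

templates = separate_templates(COMBINED)
for name, template in templates.items():
    print(name, check_template(template) or "ok")
```

Keeping the two templates as separate objects means the same opcode can carry different latencies and pipeline bindings per core, which is the point of the separation in S1.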
Owing to the above technical scheme, the invention has the following advantages over the prior art:
The instruction scheduling optimization device and method for a master-slave fusion architecture processor separate the master core and the slave core at the back end of the compiler, and optimize and configure the relevant parameters of their instructions (instruction delay and pipeline structure) separately to generate a template for each core that accurately describes its instructions. In the instruction scheduling stage, the master core and slave core instruction templates support fine-grained scheduling of instructions on a complex multi-pipeline processor, which reduces the probability of pipeline stalls, improves the accuracy and performance of instruction scheduling, and optimizes the processor's instruction scheduling process.
Drawings
FIG. 1 is a schematic block diagram of an instruction scheduling optimization apparatus for a master-slave fusion architecture processor according to the present invention;
FIG. 2 is a flowchart of an instruction scheduling optimization method for a master-slave fusion architecture processor according to the present invention.
Detailed Description
Embodiment: an instruction scheduling optimization device for a master-slave fusion architecture processor, based on the following modules:
the instruction scheduling module receives code containing target machine information and an instruction sequence, without distinguishing whether the instruction sequence is to be executed on the master core or the slave core, and schedules the received instruction sequence according to the instruction template provided by the instruction template selector;
the instruction template selector receives the target machine information in the code, selects the master core instruction template or the slave core instruction template according to the target machine information, and sends the selected instruction template to the instruction scheduling module;
the master core instruction template describes, in an md-type configuration file, the instruction type, instruction parameter information, pipeline structure information, and instruction delay information of the master core instructions;
the slave core instruction template describes, in an md-type configuration file, the instruction type, instruction parameter information, pipeline structure information, and instruction delay information of the slave core instructions.
An instruction scheduling optimization method for a master-slave fusion architecture processor, based on the above instruction scheduling optimization device, comprises the following steps:
S1, instruction template separation: the instruction templates of the master core and the slave core are separated at the back end of the compiler, generating a master core instruction template file host.md and a slave core instruction template file slave.md;
S2, instruction template optimization and configuration: the instruction templates of the master core and the slave core are described accurately according to the architecture and instruction set of the target machine, covering instruction type, instruction delay, instruction parameters, and pipeline structure information;
S3, instruction scheduling optimization: in the instruction scheduling stage, the compiler calls the instruction template selector to select the newly generated master core instruction template or slave core instruction template, and performs fine-grained scheduling optimization on the instructions according to the accurate description in the selected template.
The embodiment is further explained below:
The schematic block diagram of the scheme is shown in FIG. 1 and comprises four parts: the instruction scheduling module, the instruction template selector, the master core instruction template, and the slave core instruction template.
1. Instruction scheduling mainly involves three aspects of work:
(1) the code enters the compiler's instruction scheduling module;
(2) the instruction scheduling module receives the instruction sequence without distinguishing whether it is to be executed on the master core or the slave core;
(3) the instruction scheduling module schedules the instructions according to the instruction template provided by the instruction template selector.
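One way to picture a scheduling module that is agnostic about the core is a scheduler that takes the template as a parameter. The sketch below uses a simple greedy list scheduler, which is a stand-in chosen for illustration and not the algorithm disclosed in the patent; the instruction names, pipeline names, and latencies are hypothetical, and only true data dependences are modeled.

```python
def schedule(instructions, template):
    """Greedy list scheduling for an in-order, multi-pipeline core.

    instructions: list of (name, opcode, dependencies) in program order;
                  every dependency must name another listed instruction.
    template: opcode -> {"pipelines": [...], "latency": int >= 1}.
    Returns {name: (issue_cycle, pipeline)}.
    """
    done_at = {}               # name -> cycle when its result is available
    placed = {}                # name -> (issue cycle, pipeline)
    pending = list(instructions)
    cycle = 0
    while pending:
        busy = set()           # pipelines already used in this cycle
        for instr in list(pending):
            name, opcode, deps = instr
            info = template[opcode]
            if any(done_at.get(d, float("inf")) > cycle for d in deps):
                continue       # an operand is not ready yet
            free = [p for p in info["pipelines"] if p not in busy]
            if not free:
                continue       # no suitable pipeline left this cycle
            busy.add(free[0])
            placed[name] = (cycle, free[0])
            done_at[name] = cycle + info["latency"]
            pending.remove(instr)
        cycle += 1
    return placed

# Hypothetical templates for the two cores.
HOST = {"ldw": {"pipelines": ["P1"], "latency": 3},
        "fadd": {"pipelines": ["P0", "P1"], "latency": 4}}
SLAVE = {"ldw": {"pipelines": ["P1"], "latency": 4},
         "fadd": {"pipelines": ["P0"], "latency": 6}}

code = [("a", "ldw", []), ("b", "fadd", ["a"]),
        ("c", "ldw", []), ("d", "fadd", ["c"])]

print(schedule(code, HOST))
print(schedule(code, SLAVE))
```

The same instruction sequence yields different issue cycles depending on which template is passed in, while the scheduler itself contains no core-specific logic.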
2. The instruction template selector mainly performs three operations:
(1) the instruction template selector receives the target machine information in the code;
(2) the instruction template selector acts as a multiplexer, selecting the master core instruction template or the slave core instruction template according to the target machine information;
(3) the instruction template selector sends the selected instruction template to the instruction scheduling module.
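The selector's multiplexer behavior can be sketched as a lookup keyed on the target machine information. The patent does not specify how that information is encoded, so the dictionary with a "core" key and the placeholder template objects below are assumptions.

```python
# Placeholder templates; real ones would hold the per-instruction descriptions.
HOST_TEMPLATE = {"name": "host.md"}
SLAVE_TEMPLATE = {"name": "slave.md"}

def select_template(target_info):
    """Return the instruction template matching the target machine information.

    target_info is a hypothetical dict carried with the code,
    for example {"core": "slave"}.
    """
    choices = {"host": HOST_TEMPLATE, "slave": SLAVE_TEMPLATE}
    core = target_info.get("core")
    if core not in choices:
        raise ValueError(f"unknown target core: {core!r}")
    return choices[core]

print(select_template({"core": "host"})["name"])
print(select_template({"core": "slave"})["name"])
```

Rejecting unknown targets explicitly, rather than defaulting to one template, avoids silently scheduling slave core code with master core latencies.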
3. The master core instruction template mainly describes four kinds of information:
(1) the instruction type;
(2) the target information of the instruction;
(3) the pipelines on which the instruction can execute;
(4) the instruction delay information.
4. The slave core instruction template is similar to the master core instruction template and mainly describes four kinds of information:
(1) the instruction type;
(2) the target information of the instruction;
(3) the pipelines on which the instruction can execute;
(4) the instruction delay information.
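The four kinds of information in each template can be modeled as one record per instruction. The sketch below is an assumption about shape only: the field names, the interpretation of "target information" as a destination register class, and all values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TemplateEntry:
    opcode: str
    instr_type: str    # (1) instruction type
    target: str        # (2) target information (assumed: destination register class)
    pipelines: tuple   # (3) pipelines on which the instruction can execute
    latency: int       # (4) instruction delay in cycles

# Hypothetical master core and slave core templates.
HOST = {
    "fadd": TemplateEntry("fadd", "float", "freg", ("P0", "P1"), 4),
    "vadd": TemplateEntry("vadd", "simd", "vreg", ("P0",), 5),
}
SLAVE = {
    "fadd": TemplateEntry("fadd", "float", "freg", ("P0",), 6),
}

def differences(host, slave):
    """Map each opcode whose description differs to its (host, slave) entries.

    An opcode supported by only one core appears with None on the other side.
    """
    report = {}
    for opcode in sorted(set(host) | set(slave)):
        h, s = host.get(opcode), slave.get(opcode)
        if h != s:
            report[opcode] = (h, s)
    return report

for opcode, (h, s) in differences(HOST, SLAVE).items():
    print(opcode, "host:", h, "slave:", s)
```

A comparison like this makes visible the two kinds of difference the background section names: the same instruction with a different latency or pipeline binding, and instructions that only one core supports.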
With the above instruction scheduling optimization device and method for a master-slave fusion architecture processor, the master core and the slave core are separated at the back end of the compiler, and the relevant parameters of their instructions (instruction delay and pipeline structure) are optimized and configured separately to generate a template for each core that accurately describes its instructions. In the instruction scheduling stage, the master core and slave core instruction templates support fine-grained scheduling of instructions on a complex multi-pipeline processor, which reduces the probability of pipeline stalls, improves the accuracy and performance of instruction scheduling, and optimizes the processor's instruction scheduling process.
To facilitate a better understanding of the invention, the terms used herein are briefly explained as follows:
Heterogeneous: central processing units or dedicated hardware acceleration units with different architectures are integrated on a single chip according to the relevant technical standards and specifications, enabling cooperative computing among the different heterogeneous cores.
Processor pipeline: a technique that decomposes each instruction into multiple steps and overlaps the steps of different instructions, so that several instructions are processed in parallel and program execution is accelerated.
RISC: short for Reduced Instruction Set Computer; all instructions share a consistent format and the same instruction cycle, and pipelining is used.
Instruction scheduling: rearranging the order in which machine code executes so as to minimize the cost of executing a particular instruction sequence.
Pipeline stall: a situation in which the execution of an instruction in the pipeline is delayed by a structural, data, or control dependence.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (2)

1. An instruction scheduling optimization device for a master-slave fusion architecture processor, characterized in that it is based on the following modules:
the instruction scheduling module receives code containing target machine information and an instruction sequence, without distinguishing whether the instruction sequence is to be executed on the master core or the slave core, and schedules the received instruction sequence according to the instruction template provided by the instruction template selector;
the instruction template selector receives the target machine information in the code, selects the master core instruction template or the slave core instruction template according to the target machine information, and sends the selected instruction template to the instruction scheduling module;
the master core instruction template describes, in an md-type configuration file, the instruction type, instruction parameter information, pipeline structure information, and instruction delay information of the master core instructions;
the slave core instruction template describes, in an md-type configuration file, the instruction type, instruction parameter information, pipeline structure information, and instruction delay information of the slave core instructions.
2. An instruction scheduling optimization method for a master-slave fusion architecture processor based on the above instruction scheduling optimization device, characterized in that the method comprises the following steps:
S1, instruction template separation: the instruction templates of the master core and the slave core are separated at the back end of the compiler, generating a master core instruction template file host.md and a slave core instruction template file slave.md;
S2, instruction template optimization and configuration: the instruction templates of the master core and the slave core are described accurately according to the architecture and instruction set of the target machine, covering instruction type, instruction delay, instruction parameters, and pipeline structure information;
S3, instruction scheduling optimization: in the instruction scheduling stage, the compiler calls the instruction template selector to select the newly generated master core instruction template or slave core instruction template, and performs fine-grained scheduling optimization on the instructions according to the accurate description in the selected template.
CN201910879804.6A 2019-09-18 2019-09-18 Instruction scheduling optimization device and method for master-slave fusion architecture processor Withdrawn CN112527393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910879804.6A CN112527393A (en) 2019-09-18 2019-09-18 Instruction scheduling optimization device and method for master-slave fusion architecture processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910879804.6A CN112527393A (en) 2019-09-18 2019-09-18 Instruction scheduling optimization device and method for master-slave fusion architecture processor

Publications (1)

Publication Number Publication Date
CN112527393A (en) 2021-03-19

Family

ID=74974976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910879804.6A Withdrawn CN112527393A (en) 2019-09-18 2019-09-18 Instruction scheduling optimization device and method for master-slave fusion architecture processor

Country Status (1)

Country Link
CN (1) CN112527393A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115686630A (en) * 2022-10-28 2023-02-03 龙芯中科(南京)技术有限公司 Control method and system of controlled assembly, electronic device and readable medium
CN116302114A (en) * 2023-02-24 2023-06-23 进迭时空(珠海)科技有限公司 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
CN116431562A (en) * 2023-06-12 2023-07-14 太初(无锡)电子科技有限公司 Multi-head attention mechanism fusion calculation distribution method based on acceleration processor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101299194A (en) * 2008-06-26 2008-11-05 上海交通大学 Heterogeneous multi-core system thread-level dynamic dispatching method based on configurable processor
CN102831011A (en) * 2012-08-10 2012-12-19 上海交通大学 Task scheduling method and device based on multi-core system
US20140282592A1 (en) * 2013-03-15 2014-09-18 Soft Machines, Inc. Method for executing multithreaded instructions grouped into blocks

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115686630A (en) * 2022-10-28 2023-02-03 龙芯中科(南京)技术有限公司 Control method and system of controlled assembly, electronic device and readable medium
CN116302114A (en) * 2023-02-24 2023-06-23 进迭时空(珠海)科技有限公司 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
CN116302114B (en) * 2023-02-24 2024-01-23 进迭时空(珠海)科技有限公司 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
CN116431562A (en) * 2023-06-12 2023-07-14 太初(无锡)电子科技有限公司 Multi-head attention mechanism fusion calculation distribution method based on acceleration processor
CN116431562B (en) * 2023-06-12 2023-11-28 太初(无锡)电子科技有限公司 Multi-head attention mechanism fusion calculation distribution method based on acceleration processor

Similar Documents

Publication Publication Date Title
CN101799760B (en) System and method of generating parallel simd code for an arbitrary target architecture
CN112527393A (en) Instruction scheduling optimization device and method for master-slave fusion architecture processor
US7926046B2 (en) Compiler method for extracting and accelerator template program
CN103714039B (en) universal computing digital signal processor
CN108509270B (en) High-performance parallel implementation method of K-means algorithm on domestic Shenwei 26010 many-core processor
CN103729288A (en) Application program debugging method under embedded multi-core environment
US9280332B2 (en) Code converting method, program, and system
CN103116513B (en) A kind of heterogeneous multi-nucleus processor compiler
KR20110093965A (en) System and method for translating high-level programming language code into hardware description language code
CN110865814B (en) Compiler implementation method and system supporting heterogeneous computing core architecture
CN105183698A (en) Control processing system and method based on multi-kernel DSP
US20140317388A1 (en) Apparatus and method for supporting multi-modes of processor
CN110704364A (en) Automatic dynamic reconstruction method and system based on field programmable gate array
CN109558226B (en) DSP multi-core parallel computing scheduling method based on inter-core interruption
CN112558977B (en) Polyhedron optimization method oriented to heterogeneous many-core rear end based cost model
US8650525B2 (en) Integrated circuit compilation
US20080120497A1 (en) Automated configuration of a processing system using decoupled memory access and computation
US20110289298A1 (en) Semiconductor circuit and designing apparatus
CN112783511B (en) Optimization method, system and terminal of grid cell few-group parameter calculation module program
CN112579089B (en) Heterogeneous many-core data reuse method
CN113391932A (en) Parallel characteristic line method transport scanning method and device for heterogeneous many-core architecture
EP0883060A2 (en) Compiler capable of carrying out both size optimization and speed optimization
US6895494B1 (en) Sub-pipelined and pipelined execution in a VLIW
CN113590194B (en) Method for transplanting and cutting execution components crossing instruction sets
CN112947999B (en) Method and device for expanding instruction function of reduced instruction set computer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210319