CN104765590B - Branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor - Google Patents

Info

Publication number
CN104765590B
Authority
CN
China
Prior art keywords
instruction
jump instruction
superscalar
jump
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510213002.3A
Other languages
Chinese (zh)
Other versions
CN104765590A (en)
Inventor
何虎
付家为
麻军平
杜勇
王旭
侯毓敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201510213002.3A
Publication of CN104765590A
Application granted
Publication of CN104765590B
Active
Anticipated expiration

Abstract

A branch prediction method supporting a superscalar and very long instruction word (VLIW) hybrid architecture processor. In the fetch stage, the next dispatch-packet address (NDA) of the dispatch packet containing a jump instruction is first obtained from the BTB table. The NDA is then used to determine whether the instructions that follow the jump instruction in the fetch packet containing it may be executed, and each instruction is marked with a valid value. In the dispatch stage, the valid values indicate whether the jump instruction is in superscalar mode or VLIW mode. In superscalar mode, the instructions after the jump instruction in the dispatch packet cannot be executed, and on a misprediction execution restarts from the instruction immediately following the jump instruction. In VLIW mode, the instructions after the jump instruction in the dispatch packet are allowed to execute in parallel with it, and on a misprediction execution restarts from the start address of the next dispatch packet. The invention lets the hybrid architecture processor perform branch prediction in both modes, reducing the cycle loss caused by jump instructions while retaining the strengths of both modes, and thereby improving processor performance.

Description

Branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor
Technical field
The present invention relates to the field of electronic technology, and in particular to a branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor.
Background
Modern processors vary widely in function and in field of application, but improving processor performance is a goal shared by every field. Processor performance can be measured by CPU execution time, given by the following formulas:
CPU execution time = CPU clock cycles of the program × clock cycle length
CPU execution time = instruction count × CPI × clock cycle length
The formulas show that instruction count, CPI (Cycles Per Instruction), and clock cycle length jointly determine processor performance. The instruction count is determined by the instruction set and the compiler, CPI by the computer architecture, and the clock cycle length by the hardware implementation technology.
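The relationship between the three factors can be checked with a short calculation. The following is a minimal sketch; the function name `cpu_time` and the workload figures are hypothetical and chosen only for illustration.

```python
def cpu_time(instruction_count: int, cpi: float, clock_cycle_s: float) -> float:
    """CPU execution time = instruction count x CPI x clock cycle length."""
    return instruction_count * cpi * clock_cycle_s


# Hypothetical workload: one billion instructions on a 1 GHz clock (1 ns per cycle).
baseline = cpu_time(1_000_000_000, cpi=2.0, clock_cycle_s=1e-9)
# Same program and clock, but a lower CPI (for example, fewer cycles lost to jumps).
improved = cpu_time(1_000_000_000, cpi=1.5, clock_cycle_s=1e-9)
speedup = baseline / improved
```

With the instruction count and clock cycle length held fixed, execution time scales linearly with CPI, which is why reducing the cycles lost to jump instructions translates directly into performance.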
Pipelining is widely used in modern processors. It divides instruction execution into multiple stages so that the different stages of successive instructions can overlap, which raises the utilization of the pipeline units, reduces the number of clock cycles, and improves processor performance.
To improve processor performance by reducing CPI, the parallelism in a program must be exploited. This parallelism includes instruction-level parallelism, thread-level parallelism, and task-level parallelism, of which instruction-level parallelism is currently the most mature. Instruction-level parallelism refers to the ability of instructions to overlap or execute in parallel during program execution. Superscalar and very long instruction word (VLIW) techniques are widely used to exploit it.
The superscalar technique, also called dynamic multiple issue, has the hardware schedule instructions for parallel execution while the pipeline is running. It places few demands on the compiler, programming is relatively simple, and software portability is good, but the hardware overhead is large and reconfigurability is poor, and hardware delay and power consumption increase accordingly.
The VLIW technique, also called static multiple issue, has the compiler exploit instruction-level parallelism at compile time and complete the instruction scheduling. Because instruction issue is decided by the compiler, hardware complexity does not increase, but compilation becomes much more difficult and the demands on the compiler are very high.
The purpose of designing a processor with a hybrid superscalar and VLIW architecture is to combine the advantages of both modes: critical code that is executed many times runs in VLIW mode, which fully exploits instruction-level parallelism to improve performance, while the remaining code runs in superscalar mode, which improves software portability.
In a pipelined processor without branch prediction, the instruction following a jump instruction is executed next. If the execute stage finds that this was wrong, the pipeline is flushed and execution restarts at the target address, which costs several cycles. As the number of pipeline stages grows, the cycle loss caused by jump instructions severely limits the performance of processors without branch prediction. Branch prediction greatly reduces this loss and has become a key technique for improving the performance of modern processors.
Summary of the invention
To overcome the shortcomings of the prior art described above, the object of the invention is to provide a branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor that both exploits the strengths of the superscalar and VLIW modes and uses branch prediction to improve processor performance in both modes.
To achieve this object, the invention adopts the following technical solution:
A branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor, comprising:
In the fetch stage, obtaining from the branch target buffer (BTB, Branch Target Buffer) table the next dispatch-packet address (NDA, Next Dispatch-packet Address of branch instruction) of the dispatch packet containing a jump instruction, determining from the next dispatch-packet address whether the instructions following the jump instruction in the fetch packet containing the jump instruction may be executed, and marking them with valid values;
In the dispatch stage, determining from the valid values whether the jump instruction is in superscalar mode or VLIW mode;
In superscalar mode, the instructions after the jump instruction in the dispatch packet are not executed;
In VLIW mode, the instructions after the jump instruction in the dispatch packet are allowed to execute in parallel with the jump instruction;
In the execute stage, determining whether the branch prediction was correct; if the prediction was correct, execution continues; if the prediction was wrong:
In superscalar mode, execution restarts from the instruction immediately following the jump instruction;
In VLIW mode, execution restarts from the next dispatch-packet address.
The method of determining from the next dispatch-packet address whether the instructions following the jump instruction in the fetch packet containing the jump instruction may be executed is:
Within a fetch packet, the valid value of every instruction from the next dispatch-packet address onward is set to 0, and the valid value of every other instruction is set to 1. A valid value of 1 means the instruction is valid and is executed; a valid value of 0 means the instruction is invalid and is not executed.
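The valid-value rule above can be sketched as follows. This is an illustrative model, not the patented hardware: the 4-byte instruction width (`INSTR_BYTES`), the function name `valid_flags`, and the example addresses are assumptions that do not appear in the source.

```python
INSTR_BYTES = 4  # assumption: fixed 32-bit instructions (not stated in the source)


def valid_flags(packet_base: int, n_instr: int, nda: int) -> list[int]:
    """Return one valid bit per instruction slot of a fetch packet.

    Slots whose address is below the NDA keep valid = 1; slots from the NDA
    onward get valid = 0 and will not be executed.
    """
    return [
        1 if packet_base + slot * INSTR_BYTES < nda else 0
        for slot in range(n_instr)
    ]


# Hypothetical fetch packet of 8 instructions at 0x100 with a jump in slot 2 (0x108).
superscalar_flags = valid_flags(0x100, 8, nda=0x10C)  # NDA directly follows the jump
vliw_flags = valid_flags(0x100, 8, nda=0x118)         # NDA starts the next dispatch packet
```

In the superscalar case the slot right after the jump is already invalid, while in the VLIW case the slots between the jump and the NDA stay valid so that they can issue together with the jump.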
The rule for determining in the dispatch stage, from the valid values, whether the jump instruction is in superscalar mode or VLIW mode is:
Within the same dispatch packet, if the valid value of the instruction following the jump instruction is 0, the jump instruction is in superscalar mode; if the valid value of the instruction following the jump instruction is 1, it is in VLIW mode.
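The mode rule reduces to inspecting a single valid bit. The sketch below assumes the valid bits of one dispatch packet are available as a list; the name `mode_of_jump` is hypothetical, and the case of a jump in the last slot, which the source does not discuss, is rejected explicitly rather than guessed.

```python
def mode_of_jump(valid: list[int], jump_slot: int) -> str:
    """Classify a jump instruction from the valid bit of the slot that follows it."""
    next_slot = jump_slot + 1
    if next_slot >= len(valid):
        # Assumption: the source gives no rule when nothing follows the jump.
        raise ValueError("no instruction follows the jump in this dispatch packet")
    return "superscalar" if valid[next_slot] == 0 else "vliw"


# Hypothetical dispatch packets with the jump in slot 2.
mode_a = mode_of_jump([1, 1, 1, 0, 0, 0], jump_slot=2)
mode_b = mode_of_jump([1, 1, 1, 1, 1, 1], jump_slot=2)
```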
In superscalar mode, the next dispatch-packet address is the address of the instruction immediately following the jump instruction;
In VLIW mode, the next dispatch-packet address is the start address of the dispatch packet that follows the dispatch packet containing the jump instruction.
Compared with the prior art, the branch prediction method of the invention can be applied to processors with a hybrid superscalar and VLIW architecture, and can also be applied on its own to processors with a pure superscalar or pure VLIW architecture, with small hardware overhead.
Brief description of the drawings
Fig. 1 is a pipeline diagram of the dual-issue superscalar and six-issue VLIW hybrid processor of the embodiment of the invention.
Fig. 2 is a hardware structure diagram of the dual-issue superscalar and six-issue VLIW hybrid processor of the embodiment of the invention.
Fig. 3 is a flow chart of the implementation of the invention.
Fig. 4 is a diagram of the contents stored in the BTB table of the invention.
Fig. 5 is a diagram of the position of the NDA within a dispatch packet.
Fig. 6 is a diagram of the position of the NDA within a fetch packet and of how the valid values are set.
Detailed description of the embodiments
The embodiments of the invention are further described below with reference to the drawings. The embodiments described with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
For a clearer understanding of the embodiments, note that the proposed branch prediction technique is implemented on a dual-issue superscalar and six-issue VLIW hybrid processor, so the pipeline of that processor is briefly introduced first. Fig. 1 shows the pipeline of the hybrid processor of the embodiment. The pipeline is divided into four main stages: fetch, dispatch, decode, and execute. In the fetch stage, a 256-bit fetch packet is read from the instruction cache every cycle; in the dispatch stage, every six instructions form one dispatch packet. Fig. 2 shows the hardware structure of the hybrid processor of the embodiment. The processor contains two clusters, each with three independent functional units, for a total of six functional units: XA, XM, XD, YA, YM, and YD. In superscalar mode, at most two instructions may issue in parallel, and only cluster X is used, with dispatch in the order XA, XM, XD; because jump instructions are dispatched to the XD unit, the instruction immediately following a jump instruction cannot be dispatched in parallel with it. In VLIW mode, at most six instructions may issue in parallel, dispatched in the order XA, XM, XD, YA, YM, YD, and the instructions after the jump instruction in the same dispatch packet can be dispatched in parallel with it.
Fig. 3 shows the implementation flow of the embodiment of the invention. The method is implemented as follows:
The core of the branch predictor is the branch target buffer (BTB, Branch Target Buffer). A conventional BTB table stores the history of different jump instructions, typically the jump instruction address BIA (Branch Instruction Address), the jump target address BTA (Branch Target Address), and the branch history information BHI (Branch History Information), and the corresponding entry is indexed by the PC value of the jump instruction. The predictor designed in the invention extends the conventional BTB table with one additional field, the next dispatch-packet address NDA (Next Dispatch-packet Address of branch instruction) of the jump instruction, as shown in Fig. 4. The NDA is the correct position after the jump instruction from which the processor restarts execution once it has flushed the pipeline following a misprediction.
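A BTB entry with the added NDA field can be modeled as a small record. The sketch below is illustrative only: the direct-mapped organization, the table size of 64 entries, the index function, and the treatment of BHI as a plain integer are all assumptions, since the source specifies only which fields are stored and that the table is indexed by the jump instruction's PC.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BTBEntry:
    bia: int  # Branch Instruction Address (PC of the jump instruction)
    bta: int  # Branch Target Address
    bhi: int  # Branch History Information (encoding is an assumption)
    nda: int  # Next Dispatch-packet Address, the field added by the invention


class BTB:
    """Direct-mapped BTB model; size and index function are hypothetical."""

    def __init__(self, n_entries: int = 64) -> None:
        self.entries: list[Optional[BTBEntry]] = [None] * n_entries

    def _index(self, pc: int) -> int:
        return (pc >> 2) % len(self.entries)  # assumes 4-byte aligned instructions

    def lookup(self, pc: int) -> Optional[BTBEntry]:
        entry = self.entries[self._index(pc)]
        # Compare the stored BIA so that aliasing PCs do not produce false hits.
        return entry if entry is not None and entry.bia == pc else None

    def update(self, entry: BTBEntry) -> None:
        self.entries[self._index(entry.bia)] = entry


btb = BTB()
btb.update(BTBEntry(bia=0x108, bta=0x400, bhi=0b10, nda=0x10C))
hit = btb.lookup(0x108)
```

Storing the NDA next to the target address means that a single lookup in the fetch stage yields both the predicted target and the recovery address.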
The NDA value is first calculated in the dispatch stage, where the mode of the jump instruction B in the dispatch packet is determined. In superscalar mode, the NDA is the address of the instruction immediately following jump instruction B; in VLIW mode, the NDA is the start address of the dispatch packet that follows the one containing jump instruction B, as shown in Fig. 5.
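The two cases of the NDA calculation can be written out as a small function. This is a sketch under assumptions: a fixed 4-byte instruction width, a dispatch packet described by its base address and instruction count, and the hypothetical name `compute_nda`.

```python
INSTR_BYTES = 4  # assumption: fixed 32-bit instructions (not stated in the source)


def compute_nda(jump_addr: int, packet_base: int, packet_len: int, mode: str) -> int:
    """Compute the next dispatch-packet address for a jump instruction.

    packet_base and packet_len describe the dispatch packet containing the jump.
    """
    if mode == "superscalar":
        # Address of the instruction immediately following the jump.
        return jump_addr + INSTR_BYTES
    if mode == "vliw":
        # Start address of the dispatch packet after the one holding the jump.
        return packet_base + packet_len * INSTR_BYTES
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical six-instruction dispatch packet at 0x100 with the jump at 0x108.
nda_superscalar = compute_nda(0x108, 0x100, 6, "superscalar")
nda_vliw = compute_nda(0x108, 0x100, 6, "vliw")
```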
After a jump instruction has executed, its jump information, including the NDA, is stored in the BTB table, and the NDA is looked up and used the next time the same jump instruction is executed. In the fetch stage, every instruction in a fetch packet is traversed in search of jump instructions. When a jump instruction is found, its PC value is used to index the BTB table, its jump information is queried, and its NDA value is retrieved. A valid signal is then set for each instruction: a valid value of 1 means the instruction is valid, and a valid value of 0 means it is invalid. Within a fetch packet, the valid value of every instruction from the NDA onward is set to 0, and the valid value of every other instruction is set to 1, as shown in Fig. 6. The position of the NDA within the fetch packet differs between the two modes. In superscalar mode, the NDA immediately follows the jump instruction, being the address of the next instruction. In VLIW mode, the NDA does not immediately follow the jump instruction; the instructions between the jump instruction and the NDA are those that can execute in parallel with the jump instruction.
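The fetch-stage traversal can be sketched as a scan over the slots of a fetch packet. In this simplified model the BTB is reduced to a dictionary from jump PC to NDA; that reduction, the 4-byte instruction width, and the name `find_predicted_jump` are assumptions made for illustration.

```python
from typing import Optional

INSTR_BYTES = 4  # assumption: fixed 32-bit instructions (not stated in the source)


def find_predicted_jump(
    packet_base: int, n_instr: int, btb: dict[int, int]
) -> Optional[tuple[int, int]]:
    """Scan a fetch packet for the first instruction with a BTB entry.

    Returns (slot index, NDA) for that jump instruction, or None if no
    instruction in the packet hits in the BTB.
    """
    for slot in range(n_instr):
        pc = packet_base + slot * INSTR_BYTES
        if pc in btb:
            return slot, btb[pc]
    return None


# Hypothetical BTB contents: a jump at 0x108 whose stored NDA is 0x10C.
found = find_predicted_jump(0x100, 8, {0x108: 0x10C})
missed = find_predicted_jump(0x100, 8, {})
```

A jump that has never executed has no BTB entry, so the scan returns None for it and no valid bits are cleared on its behalf.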
When the instructions move down the pipeline into the dispatch stage, the position of the NDA determines whether the mode is superscalar or VLIW. Fig. 5 shows the position of the NDA within the dispatch packet. In superscalar mode, which is at most dual-issue, no instruction is allowed to execute in parallel after the jump instruction, so the instructions after the jump instruction in the dispatch packet are not issued. In VLIW mode, which is at most six-issue, instructions are allowed to execute in parallel after the jump instruction, so the instructions after the jump instruction in the dispatch packet can issue together with it.
When the instruction enters the execute stage, the correctness of the branch prediction is determined. If the prediction was correct, the pipeline continues executing. If the prediction was wrong, the pipeline is flushed, the program counter is rewritten with the NDA, and execution restarts from the NDA.
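The recovery step can be sketched with a toy pipeline state. The `Pipeline` class, its `in_flight` list, and the function name `resolve_at_execute` are hypothetical; the sketch covers only the behavior the text describes, namely flushing and restarting from the NDA on a misprediction.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    pc: int                                              # program counter
    in_flight: list[int] = field(default_factory=list)  # PCs of speculative instructions


def resolve_at_execute(pipe: Pipeline, prediction_correct: bool, nda: int) -> None:
    """Apply the execute-stage check for one jump instruction."""
    if prediction_correct:
        return  # the pipeline simply continues
    pipe.in_flight.clear()  # flush the speculatively fetched instructions
    pipe.pc = nda           # restart execution from the NDA


# Hypothetical state: the predictor redirected fetch to 0x400, but the prediction was wrong.
pipe = Pipeline(pc=0x408, in_flight=[0x400, 0x404])
resolve_at_execute(pipe, prediction_correct=False, nda=0x10C)
```

Because the NDA already encodes the mode (the instruction after the jump in superscalar mode, the next dispatch packet in VLIW mode), the recovery logic itself does not need to distinguish the two modes.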
In summary, the branch prediction technique of the invention for a superscalar and very long instruction word hybrid architecture processor performs branch prediction on jump instructions while retaining the respective advantages of both modes, and obtains a substantial performance gain for a very small increase in hardware overhead while preserving the efficiency of the processor itself.
The above are only preferred embodiments of the invention, and the scope of protection of the invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be that defined by the claims.

Claims (2)

1. A branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor, characterized by comprising:
in the fetch stage, obtaining from the branch target buffer (BTB, Branch Target Buffer) table the next dispatch-packet address (NDA, Next Dispatch-packet Address of branch instruction) of the dispatch packet containing a jump instruction, determining from the next dispatch-packet address whether the instructions following the jump instruction in the fetch packet containing the jump instruction may be executed, and marking them with valid values;
in the dispatch stage, determining from the valid values whether the jump instruction is in superscalar mode or VLIW mode;
in superscalar mode, the instructions after the jump instruction in the dispatch packet are not executed;
in VLIW mode, the instructions after the jump instruction in the dispatch packet are allowed to execute in parallel with the jump instruction;
in the execute stage, determining whether the branch prediction was correct; if the prediction was correct, execution continues; if the prediction was wrong:
in superscalar mode, execution restarts from the instruction immediately following the jump instruction;
in VLIW mode, execution restarts from the next dispatch-packet address;
wherein the method of determining from the next dispatch-packet address whether the instructions following the jump instruction in the fetch packet containing the jump instruction may be executed is:
within a fetch packet, the valid value of every instruction from the next dispatch-packet address onward is set to 0 and the valid value of every other instruction is set to 1; a valid value of 1 means the instruction is valid and a valid value of 0 means the instruction is invalid; if the valid value is 0 the instruction is not executed, and if the valid value is 1 the instruction is executed;
and wherein the rule for determining in the dispatch stage, from the valid values, whether the jump instruction is in superscalar mode or VLIW mode is:
within the same dispatch packet, if the valid value of the instruction following the jump instruction is 0, the jump instruction is in superscalar mode; if the valid value of the instruction following the jump instruction is 1, it is in VLIW mode.
2. The branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor according to claim 1, characterized in that:
in superscalar mode, the next dispatch-packet address is the address of the instruction immediately following the jump instruction;
in VLIW mode, the next dispatch-packet address is the start address of the dispatch packet that follows the dispatch packet containing the jump instruction.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201510213002.3A CN104765590B (en) | 2015-04-29 | 2015-04-29 | Branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN201510213002.3A CN104765590B (en) | 2015-04-29 | 2015-04-29 | Branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor

Publications (2)

Publication Number | Publication Date
CN104765590A (en) | 2015-07-08
CN104765590B (en) | 2017-06-13

Family

ID=53647448

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201510213002.3A Active CN104765590B (en) | Branch prediction method supporting a superscalar and very long instruction word hybrid architecture processor | 2015-04-29 | 2015-04-29

Country Status (1)

Country Link
CN (1) CN104765590B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN108182082A * | 2017-12-06 | 2018-06-19 | 中国航空工业集团公司西安航空计算技术研究所 | Scoreboard circuit for a pipelined dual-issue processor

Citations (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6112299A * | 1997-12-31 | 2000-08-29 | International Business Machines Corporation | Method and apparatus to select the next instruction in a superscalar or a very long instruction word computer having N-way branching
CN102707988A * | 2011-04-07 | 2012-10-03 | 威盛电子股份有限公司 | Simulation of execution mode back-up register

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
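Design of a branch prediction structure for a VLIW-Superscalar hybrid architecture processor; 杜勇 et al.; 《计算机应用与软件》 (Computer Applications and Software); 2014-08-31; vol. 31, no. 8; pp. 25-27, 78 *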

Also Published As

Publication number Publication date
CN104765590A (en) 2015-07-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant