CN104765590B - Branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor
- Publication number: CN104765590B
- Application number: CN201510213002.3A
- Authority: CN (China)
- Prior art keywords: instruction, jump instruction, superscalar, jump, address
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
A branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor. In the fetch stage, the next dispatch-packet address (NDA) of the dispatch packet containing a jump instruction is first obtained from the BTB. The NDA is then used to decide whether the instructions that follow the jump instruction in the fetch packet containing it may be executed, and each instruction is marked with a valid value. In the dispatch stage, the valid values determine whether the jump instruction is in superscalar mode or VLIW mode. In superscalar mode, the instructions after the jump instruction in the dispatch packet are not executed, and on a misprediction execution restarts from the instruction immediately after the jump instruction. In VLIW mode, the instructions after the jump instruction in the dispatch packet are allowed to execute in parallel with it, and on a misprediction execution restarts from the first address of the next dispatch packet. The invention lets a hybrid-architecture processor perform branch prediction in both modes, reducing the cycles lost to jump instructions while retaining the strengths of both modes, and thereby improving processor performance.
Description
Technical field
The present invention relates to the field of electronic technology, and in particular to a branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor.
Background art
Modern processors vary widely in function and in application field, but higher processor performance is a goal common to every field. Processor performance can be measured by CPU execution time, as in the following formulas:
$$\text{CPU execution time} = \text{CPU clock cycles for the program} \times \text{clock cycle length}$$
$$\text{CPU execution time} = \text{instruction count} \times \text{CPI} \times \text{clock cycle length}$$
The formulas above show that instruction count, CPI (Cycles Per Instruction), and clock cycle length jointly determine processor performance. Instruction count is determined by the instruction set and the compiler, CPI by the computer architecture, and clock cycle length by the hardware technology.
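As a worked illustration of the formula, the sketch below evaluates CPU execution time for two CPI values. The function name `cpu_time` and all numbers are hypothetical and chosen only for illustration; they do not come from the patent.

```python
def cpu_time(instruction_count: int, cpi: float, clock_period_s: float) -> float:
    """CPU execution time = instruction count x CPI x clock cycle length."""
    return instruction_count * cpi * clock_period_s


# Hypothetical workload: one million instructions on a 1 GHz clock (1 ns period).
baseline = cpu_time(1_000_000, 2.0, 1e-9)  # CPI of 2.0
improved = cpu_time(1_000_000, 1.5, 1e-9)  # same program and clock, lower CPI

# Lowering CPI at fixed instruction count and clock shortens execution time.
speedup = baseline / improved
```

With the instruction count and clock period held fixed, the speedup is simply the ratio of the two CPI values, which is why the following paragraphs focus on reducing CPI.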
Pipelining is used throughout modern processors. It divides instruction execution into multiple stages so that the different stages of successive instructions can overlap. This raises the utilization of the pipeline units, reduces the number of clock cycles, and improves processor performance.
To improve processor performance by reducing CPI, the parallelism in a program must be exploited. Exploitable parallelism includes instruction-level, thread-level, and task-level parallelism, of which instruction-level parallelism is currently the most mature. Instruction-level parallelism refers to the ability of instructions to overlap or execute in parallel while a program runs. Superscalar and very long instruction word (VLIW) techniques are both widely used to exploit it.
The superscalar technique, also called dynamic multiple issue, has the hardware schedule instructions for parallel execution while the pipeline is running. It places low demands on the compiler, programming is relatively simple, and software portability is good, but the hardware overhead is large and reconfigurability is poor, and hardware delay and power consumption increase accordingly.
The VLIW technique, also called static multiple issue, has the compiler exploit instruction-level parallelism and complete instruction scheduling at compile time. Because issue is decided by the compiler, hardware complexity does not increase, but compilation becomes much harder and the demands on the compiler are very high.
The purpose of a processor with a hybrid superscalar and VLIW architecture is to combine the advantages of both modes. Critical code that is executed many times runs in VLIW mode, which fully exploits instruction-level parallelism and improves performance; the remaining code runs in superscalar mode, which improves software portability.
Pipelining is widely used in modern processors. Without branch prediction, the instruction following a jump instruction is executed next; if the execute stage then finds that this was wrong, the pipeline is flushed and execution restarts from the target address, which costs several cycles. As pipelines grow deeper, the cycles lost to jump instructions severely limit the performance of a processor without branch prediction. Branch prediction greatly reduces this loss and has become a key technique for improving the performance of modern processors.
Summary of the invention
To overcome the shortcomings of the prior art described above, the object of the invention is to provide a branch prediction method supporting a hybrid superscalar and VLIW architecture processor that both exploits the strengths of the superscalar and VLIW modes and uses branch prediction to improve processor performance in both modes.
To achieve this object, the invention adopts the following technical solution:
A branch prediction method supporting a hybrid superscalar and VLIW architecture processor, comprising:
In the fetch stage, obtaining from the branch target buffer (BTB, Branch Target Buffer) the next dispatch-packet address (NDA, Next Dispatch-packet Address of branch instruction) of the dispatch packet containing a jump instruction, using the NDA to determine whether the instructions following the jump instruction in the fetch packet containing it may be executed, and marking them with valid values;
In the dispatch stage, determining from the valid values whether the jump instruction is in superscalar mode or VLIW mode;
In superscalar mode, the instructions after the jump instruction in the dispatch packet are not executed;
In VLIW mode, the instructions after the jump instruction in the dispatch packet are allowed to execute in parallel with the jump instruction;
In the execute stage, checking whether the branch prediction was correct; if it was correct, execution continues; if it was wrong:
In superscalar mode, execution restarts from the instruction immediately after the jump instruction;
In VLIW mode, execution restarts from the first address of the next dispatch packet.
The method of using the NDA to determine whether the instructions following the jump instruction in the fetch packet containing it may be executed is:
Within a fetch packet, the valid value of each instruction after the NDA is set to 0 and the valid value of every other instruction is set to 1. A valid value of 1 means the instruction is valid and a valid value of 0 means it is invalid; that is, an instruction whose valid value is 0 is not executed, and an instruction whose valid value is 1 is executed.
The rule for determining in the dispatch stage, from the valid values, whether the jump instruction is in superscalar mode or VLIW mode is:
Within the same dispatch packet, if the valid value of the instruction after the jump instruction is 0, the jump instruction is in superscalar mode; if the valid value of the instruction after the jump instruction is 1, it is in VLIW mode.
In superscalar mode, the NDA is the address of the instruction immediately after the jump instruction;
In VLIW mode, the NDA is the first address of the dispatch packet that follows the dispatch packet containing the jump instruction.
Compared with the prior art, the branch prediction method of the invention can be applied to processors with a hybrid superscalar and VLIW architecture, and also to processors with a superscalar-only or VLIW-only architecture, at a small hardware cost.
Brief description of the drawings
Fig. 1 is a pipeline diagram of the dual-issue superscalar and six-issue VLIW hybrid processor of the embodiment of the invention.
Fig. 2 is a hardware structure diagram of the dual-issue superscalar and six-issue VLIW hybrid processor of the embodiment of the invention.
Fig. 3 is the implementation flowchart of the invention.
Fig. 4 is a diagram of the contents stored in the BTB of the invention.
Fig. 5 is a diagram of the NDA position within a dispatch packet of the invention.
Fig. 6 is a diagram of the NDA position and the valid-value setting within a fetch packet of the invention.
Detailed description of the embodiments
Embodiments of the invention are further described below with reference to the drawings. The embodiments described with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
To make the embodiment easier to follow: the branch prediction technique proposed by the invention is implemented on a hybrid processor that combines a dual-issue superscalar architecture with a six-issue VLIW architecture, so the pipeline of that processor is briefly introduced first.
Fig. 1 shows the pipeline of the hybrid processor of the embodiment. The pipeline is divided into four main stages: fetch, dispatch, decode, and execute. In the fetch stage, a 256-bit fetch packet is read from the instruction cache every cycle; in the dispatch stage, every six instructions form one dispatch packet.
Fig. 2 shows the hardware structure of the hybrid processor of the embodiment. The processor contains two clusters, each with three independent functional units, for six functional units in total: XA, XM, XD, YA, YM, and YD. In superscalar mode, at most two instructions are issued in parallel, and only the X cluster dispatches, in the order XA, XM, XD; because a jump instruction is dispatched by the XD unit, the instruction that follows it cannot be dispatched in parallel with it. In VLIW mode, at most six instructions are issued in parallel, dispatched in the order XA, XM, XD, YA, YM, YD, and the instructions after the jump instruction in the same dispatch packet can be dispatched in parallel with it.
Fig. 3 is the implementation flowchart of the embodiment of the invention. The method is implemented as follows:
The core of the branch predictor is the branch target buffer, BTB (Branch Target Buffer). A conventional BTB stores the history of different jump instructions, typically the jump instruction address BIA (Branch Instruction Address), the jump target address BTA (Branch Target Address), and the branch history information BHI (Branch History Information), and the corresponding entry is indexed by the PC value of the jump instruction. The predictor designed in the invention adds one field to the conventional BTB: the next dispatch-packet address NDA (Next Dispatch-packet Address of branch instruction) of the jump instruction, as shown in Fig. 4. When a jump is mispredicted, the processor flushes the pipeline and restarts execution from the correct position after the jump instruction; that correct position is the NDA.
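The BTB entry described above can be sketched as a small data structure. This is a minimal model, not the patented hardware: the class names `BTBEntry` and `BTB`, the direct-mapped organization, the table size, the 4-byte instruction size, and the encoding of BHI as a small counter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BTBEntry:
    bia: int  # Branch Instruction Address, used here as the tag
    bta: int  # Branch Target Address
    bhi: int  # Branch History Information (assumed: a small saturating counter)
    nda: int  # Next Dispatch-packet Address, the restart point on a misprediction


class BTB:
    """Toy direct-mapped BTB indexed by the PC of the jump instruction (assumed layout)."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        self.entries: Dict[int, BTBEntry] = {}

    def _index(self, pc: int) -> int:
        # Assumes 4-byte instructions, so the two low PC bits carry no information.
        return (pc >> 2) % self.num_entries

    def update(self, entry: BTBEntry) -> None:
        self.entries[self._index(entry.bia)] = entry

    def lookup(self, pc: int) -> Optional[BTBEntry]:
        entry = self.entries.get(self._index(pc))
        # A hit requires the stored BIA to match the full PC, not just the index.
        return entry if entry is not None and entry.bia == pc else None


btb = BTB()
btb.update(BTBEntry(bia=0x100, bta=0x400, bhi=2, nda=0x104))
hit = btb.lookup(0x100)
```

The only departure from a conventional entry is the extra `nda` field, which matches the single added field the text describes.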
The NDA value is first computed in the dispatch stage, where the mode of the jump instruction B in the dispatch packet is determined. In superscalar mode, the NDA is the address of the instruction immediately after jump instruction B; in VLIW mode, the NDA is the first address of the dispatch packet that follows the one containing jump instruction B, as shown in Fig. 5.
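The NDA rule for the two modes can be written as a short function. This is a sketch under stated assumptions: the name `compute_nda`, the fixed 4-byte instruction size, and passing the mode in as a flag are illustrative choices, since the patent does not specify instruction encoding or how the mode is signalled at this point.

```python
INSTR_BYTES = 4  # assumption: fixed-size 32-bit instructions


def compute_nda(branch_pc: int, packet_start: int, packet_len: int, vliw_mode: bool) -> int:
    """Return the next dispatch-packet address for a jump instruction.

    branch_pc    -- address of the jump instruction
    packet_start -- first address of the dispatch packet containing the jump
    packet_len   -- number of instructions in that dispatch packet
    vliw_mode    -- True for VLIW mode, False for superscalar mode
    """
    if vliw_mode:
        # VLIW mode: first address of the following dispatch packet.
        return packet_start + packet_len * INSTR_BYTES
    # Superscalar mode: address of the instruction right after the jump.
    return branch_pc + INSTR_BYTES


# A jump at 0x108 inside a six-instruction dispatch packet that starts at 0x100.
nda_superscalar = compute_nda(0x108, 0x100, 6, vliw_mode=False)
nda_vliw = compute_nda(0x108, 0x100, 6, vliw_mode=True)
```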
After a jump instruction has executed, its jump information, including the NDA, is stored in the BTB. The next time the same jump instruction is executed, the NDA is looked up and put to use. In the fetch stage, every instruction in a fetch packet is examined to find jump instructions. When a jump instruction is found, its PC value is used to index the BTB, its jump information is read, and its NDA value is obtained. A valid signal is then set for each instruction: a valid value of 1 means the instruction is valid and a valid value of 0 means it is invalid. Within the fetch packet, the valid value of each instruction after the NDA is set to 0 and the valid value of every other instruction is set to 1, as shown in Fig. 6. The position of the NDA in the fetch packet differs between the two modes. In superscalar mode, the NDA immediately follows the jump instruction: it is the address of the instruction right after it. In VLIW mode, the NDA does not immediately follow the jump instruction; the instructions between the jump instruction and the NDA are those that can execute in parallel with the jump instruction.
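The fetch-stage marking can be sketched as follows. Several points are assumptions rather than statements from the patent: the function name `mark_valid`, the representation of the BTB as a plain dictionary from jump PC to NDA, the 4-byte instruction size (which gives eight slots in a 256-bit fetch packet), and handling only the first jump that hits in the BTB. The sketch also reads "after the NDA" as "at or beyond the NDA address", since that reading is the one consistent with the mode rule, in which the instruction right after a superscalar-mode jump must have a valid value of 0.

```python
from typing import Dict, List, Optional, Tuple

INSTR_BYTES = 4  # assumption: 32-bit instructions, so a 256-bit fetch packet has 8 slots


def mark_valid(
    fetch_base: int,
    is_jump: List[bool],
    btb_nda: Dict[int, int],
) -> Tuple[List[int], Optional[int]]:
    """Return per-slot valid bits and the NDA of the first jump that hits in the BTB."""
    n = len(is_jump)
    valid = [1] * n
    for slot in range(n):
        pc = fetch_base + slot * INSTR_BYTES
        if is_jump[slot] and pc in btb_nda:
            nda = btb_nda[pc]
            # Slots behind the jump whose address is at or beyond the NDA are invalidated.
            for later in range(slot + 1, n):
                if fetch_base + later * INSTR_BYTES >= nda:
                    valid[later] = 0
            return valid, nda
    return valid, None  # no predicted jump: every slot stays valid


packet = [False, False, True, False, False, False, False, False]  # jump in slot 2 (0x108)
superscalar_valid, _ = mark_valid(0x100, packet, {0x108: 0x10C})
vliw_valid, _ = mark_valid(0x100, packet, {0x108: 0x118})
```

In the superscalar case only the slots up to the jump remain valid, while in the VLIW case the three slots between the jump and the NDA also remain valid, which is the difference Fig. 6 is described as showing.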
When the instructions move down the pipeline into the dispatch stage, the position of the NDA determines whether the mode is superscalar or VLIW. Fig. 5 shows the position of the NDA within a dispatch packet. In superscalar mode, which is at most dual-issue, no instruction is allowed to execute in parallel after the jump instruction, so the instructions after the jump instruction in the dispatch packet are not issued. In VLIW mode, which is at most six-issue, instructions are allowed to execute in parallel after the jump instruction, so the instructions after the jump instruction in the dispatch packet can be issued together with it.
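The dispatch-stage decision reduces to inspecting the valid bit of the instruction right after the jump. The sketch below assumes the valid bits arrive as a list aligned with the dispatch packet; the names `dispatch_mode` and `parallel_followers` are hypothetical, and the case where the jump is the last instruction in the packet is not covered by the source, so it is returned as `None` here.

```python
from typing import List, Optional


def dispatch_mode(valid: List[int], jump_slot: int) -> Optional[str]:
    """Infer the mode from the valid bit of the instruction after the jump."""
    nxt = jump_slot + 1
    if nxt >= len(valid):
        return None  # jump is last in the packet; the rule in the text does not apply
    return "vliw" if valid[nxt] == 1 else "superscalar"


def parallel_followers(valid: List[int], jump_slot: int) -> List[int]:
    """Slots after the jump that may be issued in parallel with it."""
    return [i for i in range(jump_slot + 1, len(valid)) if valid[i] == 1]


# Six-instruction dispatch packets with the jump in slot 2.
mode_a = dispatch_mode([1, 1, 1, 0, 0, 0], jump_slot=2)
mode_b = dispatch_mode([1, 1, 1, 1, 1, 1], jump_slot=2)
followers_b = parallel_followers([1, 1, 1, 1, 1, 1], jump_slot=2)
```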
When the instruction enters the execute stage, the correctness of the branch prediction is checked. If the prediction was correct, the pipeline continues executing. If it was wrong, the pipeline is flushed, the program counter is rewritten with the NDA, and execution restarts from the NDA.
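The recovery step can be modelled in a few lines. This is a toy model under stated assumptions: the `Pipeline` class, its `in_flight` list of younger instruction addresses, and the function name `resolve_branch` are hypothetical, and how the execute stage decides that a prediction was wrong is left outside the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pipeline:
    pc: int
    # Addresses of instructions fetched after the jump and still in the pipeline.
    in_flight: List[int] = field(default_factory=list)


def resolve_branch(pipe: Pipeline, prediction_correct: bool, nda: int) -> None:
    """Execute-stage check: on a misprediction, flush and restart from the NDA."""
    if prediction_correct:
        return  # the pipeline simply keeps executing
    pipe.in_flight.clear()  # flush the pipeline
    pipe.pc = nda           # rewrite the program counter with the NDA


pipe = Pipeline(pc=0x400, in_flight=[0x400, 0x404])
resolve_branch(pipe, prediction_correct=False, nda=0x10C)
```

Because the NDA already encodes the mode (next instruction in superscalar mode, next dispatch packet in VLIW mode), the recovery logic itself does not need to know which mode the jump was in.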
In summary, the branch prediction technique designed in the invention for a hybrid superscalar and VLIW architecture processor performs branch prediction on jump instructions while preserving the advantages of each mode. It keeps the processor's inherent efficiency advantage and obtains a comparatively large performance gain for a very small increase in hardware cost.
The above is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be that defined by the claims.
Claims (2)
1. A branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor, characterized by comprising:
In the fetch stage, obtaining from the branch target buffer (BTB, Branch Target Buffer) the next dispatch-packet address (NDA, Next Dispatch-packet Address of branch instruction) of the dispatch packet containing a jump instruction, using the NDA to determine whether the instructions following the jump instruction in the fetch packet containing the jump instruction may be executed, and marking them with valid values;
In the dispatch stage, determining from the valid values whether the jump instruction is in superscalar mode or VLIW mode;
In superscalar mode, the instructions after the jump instruction in the dispatch packet are not executed;
In VLIW mode, the instructions after the jump instruction in the dispatch packet are allowed to execute in parallel with the jump instruction;
In the execute stage, checking whether the branch prediction was correct; if it was correct, execution continues; if it was wrong:
In superscalar mode, execution restarts from the instruction immediately after the jump instruction;
In VLIW mode, execution restarts from the first address of the next dispatch packet;
The method of using the NDA to determine whether the instructions following the jump instruction in the fetch packet containing the jump instruction may be executed is:
Within a fetch packet, the valid value of each instruction after the NDA is set to 0 and the valid value of every other instruction is set to 1; a valid value of 1 means the instruction is valid and a valid value of 0 means it is invalid; if the valid value is 0 the instruction is not executed, and if the valid value is 1 the instruction is executed;
The rule for determining in the dispatch stage, from the valid values, whether the jump instruction is in superscalar mode or VLIW mode is:
Within the same dispatch packet, if the valid value of the instruction after the jump instruction is 0, the jump instruction is in superscalar mode; if the valid value of the instruction after the jump instruction is 1, it is in VLIW mode.
2. The branch prediction method supporting a hybrid superscalar and VLIW architecture processor according to claim 1, characterized in that:
In superscalar mode, the NDA is the address of the instruction immediately after the jump instruction;
In VLIW mode, the NDA is the first address of the dispatch packet that follows the dispatch packet containing the jump instruction.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510213002.3A CN104765590B (en) | 2015-04-29 | 2015-04-29 | Branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104765590A CN104765590A (en) | 2015-07-08 |
CN104765590B true CN104765590B (en) | 2017-06-13 |
Family
ID=53647448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510213002.3A Active CN104765590B (en) | Branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor | 2015-04-29 | 2015-04-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104765590B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108182082A (en) * | 2017-12-06 | 2018-06-19 | 中国航空工业集团公司西安航空计算技术研究所 | A kind of double transmited processor scoreboard circuits of stream treatment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6112299A (en) * | 1997-12-31 | 2000-08-29 | International Business Machines Corporation | Method and apparatus to select the next instruction in a superscalar or a very long instruction word computer having N-way branching |
CN102707988A (en) * | 2011-04-07 | 2012-10-03 | 威盛电子股份有限公司 | Simulation of execution mode back-up register |
Non-Patent Citations (1)
Title |
---|
VLIW-Superscalar混合结构处理器分支预测结构设计;杜勇 等;《计算机应用与软件》;20140831;第31卷(第8期);第25-27,78页 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |