CN104765590B

CN104765590B - Branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor

Info

Publication number: CN104765590B
Application number: CN201510213002.3A
Authority: CN
Inventors: 何虎; 付家为; 麻军平; 杜勇; 王旭; 侯毓敏
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2015-04-29
Filing date: 2015-04-29
Publication date: 2017-06-13
Anticipated expiration: 2035-04-29
Also published as: CN104765590A

Abstract

A branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor. In the fetch stage, the next dispatch-packet address (NDA) of the dispatch packet containing a branch instruction is first obtained from the BTB. The NDA is then used to decide whether the instructions that follow the branch instruction in the fetch packet containing it may execute, and each instruction is marked with a valid value. In the dispatch stage, the valid values determine whether the branch instruction is in superscalar mode or VLIW mode. In superscalar mode, the instructions after the branch instruction in the dispatch packet are not executed, and on a misprediction execution restarts from the instruction immediately following the branch instruction. In VLIW mode, the instructions after the branch instruction in the dispatch packet are allowed to execute in parallel with it, and on a misprediction execution restarts from the first address of the next dispatch packet. The invention enables the hybrid-architecture processor to perform branch prediction in both modes, reducing the cycle penalty of branch instructions while retaining the strengths of both modes, and thereby improves processor performance.

Description

A branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor

Technical field

The present invention relates to the field of electronic technology, and in particular to a branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor.

Background art

Modern processors vary widely in function and application domain, but improving processor performance is a goal common to every field. Processor performance can be measured by CPU execution time, given by the following formulas:

CPU execution time = CPU clock cycles for the program × clock cycle time

CPU execution time = instruction count × CPI × clock cycle time

The formulas above show that instruction count, CPI (cycles per instruction), and clock cycle time jointly determine processor performance. Instruction count is determined by the instruction set and the compiler, CPI by the computer architecture, and clock cycle time by the hardware implementation technology.
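As a small worked illustration of the relationship between instruction count, CPI, and clock cycle time, the sketch below evaluates the product for a hypothetical workload. The numbers (one billion instructions, a 1 ns clock cycle, CPI values of 2.0 and 1.5) and the function name `cpu_time_seconds` are assumptions chosen for the example and do not come from the patent.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU execution time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical workload: 1e9 instructions on a 1 GHz clock (1 ns per cycle).
baseline = cpu_time_seconds(1_000_000_000, 2.0, 1e-9)  # CPI of 2.0
improved = cpu_time_seconds(1_000_000_000, 1.5, 1e-9)  # CPI lowered to 1.5

print(f"baseline: {baseline:.2f} s, improved: {improved:.2f} s")
```

With instruction count and clock cycle time held fixed, lowering CPI from 2.0 to 1.5 shortens execution time by a quarter, which is why the techniques discussed next focus on reducing CPI.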

Pipelining is widely used in modern processors. It divides instruction execution into multiple stages so that the different stages of successive instructions can overlap, which raises the utilization of the pipeline units, reduces the number of clock cycles, and improves processor performance.

Reducing CPI to improve processor performance requires exploiting the parallelism in a program. Exploitable parallelism includes instruction-level, thread-level, and task-level parallelism, of which instruction-level parallelism is currently the most mature. Instruction-level parallelism refers to the ability to overlap instructions or execute them in parallel during program execution. Superscalar and very long instruction word (VLIW) techniques are widely used to exploit it.

Superscalar technology, also called dynamic multiple issue, is a technique in which hardware schedules instructions for parallel execution while the pipeline is running. It places few demands on the compiler, is relatively simple to program for, and yields highly portable software, but its hardware overhead is large and its reconfigurability is poor; hardware delay and power consumption also increase.

VLIW technology, also called static multiple issue, is a technique in which the compiler exploits instruction-level parallelism at compile time and completes instruction scheduling. Because VLIW issue is decided by the compiler, it adds no hardware complexity, but it makes compilation considerably harder and places very high demands on the compiler.

The purpose of designing a hybrid superscalar and VLIW processor is to combine the advantages of both modes: frequently executed critical code uses VLIW mode to fully exploit instruction-level parallelism and improve performance, while the remaining code uses superscalar mode to improve software portability.

Pipelining is widely used in modern processors. Without branch prediction, the instruction after a branch instruction is executed next; if the execute stage finds that this was wrong, the pipeline is flushed and execution restarts at the target address, which costs several cycles. As pipeline depth increases, the cycle penalty caused by branch instructions severely limits the performance of a processor without branch prediction. Branch prediction greatly reduces this penalty and has become a key technique for achieving high performance in modern processors.

Summary of the invention

To overcome the shortcomings of the prior art described above, the object of the present invention is to provide a branch prediction method supporting a hybrid superscalar and VLIW architecture processor that both makes full use of the strengths of the superscalar and VLIW modes and uses branch prediction to improve processor performance in both modes.

To achieve this object, the present invention adopts the following technical solution:

A branch prediction method supporting a hybrid superscalar and VLIW architecture processor, comprising:

In the fetch stage, obtaining from the branch target buffer (BTB, Branch Target Buffer) the next dispatch-packet address (NDA, Next Dispatch-packet Address of branch instruction) of the dispatch packet containing a branch instruction, determining from the NDA whether the instructions that follow the branch instruction in the fetch packet containing it may execute, and marking them with valid values;

In the dispatch stage, determining from the valid values whether the branch instruction is in superscalar mode or VLIW mode;

In superscalar mode, the instructions after the branch instruction in the dispatch packet are not executed;

In VLIW mode, the instructions after the branch instruction in the dispatch packet are allowed to execute in parallel with the branch instruction;

In the execute stage, determining whether the branch prediction was correct; if it was correct, execution continues; if it was wrong:

In superscalar mode, execution restarts from the instruction immediately following the branch instruction;

In VLIW mode, execution restarts from the NDA.

The method of determining, from the NDA, whether the instructions that follow the branch instruction in the fetch packet containing it may execute is as follows:

Within a fetch packet, the valid value of each instruction after the NDA is set to 0 and the valid value of every other instruction is set to 1. A valid value of 1 means the instruction is valid and is executed; a valid value of 0 means the instruction is invalid and is not executed.

The rule for determining in the dispatch stage, from the valid values, whether the branch instruction is in superscalar mode or VLIW mode is:

Within the same dispatch packet, if the valid value of the instruction following the branch instruction is 0, the mode is superscalar; if it is 1, the mode is VLIW.

In superscalar mode, the NDA is the address of the instruction immediately following the branch instruction;

In VLIW mode, the NDA is the first address of the dispatch packet that follows the dispatch packet containing the branch instruction.

Compared with the prior art, the branch prediction method of the present invention can be applied to hybrid superscalar and VLIW processors, and also to processors with a purely superscalar or purely VLIW architecture, with small hardware overhead.

Brief description of the drawings

Fig. 1 is a pipeline schematic of the hybrid dual-issue superscalar and six-issue VLIW processor of the embodiment of the present invention.

Fig. 2 is a hardware structure schematic of the hybrid dual-issue superscalar and six-issue VLIW processor of the embodiment of the present invention.

Fig. 3 is a flow chart of the implementation of the present invention.

Fig. 4 is a schematic of the contents stored in the BTB of the present invention.

Fig. 5 is a schematic of the NDA position in a dispatch packet of the present invention.

Fig. 6 is a schematic of the NDA position in a fetch packet of the present invention and of how the valid values are set.

Detailed description of the embodiments

Embodiments of the present invention are described further below with reference to the accompanying drawings. The embodiments described with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting the claims.

For a clearer understanding of the embodiment, note that the proposed branch prediction technique for a hybrid superscalar and VLIW architecture processor is implemented on a hybrid processor that combines a dual-issue superscalar architecture with a six-issue VLIW architecture, so the pipeline of that processor is introduced briefly first. Fig. 1 is the pipeline schematic of this hybrid processor. The pipeline is divided mainly into four stages: fetch, dispatch, decode, and extension. In the fetch stage, a 256-bit fetch packet is read from the instruction cache every cycle. In the dispatch stage, every six instructions form one dispatch packet.

Fig. 2 is the hardware structure schematic of the hybrid processor. The processor contains two clusters, each with three independent functional units, for a total of six functional units: XA, XM, XD, YA, YM, and YD.

In superscalar mode, at most two instructions may issue in parallel at a time, and only the X cluster is used, with dispatch in the order XA, XM, XD. Because the branch instruction is dispatched by the XD unit, the instruction that follows the branch instruction cannot be dispatched in parallel with it. In VLIW mode, at most six instructions may issue in parallel at a time, dispatched in the order XA, XM, XD, YA, YM, YD, and the instructions that follow the branch instruction in the same dispatch packet can be dispatched in parallel with it.

Fig. 3 shows the implementation flow of the embodiment of the present invention. The method is implemented as follows:

The core of the branch predictor is the branch target buffer (BTB, Branch Target Buffer). A traditional BTB stores history information for different branch instructions, generally including the branch instruction address BIA (Branch Instruction Address), the branch target address BTA (Branch Target Address), and the branch history information BHI (Branch History Information), and the corresponding entry is indexed by the PC value of the branch instruction. The predictor designed in the present invention adds one field to the traditional BTB: the next dispatch-packet address NDA (Next Dispatch-packet Address of branch instruction) of the branch instruction, as shown in Fig. 4. The NDA is the correct position after the branch instruction from which the processor re-executes once it has flushed the pipeline on a branch misprediction.
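The extended BTB entry can be pictured with the minimal Python sketch below. It is a software model for illustration only: the class and method names are hypothetical, the table is a plain dictionary keyed by the full PC (a hardware BTB would typically index on a subset of PC bits), and treating BHI as a small integer counter is an assumption, since the patent does not specify its encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BTBEntry:
    bia: int  # branch (jump) instruction address
    bta: int  # branch target address
    bhi: int  # branch history information (encoding assumed: small counter)
    nda: int  # next dispatch-packet address, the field added by the invention


class BTB:
    """Toy BTB model: a dictionary keyed by the branch PC (an assumption)."""

    def __init__(self) -> None:
        self._entries: Dict[int, BTBEntry] = {}

    def lookup(self, pc: int) -> Optional[BTBEntry]:
        # Returns None when the PC has no recorded branch information.
        return self._entries.get(pc)

    def update(self, entry: BTBEntry) -> None:
        # Called after a branch has executed, to record its information.
        self._entries[entry.bia] = entry


btb = BTB()
btb.update(BTBEntry(bia=0x108, bta=0x400, bhi=2, nda=0x10C))
hit = btb.lookup(0x108)
```

Here `hit.nda` gives the fetch stage the address it needs without waiting for the dispatch stage to recompute it.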

The NDA value is first computed in the dispatch stage, where the mode of branch instruction B in the dispatch packet is determined. In superscalar mode, the NDA is the address of the instruction immediately following B; in VLIW mode, the NDA is the first address of the dispatch packet that follows the one containing B, as shown in Fig. 5.
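The NDA computation in the dispatch stage can be sketched as follows. The 4-byte instruction size and the function and parameter names are assumptions made for the example; the patent states only the 256-bit fetch packet width and does not give an instruction encoding.

```python
INSTR_BYTES = 4  # assumed fixed instruction size; not stated in the patent


def compute_nda(branch_addr: int, packet_start: int, packet_len: int,
                vliw_mode: bool) -> int:
    """Return the NDA for a branch (jump) instruction in a dispatch packet.

    packet_start is the address of the first instruction in the dispatch
    packet and packet_len is the number of instructions it holds.
    """
    if vliw_mode:
        # VLIW mode: first address of the dispatch packet after this one.
        return packet_start + packet_len * INSTR_BYTES
    # Superscalar mode: address of the instruction right after the branch.
    return branch_addr + INSTR_BYTES


# Branch at 0x108 inside a six-instruction dispatch packet starting at 0x100.
nda_superscalar = compute_nda(0x108, 0x100, 6, vliw_mode=False)  # 0x10C
nda_vliw = compute_nda(0x108, 0x100, 6, vliw_mode=True)          # 0x118
```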

After a branch instruction has executed, its branch information, including the NDA, is stored in the BTB, so the NDA is available the next time the same branch instruction is executed. In the fetch stage, every instruction in a fetch packet is examined to find branch instructions. When one is found, the BTB is indexed with its PC value to look up its branch information and obtain its NDA. A valid signal is then set for each instruction: 1 means the instruction is valid and 0 means it is invalid. The valid value of each instruction after the NDA in the fetch packet is set to 0 and that of every other instruction is set to 1, as shown in Fig. 6. The position of the NDA in the fetch packet differs between the two modes. In superscalar mode, the NDA immediately follows the branch instruction: it is the address of the next instruction. In VLIW mode, the NDA does not immediately follow the branch instruction, and the instructions between the branch instruction and the NDA are those that may execute in parallel with it.
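A minimal sketch of the valid marking in the fetch stage is given below. It assumes 4-byte instructions, so that a 256-bit fetch packet holds eight of them, and it reads "after the NDA" as "at or beyond the NDA address", which is the reading consistent with the superscalar case, where the instruction right after the branch (jump) instruction must be invalid. Both points are assumptions of the example, and the function name is hypothetical.

```python
from typing import List

INSTR_BYTES = 4          # assumed instruction size
FETCH_PACKET_BYTES = 32  # 256-bit fetch packet, as stated in the embodiment


def mark_valid(fetch_base: int, nda: int) -> List[int]:
    """Return one valid flag per instruction slot in the fetch packet.

    Slots whose address is below the NDA get 1; slots at or beyond the
    NDA get 0 (assumed reading of "after the NDA").
    """
    slots = FETCH_PACKET_BYTES // INSTR_BYTES
    return [1 if fetch_base + i * INSTR_BYTES < nda else 0
            for i in range(slots)]


# Fetch packet at 0x100 with a branch at 0x108 (third slot).
valid_superscalar = mark_valid(0x100, nda=0x10C)  # NDA right after branch
valid_vliw = mark_valid(0x100, nda=0x118)         # NDA at next dispatch packet
```

In the superscalar case only the slots up to and including the branch stay valid; in the VLIW case the three slots between the branch and the NDA also stay valid, which is what later lets the dispatch stage tell the two modes apart.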

When the instructions move down the pipeline into the dispatch stage, the position of the NDA determines whether the mode is superscalar or VLIW. Fig. 5 shows the position of the NDA in the dispatch packet. In superscalar mode, at most two instructions issue, and no instruction may follow the branch instruction in parallel, so the instructions after the branch instruction in the dispatch packet are not issued. In VLIW mode, up to six instructions issue, and instructions may follow the branch instruction in parallel, so the instructions after the branch instruction in the dispatch packet can issue together with it.
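The mode decision in the dispatch stage reduces to inspecting the valid flag of the instruction that follows the branch (jump) instruction. The sketch below illustrates this rule; the function name, the list-based packet representation, and the choice to return `None` when the packet has no branch or the branch is the last instruction in it are assumptions, since the patent does not describe those cases.

```python
from typing import List, Optional


def infer_mode(is_branch: List[bool], valid: List[int]) -> Optional[str]:
    """Infer the mode of the first branch in a dispatch packet.

    is_branch[i] tells whether slot i holds a branch; valid[i] is the flag
    set in the fetch stage. Returns "superscalar", "VLIW", or None when the
    rule cannot be applied (assumed handling of the uncovered cases).
    """
    for i, flag in enumerate(is_branch):
        if flag:
            if i + 1 >= len(valid):
                return None  # no instruction follows the branch
            return "VLIW" if valid[i + 1] == 1 else "superscalar"
    return None  # no branch in this packet


mode_a = infer_mode([False, False, True, False], [1, 1, 1, 0])
mode_b = infer_mode([False, False, True, False, False, False], [1] * 6)
```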

When the instruction enters the execute stage, the branch prediction is checked. If it was correct, the pipeline continues executing. If it was wrong, the pipeline is flushed and the program counter is rewritten to the NDA, so that execution restarts from the NDA.
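The recovery step in the execute stage can be summarized in a few lines. This is an illustrative model only: the function name and the tuple it returns (a flush flag and the next program counter value) are hypothetical, and the sketch covers just the behavior stated in the text, namely redirecting to the NDA on a misprediction.

```python
from typing import Tuple


def resolve_branch(prediction_correct: bool, fetch_pc: int,
                   nda: int) -> Tuple[bool, int]:
    """Return (flush_pipeline, next_pc) once the branch outcome is known.

    fetch_pc is the address the front end would fetch next if nothing
    changes; nda is the value read from the BTB entry of the branch.
    """
    if prediction_correct:
        return (False, fetch_pc)  # keep going, nothing to undo
    return (True, nda)            # flush and restart from the NDA


keep = resolve_branch(True, fetch_pc=0x400, nda=0x10C)
redo = resolve_branch(False, fetch_pc=0x400, nda=0x10C)
```

Because the NDA already encodes the mode (next instruction in superscalar mode, next dispatch packet in VLIW mode), the recovery logic itself does not need to know which mode the branch was in.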

In summary, the branch prediction technique of the present invention for a hybrid superscalar and VLIW architecture processor performs branch prediction on branch instructions while making full use of the advantages of both modes. It preserves the processor's inherent efficiency and obtains a substantial performance gain at the cost of very little additional hardware.

The above is only a preferred embodiment of the present invention, but the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be that defined by the claims.

Claims

1. A branch prediction method supporting a hybrid superscalar and very long instruction word (VLIW) architecture processor, characterized by comprising:

in the fetch stage, obtaining from the branch target buffer (BTB, Branch Target Buffer) the next dispatch-packet address (NDA, Next Dispatch-packet Address of branch instruction) of the dispatch packet containing a branch instruction, determining from the NDA whether the instructions that follow the branch instruction in the fetch packet containing it may execute, and marking them with valid values;

in the dispatch stage, determining from the valid values whether the branch instruction is in superscalar mode or VLIW mode;

in VLIW mode, allowing the instructions after the branch instruction in the dispatch packet to execute in parallel with the branch instruction;

in the execute stage, determining whether the branch prediction was correct; if it was correct, continuing execution; if it was wrong:

in VLIW mode, restarting execution from the NDA;

wherein the method of determining, from the NDA, whether the instructions that follow the branch instruction in the fetch packet containing it may execute is:

within a fetch packet, the valid value of each instruction after the NDA is set to 0 and the valid value of every other instruction is set to 1; a valid value of 1 means the instruction is valid and a valid value of 0 means the instruction is invalid; if the valid value is 0 the instruction is not executed, and if the valid value is 1 the instruction is executed;

and wherein the rule for determining in the dispatch stage, from the valid values, whether the branch instruction is in superscalar mode or VLIW mode is:

within the same dispatch packet, if the valid value of the instruction following the branch instruction is 0, the mode is superscalar; if the valid value of the instruction following the branch instruction is 1, the mode is VLIW.

2. The branch prediction method supporting a hybrid superscalar and VLIW architecture processor according to claim 1, characterized in that:

in superscalar mode, the NDA is the address of the instruction immediately following the branch instruction;

in VLIW mode, the NDA is the first address of the dispatch packet that follows the dispatch packet containing the branch instruction.