CN103218209A - Method and apparatus for controlling branch prediction logic - Google Patents

Method and apparatus for controlling branch prediction logic

Info

Publication number
CN103218209A
CN103218209A · CN2013100243776A · CN201310024377A
Authority
CN
China
Prior art keywords
branch prediction
state
prediction logic
instruction
hypervisor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013100243776A
Other languages
Chinese (zh)
Other versions
CN103218209B (en)
Inventor
P.E.沙特
R.A.希勒
M.R.塔布斯
A.J.穆夫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN103218209A
Application granted
Publication of CN103218209B
Status: Active


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30181 - Instruction operation extension or modification
    • G06F 9/30189 - Instruction operation extension or modification according to execution mode, e.g. mode flag
    • G06F 9/38 - Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3802 - Instruction prefetching
    • G06F 9/3804 - Instruction prefetching for branches, e.g. hedging, branch folding
    • G06F 9/3806 - Instruction prefetching for branches using address prediction, e.g. return stack, branch history buffer
    • G06F 9/3836 - Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F 9/3842 - Speculative instruction execution
    • G06F 9/3844 - Speculative instruction execution using dynamic branch prediction, e.g. using branch history tables
    • G06F 9/3848 - Speculative instruction execution using hybrid branch prediction, e.g. selection between prediction techniques
    • G06F 9/3851 - Instruction issuing from multiple instruction streams, e.g. multistreaming
    • G06F 9/3861 - Recovery, e.g. branch miss-prediction, exception handling
    • G06F 9/3863 - Recovery using multiple copies of the architectural state, e.g. shadow registers
    • G06F 9/44 - Arrangements for executing specific programs
    • G06F 9/455 - Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533 - Hypervisors; Virtual machine monitors

Abstract

The invention provides a method and an apparatus for controlling branch prediction logic. A hypervisor and one or more programs hosted by the hypervisor, e.g., guest operating systems and/or user processes or applications, are configured to selectively save and restore the state of branch prediction logic through separate hypervisor-mode and guest-mode and/or user-mode instructions. By doing so, different branch prediction strategies may be employed for different operating systems and the user applications hosted thereby, providing finer-grained optimization of the branch prediction logic.

Description

Method and apparatus for controlling branch prediction logic
Technical field
The present invention relates generally to data processing, and in particular to processor architectures and the branch prediction logic incorporated therein.
Background
As semiconductor technology continues to inch closer to practical limits in terms of increases in clock speed, architects are increasingly focusing on parallelism in processor architectures to obtain performance improvements. At the chip level, multiple processing cores are often disposed on the same chip, functioning in much the same manner as separate processor chips, or to some extent, as completely separate computers. In addition, even within cores, parallelism is employed through the use of multiple execution units that are specialized to handle certain types of operations. Pipelining is also employed in many instances so that certain operations that may take multiple clock cycles to perform are broken up into stages, enabling other operations to be started prior to the completion of earlier operations. Multithreading is also employed to enable multiple instruction streams to be processed in parallel, enabling more overall work to be performed in any given clock cycle.
Another area in which processor design has advanced is branch prediction, which attempts to predict, before a conditional branch instruction is executed, whether that instruction will branch to a different code path or continue along the same code path, based on the outcome of some comparison performed in association with the branch instruction. Branch prediction may be used, for example, to prefetch instructions from a cache or from lower-level memory, reducing the latency of loading and executing those instructions once the branch instruction is finally resolved. In addition, in highly pipelined architectures, branch prediction may be used to begin executing instructions from a predicted branch before the branch instruction is resolved, so that the results of those instructions can be committed as soon as possible after the branch is resolved.
When a branch is predicted correctly, a substantial performance gain may be obtained, since there may be very little latency between execution of the branch instruction and execution of the instructions predicted to follow it. When a branch is mispredicted, on the other hand, the execution pipeline frequently must be flushed and the state of the processor essentially rewound so that the instructions from the correct path can be executed.
Consequently, significant effort has been made in the art to improve the accuracy of branch prediction and thereby minimize the frequency of branch mispredictions by branch prediction logic. Many branch prediction logic implementations, for example, rely on historical information and are based on the assumption that if a branch was taken the last time a branch instruction was executed, a likelihood exists that the branch will be taken the next time that instruction is executed. In many implementations, for example, a branch history table is used to store entries associated with particular branch instructions, so that when those branch instructions are encountered, a prediction can be made based on the data stored in association with them.
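A minimal sketch of such a history-based predictor is shown below, using a table of two-bit saturating counters indexed by a hash of the branch address; the table size, index function, and names are illustrative assumptions only, not details taken from the patent.

    /* Minimal sketch of a history-based branch predictor using a table of
     * 2-bit saturating counters, as commonly described for branch history
     * tables.  Table size and hash are illustrative assumptions only. */
    #include <stdint.h>
    #include <stdbool.h>

    #define BHT_ENTRIES 1024            /* assumed table size (power of two) */

    static uint8_t bht[BHT_ENTRIES];    /* 0..1 predict not-taken, 2..3 predict taken */

    static unsigned bht_index(uint64_t branch_address)
    {
        return (unsigned)((branch_address >> 2) & (BHT_ENTRIES - 1));
    }

    /* Prediction consulted when the branch instruction is fetched. */
    bool bht_predict_taken(uint64_t branch_address)
    {
        return bht[bht_index(branch_address)] >= 2;
    }

    /* Update with the actual outcome once the branch is resolved. */
    void bht_update(uint64_t branch_address, bool taken)
    {
        uint8_t *ctr = &bht[bht_index(branch_address)];
        if (taken)  { if (*ctr < 3) (*ctr)++; }
        else        { if (*ctr > 0) (*ctr)--; }
    }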
Implementing branch prediction logic in a processor, however, presents a number of challenges. For example, improving the accuracy of branch prediction logic often requires more complex logic, which can both slow down branch prediction and increase the amount of logic circuitry required to implement it. For history-based branch prediction logic, accuracy is often proportional to the amount of historical information stored by the logic; however, enlarging the storage capacity of a branch history table requires additional logic circuitry. In many applications it is desirable to minimize the amount of logic circuitry in a processor chip that is dedicated to branch prediction, for example to reduce power consumption and/or cost, or to free up space for implementing other functionality.
In addition, it has been found that branch prediction algorithms are often less suitable for certain types of program code. Some program code, such as a binary tree search, exhibits essentially random branching characteristics, so that the branch decision made during one execution of a branch instruction may provide no insight into the decision that will be made the next time the instruction is executed. Furthermore, in multithreaded environments in which multiple threads execute concurrently in a processing core, the limited size of a branch prediction table shared by those threads can cause historical information to be discarded frequently as new branch instructions are encountered, so that the historical information for a particular branch instruction may no longer be in the table by the time that instruction is executed again.
Indeed, it has been found that in some cases, when the percentage of mispredictions rises to the point where the cost of mispredictions exceeds the latency that would be incurred if the processing core simply waited for a branch instruction to be resolved before attempting to execute instructions in the appropriate code path, branch prediction can actually degrade performance.
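The break-even point described above can be illustrated with a back-of-the-envelope calculation; the cycle counts in the following sketch are assumed example values and do not come from the patent.

    /* Illustration of when prediction stops paying off.  All cycle counts
     * are assumed example values, not figures from the patent. */
    #include <stdio.h>

    int main(void)
    {
        double flush_penalty = 20.0;   /* cycles lost per misprediction (assumed) */
        double stall_penalty = 8.0;    /* cycles lost per branch if the core simply
                                          waits for the branch to resolve (assumed) */

        /* Expected cost per branch with prediction: miss_rate * flush_penalty.
           Prediction hurts once that exceeds the fixed stall cost. */
        double break_even_miss_rate = stall_penalty / flush_penalty;
        printf("prediction degrades performance above a %.0f%% miss rate\n",
               100.0 * break_even_miss_rate);   /* 40% with these numbers */
        return 0;
    }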
Some conventional processor designs have provided the ability to selectively disable branch prediction logic. In addition, some conventional processor designs have provided the ability to save and restore the state of branch prediction logic. In particular, history-based branch prediction logic tends to improve in accuracy over time as more historical information is collected; however, if multiple independent threads access branch prediction logic having limited storage space, the collection of historical information for one thread may cause the historical information for other threads to be discarded. By saving and restoring the state of the branch prediction logic, however, the logic can often be "primed" for different sections of code, so that historical information previously collected for those code sections is more likely to reside in the branch prediction logic the next time those code sections are executed.
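Conceptually, this priming amounts to copying the predictor's history out on one switch and back in on the next, as in the following sketch; the flat array representation and the function names are assumptions made for illustration.

    /* Sketch of "priming" a predictor by saving and restoring its history
     * state around a switch between code sections.  Names and the flat
     * array representation are illustrative assumptions. */
    #include <stdint.h>
    #include <string.h>

    #define BP_TABLE_ENTRIES 1024

    struct bp_state {                               /* snapshot of predictor history */
        uint8_t counters[BP_TABLE_ENTRIES];
    };

    static uint8_t live_counters[BP_TABLE_ENTRIES]; /* state inside the predictor */

    void bp_state_save(struct bp_state *dst)
    {
        memcpy(dst->counters, live_counters, sizeof live_counters);
    }

    void bp_state_restore(const struct bp_state *src)
    {
        memcpy(live_counters, src->counters, sizeof src->counters);
    }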
While the ability to selectively disable branch prediction logic and to save and restore its state addresses some of the shortcomings of conventional branch prediction, conventional designs have nonetheless been characterized by a lack of flexibility in addressing different scenarios, particularly in more complex, high-performance data processing systems capable of executing many different types of applications with widely varying operating characteristics.
For example, many high-performance data processing systems utilize virtualization, which enables multiple operating systems to be hosted on a shared hardware platform under the management of a layer of supervisory software commonly referred to as a hypervisor. Each operating system, running as a guest of the hypervisor, may in turn host one or more user applications running as separate processes within that operating system's environment. Numerous different applications, with widely differing characteristics for which different algorithms may be better suited from a branch prediction standpoint, may therefore coexist in such a system, making it difficult to provide a branch prediction strategy that is optimally suited to all scenarios.
Therefore, a significant need continues to exist in the art for a flexible and efficient manner of controlling the branch prediction logic in a processing core.
Summary of the invention
The invention addresses these and other problems associated with the prior art by providing support that permits both a hypervisor and one or more programs hosted by the hypervisor (e.g., guest operating systems and/or user processes or applications) to selectively save and restore the state of branch prediction logic through separate hypervisor-mode and guest-mode and/or user-mode instructions. In this manner, different branch prediction strategies may be employed for different hosted operating systems and user applications, thereby providing finer-grained optimization of the branch prediction logic.
According to one aspect of the invention, branch prediction logic in a data processing system is controlled by: saving a first state of branch prediction logic in a processing core in response to the processing core executing a first hypervisor-mode instruction from a hypervisor resident in the data processing system; restoring the first state of the branch prediction logic in response to the processing core executing a second hypervisor-mode instruction from the hypervisor; saving a second state of the branch prediction logic in response to the processing core executing a third instruction from a program hosted by the hypervisor; and restoring the second state of the branch prediction logic in response to the processing core executing a fourth instruction from the program hosted by the hypervisor.
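In pseudo-C, the four instructions recited above might be exercised along the following lines; the function names are invented placeholders standing in for hypervisor-mode and guest/user-mode instructions, and nothing in the sketch reflects an actual instruction set.

    /* Pseudo-C illustration of the four instructions recited above.  The
     * names (hv_bp_save, etc.) are invented placeholders, not a real ISA;
     * each stands in for a hypervisor-mode or guest/user-mode instruction
     * executed by the processing core. */
    #include <stdint.h>

    #define BP_STATE_WORDS 256                 /* assumed size of the saved state */

    struct bp_state { uint32_t words[BP_STATE_WORDS]; };

    static struct bp_state core_bp;            /* stand-in for the core's live predictor state */

    /* Placeholders for what would be hypervisor-mode instructions. */
    static void hv_bp_save(struct bp_state *s)          { *s = core_bp; }
    static void hv_bp_restore(const struct bp_state *s) { core_bp = *s; }

    /* Placeholders for the corresponding guest-mode/user-mode instructions. */
    static void guest_bp_save(struct bp_state *s)          { *s = core_bp; }
    static void guest_bp_restore(const struct bp_state *s) { core_bp = *s; }

    /* A hypervisor switching between two logical partitions would bracket the
     * switch with the first and second instructions... */
    void hypervisor_partition_switch(struct bp_state *out, const struct bp_state *in)
    {
        hv_bp_save(out);
        hv_bp_restore(in);
    }

    /* ...while a guest OS switching between two of its own processes would use
     * the third and fourth instructions, independently of the hypervisor. */
    void guest_process_switch(struct bp_state *out, const struct bp_state *in)
    {
        guest_bp_save(out);
        guest_bp_restore(in);
    }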
These and other advantages and features, which characterize the invention, are set forth in the claims annexed hereto and forming a further part hereof. For a better understanding of the invention, however, and of the advantages and objectives attained through its use, reference should be made to the drawings and to the accompanying descriptive matter, in which exemplary embodiments of the invention are described.
Description of drawings
Fig. 1 is a block diagram of exemplary automated computing machinery including an exemplary computer useful in data processing consistent with embodiments of the invention;
Fig. 2 is a block diagram of an exemplary NOC implemented in the computer of Fig. 1;
Fig. 3 is a block diagram illustrating in greater detail an exemplary implementation of a node from the NOC of Fig. 2;
Fig. 4 is a block diagram illustrating an exemplary implementation of an IP block from the NOC of Fig. 2;
Fig. 5 is a block diagram of a data processing system supporting virtualization and capable of implementing fine-grained control of branch prediction logic consistent with the invention;
Fig. 6 is a block diagram of an exemplary enable mode control register from the special registers referenced in Fig. 5;
Fig. 7 is a block diagram of an exemplary process-specific enable mode data structure capable of being used in the data processing system of Fig. 5;
Fig. 8 is a block diagram of an exemplary thread-specific enable mode data structure capable of being used in the data processing system of Fig. 5;
Fig. 9 is a flowchart illustrating an exemplary sequence of operations performed by the data processing system of Fig. 5 when performing context switches between hypervisor, guest operating system and user-mode program code with selective enablement of branch prediction logic;
Fig. 10 is a block diagram of an exemplary save mode control register from the special registers referenced in Fig. 5;
Fig. 11 is a block diagram of an exemplary state load/store unit capable of being used to save and restore branch prediction logic state data in the data processing system of Fig. 5;
Fig. 12 is a flowchart illustrating an exemplary sequence of operations performed by the data processing system of Fig. 5 when performing context switches between hypervisor, guest operating system and user-mode program code in connection with saving and restoring branch prediction logic state;
Fig. 13 is a flowchart illustrating an exemplary sequence of operations for saving branch prediction logic state as referenced in Fig. 12; and
Fig. 14 is a flowchart illustrating an exemplary sequence of operations for restoring branch prediction logic state as referenced in Fig. 12.
Detailed description
Embodiments consistent with the invention utilize fine-grained control of branch prediction logic by multiple levels of a virtualized data processing system in order to optimize branch prediction for different applications and workloads, thereby improving overall data processing system performance when handling dissimilar types of workloads.
In some embodiments, for example, a hypervisor and one or more guest operating systems resident in a data processing system and hosted by the hypervisor are configured to selectively enable or disable branch prediction logic through separate hypervisor-mode and guest-mode instructions. Similarly, in some embodiments a hypervisor and one or more programs hosted by the hypervisor (e.g., guest operating systems and/or user processes or applications) selectively save and restore the state of branch prediction logic through separate hypervisor-mode and guest-mode and/or user-mode instructions. By controlling branch prediction logic in one or both of these manners, different branch prediction strategies may be employed for the different hosted operating systems and user applications, providing finer-grained optimization of the branch prediction logic in the processing cores of the data processing system.
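One way to picture the per-mode enablement just described is as a small control register with independent enable bits consulted on each mode transition; the bit layout and names in the following sketch are assumptions for illustration and are not the register of Fig. 6.

    /* Illustrative model of a branch-prediction enable control register with
     * independent bits for hypervisor, guest and user mode.  The bit layout
     * is an assumption for illustration, not the register shown in Fig. 6. */
    #include <stdint.h>
    #include <stdbool.h>

    enum exec_mode { MODE_HYPERVISOR, MODE_GUEST, MODE_USER };

    #define BP_EN_HV    (1u << 0)   /* predict while running hypervisor code */
    #define BP_EN_GUEST (1u << 1)   /* predict while running guest OS code   */
    #define BP_EN_USER  (1u << 2)   /* predict while running user code       */

    static uint32_t bp_mode_ctrl = BP_EN_HV | BP_EN_GUEST;   /* example policy */

    bool bp_enabled_for(enum exec_mode m)
    {
        switch (m) {
        case MODE_HYPERVISOR: return bp_mode_ctrl & BP_EN_HV;
        case MODE_GUEST:      return bp_mode_ctrl & BP_EN_GUEST;
        default:              return bp_mode_ctrl & BP_EN_USER;
        }
    }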
In this regard, a hypervisor may include any number of supervisory-mode programs capable of virtualizing or hosting one or more guest operating systems, and may be implemented in software at various levels (e.g., as firmware, a kernel, etc.). A hypervisor typically virtualizes the underlying hardware of the data processing system and provides an interface to the hosted operating systems, so that each operating system operates as if it were the only operating system resident on the physical hardware of the data processing system. A guest operating system is typically an operating system, logical partition, virtual machine, or component thereof that is hosted by an underlying hypervisor and that supports the execution of one or more user applications running in one or more concurrent processes within the operating system environment. Similar to a hypervisor, a guest operating system essentially virtualizes the hardware resources allocated to it and allocates those resources to one or more user applications and processes. A user application, in turn, may be any program capable of running within a process of a guest operating system.
It will be appreciated that in many data processing environments, the processor architectures upon which such different levels of software execute concurrently support different priorities and access rights. In a virtualized environment, hypervisor program code typically runs in a supervisor or hypervisor mode, while user applications typically run in a lower-priority user mode. A guest operating system may also run in the supervisor mode, or may run in a separate guest mode that is intermediate between the supervisor mode and the user mode.
It will also be appreciated that branch prediction logic may incorporate any number of logic designs whose principal aim is to minimize the latency associated with branch instructions. Many branch prediction logic designs utilize branch history tables, and many may also include components such as g-share logic, link stack logic, branch target buffers, and the like.
As noted above, in some embodiments of the invention, branch prediction logic can be selectively enabled and disabled for any or all of a hypervisor, guest operating systems, and user applications and processes. In this regard, enabling or disabling branch prediction logic may be considered to include enabling or disabling either all of the components implemented in a particular branch prediction logic design or only a subset of those components. In addition, disabling branch prediction logic may cause the branch prediction logic to be reset, e.g., in some embodiments clearing entries from a branch prediction table. In other embodiments, however, disabling branch prediction logic may be analogous to "pausing" the logic, e.g., such that no predictions are made and no historical information is collected, but the current state of the collected historical information and other aspects of the branch prediction logic is retained until the logic is re-enabled, so that no state or historical information is lost.
In addition, as noted above, in some embodiments of the invention the state of branch prediction logic may be selectively saved and restored on behalf of a hypervisor, guest operating systems, and/or user applications or processes. The state that may be saved or restored may include any or all of the data maintained in the branch prediction logic that characterizes the operational state of that logic, including, for example, branch table entries, branch target buffer data, link stack entries, g-share data, and the like.
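The kinds of state enumerated above could be gathered, purely for illustration, into an aggregate such as the following; the field names and table sizes are assumptions, and a real design would capture whatever its particular predictor maintains.

    /* Illustrative aggregate of the kinds of predictor state enumerated above
     * that a save/restore operation might capture.  Field names and sizes are
     * assumptions; a real design would save whatever its predictor maintains. */
    #include <stdint.h>

    #define N_BHT   1024
    #define N_BTB   64
    #define N_LINK  16

    struct branch_prediction_state {
        uint8_t  bht[N_BHT];          /* branch history table counters        */
        uint64_t btb_target[N_BTB];   /* branch target buffer: cached targets */
        uint64_t btb_tag[N_BTB];      /*   ...and the branch addresses (tags) */
        uint64_t link_stack[N_LINK];  /* return-address (link) stack entries  */
        uint16_t gshare_history;      /* global history register for g-share  */
    };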
Other variations and modifications will be apparent to one of ordinary skill in the art. Therefore, the invention is not limited to the specific implementations discussed herein.
Hardware and software environment
Turning now to the drawings, wherein like numbers denote like parts throughout the several views, Fig. 1 illustrates exemplary automated computing machinery including an exemplary computer 10 useful in data processing consistent with embodiments of the invention. The computer 10 of Fig. 1 includes at least one computer processor 12 or "CPU" as well as random access memory 14 ("RAM"), which is connected through a high speed memory bus 16 and bus adapter 18 to processor 12 and to other components of computer 10.
Stored in RAM 14 is an application program 20, a module of user-level computer program instructions for carrying out particular data processing tasks such as, for example, word processing, spreadsheets, database operations, video gaming, stock market simulations, atomic quantum process simulations, or other user-level applications. Also stored in RAM 14 is an operating system 22. Operating systems useful in connection with embodiments of the invention include UNIX™, Linux™, Microsoft Windows XP™, AIX™, IBM i5/OS™, and others as will occur to those of ordinary skill in the art. Operating system 22 and application 20 in the example of Fig. 1 are shown in RAM 14, but many components of such software are typically stored in non-volatile memory as well, e.g., on a disk drive 24.
As will become more apparent below, embodiments consistent with the invention may be implemented within network-on-chip (NOC) integrated circuit devices, or chips, and as such, computer 10 is illustrated as including two exemplary NOCs: a video adapter 26 and a coprocessor 28. NOC video adapter 26, which may alternatively be referred to as a graphics adapter, is an example of an I/O adapter specially designed for graphic output to a display device 30 such as a display screen or computer monitor. NOC video adapter 26 is connected to processor 12 through a high speed video bus 32, bus adapter 18, and front side bus 34, which is also a high speed bus. NOC coprocessor 28 is connected to processor 12 through bus adapter 18 and front side buses 34 and 36, which are also high speed buses. The NOC coprocessor of Fig. 1 may be optimized, for example, to accelerate particular data processing tasks at the behest of the main processor 12.
The exemplary NOC video adapter 26 and NOC coprocessor 28 of Fig. 1 each include a NOC, comprising integrated processor ("IP") blocks, routers, memory communications controllers, and network interface controllers, the details of which are discussed in greater detail below in connection with Figs. 2-3. The NOC video adapter and NOC coprocessor are each optimized for programs that use parallel processing and also require fast random access to shared memory. It will be appreciated by one of ordinary skill in the art having the benefit of the instant disclosure, however, that the invention may be implemented in devices and device architectures other than NOC devices and architectures. The invention is therefore not limited to implementation within a NOC device.
Computer 10 of Fig. 1 includes a disk drive adapter 38 coupled through an expansion bus 40 and bus adapter 18 to processor 12 and other components of computer 10. Disk drive adapter 38 connects non-volatile data storage to computer 10 in the form of disk drive 24, and may be implemented, for example, using Integrated Drive Electronics ("IDE") adapters, Small Computer System Interface ("SCSI") adapters, and others as will occur to those of ordinary skill in the art. Non-volatile computer memory may also be implemented as an optical disk drive, electrically erasable programmable read-only memory (so-called "EEPROM" or "flash" memory), RAM drives, and so on, as will occur to those of ordinary skill in the art.
Computer 10 also includes one or more input/output ("I/O") adapters 42, which implement user-oriented input/output through, for example, software drivers and computer hardware for controlling output to display devices such as computer display screens, as well as user input from user input devices 44 such as keyboards and mice. In addition, computer 10 includes a communications adapter 46 for data communications with other computers 48 and for data communications with a data communications network 50. Such data communications may be carried out serially through RS-232 connections, through external buses such as a Universal Serial Bus ("USB"), through data communications networks such as IP data communications networks, and in other ways as will occur to those of ordinary skill in the art. Communications adapters implement the hardware level of data communications through which one computer sends data communications to another computer, directly or through a data communications network. Examples of communications adapters suitable for use in computer 10 include modems for wired dial-up communications, Ethernet (IEEE 802.3) adapters for wired data communications network communications, and 802.11 adapters for wireless data communications network communications.
For further explanation, Fig. 2 sets forth a functional block diagram of an exemplary NOC 102 according to embodiments of the present invention. The NOC in Fig. 2 is implemented on a "chip" 100, that is, on an integrated circuit. NOC 102 includes integrated processor ("IP") blocks 104, routers 110, memory communications controllers 106, and network interface controllers 108 grouped into interconnected nodes. Each IP block 104 is adapted to a router 110 through a memory communications controller 106 and a network interface controller 108. Each memory communications controller controls communications between an IP block and memory, and each network interface controller 108 controls inter-IP block communications through routers 110.
In NOC 102, each IP block represents a reusable unit of synchronous or asynchronous logic design used as a building block for data processing within the NOC. The term "IP block" is sometimes expanded as "intellectual property block," effectively designating an IP block as a design that is owned by a party, that is, the intellectual property of a party, to be licensed to other users or designers of semiconductor circuits. Within the scope of the present invention, however, there is no requirement that IP blocks be subject to any particular ownership, so the term is always expanded in this specification as "integrated processor block." IP blocks, as specified herein, are reusable units of logic, cell, or chip layout design that may or may not be the subject of intellectual property. IP blocks are logic cores that can be formed as ASIC chip designs or FPGA logic designs.
One way to describe IP blocks by analogy is that IP blocks are for NOC design what a library is for computer programming or a discrete integrated circuit component is for printed circuit board design. In NOCs consistent with embodiments of the present invention, IP blocks may be implemented as generic gate netlists, as complete special-purpose or general-purpose microprocessors, or in other ways as may occur to those of ordinary skill in the art. A netlist is a Boolean-algebra representation (gates, standard cells) of an IP block's logical function, analogous to an assembly-code listing for a high-level program application. NOCs may also be implemented, for example, in synthesizable form, described in a hardware description language such as Verilog or VHDL. In addition to netlist and synthesizable implementations, NOCs may also be delivered in lower-level, physical descriptions. Analog IP block elements such as a SERDES, PLL, DAC, or ADC may be distributed in a transistor-layout format such as GDSII. Digital elements of IP blocks are sometimes offered in layout format as well. It will also be appreciated that IP blocks, as well as other logic circuitry implemented consistent with the invention, may be distributed in the form of computer data files, e.g., logic definition program code, that define at various levels of detail the functionality and/or layout of the circuit arrangements implementing such logic. Thus, while the invention has and hereinafter will be described in the context of circuit arrangements implemented in fully functioning integrated circuit devices, data processing systems utilizing such devices, and other tangible, physical hardware circuits, those of ordinary skill in the art having the benefit of the instant disclosure will appreciate that the invention may also be implemented within a program product, and that the invention applies equally regardless of the particular type of computer-readable storage medium being used to distribute the program product. Examples of computer-readable storage media include, but are not limited to, physical, recordable-type media such as volatile and non-volatile memory devices, floppy disks, hard disk drives, CD-ROMs, and DVDs, among others.
Each IP block 104 in the example of Fig. 2 is adapted to a router 110 through a memory communications controller 106. Each memory communications controller is an aggregation of synchronous and asynchronous logic circuitry adapted to provide data communications between an IP block and memory. Examples of such communications between IP blocks and memory include memory load instructions and memory store instructions. The memory communications controllers 106 are described in more detail below with reference to Fig. 3. Each IP block 104 is also adapted to a router 110 through a network interface controller 108, which controls communications through routers 110 between IP blocks 104. Examples of communications between IP blocks include messages carrying data and instructions for processing the data among IP blocks in parallel applications and in pipelined applications. The network interface controllers 108 are also described in more detail below with reference to Fig. 3.
The routers 110, and the corresponding links 118 between them, implement the network operations of the NOC. The links 118 may be packet structures implemented on physical, parallel wire buses connecting all the routers. That is, each link may be implemented on a wire bus wide enough to accommodate an entire data-switching packet simultaneously, including all header information and payload data. If a packet structure includes 64 bytes, for example, comprising an eight-byte header and 56 bytes of payload data, then the wire bus subtending each link is 64 bytes wide, or 512 wires. In addition, each link may be bidirectional, so that if the link packet structure includes 64 bytes, the wire bus actually contains 1024 wires between each router and each of its neighbors in the network. In such an implementation, a message could include more than one packet, but each packet fits precisely onto the width of the wire bus. Alternatively, a link may be implemented on a wire bus that is only wide enough to accommodate a portion of a packet, such that a packet is broken up into multiple beats; for example, if a link is implemented as 16 bytes in width, or 128 wires, a 64-byte packet would be broken into four beats. It will be appreciated that different implementations may use different bus widths based on practical physical limits as well as desired performance characteristics. If the connection between a router and each section of a wire bus is referred to as a port, then each router includes five ports, one for each of the four directions of data transmission on the network, and a fifth port adapting the router to a particular IP block through a memory communications controller and a network interface controller.
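The width-versus-beat relationship in the preceding paragraph can be checked with a few lines of arithmetic; the sketch below simply reproduces the numbers given in the text.

    /* Arithmetic check of the packet/beat relationship described above:
     * a 64-byte packet (8-byte header + 56-byte payload) on a full-width
     * 64-byte (512-wire) link moves in one beat; on a 16-byte (128-wire)
     * link it is broken into four beats. */
    #include <stdio.h>

    int main(void)
    {
        const unsigned packet_bytes = 8 /* header */ + 56 /* payload */;

        unsigned widths[] = { 64, 16 };            /* link widths in bytes */
        for (unsigned i = 0; i < 2; i++) {
            unsigned beats = (packet_bytes + widths[i] - 1) / widths[i];
            printf("%u-byte link = %u wires, %u-byte packet -> %u beat(s)\n",
                   widths[i], widths[i] * 8, packet_bytes, beats);
        }
        return 0;
    }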
Each memory communications controller 106 controls communications between an IP block and memory. Memory can include off-chip main RAM 112, memory 114 connected directly to an IP block through a memory communications controller 106, on-chip memory enabled as an IP block 116, and on-chip caches. In NOC 102, either of the on-chip memories 114, 116, for example, may be implemented as on-chip cache memory. All of these forms of memory can be disposed in the same address space, whether physical addresses or virtual addresses, and this is true even for memory attached directly to an IP block. Memory-addressed messages can therefore be entirely bidirectional with respect to IP blocks, because such memory can be addressed directly from any IP block anywhere on the network. Memory 116 on an IP block can be addressed from that IP block or from any other IP block in the NOC. Memory 114 attached directly to a memory communications controller can be addressed by the IP block that is adapted to the network by that memory communications controller, and can also be addressed from any other IP block anywhere in the NOC.
NOC 102 includes two memory management units ("MMUs") 120, 122, illustrating two alternative memory architectures for NOCs according to embodiments of the present invention. MMU 120 is implemented within an IP block, allowing a processor within that IP block to operate in virtual memory while allowing the entire remaining architecture of the NOC to operate in a physical memory address space. MMU 122 is implemented off-chip, connected to the NOC through a data communications port 124. Port 124 includes the pins and other interconnections required to conduct signals between the NOC and the MMU, as well as sufficient intelligence to convert message packets from the NOC packet format to the bus format required by the external MMU 122. The external location of the MMU means that all processors in all IP blocks of the NOC can operate in a virtual memory address space, with all conversions to physical addresses of the off-chip memory handled by the off-chip MMU 122.
In addition to the two memory architectures illustrated by the use of MMUs 120 and 122, data communications port 126 illustrates a third memory architecture useful in NOCs capable of being utilized in embodiments of the present invention. Port 126 provides a direct connection between an IP block 104 of NOC 102 and off-chip memory 112. With no MMU in the processing path, this architecture provides utilization of a physical address space by all the IP blocks of the NOC. In sharing the address space bidirectionally, all the IP blocks of the NOC can access memory in the address space by memory-addressed messages, including loads and stores, directed through the IP block connected directly to port 126. Port 126 includes the pins and other interconnections required to conduct signals between the NOC and the off-chip memory 112, as well as sufficient intelligence to convert message packets from the NOC packet format to the bus format required by the off-chip memory 112.
In the example of Fig. 2, one of the IP blocks is designated a host interface processor 128. Host interface processor 128 provides an interface between the NOC and a host computer 10 in which the NOC may be installed, and also provides data processing services to the other IP blocks on the NOC, including, for example, receiving and dispatching among the IP blocks of the NOC data processing requests from the host computer. A NOC may, for example, implement a video graphics adapter 26 or a coprocessor 28 on a larger computer 10, as described above with reference to Fig. 1. In the example of Fig. 2, host interface processor 128 is connected to the larger host computer through a data communications port 130. Port 130 includes the pins and other interconnections required to conduct signals between the NOC and the host computer, as well as sufficient intelligence to convert message packets from the NOC packet format to the bus format required by host computer 10. In the example of the NOC coprocessor in the computer of Fig. 1, such a port would provide data communications format translation between the link structure of NOC coprocessor 28 and the protocol required for the front side bus 36 between NOC coprocessor 28 and bus adapter 18.
Fig. 3 next illustrates a functional block diagram showing in greater detail the components implemented within an IP block 104, memory communications controller 106, network interface controller 108 and router 110 of NOC 102, collectively illustrated at 132. IP block 104 includes a computer processor 134 and I/O functionality 136. In this example, computer memory is represented by a segment of random access memory ("RAM") 138 in IP block 104. The memory, as described above with reference to Fig. 2, can occupy segments of a physical address space whose contents on each IP block are addressable and accessible from any IP block in the NOC. The processor 134, I/O capabilities 136, and memory 138 in each IP block effectively implement the IP block as a generally programmable microcomputer. As explained above, however, within the scope of the present invention IP blocks generally represent reusable units of synchronous or asynchronous logic used as building blocks for data processing within a NOC. Implementing IP blocks as generally programmable microcomputers, therefore, although a common embodiment useful for purposes of illustration, is not a limitation of the present invention.
In NOC 102 of Fig. 3, each memory communications controller 106 includes a plurality of memory communications execution engines 140. Each memory communications execution engine 140 is enabled to execute memory communications instructions from an IP block 104, including bidirectional memory communications instruction flow 141, 142, 144 between the network and IP block 104. The memory communications instructions executed by a memory communications controller may originate not only from the IP block adapted to a router through that particular memory communications controller, but also from any IP block 104 anywhere in NOC 102. That is, any IP block in the NOC can generate a memory communications instruction and transmit that instruction through the routers of the NOC to another memory communications controller associated with another IP block for execution. Such memory communications instructions can include, for example, translation lookaside buffer control instructions, cache control instructions, barrier instructions, and memory load and store instructions.
Each memory communications execution engine 140 is enabled to execute a complete memory communications instruction separately and in parallel with the other memory communications execution engines. The memory communications execution engines implement a scalable memory transaction processor optimized for concurrent throughput of memory communications instructions. Memory communications controller 106 supports multiple memory communications execution engines 140, all of which run concurrently for simultaneous execution of multiple memory communications instructions. A new memory communications instruction is allocated by memory communications controller 106 to a memory communications execution engine 140, and the memory communications execution engines 140 can accept multiple response events simultaneously. In this example, all of the memory communications execution engines 140 are identical. Scaling the number of memory communications instructions that can be handled simultaneously by a memory communications controller 106 is therefore implemented by scaling the number of memory communications execution engines 140.
In NOC 102 of Fig. 3, each network interface controller 108 is enabled to convert communications instructions from command format to network packet format for transmission among the IP blocks 104 through routers 110. The communications instructions may be formulated in command format by an IP block 104 or by a memory communications controller 106 and provided to a network interface controller 108 in command format. The command format may be a native format that conforms to the architectural register files of IP block 104 and memory communications controller 106. The network packet format is typically the format required for transmission through the routers 110 of the network. Each such message is composed of one or more network packets. Examples of such communications instructions that are converted from command format to packet format in the network interface controller include memory load instructions and memory store instructions between IP blocks and memory. Such communications instructions may also include communications instructions that send messages among IP blocks carrying data and instructions for processing the data among IP blocks in parallel applications and in pipelined applications.
In NOC 102 of Fig. 3, each IP block is enabled to send memory-address-based communications to and from memory through the IP block's memory communications controller and then through its network interface controller to the network. A memory-address-based communication is a memory access instruction, such as a load instruction or a store instruction, that is executed by a memory communications execution engine of the memory communications controller of an IP block. Such memory-address-based communications typically originate in an IP block, are formulated in command format, and are handed off to a memory communications controller for execution.
Many memory-address-based communications are executed with message traffic, because any memory to be accessed may be located anywhere in the physical memory address space, on-chip or off-chip, directly attached to any memory communications controller in the NOC, or ultimately accessed through any IP block of the NOC, regardless of which IP block originated the particular memory-address-based communication. Thus, in NOC 102, all memory-address-based communications that are executed with message traffic are passed from the memory communications controller to an associated network interface controller for conversion from command format to packet format and transmission through the network in a message. In converting to packet format, the network interface controller also identifies a network address for the packet based on the memory address or addresses to be accessed by the memory-address-based communication. Memory-address-based messages are addressed with memory addresses. Each memory address is mapped by the network interface controller to a network address, typically the network location of the memory communications controller responsible for some range of physical memory addresses. The network location of a memory communications controller 106 is naturally also the network location of that controller's associated router 110, network interface controller 108, and IP block 104. Instruction conversion logic 150 within each network interface controller is capable of converting memory addresses to network addresses for purposes of transmitting memory-address-based communications through the routers of the NOC.
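The mapping just described can be pictured as in the following sketch, in which each memory communications controller owns a contiguous slice of the physical address space and its mesh coordinates serve as the network address; the mesh dimensions and the even partitioning are assumptions made only for illustration.

    /* Sketch of the memory-address-to-network-address mapping performed by the
     * instruction conversion logic: each memory communications controller owns
     * a contiguous range of physical addresses, and that controller's (x, y)
     * mesh position serves as the network address.  The mesh size and the even
     * partitioning of the address space are illustrative assumptions. */
    #include <stdint.h>

    #define MESH_COLS 4
    #define MESH_ROWS 4
    #define BYTES_PER_NODE (1ull << 28)     /* assumed 256 MB per controller */

    struct net_addr { unsigned x, y; };

    struct net_addr memory_to_network_address(uint64_t phys_addr)
    {
        unsigned node = (unsigned)(phys_addr / BYTES_PER_NODE) % (MESH_COLS * MESH_ROWS);
        struct net_addr na = { node % MESH_COLS, node / MESH_COLS };
        return na;
    }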
Upon receiving message traffic from the routers 110 of the network, each network interface controller 108 inspects each packet for memory instructions. Each packet containing a memory instruction is handed to the memory communications controller 106 associated with the receiving network interface controller, which executes the memory instruction before sending the remaining payload of the packet to the IP block for further processing. In this way, memory contents are always prepared to support data processing by an IP block before the IP block begins execution of instructions from a message that depends upon particular memory content.
In NOC 102 of Fig. 3, each IP block 104 is enabled to bypass its memory communications controller 106 and send inter-IP block, network-addressed communications 146 directly to the network through the IP block's network interface controller 108. Network-addressed communications are messages directed by a network address to another IP block. Such messages transmit working data in pipelined applications, multiple data for single-program processing among IP blocks in a SIMD application, and so on, as will occur to those of ordinary skill in the art. Such messages are distinct from memory-address-based communications in that they are network addressed from the start by the originating IP block, which knows the network address to which the message is to be directed through the routers of the NOC. Such network-addressed communications are passed by the IP block through its I/O functions 136 directly to the IP block's network interface controller in command format, then converted to packet format by the network interface controller and transmitted through the routers of the NOC to another IP block. Such network-addressed communications 146 are bidirectional, potentially proceeding to and from each IP block of the NOC, depending upon their use in any particular application. Each network interface controller, however, is enabled both to send such communications to and receive such communications from an associated router, and each network interface controller is also enabled both to send such communications directly to and receive such communications directly from an associated IP block, bypassing the associated memory communications controller 106.
Each network interface controller 108 in the example of Fig. 3 is also enabled to implement virtual channels on the network, characterizing network packets by type. Each network interface controller 108 includes virtual channel implementation logic 148 that classifies each communications instruction by type and records the type of instruction in a field of the network packet format before handing off the instruction in packet form to a router 110 for transmission on the NOC. Examples of communications instruction types include inter-IP block network-address-based messages, request messages, responses to request messages, invalidation messages directed to caches, memory load and store messages, and responses to memory load messages, among others.
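Purely as an illustration, the instruction types listed above could be recorded in a packet-header field along the following lines; the enumeration values and header layout are assumptions and do not reflect the actual packet format.

    /* Illustrative enumeration of the communications-instruction types listed
     * above, as the virtual channel implementation logic might record them in
     * a packet-header field.  The encoding and header layout are assumptions. */
    #include <stdint.h>

    enum vc_type {
        VC_INTER_IP_MESSAGE,     /* inter-IP block network-addressed message */
        VC_REQUEST,              /* request message                          */
        VC_RESPONSE,             /* response to a request message            */
        VC_CACHE_INVALIDATE,     /* invalidation directed to a cache         */
        VC_MEM_LOAD_STORE,       /* memory load and store messages           */
        VC_MEM_LOAD_RESPONSE,    /* response to a memory load message        */
        VC_TYPE_COUNT
    };

    struct packet_header {
        uint8_t  vc_type;        /* field written by virtual channel logic 148 */
        uint8_t  reserved;
        uint16_t length;
        uint32_t network_address;
    };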
Each router 110 in the example of Fig. 3 includes routing logic 152, virtual channel control logic 154, and virtual channel buffers 156. The routing logic is typically implemented as a network of synchronous and asynchronous logic that implements a data communications protocol stack for data communications in the network formed by the routers 110, links 118, and bus wires among the routers. Routing logic 152 includes functionality that readers of skill in the art might associate in off-chip networks with routing tables, which in at least some embodiments are considered too slow and cumbersome for use in a NOC. Routing logic implemented as a network of synchronous and asynchronous logic can be configured to make routing decisions as fast as a single clock cycle. The routing logic in this example routes packets by selecting a port for forwarding each packet received in a router. Each packet contains a network address to which the packet is to be routed.
In describing memory-address-based communications above, each memory address was described as being mapped by the network interface controllers to a network address, that is, to the network location of a memory communications controller. The network location of a memory communications controller 106 is naturally also the network location of that controller's associated router 110, network interface controller 108, and IP block 104. In inter-IP block, or network-address-based, communications it is therefore also typical for application-level data processing to view network addresses as the location of an IP block within the network formed by the routers, links, and bus wires of the NOC. Fig. 2 illustrates that one organization of such a network is a mesh of rows and columns, in which each network address can be implemented, for example, as either a unique identifier for each set of associated router, IP block, memory communications controller, and network interface controller of the mesh, or as the x, y coordinates of each such set in the mesh.
In NOC 102 of Fig. 3, each router 110 implements two or more virtual communications channels, where each virtual communications channel is characterized by a communication type. The communications instruction types, and hence the virtual channel types, include those mentioned above: inter-IP block network-address-based messages, request messages, responses to request messages, invalidation messages directed to caches, memory load and store messages, responses to memory load messages, and so on. In support of virtual channels, each router 110 in the example of Fig. 3 also includes virtual channel control logic 154 and virtual channel buffers 156. The virtual channel control logic 154 examines each received packet for its assigned communications type and places each packet in an outgoing virtual channel buffer for that communications type, for transmission through a port to a neighboring router on the NOC.
Each virtual channel buffer 156 has finite storage space. When many packets are received in a short period of time, a virtual channel buffer can fill up, so that no more packets can be placed in the buffer. In other protocols, packets arriving on a virtual channel whose buffer is full would be dropped. Each virtual channel buffer 156 in this example, however, is enabled with control signals of the bus wires to advise surrounding routers, through the virtual channel control logic, to suspend transmission in a virtual channel, that is, to suspend transmission of packets of a particular communications type. When one virtual channel is so suspended, all other virtual channels are unaffected and can continue to operate at full capacity. The control signals are wired all the way back through each router to each router's associated network interface controller 108. Each network interface controller is configured, upon receipt of such a signal, to refuse to accept from its associated memory communications controller 106 or from its associated IP block 104 communications instructions for the suspended virtual channel. In this way, suspension of a virtual channel affects all of the hardware that implements the virtual channel, all the way back up to the originating IP blocks.
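The per-channel back-pressure behavior described above can be sketched as a small software model; the buffer depth, channel count, and the boolean stand-in for the control wires are assumptions for illustration only.

    /* Sketch of the per-virtual-channel back-pressure described above: when a
     * channel's buffer fills, the router asserts a suspend signal back toward
     * the sender instead of dropping packets; other channels keep flowing.
     * Buffer depth and the boolean wire model are illustrative assumptions. */
    #include <stdbool.h>

    #define VC_COUNT 6
    #define VC_DEPTH 4           /* assumed buffer slots per virtual channel */

    struct vc_buffer {
        int  occupancy;          /* packets currently queued                 */
        bool suspend;            /* control signal driven back to the sender */
    };

    static struct vc_buffer vc[VC_COUNT];

    /* Returns false (and asserts suspend) instead of ever discarding a packet. */
    bool vc_try_enqueue(int channel)
    {
        struct vc_buffer *b = &vc[channel];
        if (b->occupancy == VC_DEPTH) {
            b->suspend = true;                /* tell upstream to pause this type */
            return false;
        }
        b->occupancy++;
        return true;
    }

    void vc_dequeue(int channel)              /* called when a packet is forwarded */
    {
        struct vc_buffer *b = &vc[channel];
        if (b->occupancy > 0 && --b->occupancy < VC_DEPTH)
            b->suspend = false;               /* space again: resume the channel  */
    }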
One effect of suspending packet transmission in a virtual channel is that no packet is ever dropped. Where a router encounters a situation in which a packet might be dropped under some unreliable protocol such as, for example, the Internet Protocol, the routers in the example of Fig. 3 can, through their virtual channel buffers 156 and their virtual channel control logic 154, suspend all transmission of packets in a virtual channel until buffer space is again available, eliminating any need to drop packets. The NOC of Fig. 3 can therefore implement highly reliable network communications protocols with an extremely thin layer of hardware.
The example NOC of Fig. 3 can also be configured to maintain cache coherency between both on-chip and off-chip memory caches. Each NOC can support multiple caches, each of which operates against the same underlying memory address space. For example, caches may be controlled by IP blocks, by memory communications controllers, or by cache controllers external to the NOC. Either of the on-chip memories 114, 116 in the example of Fig. 2 may also be implemented as an on-chip cache, and, within the scope of the invention, cache memory may also be implemented off-chip.
Each router 110 illustrated in Fig. 3 includes five ports: four ports 158A-D connected through bus wires 118 to other routers, and a fifth port 160 connecting each router, through a network interface controller 108 and a memory communications controller 106, to its associated IP block 104. As can be seen from the illustrations in Figs. 2 and 3, the routers 110 and links 118 of the NOC 102 form a mesh network with vertical and horizontal links connecting vertical and horizontal ports in each router. In the illustration of Fig. 3, for example, ports 158A, 158C and 160 are termed vertical ports, and ports 158B and 158D are termed horizontal ports.
Fig. 4 next illustrates in another manner one exemplary implementation of an IP block 104 consistent with the invention, implemented as a processing element partitioned into an instruction unit (IU) 162, an execution unit (XU) 164 and an auxiliary execution unit (AXU) 166. In the illustrated implementation, IU 162 includes a plurality of instruction buffers 168 that receive instructions from an L1 instruction cache (iCACHE) 170. Each instruction buffer 168 is dedicated to one of a plurality of, for example four, symmetric multithreaded (SMT) hardware threads. An effective-to-real translation unit (iERAT) 172 is coupled to iCACHE 170 and is used to translate instruction fetch requests from a plurality of thread fetch sequencers 174 into real addresses for retrieval of instructions from lower-order memory. Each thread fetch sequencer 174 is dedicated to a particular hardware thread and is used to ensure that instructions to be executed by the associated thread are fetched into the iCACHE for dispatch to the appropriate execution unit. Also as shown in Fig. 4, instructions fetched into the instruction buffers 168 may be monitored by branch prediction logic 176, which provides hints to each thread fetch sequencer 174 to minimize instruction cache misses resulting from branches in executing threads.
IU 162 also includes dependency/issue logic 178, dedicated to each hardware thread, that is configured to resolve dependencies and control the issue of instructions from the instruction buffers 168 to XU 164. In addition, in the illustrated embodiment, separate dependency/issue logic 180 is provided in AXU 166, thereby enabling separate instructions to be issued concurrently by different threads to XU 164 and AXU 166. In an alternative embodiment, logic 180 may be disposed in IU 162, or may be omitted entirely, so that logic 178 issues instructions to AXU 166.
XU 164 is implemented as a fixed-point execution unit, including a set of general-purpose registers (GPRs) 182 coupled to fixed-point logic 184, branch logic 186 and load/store logic 188. Load/store logic 188 is coupled to an L1 data cache (dCACHE) 190, with effective-to-real translation provided by dERAT logic 192. XU 164 may be configured to implement practically any instruction set, for example, all or a portion of a 32b or 64b PowerPC instruction set.
AXU 166 operates as an auxiliary execution unit including dedicated dependency/issue logic 180 along with one or more execution blocks 194. AXU 166 may include any number of execution blocks, and may implement practically any type of execution unit, for example, a floating point unit, or one or more specialized execution units such as encryption/decryption units, coprocessors, vector processing units, graphics processing units, XML processing units, etc. In the illustrated embodiment, AXU 166 includes a high-speed auxiliary interface to XU 164, for example, to support direct moves between AXU architected state and XU architected state.
Communication with IP block 104 may be managed in the manner discussed above in connection with Fig. 2, via a network interface controller 108 coupled to NOC 102. Address-based communication, for example, to access an L2 cache memory, may be provided through a memory communications controller, along with message-based communication. For example, each IP block 104 may include a dedicated inbox and/or outbox to handle inter-node communication between IP blocks.
Embodiments of the invention may be implemented within the hardware and software environment described above in connection with Figs. 1-4. However, it will be appreciated by one of ordinary skill in the art having the benefit of this disclosure that the invention may be implemented in a multitude of different environments, and that other modifications may be made to the aforementioned hardware and software embodiments without departing from the spirit and scope of the invention. As such, the invention is not limited to the particular hardware and software environment disclosed herein.
Virtualization support for fine-grained control of branch prediction logic
Given the importance of maintaining and/or ensuring branch prediction logic data, such as branch history table data, that is capable of providing accurate branch prediction results, embodiments consistent with the invention provide guest-mode and/or user-mode mechanisms for enabling and disabling the operation of branch prediction logic, in association with a hypervisor-mode mechanism for doing so.
Many microprocessor microarchitectures incorporate hardware branch prediction algorithms implemented in whole or in part with one or more branch history tables. Correct prediction of branch outcomes (taken or not taken) can greatly affect overall CPU performance, particularly in out-of-order microprocessors. It is therefore important to ensure that the branch history table contents and other branch prediction information accurately represent future instruction streams.
Embodiments consistent with the invention extend whatever mechanism hypervisor code uses to control the operation of branch prediction logic, for example, a hypervisor-accessible control register that globally enables and disables branch prediction logic such as a branch history table. In particular, while still permitting the hypervisor to set global branch prediction operation, guest operating systems hosted by the hypervisor, and optionally user-mode applications and processes, are allowed to control the branch prediction logic through guest-mode and optionally user-mode mechanisms insofar as their own operating system and/or user code streams are concerned.
In the embodiments discussed hereinafter, a number of capabilities may be supported, including, for example, user-mode and/or guest-mode instructions that enable/disable branch prediction logic updates (e.g., branch history table updates); hypervisor-mode and/or guest-mode instructions that enable/disable branch prediction logic updates (e.g., branch history table updates) on a per-process-identifier basis; hypervisor-mode enable/disable of the guest-mode and/or user-mode instructions by which a guest operating system and/or user application controls the branch prediction logic, for environments in which such control should be deferred; hypervisor-mode reset of user-mode controls; and other features.
Likewise, given the importance of maintaining and/or ensuring branch prediction logic data, such as branch prediction history table data, that is capable of providing accurate branch prediction results, it is also desirable in some embodiments to provide fine-grained mechanisms for saving and restoring branch prediction logic state.
Typically, branch prediction logic (e.g., a branch history table) is a shared resource among all of the hardware threads within a given processing core and all of the software processes executing on those hardware threads. This can cause the branch history table state to thrash as different code sets of the software processes execute. Software implementations use hashing algorithms, varying branch history table sizes and other techniques to reduce the effects of inter-process sharing; however, many of these techniques undesirably increase the size and complexity of the branch prediction logic.
In contrast, embodiments consistent with the invention provide fine-grained control over saving and restoring branch prediction logic state, which can shorten the warm-up time over which the logic gathers the history information that improves branch prediction accuracy for the various types of software executing in the system. In this manner, a context switch can allow software to execute using a "warmed-up" branch history table specific to that process. Furthermore, by saving state data rather than discarding it, collecting state data for multiple processes may allow the size and complexity of the branch prediction logic to be reduced.
As will be discussed in greater detail below, embodiments consistent with the invention provide instructions for saving and restoring, for example, branch history table state information, including hypervisor-mode instructions to save/restore branch prediction logic state to/from memory or another storage medium; guest-mode and/or user-mode instructions to save/restore branch prediction logic state to/from memory; hypervisor-mode enable/disable of the guest-mode and/or user-mode save/restore instructions; and hypervisor-mode reset of the branch prediction logic state.
Turning now to Fig. 5, this figure illustrates an exemplary hardware and software environment for a data processing system 200 in which virtualization support for fine-grained control of branch prediction logic consistent with the invention may be implemented. From the perspective of hardware 202, data processing system 200 includes a plurality of processors or processing units 204, a memory 212, and an I/O layer 214 coupling the data processing system to various hardware resources, for example, external networks, storage devices and networks. Each processor or processing unit 204 includes one or more hardware threads 206 supported by branch prediction logic 208, and one or more special purpose registers (SPRs) 210.
In this exemplary embodiment, both selective enabling of the branch prediction logic and saving and restoring of branch prediction logic state are supported. As such, and as will be discussed in greater detail below, one or more control registers, implemented as SPRs, may be used to control the enable state of the branch prediction logic and the save/restore operations associated with the branch prediction logic state. In addition, saved state data 216 may be stored in memory 212 and retrieved in association with a restore operation. It will be appreciated, however, that some data processing systems consistent with the invention may support only selective enabling of branch prediction logic, and other data processing systems may support only branch prediction logic state save/restore; the implementations disclosed herein are therefore not the only manner of implementing the invention.
It will be appreciated that the distribution of hardware threads, branch prediction logic and processors/processing units may differ in different embodiments. For example, processors 204 may be implemented as processing cores in one or more multi-core processor chips, and each processor 204 may include any number of hardware threads. Moreover, branch prediction logic may be shared by multiple threads or may be replicated for different threads. In addition, branch prediction logic may be associated with particular control blocks within a processor or may be processor-wide in scope. In one embodiment, for example, processors 204 may be implemented as IP blocks interconnected with one another in a NOC arrangement such as disclosed above in connection with Figs. 1-4. It will therefore be appreciated that the invention may be used in practically any hardware environment in which branch prediction logic is used within a processor or processing core, and the invention is therefore not limited to the particular hardware environments disclosed herein.
From the perspective of software, the data processing system implements a virtualized environment in which a hypervisor 218, also often referred to as a partition manager or virtual machine monitor, hosts one or more guest operating systems 220 and provides an interface between the guest operating systems 220 and the hardware 202, such that portions of the hardware resources in the data processing system (e.g., hardware threads, processing cores, processors, memory, I/O functionality, etc.) are allocated to the guest operating systems 220, and such that the guest operating systems 220 operate much as they would in a non-virtualized environment. Each guest operating system 220 in turn hosts one or more user applications or programs 222, which typically operate in separate processes to minimize conflicts between different applications.
The executable instructions associated with hypervisor 218, guest operating systems 220 and user applications 222 are referred to as hypervisor-mode, guest-mode and user-mode instructions, respectively, and each processor 204 desirably supports these different instruction modes in order to selectively restrict the activities of each level of software and to control their relative priorities. In particular, hypervisor 218 is granted the highest-priority mode, with descending priorities assigned to guest operating systems 220 and user applications 222.
To implement selective enabling of branch prediction logic, data processing system 200 supports both hypervisor-mode and guest-mode control over the branch prediction logic, such that guest-mode controls may be used to override any hypervisor-mode controls. In addition, in some embodiments, user-mode controls may be used to override any guest-mode and/or hypervisor-mode controls. In some embodiments, such controls may apply to any guest operating system and/or user application, such that whenever a processor or processing core is executing any guest-mode/user-mode instructions, the guest-mode/user-mode controls are used to control the branch prediction logic, while the hypervisor-mode controls apply to hypervisor-mode instructions. Alternatively, guest-mode/user-mode controls may be associated with particular guest operating systems and/or user applications, for example, to allow each guest operating system and/or user application to control the branch prediction logic separately from other guest operating systems and/or user applications.
In addition, it may be desirable to allow a hypervisor and/or guest operating system to effectively "lock out" the controls of any higher-level software, such that the ability of a guest operating system or user application to control the branch prediction logic can be disabled, either system-wide or restricted to particular operating systems and/or user applications.
In the illustrated embodiments, selective control over the branch prediction logic is implemented by controlling the enable state of the branch prediction logic. When the enable state indicates that the branch prediction logic is enabled, the branch prediction logic is active and operates in its normal manner; when the enable state indicates that the branch prediction logic is disabled, the branch prediction logic is essentially "turned off," such that it does not attempt to predict the outcomes of branch instructions and, typically, does not monitor the one or more instruction streams being executed by the processor or processing core to collect history information. For example, branch prediction logic that includes a branch history table may be configured to suspend caching new entries in the branch history table or updating existing entries in the table.
The enable state of the branch prediction logic may, for example, be maintained in one or more hardware-based control registers that are accessed by the branch prediction logic to determine whether the branch prediction logic is currently active. As one example, Fig. 6 illustrates an exemplary enable mode control register 230 including a hypervisor enable field 232, a guest enable field 234 and a user enable field 236 that respectively control whether the branch prediction logic is active when hypervisor-mode, guest-mode and user-mode instructions are being executed. All three fields 232-236 are writable by the hypervisor, while fields 234 and 236 are writable by a guest operating system and field 236 is writable by a user application.
In addition, two lock fields, namely a guest lock field 238 and a user lock field 240, may be used to disable the ability of a guest operating system (via field 238) or a user application (via field 240) to write to the corresponding enable fields 234, 236 and thereby control the branch prediction logic. Typically, lock fields 238 and 240 are writable by the hypervisor, and lock field 240 is also writable by a guest operating system, but both lock fields are readable at all levels, so that, for example, a guest operating system or user application can check whether it has been granted authority to control the branch prediction logic before attempting to change the enable state of the logic.
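As an illustration only, the following minimal sketch in C models an enable mode control register of the kind shown in Fig. 6, with per-privilege-level enable bits and lock bits. The bit positions and helper names are hypothetical assumptions, not definitions from the patent.

```c
#include <stdint.h>
#include <stdbool.h>

#define BP_HV_ENABLE    (1u << 0)   /* field 232: hypervisor-mode enable  */
#define BP_GUEST_ENABLE (1u << 1)   /* field 234: guest-mode enable       */
#define BP_USER_ENABLE  (1u << 2)   /* field 236: user-mode enable        */
#define BP_GUEST_LOCK   (1u << 3)   /* field 238: guest writes locked out */
#define BP_USER_LOCK    (1u << 4)   /* field 240: user writes locked out  */

/* Is the branch prediction logic active for the current instruction mode? */
static bool bp_active(uint32_t ctrl, int mode /* 0=hypervisor, 1=guest, 2=user */)
{
    switch (mode) {
    case 0:  return ctrl & BP_HV_ENABLE;
    case 1:  return ctrl & BP_GUEST_ENABLE;
    default: return ctrl & BP_USER_ENABLE;
    }
}

/* A guest-mode attempt to change its enable bit honors the guest lock. */
static uint32_t guest_set_enable(uint32_t ctrl, bool enable)
{
    if (ctrl & BP_GUEST_LOCK)
        return ctrl;   /* write refused: locked by the hypervisor */
    return enable ? (ctrl | BP_GUEST_ENABLE) : (ctrl & ~BP_GUEST_ENABLE);
}
```

In this sketch, reading the lock bit before writing mirrors the check described above, in which a guest operating system or user application verifies its authority before attempting to change the enable state.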
In some applications, it may be desirable to save and restore all or part of the state of control register 230 in connection with a context switch, so that, for example, an enable state set by a guest operating system and/or user application is in effect only when instructions of that guest operating system and/or user application are being executed, thereby supporting per-guest-operating-system and/or per-user-application capability.
Alternatively, as shown in Fig. 7, it may be desirable to use a process-specific data structure such as an enable mode table 250 including a plurality of entries 252 that associate process identifiers 254 with user enable and lock fields 256, 258, enabling process-specific customization of the branch prediction logic. Table 250 may be managed by the hypervisor or a guest operating system to configure, for any process, which applications in those processes may control the branch prediction logic, and to permit or restrict those processes from selectively enabling and disabling the branch prediction logic while the respective processes are being executed. A similar data structure may also be used in some embodiments to provide guest-specific control for a plurality of guest operating systems.
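A minimal sketch of such a process-specific table follows, again in C and with a hypothetical layout and table size that are not specified by the patent; it simply associates a process identifier with its own enable and lock bits and is consulted on a context switch.

```c
#include <stdint.h>
#include <stdbool.h>

struct enable_mode_entry {
    uint32_t pid;          /* field 254: process identifier              */
    bool     user_enable;  /* field 256: branch prediction enabled       */
    bool     user_lock;    /* field 258: process may not change the bit  */
    bool     in_use;       /* entry is populated                         */
};

#define MAX_TRACKED_PROCESSES 16   /* hypothetical table size */

static struct enable_mode_entry enable_mode_table[MAX_TRACKED_PROCESSES];

/* Consulted when dispatching a process: does it run with prediction on? */
static bool bp_enabled_for_pid(uint32_t pid, bool default_enable)
{
    for (int i = 0; i < MAX_TRACKED_PROCESSES; i++)
        if (enable_mode_table[i].in_use && enable_mode_table[i].pid == pid)
            return enable_mode_table[i].user_enable;
    return default_enable;   /* untracked processes use the global setting */
}
```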
In addition, as noted above, branch prediction logic may be shared by multiple hardware threads executing in a given processing core, and in the embodiments illustrated in Figs. 6-7 the controls over the branch prediction logic typically affect all of the hardware threads in a given processing core that utilize the branch prediction logic. Alternatively, as illustrated in Fig. 8, a thread-specific enable mode data structure such as table 260 may include, for each hardware thread, an individual entry 262 containing separately managed hypervisor, guest and user enable fields 264, 266, 268 and guest and user lock fields 270, 272, so that different enable states can be set for different hardware threads. The branch prediction logic may then be configured to access table 260 to determine whether the logic should be active when executing instructions associated with the particular hardware thread to which a given virtual thread is currently mapped.
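A per-hardware-thread variant of the previous sketch is shown below; the entry layout and thread count are hypothetical assumptions used only to illustrate how each hardware thread sharing the branch prediction logic can carry its own enable and lock state, as described for table 260 of Fig. 8.

```c
#include <stdbool.h>

#define HW_THREADS 4   /* e.g., four SMT hardware threads per processing core */

struct thread_enable_entry {   /* entry 262 */
    bool hv_enable;     /* field 264 */
    bool guest_enable;  /* field 266 */
    bool user_enable;   /* field 268 */
    bool guest_lock;    /* field 270 */
    bool user_lock;     /* field 272 */
};

static struct thread_enable_entry thread_enable_table[HW_THREADS];

/* Checked by the branch prediction logic before predicting or updating for
 * an instruction fetched on hardware thread `tid` in the given mode. */
static bool bp_active_for_thread(int tid, int mode /* 0=hv, 1=guest, 2=user */)
{
    const struct thread_enable_entry *e = &thread_enable_table[tid];
    switch (mode) {
    case 0:  return e->hv_enable;
    case 1:  return e->guest_enable;
    default: return e->user_enable;
    }
}
```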
As yet another alternative, the enable control data structure may be virtualized and associated with particular virtual threads, so that when a given hardware thread is executing a particular virtual thread, the user controls associated with that virtual thread are applied to it. For example, during a context switch to a virtual thread, a control register specific to that virtual thread may be loaded into the hardware-based control register of the processing core designated to execute the virtual thread, so that the processing core is configured to operate in the manner specified for the virtual thread.
Regardless of how the control data associated with the enable state is maintained, the branch prediction logic may be selectively enabled during operation of the data processing system in the general manner illustrated by the sequence of operations 280 of Fig. 9, which illustrates typical execution of a single hardware thread in the data processing system. It will be appreciated that other hardware threads resident in the data processing system may execute in a similar manner.
At the hypervisor level, as illustrated in blocks 282-294, execution periodically switches between the hypervisor and one or more guest operating systems. Specifically, block 282 enables or disables the branch prediction logic according to the enable state for the guest operating system scheduled to execute on the thread. It will be appreciated that this enable state may be specified by the guest operating system, or may be specified by the hypervisor, either because the guest operating system has been locked out from setting the enable state or because the guest operating system has not overridden a default state set by the hypervisor.
Once the branch prediction logic has been selectively enabled or disabled, the guest operating system is run or executed for some period of time (block 284), with the branch prediction logic enabled or disabled according to the enable state for the guest operating system currently executing. Execution continues until a preemptive interrupt occurs or, as shown in block 286, until the guest operating system has completed its allotted time slice, whereupon control passes to block 288 to enable or disable the branch prediction logic according to the enable state for the hypervisor. The hypervisor is then run or executed for some period of time (block 290), and in block 292 a determination is made whether to return to executing the previous guest operating system or to switch to another guest operating system. If a decision is made to switch to another guest operating system, control passes to block 294 to perform the swap and then returns to block 282, so that the new guest operating system is executed with its own enable state in effect. Otherwise, block 292 returns control to block 282 to continue executing the current guest operating system with its enable state in effect.
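The following minimal sketch, with hypothetical interfaces that stand in for the hardware control register and the scheduler, illustrates the hypervisor-level flow of blocks 282-294 just described. The guest-operating-system-level flow of blocks 296-308, described next, follows the same pattern one level down.

```c
#include <stdbool.h>

struct guest_os { bool bp_enable; /* enable state chosen by guest or hypervisor */ };

void set_branch_prediction(bool enable);                 /* assumed: write to control SPR */
void run_guest_until_interrupt(struct guest_os *g);      /* blocks 284-286                */
void run_hypervisor_work(void);                          /* block 290                     */
struct guest_os *pick_next_guest(struct guest_os *cur);  /* blocks 292-294                */

void hypervisor_dispatch_loop(struct guest_os *guest, bool hv_bp_enable)
{
    for (;;) {
        set_branch_prediction(guest->bp_enable);   /* block 282 */
        run_guest_until_interrupt(guest);          /* blocks 284-286 */

        set_branch_prediction(hv_bp_enable);       /* block 288 */
        run_hypervisor_work();                     /* block 290 */

        guest = pick_next_guest(guest);            /* blocks 292-294 */
    }
}
```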
At the guest operating system level, as illustrated in blocks 296-308, context switches periodically occur between the guest operating system and one or more user applications. Specifically, block 296 enables or disables the branch prediction logic according to the enable state for the user application scheduled to execute on the thread. It will be appreciated that this enable state may be specified by the user application, or may be specified by the hypervisor or guest operating system, either because the user application has been locked out from setting the enable state or because the user application has not overridden a default state set by the hypervisor or guest operating system.
Once the branch prediction logic has been selectively enabled or disabled, the user application is run or executed for some period of time (block 298), with the branch prediction logic enabled or disabled according to the enable state for the user application currently executing. Execution continues until a preemptive interrupt occurs or, as shown in block 300, until the user application has completed its allotted time slice, whereupon control passes to block 302 to enable or disable the branch prediction logic according to the enable state for the guest operating system. The guest operating system is then run or executed for some period of time (block 304), and in block 306 a determination is made whether to return to executing the previous user application or to switch to another user application. If a decision is made to switch to another user application, control passes to block 308 to perform the swap and then returns to block 296, so that the new user application is executed with its own enable state in effect. Otherwise, block 306 returns control to block 296 to continue executing the current user application with its enable state in effect.
Accordingly, in embodiments consistent with the invention, if an application developer recognizes that some portions of an application under development, or the entire application, tend to corrupt the branch prediction logic with unhelpful history information, for example, due to a random and unpredictable workload, the developer may configure the application to selectively disable branch prediction logic for the application or for any problematic portions thereof, so that branch prediction for the other portions of the application, and potentially for any other programs that might use the branch prediction logic, is improved. Likewise, if a guest operating system knows that certain applications or types of applications behave in this manner when branch prediction logic is enabled, or if a hypervisor knows that certain applications or guest operating systems behave in this manner with respect to branch prediction, the guest operating system and/or hypervisor can selectively disable the branch prediction logic when executing those incompatible programs.
Next, in the illustrated embodiments, control over saving and restoring the state of the branch prediction logic is implemented using hypervisor-mode and guest-mode and/or user-mode instructions, for example, via registers accessible in the branch prediction logic, or via one or more ports provided by the branch prediction logic and move instructions between the branch prediction logic and a memory or other buffer capable of storing the state information. For example, in some embodiments software may provide an address in an SPR and then write a kick-off bit (e.g., one for save and one for restore) to notify a microcode unit or hardware-assist sequencer to save/restore data to/from the memory address provided. In some embodiments, the SPRs holding the address and kick-off bits may be given the same hypervisor/guest/user privilege protections described above for the enable state. In addition, if a hardware-assist sequencer is not used, software instructions may perform the save and restore operations by setting a memory address, writing the kick-off bit, and looping with incrementing addresses until all of the data for the save or restore operation has been transferred.
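A minimal sketch of the SPR-based interface just described follows. The SPR numbers, bit assignments and wrapper functions are hypothetical assumptions; the sketch only illustrates the pattern of supplying a memory address and writing a kick-off bit so that a microcode unit or hardware-assist sequencer moves the branch prediction state to or from memory.

```c
#include <stdint.h>

#define SPR_BP_STATE_ADDR   900u          /* hypothetical: target memory address      */
#define SPR_BP_STATE_CTRL   901u          /* hypothetical: kick-off / status register */
#define BP_CTRL_SAVE        (1u << 0)
#define BP_CTRL_RESTORE     (1u << 1)
#define BP_CTRL_BUSY        (1u << 31)

void     write_spr(uint32_t spr, uint64_t value);   /* assumed mtspr-style wrapper */
uint64_t read_spr(uint32_t spr);                    /* assumed mfspr-style wrapper */

/* Ask the hardware-assist sequencer to dump branch prediction state to buf. */
void bp_state_save(void *buf)
{
    write_spr(SPR_BP_STATE_ADDR, (uint64_t)(uintptr_t)buf);
    write_spr(SPR_BP_STATE_CTRL, BP_CTRL_SAVE);          /* kick-off bit for save */
    while (read_spr(SPR_BP_STATE_CTRL) & BP_CTRL_BUSY)
        ;                                                /* wait for completion   */
}

/* Ask the sequencer to reload previously saved state from buf. */
void bp_state_restore(const void *buf)
{
    write_spr(SPR_BP_STATE_ADDR, (uint64_t)(uintptr_t)buf);
    write_spr(SPR_BP_STATE_CTRL, BP_CTRL_RESTORE);       /* kick-off bit for restore */
    while (read_spr(SPR_BP_STATE_CTRL) & BP_CTRL_BUSY)
        ;
}
```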
In some embodiments, for example, as illustrated in Fig. 10, a save mode control register 310 including lock fields 312, 314 for guest-mode and user-mode instructions may be used to selectively enable guest operating systems and/or user applications or processes to save and/or restore branch prediction logic state data. As with the enable/disable functionality, enablement of the save/restore functionality may apply to all guest operating systems and/or user applications, or, in some embodiments consistent with the invention, may be specific to particular guest operating systems, user applications and/or user processes.
In addition, as noted above, the save and restore operations may be implemented primarily in software, for example, via loops of move instructions, or alternatively, dedicated logic within or otherwise coupled to the branch prediction logic may be relied upon to accelerate the save and restore operations. For example, Fig. 11 illustrates an exemplary branch history table 320 including a plurality of entries 322 and coupled to a branch history table load/store unit 324. The load/store unit 324 may be used, for example, to copy one or more entries 322 from branch history table 320 into memory 328 as state data 326, and to restore branch history table 320 by copying entries from state data 326 back into branch history table 320.
Multiple copies of state data 326 may be retained in memory 328, for example, for different user applications, different guest operating systems, and so on. Memory 328 may be part of the main memory architecture of the data processing system, or, in some implementations, may be a dedicated buffer, for example, a buffer private to the processing core. Load/store unit 324 may be implemented, for example, as a sequencer or microcode unit that, in response to input data provided by a thread, initiates the transfer of selected data between branch history table 320 and memory 328.
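As an illustration of what the load/store unit 324 accomplishes, the following minimal sketch copies branch history table entries out to a per-process or per-guest state image and back again. The entry format and table size are hypothetical assumptions rather than details given in the patent.

```c
#include <stdint.h>
#include <string.h>

#define BHT_ENTRIES 1024   /* hypothetical branch history table size */

struct bht_entry {
    uint64_t tag;       /* branch address tag                  */
    uint8_t  counter;   /* e.g., 2-bit taken/not-taken counter */
    uint8_t  valid;     /* entry holds real history            */
};

static struct bht_entry bht[BHT_ENTRIES];   /* stands in for table 320 */

/* Save the table into a state-data image (state data 326 in memory 328). */
void bht_save(struct bht_entry *image)
{
    memcpy(image, bht, sizeof bht);
}

/* Restore a previously saved image back into the table. */
void bht_restore(const struct bht_entry *image)
{
    memcpy(bht, image, sizeof bht);
}
```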
In some implementations, the entire branch history table, and optionally other state data of the branch prediction logic, may be saved/restored as the state data. In some embodiments, however, it may be desirable to save/restore only a subset of the data representing the state of the branch prediction logic, for example, skipping entries 322 that are flagged as invalid, or saving only the N most frequently used or most recently used entries. In addition, as illustrated by a compression/decompression engine 330 in load/store unit 324, it may be desirable in some embodiments to compress the state data in memory 328 in order to reduce the memory space required to hold the state data, and then to decompress the compressed data when restoring it back into the branch prediction logic. In alternative embodiments, other hardware-based techniques may be used to accelerate or otherwise reduce the performance impact of saving and restoring branch prediction logic state data.
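A brief sketch of the subset-saving idea is shown below, reusing the hypothetical entry format from the previous sketch: invalid entries are skipped so the saved image, and the memory needed to hold it, shrinks. Any further size reduction, as by compression engine 330, would be applied to the resulting image.

```c
#include <stddef.h>
#include <stdint.h>

struct bht_entry { uint64_t tag; uint8_t counter; uint8_t valid; };  /* as in the sketch above */

/* Copy only valid entries into image[]; returns the number of entries written. */
size_t bht_save_valid_only(const struct bht_entry *table, size_t n,
                           struct bht_entry *image)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++)
        if (table[i].valid)          /* skip entries flagged invalid */
            image[out++] = table[i];
    return out;
}
```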
Regardless of how the branch prediction logic state data is saved and restored, Fig. 12 illustrates a sequence of operations 340 for typical execution of a single hardware thread in the data processing system in connection with saving and restoring branch prediction logic state data. It will be appreciated that other hardware threads resident in the data processing system may execute in a similar manner.
At the hypervisor level, as illustrated in blocks 342-360, execution periodically switches between the hypervisor and one or more guest operating systems. Specifically, block 342 restores, for example, in response to a guest-mode instruction in the guest operating system, the stored branch prediction logic state data for the guest operating system. Once the branch prediction logic state has been restored, the guest operating system is run or executed for some period of time (block 346), so that the branch prediction logic uses the restored state while the guest operating system is executing. Execution continues until a preemptive interrupt occurs or, as shown in block 348, until the guest operating system has completed its allotted time slice, whereupon control passes to block 350 to save the state of the branch prediction logic, for example, in response to a guest-mode instruction in the guest operating system. Next, block 352 restores the stored branch prediction logic state for the hypervisor, for example, in response to a hypervisor-mode instruction in the hypervisor. The hypervisor is then run or executed for some period of time (block 354), after which block 356 saves the state of the branch prediction logic, for example, in response to a hypervisor-mode instruction in the hypervisor. Then, in block 358, a determination is made whether to return to executing the previous guest operating system or to switch to another guest operating system. If a decision is made to switch to another guest operating system, control passes to block 360 to perform the swap and then returns to block 342 to restore the branch prediction logic state for the new guest operating system. Otherwise, block 358 returns control to block 342 to restore the branch prediction logic state for the guest operating system.
At the guest operating system level, as illustrated in blocks 362-380, context switches periodically occur between the guest operating system and one or more user applications. Specifically, block 362 restores, for example, in response to a user-mode instruction in the user application, the stored branch prediction logic state data for the user application. Once the branch prediction logic state has been restored, the user application is run or executed for some period of time (block 364), so that the branch prediction logic uses the restored state while the user application is executing. Execution continues until a preemptive interrupt occurs or, as shown in block 368, until the user application has completed its allotted time slice, whereupon control passes to block 370 to save the state of the branch prediction logic, for example, in response to a user-mode instruction in the user application. Next, block 372 restores the stored branch prediction logic state for the guest operating system, for example, in response to a guest-mode instruction in the guest operating system. The guest operating system is then run or executed for some period of time (block 374), after which block 376 saves the state of the branch prediction logic, for example, in response to a guest-mode instruction in the guest operating system. Then, in block 378, a determination is made whether to return to executing the previous user application or to switch to another user application. If a decision is made to switch to another user application, control passes to block 380 to perform the swap and then returns to block 362 to restore the branch prediction logic state for the new user application. Otherwise, block 378 returns control to block 362 to restore the branch prediction logic state for the user application.
It will be appreciated that the instructions that save and/or restore branch prediction logic state data may be executed within a context switch routine that also saves or restores other state data associated with the context in which a hardware thread is executing. In addition, it will be appreciated that the hypervisor, selected guest operating systems and/or selected user applications may have no need to save or restore branch prediction logic state data, and such selected entities may therefore omit, during a context switch, any instructions that would otherwise save or restore branch prediction logic data for them.
Figs. 13-14 illustrate in greater detail the operations that occur in connection with saving and restoring branch prediction logic state tables, for example, branch history table entries. For example, Fig. 13 illustrates a save branch history table routine 390 executed by a program, for example, a hypervisor, guest operating system and/or user application, to save branch prediction logic state data. Block 392 first determines whether the program is permitted to save the branch prediction logic state, for example, by checking the lock fields relevant to the program. In some cases, for example, for a hypervisor, the program is always authorized to save the branch prediction logic state, and block 392 may therefore be omitted. If block 392 determines that saving is not permitted, routine 390 terminates. Otherwise, block 392 passes control to block 394 to save the branch prediction logic state, whereupon routine 390 is complete.
Similarly, Fig. 14 illustrates a restore branch history table routine 400 executed by a program, for example, a hypervisor, guest operating system and/or user application, to restore branch prediction logic state data. Block 402 first determines whether the program is permitted to restore the branch prediction logic state, for example, by checking the lock fields relevant to the program. In some cases, for example, for a hypervisor, the program is always authorized to restore the branch prediction logic state, and block 402 may therefore be omitted. If block 402 determines that restoring is not permitted, routine 400 terminates. Otherwise, block 402 passes control to block 404 to reset the branch prediction logic state, for example, by clearing out all old branch history table entries, and then to block 406 to restore the branch prediction logic state, whereupon routine 400 is complete.
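A minimal sketch of routines 390 and 400 follows, reusing the hypothetical save/restore helpers from the earlier SPR sketch; the permission-check and reset helper names are assumptions used only to show the control flow of Figs. 13-14.

```c
#include <stdbool.h>

bool save_restore_permitted(int privilege_mode);   /* assumed lock-field check          */
void bp_state_save(void *buf);                     /* assumed, as in the earlier sketch */
void bp_state_restore(const void *buf);
void bp_state_reset(void);                         /* clears old branch history entries */

/* Routine 390: save branch history table state if permitted. */
void save_bht_routine(int mode, void *image)
{
    if (!save_restore_permitted(mode))   /* block 392 (may be omitted for hypervisor) */
        return;
    bp_state_save(image);                /* block 394 */
}

/* Routine 400: reset, then restore branch history table state if permitted. */
void restore_bht_routine(int mode, const void *image)
{
    if (!save_restore_permitted(mode))   /* block 402 (may be omitted for hypervisor) */
        return;
    bp_state_reset();                    /* block 404: discard stale entries          */
    bp_state_restore(image);             /* block 406 */
}
```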
Selective enabling/disabling and/or selective saving and restoring of branch prediction logic state data consistent with embodiments of the invention therefore provide more fine-grained control over branch prediction logic. It is believed that, in some embodiments, such finer-grained control makes it possible to better optimize branch prediction logic for different types of programs and workloads, and in some instances may permit smaller and/or less complex branch prediction logic to be used, thereby saving cost and reducing the area occupied by branch prediction logic on a processor chip.
Various modifications may be made to the disclosed embodiments without departing from the spirit and scope of the invention. Therefore, the invention lies in the claims hereinafter appended.

Claims (24)

1. A method of controlling branch prediction logic in a data processing system, the method comprising:
in response to a first hypervisor-mode instruction executed by a processing core for a hypervisor resident in the data processing system, saving a first state of branch prediction logic in the processing core;
in response to a second hypervisor-mode instruction executed by the processing core for the hypervisor, restoring the first state of the branch prediction logic;
in response to a third instruction executed by the processing core for a program hosted by the hypervisor, saving a second state of the branch prediction logic; and
in response to a fourth instruction executed by the processing core for the program hosted by the hypervisor, restoring the second state of the branch prediction logic.
2. The method of claim 1, wherein the branch prediction logic is configured to cache branch prediction data in a branch prediction table, wherein saving the first state of the branch prediction logic includes saving at least one entry in the branch prediction table, and wherein restoring the first state of the branch prediction logic includes restoring the at least one entry in the branch prediction table.
3. The method of claim 2, wherein saving the first state includes saving only a subset of the entries in the branch prediction table based upon frequency of use.
4. The method of claim 2, wherein saving the first state includes saving only valid entries in the branch prediction table.
5. The method of claim 1, wherein saving the first state includes compressing data associated with the first state and storing the compressed data in a memory, and wherein restoring the first state includes decompressing the compressed data in the memory.
6. The method of claim 1, wherein saving the first state includes causing hardware logic in the processing core to save the first state.
7. The method of claim 6, wherein the hardware logic comprises microcode logic.
8. The method of claim 1, wherein saving the first state of the branch prediction logic is performed in association with a context switch away from the hypervisor, and wherein restoring the first state of the branch prediction logic is performed in association with a context switch to the hypervisor.
9. The method of claim 1, wherein saving the second state of the branch prediction logic is performed in association with a context switch away from the program, and wherein restoring the second state of the branch prediction logic is performed in association with a context switch to the program.
10. The method of claim 1, wherein the program comprises a guest operating system hosted by the hypervisor, and wherein the third and fourth instructions are guest-mode instructions.
11. The method of claim 1, wherein the program comprises a user process hosted by the hypervisor, and wherein the third and fourth instructions are user-mode instructions.
12. The method of claim 11, wherein the user process is hosted by a guest operating system hosted by the hypervisor.
13. The method of claim 1, further comprising resetting the state of the branch prediction logic in response to a fifth hypervisor-mode instruction executed by the processing core for the hypervisor.
14. The method of claim 1, further comprising selectively disabling, with the hypervisor, the program from saving or restoring the state of the branch prediction logic.
15. A circuit arrangement comprising:
a processing core; and
branch prediction logic disposed in the processing core;
wherein the processing core is configured to save a first state of the branch prediction logic in response to a first hypervisor-mode instruction executed by the processing core for a hypervisor resident in a data processing system; to restore the first state of the branch prediction logic in response to a second hypervisor-mode instruction executed by the processing core for the hypervisor; to save a second state of the branch prediction logic in response to a third instruction executed by the processing core for a program hosted by the hypervisor; and to restore the second state of the branch prediction logic in response to a fourth instruction executed by the processing core for the program hosted by the hypervisor.
16. The circuit arrangement of claim 15, wherein the branch prediction logic is configured to cache branch prediction data in a branch prediction table, wherein the processing core is configured to save the first state of the branch prediction logic by saving at least one entry in the branch prediction table, and wherein the processing core is configured to restore the first state of the branch prediction logic by restoring the at least one entry in the branch prediction table.
17. The circuit arrangement of claim 15, wherein the processing core is configured to save the first state of the branch prediction logic in association with a context switch away from the hypervisor, and to restore the first state of the branch prediction logic in association with a context switch to the hypervisor.
18. The circuit arrangement of claim 15, wherein the processing core is configured to save the second state of the branch prediction logic in association with a context switch away from the program, and to restore the second state of the branch prediction logic in association with a context switch to the program.
19. The circuit arrangement of claim 15, wherein the program comprises a guest operating system hosted by the hypervisor, and wherein the third and fourth instructions are guest-mode instructions.
20. The circuit arrangement of claim 15, wherein the program comprises a user process hosted by the hypervisor, and wherein the third and fourth instructions are user-mode instructions.
21. The circuit arrangement of claim 20, wherein the user process is hosted by a guest operating system hosted by the hypervisor.
22. The circuit arrangement of claim 15, wherein the processing core is configured to reset the state of the branch prediction logic in response to a fifth hypervisor-mode instruction executed by the processing core for the hypervisor.
23. The circuit arrangement of claim 15, wherein the processing core is configured to selectively disable, in response to the hypervisor, the program from saving or restoring the state of the branch prediction logic.
24. A data processing system comprising the circuit arrangement of claim 15.
CN201310024377.6A 2012-01-23 2013-01-23 Control the method and apparatus of branch prediction logic Active CN103218209B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US13/355,884 US8935694B2 (en) 2012-01-23 2012-01-23 System and method for selectively saving and restoring state of branch prediction logic through separate hypervisor-mode and guest-mode and/or user-mode instructions
US13/355,884 2012-01-23

Publications (2)

Publication Number Publication Date
CN103218209A true CN103218209A (en) 2013-07-24
CN103218209B CN103218209B (en) 2016-03-02

Family

ID=47748196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310024377.6A Active CN103218209B (en) 2012-01-23 2013-01-23 Method and apparatus for controlling branch prediction logic

Country Status (4)

Country Link
US (1) US8935694B2 (en)
CN (1) CN103218209B (en)
DE (1) DE102013200503A1 (en)
GB (1) GB2500456B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927149A (en) * 2013-01-14 2014-07-16 想象力科技有限公司 Indirect branch prediction
CN105005737A (en) * 2015-07-31 2015-10-28 天津大学 Branch prediction attack oriented micro-architecture level safety protection method
CN113722016A (en) * 2021-09-10 2021-11-30 拉卡拉支付股份有限公司 Application program configuration method, device, equipment, storage medium and program product

Families Citing this family (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9520180B1 (en) 2014-03-11 2016-12-13 Hypres, Inc. System and method for cryogenic hybrid technology computing and memory
US10503538B2 (en) 2014-06-02 2019-12-10 International Business Machines Corporation Delaying branch prediction updates specified by a suspend branch prediction instruction until after a transaction is completed
US10261826B2 (en) 2014-06-02 2019-04-16 International Business Machines Corporation Suppressing branch prediction updates upon repeated execution of an aborted transaction until forward progress is made
US10235172B2 (en) * 2014-06-02 2019-03-19 International Business Machines Corporation Branch predictor performing distinct non-transaction branch prediction functions and transaction branch prediction functions
US10289414B2 (en) 2014-06-02 2019-05-14 International Business Machines Corporation Suppressing branch prediction on a repeated execution of an aborted transaction
US9742630B2 (en) * 2014-09-22 2017-08-22 Netspeed Systems Configurable router for a network on chip (NoC)
US10348563B2 (en) 2015-02-18 2019-07-09 Netspeed Systems, Inc. System-on-chip (SoC) optimization through transformation and generation of a network-on-chip (NoC) topology
US10063569B2 (en) * 2015-03-24 2018-08-28 Intel Corporation Custom protection against side channel attacks
US10218580B2 (en) 2015-06-18 2019-02-26 Netspeed Systems Generating physically aware network-on-chip design from a physical system-on-chip specification
US10423418B2 (en) 2015-11-30 2019-09-24 International Business Machines Corporation Method for maintaining a branch prediction history table
US10013270B2 (en) 2015-12-03 2018-07-03 International Business Machines Corporation Application-level initiation of processor parameter adjustment
US10452124B2 (en) 2016-09-12 2019-10-22 Netspeed Systems, Inc. Systems and methods for facilitating low power on a network-on-chip
US10489296B2 (en) 2016-09-22 2019-11-26 International Business Machines Corporation Quality of cache management in a computer
US20180159786A1 (en) 2016-12-02 2018-06-07 Netspeed Systems, Inc. Interface virtualization and fast path for network on chip
US10063496B2 (en) 2017-01-10 2018-08-28 Netspeed Systems Inc. Buffer sizing of a NoC through machine learning
US10469337B2 (en) 2017-02-01 2019-11-05 Netspeed Systems, Inc. Cost management against requirements for the generation of a NoC
US11144457B2 (en) 2018-02-22 2021-10-12 Netspeed Systems, Inc. Enhanced page locality in network-on-chip (NoC) architectures
US10983910B2 (en) 2018-02-22 2021-04-20 Netspeed Systems, Inc. Bandwidth weighting mechanism based network-on-chip (NoC) configuration
US10547514B2 (en) 2018-02-22 2020-01-28 Netspeed Systems, Inc. Automatic crossbar generation and router connections for network-on-chip (NOC) topology generation
US11176302B2 (en) 2018-02-23 2021-11-16 Netspeed Systems, Inc. System on chip (SoC) builder
US11023377B2 (en) 2018-02-23 2021-06-01 Netspeed Systems, Inc. Application mapping on hardened network-on-chip (NoC) of field-programmable gate array (FPGA)
US10705848B2 (en) * 2018-05-24 2020-07-07 Arm Limited TAGE branch predictor with perceptron predictor as fallback predictor
GB2574042B (en) 2018-05-24 2020-09-09 Advanced Risc Mach Ltd Branch Prediction Cache
WO2020051254A1 (en) * 2018-09-05 2020-03-12 Fungible, Inc. Dynamically changing configuration of data processing unit when connected to storage device or computing device
US10740140B2 (en) 2018-11-16 2020-08-11 International Business Machines Corporation Flush-recovery bandwidth in a processor
US11061681B2 (en) * 2019-07-25 2021-07-13 International Business Machines Corporation Instruction streaming using copy select vector
US11301254B2 (en) * 2019-07-25 2022-04-12 International Business Machines Corporation Instruction streaming using state migration
US20220100519A1 (en) * 2020-09-25 2022-03-31 Advanced Micro Devices, Inc. Processor with multiple fetch and decode pipelines

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6108775A (en) * 1996-12-30 2000-08-22 Texas Instruments Incorporated Dynamically loadable pattern history tables in a multi-task microprocessor
CN1755660A (en) * 2004-09-28 2006-04-05 惠普开发有限公司 Diagnostic memory dump method in a redundant processor
US20100139393A1 (en) * 2007-01-08 2010-06-10 Vibro-Meter, Inc. Scan lock and track fluid characterization and level sensor apparatus and method
US7849298B2 (en) * 2002-12-05 2010-12-07 International Business Machines Corporation Enhanced processor virtualization mechanism via saving and restoring soft processor/system states

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4679141A (en) 1985-04-29 1987-07-07 International Business Machines Corporation Pageable branch history table
EP0218335A3 (en) 1985-08-30 1989-03-08 Advanced Micro Devices, Inc. Control store for electronic processor
US5228131A (en) 1988-02-24 1993-07-13 Mitsubishi Denki Kabushiki Kaisha Data processor with selectively enabled and disabled branch prediction operation
US5606715A (en) 1995-06-26 1997-02-25 Motorola Inc. Flexible reset configuration of a data processing system and method therefor
US5949995A (en) 1996-08-02 1999-09-07 Freeman; Jackie Andrew Programmable branch prediction system and method for inserting prediction operation which is independent of execution of program code
DE69727773T2 (en) 1996-12-10 2004-12-30 Texas Instruments Inc., Dallas Improved branch prediction in a pipeline microprocessor
US6108776A (en) 1998-04-30 2000-08-22 International Business Machines Corporation Globally or selectively disabling branch history table operations during sensitive portion of millicode routine in millimode supporting computer
US6223280B1 (en) 1998-07-16 2001-04-24 Advanced Micro Devices, Inc. Method and circuit for preloading prediction circuits in microprocessors
US6574712B1 (en) 1999-11-08 2003-06-03 International Business Machines Corporation Software prefetch system and method for predetermining amount of streamed data
US6877089B2 (en) 2000-12-27 2005-04-05 International Business Machines Corporation Branch prediction apparatus and process for restoring replaced branch history for use in future branch predictions for an executing program
JP3802038B2 (en) 2003-01-30 2006-07-26 富士通株式会社 Information processing device
US7308571B2 (en) 2004-10-06 2007-12-11 Intel Corporation Overriding processor configuration settings
US20080114971A1 (en) 2006-11-14 2008-05-15 Fontenot Nathan D Branch history table for debug
US8171328B2 (en) * 2008-12-31 2012-05-01 Intel Corporation State history storage for synchronizing redundant processors

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6108775A (en) * 1996-12-30 2000-08-22 Texas Instruments Incorporated Dynamically loadable pattern history tables in a multi-task microprocessor
US7849298B2 (en) * 2002-12-05 2010-12-07 International Business Machines Corporation Enhanced processor virtualization mechanism via saving and restoring soft processor/system states
CN1755660A (en) * 2004-09-28 2006-04-05 惠普开发有限公司 Diagnostic memory dump method in a redundant processor
US20100139393A1 (en) * 2007-01-08 2010-06-10 Vibro-Meter, Inc. Scan lock and track fluid characterization and level sensor apparatus and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103927149A (en) * 2013-01-14 2014-07-16 想象力科技有限公司 Indirect branch prediction
CN105005737A (en) * 2015-07-31 2015-10-28 天津大学 Branch prediction attack oriented micro-architecture level safety protection method
CN113722016A (en) * 2021-09-10 2021-11-30 拉卡拉支付股份有限公司 Application program configuration method, device, equipment, storage medium and program product

Also Published As

Publication number Publication date
CN103218209B (en) 2016-03-02
GB201300405D0 (en) 2013-02-20
DE102013200503A1 (en) 2013-07-25
GB2500456B (en) 2014-03-19
US8935694B2 (en) 2015-01-13
GB2500456A (en) 2013-09-25
US20130191825A1 (en) 2013-07-25

Similar Documents

Publication Publication Date Title
CN103218209A (en) Method and apparatus for controlling branch prediction logic
CN104067227B (en) Branch prediction logic
US10831504B2 (en) Processor with hybrid pipeline capable of operating in out-of-order and in-order modes
US10360168B1 (en) Computing in parallel processing environments
CN103870397A (en) Method for visiting data in data processing system and circuit arrangement
CN102906701B (en) Control the method and system of the access to adapter in a computing environment
CN100555247C (en) Justice at multinuclear/multiline procedure processor high speed buffer memory is shared
CN1538296B (en) Method and system for scheduling coprocessor
JP6381541B2 (en) Methods, circuit configurations, integrated circuit devices, program products for processing instructions in a data processing system (conversion management instructions for updating address translation data structures in remote processing nodes)
CN104375890B (en) Processor for performing safety embedded container extends
CN101833475B (en) Method and device for execution of instruction block
US6567839B1 (en) Thread switch control in a multithreaded processor system
US9122465B2 (en) Programmable microcode unit for mapping plural instances of an instruction in plural concurrently executed instruction streams to plural microcode sequences in plural memory partitions
CN104331528A (en) General purpose processing unit with low power digital signal processing (dsp) mode
CN103197953A (en) Speculative execution and rollback
CN104375958A (en) Management of transactional memory access requests by a cache memory
CN102906707A (en) Managing processing associated with hardware events
TWI603198B (en) Decentralized allocation of resources and interconnect structures to support the execution of instruction sequences by a plurality of engines
US20110265093A1 (en) Computer System and Program Product
CN102906694A (en) Load instruction for communicating with adapters
CN104662515A (en) Dynamically erectable computer system
CN104054052A (en) Providing by one program to another program access to a warning track facility
CN101421791B (en) For the method for queue depth management communicated between main frame and peripherals
CN100583064C (en) Method and equipment for removing alias addresses from an alias address pool
CN103218259A (en) Computer-implemented method for selection of a processor, which is incorporated in multiple processors to receive work, which relates to an arithmetic problem

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant