CN107003859A - Run-time code parallelization with continuous monitoring of repetitive instruction sequences - Google Patents
Run-time code parallelization with continuous monitoring of repetitive instruction sequences
- Publication number
- CN107003859A (application CN201580063897.5A)
- Authority
- CN
- China
- Prior art keywords
- register
- instruction
- processor
- monitored
- monitoring
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/36—Preventing errors by testing or debugging software
- G06F11/362—Software debugging
- G06F11/3644—Software debugging by instrumenting at runtime
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3808—Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache
- G06F9/381—Loop buffering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
Abstract
A method includes, in a processor (20) that executes instructions of program code, monitoring instructions in a repetitive sequence of the instructions that traverses a flow-control trace, so as to construct a specification of register access by the monitored instructions. Based on the specification, multiple hardware threads are invoked to execute respective segments of the repetitive instruction sequence at least partially in parallel. Monitoring of the instructions continues in at least one of the segments during execution.
Description
Field of the invention
The present invention relates generally to processor design, and particularly to methods and systems for run-time code parallelization.
Background of invention
Various techniques have been proposed for dynamically parallelizing software code at run-time. For example, Akkary and Driscoll describe a processor architecture that enables dynamic multithreaded execution of a single program, in "A Dynamic Multithreading Processor," Proceedings of the 31st Annual International Symposium on Microarchitecture, December 1998, which is incorporated herein by reference.
Marcuello et al. describe a processor micro-architecture that simultaneously executes multiple threads of control obtained from a single program, by means of control speculation techniques that do not require compiler or user support, in "Speculative Multithreaded Processors," Proceedings of the 12th International Conference on Supercomputing, 1998, which is incorporated herein by reference.
Marcuello and Gonzales present a micro-architecture that spawns speculative threads from a single-threaded application at run-time, in "Clustered Speculative Multithreaded Processors," Proceedings of the 13th International Conference on Supercomputing, 1999, which is incorporated herein by reference.
In "A Quantitative Assessment of Thread-Level Speculation Techniques," Proceedings of the 14th International Parallel and Distributed Processing Symposium, 2000, which is incorporated herein by reference, Marcuello and Gonzales analyze the benefits of different thread speculation techniques and the impact of value prediction, branch prediction, thread initialization overhead and connectivity among thread units.
Ortiz-Arroyo and Lee describe a multithreaded architecture called Dynamic Simultaneous Multithreading (DSMT), which executes multiple threads from a single program on a simultaneous multithreading processor core, in "Dynamic Simultaneous Multithreaded Architecture," Proceedings of the 16th International Conference on Parallel and Distributed Computing Systems (PDCS'03), 2003, which is incorporated herein by reference.
Summary of the invention
An embodiment of the present invention that is described herein provides a method including, in a processor that executes instructions of program code, monitoring instructions in a repetitive sequence of the instructions that traverses a flow-control trace, so as to construct a specification of register access by the monitored instructions. Based on the specification, multiple hardware threads are invoked to execute respective segments of the repetitive instruction sequence at least partially in parallel. Monitoring of the instructions continues in at least one of the segments during execution.
In some embodiments, continuing to monitor the instructions includes, in response to detecting in a given segment a change to a different flow-control trace, creating a different specification for the different flow-control trace by monitoring the instructions along the different flow-control trace. The method may include saving the different specification or the different flow-control trace after monitoring the different flow-control trace.
In some embodiments, the repetitive sequence includes a loop or a function. In an embodiment, continuing to monitor the instructions includes continuing to monitor all the segments. Alternatively, continuing to monitor the instructions may include continuing to monitor at least a subset of the segments that follow the flow-control trace. Further alternatively, continuing to monitor the instructions may include selecting a partial subset of the segments, and continuing to monitor the segments in the selected subset. Selecting the subset may include selecting every Nth created segment for continued monitoring, selecting segments for continued monitoring according to a predefined periodic pattern, and/or selecting segments for continued monitoring at random.
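The three selection options listed above can be illustrated with a minimal sketch. The function names, the zero-based segment numbering and the seeded random generator are assumptions made for illustration only; the disclosure does not specify how the selection is implemented in hardware.

```python
import random
from typing import Callable, List, Sequence

# A policy maps the index of a created segment to a decision:
# True means "continue monitoring this segment".
Policy = Callable[[int], bool]


def every_nth(n: int) -> Policy:
    """Select every N-th created segment (segments numbered from 0)."""
    return lambda seg_index: seg_index % n == 0


def periodic_pattern(pattern: Sequence[bool]) -> Policy:
    """Select segments according to a predefined pattern that repeats."""
    return lambda seg_index: pattern[seg_index % len(pattern)]


def random_selection(probability: float, seed: int = 0) -> Policy:
    """Select each segment independently with the given probability."""
    rng = random.Random(seed)
    return lambda seg_index: rng.random() < probability


def segments_to_monitor(num_segments: int, policy: Policy) -> List[int]:
    """Return the indices of the segments chosen for continued monitoring."""
    return [i for i in range(num_segments) if policy(i)]


print(segments_to_monitor(10, every_nth(4)))  # -> [0, 4, 8]
```

Monitoring all segments corresponds to `every_nth(1)` in this sketch.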
In some embodiments, the method includes terminating the monitoring of the instructions in a given segment a given number of cycles, instructions or micro-ops after the repetitive sequence is stopped. In an example embodiment, the given number for the given segment is set based on the given number that was set for a different segment having a different flow-control trace.
There is additionally provided, in accordance with an embodiment of the present invention, a processor including an execution pipeline and a monitoring unit. The execution pipeline is configured to execute instructions of program code. The monitoring unit is configured to monitor the instructions in a repetitive sequence of the instructions that traverses a flow-control trace, so as to construct a specification of register access by the monitored instructions, to invoke, based on the specification, multiple hardware threads in the execution pipeline to execute respective segments of the repetitive instruction sequence at least partially in parallel, and to continue monitoring the instructions in at least one of the segments during execution.
There is also provided, in accordance with an embodiment of the present invention, a method including, in a processor that executes instructions of program code, monitoring instructions in a repetitive sequence of the instructions, so as to construct a specification of register access by the monitored instructions. A termination criterion is evaluated based on the monitored instructions. If the termination criterion is met, monitoring of the instructions is terminated. If monitoring of the instructions ends without meeting the termination criterion, execution of multiple segments of the repetitive instruction sequence is parallelized based on the specification.
In some embodiments, the termination criterion depends on the location of the last write to a register, the number of registers written, the count of instructions or micro-ops, the count of execution cycles, and/or the number of branch instructions exceeding a threshold. Additionally or alternatively, the termination criterion may depend on the monitoring reaching a location in the program code that was monitored previously, on the monitoring reaching a location in the program code that is identified as repetitive, on a branch misprediction that occurs during or before the monitoring, and/or on the classification of one or more flags of the processor as global or global-local.
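A minimal sketch of how several of these criteria might be combined is given below. The field names, the threshold values, and the choice to OR the conditions together are all assumptions for illustration; the criteria based on the last-write location and on flag classification are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class MonitorState:
    """Counters gathered while monitoring a sequence (hypothetical fields)."""
    registers_written: int = 0
    instructions: int = 0            # instructions or micro-ops seen so far
    cycles: int = 0
    branches: int = 0
    reached_monitored_location: bool = False   # code already monitored before
    reached_repetitive_location: bool = False  # code identified as repetitive
    branch_mispredicted: bool = False


@dataclass
class Thresholds:
    """Illustrative limits; the disclosure does not give numeric values."""
    max_registers_written: int = 16
    max_instructions: int = 256
    max_cycles: int = 1024
    max_branches: int = 8


def should_terminate(state: MonitorState, limits: Thresholds) -> bool:
    """Return True if any of the sketched termination criteria is met."""
    return (
        state.registers_written > limits.max_registers_written
        or state.instructions > limits.max_instructions
        or state.cycles > limits.max_cycles
        or state.branches > limits.max_branches
        or state.reached_monitored_location
        or state.reached_repetitive_location
        or state.branch_mispredicted
    )
```

In terms of the method above, parallelization based on the specification would proceed only if monitoring reached its end with `should_terminate` never having returned `True`.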
In an embodiment, the specification is uniquely associated with the flow-control trace traversed by the monitored instructions. In another embodiment, the specification is associated with two or more flow-control traces traversed by the monitored instructions.
In an embodiment, monitoring of the instructions is performed immediately following decoding of the instructions in the execution pipeline of the processor. In another embodiment, monitoring of the instructions is performed before the instructions are executed in the execution pipeline of the processor, including monitoring speculative instructions that will subsequently be flushed. In some embodiments, the method includes retaining the respective names of the registers throughout the monitoring.
There is further provided, in accordance with an embodiment of the present invention, a method including, in a processor that executes instructions of program code, monitoring a repetitive sequence of the instructions, and classifying the registers accessed by the monitored instructions depending on the respective order in which each register is used as an operand or as a destination in the monitored instructions. Execution of multiple segments of the repetitive sequence is parallelized based on the classification of the registers.
In some embodiments, classifying the registers includes classifying at least some of the registers as one of the following: a local register, whose first occurrence in the monitored sequence is as a destination; a global register, which is used in the monitored sequence only as an operand; and a global-local register, whose first occurrence in the monitored sequence is as an operand, and which is subsequently used in the monitored sequence as a destination.
In an embodiment, classifying the registers includes, if a given register first appears in the monitored sequence as a destination in a conditional instruction, classifying the given register as a global-local register. In an embodiment, classifying the registers includes, if a given register first appears in the monitored sequence as a destination in a conditional instruction, classifying the given register as a global-local register, and otherwise, if the condition of the conditional instruction is met, classifying the given register as a local register.
In another embodiment, classifying the registers includes, if a given register first appears in the monitored sequence as both an operand and a destination in the same instruction, classifying the given register as a global-local register.
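The classification rules above can be sketched in a few lines. The `Instr` record and the function name are hypothetical, and the sketch follows the embodiment in which a first appearance as the destination of a conditional instruction yields a global-local classification.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Instr(NamedTuple):
    """A monitored instruction, reduced to its register accesses."""
    operands: Tuple[str, ...]   # registers whose values are read
    dests: Tuple[str, ...]      # registers that are written
    conditional: bool = False   # True for a conditional instruction


def classify_registers(sequence: Iterable[Instr]) -> Dict[str, str]:
    """Classify each register as 'L', 'G' or 'GL' by order of use."""
    cls: Dict[str, str] = {}
    for ins in sequence:
        # Operands are handled first, so a register that first appears as
        # both operand and destination of one instruction ends up as GL.
        for reg in ins.operands:
            cls.setdefault(reg, "G")   # first seen as operand: G for now
        for reg in ins.dests:
            if reg not in cls:
                # First seen as destination: local, unless the write is
                # conditional and therefore may not actually happen.
                cls[reg] = "GL" if ins.conditional else "L"
            elif cls[reg] == "G":
                cls[reg] = "GL"        # read first, written later
    return cls


example = [
    Instr(operands=("r1",), dests=("r2",)),         # r1 read, r2 written
    Instr(operands=("r2", "r3"), dests=("r1",)),    # r1 now written as well
    Instr(operands=("r4",), dests=("r4",)),         # same-instruction use
    Instr(operands=("r1",), dests=("r5",), conditional=True),
]
print(classify_registers(example))
# -> {'r1': 'GL', 'r2': 'L', 'r3': 'G', 'r4': 'GL', 'r5': 'GL'}
```

Once a register is classified as local or global-local, later accesses do not change its class in this sketch, because the classification depends only on the order of the earliest uses.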
In some embodiments, classifying the registers further includes identifying, for at least a subset of the registers, the respective locations in the monitored sequence of the last write operations to the registers. In a disclosed embodiment, identifying the locations of the last write operations includes counting the writes to at least the subset of the registers. Alternatively, identifying the locations of the last write operations may include recording the addresses of the last write operations.
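Both ways of identifying the last write, counting writes per register or recording the address of the last write, can be illustrated together. The input format (pairs of program-counter value and destination registers) and the function name are assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def last_write_info(
    sequence: Iterable[Tuple[int, Tuple[str, ...]]]
) -> Tuple[Counter, Dict[str, int]]:
    """Scan (pc, destination registers) pairs of a monitored sequence.

    Returns the number of writes to each register and the address (PC)
    of the last write to each register.
    """
    write_counts: Counter = Counter()
    last_write_pc: Dict[str, int] = {}
    for pc, dests in sequence:
        for reg in dests:
            write_counts[reg] += 1     # option 1: count the writes
            last_write_pc[reg] = pc    # option 2: remember the latest address
    return write_counts, last_write_pc


monitored = [
    (0x100, ("r1",)),
    (0x104, ("r2", "r1")),
    (0x108, ()),            # an instruction with no register destination
]
counts, last_pcs = last_write_info(monitored)
print(counts["r1"], hex(last_pcs["r1"]))  # -> 2 0x104
```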
In an embodiment, identification of the location of the last write operation is performed, in addition to the registers, also for one or more flags of the processor. In another embodiment, the subset of the registers includes at least the registers that are classified as local registers. In yet another embodiment, the subset of the registers includes at least the registers that are classified as global-local registers.
In an example embodiment, identifying the locations of the last write operations includes conditional write operations to the respective registers. In an embodiment, the classification according to the order of use as operand or destination is performed, in addition to the registers, also for one or more flags of the processor.
There is additionally provided, in accordance with an embodiment of the present invention, a processor including an execution pipeline and a monitoring unit. The execution pipeline is configured to execute instructions of program code. The monitoring unit is configured to monitor instructions in a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions, to evaluate a termination criterion based on the monitored instructions, to terminate monitoring of the instructions if the termination criterion is met, and, if monitoring of the instructions ends without meeting the termination criterion, to parallelize execution of multiple segments of the repetitive instruction sequence based on the specification.
There is also provided, in accordance with an embodiment of the present invention, a processor including an execution pipeline and a monitoring unit. The execution pipeline is configured to execute instructions of program code. The monitoring unit is configured to monitor a repetitive sequence of the instructions, to classify the registers accessed by the monitored instructions depending on the respective order in which each register is used as an operand or as a destination in the monitored instructions, and to parallelize execution of multiple segments of the repetitive sequence based on the classification of the registers.
The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings, in which:
Brief description of the drawings
Fig. 1 is a block diagram that schematically illustrates a processor that performs run-time code parallelization, in accordance with an embodiment of the present invention;
Fig. 2 is a diagram that schematically illustrates run-time parallelization of a program loop, in accordance with an embodiment of the present invention;
Fig. 3 is a diagram of a program loop having multiple traces and corresponding scoreboards, in accordance with an embodiment of the present invention; and
Fig. 4 is a flow chart that schematically illustrates a method for continuous monitoring of repetitive instruction sequences, in accordance with an embodiment of the present invention.
Detailed description of embodiments
Overview
Embodiments of the present invention that are described herein provide improved methods and devices for run-time parallelization of code in a processor. In the disclosed embodiments, the processor identifies a repetitive sequence of instructions, and creates and executes multiple parallel code sequences referred to as segments, which carry out different occurrences of the sequence. The segments are scheduled for parallel execution by multiple hardware threads.
For example, the repetitive sequence may include a loop, in which case the segments include multiple loop iterations, parts of an iteration or the continuation of a loop. As another example, the repetitive sequence may include a function, in which case the segments include multiple function calls, parts of a function or function continuation. The parallelization is performed at run-time, on pre-compiled code. The term "repetitive sequence" generally refers to any instruction sequence that is revisited and executed multiple times.
In some embodiments, upon identifying a repetitive sequence, the processor monitors the instructions in the sequence and constructs a "scoreboard", a specification of access to registers by the monitored instructions. The scoreboard is associated with the specific flow-control trace traversed by the monitored sequence. The processor decides how and when to create and execute the multiple segments based on the information collected in the scoreboard and the trace.
In some embodiments, the scoreboard includes a classification of the registers accessed by the monitored instructions. The classification of a register depends on the order in which the register is used as an operand or as a destination in the monitored instructions.
In some embodiments, micro-ops, although distinct from instructions, are monitored in a similar manner to the monitoring of instructions. In other words, in some embodiments the scoreboard is produced, and the monitoring is performed, at micro-op granularity rather than at instruction granularity.
The classification may distinguish, for example, between local (L) registers whose first occurrence is as a destination, global (G) registers that are used only as operands, and global-local (GL) registers whose first occurrence is as an operand and that are subsequently used as destinations. Additionally or alternatively, the scoreboard may indicate, for at least some of the registers, the location in the monitored sequence of the last write operation to the register. This indication may include, for example, a count of the number of write operations to the register.
In some embodiments, the processor continues to monitor the instructions in one or more of the segments during execution. Such continued monitoring enables the processor to react quickly and efficiently to changes in the flow-control trace that may occur in the monitored segments, e.g., due to data-dependent conditional branch instructions. Several examples of selection criteria, which the processor may use to select the segments for continued monitoring, are described herein.
In some embodiments, the processor terminates and aborts the monitoring of a given segment before the end of that segment. Various termination criteria that can be used by the processor are described herein. Additional disclosed techniques maintain multiple simultaneous scoreboards for multiple respective flow-control traces, and alternate between them as appropriate.
Processor architecture
Fig. 1 is a block diagram that schematically illustrates a processor 20, in accordance with an embodiment of the present invention. Processor 20 runs pre-compiled software code, while parallelizing the code execution. The processor makes parallelization decisions at run-time, by analyzing the program instructions as they are fetched from memory and decoded.
In the present example, processor 20 includes an execution pipeline that includes one or more fetch units 24, one or more decoding units 28, an Out-of-Order (OOO) buffer 32, and execution units 36. Fetch units 24 fetch program instructions from a multi-level instruction cache memory, which in the present example includes a Level-1 (L1) instruction cache 40 and a Level-2 (L2) instruction cache 44.
A branch prediction unit 48 predicts the flow-control traces (referred to herein as "traces" for brevity) that are expected to be traversed by the program during execution. The predictions are typically based on the addresses or Program-Counter (PC) values of previous instructions fetched by fetch units 24. Based on the predictions, branch prediction unit 48 instructs fetch units 24 which new instructions are to be fetched. The flow-control predictions of unit 48 also affect the parallelization of code execution, as will be explained below.
Instructions decoded by decoding units 28 are stored in OOO buffer 32, for out-of-order execution by execution units 36, i.e., not in the order in which they were compiled and stored in memory. Alternatively, the buffered instructions may be executed in order. The buffered instructions are then issued for execution by the various execution units 36. In the present example, execution units 36 include one or more Multiply-Accumulate (MAC) units, one or more Arithmetic Logic Units (ALUs) and one or more Load/Store units. Additionally or alternatively, execution units 36 may include other suitable types of execution units, for example Floating-Point Units (FPUs).
The results produced by execution units 36 are stored in a register file and/or a multi-level data cache memory, which in the present example includes a Level-1 (L1) data cache 52 and a Level-2 (L2) data cache 56. In some embodiments, L2 data cache memory 56 and L2 instruction cache memory 44 are implemented as separate memory areas in the same physical memory, or simply share the same memory without fixed pre-allocation.
In some embodiments, processor 20 further includes a thread monitoring and execution unit 60 that is responsible for run-time code parallelization. The functions of unit 60 are explained in detail below.
The configuration of processor 20 shown in Fig. 1 is an example configuration that is chosen purely for the sake of conceptual clarity. In alternative embodiments, any other suitable processor configuration can be used. For example, in the configuration of Fig. 1, multi-threading is implemented using multiple fetch units 24 and multiple decoding units 28. Each hardware thread may include a fetch unit assigned to fetch instructions for the thread and a decoding unit assigned to decode the fetched instructions. Additionally or alternatively, multi-threading may be implemented in many other ways, such as using multiple OOO buffers, separate execution units per thread and/or separate register files per thread. In another embodiment, different threads may include different respective processing cores.
As another example, the processor may be implemented without cache or with a different cache structure, without branch prediction or with a separate branch prediction per thread. The processor may include additional elements such as a reorder buffer (ROB) and register renaming, to name just a few. Further alternatively, the disclosed techniques can be carried out with processors having any other suitable micro-architecture.
Processor 20 can be implemented using any suitable hardware, such as using one or more Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs) or other device types. Additionally or alternatively, certain elements of processor 20 can be implemented using software, or using a combination of hardware and software elements. The instruction and data cache memories can be implemented using any suitable type of memory, such as Random Access Memory (RAM).
Processor 20 may be programmed in software to carry out the functions described herein. The software may be downloaded to the processor in electronic form, over a network, for example, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.
Runtime code parallelization
In some embodiments, unit 60 in processor 20 identifies repetitive instruction sequences and parallelizes their execution. Repetitive instruction sequences may include, for example, respective iterations of a program loop, respective occurrences of a function or procedure, or any other suitable sequence of instructions that is revisited and executed multiple times. In the present context, the term "repetitive instruction sequence" refers to an instruction sequence whose flow-control trace (e.g., sequence of PC values) has been executed in the past at least once. Data values (e.g., register values) may differ from one execution to another.
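The definition above, a sequence is repetitive if its flow-control trace has been executed before regardless of data values, can be illustrated with a very small sketch. The class name and the representation of a trace as a tuple of PC values are assumptions for illustration; the actual identification mechanism of unit 60 is not described in this passage.

```python
from typing import Iterable, Set, Tuple


class TraceHistory:
    """Remembers flow-control traces (PC-value sequences) seen so far."""

    def __init__(self) -> None:
        self._seen: Set[Tuple[int, ...]] = set()

    def observe(self, pcs: Iterable[int]) -> bool:
        """Record a trace; return True if it was already executed before.

        Only PC values are compared. Register and memory values are
        ignored, since they may differ from one execution to another.
        """
        trace = tuple(pcs)
        repeated = trace in self._seen
        self._seen.add(trace)
        return repeated


history = TraceHistory()
print(history.observe([0x10, 0x14, 0x18]))  # -> False (first execution)
print(history.observe([0x10, 0x14, 0x18]))  # -> True  (same trace again)
print(history.observe([0x10, 0x14, 0x20]))  # -> False (different trace)
```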
In the disclosed embodiments, processor 20 parallelizes a repetitive instruction sequence by invoking and executing multiple code segments in parallel or semi-parallel using multiple hardware threads. Each thread executes a respective code segment, e.g., a respective iteration of a loop, multiple (not necessarily successive) loop iterations, part of a loop iteration, continuation of a loop, a function or part or continuation thereof, or any other suitable type of segment.
Parallelization of segments in processor 20 is performed using multiple hardware threads. In the example of Fig. 1, although not necessarily, each thread includes a respective fetch unit 24 and a respective decoding unit 28 that have been assigned by unit 60 to execute one or more segments.
In practice, data dependencies exist between segments. For example, a calculation performed in a certain loop iteration may depend on the result of a calculation performed in a previous iteration. The ability to parallelize segments depends to a large extent on such data dependencies.
Fig. 2 is the figure of parallelization when illustrating the operation of program circulation according to the example embodiment of the present invention.The top of the figure
Portion shows dependence of the example procedure circulation (being reappeared from the bzip benchmark tests version of SPECint protos test suite PROTOSs) between instruction
Property.Between instruction of some dependences in same loop iteration, and instruction of other dependences in given loop iteration and
Between instruction in previous ones.
The bottom of the figure shows how unit 60 is come using four thread TH1...TH4 according to an embodiment of the invention
The parallelization circulation.The table lists across ten a cycle altogether and which of which thread is performed within each cycle refers to
Order.Each instruction is represented by the instruction number in its number of iterations and iteration.For example, " 14 " represent the 4th finger of the 1st loop iteration
Order.In this example, instruction 5 and instruction 7 are ignored, and assume perfect branch prediction.
Thread executory irregular (staggering) is due to data dependency.For example, due to instruction 21, (second repeatedly
The first instruction in generation) dependent on instruction 13 (the 3rd instruction of first time iteration), therefore thread TH2 is unable to the He of execute instruction 21
Instruction 22 (the first two instruction in second of loop iteration) is until the cycle 1.There is similar dependence in whole table.Total comes
Say, this Parallelization Scheme can perform loop iteration twice within six cycles, or every three cycles perform an iteration.
It is important to note that the parallelization shown in Fig. 2 considers only the data dependencies between instructions and not other constraints, such as the availability of execution units. Therefore, the cycles in Fig. 2 do not necessarily translate directly into corresponding clock cycles. For example, instructions listed in Fig. 2 as executing in a given cycle may actually execute over more than one clock cycle, because they compete for the same execution units 36.
Parallelization based on section monitoring
In some embodiments, unit 60 decides how to parallelize the code by monitoring the instructions in the processor pipeline. In response to identifying a repetitive instruction sequence, unit 60 starts monitoring the sequence as it is fetched, decoded and executed by the processor.
In some implementations, the functionality of unit 60 may be distributed among the multiple hardware threads, such that a given thread can be viewed as monitoring its own instructions during execution. For the sake of clarity, however, the description below assumes that the monitoring functions are carried out by unit 60.
As part of the monitoring process, unit 60 generates the flow control track traversed by the monitored instructions, and a monitoring table referred to herein as a scoreboard. The scoreboard includes a respective entry for each register that appears in the monitored sequence. In an embodiment, unit 60 classifies each register as global (G), local (L) or global-local (GL), and indicates the classification in the corresponding scoreboard entry. The classification of a register as G, L or GL depends on the order in which the register is used in the monitored sequence as an operand (whose value is read) and/or as a destination (to which a value is written).
In an embodiment, a local (L) register is defined as a register whose first occurrence in the monitored sequence is as a destination (subsequent occurrences, if any, may be as an operand and/or as a destination). A global (G) register is defined as a register that is used in the monitored sequence only as an operand, i.e., the register is read but never written. A global-local (GL) register is defined as a register whose first occurrence in the monitored sequence is as an operand and which is later used in the monitored sequence as a destination. The first and subsequent occurrences may be in different instructions or in the same instruction, as long as the order between "first" and "subsequent" is preserved.
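The G/L/GL definitions above can be sketched in a few lines of Python. This is only an illustrative software model of the classification rule, not the hardware of unit 60. The function name `classify_registers` and the representation of an instruction as a pair (operand registers, destination registers) are assumptions made for the example. Within one instruction, operands are treated as read before the destination is written.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed representation: (operand registers, destination registers).
Instruction = Tuple[List[str], List[str]]


def classify_registers(sequence: Iterable[Instruction]) -> Dict[str, str]:
    """Classify each register of a monitored sequence as "G", "L" or "GL"."""
    first_use: Dict[str, str] = {}  # register -> "operand" or "destination"
    written: Set[str] = set()       # registers written at least once
    for operands, destinations in sequence:
        # Operands are read before the destination of the same instruction
        # is written, so they are recorded first.
        for reg in operands:
            first_use.setdefault(reg, "operand")
        for reg in destinations:
            first_use.setdefault(reg, "destination")
            written.add(reg)

    scoreboard: Dict[str, str] = {}
    for reg, first in first_use.items():
        if first == "destination":
            scoreboard[reg] = "L"   # first occurrence is a write
        elif reg in written:
            scoreboard[reg] = "GL"  # read first, written later
        else:
            scoreboard[reg] = "G"   # only ever read
    return scoreboard


# Hypothetical three-instruction sequence.
example = [
    (["r1", "r2"], ["r3"]),  # r3 = f(r1, r2)
    (["r3"], ["r1"]),        # r1 = f(r3)
    (["r4"], ["r4"]),        # r4 = f(r4), operand and destination together
]
result = classify_registers(example)
```

In this example r2 is only read (G), r3 is written before it is read (L), and r1 and r4 are read first and written afterwards (GL), r4 doing both in a single instruction.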
In an alternative embodiment, an exception to the above classification concerns conditional instructions that use a register as a destination. If such an instruction is the first occurrence of the register in the monitored instructions, the register is classified as GL. Otherwise, the register is classified as local (L) in accordance with the rules above. For example, register r2 in the instruction "mov_cond r2, #5" is classified as GL if this instruction is the first write to r2 in the monitored instructions, and as L otherwise. In another alternative embodiment, if such an instruction is the first occurrence of the register in the monitored instructions, the register is classified as GL. Otherwise, the register is classified as local only if the condition of the instruction is met. If the condition is not met, the register is not classified.
In an embodiment, unit 60 uses a superset classification, i.e., it merges two or more of the classes defined above. In such an embodiment, even if a given register is only local in a given section, unit 60 still classifies it as GL in order to simplify control.
An alternative way of defining the classification of registers as G, L or GL is to classify a register according to where its dependencies are generated and used relative to the currently monitored section: an operand generated outside the currently monitored section is classified as global (G) or global-local (GL), and an operand generated inside the currently monitored section is classified as local (L).
In some embodiments, unit 60 finds, and indicates in the scoreboard, for at least some of the registers, the location of the last write to the register in the monitored sequence. Unit 60 uses this indication during execution to decide when to issue instructions in subsequent sections that depend on that last write. The rationale behind this mechanism is that an instruction in a section X that depends on the value of a register in a previous section Y can be issued only after the execution of section Y has performed its last write to that register.
In one embodiment, the last-write indication is implemented by counting the number of times the register is written in the monitored sequence. Unit 60 determines this count (denoted #WRITES) and indicates the #WRITES value in the entry of the register in the scoreboard.
In this embodiment, when section Y is executed, unit 60 counts the number of writes to the register in question. When the count reaches the #WRITES value indicated in the scoreboard, unit 60 concludes that the last write has been encountered, and therefore permits instructions in section X that depend on the register in question to be issued for execution.
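The #WRITES mechanism can be modelled as two cooperating pieces: a count taken during monitoring, and a counter compared against that value during execution. The Python sketch below is a simplified software analogy under that reading. The names `count_writes` and `LastWriteTracker`, and the representation of the sequence as one list of destination registers per instruction, are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_writes(destinations_per_instruction: Iterable[List[str]]) -> Dict[str, int]:
    """Monitoring phase: compute #WRITES for each register of the sequence."""
    writes: Counter = Counter()
    for destinations in destinations_per_instruction:
        writes.update(destinations)
    return dict(writes)


class LastWriteTracker:
    """Execution phase: detect the last write of a section to each register."""

    def __init__(self, expected_writes: Dict[str, int]) -> None:
        self.expected = expected_writes   # #WRITES values from the scoreboard
        self.seen: Counter = Counter()    # writes observed so far in section Y

    def on_write(self, register: str) -> bool:
        """Record one write; return True once #WRITES has been reached."""
        self.seen[register] += 1
        return self.seen[register] >= self.expected[register]


# Hypothetical monitored sequence: r1 is written twice, r2 once.
scoreboard_writes = count_writes([["r1"], ["r2"], ["r1"]])

tracker = LastWriteTracker(scoreboard_writes)
first = tracker.on_write("r1")    # not yet the last write to r1
second = tracker.on_write("r1")   # last write: dependents in section X may issue
```

Only after `on_write` returns True for a register would instructions of a later section that read that register be released, which corresponds to the condition described above.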
One known solution for mitigating data dependencies is register renaming, i.e., assigning a given register different names in different sections. In some embodiments, unit 60 refrains from renaming registers, i.e., it retains the register names across the different iterations of the repetitive sequence, in order to facilitate the counting of #WRITES. In other words, unit 60 maintains alignment of the register renaming map between sections and threads.
The #WRITES mechanism described above is given only as one example of a mechanism for finding and indicating the location of the last write to a register in the monitored sequence. In alternative embodiments, unit 60 may find and indicate in the scoreboard the location of the last write to a register in any other suitable way, for example by recording in the scoreboard the address of the last write operation to the register.
In various embodiments, unit 60 does not necessarily count #WRITES for every register. For example, unit 60 may count #WRITES for the registers classified as GL, for the registers classified as L, or for both.
In some embodiments, unit 60 includes conditional write instructions in the #WRITES count regardless of whether the condition is met. In other embodiments, unit 60 includes a conditional write instruction in the #WRITES count only when the condition is met and the write is actually performed.
In some embodiments, processor 20 maintains one or more flags that are used in conditional instructions. Examples of flags include a zero flag ("true" if the result of the most recent arithmetic operation was zero, "false" otherwise), a negative flag ("true" if the result of the most recent arithmetic operation was negative, "false" otherwise), a carry flag ("true" if the most recent addition produced a carry, "false" otherwise), an overflow flag ("true" if the most recent addition caused an overflow, "false" otherwise), or any other suitable flag. Typically, the flags are implemented as respective bits in a dedicated flags register. The flags are updated by various instructions or micro-operations.
In some embodiments, unit 60 monitors the flags and includes them in the scoreboard in a manner similar to the registers. For example, unit 60 may classify the flags as G, L or GL, as explained above. Additionally or alternatively, unit 60 may count and record the location of the last write to each flag in the monitored sequence (e.g., by counting and recording #WRITES for the flags).
In some embodiments, unit 60 does not necessarily monitor an entire section from start to end. In an example embodiment, unit 60 may start monitoring (e.g., counting writes and/or classifying registers) at some midpoint of a section, and update an existing scoreboard.
Continuous monitoring of multiple tracks
In some embodiments, unit 60 continues to monitor the instructions in one or more of the threads during their execution. In other words, the monitoring process does not end once a repetitive instruction sequence has been identified and monitored. During execution, unit 60 continues the monitoring and scoreboard construction process for at least some of the threads. As noted above, the functionality of unit 60 may be distributed among the threads, such that each thread (or at least a subset of the threads) monitors the instructions it executes.
Continuous monitoring of sections during execution is important, for example, for efficiently handling scenarios in which program execution switches from one flow control track to another at run time. In many practical scenarios, the program alternates between two or more repetitive instruction sequences having different tracks. In some embodiments, unit 60 handles such scenarios by creating and maintaining multiple different scoreboards in parallel, one scoreboard for each track.
Fig. 3 is a diagram that schematically illustrates a program loop having multiple tracks and corresponding scoreboards, in accordance with an embodiment of the present invention. The left-hand side of the figure shows a piece of code having nine instructions. The program loop starts at instruction 2 and loops back at instruction 9.
In this example, instruction 4 is a conditional branch instruction that jumps to instruction 6 and skips instruction 5. Therefore, depending on the outcome of the conditional branch instruction, some sections will follow a track denoted 70A (branch not taken), and others will follow a track denoted 70B (branch taken).
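The two tracks of this example can be enumerated with a short Python sketch. It models only what the text states about the loop of Fig. 3 (the loop body spans instructions 2 to 9, and instruction 4 may branch to instruction 6). The function name `loop_track` and the representation of a track as a tuple of instruction numbers are assumptions made for illustration.

```python
from typing import Tuple


def loop_track(branch_taken: bool) -> Tuple[int, ...]:
    """Return the instruction numbers traversed by one loop iteration."""
    track = []
    pc = 2                       # the loop starts at instruction 2
    while True:
        track.append(pc)
        if pc == 9:              # instruction 9 loops back, ending the iteration
            break
        if pc == 4 and branch_taken:
            pc = 6               # conditional branch taken: skip instruction 5
        else:
            pc += 1
    return tuple(track)


track_70a = loop_track(branch_taken=False)   # branch not taken
track_70b = loop_track(branch_taken=True)    # branch taken
```

Because the two tuples differ, each can serve as a distinct key under which a separate scoreboard is kept.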
In some embodiments, unit 60 monitors at least some of the sections during their execution. Upon detecting that a monitored section begins to follow a previously unknown track, unit 60 creates a separate scoreboard for the new track and records the register classification and #WRITES, as explained above. In this example, unit 60 creates and maintains a scoreboard 74A for track 70A and a scoreboard 74B for track 70B.
By maintaining multiple scoreboards, unit 60 can react quickly to track changes. As long as a section follows a previously monitored track, unit 60 already has a valid scoreboard for that track. Unit 60 can therefore invoke new sections immediately using the available scoreboard. Without this mechanism, the invocation of new sections would be delayed until the scoreboard for the new track had been constructed (meaning that efficiency is reduced, and that the processor may assume, possibly incorrectly, that the track it is monitoring is new).
The multi-track scenario of Fig. 3 is a simple example scenario, depicted in order to demonstrate the mechanisms of continuous monitoring and multiple scoreboards. The disclosed techniques can be used in any other suitable type of scenario in which execution alternates between multiple flow control tracks.
Fig. 4 is a flow chart that schematically illustrates a method for continuous monitoring of repetitive instruction sequences, in accordance with an embodiment of the present invention. The figure illustrates combined execution and monitoring in a given thread. Unit 60 typically performs this process for any sequence that is selected for monitoring, and not necessarily for every section being executed.
The method begins with unit 60 providing a given track and a corresponding scoreboard to a given hardware thread, at an initiation step 80. At an execution and monitoring step 84, the thread in question executes the section and performs monitoring in parallel. As part of the monitoring process, the thread generates a scoreboard for the track it follows.
After execution of the section is completed, unit 60 checks whether the track is new, at a checking step 88. In other words, unit 60 checks whether a scoreboard already exists for this track. If the track is new, unit 60 records the scoreboard created for this track, at a recording step 92. This scoreboard will be provided to subsequent threads that follow the same track. Otherwise, i.e., if a scoreboard already exists, the method ends at an end step 96.
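The flow of Fig. 4 can be summarized by the following Python sketch. It is a software analogy only: the actual execution and monitoring are done by the hardware thread, which is represented here by a caller-supplied callable. The names `run_section`, `Track` and `Scoreboard` are hypothetical. Step 80 (handing a track and scoreboard to the thread) is assumed to have happened before the function is called.

```python
from typing import Callable, Dict, Tuple

Track = Tuple[int, ...]       # instruction numbers followed by the section
Scoreboard = Dict[str, str]   # register -> classification ("G", "L" or "GL")


def run_section(
    known: Dict[Track, Scoreboard],
    execute_and_monitor: Callable[[], Tuple[Track, Scoreboard]],
) -> bool:
    """Execute one section; return True if a new scoreboard was recorded."""
    # Step 84: the thread executes the section and monitors it in parallel,
    # producing the track it followed and a scoreboard for that track.
    track, scoreboard = execute_and_monitor()
    # Step 88: is there already a scoreboard for this track?
    if track not in known:
        # Step 92: record the scoreboard for later sections on the same track.
        known[track] = scoreboard
        return True
    # Step 96: a scoreboard already exists, nothing to record.
    return False


# Stand-in for the hardware: always follows the same hypothetical track.
def fake_thread() -> Tuple[Track, Scoreboard]:
    return (2, 3, 4, 5, 6, 7, 8, 9), {"r1": "GL", "r2": "G"}


scoreboards: Dict[Track, Scoreboard] = {}
recorded_first = run_section(scoreboards, fake_thread)
recorded_again = run_section(scoreboards, fake_thread)
```

The first call records a scoreboard because the track is new. The second call finds the existing entry and records nothing, so a later section following that track could be invoked with the stored scoreboard right away.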
In some embodiments, a scoreboard is uniquely associated with a single flow control track. In other embodiments, a given scoreboard may be associated with two or more tracks.
In some embodiments, unit 60 monitors every section during execution, e.g., using the method of Fig. 4. In alternative embodiments, unit 60 may choose to monitor only a subset of the sections. By controlling the number and identity of the sections chosen for monitoring, it is possible to set different trade-offs between computational overhead and parallelization performance.
Unit 60 may use various criteria or logic to select which sections to monitor. For example, unit 60 may select sections for monitoring periodically, e.g., every N-th section invoked (for some selected constant N). In another embodiment, unit 60 may select sections for monitoring in accordance with a predefined deterministic pattern (e.g., sections 2, 3, 5, 12, 13, 15, 22, 23, 25...). As another example, unit 60 may select sections for monitoring at random, e.g., skip a random number of sections, select a section for monitoring, skip another random number of sections, select a section for monitoring, and so on.
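The three selection policies just described can be sketched as follows. The function names are hypothetical. The periodic rule used for the deterministic pattern (offsets 2, 3 and 5 within a period of ten sections) is merely one rule that is consistent with the listed example, since the text does not state how the pattern is defined.

```python
import random
from typing import FrozenSet, List


def is_every_nth(index: int, n: int) -> bool:
    """Periodic policy: monitor every N-th invoked section."""
    return index % n == 0


def matches_pattern(
    index: int,
    offsets: FrozenSet[int] = frozenset({2, 3, 5}),
    period: int = 10,
) -> bool:
    """Deterministic-pattern policy (assumed rule: fixed offsets per period)."""
    return index % period in offsets


def random_choices(rng: random.Random, max_skip: int, count: int) -> List[int]:
    """Random policy: skip a random number of sections, then pick the next."""
    chosen: List[int] = []
    index = 0
    for _ in range(count):
        index += rng.randint(0, max_skip) + 1   # skipped sections, plus the pick
        chosen.append(index)
    return chosen


periodic = [i for i in range(1, 13) if is_every_nth(i, 4)]
patterned = [i for i in range(1, 26) if matches_pattern(i)]
randomized = random_choices(random.Random(0), max_skip=5, count=4)
```

A larger N or a larger `max_skip` reduces monitoring overhead at the cost of reacting more slowly to track changes, which is the trade-off described above.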
As yet another example, unit 60 may select a section for monitoring in response to some predefined event that occurs during execution of the section. Since different threads may follow different flow control tracks, unit 60 may choose to monitor sections that follow a particular track of interest. Further alternatively, unit 60 may use any other suitable criterion to select sections for monitoring during execution.
In an embodiment, the monitoring by unit 60 is performed on the instructions at the output of decoding module 28. At this point in the pipeline the instructions are still speculative, in the sense that some of the decoded instructions will be flushed and not committed. Flushing may occur, for example, due to branch misprediction. It is nevertheless preferable to monitor the instructions at this early stage, because the instructions are still organized in order. Moreover, monitoring instructions early in the pipeline enables unit 60 to make use of the scoreboard (i.e., to invoke parallel sections using the scoreboard) with lower latency.
Monitoring termination criteria
In some embodiments, unit 60 terminates the monitoring of a given section before the section ends. Unit 60 may evaluate and use various termination criteria for this purpose. Several non-limiting examples of termination criteria include:
■ The number of writes to a register exceeds a threshold.
■ The number of registers that have been written exceeds a threshold.
■ The count of instructions or micro-operations exceeds a threshold.
■ The count of execution cycles exceeds a threshold.
■ The number of branch instructions exceeds a threshold.
■ Monitoring reaches a location in the program code that was monitored previously.
■ Monitoring reaches a location in the program code that is identified as repetitive (e.g., a backward branch or a branch link, BL).
■ A branch misprediction occurs in one of the monitored instructions or in an instruction preceding the monitored instructions.
■ The flags are classified as GL or global.
Further alternatively, any other suitable termination criterion may be used.
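Several of the termination criteria listed above can be evaluated together, as in the following Python sketch. The class names, field names and all threshold values are hypothetical placeholders chosen for illustration, since the text does not specify any thresholds. The flag-classification criterion is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Limits:
    """Hypothetical thresholds; the text does not give actual values."""
    max_writes_per_register: int = 8
    max_registers_written: int = 16
    max_instructions: int = 256
    max_cycles: int = 1024
    max_branches: int = 8


@dataclass
class MonitorState:
    """Counters accumulated while a section is being monitored."""
    writes: Dict[str, int] = field(default_factory=dict)  # register -> writes
    instructions: int = 0
    cycles: int = 0
    branches: int = 0
    reached_monitored_location: bool = False
    branch_mispredicted: bool = False


def termination_reason(state: MonitorState, limits: Limits) -> Optional[str]:
    """Return the first termination criterion that is met, or None."""
    if any(n > limits.max_writes_per_register for n in state.writes.values()):
        return "writes to a register exceed threshold"
    if len(state.writes) > limits.max_registers_written:
        return "number of written registers exceeds threshold"
    if state.instructions > limits.max_instructions:
        return "instruction count exceeds threshold"
    if state.cycles > limits.max_cycles:
        return "cycle count exceeds threshold"
    if state.branches > limits.max_branches:
        return "branch count exceeds threshold"
    if state.reached_monitored_location:
        return "reached previously monitored code"
    if state.branch_mispredicted:
        return "branch misprediction"
    return None


reason_none = termination_reason(MonitorState(), Limits())
reason_long = termination_reason(MonitorState(instructions=300), Limits())
reason_writes = termination_reason(MonitorState(writes={"r1": 9}), Limits())
```

The order in which the criteria are checked is arbitrary here. Returning a reason string rather than a boolean simply makes it visible which criterion ended the monitoring.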
Although the embodiments described herein mainly address general-purpose processors, the methods and systems described herein can also be used in other applications, such as in graphics processing units (GPUs) or other dedicated processors.
It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described above. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described above, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art. Documents incorporated by reference in the present patent application are to be considered an integral part of the application, except that, to the extent that any terms are defined in these incorporated documents in a manner that conflicts with the definitions made explicitly or implicitly in the present specification, only the definitions in the present specification should be considered.
Claims (60)
1. A method, comprising:
in a processor that executes instructions of program code, monitoring instructions in a repetitive sequence of the instructions that traverses a flow control track, so as to construct a specification of register access by the monitored instructions;
based on the specification, invoking multiple hardware threads to execute respective sections of the repetitive instruction sequence at least partially in parallel; and
continuing to monitor the instructions in at least one of the sections during execution.
2. The method according to claim 1, wherein continuing to monitor the instructions comprises, in response to detecting in a given section a change to a different flow control track, creating and constructing a different specification for the different flow control track by monitoring the instructions along the different flow control track.
3. The method according to claim 2, and comprising, after monitoring the different flow control track, saving the different specification or the different flow control track.
4. The method according to claim 1 or 2, wherein the repetitive sequence comprises a loop or a function.
5. The method according to claim 1 or 2, wherein continuing to monitor the instructions comprises continuing to monitor all the sections.
6. The method according to claim 1 or 2, wherein continuing to monitor the instructions comprises continuing to monitor at least a subset of the sections that follow the flow control track.
7. The method according to claim 1 or 2, wherein continuing to monitor the instructions comprises selecting a partial subset of the sections, and continuing to monitor the sections in the selected subset.
8. The method according to claim 7, wherein selecting the subset comprises at least one of:
selecting every N-th section being created for continued monitoring;
selecting the sections for continued monitoring in accordance with a predefined periodic pattern; and
selecting the sections for continued monitoring at random.
9. The method according to claim 1 or 2, and comprising terminating the monitoring of the instructions in a given section within a given number of cycles, instructions or micro-operations after the repetitive sequence is stopped.
10. The method according to claim 9, and comprising setting the given number for the given section based on the given number that was set for a different section having a different flow control track.
11. A processor, comprising:
an execution pipeline, which is configured to execute instructions of program code; and
a monitoring unit, which is configured to monitor instructions of an identified repetitive instruction sequence that traverses a flow control track so as to construct a specification of register access by the monitored instructions, to invoke, based on the specification, multiple hardware threads in the execution pipeline to execute respective sections of the repetitive instruction sequence at least partially in parallel, and to continue to monitor the instructions in at least one of the sections during execution.
12. The processor according to claim 11, wherein, in response to detecting in a given section a change to a different flow control track, the monitoring unit is configured to create and construct a different specification for the different flow control track by monitoring the instructions along the different flow control track.
13. The processor according to claim 12, wherein, after monitoring the different flow control track, the monitoring unit is configured to save the different specification or the different flow control track.
14. The processor according to claim 11 or 12, wherein the repetitive sequence comprises a loop or a function.
15. The processor according to claim 11 or 12, wherein the monitoring unit is configured to continue to monitor all the sections.
16. The processor according to claim 11 or 12, wherein the monitoring unit is configured to continue to monitor at least a subset of the sections that follow the flow control track.
17. The processor according to claim 11 or 12, wherein the monitoring unit is configured to select a partial subset of the sections, and to continue to monitor the sections in the selected subset.
18. The processor according to claim 17, wherein the monitoring unit is configured to select the subset by performing at least one of:
selecting every N-th section being created for continued monitoring;
selecting the sections for continued monitoring in accordance with a predefined periodic pattern; and
selecting the sections for continued monitoring at random.
19. The processor according to claim 11 or 12, wherein the monitoring unit is configured to terminate the monitoring of the instructions in a given section within a given number of cycles, instructions or micro-operations after the repetitive sequence is stopped.
20. The processor according to claim 19, wherein the monitoring unit is configured to set the given number for the given section based on the given number that was set for a different section having a different flow control track.
21. A method, comprising:
in a processor that executes instructions of program code, monitoring instructions in a repetitive sequence of the instructions, so as to construct a specification of register access by the monitored instructions;
evaluating a termination criterion based on the monitored instructions;
if the termination criterion is met, terminating the monitoring of the instructions; and
if the monitoring of the instructions ends without the termination criterion being met, parallelizing execution of multiple sections of the repetitive instruction sequence based on the specification.
22. The method according to claim 21, wherein the termination criterion depends on at least one of:
a location of the last write to a register, a number of registers that were written, a count of instructions or micro-operations, a count of execution cycles, or a number of branch instructions exceeding a threshold;
the monitoring reaching a location in the program code that was monitored previously;
the monitoring reaching a location in the program code that is identified as repetitive;
a branch misprediction occurring during or before the monitoring; and
a classification of one or more flags of the processor as global or global-local.
23. The method according to claim 21, wherein the specification is uniquely associated with a flow control track traversed by the monitored instructions.
24. The method according to claim 21, wherein the specification is associated with two or more flow control tracks traversed by the monitored instructions.
25. The method according to any one of claims 21-24, wherein the monitoring of the instructions is performed immediately after the instructions are decoded in an execution pipeline of the processor.
26. The method according to any one of claims 21-24, wherein the monitoring of the instructions is performed before the instructions are executed in an execution pipeline of the processor, including monitoring speculative instructions that are subsequently flushed.
27. The method according to any one of claims 21-24, and comprising retaining the respective names of the registers throughout the monitoring.
28. A method, comprising:
in a processor that executes instructions of program code, monitoring a repetitive sequence of the instructions, and classifying the registers accessed by the monitored instructions depending on a respective order in which each register is used by the instructions as an operand or as a destination; and
parallelizing execution of multiple sections of the repetitive sequence based on the classification of the registers.
29. The method according to claim 28, wherein classifying the registers comprises classifying at least some of the registers as one of:
a local register, whose first occurrence in the monitored sequence is as a destination;
a global register, which is used in the monitored sequence only as an operand; and
a global-local register, whose first occurrence in the monitored sequence is as an operand, and which is subsequently used in the monitored sequence as a destination.
30. The method according to claim 28, wherein classifying the registers comprises classifying a given register as global-local if the given register first appears in the monitored sequence as a destination in a conditional instruction.
31. The method according to claim 28, wherein classifying the registers comprises classifying a given register as global-local if the given register first appears in the monitored sequence as a destination in a conditional instruction, and otherwise classifying the given register as local if the condition of the conditional instruction is met.
32. The method according to claim 28, wherein classifying the registers comprises classifying a given register as global-local if the given register first appears in the monitored sequence as both an operand and a destination in the same instruction.
33. The method according to claim 28, wherein classifying the registers further comprises identifying, for at least a subset of the registers, respective locations in the monitored sequence of the last write operations to the registers.
34. The method according to claim 33, wherein identifying the location of the last write operation comprises counting the writes to the at least a subset of the registers.
35. The method according to claim 33, wherein identifying the location of the last write operation comprises recording the address of the last write operation.
36. The method according to claim 33, wherein, in addition to the registers, the identification of the location of the last write operation is also performed for one or more flags of the processor.
37. The method according to claim 33, wherein the subset of the registers comprises at least the registers classified as local.
38. The method according to claim 33, wherein the subset of the registers comprises at least the registers classified as global-local.
39. The method according to claim 33, wherein the identification of the location of the last write operation includes conditional write operations to the respective registers.
40. The method according to claim 28, wherein, in addition to the registers, the classification depending on the order of use as an operand or as a destination is also performed for one or more flags of the processor.
41. A processor, comprising:
an execution pipeline, which is configured to execute instructions of program code; and
a monitoring unit, which is configured to monitor instructions in a repetitive sequence of the instructions so as to construct a specification of register access by the monitored instructions, to evaluate a termination criterion based on the monitored instructions, to terminate the monitoring of the instructions if the termination criterion is met, and, if the monitoring of the instructions ends without the termination criterion being met, to parallelize execution of multiple sections of the repetitive instruction sequence based on the specification.
42. The processor according to claim 41, wherein the termination criterion depends on at least one of:
a location of the last write to a register, a number of registers that were written, a count of instructions or micro-operations, a count of execution cycles, or a number of branch instructions exceeding a threshold;
the monitoring reaching a location in the program code that was monitored previously;
the monitoring reaching a location in the program code that is identified as repetitive;
a branch misprediction occurring during or before the monitoring; and
a classification of one or more flags of the processor as global or global-local.
43. The processor according to claim 41, wherein the specification is uniquely associated with a flow control track traversed by the monitored instructions.
44. The processor according to claim 41, wherein the specification is associated with two or more flow control tracks traversed by the monitored instructions.
45. The processor according to any one of claims 41-44, wherein the monitoring unit is configured to monitor the instructions immediately after the instructions are decoded in the execution pipeline of the processor.
46. The processor according to any one of claims 41-44, wherein the monitoring unit is configured to monitor the instructions before the instructions are executed in the execution pipeline of the processor, including monitoring speculative instructions that are subsequently flushed.
47. The processor according to any one of claims 41-44, wherein the monitoring unit is configured to retain the respective names of the registers throughout the monitoring.
48. A processor, comprising:
an execution pipeline, which is configured to execute instructions of program code; and
a monitoring unit, which is configured to monitor a repetitive sequence of the instructions, to classify the registers accessed by the monitored instructions depending on a respective order in which each register is used by the instructions as an operand or as a destination, and to parallelize execution of multiple sections of the repetitive sequence based on the classification of the registers.
49. The processor according to claim 48, wherein the monitoring unit is configured to classify at least some of the registers as one of:
a local register, whose first occurrence in the monitored sequence is as a destination;
a global register, which is used in the monitored sequence only as an operand; and
a global-local register, whose first occurrence in the monitored sequence is as an operand, and which is subsequently used in the monitored sequence as a destination.
50. The processor according to claim 48, wherein the monitoring unit is configured to classify a given register as global-local if the given register first appears in the monitored sequence as a destination in a conditional instruction.
51. The processor according to claim 48, wherein the monitoring unit is configured to classify a given register as global-local if the given register first appears in the monitored sequence as a destination in a conditional instruction, and otherwise, if the condition of the conditional instruction is met, to classify the given register as local.
52. The processor according to claim 48, wherein the monitoring unit is configured to classify a given register as global-local if the given register first appears in the monitored sequence as both an operand and a destination in the same instruction.
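The classification rules of claims 49, 50 and 52 can be illustrated with a short Python sketch that walks a monitored sequence once and labels each register. This is not taken from the patent: the `Instr` record, the `classify_registers` function and the string labels are hypothetical names chosen for this example, and the claims describe a hardware monitoring unit rather than software. The alternative rule of claim 51, which depends on whether the condition is met, is not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical decoded instruction (illustration only)."""
    srcs: Tuple[str, ...] = ()   # registers read as operands
    dsts: Tuple[str, ...] = ()   # registers written as destinations
    conditional: bool = False    # True for a conditional instruction


def classify_registers(sequence: Iterable[Instr]) -> Dict[str, str]:
    """Label each register by the order in which it is used in the sequence."""
    cls: Dict[str, str] = {}
    for ins in sequence:
        # Operands are examined first, so a register that is both read and
        # written by the same instruction counts as "operand first" (claim 52).
        for reg in ins.srcs:
            # First seen as an operand: global unless it is written later.
            cls.setdefault(reg, "GLOBAL")
        for reg in ins.dsts:
            if reg not in cls:
                # First seen as a destination: local (claim 49), unless the
                # write is conditional, in which case global-local (claim 50).
                cls[reg] = "GLOBAL_LOCAL" if ins.conditional else "LOCAL"
            elif cls[reg] == "GLOBAL":
                # Read first, written later: global-local (claim 49).
                cls[reg] = "GLOBAL_LOCAL"
    return cls


example = [
    Instr(srcs=("r1", "r2"), dsts=("r3",)),             # r3 written first
    Instr(srcs=("r3",), dsts=("r1",)),                  # r1 read, then written
    Instr(srcs=("r4",), dsts=("r4",)),                  # r4 read and written together
    Instr(srcs=("r2",), dsts=("r5",), conditional=True) # r5 conditionally written
]
labels = classify_registers(example)
```

In this example `r2` is only ever read, so it stays global, while `r3` is local because its first appearance is as a destination.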
53. The processor according to claim 48, wherein, when classifying the registers, the monitoring unit is configured to identify, for at least a subset of the registers, the respective location in the monitored sequence of the last write operation to the register.
54. The processor according to claim 48, wherein the monitoring unit is configured to identify the location of the last write operation by counting the writes to at least the subset of the registers.
55. The processor according to claim 48, wherein the monitoring unit is configured to identify the location of the last write operation by recording the address of the last write operation.
56. The processor according to claim 48, wherein the monitoring unit is configured to identify the location of the last write operation for one or more flags of the processor, in addition to the registers.
57. The processor according to claim 48, wherein the subset of the registers comprises at least the registers that are classified as local.
58. The processor according to claim 48, wherein the subset of the registers comprises at least the registers that are classified as global-local.
59. The processor according to claim 48, wherein identifying the location of the last write operation includes conditional write operations to the respective register.
60. The processor according to claim 48, wherein the monitoring unit is configured to perform the classification, according to the order of use as an operand or as a destination, for one or more flags of the processor in addition to the registers.
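Claims 53 to 55 describe locating the last write to each register either by counting writes or by recording the address of the last write. A minimal Python sketch of both bookkeeping schemes follows. The function name `last_write_info` and the `(address, destinations)` input format are assumptions made for illustration only and do not come from the patent. Conditional writes are counted like any other write, in the spirit of claim 59.

```python
from typing import Dict, Iterable, Optional, Tuple


def last_write_info(
    sequence: Iterable[Tuple[int, Tuple[str, ...]]]
) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return, per register, (number of writes, address of the last write).

    `sequence` is a hypothetical list of (instruction address, destination
    registers) pairs in program order for one monitored segment.
    """
    info: Dict[str, Tuple[int, Optional[int]]] = {}
    for addr, dsts in sequence:
        for reg in dsts:
            count, _ = info.get(reg, (0, None))
            # Each write bumps the counter and overwrites the recorded
            # address, so the final entry describes the last write.
            info[reg] = (count + 1, addr)
    return info


segment = [
    (0x100, ("r1",)),
    (0x104, ("r2",)),
    (0x108, ("r1",)),
]
writes = last_write_info(segment)
```

One possible use of the count, stated here as an assumption: when a later segment executes, the n-th write to a register with a recorded count of n can be treated as its last write, after which the value may be handed to dependent segments.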
Applications Claiming Priority (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201414578518A | 2014-12-22 | 2014-12-22 | |
US14/578,516 US9348595B1 (en) | 2014-12-22 | 2014-12-22 | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
US14/578,518 | 2014-12-22 | ||
US14/578,516 | 2014-12-22 | ||
PCT/IB2015/059470 WO2016103092A1 (en) | 2014-12-22 | 2015-12-09 | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107003859A (en) | 2017-08-01 |
Family
ID=56149346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201580063897.5A Pending CN107003859A (en) | 2014-12-22 | 2015-12-09 | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3238040A4 (en) |
CN (1) | CN107003859A (en) |
WO (1) | WO2016103092A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020144092A1 (en) * | 2001-01-31 | 2002-10-03 | Siroyan Limited. | Handling of loops in processors |
TW200939117A (en) * | 2007-12-31 | 2009-09-16 | Advanced Micro Devices Inc | Processing pipeline having stage-specific thread selection and method thereof |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69130138T2 (en) * | 1990-06-29 | 1999-05-06 | Digital Equipment Corp | Jump prediction unit for high-performance processor |
US7571302B1 (en) * | 2004-02-04 | 2009-08-04 | Lei Chen | Dynamic data dependence tracking and its application to branch prediction |
US8607209B2 (en) * | 2004-02-04 | 2013-12-10 | Bluerisc Inc. | Energy-focused compiler-assisted branch prediction |
JP4287799B2 (en) * | 2004-07-29 | 2009-07-01 | 富士通株式会社 | Processor system and thread switching control method |
US8291197B2 (en) * | 2007-02-12 | 2012-10-16 | Oracle America, Inc. | Aggressive loop parallelization using speculative execution mechanisms |
US7711929B2 (en) * | 2007-08-30 | 2010-05-04 | International Business Machines Corporation | Method and system for tracking instruction dependency in an out-of-order processor |
US8683185B2 (en) * | 2010-07-26 | 2014-03-25 | International Business Machines Corporation | Ceasing parallel processing of first set of loops upon selectable number of monitored terminations and processing second set |
2015
- 2015-12-09 EP EP15872056.5A patent/EP3238040A4/en not_active Withdrawn
- 2015-12-09 CN CN201580063897.5A patent/CN107003859A/en active Pending
- 2015-12-09 WO PCT/IB2015/059470 patent/WO2016103092A1/en active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522049A (en) * | 2017-09-18 | 2019-03-26 | 展讯通信(上海)有限公司 | Method and device for verifying shared registers in a simultaneous multithreading system |
CN111381883A (en) * | 2018-12-27 | 2020-07-07 | 图核有限公司 | Instruction cache in a multithreaded processor |
CN111381883B (en) * | 2018-12-27 | 2023-09-29 | 图核有限公司 | Instruction cache in a multithreaded processor |
Also Published As
Publication number | Publication date |
---|---|
EP3238040A1 (en) | 2017-11-01 |
WO2016103092A1 (en) | 2016-06-30 |
EP3238040A4 (en) | 2018-09-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9348595B1 (en) | Run-time code parallelization with continuous monitoring of repetitive instruction sequences | |
CN107003858A (en) | Run-time code parallelization with monitoring of repetitive instruction sequences | |
CN108170471B (en) | Type-based prioritization instructions | |
CN107250977A (en) | Run-time code parallelization with approximate monitoring of instruction sequences | |
CN106104481A (en) | Deterministic and opportunistic multithreading | |
CN108027769A (en) | Initiating instruction block execution using a register access instruction | |
CN107430509A (en) | Run-time parallelization of code execution based on an approximate register-access specification | |
US10013255B2 (en) | Hardware-based run-time mitigation of conditional branches | |
US9389868B2 (en) | Confidence-driven selective predication of processor instructions | |
CN103250131A (en) | Single cycle multi-branch prediction including shadow cache for early far branch prediction | |
US10268519B2 (en) | Scheduling method and processing device for thread groups execution in a computing system | |
CN101763249A (en) | Branch checkout for reduction of non-control flow commands | |
CN107450888A (en) | Zero-overhead loop in an embedded DSP | |
US9652246B1 (en) | Banked physical register data flow architecture in out-of-order processors | |
US11188332B2 (en) | System and handling of register data in processors | |
EP3264263A1 (en) | Sequential monitoring and management of code segments for run-time parallelization | |
CN107918547A (en) | Flushing in a parallelized processor | |
CN111538535B (en) | CPU instruction processing method, controller and central processing unit | |
CN107003859A (en) | Run-time code parallelization with continuous monitoring of repetitive instruction sequences | |
CN104536914B (en) | The associated processing device and method marked based on register access | |
CN109783143A (en) | Control method and control equipment for instruction pipeline stream | |
US20160004538A1 (en) | Multiple issue instruction processing system and method | |
US10180841B2 (en) | Early termination of segment monitoring in run-time code parallelization | |
CN107430511A (en) | Parallelized execution of instruction sequences based on pre-monitoring | |
US9928068B2 (en) | Hardware managed dynamic thread fetch rate control |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170801 |