CN105824605B - Controlled dynamic multithreading method and processor - Google Patents
- Publication number
- CN105824605B (application CN201610272367.8A)
- Authority
- CN
- China
- Prior art keywords
- instruction
- mark
- processor
- thread
- instructions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Advance Control (AREA)
Abstract
The invention discloses a controlled dynamic multithreading method and processor. For a processor with a pipeline structure, the method adds a mark to the instruction structure. The mark carries two pieces of information: the thread to which the marked instruction belongs and the priority of the marked instruction. The processor controls each instruction according to its mark, issuing and executing the instruction according to the thread and priority information in the mark. The processor comprises at least an instruction system containing the mark, a program execution control unit (Branch) that can recognize and track the mark, an instruction decoding circuit that can recognize and decode the mark, an arithmetic operation unit that can recognize and decode the mark, and the corresponding memory unit. The invention can dynamically schedule all arithmetic hardware resources of a processor, improving its computing capability without adding much complex hardware.
Description
Technical field
The present invention relates to the field of processors, and more particularly to a controlled dynamic multithreading (Dynamic Multi-threading) method and processor.
Background art
To improve the computing capability of processors, many parallel processing techniques have been developed, such as superscalar (Super-scalar), pipelining (Pipeline), very long instruction word (VLIW), and single instruction, multiple data (SIMD). However, because the instructions of a software program are processed sequentially, the instruction and data dependencies that arise during execution frequently force the processor to wait, which limits the efficiency of these parallel processing techniques.
To overcome dependencies during instruction execution, techniques that improve instruction issue efficiency have been developed, such as out-of-order execution (Out-of-Order) and branch prediction (Branch Prediction), but these techniques have their limitations. Either their hardware is extremely complex, or the efficiency gain is limited, and they are ill-suited to embedded systems. An embedded system, especially a mobile device such as mobile communication equipment, in-vehicle equipment, or a wearable device, requires not only high computing capability from the processor but also low power consumption and strong real-time behavior.
Multithreaded parallel processor technology (Multi-Threading) can process two or more completely independent programs in parallel on the same processor, so it copes comparatively well with the loss of efficiency caused by control and data dependencies during instruction execution. Of these techniques, simultaneous multithreading (Simultaneous Multi-threading, SMT) and token-triggered multithreading (Token Triggered Multi-threading) have been applied successfully in some processor products. For example, Intel's Hyper-Threading, IBM's POWER5, Sun Microsystems' UltraSPARC T2, and MIPS MT all adopt SMT technology, and Sandbridge's Sandblaster DSP core adopts token-triggered multithreading.
Although SMT can resolve the dependency problem during program execution, it requires more than a private set of program-execution registers for each thread: thread tracking logic must be added to every pipeline stage, and shared resources such as the instruction cache and TLBs must be enlarged. The thread tracking logic must not only track the progress of each thread but also check and determine whether the thread has finished executing. Because a large number of threads are in an executing or half-executed state, the CPU caches and TLB must be large enough to avoid unnecessary thrashing between threads. The hardware complexity grows sharply with the number of threads, which makes SMT difficult to apply to embedded and low-power processor designs.
The following table shows the execution process of a typical SMT multithreaded program:
Token-triggered multithreading is a form of time-division multithreading. Because it can only execute instructions of one thread's program in each clock cycle, its hardware is much simpler than that of SMT, but its efficiency is correspondingly lower. Its characteristics are:
1. only one thread issues instructions in each clock cycle;
2. all threads are started in sequence, as shown in Fig. 1, which simplifies the thread selection circuit;
3. every thread has the same number of clock cycles for instruction execution, so no dependency checking or bypass hardware is needed;
4. the result of an operation is guaranteed to be available before the thread's next turn to execute.
The following table gives the program execution process of token-triggered multithreading:
1 | Clock cycle i: thread T0 issues instructions j, j+1 and j+2 |
2 | Clock cycle i+1: thread T1 issues instructions k and k+1 |
3 | Clock cycle i+2: thread T2 issues instruction l |
4 | Clock cycle i+3: thread T3 issues instructions m, m+1 and m+2 |
5 | Clock cycle i+4: thread T0 has no instruction available; the processor waits |
6 | Clock cycle i+5: thread T1 issues instruction k+2 |
7 | Clock cycle i+6: thread T2 issues instructions l+1 and l+2 |
8 | Clock cycle i+7: thread T3 has no instruction available; the processor waits |
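The fixed issue order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `token_triggered_schedule` and the per-cycle `ready` flags are hypothetical and do not come from the patent. The sketch shows how a cycle is lost whenever the thread that owns it (T0 in cycle i+4, T3 in cycle i+7) has nothing to issue.

```python
def token_triggered_schedule(ready, num_threads):
    """Return the thread that issues in each cycle, or None for a wasted cycle.

    ready[c][t] is True when thread t has a valid instruction in cycle c
    (a hypothetical input format). The token visits the threads in a fixed
    order, one thread per clock cycle.
    """
    schedule = []
    for cycle, flags in enumerate(ready):
        owner = cycle % num_threads      # thread that holds the token this cycle
        schedule.append(owner if flags[owner] else None)
    return schedule


# Eight cycles and four threads: thread 0 stalls in cycle 4 and thread 3
# stalls in cycle 7, so both of those cycles are wasted.
ready = [[True] * 4 for _ in range(8)]
ready[4][0] = False
ready[7][3] = False
print(token_triggered_schedule(ready, 4))
```

A `None` entry marks a clock cycle in which the processor waits.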
However, a token-triggered multithreaded processor can only execute the designated thread in a given clock cycle. If that thread cannot issue an instruction in its clock cycle, because an instruction or data is missing or because of a dependency, the clock cycle is wasted. Opportunistic multithreading was developed to overcome this defect of token-triggered multithreading.
Opportunistic multithreading allows a multithreaded processor, when a thread has no valid instruction in one of its clock cycles, to give that cycle to another thread that does have a valid instruction rather than hold the cycle. The clock cycle that would otherwise be wasted is offered to another thread as an "opportunity".
In a multithreaded processor that uses this method, a thread is no longer limited to issuing once per thread cycle. It can issue in any clock cycle in which it is able to, using that cycle as an "opportunity", provided the thread originally scheduled for that cycle has no valid instruction.
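A minimal Python sketch of this idea follows. The function name `opportunistic_schedule`, the `ready` flags, and in particular the rule for choosing the substitute thread (the next ready thread in round-robin order) are assumptions made for illustration; the text does not say how the substitute is selected.

```python
def opportunistic_schedule(ready, num_threads):
    """Return the issuing thread per cycle, giving stalled slots to another thread.

    ready[c][t] is True when thread t has a valid instruction in cycle c.
    The slot owner goes first; if it has nothing to issue, the slot is
    offered to the other threads in round-robin order (an assumed policy).
    """
    schedule = []
    for cycle, flags in enumerate(ready):
        owner = cycle % num_threads
        chosen = None
        for offset in range(num_threads):
            candidate = (owner + offset) % num_threads
            if flags[candidate]:
                chosen = candidate
                break
        schedule.append(chosen)          # None only if no thread at all is ready
    return schedule


ready = [[True] * 4 for _ in range(8)]
ready[4][0] = False   # thread 0 cannot issue in its slot
ready[7][3] = False   # thread 3 cannot issue in its slot
print(opportunistic_schedule(ready, 4))
```

With the same stall pattern that wastes two cycles under a fixed token order, every cycle here is used by some thread.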
1. Like token-triggered multithreading, opportunistic multithreading is a form of time-division multithreading: only one program can execute in each clock cycle. The number of threads it can execute is limited by the number of hardware threads.
2. Opportunistic multithreading needs a branch prediction circuit; for a processor with a VLIW structure, it must predict the dependencies of every sub-instruction. The prediction circuit is therefore quite complex.
3. Opportunistic multithreading needs a set of two-dimensional thread identity (ID) registers to track how each thread's instructions progress through every pipeline stage, so that result data are not mixed up.
4. In practice, every arithmetic unit of a processor that uses opportunistic multithreading must add, for each thread, a set of two-dimensional data registers that is completely independent of the other threads, to prevent data thrashing between threads in a half-executed state.
5. To issue an instruction in every processor clock cycle, the instruction memory of each thread must also run at the processor clock frequency so that the thread can read its instructions in time. One advantage of multithreading, the reduction of memory power consumption, is therefore lost.
The analysis above shows that a processor using opportunistic multithreading is considerably more complex in hardware than one using token-triggered multithreading. Moreover, for every thread to be able to read an instruction in every clock cycle, the clock frequency of the instruction memory must equal the processor's main clock frequency, which noticeably increases power consumption. Opportunistic multithreading is therefore not suitable for low-power embedded processor designs.
Fig. 2 is a schematic diagram of program execution under opportunistic multithreading.
Summary of the invention
The technical problem to be solved by the present invention is to provide a controlled dynamic multithreading method and processor that address the defects described in the background art.
The present invention adopts the following technical solution to solve the above technical problem:
A controlled dynamic multithreading method, applied to a processor that uses a pipeline structure and has an I-cache, adds a mark to the instruction structure. The mark carries two pieces of information: the thread to which the marked instruction belongs and the priority of the marked instruction. The priority information indicates the execution order of the instruction and its correlation with the preceding and following instructions. The processor controls each instruction according to its mark, issuing and executing the instruction according to its priority information and the thread it belongs to.
As a further refinement of the controlled dynamic multithreading method of the present invention, the processor controls each instruction according to its mark, issuing and executing the instruction according to its priority information and thread, in the following steps:
Step 1), read an instruction according to the priority information in the marks of the instructions waiting to be executed;
Step 2), instruction decoding and dispatch:
the decoding circuit of the processor decodes the instruction read in step 1) into the mark and the individual sub-instructions, and the dispatch logic of the processor assigns each sub-instruction to an arithmetic unit according to its function;
Step 3), instruction execution:
for each sub-instruction, the processor reads the data of the corresponding registers according to the thread information in the mark of the instruction it belongs to, and stores the execution result in the registers of that thread;
Step 4), jump to step 1).
Depending on the specific hardware implementation, steps 1) and 2) may take several clock cycles or only one; step 3) takes n-1 clock cycles, where n is the number of pipeline stages of the processor's arithmetic units.
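Step 3) can be pictured as selecting a register file by the thread field of the mark. The Python sketch below is a software model under stated assumptions: the function `execute_sub_instruction`, the two-operation instruction set, and the register names are hypothetical and are not defined by the patent.

```python
def execute_sub_instruction(mark_thread, op, dst, src_a, src_b, register_files):
    """Run one sub-instruction against the register file of its own thread."""
    regs = register_files[mark_thread]       # registers chosen by the mark's thread field
    operations = {
        "add": lambda a, b: a + b,
        "mul": lambda a, b: a * b,
    }
    regs[dst] = operations[op](regs[src_a], regs[src_b])   # result stays with the thread
    return regs[dst]


# Two threads use the same register names without disturbing each other.
register_files = {
    0: {"r0": 1, "r1": 2, "r2": 0},
    1: {"r0": 10, "r1": 20, "r2": 0},
}
execute_sub_instruction(0, "add", "r2", "r0", "r1", register_files)
execute_sub_instruction(1, "mul", "r2", "r0", "r1", register_files)
print(register_files[0]["r2"], register_files[1]["r2"])
```

Because each thread reads and writes only its own registers, instructions of different threads can be interleaved in the pipeline without mixing results.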
As a further refinement of the controlled dynamic multithreading method of the present invention, the detailed steps of step 1) are as follows:
Step 1.1), the instruction fetch circuit of the processor checks whether the I-cache holds an instruction waiting to be executed, that is, whether there is an instruction in the Valid state;
Step 1.1.1), if only one instruction is in the Valid state, read that instruction;
Step 1.1.2), if two or more instructions are in the Valid state, check the marks of the instructions to determine which has the higher priority;
Step 1.1.2.1), if one instruction has a higher priority than the others, read that instruction;
Step 1.1.2.2), if no instruction has a higher priority than the others, determine whether there is a thread whose instruction was executed in the previous step;
Step 1.1.2.2.1), if there is such a thread, read an instruction from a thread different from it, or read an instruction according to thread order;
Step 1.1.2.2.2), if there is no such thread, read an instruction according to thread order.
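The selection rules of step 1.1 can be sketched as a small Python function. Several points are assumptions rather than statements of the patent: the names `Candidate` and `select_instruction` are hypothetical, a larger number is taken to mean a higher priority, ties among several top-priority instructions are resolved by the same tie-break as equal priorities, and "thread order" is taken to mean the lowest thread number first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    thread: int      # thread field of the instruction's mark
    priority: int    # priority field of the mark (larger = higher, assumed)


def select_instruction(valid: List[Candidate],
                       last_thread: Optional[int]) -> Optional[Candidate]:
    """Pick the next instruction to read from those in the Valid state."""
    if not valid:
        return None                          # nothing is waiting in the I-cache
    if len(valid) == 1:
        return valid[0]                      # step 1.1.1
    top = max(c.priority for c in valid)
    best = [c for c in valid if c.priority == top]
    if len(best) == 1:
        return best[0]                       # step 1.1.2.1
    if last_thread is not None:              # step 1.1.2.2.1
        others = [c for c in best if c.thread != last_thread]
        if others:
            best = others                    # prefer a thread other than the previous one
    return min(best, key=lambda c: c.thread)  # thread order (assumed: lowest number)


print(select_instruction([Candidate(0, 1), Candidate(1, 1)], last_thread=0))
```

In the usage line both candidates have the same priority and thread 0 executed last, so the instruction of thread 1 is chosen.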
As a further refinement of the controlled dynamic multithreading method of the present invention, the mark is written by software, or written automatically by the compiler during compilation.
As a further refinement of the controlled dynamic multithreading method of the present invention, the processor is a multiple-instruction-issue processor in which every instruction carries its own mark.
As a further refinement of the controlled dynamic multithreading method of the present invention, the processor is a multiple-instruction-issue processor in which several instructions share one mark.
As a further refinement of the controlled dynamic multithreading method of the present invention, the processor is a single-instruction-issue processor in which every instruction has one mark.
The invention also discloses a processor based on the controlled dynamic multithreading method, comprising at least an instruction system containing the mark, a program execution control unit that can recognize and track the mark, an instruction decoding circuit that can recognize and decode the mark, an arithmetic operation unit that can recognize and decode the mark, and the corresponding memory unit.
Compared with the prior art, the above technical solution has the following technical effects:
1. the hardware resources of the processor can be scheduled efficiently, and the priority and correlation of instructions determined effectively, without a complex multithread control circuit or a complex instruction validity prediction circuit;
2. because instructions are executed in the order of their priority, there is no need to worry that a missing instruction or missing data will waste hardware resources or corrupt results;
3. the utilization of the processor's hardware resources is improved, which in turn reduces power consumption.
Brief description of the drawings
Fig. 1 is the thread flow chart of token-triggered multithreading with four threads;
Fig. 2 is a schematic diagram of program execution under opportunistic multithreading;
Fig. 3 is the structure of a single instruction with a mark;
Fig. 4 is the instruction structure of multiple instructions sharing a single mark;
Fig. 5 is the instruction structure of multiple instructions each with its own mark;
Fig. 6 is the execution flow chart of multithreading with a 6-stage pipeline;
Fig. 7 is a block diagram of a processor with software-controllable dynamic multithreading.
Detailed description
The technical solution of the present invention is described in further detail below with reference to the drawings.
The present invention adds to the instruction system of a processor with a multi-stage pipeline a symbol (mark) that records, for the corresponding instruction, its thread identity and priority information. While fetching (Fetch) an instruction, the instruction system of the processor obtains the mark, which identifies the thread that executes the instruction and its priority. The instruction control unit (Branch) of the processor arranges hardware resources and execution order according to the information in the mark. The mark accompanies the instruction through every step of execution so that the execution of the instruction can be tracked, and its priority information indicates the dependencies between this instruction and the instructions and data before and after it, as well as the order of preferential execution.
The content of the mark can be set by the programmer at programming time, according to the requirements of the application system, to specify the thread that executes the program instruction and its execution priority. Alternatively, the compiler can set the thread automatically during compilation and set the priority according to the program's function, after determining how the instruction correlates with the instructions and data before and after it.
Software sets the execution thread of the program and attaches to every instruction, as an identifier (mark), the priority of that instruction and its correlation with the instructions executed before and after it. The processor hardware only needs to recognize the information in these marks to schedule its hardware resources dynamically and execute multithreaded instruction streams efficiently.
Setting and managing the execution threads in software also frees the number of threads that run concurrently from the limits imposed by the processor's issue width and pipeline depth. It also avoids the waste of clock cycles and hardware resources that occurs when the number of program threads is smaller than the number of pipeline stages.
To implement the software-controllable dynamic multithreading method, the instruction system of the processor must add to the usual instruction word an identifier containing the thread number and priority information, attached to the instruction word as a mark, as shown in Fig. 3. The mark in the figure is a binary number of at least 2 bits.
Take a 3-bit mark as an example:
mark = "000": the thread that executes the instruction is 0 and the priority is 0 (0 denotes low priority);
mark = "101": the thread that executes the instruction is 1 and the priority is 1 (1 denotes high priority).
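One bit layout that is consistent with both examples places the priority in the most significant bit and the thread number in the two least significant bits. The text does not state the layout, so this placement, like the function name `decode_mark`, is an assumption made for the following Python sketch.

```python
def decode_mark(mark: str):
    """Split a 3-bit mark string into (thread, priority).

    Assumed layout: bit 2 (leftmost) is the priority, bits 1..0 are the
    thread number. This matches the two examples in the text but is not
    specified by it.
    """
    if len(mark) != 3 or set(mark) - {"0", "1"}:
        raise ValueError("expected a 3-bit binary string")
    priority = int(mark[0], 2)
    thread = int(mark[1:], 2)
    return thread, priority


print(decode_mark("000"))
print(decode_mark("101"))
```

Under this layout a 3-bit mark can distinguish four threads and two priority levels; a wider mark would be needed for more of either.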
The concrete value of the mark can be set by the programmer at programming time, who assigns the execution thread and priority of a section of the program according to the requirements of the system, or it can be supplied automatically by the compilation system during compilation according to the function of the program.
The software-controllable dynamic multithreading method of the present invention can be used both in single-issue processors and in multiple-issue processors.
In a multiple-issue processor, the instructions issued together can share one mark, or each instruction can carry its own mark.
Fig. 3 shows the instruction structure of a single mark with a single instruction.
Fig. 4 shows the instruction structure of a single mark with multiple instructions; instruction words 1, 2, ..., n must be different instructions of the same thread's program. The single-mark instruction word structure can only perform time-division multithreading.
Fig. 5 shows the instruction structure of multiple marks with multiple instruction words; in the figure, M stands for mark. Because every instruction word has its own mark, the instructions can belong to the programs of different threads. This multi-mark instruction structure is suitable for simultaneous multithreading.
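The difference between the three instruction formats can be modelled with two small Python data types. The class names `Mark`, `SingleMarkWord` and `MultiMarkWord`, and the use of strings for instruction words, are illustrative assumptions and say nothing about the actual bit encoding.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Mark:
    thread: int
    priority: int


@dataclass
class SingleMarkWord:
    """One mark shared by one instruction (Fig. 3) or by several (Fig. 4)."""
    mark: Mark
    instructions: List[str]

    def threads(self) -> Set[int]:
        return {self.mark.thread}            # every slot belongs to the same thread


@dataclass
class MultiMarkWord:
    """Each instruction word carries its own mark (Fig. 5)."""
    slots: List[Tuple[Mark, str]]

    def threads(self) -> Set[int]:
        return {mark.thread for mark, _ in self.slots}


shared = SingleMarkWord(Mark(1, 0), ["add", "mul", "load"])
mixed = MultiMarkWord([(Mark(0, 0), "add"), (Mark(2, 1), "mul")])
print(shared.threads(), mixed.threads())
```

A shared mark restricts one issue group to a single thread, which is why that format is time-division only, while per-slot marks let one issue group mix threads.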
The dynamic multithreading method of the present invention executes in the following steps:
Step 1 (clock cycle 0), instruction fetch: the I-cache read control circuit of the processor checks whether any instruction is waiting to be executed (Valid). If two or more instructions are Valid (the I-cache of a multithreaded processor should have at least two banks), it checks which instruction has the higher priority. If one does, it reads the higher-priority instruction; if the priorities are equal, it reads an instruction from a thread different from the one executed in the previous step, or reads an instruction according to thread order;
Step 2 (clock cycle 1), instruction decoding and dispatch: the decoding circuit decodes instruction 1, instruction 2 and instruction 3, and the dispatch logic assigns them to different arithmetic units according to the functions of the decoded instructions;
Step 3 (clock cycles 2 to n+1), instruction execution: the processor reads the data of the corresponding registers according to the thread information in the mark and stores the execution result in the registers of that thread. Taking the instruction control circuit as an example, it executes program instructions in sequence from the PC register selected by the thread information in the mark, reads the data of the thread's other function registers (such as loop counter, jump and condition registers) as the instruction requires, and writes the results of executing the instruction back to those registers of the thread;
The value of n in clock cycles 2 to n+1 is determined by the number of pipeline stages of the processor's arithmetic units: n is 4 for a 4-stage pipeline and 6 for a 6-stage pipeline;
After clock cycle n+1 of step 3 has completed, execution returns to step 1.
Because the single-mark, multiple-instruction dynamic multithreading architecture of the present invention is a time-division multithreaded architecture, when the current program reaches step 2 (clock cycle 1) the I-cache read control circuit repeats step 1: it checks the validity (Valid) of the next instructions and decides from it which thread's program instruction to read.
When the current program reaches step 3 (clock cycle 2), the I-cache read control circuit again repeats step 1 and reads the third group of instructions according to their Valid information, while the decoding and dispatch circuit repeats step 2, decoding and dispatching the instructions of program 2; and so on, cycle after cycle.
Fig. 6 shows the execution flow of dynamic multithreading on a structure whose arithmetic units have a 6-stage pipeline. In the figure, the state of an instruction is written T(Y_i, j), where:
T: thread;
Y: thread number, Y = 0, 1, 2, ..., n; T(2), for example, denotes thread 2. The value of Y is given by the mark in the instruction word;
i: the i-th issue of the same thread within one instruction cycle; in this example one instruction cycle equals 6 clock cycles;
j: pipeline stage.
For example, T(3_2, 4) denotes the 2nd issue of thread 3, currently in pipeline stage 4.
The flow chart corresponds to step 3 above with n equal to 6; that is, the processor has already fetched and decoded the instruction and dispatched it to the corresponding processing unit, which has obtained the thread and priority information.
The operating process of dynamic multithreading is as follows (assuming programs 0, 1, 2, ..., 5 are all independent threads):
Clock cycle C0 (clock cycle 0 here corresponds to clock cycle 2 of the steps above): the processing unit of the processor reads the mark part of the decoded instruction word and obtains the thread Y of the current instruction. Suppose Y = 0, an instruction of thread 0's program; the instruction is assigned T(0_0, 0) and starts executing in pipeline stage 0;
Clock cycle C1: the processor reads the next instruction and decodes its mark, obtaining Y = 1, an instruction of thread 1's program. The instruction is assigned T(1_0, 0) and starts executing in pipeline stage 0, while the previous instruction has advanced to pipeline stage 1 and its state becomes T(0_0, 1);
Clock cycle C2: normally the processor would read an instruction of thread 2's program, that is, Y = 2. For some reason, however, the instruction of thread 2's program is missing, while an instruction of thread 0's program is ready. The processor reads the mark of that instruction; if decoding yields Y = 0 and a priority of 1 (no need to wait for the result of the previous instruction of thread 0's program), the processor assigns it T(0_1, 0) and starts executing it. The two earlier instructions become T(0_0, 2) and T(1_0, 1);
Clock cycle C3: the processor reads an instruction and decodes its mark, obtaining Y = 3; the instruction is assigned T(3_0, 0) and starts executing. The earlier instructions become T(0_0, 3), T(1_0, 2) and T(0_1, 1);
Clock cycle C4: the processor reads an instruction and decodes its mark, obtaining Y = 4; the instruction is assigned T(4_0, 0) and starts executing. The earlier instructions become T(0_0, 4), T(1_0, 3), T(0_1, 2) and T(3_0, 1);
Clock cycle C5: the processor reads an instruction and decodes its mark, obtaining Y = 5; the instruction is assigned T(5_0, 0) and starts executing. The earlier instructions become T(0_0, 5), T(1_0, 4), T(0_1, 3), T(3_0, 2) and T(4_0, 1). At this point one instruction cycle ends, and the result of instruction T(0_0) is stored in the corresponding register.
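The bookkeeping in this walkthrough can be reproduced with a short Python model. It is a sketch under stated assumptions: the function name `trace`, the representation of each in-flight instruction as a (thread, issue index, stage) triple, and the rule that the stage equals the number of cycles since issue are all illustrative choices rather than the patent's hardware.

```python
def trace(issue_order, stages=6):
    """Return, for every clock cycle, the (thread, issue index, stage) triples in flight.

    issue_order[c] is the thread whose instruction starts executing in cycle c.
    The issue index counts earlier issues of the same thread in this trace.
    """
    in_flight = []        # mutable [thread, issue_index, stage] records
    issued = {}           # thread -> number of instructions issued so far
    history = []
    for thread in issue_order:
        for entry in in_flight:
            entry[2] += 1                     # everything in flight advances one stage
        in_flight = [e for e in in_flight if e[2] < stages]   # retire finished work
        index = issued.get(thread, 0)
        issued[thread] = index + 1
        in_flight.append([thread, index, 0])  # the new instruction enters stage 0
        history.append([tuple(e) for e in in_flight])
    return history


# Thread 2 has no instruction in cycle C2, so thread 0 issues a second time.
for cycle, state in enumerate(trace([0, 1, 0, 3, 4, 5])):
    print(f"C{cycle}: {state}")
```

Under this model the instruction of thread 4, issued in C4, is in stage 1 during C5, and the first instruction of thread 0 reaches stage 5, the last stage of the 6-stage pipeline, in the same cycle.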
The analysis above shows that with software-controlled dynamic multithreading, the processor only needs to track T(Y_i, j) of every instruction to schedule its hardware resources effectively, and the thread configuration can be adjusted flexibly to suit the system.
Fig. 7 is the logic block diagram of a Harvard-architecture multithreaded processor whose threads are set by software and controlled dynamically. The instruction structure of the processor in the figure is a single-mark, three-instruction-word issue structure. As the figure shows, apart from the few mark bits added to the instruction word, the processor is almost identical to a typical processor structure. The mark information must be delivered to all arithmetic units. The instruction control unit uses the thread and priority information in the mark to control instruction fetch and to track the execution state of the threads, and the arithmetic operation units use the mark information to ensure that the results of the instruction are not mixed up.
Those skilled in the art will understand that, unless otherwise defined, all terms used herein (including technical and scientific terms) have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the prior art and, unless defined as here, are not to be interpreted in an idealized or overly formal sense.
The embodiments described above explain the purpose, technical solution and beneficial effects of the present invention in further detail. It should be understood that the foregoing is merely a specific embodiment of the present invention and does not limit the present invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. A controlled dynamic multi-threading method, characterized in that, for a processor that uses a pipeline structure and has an I-cache, a mark is added to its instruction structure, the mark comprising two pieces of information: the thread to which the marked instruction belongs, and the priority information of the marked instruction, the priority information indicating the execution order of the instruction and its dependency on the preceding and following instructions; according to the mark, the processor controls the corresponding instruction, issuing and executing it according to its priority information and the thread to which it belongs, with the following specific steps:
Step 1) Read an instruction according to the priority information in the marks of the instructions waiting to be executed;
Step 1.1) The instruction fetch circuit of the processor checks whether the I-cache holds instructions waiting to be executed, i.e. whether any instruction is in the Valid state;
Step 1.1.1) If only one instruction is in the Valid state, read that instruction;
Step 1.1.2) If two or more instructions are in the Valid state, check from their marks which instruction has the higher priority;
Step 1.1.2.1) If one instruction has a higher priority than the others, read that instruction;
Step 1.1.2.2) If no instruction has a higher priority than the others, determine whether there is an instruction thread that was executed in the previous step;
Step 1.1.2.2.1) If there is an instruction thread that was executed in the previous step, read an instruction from a thread different from that thread, or read an instruction according to the order of the instruction threads;
Step 1.1.2.2.2) If there is no instruction thread that was executed in the previous step, read an instruction according to the order of the instruction threads;
Step 2) Instruction decoding and dispatch:
The decoding circuit of the processor decodes the instruction read in step 1) into the mark and the individual sub-instructions, and the dispatch logic of the processor assigns the sub-instructions to different arithmetic units for execution according to the function of each sub-instruction;
Step 3) Instruction execution:
For each sub-instruction, the processor reads the data in the registers of the thread to which it belongs, as identified by the thread information in the instruction mark, and stores the execution result in the registers of that same thread;
Step 4) Jump to step 1).
2. The controlled dynamic multi-threading method according to claim 1, characterized in that the mark is written by software or written automatically by the compiler during compilation.
3. The controlled dynamic multi-threading method according to claim 1, characterized in that the processor is a multiple-instruction-issue processor in which every instruction independently carries its own mark.
4. The controlled dynamic multi-threading method according to claim 1, characterized in that the processor is a multiple-instruction-issue processor in which multiple instructions share one group of marks.
5. The controlled dynamic multi-threading method according to claim 1, characterized in that the processor is a single-instruction-issue processor in which each instruction corresponds to one mark.
6. A processor for performing the controlled dynamic multi-threading method of claim 1, characterized in that it comprises at least: an instruction system with marks; a program execution control unit capable of identifying and tracking marks; an instruction decoding circuit capable of identifying and decoding marks; an arithmetic operation unit capable of identifying and decoding marks; and corresponding memory units.
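The instruction selection of steps 1.1 through 1.1.2.2.2 in claim 1 can be sketched in Python as follows. This is one possible reading under stated assumptions, not the claimed circuit: `Entry` and `select_next` are hypothetical names, a larger `priority` value is assumed to mean higher priority, "order of the instruction threads" is assumed to mean ascending thread ID, and when several instructions tie for the highest priority the tie-break of step 1.1.2.2 is applied within that tied group.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Entry:
    """Hypothetical I-cache entry: a marked instruction plus its Valid flag."""
    thread_id: int
    priority: int  # assumed convention: larger value = higher priority
    valid: bool


def select_next(entries: Sequence[Entry],
                last_thread: Optional[int] = None) -> Optional[Entry]:
    """Pick the next instruction to read, or None if nothing is Valid."""
    valid = [e for e in entries if e.valid]            # step 1.1
    if not valid:
        return None
    if len(valid) == 1:                                # step 1.1.1
        return valid[0]
    top = max(e.priority for e in valid)               # step 1.1.2
    best = [e for e in valid if e.priority == top]
    if len(best) == 1:                                 # step 1.1.2.1
        return best[0]
    ordered = sorted(best, key=lambda e: e.thread_id)  # assumed thread order
    if last_thread is not None:                        # step 1.1.2.2.1
        for e in ordered:
            if e.thread_id != last_thread:
                return e
    return ordered[0]                                  # step 1.1.2.2.2
```

Passing the thread ID of the previously read instruction as `last_thread` makes equal-priority threads alternate, which matches the first option in step 1.1.2.2.1; the fallback to plain thread order covers the second option and step 1.1.2.2.2.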
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610272367.8A CN105824605B (en) | 2016-04-28 | 2016-04-28 | A kind of controlled dynamic multi-threading and processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610272367.8A CN105824605B (en) | 2016-04-28 | 2016-04-28 | A kind of controlled dynamic multi-threading and processor |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105824605A CN105824605A (en) | 2016-08-03 |
CN105824605B true CN105824605B (en) | 2018-04-13 |
Family
ID=56528841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610272367.8A Active CN105824605B (en) | 2016-04-28 | 2016-04-28 | A kind of controlled dynamic multi-threading and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105824605B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5511182A (en) * | 1994-08-31 | 1996-04-23 | Motorola, Inc. | Programmable pin configuration logic circuit for providing a chip select signal and related method |
US7447887B2 (en) * | 2005-10-14 | 2008-11-04 | Hitachi, Ltd. | Multithread processor |
US7518993B1 (en) * | 1999-11-19 | 2009-04-14 | The United States Of America As Represented By The Secretary Of The Navy | Prioritizing resource utilization in multi-thread computing system |
CN101763285A (en) * | 2010-01-15 | 2010-06-30 | 西安电子科技大学 | Zero-overhead switching multithread processor and thread switching method thereof |
CN101763251A (en) * | 2010-01-05 | 2010-06-30 | 浙江大学 | Instruction decode buffer device of multithreading microprocessor |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5511182A (en) * | 1994-08-31 | 1996-04-23 | Motorola, Inc. | Programmable pin configuration logic circuit for providing a chip select signal and related method |
US7518993B1 (en) * | 1999-11-19 | 2009-04-14 | The United States Of America As Represented By The Secretary Of The Navy | Prioritizing resource utilization in multi-thread computing system |
US7447887B2 (en) * | 2005-10-14 | 2008-11-04 | Hitachi, Ltd. | Multithread processor |
CN101763251A (en) * | 2010-01-05 | 2010-06-30 | 浙江大学 | Instruction decode buffer device of multithreading microprocessor |
CN101763285A (en) * | 2010-01-15 | 2010-06-30 | 西安电子科技大学 | Zero-overhead switching multithread processor and thread switching method thereof |
Non-Patent Citations (2)
Title |
---|
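Speculation-aware thread scheduling for simultaneous multithreading; Kang et al.; Electronics Letters; 2004-12-31; Vol. 40 (No. 5); pp. 790-795 * |
Fetch policy for simultaneous multithreading processors based on multiple fetch priorities (基于多个取指优先级的同时多线程处理器取指策略); Sun Caixia et al.; Acta Electronica Sinica (电子学报); 2006-05-31 (No. 5); pp. 296-298 * |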
Also Published As
Publication number | Publication date |
---|---|
CN105824605A (en) | 2016-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106104481B (en) | System and method for performing deterministic and opportunistic multithreading | |
CN108027807B (en) | Block-based processor core topology register | |
CN108027769B (en) | Initiating instruction block execution using register access instructions | |
US20230106990A1 (en) | Executing multiple programs simultaneously on a processor core | |
CN108027772B (en) | Different system registers for a logical processor | |
EP3350686B1 (en) | Debug support for block-based processor | |
KR102335194B1 (en) | Opportunity multithreading in a multithreaded processor with instruction chaining capability | |
US5710902A (en) | Instruction dependency chain indentifier | |
US10095519B2 (en) | Instruction block address register | |
US20170371660A1 (en) | Load-store queue for multiple processor cores | |
KR101594502B1 (en) | Systems and methods for move elimination with bypass multiple instantiation table | |
CN113703834A (en) | Block-based processor core composition register | |
US20160378491A1 (en) | Determination of target location for transfer of processor control | |
KR20180021812A (en) | Block-based architecture that executes contiguous blocks in parallel | |
US20180032344A1 (en) | Out-of-order block-based processor | |
EP2782004B1 (en) | Opportunistic multi-thread method and processor | |
WO2017223004A1 (en) | Load-store queue for block-based processor | |
CN105824605B (en) | A kind of controlled dynamic multi-threading and processor | |
CN205721743U (en) | A kind of processor of controlled dynamic multi-threading |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract | Application publication date: 20160803; Assignee: Suzhou Hongxin integrated circuit Co.,Ltd.; Assignor: Wang Shenghong; Contract record no.: X2023990000728; Denomination of invention: A controllable dynamic multithreading method and processor; Granted publication date: 20180413; License type: Exclusive License; Record date: 20230726 |