CN101116057A - A mechanism for instruction set based thread execution on a plurality of instruction sequencers

Info

Publication number: CN101116057A
Authority: CN (China)
Prior art keywords: instruction, sequencer, user, level, execution
Legal status: Granted
Application number: CNA2005800448962A
Other languages: Chinese (zh)
Other versions: CN101116057B (en)
Inventors: H. Wang, J. Shen, E. Grochowski, J. P. Held, B. Bigbee, S. D. Kaushik, G. Chinya, X. Zou, P. Hammarlund, X. Tian, A. Aggarwal, S. D. Rodgers, B. V. Patel, R. Hankins
Current assignee: Intel Corp
Original assignee: Intel Corp
Priority claimed from US11/173,326 (US8719819B2)
Application filed by Intel Corp
Publication of CN101116057A
Application granted
Publication of CN101116057B
Current status: Expired - Fee Related

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3836: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3851: Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Abstract

In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under the control of an application-level program. A first user-level thread runs on the second instruction sequencer and contains one or more user-level instructions. A first user-level instruction at least 1) has a field that refers to one or more instruction sequencers, or 2) implicitly refers, through a pointer, to code that specifically addresses one or more instruction sequencers when the code is executed.

Description

Mechanism for instruction set based thread execution on a plurality of instruction sequencers
Technical field
Embodiments of the invention relate to methods and apparatus for processing instructions.
Background
Both hardware and software techniques have been used to improve the performance of information processing systems, such as those that include microprocessors. On the hardware side, microprocessor design approaches for improving performance have included increased clock speeds, pipelining, branch prediction, superscalar execution, out-of-order execution, and caches. Many of these approaches have increased the transistor count, in some cases at a rate greater than the resulting gain in performance.
Rather than seeking performance gains strictly through additional transistors, other approaches rely on software techniques. One software approach to improving processor performance is known as "multithreading". In software multithreading, an instruction stream may be divided into multiple instruction streams that can be executed in parallel. Alternatively, multiple independent software threads may be executed in parallel.
In one approach, known as time-slice multithreading or time-multiplex ("TMUX") multithreading, a single processor switches between threads after a fixed period of time. In another approach, a single processor switches between threads upon occurrence of a trigger event, such as a long-latency cache miss. In this latter approach, known as switch-on-event multithreading ("SoEMT"), at most one thread is active at any given time.
Multithreading is increasingly supported in hardware. For example, in one approach, the processors in a multiprocessor system, such as a chip multiprocessor ("CMP") system (multiple processors on a single chip package) or a symmetric multiprocessor ("SMP") system (multiple processors on multiple chips), may each work on one of several software threads concurrently. In another approach, known as simultaneous multithreading ("SMT"), a single physical processor core is made to appear as multiple logical processors to the operating system and user programs. With SMT, multiple software threads can be active and execute simultaneously on a single processor core. That is, each logical processor maintains a complete set of architecture state, but many other resources of the physical processor, such as caches, execution units, branch predictors, control logic, and buses, are shared. With SMT, therefore, instructions from multiple software threads execute concurrently on the logical processors.
For systems that support concurrent execution of software threads, such as SMT, SMP, and/or CMP systems, the operating system may control the scheduling and execution of the software threads.
Alternatively, some applications may directly manage and schedule multiple threads for execution in a processing system. Such application-scheduled threads are generally invisible to the operating system (OS) and are known as user-level threads.
Typically, user-level threads can be scheduled for execution only by an application running on processing resources managed by the OS. Accordingly, in a typical processing system with multiple processors, there is no mechanism for scheduling user-level threads to run on processors that are not directly managed by the OS.
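To make the notion of a user-level thread concrete, here is a minimal sketch that is not taken from the patent. It models each user-level thread as a Python generator and uses a round-robin scheduler that belongs to the application. The class and function names are hypothetical. The sketch also shows the limitation described above: the OS sees only the single thread that calls `run()`, so every user-level thread still executes on that one OS-managed resource.

```python
from collections import deque
from typing import Callable, Deque, Generator, List

# A user-level thread modeled as a generator: each `yield` returns control
# to the application's own scheduler (the OS is never involved).
UserThread = Generator[None, None, None]


class UserLevelScheduler:
    """Hypothetical round-robin scheduler living inside the application."""

    def __init__(self) -> None:
        self.ready: Deque[UserThread] = deque()

    def spawn(self, body: Callable[[], UserThread]) -> None:
        self.ready.append(body())

    def run(self) -> None:
        while self.ready:
            thread = self.ready.popleft()
            try:
                next(thread)              # run until the thread yields
            except StopIteration:
                continue                  # thread finished; drop it
            self.ready.append(thread)     # not finished; requeue it


trace: List[str] = []


def worker(name: str, steps: int) -> Callable[[], UserThread]:
    def body() -> UserThread:
        for i in range(steps):
            trace.append(f"{name}{i}")
            yield
    return body


sched = UserLevelScheduler()
sched.spawn(worker("a", 2))
sched.spawn(worker("b", 3))
sched.run()
print(trace)  # ['a0', 'b0', 'a1', 'b1', 'b2']
```

Generators suit this illustration because the switch points are explicit. A real user-level thread library would save and restore register contexts instead.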
Brief description of the drawings
Figures 1A and 1B are high-level block diagrams of a multi-sequencer system according to an embodiment of the invention;
Figure 1C is a block diagram illustrating selected features of embodiments of multi-sequencer systems that support control of threads by user-level instructions;
Figure 2 is a logical diagram of the multi-sequencer hardware that forms part of the multi-sequencer systems shown in Figures 1A-1C;
Figure 3A is a view of the instruction set architecture of the systems shown in Figures 1A-1C;
Figure 3B is a logical diagram of an embodiment of a processor having two or more instruction sequencers whose instruction set includes a user-level control transfer instruction and a user-level monitor instruction;
Figures 4A and 4B illustrate the formats of the SXFR and SEMONITOR instructions, respectively, according to an embodiment of the invention;
Figure 5 illustrates how the SXFR instruction may be used to implement control transfer between sequencers, according to an embodiment of the invention;
Figures 6A-6B are tables that may be used to program a service channel, according to an embodiment of the invention;
Figure 7 is a functional block diagram of the components of the thread management logic of the systems shown in Figures 1A-1C, according to an embodiment of the invention;
Figure 8 illustrates the operation of a proxy execution mechanism according to an embodiment of the invention;
Figures 9 and 10 show examples of logical processors according to an embodiment of the invention;
Figure 11 illustrates how the SXFR and SEMONITOR instructions are used to support proxy execution after the OS handles a page fault, according to an embodiment of the invention;
Figure 12 illustrates a processing system according to an embodiment of the invention; and
Figure 13 is a block diagram of an illustrative computer system that may use an embodiment of a processor component, such as a central processing unit (CPU) or chipset, that includes one or more instruction sequencers configured to execute one or more user-level threads containing sequencer-aware user-level instructions.
Detailed description
In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the invention. It will be apparent to one skilled in the art, however, that the invention may be practiced without these specific details. In other instances, structures and devices are shown in block diagram form to avoid obscuring the invention.
Reference in this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described that may be exhibited by some embodiments and not by others. Similarly, various requirements are described that may be requirements for some embodiments but not for others.
The following description sets out embodiments of an architectural mechanism for creating and controlling threads that execute on sequencers of a multi-sequencer system that are not under OS control.
As used herein, the term "instruction sequencer", or simply "sequencer", includes at least next-instruction-pointer logic and some processor state. For example, an instruction sequencer may comprise a logical processor or a physical processor core.
In one embodiment, the architectural mechanism may comprise only two instructions. Together they define a signaling mechanism for sending and receiving signals between any two sequencers without using OS application programming interfaces. A signal may comprise an architecturally defined event or scenario that is mapped to handler code. When a sequencer receives the signal, the scenario in the signal acts as a trigger that causes the sequencer to vector to the handler code. With these two instructions, the thread creation, thread control, and thread synchronization software primitives provided by existing thread libraries can be implemented.
In addition, the two instructions may be used to create a proxy execution mechanism that allows a servant sequencer to execute code on behalf of a client sequencer, as explained in more detail below.
Accordingly, example processor systems are described that include two or more instruction sequencers that execute different threads. At least some of the two or more instruction sequencers include sequencer-aware user-level instructions in their instruction sets. These instructions allow control between sequencers, carried out by thread management operations on a specified instruction sequencer without operating system intervention. The sequencer-aware user-level instructions may include an instruction sequencer control transfer instruction, an instruction sequencer monitor instruction, a context save instruction, and a context restore instruction. The processor system may also include thread management logic that responds to a user-level instruction by allowing a non-hidden instruction sequencer to create parallel threads that execute on associated hidden instruction sequencers without an operating system scheduler. Further, the processor system may have a proxy execution mechanism. This mechanism allows a client instruction sequencer, in response to certain trigger conditions encountered while executing instructions, to trigger a proxy thread on a servant instruction sequencer that executes on the client's behalf, without operating system intervention.
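The two-instruction signaling mechanism described above can be approximated in software to show how a thread-creation primitive might be layered on it. This is a minimal sketch under loud assumptions. `semonitor` and `sxfr` are hypothetical Python stand-ins named after the SXFR/SEMONITOR examples in this document, not their real encodings. The scenario names are invented. Signal delivery is simulated by an explicit `step()` call rather than by hardware.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical scenario names; the text says only that scenarios are
# architecturally defined events mapped to handler code.
START_THREAD = "START_THREAD"
THREAD_DONE = "THREAD_DONE"

Handler = Callable[["Sequencer", object], None]


@dataclass
class Sequencer:
    sid: int
    monitors: Dict[str, Handler] = field(default_factory=dict)
    pending: List[Tuple[str, object]] = field(default_factory=list)
    log: List[str] = field(default_factory=list)

    def step(self) -> None:
        """Deliver pending signals: each scenario vectors to its handler."""
        while self.pending:
            scenario, payload = self.pending.pop(0)
            handler = self.monitors.get(scenario)
            if handler is not None:          # unmonitored scenarios are ignored
                handler(self, payload)


def semonitor(target: Sequencer, scenario: str, handler: Handler) -> None:
    """Monitor-style operation: map a scenario to handler code on `target`."""
    target.monitors[scenario] = handler


def sxfr(target: Sequencer, scenario: str, payload: object = None) -> None:
    """Control-transfer-style operation: signal a scenario to `target`."""
    target.pending.append((scenario, payload))


def create_thread(creator: Sequencer, worker: Sequencer,
                  body: Callable[[], object]) -> None:
    """Hypothetical thread-creation primitive built only from the two ops."""
    def run_body(seq: Sequencer, fn: object) -> None:
        seq.log.append(f"SID{seq.sid} ran {fn()}")
        sxfr(creator, THREAD_DONE, seq.sid)  # signal completion (join)

    semonitor(worker, START_THREAD, run_body)
    sxfr(worker, START_THREAD, body)


main, helper = Sequencer(0), Sequencer(1)
semonitor(main, THREAD_DONE,
          lambda seq, sid: seq.log.append(f"join: SID{sid} done"))
create_thread(main, helper, lambda: 6 * 7)
helper.step()
main.step()
print(helper.log, main.log)  # ['SID1 ran 42'] ['join: SID1 done']
```

The completion signal sent back to the creator illustrates how a synchronization primitive (here a simple join) could reuse the same two operations.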
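A rough software analogy of proxy execution follows. It is a minimal sketch, not the patent's hardware mechanism. An OS-hidden client sequencer hits a trigger condition it cannot service itself, and an OS-visible servant performs the work on its behalf before the client retries. The assumptions are stated here and in the comments. A page fault is used as the trigger (Figure 11 mentions page faults). A Python exception stands in for the hardware trigger, and a direct method call stands in for the inter-sequencer signal. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set


class NeedsOS(Exception):
    """Hypothetical trigger condition (here, a page fault) that an
    OS-hidden sequencer cannot service by itself."""

    def __init__(self, page: int) -> None:
        super().__init__(f"page fault on page {page}")
        self.page = page


@dataclass
class Servant:
    """OS-visible sequencer that performs work on behalf of clients."""
    resident: Set[int] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def proxy_execute(self, client_sid: int, page: int) -> None:
        # Stand-in for the OS page-fault service the servant can invoke.
        self.resident.add(page)
        self.log.append(f"servant mapped page {page} for SID{client_sid}")


@dataclass
class Client:
    """OS-hidden sequencer whose faults are escalated to a servant."""
    sid: int
    servant: Servant

    def touch(self, page: int) -> None:
        if page not in self.servant.resident:
            raise NeedsOS(page)

    def run(self, pages: List[int]) -> int:
        faults = 0
        for page in pages:
            while True:
                try:
                    self.touch(page)
                    break                    # access succeeded
                except NeedsOS as trig:      # trigger: hand off to the proxy
                    faults += 1
                    self.servant.proxy_execute(self.sid, trig.page)
                    # loop again to retry the faulting access
        return faults


servant = Servant(resident={0})
client = Client(sid=2, servant=servant)
print(client.run([0, 1, 1, 3]))  # 2
print(servant.log)
```

The client never calls the OS directly. Only the servant does, which mirrors the division of labor between OS-hidden and OS-visible sequencers.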
Referring now to Figure 1A of the drawings, reference numeral 100A indicates a multi-sequencer system according to an embodiment of the invention. The multi-sequencer system 100A includes a memory 102 and multi-sequencer hardware 104. The memory 102 contains a user-level program 106, which includes a scheduler 108 that schedules instructions for execution on the multi-sequencer hardware 104. To express multithreaded execution, the user-level program 106 uses a thread API 110 to a thread library that provides thread creation, control, and synchronization primitives to the user-level program 106. An operating system 112 also resides in the memory 102. The multi-sequencer hardware 104 includes a plurality of sequencers, only four of which are shown in Figure 1A. The four sequencers shown are designated SID0, SID1, SID2, and SID3, respectively.
As used herein, a "sequencer" may be a distinct thread execution resource and may be any physical or logical unit capable of executing a thread. An instruction sequencer may include next-instruction-pointer logic that determines the next instruction to be executed for a given thread. A sequencer may be a logical thread unit or a physical thread unit. In one embodiment, multiple instruction sequencers may reside within the same processor core. In another embodiment, each instruction sequencer may reside in a different processor core.
An instruction set architecture is implemented in a given processor core. The instruction set architecture (ISA) may be an abstract model of the processor core, consisting of state elements (registers) and instructions that operate on those state elements. By providing an abstract specification of the processor core's behavior to both the programmer and the microprocessor designer, the ISA serves as the boundary between software and hardware. The instruction set may define the set of instructions that the processor core is capable of decoding and executing.
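The definition of an ISA as state elements plus the instructions that operate on them can be made concrete with a toy interpreter. This is a minimal sketch, and the ISA in it is invented purely for illustration. Its architectural state is a register dictionary plus a program counter, and its "instruction set" is the table of opcodes the model can decode. Any other opcode is rejected as undecodable.

```python
from typing import Callable, Dict, List, Tuple

# Toy ISA, invented for illustration. The architectural state is a register
# file plus a program counter; each instruction is a function on that state.
State = Dict[str, int]
Instr = Tuple[str, ...]


def _li(s: State, d: str, imm: str) -> None:
    s[d] = int(imm)                 # load immediate into register d


def _add(s: State, d: str, a: str, b: str) -> None:
    s[d] = s[a] + s[b]              # d = a + b


# The instruction set: the opcodes this "core" can decode and execute.
ISA: Dict[str, Callable[..., None]] = {"li": _li, "add": _add}


def execute(program: List[Instr], state: State) -> State:
    state.setdefault("pc", 0)
    while state["pc"] < len(program):
        op, *operands = program[state["pc"]]
        if op not in ISA:
            raise ValueError(f"undecodable opcode: {op}")
        ISA[op](state, *operands)
        state["pc"] += 1
    return state


regs = execute([("li", "r1", "2"), ("li", "r2", "3"),
                ("add", "r0", "r1", "r2")], {})
print(regs["r0"])  # 5
```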
Although the chip multiprocessing (CMP) embodiments of the multi-sequencer hardware 104 discussed herein refer only to a single thread per sequencer SID0-SID3, it should not be inferred that this disclosure is limited to single-threaded processors. The techniques discussed herein may be employed in any chip multiprocessing (CMP) or simultaneous multithreading (SMT) system. This includes hybrid systems with both CMP and SMT processors, in which each core of a CMP processor is an SMT processor or a switch-on-event multithreading (SoEMT) processor. For example, the techniques disclosed herein may be used in a system that includes multiple multithreaded processor cores in a single chip package 104.
The sequencers SID0-SID3 need not be uniform. They may be asymmetric with respect to any factor that affects computation quality, such as processing speed, processing capability, and power consumption. For example, sequencer SID0 may be "heavyweight", designed to process all instructions of a given instruction set architecture (e.g., the IA32 instruction set architecture), whereas sequencer SID1 may be "lightweight" and able to process only a selected subset of those instructions. In another embodiment, a heavyweight processor may process instructions faster than a lightweight processor. Sequencer SID0 is visible to the operating system (OS), whereas sequencers SID1 to SID3 are OS-hidden. This does not mean, however, that every heavyweight sequencer is OS-visible or that every lightweight sequencer is OS-hidden. As used herein, the term "OS-hidden" denotes a sequencer that has been transitioned into a hidden state or condition. One characteristic of such a hidden state or condition is that the OS does not schedule instructions on a sequencer in that state.
As will be seen, the multi-sequencer hardware or firmware (e.g., microcode) also includes thread management logic 114. In one embodiment, the thread management logic 114 virtualizes the sequencers SID0-SID3 so that they appear uniform to the user-level program 106. In other words, the thread management logic 114 masks the asymmetry of the sequencers SID0-SID3 so that, from a logical point of view, they appear uniform to an assembly language programmer, as shown in view 200 of Figure 2 of the drawings.
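The asymmetry described above can be captured as data. The following is a minimal sketch with invented capability sets and instruction names. It records, for each sequencer, which instructions it can process and whether it is OS-visible. It then shows that an OS scheduler restricted to visible sequencers sees only SID0, even though SID1-SID3 can still run code that uses their supported subset.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List

# Hypothetical instruction names standing in for a full ISA and a subset.
FULL_ISA = frozenset({"add", "load", "store", "fdiv", "syscall"})
LIGHT_ISA = frozenset({"add", "load", "store"})


@dataclass(frozen=True)
class SequencerDesc:
    sid: int
    supported: FrozenSet[str]   # instructions this sequencer can process
    os_visible: bool            # False means it is in the OS-hidden state

    def can_run(self, ops: Iterable[str]) -> bool:
        return set(ops) <= self.supported


sequencers = [
    SequencerDesc(0, FULL_ISA, os_visible=True),                 # heavyweight
    *(SequencerDesc(i, LIGHT_ISA, os_visible=False) for i in (1, 2, 3)),
]


def os_schedulable(seqs: List[SequencerDesc]) -> List[int]:
    """The OS never schedules instructions on an OS-hidden sequencer."""
    return [s.sid for s in seqs if s.os_visible]


print(os_schedulable(sequencers))                                  # [0]
print([s.sid for s in sequencers if s.can_run(["add", "load"])])   # [0, 1, 2, 3]
print([s.sid for s in sequencers if s.can_run(["fdiv"])])          # [0]
```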
In the system 100A shown in Figure 1A, the user-level program 106 is tightly coupled to the multi-sequencer hardware 104. In one embodiment, the user-level program 106 may instead be loosely coupled to the multi-sequencer hardware 104 through intermediate drivers. Such a system is indicated by reference numeral 100B in Figure 1B of the drawings. System 100B is substantially the same as system 100A, except that, instead of the scheduler 108, the user-level program uses kernel-level software (such as a device driver 116, e.g., a driver or a hardware abstraction layer) to communicate with a kernel-level API 118 in order to schedule instructions for execution on the multi-sequencer hardware 104.
Figure 1C is a block diagram illustrating selected features of embodiments 109, 115, 150, 170 of multi-sequencer systems that support thread control by user-level instructions. Figure 1C illustrates selected features of an SMT multi-sequencer multithreading system 109, in which each sequencer is a logical processor that may execute a thread concurrently with the execution of other threads on other logical processors. Figure 1C also illustrates at least one embodiment of a multi-sequencer system 115 that supports multiple logical sequencers through a switch-on-event (SoEMT) mechanism, such as a time-multiplexing switching mechanism, so that each logical processor runs its thread in turn. In such a system 115, only one thread executes at a time.
Figure 1C also illustrates selected features of multi-core multithreading systems 150, 170. The physical cores of a multi-core multithreading system may be single-sequencer cores (see, e.g., system 150) or multi-sequencer cores (see, e.g., system 170). These multi-core multithreading embodiments are discussed later; the single-core multi-sequencer systems 109, 115 are discussed first.
In the SMT system 109, a single physical processor 103 is made to appear as multiple thread contexts, referred to herein as TC_1 to TC_n (not shown). Each of the n thread contexts is effectively a sequencer. At least some of these thread contexts (e.g., m of the n) are made visible to the operating system and/or user programs. These thread contexts are sometimes referred to as logical processors (not shown) and are referred to herein as LP_1 to LP_m. Each thread context TC_1 to TC_n maintains a respective set of architecture state AS_1 to AS_n. For at least one embodiment, the architecture state includes data registers, segment registers, control registers, debug registers, and most model-specific registers. The thread contexts TC_1 to TC_n share most other resources of the physical processor 103, such as caches, execution units, branch predictors, control logic, and buses.
Although such features may be shared, each thread context in the multithreading system 109 can independently generate the next instruction address (and perform, for instance, a fetch from an instruction cache, an execution instruction cache, or a trace cache). Thus, the processor 103 includes logically independent next-instruction-pointer and fetch logic 120 to fetch instructions for each thread context, even though the multiple logical sequencers may be implemented in a single physical fetch/decode unit 122. For an SMT embodiment, the term "sequencer" encompasses at least the next-instruction-pointer and fetch logic 120 for a thread context, along with at least some of the associated architecture state (AS) for that thread context. Note that the sequencers of an SMT system 109 need not be symmetric. For example, two SMT sequencers of the same physical processor may differ in the amount of architecture state information that each maintains.
Thus, for at least one embodiment, the multi-sequencer system 109 is a single-core processor 103 that supports simultaneous multithreading. For such an embodiment, each sequencer is a logical processor with its own next-instruction-pointer and fetch logic and its own architecture state information, although the same physical processor core 103 executes all thread instructions. Each logical processor maintains its own version of the architecture state, although the execution resources of the single processor core 103 may be shared among concurrently executing threads.
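The split between per-context and shared resources in an SMT core can be sketched as a data structure. This minimal model is not the patent's design. Its assumptions are a tiny register file as the architecture state, a dictionary standing in for the shared cache, and a fake memory value. Each of the n thread contexts keeps its own registers and next instruction pointer, only the first m contexts are exposed as logical processors, and all contexts share one cache.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ArchState:
    """Per-context architecture state (a tiny subset, for illustration)."""
    regs: Dict[str, int] = field(default_factory=dict)
    next_ip: int = 0            # each context has its own next instruction pointer


@dataclass
class SMTCore:
    n_contexts: int             # TC_1..TC_n
    n_visible: int              # m of the n contexts exposed as LP_1..LP_m
    # Shared by all contexts: modeled here as a single cache dictionary.
    shared_cache: Dict[int, int] = field(default_factory=dict)
    contexts: List[ArchState] = field(default_factory=list, init=False)

    def __post_init__(self) -> None:
        if not 0 <= self.n_visible <= self.n_contexts:
            raise ValueError("m must satisfy 0 <= m <= n")
        self.contexts = [ArchState() for _ in range(self.n_contexts)]

    def logical_processors(self) -> List[int]:
        """Indices of the contexts exposed to the OS as logical processors."""
        return list(range(self.n_visible))

    def load(self, tc: int, addr: int, reg: str) -> None:
        value = self.shared_cache.setdefault(addr, addr * 10)  # fake memory
        self.contexts[tc].regs[reg] = value   # private register file
        self.contexts[tc].next_ip += 1        # private instruction pointer


core = SMTCore(n_contexts=4, n_visible=2)
core.load(0, 7, "eax")
core.load(1, 7, "eax")
print(core.logical_processors(), [c.next_ip for c in core.contexts],
      len(core.shared_cache))
# [0, 1] [1, 1, 0, 0] 1
```

Both contexts loading address 7 leave a single shared cache entry, while each context's instruction pointer advances independently.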
Figure 1C also illustrates an alternative embodiment 115 of a multi-sequencer system capable of executing multithreaded code. This embodiment 115 is labeled as a switch-on-event multithreading ("SoEMT") embodiment. In embodiment 115, each sequencer resembles the sequencers of the previous embodiment 109 in that it is a logical processor with its own architecture state information and its own next instruction pointer. However, system 115 differs from embodiment 109 in that each sequencer shares, with the other sequencers, the same physical fetch logic 120 in a single fetch/decode unit 122 of the physical processor core 103. The fetch logic 120 may be switched to fetch for the different sequencers of system 115 according to a variety of switch-on-event policies. A switch-on-event trigger may be the passage of a specific amount of time or number of machine cycles, as in time multiplexing (TMUX). For other embodiments, the SoEMT trigger may be another event, such as a cache miss, a page fault, or a long-latency instruction.
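The switching policies just described can be illustrated by computing the order in which a single shared fetch unit services the sequencers. This minimal sketch combines both triggers: it switches after a fixed quantum of instructions (TMUX-style) or immediately after an instruction tagged `"miss"`, which stands in for a long-latency event. The instruction tags and the round-robin choice of next sequencer are assumptions for illustration.

```python
from typing import Dict, List

# Each sequencer's instruction stream; "miss" marks a long-latency event
# (e.g., a cache miss) that triggers an immediate switch.
Streams = Dict[str, List[str]]


def soemt_fetch_order(streams: Streams, quantum: int) -> List[str]:
    """Order in which one shared fetch unit services the sequencers.

    Switches after `quantum` instructions (TMUX-style) or right after an
    instruction tagged "miss" (event-style). Only one sequencer is active
    at any time.
    """
    if quantum < 1:
        raise ValueError("quantum must be at least 1")
    order = list(streams)
    pos = {sid: 0 for sid in order}
    out: List[str] = []
    active = 0
    while any(pos[s] < len(streams[s]) for s in order):
        sid = order[active]
        issued = 0
        while pos[sid] < len(streams[sid]) and issued < quantum:
            instr = streams[sid][pos[sid]]
            pos[sid] += 1
            issued += 1
            out.append(f"{sid}:{instr}")
            if instr == "miss":
                break                          # switch on event
        active = (active + 1) % len(order)     # round-robin to next sequencer
    return out


print(soemt_fetch_order({"A": ["i", "miss", "i", "i"], "B": ["i", "i", "i"]},
                        quantum=2))
# ['A:i', 'A:miss', 'B:i', 'B:i', 'A:i', 'A:i', 'B:i']
```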
Figure 1C also illustrates at least two embodiments 150, 170 of multi-core multithreading systems. For at least some embodiments of the multi-core systems 150, 170 shown in Figure 1C, the system may use a processor 103 as a building block. Each sequencer may be a processor core 103, with the multiple cores 103_1-103_n and 103_1-103_m residing in a single chip package 160, 180, respectively. For the system 150 shown in Figure 1C, each core 103_i (i = 0 to n) may be a single-threaded sequencer. For the system 170 shown in Figure 1C, each core 103_j (j = 1 to m) may be a multi-sequencer processor core.
The chip packages 160, 180 are drawn with dashed lines in Figure 1C only to indicate that the illustrated single-chip embodiments of the multi-core systems 150, 170 are merely illustrative. For other embodiments, the processor cores of a multi-core system may reside on separate chips, or may be organized as an SoEMT multi-sequencer system.
The first multi-core multithreading system 150 shown in Figure 1C may include two or more separate physical processors 103_1-103_n. Each can execute a different thread, so that at least portions of the different threads may execute simultaneously. Each processor 103_1 to 103_n includes a physically independent fetch unit 122 to fetch instruction information for its respective thread. In an embodiment in which each processor 103_1-103_n executes a single thread, the fetch/decode unit 122 implements a single next-instruction-pointer and fetch logic 120.
Figure 1C also illustrates a multi-core multithreading system 170 that includes multiple SMT systems 109. For this embodiment 170, each processor 103_1-103_m supports multiple thread contexts. For example, each processor 103_1-103_m may be an SMT processor that supports k sequencers, so that the system 170 effectively implements m*k sequencers. In addition, the fetch/decode unit 122 of system 170 implements distinct next-instruction-pointer and fetch logic 120 for each supported thread context.
For ease of illustration, the following discussion focuses on embodiments of the multi-core system 150. This focus should not be taken as limiting, however, because the mechanisms described below may be implemented in either multi-core or single-core multi-sequencer systems. Also, either single-core or multi-core systems may be implemented with single-sequencer cores or multi-sequencer cores. For each multi-sequencer core, one or more multithreading techniques may be used, including SMT and/or SoEMT. It will be understood that the systems 109, 115, 150, 170 shown in Figure 1C may include additional elements not shown in Figure 1C, such as a memory system and execution units.
Each sequencer 103 of the system embodiments 109, 115, 150, 170 shown in Figure 1C may be associated with a unique identifier (discussed below in connection with Figure 3). Various embodiments of the systems 109, 150 may include a different total number N of sequencers.
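The sequencer counts above (n sequencers for system 150, m*k for system 170) and the requirement that each sequencer have a unique identifier can be illustrated with a simple mapping. The dense numbering scheme below, SID = core * k + context, is an assumption chosen for the example. The text does not specify how identifiers are assigned.

```python
from typing import Dict, Tuple


def assign_sids(cores: int, contexts_per_core: int) -> Dict[int, Tuple[int, int]]:
    """Map each unique sequencer ID (SID) to its (core, thread context).

    contexts_per_core == 1 models system 150 (single-sequencer cores);
    contexts_per_core == k > 1 models system 170 (m SMT cores, m*k SIDs).
    The numbering SID = core * k + context is a hypothetical choice.
    """
    return {core * contexts_per_core + ctx: (core, ctx)
            for core in range(cores) for ctx in range(contexts_per_core)}


sids_150 = assign_sids(cores=4, contexts_per_core=1)
sids_170 = assign_sids(cores=4, contexts_per_core=2)
print(len(sids_150), len(sids_170), sids_170[5])  # 4 8 (2, 1)
```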
The system embodiments 109, 115, 150, 170 shown in Figure 1C may each support signaling between sequencers. As used herein, the term "sequencer arithmetic" refers to signaling between sequencers for services between two sequencers. Architectural support for sequencer arithmetic may include extending an instruction set architecture to provide one or more instructions that allow a user to directly manipulate control and state transfers between sequencers. A user-level instruction is said to be "sequencer aware" if it is a sequencer arithmetic instruction, or any other type of instruction that includes a logical sequencer address as a parameter. Such an address may be encoded as an instruction operand and/or implicitly referenced when the instruction executes. These instructions may include sequencer arithmetic instructions that either provide for signaling another sequencer (referred to herein as a "user-level control transfer instruction") or provide for setting up a client sequencer to monitor for such a signal (referred to herein as a "user-level monitor instruction").
Sequencer-aware instructions may also include other instructions that take a logical sequencer address as a parameter, such as sequencer-aware state save and restore instructions. Upon execution of such a state save instruction, a first sequencer can create a snapshot copy of the architecture state of a second sequencer. A sequencer-aware restore instruction may specify that saved architecture state is to be loaded into a specified sequencer.
Each sequencer-aware instruction may also optionally take more than one logical sequencer address as parameters. For example, a sequencer-aware instruction may take as a parameter an aggregate of multiple logical sequencer addresses. Such an approach may be used to multicast or broadcast signals from one sequencer to multiple other sequencers. To simplify the following discussion, unless otherwise specified, the examples set forth below may refer to the unicast case, in which a first sequencer executes a sequencer-aware instruction that specifies a single other logical sequencer address. This approach is taken for convenience and illustrative purposes only and should not be taken as limiting. One skilled in the art will recognize that embodiments of the mechanisms discussed herein may also be applied to broadcast and multicast sequencer-aware instructions.
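The state save and restore behavior described above can be sketched as two functions over a simulated save area. This is a minimal sketch. `ssave` and `srstor` are hypothetical Python stand-ins named after the SSAVE/SRSTOR examples mentioned elsewhere in this document, and the slot-keyed save area is an assumption. The first sequencer snapshots a second sequencer's architecture state, and the saved state is later loaded into a specified sequencer.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Seq:
    sid: int
    arch_state: Dict[str, int] = field(default_factory=dict)


# Hypothetical save area in memory, keyed by a slot number.
save_area: Dict[int, Dict[str, int]] = {}


def ssave(executing: Seq, target: Seq, slot: int) -> None:
    """Runs on `executing`: snapshot the architecture state of `target`."""
    save_area[slot] = dict(target.arch_state)    # copy, not a live reference


def srstor(executing: Seq, target: Seq, slot: int) -> None:
    """Runs on `executing`: load saved state into the specified sequencer."""
    target.arch_state = dict(save_area[slot])


s0, s1, s2 = Seq(0), Seq(1, {"eip": 0x40, "eax": 7}), Seq(2)
ssave(s0, s1, slot=0)
s1.arch_state["eax"] = 99          # later changes do not affect the snapshot
srstor(s0, s2, slot=0)             # migrate the saved context to SID 2
print(s2.arch_state)               # {'eip': 64, 'eax': 7}
```

The `executing` parameter is unused in the body. It is kept only to show that the operation is issued by one sequencer while naming another as its target.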
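The aggregate-address variant can be illustrated with a single delivery function that accepts a set of logical sequencer addresses. In this minimal sketch, each sequencer has a hypothetical mailbox list, and the scenario names are invented. Unicast passes a one-element set, multicast passes a subset, and broadcast passes every known address.

```python
from typing import Dict, Iterable, List


def sxfr_multicast(mailboxes: Dict[int, List[str]], targets: Iterable[int],
                   scenario: str) -> int:
    """Deliver one scenario signal to every logical sequencer address in
    `targets` (an aggregate parameter). Returns the number of deliveries."""
    delivered = 0
    for sid in sorted(set(targets)):
        if sid not in mailboxes:
            raise KeyError(f"no sequencer with logical address {sid}")
        mailboxes[sid].append(scenario)
        delivered += 1
    return delivered


mailboxes: Dict[int, List[str]] = {sid: [] for sid in range(4)}
sxfr_multicast(mailboxes, {2}, "WAKE")                # unicast
sxfr_multicast(mailboxes, {1, 3}, "SYNC")             # multicast
sxfr_multicast(mailboxes, mailboxes.keys(), "STOP")   # broadcast
print(mailboxes)
```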
Fig. 3 A shows the view of the instruction set architecture of the system shown in Figure 1A-1C.Referring now to Fig. 3 A, it shows the view 300 of the instruction set architecture (ISA) of 100A of system and 100B.The logical view of ISA define system, the view that assembly language programmer, binary translation program, assembly routine etc. are seen.According to its ISA, system 100A and 100B comprise logical storage 302 and instruction set 304.Logical storage 302 is stipulated the visible storage hierarchy, addressing scheme, registers group of the 100A of system, 100B etc., and instruction set 304 is stipulated instruction and order format that 100A of system and 100B support.In one embodiment, instruction set 304 can comprise and is called IA32 instruction set and expansion thereof, and also may be other instruction set.In addition, in one embodiment, instruction set 304 comprises two instructions that are called user class control transfer instruction and user class monitor command.One example of user class control transfer instruction can be the SXFR instruction.One example of user class monitor command can be the SEMONITOR instruction.One exemplary SXFR instruction and SEMONITOR instruction will come into question to help to understand user class control transfer instruction and user class monitor command.
In a broad sense, the SXFR instruction is used for signal is sent to second sequencer from first sequencer, and the SEMONITOR instruction is used for disposing second sequencer to monitor the signal from first sequencer.In addition, these controls are shifted and monitor command is known to the sequencer, as below will discussing, and can form compound instruction known to more sequencers.
Fig. 3 B illustrates the logical schematic of the embodiment of the processor with two or more instruction sequencers, and described instruction sequencer comprises user class control transfer instruction and user class monitor command in their instruction set.Processor 332 can comprise that one or more instruction sequencer 338-342 are to carry out different threads.In one embodiment, a plurality of instruction sequencers can be shared a decoder unit and/or instruction execution unit.Equally, each instruction sequencer can have the dedicated process instruction pipelining of himself, and this streamline comprises decoder unit (as first decoder unit 334), instruction execution unit (as first instruction execution unit 335) etc.At least some comprise instruction set 344 among a plurality of instruction sequencer 338-342, and instruction set 344 comprises the recovery instruction (instructing as SRSTOR) known to hold instruction (as the SSAVE instruction) and the sequencer known to user class monitor command (as SEMONITOR instruction), user class control transfer instruction (as the SXFR instruction), the sequencer at least.As optional scheme, the storage known to the sequencer and to recover instruction be not the part of instruction set 344 yet.And user class control is shifted and monitor command can be the part of described instruction set, and can use in conjunction with the pointer of scene and handler code, with storage known to the composition sequencer and recovery instruction.The scene type of the compound trigger condition that can define on framework based on micro-architecture event hereinafter will be described.The flow process of control transfer operation can take place as follows.
A first instance of the user-level monitor instruction 346 may specify one of the instruction sequencers, a pointer to the memory location of handler code, and one of several control-transfer scenarios. Monitor instruction 346 may cause the executing instruction sequencer (e.g., first instruction sequencer 338) to set up the specified instruction sequencer so that, upon observing or receiving signaling of the specified control-transfer scenario, it invokes the handler code at the specified memory location. First memory location 348, which stores the handler code, may be a register, a cache, or other similar storage. The user-level monitor instruction 346 may be executed first, before a source instruction sequencer sends a control-transfer signal, to set up the specified target instruction sequencer to receive that control-transfer signal.
The executing instruction sequencer (e.g., first instruction sequencer 338) may execute a sequencer-aware save instruction to save the context state of the target instruction sequencer. The context state of the target instruction sequencer may be stored in a second memory location 350. The second memory location may be a different location in a shared memory array, or a discrete memory area separate from the first memory location.
A first instance of the control-transfer instruction 352 may specify one of the instruction sequencers and one of the many control-transfer scenarios. The defined control-transfer scenarios may be stored, for example, in table 354. Control-transfer instruction 352 causes the executing instruction sequencer to generate a control-transfer signal to be received by the specified target instruction sequencer 340 (e.g., the second instruction sequencer).
The specified target instruction sequencer 340 detects the control-transfer signal generated in response to execution of the control-transfer instruction 352 that specifies it. The specified target instruction sequencer 340 then executes the handler code specified by the monitor instruction 346 that designated it.
After the handler code has finished executing, first instruction sequencer 338 (the source instruction sequencer) may execute a sequencer-aware restore instruction to restore the context state of the target instruction sequencer from second memory location 350.
In one embodiment, a processor may include multi-sequencer hardware. Each instruction sequencer may execute a different thread. At least some of the instruction sequencers may execute user-level instructions, which may be sequencer-aware. Each such user-level instruction may include information specifying at least one of the instruction sequencers. Executing the instruction on the executing instruction sequencer causes that sequencer to perform a thread-management operation on the specified instruction sequencer without operating-system intervention. The thread-management operation may be a thread-creation, thread-control, or thread-synchronization operation. Examples of such user-level instructions include the sequencer-aware SXFR, SEMONITOR, SSAVE, and SRSTOR instructions described in detail below.
In one embodiment, the SXFR instruction has the instruction format shown in Fig. 4A. Referring to Fig. 4A, the SXFR instruction includes an opcode 400A and operands 402A to 410A. Operand 402A corresponds to the destination sequencer ID (SID), that is, the target sequencer to which the signal is sent. Operand 404A contains a scenario or control message, which may be an architecturally defined identifier code representing a condition or an anticipated event. Scenarios may be used to effect asynchronous control transfer, as described below. Fig. 6A shows a table of scenarios according to an embodiment of the invention. Broadly, scenarios may be divided into intra-sequencer scenarios and inter-sequencer scenarios. In one embodiment, intra-sequencer scenarios fall into the resource-not-available (RNA) category, which comprises events generated during execution on a sequencer when that sequencer accesses a resource that is not available on it. In one embodiment, scenarios in the RNA category include a page fault, a system call on an OS-hidden sequencer that cannot directly invoke OS services, and a deprecated-operation fault. A deprecated-operation fault is a fault caused by a limited or deprecated subset of the ISA features implemented on the sequencer. For example, a deprecated-operation fault may occur when an attempt is made to execute an instruction that requires a floating-point adder on a sequencer that does not physically implement one. As those of ordinary skill in the art will appreciate, the mechanisms described herein may be implemented at different levels of abstraction: in application software, in system-level software, in firmware such as microcode, or in hardware.
Inter-sequencer scenarios include, for example, an initialization scenario called the "INIT" scenario, a "FORK/EXEC" scenario, and a "PROXY" scenario. The INIT scenario causes the sequencer whose SID is specified in the SXFR instruction to initialize a set of sequencer-specific architectural states (such as general-purpose registers or machine-specific control registers) to a respective set of initial values. The FORK/EXEC scenario sets specific values in the target sequencer state, including at least the instruction pointer (EIP) and/or the stack pointer (ESP), so that the thread on the sequencer executing the SXFR instruction forks, that is, starts a parallel thread of execution on the sequencer identified by the destination SID of the SXFR instruction. The PROXY scenario causes the sequencer identified by the target SID of the SXFR instruction to operate in proxy-execution mode, for example to process instructions on behalf of the sequencer that executed the SXFR instruction. In one embodiment, a sequencer operating in proxy-execution mode may be used to process instructions that cannot be processed on a sequencer supporting only a deprecated subset of ISA features. In one embodiment, the PROXY scenario may be divided into a BEGIN_PROXY scenario and an END_PROXY scenario: the BEGIN_PROXY scenario places an instruction sequencer in proxy-execution mode, and the END_PROXY scenario terminates operation in that mode.
Refer again to Fig. 4 A of accompanying drawing, in one embodiment, operand 406A comprises conditional parameter, this parameter declaration condition.Conditional parameter for example comprises " WAIT " and " NOWAIT " parameter.For example, when SXFR was used in combination with the PROXY scene, when the agency on waiting for other sequencer was complete, the WAIT conditional parameter stopped the instruction execution on the sequencer of carrying out the SXFR instruction.The execution that the NOWAIT conditional parameter is defined on the sequencer of carrying out the SXFR instruction can be carried out continuation concurrently with the agency on other instruction sequencer.
In one embodiment, operand 408A contains a scenario-specific payload or data message. For example, for the FORK/EXEC scenario, the payload may include the instruction pointer at which execution on the sequencer identified by operand 402A is to begin. In various embodiments, the payload may include an instruction pointer, a stack pointer, and so on. Addresses contained in the payload may be expressed in a variety of addressing modes, such as literal addressing, register-indirect addressing, and base/offset addressing.
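The operand layout of Fig. 4A can be pictured as a plain record. The Python sketch below is a software model only, not the instruction encoding: field widths are not given in this description, and the class names, the string scenario values, and the check that a FORK/EXEC payload carries an instruction pointer are hypothetical illustrations. It also records which scenarios belong to the intra-sequencer RNA category.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Scenario(Enum):
    # Intra-sequencer scenarios: resource-not-available (RNA) category
    PAGE_FAULT = "page_fault"
    OS_SERVICE_SYSCALL = "os_service_syscall"
    DEPRECATED_OP_FAULT = "deprecated_op_fault"
    # Inter-sequencer scenarios
    INIT = "init"
    FORK_EXEC = "fork_exec"
    BEGIN_PROXY = "begin_proxy"
    END_PROXY = "end_proxy"


RNA_SCENARIOS = frozenset(
    {Scenario.PAGE_FAULT, Scenario.OS_SERVICE_SYSCALL, Scenario.DEPRECATED_OP_FAULT}
)


class Condition(Enum):
    WAIT = "wait"      # stall the issuing sequencer until proxy execution completes
    NOWAIT = "nowait"  # let the issuing sequencer continue in parallel


@dataclass
class SxfrOperands:
    sid: int                                   # 402A: destination sequencer ID
    scenario: Scenario                         # 404A: scenario / control message
    condition: Condition = Condition.NOWAIT    # 406A: condition parameter
    payload: Dict[str, int] = field(default_factory=dict)  # 408A: scenario-specific data

    def __post_init__(self) -> None:
        # Hypothetical check: FORK/EXEC needs at least a start EIP for the target.
        if self.scenario is Scenario.FORK_EXEC and "eip" not in self.payload:
            raise ValueError("FORK/EXEC payload must carry an instruction pointer")

    def is_intra_sequencer(self) -> bool:
        return self.scenario in RNA_SCENARIOS
```

For example, `SxfrOperands(502, Scenario.FORK_EXEC, payload={"eip": 0x1000, "esp": 0x8000})` describes a fork onto sequencer 502 starting at a hypothetical address 0x1000.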
Operand 410A specifies a routing function for the SID contained in operand 402A. The routing function controls whether the signal generated as a result of executing the SXFR instruction is sent as a broadcast, unicast, or multicast signal. The routing function may also encode topology-specific hint information that can be used to help route the signal over the underlying inter-sequencer interconnect.
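The routing function can be sketched as a function that turns the SID operand into a set of receiving sequencers. This is a hypothetical model: the description does not say how a multicast group is named or whether a broadcast reaches the sender, so the sketch assumes a multicast operand is a collection of SIDs and that a broadcast excludes the issuing sequencer. Topology hints are not modeled.

```python
from enum import Enum
from typing import Iterable, Set


class Routing(Enum):
    UNICAST = "unicast"
    MULTICAST = "multicast"
    BROADCAST = "broadcast"


def resolve_targets(routing: Routing, sid_operand, sender: int,
                    all_sids: Iterable[int]) -> Set[int]:
    """Return the set of sequencers that should receive an SXFR signal."""
    known = set(all_sids)
    if routing is Routing.UNICAST:
        targets = {sid_operand}
    elif routing is Routing.MULTICAST:
        targets = set(sid_operand)      # assumed: operand names a group of SIDs
    else:
        targets = known - {sender}      # assumed: broadcast excludes the sender
    unknown = targets - known
    if unknown:
        raise ValueError(f"unknown sequencer IDs: {sorted(unknown)}")
    return targets
```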
Referring now to Fig. 4 B of accompanying drawing, the form of the SEMONITOR instruction of its explanation one embodiment of the invention.As can be seen, the SEMONITOR instruction comprises operational code 400B and operand 402B to 406B.Operand 402B stipulates a scene, and it can for example be expressed according to scene ID.Operand 404B regulation comprises the tuple of sequencer ID (SID) and instruction pointer (EIP).Describe for convenient, this tuple is called " SIDEIP ".
The SEMONITOR instruction maps the scenario specified in operand 402B to the SIDEIP specified in operand 404B. The SEMONITOR instruction may therefore be used to build the mapping table shown in Fig. 6B, which maps each scenario to a specific SIDEIP. Each such mapping of a scenario to a SIDEIP is called a "service channel". Operand 406B allows the programmer to supply one or more control parameters that govern how a particular service channel is serviced, as explained in more detail below. A programmer can use the SEMONITOR instruction to program the service channels that a particular sequencer uses to monitor for given scenarios. In one embodiment, when the anticipated condition corresponding to a scenario is observed, the sequencer incurs a yield event, which asynchronously transfers control to the yield event handler starting at the SIDEIP mapped to that scenario. For example, if the anticipated condition corresponds to a fault, then once the yield event is initiated the current (return) instruction pointer is pushed onto the current stack and control is transferred to the SIDEIP mapped to the observed scenario. If the anticipated condition corresponds to a trap, the next instruction pointer is pushed onto the current stack and control is transferred to the SIDEIP mapped to the observed scenario. A fault disposes of an instruction before the instruction executes, whereas a trap disposes of it after the instruction executes.
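A service channel table and the fault/trap difference can be modeled in a few lines. The sketch below is a hypothetical software model: `Sequencer`, `semonitor`, `yield_event`, and the `insn_len` parameter (standing in for a variable instruction length) are illustrative names, not architectural definitions, and the model only handles channels serviced by the sequencer itself. It shows that a fault pushes the current EIP while a trap pushes the next EIP before control moves to the mapped handler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

SIDEIP = Tuple[int, int]  # (sequencer ID, handler instruction pointer)


@dataclass
class Sequencer:
    sid: int
    eip: int = 0
    stack: List[int] = field(default_factory=list)
    channels: Dict[str, SIDEIP] = field(default_factory=dict)  # Fig. 6B-style table

    def semonitor(self, scenario: str, sideip: SIDEIP) -> None:
        """Program a service channel mapping a scenario to a SIDEIP."""
        if sideip[0] != self.sid:
            raise ValueError("this sketch only models channels serviced locally")
        self.channels[scenario] = sideip

    def yield_event(self, scenario: str, kind: str, insn_len: int = 1) -> None:
        """Asynchronously transfer control to the handler mapped to `scenario`."""
        _, handler_eip = self.channels[scenario]
        if kind == "fault":
            self.stack.append(self.eip)             # current (return) EIP
        elif kind == "trap":
            self.stack.append(self.eip + insn_len)  # next EIP
        else:
            raise ValueError(f"unknown event kind: {kind}")
        self.eip = handler_eip
```

Using the Fig. 5 example, `s.semonitor("BEGIN_PROXY", (502, J))` on a model of sequencer 502 plays the role of SEMONITOR(1, (502, J)).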
In one embodiment, an architecturally defined block may be configured to prevent recursive triggering of yield events until the block is reset. A special return instruction may automatically reset the block and return control from the yield event handler to the code that was executing when the yield event occurred.
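The blocking behavior can be sketched as a flag that is set on entry to a yield handler and cleared only by the special return. In this hypothetical model, `YieldBlock`, `trigger`, and `yield_return` are illustrative names, and a yield event raised while the block is set is simply reported as suppressed; the description does not say whether such events are dropped or held.

```python
from typing import Callable


class YieldBlock:
    """Models the architecturally defined block against recursive yield events."""

    def __init__(self) -> None:
        self.blocked = False

    def trigger(self, scenario: str,
                handler: Callable[["YieldBlock", str], None]) -> bool:
        if self.blocked:
            return False          # suppressed: a yield handler is already active
        self.blocked = True       # set the block before entering the handler
        handler(self, scenario)
        return True

    def yield_return(self) -> None:
        """Special return instruction: automatically resets the block."""
        self.blocked = False
```

A handler that tries to re-trigger a yield before executing `yield_return()` sees `False`; after the return, the next yield event is delivered normally.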
From the above description, it will be appreciated that SXFR and SEMONITOR are "sequencer-aware" because they include operands that identify specific sequencers. The SSAVE and SRSTOR instructions described below are likewise "sequencer-aware" because they include operands that identify specific sequencers. These user-level instructions may also be sequencer-aware because they contain a pointer to handler code which, when executed by an instruction execution unit, references one or more specific instruction sequencers. The handler code is associated with the user-level instruction because the user-level instruction directs the instruction pointer to the start of the handler code, and the user-level instruction directs the operations of the thread after the handler code finishes executing. Thus, a user-level instruction may be sequencer-aware if it has either 1) a field that explicitly references one or more instruction sequencers, or 2) an implicit reference, via a pointer, to handler code that specifically addresses one or more instruction sequencers when the handler code is executed.
In one embodiment, the SXFR and SEMONITOR instructions may be used to implement inter-sequencer control transfer, as described with reference to Fig. 5.
Referring to Fig. 5, upon encountering an SXFR instruction at instruction pointer "I", sequencer 500 transfers control to sequencer 502, causing sequencer 502 to begin executing handler instructions starting at instruction pointer "J". In one embodiment, an SXFR instruction of the form SXFR(SID, SCENARIO_ID, CONDITION_PARAMETER), for example SXFR(502, BEGIN_PROXY, NOWAIT), may be used to effect the control transfer. Looking more closely at the format of the SXFR instruction, "SID" in the instruction is a reference to the sequencer identifier (SID) of sequencer 502. The "SCENARIO_ID" part of the instruction is a reference to a scenario which, as previously described, may be programmed into systems 100A and 100B to induce asynchronous control transfer. As previously described, in one embodiment systems 100A and 100B support the scenarios shown in the scenario table of Fig. 6A. Each scenario is encoded as a scenario identifier (ID). In one embodiment, the value corresponding to a particular scenario ID may be programmed into a register, from which it may be read when the SXFR instruction is executed.
In one embodiment, the mapping table of Fig. 6B, which maps each scenario to a SIDEIP, is used to resolve the instruction pointer associated with the "SCENARIO_ID" part of the SXFR instruction.
As previously described, the SEMONITOR instruction is used to populate the table of Fig. 6B with service channels. For example, the instruction SEMONITOR(1, (502, J)), of the form SEMONITOR(SCENARIO_ID, SIDEIP), maps instruction pointer "J" on sequencer 502 to the scenario indicated by SCENARIO_ID=1, namely the BEGIN_PROXY scenario. Executing the instruction SXFR(502, 1) on sequencer 500 then causes a signal containing SCENARIO_ID=1 to be delivered to sequencer 502.
In response to the signal, sequencer 502 incurs a yield event that transfers control to instruction pointer "J", where the handler code associated with the BEGIN_PROXY scenario begins. In one embodiment, rather than executing the handler code at instruction pointer "J" immediately upon receiving the signal, sequencer 502 may queue a number of received signals and, once the number of queued signals exceeds a threshold, service them by executing the handler code associated with each signal. In one embodiment, the particular manner in which sequencer 502 handles signals (immediately, or deferred through a queue) and the value of the threshold are controlled or configured by control parameter 406B of the SEMONITOR instruction. The queuing of requests may also be implemented in software.
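The two servicing modes that control parameter 406B selects can be sketched as follows. This is a hypothetical software model (the patent also allows queuing to be done in software): `SignalPort` and its `threshold` argument are illustrative names, with a threshold of 0 standing for immediate handling. Queued signals are drained in arrival order once their count exceeds the threshold.

```python
from collections import deque
from typing import Callable, Deque


class SignalPort:
    """Receives scenario signals for one sequencer's service channel."""

    def __init__(self, handler: Callable[[str], None], threshold: int = 0) -> None:
        self.handler = handler
        self.threshold = threshold      # 0 = immediate; N > 0 = queue until count exceeds N
        self.queue: Deque[str] = deque()

    def receive(self, scenario: str) -> None:
        if self.threshold == 0:
            self.handler(scenario)      # immediate handling
            return
        self.queue.append(scenario)
        if len(self.queue) > self.threshold:
            while self.queue:           # drain in arrival order
                self.handler(self.queue.popleft())
```

With `threshold=2`, the first two signals are only queued; the third triggers the handler three times, once per queued signal.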
In one embodiment, the handler code may include instructions that cause a service thread to begin executing on instruction sequencer 502. A service thread is any thread that assists or helps a first thread executing on another sequencer (sequencer 500 in Fig. 5). For the service thread to execute on sequencer 502, some form of state transfer between sequencers 500 and 502 is needed. In one embodiment, a sequencer-specific context save instruction and a sequencer-specific context restore instruction are provided in addition to the SXFR and SEMONITOR instructions. The sequencer context save instruction is denoted SSAVE, and the sequencer context restore operation is denoted SRSTOR. Both SSAVE and SRSTOR are sequencer-aware instructions. Alternatively, a minimal canonical instruction set may include only the SXFR and SEMONITOR instructions. For example, in one embodiment, scenarios for sequencer context save and/or restore are defined, and the SXFR and SEMONITOR instructions are used together with those scenarios and with pointers to handler code. The corresponding handler code performs the sequencer context save and/or restore operation on the target sequencer, achieving the same effect as dedicated SSAVE and SRSTOR instructions.
In another embodiment, a sequencer-aware context save instruction may be synthesized by having a scenario mapped to a block of code that performs the sequencer-aware context save. Likewise, a sequencer-aware context restore operation may be synthesized using a scenario.
In one embodiment, the SSAVE and SRSTOR instructions each include an operand corresponding to a SID and an operand containing the address of a "save area" in which the state of the sequencer identified by the SID operand is stored. In the example of Fig. 5, for sequencer 502 to execute a service thread that facilitates or helps the execution of the first thread running on sequencer 500, sequencer 502 must have access to the execution context of the first thread. To make that context available to sequencer 502, the SSAVE instruction is first executed on sequencer 502 to save the execution context of the first thread running on sequencer 500 to first memory location 512. To preserve the existing work being performed on sequencer 502 before it performs service-thread computation on behalf of sequencer 500, the code currently running on sequencer 502 (hereinafter the "prior code") may execute SSAVE to save the prior code's execution context to second memory location 514. The save areas at first memory location 512 and second memory location 514 do not overlap.
Once the prior code's execution context has been saved to second memory location 514, sequencer 502 executes an SRSTOR instruction specifying first memory location 512, changing its own sequencer state to the execution context/state associated with processing of the first thread on sequencer 500. Sequencer 502 may then begin executing the service thread. While the service thread executes, sequencer 500 may either wait for the service thread to finish or go on to execute a second thread. Once the service thread finishes executing on sequencer 502, sequencer 502 executes an SXFR instruction to send sequencer 500 a signal indicating that the service thread has finished. Before sending that signal, sequencer 502 executes an SSAVE instruction to save the updated execution context of the first thread, as it stands after the service thread completes, to third memory location 516.
While sequencer 500 waits for the service thread to finish, the service thread on sequencer 502 may then execute an SRSTOR specifying third memory location 516, updating the execution context of the first thread on sequencer 500 before executing the SXFR that notifies sequencer 500 to resume executing code. After notifying sequencer 500 that the service thread has finished
Alternatively, upon receiving the signal from sequencer 502 indicating that the service thread has finished, sequencer 500 may itself execute an SRSTOR(500, POINTER_TO_SAVE_AREA_B) instruction, changing its execution context to that of the first thread as updated by the completed service thread.
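The save-area sequence of Fig. 5 can be traced with a toy model in which a sequencer context is a dictionary of register values. This is a hypothetical sketch: `SaveAreas`, `run_service_thread`, the labels `loc_512`, `loc_514`, and `loc_516`, and the single `eax` increment standing in for the service thread's work are all illustrative. It follows the variant in which sequencer 500 performs its own SRSTOR after the completion signal. The final restore of the prior code on sequencer 502 is an assumption, since the description above is cut off before that step.

```python
from typing import Dict

Context = Dict[str, int]


class SaveAreas:
    """Toy model: live sequencer contexts plus named, non-overlapping save areas."""

    def __init__(self, contexts: Dict[int, Context]) -> None:
        self.ctx = contexts
        self.areas: Dict[str, Context] = {}

    def ssave(self, sid: int, area: str) -> None:
        # Copy, so later changes to the live context do not alter the saved one.
        self.areas[area] = dict(self.ctx[sid])

    def srstor(self, sid: int, area: str) -> None:
        self.ctx[sid] = dict(self.areas[area])


def run_service_thread(m: SaveAreas, client: int = 500, servant: int = 502) -> None:
    m.ssave(client, "loc_512")     # first thread's context (from sequencer 500)
    m.ssave(servant, "loc_514")    # servant's prior code
    m.srstor(servant, "loc_512")   # servant adopts the first thread's context
    m.ctx[servant]["eax"] += 1     # stand-in for the service thread's work
    m.ssave(servant, "loc_516")    # updated first-thread context
    # Here the servant would execute SXFR to signal completion to the client.
    m.srstor(client, "loc_516")    # client resumes with the updated context
    m.srstor(servant, "loc_514")   # assumed: servant resumes its prior code
```

Because every SSAVE copies into its own area, the original first-thread context in `loc_512` is still intact after the service thread has modified its working copy.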
In one embodiment, the saving and restoring of an instruction sequencer's context state may be performed remotely on the target sequencer. The source sequencer sends the target instruction sequencer a message to save and/or restore the target sequencer's context state. This may be implemented as an SXFR instruction with a particular scenario.
In one embodiment, thread management logic 114 includes a proxy execution mechanism 700 and a sequencer hiding mechanism 702, as shown in Fig. 7.
To illustrate the operation of proxy execution mechanism 700, consider system 800 shown in Fig. 8, which includes two sequencers designated S1 and S2. Sequencers S1 and S2 may be symmetric or asymmetric with respect to each other. In this example they are asymmetric: sequencer S1 includes only processing resources A and B, whereas sequencer S2 includes processing resources A, D, and C. The processing resources of sequencer S1 must be able to support the execution of instruction blocks 1 and 2.
Time (T1) is located at the end of the arrow of instruction block 2. T1 represents the point at which a monitor detects an event that causes the single thread to migrate from client instruction sequencer S1 to servant instruction sequencer S2. At time T1, a third instruction block is scheduled to execute on sequencer S1, but it needs a processing resource that is unavailable on sequencer S1 and available on sequencer S2 (for example, processing resource D). At this point, in at least one embodiment, sequencer S1 incurs a resource-not-available fault, and a resource-not-available handler, which may be defined in user-level software (or in thread management logic hardware or firmware), invokes proxy execution mechanism 700 to migrate the third instruction block to sequencer S2 for execution there.
Time (T2) is located at the start of the arrow line leading to the third instruction block. T2 represents the start of execution of this block of the single thread on servant instruction sequencer S2 on behalf of client instruction sequencer S1.
Time (T3) is located at the end of the arrow of the third instruction block. T3 represents the completion of execution of this block of the single thread on servant instruction sequencer S2. At time T3, after sequencer S2 has executed the third instruction block using processing resource D, sequencer S2 uses proxy execution mechanism 700 to signal sequencer S1 that execution of the third instruction block is complete.
Time (T4) is located at the start of the arrow line of the fourth instruction block. T4 represents the completion of proxy execution of this block of the single thread on instruction sequencer S2 and the return of the thread to client instruction sequencer S1. Sequencer S1 may then continue by executing the fourth instruction block, which requires only processing resources available on sequencer S1.
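The T1 to T4 walk-through reduces to a simple placement rule: a block runs on the client when the client implements every resource it needs, and otherwise faults over to the servant. The sketch below is a hypothetical model; `run_with_proxy` is an illustrative name, and the resource needs of blocks 1, 2, and 4 are assumed, since the description only states that S1 supports them.

```python
from typing import List, Set, Tuple

Seq = Tuple[str, Set[str]]  # (sequencer name, processing resources implemented)


def run_with_proxy(blocks: List[Set[str]], client: Seq,
                   servant: Seq) -> List[Tuple[int, str]]:
    """Return (block number, sequencer that executed it) for each block."""
    client_name, client_res = client
    servant_name, servant_res = servant
    trace = []
    for i, needed in enumerate(blocks, start=1):
        if needed <= client_res:
            trace.append((i, client_name))       # runs natively on the client
        elif needed <= servant_res:
            # T1: resource-not-available fault on the client
            # T2-T3: servant executes the block in proxy mode, then signals
            # T4: control returns to the client
            trace.append((i, servant_name))
        else:
            raise RuntimeError(f"block {i} cannot run on {client_name} or {servant_name}")
    return trace
```

With S1 = {A, B} and S2 = {A, C, D}, only the third block (needing D) is proxied to S2.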
Thus, in the above example, sequencer S1 uses sequencer S2 to execute a block of instructions on its behalf; sequencer S1 is called the "client" sequencer. Sequencer S2, which operates in proxy-execution mode to execute the block of instructions on behalf of the client sequencer, is called the "servant" sequencer. Resource D may comprise a highly specialized feature used by a limited set of applications. Such a feature may be relatively power-hungry, costly, and complex. Therefore, to save cost in a particular implementation, resource D may be implemented on sequencer S2 and not on sequencer S1. However, as previously described, proxy execution mechanism 700 hides the asymmetry of the sequencers in a multi-sequencer system by mapping the processing resources available on each sequencer, so that a client sequencer can migrate a thread to a sequencer that has the processing resources required, or optimized, to execute it. Proxy execution mechanism 700 may also be used to migrate instruction blocks executing on an OS-hidden sequencer to an OS-visible sequencer for execution, for example to obtain OS services such as page-fault handling or system calls, as explained in more detail below with reference to Fig. 11.
For a given physical implementation of a multi-sequencer system, proxy execution mechanism 700 may, as previously described, be constructed from the SEMONITOR and SXFR instructions and may include a mapping mechanism for organizing the asymmetric resources. In general, proxy execution mechanism 700 may reside in hardware, in firmware (e.g., microcode), in the system software layer, or in the application software layer. In one embodiment, proxy execution mechanism 700 may use the SEMONITOR and SXFR instructions to handle two classes of proxy services. The first class is called egress service scenarios, and the second class is called ingress service scenarios. On a client sequencer, egress service scenarios are defined to trap or fault on operations that use a set of resources unavailable or physically unsupported on that client sequencer. Each egress scenario is mapped to a sequencer ID and instruction pointer (SIDEIP) pointing to a servant sequencer. This mapping may be implemented in hardware, in firmware, or even in software. Proxy access to the servant sequencer may then be implemented using inter-sequencer signaling, as described previously.
The servant sequencer is responsible for supporting proxy access to resources that are not present on the client sequencer but are present on the servant sequencer. Ingress service scenarios are defined and configured into service channels, and are mapped to local service handlers (handler code) that perform proxy execution on behalf of the client sequencer. A list of exemplary egress and ingress service scenarios is given in the table of Fig. 6A.
In a sense, an egress service scenario corresponds to a trapping or faulting operation that incurs a "miss" on the client sequencer because it requires access to a processing resource that is unavailable on the client sequencer but available on the servant sequencer. Conversely, an ingress service scenario corresponds to an asynchronous interrupt condition indicating the arrival of a request, made on behalf of a client sequencer that lacks a local processing resource, to access that processing resource locally on the servant sequencer. The proxy execution mechanism defines a level of abstraction associated with each sequencer in a multi-sequencer system that enables client and servant sequencers to cooperate to perform proxy resource access. In at least one embodiment, where proxy execution is implemented in firmware or directly in hardware, proxy resource access is transparent to user-level software and to the OS.
Each service scenario plays a role similar to an opcode in a conventional ISA, except that a service scenario triggers a dedicated flow of handler code. New composite instructions can therefore be synthesized by using the SXFR instruction as a meta-instruction together with an egress service scenario mapped to handler code for the instruction being synthesized. In one embodiment, the relationship between a service scenario ID and its handler code flow is analogous to the relationship between a complex instruction set computer (CISC) opcode and its corresponding microcode flow. CISC-like composite instructions can be constructed by using the user-level sequencer-aware monitor and control-transfer instructions as a canonical instruction base on which to build microcode flows. As previously described, the mapping between a service scenario and its handler code is established by SEMONITOR, and SXFR provides a mechanism for sending control messages between sequencers. Delivery of a control message serves as the trigger for executing the handler code mapped to the service scenario.
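The opcode/microcode analogy can be illustrated with a dispatch table in which a scenario name selects a list of primitive steps. This is a hypothetical sketch: `ScenarioDispatcher`, the `CTX_SAVE` scenario, and the step functions `save_gprs` and `save_eip` are illustrative, and together they synthesize a context-save operation in the spirit of an SSAVE built from SEMONITOR and SXFR.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Step = Callable[[State], None]


class ScenarioDispatcher:
    """Scenario IDs act like opcodes; each maps to a handler 'micro-flow'."""

    def __init__(self) -> None:
        self.flows: Dict[str, List[Step]] = {}

    def semonitor(self, scenario: str, flow: List[Step]) -> None:
        self.flows[scenario] = flow          # bind scenario -> handler flow

    def sxfr(self, scenario: str, state: State) -> None:
        for step in self.flows[scenario]:    # the control message triggers the flow
            step(state)


def save_gprs(state: State) -> None:
    state["save_area"]["regs"] = dict(state["regs"])


def save_eip(state: State) -> None:
    state["save_area"]["eip"] = state["eip"]
```

After `d.semonitor("CTX_SAVE", [save_gprs, save_eip])`, a single `d.sxfr("CTX_SAVE", state)` behaves like one composite instruction built from two primitive steps.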
In one embodiment, sequencer hiding mechanism 702 may be used to map or group particular combinations of OS-visible and OS-hidden sequencers to form logical processors. The mapping may be one-to-many, in which a single OS-visible sequencer is mapped to multiple OS-hidden sequencers, or many-to-many, in which multiple OS-visible sequencers are mapped to multiple OS-hidden sequencers. For example, Fig. 9 shows a multi-sequencer system including two logical processors 900 and 902. Logical processors 900 and 902 each comprise a one-to-many mapping in which a single OS-visible sequencer is mapped to multiple OS-hidden sequencers.
Turning to Fig. 10, an exemplary multi-sequencer system 1000 may include a total of 18 sequencers, in which two OS-visible sequencers are mapped to 16 OS-hidden sequencers to define a many-to-many mapping. Within the logical processor of system 1000, either of the two OS-visible sequencers can act as a proxy for any of the OS-hidden sequencers.
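The one-to-many and many-to-many groupings of Figs. 9 and 10 can be expressed as a map from each OS-visible sequencer to the OS-hidden sequencers it may serve as proxy for. In this hypothetical sketch, `build_logical_processor`, `proxies_for`, and the numeric SIDs (0 and 1 visible, 2 to 17 hidden) are illustrative choices, not identifiers from the figures.

```python
from typing import Dict, Iterable, List, Set


def build_logical_processor(visible: Iterable[int],
                            hidden: Iterable[int]) -> Dict[int, Set[int]]:
    """Map every OS-visible sequencer to the OS-hidden sequencers it may proxy for."""
    hidden_set = set(hidden)
    return {v: set(hidden_set) for v in visible}


def proxies_for(mapping: Dict[int, Set[int]], hidden_sid: int) -> List[int]:
    """OS-visible sequencers that can act as proxy for a given hidden sequencer."""
    return sorted(v for v, hs in mapping.items() if hidden_sid in hs)
```

With two visible and sixteen hidden sequencers, `proxies_for(mapping, 7)` returns both visible sequencers, matching the statement that either one can proxy for any hidden sequencer.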
In one embodiment, sequencer hiding mechanism 702 can selectively remove sequencers from OS control so that they are hidden. According to different embodiments of the invention, a sequencer may be hidden after boot or, in some instances, even during boot. To hide a sequencer from OS control, sequencer hiding mechanism 702 may set an indicator informing the OS that the sequencer is unavailable. For example, sequencer hiding mechanism 702 may emulate the power or power/performance state of a sequencer to indicate to the OS that the sequencer has entered a particular down state, so that the OS considers the sequencer overloaded or overheated and does not dispatch computation or schedule instructions to it. In one embodiment, for sequencers that implement a power-saving mechanism (such as Intel SpeedStep® technology), sequencer hiding mechanism 702 may place a particular subset of the OS-visible sequencers into a particular power state indicating that the subset is in a down state, so that the OS considers that subset overloaded and therefore does not dispatch computation to it. The SXFR and SEMONITOR instructions can then be used to schedule computation or threads onto the hidden sequencers in a manner transparent to the OS.
In one embodiment, once a hidden sequencer has finished executing a thread, control of the hidden sequencer may be returned to the OS. This may be accomplished by setting an indicator that tells the OS the hidden instruction sequencer is no longer in a down state.
In one embodiment, the privileged state of a hidden instruction sequencer is synchronized with the corresponding privileged state of the non-hidden instruction sequencers that remain under OS control.
In general, to canonically support a general M:N multithreading package, that is, a package that maps M threads onto N sequencers where M >> N, the minimal building-block synchronization objects required are critical sections and events. From these, higher-level synchronization objects (such as mutexes, condition variables, and semaphores) can be constructed. A critical section can be implemented with a hardware lock primitive. Hidden sequencers can inherit state from non-hidden sequencers so that hidden and non-hidden sequencers have the same view of virtual memory. Events can be supported by an event-driven multi-sequencer scheduler (centralized or distributed) synthesized from the SXFR and SEMONITOR instructions. For example, a simple POSIX-compliant or compatible distributed scheduler can be built around a global task queue protected by a critical section. Each sequencer effectively runs its own copy of the scheduler and competes for access to the head of the task queue to grab the next ready task thread to run on that sequencer. If a task on a sequencer is waiting on a synchronization variable (such as a mutex, condition variable, or semaphore), the task is de-scheduled by yielding and, after entering the corresponding critical section, is placed at the tail of the global task queue.
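The distributed scheduler described above can be approximated in ordinary Python, with threads standing in for sequencers and a `threading.Lock` standing in for the hardware-lock critical section. This is a hypothetical sketch, not an implementation built on SXFR and SEMONITOR: `GlobalTaskQueue`, `scheduler_loop`, and `make_waiter` (a fake synchronization variable that becomes ready after a fixed number of failed checks) are illustrative names. A task that is not ready is pushed back to the tail of the queue under the lock, mirroring the yield-and-requeue step.

```python
import threading
from collections import deque
from typing import Callable, Deque, List, Optional, Tuple

Task = Tuple[str, Callable[[], bool]]  # (name, "is the awaited sync variable ready?")


class GlobalTaskQueue:
    def __init__(self, tasks: List[Task]) -> None:
        self._lock = threading.Lock()           # stands in for the critical section
        self._q: Deque[Task] = deque(tasks)

    def pop_head(self) -> Optional[Task]:
        with self._lock:
            return self._q.popleft() if self._q else None

    def push_tail(self, task: Task) -> None:
        with self._lock:
            self._q.append(task)


def scheduler_loop(queue: GlobalTaskQueue, sid: int,
                   done: List[Tuple[str, int]], done_lock: threading.Lock) -> None:
    """One copy of the scheduler, run by each sequencer (modeled as a thread)."""
    while True:
        task = queue.pop_head()
        if task is None:
            return                              # queue drained
        name, ready = task
        if not ready():
            queue.push_tail(task)               # yield: de-schedule to the tail
            continue
        with done_lock:
            done.append((name, sid))            # "run" the task on this sequencer


def make_waiter(spins: int) -> Callable[[], bool]:
    """Hypothetical sync variable that becomes ready after `spins` failed checks."""
    count = [0]

    def ready() -> bool:
        count[0] += 1
        return count[0] > spins

    return ready
```

A worker that finds the queue empty exits, but any worker still holding a not-ready task re-queues it and keeps looping, so every task is eventually run exactly once.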
Because these thread primitives are widely adopted in most modern OS thread libraries, a large body of existing threaded code built on POSIX-compliant or compatible thread libraries can be ported to the multi-sequencer environment. Naturally, the header files in the threaded code may have to be remapped, and the legacy threaded code must be recompiled.
By using the SXFR and SEMONITOR instructions and the INIT scenario, threads can be scheduled for execution on OS-hidden sequencers without involving the OS. Thus, with the techniques disclosed herein, a multi-sequencer system can be built with more sequencers than the OS can support. Threads can then be scheduled at user level on the sequencers that the OS does not support.
Thus, in one embodiment, multiple instruction sequencers with the extended instruction set can support a single-image OS on more processors than the OS natively supports. For example, an OS capable of supporting a 4-way instruction sequencer system can serve as the OS for hardware that actually implements a 32-way instruction sequencer system. This allows applications to use more processors than the OS's limit on supported sequencers. The instruction sequencers may be asymmetric or symmetric.
Embodiments of proxy execution in a multi-sequencer system, in which some sequencers are OS-visible and others are OS-invisible, are now described. In general, when code running on an OS-invisible sequencer incurs a page fault or system call that requires OS service, the proxy execution mechanism ensures proper handling. Referring now to Figure 11 of the drawings, there is shown a flowchart of operations that effect OS service on behalf of an OS-hidden sequencer with sequencer ID SID1, in response to a trigger event for proxy execution.

Upon encountering the trigger event, the OS-hidden sequencer SID1 executes the instruction SSAVE(1, ST_1_0) at 1100. The trigger event may be a predefined condition of execution in the architectural state that requires OS service, such as a trap, a page fault, or a system call. This instruction saves the execution context of the thread that generated the trigger event. For ease of description, the save area for that execution context is designated ST_1_0. In at least one embodiment, accesses to this save area do not cause page faults.

At 1102, an SXFR instruction is executed to deliver the egress service scenario "BEGIN_PROXY" to the OS-visible sequencer SID0. Because the SXFR instruction executed at 1102 includes the conditional parameter "WAIT", instruction processing on sequencer SID1 blocks until the proxy execution thread on sequencer SID0 completes. At 1104, sequencer SID0 detects the signal from sequencer SID1 and yields, or "temporarily suspends", execution of its current thread. At 1106, an SSAVE instruction is executed to save the execution context or state associated with sequencer SID0. This save area is labeled "ST_0_0" and does not overlap with ST_1_0. At 1108, a proxy bit is set to 1 to indicate that sequencer SID0 is operating in proxy execution mode. At 1110, a context restore operation (SRSTOR) is executed to copy the state "ST_1_0", which is the execution context associated with the page fault on SID1. At 1112, the page fault is replicated, or impersonated, on sequencer SID0. At 1114, a ring transition is performed to switch control to the OS, and the OS services the page fault.

When the OS service completes, control switches from the OS privilege level back to user level (i.e., a ring transition). If the proxy bit is ON at that point, the END_PROXY scenario is initiated as an intra-sequencer yield event. In the yield event handler for the END_PROXY scenario, an execution context save is performed at 1116 to save the execution context "ST_1_1". At 1118, the proxy bit is set to 0. At 1120, an SXFR instruction is executed to deliver the service scenario "END_PROXY" to sequencer SID1. At 1122, sequencer SID0 restores state ST_0_0. Upon receiving the yield for the "END_PROXY" scenario at 1124, sequencer SID1 restores context "ST_1_1" at 1126, so that the thread that encountered the trigger event can resume execution.
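The Figure 11 sequence can be traced with the following minimal, single-threaded Python sketch, under several stated assumptions:
- Dictionaries stand in for architectural state.
- `ssave`/`srstor` are hypothetical stand-ins for the SSAVE and SRSTOR instructions.
- The "OS" merely marks the faulting page as resident.
- Because steps run strictly in order, the SXFR `WAIT` blocking of SID1 is implicit.

```python
from dataclasses import dataclass, field


@dataclass
class Sequencer:
    sid: int
    os_visible: bool
    context: dict = field(default_factory=dict)  # stand-in for architectural state
    proxy_bit: int = 0


class ToyMachine:
    """Sequential model of the Figure 11 proxy-execution steps."""

    def __init__(self) -> None:
        self.save_areas = {}  # named context save areas, e.g. "ST_1_0"
        self.resident = set()  # pages the toy OS has made resident
        self.trace = []  # Figure 11 step numbers, in order

    def ssave(self, seq: Sequencer, area: str) -> None:
        self.save_areas[area] = dict(seq.context)

    def srstor(self, seq: Sequencer, area: str) -> None:
        seq.context = dict(self.save_areas[area])

    def os_service_fault(self, seq: Sequencer) -> None:
        # Toy OS entered by a ring transition; only an OS-visible sequencer may enter.
        assert seq.os_visible, "OS service requires an OS-visible sequencer"
        self.resident.add(seq.context["fault_page"])
        seq.context["fault_page"] = None  # fault resolved

    def proxy_page_fault(self, hidden: Sequencer, visible: Sequencer) -> None:
        self.ssave(hidden, "ST_1_0")  # 1100: save the faulting thread's context
        self.trace.append(1100)
        self.trace.append(1102)  # SXFR BEGIN_PROXY with WAIT; SID1 blocks
        self.trace.append(1104)  # SID0 detects the signal and yields its thread
        self.ssave(visible, "ST_0_0")
        self.trace.append(1106)
        visible.proxy_bit = 1  # SID0 enters proxy execution mode
        self.trace.append(1108)
        self.srstor(visible, "ST_1_0")  # SID0 adopts SID1's faulting context
        self.trace.append(1110)
        self.trace.append(1112)  # page fault replicated on SID0
        self.os_service_fault(visible)  # ring transition; OS services the fault
        self.trace.append(1114)
        # Back at user level with the proxy bit ON: the END_PROXY yield handler runs.
        self.ssave(visible, "ST_1_1")
        self.trace.append(1116)
        visible.proxy_bit = 0
        self.trace.append(1118)
        self.trace.append(1120)  # SXFR END_PROXY delivered to SID1
        self.srstor(visible, "ST_0_0")  # SID0 resumes its own thread
        self.trace.append(1122)
        self.trace.append(1124)  # SID1 receives the END_PROXY yield
        self.srstor(hidden, "ST_1_1")  # SID1 resumes the serviced thread
        self.trace.append(1126)
```

After `proxy_page_fault` runs, the hidden sequencer holds its original program state with the fault cleared. The visible sequencer is back on its own context with the proxy bit reset.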
In one embodiment, proxy execution may be a user-level thread migration to the OS-visible instruction sequencer. The migration is performed when an asymmetric condition between the OS-visible instruction sequencer and an instruction sequencer under application-level program control is detected during execution of a user-level thread.
Asymmetric conditions between instruction sequencers may include at least the following conditions, which create a need for a ring/privilege level transition:
- a page fault or system call;
- a lack of instruction capability on the instruction sequencer executing the user-level thread (for example, an invalid-opcode fault that results from a particular instruction being deprecated on that sequencer);
- a difference in instruction execution performance between the two instruction sequencers.
State migration during proxy execution may be heavyweight or lightweight. A heavyweight migration is a full register-state migration: the state is saved from the sending sequencer and restored onto the receiving sequencer. In a heavyweight migration, the receiving sequencer executes at least one instruction from the user-level thread on behalf of the sending sequencer. Once the receiving sequencer has executed one or more instructions in place of the sending sequencer, the user-level thread may either remain on the receiving sequencer or return to the sending sequencer.
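The following Python sketch shows a heavyweight migration. It assumes that a "register state" is a dictionary with a `pc` entry and that the thread is a list of hypothetical `("add", reg, imm)` instructions. The full state moves to the receiver, which executes at least one instruction. The thread then either stays on the receiver or moves back with its full state. Saving the receiver's own prior context (ST_0_0 in Figure 11) is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Seq:
    name: str
    regs: dict = field(default_factory=dict)  # full register state, including "pc"


def step(seq: Seq, program: list) -> None:
    """Execute one toy instruction ("add", reg, imm) at regs["pc"]."""
    op, reg, imm = program[seq.regs["pc"]]
    if op != "add":
        raise ValueError(f"unsupported toy opcode {op!r}")
    seq.regs[reg] = seq.regs.get(reg, 0) + imm
    seq.regs["pc"] += 1


def heavyweight_migrate(sender: Seq, receiver: Seq, program: list,
                        n: int, return_to_sender: bool) -> Seq:
    """Move the full state, run n >= 1 instructions remotely, return the new owner."""
    if n < 1:
        raise ValueError("a heavyweight migration executes at least one instruction")
    receiver.regs = dict(sender.regs)  # save all state from sender, restore on receiver
    for _ in range(n):
        step(receiver, program)  # receiver runs the thread on the sender's behalf
    if return_to_sender:
        sender.regs = dict(receiver.regs)  # full state migrates back
        return sender
    return receiver  # the thread stays on the receiver
```

Returning the owning sequencer makes the "stay or return" choice from the paragraph above explicit to the caller.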
Lightweight migration has many variations; the idea is to streamline the migration for specific conditions. A lightweight migration may transfer only a small amount of state so that a small task can be handled. In some lightweight migration scenarios, such as a page fault, no instructions from the user-level thread are actually executed. The instruction sequencer under application-level program control simply transfers the address that caused the page fault. The receiving sequencer merely performs a probe load to bring in the faulting page. It then tells the instruction sequencer under application-level program control that the required task has been completed. Thus, migration does not necessarily mean that instructions from the migrating user-level thread are actually executed.
Thus, proxy execution occurs whenever a second instruction sequencer performs an action "on behalf of", or "derived from", a first instruction sequencer that is executing a user-level thread. In one embodiment, one aspect of proxy execution is lightweight handling of a page fault, which proceeds as follows:
- Execution of instructions in the user-level thread is suspended on a first instruction sequencer under application-level program control.
- An address pointer is transferred from that first instruction sequencer to an OS-visible instruction sequencer.
- The OS-visible instruction sequencer loads the content at the location the address pointer points to.
- After the content has been loaded, execution of the first user-level thread resumes on the instruction sequencer under application-level program control.
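The lightweight page-fault path can be modeled by the following Python sketch, under stated assumptions:
- Two OS threads stand in for the application-controlled and OS-visible sequencers.
- A `queue.Queue` carries only the faulting address.
- A `threading.Event` signals completion.
- A "probe load" just marks a hypothetical 4096-byte page as resident.

The class name `LightweightProxy` is illustrative only.

```python
import queue
import threading

PAGE_SIZE = 4096  # assumed page size for this toy model


class LightweightProxy:
    """Toy lightweight proxy: only the faulting address crosses sequencers."""

    def __init__(self) -> None:
        self.resident = set()  # page numbers loaded so far
        self._requests = queue.Queue()  # mailbox of (address, done-event) pairs

    def servant_loop(self) -> None:
        # Runs on the "OS-visible" sequencer (here, just another thread).
        while True:
            item = self._requests.get()
            if item is None:
                return  # shutdown request
            address, done = item
            self.resident.add(address // PAGE_SIZE)  # probe load brings the page in
            done.set()  # report that the task is complete

    def client_access(self, address: int) -> int:
        page = address // PAGE_SIZE
        if page not in self.resident:  # toy "page fault"
            done = threading.Event()
            self._requests.put((address, done))  # transfer only the address pointer
            done.wait()  # the user-level thread stays suspended
        return page  # execution resumes; no thread instruction ran remotely

    def shutdown(self) -> None:
        self._requests.put(None)
```

No instruction from the client thread ever runs on the servant thread. Only the address crosses over, which is what distinguishes this path from the heavyweight migration.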
Another aspect of proxy execution is transferring control and status information from an OS-hidden instruction sequencer to an OS-visible instruction sequencer. Execution of at least one instruction of a first user-level thread is also migrated from the OS-hidden instruction sequencer to the OS-visible instruction sequencer. The OS-visible instruction sequencer can then trigger the operating system to perform an OS operation on behalf of the OS-hidden instruction sequencer.
Figure 12 shows a processing system 1200 according to one embodiment of the invention. As can be seen, system 1200 includes a processing element 1202 coupled to a memory 1204. In one embodiment, processing element 1202 includes a plurality of instruction sequencers, only two of which are shown in Figure 12 of the drawings, labeled 1206A and 1206B. Processing element 1202 also includes a control transfer mechanism 1208, which includes a signaling mechanism 1210 and a monitoring mechanism 1212. Signaling mechanism 1210 sends scenario/control-transfer messages between the sequencers of processing element 1202. Thus, in one embodiment, signaling mechanism 1210 includes logic to execute the SXFR instruction described above. Monitoring mechanism 1212 may be used to set up any of the instruction sequencers of processing element 1202 to monitor for signals that carry particular control messages/scenarios. In one embodiment, the monitoring mechanism includes logic to decode the SEMONITOR instruction described above.
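The interplay between the signaling and monitoring mechanisms can be sketched in Python as follows. Several simplifications are assumptions of this toy model, not properties of the actual SXFR and SEMONITOR instructions:
- A monitor call records a mapping from (target sequencer, scenario) to a handler.
- A signal call looks up that mapping and invokes the handler synchronously.
- A signal that no sequencer is monitoring is recorded as undelivered.

The `semonitor`/`sxfr` method names only echo the instruction names.

```python
from typing import Any, Callable, Dict, List, Tuple

Handler = Callable[[int, Any], None]  # (source sequencer id, payload) -> None


class ControlTransfer:
    """Toy model of signaling (SXFR-like) and monitoring (SEMONITOR-like) logic."""

    def __init__(self) -> None:
        # (target sequencer id, scenario) -> handler, created by a monitor call
        self._monitors: Dict[Tuple[int, str], Handler] = {}
        self.undelivered: List[Tuple[int, int, str]] = []

    def semonitor(self, target: int, scenario: str, handler: Handler) -> None:
        """Set up `target` to monitor for `scenario` and run `handler` on arrival."""
        self._monitors[(target, scenario)] = handler

    def sxfr(self, source: int, target: int, scenario: str, payload: Any = None) -> bool:
        """Signal `scenario` (with optional payload) from `source` to `target`."""
        handler = self._monitors.get((target, scenario))
        if handler is None:
            self.undelivered.append((source, target, scenario))  # nobody is monitoring
            return False
        handler(source, payload)
        return True
```

For example, once sequencer 0 monitors for `"BEGIN_PROXY"`, a signal from sequencer 1 runs the registered handler. A signal for an unmonitored `"END_PROXY"` scenario is only recorded.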
As described above, processing element 1202 also includes a sequencer hiding mechanism 1214.
Memory 1204 may contain an operating system. In one embodiment, the operating system performs context switches by storing the entire register state of the previous task and restoring the entire register state of the next task.
Within processing element 1202, various techniques may be used to set up a sequencer, for example sequencer 1206B, to monitor for particular signals from sequencer 1206A. In one embodiment, sequencer 1206B may be preconfigured (i.e., without requiring any user configuration step) to monitor for signals that carry particular control messages/scenarios. In one embodiment, sequencer 1206B may be preconfigured to monitor for a signal that carries the INIT scenario. A user-level instruction such as SXFR may then be used to trigger execution of initialization code on sequencer 1206B. The initialization code itself may include a SEMONITOR instruction that sets up sequencer 1206B to monitor for particular signals (scenarios) from sequencer 1206A.
In another embodiment, the sequencer-aware SEMONITOR instruction may be executed on sequencer 1206A to cause sequencer 1206B to monitor for particular signals/scenarios from sequencer 1206A. In yet another embodiment, a pointer to the memory location that stores the bootstrap/initialization code may be saved as part of the context of sequencer 1206A, using the SSAVE instruction described above. For this embodiment, an SRSTOR instruction may be executed on sequencer 1206B to restore the context/state of sequencer 1206A, so that the bootstrap/initialization code can be executed. The bootstrap/initialization code itself contains at least one SEMONITOR instruction that sets up sequencer 1206B to monitor for particular signals/scenarios from sequencer 1206A.
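The INIT-based bootstrap variant can be illustrated by the following minimal Python sketch. It assumes that a sequencer is an object holding a scenario-to-handler table, preconfigured with an `"INIT"` handler that runs whatever initialization code arrives as the payload. The names `ToySequencer`, `sxfr`, `"WORK"`, and `init_code` are hypothetical. The SSAVE/SRSTOR pointer variant is not modeled.

```python
from typing import Any, Callable, Dict


class ToySequencer:
    def __init__(self, sid: int) -> None:
        self.sid = sid
        self.handlers: Dict[str, Callable[[Any], None]] = {}
        # Preconfigured: monitor for INIT without any user configuration step.
        # The INIT payload is the initialization code to run on this sequencer.
        self.handlers["INIT"] = lambda init_code: init_code(self)

    def semonitor(self, scenario: str, handler: Callable[[Any], None]) -> None:
        """Start monitoring for `scenario` on this sequencer."""
        self.handlers[scenario] = handler

    def receive(self, scenario: str, payload: Any) -> bool:
        handler = self.handlers.get(scenario)
        if handler is None:
            return False  # not monitoring this scenario (yet)
        handler(payload)
        return True


def sxfr(target: ToySequencer, scenario: str, payload: Any = None) -> bool:
    """Toy SXFR-like signal from another sequencer to `target`."""
    return target.receive(scenario, payload)
```

A signal for `"WORK"` is ignored until an `"INIT"` signal delivers initialization code that itself calls `semonitor("WORK", ...)`. After that, `"WORK"` signals reach the newly installed handler.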
Figure 13 is a block diagram of an exemplary computer system that may use an embodiment of a processor component, such as a CPU or chipset. The component includes one or more instruction sequencers designed to execute one or more user-level threads containing sequencer-aware user-level instructions. In one embodiment, computer system 1300 comprises a communication mechanism or bus 1311 for communicating information. It also comprises an integrated circuit component for processing information, such as a main processor 1312 coupled to bus 1311. One or more components or devices in computer system 1300, such as main processor 1312 or chipset 1336, may use an embodiment of the instruction sequencers designed to execute one or more user-level threads. Main processor 1312 may consist of one or more processor cores working together as a unit.
Computer system 1300 further comprises a random access memory (RAM) or other dynamic storage device 1304 (referred to as main memory), coupled to bus 1311, for storing information and instructions to be executed by main processor 1312. Main memory 1304 may also be used to store temporary variables or other intermediate information while main processor 1312 executes instructions.
Firmware 1303 may be a combination of software and hardware, such as an erasable programmable read-only memory (EPROM) on which the operations for the routine are recorded. Firmware 1303 may embed foundation code, basic input/output system (BIOS) code, or other similar code. Firmware 1303 may enable computer system 1300 to boot itself.
Computer system 1300 also comprises a read-only memory (ROM) and/or other static storage device 1306, coupled to bus 1311, for storing static information and instructions for main processor 1312. Static storage 1306 may store OS-level and application-level software.
Computer system 1300 may further be coupled to a display device 1321, such as a cathode ray tube (CRT) or liquid crystal display (LCD), which is coupled to bus 1311 for displaying information to a computer user. A chipset may interface with display device 1321.
An alphanumeric input device (keyboard) 1322, including alphanumeric and other keys, may also be coupled to bus 1311 for communicating information and command selections to main processor 1312. An additional user input device is a cursor control device 1323, such as a mouse, trackball, trackpad, stylus, or cursor direction keys. It is coupled to bus 1311 to communicate direction information and command selections to main processor 1312 and to control cursor movement on display device 1321. A chipset may interface with the input/output devices.
Another device that may be coupled to bus 1311 is a hard copy device 1324, which may be used to print instructions, data, or other information on a medium such as paper, film, or similar media. Furthermore, a sound recording and playback device, such as a speaker and/or microphone (not shown), may optionally be coupled to bus 1311 for audio interfacing with computer system 1300. Another device that may be coupled to bus 1311 is a wired/wireless communication capability 1325.
In one embodiment, the software used to facilitate the routine can be embedded onto a machine-readable medium. A machine-readable medium includes any mechanism that provides (i.e., stores and/or transmits) information in a form accessible by a machine. Such machines include a computer, a network device, a personal digital assistant, a manufacturing tool, or any device with a set of one or more processors. For example, a machine-readable medium includes:
- recordable/non-recordable media, such as read-only memory (ROM) including firmware, random access memory (RAM), magnetic disk storage media, optical storage media, and flash memory devices;
- electrical, optical, acoustical, or other forms of propagated signals, such as carrier waves, infrared signals, and digital signals.
Between development stage, design can be experienced each stage, simulates making from creating to.Express the data of design and can express design with many modes.The first, hardware available hardware descriptive language or functional description language represent that this is useful in simulation.In addition, the circuit stages model of logical block and/or transistor gate can produce in some stage of design process.In addition, the great majority design reaches the data level of representing the physical layout of various device in the hardware model in some stage.In the occasion of using the conventional semiconductors manufacturing technology, the data of expression hardware model can be the existence or the non-existent data of the various features on the different mask layers that are given for the mask of producing integrated circuit.In any expression of design, data can be stored on any type of machine readable medium.Machine readable medium can be any light wave or electric wave modulation or otherwise that produce, that be used for changing this information storer or magnetic or optical memory (as memory disc).In these media any one can " be carried " or " indication " design or software information.When indication or the electric carrier wave that carries code or design are transmitted, electric signal duplicate, cushion or transmission is performed again scope in, will produce new copy.Therefore, communication provider or network provider can be made the copy of the product (carrier wave) that embodies the technology of the present invention.
Certain exemplary embodiments have been described above and shown in the accompanying drawings. It is to be understood, however, that such embodiments are merely illustrative of, and not restrictive on, the broad invention. The invention is not limited to the specific constructions and arrangements shown and described, since those ordinarily skilled in the art may conceive various other modifications after studying this disclosure. In an area of technology such as this, where growth is fast and further advancements are not easily foreseen, technological progress may make it easy to modify the disclosed embodiments in arrangement and detail without departing from the principles of the present disclosure or the scope of the accompanying claims.

Claims (37)

1. A method comprising:
managing a user-level thread on a first instruction sequencer in response to execution of a user-level instruction on a second instruction sequencer that is under application-level program control; and
executing, on the second instruction sequencer, a first user-level thread containing one or more user-level instructions, wherein a first user-level instruction contains at least one of: 1) a field that references one or more instruction sequencers; and 2) a pointer that implicitly references code which, when executed, specifically addresses one or more instruction sequencers.
2. The method of claim 1, wherein managing the user-level thread comprises performing sequencer-aware state and control transfer operations for a user-level thread operation selected from the group consisting of a user-level thread creation operation, a user-level thread control operation, and a user-level thread synchronization operation.
3. The method of claim 1, further comprising:
executing a user-level control transfer instruction on a source instruction sequencer, wherein the user-level control transfer instruction specifies a control message and a target instruction sequencer; and
in response to execution of the user-level control transfer instruction, sending, by the source instruction sequencer, a signal including the control message to the target instruction sequencer.
4. The method of claim 1, further comprising:
executing a user-level monitor instruction on a source instruction sequencer to perform a user-level thread operation, wherein the user-level monitor instruction specifies a target instruction sequencer, a control message, and a memory location of handler code associated with the control message; and
in response to execution of the user-level monitor instruction, creating a mapping between the target instruction sequencer, the control message, and the memory location of the handler code.
5. The method of claim 1, further comprising:
executing, on the first instruction sequencer, a user-level save instruction that specifies one or more other instruction sequencers, and saving the execution context of the one or more other instruction sequencers in response to execution of the user-level save instruction.
6. The method of claim 1, further comprising:
executing, on the first instruction sequencer, a user-level restore instruction that specifies one or more other instruction sequencers, wherein the execution context of the other instruction sequencers is restored when the user-level restore instruction is executed.
7. The method of claim 1, further comprising:
upon encountering a predetermined condition during execution of the first user-level thread, migrating execution of instructions running on the second instruction sequencer so that a portion of those instructions is executed on the first instruction sequencer.
8. The method of claim 7, wherein the predetermined condition is detection of an asymmetric condition between the second instruction sequencer, which is under application-level program control, and the first instruction sequencer.
9. The method of claim 8, wherein the asymmetric condition is selected from the group consisting of: an encountered fault that requires the operating system (OS) to perform an OS operation to resolve; an encountered trap that requires the OS to perform an OS operation to resolve; a system call on an instruction sequencer that is under application-level program control and cannot directly invoke OS services; and a deprecated-operation fault, wherein the instruction sequencer under application-level program control lacks the internal resources to support execution of a first instruction in the first user-level thread.
10. The method of claim 1, further comprising:
transferring control and status information from an OS-hidden instruction sequencer to an OS-visible instruction sequencer; and
migrating execution of at least one instruction of the first user-level thread from the OS-hidden instruction sequencer to the OS-visible instruction sequencer, so that the OS-visible instruction sequencer can trigger the operating system to perform an OS service on behalf of the OS-hidden instruction sequencer.
11. The method of claim 1, further comprising:
suspending execution of the first user-level thread on the second instruction sequencer under application-level program control;
transferring an address pointer from the second instruction sequencer to an OS-visible instruction sequencer, wherein the address pointer points to content stored in memory;
loading, by the OS-visible instruction sequencer, the content pointed to by the address pointer; and
after the content pointed to by the address pointer has been loaded, resuming execution of the first user-level thread on the second instruction sequencer.
12. An apparatus comprising:
a processor including one or more instruction sequencers configured to execute user-level threads that contain sequencer-aware user-level instructions, the instructions being usable for inter-sequencer control through user-level thread management operations on specified instruction sequencers;
one or more decoders to decode a first sequencer-aware user-level instruction; and
one or more instruction execution units to execute the first sequencer-aware user-level instruction.
13. The apparatus of claim 12, further comprising:
a first instruction sequencer containing a first decoder to decode a first sequencer-aware user-level instruction that specifies a second instruction sequencer.
14. The apparatus of claim 12, further comprising:
a first instruction sequencer containing an instruction execution unit to execute a first sequencer-aware user-level instruction that, when executed, implicitly references a second instruction sequencer through execution of associated code.
15. The apparatus of claim 12, further comprising:
a first decoder to convert the content of fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a control transfer instruction having one or more fields that specify a control message and a target instruction sequencer.
16. The apparatus of claim 12, further comprising:
a first instruction execution unit to execute a first sequencer-aware user-level instruction, wherein the first sequencer-aware user-level instruction comprises a control transfer instruction that includes a data payload portion, the data payload portion semantically referencing a target instruction sequencer when the control transfer instruction is executed by the first instruction execution unit.
17. The apparatus of claim 12, further comprising:
a first decoder to convert the content of fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a monitor instruction having one or more fields that specify a target instruction sequencer, a control message, and a memory location of handler code associated with the control message, to perform a user-level thread management operation.
18. The apparatus of claim 12, further comprising:
a first decoder to convert the content of fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a save instruction having one or more fields that specify at least one instruction sequencer, the context state of which is saved in response to execution of the save instruction.
19. The apparatus of claim 12, further comprising:
a first decoder to convert the content of fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a restore instruction having one or more fields that specify at least one instruction sequencer whose execution context is to be restored.
20. The apparatus of claim 12, wherein the user-level thread management operation is selected from the group consisting of a user-level thread creation operation, a user-level thread control operation, and a user-level thread synchronization operation.
21. The apparatus of claim 12, further comprising:
a client instruction sequencer having a set of client resources to process instructions;
a servant instruction sequencer having a set of servant resources to process instructions; and
a proxy execution mechanism to allow the client instruction sequencer to trigger a proxy user-level thread to execute on the servant instruction sequencer on behalf of the client instruction sequencer, without operating system intervention, in response to a predetermined condition detected during execution of a first user-level thread on the client sequencer.
22. The apparatus of claim 21, wherein the client resources and the servant resources are asymmetric, and the predetermined condition indicates that the first user-level thread is attempting to use a resource that is unavailable on the client sequencer but available on the servant sequencer.
23. The apparatus of claim 22, wherein the proxy execution mechanism hides the asymmetry between the client resources and the servant resources from the user-level program.
24. The apparatus of claim 21, wherein the proxy execution mechanism includes a set of egress scenarios associated with the client instruction sequencer, each egress scenario defining a trigger condition for initiating proxy execution on the servant instruction sequencer.
25. The apparatus of claim 12, further comprising:
a client instruction sequencer having a set of client resources to process instructions;
a servant instruction sequencer having a set of servant resources to process instructions; and
a proxy execution mechanism to allow the client instruction sequencer, in response to a predetermined condition detected during execution of a first user-level thread, to trigger transfer of control and status information from the client instruction sequencer to the servant instruction sequencer, wherein the client instruction sequencer migrates execution of at least one instruction of the first user-level thread to the servant instruction sequencer, so that the servant instruction sequencer can trigger the operating system to perform an OS operation on behalf of the client instruction sequencer.
26. The apparatus of claim 12, further comprising a client instruction sequencer having a set of client resources to process instructions.
27. The apparatus of claim 26, further comprising a servant instruction sequencer having a set of servant resources to process instructions.
28. The apparatus of claim 27, further comprising:
a proxy execution mechanism to allow the client instruction sequencer, in response to an asymmetric condition detected during execution of a first user-level thread, to trigger transfer of an address pointer from the client instruction sequencer to the servant instruction sequencer, wherein the servant instruction sequencer loads the content pointed to by the address pointer, and the client instruction sequencer executes instructions of the first user-level thread after that content has been loaded.
29. A system comprising:
a processor including two or more instruction sequencers to execute different user-level threads, wherein at least some of the two or more instruction sequencers include, in their instruction set, sequencer-aware user-level instructions usable for inter-sequencer control through user-level thread management operations on specified instruction sequencers;
a first instruction sequencer that operates under control of an application-level program when executing the sequencer-aware user-level instructions;
a second instruction sequencer that operates under control of an operating system; and
a non-volatile memory, coupled to the processor, in which the operating system is stored.
30. The system of claim 29, further comprising:
a first decoder to convert the content of fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a control transfer instruction having one or more fields that specify a control message and a target instruction sequencer.
31. The system of claim 29, further comprising:
a first instruction execution unit to execute a first sequencer-aware user-level instruction, wherein the first sequencer-aware user-level instruction comprises a control transfer instruction that includes a data payload portion, the data payload portion semantically referencing a target instruction sequencer when the control transfer instruction is executed by the first instruction execution unit.
32. The system as claimed in claim 29, further comprising:
A first decoder that converts the contents of the fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a monitor instruction having one or more fields that specify a target instruction sequencer, a control message, and the memory location of handler code associated with the control message, for performing user-level thread management operations.
33. The system as claimed in claim 29, further comprising:
A first decoder that converts the contents of the fields of a first sequencer-aware user-level instruction into decoded instruction code, wherein the first sequencer-aware user-level instruction comprises a save instruction having one or more fields that specify at least one instruction sequencer whose context state is to be saved in response to execution of the save instruction.
34. The system as claimed in claim 29, further comprising:
A first instruction execution unit that executes a first sequencer-aware user-level instruction, wherein the first sequencer-aware user-level instruction comprises a restore instruction having one or more fields that specify at least one instruction sequencer whose execution context is to be restored.
35. The system as claimed in claim 29, further comprising:
A first execution unit that executes a first sequencer-aware user-level instruction; and
A proxy execution mechanism that allows the first instruction sequencer, in response to execution of the first sequencer-aware user-level instruction, to trigger a first user-level thread to execute on the second instruction sequencer on behalf of the first instruction sequencer.
36. The system as claimed in claim 29, further comprising:
A proxy execution mechanism that allows the first instruction sequencer, in response to a predetermined condition detected during execution of a first user-level thread, to trigger the transfer of control and state information from the first instruction sequencer to the second instruction sequencer, wherein the first instruction sequencer migrates execution of at least one instruction in the first user-level thread to the second instruction sequencer, enabling the second instruction sequencer to trigger the operating system to perform an OS operation on behalf of the first instruction sequencer.
37. The system as claimed in claim 29, further comprising:
A proxy execution mechanism that allows the first instruction sequencer, in response to a predetermined condition detected during execution of a first user-level thread, to trigger the transfer of an address pointer from the first instruction sequencer to the second instruction sequencer, wherein the second instruction sequencer loads the contents pointed to by the address pointer, and the first instruction sequencer executes instructions in the first user-level thread after those contents have been loaded.
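The proxy execution flow of claims 35 to 37, together with the monitor, control-transfer, save and restore instructions of claims 30 to 34, can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the claimed hardware: the class `Sequencer`, its methods `monitor`, `transfer`, `save` and `restore`, the message names `PROXY` and `RESUME`, and the stand-in `os_service` are all hypothetical. An application-managed sequencer (AMS) detects an OS-dependent event, ships its saved context to an OS-managed sequencer (OMS), which performs the operation on its behalf and sends the updated context back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical handler signature: (receiving sequencer, sending sequencer, payload)
Handler = Callable[["Sequencer", "Sequencer", dict], None]


@dataclass
class Sequencer:
    sid: int
    os_managed: bool  # True: runs under OS control; False: application-managed
    regs: Dict[str, int] = field(default_factory=dict)
    handlers: Dict[str, Handler] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)

    def monitor(self, message: str, handler: Handler) -> None:
        # Hypothetical monitor instruction: bind handler code to a control message.
        self.handlers[message] = handler

    def transfer(self, target: "Sequencer", message: str, payload: dict) -> None:
        # Hypothetical control-transfer instruction: deliver a control message
        # and data payload to the target sequencer, which runs its handler.
        target.trace.append(f"recv {message} from {self.sid}")
        target.handlers[message](target, self, payload)

    def save(self) -> dict:
        # Hypothetical save instruction: snapshot this sequencer's context state.
        return dict(self.regs)

    def restore(self, ctx: dict) -> None:
        # Hypothetical restore instruction: reload a previously saved context.
        self.regs = dict(ctx)


def os_service(request: str) -> int:
    # Stand-in for an OS operation; here it just returns the request length.
    return len(request)


def proxy_handler(oms: Sequencer, ams: Sequencer, payload: dict) -> None:
    # Runs on the OS-managed sequencer on behalf of the application-managed one.
    if not oms.os_managed:
        raise RuntimeError("OS services must run on an OS-managed sequencer")
    ctx = payload["ctx"]
    ctx["result"] = os_service(payload["request"])
    oms.transfer(ams, "RESUME", {"ctx": ctx})


def resume_handler(ams: Sequencer, oms: Sequencer, payload: dict) -> None:
    # Back on the application-managed sequencer: reload the updated context.
    ams.restore(payload["ctx"])
    ams.trace.append("resumed")


def run_user_thread(ams: Sequencer, oms: Sequencer) -> Dict[str, int]:
    ams.regs = {"pc": 10, "result": 0}
    # An OS-dependent event (e.g. a system call) is detected on the AMS,
    # so execution is proxied to the OMS instead of being serviced locally.
    ams.transfer(oms, "PROXY", {"ctx": ams.save(), "request": "open:/tmp/x"})
    ams.regs["pc"] += 1  # continue the user-level thread after the service
    return ams.regs


def demo():
    oms = Sequencer(sid=0, os_managed=True)
    ams = Sequencer(sid=1, os_managed=False)
    oms.monitor("PROXY", proxy_handler)
    ams.monitor("RESUME", resume_handler)
    return run_user_thread(ams, oms), ams, oms


if __name__ == "__main__":
    regs, ams, oms = demo()
    print(regs)       # {'pc': 11, 'result': 11}
    print(oms.trace)  # ['recv PROXY from 1']
    print(ams.trace)  # ['recv RESUME from 0', 'resumed']
```

Each handler runs on the sequencer that registered it, mirroring the pairing of a monitor instruction with a control-transfer instruction. The model deliberately omits what a processor would actually do: decoding instruction fields, routing messages between sequencers, and protecting OS state.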
CN2005800448962A 2004-12-30 2005-12-28 A mechanism for instruction set based thread execution on a plurality of instruction sequencers Expired - Fee Related CN101116057B (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
US64042504P 2004-12-30 2004-12-30
US60/640,425 2004-12-30
US11/173,326 2005-06-30
US11/173,326 US8719819B2 (en) 2005-06-30 2005-06-30 Mechanism for instruction set based thread execution on a plurality of instruction sequencers
PCT/US2005/047328 WO2006074024A2 (en) 2004-12-30 2005-12-28 A mechanism for instruction set based thread execution on a plurality of instruction sequencers

Publications (2)

Publication Number Publication Date
CN101116057A true CN101116057A (en) 2008-01-30
CN101116057B CN101116057B (en) 2011-10-05

Family

ID=36579277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2005800448962A Expired - Fee Related CN101116057B (en) 2004-12-30 2005-12-28 A mechanism for instruction set based thread execution on a plurality of instruction sequencers

Country Status (4)

Country Link
JP (2) JP5260962B2 (en)
CN (1) CN101116057B (en)
DE (1) DE112005003343B4 (en)
WO (1) WO2006074024A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103052930A (en) * 2011-07-27 2013-04-17 赛普拉斯半导体公司 Method and apparatus for parallel scanning and data processing for touch sense arrays
CN105683905A (en) * 2013-11-01 2016-06-15 高通股份有限公司 Efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media
CN106464746A (en) * 2014-06-06 2017-02-22 萨思学会有限公司 Computer system to support failover in event stream processing system
CN108241504A (en) * 2011-12-23 2018-07-03 英特尔公司 The device and method of improved extraction instruction
US10102028B2 (en) 2013-03-12 2018-10-16 Sas Institute Inc. Delivery acknowledgment in event stream processing

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
GB0408164D0 (en) 2004-04-13 2004-05-19 Immune Targeting Systems Ltd Antigen delivery vectors and constructs
GB0716992D0 (en) 2007-08-31 2007-10-10 Immune Targeting Systems Its L Influenza antigen delivery vectors and constructs
US20070150895A1 (en) * 2005-12-06 2007-06-28 Kurland Aaron S Methods and apparatus for multi-core processing with dedicated thread management
JP4978914B2 (en) * 2007-10-19 2012-07-18 インテル・コーポレーション Method and system enabling expansion of multiple instruction streams / multiple data streams on a microprocessor
FR2950714B1 (en) * 2009-09-25 2011-11-18 Bull Sas SYSTEM AND METHOD FOR MANAGING THE INTERLEAVED EXECUTION OF INSTRUCTION WIRES
US9569278B2 (en) * 2011-12-22 2017-02-14 Intel Corporation Asymmetric performance multicore architecture with same instruction set architecture
WO2022040877A1 (en) * 2020-08-24 2022-03-03 华为技术有限公司 Graph instruction processing method and device

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JP2882475B2 (en) * 1996-07-12 1999-04-12 日本電気株式会社 Thread execution method
US6651163B1 (en) * 2000-03-08 2003-11-18 Advanced Micro Devices, Inc. Exception handling with reduced overhead in a multithreaded multiprocessing system
JP4651790B2 (en) * 2000-08-29 2011-03-16 株式会社ガイア・システム・ソリューション Data processing device
US20020199179A1 (en) * 2001-06-21 2002-12-26 Lavery Daniel M. Method and apparatus for compiler-generated triggering of auxiliary codes
US7487502B2 (en) * 2003-02-19 2009-02-03 Intel Corporation Programmable event driven yield mechanism which may activate other threads

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN103052930A (en) * 2011-07-27 2013-04-17 赛普拉斯半导体公司 Method and apparatus for parallel scanning and data processing for touch sense arrays
CN108241504A (en) * 2011-12-23 2018-07-03 英特尔公司 The device and method of improved extraction instruction
US10102028B2 (en) 2013-03-12 2018-10-16 Sas Institute Inc. Delivery acknowledgment in event stream processing
CN105683905A (en) * 2013-11-01 2016-06-15 高通股份有限公司 Efficient hardware dispatching of concurrent functions in multicore processors, and related processor systems, methods, and computer-readable media
CN106464746A (en) * 2014-06-06 2017-02-22 萨思学会有限公司 Computer system to support failover in event stream processing system
CN106464746B (en) * 2014-06-06 2018-11-06 萨思学会有限公司 Support the method and non-transitory computer-readable media and system of the failure transfer in event stream processing system

Also Published As

Publication number Publication date
JP2011023032A (en) 2011-02-03
WO2006074024A2 (en) 2006-07-13
CN101116057B (en) 2011-10-05
DE112005003343T5 (en) 2007-11-29
DE112005003343B4 (en) 2011-05-19
JP5260962B2 (en) 2013-08-14
JP2008527501A (en) 2008-07-24
WO2006074024A3 (en) 2006-10-26
JP5244160B2 (en) 2013-07-24

Similar Documents

Publication Publication Date Title
CN101116057B (en) A mechanism for instruction set based thread execution on a plurality of instruction sequencers
US10452403B2 (en) Mechanism for instruction set based thread execution on a plurality of instruction sequencers
US8887174B2 (en) Mechanism for monitoring instruction set based thread execution on a plurality of instruction sequencers
JP7052170B2 (en) Processor and system
TWI321749B (en) Method, apparatus, article of manufacture and system to provide user-level multithreading
TWI233545B (en) Mechanism for processor power state aware distribution of lowest priority interrupts
JP4690988B2 (en) Apparatus, system and method for persistent user level threads
CN103348323B (en) Method and system for performance objective program in computer systems
US8296768B2 (en) Method and apparatus to enable runtime processor migration with operating system assistance
JP6317065B2 (en) Reconfigurable processor and code conversion apparatus and method thereof
US20080148259A1 (en) Structured exception handling for application-managed thread units
JP2011076639A (en) Mechanism to schedule thread on os-sequestered sequencer without operating system intervention
CN101320314A (en) Method and apparatus for quickly changing the power state of a data processing system
JPH06250853A (en) Management method and system for process scheduling
CN113448724A (en) Apparatus and method for dynamic control of microprocessor configuration
US9836323B1 (en) Scalable hypervisor scheduling of polling tasks
US20110231637A1 (en) Central processing unit and method for workload dependent optimization thereof
KR20220105678A (en) How to transfer tasks between heterogeneous processors
US20240134648A1 (en) Implementing heterogeneous instruction sets in heterogeneous compute architectures
JP2553526B2 (en) Multitasking processor
TW202303389A (en) Device, method and system to provide thread scheduling hints to a software process
CN114489793A (en) User timer programmed directly by application
JPS63276635A (en) Interruption control system in emulation mode

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111005

CF01 Termination of patent right due to non-payment of annual fee