WO2008049938A1 - Communication between multiple processing sequences in a processor - Google Patents
Communication between multiple processing sequences in a processor
- Publication number
- WO2008049938A1 (PCT/ES2006/070162)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- processing sequence
- instruction
- log file
- core
- indicator
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline, look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3838—Dependency mechanisms, e.g. register scoreboarding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/52—Program synchronisation; Mutual exclusion, e.g. by means of semaphores
- G06F9/526—Mutual exclusion algorithms
Definitions
- in-order cores may be less effective than out-of-order cores at exploiting instruction-level parallelism (ILP). That is, while in-order processors can handle parallel applications effectively, single-threaded applications and the serial portions of parallel applications may not perform well on such architectures. Accordingly, certain processors can split such applications into fine-grained processing sequences in order to keep complexity minimal while improving efficiency.
- Figure 1 is a flow chart of a method according to an embodiment of the present invention.
- Figure 2 is a block diagram of a scoreboard according to an embodiment of the present invention.
- Figure 3 is a block diagram of an implementation of the instruction execution according to an embodiment of the present invention.
- Figure 4 is a flow chart of a method for generating flow mark information in accordance with an embodiment of the present invention.
- Figure 5 is a block diagram of a system according to an embodiment of the present invention.
- a multiplicity of processing sequences that are performed or executed in a processor core can access values contained in a log file associated with another processing sequence.
- the embodiments can provide efficient operation with simultaneous multithreading (SMT).
- first and second processing sequences can be executed in a single in-order processor core, such as that of an SMT processor.
- a control can be provided to enable a producer/consumer model in which the data values generated by the first processing sequence can be accessed by the second processing sequence, and vice versa.
- each of the processing sequences may be able to read the state of the architectural registers of the other processing sequence during execution.
- a synchronization control can be provided so that the consuming processing sequence reads the appropriate data from the producing processing sequence.
- a scoreboard structure, such as one used in association with the issuance or allocation of instructions, may include synchronization indicators. More specifically, each entry in the scoreboard structure for a register of a first processing sequence may include a synchronization indicator for the corresponding register of the second processing sequence, and vice versa. The use of this indicator can prevent a consuming instruction from proceeding until the corresponding producing instruction of the other processing sequence has executed, such that the desired value is present in the producer's register file.
- certain embodiments may provide flow information associated with the instructions.
- indications or flow marks may be provided along with the content of an instruction.
- an instruction may include an operation code, source and destination operands, as well as flow marks for each of the source and destination operands.
- various mechanisms may be responsible for generating the flow marks for the corresponding instructions.
- compiler support can be provided so that said flow marks can be generated during compilation, according to the instruction support provided in an instruction set architecture (ISA).
- an optimization mechanism, such as a hardware or software optimization device, can analyze the control flow of the code and generate flow marks accordingly. In this way, multiple processing sequences can be synchronized at instruction-level granularity.
- a low-overhead mechanism for spawning processing sequences can take advantage of access to the register files of other processing sequences. In this way, the expense of copying all the registers of the context of a spawning processing sequence to the context of a spawned processing sequence can be avoided. This is because the appropriate consuming instructions in the spawned processing sequence can be marked so that they read their operands from the register file of the spawning processing sequence instead.
- certain embodiments may be used in association with support for speculative operation with multiple processing sequences, as well as with the spawning of processing sequences. Still further, certain embodiments may be used in association with so-called helper processing sequences, which may be initiated in order to handle specific tasks that arise during the execution of another processing sequence.
- the scope of the present invention is not limited in this regard, and register access and synchronization mechanisms can be used in many different implementations, including different processor architectures, systems and so on.
- the method 10 can be used to execute an instruction of a first processing sequence that can access the information contained in a register file of a second processing sequence.
- method 10 can begin with the receipt of an instruction from the first processing sequence for execution (block 20). For example, the instruction can then be decoded.
- the instructions may include an operation code, intended to indicate a type of operation to be carried out, as well as an identification of the source and destination operands.
- the instruction may include information indicating whether any of the source / destination operands are to be accessed from a remote log file, for example, a log file associated with a second processing sequence, or if it is to be provided thereto.
- although reference is made to a remote register file, it should be understood that a local register file and a remote register file may both be present in a single core of a processor, such as a multi-core or many-core processor that includes, for example, a number of in-order cores.
- it can then be determined whether any source operand of the instruction is to be obtained from a remote processing sequence (rhombus 30). That is, it can be determined, based on the flow mark information (in one of the embodiments), whether any source operand is to be obtained from a remote processing sequence, for example, from a second processing sequence that includes a register file contained within the SMT core. If so, control goes to rhombus 40, where it can be determined whether the synchronization indicators associated with all such remote source operands are active.
- a scoreboard or other storage that holds status information on the availability of values in given registers can be examined in order to determine whether all source operands to be obtained from the remote register file hold the desired values. In other words, it can be determined whether a producing processing sequence has completed an operation on which a dependent instruction of the first processing sequence depends.
- if the synchronization indicator(s) are not active, rhombus 40 can loop back on itself, holding the instruction until they are. When the synchronization indicators are active, control passes to block 60, in which the instruction can be issued for execution.
- an instruction issuer, such as an allocator, a reservation station or another structure that includes a scoreboard or similar state storage, can issue the instruction for execution.
- an execution unit, such as an integer unit, a floating point unit (FPU) or other similar unit, can access the indicated source operands from the specifically identified register file (block 70).
- if the destination operand is used only locally, the ready indicator associated with the destination register can be set accordingly.
- the synchronization indicator of this register can be reset in order to indicate that the value is not to be synchronized. If, instead, it is indicated that the destination operand is to be used remotely, a synchronization indicator of the remote register file (e.g., the second one) can be set accordingly. In addition, a ready indicator of the local register file can be set. In this way, both a remote processing sequence and the local processing sequence can access the destination operand as a source operand, which allows efficient producer/consumer operation both within a single processing sequence and between processing sequences.
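The flow of method 10 can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `SmtCore`, the method `step`, and the tuple layout of the operands are hypothetical names that do not come from the patent, and hardware would implement the checks as scoreboard logic rather than a loop. In particular, clearing the other sequence's synchronization bit for a local-only destination is one possible reading of the reset described above.

```python
import operator


class SmtCore:
    """Toy model of two processing sequences sharing one core (hypothetical names)."""

    def __init__(self, regs):
        # One register file and one set of scoreboard bits per processing sequence.
        self.regfile = {t: dict.fromkeys(regs, 0) for t in (0, 1)}
        self.ready = {t: dict.fromkeys(regs, False) for t in (0, 1)}
        self.sync = {t: dict.fromkeys(regs, False) for t in (0, 1)}

    def step(self, tid, op, dest, dest_remote, srcs):
        """Try to run one instruction of sequence `tid`; return False if it must wait.

        `srcs` is a list of (register, is_remote) pairs.
        """
        other = 1 - tid
        # Rhombus 30/40: a remote source needs its synchronization indicator set;
        # a local source is checked against the ready indicator.
        for reg, is_remote in srcs:
            bits = self.sync if is_remote else self.ready
            if not bits[tid][reg]:
                return False
        # Blocks 60/70: issue, then read each source from the identified file.
        values = [self.regfile[other if is_remote else tid][reg]
                  for reg, is_remote in srcs]
        self.regfile[tid][dest] = op(*values)
        # Update the indicators for the destination register.
        self.ready[tid][dest] = True
        # Assumption: the bit consulted by the other sequence is set for a
        # remote destination and cleared for a local-only one.
        self.sync[other][dest] = dest_remote
        return True


core = SmtCore(["rax", "rbx", "rcx"])
core.regfile[0].update(rbx=2, rcx=3)
core.ready[0].update(rbx=True, rcx=True)

consumer = (1, operator.neg, "rbx", False, [("rax", True)])
stalled = not core.step(*consumer)   # producer has not run yet
core.step(0, operator.add, "rax", True, [("rbx", False), ("rcx", False)])
issued = core.step(*consumer)        # now reads rax from sequence 0
```

In this run the consumer of sequence 1 is held on its first attempt and issues only after sequence 0 has produced `rax` with its destination marked as remote.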
- referring to FIG. 2, a block diagram of a scoreboard according to an embodiment of the present invention is shown, which includes support for synchronization between multiple processing sequences.
- a scoreboard 100, which may consist of storage within the core of a processor, may include entries intended to store status information associated with the registers of multiple register files.
- each of a first processing sequence and a second processing sequence may have an entry for each register in its register file.
- thus, as shown in Figure 2, a first processing sequence may include a plurality of entries 112a-112n (generically, entry 112). Each entry 112 may be indexed by a register identifier (ID), and each entry may include status information, namely a ready indicator 114a-114n (generically, ready indicator 114) and a synchronization indicator 116a-116n (generically, synchronization indicator 116).
- the ready indicator 114 can be used to indicate when the operand stored in the identified register is ready to be used by consuming operations of that processing sequence, while the synchronization indicator 116 can indicate whether a remote operand to be accessed by the local processing sequence is ready for that access, that is, whether a producing instruction of the (remote) producing processing sequence has executed and stored the appropriate value in the desired location.
- a second processing sequence 120 may also include a plurality of entries 122a-122n (generically, entry 122), each of which is associated with a register of its register file.
- each entry 122 may include a corresponding ready indicator 124a-124n (generically, ready indicator 124), as well as a synchronization indicator 126a-126n (generically, synchronization indicator 126).
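The scoreboard layout described above can be pictured as a small data structure. The following is a minimal sketch, assuming two processing sequences and integer register IDs; the names `Entry`, `Scoreboard` and `publish` are illustrative and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    ready: bool = False  # operand usable by consumers in the owning sequence
    sync: bool = False   # same-numbered register of the other sequence is readable


@dataclass
class Scoreboard:
    num_regs: int
    # entries[sequence][register_id]; two sequences, as in the text.
    entries: list = field(init=False)

    def __post_init__(self):
        self.entries = [[Entry() for _ in range(self.num_regs)] for _ in range(2)]

    def publish(self, producer, reg_id):
        """Record that `producer` wrote reg_id for use by both sequences."""
        self.entries[producer][reg_id].ready = True
        self.entries[1 - producer][reg_id].sync = True


sb = Scoreboard(num_regs=16)
sb.publish(producer=0, reg_id=3)
```

After `publish`, the producer's own entry carries the ready bit while the other sequence's entry for the same register ID carries the synchronization bit, which mirrors the split between indicators 114/124 and 116/126.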
- processor architectures may have different control configurations to analyze the instructions and issue them to one or more processor execution units.
- Some processors may include allocators, reservation stations, scoreboards, controllers and other varieties of logic in order to determine when a decoded instruction has the resources it needs for execution available, and to select, either in order or out of order, the instruction to be provided to an execution unit.
- the scoreboard 100 may be part of an instruction issuer, in whatever form is available in a given processor architecture, or it may be connected to said instruction issuer or to other similar logic, so that instruction issue decisions can be made based on the information held in the scoreboard 100.
- a system 200 may include an execution unit 230 that performs various operations on the incoming data.
- a first log file 220 and a second log file 225 may be connected to the execution unit 230.
- the first log file 220 may be associated with a first processing sequence
- the second log file 225 may be associated with a second processing sequence.
- an instruction 205 from the first processing sequence and to be carried out by the execution unit 230 may include an operation code in order to identify a certain type of instruction, for example, an operation of addition, multiplication or other operation.
- instruction 205 also identifies a destination for the result, namely a destination operand, which may correspond to a first register, rax. Associated with this destination operand is a location indicator, identified in Figure 3 as REMOTE_DEST.
- this location indicator has a value of one, indicating that the destination operand is to be accessed later by a remote processing sequence, that is, the second processing sequence in the example of Figure 3, and can also be accessed by consuming instructions in the local processing sequence.
- instruction 205 also identifies two source operands, in particular a first source operand (SRC1) that accesses register rbx. Because the location indicator of this source operand (that is, REMOTE_SRC) is set to a value of one, this source operand is accessed from a remote register file, that is, register file 225.
- instruction 205 includes a second source operand (SRC2) that accesses a second register, rcx, which can be obtained from the first register file 220 because the location indicator for this second source operand (that is, REMOTE_SRC) is set to a value of zero.
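The fields of instruction 205 can be written down as a record to make the role of each location indicator concrete. This is an illustrative sketch only: the classes `Operand` and `Instruction`, the method `source_files`, and the choice of an `add` opcode are assumptions, and the patent does not specify a binary encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    reg: str
    remote: bool  # flow mark: True selects the other sequence's register file


@dataclass(frozen=True)
class Instruction:
    opcode: str
    dest: Operand
    srcs: tuple

    def source_files(self, local_file, remote_file):
        """Map each source register to the register file it is read from."""
        return {s.reg: remote_file if s.remote else local_file for s in self.srcs}


# Instruction 205 as described; the "add" opcode is only an example.
insn_205 = Instruction(
    opcode="add",
    dest=Operand("rax", remote=True),       # REMOTE_DEST = 1
    srcs=(Operand("rbx", remote=True),      # SRC1, REMOTE_SRC = 1
          Operand("rcx", remote=False)),    # SRC2, REMOTE_SRC = 0
)
files = insn_205.source_files(local_file=220, remote_file=225)
```

Here `files` pairs `rbx` with register file 225 and `rcx` with register file 220, using the reference numerals of Figure 3 as plain labels.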
- instruction 205 may not proceed until a synchronization indicator associated with that source operand is active (for example, has been set).
- likewise, instruction 205 may not proceed until a ready indicator associated with the second source operand is also active.
- the scoreboard 100 includes a first entry 112b, associated with the first source operand, and a second entry 112c, associated with the second source operand.
- instruction 205 can be issued to the execution unit 230. This is because the appropriate values are present in register rbx of the second register file 225 and register rcx of the first register file 220. Accordingly, the execution unit 230 can read rcx from the first register file 220 and rbx from the second register file 225.
- the result can be stored in the destination register rax of the first register file 220.
- an entry 112a associated with the first processing sequence can be updated so that it has a ready indicator 114a with a value of one and a synchronization indicator 116a with a value of zero.
- the entry 122a associated with the second processing sequence may have its corresponding synchronization indicator 126a set to a value of one in order to indicate that the appropriate value is present in register rax, which acts as a source operand for a consuming instruction of the second processing sequence. While shown with this particular implementation in the embodiment of Figure 3, the scope of the present invention is not limited in this regard.
- a controller 250 may be present within the system 200.
- the controller 250 may include various combinations of hardware, software or firmware in order to handle the issuing of instructions from each of the processing sequences to the execution unit 230.
- the controller 250 may implement logic to enable the issuance of instructions when the instruction's source operands are available.
- this logic may correspond to a logical AND operation in which the ready indicators for the local source operands are examined to determine whether both indicate a ready state. If not, the instruction can be held until both ready indicators are set, for example, to a logic high state indicating readiness.
- the logic may correspond to a logical AND operation in which a local ready indicator, for the local operand, and a local synchronization indicator, for the remote operand, are checked to determine whether both indicate that the operands are available. If so, the operation can proceed; otherwise, the controller 250 may stall the operation until both source operands are ready, as indicated by their associated ready and synchronization indicators.
- a multi-core processor, such as a dual-core or many-core processor, may similarly implement the embodiments of the present invention.
- thus, a processing sequence executing in a first core can access a register file associated with a different processing sequence executing in another core. In doing so, however, a certain amount of overhead may be incurred in accessing said remote register files during the execution of a given processing sequence.
- the flow information associated with the instruction, that is, the indications or flow marks for the various source and destination operands
- different entities can generate processing sequences and the corresponding code, e.g., a dynamic optimization device, a compiler, a hardware optimization device, and so on. Whichever entity generates this code, it can mark the instructions with appropriate flow marks.
- the entity can guarantee that a record that is involved in a producer / consumer relationship has not been redefined in the production processing sequence before being read by the consuming processing sequence.
- synchronization points can be established between the processing sequences, such that the production processing sequence is not continued until the consuming processing sequence has read the associated value. While the scope of the present invention is not limited in this respect, in some embodiments, said synchronization points can be implemented using the synchronization indicators described above.
- while the flow marks described in the embodiment of Figure 3 may be single-bit indicators of whether an operand is present in a local or remote location, other embodiments may extend said location indicators to multiple bits in order to identify a location among more than two processing sequences. That is, in some implementations, more than two processing sequences may be running in a given core or in multiple cores.
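One way to picture a multi-bit location indicator is as a small field naming the processing sequence whose register file holds the operand. The encoding below is purely hypothetical (the text does not define one): it stores the owner relative to the reading sequence, so that zero still means "local" and the two-sequence case collapses to the single-bit mark.

```python
def field_width(num_sequences):
    """Bits needed to name any of `num_sequences` sequences (at least one bit)."""
    return max(1, (num_sequences - 1).bit_length())


def encode_location(reader, owner, num_sequences):
    """Flow mark for an operand held by `owner`, as seen by `reader`; 0 means local."""
    return (owner - reader) % num_sequences


def decode_location(reader, mark, num_sequences):
    """Recover the owning sequence from a flow mark."""
    return (reader + mark) % num_sequences


mark = encode_location(reader=1, owner=3, num_sequences=4)
owner = decode_location(reader=1, mark=mark, num_sequences=4)
```

With four sequences the field is two bits wide, and sequence 1 reading from sequence 3 carries a mark of 2.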
- traces of code can be generated by a dynamic optimization device.
- a compiler-based support or other mechanisms may be used to generate instructions with the appropriate flow mark information.
- referring to FIG. 4, a flow chart of a method for generating flow mark information in accordance with an embodiment of the present invention is shown.
- method 300 may begin by initializing the synchronization indicators for the first and second processing sequences, in a non-synchronized state (block 310).
- the two processing sequences may consist of traces of code generated by a dynamic optimization device.
- these multiple traces can be executed simultaneously by different processing sequences of a single processor core, namely the first and second processing sequences.
- the synchronization indicators can be initialized in a non-synchronized state, for example, a logical value of zero in some embodiments.
- the optimization device can control which registers are involved in a producer/consumer relationship and, thus, can also control the presence of a synchronization point, at which operands can be synchronized.
- although a register that is consumed remotely can be defined many times in the producing processing sequence, the optimization device can guarantee that the last definition of a register before it is used by a consumer is the definition that establishes the producer/consumer relationship.
- the control can then proceed to rhombus 320, in which it can be determined whether an operand is both produced and consumed only by a single processing sequence.
- control can pass to block 330, in which the location of use can be indicated in the flow marks of the instructions that use the operand. Specifically, if this operand is used, either as a source or as a destination operand, the corresponding location indicator, that is, the flow mark, may be in a reset state (for example, a logical zero) to indicate that the operand is used only locally.
- control can pass to rhombus 340.
- at rhombus 340 it can be determined whether the identified operand is a destination operand to be consumed by another processing sequence. If so, control goes to block 350, where local uses of the destination operand can be identified in the flow marks of the producing instructions until a final definition of the destination operand is reached (i.e., the last definition before use by the consuming processing sequence). Once this last definition is reached, control passes to block 360, in which this instruction can be identified with a flow mark in order to indicate the remote use of the destination operand.
- this instruction may have a location indicator for the destination operand in a set state (for example, a logical one).
- control passes to rhombus 370, in which it can be determined whether a source operand is to be produced by the other processing sequence. If not, control returns to block 330, explained above, in which the flow marks associated with the operand can be indicated as local (i.e., with a logical value of zero). Referring still to Figure 4, if, instead, it is determined in rhombus 370 that the source operand is to be produced
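The marking pass of method 300 can be sketched as a function over two code traces. Several assumptions are made here and none of them comes from the patent: the function name `tag_traces`, the trace format (a list of `(dest, sources)` pairs), and the input set of producer/consumer registers are hypothetical. The sketch also treats the last definition in the producing trace as the final definition, and it assumes that a remotely produced source receives a mark of one, a case the text above breaks off before describing.

```python
def tag_traces(traces, remote_regs):
    """Attach flow marks to two traces.

    traces:      {sequence_id: [(dest_reg, [src_regs...]), ...]} for ids 0 and 1
    remote_regs: set of (producer_id, reg) pairs consumed by the other sequence
    """
    tagged = {}
    for tid, trace in traces.items():
        other = 1 - tid
        # Index of the last definition of each register in this trace.
        last_def = {dest: i for i, (dest, _) in enumerate(trace)}
        out = []
        for i, (dest, srcs) in enumerate(trace):
            # Blocks 350/360: only the final definition is marked remote.
            remote_dest = int((tid, dest) in remote_regs and last_def[dest] == i)
            # Rhombus 370 (assumed outcome): sources produced by the other
            # sequence are marked 1, all others stay local (block 330).
            remote_src = [int((other, s) in remote_regs) for s in srcs]
            out.append({"dest": dest, "srcs": list(srcs),
                        "remote_dest": remote_dest, "remote_src": remote_src})
        tagged[tid] = out
    return tagged


traces = {
    0: [("rax", ["rbx", "rcx"]), ("rax", ["rax", "rbx"])],
    1: [("rdx", ["rax", "rcx"])],
}
tagged = tag_traces(traces, remote_regs={(0, "rax")})
```

In this example only the second definition of `rax` in sequence 0 is marked as a remote destination, and only the `rax` source in sequence 1 is marked as a remote source. The sketch ignores the case in which the consumer also defines the same register locally.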
- two processing sequences that run in the same core can be synchronized by direct access to each other's records.
- By accessing information at the register level, it is possible to reduce the synchronization overhead associated with obtaining information from a different processing sequence indirectly through memory, and memory bandwidth problems can be alleviated.
- Through the use of the registers of the register files present in the core, 64-bit or 128-bit values can be communicated between two processing sequences that run in the same core. In this way, there may be no need to copy the register state for a processing sequence that has just been spawned, since the operands to be used by the newly spawned processing sequence may be read directly from the register file of the spawning processing sequence.
- certain embodiments can be used in fine-grained multithreading paradigms, such as multithreaded operation, helper processing sequences and run-ahead processing sequences, for example. Accordingly, the instructions present in said processing sequences can be reduced, since copy instructions or other instructions that replicate the architectural state of a first register file can be avoided. Instead, these additional processing sequences can obtain the necessary information directly from the register file of another processing sequence.
- the multi-processor system 500 is a point-to-point interconnected system, and includes a first processor 570 and a second processor 580 connected through a point-to-point interconnection 550.
- each of processors 570 and 580 can be a multi-core processor that includes first and second processor cores (i.e., processor cores 574a and 574b, and processor cores 584a and 584b). Note that each of the cores can include multiple register files, each intended to be used by a different processing sequence.
- each core may include hardware, software or firmware in order to allow direct access by a consuming processing sequence to a register file of the producing processing sequence, through flow marks and synchronization indicators, in accordance with an embodiment of the present invention.
- a processing sequence that runs in the processor core 574a can access a log file associated with a processing sequence that runs in the processor core 574b, and vice versa.
- the first processor 570 additionally includes interfaces
- memory controller hubs (MCHs) 572 and 582 connect the processors to respective memories, namely a memory 532 and a memory 534, which may be portions of a main memory attached locally to the respective processors.
- the first processor 570 and the second processor 580 can be connected to a chipset 590 through P-P interconnections 552 and 554, respectively.
- the chipset 590 includes P-P interfaces 594 and 598.
- the chipset 590 includes an interface 592 to connect the chipset 590 with a high-performance graphics device 538.
- an Advanced Graphics Port (AGP) bus 539 can be used to connect the graphics device 538 with the chipset 590.
- the AGP bus 539 can conform to the Accelerated Graphics Port Interface Specification, Revision 2.0, published on May 4, 1998 by Intel Corporation, of Santa Clara, California.
- alternatively, a point-to-point interconnection 539 can connect these components.
- the chipset 590 may be connected to a first bus 516 through an interface 596.
- the first bus 516 may be a Peripheral Component Interconnect (PCI) bus, as defined by the PCI Local Bus Specification, Production Version, Revision 2.1, dated June 1995, or a bus such as a PCI Express™ bus or another third-generation input/output (I/O) interconnect bus, although the scope of the present invention is not limited thereto.
- various I/O devices 514 can be connected to the first bus 516, along with a bus bridge 518 that connects the first bus 516 with a second bus 520.
- the second bus 520 can be a low pin count (LPC) bus.
- various devices can be connected to the second bus 520, including, for example, a keyboard /
- the embodiments can be implemented in code and can be stored in a storage medium that has, stored therein, instructions that can be used to program a system so that it carries out the instructions.
- the storage medium may include, but is not limited to, any type of disk, including floppy disks, optical discs, compact disc read-only memories (CD-ROMs), rewritable compact discs (CD-RWs) and magneto-optical discs; semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs) and static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories and electrically erasable programmable read-only memories (EEPROMs); magnetic or optical cards; or any other type of medium suitable for storing electronic instructions.
Abstract
Description
Claims
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200680056225.2A CN101529377B (zh) | 2006-10-27 | 2006-10-27 | Method, apparatus and system for communication between multiple threads in a processor |
JP2009524212A JP2010500679A (ja) | 2006-10-27 | 2006-10-27 | Inter-multithread communication in a processor |
PCT/ES2006/070162 WO2008049938A1 (es) | 2006-10-27 | 2006-10-27 | Communication between multiple threads in a processor |
US12/446,930 US8261046B2 (en) | 2006-10-27 | 2006-10-27 | Access of register files of other threads using synchronization |
DE112006004005T DE112006004005T5 (de) | 2006-10-27 | 2006-10-27 | Communication between multiple threads of execution in a processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/ES2006/070162 WO2008049938A1 (es) | 2006-10-27 | 2006-10-27 | Communication between multiple threads in a processor |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008049938A1 (es) | 2008-05-02 |
Family
ID=39324164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/ES2006/070162 WO2008049938A1 (es) | 2006-10-27 | 2006-10-27 | Comunicación entre múltiples secuencias de procesamiento en un procesador |
Country Status (5)
Country | Link |
---|---|
US (1) | US8261046B2 (es) |
JP (1) | JP2010500679A (es) |
CN (1) | CN101529377B (es) |
DE (1) | DE112006004005T5 (es) |
WO (1) | WO2008049938A1 (es) |
Families Citing this family (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2159692A4 (en) * | 2007-06-20 | 2010-09-15 | Fujitsu Ltd | Information processor and load cancellation control method |
US8832712B2 (en) * | 2009-09-09 | 2014-09-09 | Ati Technologies Ulc | System and method for synchronizing threads using shared memory having different buffer portions for local and remote cores in a multi-processor system |
US8650554B2 (en) | 2010-04-27 | 2014-02-11 | International Business Machines Corporation | Single thread performance in an in-order multi-threaded processor |
US8667253B2 (en) | 2010-08-04 | 2014-03-04 | International Business Machines Corporation | Initiating assist thread upon asynchronous event for processing simultaneously with controlling thread and updating its running status in status register |
US8793474B2 (en) | 2010-09-20 | 2014-07-29 | International Business Machines Corporation | Obtaining and releasing hardware threads without hypervisor involvement |
US8713290B2 (en) | 2010-09-20 | 2014-04-29 | International Business Machines Corporation | Scaleable status tracking of multiple assist hardware threads |
US8572628B2 (en) | 2010-12-02 | 2013-10-29 | International Business Machines Corporation | Inter-thread data communications in a computer processor |
US8561070B2 (en) | 2010-12-02 | 2013-10-15 | International Business Machines Corporation | Creating a thread of execution in a computer processor without operating system intervention |
US9529596B2 (en) * | 2011-07-01 | 2016-12-27 | Intel Corporation | Method and apparatus for scheduling instructions in a multi-strand out of order processor with instruction synchronization bits and scoreboard bits |
US9086873B2 (en) | 2013-03-15 | 2015-07-21 | Intel Corporation | Methods and apparatus to compile instructions for a vector of instruction pointers processor architecture |
US9389871B2 (en) | 2013-03-15 | 2016-07-12 | Intel Corporation | Combined floating point multiplier adder with intermediate rounding logic |
US9348595B1 (en) | 2014-12-22 | 2016-05-24 | Centipede Semi Ltd. | Run-time code parallelization with continuous monitoring of repetitive instruction sequences |
US9135015B1 (en) | 2014-12-25 | 2015-09-15 | Centipede Semi Ltd. | Run-time code parallelization with monitoring of repetitive instruction sequences during branch mis-prediction |
US9208066B1 (en) | 2015-03-04 | 2015-12-08 | Centipede Semi Ltd. | Run-time code parallelization with approximate monitoring of instruction sequences |
WO2016156908A1 (en) * | 2015-03-27 | 2016-10-06 | Intel Corporation | Apparatus and method for inter-strand communication |
US10296350B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences |
US10296346B2 (en) | 2015-03-31 | 2019-05-21 | Centipede Semi Ltd. | Parallelized execution of instruction sequences based on pre-monitoring |
US9715390B2 (en) | 2015-04-19 | 2017-07-25 | Centipede Semi Ltd. | Run-time parallelization of code execution based on an approximate register-access specification |
US10423415B2 (en) * | 2017-04-01 | 2019-09-24 | Intel Corporation | Hierarchical general register file (GRF) for execution block |
US11157276B2 (en) | 2019-09-06 | 2021-10-26 | International Business Machines Corporation | Thread-based organization of slice target register file entry in a microprocessor to permit writing scalar or vector data to portions of a single register file entry |
US11093246B2 (en) | 2019-09-06 | 2021-08-17 | International Business Machines Corporation | Banked slice-target register file for wide dataflow execution in a microprocessor |
US11119774B2 (en) | 2019-09-06 | 2021-09-14 | International Business Machines Corporation | Slice-target register file for microprocessor |
US11816486B2 (en) * | 2022-01-18 | 2023-11-14 | Nxp B.V. | Efficient inter-thread communication between hardware processing threads of a hardware multithreaded processor by selective aliasing of register blocks |
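CN114610394B (zh) * | 2022-03-14 | 2023-12-22 | 海飞科(南京)信息技术有限公司 | Instruction scheduling method, processing circuit and electronic device |
CN117170750B (zh) * | 2023-09-04 | 2024-04-30 | 上海合芯数字科技有限公司 | Scheduling method, apparatus, processor, device and medium for multi-source-operand instructions |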
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5187796A (en) * | 1988-03-29 | 1993-02-16 | Computer Motion, Inc. | Three-dimensional vector co-processor having I, J, and K register files and I, J, and K execution units |
EP0762270A2 (en) * | 1995-09-11 | 1997-03-12 | International Business Machines Corporation | Microprocessor with load/store operation to/from multiple registers |
WO1999008185A1 (en) * | 1997-08-06 | 1999-02-18 | Advanced Micro Devices, Inc. | A dependency table for reducing dependency checking hardware |
US5968160A (en) * | 1990-09-07 | 1999-10-19 | Hitachi, Ltd. | Method and apparatus for processing data in multiple modes in accordance with parallelism of program by using cache memory |
Family Cites Families (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2537526B2 (ja) * | 1987-12-02 | 1996-09-25 | 富士通株式会社 | Multiprocessor system |
JP2834298B2 (ja) * | 1990-09-19 | 1998-12-09 | 株式会社日立製作所 | Data processing apparatus and data processing method |
JPH06242948A (ja) * | 1993-02-16 | 1994-09-02 | Fujitsu Ltd | Pipeline processing computer |
JP2970553B2 (ja) * | 1996-08-30 | 1999-11-02 | 日本電気株式会社 | Multithread execution method |
US5887166A (en) * | 1996-12-16 | 1999-03-23 | International Business Machines Corporation | Method and system for constructing a program including a navigation instruction |
US5845307A (en) * | 1997-01-27 | 1998-12-01 | Sun Microsystems, Inc. | Auxiliary register file accessing technique |
US6061710A (en) * | 1997-10-29 | 2000-05-09 | International Business Machines Corporation | Multithreaded processor incorporating a thread latch register for interrupt service new pending threads |
US6317820B1 (en) * | 1998-06-05 | 2001-11-13 | Texas Instruments Incorporated | Dual-mode VLIW architecture providing a software-controlled varying mix of instruction-level and task-level parallelism |
US6286027B1 (en) * | 1998-11-30 | 2001-09-04 | Lucent Technologies Inc. | Two step thread creation with register renaming |
US20030188141A1 (en) * | 2002-03-29 | 2003-10-02 | Shailender Chaudhry | Time-multiplexed speculative multi-threading to support single-threaded applications |
US7149878B1 (en) * | 2000-10-30 | 2006-12-12 | Mips Technologies, Inc. | Changing instruction set architecture mode by comparison of current instruction execution address with boundary address register values |
US20020103847A1 (en) * | 2001-02-01 | 2002-08-01 | Hanan Potash | Efficient mechanism for inter-thread communication within a multi-threaded computer system |
US6928645B2 (en) * | 2001-03-30 | 2005-08-09 | Intel Corporation | Software-based speculative pre-computation and multithreading |
US6950927B1 (en) * | 2001-04-13 | 2005-09-27 | The United States Of America As Represented By The Secretary Of The Navy | System and method for instruction-level parallelism in a programmable multiple network processor environment |
US6976155B2 (en) * | 2001-06-12 | 2005-12-13 | Intel Corporation | Method and apparatus for communicating between processing entities in a multi-processor |
US7752423B2 (en) * | 2001-06-28 | 2010-07-06 | Intel Corporation | Avoiding execution of instructions in a second processor by committing results obtained from speculative execution of the instructions in a first processor |
US7185338B2 (en) * | 2002-10-15 | 2007-02-27 | Sun Microsystems, Inc. | Processor with speculative multithreading and hardware to support multithreading software |
US7484075B2 (en) * | 2002-12-16 | 2009-01-27 | International Business Machines Corporation | Method and apparatus for providing fast remote register access in a clustered VLIW processor using partitioned register files |
US20040268093A1 (en) * | 2003-06-26 | 2004-12-30 | Samra Nicholas G | Cross-thread register sharing technique |
US7596682B2 (en) * | 2004-04-08 | 2009-09-29 | International Business Machines Corporation | Architected register file system utilizes status and control registers to control read/write operations between threads |
US8166282B2 (en) * | 2004-07-21 | 2012-04-24 | Intel Corporation | Multi-version register file for multithreading processors with live-in precomputation |
US7610470B2 (en) * | 2007-02-06 | 2009-10-27 | Sun Microsystems, Inc. | Preventing register data flow hazards in an SST processor |
US20080229062A1 (en) * | 2007-03-12 | 2008-09-18 | Lorenzo Di Gregorio | Method of sharing registers in a processor and processor |
US8898438B2 (en) * | 2007-03-14 | 2014-11-25 | XMOS Ltd. | Processor architecture for use in scheduling threads in response to communication activity |
US9047197B2 (en) * | 2007-10-23 | 2015-06-02 | Oracle America, Inc. | Non-coherent store instruction for fast inter-strand data communication for processors with write-through L1 caches |
2006
- 2006-10-27 CN CN200680056225.2A patent/CN101529377B/zh: not active (Expired - Fee Related)
- 2006-10-27 JP JP2009524212A patent/JP2010500679A/ja: active (Pending)
- 2006-10-27 DE DE112006004005T patent/DE112006004005T5/de: not active (Withdrawn)
- 2006-10-27 WO PCT/ES2006/070162 patent/WO2008049938A1/es: active (Application Filing)
- 2006-10-27 US US12/446,930 patent/US8261046B2/en: not active (Expired - Fee Related)
Also Published As
Publication number | Publication date |
---|---|
US20100005277A1 (en) | 2010-01-07 |
JP2010500679A (ja) | 2010-01-07 |
CN101529377A (zh) | 2009-09-09 |
DE112006004005T5 (de) | 2009-06-10 |
CN101529377B (zh) | 2016-09-07 |
US8261046B2 (en) | 2012-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2008049938A1 (es) | | Communication between multiple threads in a processor |
CN108351830B (zh) | | Hardware apparatuses and methods for memory corruption detection |
US8639882B2 (en) | | Methods and apparatus for source operand collector caching |
US9727475B2 (en) | | Method and apparatus for distributed snoop filtering |
US9442559B2 (en) | | Exploiting process variation in a multicore processor |
TWI590162B (zh) | | Processor and method for causing a processing element to exit a deep sleep state early |
KR20150112774A (ko) | | Method and apparatus for implementing a dynamic out-of-order processor pipeline |
US20140189302A1 (en) | | Optimal logical processor count and type selection for a given workload based on platform thermals and power budgeting constraints |
KR20180021812A (ko) | | Block-based architecture with parallel execution of successive blocks |
KR20180020985A (ko) | | Decoupled processor instruction window and operand buffer |
KR20180021850A (ko) | | Mapping instruction blocks to instruction windows based on block size |
KR20140113444A (ko) | | Processors, methods, and systems to relax synchronization of accesses to shared memory |
US9875108B2 (en) | | Shared memory interleavings for instruction atomicity violations |
CN104252392A (zh) | | Method and processor for accessing a data cache |
CN104969178B (zh) | | Apparatus and method for implementing a scratchpad memory |
US9354875B2 (en) | | Enhanced loop streaming detector to drive logic optimization |
BR112015022683B1 (pt) | | Processing system and method of performing a data manipulation operation |
KR20170001577A (ko) | | Hardware apparatuses and methods to perform transactional power management |
KR20190033084A (ko) | | Store and load tracking by bypassing load store units |
CN108874458A (zh) | | Firmware boot method for a multi-core SoC and multi-core SoC device |
US9684541B2 (en) | | Method and apparatus for determining thread execution parallelism |
US20140281612A1 (en) | | Measurement of performance scalability in a microprocessor |
US20140223105A1 (en) | | Method and apparatus for cutting senior store latency using store prefetching |
US9552169B2 (en) | | Apparatus and method for efficient memory renaming prediction using virtual registers |
US10083033B2 (en) | | Apparatus and method for efficient register allocation and reclamation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | WWE | WIPO information: entry into national phase | Ref document number: 200680056225.2; Country of ref document: CN |
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 06820035; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2009524212; Country of ref document: JP |
| | WWE | WIPO information: entry into national phase | Ref document number: 1120060040057; Country of ref document: DE |
| | WWE | WIPO information: entry into national phase | Ref document number: 12446930; Country of ref document: US |
| | RET | DE translation (DE OG part 6b) | Ref document number: 112006004005; Country of ref document: DE; Date of ref document: 20090610; Kind code of ref document: P |
| | 122 | EP: PCT application non-entry in European phase | Ref document number: 06820035; Country of ref document: EP; Kind code of ref document: A1 |
| | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8607 |