US20090235258A1 - Multi-Thread Peripheral Processing Using Dedicated Peripheral Bus - Google Patents
- Publication number
- US20090235258A1 (application number US12/410,760)
- Authority
- US
- United States
- Prior art keywords
- threads
- instruction
- thread
- program
- peripheral
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30094—Condition code generation, e.g. Carry, Zero flag
- G06F9/32—Address formation of the next instruction, e.g. by incrementing the instruction counter
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
- G06F9/3851—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution from multiple instruction streams, e.g. multistreaming
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
- G06F9/3889—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
Definitions
- This invention relates to computer architecture.
- the invention relates to multi-thread computers.
- SONET Synchronous Optical Network
- WAN Wide Area Network
- Popular standards include T1 (1.5 Mbps), T3 (45 Mbps), OC-3c (155 Mbps), OC-12c (622 Mbps), OC-48c (2.5 Gbps), OC-192c (10 Gbps), OC-768c (40 Gbps), etc.
- FIG. 1 is a diagram illustrating a system in which one embodiment of the invention can be practiced.
- FIG. 2 is a diagram illustrating a multiprocessor core shown in FIG. 1 according to one embodiment of the invention.
- FIG. 3 is a diagram illustrating a multi-threaded processor shown in FIG. 2 according to one embodiment of the invention.
- FIG. 4 is a diagram illustrating a processing slice shown in FIG. 3 according to one embodiment of the invention.
- FIG. 5A is a diagram illustrating format of a command message according to one embodiment of the invention.
- FIG. 5B is a diagram illustrating format of a response message according to one embodiment of the invention.
- FIG. 6 is a diagram illustrating peripheral operations according to one embodiment of the invention.
- FIG. 7 is a diagram illustrating an instruction processing according to one embodiment of the invention.
- the present invention is a method and apparatus to perform peripheral operations in a multi-thread processor.
- a peripheral bus is coupled to a peripheral unit to transfer peripheral information including a command message specifying a peripheral operation.
- a processing slice executes a plurality of threads and is coupled to the peripheral bus to execute peripheral operations. The plurality of threads includes a first thread sending the command message to the peripheral unit.
- the command message includes at least one of a message content, a peripheral address identifying the peripheral unit, and a command code specifying the peripheral operation.
- the peripheral information includes a response message sent from the peripheral unit to the processing slice.
- the response message indicates the peripheral operation is completed and includes at least one of a thread identifier identifying the first thread, one or more result phrases, each including an operation result, a data register address specifying a data register in the processing slice to store the operation result, and an end flag indicating the last one of the result phrases.
- the command message may be a wait instruction or a non-wait instruction.
- the processing slice disables the first thread after sending the command message if the command message is a wait instruction.
- the first thread continues to execute after sending the command message if the command message is a non-wait instruction.
- the processing slice enables the first thread after receiving the response message from the peripheral unit if the first thread was disabled.
- the peripheral bus may include a bi-directional bus to transfer the command message from the processing slice to the peripheral unit and the response message from peripheral unit to the processing slice.
- the processing slice includes an instruction processing unit, a thread control unit, a peripheral unit, a memory access unit, a functional unit, a condition code memory, and a register file.
- the processing slice is configured to support execution of a number of threads.
- Several processing slices operate concurrently in a processor.
- a multiprocessor core may include several of these processors in a network processor system.
- the instruction processing unit processes instructions fetched from a program memory.
- the instruction processing unit includes an instruction fetch unit, an instruction buffer, and an instruction decoder and dispatcher.
- the instruction fetch unit fetches the instructions from the program memory using a number of program counters, each of which corresponds to each of the threads.
- the instruction buffer holds the fetched instructions waiting for execution.
- the instruction decoder and dispatcher decodes the instructions and dispatches the decoded instructions to the memory access unit, the functional unit, or the peripheral unit.
- the thread control unit manages initiation and termination of at least one of the threads.
- the peripheral unit transfers the peripheral information between the peripheral unit and the instruction processing unit.
- the peripheral unit receives command messages from the processing slices to direct performance of peripheral operations and sends response messages to the processing slices to convey results of the peripheral operations to the threads.
- the memory access unit provides access to one of a number of data memories via a data memory switch.
- the functional unit performs an operation specified in one of the instructions.
- the condition code memory stores a number of condition codes, each of which corresponds to each of the threads.
- the register file has a number of data registers, of which a subset is associated with each of the threads.
- FIG. 1 is a diagram illustrating a system 100 in which one embodiment of the invention can be practiced.
- the system 100 includes a multiprocessor core 110 , a memory controller 120 , peripheral units 130 , an off-chip program/data memory 140 , and a host control processor 150 .
- the multiprocessor core 110 is a high-performance multi-thread computing subsystem capable of performing all functions related to network operations. These network operations may include adjusting transmission rates, handling special cells and packets used to implement flow control protocols on an individual connection basis, and supporting Asynchronous Transfer Mode (ATM) traffic management for Available Bit Rate (ABR), Variable Bit Rate (VBR), and Unspecified Bit Rate (UBR) connections.
- the memory controller 120 provides access to additional memory devices and includes circuitry to interface to various memory types including dynamic random access memory (DRAM) and static random access memory (SRAM).
- the peripheral units 130 include a number of peripheral or input/output (I/O) units for peripheral or I/O operations.
- The peripheral units 130 include an input interface 162, an output interface 164, a cyclic redundancy code (CRC) engine 166, a check-out content addressable memory (CAM) 168, a bit vector unit 172, and a spare 174.
- The input and output interfaces 162 and 164 provide interfaces to inbound and outbound network traffic, respectively. These interfaces may include line and switch/system interfaces that support industry standards, including multi-phy features such as Universal Test and Operations PHY Interface for ATM (UTOPIA).
- the CRC engine 166 supports segmentation and re-assembly for ATM Adaptation Layer Type 5 (AAL5) transmission of packets over ATM connections.
- the check-out CAM 168 is an associative memory unit that supports the maintenance of several connection records in the on-chip memory for the duration of cell processing for those connections.
- the bit vector unit 172 supports round-robin scheduling algorithms at the OC-48 line rate.
- the off-chip program/data memory 140 includes memory devices that store programs or data in addition to the on-chip programs and data stored in the multiprocessor core 110 .
- the host control processor 150 is a processor that performs the general control functions in the network. These functions may include connection set-up, parameter adjustment, operation monitoring, program loading and debugging support.
- FIG. 2 is a diagram illustrating the multiprocessor core 110 shown in FIG. 1 according to one embodiment of the invention.
- the multiprocessor core 110 includes four multi-thread processors 210 1 to 210 4 , a split transaction switch 220 , a host interface bus 250 , and a peripheral bus 260 . It is noted that the use of four processors is for illustrative purposes only. As is known to one skilled in the art, any reasonable number of processors can be used.
- The split transaction switch 220 permits each of the processors to access the data words held in any of the other three data memories with a small additional access time.
- The host interface bus 250 allows any of the four processors 210 1 to 210 4 to communicate with the host control processor 150 (FIG. 1). This includes passing parameters, loading programs and data, and reporting status.
- the peripheral bus 260 allows any one of the peripheral units 130 to communicate with any of the processors 210 1 to 210 4 .
- Some peripheral units may have direct memory access (DMA) channels to the local data memories of any one of the processors 210 1 to 210 4. In one embodiment, each of these channels supports burst transfer of 32-bit data at a 100 MHz clock rate, which is equivalent to a rate greater than the OC-48 speed.
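The comparison with OC-48 can be checked with simple arithmetic. The sketch below assumes that a burst moves one 32-bit word per clock cycle, and it takes the nominal OC-48 line rate (48 x 51.84 Mbps) from the SONET hierarchy rather than from this document.

```python
# Peak burst rate of one DMA channel, assuming one 32-bit word per clock cycle.
WORD_BITS = 32
CLOCK_HZ = 100_000_000  # 100 MHz, as stated in the text

# Nominal OC-48 line rate (assumption: 48 x 51.84 Mbps from the SONET hierarchy).
OC48_BPS = 48 * 51_840_000

dma_bps = WORD_BITS * CLOCK_HZ
headroom = dma_bps / OC48_BPS

print(f"DMA burst rate: {dma_bps / 1e9:.2f} Gbps")
print(f"OC-48 rate:     {OC48_BPS / 1e9:.3f} Gbps")
print(f"Headroom:       {headroom:.2f}x")
```

Under these assumptions a single channel peaks at 3.2 Gbps against roughly 2.488 Gbps for OC-48.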
- DMA direct memory access
- FIG. 3 is a diagram illustrating the multi-thread processor 210 shown in FIG. 2 according to one embodiment of the invention.
- the multi-thread processor 210 includes four processing slices (PS's) 310 1 to 310 4 , a data memory switch 320 , banks of data memory 330 , a peripheral message unit 340 , a control and monitor interface 350 , and a program memory 360 . It is noted that the use of four PS's is for illustrative purposes only. As is known by one skilled in the art, any number of PS's can be used.
- the multi-thread processor 210 is a data and/or information processing machine that supports the simultaneous execution of several programs, each program being represented by a sequence of instructions.
- a thread is a sequence of instructions that may be a program, or a part of a program.
- the multi-thread processor 210 may have one or more instruction execution resources such as arithmetic logic units, branch units, memory interface units, and input-output interface units. In any operation cycle of the multi-thread processor 210 , any instruction execution resource may operate to carry out execution of an instruction in any thread. Any one instruction resource unit may participate in the execution of instructions of different threads in successive cycles of processor operation.
- the multi-thread processor 210 may have a separate hardware register, referred to as the program counter, for each thread that indicates the position or address of the next instruction to be executed within the thread.
- a multi-thread multiprocessor is a data and/or information processing system composed of several multi-thread processors.
- Each of the PS's 310 1 to 310 4 contains a program sequencer and execution units to perform instruction fetch, decode, dispatch and execution for four threads. Each of the PS's operates by interleaving the execution of instructions from the four threads, including the ability to execute several instructions concurrently in the same clock cycle.
- the data memory switch 320 allows any of the four PS's 310 1 to 310 4 to access any data memory bank in the banks of data memories 330 .
- the banks of memories 330 include four banks 335 1 to 335 4 : data memory banks 0 to 3 .
- Each of the data memory banks 335 1 to 335 4 stores data to be used or accessed by any of the PS's 310 1 to 310 4 .
- each of the data memory banks 335 1 to 335 4 has an interface to the DMA bus to support DMA transfers between the peripherals and data memory banks.
- the banks 335 1 to 335 4 are interleaved on the low-order address bits. In this way, DMA transfers to and from several of the peripheral units 130 can proceed simultaneously with thread execution without interference.
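Interleaving on the low-order address bits can be illustrated with a short sketch. It assumes word addressing and that, with four banks, the two lowest address bits select the bank; the function name `bank_and_offset` and the example addresses are hypothetical.

```python
from typing import Tuple

NUM_BANKS = 4  # four data memory banks, as in the embodiment described


def bank_and_offset(word_address: int) -> Tuple[int, int]:
    """Split a word address into (bank index, offset within the bank).

    Assumption: the low-order bits of the word address select the bank.
    """
    bank = word_address % NUM_BANKS      # low-order bits pick the bank
    offset = word_address // NUM_BANKS   # remaining bits index within the bank
    return bank, offset


# Consecutive words of a burst land in successive banks.
burst = [bank_and_offset(a)[0] for a in range(0x100, 0x108)]
print(burst)
```

Because consecutive addresses rotate through the banks, a sequential DMA burst touches each bank only once every four words, which leaves the other banks free for thread accesses in the same cycle.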
- the four PS's 310 1 to 310 4 are connected to the peripheral message unit 340 via four PS buses 315 1 to 315 4 , respectively.
- the peripheral message unit 340 is a distribution or switching location to switch the peripheral bus 260 to each of the PS buses 315 1 to 315 4 .
- the peripheral message unit 340 is interfaced to the peripheral bus 260 via a command bus 342 and a response bus 344 .
- The command bus 342 and the response bus 344 may be combined into a single bi-directional bus. An appropriate signaling scheme or handshaking protocol is used to determine whether the information is a command message or a response message.
- a command message is sent from the issuing PS to the command bus 342 .
- the command message specifies the peripheral unit where the peripheral operation is to be performed by including the address of the peripheral unit. All peripheral units connected to the peripheral bus 260 have an address decoder to decode the peripheral unit address in the command message.
- When a peripheral unit recognizes that it is the intended peripheral unit for the peripheral operation, it decodes the command code contained in the command message and then carries out the operation. If the command message is a wait instruction, the issuing thread is stalled for the interval during which the responding peripheral unit carries out the peripheral operation.
- During this interval, the resources associated with the issuing thread are available to other threads in the issuing slice. In this way, high resource utilization can be achieved. If the command message is a no_wait instruction, the issuing thread continues executing its sequence without waiting for the peripheral operation to be completed. The issuing thread may or may not need a response from the peripheral unit.
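The address decoding described above can be sketched as follows. This is a minimal model, not the patented circuit: the class `Peripheral`, the function `broadcast`, the unit addresses 2 and 3, and the command code 0x10 are all hypothetical.

```python
class Peripheral:
    """Hypothetical peripheral unit with its own address decoder."""

    def __init__(self, name, address):
        self.name = name
        self.address = address
        self.handled = []  # command codes this unit has accepted

    def on_bus(self, peripheral_address, command_code):
        # Address decoder: ignore command messages meant for other units.
        if peripheral_address != self.address:
            return False
        # Intended unit: decode the command code and carry out the operation.
        self.handled.append(command_code)
        return True


def broadcast(units, peripheral_address, command_code):
    """Present one command to every unit; return the names of units that accepted it."""
    return [u.name for u in units if u.on_bus(peripheral_address, command_code)]


units = [Peripheral("crc_engine", 2), Peripheral("checkout_cam", 3)]
accepted = broadcast(units, 3, 0x10)
print(accepted)
```

Every unit sees the message, but only the unit whose address matches acts on it, so no central arbiter needs to know which unit implements which command.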
- The control and monitor interface 350 permits the host control processor 150 to interact with any one of the four PS's 310 1 to 310 4 through the host interface bus 250 to perform control and monitoring functions.
- the program memory 360 stores program instructions to be used by any one of the threads in any one of the four PS's 310 1 to 310 4 .
- the program memory 360 supports simultaneous fetches of four instruction words in each clock cycle.
- FIG. 4 is a diagram illustrating the processing slice 310 shown in FIG. 3 according to one embodiment of the invention.
- the processing slice 310 includes an instruction processing unit 410 , a peripheral unit interface 420 , a register file 430 , a condition code memory 440 , a functional unit 450 , a memory access unit 460 , and a thread control unit 470 .
- the processing slice 310 is configured to have four threads. The use of four threads is for illustrative purposes only. As is known by one skilled in the art, any number of threads can be used.
- the instruction processing unit 410 processes instructions fetched from the program memory 360 .
- the instruction processing unit 410 includes an instruction fetch unit 412 , an instruction buffer 414 , and an instruction decoder and dispatcher 416 .
- the instruction fetch unit 412 fetches the instructions from the program memory 360 using a plurality of program counters. Each program counter corresponds to each of the threads.
- the instruction buffer 414 holds the fetched instructions waiting for execution for any of the four threads.
- The instruction decoder and dispatcher 416 decodes the instructions and dispatches the decoded instructions to the peripheral unit interface 420, the register file 430, the condition code memory 440, the functional unit 450, or the memory access unit 460 as appropriate.
- the thread control unit 470 manages initiation and termination of at least one of the four threads.
- the thread control unit 470 includes program counters 472 and a program (or code) base register unit 473 containing program base addresses corresponding to the threads. Execution of a computation may start from a single thread, executing the main function of the program. A thread may initiate execution of another thread by means of a start instruction. The new thread executes in the same function context as the given thread. In other words, it uses the same data and code base register contents. A thread runs until it encounters a peripheral wait, or until it reaches a quit instruction.
- the peripheral unit interface 420 is connected to the instruction processing unit 410 and the peripheral message unit 340 to transfer the peripheral information between the peripheral units 130 ( FIG. 1 ) and the instruction processing unit 410 .
- the peripheral operation may be an input or an output operation.
- an input or output operation is initiated by a message instruction that causes a command message to be transferred to a specified peripheral unit over the peripheral bus.
- the message instruction may be marked wait or no_wait. If the message instruction is marked wait, it is expected that the peripheral unit will return a response message; the processing slice that issued the message-wait instruction will execute the following instructions of that thread only when the response message has been received over the peripheral bus.
- a command message includes a content part that contains data words from data registers specified in the message instruction. If a response message is returned, it contains one or more result phrases, each specifying a data word and a data register identifier; the slice puts each data word in the specified data register, and continues execution of the thread after processing the last result phrase.
- the register file 430 has four sets of data registers. Each of the four sets of data registers corresponds to each of the four threads.
- the data registers store data or temporary items used by the threads. Peripheral operations may reference the data registers in the command or response message.
- the condition code memory 440 stores four condition codes. Each of the condition codes corresponds to each of the four threads.
- the condition code includes condition bits that represent the conditions generated by the functional unit 450 . These condition bits include overflow, greater_than, equal, less_than conditions.
- The condition bits are set according to the type of the instruction being executed. For example, the compare instruction sets the greater_than, equal, and less_than condition bits and clears the overflow condition bit.
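A per-thread condition code and the effect of a compare instruction can be sketched as below. The class `ConditionCode` and the function `compare` are hypothetical names, and the sketch assumes that a compare sets exactly one of the three ordering bits according to the relation between its operands.

```python
from dataclasses import dataclass


@dataclass
class ConditionCode:
    overflow: bool = False
    greater_than: bool = False
    equal: bool = False
    less_than: bool = False


NUM_THREADS = 4
# One condition code per thread, as in the condition code memory described.
condition_codes = [ConditionCode() for _ in range(NUM_THREADS)]


def compare(thread_id: int, a: int, b: int) -> ConditionCode:
    """Model of a compare instruction: update only the issuing thread's bits."""
    cc = condition_codes[thread_id]
    cc.greater_than = a > b
    cc.equal = a == b
    cc.less_than = a < b
    cc.overflow = False  # compare clears the overflow bit
    return cc


print(compare(1, 3, 7))
```

Keeping one entry per thread means a compare in one thread cannot disturb a pending conditional branch in another.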
- the functional unit 450 performs an operation specified in the dispatched instruction.
- the functional unit 450 performs all operations of the instruction set that manipulate values in the data registers. These operations include arithmetic and logical register operations, shift and selected bit operations.
- the operation performed by the functional unit 450 is determined by a decoded opcode value passed from the instruction decoder and dispatcher 416 .
- the functional unit 450 has connections to the condition code memory 440 to set a thread's condition code according to the outcome of an arithmetic operation or comparison.
- The memory access unit 460 provides for read and write accesses to any of the four data memory banks 335 1 to 335 4 via the data memory switch 320 (FIG. 3).
- the memory access unit 460 has a base register unit 462 having four base registers to receive the base address used in address formation and for saving and restoring the base registers for the call and return instructions. Each of the four data base registers corresponds to each of the four threads.
- the instruction processing unit 410 may include M program base registers. Each of the M program base registers is associated with each of the M threads. The contents of a base register are added to the contents of the corresponding program counter to determine the location in the program memory from which the next instruction for the corresponding thread is to be fetched.
- An advantage of this scheme is that the branch target specified in the instruction that transfers control may be represented in fewer bits for local transfers.
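The fetch-address formation with per-thread program base registers can be sketched as follows. The base addresses, the value of M, and the function names are hypothetical; the sketch only shows that the fetch address is the sum of the thread's base register and its program counter, so a local branch target needs only enough bits to cover the offset.

```python
M = 4  # number of threads (illustrative)

# Hypothetical program base addresses, one per thread.
program_base = [0x0000, 0x0400, 0x0800, 0x0C00]
# One program counter per thread, holding an offset relative to the base.
program_counter = [0] * M


def fetch_address(thread_id: int) -> int:
    """Location in program memory of the thread's next instruction."""
    return program_base[thread_id] + program_counter[thread_id]


def local_branch(thread_id: int, target_offset: int) -> None:
    """A local transfer of control only replaces the base-relative offset."""
    program_counter[thread_id] = target_offset


local_branch(2, 0x2A)
print(hex(fetch_address(2)))
```

In this model the branch target 0x2A fits in a handful of bits even though the instruction actually fetched lies at 0x82A in program memory.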
- the memory access unit 460 may include a data base register unit 462 having M data base registers 462 .
- Each of the M data base registers is associated with each of the M threads.
- the contents of the appropriate base register are added to the corresponding program counter to form the effective address for selected instructions. This permits offset addressing to be used, leading to more compact programs.
- FIG. 5A is a diagram illustrating format of a command message 500 according to one embodiment of the invention.
- the command message 500 includes a processor identifier 505 , a thread identifier 507 , a message content 510 , a peripheral address 520 , and a command code 530 .
- The processor identifier 505 and the thread identifier 507 identify the thread in the processor that issues the command so that the peripheral unit can direct a resulting response message to the correct processor and thread. This is necessary to allow peripheral units to be shared by threads executing on distinct processors.
- the message content 510 is a sequence of N-bit words taken from the data registers specified in the message instruction.
- the message content 510 includes the data or operand to be used by the peripheral unit.
- the message content 510 may contain the starting address for the transfer in data memory, and a count indicating the number of words to be transferred.
- the peripheral address 520 specifies the address of the peripheral unit to perform the peripheral operation specified by the command code 530 .
- the command code 530 specifies the peripheral operation performed by the peripheral unit corresponding to the address specified in the peripheral address 520 .
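The command message fields can be modelled as a simple record. The text does not give field widths or an encoding, so the sketch below uses plain integers; `CommandMessage`, `build_command`, and all example values (register indices, peripheral address 5, command code 0x01) are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CommandMessage:
    processor_id: int         # processor identifier 505
    thread_id: int            # thread identifier 507
    content: Tuple[int, ...]  # message content 510
    peripheral_address: int   # peripheral address 520
    command_code: int         # command code 530


def build_command(processor_id: int, thread_id: int,
                  registers: Sequence[int], source_registers: Sequence[int],
                  peripheral_address: int, command_code: int) -> CommandMessage:
    """Assemble a command; the content is copied from the named data registers."""
    content = tuple(registers[r] for r in source_registers)
    return CommandMessage(processor_id, thread_id, content,
                          peripheral_address, command_code)


regs = [0] * 8
regs[2] = 0x2000  # hypothetical starting address in data memory
regs[3] = 16      # hypothetical count of words to transfer
msg = build_command(1, 3, regs, [2, 3], peripheral_address=5, command_code=0x01)
print(msg)
```

Carrying the processor and thread identifiers in the message is what lets a shared peripheral route its response without keeping per-requester state of its own.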
- FIG. 5B is a diagram illustrating format of a response message 550 according to one embodiment of the invention.
- The response message 550 includes a thread identifier 560, K data register addresses 570 1 to 570 K, K operation results 580 1 to 580 K, and an end flag 590.
- the thread identifier 560 specifies the thread that the response message is directed to. Usually, this thread is the thread that issued the command message to the peripheral unit.
- Each of the data register addresses 570 1 to 570 K specifies the data register in the register file 430 to store the corresponding one of K operation results 580 1 to 580 K .
- Each of the operation results 580 1 to 580 K is the result of the corresponding operation, which may include status information regarding the peripheral operation.
- the end flag 590 indicates the end of the response message 550 .
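How a processing slice might consume a response message can be sketched as below. The sketch assumes the end flag is carried on the last result phrase, which is one reading of the description; `ResultPhrase`, `ResponseMessage`, and `apply_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResultPhrase:
    register_address: int  # data register address 570
    result: int            # operation result 580
    end: bool = False      # end flag 590 (assumed to mark the last phrase)


@dataclass
class ResponseMessage:
    thread_id: int         # thread identifier 560
    phrases: List[ResultPhrase]


def apply_response(register_file: List[List[int]], msg: ResponseMessage) -> bool:
    """Store each result in the addressed thread's register set.

    Returns True once the phrase carrying the end flag has been processed.
    """
    regs = register_file[msg.thread_id]
    for phrase in msg.phrases:
        regs[phrase.register_address] = phrase.result
        if phrase.end:
            return True
    return False


# Four threads with eight data registers each (sizes are illustrative).
register_file = [[0] * 8 for _ in range(4)]
response = ResponseMessage(2, [ResultPhrase(4, 0xBEEF), ResultPhrase(5, 1, end=True)])
done = apply_response(register_file, response)
print(done, register_file[2])
```

Because the peripheral names the destination registers, the thread needs no extra instructions to unpack the result when it resumes.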
- FIG. 6 is a diagram 600 illustrating peripheral operations according to one embodiment of the invention.
- the diagram 600 shows the interactions between the processing slice 310 and the peripheral unit 130 .
- Although only one peripheral unit 130 is shown, it is contemplated that more than one peripheral unit may operate with the processing slice 310.
- Block 620 selects one or more instructions for execution. One or more instructions may be selected for execution concurrently.
- K instructions are selected for execution.
- Block 630 1 to Block 630 K process instruction 1 to instruction K, respectively.
- Blocks 630 1 to 630 K may terminate at the same time or at different times at the end cycle 640 .
- Blocks 630 1 to 630 K interact with the peripheral unit 130 via the peripheral bus 260 when they are processing message instructions.
- Block 650 receives the command message from the processing slice 310 .
- Block 660 starts the I/O operation for the thread T as specified in the command message.
- Block 670 completes the I/O operation for thread T′ where thread T′ may or may not be the same as thread T.
- the command message for thread T′ may have been received by block 650 either earlier or later than the command message for thread T, due to out-of-order performance of operations by the peripheral unit.
- Block 680 sends a response message to the processing slice via the peripheral bus 260 to indicate the completion of the I/O operation for thread T′.
- FIG. 7 is a diagram illustrating the instruction processing block 630 k shown in FIG. 6 according to one embodiment of the invention.
- Begin node 710 and end node 790 indicate the beginning and the end, respectively, of the instruction processing block 630 k.
- Begin node 710 starts the instruction processing.
- Block 720 dispatches thread T according to the instruction type. If the instruction type is a non-message instruction, block 730 processes the instruction as appropriate. If the instruction is a message instruction, block 740 sends the command message to the peripheral unit 130 . The command message is to be received by block 650 as shown in FIG. 6 . If the command message is a wait instruction, block 750 disables thread T and goes to the end node 790 . If the command message is a non-wait instruction, block 780 indicates that the instruction is done and goes to the end node 790 .
- Block 760 receives the response message for thread T′ as sent from block 680 shown in FIG. 6 . Then, block 770 enables thread T′ if thread T′ was disabled and then goes to block 780 to indicate that the instruction processing is done. Next, end node 790 indicates the end of the instruction processing block.
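The wait and non-wait paths of FIG. 7 can be sketched as a small state model. This is an illustration under stated assumptions, not the patented implementation: the class `ProcessingSlice`, its method names, and the command strings are hypothetical, and the block numbers in the comments map the steps back to the description.

```python
class ProcessingSlice:
    """Toy model of thread enable/disable around message instructions."""

    def __init__(self, num_threads=4):
        self.enabled = [True] * num_threads
        self.sent = []  # command messages placed on the peripheral bus

    def issue_message(self, thread_id, command, wait):
        self.sent.append((thread_id, command))  # block 740: send command message
        if wait:
            self.enabled[thread_id] = False     # block 750: disable thread T
        # Non-wait: the thread simply continues (block 780).

    def receive_response(self, thread_id):
        # Block 760: a response for thread T' arrives, possibly out of order.
        if not self.enabled[thread_id]:
            self.enabled[thread_id] = True      # block 770: enable thread T'


ps = ProcessingSlice()
ps.issue_message(0, "crc", wait=True)
ps.issue_message(1, "lookup", wait=True)
ps.issue_message(2, "log", wait=False)
state_after_issue = list(ps.enabled)

ps.receive_response(1)  # thread 1's operation completes before thread 0's
state_after_first_response = list(ps.enabled)

ps.receive_response(0)
print(state_after_issue, state_after_first_response, ps.enabled)
```

Threads 0 and 1 are disabled while their operations are outstanding, thread 2 keeps running, and responses may re-enable threads in a different order from the one in which the commands were issued.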
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Advance Control (AREA)
Abstract
Description
- This application claims the benefit of U.S. Provisional Application No. 60/166,686, titled “Integrated Processor for Multithread with Real-Time Input-Output Capability” filed on Nov. 19, 1999.
- 1. Field of the Invention
- This invention relates to computer architecture. In particular, the invention relates to multi-thread computers.
- 2. Description of Related Art
- Demand for high-speed data transmission has given rise to many high-bandwidth network protocols and standards. For example, the Synchronous Optical Network (SONET) has a number of standards used in Wide Area Networks (WAN) with speeds ranging from a few megabits per second (Mbps) to several gigabits per second (Gbps). Popular standards include T1 (1.5 Mbps), T3 (45 Mbps), OC-3c (155 Mbps), OC-12c (622 Mbps), OC-48c (2.5 Gbps), OC-192c (10 Gbps), OC-768c (40 Gbps), etc.
- In network applications, the requirements for cell processing and packet processing functions at line rates for broadband communications switches and routers have become increasingly difficult to satisfy, and demand multiple processor configurations to meet performance requirements. The processing of cells and packets typically requires frequent interactions with special function devices and external units. Existing techniques are inadequate to meet real-time demands without degrading performance.
- Therefore, there is a need to have a technique to perform peripheral operations for cell and packet processing.
- The features and advantages of the present invention will become apparent from the following detailed description of the present invention in which:
- FIG. 1 is a diagram illustrating a system in which one embodiment of the invention can be practiced.
- FIG. 2 is a diagram illustrating a multiprocessor core shown in FIG. 1 according to one embodiment of the invention.
- FIG. 3 is a diagram illustrating a multi-threaded processor shown in FIG. 2 according to one embodiment of the invention.
- FIG. 4 is a diagram illustrating a processing slice shown in FIG. 3 according to one embodiment of the invention.
- FIG. 5A is a diagram illustrating format of a command message according to one embodiment of the invention.
- FIG. 5B is a diagram illustrating format of a response message according to one embodiment of the invention.
- FIG. 6 is a diagram illustrating peripheral operations according to one embodiment of the invention.
- FIG. 7 is a diagram illustrating an instruction processing according to one embodiment of the invention.
- The present invention is a method and apparatus to perform peripheral operations in a multi-thread processor. A peripheral bus is coupled to a peripheral unit to transfer peripheral information including a command message specifying a peripheral operation. A processing slice executes a plurality of threads and is coupled to the peripheral bus to execute peripheral operations. The plurality of threads includes a first thread sending the command message to the peripheral unit.
- The command message includes at least one of a message content, a peripheral address identifying the peripheral unit, and a command code specifying the peripheral operation. The peripheral information includes a response message sent from the peripheral unit to the processing slice. The response message indicates the peripheral operation is completed and includes at least one of a thread identifier identifying the first thread, one or more result phrases, each including an operation result, a data register address specifying a data register in the processing slice to store the operation result, and an end flag indicating the last one of the result phrases.
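- The two message formats described above can be pictured as plain records. The following is a minimal sketch, not the patented encoding: the class names, field names, and the `is_well_formed` helper are hypothetical, and no field widths are implied.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CommandMessage:
    """Hypothetical record for a command message sent by a thread."""
    peripheral_address: int  # identifies the target peripheral unit
    command_code: int        # specifies the peripheral operation
    content: List[int] = field(default_factory=list)  # message content words


@dataclass
class ResultPhrase:
    """Hypothetical record for one result phrase of a response message."""
    register_address: int  # data register in the slice that receives the result
    result: int            # operation result
    end: bool = False      # end flag, set only on the last phrase


@dataclass
class ResponseMessage:
    """Hypothetical record for a response message returned by a peripheral unit."""
    thread_id: int               # thread the response is directed to
    phrases: List[ResultPhrase]  # one or more result phrases

    def is_well_formed(self) -> bool:
        # Assumed rule: at least one phrase, and only the last one carries the end flag.
        last = len(self.phrases) - 1
        return bool(self.phrases) and all(
            p.end == (i == last) for i, p in enumerate(self.phrases)
        )
```

A response with two phrases would set `end=True` only on the second phrase, so the receiving slice knows when the message is complete.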
- The command message may be a wait instruction or a non-wait instruction. The processing slice disables the first thread after sending the command message if the command message is a wait instruction. The first thread continues to execute after sending the command message if the command message is a non-wait instruction. The processing slice enables the first thread after receiving the response message from the peripheral unit if the first thread was disabled.
- The peripheral bus may include a bi-directional bus to transfer the command message from the processing slice to the peripheral unit and the response message from the peripheral unit to the processing slice.
- The processing slice includes an instruction processing unit, a thread control unit, a peripheral unit, a memory access unit, a functional unit, a condition code memory, and a register file. The processing slice is configured to support execution of a number of threads. Several processing slices operate concurrently in a processor. A multiprocessor core may include several of these processors in a network processor system.
- In the processing slice, the instruction processing unit processes instructions fetched from a program memory. The instruction processing unit includes an instruction fetch unit, an instruction buffer, and an instruction decoder and dispatcher. The instruction fetch unit fetches the instructions from the program memory using a number of program counters, each of which corresponds to each of the threads. The instruction buffer holds the fetched instructions waiting for execution. The instruction decoder and dispatcher decodes the instructions and dispatches the decoded instructions to the memory access unit, the functional unit, or the peripheral unit.
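- The role of the per-thread program counters can be illustrated with a short sketch. It assumes a strict round-robin fetch of one instruction per cycle, which is only one possible interleaving policy; the function name `fetch_round_robin` and the list-based program memory are hypothetical.

```python
from typing import List, Tuple


def fetch_round_robin(program_memory: List[str],
                      program_counters: List[int],
                      cycles: int) -> List[Tuple[int, str]]:
    """Fetch one instruction per cycle, rotating over the threads.

    program_counters[t] is the private program counter of thread t and is
    advanced only when thread t fetches (assumed policy: strict rotation).
    """
    fetched = []
    for cycle in range(cycles):
        thread = cycle % len(program_counters)
        fetched.append((thread, program_memory[program_counters[thread]]))
        program_counters[thread] += 1
    return fetched
```

Because each thread advances only its own counter, two threads can walk through different regions of the same program memory without disturbing each other.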
- The thread control unit manages initiation and termination of at least one of the threads. The peripheral unit transfers the peripheral information between the peripheral unit and the instruction processing unit. The peripheral unit receives command messages from the processing slices to direct performance of peripheral operations and sends response messages to the processing slices to convey results of the peripheral operations to the threads. The memory access unit provides access to one of a number of data memories via a data memory switch. The functional unit performs an operation specified in one of the instructions. The condition code memory stores a number of condition codes, each of which corresponds to each of the threads. The register file has a number of data registers, of which a subset is associated with each of the threads.
- In the following description, for purposes of explanation, numerous details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one skilled in the art that these specific details are not required in order to practice the present invention. In other instances, well-known electrical structures and circuits are shown in block diagram form in order not to obscure the present invention.
- FIG. 1 is a diagram illustrating a system 100 in which one embodiment of the invention can be practiced. The system 100 includes a multiprocessor core 110, a memory controller 120, peripheral units 130, an off-chip program/data memory 140, and a host control processor 150.
- The multiprocessor core 110 is a high-performance multi-thread computing subsystem capable of performing all functions related to network operations. These network operations may include adjusting transmission rates, handling special cells and packets used to implement flow control protocols on an individual connection basis, and supporting Asynchronous Transfer Mode (ATM) traffic management for Available Bit Rate (ABR), Variable Bit Rate (VBR), and Unspecified Bit Rate (UBR) connections. The memory controller 120 provides access to additional memory devices and includes circuitry to interface to various memory types including dynamic random access memory (DRAM) and static random access memory (SRAM). The peripheral units 130 include a number of peripheral or input/output (I/O) units for peripheral or I/O operations: an input interface 162, an output interface 164, a cyclic redundancy code (CRC) engine 166, a check-out content addressable memory (CAM) 168, a bit vector unit 172, and a spare 174. The input and output interfaces CRC engine 166 supports segmentation and re-assembly for ATM Adaptation Layer Type 5 (AAL5) transmission of packets over ATM connections. The check-out CAM 168 is an associative memory unit that supports the maintenance of several connection records in the on-chip memory for the duration of cell processing for those connections. The bit vector unit 172 supports round-robin scheduling algorithms at the OC-48 line rate.
- The off-chip program/data memory 140 includes memory devices that store programs or data in addition to the on-chip programs and data stored in the multiprocessor core 110. The host control processor 150 is a processor that performs the general control functions in the network. These functions may include connection set-up, parameter adjustment, operation monitoring, program loading and debugging support.
- FIG. 2 is a diagram illustrating the multiprocessor core 110 shown in FIG. 1 according to one embodiment of the invention. The multiprocessor core 110 includes four multi-thread processors 210_1 to 210_4, a split transaction switch 220, a host interface bus 250, and a peripheral bus 260. It is noted that the use of four processors is for illustrative purposes only. As is known to one skilled in the art, any reasonable number of processors can be used.
- The four multi-thread processors 210_1 to 210_4 are essentially the same. Each of the processors 210_1 to 210_4 has local program and data memories for N-bit words of instructions and data, respectively. In one embodiment, N=32. The split transaction switch 220 permits each of the processors to access the data words held in any of the other three data memories with a small additional access time.
- The host interface bus 250 allows any of the four processors 210_1 to 210_4 to communicate with the host control processor 150 (FIG. 1). This includes passing parameters, loading program and data, and reporting status. The peripheral bus 260 allows any one of the peripheral units 130 to communicate with any of the processors 210_1 to 210_4. Some peripheral units may have direct memory access (DMA) channels to the local data memories of any one of the processors 210_1 to 210_4. In one embodiment, each of these channels supports burst transfer of 32-bit data at a 100 MHz clock rate, that is, 3.2 Gbps, which exceeds the OC-48 rate.
- FIG. 3 is a diagram illustrating the multi-thread processor 210 shown in FIG. 2 according to one embodiment of the invention. The multi-thread processor 210 includes four processing slices (PS's) 310_1 to 310_4, a data memory switch 320, banks of data memory 330, a peripheral message unit 340, a control and monitor interface 350, and a program memory 360. It is noted that the use of four PS's is for illustrative purposes only. As is known by one skilled in the art, any number of PS's can be used.
- The multi-thread processor 210 is a data and/or information processing machine that supports the simultaneous execution of several programs, each program being represented by a sequence of instructions. A thread is a sequence of instructions that may be a program, or a part of a program. The multi-thread processor 210 may have one or more instruction execution resources such as arithmetic logic units, branch units, memory interface units, and input-output interface units. In any operation cycle of the multi-thread processor 210, any instruction execution resource may operate to carry out execution of an instruction in any thread. Any one instruction resource unit may participate in the execution of instructions of different threads in successive cycles of processor operation. To support this mode of operation, the multi-thread processor 210 may have a separate hardware register, referred to as the program counter, for each thread that indicates the position or address of the next instruction to be executed within the thread. A multi-thread multiprocessor is a data and/or information processing system composed of several multi-thread processors.
- Each of the PS's 310_1 to 310_4 contains a program sequencer and execution units to perform instruction fetch, decode, dispatch and execution for four threads. Each of the PS's operates by interleaving the execution of instructions from the four threads, including the ability to execute several instructions concurrently in the same clock cycle. The data memory switch 320 allows any of the four PS's 310_1 to 310_4 to access any data memory bank in the banks of data memories 330. The banks of memories 330 include four banks 335_1 to 335_4: data memory banks 0 to 3. Each of the data memory banks 335_1 to 335_4 stores data to be used or accessed by any of the PS's 310_1 to 310_4. In addition, each of the data memory banks 335_1 to 335_4 has an interface to the DMA bus to support DMA transfers between the peripherals and data memory banks. The banks 335_1 to 335_4 are interleaved on the low-order address bits. In this way, DMA transfers to and from several of the peripheral units 130 can proceed simultaneously with thread execution without interference.
- The four PS's 310_1 to 310_4 are connected to the peripheral message unit 340 via four PS buses 315_1 to 315_4, respectively. The peripheral message unit 340 is a distribution or switching location to switch the peripheral bus 260 to each of the PS buses 315_1 to 315_4. The peripheral message unit 340 is interfaced to the peripheral bus 260 via a command bus 342 and a response bus 344. The command bus 342 and the response bus 344 may be combined into one single bi-directional bus. An appropriate signaling scheme or handshaking protocol is used to determine whether the information is a command message or a response message.
- When a thread in any of the four PS's 310_1 to 310_4 executes a wait or no_wait instruction for a peripheral operation, a command message is sent from the issuing PS to the command bus 342. The command message specifies the peripheral unit where the peripheral operation is to be performed by including the address of the peripheral unit. All peripheral units connected to the peripheral bus 260 have an address decoder to decode the peripheral unit address in the command message. When a peripheral unit recognizes that it is the intended peripheral unit for the peripheral operation, it will decode the command code contained in the command message and then carry out the operation. If the command message is a wait instruction, the issuing thread is stalled for an interval during which the responding peripheral unit carries out the peripheral operation. During this interval, the resources associated with the issuing thread are available to other threads in the issuing slice. In this way, high resource utilization can be achieved. If it is a no_wait instruction, the issuing thread continues executing its sequence without waiting for the peripheral operation to be completed. The issuing thread may or may not need a response from the peripheral unit.
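- The address decoding on the shared peripheral bus can be sketched in software as follows. This is an illustrative model only: the `Peripheral` class, the `send_command` function, and the sample operations are hypothetical, and a real bus would decode addresses in parallel hardware rather than in a loop.

```python
from typing import Callable, Dict, List


class Peripheral:
    """Hypothetical peripheral unit with an address decoder and a command table."""

    def __init__(self, address: int,
                 operations: Dict[int, Callable[[List[int]], int]]):
        self.address = address
        self.operations = operations  # command code -> operation

    def accepts(self, address: int) -> bool:
        # Address decoder: the unit reacts only to its own address.
        return address == self.address

    def execute(self, command_code: int, content: List[int]) -> int:
        # Decode the command code and carry out the operation.
        return self.operations[command_code](content)


def send_command(units: List[Peripheral], address: int,
                 command_code: int, content: List[int]) -> int:
    """Present a command to every unit on the bus; the addressed unit executes it."""
    for unit in units:
        if unit.accepts(address):
            return unit.execute(command_code, content)
    raise LookupError(f"no peripheral unit at address {address}")
```

Every unit sees the same command, but only the unit whose decoder matches the peripheral address carries out the operation selected by the command code.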
- The control and monitor interface 350 permits the host control processor 150 to interact with any one of the four PS's 310_1 to 310_4 through the host interface bus 250 to perform control and monitoring functions. The program memory 360 stores program instructions to be used by any one of the threads in any one of the four PS's 310_1 to 310_4. The program memory 360 supports simultaneous fetches of four instruction words in each clock cycle.
- FIG. 4 is a diagram illustrating the processing slice 310 shown in FIG. 3 according to one embodiment of the invention. The processing slice 310 includes an instruction processing unit 410, a peripheral unit interface 420, a register file 430, a condition code memory 440, a functional unit 450, a memory access unit 460, and a thread control unit 470. The processing slice 310 is configured to have four threads. The use of four threads is for illustrative purposes only. As is known by one skilled in the art, any number of threads can be used.
- The instruction processing unit 410 processes instructions fetched from the program memory 360. The instruction processing unit 410 includes an instruction fetch unit 412, an instruction buffer 414, and an instruction decoder and dispatcher 416. The instruction fetch unit 412 fetches the instructions from the program memory 360 using a plurality of program counters. Each program counter corresponds to one of the threads. The instruction buffer 414 holds the fetched instructions waiting for execution for any of the four threads. The instruction decoder and dispatcher 416 decodes the instructions and dispatches the decoded instructions to the peripheral unit interface 420, the register file 430, the condition code memory 440, the functional unit 450, or the memory access unit 460 as appropriate.
- The thread control unit 470 manages initiation and termination of at least one of the four threads. The thread control unit 470 includes program counters 472 and a program (or code) base register unit 473 containing program base addresses corresponding to the threads. Execution of a computation may start from a single thread, executing the main function of the program. A thread may initiate execution of another thread by means of a start instruction. The new thread executes in the same function context as the given thread. In other words, it uses the same data and code base register contents. A thread runs until it encounters a peripheral wait, or until it reaches a quit instruction.
- The peripheral unit interface 420 is connected to the instruction processing unit 410 and the peripheral message unit 340 to transfer the peripheral information between the peripheral units 130 (FIG. 1) and the instruction processing unit 410. The peripheral operation may be an input or an output operation. In one embodiment, an input or output operation is initiated by a message instruction that causes a command message to be transferred to a specified peripheral unit over the peripheral bus. The message instruction may be marked wait or no_wait. If the message instruction is marked wait, it is expected that the peripheral unit will return a response message; the processing slice that issued the message-wait instruction will execute the following instructions of that thread only when the response message has been received over the peripheral bus.
- In a peripheral operation, a command message includes a content part that contains data words from data registers specified in the message instruction. If a response message is returned, it contains one or more result phrases, each specifying a data word and a data register identifier; the slice puts each data word in the specified data register, and continues execution of the thread after processing the last result phrase.
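- The handling of result phrases described above can be sketched as a small write-back routine. This is a hedged illustration: the function name `apply_response`, the tuple layout of a phrase, and the list-of-lists register file are assumptions made for the example.

```python
from typing import List, Tuple

# Assumed phrase layout: (data register identifier, data word, end flag).
Phrase = Tuple[int, int, bool]


def apply_response(register_file: List[List[int]], thread_id: int,
                   phrases: List[Phrase]) -> bool:
    """Store each data word in the addressed register of the thread's register set.

    Returns True once the phrase carrying the end flag has been processed,
    which is the point at which the thread may continue execution.
    """
    for register, word, end in phrases:
        register_file[thread_id][register] = word
        if end:
            return True
    return False  # end flag not seen yet: the response is still incomplete
```

The write-back touches only the register set of the thread named in the response, so the other threads' registers are left unchanged.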
- The register file 430 has four sets of data registers. Each of the four sets of data registers corresponds to one of the four threads. The data registers store data or temporary items used by the threads. Peripheral operations may reference the data registers in the command or response message.
- The condition code memory 440 stores four condition codes. Each of the condition codes corresponds to one of the four threads. The condition code includes condition bits that represent the conditions generated by the functional unit 450. These condition bits include overflow, greater_than, equal, and less_than conditions. The condition bits are set according to the type of the instruction being executed. For example, the compare instruction sets the greater_than, equal, and less_than condition bits and clears the overflow condition bit.
- The functional unit 450 performs an operation specified in the dispatched instruction. The functional unit 450 performs all operations of the instruction set that manipulate values in the data registers. These operations include arithmetic and logical register operations, shift and selected bit operations. The operation performed by the functional unit 450 is determined by a decoded opcode value passed from the instruction decoder and dispatcher 416. The functional unit 450 has connections to the condition code memory 440 to set a thread's condition code according to the outcome of an arithmetic operation or comparison.
- The memory access unit 460 provides for read and write accesses to any of the four data memory banks 335_1 to 335_4 via the data memory switch 320 (FIG. 3). The memory access unit 460 has a base register unit 462 having four base registers to receive the base address used in address formation and for saving and restoring the base registers for the call and return instructions. Each of the four data base registers corresponds to one of the four threads.
- In one alternative embodiment of the invention, the instruction processing unit 410 may include M program base registers. Each of the M program base registers is associated with one of the M threads. The contents of a base register are added to the contents of the corresponding program counter to determine the location in the program memory from which the next instruction for the corresponding thread is to be fetched. An advantage of this scheme is that the branch target specified in the instruction that transfers control may be represented in fewer bits for local transfers.
- In one alternative embodiment of the invention, the memory access unit 460 may include a data base register unit 462 having M data base registers. Each of the M data base registers is associated with one of the M threads. The contents of the appropriate base register are added to the corresponding program counter to form the effective address for selected instructions. This permits offset addressing to be used, leading to more compact programs.
- FIG. 5A is a diagram illustrating format of a command message 500 according to one embodiment of the invention. The command message 500 includes a processor identifier 505, a thread identifier 507, a message content 510, a peripheral address 520, and a command code 530.
- The processor identifier 505 and the thread identifier 507 identify the thread in the processor that issues the command so that the peripheral unit can direct a resulting response message to the correct processor and thread. This is necessary to allow peripheral units to be shared by threads executing on distinct processors.
- The message content 510 is a sequence of N-bit words taken from the data registers specified in the message instruction. The message content 510 includes the data or operand to be used by the peripheral unit. For a DMA operation, the message content 510 may contain the starting address for the transfer in data memory, and a count indicating the number of words to be transferred. The peripheral address 520 specifies the address of the peripheral unit to perform the peripheral operation specified by the command code 530. The command code 530 specifies the peripheral operation performed by the peripheral unit corresponding to the address specified in the peripheral address 520.
- FIG. 5B is a diagram illustrating format of a response message 550 according to one embodiment of the invention. The response message 550 includes a thread identifier 560, K data register addresses 570_1 to 570_K, K operation results 580_1 to 580_K, and an end flag 590.
- The thread identifier 560 specifies the thread that the response message is directed to. Usually, this thread is the thread that issued the command message to the peripheral unit. Each of the data register addresses 570_1 to 570_K specifies the data register in the register file 430 to store the corresponding one of K operation results 580_1 to 580_K. Each of the operation results 580_1 to 580_K is the result of the corresponding operation, which may include status information regarding the peripheral operation. The end flag 590 indicates the end of the response message 550.
- FIG. 6 is a diagram 600 illustrating peripheral operations according to one embodiment of the invention. The diagram 600 shows the interactions between the processing slice 310 and the peripheral unit 130. Although only one peripheral unit 130 is shown, it is contemplated that more than one peripheral unit may operate with the processing slice 310.
- Within the processing slice 310, the begin cycle 610 starts an instruction execution cycle. Block 620 selects one or more instructions for execution. One or more instructions may be selected for execution concurrently. Suppose K instructions are selected for execution. Blocks 630_1 to 630_K process instruction 1 to instruction K, respectively. The details of the block 630_k, where k=1, . . . , K, are shown in FIG. 7. Blocks 630_1 to 630_K may terminate at the same time or at different times at the end cycle 640. Blocks 630_1 to 630_K will interact with the peripheral unit 130 via the peripheral bus 260 when they are processing message instructions.
- Within the peripheral unit 130, there are processing blocks to receive and send peripheral messages via the peripheral bus 260. Block 650 receives the command message from the processing slice 310. Block 660 starts the I/O operation for the thread T as specified in the command message. Block 670 completes the I/O operation for thread T′ where thread T′ may or may not be the same as thread T. The command message for thread T′ may have been received by block 650 either earlier or later than the command message for thread T, due to out-of-order performance of operations by the peripheral unit. Block 680 sends a response message to the processing slice via the peripheral bus 260 to indicate completion of the I/O operation for thread T′.
- FIG. 7 is a diagram illustrating the instruction processing block 630_k shown in FIG. 6 according to one embodiment of the invention. In this block, begin and end nodes
- Begin node 710 starts the instruction processing. Block 720 dispatches thread T according to the instruction type. If the instruction type is a non-message instruction, block 730 processes the instruction as appropriate. If the instruction is a message instruction, block 740 sends the command message to the peripheral unit 130. The command message is to be received by block 650 as shown in FIG. 6. If the command message is a wait instruction, block 750 disables thread T and goes to the end node 790. If the command message is a non-wait instruction, block 780 indicates that the instruction is done and goes to the end node 790.
- Block 760 receives the response message for thread T′ as sent from block 680 shown in FIG. 6. Then, block 770 enables thread T′ if thread T′ was disabled and then goes to block 780 to indicate that the instruction processing is done. Next, end node 790 indicates the end of the instruction processing block.
- While one or more threads are waiting for completion of peripheral operations, other threads may continue executing instructions that utilize functional unit and memory resources. In this way, higher average performance of the multi-thread processors may be achieved.
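- The interplay of FIG. 6 and FIG. 7 can be approximated by a small cycle-level simulation. The sketch below rests on simplifying assumptions that are not in the description: one instruction issues per cycle, the lowest-numbered enabled thread is always chosen, each peripheral operation has a fixed latency per thread, and the names `simulate`, `msg_wait`, `msg_nowait`, and `alu` are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate(programs: Dict[int, List[str]],
             latency: Dict[int, int]) -> List[Tuple[int, int, str]]:
    """Return the (cycle, thread, instruction) issue trace.

    programs[t] is the instruction list of thread t; latency[t] is the assumed
    number of cycles the peripheral needs to answer a message from thread t.
    """
    pc = {t: 0 for t in programs}
    enabled = {t: True for t in programs}
    pending: List[Tuple[int, int]] = []  # (completion cycle, thread)
    trace: List[Tuple[int, int, str]] = []
    cycle = 0
    while pending or any(pc[t] < len(programs[t]) for t in programs):
        # Blocks 760/770: a response for thread T' re-enables T' if it was disabled.
        for done_at, t in pending:
            if done_at <= cycle:
                enabled[t] = True
        pending = [(d, t) for d, t in pending if d > cycle]
        # Block 720: pick a runnable thread (assumed policy: lowest thread number).
        runnable = [t for t in sorted(programs)
                    if enabled[t] and pc[t] < len(programs[t])]
        if runnable:
            t = runnable[0]
            op = programs[t][pc[t]]
            pc[t] += 1
            trace.append((cycle, t, op))
            if op in ("msg_wait", "msg_nowait"):
                pending.append((cycle + latency[t], t))  # block 740: send command
                if op == "msg_wait":
                    enabled[t] = False                   # block 750: disable thread
        cycle += 1
    return trace
```

With two threads that both issue a wait message, the thread with the shorter peripheral latency resumes first even if it sent its command later, which mirrors the out-of-order completion of threads T and T′ described for the peripheral unit.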
- While this invention has been described with reference to illustrative embodiments, this description is not intended to be construed in a limiting sense. Various modifications of the illustrative embodiments, as well as other embodiments of the invention, which are apparent to persons skilled in the art to which the invention pertains are deemed to lie within the spirit and scope of the invention.
Claims (16)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US12/410,760 US20090235258A1 (en) | 1999-11-19 | 2009-03-25 | Multi-Thread Peripheral Processing Using Dedicated Peripheral Bus |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16668699P | 1999-11-19 | 1999-11-19 | |
US09/715,772 US7512724B1 (en) | 1999-11-19 | 2000-11-17 | Multi-thread peripheral processing using dedicated peripheral bus |
US12/410,760 US20090235258A1 (en) | 1999-11-19 | 2009-03-25 | Multi-Thread Peripheral Processing Using Dedicated Peripheral Bus |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/715,772 Continuation US7512724B1 (en) | 1999-11-19 | 2000-11-17 | Multi-thread peripheral processing using dedicated peripheral bus |
Publications (1)
Publication Number | Publication Date |
---|---|
US20090235258A1 true US20090235258A1 (en) | 2009-09-17 |
Family
ID=40474138
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/715,772 Active 2025-07-31 US7512724B1 (en) | 1999-11-19 | 2000-11-17 | Multi-thread peripheral processing using dedicated peripheral bus |
US12/410,760 Abandoned US20090235258A1 (en) | 1999-11-19 | 2009-03-25 | Multi-Thread Peripheral Processing Using Dedicated Peripheral Bus |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US09/715,772 Active 2025-07-31 US7512724B1 (en) | 1999-11-19 | 2000-11-17 | Multi-thread peripheral processing using dedicated peripheral bus |
Country Status (1)
Country | Link |
---|---|
US (2) | US7512724B1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090232154A1 (en) * | 1999-11-19 | 2009-09-17 | Government Agency - The United States Of America As Represented By The Secretary Of The Navy | Prioritizing Resource Utilization In Multi-thread Computing System |
Families Citing this family (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7512724B1 (en) * | 1999-11-19 | 2009-03-31 | The United States Of America As Represented By The Secretary Of The Navy | Multi-thread peripheral processing using dedicated peripheral bus |
US7716521B1 (en) * | 2005-05-06 | 2010-05-11 | Oracle America, Inc. | Multiple-core, multithreaded processor with flexible error steering mechanism |
US8190982B2 (en) * | 2006-09-29 | 2012-05-29 | University Of Connecticut | Error-tolerant multi-threaded memory systems with reduced error accumulation |
US20080181254A1 (en) * | 2007-01-25 | 2008-07-31 | Inventec Corporation | Data transmission method |
US8271967B2 (en) * | 2008-06-09 | 2012-09-18 | Ricoh Company, Ltd. | MFP software update using web service |
US9058206B2 (en) * | 2008-06-19 | 2015-06-16 | Freescale Semiconductor, Inc. | System, method and program product for determining execution flow of the scheduler in response to setting a scheduler control variable by the debugger or by a processing entity |
US20110099552A1 (en) * | 2008-06-19 | 2011-04-28 | Freescale Semiconductor, Inc | System, method and computer program product for scheduling processor entity tasks in a multiple-processing entity system |
US8966490B2 (en) * | 2008-06-19 | 2015-02-24 | Freescale Semiconductor, Inc. | System, method and computer program product for scheduling a processing entity task by a scheduler in response to a peripheral task completion indicator |
US9075623B2 (en) * | 2012-01-18 | 2015-07-07 | International Business Machines Corporation | External auxiliary execution unit interface for format conversion of instruction from issue unit to off-chip auxiliary execution unit |
US9665372B2 (en) | 2014-05-12 | 2017-05-30 | International Business Machines Corporation | Parallel slice processor with dynamic instruction stream mapping |
US9672043B2 (en) | 2014-05-12 | 2017-06-06 | International Business Machines Corporation | Processing of multiple instruction streams in a parallel slice processor |
US9760375B2 (en) | 2014-09-09 | 2017-09-12 | International Business Machines Corporation | Register files for storing data operated on by instructions of multiple widths |
US9720696B2 (en) | 2014-09-30 | 2017-08-01 | International Business Machines Corporation | Independent mapping of threads |
US9977678B2 (en) | 2015-01-12 | 2018-05-22 | International Business Machines Corporation | Reconfigurable parallel execution and load-store slice processor |
US10133576B2 (en) | 2015-01-13 | 2018-11-20 | International Business Machines Corporation | Parallel slice processor having a recirculating load-store queue for fast deallocation of issue queue entries |
US10133581B2 (en) | 2015-01-13 | 2018-11-20 | International Business Machines Corporation | Linkable issue queue parallel execution slice for a processor |
US9983875B2 (en) | 2016-03-04 | 2018-05-29 | International Business Machines Corporation | Operation of a multi-slice processor preventing early dependent instruction wakeup |
US10037211B2 (en) | 2016-03-22 | 2018-07-31 | International Business Machines Corporation | Operation of a multi-slice processor with an expanded merge fetching queue |
US10346174B2 (en) | 2016-03-24 | 2019-07-09 | International Business Machines Corporation | Operation of a multi-slice processor with dynamic canceling of partial loads |
US10761854B2 (en) | 2016-04-19 | 2020-09-01 | International Business Machines Corporation | Preventing hazard flushes in an instruction sequencing unit of a multi-slice processor |
US10037229B2 (en) | 2016-05-11 | 2018-07-31 | International Business Machines Corporation | Operation of a multi-slice processor implementing a load/store unit maintaining rejected instructions |
US9934033B2 (en) | 2016-06-13 | 2018-04-03 | International Business Machines Corporation | Operation of a multi-slice processor implementing simultaneous two-target loads and stores |
US10042647B2 (en) | 2016-06-27 | 2018-08-07 | International Business Machines Corporation | Managing a divided load reorder queue |
US10318419B2 (en) | 2016-08-08 | 2019-06-11 | International Business Machines Corporation | Flush avoidance in a load store unit |
GB2565338B (en) | 2017-08-10 | 2020-06-03 | Mips Tech Llc | Fault detecting and fault tolerant multi-threaded processors |
US11645178B2 (en) | 2018-07-27 | 2023-05-09 | MIPS Tech, LLC | Fail-safe semi-autonomous or autonomous vehicle processor array redundancy which permits an agent to perform a function based on comparing valid output from sets of redundant processors |
US11755785B2 (en) * | 2020-08-03 | 2023-09-12 | Nxp Usa, Inc. | System and method of limiting access of processors to hardware resources |
Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4106092A (en) * | 1976-09-30 | 1978-08-08 | Burroughs Corporation | Interface system providing interfaces to central processing unit and modular processor-controllers for an input-output subsystem |
US4394727A (en) * | 1981-05-04 | 1983-07-19 | International Business Machines Corporation | Multi-processor task dispatching apparatus |
US4449182A (en) * | 1981-10-05 | 1984-05-15 | Digital Equipment Corporation | Interface between a pair of processors, such as host and peripheral-controlling processors in data processing systems |
US5168547A (en) * | 1989-12-29 | 1992-12-01 | Supercomputer Systems Limited Partnership | Distributed architecture for input/output for a multiprocessor system |
US5179702A (en) * | 1989-12-29 | 1993-01-12 | Supercomputer Systems Limited Partnership | System and method for controlling a highly parallel multiprocessor using an anarchy based scheduler for parallel execution thread scheduling |
US5203002A (en) * | 1989-12-27 | 1993-04-13 | Wetzel Glen F | System with a multiport memory and N processing units for concurrently/individually executing 2N-multi-instruction-words at first/second transitions of a single clock cycle |
US5226131A (en) * | 1989-12-27 | 1993-07-06 | The United States Of America As Represented By The United States Department Of Energy | Sequencing and fan-out mechanism for causing a set of at least two sequential instructions to be performed in a dataflow processing computer |
US5418917A (en) * | 1990-06-29 | 1995-05-23 | Hitachi, Ltd. | Method and apparatus for controlling conditional branch instructions for a pipeline type data processing apparatus |
US5421014A (en) * | 1990-07-13 | 1995-05-30 | I-Tech Corporation | Method for controlling multi-thread operations issued by an initiator-type device to one or more target-type peripheral devices |
US5452452A (en) * | 1990-06-11 | 1995-09-19 | Cray Research, Inc. | System having integrated dispatcher for self scheduling processors to execute multiple types of processes |
US5524255A (en) * | 1989-12-29 | 1996-06-04 | Cray Research, Inc. | Method and apparatus for accessing global registers in a multiprocessor system |
US5560029A (en) * | 1991-07-22 | 1996-09-24 | Massachusetts Institute Of Technology | Data processing system with synchronization coprocessor for multiple threads |
US5812811A (en) * | 1995-02-03 | 1998-09-22 | International Business Machines Corporation | Executing speculative parallel instructions threads with forking and inter-thread communication |
US5815727A (en) * | 1994-12-20 | 1998-09-29 | Nec Corporation | Parallel processor for executing plural thread program in parallel using virtual thread numbers |
US5913059A (en) * | 1996-08-30 | 1999-06-15 | Nec Corporation | Multi-processor system for inheriting contents of register from parent thread to child thread |
US5938765A (en) * | 1997-08-29 | 1999-08-17 | Sequent Computer Systems, Inc. | System and method for initializing a multinode multiprocessor computer system |
US5951672A (en) * | 1997-07-02 | 1999-09-14 | International Business Machines Corporation | Synchronization method for work distribution in a multiprocessor system |
US5974438A (en) * | 1996-12-31 | 1999-10-26 | Compaq Computer Corporation | Scoreboard for cached multi-thread processes |
US6047355A (en) * | 1993-04-30 | 2000-04-04 | Intel Corporation | Symmetric multiprocessing system with unified environment and distributed system functions |
US6047122A (en) * | 1992-05-07 | 2000-04-04 | Tm Patents, L.P. | System for method for performing a context switch operation in a massively parallel computer system |
US6061710A (en) * | 1997-10-29 | 2000-05-09 | International Business Machines Corporation | Multithreaded processor incorporating a thread latch register for interrupt service new pending threads |
US6389446B1 (en) * | 1996-07-12 | 2002-05-14 | Nec Corporation | Multi-processor system executing a plurality of threads simultaneously and an execution method therefor |
US6535936B2 (en) * | 1999-06-30 | 2003-03-18 | Adaptec, Inc. | SCSI phase status register for use in reducing instructions executed by an on-chip sequencer in asserting a SCSI acknowledge signal and method |
US6668317B1 (en) * | 1999-08-31 | 2003-12-23 | Intel Corporation | Microengine for parallel processor architecture |
US6772412B2 (en) * | 2000-03-16 | 2004-08-03 | Omron Corporation | Data processing device equipped with a thread switching circuit |
US7512724B1 (en) * | 1999-11-19 | 2009-03-31 | The United States Of America As Represented By The Secretary Of The Navy | Multi-thread peripheral processing using dedicated peripheral bus |
- 2000-11-17: US 09/715,772 (US7512724B1), Active
- 2009-03-25: US 12/410,760 (US20090235258A1), Abandoned
Patent Citations (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4106092A (en) * | 1976-09-30 | 1978-08-08 | Burroughs Corporation | Interface system providing interfaces to central processing unit and modular processor-controllers for an input-output subsystem |
US4394727A (en) * | 1981-05-04 | 1983-07-19 | International Business Machines Corporation | Multi-processor task dispatching apparatus |
US4449182A (en) * | 1981-10-05 | 1984-05-15 | Digital Equipment Corporation | Interface between a pair of processors, such as host and peripheral-controlling processors in data processing systems |
US4449182B1 (en) * | 1981-10-05 | 1989-12-12 | ||
US5203002A (en) * | 1989-12-27 | 1993-04-13 | Wetzel Glen F | System with a multiport memory and N processing units for concurrently/individually executing 2N-multi-instruction-words at first/second transitions of a single clock cycle |
US5226131A (en) * | 1989-12-27 | 1993-07-06 | The United States Of America As Represented By The United States Department Of Energy | Sequencing and fan-out mechanism for causing a set of at least two sequential instructions to be performed in a dataflow processing computer |
US5524255A (en) * | 1989-12-29 | 1996-06-04 | Cray Research, Inc. | Method and apparatus for accessing global registers in a multiprocessor system |
US5168547A (en) * | 1989-12-29 | 1992-12-01 | Supercomputer Systems Limited Partnership | Distributed architecture for input/output for a multiprocessor system |
US5179702A (en) * | 1989-12-29 | 1993-01-12 | Supercomputer Systems Limited Partnership | System and method for controlling a highly parallel multiprocessor using an anarchy based scheduler for parallel execution thread scheduling |
US6195676B1 (en) * | 1989-12-29 | 2001-02-27 | Silicon Graphics, Inc. | Method and apparatus for user side scheduling in a multiprocessor operating system program that implements distributive scheduling of processes |
US5452452A (en) * | 1990-06-11 | 1995-09-19 | Cray Research, Inc. | System having integrated dispatcher for self scheduling processors to execute multiple types of processes |
US5418917A (en) * | 1990-06-29 | 1995-05-23 | Hitachi, Ltd. | Method and apparatus for controlling conditional branch instructions for a pipeline type data processing apparatus |
US5421014A (en) * | 1990-07-13 | 1995-05-30 | I-Tech Corporation | Method for controlling multi-thread operations issued by an initiator-type device to one or more target-type peripheral devices |
US5560029A (en) * | 1991-07-22 | 1996-09-24 | Massachusetts Institute Of Technology | Data processing system with synchronization coprocessor for multiple threads |
US6047122A (en) * | 1992-05-07 | 2000-04-04 | Tm Patents, L.P. | System for method for performing a context switch operation in a massively parallel computer system |
US6047355A (en) * | 1993-04-30 | 2000-04-04 | Intel Corporation | Symmetric multiprocessing system with unified environment and distributed system functions |
US5815727A (en) * | 1994-12-20 | 1998-09-29 | Nec Corporation | Parallel processor for executing plural thread program in parallel using virtual thread numbers |
US5812811A (en) * | 1995-02-03 | 1998-09-22 | International Business Machines Corporation | Executing speculative parallel instructions threads with forking and inter-thread communication |
US6389446B1 (en) * | 1996-07-12 | 2002-05-14 | Nec Corporation | Multi-processor system executing a plurality of threads simultaneously and an execution method therefor |
US5913059A (en) * | 1996-08-30 | 1999-06-15 | Nec Corporation | Multi-processor system for inheriting contents of register from parent thread to child thread |
US5974438A (en) * | 1996-12-31 | 1999-10-26 | Compaq Computer Corporation | Scoreboard for cached multi-thread processes |
US5951672A (en) * | 1997-07-02 | 1999-09-14 | International Business Machines Corporation | Synchronization method for work distribution in a multiprocessor system |
US5938765A (en) * | 1997-08-29 | 1999-08-17 | Sequent Computer Systems, Inc. | System and method for initializing a multinode multiprocessor computer system |
US6061710A (en) * | 1997-10-29 | 2000-05-09 | International Business Machines Corporation | Multithreaded processor incorporating a thread latch register for interrupt service new pending threads |
US6535936B2 (en) * | 1999-06-30 | 2003-03-18 | Adaptec, Inc. | SCSI phase status register for use in reducing instructions executed by an on-chip sequencer in asserting a SCSI acknowledge signal and method |
US6668317B1 (en) * | 1999-08-31 | 2003-12-23 | Intel Corporation | Microengine for parallel processor architecture |
US7512724B1 (en) * | 1999-11-19 | 2009-03-31 | The United States Of America As Represented By The Secretary Of The Navy | Multi-thread peripheral processing using dedicated peripheral bus |
US6772412B2 (en) * | 2000-03-16 | 2004-08-03 | Omron Corporation | Data processing device equipped with a thread switching circuit |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090232154A1 (en) * | 1999-11-19 | 2009-09-17 | Government Agency - The United States Of America As Represented By The Secretary Of The Navy | Prioritizing Resource Utilization In Multi-thread Computing System |
US8531955B2 (en) | 1999-11-19 | 2013-09-10 | The United States Of America As Represented By The Secretary Of The Navy | Prioritizing resource utilization in multi-thread computing system |
Also Published As
Publication number | Publication date |
---|---|
US7512724B1 (en) | 2009-03-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20090235258A1 (en) | Multi-Thread Peripheral Processing Using Dedicated Peripheral Bus | |
US8531955B2 (en) | Prioritizing resource utilization in multi-thread computing system | |
EP1242883B1 (en) | Allocation of data to threads in multi-threaded network processor | |
US7058735B2 (en) | Method and apparatus for local and distributed data memory access (“DMA”) control | |
US6912610B2 (en) | Hardware assisted firmware task scheduling and management | |
US7376952B2 (en) | Optimizing critical section microblocks by controlling thread execution | |
EP0473777B1 (en) | High-speed packet switching apparatus and method | |
US7831974B2 (en) | Method and apparatus for serialized mutual exclusion | |
US6671827B2 (en) | Journaling for parallel hardware threads in multithreaded processor | |
US5596331A (en) | Real-time control sequencer with state matrix logic | |
US6560667B1 (en) | Handling contiguous memory references in a multi-queue system | |
US20060168283A1 (en) | Programmable network protocol handler architecture | |
US20040205747A1 (en) | Breakpoint for parallel hardware threads in multithreaded processor | |
US20050289238A1 (en) | Data transfer, synchronising applications, and low latency networks | |
JPH09325947A (en) | Method and device for atomically transferring command and data information to device | |
JP2001142842A (en) | Dma handshake protocol | |
EP1302854B1 (en) | Asynchronous Data transfer | |
US7761688B1 (en) | Multiple thread in-order issue in-order completion DSP and micro-controller | |
US6880047B2 (en) | Local emulation of data RAM utilizing write-through cache hardware within a CPU module | |
CA2042171C (en) | High-speed packet switching apparatus and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment | Owner name: UNITED STATES OF AMERICA AS REPRESENTED BY THE SEC; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: ACORN NETWORKS, INC.; REEL/FRAME: 022534/0045; Effective date: 20020611. Owner name: ACORN NETWORKS, INC., VIRGINIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: DENNIS, JACK B.; SANDBOTE, SAM B.; REEL/FRAME: 022534/0033; Effective date: 20001117 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |