DE19816153B4 - Bus interconnect system for graphic processing system - has individual buses coupling graphics processing elements in ring, with signal lines for transferring command signals between elements


Info

Publication number
DE19816153B4
DE19816153B4 (application DE1998116153; also published as DE19816153A1)
Authority
DE
Germany
Prior art keywords
information
bus
processing element
data
register
Prior art date
Legal status
Expired - Fee Related
Application number
DE1998116153
Other languages
German (de)
Other versions
DE19816153A1 (en)
Inventor
Roy R. Faget
Ronald D. Larson
Current Assignee
Hewlett Packard Development Co LP
Original Assignee
Hewlett Packard Development Co LP
Priority date
Filing date
Publication date
Priority to US 847271
Priority to US 08/847,271 (US5911056A)
Application filed by Hewlett Packard Development Co LP
Priority claimed from DE19861337A (DE19861337B4)
Priority to DE19861337A (DE19861337B4)
Publication of DE19816153A1
Publication of DE19816153B4
Application granted
Anticipated expiration
Status: Expired - Fee Related


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00: General purpose image data processing
    • G06T1/20: Processor architectures; Processor configuration, e.g. pipelining

Abstract

The system includes a bus structure coupling a number of graphics processing elements (202, 222, 242) in a ring. The bus structure comprises a number of individual buses (250, 252, 254), each connecting a pair of graphics processing elements. Each individual bus comprises a number of signal lines (250A-250F) for transferring graphics command signals and information signals between graphics processing elements in the ring.

Description

  • The present invention relates to a bus interconnect system for pipelined digital data transmission.
  • Computer graphics systems are commonly used to display graphical representations of objects on a two-dimensional display screen. Current computer graphics systems can deliver highly detailed representations and are used by a variety of applications.
  • In typical computer graphics systems, an object to be displayed on a computer display screen is divided into a plurality of graphics primitives. Primitives are basic components of a graphics image, such as points, lines, vectors and polygons (e.g., triangles). Typically, a hardware/software scheme implements the graphics primitives to render (or draw) a view of one or more objects on the two-dimensional display screen.
  • A host computer generally provides primitive data describing the primitives of a three-dimensional object to be rendered. For example, if the primitive is a triangle, the host computer may define the triangle by the x, y, z coordinates and the red, green, blue color values (R, G, B color values) of each vertex. Rendering hardware interpolates the primitive data to compute the display screen pixels that form each primitive and the R, G, B color values for each pixel.
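As a concrete illustration of the interpolation step described above (not taken from the patent itself), the following sketch computes the color of one pixel inside a triangle from the per-vertex R, G, B values using barycentric weights; all coordinates and values are invented for the example.

```python
# Hypothetical sketch of interpolating per-vertex colors across a triangle.
# The triangle, pixel and colors below are illustrative, not from the patent.

def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)  # signed 2x area
    w_a = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area
    w_b = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area
    return w_a, w_b, 1.0 - w_a - w_b

def shade_pixel(p, verts, colors):
    """Interpolate per-vertex RGB colors at pixel p."""
    wa, wb, wc = barycentric(p, *verts)
    return tuple(wa * ca + wb * cb + wc * cc
                 for ca, cb, cc in zip(*colors))

verts = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]  # red, green, blue vertices
print(shade_pixel((2.0, 2.0), verts, colors))     # reddish interior pixel
```

A rasterizer performs this computation (in fixed-point hardware) for every pixel covered by the primitive.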
  • The basic components of a typical computer graphics system include a geometry accelerator, a rasterizer and a frame buffer. The system may also include further hardware components, such as texture mapping hardware (described below). The geometry accelerator receives vertex coordinate and color data from the host computer for the primitives that form an object. The geometry accelerator typically performs transformations on the vertex coordinate data (i.e., into screen space coordinates), divides quadrilaterals into triangles, and may perform further functions, such as lighting, clipping and plane equation calculations, for each primitive. The output of the geometry accelerator, referred to as rendering data, is used by the rasterizer (and optionally by texture mapping hardware) to compute final screen space coordinates and R, G, B color values for each pixel forming the primitives. The final data is stored in the frame buffer for display on a display screen. Some graphics systems are pipelined such that various operations (such as transformations, interpolations, etc.) are carried out simultaneously by different components on different object primitives.
  • More technically sophisticated systems optionally provide texture mapping so that objects can be displayed with improved surface detail. Texture mapping is a method that involves mapping a source image, referred to as a texture, onto a surface of a three-dimensional object and then mapping the textured three-dimensional object onto the two-dimensional graphics display screen to display the resulting image. Texture mapping involves applying one or more point elements (texels) of a texture to each point element (pixel) of the displayed portion of the object onto which the texture is mapped. Texture mapping hardware subsystems typically include a local memory that stores the texture data associated with the portion of the object being processed.
  • Pipelined graphics systems, particularly those that provide data-intensive texture mapping, typically have complex bus structures over which data is communicated between the various components, often compromising system bandwidth. For example, in such systems it is common practice for data paths (buses) to be reserved for particular types of data and operations, leaving those paths otherwise unused. In the absence of such reserved paths, it may be necessary for a pipelined main processing bus system to be flushed in order to perform certain operations, such as downloading texture data from the host into a local texture map memory. In systems having multiple parallel-connected chips, the bus structure often also includes a separate bus for each of these chips, further reducing system bandwidth.
  • System bandwidth is directly related to system performance. Due to technical advances, host processors are able to provide primitive data to the graphics processing chips at increased speeds. To keep pace with the improved technology of the host processors, a bus structure is required that is capable of handling data transfers with a high bandwidth.
  • Various transmission and network systems are known in the art in which communication takes place via a ring interconnect structure or a ring interconnect bus. For example, EP 032211682 A describes a connection system for a multiprocessor structure in which the cells are coupled in a ring configuration on a communication bus.
  • From US 5,504,918 A, a parallel processing system is known in which the processors communicate with each other via a ring network.
  • Similarly, US 4,536,873 A discloses a data transmission system having a plurality of elements that communicate with each other by passing a control message along a ring structure.
  • It is the object of the present invention to create an improved bus interconnect system for a pipelined computer graphics system in which the system bandwidth is maximized.
  • This object is achieved by a bus interconnect system according to claim 1.
  • Preferred embodiments of the present invention will be explained in more detail below with reference to the accompanying drawings, in which:
  • Figs. 1-3 are block diagrams of pipelined computer graphics systems using the bus architecture of the invention;
  • Fig. 4 is a block diagram illustrating a general embodiment of the bus architecture of the invention;
  • Fig. 5 is a timing diagram showing the operating phases of the system clock used in the invention;
  • Fig. 6 is a more detailed block diagram of one of the interface circuits shown in Fig. 4;
  • Fig. 7 is a timing diagram showing how address and data information can be clocked into the various registers used in the interface circuit shown in Fig. 6;
  • Fig. 8 is a timing diagram illustrating the delay between the time when an interface circuit deactivates its Buffered_Ready signal to an upstream processing element and the time when the upstream processing element stops sending data to the interface circuit;
  • Fig. 9 is a timing diagram showing the delay between the time when an interface circuit reactivates its Buffered_Ready signal to an upstream processing element and the time when the upstream processing element begins to send valid data to the interface circuit;
  • Fig. 10 is a more detailed block diagram of the buffered and unbuffered write FIFOs shown in Fig. 6;
  • Fig. 11 is a timing diagram showing the relationship between various signals used to control the buffered and unbuffered write FIFOs shown in Fig. 10;
  • Fig. 12 is a more detailed block diagram of the buffered and unbuffered read FIFOs shown in Fig. 6, further showing how these FIFOs can operate when a processing element is configured in a redirect mode;
  • Fig. 13 is a more detailed block diagram of the buffered and unbuffered read FIFOs shown in Fig. 6, showing how these FIFOs can operate when no processing element is configured in a redirect mode; and
  • Fig. 14 is a timing diagram illustrating the relationship between various signals used to control the buffered and unbuffered read FIFOs shown in Figs. 12 and 13.
  • 1. System overview
  • Figs. 1-3 are block diagrams showing different exemplary embodiments of computer graphics systems that utilize a bus architecture and a data transfer protocol in accordance with the present invention. Each system has a different number of components and thus offers a different level of performance; two of the systems shown also provide an additional texture mapping feature. It should be understood that the systems shown are illustrative and in no way limiting, and that any data transfer system could use the bus architecture and data transfer protocol of the invention.
  • Fig. 1 shows the simplest of the three systems. As shown, the system includes a host computer 100, an upstream subsystem 102 and a frame buffer subsystem 104. The upstream subsystem 102 receives the primitives to be processed from the host computer 100 over a bus 101. The primitives are typically characterized by x, y, z coordinate data, RGB color data and alpha blending data for particular portions of the primitives, such as the triangle vertices.
  • Data representing the primitives in three dimensions is supplied from the upstream subsystem 102 over a bus 122, which includes segments 122A, 122B and 122C, to the frame buffer subsystem 104. The frame buffer subsystem 104 interpolates the rendering data received from the upstream subsystem 102 to calculate the pixels on the display screen that will represent each primitive and to determine the resulting object RGB color values for each pixel. RGB color control signals for each pixel are provided via RGB lines 123 to control the pixels of the display screen (not shown) to display a resulting image thereon.
  • In the embodiment shown in Fig. 1, the upstream subsystem 102 includes a host interface 106 and a three-dimensional geometry accelerator (3D geometry accelerator) 110. As noted, the host interface 106 receives the x, y, z coordinate and color primitive data over the bus 101 from the host computer 100. This data is supplied by the host interface 106 over a bus 108 to the geometry accelerator 110. The geometry accelerator 110 performs conventional geometry accelerator functions that result in rendering data for a display. These functions may include three-dimensional transformation, lighting, clipping and perspective division operations, as well as plane equation generation, performed in floating-point format. The rendering data is supplied by the geometry accelerator 110 over a bus 112 to the host interface 106, which reformats the rendering data, performs a floating-point to fixed-point conversion, and supplies this data over the bus system 122 to the frame buffer subsystem 104.
  • In this embodiment, the frame buffer subsystem 104 includes two frame buffer controllers 114A and 114B, each having a synchronous graphics random access memory (SGRAM) 116A and 116B, as well as a random access memory digital-to-analog converter (RAMDAC) 120. Both the frame buffer controllers 114A and 114B and the host interface 106 are connected to the bus system 122. In this embodiment, the bus 122 includes three bus segments 122A, 122B and 122C that are identical to each other. The bus architecture 122 and the associated data transfer protocol, which will be discussed in more detail below, provide improved bandwidth over prior art architectures and protocols.
  • In the embodiment of Fig. 1, each frame buffer controller 114A and 114B receives rendering data from the host interface 106. Each frame buffer controller may control a different, non-overlapping segment of the display screen. The frame buffer controllers may interpolate the primitive data to compute the screen display pixel coordinates representing the primitive and the corresponding object RGB color values for each pixel coordinate.
  • The resulting image video data generated by the frame buffer controllers 114A and 114B, which includes the RGB values for each pixel, may be stored in the corresponding SGRAMs 116A and 116B. The video data may be read from the SGRAM chips into the frame buffer controllers, reformatted so that it can be handled by the RAMDAC 120, and supplied to the RAMDAC. The RAMDAC 120, in turn, may convert the digital color data into analog RGB color control signals that are provided via the RGB lines 123 for each pixel to control a display screen (not shown).
  • The host interface 106 can also communicate directly with the RAMDAC 120 via a video bus 124. The system is preferably pipelined such that the frame buffer subsystem 104 can process a first primitive while the upstream subsystem 102 processes a (temporally) subsequent primitive.
  • The bus system of the present invention may further be used in the graphics system shown in Fig. 2. The system shown in Fig. 2 corresponds to that of Fig. 1 with the exceptions that: (1) two 3D geometry accelerators 110A and 110B are provided in the upstream subsystem 102, (2) a texture mapping subsystem 130 is added, and (3) the capacity of each SGRAM memory 116A and 116B has been increased. With the two geometry accelerators 110A and 110B, the primitive data is assigned to the geometry accelerators according to the data transfer protocol of the invention, which will be described in more detail below.
  • The texture mapping subsystem 130 may be any subsystem that performs texture mapping operations and, in this exemplary embodiment, includes a texture mapping circuit 132 and an associated local cache memory 134 that stores a limited amount of texture data.
  • In this embodiment, the bus 122 includes segments 122A-122D. The texture mapping circuit 132 is connected by the bus 122 between the host interface 106 and the frame buffer controller 114A. During operation, the texture mapping circuit 132, like both frame buffer controller circuits 114A and 114B, receives primitive data on the bus 122A. The data may include x, y, z object pixel coordinates, object RGB color values and S, T texture map coordinates for at least one vertex, as well as the plane equation of the primitive (e.g., a triangle). The texture mapping circuit 132 can interpolate the x, y pixel coordinates and the S, T texture coordinates to calculate the resulting texture data for each pixel. The texture data for a given pixel may already be stored in the cache memory 134; if so, the texture data is read from the cache 134. If the required texture data is not present in the cache memory 134 at that time, it is downloaded from the host computer in an efficient manner in accordance with the present invention, as described in more detail below, without flushing the rendering pipeline and without requiring a reserved texture data path.
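The cache-hit/miss behavior just described can be sketched as follows. This is an illustrative model only: the block size, the dict-based cache and the host-fetch callback are assumptions made for the example, not the patent's mechanism.

```python
# Hedged sketch of the texel fetch path: try the local cache memory first
# (cf. cache 134) and download the texel block from the host only on a miss.

class TexelCache:
    def __init__(self, fetch_from_host, block_size=8):
        self.fetch_from_host = fetch_from_host  # callback: block_id -> texels
        self.block_size = block_size
        self.blocks = {}                        # local cache contents
        self.misses = 0

    def texel(self, s, t, width):
        index = t * width + s
        block_id = index // self.block_size
        if block_id not in self.blocks:         # miss: download from host
            self.misses += 1
            self.blocks[block_id] = self.fetch_from_host(block_id)
        return self.blocks[block_id][index % self.block_size]

texture = list(range(64))                       # stand-in 8x8 texture
cache = TexelCache(lambda b: texture[b * 8:(b + 1) * 8])
print(cache.texel(3, 2, 8), cache.misses)       # first access misses
print(cache.texel(4, 2, 8), cache.misses)       # same block: served locally
```

The point of the patent's scheme is that the miss path (the download) shares the ring bus with normal rendering traffic instead of requiring a reserved path.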
  • The texture data for each pixel can be supplied over a texel bus 136 to each frame buffer controller 114A and 114B, in which it is combined by each frame buffer controller on a pixel-by-pixel basis with the object RGB color values.
  • It should be apparent that the upstream subsystem 102, the frame buffer subsystem 104 and the texture mapping subsystem 130 can be any currently known or later developed subsystems. In addition, each of these subsystems preferably has a pipeline structure and processes multiple primitives simultaneously. For example, while the texture mapping subsystem 130 and the frame buffer subsystem 104 process primitives previously provided by the upstream subsystem 102, the upstream subsystem 102 continues to process new primitives until the downstream pipelines become full.
  • The bus architecture of the present invention is configured such that different types of graphics processing chips are interchangeable within the architecture. That is, any chip connected to the bus 122 could perform any graphics function, such as texture mapping, frame buffer control, or several of these functions.
  • Fig. 3 is a block diagram showing another embodiment of a graphics system using the bus architecture of the present invention. In the system of Fig. 3, the upstream subsystem 102 includes three 3D geometry accelerators 110A, 110B and 110C; the frame buffer subsystem 104 includes four frame buffer controllers 114A-114D, each having an associated SGRAM memory 116A-116D; and the texture mapping subsystem 130 includes two texture mapping circuits 132 and 133, each having an associated cache memory 134 and 135.
  • Bus segments 122A-G of the bus architecture 122 of the present invention connect the host interface 106, each of the texture mapping circuits 132 and 133, and each of the frame buffer controllers 114A-114D. A texel bus 137 connects the texture mapping circuit 133 with the frame buffer controllers 114C and 114D. The operation corresponds to that described with reference to Fig. 2.
  • 2. The bus connection system
  • Referring now to Fig. 4, a block diagram of a bus interconnect system 200 in accordance with the present invention is shown. The bus interconnect system, which can be used to connect multiple graphics chips (e.g., frame buffer controllers), includes multiple processing elements 202, 222 and 242. The processing element 202 is connected via a PCI (Peripheral Component Interconnect) bus 201 interface to an external host device (not shown), e.g., a PC. Each of the processing elements 202, 222 and 242 preferably comprises an ASIC (Application Specific Integrated Circuit) chip, the core of which may be configured as any graphics processing device, such as a frame buffer controller, a texture mapping device, etc.
  • Examples of systems that could use the bus interconnect network 200 are shown in Figs. 1-3. In Fig. 1, for example, the host interface 106 is connected over the bus 101 with the host computer 100, and the host interface 106 is connected in a ring with the frame buffer controllers 114A and 114B. Thus, in this example, the processing element 202 (Fig. 4) could correspond to the host interface 106 (Fig. 1), the PCI bus 201 (Fig. 4) could correspond to the bus 101 (Fig. 1), and the processing elements 222 and 242 (Fig. 4) could correspond to the frame buffer controllers 114A and 114B (Fig. 1), respectively.
  • As shown in Fig. 4, each of the processing elements 202, 222 and 242 includes a core processor 204, 224 and 244, respectively, and an interface circuit 206, 226 and 246, respectively. The core processors of the processing elements may be the same or may differ, while the interface circuits 206, 226 and 246 are (preferably) identical. Between the core processors 204, 224 and 244 and their associated interface circuits 206, 226 and 246, several asynchronous first-in-first-out (FIFO) buffers 208, 210, 212, 214, 228, 230, 232, 234, 248, 250, 252 and 254 are coupled. These asynchronous FIFOs provide buffered (i.e., non-priority) information paths and unbuffered (i.e., priority) information paths both from the interface circuits 206, 226 and 246 to the core processors 204, 224 and 244, and from the core processors 204, 224 and 244 to the interface circuits 206, 226 and 246.
  • The difference between buffered (i.e., non-priority) information and unbuffered (i.e., priority) information, and the handling of each information type on the buffered and unbuffered information paths, will be discussed in more detail below. At this point, however, it should be apparent that two different types of information can be transferred between the processing elements using a shared bus. Each information packet is identified (by setting certain bits in the type field data transmitted concurrently with it) as either buffered (i.e., non-priority) or unbuffered (i.e., priority) information. After the information has been received by an interface circuit, information identified as buffered is transferred to a memory element that receives only buffered information (i.e., a non-priority information storage element), while information identified as unbuffered is transferred to a storage element that receives only unbuffered information (i.e., a priority information storage element).
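The receive-side sorting just described can be sketched in a few lines. The specific bit assignment (bit 0 of the type field marking priority information) is an assumption made for this illustration, not the patent's actual encoding.

```python
# Minimal sketch: an interface circuit inspects the type field transmitted
# with each packet and steers the information into one of two queues.
from collections import deque

PRIORITY_BIT = 0x001  # assumed: set => unbuffered (priority) information

class InterfaceReceiver:
    def __init__(self):
        self.buffered_fifo = deque()    # non-priority storage element
        self.unbuffered_fifo = deque()  # priority storage element

    def receive(self, type_field, info):
        if type_field & PRIORITY_BIT:
            self.unbuffered_fifo.append(info)
        else:
            self.buffered_fifo.append(info)

rx = InterfaceReceiver()
rx.receive(0x000, "draw triangle")      # non-priority rendering traffic
rx.receive(0x001, "texture download")   # priority traffic, kept separate
print(list(rx.unbuffered_fifo), list(rx.buffered_fifo))
```

Because the two types land in separate storage elements, priority traffic can be drained without flushing the non-priority traffic queued ahead of it.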
  • By controlling the interface circuits in the ring so that unbuffered (i.e., priority) information always takes precedence over buffered (i.e., non-priority) information, while the non-priority information is held in an information path that is kept separate from the unbuffered (i.e., priority) information path until the unbuffered (i.e., priority) information is processed, significant advantages are obtained over systems in which a pipeline must be "flushed" before priority information can be forwarded through it.
  • It should be noted that, although the priority information is identified herein as "unbuffered", it is nevertheless processed by information storage elements and, as such, is "buffered" in the general sense of the word. Nevertheless, the priority information is referred to herein as unbuffered, since it is, so to speak, unbuffered relative to the non-priority information.
  • Reference is again made to Fig. 4. Using the processing element 222 as an example, a buffered write FIFO (BW-FIFO) 228 (i.e., a non-priority interface output memory element) is coupled between the interface circuit 226 and the core processor 224 to provide a buffered (i.e., non-priority) information path from the interface circuit to the core processor. Similarly, an unbuffered write FIFO (UW-FIFO) 230 (i.e., a priority interface output memory element) is coupled between the interface circuit 226 and the core processor 224 to provide an unbuffered (i.e., priority) information path from the interface circuit to the core processor. In addition, a buffered read FIFO (BR-FIFO) 232 (i.e., a non-priority interface input storage element) is coupled between the core processor 224 and the interface circuit 226 to provide a buffered (i.e., non-priority) information path between the core processor and the interface circuit. Finally, an unbuffered read FIFO (UR-FIFO) 234 (i.e., a priority interface input storage element) is coupled between the core processor 224 and the interface circuit 226 to provide an unbuffered (i.e., priority) information path between the core processor and the interface circuit. The arrangement and operation of the exemplary read and write FIFOs 228, 230, 232 and 234 are shown and described in more detail below.
  • The processing elements 202, 222 and 242 are interconnected by means of a unidirectional bus that includes the bus segments 250, 252 and 254. The bus segments 250, 252 and 254 of the interconnect network 200 have an identical structure and an identical width. As shown, the bus segments 250, 252 and 254 connect the processing elements 202, 222 and 242 in a ring format, with the bus segment 250 connecting the processing elements 202 and 222, the bus segment 252 connecting the processing elements 222 and 242, and the bus segment 254 connecting the processing elements 242 and 202. In this way, information is forwarded from the processing element 202 to the processing element 222, from the processing element 222 to the processing element 242, and from the processing element 242 back to the processing element 202.
  • Using the arrangement shown in Fig. 4, information may be exchanged between the external host device and the processing element 202 (via the PCI bus 201), with information circulating unidirectionally from each of the processing elements 202, 222 and 242 to the processing element following it in the ring. This circular arrangement provides simple point-to-point connections between each adjacent pair of processing elements, such that all that is required is for the output of each interface to drive the input of the interface following it in the ring. Since each processing element drives only one load, signal integrity problems are minimized, as a result of which a high bus bandwidth can be obtained.
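The point-to-point forwarding described above can be modeled as a small sketch: each element drives exactly one downstream neighbor, and a packet hops around the ring until it reaches its destination. Addressing by element id is an assumption made for this illustration.

```python
# Illustrative model of the ring in Fig. 4: 202 -> 222 -> 242 -> 202.

class RingElement:
    def __init__(self, element_id):
        self.element_id = element_id
        self.downstream = None      # the single load this element drives
        self.received = []

    def accept(self, dest_id, info, hops=0):
        """Consume the packet if addressed here; else forward downstream."""
        if dest_id == self.element_id:
            self.received.append(info)
            return hops
        return self.downstream.accept(dest_id, info, hops + 1)

e202, e222, e242 = RingElement(202), RingElement(222), RingElement(242)
e202.downstream, e222.downstream, e242.downstream = e222, e242, e202

print(e202.accept(242, "pixel data"))   # two hops downstream
print(e222.accept(202, "status"))       # wraps around through 242
```

The hop count illustrates the cost noted in the text: each added element contributes only one extra clock cycle of forwarding delay.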
  • In addition, in this embodiment, the information transmitted from one element to another is transmitted together with a clock signal. As a result, the need to provide system-wide clock synchronization is eliminated, and additional processing elements can be accommodated merely by plugging a new processing element into the ring. That is, transmitting the clock signal with the information allows the number of elements on the bus to be virtually unlimited, although the integrity of the clock signal naturally degrades as the number of processing elements in the ring increases.
  • Consequently, the ring interconnect system according to the invention allows an increase in communication bandwidth with a bus of reduced width. High throughput is achieved because information is forwarded quickly through the interfaces of the ring, while the core processors of each element are allowed to process commands asynchronously to the information transfer rate. Signal routing and signal integrity problems are reduced because the bus width is reduced and only one load is present per bus. Furthermore, because the clock signal is transmitted with the information in the ring arrangement, processing elements can be added to or removed from the ring network without negative impact on the synchronization of the system. Due to the short time each processing element requires to take information off the bus, adding processing elements to the ring results in only one additional clock cycle per added processing element, causing a minimal additional delay.
  • 3. The bus connection
  • Referring further to Fig. 4, one function of the bus segments 250, 252 and 254 is to transfer information packets between the processing elements 202, 222 and 242. According to one embodiment of the invention, a complete information packet comprises two different sections, each section having 32 information bits. The two sections of each information packet are time-multiplexed when transmitted, so that for each complete packet transmitted, 64 bits of information are actually transferred between the processing elements. For example, a first section of an information packet could carry a 32-bit address and a second section of the packet (immediately following the first section) a 32-bit data word. Further, according to this embodiment, each section of each complete information packet has a 9-bit type data field associated with it, this 9-bit type field being transmitted simultaneously with the information section to which it belongs.
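The packet format above (two time-multiplexed 32-bit sections, each accompanied by a 9-bit type field) can be sketched as a pair of bus cycles. The specific type-field values used here are placeholders, not the patent's encoding.

```python
# Sketch of the time-multiplexed packet: a complete 64-bit packet
# (32-bit address + 32-bit data) is carried as two 32-bit sections,
# each paired with its own 9-bit type field.

MASK32 = 0xFFFFFFFF
MASK9 = 0x1FF

def send_packet(address, data, addr_type=0x0A, data_type=0x0B):
    """Split one complete packet into the two bus cycles that carry it."""
    return [
        (addr_type & MASK9, address & MASK32),  # cycle 1: address section
        (data_type & MASK9, data & MASK32),     # cycle 2: data section
    ]

def receive_packet(cycles):
    """Reassemble the two time-multiplexed sections into (address, data)."""
    (_, address), (_, data) = cycles
    return address, data

cycles = send_packet(0x80001234, 0xDEADBEEF)
print(receive_packet(cycles))
```

Each cycle fits in the 41 signal lines discussed below: 32 information lines plus 9 type lines.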
  • Each of the bus segments 250, 252 and 254 has 41 bus lines reserved for the transmission of information and type data. Using the bus segment 250 as an example, the bus lines 250F, comprising 32 individual bus lines, are reserved for the unidirectional transmission of 32 bits of information between the interface circuit 206 and the interface circuit 226, while the bus lines 250E, comprising 9 individual bus lines, are reserved for the unidirectional transmission of 9 type data bits between the interface circuit 206 and the interface circuit 226. Thus, during a single clock cycle, the bus lines 250E and 250F together are capable of transferring 9 type data bits and 32 information bits between the processing elements 202 and 222. Preferably, half packets of information and type data (i.e., 32 information bits and 9 type data bits) are transmitted at a rate of at least 200 MHz.
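The figures above imply a concrete throughput, which the following arithmetic makes explicit (using the stated minimum rate of 200 MHz and 32 information bits per cycle):

```python
# Throughput implied by the stated figures: 32 information bits per clock
# at 200 MHz, with a complete 64-bit packet taking two clock cycles.

clock_hz = 200e6
info_bits_per_cycle = 32

info_bandwidth_bits = clock_hz * info_bits_per_cycle  # raw information rate
packets_per_second = clock_hz / 2                     # 2 cycles per packet

print(info_bandwidth_bits / 8 / 1e6)  # 800.0 MB/s of information
print(packets_per_second / 1e6)       # 100.0 million complete packets/s
```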
  • As shown in Fig. 4, each of the bus segments 250, 252 and 254 provides a common set of signals between the processing elements it connects. Again using the bus segment 250 as an example, these signals include a clock signal (CLK) transmitted via the bus line 250A, a buffered information ready signal (B_Rdy) transmitted via the bus line 250B, an unbuffered information ready signal (U_Rdy) transmitted via the bus line 250C, a busy signal (Busy) transmitted via the bus line 250D, type field signals (Type[8:0]) transmitted via the bus lines 250E, and information field signals (Info[31:0]) transmitted via the bus lines 250F. The clock signals (CLK), type field signals (Type[8:0]) and information field signals (Info[31:0]) are forwarded in a downstream direction from one processing element (such as the processing element 202) to the next processing element in the ring (such as the processing element 222). The buffered information ready signals (B_Rdy), the unbuffered information ready signals (U_Rdy) and the busy signals (Busy) are supplied in the reverse (upstream) direction from a second processing element (such as the processing element 222) to a first processing element in the ring (such as the processing element 202). Two of these reverse-direction signals, the B_Rdy and U_Rdy signals, are used to control the flow of information between the processing elements, as described in more detail below.
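The reverse-direction flow control can be sketched as follows: the downstream interface deasserts its ready signal when its buffered FIFO fills, and the upstream element checks that signal before driving another half-packet. The FIFO depth and the per-item stall behavior are assumptions made for this sketch, not the patent's exact timing.

```python
# Hedged sketch of B_Rdy-style flow control between two ring neighbors.
from collections import deque

class DownstreamInterface:
    def __init__(self, depth=4):
        self.fifo = deque()
        self.depth = depth

    @property
    def b_rdy(self):                # reverse-direction ready signal
        return len(self.fifo) < self.depth

    def clock_in(self, info):
        self.fifo.append(info)

def upstream_send(downstream, items):
    """Send buffered items, stalling those sent while B_Rdy is deasserted."""
    stalled = []
    for item in items:
        if downstream.b_rdy:
            downstream.clock_in(item)
        else:
            stalled.append(item)    # held upstream until B_Rdy returns
    return stalled

rx = DownstreamInterface(depth=4)
leftover = upstream_send(rx, ["w0", "w1", "w2", "w3", "w4", "w5"])
print(len(rx.fifo), leftover)       # FIFO full, last two items held back
```

In the actual hardware there is a multi-cycle delay between deasserting B_Rdy and the upstream element stopping (Figs. 8 and 9), which this sketch ignores.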
  • The Busy signal is used to indicate whether or not any of the chips on the bus are busy. A register in the main processing element (master processing element), readable by the host device, indicates whether the chips are busy. In this way, the host device can determine when it is able to perform certain operations concerning the processing elements.
  • A distinct advantage of the bus connection system 200 is its flexibility. That is, the bus connection system 200 can connect any number of processing elements without significant degradation in signal integrity or bandwidth since, as noted above, the clock signal is transmitted with the data and only one load is coupled to each source. As a result, the bus architecture of the invention allows optional processing elements to be added, either to perform functions such as texture mapping, image processing or volume rendering, or to duplicate standard processing elements, such as rasterizers or frame buffer controllers, for added performance or functionality.
  • A further advantage provided by the bus structure is that, by (temporally) multiplexing two different 32-bit information words, the number of pins required to exchange this information between the processing elements is half the number that would be required if the information packets were not multiplexed in this way. The speed of the bus, i.e., 200 MHz, is high enough that this multiplexing of the 32 address bits and 32 data bits can be performed in the manner described above while the graphics device still meets its performance requirements. Because pin-count reduction is generally a major factor in an ASIC design, the ability of the bus structure to meet the performance requirements with half the number of pins reserved for information transfer is a clear advantage over systems that use a larger number of pins to transmit the same amount of information.
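The pin-halving argument can be sketched in a few lines: 64 bits of packet content cross the segment in two bus cycles over the same 32 information lines. The function names are illustrative, not from the patent.

```python
def send_packet(address: int, data: int):
    """Time-multiplex one 64-bit packet over 32 information lines:
    the address half during phase P1, the data half during phase P2."""
    assert 0 <= address < 1 << 32 and 0 <= data < 1 << 32
    return [("P1", address), ("P2", data)]  # two cycles, 32 pins each

def receive_packet(cycles):
    """Demultiplex the two halves back into (address, data)."""
    (_, address), (_, data) = cycles
    return address, data

addr, data = receive_packet(send_packet(0x80000000, 0x12345678))
```

A non-multiplexed design would need 64 information pins per segment to move the same packet in one cycle; the multiplexed scheme trades one extra bus cycle for half the pins.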
  • However, as ASIC technology improves and the available pin count increases, or as technology advances dictate the need for a faster data transfer rate, the 41-bit information path can readily be widened so that more pins are reserved for information transfer, whereby the bandwidth of the system can be increased accordingly. Furthermore, improvements in PC board technology and I/O pad designs will undoubtedly enable the bus architecture 200 to run at higher frequencies.
  • As noted above, the bus segments 250, 252 and 254 are additionally unidirectional. Unidirectional buses generally provide a faster data transfer rate than bidirectional buses because there is no delay period associated with reversing the direction of the bus. Further, since there is no need to provide and control both transmit and receive paths, unidirectional bus structures are typically easier to manage and require simpler hardware.
  • 4. The processing elements
  • As mentioned above, each of the processing elements of Fig. 4 includes a core processor (for example, the core processor 224), a series of write and read FIFOs (for example, the FIFOs 228, 230, 232 and 234) and an interface circuit (for example, the interface circuit 226), where, as further mentioned above, the core processors of the processing elements need not be identical. For example, in the embodiment of Fig. 4, the core processor 204 of the processing element 202 includes a logic circuit for interfacing with the host PCI bus, which need not be included in the other core processors. In the example of Fig. 4, the processing element 202 is the main processing element in the ring network 200 and thus, as mentioned above, could correspond to the host interface 106 in Figs. 1-3. As will be described in more detail below, the processing element 202 therefore processes input signal packets in a slightly different manner than the other processing elements coupled in the ring.
  • The general operation of each of the processing elements 202, 222 and 242 with respect to the bus connection network 200 is as follows. A primary clock signal is divided into two phases, i.e., a phase 1 (P1) and a phase 2 (P2). As mentioned above, the packets passing over the information lines (e.g., the bus lines 250F) carry, for example, multiplexed address/data information, each packet (in an exemplary embodiment) comprising 32 address information bits and 32 data information bits (for a total of 64 information bits per packet). Thus, during a phase 1 (P1), the address portion of a packet may be transmitted over the information lines (e.g., the lines 250F), while during phase 2 (P2) the associated data portion of the packet is transmitted over the information lines.
  • It should be noted at this point, however, that an information packet need not consist of an address packet followed by a data packet, but may include any combination of address and data information. As shown below in Table II, an information packet may comprise a data packet followed by another data packet when an operation such as a block transfer is carried out. It should also be noted that an information packet need not be multiplexed into two or more separate sections, but may alternatively be transmitted as a single multi-bit packet without departing from the intended scope of the invention.
  • Each information packet is received over a set of information lines (for example, the bus lines 250F) by an interface circuit (for example, the interface circuit 226). All packets received by a processing element are forwarded to the core processor of that processing element (e.g., to the core processor 224 of the processing element 222). Each processing element has a unique base address associated with it so that, for example, during a read operation, five bits of an incoming address may be compared to the base address of the processing element to determine whether the packet is intended for that processing element.
  • Preferably, all identical processing element types share a common base address. If a particular packet does not concern a processing element, the core processor will simply drop the packet. Otherwise, the core processor will perform the function prescribed by the packet. For example, if the packet is a read packet, the core processor (for example, the core processor 224) will supply the information read from its memory (after a small delay) to the interface circuit associated with it (for example, to the interface circuit 226), so that the interface circuit can forward the information to the downstream processing elements (for example, to the processing element 242).
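The base-address comparison and drop-or-perform decision can be sketched as follows. The text says five bits of the incoming address are compared against the element's base address but does not say which five, so the use of the top five bits here is an assumption.

```python
def packet_is_for(address: int, base_address: int) -> bool:
    """Compare five bits of the incoming 32-bit address against the
    processing element's 5-bit base address (bit positions assumed)."""
    return (address >> 27) & 0x1F == base_address

def handle(address: int, base_address: int) -> str:
    """Drop packets that do not concern this element; otherwise
    perform the function prescribed by the packet."""
    return "perform" if packet_is_for(address, base_address) else "drop"
```

Since identical processing element types share a base address, a broadcast write to one type would be performed by every element of that type in the ring.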
  • One reason that the bus connection system 200 is able to operate at a high frequency is that the interface circuit of each processing element (other than the main processing element 202) holds information for only a minimal amount of time before forwarding it to the next processing element in the ring. That is, as information is transmitted into each of the processing elements 222 and 242, each of the interface circuits 226 and 246 captures the information in an input register (not shown) during a first clock cycle and forwards it to an output register (not shown) during a second clock cycle following the first clock cycle, which may be, but need not be, the immediately following clock cycle. As a result, in one embodiment, a latency of only one clock cycle is incurred in each of the interface circuits 226 and 246. Consequently, all packets sent by a processing element (other than the main processing element 202) are passed through its interface circuit to the next processing element in the ring.
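The input-register/output-register store-and-forward behavior can be modeled minimally: a word captured on one clock cycle appears at the output on the next. The class name is hypothetical; only the two-register, one-cycle-latency structure comes from the text.

```python
class NonMasterInterface:
    """One-cycle store-and-forward: input register -> output register."""
    def __init__(self):
        self.in_reg = None   # captures the incoming word on cycle N
        self.out_reg = None  # presents that word on cycle N+1

    def clock(self, incoming):
        self.out_reg = self.in_reg  # forward the previously captured word
        self.in_reg = incoming      # capture the new word
        return self.out_reg

iface = NonMasterInterface()
outputs = [iface.clock(w) for w in ["ADR1", "DATA1", "ADR2", "DATA2"]]
# each word emerges exactly one cycle after it was captured
```

Because each hop adds only one cycle of latency, the total ring latency grows linearly with the number of non-master elements rather than with any arbitration overhead.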
  • In contrast, the main processing element is configured to operate in a "bypass mode", since it is both the beginning and the end of the ring connection. During the bypass mode, the interface circuit 206 operates as two distinct halves. An input half (including the write FIFOs 208 and 210) receives all address/data information from the ring and passes the information to the core processor 204, while an output half (including the read FIFOs 212 and 214) receives new address/data information from the core processor 204 and forwards this information to the next processing element in the ring. Consequently, in the example shown in Fig. 4, all information received by the interface circuit 206 (in the main processing element 202) is fed into the core processor 204, and information is never forwarded directly from the input of the interface circuit 206 to its output, as is the case with the interface circuits 226 and 246. The output half is able to receive commands from the host processor (via the PCI bus), to pass or modify these commands (or to generate additional commands in response to them), and to forward those commands onto the ring.
  • Although the bypass mode has been described above as being used only for the main processing element, it is conceivable that it could also be used for other processing elements in the ring to provide increased functionality. For example, the bypass mode could be used on additional processing elements to enable those elements to perform functions such as image processing. During image processing, data is generally passed into a core processor, manipulated, and led back out of the core processor. Consequently, the data received by a core processor (when it is performing image processing) is typically stored and processed such that the latency between the input and the output of the data can be very long. Additionally, the amount of data entered into the core may differ from the amount of data leaving it. Consequently, by placing one or more of the processing elements in the ring in the bypass mode, image processing can be handled by those processing elements without degrading the overall behavior of the ring.
  • The bypass mode may also be used, for example, to enable a processing element to transmit a digital video data stream to a video data processing element arranged downstream of the transmitting processing element. The processing element or elements that feed the digital video stream should always be arranged upstream of the video data processing element or elements that receive the digital video stream. To perform such a function, the host device places the digital video data processing element in a bypass mode by writing to a bypass-state hardware register (not shown) in the interface of the digital video processing element. No further operations should be performed on the connection after the video processing element has been placed in the bypass mode. Once the video processing element is in the bypass mode, it can generate writes into a memory, or into overlay buffers, image buffers or texture cache arrays, in any of the downstream processing elements in the ring bus.
  • The bypass mode is enabled by setting a bypass-state hardware flag in each interface circuit that is to be bypassed. Consequently, in the example shown in Fig. 4, the bypass-state hardware flag of the interface circuit 206 is permanently set, while the bypass-state hardware flags of the interface circuits 226 and 246 can optionally be set.
  • Because the main processing element 202 (in Fig. 4) initiates information packet communication in the ring network, it also provides the master clock signal that is forwarded with the information. That is, in addition to being configured to operate in a bypass mode, the main processing element 202 is also configured to operate in a "master mode". Regardless of whether or not the processing elements are configured to operate in a master mode, all processing elements receive an incoming clock signal (In_CLK) from an upstream processing element along with the information from that processing element. This incoming clock signal In_CLK is used to clock the input circuitry of the interface circuits (to be described below). The processing elements that are not configured to operate in a master mode, i.e., all but the processing element 202, also use the incoming clock signal In_CLK to clock the output circuitry of their interface circuits (to be described below).
  • Because the interface circuit 206 is configured to operate in a master mode, it is controlled such that its output circuitry is clocked using a source clock signal (Src_CLK) generated by the core processor 204, and not using the incoming clock signal (In_CLK). The source clock signal Src_CLK from the core of a "master" processing element is thus forwarded, along with the information from the core of that processing element, to the next processing element in the ring. The manner in which either the source clock signal or the incoming clock signal is selected to provide timing for the output circuitry of an interface circuit (depending on whether or not it is configured to operate in a master mode), as well as the generation of suitable timing signals from it, will be described in more detail below.
  • As mentioned above, the processing elements other than the main processing element generally do not operate in the bypass mode. Consequently, for all non-master processing elements (for example, the processing elements 222 and 242), the routing of information from one processing element to another in the ring is largely controlled by the interface circuit of the processing element (for example, by the interface circuits 226 and 246). For example, using the non-master processing element 222, the core processor 224 receives information via the asynchronous buffered-write and unbuffered-write FIFOs (BW, UW) 228 and 230, and information coming from the core processor 224 is forwarded via the asynchronous buffered-read and unbuffered-read FIFOs (BR, UR) 232 and 234 to the output of the interface circuit 226. The FIFOs 228, 230, 232 and 234 are called asynchronous because the buffered-write and unbuffered-write FIFOs (BW, UW) 228 and 230 receive information from the interface circuit 226, and the buffered-read and unbuffered-read FIFOs (BR, UR) 232 and 234 deliver information to the interface circuit 226, using a first clock signal (provided by the interface circuit 226), while the buffered-write and unbuffered-write FIFOs (BW, UW) 228 and 230 deliver information to the core processor 224, and the buffered-read and unbuffered-read FIFOs (BR, UR) 232 and 234 receive information from the core processor 224, using a second clock signal (provided by the core processor 224) that may have a different frequency than the first clock signal. Because the interface circuit 226 and the core processor 224 can operate at different frequencies, the use of the asynchronous FIFOs 228, 230, 232 and 234 thus facilitates the transfer of information between the two devices.
  • For example, still using the processing element 222, in one embodiment each of the FIFOs 228, 230, 232 and 234 is able to store 82 bits (in parallel) per FIFO entry. The reason for using both buffered FIFOs (i.e., non-priority interface storage circuits) and unbuffered FIFOs (i.e., priority interface storage circuits) in each of the write and read paths will be described in more detail below. At this point, however, it should be noted that two different information paths, i.e., a buffered (i.e., non-priority) information path and an unbuffered (i.e., priority) information path, are provided into and out of the core processor 224. In one embodiment, for each entry of the buffered-write and unbuffered-write FIFOs (BW, UW) 228 and 230, 41 of the 82 parallel bits correspond to the 32 information bits and 9 type bits received during phase 1 (P1) of the packet transmission, the other 41 bits corresponding to the 32 information bits and 9 type bits received during phase 2 (P2) of the packet transmission. Thus, if a function such as a write, which requires an address and data, is performed, each buffer entry will include a 32-bit address, a 9-bit type field associated with the address, 32 data bits, and a 9-bit type field associated with the data (which is generally a duplicate of the type field associated with the address). In one embodiment of the invention, each of the buffered-write and unbuffered-write FIFOs (BW, UW) 228 and 230 should be at least five entries deep, and each of the read FIFOs should be at least one entry deep. However, it should be apparent that the number of entries in each FIFO can be adjusted as required by the core.
  • In one embodiment, the clock signal passed between the processing elements is a 200 MHz clock signal. Each interface circuit (for example, the interface circuit 226) uses a two-phase clock system, with each phase operating at half the external clock frequency, i.e., each phase operates at 100 MHz in this embodiment. Reference is now briefly made to Fig. 5, which shows a timing diagram illustrating the relationship between the incoming system clock signal In_CLK (incoming clock) from the upstream processing element and the two clock phases, phase 1 (P1) and phase 2 (P2), generated internally at each of the interface circuits. It is noted that the operating phase (i.e., P1 or P2) changes on each falling edge of the incoming system clock signal In_CLK. For timing reasons, it may be desirable to use a complementary system clock signal rather than the single-phase clock signal shown in Fig. 5. For reasons of simplicity, however, only a single-phase clock signal will be described below.
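The phase behavior described for Fig. 5 — the operating phase toggles on each falling edge of In_CLK, so each phase runs at half the external frequency — can be sketched as a small simulation. The starting phase and function name are assumptions for illustration.

```python
def operating_phases(in_clk_samples):
    """Given successive In_CLK samples (1/0), return the operating phase
    after each sample transition; the phase flips on each falling edge,
    so each phase runs at half the In_CLK frequency."""
    phase, trace = "P1", []  # initial phase assumed
    for prev, cur in zip(in_clk_samples, in_clk_samples[1:]):
        if prev == 1 and cur == 0:  # falling edge of In_CLK
            phase = "P2" if phase == "P1" else "P1"
        trace.append(phase)
    return trace

trace = operating_phases([1, 0, 1, 0, 1, 0])
```

Each full In_CLK period contributes one phase change, which is why a 200 MHz external clock yields two 100 MHz internal phases.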
  • As mentioned above, if a processing element is not configured to operate in a master mode, the processing element will transmit the incoming clock signal In_CLK as an outgoing clock signal (Out_CLK) together with the outgoing information. Consequently, when a processing element is not configured in a master mode, the internally generated phases P1 and P2 for the outgoing clock signal Out_CLK will be identical to the internally generated phases P1 and P2 for the incoming clock signal In_CLK. However, when a processing element is configured to operate in a master mode, it will use a source clock signal Src_CLK from its core, not the incoming clock signal In_CLK, to transmit data. Thus, when an interface circuit is in a master mode, it will transmit a source clock signal as its output clock signal Out_CLK and, to control the flow of outgoing information, it will generate a two-phase clocking system (which also comprises phases P1 and P2) that is similar (but not identical) to the one shown in Fig. 5.
  • Referring now to Fig. 6, a detailed block diagram of the interface circuit 226 is shown. Since, in one embodiment, the interface circuits of all the processing elements in the ring are identical, it should be apparent that the following description of the interface circuit 226 applies to the interface circuits 206 and 246 as well. As noted above, however, the interface circuit 206 generally operates only in a "bypass mode" and a "master mode", whereby its operation will differ somewhat from that of the interface circuits 226 and 246.
  • As shown in Fig. 6, type data and multiplexed information packets are delivered simultaneously to the input of the interface circuit 226 and captured in an input register 260 (i.e., an input storage element). The input register 260 can be incorporated in the input pads of the interface circuit 226 or, alternatively, be arranged externally to the interface circuit. As noted above, the logic in the interface circuit 226 is clocked using a two-phase clock system. The incoming clock signal In_CLK is applied to the input of a divide-by-two circuit 295, which provides as output signals a write clock signal (Wr_CLK) and a phase 2 clock signal (P2_CLK), the latter being the inverse of the Wr_CLK signal and thus high during the phase P2. The signals Wr_CLK and P2_CLK are used to control various registers in the interface logic, as described below.
  • The incoming clock signal In_CLK is also delivered to an input of a multiplexer 299. The multiplexer 299 also receives a source clock signal Src_CLK from the core processor 224 as a second input signal. If the interface circuit 226 is configured to operate in a master mode, the multiplexer 299 will select the source clock signal Src_CLK as its output and deliver the source clock signal Src_CLK to the output of the interface circuit 226 as the output clock signal Out_CLK. Conversely, if the interface circuit is not configured to operate in the master mode, the multiplexer 299 will select the incoming clock signal In_CLK as its output and provide the incoming clock signal In_CLK as the output clock signal Out_CLK.
  • The output signal of the multiplexer 299 is also applied to the input of a divide-by-two circuit 297, which supplies at its output a read clock signal RD_CLK (read clock) as well as a signal (not shown) that is the inverse of the RD_CLK signal and is used for output control. Consequently, if the interface circuit 226 is not configured in the master mode, the write clock signal Wr_CLK and the read clock signal RD_CLK will be synchronous, both being the inverse of P2_CLK. Conversely, if the interface circuit 226 is configured in the master mode, the read clock signal RD_CLK will be driven only by the source clock signal Src_CLK.
  • According to one embodiment, the clock signals generated by the divide-by-two circuits 295 and 297 can be synchronized system-wide by toggling the state of a single information bit on one of the information bus lines, e.g., bit [31] of the bus lines 250F (shown in Fig. 4), and by synchronizing the output signals of the divide-by-two circuits 295 and 297 with it.
  • By using a pass-through path 261, which is coupled (via the multiplexers 270 and 282) between the input register 260 and an output register 300 (i.e., an output storage element), the interface circuit 226 is able to quickly transfer information packets (and type data) through its circuitry to the output register 300 for forwarding to the next processing element. The pass-through path 261 is used when the backup path 290 (which will be described below) is empty and the downstream processing element (not shown) sends a signal to the interface circuit 226 indicating that it is currently able to receive information. However, if the downstream processing element indicates that it is unable to receive information, the information that would have been forwarded to the downstream element (had it been ready to receive the information) is stored in information storage elements (e.g., registers) in the backup information path 290 (described below). This backed-up information must remain stored in the information storage elements of the backup information path 290 at least until the interface circuit 226 ceases to receive information from an upstream processing element (e.g., the processing element 202 in Fig. 4). The upstream processing element will stop transmitting this information when the interface circuit 226 provides a signal to the upstream processing element instructing it to halt its transmission of information.
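The pass-through/backup decision just described can be summarized in a few lines. This is a sketch: the downstream ready indication is modeled as a simple flag and the backup path 290 as a list, and the function name is illustrative.

```python
def forward_or_backup(word, downstream_ready: bool, backup_path: list):
    """Use the pass-through path 261 only when the backup path 290 is
    empty and the downstream element signals that it can receive;
    otherwise park the word in the backup path until it can drain."""
    if downstream_ready and not backup_path:
        return ("pass-through", word)
    backup_path.append(word)
    return ("backed-up", word)

backup = []
r1 = forward_or_backup("ADR1", downstream_ready=True, backup_path=backup)
r2 = forward_or_backup("ADR2", downstream_ready=False, backup_path=backup)
# even though downstream is ready again, ordering requires the backed-up
# word to drain first, so new words keep joining the backup path:
r3 = forward_or_backup("ADR3", downstream_ready=True, backup_path=backup)
```

Requiring the backup path to be empty before using the pass-through path preserves packet ordering on the ring: no packet may overtake one that was parked earlier.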
  • 5. Communication protocol
  • As presented above, in the embodiment described herein the information packets transmitted between the processing elements have two sections. For example, the first section may include address information, while the second section may include data associated with that address. In addition, each of these two sections of an information packet is assigned a type field. In one embodiment, the type field comprises 9 bits and, as shown in Table I, may be encoded as follows:
  • Table I
    Figure 00330001
  • The type field is used to send an instruction to each of the processing elements as to what to do with the information received simultaneously with it. Two commonly used type fields are the register-read and register-write types. A register-write operation is done in two steps: first, the address is sent during phase 1 (P1); second, the data is sent during phase 2 (P2). For register-read operations, the address is sent during phase 1 (P1), while the data field sent during phase 2 (P2) is unknown. After a short delay, the data read out by the core processor is inserted into the previously unknown data slot of the same register-read packet and forwarded to the next processing element in the ring.
  • Two other shared type fields are the BIN-read and BIN-write instructions, which are for reading and writing BIN-type data (i.e., reading and writing information into specific buffers, such as overlay buffers, frame buffers, or texture cache arrays). As another type-field option, block-transfer instructions can be used to transfer large blocks of data between a source and a destination. For example, a large block of data could be transferred from a host computer (via the main processing element) to a downstream processing element by direct memory access (DMA) techniques. In addition, plane equations can be encoded in the type field to transfer plane-equation data to the registers for 3D rendering. Finally, miscellaneous type instructions (Misc) are used to implement special functions on the bus. For example, a Misc type instruction could be used to clear a read path before performing another type of operation.
  • As is evident from Table I, some instructions use common bits with common meanings. For example, bit <8> of the type field is a "valid" bit, which is used to indicate the validity of each of the sections of an information packet. If the valid bit is a "0", bits <7:0> of the type field are ignored. Additionally, bit <3> of the type field is typically used to identify whether a buffered (i.e., a non-priority) information path or an unbuffered (i.e., a priority) information path is to be used to: (1) transfer information from the interface circuit to the core processor, (2) transfer information from the core processor to the interface circuit, or (3) store information in the backup information path (described previously). More details regarding the buffered and unbuffered information paths are provided below.
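The shared bit meanings can be decoded in a few lines. This is a sketch: the polarity of bit <3> (which value selects the unbuffered path) is not stated in the text, so the interpretation below is an assumption.

```python
def decode_type_field(type_field: int) -> dict:
    """Decode the shared bits of the 9-bit type field:
    bit <8> is the valid bit; when it is 0, bits <7:0> are ignored.
    Bit <3> selects the buffered vs. unbuffered information path
    (polarity assumed: 1 = unbuffered/priority)."""
    if not (type_field >> 8) & 1:
        return {"valid": False}  # bits <7:0> ignored
    return {"valid": True,
            "unbuffered": bool((type_field >> 3) & 1)}

invalid = decode_type_field(0x0FF)   # valid bit clear, rest ignored
priority = decode_type_field(0x108)  # valid bit set, bit <3> set
```

In the interface this decode would steer each packet half into the BW or UW write FIFO, matching the two paths into the core processor described earlier.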
  • As mentioned above, information is transmitted in multiplexed packets on the bus, with one half of each information packet being transmitted during each phase of the clock signal. The following Table II summarizes the information transmitted during Phase 1 and 2 (P1 and P2) for the types of operations listed above and listed in Table I: Table II
    Figure 00350001
  • Referring to Figs. 4 and 6 (and again using the processing element 222 as an example), as shown in Table II, the first section of each information packet is received by the interface circuit 226 during phase P1 (on the information lines 250F), while the second section of each packet is received by the interface circuit 226 during phase P2 (also on the information lines 250F). As shown in Fig. 6, the interface circuit 226 includes an input register 260, a P1 register 262, a P1WD register 264 and a P2WD register 266. The input register 260 receives the incoming information packets (and their associated type data), and the P1 register 262 receives its input signal from the output of the input register 260. In addition, the P1WD register 264 receives its input signal from the output of the P1 register 262, while the P2WD register 266 receives its input from the output of the input register 260.
  • Reference is now briefly made to Fig. 7, a timing diagram showing the receipt of several complete information packets by the interface circuit 226. In particular, this timing diagram shows: (1) the state of the incoming clock signal (In_CLK) received on the bus line 250A, (2) the information (Info[31:0]) received on the bus lines 250F, (3) the write clock signal (Wr_CLK) produced by the divide-by-two circuit 295, and (4) the contents of the input register 260 (IN_Reg), the P1 register 262 (P1_Reg), and the P1WD and P2WD registers 264 and 266 (P1/P2WD). The states of these signals are shown over a series of transmission cycles T0-T3, each of the transmission cycles T0-T3 comprising a first phase P1 and a second phase P2. The signals are shown on the same time scale (horizontal axis) for comparison; the arrangement of one signal above another does not indicate that one signal has a greater amplitude than the others.
  • Since one embodiment of the invention described herein is implemented using ASIC circuits as the processing elements, the control of the "next-state" content of each of the individual information storage elements shown in Fig. 6 (i.e., what the contents of the information storage elements will be during the next clocked state), as well as the control of the flow of information across the various multiplexers, may be implemented by means of a hardware description language, such as Verilog, and then converted into a logic gate circuit by means of a synthesis tool, such as Synopsys, operating on a general-purpose processor. The logical functions may alternatively be written in a software program and performed by a processor.
  • As shown in Fig. 7, an output register of an upstream processing element drives, for example, an address ADR1 (i.e., the address portion of a first information packet) onto the bus lines 250F during phase P1 of the transfer cycle T1. During phase P2 of the transfer cycle T1, the falling edge of the incoming clock signal In_CLK latches ADR1 (address 1) into the input register 260 of the interface circuit 226. During phase P1 at time T2, the falling edge of the incoming clock signal In_CLK latches the data portion DATA1 of the first information packet into the input register 260. This frees the input register 260, and the interface circuit 226 can receive ADR2 from an upstream processing element on the next falling edge of the incoming clock signal In_CLK (i.e., during phase 2 (P2) at time T2). The type data is received by the interface circuit 226 (from the bus lines 250E) in an identical manner. Consequently, information and type data are received by the interface circuit 226 at the frequency of the incoming clock signal In_CLK, i.e., at about 200 MHz.
  • To convert the two sections of each information packet into an 82-bit parallel word (which is to be demultiplexed to the core processor 224), the phase P1 register 262 is provided. When the information (and the associated type fields) is received by the interface circuit 226 as described above, the P1 register 262 is controlled by a half-frequency clock Wr_CLK to latch only the first section of each information packet. Further, the P1/P2WD registers 264/266 (which forward information to the core processor 224) are controlled by the half-frequency clock signal P2_CLK (which is the inverse of the half-frequency clock signal Wr_CLK) to latch both sections of each information packet simultaneously.
  • In particular, during the rising edge of the half-frequency clock signal Wr_CLK, the content of the input register 260 is latched into the P1 register 262, while during the rising edge of the half-frequency clock signal P2_CLK (which is the falling edge of Wr_CLK) the content of the P1 register 262 is latched into the P1WD register 264 and the content of the input register 260 is latched into the P2WD register 266. Consequently, after the rising edge of a P2_CLK signal, the P1/P2WD registers 264 and 266 store the first and second sections, respectively, of an information packet. At any time prior to the next rising edge of the P2_CLK signal (e.g., during the rising edge of the next Wr_CLK signal), the information in the P1/P2WD registers is written into either the asynchronous buffered (i.e., non-priority) write FIFO (BW) 228 or the asynchronous unbuffered (i.e., priority) write FIFO (UW) 230, depending on the values of particular bits in the type fields associated with the packet, as will be explained in more detail below. No filtering of the address/data and type information is performed by the interface. Consequently, each information packet received at the interface is delivered to the core processor 224, which determines whether or not the packet is of interest.
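The demultiplexing into an 82-bit FIFO entry can be sketched as a bit-packing step. The bit layout — the P1 half in the upper 41 bits and, within each half, the 9 type bits above the 32 information bits — is an assumption; the text fixes only the widths.

```python
def pack_half(info: int, type_field: int) -> int:
    """Combine 32 information bits and 9 type bits into one 41-bit half."""
    assert 0 <= info < 1 << 32 and 0 <= type_field < 1 << 9
    return (type_field << 32) | info

def assemble_fifo_entry(p1_info, p1_type, p2_info, p2_type) -> int:
    """Form the 82-bit parallel word from the P1 and P2 halves, as held
    in the P1WD and P2WD registers before the write into BW/UW."""
    return (pack_half(p1_info, p1_type) << 41) | pack_half(p2_info, p2_type)

entry = assemble_fifo_entry(0xAAAAAAAA, 0x155, 0x55555555, 0x0AA)
```

For a register write, the P1 half would hold the 32-bit address with its type field and the P2 half the 32 data bits with a (generally duplicated) type field, matching the 82-bit buffer entry described earlier.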
  • Reference is again made to FIG. 6. All incoming multiplexed information packets are latched into the input register 260 during each falling edge of the incoming clock signal In_CLK (as noted above), and information is latched into the output register 300 from either (1.) the input register 260 (via the pass-through path 261), (2.) the backup information path 290, or (3.) the buffered read or unbuffered read FIFOs (BR, UR) 232 or 234 (via one or more of the multiplexers 270, 282 and 280) during the falling edges of the output clock signal Out_CLK (which corresponds to the input clock signal In_CLK when the processing element is not in the main configuration). Consequently, when not configured in the main mode, the input register 260 and the output register 300 receive and transmit information during each falling edge of the input clock signal In_CLK, i.e., during both the P1 and P2 transmission cycles. Since the information packets are multiplexed into two sections (for example, an address section and a data section), the input register 260 in fact receives the first section of each information packet, for example the address information, during the falling edge of the input clock signal In_CLK in a P2 operation phase, and the second section of the information packet, for example the data information, during the falling edge of the input clock signal In_CLK in a P1 operation phase (which immediately follows the P2 phase during which the address was clocked into the register 260).
Accordingly, the first section of each information packet transmitted by the interface circuit 226 is clocked into the output register 300 by a falling edge of the output clock signal Out_CLK (corresponding to the input clock signal In_CLK when the processing element is not in the main configuration) during a P1 operation phase, while the second section of each transmitted information packet is clocked into the output register 300 by a falling edge of the output clock signal Out_CLK during a P2 operation phase (which immediately follows the P1 phase during which the first section of the packet was clocked into the register 300).
  • If the processing element downstream of the processing element 222 (for example, the processing element 242) indicates that it is able to receive information from the processing element 222, and the processing element 222 itself is not stalled (for example, waiting for read information from its core processor), the output register 300 receives information directly from the input register 260 (via the pass-through path 261), such that one section of an information packet (and its associated type data) is shifted through the interface circuit 226 during each falling edge of the input clock signal In_CLK, resulting in a latency of only a single clock cycle due to the presence of the interface circuit.
  • 6. Backup information paths
  • If the processing element downstream of the processing element 222 (for example, the processing element 242) indicates that it is currently unable to receive information from the processing element 222, or if the processing element 222 itself is stalled (for example, waiting for read information from its core), the information storage elements in the backup information path 290 (described below) are used to store information in the interface circuit 226 while the processing element upstream of the processing element 222 (for example, the processing element 202) is still sending data, i.e., until the upstream processing element 202 responds to an indication from the processing element 222 that it is currently unable to accept information. That is, this backup function is performed at least until the interface circuit 226 is able to cause the upstream processing element (e.g., the processing element 202) to stop sending information.
  • As shown in FIG. 6, the backup information path 290 includes a buffered (i.e., non-priority) information backup path 292 and an unbuffered (i.e., priority) information backup path 294. The buffered information backup path 292 in turn comprises buffered (i.e., non-priority) backup-path storage elements B0, B1, B2, B3, B4 and B5, while the unbuffered information backup path 294 comprises unbuffered (i.e., priority) backup-path storage elements U0, U1, U2, U3, U4 and U5. Further, in one embodiment, multiplexers 292A and 292B are arranged in the backup information path 290 to deliver the contents of only one (or none) of the buffered backup-path storage elements B4 and B5 and the unbuffered backup-path storage elements U4 and U5 to one of the inputs of the multiplexer 270, to which the outputs of the multiplexers 292A and 292B are connected.
  • Furthermore, FIG. 6 shows a B_Rdy register (B_Rdy = buffered ready) 322, a U_Rdy register (U_Rdy = unbuffered ready) 324, a B_Rdy logic circuit 326 and a U_Rdy logic circuit 328. The B_Rdy register 322 receives an incoming buffered ready signal (B_Rdy_In) from a downstream processing element (via the B_Rdy logic circuit 326) and provides an outgoing buffered ready signal (B_Rdy_Out) to an upstream processing element. Correspondingly, the U_Rdy register 324 receives an incoming unbuffered ready signal (U_Rdy_In) from a downstream processing element (via the U_Rdy logic circuit 328) and provides an outgoing unbuffered ready signal (U_Rdy_Out) to an upstream processing element. In addition to the incoming B_Rdy_In and U_Rdy_In signals, the B_Rdy and U_Rdy logic circuits 326 and 328 receive a series of further input signals, any of which may indicate that the processing element 222 is currently unable to receive information from an upstream processing element. Each of the B_Rdy and U_Rdy registers 322 and 324 is clocked during the rising edge of the Wr_CLK signal.
  • The downstream processing elements signal the upstream processing elements to stop the transmission of information simply by deactivating their outgoing buffered ready signals (B_Rdy_Out) (for example, on the bus line 250B) or their outgoing unbuffered ready signals (U_Rdy_Out) (for example, on the bus line 250C) to the upstream element. The B_Rdy_Out and U_Rdy_Out signals indicate the respective ability of the processing element 222 to receive buffered (i.e., non-priority) information and unbuffered (i.e., priority) information. Further explanations regarding the use and control of buffered and unbuffered information are presented in detail below.
  • The backup information path 290 in FIG. 6 includes three levels of buffering in each of its buffered and unbuffered information paths 292 and 294 to accommodate the maximum amount of information that could be backed up, for example, between the time at which the interface circuit 226 first receives a deactivated incoming B_Rdy_In signal from the downstream interface circuit 246 and the time at which the interface circuit 226 actually ceases to receive information from the upstream interface circuit 206 (after the interface circuit 226 has deactivated its outgoing buffered ready signal B_Rdy_Out to the upstream interface circuit 206).
  • Receipt of a deactivated incoming B_Rdy_In or U_Rdy_In signal by the interface circuit 226 (from a downstream processing element) causes the interface circuit 226 to halt the transfer of information of the identified type (i.e., either buffered or unbuffered information) to the downstream processing element. Any pending information, as well as any information received during the period between the time when the interface circuit 226 deactivates its own outgoing B_Rdy_Out or U_Rdy_Out signal (to an upstream processing element) and the time when the upstream processing element actually stops sending information of the identified type, is stored in reserved (backup) storage elements in the backup information path 290. Thus, the receipt of a deactivated incoming B_Rdy_In or U_Rdy_In signal is an indication that the processing element receiving the signal should (as soon as possible) stop sending information and type data on its outgoing information and type data buses.
  • Once the interface circuit 226 receives a deactivated B_Rdy_In or U_Rdy_In signal from a downstream processing element, it halts its information transfer only after it has finished sending any complete information packet whose transmission had already started when it received the signal. For example, if the interface circuit 226 receives a deactivated B_Rdy_In signal from the interface circuit 246 just after an address (associated with a non-priority operation) has been clocked into the output register 300, the interface circuit 226 will still clock out the data section of the information packet (associated with the previously sent address) before it stops transmitting information to the downstream processing element 246.
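The rule that an in-flight packet is completed before back-pressure is honored can be illustrated as follows. This is a sketch with assumed names, not the patent's implementation:

```python
# Illustrative sketch of the complete-packet rule: once the address half
# of a packet has been clocked into the output register, the matching
# data half still follows, even if downstream back-pressure (a
# deactivated B_Rdy_In) arrives in between.  No half-packet is left on
# the bus; the transfer halts only on a packet boundary.

def transmit(packet_halves, b_rdy_in_after_first_half):
    sent = []
    address, data = packet_halves
    sent.append(address)   # address half clocked into the output register
    # Back-pressure may arrive at this point -- the packet stays whole:
    sent.append(data)      # data half is still clocked out
    halted = not b_rdy_in_after_first_half
    return sent, halted

sent, halted = transmit(("addr", "data"), b_rdy_in_after_first_half=False)
print(sent)    # -> ['addr', 'data']: both halves were transmitted
print(halted)  # -> True: the transfer halts after the packet completes
```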
  • It is noted that the information transferred into the backup path 290 is also latched into the P1WD register 264 and the P2WD register 266 and forwarded to either the buffered write FIFO (BW) 228 or the unbuffered write FIFO (UW) 230 (depending on whether the information is buffered or unbuffered), so that even the "backed up" information reaches the core 224 (in case the information concerns the core).
  • For the sake of simplicity, the description below focuses solely on the use and effects of the B_Rdy_In and B_Rdy_Out signals and the use of the buffered backup information path 292, although it should be apparent that the methods used for temporary storage of backed-up information are identical for both buffered and unbuffered information.
  • When information (and type data) is transferred into the buffered backup path 292, it is first written to registers B4 and B5. Once registers B4 and B5 are full, incoming information (and type data) is written to registers B2 and B3. When registers B2 and B3 are full, the incoming information (and type data) is finally written to registers B0 and B1. When information (and type data) is transferred into the buffered backup path 292, the information (and type data) from the P1 register 262 is always written into one of the registers B0, B2 or B4, while the information from the input register 260 is always written into one of the registers B1, B3 or B5. Although the information transfer paths from the P1 register 262 to the buffered backup storage elements B2 and B4, and from the input register 260 to the buffered backup storage elements B3 and B5, are not explicitly shown in FIG. 6, it should be apparent that these information transfer paths are nevertheless present and that the inputs shown in the buffered backup path 292 represent inputs to each pair of buffered backup storage elements (i.e., B0 and B1, B2 and B3, and B4 and B5).
  • When the buffered backup path 292 is emptied, the information (and type data) is always read from registers B4 and B5, i.e., from the opposite end to the one at which the path is filled last. In addition, when registers B4 and B5 are empty, the contents of registers B2 and B3 (if any) are shifted into registers B4 and B5, respectively, and the contents of registers B0 and B1 (if any) are shifted into registers B2 and B3, respectively.
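The fill and drain behavior of the three register pairs can be sketched as a fall-through FIFO. The class below is illustrative (the pairs are modeled as slots, not individual 41-bit registers):

```python
# Behavioral sketch (illustrative, not the patent's circuit) of the
# three-level buffered backup path 292.  The pairs (B4,B5), (B2,B3) and
# (B0,B1) are modeled as slots: new entries fill toward the input end,
# and draining always reads the (B4,B5) pair while deeper pairs shift up.

class BackupPath:
    def __init__(self):
        # index 0 ~ pair (B4,B5) at the output, 1 ~ (B2,B3), 2 ~ (B0,B1)
        self.slots = [None, None, None]

    def push(self, pair):
        # Fill (B4,B5) first, then (B2,B3), then (B0,B1).
        for i in range(3):
            if self.slots[i] is None:
                self.slots[i] = pair
                return
        raise OverflowError("backup path full: information would be lost")

    def pop(self):
        # Read (B4,B5); then (B2,B3) shifts to (B4,B5), (B0,B1) to (B2,B3).
        out = self.slots[0]
        self.slots = self.slots[1:] + [None]
        return out


bp = BackupPath()
for pkt in [("a0", "d0"), ("a1", "d1"), ("a2", "d2")]:
    bp.push(pkt)
print(bp.pop())  # -> ('a0', 'd0'): the oldest packet leaves first
```

The shifting preserves arrival order, so the backed-up packets are forwarded downstream in exactly the order they were received.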
  • As noted above, software-designed / hardware-implemented logic may be used to control the next-state content of each of the registers used in the invention. According to one embodiment, instead of checking the contents of each of the individual registers in the backup information path 290 that contain type data, as well as the type data in the P1 register 262, in order to determine what information should be transferred to which position in the circuit during the next clock cycle, a series of separate single-bit registers (i.e., content-identifying storage elements), for example the registers 302, 304, 306, 308, 310, 312 and 314 in the control register block 320, is used to keep track of which type of information is present at which positions in the backup information path 290, as well as which type of information is present in the P1 register 262. For example: (1.) the single register 302 may be used to indicate whether the buffered backup registers B4 and B5 are full; (2.) the single register 304 may be used to indicate whether the buffered backup registers B2 and B3 are full; (3.) the single register 306 may be used to indicate whether the buffered backup registers B0 and B1 are full; (4.) the single register 308 may be used to indicate whether the buffered backup register B0 has information of another type stored therein; (5.) the single register 310 may be used to indicate whether the buffered backup register B2 has information of the other type stored therein; (6.) the single register 312 may be used to indicate whether the buffered backup register B4 has information of another type stored therein; and (7.) the single register 314 may be used to indicate whether the P1 register 262 has buffered information of the other type stored therein. The use of these single-bit registers simplifies the control of the circuit and allows the circuit to operate at a higher speed than would otherwise be possible.
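The benefit of the single-bit registers can be sketched as follows. The flag names are assumptions made for illustration; they mirror a subset of the registers 302, 304 and 306:

```python
# Sketch (names assumed, not from the patent) of the control register
# block idea: rather than decoding multi-bit type fields stored in
# B0..B5 on every cycle, one-bit flags mirror the occupancy state, so
# next-cycle routing decisions reduce to simple single-bit tests.

class ControlFlags:
    def __init__(self):
        self.b45_full = False   # ~ single-bit register 302
        self.b23_full = False   # ~ single-bit register 304
        self.b01_full = False   # ~ single-bit register 306

    def on_push(self):
        # Fill order mirrors the backup path: B4/B5, then B2/B3, then B0/B1.
        if not self.b45_full:
            self.b45_full = True
        elif not self.b23_full:
            self.b23_full = True
        else:
            self.b01_full = True

    def next_drain_source(self):
        # A one-bit test replaces inspecting the stored type fields.
        return "B4/B5" if self.b45_full else None


flags = ControlFlags()
flags.on_push()
flags.on_push()
print(flags.b45_full, flags.b23_full, flags.b01_full)  # True True False
print(flags.next_drain_source())  # -> B4/B5
```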
  • Corresponding registers (not shown) may also be provided in the control register block 320 to indicate whether each of the backup registers B0, B2 and B4, as well as the P1 register 262, has other specific types of information stored therein, such as an address corresponding to the base address of the core of this processing element, an address of a BIN read packet, or a block transfer packet (BltOb) as defined in Table I above. Corresponding information is also stored in additional single-bit registers (not shown) in the control register block 320 (i.e., in content-identifying storage elements) to track the contents of the unbuffered (i.e., priority) backup storage elements, such as the registers contained in the unbuffered backup path 294. In addition, the "type-identifying" single-bit registers (not shown) in the control register block 320 may be used to identify whether the P1 register 262 contains certain types of buffered information, such as buffered BltOb information, and likewise whether it contains certain types of unbuffered information, such as unbuffered BltOb information.
  • The B_Rdy_Out signal has two states: an activated state, which indicates that the processing element transmitting the B_Rdy_Out signal is available to accept information, and a deactivated state, which indicates that that processing element is not available to accept information. In one embodiment, the activated state corresponds to a logical "1", while the deactivated state corresponds to a logical "0".
  • In addition to a processing element deactivating its outgoing B_Rdy_Out signal to an upstream processing element in response to receipt of a deactivated incoming B_Rdy_In signal from a downstream processing element, a processing element may be unable to receive information for a variety of other reasons. For example, if the buffered write FIFO (BW) 228 (i.e., a non-priority information storage element) becomes full to within a certain number of entries (as explained in more detail below), the B_Rdy logic circuit 326 causes the outgoing B_Rdy_Out signal to be deactivated (during the next Wr_CLK clock) to prevent the upstream processing element from transmitting information, thereby ensuring that no input information is lost while the interface circuit 226 cannot receive any further information. Further, if no read data already exists (for example, from a prefetch operation), the B_Rdy logic circuit 326 causes the outgoing B_Rdy_Out signal to be deactivated (during the next Wr_CLK clock) when an incoming information packet is encoded to perform a read operation from the core 224. This read operation may include reading one or more core registers or reading a memory element (such as a frame buffer) coupled to the core.
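The kind of condition the B_Rdy logic circuit 326 evaluates can be sketched as a simple combination of stall reasons. The function and parameter names are assumptions for illustration:

```python
# Illustrative sketch (names assumed, not from the patent) of the
# decision evaluated each Wr_CLK cycle by the B_Rdy logic circuit 326:
# the outgoing ready signal is deactivated if ANY stall reason applies.

def b_rdy_out(b_rdy_in, bw_fifo_near_full, core_read_pending):
    """Return True (ready) only when no stall condition is present."""
    stalled = (
        not b_rdy_in            # downstream element cannot accept data
        or bw_fifo_near_full    # buffered write FIFO close to overflow
        or core_read_pending    # waiting on read data from the core
    )
    return not stalled


print(b_rdy_out(True, False, False))   # True: element can accept data
print(b_rdy_out(True, True, False))    # False: write FIFO nearly full
print(b_rdy_out(False, False, False))  # False: downstream back-pressure
```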
  • When a read packet (e.g., a register read packet) is received and no prefetched data exists, the processing element 222 receives an indication that it should (as soon as possible) stop sending information and type data to the downstream processing element. That is, a read packet received by the processing element 222 is an indication that the processing element 222 (in addition to signaling the upstream processing element to stop sending information) should itself (as soon as possible) stop sending information and type data to the downstream processing element, since the processing element 222 must wait for information to be read from its core before that information (which is multiplexed with the previously received address) can be transmitted to the downstream processing element.
  • As mentioned above, when a read-out of core data is requested, a small delay period occurs while the core processor 224 is accessed before the read data is sent back. The data is read from the core via either the buffered read FIFO (BR) 232 (i.e., a non-priority interface input storage element) or the unbuffered read FIFO (UR) 234 (i.e., a priority interface input storage element), depending on whether the data is buffered or unbuffered, as explained below. The data from these FIFOs is then read into one of four information holding registers, i.e., the information holding registers P1_OUT, P2_OUT, U_P1_OUT and U_P2_OUT (272, 274, 276 and 278, respectively). For simplicity, only the reading of buffered data is described below, but it should be apparent that the reading of unbuffered data is done the same way (although different registers are used).
  • During the delay period while the core processor 224 is accessed, the address of the read operation is stored in an information storage element within the backup information path 290, and the processing element 222 deactivates its outgoing B_Rdy_Out signal on the bus line 250B to signal the upstream processing element to stop sending information. When the data is sent back from the core processor 224, it passes through the buffered read FIFO (BR) 232 and is stored in the P2_OUT register 274. Once the retrieved data is stored in the P2_OUT register 274, the B_Rdy_Out signal is reactivated (indicating that the upstream processing element is again allowed to send information), the address corresponding to the read is forwarded through the multiplexers 292A, 270 and 282 and output during the rising edge of the next P2_CLK signal, and the retrieved data is then output (after passing through the multiplexers 280 and 282) during the rising edge of the next RD_CLK signal. Thus, when a core read is performed, the B_Rdy_Out signal may be used to temporarily suspend the upstream processing elements until the read data is available. Consequently, after the core read operation is performed, the address and the data retrieved from this address can be transferred synchronously, with a delay period determined (substantially) only by the delay associated with the core read operation.
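The stall/resume sequence for a core read can be summarized as an ordered event list. The sketch below uses assumed names and collapses the hardware timing into function calls:

```python
# Illustrative sketch (assumed names) of the core-read sequence: the
# read address is parked in the backup path, upstream traffic is paused
# via B_Rdy_Out, and once the core returns the data the address and data
# leave the element back-to-back as one multiplexed packet.

def core_read_sequence(address, core):
    events = []
    events.append(("park_address", address))   # held in backup path 290
    events.append(("B_Rdy_Out", 0))            # pause the upstream sender
    data = core(address)                       # core access delay is here
    events.append(("B_Rdy_Out", 1))            # resume the upstream sender
    events.append(("emit", (address, data)))   # address then data, in order
    return events


log = core_read_sequence(0x40, lambda a: {0x40: 0x1234}[a])
print(log[-1])  # the final event pairs the address with its read data
```

The point of the sequence is the last line: after the delay, the address/data pair is emitted synchronously, so the ring sees only the core's intrinsic read latency.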
  • Referring now to FIG. 8, a timing diagram illustrates the latency that occurs between the time when the outgoing B_Rdy_Out signal from the processing element 222 is deactivated (after it has received, for example, a core read packet) and the time when the processing element 222 actually ceases to receive information from an upstream processing element (e.g., the processing element 202). These signals are shown for comparison on the same time scale (horizontal axis); the arrangement of one signal above another does not indicate that one signal has a greater amplitude than the others.
  • As shown, during phase 1 (P1) and phase 2 (P2) of each transfer cycle, address and data information, respectively, are received on the bus lines 250F. If the type information associated with an address sent during phase P1 of the transfer cycle T1 indicates that a core read operation is to be performed, then during phase P2 of the transfer cycle T1 the B_Rdy_Out signal is deactivated. As indicated by the high state of bit <8> of the type field (i.e., the validity signal), valid address and data information continues to be received on the bus lines 250F until the end of the transfer cycle T2. The time delay between the time when the B_Rdy_Out signal is deactivated during the transfer cycle T1 and the time when the upstream element ceases to transmit information on the bus during the transfer cycle T2 results in a backup of the information at the processing element 222.
  • Referring now to FIG. 9, there is a corresponding delay between the time when the B_Rdy_Out signal is reactivated by the processing element 222 and the time when valid information from an upstream processing element (for example, the processing element 202) is again presented to the processing element 222. These signals are shown for comparison on the same time scale (horizontal axis); the arrangement of one signal above another does not indicate that one signal has a greater amplitude than the others. This time delay allows information in the backup information path 290 to be forwarded out of the output register 300 before new information is received in the input register 260 of the interface circuit 226.
  • Reference is again made to FIG. 6 to describe the operation of the interface circuit 226 after it has received a deactivated incoming B_Rdy_In signal (from a downstream processing element). A first section of an incoming information packet (e.g., an address and address type information) is present at the input of the input register 260 during phase P1 of a transfer cycle. During phase P2 of the same transfer cycle, the first section of the information packet is latched into the input register 260. During phase P1 of the next transfer cycle, the information stored in the input register 260 is loaded into the P1 register 262 at the same time as a second section of the information packet (e.g., data and data type information) is loaded into the input register 260. Both information sections are then forwarded to the first stage of the buffered backup path 292 (i.e., to registers B4 and B5) during the subsequent P2 clock phase. Information continues to be transferred from the P1 register 262 and the input register 260 into subsequent registers (i.e., buffered backup storage elements) in the buffered backup path 292 (during the subsequent P2 clock phases) until the upstream processor, responding to the interface circuit 226 deactivating its outgoing B_Rdy_Out signal, stops transmitting.
  • When the interface circuit 226 is again ready to receive and/or transmit information, its outgoing B_Rdy_Out signal is reactivated, and the process of emptying the registers in the buffered backup path 292 begins. During phase 1 (P1) of the first transfer cycle after the outgoing B_Rdy_Out signal has been reactivated, the information from register B4 is forwarded through the multiplexers 292A, 270 and 282 and latched into the output register 300. During phase 2 (P2) of the same transfer cycle, the information from register B5 is forwarded through the multiplexers 292A, 270 and 282 and latched into the output register 300.
  • During phase 1 (P1) and phase 2 (P2) of the second transfer cycle, the information in registers B4 and B5, respectively, which has been shifted out of registers B2 and B3 (if any), is forwarded to the output register 300. Next, during phase 1 (P1) of the third transfer cycle, the information in register B4 (if any), which was shifted from register B0 to register B2 during the first transfer cycle and from register B2 to register B4 during the second transfer cycle, is forwarded via the multiplexers to the output register 300. Further, during phase 1 (P1) of the third transfer cycle, an upstream processing element (e.g., the processing element 202) transmits a first section of an information packet onto the bus 250. Consequently, during phase 2 (P2) of the third transfer cycle, the first section of the information packet from the upstream processing element is latched into the register 260, while at the same time the information from register B5, which was shifted from register B1 to register B3 during the first transfer cycle and from register B3 to register B5 during the second transfer cycle, is latched into the output register 300, whereby the buffered backup path 292 is emptied.
  • Finally, during phase 1 (P1) of the fourth transfer cycle, the second section of the information packet from the upstream processing element may be latched into the input register 260 at the same time as the first section of the information packet (which was previously latched into the input register 260) is forwarded directly via the pass-through path 261 into the output register 300. Because the buffered backup path 292 is sized to accommodate the maximum amount of information that could possibly be backed up into it, no information is lost when a processing element stalls, and very fast communication throughput is maintained.
  • 7. The buffered and unbuffered information channels
  • Reference is again made to FIG. 6. As explained above, the first and second sections of each incoming information packet (as well as the associated type fields) are latched into the P1 register 262 and the input register 260 (i.e., an input storage element), respectively, before they are latched into the P1WD register 264 and the P2WD register 266, respectively, and forwarded to the core processor 224 of the processing element 222. After both sections of each information packet are latched into the P1WD and P2WD registers 264 and 266, they are written, depending on the contents of the type field of the first section of the packet, into either the buffered write FIFO (BW) 228 (i.e., the non-priority interface output storage element) or the unbuffered write FIFO (UW) 230 (i.e., the priority interface output storage element). That is, the type field of the first section of the packet, i.e., the section stored in the P1 register 262, is checked to determine whether the packet is identified as buffered or unbuffered according to the listings in Table I.
  • As shown in Table I, if bit <7> of this type field is a "1", the packet is always sent to the buffered write FIFO (BW) 228. If bit <7> is a "0", the value of bit <3> of the type field determines whether the packet is transferred to the buffered write FIFO (BW) 228 or the unbuffered write FIFO (UW) 230. In one embodiment, if bit <3> of the type field of the first section of a packet is a "1", the information is unbuffered; conversely, if bit <3> is a "0", the information is buffered.
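The routing rule just described can be written out directly. Table I itself is not reproduced here; the function below encodes only the two bits named above:

```python
# Minimal sketch of the FIFO selection rule: bit <7> of the type field
# forces the buffered write FIFO; otherwise bit <3> selects between the
# unbuffered (priority) and buffered (non-priority) write FIFOs.

def select_write_fifo(type_field):
    """Return 'BW' (buffered FIFO 228) or 'UW' (unbuffered FIFO 230)."""
    if type_field & (1 << 7):   # bit <7> = 1: always buffered
        return "BW"
    if type_field & (1 << 3):   # bit <3> = 1: unbuffered (priority)
        return "UW"
    return "BW"                 # bit <3> = 0: buffered

print(select_write_fifo(0b10001000))  # -> BW (bit <7> takes precedence)
print(select_write_fifo(0b00001000))  # -> UW
print(select_write_fifo(0b00000000))  # -> BW
```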
  • According to one embodiment, the content of the type field of the P1 register 262 is not itself checked to determine whether it contains buffered or unbuffered information; instead, two separate single-bit registers (e.g., the registers 316 and 318 in the control register block 320) are used to keep track of whether the P1 register 262 currently contains buffered or unbuffered information. For example, the register 316 may be used to indicate whether the P1 register 262 currently contains buffered information, while the register 318 may be used to indicate whether the P1 register 262 currently contains unbuffered information. The next-state content of each of the registers 316 and 318 can be determined by checking the type of the information stored in the input register 260, so that as the P1 register 262 receives input information, the registers 316 and 318 receive input signals identifying the type of the information input to the P1 register 262 during the same clock phase.
  • The buffered write FIFO (BW) 228 and the unbuffered write FIFO (UW) 230 are shown in more detail in FIG. 10. As shown, in one embodiment, each of the buffered write and unbuffered write FIFOs (BW, UW) 228 and 230 comprises eight entries, each storing the two sections of an information packet. That is, each of the buffered write and unbuffered write FIFOs (BW, UW) 228 and 230 includes eight entries that store: (1.) phase 1 information and type data (P1_WrInfo and P1_WrType) from the P1WD register 264, and (2.) phase 2 information and type data (P2_WrInfo and P2_WrType) from the P2WD register 266. The FIFO 228 further includes a load control circuit 227 and an unload control circuit 229, while the unbuffered write FIFO (UW) 230 includes a load control circuit 231 and an unload control circuit 233.
  • Brief reference is now made to FIG. 11, which shows a timing chart illustrating the relationship between the incoming clock signal In_CLK, the write clock signal Wr_CLK from the divide-by-two circuit 295 (shown in FIG. 6), and the P1_WrInfo and P2_WrInfo signals (i.e., the information written to the FIFO 228 or 230 from the registers 264 and 266). These signals are shown for comparison on the same time scale (horizontal axis); placing one signal above another does not indicate that one signal has a greater amplitude than the others. As shown in FIG. 10, each of the load control circuits 227 and 231 receives the Wr_CLK signal as a clock input. The load control circuit 227 also receives a buffered information load signal (B_WrInfoLd), which is a result of the type-bit check explained above that identifies whether the information is buffered or unbuffered. The load control circuit 227 returns a buffered write full signal (B_WrFull), which is activated when the buffered write FIFO (BW) 228 has four or fewer entries remaining to be filled, and which is deactivated when the buffered write FIFO (BW) 228 has five or more entries that can be filled. Correspondingly, the load control circuit 231 receives an unbuffered information load signal (U_WrInfoLd), which is likewise a result of the type-bit check explained above that identifies whether the information is buffered or unbuffered. The load control circuit 231 returns an unbuffered write full signal (U_WrFull), which is activated when the unbuffered write FIFO (UW) 230 has four or fewer entries remaining to be filled, and which is deactivated when the unbuffered write FIFO (UW) 230 has five or more entries that can be filled.
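The early "write full" indication can be sketched as a programmable near-full threshold. The class below is an illustrative model (depth and threshold taken from the text; names assumed):

```python
# Sketch (assumed structure) of the early write-full indication: with
# eight entries, the full flag is raised while four or fewer entries
# remain free, giving the upstream side time to stop transmitting
# before the FIFO actually overflows.

class WriteFifo:
    DEPTH = 8
    THRESHOLD = 4   # assert the full flag when free entries <= 4

    def __init__(self):
        self.entries = []

    def load(self, packet):
        if len(self.entries) >= self.DEPTH:
            raise OverflowError("FIFO overflow: full flag was ignored")
        self.entries.append(packet)

    @property
    def wr_full(self):
        free = self.DEPTH - len(self.entries)
        return free <= self.THRESHOLD


fifo = WriteFifo()
for i in range(3):
    fifo.load(i)
print(fifo.wr_full)  # False: five free entries remain
fifo.load(3)
print(fifo.wr_full)  # True: only four free entries remain
```

The four-entry margin matches the back-pressure latency described earlier: packets already in flight when B_Rdy_Out is deactivated still have room to land.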
  • Each of the unload control circuits 229 and 233 further receives a respective core clock signal and an unload signal from the core (which allows the core to unload information from the FIFOs at the core clock rate when one of the unload signals is activated), and provides a respective empty signal to the core (indicating to the core that the FIFO currently has no information stored in it). Thus, the core can access the P1 and P2 information and type data (P1_Info, P1_Type, P2_Info and P2_Type) from either the buffered write or unbuffered write FIFO (BW, UW) 228 or 230 at a rate asynchronous to the clock rate at which these FIFOs receive the information and type data from the interface circuit 226.
  • The asynchronous buffered read and unbuffered read FIFOs (BR, UR) 232 and 234 are shown in more detail in FIGS. 12 and 13. FIG. 12 shows how the buffered read and unbuffered read FIFOs (BR, UR) 232 and 234 may operate if the processing element 222 is configured in a forwarding mode, while FIG. 13 shows how the buffered read and unbuffered read FIFOs (BR, UR) 232 and 234 operate if the processing element is not configured in a forwarding mode (such as the configuration shown in FIG. 6). Reference is first made to FIG. 12. If the processing element 222 is configured in a forwarding mode, whichever of the buffered read and unbuffered read FIFOs (BR, UR) 232 and 234 is selected receives both sections of an information packet (i.e., phase 1 information (P1_Info) and type data (P1_Type), and phase 2 information (P2_Info) and type data (P2_Type)) from the core, along with a core clock signal and a load signal, and provides a "full" signal to the core to indicate when the FIFO is full. In the example shown, both the buffered read and unbuffered read FIFOs (BR, UR) 232 and 234 are eight entries deep; however, the necessary depth of the read FIFOs is determined by the operations to be performed by the core.
  • As shown in FIG. 12, the buffered read FIFO (BR) 232 provides both portions of a buffered information packet (i.e., phase 1 output information (P1_OutInfo) and type data (P1_OutType), and phase 2 output information (P2_OutInfo) and type data (P2_OutType)) to the interface circuit 226. In particular, the phase 1 output information P1_OutInfo and type data P1_OutType are supplied to the P1_Out register 272, while the phase 2 output information P2_OutInfo and type data P2_OutType are supplied to the P2_Out register 274. Similarly, the unbuffered read FIFO (UR) 234 provides both portions of an unbuffered information packet to the interface circuit 226. That is, the unbuffered phase 1 output information (P1_U_OutInfo) and type data (P1_U_OutType) are supplied from the unbuffered read FIFO (UR) 234 to the U_P1_Out register 276, while the unbuffered phase 2 output information (P2_U_OutInfo) and type data (P2_U_OutType) are supplied from the unbuffered read FIFO (UR) 234 to the U_P2_Out register 278.
  • Reference is now made to FIG. 13. When the processing element 222 is not configured in a forwarding mode, whichever of the buffered read and unbuffered read FIFOs (BR, UR) 232 and 234 is selected receives information from the core together with mask data, explained below, that is associated with the information. When a forwarding mode is not configured, the type data associated with each portion of an information packet is not relevant and is ignored. The buffered read FIFO (BR) 232 provides at its output either of the two portions of a buffered information packet (i.e., either the buffered phase 1 output information (P1_OutInfo) or the buffered phase 2 output information (P2_OutInfo)), as well as the mask data (OutMask), from the core to the interface circuit 226. In particular, either the buffered phase 1 output information P1_OutInfo or the buffered phase 2 output information P2_OutInfo is supplied to the P1_Out register 272 or the P2_Out register 274, respectively, while the 32 bits of buffered mask data (OutMask) are supplied to the mask register 284. Similarly, the unbuffered read FIFO (UR) 234 provides each portion of an unbuffered information packet, as well as the unbuffered mask data (OutMask), from the core to the interface circuit 226. In particular, either the unbuffered phase 1 output data (P1_U_OutInfo) or the unbuffered phase 2 output data (P2_U_OutInfo) is supplied to the U_P1_Out register 276 or the U_P2_Out register 278, while the unbuffered mask data (OutMask) is supplied to the mask register 284. According to one embodiment, only a single bit of the unbuffered mask data is used.
  • The buffered mask data (OutMask) transferred to the mask register 284 (when buffered information is read out) is used to perform selective masking, or bit-slicing, of the buffered information output from the core. In this way, the 32-bit mask selects buffered information on a bit-by-bit basis from either: (1) the buffered information output from the core, or (2) either the buffered information flowing through the pass-through path 261 or the buffered information read out from the backup information path 290. The single bit of unbuffered mask data (OutMask) transferred to the mask register 284 (when unbuffered information is read out) is used to select an entire 32-bit information word from one of the two sources just listed. It should be noted that, alternatively, 32 bits of the unbuffered mask data could be used to selectively mask the unbuffered information from the core bit by bit.
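The bit-by-bit selection described above amounts to a bitwise 2:1 multiplexer. The following sketch is illustrative only: the function name and the convention that a mask bit of 1 selects the core's information (with 0 selecting the pass-through or backup path) are assumptions, not taken from the patent.

```python
MASK32 = 0xFFFFFFFF

def mask_select(out_mask, core_word, path_word):
    """Per-bit multiplexer: take each bit from core_word where the mask
    bit is 1, otherwise from path_word (pass-through or backup path)."""
    return ((core_word & out_mask) | (path_word & ~out_mask)) & MASK32

# All-ones mask: the entire word comes from the core (this is the effect
# the single unbuffered mask bit has when it selects the core as source)
assert mask_select(0xFFFFFFFF, 0x12345678, 0xAABBCCDD) == 0x12345678
# All-zeros mask: the entire word comes from the pass-through/backup path
assert mask_select(0x00000000, 0x12345678, 0xAABBCCDD) == 0xAABBCCDD
# Bit-slicing: low 16 bits from the core, high 16 bits from the path
assert mask_select(0x0000FFFF, 0x12345678, 0xAABBCCDD) == 0xAABB5678
```

Merging new bits into passing traffic this way lets a processing element update only the fields of a word it owns while leaving the rest untouched.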
  • Reference will briefly be made to FIG. 14, a timing chart illustrating the relationship among the outgoing clock signal Out_CLK, the read clock signal Rd_CLK produced by the divide-by-two circuit 297 (shown in FIG. 6), and the buffered output information from one of the FIFOs 232 and 234. The buffered output information, i.e., P1_OutInfo and P2_OutInfo from the buffered read FIFO (BR) 232, is shown as an example and could equally represent the unbuffered output information P1_U_OutInfo and P2_U_OutInfo. These signals are drawn on the same time scale (horizontal axis) for comparison; placing one signal above another does not indicate that one signal has a greater amplitude than the others.
  • As shown in FIGS. 12 and 13, the buffered read FIFO (BR) 232 includes the unload control circuit 235 and the load control circuit 237, while the unbuffered read FIFO (UR) 234 includes the unload control circuit 239 and the load control circuit 241. The information and mask data are loaded into one of the FIFOs during a transition of the core clock signal when the load signal to one of the load control circuits 237 or 241 is asserted, and are unloaded from one of the FIFOs during a transition of the Rd_CLK clock signal when the buffered or unbuffered unload signal (B_OutInfoUnld or U_OutInfoUnld) to one of the unload control circuits 235 and 239 is asserted. The signals B_OutDatVal and U_OutDatVal from the unload control circuits 235 and 239, respectively, indicate that buffered or unbuffered information is ready to be clocked into a corresponding read register 272, 274, 276 or 278.
  • As noted above, by controlling the FIFOs 228, 230, 232 and 234 and the backup information path 290 so that unbuffered (i.e., priority) information always takes precedence over buffered (i.e., non-priority) information, two different types of information are passed through a pipeline along a single bus without requiring that any non-priority information present in the pipeline be evicted from the pipeline or discarded before the priority information can be sent. That is, any non-priority information present in the pipeline when a priority information packet enters the interface can be held stationary in one of the non-priority information storage elements in one of the non-priority information paths (i.e., in the buffered read FIFO (BR) 232, the buffered write FIFO (BW) 228 or the buffered backup path 292) until the priority information has passed through the interface, whereupon transmission of the non-priority information can resume. By sharing both the incoming and the outgoing bus of an interface circuit (e.g., the interface circuit 226) between two distinct and independent pipelined paths, rather than providing dedicated paths for either the priority or the non-priority information, significant savings in hardware and ASIC pin count are realized over prior art systems that merely provide pipelined bus solutions without shared resources.
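The priority scheme just described, in which unbuffered traffic proceeds while buffered traffic is held stationary in its storage elements rather than being evicted, can be sketched behaviourally. Class and queue names are illustrative assumptions; this models the arbitration policy only, not the actual bus timing.

```python
import collections

class SharedBusInterface:
    """Behavioural sketch: one output bus shared by a priority (unbuffered)
    and a non-priority (buffered) path. Buffered entries are never discarded;
    they simply wait while priority entries drain."""

    def __init__(self):
        self.unbuffered = collections.deque()  # priority path (UR-FIFO analogue)
        self.buffered = collections.deque()    # non-priority path (BR-FIFO analogue)
        self.sent = []                         # what has gone out on the bus

    def cycle(self):
        # Each bus cycle, send priority information first; buffered
        # information is held stationary, not evicted or discarded.
        if self.unbuffered:
            self.sent.append(self.unbuffered.popleft())
        elif self.buffered:
            self.sent.append(self.buffered.popleft())

iface = SharedBusInterface()
iface.buffered.extend(["b0", "b1"])
iface.cycle()                      # b0 goes out on the shared bus
iface.unbuffered.append("u0")      # a priority packet arrives
iface.cycle()                      # u0 takes precedence over b1
iface.cycle()                      # b1 resumes afterwards, undisturbed
assert iface.sent == ["b0", "u0", "b1"]
```

Note that the buffered entry `b1` is neither reordered relative to other buffered traffic nor dropped; it is merely delayed, which is the property the shared-bus design relies on.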

Claims (5)

  1. A bus interconnect system comprising: a plurality of interface units (202, 222, 242) coupled in a ring by means of a bus structure, each of the plurality of interface units being adapted to communicate with a corresponding processor (204, 224, 244) to provide communication signals received by the interface unit to the processor; the bus structure comprising a plurality of individual buses, each of the plurality of individual buses being coupled between an upstream interface unit (202, 222, 242) and a downstream interface unit (202, 222, 242) in the data flow direction, so as to act as the output bus of the respective upstream interface unit and as the input bus of the respective downstream interface unit; wherein at least one of the interface units comprises: a pass-through path (261) coupled between the input bus and the output bus of the at least one interface unit for coupling communication signals from the input bus directly through the at least one interface unit to the output bus; and a backup path (290) coupled between the input bus and the output bus of the at least one interface unit, in parallel with the pass-through path (261), for temporarily storing the communication signals from the input bus before the communication signals are transmitted to the output bus.
  2. The bus interconnect system of claim 1, wherein the at least one interface unit further comprises a plurality of buffers (228, 230) in communication with the pass-through path (261) for temporarily storing the communication signals, to enable the processor coupled to the at least one interface unit to read the communication signals.
  3. The bus interconnect system of claim 1 or 2, wherein communication signals received by the at least one interface unit from the input bus of the at least one interface unit are transferred to the processor coupled to the at least one interface unit before they are transmitted by the at least one interface unit to the output bus of the at least one interface unit.
  4. The bus interconnect system of any one of claims 1 to 3, wherein each of the plurality of individual buses is identical to the others of the plurality of individual buses.
  5. The bus interconnect system of any one of claims 1 to 4, wherein each of the plurality of interface units comprises: a respective pass-through path (261) coupled to the input bus and the output bus of the interface unit, for transmitting communication signals from the input bus directly through the interface unit to the output bus; and a respective backup path coupled between the input bus and the output bus of the interface unit, in parallel with the pass-through path, for temporarily storing the communication signals from the input bus before they are transmitted to the output bus.
DE1998116153 1997-05-01 1998-04-09 Bus interconnect system for graphic processing system - has individual buses coupling graphics processing elements in ring, with signal lines for transferring command signals between elements Expired - Fee Related DE19816153B4 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US847271 1997-05-01
US08/847,271 US5911056A (en) 1997-05-01 1997-05-01 High speed interconnect bus
DE19861337A DE19861337B4 (en) 1997-05-01 1998-04-09 Bus interconnect system for graphic processing system - has individual buses coupling graphics processing elements in ring, with signal lines for transferring command signals between elements

Publications (2)

Publication Number Publication Date
DE19816153A1 DE19816153A1 (en) 1998-11-05
DE19816153B4 true DE19816153B4 (en) 2005-06-23

Family

ID=34621368

Family Applications (1)

Application Number Title Priority Date Filing Date
DE1998116153 Expired - Fee Related DE19816153B4 (en) 1997-05-01 1998-04-09 Bus interconnect system for graphic processing system - has individual buses coupling graphics processing elements in ring, with signal lines for transferring command signals between elements

Country Status (1)

Country Link
DE (1) DE19816153B4 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4536873A (en) * 1984-03-19 1985-08-20 Honeywell Inc. Data transmission system
EP0322116A2 (en) * 1987-12-22 1989-06-28 Kendall Square Research Corporation Interconnect system for multiprocessor structure
US5504918A (en) * 1991-07-30 1996-04-02 Commissariat A L'energie Atomique Parallel processor system

Legal Events

Date Code Title Description
OP8 Request for examination as to paragraph 44 patent law
8127 New person/name/address of the applicant

Owner name: HEWLETT-PACKARD CO. (N.D.GES.D.STAATES DELAWARE),

Q171 Divided out to:

Ref document number: 19861337

Country of ref document: DE

Kind code of ref document: P

8172 Supplementary division/partition in:

Ref document number: 19861337

Country of ref document: DE

Kind code of ref document: P

8127 New person/name/address of the applicant

Owner name: HEWLETT-PACKARD DEVELOPMENT CO., L.P., HOUSTON, TE

8364 No opposition during term of opposition
R119 Application deemed withdrawn, or ip right lapsed, due to non-payment of renewal fee

Effective date: 20141101