WO2022026497A1

WO2022026497A1 - An enhanced processor data transport mechanism

Info

Publication number: WO2022026497A1
Application number: PCT/US2021/043374
Authority: WO
Inventors: Forrest L. PIERSON
Original assignee: Pierson Forrest L
Priority date: 2020-07-30
Filing date: 2021-07-28
Publication date: 2022-02-03

Abstract

Disclosed herein are methods, systems, and devices for providing an enhanced processor data transport mechanism for increased throughput and mitigation of data starvation. In at least one embodiment, the invention provides the transport mechanism for all high capacity communications between integrated circuits (ICs) inside a computer or data processing system; the CPU, main memory, boot memory, mass storage, graphics, a low speed Input- Output (I/O) controller, and high speed I/O ports. The transport mechanism is deliberately void of any instruction sets or data orientation of its payload contents. Traffic routing and switching is controlled at the transport layer only. This allows the same transport infrastructure to carry all high capacity traffic between different ICs, with instructions, data orientation and data structure controlled by upper layer protocols that are applicable to the end points only and irrelevant to any intermediate ICs that just happen to switch the payload from source to destination.

Description

TITLE

AN ENHANCED PROCESSOR DATA TRANSPORT MECHANISM

CROSS-REFERENCE TO RELATED APPLICATION

[0001] This application claims priority to U.S. Provisional Patent Application No. 63/058,652 filed on July 30, 2020, the entire content of which is incorporated herein by reference.

TECHNICAL FIELD

[0002] The present disclosure relates to data switching. More particularly, the present disclosure relates to replacing all of the different internal protocols for communications between Integrated Circuits (IC) inside a processor system and other high capacity data handling systems with a single physical layer protocol used for switching data between all ICs.

BACKGROUND

[0003] Traditional computers use a variety of different protocols for IC-to-IC communications inside a data processing system, or other high capacity data handling, storage, or processing systems. The Central Processor Unit or other central controller (CPU) communicates with boot memory using an enhanced Local Bus Controller (eLBC). The CPU communicates with main memory using Double Data Rate (DDR) protocols with different version numbers describing the voltage levels used between the logic low level and the logic high level of the main memory interface. Since DDR is a single ended signal, it has a very limited distance of operation, requiring main memory ICs to be place in very close proximity to the CPU and oriented in a particular fashion. As DDR generates a great deal of heat, the close proximity of the CPU to the DDR memories makes removal of heat from either a challenge.

[0004] The CPU communicates with rotating mass storage devices (disk drives), Capacitive

Disk (CD), Digital Versatile Disk (DVD), and Solid State Disk drives (SSD) using a protocol called Serial Advanced Technology® Attachment, or alternately a protocol called Serial Attached SCSI (SAS), with the different version numbers describing the speed with which data is passed to and from the mass storage device. Note that when used in conjunction with mass storage devices the term “Advanced Technology” has been trademarked by IBM Corporation, so many companies simply use the term “AT” in Serial AT Attachment (SATA) to avoid trademark infringement issues. There is also an external SATA interface (eSATA) with slightly different voltage levels used for communicating with mass storage devices externally attached to a data processing system, although SATA and eSATA are sufficiently similar that under most conditions they can be used interchangeably.

[0005] The CPU also communicates with most external high speed I/O components using a protocol called Peripheral Communications Interface express (PCIe). PCIe has become a prevalent communications protocol for most high speed I/O interfaces between ICs including graphics generators, ethemet interfaces, various types of Universal Serial Bus (USB), and other external interfaces into and out of the data processing system. A variation on PCIe is used to communicate between the CPU and a peripheral interface IC generically called the Northbridge, which provides PCIe, USB, and optionally some graphics generation. For graphics generators, the information received over a PCIe bus is then translated into a video graphics standard such as VGA, DVI, HDMI, or DisplayPort, such translations incurring delays which, while not applicable to most viewing of video displays, for people engaged in intense interactive video activities such as playing video games with other players, the users suffer disadvantages versus other players that is proportional to the delay in translating between PCIe and their chosen outgoing display standard.

[0006] Further, all of these protocols involve a relationship between the CPU and the resource the CPU is accessing called a “master-slave” relationship, that is, the CPU must initiate all transfers and transactions. Resources are not allowed to autonomously transfer data or status to the CPU.

[0007] The protocols identified in the previous few paragraphs are not intended to be a complete list of the different means by which components of a processing system communicate with each other. There may be other protocols, including older protocols that evolved into or were rendered obsolete by these protocols, that are still in use today.

[0008] The protocols identified in the previous few paragraphs are not compatible with each other. Some use differential signals with small voltage levels and technologies while others use single ended signals with larger voltage levels. The different protocols use different operating speeds. Each protocol has a different set of command structures and data organization strategies riding on top of the physical transport layers that are also incompatible with each other. When not in use, the particular protocol’s physical layer is idle, preventing it from being used by ICs adhering to other standards. The throughput of many data processing systems today is stymied by the fact that the CPU itself is data- starved, that is, it has to wait for external interfaces to bring it more data while sitting idle waiting for said data.

[0009] To alleviate some of these issues, many manufacturers of CPU ICs have resorted to making packages with over a thousand pins, and in some instances, are approaching two thousand pins, to increase the amount of data flowing into and out of the CPU. However, these packaging options are expensive to implement and require expensive processes to mass produce and functionally test Printed Circuit Boards (PCB) using these large pin-count ICs. Further, due to the physical nature of the densest main memory ICs available at the time of the writing of this invention disclosure, even the most powerful and largest of CPU ICs are limited into how much main memory they can access, typically ranging from 128 Gbytes to 512 Gbytes.

[00010] Clearly, a single common transport protocol that can connect all of these different resources and I/O devices to each other and the CPU is needed, a single physical layer protocol that can carry memory accesses, mass storage accesses, video data or commands, slow speed and high speed I/O peripherals into and out of the CPU, a protocol that can transparently transport different upper layer command sets and data to the different peripherals, and can even be used as a protocol to replace Ethernet as the transport protocol of choice for connectivity to Local Area Networks (LAN), a protocol that can transport so much data into and out of the CPU that data starvation is significantly mitigated, a protocol that can allow resources to autonomously initiate data and status transfers with the CPU or each other, a protocol that can reduce the pin count needed to transport data into and out of a CPU by a factor of ten or more, a protocol that can spread out and place main memory further away from the CPU and therefore have access to more PCB surface area for more main memory than what DDR can physically allow for, and make it mechanically easier to carry away heat from the CPU and main memory. SUMMARY

Disclosed herein are methods, systems, and devices for providing an enhanced processor data transport mechanism for increased throughput and mitigation of data starvation. In at least one embodiment, the invention provides the transport mechanism for all high capacity communications between integrated circuits (ICs) inside a computer or data processing system; the CPU, main memory, boot memory, mass storage, graphics, a low speed Input- Output (I/O) controller, and high speed I/O ports. The transport mechanism is deliberately void of any instruction sets or data orientation of its payload contents. Traffic routing and switching is controlled at the transport layer only. This allows the same transport infrastructure to carry all high capacity traffic between different ICs, with instructions, data orientation and data structure controlled by upper layer protocols that are applicable to the end points only and irrelevant to any intermediate ICs that just happen to switch the payload from source to destination. These upper layer protocols are carried by the transport mechanism but are not acted upon by the transport mechanism.

[00011] In at least one embodiment, the transport mechanism implements the same rules for packet size, packet structure, and packet transfer protocols implemented in US Provisional Patent Application No. 61/778,393 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” filed on March 12, 2013; and US Utility Application No. 14/021,563 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” filed on September 9, 2013, patented 9,577,955 on February 21, 2017 all above-identified applications, which are incorporated by reference herein.

This allows an I/O port based on this invention description to be directly connected to an Indefinitely Expandable Data Switch (IEDS) compatible switching system to carry its packets to and from the CPU without having to be translated into Ethernet or other commonly used high speed protocols to connect processors to LANs. Note that at the time of the writing of this invention description, state of the art in serial protocols over electrical pins typically runs at between 55 and 60 Gbps, for example, the 58.7 Gbps on the Pulse Amplitude Modulation (PAM) interface on Altera/Intel Stratix 10 FPGAs. Optical speeds can exceed this capacity, but that is because there are multiple electrical interfaces into and out of the optical I/F that are multiplexed together in the optical interface to produce, for example, 100 Gbps or 400 Gbps ethernet signals.

[00012] In at least one embodiment, the transport mechanism implements the same transport mechanism protocols used in the IEDS for transferring incoming packets through serial lanes from one IC to another.

[00013] In at least one embodiment, a destination port for packets carried by the transport mechanism does not have to be another transport mechanism compatible serial port, but instead can be an internal function of the destination IC, for example, an IC hosting main memory. The said IC hosting main memory contains a transport mechanism compatible Switching Engine (SWE) with numerous ports. Incoming ports to the SWE can be from transport mechanism serial receivers, as well as internal ports such as command and control, or data ports. Outgoing ports from the SWE can to go transport mechanism serial transmitters, and also internal ports such as command and control, or data ports.

[00014] Several transport mechanism serial ports are connected to the CPU, but only enough so that if the main memory host has a DDR4 interface running at 3200 MHz, which is at or near the “state of the art” at the time of the writing of this invention disclosure, the quantity of such ports is sufficient to carry the data from the DDR interface into the CPU without having to pause the transmission of packets to the CPU, as the goal is to keep the data interface to the CPU fully engaged. The remaining transport mechanism serial ports are used to connect to other CPUs or other central controllers, as well as connecting to other ICs that act as other resources for the CPUs and central controllers. All of these resources and the CPUs or central controllers are connected in a star, daisy chain, or even spider-web type of arrangement. Thus, additional transport mechanism serial ports go from the SWE inside the main memory host to other resources common to computer and other data processing systems: mass storage devices, graphics generators or interfaces, low speed I/O controllers, and high speed I/O interfaces. Nor does the main memory host have to be the only resource with an SWE in it. Other resources may have SWE in them as well, allowing additional resources and CPUs or other central controllers to be connected to each other in an indefinitely expandable arrangement. [00015] In at least one embodiment, internal ports in the main memory host provide the interface between the SWE and the DDR controller. Said port can be of significantly larger capacity compared to the capacity of any one single transport mechanism serial port, as it will deal with data from the DDR interface. Nor is the main memory host limited to handling DDR memories. Once the described invention becomes used often enough, it will become economically viable to build memory ICs with the transport mechanism rather than DDR interfaces on them and completely eliminate the need for a DDR interface, significantly reducing power consumption inside a data processing, storage, or handling system.

[00016] In at least one embodiment, the initial implementations of the invention will use resources that interface between the SWE and the older protocols. This will allow industry time to transition from building memories based on DDR, non-volatile memories connected to an enhanced Local Bus Controller (eLBC), mass storage based on SATA, graphics and other I/O based on PCIe, to these same resources that directly interface to the transport mechanism serial ports. For example, once main memories themselves interface using the transport mechanism serial port, then resources that host a main memory interface between DDR and the transport mechanism serial port will no longer be needed as DDR will go away.

[00017] In at least one embodiment, the main memory host would contain an internal cache of sufficient size such that when a memory read access is requested from a CPU or even another resource, the first few hundred bytes of said memory block can be pulled from the cache while the main memory host sets up the transfers from the DDR memory. Thus, while the first several packets are placed in the outgoing buffers of the transport mechanism serial port, by the time they are transmitted out of the buffer, the DDR memory will have begun providing sufficient amounts of data to refill the buffers before the buffer that was filled only by cache completely empties. In this regard, a superior service can be performed by the main memory host, enabling it to start transmitting data with less latency than if the main memory host had to wait for DDR accesses to start supplying data, which is a problem on today’s CPUs that directly host a DDR interface.

[00018] In at least one embodiment, incoming traffic on any transport mechanism’s port into the destination IC intended for a specific function inside said IC would reach the specific function regardless of which of the transport mechanism’s ports it came in on using a concept called “Physical Address Routing” (PAR). In PAR, the physical address of the destination port is carried in the lead address field of a fixed sized packet, either an only packet carrying the entire payload, or in the first packet of a multi-packet payload. All incoming packets would first pass through an SWE and be presented on the output of the SWE to all destinations. Said destination would see its address, or one of the addresses it accepts, in the lead or only packet’s lead address field and accept the packet(s), while those destinations that do not see their address or any of their addresses ignore the packet(s).

[00019] In at least one embodiment, packet switching is performed by specialized SWE designed to efficiently and autonomously switch packets using information inside the packet header. To do this, incoming packets are deserialized and all of the bits of the packet are passed around in parallel. Typical silicon switching rates and the time of the writing of this invention disclosure are somewhat in excess of 4 GHz, which means an SWE for the packets can transfer over 4 billion packets per second, which is the approximate capacity of 36 lanes of 58.7 Gbps if each packet is 512 bits in length with a 64B/66B overhead. A CPU hosting up to 36 such transport mechanism’s ports would need a single layer SWE to switch incoming data from all lanes to any destination inside the CPU. If there are more than 36 such ports in the CPU, the CPU would have a multi-level SWE, with intermediate buffers accepting physical addresses for every destination it would service from each first level SWE, which are then distributed to their destinations at the 2^nd level SWE.

[00020] In at least one embodiment, continuation packets, which will always follow a lead packet, do not carry routing information but are routed the same as the lead packet they follow. As such they must always stay with their lead packet, whether going through a transport mechanism serial port or through an SWE. Once their lead packet start passing through an SWE or go out a transport mechanism serial port, the transfer must continue until the entire payload has passed through to keep the continuation packets with their lead packet, even if a higher priority payload becomes known to the outgoing transport mechanism’s serial port or an SWE.

[00021] In at least one embodiment, if a packet or set of packets containing a payload from the packet source enters an intermediate IC, for example, a CPU sends packets to an IC hosting main memory, and if the payload is destined for an outgoing transport mechanism serial port of the IC hosting main memory, then said IC will pass the packet(s) to its addressed outgoing port to another IC connected to it. In this fashion, the ICs most frequently accessed for transfers into the CPU(s) or central controllers, which are typically main memory hosts, are connected directly to the CPU to minimize latency. Each main memory host will have multiple instances of a transport mechanism serial port directly connected to the CPU, and additional transport mechanism serial ports connected to other ICs. This allows the CPU to access all other resources over the same transport mechanism lanes connected to the main memory hosts.

[00022] In at least one embodiment, if the CPU has three such ports of 58.7 Gbps it can receive memory transfers into itself at approximately the same rate as a DDR4-3200 implementation connected directly to the CPU. Note that DDR4-3200 has a peak transfer rate of 25.6 Gbytes/seconds, although its sustained rate will be somewhat less. If the CPU has sufficient numbers of transport mechanism serial ports all connected directly to ICs hosting main memory, then the CPU can receive data from multiple instances of main memory swiftly enough to avoid the worst instances of data starvation so prevalent in the data processing industry at the time of the writing of this invention description.

[00023] In at least one implementation, the pin count for three transport mechanism serial ports (12 pins for signals and an equal number for power and ground for a total of 24) is all that is needed to supply data to a CPU at a rate comparable to that of a DDR4-3200. DDR4 needs an estimated 260 pins for signals, power and ground per DDR4 instance. Thus, the transport mechanism’s three instances does not take up anywhere near the pin count or power consumption that a single DDR4-3200 implementation takes. Further, because the transport mechanism serial ports are all one-way differential signals, they can travel much further on a PCB than DDR4 signals can, allowing main memory hosts to be spread further apart from the CPU. This spreading apart allows for more room for cooling mechanisms to access the CPU and other cooling mechanisms to access a more physically distant main memory host without mechanically interfering with each other. It also allows the CPU to access more main memory resources than the maximum number of DDR instances a single CPU IC can host, allowing the CPU to have access to larger amounts of main memory than systems where the CPU IC directly hosts DDR memories. It is possible for CPUs using this invention disclosure to have access to many terabytes of main memory, whereas most CPUs at the time of this invention disclosure are limited to 128Gbytes to 512Gbytes of main memory, depending on the density of the main memory ICs at the time of this invention disclosure and whether the CPUs have one or two DDR interfaces on them.

[00024] In at least one embodiment, even though a single 58.7 Gbps lane carries more than enough bandwidth to refresh every pixel of a “4K” screen at 120 Hz, 4K referring to a screen with 2160 lines of 3840 pixels per line, with 24 bits of color definition per pixel resulting in a requirement of approximately 24 Gbps, such an arrangement would burden the outgoing transport mechanism’s port of said CPU if it also passed through a resource hosting a portion of main memory by slowing down write operations, as around 44% of the transport mechanism’s port’s capacity would be consumed carrying raw graphics information to a display screen. Therefore, multiple transport mechanism’s ports would each carry a portion of the outgoing traffic of a graphics display, reducing the burden on any one transport mechanism’s port to a smaller, more acceptable percentage of its outgoing bandwidth.

[00025] In at least one implementation, the graphics display would receive data from multiple transport mechanism serial ports. In each instance of said port, the payload from each would be sent to an internal destination port, the payloads extracted, and their combined bit maps merged together to generate the display’s pixel streams. Nor will outgoing traffic on a transport mechanism’s port interfere with, or slow down traffic going in the other direction. Each transport mechanism’s port consists of an independent transmitter and independent receiver that do not interfere with or interact directly with each other during operation, although some Built In Test (BIT) features may briefly connect them together after a system wide reset.

[00026] In at least one embodiment, said graphics display may receive packets not just for itself, but for other displays as well as several low speed I/O ports such as USB ports or sound systems on said display. The lead address field in the packet carrying a payload destined for the USB port or sound system of said display would direct said packet to the appropriate I/O controller inside the display. Alternately, the lead address field in the packet for a packet destined for other video displays would direct said packet to the outgoing transport mechanism serial port(s) inside the display for transport to another display. Said other display can be daisy-chained to other displays as well, as many as are desired for the data processing system, limited only by how many displays can be supported by the CPU and the operating system running on the CPU. Note that while the raw data throughput needs of a display consume about 44% of an transport mechanism’s port’s capacity, an efficient video data compression scheme at the video display can easily reduce these bandwidth needs by an order of magnitude or more.

[00027] In at least one embodiment, rather than a CPU handle graphics processing, a transport mechanism compliant graphics controller receives instructions from said CPU(s) and processes the graphics, driving a video display or television using commercially acceptable display interfaces such as VGA, DVI, HDMI, Display Port or other video display interfaces, as well as any potential future video interfaces.

[00028] In at least one embodiment, when a transport mechanism serial port is used as the connection to a LAN, malicious users may try to send unsolicited packets to the data processing system for any number of nefarious reasons. Without proper security said packets could be interpreted inside the data processing system as commands from the CPU, allowing the malicious users to take control of the data processing system. However, said packets can only be interpreted inside the data processing system as commands if the lead packet or only packet is allowed to remain as a lead packet. The resource that presents the transport mechanism’s port to the outside world as the data processing system’s interface to a LAN would take all incoming packets and store them inside its own local memory, to be read by another resource inside the data processing system rather than allowing them to travel through the data processing system as if they were to be trusted. This reading of the incoming packets encapsulates them such that no part of them is considered to be a lead packet able to route its payload through the data processing system anymore.

[00029] In at least one embodiment, multiple hosts for main memory may be daisy chained to each other, with each host containing a portion of a minimum size block of memory that is transferred into a CPU. Typically, this block would be 4096 bytes in size, which is the size of a section of memory managed by the Memory Management Units (MMU) used in many Intel-based computers at the time of the writing of this invention description. This allows larger memories for CPUs, and for CPUs with fewer transport mechanism serial ports on it, the ability to access just as much memory as CPUs with more such ports. The block of memory would be divided up between the main memory host closest to the CPU and the main memory host(s) daisy chained after it. The main memory host closest to the CPU would respond the quickest, and begin filling up its buffers to those lanes going to the CPU, giving the main memory host(s) daisy chained to it the opportunity to also access its memory even after suffering the additional delay involved in going through an intermediate SWE. Said daisy chained main memory host(s) will be filling up its outgoing buffers for transmission to the main memory host closest to the CPU before the closer main memory host has finished transmitting its contents to the CPU. This results in a continuous stream of packets to the CPU from main memory, and this process can be repeated in a daisy-chain fashion as often as needed to provide the CPU with all the main memory it needs.

[00030] In at least one embodiment, multiple CPUs, multiple main memory hosts, multiple high speed and low speed IO interfaces, multiple mass storage devices, multiple graphics displays, and any other transport mechanism serial port compatible IC may exist in a single data processing system, interconnected to each other in whatever fashion the data processing system designer choses.

[00031] In at least one embodiment, a serial port of the transport mechanism can be converted from electrical signals into optical signals in close proximity to the IC hosting the port, carried a distance over optical fiber to the transport mechanism serial port of another IC, and be re-converted back into electrical signals at said IC. Due to the extremely high bit rates of the transport mechanism lanes, it is difficult at best to carry differential signals more than a few decimeters on printed circuit boards (PCB) and still be accurately recoverable. Discrete signals (ones and zeros are examples of discrete signals) utilize multiple harmonics of the bit rate to help receiving circuits accurately capture the transmitted bits. The actual distance will be dependent upon the signal amplitude and frequency response of the signal transmitter and its ability to provide additional higher frequency signal amplification of its transmitter (called pre-emphasis), the higher frequency attenuation caused by the material of the PCB the signal is being carried over, and the ability of the receiver to re-amplify higher frequency content more than lower frequency content of the received signal (called post-emphasis) to overcome the higher attenuation that higher frequency signals experience traveling over PCBs. Further, converting to an optical transport allows resources to be on separate PCBs that are electrically isolated from each other, which makes the job of hardening a data processing system against Electro-Magnetic Interference (EMI) easier. A real-world need for EMI hardening is for use in applications where the data processing system is subjected to being near a radar or radio transmitter as one might find in the aviation or ocean-going industries. [00032] In at least one embodiment, instead of converting a serial port of the transport mechanism into an optical signal, the serial signal can be kept electrical and transported over a cable independently of the PCB the ICs are on, including being transported to separate PCBs. Such embodiments will not provide as much EMI hardening as converting to optical would provide, but such embodiment will be less expensive to implement and allow the transport mechanism to be carried further than it can on etch in a PCB.

[00033] In at least one embodiment, because of the reduced pin count needed by the serial transport mechanism to provide the needed bandwidth of data entering or leaving a CPU or other controller, larger, expensive IC packaging designs with very high pin counts can be replaced with lower pin count, less expensive IC packaging designs, including having the transport mechanism, after reset, supply to the CPU all needed configuration information and boot code, thereby reducing to a bare minimum a CPU package with power, ground, multiple instances of the transport mechanism signal pins (as many as are needed for the intended use of the CPU), and reset, with the CPU deriving its internal clock references from the receive ports of one of the transport mechanisms. Further, the need for a reset signal can be eliminated if the CPU IC is conditioned to reset itself if all receivers on its transport mechanism serial ports report no received signal detected; the CPU would be reset under those conditions.

[00034] In at least one embodiment, as the serial transport mechanism can be daisy chained to go through multiple intermediate ICs prior to a payload being delivered to its destination, multiple implementations of main memory hosts can be daisy chained to each other, providing a near indefinite amount of main memory to the CPU for those applications where very large amounts of main memory are needed, for example, detailed weather forecasting.

In another embodiment, a computing module is disclosed. The computing module includes a semiconductor carrier having a four sided pin configuration, a central processing unit (CPU), serial port circuitry electrically coupled with the CPU, and a plurality of serial ports electrically coupled with the serial port circuitry. A first serial port of the plurality of serial ports is electrically coupled with a first plurality of pins positioned on a first side of the semiconductor carrier, a second serial port of the plurality of serial ports is electrically coupled with a second plurality of pins positioned on a second side of the semiconductor carrier, a third serial port of the plurality of serial ports is electrically coupled with a third plurality of pins positioned on a third side of the semiconductor carrier, and a fourth serial port of the plurality of serial ports is electrically coupled with a fourth plurality of pins positioned on a fourth side of the semiconductor carrier. The first, second, third, and fourth plurality of pins each have a commonly positioned transmit output port and transmit input port associated with the given serial port.

In some embodiments, the serial port circuitry may include a non-blocking switching engine for connectivity of payloads between the CPU and each serial port.

In some embodiments, the first, second, third, and fourth plurality of pins may each have a commonly positioned power pin and a commonly positioned ground pin.

In some embodiments, the each transmit output port may include a differential transmit output port.

In some embodiments, each receive input port may include a differential receive input port.

In some embodiments, the semiconductor carrier may be a 44-pin plastic leaded chip carrier

(PLCC).

In other embodiments, the semiconductor carrier may be a 68-pin PLCC and the plurality of serial ports may include at least eight serial ports.

In other embodiments, the semiconductor carrier may be a 100-pin PLCC and the plurality of serial ports may include at least twelve serial ports.

In other embodiments, the semiconductor carrier may be a 144-pin Quad Flat Pack (QFP) and the plurality of serial ports may include at least sixteen serial ports.

In other embodiments, the semiconductor carrier may be a 208-pin QFP and the plurality of serial ports may include at least twenty serial ports. [00035] The features and advantages described in this summary and the following detailed description are not all-inclusive. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims presented herein.

BRIEF DESCRIPTION OF THE DRAWINGS

[00036] The present embodiments are illustrated by way of example and are not intended to be limited by the figures of the accompanying drawings. In the drawings:

[00037] FIG. 1 depicts a diagram illustrating an embodiment of the packet structures in accordance with embodiments of the present disclosure.

FIG. 2 depicts a logic diagram illustrating a detailed embodiment of a single level Switching Engine (SWE) wherein buffers are removed for clarity but are assumed to be present as they are in FIG. 3 in accordance with embodiments of the present disclosure.

[00038] FIG. 3 depicts a flow diagram illustrating a high-level embodiment of a multi-level SWE with buffers in accordance with embodiments of the present disclosure.

[00039] FIG. 4 depicts a block diagram illustrating data processing system using a single central processing unit (CPU) in accordance with embodiments of the present disclosure.

FIG. 5 depicts a mechanical diagram illustrating a minimal pin count CPU integrated circuit (IC) using a 44-pin plastic leaded chip carrier (PLCC) package having four transport mechanism serial ports and a 68-pin PLCC package having eight transport mechanism serial ports in accordance with embodiments of the present disclosure.

DETAILED DESCRIPTIONS

The following description and figures are illustrative and are not to be construed as limiting. Numerous specific details are described to provide a thorough understanding of the disclosure. However, in certain instances, well-known or conventional details are not described in order to avoid obscuring the description. References to “one embodiment” or “an embodiment” in the present disclosure can be, but not necessarily are, references to the same embodiment and such references mean at least one of the embodiments.

Reference in this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the disclosure. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not for other embodiments.

The terms used in this specification generally have their ordinary meanings in the art, within the context of the disclosure, and in the specific context where each term is used. Certain terms that are used to describe the disclosure are discussed below, or elsewhere in the specification, to provide additional guidance to the practitioner regarding the description of the disclosure. For convenience, certain terms may be highlighted, for example using italics and/or quotation marks. The use of highlighting has no influence on the scope and meaning of a term; the scope and meaning of a term is the same, in the same context, whether or not it is highlighted. It will be appreciated that same thing can be said in more than one way.

Consequently, alternative language and synonyms may be used for any one or more of the terms discussed herein, nor is any special significance to be placed upon whether or not a term is elaborated or discussed herein. Synonyms for certain terms are provided. A recital of one or more synonyms does not exclude the use of other synonyms. The use of examples anywhere in this specification, including examples of any terms discussed herein, is illustrative only, and is not intended to further limit the scope and meaning of the disclosure or of any exemplified term. Likewise, the disclosure is not limited to various embodiments given in this specification.

Without intent to limit the scope of the disclosure, examples of instruments, apparatus, methods and their related results according to the embodiments of the present disclosure are given below. Note that titles or subtitles may be used in the examples for convenience of a reader, which in no way should limit the scope of the disclosure. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure pertains. In the case of conflict, the present document, including definitions, will control.

[00040] The following definitions may be used in the description and figures:

[00041] Payload A variable size collection of data that is passed from a source to a destination. Payloads discussed in this implementation of the invention are not specified, but can include internal constructs of blocks of memories, command and control sequences, or Input/Output (IO) data. The term Payload may be used interchangeably with datagram.

[00042] Datagram - see the definition for “Payload”.

[00043] Switching Engine (SWE) Logic and buffers implemented inside an IC to take packets from any incoming port and switch them so that every outgoing port can examine the packets, and decide whether to accept or reject the packets.

[00044] Packet A fixed sized collection of data carried over a transport layer. When a packet can carry an entire payload, it is referred to as a single or only packet, and when payloads are too large to fit inside a single packet, then a lead packet with one or more follow on, or continuation packets, are used to carry the balance of the payload that could not fit into the lead packet. If a payload does not completely fill out the last packet, then a pad field is added to make the last packet size match that of the other packets. It will then be the responsibility of the destination to go through the payload, find its size field, and truncate the pad off the end of the payload, but this function is beyond the scope of this invention description.

[00045] Single Packet (or Only Packet) A packet that carries all needed routing information and an entire payload in it. Note that the term “Single Packet” and “Only Packet” are interchangeable. Single packets are distinguished from lead packets by the fact that there are no continuation packets immediately following a single packet. [00046] Only Packet - see the definition for “Single Packet”.

[00047] Lead Packet a packet that contains the routing information for itself and any number of continuation (or follow-on) packets. The lead packet also carries a portion of the payload, with the balance carried in the continuation packet(s) that immediately follow it.

[00048] Null Packet - an Only packet with a lead address field that will not be accepted by any outgoing port connected to an SWE. It is intended to be inserted into an outgoing serial port, or a switching engine, when there are no other packets to be transmitted or switched. Null packets are ignored by all receivers, and signals or informs buffers on the output of an SWE that were accepting continuation packets that the last valid payload has completely transitioned through the SWE if no more packets are present to otherwise go through it. Its purpose is to keep the serial link active so the receiving port will stay locked onto the stream of one’s and zero’s. As an example, a null packet is defined by the first two bits are set (indicating an only packet), the next 14 bits are all zeroes (an example of an address not used by any outgoing port), then alternating with every other bit in the null packet is either a one or a zero (maximizing the number of edges transmitted to ensure the receiver stays locked onto the incoming signal).

[00049] Null Port - a port going into an SWE that generates null packets for a SWE when no other packets are being passed through the SWE.

[00050] Framed - A high speed serial receiver that is locked onto the incoming datastream and can segregate it into incoming packets is considered “Framed”. When two ports are connected to each other, when they first coming out of reset (including power up reset) they are not “Framed” to each other. Both begin transmitting Framing Packets to each other (see the definition for a Framing Packet). Once they can detect and isolate the packet boundaries of the Framing Packet, they switch to transmitting Null Packets. Once they receive and detect null packets, they are considered “Framed” to each other and can begin transmitting data packets.

[00051] Framing Packet - Similar to a null packet, except that the 6 bits immediately following the two lead bits are also one’s. The framing pattern allows receivers that are not locked onto the 64b/66b pattern to quickly find the pattern edge and the start of packet edge. Once a receiver has locked onto the framing pattern, it will transmit null packets until it is also receiving null packets or packets carrying payloads. Once a receiver is receiving null packets or packets carrying payloads, it can start transmitting payloads if it has any pending in its outgoing buffer.

[00052] Continuation Packet - A packet that follows a lead packet or another continuation packet in a multi-packet payload. The continuation packet is distinguished from a lead or only packet by a field in its header that is different from a lead or only packet. Continuation packets do not contain any routing information and must always follow their lead packet to get to their destination. But as a result of this, better than 99% of a continuation packet can be used to carry the payload, making it extremely efficient in its utilization of the transport mechanism’s bandwidth.

[00053] Transport Mechanism - the serial transport lanes between Integrated Circuits (IC), as well as those internal portions of an IC dedicated to sending, receiving, buffering, or switching packets.

[00054] Switching Engine (SWE) a mechanism inside an IC that switches fixed-size packets of data from any number of packet sources to any number of packet destinations, the goal being able to work so swiftly that the rate of arrival of incoming data cannot overwhelm the SWE’s ability to move the data to the outgoing destinations.

[00055] Physical Address Routing (PAR) A mechanism where the outgoing port address of a payload is contained in the lead address field of either a single packet or the lead packet of a multipacket payload as it emerges from an SWE. The outgoing port(s) who identify the value of the lead address field as (one of) their own accepts the packet(s). After accepting the packet(s), the outgoing port shifts the balance of the first or only packet up by one address field, overwriting the lead address field and backfilling the end of the packet with alternating ones and zeros to increase the edge density of the packet contents and thus make it easier for additional receiver(s) to stay locked onto the incoming data stream.

[00056] Resource - any IC inside a data processing system that communicates with other ICs using the transport mechanism of this invention disclosure. Resources can include CPUs and other controlling processors, main memory hosts, mass storage hosts, I/O controller hosts, graphics generators and video displays, or any other IC that can connect up to the transport mechanism and provide a function, feature, or service to the CPU(s) or other controlling processor(s).

[00057] Mass Storage - a mechanism inside or attached to a processing system used to store large quantities of non-volatile information, non-volatile referring to a memory system where the contents are not altered or lost when power is removed from the mechanism. Commonly used mass storage at the time of the writing of this invention description can include but is not limited to rotating disk drives, solid state disk drives (SSD), Capacitive Disks (CD), and Digital Versatile Disk (DVD). Past systems, mostly obsolete now, used floppy disks, rotating drums, and magnetic or optical tape drives.

[00058] The following acronyms may be used in drawings and in these descriptions:

AF Address Field BIT Built In Test Gbps Giga bits per second

IEDS Indefinitely Expandable high capacity Data Switch

IP Internet Protocol

IPv4 Internet Protocol Version 4

IPv6 Internet Protocol Version 6

PAR Physical Address Routing

PCB Printed Circuit Board assembly

RFU Reserved for Future Use

SWE Switching Engine

[00059] FIG. 4 depicts a block diagram 400 illustrating data processing system using a single central processing unit (CPU) in accordance with embodiments of the present disclosure. Block diagram 400 further illustrates a novel architecture for the switching of data inside a processing system. Instead of each of the different sub-functions that make up a data processing system having their own unique paths, protocols, and transport mechanisms, a common transport mechanism is used throughout all high capacity connections between ICs 402 402a 402b, and possibly a transport mechanism serial link 415 can go between the data processing system and a Local Area Network (LAN). Protocols that ride on top of the transport mechanism are only relevant to the end points. The transport mechanism does not have to directly connect a data source 406, 408, 414, 416 to a data destination 401; it can be switched by intermediate ICs 403 and 403b, or 403a and 403c, which allows a limited number of paths 402 into and out of one or more CPUs 401 or other central controllers to be connected to an indefinite number of other resources within the processing system while being able to utilize the bandwidth of all the ports 402 of the CPU 401 concurrently.

[00060] Another feature of the transport mechanism is that those resources 403, 403a most frequently accessed by a CPU 401 or other central controllers in the processing system can be directly connected to said controllers, limiting the latency from a resource request to a resource response. In most cases that resource 403, 403a will be a main memory host, although cases can be made for other resource hosts as determined by the architect of the processing system.

[00061] Another feature of the transport mechanism is that resources of any type can be added provided they adhere to the transport mechanism’s protocol, in any numbers, and in any arrangement.

[00062] Another feature of the transport mechanism is that resources may be connected to each other and to the central controllers by more than one physical path 402 if the bandwidth between them justifies the need. FIG. 2 depicts a logic diagram 200 illustrating a detailed embodiment of a single level Switching Engine (SWE) wherein buffers are removed for clarity but are assumed to be present as they are in FIG. 3 in accordance with embodiments of the present disclosure. The single level SWE is optimized for use in 4-input lookup tables.

[00063] FIG. 3 depicts a flow diagram 300 illustrating a high-level embodiment of a multi-level SWE with buffers in accordance with embodiments of the present disclosure. The common transport mechanism of flow diagram 300 can share bandwidth by simply selecting the outgoing port with the least full outgoing buffer 303 for the next payload going out of said port(s) that all go to the same destination. Note that it is the responsibility of upper layer protocols, which are beyond the scope of this invention disclosure, to be able to handle receiving payloads, possibly out of sequence, and correctly putting them back together in the correct order. This challenge has already been identified and solved in other higher-level protocols such Internet Protocol’s (IP) Transmission Control Protocol (TCP). The common transport mechanism is implemented in a manner having similar functionalities to the systems disclosed in US Provisional Patent Application No. 61/778,393 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” fded on March 12, 2013; and US Utility Application No. 14/021,563 titled “INDEFINITELY EXPANDABLE HIGH- CAPACITY DATA SWITCH,” fded on September 9, 2013, patented 9,577,955 on February 21, 2017 all above-identified applications, which are incorporated by reference herein.

More specifically, Physical Address Routing (PAR) switching involves a source knowing the route a payload needs to take to get to the destination. FIG. 1 depicts a diagram 100 illustrating an embodiment of the packet structures in accordance with embodiments of the present disclosure. Specifically, the source places in the header 101 102 103 104 105 106 encapsulating the payload the route through the switching system the payload will take. At each intermediate switching resource, the lead address field 104 in the payload points to the outgoing port of the resource that the payload will be switched out of. After going out on said port, the payload’s lead or only packet has the last 62 bytes 105 106107 108 109 in it shifted up by one address field value to provide a new lead address 104 for the next receiving IC. This over-writes the lead address 104 that was just used. The process will also fill in the back end 109 with alternating ones and zeroes to increase the density of transitions in the data, which enables a receiving port to more easily stay centered on the bit positions of the incoming serial data stream.

[00064] The common transport mechanism uses a novel means of selecting the outgoing port when multiple ports are available for use. When an outgoing payload encapsulated in one or more packets is presented to these ports for acceptance, the ports will arbitrate among themselves for the right to accept the payload based on the following scheme: 1) the port is functioning and in communications with its other end, 2) the port with the least filled outgoing buffer is selected, and in the case of a tie, either a round-robin scheme or an arbitrary tie-breaking scheme of port priority is used.

[00065] Another distinct feature of the transport mechanism in this invention disclosure is that, unlike the transport mechanism used in the Indefinitely Expandable Data Switch (IEDS), there is no use of active and standby redundancy, with the possible exception of connections to Local Area Networks (LAN). This significantly simplifies the implementation of the transport mechanism. Where active and standby are needed in LAN implementations, the LAN implementation will implement the same protocol used in the IEDS and provide the interface to the processing system, but that is beyond the scope of this invention disclosure.

[00066] Another distinct feature of the transport mechanism is the ability to switch polarities. To transfer data across printed circuit boards at the speeds used in the transport mechanism, differential signaling is used. Differential signal consists of two separate signals, with one signal always of the opposite polarity of the other with respect to a common central point. Differential signaling minimizes the magnetic fields generated by propagating signals as they cancel each other out, which increases switching speeds and tolerance to outside signal interference, and minimizes power consumption. The two signals are routed adjacent to each other from the transmitter to the receiver. To minimize distorting the signal as it moves along the etch, the following rules have to apply:

1) The distance between the two etch must be kept consistent except when connecting to the transmitter and receiver pins;

2) The quality of the PCB material must be such as to minimize a loss of signal strength at higher frequencies;

3) The signals should never change layers, that is, go through a hole called a via in the PCB to go from one layer to another of a multi-layer PCB, as this represents a change in the impedance of the transmission line, causing signal reflections and distortions that can lead to erratic reception at the receiver;

4) The signals should be on a layer next to a ground plane, and the ground plane is continuously underneath the signals from the transmitting pins to the receiving pins, to keep the signal impedance constant and free from interference from signals on other layers nearby;

5) The signals are, as much as can possibly be done, kept the exact same length;

6) When changing the direction, the signals shall encounter no angle in the etch sharper than 45 degrees from straight (for example, 90 degrees is much sharper than 45 degrees), and the distance between two corners of 45 degrees or less must be at least twice the distance between the two etch. [00067] A key rule to observe is rule 5), keeping the etch the exact same length. If while routing the signal the distance can be kept closer together by swapping one set of pins, doing so will result in a polarity inversion, that is, the receiver will see a “one” signal where it was expecting a “zero” signal and vice versa. Polarity inversions can be detected by looking at the framing pattern of the 64B/66B protocol used to identify packet boundaries. This protocol guarantees a minimum of two transitions for every 66 bits transmitted, which is sufficient for a receiver to stay locked onto an incoming data stream. The two bits of framing overhead will typically be transmitted as a zero bit, then a one bit, seven out of eight times, and the eighth time, it will be transmitted as a one, then a zero. This difference in the polarity of the bits of the framing field indicates the start of a packet, which will consist of 512 bits in eight groups of 64 bits with two framing bits added per 64-bit group. If the framing pattern is seen as seven “01” and a single “10” then the polarity inversion has not occurred, but if the receiver detects a framing pattern of seven “10” and one “01” then a polarity inversion has occurred. The receiver will detect inversion in the framing signal and thus invert all bits received, correcting the signal’s polarity reversal on the PCB. Bit inversion is already practiced in PCIe interfaces and is an accepted industry practice whose goal is to keep etch lengths in a pair of high speed differential signal etch as close to the same length as possible.

[00068] All data transferred over the transport protocol will be done in fixed sized packets 116

116a. This makes it very easy for the hardware to process the packets, and because the packets are fixed in size, there is no need to provide a field in the packet header describing the packet size, which improves transmission efficiency as there is less information needed in the header, making more of the packet available to carry payloads.

[00069] Payloads carried inside the packets can vary in size. It is not the responsibility of the transport protocol to identify payload boundaries to an upper level protocol, so the following is beyond the scope of this invention disclosure, but it will be discussed to show how the invention can carry payloads in multiple packets efficiently. When a lead packet and one or more continuation packets carry a payload 107 108 111 112 113, part of the lead packet’s tail end 109 will have been backfilled if the lead packet went through one or more intermediate ICs switching the packet from the source to the destination. To help the receiver identify which parts of the lead packet past the address fields are part of the payload, and which parts are fillers, the first field 107 found in the payload created by the payload source should be a count of how many bytes are part of the payload in the lead packet. This value will have reached the position of the first subsequent address field 105 at the destination IC. Only these values in the count 107 are extracted from the lead packet 116 and merged with the payload in the continuation packets 116a to form the payload at the receiver. If the payload fits entirely inside a single packet only, then the byte count field 107 of the first packet isn’t needed, as the payload will have the size field embedded in it and it can use that to truncate the pad at the end of the packet, or else it will be a fixed size so no byte count field is needed. Note that is the responsibility of the payload source, not the transport mechanism, to know how many intermediate resources must be used to carry the packet to the destination so that it can properly calculate the payload field byte count in the first location past the last address field 106. Note also that the term “byte” as used in this paragraph normally represents 8 bits, but if the lead address field is not 8 bits, then term “byte” as used in this paragraph represents the number of bits in the lead address field.

[00070] The structure 116 of the lead packet and the single (or only) packet is the same. The only difference between the two is that a lead packet will have one or more continuation packets 116a following it, while the single packet 116 is followed by another single or lead packet 116, or a null packet if there are no more packets to be transmitted or switched. The last continuation packet will be followed by a single packet, a lead packet, or a null packet.

[00071] Traditional CPUs have a feature known as “Direct Memory Access (DMA)”, where a resource uses a DMA engine to take control of the memory bus and transport I/O content directly to memory without first going through the CPU. While this speeds things up in traditional computers in that there is now only one access to get the information into or out of main memory from an IO device, it does force the CPU to be idle while the transfer takes place if the CPU needs access to main memory. In this invention disclosure, DMA engines become obsolete. A resource such as a LAN interface 414 that needs to send one or more payloads to a main memory host 403 403a 403b 403c simply directs the payloads to the main memory host 403 403a 403b 403c rather than the CPU 401 using the transport mechanism, and since the paths 402 between the CPU and the main memory are not used in this transfer, the CPU can continue to access main memory as needed without sharing its bandwidth. The CPU may encounter additional delays when accessing other resources as the paths to the main memory from other resources may have some or most of their bandwidth consumed carrying the DMA transfer, or the main memory has to switch from handling packets of the DMA transfer to handling packets for the CPU, but this delay is small and much less impacting on a CPU than would be found using traditional DMA engines and traditional busses to access main memory. Note that the transport mechanism paths 402 to the CPU must never be used to route packets between resources. In the example shown in FIG. 4, the Enhanced Graphics Processor 408 has a large number of paths 402a going to it that can be linked to all resources but they are lightly loaded; these paths should be used to bridge between resources as needed.

[00072] Once a payload reaches a destination, the destination IC will have one last switching function to perform. For example, a main memory resource 403 may receive a command from the CPU 401 to read a block of memory and send it to the CPU. The command port internal address will be different than the data port address, therefore, the final lead address field’s value inside the lead or only packet will direct the packet to the command port rather than the data port. What ports will be needed and provided is a function of the upper layer protocols riding on top of this transport protocol and are beyond the scope of the claims of this invention.

[00073] As paths between two resources may not always be filled, null packets must be inserted to keep the path working and the two resources in sync with each other. For this reason, certain addresses in the lead address field will be set aside for such uses, including a lead address value to identify the packet as a null packet only which does not contain any payload. Such packets are ignored at the receiving end. Note that a lead/single packet followed by a null packet will be interpreted as a single packet.

[00074] If the receiver falls out of sync with the incoming signal but is still receiving one’s and zero’s, it will start transmitting a packet called the framing packet. A framing packet is similar to a null packet, but lets the far end know communications is down so that it will stop transmitting valid packets. The other end will start transmitting framing packets if it has also lost sync, or it if continued to stay synchronized, null packets. Once a receiver is properly sync’d up to the incoming packet boundaries, it switches from transmitting framing packets to transmitting null packets if it is receiving framing packets. If it is receiving null packets it knows the far end is synchronized and can start transmitting payloads if there are any in its outgoing buffer, otherwise it transmits null packets. [00075] When a processing system first comes out of reset, the CPU or other central processor must understand, or “discover” the network that exists in the processing system it resides in. This is not novel or new, it has been used by earlier generations of processors for decades now. To implement the discovery process, an agreed upon address value placed in the lead address field 104 will enable a CPU 401 or other central processor to access the command port of a resource directly connected to it after loading boot code. While the discovery process itself is beyond the scope of this invention disclosure, upper level protocols will need to be developed that enable the CPU 401 or other central processor to utilize the command ports of resources to understand them, and then using said resource as an intermediate transport mechanism IC, discover resources connected to the intermediate transport mechanism, etc. until all resources in the processing system are identified.

[00076] Further, said discovery process should be able to handle the insertion or removal of resources after the initial discovery occurs after reset. Connections to Local Area Networks (LAN), as well as connections to external resources such as additional mass storage may happen. Again, this is beyond the scope of this invention, but is mentioned to show that upper level protocols can be developed using the claimed invention as the physical layer implemented in the system.

[00077] As the routing mechanism described in this invention disclosure could be used by malicious users to hijack, or take control of the processing system it is part of, hardware based security will be part of the transport mechanism design. A priority field 103, typically 3 bits, will define eight levels of priority in the lead or only packets. The two highest priorities will be reserved for command and response functions of various resources. Resources connected to the LAN 414 will not allow the response function priority to pass to the LAN from the processing system, since all responses from a resource to a CPU 401 or other central controller must only go to those controlling resources inside the data processing systems. Further, resources 414 connecting the LAN to the processing system will not allow command function priority to pass to the processing system from the LAN, again as all controlling resources must reside inside the data processing system. Commands may pass outside the data processing system to allow the internal controllers the ability to discover and use external resources, and responses from external resources may pass back into the data processing system as said resources must allow controlling resources inside the data processing system the ability to discover and use them. Priorities below these two priorities, which will be all forms of data, will be allowed to pass into and out of the data processing system unhindered. Of necessity, command ports in all resources will not respond to commands unless the command payload is carried by a transport mechanism packet with the command priority in the packet’s priority field 103.

[00078] Further securities can be enabled by passing all user data from the LAN to the processing system through a local memory in the LAN host 414, which is then read out of by the CPU or another resource and placed in main memory. This way buffer overflow attacks and numerous other attacks can never overwrite areas of main memory where executable code resides, and outside resources cannot send packets directly into the transport mechanism network inside the processing system.

[00079] Another feature of CPUs is the ability to have resources generate interrupts. An interrupt is a physical line or combination of lines set to certain levels asking the CPU to interrupt what it is doing to dedicate resources to handling the immediate needs of the resource. Some interrupts are generated when said resource has new incoming I/O for the CPU, or can accept more outgoing I/O as it has emptied its outgoing buffer enough to accept more data. Other interrupts occur because potentially erroneous events have occurred, for example, a loss of incoming power and the CPU 401 must immediately begin an orderly shutdown before the holdup capacity of its power source is exhausted. Other interrupts occur when a main memory resource detects a double bit error in a memory with error detection and correction. And then there are some interrupts generated internally, for example, a user program attempts to access a section of memory it isn’t allowed to access, or a divide by zero operation occurs.

[00080] The externally generated interrupts can be replaced with “interrupt packets”, which would typically be a single packet payload from a resource indicating an event requiring the immediate attention of the CPU 401 is needed. The packet would be directed towards the CPU’s command port with the goal of entering into an “interrupt register” in the CPU command port with the goal of replacing interrupt lines to minimize the pin count needed by the CPU. In general, the fewer the number of pins on an IC, the less expensive its packaging will be.

[00081] Another method of reducing pin counts on the CPU IC is to eliminate the pins needed for an enhanced Local Bus (eLB), and instead depend on an eLB Controller (eLBC) 416 to be connected to boot memory and other resources that use a traditional address and data bus to connect to the CPU 401. The eLBC resource 416, after reset, would access the boot code attached to it, place it in packets and send it to the CPU 401 over the transport mechanism. The route needed would have to be programmed into a part of the boot code as the eLBC resource 416 will not know how to discover where the master CPU 401 is. Hardware in the CPU would receive the packets, places them in its internal cache memory, and then allow the CPU to begin execution of the boot code that allows it to start up. Only after the CPU 401, having come out of reset, receives these packets with the boot code from the eLBC 416 does it begin to execute commands. The CPU 401 would only need power, ground, transport mechanism ports, and a reset signal. Even its reference clock source 421 can be obtained from the receiver of one of the transport mechanism ports 402. When a CPU IC first comes out of reset, it will operate at a minimum clock speed referenced to one of these ports, with the goal of boot code providing the command sequence needed by the CPU IC to set its own clock speed based on some multiplication factor of the select transport mechanism port whose received signal is used as the basis for an internal clock generator.

[00082] The reset signal itself can be eliminated in the CPU 401 with the simple technique of shutting off all signals going to it over the transport mechanism ports 402. Only after these signals start to ‘wiggle’ will the CPU 401 start coming out of reset and set itself up to receive boot code.

FIG. 5 depicts a mechanical diagram 500 illustrating a minimal pin count CPU integrated circuit (IC) using a 44-pin plastic leaded chip carrier (PLCC) package 501 having four transport mechanism serial ports, and a 68-pin PLCC package 502 having eight transport mechanism serial ports in accordance with embodiments of the present disclosure. Note that the signals shown on one side of each PLCC package 501 and 502 are repeated on all four sides in a symmetrical manner as depicted in FIG. 5. Properly implemented, a CPU in a quad package could be oriented in any of four directions and still properly function in the data processing system. Additional ports can be added with larger IC packages; a 100 pin PLCC package would have 12 ports 402, a 144 pin Quad Flat Pack (QFP) would have 16 ports, and a 208 pin QFP would have 24 ports on it. [00083] Typically the eLBC host 416 would also serve as a low speed I/O 419 controller, and perhaps even a health monitoring component of the data processing system, much as the Intelligent Platform Management Interface (IPMI) provides a similar service to many traditional high end processing systems that exist at the time of the writing of this invention disclosure. IPMI uses the I²C bus (part of 419) to access tiny temperature measurement ICs placed in various locations in a processing system, and other tiny voltage and current measurement ICs to measure incoming power and the output power of various voltage converters in the processing system.

In summary, a data processing system utilizing a radical means of moving data such that the data transfer rate per pin of the IC is significantly greater than any existing generally accepted protocol and can be transported over sufficient distances that PCB area intensive resources to a controlling processor can have sufficient space to implement all of said resource desired for the controller, the goal of which is to provide a means of overcoming data starvation and memory size restrictions in a processor system and provide a means of making use of all transport paths into and out of a data processing IC for any type of communications between different resources in a data processing system, such that whether the resource is a controlling processor, a main memory host, an enhanced local bus controller host, a mass storage host, high speed I/O host, a low speed I/O host, a host for a numeric or graphics processor, or a graphics display, all such resources can utilize the same pins of the controller for transporting data and instructions and thus not leave dedicated paths idle while other paths are a bottleneck to the moving of data around inside the data processing system, and said movement of data through said data processing system shall be such that intermediate ICs acting as a bridge between two endpoints will not be affected by the organization and structure of the payload as all information needed to provide the bridging function is contained in the transport mechanism and not in the payload being carried. Said data processing system will also have sufficient hardware security features built in to prevent external controllers from being able to access resources inside said data processing system. Due to the speed of the transport mechanism may low speed signals can be offloaded to resource host ICs serving the processing IC, the transport mechanism will be able to replace many of the very low speed signal pins on a traditional controller IC with the equivalent functions inside payloads carried over the transport mechanism to minimize the size and pin count, and thus the cost of said controller IC packaging. Due to the symmetry of pin assignments the transport mechanisms can provide, IC packages can be designed such that they can be installed in any orientation and still operate with no loss of functionality, efficiency, or throughput. A data processing system where close proximity of closely inter-functioning ICs can be relaxed to allow more room for easier cooling of said data processing system.

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions.

These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create an ability for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware- based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. The descriptions of the various embodiments of the present invention have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims

CLAIMS What is claimed is:

1. A universal transport mechanism to carry all high capacity, high speed communications between Integrated Circuits (IC) inside a data processing system using signal switching methods that provide for more bandwidth per IC pin than traditional methods for the purpose of: said transport mechanism to carry more data into and out of a Central Processing Unit (CPU) Integrated Circuit (IC) or other controller IC, per signal pin of said IC, than any Double Data Rate (DDR) mechanism in existence at the time of this invention disclosure per signal pin, whereas a small number of multiple instances of said transport mechanism are able to carry more data than a single instance of DDR using fewer pins among all of said instances of the transport mechanism than a single instance of a DDR implementation, and the CPU is able to host over 70 instances of said transport mechanism as is the case for existing technology at the time of this invention disclosure, thus mitigating a major problem with CPUs known as data starvation, and there is no practical upper limit to the number instances of the transport mechanisms an IC may have; said transport mechanism allows high use resources to be directly connected to said CPU or controller to minimize latency, and said high use resources are also connected to other less frequently used resources to enable said less frequently used resources to be accessible to the CPU or controller over the same pins of said CPU or controller as said high use resources, thus allowing all communications pins of a CPU or controller meant for high speed input and output to be used to access any resource the CPU or controller needs in the performance of its duties; and said transport mechanism carries a payload between two endpoints whose structure and organization is irrelevant to the transport mechanism, enabling any intermediate resource between a payload source and payload destination to switch said payload towards its destination even if the payload structure, content, and organization is incomprehensible to said intermediate resource.

2. The transport mechanism according to claim 1, further comprising of ICs each with a multiplicity of the transport system mechanisms: on one or more CPUs or other controllers of the data processing system; and on high use resources such that said resources can be connected directly to one or more CPUs or other controllers with multiple instances of the transport mechanism and said high use resources provide sufficient additional transport mechanisms to connect to less frequently used resources as needed for the processing system, such that over all paths into and out of the CPU or other controller any resource needed by the CPU or other controller can be accessed.

3. The transport mechanism protocol according to claim 1, containing within the protocol all necessary features to transport a payload from any data source inside the data processing system to any data destination within same said data processing system including: able to carry a payload and not utilize any field of said payload to assist in routing said payload from source to destination; and said transport mechanism containing fields within the transport protocol envelope placed around or in front of the payload which contain the address of each intermediate resource’s outgoing port the payload is being carried through, and after passing through said intermediate resource outgoing port, the address of the port is removed and the address of the next port is brought forward to the lead address field so that when the transport mechanism with its payload arrives at the next intermediate resource being used to switch the payload to its destination, or the destination IC itself, the payload will be delivered to the correct outgoing port of the next intermediate IC or the destination port within the destination IC.

4. The transport mechanism according to claim 1, utilizing a signal switching technology able to travel a greater distance over a Printed Circuit Board (PCB) than DDR signals can travel, for the purpose of: providing more physical space between ICs of a data processing system to make it easier to provide cooling mechanisms to each IC of said data processing system; provide more room for PCB-surface-area intensive resources such as main memory than what DDR can provide in the immediate vicinity of a CPU or other controller, to provide both higher overall bandwidth between all instances of said resource(s) and said CPU or controller, and more of said resource than what said resource being limited to the immediate vicinity of said CPU or controller could otherwise provide; and provide more room for other resources than would otherwise be available if PCB area intensive resources such as main memory consume most of the PCB area so close to the CPU or other controller that resource would otherwise by unavailable to the CPU or other controller.

5. The transport mechanism according to claim 4, capable of being easily carried over electrical cables or converted into an optical signal to be carried over optical fibers, for the purpose of allowing resources to be: located at such a distance from other resources such that the transport mechanism cannot be faithfully carried via PCB etch at full bandwidth; located on those parts of the PCB where routing electrical connections for the transport protocol on the PCB itself makes it too difficult to do so in such a way as to be able to fully utilize the transport mechanism’s full bandwidth; located on different PCBs than other resources but still be part of the same data processing system; and easier to harden said resources from interfering or damaging electrical fields, magnetic fields, or electromagnetic fields.

6. The transport mechanism according to claim 1, with resources designed to allow the insertion or removal of additional resources located outside of the data processing system with much higher bandwidth than what most existing interfaces provide at the time of this invention disclosure: for the purpose of allowing users of said data processing system the ability to add or remove additional resources as deemed needed by the user; for the purpose of allowing users to attach said data processing system to, or detach from, an external network such as a Local Area Network (LAN) to connect said data processing system to other systems; and for other reasons of allowing users to attach or remove to said data processing system other data processing systems.

7. The transport mechanism according to claim 1, which carries a field inside the transport mechanism to identify the priority of the payload carried by the transport mechanism for the purpose of: identifying packets carrying command payloads; identifying packets carrying response payloads; identifying packets carrying various priorities of data for the purpose of moving higher priority data, command, or response packets before lower priority data when different priorities of payloads are residing in buffers waiting to pass through a Switching Engine (SWE) or passing to an outgoing port on an IC to another IC; and identifying lower priority packets that can be discarded when buffers begin to fill up.

8. The transport mechanism according to claim 7, which utilizes the priority field of said transport mechanism, to: prevent response packets, which are packets a resource sends back to a CPU or other controller acknowledging receipt of a command packet, status of a executing a command, or unsolicited responses such as alerting the CPU or other controller of conditions that need their immediate attention, from leaving a data processing system for the purpose of preventing an external controller from attempting to take control of any resource inside the data processing system by not allowing said external controller from receiving a response from any internal resource; and prevent command packets, which are packets a CPU or other controller sends to another resource requesting it to perform a function for said CPU or controller, from entering the data processing system for the purpose of preventing an external controller from attempting to take control of an internal resource or discovering the organization of said data processing system.

9. The transport mechanism according to claim 1, which utilizes higher level protocols being carried by the transport mechanism, to perform all functions of transmission reduction, that is, when any IC detects that its internal buffers are nearing capacity, otherwise known as suffering congestion, shall send out high priority packets to all other ICs connected to it that for a limited length of time will temporarily restrict transmission of packets to the IC suffering congestion for the purpose of allowing said IC to empty out its own internal buffers so they won’t overflow and lose packets, and said IC connected to an IC that is suffering congestion, when it receives a packet notifying it the IC is encountering congestion, for a brief period of times suspends transmissions of packets to said source of the congestion notification packet for the purpose of allowing the source of the congestion notification time to empty or partially empty out its buffers.

10. The transport mechanism according to claim 2, whose outgoing and destination ports will look at the lead address field of the encapsulation around a payload and accept said payload if the address matches what the port is configured to accept; said port not being restricted to accepting a single address only; and said port not being given the exclusive use of any address so that multiple outgoing ports may arbitrate to carry a payload to provide more bandwidth between two ICs by providing multiple ports and paths between them.

11. The transport mechanism according to claim 10, with a packet or packets whose lead address field is an outgoing port, and confronted with a multiplicity of outgoing ports all responding to the same address in the lead address field, to provide for a means of selecting which outgoing port will accept said packet or packets, based on the criteria of, a) the said outgoing port is in communications with the receiving port of the IC it is connected to such that it can faithfully carry payloads over the transport mechanism, b) the said outgoing port’s buffer has fewer packets pending inside it than any other outgoing ports it is competing against to receive said payload, and c) if a tie exists among two or more outgoing ports, a round-robin or arbitrary priority scheme is used to select the port that will accept the packet(s) of the payload.

12. A transport mechanism whose payload contents can replace all other types of signal pins except reset on a CPU IC by using high speed packets destined for internal ports inside the CPU IC which provide the same function as traditional pins on a CPU IC, with the goal of reducing the pin count of said CPU IC, which reduces the manufacturing costs of said CPU IC, leaving only power pins, power return pins otherwise commonly referred to as ground, a reset signal, and the transport mechanism ports as the only pins needed on a CPU IC.

13. The transport mechanism according to claim 12 further including integrated circuits (ICs) for the suppling of controller configuration information and boot code as needed by a CPU IC to operate in a diverse environment, that the environment configuration over a high speed transport mechanism between the two ICs, and provide an enhanced Local Bus Controller function in a separate IC package to keep the pin count of the CPU low enough to use smaller packages, and said support ICs are not just limited to providing configuration information and boot code, but may optionally provide 10 signals commonly used in processing mechanisms, time-of-day clock and calendar functions, health monitoring functions for the processing system, and other features or functions as deemed needed for the processing system.

14. The transport mechanism according to claim 13 further including a separate enhanced Local Bus Controller IC that accesses boot code, configuration information, and low speed I/O functions, and supplies it to a CPU over a high speed transport mechanism to a CPU such that said CPU does not need to use pins to provide for the enhanced local bus and thus can keep its pin count down so it can be installed in a lower cost package.

15. The transport mechanism according to claim 12 that can suspend all signal switching on its outgoing ports for the purpose of signaling to the receiving ports that a reset is in progress, such that said receiving IC no longer needs a reset pin but will place itself in reset when the receiving ports of all transport mechanisms of said IC are no longer transitioning, the purpose be to reduce the pin count and hence the packaging costs of said IC.

16. A transport mechanism according to claim 15 wherein the pin assignments of an IC are completely symmetrical to the point where a rectangular shaped IC package can be inserted in either orientation and the IC will properly function, and where a square shaped IC package can be inserted in any of the four different directions and the IC will properly function, the purpose of which is to ease in the manufacturing of printed circuit boards with said IC in that the orientation of the IC will be irrelevant and thus if not installed in the correct orientation the IC will still function as intended with no loss of efficiency or throughput.

17. A computing module comprising: a semiconductor carrier having a four sided pin configuration; a central processing unit (CPU); serial port circuitry electrically coupled with the CPU; and a plurality of serial ports electrically coupled with the serial port circuitry, wherein: a first serial port of the plurality of serial ports is electrically coupled with a first plurality of pins positioned on a first side of the semiconductor carrier; a second serial port of the plurality of serial ports is electrically coupled with a second plurality of pins positioned on a second side of the semiconductor carrier; a third serial port of the plurality of serial ports is electrically coupled with a third plurality of pins positioned on a third side of the semiconductor carrier; a fourth serial port of the plurality of serial ports is electrically coupled with a fourth plurality of pins positioned on a fourth side of the semiconductor carrier; and the first, second, third, and fourth plurality of pins each have a commonly positioned transmit output port and transmit input port associated with the given serial port.

18. The computing module of claim 17, wherein the serial port circuitry includes a non-blocking switching engine for connectivity of payloads between the CPU and each serial port.

19. The computing module of claim 17, wherein the first, second, third, and fourth plurality of pins each have one or more commonly positioned power pin(s) and one or more commonly positioned ground pin(s).

20. The computing module of claim 19, wherein: each transmit output port includes a differential transmit output port; and each receive input port includes a differential receive input port.

21. The computing module of claim 17, wherein the semiconductor carrier is at least one of: a 44-pin plastic leaded chip carrier (PLCC); a 68-pin PLCC and the plurality of serial ports includes at least eight serial ports; a 100-pin PLCC and the plurality of serial ports includes at least twelve serial ports; a 144-pin Quad Flat Pack (QFP) and the plurality of serial ports includes at least sixteen serial ports; and a 208-pin QFP and the plurality of serial ports includes at least twenty serial ports.