WO2022026497A1 - An enhanced processor data transport mechanism - Google Patents
An enhanced processor data transport mechanism
- Publication number: WO2022026497A1
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/382—Information transfer, e.g. on bus using universal interface adapter
Definitions
- the present disclosure relates to data switching. More particularly, the present disclosure relates to replacing all of the different internal protocols for communications between Integrated Circuits (ICs) inside a processor system, and other high capacity data handling systems, with a single physical layer protocol used for switching data between all ICs.
- IC Integrated Circuits
- DDR Double Data Rate
- the CPU communicates with rotating mass storage devices (disk drives), Compact Disks (CD), Digital Versatile Disks (DVD), and Solid State Disk drives (SSD) over interfaces such as Serial Attached SCSI (SAS), Serial AT Attachment (SATA), and the external SATA interface (eSATA)
- CD Compact Disk
- DVD Digital Versatile Disk
- SSD Solid State Disk drives
- SAS Serial Attached SCSI
- SATA Serial AT Attachment
- eSATA external SATA interface
- PCIe Peripheral Component Interconnect express
- Northbridge a peripheral interface IC generically called the Northbridge, which provides PCIe, USB, and optionally some graphics generation.
- the information received over a PCIe bus is then translated into a video graphics standard such as VGA, DVI, HDMI, or DisplayPort. Such translations incur delays which, while not noticeable in most viewing of video displays, put users engaged in intense interactive video activities, such as playing video games against other players, at a disadvantage proportional to the delay in translating between PCIe and their chosen outgoing display standard.
- a video graphics standard such as VGA, DVI, HDMI, or DisplayPort
- a single common transport protocol is needed that can connect all of these different resources and I/O devices to each other and to the CPU: a single physical layer protocol that can carry memory accesses, mass storage accesses, video data or commands, and slow speed and high speed I/O peripheral traffic into and out of the CPU; a protocol that can transparently transport different upper layer command sets and data to the different peripherals, and can even replace Ethernet as the transport protocol of choice for connectivity to Local Area Networks (LAN); a protocol that can transport so much data into and out of the CPU that data starvation is significantly mitigated; a protocol that allows resources to autonomously initiate data and status transfers with the CPU or with each other; a protocol that can reduce the pin count needed to transport data into and out of a CPU by a factor of ten or more; and a protocol that allows main memory to be spread out and placed further from the CPU, gaining access to more PCB surface area for main memory than DDR can physically allow, while making it mechanically easier to carry heat away from the CPU
- the invention provides the transport mechanism for all high capacity communications between integrated circuits (ICs) inside a computer or data processing system: the CPU, main memory, boot memory, mass storage, graphics, a low speed Input-Output (I/O) controller, and high speed I/O ports.
- ICs integrated circuits
- the transport mechanism is deliberately void of any instruction sets or data orientation of its payload contents. Traffic routing and switching is controlled at the transport layer only.
- the transport mechanism implements the same rules for packet size, packet structure, and packet transfer protocols implemented in US Provisional Patent Application No. 61/778,393 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” filed on March 12, 2013; and US Utility Application No. 14/021,563 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” filed on September 9, 2013, issued as US Patent No. 9,577,955 on February 21, 2017, all of which are incorporated by reference herein.
- the transport mechanism implements the same transport protocols used in the IEDS (the indefinitely expandable data switch referenced above) for transferring incoming packets through serial lanes from one IC to another.
- a destination port for packets carried by the transport mechanism does not have to be another transport mechanism compatible serial port, but instead can be an internal function of the destination IC, for example, an IC hosting main memory.
- said IC hosting main memory contains a transport mechanism compatible Switching Engine (SWE) with numerous ports.
- SWE transport mechanism compatible Switching Engine
- Incoming ports to the SWE can be from transport mechanism serial receivers, as well as internal ports such as command and control, or data ports.
- Outgoing ports from the SWE can go to transport mechanism serial transmitters, and also internal ports such as command and control, or data ports.
- additional transport mechanism serial ports go from the SWE inside the main memory host to other resources common to computer and other data processing systems: mass storage devices, graphics generators or interfaces, low speed I/O controllers, and high speed I/O interfaces.
- Nor does the main memory host have to be the only resource with an SWE in it. Other resources may have SWEs in them as well, allowing additional resources and CPUs or other central controllers to be connected to each other in an indefinitely expandable arrangement.
- internal ports in the main memory host provide the interface between the SWE and the DDR controller. Said port can be of significantly larger capacity than any single transport mechanism serial port, as it must handle the full data rate of the DDR interface.
- nor is the main memory host limited to handling DDR memories. Once the described invention is used often enough, it will become economically viable to build memory ICs with transport mechanism interfaces rather than DDR interfaces on them, completely eliminating the need for DDR.
- the initial implementations of the invention will use resources that interface between the SWE and the older protocols. This will allow industry time to transition from building memories based on DDR, non-volatile memories connected to an enhanced Local Bus Controller (eLBC), mass storage based on SATA, graphics and other I/O based on PCIe, to these same resources that directly interface to the transport mechanism serial ports. For example, once main memories themselves interface using the transport mechanism serial port, then resources that host a main memory interface between DDR and the transport mechanism serial port will no longer be needed as DDR will go away.
- eLBC enhanced Local Bus Controller
- the main memory host would contain an internal cache of sufficient size such that when a memory read access is requested from a CPU or even another resource, the first few hundred bytes of said memory block can be pulled from the cache while the main memory host sets up the transfers from the DDR memory.
- the DDR memory will have begun providing sufficient amounts of data to refill the buffers before the buffer that was filled only by cache completely empties.
- a superior service can be performed by the main memory host, enabling it to start transmitting data with less latency than if the main memory host had to wait for DDR accesses to start supplying data, which is a problem on today’s CPUs that directly host a DDR interface.
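The read-ahead caching argument above can be illustrated with a toy timing model. All numbers here (cache size, DDR setup latency) are illustrative assumptions, not figures from this disclosure; the lane payload rate assumes a 58.7 Gbps lane with 64B/66B coding.

```python
# Toy model: can the internal cache keep one outgoing lane busy while the
# main memory host sets up the DDR access? Values are illustrative.
CACHE_BYTES = 384          # first bytes of the block held in the cache (assumption)
DDR_SETUP_NS = 50          # delay before DDR starts supplying data (assumption)
LINK_BYTES_PER_NS = 7.1    # ~56.9 Gbps of payload on one 58.7 Gbps lane

# Time the cache alone can keep the lane busy:
cache_cover_ns = CACHE_BYTES / LINK_BYTES_PER_NS
print(f"cache covers {cache_cover_ns:.0f} ns; DDR ready after {DDR_SETUP_NS} ns")
print("no stall" if cache_cover_ns >= DDR_SETUP_NS else "stall")
```

With these assumed numbers the cache covers the DDR setup window, so the outgoing buffer never empties; a real design would size the cache to its worst-case DDR latency.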
- incoming traffic on any transport mechanism’s port into the destination IC intended for a specific function inside said IC would reach the specific function regardless of which of the transport mechanism’s ports it came in on using a concept called “Physical Address Routing” (PAR).
- PAR Physical Address Routing
- the physical address of the destination port is carried in the lead address field of a fixed sized packet, either an only packet carrying the entire payload, or in the first packet of a multi-packet payload. All incoming packets would first pass through an SWE and be presented on the output of the SWE to all destinations. Said destination would see its address, or one of the addresses it accepts, in the lead or only packet’s lead address field and accept the packet(s), while those destinations that do not see their address or any of their addresses ignore the packet(s).
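The accept-or-ignore behavior of Physical Address Routing described above can be sketched as follows. The class, port names, and address values are illustrative, not part of the specification.

```python
# Minimal PAR sketch: every packet leaving the SWE is presented to all
# destinations; each destination accepts only packets whose lead address
# field matches one of the addresses it answers to.

class Destination:
    def __init__(self, name, addresses):
        self.name = name
        self.addresses = set(addresses)
        self.accepted = []

    def offer(self, packet):
        # Accept only if the lead address field names this destination.
        if packet["lead_address"] in self.addresses:
            self.accepted.append(packet)

destinations = [
    Destination("ddr_controller", {0x001}),          # internal data port
    Destination("serial_tx_0", {0x010, 0x011}),      # outgoing serial port
]

packet = {"lead_address": 0x010, "payload": b"block of memory data"}
for dest in destinations:        # SWE output is presented to all destinations
    dest.offer(packet)

print([d.name for d in destinations if d.accepted])  # prints ['serial_tx_0']
```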
- packet switching is performed by specialized SWE designed to efficiently and autonomously switch packets using information inside the packet header.
- incoming packets are deserialized and all of the bits of the packet are passed around in parallel.
- Typical silicon switching rates at the time of the writing of this invention disclosure are somewhat in excess of 4 GHz, which means an SWE for the packets can transfer over 4 billion packets per second, which is the approximate capacity of 36 lanes of 58.7 Gbps if each packet is 512 bits in length with a 64B/66B overhead.
- a CPU hosting up to 36 such transport mechanism ports would need a single layer SWE to switch incoming data from all lanes to any destination inside the CPU. If there are more than 36 such ports in the CPU, the CPU would have a multi-level SWE, with intermediate buffers accepting physical addresses for every destination they would service from each first-level SWE; the packets are then distributed to their destinations by the second-level SWE.
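The 36-lane figure quoted above can be checked with a short calculation; the only derived quantity is the 64/66 coding efficiency of 64B/66B line coding.

```python
# Sanity check of the switching-rate arithmetic: 36 lanes of 58.7 Gbps with
# 64B/66B coding and 512-bit packets come to roughly 4 billion packets/s.
LANE_RATE_BPS = 58.7e9        # raw line rate per serial lane
CODING_EFFICIENCY = 64 / 66   # 64B/66B line coding overhead
PACKET_BITS = 512             # fixed packet size
LANES = 36

packets_per_lane = LANE_RATE_BPS * CODING_EFFICIENCY / PACKET_BITS
total_packets = packets_per_lane * LANES

print(f"{packets_per_lane / 1e6:.1f} M packets/s per lane")
print(f"{total_packets / 1e9:.2f} G packets/s across {LANES} lanes")
```

The result is just over 4 billion packets per second, matching the 4 GHz switching-rate figure in the text.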
- continuation packets, which always follow a lead packet, do not carry routing information but are routed the same as the lead packet they follow. As such they must always stay with their lead packet, whether going through a transport mechanism serial port or through an SWE. Once their lead packet starts passing through an SWE or goes out a transport mechanism serial port, the transfer must continue until the entire payload has passed through, keeping the continuation packets with their lead packet, even if a higher priority payload becomes known to the outgoing transport mechanism serial port or the SWE.
- a packet or set of packets containing a payload from the packet source enters an intermediate IC, for example, a CPU sends packets to an IC hosting main memory, and if the payload is destined for an outgoing transport mechanism serial port of the IC hosting main memory, then said IC will pass the packet(s) to its addressed outgoing port to another IC connected to it.
- the ICs most frequently accessed for transfers into the CPU(s) or central controllers, which are typically main memory hosts, are connected directly to the CPU to minimize latency.
- Each main memory host will have multiple instances of a transport mechanism serial port directly connected to the CPU, and additional transport mechanism serial ports connected to other ICs. This allows the CPU to access all other resources over the same transport mechanism lanes connected to the main memory hosts.
- if the CPU has three such ports of 58.7 Gbps, it can receive memory transfers into itself at approximately the same rate as a DDR4-3200 implementation connected directly to the CPU. Note that DDR4-3200 has a peak transfer rate of 25.6 GBytes/second, although its sustained rate will be somewhat less. If the CPU has sufficient numbers of transport mechanism serial ports all connected directly to ICs hosting main memory, then the CPU can receive data from multiple instances of main memory swiftly enough to avoid the worst instances of data starvation so prevalent in the data processing industry at the time of the writing of this invention description.
- the pin count for three transport mechanism serial ports (12 pins for signals and an equal number for power and ground for a total of 24) is all that is needed to supply data to a CPU at a rate comparable to that of a DDR4-3200.
- DDR4 needs an estimated 260 pins for signals, power and ground per DDR4 instance.
- the transport mechanism’s three instances do not take up anywhere near the pin count or power consumption that a single DDR4-3200 implementation takes.
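The pin and bandwidth comparison above can be checked directly. The DDR4 pin estimate and lane rate are the document's own figures; the breakdown of 4 signal pins per port (one differential transmit pair plus one differential receive pair) is an inference from the "12 pins for signals" figure quoted for three ports.

```python
# Rough comparison of pin count and deliverable bandwidth for three
# transport mechanism serial ports versus one DDR4-3200 instance.
PORTS = 3
SIGNAL_PINS_PER_PORT = 4          # differential TX pair + differential RX pair (inferred)
signal_pins = PORTS * SIGNAL_PINS_PER_PORT
total_pins = signal_pins * 2      # equal number of power/ground pins per the text

payload_gbytes = PORTS * 58.7e9 * (64 / 66) / 8 / 1e9  # 64B/66B coded payload

print(f"pins: {total_pins} vs ~260 for one DDR4 instance")
print(f"payload: {payload_gbytes:.1f} GB/s vs 25.6 GB/s DDR4-3200 peak")
```

At roughly 21 GB/s of payload for 24 pins, the three serial ports land in the same bandwidth class as DDR4-3200's 25.6 GB/s peak at about a tenth of the pin count.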
- because the transport mechanism serial ports are all one-way differential signals, they can travel much further on a PCB than DDR4 signals can, allowing main memory hosts to be spread further apart from the CPU.
- the graphics display would receive data from multiple transport mechanism serial ports.
- the payload from each would be sent to an internal destination port, the payloads extracted, and their combined bit maps merged together to generate the display’s pixel streams.
- Each transport mechanism’s port consists of an independent transmitter and independent receiver that do not interfere with or interact directly with each other during operation, although some Built In Test (BIT) features may briefly connect them together after a system wide reset.
- BIT Built In Test
- said graphics display may receive packets not just for itself, but for other displays as well as several low speed I/O ports such as USB ports or sound systems on said display.
- the lead address field in the packet carrying a payload destined for the USB port or sound system of said display would direct said packet to the appropriate I/O controller inside the display.
- the lead address field in the packet for a packet destined for other video displays would direct said packet to the outgoing transport mechanism serial port(s) inside the display for transport to another display.
- Said other display can be daisy-chained to other displays as well, as many as are desired for the data processing system, limited only by how many displays can be supported by the CPU and the operating system running on the CPU. Note that while the raw data throughput needs of a display consume about 44% of a transport mechanism port’s capacity, an efficient video data compression scheme at the video display can easily reduce these bandwidth needs by an order of magnitude or more.
- a transport mechanism compliant graphics controller receives instructions from said CPU(s) and processes the graphics, driving a video display or television using commercially acceptable display interfaces such as VGA, DVI, HDMI, Display Port or other video display interfaces, as well as any potential future video interfaces.
- when a transport mechanism serial port is used as the connection to a LAN, malicious users may try to send unsolicited packets to the data processing system for any number of nefarious reasons. Without proper security said packets could be interpreted inside the data processing system as commands from the CPU, allowing the malicious users to take control of the data processing system. However, said packets can only be interpreted inside the data processing system as commands if the lead packet or only packet is allowed to remain as a lead packet.
- the resource that presents the transport mechanism’s port to the outside world as the data processing system’s interface to a LAN would take all incoming packets and store them inside its own local memory, to be read by another resource inside the data processing system rather than allowing them to travel through the data processing system as if they were to be trusted. This reading of the incoming packets encapsulates them such that no part of them is considered to be a lead packet able to route its payload through the data processing system anymore.
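The quarantine step described above can be sketched as follows: packets arriving from the LAN are written into the interface resource's local memory as raw bytes, so nothing in them is ever treated as a lead packet by the switch fabric; a trusted internal resource later reads them as an opaque payload. Class and method names are illustrative.

```python
# Sketch of the LAN-facing resource: store-and-read-back instead of forwarding,
# so untrusted bytes can never route themselves through the system.

class LanInterface:
    def __init__(self):
        self.local_memory = []       # quarantined raw packets

    def receive_from_lan(self, raw_packet: bytes):
        # Store, never forward: stored bytes carry no lead-packet authority.
        self.local_memory.append(raw_packet)

    def read_quarantined(self):
        # A trusted internal resource pulls the data as an opaque payload;
        # this read encapsulates the bytes so no part is a lead packet.
        data, self.local_memory = self.local_memory, []
        return data

nic = LanInterface()
nic.receive_from_lan(b"\xc0\x01" + b"unsolicited")   # a would-be lead packet
print(len(nic.read_quarantined()))  # prints 1
```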
- multiple hosts for main memory may be daisy chained to each other, with each host containing a portion of a minimum size block of memory that is transferred into a CPU.
- this block would be 4096 bytes in size, which is the size of a section of memory managed by the Memory Management Units (MMU) used in many Intel-based computers at the time of the writing of this invention description.
- MMU Memory Management Units
- This allows larger memories for CPUs, and gives CPUs with fewer transport mechanism serial ports on them the ability to access just as much memory as CPUs with more such ports.
- the block of memory would be divided up between the main memory host closest to the CPU and the main memory host(s) daisy chained after it.
- the main memory host closest to the CPU would respond the quickest, beginning to fill its buffers for the lanes going to the CPU, which gives the main memory host(s) daisy chained behind it the opportunity to access their memory as well, even after suffering the additional delay involved in going through an intermediate SWE.
- Said daisy chained main memory host(s) will be filling up its outgoing buffers for transmission to the main memory host closest to the CPU before the closer main memory host has finished transmitting its contents to the CPU. This results in a continuous stream of packets to the CPU from main memory, and this process can be repeated in a daisy-chain fashion as often as needed to provide the CPU with all the main memory it needs.
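The continuous-stream claim above can be illustrated with a hedged timeline sketch: two hosts split one 4096-byte block, and the far host's extra SWE-hop delay is hidden behind the near host's transmission. All latency numbers are illustrative assumptions.

```python
# Toy timeline for a daisy-chained block transfer: the far host's data must
# be ready before the near host finishes its half, or the stream has a gap.
BLOCK = 4096                 # minimum block size (from the text)
NEAR_SHARE = BLOCK // 2      # split between the two hosts (assumption)
LINK_BYTES_PER_NS = 7.1      # ~56.9 Gbps payload per 58.7 Gbps lane
SWE_HOP_DELAY_NS = 40        # extra delay through the intermediate SWE (assumption)

near_done_ns = NEAR_SHARE / LINK_BYTES_PER_NS  # near host busy this long
far_ready_ns = SWE_HOP_DELAY_NS                # far host's packets arrive by then

print(f"near host busy until {near_done_ns:.0f} ns; "
      f"far host data ready at {far_ready_ns} ns")
print("continuous stream" if far_ready_ns <= near_done_ns else "gap in stream")
```

With these assumptions the far host's buffers are filling long before the near host runs dry, which is exactly the overlap the daisy-chain arrangement relies on.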
- multiple CPUs, multiple main memory hosts, multiple high speed and low speed IO interfaces, multiple mass storage devices, multiple graphics displays, and any other transport mechanism serial port compatible IC may exist in a single data processing system, interconnected to each other in whatever fashion the data processing system designer choses.
- a serial port of the transport mechanism can be converted from electrical signals into optical signals in close proximity to the IC hosting the port, carried a distance over optical fiber to the transport mechanism serial port of another IC, and be re-converted back into electrical signals at said IC. Due to the extremely high bit rates of the transport mechanism lanes, it is difficult at best to carry differential signals more than a few decimeters on printed circuit boards (PCB) and still be accurately recoverable. Discrete signals (ones and zeros are examples of discrete signals) utilize multiple harmonics of the bit rate to help receiving circuits accurately capture the transmitted bits.
- the actual distance will depend upon the signal amplitude and frequency response of the transmitter, including its ability to provide additional amplification of higher frequency signal content (called pre-emphasis); the higher frequency attenuation caused by the material of the PCB the signal is carried over; and the ability of the receiver to re-amplify higher frequency content more than lower frequency content of the received signal (called post-emphasis) to overcome the greater attenuation that higher frequency signals experience traveling over PCBs.
- pre-emphasis the signal amplitude and frequency response of the signal transmitter and its ability to provide additional higher frequency signal amplification of its transmitter
- post-emphasis the ability of the receiver to re-amplify higher frequency content more than lower frequency content of the received signal
- a real-world need for EMI hardening is for use in applications where the data processing system is subjected to being near a radar or radio transmitter as one might find in the aviation or ocean-going industries.
- instead of converting a serial port of the transport mechanism into an optical signal, the serial signal can be kept electrical and transported over a cable independently of the PCB the ICs are on, including being transported to separate PCBs.
- Such embodiments will not provide as much EMI hardening as converting to optical would provide, but they will be less expensive to implement and will allow the transport mechanism to be carried further than it can over etch on a PCB.
- serial transport mechanism can be daisy chained to go through multiple intermediate ICs prior to a payload being delivered to its destination
- multiple implementations of main memory hosts can be daisy chained to each other, providing a near indefinite amount of main memory to the CPU for those applications where very large amounts of main memory are needed, for example, detailed weather forecasting.
- a computing module in another embodiment, includes a semiconductor carrier having a four sided pin configuration, a central processing unit (CPU), serial port circuitry electrically coupled with the CPU, and a plurality of serial ports electrically coupled with the serial port circuitry.
- CPU central processing unit
- serial port circuitry electrically coupled with the CPU
- a plurality of serial ports electrically coupled with the serial port circuitry.
- a first serial port of the plurality of serial ports is electrically coupled with a first plurality of pins positioned on a first side of the semiconductor carrier
- a second serial port of the plurality of serial ports is electrically coupled with a second plurality of pins positioned on a second side of the semiconductor carrier
- a third serial port of the plurality of serial ports is electrically coupled with a third plurality of pins positioned on a third side of the semiconductor carrier
- a fourth serial port of the plurality of serial ports is electrically coupled with a fourth plurality of pins positioned on a fourth side of the semiconductor carrier.
- the first, second, third, and fourth plurality of pins each have a commonly positioned transmit output port and receive input port associated with the given serial port.
- the serial port circuitry may include a non-blocking switching engine for connectivity of payloads between the CPU and each serial port.
- the first, second, third, and fourth plurality of pins may each have a commonly positioned power pin and a commonly positioned ground pin.
- each transmit output port may include a differential transmit output port.
- each receive input port may include a differential receive input port.
- the semiconductor carrier may be a 44-pin plastic leaded chip carrier (PLCC) and the plurality of serial ports may include at least four serial ports.
- the semiconductor carrier may be a 68-pin PLCC and the plurality of serial ports may include at least eight serial ports.
- the semiconductor carrier may be a 100-pin PLCC and the plurality of serial ports may include at least twelve serial ports.
- the semiconductor carrier may be a 144-pin Quad Flat Pack (QFP) and the plurality of serial ports may include at least sixteen serial ports.
- QFP Quad Flat Pack
- the semiconductor carrier may be a 208-pin QFP and the plurality of serial ports may include at least twenty serial ports.
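The package options listed above can be cross-checked against a hypothetical pin budget, assuming each serial port needs 4 signal pins (differential transmit and receive pairs), consistent with the "12 pins for signals" per three ports cited earlier. The remaining pins would be available for power, ground, and housekeeping.

```python
# Hypothetical pin budget for the claimed package/port combinations.
packages = {  # package pin count -> minimum number of serial ports claimed
    44: 4, 68: 8, 100: 12, 144: 16, 208: 20,
}

for pins, ports in packages.items():
    signal = ports * 4    # TX+/TX-/RX+/RX- per port (assumption from the text)
    print(f"{pins:3d}-pin package: {ports:2d} ports -> {signal:3d} signal pins, "
          f"{pins - signal:3d} pins left for power/ground/other")
```

In every case the signal pins fit with room to spare, which is what makes such small quad packages plausible hosts for a CPU under this transport mechanism.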
- FIG. 1 depicts a diagram illustrating an embodiment of the packet structures in accordance with embodiments of the present disclosure.
- FIG. 2 depicts a logic diagram illustrating a detailed embodiment of a single level Switching Engine (SWE) wherein buffers are removed for clarity but are assumed to be present as they are in FIG. 3 in accordance with embodiments of the present disclosure.
- SWE single level Switching Engine
- FIG. 3 depicts a flow diagram illustrating a high-level embodiment of a multi-level SWE with buffers in accordance with embodiments of the present disclosure.
- FIG. 4 depicts a block diagram illustrating data processing system using a single central processing unit (CPU) in accordance with embodiments of the present disclosure.
- FIG. 5 depicts a mechanical diagram illustrating a minimal pin count CPU integrated circuit (IC) using a 44-pin plastic leaded chip carrier (PLCC) package having four transport mechanism serial ports and a 68-pin PLCC package having eight transport mechanism serial ports in accordance with embodiments of the present disclosure.
- IC minimal pin count CPU integrated circuit
- PLCC plastic leaded chip carrier
- Payload A variable size collection of data that is passed from a source to a destination. Payloads discussed in this implementation of the invention are not specified, but can include internal constructs of blocks of memories, command and control sequences, or Input/Output (IO) data. The term Payload may be used interchangeably with datagram.
- Switching Engine Logic and buffers implemented inside an IC to take packets from any incoming port and switch them so that every outgoing port can examine the packets, and decide whether to accept or reject the packets.
- Packet A fixed sized collection of data carried over a transport layer.
- when a packet can carry an entire payload, it is referred to as a single or only packet; when payloads are too large to fit inside a single packet, a lead packet with one or more follow on, or continuation, packets is used to carry the balance of the payload that could not fit into the lead packet. If a payload does not completely fill out the last packet, then a pad field is added to make the last packet size match that of the other packets. It is then the responsibility of the destination to go through the payload, find its size field, and truncate the pad off the end of the payload, but this function is beyond the scope of this invention description.
- Single Packet (or Only Packet) A packet that carries all needed routing information and an entire payload in it. Note that the terms “Single Packet” and “Only Packet” are interchangeable. Single packets are distinguished from lead packets by the fact that no continuation packets immediately follow a single packet. Only Packet - see the definition for “Single Packet”.
- Lead Packet a packet that contains the routing information for itself and any number of continuation (or follow-on) packets.
- the lead packet also carries a portion of the payload, with the balance carried in the continuation packet(s) that immediately follow it.
- Null Packet - an Only packet with a lead address field that will not be accepted by any outgoing port connected to an SWE. It is intended to be inserted into an outgoing serial port, or a switching engine, when there are no other packets to be transmitted or switched. Null packets are ignored by all receivers, and they inform buffers on the output of an SWE that were accepting continuation packets that the last valid payload has completely transitioned through the SWE if no more packets are present to otherwise go through it. Their purpose is to keep the serial link active so the receiving port will stay locked onto the stream of ones and zeros.
- a null packet is defined as follows: the first two bits are set (indicating an only packet), the next 14 bits are all zeroes (an example of an address not used by any outgoing port), and the remaining bits of the null packet alternate between one and zero (maximizing the number of edges transmitted to ensure the receiver stays locked onto the incoming signal).
- Framed - A high speed serial receiver that is locked onto the incoming datastream and can segregate it into incoming packets is considered “Framed”.
- Framing Packet Similar to a null packet, except that the 6 bits immediately following the two lead bits are also one’s.
- the framing pattern allows receivers that are not locked onto the 64b/66b pattern to quickly find the pattern edge and the start of packet edge. Once a receiver has locked onto the framing pattern, it will transmit null packets until it is also receiving null packets or packets carrying payloads. Once a receiver is receiving null packets or packets carrying payloads, it can start transmitting payloads if it has any pending in its outgoing buffer.
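The null and framing packet layouts described above can be constructed directly. The 512-bit packet is modeled as a list of bits; the exact phase of the alternating tail pattern is an assumption, as the text does not specify it.

```python
# Build the null and framing packets bit-by-bit per the definitions above.
PACKET_BITS = 512

def null_packet():
    bits = [0] * PACKET_BITS
    bits[0] = bits[1] = 1             # two set lead bits: "only packet" marker
    # bits 2..15 stay 0: a 14-bit address no outgoing port will accept
    for i in range(16, PACKET_BITS):  # alternating tail keeps receivers locked
        bits[i] = (i + 1) % 2         # bit 16 -> 1, 17 -> 0, 18 -> 1, ...
    return bits

def framing_packet():
    # Same as a null packet, except the six bits immediately following the
    # two lead bits are also ones, giving receivers a pattern to lock onto.
    bits = null_packet()
    for i in range(2, 8):
        bits[i] = 1
    return bits

n, f = null_packet(), framing_packet()
print(n[:20])
print(f[:20])
```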
- Continuation Packet A packet that follows a lead packet or another continuation packet in a multi-packet payload.
- the continuation packet is distinguished from a lead or only packet by a field in its header that is different from a lead or only packet.
- Continuation packets do not contain any routing information and must always follow their lead packet to get to their destination. But as a result of this, better than 99% of a continuation packet can be used to carry the payload, making it extremely efficient in its utilization of the transport mechanism’s bandwidth.
- Transport Mechanism the serial transport lanes between Integrated Circuits (IC), as well as those internal portions of an IC dedicated to sending, receiving, buffering, or switching packets.
- Switching Engine a mechanism inside an IC that switches fixed-size packets of data from any number of packet sources to any number of packet destinations, the goal being able to work so swiftly that the rate of arrival of incoming data cannot overwhelm the SWE’s ability to move the data to the outgoing destinations.
- PAR Physical Address Routing
- Resource any IC inside a data processing system that communicates with other ICs using the transport mechanism of this invention disclosure.
- Resources can include CPUs and other controlling processors, main memory hosts, mass storage hosts, I/O controller hosts, graphics generators and video displays, or any other IC that can connect up to the transport mechanism and provide a function, feature, or service to the CPU(s) or other controlling processor(s).
- Mass Storage a mechanism inside or attached to a processing system used to store large quantities of non-volatile information, non-volatile referring to a memory system where the contents are not altered or lost when power is removed from the mechanism.
- Commonly used mass storage at the time of the writing of this invention description can include but is not limited to rotating disk drives, solid state disk drives (SSD), Compact Disks (CD), and Digital Versatile Disks (DVD).
- SSD solid state disk drives
- CD Compact Disks
- DVD Digital Versatile Disk
- Past systems, mostly obsolete now, used floppy disks, rotating drums, and magnetic or optical tape drives.
- FIG. 4 depicts a block diagram 400 illustrating data processing system using a single central processing unit (CPU) in accordance with embodiments of the present disclosure.
- Block diagram 400 further illustrates a novel architecture for the switching of data inside a processing system.
- a common transport mechanism is used throughout all high capacity connections between ICs 402, 402a, 402b, and possibly a transport mechanism serial link 415 can go between the data processing system and a Local Area Network (LAN). Protocols that ride on top of the transport mechanism are only relevant to the end points.
- LAN Local Area Network
- the transport mechanism does not have to directly connect a data source 406, 408, 414, 416 to a data destination 401; it can be switched by intermediate ICs 403 and 403b, or 403a and 403c, which allows a limited number of paths 402 into and out of one or more CPUs 401 or other central controllers to be connected to an indefinite number of other resources within the processing system while being able to utilize the bandwidth of all the ports 402 of the CPU 401 concurrently.
- Another feature of the transport mechanism is that those resources 403, 403a most frequently accessed by a CPU 401 or other central controllers in the processing system can be directly connected to said controllers, limiting the latency from a resource request to a resource response. In most cases that resource 403, 403a will be a main memory host, although cases can be made for other resource hosts as determined by the architect of the processing system.
- Another feature of the transport mechanism is that resources of any type can be added provided they adhere to the transport mechanism’s protocol, in any numbers, and in any arrangement.
- FIG. 2 depicts a logic diagram 200 illustrating a detailed embodiment of a single level Switching Engine (SWE) wherein buffers are removed for clarity but are assumed to be present as they are in FIG. 3 in accordance with embodiments of the present disclosure.
- the single level SWE is optimized for use in 4-input lookup tables.
- FIG. 3 depicts a flow diagram 300 illustrating a high-level embodiment of a multi-level SWE with buffers in accordance with embodiments of the present disclosure.
- The common transport mechanism of flow diagram 300 can share bandwidth across multiple ports that all lead to the same destination by simply selecting, for each next payload, the port with the least full outgoing buffer 303.
- The common transport mechanism is implemented in a manner having similar functionalities to the systems disclosed in US Provisional Patent Application No. 61/778,393 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” filed on March 12, 2013; and US Utility Application No. 14/021,563 titled “INDEFINITELY EXPANDABLE HIGH-CAPACITY DATA SWITCH,” filed on September 9, 2013, issued as US Patent No. 9,577,955 on February 21, 2017; all above-identified applications are incorporated by reference herein.
- FIG. 1 depicts a diagram 100 illustrating an embodiment of the packet structures in accordance with embodiments of the present disclosure.
- The source places the route the payload will take through the switching system in the header 101 102 103 104 105 106 that encapsulates the payload.
- the lead address field 104 in the payload points to the outgoing port of the resource that the payload will be switched out of.
- the payload’s lead or only packet has the last 62 bytes 105 106 107 108 109 in it shifted up by one address field value to provide a new lead address 104 for the next receiving IC.
- the process will also fill in the back end 109 with alternating ones and zeroes to increase the density of transitions in the data, which enables a receiving port to more easily stay centered on the bit positions of the incoming serial data stream.
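The per-hop address shift and tail backfill described above can be sketched as follows. This is a simplified illustration, not the claimed implementation: the 63-byte packet layout (one lead address byte followed by 62 bytes) and the 0b01010101 pad value are assumptions for the example.

```python
def forward_hop(packet: bytes) -> bytes:
    """Sketch of one switching hop: consume the lead address byte,
    shift the remaining bytes up by one address field, and backfill
    the freed tail position with alternating ones and zeroes so the
    receiver sees a dense run of bit transitions."""
    assert len(packet) == 63  # 1 lead address byte + 62 following bytes (assumed sizes)
    shifted = packet[1:]                   # remaining bytes move up one position
    return shifted + bytes([0b01010101])   # alternating 1s and 0s backfill

# Example: a packet routed through ports 4 -> 7 -> 2
pkt = bytes([4, 7, 2]) + bytes(60)
hop1 = forward_hop(pkt)   # next IC sees lead address 7
hop2 = forward_hop(hop1)  # next IC sees lead address 2
```

Each intermediate IC thus only ever inspects the first address field; the rest of the route stays opaque to it.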
- the common transport mechanism uses a novel means of selecting the outgoing port when multiple ports are available for use.
- the ports will arbitrate among themselves for the right to accept the payload based on the following scheme: 1) the port is functioning and in communications with its other end, 2) the port with the least filled outgoing buffer is selected, and in the case of a tie, either a round-robin scheme or an arbitrary tie-breaking scheme of port priority is used.
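The arbitration scheme above can be sketched as follows; the `Port` class and its fields are illustrative assumptions, and the round-robin tie-break is shown (the alternative fixed-priority tie-break would simply pick the lowest-indexed tied port instead).

```python
import itertools

class Port:
    def __init__(self, name, functioning, buffer_fill):
        self.name = name
        self.functioning = functioning  # port is up and in communication with its far end
        self.buffer_fill = buffer_fill  # amount queued in its outgoing buffer

_round_robin = itertools.count()  # shared tie-break counter

def select_port(ports):
    """Sketch of the scheme above: (1) only functioning ports compete,
    (2) the least-filled outgoing buffer wins, (3) ties are broken
    round-robin."""
    candidates = [p for p in ports if p.functioning]
    if not candidates:
        return None
    least = min(p.buffer_fill for p in candidates)
    tied = [p for p in candidates if p.buffer_fill == least]
    return tied[next(_round_robin) % len(tied)]
```

Note that the least-filled-buffer rule is also what lets multiple ports to the same destination share bandwidth, as described for FIG. 3.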
- Another distinct feature of the transport mechanism in this invention disclosure is that, unlike the transport mechanism used in the Indefinitely Expandable Data Switch (IEDS), there is no use of active and standby redundancy, with the possible exception of connections to Local Area Networks (LAN). This significantly simplifies the implementation of the transport mechanism. Where active and standby are needed in LAN implementations, the LAN implementation will implement the same protocol used in the IEDS and provide the interface to the processing system, but that is beyond the scope of this invention disclosure.
- A differential signal consists of two separate signals, one always of the opposite polarity of the other with respect to a common central point. Differential signaling minimizes the magnetic fields generated by propagating signals, as the two fields cancel each other out, which increases switching speeds and tolerance to outside signal interference, and minimizes power consumption.
- The two signals are routed adjacent to each other from the transmitter to the receiver. To minimize distortion of the signal as it moves along the etch, the following rules must be followed:
- the quality of the PCB material must be such as to minimize a loss of signal strength at higher frequencies
- the signals should be on a layer next to a ground plane, and the ground plane is continuously underneath the signals from the transmitting pins to the receiving pins, to keep the signal impedance constant and free from interference from signals on other layers nearby;
- This protocol guarantees a minimum of two transitions for every 66 bits transmitted, which is sufficient for a receiver to stay locked onto an incoming data stream.
- the two bits of framing overhead will typically be transmitted as a zero bit, then a one bit, seven out of eight times, and the eighth time, it will be transmitted as a one, then a zero.
- This difference in the polarity of the bits of the framing field indicates the start of a packet, which will consist of 512 bits in eight groups of 64 bits with two framing bits added per 64-bit group.
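The framing described above can be sketched as follows. This is an illustrative assumption about where the inverted framing pair falls: the group that begins the packet carries "10" while the other seven groups carry "01", giving the receiver a once-per-packet marker to lock onto.

```python
def frame_packet(groups):
    """Sketch: a packet is eight 64-bit groups, each prefixed with two
    framing bits. Seven groups carry '01'; the group assumed to start
    the packet carries the inverted pair '10', marking the packet
    boundary for the receiver."""
    assert len(groups) == 8 and all(len(g) == 64 for g in groups)
    out = ""
    for i, g in enumerate(groups):
        framing = "10" if i == 0 else "01"  # inverted pair marks start of packet (assumed position)
        out += framing + g
    return out  # 8 * (2 + 64) = 528 bits per packet on the wire
```

With two framing bits per 64 data bits, the guaranteed transition density is comparable to 64b/66b line coding, which is what allows the receiver to stay locked without a separate clock line.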
- the receiver will detect inversion in the framing signal and thus invert all bits received, correcting the signal’s polarity reversal on the PCB.
- Bit inversion is already practiced in PCIe interfaces and is an accepted industry practice; its goal is to give board designers routing flexibility so that the etch lengths in a pair of high speed differential signals can be kept as close to the same length as possible.
- Payloads carried inside the packets can vary in size. It is not the responsibility of the transport protocol to identify payload boundaries to an upper level protocol, so the following is beyond the scope of this invention disclosure, but it will be discussed to show how the invention can carry payloads in multiple packets efficiently.
- When a lead packet and one or more continuation packets carry a payload 107 108 111 112 113, part of the lead packet’s tail end 109 will have been backfilled if the lead packet passed through one or more intermediate ICs switching the packet from the source to the destination.
- The first field 107 in the payload created by the payload source should be a count of how many bytes of payload are carried in the lead packet. This field will have reached the position of the first subsequent address field 105 by the time the packet arrives at the destination IC. Only the bytes counted by this field 107 are extracted from the lead packet 116 and merged with the payload in the continuation packets 116a to form the payload at the receiver.
- In some cases the byte count field 107 of the first packet isn’t needed: the payload may have a size field embedded in it, which can be used to truncate the pad at the end of the packet, or the payload may be a fixed size so no byte count field is needed. Note that it is the responsibility of the payload source, not the transport mechanism, to know how many intermediate resources must be used to carry the packet to the destination so that it can properly place the payload byte count in the first location past the last address field 106. Note also that the term “byte” as used in this paragraph normally represents 8 bits, but if the lead address field is not 8 bits, then the term “byte” as used in this paragraph represents the number of bits in the lead address field.
- the structure 116 of the lead packet and the single (or only) packet is the same. The only difference between the two is that a lead packet will have one or more continuation packets 116a following it, while the single packet 116 is followed by another single or lead packet 116, or a null packet if there are no more packets to be transmitted or switched. The last continuation packet will be followed by a single packet, a lead packet, or a null packet.
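Receiver-side reassembly of a multi-packet payload can be sketched as follows. The field layout is an assumption for illustration: a one-byte count field 107 immediately after the consumed address fields, followed by the lead packet's payload bytes and any hop backfill 109.

```python
def reassemble(lead_packet_tail: bytes, continuation_payloads) -> bytes:
    """Sketch of the merge described above: at the destination, the
    first remaining byte is the count 107 of payload bytes in the lead
    packet; only that many bytes are taken (the rest of the tail 109 is
    backfill added at intermediate hops), then the continuation
    packets' payloads 116a are appended."""
    count = lead_packet_tail[0]                   # byte-count field 107
    lead_payload = lead_packet_tail[1:1 + count]  # real payload, backfill excluded
    return lead_payload + b"".join(continuation_payloads)
```

This is why the source must know the hop count in advance: each hop consumes one address field, which determines where the count field 107 lands at the destination.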
- A resource uses a Direct Memory Access (DMA) engine to take control of the memory bus and transport I/O content directly to memory without first going through the CPU. While this speeds things up in traditional computers, in that there is now only one access to get the information into or out of main memory from an I/O device, it does force the CPU to be idle while the transfer takes place if the CPU needs access to main memory.
- With the transport mechanism described herein, DMA engines become obsolete.
- A resource such as a LAN interface 414 that needs to send one or more payloads to a main memory host 403 403a 403b 403c simply directs the payloads to the main memory host 403 403a 403b 403c rather than to the CPU 401, using the transport mechanism. Since the paths 402 between the CPU and the main memory are not used in this transfer, the CPU can continue to access main memory as needed without sharing its bandwidth.
- The CPU may encounter additional delays when accessing other resources, as the paths to main memory from other resources may have some or most of their bandwidth consumed carrying the DMA-style transfer, or the main memory host has to switch from handling packets of that transfer to handling packets for the CPU; but this delay is small and has far less of an impact on the CPU than would be found using traditional DMA engines and traditional busses to access main memory.
- the transport mechanism paths 402 to the CPU must never be used to route packets between resources.
- the Enhanced Graphics Processor 408 has a large number of paths 402a going to it that can be linked to all resources but they are lightly loaded; these paths should be used to bridge between resources as needed.
- a main memory resource 403 may receive a command from the CPU 401 to read a block of memory and send it to the CPU.
- the command port internal address will be different from the data port address; therefore, the final lead address field’s value inside the lead or only packet will direct the packet to the command port rather than the data port.
- null packets must be inserted to keep the path working and the two resources in sync with each other. For this reason, certain addresses in the lead address field will be set aside for such uses, including a lead address value to identify the packet as a null packet only which does not contain any payload. Such packets are ignored at the receiving end. Note that a lead/single packet followed by a null packet will be interpreted as a single packet.
- If the receiver falls out of sync with the incoming signal but is still receiving ones and zeroes, it will start transmitting a packet called the framing packet.
- A framing packet is similar to a null packet, but lets the far end know communications is down so that it will stop transmitting valid packets. The other end will start transmitting framing packets if it has also lost sync, or, if it has remained synchronized, null packets.
- Once a receiver is properly synced to the incoming packet boundaries, it switches from transmitting framing packets to transmitting null packets if it is still receiving framing packets.
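The framing/null packet behavior above amounts to a small per-link state machine, sketched below. The state and packet names are illustrative labels, not terms from the claims.

```python
IN_SYNC, LOST_SYNC = "in_sync", "lost_sync"

def next_transmit(state, receiving):
    """Sketch of the link recovery behavior: a receiver that has lost
    sync transmits framing packets to tell the far end the link is
    down; once re-synced, it answers incoming framing packets with
    null packets, signaling the far end that valid traffic may resume."""
    if state == LOST_SYNC:
        return "framing"            # advertise that this end cannot receive
    if receiving == "framing":
        return "null"               # synced here, but far end is still recovering
    return "valid_or_null"          # normal operation: payloads, or nulls when idle
```

Both ends thus converge: framing packets on either side keep valid traffic off the link until each receiver has independently re-locked to the packet boundaries.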
- said discovery process should be able to handle the insertion or removal of resources after the initial discovery occurs after reset. Connections to Local Area Networks (LAN), as well as connections to external resources such as additional mass storage may happen. Again, this is beyond the scope of this invention, but is mentioned to show that upper level protocols can be developed using the claimed invention as the physical layer implemented in the system.
- A priority field 103, typically 3 bits, will define eight levels of priority in the lead or only packets. The two highest priorities will be reserved for command and response functions of various resources. Resources connected to the LAN 414 will not allow the response function priority to pass to the LAN from the processing system, since all responses from a resource to a CPU 401 or other central controller must only go to those controlling resources inside the data processing system. Further, resources 414 connecting the LAN to the processing system will not allow command function priority to pass to the processing system from the LAN, again because all controlling resources must reside inside the data processing system.
- Commands may pass outside the data processing system to allow the internal controllers the ability to discover and use external resources, and responses from external resources may pass back into the data processing system as said resources must allow controlling resources inside the data processing system the ability to discover and use them.
- Priorities below these two, which will be all forms of data, will be allowed to pass into and out of the data processing system unhindered.
- command ports in all resources will not respond to commands unless the command payload is carried by a transport mechanism packet with the command priority in the packet’s priority field 103.
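The LAN-boundary filtering described above can be sketched as follows. The concrete priority encodings (7 for command, 6 for response as the two highest 3-bit levels) are assumptions for the example.

```python
# Assumed encoding of the 3-bit priority field 103: the two highest
# levels are reserved for command and response functions.
COMMAND, RESPONSE = 7, 6

def lan_gateway_allows(priority, direction):
    """Sketch of the priority filtering at a LAN-connected resource 414.
    direction is 'outbound' (processing system -> LAN) or 'inbound'
    (LAN -> processing system)."""
    if direction == "outbound":
        return priority != RESPONSE  # responses never leave the system
    if direction == "inbound":
        return priority != COMMAND   # external controllers cannot issue commands
    raise ValueError(direction)
```

Commands going out and responses coming in are permitted, which is what allows internal controllers to discover and use external resources while external controllers can never command internal ones.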
- An interrupt is a physical line or combination of lines set to certain levels asking the CPU to interrupt what it is doing to dedicate resources to handling the immediate needs of the resource.
- Some interrupts are generated when said resource has new incoming I/O for the CPU, or can accept more outgoing I/O as it has emptied its outgoing buffer enough to accept more data.
- Other interrupts occur because potentially erroneous events have occurred, for example, a loss of incoming power and the CPU 401 must immediately begin an orderly shutdown before the holdup capacity of its power source is exhausted.
- Other interrupts occur when a main memory resource detects a double bit error in a memory with error detection and correction. And then there are some interrupts generated internally, for example, a user program attempts to access a section of memory it isn’t allowed to access, or a divide by zero operation occurs.
- The externally generated interrupts can be replaced with “interrupt packets”, which would typically be a single packet payload from a resource indicating that an event requiring the immediate attention of the CPU 401 has occurred.
- The packet would be directed toward the CPU’s command port, entering an “interrupt register” in the CPU command port, with the goal of replacing interrupt lines to minimize the pin count needed by the CPU.
- the fewer the number of pins on an IC the less expensive its packaging will be.
- Another method of reducing pin counts on the CPU IC is to eliminate the pins needed for an enhanced Local Bus (eLB), and instead depend on an eLB Controller (eLBC) 416 to be connected to boot memory and other resources that use a traditional address and data bus to connect to the CPU 401.
- After reset, the eLBC resource 416 would access the boot code attached to it, place it in packets, and send it to the CPU 401 over the transport mechanism.
- the route needed would have to be programmed into a part of the boot code as the eLBC resource 416 will not know how to discover where the master CPU 401 is.
- Hardware in the CPU would receive the packets, place them in its internal cache memory, and then allow the CPU to begin execution of the boot code that allows it to start up.
- the reset signal itself can be eliminated in the CPU 401 with the simple technique of shutting off all signals going to it over the transport mechanism ports 402. Only after these signals start to ‘wiggle’ will the CPU 401 start coming out of reset and set itself up to receive boot code.
- FIG. 5 depicts a mechanical diagram 500 illustrating a minimal pin count CPU integrated circuit (IC) using a 44-pin plastic leaded chip carrier (PLCC) package 501 having four transport mechanism serial ports, and a 68-pin PLCC package 502 having eight transport mechanism serial ports in accordance with embodiments of the present disclosure. Note that the signals shown on one side of each PLCC package 501 and 502 are repeated on all four sides in a symmetrical manner as depicted in FIG. 5. Properly implemented, a CPU in a quad package could be oriented in any of four directions and still properly function in the data processing system.
- A data processing system utilizing a radical means of moving data such that the data transfer rate per pin of the IC is significantly greater than that of any existing generally accepted protocol, and such that data can be transported over sufficient distances that PCB-area-intensive resources serving a controlling processor have sufficient space to implement all of the resources desired for the controller. The goals are to overcome data starvation and memory size restrictions in a processor system and to make use of all transport paths into and out of a data processing IC for any type of communication between different resources in a data processing system. Whether a resource is a controlling processor, a main memory host, an enhanced local bus controller host, a mass storage host, a high speed I/O host, a low speed I/O host, a host for a numeric or graphics processor, or a graphics display, all such resources can utilize the same pins of the controller for transporting data and instructions, and thus not leave dedicated paths idle while other paths are a bottleneck to the movement of data inside the data processing system, and said movement of data through said data processing system shall
- Said data processing system will also have sufficient hardware security features built in to prevent external controllers from being able to access resources inside said data processing system. Because the speed of the transport mechanism allows many low speed signals to be offloaded to resource host ICs serving the processing IC, the transport mechanism will be able to replace many of the very low speed signal pins on a traditional controller IC with equivalent functions inside payloads carried over the transport mechanism, minimizing the size and pin count, and thus the cost, of said controller IC packaging. Due to the symmetry of pin assignments the transport mechanism can provide, IC packages can be designed such that they can be installed in any orientation and still operate with no loss of functionality, efficiency, or throughput. Finally, the requirement for close proximity of closely inter-functioning ICs can be relaxed, allowing more room for easier cooling of said data processing system.
- These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create an ability for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
- the computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
- each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
- the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
- each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration can be implemented by special purpose hardware- based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB2302608.1A GB2612554A (en) | 2020-07-30 | 2021-07-28 | An enhanced processor data transport mechanism |
CA3190446A CA3190446A1 (en) | 2020-07-30 | 2021-07-28 | An enhanced processor data transport mechanism |
US17/402,117 US11675587B2 (en) | 2015-12-03 | 2021-08-13 | Enhanced protection of processors from a buffer overflow attack |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202063058652P | 2020-07-30 | 2020-07-30 | |
US63/058,652 | 2020-07-30 |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/792,432 Continuation-In-Part US11119769B2 (en) | 2015-12-03 | 2020-02-17 | Enhanced protection of processors from a buffer overflow attack |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/402,117 Continuation-In-Part US11675587B2 (en) | 2015-12-03 | 2021-08-13 | Enhanced protection of processors from a buffer overflow attack |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022026497A1 true WO2022026497A1 (en) | 2022-02-03 |
Family
ID=80036709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2021/043374 WO2022026497A1 (en) | 2015-12-03 | 2021-07-28 | An enhanced processor data transport mechanism |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022026497A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11499308B2 (en) | 2015-12-31 | 2022-11-15 | Cfs Concrete Forming Systems Inc. | Structure-lining apparatus with adjustable width and tool for same |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4441075A (en) * | 1981-07-02 | 1984-04-03 | International Business Machines Corporation | Circuit arrangement which permits the testing of each individual chip and interchip connection in a high density packaging structure having a plurality of interconnected chips, without any physical disconnection |
US8164272B2 (en) * | 2005-03-15 | 2012-04-24 | International Rectifier Corporation | 8-pin PFC and ballast control IC |
US20150009743A1 (en) * | 2010-11-03 | 2015-01-08 | Shine C. Chung | Low-Pin-Count Non-Volatile Memory Interface for 3D IC |
US9128690B2 (en) * | 2012-09-24 | 2015-09-08 | Texas Instruments Incorporated | Bus pin reduction and power management |
US20160231380A1 (en) * | 1999-03-26 | 2016-08-11 | Texas Instruments Incorporated | Third tap circuitry controlling linking first and second tap circuitry |
-
2021
- 2021-07-28 WO PCT/US2021/043374 patent/WO2022026497A1/en active Application Filing
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 21848733 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 3190446 Country of ref document: CA |
|
ENP | Entry into the national phase |
Ref document number: 202302608 Country of ref document: GB Kind code of ref document: A Free format text: PCT FILING DATE = 20210728 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 21848733 Country of ref document: EP Kind code of ref document: A1 |