CN116303191B - Method, equipment and medium for interconnecting wafer-to-wafer interfaces - Google Patents

Info

Publication number
CN116303191B
Authority
CN
China
Prior art keywords
data
wafer
interface
wafers
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310537595.3A
Other languages
Chinese (zh)
Other versions
CN116303191A (en)
Inventor
李剑峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinyaohui Technology Co ltd
Original Assignee
Xinyaohui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinyaohui Technology Co ltd
Priority to CN202310537595.3A
Priority to CN202310976442.9A (CN116881188B)
Publication of CN116303191A
Application granted
Publication of CN116303191B
Legal status: Active
Anticipated expiration

Classifications

    • G06F13/4059: Coupling between buses using bus bridges where the bridge performs a synchronising function where the synchronisation uses buffers, e.g. for speed matching between buses
    • G06F11/1004: Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's, to protect a block of data words, e.g. CRC or checksum
    • G06F13/382: Information transfer, e.g. on bus, using universal interface adapter
    • G06F13/4068: Device-to-bus coupling; electrical coupling
    • G06F13/4282: Bus transfer protocol, e.g. handshake; synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • G06F15/7807: System on chip, i.e. computer system on a single chip; system in package, i.e. computer system on one or more chips in a single package
    • H04L1/0061: Error detection codes
    • H04L1/0078: Avoidance of errors by organising the transmitted data in a format specifically designed to deal with errors, e.g. location
    • H04L69/04: Protocols for data compression, e.g. ROHC
    • G06F2213/0026: PCI express

Abstract

The application provides a method, equipment and medium for interconnecting wafer-to-wafer interfaces. The method comprises the following steps. In response to data transmission, data to be transmitted is input into an interface cache; a protocol processing unit cuts the data to be transmitted to obtain cut data, performs a cyclic redundancy check calculation on the cut data to generate a cyclic redundancy calculation result, adds the cyclic redundancy calculation result to the cut data, assembles and stripes the cut data to obtain distributed data, encodes the distributed data to obtain coded data, and scrambles and frames the coded data, which is then sent through a transmission interface. In response to data reception, data to be received is acquired through the transmission interface; the protocol processing unit deframes and descrambles the data to be received, decodes it to obtain decoded data, aggregates the decoded data to obtain aggregated data, and performs a cyclic redundancy check and data combination on the aggregated data before inputting the result into the interface cache. This helps achieve a high data transmission rate and high data transmission reliability.

Description

Method, equipment and medium for interconnecting wafer-to-wafer interfaces
Technical Field
The present application relates to the field of chip design technologies, and in particular, to a method, an apparatus, and a medium for interconnecting wafer-to-wafer interfaces.
Background
With the progress of semiconductor technology and the growth of chip design scale, chip area keeps increasing and improving the performance of a system on chip (SOC) becomes more difficult. Designs also face increased leakage current, harder heat dissipation, and slow growth of the chip clock frequency. Even breakthroughs in semiconductor processes struggle to deliver optimal performance and very low power consumption for an SOC, while chip manufacturing costs rise and manufacturing yields fall. To manufacture chips with satisfactory performance and power consumption using existing technology, the SOC is split into multiple wafers (dies), which are packaged and interconnected to form a chipset or multi-chip module. For example, with chiplet technology, the functional blocks originally integrated in a single system chip are split, manufactured separately, and finally integrated and packaged into a system chipset through packaging and interconnection technology. To ensure accurate data interconnection between the packaged wafers, the wafer-to-wafer (die-to-die) interface interconnection must also offer high data bandwidth, low latency and high reliability, so as to meet application requirements in fields such as networking, large-scale data centers and artificial intelligence. However, prior-art approaches to interconnecting wafer-to-wafer interfaces struggle to meet the requirements of high data transmission rate and high data transmission reliability.
Therefore, the application provides a method, equipment and medium for interconnecting wafer to wafer interfaces, which are used for solving the technical problems in the prior art.
Disclosure of Invention
In a first aspect, the present application provides a method of wafer-to-wafer interface interconnection. Each wafer of a plurality of wafers includes a wafer-to-wafer interface for data interconnection between that wafer and another wafer of the plurality of wafers. The method is applied to a first wafer, the first wafer being any wafer of the plurality of wafers, and the wafer-to-wafer interface of the first wafer includes an interface cache, a protocol processing unit, and a transmission interface. The method comprises: in response to data transmission by the first wafer, inputting first data to be sent to the interface cache; by the protocol processing unit, cutting the first data to be sent to obtain cut data, performing a cyclic redundancy check calculation on the cut data to generate a cyclic redundancy calculation result, adding the cyclic redundancy calculation result to the cut data for assembly and striping distribution to obtain distributed data, encoding the distributed data to obtain encoded data, and scrambling and framing the encoded data to obtain second data to be sent; sending the second data to be sent through the transmission interface; in response to data reception by the first wafer, obtaining first data to be received through the transmission interface; by the protocol processing unit, deframing and descrambling the first data to be received and then decoding it to obtain decoded data, aggregating the decoded data to obtain aggregated data, and performing a cyclic redundancy check and data combination on the aggregated data to obtain second data to be received; and inputting the second data to be received to the interface cache.
According to the first aspect of the application, the requirements of high data transmission rate and high data transmission reliability can be met, and the requirements of various protocols, rules, strategies and the like in terms of data transmission, flow control, scheduling functions, bandwidth, data channels and the like can be flexibly adapted.
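The cutting, cyclic redundancy check, and data combination steps of the first aspect can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the 64-byte segment size and the function names `cut_and_protect` and `check_and_combine` are hypothetical, and `zlib.crc32` stands in for whatever CRC polynomial an actual wafer-to-wafer interface would use, which the application does not specify.

```python
import zlib

SEGMENT_BYTES = 64  # hypothetical cut size; the application does not fix one


def cut_and_protect(payload: bytes, segment_bytes: int = SEGMENT_BYTES) -> list[bytes]:
    """Cut the payload into segments and append a 4-byte CRC to each segment."""
    segments = []
    for offset in range(0, len(payload), segment_bytes):
        chunk = payload[offset:offset + segment_bytes]
        crc = zlib.crc32(chunk)  # stand-in CRC, an assumption of this sketch
        segments.append(chunk + crc.to_bytes(4, "big"))
    return segments


def check_and_combine(segments: list[bytes]) -> bytes:
    """Verify the CRC of each segment, strip it, and merge the chunks again."""
    chunks = []
    for index, segment in enumerate(segments):
        chunk = segment[:-4]
        received = int.from_bytes(segment[-4:], "big")
        if zlib.crc32(chunk) != received:
            raise ValueError(f"CRC mismatch in segment {index}")
        chunks.append(chunk)
    return b"".join(chunks)
```

On the sending side `cut_and_protect` would run before striping and encoding; on the receiving side `check_and_combine` would run after aggregation, so a corrupted segment is rejected before it reaches the interface cache.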
In a possible implementation manner of the first aspect of the present application, the encoding operation is based on a first encoding scheme, and the decoding operation is based on a first decoding scheme, the first encoding scheme corresponding to the first decoding scheme.
In a possible implementation manner of the first aspect of the present application, the first coding scheme is 64/67 coding, and the first decoding scheme is 64/67 decoding.
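As a hedged illustration of 64/67 coding, the sketch below wraps each 64-bit payload in a 67-bit word. The bit layout is an assumption modelled on Interlaken-style 64B/67B framing (two synchronization bits that distinguish data words from control words, plus one inversion bit used for DC balance); the application itself does not spell out the layout, the disparity rule here is simplified to the payload bits only, and the names `encode_64_67` and `decode_64_67` are hypothetical.

```python
MASK64 = (1 << 64) - 1
DATA_SYNC = 0b01  # assumed sync pattern for data words
CTRL_SYNC = 0b10  # assumed sync pattern for control words


def encode_64_67(payload: int, is_control: bool, running_disparity: int):
    """Return (67-bit word, updated running disparity) for one 64-bit payload."""
    payload &= MASK64
    ones = bin(payload).count("1")
    word_disparity = ones - (64 - ones)
    # Invert the payload when it would push the running disparity further
    # away from zero (simplified rule, payload bits only).
    invert = (word_disparity > 0 and running_disparity > 0) or (
        word_disparity < 0 and running_disparity < 0
    )
    if invert:
        payload ^= MASK64
        word_disparity = -word_disparity
    sync = CTRL_SYNC if is_control else DATA_SYNC
    word = (int(invert) << 66) | (sync << 64) | payload
    return word, running_disparity + word_disparity


def decode_64_67(word: int):
    """Return (64-bit payload, is_control) recovered from a 67-bit word."""
    invert = (word >> 66) & 1
    sync = (word >> 64) & 0b11
    if sync not in (DATA_SYNC, CTRL_SYNC):
        raise ValueError("invalid synchronization bits")
    payload = word & MASK64
    if invert:
        payload ^= MASK64
    return payload, sync == CTRL_SYNC
```

The three extra bits per word are the overhead that later implementations of the application reuse: the bit field for synchronization is the part that can be compressed to make room for a redundant error correction code.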
In a possible implementation manner of the first aspect of the present application, the first data to be sent is from a user data interface associated with the first wafer, and the second data to be received is sent to the user data interface.
In a possible implementation manner of the first aspect of the present application, the second data to be received is subjected to a rate adaptation process before being sent to the user data interface.
In a possible implementation form of the first aspect of the application, the second data to be sent is sent to a transport interface of a wafer-to-wafer interface of a second wafer relative to the first wafer.
In a possible implementation manner of the first aspect of the present application, adding, by the protocol processing unit, the cyclic redundancy calculation result to the cut data so as to perform assembling and striping distribution to obtain the distributed data includes: and adding the cyclic redundancy calculation result into the cut data through the protocol processing unit so as to assemble and stripe the data according to the burst length setting to obtain the distributed data.
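Assembly according to a burst length setting followed by striping distribution can be sketched as below, together with the matching aggregation on the receiving side. This is a minimal sketch that assumes round-robin distribution of words across lanes; the lane count, the burst length, and the names `assemble_bursts`, `stripe`, and `aggregate` are hypothetical and not taken from the application.

```python
def assemble_bursts(words: list, burst_len: int) -> list:
    """Group words into bursts of at most burst_len words each."""
    if burst_len <= 0:
        raise ValueError("burst_len must be positive")
    return [words[i:i + burst_len] for i in range(0, len(words), burst_len)]


def stripe(words: list, num_lanes: int) -> list:
    """Distribute words across lanes in round-robin order."""
    lanes = [[] for _ in range(num_lanes)]
    for index, word in enumerate(words):
        lanes[index % num_lanes].append(word)
    return lanes


def aggregate(lanes: list) -> list:
    """Rebuild the original word order from round-robin striped lanes."""
    words = []
    depth = max((len(lane) for lane in lanes), default=0)
    for row in range(depth):
        for lane in lanes:
            if row < len(lane):
                words.append(lane[row])
    return words
```

For example, ten words striped over four lanes give lanes of three, three, two, and two words, and `aggregate` restores the original order because it reads the lanes in the same round-robin sequence that `stripe` wrote them.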
In a possible implementation manner of the first aspect of the present application, the transmission interface is a serializer deserializer interface.
In a possible implementation manner of the first aspect of the present application, the protocol processing unit is an Interlaken protocol processing unit.
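When the protocol processing unit follows an Interlaken-style design, the scrambling and descrambling steps can be pictured as an additive (synchronous) scrambler: both ends run the same linear feedback shift register and XOR its output with the data. The sketch below assumes the polynomial x^58 + x^39 + 1, which is commonly associated with Interlaken and 64b/66b links; the application does not name a polynomial, and the seed handling and function names are hypothetical.

```python
MASK58 = (1 << 58) - 1
MASK64 = (1 << 64) - 1


def keystream64(state: int):
    """Advance a 58-bit LFSR (taps for x^58 + x^39 + 1) by 64 steps.

    Returns (64 keystream bits packed into an int, new LFSR state).
    """
    out = 0
    for _ in range(64):
        bit = ((state >> 57) ^ (state >> 38)) & 1
        state = ((state << 1) | bit) & MASK58
        out = (out << 1) | bit
    return out, state


def scramble(words: list, seed: int) -> list:
    """XOR each 64-bit word with the LFSR keystream started from seed."""
    state = seed & MASK58
    if state == 0:
        raise ValueError("seed must be non-zero, or the keystream is all zeros")
    result = []
    for word in words:
        ks, state = keystream64(state)
        result.append((word & MASK64) ^ ks)
    return result


# With an additive scrambler, descrambling is the same XOR with the same seed.
descramble = scramble
```

Because the keystream depends only on the seed and not on the data, the receiver must know the sender's scrambler state; a real link would exchange or synchronize that state through framing, which this sketch leaves out.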
In a possible implementation manner of the first aspect of the present application, the protocol processing unit performs rate adaptation processing on the first data to be sent in the interface cache at least before cutting the first data to be sent to obtain the cut data.
In a possible implementation manner of the first aspect of the present application, while performing the cyclic redundancy check calculation on the cut data to generate the cyclic redundancy calculation result, the protocol processing unit synchronously generates a control field that records description information of the cut data.
In a possible implementation manner of the first aspect of the present application, the plurality of wafers are homogeneous wafers or heterogeneous wafers.
In a possible implementation manner of the first aspect of the present application, the plurality of wafers correspond to functional blocks in a same system single wafer, and the plurality of wafers are packaged together through their respective wafer-to-wafer interfaces so as to form a system chipset corresponding to the system single wafer.
In a possible implementation of the first aspect of the present application, the plurality of wafers are packaged together by chiplet technology.
In a possible implementation manner of the first aspect of the present application, the encoding operation is based on a first encoding scheme, the encoded data includes a plurality of pre-compression data, and each of the plurality of pre-compression data has a size, in bits, equal to a first value that is determined by the first encoding scheme, where scrambling and framing the encoded data to obtain the second data to be sent includes: compressing and transcoding each of the plurality of pre-compression data to obtain a plurality of compressed data in one-to-one correspondence with the plurality of pre-compression data; performing a forward error correction calculation on the plurality of pre-compression data to generate a redundant error correction code; adding the redundant error correction code to the plurality of compressed data to update the encoded data; and scrambling and framing the updated encoded data to obtain the second data to be sent.
In a possible implementation manner of the first aspect of the present application, the first coding scheme is a 64/67 coding, and the first value is 67.
In a possible implementation manner of the first aspect of the present application, a size of the encoded data before updating and a size of the encoded data after updating are consistent, the plurality of compressed data corresponds to a data field in the plurality of pre-compression data, and the redundant error correction code generated by performing forward error correction calculation on the plurality of pre-compression data corresponds to a bit field for synchronization in the plurality of pre-compression data.
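The size-preserving property described above comes down to a bit budget: the bits freed by compressing the synchronization field are exactly the bits available for the redundant error correction code. The sketch below works through that arithmetic for 64/67 coding. The group of eight words and the one-bit compressed header per word are hypothetical values chosen for illustration; the application does not state how many words are grouped or how far the synchronization field is compressed.

```python
def transcoding_budget(num_words: int, word_bits: int = 67, data_bits: int = 64,
                       header_bits_per_word: int = 1) -> dict:
    """Compute how many parity bits fit when sync bits are compressed.

    header_bits_per_word is the assumed size of the compressed
    synchronization information kept for each word.
    """
    sync_bits = num_words * (word_bits - data_bits)
    compressed_header_bits = num_words * header_bits_per_word
    parity_bits = sync_bits - compressed_header_bits
    if parity_bits < 0:
        raise ValueError("compressed header is larger than the original sync field")
    return {
        "bits_before": num_words * word_bits,
        "bits_after": num_words * data_bits + compressed_header_bits + parity_bits,
        "sync_bits": sync_bits,
        "compressed_header_bits": compressed_header_bits,
        "parity_bits": parity_bits,
    }
```

Under these assumptions, eight 67-bit words occupy 536 bits before and after the update: 512 data bits are kept unchanged, the 24 synchronization bits shrink to 8, and the remaining 16 bits carry the redundant error correction code, so the link transmission bandwidth does not change.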
In a possible implementation manner of the first aspect of the present application, the decoding operation is based on a first decoding scheme, where the first encoding scheme corresponds to the first decoding scheme, and performing data aggregation processing on the decoded data to obtain aggregated data includes: decompressing and inversely transcoding the decoded data, then performing a forward error correction check, and then performing data aggregation processing to obtain the aggregated data.
In a possible implementation manner of the first aspect of the present application, performing compression transcoding on the plurality of pre-compression data to obtain the plurality of post-compression data corresponding to the plurality of pre-compression data one to one respectively includes: compressing bit fields for synchronization in the plurality of pre-compression data for transmission of the redundant error correction code, and maintaining data fields in the plurality of pre-compression data.
In a possible implementation manner of the first aspect of the present application, a link transmission bandwidth of the transmission interface associated with the plurality of data before compression is equal to a link transmission bandwidth of the transmission interface associated with the plurality of data after compression.
In a possible implementation manner of the first aspect of the present application, performing compression transcoding on the plurality of pre-compression data to obtain the plurality of compressed data corresponding to the plurality of pre-compression data one-to-one respectively is based on the first coding scheme, where the first coding scheme is based on a wafer-to-wafer interface interconnection protocol associated with the protocol processing unit.
In a second aspect, embodiments of the present application further provide a computer device, the computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing a method according to any one of the implementations of any one of the above aspects when the computer program is executed.
In a third aspect, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when run on a computer device, cause the computer device to perform a method according to any one of the implementations of any one of the above aspects.
In a fourth aspect, embodiments of the present application also provide a computer program product comprising instructions stored on a computer-readable storage medium, which when run on a computer device, cause the computer device to perform a method according to any one of the implementations of any one of the above aspects.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system chipset integrated by a plurality of wafers interconnected with each other according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a plurality of wafers interconnected by a wafer-to-wafer interface according to an embodiment of the present application;
FIG. 3 is a schematic illustration of a plurality of wafers interconnected by a wafer-to-wafer interface according to another embodiment of the present application;
FIG. 4 is a flow chart of a method for wafer-to-wafer interface interconnection according to an embodiment of the present application;
FIG. 5 is a flow chart of a data transmission process through a wafer-to-wafer interface and based on a first encoding scheme according to an embodiment of the present application;
FIG. 6 is a flow chart of a data receiving process through a wafer-to-wafer interface and based on a first decoding scheme according to an embodiment of the present application.
Detailed Description
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be understood that in the description of the application, "at least one" means one or more than one, and "a plurality" means two or more than two. In addition, the words "first," "second," and the like, unless otherwise indicated, are used solely for the purposes of description and are not to be construed as indicating or implying a relative importance or order.
Fig. 1 is a schematic diagram of a system chipset integrated by a plurality of wafers interconnected with each other according to an embodiment of the present application. As shown in fig. 1, system chipset 110 includes four wafers: wafer A 102, wafer B 104, wafer C 106, and wafer D 108. Each of these four wafers is a die. A die, also called a bare die, is a small integrated circuit body fabricated from semiconductor material and not yet packaged. A die can be understood as the chip before packaging: a small piece cut from a silicon wafer by laser, each die being a separate functional chip. A die is packaged as a unit to form a chip. An unpackaged die cannot be used directly because it has neither pins nor a heat sink. A system on chip (SOC) implements the functions of an entire system on a single chip. With the rapid growth of chip design scale, system functions become richer and more complex and the area of a single wafer grows rapidly, so implementing the functions of the whole system on a single wafer, that is, manufacturing a system single wafer, faces ever greater challenges, such as harder performance improvement, harder heat dissipation, increased leakage current, and slower growth of the chip clock frequency. The increase in wafer area and in chip design complexity also reduces manufacturing yield. To improve performance while controlling cost and power consumption, the functions of the whole system that were originally implemented on a system single wafer are split across a plurality of wafers, and a chipset or multi-chip module is constructed using packaging and interconnection technology. For example, with chiplet technology, the functional blocks originally integrated in a single system chip are split, manufactured separately, and finally integrated and packaged into a system chipset through packaging and interconnection technology.
With continued reference to fig. 1, the system chipset 110 may correspond to the functionality of an overall system. The functions implemented by the system chipset 110 could be implemented on a single wafer, that is, by a system single wafer; in fig. 1 they are instead split across four wafers, namely wafer A 102, wafer B 104, wafer C 106, and wafer D 108. These four wafers are interconnected with each other through wafer-to-wafer (die-to-die) interfaces to effect data interconnection. As can be seen in fig. 1, each of wafer A 102, wafer B 104, wafer C 106 and wafer D 108 has a wafer-to-wafer interface interconnection with each of the other three wafers (each such interconnection is shown with a double-headed arrow in fig. 1). The interconnection interface between wafers carries the data interconnection between them, that is, the wafer-to-wafer interface interconnection, and must offer high data bandwidth, low latency and high reliability so that a plurality of wafers can be packaged as a whole and serve chip applications in fields such as networking, hyperscale data centers and artificial intelligence. Wafer-to-wafer interface interconnection here means that one wafer is interconnected with another wafer so that they can be packaged together, each wafer comprising at least one module with a physical interface. A wafer with a common interface may communicate with another wafer over a short-range wire. Wafer-to-wafer interface interconnection can be divided into two cases: homogeneous wafers and heterogeneous wafers. The homogeneous case mainly concerns wafer splitting: two or more identical or homogeneous wafers are interconnected so that these smaller wafers together behave like a single wafer. For example, the four wafers in fig. 1, wafer A 102, wafer B 104, wafer C 106, and wafer D 108, may each be a central processing unit (CPU). The heterogeneous case mainly concerns integrating different functions into one uniformly packaged chipset. For example, wafer A 102 and wafer B 104 in fig. 1 may be central processing units, while wafer C 106 and wafer D 108 may be graphics processing units (GPU), neural-network processing units (NPU), tensor processing units (TPU), or data processing units (DPU). Different semiconductor fabrication processes and different integrated packaging approaches may therefore affect the details of wafer-to-wafer interface interconnection. For example, wafer-to-wafer interface interconnection may be used for data interconnection between the wafer of one central processing unit and the wafer of another central processing unit, as well as between the wafer of a central processing unit and the wafer of a neural-network processing unit. In addition, wafer-to-wafer interface interconnection typically employs specific protocols, rules, policies, and the like to specify the details of high-speed data transfer between wafers. For example, optimizations or requirements can be set in terms of flow control, scheduling functions, bandwidth, data channels, and so on, to better serve business scenarios such as networking, hyperscale data centers, and artificial intelligence. For example, wafer-to-wafer interface interconnection may employ protocols that support multi-channel transmission, with data transmission bandwidths ranging from 10 gigabits per second (Gbps) to 300 Gbps or higher. The wafer-to-wafer interface interconnection may also need to support a serializer/deserializer (SERDES). SERDES means that multiple low-speed parallel signals are converted into a high-speed serial signal at the transmitting end and transmitted differentially, and the high-speed serial signal is converted back into low-speed parallel signals at the receiving end. To support SERDES, a clock data recovery (CDR) circuit needs to be integrated at the receiving end; the CDR circuit recovers the clock signal from the data edge information and samples to recover the data signal. Because multiple wafers are packaged together, the wafer-to-wafer interface interconnections are also integrated within the chipset; for example, the data interconnections among wafer A 102, wafer B 104, wafer C 106, and wafer D 108 shown in fig. 1 are integrated within system chipset 110. This means that when a problem occurs in the data interconnection between wafers, such as a data transmission error or data loss, it is difficult to locate and correct. Moreover, as the required data transfer rate between wafers rises, ensuring the reliability and correctness of data transfer between wafers becomes more challenging. Therefore, the embodiments of the application provide a method, equipment and medium for interconnecting wafer-to-wafer interfaces, so that the wafer-to-wafer interface interconnection meets the requirements of high data transmission rate and high data transmission reliability.
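The parallel-to-serial conversion performed by a serializer/deserializer can be sketched in a few lines. This is a purely logical model under stated assumptions: it ignores differential signalling and clock data recovery, sends the most significant bit first (an arbitrary choice for this sketch), and uses the hypothetical names `serialize` and `deserialize`.

```python
def serialize(words: list, width: int) -> list:
    """Flatten parallel words of the given bit width into a serial bit list."""
    bits = []
    for word in words:
        for position in range(width - 1, -1, -1):  # most significant bit first
            bits.append((word >> position) & 1)
    return bits


def deserialize(bits: list, width: int) -> list:
    """Regroup a serial bit list into parallel words of the given bit width."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length is not a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        words.append(value)
    return words
```

In hardware the receiver has no shared word clock, which is why the CDR circuit and the framing steps are needed to find bit and word boundaries; the sketch assumes those boundaries are already known.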
Fig. 2 is a schematic diagram of a plurality of wafers interconnected by a wafer-to-wafer interface according to an embodiment of the present application. As shown in fig. 2, data interconnection is achieved between wafer E 210 and wafer F 220 via their respective wafer-to-wafer interfaces. In particular, wafer E 210 includes a wafer-to-wafer interface E 212, a peripheral component interconnect express interface E 214, a high bandwidth memory interface E 216, and an Ethernet interface E 218. Wafer F 220 includes a wafer-to-wafer interface F 222, a peripheral component interconnect express interface F 224, a high bandwidth memory interface F 226, and an Ethernet interface F 228. The wafer-to-wafer interface E 212 of wafer E 210 is physically connected to the wafer-to-wafer interface F 222 of wafer F 220, thereby enabling data interconnection between wafer E 210 and wafer F 220. The peripheral component interconnect express interface E 214 of wafer E 210 and the peripheral component interconnect express interface F 224 of wafer F 220 support the peripheral component interconnect express (PCIE) standard, for example for connecting PCIE standard devices, so that they can be used to connect servers and high performance computing centers. The high bandwidth memory interface E 216 of wafer E 210 and the high bandwidth memory interface F 226 of wafer F 220 are both used to connect to high bandwidth memory (HBM) for service scenarios such as network switching, network message forwarding, and graphics processors. The Ethernet interface E 218 of wafer E 210 and the Ethernet interface F 228 of wafer F 220 each provide Ethernet-related functions, such as access to an Ethernet card. Wafer E 210 and wafer F 220 shown in fig. 2 each include interfaces with several different functions, so that they can be used in various service scenarios, for example accessing a PCIE bus through the peripheral component interconnect express interface E 214 of wafer E 210 or accessing an HBM through the high bandwidth memory interface F 226 of wafer F 220. The physical connection between wafer-to-wafer interface E 212 and wafer-to-wafer interface F 222 provides the data interconnection between wafer E 210 and wafer F 220, so that the two wafers externally behave like one wafer, improving performance while controlling cost and power consumption. The wafer-to-wafer interface interconnection between wafer E 210 and wafer F 220, that is, the data interconnection between wafer-to-wafer interface E 212 and wafer-to-wafer interface F 222, needs to meet the data transmission requirements set by a specific protocol, rule, policy, or the like, and may also need to meet requirements in terms of flow control, scheduling functions, bandwidth, data channels, and so on.
Fig. 3 is a schematic diagram of a plurality of wafers interconnected by a wafer-to-wafer interface according to another embodiment of the present application. As shown in fig. 3, wafer G310 includes a first wafer-to-wafer interface G312, a second wafer-to-wafer interface G313, a high bandwidth memory interface G316, and an ethernet interface G318. Wafer H320 includes a first wafer-to-wafer interface H322, a second wafer-to-wafer interface H323, a high bandwidth memory interface H326, and an ethernet interface H328. The physical connection between the first wafer-to-wafer interface G312 of wafer G310 and the first wafer-to-wafer interface H322 of wafer H320 realizes data interconnection between the wafers, so that wafer G310 and wafer H320 externally behave like one wafer, thereby improving performance while controlling cost and power consumption. In addition, wafer G310 may be connected to another wafer via the second wafer-to-wafer interface G313, and wafer H320 may be connected to another wafer via the second wafer-to-wafer interface H323. The wafer-to-wafer interface interconnection between wafer G310 and wafer H320 needs to meet the data transmission requirements set by a specific protocol, rule, policy, etc., and may also need to meet requirements in terms of flow control, scheduling functions, bandwidth, data channels, etc.
Referring to fig. 1, 2 and 3, wafer-to-wafer interface interconnection means that one wafer is interconnected and packaged with another wafer, each wafer including at least one module with a physical interface. A wafer with a common interface may communicate with another wafer over a short-range wire. Depending on how multiple wafers are integrated and packaged into a chipset, and on the functional partitioning and interface arrangement of those wafers, a wafer may have one or more wafer-to-wafer interfaces, each of which may employ specific protocols, rules, policies, etc. to meet requirements in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, etc. These protocols, rules, policies, etc. may also dictate specific details of the wafer-to-wafer interface interconnection, such as the number of communication channels, control word structures, data encoding and data decoding, flow control options, etc. Therefore, it is necessary to provide a method for wafer-to-wafer interface interconnection that meets the requirements of high data transmission rate and high data transmission reliability and flexibly adapts to the requirements of various protocols, rules, policies, etc. in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, etc. This is described in detail below in conjunction with fig. 4.
Fig. 4 is a flow chart of a method for wafer-to-wafer interface interconnection according to an embodiment of the present application. The method shown in fig. 4 is applicable to a plurality of wafers, each of which includes a wafer-to-wafer interface for data interconnection between that wafer and another, opposite wafer of the plurality of wafers. The method is applied to a first wafer, which is any wafer of the plurality of wafers, and the wafer-to-wafer interface of the first wafer comprises an interface cache, a protocol processing unit and a transmission interface. As shown in fig. 4, the method includes the following steps.
Step S402: in response to data transmission by the first wafer, input first data to be sent to the interface cache.
Step S404: through the protocol processing unit, perform data cutting on the first data to be sent to obtain cut data, perform cyclic redundancy check calculation on the cut data to generate a cyclic redundancy calculation result, add the cyclic redundancy calculation result to the cut data and perform assembly and striping distribution to obtain distributed data, perform an encoding operation on the distributed data to obtain encoded data, and scramble and frame the encoded data to obtain second data to be sent.
Step S406: send the second data to be sent through the transmission interface.
Step S408: in response to data reception by the first wafer, acquire first data to be received through the transmission interface.
Step S410: through the protocol processing unit, perform frame delimiting and descrambling on the first data to be received, then perform a decoding operation to obtain decoded data, perform data aggregation processing on the decoded data to obtain aggregated data, and perform cyclic redundancy check and data combination on the aggregated data to obtain second data to be received.
Step S412: input the second data to be received to the interface cache.
The method of wafer-to-wafer interface interconnection shown in fig. 4 is applicable to a plurality of wafers, each of which includes a wafer-to-wafer interface for data interconnection between that wafer and another, opposite wafer of the plurality of wafers. The method is applied to a first wafer, which is any wafer of the plurality of wafers, and the wafer-to-wafer interface of the first wafer comprises an interface cache, a protocol processing unit and a transmission interface. The plurality of wafers corresponds to a plurality of functional blocks combined together into a single chip, and the interface between functional blocks is a wafer-to-wafer interface. The wafer-to-wafer interface of the first wafer provides data interconnection between the first wafer and another wafer. Referring to the above steps, in step S402, first data to be sent is input to the interface cache in response to the data transmission of the first wafer. The first wafer may receive data from a user side and input that data to the interface cache. In some embodiments, rate adaptation processing may be performed through the interface cache, which helps overall data transmission stability and reliability. Next, in step S404, the protocol processing unit performs data cutting on the first data to be sent to obtain cut data. The wafer-to-wafer interface of the first wafer may employ specific protocols, rules, policies, etc. to meet requirements in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, etc. These protocols, rules, policies, etc. may also dictate specific details of the wafer-to-wafer interface interconnection, such as the number of communication channels, control word structures, data encoding and data decoding, flow control options, etc. 
Here, the protocol processing unit mainly provides the processing functions necessary to meet the requirements of the protocols, rules, or policies adopted by the wafer-to-wafer interface of the first wafer in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, and the like. In step S404, the protocol processing unit performs data cutting on the first data to be sent to obtain cut data. That is, data cutting is performed at the protocol layer to accommodate the burst length setting and control word assembly requirements of the subsequent flow, where the data length of one transmission is called the burst length. Generally, data is cut in units of a fixed number of bits (e.g., 64 bits). Cyclic redundancy check calculation is then performed on the cut data to generate a cyclic redundancy calculation result, which is used for the cyclic redundancy check (CRC) in the subsequent flow. CRC uses a hash function that generates a short, fixed-length check code from data such as packets or files in order to detect errors that may occur during data transmission or storage. The check code may be calculated and appended to the data before transmission or storage so that the recipient can check whether the data has changed. In step S404, the cyclic redundancy calculation result is added to the cut data, and assembly and striping distribution are performed to obtain distributed data. Here, assembling refers to control word assembly on the basis of data cutting. In some embodiments, protocol control words are assembled with the data words representing the data to be sent, on the basis of the data cutting and in combination with the burst length setting. 
Striping means that the distributed data can be sent over multiple channels, for example with each channel corresponding to one SERDES physical link, depending on the requirements in terms of data channels and channelized data transmission. Thus, in response to the data transmission of the first wafer, the first data to be sent is input to the interface cache and then cut to obtain cut data, after which the cyclic redundancy calculation result is added to the cut data and assembly and striping distribution are performed to obtain distributed data. This completes the conversion from the first data to be sent to the distributed data, reflects the control details at the protocol layer, and can be used to meet requirements such as the number of communication channels and the control word structure. Next, still in step S404, the distributed data is encoded to obtain encoded data, and the encoded data is scrambled and framed to obtain second data to be sent. Performing the encoding operation means performing frame layer control word assembly, including adding a sync word as a meta frame sync header used to determine the meta frame position. It should be appreciated that the encoding operation is generally based on the particular protocol, rule, policy, etc. employed by the wafer-to-wafer interface of the first wafer; for example, the protocol may specify a particular encoding format into which the original data must be converted according to certain rules. In some embodiments, performing the encoding operation means encoding 64 bits of data into 67 bits of data. After the encoding operation produces the encoded data, the encoded data is scrambled and framed to obtain the second data to be sent, which helps to enhance data integrity and link stability. Finally, in step S406, the second data to be sent is sent through the transmission interface. 
In some embodiments, the transmission interface is connected to a plurality of SERDES channels, and the second data to be sent may be aligned and sent through the plurality of SERDES channels.
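As a rough illustration of the protocol-layer portion of step S404, the following Python sketch cuts a payload into 64-bit words, appends a check word, and stripes the words round-robin over several channels. It is a minimal sketch under stated assumptions, not the processing defined by any particular protocol: the function names, the zero padding of the last word, the use of CRC-32 from `zlib` as a stand-in check code, and the round-robin channel order are all chosen here for illustration only. Control word assembly, encoding, scrambling and framing are omitted.

```python
import zlib

WORD_BYTES = 8  # 64-bit cutting unit, as in the example given in the text


def cut_words(payload: bytes) -> list[bytes]:
    """Cut the payload into 64-bit words, zero-padding the last word (assumed)."""
    padded = payload + b"\x00" * (-len(payload) % WORD_BYTES)
    return [padded[i:i + WORD_BYTES] for i in range(0, len(padded), WORD_BYTES)]


def append_crc(words: list[bytes]) -> list[bytes]:
    """Append one extra word carrying a CRC-32 over all data words.

    CRC-32 is only a stand-in; a real protocol defines its own polynomial
    and where the check code sits in the control word.
    """
    crc = zlib.crc32(b"".join(words))
    return words + [crc.to_bytes(WORD_BYTES, "big")]


def stripe(words: list[bytes], channels: int) -> list[list[bytes]]:
    """Distribute words round-robin: word i goes to channel i % channels."""
    return [words[ch::channels] for ch in range(channels)]


tx_words = append_crc(cut_words(b"first data to be sent"))
distributed = stripe(tx_words, channels=4)
```

Here `distributed[k]` holds the words that a hypothetical channel `k` would carry; a different striping order or check code would only change the bodies of `stripe` and `append_crc`.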
With continued reference to fig. 4 and the above steps, in step S408, in response to the data reception of the first wafer, first data to be received is obtained through the transmission interface. Next, in step S410, the protocol processing unit performs frame delimiting and descrambling on the first data to be received, then performs a decoding operation to obtain decoded data, performs data aggregation processing on the decoded data to obtain aggregated data, and performs cyclic redundancy check and data combination on the aggregated data to obtain second data to be received. Thus, the first data to be received, for example data received through a SERDES interface, first undergoes frame delimiting and descrambling, that is, frame positioning and descrambling are completed. The decoding operation is then performed, the decoded data is aggregated, and the cyclic redundancy check and data combination are performed to recover the user data, which is finally written into the interface cache in step S412. The user data can then be sent to the user side interface after rate adaptation.
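The aggregation, check and combination part of step S410 can be sketched in the same spirit. The Python example below is only an illustration under assumptions that are not taken from the application: words are assumed to have been striped round-robin over the channels, the last word is assumed to carry a CRC-32 (a stand-in check code computed with `zlib`), and the names `aggregate` and `check_and_merge` are hypothetical. Frame delimiting, descrambling and decoding are assumed to have already happened.

```python
import zlib

WORD_BYTES = 8  # 64-bit words, matching the cutting unit in the text


def aggregate(channels: list[list[bytes]]) -> list[bytes]:
    """Undo round-robin striping: word 0 of every channel, then word 1, ..."""
    words = []
    depth = max(len(ch) for ch in channels)
    for i in range(depth):
        for ch in channels:
            if i < len(ch):
                words.append(ch[i])
    return words


def check_and_merge(words: list[bytes]) -> bytes:
    """Verify the trailing check word, then join the data words."""
    *data, crc_word = words
    payload = b"".join(data)
    if zlib.crc32(payload) != int.from_bytes(crc_word, "big"):
        raise ValueError("CRC mismatch: data changed on the link")
    return payload


# Build a small received data set the way an assumed sender would.
sent = bytes(range(40))
words = [sent[i:i + WORD_BYTES] for i in range(0, len(sent), WORD_BYTES)]
words.append(zlib.crc32(sent).to_bytes(WORD_BYTES, "big"))
received_channels = [words[ch::4] for ch in range(4)]

recovered = check_and_merge(aggregate(received_channels))
```

If any word is altered in transit, `check_and_merge` raises instead of returning data, which corresponds to the error detection role the text assigns to the cyclic redundancy check.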
Thus, the method for wafer-to-wafer interface interconnection shown in fig. 4 can meet the requirements of high data transmission rate and high data transmission reliability, and can flexibly adapt to the requirements of various protocols, rules, policies, etc. in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, etc. In this method, the encoding and decoding operations may be based on the specific protocol employed by the wafer-to-wafer interface of the first wafer, for example encoding and decoding according to the encoding formats and rules specified by that protocol. Typically, the interface between wafers requires a large data transmission bandwidth and also needs flow control and channelization features. In the service scenario of a network chip, to achieve high-speed digital communication, the interface between wafers may combine a specific communication protocol with a SERDES interface.
In one possible implementation, the encoding operation is based on a first encoding scheme and the decoding operation is based on a first decoding scheme, the first encoding scheme corresponding to the first decoding scheme. In some embodiments, the first encoding scheme is 64/67 encoding and the first decoding scheme is 64/67 decoding. In this way, data transmission and data reception at the wafer-to-wafer interface of the first wafer are achieved by mutually corresponding encoding and decoding schemes. 64/67 encoding means that the original data is converted into a specific encoding format according to certain rules, namely 64 bits of data are encoded into 67 bits of data. 64/67 decoding means that 67 bits of data are decoded into 64 bits of data.
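To make the 64-to-67-bit relationship concrete, here is a minimal Python sketch of a 64/67-style word encoder and decoder. The application only states that 64 bits of data are encoded into 67 bits; the 3-bit header layout used below (one inversion bit that is always left at 0, followed by two framing bits that distinguish data words from control words) is an assumption loosely modeled on Interlaken-style framing, and the names `encode_64_67` and `decode_64_67` are hypothetical. Running-disparity inversion and scrambling are deliberately left out.

```python
MASK64 = (1 << 64) - 1

# Assumed 3-bit headers: inversion bit (always 0 here) + two framing bits.
DATA_HDR = 0b001     # assumed framing bits "01" mark a data word
CONTROL_HDR = 0b010  # assumed framing bits "10" mark a control word


def encode_64_67(word: int, is_control: bool = False) -> int:
    """Prefix a 64-bit word with a 3-bit header, giving a 67-bit block."""
    if not 0 <= word <= MASK64:
        raise ValueError("word must fit in 64 bits")
    header = CONTROL_HDR if is_control else DATA_HDR
    return (header << 64) | word


def decode_64_67(block: int) -> tuple[int, bool]:
    """Strip the 3-bit header and return (64-bit word, is_control)."""
    header = block >> 64
    if header not in (DATA_HDR, CONTROL_HDR):
        raise ValueError("invalid framing bits")
    return block & MASK64, header == CONTROL_HDR


block = encode_64_67(0x0123456789ABCDEF)
word, is_control = decode_64_67(block)
```

The decoder rejects blocks whose framing bits are neither of the two assumed patterns, which is one simple way a receiver can notice loss of word alignment.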
In one possible implementation, the first data to be sent is from a user data interface associated with the first wafer, and the second data to be received is sent to the user data interface. In some embodiments, the second data to be received is rate adapted before being sent to the user data interface. Thus, the stability and reliability of the data transmission as a whole are facilitated by the rate adaptation process.
In one possible embodiment, the second data to be sent is sent to a transmission interface of a wafer-to-wafer interface of a second wafer opposite the first wafer. In this way, data interconnection between the first wafer and the second wafer is achieved.
In one possible implementation, adding the cyclic redundancy calculation result to the cut data and performing assembly and striping distribution to obtain the distributed data includes: adding, through the protocol processing unit, the cyclic redundancy calculation result to the cut data and performing assembly and striping distribution according to the burst length setting to obtain the distributed data. In this way, protocol control words are assembled with the data words representing the data to be sent, on the basis of the data cutting and in combination with the burst length setting. Striping means that the distributed data can be sent over multiple channels, for example with each channel corresponding to one SERDES physical link, depending on the requirements in terms of data channels and channelized data transmission. Taking the burst length setting, that is, the data length of one transmission, into account during assembly and striping distribution facilitates the subsequent transmission of data over multiple channels, such as multiple SERDES channels.
In one possible implementation, the transmission interface is a serializer/deserializer (SERDES) interface. In one possible implementation, the protocol processing unit is an Interlaken protocol processing unit. Typically, the interface between wafers requires a large data transmission bandwidth and also needs flow control and channelization features. In the service scenario of a network chip, to achieve high-speed digital communication, the interface between wafers may combine a specific communication protocol with a SERDES interface. As mentioned above, in the method shown in fig. 4, the encoding and decoding operations may be based on the specific protocol employed by the wafer-to-wafer interface of the first wafer, for example encoding and decoding according to the encoding formats and rules specified by that protocol. For example, the Interlaken protocol specifies 64/67 encoding as its encoding format. When the protocol processing unit is an Interlaken protocol processing unit, the wafer-to-wafer interface of the first wafer performs inter-wafer data interconnection based on the Interlaken protocol, so the original data must be converted into that encoding format according to certain rules, that is, 64 bits of data are encoded into 67 bits of data. It should be understood that the Interlaken protocol processing unit is optimized for the Interlaken protocol. The protocol processing unit may also be adapted to other inter-chip communication protocols, such as the XAUI protocol and the PCIE protocol. Depending on the protocol actually adopted and the configuration of the transmission interface, a corresponding encoding scheme and decoding scheme can be used, so that the requirements of various protocols, rules, policies, etc. in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, etc. can be flexibly accommodated.
In one possible implementation, the protocol processing unit performs rate adaptation processing on the first data to be sent in the interface cache at least before performing data cutting on the first data to be sent to obtain the cut data. The rate adaptation processing thus contributes to the stability and reliability of the data transmission as a whole.
In a possible implementation manner, the protocol processing unit synchronously generates a control field for recording description information of the cut data in the process of generating the cyclic redundancy calculation result by performing cyclic redundancy check calculation on the cut data. In this manner, the generated control fields may be used for data cutting and control word assembly. For example, the control field may record descriptive information of the cut data to embody protocol layer control details, such as a control word structure, etc.
In one possible implementation, the plurality of wafers are homogeneous wafers or heterogeneous wafers. In one possible implementation, the plurality of wafers correspond to functional blocks in the same system single wafer, and the plurality of wafers are packaged together through their respective wafer-to-wafer interfaces to form a system chipset corresponding to the system single wafer. In one possible implementation, the plurality of wafers are packaged together by a die technique. It should be appreciated that different semiconductor fabrication processes and different integrated packaging schemes may affect the details of wafer-to-wafer interface interconnection. For example, wafer-to-wafer interface interconnection may be used for data interconnection between homogeneous wafers, such as the wafer of one central processor and the wafer of another central processor, and also for data interconnection between heterogeneous wafers, such as the wafer of a central processor and the wafer of a neural network processor.
With continued reference to fig. 4, the method for wafer-to-wafer interface interconnection shown in fig. 4 can meet the requirements of high data transmission rate and high data transmission reliability, and can flexibly adapt to the requirements of various protocols, rules, policies, etc. in terms of data transmission, flow control, scheduling functions, bandwidth, data channels, etc. In this method, the encoding and decoding operations may be based on the specific protocol employed by the wafer-to-wafer interface of the first wafer, for example encoding and decoding according to the encoding formats and rules specified by that protocol. As higher demands are placed on data transmission bandwidth and data transmission rate, data transmission reliability and data error correction become more challenging. In addition, the specific encoding scheme and decoding scheme may change with the protocol, rule, policy, etc. adopted by the wafer-to-wafer interface. It is therefore necessary to provide a general data protection mechanism in the protocol layer processing part that can flexibly enhance data protection according to the requirements of the specific communication protocol and the specific encoding and decoding schemes, including enhanced error detection on link data and an error correction mechanism, so as to improve the reliability of high-speed data transmission over the inter-wafer interconnection interface and meet application requirements in fields such as networking, large-scale data centers, and artificial intelligence. These improvements are described in detail below.
In one possible implementation, the encoding operation is based on a first encoding scheme, the encoded data includes a plurality of pre-compression data each having a size of a first value in bits, the first value being based on the first encoding scheme, and scrambling and framing the encoded data to obtain the second data to be sent includes: performing compression transcoding on the plurality of pre-compression data respectively to obtain a plurality of compressed data in one-to-one correspondence with the plurality of pre-compression data, performing forward error correction calculation on the plurality of pre-compression data to generate a redundant error correction code, adding the redundant error correction code to the plurality of compressed data to update the encoded data, and scrambling and framing the updated encoded data to obtain the second data to be sent. Forward error correction (FEC) controls transmission errors in a communication system by transmitting additional redundant information together with the data, from which the receiver can recover from errors, thereby reducing the bit error rate. Specifically, with FEC the data to be sent is transmitted together with a certain amount of redundant error correction code, so that the receiver can perform error detection and error correction on the received data according to the error correction code. 
In this way, the plurality of pre-compression data included in the encoded data are respectively compression-transcoded to obtain a plurality of compressed data in one-to-one correspondence with the plurality of pre-compression data, forward error correction calculation is performed on the plurality of pre-compression data to generate a redundant error correction code, and the redundant error correction code is added to the plurality of compressed data to update the encoded data. Generating the redundant error correction code with FEC enhances data protection and reduces the bit error rate of data transmission over the inter-wafer interface. Compression transcoding reduces the data size, so the redundant error correction code can be added to the plurality of compressed data to update the encoded data, for example as a redundant data field at the tail of the data to be sent, thereby reducing the impact on the data transmission bandwidth. Finally, the updated encoded data is scrambled and framed to obtain the second data to be sent, which both enhances data protection and limits the impact on the data transmission bandwidth. It should be noted that the encoding operation is based on the first encoding scheme and that each of the plurality of pre-compression data has a size of the first value in bits, the first value being based on the first encoding scheme. 
The enhanced data protection mechanism described above may thus be adapted to the first coding scheme specified in the communication protocol, and thus may be able to flexibly enhance data protection according to the requirements of the specific communication protocol and the specific coding and decoding scheme. In some embodiments, the first encoding scheme is a 64/67 encoding and the first value is 67. That is, the size of the plurality of pre-compression data is 67 bits.
Further, in some embodiments, the size of the encoded data before updating is the same as the size of the encoded data after updating, the plurality of compressed data correspond to the data fields in the plurality of pre-compression data, and the redundant error correction code generated by performing forward error correction calculation on the plurality of pre-compression data corresponds to the bit fields used for synchronization in the plurality of pre-compression data. In this way, after the encoding operation based on the first encoding scheme produces the plurality of pre-compression data included in the encoded data, the plurality of pre-compression data are respectively compression-transcoded to obtain the plurality of compressed data in one-to-one correspondence with them, and the size of the encoded data is the same before and after updating. This means that when a redundant error correction code is generated using FEC and added to the plurality of compressed data to update the encoded data, the data size does not change; instead, compression transcoding frees up space that is used to carry the redundant data field of the FEC algorithm, that is, the FEC error correction code. Moreover, because the plurality of compressed data correspond to the data fields in the plurality of pre-compression data, the error correction code is stored in the space freed from the redundant portion while the data fields are kept unchanged, so that the size of the encoded data as a whole remains unchanged before and after updating. For example, assume that the plurality of pre-compression data are four pieces of data of 67 bits each (that is, the first value is 67), that these four pieces of data become 261 bits after compression, and that the error correction code occupies 7 bits. The size of the encoded data before updating is then four times 67 bits, i.e. 268 bits, and the size of the encoded data after updating is 261 bits plus the 7 bits occupied by the error correction code, and thus still 268 bits. An FEC error correction mechanism is thereby introduced without changing the link transmission bandwidth, and data protection is enhanced by adding the corresponding reverse transcoding and FEC decoding operations at the receiving end. In some embodiments, the FEC algorithm may be an RS (536, 522) algorithm, whose redundant error correction code is 140 bits and protects 5220 bits of data. That is, the transmitting end calculates a redundant error correction code (140 bits) from an input data block (5220 bits), appends the redundant error correction code to the end of the data block, and transmits them together. At the receiving end, after frame delimitation, the reverse FEC check calculation is performed on the FEC data block; if errors are found in the data field, they can be corrected according to the error correction code so as to recover the correct data. A data protection scheme employing this FEC algorithm can correct errors of up to 70 bits in the data field.
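The figures quoted above fit together arithmetically, which the short Python check below makes explicit. The 10-bit symbol size is not stated in the text; it is inferred here from 5220 data bits divided by 522 data symbols and should be read as an assumption, as should the variable names. The check also shows that one RS (536, 522) codeword would cover exactly 20 transcoded groups of 261 bits, and that the 7 spare bits of each of those groups add up to the 140 parity bits.

```python
BLOCK_BITS = 67          # size of each pre-compression data (first value)
BLOCKS_PER_GROUP = 4     # four pieces of data are transcoded together
TRANSCODED_BITS = 261    # size of the group after compression transcoding
N_SYMBOLS, K_SYMBOLS = 536, 522   # RS(536, 522)
SYMBOL_BITS = 10         # assumption: inferred from 5220 / 522

group_bits = BLOCK_BITS * BLOCKS_PER_GROUP             # 268
spare_bits_per_group = group_bits - TRANSCODED_BITS    # 7

data_bits = K_SYMBOLS * SYMBOL_BITS                    # 5220
parity_bits = (N_SYMBOLS - K_SYMBOLS) * SYMBOL_BITS    # 140

groups_per_codeword = data_bits // TRANSCODED_BITS     # 20
correctable_symbols = (N_SYMBOLS - K_SYMBOLS) // 2     # 7
max_correctable_bits = correctable_symbols * SYMBOL_BITS  # 70

assert data_bits % TRANSCODED_BITS == 0
assert groups_per_codeword * spare_bits_per_group == parity_bits
assert groups_per_codeword * group_bits == N_SYMBOLS * SYMBOL_BITS
```

Because a Reed-Solomon code corrects whole symbols, the 70-bit figure is reached when the erroneous bits fall within at most 7 symbols; errors spread over more symbols than that would exceed the correction capability under this reading.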
Further, in one possible implementation, the decoding operation is based on a first decoding scheme corresponding to the first encoding scheme, and performing data aggregation processing on the decoded data to obtain aggregated data includes: performing decompression reverse transcoding on the decoded data, then performing a forward error correction check, and then performing data aggregation processing to obtain the aggregated data. In this way, on the basis that the transmitting end uses FEC to enhance data protection, the corresponding reverse transcoding and FEC decoding operations are also performed at the receiving end (for example, the compressed synchronization header bits can be restored at the corresponding data block positions at the receiving end), so that a general data protection mechanism for data interconnection between wafers is established. First, FEC is used to generate a redundant error correction code, which is added to the plurality of compressed data to update the encoded data. Second, the redundant error correction code generated by the FEC algorithm is used to correct errors in the data field and improve data transmission reliability. Third, compression transcoding ensures that the size of the encoded data as a whole remains unchanged before and after updating, so that introducing the redundant error correction code does not increase the link transmission bandwidth. Because the redundant error correction code generated by performing forward error correction calculation on the plurality of pre-compression data corresponds to the bit fields used for synchronization in the plurality of pre-compression data, FEC protection is realized with the link transmission bandwidth unchanged. 
In addition, both the encoding and decoding operations may take the requirements of a particular communication protocol into account. For example, the Interlaken protocol specifies 64/67 encoding as the first encoding scheme, and the corresponding compression transcoding may, for example, compress four pieces of data of 67 bits each into 261 bits, thereby freeing 7 bits for the redundant error correction code. In other words, an appropriate FEC algorithm may be flexibly adopted according to the requirements of the specific communication protocol and the specific encoding and decoding schemes, so that the first encoding scheme specified by the communication protocol can be accommodated and data protection can be flexibly enhanced. In addition, by selecting the first decoding scheme corresponding to the first encoding scheme, the receiving end performs the reverse transcoding and FEC decoding operations that correspond to the FEC-based data protection applied at the transmitting end. A general data protection mechanism for data interconnection between wafers is thereby established that can flexibly enhance data protection according to the requirements of the specific communication protocol and the specific encoding and decoding schemes, strengthens error detection on link data, and provides an error correction mechanism. This further improves the reliability of high-speed data transmission over the inter-wafer interconnection interface and meets application requirements in fields such as networking, large-scale data centers, and artificial intelligence.
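One way to picture how four 67-bit blocks can shrink to 261 bits while every 64-bit data field is kept intact is sketched below in Python. The application does not disclose the transcoding format, so this layout is purely illustrative and rests on assumptions made here: each block is assumed to carry a 3-bit header whose inversion bit is 0 and whose two framing bits are either "01" (data) or "10" (control), so that each header can be reduced to a single type flag; the four flags plus one reserved zero bit form a 5-bit group header. The names `transcode` and `reverse_transcode` are hypothetical.

```python
MASK64 = (1 << 64) - 1
DATA_HDR = 0b001     # assumed header: inversion bit 0, framing bits "01"
CONTROL_HDR = 0b010  # assumed header: inversion bit 0, framing bits "10"


def transcode(blocks: list[int]) -> int:
    """Pack four 67-bit blocks into one 261-bit group (5-bit header + 256 data bits)."""
    if len(blocks) != 4:
        raise ValueError("expected exactly four blocks")
    flags = 0
    payload = 0
    for block in blocks:
        header = block >> 64
        if header not in (DATA_HDR, CONTROL_HDR):
            raise ValueError("unsupported header in this sketch")
        flags = (flags << 1) | int(header == CONTROL_HDR)  # one type flag per block
        payload = (payload << 64) | (block & MASK64)        # data field kept as-is
    # Bits 259..256 hold the flags; bit 260 is a reserved zero in this sketch.
    return (flags << 256) | payload


def reverse_transcode(group: int) -> list[int]:
    """Rebuild the four 67-bit blocks, restoring each 3-bit header."""
    flags = (group >> 256) & 0xF
    blocks = []
    for i in range(4):
        is_control = (flags >> (3 - i)) & 1
        word = (group >> (64 * (3 - i))) & MASK64
        header = CONTROL_HDR if is_control else DATA_HDR
        blocks.append((header << 64) | word)
    return blocks


blocks = [(DATA_HDR << 64) | 1, (CONTROL_HDR << 64) | 2,
          (DATA_HDR << 64) | MASK64, (DATA_HDR << 64) | 0]
group = transcode(blocks)
restored = reverse_transcode(group)
```

Under these assumptions the 12 header bits of the four blocks collapse into 5, and the 7 bits saved per group are what the text describes as the room made available for the redundant error correction code.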
Further, in one possible implementation, performing compression transcoding on the plurality of pre-compression data to obtain the plurality of compressed data in one-to-one correspondence with the plurality of pre-compression data includes: compressing the bit fields for synchronization in the plurality of pre-compression data so that the freed bits can carry the redundant error correction code, and maintaining the data fields in the plurality of pre-compression data. In some embodiments, the link transmission bandwidth of the transmission interface associated with the plurality of pre-compression data is equal to the link transmission bandwidth of the transmission interface associated with the plurality of compressed data. In some embodiments, the compression transcoding of the plurality of pre-compression data is based on the first encoding scheme, and the first encoding scheme is based on a wafer-to-wafer interface interconnection protocol associated with the protocol processing unit. In some embodiments, the first decoding scheme is also based on the wafer-to-wafer interface interconnection protocol associated with the protocol processing unit. In this way, an appropriate FEC algorithm can be flexibly adopted to fit the first coding scheme specified by the communication protocol, so that data protection can be enhanced according to the requirements of the specific communication protocol and the specific codec scheme.
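As a toy model of compressing the synchronization bit fields while keeping the data fields, the sketch below packs four blocks into 261 bits and restores them. The compressed header layout (one type bit per block plus one reserved bit) and the sync values `0b01` and `0b10` are assumptions made for illustration; the third framing bit of a 64/67 block is not modeled, and the actual transcoding rule is defined by the application and the protocol, not by this sketch.

```python
GROUP = 4                                    # blocks per transcoded group
PAYLOAD_BITS = 64                            # data field of one block
COMPRESSED_BITS = 261                        # size of one group after transcoding
FREED_BITS = GROUP * 67 - COMPRESSED_BITS    # bits left for the redundant code
SYNC_DATA, SYNC_CTRL = 0b01, 0b10            # assumed 2-bit sync header values
PAYLOAD_MASK = (1 << PAYLOAD_BITS) - 1


def compress_group(blocks):
    """Pack four (sync, payload) blocks into one integer of at most 261 bits."""
    assert len(blocks) == GROUP
    type_bits = 0
    payloads = 0
    for sync, payload in blocks:
        assert sync in (SYNC_DATA, SYNC_CTRL)
        assert 0 <= payload <= PAYLOAD_MASK
        type_bits = (type_bits << 1) | (1 if sync == SYNC_CTRL else 0)
        payloads = (payloads << PAYLOAD_BITS) | payload  # data field kept intact
    header = type_bits << 1  # hypothetical 5-bit header: 4 type bits + 1 reserved
    return (header << (GROUP * PAYLOAD_BITS)) | payloads


def decompress_group(word):
    """Reverse transcoding: rebuild the four (sync, payload) blocks."""
    type_bits = (word >> (GROUP * PAYLOAD_BITS)) >> 1
    blocks = []
    for i in range(GROUP):
        shift = GROUP - 1 - i
        is_ctrl = (type_bits >> shift) & 1
        payload = (word >> (PAYLOAD_BITS * shift)) & PAYLOAD_MASK
        blocks.append((SYNC_CTRL if is_ctrl else SYNC_DATA, payload))
    return blocks


# Usage: the round trip restores every block and leaves 7 bits for the FEC code.
group = [(SYNC_DATA, 0x0123456789ABCDEF), (SYNC_CTRL, 0xFFFFFFFFFFFFFFFF),
         (SYNC_DATA, 0), (SYNC_DATA, 0xDEADBEEF)]
packed = compress_group(group)
restored = decompress_group(packed)
```

Because the packed group plus the freed bits still totals 268 bits, the number of bits sent per group is the same before and after the update, which is the bandwidth property described above.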
In addition, by selecting the first decoding scheme corresponding to the first coding scheme, the receiving end performs the reverse transcoding and FEC decoding operations that match the FEC protection applied at the transmitting end. This establishes a general data protection mechanism for data interconnection between wafers: data protection can be enhanced according to the requirements of the specific communication protocol and the specific codec scheme, error detection of link data is strengthened, an error correction mechanism is provided, the reliability of high-speed data transmission over the wafer-to-wafer interconnection interface is further improved, and application requirements in fields such as networks, large-scale data centers, and artificial intelligence are met.
Fig. 5 is a flow chart of a data transmission process through a wafer-to-wafer interface and based on a first coding scheme according to an embodiment of the present application. As shown in fig. 5, the data transmission process through the wafer-to-wafer interface and based on the first coding scheme consists of the following steps.
Step S502: data caching.
Step S504: data cutting.
Step S506: cyclic redundancy check calculation and control word generation.
Step S508: assembly and striping distribution.
Step S510: encoding based on the first coding scheme.
Step S512: compression transcoding based on the first coding scheme.
Step S514: forward error correction calculation.
Step S516: scrambling and framing.
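The first stages of the steps above (data cutting, cyclic redundancy check calculation, and striping distribution) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the burst length, and the lane count are hypothetical, CRC-32 from Python's `zlib` stands in for whatever CRC the protocol actually specifies, and steps S510 to S516 are left out.

```python
import zlib


def cut(data: bytes, burst_len: int) -> list[bytes]:
    """S504: cut the cached data into segments of at most burst_len bytes."""
    return [data[i:i + burst_len] for i in range(0, len(data), burst_len)]


def add_crc(segment: bytes) -> bytes:
    """S506: append a checksum to the segment (CRC-32 used as a stand-in)."""
    return segment + zlib.crc32(segment).to_bytes(4, "big")


def stripe(words: list[bytes], lanes: int) -> list[list[bytes]]:
    """S508: distribute the assembled words round-robin across the lanes."""
    out = [[] for _ in range(lanes)]
    for index, word in enumerate(words):
        out[index % lanes].append(word)
    return out


def transmit_front_end(data: bytes, burst_len: int = 8, lanes: int = 4):
    """Run S504 to S508 in order; later stages would consume the result."""
    segments = cut(data, burst_len)
    protected = [add_crc(segment) for segment in segments]
    return stripe(protected, lanes)


# Usage: 20 bytes cut into bursts of 8 and striped over 2 lanes.
lane_data = transmit_front_end(b"0123456789abcdef0123", burst_len=8, lanes=2)
```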
The data transmission process through the wafer-to-wafer interface and based on the first coding scheme shown in fig. 5 may refer to the method of wafer-to-wafer interface interconnection shown in fig. 4. It therefore meets the requirements of a high data transmission rate and high data transmission reliability, and flexibly adapts to the requirements of various protocols, rules, and strategies with respect to data transmission, flow control, scheduling functions, bandwidth, data channels, and the like. Further, in the data transmission process shown in fig. 5, compression transcoding is performed based on the first coding scheme in step S512 and the FEC calculation is performed in step S514, which introduces a general FEC-based data protection mechanism into the protocol layer processing part. Data protection can thus be enhanced according to the requirements of a specific communication protocol and a specific codec scheme, including strengthening error detection of link data and providing an error correction mechanism, which further improves the reliability of high-speed data transmission over the wafer-to-wafer interconnection interface and meets application requirements in fields such as networks, large-scale data centers, and artificial intelligence.
Fig. 6 is a flow chart of a data receiving process through a wafer-to-wafer interface and based on a first decoding scheme according to an embodiment of the present application. As shown in fig. 6, the data reception process through the wafer-to-wafer interface and based on the first decoding scheme consists of the following steps.
Step S602: framing and descrambling.
Step S604: decoding based on the first decoding scheme.
Step S606: decompression and reverse transcoding based on the first decoding scheme.
Step S608: forward error correction check.
Step S610: data aggregation processing.
Step S612: cyclic redundancy check.
Step S614: data combination.
Step S616: data caching.
The data reception process through the wafer-to-wafer interface and based on the first decoding scheme shown in fig. 6 may refer to the method of wafer-to-wafer interface interconnection shown in fig. 4, and embodies the corresponding operations performed at the receiving end. Further, in the data reception process shown in fig. 6, decompression and reverse transcoding are performed based on the first decoding scheme in step S606 and the forward error correction check is performed in step S608. Because the transmitting end uses FEC in the protocol layer processing part to strengthen data protection, the receiving end performs the corresponding reverse transcoding and FEC decoding operations, which establishes a general data protection mechanism for data interconnection between wafers.
Referring to fig. 5 and 6, in the data transmission process shown in fig. 5, compression transcoding is performed based on the first coding scheme in step S512 and the forward error correction calculation is performed in step S514; in the data reception process shown in fig. 6, decompression and reverse transcoding are performed based on the first decoding scheme in step S606 and the forward error correction check is performed in step S608. In this way, an appropriate FEC algorithm can be flexibly adopted to fit the first coding scheme specified by the communication protocol, so that data protection can be enhanced according to the requirements of the specific communication protocol and the specific codec scheme. By selecting the first decoding scheme corresponding to the first coding scheme, the receiving end performs the reverse transcoding and FEC decoding operations that match the FEC protection applied at the transmitting end. This establishes a general data protection mechanism for data interconnection between wafers that strengthens error detection of link data, provides an error correction mechanism, further improves the reliability of high-speed data transmission over the wafer-to-wafer interconnection interface, and meets application requirements in fields such as networks, large-scale data centers, and artificial intelligence.
The method and the device provided by the embodiments of the present application are based on the same inventive concept. Because the method and the device solve the problem on similar principles, their embodiments, implementations, and examples may be referred to one another, and repeated descriptions are omitted. Embodiments of the present application also provide a system comprising a plurality of computing devices, each of which may be structured as described above. For the functions or operations that the system may implement, refer to the specific implementation steps in the above method embodiments and/or the specific functions described in the above apparatus embodiments, which are not repeated here.
Embodiments of the present application also provide a computer-readable storage medium having stored therein computer instructions which, when executed on a computer device (e.g., by one or more processors), implement the method steps of the method embodiments described above. For the specific implementation of the above method steps when the processor executes the computer instructions stored in the computer-readable storage medium, refer to the specific operations described in the above method embodiments and/or the specific functions described in the above apparatus embodiments, which are not repeated here.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. The application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Embodiments of the application may be implemented, in whole or in part, in software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. The computer program product includes one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless means (e.g., infrared, radio, microwave). A computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that contains one or more collections of available media. Usable media may be magnetic media (e.g., floppy disks, hard disks, tape), optical media, or semiconductor media.
The semiconductor medium may be a solid state disk, or may be a random access memory, flash memory, read only memory, erasable programmable read only memory, electrically erasable programmable read only memory, register, or any other form of suitable storage medium.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. Each flow and/or block of the flowchart and/or block diagrams, and combinations of flows and/or blocks in the flowchart and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments. It will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present application without departing from the spirit or scope of the embodiments of the application. The steps in the method of the embodiments of the application can be reordered, combined, or deleted according to actual needs; the modules in the system of the embodiments of the application can be divided, combined, or deleted according to actual needs. The present application is also intended to include such modifications and variations if they come within the scope of the claims and the equivalents thereof.

Claims (16)

1. A method of wafer-to-wafer interface interconnection, wherein each wafer of a plurality of wafers includes a wafer-to-wafer interface for data interconnection between that wafer and another wafer of the plurality of wafers, the method being applied to a first wafer that is any wafer of the plurality of wafers, the wafer-to-wafer interface of the first wafer including an interface cache, a protocol processing unit, and a transmission interface, the method comprising:
in response to data transmission by the first wafer, inputting first data to be transmitted to the interface cache;
performing data cutting on the first data to be transmitted through the protocol processing unit to obtain cut data, performing cyclic redundancy check calculation on the cut data to generate a cyclic redundancy calculation result, adding the cyclic redundancy calculation result into the cut data to perform assembly and striping distribution to obtain distributed data, performing coding operation on the distributed data to obtain coded data, and performing scrambling and framing on the coded data to obtain second data to be transmitted;
transmitting the second data to be transmitted through the transmission interface,
in response to data reception by the first wafer, acquiring first data to be received through the transmission interface;
performing, by the protocol processing unit, framing and descrambling on the first data to be received and then a decoding operation to obtain decoded data, performing data aggregation on the decoded data to obtain aggregated data, and performing cyclic redundancy check and data combination on the aggregated data to obtain second data to be received;
inputting the second data to be received to the interface cache,
The encoding operation is based on a first encoding scheme, the encoded data includes a plurality of pre-compression data, the sizes of the plurality of pre-compression data are all bits of a first value, the first value is based on the first encoding scheme, wherein scrambling and framing the encoded data to obtain the second data to be transmitted includes: performing compression transcoding on the plurality of pre-compression data to obtain a plurality of compressed data corresponding to the plurality of pre-compression data one by one, performing forward error correction calculation on the plurality of pre-compression data to generate a redundant error correction code, adding the redundant error correction code into the plurality of compressed data to update the encoded data, performing scrambling and framing on the updated encoded data to obtain the second data to be transmitted,
the size of the encoded data before updating is consistent with the size of the encoded data after updating, the plurality of compressed data corresponds to a data field in the plurality of pre-compressed data, the redundant error correction code generated by performing forward error correction calculation on the plurality of pre-compressed data corresponds to a bit field for synchronization in the plurality of pre-compressed data,
the decoding operation is based on a first decoding scheme, the first encoding scheme corresponds to the first decoding scheme, wherein performing data aggregation on the decoded data to obtain the aggregated data comprises: decompressing and reverse transcoding the decoded data, then performing a forward error correction check, and then performing data aggregation to obtain the aggregated data,
performing compression transcoding on the plurality of pre-compression data to obtain the plurality of compressed data corresponding to the plurality of pre-compression data one to one, wherein the method comprises the following steps: compressing a bit field for synchronization in the plurality of pre-compression data for transmitting the redundant error correction code, and maintaining a data field in the plurality of pre-compression data,
the link transmission bandwidth of the transmission interface associated with the plurality of pre-compressed data is equal to the link transmission bandwidth of the transmission interface associated with the plurality of post-compressed data,
the compressing and transcoding the plurality of pre-compression data to obtain the plurality of compressed data corresponding to the plurality of pre-compression data one to one is based on the first coding scheme, wherein the first coding scheme is based on a wafer-to-wafer interface interconnection protocol associated with the protocol processing unit.
2. The method of claim 1, wherein the first encoding scheme is a 64/67 encoding and the first decoding scheme is a 64/67 decoding.
3. The method of claim 1, wherein the first data to be transmitted comes from a user data interface associated with the first wafer, and the second data to be received is sent to the user data interface.
4. A method according to claim 3, characterized in that the second data to be received is rate-adapted before being sent to the user data interface.
5. The method of claim 1, wherein the second data to be transmitted is sent to the transmission interface of the wafer-to-wafer interface of a second wafer relative to the first wafer.
6. The method according to claim 1, wherein adding, by the protocol processing unit, the cyclic redundancy calculation result to the cut data to perform assembling and striping distribution to obtain the distributed data, comprises: and adding the cyclic redundancy calculation result into the cut data through the protocol processing unit so as to assemble and stripe the data according to the burst length setting to obtain the distributed data.
7. The method of claim 1, wherein the transmission interface is a serializer/deserializer (SerDes) interface.
8. The method of claim 1, wherein the protocol processing unit is an Interlaken protocol processing unit.
9. The method according to claim 1, wherein the protocol processing unit performs rate adaptation processing on the first data to be transmitted in the interface cache at least before data cutting is performed on the first data to be transmitted to obtain the cut data.
10. The method according to claim 1, wherein a control field is synchronously generated by the protocol processing unit in the process of generating the cyclic redundancy calculation result by performing cyclic redundancy check calculation on the cut data for recording the description information of the cut data.
11. The method of claim 1, wherein the plurality of wafers are homogeneous wafers or heterogeneous wafers.
12. The method of claim 1, wherein the plurality of wafers correspond to functional blocks in a same system single wafer, the plurality of wafers being packaged together through respective wafer-to-wafer interfaces to form a system chipset corresponding to the system single wafer.
13. The method of claim 1, wherein the plurality of wafers are packaged together by a die technique.
14. The method of claim 1, wherein the first encoding scheme is a 64/67 encoding and the first value is 67.
15. A computer device, characterized in that it comprises a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the method according to any of claims 1 to 14 when executing the computer program.
16. A computer readable storage medium storing computer instructions which, when run on a computer device, cause the computer device to perform the method of any one of claims 1 to 14.
CN202310537595.3A 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces Active CN116303191B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310537595.3A CN116303191B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces
CN202310976442.9A CN116881188B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310537595.3A CN116303191B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202310976442.9A Division CN116881188B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces

Publications (2)

Publication Number Publication Date
CN116303191A CN116303191A (en) 2023-06-23
CN116303191B true CN116303191B (en) 2023-09-15

Family

ID=86803433

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202310537595.3A Active CN116303191B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces
CN202310976442.9A Active CN116881188B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202310976442.9A Active CN116881188B (en) 2023-05-15 2023-05-15 Method, equipment and medium for interconnecting wafer-to-wafer interfaces

Country Status (1)

Country Link
CN (2) CN116303191B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117135133B (en) * 2023-10-20 2023-12-29 中诚华隆计算机技术有限公司 Network interconnection method in chip

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1700313A (en) * 2004-05-21 2005-11-23 凌阳科技股份有限公司 Method of expandable wafer connection pin position, wafer and memory wafer
US9124383B1 (en) * 2010-09-23 2015-09-01 Ciena Corporation High capacity fiber-optic integrated transmission and switching systems
CN108111930A (en) * 2017-12-15 2018-06-01 中国人民解放军国防科技大学 Multi-bare-chip high-order optical switching structure based on high-density memory
CN111710662A (en) * 2020-07-01 2020-09-25 无锡中微亿芯有限公司 Universal multi-die silicon stacking interconnection structure
CN115312475A (en) * 2021-05-06 2022-11-08 美光科技公司 Encapsulation warpage reduction for semiconductor die assemblies and associated methods and systems

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105024948A (en) * 2014-04-30 2015-11-04 深圳市中兴微电子技术有限公司 Data transmission method, apparatus and system based on chip
CN117749323A (en) * 2021-01-25 2024-03-22 华为技术有限公司 Method, device, equipment, system and readable storage medium for data transmission
US20220320046A1 (en) * 2021-03-31 2022-10-06 Taiwan Semiconductor Manufacturing Company Limited Semiconductor package including semiconductor dies having different lattice directions and method of forming the same

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
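Design of an FPGA-based GFP (Generic Framing Procedure) protocol system; Wang Xiucui et al.; Network and Information Security, Information Technology and Informatization; pp. 197-200 *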

Also Published As

Publication number Publication date
CN116881188B (en) 2024-01-09
CN116303191A (en) 2023-06-23
CN116881188A (en) 2023-10-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant