TWI744113B - Interface device and interface method for 3d semiconductor device

Info

Publication number: TWI744113B
Application number: TW109141468A
Authority: TW
Inventors: 毅格艾爾卡諾維奇; 阿姆農帕納斯; 喻珮; 葉力墾; 方勇勝; 林聖偉; 黃智強; 譚競豪; 陳卿芳
Original assignee: 創意電子股份有限公司; 台灣積體電路製造股份有限公司
Priority date: 2020-09-30
Filing date: 2020-11-26
Publication date: 2021-10-21
Also published as: TW202215244A; CN114328328A; CN114328328B

Abstract

An interface device and an interface method for interfacing between a master device and a slave device are provided. The master device generates a command and the slave device generates data according to the command. The interface device includes a master interface and a slave interface. The master interface is coupled to the master device and configured to send the command to the slave device and/or receive the data from the slave device. The slave interface is coupled to the slave device and configured to receive the command from the master device and/or send the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or a plurality of bonds and/or TSVs.

Description

Interface device and interface method for three-dimensional semiconductor device

The present disclosure relates to a technology for a three-dimensional (3D) semiconductor device, and more specifically, to an interface device and an interface method for a 3D semiconductor device.

In recent years, electronic devices such as personal computers (PCs) and smartphones have advanced in terms of packaging, so that the devices have become compact and the production cost can be reduced accordingly. One of the key factors in this development is 3D semiconductor technology. Various semiconductor devices, including a central processing unit (CPU) and a memory, can be integrated into a single chip by vertically interconnecting the CPU and the memory. This structure is generally called a 3D integrated circuit (3D IC). On the other hand, in order to maintain reliable data transfer/communication, the interconnection between one CPU/memory and other CPUs/memories needs to be regulated by an interface device. However, interface devices for 3D integrated circuits are still under development.

The invention provides an interface device and an interface method for a 3D semiconductor device. The interface device and the interface method provide reliable data communication between a master device and a slave device.

In an embodiment, the present invention provides an interface device for interfacing between a master device and a slave device. The master device generates a command and the slave device generates data according to the command. The interface device includes a master interface and a slave interface. The master interface is coupled to the master device. The master interface is configured to send the command to the slave device and/or receive the data from the slave device. The slave interface is coupled to the slave device. The slave interface is configured to receive the command from the master device and/or send the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds. The clock used to drive the slave interface is trained by changing the clock phase of the clock so that it is aligned with the data cluster of the command and/or the data cluster of the data.

In an embodiment, the present invention provides an interface method for interfacing between a master device and a slave device. The master device generates a command and the slave device generates data according to the command. The interface method includes: sending, by a master interface, the command to the slave device and/or receiving the data from the slave device; and receiving, by a slave interface, the command from the master device and/or sending the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds. The clock used to drive the slave interface is trained by changing the clock phase of the clock so that it is aligned with the data cluster of the command and/or the data cluster of the data.

In order to make the above easier to understand, several embodiments accompanied by drawings are described in detail below.

The following disclosure provides many different embodiments or examples for implementing different features of the present disclosure. Specific examples of elements and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, forming a first feature over or on a second feature in the description that follows may include embodiments in which the first feature and the second feature are formed in direct contact, and may also include embodiments in which additional features may be formed between the first feature and the second feature, such that the first feature and the second feature may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.

Further, for ease of description, spatially relative terms such as "beneath", "below", "lower", "above", and "upper" may be used herein to describe the relationship of one element or feature to another element or feature as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein may likewise be interpreted accordingly.

The present disclosure discloses an interface device and an interface method for a 3D semiconductor device. The interface device provides reliable data communication between a master device and slave devices. The reliable data communication is achieved by allocating the data latency provided by the master device to each slave device according to the clock generated by the clock generator. Each slave device has a local clock generated from the clock of the clock generator. Each slave device can adjust its local clock, so that data contention between slave devices can be avoided. In addition, by avoiding data contention between the slave devices, bit errors can be minimized or avoided, so that error correction modules and methods are not needed. Therefore, the data communication speed can be increased.

In addition, when the electronic device is started, each slave device can train its local clock by sending built-in-self-test (BIST) data to the master device. By generating the local clock precisely, each slave device can provide accurate data with a low or zero error rate. As a result, no error correction is needed and the data communication speed can be increased accordingly. To show how data contention between the slave devices is avoided and how the local clock of each slave device is trained, the implementation of the interface device and the interface method (in particular, the implementation of the slave-to-master interface) is detailed in the embodiments provided below.
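The start-up training described above can be pictured as a sweep over candidate clock phases, checking the BIST data at each step. The following is only a minimal sketch under stated assumptions: the disclosure does not specify the search procedure, so the centre-of-window selection, the function name `train_clock_phase`, and the callback `bist_passes` (which stands in for "send BIST data at this phase and compare at the master") are all hypothetical.

```python
from typing import Callable, Optional


def train_clock_phase(bist_passes: Callable[[int], bool],
                      num_phases: int) -> Optional[int]:
    """Return the phase step at the centre of the longest passing run.

    bist_passes(phase) is a hypothetical hook: it should report whether
    the BIST pattern was received without error at that phase step.
    Returns None if no phase step passes.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for phase in range(num_phases):
        if bist_passes(phase):
            if run_len == 0:
                run_start = phase  # a new passing run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # a failing step ends the current run
    if best_len == 0:
        return None
    # Pick the middle of the widest window to maximise timing margin.
    return best_start + (best_len - 1) // 2
```

Choosing the middle of the passing window is a common way to leave margin on both sides of the sampling point; other selection rules would fit the description equally well.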

FIG. 1 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure. The semiconductor device 100 is implemented in a 3D package such as chip-on-wafer-on-substrate (CoWoS), system-on-integrated-chip (SoIC), wafer-on-wafer (WoW), or other 3D package integration.

Referring to FIG. 1, the semiconductor device 100 includes a master chip 120, a slave chip 130, and a clock generator 115. The master chip 120 is coupled to the slave chip 130 through through-silicon vias (TSVs) 104. The master chip 120 includes a master device 105 and a master interface 102 coupled to the master device 105. The slave chip 130, in turn, includes a slave device 110 and a slave interface 103 coupled to the slave device 110. The master device 105 is coupled to the slave device 110 via the master interface 102 and the slave interface 103. The master interface 102 and the slave interface 103 are coupled via the TSVs 104 and are integrated together as an interface device 101. The interface device 101 is adapted to connect the master device 105 and the slave device 110 vertically, which forms a 3D semiconductor device. The structure of the interface device 101 is referred to as Glink-3D. In addition, the clock generator 115 generates a clock for driving the master device 105, the master interface 102, the slave interface 103, and the slave device 110. The clock generated by the clock generator 115 is used for the master interface 102 and the slave interface 103 in both the forward and reverse directions.

In an embodiment, the master device 105 and the slave device 110 are implemented as, for example, a processor and a memory (i.e., a static random access memory (SRAM)), respectively. The clock generator 115 is implemented by, for example, an oscillator. The connection between the master interface 102 and the slave interface 103 is implemented by TSVs with a parallel bus, which is used to transfer data at a sampling rate of up to 5.0 Gbps, or at a double data rate (DDR) of 2.5 GHz. Parallel buses are also used for the coupling between the slave device 110 and the slave interface 103 and between the master device 105 and the master interface 102. In an embodiment, the latency between the master device 105 and the slave device 110 is set to 1 ns to 2 ns. The data transfer between the master device 105 and the slave device 110 has a low bit error rate or no bit errors (no BER).
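The two rate figures above are consistent if the 2.5 GHz figure is read as a clock that transfers one bit per lane on each edge. The short sketch below works through that arithmetic; the one-bit-per-edge reading and the helper names are assumptions made for illustration, not statements from the disclosure.

```python
def ddr_bit_rate_gbps(clock_ghz: float) -> float:
    """Per-lane bit rate when one bit is transferred on each clock edge."""
    return 2.0 * clock_ghz


def unit_interval_ps(bit_rate_gbps: float) -> float:
    """Duration of one bit on a lane, in picoseconds."""
    return 1000.0 / bit_rate_gbps


def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Express a latency in nanoseconds as a number of clock cycles."""
    return latency_ns * clock_ghz


rate = ddr_bit_rate_gbps(2.5)            # per-lane rate for a 2.5 GHz clock
ui = unit_interval_ps(rate)              # width of one bit at that rate
low = latency_in_cycles(1.0, 2.5)        # 1 ns expressed in clock cycles
high = latency_in_cycles(2.0, 2.5)       # 2 ns expressed in clock cycles
```

Under this reading, the 1 ns to 2 ns latency budget corresponds to only a few clock cycles, which is why the local clock phase of each slave matters so much.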

FIG. 2 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure. The semiconductor device 200 shown in FIG. 2 is similar to the semiconductor device 100 shown in FIG. 1. The difference is that the clock generator 107 is implemented inside the master device 106 rather than as an external clock generator as shown in FIG. 1.

FIG. 3 shows a schematic block diagram of a semiconductor device including a master device and a plurality of slave devices according to an embodiment of the present disclosure. The semiconductor device 300 shown in FIG. 3 is similar to the semiconductor device 100 shown in FIG. 1. The difference is that the master device 105 includes a plurality of central processing units (CPUs) 108-1 to 108-M. In addition, the interface device 111 includes the master interface 102 and a plurality of slave interfaces 103-1 to 103-N. Each of the slave interfaces 103-1 to 103-N is coupled to a respective one of the slave devices 110-1 to 110-N in a one-to-one relationship. N and M are integers equal to or greater than 1. In addition, the clock generator 115 generates a clock for driving the master device 105 having the plurality of CPUs 108-1 to 108-M, the master interface 102, the plurality of slave interfaces 103-1 to 103-N, and the plurality of slave devices 110-1 to 110-N. The clock generator 115 may be included in the master device 105, as shown in FIG. 2.

FIG. 4 shows a schematic design diagram of a semiconductor device including a master chip and a slave chip according to an embodiment of the present disclosure. The semiconductor device 400 is arranged vertically to form a 3D package and includes, for example, a master (processor, chip 1) chip 402 coupled with a master interface 404 and a slave (memory, chip 2) chip 408 coupled with a slave interface 410. The processor chip 402 and the memory chip 408 are coupled through a plurality of TSVs 406 via the processor interface 404 and the memory interface 410. In addition, the memory chip 408 includes a plurality of TSVs 412 and a plurality of connectors 414.

FIG. 5 shows a schematic design diagram of a semiconductor device including a master chip and a plurality of slave chips according to an embodiment of the present disclosure. The semiconductor device 500 is arranged vertically to form a 3D package and includes, for example, a master chip 501-1 coupled to a master interface 501-2, a plurality of first slave chips 502-1 coupled to a plurality of first slave interfaces 502-2, and a plurality of second slave chips 503-1 coupled to a plurality of second slave interfaces 503-2. The plurality of first slave chips 502-1 include TSVs 502-4. The master interface 501-2 is coupled to the plurality of first slave interfaces 502-2 via TSVs 502-3 and to the plurality of second slave interfaces 503-2 via TSVs 503-3. In addition, the semiconductor device includes a TSV connector 504 that connects the master interface 501-2 to a connector 506.

In an embodiment, the semiconductor device (i.e., 500) supports a face-to-face interface and a face-to-back interface. For example, the interface between the master chip 501-1 and the first slave chips 502-1 and/or the interface between the master chip 501-1 and the second slave chips 503-1 is a face-to-face interface, while the face-to-back interface is used between the first slave chips 502-1 and/or between the second slave chips 503-1.

FIG. 6 shows a schematic 3D diagram of a semiconductor device including a master chip and a plurality of slave chips according to an embodiment of the present disclosure. The semiconductor device 600 includes a processor chip, which includes a plurality of CPU core chips, and a plurality of SRAM chips vertically connected to the processor chip via Glink-3D serving as the device interface. The read latency of a round-trip data transfer from the processor chip to an SRAM chip and back to the processor chip is equal to or less than 5 ns. This read latency value is imposed in order to achieve reliable data communication between the processor chip and the SRAM chips.

FIG. 7 shows a schematic 3D diagram of a semiconductor device including an example of an interface device structure according to an embodiment of the present disclosure. The semiconductor device 700 includes a CPU core chip coupled to a Glink-3D master serving as the master interface and cache memory chips coupled to Glink-3D slaves serving as the slave interfaces. The Glink-3D master is coupled to the Glink-3D slaves via TSVs. For example, during a read operation, the CPU core chip sends a command to a cache memory chip via the Glink-3D master and the Glink-3D slaves. The cache memory chip then receives the command from the CPU core chip, generates data according to the command, and sends the data to the CPU core chip via the Glink-3D slaves and the Glink-3D master. Finally, the CPU core chip receives the data from the cache memory chip. In addition, the data communication between the CPU core chip and the cache memory chips via the Glink-3D master and the Glink-3D slaves is driven by the clock generated by the clock generator (i.e., 115).

In this embodiment, the Glink-3D master and the Glink-3D slaves have the same structure and are connected in a one-to-one relationship. For example, each of the Glink-3D master and the Glink-3D slaves includes multiple blocks. Each block is divided into multiple cells, for example 5×5 cells. Each cell of the Glink-3D master is connected to a corresponding cell of the Glink-3D slaves in a one-to-one relationship via a TSV. This Glink-3D structure is used as, for example, the physical layer of the Advanced Microcontroller Bus Architecture Coherent Hub Interface (AMBA CHI) protocol. Details of the interface device including the Glink-3D master and the Glink-3D slaves on the 3D semiconductor device, and the corresponding implementations, are described further below.

FIG. 8 shows a schematic diagram of an interface device including a master interface and a plurality of slave interfaces according to an embodiment of the present disclosure. The schematic diagram 800 may be implemented using multiple electronic components (i.e., flip-flops (FFs), multiplexers (MUXes), inverters, and buffers).

Referring to FIG. 8, the Glink-3D master is used as the interface of the master chip. The Glink-3D slaveK and the Glink-3D slaveN are used as the interface of the slaveK chip and the interface of the slaveN chip, respectively. The Glink-3D master, the Glink-3D slaveK, and the Glink-3D slaveN are driven by the clock clk_in generated by the clock generator (i.e., 115). The Glink-3D master, the Glink-3D slaveK, and the Glink-3D slaveN are electrically connected through one or more bonds. For example, the Glink-3D master bonds 806-1 to 806-3 are connected to the Glink-3D slaveN bonds 808-1 to 808-3 in a one-to-one relationship using TSVs.

In this embodiment, the Glink-3D master includes an FF 802, a DDR MUX 804, bonds 806-1 to 806-3, and a read first-in-first-out (FIFO) including multiple FFs 803-1 to 803-3. The FF 802 is coupled to the DDR MUX 804 and receives the command tx_data command from the master chip. The command tx_data command may be formed as, for example, a data cluster. The command tx_data command may include a slave_ID used as the slave chip address. The DDR MUX 804 is coupled to the bond 806-1 and delivers the command tx_data command, in DDR data format, to the Glink-3D slaveN through the bonds 806-1 and 808-1. The FF 803-1 is coupled to the FF 803-2 and the bond 806-3. The FF 803-3 is coupled to the FF 803-2 and the master chip and sends the data rx_data to the master chip. The FF 802, the DDR MUX 804, the bond 806-2, and the FF 803-3 are driven by clk_in generated by the clock generator (i.e., 115). The FFs 803-1 and 803-2 are driven, through the bonds 806-3 and 808-3, by a local clock generated by, for example, the Glink-3D slaveN.

In this embodiment, the Glink-3D slaveN includes bonds 808-1 to 808-3, FFs 810 to 814, a DDR MUX 816, and buffers 818 and 820. The bond 808-1 is coupled to the bond 806-1, and the FF 810 sends the command rx_data command to the slaveN chip. The bond 808-2 is coupled to the bond 806-2 and sends the clock clk to the slaveN chip. The FF 812 is coupled to the DDR MUX 816 and the slaveN chip and receives the data tx_data from the slaveN chip. The FF 814 is coupled to the slaveN chip and receives the enable signal tx_en. The buffer 820 is coupled to the DDR MUX 816 and the bond 808-3 and sends the data tx_data in DDR data format. The buffer 818 is coupled to the bond 808-3 and sends the local clock to the Glink-3D master through the bonds 808-3 and 806-3. The FFs 810 to 814 and the DDR MUX 816 are driven by the clock clk. The buffers 818 and 820 are driven by the enable signal tx_en. In addition, the slaveK chip and the corresponding Glink-3D slaveK have the same structure and data communication as the slaveN chip and the Glink-3D slaveN. The difference between the Glink-3D slaveN and the Glink-3D slaveK lies in the generation of the local clock. The process of generating the local clock is described later with reference to FIG. 10.

FIG. 9 shows a schematic diagram of an interface device including a master chip and a slave chip during a read operation according to an embodiment of the present disclosure. The schematic diagram 900 is similar to the schematic diagram 800. The difference between them is that the schematic diagram 900 shows, for example, one slaveN chip with the corresponding Glink-3D slaveN and an SRAM 901. In addition, a logic unit 902 and an FF 904 are included.

Referring to FIG. 9, during a read operation, the master chip sends a command wr_data, which includes a chip identification (ID) serving as the address of slave chip N, to the SRAM 901 via the Glink-3D master and the Glink-3D slaveN. The logic unit 902 is coupled to the Glink-3D slaveN, the SRAM 901, and the FF 904. The FF 904 is coupled to the Glink-3D slaveN. The logic unit 902 generates a signal for selecting among a chip select (CS) command, a read (RD) command, and a write (WR) command. The logic unit 902 and the corresponding FF 904 generate the enable signal tx_en. The SRAM 901 generates the data tx_data according to the command. The Glink-3D slaveN sends the data tx_data to the Glink-3D master in DDR data format. The master chip reads the data tx_data according to the local clock of the Glink-3D slaveN.
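The role of the chip ID and the enable signal tx_en in the read operation can be illustrated with a small behavioural model. This is a sketch under assumptions, not the logic of unit 902: the `Command` fields, the string encoding of RD/WR, and the rule that tx_en is asserted only for a read addressed to the slave's own ID are hypothetical choices made to show why only one slave drives the shared bonds at a time.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Command:
    slave_id: int  # chip ID carried in the command (hypothetical field name)
    op: str        # "RD" or "WR" (hypothetical encoding)


def tx_en(cmd: Command, my_id: int) -> bool:
    """Assumed rule: enable output buffers only for a read addressed to us."""
    return cmd.slave_id == my_id and cmd.op == "RD"


def driving_slaves(cmd: Command, slave_ids: List[int]) -> List[int]:
    """List the slaves whose output buffers would be enabled by cmd."""
    return [sid for sid in slave_ids if tx_en(cmd, sid)]


# A read addressed to slave 2: at most one slave enables its buffers,
# so the slaves do not contend for the return path to the master.
readers = driving_slaves(Command(slave_id=2, op="RD"), [0, 1, 2, 3])
```

With unique chip IDs, this rule guarantees that at most one set of output buffers is enabled per command, which matches the contention-avoidance goal stated earlier in the description.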

FIG. 10 shows a schematic diagram of a slave-to-master interface including a clock tree according to an embodiment of the present disclosure. The schematic diagram 1000 is similar to the schematic diagrams 800 and 900. The difference is that, from the perspective of the slave-to-master interface, the schematic diagram 1000 shows in more detail the circuits included in the data path and the clock path. In addition, the clock path has a clock tree (i.e., 1019 and 1020) for delivering the clock from the Glink-3D master to each Glink-3D slave. A timing diagram of the data sent in DDR data format from the Glink-3D slaveN and the Glink-3D slaveK to the Glink-3D master is also provided.

In this embodiment, each of the slave interface Glink-3D slaveN and the other slave interfaces (i.e., Glink-3D slaveK) is further configured to send the data/other data (i.e., tx_data[31:0]) to the master interface using a double data rate (DDR) configuration. For example, the data tx_data[31:0] is folded into data tx_data[31:16] and data tx_data[15:0]. Each of the data tx_data[31:16] and the data tx_data[15:0] is referred to as, for example, a data cluster.

In this embodiment, the DDR configuration is produced by a DDR unit that includes a first FF 1002, a second FF 1004, and a multiplexer 1006. The first FF 1002 and the second FF 1004 correspond to the FF 812 shown in FIG. 8, and the multiplexer 1006 corresponds to the DDR MUX 816 shown in FIG. 8. The first FF 1002, the second FF 1004, and the multiplexer 1006 are driven by the clock 1019. The first FF 1002 is configured to generate one part of the data (i.e., the data tx_data[31:16]) from the data/other data (tx_data[31:0]). The second FF 1004 is configured to generate the other part of the data (i.e., the data tx_data[15:0]) from the data/other data (tx_data[31:0]). The multiplexer 1006 is coupled to the first FF 1002 and the second FF 1004 and is configured to send the one part tx_data[31:16] and the other part tx_data[15:0] to the master device via a buffer 1008. The buffer 1008 corresponds to the buffer 820 shown in FIG. 8 and is enabled by the enable signal tx_en, which is the same enable signal tx_en shown in FIG. 8 and FIG. 9. When the buffer 1008 is enabled, the one part tx_data[31:16] and the other part tx_data[15:0] are sent to the Glink-3D master through the bonds 1011 and 1021.
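The folding of tx_data[31:0] into two 16-bit data clusters can be sketched in a few lines. This is only an illustration of the bit slicing; the function names are hypothetical, and the order in which the two clusters leave the multiplexer (upper half first here) is an assumption, since the text does not state it.

```python
from typing import List, Tuple


def fold_ddr(word: int) -> Tuple[int, int]:
    """Split a 32-bit word into (tx_data[31:16], tx_data[15:0])."""
    if not 0 <= word < (1 << 32):
        raise ValueError("word must fit in 32 bits")
    upper = (word >> 16) & 0xFFFF  # role of the first FF in the text
    lower = word & 0xFFFF          # role of the second FF in the text
    return upper, lower


def ddr_mux(word: int) -> List[int]:
    """Serialise both clusters onto a 16-bit bus, one per clock edge.

    The upper-first ordering is an assumption made for illustration.
    """
    upper, lower = fold_ddr(word)
    return [upper, lower]


beats = ddr_mux(0x12345678)  # two 16-bit beats sent within one clock cycle
```

Folding halves the number of bonds needed for a given word width, at the cost of requiring the receiver to sample on both clock edges.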

In another embodiment, each of the slave interface (i.e., Glink-3D slaveN) and the other slave interfaces (i.e., Glink-3D slaveK) further includes a first strobe 1015 and a second strobe 1016. The first strobe 1015 and the second strobe 1016 are coupled to the clock path 1019. The first strobe 1015 is configured to generate a first local clock RDQS_F from the clock clk_in generated by the clock generator (i.e., 115). The second strobe 1016 is configured to generate a second local clock RDQS_R from the clock clk_in generated by the clock generator (i.e., 115). The clock path 1019 is one branch of the clock tree (i.e., 1019, 1020). The clock clk_in is delivered as clk_out through the bonds 1024 and 1014. The clock path 1019 delivers the clock clk_out to the first FF 1002, the second FF 1004, the buffer 1008, the first strobe 1015, and the second strobe 1016. A buffer 1017 delivers the first local clock RDQS_F to the Glink-3D master through the bonds 1012 and 1022. A buffer 1018 delivers the second local clock RDQS_R to the Glink-3D master via the bonds 1013 and 1023. The buffers 1017 and 1018 are each enabled according to the enable signal tx_en, which is the same enable signal tx_en shown in FIG. 8 and FIG. 9.

In this embodiment, the first local clock RDQS_F generated by the first strobe 1015 is used by the Glink-3D master to read the part of the data tx_data[31:16] generated by the first FF 1002, and the second local clock RDQS_R generated by the second strobe 1016 is used by the Glink-3D master to read the other part of the data tx_data[15:0] generated by the second FF 1004. For example, the Glink-3D master includes a unit block configured to read the part of the data tx_data[31:16] according to the first local clock RDQS_F and to read the other part of the data tx_data[15:0] according to the second local clock RDQS_R. The Glink-3D master reads both parts using the DDR data format. The Glink-3D master then combines the part of the data tx_data[31:16] with the other part of the data tx_data[15:0] to generate the complete data rx_data[31:0], and sends the complete data rx_data[31:0] to the processor.

In this embodiment, the Glink-3D master further includes a FIFO unit. The FIFO unit shown in FIG. 10 corresponds to the FIFO unit shown in FIG. 8 and FIG. 9. The FIFO unit may be implemented to provide the function of the unit block described above. The FIFO unit may be implemented by a plurality of FFs (i.e., 1031, 1032, 1051, 1041, 1042, 1061). The FFs 1031 and 1041 correspond to the FF 803-1 shown in FIG. 8. The FFs 1032 and 1042 correspond to the FF 803-2 shown in FIG. 8. The FFs 1051 and 1061 correspond to the FF 803-3 shown in FIG. 8. Specifically, the FFs 1031 and 1041 are coupled to the bond 1021 to receive the part of the data tx_data[31:16] and the other part of the data tx_data[15:0]. The FF 1031 is coupled to the bond 1022 via an inverter 1030 to receive the first local clock RDQS_F. The FF 1041 is coupled to the bond 1023 to receive the second local clock RDQS_R. The FFs 1031 and 1041 are coupled to the FFs 1032 and 1042, respectively, to form the FIFO unit. The number of FFs is not limited to a specific number; the FIFO unit may be implemented with any number of FFs.

In addition, the FIFO unit includes the FFs 1051 and 1061. The FFs 1051 and 1061 are configured to process the part of the data tx_data[31:16] and the other part of the data tx_data[15:0] based on the DDR data format. The FF 1051 is coupled to, for example, the FF 1032, and the FF 1061 is coupled to, for example, the FF 1042. The FFs 1051 and 1061 are configured to retime, using the clock generated by the clock generator (i.e., 115), the part of the data tx_data[31:16] and the other part of the data tx_data[15:0] that the FIFO unit has received from the Glink-3D slaveN and from another Glink-3D slave (i.e., Glink-3D slaveK). The retiming is performed to synchronize the part of the data tx_data[31:16] and the other part of the data tx_data[15:0] with the clock clk_in. Once synchronized with the clock clk_in, the part of the data tx_data[31:16] and the other part of the data tx_data[15:0] are sampled at the same frequency and the same phase using, for example, the tx_data command generated by the processor.

For example, the FFs 1031 and 1041 receive the part of the data tx_data[31:16] and the other part of the data tx_data[15:0]. The FF 1031 samples the part of the data tx_data[31:16] using the first local clock RDQS_F received from the first strobe 1015 and sends it to the FF 1032. The FF 1051 receives the part of the data tx_data[31:16] from, for example, the FF 1032 and samples it based on the clock clk_in. Likewise, the FF 1041 samples the other part of the data tx_data[15:0] using the second local clock RDQS_R received from the second strobe 1016 and sends it to the FF 1042. The FF 1061 receives the other part of the data tx_data[15:0] from, for example, the FF 1042 and samples it based on the clock clk_in. Finally, the FFs 1051 and 1061 generate the complete data rx_data[31:0] and send the complete data rx_data[31:0] to the processor. That is, the FIFO unit of the Glink-3D master processes the data tx_data[31:0] received from, for example, the Glink-3D slaveN to generate the complete data rx_data[31:0] based on the DDR data format.
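The receive path can be sketched behaviorally as two capture registers followed by a recombination. This is an illustration under stated assumptions, not the circuit itself: the name `ddr_receive` is hypothetical, the FIFO stages and the retiming to clk_in are collapsed into a single step, and the lower half is assumed to arrive first.

```python
def ddr_receive(beats: list[int]) -> int:
    """Recombine the two 16-bit beats of one cycle into rx_data[31:0].

    Behavioral sketch only: the FIFO stages (FF 1031/1032/1051 and
    FF 1041/1042/1061) and the retiming to clk_in are collapsed here.
    Assumes the lower half arrives first, then the upper half.
    """
    if len(beats) != 2:
        raise ValueError("expected exactly two beats per clock cycle")
    lower = beats[0] & 0xFFFF  # captured with RDQS_R (role of FF 1041)
    upper = beats[1] & 0xFFFF  # captured with RDQS_F (role of FF 1031)
    return (upper << 16) | lower


rx_data = ddr_receive([0x5678, 0x1234])
print(hex(rx_data))
```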

In another embodiment, referring to FIG. 10, the master device (i.e., the processor) further generates a turnaround (TA) cycle. The TA cycle is, for example, the interval between the data tx_data of the Glink-3D slaveN and the data tx_data of the Glink-3D slaveK as received at the bond 1021 by the FIFO unit of the Glink-3D master. For example, the data tx_data received at the bond 1021 by the FIFO unit of the Glink-3D master is referred to as Master RX_D. The Master RX_D received from the Glink-3D slaveN contains the data DN[15:0] and DN[31:16]. The Master RX_D received from the Glink-3D slaveK contains the data DK[15:0] and DK[31:16]. That is, the TA cycle is the interval between the data DN[31:16] and the data DK[15:0].

In this embodiment, the TA cycle is used to prevent bus contention between the response of the slave device (i.e., the slaveN device) and the response of the other slave device (i.e., the slaveK device). For example, during a read operation, the master device/processor sends a command including a slave ID to the slaveN device and the slaveK device in allocated time slots. The slaveN device and the slaveK device send data and local clocks to the processor via the Glink-3D slaveN and the Glink-3D slaveK, respectively, according to the allocated time slots. The slaveN device and the slaveK device use the data bus according to the allocated time slots. The Glink-3D slaveN sends the data tx_data[31:0] to the Glink-3D master through the bond 1011. The Glink-3D slaveN also sends the first local clock RDQS_F and the second local clock RDQS_R to the Glink-3D master through the bonds 1012 and 1013, respectively. The Glink-3D master receives the data DN[15:0] and DN[31:16] from the Glink-3D slaveN at the bond 1021. The Glink-3D master samples the data DN[15:0] using the second local clock RDQS_R and samples the data DN[31:16] using the first local clock RDQS_F.

Then, the Glink-3D slaveK sends the data tx_data[31:0] to the Glink-3D master via the corresponding bond of the Glink-3D slaveK. The Glink-3D slaveK also sends the first local clock RDQS_F and the second local clock RDQS_R to the Glink-3D master via the corresponding bonds of the Glink-3D slaveK. After the TA cycle, the Glink-3D master receives the data DK[15:0] and DK[31:16] from the Glink-3D slaveK at the bond 1021. The Glink-3D master samples the data DK[15:0] using the second local clock RDQS_R and samples the data DK[31:16] using the first local clock RDQS_F.

That is, while data is transferred from the slaveN device and the slaveK device to the processor, the TA cycle prevents bus contention between the slaveN device and the slaveK device by giving each of them its own time slot for using the data bus.

In this embodiment, the TA cycle is used to compensate for the difference in round-trip delay (RTD) between the slave device and the other slave devices. The RTD is the interval between a command being sent by the Glink-3D master and the corresponding data being received by the Glink-3D master. Since each slave device may be produced by, for example, a different manufacturer, each slave device has different response characteristics. The response characteristics include the RTD. The RTD difference between the slave devices is compensated by the TA cycle.

For example, during a read operation, the slaveN device and the slaveK device receive commands from the processor via the Glink-3D slaveN and the Glink-3D slaveK, respectively. Since the slaveN device and the slaveK device have different RTDs, the Glink-3D master receives data from the Glink-3D slaveN and the Glink-3D slaveK at different times. Although the Glink-3D master is equipped with a pull-down function, bus contention may occur if the RTD difference is greater than the difference between the time slots allocated by the processor to the slaveN device and the slaveK device. Therefore, by adding the RTD difference to the TA cycle (i.e., 1 cycle +/- ΔRTD, 1.5 cycles +/- ΔRTD), the interval between the time at which data is received from the Glink-3D slaveN (DN[15:0] and DN[31:16]) and the time at which data is received from the Glink-3D slaveK (DK[15:0] and DK[31:16]) is maintained, so that bus contention can be avoided.
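The arithmetic behind "1 cycle +/- ΔRTD" can be made concrete with a small sketch. The function names and the sign convention (ΔRTD taken as the later slave's RTD minus the earlier slave's RTD) are assumptions for illustration. The description only states that the RTD difference adds to or subtracts from the nominal TA interval.

```python
def burst_gap_ps(ta_cycles: float, period_ps: float,
                 rtd_first_ps: float, rtd_second_ps: float) -> float:
    """Gap seen at the master between the end of the first slave's data
    and the start of the second slave's data, in picoseconds.

    Assumed model: nominal TA interval plus the signed RTD difference.
    """
    delta_rtd = rtd_second_ps - rtd_first_ps
    return ta_cycles * period_ps + delta_rtd


def has_contention(gap_ps: float) -> bool:
    """Both slaves would drive the shared bond if the gap is not positive."""
    return gap_ps <= 0


T_PS = 400.0  # one clock period at 2.5 GHz, as quoted in the description
gap = burst_gap_ps(1, T_PS, rtd_first_ps=900.0, rtd_second_ps=700.0)
print(gap, has_contention(gap))
```

In this model a slower first slave followed by a faster second slave shrinks the gap, which is the case the TA interval has to absorb.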

FIG. 11 shows a schematic timing diagram of data between two slave chips having the same clock speed according to an embodiment of the present disclosure. FIG. 12 shows a schematic timing diagram of data between two slave chips having different clock speeds according to an embodiment of the present disclosure. The slaveN device and the slaveK device are used as examples in the timing diagrams.

In this embodiment, the slave device (i.e., the slaveN device) and the other slave device (i.e., the slaveK device) generate zero data before and after the data to prevent bus contention between the slave device and the other slave devices caused by their different RTDs. Referring to FIG. 11, the clock slaveK clk_B and the clock slaveN clk_B have the same speed (i.e., normal/typical speed). The slaveK device is provided with the enable signal tx_en and sends the data tx_dataK to the master device via the Glink-3D slaveK. The Glink-3D slaveK delivers the data dataK to the Glink-3D master through a bond (i.e., 1011). Accordingly, the Glink-3D slaveK sends the zero data before the data dataK and the zero data after the data dataK to the Glink-3D master through the bond (i.e., 1011). The Glink-3D slaveK also generates the local clocks RDQS_R and RDQS_F.

On the other hand, the slaveN device is provided with the enable signal tx_en and sends the data tx_dataN0 and tx_dataN1 to the master device via the Glink-3D slaveN. The Glink-3D slaveN delivers the data dataN0 and dataN1 to the Glink-3D master via the corresponding bond. Accordingly, the Glink-3D slaveN sends the zero data before the data dataN0 and dataN1 and the zero data after the data dataN0 and dataN1 to the Glink-3D master via the corresponding bond. The Glink-3D slaveN also generates the local clocks RDQS_R and RDQS_F. Since the data dataK is followed by zero data and the data dataN0 is preceded by zero data, there is an interval (i.e., a 1T TA time) between the data dataK and the data dataN0. That is, the zero data generated by the slaveN device and the slaveK device creates an interval (i.e., a 1T TA time) between the data dataK and the data dataN0, which prevents bus contention between the slaveN device and the slaveK device in the case where the clock slaveK clk_B and the clock slaveN clk_B have the same speed.

In another embodiment, referring to FIG. 12, the difference between the embodiment shown in FIG. 11 and the embodiment shown in FIG. 12 is that the clock slaveK clk_B and the clock slaveN clk_B have different speeds. For example, the clock slaveK clk_B is slow and the clock slaveN clk_B is fast. In other words, the clock slaveN clk_B is faster than, and therefore arrives earlier than, the clock slaveK clk_B. The lead of the earlier clock is less than 1T (less than 400 ps at 2.5 GHz). Since the data dataK has zero data before and after it, and the data dataN0 has zero data before it, there is an interval (<1T) between the clock slaveN clk_B and the clock slaveK clk_B and an interval (TA time) between the data dataK and the data dataN0.

That is, the zero data generated by the slaveN device and the slaveK device creates an interval (TA time) between the data dataK and the data dataN0, which prevents bus contention between the slaveN device and the slaveK device in the case where the clock slaveK clk_B and the clock slaveN clk_B have different speeds.

FIG. 13 shows a schematic timing diagram of data between two slave chips with 2 TA cycles according to an embodiment of the present disclosure. The timing diagram 1300 includes 2 read latency cycles and 2 TA cycles. The read latency is the interval between the time at which a Glink-3D slave receives a command from the master device via the corresponding bond and the time at which the Glink-3D slave sends data in response to the command via the corresponding bond.

Specifically, for example, during a read operation, the Glink-3D slaveK and the Glink-3D slaveN receive the command s_cmd, which includes the slave ID d_did, and the corresponding clock clk_out. The master device sends a command NOP between the read command RD sent to the slaveK device and the preamble command PA sent to the slaveN device. The command NOP is a no-operation command. The preamble command PA is a command for the slave device to prepare data. The read command RD is a command for the slave device to send the data after the slave device has prepared it.

In this embodiment, according to the time slots allocated by the master device, the slaveK device sends its data (i.e., tx_dataK, preamble) in an earlier allocated time slot than the one in which the slaveN device sends its data (i.e., tx_dataN, preamble). When the enable signal tx_en is activated (i.e., 1), the data sent by the slaveK device (i.e., tx_dataK, preamble) and/or the data sent by the slaveN device (i.e., tx_dataN, preamble) are delivered to the corresponding slave bond TX_D. Conversely, when the enable signal tx_en is deactivated (i.e., 0), the data sent by the slaveK device (i.e., tx_dataK, preamble) and/or the data sent by the slaveN device (i.e., tx_dataN, preamble) are not delivered to the corresponding slave bond TX_D. In the case where the read latency is 2 cycles, the interval between, for example, the command NOP received by the Glink-3D slaveK and the data dataK sent by the Glink-3D slaveK at the corresponding slave bond TX_D is 2 cycles. The read latency of 2 cycles corresponds to the command NOP sent by the master device. On the other hand, in the case where the TA is 2 cycles, the interval between the data dataK sent by the Glink-3D slaveK at the corresponding slave bond TX_D and the data dataN sent by the Glink-3D slaveN at the corresponding slave bond TX_D is 2 cycles +/- ΔRTD.

That is, a TA of 2 cycles tolerates an RTD difference of up to 2T and can be set by the master device adding a command NOP before the preamble command PA. In addition, in the case where the RTD difference is less than 1 period T (400 ps at 2.5 GHz), a TA of 1 cycle is sufficient.
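The sizing rule can be written out as a short calculation. This is a sketch under a conservative assumption that is not taken from the description: a TA of n cycles is treated as covering RTD differences strictly below n periods, which matches the stated case "less than 1T needs 1 cycle". The function names are hypothetical.

```python
def clock_period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def min_ta_cycles(delta_rtd_ps: float, period_ps: float) -> int:
    """Smallest whole number of TA cycles whose span exceeds |delta RTD|.

    Conservative assumption: n cycles cover differences strictly below n*T.
    """
    return int(abs(delta_rtd_ps) // period_ps) + 1


T = clock_period_ps(2.5)       # 400.0 ps at 2.5 GHz
print(min_ta_cycles(350.0, T))  # difference below 1T
print(min_ta_cycles(650.0, T))  # difference between 1T and 2T
```

Under this model, each extra TA cycle corresponds to one additional NOP inserted by the master ahead of the preamble command PA.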

FIG. 14 shows a schematic comparison of the first strobe and the second strobe before and after training according to an embodiment of the present disclosure. The block diagram 1400 shown in FIG. 14 corresponds to the block diagram 1000 shown in FIG. 10. The difference between FIG. 14 and FIG. 10 is that the diagram 1400 includes a comparison of the first strobe 1015 and the second strobe 1016 before and after training.

In this embodiment, the slave device (i.e., the slaveN device) and the other slave device (i.e., the slaveK device) train the first strobe 1015 and the second strobe 1016 so that one part of the data (i.e., DN[31:16]) and the other part of the data (i.e., DN[15:0]) are positioned at the optimal data sampling point. The part of the data (i.e., DN[31:16]) and the other part of the data (i.e., DN[15:0]) are referred to as, for example, a data cluster. Specifically, when the semiconductor device is powered on, the master device selects the slave devices one by one for training. For example, the master device selects the slaveN device. The slaveN device selected by the master device manages the training sequence described below. The slaveN device sets the first strobe 1015 and the second strobe 1016 of the Glink-3D slaveN to zero; in this state the first strobe 1015 and the second strobe 1016 are represented by the first local clock RDQS_F Initial and the second local clock RDQS_R Initial. Then, the slaveN device sends BIST data (i.e., DN[31:16] and DN[15:0]) to the master device. The master device receives the BIST data (i.e., DN[31:16] and DN[15:0]) at the corresponding master bond, where the BIST data is denoted, for example, RX_D. The master device reports pass/fail to the slaveN device separately for the data DN[31:16] and the data DN[15:0]. The slaveN device increments the phase of the first local clock RDQS_F Initial and the phase of the second local clock RDQS_R Initial. The incrementing of the two phases continues until the slaveN device has received both the first pass and the last pass reported by the master device. When the master device reports the last pass, the slaveN device stops sending BIST data to the master device. The last pass is identified, for example, when the master device reports a failure after having reported passes. Then, the slaveN device sets the phase of the first local clock and the phase of the second local clock at the midpoint by, for example, dividing the total number of passes by two. The slaveN device then sends ready data to the master device. The first pass is represented by, for example, RDQS_F Initial for the first local clock and RDQS_R Initial for the second local clock. The midpoint is represented by, for example, RDQS_F Trained for the first local clock and RDQS_R Trained for the second local clock. The midpoint represents the optimal data sampling point.
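The training sequence amounts to sweeping a strobe phase, locating the first and last passing settings, and choosing their midpoint. The sketch below illustrates that idea only. The function name, the `passes_at` callback standing in for the master's pass/fail report on the BIST data, and the integer phase steps are all assumptions.

```python
from typing import Callable


def train_strobe_phase(passes_at: Callable[[int], bool], max_phase: int) -> int:
    """Sweep the strobe phase upward from zero and return the midpoint of
    the passing window (the assumed best sampling point).

    `passes_at(phase)` is a hypothetical stand-in for the master device's
    pass/fail report on BIST data sent at that phase setting.
    """
    first_pass = None
    last_pass = None
    for phase in range(max_phase + 1):
        if passes_at(phase):
            if first_pass is None:
                first_pass = phase  # first pass reported by the master
            last_pass = phase
        elif first_pass is not None:
            break  # a fail after passes fixes the last pass
    if first_pass is None:
        raise RuntimeError("no passing phase found")
    return (first_pass + last_pass) // 2


# Each strobe is trained on its own half of the data.
rdqs_f_trained = train_strobe_phase(lambda p: 5 <= p <= 13, max_phase=31)
rdqs_r_trained = train_strobe_phase(lambda p: 8 <= p <= 20, max_phase=31)
print(rdqs_f_trained, rdqs_r_trained)
```

Centering the phase in the passing window leaves roughly equal margin on both sides, which is why the midpoint is treated as the best sampling point.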

That is, the optimal data sampling point is obtained by separately incrementing the phase of the first local clock of the first strobe 1015 and the phase of the second local clock of the second strobe 1016 until the optimal sampling point is found.

In another embodiment, the slave device (i.e., the slaveN device) uses the first clock of the first strobe and the second clock of the second strobe of the master interface (Glink-3D master) to update the first local clock of the first strobe and the second local clock of the second strobe of the slave interface (Glink-3D slaveN), in order to compensate for voltage-to-temperature (V-T) changes.

For example, the semiconductor device has a normal temperature during normal processing and a high temperature during high-temperature processing. Data sent from the slave device (i.e., the slaveN device) to the master device via the slave interface (i.e., Glink-3D slaveN) at high temperature has, for example, a longer duration/period than at normal temperature. The slaveN device updates the phase of the first local clock (i.e., RDQS_F Trained) and the phase of the second local clock (i.e., RDQS_R Trained) according to the period of the data at high temperature and the period of the data at normal temperature. The update is performed by comparing the midpoint of the data at normal temperature with the midpoint of the data at high temperature.

That is, V-T changes can be compensated by updating the phase of the first local clock and the phase of the second local clock of the slave interface at different temperatures according to the first clock and the second clock of the master interface. As a result, the master interface samples the data received from the slave interface at the optimal data sampling point.

FIG. 15 shows a schematic flowchart of DLL training according to an embodiment of the present disclosure. The flowchart 1500 is carried out before the DLL training starts. The DLL training aims to obtain the maximum step of the DLL. The maximum step of the DLL refers to the capability of the DLL to delay the clock in the slave interface (i.e., Glink-3D slaveN). The DLL refers to the first strobe 1015 and the second strobe 1016. The DLL training is described from two different perspectives: an Inter-Integrated Circuit (I2C) sequence and a slave sequence. The I2C sequence is the flow carried out over the I2C protocol, and the slave sequence is the flow carried out in the slave device.

In the I2C sequence, the DLL training is performed in steps S1505 to S1520. In step S1505, the DLL value is cleared/reset. Then, in step S1510, a register of each slave device is set to enable the DLL training by, for example, changing the DLL training flag to 1. The register used to enable the DLL training refers to an accumulator (ACC). In step S1515, the slave flag indicating that the DLL training is complete is checked. Step S1515 is repeated until the slave flag indicating that the DLL training is complete has been set by, for example, changing the corresponding flag to 1. Step S1515 is performed for all slave devices (i.e., the slaveN device and the slaveK device). In step S1520, once the corresponding flags of all slave devices have been set, the DLL training flag is reset by, for example, changing the DLL training flag to 0. By doing so, the register of each slave device that indicates DLL training is disabled. That is, by performing steps S1505 to S1520, the maximum step/delay of the DLL of each slave device is obtained.

In the slave sequence, the DLL training is performed in steps S1555 to S1575. In step S1555, the slave device (i.e., the slaveN device) checks whether the DLL training is enabled. In step S1560, if the DLL training is enabled, the DLL value is increased by, for example, adding 1 to the DLL value. In step S1565, the lag flag and the lead flag are checked. When the DLL value has reached the maximum, the lag flag reads 0 and the lead flag reads 1; accordingly, if the DLL value has not reached the maximum, step S1560 is repeated. If the DLL value has reached the maximum, the flow proceeds to step S1570, in which the DLL value is reduced by 1. The DLL value is reduced by 1 because the maximum value is the last DLL value for which the condition in step S1565 was "No". Finally, in step S1575, the slave device sets a flag indicating that the DLL training is complete. That is, the flag indicates that the DLL training of the slave device (i.e., the slaveN device) is complete, and the maximum DLL value has thus been obtained. Steps S1555 to S1575 are performed by each slave device.
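The slave sequence is essentially an increment-until-overshoot loop followed by a one-step back-off. The sketch below follows steps S1555 to S1575 under stated assumptions: the function name, the `read_flags` callback (a stand-in for reading the lag and lead flags in hardware), the starting value of 0 (taken to follow from the reset in step S1505), and the `step_limit` guard are all hypothetical.

```python
from typing import Callable, Tuple


def dll_training_slave(training_enabled: bool,
                       read_flags: Callable[[int], Tuple[int, int]],
                       step_limit: int = 1024) -> Tuple[int, bool]:
    """Return (dll_value, done_flag) following steps S1555 to S1575.

    `read_flags(dll_value)` is a hypothetical stand-in returning the
    (lag, lead) flags for the current DLL setting.
    """
    if not training_enabled:                 # S1555: training not enabled
        return 0, False
    dll_value = 0                            # assumed cleared in S1505
    for _ in range(step_limit):              # guard added for the sketch
        dll_value += 1                       # S1560: increase DLL value
        lag, lead = read_flags(dll_value)    # S1565: check lag/lead flags
        if lag == 0 and lead == 1:           # maximum reached (overshoot)
            dll_value -= 1                   # S1570: step back by one
            return dll_value, True           # S1575: set the done flag
    raise RuntimeError("DLL did not reach its maximum within step_limit")


# Toy flag model: the flags flip once the setting reaches 40.
value, done = dll_training_slave(True, lambda v: (0, 1) if v >= 40 else (1, 0))
print(value, done)
```

On the I2C side, the controller would poll each slave's done flag (step S1515) and clear the training flag once every slave reports completion (step S1520).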

FIG. 16 shows a schematic flowchart of write data cluster training according to an embodiment of the present disclosure. After the maximum DLL value has been obtained according to the flowchart 1500, the DLL training continues with the write data cluster training shown in the flowchart 1600. Since the write data cluster training serves to write data from the processor 105 to the slave devices (i.e., the slaveN device and the slaveK device) at the optimal clock phase, the write data cluster training is a master-to-slave training. The write data cluster training is described from different perspectives: an I2C sequence, a master sequence, and a slave sequence. The purpose of the write data cluster training is to obtain the middle value of the DLL for writing data. By writing data according to the middle value of the DLL, the data is written correctly, so that bit errors can be minimized. The middle value of the DLL represents the optimal clock phase.

In the I2C sequence, write data cluster training is performed in steps S1605 to S1625. In step S1605, the corresponding register of the processor 105 is set to enable write data cluster training. In step S1610, the register of each slave device is set to enable write data cluster training. In step S1615, the register of each slave device that indicates completion of write data cluster training is checked. If that register is set for every slave device, step S1620 is performed, in which the register of each slave device is disabled. In step S1625, the register of the processor 105 is disabled. In other words, once the completion register of every slave device is found to be set, the DLL value that each slave device uses for writing data has been optimized. Bit errors can therefore be minimized.
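For illustration only, the I2C sequence of steps S1605 to S1625 may be sketched in Python with the registers modeled as dictionaries. The register names `wr_train_en` and `wr_train_done`, the `step` callback that lets the modeled slave devices make progress between polls, and the `max_polls` bound are all hypothetical and are not taken from the disclosure.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]


def i2c_write_training(master: Regs, slaves: List[Regs],
                       step: Callable[[], None],
                       max_polls: int = 1000) -> None:
    """Sketch of the I2C sequence S1605 to S1625 (register names assumed)."""
    master["wr_train_en"] = 1                       # S1605
    for s in slaves:
        s["wr_train_en"] = 1                        # S1610
    for _ in range(max_polls):                      # S1615: poll completion
        step()                                      # hypothetical progress hook
        if all(s.get("wr_train_done") == 1 for s in slaves):
            break
    else:
        raise TimeoutError("write training did not complete")
    for s in slaves:
        s["wr_train_en"] = 0                        # S1620
    master["wr_train_en"] = 0                       # S1625
```

Polling every slave device before clearing any enable register mirrors the ordering of steps S1615 to S1625 above; the timeout is an added safeguard rather than part of the flowchart.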

In the master sequence, write data cluster training is performed in steps S1630 to S1645. In step S1630, the processor 105 checks whether write data cluster training is enabled. If it is enabled, the BIST generator is enabled in step S1635. In step S1640, the processor 105 checks whether write data cluster training is disabled. Write data cluster training is disabled once it has been completed for all slave devices. In step S1645, because write data cluster training has been completed for all slave devices, the BIST generator is disabled. In other words, finding that write data cluster training is disabled means that it has been completed for all slave devices, and the optimal clock phase for writing data has therefore been obtained.
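The master sequence above amounts to letting the BIST generator follow the training-enable register. A minimal Python sketch of one polling step is given below; the register names `wr_train_en` and `bist_gen_en` are assumed for illustration and are not defined in the disclosure.

```python
from typing import Dict


def master_write_training_step(regs: Dict[str, int]) -> int:
    """One polling step of the master sequence S1630 to S1645 (names assumed).

    Returns the resulting state of the BIST generator enable bit.
    """
    training = regs.get("wr_train_en", 0)
    generating = regs.get("bist_gen_en", 0)
    if training == 1 and generating == 0:
        regs["bist_gen_en"] = 1      # S1630 -> S1635: enable BIST generator
    elif training == 0 and generating == 1:
        regs["bist_gen_en"] = 0      # S1640 -> S1645: disable BIST generator
    return regs.get("bist_gen_en", 0)
```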

In the slave sequence, write data cluster training is performed in steps S1650 to S1695. In step S1650, the register indicating that write data cluster training is enabled is checked. In response to write data cluster training being enabled, the DLL value is set to 0 in step S1655. In step S1660, the BIST checker is enabled so that the BIST generated by the processor 105 is checked. In step S1665, the BIST is checked, for example, X times, where X is an integer equal to or greater than 1. X may also represent the duration for which the BIST is checked. Once the BIST has been checked X times, the BIST checker is disabled in step S1670. In step S1675, the DLL window indicating pass values is updated. A pass value indicates that the slave device read the BIST correctly. In step S1680, it is checked whether the DLL value has reached the maximum value, which was obtained according to FIG. 15. If the DLL value is not the maximum value, the DLL value is increased in step S1685, and steps S1660 to S1685 are repeated until the DLL value reaches the maximum cycle/value. In step S1690, once the DLL value has reached the maximum cycle, the DLL value is set to the middle value of the pass window. The middle value of the pass window means that the BIST is written to the slave device at the optimal clock phase. In step S1695, the register indicating completion of write data cluster training is set, for example by changing the corresponding flag to 1. In the master sequence, the processor 105 checks this flag to determine that write data cluster training has been completed for all slave devices.
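The sweep of steps S1655 to S1690 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `bist_passes` is a hypothetical hook standing in for the BIST checker, the pass window is assumed to be a single contiguous range tracked by its lowest and highest passing DLL values, and the middle value is taken by integer division.

```python
from typing import Callable, Optional, Tuple


def sweep_pass_window(bist_passes: Callable[[int], bool],
                      dll_max: int,
                      x_checks: int = 4) -> Tuple[int, int, int]:
    """Sketch of S1655 to S1690: returns (window_low, window_high, middle)."""
    low: Optional[int] = None
    high: Optional[int] = None
    for dll in range(dll_max + 1):            # S1655 start at 0; S1680/S1685 loop
        # S1660 to S1670: check the BIST X times at this DLL value
        ok = all(bist_passes(dll) for _ in range(x_checks))
        if ok:                                # S1675: update the pass window
            if low is None:
                low = dll
            high = dll
    if low is None or high is None:
        raise RuntimeError("no passing DLL value found")
    return low, high, (low + high) // 2       # S1690: middle of the pass window
```

For example, if the BIST passes only for DLL values 10 through 30 and the maximum DLL value from FIG. 15 is 39, the sketch returns a window of 10 to 30 and a middle value of 20, which the slave device would then use for writing data.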

FIG. 17 shows a schematic flowchart of read data cluster training according to an embodiment of the present disclosure. Flowchart 1700 may be performed after flowchart 1600. Read data cluster training is performed to obtain the optimal clock phase for reading data. Read data cluster training is performed from different perspectives, namely an I2C sequence, a master sequence, and a slave sequence.

In the I2C sequence, read data cluster training is performed in steps S1702 to S1716. In step S1702, the maximum DLL value is read from the corresponding register of the slave device. In step S1704, the DLL value read from the corresponding register of the slave device is written to the register of the processor 105. In step S1706, the flag indicating that read data cluster training is enabled is set in the register of the slave device. In step S1708, the flag indicating that read data cluster training is enabled is set in the register of the processor 105. In step S1710, the flag indicating that read data cluster training is complete is checked at the slave device. In step S1712, if the flag indicating completion of read data cluster training is set, the corresponding read data cluster training flag of the processor 105 is disabled. In step S1714, the corresponding read data cluster training flag of the slave device is disabled. In step S1716, it is checked whether every slave device has performed read data cluster training. If one or more slave devices have not yet performed read data cluster training, steps S1706 to S1716 are repeated until all slave devices have done so. In other words, once the corresponding flag in the register of every slave device is found to have been set, every slave device has performed read data cluster training.

In the master sequence, read data cluster training is performed in steps S1720 to S1748. In step S1720, the processor 105 checks the flag of the register corresponding to read data cluster training. In step S1722, if that flag is enabled, the DLL value is set to 0. In step S1724, the processor 105 issues a command for updating the DLL_r value. In step S1726, the processor 105 issues a command for updating the DLL_f value. In step S1728, the processor 105 resets the read FIFO. The read FIFO is reset to prevent the processor 105 from reading an incorrect read data sequence from the slave device; if the read FIFO is not cleared, the data sequence in the read FIFO may not represent the correct data sequence. In step S1730, the processor 105 issues a command for enabling tx_en. In step S1732, the processor 105 enables the BIST checker, so that the processor is ready to read the BIST data generated by the slave device. In step S1734, the BIST data generated by the slave device is read X times. X has been described above. In step S1736, once the BIST data has been read X times, the processor 105 disables the BIST checker. In step S1738, the processor 105 issues a command for disabling tx_en. In step S1740, the processor 105 updates the pass window, which has been described above. In step S1742, it is checked whether the DLL value has reached the maximum cycle/value. In step S1744, if the DLL value has not reached the maximum cycle, the DLL value is increased, and steps S1724 to S1744 are repeated until the DLL value reaches the maximum value. In step S1746, once the DLL value has reached the maximum value, the DLL is set to the middle value of the pass window of the slave device. In step S1748, the flag indicating completion of read data cluster training is set.
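As an illustrative sketch, the master-side loop of steps S1722 to S1746 may be written in Python as follows. The `send` callback, the command strings, and the `bist_ok` hook are hypothetical stand-ins for the command path and the BIST checker; the read FIFO reset of step S1728 is only noted in a comment, and the pass window is again assumed to be one contiguous range.

```python
from typing import Callable, Optional


def master_read_training(send: Callable[[str, int], None],
                         bist_ok: Callable[[int], bool],
                         dll_max: int,
                         x_checks: int = 4) -> int:
    """Sketch of S1722 to S1746: returns the middle of the pass window."""
    low: Optional[int] = None
    high: Optional[int] = None
    dll = 0                                        # S1722
    while True:
        send("UPDATE_DLL_R", dll)                  # S1724
        send("UPDATE_DLL_F", dll)                  # S1726
        # S1728: the read FIFO would be reset here (not modeled)
        send("TX_EN_ENABLE", 0)                    # S1730
        # S1732 to S1736: check the slave's BIST data X times
        ok = all(bist_ok(dll) for _ in range(x_checks))
        send("TX_EN_DISABLE", 0)                   # S1738
        if ok:                                     # S1740: update pass window
            if low is None:
                low = dll
            high = dll
        if dll >= dll_max:                         # S1742
            break
        dll += 1                                   # S1744
    if low is None or high is None:
        raise RuntimeError("no passing DLL value found")
    return (low + high) // 2                       # S1746
```

Each DLL value produces four commands in this sketch, so a sweep from 0 to `dll_max` issues `4 * (dll_max + 1)` commands before the middle value is applied and the completion flag of step S1748 is set.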

In the slave sequence, read data cluster training is performed in steps S1750 to S1766. In step S1750, the flag indicating that read data cluster training is enabled is checked. In step S1752, if that flag is set, the BIST generator is enabled. With the BIST generator enabled, the slave device generates BIST data and sends it to the processor 105. In step S1754, the slave device checks whether the processor 105 has set tx_en by command. In step S1756, when the processor 105 has set tx_en by command, the slave device enables tx_en. In step S1758, the slave device checks whether the processor 105 has cleared tx_en by command. In step S1760, when the processor 105 has cleared tx_en by command, the slave device disables tx_en. In step S1762, the slave device checks whether DLL_r or DLL_f has been updated. If DLL_r or DLL_f has been updated, steps S1754 to S1762 are repeated. In step S1764, if DLL_r and DLL_f have not been updated, the flag indicating that read data cluster training is disabled is checked. If read data cluster training has not been disabled, steps S1762 to S1764 are repeated. In step S1766, if read data cluster training has been disabled, the BIST generator is disabled. In other words, the slave device updates DLL_r and/or DLL_f by performing read data cluster training.
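The slave-side behavior of steps S1750 to S1766 can be approximated as an event-driven state update. The Python sketch below is a simplification under stated assumptions: the event names are hypothetical, events are processed in arrival order without the explicit waiting loops of the flowchart, and commands received while training is not enabled are ignored.

```python
from typing import Dict, Iterable, Tuple


def slave_read_training(events: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sketch of S1750 to S1766 as an event-driven state update (names assumed)."""
    st = {"bist_gen_en": 0, "tx_en": 0, "dll_r": 0, "dll_f": 0}
    for kind, value in events:
        if kind == "TRAIN_ENABLE":         # S1750 to S1752
            st["bist_gen_en"] = 1
        elif not st["bist_gen_en"]:
            continue                       # assumption: ignore when not training
        elif kind == "TX_EN_ENABLE":       # S1754 to S1756
            st["tx_en"] = 1
        elif kind == "TX_EN_DISABLE":      # S1758 to S1760
            st["tx_en"] = 0
        elif kind == "UPDATE_DLL_R":       # S1762: DLL_r updated
            st["dll_r"] = value
        elif kind == "UPDATE_DLL_F":       # S1762: DLL_f updated
            st["dll_f"] = value
        elif kind == "TRAIN_DISABLE":      # S1764 to S1766
            st["bist_gen_en"] = 0
    return st
```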

In addition, examples of the commands used for read data cluster training are provided as follows. Because the DLL value used in read data cluster training is 9 bits wide, these 9 bits are formed by combining the first bit of the read command (S_CMD[0]), the 4 bits of the slave-to-master ID (S_DID[3:0]), and the 4 bits of the master-to-slave ID (M_DID[3:0]). The command itself is formed by combining the second bit of the read command (S_CMD[1]) with the 2 bits of the write command (M_CMD[1:0]). For example, setting the bit values to {0,0,0} produces the IDLE command. Setting the bit values to {0,0,1} produces the update DLL_r value command. Setting the bit values to {0,1,0} produces the update DLL_f value command. Setting the bit values to {0,1,1} produces the update DLL value command. Setting the bit values to {1,0,1} produces the tx_en enable command. Setting the bit values to {1,1,0} produces the tx_en disable command.
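The encoding described above may be illustrated with the following Python sketch. The bit ordering is an assumption: the fields are concatenated in the order listed, with the first-listed field as the most significant bits. The function names, the command labels, and the `RESERVED` fallback for unlisted bit patterns are likewise introduced only for this example.

```python
from typing import Tuple

# {S_CMD[1], M_CMD[1], M_CMD[0]} -> command, per the examples in the text
COMMANDS = {
    (0, 0, 0): "IDLE",
    (0, 0, 1): "UPDATE_DLL_R",
    (0, 1, 0): "UPDATE_DLL_F",
    (0, 1, 1): "UPDATE_DLL",
    (1, 0, 1): "TX_EN_ENABLE",
    (1, 1, 0): "TX_EN_DISABLE",
}


def decode_command(s_cmd: int, m_cmd: int) -> str:
    """Decode the 3-bit command from S_CMD[1] and M_CMD[1:0]."""
    key = ((s_cmd >> 1) & 1, (m_cmd >> 1) & 1, m_cmd & 1)
    return COMMANDS.get(key, "RESERVED")  # fallback label is an assumption


def pack_dll(value: int) -> Tuple[int, int, int]:
    """Split a 9-bit DLL value into (S_CMD[0], S_DID[3:0], M_DID[3:0])."""
    if not 0 <= value < 512:
        raise ValueError("DLL value must fit in 9 bits")
    return (value >> 8) & 1, (value >> 4) & 0xF, value & 0xF


def unpack_dll(s_cmd0: int, s_did: int, m_did: int) -> int:
    """Reassemble the 9-bit DLL value from its three fields."""
    return ((s_cmd0 & 1) << 8) | ((s_did & 0xF) << 4) | (m_did & 0xF)
```

Under these assumptions, `pack_dll(0x1A5)` yields `(1, 0xA, 0x5)`, and `unpack_dll` reverses the split.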

FIG. 18 shows an interface method according to an embodiment of the present disclosure. In step S1805, the interface method of the semiconductor device begins with the master interface sending a command to the slave device. In step S1810, the interface method continues with each of the slave interface and the other slave interfaces receiving the command from the master device and/or sending the data/other data to the master device. Step S1810 includes, in step S1815, sending the data/other data to the master interface using a double data rate (DDR) configuration. Step S1815 includes steps S1820 to S1840. In step 1820, the interface method continues with a first flip-flop (FF) unit generating a part of the data from the data/other data. In step S1825, the interface method continues with a second FF unit generating another part of the data from the data/other data. In step S1830, the interface method continues with a multiplexer sending the part of the data and the other part of the data to the master device. In step 1835, the interface method continues with a first gating unit generating a first local clock from the clock generated by the clock generator. In step S1840, the interface method continues with a second gating unit generating a second local clock from the clock generated by the clock generator. Then, in step S1845, the interface method continues with the master interface receiving the data from the slave device. Step S1845 includes, in step S1850, the master interface using the clock generated by the clock generator to retime the part of the data and the other part of the data from the DDR units of the slave interface and the other slave interfaces.
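The DDR path of steps S1820 to S1830 and the retiming of step S1850 can be modeled behaviorally in Python as shown below. This is a sketch under stated assumptions, not the circuit itself: a 32-bit word is assumed to be split into its lower and upper 16-bit halves, the lower half is assumed to be sent first, and the function names are hypothetical.

```python
from typing import List


def ddr_serialize(words: List[int]) -> List[int]:
    """Model S1820 to S1830: two FF units plus a multiplexer, two beats per word."""
    beats: List[int] = []
    for w in words:
        ff1 = w & 0xFFFF            # first FF unit: one part of the data
        ff2 = (w >> 16) & 0xFFFF    # second FF unit: the other part
        for clk_level in (1, 0):    # multiplexer select follows the clock level
            beats.append(ff1 if clk_level else ff2)
    return beats


def ddr_retime(beats: List[int]) -> List[int]:
    """Model S1850: the master recombines each pair of beats into one word."""
    if len(beats) % 2:
        raise ValueError("expected an even number of beats")
    return [beats[i] | (beats[i + 1] << 16) for i in range(0, len(beats), 2)]
```

With these assumptions, the word `0xDEADBEEF` is sent as the beats `0xBEEF` and `0xDEAD`, and `ddr_retime` recovers the original word on the master side.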

In summary, the interface device and interface method for a 3D semiconductor device provide reliable data communication between the master device and the slave devices. Reliable data communication is achieved by providing a specific time slot to each slave device. The master device also provides a data wait time between the slave devices, so that bus contention between the slave devices can be avoided. In addition, in order to sample data at the optimal sampling phase, the slave device trains its local clock when the semiconductor device is started up or powered on. By training the local clock, data can be sampled at the optimal data sampling point, which reduces the error rate. Furthermore, the slave device also updates the local clock to compensate for T-V variation of the semiconductor device.

In another embodiment, an interface device for interfacing between a master device and a slave device is provided, wherein the master device generates a command and the slave device generates data according to the command. The interface device includes a master interface and a slave interface. The master interface is coupled to the master device and is configured to send the command to the slave device and/or receive the data from the slave device. The slave interface is coupled to the slave device and is configured to receive the command from the master device and/or send the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds and/or TSVs. The clock used to drive the slave interface is trained by changing the clock phase of the clock so that it aligns with the data cluster of the command and/or the data cluster of the data.

In another embodiment, the interface device further includes other slave interfaces. The other slave interfaces are coupled to other slave devices in a one-to-one relationship. The other slave interfaces are configured to receive the command from the master device and/or send other data generated by the other slave devices to the master device. The other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the TSVs. Each clock used to drive one of the other slave interfaces is trained by changing its clock phase so that it aligns with the data cluster of the command and/or the data cluster of the data corresponding to that slave interface.

In another embodiment, each of the slave interface and the other slave interfaces is further configured to send the data/the other data to the master interface using a double data rate (DDR) configuration. The DDR configuration is produced by a DDR unit. The DDR unit includes a first flip-flop (FF) unit, a second FF unit, and a multiplexer. The first FF unit is configured to generate a part of the data from the data/the other data. The second FF unit is configured to generate another part of the data from the data/the other data. The multiplexer is coupled to the first FF unit and the second FF unit, and is configured to send the part of the data and the other part of the data to the master device.

In another embodiment, each of the slave interface and the other slave interfaces further includes a first gating unit and a second gating unit. The first gating unit is configured to generate a first local clock from the clock generated by the clock generator. The second gating unit is configured to generate a second local clock from the clock generated by the clock generator. The first local clock generated by the first gating unit is used by the master interface to read the part of the data generated by the first FF unit. The second local clock generated by the second gating unit is used by the master interface to read the other part of the data generated by the second FF unit.

In another embodiment, the master device further generates turn-around (TA) cycles. The TA cycles are used to prevent bus contention between the response of the slave device and the responses of the other slave devices. In another embodiment, the TA cycles are used to compensate for the difference in round-trip delay (RTD) between the slave device and the other slave devices. In another embodiment, the slave device and the other slave devices generate zero data before and after the data to prevent contention between the slave device and the other slave devices caused by their different RTDs. In another embodiment, the master interface further includes a first-in-first-out (FIFO) unit configured to use the clock generated by the clock generator to retime the part of the data and the other part of the data from the DDR units of the slave interface and the other slave interfaces. In another embodiment, the slave device and the other slave devices train the first gating unit and the second gating unit to position the part of the data and the other part of the data at the optimal data sampling point.
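The role of the TA cycles can be illustrated with a simple timing model in Python. The model is an assumption made for this example and is not taken from the disclosure: the master issues back-to-back reads to two slave devices, each response occupies the shared bus for `burst_len` cycles, and the response of each slave device arrives its own RTD (in cycles) after its read is issued.

```python
def responses_overlap(rtd_first: int, rtd_second: int,
                      burst_len: int, ta_cycles: int) -> bool:
    """Return True if the two responses collide on the shared bus.

    The first read is issued at cycle 0; the second read is issued after the
    first burst slot plus ta_cycles idle cycles (assumed model).
    """
    end_first = rtd_first + burst_len                   # first response ends here
    start_second = burst_len + ta_cycles + rtd_second   # second response starts
    return start_second < end_first


def min_ta_cycles(rtd_first: int, rtd_second: int) -> int:
    """Smallest TA count that avoids overlap in the model above."""
    return max(0, rtd_first - rtd_second)
```

In this model, a first slave device with an RTD of 5 cycles followed by a second with an RTD of 3 cycles needs at least 2 TA cycles; with fewer, the second response would start before the first has left the bus.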

In another embodiment, an interface method for interfacing between a master device and a slave device is provided, wherein the master device generates a command and the slave device generates data according to the command. The interface method includes: sending, by a master interface, the command to the slave device and/or receiving the data from the slave device; and receiving, by a slave interface, the command from the master device and/or sending the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds and/or TSVs. The clock used to drive the slave interface is trained by changing the clock phase of the clock so that it aligns with the data cluster of the command and/or the data cluster of the data.

In another embodiment, the interface method further includes: receiving, by other slave interfaces, the command from the master device and/or sending other data generated by other slave devices to the master device. The other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the TSVs. Each clock used to drive one of the other slave interfaces is trained by changing its clock phase so that it aligns with the data cluster of the command and/or the data cluster of the data corresponding to that slave interface.

In another embodiment, receiving the command from the master device and/or sending the data/the other data to the master device by each of the slave interface and the other slave interfaces further includes: sending the data/the other data to the master interface using a double data rate (DDR) configuration. Sending the data/the other data to the master interface using the DDR configuration further includes: generating, by a first flip-flop (FF) unit, a part of the data from the data/the other data; generating, by a second FF unit, another part of the data from the data/the other data; and sending, by a multiplexer, the part of the data and the other part of the data to the master device.

In another embodiment, sending the data/the other data to the master interface using the DDR configuration further includes: generating, by a first gating unit, a first local clock from the clock generated by the clock generator; and generating, by a second gating unit, a second local clock from the clock generated by the clock generator. The first local clock generated by the first gating unit is used by the master interface to read the part of the data generated by the first FF unit. The second local clock generated by the second gating unit is used by the master interface to read the other part of the data generated by the second FF unit.

In another embodiment, the master device further generates turn-around (TA) cycles. The TA cycles are used to prevent bus contention between the response of the slave device and the responses of the other slave devices. In another embodiment, the TA cycles are used to compensate for the difference in round-trip delay (RTD) between the slave device and the other slave devices. In another embodiment, the slave device and the other slave devices generate zero data before and after the data to prevent contention between the slave device and the other slave devices caused by their different RTDs. In another embodiment, sending the command to the slave device and/or receiving the data from the slave device by the master interface further includes: using the clock generated by the clock generator to retime the part of the data and the other part of the data from the DDR units of the slave interface and the other slave interfaces. In another embodiment, the slave device and the other slave devices train the first gating unit and the second gating unit to position the part of the data and the other part of the data at the optimal data sampling point.

The features of several embodiments have been outlined above so that those skilled in the art may better understand the following detailed description. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures to carry out the same purposes and/or achieve the same advantages as the embodiments introduced herein. Those skilled in the art should also recognize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

100, 200, 300, 400, 500, 600, 700: semiconductor device
101, 111: interface device
102, 501-2: master interface
103, 103-1 to 103-N, Glink-3D slaves: slave interface
104, 406, 412, 502-3, 502-4, 503-3: through-silicon via (TSV)
105, 106: master device
107, 115: clock generator
108-1 to 108-M: central processing unit
110, 110-1 to 110-N: slave device
120, 501-1: master chip
130: slave chip
402: chip
404: interface
408: chip
410: interface
414, 506: connector
502-1: first slave chip
502-2: first slave interface
503-1: second slave chip
503-2: second slave interface
504: TSV connector
800, 900, 1000: schematic diagram
802, 803-1, 803-2, 803-3, 810, 812, 814, 904, 1031, 1032, 1041, 1042, 1051, 1061: flip-flop (FF)
804, 816: double data rate (DDR) multiplexer (MUX)
806-1, 806-2, 806-3: bond
808-1, 808-2, 808-3: bond
818, 820, 1008, 1017, 1018: buffer
901: SRAM
902: logic unit
1002: first FF
1004: second FF
1006: multiplexer
1011, 1012, 1013, 1014, 1021, 1022, 1023, 1024: bond
1015: first gating unit
1016: second gating unit
1019: clock path
1020: clock tree
1030: inverter
1300: timing diagram
1400: circuit diagram
1500, 1600, 1700: flowchart
address wr_data, NOP, rx_data command, s_cmd, tx_data command: command
clk, clk_in, clk_out, slaveK clk_B, slaveN clk_B: clock
d_did: slave ID
dataK, dataN0, dataN1, DK[15:0], DK[31:16], DN[15:0], DN[31:16], Master RX_D, rx_data, rx_data[31:0], tx_data, tx_data[31:16], tx_data[15:0], tx_dataK, tx_dataN, tx_dataN0, tx_dataN1: data
Glink-3D: device interface
Glink-3D master: master interface
Glink-3D slaveK, Glink-3D slaveN: slave interface
PA: preamble command
RD: read command
RDQS_F: local clock
RDQS_F Initial: local clock
RDQS_R: local clock
RDQS_R Initial: local clock
RDQS_F Trained, RDQS_R Trained: phase
S1505, S1510, S1515, S1520, S1555, S1560, S1565, S1570, S1575, S1605, S1610, S1615, S1620, S1625, S1630, S1635, S1640, S1645, S1650, S1655, S1660, S1665, S1670, S1675, S1680, S1685, S1690, S1695, S1702, S1704, S1706, S1708, S1710, S1712, S1714, S1716, S1720, S1722, S1724, S1726, S1728, S1730, S1732, S1734, S1736, S1738, S1740, S1742, S1744, S1746, S1748, S1750, S1752, S1754, S1756, S1758, S1760, S1762, S1764, S1766, S1805, S1810, S1815, 1820, S1825, S1830, 1835, S1840, S1845, S1850: step
slave_ID: slave chip address
slaveN, slaveK: chip
TX_D: slave bond
tx_data [31:0]: data
tx_en: enable signal

FIG. 1 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure.
FIG. 2 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure.
FIG. 3 shows a schematic block diagram of a semiconductor device including a master device and a plurality of slave devices according to an embodiment of the present disclosure.
FIG. 4 shows a schematic design diagram of a semiconductor device including a master chip and a slave chip according to an embodiment of the present disclosure.
FIG. 5 shows a schematic design diagram of a semiconductor device including a master chip and a plurality of slave chips according to an embodiment of the present disclosure.
FIG. 6 shows a schematic 3D diagram of a semiconductor device including a master chip and a plurality of slave chips according to an embodiment of the present disclosure.
FIG. 7 shows a schematic 3D diagram of a semiconductor device including an example of an interface device structure according to an embodiment of the present disclosure.
FIG. 8 shows a schematic diagram of an interface device including a master interface and a plurality of slave interfaces according to an embodiment of the present disclosure.
FIG. 9 shows a schematic diagram of an interface device including a master chip and a slave chip during a read operation according to an embodiment of the present disclosure.
FIG. 10 shows a schematic diagram of a slave-to-master interface including a clock tree according to an embodiment of the present disclosure.
FIG. 11 shows a schematic timing diagram of data between two slave chips having the same local clock speed according to an embodiment of the present disclosure.
FIG. 12 shows a schematic timing diagram of data between two slave chips having different clock speeds according to an embodiment of the present disclosure.
FIG. 13 shows a schematic timing diagram of data between two slave chips with two turn-around (TA) cycles according to an embodiment of the present disclosure.
FIG. 14 shows a schematic comparison of the first strobe unit and the second strobe unit before and after training according to an embodiment of the present disclosure.
FIG. 15 shows a schematic flowchart of delay lock loop (DLL) training according to an embodiment of the present disclosure.
FIG. 16 shows a schematic flowchart of write data cluster training according to an embodiment of the present disclosure.
FIG. 17 shows a schematic flowchart of read data cluster training according to an embodiment of the present disclosure.
FIG. 18 shows an interface method according to an embodiment of the present disclosure.

102: master interface

103-1, 103-N: slave interface

105: master device

108-1, 108-M: central processing unit (CPU)

110-1, 110-N: slave device

111: interface device

115: clock generator

300: semiconductor device

Claims

An interface device for interfacing between a master device and a slave device, wherein the master device generates a command and the slave device generates data according to the command, the interface device comprising: a master interface, coupled to the master device, configured to send the command to the slave device and/or receive the data from the slave device; and a slave interface, coupled to the slave device, configured to receive the command from the master device and/or send the data to the master device, wherein the master interface and the slave interface are driven by a clock generated by a clock generator, the master interface and the slave interface are electrically connected by one or more bonds and/or through-silicon vias (TSVs), and the clock used to drive the slave interface is trained by changing a clock phase of the clock to align with a data cluster of the command and/or a data cluster of the data.

The interface device according to claim 1, further comprising other slave interfaces, wherein the other slave interfaces are coupled to other slave devices in a one-to-one relationship and are configured to receive the command from the master device and/or send other data generated by the other slave devices to the master device, wherein the other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the through-silicon vias, and each clock used to drive each of the other slave interfaces is trained by changing a clock phase of that clock to align with a data cluster of the command and/or a data cluster of the data corresponding to each of the other slave interfaces.

The interface device according to claim 2, wherein each of the slave interface and the other slave interfaces is further configured to send the data/the other data to the master interface using a double data rate (DDR) configuration.

The interface device according to claim 3, wherein the double data rate configuration is generated by a double data rate unit, and the double data rate unit comprises: a first flip-flop (FF) unit configured to generate a part of data according to the data/the other data; a second flip-flop unit configured to generate another part of data according to the data/the other data; and a multiplexer, coupled to the first flip-flop unit and the second flip-flop unit, configured to send the part of data and the other part of data to the master device.

The interface device according to claim 4, wherein each of the slave interface and the other slave interfaces further comprises: a first strobe unit configured to generate a first local clock according to the clock generated by the clock generator; and a second strobe unit configured to generate a second local clock according to the clock generated by the clock generator, wherein the first local clock generated by the first strobe unit is used by the master interface to read the part of data generated by the first flip-flop unit, and the second local clock generated by the second strobe unit is used by the master interface to read the other part of data generated by the second flip-flop unit.

The interface device according to claim 2, wherein the master device further generates a turn-around (TA) cycle, wherein the turn-around cycle is used to prevent bus contention between a response of the slave device and responses of the other slave devices.

The interface device according to claim 6, wherein the turn-around cycle is used to compensate for a round-trip delay (RTD) difference between the slave device and the other slave devices.

The interface device according to claim 7, wherein the slave device and the other slave devices generate zero data before the data and after the data, so as to prevent contention between the slave device and the other slave devices caused by different round-trip delays.

The interface device according to claim 4, wherein the master interface further comprises a first-in-first-out (FIFO) unit, and the FIFO unit is configured to use the clock generated by the clock generator to retime the part of data and the other part of data from the double data rate units of the slave interface and the other slave interfaces.

The interface device according to claim 5, wherein the slave device and the other slave devices train the first strobe unit and the second strobe unit to locate an optimal data sampling point of the part of data and the other part of data.
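
For illustration only, the double data rate arrangement recited above (two flip-flop units each holding a part of the data, a multiplexer forwarding the parts, and two local clocks used to read them) can be modeled behaviorally as follows. This is a minimal sketch under stated assumptions, not the claimed circuit: the 32-bit word width follows the tx_data[31:0], tx_data[15:0] and tx_data[31:16] signals in the list of reference numerals, while the assignment of the lower half to the rising edge and the upper half to the falling edge, and the names `ddr_serialize` and `ddr_deserialize`, are hypothetical.

```python
MASK16 = 0xFFFF  # width of each part; assumes a 32-bit word split into two 16-bit halves


def ddr_serialize(words):
    """Model the slave-side DDR unit.

    Each 32-bit word is split into two 16-bit parts. The first part stands in
    for the output of the first flip-flop unit and the second part for the
    output of the second flip-flop unit. The multiplexer is modeled as emitting
    one part per clock edge (rising first, then falling); that edge assignment
    is an assumption of this sketch.
    """
    beats = []
    for word in words:
        low = word & MASK16            # part attributed to the first FF unit
        high = (word >> 16) & MASK16   # part attributed to the second FF unit
        beats.append(("rise", low))
        beats.append(("fall", high))
    return beats


def ddr_deserialize(beats):
    """Model the master side rebuilding 32-bit words from per-edge samples.

    The "rise" sample plays the role of data read with the first local clock
    and the "fall" sample the role of data read with the second local clock.
    """
    words = []
    low = None
    for edge, value in beats:
        if edge == "rise":
            low = value
        else:
            if low is None:
                raise ValueError("falling-edge sample without a rising-edge sample")
            words.append((value << 16) | low)
            low = None
    return words
```

A round trip through both functions returns the original words, which is the property a retiming stage on the master side would rely on.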

An interface method for interfacing between a master device and a slave device, wherein the master device generates a command and the slave device generates data according to the command, the interface method comprising: sending, by a master interface, the command to the slave device and/or receiving the data from the slave device; and receiving, by a slave interface, the command from the master device and/or sending the data to the master device, wherein the master interface and the slave interface are driven by a clock generated by a clock generator, the master interface and the slave interface are electrically connected by one or more bonds and/or through-silicon vias (TSVs), and the clock used to drive the slave interface is trained by changing a clock phase of the clock to align with a data cluster of the command and/or a data cluster of the data.

The interface method according to claim 11, further comprising: receiving, by other slave interfaces, the command from the master device and/or sending other data generated by other slave devices to the master device, wherein the other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the through-silicon vias, and each clock used to drive each of the other slave interfaces is trained by changing a clock phase of that clock to align with a data cluster of the command and/or a data cluster of the data corresponding to each of the other slave interfaces.

The interface method according to claim 12, wherein the step in which each of the slave interface and the other slave interfaces receives the command from the master device and/or sends the data/the other data to the master device further comprises: sending the data/the other data to the master interface using a double data rate (DDR) configuration.

The interface method according to claim 13, wherein the double data rate configuration is generated by a double data rate unit, and sending the data/the other data to the master interface using the double data rate (DDR) configuration further comprises: generating, by a first flip-flop (FF) unit, a part of data according to the data/the other data; generating, by a second flip-flop unit, another part of data according to the data/the other data; and sending, by a multiplexer, the part of data and the other part of data to the master device.

The interface method according to claim 14, wherein sending the data/the other data to the master interface using the double data rate configuration further comprises: generating, by a first strobe unit, a first local clock according to the clock generated by the clock generator; and generating, by a second strobe unit, a second local clock according to the clock generated by the clock generator, wherein the first local clock generated by the first strobe unit is used by the master interface to read the part of data generated by the first flip-flop unit, and the second local clock generated by the second strobe unit is used by the master interface to read the other part of data generated by the second flip-flop unit.

The interface method according to claim 12, wherein the master device further generates a turn-around (TA) cycle, wherein the turn-around cycle is used to prevent bus contention between a response of the slave device and responses of the other slave devices.

The interface method according to claim 16, wherein the turn-around cycle is used to compensate for a round-trip delay (RTD) difference between the slave device and the other slave devices.

The interface method according to claim 17, wherein the slave device and the other slave devices generate zero data before the data and after the data, so as to prevent contention between the slave device and the other slave devices caused by different round-trip delays.

The interface method according to claim 14, wherein sending, by the master interface, the command to the slave device and/or receiving the data from the slave device further comprises: retiming, using the clock generated by the clock generator, the part of data and the other part of data from the double data rate units of the slave interface and the other slave interfaces.

The interface method according to claim 15, wherein the slave device and the other slave devices train the first strobe unit and the second strobe unit to locate an optimal data sampling point of the part of data and the other part of data.
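
As a purely illustrative aid, the training recited above, in which a clock phase is varied until the sampling point lines up with the data, may be sketched as a phase sweep that selects the middle of the widest passing window. The claims do not specify this selection rule; the centre-of-window choice, the discrete phase steps, the absence of wrap-around between the last and first step, and the names `find_best_phase`, `train_strobe` and `read_ok` are all assumptions of this sketch.

```python
def find_best_phase(passes):
    """Return the centre index of the longest run of True values in `passes`.

    `passes[i]` is True when a known training pattern was read back correctly
    with the strobe delayed by phase step i (hypothetical test criterion).
    Returns None when no phase step passes. Wrap-around between the last and
    first step is deliberately not modeled.
    """
    best_start, best_len = None, 0
    run_start = None
    # A trailing False acts as a sentinel so the final run is always closed.
    for i, ok in enumerate(list(passes) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = None
    if best_start is None:
        return None
    return best_start + (best_len - 1) // 2


def train_strobe(read_ok, num_steps):
    """Sweep all phase steps and pick the centre of the widest passing window.

    `read_ok(step)` is a caller-supplied callable (hypothetical) that applies
    the given phase step and reports whether the training pattern was read
    correctly.
    """
    return find_best_phase([read_ok(step) for step in range(num_steps)])
```

In this model the same routine would be run once for each strobe unit, so that the first and second local clocks each settle on their own phase.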