TW202215244A

TW202215244A - Interface device and interface method for 3d semiconductor device

Info

Publication number: TW202215244A
Application number: TW109141468A
Authority: TW
Inventors: 毅格艾爾卡諾維奇; 阿姆農帕納斯; 喻珮; 葉力墾; 方勇勝; 林聖偉; 黃智強; 譚競豪; 陳卿芳
Original assignee: 創意電子股份有限公司; 台灣積體電路製造股份有限公司
Priority date: 2020-09-30
Filing date: 2020-11-26
Publication date: 2022-04-16
Also published as: CN114328328B; TWI744113B; CN114328328A

Abstract

An interface device and an interface method for interfacing between a master device and a slave device are provided. The master device generates a command and the slave device generates data according to the command. The interface device includes a master interface and a slave interface. The master interface is coupled to the master device and configured to send the command to the slave device and/or receive the data from the slave device. The slave interface is coupled to the slave device and configured to receive the command from the master device and/or send the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or a plurality of bonds and/or TSVs.

Description

Interface device and interface method for three-dimensional semiconductor device

The present disclosure relates to a technique for a three-dimensional (3D) semiconductor device and, more particularly, to an interface device and an interface method for the 3D semiconductor device.

In recent years, the packaging of electronic devices such as personal computers (PCs) and smartphones has advanced, so that the devices have become more compact and their production cost can be reduced accordingly. One of the key factors in this development is 3D semiconductor technology. Various semiconductor devices, including a central processing unit (CPU) and a memory, can be integrated into a single chip by vertically interconnecting the CPU with the memory. This structure is generally referred to as a 3D integrated circuit (3D IC). On the other hand, in order to maintain reliable data transfer and communication, the interconnection between one CPU or memory and another CPU or memory needs to be regulated by an interface device. However, interface devices for 3D integrated circuits are still under development.

The present invention provides an interface device and an interface method for a 3D semiconductor device. The interface device and the interface method provide reliable data communication between a master device and a slave device.

In an embodiment, the present invention provides an interface device for interfacing between a master device and a slave device. The master device generates a command and the slave device generates data according to the command. The interface device includes a master interface and a slave interface. The master interface is coupled to the master device and is configured to send the command to the slave device and/or receive the data from the slave device. The slave interface is coupled to the slave device and is configured to receive the command from the master device and/or send the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds. The clock used to drive the slave interface is trained by changing the clock phase of the clock so that it aligns with a data cluster of the command and/or a data cluster of the data.

In an embodiment, the present invention provides an interface method for interfacing between a master device and a slave device. The master device generates a command and the slave device generates data according to the command. The interface method includes: sending, by a master interface, the command to the slave device and/or receiving the data from the slave device; and receiving, by a slave interface, the command from the master device and/or sending the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds. The clock used to drive the slave interface is trained by changing the clock phase of the clock so that it aligns with a data cluster of the command and/or a data cluster of the data.

To make the foregoing easier to understand, several embodiments accompanied by drawings are described in detail below.

The following disclosure provides many different embodiments, or examples, for implementing different features of the present disclosure. Specific examples of elements and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of brevity and clarity and does not in itself indicate a relationship between the various embodiments and/or configurations discussed.

Further, for ease of description, spatially relative terms such as "beneath," "below," "lower," "above," "upper," and the like may be used herein to describe the relationship of one element or feature to another element or feature as illustrated in the figures. The spatially relative terms are intended to encompass different orientations of the device in use or operation in addition to the orientation depicted in the figures. The apparatus may be otherwise oriented (rotated 90 degrees or at other orientations), and the spatially relative descriptors used herein may likewise be interpreted accordingly.

The present disclosure discloses an interface device and an interface method for a 3D semiconductor device. The interface device provides reliable data communication between a master device and slave devices. The reliable data communication is achieved by distributing the data latency provided by the master device to each slave device according to the clock generated by a clock generator. Each slave device has a local clock generated from the clock of the clock generator. Each slave device can adjust its local clock, so that data contention between the slave devices can be avoided. Furthermore, by avoiding data contention between the slave devices, bit errors can be minimized or avoided, so that error correction modules and methods are not needed. Therefore, the data communication speed can be increased.

In addition, when the electronic device is started up, each slave device can train its local clock by sending built-in self-test (BIST) data to the master device. By generating the local clock precisely, each slave device can provide accurate data with a low or zero error rate. As a result, error correction is not required and the data communication speed can be increased accordingly. To avoid data contention between the slave devices and to train the local clock of each slave device, implementations of the interface device and the interface method are detailed in the embodiments provided below (in particular, the implementation of the slave-to-master interface).
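One way to picture the training step is as a phase sweep: the slave tries each candidate phase of its local clock, exchanges a known BIST pattern, and keeps the phase in the middle of the widest run of error-free settings. The sketch below is only an illustration under assumptions. The function names, the number of phase steps, and the `example_link` pass/fail model are hypothetical and are not taken from the disclosure, which does not specify a search procedure.

```python
from typing import Callable, List, Optional


def sweep_phases(num_steps: int, link_ok: Callable[[int], bool]) -> List[bool]:
    """Try every candidate phase step and record whether the BIST pattern passed."""
    return [link_ok(step) for step in range(num_steps)]


def pick_center_phase(results: List[bool]) -> Optional[int]:
    """Return the middle of the longest run of passing phases, or None if none pass.

    Simplification: the sweep is treated as linear, so a passing window that
    wraps around from the last step to step 0 is not merged.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for step, ok in enumerate(results):
        if ok:
            if run_len == 0:
                run_start = step
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    if best_len == 0:
        return None
    return best_start + (best_len - 1) // 2


# Hypothetical link model: phase steps 5 through 10 sample the data correctly.
def example_link(step: int) -> bool:
    return 5 <= step <= 10


results = sweep_phases(16, example_link)
chosen = pick_center_phase(results)
```

Choosing the center of the passing window rather than its first passing step leaves margin on both sides against drift, which is the usual motivation for this kind of sweep.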

FIG. 1 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure. The semiconductor device 100 is implemented in a 3D package such as chip-on-wafer-on-substrate (CoWoS), system-on-integrated-chip (SoIC), wafer-on-wafer (WoW), or another 3D package integration.

Referring to FIG. 1, the semiconductor device 100 includes a master die 120, a slave die 130, and a clock generator 115. The master die 120 is coupled to the slave die 130 by through-silicon vias (TSVs) 104. The master die 120 includes a master device 105 and a master interface 102 coupled to the master device 105. The slave die 130 includes a slave device 110 and a slave interface 103 coupled to the slave device 110. The master device 105 is coupled to the slave device 110 via the master interface 102 and the slave interface 103. The master interface 102 and the slave interface 103 are coupled via the TSVs 104 and together form the interface device 101. The interface device 101 is adapted to connect the master device 105 and the slave device 110 vertically, which forms a 3D semiconductor device. The structure of the interface device 101 is called Glink-3D. In addition, the clock generator 115 generates a clock for driving the master device 105, the master interface 102, the slave interface 103, and the slave device 110. The clock generated by the clock generator 115 is used in both the forward and reverse directions for the master interface 102 and the slave interface 103.

In an embodiment, the master device 105 and the slave device 110 are implemented as, for example, a processor and a memory (i.e., static random access memory (SRAM)), respectively. The clock generator 115 is implemented by, for example, an oscillator. The connection between the master interface 102 and the slave interface 103 is implemented by TSVs forming a parallel bus, which transfers data at a sampling rate of up to 5.0 Gbps, or at a double data rate (DDR) of 2.5 GHz. Parallel buses are also used for the coupling between the slave device 110 and the slave interface 103 and between the master device 105 and the master interface 102. In an embodiment, the latency between the master device 105 and the slave device 110 is set to 1 ns to 2 ns. Data transfer between the master device 105 and the slave device 110 has a low bit error rate or no bit errors (no BER).
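The two figures quoted above are consistent if one bit is transferred on each edge of a 2.5 GHz clock, which is the usual meaning of double data rate. The short sketch below works through that arithmetic. The helper names and the 16-wire bus width are assumptions chosen for illustration and are not stated in this paragraph.

```python
def ddr_rate_gbps(clock_ghz: float) -> float:
    """Per-wire data rate when one bit is sent on each clock edge (rising and falling)."""
    return 2.0 * clock_ghz


def bus_bandwidth_gbps(clock_ghz: float, wires: int) -> float:
    """Aggregate bandwidth of a parallel bus with `wires` data wires."""
    return ddr_rate_gbps(clock_ghz) * wires


per_wire = ddr_rate_gbps(2.5)          # 5.0 Gbps per wire
bus_16 = bus_bandwidth_gbps(2.5, 16)   # 80.0 Gbps for a hypothetical 16-wire bus
```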

FIG. 2 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure. The semiconductor device 200 shown in FIG. 2 is similar to the semiconductor device 100 shown in FIG. 1. The difference is that the clock generator 107 is implemented inside the master device 106 rather than as an external clock generator as shown in FIG. 1.

FIG. 3 shows a schematic block diagram of a semiconductor device including a master device and a plurality of slave devices according to an embodiment of the present disclosure. The semiconductor device 300 shown in FIG. 3 is similar to the semiconductor device 100 shown in FIG. 1. The difference is that the master device 105 includes a plurality of central processing units (CPUs) 108-1 to 108-M. In addition, the interface device 111 includes a master interface 102 and a plurality of slave interfaces 103-1 to 103-N. The slave interfaces 103-1 to 103-N are coupled to the slave devices 110-1 to 110-N in a one-to-one relationship. N and M are integers equal to or greater than 1. The clock generator 115 generates a clock for driving the master device 105 having the plurality of CPUs 108-1 to 108-M, the master interface 102, the plurality of slave interfaces 103-1 to 103-N, and the plurality of slave devices 110-1 to 110-N. The clock generator 115 may be included in the master device 105, as shown in FIG. 2.

FIG. 4 shows a schematic design diagram of a semiconductor device including a master die and a slave die according to an embodiment of the present disclosure. The semiconductor device 400 is arranged vertically to form a 3D package and includes, for example, a master die 402 (the processor die, die 1) coupled with a master interface 404, and a slave die 408 (the memory die, die 2) coupled with a slave interface 410. The processor die 402 and the memory die 408 are coupled by a plurality of TSVs 406 via the processor interface 404 and the memory interface 410. In addition, the memory die 408 includes a plurality of TSVs 412 and a plurality of connectors 414.

FIG. 5 shows a schematic design diagram of a semiconductor device including a master die and a plurality of slave dies according to an embodiment of the present disclosure. The semiconductor device 500 is arranged vertically to form a 3D package and includes, for example, a master die 501-1 coupled to a master interface 501-2, a plurality of first slave dies 502-1 coupled to a plurality of first slave interfaces 502-2, and a plurality of second slave dies 503-1 coupled to a plurality of second slave interfaces 503-2. The plurality of first slave dies 502-1 include TSVs 502-4. The master interface 501-2 is coupled to the plurality of first slave interfaces 502-2 via TSVs 502-3 and to the plurality of second slave interfaces 503-2 via TSVs 503-3. In addition, the semiconductor device includes a TSV connector 504 that connects the master interface 501-2 to a connector 506.

In an embodiment, the semiconductor device (i.e., 500) supports both a face-to-face interface and a face-to-back interface. For example, the interface between the master die 501-1 and the first slave dies 502-1 and/or the interface between the master die 501-1 and the second slave dies 503-1 is a face-to-face interface. The face-to-back interface is used between the first slave dies 502-1 and/or between the second slave dies 503-1.

FIG. 6 shows a schematic 3D diagram of a semiconductor device including a master die and a plurality of slave dies according to an embodiment of the present disclosure. The semiconductor device 600 includes a processor die, which includes a plurality of CPU core dies, and a plurality of SRAM dies connected vertically to the processor die via Glink-3D, which serves as the interface device. The read latency of a round-trip data transfer from the processor die to an SRAM die and back to the processor die is equal to or less than 5 ns. This read latency value is implemented to achieve reliable data communication between the processor die and the SRAM dies.

FIG. 7 shows a schematic 3D diagram of a semiconductor device including an example of an interface device structure according to an embodiment of the present disclosure. The semiconductor device 700 includes a CPU core die coupled to a Glink-3D master, which serves as the master interface, and a cache memory die coupled to Glink-3D slaves, which serve as the slave interface. The Glink-3D master is coupled to the Glink-3D slaves via TSVs. For example, during a read operation, the CPU core die sends a command to the cache memory die via the Glink-3D master and the Glink-3D slaves. The cache memory die then receives the command from the CPU core die, generates data according to the command, and sends the data to the CPU core die via the Glink-3D slaves and the Glink-3D master. Finally, the CPU core die receives the data from the cache memory die. The data communication between the CPU core die and the cache memory die via the Glink-3D master and the Glink-3D slaves is driven by the clock generated by the clock generator (i.e., 115).

In this embodiment, the Glink-3D master and the Glink-3D slaves have the same structure and are connected in a one-to-one relationship. For example, each of the Glink-3D master and the Glink-3D slaves includes a plurality of blocks. Each block is divided into a plurality of cells, for example 5×5 cells. Each cell of the Glink-3D master is connected to a corresponding cell of the Glink-3D slaves in a one-to-one relationship via a TSV. This Glink-3D structure is used, for example, as the physical layer of the Advanced Microcontroller Bus Architecture Coherent Hub Interface (AMBA CHI) protocol. Details of the interface device including the Glink-3D master and the Glink-3D slaves on the 3D semiconductor device, and the corresponding implementations, are described further below.

FIG. 8 shows a schematic diagram of an interface device including a master interface and a plurality of slave interfaces according to an embodiment of the present disclosure. The schematic 800 may be implemented using a plurality of electronic components (i.e., flip-flops (FFs), multiplexers (MUXs), inverters, and buffers).

Referring to FIG. 8, the Glink-3D master is used as the interface of the master die. The Glink-3D slaveK and the Glink-3D slaveN are used as the interfaces of the slaveK die and the slaveN die, respectively. The Glink-3D master, the Glink-3D slaveK, and the Glink-3D slaveN are driven by the clock clk_in generated by the clock generator (i.e., 115), and are electrically connected through one or more bonds. For example, the Glink-3D master bonds 806-1 to 806-3 are connected to the Glink-3D slaveN bonds 808-1 to 808-3 in a one-to-one relationship using TSVs.

In this embodiment, the Glink-3D master includes an FF 802, a DDR MUX 804, bonds 806-1 to 806-3, and a read first-in-first-out (FIFO) including a plurality of FFs 803-1 to 803-3. The FF 802 is coupled to the DDR MUX 804 and receives the command tx_data command from the master die. The command tx_data command may be formed as, for example, a data cluster, and may include a slave_ID used as the slave die address. The DDR MUX 804 is coupled to the bond 806-1 and forwards the command tx_data command in DDR data format to the Glink-3D slaveN through the bonds 806-1 and 808-1. The FF 803-1 is coupled to the FF 803-2 and the bond 806-3. The FF 803-3 is coupled to the FF 803-2 and the master die and sends the data rx_data to the master die. The FF 802, the DDR MUX 804, the bond 806-2, and the FF 803-3 are driven by clk_in generated by the clock generator (i.e., 115). The FFs 803-1 and 803-2 are driven by a local clock generated by, for example, the Glink-3D slaveN and delivered through the bonds 806-3 and 808-3.

In this embodiment, the Glink-3D slaveN includes bonds 808-1 to 808-3, FFs 810 to 814, a DDR MUX 816, and buffers 818 and 820. The bond 808-1 is coupled to the bond 806-1, and the FF 810 sends the command rx_data command to the slaveN die. The bond 808-2 is coupled to the bond 806-2 and delivers the clock clk to the slaveN die. The FF 812 is coupled to the DDR MUX 816 and the slaveN die and receives the data tx_data from the slaveN die. The FF 814 is coupled to the slaveN die and receives the enable signal tx_en. The buffer 820 is coupled to the DDR MUX 816 and the bond 808-3 and sends the data tx_data in DDR data format. The buffer 818 is coupled to the bond 808-3 and sends the local clock to the Glink-3D master through the bonds 808-3 and 806-3. The FFs 810 to 814 and the DDR MUX 816 are driven by the clock clk. The buffers 818 and 820 are driven by the enable signal tx_en. The slaveK die and the corresponding Glink-3D slaveK have the same structure and data communication as the slaveN die and the Glink-3D slaveN. The Glink-3D slaveN and the Glink-3D slaveK differ in how the local clock is generated. The process of generating the local clock is described later with reference to FIG. 10.

FIG. 9 shows a schematic diagram of an interface device including a master die and a slave die during a read operation according to an embodiment of the present disclosure. The schematic 900 is similar to the schematic 800. The difference is that the schematic 900 shows, for example, one slaveN die with its corresponding Glink-3D slaveN and an SRAM 901. In addition, a logic unit 902 and an FF 904 are included.

Referring to FIG. 9, during a read operation, the master die sends a command wr_data, which includes a die identification (ID) serving as the address of slave die N, to the SRAM 901 via the Glink-3D master and the Glink-3D slaveN. The logic unit 902 is coupled to the Glink-3D slaveN, the SRAM 901, and the FF 904. The FF 904 is coupled to the Glink-3D slaveN. The logic unit 902 generates signals for selecting among a chip select (CS) command, a read (RD) command, and a write (WR) command. The logic unit 902 and the corresponding FF 904 generate the enable signal tx_en. The SRAM 901 generates the data tx_data according to the command. The Glink-3D slaveN sends the data tx_data to the Glink-3D master in DDR data format. The master die reads the data tx_data according to the local clock of the Glink-3D slaveN.
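The role of the logic unit and its flip-flop can be illustrated with a small behavioral model in which a slave asserts tx_en only after it sees a read command carrying its own ID, so that at most one slave drives the return path at a time. This is a sketch under assumptions: the `Command` fields, the `SlaveSelect` class, and the one-cycle delay are hypothetical simplifications and do not reproduce the actual circuit of FIG. 9.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    slave_id: int   # die ID carried in the command (used as the slave address)
    read: bool      # True for a read (RD) command, False for a write (WR) command


class SlaveSelect:
    """Hypothetical model of a logic unit plus one flip-flop that produces tx_en."""

    def __init__(self, my_id: int) -> None:
        self.my_id = my_id
        self.tx_en = False  # registered output (models the flip-flop)

    def clock(self, cmd: Optional[Command]) -> bool:
        """Advance one clock cycle and return tx_en as seen during this cycle."""
        current = self.tx_en
        # Chip select: the command addresses this slave and requests a read.
        selected = cmd is not None and cmd.slave_id == self.my_id and cmd.read
        self.tx_en = selected  # becomes visible on the next cycle
        return current


slave_k, slave_n = SlaveSelect(my_id=0), SlaveSelect(my_id=1)
trace = []
for cmd in [Command(1, True), None, Command(0, True), None]:
    trace.append((slave_k.clock(cmd), slave_n.clock(cmd)))
```

In this trace each slave enables its output buffers one cycle after the matching read command, and the two enables are never high in the same cycle.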

FIG. 10 shows a schematic diagram of a slave-to-master interface including a clock tree according to an embodiment of the present disclosure. The schematic 1000 depicts the same interface as the schematics 800 and 900. It differs in that, from the perspective of the slave-to-master interface, the schematic 1000 shows the circuits included in the data path and the clock path in more detail. In addition, the clock path has a clock tree (i.e., 1019 and 1020) for delivering the clock from the Glink-3D master to each Glink-3D slave. A timing diagram is also provided for the data in DDR data format sent from the Glink-3D slaveN and the Glink-3D slaveK to the Glink-3D master.

In this embodiment, each of the slave interface Glink-3D slaveN and the other slave interfaces (i.e., Glink-3D slaveK) is further configured to send the data or other data (i.e., tx_data[31:0]) to the master interface using a double data rate (DDR) configuration. For example, the data tx_data[31:0] is folded into data tx_data[31:16] and data tx_data[15:0]. Each of the data tx_data[31:16] and the data tx_data[15:0] is referred to as, for example, a data cluster.

In this embodiment, the DDR configuration is generated by a DDR unit that includes a first FF 1002, a second FF 1004, and a multiplexer 1006. The first FF 1002 and the second FF 1004 correspond to the FF 812 shown in FIG. 8, and the multiplexer 1006 corresponds to the DDR MUX 816 shown in FIG. 8. The first FF 1002, the second FF 1004, and the multiplexer 1006 are driven by the clock 1019. The first FF 1002 is configured to generate one part of the data (i.e., data tx_data[31:16]) from the data or other data (tx_data[31:0]). The second FF 1004 is configured to generate the other part of the data (i.e., data tx_data[15:0]) from the data or other data (tx_data[31:0]). The multiplexer 1006 is coupled to the first FF 1002 and the second FF 1004 and is configured to send the part tx_data[31:16] and the other part tx_data[15:0] to the master device via a buffer 1008. The buffer 1008 corresponds to the buffer 820 shown in FIG. 8 and is enabled by the enable signal tx_en, which is the same as the enable signal tx_en shown in FIGS. 8 and 9. When the buffer 1008 is enabled, the part tx_data[31:16] and the other part tx_data[15:0] are sent to the Glink-3D master through the bonds 1011 and 1021.
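The folding performed by the two flip-flops and the multiplexer can be mimicked in software as splitting each 32-bit word into two 16-bit data clusters and emitting them as consecutive beats, one per clock half-period. The sketch below is a behavioral illustration only. The function names are hypothetical, and the order in which the upper and lower clusters appear on the wires is an assumption that the text does not state.

```python
from typing import List, Tuple

MASK16 = 0xFFFF


def fold(word32: int) -> Tuple[int, int]:
    """Split a 32-bit word into its upper [31:16] and lower [15:0] data clusters."""
    return (word32 >> 16) & MASK16, word32 & MASK16


def ddr_serialize(words: List[int]) -> List[int]:
    """Emit two 16-bit beats per word, one for each half of the clock period."""
    beats: List[int] = []
    for word in words:
        upper, lower = fold(word)
        # Assumed ordering: upper cluster first, lower cluster second.
        beats.append(upper)
        beats.append(lower)
    return beats


beats = ddr_serialize([0x12345678, 0xDEADBEEF])
```

With this folding, a 16-wire path carries a full 32-bit word in every clock cycle.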

In another embodiment, each of the slave interface (i.e., Glink-3D slaveN) and the other slave interfaces (i.e., Glink-3D slaveK) further includes a first strobe 1015 and a second strobe 1016. The first strobe 1015 and the second strobe 1016 are coupled to the clock path 1019. The first strobe 1015 is configured to generate a first local clock RDQS_F according to the clock clk_in generated by the clock generator (i.e., 115). The second strobe 1016 is configured to generate a second local clock RDQS_R according to the clock clk_in generated by the clock generator (i.e., 115). The clock path 1019 is one branch of the clock tree (i.e., 1019, 1020). The clock clk_in is delivered as clk_out through the bonds 1024 and 1014. The clock path 1019 delivers the clock clk_out to the first FF 1002, the second FF 1004, the buffer 1008, the first strobe 1015, and the second strobe 1016. A buffer 1017 delivers the first local clock RDQS_F to the Glink-3D master through the bonds 1012 and 1022. A buffer 1018 delivers the second local clock RDQS_R to the Glink-3D master through the bonds 1013 and 1023. The buffers 1017 and 1018 are each enabled according to the enable signal tx_en, which is the same as the enable signal tx_en shown in FIGS. 8 and 9.

In this embodiment, the first local clock RDQS_F generated by the first gate 1015 is used by the Glink-3D master to read the portion tx_data[31:16] generated by the first FF 1002, and the second local clock RDQS_R generated by the second gate 1016 is used by the Glink-3D master to read the other portion tx_data[15:0] generated by the second FF 1004. For example, the Glink-3D master includes a unit block configured to read the portion tx_data[31:16] according to the first local clock RDQS_F and the other portion tx_data[15:0] according to the second local clock RDQS_R. The Glink-3D master reads both portions using the DDR data format and combines them to produce the complete data rx_data[31:0]. The Glink-3D master then sends the complete data rx_data[31:0] to the processor.

In this embodiment, the Glink-3D master further includes a FIFO unit. The FIFO unit shown in FIG. 10 corresponds to the FIFO unit shown in FIGS. 8 and 9. The FIFO unit may be implemented to provide the function of the unit block described above. The FIFO unit may be implemented by a plurality of FFs (i.e., 1031, 1032, 1051, 1041, 1042, 1061). FFs 1031 and 1041 correspond to the FF 803-1 shown in FIG. 8, FFs 1032 and 1042 correspond to the FF 803-2 shown in FIG. 8, and FFs 1051 and 1061 correspond to the FF 803-3 shown in FIG. 8. Specifically, FFs 1031 and 1041 are coupled to the bond 1021 to receive the portion tx_data[31:16] and the other portion tx_data[15:0]. FF 1031 is coupled to the bond 1022 via the inverter 1030 to receive the first local clock RDQS_F. FF 1041 is coupled to the bond 1023 to receive the second local clock RDQS_R. FFs 1031 and 1041 are coupled to FFs 1032 and 1042, respectively, to form the FIFO unit. The number of FFs is not limited to a specific number; the FIFO unit may be implemented with any number of FFs.

In addition, the FIFO unit includes FFs 1051 and 1061, which are configured to process the portion tx_data[31:16] and the other portion tx_data[15:0] based on the DDR data format. FF 1051 is coupled to, for example, FF 1032, and FF 1061 is coupled to, for example, FF 1042. FFs 1051 and 1061 are configured to use the clock generated by the clock generator (i.e., 115) to re-time the portion tx_data[31:16] and the other portion tx_data[15:0] that the FIFO unit receives from the Glink-3D slaveN and from another Glink-3D slave (i.e., Glink-3D slaveK). The re-timing is performed to synchronize the two portions with the clock clk_in. Once synchronized with clk_in, the portion tx_data[31:16] and the other portion tx_data[15:0] are sampled at the same frequency and the same phase, using for example the command tx_data command generated by the processor.

For example, FFs 1031 and 1041 receive the portion tx_data[31:16] and the other portion tx_data[15:0]. FF 1031 samples the portion tx_data[31:16] with the first local clock RDQS_F received from the first gate 1015 and sends it to FF 1032. FF 1051 receives the portion tx_data[31:16] from, for example, FF 1032 and samples it based on the clock clk_in. Likewise, FF 1041 samples the other portion tx_data[15:0] with the second local clock RDQS_R received from the second gate 1016 and sends it to FF 1042. FF 1061 receives the other portion tx_data[15:0] from, for example, FF 1042 and samples it based on the clock clk_in. Finally, FFs 1051 and 1061 produce the complete data rx_data[31:0] and send it to the processor. That is, the FIFO unit of the Glink-3D master processes the data tx_data[31:0] received from, for example, the Glink-3D slaveN to produce the complete data rx_data[31:0] based on the DDR data format.
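The split-and-recombine behaviour described above can be illustrated with a small behavioural model. This is only a sketch, not the circuit of FIG. 10: the function names are hypothetical, and the beat order (lower half first, sampled with RDQS_R, then upper half, sampled with RDQS_F) is assumed from the order in which the master is described as receiving the two halves.

```python
def ddr_serialize(word32):
    """Split a 32-bit word into the two 16-bit beats sent over one bond.

    Each beat is tagged with the local clock the master would use to
    sample it (assumed order: lower half first).
    """
    word32 &= 0xFFFFFFFF
    low = word32 & 0xFFFF            # tx_data[15:0], sampled with RDQS_R
    high = (word32 >> 16) & 0xFFFF   # tx_data[31:16], sampled with RDQS_F
    return [("RDQS_R", low), ("RDQS_F", high)]


def ddr_deserialize(beats):
    """Recombine the two sampled halves into rx_data[31:0]."""
    captured = dict(beats)
    return (captured["RDQS_F"] << 16) | captured["RDQS_R"]


beats = ddr_serialize(0xDEADBEEF)
rx_data = ddr_deserialize(beats)
```

The model ignores timing entirely; it only shows that one 32-bit word crosses a 16-bit-wide path as two halves and is reassembled on the master side.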

In another embodiment, referring to FIG. 10, the master device (i.e., the processor) further generates a turnaround (TA) cycle. The TA cycle is, for example, the interval between the data tx_data of the Glink-3D slaveN and the data tx_data of the Glink-3D slaveK as received by the FIFO unit of the Glink-3D master at the bond 1021. The data tx_data received by the FIFO unit of the Glink-3D master at the bond 1021 is referred to as Master RX_D. The Master RX_D received from the Glink-3D slaveN contains the data DN[15:0] and DN[31:16]. The Master RX_D received from the Glink-3D slaveK contains the data DK[15:0] and DK[31:16]. That is, the TA cycle is the interval between the data DN[31:16] and the data DK[15:0].

In this embodiment, the TA cycle is used to prevent bus contention between the response of a slave device (i.e., the slaveN device) and the response of another slave device (i.e., the slaveK device). For example, during a read operation, the master device/processor sends a command including a slave ID to the slaveN device and the slaveK device using allocated time slots. The slaveN device and the slaveK device send data and local clocks to the processor through the Glink-3D slaveN and the Glink-3D slaveK, respectively, according to the allocated time slots, and use the data bus according to those time slots. The Glink-3D slaveN sends the data tx_data[31:0] to the Glink-3D master through the bond 1011. The Glink-3D slaveN also sends the first local clock RDQS_F and the second local clock RDQS_R to the Glink-3D master through the bonds 1012 and 1013, respectively. The Glink-3D master receives the data DN[15:0] and DN[31:16] from the Glink-3D slaveN at the bond 1021. The Glink-3D master samples the data DN[15:0] with the second local clock RDQS_R and samples the data DN[31:16] with the first local clock RDQS_F.

The Glink-3D slaveK then sends the data tx_data[31:0] to the Glink-3D master through the corresponding bond of the Glink-3D slaveK. The Glink-3D slaveK also sends the first local clock RDQS_F and the second local clock RDQS_R to the Glink-3D master through its corresponding bonds. After the TA cycle, the Glink-3D master receives the data DK[15:0] and DK[31:16] from the Glink-3D slaveK at the bond 1021. The Glink-3D master samples the data DK[15:0] with the second local clock RDQS_R and samples the data DK[31:16] with the first local clock RDQS_F.

That is, while data is being transferred from the slaveN device and the slaveK device to the processor, the TA cycle prevents bus contention between the slaveN device and the slaveK device by giving each of them a time slot in which to use the data bus.

In this embodiment, the TA cycle is also used to compensate for the difference in round-trip delay (RTD) between a slave device and other slave devices. The RTD is the interval between a command being sent by the Glink-3D master and the corresponding data being received by the Glink-3D master. Because each slave device may, for example, be produced by a different manufacturer, each slave device has different response characteristics, including its RTD. The RTD difference between slave devices is compensated by the TA cycle.

For example, during a read operation, the slaveN device and the slaveK device receive commands from the processor via the Glink-3D slaveN and the Glink-3D slaveK, respectively. Because the slaveN device and the slaveK device have different RTDs, the Glink-3D master receives data from the Glink-3D slaveN and the Glink-3D slaveK at different times. Although the Glink-3D master is equipped with a pull-down function, bus contention may occur if the RTD difference is greater than the difference between the time slots allocated by the processor to the slaveN device and the slaveK device. Therefore, by adding the TA cycle to the RTD difference (i.e., 1 cycle +/- ΔRTD, 1.5 cycles +/- ΔRTD), an interval is maintained between the time at which data is received from the Glink-3D slaveN (DN[15:0] and DN[31:16]) and the time at which data is received from the Glink-3D slaveK (DK[15:0] and DK[31:16]), so that bus contention is avoided.
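The relationship "TA cycles +/- ΔRTD" can be checked with simple arithmetic. The sketch below is illustrative only: the function names, the picosecond units, and the sign convention (ΔRTD is the second responder's RTD minus the first responder's RTD) are assumptions, and a negative gap is taken to mean that the two responses overlap on the bus.

```python
def bus_gap_ps(ta_cycles, period_ps, rtd_first_ps, rtd_second_ps):
    """Gap seen at the master between the end of the first slave's data
    and the start of the second slave's data, in picoseconds."""
    delta_rtd = rtd_second_ps - rtd_first_ps  # may be negative
    return ta_cycles * period_ps + delta_rtd


def has_bus_contention(ta_cycles, period_ps, rtd_first_ps, rtd_second_ps):
    """True if the second response would overlap the first one."""
    return bus_gap_ps(ta_cycles, period_ps, rtd_first_ps, rtd_second_ps) < 0


# 2.5 GHz clock, so one cycle T is 400 ps.
gap = bus_gap_ps(1, 400, rtd_first_ps=900, rtd_second_ps=700)
```

With these example numbers, one TA cycle leaves a 200 ps gap. If the second slave instead responded 600 ps sooner than the first, one TA cycle would no longer be enough and a second cycle would be needed.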

FIG. 11 shows a schematic timing diagram of data between two slave chips having the same clock speed according to an embodiment of the present disclosure, and FIG. 12 shows a schematic timing diagram of data between two slave chips having different clock speeds according to an embodiment of the present disclosure. The slaveN device and the slaveK device are used as examples in the timing diagrams.

In this embodiment, the slave device (i.e., the slaveN device) and the other slave device (i.e., the slaveK device) generate zero data before and after their data to prevent bus contention between the slave devices caused by different RTDs. Referring to FIG. 11, the clock slaveK clk_B and the clock slaveN clk_B have the same speed (i.e., normal/typical speed). The slaveK device is provided with the enable signal tx_en and sends the data tx_dataK to the master device via the Glink-3D slaveK. The Glink-3D slaveK delivers the data dataK to the Glink-3D master through the bond (i.e., 1011). Accordingly, the Glink-3D slaveK sends zero data before the data dataK and zero data after the data dataK to the Glink-3D master through the bond (i.e., 1011). The Glink-3D slaveK also generates the local clocks RDQS_R and RDQS_F.

The slaveN device, on the other hand, is provided with the enable signal tx_en and sends the data tx_dataN0 and tx_dataN1 to the master device via the Glink-3D slaveN. The Glink-3D slaveN delivers the data dataN0 and dataN1 to the Glink-3D master through the corresponding bond. Accordingly, the Glink-3D slaveN sends zero data before the data dataN0 and dataN1 and zero data after the data dataN0 and dataN1 to the Glink-3D master through the corresponding bond. The Glink-3D slaveN also generates the local clocks RDQS_R and RDQS_F. Because the data dataK is followed by zero data and the data dataN0 is preceded by zero data, there is an interval (i.e., a 1T TA time) between the data dataK and the data dataN0. That is, the zero data generated by the slaveN device and the slaveK device creates an interval (i.e., a 1T TA time) between the data dataK and the data dataN0, which prevents bus contention between the slaveN device and the slaveK device when the clock slaveK clk_B and the clock slaveN clk_B have the same speed.

In another embodiment, referring to FIG. 12, the difference from the embodiment shown in FIG. 11 is that the clock slaveK clk_B and the clock slaveN clk_B have different speeds. For example, the clock slaveK clk_B is slow and the clock slaveN clk_B is fast. In other words, the clock slaveN clk_B is earlier than the clock slaveK clk_B. The earlier clock leads by less than 1T (less than 400 ps at 2.5 GHz). Because the data dataK is preceded and followed by zero data, and the data dataN0 is preceded by zero data, there is an interval (< 1T) between the clock slaveN clk_B and the clock slaveK clk_B, and an interval (the TA time) between the data dataK and the data dataN0.

That is, the zero data generated by the slaveN device and the slaveK device creates an interval (the TA time) between the data dataK and the data dataN0, which prevents bus contention between the slaveN device and the slaveK device when the clock slaveK clk_B and the clock slaveN clk_B have different speeds.

FIG. 13 shows a schematic timing diagram of data between two slave chips with 2 TA cycles according to an embodiment of the present disclosure. The timing diagram 1300 includes 2 read latency cycles and 2 TA cycles. The read latency is the interval between the time at which a Glink-3D slave receives a command from the master device through the corresponding bond and the time at which the Glink-3D slave sends data in response to that command through the corresponding bond.

Specifically, for example, during a read operation, the Glink-3D slaveK and the Glink-3D slaveN receive the command s_cmd, which includes the slave ID d_did, together with the corresponding clock clk_out. The master device sends a command NOP between the read command RD sent to the slaveK device and the preamble command PA sent to the slaveN device. The command NOP is a no-operation command. The preamble command PA instructs a slave device to prepare data. The read command RD instructs a slave device to send the data after the slave device has prepared it.

In this embodiment, according to the time slots allocated by the master device, the slaveK device sends its data (i.e., tx_dataK and a preamble) in an earlier time slot than the one in which the slaveN device sends its data (i.e., tx_dataN and a preamble). When the enable signal tx_en is activated (i.e., 1), the data sent by the slaveK device (i.e., tx_dataK and the preamble) and/or the data sent by the slaveN device (i.e., tx_dataN and the preamble) are delivered to the corresponding slave bond TX_D. Conversely, when the enable signal tx_en is deactivated (i.e., 0), the data sent by the slaveK device and/or the slaveN device are not delivered to the corresponding slave bond TX_D. With a read latency of 2 cycles, the interval between, for example, the command NOP received by the Glink-3D slaveK and the data dataK sent by the Glink-3D slaveK at the corresponding slave bond TX_D is 2 cycles. The 2-cycle read latency corresponds to the command NOP sent by the master device. With a TA of 2 cycles, on the other hand, the interval between the data dataK sent by the Glink-3D slaveK at the corresponding slave bond TX_D and the data dataN sent by the Glink-3D slaveN at the corresponding slave bond TX_D is 2 cycles +/- ΔRTD.

That is, a TA of 2 cycles tolerates an RTD difference of up to 2T and can be set by the master device adding a command NOP before the preamble command PA. In cases where the RTD difference is less than one period T (400 ps at 2.5 GHz), a TA of 1 cycle is sufficient.
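The choice of TA length can be sketched as a small calculation. This is an illustration under assumptions rather than a rule stated in the disclosure: the helper names are hypothetical, the TA is rounded up to a whole number of cycles, and one NOP is assumed to be inserted per TA cycle beyond the first (consistent with one NOP giving a 2-cycle TA in FIG. 13).

```python
def min_ta_cycles(delta_rtd_ps, period_ps=400):
    """Smallest whole number of TA cycles that covers |delta RTD|.

    period_ps defaults to 400 ps, i.e. one cycle T at 2.5 GHz.
    At least one TA cycle is always used.
    """
    magnitude = abs(delta_rtd_ps)
    cycles = -(-magnitude // period_ps)  # ceiling division on integers
    return max(1, cycles)


def read_command_sequence(ta_cycles):
    """Commands issued by the master: RD to slaveK, NOPs, then PA to slaveN.

    Assumption: one NOP per TA cycle beyond the first.
    """
    return ["RD"] + ["NOP"] * (ta_cycles - 1) + ["PA"]


ta = min_ta_cycles(650)          # RTD difference larger than 1T
commands = read_command_sequence(ta)
```

For an RTD difference of 650 ps the sketch selects a 2-cycle TA and places a single NOP between RD and PA; for a difference below 400 ps it selects a 1-cycle TA with no NOP.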

FIG. 14 shows a schematic comparison of the first gate and the second gate before and after training according to an embodiment of the present disclosure. The block diagram 1400 shown in FIG. 14 corresponds to the block diagram 1000 shown in FIG. 10. The difference between FIG. 14 and FIG. 10 is that the block diagram 1400 includes a comparison of the first gate 1015 and the second gate 1016 before and after training.

In this embodiment, the slave device (i.e., the slaveN device) and the other slave device (i.e., the slaveK device) train the first gate 1015 and the second gate 1016 so that one portion of the data (i.e., DN[31:16]) and the other portion of the data (i.e., DN[15:0]) are positioned at the optimal data sampling point. The portion DN[31:16] and the other portion DN[15:0] are referred to, for example, as a data cluster. Specifically, when the semiconductor device is started/powered on, the master device selects the slave devices one by one for training. For example, the master device selects the slaveN device, which then manages the training sequence described below. The slaveN device sets the first gate 1015 and the second gate 1016 of the Glink-3D slaveN to zero; in this state they are represented by the first local clock RDQS_F Initial and the second local clock RDQS_R Initial. The slaveN device then sends BIST data (i.e., DN[31:16] and DN[15:0]) to the master device. The master device receives the BIST data at the corresponding master bond, where it is represented, for example, by RX_D. The master device reports pass/fail to the slaveN device separately for the data DN[31:16] and the data DN[15:0]. The slaveN device increments the phase of the first local clock RDQS_F Initial and the phase of the second local clock RDQS_R Initial. The phase incrementing continues until the slaveN device has received both the first pass and the last pass reported by the master device. When the master device reports the last pass, the slaveN device stops sending BIST data to the master device. The last pass is identified, for example, when the master device reports a failure after having reported passes. The slaveN device then sets the phase of the first local clock and the phase of the second local clock at the middle point, for example by dividing the total number of passes by 2, and sends ready data to the master device. The first pass is represented, for example, by RDQS_F Initial for the first local clock and RDQS_R Initial for the second local clock. The middle point is represented, for example, by RDQS_F Trained for the first local clock and RDQS_R Trained for the second local clock. The middle point is the optimal data sampling point.

That is, the optimal data sampling point is obtained by separately incrementing the phase of the first local clock of the first gate 1015 and the phase of the second local clock of the second gate 1016 until the optimal sampling point is found.
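The phase sweep described above can be sketched as a search for the first and last passing phase followed by a midpoint calculation. The sketch below is a simplified model, not the disclosed hardware: `passes_at` is a hypothetical callback standing in for "send BIST data at this phase and read back the master's pass/fail report", phases are modelled as integer steps, and the sweep would be run once for RDQS_F and once for RDQS_R.

```python
def train_local_clock_phase(passes_at, max_phase):
    """Sweep the phase from 0 to max_phase and return the middle of the
    passing window (first pass plus half the number of passes)."""
    first_pass = None
    last_pass = None
    for phase in range(max_phase + 1):
        if passes_at(phase):
            if first_pass is None:
                first_pass = phase
            last_pass = phase
        elif first_pass is not None:
            break  # a failure after passes: the previous phase was the last pass
    if first_pass is None:
        raise ValueError("no passing phase found")
    total_passes = last_pass - first_pass + 1
    return first_pass + total_passes // 2


# Example: the master reports a pass for phases 3 through 9 inclusive.
trained_phase = train_local_clock_phase(lambda p: 3 <= p <= 9, max_phase=15)
```

In this example the passing window spans seven steps, so the trained phase lands on step 6, the centre of the window.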

In another embodiment, the slave device (i.e., the slaveN device) uses the first clock of the first gate and the second clock of the second gate of the master interface (Glink-3D master) to update the first local clock of the first gate and the second local clock of the second gate of the slave interface (Glink-3D slaveN), in order to compensate for voltage-to-temperature (V-T) changes.

For example, the semiconductor device is at a normal temperature during normal processing and at a high temperature during high-temperature processing. Data sent from the slave device (i.e., the slaveN device) to the master device through the slave interface (i.e., Glink-3D slaveN) at high temperature has, for example, a longer duration/period than at normal temperature. The slaveN device updates the phase of the first local clock (i.e., RDQS_F Trained) and the phase of the second local clock (i.e., RDQS_R Trained) according to the period of the data at high temperature and the period of the data at normal temperature. The update is carried out by comparing the middle point of the data at normal temperature with the middle point of the data at high temperature.

That is, V-T changes can be compensated by updating the phase of the first local clock and the phase of the second local clock of the slave interface at different temperatures according to the first clock and the second clock of the master interface. As a result, the master interface samples the data received from the slave interface at the optimal data sampling point.

FIG. 15 shows a schematic flowchart of DLL training according to an embodiment of the present disclosure. The flowchart 1500 is performed before the DLL training begins. The DLL training aims to obtain the maximum step of the DLL, which refers to the DLL's ability to delay the clock in the slave interface (i.e., Glink-3D slaveN). The DLL refers to the first gate 1015 and the second gate 1016. The DLL training is performed from two perspectives: an Inter-Integrated Circuit (I2C) sequence and a slave sequence. The I2C sequence is the flow carried out over the I2C protocol, and the slave sequence is the flow carried out in the slave device.

In the I2C sequence, the DLL training is performed in steps S1505 to S1520. In step S1505, the DLL value is cleared/reset. In step S1510, the register of each slave device is set to enable DLL training, for example by changing the DLL training flag to 1. The register used to enable DLL training is referred to as the accumulator (ACC). In step S1515, the slave flag indicating that DLL training is complete is checked. Step S1515 is repeated until that slave flag is set, for example by the corresponding flag changing to 1, and it is performed for all slave devices (i.e., the slaveN device and the slaveK device). In step S1520, once the corresponding flags of all slave devices are set, the DLL training flag is reset, for example by changing it to 0, which disables the DLL training register of each slave device. That is, by carrying out steps S1505 to S1520, the maximum step/delay of the DLL of each slave device is obtained.

In the slave sequence, the DLL training is performed in steps S1555 to S1575. In step S1555, the slave device (i.e., the slaveN device) checks whether DLL training is enabled. In step S1560, if DLL training is enabled, the DLL value is increased, for example by adding 1. In step S1565, the lag flag and the lead flag are checked. When the DLL value has reached its maximum, the lag flag reads 0 and the lead flag reads 1. If the DLL value has not reached its maximum, step S1560 is repeated. If it has, the flow proceeds to step S1570, where the DLL value is decreased by 1. The DLL value is decreased by 1 because the maximum value is the last DLL value for which the condition in step S1565 was still "no". Finally, in step S1575, the slave device sets the flag indicating that DLL training is complete. That flag indicates that the DLL training of the slave device (i.e., the slaveN device) is finished, and the maximum DLL value has thereby been obtained. Steps S1555 to S1575 are performed by each slave device.
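The slave-side loop of steps S1560 to S1570 can be sketched as follows. This is a behavioural illustration only: `limit_reached` is a hypothetical callback that stands in for reading the lag and lead flags (returning True when lag is 0 and lead is 1), and the `hard_limit` guard is an added safety assumption so the loop always terminates.

```python
def find_max_dll_step(limit_reached, hard_limit=1024):
    """Increase the DLL value until the flags indicate the limit was
    crossed, then step back by one to get the maximum usable value."""
    dll_value = 0  # cleared/reset beforehand (step S1505)
    while True:
        dll_value += 1                    # step S1560
        if limit_reached(dll_value):      # step S1565: lag == 0 and lead == 1
            break
        if dll_value >= hard_limit:
            raise RuntimeError("DLL limit not detected")
    dll_value -= 1                        # step S1570
    return dll_value                      # step S1575 would set the done flag


# Example: the flags flip once the DLL value exceeds 31.
max_step = find_max_dll_step(lambda value: value > 31)
```

The final decrement mirrors the explanation in the text: the value at which the flags flip has already overshot, so the previous value is kept as the maximum.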

FIG. 16 shows a schematic flowchart of write data cluster training according to an embodiment of the present disclosure. After the maximum DLL value is obtained according to the flowchart 1500, the DLL training continues with the write data cluster training shown in the flowchart 1600. Because the write data cluster training serves to write data from the processor 105 to the slave devices (i.e., the slaveN device and the slaveK device) at the optimal clock phase, it is a master-to-slave training. The write data cluster training is performed from three perspectives: an I2C sequence, a master sequence, and a slave sequence. Its purpose is to obtain the middle value of the DLL for writing data. Writing data according to the middle value of the DLL ensures that the data is written correctly, so bit errors can be minimized. The middle value of the DLL represents the optimal clock phase.

In the I2C sequence, the write data cluster training is performed in steps S1605 to S1625. In step S1605, the corresponding register of the processor 105 is set to enable the write data cluster training. In step S1610, the register of each slave device is set to enable the write data cluster training. In step S1615, the register of each slave device that indicates completion of the write data cluster training is checked. Once that register is set for each slave device, step S1620 is performed, in which the register of each slave device is disabled. In step S1625, the register of the processor 105 is disabled. That is, once the completion register of each slave device is set, the DLL value of each slave device has been optimized for writing data, so bit errors can be minimized.

In the master sequence, write data cluster training is performed in steps S1630 to S1645. In step S1630, the processor 105 checks whether write data cluster training is enabled. If it is enabled, the BIST generator is enabled in step S1635. In step S1640, the processor 105 checks whether write data cluster training is disabled. Write data cluster training is disabled once it has been completed for all slave devices. In step S1645, the BIST generator is disabled because write data cluster training has been completed for all slave devices. That is, when write data cluster training is found to be disabled, the training has been completed for all slave devices, and the optimal clock phase for writing data has been obtained.

In the slave sequence, write data cluster training is performed in steps S1650 to S1695. In step S1650, the register indicating that write data cluster training is enabled is checked. In response to write data cluster training being enabled, the DLL value is set to 0 in step S1655. In step S1660, the BIST checker is enabled so that the BIST generated by the processor 105 is checked. In step S1665, the BIST is checked, for example, X times, where X is an integer equal to or greater than 1; X may also represent the duration of the BIST check. Once the BIST has been checked X times, the BIST checker is disabled in step S1670. In step S1675, the DLL window representing pass values is updated. A pass value indicates that the slave device read the BIST correctly. In step S1680, it is checked whether the DLL value has reached the maximum value, which was obtained according to FIG. 15. If the DLL value is not the maximum value, the DLL value is increased in step S1685, and steps S1660 to S1685 are repeated until the DLL value reaches the maximum cycle/value. In step S1690, when the DLL value has reached the maximum cycle, the DLL value is set to the middle value of the pass window. The middle value of the pass window indicates that the BIST is written to the slave device at the optimal clock phase. In step S1695, the register indicating completion of write data cluster training is set, for example by changing the corresponding flag to 1. In the master sequence, the processor 105 checks this flag to determine that write data cluster training has been completed for all slave devices.
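The slave-side sweep of steps S1655 to S1690 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the disclosure: `bist_passes` is a hypothetical callback standing in for one BIST check at a given DLL code, `checks_per_step` plays the role of X, and treating the widest contiguous run of passing codes as the pass window is an assumption, because the description does not say how a fragmented window is handled.

```python
from typing import Callable, Optional, Tuple


def sweep_pass_window(
    bist_passes: Callable[[int], bool],
    dll_max: int,
    checks_per_step: int = 4,
) -> Optional[Tuple[int, int, int]]:
    """Sweep DLL codes 0..dll_max and return (first_pass, last_pass, middle).

    bist_passes(dll) is a hypothetical stand-in for one BIST check at a DLL
    code. A code counts as passing only if all checks_per_step checks pass.
    Returns None when no code passes.
    """
    best: Optional[Tuple[int, int]] = None  # widest passing run seen so far
    run_start: Optional[int] = None         # start of the current passing run

    for dll in range(dll_max + 1):          # S1655: start at 0; S1685: increase
        # S1660 to S1670: enable the checker, check X times, disable it
        ok = all(bist_passes(dll) for _ in range(checks_per_step))
        if ok and run_start is None:
            run_start = dll
        if run_start is not None and (not ok or dll == dll_max):
            run_end = dll if ok else dll - 1
            # S1675: update the pass window (assumption: keep the widest run)
            if best is None or run_end - run_start > best[1] - best[0]:
                best = (run_start, run_end)
            run_start = None

    if best is None:
        return None
    first, last = best
    return first, last, (first + last) // 2  # S1690: middle of the pass window
```

Choosing the middle of the window leaves roughly equal margin on both sides, which is why the description associates it with the optimal clock phase.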

FIG. 17 shows a schematic flowchart of read data cluster training according to an embodiment of the present disclosure. Flowchart 1700 may be performed after flowchart 1600. Read data cluster training is performed to obtain the optimal clock phase for reading data. The training is described from three perspectives: the I2C sequence, the master sequence, and the slave sequence.

In the I2C sequence, read data cluster training is performed in steps S1702 to S1716. In step S1702, the maximum DLL value is read from the corresponding register of the slave device. In step S1704, the DLL value read from the slave device is written to a register of the processor 105. In step S1706, the flag indicating that read data cluster training is enabled is set in the register of the slave device. In step S1708, the flag indicating that read data cluster training is enabled is set in the register of the processor 105. In step S1710, the flag indicating completion of read data cluster training is checked from the slave device. In step S1712, if the completion flag is set, the read data cluster training flag of the processor 105 is disabled. In step S1714, the read data cluster training flag of the slave device is disabled. In step S1716, it is checked whether every slave device has performed read data cluster training. If one or more slave devices have not yet performed it, steps S1706 to S1716 are repeated until all slave devices have done so. That is, once the corresponding flag in the register of each slave device is found to be set, each slave device has performed read data cluster training.
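The I2C-side handshake of steps S1702 to S1716 can be sketched as below. The register names (`dll_max`, `rd_train_en`, `rd_train_done`), the dictionary register model, and the `run_training` callback that stands in for the hardware are all hypothetical. Reading the maximum DLL value from the first slave is also an assumption, since the description does not state which slave supplies it.

```python
from typing import Callable, Dict, List

Regs = Dict[str, int]  # hypothetical register file: name -> value


def i2c_read_training(
    master: Regs,
    slaves: List[Regs],
    run_training: Callable[[Regs, Regs], None],
    max_polls: int = 1000,
) -> None:
    """Walk the I2C sequence once per slave device."""
    # S1702 / S1704: copy the maximum DLL value from a slave to the master
    # (assumption: the first slave is used as the source).
    master["dll_max"] = slaves[0]["dll_max"]

    for slave in slaves:                    # S1716: repeat until all are done
        slave["rd_train_en"] = 1            # S1706
        master["rd_train_en"] = 1           # S1708
        for _ in range(max_polls):          # S1710: poll the completion flag
            run_training(master, slave)     # stand-in for the hardware
            if slave.get("rd_train_done"):
                break
        else:
            raise TimeoutError("read data cluster training did not complete")
        master["rd_train_en"] = 0           # S1712
        slave["rd_train_en"] = 0            # S1714
```

Training one slave at a time keeps only one BIST generator driving the shared link during each pass of the loop.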

In the master sequence, read data cluster training is performed in steps S1720 to S1748. In step S1720, the processor 105 checks the flag of the register corresponding to read data cluster training. In step S1722, if that flag is enabled, the DLL value is set to 0. In step S1724, the processor 105 sets the command for updating the DLL_r value. In step S1726, the processor 105 sets the command for updating the DLL_f value. In step S1728, the processor 105 resets the read FIFO. The read FIFO is reset to prevent the processor 105 from reading an erroneous sequence of read data from the slave device: if the read FIFO is not cleared, the data sequence it holds may not represent the correct data sequence. In step S1730, the processor 105 sets the command for enabling tx_en. In step S1732, the processor 105 enables the BIST checker, which prepares the processor to read the BIST data generated by the slave device. In step S1734, the BIST data generated by the slave device is read X times, where X is as described above. In step S1736, once the BIST data has been read X times, the processor 105 disables the BIST checker. In step S1738, the processor 105 sets the command for disabling tx_en. In step S1740, the processor 105 updates the pass window, which is as described above. In step S1742, it is checked whether the DLL value has reached the maximum cycle/value. In step S1744, if the DLL value has not reached the maximum, the DLL value is increased, and steps S1724 to S1744 are repeated until the DLL value reaches the maximum value. In step S1746, when the DLL value has reached the maximum value, the DLL is set to the middle value of the pass window of the slave device. In step S1748, the flag indicating completion of read data cluster training is set.

In the slave sequence, read data cluster training is performed in steps S1750 to S1766. In step S1750, the flag corresponding to read data cluster training enable is checked. In step S1752, if that flag is set, the BIST generator is enabled, so that the slave device generates BIST data and sends it to the processor 105. In step S1754, the slave device checks whether the processor 105 has set tx_en by command. In step S1756, when the processor 105 has set tx_en by command, the slave device enables tx_en. In step S1758, the slave device checks whether the processor 105 has cleared tx_en by command. In step S1760, when the processor 105 has cleared tx_en by command, the slave device disables tx_en. In step S1762, the slave device checks whether DLL_r or DLL_f has been updated. If DLL_r or DLL_f has been updated, steps S1754 to S1762 are repeated. In step S1764, if neither DLL_r nor DLL_f has been updated, the flag indicating that read data cluster training is disabled is checked. If read data cluster training has not been disabled, steps S1762 to S1764 are repeated. In step S1766, if read data cluster training has been disabled, the BIST generator is disabled. That is, the slave device updates DLL_r and/or DLL_f by performing read data cluster training.
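The slave-side behaviour of steps S1750 to S1766 can be read as a small event-driven state machine. The sketch below is illustrative only: the class name, the event strings (`"tx_en_set"`, `"tx_en_clear"`, `"dll_update"`, `"idle"`), and the update counter are hypothetical, and the flowchart's polling loops are collapsed into one `step` call per observed event.

```python
class SlaveReadTrainer:
    """Hypothetical model of the slave sequence in read data cluster training."""

    def __init__(self) -> None:
        self.bist_gen_on = False  # BIST generator state
        self.tx_en = False        # transmit enable driven by master commands
        self.dll_updates = 0      # number of DLL_r / DLL_f updates observed

    def step(self, train_enabled: bool, event: str = "idle") -> None:
        if not self.bist_gen_on:
            if train_enabled:                # S1750, S1752
                self.bist_gen_on = True      # this call only starts the generator
            return
        if event == "tx_en_set":             # S1754, S1756
            self.tx_en = True
        elif event == "tx_en_clear":         # S1758, S1760
            self.tx_en = False
        elif event == "dll_update":          # S1762: DLL_r or DLL_f was updated
            self.dll_updates += 1
        elif not train_enabled:              # S1764, S1766
            self.bist_gen_on = False
```

In this model the BIST generator stays on across all DLL codes and is switched off only after the training flag is cleared, matching the order of steps S1762 to S1766.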

In addition, examples of commands used for read data cluster training are provided as follows. The DLL value used in read data cluster training is 9 bits wide. These 9 bits are formed by combining the first bit of the read command (S_CMD[0]), the 4 bits of the slave-to-master ID (S_DID[3:0]), and the 4 bits of the master-to-slave ID (M_DID[3:0]). The command itself is formed by combining the second bit of the read command (S_CMD[1]) with the 2 bits of the write command (M_CMD[1:0]). For example, bit values {0,0,0} produce the IDLE command, {0,0,1} the update DLL_r value command, {0,1,0} the update DLL_f value command, {0,1,1} the update DLL value command, {1,0,1} the tx_en enable command, and {1,1,0} the tx_en disable command.
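The field packing described above can be illustrated with a short Python sketch. The bit ordering is an assumption: the 3-bit command is read as {S_CMD[1], M_CMD[1], M_CMD[0]} and the 9-bit DLL value as {S_CMD[0], S_DID[3:0], M_DID[3:0]}, most significant bit first. The names `TrainCmd`, `pack`, and `unpack` are hypothetical.

```python
from enum import IntEnum
from typing import Dict, Tuple


class TrainCmd(IntEnum):
    """3-bit code, read as {S_CMD[1], M_CMD[1], M_CMD[0]} (ordering assumed)."""
    IDLE = 0b000
    UPDATE_DLL_R = 0b001
    UPDATE_DLL_F = 0b010
    UPDATE_DLL = 0b011
    TX_EN_ENABLE = 0b101
    TX_EN_DISABLE = 0b110


def pack(cmd: TrainCmd, dll: int) -> Dict[str, int]:
    """Spread a command and a 9-bit DLL value over the four link fields."""
    if not 0 <= dll < (1 << 9):
        raise ValueError("DLL value must fit in 9 bits")
    s_cmd1 = (cmd >> 2) & 0x1          # command bit carried in S_CMD[1]
    s_cmd0 = (dll >> 8) & 0x1          # DLL bit 8 carried in S_CMD[0]
    return {
        "S_CMD": (s_cmd1 << 1) | s_cmd0,
        "S_DID": (dll >> 4) & 0xF,     # DLL bits 7..4
        "M_DID": dll & 0xF,            # DLL bits 3..0
        "M_CMD": cmd & 0x3,            # command bits carried in M_CMD[1:0]
    }


def unpack(fields: Dict[str, int]) -> Tuple[TrainCmd, int]:
    """Recover the command and the DLL value from the four link fields."""
    code = (((fields["S_CMD"] >> 1) & 0x1) << 2) | (fields["M_CMD"] & 0x3)
    dll = (
        ((fields["S_CMD"] & 0x1) << 8)
        | ((fields["S_DID"] & 0xF) << 4)
        | (fields["M_DID"] & 0xF)
    )
    return TrainCmd(code), dll
```

Reusing the ID fields to carry the DLL value means no extra signals are needed during training, at the cost that these fields cannot carry IDs while a training command is in flight.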

FIG. 18 illustrates an interface method according to an embodiment of the present disclosure. In step S1805, the interface method of the semiconductor device starts with the master interface sending a command to the slave device. In step S1810, each of the slave interface and the other slave interfaces receives the command from the master device and/or sends data/other data to the master device. Step S1810 includes step S1815, in which the data/other data is sent to the master interface using a double data rate (DDR) configuration. Step S1815 includes steps S1820 to S1840. In step 1820, a first flip-flop (FF) unit generates a portion of data from the data/other data. In step S1825, a second FF unit generates another portion of data from the data/other data. In step S1830, a multiplexer sends the portion of data and the other portion of data to the master device. In step 1835, a first gating unit generates a first local clock from the clock generated by the clock generator. In step S1840, a second gating unit generates a second local clock from the clock generated by the clock generator. Then, in step S1845, the master interface receives the data from the slave device. Step S1845 includes step S1850, in which the master interface uses the clock generated by the clock generator to re-clock the portion of data and the other portion of data from the DDR units of the slave interface and the other slave interfaces.
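The DDR path of steps S1820 to S1850 can be modelled behaviourally as follows. This sketch assumes a 32-bit word split into two 16-bit portions (suggested by the signal names tx_data[31:16], tx_data[15:0], and rx_data[31:0] in the reference list); which portion travels on which clock phase is an assumption, and `ddr_send` and `ddr_receive` are hypothetical names.

```python
from typing import Iterable, List, Optional, Tuple

Beat = Tuple[int, int]  # (clock level, 16-bit value on the bonds)


def ddr_send(words: Iterable[int]) -> List[Beat]:
    """Slave side: two FF units feed a multiplexer switched by the clock."""
    beats: List[Beat] = []
    for word in words:
        first_ff = word & 0xFFFF            # first FF unit: one portion
        second_ff = (word >> 16) & 0xFFFF   # second FF unit: the other portion
        beats.append((1, first_ff))         # mux selects the first FF (clock high)
        beats.append((0, second_ff))        # mux selects the second FF (clock low)
    return beats


def ddr_receive(beats: Iterable[Beat]) -> List[int]:
    """Master side: capture both portions and re-clock them into full words."""
    words: List[int] = []
    low_half: Optional[int] = None
    for level, value in beats:
        if level == 1:
            low_half = value                # captured with the first local clock
        else:
            if low_half is None:
                raise ValueError("clock-low beat arrived before a clock-high beat")
            words.append((value << 16) | low_half)  # captured with the second local clock
            low_half = None
    return words
```

Sending one portion per clock phase doubles the data carried per bond per clock cycle, which is the point of the DDR configuration.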

In summary, the interface device and interface method for a 3D semiconductor device provide reliable data communication between the master device and the slave devices. Reliable data communication is achieved by giving each slave device a specific time slot. The master device also provides a data waiting time between slave devices, which avoids bus contention between the slave devices. Furthermore, so that data is sampled at the optimal sampling phase, the slave device trains its local clock when the semiconductor device starts up or is powered on. With a trained local clock, data can be sampled at the optimal data sampling point, which lowers the error rate. In addition, the slave device updates the local clock to compensate for T-V changes in the semiconductor device.

In another embodiment, an interface device for interfacing between a master device and a slave device is provided, wherein the master device generates a command and the slave device generates data according to the command. The interface device includes a master interface and a slave interface. The master interface is coupled to the master device and is configured to send the command to the slave device and/or receive the data from the slave device. The slave interface is coupled to the slave device and is configured to receive the command from the master device and/or send the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds and/or TSVs. The clock used to drive the slave interface is trained by changing the clock phase of the clock to align with the data cluster of the command and/or the data cluster of the data.

In another embodiment, the interface device further includes other slave interfaces. The other slave interfaces are coupled to other slave devices in a one-to-one relationship. The other slave interfaces are configured to receive the command from the master device and/or send other data generated by the other slave devices to the master device. The other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the TSVs. Each clock used to drive each of the other slave interfaces is trained by changing the clock phase of that clock to align with the data cluster of the command and/or the data cluster of the data corresponding to each of the other slave interfaces.

In another embodiment, each of the slave interface and the other slave interfaces is further configured to send the data/the other data to the master interface using a double data rate (DDR) configuration. The DDR configuration is produced by a DDR unit. The DDR unit includes a first flip-flop (FF) unit, a second FF unit, and a multiplexer. The first FF unit is configured to generate a portion of data from the data/the other data. The second FF unit is configured to generate another portion of data from the data/the other data. The multiplexer is coupled to the first FF unit and the second FF unit and is configured to send the portion of data and the other portion of data to the master device.

In another embodiment, each of the slave interface and the other slave interfaces further includes a first gating unit and a second gating unit. The first gating unit is configured to generate a first local clock from the clock generated by the clock generator. The second gating unit is configured to generate a second local clock from the clock generated by the clock generator. The first local clock generated by the first gating unit is used by the master interface to read the portion of data generated by the first FF unit. The second local clock generated by the second gating unit is used by the master interface to read the other portion of data generated by the second FF unit.

In another embodiment, the master device further generates turnaround (TA) cycles. The TA cycles are used to prevent bus contention between the response of the slave device and the responses of the other slave devices. In another embodiment, the TA cycles are used to compensate for a round-trip delay (RTD) difference between the slave device and the other slave devices. In another embodiment, the slave device and the other slave devices generate zero data before the data and after the data to prevent contention between the slave device and the other slave devices caused by the different RTDs. In another embodiment, the master interface further includes a first-in first-out (FIFO) unit configured to use the clock generated by the clock generator to re-clock the portion of data and the other portion of data from the DDR units of the slave interface and the other slave interfaces. In another embodiment, the slave device and the other slave devices train the first gating unit and the second gating unit to position the portion of data and the other portion of data at the optimal data sampling point.
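How TA cycles can absorb an RTD difference may be easier to see with a toy timing model. The model below is an assumption made for illustration and is not taken from the disclosure: time is counted in clock cycles, the master requests the second response `burst_len + ta_cycles` cycles after the first, and each response occupies the shared bus for `burst_len` cycles starting one RTD after its request. All function and parameter names are hypothetical.

```python
import math


def responses_overlap(
    rtd_first: float, rtd_second: float, burst_len: int, ta_cycles: int
) -> bool:
    """Return True if the second response reaches the bus before the first ends."""
    first_end = rtd_first + burst_len                    # first burst leaves the bus
    second_start = burst_len + ta_cycles + rtd_second    # second burst arrives
    return second_start < first_end


def min_ta_cycles(rtd_first: float, rtd_second: float) -> int:
    """Smallest whole number of TA cycles that avoids overlap in this model."""
    return max(0, math.ceil(rtd_first - rtd_second))
```

In this model contention arises only when the earlier responder has the longer round trip, so the TA count needs to cover just the positive part of the RTD difference.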

In another embodiment, an interface method for interfacing between a master device and a slave device is provided, wherein the master device generates a command and the slave device generates data according to the command. The interface method includes: sending, by a master interface, the command to the slave device and/or receiving the data from the slave device; and receiving, by a slave interface, the command from the master device and/or sending the data to the master device. The master interface and the slave interface are driven by a clock generated by a clock generator. The master interface and the slave interface are electrically connected by one or more bonds and/or TSVs. The clock used to drive the slave interface is trained by changing the clock phase of the clock to align with the data cluster of the command and/or the data cluster of the data.

In another embodiment, the interface method further includes: receiving, by other slave interfaces, the command from the master device and/or sending other data generated by other slave devices to the master device. The other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the TSVs. Each clock used to drive each of the other slave interfaces is trained by changing the clock phase of that clock to align with the data cluster of the command and/or the data cluster of the data corresponding to each of the other slave interfaces.

In another embodiment, receiving the command from the master device and/or sending the data/the other data to the master device by each of the slave interface and the other slave interfaces further includes: sending the data/the other data to the master interface using a double data rate (DDR) configuration. Sending the data/the other data to the master interface using the DDR configuration further includes: generating, by a first flip-flop (FF) unit, a portion of data from the data/the other data; generating, by a second FF unit, another portion of data from the data/the other data; and sending, by a multiplexer, the portion of data and the other portion of data to the master device.

In another embodiment, sending the data/the other data to the master interface using the DDR configuration further includes: generating, by a first gating unit, a first local clock from the clock generated by the clock generator; and generating, by a second gating unit, a second local clock from the clock generated by the clock generator. The first local clock generated by the first gating unit is used by the master interface to read the portion of data generated by the first FF unit. The second local clock generated by the second gating unit is used by the master interface to read the other portion of data generated by the second FF unit.

In another embodiment, the master device further generates turnaround (TA) cycles. The TA cycles are used to prevent bus contention between the response of the slave device and the responses of the other slave devices. In another embodiment, the TA cycles are used to compensate for a round-trip delay (RTD) difference between the slave device and the other slave devices. In another embodiment, the slave device and the other slave devices generate zero data before the data and after the data to prevent contention between the slave device and the other slave devices caused by the different RTDs. In another embodiment, sending, by the master interface, the command to the slave device and/or receiving the data from the slave device further includes: using the clock generated by the clock generator to re-clock the portion of data and the other portion of data from the DDR units of the slave interface and the other slave interfaces. In another embodiment, the slave device and the other slave devices train the first gating unit and the second gating unit to position the portion of data and the other portion of data at the optimal data sampling point.

The features of several embodiments have been outlined above so that those skilled in the art may better understand the detailed description that follows. Those skilled in the art should appreciate that they may readily use the present disclosure as a basis for designing or modifying other processes and structures to carry out the same purposes and/or achieve the same advantages as the embodiments introduced herein. Those skilled in the art should also recognize that such equivalent constructions do not depart from the spirit and scope of the present disclosure, and that they may make various changes, substitutions, and alterations herein without departing from the spirit and scope of the present disclosure.

100, 200, 300, 400, 500, 600, 700: semiconductor device
101, 111: interface device
102, 501-2: master interface
103, 103-1 to 103-N, Glink-3D slaves: slave interface
104, 406, 412, 502-3, 502-4, 503-3: through-silicon via (TSV)
105, 106: master device
107, 115: clock generator
108-1 to 108-M: central processing unit
110, 110-1 to 110-N: slave device
120, 501-1: master chip
130: slave chip
402: chip
404: interface
408: chip
410: interface
414, 506: connector
502-1: first slave chip
502-2: first slave interface
503-1: second slave chip
503-2: second slave interface
504: TSV connector
800, 900, 1000: schematic diagram
802, 803-1, 803-2, 803-3, 810, 812, 814, 904, 1031, 1032, 1041, 1042, 1051, 1061: flip-flop (FF)
804, 816: double data rate (DDR) multiplexer (MUX)
806-1, 806-2, 806-3: bond
808-1, 808-2, 808-3: bond
818, 820, 1008, 1017, 1018: buffer
901: SRAM
902: logic unit
1002: first FF
1004: second FF
1006: multiplexer
1011, 1012, 1013, 1014, 1021, 1022, 1023, 1024: bond
1015: first strobe
1016: second strobe
1019: clock path
1020: clock tree
1030: inverter
1300: timing diagram
1400: circuit diagram
1500, 1600, 1700: flowchart
address wr_data, NOP, rx_data command, s_cmd, tx_data command: command
clk, clk_in, clk_out, slaveK clk_B, slaveN clk_B: clock
d_did: slave ID
dataK, dataN0, dataN1, DK[15:0], DK[31:16], DN[15:0], DN[31:16], Master RX_D, rx_data, rx_data[31:0], tx_data, tx_data[31:16], tx_data[15:0], tx_dataK, tx_dataN, tx_dataN0, tx_dataN1: data
Glink-3D: device interface
Glink-3D master: master interface
Glink-3D slaveK, Glink-3D slaveN: slave interface
PA: preamble command
RD: read command
RDQS_F: local clock
RDQS_F Initial: local clock
RDQS_R: local clock
RDQS_R Initial: local clock
RDQS_F Trained, RDQS_R Trained: phase
S1505, S1510, S1515, S1520, S1555, S1560, S1565, S1570, S1575, S1605, S1610, S1615, S1620, S1625, S1630, S1635, S1640, S1645, S1650, S1655, S1660, S1665, S1670, S1675, S1680, S1685, S1690, S1695, S1702, S1704, S1706, S1708, S1710, S1712, S1714, S1716, S1720, S1722, S1724, S1726, S1728, S1730, S1732, S1734, S1736, S1738, S1740, S1742, S1744, S1746, S1748, S1750, S1752, S1754, S1756, S1758, S1760, S1762, S1764, S1766, S1805, S1810, S1815, 1820, S1825, S1830, 1835, S1840, S1845, S1850: step
slave_ID: slave chip address
slaveN, slaveK: chip
TX_D: slave bond
tx_data [31:0]: data
tx_en: enable signal

FIG. 1 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure. FIG. 2 shows a schematic block diagram of a semiconductor device including a master device and a slave device according to an embodiment of the present disclosure. FIG. 3 shows a schematic block diagram of a semiconductor device including a master device and a plurality of slave devices according to an embodiment of the present disclosure. FIG. 4 shows a schematic design diagram of a semiconductor device including a master chip and a slave chip according to an embodiment of the present disclosure. FIG. 5 shows a schematic design diagram of a semiconductor device including a master chip and a plurality of slave chips according to an embodiment of the present disclosure. FIG. 6 shows a schematic 3D diagram of a semiconductor device including a master chip and a plurality of slave chips according to an embodiment of the present disclosure. FIG. 7 shows a schematic 3D diagram of a semiconductor device including an example of an interface device structure according to an embodiment of the present disclosure. FIG. 8 shows a schematic diagram of an interface device including a master interface and a plurality of slave interfaces according to an embodiment of the present disclosure. FIG. 9 shows a schematic diagram of an interface device including a master chip and a slave chip during a read operation according to an embodiment of the present disclosure. FIG. 10 shows a schematic diagram of a slave-to-master interface including a clock tree according to an embodiment of the present disclosure. FIG. 11 shows a schematic timing diagram of data between two slave chips with the same local clock speed according to an embodiment of the present disclosure. FIG. 12 shows a schematic timing diagram of data between two slave chips with different clock speeds according to an embodiment of the present disclosure. FIG. 13 shows a schematic timing diagram of data between two slave chips with 2 turnaround (TA) cycles according to an embodiment of the present disclosure. FIG. 14 shows a schematic comparison of the first gating unit and the second gating unit before and after training according to an embodiment of the present disclosure. FIG. 15 shows a schematic flowchart of delay lock loop (DLL) training according to an embodiment of the present disclosure. FIG. 16 shows a schematic flowchart of write data cluster training according to an embodiment of the present disclosure. FIG. 17 shows a schematic flowchart of read data cluster training according to an embodiment of the present disclosure. FIG. 18 illustrates an interface method according to an embodiment of the present disclosure.

102: Master interface

103-1, 103-N:

105: Master device

108-1, 108-M: Central processing unit (CPU)

110-1, 110-N: Slave device

111: Interface device

115: Clock generator

300: Semiconductor device

Claims

An interface device for interfacing between a master device and a slave device, wherein the master device generates a command and the slave device generates data according to the command, the interface device comprising: a master interface, coupled to the master device and configured to send the command to the slave device and/or receive the data from the slave device; and a slave interface, coupled to the slave device and configured to receive the command from the master device and/or send the data to the master device, wherein the master interface and the slave interface are driven by a clock generated by a clock generator, wherein the master interface and the slave interface are electrically connected by one or more bonds and/or TSVs, and wherein the clock used to drive the slave interface is trained by changing the clock phase of the clock to align with the data clusters of the command and/or the data clusters of the data.

The interface device of claim 1, further comprising other slave interfaces coupled in a one-to-one relationship to other slave devices and configured to receive the command from the master device and/or send other data generated by the other slave devices to the master device, wherein the other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the TSVs, and wherein each clock used to drive each of the other slave interfaces is trained by changing the clock phase of that clock to align with the data clusters of the command and/or the data clusters of the other data corresponding to each of the other slave interfaces.

The interface device of claim 2, wherein the slave interface and the other slave interfaces are each further configured to send the data/the other data to the master interface using a double data rate (DDR) configuration.

The interface device of claim 3, wherein the double data rate configuration is generated by a double data rate unit comprising: a first flip-flop (FF) unit configured to generate a portion of the data according to the data/the other data; a second flip-flop unit configured to generate another portion of the data according to the data/the other data; and a multiplexer, coupled to the first flip-flop unit and the second flip-flop unit and configured to send the portion of the data and the other portion of the data to the master device.
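As an illustration only, and not as part of the claims, the double data rate unit recited above can be modeled in software. The sketch below assumes a 32-bit word split into two 16-bit halves (the reference symbols list tx_data[15:0] and tx_data[31:16]), with the lower half sent on the rising clock edge and the upper half on the falling edge. The function names `ddr_serialize` and `ddr_deserialize`, the word width, and the edge assignment are hypothetical choices made for this example and are not taken from the disclosure.

```python
from typing import List, Tuple

Beat = Tuple[str, int]  # (clock edge, half-word value)


def ddr_serialize(words: List[int], width: int = 32) -> List[Beat]:
    """Emit two half-width beats per word, one per clock edge (assumed order)."""
    half = width // 2
    mask = (1 << half) - 1
    beats: List[Beat] = []
    for word in words:
        low = word & mask              # value held by the first flip-flop unit
        high = (word >> half) & mask   # value held by the second flip-flop unit
        # The multiplexer selects one flip-flop output per clock edge.
        beats.append(("rise", low))
        beats.append(("fall", high))
    return beats


def ddr_deserialize(beats: List[Beat], width: int = 32) -> List[int]:
    """Rebuild full words from consecutive (rise, fall) beat pairs."""
    half = width // 2
    words: List[int] = []
    for i in range(0, len(beats) - 1, 2):
        (_, low), (_, high) = beats[i], beats[i + 1]
        words.append((high << half) | low)
    return words
```

In this model each word crosses the link in one clock period instead of two, which is the point of a double data rate configuration: the same number of connections carries twice the data per clock cycle.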

The interface device of claim 2, wherein each of the slave interface and the other slave interfaces further comprises: a first gating unit configured to generate a first local clock according to the clock generated by the clock generator; and a second gating unit configured to generate a second local clock according to the clock generated by the clock generator, wherein the first local clock generated by the first gating unit is used by the master interface to read the portion of the data generated by the first flip-flop unit, and the second local clock generated by the second gating unit is used by the master interface to read the other portion of the data generated by the second flip-flop unit.

The interface device of claim 2, wherein the master device further generates a turnaround (TA) loop, wherein the turnaround loop is used to prevent bus contention between responses from the slave device and responses from the other slave devices.

The interface device of claim 6, wherein the turnaround loop is used to compensate for differences in round trip delay (RTD) between the slave device and the other slave devices.

The interface device of claim 7, wherein the slave device and the other slave devices generate zero data before the data and after the data to prevent contention between the slave device and the other slave devices caused by different round-trip delays.
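For illustration only, the role of the turnaround loop in compensating round trip delay differences can be sketched with a simplified timing model. The model assumes that reads are issued back to back, that each response occupies the shared bus for a fixed number of clock cycles, and that a response appears on the bus one round trip delay after its read is issued. The names `min_turnaround_cycles`, `bus_windows`, and `has_contention`, and the model itself, are hypothetical and not taken from the disclosure.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # half-open [start, end) bus occupancy in clock cycles


def min_turnaround_cycles(rtd_prev: int, rtd_next: int) -> int:
    """Idle cycles needed between two reads so their responses do not overlap."""
    return max(0, rtd_prev - rtd_next)


def bus_windows(rtds: List[int], burst: int, ta: int) -> List[Window]:
    """Bus occupancy of each response when reads are spaced by burst + ta cycles."""
    windows: List[Window] = []
    issue = 0
    for rtd in rtds:
        start = issue + rtd          # response arrives one RTD after the read
        windows.append((start, start + burst))
        issue += burst + ta          # next read follows after the TA gap
    return windows


def has_contention(windows: List[Window]) -> bool:
    """True if any two responses occupy the bus during the same cycle."""
    ordered = sorted(windows)
    return any(prev_end > next_start
               for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]))
```

Under these assumptions, a slave with a round trip delay of 5 cycles followed by one with 3 cycles collides when no turnaround cycles are inserted, and two turnaround cycles are enough to separate the responses.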

The interface device of claim 3, wherein the master interface further includes a first-in, first-out (FIFO) unit configured to use the clock generated by the clock generator to reclock the portion of the data and the other portion of the data from the double data rate units of the slave interface and the other slave interfaces.

The interface device of claim 5, wherein the slave device and the other slave devices train the first gating unit and the second gating unit so that the first gating unit and the second gating unit are located at an optimal data sampling point of the portion of the data and the other portion of the data.
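The claims do not state how the optimal data sampling point is found. One common approach to this kind of training, shown here purely as an assumed example, is to sweep the clock phase over its range, record which phase settings sample the data correctly, and select the middle of the longest run of passing settings. The function name `train_phase`, the `sample_ok` callback, and the assumption that the passing window does not wrap around the end of the phase range are all hypothetical.

```python
from typing import Callable, Optional


def train_phase(sample_ok: Callable[[int], bool], num_phases: int) -> Optional[int]:
    """Return the center of the longest run of passing phase settings, or None."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for phase in range(num_phases):
        if sample_ok(phase):
            if run_len == 0:
                run_start = phase        # a new passing run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                  # a failing setting ends the run
    if best_len == 0:
        return None                      # no phase setting sampled correctly
    return best_start + (best_len - 1) // 2
```

Choosing the center of the passing window, rather than the first passing setting, leaves roughly equal margin on both sides against later drift in delay.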

An interface method for interfacing between a master device and a slave device, wherein the master device generates a command and the slave device generates data according to the command, the interface method comprising: sending, by a master interface, the command to the slave device and/or receiving the data from the slave device; and receiving, by a slave interface, the command from the master device and/or sending the data to the master device, wherein the master interface and the slave interface are driven by a clock generated by a clock generator, wherein the master interface and the slave interface are electrically connected by one or more bonds and/or TSVs, and wherein the clock used to drive the slave interface is trained by changing the clock phase of the clock to align with the data clusters of the command and/or the data clusters of the data.

The interface method of claim 11, further comprising: receiving, by other slave interfaces, the command from the master device and/or sending other data generated by other slave devices to the master device, wherein the other slave interfaces are driven by the clock generated by the clock generator and are electrically connected to the master interface through the one or more bonds and/or the TSVs, and wherein each clock used to drive each of the other slave interfaces is trained by changing the clock phase of that clock to align with the data clusters of the command and/or the data clusters of the other data corresponding to each of the other slave interfaces.

The interface method of claim 12, wherein receiving the command from the master device and/or sending the data/the other data to the master device by each of the slave interface and the other slave interfaces further comprises: sending the data/the other data to the master interface using a double data rate (DDR) configuration.

The interface method of claim 13, wherein sending the data/the other data to the master interface using the double data rate (DDR) configuration further comprises: generating, by a first flip-flop (FF) unit, a portion of the data according to the data/the other data; generating, by a second flip-flop unit, another portion of the data according to the data/the other data; and sending, by a multiplexer, the portion of the data and the other portion of the data to the master device.

The interface method of claim 14, wherein using the double data rate configuration to send the data/the other data to the master interface further comprises: generating, by a first gating unit, a first local clock according to the clock generated by the clock generator; and generating, by a second gating unit, a second local clock according to the clock generated by the clock generator, wherein the first local clock generated by the first gating unit is used by the master interface to read the portion of the data generated by the first flip-flop unit, and the second local clock generated by the second gating unit is used by the master interface to read the other portion of the data generated by the second flip-flop unit.

The interface method of claim 12, wherein the master device further generates a turnaround (TA) loop, wherein the turnaround loop is used to prevent bus contention between responses from the slave device and responses from the other slave devices.

The interface method of claim 16, wherein the turnaround loop is used to compensate for differences in round trip delay (RTD) between the slave device and the other slave devices.

The interface method of claim 17, wherein the slave device and the other slave devices generate zero data before the data and after the data to prevent contention between the slave device and the other slave devices caused by different round-trip delays.

The interface method of claim 13, wherein sending, by the master interface, the command to the slave device and/or receiving the data from the slave device further comprises: using the clock generated by the clock generator to reclock the portion of the data and the other portion of the data from the double data rate units of the slave interface and the other slave interfaces.

The interface method of claim 15, wherein the slave device and the other slave devices train the first gating unit and the second gating unit so that the first gating unit and the second gating unit are located at an optimal data sampling point of the portion of the data and the other portion of the data.