CN117640548A - Cross-die access method and system - Google Patents

Cross-die access method and system

Info

Publication number
CN117640548A
CN117640548A (Application No. CN202311775482.3A)
Authority
CN
China
Prior art keywords
die
data
response
destination
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311775482.3A
Other languages
Chinese (zh)
Inventor
陆佳佳
杨亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Muxi Integrated Circuit Shanghai Co ltd
Original Assignee
Muxi Integrated Circuit Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Muxi Integrated Circuit Shanghai Co ltd filed Critical Muxi Integrated Circuit Shanghai Co ltd
Priority to CN202311775482.3A priority Critical patent/CN117640548A/en
Publication of CN117640548A publication Critical patent/CN117640548A/en
Pending legal-status Critical Current


Abstract

The present invention relates to the field of chip design technology, and in particular to a cross-die access method and system. A source die_0 receives, through the inter-die network, response data packet sequences rpack returned by H destination dies; all response data packets in rpack have the same data length; each response data packet sequence comprises a plurality of data fragments obtained by splitting one piece of response data and a plurality of data packets obtained by independently packaging each data fragment. RHUB_0 parses rpack to obtain the destination die address carried in each returned data packet; each data packet is then redistributed, in order, into the response queue of the corresponding destination die, yielding H destination die response queues; the response queue of each destination die is parsed to obtain all response data, which is scheduled onto the intra-die network-on-chip for return, thereby simplifying the protocol and evenly allocating hardware transmission resources.

Description

Cross-die access method and system
Technical Field
The invention relates to the technical field of chip design, in particular to a multi-level cross die access method and system.
Background
Both the data network on chip (DNOC) and the register network on chip (RNOC) are network-on-chip (NoC) architectures used to connect individual modules or IP cores on a chip for efficient, low-latency data transmission and communication. The RNOC carries data transmission and communication between cores mainly through register reads and writes, and suits system designs with small-volume control accesses; the DNOC carries data transmission and communication between IP cores mainly through packet switching, and suits system designs with large-volume data accesses.
IP cores within a die communicate through the NoC, and data packets are sent from a source die over the NoC to a physically connected destination die, so that one-to-one cross-die transmission between two directly connected dies can be realized. However, this access method has the following problems:
1. If the source die and the destination die are not directly connected, the access cannot be performed.
2. If the same source die sends access requests carrying the same read-write identity to different destination dies, the response data received by the source die can be out of order.
3. If different source dies send access requests carrying the same read-write identity to the same destination die, the response data sent by the destination die also suffers from the out-of-order problem.
4. The transmission protocol of the inter-die network is complex, and hardware transmission resources are unevenly allocated.
Disclosure of Invention
In order to solve the fourth technical problem, the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a cross-die access method, the method comprising:
P100, a source die_0 receives, through the inter-die network, response data packet sequences rpack = {rpack_1, rpack_2, …, rpack_h, …, rpack_H} returned by H destination dies, where rpack_h is the response data packet sequence returned by the h-th destination die and h ranges from 1 to H; all response data packets in rpack have the same data length; rpack_h comprises a plurality of data fragments obtained by splitting one piece of response data and a plurality of data packets obtained by independently packaging each data fragment;
P200, the control unit RHUB_0 of the source die_0 parses all the data packets in rpack to obtain the destination die address carried in each returned data packet;
P300, according to the destination die address of each data packet, each data packet is redistributed, in order, into the response queue of the corresponding destination die, obtaining H destination die response queues;
P400, the response queue of each destination die is parsed to obtain all response data, and the response data is scheduled onto the intra-die network-on-chip for return.
In a second aspect, an embodiment of the present invention provides a cross-die access system, the system comprising a processor and a non-transitory computer-readable storage medium in which at least one instruction or at least one program is stored, the at least one instruction or at least one program being loaded and executed by the processor to implement the above cross-die access method.
Compared with the prior art, the cross-die access method and system provided by the invention have obvious beneficial effects, achieve considerable technical progress and practicality, have wide industrial utilization value, and offer at least the following benefits:
The invention provides a cross-die access method and system that split all data transmitted over the inter-die network into data segments of the same length, independently package each data segment into a data packet, and reassemble and parse the packets at the receiving end in the splitting order to recover the complete data, thereby simplifying the protocol and evenly allocating hardware transmission resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a multi-level cross die access method according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a forward order preserving method across die according to a second embodiment of the present invention;
FIG. 3 is a flow chart of a reverse order preserving method across die according to a third embodiment of the present invention;
fig. 4 is a flowchart of a method for accessing a cross die according to a fourth embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
In order to solve the first technical problem, an embodiment of the present invention is provided.
Example 1
Referring to FIG. 1, a multi-level cross die access method is shown, the method comprising the steps of:
S100, obtain an access request req sent by IP_0 in the source die_0; req carries user information and a destination address addrA, where addrA comprises the address of the destination die_t and the logical address of the destination IP_t.
Each die comprises a plurality of IP cores, an intra-die network-on-chip RNOC, a control unit RHUB, and an inter-die network DNOC interconnection interface.
An IP core (Intellectual Property core) is a pre-designed, verified, and reusable functional module. IP_0 is an IP core in die_0. In the embodiments of the invention, IP refers to an IP core; this is not restated below.
The intra-die network-on-chip RNOC provides interconnection and communication between the IP cores within a die; interconnection and communication between dies is carried out through the inter-die network DNOC. The intra-die network-on-chip RNOC_0 is a register network on chip (RNOC); the inter-die network DNOC is a data network on chip (DNOC).
The control unit RHUB is configured to package an inter-die access request into a data packet conforming to the inter-die network DNOC protocol, or to parse and schedule a data packet delivered by the inter-die network DNOC.
Wherein the access request is a read request or a write request.
Optionally, the high-order bits of addrA hold the address of the destination die_t, and the low-order bits hold the logical address of the destination IP_t.
The user information includes the identity of the source IP_0 and other information. The identity of the source IP_0 uniquely identifies the source IP_0. The other information includes, but is not limited to, the data length, the bit width, and the identity of the source IP_0.
Optionally, IP_0 transmits req to RNOC_0 via a control channel.
S200, the intra-die network-on-chip RNOC_0 of the source die_0 parses the user information and addrA carried in req to obtain the address of the destination die_t. It should be noted that RNOC_0 performs a preliminary screening to determine whether the destination die_t of req is the current die: if so, the communication is intra-die; otherwise, it is inter-die. This preliminary screening reduces the search range and the search complexity.
Further, when the address of the destination die_t is the address of the current die, the user information is sent, according to the address of the destination IP_t, to the destination IP_t in the source die_0. It should be noted that whether the destination die_t is the current die is determined by checking whether the high-order address bits of addrA are 0: when the high-order address is 0, the address of the destination die_t is the address of the current die; otherwise, the destination is another die. When the address of the destination die_t is not the address of the current die, req is sent to RHUB_0 in the source die_0 and the following steps S220 to S240 are performed.
S220, RHUB_0 parses req to obtain the user information and addrA, extracts the address of the destination die_t from addrA according to a preset rule, regenerates it into a destination address addrB, and packs the user information and addrA into a data packet pack; addrB and pack are then sent to the inter-die network DNOC.
The preset rule is the address format of the DNOC, and addrB complies with the DNOC network transport protocol. For example, the destination address addrB is generated according to the coding of the DNOC network interface and the address of the destination die_t.
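By way of illustration only, the following Python sketch shows one possible encoding of addrA and the regeneration of addrB described above. The 24-bit IP field width, the function names, and the choice of addrB as the bare destination die address are assumptions made for this sketch; the patent does not specify them.
```python
IP_BITS = 24  # assumed width of the low-order IP field, not fixed by the patent

def make_addrA(die_addr: int, ip_logical_addr: int) -> int:
    """Compose addrA: high-order bits hold the destination die address,
    low-order bits hold the destination IP core's logical address."""
    return (die_addr << IP_BITS) | (ip_logical_addr & ((1 << IP_BITS) - 1))

def split_addrA(addrA: int):
    """Recover (destination die address, destination IP logical address)."""
    return addrA >> IP_BITS, addrA & ((1 << IP_BITS) - 1)

def make_addrB(addrA: int) -> int:
    """RHUB_0 regenerates addrB from the die field of addrA; here addrB is
    simply that die address (an assumed DNOC address format)."""
    die_addr, _ = split_addrA(addrA)
    return die_addr
```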
Preferably, RNOC_0 transmits req to RHUB_0 via a control channel. RHUB_0 sends addrB through a write-address channel and sends pack through a write-data channel to the inter-die network DNOC.
The packing format of the data packet conforms to the packing protocol of the DNOC.
With addrB, the DNOC only needs to parse addrB when forwarding across dies to determine whether forwarding should stop at the current die; it does not need to parse a large amount of other irrelevant information, so lightweight, rapid forwarding is achieved.
S240, the DNOC forwards addrB and pack according to its routing table. When addrB and pack are forwarded to the i-th die_i, the DNOC interface of die_i parses addrB to obtain the address of the destination die_t.
Dies are interconnected through physical connections. The routing table of the DNOC stores the paths leading to each destination die address. The DNOC selects the correct path according to the routing table and sends the data to the next die; the next die likewise forwards according to its own routing table, and so on, until the data reaches the destination die and forwarding stops.
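A minimal sketch of this hop-by-hop forwarding follows, under the assumption that each die holds a routing table mapping a destination die address to the next physically connected die; the Die class and all names are illustrative, not the patent's own structures.
```python
from dataclasses import dataclass, field

@dataclass
class Die:
    addr: int
    routing_table: dict                      # destination die address -> next-hop die address
    inbox: list = field(default_factory=list)

def dnoc_forward(addrB: int, pack: bytes, start: Die, dies: dict) -> None:
    """Hop-by-hop DNOC forwarding: each intermediate die inspects only addrB, never pack."""
    die = start
    while die.addr != addrB:                  # not yet at the destination die
        die = dies[die.routing_table[addrB]]  # next hop from this die's own routing table
    die.inbox.append(pack)                    # destination die reached; its RHUB parses pack
```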
Further, when the address of the destination die_t differs from the address of die_i, forwarding continues to the next die; otherwise, die_i is the destination die_t, pack is sent to RHUB_i of die_i, and the following steps are performed:
S242, RHUB_i parses pack to obtain the user information and addrA, and extracts the address of the destination IP_t from addrA; the user information and the address of IP_t are sent to the intra-die network-on-chip RNOC_i of die_i.
It should be noted that each die includes an RNOC for intra-die addressing, an RHUB_i for transferring control-class information, and a DNOC interface for cross-die addressing.
When the destination die has been reached, RHUB_i no longer needs the address of the destination die_t; it strips that address from addrA and directly extracts the logical address of the destination IP_t. The extracted destination IP_t address is delivered to the intra-die network-on-chip RNOC so that the corresponding user information can be transmitted to the destination IP_t according to that address.
S243, RNOC_i sends the user information, according to the address of the destination IP_t, to the destination IP_t in die_i.
In this embodiment of the invention, the destination address addrB is regenerated from the address of the destination die_t, the user information and addrA are packed into a data packet pack, and addrB and pack are forwarded through the DNOC. During forwarding, only addrB is parsed to judge whether the address of the destination die_t is the same as the address of the current die, i.e., whether the current die is the destination die; pack does not need to be parsed, which reduces the amount of data parsed and the parsing complexity, improves forwarding speed and efficiency, and thereby realizes lightweight, rapid forwarding. Meanwhile, since the intermediate dies acting as forwarding nodes do not parse pack, leakage of the information in pack is avoided and the security of pack transmission is improved.
As a preferred embodiment, the method further comprises:
s220 further includes: RHIB (RHIB) 0 Resolving req to obtain user information, a source address and addrA, and extracting a destination die from the addrA according to a preset rule t Address re-generation of (a)Forming a destination address addrB, and packaging the user information, the source address and the addrA into a data packet pack; and sending addrB and pack to die inter-chip network DNOC.
S242 also includes RHIB i Analyzing pack to obtain user information, source address and addrA, and extracting target IP in addrA t Address of (a); user information and IP t Address of (c) is sent to die i Die network-on-chip RNOC i . The RHIB is used for the treatment of a disease i Without the need to send the source address to the RNOC i The source address is used for providing response data to be returned to the source die in the response process 0 IP in (a) 0 Is a logical address of the host.
As a preferred embodiment, the method further comprises a response step, wherein:
S300, RNOC_i receives the response information response_t from the destination IP_t. When the address of the source die_0 is the address of the current die, response_t is returned to the source IP_0 according to the address of the source IP_0; otherwise, response_t is returned to RHUB_i and the following steps are performed:
S320, RHUB_i receives response_t, packs response_t and the source address into a data packet pack_t, regenerates the address of the source die_0 into addrR, and the DNOC returns pack_t to the source die_0 according to addrR.
The step of regenerating the address of the source die_0 into addrR is the same as the procedure for regenerating the destination address addrB from the address of the destination die_t. addrR complies with the DNOC network transport protocol.
The response information is returned along the original path as pack_t. During the return of the response information, RNOC_i first judges whether the response is for the current die or is cross-die: if it is for the current die, the response is delivered directly to the corresponding destination IP within the die; if it is cross-die, RHUB_i packs the response into a data packet, forwards it to the source die through the DNOC, and returns it to the corresponding source IP_0 in the source die.
Through steps S100-S200, the cross-die access of the access request can be completed, and in combination with S300, the complete cross-die access and response process of the access request can be completed. Multi-level, fast cross-die access is realized through the parsing and judgment of the destination address addrB between dies, which solves the technical problem that the source die cannot access a destination die that is not directly connected to it.
Based on the same inventive concept as the method embodiment of the first embodiment, the first embodiment further provides a multi-level cross-die access system, which includes a processor and a non-transitory computer-readable storage medium storing at least one instruction or at least one program, the at least one instruction or program being loaded and executed by the processor to implement the multi-level cross-die access method provided in the first embodiment. The multi-level cross-die access method has been described in detail in the first embodiment and is not repeated here.
When a source die initiates read-write requests, two or more access requests may carry the same read-write identity while targeting different destination dies; such requests are sent in sequence from the source die to the different destination dies. Because the different destination dies process the access requests independently, and their processing delays may differ, the order of the response data returned by the destination dies is not necessarily the order in which the requests were sent from the source die, so the response data arriving at the RHUB is out of order. Therefore, to solve the second technical problem, the embodiment of the present invention provides the second embodiment.
Example two
To solve the response out-of-order problem, embodiment two establishes a send cache table in RHUB_0 of the source die; the table mainly records the read-write identity and the destination die address of each inter-die access request that has been sent. When an access request requires cross-die access, the send cache table is first traversed. If the table already records an access request with the same read-write identity as the request to be sent but a different destination die address, then, to prevent out-of-order responses, the request to be sent is intercepted until all access requests in the table with that read-write identity and the different destination die addresses have been responded to; only then are the read-write identity and destination die address of the intercepted request recorded in the table and the request sent out. That is, in this embodiment, the send cache table intercepts any access request whose read-write identity matches an entry in the table while its destination die address differs; the intercepted request is re-recorded in the send cache table and sent out only after the relevant entries in the table have been responded to. This guarantees that no out-of-order situation arises and that the returned response data necessarily comes back in the sending order.
Referring to fig. 2, a method for forward order preservation across die is shown, the method comprising the steps of:
T200, obtain the read-write identity axid_0 and the destination die address daddr_0 carried in an access request req_0 sent by the source die.
The source die is the die from which the access request is initiated. The destination die address is the address of the destination die accessed by the access request.
As a preferred embodiment, the read-write identity is a read address identity or a write address identity.
T400, obtain the send cache table table_0 in the control unit RHUB_0 of the source die; table_0 includes a set of N send lists tab_0, each send list recording the read-write identity and destination die address of an access request that has been sent but not yet responded to; tab_0 = {t_1, t_2, …, t_i, …, t_N}, where t_i is the i-th send list; t_i includes a read-write identity axid_i, a destination die address daddr_i, and the number cnt_i of access requests with axid_i and daddr_i that have been sent and not yet responded to, where i ranges from 1 to N. tab_0 does not contain two lists with the same axid_i but different daddr_i.
It should be noted that the source die includes a control unit RHUB_0.
Optionally, N has a value of 32.
T600, traverse tab_0 according to axid_0 and screen out the send list t_i whose read-write identity is identical to axid_0. Screening candidate send lists by read-write identity reduces the search range.
T620, when daddr_0 is the same as the daddr_i of t_i, the count value cnt_i is incremented by 1 and the cross-die access of req_0 is allowed. Counting, via cnt_i, the number of issued access requests with the same read-write identity and the same destination die address saves the storage space that repeated entries in the send cache table would otherwise occupy.
Each time an access request with the same read-write identity and the same destination die address is added, the count value cnt_i of t_i is incremented by 1. Each time response data with the same read-write identity and the same destination die address is received, cnt_i is decremented by 1, until cnt_i equals 0. When cnt_i equals 0, all access requests sent by the source die with the same read-write identity and destination die address as t_i have returned response data; at that point the list t_i is released, and the released list can be re-occupied.
T640, when daddr_0 differs from the daddr_i of t_i, the access of req_0 is intercepted.
Because the send cache table records an issued-but-unanswered access request with the same read-write identity and a different destination die, intercepting req_0 prevents the response out-of-order problem, i.e., the problem that the response data received by a source die is out of order when the same source die sends access requests with the same read-write identity to different destination dies.
As a preferred embodiment, the following steps are also included after T640:
T642, when the cnt_i of the send list t_i is 0, a new send list is recorded in table_0 according to the axid_0 and daddr_0 of req_0, and the cross-die access of req_0 is allowed.
It should be noted that when the cnt_i of t_i is 0, none of the access requests already sent from the source die with the same read-write identity as req_0 and a different destination die address remains unanswered. Therefore, when the source die again sends an access request with the same read-write identity as the now-invalid t_i and a different destination die address, a new list can be established normally because t_i has become invalid: the access request is not intercepted, cross-die access proceeds normally, and sending req_0 again causes no out-of-order problem. In addition, the send cache table is traversed each time an access request is sent; since the table stores only the key read-write identity and destination die address and no other redundant information, and the qualifying lists are first screened by read-write identity, the lookup complexity is low and the lookup speed is high.
It should be noted that, in the send cache table, multiple requests with the same read-write identity and the same destination die address occupy a single list, while requests with the same read-write identity and a different destination die address are intercepted and therefore create no new list. Consequently, multiple candidate lists with the same read-write identity and different destination dies do not exist in the send cache table, so the table contains at most one send list whose destination die address differs from daddr_0 while sharing its read-write identity.
As a preferred embodiment, the T620 further includes:
When daddr_0 is the same as the daddr_i of t_i, judge whether the count value cnt_i equals a preset maximum value: when cnt_i equals the preset maximum value, the access of req_0 is intercepted; when cnt_i is less than the preset maximum value, cnt_i is incremented by 1 and the cross-die access of req_0 is allowed. The preset maximum value limits the number of outstanding access requests from the same die with the same read-write identity and the same destination die, preventing the destination die from being blocked by a flood of incoming access requests.
As a preferred embodiment, the method further comprises:
T700, obtain the maximum send-list count threshold N_max of table_0. When every read-write identity in tab_0 differs from axid_0 and N = N_max, the access of req_0 is intercepted; otherwise, the axid_0 and daddr_0 of req_0 are recorded in table_0 and a new send list is established. That is, when a new list is needed but the records in the send cache table have already reached the maximum, interception is performed to ensure that every sent access request is recorded in the send cache table. N_max is a quantity threshold specified by the user.
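The send-cache-table check of steps T400-T700 can be sketched as follows. N_MAX, CNT_MAX, the entry layout, and the boolean return convention (True = allow, False = intercept) are assumptions for illustration, not values fixed by the patent.
```python
from dataclasses import dataclass

@dataclass
class SendEntry:
    axid: int       # read-write identity
    daddr: int      # destination die address
    cnt: int = 0    # sent-but-unanswered requests with this (axid, daddr)

N_MAX, CNT_MAX = 32, 15   # assumed limits (N_max and the preset maximum value)

def allow_send(table: list, axid: int, daddr: int) -> bool:
    """Return True to allow the cross-die access of a request, False to intercept it."""
    for t in (e for e in table if e.axid == axid):      # T600: screen by read-write identity
        if t.daddr == daddr:                            # T620: same destination die
            if t.cnt >= CNT_MAX:
                return False                            # preset maximum reached: intercept
            t.cnt += 1
            return True
        return False                                    # T640: different destination die: intercept
    if len(table) >= N_MAX:
        return False                                    # T700: table full: intercept
    table.append(SendEntry(axid, daddr, cnt=1))         # record a new send list
    return True

def on_response(table: list, axid: int, daddr: int) -> None:
    """Decrement the matching entry when response data returns; release it at zero."""
    for t in table:
        if t.axid == axid and t.daddr == daddr:
            t.cnt -= 1
            if t.cnt == 0:
                table.remove(t)                         # released list can be re-occupied
            return
```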
As a preferred embodiment: among the transmitted access requests, an access request whose destination die does not exist may be initiated due to misoperation or other reasons. When such an access request exists, a list is still established in the send cache table as usual, but because the destination die does not exist, the access request corresponding to that list will never obtain response data; that is, the entry permanently occupies space in the send cache table and cannot be eliminated. To solve this problem, the method further comprises T800, optimizing the send cache table table_0, which comprises the following steps:
T810, obtain the addresses addr = {addr_1, addr_2, …, addr_x, …, addr_X} of the X dies connected to the source die, where addr_x is the address of the x-th die connected to the source die and x ranges from 1 to X. It should be noted that each die can acquire the topology information of its interconnections and obtain the address information of the other dies interconnected with it. Optionally, the x-th die connected to the source die is a directly connected die, or a die connected through intermediate dies.
T820, traverse table_0 according to addr to obtain the F destination die addresses Ndaddr in table_0 that do not belong to addr, where F ≥ 0. Ndaddr comprises F destination die addresses. Because the destination dies in Ndaddr do not belong to the interconnection topology of the source die, access requests carrying a destination die address in Ndaddr cannot be sent to their destination die, and the corresponding response data cannot be returned; the lists corresponding to such access requests therefore remain in table_0 indefinitely and occupy the space resources of the send cache table.
T830, obtain the access request queue allowed for normal access, and extract from it the access requests corresponding to the F addresses in Ndaddr to obtain an isolation queue fence; preset response data is configured for each access request in fence.
Optionally, the access request queue is a read request queue or a write request queue according to different read-write identities of the access requests.
The preset response data is response data specified by the user. Preferably, the preset response data is null information.
T840, package the access request queue remaining after the fence extraction into data packets for transmission, and obtain a normal response queue.
It should be noted that the access request queue remaining after fence is extracted is the queue of normal accesses, in which no access request that cannot be sent to its destination die remains.
T850, RHUB_0 of the source die schedules the normal response queue and the preset response data of the isolation queue onto the intra-die network-on-chip RNOC for return.
It should be noted that, since table_0 already guarantees the in-order return of response data, the response data is also returned in the same order when scheduled. In this way the access requests in fence obtain response data, which not only eliminates the lists they occupy in table_0 and releases the resources of the send cache table, but also gives response information to the corresponding access requests so that they can be concluded.
As a preferred embodiment, between T840 and T850 the method further comprises: T841, the source die receives the normal response data returned by different dies, and RHUB_0 rearranges and unpacks the normal response data according to the different destination dies to obtain the normal response data queues returned by the different destination dies.
Optimizing the send cache table through T800 eliminates list entries left in the table by access requests whose destination die does not exist, caused by misoperation or other reasons.
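As a rough sketch of the T800 optimisation, assuming each request exposes a destination die address field; the Req class, the null preset response, and all names here are illustrative.
```python
from dataclasses import dataclass

@dataclass
class Req:
    axid: int     # read-write identity
    daddr: int    # destination die address

def split_fence(queue: list, reachable: set):
    """Split requests into the normally-sent queue and the isolation queue (fence),
    pairing each fenced request with preset (here: null) response data."""
    normal = [r for r in queue if r.daddr in reachable]
    fence = [r for r in queue if r.daddr not in reachable]
    preset = [(r, b"") for r in fence]    # preset response data: null information
    return normal, preset
```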
It should be noted that the access request in the second embodiment is the same as that in the first embodiment: the access request is initiated from a source IP of the source die, routed through the intra-die network-on-chip and the RHUB of the source die, carried by the DNOC network to the destination die, and delivered to the destination IP through the RHUB and intra-die network-on-chip of the destination die; the access process and the response process are the same as in the first embodiment and are not repeated. The difference is that, during access, before the RHUB of the source die sends a received access request to the inter-die network, it uses the read-write identity and destination die address of each access request to traverse the send cache table in the RHUB and checks whether a list with the same read-write identity but a different destination die address exists; if so, the access request is intercepted, and after the relevant list in the send cache table has been responded to, the intercepted access request is allowed to be sent normally to the inter-die network, thereby guaranteeing the response order and preventing disorder.
Based on the same inventive concept as the method embodiment of the second embodiment, the second embodiment further provides a cross-die forward order-preserving system, which includes a processor and a non-transitory computer-readable storage medium storing at least one instruction or at least one program, the at least one instruction or program being loaded and executed by the processor to implement the cross-die forward order-preserving method provided by the second embodiment. The cross-die forward order-preserving method has been described in detail in the second embodiment and is not repeated here.
The second embodiment implements forward order preservation. However, when a destination die receives access requests from different source dies and the received access requests carry the same read-write identity, the RHUB cannot distinguish, when the response data is returned, which source die each response should be returned to, and the problem of out-of-order response return arises again. Therefore, to solve the third technical problem, the present invention provides a third embodiment.
Example III
To solve the response out-of-order problem, the third embodiment also builds a table in the RHUB: a receive cache table is established in the RHUB of the destination die, recording the read-write identity and the source die address of each access request that has been received and allowed to be sent to the intra-die network. The intra-die network transmits the access requests recorded in the cache table to the destination IP and obtains the corresponding response data. Before an access request is sent through the RHUB to the RNOC network of the destination die, the receive cache table must first be traversed; if the table records an entry with the same read-write identity as the request to be sent but a different source die address, the request is intercepted in order to prevent disorder. Once the relevant entry in the receive cache table has been responded to, the intercepted access request is allowed to proceed normally and is recorded in the receive cache table, avoiding the out-of-order problem in the response return process within the destination die.
Referring to fig. 3, a method for reverse order preservation across dies is shown, the method comprising:
E100, obtain the access requests sent by R source dies and received by the destination die, where the f-th access request req_f carries a read-write identity axid_f and a source die address saddr_f, f ranges from 1 to R, and R ≥ 2.
E200, obtain the receive cache table table_t in the control unit RHUB_t of the destination die; table_t includes a set of K receive lists tab_t, each receive list recording the read-write identity and the source die address of a received access request; tab_t = {tt_1, tt_2, …, tt_q, …, tt_K}, where tt_q is the q-th receive list; tt_q includes a read-write identity axid_q, a source die address saddr_q, and the number cnt_q of access requests with axid_q and saddr_q that have been issued and not yet responded to, where q ranges from 1 to K.
In the destination die, RHUB_t parses the data packet to obtain the corresponding access request data, and RHUB_t then sends that data to the intra-die network-on-chip for routing to the final destination IP. When RHUB_t sends the data to the intra-die network-on-chip for routing, the access request data sent to the intra-die network-on-chip is recorded in the receive cache table table_t, ensuring that each piece of data sent is recorded. The receive cache table records only the read-write identity axid_q, the source die address saddr_q, and the count cnt_q, without any other irrelevant information, which reduces the occupation of space resources; counting identical entries with cnt_q further reduces the space occupied.
E300, according to the axid_f carried in the req_f received by the destination die, traverse tab_t and screen out the receive list tt_q whose read-write identity is identical to axid_f. As before, screening first by read-write identity narrows the search range and reduces the computational complexity.
E400, judge whether the saddr_f of req_f is identical to the saddr_q of tt_q: if so, cnt_q is incremented by 1 and req_f is allowed normal access; otherwise, req_f is intercepted.
Through the method provided by E100-E400, access requests sent to the intra-die network-on-chip are recorded in the receive cache table table_t; when an access request about to be sent to the intra-die network-on-chip has the same identity as any list recorded in the table but a different source die, its sending is intercepted. This guarantees that the returned response data cannot become out of order, solving the problem that the response data sent by a destination die is disordered when different source dies send access requests with the same read-write identity to the same destination die. In addition, when an access request traverses the receive cache table, the table stores only the key read-write identity and source die address without other redundant information, and the qualifying lists are first screened by read-write identity, so the lookup complexity is low and the lookup speed is high.
As a preferred embodiment, the following step is also included after E400: E500, when the cnt_q of the receive list tt_q equals 0, a new receive list is recorded in table_t according to the axid_f and saddr_f of req_f, and req_f is allowed to be sent to the intra-die network-on-chip.
It should be noted that when cnt_q equals 0, all responses for tt_q have returned; therefore req_f is no longer intercepted, req_f is sent normally to the intra-die network-on-chip, and a new list corresponding to req_f is established normally in table_t to ensure in-order access for subsequent access requests.
As a preferred embodiment, E400 also includes: determine whether the saddr_f of req_f is identical to the saddr_q of tt_q; if they are identical, judge whether cnt_q equals a preset maximum threshold; when cnt_q equals the preset maximum threshold, req_f is intercepted; when cnt_q is less than the preset maximum, cnt_q is incremented by 1 and req_f is allowed normal access; otherwise, req_f is intercepted. The preset maximum threshold limits the number of access requests with the same read-write identity and the same source die that may be sent to the intra-die network-on-chip at a time, preventing access blocking.
As a preferred embodiment, the method further comprises:
E600, obtain the maximum receive-list count threshold K_max of table_t. When every read-write identity in tab_t differs from axid_f and K = K_max, the access of req_f is intercepted; otherwise, the axid_f and saddr_f of req_f are recorded in table_t and a new receive list is established. K_max is a quantity threshold specified by the user. When the number of lists in the receive cache table has reached the maximum, interception is likewise performed to ensure that every access request sent to the intra-die network-on-chip is recorded in the receive cache table.
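The receive-side check of E200-E600 mirrors the send-side table but is keyed on the source die address; the sketch below uses assumed names and thresholds (K_MAX, CNT_MAX) and a plain dict per list entry.
```python
K_MAX, CNT_MAX = 32, 15   # assumed limits (K_max and the preset maximum threshold)

def allow_to_rnoc(table: list, axid: int, saddr: int) -> bool:
    """Entries are dicts {'axid', 'saddr', 'cnt'}; True = forward to RNOC_t, False = intercept."""
    for e in (x for x in table if x["axid"] == axid):   # E300: screen by read-write identity
        if e["saddr"] == saddr:                          # E400: same source die
            if e["cnt"] >= CNT_MAX:
                return False                             # preset maximum threshold reached
            e["cnt"] += 1
            return True
        return False                                     # different source die: intercept
    if len(table) >= K_MAX:
        return False                                     # E600: table full: intercept
    table.append({"axid": axid, "saddr": saddr, "cnt": 1})
    return True
```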
As a preferred embodiment, before E100 the method also includes: RHUB_t in the destination die parses the received data packets to obtain the access requests sent by the R source dies and the destination address addrA of each access request, and extracts the address of the destination IP_t from addrA; the address of the destination IP_t allows the intra-die network-on-chip in the destination die to send the access request to the destination IP_t according to that address. It should be noted that the destination address addrA is the same as in the first embodiment and is not described again.
As a preferred embodiment, the method further comprises the following step: E700, the intra-die network-on-chip in the destination die receives the response information response_f for req_f; when the address of the source die is the address of the current die, response_f is returned to the source IP according to the address of the source die; otherwise, response_f is returned to RHUB_t of the destination die, and RHUB_t decrements by 1 the cnt_q of the list tt_q corresponding to req_f in the receive cache table.
It should be noted that the access request in the third embodiment is the same as that in the first embodiment: the access request is initiated from a source IP of the source die, routed through the intra-die network-on-chip and the RHUB of the source die, carried by the DNOC network to the destination die, and delivered to the destination IP through the RHUB and intra-die network-on-chip of the destination die; the access process and the response process are the same as in the first embodiment and are not repeated. The difference is that, during access, before the RHUB of the destination die sends a received access request to the intra-die network-on-chip of the destination die, it uses the read-write identity and source die address of each access request to traverse the receive cache table in the RHUB and checks whether a list with the same read-write identity but a different source die address exists; if so, the access request is intercepted, and after the relevant list in the receive cache table has been responded to, the intercepted access request is allowed to be sent normally to the intra-die network-on-chip of the destination die, thereby guaranteeing the response order and preventing disorder.
Based on the same inventive concept as the method embodiment of the third embodiment, the third embodiment further provides a cross-die reverse order preserving system, which includes a processor and a non-transitory computer readable storage medium storing at least one instruction or at least one program therein, wherein the at least one instruction or the at least one program is loaded and executed by the processor to implement a cross-die reverse order preserving method provided by the third embodiment, and one cross-die reverse order preserving method is described in detail in the third embodiment and is not repeated.
The multi-level cross-die access method provided in the first embodiment realizes multi-level cross-die access and inter-die data transmission through the inter-die network. However, because the length of the data transmitted between dies is not fixed (some data are long and some are short), the transmission protocol of the inter-die network is complex, and longer data need several clock cycles to traverse the inter-die network, so the transmission occupies hardware transmission resources for a longer time and causes uneven allocation of hardware transmission resources. The present invention provides a fourth embodiment to solve this problem.
Example IV
To address the complexity of the inter-die network transmission protocol and the uneven allocation of hardware transmission resources, the fourth embodiment of the invention splits all data transmitted over the inter-die network into data segments of the same length, independently packages each data segment into a data packet, and reassembles and parses the packets at the receiving end in the splitting order to recover the complete data.
Referring to fig. 4, a cross die access method is shown, the method includes:
P100, a source die_0 receives, through the inter-die network, response data packet sequences rpack = {rpack_1, rpack_2, …, rpack_h, …, rpack_H} returned by H destination dies, where rpack_h is the response data packet sequence returned by the h-th destination die and h ranges from 1 to H; all response data packets in rpack have the same data length; rpack_h comprises a plurality of data fragments obtained by splitting one piece of response data and a plurality of data packets obtained by independently packaging each data fragment.
Note that rpack_h includes W response data packets, where W ≥ 0.
The source die_0 is the die that sends the access request. A destination die is a die that receives the access request and returns the response data.
The multiple data packets in rpack_h that correspond to one piece of response data carry the same read-write identity and the same address of the source die_0.
P200, the control unit RHUB_0 of the source die_0 parses all the data packets in rpack to obtain the destination die address carried in each returned data packet.
It should be noted that, the format of the data packet is the same as that of the data packet in the first embodiment, and the data packet includes the source address, the address of the destination die, and the data fragment.
P300, according to the destination die address of each data packet, each data packet is redistributed, in order, into the response queue of the corresponding destination die, obtaining H destination die response queues.
After the response data is split into data fragments and each data fragment is independently packed into a data packet, the control unit RHUB of the destination die schedules and transmits the packets to the inter-die network, toward the source die_0, in units of individual data packets. Because scheduling is performed per data packet, the packets returned by different destination dies may be interleaved during scheduling, so that packets from other destination dies are inserted between the several packets of one piece of response data, causing disorder. To prevent this disorder, before parsing, each data packet must be placed into the response queue of its corresponding destination die, which ensures that the return order is not disrupted.
P400, the response queue of each destination die is parsed to obtain all response data, which is scheduled onto the intra-die network-on-chip for return. Since the response queue of each destination die is order-preserving, the correct corresponding response data can be obtained.
By splitting and reassembling the data, the complexity of the inter-die network protocol is reduced, and the uneven hardware resource allocation caused by excessively long transmitted data is prevented.
As a preferred embodiment, each data packet of rpack_h in P100 carries the total length of its response data, wherein:
P300 further comprises: each data packet is placed into the response queue of its destination die according to the destination die address it carries, and when the accumulated data length of the packets placed in a destination die's response queue equals the total length of the corresponding response data, the H destination die response queues are obtained. The data packets are returned to the source die_0 in order, and the source die_0 places the response data, in order, into the response queues of the different destination dies, which guarantees that the returned response data is returned in order.
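A compact sketch of P200-P400 with this total-length variant, assuming each arriving packet exposes (destination die address, total response length, fragment bytes); the field order and names are illustrative.
```python
from collections import defaultdict

def reassemble(rpack):
    """rpack: iterable of (dest_die, total_len, fragment) tuples in arrival order."""
    queues = defaultdict(list)                         # one response queue per destination die
    responses = []
    for dest_die, total_len, fragment in rpack:        # P200: destination die address per packet
        q = queues[dest_die]
        q.append(fragment)                             # P300: redistribute in arrival order
        if sum(len(f) for f in q) == total_len:        # queue now holds one complete response
            responses.append((dest_die, b"".join(q)))  # P400: parse, then schedule for return
            q.clear()
    return responses
```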
As a preferred embodiment, before the P100, the method further comprises:
P010, the destination die, through its control unit RHUB, splits each piece of response data burst into a plurality of data fragments single of the same data length, and independently packages each split data fragment single to obtain a response data packet for each data fragment; the multiple response data packets corresponding to one response data burst all carry the same data length, read-write identity, and source die_0 address. It should be noted that when the data length of the response data burst is less than or equal to the data length of a data fragment single, the response data burst is packaged individually into a single data packet.
P020, the obtained response data packets are sent to the source die_0 through the inter-die network. By splitting the data into fixed-length fragments, all data packets transmitted over the inter-die network have the same size, which further simplifies the inter-die network transmission protocol.
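A sketch of the splitting and packaging in P010-P020; SINGLE_LEN and the packet field names are assumptions, not values fixed by the patent.
```python
SINGLE_LEN = 64   # assumed fragment length in bytes

def split_and_pack(burst: bytes, axid: int, src_die: int, dest_die: int) -> list:
    """Split one response burst into equal-length fragments and package each fragment
    independently; every packet carries the same total length, identity, and source die."""
    total = len(burst)
    packets = []
    for off in range(0, total, SINGLE_LEN):            # one packet per fixed-length fragment
        packets.append({"dest_die": dest_die, "src_die": src_die, "axid": axid,
                        "total_len": total, "data": burst[off:off + SINGLE_LEN]})
    return packets    # a burst no longer than SINGLE_LEN yields exactly one packet
```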
As a preferred embodiment, the response data packet in the P100 includes a read response data packet and a write response data packet; wherein:
the P300 further comprises the step of sequentially redistributing each data packet into a write response queue and a read response queue of different destination die according to the address of the destination die of each data packet and the type of the data packet, so as to obtain a write response queue and a read response queue of H destination die.
P400 also includes, respectively analyzing the write response queue and the read response queue of each destination die to obtain all the write response data and the read response data, and respectively scheduling the write response data and the read response data to the die on-chip network for returning. The read-write response data can be further divided through the step, and order-preserving return is guaranteed.
As a preferred embodiment, the inter-die network is a data network on chip (DNOC), and the intra-die network-on-chip is a register network on chip (RNOC).
As a preferred embodiment, P400 further comprises: the response queue of each destination die is parsed to obtain all response data, the preset response data of the isolation queue is obtained, and all parsed response data together with the preset response data are scheduled onto the intra-die network-on-chip for return; the access requests in the isolation queue are those extracted by RHUB_0 of the source die_0 whose destination die is invalid and which are therefore not sent to the destination die through the inter-die network. The isolation queue and its preset response data are the same as in the second embodiment and are not described again. This P400 simplifies the inter-die network transmission protocol and guarantees the order of the returned data.
As a preferred embodiment, the method for determining that a destination die is invalid is as follows: obtain the addresses addr = {addr_1, addr_2, …, addr_x, …, addr_X} of the X dies connected to the source die, where addr_x is the address of the x-th die connected to the source die and x ranges from 1 to X; traverse the send cache table table_0 according to addr to obtain the F destination die addresses Ndaddr in table_0 that do not belong to addr, where F ≥ 0; the destination dies in Ndaddr are invalid; table_0 records the access requests sent by RHUB_0 of the source die_0 to the inter-die network. It should be noted that this step of obtaining the invalid destination dies is the same as steps T810-T820 in the second embodiment and is not described again.
As a preferred embodiment, the access request comprises a write address request and a write data request.
Based on the same inventive concept as the method embodiment of the fourth embodiment, the fourth embodiment further provides a cross-die access system, which includes a processor and a non-transitory computer-readable storage medium storing at least one instruction or at least one program, the at least one instruction or program being loaded and executed by the processor to implement the cross-die access method provided by the fourth embodiment. The cross-die access method has been described in detail in the fourth embodiment and is not repeated here.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the present disclosure is defined by the appended claims.

Claims (10)

1. A method of cross die access, the method comprising:
P100, a source die_0 receives, through the inter-die network, response data packet sequences rpack = {rpack_1, rpack_2, …, rpack_h, …, rpack_H} returned by H destination dies, where rpack_h is the response data packet sequence returned by the h-th destination die and h ranges from 1 to H; all response data packets in rpack have the same data length; rpack_h comprises a plurality of data fragments obtained by splitting one piece of response data and a plurality of data packets obtained by independently packaging each data fragment;
P200, a control unit RHUB_0 of the source die_0 parses all the data packets in rpack to obtain the destination die address carried in each returned data packet;
P300, according to the destination die address of each data packet, each data packet is redistributed, in order, into the response queue of the corresponding destination die, obtaining H destination die response queues;
P400, the response queue of each destination die is parsed to obtain all response data, and the response data is scheduled onto the intra-die network-on-chip for return.
2. The method of claim 1, wherein each data packet of rpack_h in P100 carries the total length of its response data; wherein:
P300 further comprises: placing each data packet into the response queue of its destination die according to the destination die address it carries, and obtaining the H destination die response queues when the data length of the packets placed in each destination die's response queue equals the total length of the corresponding response data.
3. The method of claim 1, further comprising, prior to the P100:
p010, the destination die splits each response data burst into a plurality of data fragments single with the same data length through a control unit RHIB of the destination die, and performs uniqueness on each split data fragment singlePacking immediately to obtain a response data packet of each data segment; wherein each response data burst corresponds to the data length, the read-write identity and the source die of a plurality of response data packets 0 The addresses of (a) are the same;
p020, the obtained response data packet is sent to the source die through the die network 0
4. The method of claim 1, wherein the response packets in P100 include a read response packet and a write response packet; wherein:
P300 further comprises: according to the destination die address of each data packet and the packet type, redistributing each data packet, in order, into the write response queue or the read response queue of the corresponding destination die, to obtain the write response queues and read response queues of H destination dies;
P400 further comprises: parsing the write response queue and the read response queue of each destination die separately to obtain all write response data and read response data, and scheduling the write response data and the read response data separately onto the intra-die network-on-chip for return.
5. The method of claim 1, wherein the inter-die network is a data network on chip.
6. The method of claim 1, wherein the intra-die network-on-chip is a register network on chip.
7. The method according to claim 1, wherein P400 further comprises: parsing the response queue of each destination die to obtain all response data, obtaining preset response data of an isolation queue, and scheduling all parsed response data and the preset response data onto the intra-die network-on-chip for return; wherein the access requests in the isolation queue are access requests, extracted by RHUB_0 of the source die_0, whose destination die is invalid and which are not sent to the destination die through the inter-die network.
8. The method of claim 7, wherein the method for determining that the destination die is invalid is: obtaining addresses addr = {addr_1, addr_2, …, addr_x, …, addr_X} of X dies connected to the source die, where addr_x is the address of the x-th die connected to the source die and x ranges from 1 to X; traversing the send cache table table_0 according to addr to obtain F destination die addresses Ndaddr in table_0 that do not belong to addr, where F ≥ 0; wherein the destination dies in Ndaddr are invalid; and wherein table_0 records the access requests sent by RHUB_0 of the source die_0 to the inter-die network.
9. The method of claim 7, wherein the access request comprises a write address request and a write data request.
10. A cross die access system comprising a processor and a non-transitory computer readable storage medium having at least one instruction or at least one program stored therein, wherein the at least one instruction or the at least one program is loaded and executed by the processor to implement the method of any one of claims 1-9.
CN202311775482.3A 2023-12-21 2023-12-21 Cross-die access method and system Pending CN117640548A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311775482.3A CN117640548A (en) 2023-12-21 2023-12-21 Cross-die access method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311775482.3A CN117640548A (en) 2023-12-21 2023-12-21 Cross-die access method and system

Publications (1)

Publication Number Publication Date
CN117640548A true CN117640548A (en) 2024-03-01

Family

ID=90026921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311775482.3A Pending CN117640548A (en) 2023-12-21 2023-12-21 Cross-die access method and system

Country Status (1)

Country Link
CN (1) CN117640548A (en)

Similar Documents

Publication Publication Date Title
US8225027B2 (en) Mapping address bits to improve spread of banks
US8571848B2 (en) System and method of network emlation for test of on-line server processing heavy traffic
US9356844B2 (en) Efficient application recognition in network traffic
US20060047849A1 (en) Apparatus and method for packet coalescing within interconnection network routers
CN101257457A (en) Method for network processor to copy packet and network processor
US20240039995A1 (en) Data access system and method, device, and network adapter
US11502967B2 (en) Methods and apparatuses for packet scheduling for software-defined networking in edge computing environment
CN109564502B (en) Processing method and device applied to access request in storage device
US9838500B1 (en) Network device and method for packet processing
CN110324198A (en) Loss treating method and packet loss processing unit
US11151073B1 (en) Augmenting data plane functionality with field programmable integrated circuits
US9083725B2 (en) System and method providing hierarchical cache for big data applications
EP2524295B1 (en) Memory management using packet segmenting and forwarding
CN117440053B (en) Multistage cross die access method and system
US9537799B2 (en) Phase-based packet prioritization
US6636859B2 (en) Method and system for reassembling fragmented datagrams utilizing a plurality of concurrently accessible reassembly queues
US9665519B2 (en) Using a credits available value in determining whether to issue a PPI allocation request to a packet engine
CN117640548A (en) Cross-die access method and system
US11824752B2 (en) Port-to-port network routing using a storage device
CN117640549A (en) Cross die reverse order preservation method and system
CN117640550A (en) Cross die forward order preserving method
US7525962B2 (en) Reducing memory access bandwidth consumption in a hierarchical packet scheduler
Pan et al. Nb-cache: Non-blocking in-network caching for high-performance content routers
CN113132273B (en) Data forwarding method and device
US20150254100A1 (en) Software Enabled Network Storage Accelerator (SENSA) - Storage Virtualization Offload Engine (SVOE)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination