CN117389931A - Protocol conversion module and method suitable for bus access to GPU (graphics processing unit) in-core memory - Google Patents

Protocol conversion module and method suitable for bus access to GPU (graphics processing unit) in-core memory

Info

Publication number
CN117389931A
Authority
CN
China
Prior art keywords
nsp
channel signal
request
response
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311698394.8A
Other languages
Chinese (zh)
Other versions
CN117389931B (en)
Inventor
陈明
黄宇浩
何颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xindong Microelectronics Technology Wuhan Co ltd
Original Assignee
Xindong Microelectronics Technology Wuhan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xindong Microelectronics Technology Wuhan Co ltd
Priority to CN202311698394.8A
Priority claimed from CN202311698394.8A
Publication of CN117389931A
Application granted
Publication of CN117389931B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 Information transfer, e.g. on bus
    • G06F13/382 Information transfer, e.g. on bus using universal interface adapter
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1673 Details of memory controller using buffers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/16 Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668 Details of memory controller
    • G06F13/1684 Details of memory controller using multiple buses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7825 Globally asynchronous, locally synchronous, e.g. network on chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0038 System on Chip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00 Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/38 Universal adapter
    • G06F2213/3852 Converter between protocols
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a protocol conversion module and a protocol conversion method suitable for bus access to GPU (graphics processing unit) in-core memory. The protocol conversion module comprises a request processing module and a read response processing module. The request processing module is used for identifying the request operation type according to a request channel signal nsp_req of the NSP protocol interface and, when the request operation type is identified as a read operation, converting the request channel signal nsp_req into a command channel signal bif_cmd of the BIF protocol interface and transmitting the command channel signal bif_cmd to the slave. The read response processing module is configured to receive a read data channel signal bif_return from the slave, generate a response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, and send the response channel signal nsp_resp back to the host. The invention saves a large amount of chip area and reduces the difficulty of back-end layout and routing while enabling the GPU in-core memory to perform normal data interaction with the upper-layer operation unit.

Description

Protocol conversion module and method suitable for bus access to GPU (graphics processing unit) in-core memory
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a protocol conversion module and a protocol conversion method suitable for bus access to GPU (graphics processing unit) in-core memory.
Background
Graphics processing units (GPUs) are mainly used for image- and graphics-related operations. With the development of computer technology, demand for GPU computing power keeps increasing, so GPU cores continue to develop toward larger scale, more complex structure, and smaller area and power consumption, all of which place more stringent demands on the design and implementation of every part of the GPU core. To improve performance, a large shared memory is placed inside the GPU core. The traditional GPU core bus reaches this in-core shared memory through direct connections with register stages inserted along the path (the signals are pipelined beat by beat), but this structure is overly direct: the signal lines become excessively dense, which hinders subsequent layout and routing as well as the expansion and iteration of the GPU core.
Disclosure of Invention
In view of the defects of the prior art and the need to improve on it, the invention provides a protocol conversion module and a protocol conversion method suitable for bus access to GPU (graphics processing unit) in-core memory, which save a large amount of chip area and reduce the difficulty of back-end layout and routing while enabling the GPU in-core memory and the GPU upper-layer operation unit to perform normal data interaction.
To achieve the above object, according to one aspect of the present invention, there is provided a protocol conversion module including a request processing module and a read response processing module; the request processing module is used for identifying the request operation type according to a request channel signal nsp_req of the NSP protocol interface and, when the request operation type is identified as a read operation, converting the request channel signal nsp_req into a command channel signal bif_cmd of the BIF protocol interface and transmitting the command channel signal bif_cmd to the slave; the read response processing module is configured to receive a read data channel signal bif_return from the slave, generate a response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, and send the response channel signal nsp_resp back to the host, so that the host can determine from the response channel signal nsp_resp whether the read request in the request channel signal nsp_req has been executed.
In some embodiments, the request channel signal nsp_req includes a first flag signal TrID, a request data beat number len, and a second flag signal seq_num; the first flag signal TrID is used to identify the request channel signal nsp_req, and the second flag signal seq_num is used to identify the command channel signal bif_cmd; the read response processing module further comprises a first storage unit; the request processing module is further configured to store the first flag signal TrID, the request data beat number len, and the second flag signal seq_num into the first storage unit when the request operation type is identified as a read operation; the read response processing module is further configured to generate a response channel signal nsp_resp according to the data stored in the first storage unit, so that the host can determine from the response channel signal nsp_resp whether the read request in the request channel signal nsp_req corresponding to the first flag signal TrID has been executed.
In some embodiments, the read response processing module is further configured to generate a flag bit flag1 corresponding to the first flag signal TrID, set the flag bit flag1 to a first value, and store it in the first storage unit; the read response processing module is further configured to compare the second flag signal seq_num in the read data channel signal bif_return with the second flag signal seq_num in the first storage unit and, when a matching second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is the first value, convert the corresponding first flag signal TrID in the first storage unit and the read data channel signal bif_return into a response channel signal nsp_resp, so that the host can determine from the response channel signal nsp_resp that one read request in the request channel signal nsp_req corresponding to the first flag signal TrID has been executed.
In some embodiments, the read response processing module is further configured to adjust the value of the second flag signal seq_num and the value of the request data beat number len when a matching second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is the first value, and to generate a response channel signal nsp_resp according to the value of the request data beat number len, so that the host can further determine from the response channel signal nsp_resp whether all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
In some embodiments, when the value of the request data beat number len is not 0, the generated response channel signal nsp_resp contains the first flag signal TrID and the data signal; when the value of the request data beat number len is 0, the generated response channel signal nsp_resp includes the first flag signal TrID, the data signal, and a third flag signal last, where the third flag signal last indicates that all requests have been executed.
In some embodiments, the read response processing module is further configured to, when a matching second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is the first value, add 1 to the value of the second flag signal seq_num, subtract 1 from the value of the request data beat number len, generate a response channel signal nsp_resp, and handle the next returned data in the same way, and so on, until the request data beat number len is 0, at which point the generated response channel signal nsp_resp includes a third flag signal last, so that the host can determine from the third flag signal last that all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
In some embodiments, the read response processing module is further configured to adjust the flag bit flag1 to a second value when a matching second flag signal seq_num exists in the first storage unit, the corresponding flag bit flag1 is the first value, and the value of the request data beat number len has been adjusted to 0.
In some embodiments, the protocol conversion module further includes a write response processing module; the request processing module is further used for converting the request channel signal nsp_req into a command channel signal bif_cmd and a write data channel signal bif_write of the BIF protocol interface and issuing them to the slave when the request operation type is identified as a write operation; the write response processing module is configured to automatically generate a write response after the data of the last write request in the request channel signal nsp_req has been received by the write data channel of the BIF protocol interface, and to send the write response to the host through the response channel of the NSP protocol interface.
In some embodiments, the write response processing module includes a second storage unit; the request processing module is further configured to store the first flag signal TrID in the request channel signal nsp_req into the second storage unit after the data of the last write request in the request channel signal nsp_req has been received by the write data channel of the BIF protocol interface; the write response processing module is further configured to generate a response channel signal nsp_resp of the NSP protocol interface according to the data stored in the second storage unit, so that the host can determine from the response channel signal nsp_resp that all write requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
In some embodiments, the write response processing module is further configured to generate a flag bit flag2 corresponding to the first flag signal TrID, and set the flag bit flag2 to a third value, and store the third value in the second storage unit, to indicate that the corresponding first flag signal TrID is valid.
In some embodiments, the write response processing module is further configured to adjust, after the generated write response is received by the response channel of the NSP protocol interface, a flag bit flag2 corresponding to the first flag signal TrID to a fourth value, where the fourth value is used to indicate that the corresponding first flag signal TrID is invalid.
According to another aspect of the present invention, there is provided a graphics processing unit, including a host, a slave, and the above protocol conversion module.
According to still another aspect of the present invention, there is provided a protocol conversion method including:
identifying a request operation type according to a request channel signal nsp_req of the NSP protocol interface;
when the type of the request operation is identified as a read operation, the following steps are performed:
converting the request channel signal nsp_req into a command channel signal bif_cmd of the BIF protocol interface and transmitting the command channel signal bif_cmd to the slave; and
receiving the read data channel signal bif_return from the slave, generating a response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, and sending the response channel signal nsp_resp back to the host, so that the host can determine from the response channel signal nsp_resp whether the read request in the request channel signal nsp_req has been executed.
In some embodiments, when the request operation type is identified as a read operation, the following steps are also performed: storing a first set of data comprising the first flag signal TrID, the request data beat number len, and the second flag signal seq_num from the request channel signal nsp_req; and generating a response channel signal nsp_resp from the stored first set of data, so that the host can determine from the response channel signal nsp_resp whether the read request in the request channel signal nsp_req corresponding to the first flag signal TrID has been completed.
In some embodiments, the protocol conversion method further includes:
when the type of the request operation is identified as a write operation, the following steps are performed:
converting the request channel signal nsp_req into a command channel signal bif_cmd and a write data channel signal bif_write of the BIF protocol interface and issuing them to the slave; and
after the last write request data in the request channel signal nsp_req is received by the write data channel of the BIF protocol interface, a write response is automatically generated and sent to the host through the response channel of the NSP protocol interface.
In general, compared with the prior art, the above technical solutions conceived by the present invention have the following beneficial effects: the NSP (Arteris NoC Socket Protocol) protocol supported by a network-on-chip (NOC) bus is converted into the bus interface (BIF) protocol of the GPU core. On the one hand, the memory in the GPU core can perform normal data interaction with the GPU upper-layer operation unit; on the other hand, because the NOC topology network is introduced, data flows can be merged, which saves a large amount of chip area and reduces the difficulty of back-end layout and routing. In addition, because the NOC bus connects the functional modules in the chip to a network, it facilitates iterative expansion of the GPU core, so that operation units can be added to the GPU core at low labor cost.
Drawings
FIG. 1 is a schematic diagram of a protocol conversion module suitable for bus access to GPU in-core memory according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the data stored in the first storage unit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the data stored in the second storage unit according to an embodiment of the present invention;
FIG. 4 is a flow chart of a protocol conversion method suitable for bus access to GPU in-core memory according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
With the development of modern information technology, chip integration keeps increasing, requirements on chip area and performance become more and more stringent, and both the number of connections and the bandwidth required between the modules in a chip grow greatly. The conventional point-to-point mesh structure has difficulty supporting such performance requirements, so it has become extremely important to address the development difficulty that arises as chips are iterated.
A network-on-chip (NOC) bus is a network with a topological structure that can packetize the data stream of a transmitting end and transmit it to a receiving end, so that area and performance requirements can be met at the same time. However, because the GPU core bus protocol is specific to the GPU core, a protocol conversion module is required to convert the protocol supported by the NOC bus into the bus protocol of the GPU core. Therefore, the invention designs a protocol conversion module that converts the NSP (Arteris NoC Socket Protocol) protocol supported by the NOC bus into the bus interface (BIF) protocol of the GPU core.
With the protocol conversion module in place, the memory in the GPU core can perform normal data interaction with the GPU upper-layer operation unit. Because the NOC topology network is introduced, data flows can be merged, which saves a large amount of chip area, reduces the difficulty of back-end layout and routing, and facilitates iterative expansion of the GPU core, so that operation units can be added to the GPU core at low labor cost.
In the embodiment of the invention, the NSP protocol of the network on chip comprises two channels, namely a request channel and a response channel; the custom BIF protocol through which the GPU in-core operation unit accesses the in-core memory comprises three channels, namely a command channel, a write data channel and a read data channel.
As shown in fig. 1, a protocol conversion module suitable for accessing a GPU in-core memory by a bus according to an embodiment of the present invention includes a request processing module, a read response processing module, and a write response processing module. The read response processing module comprises a first storage unit (first buffer), and the write response processing module comprises a second storage unit (second buffer).
The request channel signal nsp_req of the NSP protocol interface includes: a request address, a request operation type, a first flag signal TrID, a data signal, a control signal, and a bypass signal. The first flag signal TrID identifies the request channel signal nsp_req; the control signal includes the request data beat number len; the bypass signal comprises a request-related bypass signal and a data-related bypass signal, and includes a second flag signal seq_num that identifies the command channel signal bif_cmd of the BIF protocol interface.
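The field list above can be pictured as a simple record. The following is a minimal sketch in Python rather than the patent's definition: the attribute names addr, opcode, data, req_bypass and data_bypass, and the use of Python types in place of bit widths, are assumptions made for illustration. The text itself names only TrID, len and seq_num.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Opcode(Enum):
    """Request operation type carried on the request channel."""
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class NspReq:
    """Illustrative model of one nsp_req; names and types are assumptions."""
    addr: int                     # request address
    opcode: Opcode                # request operation type
    tr_id: int                    # first flag signal TrID
    length: int                   # control field len (request data beat number)
    seq_num: int                  # second flag signal, carried in the bypass signal
    data: Optional[bytes] = None  # data signal (write data); None for a read
    req_bypass: int = 0           # other request-related bypass bits (hypothetical)
    data_bypass: int = 0          # other data-related bypass bits (hypothetical)


# A read request for three beats, tagged TrID 5, first command tagged seq_num 80.
read_req = NspReq(addr=0x1000, opcode=Opcode.READ, tr_id=5, length=3, seq_num=80)
```

Freezing the dataclass reflects that a request is a value passed between modules; the mutable bookkeeping lives in the storage units described below.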
During a read operation, the request processing module identifies the read operation from the request operation type in the request channel signal nsp_req of the NSP protocol interface, and stores the second flag signal seq_num and the request data beat number len into the first storage unit in the read response processing module, using the first flag signal TrID in the request channel signal nsp_req as the location index. The read response processing module sets the flag bit flag1 corresponding to the location index to a first value and stores it in the first storage unit, to indicate that all data under the location index is valid, as shown in fig. 2. The request processing module also converts the request channel signal nsp_req of the NSP protocol interface into the command channel signal bif_cmd of the BIF protocol interface and issues it to the slave. Specifically, the request processing module converts the request address, the request operation type, the control signal and the request-related bypass signal in the request channel signal nsp_req into the command channel signal bif_cmd of the BIF protocol interface.
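The read-request path described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the patented hardware: requests and commands are plain dictionaries, the key names are hypothetical, the first value of flag1 is taken to be 1, and the step performed by the read response processing module (setting flag1) is folded into the same function for brevity.

```python
def handle_read_request(req: dict, first_buffer: dict) -> dict:
    """Record a read request in the first storage unit and build bif_cmd."""
    if req["opcode"] != "read":
        raise ValueError("this sketch only covers the read path")
    # TrID is the location index; flag1 = 1 marks the entry as valid.
    first_buffer[req["tr_id"]] = {
        "seq_num": req["seq_num"],
        "len": req["len"],
        "flag1": 1,
    }
    # bif_cmd carries the address, operation type, control signal and
    # request-related bypass signal (which includes seq_num).
    return {
        "addr": req["addr"],
        "opcode": req["opcode"],
        "len": req["len"],
        "req_bypass": {"seq_num": req["seq_num"]},
    }


first_buffer = {}
cmd = handle_read_request(
    {"opcode": "read", "addr": 0x2000, "tr_id": 7, "len": 3, "seq_num": 80},
    first_buffer,
)
```

In this sketch TrID stays behind in the first storage unit and is not forwarded in bif_cmd, which is why the return path later has to recover it by matching seq_num.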
The slave receives the command channel signal bif_cmd from the request processing module, generates the read data channel signal bif_return of the BIF protocol interface, and sends it to the read response processing module. The read data channel signal bif_return includes read data and a read data bypass signal, and the read data bypass signal includes the second flag signal seq_num.
In some embodiments, the read response processing module generates a response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, the location index in the first storage unit, and the second flag signal seq_num, the request data beat number len, and the flag bit flag1 corresponding to the location index; the host determines from the response channel signal nsp_resp whether the read request in the request channel signal nsp_req corresponding to the first flag signal TrID has been executed.
In some embodiments, the read response processing module compares the second flag signal seq_num in the read data channel signal bif_return with the second flag signal seq_num in the first storage unit. When a matching second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is the first value, it converts the corresponding location index in the first storage unit and the read data channel signal bif_return into a response channel signal nsp_resp of the NSP protocol interface and returns it to the host. From the response channel signal nsp_resp, the host determines that one read request in the request channel signal nsp_req corresponding to the first flag signal TrID has been completed, that is, one read data transfer has been completed.
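The comparison step can be sketched as a lookup over the first storage unit. This is a minimal Python illustration with hypothetical dictionary keys, assuming the first value of flag1 is 1. Real hardware would presumably compare all entries in parallel, and the loop here only stands in for that.

```python
from typing import Optional


def match_return(bif_return: dict, first_buffer: dict) -> Optional[dict]:
    """Build an nsp_resp for a valid entry whose seq_num matches, else None."""
    for tr_id, entry in first_buffer.items():
        if entry["flag1"] == 1 and entry["seq_num"] == bif_return["seq_num"]:
            # The location index (TrID) plus the returned data form the response.
            return {"tr_id": tr_id, "data": bif_return["data"]}
    return None


first_buffer = {
    3: {"seq_num": 40, "len": 1, "flag1": 0},  # invalid entry, must be ignored
    7: {"seq_num": 80, "len": 3, "flag1": 1},  # valid entry awaiting seq_num 80
}
resp = match_return({"seq_num": 80, "data": 0xAB}, first_buffer)
stale = match_return({"seq_num": 40, "data": 0x00}, first_buffer)
```

Here resp carries TrID 7, while the return tagged 40 is not matched because its entry has flag1 cleared.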
In some embodiments, one first flag signal TrID corresponds to a plurality of second flag signals seq_num. In some embodiments, when a matching second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is the first value, the read response processing module further adjusts the value of the second flag signal seq_num and the value of the request data beat number len, generates a response channel signal nsp_resp according to the value of the request data beat number len, and returns it to the host; the host further determines from the response channel signal nsp_resp whether all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
In some embodiments, when the value of the request data beat number len is not 0, the generated response channel signal nsp_resp contains the first flag signal TrID and the data signal; when the value of the request data beat number len is 0, the generated response channel signal nsp_resp comprises the first flag signal TrID, the data signal and a third flag signal last, where the third flag signal last indicates that all requests have been executed. When the host receives the third flag signal last, it determines that all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
In some embodiments, when a matching second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is the first value, the read response processing module adds 1 to the value of the second flag signal seq_num, subtracts 1 from the value of the request data beat number len, generates a response channel signal nsp_resp, and handles the next returned data in the same way, and so on, until the request data beat number len is 0, at which point the generated response channel signal nsp_resp includes the third flag signal last. After receiving this response channel signal nsp_resp, the host determines that all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
For example, suppose the request processing module has stored the second flag signal seq_num = 80 and the request data beat number len = 3 in the first storage unit. When the first data returns, the second flag signal seq_num contained in the read data channel signal bif_return is 80; the read response processing module adds 1 to the value of the second flag signal seq_num to give 81, subtracts 1 from the value of the request data beat number len to give 2, and generates a response channel signal nsp_resp. When the second data returns, the second flag signal seq_num contained in the read data channel signal bif_return is 81; the read response processing module adds 1 to the value of the second flag signal seq_num to give 82, subtracts 1 from the value of the request data beat number len to give 1, and generates a response channel signal nsp_resp. When the third data returns, the second flag signal seq_num contained in the read data channel signal bif_return is 82; the read response processing module adds 1 to the value of the second flag signal seq_num to give 83, subtracts 1 from the value of the request data beat number len to give 0, and generates a response channel signal nsp_resp that contains the third flag signal last. From the third flag signal last, the host determines that all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed, that is, all the requested data has been read.
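The worked example (seq_num = 80, len = 3) can be replayed with a few lines of Python. This is a sketch under assumptions: the TrID value 7, the data labels d0 to d2 and the dictionary layout are invented for illustration, and flag1 is cleared to 0 once len reaches 0, following the embodiment in which the first value is 1 and the second value is 0.

```python
def on_read_return(entry: dict, tr_id: int, bif_return: dict) -> dict:
    """Update one first-storage-unit entry for a matching return."""
    if entry["flag1"] != 1 or entry["seq_num"] != bif_return["seq_num"]:
        raise ValueError("return does not match this entry")
    entry["seq_num"] += 1  # seq_num expected for the next returned beat
    entry["len"] -= 1      # beats still outstanding
    resp = {"tr_id": tr_id, "data": bif_return["data"]}
    if entry["len"] == 0:
        resp["last"] = True  # third flag signal: all requested data returned
        entry["flag1"] = 0   # entry is no longer valid
    return resp


entry = {"seq_num": 80, "len": 3, "flag1": 1}
responses = [
    on_read_return(entry, tr_id=7, bif_return={"seq_num": s, "data": d})
    for s, d in [(80, "d0"), (81, "d1"), (82, "d2")]
]
```

After the three returns the entry holds seq_num 83 and len 0, and only the third response carries last, which matches the sequence given in the example.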
In some embodiments, when a matching second flag signal seq_num exists in the first storage unit, the corresponding flag bit flag1 is the first value, and the value of the request data beat number len has been adjusted to 0, all requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been replied to, and the read response processing module adjusts the flag bit flag1 to a second value, indicating that all data under the location index is invalid.
In some embodiments, the first value is 1 and the second value is 0.
In the above read operation, the second flag signal seq_num on the read data (return) channel of the BIF protocol interface does not stay constant (for example, it increases step by step), and commands may be interleaved with one another, which makes comparison difficult. To ensure that command information can be compared reliably, when the request processing module stores the related information, the flag bit flag1 is used to indicate that the information is valid, for example by pulling flag1 high. Further, through the combined use of the second flag signal seq_num, the request data beat number len and the flag bit flag1, the information returned on the read data channel of the BIF protocol interface is distinguished, and the stored information is refreshed after each returned data is received, ready for the next comparison. In this way, data returned for read requests is forwarded correctly whether it arrives out of order or interleaved, which ensures smooth communication between the upstream operation unit and the downstream memory of the GPU core.
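The interplay of seq_num, len and flag1 under interleaving can be illustrated with a small Python model of the first storage unit. The class and method names are hypothetical, and the sketch assumes that concurrently outstanding transactions use non-overlapping seq_num ranges. A return that matches no valid entry is simply reported as None here, since the text does not say how the hardware treats that case.

```python
class ReadTracker:
    """Illustrative first storage unit, indexed by TrID."""

    def __init__(self):
        self.entries = {}

    def record(self, tr_id, seq_num, length):
        # Store the request information and mark it valid (flag1 = 1).
        self.entries[tr_id] = {"seq_num": seq_num, "len": length, "flag1": 1}

    def on_return(self, seq_num, data):
        for tr_id, e in self.entries.items():
            if e["flag1"] == 1 and e["seq_num"] == seq_num:
                e["seq_num"] += 1  # refresh for the next comparison
                e["len"] -= 1
                last = e["len"] == 0
                if last:
                    e["flag1"] = 0  # every beat of this TrID has been returned
                return (tr_id, data, last)
        return None  # no valid entry expects this seq_num


tracker = ReadTracker()
tracker.record(tr_id=1, seq_num=80, length=2)
tracker.record(tr_id=2, seq_num=200, length=2)
# Returns for the two transactions arrive interleaved.
out = [tracker.on_return(s, f"d{s}") for s in (200, 80, 81, 201)]
```

Each return is attributed to the right TrID even though the two transactions alternate, because the stored seq_num is refreshed after every match.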
During a write operation, the request processing module identifies the write operation from the request operation type in the request channel signal nsp_req of the NSP protocol interface, stores the first flag signal TrID in the request channel signal nsp_req as a location index into the second storage unit in the write response processing module, and the flag bit flag2 corresponding to the location index is set to a third value and stored in the second storage unit, as shown in fig. 3. The request processing module also converts the request channel signal nsp_req of the NSP protocol interface into a command channel signal bif_cmd and a write data channel signal bif_write of the BIF protocol interface and issues them to the slave. Specifically, the request processing module converts the request address, the request operation type, the control signal and the request-related bypass signal in the request channel signal nsp_req into the command channel signal bif_cmd of the BIF protocol interface, and converts the data signal, the control signal and the data-related bypass signal in the request channel signal nsp_req into the write data channel signal bif_write of the BIF protocol interface.
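The field split for a write request can be sketched as a pure function. This is a Python illustration only: the dictionary keys are hypothetical names for the field groups listed above, and the values in the example are made up.

```python
def convert_write_request(req: dict):
    """Split one write nsp_req into (bif_cmd, bif_write)."""
    # Command channel: address, operation type, control, request-related bypass.
    bif_cmd = {
        "addr": req["addr"],
        "opcode": req["opcode"],
        "control": req["control"],
        "req_bypass": req["req_bypass"],
    }
    # Write data channel: data, control, data-related bypass.
    bif_write = {
        "data": req["data"],
        "control": req["control"],
        "data_bypass": req["data_bypass"],
    }
    return bif_cmd, bif_write


req = {
    "addr": 0x3000,
    "opcode": "write",
    "tr_id": 9,
    "data": b"\x01\x02",
    "control": {"len": 1},
    "req_bypass": {"seq_num": 12},
    "data_bypass": {},
}
bif_cmd, bif_write = convert_write_request(req)
```

The control signal feeds both output channels, while TrID appears in neither; in this sketch it is kept only in the second storage unit for generating the write response.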
On the one hand, the BIF bus protocol requires that the command channel of the BIF protocol interface receive the command before the related information can be transferred on the write data channel of the BIF protocol interface. The write data and related data information carried in a write request of the NSP protocol interface must therefore wait for the command to be received, and the handshake signals between the request channel of the NSP protocol interface and the command channel and write data channel of the BIF protocol interface need to be handled carefully.
On the other hand, under the BIF bus protocol the downstream module does not return a write response to the upstream module after completing a write request, whereas under the NSP bus protocol every request must receive a corresponding response. The protocol conversion module therefore generates a write response automatically after a write request completes, to keep information transfer smooth. Because the BIF bus protocol has no write response, and to better preserve the performance of the protocol conversion module, write responses are sent at a lower priority than read responses: a write response is initiated only when the response channel is idle. For example, if the write response processing module is sending a write response on the response channel of the NSP protocol interface when the read data channel of the BIF protocol interface returns a read response, the write response processing module immediately interrupts the sending of the write response and hands control of the response channel of the NSP protocol interface to the read response processing module.
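The priority rule for the response channel can be sketched as a cycle-by-cycle arbiter. The sketch below is an illustration only: the function name `arbitrate_responses`, the list-based inputs, and the assumption that every response occupies exactly one cycle are hypothetical and are not taken from the specification. It shows read responses always winning the NSP response channel, with queued write responses filling only the idle cycles.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


def arbitrate_responses(
    read_beats: List[Optional[int]],
    pending_writes: List[int],
) -> List[Tuple[str, int]]:
    """Return the sequence of responses placed on the NSP response channel.

    read_beats[i] is the TrID of a read response available in cycle i,
    or None if the read data channel returns nothing in that cycle.
    pending_writes holds the TrIDs of write responses waiting to be sent.
    """
    writes: Deque[int] = deque(pending_writes)
    sent: List[Tuple[str, int]] = []
    cycle = 0
    while cycle < len(read_beats) or writes:
        read = read_beats[cycle] if cycle < len(read_beats) else None
        if read is not None:
            # A read response always takes the channel first.
            sent.append(("read", read))
        elif writes:
            # The channel is idle, so one queued write response may go out.
            sent.append(("write", writes.popleft()))
        cycle += 1
    return sent
```

With read beats in cycles 0, 2, and 3 and two queued write responses, the writes are sent in cycle 1 and in the first cycle after the reads finish, so no read response is ever delayed by a write response.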
In some embodiments, after the last write request data (i.e., the last write data) is received by the write data channel of the BIF protocol interface, the request processing module stores the first flag signal TrID in the request channel signal nsp_req as a location index into the second storage unit in the write response processing module; the write response processing module sets the flag bit flag2 corresponding to the location index to a third value, stores it in the second storage unit, automatically generates a write response, and sends the write response to the host through the response channel of the NSP protocol interface (i.e., as the response channel signal nsp_resp). In some embodiments, after the generated write response is received by the response channel of the NSP protocol interface, the write response processing module adjusts the flag bit flag2 corresponding to the location index to a fourth value, which indicates that all data under the location index is invalid.
In some embodiments, the response channel signal nsp_resp includes the first flag signal TrID, and after receiving the response channel signal nsp_resp, the host determines that all write requests in the request channel signal nsp_req corresponding to the first flag signal TrID have been executed.
In some embodiments, when the flag bit flag2 is the third value, it indicates that all data under the location index is valid. In some embodiments, the third value is 1 and the fourth value is 0.
In the write operation described above, the flag bit flag2 is used to indicate that the related information is valid, for example by pulling flag2 high. This not only prevents the write response data from being overwritten by the read response data stream, but also ensures that the write data is not lost under unexpected conditions, such as when the NSP bus does not accept the response.
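The role of flag2 can be illustrated with a short software model of the second storage unit. This is a sketch under assumptions, not the circuit of the invention: the class name `WriteResponseTable`, its method names, the fixed table size, and the lowest-TrID-first selection order are hypothetical. The third value is taken as 1 and the fourth value as 0, as in the embodiment above.

```python
from typing import List, Optional


class WriteResponseTable:
    """Hypothetical model of the second storage unit, indexed by TrID."""

    def __init__(self, num_ids: int) -> None:
        self.flag2: List[int] = [0] * num_ids  # 0 = invalid (fourth value)

    def on_last_write_data_accepted(self, tr_id: int) -> None:
        """The last write data was received by the BIF write data channel."""
        self.flag2[tr_id] = 1  # 1 = valid (third value): a response is owed

    def next_response(self) -> Optional[int]:
        """Pick a pending write response (lowest TrID first, an assumption)."""
        for tr_id, flag in enumerate(self.flag2):
            if flag == 1:
                return tr_id
        return None

    def try_send(self, nsp_ready: bool) -> Optional[int]:
        """Send one write response if the NSP response channel accepts it."""
        tr_id = self.next_response()
        if tr_id is None or not nsp_ready:
            return None          # flag2 stays set, so nothing is lost
        self.flag2[tr_id] = 0    # response accepted: entry becomes invalid
        return tr_id
```

Because flag2 is cleared only after the response is accepted, a write response that is interrupted by a read response or refused by the NSP side remains recorded and is offered again later.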
As shown in Fig. 4, a protocol conversion method suitable for bus access to GPU in-core memory according to an embodiment of the present invention includes:
Step S401: identify the request operation type according to the request channel signal nsp_req of the NSP protocol interface.
When the request operation type is identified as a read operation, the following steps are performed:
Step S403: convert the request channel signal nsp_req into the command channel signal bif_cmd of the BIF protocol interface and issue it to the slave.
Step S405: receive the read data channel signal bif_return from the slave, generate the response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, and send the response channel signal nsp_resp back to the host, so that the host can determine, according to the response channel signal nsp_resp, whether the read request in the request channel signal nsp_req has been executed.
In some embodiments, when the request operation type is identified as a read operation, the following steps are also performed: store a first set of data comprising the first flag signal TrID, the request data beat number len, and the second flag signal seq_num in the request channel signal nsp_req; and generate the response channel signal nsp_resp according to the stored first set of data, so that the host can determine, according to the response channel signal nsp_resp, whether the read request in the request channel signal nsp_req corresponding to the first flag signal TrID has been executed.
When the request operation type is identified as a write operation, the following steps are performed:
Step S407: convert the request channel signal nsp_req into the command channel signal bif_cmd and the write data channel signal bif_write of the BIF protocol interface and issue them to the slave.
Step S409: after the last write request data in the request channel signal nsp_req is received by the write data channel of the BIF protocol interface, automatically generate a write response and send it to the host through the response channel of the NSP protocol interface.
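The branch structure of steps S401 to S409 can be summarized in a short sketch. It is an illustration under assumptions only: the `NspReq` dataclass, its field names, the dictionary layout of the result, and the function name `convert` are hypothetical, and the sketch models neither the channel handshakes nor the timing of the automatically generated write response.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class NspReq:
    """Simplified stand-in for the request channel signal nsp_req."""
    op: str       # "read" or "write" (request operation type)
    addr: int     # request address
    tr_id: int    # first flag signal TrID
    length: int   # request data beat number len
    data: List[int] = field(default_factory=list)  # write data beats


def convert(req: NspReq) -> Dict[str, object]:
    """S401: identify the operation type, then build the BIF-side signals."""
    if req.op not in ("read", "write"):
        raise ValueError(f"unknown request operation type: {req.op}")
    bif_cmd = {"op": req.op, "addr": req.addr, "len": req.length}
    if req.op == "read":
        # S403: a read only produces a command; the response comes later
        # from bif_return (S405).
        return {"bif_cmd": bif_cmd, "bif_write": [], "auto_write_resp": None}
    # S407: a write produces a command plus the write data beats.
    # S409: the converter itself owes the host a write response.
    return {
        "bif_cmd": bif_cmd,
        "bif_write": list(req.data),
        "auto_write_resp": {"tr_id": req.tr_id},
    }
```

A read request yields only a command, while a write request yields a command, its data beats, and a write response carrying the same TrID that the host used in the request.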
For further details of the protocol conversion method according to the embodiment of the present invention, reference may be made to the description of the protocol conversion module above. The method has the same or similar technical effects, which are not repeated here.
According to the present invention, the NSP protocol supported by the NOC bus is converted into the BIF protocol of the GPU core. On the one hand, the memory inside the GPU core can exchange data normally with the upstream operation units of the GPU; on the other hand, because a NOC topology network is introduced, data streams can be merged, which saves a large amount of chip area and reduces the difficulty of back-end placement and routing. In addition, because the NOC bus connects the functional modules in the chip to a network, it facilitates iterative expansion of the GPU core, so that operation units can be added to the GPU core at low labor cost.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided that they do not contradict one another.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the embodiments described above may be completed by a program instructing the associated hardware; when executed, the program performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules described above, if implemented in the form of software functional modules and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The foregoing is merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall be covered by the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (15)

1. A protocol conversion module, characterized by comprising a request processing module and a read response processing module;
the request processing module is configured to identify a request operation type according to a request channel signal nsp_req of the NSP protocol interface, and, when the request operation type is identified as a read operation, to convert the request channel signal nsp_req into a command channel signal bif_cmd of the BIF protocol interface and issue the command channel signal bif_cmd to the slave;
the read response processing module is configured to receive a read data channel signal bif_return from the slave, generate a response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, and send the response channel signal nsp_resp back to the host, so that the host can determine whether the read request in the request channel signal nsp_req is executed according to the response channel signal nsp_resp.
2. The protocol conversion module according to claim 1, wherein the request channel signal nsp_req includes a first flag signal TrID, a request data beat number len, and a second flag signal seq_num; the first flag signal TrID is used to identify the identity of the request channel signal nsp_req and the second flag signal seq_num is used to identify the identity of the command channel signal bif_cmd;
the read response processing module further comprises a first storage unit; the request processing module is further configured to store a first flag signal TrID, a request data beat number len, and a second flag signal seq_num into the first storage unit when the request operation type is identified as a read operation; the read response processing module is further configured to generate a response channel signal nsp_resp according to the data stored in the first storage unit, so that the host can determine whether the read request in the request channel signal nsp_req corresponding to the first flag signal TrID is executed according to the response channel signal nsp_resp.
3. The protocol conversion module according to claim 2, wherein the read response processing module is further configured to generate a flag bit flag1 corresponding to the first flag signal TrID, and store the flag bit flag1 as a first value in the first storage unit;
the read response processing module is further configured to compare a second flag signal seq_num in the read data channel signal bif_return with the second flag signal seq_num in the first storage unit, and when the matched second flag signal seq_num exists in the first storage unit and the corresponding flag bit flag1 is a first value, convert the corresponding first flag signal TrID in the first storage unit and the read data channel signal bif_return into a response channel signal nsp_resp, so that the host can determine that the execution of one read request in the request channel signal nsp_req corresponding to the first flag signal TrID is completed according to the response channel signal nsp_resp.
4. The protocol conversion module according to claim 3, wherein the read response processing module is further configured to adjust a value of the second flag signal seq_num and a value of the request data beat number len when there is a matched second flag signal seq_num in the first storage unit and the corresponding flag bit flag1 is a first value, and generate a response channel signal nsp_resp according to the value of the request data beat number len, so that the host can further determine whether all read requests in the request channel signal nsp_req corresponding to the first flag signal TrID are executed according to the response channel signal nsp_resp.
5. The protocol conversion module according to claim 4, wherein the generated response channel signal nsp_resp contains the first flag signal TrID and the data signal when the value of the request data beat number len is not 0; when the value of the request data beat number len is 0, the generated response channel signal nsp_resp includes a first flag signal TrID, a data signal, and a third flag signal last, where the third flag signal last is used to identify that all requests are executed.
6. The protocol conversion module according to claim 5, wherein the read response processing module is further configured to, when there is a matched second flag signal seq_num in the first storage unit and the corresponding flag bit flag1 is a first value, add 1 to the value of the second flag signal seq_num, subtract 1 from the value of the request data beat number len, generate a response channel signal nsp_resp, execute a next request, and so on, until the request data beat number len is 0, generate a response channel signal nsp_resp, and enable the response channel signal nsp_resp to include a third flag signal last, so that it is possible to determine, according to the third flag signal last, that the read requests in the request channel signal nsp_req corresponding to the first flag signal TrID are all executed.
7. The protocol conversion module according to claim 6, wherein the read response processing module is further configured to adjust the flag bit flag1 to the second value when there is a matched second flag signal seq_num in the first storage unit, the corresponding flag bit flag1 is the first value, and the value of the request data beat number len is adjusted to be 0.
8. The protocol conversion module according to any one of claims 1 to 7, further comprising a write response processing module;
the request processing module is further configured to convert a request channel signal nsp_req into a command channel signal bif_cmd and a write data channel signal bif_write of the BIF protocol interface to be issued to the slave when the request operation type is identified as a write operation;
the write response processing module is further configured to automatically generate a write response after the data of the last write request in the request channel signal nsp_req is received by the write data channel of the BIF protocol interface, and send the write response to the host through the response channel of the NSP protocol interface.
9. The protocol conversion module according to claim 8, wherein the write response processing module includes a second storage unit; the request processing module is further configured to store a first flag signal TrID in the request channel signal nsp_req into the second storage unit after the data of the last write request in the request channel signal nsp_req is received by a write data channel of the BIF protocol interface; the write response processing module is further configured to generate a response channel signal nsp_resp of the NSP protocol interface according to the data stored in the second storage unit, so that the host can determine that all the write requests in the request channel signal nsp_req corresponding to the first flag signal TrID are executed according to the response channel signal nsp_resp.
10. The protocol conversion module according to claim 9, wherein the write response processing module is further configured to generate a flag bit flag2 corresponding to the first flag signal TrID, and to set the flag bit flag2 to a third value, and to store the third value in the second storage unit, for indicating that the corresponding first flag signal TrID is valid.
11. The protocol conversion module according to claim 10, wherein the write response processing module is further configured to adjust a flag bit flag2 corresponding to the first flag signal TrID to a fourth value for indicating that the corresponding first flag signal TrID is invalid after the generated write response is received by the response channel of the NSP protocol interface.
12. A graphics processing unit, comprising a host, a slave, and the protocol conversion module according to any one of claims 1 to 11.
13. A method of protocol conversion, comprising:
identifying a request operation type according to a request channel signal nsp_req of the NSP protocol interface;
when the type of the request operation is identified as a read operation, the following steps are performed:
converting the request channel signal nsp_req into a command channel signal bif_cmd of the BIF protocol interface and transmitting the command channel signal bif_cmd to the slave; and
receiving the read data channel signal bif_return from the slave, generating a response channel signal nsp_resp of the NSP protocol interface according to the read data channel signal bif_return, and sending the response channel signal nsp_resp back to the host, so that the host can determine whether the read request in the request channel signal nsp_req is executed according to the response channel signal nsp_resp.
14. The protocol conversion method according to claim 13, wherein when the request operation type is identified as a read operation, the following steps are further performed: storing a first set of data including a first flag signal TrID, a request data beat number len, and a second flag signal seq_num in a request channel signal nsp_req; and generating a response channel signal nsp_resp from the stored first set of data, so that the host can determine whether the read request in the request channel signal nsp_req corresponding to the first flag signal TrID is completed according to the response channel signal nsp_resp.
15. The protocol conversion method according to claim 13, further comprising:
when the type of the request operation is identified as a write operation, the following steps are performed:
converting the request channel signal nsp_req into a command channel signal bif_cmd and a write data channel signal bif_write of the BIF protocol interface to be issued to the slave; and
after the last write request data in the request channel signal nsp_req is received by the write data channel of the BIF protocol interface, a write response is automatically generated and sent to the host through the response channel of the NSP protocol interface.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311698394.8A CN117389931B (en) 2023-12-12 Protocol conversion module and method suitable for bus access to GPU (graphics processing unit) nuclear memory


Publications (2)

Publication Number Publication Date
CN117389931A 2024-01-12
CN117389931B 2024-05-03



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination