CN114116553B - Data processing device, method and system - Google Patents

Data processing device, method and system

Info

Publication number
CN114116553B
CN114116553B (application CN202111445101.6A)
Authority
CN
China
Prior art keywords
access request
transmission
data
buffer
data processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111445101.6A
Other languages
Chinese (zh)
Other versions
CN114116553A (en)
Inventor
潘治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Haiguang Information Technology Co Ltd
Original Assignee
Haiguang Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Haiguang Information Technology Co Ltd
Priority to CN202111445101.6A
Publication of CN114116553A
Application granted
Publication of CN114116553B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/20Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/16Handling requests for interconnection or transfer for access to memory bus
    • G06F13/1668Details of memory controller
    • G06F13/1673Details of memory controller using buffers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bus Control (AREA)

Abstract

The present disclosure provides a data processing apparatus, method, and system. The data processing apparatus includes a calculation unit, a transmission access request buffer, and a command processing unit. The calculation unit is configured to generate a transmission access request in response to data that needs to be transmitted; the transmission access request buffer is configured to buffer the transmission access request; and the command processing unit is configured to: in response to a transmission access request buffered in the transmission access request buffer, provide the transmission access request to a data transmission control device, whereby the data transmission control device responds to the transmission access request and transmits the data to be transmitted. The data processing apparatus can reduce system overhead and improve system performance.

Description

Data processing device, method and system
Technical Field
Embodiments of the present disclosure relate to a data processing apparatus, a data processing method, and a data processing system.
Background
Direct Memory Access (DMA) is a mechanism for fast data transfer in computer systems. DMA transfers data from one address space to another. A Central Processing Unit (CPU) initiates a DMA transfer operation, but the transfer itself is performed and completed directly by a DMA controller; the DMA controller takes over control of the system bus from the CPU and operates independently of it. DMA transfers mainly involve four cases, which are essentially the same in that data is moved from one memory region to another. The four cases are: peripheral to memory (Device to Host, D2H), memory to peripheral (H2D), memory to memory (H2H), and peripheral to peripheral (D2D) (the memory is considered to belong to the host side).
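As a purely illustrative sketch (not part of the patent; the type and field names are assumptions), the four transfer cases can be captured by a minimal DMA descriptor that records a direction together with the source address, destination address, and length:

```cpp
#include <cstdint>

// Hypothetical illustration: the direction of a DMA transfer between host
// memory and a peripheral device (memory is considered to be on the host side).
enum class DmaDirection { D2H, H2D, H2H, D2D };

// A minimal DMA descriptor: whatever the direction, the transfer is described
// by a source address, a destination address, and a length in bytes.
struct DmaDescriptor {
    DmaDirection direction;
    uint64_t     src_addr;   // transfer source address
    uint64_t     dst_addr;   // transfer destination address
    uint32_t     length;     // size of the data to transfer, in bytes
};
```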
Disclosure of Invention
At least one embodiment of the present disclosure provides a data processing apparatus including:
a computing unit configured to generate a transmission access request in response to data that needs to be transmitted;
a transmit access request buffer configured to buffer the transmit access request;
a command processing unit configured to: providing the transmission access request to a data transmission control device in response to the transmission access request buffered in the transmission access request buffer, whereby the data transmission control device responds to the transmission access request to transmit the data to be transmitted.
For example, in a data processing apparatus according to at least one embodiment of the present disclosure, the computing unit is further configured to notify a central processing unit other than the data processing apparatus after generating the transmission access request.
For example, in the data processing apparatus according to at least one embodiment of the present disclosure, the calculation unit is further configured to, after generating the transmission access request, inquire whether the transmission access request buffer is full, and in response to the transmission access request buffer not being full, write the transmission access request into the transmission access request buffer.
For example, in a data processing apparatus according to at least one embodiment of the present disclosure, the transmission access request includes: a transmission source address, a transmission destination address, and a transmission data size.
For example, in the data processing apparatus according to at least one embodiment of the present disclosure, the transmission access request further includes a transmission status, and the transmission status includes pending transmission, in transmission, transmission completion, or transmission failure.
For example, the data processing apparatus according to at least one embodiment of the present disclosure further includes: a command buffer configured to buffer an operation instruction for the command processing unit; wherein the command processing unit is further configured to: preferentially process at least one transmission access request buffered in the transmission access request buffer in a case where at least one operation instruction is buffered in the command buffer and at least one transmission access request is buffered in the transmission access request buffer.
For example, in a data processing apparatus according to at least one embodiment of the present disclosure, the command processing unit is further configured to: after processing each operation instruction in the command buffer, inquiring whether at least one transmission access request is cached in the transmission access request buffer, responding to the transmission access request to be processed cached in the transmission access request buffer, and after processing the transmission access request to be processed cached in the transmission access request buffer, returning to process other operation instructions in the command buffer.
For example, in a data processing apparatus according to at least one embodiment of the present disclosure, the transmission access request buffer is a first-in first-out buffer.
For example, in a data processing apparatus according to at least one embodiment of the present disclosure, the command processing unit is further configured to provide the transmission access request to the data transmission control apparatus other than the data processing apparatus.
At least one embodiment of the present disclosure also provides a data processing system, including: the data processing apparatus described in any of the above and the data transmission control apparatus described above, the data transmission control apparatus being configured to transmit the data to be transmitted in response to the transmission access request.
For example, a data processing system according to at least one embodiment of the present disclosure further comprises a central processing unit, wherein the computing unit is further configured to notify the central processing unit after generating the transmission access request.
For example, in a data processing system according to at least one embodiment of the present disclosure, the central processing unit is configured to continue to check a completion status of the transmission access request after receiving the notification about the transmission access request.
For example, in a data processing system according to at least one embodiment of the present disclosure, the data transfer control device includes a direct memory access device.
At least one embodiment of the present disclosure also provides a data processing method, including: generating a transmission access request in response to data that needs to be transmitted; caching the transmission access request; providing the transmission access request to a data transmission control device in response to the buffered transmission access request; and enabling the data transmission control device to respond to the transmission access request to transmit the data needing to be transmitted.
For example, the data processing method according to at least one embodiment of the present disclosure further includes: caching an operation instruction; and causing a command processing unit that executes the operation instruction and the transmission access request to preferentially process the transmission access request.
Drawings
To more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings of the embodiments will be briefly introduced below, and it is apparent that the drawings in the following description relate only to some embodiments of the present disclosure and are not limiting to the present disclosure.
FIG. 1 shows a schematic diagram of a computer system;
FIG. 2 shows a schematic diagram of a data processing apparatus according to at least one embodiment of the present disclosure;
FIG. 3 shows a schematic diagram of a data processing system in accordance with at least one embodiment of the present disclosure;
fig. 4 shows a schematic diagram of a data processing method according to at least one embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more apparent, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the drawings of the embodiments of the present disclosure. It is to be understood that the described embodiments are only a few embodiments of the present disclosure, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the described embodiments of the disclosure without inventive step, are within the scope of protection of the disclosure.
The computer system may include one or more central processing units (CPUs) and one or more depth calculation units, the depth calculation units including, for example, graphics processing units (GPUs), tensor processing units (TPUs), neural processing units (NPUs), and the like. To improve the overall performance of the system as much as possible, the depth calculation unit and the CPU need to cooperate closely. For example, when the depth calculation unit continuously performs large-scale matrix operations, large batches of data need to be transferred frequently. In this case, DMA offers higher transfer efficiency.
Fig. 1 shows a computer system including a central processing unit (CPU) 101, a depth calculation unit 100, a DMA controller 103, and a bus 104; the central processing unit (CPU) 101, the depth calculation unit 100, and the DMA controller 103 communicate with one another through the bus 104.
When data required by or generated by the depth calculation unit 100 needs to be transferred by DMA, the following steps may be employed, for example.
(1) The depth calculation unit 100 performs the data calculation according to instructions and notifies the central processing unit 101 that the data to be transmitted is ready; the data may be transmitted from the depth calculation unit 100 to the memory or another peripheral device, or from the memory or another peripheral device to the depth calculation unit 100.
(2) The central processing unit 101 issues a DMA transfer access request through an Application Program Interface (API) of the operating system according to an instruction of an application program currently running in the system, for example, the DMA transfer access request is written into a running queue of the operating system.
(3) The run queue writes the DMA transfer access request and other requests generated during the current program run, for example, in the form of a PM4 packet, to a command buffer shared by the CPU/depth calculation unit.
(4) A command processing unit (CP) of the depth calculation unit 100 processes various packets buffered in the command buffer, thereby providing the DMA transfer access request to a DMA controller (DMA Engine).
(5) The DMA controller performs the data transfer and moves the data to be transmitted directly from the source address to the destination address. After the DMA controller finishes the data transfer, it notifies the CPU that the transfer is complete.
Steps (2)-(3) above still involve, to some extent, work that the CPU has to perform. When a large number of data transfer operations are carried out, the above steps are repeated many times; in that case, steps (2)-(3) adversely affect system performance.
At least one embodiment of the present disclosure provides a data processing apparatus including a calculation unit (Calculation Unit), a transmission access request buffer (Request Buffer), and a command processing unit (Command Processor). The calculation unit is configured to generate a transmission access request in response to data that needs to be transmitted. The transmission access request buffer is configured to buffer the transmission access request. The command processing unit is configured to: in response to the transmission access request buffered in the transmission access request buffer, provide the transmission access request to a data transmission control device, whereby the data transmission control device responds to the transmission access request and transmits the data to be transmitted.
At least one embodiment of the present disclosure also provides a data processing system including the above-described data processing apparatus and a data transmission control apparatus configured to respond to a transmission access request to transmit data that needs to be transmitted.
At least one embodiment of the present disclosure also provides a data processing method, including: generating a transmission access request in response to data that needs to be transmitted; caching the transmission access request; providing the transmission access request to the data transmission control device in response to the cached transmission access request; so that the data transmission control means responds to the transmission access request to transmit the data to be transmitted.
When the data processing device, the system and the method of the embodiment of the disclosure are applied to a DMA mode, the overhead of a CPU can be further reduced, the effective bandwidth of DMA operation is increased, and the overall performance of the system is improved.
The data processing apparatus of the embodiments of the present disclosure may be, for example, a depth calculation unit; the depth calculation unit may be embodied as a graphics processing unit (GPU), a tensor processing unit (TPU), a neural processing unit (NPU), or the like. The embodiments of the present disclosure are not limited thereto; the data processing apparatus may also be implemented by, for example, an integrated circuit.
FIG. 2 shows a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure; FIG. 3 shows a schematic diagram of a data processing system according to an embodiment of the present disclosure, including, for example, the data processing apparatus of FIG. 2. For example, the data processing apparatus is a depth calculation unit, and the data processing system is a computer system in which the data processing apparatus is on the peripheral (Device) side relative to the CPU; correspondingly, the CPU and the memory are on the host (Host) side.
As shown in fig. 2, the data processing apparatus 200 includes a command processing unit 211, a command buffer 212, a transfer access request buffer 213, one or more computing units 230, and the like. The data processing device 200 may also comprise a storage device (not shown), such as a High Bandwidth Memory (HBM), for example. For example, the storage device of the data processing apparatus 200 may be accessed by a device (e.g., a CPU or another data processing apparatus) other than the data processing apparatus 200.
Fig. 2 shows two calculation units 230 as an example; other possible calculation units are omitted. Each calculation unit 230 includes a thread bundle scheduling/dispatch module, multiple computing cores, register files, a shared L1 cache, and the like. In order to schedule the threads that perform computational tasks among the plurality of calculation units 230, the data processing apparatus 200 further comprises a thread block scheduling unit 221.
The data processing apparatus may be used for computational tasks such as matrix computation and image rendering, which may be performed in parallel by a plurality of threads. For example, before being executed, the threads are divided into thread blocks in the thread block scheduling unit 221, and the thread blocks are then distributed to the respective calculation units. All threads in a thread block are typically assigned to the same calculation unit for execution. Meanwhile, a thread block is split into thread bundles (warps); for example, each thread bundle contains a fixed number of threads (or fewer than that fixed number), e.g., 32 threads. Multiple thread blocks may be executed in the same calculation unit or in different calculation units.
In each calculation unit, the thread bundle scheduling/dispatch module schedules and dispatches thread bundles so that the multiple computing cores of the calculation unit 230 run the corresponding thread bundles. Each computing core includes an arithmetic logic unit (ALU), a floating-point computation unit, and the like. Depending on the number of computing cores in the calculation unit, the thread bundles of one thread block can be executed simultaneously or in a time-shared manner. The threads in each thread bundle execute the same instruction; for example, instruction fetching, decoding, and issuing are all completed in the thread bundle scheduling/dispatch module. Memory access instructions may be issued to a shared cache (e.g., the shared L1 cache) in the calculation unit, or further to a unified cache, for read and write operations.
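As an illustrative sketch only (the function name is a hypothetical stand-in; the bundle size of 32 is taken from the example above, and nothing here is the patent's implementation), partitioning a thread block into thread bundles might look like:

```cpp
#include <cstdint>
#include <vector>

// Illustrative only: split a thread block of block_size threads into thread
// bundles of at most 32 threads each; the last bundle may hold fewer threads.
constexpr uint32_t kBundleSize = 32;

std::vector<uint32_t> split_into_bundles(uint32_t block_size) {
    std::vector<uint32_t> bundle_sizes;
    for (uint32_t remaining = block_size; remaining > 0;) {
        uint32_t n = remaining < kBundleSize ? remaining : kBundleSize;
        bundle_sizes.push_back(n);
        remaining -= n;
    }
    return bundle_sizes;  // e.g. block_size = 100 -> {32, 32, 32, 4}
}
```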
In the course of performing a computing task, the calculation unit 230 needs to obtain input data to be processed and generates result data; these data may be stored, for example, in a storage device (e.g., an HBM) of the data processing apparatus 200. The input data or result data may be transferred (input or output) by DMA, in which case the calculation unit 230 generates a transmission access request for the DMA. The transmission access request buffer 213 is used to buffer such transmission access requests, and the calculation unit 230 submits the transmission access request to the transmission access request buffer 213 for buffering. For example, after generating the transmission access request, the calculation unit 230 notifies a central processing unit outside the data processing apparatus 200, for example, via a bus.
The transmission access request buffer 213 may include a plurality of entries according to its capacity and can thus hold a plurality of transmission access requests. For example, the transmission access request buffer 213 may be a first-in-first-out (FIFO) queue, so that the transmission access request generated first is processed first. For example, when the transmission access request buffer 213 is not full, new transmission access requests can be received and buffered until the full state is reached.
For example, the calculation unit 230 queries whether the transmission access request buffer 213 is full after generating the transmission access request, and writes the transmission access request into the transmission access request buffer 213 in case the transmission access request buffer 213 is not full, otherwise, for example, waits, or discards the current transmission access request.
For example, the transmission access request may include a transmission source address, a transmission destination address, and a transmission data size. Depending on specific requirements, the transmission source address and the transmission destination address may be storage addresses of a storage device in the data processing apparatus 200, storage addresses of the system memory, or storage addresses of other peripheral devices (e.g., another data processing apparatus) in the computer system, and so on; the embodiments of the present disclosure are not limited in this respect. Depending on the specific situation, the transmission access requests may correspond to four cases, for example: peripheral to memory (Device to Host, D2H), memory to peripheral (H2D), memory to memory (H2H), and peripheral to peripheral (D2D). The transmission data size refers to the length of the data transferred continuously when performing DMA and may, for example, be equal to the bandwidth of the system bus (e.g., 64 bits or 128 bits).
In addition to the transmission source address, the transmission destination address, and the transmission data size, the transmission access request buffer 213 may further record the status of each transmission access request after storing it; for example, the status may include: pending transmission, in transmission, transmission complete, or transmission failed. Such state information may be accessed, or even modified, by the command processing unit 211 or by other devices (e.g., a CPU or the data transmission control device) that can access the transmission access request buffer 213. The transmission access request may also include other states, as needed.
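For illustration only, the transmission access request and the transmission access request buffer 213 described above might be modeled as follows; the type names, the use of a software FIFO, and the capacity handling are assumptions made for this sketch, not the patent's implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <optional>

// Illustrative transmission status, following the states listed above.
enum class Status { Pending, InTransmission, Complete, Failed };

// Illustrative transmission access request: source address, destination
// address, data size, and a transmission status.
struct TransferAccessRequest {
    uint64_t src_addr;
    uint64_t dst_addr;
    uint32_t size;                     // length of data to transfer, e.g. in bytes
    Status   status = Status::Pending;
};

// Bounded first-in-first-out buffer, in the role of the transmission access
// request buffer 213.
class TransferRequestBuffer {
public:
    explicit TransferRequestBuffer(std::size_t capacity) : capacity_(capacity) {}

    // The computing unit first queries whether the buffer is full and writes
    // the request only when it is not; otherwise it may wait or drop it.
    bool try_push(const TransferAccessRequest& req) {
        if (queue_.size() >= capacity_) return false;  // buffer is full
        queue_.push_back(req);
        return true;
    }

    // The command processing unit takes the oldest request first (FIFO order).
    std::optional<TransferAccessRequest> pop_pending() {
        if (queue_.empty()) return std::nullopt;
        TransferAccessRequest req = queue_.front();
        queue_.pop_front();
        return req;
    }

    bool empty() const { return queue_.empty(); }

private:
    std::size_t capacity_;
    std::deque<TransferAccessRequest> queue_;
};
```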
For example, in at least one example, the command processing unit 211 may query the transmission access request buffer 213 for at least one pending transmission access request; if there is a pending transmission access request, it submits the request to a data transmission control device (e.g., a DMA controller), or submits the pending requests to the data transmission control device in turn, for the data transmission control device to perform the corresponding data transfer operations. The data transmission control device processes each transmission access request according to its specific contents.
For example, after a transmission access request has been processed, it may be deleted from the transmission access request buffer 213, e.g., once it is determined that the transfer has completed or failed. This operation may be performed by the command processing unit 211: when querying the transmission access request buffer 213 for pending transmission access requests, it also obtains the status of the requests still buffered there and deletes a transmission access request from the buffer 213 once it determines that the corresponding data transmission has completed or failed.
In an embodiment of the present disclosure, the command buffer 212 within the data processing apparatus 200 is used to buffer instructions, requests, and the like from a device such as a CPU or from the data processing apparatus 200 itself; these are buffered, for example, in the form of packets (e.g., PM4 packets). The command buffer 212 may include a plurality of entries according to its capacity and can thus hold a plurality of packets. Correspondingly, the command processing unit 211 is also configured to execute the instructions or requests buffered in the command buffer 212 and may, for example, also communicate and interact with devices such as the CPU. For example, the command buffer 212 and the transmission access request buffer 213 may be provided independently of each other within the data processing apparatus 200, or may share the same physical storage device while being logically controlled and managed separately.
For example, in at least one example, the command processing unit 211 preferentially processes at least one transmission access request buffered in the transmission access request buffer 213 when at least one operation instruction is buffered in the command buffer 212 and at least one transmission access request is buffered in the transmission access request buffer 213. That is, when the command processing unit 211 simultaneously faces a pending transmission access request and a pending operation instruction, it processes the transmission access request first and submits it to the data transmission control device.
Specifically, in one example, after processing each operation instruction in the command buffer, the command processing unit 211 queries whether at least one transmission access request is buffered in the transmission access request buffer 213; if at least one transmission access request is buffered there, the command processing unit 211 first processes the pending transmission access requests buffered in the transmission access request buffer 213 and only then returns to processing other operation instructions in the command buffer 212. For example, after all transmission access requests in the transmission access request buffer 213 have been supplied to the data transmission control device, the command processing unit 211 returns to the position where it previously left the command buffer 212 and continues processing operation instructions from there.
For example, in one example, in response to a message about a state change (e.g., data transmission complete) of a transmission access request still buffered in the transmission access request buffer 213, the command processing unit 211 performs subsequent processing on that request, e.g., modifying its state information or deleting it.
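A minimal sketch of the prioritization described above, reusing the TransferAccessRequest and TransferRequestBuffer types from the earlier sketch; Instruction, DmaController, and process_commands are hypothetical stand-ins, not the patent's implementation:

```cpp
#include <deque>
#include <functional>

// Illustrative stand-in for an operation instruction (e.g. a PM4 packet).
struct Instruction { int opcode = 0; };

// Illustrative stand-in for the data transmission control device interface.
struct DmaController {
    void submit(const TransferAccessRequest& req) {
        (void)req;  // placeholder: a real controller would start the transfer
    }
};

// The command processing unit checks the request buffer around each operation
// instruction and drains all pending transmission access requests first, then
// returns to the operation instructions in the command buffer.
void process_commands(std::deque<Instruction>& command_buffer,
                      TransferRequestBuffer& request_buffer,
                      DmaController& dma,
                      const std::function<void(const Instruction&)>& execute) {
    while (!command_buffer.empty()) {
        // Transmission access requests have priority over operation instructions.
        while (auto req = request_buffer.pop_pending()) {
            dma.submit(*req);                // hand each request to the DMA controller
        }
        execute(command_buffer.front());     // then process one operation instruction
        command_buffer.pop_front();
    }
}
```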
The computer system shown in fig. 3 includes a Central Processing Unit (CPU) 201, one or more data processing devices 200, a storage device 202, a data transfer control device 203, and a bus 204. The data processing apparatus 200 is the data processing apparatus shown in fig. 2, in which 2 data processing apparatuses 200 are shown as an example. The 2 data processing apparatuses 200 are the data processing apparatus 200-1 and the data processing apparatus 200-2, respectively, and data transfer may be performed by the data transfer control apparatus according to the transfer access request between the data processing apparatus 200-1 and the data processing apparatus 200-2, for example.
In the embodiment of the present disclosure, the central processing unit 201 may be, for example, a single-core processor or a multi-core processor, may be a RISC or CISC processor, and may be, for example, an ARM processor, a RISC-V processor, and the like, which is not limited in this respect. The data transfer control device 203 is, for example, a DMA controller provided separately or integrated in, for example, the central processing unit 201 or the like. The storage device 202 may be a system memory or an external memory (e.g., a hard disk) of the computer system, for example, a semiconductor storage device, etc., and the embodiment of the disclosure does not limit the type, structure, etc. of the storage device.
The central processing unit 201, the data processing apparatus 200, the storage apparatus 202, and the DMA controller 203 communicate with each other through a bus 204. Embodiments of the present disclosure are not limited as to the type of bus, and may be, for example, various suitable buses, such as a PCIE bus.
For example, the central processing unit 201 continues to check the completion status of the transmission access request after receiving the notification about the transmission access request, and notifies, for example, the application program after determining that the transmission access request is completed.
For example, the data transmission control device 203 may perform a data transmission operation from the transmission source address to the transmission destination address according to the transmission access request. It may further update the status of the corresponding transmission access request during the transmission, after the transmission has completed, or after the transmission has been retried several times and still fails; for example, it provides a message about the status change so that the status of the corresponding transmission access request stored in the transmission access request buffer 213 of the data processing device 200 is updated.
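For illustration only (assuming the types from the earlier sketches; copy_memory is a placeholder for the actual hardware transfer, and the retry count is an arbitrary example), the status updates performed by the data transmission control device 203 might be modeled as:

```cpp
// Placeholder for the actual transfer of `size` bytes from src to dst over the
// bus; a real DMA engine would report success or failure of the operation.
bool copy_memory(uint64_t src, uint64_t dst, uint32_t size) {
    (void)src; (void)dst; (void)size;
    return true;
}

// The data transmission control device updates the request status during the
// transfer, on completion, or after repeated failed attempts.
void perform_transfer(TransferAccessRequest& req, int max_attempts = 3) {
    req.status = Status::InTransmission;       // status update: transfer started
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        if (copy_memory(req.src_addr, req.dst_addr, req.size)) {
            req.status = Status::Complete;     // status update: transfer done
            return;                            // the CPU can then be notified
        }
    }
    req.status = Status::Failed;               // status update after repeated failure
}
```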
An exemplary operation of the data processing apparatus and the data processing system according to at least one embodiment of the present disclosure will be described below with reference to fig. 2 and 3.
During operation, the CPU 201 sends a computational task to one or more of the data processing apparatuses 200 for processing, according to the currently running application.
For example, with respect to the data processing apparatus 200-1, during execution of the calculation task the calculation unit 230 performs a calculation according to instructions and generates a transmission access request (referred to here as the "first transmission access request"). The calculation unit 230 then queries whether the transmission access request buffer 213 is full; if it is not full, the calculation unit 230 writes the pending first transmission access request into the transmission access request buffer 213 and marks its status as "pending transmission". The first transmission access request records the corresponding transmission source address, transmission destination address, and transmission data size. At the same time, the calculation unit 230 notifies the CPU 201 that the first transmission access request has been generated, and after receiving the notification the CPU 201 continues to monitor the completion status of the first transmission access request.
While the calculation unit 230 executes the calculation task, the command processing unit 211 in the data processing apparatus 200 processes the operation instructions buffered in the command buffer 212. After executing one operation instruction P1, the command processing unit 211 queries whether the transmission access request buffer 213 currently holds a transmission access request to be processed. Since the first transmission access request has been written into the transmission access request buffer 213, the command processing unit 211 accesses it (it has a higher priority than the operation instructions in the command buffer 212), finds that its status is "pending transmission", and therefore processes the first transmission access request by sending it to the data transmission control device 203. If other transmission access requests remain in the transmission access request buffer 213, the command processing unit 211 continues to process these pending requests until all of them have been successfully sent to the data transmission control device 203; it then returns to the command buffer 212 and continues processing the remaining operation instructions, starting from the position of the previously processed operation instruction P1.
The data transmission control device 203 performs the corresponding data transmission operation according to the transmission source address, the transmission destination address, and the transmission data size recorded in the first transmission access request, and notifies the command processing unit 211 of the data processing device 200-1 to change the status of the first transmission access request to "in transmission". After the data transmission operation is completed, the data transmission control device 203 notifies the command processing unit 211 of the data processing device 200-1 to change the status of the first transmission access request in the transmission access request buffer 213 to "transmission complete"; the completed first transmission access request is then deleted from the transmission access request buffer 213, for example after a predetermined time or when triggered by another operation. Further, the data transmission control device 203 may also notify the CPU 201 that the first transmission access request has been completed, so that the CPU 201 can feed this information back to the upper-level application.
In the above process, the data processing apparatus 200 actively initiates the transmission access request, so that compared with initiating the transmission access request via the CPU, the system overhead can be reduced, and the system performance can be improved.
Corresponding to the device and the system, the embodiment of the disclosure also provides a data processing method. As shown in fig. 4, the data processing method includes steps 301 to 304 as follows:
step 301: a transmit access request is generated in response to data that needs to be transmitted.
Step 302: the cache transmits the access request.
Step 303: in response to the buffered transfer access request, the transfer access request is provided to the data transfer control device.
Step 304: so that the data transmission control means responds to the transmission access request to transmit the data to be transmitted.
For example, the calculation unit performs the calculation and determines the data that needs to be transmitted, and generates the transmission access request in response to that data. The transmission access request is buffered by the transmission access request buffer. The command processing unit provides the transmission access request to the data transmission control device in response to the transmission access request buffered in the transmission access request buffer. The data transmission control device then responds to the transmission access request and transmits the data that needs to be transmitted.
For example, the data processing method according to at least one embodiment of the present disclosure further includes: caching the operation instruction, so that a command processing unit that executes the operation instruction and the transmission access request preferentially processes the transmission access request.
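Tying steps 301 to 304 together, a purely illustrative usage of the earlier sketches (all names, addresses, and sizes are assumptions, not the patent's implementation) might be:

```cpp
int main() {
    TransferRequestBuffer request_buffer(/*capacity=*/8);
    std::deque<Instruction> command_buffer = {Instruction{1}, Instruction{2}};
    DmaController dma;

    // Steps 301-302: a transmission access request is generated for data that
    // needs to be transmitted and buffered if the request buffer is not full.
    TransferAccessRequest req{/*src_addr=*/0x1000, /*dst_addr=*/0x2000, /*size=*/256};
    if (!request_buffer.try_push(req)) {
        // Buffer full: the computing unit may wait or drop the request.
    }

    // Steps 303-304: the command processing unit hands buffered requests to the
    // data transmission control device with priority over operation instructions,
    // and the device performs the transfer.
    process_commands(command_buffer, request_buffer, dma,
                     [](const Instruction&) { /* execute operation instruction */ });
    return 0;
}
```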
For other possible operation steps or selections of the method, reference may be made to the operations described above with reference to fig. 2 and 3, which are not described again here.
The data processing method of the embodiment can initiate the transmission access request through the data processing device, so that the system overhead can be reduced, and the system performance can be improved.
For the present disclosure, there are also the following points to be explained:
(1) The drawings of the embodiments of the disclosure only relate to the structures related to the embodiments of the disclosure, and other structures can refer to general designs.
(2) Without conflict, embodiments of the present disclosure and features of the embodiments may be combined with each other to arrive at new embodiments.
The above description is intended to be exemplary of the present disclosure, and not to limit the scope of the present disclosure, which is defined by the claims appended hereto.

Claims (15)

1. A data processing apparatus comprising:
a computing unit configured to generate a transmission access request in response to data that needs to be transmitted;
a transmit access request buffer configured to buffer the transmit access request;
a command processing unit configured to: providing the transfer access request to a data transfer control device in response to the transfer access request buffered in the transfer access request buffer, whereby the data transfer control device directly transfers the data to be transferred from a source address to a destination address in a direct memory access manner in response to the transfer access request.
2. The data processing apparatus according to claim 1, wherein the computing unit is further configured to notify a central processing unit outside the data processing apparatus after generating the transmission access request.
3. The data processing apparatus according to claim 1, wherein the computing unit is further configured to, after generating the transfer access request, query whether the transfer access request buffer is full, and in response to the transfer access request buffer not being full, write the transfer access request to the transfer access request buffer.
4. The data processing apparatus according to any of claims 1-3, wherein the transmission access request comprises: a transmission source address, a transmission destination address, and a transmission data size.
5. The data processing apparatus according to claim 4, wherein the transfer access request further comprises a transfer status,
the transmission state comprises waiting for transmission, transmission in progress, transmission completion or transmission failure.
6. A data processing apparatus according to any one of claims 1 to 3, further comprising:
a command buffer configured to buffer an operation instruction for the command processing unit;
wherein the command processing unit is further configured to: preferentially process at least one transmission access request buffered in the transmission access request buffer in a case where at least one operation instruction is buffered in the command buffer and at least one transmission access request is buffered in the transmission access request buffer.
7. The data processing apparatus according to claim 1, wherein the command processing unit is further configured to: after processing each operation instruction in the command buffer, query whether at least one transmission access request is buffered in the transmission access request buffer; and, in response to a pending transmission access request being buffered in the transmission access request buffer, return to processing other operation instructions in the command buffer after the at least one transmission access request buffered in the transmission access request buffer has been processed.
8. The data processing apparatus according to claim 1, wherein said transmission access request buffer is a first-in-first-out buffer.
9. The data processing apparatus according to claim 1, wherein the command processing unit is further configured to provide the transmission access request to the data transmission control apparatus outside the data processing apparatus.
10. A data processing system comprising:
a data processing apparatus as claimed in any one of claims 1, 3 to 9;
and the data transmission control device, the data transmission control device being configured to transmit the data to be transmitted in response to the transmission access request.
11. The data processing system of claim 10, further comprising a central processing unit, wherein the computing unit is further configured to notify the central processing unit after generating the transmission access request.
12. The data processing system of claim 11, wherein the central processing unit is configured to continue to check a completion status of the transmission access request after receiving the notification about the transmission access request.
13. A data processing system according to any one of claims 10 to 12, wherein said data transfer control means comprises direct memory access means.
14. A method of data processing, comprising:
generating a transmission access request in response to data that needs to be transmitted;
caching the transmission access request;
providing the transmission access request to a data transmission control device in response to the buffered transmission access request;
causing the data transfer control means to directly transfer the data to be transferred from a source address to a destination address in a direct memory access manner in response to the transfer access request.
15. The data processing method of claim 14, further comprising:
caching the operation instruction;
causing a command processing unit executing the operational instructions and the transport access request to preferentially process the transport access request.
CN202111445101.6A 2021-11-30 2021-11-30 Data processing device, method and system Active CN114116553B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111445101.6A CN114116553B (en) 2021-11-30 2021-11-30 Data processing device, method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111445101.6A CN114116553B (en) 2021-11-30 2021-11-30 Data processing device, method and system

Publications (2)

Publication Number Publication Date
CN114116553A CN114116553A (en) 2022-03-01
CN114116553B (en) 2023-01-20

Family

ID=80368796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111445101.6A Active CN114116553B (en) 2021-11-30 2021-11-30 Data processing device, method and system

Country Status (1)

Country Link
CN (1) CN114116553B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000354083A (en) * 1999-04-09 2000-12-19 Matsushita Electric Ind Co Ltd Data transmitter

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH03122745A (en) * 1989-10-05 1991-05-24 Mitsubishi Electric Corp Dma control system
JP2000215154A (en) * 1999-01-25 2000-08-04 Matsushita Electric Ind Co Ltd Dma controller
JP5102917B2 (en) * 2008-02-22 2012-12-19 株式会社日立製作所 Storage apparatus and access command transmission method
TW201015321A (en) * 2008-09-25 2010-04-16 Panasonic Corp Buffer memory device, memory system and data trnsfer method
US8775699B2 (en) * 2011-03-01 2014-07-08 Freescale Semiconductor, Inc. Read stacking for data processor interface
US9658975B2 (en) * 2012-07-31 2017-05-23 Silicon Laboratories Inc. Data transfer manager
CN106202261A (en) * 2016-06-29 2016-12-07 浪潮(北京)电子信息产业有限公司 The distributed approach of a kind of data access request and engine
JP6880402B2 (en) * 2017-05-10 2021-06-02 富士通株式会社 Memory access control device and its control method

Also Published As

Publication number Publication date
CN114116553A (en) 2022-03-01

Similar Documents

Publication Publication Date Title
US8149854B2 (en) Multi-threaded transmit transport engine for storage devices
US10331595B2 (en) Collaborative hardware interaction by multiple entities using a shared queue
US8316220B2 (en) Operating processors over a network
US8713287B2 (en) Off-loading of processing from a processor blade to storage blades based on processing activity, availability of cache, and other status indicators
US20150261535A1 (en) Method and apparatus for low latency exchange of data between a processor and coprocessor
US20150253997A1 (en) Method and Apparatus for Memory Allocation in a Multi-Node System
US20230214326A1 (en) Computer Memory Expansion Device and Method of Operation
US10592459B2 (en) Method and system for ordering I/O access in a multi-node environment
WO2013082809A1 (en) Acceleration method, device and system for co-processing
US11258887B2 (en) Payload cache
US20150254183A1 (en) Inter-chip interconnect protocol for a multi-chip system
JP4585647B2 (en) Support for multiple outstanding requests to multiple targets in a pipelined memory system
JP3266470B2 (en) Data processing system with per-request write-through cache in forced order
US9286129B2 (en) Termination of requests in a distributed coprocessor system
JP5213485B2 (en) Data synchronization method and multiprocessor system in multiprocessor system
US20200310690A1 (en) Dynamic near-data processing control mechanism based on computer resource availability on solid-state disk platforms
CN114116553B (en) Data processing device, method and system
CN114356839B (en) Method, device, processor and device readable storage medium for processing write operation
US7120758B2 (en) Technique for improving processor performance
US20100095040A1 (en) Multi-core processor, control method thereof, and information processing apparatus
US20230195664A1 (en) Software management of direct memory access commands
US11995351B2 (en) DMA engines configured to perform first portion data transfer commands with a first DMA engine and second portion data transfer commands with second DMA engine
US20230132931A1 (en) Hardware management of direct memory access commands
US11789866B2 (en) Method for processing non-cache data write request, cache, and node
US20240069965A1 (en) Systems and methods for executing compute functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant