CN117076064A - Method for automatically triggering SOC DMA by NPU DMA on chip - Google Patents
- Publication number
- CN117076064A (application CN202310628747.0A)
- Authority
- CN
- China
- Prior art keywords
- dma
- npu
- soc
- trigger signal
- chip
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4812—Task transfer initiation or dispatching by interrupt, e.g. masked
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/20—Handling requests for interconnection or transfer for access to input/output bus
- G06F13/28—Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/445—Program loading or initiating
- G06F9/44505—Configuring for program initiating, e.g. using registry, configuration files
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Bus Control (AREA)
Abstract
The invention discloses a method by which an on-chip NPU (neural processing unit) DMA (direct memory access) module automatically triggers an SOC (system-on-chip) DMA. The NPU DMA is able to send a hardware trigger signal and the SOC DMA is able to receive it; after the SOC DMA has been configured and has completed one transfer, subsequent transfers are triggered automatically by the NPU DMA hardware circuit. The ports of the NPU DMA IP include an interrupt reporting port and an output trigger signal port; the output trigger signal indicates that, after a number of tasks have completed, another DMA (the SOC DMA) is to be triggered and started. Among the ports of the SOC DMA IP, the processor starts a transfer by configuring registers over a bus, and an input trigger signal port is also provided; the input trigger signal indicates the start of a transfer task. Because DMA transfer tasks are triggered automatically by hardware, the latency on the NPU data processing path is reduced, processing performance is improved, and chip power consumption is reduced.
Description
Technical Field
The invention discloses a method by which an on-chip NPU DMA automatically triggers an SOC DMA. It relates to the technical field of SOC hardware architecture design within a chip, and in particular to the hardware design of the DMA modules in the NPU and the SOC.
Background
In conventional chip designs, different DMA (Direct Memory Access) modules are independent of each other and serve different processors. There is no hardware connection between two DMA modules, and every DMA transfer task must be controlled and initiated by the corresponding processor. In high-performance scenarios, such designs incur latency from the time spent in software processing. In ultra-low-power scenarios, power consumption suffers because the processor is woken frequently.
In the current NPU SOC architecture, the NPU DMA and the SOC DMA are two completely independent IPs, serving the NPU and the CPU respectively. Under the traditional architecture, every DMA data transfer task from one Chiplet to another requires software intervention and must be initiated by a CPU, which adds latency and power consumption. For latency-sensitive working scenarios in particular, the delay caused by software intervention is severe.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to improve the computational performance of the NPU, it is important to reduce the latency of data transfers. The invention provides a hardware design method in which one DMA module automatically triggers another DMA module to carry out a transfer task. This greatly reduces the involvement of software and the CPU, thereby reducing latency on the data path and saving CPU power.
The invention adopts the following technical scheme to solve the technical problem:
a method for automatically triggering SOC DMA by NPU DMA on a chip, wherein the NPU DMA is able to send a hardware trigger signal, the SOC DMA is able to receive the hardware trigger signal, and after the SOC DMA has been configured and has completed one transfer, subsequent transfers are triggered automatically by the NPU DMA hardware circuit.
As a further preferred scheme of the invention, the ports of the NPU DMA IP include an interrupt reporting port and an output trigger signal port, and the output trigger signal indicates that, after a number of tasks have completed, another DMA (the SOC DMA) is to be triggered and started; among the ports of the SOC DMA IP, the processor starts a transfer by configuring registers over a bus, and an input trigger signal port is also provided, wherein the input trigger signal indicates the start of a transfer task.
As a further preferred scheme of the invention, the number of interrupt occurrences at which the output trigger signal of the NPU DMA IP becomes active is configured by software; and the function of starting the SOC DMA IP through the input trigger signal is enabled or disabled by software configuration.
As a further preferred scheme of the invention, while the NPU is processing the last piece of data of a block, the next block of data to be processed is prepared in advance, which shortens the delay of processing tasks.
As a further preferred scheme of the invention, the method is applied to two Chiplets, where the Chiplet0 NPU is responsible for preprocessing data and the Chiplet1 NPU is responsible for post-processing data, and the method comprises:
the Chiplet0 NPU completes the preprocessing of one block, the NPU DMA stores the processed data in the Chiplet0 DDR, and the NPU enters a low-power mode after reporting an interrupt to the Chiplet0 CPU;
the Chiplet0 CPU notifies the Chiplet1 CPU to start post-processing through an inter-Die communication mechanism;
the Chiplet1 CPU performs the initial configuration of the SOC DMA, so that the SOC DMA can transfer data from the Chiplet0 DDR to the Chiplet1 DDR through a Die-to-Die interface module;
after the Chiplet1 SOC DMA completes the transfer of one block of data, it notifies the Chiplet0 CPU through an inter-Die interrupt; the Chiplet0 CPU starts the NPU to preprocess the next block, and the data is stored in the Chiplet0 DDR;
the Chiplet1 SOC DMA interrupt notifies the Chiplet1 CPU, and the Chiplet1 CPU starts the Chiplet1 NPU to begin data post-processing; the Chiplet1 CPU masks the SOC DMA interrupt and then enters WFI;
after the Chiplet1 NPU DMA completes the set number of transfers, it sends a trigger signal directly to the Chiplet1 SOC DMA through a hardware circuit;
after the Chiplet1 SOC DMA is triggered, it reuses the initial configuration and begins transferring the next block of data from the Chiplet0 DDR to the Chiplet1 DDR.
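The software side of these steps on Chiplet1 can be sketched as two short sequences of register writes: one that starts the first block by software, and one that arms the hardware trigger before the CPU goes to sleep. This is only an illustration. The register names (`SRC_ADDR`, `TRIG_IN_SEL`, `TRIG_OUT_THRESHOLD`, `IRQ_MASK` and so on), the addresses, and the `FakeApb` helper are hypothetical; the patent does not publish a register map.

```python
MB = 1024 * 1024


class FakeApb:
    """Hypothetical stand-in for an APB configuration bus; it only logs writes."""

    def __init__(self) -> None:
        self.log = []

    def write(self, block: str, reg: str, value: int) -> None:
        self.log.append((block, reg, value))


def start_first_block(apb: FakeApb, src: int, dst: int, block_bytes: int) -> None:
    # Initial SOC DMA configuration: Chiplet0 DDR -> Chiplet1 DDR.
    apb.write("soc_dma", "SRC_ADDR", src)
    apb.write("soc_dma", "DST_ADDR", dst)
    apb.write("soc_dma", "LENGTH", block_bytes)
    apb.write("soc_dma", "TRIG_IN_SEL", 0)  # software start mode
    apb.write("soc_dma", "SW_START", 1)     # first transfer is started by the CPU


def arm_hardware_trigger(apb: FakeApb, transfers_per_block: int) -> None:
    # Number of NPU DMA completion interrupts before trigger_out is raised.
    apb.write("npu_dma", "TRIG_OUT_THRESHOLD", transfers_per_block)
    apb.write("soc_dma", "TRIG_IN_SEL", 1)  # switch to hardware trigger mode
    apb.write("soc_dma", "IRQ_MASK", 1)     # mask the SOC DMA interrupt
    # The CPU would execute WFI here; that step is not modelled.


apb = FakeApb()
start_first_block(apb, src=0x8000_0000, dst=0x9000_0000, block_bytes=10 * MB)
arm_hardware_trigger(apb, transfers_per_block=10)
for entry in apb.log:
    print(entry)
```

The source and destination addresses are written only once, which matches the statement that later blocks reuse the initial configuration.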
As a further preferred scheme of the invention, in the method the DMA module with the automatic trigger function is provided with: the APB configuration bus interface and the AXI data bus interface of the SOC DMA; the APB configuration bus interface and the AXI data bus interface of the NPU DMA; an external memory access interface; the AXI bus interface of the CPU; a bus interface for accessing the DDR; an externally output trigger signal interface, used to trigger the data transfer tasks of other DMA modules; and a trigger signal interface driven by other DMA modules, used to trigger the data transfer task of this DMA module.
The control logic of the externally output trigger signal is derived from the interrupt signal output by the interrupt module: an interrupt counter is added to the hardware, software sets a value through an APB register to represent the required number of interrupt occurrences, and once the interrupt count equals or exceeds the set value, the output trigger_out signal is pulled high. Whether the input trigger signal takes effect is controlled by software: the selection signal of a MUX is configured through a register, and when the input trigger signal is gated through, it triggers the DMA channel to start and the data transfer begins.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects: because DMA transfer tasks are triggered automatically by hardware, the latency on the NPU data processing path is reduced, processing performance is improved, and chip power consumption is reduced.
Drawings
Fig. 1 is a schematic of the workflow of the present invention.
FIG. 2 is a schematic diagram of the connection of a DMA module with auto-triggering function inside a chip according to the present invention.
FIG. 3 is a schematic diagram of the internal structure of a DMA module with auto-triggering function according to the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
The invention discloses a method for automatically triggering SOC DMA by NPU DMA on a Chiplet, used for architecture optimization and performance improvement within the Chiplet. The specific workflow is shown in Fig. 1.
Taking one application scenario of the scheme as an example, assume that the Chiplet0 NPU is responsible for preprocessing data, which takes a short time, and the Chiplet1 NPU is responsible for post-processing data, which takes a long time.
The Chiplet0 NPU completes the preprocessing of one block, the NPU DMA stores the processed data in the Chiplet0 DDR (assume the data volume is 10 MB), and the NPU enters a low-power mode after reporting an interrupt to the Chiplet0 CPU.
The Chiplet0 CPU notifies the Chiplet1 CPU to start post-processing through an inter-Die communication mechanism. The Chiplet1 CPU performs the initial configuration of the SOC DMA so that it transfers data from the Chiplet0 DDR to the Chiplet1 DDR through the Die-to-Die interface module (D2D). The amount of data per transfer is assumed to be 10 MB.
After the Chiplet1 SOC DMA completes the transfer of one block (10 MB), it notifies the Chiplet0 CPU through an inter-Die interrupt; the Chiplet0 CPU starts the NPU to preprocess the next block and stores the data in the Chiplet0 DDR.
At the same time, the Chiplet1 SOC DMA interrupt notifies the Chiplet1 CPU, which starts the Chiplet1 NPU to begin data post-processing; the Chiplet1 CPU then masks the SOC DMA interrupt and enters WFI. Because the Chiplet1 NPU DMA fetches 1 MB from the DDR at a time (a size determined by the SRAM inside the NPU), it performs 10 transfers in total per block.
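The count of ten follows directly from the sizes assumed in this example. Writing B for the block size and S for the amount the NPU DMA moves per transfer (both symbols are introduced here for illustration, and the ceiling covers the general case where B is not a multiple of S, which the text does not discuss):

```latex
N = \left\lceil \frac{B}{S} \right\rceil
  = \left\lceil \frac{10\ \text{MB}}{1\ \text{MB}} \right\rceil
  = 10
```

Under these assumptions, N would presumably also be the number of transfers that software sets as the threshold for the NPU DMA output trigger.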
After the Chiplet1 NPU DMA completes the tenth transfer, it does not need to send an interrupt to the CPU; instead, it sends a trigger signal directly to the Chiplet1 SOC DMA through a hardware circuit. Once triggered, the Chiplet1 SOC DMA still uses the initial configuration, with the source and destination addresses unchanged, and begins transferring the next 10 MB block from the Chiplet0 DDR to the Chiplet1 DDR.
After the Chiplet1 NPU finishes processing the last 1 MB of data, it queries the state of the SOC DMA and confirms that the next 10 MB block is already in place in the DDR, so the next round of processing can start directly.
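The benefit described in this walkthrough can be illustrated with a small counting model of how often the Chiplet1 CPU must wake up to keep the SOC DMA supplied with blocks. This is a sketch under simplifying assumptions that are not in the patent: the software-only baseline is modelled as exactly one CPU wakeup per block to restart the SOC DMA, and the function name and parameters are hypothetical.

```python
def chiplet1_cpu_wakeups(blocks: int, transfers_per_block: int,
                         hw_trigger: bool) -> int:
    """Count Chiplet1 CPU wakeups needed to start `blocks` SOC DMA transfers."""
    wakeups = 1          # initial configuration and first block are software-driven
    irq_count = 0        # NPU DMA completions seen since the last trigger
    soc_dma_starts = 1   # the first block has already been started
    while soc_dma_starts < blocks:
        irq_count += 1   # one NPU DMA transfer (1 MB in the example) completes
        if irq_count < transfers_per_block:
            continue
        irq_count = 0
        if not hw_trigger:
            wakeups += 1  # baseline: CPU leaves WFI to restart the SOC DMA
        soc_dma_starts += 1  # next block starts moving from the Chiplet0 DDR
    return wakeups


print(chiplet1_cpu_wakeups(blocks=5, transfers_per_block=10, hw_trigger=False))
print(chiplet1_cpu_wakeups(blocks=5, transfers_per_block=10, hw_trigger=True))
```

In this model the hardware-triggered case needs only the initial wakeup regardless of the number of blocks, while the baseline grows linearly with it.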
In the present invention, the connection of the DMA module with the auto-triggering function inside the chip is shown in Fig. 2, in which: (1) and (3) are the APB configuration bus interface and the AXI data bus interface of the SOC DMA; (2) and (4) are the APB configuration bus interface and the AXI data bus interface of the NPU DMA; (5) is the interrupt signal sent to the CPU by the NPU after the NPU completes a task; (6) is the interrupt signal sent to the CPU by the SOC DMA; (7) is the trigger signal externally output by the NPU DMA, used to automatically trigger the next transfer task of the SOC DMA; (8) is the AXI bus interface of the CPU; and (9) is the bus interface for accessing the DDR.
In the present invention, the internal structure of the DMA module with the auto-triggering function is shown in Fig. 3, in which: (1) is the APB configuration bus interface; (2) is the AXI data transfer bus interface; (3) is the external memory access interface, used to buffer the read and write data of each transfer channel; (4) is the output interrupt signal; and (5) is the hardware DMA request and acknowledge handshake signals. All of these are general-purpose DMA module interfaces. (6) and (7) are the interface signals newly added by the invention: (6) is the externally output trigger signal, used to trigger the data transfer tasks of other DMA modules, and (7) is the trigger signal input from other DMA modules, used to trigger the data transfer task of this DMA module.
The control logic of the externally output trigger signal is derived from the interrupt signal output by the interrupt module. An interrupt counter is newly added, and software sets a value through an APB register to represent the required number of interrupt occurrences; once the interrupt count equals or exceeds the set value, the output trigger_out signal is pulled high.
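The counter behaviour described above can be modelled in a few lines of Python. This is a behavioural sketch, not RTL from the patent: the class name is hypothetical, and clearing the counter when the trigger fires (so that trigger_out behaves as a pulse once per block) is an assumption, since the text only says the signal is pulled high once the count reaches the set value.

```python
class TriggerOutCounter:
    """Behavioural model of the interrupt counter that drives trigger_out."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # value software writes through the APB register
        self.count = 0

    def on_interrupt(self) -> bool:
        """Register one completion interrupt; return True if trigger_out fires."""
        self.count += 1
        if self.count >= self.threshold:
            self.count = 0  # assumption: counter clears when the trigger fires
            return True
        return False


counter = TriggerOutCounter(threshold=10)
pulses = [counter.on_interrupt() for _ in range(20)]
fired_on = [i + 1 for i, fired in enumerate(pulses) if fired]
print(fired_on)  # prints [10, 20]
```

With a threshold of 10, the trigger fires on the tenth and twentieth interrupts, matching one trigger per 10 MB block in the example scenario.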
Whether the input trigger signal takes effect is controlled by software. A register configures the selection signal of the MUX: by default the conventional software start mode is selected; when the selection signal is set to 1, the module switches to hardware trigger mode, and the input trigger signal is gated through to trigger the DMA channel and start the data transfer.
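The start-source selection described above amounts to a two-input multiplexer. A minimal sketch follows, assuming a one-bit select where 0 is the default software start and 1 is the hardware trigger, as the paragraph states; the function and parameter names are hypothetical.

```python
def dma_channel_start(sel: int, sw_start: bool, trigger_in: bool) -> bool:
    """Return True if the DMA channel should start this cycle.

    sel == 0: conventional software start (default mode)
    sel == 1: hardware trigger mode, the input trigger signal is gated through
    """
    if sel not in (0, 1):
        raise ValueError("sel must be 0 or 1")
    return trigger_in if sel == 1 else sw_start


# In software mode the input trigger is ignored.
print(dma_channel_start(sel=0, sw_start=False, trigger_in=True))  # prints False
# In hardware trigger mode the input trigger starts the channel.
print(dma_channel_start(sel=1, sw_start=False, trigger_in=True))  # prints True
```

Because the select bit is a register setting, the same DMA IP can be used unchanged in designs that never connect the trigger input.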
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention. Modifications and variations in detail, and other embodiments obtained by equivalent changes, also fall within the spirit and scope of the present invention.
Claims (7)
1. A method for automatically triggering SOC DMA by NPU DMA on a chip, characterized in that: the NPU DMA is able to send a hardware trigger signal, the SOC DMA is able to receive the hardware trigger signal, and after the SOC DMA has been configured and has completed one transfer, subsequent transfers are triggered automatically by the NPU DMA hardware circuit.
2. The method for automatically triggering SOC DMA by NPU DMA on a Chiplet of claim 1, wherein: the ports of the NPU DMA IP include an interrupt reporting port and an output trigger signal port, and the output trigger signal indicates that, after a number of tasks have completed, another DMA (the SOC DMA) is to be triggered and started;
among the ports of the SOC DMA IP, the processor starts a transfer by configuring registers over a bus, and an input trigger signal port is also provided, wherein the input trigger signal indicates the start of a transfer task.
3. The method for automatically triggering SOC DMA by NPU DMA on a Chiplet as claimed in claim 2, wherein: the number of interrupt occurrences at which the output trigger signal of the NPU DMA IP becomes active is configured by software;
and the function of starting the SOC DMA IP through the input trigger signal is enabled or disabled by software configuration.
4. The method for automatically triggering SOC DMA by NPU DMA on a Chiplet as claimed in claim 2, wherein: while the NPU is processing the last piece of data of a block, the next block of data to be processed is prepared in advance, which shortens the delay of processing tasks.
5. A method for automatically triggering SOC DMA by NPU DMA on a Chiplet according to claim 1 or 2, wherein the method is applied to two Chiplets, the Chiplet0 NPU is responsible for preprocessing data, and the Chiplet1 NPU is responsible for post-processing data, the method comprising:
the Chiplet0 NPU completes the preprocessing of one block, the NPU DMA stores the processed data in the Chiplet0 DDR, and the NPU enters a low-power mode after reporting an interrupt to the Chiplet0 CPU;
the Chiplet0 CPU notifies the Chiplet1 CPU to start post-processing through an inter-Die communication mechanism;
the Chiplet1 CPU performs the initial configuration of the SOC DMA, so that the SOC DMA can transfer data from the Chiplet0 DDR to the Chiplet1 DDR through a Die-to-Die interface module;
after the Chiplet1 SOC DMA completes the transfer of one block of data, it notifies the Chiplet0 CPU through an inter-Die interrupt;
the Chiplet0 CPU starts the NPU to preprocess the next block, and the data is stored in the Chiplet0 DDR;
the Chiplet1 SOC DMA interrupt notifies the Chiplet1 CPU, and the Chiplet1 CPU starts the Chiplet1 NPU to begin data post-processing;
the Chiplet1 CPU masks the SOC DMA interrupt and then enters WFI;
after the Chiplet1 NPU DMA completes the set number of transfers, it sends a trigger signal directly to the Chiplet1 SOC DMA through a hardware circuit;
after the Chiplet1 SOC DMA is triggered, it reuses the initial configuration and begins transferring the next block of data from the Chiplet0 DDR to the Chiplet1 DDR.
6. A method for automatically triggering SOC DMA by NPU DMA on a chip according to claim 1 or 2, wherein the method comprises: the APB configuration bus interface and the AXI data bus interface of the SOC DMA; the APB configuration bus interface and the AXI data bus interface of the NPU DMA; an external memory access interface; the AXI bus interface of the CPU; a bus interface for accessing the DDR;
an externally output trigger signal interface, used to trigger the data transfer tasks of other DMA modules;
and a trigger signal interface driven by other DMA modules, used to trigger the data transfer task of this DMA module.
7. The method for automatically triggering SOC DMA by NPU DMA on a chip as recited in claim 6, wherein the control logic of said externally output trigger signal is derived from an interrupt signal output by an interrupt module;
an interrupt counter is newly added to the hardware structure, software sets a value through an APB register to represent the required number of interrupt occurrences, and once the interrupt count equals or exceeds the set value, the output trigger_out signal is pulled high;
whether the input trigger signal takes effect is controlled by software: the selection signal of the MUX is configured through a register, and when the input trigger signal is gated through, the DMA channel is triggered to start and the data transfer begins.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310628747.0A CN117076064A (en) | 2023-05-31 | 2023-05-31 | Method for automatically triggering SOC DMA by NPU DMA on chip |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310628747.0A CN117076064A (en) | 2023-05-31 | 2023-05-31 | Method for automatically triggering SOC DMA by NPU DMA on chip |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117076064A (en) | 2023-11-17 |
Family
ID=88706718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310628747.0A Pending CN117076064A (en) | 2023-05-31 | 2023-05-31 | Method for automatically triggering SOC DMA by NPU DMA on chip |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117076064A (en) |
-
2023
- 2023-05-31 CN CN202310628747.0A patent/CN117076064A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7096296B2 (en) | Supercharge message exchanger | |
US6094700A (en) | Serial bus system for sending multiple frames of unique data | |
US11269796B2 (en) | Acceleration control system based on binarization algorithm, chip, and robot | |
US5682551A (en) | System for checking the acceptance of I/O request to an interface using software visible instruction which provides a status signal and performs operations in response thereto | |
WO2018120780A1 (en) | Method and system for pcie interrupt | |
CN108595350B (en) | AXI-based data transmission method and device | |
CN114490460B (en) | FLASH controller for ASIC and control method thereof | |
CN109800193B (en) | Bridging device of SRAM on AHB bus access chip | |
CN108959136B (en) | SPI-based data transmission accelerating device and system and data transmission method | |
CN104714918B (en) | The reception of high speed FC bus datas and way to play for time under hosted environment | |
US20030131173A1 (en) | Method and apparatus for host messaging unit for peripheral component interconnect busmaster devices | |
US20150268985A1 (en) | Low Latency Data Delivery | |
US10909056B2 (en) | Multi-core electronic system | |
US6633927B1 (en) | Device and method to minimize data latency and maximize data throughput using multiple data valid signals | |
CN113760792A (en) | AXI4 bus control circuit for image access based on FPGA and data transmission method thereof | |
CN117076064A (en) | Method for automatically triggering SOC DMA by NPU DMA on chip | |
CN115328832B (en) | Data scheduling system and method based on PCIE DMA | |
CN116301627A (en) | NVMe controller and initialization and data read-write method thereof | |
CN111190840A (en) | Multi-party central processing unit communication architecture based on field programmable gate array control | |
US11449450B2 (en) | Processing and storage circuit | |
CN114968863A (en) | Data transmission method based on DMA controller | |
CN113886104A (en) | Multi-core chip and communication method thereof | |
US8462561B2 (en) | System and method for interfacing burst mode devices and page mode devices | |
CN107807888B (en) | Data prefetching system and method for SOC architecture | |
CN108153703A (en) | A kind of peripheral access method and apparatus |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||