CN117076064A - Method for automatically triggering SOC DMA by NPU DMA on chip - Google Patents


Info

Publication number
CN117076064A
Authority
CN
China
Prior art keywords
dma
npu
soc
trigger signal
chip
Prior art date
Legal status
Pending
Application number
CN202310628747.0A
Other languages
Chinese (zh)
Inventor
战诗苗
黄海林
李力游
小约翰·罗伯特·罗兰
Current Assignee
Nanjing Lanyang Intelligent Technology Co ltd
Original Assignee
Nanjing Lanyang Intelligent Technology Co ltd
Priority date
2023-05-31
Filing date
2023-05-31
Publication date
2023-11-17
Application filed by Nanjing Lanyang Intelligent Technology Co ltd
Priority to CN202310628747.0A
Publication of CN117076064A
Pending (current legal status)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4812 Task transfer initiation or dispatching by interrupt, e.g. masked
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G06F13/28 Handling requests for interconnection or transfer for access to input/output bus using burst mode transfer, e.g. direct memory access DMA, cycle steal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44505 Configuring for program initiating, e.g. using registry, configuration files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Bus Control (AREA)

Abstract

The invention discloses a method by which an on-chip NPU (neural processing unit) DMA (direct memory access) controller automatically triggers an SOC (system on chip) DMA. The NPU DMA can emit a hardware trigger signal, and the SOC DMA can receive one. Once the SOC DMA has been configured and has completed one transfer, subsequent transfers are triggered automatically by the NPU DMA hardware circuit. The NPU DMA IP has an interrupt reporting port and an output trigger signal port. The output trigger signal is asserted after a configured number of tasks have completed, and it triggers and starts the SOC DMA. The processor normally starts the SOC DMA IP by writing configuration registers over the bus. The SOC DMA IP additionally has an input trigger signal port, and the input trigger signal starts a transfer task. Because DMA transfer tasks are triggered automatically by hardware, the disclosed scheme reduces latency on the NPU data processing path, improves processing performance and lowers chip power consumption.

Description

Method for automatically triggering SOC DMA by NPU DMA on chip
Technical Field
The invention discloses a method by which an on-chip NPU DMA automatically triggers an SOC DMA. It relates to SOC hardware architecture design, in particular to the hardware design of the DMA modules in the NPU and the SOC.
Background
In conventional chip designs, DMA (Direct Memory Access) modules are independent of one another and serve different processors. There is no hardware connection between two DMA modules, so every DMA transfer task must be configured and started by the corresponding processor. In high-performance scenarios, the time spent in software processing adds latency. In ultra-low-power scenarios, power consumption suffers because the processor is woken up frequently.
In the current NPU SOC architecture, the NPU DMA and the SOC DMA are two completely independent IPs, serving the NPU and the CPU respectively. Under this conventional architecture, every DMA transfer from one chiplet to another requires software intervention and must be started by a CPU, which adds latency and power consumption. For latency-sensitive workloads in particular, the latency introduced by software intervention is severe.
Disclosure of Invention
The technical problem to be solved by the invention is as follows. To improve the computational performance of the NPU, it is important to reduce data-transfer latency. The invention provides a hardware design method in which one DMA module automatically triggers another DMA module to perform a transfer task. This greatly reduces the involvement of software and the CPU, which lowers latency on the data path and saves CPU power.
The invention adopts the following technical scheme to solve this problem:
a method for automatically triggering an SOC DMA by an on-chip NPU DMA, wherein the NPU DMA can emit a hardware trigger signal and the SOC DMA can receive the hardware trigger signal; after the SOC DMA has been configured and has completed one transfer, subsequent transfers are triggered automatically by the NPU DMA hardware circuit.
As a further preferred scheme of the invention, the NPU DMA IP has an interrupt reporting port and an output trigger signal port. The output trigger signal is asserted after a number of tasks have completed, and it triggers and starts the other DMA, the SOC DMA. The processor starts the SOC DMA IP by writing configuration registers over the bus, and the SOC DMA IP also has an input trigger signal port; the input trigger signal starts a transfer task.
As a further preferred scheme of the invention, software configures the number of interrupt occurrences after which the trigger signal output by the NPU DMA IP becomes active. Software configuration also enables or disables starting the SOC DMA IP from the input trigger signal.
As a further preferred scheme of the invention, while the NPU DMA is handling the last piece of data, the next block of data to be processed is prepared in advance, which shortens task latency.
As a further preferred embodiment of the present invention, the method is applied to two chiplets. The Chiplet0 NPU performs preprocessing of the data and the Chiplet1 NPU is responsible for post-processing. The method includes:
the Chiplet0 NPU completes the preprocessing of a block, the NPU DMA stores the processed data in the Chiplet0 DDR, and the NPU enters a low-power mode after reporting an interrupt to the Chiplet0 CPU;
the Chiplet0 CPU notifies the Chiplet1 CPU to start post-processing through an inter-die communication mechanism;
the Chiplet1 CPU performs the initial configuration of the SOC DMA so that the SOC DMA can transfer data from the Chiplet0 DDR to the Chiplet1 DDR through a Die-to-Die interface module;
after the Chiplet1 SOC DMA has transferred one block of data, it notifies the Chiplet0 CPU through an inter-die interrupt; the Chiplet0 CPU starts the NPU on the preprocessing of the next block, and the data are stored in the Chiplet0 DDR;
the Chiplet1 SOC DMA interrupt notifies the Chiplet1 CPU, which starts the Chiplet1 NPU on data post-processing; the Chiplet1 CPU then masks the SOC DMA interrupt and enters WFI (wait for interrupt);
after the Chiplet1 NPU DMA has completed the configured number of transfers, it sends a trigger signal directly to the Chiplet1 SOC DMA through a hardware circuit;
once triggered, the Chiplet1 SOC DMA reuses the initial configuration and starts transferring the next block of data from the Chiplet0 DDR to the Chiplet1 DDR.
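The Python below is a minimal sketch of the Chiplet1 CPU's part in the steps above. It records register writes in order. Everything about the register interface is assumed for illustration. `RegisterBus`, the register names (`SRC_ADDR`, `TRIG_THRESHOLD`, `START_SEL`, `IRQ_MASK` and so on) and the addresses are hypothetical, and so is the point at which the trigger threshold and start-source select are programmed. The patent gives no register map. The sketch shows that the CPU writes the SOC DMA transfer configuration only once; later blocks are started by the NPU DMA trigger.

```python
from dataclasses import dataclass, field

MB = 1024 * 1024  # assumption: "MB" means 2**20 bytes


@dataclass
class RegisterBus:
    """Hypothetical stand-in for the APB configuration bus; records writes in order."""
    writes: list = field(default_factory=list)

    def write(self, block: str, reg: str, value: int) -> None:
        self.writes.append((block, reg, value))


def chiplet1_start_first_block(bus: RegisterBus, src: int, dst: int, length: int) -> None:
    """One-time SOC DMA configuration, followed by a conventional software start."""
    bus.write("SOC_DMA", "SRC_ADDR", src)   # Chiplet0 DDR, reached through D2D
    bus.write("SOC_DMA", "DST_ADDR", dst)   # Chiplet1 DDR
    bus.write("SOC_DMA", "LENGTH", length)
    bus.write("SOC_DMA", "START", 1)        # first block is started by software


def chiplet1_on_soc_dma_irq(bus: RegisterBus, transfers_per_block: int) -> None:
    """Runs once, in the SOC DMA completion interrupt handler, before WFI."""
    bus.write("NPU_DMA", "TRIG_THRESHOLD", transfers_per_block)  # trigger_out after N interrupts
    bus.write("SOC_DMA", "START_SEL", 1)    # 1 = start from the input trigger signal
    bus.write("NPU", "START", 1)            # begin post-processing of the block
    bus.write("SOC_DMA", "IRQ_MASK", 1)     # mask the SOC DMA interrupt; WFI is not modeled


# Hypothetical addresses; the scenario's 10 MB block and 10 NPU DMA transfers.
bus = RegisterBus()
chiplet1_start_first_block(bus, src=0x8000_0000, dst=0x9000_0000, length=10 * MB)
chiplet1_on_soc_dma_irq(bus, transfers_per_block=10)
```

In this sketch, the NPU DMA threshold and the SOC DMA start select are armed before the NPU is started. That ordering is an assumption. If the NPU DMA reached its final transfer before the SOC DMA accepted hardware starts, the trigger could be missed, depending on whether trigger_out is a pulse or a held level.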
As a further preferred aspect of the present invention, the DMA module arrangement with the automatic triggering function includes:
- the APB configuration bus interface and AXI data bus interface of the SOC DMA;
- the APB configuration bus interface and AXI data bus interface of the NPU DMA;
- an external memory access interface;
- the AXI bus interface of the CPU;
- a bus interface for accessing the DDR.
It further includes an externally output trigger signal interface, used to trigger the data transfer tasks of other DMA modules, and a trigger signal interface driven by other DMA modules, used to trigger the data transfer task of this DMA module.
The control logic for the externally output trigger signal is derived from the interrupt signal output by the interrupt module. An interrupt counter is added to the hardware, and software writes a value to an APB register that specifies the number of interrupt occurrences. Once the interrupt count equals or exceeds that value, the output trigger_out signal is driven high. Software controls whether the input trigger signal takes effect: a register sets the select signal of a MUX, and when the input trigger signal is selected, it starts the DMA channel and the data transfer begins.
Compared with the prior art, the technical scheme of the invention has the following technical effect: because DMA transfer tasks are triggered automatically by hardware, latency on the NPU data processing path is reduced, processing performance is improved and chip power consumption is lowered.
Drawings
Fig. 1 is a schematic of the workflow of the present invention.
FIG. 2 is a schematic diagram of the connection of a DMA module with auto-triggering function inside a chip according to the present invention.
FIG. 3 is a schematic diagram of the internal structure of a DMA module with auto-triggering function according to the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present invention and are not to be construed as limiting the present invention.
The technical scheme of the invention is further described in detail below with reference to the accompanying drawings:
The invention discloses a method by which an NPU DMA on a chip automatically triggers an SOC DMA, used for architecture optimization and performance improvement in chiplet-based designs. The specific workflow is shown in Fig. 1.
Take the following application scenario as an example. The Chiplet0 NPU is responsible for preprocessing the data, which takes a short time, and the Chiplet1 NPU is responsible for post-processing the data, which takes a long time.
The Chiplet0 NPU completes the preprocessing of a block, and the NPU DMA stores the processed data in the Chiplet0 DDR (assume the data amount is 10 MB). The NPU then reports an interrupt to the Chiplet0 CPU and enters a low-power mode.
The Chiplet0 CPU notifies the Chiplet1 CPU to start post-processing through an inter-die communication mechanism. The Chiplet1 CPU performs the initial configuration of the SOC DMA so that it transfers data from the Chiplet0 DDR to the Chiplet1 DDR through the Die-to-Die interface module (D2D). Assume that each transfer moves 10 MB.
After the Chiplet1 SOC DMA has transferred one block (10 MB), it notifies the Chiplet0 CPU through an inter-die interrupt. The Chiplet0 CPU then starts the NPU on the preprocessing of the next block and stores the data in the Chiplet0 DDR.
At the same time, the Chiplet1 SOC DMA interrupt notifies the Chiplet1 CPU. The Chiplet1 CPU starts the Chiplet1 NPU on data post-processing, then masks the SOC DMA interrupt and enters WFI. The Chiplet1 NPU DMA moves 1 MB from DDR per transfer, a size set by the NPU's internal SRAM, so it performs 10 transfers in total for the 10 MB block.
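The number of NPU DMA transfers per block follows from the block size and the NPU SRAM size. This is also the count that software would program as the trigger threshold. The Python below is a minimal sketch. It assumes "MB" means 2**20 bytes and that a partial final chunk still costs one transfer; the description only covers the evenly divisible case. The function name is illustrative.

```python
MB = 1024 * 1024  # assumption: "MB" read as 2**20 bytes


def npu_dma_transfers_per_block(block_bytes: int, sram_bytes: int) -> int:
    """NPU DMA transfers needed to stream one block through the NPU's internal SRAM.

    Integer ceiling division: a partial last chunk still needs its own transfer.
    """
    if block_bytes <= 0 or sram_bytes <= 0:
        raise ValueError("sizes must be positive")
    return (block_bytes + sram_bytes - 1) // sram_bytes


print(npu_dma_transfers_per_block(10 * MB, 1 * MB))  # 10
```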
After the Chiplet1 NPU DMA completes the tenth transfer, it does not send an interrupt to the CPU. Instead, it sends a trigger signal directly to the Chiplet1 SOC DMA through a hardware circuit. Once triggered, the Chiplet1 SOC DMA still uses the initial configuration, with the source and destination addresses unchanged, and starts transferring the next 10 MB block from the Chiplet0 DDR to the Chiplet1 DDR.
After the Chiplet1 NPU finishes processing the last 1 MB of data, it queries the state of the SOC DMA to confirm that the next 10 MB block is already in DDR. The next round of processing can then start immediately.
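The steady-state round on Chiplet1 can be modeled as an ordered event list. The Python below is a behavioral sketch, not a timing model. The class and event names are hypothetical. The model assumes, as the scenario implies, that the 10 MB D2D copy finishes while the NPU is still post-processing the last 1 MB chunk. It also suggests why reusing the unchanged source and destination addresses is presumably safe. Once the tenth NPU DMA transfer has completed, the whole current block has left the Chiplet1 DDR, so the SOC DMA can overwrite that region.

```python
from dataclasses import dataclass


@dataclass
class SocDma:
    """Chiplet1 SOC DMA keeping its one-time configuration across blocks (hypothetical model)."""
    src: int
    dst: int
    length: int
    busy: bool = False
    blocks_copied: int = 0

    def start(self) -> None:
        self.busy = True            # src/dst/length are reused unchanged

    def complete(self) -> None:
        self.busy = False
        self.blocks_copied += 1


def run_chiplet1_round(soc: SocDma, transfers_per_block: int):
    """One post-processing round with hardware auto-triggering; no CPU interrupt is raised."""
    n = transfers_per_block
    events = []
    for i in range(1, n + 1):
        events.append(f"npu_dma_load_{i}")        # one chunk: Chiplet1 DDR -> NPU SRAM
        if i == n:
            # The whole block has left DDR: trigger_out starts the SOC DMA directly.
            soc.start()
            events.append("soc_dma_triggered")
        events.append(f"npu_process_start_{i}")
    # Assumption: the D2D copy overlaps processing of the last chunk and
    # finishes before it, because post-processing is the slow stage.
    soc.complete()
    events.append("soc_dma_done")
    events.append(f"npu_process_end_{n}")
    # Having finished the last chunk, the NPU queries the SOC DMA state.
    next_block_ready = not soc.busy
    return events, next_block_ready


soc = SocDma(src=0x8000_0000, dst=0x9000_0000, length=10 * 1024 * 1024)
events, ready = run_chiplet1_round(soc, transfers_per_block=10)
```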
In the present invention, the connections of the DMA modules with the auto-triggering function inside the chip are shown in Fig. 2, where:
(1) and (3) are the APB configuration bus interface and the AXI data bus interface of the SOC DMA, respectively;
(2) and (4) are the APB configuration bus interface and the AXI data bus interface of the NPU DMA, respectively;
(5) is the interrupt signal the NPU sends to the CPU after completing a task;
(6) is the interrupt signal the SOC DMA sends to the CPU;
(7) is the trigger signal output by the NPU DMA, used to automatically trigger the next transfer task of the SOC DMA;
(8) is the AXI bus interface of the CPU; and
(9) is the bus interface for accessing the DDR.
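Of the numbered items in Fig. 2, only the signals (5), (6) and (7) are described with an explicit source and destination. The Python sketch below lists them as data; the dictionary and helper are illustrative only. Among these signals, only the NPU DMA trigger (7) drives the SOC DMA, while both interrupts go to the CPU.

```python
# Point-to-point signals of Fig. 2 as item number -> (source, sink, description).
# The bus interfaces (1)-(4), (8) and (9) are omitted: the text gives no source/sink for them.
FIG2_SIGNALS = {
    5: ("NPU", "CPU", "interrupt after a task completes"),
    6: ("SOC_DMA", "CPU", "interrupt"),
    7: ("NPU_DMA", "SOC_DMA", "trigger for the next SOC DMA transfer"),
}


def drivers_of(sink: str) -> list:
    """Sorted sources of all listed signals that terminate at `sink`."""
    return sorted(src for src, dst, _ in FIG2_SIGNALS.values() if dst == sink)


print(drivers_of("SOC_DMA"))  # ['NPU_DMA']
```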
In the present invention, the internal structure of a DMA module with the auto-triggering function is shown in Fig. 3, where:
(1) is the APB configuration bus interface;
(2) is the AXI data transfer bus interface;
(3) is the external memory access interface, which buffers the read/write data of each transfer channel;
(4) is the output interrupt signal; and
(5) is the hardware DMA request/acknowledge handshake signal.
The interfaces above are all standard DMA module interfaces. (6) and (7) are the interface signals newly added by the invention: (6) is the externally output trigger signal, used to trigger the data transfer tasks of other DMA modules, and (7) is the trigger signal input from other DMA modules, used to trigger the data transfer task of this DMA module.
The control logic for the output trigger is derived from the interrupt signal output by the interrupt module. A counter for the interrupt signal is added, and software writes a value to an APB register that specifies the number of interrupt occurrences. Once the interrupt count equals or exceeds that value, the output trigger_out signal is driven high.
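The Python below is a behavioral model of this counter-and-threshold logic. It is not cycle-accurate RTL. `threshold` stands in for the value software writes through the APB register. The `clear()` method is hypothetical, since the description does not say when or how the counter resets.

```python
class TriggerOutLogic:
    """Behavioral model of the trigger_out control logic (not cycle-accurate)."""

    def __init__(self, threshold: int) -> None:
        if threshold < 1:
            raise ValueError("threshold must be >= 1")
        self.threshold = threshold  # value software writes via APB (register not modeled)
        self.count = 0
        self.trigger_out = False

    def on_interrupt(self) -> bool:
        """Count one completion interrupt and return the resulting trigger_out level."""
        self.count += 1
        if self.count >= self.threshold:  # "equal to or exceeds the set value"
            self.trigger_out = True
        return self.trigger_out

    def clear(self) -> None:
        """Hypothetical reset; the description does not specify the clearing mechanism."""
        self.count = 0
        self.trigger_out = False


logic = TriggerOutLogic(threshold=10)
levels = [logic.on_interrupt() for _ in range(10)]
print(levels.index(True) + 1)  # 10: trigger_out rises on the tenth interrupt
```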
Software controls whether the input trigger signal takes effect. A register sets the select signal of the MUX, and by default the conventional software start mode is selected. When the select signal is set to 1, the module switches to the hardware trigger mode. The input trigger signal is then gated through and starts the DMA channel, which begins the data transfer.
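The start-source selection amounts to a 2:1 multiplexer. In the Python sketch below, the encoding 1 = hardware trigger follows the description. Encoding the default software mode as 0, and the enum and function names, are assumptions.

```python
from enum import IntEnum


class StartSel(IntEnum):
    SOFTWARE = 0  # assumed encoding of the default: CPU starts the channel via a register write
    HARDWARE = 1  # per the description: the input trigger signal starts the channel


def channel_start(start_sel: StartSel, sw_start: bool, trigger_in: bool) -> bool:
    """2:1 MUX choosing which signal starts the DMA channel."""
    return trigger_in if start_sel == StartSel.HARDWARE else sw_start
```

With a true MUX, a software start request is ignored while hardware mode is selected. ORing the two start sources would be a different design choice from the one described here.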
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to them. Various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention. Modifications, variations and equivalents of the embodiments described above fall within the spirit and scope of the present invention.

Claims (7)

1. A method for automatically triggering an SOC DMA by an NPU DMA on a chip, characterized in that: the NPU DMA can emit a hardware trigger signal and the SOC DMA can receive the hardware trigger signal; after the SOC DMA has been configured and has completed one transfer, subsequent transfers are triggered automatically by the NPU DMA hardware circuit.
2. The method for automatically triggering an SOC DMA by an NPU DMA on a chip of claim 1, wherein: the NPU DMA IP has an interrupt reporting port and an output trigger signal port, and the output trigger signal is asserted after a number of tasks have completed, to trigger and start the other DMA, the SOC DMA;
the processor starts the SOC DMA IP by writing configuration registers over the bus, and the SOC DMA IP also has an input trigger signal port, the input trigger signal starting a transfer task.
3. The method for automatically triggering an SOC DMA by an NPU DMA on a chip as claimed in claim 2, wherein: software configures the number of interrupt occurrences after which the trigger signal output by the NPU DMA IP becomes active;
and software configuration enables or disables starting the SOC DMA IP from the input trigger signal.
4. The method for automatically triggering an SOC DMA by an NPU DMA on a chip as claimed in claim 2, wherein: while the NPU DMA is handling the last piece of data, the next block of data to be processed is prepared in advance, which shortens task latency.
5. The method for automatically triggering an SOC DMA by an NPU DMA on a chip according to claim 1 or 2, wherein the method is applied to two chiplets, the Chiplet0 NPU performing preprocessing of the data and the Chiplet1 NPU being responsible for post-processing of the data, the method comprising:
the Chiplet0 NPU completes the preprocessing of a block, the NPU DMA stores the processed data in the Chiplet0 DDR, and the NPU enters a low-power mode after reporting an interrupt to the Chiplet0 CPU;
the Chiplet0 CPU notifies the Chiplet1 CPU to start post-processing through an inter-die communication mechanism;
the Chiplet1 CPU performs the initial configuration of the SOC DMA so that the SOC DMA can transfer data from the Chiplet0 DDR to the Chiplet1 DDR through a Die-to-Die interface module;
after the Chiplet1 SOC DMA has transferred one block of data, it notifies the Chiplet0 CPU through an inter-die interrupt;
the Chiplet0 CPU starts the NPU on the preprocessing of the next block, and the data are stored in the Chiplet0 DDR;
the Chiplet1 SOC DMA interrupt notifies the Chiplet1 CPU, and the Chiplet1 CPU starts the Chiplet1 NPU on data post-processing;
the Chiplet1 CPU masks the SOC DMA interrupt and then enters WFI;
after the Chiplet1 NPU DMA has completed the configured number of transfers, it sends a trigger signal directly to the Chiplet1 SOC DMA through a hardware circuit;
once triggered, the Chiplet1 SOC DMA reuses the initial configuration and starts transferring the next block of data from the Chiplet0 DDR to the Chiplet1 DDR.
6. The method for automatically triggering an SOC DMA by an NPU DMA on a chip according to claim 1 or 2, wherein the DMA modules with the automatic triggering function have: the APB configuration bus interface and AXI data bus interface of the SOC DMA; the APB configuration bus interface and AXI data bus interface of the NPU DMA; an external memory access interface; the AXI bus interface of the CPU; and a bus interface for accessing the DDR;
further including an externally output trigger signal interface, used to trigger the data transfer tasks of other DMA modules;
and a trigger signal interface driven by other DMA modules, used to trigger the data transfer task of this DMA module.
7. The method for automatically triggering an SOC DMA by an NPU DMA on a chip as recited in claim 6, wherein the control logic of the externally output trigger signal is derived from the interrupt signal output by an interrupt module;
an interrupt counter is added to the hardware structure, software writes a value to an APB register that specifies the number of interrupt occurrences, and once the interrupt count equals or exceeds that value, the output trigger_out signal is driven high;
software controls whether the input trigger signal takes effect: a register sets the select signal of a MUX, and when the input trigger signal is selected, it starts the DMA channel and the data transfer begins.

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310628747.0A | CN117076064A | 2023-05-31 | 2023-05-31 | Method for automatically triggering SOC DMA by NPU DMA on chip

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202310628747.0A | CN117076064A | 2023-05-31 | 2023-05-31 | Method for automatically triggering SOC DMA by NPU DMA on chip

Publications (1)

Publication Number | Publication Date
CN117076064A | 2023-11-17

Family

ID=88706718

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202310628747.0A | Pending | CN117076064A | 2023-05-31 | 2023-05-31 | Method for automatically triggering SOC DMA by NPU DMA on chip

Country Status (1)

Country | Link
CN | CN117076064A

Similar Documents

Publication Publication Date Title
US7096296B2 (en) Supercharge message exchanger
US6094700A (en) Serial bus system for sending multiple frames of unique data
US11269796B2 (en) Acceleration control system based on binarization algorithm, chip, and robot
US5682551A (en) System for checking the acceptance of I/O request to an interface using software visible instruction which provides a status signal and performs operations in response thereto
WO2018120780A1 (en) Method and system for pcie interrupt
CN108595350B (en) AXI-based data transmission method and device
CN114490460B (en) FLASH controller for ASIC and control method thereof
CN109800193B (en) Bridging device of SRAM on AHB bus access chip
CN108959136B (en) SPI-based data transmission accelerating device and system and data transmission method
CN104714918B (en) The reception of high speed FC bus datas and way to play for time under hosted environment
US20030131173A1 (en) Method and apparatus for host messaging unit for peripheral component interconnect busmaster devices
US20150268985A1 (en) Low Latency Data Delivery
US10909056B2 (en) Multi-core electronic system
US6633927B1 (en) Device and method to minimize data latency and maximize data throughput using multiple data valid signals
CN113760792A (en) AXI4 bus control circuit for image access based on FPGA and data transmission method thereof
CN117076064A (en) Method for automatically triggering SOC DMA by NPU DMA on chip
CN115328832B (en) Data scheduling system and method based on PCIE DMA
CN116301627A (en) NVMe controller and initialization and data read-write method thereof
CN111190840A (en) Multi-party central processing unit communication architecture based on field programmable gate array control
US11449450B2 (en) Processing and storage circuit
CN114968863A (en) Data transmission method based on DMA controller
CN113886104A (en) Multi-core chip and communication method thereof
US8462561B2 (en) System and method for interfacing burst mode devices and page mode devices
CN107807888B (en) Data prefetching system and method for SOC architecture
CN108153703A (en) A kind of peripheral access method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination