CN111401541A - Data transmission control method and device - Google Patents
- Publication number
- CN111401541A CN111401541A CN202010162763.1A CN202010162763A CN111401541A CN 111401541 A CN111401541 A CN 111401541A CN 202010162763 A CN202010162763 A CN 202010162763A CN 111401541 A CN111401541 A CN 111401541A
- Authority
- CN
- China
- Prior art keywords
- processor
- data
- neural network
- network acceleration
- external
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3877—Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
Abstract
The invention discloses a data transmission control method and device, wherein the method comprises the following steps: a processor sends a data request command; a direct memory access controller receives and processes the data request command sent by the processor and determines a data reading rule; the direct memory access controller reads the data requested by the processor from a static random access memory according to the data reading rule and sends the requested data to the processor; the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time and to determine the corresponding data reading rules; the processor receives the requested data and begins computation. The invention enables different processors to read data in parallel from the static random access memory inside the neural network acceleration processor, reducing the wait time for data transmission during computation.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data transmission control method and apparatus.
Background
A neural network is a computational mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing; depending on the complexity of the system, it achieves information processing by adjusting the interconnections among a large number of internal nodes. A Neural Network Accelerator (NNA) is a module that performs the computation tasks involved in artificial-intelligence application scenarios. The computational complexity of a neural network model is proportional to the size of its input data, and as artificial-intelligence applications broaden, the volume of data to be computed grows ever larger.
Existing neural network accelerators do not support parallel operation: after the NNA processor finishes its computation, the data must be moved out so that another host processor can operate on it, and during computation the next data transfer can only proceed after the result of the previous step is available. Once the data volume is large, the wait time for data transmission during computation becomes too long and computational efficiency is low.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a data transmission control method and apparatus that enable different processors to read data in parallel from a static random access memory inside a neural network acceleration processor, reducing the wait time for data transmission during computation.
One aspect of the present invention provides a data transmission control method, including:
the processor sends a data request command;
the direct memory access controller receives and processes the data request command sent by the processor, and determines a data reading rule; the direct memory access controller reads the data requested by the processor from a static random access memory according to the data reading rule and sends the data requested by the processor to the processor; the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time, and determine corresponding data reading rules;
the processor receives the requested data and begins computation.
Preferably, the processor comprises a neural network acceleration processor and/or an external processor.
Preferably, wherein:
when the processor comprises a neural network acceleration processor, the direct memory access controller receives and processes a data request command sent by the neural network acceleration processor, and determines a data reading address of the neural network acceleration processor; the direct memory access controller reads the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor and sends the data requested by the neural network acceleration processor to the neural network acceleration processor;
when the processor comprises an external processor, the direct memory access controller receives and processes a data request command sent by the external processor, and determines a data reading address of the external processor; the direct memory access controller reads the data requested by the external processor from a static random access memory according to the data reading address of the external processor and sends the data requested by the external processor to the external processor;
when the processor comprises a neural network acceleration processor and an external processor, the direct memory access controller receives and processes a data request command sent by the neural network acceleration processor and a data request command sent by the external processor, and determines a data reading address of the neural network acceleration processor and a data reading address of the external processor; and the direct memory access controller reads the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor and reads the data requested by the external processor from the static random access memory according to the data reading address of the external processor, and sends the data requested by the neural network acceleration processor to the neural network acceleration processor and sends the data requested by the external processor to the external processor.
Preferably, when the processor includes a neural network acceleration processor and an external processor, the method further includes:
the direct memory access controller judges whether the data reading address of the neural network acceleration processor is the same as the data reading address of the external processor; wherein,
when the data reading address of the neural network acceleration processor is different from the data reading address of the external processor, the direct memory access controller simultaneously reads data from the static random access memory;
and when the data reading address of the neural network acceleration processor is the same as the data reading address of the external processor, the direct memory access controller determines the priority of the data requested by the neural network acceleration processor and the priority of the data requested by the external processor, and sequentially reads the data from the static random access memory according to the priority.
Preferably, when the processor includes a neural network acceleration processor and an external processor, the processor receives the requested data and starts to operate, specifically:
the neural network acceleration processor and the external processor respectively receive the requested data and start parallel operation.
Preferably, the sending the data requested by the external processor to the external processor specifically includes:
and sending the data requested by the external processor to the external processor through a network-on-chip bus.
Another aspect of the invention provides a data transmission control device, which comprises a processor, a direct memory access controller and a static random access memory;
the processor is used for sending a data request command;
the direct memory access controller is used for receiving and processing a data request command sent by the processor and determining a data reading rule; the direct memory access controller is also used for reading the data requested by the processor from a static random access memory according to the data reading rule and sending the data requested by the processor to the processor; the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time, and determine corresponding data reading rules;
the processor is also configured to receive the requested data and begin the operation.
Preferably, the processor comprises a neural network acceleration processor and/or an external processor.
Preferably, the direct memory access controller is specifically configured to:
when the processor comprises a neural network acceleration processor, receiving and processing a data request command sent by the neural network acceleration processor, and determining a data reading address of the neural network acceleration processor; reading the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor, and sending the data requested by the neural network acceleration processor to the neural network acceleration processor;
when the processor comprises an external processor, receiving and processing a data request command sent by the external processor, and determining a data reading address of the external processor; reading the data requested by the external processor from a static random access memory according to the data reading address of the external processor, and sending the data requested by the external processor to the external processor;
when the processor comprises a neural network acceleration processor and an external processor, receiving and processing a data request command sent by the neural network acceleration processor and a data request command sent by the external processor, and determining a data reading address of the neural network acceleration processor and a data reading address of the external processor; and reading the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor and the data requested by the external processor from the static random access memory according to the data reading address of the external processor, and sending the data requested by the neural network acceleration processor to the neural network acceleration processor and sending the data requested by the external processor to the external processor.
Preferably, when the processor comprises a neural network acceleration processor and an external processor, the direct memory access controller is further configured to determine whether a data read address of the neural network acceleration processor is the same as a data read address of the external processor; wherein,
when the data reading address of the neural network acceleration processor is different from the data reading address of the external processor, the direct memory access controller simultaneously reads data from the static random access memory;
and when the data reading address of the neural network acceleration processor is the same as the data reading address of the external processor, the direct memory access controller determines the priority of the data requested by the neural network acceleration processor and the priority of the data requested by the external processor, and sequentially reads the data from the static random access memory according to the priority.
Preferably, when the processor includes a neural network acceleration processor and an external processor, the processor receives the requested data and starts to operate, specifically:
the neural network acceleration processor and the external processor respectively receive the requested data and start parallel operation.
Preferably, the direct memory access controller sends the data requested by the external processor to the external processor, specifically:
and the direct memory access controller sends the data requested by the external processor to the external processor through a network-on-chip bus.
The invention has at least the following beneficial effects:
the direct memory access controller embedded in the neural network acceleration processor is configured to simultaneously receive and process data request commands sent by a plurality of processors and determine corresponding data reading rules, so that different processors can read requested data from the static random access memory in the neural network acceleration processor in parallel through the direct memory access controller, thereby accelerating parallel operation of different processors and saving the waiting time of data transmission during operation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments described in the present application, and that other drawings can be derived from them by those skilled in the art without creative effort.
Fig. 1 is a schematic flow chart of a data transmission control method according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a data transmission control apparatus according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a data transmission control method and apparatus in which the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time and to determine the corresponding data reading rules, so that different processors can read the requested data from the static random access memory in parallel through the direct memory access controller, thereby accelerating parallel operation across the processors.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a data transmission control method in one aspect, please refer to fig. 1, where the data transmission control method includes:
step S110, the processor sends a data request command.
In the embodiment of the present invention, when different processors need to Access a Static Random Access Memory (SRAM) inside a Neural Network Accelerator (NNA), data request commands may be respectively issued.
Step S120, the direct memory access controller receives and processes the data request command sent by the processor, and determines a data reading rule; the direct memory access controller reads the data requested by the processor from the static random access memory according to the data reading rule and sends the data requested by the processor to the processor; the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time, and determine corresponding data reading rules.
In the embodiment of the invention, the Direct Memory Access (DMA) controller inside the NNA is configured to receive and process data request commands sent by a plurality of processors at the same time and to determine the corresponding data reading rules, so that different processors can read the requested data from the SRAM in parallel through the DMA controller. The DMA controller in the NNA can be configured through software so that it can receive and process data access requests from multiple processors simultaneously, which makes the whole flow convenient to debug.
In step S130, the processor receives the requested data and starts the operation.
In the embodiment of the invention, different processors can respectively start operation after receiving the requested data, so that the parallel operation of the different processors can be accelerated, and the waiting time of data transmission during the operation of the processors is saved.
As can be seen from the above, in the data transfer control method provided in the embodiment of the present invention, the DMA controller inside the NNA is configured to receive and process the data request commands sent by the multiple processors at the same time, and determine the corresponding data reading rule, so that different processors can read the requested data from the SRAM inside the NNA in parallel through the DMA controller.
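The flow of steps S110-S130 can be sketched as a minimal simulation. The class and method names below, and the use of an address/length pair as the "data reading rule", are illustrative assumptions rather than the patent's concrete interface:

```python
# Hypothetical sketch of steps S110-S130: a processor issues a data request
# command, the DMA controller decodes it into a read rule (here simply an
# address and a length), fetches the data from SRAM, and returns it to the
# requesting processor. All names and the command format are assumptions.

class SRAM:
    def __init__(self, size):
        self.mem = [0] * size

    def read(self, addr, length):
        return self.mem[addr:addr + length]

class DMAController:
    def __init__(self, sram):
        self.sram = sram

    def handle_request(self, command):
        # "Determine the data reading rule": decode address and length.
        addr, length = command["addr"], command["len"]
        # Read the requested data from SRAM and return it to the processor.
        return self.sram.read(addr, length)

sram = SRAM(1024)
sram.mem[16:20] = [1, 2, 3, 4]
dma = DMAController(sram)
# Step S110: the processor sends a data request command;
# steps S120-S130: the DMA controller serves it and the data comes back.
data = dma.handle_request({"addr": 16, "len": 4})
```

With a multi-port or multi-channel DMA controller, several such requests can be outstanding at once, which is the configuration the claims describe.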
In particular implementations, the processor may include a Neural Network Acceleration (NNA) processor and/or an external Host processor. Specifically, the NNA processor can access the internal SRAM independently, and only the data request commands sent by the NNA processor are received and processed by the DMA controller; the external Host processor can also access the SRAM in the NNA independently, and the data request command sent by the processor and received and processed by the DMA controller is only the data request command sent by the external Host processor; the NNA processor and the external Host processor can also simultaneously access the SRAM in the NNA, and the data request command sent by the processor and received and processed by the DMA controller simultaneously comprises the data request command sent by the NNA processor and the data request command sent by the external Host processor.
The data reading rules determined by the DMA controller vary according to the received and processed data request commands sent by the processor. Next, a specific flow of step S120 in the embodiment of the present invention is specifically described.
When the processor includes an NNA processor, step S120 specifically includes: the DMA controller receives and processes the data request command sent by the NNA processor and determines the NNA processor data read address; the DMA controller reads the data requested by the NNA processor from the SRAM according to the NNA processor data read address and sends it to the NNA processor. In this embodiment, when only the NNA processor accesses the SRAM, the data request command received and processed by the DMA controller is only the one sent by the NNA processor; in this case only the NNA processor's data read address needs to be determined, the data requested by the NNA processor is read from the SRAM and sent to the NNA processor, and the NNA processor can start computing after receiving the requested data.
When the processor includes an external Host processor, the step S120 specifically includes: the DMA controller receives and processes a data request command sent by an external Host processor, and determines a data reading address of the external Host processor; the DMA controller reads the data requested by the external Host processor from the SRAM according to the external Host processor data read address, and transmits the data requested by the external Host processor to the external Host processor. In this embodiment, when only the external Host processor accesses the SRAM of the NNA alone, the data request command sent by the processor and received and processed by the DMA controller is only the data request command sent by the external Host processor, at this time, only the data read address of the external Host processor needs to be determined, the data requested by the external Host processor is read from the SRAM and then sent to the external Host processor, and the external Host processor can start to operate after receiving the requested data.
When the processors include an NNA processor and an external Host processor, step S120 specifically includes: the DMA controller receives and processes the data request command sent by the NNA processor and the data request command sent by the external Host processor, and determines the NNA processor data read address and the external Host processor data read address; the DMA controller reads the data requested by the NNA processor from the SRAM according to the NNA processor data read address, reads the data requested by the external Host processor from the SRAM according to the external Host processor data read address, sends the data requested by the NNA processor to the NNA processor, and sends the data requested by the external Host processor to the external Host processor. In this embodiment, when the NNA processor and the external Host processor access the SRAM of the NNA simultaneously, the data request commands received and processed by the DMA controller include both the command sent by the NNA processor and the command sent by the external Host processor. In this case the data read address of each requester must be determined, the data requested by the NNA processor is read from the SRAM and sent to the NNA processor, the data requested by the external Host processor is read from the SRAM and sent to the external Host processor, and both processors start computing after receiving the requested data.
When the NNA processor and the external Host processor access different physical banks of the SRAM inside the NNA at the same time, the DMA controller can read the data requested by the NNA processor and the data requested by the external Host processor from the different physical banks in parallel and send each to its requesting processor. However, when the NNA processor and the external Host processor access the same physical bank of the NNA-internal SRAM at the same time, the DMA controller cannot perform two read operations in the same physical bank simultaneously.
To solve the above problem, in some preferred embodiments of the present invention, when the processor includes an NNA processor and an external Host processor, the data transmission control method further includes:
the DMA controller judges whether the NNA processor data reading address is the same as the external Host processor data reading address or not; wherein,
when the NNA processor data reading address is different from the external Host processor data reading address, the DMA controller simultaneously reads data from the SRAM;
when the NNA processor data reading address is the same as the external Host processor data reading address, the DMA controller determines the priority of the data requested by the NNA processor and the data requested by the external Host processor, and reads the data from the SRAM in sequence according to the priority.
In the embodiment of the invention, when the NNA processor and the external Host processor access different physical BANKs of the SRAM in the NNA at the same time, the DMA controller can simultaneously read data requested by the NNA processor from the different physical BANKs of the SRAM, send the data to the NNA processor and read data requested by the external Host processor, and send the data to the external Host processor; when the NNA processor and the external Host processor simultaneously access the same physical BANK of the SRAM in the NNA, the DMA controller determines the priority of the data requested by the NNA processor and the data requested by the external Host processor, the DMA controller preferentially reads the data with high priority from the SRAM according to the priority of the requested data and then respectively sends the data to the corresponding processors, and the NNA processor and the external Host processor respectively start to operate after receiving the requested data. The priority can be configured by software, and debugging is facilitated.
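The bank-conflict rule above can be sketched as a small scheduler: requests that target different physical banks are served in the same cycle, while requests to the same bank are serialized in priority order. The bank width and the boolean priority flag below are illustrative assumptions; the patent only states that the priority is software-configurable:

```python
# Sketch of the same-bank arbitration rule. Different banks -> both reads in
# parallel (one "cycle"); same bank -> serialized by priority. The 1 KiB bank
# size and the nna_first priority flag are assumptions for illustration.

BANK_BITS = 10  # assume 1 KiB physical banks

def bank_of(addr):
    return addr >> BANK_BITS

def schedule(nna_req, host_req, nna_first=True):
    """Return the order in which the two requests are served.

    Each request is a (name, addr) tuple. The result is a list of cycles,
    each cycle holding the requests served during it.
    """
    if bank_of(nna_req[1]) != bank_of(host_req[1]):
        return [[nna_req, host_req]]          # different banks: one parallel cycle
    ordered = [nna_req, host_req] if nna_first else [host_req, nna_req]
    return [[ordered[0]], [ordered[1]]]       # same bank: two serialized cycles

parallel = schedule(("nna", 0x000), ("host", 0x800))  # different banks
serial = schedule(("nna", 0x100), ("host", 0x180))    # same bank, NNA first
```

Making `nna_first` a writable register bit would correspond to the software-configurable priority the embodiment describes.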
Optionally, in the foregoing embodiment, when the processor includes an NNA processor and an external Host processor, the processor receives the requested data and starts to perform the operation, specifically:
the NNA processor and the external Host processor each receive the requested data and begin parallel operations.
In this embodiment, the NNA processor and the external Host processor can read requested data from the SRAM in parallel through the DMA controller to perform parallel computation acceleration. Therefore, the external Host processor does not need to wait for the NNA processor to finish the calculation and then read the data in the SRAM in the NNA for subsequent calculation, and does not need to wait for the calculation result of the previous step in the calculation process, thereby saving the waiting time of data transmission in the calculation of the processor.
In a specific implementation, the data requested by the external Host processor is sent to the external Host processor through a network-on-chip bus. In this embodiment, the external Host processor may issue a data request command to the SRAM inside the NNA via a Network-on-Chip (NoC) bus, and the data in the SRAM inside the NNA may likewise be returned to the external Host processor via the NoC bus. Optionally, the NNA may be mounted on the NoC bus; since the bandwidth of the NoC bus is large enough, the external Host processor reads data in the SRAM inside the NNA very quickly, further shortening the data transmission wait time.
As can be seen from the above, the data transmission control method provided in the embodiment of the present invention can realize that different processors read data from the static random access memory in the neural network acceleration processor in parallel, and save the waiting time of data transmission during operation.
Another aspect of the present invention provides a data transmission control apparatus, which is described below; the apparatus and the method described above may be referred to in correspondence with each other. Referring to fig. 2, the data transmission control apparatus includes: a processor, a Direct Memory Access (DMA) controller 100, and a Static Random Access Memory (SRAM) 200;
the processor is used for sending a data request command;
the DMA controller 100 is configured to receive and process a data request command sent by the processor and determine a data reading rule; the DMA controller 100 is further configured to read the data requested by the processor from the static random access memory 200 according to the data reading rule and send the data requested by the processor to the processor; the DMA controller 100 is configured to receive and process data request commands sent by a plurality of processors at the same time, and determine corresponding data reading rules;
the processor is also configured to receive the requested data and begin the operation.
As a preferred embodiment of the present invention, the processor includes a neural network acceleration processor 300 and/or an external processor 400. It is understood that, in this embodiment, the data request commands sent by the processor and received and processed by the DMA controller 100 include a data request command sent by the Neural Network Accelerator (NNA) processor 300 and/or a data request command sent by the external (Host) processor 400.
As a preferred embodiment of the present invention, the DMA controller 100 is specifically configured to:
when the processor comprises the neural network acceleration processor 300, receiving and processing a data request command sent by the neural network acceleration processor 300, and determining a data reading address of the neural network acceleration processor; reading the data requested by the neural network acceleration processor 300 from the static random access memory 200 according to the neural network acceleration processor data reading address, and sending the data requested by the neural network acceleration processor to the neural network acceleration processor 300;
when the processor comprises the external processor 400, receiving and processing a data request command sent by the external processor 400, and determining an external processor data reading address; and reads the data requested by the external processor 400 from the static random access memory 200 according to the external processor data read address and transmits the data requested by the external processor to the external processor 400;
when the processor comprises the neural network acceleration processor 300 and the external processor 400, receiving and processing a data request command sent by the neural network acceleration processor 300 and a data request command sent by the external processor 400, and determining a data reading address of the neural network acceleration processor and a data reading address of the external processor; and reading data requested by the neural network acceleration processor 300 from the static random access memory 200 according to the neural network acceleration processor data read address and data requested by the external processor 400 from the static random access memory 200 according to the external processor data read address, and transmitting the data requested by the neural network acceleration processor to the neural network acceleration processor 300 and transmitting the data requested by the external processor to the external processor 400.
As a preferred embodiment of the present invention, when the processor includes the neural network acceleration processor 300 and the external processor 400, the DMA controller 100 is further configured to determine whether the neural network acceleration processor data read address and the external processor data read address are the same; wherein,
when the neural network acceleration processor data read address is different from the external processor data read address, the DMA controller 100 reads both data items from the SRAM 200 simultaneously;
when the neural network acceleration processor data read address and the external processor data read address are the same, the DMA controller 100 determines the priorities of the data requested by the neural network acceleration processor 300 and the data requested by the external processor 400, and reads the data from the SRAM 200 sequentially according to those priorities.
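The arbitration rule in this embodiment — parallel service for distinct addresses, priority-ordered sequential service for the same address — can be sketched as below. The function name, the priority encoding (lower value served first), and the default priorities are all our assumptions for illustration; the patent does not specify how priorities are assigned.

```python
# Hedged sketch of the arbitration rule: requests to different SRAM
# addresses are served in parallel; requests to the same address are
# serialized by priority. Names and priority encoding are hypothetical.

def arbitrate(nna_addr, host_addr, nna_priority=0, host_priority=1):
    """Decide how the NNA and host reads are issued.

    A lower priority value means served first; equal addresses force
    sequential service, distinct addresses allow parallel service.
    """
    if nna_addr != host_addr:
        # Different addresses: both reads proceed simultaneously.
        return {"mode": "parallel",
                "order": [("nna", nna_addr), ("host", host_addr)]}
    # Same address: serve the two requests one after the other,
    # higher-priority (lower value) request first.
    first, second = ("nna", nna_addr), ("host", host_addr)
    if host_priority < nna_priority:
        first, second = second, first
    return {"mode": "sequential", "order": [first, second]}

assert arbitrate(0x10, 0x20)["mode"] == "parallel"
assert arbitrate(0x10, 0x10)["order"][0][0] == "nna"
```

A real controller would apply this decision per transaction on the SRAM's read ports; the sketch captures only the decision itself.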
As a preferred embodiment of the present invention, when the processor includes the neural network acceleration processor 300 and the external processor 400, the processor receives the requested data and starts to operate, specifically:
the neural network acceleration processor 300 and the external processor 400 respectively receive the requested data and start parallel operations.
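Once each processor holds its requested data, the two proceed independently. A minimal sketch of this parallel operation, using two worker threads as stand-ins for the two processors (the operation bodies are placeholders of our own, not the patent's computations):

```python
# Hedged sketch: after receiving their data, the NNA and the host
# processor operate in parallel. Modeled with two threads; the
# operations themselves are illustrative placeholders.
from concurrent.futures import ThreadPoolExecutor

def nna_operation(data):
    # Placeholder for the neural-network computation.
    return f"nna processed {data}"

def host_operation(data):
    # Placeholder for the external (host) processor's computation.
    return f"host processed {data}"

with ThreadPoolExecutor(max_workers=2) as pool:
    nna_future = pool.submit(nna_operation, "weights")
    host_future = pool.submit(host_operation, "activations")
    results = [nna_future.result(), host_future.result()]

assert results == ["nna processed weights", "host processed activations"]
```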
As a preferred embodiment of the present invention, the DMA controller 100 sends the data requested by the external processor 400 to the external processor 400, specifically:
the DMA controller 100 transmits the data requested by the external processor 400 to the external processor 400 through a Network-on-Chip (NoC) bus.
As can be seen from the above, the data transmission control apparatus provided in the embodiment of the present invention enables different processors to read data in parallel from the static random access memory in the neural network acceleration processor, thereby reducing the data-transmission wait time during operation.
The embodiments in the present description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts among the embodiments may be referred to one another. It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a/an" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (12)
1. A data transmission control method, comprising:
the processor sends a data request command;
the direct memory access controller receives and processes the data request command sent by the processor, and determines a data reading rule; the direct memory access controller reads the data requested by the processor from a static random access memory according to the data reading rule and sends the data requested by the processor to the processor; the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time, and determine corresponding data reading rules;
the processor receives the requested data and begins the operation.
2. The data transmission control method according to claim 1, wherein the processor comprises a neural network acceleration processor and/or an external processor.
3. The data transmission control method according to claim 2, wherein,
when the processor comprises a neural network acceleration processor, the direct memory access controller receives and processes a data request command sent by the neural network acceleration processor, and determines a data reading address of the neural network acceleration processor; the direct memory access controller reads the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor and sends the data requested by the neural network acceleration processor to the neural network acceleration processor;
when the processor comprises an external processor, the direct memory access controller receives and processes a data request command sent by the external processor, and determines a data reading address of the external processor; the direct memory access controller reads the data requested by the external processor from a static random access memory according to the data reading address of the external processor and sends the data requested by the external processor to the external processor;
when the processor comprises a neural network acceleration processor and an external processor, the direct memory access controller receives and processes a data request command sent by the neural network acceleration processor and a data request command sent by the external processor, and determines a data reading address of the neural network acceleration processor and a data reading address of the external processor; and the direct memory access controller reads the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor and reads the data requested by the external processor from the static random access memory according to the data reading address of the external processor, and sends the data requested by the neural network acceleration processor to the neural network acceleration processor and sends the data requested by the external processor to the external processor.
4. The data transmission control method of claim 3, wherein when the processor includes a neural network acceleration processor and an external processor, the method further comprises:
the direct memory access controller judges whether the data reading address of the neural network acceleration processor is the same as the data reading address of the external processor; wherein,
when the data reading address of the neural network acceleration processor is different from the data reading address of the external processor, the direct memory access controller simultaneously reads data from the static random access memory;
and when the data reading address of the neural network acceleration processor is the same as the data reading address of the external processor, the direct memory access controller determines the priority of the data requested by the neural network acceleration processor and the priority of the data requested by the external processor, and sequentially reads the data from the static random access memory according to the priority.
5. The data transmission control method according to claim 4, wherein when the processor includes a neural network acceleration processor and an external processor, the processor receives the requested data and starts to perform operations, specifically:
the neural network acceleration processor and the external processor respectively receive the requested data and start parallel operation.
6. The data transmission control method according to any one of claims 3 to 5, wherein the sending the data requested by the external processor to the external processor specifically includes:
and sending the data requested by the external processor to the external processor through a network-on-chip bus.
7. A data transmission control device is characterized by comprising a processor, a direct memory access controller and a static random access memory;
the processor is used for sending a data request command;
the direct memory access controller is used for receiving and processing a data request command sent by the processor and determining a data reading rule; the direct memory access controller is also used for reading the data requested by the processor from a static random access memory according to the data reading rule and sending the data requested by the processor to the processor; the direct memory access controller is configured to receive and process data request commands sent by a plurality of processors at the same time, and determine corresponding data reading rules;
the processor is also configured to receive the requested data and begin the operation.
8. The data transmission control device of claim 7, wherein the processor comprises a neural network acceleration processor and/or an external processor.
9. The data transmission control device of claim 8, wherein the direct memory access controller is specifically configured to:
when the processor comprises a neural network acceleration processor, receiving and processing a data request command sent by the neural network acceleration processor, and determining a data reading address of the neural network acceleration processor; reading the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor, and sending the data requested by the neural network acceleration processor to the neural network acceleration processor;
when the processor comprises an external processor, receiving and processing a data request command sent by the external processor, and determining a data reading address of the external processor; reading the data requested by the external processor from a static random access memory according to the data reading address of the external processor, and sending the data requested by the external processor to the external processor;
when the processor comprises a neural network acceleration processor and an external processor, receiving and processing a data request command sent by the neural network acceleration processor and a data request command sent by the external processor, and determining a data reading address of the neural network acceleration processor and a data reading address of the external processor; and reading the data requested by the neural network acceleration processor from a static random access memory according to the data reading address of the neural network acceleration processor and the data requested by the external processor from the static random access memory according to the data reading address of the external processor, and sending the data requested by the neural network acceleration processor to the neural network acceleration processor and sending the data requested by the external processor to the external processor.
10. The data transmission control device of claim 9, wherein when the processor comprises a neural network acceleration processor and an external processor, the direct memory access controller is further configured to determine whether the neural network acceleration processor data read address and the external processor data read address are the same; wherein,
when the data reading address of the neural network acceleration processor is different from the data reading address of the external processor, the direct memory access controller simultaneously reads data from the static random access memory;
and when the data reading address of the neural network acceleration processor is the same as the data reading address of the external processor, the direct memory access controller determines the priority of the data requested by the neural network acceleration processor and the priority of the data requested by the external processor, and sequentially reads the data from the static random access memory according to the priority.
11. The data transmission control device according to claim 10, wherein when the processor includes a neural network acceleration processor and an external processor, the processor receives the requested data and starts to perform operations, specifically:
the neural network acceleration processor and the external processor respectively receive the requested data and start parallel operation.
12. The data transmission control device according to any one of claims 9 to 11, wherein the direct memory access controller sends the data requested by the external processor to the external processor, specifically:
and the direct memory access controller sends the data requested by the external processor to the external processor through an on-chip network bus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010162763.1A CN111401541A (en) | 2020-03-10 | 2020-03-10 | Data transmission control method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111401541A true CN111401541A (en) | 2020-07-10 |
Family
ID=71436122
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010162763.1A Withdrawn CN111401541A (en) | 2020-03-10 | 2020-03-10 | Data transmission control method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111401541A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112712167A (en) * | 2020-12-31 | 2021-04-27 | 北京清微智能科技有限公司 | Memory access method and system supporting acceleration of multiple convolutional neural networks |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040225760A1 (en) * | 2003-05-11 | 2004-11-11 | Samsung Electronics Co., Ltd. | Method and apparatus for transferring data at high speed using direct memory access in multi-processor environments |
CN102521201A (en) * | 2011-11-16 | 2012-06-27 | 刘大可 | Multi-core DSP (digital signal processor) system-on-chip and data transmission method |
CN103714027A (en) * | 2014-01-10 | 2014-04-09 | 浪潮(北京)电子信息产业有限公司 | Data transmission method and device for direct memory access controller |
CN104572519A (en) * | 2014-12-22 | 2015-04-29 | 中国电子科技集团公司第三十八研究所 | Multiport access and storage controller for multiprocessor and control method thereof |
CN105207794A (en) * | 2014-06-05 | 2015-12-30 | 中兴通讯股份有限公司 | Statistics counting equipment and realization method thereof, and system with statistics counting equipment |
CN107392309A (en) * | 2017-09-11 | 2017-11-24 | 东南大学—无锡集成电路技术研究所 | A general fixed-point neural network convolution accelerator hardware architecture based on FPGA |
CN108363670A (en) * | 2017-01-26 | 2018-08-03 | 华为技术有限公司 | A data transmission method, apparatus, device and system |
CN109491938A (en) * | 2018-11-27 | 2019-03-19 | 济南浪潮高新科技投资发展有限公司 | A multi-channel DMA controller and acceleration method oriented to convolutional neural network acceleration |
CN109961392A (en) * | 2017-12-22 | 2019-07-02 | 英特尔公司 | Compression for deep learning where sparse values are mapped to nonzero values |
CN110008156A (en) * | 2019-03-27 | 2019-07-12 | 无锡海斯凯尔医学技术有限公司 | Data transmission apparatus, method, and readable storage medium |
CN110309088A (en) * | 2019-06-19 | 2019-10-08 | 北京百度网讯科技有限公司 | ZYNQ fpga chip and its data processing method, storage medium |
CN110633576A (en) * | 2018-06-22 | 2019-12-31 | 顶级公司 | Data processing |
CN110738308A (en) * | 2019-09-23 | 2020-01-31 | 陈小柏 | Neural network accelerator |
CN110852428A (en) * | 2019-09-08 | 2020-02-28 | 天津大学 | Neural network acceleration method and accelerator based on FPGA |
Non-Patent Citations (1)
Title |
---|
王洪利 (Wang Hongli) et al.: "3D DMA Controller for Convolutional Neural Network Hardware Accelerators" * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20200710 |