CN111797034A - Data management method, neural network processor and terminal equipment - Google Patents

Data management method, neural network processor and terminal equipment

Info

Publication number
CN111797034A
Authority
CN
China
Prior art keywords
data
dma
neural network
multiplexing
fifo queue
Prior art date
Legal status
Pending
Application number
CN202010590844.1A
Other languages
Chinese (zh)
Inventor
曹庆新
李炜
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202010590844.1A priority Critical patent/CN111797034A/en
Publication of CN111797034A publication Critical patent/CN111797034A/en
Pending legal-status Critical Current

Classifications

    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/023: Free address space management
    • G06F 3/0604: Improving or facilitating administration, e.g. storage management
    • G06N 3/045: Combinations of networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G11C 11/41: Digital stores using transistors forming static cells with positive feedback, i.e. cells not needing refreshing or charge regeneration, e.g. bistable multivibrator or Schmitt trigger
    • G11C 11/413: Auxiliary circuits, e.g. for addressing, decoding, driving, writing, sensing, timing or power reduction


Abstract

The application is applicable to the technical field of neural networks, and provides a data management method, a neural network processor and terminal equipment. The neural network processor to which the method is applied comprises a task manager, a memory unit, a controller, a cache space, a computing unit, a first DMA for writing data into the cache space and a second DMA for reading data from the cache space. The method comprises the following steps: the task manager configures the working parameters of each DMA according to a data caching strategy, wherein the working parameters comprise a data storage mode, the size of an access space, an access starting address and an access ending address; each DMA, in combination with the controller, controls the transmission and storage of the operation data among the memory unit, the cache space and the computing unit according to the working parameters. The method provided by the embodiment can reduce the time spent caching operation data in the neural network processor and improve the calculation rate of the neural network.

Description

Data management method, neural network processor and terminal equipment
Technical Field
The application belongs to the technical field of neural networks, and particularly relates to a data management method, a neural network processor and terminal equipment.
Background
In a neural network processor, a cache space, such as a Static Random Access Memory (SRAM), is provided between the memory unit for storing operation data and the computing unit, and is used for caching the operation data.
An input storage area, a weight storage area and a result storage area are configured in the cache space to store different operation data at the same time, wherein the operation data comprises input data, weight data and result data. At present, the size and the storage mode of each storage area are usually fixed. However, as the calculation of the neural network proceeds, the structure of the operation data constantly changes; for example, the input data gradually decreases while the weight data gradually increases, so that the weight storage area cannot cache the massive weight data in time. This results in long time consumption in the data caching process of the neural network processor and a slow calculation rate.
Disclosure of Invention
The embodiment of the application provides a data management method, a neural network processor and terminal equipment, and can solve the problems that in the neural network calculation process, the time consumption of a data caching process is long and the calculation rate is slow.
In a first aspect, an embodiment of the present application provides a data management method, which is applied to a neural network processor, where the neural network processor includes a task manager, a memory unit, a controller, a cache space, a computing unit, a first DMA that writes data into the cache space, and a second DMA that reads data from the cache space; the method comprises the following steps:
the task manager configures the working parameters of each DMA according to a data caching strategy, wherein the working parameters comprise a data storage mode, the size of an access space, an access starting address and an access ending address;
and each DMA jointly controls the transmission and storage of operation data among the memory unit, the cache space and the computing unit according to the working parameters, wherein the operation data comprises input data, weight data and result data.
In this embodiment, since the DMA is the component that transfers data between the memory unit and the buffer space or between the buffer space and the computing unit, its accessible address range determines the size of the storage space for the transferred data, and its data storage mode determines the manner of transferring the data. Therefore, by configuring the working parameters of each DMA (such as the data storage mode, the access start address and the access end address) according to the data caching strategy, each DMA can, in combination with the controller, cache the operation data according to the data caching strategy, which reduces the time consumed by the data caching process and improves the calculation rate of the neural network.
In a second aspect, the present implementation provides a neural network processor, comprising: the system comprises a task manager, a memory unit, a controller, a cache space, a computing unit, a first DMA for writing data into the cache space and a second DMA for reading data from the cache space;
the task manager is used for configuring the working parameters of each DMA according to a data caching strategy, wherein the working parameters comprise a data storage mode, the size of an access space, an access starting address and an access ending address;
each DMA is used for controlling the transmission and storage of operation data among the memory unit, the cache space and the computing unit in combination with the controller according to the working parameters, and the operation data comprises input data, weight data and result data.
In a third aspect, this embodiment provides a terminal device, including a memory, a main processor, a neural network processor, and a computer program stored in the memory and operable on the neural network processor, where the neural network processor implements the data management method provided in the first aspect when executing the computer program.
It is to be understood that, for the beneficial effects of the second and third aspects, reference may be made to the related description of the first aspect, and details are not repeated here.
Drawings
Fig. 1 is a schematic structural diagram of a neural network processor according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a data management method according to an embodiment of the present application;
fig. 3 is a first schematic diagram illustrating division of a weight storage area according to an embodiment of the present application;
fig. 4 is a second schematic diagram illustrating division of a weight storage area according to an embodiment of the present application;
fig. 5 is a third schematic diagram illustrating division of a weight storage area according to an embodiment of the present application;
fig. 6 is a schematic flowchart of a method for storing weight data according to an embodiment of the present application;
fig. 7 is a schematic flowchart of a method for reading weight data according to an embodiment of the present application;
fig. 8 is a block diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The data management method provided by the embodiment of the application can be applied to a neural network processor of a mobile phone, a tablet personal computer, a wearable device, a vehicle-mounted device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA) and other terminal devices, and the embodiment of the application does not limit the specific type of the terminal device at all.
It should be understood that the terms "first", "second" and other numerical designations herein are used merely for convenience of description and are not intended to limit the scope of the present application.
It should also be understood that the term "and/or" herein merely describes an association between associated objects, meaning that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The following describes a neural network processor according to an embodiment of the present application.
Referring to fig. 1, a neural network processor is exemplarily illustrated in fig. 1 and includes a task manager, and a neural network computing module connected to the task manager. The neural network computing module comprises a memory unit, a controller, a cache space, a computing unit and a plurality of Direct Memory Access (DMA) units. The plurality of DMAs includes DMA-EI, DMA-I, DMA-EW, DMA-W, DMA-EO and DMA-O. The DMA-EI, the DMA-EW and the DMA-EO are connected between the memory unit and the controller, the DMA-I, the DMA-W and the DMA-O are connected between the controller and the computing unit, and the controller is connected with the cache space.
And the task manager is used for controlling and scheduling each element in the neural network computing module to work cooperatively.
The memory unit, such as a Double Data Rate (DDR) memory unit, is configured to store operation data of the neural network model, where the operation data includes input data, weight data, and result data calculated by the computing unit according to the input data and the weight data.
Each DMA is used for reading data from one element and writing it to another element. For example, the DMA-EI and the DMA-EW are used to read the input data and the weight data, respectively, from the memory unit and write them into the buffer space. The DMA-O is used to read the result data from the computing unit and write it into the cache space. The DMA-I and the DMA-W are used to read the input data and the weight data, respectively, from the buffer space and write them to the computing unit. The DMA-EO is used to read the result data from the cache space and write it to the memory unit.
It should be noted that the present embodiment refers to the DMA writing data to the buffer space as the first DMA, such as DMA-EI, DMA-EW, and DMA-O. The DMA that reads data from the buffer space is referred to as a second DMA, e.g., DMA-I, DMA-W and DMA-EO.
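As a reading aid, and purely as an assumption about how one might tabulate fig. 1 in software, the six DMAs and their directions can be summarised as a small routing table (all type and field names below are hypothetical):

```c
/* Illustrative routing table for the six DMAs of fig. 1: each DMA moves one
 * class of operation data either into or out of the cache space. */
typedef enum { DATA_INPUT, DATA_WEIGHT, DATA_RESULT } data_kind_t;
typedef enum { ELEM_MEMORY_UNIT, ELEM_CACHE_SPACE, ELEM_COMPUTE_UNIT } element_t;

typedef struct {
    const char *name;      /* DMA channel name                   */
    data_kind_t kind;      /* class of operation data it carries */
    element_t   from;      /* element it reads from              */
    element_t   to;        /* element it writes to               */
} dma_route_t;

static const dma_route_t dma_routes[] = {
    { "DMA-EI", DATA_INPUT,  ELEM_MEMORY_UNIT,  ELEM_CACHE_SPACE  },  /* first DMA  */
    { "DMA-EW", DATA_WEIGHT, ELEM_MEMORY_UNIT,  ELEM_CACHE_SPACE  },  /* first DMA  */
    { "DMA-O",  DATA_RESULT, ELEM_COMPUTE_UNIT, ELEM_CACHE_SPACE  },  /* first DMA  */
    { "DMA-I",  DATA_INPUT,  ELEM_CACHE_SPACE,  ELEM_COMPUTE_UNIT },  /* second DMA */
    { "DMA-W",  DATA_WEIGHT, ELEM_CACHE_SPACE,  ELEM_COMPUTE_UNIT },  /* second DMA */
    { "DMA-EO", DATA_RESULT, ELEM_CACHE_SPACE,  ELEM_MEMORY_UNIT  },  /* second DMA */
};
```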
The cache space, such as a locally shared SRAM cache space, is used for caching operation data between the memory unit and the computing unit. According to different operation data types, different storage areas are divided in the cache space. Illustratively, referring to fig. 1, when the operation data includes input data, weight data and result data, the cache space is internally divided into an input storage region, a weight storage region and a result storage region, respectively, to store the data of the corresponding category.
A controller, such as Cross Bar arbitration logic (XBAR), is used to manage the DMA write and read operations on the buffer space. In the present embodiment, referring to fig. 1, corresponding to the division of the cache space into the input storage area, the weight storage area and the result storage area, the controller is provided with an input arbitration component, a weight arbitration component and a result arbitration component, which respectively manage the corresponding storage areas.
And the computing unit is used for computing to obtain result data according to the preset neural network model and the input data and the weight data read from the cache space, and returning the result data to the memory unit through the cache space.
Based on the neural network processor provided by the embodiment, the embodiment of the application provides a data management method for improving the data caching efficiency of the neural network processor, so as to improve the calculation efficiency.
Referring to fig. 2, fig. 2 is a schematic diagram of a data management method according to an embodiment of the present application. The method comprises the following steps S201-S202.
S201, the task manager configures the working parameters of each DMA in the neural network processor according to a data caching strategy, wherein the working parameters comprise a data storage mode, the size of an access space, an access starting address and an access ending address.
The data caching strategy determines how to cache the operational data in the calculation process of the neural network, wherein the data caching strategy comprises the data storage mode of each operational data, the size of an occupied storage area and the like. According to the data caching strategy, the task manager can configure the working parameters of each DMA in the neural network processor.
It should be noted that, in the working parameters, the data storage mode may be a static storage mode or a FIFO storage mode, and the present embodiment does not limit the size of the access space or the specific values of the access start address and the access end address, which are determined according to the data caching policy.
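For illustration, the following C sketch shows one possible shape for the working parameters configured in S201; the structure layout, field names and the configuration function are assumptions and not part of the patent.

```c
/* Hypothetical sketch of the per-DMA working parameters configured by the
 * task manager in S201.  Names and layout are illustrative only. */
#include <stdint.h>

typedef enum {
    STORAGE_MODE_STATIC = 0,   /* non-FIFO, statically addressed region   */
    STORAGE_MODE_FIFO   = 1    /* first-in first-out queue mode           */
} storage_mode_t;

typedef struct {
    storage_mode_t mode;       /* data storage mode                        */
    uint32_t access_size;      /* size of the accessible space (bytes)     */
    uint32_t access_start;     /* access start address in the cache space  */
    uint32_t access_end;       /* access end address in the cache space    */
    uint32_t fifo_depth;       /* FIFO depth, used only in FIFO mode       */
    uint32_t handshake_k;      /* handshake granularity K, FIFO mode only  */
} dma_work_params_t;

/* The task manager derives one parameter set per DMA from the data caching
 * strategy and programs it into that DMA (register writes in practice). */
static void task_manager_configure_dma(dma_work_params_t *dma_cfg,
                                       const dma_work_params_t *params)
{
    *dma_cfg = *params;
}
```

In such a sketch, the input, weight and result DMAs would each receive their own parameter set, so the regions they address and their storage modes can change from one network layer to the next.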
S202, each DMA, in combination with the controller, controls the transmission and storage of the operation data among the memory unit, the cache space and the computing unit according to the configured working parameters.
Illustratively, in conjunction with fig. 1, the DMAs and the controller jointly control the transmission and storage of operation data among the memory unit, the buffer space and the computing unit as follows, each DMA operating according to its configured working parameters: the DMA-EI, in combination with the controller, controls the writing of input data from the memory unit into the input storage area; the DMA-I, in combination with the controller, controls the reading of input data from the input storage area to the computing unit; the DMA-EW, in combination with the controller, controls the writing of weight data from the memory unit into the weight storage area; the DMA-W, in combination with the controller, controls the reading of weight data from the weight storage area to the computing unit; the DMA-O, in combination with the controller, controls the writing of result data from the computing unit into the result storage area; and the DMA-EO, in combination with the controller, controls the reading of result data from the result storage area to the memory unit.
In this embodiment, since the DMA is the component that transfers data between the memory unit and the buffer space or between the buffer space and the computing unit, its accessible address range determines the size of the storage space for the transferred data, and its data storage mode determines the manner of transferring the data. Therefore, by configuring the working parameters of each DMA (such as the data storage mode, the access start address and the access end address) according to the data caching strategy, each DMA can, in combination with the controller, cache the operation data according to the data caching strategy, which reduces the time consumed by the data caching process and improves the calculation rate of the neural network.
The following describes the data caching strategy related to the present application, which is specifically shown as follows.
In this embodiment, before the operation of the neural network processor, a computation bottleneck that may occur in the operation process of the neural network model is determined in advance according to the size of the neural network model to be operated and the cache space. Different data caching strategies can be configured for the neural network model aiming at different computing bottlenecks. Wherein the computation bottleneck is used for characterizing the reason causing the reduction of the computation rate of the neural network. For example, a computational bottleneck may be too slow a supply of input data, or too slow a supply of weight data.
In some embodiments, when the computational bottleneck of the neural network processor is that the supply of input data is too slow, the data caching policy comprises:
for the result data of the 1st-layer to nth-layer neural networks that the cache space acquires from the computing unit, the cache space stores the result data of the i-th layer neural network locally, so that the result data can be directly called by the computing unit during the computation of the (i+1)-th layer neural network; only the result data of the nth-layer neural network is sent to the memory unit. Here, n is the total number of layers of the neural network, i belongs to [1, n-1], and i is an integer.
Illustratively, for a neural network model with 10 layers, for the result data of the neural network from the layer 1 to the layer 10, which is acquired from the computing unit in the cache space, the result data of the neural network of the layer 1 is stored locally, so as to be directly called by the computing unit in the computing process of the neural network of the layer 2; storing result data of the layer 2 neural network to the local so as to be directly called by a computing unit in the layer 3 neural network computing process; and so on. And sending the result data of the last layer of neural network, namely the 10 th layer of neural network to the memory unit.
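A minimal sketch of this policy (the function and type names are hypothetical) expresses the destination of each layer's result data as:

```c
/* Minimal sketch of the input-bound caching policy: result data of layers
 * 1..n-1 stays in the local cache for the next layer; only the result data
 * of layer n is written back to the memory unit.  Names are illustrative. */
typedef enum { DEST_LOCAL_CACHE, DEST_MEMORY_UNIT } result_dest_t;

static result_dest_t result_destination(int layer, int total_layers)
{
    /* layers are numbered 1..total_layers */
    return (layer < total_layers) ? DEST_LOCAL_CACHE : DEST_MEMORY_UNIT;
}
```

For the 10-layer example, layers 1 through 9 map to the local cache and only layer 10 maps to the memory unit.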
When the computational bottleneck of the neural network processor is that the supply of weight data is too slow, the data caching policy may include the following (1) - (3).
(1) The multiplexed data is determined from the weight data.
The neural network processor may set a part of the weight data as multiplexed data (i.e., data that is read multiple times after one buffering) and set the rest of the weight data as non-multiplexed data (i.e., data that is read only once after one buffering) according to the neural network model and the size of the buffer space.
It should be noted that not all data that will be reused has to be set as multiplexed data; such data may also be set as non-multiplexed data and repeatedly written into the buffer space multiple times, as determined by the data caching policy.
(2) And determining a multiplexing area in the buffer space, wherein the multiplexing area is larger than or equal to the size of the storage space occupied by the multiplexing data.
In the present embodiment, the division of the weight storage space may be determined according to the multiplexing ratio of the weight data. The multiplexing proportion is the ratio of the size of the storage space occupied by the multiplexing data in the weight data to the size of the storage space occupied by all the weight data. The multiplexing ratio is between 0-100%, for example, it may be 0%, 10%, 15%, or 100%. Illustratively, when the size of the storage space occupied by the weight data is 100 kilobytes (Kb) and the size of the storage space occupied by the multiplexed data in the weight data is 20Kb, then the multiplexing ratio of the weight data is 20%.
When the multiplexing ratio of the weight data is 100%, the weight storage area of the buffer space includes only the multiplexing area, for example, as shown in fig. 3. When the multiplexing ratio is greater than 0% and less than 100%, for example, 10%, the weight storage area includes a multiplexing area and a non-multiplexing area, for example, as shown in fig. 4. When the multiplexing proportion of the weight data is 0%, the weight storage area includes only the non-multiplexing area, as shown in fig. 5, for example.
The multiplexing area is used for caching the multiplexing data in the weight data, and the multiplexing area is larger than or equal to the size of the storage space occupied by all the multiplexing data so as to store all the multiplexing data at the same time. The multiplexing area can store the written multiplexing data for a long time according to the requirement of a neural network algorithm so that the calculation unit can read the data for multiple times.
The non-multiplexing area is used for buffering the non-multiplexed data in the weight data. The memory of the non-multiplexing area is usually small and cannot buffer all the non-multiplexed data at once, so the non-multiplexed data needs to be buffered in batches, and data stored earlier may be overwritten by data written later and therefore cannot be reused.
(3) And buffering the multiplexing data to the multiplexing area, and buffering the non-multiplexing data to the non-multiplexing area.
And according to the data multiplexing strategy, the memory unit writes the multiplexing data into the corresponding multiplexing area and writes the non-multiplexing data into the corresponding non-multiplexing area. For example, when the multiplexing ratio of the weight data is 10%, the memory unit writes the multiplexed data in the weight data into the multiplexing region in the weight storage region through the DMA-EW, and writes the non-multiplexed data in the weight data into the non-multiplexing region in the weight storage region through the DMA-EW.
It should be noted that, the size of the buffer space is considered in the process of determining the multiplexing ratio by the neural network processor, so that there is enough memory in the buffer space to buffer all the multiplexed data simultaneously, and data overflow does not occur.
The memory area is divided in the buffer space according to the multiplexing proportion of the multiplexed data, and the memory area comprises a multiplexing area and/or a non-multiplexing area. When the multiplexing data exists in the neural network computing process, by adopting the data management method provided by the embodiment, the memory unit only needs to cache the multiplexing data to the multiplexing area in the storage area once, and the computing unit can read the multiplexing data for multiple times, so that the time consumption of the data caching process is reduced, and the computing rate is improved.
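A minimal sketch of this partitioning, assuming the size of the weight storage area and the total size of the multiplexed data are known in advance (all names are illustrative), is:

```c
/* Hypothetical partition of the weight storage area by multiplexing ratio.
 * The multiplexing area must be at least as large as all multiplexed data;
 * the remainder of the weight storage area becomes the non-multiplexing
 * area (absent when the ratio is 100%, the whole area when it is 0%). */
#include <stdint.h>
#include <assert.h>

typedef struct {
    uint32_t mux_size;      /* bytes reserved for multiplexed weight data   */
    uint32_t non_mux_size;  /* bytes left for non-multiplexed weight data   */
} weight_region_t;

static weight_region_t partition_weight_area(uint32_t area_size,
                                             uint32_t multiplexed_bytes)
{
    weight_region_t r;
    /* the buffer must be able to hold all multiplexed data at once */
    assert(multiplexed_bytes <= area_size);
    r.mux_size     = multiplexed_bytes;
    r.non_mux_size = area_size - multiplexed_bytes;
    return r;
}
```

Under the 20% example above, the multiplexing area would need to hold at least 20 Kb of multiplexed data, and whatever remains of the weight storage area can serve as the non-multiplexing area.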
In other embodiments, the computational bottleneck of the neural network processor includes both slow supply of input data and slow supply of weight data. As the neural network computation proceeds, the input data will gradually become less and the weight data will gradually increase. Therefore, the data caching policy may be to use the data caching policy on the input data at an early stage of the neural network computation and use the data caching policy on the weighted data at a later stage. The details are as follows.
In the calculation process of the 1st-layer to m-th-layer neural networks, for the result data of the 1st-layer to m-th-layer neural networks that the cache space acquires from the calculation unit, the result data of the j-th-layer neural network is stored locally, so that the calculation unit can directly call it during the computation of the (j+1)-th-layer neural network; and the result data of the m-th-layer neural network is sent to the memory unit.
Determining multiplexing data from the weight data in the calculation process of the (m+1)-th-layer to nth-layer neural networks; determining a multiplexing area in the cache space, wherein the multiplexing area is larger than or equal to the size of the storage space occupied by the multiplexing data; and caching the multiplexing data to the multiplexing area. Here, n is the total number of layers of the neural network, m is less than n and is an integer, j belongs to [1, m-1], and j is an integer.
Illustratively, for a neural network model with 10 layers, when m is 3, storing result data of the layer 1 neural network to the local for being directly called by a computing unit in the layer 2 neural network computing process; and storing the result data of the layer 2 neural network to the local for being directly called by the computing unit in the layer 3 neural network computing process. And then, all the result data of the neural networks from the layer 3 to the layer 10 are sent to the memory unit. And, in the layer 4 to layer 10 neural network calculation process, determining multiplexing data from the weight data; determining a multiplexing area in the cache space, wherein the multiplexing area is larger than or equal to the size of a storage space occupied by the multiplexing data; and buffering the multiplexing data to the multiplexing area.
It should be noted that the neural network computation may use a Cross Layer slicing (CLT) technique, that is, the feature map data is sliced into a plurality of data blocks (tiles), and multiple layers of data operations are continuously performed on each data block. When the CLT technique is adopted, the above-mentioned caching strategies for the input data and the weight data can be adopted for each data block in the process of caching the related operation data.
In addition, in the calculation process of each layer of neural network, the data caching policy about the input data and the data caching policy about the weight data may also be adopted at the same time, which is not limited in this embodiment.
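The switch between the two strategies can be sketched as a simple per-layer selection; the names below are illustrative, and m and n have the same meaning as above.

```c
/* Illustrative selection of the caching strategy per layer when early
 * layers are input-bound and later layers are weight-bound. */
typedef enum {
    POLICY_KEEP_RESULTS_LOCAL,   /* reuse layer results from the cache     */
    POLICY_MULTIPLEX_WEIGHTS     /* keep multiplexed weights in the cache  */
} layer_policy_t;

static layer_policy_t select_policy(int layer, int m)
{
    /* layers 1..m follow the input-data policy,
     * layers m+1..n follow the weight-data policy */
    return (layer <= m) ? POLICY_KEEP_RESULTS_LOCAL : POLICY_MULTIPLEX_WEIGHTS;
}
```

With m = 3 and n = 10 as in the example, layers 1 to 3 keep their result data local and layers 4 to 10 rely on weight multiplexing.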
The following describes the DMA operating parameters related to the present application.
Taking the neural network processor shown in fig. 1 as an example, since both DMA-EI and DMA-I are used to access input data in the input memory space, their operating parameters are the same. Both DMA-EW and DMA-W are used to access the weight data in the weight memory space, and therefore their operating parameters are the same. Both DMA-EO and DMA-O are used to access result data in the result memory space and therefore have the same operating parameters.
When the DMA is used for accessing the cache space in the static storage mode, the task manager configures the DMA with the following contents:
(1) data storage mode of DMA: the static storage mode, i.e. the DMA writes data to the buffer space or reads data from the buffer space in a non-FIFO queue mode.
(2) Access parameters of DMA: the size of the access space, the access start address, and the access end address, etc.
When a DMA is used to access a buffer space In a First-In First-Out (FIFO) queue mode, the task manager configures the DMA with the following contents:
(1) data storage mode of DMA: the FIFO queue mode, i.e. the DMA writes data to or reads data from the buffer space in the FIFO queue mode.
(2) Access parameters of DMA: including the depth of the FIFO queue, the access start address, the access end address, etc. Note that, after accessing the end address of the FIFO queue, the DMA accesses from the start address of the FIFO queue again.
(3) And the pointer resetting instruction is used for resetting the FIFO queue pointer.
(4) The handshake granularity K of the FIFO queue, where K is an integer not less than 1, is used for controlling the handshaking between the read pointer and the write pointer of the FIFO queue. The handshake granularity is the number of data write operations that must have been performed on the FIFO queue before the DMA starts to read data from it. For example, when K is 5, the read data operation starts only once the 5th write data operation on the FIFO queue has been executed.
It is worth noting that by configuring the size of the handshake granularity, the time point when the DMA starts to read data from the FIFO queue can be controlled. For example, when the handshake granularity is configured to be small, for example, K is 1, then each time the DMA-EW writes one piece of weight data to the FIFO queue, the write pointer of the corresponding FIFO queue is incremented by 1, so that the FIFO queue is not empty, thereby triggering the operation of the DMA-W to read the weight data. When the handshake granularity is configured to be large, for example, when K is 10, then DMA-EW triggers the write pointer of the FIFO queue to add 1 every time 10 data are written, so that the FIFO queue is not empty, and the operation of DMA-W to read the weight data is triggered.
It can be understood that the smaller the handshake granularity is configured, the smaller the delay between writing data and reading data, which helps to improve the computational efficiency of the neural network processor. However, it should be noted that, in a specific application scenario, the value of the handshake granularity needs to match the design logic of a specific DMA.
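As an illustration of these FIFO-mode parameters (queue depth, access start and end addresses, and the wrap-around behaviour noted in (2) above), the following sketch shows one plausible way to track the queue state; the free-running pointer convention and all names are assumptions.

```c
/* Sketch of FIFO-queue state derived from the configured parameters: the
 * depth together with free-running write/read pointers gives the empty and
 * full states, and address generation wraps from the access end address
 * back to the access start address. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t depth;       /* depth of the FIFO queue                      */
    uint32_t start_addr;  /* access start address in the cache space      */
    uint32_t end_addr;    /* access end address in the cache space        */
    uint32_t wr_ptr;      /* free-running write pointer (counts writes)   */
    uint32_t rd_ptr;      /* free-running read pointer (counts reads)     */
} fifo_state_t;

static bool fifo_is_empty(const fifo_state_t *f)
{
    return f->wr_ptr == f->rd_ptr;
}

static bool fifo_is_full(const fifo_state_t *f)
{
    return (f->wr_ptr - f->rd_ptr) == f->depth;
}

/* After the end address has been accessed, the next access starts again
 * from the start address. */
static uint32_t fifo_next_addr(const fifo_state_t *f, uint32_t addr,
                               uint32_t step)
{
    uint32_t next = addr + step;
    return (next > f->end_addr) ? f->start_addr : next;
}
```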
In some embodiments, the first DMA writes data with a width A and the second DMA reads data with a width B. To ensure that the amount of operation data added to the FIFO queue when its write pointer is incremented by 1 equals the amount of operation data removed from the FIFO queue when its read pointer is incremented by 1, the first DMA sends a write-pointer-plus-1 instruction to the corresponding arbitration component in the controller every time it has sent K write access requests, and the second DMA sends a read-pointer-plus-1 instruction to the corresponding arbitration component in the controller every time it has sent K×A/B read access requests, where K×A/B is an integer and K is the handshake granularity.
Illustratively, when the DMA-EW write data width is 256 bits and the DMA-W read data width is 128 bits, the DMA-EW sends a write-pointer-plus-1 instruction to the weight arbitration component every time it sends 1 write access request. That is, when the write pointer of the FIFO queue is incremented by 1, the operation data added to the FIFO queue is 256 bits. The DMA-W sends a read-pointer-plus-1 instruction to the weight arbitration component every time it sends 2 read access requests.
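The pointer bookkeeping described in this passage can be sketched as follows; the ratio K×A/B comes from the text above, while the function names and the counter-based implementation are assumptions.

```c
/* Hypothetical pointer bookkeeping for one FIFO queue shared by a writing
 * DMA (width A bits per access) and a reading DMA (width B bits per access).
 * The write pointer advances once every K write requests and the read
 * pointer once every K*A/B read requests, so each pointer step corresponds
 * to the same amount of data (K*A bits). */
#include <stdint.h>

typedef struct {
    uint32_t k;            /* handshake granularity K            */
    uint32_t width_a;      /* write data width A, in bits        */
    uint32_t width_b;      /* read data width B, in bits         */
    uint32_t wr_requests;  /* write access requests seen so far  */
    uint32_t rd_requests;  /* read access requests seen so far   */
    uint32_t wr_ptr;       /* FIFO write pointer                 */
    uint32_t rd_ptr;       /* FIFO read pointer                  */
} fifo_handshake_t;

static void on_write_request(fifo_handshake_t *f)
{
    if (++f->wr_requests % f->k == 0)
        f->wr_ptr++;                       /* +1 every K write requests    */
}

static void on_read_request(fifo_handshake_t *f)
{
    uint32_t reads_per_step = f->k * f->width_a / f->width_b;
    if (++f->rd_requests % reads_per_step == 0)
        f->rd_ptr++;                       /* +1 every K*A/B read requests */
}
```

With K = 1, A = 256 and B = 128 as in the example, the write pointer advances on every write access request and the read pointer on every second read access request.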
The access start address and the access end address of the DMA determine the storage location and the size of the storage space of the data handled by the DMA in the buffer space. For example, for DMA-EI and DMA-I, their configured access start address and access end address determine the location and size of the input memory region in the buffer space for buffering input data. For DMA-EW and DMA-W, their configured access start address and access end address determine the location and size of the weight storage region in the cache space for caching weight data. For DMA-EO and DMA-O, their configured access start address and access end address determine the location and size of the result storage space in the cache space for caching result data.
According to different data caching strategies, in different neural network computing layers, configuration parameters of each DMA may be different, and corresponding data storage modes, and positions and sizes of an input storage region, a weight storage region, and a result storage region may also be changed.
The following describes how the DMA according to the present application controls the transmission and storage of the operation data among the memory unit, the buffer space, and the computing unit in conjunction with the controller according to the configured operating parameters.
Taking the weight data as an example, the weight storage space is provided with a multiplexing area and a non-multiplexing area, which can be in a FIFO queue storage mode or a static storage mode. The two storage modes of the multiplexing area and the non-multiplexing area will be described below.
When the DMA-EW and the DMA-W are configured in the static storage mode, the two regions are handled differently. For the non-multiplexing region, because its memory is generally small, the memory unit writes the weight data into the non-multiplexing region in batches through the DMA-EW, in the order in which the weight data is used by the neural network calculation process. After the weight data written in each batch has been read by the computing unit through the DMA-W, the memory unit can write new weight data into the cache space and overwrite the original weight data. For the multiplexing region, since it can buffer all the multiplexed data at once, the memory unit stores all the multiplexed data in the weight data in the multiplexing region through the DMA-EW. After all the multiplexed data has been cached, the computing unit can read it directly from the multiplexing region through the DMA-W each time the multiplexed data is used, without the memory unit having to write the multiplexed data into the multiplexing region multiple times. This reduces the time consumed by the data caching process and improves the calculation rate of the neural network processor.
Taking the weight data as an example, when the DMA-EW and the DMA-W are configured in the FIFO queue storage mode, the process of writing the operation data from the memory unit is the same for the multiplexing region and the non-multiplexing region, and so is the process of reading the operation data to the computing unit. The difference is that the FIFO queue of the multiplexing region can store all the multiplexed weight data at once. After the computing unit finishes reading all the multiplexed data in the weight data once, the task manager changes the multiplexing region from the FIFO queue storage mode to the static storage mode, so that the computing unit can directly read the same weight data from the multiplexing region multiple times. The FIFO queue of the non-multiplexing region, in contrast, is typically small and usually cannot buffer all the non-multiplexed data at once. Therefore, the non-multiplexed operation data is buffered in the FIFO queue in the form of a data stream, and the weight data flows from one element into another, for example from the memory unit into the computing unit.
The following describes the process of storing and reading data in the FIFO queue storage mode, taking weighted data as an example.
Referring to fig. 6, fig. 6 shows a process of writing weight data to a FIFO queue of a multiplexing section in a weight storage section by a memory unit through DMA-EW, the process including the following steps S601 to S605.
S601, the weight arbitration component detects the empty and full state of the FIFO queue corresponding to the DMA-EW in the weight storage area.
And for the weight data, when the corresponding multiplexing area and/or non-multiplexing area is in a FIFO queue storage mode, the weight arbitration component is used for managing the read-write process of the weight data.
In one example, the weight arbitration component can determine an empty-full status of the weight storage area FIFO queue based on a depth of the FIFO queue, a write pointer, and a read pointer.
S602, when the FIFO queue is not full, the weight arbitration component sends a high level signal to the DMA-EW for indicating that the FIFO queue can write data currently.
S603, the DMA-EW reads the first weight data from the memory unit.
The bandwidth with which the DMA-EW reads the weight data from the memory unit is preconfigured. Illustratively, the bandwidth may be 64 bits, 128 bits, etc., and the embodiment is not limited in this respect.
S604, the DMA-EW sends a write access request to the weight arbitration component, wherein the write access request carries the first weight data and the first address information of the FIFO queue in the weight storage area.
S605, the weight arbitration component writes the first weight data into the FIFO queue corresponding to the first address information according to the write access request, and adds 1 to the write pointer.
In the above process, it should be noted that when the weight arbitration component detects that the FIFO queue is full, it sends a low level signal to the DMA-EW, indicating that data cannot currently be written into the FIFO queue. After receiving the low level signal, the DMA-EW stops writing weight data into the FIFO queue, which prevents the FIFO queue from overflowing.
Referring to fig. 7, fig. 7 shows a process of the calculation unit reading weight data from the FIFO queue of the multiplexing area in the weight storage area through DMA-W, the process including the following steps S701 to S706.
S701, the weight arbitration component detects the empty and full state of the FIFO queue in the weight storage area.
In one example, the weight arbitration component can determine an empty-full status of the FIFO queue in the weight storage area based on a depth of the FIFO queue, a write pointer, and a read pointer.
S702, when the FIFO queue of the weight storage area is not empty, the weight arbitration component sends a high level signal to the DMA-W, and the high level signal is used for indicating that the FIFO queue can read data currently.
S703, the DMA-W sends a read access request to the weight arbitration component, wherein the read access request comprises the second address information of the FIFO queue.
S704, the weight arbitration component reads the second weight data from the FIFO queue according to the second address information.
S705, the weight arbitration component sends the second weight data to the DMA-W and adds 1 to the read pointer.
S706, the DMA-W sends the second weight data to the computing unit.
In the above process, it should be noted that when the weight arbitration component detects that the FIFO queue is empty, it sends a low level signal to the DMA-W, indicating that data cannot currently be read from the FIFO queue. After receiving the low level signal, the DMA-W suspends reading data from the FIFO queue, which prevents the FIFO queue from underflowing.
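The write flow of fig. 6 (S601-S605) and the read flow of fig. 7 (S701-S706) can be summarised in the following self-contained sketch; the queue model and helper names are hypothetical, the handshake granularity is taken as K = 1, and the high/low level signals are represented by the boolean results of the empty/full checks.

```c
/* Hypothetical software model of the write flow (S601-S605) and read flow
 * (S701-S706) for a FIFO-mode region of the weight storage area. */
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 8

typedef struct {
    uint32_t slots[FIFO_DEPTH];  /* modelled FIFO storage in the cache     */
    uint32_t wr_ptr;             /* free-running write pointer             */
    uint32_t rd_ptr;             /* free-running read pointer              */
} weight_fifo_t;

/* S601/S701: the weight arbitration component checks the empty/full state. */
static bool weight_fifo_full(const weight_fifo_t *f)
{
    return f->wr_ptr - f->rd_ptr == FIFO_DEPTH;
}
static bool weight_fifo_empty(const weight_fifo_t *f)
{
    return f->wr_ptr == f->rd_ptr;
}

/* S602-S605: when the queue is not full, the DMA-EW forwards one piece of
 * weight data read from the memory unit, and the arbitration component
 * stores it and advances the write pointer.  Returns false if stalled. */
static bool dma_ew_write(weight_fifo_t *f, uint32_t weight_word)
{
    if (weight_fifo_full(f))
        return false;                       /* low level: suspend writing   */
    f->slots[f->wr_ptr % FIFO_DEPTH] = weight_word;
    f->wr_ptr++;                            /* write pointer plus 1         */
    return true;
}

/* S702-S706: when the queue is not empty, the DMA-W requests one piece of
 * weight data, the arbitration component returns it and advances the read
 * pointer, and the data is handed to the computing unit. */
static bool dma_w_read(weight_fifo_t *f, uint32_t *weight_word)
{
    if (weight_fifo_empty(f))
        return false;                       /* low level: suspend reading   */
    *weight_word = f->slots[f->rd_ptr % FIFO_DEPTH];
    f->rd_ptr++;                            /* read pointer plus 1          */
    return true;
}
```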
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The neural network processor completes the transmission of the operation data between the memory unit and the computing unit in a data flow mode by means of the FIFO queue, so that the data writing operation and the data reading operation can be simultaneously carried out in a cache space, the transmission time of the operation data between the memory unit and the computing unit is shortened, the data transmission process does not limit the size of data volume, the middle process of the transmission does not need software intervention, and the transmission efficiency is higher.
In addition, it should be noted that, in the calculation process of the neural network, when the buffer space is used for caching input data and result data, the data may also be cached in the FIFO queue storage mode or the static storage mode. For the specific caching process, refer to the above description of caching the weight data, which is not repeated here.
Referring to fig. 8, the present embodiment further provides a terminal device, which includes a memory 801, a main processor 802, a neural network processor 803, and a computer program 804 stored in the memory 801 and executable on the neural network processor 803, and when the neural network processor 803 executes the computer program, the method shown in steps S201-S202 is implemented.
The memory 801 may be an internal storage unit of the neural network processor, such as a hard disk or a memory, in some embodiments. The memory 801 may also be an external storage device of the neural network processor in other embodiments, such as a plug-in hard disk provided on the neural network processor, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 801 may also include both an internal storage unit of the neural network processor and an external storage device. The memory 801 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, and other programs, such as program codes of the computer programs. The memory 801 may also be used to temporarily store data that has been output or is to be output.
The main Processor 802 may be a Central Processing Unit (CPU), and the main Processor 802 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field-Programmable Gate arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The present embodiment also provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a neural network processor, the computer program can implement the method shown in the above steps S201 to S202.
The computer readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer Memory, Read-only Memory (ROM), Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium. Such as a usb-disk, a removable hard disk, a magnetic or optical disk, etc. In certain jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and patent practice.
The embodiment of the application also provides a computer program product containing instructions. When the computer program product is run on a computer or a neural network processor, the computer or the neural network processor is caused to perform the method illustrated in the above steps S201-S202.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in or transmitted over a computer-readable storage medium. The computer instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optics, digital subscriber line) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media.
One of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by hardware related to instructions of a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the above method embodiments. And the aforementioned storage medium includes: various media capable of storing program codes, such as ROM and RAM.
Finally, it should be noted that: the above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (14)

1. A data management method is characterized by being applied to a neural network processor, wherein the neural network processor comprises a task manager, a memory unit, a controller, a cache space, a computing unit, a first DMA for writing data into the cache space and a second DMA for reading data from the cache space; the method comprises the following steps:
the task manager configures the working parameters of each DMA according to a data caching strategy, wherein the working parameters comprise a data storage mode, the size of an access space, an access starting address and an access ending address;
and each DMA jointly controls the transmission and storage of operation data among the memory unit, the cache space and the computing unit according to the working parameters, wherein the operation data comprises input data, weight data and result data.
2. The method of claim 1, wherein the data caching policy comprises:
for the result data of the neural networks from the layer 1 to the layer n acquired by the computing unit in the cache space, the cache space stores the result data of the neural network of the layer i to the local, so that the result data can be directly called by the computing unit in the computing process of the neural network of the layer i + 1; only sending the result data of the nth layer of neural network to the memory unit;
wherein n is the total number of layers of the neural network, i belongs to [1, n-1], and i is an integer.
3. The method of claim 1, wherein the data caching policy comprises:
determining multiplexed data from the weight data;
determining a multiplexing area in the cache space, wherein the multiplexing area is larger than or equal to the size of a storage space occupied by the multiplexing data;
and buffering the multiplexing data to the multiplexing area.
4. The method of claim 1, wherein the data caching policy comprises:
in the calculation process of the neural networks of the 1st layer to the m-th layer, for the result data of the neural networks of the 1st layer to the m-th layer acquired from the calculation unit by the cache space, storing the result data of the neural network of the j-th layer locally, so that the calculation unit directly calls the result data in the calculation process of the neural network of the (j+1)-th layer; sending result data of the neural networks from the m-th layer to the n-th layer to the memory unit;
in the calculation process of the neural networks from the (m + 1) th layer to the nth layer, determining multiplexing data from the weight data; determining a multiplexing area in the cache space, wherein the multiplexing area is larger than or equal to the size of a storage space occupied by the multiplexing data; caching the multiplexing data to the multiplexing area;
wherein n is the total number of layers of the neural network, m is less than n and is an integer, j belongs to [1, m-1], and j is an integer.
5. The method of claim 3 or 4, wherein the data caching policy further comprises:
when the multiplexing proportion of the multiplexed data is greater than 0 and less than 100%, the buffer space further includes a non-multiplexing area, and the non-multiplexing area is used for buffering the non-multiplexed data in the weight data.
6. The method of claim 1, wherein the data storage mode comprises a static storage mode or a first-in-first-out (FIFO) storage mode.
7. The method as claimed in claim 6, wherein when the data storage mode is the FIFO storage mode, the controlling, by each DMA in combination with the controller, the transfer and storage of operation data among the memory unit, the buffer space, and the compute unit according to the operating parameters comprises:
the controller detects the empty and full state of an FIFO queue corresponding to the first DMA in the cache space;
when the FIFO queue is not full, the controller sends a high level signal to the first DMA;
responding to the high level signal, the first DMA sends a write access request to the controller, wherein the write access request carries a target write address of the FIFO queue and data to be written read from the memory unit or the computing unit;
and the controller writes the data to be written into a storage area corresponding to a target write address in the FIFO queue according to the write access request.
8. The method of claim 7, further comprising:
when the FIFO queue is full, the controller sends a low level signal to the first DMA;
in response to the low signal, the first DMA suspends writing data to the FIFO queue.
9. The method as claimed in claim 6, wherein when the data storage mode is a FIFO storage mode, each DMA, in combination with the controller, controls the transfer and storage of operation data among the memory unit, the buffer space, and the compute unit according to the operating parameters, further comprising:
the controller detects an empty-full state of a FIFO queue corresponding to the second DMA in the cache space;
when the FIFO queue is not empty, the controller sends a high level signal to the second DMA;
in response to the high signal, the second DMA sends a read access request to the controller, the read access request including a target read address in the FIFO queue;
and the second DMA reads the operation data from the FIFO queue according to the target reading address and writes the read operation data into the memory unit or the computing unit.
10. The method of claim 7, further comprising:
when the FIFO queue is empty, the controller sends a low level signal to the second DMA;
in response to the low signal, the second DMA suspends reading data from the FIFO queue.
11. The method of claim 6, wherein when the data storage mode is the FIFO storage mode, the operating parameters further comprise a handshake granularity K of the FIFO queue, the handshake granularity K being used to indicate the number of times a first DMA performs a data write operation to the FIFO queue when a second DMA starts to read data from the FIFO queue, where K is an integer.
12. The method of claim 11, further comprising:
every time the first DMA sends the Kth write access request, the first DMA sends a write pointer plus 1 instruction to the controller; and, in response to the write pointer plus 1 instruction, the controller adds 1 to the write pointer of the FIFO queue;
every time the second DMA sends the ((A/B)×K)th read access request, the second DMA sends a read pointer plus 1 instruction to the controller; and, in response to the read pointer plus 1 instruction, the controller adds 1 to the read pointer of the FIFO queue;
wherein A is the data write width of the first DMA, B is the data read width of the second DMA, and A/B and (A/B)×K are integers.
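For illustration only: a sketch of the coarse-grained pointer updates of claims 11 and 12, under the assumption that the formula images in the original claim encode (A/B)×K — the write pointer advances once per K write access requests of width A, and the read pointer once per (A/B)×K read access requests of width B, so both pointers advance per K×A bytes of payload. Names are hypothetical.

#include <assert.h>

typedef struct {
    unsigned K;        /* handshake granularity                          */
    unsigned A;        /* data write width of the first DMA, in bytes    */
    unsigned B;        /* data read width of the second DMA, in bytes    */
    unsigned wr_reqs;  /* write access requests since the last wr_ptr+1  */
    unsigned rd_reqs;  /* read access requests since the last rd_ptr+1   */
    unsigned wr_ptr;
    unsigned rd_ptr;
} fifo_handshake_t;

/* Called for every write access request issued by the first DMA. */
static void on_write_access_request(fifo_handshake_t *h)
{
    if (++h->wr_reqs == h->K) {        /* the Kth write access request */
        h->wr_ptr += 1;                /* "write pointer plus 1"       */
        h->wr_reqs = 0;
    }
}

/* Called for every read access request issued by the second DMA. */
static void on_read_access_request(fifo_handshake_t *h)
{
    assert(h->A % h->B == 0);          /* A/B must be an integer       */
    unsigned reads_per_step = (h->A / h->B) * h->K;
    if (++h->rd_reqs == reads_per_step) {
        h->rd_ptr += 1;                /* "read pointer plus 1"        */
        h->rd_reqs = 0;
    }
}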
13. A neural network processor, comprising: a task manager, a memory unit, a controller, a cache space, a computing unit, a first DMA for writing data into the cache space, and a second DMA for reading data from the cache space;
the task manager is used for configuring the working parameters of each DMA according to a data caching policy, wherein the working parameters comprise a data storage mode, an access space size, an access start address and an access end address;
each DMA is used for controlling, in combination with the controller and according to the working parameters, the transmission and storage of operation data among the memory unit, the cache space and the computing unit, wherein the operation data comprises input data, weight data and result data.
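For illustration only: a hypothetical C structure for the per-DMA working parameters that the task manager configures in claim 13 (data storage mode, access space size, access start address, access end address); the field names and layout are assumptions, not the patent's actual register map.

#include <stdint.h>

typedef enum {
    STORAGE_MODE_STATIC = 0,   /* static storage mode         */
    STORAGE_MODE_FIFO   = 1,   /* first-in-first-out mode     */
} storage_mode_t;

typedef struct {
    storage_mode_t mode;       /* data storage mode                       */
    uint32_t access_size;      /* size of the access space, in bytes      */
    uint32_t start_addr;       /* access start address in the cache space */
    uint32_t end_addr;         /* access end address in the cache space   */
    uint32_t handshake_K;      /* handshake granularity, FIFO mode only   */
} dma_working_params_t;

/* Task manager: derive the working parameters for one DMA from the
 * data caching policy (mode, region placement, granularity). */
static dma_working_params_t config_dma(storage_mode_t mode,
                                       uint32_t start, uint32_t size,
                                       uint32_t K)
{
    dma_working_params_t p = {
        .mode        = mode,
        .access_size = size,
        .start_addr  = start,
        .end_addr    = start + size,
        .handshake_K = K,
    };
    return p;
}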
14. A terminal device comprising a memory, a main processor, a neural network processor and a computer program stored in the memory and executable on the neural network processor, wherein the neural network processor implements the method of any one of claims 1-12 when executing the computer program.
CN202010590844.1A 2020-06-24 2020-06-24 Data management method, neural network processor and terminal equipment Pending CN111797034A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010590844.1A CN111797034A (en) 2020-06-24 2020-06-24 Data management method, neural network processor and terminal equipment

Publications (1)

Publication Number Publication Date
CN111797034A true CN111797034A (en) 2020-10-20

Family

ID=72803696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010590844.1A Pending CN111797034A (en) 2020-06-24 2020-06-24 Data management method, neural network processor and terminal equipment

Country Status (1)

Country Link
CN (1) CN111797034A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112488293A (en) * 2020-11-19 2021-03-12 山东产研鲲云人工智能研究院有限公司 Method and device for operating deep learning network
CN112488293B (en) * 2020-11-19 2022-12-09 山东产研鲲云人工智能研究院有限公司 Method and device for operating deep learning network
WO2022121278A1 (en) * 2020-12-10 2022-06-16 上海阵量智能科技有限公司 Chip, data moving method, and electronic device
CN112631955B (en) * 2020-12-18 2024-01-19 北京地平线机器人技术研发有限公司 Data processing method, device, electronic equipment and medium
CN112631955A (en) * 2020-12-18 2021-04-09 北京地平线机器人技术研发有限公司 Data processing method, data processing device, electronic device, and medium
CN112799726A (en) * 2021-01-26 2021-05-14 上海寒武纪信息科技有限公司 Data processing device, method and related product
CN112799726B (en) * 2021-01-26 2024-01-30 上海寒武纪信息科技有限公司 Data processing device, method and related product
CN113220606B (en) * 2021-05-07 2021-11-26 珠海市芯动力科技有限公司 Neural network weight storage method, neural network weight reading method and related equipment
CN113220606A (en) * 2021-05-07 2021-08-06 珠海市芯动力科技有限公司 Neural network weight storage method, neural network weight reading method and related equipment
CN114063915A (en) * 2021-11-10 2022-02-18 上海航天计算机技术研究所 High-reliability telemetering delay data management method and system for deep space exploration
CN114063915B (en) * 2021-11-10 2023-08-29 上海航天计算机技术研究所 High-reliability telemetry delay data management method and system for deep space exploration
CN117667208A (en) * 2024-02-01 2024-03-08 腾讯科技(深圳)有限公司 Data operation method, memory and computer equipment
CN117667208B (en) * 2024-02-01 2024-05-24 腾讯科技(深圳)有限公司 Data operation method, memory and computer equipment

Similar Documents

Publication Publication Date Title
CN111797034A (en) Data management method, neural network processor and terminal equipment
US7761666B2 (en) Temporally relevant data placement
CN107395665A (en) A kind of block chain service handling and business common recognition method and device
CN104102693A (en) Object processing method and device
CN109240946A (en) The multi-level buffer method and terminal device of data
US9390036B2 (en) Processing data packets from a receive queue in a remote direct memory access device
CN110119304B (en) Interrupt processing method and device and server
US8463954B2 (en) High speed memory access in an embedded system
EP3115904B1 (en) Method for managing a distributed cache
WO2022082892A1 (en) Big data analysis method and system, and computer device and storage medium thereof
CN115934625B (en) Doorbell knocking method, equipment and medium for remote direct memory access
CN115129621B (en) Memory management method, device, medium and memory management module
CN111625546A (en) Data writing method, device, equipment and medium
CN113094392A (en) Data caching method and device
CN107748649B (en) Method and device for caching data
CN115860080A (en) Computing core, accelerator, computing method, device, equipment, medium and system
KR102334473B1 (en) Adaptive Deep Learning Accelerator and Method thereof
CN111352868B (en) Serial port access method, device, terminal equipment and storage medium
CN109992198B (en) Data transmission method of neural network and related product
CN115576661A (en) Data processing system, method and controller
CN116670661A (en) Cache access method of graphics processor, graphics processor and electronic device
CN112035529B (en) Caching method, caching device, electronic equipment and computer readable storage medium
CN112559574B (en) Data processing method, device, electronic equipment and readable storage medium
CN114546954A (en) Data processing method and device in distributed file system
US20220269484A1 (en) Accumulation Systems And Methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination