WO2023134360A1 - Data processing method and apparatus, and storage medium - Google Patents

Data processing method and apparatus, and storage medium

Info

Publication number
WO2023134360A1
Authority
WO
WIPO (PCT)
Prior art keywords
output data
data
cache
memory
cache unit
Prior art date
Application number
PCT/CN2022/138424
Other languages
French (fr)
Chinese (zh)
Inventor
孙炜 (SUN Wei)
祝叶华 (ZHU Yehua)
Original Assignee
哲库科技(上海)有限公司 (Zeku Technology (Shanghai) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 哲库科技(上海)有限公司 (Zeku Technology (Shanghai) Co., Ltd.)
Publication of WO2023134360A1 publication Critical patent/WO2023134360A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00 Digital computers in general; Data processing equipment in general
    • G06F 15/76 Architectures of general purpose stored program computers
    • G06F 2015/761 Indexing scheme relating to architectures of general purpose stored programme computers
    • G06F 2015/765 Cache

Definitions

  • the present application relates to the field of artificial intelligence, in particular to a data processing method and device, and a storage medium.
  • In the architecture design of artificial intelligence processors, a structure separating computation and storage is often adopted.
  • A hierarchical storage structure is used in artificial intelligence processors: a buffer memory is placed between the computing engine and the internal memory and holds some data for the computing engine's temporary data interactions. When data read by the computing engine misses in the buffer memory, new data must be transferred from the internal memory into the buffer memory so that the computing engine can read it from there.
  • At present, the data mapping between internal memory and buffer memory is designed for the CPU, reflecting the high flexibility and unpredictable data-access addresses of CPU execution. In an embedded neural-network processing unit (NPU) architecture, if a data caching mechanism is added by reusing the CPU's buffer-memory design, the result is low data-caching efficiency for the NPU.
  • Embodiments of the present application provide a data processing method and device, and a storage medium, which can improve data-caching efficiency for an NPU.
  • An embodiment of the present application provides a data processing device, which includes: a neural network processor, a buffer memory, and an internal memory, where the buffer memory includes cache units, and each cache unit includes a cache data block and a cache counter;
  • the cache data block is used to cache the stored data in the internal memory and/or the output data generated by the neural network processor;
  • the cache counter is used to cache the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
  • An embodiment of the present application provides a data processing method applied to the above data processing device, the method including: obtaining the network structure of the algorithm network to be executed, and determining from it the number of algorithm layers that read the stored data in the internal memory and/or the output data of each algorithm layer; determining that number as the read count of the stored data and/or the output data; and adding the stored data with its read count, and/or the output data with its read count, to the buffer memory.
  • An embodiment of the present application provides a storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above data processing method.
  • FIG. 1 is a schematic structural diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an exemplary data processing device using separation of computing and storage provided in the embodiment of the present application;
  • FIG. 3 is a schematic network structure diagram of an exemplary NPU-executed algorithm network provided in an embodiment of the present application
  • FIG. 4 is a schematic diagram of an exemplary storage mapping method between an internal memory and a buffer memory provided in an embodiment of the present application;
  • FIG. 5 is a flowchart of a data processing method provided by an embodiment of the present application.
  • An embodiment of the present application provides a data processing device, which includes: a neural network processor, a buffer memory, and an internal memory, where the buffer memory includes cache units, and each cache unit includes a cache data block and a cache counter;
  • the cache data block is used to cache the stored data in the internal memory and/or the output data generated by the neural network processor;
  • the cache counter is used to cache the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
  • The buffer memory is further configured to decrement by one the read count in the cache counter corresponding to the cache data block each time a read operation on the stored data and/or the output data is detected; when the read count in the cache counter reaches zero, the stored data and/or the output data are deleted.
  • The neural network processor is configured to determine the read count of the output data according to the number of algorithm layers that read the output data; determine a first storage unit from the buffer memory and/or the internal memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory; and cache the output data, and/or the output data together with its read count, into the first storage unit.
  • The neural network processor is further configured to determine, from the buffer memory, the first cache unit group corresponding to the output data according to the destination address of the output data in the internal memory and the storage mapping; if the first cache unit group includes a first idle cache unit, the first idle cache unit is determined as the first cache unit;
  • The neural network processor is further configured to determine the first storage unit from the internal memory according to the destination address, if the first cache unit group does not include a first idle cache unit and no cache unit whose read count is less than the read count of the output data is found in the first cache unit group;
  • The neural network processor is further configured to determine, as the first cache unit, a cache unit whose read count is less than the read count of the output data, if the first cache unit group does not include a first idle cache unit but such a cache unit is found in the first cache unit group;
  • The buffer memory is further configured to delete, from the cache unit whose read count is less than the read count of the output data, the currently stored output data and the remaining read count corresponding to it.
  • The neural network processor is further configured to update the output data, and/or the output data together with its read count, to the internal memory when caching the output data and its read count into the buffer memory.
  • The neural network processor is further configured to set a to-be-synchronized flag for the output data when caching the output data and its read count into the buffer memory, and, when the buffer memory deletes the output data and its read count, to update the output data, and/or the output data together with its read count, to the internal memory according to the flag.
  • The internal memory is configured to determine, from the buffer memory, the second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping; if a second idle storage unit exists in the second cache unit group, the stored data is cached into the second idle storage unit; if no second idle storage unit exists, the second cache unit with the smallest read count is found in the second cache unit group and the stored data is cached into that second cache unit;
  • The neural network processor is further configured to determine the read count of the stored data; if a second idle storage unit exists in the second cache unit group, the read count of the stored data is cached into the second idle storage unit; if not, the read count of the stored data is cached into the second cache unit.
  • If the internal memory includes storage data blocks and storage counters corresponding to the storage data blocks, the neural network processor is further configured to obtain the read count of the stored data from the storage counter, where the read count in the storage counter is determined according to the number of algorithm layers that read the data in the storage data block, and/or according to a read count transmitted by the neural network processor.
  • If the internal memory includes only the storage data blocks, the neural network processor is further configured to determine, from the buffer memory, the read count currently stored in the second cache unit and determine the read count of the stored data from it, and/or to determine the read count of the stored data according to the number of algorithm layers that read the stored data.
  • An embodiment of the present application provides a data processing method applied to the above data processing device, the method including: obtaining the network structure of the algorithm network to be executed, and determining from it the number of algorithm layers that read the stored data in the internal memory and/or the output data of each algorithm layer; determining that number as the read count of the stored data and/or the output data; and adding the stored data with its read count, and/or the output data with its read count, to the buffer memory.
  • Each time an algorithm layer's read operation on the stored data and/or the output data is detected, the read count of the stored data and/or the output data in the buffer memory is decremented by one; when the read count reaches zero, the corresponding stored data and/or output data are deleted from the buffer memory.
  • Adding the output data and its read count to the buffer memory includes: determining a first storage unit from the buffer memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory; and caching the output data and its read count into the first storage unit.
  • Determining the first storage unit from the buffer memory according to the destination address of the output data in the internal memory and the storage mapping includes: determining, from the buffer memory, the first cache unit group corresponding to the output data; if the first cache unit group includes a first idle cache unit, determining the first idle cache unit as the first cache unit.
  • If the first cache unit group does not include a first idle cache unit, and a cache unit whose read count is less than the read count of the output data is found in the first cache unit group, that cache unit is determined as the first cache unit.
  • If the first cache unit group does not include a first idle cache unit, and no cache unit whose read count is less than the read count of the output data is found in the first cache unit group, the first storage unit is determined from the internal memory according to the destination address.
  • After the output data and its read count are added to the buffer memory, the method further includes: updating the output data, and/or the output data together with its read count, to the internal memory; or setting a to-be-synchronized flag for the output data and, when the buffer memory deletes the output data and its read count, updating the output data, and/or the output data together with its read count, to the internal memory according to the flag.
  • Adding the stored data and its read count to the buffer memory includes: determining the read count of the stored data; determining, from the buffer memory, the second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping; if a second idle storage unit exists in the second cache unit group, caching the stored data and its read count into the second idle storage unit; if not, finding the second cache unit with the smallest read count in the second cache unit group and caching the stored data and its read count into that unit.
  • An embodiment of the present application provides a storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above data processing method.
  • An embodiment of the present application provides a data processing method and device, and a storage medium. The device includes: a neural network processor, a buffer memory, and an internal memory, where the buffer memory includes cache units, each comprising a cache data block and a cache counter. The cache data block caches the stored data in the internal memory and/or the output data generated by the neural network processor; the cache counter caches the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
  • With this device, exploiting the fixed and predictable data flow of a neural network processor, the number of algorithm layers that will read the data in the buffer memory is known in advance from the network structure of the algorithm network, and cache counters in the buffer memory store that number. This ensures that data about to be processed stays cached in the buffer memory, greatly reducing the number of times data is written from the internal memory to the buffer memory and thereby improving data-caching efficiency for the NPU.
  • References to "some embodiments" describe a subset of all possible embodiments; it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with one another when there is no conflict.
  • The terms "first", "second", and "third" in the embodiments of the present application are only used to distinguish similar objects and do not imply a particular ordering of objects. Where permitted, the specific order or sequence may be interchanged so that the embodiments described herein can be implemented in orders other than those illustrated or described.
  • The device 1 includes: a neural network processor 10, a buffer memory 11, and an internal memory 12, where the buffer memory 11 includes cache units 110, and each cache unit 110 includes a cache data block 1100 and a cache counter 1101;
  • the cache data block 1100 is used to cache the stored data in the internal memory 12 and/or the output data generated by the neural network processor 10;
  • the cache counter 1101 is used to cache the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
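  • As an illustrative aside (not part of the patent text), the cache-unit organization described above can be modeled as a data block paired with a read counter. The Python sketch below is a hypothetical model; all names and fields are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CacheUnit:
    """One cache unit: a cache data block plus a cache counter (hypothetical model)."""
    data: Optional[bytes] = None   # cached stored data or NPU output data
    tag: Optional[int] = None      # which internal-memory block is cached here
    reads_left: int = 0            # remaining read count for the cached data

    def is_idle(self) -> bool:
        # A unit holding no data is idle and free to receive new data.
        return self.data is None
```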
  • the data processing device proposed in the embodiment of the present application is a data cache device designed for an NPU architecture.
  • the neural network processor is an NPU
  • the buffer memory is a cache memory
  • the internal memory may be a synchronous dynamic random-access memory (SDRAM), a double data rate SDRAM (DDR SDRAM), or another type of memory.
  • the data processing device adopts a structure in which calculation and storage are separated.
  • The cache memory is a buffer close to the NPU computing engine; it stores a certain amount of data for the NPU computing engine's temporary data interactions, and its read/write speed is fast but its capacity is small. The internal memory is far from the NPU computing engine; it stores all of the data and has a large capacity, but its read/write speed is slow and each access path is long, so its read/write efficiency is low.
  • The algorithm layers in the algorithm network can be determined by analyzing the network structure of the algorithm network executed by the NPU. FIG. 3 is a schematic diagram of the network structure of an algorithm network executed by the NPU, where each circle is an algorithm layer. An algorithm layer's workflow is to read in the source data and perform operator processing, where the operator can be convolution, pooling, activation, fully connected, and so on; after operator processing completes, processing proceeds to the next algorithm layer. For example, the output data of algorithm layer 0 is read as input data by algorithm layers 1 and 2, the output data of algorithm layer 1 is read as input data by algorithm layers 3 and 4, and the output data of algorithm layer 2 is read as input data by algorithm layer 5.
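  • Because the network structure is fixed before execution, each layer's output read count can be computed by simply counting its consumer layers. A minimal sketch under that reading (the edge list mirrors the example above; all names are hypothetical):

```python
# Producer -> consumer edges for the example network: layer 0 feeds
# layers 1 and 2, layer 1 feeds layers 3 and 4, layer 2 feeds layer 5.
edges = [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5)]

def read_counts(edges):
    """Read count of a layer's output = number of layers that consume it."""
    counts = {}
    for producer, _consumer in edges:
        counts[producer] = counts.get(producer, 0) + 1
    return counts

print(read_counts(edges))  # {0: 2, 1: 2, 2: 1}
```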
  • a cache counter is set for each cache data block, and a cache data block and a corresponding cache counter together form a cache unit.
  • What is filled into the cache counter is the number of times the data in the corresponding cache data block will be read out.
  • The data stored in the cache memory can be the stored data from the internal memory, or the output data generated after the neural network processor performs operator processing on the stored data, selected according to the actual situation.
  • the embodiments of the present application do not make specific limitations.
  • The buffer memory 11 is further configured to decrement by one the read count in the cache counter corresponding to the cache data block each time a read operation on the stored data and/or the output data is detected; when the read count in the cache counter reaches zero, the stored data and/or the output data are deleted.
  • In practice, the operator layers in the NPU read stored data and/or output data from the buffer memory. Each time stored data and/or output data are read from the buffer memory, the buffer memory determines the cache unit holding that data and decrements the read count in that unit's counter by one. When the read count in the cache counter reaches zero, the data in the cache unit will not be read again; at that point the stored data and/or output data are deleted from the buffer memory and the corresponding cache unit is cleared, so that new data can later be written into it.
  • The read count also reflects the importance of the corresponding cache data block. The larger the read count, the more times the data stored in the block will be read by subsequent algorithm layers, and hence the more important that data is; conversely, the smaller the read count, the less frequently the data will be read by subsequent algorithm layers, and hence the less important it is.
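  • A hedged sketch of this decrement-on-read, evict-at-zero behavior, reusing the hypothetical CacheUnit model from above:

```python
def read_from_unit(unit: CacheUnit) -> Optional[bytes]:
    """Serve one read, decrement the counter, and clear the unit at zero."""
    data = unit.data
    unit.reads_left -= 1
    if unit.reads_left == 0:
        # No later algorithm layer will read this block again, so the
        # unit is cleared and becomes available for new data.
        unit.data = None
        unit.tag = None
    return data
```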
  • The neural network processor 10 is configured to determine the read count of the output data according to the number of algorithm layers that read the output data; determine a first storage unit from the buffer memory and/or the internal memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory; and cache the output data, and/or the output data together with its read count, into the first storage unit.
  • In practice, the neural network processor may determine the number of algorithm layers that read the output data according to the network structure of the algorithm network, and then determine the read count of the output data from that number.
  • The storage mapping between the internal memory and the cache memory can be determined according to hardware parameters and cache efficiency. As shown in FIG. 4, the stored data in each storage data block of the internal memory can be mapped to one of a group of four cache data blocks in the cache memory. For example, the cache memory includes 16 cache data blocks, numbered 0-15, each preceded by a cache counter cnt, together forming 16 cache units. Storage data blocks No. 0, No. 8, …, No. 2040 in the internal memory are mapped to cache data blocks No. 0-3 in the cache memory, and so on, and storage data blocks No. 7, No. 15, …, No. 2047 are mapped to cache data blocks No. 12-15, realizing the storage mapping between the internal memory and the buffer memory.
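  • This grouped mapping is effectively set-associative. The text does not spell out the group-selection function, so the modulo hash in the sketch below is an assumption, chosen only because it is consistent with the two mappings stated above:

```python
UNITS_PER_GROUP = 4
NUM_GROUPS = 4  # 16 cache units / 4 units per group

def cache_group(block_index: int) -> range:
    """Map an internal-memory storage-block index to its candidate cache units.
    Assumption: group = block_index mod NUM_GROUPS, consistent with blocks
    0, 8, ..., 2040 -> units 0-3 and blocks 7, 15, ..., 2047 -> units 12-15."""
    group = block_index % NUM_GROUPS
    start = group * UNITS_PER_GROUP
    return range(start, start + UNITS_PER_GROUP)

assert list(cache_group(2040)) == [0, 1, 2, 3]
assert list(cache_group(2047)) == [12, 13, 14, 15]
```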
  • The NPU receives an instruction that includes a source data address, a target data address, and a convolution operation command, where the target data address is the destination address of the output data in the internal memory in this application.
  • the NPU may determine the first storage unit from the buffer memory and/or the internal memory according to the destination address and the storage mapping relationship.
  • The neural network processor 10 is further configured to determine, from the buffer memory, the first cache unit group corresponding to the output data according to the destination address of the output data in the internal memory and the storage mapping; if the first cache unit group includes a first idle cache unit, the first idle cache unit is determined as the first cache unit;
  • The neural network processor 10 is further configured to determine the first storage unit from the internal memory according to the destination address, if the first cache unit group does not include a first idle cache unit and no cache unit whose read count is less than the read count of the output data is found in the first cache unit group;
  • The neural network processor 10 is further configured to determine, as the first cache unit, a cache unit whose read count is less than the read count of the output data, if the first cache unit group does not include a first idle cache unit but such a cache unit is found in the first cache unit group;
  • The buffer memory 11 is further configured to delete, from the cache unit whose read count is less than the read count of the output data, the currently stored output data and the remaining read count corresponding to it.
  • In practice, the neural network processor writes the output data back to the cache memory or the internal memory. Specifically, the neural network processor first determines, from the buffer memory, the first cache unit group for the output data according to the destination address of the output data in the internal memory and the storage mapping, and judges whether a first idle cache unit exists in the first cache unit group; if so, the first idle cache unit is determined as the first cache unit, and the output data and its read count are cached directly into that idle cache unit of the buffer memory.
  • If the first cache unit group contains no idle cache unit, the read count of the output data is compared in turn with the read counts stored in the first cache unit group. If the group contains a cache unit whose read count is less than the read count of the output data, the output data is more important than the data cached in that unit; the unit is determined as the first cache unit, the output data currently stored in it and the corresponding remaining read count are deleted, and the output data and its read count are then cached into it.
  • If the first cache unit group contains no idle cache unit and no cache unit whose read count is less than the read count of the output data, the data currently cached in the group is more important than the output data. In that case, the first storage unit is determined directly from the internal memory according to the destination address, and the output data, and/or the output data together with its read count, is stored into that first storage unit in the internal memory.
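  • Pulling the three cases together, the write-back placement of output data can be sketched as follows (a simplified model of the policy just described; the helper names and the memory dictionary are assumptions):

```python
def place_output(group, output, reads, memory, dest):
    """Write NPU output into its cache group, or to internal memory as a last resort."""
    # Case 1: an idle cache unit exists in the group.
    for unit in group:
        if unit.is_idle():
            unit.data, unit.tag, unit.reads_left = output, dest, reads
            return "cached in idle unit"
    # Case 2: evict a unit whose remaining read count is smaller, i.e. whose
    # data is less important than this output.
    for unit in group:
        if unit.reads_left < reads:
            unit.data, unit.tag, unit.reads_left = output, dest, reads
            return "cached after eviction"
    # Case 3: everything cached is at least as important; bypass the cache.
    memory[dest] = (output, reads)
    return "stored in internal memory"
```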
  • The neural network processor 10 is further configured to update the output data, and/or the output data together with its read count, to the internal memory when caching the output data and its read count into the buffer memory;
  • The neural network processor 10 is further configured to set a to-be-synchronized flag for the output data when caching the output data and its read count into the buffer memory, and, when the buffer memory deletes the output data and its read count, to update the output data, and/or the output data together with its read count, to the internal memory according to the flag.
  • The data in the cache memory is effectively a backup of the data in the internal memory, so when the neural network processor caches the output data and its read count into the buffer memory, the output data, and/or the output data together with its read count, must also be synchronized to the internal memory to ensure data consistency between the cache memory and the internal memory.
  • There are two ways to synchronize. One is to update the output data, and/or the output data together with its read count, to the internal memory at the moment the output data and its read count are cached into the buffer memory. The other is to set a to-be-synchronized flag for the output data and, when the buffer memory deletes the output data and its read count, to update the output data, and/or the output data together with its read count, to the internal memory according to that flag.
  • The buffer memory may delete the output data and its read count in several scenarios: the read count of the output data has been decremented to zero; when the NPU caches new output data into the buffer memory, there is no idle cache unit in the corresponding cache unit group and the read count of the output data is less than that of the new output data; or, when the internal memory writes new stored data into the buffer memory, there is no idle cache unit in the corresponding cache unit group and the read count of the output data is the minimum in that group.
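  • The two synchronization options read like write-through versus flag-deferred write-back. A minimal sketch of the deferred variant (the to_sync flag and the deletion hook are illustrative assumptions):

```python
def cache_output_deferred(unit: CacheUnit, output: bytes, reads: int) -> None:
    """Cache output and mark it for later synchronization (write-back style)."""
    unit.data, unit.reads_left = output, reads
    unit.to_sync = True  # to-be-synchronized flag (assumed extra field)

def on_delete(unit: CacheUnit, memory: dict, dest: int) -> None:
    """When the buffer deletes the data, flush it to the internal memory."""
    if getattr(unit, "to_sync", False):
        memory[dest] = (unit.data, unit.reads_left)
    unit.data, unit.tag, unit.reads_left = None, None, 0
```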
  • The internal memory 12 is configured to determine, from the buffer memory, the second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping; if a second idle storage unit exists in the second cache unit group, the stored data is cached into the second idle storage unit; if no second idle storage unit exists, the second cache unit with the smallest read count is found in the second cache unit group and the stored data is cached into it;
  • The neural network processor 10 is further configured to determine the read count of the stored data; if a second idle storage unit exists in the second cache unit group, the read count of the stored data is cached into the second idle storage unit; if not, the read count of the stored data is cached into the second cache unit.
  • In practice, when the internal memory caches stored data into the cache memory, the neural network processor also determines the read count of the stored data. The second cache unit group corresponding to the stored data is determined, and it is judged whether a second idle storage unit exists in the group. If one exists, the stored data and its read count are cached into the second idle storage unit; if not, the second cache unit with the smallest read count is found in the group, and the stored data and its read count are cached into that unit.
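  • Note one difference from the output-data path: when no idle unit exists, the stored-data path always evicts the unit with the smallest read count rather than comparing it against the incoming count. A hedged sketch of this branch:

```python
def place_stored(group, data, reads, src):
    """Cache data fetched from internal memory into its cache unit group."""
    # Prefer an idle unit in the group.
    for unit in group:
        if unit.is_idle():
            unit.data, unit.tag, unit.reads_left = data, src, reads
            return unit
    # No idle unit: evict the unit with the smallest remaining read count.
    victim = min(group, key=lambda u: u.reads_left)
    victim.data, victim.tag, victim.reads_left = data, src, reads
    return victim
```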
  • If the internal memory includes storage data blocks and storage counters corresponding to the storage data blocks, the neural network processor 10 is further configured to obtain the read count of the stored data from the storage counter, where the read count in the storage counter is determined according to the number of algorithm layers that read the data in the storage data block, and/or according to a read count transmitted by the neural network processor;
  • If the internal memory includes only the storage data blocks, the neural network processor 10 is further configured to determine, from the buffer memory, the read count currently stored in the second cache unit and determine the read count of the stored data from it, and/or to determine the read count of the stored data according to the number of algorithm layers that read the stored data.
  • In practice, a storage counter can be set for each storage data block in the internal memory, with the read count of the stored data held in the corresponding storage counter; the neural network processor can then read the read count of the stored data directly from the storage counter.
  • Alternatively, the internal memory may contain only storage data blocks, in which case the read count of the stored data is not held in the internal memory. The neural network processor can determine the read count of the stored data according to the number of algorithm layers that read it, or according to the read count currently stored in the second cache unit; the read count of the stored data is greater than the count currently stored in the second cache unit, and the specific difference between the two can be obtained from the prior evaluation of the algorithm network.
  • It can be understood that, exploiting the fixed and predictable data flow of a neural network processor, the number of algorithm layers that will read the data in the buffer memory is known in advance from the network structure of the algorithm network, and cache counters in the buffer memory store that number. This ensures that data about to be processed is cached in the buffer memory, greatly reducing the number of times data is written from the internal memory to the buffer memory and thereby improving data-caching efficiency for the NPU.
  • The embodiment of the present application also proposes a data processing method, as shown in FIG. 5, applied to the above data processing device. The method includes:
  • S101: Obtain the network structure of the algorithm network to be executed, and determine from it the number of algorithm layers that read the stored data in the internal memory and/or the output data of each algorithm layer. The network structure of the algorithm network executed by the NPU can be analyzed to determine the data dependencies between the algorithm layers; FIG. 3 is a schematic diagram of the network structure of an algorithm network executed by the NPU.
  • Each circle is an algorithm layer.
  • An algorithm layer's workflow is to read the source data and perform operator processing, where the operator can be convolution, pooling, activation, fully connected, and so on; after operator processing completes, processing proceeds to the next algorithm layer.
  • The output data of algorithm layer 0 is read as input data by algorithm layers 1 and 2, the output data of algorithm layer 1 is read as input data by algorithm layers 3 and 4, and the output data of algorithm layer 2 is read as input data by algorithm layer 5.
  • S102: Determine the number of algorithm layers as the read count of the stored data and/or the read count of the output data; and add the stored data with its read count, and/or the output data with its read count, to the buffer memory.
  • In some embodiments, adding the output data and its read count to the buffer memory includes: determining a first storage unit from the buffer memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory; and caching the output data and its read count into the first storage unit.
  • In some embodiments, determining the first storage unit from the buffer memory includes: determining, from the buffer memory, the first cache unit group corresponding to the output data according to the destination address of the output data in the internal memory and the storage mapping; if the first cache unit group includes a first idle cache unit, determining the first idle cache unit as the first cache unit; if the first cache unit group does not include a first idle cache unit and a cache unit whose read count is less than the read count of the output data is found in the group, determining that cache unit as the first cache unit; if the first cache unit group does not include a first idle cache unit and no such cache unit is found, determining the first storage unit from the internal memory according to the destination address.
  • In some embodiments, adding the stored data and its read count to the buffer memory includes: determining the read count of the stored data; determining, from the buffer memory, the second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping; if a second idle storage unit exists in the second cache unit group, caching the stored data and its read count into the second idle storage unit; if not, finding the second cache unit with the smallest read count in the group and caching the stored data and its read count into that unit.
  • Each time an algorithm layer's read operation on the stored data and/or the output data is detected, the read count of the stored data and/or the output data in the buffer memory is decremented by one; when the read count reaches zero, the corresponding stored data and/or output data are deleted from the buffer memory.
  • After the output data and its read count are added to the buffer memory, a data synchronization process is also performed: the output data, and/or the output data together with its read count, is updated to the internal memory; or a to-be-synchronized flag is set for the output data, and when the buffer memory deletes the output data and its read count, the output data, and/or the output data together with its read count, is updated to the internal memory according to the flag.
  • In this way, exploiting the fixed and predictable data flow of a neural network processor, the number of algorithm layers that will read the data in the buffer memory is known in advance from the network structure of the algorithm network, and cache counters in the buffer memory store that number. This ensures that data about to be processed is cached in the buffer memory, greatly reducing the number of times data is written from the internal memory to the buffer memory and thereby improving data-caching efficiency for the NPU.
  • An embodiment of the present application provides a storage medium on which a computer program is stored.
  • the computer-readable storage medium stores one or more programs, and the one or more programs can be executed by one or more neural network processors.
  • The computer program, when executed, implements the above data processing method.
  • The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and includes several instructions for causing an image display device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A data processing method and apparatus, and a storage medium. The apparatus comprises: a neural network processor (10), a cache memory (11), and an internal memory (12). The cache memory (11) comprises cache units (110). Each cache unit (110) comprises a cache data block (1100) and a cache counter (1101); the cache data block (1100) is used for caching stored data in the internal memory (12) and/or output data generated by the neural network processor (10); and the cache counter (1101) is used for caching the number of reads corresponding to the stored data and/or the output data, the number of reads being the same as the number of algorithm layers for reading the stored data and/or the output data, the number of algorithm layers being determined according to a network structure of an algorithm network.

Description

Data processing method and device, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202210044147.5, filed on January 14, 2022, the entire content of which is hereby incorporated into this application by reference.
Technical Field
The present application relates to the field of artificial intelligence, and in particular to a data processing method and device, and a storage medium.
Background
In the architecture design of artificial intelligence processors, a structure separating computation and storage is often adopted, with hierarchical storage: a buffer memory is placed between the computing engine and the internal memory, holding some data for the computing engine's temporary data interactions. When data read by the computing engine misses in the buffer memory, new data must be transferred from the internal memory into the buffer memory so that the computing engine can read it from there.
At present, the data mapping between internal memory and buffer memory is designed for the CPU, reflecting the high flexibility and unpredictable data-access addresses of CPU execution. In an embedded neural-network processing unit (NPU) architecture, if a data caching mechanism is added by reusing the CPU's buffer-memory design, the result is low data-caching efficiency for the NPU.
Summary of the Invention
Embodiments of the present application provide a data processing method and device, and a storage medium, which can improve data-caching efficiency for an NPU.
The technical solution of the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a data processing device, including: a neural network processor, a buffer memory, and an internal memory, where the buffer memory includes cache units, and each cache unit includes a cache data block and a cache counter;
the cache data block is used to cache the stored data in the internal memory and/or the output data generated by the neural network processor;
the cache counter is used to cache the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
In a second aspect, an embodiment of the present application provides a data processing method applied to the above data processing device, the method including:
obtaining the network structure of the algorithm network to be executed, and determining from it the number of algorithm layers that read the stored data in the internal memory and/or the output data of each algorithm layer in the algorithm network;
determining the number of algorithm layers as the read count of the stored data and/or the read count of the output data; and adding the stored data with its read count, and/or the output data with its read count, to the buffer memory.
In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above data processing method.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of a data processing device provided in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an exemplary data processing device using separate computation and storage provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of the network structure of an exemplary algorithm network executed by an NPU provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of an exemplary storage mapping method between an internal memory and a buffer memory provided in an embodiment of the present application;
FIG. 5 is a flowchart of a data processing method provided by an embodiment of the present application.
Detailed Description
In a first aspect, an embodiment of the present application provides a data processing device, including: a neural network processor, a buffer memory, and an internal memory, where the buffer memory includes cache units, and each cache unit includes a cache data block and a cache counter;
the cache data block is used to cache the stored data in the internal memory and/or the output data generated by the neural network processor;
the cache counter is used to cache the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
Optionally, the buffer memory is further configured to decrement by one the read count in the cache counter corresponding to the cache data block each time a read operation on the stored data and/or the output data is detected; when the read count in the cache counter reaches zero, the stored data and/or the output data are deleted.
Optionally, the neural network processor is configured to determine the read count of the output data according to the number of algorithm layers that read the output data; determine a first storage unit from the buffer memory and/or the internal memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory; and cache the output data, and/or the output data together with its read count, into the first storage unit.
Optionally, the neural network processor is further configured to determine, from the buffer memory, the first cache unit group corresponding to the output data according to the destination address of the output data in the internal memory and the storage mapping; if the first cache unit group includes a first idle cache unit, the first idle cache unit is determined as the first cache unit;
the neural network processor is further configured to determine the first storage unit from the internal memory according to the destination address, if the first cache unit group does not include the first idle cache unit and no cache unit whose read count is less than the read count of the output data is found in the first cache unit group;
the neural network processor is further configured to determine, as the first cache unit, a cache unit whose read count is less than the read count of the output data, if the first cache unit group does not include the first idle cache unit but such a cache unit is found in the first cache unit group;
the buffer memory is further configured to delete, from the cache unit whose read count is less than the read count of the output data, the currently stored output data and the remaining read count corresponding to it.
Optionally, the neural network processor is further configured to update the output data, and/or the output data together with its read count, to the internal memory when caching the output data and its read count into the buffer memory.
Optionally, the neural network processor is further configured to set a to-be-synchronized flag for the output data when caching the output data and its read count into the buffer memory, and, when the buffer memory deletes the output data and its read count, to update the output data, and/or the output data together with its read count, to the internal memory according to the flag.
Optionally, the internal memory is configured to determine, from the buffer memory, the second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping; if a second idle storage unit exists in the second cache unit group, the stored data is cached into the second idle storage unit; if no second idle storage unit exists, the second cache unit with the smallest read count is found in the second cache unit group and the stored data is cached into it;
the neural network processor is further configured to determine the read count of the stored data; if a second idle storage unit exists in the second cache unit group, the read count of the stored data is cached into the second idle storage unit; if not, the read count of the stored data is cached into the second cache unit.
Optionally, if the internal memory includes storage data blocks and storage counters corresponding to the storage data blocks, the neural network processor is further configured to obtain the read count of the stored data from the storage counter, where the read count in the storage counter is determined according to the number of algorithm layers that read the data in the storage data block, and/or according to a read count transmitted by the neural network processor.
Optionally, if the internal memory includes only the storage data blocks, the neural network processor is further configured to determine, from the buffer memory, the read count currently stored in the second cache unit and determine the read count of the stored data from it, and/or to determine the read count of the stored data according to the number of algorithm layers that read the stored data.
In a second aspect, an embodiment of the present application provides a data processing method applied to the above data processing device, the method including:
obtaining the network structure of the algorithm network to be executed, and determining from it the number of algorithm layers that read the stored data in the internal memory and/or the output data of each algorithm layer in the algorithm network;
determining the number of algorithm layers as the read count of the stored data and/or the read count of the output data; and adding the stored data with its read count, and/or the output data with its read count, to the buffer memory.
Optionally, each time an algorithm layer's read operation on the stored data and/or the output data is detected, the read count of the stored data and/or the output data in the buffer memory is decremented by one;
when the read count of the stored data and/or the output data in the buffer memory reaches zero, the corresponding stored data and/or output data are deleted from the buffer memory.
Optionally, adding the output data and its read count to the buffer memory includes:
determining a first storage unit from the buffer memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory;
caching the output data and its read count into the first storage unit.
Optionally, determining the first storage unit from the buffer memory according to the destination address of the output data in the internal memory and the storage mapping between the internal memory and the buffer memory includes:
determining, from the buffer memory, the first cache unit group corresponding to the output data according to the destination address of the output data in the internal memory and the storage mapping.
Optionally, if the first cache unit group includes a first idle cache unit, the first idle cache unit is determined as the first cache unit.
Optionally, if the first cache unit group does not include the first idle cache unit, and a cache unit whose read count is less than the read count of the output data is found in the first cache unit group, that cache unit is determined as the first cache unit.
Optionally, if the first cache unit group does not include the first idle cache unit, and no cache unit whose read count is less than the read count of the output data is found in the first cache unit group, the first storage unit is determined from the internal memory according to the destination address.
Optionally, after the output data and its read count are added to the buffer memory, the method further includes:
updating the output data, and/or the output data together with its read count, to the internal memory.
Optionally, after the output data and its read count are added to the buffer memory, the method further includes:
setting a to-be-synchronized flag for the output data, and, when the buffer memory deletes the output data and its read count, updating the output data, and/or the output data together with its read count, to the internal memory according to the flag.
Optionally, adding the stored data and its read count to the buffer memory includes:
determining the read count of the stored data;
determining, from the buffer memory, the second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping;
if a second idle storage unit exists in the second cache unit group, caching the stored data and its read count into the second idle storage unit;
if no second idle storage unit exists in the second cache unit group, finding the second cache unit with the smallest read count in the second cache unit group and caching the read count of the stored data into it.
In a third aspect, an embodiment of the present application provides a storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above data processing method.
本申请实施例提供了一种数据处理方法及装置、存储介质,该装置包括:神经网络处理器、缓冲存储器和内存存储器;其中,缓冲存储器中包括缓存单元,每个缓存单元包括一个缓存数据块和一个缓存计数器;缓存数据块,用于缓存内存存储器中的存储数据和/或神经网络处理器生成的输出数据;缓存计数器,用于缓存存储数据和/或输出数据对应的读取次数,读取次数与根据算法网络的网络结构确定出的读取所述存储数据和/或所述 输出数据的算法层数量相同。采用上述装置实现方案,针对神经网络处理器的数据流固定、可预先判断的特点,预先根据算法网络的网络结构得知读取缓冲存储器中数据的算法层数量,在缓冲存储器中设置缓存计数器来存储该算法层数量,能够保证即将处理的数据缓存在缓冲存储器中,大大减少了数据从内存存储器写入缓冲存储器的次数,进而提高了针对NPU的数据缓存效率。An embodiment of the present application provides a data processing method and device, and a storage medium, the device including: a neural network processor, a buffer memory, and a memory memory; wherein, the buffer memory includes cache units, and each cache unit includes a cache data block and a cache counter; cache data blocks for caching stored data in memory storage and/or output data generated by the neural network processor; cache counters for caching stored data and/or output data corresponding to the number of reads, read The fetch times are the same as the number of algorithm layers for reading the stored data and/or the output data determined according to the network structure of the algorithm network. Using the above-mentioned device implementation scheme, aiming at the fixed and predictable data flow of the neural network processor, the number of algorithm layers for reading data in the buffer memory is known in advance according to the network structure of the algorithm network, and the buffer counter is set in the buffer memory. Storing the number of algorithm layers can ensure that the data to be processed is cached in the buffer memory, greatly reducing the number of times data is written from the memory memory to the buffer memory, thereby improving the data cache efficiency for the NPU.
To allow a more detailed understanding of the features and technical content of the embodiments of the present application, the implementation of the embodiments is described in detail below with reference to the accompanying drawings, which are provided for reference and illustration only and are not intended to limit the embodiments of the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field to which this application belongs. The terms used herein are intended only to describe the embodiments of the present application and are not intended to limit the present application.
In the following description, reference to "some embodiments" describes a subset of all possible embodiments; it should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and they may be combined with one another where no conflict arises. It should also be noted that the terms "first/second/third" in the embodiments of the present application are used only to distinguish similar objects and do not imply a particular ordering of those objects; where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein.
An embodiment of the present application provides a data processing apparatus 1. As shown in FIG. 1, the apparatus 1 includes: a neural network processor 10, a buffer memory 11, and a memory storage 12; the buffer memory 11 includes cache units 110, and each cache unit 110 includes a cache data block 1100 and a cache counter 1101.
The cache data block 1100 is used to cache stored data from the memory storage 12 and/or output data generated by the neural network processor 10.
The cache counter 1101 is used to cache the read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined from the network structure of the algorithm network, that read the stored data and/or the output data.
The data processing apparatus proposed in the embodiments of the present application is a data caching apparatus designed for the NPU architecture.
In the embodiments of the present application, the neural network processor is an NPU, the buffer memory is a cache memory, and the memory storage may be a memory such as a synchronous dynamic random-access memory (SDRAM) or a double data rate SDRAM (DDR).
In the embodiments of the present application, the data processing apparatus adopts a structure in which computation and storage are separated. As shown in FIG. 2, the data processing apparatus includes an NPU compute engine, a cache memory, and a memory storage. The NPU compute engine contains a large number of compute units. The cache memory is a buffer memory close to the NPU compute engine; it holds a certain amount of data for temporary data exchange with the NPU compute engine, and it reads and writes fast but has a small capacity. The memory storage is far from the NPU compute engine; it holds all the data and has a large capacity, but it reads and writes slowly and each access takes a long path, so its read/write efficiency is low.
It should be noted that, because the NPU has a fixed and predictable data flow, the data dependencies between the algorithm layers of the algorithm network can be determined, before the algorithm network is executed, by parsing the network structure of the algorithm network executed by the NPU. FIG. 3 is a schematic diagram of the network structure of an algorithm network executed by the NPU, in which each circle is an algorithm layer. The workflow of an algorithm layer is to read in source data and apply operator processing, where the operator may be a convolution, pooling, activation, fully connected layer, and so on; once the operator processing is complete, processing moves on to the next algorithm layer. For example, the output data of algorithm layer 0 is read as input data by algorithm layers 1 and 2, the output data of algorithm layer 1 is read as input data by algorithm layers 3 and 4, and the output data of algorithm layer 2 is read as input data by algorithm layers 5, 6, and 7. Therefore, by analyzing the network structure of the algorithm network, the number of times the output data of each algorithm layer will subsequently be read out can be obtained: the output data of algorithm layer 0 is read out 2 times, the output data of algorithm layer 1 is read out 2 times, and the output data of algorithm layer 2 is read out 3 times.
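To make this analysis concrete, note that the read count of each layer's output is simply the number of consumer layers in the layer dependency graph. The following Python sketch illustrates the computation under the assumption that the network structure is available as an adjacency list; the graph literal mirrors FIG. 3 as described above, and all names are illustrative rather than part of the embodiment:

    # Illustrative sketch: derive per-layer output read counts from the
    # algorithm network's layer dependency graph.
    # Keys are producer layers; values are the layers that read their output.
    layer_readers = {
        0: [1, 2],       # layer 0's output is read by layers 1 and 2
        1: [3, 4],       # layer 1's output is read by layers 3 and 4
        2: [5, 6, 7],    # layer 2's output is read by layers 5, 6 and 7
    }

    def output_read_counts(readers):
        # Read count of a layer's output = number of algorithm layers reading it.
        return {layer: len(consumers) for layer, consumers in readers.items()}

    print(output_read_counts(layer_readers))  # {0: 2, 1: 2, 2: 3}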
Based on the above idea, in the embodiments of the present application, in the cache memory between the NPU and the memory storage, a cache counter is provided for each cache data block; a cache data block and its corresponding cache counter together form a cache unit, and the cache counter holds the number of times the data in the corresponding cache data block will be read out.
It should be noted that the data stored in the cache memory may come from the stored data in the memory storage, or from the output data generated after the neural network processor applies operator processing to the stored data; the choice may be made according to the actual situation and is not specifically limited in the embodiments of the present application.
Optionally, the buffer memory 11 is further configured to decrement the read count in the cache counter corresponding to the cache data block by one each time a read operation on the stored data and/or the output data is detected, and to delete the stored data and/or the output data once the read count in the cache counter reaches zero.
In the embodiments of the present application, the operator layers in the NPU read stored data and/or output data from the buffer memory. Each time stored data and/or output data is read from the buffer memory, the buffer memory identifies the cache unit caching that data and decrements the read count in that unit's counter by one. Once the read count in the cache counter reaches zero, the data in that cache unit will not be read out again; the stored data and/or output data is then deleted from the buffer memory and the corresponding cache unit is cleared so that data can later be written into it.
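As a minimal Python sketch of this decrement-and-release behavior, consider the structure below; the CacheUnit class (a data block paired with a counter) is an assumption introduced purely for illustration, and it is reused by the later sketches:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class CacheUnit:
        data: Optional[bytes] = None   # cache data block (None = free unit)
        count: int = 0                 # cache counter: remaining read count

        @property
        def free(self) -> bool:
            return self.data is None

    def read_from_cache(unit: CacheUnit) -> bytes:
        # Return the cached data and decrement its counter; once the counter
        # reaches zero the data will never be read again, so the unit is cleared.
        value = unit.data
        unit.count -= 1
        if unit.count == 0:
            unit.data = None
        return value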
It should be noted that the read count also reflects the importance of the corresponding cache data block. The larger the read count, the more times the data stored in the cache data block will be read by subsequent algorithm layers, and hence the more important that data is; conversely, the smaller the read count, the fewer times the data will be read by subsequent algorithm layers, and hence the less important that data is.
Optionally, the neural network processor 10 is configured to: determine the read count of the output data according to the number of algorithm layers that read the output data; determine a first storage unit from the buffer memory and/or the memory storage according to the destination address of the output data in the memory storage and the storage mapping scheme between the memory storage and the buffer memory; and cache the read count of the output data and the output data, and/or the output data, in the first storage unit.
In the embodiments of the present application, the neural network processor may determine, according to the network structure of the algorithm network, the number of algorithm layers that read the output data, and then determine the read count of the output data according to that number of algorithm layers.
In the embodiments of the present application, the storage mapping scheme between the memory storage and the buffer memory may be determined according to hardware parameters and cache efficiency. As shown in FIG. 4, the stored data in each storage data block of the memory storage may be mapped to four cache data blocks in the cache memory. For example, the cache contains 16 cache data blocks numbered 0-15, each preceded by a cache counter cnt, together forming 16 cache units; storage data blocks 0, 8, ..., 2040 in the memory storage are mapped to cache data blocks 0-3, and so on, until storage data blocks 7, 15, ..., 2047 in the memory storage are mapped to cache data blocks 12-15, thereby implementing the storage mapping scheme between the memory storage and the buffer memory.
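One plausible realization of this grouping, consistent with FIG. 4 as described above, is sketched below. The constants (16 cache blocks in groups of 4, storage blocks repeating with stride 8) are taken from the figure; how the intermediate residues map to groups is an assumption, since the text only states the first and last groups explicitly:

    BLOCKS_PER_GROUP = 4                               # cache blocks per group
    NUM_CACHE_BLOCKS = 16
    NUM_GROUPS = NUM_CACHE_BLOCKS // BLOCKS_PER_GROUP  # 4 groups of 4
    STRIDE = 8                                         # storage blocks 0, 8, ... share a group

    def cache_group(storage_block: int) -> range:
        # Cache-block indices that a given storage data block may occupy.
        group = (storage_block % STRIDE) * NUM_GROUPS // STRIDE
        start = group * BLOCKS_PER_GROUP
        return range(start, start + BLOCKS_PER_GROUP)

    assert list(cache_group(0)) == [0, 1, 2, 3]        # blocks 0, 8, ..., 2040
    assert list(cache_group(2040)) == [0, 1, 2, 3]
    assert list(cache_group(7)) == [12, 13, 14, 15]    # blocks 7, 15, ..., 2047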
In the embodiments of the present application, the NPU receives an instruction containing a source data address, a target data address, and a convolution operation command; the target data address is the destination address of the output data in the memory storage referred to in this application. The NPU can determine the first storage unit from the buffer memory and/or the memory storage according to the destination address and the storage mapping relationship.
Specifically, the neural network processor 10 is further configured to determine, from the buffer memory, a first cache unit group corresponding to the output data according to the destination address of the output data in the memory storage and the storage mapping scheme; and, if the first cache unit group includes a first free cache unit, determine the first free cache unit as the first cache unit;
the neural network processor 10 is further configured to determine the first storage unit from the memory storage according to the destination address if the first cache unit group does not include the first free cache unit and no cache unit whose read count is smaller than the read count of the output data is found in the first cache unit group;
the neural network processor 10 is further configured to determine, as the first cache unit, a cache unit whose read count is smaller than the read count of the output data if the first cache unit group does not include the first free cache unit and such a cache unit is found in the first cache unit group;
the buffer memory 11 is further configured to delete, from the cache unit whose read count is smaller than the read count of the output data, the currently stored output data and the remaining read count corresponding to the currently stored output data.
In the embodiments of the present application, the neural network processor writes the output data back to the cache memory or to the memory storage. Specifically, the neural network processor first determines the first cache unit group from the buffer memory according to the destination address of the output data in the memory storage and the storage mapping scheme, and judges whether a first free cache unit exists in the first cache unit group; if one exists, it determines the first free cache unit as the first cache unit and caches the output data and the read count of the output data directly in that first free cache unit of the buffer memory.
In the embodiments of the present application, if no first free cache unit exists in the first cache unit group, the read count of the output data is compared in turn with the read counts stored in the first cache unit group. If the first cache unit group contains a cache unit whose read count is smaller than the read count of the output data, the output data is more important than the data cached in that cache unit; in that case, that cache unit is determined as the first cache unit, the currently stored output data and its remaining read count are deleted from it, and the output data and the read count of the output data are then cached in that cache unit of the buffer memory.
In the embodiments of the present application, if the first cache unit group contains no cache unit whose read count is smaller than the read count of the output data, none of the data cached in the first cache unit group is less important than the output data. In that case, the first storage unit is determined directly from the memory storage according to the destination address, and the output data, and/or the output data and the read count of the output data, is stored in that first storage unit in the memory storage.
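Taken together, the three cases give the write-back placement decision sketched below, reusing the illustrative CacheUnit structure from the earlier sketch; a False return stands for the bypass path in which the output goes straight to the memory storage at its destination address:

    def place_output(group: list, data: bytes, reads: int) -> bool:
        # Case 1: a free unit exists in the first cache unit group.
        for unit in group:
            if unit.free:
                unit.data, unit.count = data, reads
                return True
        # Case 2: evict a less important entry (smaller remaining read count).
        victim = min(group, key=lambda u: u.count)
        if victim.count < reads:
            victim.data, victim.count = data, reads
            return True
        # Case 3: nothing cached is less important; write to memory storage.
        return False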
Optionally, the neural network processor 10 is further configured to update the output data, and/or the output data and the read count of the output data, to the memory storage when caching the read count of the output data and the output data into the buffer memory;
or, the neural network processor 10 is further configured to set a to-be-synchronized flag for the output data when caching the read count of the output data and the output data into the buffer memory, and, when the buffer memory deletes the output data and the read count of the output data, update the output data, and/or the output data and the read count of the output data, to the memory storage according to the to-be-synchronized flag.
It should be noted that the data in the cache memory is effectively a copy of the data in the memory storage. Therefore, when the neural network processor caches the output data and its read count into the buffer memory, the output data, and/or the output data and its read count, also needs to be synchronized to the memory storage to ensure data consistency between the cache memory and the memory storage. There are two ways to synchronize: one is to update the output data, and/or the output data and its read count, to the memory storage at the moment the read count of the output data and the output data are cached into the buffer memory; the other is to set a to-be-synchronized flag for the output data and, when the buffer memory deletes the output data and its read count, update the output data, and/or the output data and its read count, to the memory storage according to the to-be-synchronized flag.
It should be noted that the buffer memory may delete the output data and its read count in several scenarios: the read count of the output data has decreased to zero; or, when the NPU caches new output data into the buffer memory, it determines that the cache unit group corresponding to the output data contains no free cache unit and that the read count of the output data is smaller than the read count of the new output data; or, when the memory storage writes new stored data into the buffer memory, it is determined that the cache unit group corresponding to the output data contains no free cache unit and that the read count of the output data is the smallest in the corresponding cache unit group.
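The deferred synchronization mode behaves like a conventional write-back policy with a dirty bit. Continuing the illustrative sketches above (the dict standing in for the memory storage is a hypothetical stand-in, not an actual interface of the embodiment):

    @dataclass
    class SyncedUnit(CacheUnit):
        dirty: bool = False        # the to-be-synchronized flag

    def evict_with_sync(unit: SyncedUnit, memory: dict, address: int) -> None:
        # On deletion, propagate the output data (and its read count) to the
        # memory storage if the to-be-synchronized flag is set, then free the unit.
        if unit.dirty:
            memory[address] = (unit.data, unit.count)
            unit.dirty = False
        unit.data, unit.count = None, 0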
Optionally, the memory storage 12 is configured to determine, from the buffer memory, a second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping scheme; if a second free storage unit exists in the second cache unit group, cache the stored data in the second free storage unit; and, if no second free storage unit exists in the second cache unit group, search the second cache unit group for the second cache unit with the smallest read count and cache the stored data in that second cache unit;
the neural network processor 10 is further configured to determine the read count of the stored data; if a second free storage unit exists in the second cache unit group, cache the read count of the stored data in the second free storage unit; and, if no second free storage unit exists in the second cache unit group, cache the read count of the stored data in the second cache unit.
In the embodiments of the present application, when the memory storage caches stored data into the cache memory, the neural network processor also determines the read count of the stored data. The memory storage first determines the second cache unit group corresponding to the stored data from the buffer memory according to the storage address of the stored data and the storage mapping relationship, and judges whether a second free storage unit exists in the second cache unit group. If one exists, the stored data and its read count are cached in the second free storage unit; if not, the second cache unit with the smallest read count is found in the second cache unit group, and the stored data and its read count are then cached in that second cache unit.
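A sketch of this fill path, again reusing the illustrative CacheUnit structure: unlike layer output (compare place_output above), stored data fetched from the memory storage always lands in the cache, replacing the entry with the smallest read count when no unit is free.

    def fill_from_memory(group: list, data: bytes, reads: int) -> None:
        # Prefer a free unit; otherwise replace the least important entry.
        target = next((u for u in group if u.free), None)
        if target is None:
            target = min(group, key=lambda u: u.count)
        target.data, target.count = data, reads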
Optionally, if the memory storage 12 includes storage data blocks and storage counters corresponding to the storage data blocks, the neural network processor 10 is further configured to obtain the read count of the stored data from the storage counter, the read count in the storage counter being determined according to the number of algorithm layers that read the data in the storage data block and/or according to a read count transmitted by the neural network processor;
if the memory storage 12 includes only the storage data blocks, the neural network processor 10 is further configured to determine, from the buffer memory, the read count currently stored in the second cache unit, and determine the read count of the stored data according to the read count currently stored in the second cache unit and/or according to the number of algorithm layers that read the stored data.
In an optional embodiment, a storage counter may be provided for each storage data block in the memory storage, with the read count of the stored data held in the corresponding storage counter; the neural network processor can then obtain the read count of the stored data directly from the storage counter.
In another optional embodiment, only storage data blocks may be provided in the memory storage; in that case, the memory storage does not store the read counts of the stored data. The neural network processor may determine the read count of the stored data according to the number of algorithm layers that read the stored data, or according to the read count currently stored in the second cache unit, where the read count of the stored data is greater than the read count currently stored in the second cache unit; the specific difference between the two can be obtained from a prior evaluation of the algorithm network.
It will be appreciated that, given that the data flow of a neural network processor is fixed and can be determined in advance, the number of algorithm layers that will read the data in the buffer memory is known beforehand from the network structure of the algorithm network, and a cache counter is provided in the buffer memory to store this number of algorithm layers. This ensures that the data about to be processed is cached in the buffer memory, greatly reduces the number of times data is written from the memory storage into the buffer memory, and thus improves data caching efficiency for the NPU.
Based on the above embodiments, an embodiment of the present application further proposes a data processing method, as shown in FIG. 5, applied to the above data processing apparatus. The method includes:
S101: obtaining the network structure of the algorithm network to be executed, and determining, according to the network structure, the number of algorithm layers that read the stored data in the memory storage and/or the output data of each algorithm layer in the algorithm network.
In the embodiments of the present application, because the NPU has a fixed and predictable data flow, the data dependencies between the algorithm layers of the algorithm network can be determined, before the algorithm network is executed, by parsing the network structure of the algorithm network executed by the NPU. FIG. 3 is a schematic diagram of the network structure of an algorithm network executed by the NPU, in which each circle is an algorithm layer. The workflow of an algorithm layer is to read in source data and apply operator processing, where the operator may be a convolution, pooling, activation, fully connected layer, and so on; once the operator processing is complete, processing moves on to the next algorithm layer. For example, the output data of algorithm layer 0 is read as input data by algorithm layers 1 and 2, the output data of algorithm layer 1 is read as input data by algorithm layers 3 and 4, and the output data of algorithm layer 2 is read as input data by algorithm layers 5, 6, and 7. Therefore, by analyzing the network structure of the algorithm network, the number of times the output data of each algorithm layer will subsequently be read out can be obtained: the output data of algorithm layer 0 is read out 2 times, the output data of algorithm layer 1 is read out 2 times, and the output data of algorithm layer 2 is read out 3 times.
S102: determining the number of algorithm layers as the read count of the stored data and/or the read count of the output data, and adding the stored data and the read count of the stored data, and/or the output data and the read count of the output data, to the buffer memory.
In the embodiments of the present application, adding the output data and the read count of the output data to the buffer memory includes: determining a first storage unit from the buffer memory according to the destination address of the output data in the memory storage and the storage mapping scheme between the memory storage and the buffer memory; and caching the read count of the output data and the output data in the first storage unit.
Specifically, determining the first storage unit from the buffer memory according to the destination address of the output data in the memory storage and the storage mapping scheme between the memory storage and the buffer memory includes: determining, from the buffer memory, a first cache unit group corresponding to the output data according to the destination address of the output data in the memory storage and the storage mapping scheme; if the first cache unit group includes a first free cache unit, determining the first free cache unit as the first cache unit; and, if the first cache unit group does not include the first free cache unit and a cache unit whose read count is smaller than the read count of the output data is found in the first cache unit group, determining that cache unit as the first cache unit.
Further, if the first cache unit group does not include the first free cache unit and no cache unit whose read count is smaller than the read count of the output data is found in the first cache unit group, the first storage unit is determined from the memory storage according to the destination address.
In the embodiments of the present application, adding the stored data and the read count of the stored data to the buffer memory includes: determining the read count of the stored data; determining, from the buffer memory, a second cache unit group corresponding to the stored data according to the storage address of the stored data and the storage mapping scheme; if a second free storage unit exists in the second cache unit group, caching the stored data and the read count of the stored data in the second free storage unit; and, if no second free storage unit exists in the second cache unit group, searching the second cache unit group for the second cache unit with the smallest read count and caching the stored data and the read count of the stored data in that second cache unit.
In the embodiments of the present application, each time a read operation by an algorithm layer on the stored data and/or the output data is detected, the read count of the stored data and/or the read count of the output data in the buffer memory is decremented by one; once the read count of the stored data and/or the read count of the output data in the buffer memory reaches zero, the corresponding stored data and/or output data is deleted from the buffer memory.
It should be noted that, after the output data and the read count of the output data are added to the buffer memory, a data synchronization process is also performed. Specifically, the output data, and/or the output data and the read count of the output data, is updated to the memory storage; or a to-be-synchronized flag is set for the output data and, when the buffer memory deletes the output data and the read count of the output data, the output data, and/or the output data and the read count of the output data, is updated to the memory storage according to the to-be-synchronized flag.
It will be appreciated that, given that the data flow of a neural network processor is fixed and can be determined in advance, the number of algorithm layers that will read the data in the buffer memory is known beforehand from the network structure of the algorithm network, and a cache counter is provided in the buffer memory to store this number of algorithm layers. This ensures that the data about to be processed is cached in the buffer memory, greatly reduces the number of times data is written from the memory storage into the buffer memory, and thus improves data caching efficiency for the NPU.
An embodiment of the present application provides a storage medium on which a computer program is stored. The computer-readable storage medium stores one or more programs, which can be executed by one or more neural network processors and applied in a data processing apparatus; when executed, the computer program implements the data processing method described above.
It should be noted that, as used herein, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part contributing to the related art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including several instructions to cause an image display device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the various embodiments of the present disclosure.
The above are only preferred embodiments of the present application and are not intended to limit the protection scope of the present application.

Claims (20)

  1. A data processing apparatus, comprising: a neural network processor, a buffer memory, and a memory storage; wherein the buffer memory includes cache units, each cache unit including a cache data block and a cache counter;
    the cache data block is configured to cache stored data in the memory storage and/or output data generated by the neural network processor;
    the cache counter is configured to cache a read count corresponding to the stored data and/or the output data, the read count being equal to the number of algorithm layers, determined according to a network structure of an algorithm network, that read the stored data and/or the output data.
  2. The apparatus according to claim 1, wherein
    the buffer memory is further configured to decrement the read count in the cache counter corresponding to the cache data block by one each time a read operation on the stored data and/or the output data is detected, and to delete the stored data and/or the output data once the read count in the cache counter reaches zero.
  3. The apparatus according to claim 1, wherein
    the neural network processor is configured to: determine the read count of the output data according to the number of algorithm layers that read the output data; determine a first storage unit from the buffer memory and/or the memory storage according to a destination address of the output data in the memory storage and a storage mapping scheme between the memory storage and the buffer memory; and cache the read count of the output data and the output data, and/or the output data, in the first storage unit.
  4. The apparatus according to claim 3, wherein
    the neural network processor is further configured to determine, from the buffer memory, a first cache unit group corresponding to the output data according to the destination address of the output data in the memory storage and the storage mapping scheme, and, if the first cache unit group includes a first free cache unit, determine the first free cache unit as the first cache unit;
    the neural network processor is further configured to determine the first storage unit from the memory storage according to the destination address if the first cache unit group does not include the first free cache unit and no cache unit whose read count is smaller than the read count of the output data is found in the first cache unit group;
    the neural network processor is further configured to determine, as the first cache unit, a cache unit whose read count is smaller than the read count of the output data if the first cache unit group does not include the first free cache unit and such a cache unit is found in the first cache unit group;
    the buffer memory is further configured to delete, from the cache unit whose read count is smaller than the read count of the output data, the currently stored output data and the remaining read count corresponding to the currently stored output data.
  5. The apparatus according to claim 3, wherein
    the neural network processor is further configured to update the output data, and/or the output data and the read count of the output data, to the memory storage when caching the read count of the output data and the output data into the buffer memory.
  6. The apparatus according to claim 3, wherein
    the neural network processor is further configured to set a to-be-synchronized flag for the output data when caching the read count of the output data and the output data into the buffer memory, and, when the buffer memory deletes the output data and the read count of the output data, update the output data, and/or the output data and the read count of the output data, to the memory storage according to the to-be-synchronized flag.
  7. The apparatus according to claim 2, wherein
    the memory storage is configured to: determine, from the buffer memory, a second cache unit group corresponding to the stored data according to a storage address of the stored data and the storage mapping scheme; if a second free storage unit exists in the second cache unit group, cache the stored data in the second free storage unit; and, if no second free storage unit exists in the second cache unit group, search the second cache unit group for a second cache unit with the smallest read count and cache the stored data in the second cache unit;
    the neural network processor is further configured to: determine the read count of the stored data; if a second free storage unit exists in the second cache unit group, cache the read count of the stored data in the second free storage unit; and, if no second free storage unit exists in the second cache unit group, cache the read count of the stored data in the second cache unit.
  8. The apparatus according to claim 7, wherein
    if the memory storage includes storage data blocks and storage counters corresponding to the storage data blocks, the neural network processor is further configured to obtain the read count of the stored data from the storage counter, the read count in the storage counter being determined according to the number of algorithm layers that read the data in the storage data block and/or according to a read count transmitted by the neural network processor.
  9. The apparatus according to claim 7, wherein
    if the memory storage includes only the storage data blocks, the neural network processor is further configured to determine, from the buffer memory, the read count currently stored in the second cache unit, and determine the read count of the stored data according to the read count currently stored in the second cache unit and/or according to the number of algorithm layers that read the stored data.
  10. A data processing method, applied to the data processing apparatus according to any one of claims 1-9, the method comprising:
    obtaining a network structure of an algorithm network to be executed, and determining, according to the network structure, the number of algorithm layers that read stored data in a memory storage and/or output data of each algorithm layer in the algorithm network;
    determining the number of algorithm layers as a read count of the stored data and/or a read count of the output data, and adding the stored data and the read count of the stored data, and/or the output data and the read count of the output data, to a buffer memory.
  11. The method according to claim 10, further comprising:
    each time a read operation by an algorithm layer on the stored data and/or the output data is detected, decrementing the read count of the stored data and/or the read count of the output data in the buffer memory by one;
    once the read count of the stored data and/or the read count of the output data in the buffer memory reaches zero, deleting the corresponding stored data and/or output data from the buffer memory.
  12. The method according to claim 10, wherein adding the output data and the read count of the output data to the buffer memory comprises:
    determining a first storage unit from the buffer memory according to a destination address of the output data in the memory storage and a storage mapping scheme between the memory storage and the buffer memory;
    caching the read count of the output data and the output data in the first storage unit.
  13. The method according to claim 12, wherein determining the first storage unit from the buffer memory according to the destination address of the output data in the memory storage and the storage mapping scheme between the memory storage and the buffer memory comprises:
    determining, from the buffer memory, a first cache unit group corresponding to the output data according to the destination address of the output data in the memory storage and the storage mapping scheme.
  14. The method according to claim 13, wherein
    if the first cache unit group includes a first free cache unit, the first free cache unit is determined as the first cache unit.
  15. The method according to claim 13, wherein
    if the first cache unit group does not include the first free cache unit and a cache unit whose read count is smaller than the read count of the output data is found in the first cache unit group, that cache unit is determined as the first cache unit.
  16. The method according to claim 13, further comprising:
    if the first cache unit group does not include the first free cache unit and no cache unit whose read count is smaller than the read count of the output data is found in the first cache unit group, determining the first storage unit from the memory storage according to the destination address.
  17. The method according to claim 10, wherein, after adding the output data and the read count of the output data to the buffer memory, the method further comprises:
    updating the output data, and/or the output data and the read count of the output data, to the memory storage.
  18. The method according to claim 10, wherein, after adding the output data and the read count of the output data to the buffer memory, the method further comprises:
    setting a to-be-synchronized flag for the output data, and, when the buffer memory deletes the output data and the read count of the output data, updating the output data, and/or the output data and the read count of the output data, to the memory storage according to the to-be-synchronized flag.
  19. The method according to claim 10, wherein adding the stored data and the read count of the stored data to the buffer memory comprises:
    determining the read count of the stored data;
    determining, from the buffer memory, a second cache unit group corresponding to the stored data according to a storage address of the stored data and the storage mapping scheme;
    if a second free storage unit exists in the second cache unit group, caching the stored data and the read count of the stored data in the second free storage unit;
    if no second free storage unit exists in the second cache unit group, searching the second cache unit group for a second cache unit with the smallest read count, and caching the stored data and the read count of the stored data in the second cache unit.
  20. A storage medium having a computer program stored thereon, wherein, when executed by a neural network processor, the computer program implements the method according to any one of claims 10-19.
PCT/CN2022/138424 2022-01-14 2022-12-12 Data processing method and apparatus, and storage medium WO2023134360A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210044147.5 2022-01-14
CN202210044147.5A CN114492776A (en) 2022-01-14 2022-01-14 Data processing method and device and storage medium

Publications (1)

Publication Number Publication Date
WO2023134360A1 true WO2023134360A1 (en) 2023-07-20

Family

ID=81512398

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/138424 WO2023134360A1 (en) 2022-01-14 2022-12-12 Data processing method and apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN114492776A (en)
WO (1) WO2023134360A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114492776A (en) * 2022-01-14 2022-05-13 哲库科技(上海)有限公司 Data processing method and device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101470691A (en) * 2004-11-19 2009-07-01 英特尔公司 Heterogeneous processors sharing a common cache
US20200104691A1 (en) * 2018-09-28 2020-04-02 Qualcomm Incorporated Neural processing unit (npu) direct memory access (ndma) memory bandwidth optimization
CN112712167A (en) * 2020-12-31 2021-04-27 北京清微智能科技有限公司 Memory access method and system supporting acceleration of multiple convolutional neural networks
CN112732591A (en) * 2021-01-15 2021-04-30 杭州中科先进技术研究院有限公司 Edge computing framework for cache deep learning
CN114492776A (en) * 2022-01-14 2022-05-13 哲库科技(上海)有限公司 Data processing method and device and storage medium


Also Published As

Publication number Publication date
CN114492776A (en) 2022-05-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22919993

Country of ref document: EP

Kind code of ref document: A1