CN114742214A - Caching method, system, device and storage medium of neural network - Google Patents

Caching method, system, device and storage medium of neural network

Info

Publication number
CN114742214A
Authority
CN
China
Prior art keywords
cache
data
processed
neural network
configuration information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210299126.8A
Other languages
Chinese (zh)
Inventor
王鉴
虞志益
邓慧鹏
叶华锋
肖山林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210299126.8A
Publication of CN114742214A
Priority to PCT/CN2023/082863 (WO2023179619A1)
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a caching method, system, device and storage medium for a neural network. The method comprises: obtaining configuration information containing the dimensionality of the neural network; setting the working mode of the cache according to the configuration information; acquiring target data to be processed through the configured cache; and processing the target data to be processed through the configured cache according to the configuration information. By configuring the cache, the cache can determine the target data to be processed and apply different data processing schemes to different data according to the configuration information, so that cache mapping for neural networks of different dimensionalities is realized, congestion in the cache is avoided, and writing and output are performed with high concurrency and high throughput. The invention can be widely applied to the technical field of neural network algorithms.

Description

Caching method, system, device and storage medium of neural network
Technical Field
The invention relates to the technical field of neural network algorithms, in particular to a neural network caching method, a neural network caching system, a neural network caching device and a storage medium.
Background
Neural networks of different dimensionalities and sizes differ. Networks of different dimensionalities require additional resources to be allocated, wasting computing resources; for networks of different sizes, a cache that cannot be flexibly configured fails to meet high-performance computing requirements and becomes a performance bottleneck. The differences in how convolutional neural networks are implemented and deployed on hardware platforms grow by the day, yet hardware designs lack the flexibility to support multiple network dimensionalities and sizes.
In summary, the problems of the related art need to be solved.
Disclosure of Invention
The present invention aims to solve at least to some extent one of the technical problems existing in the prior art.
To this end, it is an object of the embodiments of the present invention to provide a caching method, system, apparatus and storage medium for a neural network, which enable the cache to perform writing and output with high concurrency and high throughput.
In order to achieve the technical purpose, the technical scheme adopted by the embodiment of the invention comprises the following steps:
in one aspect, an embodiment of the present invention provides a cache method for a neural network, including the following steps:
acquiring configuration information of a cache, wherein the configuration information comprises dimension information of a neural network to be processed;
setting the working mode of the cache according to the dimension information;
acquiring target data to be processed through the set cache;
and processing the target data to be processed through the set cache according to the configuration information.
Further, the configuration information includes dimension information, calculation size information, and calculation step size information.
Further, the step of setting the working mode of the cache includes:
acquiring the dimension information from the configuration information;
and setting a cache mapping scheme of the cache according to the dimension information.
Further, the cache mapping scheme includes a one-dimensional cache mapping scheme, a two-dimensional cache mapping scheme, and a three-dimensional cache mapping scheme.
Further, acquiring the target data to be processed through the set cache specifically includes the following steps:
acquiring the target data to be processed from the data to be processed according to the configuration information;
and writing the target data to be processed into the cache.
Further, processing the target data to be processed through the set cache specifically includes the following steps:
determining a corresponding data multiplexing strategy according to the calculation size information and the calculation step size information;
and processing the target data to be processed according to the data multiplexing strategy.
Further, the data multiplexing strategy comprises a one-dimensional data multiplexing strategy, a two-dimensional data multiplexing strategy and a three-dimensional data multiplexing strategy.
In another aspect, an embodiment of the present invention provides a cache system of a neural network, including:
a first module for obtaining configuration information of a cache;
a second module, configured to set a working mode of the cache according to the configuration information;
the third module is used for acquiring target data to be processed through the set cache;
and the fourth module is used for processing the target data to be processed through the set cache according to the configuration information.
In another aspect, an embodiment of the present invention provides a cache apparatus for a neural network, including:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement the caching method for the neural network.
In another aspect, an embodiment of the present invention provides a storage medium in which processor-executable instructions are stored; when executed by a processor, the instructions implement the caching method of the neural network.
The invention discloses a caching method for a neural network, which has the following beneficial effects:
the embodiment obtains configuration information containing neural network dimensionality; setting the working mode of the cache according to the configuration information; acquiring target data to be processed through the set cache; and processing the target data to be processed through the set cache according to the configuration information. The method comprises the steps of configuring a high-speed buffer to ensure that the high-speed buffer determines target data to be processed, and processing different data to be processed by adopting different data processing schemes according to configuration information, so that cache mapping aiming at different dimensionality neural networks is realized, the high-speed buffer can avoid congestion, and high-concurrency and high-throughput writing and output are performed. Moreover, the corresponding unified fixed calculation array can realize high-efficiency mapping, thereby improving the calculation efficiency. Meanwhile, the redundant method of zero filling and storing of data in the cache can be effectively reduced by supporting the calculation sizes of different convolutional neural networks. And finally, different multiplexing strategies are selected according to data with different dimensions and different sizes, so that the cache access data volume is reduced, and the hardware cache resource overhead can be reduced.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings of the embodiments of the present invention or of the related prior art are described below. It should be understood that the drawings in the following description are only intended to describe some embodiments of the technical solutions of the present invention conveniently and clearly, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart illustrating a caching method for a neural network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a cache system of a neural network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a cache apparatus of a neural network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a direct mapping according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a fully associative mapping according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a set associative mapping according to an embodiment of the present invention;
fig. 7 is a schematic diagram of data multiplexing of three-dimensional data according to an embodiment of the present invention;
fig. 8 is a schematic diagram of data multiplexing of two-dimensional data according to an embodiment of the present invention;
fig. 9 is a schematic diagram of data multiplexing of one-dimensional data according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, preferred examples of which are illustrated in the accompanying drawings. The drawings are provided to visually supplement the description in the specification and are not intended to limit the scope of the invention.
In the description of the embodiments of the present invention, "several" means one or more, and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including the stated number. "At least one" means one or more, and "at least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. If "first", "second" and the like are used, they serve only to distinguish technical features and are not intended to indicate or imply relative importance, the number of the indicated technical features, or the precedence of the indicated technical features.
It should be noted that terms such as "set", "installed" and "connected" in the embodiments of the present invention should be understood in a broad sense, and a person skilled in the art can reasonably determine their specific meanings in combination with the specific content of the technical solutions. For example, "connected" may mean a mechanical or electrical connection or mutual communication, and may be a direct connection or an indirect connection through an intermediate medium.
In the description of embodiments of the present disclosure, reference to the terms "one embodiment/implementation", "another embodiment/implementation" or "certain embodiments/implementations" means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or implementation of the present disclosure. In this disclosure, schematic representations of the above terms do not necessarily refer to the same embodiment or implementation. Furthermore, the particular features, structures, materials or characteristics described may be combined in any suitable manner in one or more embodiments or implementations.
It should be noted that the technical features related to the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
A neural network is a mathematical model that imitates the behavioral characteristics of biological neural networks and performs distributed parallel information processing. Depending on the complexity of the system, the network processes information by adjusting the interconnections among a large number of internal nodes. Such a model is composed of a large number of directly interconnected nodes (or neurons); each node (except input nodes) represents a particular output function (or operation), called the activation function; each connection between two nodes carries a weight for the signal passing through it (i.e. the "memory value" of the network), called the weight; the output of the network varies with the activation functions and weights, and is an approximation of a certain function or an approximate description of a mapping relationship.
Neural networks of different dimensionalities and sizes differ. Networks of different dimensionalities require additional resources to be allocated, wasting computing resources; for networks of different sizes, a cache that cannot be flexibly configured fails to meet high-performance computing requirements and becomes a performance bottleneck. The differences in how convolutional neural networks are implemented and deployed on hardware platforms grow by the day, and hardware designs in the related art lack the flexibility to support multiple network dimensionalities and sizes, causing data congestion, preventing highly concurrent operation and lowering computation efficiency.
Therefore, the present application provides a caching method, system, device and storage medium for a neural network. The method comprises: obtaining configuration information containing the dimensionality of the neural network; setting the working mode of the cache according to the configuration information; acquiring target data to be processed through the configured cache; and processing the target data to be processed through the configured cache according to the configuration information. By configuring the cache, the cache can determine the target data to be processed and apply different data processing schemes to different data according to the configuration information, so that cache mapping for neural networks of different dimensionalities is realized, congestion in the cache is avoided, and writing and output are performed with high concurrency and high throughput. The present application can be widely applied to the technical field of neural network algorithms.
Fig. 1 is a flowchart of a caching method of a neural network according to an embodiment of the present application. Referring to fig. 1, the caching method of the neural network includes, but is not limited to, steps S110 to S140.
S110, obtaining configuration information of the cache, wherein the configuration information comprises dimension information of the neural network to be processed.
In this step, in order to set the working mode of the cache, configuration information of the cache needs to be obtained, where the configuration information includes dimension information of the neural network to be processed. It can be understood that the dimension information of the neural network includes one-dimensional, two-dimensional and three-dimensional dimension information, and different cache configurations are implemented for neural networks of different dimensionalities. The dimension information is obtained from the outside and is determined by the neural network to be processed. For example, if the neural network to be processed is a one-dimensional neural network, the dimension information contains one-dimensional dimension information; if it is a two-dimensional neural network, the dimension information contains two-dimensional dimension information.
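For illustration only, the following Python sketch shows how such configuration information might be represented and how the dimension field could select a cache mapping scheme; the class and field names (CacheConfig, compute_size, stride) are assumptions and not part of the patent text.

```python
# Illustrative sketch only: the field names and the selection logic are
# assumptions based on the description, not the patented implementation.
from dataclasses import dataclass

@dataclass
class CacheConfig:
    dimension: int        # 1, 2 or 3: dimensionality of the network to be processed
    compute_size: tuple   # calculation size information, e.g. (3, 3, 3)
    stride: int           # calculation step size information

def select_mapping_scheme(cfg: CacheConfig) -> str:
    """Pick a cache mapping scheme from the dimension information."""
    schemes = {1: "one-dimensional mapping",
               2: "two-dimensional mapping",
               3: "three-dimensional mapping"}
    if cfg.dimension not in schemes:
        raise ValueError(f"unsupported dimension: {cfg.dimension}")
    return schemes[cfg.dimension]

# Example: a two-dimensional network with a 3x3 kernel and stride 1.
print(select_mapping_scheme(CacheConfig(dimension=2, compute_size=(3, 3), stride=1)))
```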
A cache, in the original sense, refers to cache memory (Cache): random access memory (RAM) that is faster than ordinary RAM. It generally uses SRAM technology, which is more expensive but faster than the DRAM technology used for the system's main memory.
The cache memory is a first-level memory located between the main memory and the CPU. It is composed of static memory chips (SRAM) and has a relatively small capacity but a speed much higher than that of the main memory and close to that of the CPU. In the hierarchy of a computer memory system, it is a high-speed, small-capacity memory between the central processing unit and the main memory, and together with the main memory it constitutes primary storage. The scheduling and transfer of information between the cache and the main memory is handled automatically by hardware.
S120, setting the working mode of the cache according to the dimension information.
In this step, the working mode of the cache is set according to the acquired dimension information. Specifically, according to the dimension configuration information, corresponding cache mapping schemes are adopted for one-dimensional, two-dimensional and three-dimensional data. The cache structure contains 4 cache groups, each containing 4 slice caches. Convolutional neural network calculations of different dimensionalities (one-dimensional, two-dimensional, three-dimensional) require different cache mappings to achieve high-parallelism, high-throughput output.
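The 4-group by 4-slice organization described above can be pictured with a minimal Python model; the slice depth and the helper names below are assumptions made for illustration and do not reflect the actual hardware design.

```python
# Minimal model of the cache organization described above: 4 cache groups,
# each containing 4 slice caches. Slice depth and naming are assumptions.
NUM_GROUPS = 4
SLICES_PER_GROUP = 4
SLICE_DEPTH = 256  # assumed number of addressable entries per slice

# cache[group][slice] is a list of entries addressed by a linear pointer.
cache = [[[None] * SLICE_DEPTH for _ in range(SLICES_PER_GROUP)]
         for _ in range(NUM_GROUPS)]

def write(group: int, slice_idx: int, addr: int, value) -> None:
    cache[group][slice_idx][addr] = value

def read(group: int, slice_idx: int, addr: int):
    return cache[group][slice_idx][addr]

write(2, 1, 0, 3.14)
print(read(2, 1, 0))   # 3.14
```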
S130, acquiring target data to be processed through the set cache.
In this step, after the cache has been configured, it can determine the pattern in which the data to be calculated is read: the cache selects and reads the data addresses of different cache groups and cache slices to obtain data of the corresponding size for calculation. When reading out data, continuously stored data is read and output according to the configured output calculation size.
S140, processing the target data to be processed through the set cache according to the configuration information.
In this step, the cache selects a corresponding data read-out multiplexing strategy according to the calculation size and the calculation stride of the convolutional neural network. In the calculation process of a convolutional neural network, overlapping data is generated when the activation features are cut into blocks; that is, a given piece of data may belong to two different convolution operations at the same time. In the related art, such identical data is read and stored multiple times, wasting resources. By controlling the cache address pointer, the present application stores the overlapping data once and reads and uses it multiple times, thereby achieving data multiplexing and reducing the amount of stored data.
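A minimal sketch of the store-once, read-many idea, assuming a single cache slice modeled as a Python list; only the read pointer moves between overlapping compute windows.

```python
# Sketch of "store overlapping data once, read it several times by moving the
# address pointer". The buffer and window sizes are illustrative only.
def sliding_windows(buffer, window, stride):
    """Yield successive compute windows over one cache slice.

    Overlapping entries are never re-written; only the read pointer moves back
    by (window - stride) elements between consecutive windows.
    """
    ptr = 0
    while ptr + window <= len(buffer):
        yield buffer[ptr:ptr + window]   # read a window, reusing the overlap
        ptr += stride                    # the pointer advances by the stride only

data = list(range(8))                    # stored once in the cache slice
for w in sliding_windows(data, window=3, stride=1):
    print(w)                             # consecutive windows share 2 elements
```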
The present application provides a caching method, system, device and storage medium for a neural network, which obtain configuration information containing the neural network dimensionality; set the working mode of the cache according to the configuration information; acquire target data to be processed through the configured cache; and process the target data to be processed through the configured cache according to the configuration information. By configuring the cache, the cache can determine the target data to be processed and apply different data processing schemes to different data according to the configuration information, so that cache mapping for neural networks of different dimensionalities is realized, congestion in the cache is avoided, and writing and output are performed with high concurrency and high throughput. Moreover, the corresponding unified fixed calculation array can be mapped efficiently, improving calculation efficiency. Meanwhile, by supporting the calculation sizes of different convolutional neural networks, redundant zero-padding storage of data in the cache can be effectively reduced. Finally, different multiplexing strategies are selected for data of different dimensionalities and sizes, reducing the amount of data accessed in the cache and the hardware cache resource overhead.
Further as an optional implementation manner, the configuration information includes dimension information, calculation size information, and calculation step size information.
As a further optional embodiment, the step of setting the working mode of the cache includes:
acquiring the dimension information from the configuration information;
and setting a cache mapping scheme of the cache according to the dimension information.
Further as an optional implementation manner, the cache mapping scheme includes a one-dimensional cache mapping scheme, a two-dimensional cache mapping scheme, and a three-dimensional cache mapping scheme.
Specifically, there are three mapping modes for a cache memory: direct mapping, fully associative mapping and set associative mapping.
Direct mapping: the Cache organization of direct mapping is shown in FIG. 4. A block in the main memory can only be mapped to one specific block of the Cache. For example, block 0, block 16, ..., or block 2032 of the main memory can only be mapped to block 0 of the Cache, while blocks 1, 17, ..., 2033 of the main memory can only be mapped to block 1 of the Cache, and so on.
Direct mapping is the simplest address mapping mode: the hardware is simple, the cost is low, the address translation is fast, and no replacement algorithm is involved. However, this mode is not flexible enough; the storage space of the Cache cannot be fully utilized, each main-memory block has only one fixed location, and conflicts are easily produced, reducing Cache efficiency, so it is only suitable for large-capacity caches. For example, if a program needs to reference block 0 and block 16 of the main memory repeatedly, it would be best to copy both into the Cache at the same time; but since both can only be copied into block 0 of the Cache, they cannot occupy other Cache space even if it is empty, so the two blocks are loaded into the Cache alternately and the hit rate drops.
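A short sketch of the direct-mapping rule used in the example above (block number modulo the number of Cache lines); the line count of 16 is inferred from the blocks 0/16/2032 example.

```python
# Direct mapping as described above: main-memory block k can only be placed in
# Cache line (k mod number_of_lines). With 16 cache lines, blocks 0, 16, ...,
# 2032 all compete for line 0.
NUM_CACHE_LINES = 16

def direct_mapped_line(block: int) -> int:
    return block % NUM_CACHE_LINES

print(direct_mapped_line(0), direct_mapped_line(16), direct_mapped_line(2032))  # 0 0 0
print(direct_mapped_line(1), direct_mapped_line(17), direct_mapped_line(2033))  # 1 1 1
```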
Fully associative mapping: FIG. 5 shows the Cache organization of fully associative mapping, in which any block in the main memory can be mapped to any block position in the Cache.
The fully associative mapping mode is flexible: each block of the main memory can be mapped into any block of the Cache, the Cache utilization is high, the probability of block conflicts is low, and any block of the main memory can be loaded as long as some block of the Cache is evicted. However, because the Cache comparison circuit is difficult to design and implement, this mode is only suitable for small-capacity caches.
Set associative mapping is a compromise between direct mapping and fully associative mapping; its organization is shown in FIG. 6. Both the main memory and the Cache are divided into groups. The number of blocks in one group of the main memory equals the number of sets in the Cache; direct mapping is used between sets, and fully associative mapping is used within a set. That is, the Cache is divided into u sets of v blocks each: the set into which a main-memory block is stored is fixed, while the block it occupies within that set is flexible. For example, the main memory is divided into 256 groups of 8 blocks each, and the Cache is divided into 8 sets of 2 blocks each.
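A hedged sketch of the set-associative placement described above, using the u = 8, v = 2 example from the text; the replacement policy shown is a trivial placeholder, not the patented behavior, and fully associative mapping is simply the limiting case of a single set holding every line.

```python
# Set-associative mapping as described above: direct mapping between sets,
# fully associative placement inside a set (u = 8 sets of v = 2 ways).
U_SETS, V_WAYS = 8, 2
cache_sets = [[None] * V_WAYS for _ in range(U_SETS)]

def place_block(block: int) -> tuple:
    """Return (set index, way index) for a main-memory block.

    The set is fixed by block % U_SETS; the way inside the set is chosen
    freely (here: the first free or matching way, else way 0 as a naive
    replacement used purely for illustration).
    """
    s = block % U_SETS
    for way, tag in enumerate(cache_sets[s]):
        if tag is None or tag == block:
            cache_sets[s][way] = block
            return s, way
    cache_sets[s][0] = block
    return s, 0

print(place_block(3), place_block(11))   # both map to set 3, different ways
```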
Further as an optional implementation manner, acquiring the target data to be processed through the set cache specifically includes the following steps:
acquiring the target data to be processed from the data to be processed according to the configuration information;
and writing the target data to be processed into the cache.
In this step, unlike the configuration information, which is obtained from the outside and used to set the working state of the cache, the data to be processed is the information that the cache needs to process computationally. After the cache has been configured with the calculation size, it can determine the pattern for reading the data to be calculated: the cache selects and reads the data addresses of different cache groups and cache slices to obtain data of the corresponding size for calculation. When reading out data, continuously stored data is read and output according to the configured output calculation size.
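As an illustration, reading a configured output size could look like the following sketch, which simply returns a run of consecutive addresses from one slice of a toy 4 x 4 cache model; all names and sizes are assumptions.

```python
# Sketch: once configured with the output calculation size, the controller
# selects a cache group and slice and reads a run of consecutive addresses.
def read_window(cache, group: int, slice_idx: int, start_addr: int, out_size: int):
    """Return `out_size` consecutively stored entries from one cache slice."""
    return cache[group][slice_idx][start_addr:start_addr + out_size]

# Toy cache: 4 groups x 4 slices, each slice holding 8 entries.
toy_cache = [[list(range(s * 8, s * 8 + 8)) for s in range(4)] for _ in range(4)]
print(read_window(toy_cache, group=1, slice_idx=2, start_addr=2, out_size=3))  # [18, 19, 20]
```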
As a further optional implementation manner, processing the target data to be processed through the set cache specifically includes the following steps:
determining a corresponding data multiplexing strategy according to the calculation size information and the calculation step size information;
and processing the target data to be processed according to the data multiplexing strategy.
As a further optional implementation manner, the data multiplexing policy includes a one-dimensional data multiplexing policy, a two-dimensional data multiplexing policy, and a three-dimensional data multiplexing policy.
In the calculation process of a convolutional neural network, overlapping data is generated when the activation features are cut into blocks; that is, a given piece of data may belong to two different convolution operations at the same time. In the related art, such identical data is read and stored multiple times, wasting resources. By controlling the cache address pointer, the present application stores the overlapping data once and reads and uses it multiple times, thereby achieving data multiplexing and reducing the amount of stored data.
One-dimensional data multiplexing exists only within the same cache slice and corresponds to the one-dimensional row (W) dimension, i.e. column data updates in the convolution calculation. Two-dimensional data multiplexing exists not only within the same cache slice but also between different groups, and corresponds to the two-dimensional row-column (W x H) dimensions. When the activation data needs a line-feed update during two-dimensional convolution, the updated data is formed simply by resetting the pointers of the previously used cache groups to zero and reading data from a new cache group; the data of the first two cache groups is multiplexed and the number of caches is reduced. Three-dimensional data multiplexing exists simultaneously within the same slice, between different groups and between different slices, and corresponds to the three-dimensional row-column-frame (W x H x F) dimensions; multiplexing likewise occurs during activation column updates, row updates and frame updates. By controlling the cache slice address pointer and the group address pointer, pointer maintenance is performed on the data that needs to be multiplexed, realizing the retention of overlapping calculation data.
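The line-feed update described above might be sketched as follows; the three-row window, the group numbering and the class name RowWindow are illustrative assumptions rather than the patented control logic.

```python
# Sketch of a line-feed (row) update: a 3-row compute window keeps the two
# rows it already holds (their group read pointers are simply reset to zero)
# and only the new row is fetched from the next cache group.
class RowWindow:
    """Three-row compute window over consecutive cache groups (one row per group)."""

    def __init__(self):
        self.window = [0, 1, 2]                        # group indices currently held
        self.read_ptr = {g: 0 for g in self.window}

    def line_feed(self):
        """Advance the window by one row, reusing the two overlapping rows."""
        reused = self.window[1:]                       # rows kept from the previous window
        new_group = self.window[-1] + 1                # only this row is newly fetched
        self.window = reused + [new_group]
        self.read_ptr = {g: 0 for g in self.window}    # pointers reset to zero
        return reused, new_group

w = RowWindow()
print(w.line_feed())   # ([1, 2], 3): rows 1 and 2 are multiplexed, row 3 is new
```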
The cache structure contains 4 cache groups, each containing 4 slice caches. Convolutional neural network calculations of different dimensionalities (one-dimensional, two-dimensional, three-dimensional) require different cache mappings to achieve high-parallelism, high-throughput output.
Three-dimensional data has three dimensions, row, column and frame (H x W x F), in addition to the channel (C). For its cache mapping, data is stored into the 4 slice caches of one group in basic units of 4 frames (F), and data of the same row is mapped to consecutive addresses of the same slice. The next column of data is stored in another cache group, so that multi-frame data of multiple rows and columns can be output simultaneously for three-dimensional convolution calculation.
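A possible way to picture this placement rule is the address computation below; only the group and slice placement follows the text, while the in-slice address arithmetic is an assumption made for illustration.

```python
# Sketch of the placement rules above: 4 frames spread over the 4 slices of a
# group, the next column stored in the next group, and entries within a slice
# occupying consecutive addresses (the last point is assumed, not specified).
NUM_GROUPS, SLICES_PER_GROUP = 4, 4

def map_3d(row: int, col: int, frame: int, num_rows: int):
    group = col % NUM_GROUPS                   # next column goes to another group
    slice_idx = frame % SLICES_PER_GROUP       # 4 frames fill the 4 slices of a group
    addr = (frame // SLICES_PER_GROUP) * num_rows + row   # consecutive addresses in a slice
    return group, slice_idx, addr

# Neighbouring columns land in different groups and can be read out in parallel.
print(map_3d(row=0, col=0, frame=0, num_rows=8))   # (0, 0, 0)
print(map_3d(row=0, col=1, frame=0, num_rows=8))   # (1, 0, 0)
```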
Illustratively, Table 1 shows the address-pointer transformation and control of three-dimensional data during data multiplexing. Referring to fig. 7, if two frames of the calculated data overlap, the cache only needs to store the overlapping data of the two frames once; data multiplexing is then realized by reading the address pointer at the overlapping position twice, obtaining two data blocks with overlapping data for calculation instead of storing all the data of both blocks.
Example of frame data update when reading data of size 3 × 3 × 3:
TABLE 1 (provided as an image in the original publication; not reproduced here)
Two-dimensional data has two dimensions, row and column (H x W), in addition to the channel (C), and can be regarded as a special case of three-dimensional data with a frame dimension of 1. For its cache mapping, data is stored into the 4 slice caches of the same group in basic units of 4 channels (C), and data of the same row is mapped to consecutive addresses of the same slice. The next column of data is stored in another cache group, so that multi-channel data of multiple rows and columns can be output simultaneously for two-dimensional convolution calculation.
Illustratively, Table 2 shows the address-pointer transformation and control of two-dimensional data during data multiplexing. Referring to fig. 8, if two rows of the calculated data overlap, the cache only needs to store the two overlapping rows once; data multiplexing is then realized by reading the address pointer at the overlapping position twice, obtaining two data blocks with overlapping data for calculation instead of storing all the data of both blocks.
Example of row data update when reading data of size 3 × 3 × 3:
TABLE 2 (provided as an image in the original publication; not reproduced here)
One-dimensional data has only the row (W) dimension in addition to the channel (C). The channel data is divided, according to the arrangement direction, into vertical channels and horizontal channels; data is cached into the 4 slice caches of the same group in basic units of 4 horizontal channels, and data of the same row is mapped to consecutive addresses of the same slice. The next (vertical) channel of data is stored in another cache group. For a fixed, unified calculation circuit, these dimension-specific mapping methods effectively reduce idle calculation units.
Illustratively, Table 3 shows the address-pointer transformation and control of one-dimensional data during data multiplexing. Referring to fig. 9, if two columns of the calculated data overlap, the cache only needs to store the two overlapping columns once; data multiplexing is then realized by reading the address pointer at the overlapping position twice, obtaining two data blocks with overlapping data for calculation instead of storing all the data of both blocks.
Column data update example when reading 3 × 3 × 3 size data:
TABLE 3 (provided as an image in the original publication; not reproduced here)
Referring to fig. 2, a cache system of a neural network according to an embodiment of the present invention includes:
a first module 210, configured to obtain configuration information of a cache;
a second module 220, configured to set a working mode of the cache according to the configuration information;
a third module 230, configured to obtain target to-be-processed data through the set cache;
a fourth module 240, configured to process the target to-be-processed data through the set cache according to the configuration information.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
Referring to fig. 3, an embodiment of the present invention provides a cache apparatus of a neural network, including:
at least one processor 310;
at least one memory 320 for storing at least one program;
the at least one program, when executed by the at least one processor 310, causes the at least one processor 310 to implement the caching method of the neural network shown in fig. 1.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
Embodiments of the present invention also provide a storage medium having stored therein processor-executable instructions, which when executed by a processor, are used to implement the caching method of the neural network shown in fig. 1.
It can be understood that, compared with the prior art, the embodiment of the present invention also has the following advantages:
1) Cache mapping for neural networks of different dimensionalities is realized, so that congestion in the cache is avoided and writing and output are performed with high concurrency and high throughput.
2) The corresponding unified fixed calculation array can be mapped efficiently, improving calculation efficiency.
3) By supporting the calculation sizes of different convolutional neural networks, redundant zero-padding storage of data in the cache is effectively reduced.
4) Different multiplexing strategies are selected for data of different dimensionalities and sizes, reducing the amount of data accessed in the cache and the hardware cache resource overhead.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A caching method for a neural network, comprising the steps of:
acquiring configuration information of a cache, wherein the configuration information comprises dimension information of a neural network to be processed;
setting the working mode of the cache according to the dimension information;
acquiring target data to be processed through the set cache;
and processing the target data to be processed through the set cache according to the configuration information.
2. The caching method for a neural network of claim 1, wherein the configuration information comprises dimension information, calculation size information, and calculation step size information.
3. The method of claim 1, wherein the step of setting the operating mode of the cache comprises:
acquiring the dimension information from the configuration information;
and setting a cache mapping scheme of the cache according to the dimension information.
4. The neural network caching method of claim 3, wherein the cache mapping scheme comprises a one-dimensional cache mapping scheme, a two-dimensional cache mapping scheme, and a three-dimensional cache mapping scheme.
5. The caching method for a neural network according to claim 1, wherein acquiring the target data to be processed through the set cache specifically comprises the following steps:
acquiring the target data to be processed from the data to be processed according to the configuration information;
and writing the target data to be processed into the cache.
6. The caching method for a neural network according to claim 2, wherein processing the target data to be processed through the set cache specifically comprises the following steps:
determining a corresponding data multiplexing strategy according to the calculation size information and the calculation step size information;
and processing the target data to be processed according to the data multiplexing strategy.
7. The caching method for a neural network of claim 6, wherein the data multiplexing policy comprises a one-dimensional data multiplexing policy, a two-dimensional data multiplexing policy, and a three-dimensional data multiplexing policy.
8. A cache system for a neural network, comprising:
a first module for obtaining configuration information of a cache;
a second module, configured to set a working mode of the cache according to the configuration information;
the third module is used for acquiring target data to be processed through the set cache;
and the fourth module is used for processing the target data to be processed through the set cache according to the configuration information.
9. A cache apparatus of a neural network, comprising:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement the caching method for the neural network of any one of claims 1-7.
10. A computer readable storage medium having stored therein processor executable instructions, which when executed by a processor, are for implementing a caching method for a neural network as claimed in any one of claims 1 to 7.
CN202210299126.8A 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network Pending CN114742214A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210299126.8A CN114742214A (en) 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network
PCT/CN2023/082863 WO2023179619A1 (en) 2022-03-25 2023-03-21 Neural network caching method, system, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210299126.8A CN114742214A (en) 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network

Publications (1)

Publication Number Publication Date
CN114742214A (en) 2022-07-12

Family

ID=82276481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210299126.8A Pending CN114742214A (en) 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network

Country Status (2)

Country Link
CN (1) CN114742214A (en)
WO (1) WO2023179619A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179619A1 (en) * 2022-03-25 2023-09-28 中山大学 Neural network caching method, system, and device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11062203B2 (en) * 2016-12-30 2021-07-13 Intel Corporation Neuromorphic computer with reconfigurable memory mapping for various neural network topologies
CN111783933A (en) * 2019-04-04 2020-10-16 北京芯启科技有限公司 Hardware circuit design and method for data loading device combining main memory and accelerating deep convolution neural network calculation
CN112860596B (en) * 2021-02-07 2023-12-22 厦门壹普智慧科技有限公司 Data stream cache device of neural network tensor processor
CN112988621A (en) * 2021-03-12 2021-06-18 苏州芯启微电子科技有限公司 Data loading device and method for tensor data
CN114742214A (en) * 2022-03-25 2022-07-12 中山大学 Caching method, system, device and storage medium of neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179619A1 (en) * 2022-03-25 2023-09-28 中山大学 Neural network caching method, system, and device and storage medium

Also Published As

Publication number Publication date
WO2023179619A1 (en) 2023-09-28

Similar Documents

Publication Publication Date Title
JP5715644B2 (en) System and method for storing data in a high speed virtual memory system
US6816947B1 (en) System and method for memory arbitration
US8819359B2 (en) Hybrid interleaving in memory modules by interleaving physical addresses for a page across ranks in a memory module
US7944931B2 (en) Balanced bandwidth utilization
US7664922B2 (en) Data transfer arbitration apparatus and data transfer arbitration method
JP2005522773A (en) Non-uniform cache device, system and method
CN106126112A (en) Each cycle has multiple read port and a plurality of memorizer of multiple write port
CN102446159B (en) Method and device for managing data of multi-core processor
CN110705702A (en) Dynamic extensible convolutional neural network accelerator
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
US20230089754A1 (en) Neural network accelerator using only on-chip memory and method for implementing neural network accelerator using only on-chip memory
WO2023179619A1 (en) Neural network caching method, system, and device and storage medium
KR102623702B1 (en) Semiconductor device including a memory buffer
CN115168247B (en) Method for dynamically sharing memory space in parallel processor and corresponding processor
CN115203076B (en) Data structure optimized private memory caching
US6775742B2 (en) Memory device storing data and directory information thereon, and method for providing the directory information and the data in the memory device
US7406554B1 (en) Queue circuit and method for memory arbitration employing same
US20220374348A1 (en) Hardware Acceleration
CN113222115B (en) Convolutional neural network-oriented shared cache array
CN112925727B (en) Tensor cache and access structure and method thereof
JP5348157B2 (en) Information processing apparatus, memory access control apparatus and address generation method thereof
CN111078589B (en) Data reading system, method and chip applied to deep learning calculation
CN112712167A (en) Memory access method and system supporting acceleration of multiple convolutional neural networks
JP4117621B2 (en) Data batch transfer device
US20220276969A1 (en) Sedram-based stacked cache system and device and controlling method therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination