CN114742214A - Caching method, system, device and storage medium of neural network - Google Patents

Caching method, system, device and storage medium of neural network

Info

Publication number
CN114742214A
Authority
CN
China
Prior art keywords
cache
data
processed
neural network
configuration information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210299126.8A
Other languages
Chinese (zh)
Inventor
王鉴
虞志益
邓慧鹏
叶华锋
肖山林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202210299126.8A
Publication of CN114742214A
Priority to PCT/CN2023/082863 (WO2023179619A1)
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a caching method, system, device and storage medium for a neural network. The method comprises: obtaining configuration information containing the dimensionality of the neural network; setting the working mode of the cache according to the configuration information; acquiring target data to be processed through the configured cache; and processing the target data to be processed through the configured cache according to the configuration information. By configuring the cache, the cache can determine the target data to be processed and apply different data processing schemes to different data according to the configuration information, so that cache mapping for neural networks of different dimensionalities is realized, congestion in the cache is avoided, and writing and output are performed with high concurrency and high throughput. The invention can be widely applied to the technical field of neural network algorithms.

Description

Caching method, system, device and storage medium of neural network
Technical Field
The invention relates to the technical field of neural network algorithms, in particular to a neural network caching method, a neural network caching system, a neural network caching device and a storage medium.
Background
Neural networks of different dimensionalities and sizes differ. Networks of different dimensionalities require additional resources to be allocated, wasting computing resources; for networks of different sizes, a cache that cannot be flexibly configured fails to meet high-performance computing requirements and becomes a performance bottleneck. The differences in how convolutional neural networks are implemented and deployed on hardware platforms grow by the day, yet hardware designs lack the flexibility to support multiple network dimensionalities and sizes.
In summary, the problems of the related art need to be solved.
Disclosure of Invention
The present invention aims to solve at least to some extent one of the technical problems existing in the prior art.
To this end, it is an object of the embodiments of the present invention to provide a caching method, system, apparatus and storage medium for a neural network, which enable the cache to perform writing and output with high concurrency and high throughput.
In order to achieve the technical purpose, the technical scheme adopted by the embodiment of the invention comprises the following steps:
in one aspect, an embodiment of the present invention provides a cache method for a neural network, including the following steps:
acquiring configuration information of a cache, wherein the configuration information comprises dimension information of a neural network to be processed;
setting the working mode of the cache according to the dimension information;
acquiring target data to be processed through the set cache;
and processing the target data to be processed through the set cache according to the configuration information.
Further, the configuration information includes dimension information, calculation size information, and calculation step size information.
Further, the step of setting the working mode of the cache includes:
acquiring the dimension information from the configuration information;
and setting a cache mapping scheme of the cache according to the dimension information.
Further, the cache mapping scheme includes a one-dimensional cache mapping scheme, a two-dimensional cache mapping scheme, and a three-dimensional cache mapping scheme.
Further, acquiring the target data to be processed through the set cache specifically includes the following steps:
acquiring the target data to be processed from the data to be processed according to the configuration information;
and writing the target data to be processed into the cache.
Further, processing the target data to be processed through the set cache specifically includes the following steps:
determining a corresponding data multiplexing strategy according to the calculation size information and the calculation step size information;
and processing the target data to be processed according to the data multiplexing strategy.
Further, the data multiplexing strategy comprises a one-dimensional data multiplexing strategy, a two-dimensional data multiplexing strategy and a three-dimensional data multiplexing strategy.
In another aspect, an embodiment of the present invention provides a cache system of a neural network, including:
a first module for obtaining configuration information of a cache;
a second module, configured to set a working mode of the cache according to the configuration information;
the third module is used for acquiring target data to be processed through the set cache;
and the fourth module is used for processing the target data to be processed through the set cache according to the configuration information.
In another aspect, an embodiment of the present invention provides a cache apparatus for a neural network, including:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement the caching method for the neural network.
In another aspect, an embodiment of the present invention provides a storage medium in which processor-executable instructions are stored; when executed by a processor, the instructions implement the caching method of the neural network.
The invention discloses a caching method for a neural network, which has the following beneficial effects:
the embodiment obtains configuration information containing neural network dimensionality; setting the working mode of the cache according to the configuration information; acquiring target data to be processed through the set cache; and processing the target data to be processed through the set cache according to the configuration information. The method comprises the steps of configuring a high-speed buffer to ensure that the high-speed buffer determines target data to be processed, and processing different data to be processed by adopting different data processing schemes according to configuration information, so that cache mapping aiming at different dimensionality neural networks is realized, the high-speed buffer can avoid congestion, and high-concurrency and high-throughput writing and output are performed. Moreover, the corresponding unified fixed calculation array can realize high-efficiency mapping, thereby improving the calculation efficiency. Meanwhile, the redundant method of zero filling and storing of data in the cache can be effectively reduced by supporting the calculation sizes of different convolutional neural networks. And finally, different multiplexing strategies are selected according to data with different dimensions and different sizes, so that the cache access data volume is reduced, and the hardware cache resource overhead can be reduced.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings of the embodiments of the present invention or of the related prior art are described below. It should be understood that the drawings in the following description are only intended to describe some embodiments of the technical solutions of the present invention conveniently and clearly, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart illustrating a caching method for a neural network according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a cache system of a neural network according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a cache apparatus of a neural network according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a direct mapping according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a fully associative mapping according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a set associative mapping according to an embodiment of the present invention;
fig. 7 is a schematic diagram of data multiplexing of three-dimensional data according to an embodiment of the present invention;
fig. 8 is a schematic diagram of data multiplexing of two-dimensional data according to an embodiment of the present invention;
fig. 9 is a schematic diagram of data multiplexing of one-dimensional data according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the embodiments of the present invention, preferred examples of which are illustrated in the accompanying drawings. The drawings are provided to visually supplement the description in the specification and are not intended to limit the scope of the invention.
In the description of the embodiments of the present invention, "several" means one or more, and "a plurality of" means two or more; "greater than", "less than", "exceeding" and the like are understood as excluding the stated number, while "above", "below", "within" and the like are understood as including the stated number. "At least one" means one or more, and "at least one of the following" and similar expressions refer to any combination of the listed items, including any combination of single items or plural items. If "first", "second" and the like are used, they serve only to distinguish technical features and are not intended to indicate or imply relative importance, the number of the indicated technical features, or the precedence of the indicated technical features.
It should be noted that terms such as "set", "installed" and "connected" in the embodiments of the present invention should be understood in a broad sense, and a person skilled in the art can reasonably determine their specific meanings in combination with the specific content of the technical solutions. For example, "connected" may mean a mechanical or electrical connection or mutual communication, and may be a direct connection or an indirect connection through an intermediate medium.
In the description of embodiments of the present disclosure, reference to the terms "one embodiment/implementation", "another embodiment/implementation" or "certain embodiments/implementations" means that a particular feature, structure, material or characteristic described in connection with the embodiment or example is included in at least one embodiment or implementation of the present disclosure. In this disclosure, schematic representations of the above terms do not necessarily refer to the same embodiment or implementation. Furthermore, the particular features, structures, materials or characteristics described may be combined in any suitable manner in one or more embodiments or implementations.
It should be noted that the technical features related to the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
A neural network is a mathematical model that imitates the behavioral characteristics of biological neural networks and performs distributed parallel information processing. Depending on the complexity of the system, the network processes information by adjusting the interconnections among a large number of internal nodes. Such a model is composed of a large number of directly interconnected nodes (or neurons); each node (except input nodes) represents a particular output function (or operation), called the activation function; each connection between two nodes carries a weight for the signal passing through it (i.e. the "memory value" of the network), called the weight; the output of the network varies with the activation functions and weights, and is an approximation of a certain function or an approximate description of a mapping relationship.
Neural networks of different dimensionalities and sizes differ. Networks of different dimensionalities require additional resources to be allocated, wasting computing resources; for networks of different sizes, a cache that cannot be flexibly configured fails to meet high-performance computing requirements and becomes a performance bottleneck. The differences in how convolutional neural networks are implemented and deployed on hardware platforms grow by the day, and hardware designs in the related art lack the flexibility to support multiple network dimensionalities and sizes, causing data congestion, preventing highly concurrent operation and lowering computation efficiency.
Therefore, the present application provides a caching method, system, device and storage medium for a neural network. The method comprises: obtaining configuration information containing the dimensionality of the neural network; setting the working mode of the cache according to the configuration information; acquiring target data to be processed through the configured cache; and processing the target data to be processed through the configured cache according to the configuration information. By configuring the cache, the cache can determine the target data to be processed and apply different data processing schemes to different data according to the configuration information, so that cache mapping for neural networks of different dimensionalities is realized, congestion in the cache is avoided, and writing and output are performed with high concurrency and high throughput. The present application can be widely applied to the technical field of neural network algorithms.
Fig. 1 is a flowchart of a caching method of a neural network according to an embodiment of the present application. Referring to fig. 1, the caching method of the neural network includes, but is not limited to, steps S110 to S140.
S110, obtaining configuration information of the cache, wherein the configuration information comprises dimension information of the neural network to be processed.
In this step, in order to set the working mode of the cache, configuration information of the cache needs to be obtained, where the configuration information includes dimension information of the neural network to be processed. It can be understood that the dimension information of the neural network includes one-dimensional, two-dimensional and three-dimensional dimension information, and different cache configurations are implemented for neural networks of different dimensionalities. The dimension information is obtained from the outside and is determined by the neural network to be processed. For example, if the neural network to be processed is a one-dimensional neural network, the dimension information contains one-dimensional dimension information; if it is a two-dimensional neural network, the dimension information contains two-dimensional dimension information.
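For illustration only, the following Python sketch shows how such configuration information might be represented and how the dimension field could select a cache mapping scheme; the class and field names (CacheConfig, compute_size, stride) are assumptions and not part of the patent text.

```python
# Illustrative sketch only: the field names and the selection logic are
# assumptions based on the description, not the patented implementation.
from dataclasses import dataclass

@dataclass
class CacheConfig:
    dimension: int        # 1, 2 or 3: dimensionality of the network to be processed
    compute_size: tuple   # calculation size information, e.g. (3, 3, 3)
    stride: int           # calculation step size information

def select_mapping_scheme(cfg: CacheConfig) -> str:
    """Pick a cache mapping scheme from the dimension information."""
    schemes = {1: "one-dimensional mapping",
               2: "two-dimensional mapping",
               3: "three-dimensional mapping"}
    if cfg.dimension not in schemes:
        raise ValueError(f"unsupported dimension: {cfg.dimension}")
    return schemes[cfg.dimension]

# Example: a two-dimensional network with a 3x3 kernel and stride 1.
print(select_mapping_scheme(CacheConfig(dimension=2, compute_size=(3, 3), stride=1)))
```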
A cache, in the original sense, refers to cache memory (Cache): random access memory (RAM) that is faster than ordinary RAM. It generally uses SRAM technology, which is more expensive but faster than the DRAM technology used for the system's main memory.
The cache memory is a first-level memory located between the main memory and the CPU. It is composed of static memory chips (SRAM) and has a relatively small capacity but a speed much higher than that of the main memory and close to that of the CPU. In the hierarchy of a computer memory system, it is a high-speed, small-capacity memory between the central processing unit and the main memory, and together with the main memory it constitutes primary storage. The scheduling and transfer of information between the cache and the main memory is handled automatically by hardware.
S120, setting the working mode of the cache according to the dimension information.
In this step, the working mode of the cache is set according to the acquired dimension information. Specifically, according to the dimension configuration information, corresponding cache mapping schemes are adopted for one-dimensional, two-dimensional and three-dimensional data. The cache structure contains 4 cache groups, each containing 4 slice caches. Convolutional neural network calculations of different dimensionalities (one-dimensional, two-dimensional, three-dimensional) require different cache mappings to achieve high-parallelism, high-throughput output.
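The 4-group by 4-slice organization described above can be pictured with a minimal Python model; the slice depth and the helper names below are assumptions made for illustration and do not reflect the actual hardware design.

```python
# Minimal model of the cache organization described above: 4 cache groups,
# each containing 4 slice caches. Slice depth and naming are assumptions.
NUM_GROUPS = 4
SLICES_PER_GROUP = 4
SLICE_DEPTH = 256  # assumed number of addressable entries per slice

# cache[group][slice] is a list of entries addressed by a linear pointer.
cache = [[[None] * SLICE_DEPTH for _ in range(SLICES_PER_GROUP)]
         for _ in range(NUM_GROUPS)]

def write(group: int, slice_idx: int, addr: int, value) -> None:
    cache[group][slice_idx][addr] = value

def read(group: int, slice_idx: int, addr: int):
    return cache[group][slice_idx][addr]

write(2, 1, 0, 3.14)
print(read(2, 1, 0))   # 3.14
```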
S130, acquiring target data to be processed through the set cache.
In this step, after the cache has been configured, it can determine the pattern in which the data to be calculated is read: the cache selects and reads the data addresses of different cache groups and cache slices to obtain data of the corresponding size for calculation. When reading out data, continuously stored data is read and output according to the configured output calculation size.
S140, processing the target data to be processed through the set cache according to the configuration information.
In this step, the cache selects a corresponding data read-out multiplexing strategy according to the calculation size and the calculation stride of the convolutional neural network. In the calculation process of a convolutional neural network, overlapping data is generated when the activation features are cut into blocks; that is, a given piece of data may belong to two different convolution operations at the same time. In the related art, such identical data is read and stored multiple times, wasting resources. By controlling the cache address pointer, the present application stores the overlapping data once and reads and uses it multiple times, thereby achieving data multiplexing and reducing the amount of stored data.
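A minimal sketch of the store-once, read-many idea, assuming a single cache slice modeled as a Python list; only the read pointer moves between overlapping compute windows.

```python
# Sketch of "store overlapping data once, read it several times by moving the
# address pointer". The buffer and window sizes are illustrative only.
def sliding_windows(buffer, window, stride):
    """Yield successive compute windows over one cache slice.

    Overlapping entries are never re-written; only the read pointer moves back
    by (window - stride) elements between consecutive windows.
    """
    ptr = 0
    while ptr + window <= len(buffer):
        yield buffer[ptr:ptr + window]   # read a window, reusing the overlap
        ptr += stride                    # the pointer advances by the stride only

data = list(range(8))                    # stored once in the cache slice
for w in sliding_windows(data, window=3, stride=1):
    print(w)                             # consecutive windows share 2 elements
```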
The present application provides a caching method, system, device and storage medium for a neural network, which obtain configuration information containing the neural network dimensionality; set the working mode of the cache according to the configuration information; acquire target data to be processed through the configured cache; and process the target data to be processed through the configured cache according to the configuration information. By configuring the cache, the cache can determine the target data to be processed and apply different data processing schemes to different data according to the configuration information, so that cache mapping for neural networks of different dimensionalities is realized, congestion in the cache is avoided, and writing and output are performed with high concurrency and high throughput. Moreover, the corresponding unified fixed calculation array can be mapped efficiently, improving calculation efficiency. Meanwhile, by supporting the calculation sizes of different convolutional neural networks, redundant zero-padding storage of data in the cache can be effectively reduced. Finally, different multiplexing strategies are selected for data of different dimensionalities and sizes, reducing the amount of data accessed in the cache and the hardware cache resource overhead.
Further as an optional implementation manner, the configuration information includes dimension information, calculation size information, and calculation step size information.
As a further optional embodiment, the step of setting the working mode of the cache includes:
acquiring the dimension information from the configuration information;
and setting a cache mapping scheme of the cache according to the dimension information.
Further as an optional implementation manner, the cache mapping scheme includes a one-dimensional cache mapping scheme, a two-dimensional cache mapping scheme, and a three-dimensional cache mapping scheme.
Specifically, there are three mapping modes for a cache memory: direct mapping, fully associative mapping and set associative mapping.
Direct mapping: the Cache organization of direct mapping is shown in FIG. 4. A block in the main memory can only be mapped to one specific block of the Cache. For example, block 0, block 16, ..., or block 2032 of the main memory can only be mapped to block 0 of the Cache, while blocks 1, 17, ..., 2033 of the main memory can only be mapped to block 1 of the Cache, and so on.
Direct mapping is the simplest address mapping mode: the hardware is simple, the cost is low, the address translation is fast, and no replacement algorithm is involved. However, this mode is not flexible enough; the storage space of the Cache cannot be fully utilized, each main-memory block has only one fixed location, and conflicts are easily produced, reducing Cache efficiency, so it is only suitable for large-capacity caches. For example, if a program needs to reference block 0 and block 16 of the main memory repeatedly, it would be best to copy both into the Cache at the same time; but since both can only be copied into block 0 of the Cache, they cannot occupy other Cache space even if it is empty, so the two blocks are loaded into the Cache alternately and the hit rate drops.
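A short sketch of the direct-mapping rule used in the example above (block number modulo the number of Cache lines); the line count of 16 is inferred from the blocks 0/16/2032 example.

```python
# Direct mapping as described above: main-memory block k can only be placed in
# Cache line (k mod number_of_lines). With 16 cache lines, blocks 0, 16, ...,
# 2032 all compete for line 0.
NUM_CACHE_LINES = 16

def direct_mapped_line(block: int) -> int:
    return block % NUM_CACHE_LINES

print(direct_mapped_line(0), direct_mapped_line(16), direct_mapped_line(2032))  # 0 0 0
print(direct_mapped_line(1), direct_mapped_line(17), direct_mapped_line(2033))  # 1 1 1
```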
Fully associative mapping: FIG. 5 shows the Cache organization of fully associative mapping, in which any block in the main memory can be mapped to any block position in the Cache.
The fully associative mapping mode is flexible: each block of the main memory can be mapped into any block of the Cache, the Cache utilization is high, the probability of block conflicts is low, and any block of the main memory can be loaded as long as some block of the Cache is evicted. However, because the Cache comparison circuit is difficult to design and implement, this mode is only suitable for small-capacity caches.
Set associative mapping is a compromise between direct mapping and fully associative mapping; its organization is shown in FIG. 6. Both the main memory and the Cache are divided into groups. The number of blocks in one group of the main memory equals the number of sets in the Cache; direct mapping is used between sets, and fully associative mapping is used within a set. That is, the Cache is divided into u sets of v blocks each: the set into which a main-memory block is stored is fixed, while the block it occupies within that set is flexible. For example, the main memory is divided into 256 groups of 8 blocks each, and the Cache is divided into 8 sets of 2 blocks each.
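A hedged sketch of the set-associative placement described above, using the u = 8, v = 2 example from the text; the replacement policy shown is a trivial placeholder, not the patented behavior, and fully associative mapping is simply the limiting case of a single set holding every line.

```python
# Set-associative mapping as described above: direct mapping between sets,
# fully associative placement inside a set (u = 8 sets of v = 2 ways).
U_SETS, V_WAYS = 8, 2
cache_sets = [[None] * V_WAYS for _ in range(U_SETS)]

def place_block(block: int) -> tuple:
    """Return (set index, way index) for a main-memory block.

    The set is fixed by block % U_SETS; the way inside the set is chosen
    freely (here: the first free or matching way, else way 0 as a naive
    replacement used purely for illustration).
    """
    s = block % U_SETS
    for way, tag in enumerate(cache_sets[s]):
        if tag is None or tag == block:
            cache_sets[s][way] = block
            return s, way
    cache_sets[s][0] = block
    return s, 0

print(place_block(3), place_block(11))   # both map to set 3, different ways
```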
Further as an optional implementation manner, acquiring the target data to be processed through the set cache specifically includes the following steps:
acquiring the target data to be processed from the data to be processed according to the configuration information;
and writing the target data to be processed into the cache.
In this step, unlike the configuration information, which is obtained from the outside and used to set the working state of the cache, the data to be processed is the information that the cache needs to process computationally. After the cache has been configured with the calculation size, it can determine the pattern for reading the data to be calculated: the cache selects and reads the data addresses of different cache groups and cache slices to obtain data of the corresponding size for calculation. When reading out data, continuously stored data is read and output according to the configured output calculation size.
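As an illustration, reading a configured output size could look like the following sketch, which simply returns a run of consecutive addresses from one slice of a toy 4 x 4 cache model; all names and sizes are assumptions.

```python
# Sketch: once configured with the output calculation size, the controller
# selects a cache group and slice and reads a run of consecutive addresses.
def read_window(cache, group: int, slice_idx: int, start_addr: int, out_size: int):
    """Return `out_size` consecutively stored entries from one cache slice."""
    return cache[group][slice_idx][start_addr:start_addr + out_size]

# Toy cache: 4 groups x 4 slices, each slice holding 8 entries.
toy_cache = [[list(range(s * 8, s * 8 + 8)) for s in range(4)] for _ in range(4)]
print(read_window(toy_cache, group=1, slice_idx=2, start_addr=2, out_size=3))  # [18, 19, 20]
```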
As a further optional implementation manner, processing the target data to be processed through the set cache specifically includes the following steps:
determining a corresponding data multiplexing strategy according to the calculation size information and the calculation step size information;
and processing the target data to be processed according to the data multiplexing strategy.
As a further optional implementation manner, the data multiplexing policy includes a one-dimensional data multiplexing policy, a two-dimensional data multiplexing policy, and a three-dimensional data multiplexing policy.
In the calculation process of a convolutional neural network, overlapping data is generated when the activation features are cut into blocks; that is, a given piece of data may belong to two different convolution operations at the same time. In the related art, such identical data is read and stored multiple times, wasting resources. By controlling the cache address pointer, the present application stores the overlapping data once and reads and uses it multiple times, thereby achieving data multiplexing and reducing the amount of stored data.
One-dimensional data multiplexing exists only within the same cache slice and corresponds to the one-dimensional row (W) dimension, i.e. column data updates in the convolution calculation. Two-dimensional data multiplexing exists not only within the same cache slice but also between different groups, and corresponds to the two-dimensional row-column (W x H) dimensions. When the activation data needs a line-feed update during two-dimensional convolution, the updated data is formed simply by resetting the pointers of the previously used cache groups to zero and reading data from a new cache group; the data of the first two cache groups is multiplexed and the number of caches is reduced. Three-dimensional data multiplexing exists simultaneously within the same slice, between different groups and between different slices, and corresponds to the three-dimensional row-column-frame (W x H x F) dimensions; multiplexing likewise occurs during activation column updates, row updates and frame updates. By controlling the cache slice address pointer and the group address pointer, pointer maintenance is performed on the data that needs to be multiplexed, realizing the retention of overlapping calculation data.
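The line-feed update described above might be sketched as follows; the three-row window, the group numbering and the class name RowWindow are illustrative assumptions rather than the patented control logic.

```python
# Sketch of a line-feed (row) update: a 3-row compute window keeps the two
# rows it already holds (their group read pointers are simply reset to zero)
# and only the new row is fetched from the next cache group.
class RowWindow:
    """Three-row compute window over consecutive cache groups (one row per group)."""

    def __init__(self):
        self.window = [0, 1, 2]                        # group indices currently held
        self.read_ptr = {g: 0 for g in self.window}

    def line_feed(self):
        """Advance the window by one row, reusing the two overlapping rows."""
        reused = self.window[1:]                       # rows kept from the previous window
        new_group = self.window[-1] + 1                # only this row is newly fetched
        self.window = reused + [new_group]
        self.read_ptr = {g: 0 for g in self.window}    # pointers reset to zero
        return reused, new_group

w = RowWindow()
print(w.line_feed())   # ([1, 2], 3): rows 1 and 2 are multiplexed, row 3 is new
```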
The cache structure contains 4 cache groups, each containing 4 slice caches. Convolutional neural network calculations of different dimensionalities (one-dimensional, two-dimensional, three-dimensional) require different cache mappings to achieve high-parallelism, high-throughput output.
Three-dimensional data has three dimensions, row, column and frame (H x W x F), in addition to the channel (C). For its cache mapping, data is stored into the 4 slice caches of one group in basic units of 4 frames (F), and data of the same row is mapped to consecutive addresses of the same slice. The next column of data is stored in another cache group, so that multi-frame data of multiple rows and columns can be output simultaneously for three-dimensional convolution calculation.
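A possible way to picture this placement rule is the address computation below; only the group and slice placement follows the text, while the in-slice address arithmetic is an assumption made for illustration.

```python
# Sketch of the placement rules above: 4 frames spread over the 4 slices of a
# group, the next column stored in the next group, and entries within a slice
# occupying consecutive addresses (the last point is assumed, not specified).
NUM_GROUPS, SLICES_PER_GROUP = 4, 4

def map_3d(row: int, col: int, frame: int, num_rows: int):
    group = col % NUM_GROUPS                   # next column goes to another group
    slice_idx = frame % SLICES_PER_GROUP       # 4 frames fill the 4 slices of a group
    addr = (frame // SLICES_PER_GROUP) * num_rows + row   # consecutive addresses in a slice
    return group, slice_idx, addr

# Neighbouring columns land in different groups and can be read out in parallel.
print(map_3d(row=0, col=0, frame=0, num_rows=8))   # (0, 0, 0)
print(map_3d(row=0, col=1, frame=0, num_rows=8))   # (1, 0, 0)
```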
Illustratively, Table 1 shows the address-pointer transformation and control of three-dimensional data during data multiplexing. Referring to fig. 7, if two frames of the calculated data overlap, the cache only needs to store the overlapping data of the two frames once; data multiplexing is then realized by reading the address pointer at the overlapping position twice, obtaining two data blocks with overlapping data for calculation instead of storing all the data of both blocks.
Example of frame data update when reading data of size 3 × 3 × 3:
TABLE 1 (provided as an image in the original publication; not reproduced here)
Two-dimensional data has two dimensions, row and column (H x W), in addition to the channel (C), and can be regarded as a special case of three-dimensional data with a frame dimension of 1. For its cache mapping, data is stored into the 4 slice caches of the same group in basic units of 4 channels (C), and data of the same row is mapped to consecutive addresses of the same slice. The next column of data is stored in another cache group, so that multi-channel data of multiple rows and columns can be output simultaneously for two-dimensional convolution calculation.
Illustratively, Table 2 shows the address-pointer transformation and control of two-dimensional data during data multiplexing. Referring to fig. 8, if two rows of the calculated data overlap, the cache only needs to store the two overlapping rows once; data multiplexing is then realized by reading the address pointer at the overlapping position twice, obtaining two data blocks with overlapping data for calculation instead of storing all the data of both blocks.
Example of row data update when reading data of size 3 × 3 × 3:
TABLE 2 (provided as an image in the original publication; not reproduced here)
One-dimensional data has only the row (W) dimension in addition to the channel (C). The channel data is divided, according to the arrangement direction, into vertical channels and horizontal channels; data is cached into the 4 slice caches of the same group in basic units of 4 horizontal channels, and data of the same row is mapped to consecutive addresses of the same slice. The next (vertical) channel of data is stored in another cache group. For a fixed, unified calculation circuit, these dimension-specific mapping methods effectively reduce idle calculation units.
Illustratively, Table 3 shows the address-pointer transformation and control of one-dimensional data during data multiplexing. Referring to fig. 9, if two columns of the calculated data overlap, the cache only needs to store the two overlapping columns once; data multiplexing is then realized by reading the address pointer at the overlapping position twice, obtaining two data blocks with overlapping data for calculation instead of storing all the data of both blocks.
Column data update example when reading 3 × 3 × 3 size data:
TABLE 3 (provided as an image in the original publication; not reproduced here)
Referring to fig. 2, a cache system of a neural network according to an embodiment of the present invention includes:
a first module 210, configured to obtain configuration information of a cache;
a second module 220, configured to set a working mode of the cache according to the configuration information;
a third module 230, configured to obtain target to-be-processed data through the set cache;
a fourth module 240, configured to process the target to-be-processed data through the set cache according to the configuration information.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
Referring to fig. 3, an embodiment of the present invention provides a cache apparatus of a neural network, including:
at least one processor 310;
at least one memory 320 for storing at least one program;
the at least one program, when executed by the at least one processor 310, causes the at least one processor 310 to implement the caching method of the neural network shown in fig. 1.
The contents in the above method embodiments are all applicable to the present apparatus embodiment, the functions specifically implemented by the present apparatus embodiment are the same as those in the above method embodiments, and the advantageous effects achieved by the present apparatus embodiment are also the same as those achieved by the above method embodiments.
Embodiments of the present invention also provide a storage medium having stored therein processor-executable instructions, which when executed by a processor, are used to implement the caching method of the neural network shown in fig. 1.
It can be understood that, compared with the prior art, the embodiment of the present invention also has the following advantages:
1) Cache mapping for neural networks of different dimensionalities is realized, so that congestion in the cache is avoided and writing and output are performed with high concurrency and high throughput.
2) The corresponding unified fixed calculation array can be mapped efficiently, improving calculation efficiency.
3) By supporting the calculation sizes of different convolutional neural networks, redundant zero-padding storage of data in the cache is effectively reduced.
4) Different multiplexing strategies are selected for data of different dimensionalities and sizes, reducing the amount of data accessed in the cache and the hardware cache resource overhead.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A caching method for a neural network, comprising the steps of:
acquiring configuration information of a cache, wherein the configuration information comprises dimension information of a neural network to be processed;
setting the working mode of the cache according to the dimension information;
acquiring target data to be processed through the set cache;
and processing the target data to be processed through the set cache according to the configuration information.
2. The caching method for a neural network of claim 1, wherein the configuration information comprises dimension information, calculation size information, and calculation step size information.
3. The method of claim 1, wherein the step of setting the operating mode of the cache comprises:
acquiring the dimension information from the configuration information;
and setting a cache mapping scheme of the cache according to the dimension information.
4. The neural network caching method of claim 3, wherein the cache mapping scheme comprises a one-dimensional cache mapping scheme, a two-dimensional cache mapping scheme, and a three-dimensional cache mapping scheme.
5. The caching method for a neural network according to claim 1, wherein acquiring the target data to be processed through the set cache specifically comprises the following steps:
acquiring the target data to be processed from the data to be processed according to the configuration information;
and writing the target data to be processed into the cache.
6. The caching method for a neural network according to claim 2, wherein processing the target data to be processed through the set cache specifically comprises the following steps:
determining a corresponding data multiplexing strategy according to the calculation size information and the calculation step size information;
and processing the target data to be processed according to the data multiplexing strategy.
7. The caching method for a neural network of claim 6, wherein the data multiplexing policy comprises a one-dimensional data multiplexing policy, a two-dimensional data multiplexing policy, and a three-dimensional data multiplexing policy.
8. A cache system for a neural network, comprising:
a first module for obtaining configuration information of a cache;
a second module, configured to set a working mode of the cache according to the configuration information;
the third module is used for acquiring target data to be processed through the set cache;
and the fourth module is used for processing the target data to be processed through the set cache according to the configuration information.
9. A cache apparatus of a neural network, comprising:
at least one processor;
at least one memory for storing at least one program;
when executed by the at least one processor, the at least one program causes the at least one processor to implement the caching method for the neural network of any one of claims 1-7.
10. A computer readable storage medium having stored therein processor executable instructions, which when executed by a processor, are for implementing a caching method for a neural network as claimed in any one of claims 1 to 7.
CN202210299126.8A 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network Pending CN114742214A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210299126.8A CN114742214A (en) 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network
PCT/CN2023/082863 WO2023179619A1 (en) 2022-03-25 2023-03-21 Neural network caching method, system, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210299126.8A CN114742214A (en) 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network

Publications (1)

Publication Number Publication Date
CN114742214A (en) 2022-07-12

Family

ID=82276481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210299126.8A Pending CN114742214A (en) 2022-03-25 2022-03-25 Caching method, system, device and storage medium of neural network

Country Status (2)

Country Link
CN (1) CN114742214A (en)
WO (1) WO2023179619A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179619A1 (en) * 2022-03-25 2023-09-28 中山大学 Neural network caching method, system, and device and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11062203B2 (en) * 2016-12-30 2021-07-13 Intel Corporation Neuromorphic computer with reconfigurable memory mapping for various neural network topologies
CN111783933A (en) * 2019-04-04 2020-10-16 北京芯启科技有限公司 Hardware circuit design and method for data loading device combining main memory and accelerating deep convolution neural network calculation
CN112860596B (en) * 2021-02-07 2023-12-22 厦门壹普智慧科技有限公司 Data stream cache device of neural network tensor processor
CN112988621A (en) * 2021-03-12 2021-06-18 苏州芯启微电子科技有限公司 Data loading device and method for tensor data
CN114742214A (en) * 2022-03-25 2022-07-12 中山大学 Caching method, system, device and storage medium of neural network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023179619A1 (en) * 2022-03-25 2023-09-28 中山大学 Neural network caching method, system, and device and storage medium

Also Published As

Publication number Publication date
WO2023179619A1 (en) 2023-09-28

Similar Documents

Publication Publication Date Title
JP5715644B2 (en) System and method for storing data in a high speed virtual memory system
US6816947B1 (en) System and method for memory arbitration
US8819359B2 (en) Hybrid interleaving in memory modules by interleaving physical addresses for a page across ranks in a memory module
US7944931B2 (en) Balanced bandwidth utilization
US7664922B2 (en) Data transfer arbitration apparatus and data transfer arbitration method
JP2005522773A (en) Non-uniform cache device, system and method
CN106126112A (en) Each cycle has multiple read port and a plurality of memorizer of multiple write port
CN102446159B (en) Method and device for managing data of multi-core processor
CN110705702A (en) Dynamic extensible convolutional neural network accelerator
CN109993293B (en) Deep learning accelerator suitable for heap hourglass network
US20230089754A1 (en) Neural network accelerator using only on-chip memory and method for implementing neural network accelerator using only on-chip memory
WO2023179619A1 (en) Neural network caching method, system, and device and storage medium
KR102623702B1 (en) Semiconductor device including a memory buffer
CN115168247B (en) Method for dynamically sharing memory space in parallel processor and corresponding processor
CN115203076B (en) Data structure optimized private memory caching
US6775742B2 (en) Memory device storing data and directory information thereon, and method for providing the directory information and the data in the memory device
US7406554B1 (en) Queue circuit and method for memory arbitration employing same
US20220374348A1 (en) Hardware Acceleration
CN113222115B (en) Convolutional neural network-oriented shared cache array
CN112925727B (en) Tensor cache and access structure and method thereof
JP5348157B2 (en) Information processing apparatus, memory access control apparatus and address generation method thereof
CN111078589B (en) Data reading system, method and chip applied to deep learning calculation
CN112712167A (en) Memory access method and system supporting acceleration of multiple convolutional neural networks
JP4117621B2 (en) Data batch transfer device
US20220276969A1 (en) Sedram-based stacked cache system and device and controlling method therefor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination