WO2023236365A1 - Data processing method and apparatus, AI chip, electronic device, and storage medium - Google Patents

Data processing method and apparatus, AI chip, electronic device, and storage medium

Info

Publication number
WO2023236365A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
chip
neural network
compressed data
compression algorithm
Prior art date
Application number
PCT/CN2022/114886
Other languages
English (en)
Chinese (zh)
Inventor
段茗
Original Assignee
成都登临科技有限公司
上海登临科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 成都登临科技有限公司, 上海登临科技有限公司 filed Critical 成都登临科技有限公司
Publication of WO2023236365A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 - File systems; File servers
    • G06F16/17 - Details of further file system functions
    • G06F16/174 - Redundancy elimination performed by the file system
    • G06F16/1744 - Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application belongs to the field of neural network technology, and specifically relates to a data processing method, device, AI chip, electronic equipment and storage medium.
  • With the development of artificial intelligence (AI) technology, neural networks have received widespread attention and application.
  • However, large neural networks have many layers and nodes, which results in a large number of weight parameters. As a result, the training process is time-consuming and the trained model occupies a large amount of storage space. Sparse neural networks have therefore received increasing attention in the field of artificial intelligence, and many optimization methods have been proposed to obtain greater benefits than ordinary neural networks in this specific scenario.
  • Another approach is to deploy the neural network on a dedicated AI acceleration chip, such as Google's TPU (Tensor Processing Unit).
  • However, the dedicated chip may not support the compression and decompression operations required by sparse networks; and even if it does, the particular compression and decompression scheme it uses may not suit the current network scenario, so the benefits obtained are not obvious.
  • In view of this, the purpose of this application is to provide a data processing method, device, AI chip, electronic device, and storage medium that remedy the shortcomings of existing solutions and improve the energy efficiency (performance-to-power ratio) and throughput of processing sparse neural networks.
  • In a first aspect, embodiments of the present application provide a data processing method. The data processing method may include: obtaining basic information of a neural network deployed in an AI chip and basic information of the AI chip; selecting an optimal compression algorithm from multiple preset compression algorithms according to the basic information of the neural network and the basic information of the AI chip; and using the optimal compression algorithm to compress the relevant data of the neural network to obtain compressed data and a corresponding data index, where the data index is used to restore the compressed data to the original uncompressed data, or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • In this way, the basic information of the neural network and the basic information of the AI chip are obtained, the compression scheme best suited to the current scenario is selected flexibly, and the relevant data of the neural network is then compressed with that scheme.
  • The basic information of the neural network may include the network sparsity and the amount of original data in the network, and the basic information of the AI chip may include the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip. Selecting the optimal compression algorithm from multiple preset compression algorithms based on this information may then include: for each preset compression algorithm, inputting the compression algorithm, the network sparsity, the original data volume of the network, the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip into a preset performance evaluation model to obtain a corresponding evaluation score, where the preset performance evaluation model simulates the performance overhead required by the AI chip to process the simulated compressed data and the corresponding data index produced by simulated compression with that algorithm; and taking the compression algorithm with the maximum evaluation score as the optimal compression algorithm.
  • By simulating the performance overhead required when the AI chip processes the simulated compressed data and the corresponding data index produced by simulated compression with each algorithm, the evaluation indicators of hardware operation (close to the performance data obtained on real hardware) can be obtained quickly without actually running the hardware, so that the compression scheme best suited to the current scenario can be selected flexibly.
  • The network sparsity may represent the proportion of zero-valued weights in the network relative to all the weights; the original data volume of the network may be the size of the uncompressed weight data in the network.
  • Inputting the compression algorithm, the network sparsity, the original data volume of the network, the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip into the preset performance evaluation model to obtain the corresponding evaluation score may include: the preset performance evaluation model estimates, from the network sparsity and the original data volume of the network, the amount of simulated compressed data and of the corresponding data index after simulated compression by the algorithm; the preset performance evaluation model partitions the simulated compressed data and the corresponding data index into data blocks according to the on-chip memory consumption of the chip; for each data block, the preset performance evaluation model simulates the AI chip loading the block according to the transmission bandwidth of the chip and performing the specified processing on the loaded data according to the computing resource usage of the chip, and records the performance overhead required; and the preset performance evaluation model derives the corresponding evaluation score from the performance overhead of each simulated data block.
  • In this way, the amount of data after simulated compression by each algorithm is obtained from the network sparsity and the original data volume of the network. The compressed data is then partitioned according to the on-chip memory consumption of the chip, consistent with the actual processing flow, so that no block is too large to be loaded into the on-chip memory at once. For each block, the model simulates the entire process of the AI chip, from loading the compressed data, through decompression by the configurable decompression unit, to the computing unit performing the specified calculations on the decompressed data. This evaluates the final performance and energy consumption of the chip in the current scenario quite accurately (close to the performance data obtained on real hardware), so that the optimal compression scheme can be selected.
  • Using the optimal compression algorithm to compress the relevant data of the neural network may include: dividing the relevant data of the neural network into blocks in the format required by the hardware; aligning each data block according to the alignment requirements of the hardware; and using the optimal compression algorithm to compress each aligned data block so that the compressed data also meets the hardware alignment requirements.
  • In this way, the relevant data is divided into blocks in the format required by the hardware so as to make better use of hardware performance; each block is aligned according to the hardware alignment requirements; and each aligned block is then compressed so that the compressed data also meets those requirements, improving the efficiency with which the hardware subsequently reads the data.
  • The method may further include: when specified calculations need to be performed on the relevant data of the neural network, obtaining the target compressed data and the corresponding target data index for that data; determining whether the computing unit can perform the specified calculations directly on the target compressed data and the corresponding target data index; and, when it is determined that it can, passing the target compressed data and the corresponding target data index through to the computing unit for the specified calculations.
  • In this way, when the computing unit can operate directly on the target compressed data and the corresponding target data index, they are passed straight through to the computing unit, which skips the decompression step and thereby improves processing efficiency.
  • The method may further include: if not, decompressing the target compressed data according to the target data index and sending the decompressed original data to the computing unit for the specified calculations.
  • In this way, if the computing unit cannot operate directly on the target compressed data and the corresponding target data index, the target compressed data is decompressed first, which ensures that the computing unit calculates correctly and avoids calculation errors.
  • the neural network may be a convolutional neural network, a recurrent neural network, or a long short-term memory neural network.
  • the preset compression algorithm may include at least one of the following: Bitmap compression algorithm, row compression algorithm or column compression algorithm, coordinate compression algorithm, and run-length coding compression algorithm.
  • Embodiments of the present application also provide an AI chip. The AI chip may include: an on-chip memory, a data loading unit, a configurable decompression unit, and a computing unit. The on-chip memory is configured to store the compressed data and the corresponding data index of the neural network deployed in the AI chip; the data loading unit is configured to read the target compressed data and the corresponding target data index stored in the on-chip memory; the configurable decompression unit is configured to receive the target compressed data and the corresponding target data index sent by the data loading unit and to determine, according to configuration information, whether the target compressed data needs to be decompressed, and, if not, to pass the target compressed data and the corresponding target data index through unchanged; and the computing unit is configured to receive the target compressed data and the corresponding target data index passed through by the configurable decompression unit and to perform the specified calculations on them.
  • The configurable decompression unit may also be configured to decompress the target compressed data according to the target data index when decompression is required and to send the decompressed original data to the computing unit; the computing unit may also be configured to perform the specified calculations on the original data sent by the configurable decompression unit.
  • Embodiments of the present application also provide a data processing device. The data processing device may include an acquisition module, a selection module, and a compression module. The acquisition module is configured to acquire the basic information of the neural network deployed in the AI chip and the basic information of the AI chip; the selection module is configured to select the optimal compression algorithm from a plurality of preset compression algorithms based on the basic information of the neural network and the basic information of the AI chip; and the compression module is configured to use the optimal compression algorithm to compress the relevant data of the neural network to obtain compressed data and a corresponding data index, the data index being used to restore the compressed data to the original uncompressed data or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • The data processing device may further include a decompression module and a sending module. The acquisition module may also be configured to obtain, after the compression module has produced the compressed data and the corresponding data index and when specified calculations need to be performed on the relevant data of the neural network, the target compressed data corresponding to that data and the corresponding target data index; the decompression module may be configured to determine whether the computing unit can perform the specified calculations directly on the target compressed data and the corresponding target data index; and the sending module may be configured to, when it is determined that the computing unit can do so, pass the target compressed data and the corresponding target data index through to the computing unit for the specified calculations.
  • Embodiments of the present application further provide an electronic device. The electronic device may include a memory and a processor connected to the memory; the memory is configured to store a program, and the processor is configured to call the program stored in the memory to execute the method provided by the embodiment of the first aspect and/or any possible implementation thereof.
  • Embodiments of the present application further provide a computer-readable storage medium on which a computer program is stored; when run, the computer program executes the method provided by the embodiment of the first aspect and/or any possible implementation thereof.
  • FIG. 1 shows a schematic principle diagram of a data processing flow combining software and hardware provided by an embodiment of the present application.
  • Figure 2 shows a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • Figure 3 shows a schematic structural diagram of an AI chip provided by an embodiment of the present application.
  • Figure 4 shows a functional module schematic diagram of a data processing device provided by an embodiment of the present application.
  • FIG. 5 shows a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • 100 - data processing device; 110 - acquisition module; 120 - selection module; 130 - compression module; 200 - electronic device; 210 - transceiver; 220 - memory; 240 - processor.
  • "A and/or B" can mean three situations: A alone exists, A and B exist simultaneously, or B alone exists.
  • the embodiments of this application provide an overall solution from top to bottom, from software level to hardware level, to improve the energy consumption ratio (performance-to-power ratio) and efficiency of processing sparse neural networks.
  • At the hardware level, this application provides a dedicated AI chip, which avoids occupying the computing resources of third-party general-purpose hardware and does not require compression and decompression program instructions to be converted into "general instructions" (third-party general-purpose hardware must convert compression/decompression program instructions into "general instructions" before it can perform "compression" and "decompression" operations), thereby greatly improving energy efficiency and throughput.
  • Compared with common AI acceleration chips (such as Google's TPU), this application also provides a driver matched to the AI chip that supports flexible selection of the compression algorithm. This addresses the problem that the particular compression scheme used by an existing AI acceleration chip may not suit the current network scenario and therefore yields little benefit, so the sparse neural network can be accelerated in a relatively optimal way and the energy efficiency and performance are improved.
  • this application provides a dedicated AI chip.
  • the AI chip includes on-chip memory, a data loading unit, a configurable decompression unit, and a computing unit.
  • To address the problem that an AI acceleration chip may not support the compression and decompression operations of sparse networks, or that, even when it does, the particular compression scheme it uses may not suit the current network scenario and so yields little benefit, this application provides, at the software level, a driver for use with the AI chip.
  • The driver can be recorded on a storage medium such as a CD or USB flash drive and sold together with the AI chip; after installing the AI chip, the user runs the driver to carry out the data processing described in this application automatically. The driver can also be hosted on the Internet and, when needed, downloaded through a link and installed locally.
  • S1: The driver obtains the basic information of the neural network deployed in the AI chip and the basic information of the AI chip.
  • existing deployment methods can be used to deploy the trained neural network onto the dedicated AI chip of this application, which will not be introduced here.
  • the basic information of the neural network can include information such as network sparsity and the amount of original data of the network.
  • Network sparsity represents the proportion of zero-valued weights in the network relative to the overall weight. For example, assuming that the network sparsity is 30%, it means that the proportion of zero-valued weights is 30% and the proportion of non-zero-valued weights is 70%.
  • The original data volume of the network is the size of the uncompressed weight data in the network, for example 100M. Note that the values given here are only examples and depend on the neural network itself.
  • the basic information of the AI chip can include some basic information of the hardware, such as the chip's transmission bandwidth, the chip's computing resource usage, the chip's on-chip memory consumption, etc.
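  • For illustration, the basic information described above can be grouped as two simple records; the field names below are assumptions made for this sketch, not the driver's actual data structures.

```python
# Illustrative grouping of the inputs described above (sketch only).
from dataclasses import dataclass

@dataclass
class NetworkInfo:
    sparsity: float           # fraction of zero-valued weights, e.g. 0.30
    original_data_mb: float   # size of uncompressed weight data, e.g. 100

@dataclass
class ChipInfo:
    transmission_bw_mbps: float    # transmission bandwidth of the chip
    compute_resource_usage: float  # computing resource usage of the chip
    on_chip_memory_mb: float       # on-chip memory consumption of the chip

net = NetworkInfo(sparsity=0.30, original_data_mb=100)
```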
  • The neural network in this application can be any neural network that can be deployed in an AI chip, such as a convolutional neural network (Convolutional Neural Network, CNN), a recurrent neural network (Recurrent Neural Network, RNN), a long short-term memory (Long Short-Term Memory, LSTM) neural network, and so on.
  • S2: Select the optimal compression algorithm from multiple preset compression algorithms based on the basic information of the neural network and the basic information of the AI chip.
  • Specifically, the driver can select the optimal compression algorithm from multiple preset compression algorithms based on the obtained basic information of the neural network and of the AI chip. Since different compression algorithms produce different amounts of compressed data and require different hardware resources for decompression, the optimal algorithm is selected by comparing the hardware performance cost of each, so as to improve energy efficiency and performance.
  • The preset compression algorithms can include the Bitmap compression algorithm, row compression algorithms such as CSR (Compressed Sparse Row), column compression algorithms such as CSC (Compressed Sparse Column), coordinate compression algorithms such as COO (Coordinate) or COO-1D (a variant of COO), the run-length coding (Run-Length Coding, RLC) compression algorithm, and so on.
  • Regardless of the algorithm, compression essentially removes the zero-valued (0) elements from the data, retains only the non-zero elements, and arranges them in order; how the positions of the non-zero (or zero) elements are recorded depends on the compression algorithm, so different compression algorithms produce different data indexes. For example, if the input data is (2, 0, 5, 0) and the Bitmap compression algorithm is used, the compressed data is (2, 5), the data volume is halved, and the data index is the binary number 1010, where each bit set to 1 marks a non-zero element and each 0 marks a zero element. If the COO compression algorithm is used instead, the compressed data is still (2, 5), but the data index becomes the coordinate sequence (0, 2), meaning that the elements at positions 0 and 2 are the non-zero elements. A small sketch of these two index formats is given below.
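```python
# Minimal sketch of the two index formats in the example above; the function
# names are illustrative and not taken from the patent.

def bitmap_compress(values):
    """Drop zeros; the index is one bit per original element (1 = non-zero)."""
    data = [v for v in values if v != 0]
    bitmap = "".join("1" if v != 0 else "0" for v in values)
    return data, bitmap

def coo_compress(values):
    """Drop zeros; the index lists the original positions of the non-zero elements."""
    data = [v for v in values if v != 0]
    coords = [i for i, v in enumerate(values) if v != 0]
    return data, coords

print(bitmap_compress([2, 0, 5, 0]))  # ([2, 5], '1010')
print(coo_compress([2, 0, 5, 0]))     # ([2, 5], [0, 2])
```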
  • The implementation of S2 can be: for each preset compression algorithm, input the compression algorithm, the network sparsity, the original data volume of the network, the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip into the preset performance evaluation model to obtain the corresponding evaluation score; the compression algorithm with the maximum evaluation score is taken as the optimal compression algorithm.
  • The preset performance evaluation model simulates the performance overhead required when the AI chip processes specified data (data loading, decompression, the specified calculations, and so on), where the specified data is the simulated compressed data and the corresponding data index produced by simulated compression with the algorithm.
  • The data index is used to restore the compressed data to the original uncompressed data, or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • In practice, the driver calls the evaluation function cost_func once for each compression algorithm supported by the hardware to calculate its score, and finally selects the compression algorithm with the largest score as the candidate. If the maximum score is lower than a threshold set by the driver, the current scenario is not suitable for compression (or the benefit of compression is small), and the driver processes the data of the neural network without compression.
  • For example, the logic of selecting the optimal compression algorithm with the preset performance evaluation model can be as follows: for the compression algorithm a_i (with i running from 1 to n, where n is the number of preset compression algorithms), the driver uses the evaluation function to obtain the score score_i; if score_i is greater than the current maximum score score_max, score_max is updated to score_i and a_i becomes the candidate compression algorithm; this step is repeated until the scores of all compression algorithms have been obtained, and the candidate is then handled as described above. A sketch of this loop is given below.
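```python
# Sketch of the selection loop described above. It assumes cost_func takes the
# algorithm together with the network/chip information and that the driver
# supplies a threshold; these signatures are illustrative, not the real driver API.

def select_compression_algorithm(algorithms, net_info, chip_info, cost_func, threshold):
    score_max, best = float("-inf"), None
    for algo in algorithms:                       # i = 1 .. n
        score = cost_func(algo, net_info, chip_info)
        if score > score_max:                     # keep the best-scoring algorithm so far
            score_max, best = score, algo
    if score_max < threshold:                     # compression not worthwhile in this scenario
        return None                               # caller processes the network uncompressed
    return best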
  • The preset performance evaluation model is a performance and energy-consumption model. It models the entire pipeline of the AI chip, from loading the compressed data, through decompression by the configurable decompression unit, to the computing unit performing the specified calculations on the decompressed data. It can therefore evaluate the final performance and energy consumption of the chip in the current scenario quite accurately, and the evaluation indicators of hardware operation can be obtained very quickly without actually running the hardware.
  • The preset performance evaluation model simulates the various modules of the hardware, including the data loading unit, the configurable decompression unit, the computing unit, and other modules. From the parameters entered above, it derives the performance of the network running on the hardware (close to the performance measured on real hardware), and the optimal compression scheme is then selected.
  • Performance evaluation models are often used as an "analysis tool" to observe how the hardware behaves in a specific scenario and to locate the bottlenecks of hardware operation in that scenario; the performance overhead can be expressed, for example, as a time overhead in seconds.
  • When the compression algorithm, the network sparsity, the original data volume of the network, the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip are input into the preset performance evaluation model, the corresponding evaluation score can be obtained as follows: the model estimates, from the network sparsity and the original data volume, the simulated compressed data volume and the corresponding data index volume after simulated compression by the algorithm; the model partitions this data into blocks according to the on-chip memory consumption of the chip; for each block, the model simulates the AI chip loading the block according to the transmission bandwidth of the chip and performing the specified processing on the loaded data according to the computing resource usage of the chip, recording the performance overhead required; and the model derives the corresponding evaluation score from the performance overhead of each simulated data block.
  • When the model simulates the AI chip loading a data block according to the transmission bandwidth of the chip and performing the specified processing on the loaded data according to the computing resource usage of the chip, the hardware involved comprises the data loading unit, the configurable decompression unit, and the computing unit. The model therefore simulates the performance consumption of the entire process, from loading the compressed data, through decompression by the configurable decompression unit, to the computing unit performing the specified calculations on the decompressed data: the performance overhead required by the data loading unit to load the data block given the transmission bandwidth of the chip; the performance overhead required by the configurable decompression unit to decompress the data block; and the performance overhead required by the computing unit to perform the specified operations on the data block given the computing resource usage of the chip.
  • The bottleneck that limits the hardware is then identified and the corresponding evaluation score is derived from it. For example, if the bottleneck is the transmission bandwidth of the data loading unit, the evaluation score is obtained from the performance consumption of the data loading unit; if the bottleneck is a computing bottleneck, such as the configurable decompression unit or the computing unit, the evaluation score is obtained from the performance consumption of that unit.
  • For example, suppose there is one configurable decompression unit, which must decompress the compressed data according to the data index and can decompress 4M of data per second; then data block 1 takes 10 s and data block 2 takes 8.75 s, so the total performance overhead of the configurable decompression unit is 18.75 s. Suppose further that the decompressed data volume of data block 1 is 55M and that of data block 2 is 45M, and that 5 computing units are available, each processing 1M of data per second; then the specified processing of the loaded data takes 11 s (i.e. 55/5) for data block 1 and 9 s (i.e. 45/5) for data block 2, so the total performance overhead of the computing unit is 20 s.
  • In this example, the performance overhead is 37.5 s for the data loading unit, 18.75 s for the configurable decompression unit, and 20 s for the computing unit, so the bottleneck restricting the AI chip is the transmission bandwidth (the data loading unit has the largest performance overhead). When the preset performance evaluation model derives the corresponding evaluation score from the performance overhead of each data block, it therefore does so mainly from the performance overhead of the data loading unit for each block; for example, a table lookup gives the evaluation score corresponding to a performance overhead of 37.5 s. A rough sketch of such a cost model is given below.
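```python
# Rough sketch of how such a performance-evaluation (cost) model could be written.
# The index-overhead ratio, the per-unit throughput parameters, and the scoring
# rule (reciprocal of the bottleneck time) are illustrative assumptions, not the
# patent's actual model.
import math

def evaluate(sparsity, original_mb, load_bw_mbps, n_decomp, decomp_mbps,
             n_compute, compute_mbps, on_chip_mem_mb, index_ratio=0.05):
    """Score one compression algorithm for one network/chip pair; larger is better."""
    compressed_mb = original_mb * (1.0 - sparsity)        # non-zero elements kept
    index_mb = compressed_mb * index_ratio                # data-index overhead (assumed)
    total_mb = compressed_mb + index_mb

    n_blocks = max(1, math.ceil(total_mb / on_chip_mem_mb))  # each block must fit on chip
    block_mb = total_mb / n_blocks                           # compressed block size
    block_raw_mb = original_mb / n_blocks                    # decompressed block size

    load_s = n_blocks * block_mb / load_bw_mbps                       # data loading unit
    decomp_s = n_blocks * block_mb / (n_decomp * decomp_mbps)         # decompression unit
    compute_s = n_blocks * block_raw_mb / (n_compute * compute_mbps)  # computing unit

    return 1.0 / max(load_s, decomp_s, compute_s)         # slowest unit is the bottleneck

# Parameter values loosely echoing the worked example above (all assumed).
print(evaluate(sparsity=0.25, original_mb=100, load_bw_mbps=2, n_decomp=1,
               decomp_mbps=4, n_compute=5, compute_mbps=1, on_chip_mem_mb=40))
```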
  • S3: Use the optimal compression algorithm to compress the relevant data of the neural network to obtain the compressed data and the corresponding data index.
  • Specifically, the driver uses the optimal compression algorithm to compress the relevant data of the neural network to obtain the compressed data and the corresponding data index, where the data index is used to restore the compressed data to the original uncompressed data, or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • When the optimal compression algorithm is used to compress the relevant data of the neural network, the process may be: divide the relevant data of the neural network into blocks in the format required by the hardware; align each data block according to the alignment requirements of the hardware; and use the optimal compression algorithm to compress each aligned data block, obtaining the corresponding compressed data and data index, such that the compressed data still meets the hardware alignment requirements.
  • For example, if the original data volume is 100M and the computing unit can only process a 1M task at a time, the hardware must repeat the load-and-compute cycle 100 times, each time loading one "piece" of the overall input data. The size of a data block is determined entirely by the amount of hardware computing resources and the size of the on-chip memory. After segmentation, the 100M computing task is cut into 100 "subtasks" of 1M each, and compression is performed on each subtask.
  • the hardware has requirements for the format of input data.
  • Weight data generally has four dimensions (output channel, kernel height, kernel width, input channel); how much of each dimension is placed in each 1M subtask defines the block format. Different hardware implementations have different requirements on the block format, and a matching format makes better use of the hardware.
  • In addition, the hardware has strict requirements on the starting address of the input data: generally it must be 32-byte aligned (that is, the address must be an integer multiple of 32 bytes) so that the hardware can fetch the data correctly and efficiently. If the data is not 32-byte aligned, the hardware may need several extra clock cycles to obtain it, whereas aligned data can generally be fetched in one clock cycle. Therefore, when the software layer (the driver) allocates memory for the block data, it must allocate it 32-byte aligned so that the data address is aligned; likewise, when compressing the data, the compressed data must also be aligned to 32 bytes in accordance with the hardware alignment requirements.
  • the type of data must also be considered when aligning data.
  • For example, if the data type is np.int8, its placement in the on-chip memory leaves 1 byte empty after every 63 bytes, whereas for np.float32, 4 bytes are left empty after every 252 bytes.
  • As described above, compression removes the zero-valued (0) elements from the data, retains only the non-zero elements in order, and records their positions in a data index whose form depends on the compression algorithm. It should be pointed out that not all zero elements are necessarily removed: to meet the hardware alignment requirements, a small number of zeros may be retained; if there are not enough non-zero elements, a certain number of zero elements are kept so that the compressed data still satisfies the alignment requirements of the hardware. A minimal sketch of such alignment padding is given below.
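```python
# Minimal sketch of 32-byte alignment padding of a compressed block with retained
# zero elements, as described above; the helper name and example values are
# assumptions for illustration only.
import numpy as np

ALIGN = 32  # bytes; aligned data can generally be fetched in one hardware clock

def pad_to_alignment(block: np.ndarray, align: int = ALIGN) -> np.ndarray:
    """Append zero elements so the block's byte size is a multiple of `align`."""
    nbytes = block.size * block.itemsize
    pad_bytes = (-nbytes) % align
    pad_elems = pad_bytes // block.itemsize
    return np.concatenate([block, np.zeros(pad_elems, dtype=block.dtype)])

# Compressed non-zero values for one block (illustrative): 5 int8 elements padded to 32 bytes.
compressed = np.array([2, 5, 7, 1, 3], dtype=np.int8)
aligned = pad_to_alignment(compressed)
assert aligned.nbytes % ALIGN == 0   # 32 bytes total; the last 27 bytes are retained zeros
```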
  • In addition to selecting the optimal compression algorithm to compress the relevant data of the neural network, after the compressed data and the corresponding data index have been obtained, the AI chip can also run the neural network to process external inputs (such as input image features), for example with convolution, pooling, vector addition, classification, and so on.
  • Therefore, the data processing method also includes: when specified calculations (such as convolution, pooling, vector addition, classification, and so on) need to be performed on the relevant data of the neural network, obtaining the target compressed data corresponding to that data and the corresponding target data index; determining whether the computing unit can perform the specified calculations directly on the target compressed data and the corresponding target data index; and, when it can, passing the target compressed data and the corresponding target data index through to the computing unit for the specified calculations.
  • Otherwise, the target compressed data is decompressed according to the target data index, and the decompressed original data is sent to the computing unit for the specified calculations.
  • In one embodiment, software may be used to obtain the target compressed data and the corresponding target data index for the relevant data of the neural network, to determine whether the computing unit can perform the specified calculations directly on them, and, when it can, to pass the target compressed data and the corresponding target data index through to the computing unit for the specified calculations.
  • In one embodiment, when specified calculations need to be performed on the relevant data of the neural network, the driver can use the data loading unit to obtain, in the normal way, the target compressed data and the corresponding target data index required for the calculations from the on-chip memory of the AI chip. Because the acquired data is compressed, its volume is greatly reduced, and so is the bandwidth required by the data loading unit to read it.
  • the configurable decompression unit determines whether the computing unit can directly perform specified calculations on the target compressed data and the corresponding target data index based on the preset configuration information.
  • If the computing unit can process the target compressed data directly, the configurable decompression unit "passes through" the data to the computing unit without performing any decompression. If it cannot, then before the calculations are performed, the target compressed data is decompressed according to the target data index and restored to its state before compression, and the decompressed original data is sent to the computing unit for the specified calculations.
  • For example, the configurable decompression unit can determine from the preset configuration information whether the computing unit can perform the specified calculations directly on the target compressed data and the corresponding target data index: a configuration value of 1 may mean that it can, and a value of 0 that it cannot. The convention can of course be reversed, with 1 meaning that the computing unit cannot operate directly on the target compressed data and the corresponding target data index.
  • For the computing unit, the calculation logic covers two situations: 1. For decompressed data, the computing unit performs no special processing, exactly as if the compression scheme were not enabled. 2. For data that has not been decompressed, the computing unit needs the data index of the non-zero elements in order to locate their positions in the original uncompressed data and perform the calculations correctly. A small sketch of these two paths is given below.
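```python
# Sketch of the pass-through decision and of a compute path that consumes
# COO-style compressed data directly; the names, the configuration flag, and the
# dot-product workload are illustrative assumptions, not the chip's actual design.

def decompress(values, coords, length):
    """Restore the original dense vector from compressed values and a COO index."""
    dense = [0] * length
    for v, i in zip(values, coords):
        dense[i] = v
    return dense

def decompression_unit(values, coords, length, config_passthrough: bool):
    if config_passthrough:                 # computing unit handles compressed data
        return values, coords              # pass through unchanged
    return decompress(values, coords, length), None

def sparse_dot(values, coords, dense_weights):
    """Non-decompressed path: the index locates each non-zero element."""
    return sum(v * dense_weights[i] for v, i in zip(values, coords))

activations = [3.0, 1.0, 4.0, 1.5]
data, index = decompression_unit([2, 5], [0, 2], 4, config_passthrough=True)
assert sparse_dot(data, index, activations) == 2 * 3.0 + 5 * 4.0   # 26.0
```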
  • In summary, the data processing method provided by the embodiments of this application is designed at both the software and hardware levels, combines the specific conditions of the neural network and of the AI chip, and flexibly selects a compression scheme suited to the current scenario. This not only saves the overall bandwidth that the neural network consumes in the hardware but also saves its use of computing resources, accelerating the sparse neural network in a relatively optimal way and improving energy efficiency and performance.
  • Embodiments of the present application also provide an AI chip, as shown in Figure 3.
  • the AI chip includes: on-chip memory, data loading unit, configurable decompression unit, and computing unit.
  • The on-chip memory is configured to store the compressed data and the corresponding data indexes of the neural network deployed in the AI chip; data compression reduces the amount of stored data and speeds up data processing.
  • the compression of the original data of the neural network deployed in the AI chip can be completed through the driver program matched with the AI chip, and the corresponding compressed data and corresponding data index can be obtained.
  • Specifically, the basic information of the neural network deployed in the AI chip and the basic information of the AI chip are obtained; the optimal compression algorithm is selected from multiple preset compression algorithms based on that information; and the optimal compression algorithm is used to compress the relevant data of the neural network to obtain the compressed data and a corresponding data index, the data index being used to restore the compressed data to the original uncompressed data or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • the data loading unit is configured to read the target compressed data and the corresponding target data index stored in the on-chip memory. For example, when it is necessary to perform specified calculations on relevant data of the neural network, the data loading unit can be used to normally obtain the target compressed data required for the specified calculation and the corresponding target data index from the on-chip memory of the AI chip.
  • The configurable decompression unit is configured to receive the target compressed data and the corresponding target data index sent by the data loading unit and to determine, according to the configuration information, whether the target compressed data needs to be decompressed; if not, it passes the target compressed data and the corresponding target data index through unchanged.
  • The configuration information determines whether the target compressed data needs to be decompressed. For example, a configuration value of 1 may indicate that the computing unit can perform the specified calculations directly on the target compressed data and the corresponding target data index, so the target compressed data does not need to be decompressed; a value of 0 indicates that it cannot, so the target compressed data needs to be decompressed.
  • the calculation unit is configured to receive the target compressed data and the corresponding target data index transparently transmitted by the configurable decompression unit, and perform specified calculations on them.
  • The configurable decompression unit is also configured to, when the target compressed data needs to be decompressed, decompress the target compressed data according to the target data index and send the decompressed original data to the computing unit; the computing unit is also configured to perform the specified calculations on the original data sent by the configurable decompression unit.
  • an embodiment of the present application also provides a data processing device 100, as shown in Figure 4.
  • the data processing device 100 includes: an acquisition module 110, a selection module 120, and a compression module 130.
  • the acquisition module 110 is configured to acquire basic information of the neural network deployed in the AI chip and basic information of the AI chip.
  • the selection module 120 is configured to select the optimal compression algorithm from a plurality of preset compression algorithms based on the basic information of the neural network and the basic information of the AI chip.
  • the compression module 130 is configured to use the optimal compression algorithm to compress the relevant data of the neural network to obtain the compressed data and the corresponding data index.
  • The data index is used to restore the compressed data to the original uncompressed data, or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • In one embodiment, the basic information of the neural network includes the network sparsity and the original data volume of the network, and the basic information of the AI chip includes the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip. The selection module 120 is configured to, for each preset compression algorithm, input the compression algorithm, the network sparsity, the original data volume of the network, the transmission bandwidth of the chip, the computing resource usage of the chip, and the on-chip memory consumption of the chip into the preset performance evaluation model to obtain the corresponding evaluation score, where the preset performance evaluation model simulates the performance overhead required by the AI chip to process the simulated compressed data and the corresponding data index produced by simulated compression with that algorithm, and to take the compression algorithm with the maximum evaluation score as the optimal compression algorithm.
  • The selection module 120 is specifically configured to use the preset performance evaluation model to estimate, from the network sparsity and the original data volume of the network, the simulated compressed data volume and the corresponding data index volume after simulated compression by the algorithm; to partition that data into blocks according to the on-chip memory consumption of the chip; for each block, to simulate the AI chip loading the block according to the transmission bandwidth of the chip and performing the specified processing on the loaded data according to the computing resource usage of the chip, recording the performance overhead required; and to derive the corresponding evaluation score from the performance overhead of each simulated data block.
  • In one embodiment, the compression module 130 is configured to divide the relevant data of the neural network into blocks in the format required by the hardware, to align each data block according to the alignment requirements of the hardware, and to use the optimal compression algorithm to compress each aligned data block in accordance with those alignment requirements.
  • In one embodiment, the data processing device 100 also includes a decompression module and a sending module. The acquisition module 110 is also configured to obtain, after the compression module 130 has produced the compressed data and the corresponding data index and when specified calculations need to be performed on the relevant data of the neural network, the target compressed data corresponding to that data and the corresponding target data index.
  • the decompression module is configured to determine whether the calculation unit can directly perform specified calculations on the target compressed data and the corresponding target data index.
  • the sending module is configured to transparently transmit the target compressed data and the corresponding target data index to the computing unit for the designated calculation when it is determined that the computing unit can directly perform the designated calculation on the target compressed data and the corresponding target data index.
  • The decompression module is further configured to decompress the target compressed data according to the target data index when the determination is negative.
  • the sending module is also used to send the decompressed raw data to the computing unit for specified calculations.
  • FIG. 5 shows a structural block diagram of an electronic device 200 provided by an embodiment of the present application.
  • the electronic device 200 includes: a transceiver 210, a memory 220, a communication bus 230 and a processor 240.
  • the components of the transceiver 210 , the memory 220 , and the processor 240 are directly or indirectly electrically connected to each other to realize data transmission or interaction. For example, these components may be electrically connected to each other through one or more communication buses 230 or signal lines.
  • the transceiver 210 is used to send and receive data.
  • the memory 220 is used to store computer programs, such as the software function modules shown in FIG. 4 , that is, the data processing device 100 .
  • the data processing apparatus 100 includes at least one software function module that can be stored in the memory 220 in the form of software or firmware or solidified in the operating system (Operating System, OS) of the electronic device 200 .
  • the processor 240 is used to execute executable modules stored in the memory 220 , such as software function modules or computer programs included in the data processing apparatus 100 .
  • the processor 240 is used to obtain the basic information of the neural network deployed in the AI chip and the basic information of the AI chip; based on the basic information of the neural network and the basic information of the AI chip, from the preset Select the optimal compression algorithm from multiple compression algorithms; use the optimal compression algorithm to compress the relevant data of the neural network to obtain the compressed data and the corresponding data index.
  • The data index is used to restore the compressed data to the original uncompressed data, or to determine the positions of the non-zero elements of the compressed data within the original uncompressed data.
  • The memory 220 can be, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read Only Memory, ROM), a programmable read-only memory (Programmable Read-Only Memory, PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and so on.
  • the processor 240 may be an integrated circuit chip with signal processing capabilities.
  • The above-mentioned processor can be a general-purpose processor, including a central processing unit (CPU), a network processor (Network Processor, NP), and the like; it can also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor 240 may be any conventional processor or the like.
  • the above-mentioned electronic device 200 includes, but is not limited to, a computer, a server, etc.
  • the embodiment of the present application also provides a non-volatile computer-readable storage medium (hereinafter referred to as the storage medium).
  • the storage medium stores a computer program.
  • When the computer program is run by a computer, such as the above-mentioned electronic device 200, it performs the data processing method described above.
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in a block may occur out of the order noted in the figures.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks therein, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by a combination of special-purpose hardware and computer instructions.
  • each functional module in each embodiment of the present application can be integrated together to form an independent part, each module can exist alone, or two or more modules can be integrated to form an independent part.
  • If the functions are implemented in the form of software function modules and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part of it that contributes over the related technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a computer-readable storage medium and includes several instructions to cause a computer device (which can be a personal computer, a laptop, a server, an electronic device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned computer-readable storage media include: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other medium that can store program code.
  • This application provides a data processing method, apparatus, AI chip, electronic device and storage medium. Through a design spanning both the software and hardware levels, it takes into account the specific conditions of the neural network and the AI chip and flexibly selects a compression scheme suited to the current scenario. This not only saves the overall bandwidth consumed by the neural network in hardware, but also reduces the network's use of computing resources in hardware, so that a sparse neural network can be accelerated in a relatively optimal way and the energy consumption ratio and performance are improved. A minimal illustrative sketch of this value-plus-index style of compression is given after this list.
  • the data processing method, device, AI chip, electronic device and storage medium of the present application are reproducible and can be used in a variety of industrial applications.
  • the data processing method, device, AI chip, electronic equipment and storage medium of this application can be used in the field of neural network technology.
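
The following is a minimal sketch, assuming a simple zero-suppression scheme, of how relevant data of a neural network could be compressed into its non-zero values plus a data index, and how that index can restore the original uncompressed data or locate each non-zero element. The function names and the boolean-mask index format are illustrative assumptions, not the specific algorithms claimed by this application.

    # Illustrative only: compress a sparse tensor into non-zero values plus a
    # data index, then restore the original uncompressed data from the two.
    import numpy as np

    def compress_sparse(data: np.ndarray):
        """Return (compressed_data, data_index).

        compressed_data: the non-zero elements, in order of appearance.
        data_index: a boolean mask marking where each non-zero element sits
        in the original (uncompressed) data, so the original can be restored
        or the position of each non-zero element can be determined.
        """
        data_index = data != 0              # positions of non-zero elements
        compressed_data = data[data_index]  # keep only the non-zero values
        return compressed_data, data_index

    def decompress_sparse(compressed_data: np.ndarray, data_index: np.ndarray):
        """Restore the original uncompressed data from values + index."""
        restored = np.zeros(data_index.shape, dtype=compressed_data.dtype)
        restored[data_index] = compressed_data
        return restored

    if __name__ == "__main__":
        original = np.array([0.0, 1.5, 0.0, 0.0, -2.0, 0.0, 3.25, 0.0],
                            dtype=np.float32)
        values, index = compress_sparse(original)
        assert np.array_equal(decompress_sparse(values, index), original)
        print(values)  # [ 1.5  -2.    3.25]
        print(index)   # [False  True False False  True False  True False]

In this sketch the bandwidth saving comes from transferring only the non-zero values, while the index is the small side structure that makes the compressed data usable again on the chip.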

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Databases & Information Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This application belongs to the technical field of neural networks, and relates to a data processing method and apparatus, an AI chip, an electronic device, and a storage medium. The data processing method comprises: acquiring basic information of a neural network deployed in an AI chip and basic information of the AI chip; selecting an optimal compression algorithm from a plurality of preset compression algorithms according to the basic information of the neural network and the basic information of the AI chip; and compressing relevant data of the neural network by means of the optimal compression algorithm, so as to obtain compressed data and a corresponding data index, the data index being used to restore the compressed data to the original uncompressed data, or to determine the position, in the original uncompressed data, of a non-zero element in the compressed data. By combining the specific conditions of a neural network and an AI chip, a compression scheme suited to the current scenario is flexibly selected, and acceleration of a sparse neural network is accomplished in a relatively optimal manner, so that the energy consumption ratio and the performance are improved.
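
As a complementary sketch under stated assumptions, the snippet below shows one hypothetical way a compression algorithm could be selected from a plurality of preset algorithms according to basic information of the neural network (e.g. sparsity) and of the AI chip (e.g. memory bandwidth, sparse compute support). The thresholds, field names and preset algorithm names are assumptions for illustration only, not the selection rule claimed by this application.

    # Illustrative only: pick one of several preset compression algorithms
    # from basic information about the neural network and the AI chip.
    from dataclasses import dataclass

    @dataclass
    class NetworkInfo:
        sparsity: float        # fraction of zero-valued elements, 0.0 - 1.0
        weight_bytes: int      # total size of data to move through memory

    @dataclass
    class ChipInfo:
        memory_bandwidth_gbps: float
        has_sparse_compute_unit: bool

    PRESET_ALGORITHMS = ("dense_passthrough", "bitmask_sparse", "run_length")

    def select_compression(net: NetworkInfo, chip: ChipInfo) -> str:
        """Choose a preset compression algorithm for the current scenario."""
        if net.sparsity < 0.2:
            # Nearly dense data: compression overhead outweighs the savings.
            return "dense_passthrough"
        if chip.has_sparse_compute_unit:
            # Sparse hardware can consume value+index pairs directly,
            # saving both bandwidth and compute.
            return "bitmask_sparse"
        # Otherwise trade compute for bandwidth only when bandwidth is scarce.
        return "run_length" if chip.memory_bandwidth_gbps < 500 else "dense_passthrough"

    print(select_compression(NetworkInfo(sparsity=0.7, weight_bytes=10**8),
                             ChipInfo(memory_bandwidth_gbps=400,
                                      has_sparse_compute_unit=False)))
    # -> run_length

In practice such a selection would weigh the bandwidth each scheme saves against the decompression cost the chip can absorb; the point of the sketch is only that the choice is driven jointly by network properties and chip properties.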
PCT/CN2022/114886 2022-06-10 2022-08-25 Procédé et appareil de traitement de données, et puce ia, dispositif électronique et support de stockage WO2023236365A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210649451.2A CN114723033B (zh) 2022-06-10 2022-06-10 数据处理方法、装置、ai芯片、电子设备及存储介质
CN202210649451.2 2022-06-10

Publications (1)

Publication Number Publication Date
WO2023236365A1 true WO2023236365A1 (fr) 2023-12-14

Family

ID=82232650

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114886 WO2023236365A1 (fr) 2022-06-10 2022-08-25 Procédé et appareil de traitement de données, et puce ia, dispositif électronique et support de stockage

Country Status (2)

Country Link
CN (1) CN114723033B (fr)
WO (1) WO2023236365A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118626148A (zh) * 2024-08-09 2024-09-10 中昊芯英(杭州)科技有限公司 基于神经网络模型的数据存储方法、装置、设备及介质

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114723033B (zh) * 2022-06-10 2022-08-19 成都登临科技有限公司 数据处理方法、装置、ai芯片、电子设备及存储介质
CN115186821B (zh) * 2022-09-13 2023-01-06 之江实验室 面向芯粒的神经网络推理开销估计方法及装置、电子设备
CN115643310B (zh) * 2022-09-26 2024-08-13 建信金融科技有限责任公司 一种压缩数据的方法、装置和系统
CN118093532A (zh) * 2022-11-21 2024-05-28 华为云计算技术有限公司 数据处理方法及装置
CN116185307B (zh) * 2023-04-24 2023-07-04 之江实验室 一种模型数据的存储方法、装置、存储介质及电子设备
CN117472910B (zh) * 2023-11-23 2024-06-25 中国人民大学 一种同态压缩数据处理方法和系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495669A (zh) * 2020-03-19 2021-10-12 华为技术有限公司 一种解压装置、加速器、和用于解压装置的方法
CN114077893A (zh) * 2020-08-20 2022-02-22 华为技术有限公司 一种压缩和解压缩神经网络模型的方法及设备
CN114402596A (zh) * 2020-04-16 2022-04-26 腾讯美国有限责任公司 神经网络模型压缩
CN114723033A (zh) * 2022-06-10 2022-07-08 成都登临科技有限公司 数据处理方法、装置、ai芯片、电子设备及存储介质

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944555B (zh) * 2017-12-07 2021-09-17 广州方硅信息技术有限公司 神经网络压缩和加速的方法、存储设备和终端
CN108985451B (zh) * 2018-06-29 2020-08-04 百度在线网络技术(北京)有限公司 基于ai芯片的数据处理方法及设备
US10530387B1 (en) * 2018-10-22 2020-01-07 Uber Technologies, Inc. Estimating an optimal ordering for data compression
US11489541B2 (en) * 2019-05-21 2022-11-01 Nvidia Corporation Compression techniques for data structures suitable for artificial neural networks
CN112633484A (zh) * 2019-09-24 2021-04-09 中兴通讯股份有限公司 神经网络加速器、卷积运算实现方法、装置及存储介质
US11675768B2 (en) * 2020-05-18 2023-06-13 Microsoft Technology Licensing, Llc Compression/decompression using index correlating uncompressed/compressed content
CN111709563B (zh) * 2020-06-05 2022-03-11 山东大学 压缩感知结合bp神经网络在粮食温度趋势预测中的工作方法
CN111553471A (zh) * 2020-07-13 2020-08-18 北京欣奕华数字科技有限公司 一种数据分析处理方法及装置
CN111832692A (zh) * 2020-07-14 2020-10-27 Oppo广东移动通信有限公司 数据处理方法、装置、终端及存储介质
US20200401891A1 (en) * 2020-09-04 2020-12-24 Intel Corporation Methods and apparatus for hardware-aware machine learning model training
CN112116084A (zh) * 2020-09-15 2020-12-22 中国科学技术大学 可重构平台上固化全网络层的卷积神经网络硬件加速器
CN112101548A (zh) * 2020-09-22 2020-12-18 珠海格力电器股份有限公司 数据压缩方法及装置、数据解压方法及装置、电子设备
CN112418424A (zh) * 2020-12-11 2021-02-26 南京大学 一种具有极高压缩比的剪枝深度神经网络的分层稀疏编码方法
CN112308215B (zh) * 2020-12-31 2021-03-30 之江实验室 基于神经网络中数据稀疏特性的智能训练加速方法及系统
CN113159297B (zh) * 2021-04-29 2024-01-09 上海阵量智能科技有限公司 一种神经网络压缩方法、装置、计算机设备及存储介质
CN113747170A (zh) * 2021-09-08 2021-12-03 深圳市算筹信息技术有限公司 一种使用ai芯片进行视频编解码运算的方法
CN114118394A (zh) * 2021-11-16 2022-03-01 杭州研极微电子有限公司 神经网络模型的加速方法和装置
CN114466082B (zh) * 2022-01-29 2024-01-09 上海阵量智能科技有限公司 数据压缩、数据解压方法、系统及人工智能ai芯片

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113495669A (zh) * 2020-03-19 2021-10-12 华为技术有限公司 一种解压装置、加速器、和用于解压装置的方法
CN114402596A (zh) * 2020-04-16 2022-04-26 腾讯美国有限责任公司 神经网络模型压缩
CN114077893A (zh) * 2020-08-20 2022-02-22 华为技术有限公司 一种压缩和解压缩神经网络模型的方法及设备
CN114723033A (zh) * 2022-06-10 2022-07-08 成都登临科技有限公司 数据处理方法、装置、ai芯片、电子设备及存储介质

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118626148A (zh) * 2024-08-09 2024-09-10 中昊芯英(杭州)科技有限公司 基于神经网络模型的数据存储方法、装置、设备及介质

Also Published As

Publication number Publication date
CN114723033A (zh) 2022-07-08
CN114723033B (zh) 2022-08-19

Similar Documents

Publication Publication Date Title
WO2023236365A1 (fr) Procédé et appareil de traitement de données, et puce ia, dispositif électronique et support de stockage
US11551068B2 (en) Processing system and method for binary weight convolutional neural network
US11599770B2 (en) Methods and devices for programming a state machine engine
CN109087384B (zh) 光线跟踪系统和方法以及光线压缩方法和模块
US10929154B2 (en) Overflow detection and correction in state machine engines
US20190044535A1 (en) Systems and methods for compressing parameters of learned parameter systems
EP2875434A1 (fr) Procédés et systèmes pour utiliser des données de vecteur d'états dans un moteur de machine d'états
CN113570033B (zh) 神经网络处理单元、神经网络的处理方法及其装置
WO2020062252A1 (fr) Accélérateur opérationnel et procédé de compression
WO2020026475A1 (fr) Processeur de réseau neuronal, procédé de traitement de réseau neuronal et programme
US20200242467A1 (en) Calculation method and calculation device for sparse neural network, electronic device, computer readable storage medium, and computer program product
CN116762080A (zh) 神经网络生成装置、神经网络运算装置、边缘设备、神经网络控制方法以及软件生成程序
US12039421B2 (en) Deep learning numeric data and sparse matrix compression
CN113554149A (zh) 神经网络处理单元npu、神经网络的处理方法及其装置
KR20200139909A (ko) 전자 장치 및 그의 연산 수행 방법
KR102502162B1 (ko) 특징 맵을 컴프레싱하는 장치 및 방법
US20220318604A1 (en) Sparse machine learning acceleration
US12001237B2 (en) Pattern-based cache block compression
Chen et al. A technique for approximate communication in network-on-chips for image classification
CN113298224A (zh) 神经网络模型的重训练方法和相关产品
Chen et al. Approximate Network-on-Chips with Application to Image Classification
CN113570034B (zh) 处理装置、神经网络的处理方法及其装置
CN116011551B (zh) 优化数据加载的图采样训练方法、系统、设备及存储介质
US11979174B1 (en) Systems and methods for providing simulation data compression, high speed interchange, and storage
US20220414457A1 (en) Selective data structure encoding for deep neural network training

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22945499

Country of ref document: EP

Kind code of ref document: A1