CN111832692A - Data processing method, device, terminal and storage medium - Google Patents
- Publication number
- CN111832692A CN111832692A CN202010676005.1A CN202010676005A CN111832692A CN 111832692 A CN111832692 A CN 111832692A CN 202010676005 A CN202010676005 A CN 202010676005A CN 111832692 A CN111832692 A CN 111832692A
- Authority
- CN
- China
- Prior art keywords
- intermediate data
- target intermediate
- neural network
- data
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0608—Saving storage space on storage systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application provide a data processing method, a data processing apparatus, a terminal, and a storage medium, relating to the technical field of chips. The method comprises the following steps: compressing target intermediate data, where the target intermediate data refers to data that needs to be used in the nth-layer processing of a neural network, and n is a positive integer; in response to the nth layer of the neural network needing to use the target intermediate data for processing, acquiring the compressed target intermediate data; decompressing the compressed target intermediate data to obtain the target intermediate data; and processing the nth layer of the neural network according to the target intermediate data. The embodiments of the present application avoid the increase in terminal power consumption that would be caused by swapping the target intermediate data out to DRAM.
Description
Technical Field
The embodiment of the application relates to the technical field of chips, in particular to a data processing method, a data processing device, a terminal and a storage medium.
Background
With the development of terminal technology, terminals have been equipped with AI (Artificial Intelligence) chips, which can be used to implement the processing procedures of neural networks.
In the related art, during a process of implementing the neural network, target intermediate data (for example, a weight and/or a feature map) in the process of implementing the neural network needs to be exchanged from an SRAM (Static Random Access Memory) included in the AI chip to an external DRAM (Dynamic Random Access Memory).
However, the above-described related art may cause an increase in power consumption of the terminal.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, a terminal and a storage medium. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a data processing method, where the method includes:
compressing target intermediate data, wherein the target intermediate data refers to data required to be used during nth layer processing of a neural network, and n is a positive integer;
in response to the nth layer of the neural network needing to process by using the target intermediate data, acquiring the compressed target intermediate data;
decompressing the compressed target intermediate data to obtain the target intermediate data;
and processing the nth layer of the neural network according to the target intermediate data.
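The four claimed steps can be sketched as follows. This is a minimal illustration only: Python's `zlib` stands in for the chip's compression unit, and the patent does not specify a particular compression algorithm.

```python
import zlib

def compress_intermediate(data: bytes) -> bytes:
    """Step 1: compress the target intermediate data needed by layer n."""
    return zlib.compress(data)

def run_layer_n(compressed: bytes) -> bytes:
    """Steps 2-4: when layer n needs the data, fetch the compressed
    copy, decompress it, and let layer n process the result."""
    target = zlib.decompress(compressed)   # step 3: decompress
    return target                          # step 4: layer n consumes it

# Hypothetical intermediate data (a dummy feature map as raw bytes).
feature_map = bytes(range(16)) * 4
blob = compress_intermediate(feature_map)
assert run_layer_n(blob) == feature_map    # round-trip is lossless
```

The data stays compressed between steps 1 and 2, which is what reduces the occupied storage space.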
In another aspect, an embodiment of the present application provides a data processing apparatus, where the apparatus includes:
the data compression module is used for compressing target intermediate data, wherein the target intermediate data refers to data which needs to be used during nth layer processing of the neural network, and n is a positive integer;
the data acquisition module is used for acquiring the compressed target intermediate data in response to the nth layer of the neural network needing to process by using the target intermediate data;
the data decompression module is used for decompressing the compressed target intermediate data to obtain the target intermediate data;
and the data processing module is used for processing the nth layer of the neural network according to the target intermediate data.
In another aspect, an embodiment of the present application provides a terminal, where the terminal includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the data processing method according to the above aspect.
In still another aspect, an embodiment of the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program is loaded and executed by a processor to implement the data processing method according to the above aspect.
In another aspect, the present application provides a computer program product which, when executed, implements the data processing method described above.
The technical scheme provided by the embodiment of the application can bring the following beneficial effects:
the intermediate data corresponding to the nth layer, which is temporarily not needed, is compressed, and is decompressed when it needs to be used, so that the memory space of the terminal is utilized reasonably and the demand on memory space is reduced.
Drawings
Fig. 1 is a schematic structural diagram of an AI chip according to an embodiment of the present application;
FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present application;
FIG. 3 is a flow chart of a data processing method provided by another embodiment of the present application;
FIG. 4 is a flow chart of a data processing method provided by another embodiment of the present application;
fig. 5 is a schematic structural diagram of a Unet network provided in an embodiment of the present application;
FIG. 6 is a flow chart of a data processing method provided by another embodiment of the present application;
FIG. 7 is a flow chart of a data processing method provided by yet another embodiment of the present application;
FIG. 8 is a block diagram of a data processing apparatus provided in one embodiment of the present application;
fig. 9 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The execution subject of the embodiment of the present application may be a terminal, for example, the terminal may be a mobile phone, a PC (personal computer), a tablet computer, or other electronic devices.
Illustratively, the terminal is provided with an AI chip, which may also be referred to as an AI accelerator or a computing card, and is a chip dedicated to handling the large number of computing tasks in artificial intelligence applications. The AI chip can process data such as images, voice, and video. Optionally, the AI chip includes an NPU (Neural-network Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), an ASIC (Application Specific Integrated Circuit), a brain-like chip, a reconfigurable general-purpose AI chip, and the like. The technical solution provided by the embodiments of the present application can be executed by an AI chip.
Referring to fig. 1, which shows a schematic structural diagram of an AI chip according to an embodiment of the present application, the AI chip 100 includes a compression unit 110, a storage unit 120, and a decompression unit 130. The AI chip 100 runs a neural network processing engine (which may also be called a neural network inference engine), and the neural network includes N layers, denoted Layer1, Layer2, Layer3, …, LayerN. In the embodiment of the present application, the compression unit 110 is configured to compress target intermediate data (intermediate data temporarily unused by subsequent layers), where the target intermediate data refers to data that needs to be used in the processing of the nth layer of the neural network. The storage unit 120 is used to store the target intermediate data. The decompression unit 130 is used to decompress the compressed target intermediate data.
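As a rough software model of the three units (not the chip's actual implementation — the class names, the dictionary-backed storage, and the use of `zlib` as the codec are all illustrative assumptions):

```python
import zlib

class CompressionUnit:
    """Models compression unit 110."""
    def compress(self, data: bytes) -> bytes:
        return zlib.compress(data)

class StorageUnit:
    """Models storage unit 120: on-chip storage holding compressed blobs."""
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}
    def put(self, key: str, blob: bytes) -> None:
        self._blobs[key] = blob
    def get(self, key: str) -> bytes:
        return self._blobs[key]

class DecompressionUnit:
    """Models decompression unit 130."""
    def decompress(self, blob: bytes) -> bytes:
        return zlib.decompress(blob)

# Park layer m's feature map while later layers run, then recover it.
comp, store, decomp = CompressionUnit(), StorageUnit(), DecompressionUnit()
store.put("layer-m-feature", comp.compress(b"\x00" * 256))
restored = decomp.decompress(store.get("layer-m-feature"))
assert restored == b"\x00" * 256
```

In hardware these would be dedicated blocks wired to the SRAM; the model only captures the data flow between them.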
Referring to fig. 2, a flowchart of a data processing method according to an embodiment of the present application is shown. The embodiment of the application can be applied to the AI chip shown in fig. 1, and the method can include the following steps.
Step 201: compress target intermediate data.

The target intermediate data refers to data that needs to be used in the nth-layer processing of the neural network, and n is a positive integer. A neural network is a computational model formed by a large number of interconnected nodes (or neurons). The neural network includes an input layer, a hidden layer, and an output layer. In a possible implementation, the hidden layer includes a convolutional layer, an activation layer, a pooling layer, and a connection layer. In the embodiment of the present application, the nth layer of the neural network refers to any layer in the neural network; for example, the nth layer is a convolutional layer, a pooling layer, a connection layer, or the like, which is not limited in the embodiment of the present application.
In one example, the target intermediate data includes a feature map corresponding to the nth layer of the neural network, and the feature map corresponding to the nth layer of the neural network refers to a feature map which needs to be used by the nth layer of the neural network for processing. For example, the pooling layer of the neural network needs to pool the feature map of the convolutional layer, and at this time, the feature map of the convolutional layer may be referred to as a feature map corresponding to the pooling layer.
In another example, the target intermediate data includes weights of an nth layer of the neural network, which refers to network weights of the nth layer of the neural network.
In yet another example, the target intermediate data includes both the corresponding feature maps of the nth layer of the neural network and the weights of the nth layer of the neural network.
It should be noted that the compression timings of the feature map corresponding to the nth layer and the weights of the nth layer are different; the description of the compression timings may refer to the following embodiments and is not repeated here.
The processing of the neural network refers to a process of processing input data to finally obtain an output result. The input data may be images, voice, video, etc., and the embodiment of the present application is not limited thereto.
Step 202: in response to the nth layer of the neural network needing to use the target intermediate data for processing, acquire the compressed target intermediate data.

When the nth layer of the neural network needs to use the target intermediate data for processing, the terminal acquires the compressed target intermediate data. In one example, when the nth layer needs to use the feature map corresponding to the nth layer, the terminal acquires the compressed feature map. In another example, when the nth layer needs to use the weights of the nth layer, the terminal acquires the compressed weights. In yet another example, when the nth layer needs to use both the feature map corresponding to the nth layer and the weights of the nth layer, the terminal may acquire the feature map and the weights simultaneously, acquire the weights first and then the feature map, or acquire the feature map first and then the weights, which is not limited in this embodiment.
Step 203: decompress the compressed target intermediate data to obtain the target intermediate data.
After the terminal acquires the compressed target intermediate data, the compressed target intermediate data needs to be decompressed to obtain the target intermediate data.
Step 204: process the nth layer of the neural network according to the target intermediate data.
The nth layer of the neural network performs its processing according to the target intermediate data.
To sum up, in the technical solution provided by the embodiment of the present application, the intermediate data corresponding to the nth layer, which is temporarily not needed, is compressed, and is decompressed when it needs to be used, so that the memory space of the terminal is utilized reasonably and the demand on memory space is reduced.
In addition, because the embodiment of the application does not need to exchange the intermediate data into the DRAM, the occupation of the DRAM memory bandwidth is avoided, so that the DRAM memory bandwidth can be used by other application programs, and the purpose of system optimization is achieved.
In addition, the embodiment of the application realizes decompression as required, and avoids space waste.
Referring to fig. 3, a flowchart of a data processing method according to another embodiment of the present application is shown. The embodiment of the application can be applied to the AI chip shown in fig. 1, and the method can include the following steps.
Step 301: in response to a compression request, compress the target intermediate data.

In the embodiment of the present application, the compression request is used to request compression of the target intermediate data.
In a possible implementation manner, the AI chip includes a control unit, the control unit calls a compression interface to send a compression request to the compression unit, so that the compression unit compresses the target intermediate data, and in response to the compression request, the compression unit compresses the target intermediate data.
Step 302: store the compressed target intermediate data in the storage unit.

In a possible implementation, the storage unit includes an SRAM. SRAM is a type of random access memory; as long as the SRAM remains powered on, the data stored in it is retained. In another possible implementation, the storage unit includes at least one of the following: NOR Flash (a non-volatile flash memory), PCM (Phase Change Memory), and MRAM (Magnetic Random Access Memory). NOR Flash can be used in stand-alone and embedded applications. PCM stores data by exploiting the large difference in conductivity of chalcogenide materials between their crystalline and amorphous states. MRAM is a non-volatile magnetic random access memory that possesses the high-speed read and write capability of SRAM as well as the high integration of DRAM. The embodiment of the present application does not limit the type of the storage unit.
Step 303: in response to the nth layer of the neural network needing to process by using the target intermediate data, acquire the compressed target intermediate data from the storage unit.
In a possible implementation manner, the control unit calls the data obtaining interface to send a data obtaining request to the storage unit, where the data obtaining request is used to request that the target intermediate data is sent to the decompression unit.
Step 304: in response to a decompression request, decompress the compressed target intermediate data to obtain the target intermediate data.

In the embodiment of the present application, the decompression request is used to request decompression of the compressed target intermediate data.
In a possible implementation, the control unit calls the decompression interface to send a decompression request to the decompression unit. In response to the decompression request, the decompression unit decompresses the acquired compressed target intermediate data.
Step 305: process the nth layer of the neural network according to the target intermediate data.
In a possible implementation, the processing and the storage in the embodiment of the present application are performed in the storage unit. When the decompression unit finishes decompressing the compressed target intermediate data, it sends the target intermediate data to the storage unit, so that the neural network processing engine can process according to the target intermediate data.
In the technical scheme provided by the embodiment of the application, the target intermediate data are compressed when not needed, and are decompressed and read when needed, so that the memory occupation of the storage unit is reduced, the memory requirement on the storage unit is reduced, the occupation space of the SRAM is saved, and the memory requirement on the AI chip is reduced.
In addition, the target intermediate data is compressed and accessed through the interface, the design is flexible and convenient, and the convenience and flexibility in system application are improved.
In an exemplary embodiment, the terminal stores the compressed target intermediate data to the storage unit by:
First, determine identification information for the compressed target intermediate data.
In the embodiment of the present application, the identification information is used to uniquely identify the compressed target intermediate data. The terminal can locate the target intermediate data according to its identification information, which facilitates subsequent acquisition. For example, it may be determined from the identification information which layer of the neural network the compressed target intermediate data belongs to, and which operation of that layer generated it.
In one example, the terminal inputs the compressed target intermediate data into a hash table to obtain a hash value of the compressed target intermediate data, where the hash value corresponds to a mapping address in the hash table; determines index information for the hash value according to the layer in which the target intermediate data is located and the operation corresponding to the target intermediate data; and determines the hash value and its index information as the identification information.
A hash table is a data structure that is accessed directly according to a key value. A hash value is a compact, fixed-size value computed from a piece of data, serving as a practically unique fingerprint of that data.
After the terminal inputs the compressed target intermediate data into the hash table, it obtains the hash value of the compressed target intermediate data and its mapping address in the hash table. However, the hash value alone does not indicate which layer's feature map or which layer's weights the compressed target intermediate data belongs to. The terminal therefore determines index information for the hash value according to the layer in which the target intermediate data is located and the operation corresponding to it, and then takes the hash value and its index information together as the identification information. For example, if the compressed target intermediate data is the feature map of a convolutional layer included in the 5th layer of the neural network, the terminal may set the index information of the hash value to 5-convolution; if the 5th layer includes multiple convolutional layers, the terminal may refine the index information with the position, among those convolutional layers, of the convolutional layer that generated the feature map, so as to acquire the compressed target intermediate data more accurately. For another example, if the compressed target intermediate data is the weights of a pooling layer included in the 6th layer of the neural network, the terminal may set the index information of the hash value to 6-pooling.
In another example, the terminal determines index information for the compressed target intermediate data directly according to the layer in which the target intermediate data is located and the operation corresponding to it; this approach is simple and convenient.
Second, store the compressed target intermediate data together with its identification information in the storage unit.
When the compressed target intermediate data is subsequently acquired, the terminal can quickly acquire the compressed target intermediate data from the storage unit according to the identification information of the compressed target intermediate data.
Assume the terminal stores the compressed target intermediate data in a hash table, and take the feature map of the convolutional layer included in the 5th layer as an example. When the terminal needs to acquire the compressed feature map, it first determines that the index information of the hash value is 5-convolution, then determines the corresponding hash value from 5-convolution, and finally determines the mapping address of the compressed feature map in the hash table, thereby acquiring the feature map of the convolutional layer included in the 5th layer.
Assume instead that the terminal accesses the compressed target intermediate data by means of an index, again taking the feature map of the convolutional layer included in the 5th layer as an example. When the terminal needs to acquire the compressed feature map, it first determines that its index information is 5-convolution, and can then acquire the compressed feature map directly according to 5-convolution.
In the embodiment of the present application, one piece of identification information is determined for each piece of compressed target intermediate data, which makes subsequent reading convenient and keeps the reading granularity controllable.
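The identification scheme described above can be sketched as follows. The choice of `hashlib.sha256` and the exact `"5-convolution"` index format are illustrative assumptions; the text only requires a hash value plus layer-and-operation index information.

```python
import hashlib
import zlib

def identify(compressed: bytes, layer: int, op: str) -> tuple[str, str]:
    """Return (hash value, index info) for a compressed blob.
    The index info encodes the layer number and the producing operation."""
    hash_value = hashlib.sha256(compressed).hexdigest()
    index_info = f"{layer}-{op}"                 # e.g. "5-convolution"
    return hash_value, index_info

table: dict[str, bytes] = {}                     # hash value -> compressed data
index: dict[str, str] = {}                       # index info -> hash value

blob = zlib.compress(b"feature map of layer 5 convolution")
h, idx = identify(blob, 5, "convolution")
table[h] = blob
index[idx] = h

# Later, layer n looks the data up by "5-convolution" alone.
recovered = zlib.decompress(table[index["5-convolution"]])
assert recovered == b"feature map of layer 5 convolution"
```

The two-level lookup mirrors the embodiment: the index info resolves to the hash value, and the hash value resolves to the mapping address of the compressed data.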
In an exemplary embodiment, the target intermediate data is the feature map of the mth layer of the neural network, and m is a positive integer. Fig. 4 is a flowchart illustrating a data processing method according to another embodiment of the present application.
Step 401: in response to the (m+2)th layer of the neural network not being processed by using the feature map of the mth layer, compress the feature map of the mth layer.

When the (m+2)th layer of the neural network does not use the feature map of the mth layer for processing, the feature map of the mth layer is temporarily in an unused state and can be compressed at this point, reducing the space it occupies in the storage unit.
Step 402: in response to the nth layer of the neural network needing to process by using the feature map, acquire the compressed feature map.
When the nth layer of the neural network needs to be processed by using the feature map of the mth layer, the terminal acquires the compressed feature map.
For example, fig. 5 shows a schematic structural diagram of a Unet network. The Unet network includes convolution operations, upsampling operations, max-pooling operations, stacking operations, and the like. A convolution operation processes an original image according to a template to obtain a new image; an upsampling operation enlarges an original image so that it can be displayed on a higher-resolution display device; a max-pooling operation takes the maximum value in an image region as the pooled value of that region; and a stacking operation connects two feature maps together.
In this embodiment, the mth layer is an input layer at the position of the solid line frame in fig. 5, the nth layer is an output layer at the position of the dotted line frame in fig. 5, and after the (m + 1) th layer completes processing based on the characteristic diagram of the mth layer, the terminal compresses the characteristic diagram of the mth layer through the compression unit 110, and stores the compressed characteristic diagram in the storage unit 120.
When the neural network processes to the nth layer, the nth layer needs to be stacked with the feature map of the mth layer, so that the nth layer needs to be processed by using the feature map of the mth layer, and at this time, the terminal acquires the compressed feature map.
In order to reduce the occupation of storage unit space, if the feature map of a certain layer of the neural network is no longer used after the next layer's processing is completed, that feature map may be deleted.
Step 403: decompress the compressed feature map to obtain the feature map.
Still taking the above example as an example, the terminal decompresses the above compressed feature map through the decompression unit 130 to obtain the feature map of the mth layer.
Step 404: process the nth layer of the neural network according to the feature map.
After the nth layer acquires the feature map of the mth layer, it performs subsequent processing.
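The flow above — compress layer m's feature map once the next layer is done with it, then fetch and decompress it when layer n needs it for the stacking operation — can be sketched as follows. The layer indices and the `zlib` codec are illustrative.

```python
import zlib

def unet_skip_flow(feature_m: bytes, layers_between: int) -> bytes:
    """Keep layer m's feature map compressed across a Unet skip connection.

    feature_m: the feature map produced by layer m.
    layers_between: how many layers run between compression and reuse.
    """
    cache = zlib.compress(feature_m)          # compress after layer m+1 is done
    for _ in range(layers_between):           # layers m+2 .. n-1 run here;
        pass                                  # none of them needs feature_m
    return zlib.decompress(cache)             # fetch + decompress for layer n

fm = b"\x01\x02" * 128
out = unet_skip_flow(fm, layers_between=4)
assert out == fm                              # layer n can stack it unchanged
```

Only the compressed copy occupies the storage unit while the intermediate layers execute, which is the space saving the embodiment targets.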
For the description of steps 403 to 404, reference may be made to the above embodiments, which are not described herein again.
In an exemplary embodiment, the target intermediate data are weights of the nth layer of the neural network. Fig. 6 is a flowchart illustrating a data processing method according to another embodiment of the present application.
Step 601: compress the weights of each layer included in the neural network separately.

In a possible implementation, the weights of the layers included in the neural network are compressed separately before the neural network is loaded. For example, if the neural network includes 3 layers, the compression yields 3 separately compressed weight sets.
Step 602: in response to the nth layer of the neural network needing to use its weights for processing, acquire the compressed weights.

When the nth layer of the neural network needs to use the weights for processing, the terminal acquires the compressed weights.

Step 603: decompress the compressed weights to obtain the weights.
Step 604: process the nth layer of the neural network according to the weights.
In a possible implementation, after the nth layer of the neural network completes its processing, the weights may be deleted to reduce the occupation of storage unit space.
For the description of steps 602 to 604, reference may be made to the above embodiments, which are not repeated herein.
Compared with the prior art, in which the weights of all layers of the neural network are decompressed as a whole, the embodiment of the present application decompresses on demand and avoids wasting storage unit space.
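Per-layer weight handling can be sketched as follows; the layer count, weight sizes, and `zlib` codec are made-up illustrations, not the patent's actual format.

```python
import zlib

# Compress each layer's weights separately before loading the network.
weights = {i: bytes([i]) * 64 for i in range(3)}               # 3 dummy layers
compressed = {i: zlib.compress(w) for i, w in weights.items()}

def run_layer(n: int) -> bytes:
    """Decompress only layer n's weights on demand, then drop the
    compressed copy (the weights may be deleted after use)."""
    return zlib.decompress(compressed.pop(n))

assert run_layer(1) == bytes([1]) * 64
assert 1 not in compressed            # layer 1's weights were freed after use
assert set(compressed) == {0, 2}      # other layers stay compressed, untouched
```

Only one layer's weights are ever held uncompressed, in contrast to decompressing the whole network's weights at once.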
In an exemplary embodiment, the target intermediate data includes the weights of the nth layer of the neural network and the feature map of the mth layer. Fig. 7 is a flowchart illustrating a data processing method according to yet another embodiment of the present application.
In a possible implementation, after the training of the neural network is completed, the weights of the layers included in the neural network are separately compressed.
Step 702: in response to the (m+2)th layer of the neural network not being processed using the feature map of the mth layer of the neural network, compress the feature map of the mth layer.
When the (m+2)th layer of the neural network is not processed using the feature map of the mth layer of the neural network, this indicates that the feature map of the mth layer is temporarily unused, so the feature map of the mth layer can be compressed to reduce the space it occupies in the storage unit.
The target intermediate data includes the weights of the nth layer and the feature map of the mth layer. The terminal may obtain the weights of the nth layer first and then the feature map of the mth layer; alternatively, it may obtain the feature map of the mth layer first and then the weights of the nth layer; or it may obtain both at the same time.
Step 704: decompress the compressed intermediate data to obtain the target intermediate data.
The terminal may decompress the weights of the nth layer first and then the feature map of the mth layer; alternatively, it may decompress the feature map of the mth layer first and then the weights of the nth layer; or it may decompress both at the same time.
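The liveness rule of step 702 (compress the mth layer's feature map once the (m+2)th layer does not consume it) and decompression on demand can be sketched as follows; the consumer table, the zlib codec, and the byte-string feature maps are illustrative assumptions, not part of the publication.

```python
import zlib

feature_maps = {}      # live (uncompressed) feature maps, keyed by producing layer
compressed_maps = {}   # compressed feature maps held in the storage unit
# Hypothetical consumer table: which earlier layers' outputs each layer reads.
consumers = {2: [1], 3: [2], 4: [0, 3]}  # layer 4 has a skip connection to layer 0

def maybe_compress(m: int) -> None:
    """Compress layer m's feature map when layer m+2 is not processed with it."""
    if m in feature_maps and m not in consumers.get(m + 2, []):
        compressed_maps[m] = zlib.compress(feature_maps.pop(m))

def fetch(m: int) -> bytes:
    """Return layer m's feature map, decompressing it on demand if needed."""
    if m in compressed_maps:
        feature_maps[m] = zlib.decompress(compressed_maps.pop(m))
    return feature_maps[m]

feature_maps[0] = b"\x07" * 4096   # layer 0 produces its feature map
maybe_compress(0)                  # layer 2 does not read it, so it is compressed
restored = fetch(0)                # layer 4 (the skip connection) reads it back
```

The feature map survives the round trip unchanged; it merely occupies less storage-unit space while no layer needs it.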
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 8, a block diagram of a data processing apparatus provided in an embodiment of the present application is shown, where the apparatus has a function of implementing the above method example, and the function may be implemented by hardware or by hardware executing corresponding software. The apparatus 800 may include: a data compression module 810, a data acquisition module 820, a data decompression module 830, and a data processing module 840.
A data compression module 810, configured to compress target intermediate data, where the target intermediate data is data that needs to be used when processing an nth layer of a neural network, and n is a positive integer;
a data acquisition module 820, configured to acquire the compressed target intermediate data in response to the nth layer of the neural network needing to use the target intermediate data for processing;
a data decompression module 830, configured to decompress the compressed target intermediate data to obtain the target intermediate data;
a data processing module 840, configured to process an nth layer of the neural network according to the target intermediate data.
To sum up, in the technical solution provided in the embodiments of the present application, the intermediate data corresponding to the nth layer is compressed while it is temporarily unused and decompressed when it is needed, so that the storage space of the terminal is used reasonably and the storage space requirement is reduced.
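The interaction of the four modules can be sketched as one small class; zlib stands in for the compression and decompression units, and all names besides the module numbers are illustrative assumptions.

```python
import zlib

class DataProcessingApparatus:
    """Minimal sketch of modules 810-840: compress, acquire, decompress, process."""

    def __init__(self) -> None:
        self._storage_unit = {}  # holds compressed target intermediate data

    def compress(self, n: int, data: bytes) -> None:
        """Data compression module 810: compress layer n's intermediate data."""
        self._storage_unit[n] = zlib.compress(data)

    def acquire(self, n: int) -> bytes:
        """Data acquisition module 820: fetch the compressed data for layer n."""
        return self._storage_unit[n]

    def decompress(self, blob: bytes) -> bytes:
        """Data decompression module 830: recover the target intermediate data."""
        return zlib.decompress(blob)

    def process(self, n: int) -> int:
        """Data processing module 840: run layer n using its intermediate data."""
        data = self.decompress(self.acquire(n))
        return len(data)  # stand-in for the real layer computation

app = DataProcessingApparatus()
app.compress(5, b"\x09" * 1000)
out = app.process(5)
```

In hardware, modules 810 and 830 would be the dedicated compression and decompression units reached through their interfaces, as described in the optional implementations below.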
Optionally, the data compression module 810 is configured to:
sending a compression request to a compression unit through a compression interface, the compression request requesting compression of the target intermediate data,
in response to the compression request, the compression unit compresses the target intermediate data.
Optionally, the apparatus 800 further includes: and a data storage module (not shown).
The data storage module is used for storing the compressed target intermediate data to a storage unit;
a data acquisition module 820 for:
and acquiring the compressed target intermediate data from the storage unit.
Optionally, the data storage module includes: an identification determination unit and a data storage unit.
An identification determining unit, configured to determine identification information of the compressed target intermediate data, where the identification information is used to uniquely identify the compressed target intermediate data;
and the data storage unit is used for correspondingly storing the compressed target intermediate data and the identification information to the storage unit.
Optionally, the identification determining unit is configured to:
inputting the compressed target intermediate data into a hash table to obtain a hash value of the compressed target intermediate data, wherein the hash value corresponds to a mapping address in the hash table;
determining index information of the hash value according to the layer number of the target intermediate data and the operation corresponding to the target intermediate data;
and determining the hash value and the index information of the hash value as the identification information.
Optionally, the identification determining unit is configured to:
and determining index information of the compressed target intermediate data according to the layer number of the target intermediate data and the operation corresponding to the target intermediate data.
Optionally, the data decompression module 830 is configured to:
sending a decompression request to a decompression unit through a decompression interface, the decompression request requesting decompression of the compressed target intermediate data,
in response to the decompression request, the decompression unit decompresses the target intermediate data.
Optionally, the target intermediate data is a weight of an nth layer of the neural network;
the data compression module 810 is configured to:
compressing the weights of the nth layer of the neural network prior to loading the neural network.
Optionally, the target intermediate data further includes a feature map of an mth layer of the neural network, where m is a positive integer less than or equal to n;
the data compression module 810 is further configured to:
compressing the feature map of the mth layer in response to the (m+2)th layer of the neural network not being processed with the feature map of the mth layer of the neural network.
Optionally, the storage unit comprises a static random access memory (SRAM).
It should be noted that, when the apparatus provided in the foregoing embodiment implements the functions thereof, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 9, a block diagram of a terminal according to an embodiment of the present application is shown.
The terminal in the embodiment of the present application may include one or more of the following components: a processor 910 and a memory 920.
Optionally, the processor 910, when executing the program instructions in the memory 920, implements the methods provided by the various method embodiments described above.
The memory 920 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 920 includes a non-transitory computer-readable medium. The memory 920 may be used to store instructions, programs, code sets, or instruction sets. The memory 920 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function, instructions for implementing the various method embodiments described above, and the like; the data storage area may store data created according to the use of the terminal, and the like.
The structure of the terminal described above is merely illustrative; in actual implementations, the terminal may include more or fewer components, such as a display screen, which is not limited in this embodiment.
Those skilled in the art will appreciate that the configuration shown in fig. 9 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components may be used.
In an exemplary embodiment, a computer-readable storage medium is also provided, in which a computer program is stored, which is loaded and executed by a processor of a computer device to implement the individual steps in the above-described method embodiments.
In an exemplary embodiment, a computer program product is also provided which, when executed, implements the above-described method.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.
Claims (13)
1. A method of data processing, the method comprising:
compressing target intermediate data, wherein the target intermediate data refers to data required to be used during nth layer processing of a neural network, and n is a positive integer;
acquiring the compressed target intermediate data in response to the nth layer of the neural network needing to be processed using the target intermediate data;
decompressing the compressed target intermediate data to obtain the target intermediate data;
and processing the nth layer of the neural network according to the target intermediate data.
2. The method of claim 1, wherein compressing the target intermediate data comprises:
sending a compression request to a compression unit through a compression interface, the compression request requesting compression of the target intermediate data,
in response to the compression request, the compression unit compresses the target intermediate data.
3. The method of claim 1, wherein after compressing the target intermediate data, further comprising:
storing the compressed target intermediate data to a storage unit;
the acquiring the compressed target intermediate data includes:
and acquiring the compressed target intermediate data from the storage unit.
4. The method of claim 3, wherein storing the compressed target intermediate data to a storage unit comprises:
determining identification information of the compressed target intermediate data, wherein the identification information is used for uniquely identifying the compressed target intermediate data;
and correspondingly storing the compressed target intermediate data and the identification information to the storage unit.
5. The method of claim 4, wherein the determining the identification information of the compressed target intermediate data comprises:
inputting the compressed target intermediate data into a hash table to obtain a hash value of the compressed target intermediate data, wherein the hash value corresponds to a mapping address in the hash table;
determining index information of the hash value according to the layer number of the target intermediate data and the operation corresponding to the target intermediate data;
and determining the hash value and the index information of the hash value as the identification information.
6. The method of claim 4, wherein the determining the identification information of the compressed target intermediate data comprises:
and determining index information of the compressed target intermediate data according to the layer number of the target intermediate data and the operation corresponding to the target intermediate data.
7. The method according to claim 1, wherein the decompressing the compressed target intermediate data to obtain the target intermediate data comprises:
sending a decompression request to a decompression unit through a decompression interface, the decompression request requesting decompression of the compressed target intermediate data,
in response to the decompression request, the decompression unit decompresses the target intermediate data.
8. The method according to any one of claims 1 to 7, wherein the target intermediate data is a weight of an nth layer of the neural network;
the compressing the target intermediate data comprises the following steps:
compressing the weights of the nth layer of the neural network prior to loading the neural network.
9. The method of claim 8, wherein the target intermediate data further comprises a feature map of an mth layer of the neural network, the m being a positive integer less than or equal to the n;
after compressing the weight of the nth layer of the neural network, the method further includes:
compressing the feature map of the mth layer in response to the (m+2)th layer of the neural network not being processed with the feature map of the mth layer of the neural network.
10. The method according to any one of claims 1 to 7, wherein the storage unit comprises a Static Random Access Memory (SRAM).
11. A data processing apparatus, characterized in that the apparatus comprises:
the data compression module is used for compressing target intermediate data, wherein the target intermediate data refers to data which needs to be used during nth layer processing of the neural network, and n is a positive integer;
the data acquisition module is used for responding to the nth layer of the neural network and processing the target intermediate data to acquire the compressed target intermediate data;
the data decompression module is used for decompressing the compressed target intermediate data to obtain the target intermediate data;
and the data processing module is used for processing the nth layer of the neural network according to the target intermediate data.
12. A terminal, characterized in that it comprises a processor and a memory, said memory storing a computer program which is loaded and executed by said processor to implement the data processing method according to any one of claims 1 to 10.
13. A computer-readable storage medium, in which a computer program is stored, which is loaded and executed by a processor to implement the data processing method according to any one of claims 1 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010676005.1A CN111832692A (en) | 2020-07-14 | 2020-07-14 | Data processing method, device, terminal and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111832692A true CN111832692A (en) | 2020-10-27 |
Family
ID=72923210
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010676005.1A Withdrawn CN111832692A (en) | 2020-07-14 | 2020-07-14 | Data processing method, device, terminal and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111832692A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106096023A (en) * | 2016-06-24 | 2016-11-09 | 腾讯科技(深圳)有限公司 | Method for reading data, method for writing data and data server |
CN106447034A (en) * | 2016-10-27 | 2017-02-22 | 中国科学院计算技术研究所 | Neutral network processor based on data compression, design method and chip |
CN108255859A (en) * | 2016-12-29 | 2018-07-06 | 航天信息股份有限公司 | A kind of method and system for being used to establish index for mass digital certificate |
US20190190538A1 (en) * | 2017-12-18 | 2019-06-20 | Facebook, Inc. | Accelerator hardware for compression and decompression |
CN110163370A (en) * | 2019-05-24 | 2019-08-23 | 上海肇观电子科技有限公司 | Compression method, chip, electronic equipment and the medium of deep neural network |
CN111047020A (en) * | 2018-10-12 | 2020-04-21 | 上海寒武纪信息科技有限公司 | Neural network operation device and method supporting compression and decompression |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114723033A (en) * | 2022-06-10 | 2022-07-08 | 成都登临科技有限公司 | Data processing method, data processing device, AI chip, electronic device and storage medium |
CN114723033B (en) * | 2022-06-10 | 2022-08-19 | 成都登临科技有限公司 | Data processing method, data processing device, AI chip, electronic device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11909422B2 (en) | Neural network processor using compression and decompression of activation data to reduce memory bandwidth utilization | |
US20190318231A1 (en) | Method for acceleration of a neural network model of an electronic euqipment and a device thereof related appliction information | |
WO2022037257A1 (en) | Convolution calculation engine, artificial intelligence chip, and data processing method | |
CN110991630A (en) | Convolutional neural network processor for edge calculation | |
TWI766568B (en) | Processing device for executing convolution neural network computation and operation method thereof | |
CN111338695A (en) | Data processing method based on pipeline technology and related product | |
WO2021147276A1 (en) | Data processing method and apparatus, and chip, electronic device and storage medium | |
CN110647981B (en) | Data processing method, data processing device, computer equipment and storage medium | |
US11494237B2 (en) | Managing workloads of a deep neural network processor | |
CN111832692A (en) | Data processing method, device, terminal and storage medium | |
CN110458285B (en) | Data processing method, data processing device, computer equipment and storage medium | |
CN115456149B (en) | Impulse neural network accelerator learning method, device, terminal and storage medium | |
US20200285955A1 (en) | Method for accelerating deep learning and user terminal | |
US20210224632A1 (en) | Methods, devices, chips, electronic apparatuses, and storage media for processing data | |
CN116051345A (en) | Image data processing method, device, computer equipment and readable storage medium | |
US11507349B2 (en) | Neural processing element with single instruction multiple data (SIMD) compute lanes | |
CN112612427A (en) | Vehicle stop data processing method and device, storage medium and terminal | |
US11741349B2 (en) | Performing matrix-vector multiply operations for neural networks on electronic devices | |
CN115456858B (en) | Image processing method, device, computer equipment and computer readable storage medium | |
CN113435591B (en) | Data processing method, device, computer equipment and storage medium | |
CN117370488A (en) | Data processing method, device, electronic equipment and computer readable storage medium | |
CN114077889A (en) | Neural network processor and data processing method | |
KR20220034542A (en) | STORAGE DEVICE, and METHOD OF OPERATING STORAGE DEVICE | |
CN118153552A (en) | Data analysis method, device, computer equipment and storage medium | |
CN116360575A (en) | Data processing method, device, terminal equipment and storage medium |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20201027