CN111160516B - Convolutional layer sparsification method and device for deep neural network


Info

Publication number
CN111160516B
CN111160516B (application CN201811320668.9A)
Authority
CN
China
Prior art keywords
dimension
same
units
weight
weights
Prior art date
Legal status
Active
Application number
CN201811320668.9A
Other languages
Chinese (zh)
Other versions
CN111160516A (en)
Inventor
张渊
谢迪
浦世亮
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811320668.9A
Publication of CN111160516A
Application granted
Publication of CN111160516B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a convolutional layer sparsification method and device for a deep neural network. The method comprises: acquiring the tensor and structural parameters of a convolution layer in a deep neural network; dividing weights in the same dimension in the tensor of the convolution layer into units of the same dimension by using a preset dimension dividing method according to the structural parameters of the convolution layer, to obtain a plurality of dimension units; and, for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. By this scheme, the access efficiency of the memory can be improved.

Description

Convolutional layer sparsification method and device for deep neural network
Technical Field
The application relates to the technical field of machine learning, in particular to a convolutional layer sparsification method and device of a deep neural network.
Background
Deep neural networks are an emerging field in machine learning research. They analyze data by mimicking mechanisms of the human brain and are intelligent models that learn analytically by building and simulating brain-like structures. Deep neural networks have been applied successfully to target detection and segmentation, behavior detection and recognition, voice recognition, and so on. However, as deep neural networks continue to develop, their scale keeps growing, and the amount of data to be stored and computed becomes increasingly large, so that deep neural networks have high computational complexity and require strong hardware resources.
In order to reduce the computational complexity of a deep neural network and relieve the pressure on hardware resources, the network model of the deep neural network needs to be compressed. The current mainstream methods for compressing the network model of a deep neural network include quantization, sparsification, and the like. Sparsification mainly consists of setting unimportant weights in the deep neural network to zero while keeping the important weights unchanged, so that the dense weights of the deep neural network are converted into sparse weights and the storage amount and the calculation amount are significantly reduced.
However, in the related sparsification methods, the sparsification operation is directed at each individual weight in the deep neural network, so the resulting sparsity pattern is irregular. In subsequent processes such as network model training, although some weights in the deep neural network have been set to zero, which weights are zero cannot be determined in a regular way, and every weight still needs to be read from the memory, so the memory access efficiency is low.
Disclosure of Invention
The embodiment of the application aims to provide a convolutional layer sparsification method and device for a deep neural network, so as to improve the access efficiency of the memory. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a convolutional layer sparsification method of a deep neural network, where the method includes:
acquiring tensors and structural parameters of a convolution layer in a deep neural network;
according to the structural parameters, dividing weights in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units;
and for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weight in the dimension unit.
Optionally, according to the structural parameter, a preset dimension dividing method is used to divide weights in the same dimension in the tensor into units in the same dimension, so as to obtain a plurality of dimension units, including:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
Optionally, according to the structural parameter, a preset dimension dividing method is used to divide weights in the same dimension in the tensor into units in the same dimension, so as to obtain a plurality of dimension units, including:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
Optionally, according to the structural parameter, a preset dimension dividing method is used to divide weights in the same dimension in the tensor into units in the same dimension, so as to obtain a plurality of dimension units, including:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
Optionally, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weight in the dimension unit, including:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a second aspect, an embodiment of the present application provides a convolutional layer sparsification apparatus for a deep neural network, where the apparatus includes:
the acquisition module is used for acquiring tensors and structural parameters of the convolution layers in the deep neural network;
the dividing module is used for dividing the weight value in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
the sparse module is used for carrying out, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing module is specifically configured to:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module is specifically configured to:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
Optionally, the dividing module is specifically configured to:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
Optionally, the sparse module is specifically configured to:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to implement all the steps of the convolutional layer sparsification method of the deep neural network provided in the first aspect of the embodiment of the application.
In a fourth aspect, an embodiment of the present application provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of the deep neural network provided in the first aspect of the present application.
According to the convolutional layer sparsification method and device for a deep neural network provided by the embodiments of the application, the tensor and structural parameters of a convolution layer in the deep neural network are obtained; according to the structural parameters of the convolution layer, weights in the same dimension in the tensor of the convolution layer are divided into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units; and, for each dimension unit, the same sparsification operation is carried out on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of the effect of convolutional layer sparsification of a deep neural network in the related art;
FIG. 2 is a schematic flow chart of a convolutional layer sparsification method of a deep neural network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a row dimension unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a column dimension unit according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a spatial dimension unit according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an input channel dimension unit according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a convolutional layer sparsification apparatus of a deep neural network according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is a schematic diagram of the effect after the model weights are sparsified in the related art. Since the sparsification operation is directed at individual weights, the sparsity pattern of the weights in each filter of size R×S in a convolutional layer of size C×K×R×S of the deep neural network is irregular (the small black boxes in fig. 1 represent weights that remain unchanged, and the small white boxes represent weights that are set to zero). In theory, this sparsification method can convert the convolution layer tensor of the deep neural network into a very sparse tensor, greatly reducing the computation and storage resource consumption of the deep neural network.
However, because of this irregularity, such a sparsification method is not well supported on general-purpose hardware acceleration platforms (e.g., a GPU (Graphics Processing Unit)) and often requires a customized hardware accelerator, such as an FPGA (Field-Programmable Gate Array) or an ASIC (Application-Specific Integrated Circuit), to implement. Although this irregular weight sparsification method can be implemented by customizing a hardware accelerator, it also tends to incur a high hardware design cost.
Therefore, in order to improve the access efficiency of the memory and reduce the design cost of customized hardware accelerators, the embodiments of the application provide a convolutional layer sparsification method and apparatus of a deep neural network, an electronic device, and a machine-readable storage medium.
The convolutional layer sparsification method of a deep neural network provided by the embodiment of the application is described below.
The execution subject of the convolutional layer sparsification method of a deep neural network provided by the embodiment of the application may be an electronic device that implements functions such as target detection and segmentation, behavior detection and recognition, and voice recognition. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the execution subject.
As shown in fig. 2, the convolutional layer sparsification method of a deep neural network provided by the embodiment of the application may include the following steps:
s201, tensor and structural parameters of a convolution layer in the deep neural network are obtained.
A deep neural network is a widely used data processing model; specifically, it may be any one of CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory network), and the like.
A network layer of the deep neural network that mainly performs convolution operations is called a convolution layer. The tensor of a convolution layer is, in effect, the set of specific weight values in that layer; for example, the tensor of a convolution layer W is a four-dimensional tensor, i.e., when the convolution layer is expressed with dimensions C×K×R×S, the tensor holds the specific weight values of that four-dimensional array. The structural parameters of the convolution layer include the number of output channels C, the number of input channels K, and the filter spatial dimension size R×S of the convolution layer.
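By way of illustration only (not part of the claimed embodiment), the following sketch assumes a NumPy array representation of the convolution layer tensor; the example sizes and the helper name get_structural_params are choices made for this example, not names taken from the patent. It shows a four-dimensional tensor of dimensions C×K×R×S and how its structural parameters can be read back from its shape:

```python
# Minimal sketch, assuming the convolution layer tensor is a NumPy array.
# The sizes below are arbitrary example values.
import numpy as np

C, K, R, S = 64, 32, 3, 3            # output channels, input channels, filter rows, filter columns
W = np.random.randn(C, K, R, S)      # four-dimensional tensor of the convolution layer

def get_structural_params(weight_tensor):
    """Return the structural parameters (C, K, R, S) read from the tensor shape."""
    c, k, r, s = weight_tensor.shape
    return c, k, r, s

print(get_structural_params(W))      # -> (64, 32, 3, 3)
```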
S202, dividing weights in the same dimension in tensors of the convolution layer into units in the same dimension by using a preset dimension dividing method according to structural parameters of the convolution layer, and obtaining a plurality of dimension units.
Considering the characteristics of convolution, weights in the same dimension within the same convolution layer of a deep neural network often have similar numerical ranges; for example, in some convolution layers the weights in the same row are similar in size, in others the weights in the same column are similar in size, and in still others the weights in the same filter spatial dimension are similar in size. This is related to the initial setting of the convolution layer structure and the functional setting of the deep neural network. Therefore, the weights in the same dimension in the tensor of the convolution layer can be divided into units of the same dimension according to the structural parameters of the convolution layer. The same dimension mentioned in the embodiments of the present application may include, but is not limited to: the same row or the same column in the filter spatial dimension, the same filter space, and the same input channel.
The embodiment of the application can determine the form of dimension unit division according to the actual situation of the deep neural network. Optionally, the manner of dimension unit division of the tensor of the convolution layer may be specifically divided into the following several manners:
In the first mode, according to the row parameter of the filter spatial dimension in the structural parameters of the convolution layer, the weights in the same row in the tensor of the convolution layer are divided into units of the same dimension, so as to obtain a plurality of row dimension units.
For the case where the weights in the same row of the convolution layer differ only slightly, the weights in the same row in the tensor of the convolution layer can be divided into units of the same dimension according to the row parameter of the filter spatial dimension in the structural parameters of the convolution layer. A row dimension unit as shown in fig. 3 can be obtained, and its mathematical expression is:
cell=W(c,k,r,:)
In the second mode, according to the column parameter of the filter spatial dimension in the structural parameters of the convolution layer, the weights in the same column in the tensor of the convolution layer are divided into units of the same dimension, so as to obtain a plurality of column dimension units.
For the case where the weights in the same column of the convolution layer differ only slightly, the weights in the same column in the tensor of the convolution layer can be divided into units of the same dimension according to the column parameter of the filter spatial dimension in the structural parameters of the convolution layer. A column dimension unit as shown in fig. 4 can be obtained, and its mathematical expression is:
cell=W(c,k,:,s)
In the third mode, according to the filter spatial dimension parameters in the structural parameters of the convolution layer, the weights in the same filter space in the tensor of the convolution layer are divided into the same dimension unit, so as to obtain a plurality of spatial dimension units.
For the case where the weights in the same filter space of the convolution layer differ only slightly, the weights in the same filter space in the tensor of the convolution layer can be divided into the same dimension unit according to the filter spatial dimension parameters in the structural parameters of the convolution layer. A spatial dimension unit as shown in fig. 5 can be obtained, and its mathematical expression is:
cell=W(c,k,:,:)
In the fourth mode, according to the input channel parameters in the structural parameters of the convolution layer, the weights in the same input channel in the tensor of the convolution layer are divided into units of the same dimension, so as to obtain a plurality of input channel dimension units.
For the case where the weights in the same input channel of the convolution layer differ only slightly, the weights in the same input channel in the tensor of the convolution layer can be divided into the same dimension unit according to the input channel parameters in the structural parameters of the convolution layer. An input channel dimension unit as shown in fig. 6 can be obtained, and its mathematical expression is:
cell=W(c,:,r,s)
The above four manners of dividing the tensor of the convolution layer into dimension units are only given as examples, and the actual manner of dimension unit division is not limited to them: it is sufficient that weights with relatively similar values are divided into the same dimension unit, for example multiple rows may be divided into one dimension unit, multiple columns may be divided into one dimension unit, and so on. Through the division into dimension units, a certain regularity of the weights within the same dimension unit can be ensured, as illustrated in the sketch below.
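The following sketch collects the four division modes in one place (an illustration only, assuming the same C×K×R×S NumPy layout as in the earlier sketch; the helper names are chosen for this example and do not come from the patent):

```python
# Sketch of the four dimension-unit division modes, assuming W has shape (C, K, R, S).
import numpy as np

def row_units(W):            # cell = W(c, k, r, :)
    C, K, R, S = W.shape
    return [W[c, k, r, :] for c in range(C) for k in range(K) for r in range(R)]

def column_units(W):         # cell = W(c, k, :, s)
    C, K, R, S = W.shape
    return [W[c, k, :, s] for c in range(C) for k in range(K) for s in range(S)]

def spatial_units(W):        # cell = W(c, k, :, :)
    C, K, R, S = W.shape
    return [W[c, k, :, :] for c in range(C) for k in range(K)]

def input_channel_units(W):  # cell = W(c, :, r, s)
    C, K, R, S = W.shape
    return [W[c, :, r, s] for c in range(C) for r in range(R) for s in range(S)]

units = row_units(np.random.randn(2, 3, 3, 3))   # 2*3*3 = 18 row dimension units
```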
S203, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
After the tensor of the convolution layer has been divided into dimension units, each dimension unit can be regarded as a structured unit, because the weight values within the same dimension unit have a certain regularity. The same sparsification operation can therefore be performed on every weight in a dimension unit based on the weights in that unit, and after the sparsification operation the processed dimension unit is, as a whole, either all zero or all non-zero.
The basis for performing the sparsification operation on the weights in a dimension unit can take various forms. For example, it may be judged whether the maximum weight in the dimension unit is smaller than a preset threshold value; if so, the weights in the dimension unit are not important for the deep neural network operation and can be set to zero. It may be judged whether the minimum weight in the dimension unit is larger than a preset threshold value; if so, the weights in the dimension unit are important for the deep neural network operation and can be retained. The average of all weights in the dimension unit may be calculated and compared with a preset threshold value; if it is larger, the weights in the dimension unit are important and can all be retained, and if not, they are unimportant and can all be set to zero. The absolute values of the weights in the dimension unit may also be summed and the sum compared with a preset threshold value; if it is larger, the weights in the dimension unit are important and can be retained, and if not, they are unimportant and can be set to zero. The basis for the sparsification operation is not limited to the above manners; any manner that comprehensively evaluates the importance of the weights in a dimension unit to the deep neural network operation falls within the protection scope of the embodiments of the present application and is not described in detail here.
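As a hedged illustration of the alternative importance measures listed above (a sketch only; the function name and the idea of returning all measures side by side are assumptions made for this example, not the patented implementation):

```python
# Sketch of several per-unit importance measures mentioned above.
import numpy as np

def unit_scores(unit):
    """Candidate measures for deciding whether a dimension unit is important."""
    return {
        "max":     float(np.max(unit)),          # zero the unit if even the largest weight is below the threshold
        "min":     float(np.min(unit)),          # keep the unit if even the smallest weight exceeds the threshold
        "mean":    float(np.mean(unit)),         # average weight compared with the threshold
        "abs_sum": float(np.sum(np.abs(unit))),  # sum of absolute values, used in the example further below
    }

print(unit_scores(np.array([0.2, -0.1, 0.3])))
```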
Optionally, S203 may specifically be:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
Although the weight values within the same dimension unit are similar, occasionally there may be a weight whose value differs more noticeably; on the whole, however, the weights in the same dimension unit remain close to each other. To prevent such cases from affecting the final sparsification, the basis of the sparsification operation on the weights in a dimension unit can specifically be the sum of the absolute values of the weights in the unit, which evaluates the importance of the dimension unit to the whole deep neural network: a dimension unit with a large calculation result is regarded as important, and a dimension unit with a small calculation result as unimportant. All weights in an unimportant dimension unit can be set to zero, while the original weights in an important dimension unit are kept unchanged. In the subsequent training of the deep neural network, only the weights of the important dimension units need to be fine-tuned, which reduces the computation of the whole deep neural network operation.
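A minimal sketch of this per-unit sparsification step follows (illustrative only; the row-unit indexing, the helper names, and the threshold value are assumptions, and in practice the threshold would be a preset value chosen for the network):

```python
# Sketch: zero out or keep whole dimension units based on the sum of absolute weight values.
import numpy as np

def row_unit_indices(shape):
    """Index tuples selecting each row dimension unit W(c, k, r, :)."""
    C, K, R, S = shape
    return [(c, k, r, slice(None)) for c in range(C) for k in range(K) for r in range(R)]

def sparsify_units(W, unit_indices, threshold=1.0):
    W_sparse = W.copy()
    keep_mask = []
    for idx in unit_indices:
        if np.sum(np.abs(W_sparse[idx])) > threshold:   # important unit: keep weights unchanged
            keep_mask.append(True)
        else:                                           # unimportant unit: set every weight to zero
            W_sparse[idx] = 0.0
            keep_mask.append(False)
    return W_sparse, keep_mask

W = np.random.randn(4, 3, 3, 3)
W_sparse, keep_mask = sparsify_units(W, row_unit_indices(W.shape))
```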
By applying this embodiment, the tensor and structural parameters of a convolution layer in a deep neural network are acquired; according to the structural parameters of the convolution layer, weights in the same dimension in the tensor of the convolution layer are divided into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units; and, for each dimension unit, the same sparsification operation is carried out on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory. Moreover, because the convolutional layer sparsification of the neural network has a certain regularity, the operation of the sparse deep neural network can be realized on a general-purpose hardware acceleration platform, which greatly reduces the hardware design cost.
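The storage and read-back idea in the preceding paragraph could look roughly like the following (a sketch continuing the previous example under the same assumptions; the packed container layout is an assumption for illustration, not a format defined by the patent):

```python
# Sketch: store only non-zero dimension units together with a per-unit mark,
# so that reading from memory can skip the units that were set to zero.
import numpy as np

def pack_sparse(W_sparse, unit_indices, keep_mask):
    kept_units = [W_sparse[idx].copy() for idx, keep in zip(unit_indices, keep_mask) if keep]
    return {"shape": W_sparse.shape, "keep_mask": list(keep_mask), "units": kept_units}

def unpack_sparse(packed, unit_indices):
    W = np.zeros(packed["shape"])
    stored = iter(packed["units"])               # only the weights that were not zeroed are read
    for idx, keep in zip(unit_indices, packed["keep_mask"]):
        if keep:
            W[idx] = next(stored)
    return W
```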
Corresponding to the above method embodiment, an embodiment of the present application provides a convolutional layer sparsification apparatus of a deep neural network. As shown in fig. 7, the apparatus may include:
an obtaining module 710, configured to obtain tensors and structural parameters of a convolutional layer in the deep neural network;
the dividing module 720 is configured to divide weights in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method according to the structural parameter, so as to obtain a plurality of dimension units;
the sparse module 730 is configured to perform, for each dimension unit, the same sparse operation on each weight in the dimension unit according to the weight in the dimension unit.
Optionally, the dividing module 720 may specifically be configured to:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module 720 may specifically be configured to:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
Optionally, the dividing module 720 may specifically be configured to:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
Optionally, the sparse module 730 may specifically be configured to:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
By applying this embodiment, the tensor and structural parameters of a convolution layer in a deep neural network are acquired; according to the structural parameters of the convolution layer, weights in the same dimension in the tensor of the convolution layer are divided into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units; and, for each dimension unit, the same sparsification operation is carried out on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
An embodiment of the present application also provides an electronic device which, as shown in fig. 8, may include a processor 801 and a machine-readable storage medium 802, the machine-readable storage medium 802 storing machine-executable instructions capable of being executed by the processor 801, the processor 801 being caused by the machine-executable instructions to implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
The machine-readable storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. Alternatively, the machine-readable storage medium may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The machine-readable storage medium 802 and the processor 801 may be in data communication by wired or wireless connection, and the electronic device may communicate with other devices via a wired or wireless communication interface. Fig. 8 is merely an example of data transmission between the processor 801 and the machine-readable storage medium 802 via a bus, and is not limited to a specific connection.
In this embodiment, by reading and executing the machine-executable instructions stored in the machine-readable storage medium 802, the processor 801 can implement the following: acquiring the tensor and structural parameters of a convolution layer in a deep neural network; dividing weights in the same dimension in the tensor of the convolution layer into units of the same dimension by using a preset dimension dividing method according to the structural parameters of the convolution layer, so as to obtain a plurality of dimension units; and, for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
An embodiment of the present application also provides a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, at runtime, execute the convolutional layer sparsification method of the deep neural network provided by the embodiment of the present application, and therefore can implement the following: acquiring the tensor and structural parameters of a convolution layer in a deep neural network; dividing weights in the same dimension in the tensor of the convolution layer into units of the same dimension by using a preset dimension dividing method according to the structural parameters of the convolution layer, so as to obtain a plurality of dimension units; and, for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
For the electronic device and the machine-readable storage medium embodiments, the description is relatively simple, and reference should be made to part of the description of the method embodiments for the relevant matters, since the method content involved is basically similar to the method embodiments described above.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between these entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, electronic devices, and machine-readable storage medium embodiments, the description is relatively simple as it is substantially similar to method embodiments, with reference to the section descriptions of method embodiments being merely illustrative.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. The convolutional layer sparsification method of the deep neural network is characterized by being applied to a hardware acceleration platform, wherein the hardware acceleration platform comprises a memory, and the method comprises the following steps:
tensor and structural parameters of a convolution layer in a deep neural network are obtained, wherein the deep neural network is used for target detection and segmentation, behavior detection and recognition and voice recognition;
according to the structural parameters, dividing weights in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units;
for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit; when the sparsified weights are stored, the dimension units whose weights are set to zero are marked, and when the weights are read from the memory, only the weights which are not set to zero are read based on the marks.
2. The method according to claim 1, wherein the dividing weights in the same dimension in the tensor into the same dimension units by using a preset dimension dividing method according to the structural parameter to obtain a plurality of dimension units includes:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
3. The method according to claim 1, wherein the dividing weights in the same dimension in the tensor into the same dimension units by using a preset dimension dividing method according to the structural parameter to obtain a plurality of dimension units includes:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
4. The method according to claim 1, wherein the dividing weights in the same dimension in the tensor into the same dimension units by using a preset dimension dividing method according to the structural parameter to obtain a plurality of dimension units includes:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
5. The method according to claim 1, wherein, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit comprises:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
6. The convolutional layer sparsification device of the deep neural network is characterized by being applied to a hardware acceleration platform, wherein the hardware acceleration platform comprises a memory, and the device comprises:
the acquisition module is used for acquiring tensors and structural parameters of a convolution layer in the deep neural network, wherein the deep neural network is used for target detection and segmentation, behavior detection and recognition and voice recognition;
the dividing module is used for dividing the weight value in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
the sparse module is used for carrying out, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit; when the sparsified weights are stored, the dimension units whose weights are set to zero are marked, and when the weights are read from the memory, only the weights which are not set to zero are read based on the marks.
7. The apparatus of claim 6, wherein the partitioning module is specifically configured to:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
8. The apparatus of claim 6, wherein the partitioning module is specifically configured to:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
9. The apparatus of claim 6, wherein the partitioning module is specifically configured to:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
10. The apparatus of claim 6, wherein the sparse module is configured to:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
CN201811320668.9A 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network Active CN111160516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Publications (2)

Publication Number Publication Date
CN111160516A CN111160516A (en) 2020-05-15
CN111160516B true CN111160516B (en) 2023-09-05

Family

ID=70555336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811320668.9A Active CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Country Status (1)

Country Link
CN (1) CN111160516B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220147826A1 (en) * 2020-11-06 2022-05-12 Moffett Technologies Co., Limited Method and system for convolution with workload-balanced activation sparsity
CN114692847B (en) * 2020-12-25 2024-01-09 中科寒武纪科技股份有限公司 Data processing circuit, data processing method and related products
CN113344200A (en) * 2021-06-17 2021-09-03 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
CN107480770A (en) * 2017-07-27 2017-12-15 中国科学院自动化研究所 The adjustable neutral net for quantifying bit wide quantifies the method and device with compression
CN107729995A (en) * 2017-10-31 2018-02-23 中国科学院计算技术研究所 Method and system and neural network processor for accelerans network processing unit
CN107784276A (en) * 2017-10-13 2018-03-09 中南大学 Microseismic event recognition methods and device
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108345831A (en) * 2017-12-28 2018-07-31 新智数字科技有限公司 The method, apparatus and electronic equipment of Road image segmentation based on point cloud data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108351982B (en) * 2015-11-12 2022-07-08 谷歌有限责任公司 Convolution gated recurrent neural network
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
US11501131B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
WO2018053464A1 (en) * 2016-09-19 2018-03-22 Vicarious Fpc, Inc. Systems and methods for deep learning with small training sets
US10037490B2 (en) * 2016-12-13 2018-07-31 Google Llc Performing average pooling in hardware
US11048997B2 (en) * 2016-12-27 2021-06-29 Texas Instruments Incorporated Reduced complexity convolution for convolutional neural networks

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
CN107480770A (en) * 2017-07-27 2017-12-15 中国科学院自动化研究所 The adjustable neutral net for quantifying bit wide quantifies the method and device with compression
CN107784276A (en) * 2017-10-13 2018-03-09 中南大学 Microseismic event recognition methods and device
CN107729995A (en) * 2017-10-31 2018-02-23 中国科学院计算技术研究所 Method and system and neural network processor for accelerans network processing unit
CN108345831A (en) * 2017-12-28 2018-07-31 新智数字科技有限公司 The method, apparatus and electronic equipment of Road image segmentation based on point cloud data
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
深度学习技术兴起将给安防带来什么 (What will the rise of deep learning technology bring to security); 谢迪; 《中国安防》; full text *

Also Published As

Publication number Publication date
CN111160516A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111160516B (en) Convolutional layer sparsification method and device for deep neural network
CN106779057B (en) Method and device for calculating binary neural network convolution based on GPU
CN111832437B (en) Building drawing identification method, electronic equipment and related products
CN111144561A (en) Neural network model determining method and device
KR20210032140A (en) Method and apparatus for performing pruning of neural network
CN112001110A (en) Structural damage identification monitoring method based on vibration signal space real-time recursive graph convolutional neural network
CN107133190A (en) The training method and training system of a kind of machine learning system
CN108805174A (en) clustering method and device
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN112132278A (en) Model compression method and device, computer equipment and storage medium
CN112990420A (en) Pruning method for convolutional neural network model
CN112947080B (en) Scene parameter transformation-based intelligent decision model performance evaluation system
CN111160517B (en) Convolutional layer quantization method and device for deep neural network
US11036980B2 (en) Information processing method and information processing system
CN113485848B (en) Deep neural network deployment method and device, computer equipment and storage medium
CN114462723B (en) Cloud layer migration minute-level photovoltaic power prediction method based on high-altitude wind resource influence
EP3293682A1 (en) Method and device for analyzing sensor data
KR102365270B1 (en) Method of generating sparse neural networks and system therefor
CN110097183B (en) Information processing method and information processing system
CN114494682A (en) Object position prediction method, device, equipment and storage medium
CN113255927A (en) Logistic regression model training method and device, computer equipment and storage medium
CN109362027B (en) Positioning method, device, equipment and storage medium
WO2021055364A1 (en) Efficient inferencing with fast pointwise convolution
EP2325755A2 (en) Method of searching a set of real numbers for a nearest neighbor
CN116738816A (en) Structure vulnerability analysis method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant