CN111160516A - Convolutional layer sparsification method and device of deep neural network - Google Patents

Convolutional layer sparsification method and device of deep neural network

Info

Publication number
CN111160516A
Authority
CN
China
Prior art keywords
dimension
same
units
dimensionality
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811320668.9A
Other languages
Chinese (zh)
Other versions
CN111160516B (en)
Inventor
Zhang Yuan (张渊)
Xie Di (谢迪)
Pu Shiliang (浦世亮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811320668.9A priority Critical patent/CN111160516B/en
Publication of CN111160516A publication Critical patent/CN111160516A/en
Application granted granted Critical
Publication of CN111160516B publication Critical patent/CN111160516B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application provide a convolutional layer sparsification method and device for a deep neural network. The method comprises the following steps: obtaining the tensor and structural parameters of a convolutional layer in a deep neural network; dividing, according to the structural parameters of the convolutional layer, the weights in the same dimension in the tensor of the convolutional layer into units of the same dimension by using a preset dimension division method, to obtain a plurality of dimension units; and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. This scheme can improve memory access efficiency.

Description

Convolutional layer sparsification method and device of deep neural network
Technical Field
The present application relates to the technical field of machine learning, and in particular to a convolutional layer sparsification method and device of a deep neural network.
Background
The deep neural network is an emerging field in machine learning research: it analyzes data by emulating mechanisms of the human brain, and is an intelligent model that learns through brain-inspired structures. Deep neural networks have been applied successfully in fields such as target detection and segmentation, behavior detection and recognition, and speech recognition. However, as deep neural networks continue to develop, their scale keeps growing, and the amount of data to be stored and computed becomes larger and larger, so deep neural networks have high computational complexity and require powerful hardware resources.
To reduce the computational complexity of deep neural networks and relieve the pressure on hardware resources, the network model of the deep neural network needs to be compressed. Current mainstream methods for compressing the network model of a deep neural network include fixed-point quantization, sparsification, and the like. Sparsification mainly sets unimportant weights in the deep neural network to zero while keeping important weights unchanged, thereby converting the dense weights of the deep neural network into sparse weights and significantly reducing both storage and computation.
However, in related sparsification methods, the sparsification operation targets each individual weight in the deep neural network, and the irregularity of the individual weights makes the resulting sparsity pattern irregular. During subsequent network model training and other processing, although some weights in the deep neural network have been set to zero, there is no regular way to determine which weights were zeroed, so every weight must still be read from memory, which results in low memory access efficiency.
Disclosure of Invention
An object of the embodiments of the present application is to provide a convolutional layer sparsification method and device of a deep neural network, so as to improve memory access efficiency. The specific technical solutions are as follows:
in a first aspect, an embodiment of the present application provides a convolutional layer sparsification method for a deep neural network, where the method includes:
obtaining tensor and structural parameters of a convolutional layer in a deep neural network;
dividing weights in the tensor in the same dimension into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing, according to the structural parameter, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units includes:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing, according to the structural parameter, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units includes:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
Optionally, the dividing, according to the structural parameter, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units includes:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
Optionally, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weight in the dimension unit, including:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a second aspect, an embodiment of the present application provides a convolutional layer sparsification apparatus of a deep neural network, the apparatus including:
the acquisition module is used for acquiring tensor and structural parameters of a convolutional layer in the deep neural network;
the dividing module is used for dividing the weights in the same dimensionality in the tensor into units in the same dimensionality by using a preset dimensionality dividing method according to the structural parameters to obtain a plurality of dimensionality units;
and the sparsification module is used for performing, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing module is specifically configured to:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module is specifically configured to:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
Optionally, the dividing module is specifically configured to:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
Optionally, the sparsification module is specifically configured to:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a machine-readable storage medium that stores machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement all the steps of the convolutional layer sparsification method of a deep neural network provided in the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a machine-readable storage medium storing machine-executable instructions, which when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of a deep neural network provided in the first aspect of the embodiments of the present application.
According to the convolutional layer sparsification method and device of a deep neural network provided by the embodiments of the present application, the tensor and structural parameters of a convolutional layer in the deep neural network are obtained; the weights in the same dimension in the tensor of the convolutional layer are divided into units of the same dimension by a preset dimension division method according to the structural parameters of the convolutional layer, obtaining a plurality of dimension units; and, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating the effect of a related method for sparsifying a convolutional layer of a deep neural network;
fig. 2 is a schematic flowchart of a convolutional layer sparsification method of a deep neural network according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a row dimension unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a column dimension unit according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a spatial dimension unit according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an input channel dimension unit according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a convolutional layer sparsifying device of a deep neural network according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic diagram of the effect of a related method for sparsifying model weights. Because the sparsification operation targets a single weight, the sparsity pattern of the weights in each filter of size R × S in a convolutional layer of size C × K × R × S of a deep neural network is irregular (in fig. 1, a black small box represents a weight that remains unchanged, and a white small box represents a weight set to zero). In theory, this sparsification method can convert the convolutional layer tensor of the deep neural network into a very sparse convolutional layer tensor and greatly reduce the computation and storage resource consumption of the deep neural network.
However, due to this irregularity, such a sparsification method cannot be well supported on general-purpose hardware acceleration platforms (such as a GPU (Graphics Processing Unit)) and often needs a customized hardware accelerator, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), to implement. Although this irregular weight sparsification method can be realized by customizing a hardware accelerator, it often incurs a higher hardware design cost.
Therefore, in order to improve the access efficiency of the memory and reduce the design cost of the customized hardware accelerator, embodiments of the present application provide a method and an apparatus for sparse convolutional layers of a deep neural network, an electronic device, and a machine-readable storage medium.
First, the convolutional layer sparsification method of a deep neural network provided in the embodiments of the present application is described below.
The execution body of the convolutional layer sparsification method of a deep neural network provided by the embodiments of the present application may be an electronic device implementing functions such as target detection and segmentation, behavior detection and recognition, or speech recognition. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the execution body.
As shown in fig. 2, a convolutional layer sparsification method of a deep neural network provided in an embodiment of the present application may include the following steps:
s201, tensor and structural parameters of a convolutional layer in the deep neural network are obtained.
The deep neural network here may be any of a wide range of data processing models; specifically, it may be a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), or the like.
Because each network layer of the deep neural network mainly performs convolution operations, the network layers are also called convolutional layers. The tensor of a convolutional layer consists of the concrete weights of that layer; for example, when a convolutional layer W is expressed in the dimensions C × K × R × S, its tensor is the corresponding four-dimensional weight tensor. The structural parameters of the convolutional layer include the number of output channels C, the number of input channels K, and the filter spatial dimension size R × S.
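As a concrete illustration, the following is a minimal sketch of step S201, assuming the network is built with PyTorch and the layer is a standard Conv2d; the helper name get_tensor_and_structure is hypothetical and only mirrors the description above.

import torch.nn as nn

def get_tensor_and_structure(conv: nn.Conv2d):
    # PyTorch stores Conv2d weights as (out_channels, in_channels, rows, cols),
    # which matches the C x K x R x S layout used in this description.
    w = conv.weight.data
    C, K, R, S = w.shape
    return w, (C, K, R, S)

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
w, (C, K, R, S) = get_tensor_and_structure(conv)
print(C, K, R, S)  # prints: 32 16 3 3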
S202, dividing weights in the same dimension in the tensor of the convolutional layer into units in the same dimension by using a preset dimension dividing method according to the structural parameters of the convolutional layer to obtain a plurality of dimension units.
Considering the characteristics of convolution, the weights in the same dimension in the same convolutional layer of a deep neural network often have similar numerical ranges; for example, in some convolutional layers the weights in the same row are similar, in others the weights in the same column are similar, and in yet others the weights in the same filter spatial dimension are similar. This is related to the initial setting of the convolutional layer structure and to the functional setting of the deep neural network. Therefore, the weights in the same dimension in the tensor of the convolutional layer can be divided into units of the same dimension according to the structural parameters of the convolutional layer. The same dimensions mentioned in the embodiments of the present application may include, but are not limited to: the same filter spatial dimension, the same row or the same column in the filter spatial dimension, and the same input channel.
The embodiment of the application can determine the form of dimension unit division according to the actual situation of the deep neural network. Optionally, the method of dividing the tensor of the convolutional layer into dimension units may be specifically divided into the following:
in the first mode, according to the row parameters of the filter space dimension in the structure parameters of the convolutional layer, the weights in the same row in the tensor of the convolutional layer are divided into units with the same dimension, and a plurality of row dimension units are obtained.
For the case that the difference between the weights of the same row in the convolutional layer is small, the weights in the same row in the tensor of the convolutional layer can be divided into units of the same dimension according to the row parameters of the spatial dimension of the filter in the structural parameters of the convolutional layer, so that the unit of the row dimension shown in fig. 3 can be obtained, and the mathematical expression of the unit of the row dimension is as follows:
cell=W(c,k,r,:)
in the second mode, the weights in the same column in the tensor of the convolutional layer are divided into units with the same dimension according to the column parameters of the spatial dimension of the filter in the structural parameters of the convolutional layer, and a plurality of units with the column dimension are obtained.
For the case that the weight difference of the same column in the convolutional layer is small, the weights in the same column in the tensor of the convolutional layer can be divided into the same dimension unit according to the column parameters of the filter space dimension in the structural parameters of the convolutional layer, so as to obtain the column dimension unit shown in fig. 4, where the mathematical expression of the column dimension unit is:
cell=W(c,k,:,s)
in the third mode, according to the filter space dimension parameter in the structure parameters of the convolutional layer, the weights in the same filter space in the tensor of the convolutional layer are divided into the same dimension unit, and a plurality of space dimension units are obtained.
For the situation that the weight difference in the same filter space in the convolutional layer is small, the weights in the same filter space in the tensor of the convolutional layer can be divided into the same dimension unit according to the filter space dimension parameter in the structure parameters of the convolutional layer, so that the space dimension unit shown in fig. 5 can be obtained, and the mathematical expression of the space dimension unit is as follows:
cell=W(c,k,:,:)
in the fourth mode, according to the input channel parameters in the structure parameters of the convolutional layer, the weights in the same input channel in the tensor of the convolutional layer are divided into the same dimension unit, and a plurality of input channel dimension units are obtained.
For the case that the difference of the weights in the same input channel in the convolutional layer is small, the weights in the same input channel in the tensor of the convolutional layer can be divided into the same dimension unit according to the input channel parameters in the structural parameters of the convolutional layer, so as to obtain the input channel dimension unit shown in fig. 6, where the mathematical expression of the input channel dimension unit is:
cell=W(c,:,r,s)
The above gives only four exemplary ways of dividing the tensor of a convolutional layer into dimension units; the actual division is not limited to these. It is only required that weights with relatively similar values be divided into the same dimension unit (for example, several rows, or several columns, may be grouped into one dimension unit), so that each weight in the same dimension unit has a certain regularity. A combined sketch of the four modes is given after this paragraph.
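As an illustration only, the four division modes can be written as index patterns over the weight tensor. This sketch assumes the PyTorch layout from the earlier example; the function names are hypothetical, and each function returns a list of views into the tensor, one view per dimension unit:

def row_units(w):      # mode 1: cell = W(c, k, r, :)
    C, K, R, S = w.shape
    return [w[c, k, r, :] for c in range(C) for k in range(K) for r in range(R)]

def col_units(w):      # mode 2: cell = W(c, k, :, s)
    C, K, R, S = w.shape
    return [w[c, k, :, s] for c in range(C) for k in range(K) for s in range(S)]

def spatial_units(w):  # mode 3: cell = W(c, k, :, :)
    C, K, R, S = w.shape
    return [w[c, k, :, :] for c in range(C) for k in range(K)]

def channel_units(w):  # mode 4: cell = W(c, :, r, s)
    C, K, R, S = w.shape
    return [w[c, :, r, s] for c in range(C) for r in range(R) for s in range(S)]

Because each returned cell is a view rather than a copy, zeroing a cell in place also zeroes the corresponding weights in the convolutional layer's tensor.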
S203, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit.
After the tensor of the convolutional layer has been divided into dimension units, each weight in the same dimension unit has a certain regularity, so a dimension unit can be regarded as a structured unit: based on the weights of one dimension unit, the same sparsification operation can be performed on every weight in that unit, and after the sparsification operation each processed dimension unit is, as a whole, either all zero values or all non-zero values.
The criterion for applying the sparsification operation to the weights in a dimension unit can take various forms. For example, it can be judged whether the maximum weight in the dimension unit is smaller than a preset threshold; if so, the weights of the unit are unimportant to the operation of the deep neural network and can all be set to zero. It can be judged whether the minimum weight in the dimension unit is greater than a preset threshold; if so, the weights of the unit are important and can all be retained. The average of the weights in the dimension unit can be computed and compared with a preset threshold; if the average is greater, the weights are important and retained, otherwise they are unimportant and set to zero. Or the absolute values of the weights in the dimension unit can be summed and the result compared with a preset threshold; if the sum is greater, the weights are important and retained, otherwise they are set to zero. The criterion is not limited to the above: any approach that comprehensively measures the importance of the weights in a dimension unit to the operation of the deep neural network falls within the protection scope of the embodiments of the present application, and these are not described in detail here. Illustrative sketches of these criteria follow this paragraph.
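For concreteness, each criterion above reduces a dimension unit to a scalar importance score that is then compared with a preset threshold. A brief sketch under the assumptions of the earlier examples (the function names are illustrative, not from the embodiments):

def max_weight(cell):   # zero the unit if this value is below the threshold
    return cell.max()

def min_weight(cell):   # keep the unit if this value exceeds the threshold
    return cell.min()

def mean_weight(cell):  # keep the unit if the average exceeds the threshold
    return cell.mean()

def l1_sum(cell):       # the criterion used in the optional refinement below
    return cell.abs().sum()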
Optionally, S203 may specifically be:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
Although the values of the weights in the same dimension unit are relatively close, there will occasionally be weights that differ considerably; as a whole, however, the weights in the same dimension unit remain similar. Therefore, to avoid distorting the final sparsification, the importance of a dimension unit to the whole deep neural network can be evaluated by summing the absolute values of the weights in the unit: a dimension unit with a large sum is treated as important, and one with a small sum as unimportant. All weights in an unimportant dimension unit are set to zero, while the original weights of an important dimension unit are kept unchanged. In the subsequent deep neural network training process, only the weights of the important dimension units need to be fine-tuned, which reduces the computation of the whole deep neural network operation.
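A minimal sketch of this thresholding step, continuing the assumptions of the earlier sketches (the function name and the threshold value are placeholders):

def sparsify_units(units, threshold=1.0):
    for cell in units:
        if cell.abs().sum() <= threshold:
            cell.zero_()  # unimportant unit: every weight is set to zero
        # otherwise the weights of the unit are kept unchanged

# Example usage with the spatial-dimension division from the earlier sketch:
# sparsify_units(spatial_units(w), threshold=0.5)

Because the units are views into the convolutional layer's tensor, zeroing them in place sparsifies the layer itself.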
By applying this embodiment, the tensor and structural parameters of a convolutional layer in the deep neural network are obtained; the weights in the same dimension in the tensor of the convolutional layer are divided into units of the same dimension by a preset dimension division method according to the structural parameters of the convolutional layer, obtaining a plurality of dimension units; and, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency. In addition, because the convolutional layer sparsification of the neural network has a certain regularity, the sparsified deep neural network can run on general-purpose hardware acceleration platforms, which greatly reduces hardware design cost.
Corresponding to the above method embodiment, an embodiment of the present application provides a convolutional layer sparsification apparatus of a deep neural network, and as shown in fig. 7, the apparatus may include:
an obtaining module 710, configured to obtain tensors and structural parameters of convolutional layers in a deep neural network;
a dividing module 720, configured to divide, according to the structural parameters, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units;
and a sparsification module 730, configured to perform, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing module 720 may be specifically configured to:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module 720 may be specifically configured to:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
Optionally, the dividing module 720 may be specifically configured to:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
Optionally, the sparsification module 730 may be specifically configured to:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
By applying this embodiment, the tensor and structural parameters of a convolutional layer in the deep neural network are obtained; the weights in the same dimension in the tensor of the convolutional layer are divided into units of the same dimension by a preset dimension division method according to the structural parameters of the convolutional layer, obtaining a plurality of dimension units; and, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
An embodiment of the present application further provides an electronic device. As shown in fig. 8, the electronic device may include a processor 801 and a machine-readable storage medium 802, where the machine-readable storage medium 802 stores machine-executable instructions executable by the processor 801, and the machine-executable instructions cause the processor 801 to implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
The machine-readable storage medium may include RAM (Random Access Memory) and may also include NVM (Non-Volatile Memory), for example at least one disk memory. Alternatively, the machine-readable storage medium may be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The machine-readable storage medium 802 and the processor 801 may be in data communication via a wired or wireless connection, and the electronic device may communicate with other devices via a wired or wireless communication interface. Fig. 8 shows only an example of data transmission between the processor 801 and the machine-readable storage medium 802 via a bus, and the connection manner is not limited in particular.
In this embodiment, by reading and executing the machine-executable instructions stored in the machine-readable storage medium 802, the processor 801 can implement the following: obtaining the tensor and structural parameters of a convolutional layer in a deep neural network; dividing the weights in the same dimension in the tensor of the convolutional layer into units of the same dimension by using a preset dimension division method according to the structural parameters of the convolutional layer, to obtain a plurality of dimension units; and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
The embodiments of the present application also provide a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, when run, execute the convolutional layer sparsification method of the deep neural network provided in the embodiments of the present application, thereby realizing the following: obtaining the tensor and structural parameters of a convolutional layer in a deep neural network; dividing the weights in the same dimension in the tensor of the convolutional layer into units of the same dimension by using a preset dimension division method according to the structural parameters of the convolutional layer, to obtain a plurality of dimension units; and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
For the embodiments of the electronic device and the machine-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, and the machine-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A method for convolutional layer sparsification of a deep neural network, the method comprising:
obtaining tensor and structural parameters of a convolutional layer in a deep neural network;
dividing weights in the tensor in the same dimension into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
2. The method according to claim 1, wherein the dividing weights in the tensor in the same dimension into units of the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units comprises:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
3. The method according to claim 1, wherein the dividing weights in the tensor in the same dimension into units of the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units comprises:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
4. The method according to claim 1, wherein the dividing weights in the tensor in the same dimension into units of the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units comprises:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
5. The method according to claim 1, wherein, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit comprises:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
6. An apparatus for convolutional layer sparsification of a deep neural network, the apparatus comprising:
the acquisition module is used for acquiring tensor and structural parameters of a convolutional layer in the deep neural network;
the dividing module is used for dividing the weights in the same dimensionality in the tensor into units in the same dimensionality by using a preset dimensionality dividing method according to the structural parameters to obtain a plurality of dimensionality units;
and the sparsification module is used for performing, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
7. The apparatus according to claim 6, wherein the partitioning module is specifically configured to:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
8. The apparatus according to claim 6, wherein the partitioning module is specifically configured to:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
9. The apparatus according to claim 6, wherein the partitioning module is specifically configured to:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
10. The apparatus of claim 6, wherein the sparsification module is specifically configured to:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
CN201811320668.9A 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network Active CN111160516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Publications (2)

Publication Number Publication Date
CN111160516A true CN111160516A (en) 2020-05-15
CN111160516B CN111160516B (en) 2023-09-05

Family

ID=70555336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811320668.9A Active CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Country Status (1)

Country Link
CN (1) CN111160516B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344200A * 2021-06-17 2021-09-03 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for training separable convolutional network, road side equipment and cloud control platform
WO2022095984A1 (en) * 2020-11-06 2022-05-12 Moffett Technologies Co., Limited Method and system for convolution with workload-balanced activation sparsity
CN114692847A * 2020-12-25 2022-07-01 Cambricon Technologies Corporation Limited Data processing circuit, data processing method and related product

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297A * 2016-06-02 2016-11-16 Institute of Automation, Chinese Academy of Sciences Acceleration and compression method for deep convolutional neural networks based on tensor decomposition
US20170140263A1 (en) * 2015-11-12 2017-05-18 Google Inc. Convolutional gated recurrent neural networks
US20170316312A1 (en) * 2016-05-02 2017-11-02 Cavium, Inc. Systems and methods for deep learning processor
CN107480770A * 2017-07-27 2017-12-15 Institute of Automation, Chinese Academy of Sciences Neural network quantization and compression method and device with adjustable quantization bit width
CN107729995A * 2017-10-31 2018-02-23 Institute of Computing Technology, Chinese Academy of Sciences Method and system for accelerating a neural network processor, and neural network processor
CN107784276A * 2017-10-13 2018-03-09 Central South University Microseismic event recognition method and device
US20180075344A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
US20180082179A1 (en) * 2016-09-19 2018-03-22 Vicarious Fpc, Inc. Systems and methods for deep learning with small training sets
CN108073917A * 2018-01-24 2018-05-25 Yanshan University Face recognition method based on convolutional neural networks
US20180165577A1 (en) * 2016-12-13 2018-06-14 Google Inc. Performing average pooling in hardware
US20180181857A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Reduced Complexity Convolution for Convolutional Neural Networks
CN108280514A * 2018-01-05 2018-07-13 University of Science and Technology of China FPGA-based sparse neural network acceleration system and design method
CN108345831A * 2017-12-28 2018-07-31 Xinzhi Digital Technology Co., Ltd. Road image segmentation method and apparatus based on point cloud data, and electronic device

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170140263A1 (en) * 2015-11-12 2017-05-18 Google Inc. Convolutional gated recurrent neural networks
US20170316312A1 (en) * 2016-05-02 2017-11-02 Cavium, Inc. Systems and methods for deep learning processor
CN106127297A * 2016-06-02 2016-11-16 Institute of Automation, Chinese Academy of Sciences Acceleration and compression method for deep convolutional neural networks based on tensor decomposition
US20180075344A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
US20180082179A1 (en) * 2016-09-19 2018-03-22 Vicarious Fpc, Inc. Systems and methods for deep learning with small training sets
US20180165577A1 (en) * 2016-12-13 2018-06-14 Google Inc. Performing average pooling in hardware
US20180181857A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Reduced Complexity Convolution for Convolutional Neural Networks
CN107480770A * 2017-07-27 2017-12-15 Institute of Automation, Chinese Academy of Sciences Neural network quantization and compression method and device with adjustable quantization bit width
CN107784276A * 2017-10-13 2018-03-09 Central South University Microseismic event recognition method and device
CN107729995A * 2017-10-31 2018-02-23 Institute of Computing Technology, Chinese Academy of Sciences Method and system for accelerating a neural network processor, and neural network processor
CN108345831A * 2017-12-28 2018-07-31 Xinzhi Digital Technology Co., Ltd. Road image segmentation method and apparatus based on point cloud data, and electronic device
CN108280514A * 2018-01-05 2018-07-13 University of Science and Technology of China FPGA-based sparse neural network acceleration system and design method
CN108073917A * 2018-01-24 2018-05-25 Yanshan University Face recognition method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PU Shiliang, ZHENG Gang, WANG Jie: "Low-Resolution Natural Scene Text Recognition", China Security & Protection (《中国安防》) *
XIE Di: "What the Rise of Deep Learning Technology Will Bring to the Security Industry", China Security & Protection (《中国安防》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022095984A1 (en) * 2020-11-06 2022-05-12 Moffett Technologies Co., Limited Method and system for convolution with workload-balanced activation sparsity
CN114692847A * 2020-12-25 2022-07-01 Cambricon Technologies Corporation Limited Data processing circuit, data processing method and related product
CN114692847B * 2020-12-25 2024-01-09 Cambricon Technologies Corporation Limited Data processing circuit, data processing method and related products
CN113344200A * 2021-06-17 2021-09-03 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for training separable convolutional network, road side equipment and cloud control platform

Also Published As

Publication number Publication date
CN111160516B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
CN111144561B (en) Neural network model determining method and device
CN111160516A (en) Convolutional layer sparsization method and device of deep neural network
CN111176820A (en) Deep neural network-based edge computing task allocation method and device
KR20200079059A (en) Method and apparatus for processing neural network based on bitwise operation
CN112183295A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
CN110874625A (en) Deep neural network quantification method and device
CN109325530B (en) Image classification method, storage device and processing device
CN111178258B (en) Image identification method, system, equipment and readable storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN112990420A (en) Pruning method for convolutional neural network model
CN109754077B (en) Network model compression method and device of deep neural network and computer equipment
CN112132278A (en) Model compression method and device, computer equipment and storage medium
CN114091554A (en) Training set processing method and device
CN115310595A (en) Neural network mixing precision quantification method and system of memristor
CN111160517A (en) Convolutional layer quantization method and device of deep neural network
JP2020027604A (en) Information processing method, and information processing system
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
US11810265B2 (en) Image reconstruction method and device, apparatus, and non-transitory computer-readable storage medium
WO2021055364A1 (en) Efficient inferencing with fast pointwise convolution
CN109754066B (en) Method and apparatus for generating a fixed-point neural network
CN117436482A (en) Neural network pruning method, device, chip, equipment, storage medium and product
CN113255576B (en) Face recognition method and device
US20220108156A1 (en) Hardware architecture for processing data in sparse neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant