CN111160516A - Convolutional layer sparsification method and device of deep neural network - Google Patents

Convolutional layer sparsification method and device of deep neural network

Info

Publication number
CN111160516A
Authority
CN
China
Prior art keywords
dimension
same
units
dimensionality
weights
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811320668.9A
Other languages
Chinese (zh)
Other versions
CN111160516B (en)
Inventor
Zhang Yuan (张渊)
Xie Di (谢迪)
Pu Shiliang (浦世亮)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811320668.9A priority Critical patent/CN111160516B/en
Publication of CN111160516A publication Critical patent/CN111160516A/en
Application granted granted Critical
Publication of CN111160516B publication Critical patent/CN111160516B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiments of the present application provide a convolutional layer sparsification method and device for a deep neural network. The method comprises the following steps: obtaining the tensor and structural parameters of a convolutional layer in a deep neural network; dividing, according to the structural parameters of the convolutional layer, the weights in the same dimension in the tensor of the convolutional layer into units of the same dimension by using a preset dimension division method, to obtain a plurality of dimension units; and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. This scheme can improve memory access efficiency.

Description

Convolutional layer sparsification method and device of deep neural network
Technical Field
The present application relates to the technical field of machine learning, and in particular to a convolutional layer sparsification method and device of a deep neural network.
Background
The deep neural network is an emerging field in machine learning research: it analyzes data by emulating mechanisms of the human brain, and is an intelligent model that learns through brain-inspired structures. Deep neural networks have been applied successfully in fields such as target detection and segmentation, behavior detection and recognition, and speech recognition. However, as deep neural networks continue to develop, their scale keeps growing, and the amount of data to be stored and computed becomes larger and larger, so deep neural networks have high computational complexity and require powerful hardware resources.
To reduce the computational complexity of deep neural networks and relieve the pressure on hardware resources, the network model of the deep neural network needs to be compressed. Current mainstream methods for compressing the network model of a deep neural network include fixed-point quantization, sparsification, and the like. Sparsification mainly sets unimportant weights in the deep neural network to zero while keeping important weights unchanged, thereby converting the dense weights of the deep neural network into sparse weights and significantly reducing both storage and computation.
However, in related sparsification methods, the sparsification operation targets each individual weight in the deep neural network, and the irregularity of the individual weights makes the resulting sparsity pattern irregular. During subsequent network model training and other processing, although some weights in the deep neural network have been set to zero, there is no regular way to determine which weights were zeroed, so every weight must still be read from memory, which results in low memory access efficiency.
Disclosure of Invention
An object of the embodiments of the present application is to provide a convolutional layer sparsification method and device of a deep neural network, so as to improve memory access efficiency. The specific technical solutions are as follows:
in a first aspect, an embodiment of the present application provides a convolutional layer sparsification method for a deep neural network, where the method includes:
obtaining tensor and structural parameters of a convolutional layer in a deep neural network;
dividing weights in the tensor in the same dimension into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing, according to the structural parameter, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units includes:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing, according to the structural parameter, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units includes:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
Optionally, the dividing, according to the structural parameter, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units includes:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
Optionally, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weight in the dimension unit, including:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a second aspect, an embodiment of the present application provides a convolutional layer sparsification apparatus of a deep neural network, the apparatus including:
the acquisition module is used for acquiring tensor and structural parameters of a convolutional layer in the deep neural network;
the dividing module is used for dividing the weights in the same dimensionality in the tensor into units in the same dimensionality by using a preset dimensionality dividing method according to the structural parameters to obtain a plurality of dimensionality units;
and the sparsification module is used for performing, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing module is specifically configured to:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module is specifically configured to:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
Optionally, the dividing module is specifically configured to:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
Optionally, the sparsification module is specifically configured to:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a machine-readable storage medium that stores machine-executable instructions executable by the processor, the machine-executable instructions causing the processor to implement all the steps of the convolutional layer sparsification method of a deep neural network provided in the first aspect of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a machine-readable storage medium storing machine-executable instructions, which when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of a deep neural network provided in the first aspect of the embodiments of the present application.
According to the convolutional layer sparsification method and device of a deep neural network provided by the embodiments of the present application, the tensor and structural parameters of a convolutional layer in the deep neural network are obtained; the weights in the same dimension in the tensor of the convolutional layer are divided into units of the same dimension by a preset dimension division method according to the structural parameters of the convolutional layer, obtaining a plurality of dimension units; and, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a schematic diagram illustrating the effect of a related method for sparsifying a convolutional layer of a deep neural network;
fig. 2 is a schematic flowchart of a convolutional layer sparsification method of a deep neural network according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a row dimension unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a column dimension unit according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a spatial dimension unit according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an input channel dimension unit according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a convolutional layer sparsifying device of a deep neural network according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a schematic diagram of the effect of a related method for sparsifying model weights. Because the sparsification operation targets a single weight, the sparsity pattern of the weights in each filter of size R × S in a convolutional layer of size C × K × R × S of a deep neural network is irregular (in fig. 1, a black small box represents a weight that remains unchanged, and a white small box represents a weight set to zero). In theory, this sparsification method can convert the convolutional layer tensor of the deep neural network into a very sparse convolutional layer tensor and greatly reduce the computation and storage resource consumption of the deep neural network.
However, due to this irregularity, such a sparsification method cannot be well supported on general-purpose hardware acceleration platforms (such as a GPU (Graphics Processing Unit)) and often needs a customized hardware accelerator, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit), to implement. Although this irregular weight sparsification method can be realized by customizing a hardware accelerator, it often incurs a higher hardware design cost.
Therefore, in order to improve the access efficiency of the memory and reduce the design cost of the customized hardware accelerator, embodiments of the present application provide a method and an apparatus for sparse convolutional layers of a deep neural network, an electronic device, and a machine-readable storage medium.
First, the convolutional layer sparsification method of a deep neural network provided in the embodiments of the present application is described below.
The execution body of the convolutional layer sparsification method of a deep neural network provided by the embodiments of the present application may be an electronic device implementing functions such as target detection and segmentation, behavior detection and recognition, or speech recognition. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the execution body.
As shown in fig. 2, a convolutional layer sparsification method of a deep neural network provided in an embodiment of the present application may include the following steps:
s201, tensor and structural parameters of a convolutional layer in the deep neural network are obtained.
The deep neural network here may be any of a wide range of data processing models; specifically, it may be a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), or the like.
Because each network layer of the deep neural network mainly performs convolution operations, the network layers are also called convolutional layers. The tensor of a convolutional layer consists of the concrete weights of that layer; for example, when a convolutional layer W is expressed in the dimensions C × K × R × S, its tensor is the corresponding four-dimensional weight tensor. The structural parameters of the convolutional layer include the number of output channels C, the number of input channels K, and the filter spatial dimension size R × S.
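As a concrete illustration, the following is a minimal sketch of step S201, assuming the network is built with PyTorch and the layer is a standard Conv2d; the helper name get_tensor_and_structure is hypothetical and only mirrors the description above.

import torch.nn as nn

def get_tensor_and_structure(conv: nn.Conv2d):
    # PyTorch stores Conv2d weights as (out_channels, in_channels, rows, cols),
    # which matches the C x K x R x S layout used in this description.
    w = conv.weight.data
    C, K, R, S = w.shape
    return w, (C, K, R, S)

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3)
w, (C, K, R, S) = get_tensor_and_structure(conv)
print(C, K, R, S)  # prints: 32 16 3 3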
S202, dividing weights in the same dimension in the tensor of the convolutional layer into units in the same dimension by using a preset dimension dividing method according to the structural parameters of the convolutional layer to obtain a plurality of dimension units.
Considering the characteristics of convolution, the weights in the same dimension in the same convolutional layer of a deep neural network often have similar numerical ranges; for example, in some convolutional layers the weights in the same row are similar, in others the weights in the same column are similar, and in yet others the weights in the same filter spatial dimension are similar. This is related to the initial setting of the convolutional layer structure and to the functional setting of the deep neural network. Therefore, the weights in the same dimension in the tensor of the convolutional layer can be divided into units of the same dimension according to the structural parameters of the convolutional layer. The same dimensions mentioned in the embodiments of the present application may include, but are not limited to: the same filter spatial dimension, the same row or the same column in the filter spatial dimension, and the same input channel.
The embodiment of the application can determine the form of dimension unit division according to the actual situation of the deep neural network. Optionally, the method of dividing the tensor of the convolutional layer into dimension units may be specifically divided into the following:
in the first mode, according to the row parameters of the filter space dimension in the structure parameters of the convolutional layer, the weights in the same row in the tensor of the convolutional layer are divided into units with the same dimension, and a plurality of row dimension units are obtained.
For the case that the difference between the weights of the same row in the convolutional layer is small, the weights in the same row in the tensor of the convolutional layer can be divided into units of the same dimension according to the row parameters of the spatial dimension of the filter in the structural parameters of the convolutional layer, so that the unit of the row dimension shown in fig. 3 can be obtained, and the mathematical expression of the unit of the row dimension is as follows:
cell=W(c,k,r,:)
in the second mode, the weights in the same column in the tensor of the convolutional layer are divided into units with the same dimension according to the column parameters of the spatial dimension of the filter in the structural parameters of the convolutional layer, and a plurality of units with the column dimension are obtained.
For the case that the weight difference of the same column in the convolutional layer is small, the weights in the same column in the tensor of the convolutional layer can be divided into the same dimension unit according to the column parameters of the filter space dimension in the structural parameters of the convolutional layer, so as to obtain the column dimension unit shown in fig. 4, where the mathematical expression of the column dimension unit is:
cell=W(c,k,:,s)
in the third mode, according to the filter space dimension parameter in the structure parameters of the convolutional layer, the weights in the same filter space in the tensor of the convolutional layer are divided into the same dimension unit, and a plurality of space dimension units are obtained.
For the situation that the weight difference in the same filter space in the convolutional layer is small, the weights in the same filter space in the tensor of the convolutional layer can be divided into the same dimension unit according to the filter space dimension parameter in the structure parameters of the convolutional layer, so that the space dimension unit shown in fig. 5 can be obtained, and the mathematical expression of the space dimension unit is as follows:
cell=W(c,k,:,:)
in the fourth mode, according to the input channel parameters in the structure parameters of the convolutional layer, the weights in the same input channel in the tensor of the convolutional layer are divided into the same dimension unit, and a plurality of input channel dimension units are obtained.
For the case that the difference of the weights in the same input channel in the convolutional layer is small, the weights in the same input channel in the tensor of the convolutional layer can be divided into the same dimension unit according to the input channel parameters in the structural parameters of the convolutional layer, so as to obtain the input channel dimension unit shown in fig. 6, where the mathematical expression of the input channel dimension unit is:
cell=W(c,:,r,s)
The above gives only four exemplary ways of dividing the tensor of a convolutional layer into dimension units; the actual division is not limited to these. It is only required that weights with relatively similar values be divided into the same dimension unit (for example, several rows, or several columns, may be grouped into one dimension unit), so that each weight in the same dimension unit has a certain regularity. A combined sketch of the four modes is given after this paragraph.
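As an illustration only, the four division modes can be written as index patterns over the weight tensor. This sketch assumes the PyTorch layout from the earlier example; the function names are hypothetical, and each function returns a list of views into the tensor, one view per dimension unit:

def row_units(w):      # mode 1: cell = W(c, k, r, :)
    C, K, R, S = w.shape
    return [w[c, k, r, :] for c in range(C) for k in range(K) for r in range(R)]

def col_units(w):      # mode 2: cell = W(c, k, :, s)
    C, K, R, S = w.shape
    return [w[c, k, :, s] for c in range(C) for k in range(K) for s in range(S)]

def spatial_units(w):  # mode 3: cell = W(c, k, :, :)
    C, K, R, S = w.shape
    return [w[c, k, :, :] for c in range(C) for k in range(K)]

def channel_units(w):  # mode 4: cell = W(c, :, r, s)
    C, K, R, S = w.shape
    return [w[c, :, r, s] for c in range(C) for r in range(R) for s in range(S)]

Because each returned cell is a view rather than a copy, zeroing a cell in place also zeroes the corresponding weights in the convolutional layer's tensor.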
S203, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit.
After the tensor of the convolutional layer has been divided into dimension units, each weight in the same dimension unit has a certain regularity, so a dimension unit can be regarded as a structured unit: based on the weights of one dimension unit, the same sparsification operation can be performed on every weight in that unit, and after the sparsification operation each processed dimension unit is, as a whole, either all zero values or all non-zero values.
The criterion for applying the sparsification operation to the weights in a dimension unit can take various forms. For example, it can be judged whether the maximum weight in the dimension unit is smaller than a preset threshold; if so, the weights of the unit are unimportant to the operation of the deep neural network and can all be set to zero. It can be judged whether the minimum weight in the dimension unit is greater than a preset threshold; if so, the weights of the unit are important and can all be retained. The average of the weights in the dimension unit can be computed and compared with a preset threshold; if the average is greater, the weights are important and retained, otherwise they are unimportant and set to zero. Or the absolute values of the weights in the dimension unit can be summed and the result compared with a preset threshold; if the sum is greater, the weights are important and retained, otherwise they are set to zero. The criterion is not limited to the above: any approach that comprehensively measures the importance of the weights in a dimension unit to the operation of the deep neural network falls within the protection scope of the embodiments of the present application, and these are not described in detail here. Illustrative sketches of these criteria follow this paragraph.
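For concreteness, each criterion above reduces a dimension unit to a scalar importance score that is then compared with a preset threshold. A brief sketch under the assumptions of the earlier examples (the function names are illustrative, not from the embodiments):

def max_weight(cell):   # zero the unit if this value is below the threshold
    return cell.max()

def min_weight(cell):   # keep the unit if this value exceeds the threshold
    return cell.min()

def mean_weight(cell):  # keep the unit if the average exceeds the threshold
    return cell.mean()

def l1_sum(cell):       # the criterion used in the optional refinement below
    return cell.abs().sum()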
Optionally, S203 may specifically be:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
Although the values of the weights in the same dimension unit are relatively close, there will occasionally be weights that differ considerably; as a whole, however, the weights in the same dimension unit remain similar. Therefore, to avoid distorting the final sparsification, the importance of a dimension unit to the whole deep neural network can be evaluated by summing the absolute values of the weights in the unit: a dimension unit with a large sum is treated as important, and one with a small sum as unimportant. All weights in an unimportant dimension unit are set to zero, while the original weights of an important dimension unit are kept unchanged. In the subsequent deep neural network training process, only the weights of the important dimension units need to be fine-tuned, which reduces the computation of the whole deep neural network operation.
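A minimal sketch of this thresholding step, continuing the assumptions of the earlier sketches (the function name and the threshold value are placeholders):

def sparsify_units(units, threshold=1.0):
    for cell in units:
        if cell.abs().sum() <= threshold:
            cell.zero_()  # unimportant unit: every weight is set to zero
        # otherwise the weights of the unit are kept unchanged

# Example usage with the spatial-dimension division from the earlier sketch:
# sparsify_units(spatial_units(w), threshold=0.5)

Because the units are views into the convolutional layer's tensor, zeroing them in place sparsifies the layer itself.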
By applying this embodiment, the tensor and structural parameters of a convolutional layer in the deep neural network are obtained; the weights in the same dimension in the tensor of the convolutional layer are divided into units of the same dimension by a preset dimension division method according to the structural parameters of the convolutional layer, obtaining a plurality of dimension units; and, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency. In addition, because the convolutional layer sparsification of the neural network has a certain regularity, the sparsified deep neural network can run on general-purpose hardware acceleration platforms, which greatly reduces hardware design cost.
Corresponding to the above method embodiment, an embodiment of the present application provides a convolutional layer sparsification apparatus of a deep neural network, and as shown in fig. 7, the apparatus may include:
an obtaining module 710, configured to obtain tensors and structural parameters of convolutional layers in a deep neural network;
a dividing module 720, configured to divide, according to the structural parameters, the weights in the same dimension in the tensor into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units;
and a sparsification module 730, configured to perform, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing module 720 may be specifically configured to:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module 720 may be specifically configured to:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
Optionally, the dividing module 720 may be specifically configured to:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
Optionally, the sparsification module 730 may be specifically configured to:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
By applying this embodiment, the tensor and structural parameters of a convolutional layer in the deep neural network are obtained; the weights in the same dimension in the tensor of the convolutional layer are divided into units of the same dimension by a preset dimension division method according to the structural parameters of the convolutional layer, obtaining a plurality of dimension units; and, for each dimension unit, the same sparsification operation is performed on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
An embodiment of the present application further provides an electronic device. As shown in fig. 8, the electronic device may include a processor 801 and a machine-readable storage medium 802, where the machine-readable storage medium 802 stores machine-executable instructions executable by the processor 801, and the machine-executable instructions cause the processor 801 to implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
The machine-readable storage medium may include RAM (Random Access Memory) and may also include NVM (Non-Volatile Memory), for example at least one disk memory. Alternatively, the machine-readable storage medium may be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The machine-readable storage medium 802 and the processor 801 may be in data communication via a wired or wireless connection, and the electronic device may communicate with other devices via a wired or wireless communication interface. Fig. 8 shows only an example of data transmission between the processor 801 and the machine-readable storage medium 802 via a bus, and the connection manner is not limited in particular.
In this embodiment, by reading and executing the machine-executable instructions stored in the machine-readable storage medium 802, the processor 801 can implement the following: obtaining the tensor and structural parameters of a convolutional layer in a deep neural network; dividing the weights in the same dimension in the tensor of the convolutional layer into units of the same dimension by using a preset dimension division method according to the structural parameters of the convolutional layer, to obtain a plurality of dimension units; and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
The embodiments of the present application also provide a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, when run, execute the convolutional layer sparsification method of the deep neural network provided in the embodiments of the present application, thereby realizing the following: obtaining the tensor and structural parameters of a convolutional layer in a deep neural network; dividing the weights in the same dimension in the tensor of the convolutional layer into units of the same dimension by using a preset dimension division method according to the structural parameters of the convolutional layer, to obtain a plurality of dimension units; and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolutional layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to every weight in the same dimension unit, the convolutional layer sparsification of the deep neural network has a certain regularity: when the sparsified weights are stored, the dimension units whose weights were set to zero can be marked, and when weights are read from the memory, only the weights that were not set to zero need to be read based on those marks, which improves memory access efficiency.
For the embodiments of the electronic device and the machine-readable storage medium, since the contents of the related methods are substantially similar to those of the foregoing embodiments of the methods, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the embodiments of the methods.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, and the machine-readable storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and in relation to the description, reference may be made to some portions of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. A method for convolutional layer sparsification of a deep neural network, the method comprising:
obtaining tensor and structural parameters of a convolutional layer in a deep neural network;
dividing weights in the tensor in the same dimension into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
and, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
2. The method according to claim 1, wherein the dividing weights in the tensor in the same dimension into units of the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units comprises:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
3. The method according to claim 1, wherein the dividing weights in the tensor in the same dimension into units of the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units comprises:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
4. The method according to claim 1, wherein the dividing weights in the tensor in the same dimension into units of the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units comprises:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
5. The method according to claim 1, wherein, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit comprises:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
6. An apparatus for convolutional layer sparsification of a deep neural network, the apparatus comprising:
the acquisition module is used for acquiring tensor and structural parameters of a convolutional layer in the deep neural network;
the dividing module is used for dividing the weights in the same dimensionality in the tensor into units in the same dimensionality by using a preset dimensionality dividing method according to the structural parameters to obtain a plurality of dimensionality units;
and the sparsification module is used for performing, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
7. The apparatus according to claim 6, wherein the partitioning module is specifically configured to:
dividing weights in the same row in the tensor into units with the same dimensionality according to row parameters of filter space dimensionality in the structural parameters to obtain a plurality of row dimensionality units;
alternatively,
and dividing the weights in the same column in the tensor into units with the same dimension according to the column parameters of the filter space dimension in the structural parameters to obtain a plurality of column dimension units.
8. The apparatus according to claim 6, wherein the partitioning module is specifically configured to:
and dividing the weights in the same filter space in the tensor into the same dimensionality unit according to the filter space dimensionality parameter in the structural parameters to obtain a plurality of space dimensionality units.
9. The apparatus according to claim 6, wherein the partitioning module is specifically configured to:
and dividing the weights in the same input channel in the tensor into the same dimensionality unit according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimensionality units.
10. The apparatus of claim 6, wherein the sparsification module is specifically configured to:
for each dimension unit, carrying out summation calculation on absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value or not;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
CN201811320668.9A 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network Active CN111160516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Publications (2)

Publication Number Publication Date
CN111160516A true CN111160516A (en) 2020-05-15
CN111160516B CN111160516B (en) 2023-09-05

Family

ID=70555336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811320668.9A Active CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Country Status (1)

Country Link
CN (1) CN111160516B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344200A * 2021-06-17 2021-09-03 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for training separable convolutional network, road side equipment and cloud control platform
WO2022095984A1 (en) * 2020-11-06 2022-05-12 Moffett Technologies Co., Limited Method and system for convolution with workload-balanced activation sparsity
CN114692847A * 2020-12-25 2022-07-01 Cambricon Technologies Corporation Limited Data processing circuit, data processing method and related product

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297A * 2016-06-02 2016-11-16 Institute of Automation, Chinese Academy of Sciences Acceleration and compression method for deep convolutional neural networks based on tensor decomposition
US20170140263A1 (en) * 2015-11-12 2017-05-18 Google Inc. Convolutional gated recurrent neural networks
US20170316312A1 (en) * 2016-05-02 2017-11-02 Cavium, Inc. Systems and methods for deep learning processor
CN107480770A * 2017-07-27 2017-12-15 Institute of Automation, Chinese Academy of Sciences Neural network quantization and compression method and device with adjustable quantization bit width
CN107729995A * 2017-10-31 2018-02-23 Institute of Computing Technology, Chinese Academy of Sciences Method and system for accelerating a neural network processor, and neural network processor
CN107784276A * 2017-10-13 2018-03-09 Central South University Microseismic event recognition method and device
US20180075344A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
US20180082179A1 (en) * 2016-09-19 2018-03-22 Vicarious Fpc, Inc. Systems and methods for deep learning with small training sets
CN108073917A * 2018-01-24 2018-05-25 Yanshan University Face recognition method based on convolutional neural networks
US20180165577A1 (en) * 2016-12-13 2018-06-14 Google Inc. Performing average pooling in hardware
US20180181857A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Reduced Complexity Convolution for Convolutional Neural Networks
CN108280514A * 2018-01-05 2018-07-13 University of Science and Technology of China FPGA-based sparse neural network acceleration system and design method
CN108345831A * 2017-12-28 2018-07-31 Xinzhi Digital Technology Co., Ltd. Road image segmentation method and apparatus based on point cloud data, and electronic device

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170140263A1 (en) * 2015-11-12 2017-05-18 Google Inc. Convolutional gated recurrent neural networks
US20170316312A1 (en) * 2016-05-02 2017-11-02 Cavium, Inc. Systems and methods for deep learning processor
CN106127297A * 2016-06-02 2016-11-16 Institute of Automation, Chinese Academy of Sciences Acceleration and compression method for deep convolutional neural networks based on tensor decomposition
US20180075344A1 (en) * 2016-09-09 2018-03-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
US20180082179A1 (en) * 2016-09-19 2018-03-22 Vicarious Fpc, Inc. Systems and methods for deep learning with small training sets
US20180165577A1 (en) * 2016-12-13 2018-06-14 Google Inc. Performing average pooling in hardware
US20180181857A1 (en) * 2016-12-27 2018-06-28 Texas Instruments Incorporated Reduced Complexity Convolution for Convolutional Neural Networks
CN107480770A * 2017-07-27 2017-12-15 Institute of Automation, Chinese Academy of Sciences Neural network quantization and compression method and device with adjustable quantization bit width
CN107784276A * 2017-10-13 2018-03-09 Central South University Microseismic event recognition method and device
CN107729995A * 2017-10-31 2018-02-23 Institute of Computing Technology, Chinese Academy of Sciences Method and system for accelerating a neural network processor, and neural network processor
CN108345831A * 2017-12-28 2018-07-31 Xinzhi Digital Technology Co., Ltd. Road image segmentation method and apparatus based on point cloud data, and electronic device
CN108280514A * 2018-01-05 2018-07-13 University of Science and Technology of China FPGA-based sparse neural network acceleration system and design method
CN108073917A * 2018-01-24 2018-05-25 Yanshan University Face recognition method based on convolutional neural networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PU Shiliang, ZHENG Gang, WANG Jie: "Low-Resolution Natural Scene Text Recognition", China Security & Protection (《中国安防》) *
XIE Di: "What the Rise of Deep Learning Technology Will Bring to the Security Industry", China Security & Protection (《中国安防》) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022095984A1 (en) * 2020-11-06 2022-05-12 Moffett Technologies Co., Limited Method and system for convolution with workload-balanced activation sparsity
CN114692847A * 2020-12-25 2022-07-01 Cambricon Technologies Corporation Limited Data processing circuit, data processing method and related product
CN114692847B * 2020-12-25 2024-01-09 Cambricon Technologies Corporation Limited Data processing circuit, data processing method and related products
CN113344200A * 2021-06-17 2021-09-03 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method for training separable convolutional network, road side equipment and cloud control platform

Also Published As

Publication number Publication date
CN111160516B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
CN111144561B (en) Neural network model determining method and device
CN111160516A (en) Convolutional layer sparsization method and device of deep neural network
CN111176820A (en) Deep neural network-based edge computing task allocation method and device
KR20200079059A (en) Method and apparatus for processing neural network based on bitwise operation
CN112183295A (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
CN110874625A (en) Deep neural network quantification method and device
CN109325530B (en) Image classification method, storage device and processing device
CN111178258B (en) Image identification method, system, equipment and readable storage medium
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN112990420A (en) Pruning method for convolutional neural network model
CN109754077B (en) Network model compression method and device of deep neural network and computer equipment
CN112132278A (en) Model compression method and device, computer equipment and storage medium
CN114091554A (en) Training set processing method and device
CN115310595A (en) Neural network mixing precision quantification method and system of memristor
CN111160517A (en) Convolutional layer quantization method and device of deep neural network
JP2020027604A (en) Information processing method, and information processing system
CN114139678A (en) Convolutional neural network quantization method and device, electronic equipment and storage medium
US11810265B2 (en) Image reconstruction method and device, apparatus, and non-transitory computer-readable storage medium
WO2021055364A1 (en) Efficient inferencing with fast pointwise convolution
CN109754066B (en) Method and apparatus for generating a fixed-point neural network
CN117436482A (en) Neural network pruning method, device, chip, equipment, storage medium and product
CN113255576B (en) Face recognition method and device
US20220108156A1 (en) Hardware architecture for processing data in sparse neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant