CN111160516B - Convolutional layer sparsification method and device for deep neural network


Info

Publication number
CN111160516B
CN111160516B (application CN201811320668.9A)
Authority
CN
China
Prior art keywords
dimension
same
units
weight
weights
Prior art date
Legal status
Active
Application number
CN201811320668.9A
Other languages
Chinese (zh)
Other versions
CN111160516A (en)
Inventor
张渊
谢迪
浦世亮
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811320668.9A
Publication of CN111160516A
Application granted
Publication of CN111160516B


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application provides a convolutional layer sparsification method and device for a deep neural network. The method comprises: acquiring the tensor and structural parameters of a convolution layer in a deep neural network; dividing weights in the same dimension in the tensor of the convolution layer into units of the same dimension by using a preset dimension dividing method according to the structural parameters of the convolution layer, to obtain a plurality of dimension units; and, for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. By this scheme, the access efficiency of the memory can be improved.

Description

Convolutional layer sparsification method and device for deep neural network
Technical Field
The application relates to the technical field of machine learning, in particular to a convolutional layer sparsification method and device of a deep neural network.
Background
Deep neural networks are an emerging field in machine learning research. They analyze data by mimicking mechanisms of the human brain and are intelligent models that learn analytically by building and simulating brain-like structures. Deep neural networks have been applied successfully to target detection and segmentation, behavior detection and recognition, voice recognition, and so on. However, as deep neural networks continue to develop, their scale keeps growing, and the amount of data to be stored and computed becomes increasingly large, so that deep neural networks have high computational complexity and require strong hardware resources.
In order to reduce the computational complexity of a deep neural network and relieve the pressure on hardware resources, the network model of the deep neural network needs to be compressed. The current mainstream methods for compressing the network model of a deep neural network include quantization, sparsification, and the like. Sparsification mainly consists of setting unimportant weights in the deep neural network to zero while keeping the important weights unchanged, so that the dense weights of the deep neural network are converted into sparse weights and the storage amount and the calculation amount are significantly reduced.
However, in the related sparsification methods, the sparsification operation is directed at each individual weight in the deep neural network, so the resulting sparsity pattern is irregular. In subsequent processes such as network model training, although some weights in the deep neural network have been set to zero, which weights are zero cannot be determined in a regular way, and every weight still needs to be read from the memory, so the memory access efficiency is low.
Disclosure of Invention
The embodiment of the application aims to provide a convolutional layer sparsification method and device for a deep neural network, so as to improve the access efficiency of the memory. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a convolutional layer sparsification method of a deep neural network, where the method includes:
acquiring tensors and structural parameters of a convolution layer in a deep neural network;
according to the structural parameters, dividing weights in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units;
and for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weight in the dimension unit.
Optionally, according to the structural parameter, a preset dimension dividing method is used to divide weights in the same dimension in the tensor into units in the same dimension, so as to obtain a plurality of dimension units, including:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
Optionally, according to the structural parameter, a preset dimension dividing method is used to divide weights in the same dimension in the tensor into units in the same dimension, so as to obtain a plurality of dimension units, including:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
Optionally, according to the structural parameter, a preset dimension dividing method is used to divide weights in the same dimension in the tensor into units in the same dimension, so as to obtain a plurality of dimension units, including:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
Optionally, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weight in the dimension unit, including:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a second aspect, an embodiment of the present application provides a convolutional layer sparsification apparatus for a deep neural network, where the apparatus includes:
the acquisition module is used for acquiring tensors and structural parameters of the convolution layers in the deep neural network;
the dividing module is used for dividing the weight value in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
the sparse module is used for carrying out, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
Optionally, the dividing module is specifically configured to:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module is specifically configured to:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
Optionally, the dividing module is specifically configured to:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
Optionally, the sparse module is specifically configured to:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
In a third aspect, an embodiment of the present application provides an electronic device comprising a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor, the processor being caused by the machine-executable instructions to implement all the steps of the convolutional layer sparsification method of the deep neural network provided in the first aspect of the embodiment of the application.
In a fourth aspect, an embodiment of the present application provides a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of the deep neural network provided in the first aspect of the present application.
According to the convolutional layer sparsification method and device for a deep neural network provided by the embodiments of the application, the tensor and structural parameters of a convolution layer in the deep neural network are obtained; according to the structural parameters of the convolution layer, weights in the same dimension in the tensor of the convolution layer are divided into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units; and, for each dimension unit, the same sparsification operation is carried out on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of the effect of convolutional layer sparsification of a deep neural network in the related art;
FIG. 2 is a schematic flow chart of a convolutional layer sparsification method of a deep neural network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a row dimension unit according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a column dimension unit according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a spatial dimension unit according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an input channel dimension unit according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a convolutional layer sparsification apparatus of a deep neural network according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Fig. 1 is a schematic diagram of the effect after the model weights are sparsified in the related art. Since the sparsification operation is directed at individual weights, the sparsity pattern of the weights in each filter of size R×S in a convolutional layer of size C×K×R×S of the deep neural network is irregular (the small black boxes in fig. 1 represent weights that remain unchanged, and the small white boxes represent weights that are set to zero). In theory, this sparsification method can convert the convolution layer tensor of the deep neural network into a very sparse tensor, greatly reducing the computation and storage resource consumption of the deep neural network.
However, because of this irregularity, such a sparsification method is not well supported on general-purpose hardware acceleration platforms (e.g., a GPU (Graphics Processing Unit)) and often requires a customized hardware accelerator, such as an FPGA (Field-Programmable Gate Array) or an ASIC (Application-Specific Integrated Circuit), to implement. Although this irregular weight sparsification method can be implemented by customizing a hardware accelerator, it also tends to incur a high hardware design cost.
Therefore, in order to improve the access efficiency of the memory and reduce the design cost of customized hardware accelerators, the embodiments of the application provide a convolutional layer sparsification method and apparatus of a deep neural network, an electronic device, and a machine-readable storage medium.
The convolutional layer sparsification method of a deep neural network provided by the embodiment of the application is described below.
The execution subject of the convolutional layer sparsification method of a deep neural network provided by the embodiment of the application may be an electronic device that implements functions such as target detection and segmentation, behavior detection and recognition, and voice recognition. The method may be implemented by at least one of software, a hardware circuit, and a logic circuit arranged in the execution subject.
As shown in fig. 2, the convolutional layer sparsification method of a deep neural network provided by the embodiment of the application may include the following steps:
s201, tensor and structural parameters of a convolution layer in the deep neural network are obtained.
A deep neural network is a widely used data processing model; specifically, it may be any one of CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), LSTM (Long Short-Term Memory network), and the like.
A network layer of the deep neural network that mainly performs convolution operations is called a convolution layer. The tensor of a convolution layer is, in effect, the set of specific weight values in that layer; for example, the tensor of a convolution layer W is a four-dimensional tensor, i.e., when the convolution layer is expressed with dimensions C×K×R×S, the tensor holds the specific weight values of that four-dimensional array. The structural parameters of the convolution layer include the number of output channels C, the number of input channels K, and the filter spatial dimension size R×S of the convolution layer.
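By way of illustration only (not part of the claimed embodiment), the following sketch assumes a NumPy array representation of the convolution layer tensor; the example sizes and the helper name get_structural_params are choices made for this example, not names taken from the patent. It shows a four-dimensional tensor of dimensions C×K×R×S and how its structural parameters can be read back from its shape:

```python
# Minimal sketch, assuming the convolution layer tensor is a NumPy array.
# The sizes below are arbitrary example values.
import numpy as np

C, K, R, S = 64, 32, 3, 3            # output channels, input channels, filter rows, filter columns
W = np.random.randn(C, K, R, S)      # four-dimensional tensor of the convolution layer

def get_structural_params(weight_tensor):
    """Return the structural parameters (C, K, R, S) read from the tensor shape."""
    c, k, r, s = weight_tensor.shape
    return c, k, r, s

print(get_structural_params(W))      # -> (64, 32, 3, 3)
```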
S202, dividing weights in the same dimension in tensors of the convolution layer into units in the same dimension by using a preset dimension dividing method according to structural parameters of the convolution layer, and obtaining a plurality of dimension units.
Considering the characteristics of convolution, weights in the same dimension within the same convolution layer of a deep neural network often have similar numerical ranges; for example, in some convolution layers the weights in the same row are similar in size, in others the weights in the same column are similar in size, and in still others the weights in the same filter spatial dimension are similar in size. This is related to the initial setting of the convolution layer structure and the functional setting of the deep neural network. Therefore, the weights in the same dimension in the tensor of the convolution layer can be divided into units of the same dimension according to the structural parameters of the convolution layer. The same dimension mentioned in the embodiments of the present application may include, but is not limited to: the same row or the same column in the filter spatial dimension, the same filter space, and the same input channel.
The embodiment of the application can determine the form of dimension unit division according to the actual situation of the deep neural network. Optionally, the manner of dimension unit division of the tensor of the convolution layer may be specifically divided into the following several manners:
In the first mode, according to the row parameter of the filter spatial dimension in the structural parameters of the convolution layer, the weights in the same row in the tensor of the convolution layer are divided into units of the same dimension, so as to obtain a plurality of row dimension units.
For the case where the weights in the same row of the convolution layer differ only slightly, the weights in the same row in the tensor of the convolution layer can be divided into units of the same dimension according to the row parameter of the filter spatial dimension in the structural parameters of the convolution layer. A row dimension unit as shown in fig. 3 can be obtained, and its mathematical expression is:
cell=W(c,k,r,:)
In the second mode, according to the column parameter of the filter spatial dimension in the structural parameters of the convolution layer, the weights in the same column in the tensor of the convolution layer are divided into units of the same dimension, so as to obtain a plurality of column dimension units.
For the case where the weights in the same column of the convolution layer differ only slightly, the weights in the same column in the tensor of the convolution layer can be divided into units of the same dimension according to the column parameter of the filter spatial dimension in the structural parameters of the convolution layer. A column dimension unit as shown in fig. 4 can be obtained, and its mathematical expression is:
cell=W(c,k,:,s)
In the third mode, according to the filter spatial dimension parameters in the structural parameters of the convolution layer, the weights in the same filter space in the tensor of the convolution layer are divided into the same dimension unit, so as to obtain a plurality of spatial dimension units.
For the case where the weights in the same filter space of the convolution layer differ only slightly, the weights in the same filter space in the tensor of the convolution layer can be divided into the same dimension unit according to the filter spatial dimension parameters in the structural parameters of the convolution layer. A spatial dimension unit as shown in fig. 5 can be obtained, and its mathematical expression is:
cell=W(c,k,:,:)
In the fourth mode, according to the input channel parameters in the structural parameters of the convolution layer, the weights in the same input channel in the tensor of the convolution layer are divided into units of the same dimension, so as to obtain a plurality of input channel dimension units.
For the case where the weights in the same input channel of the convolution layer differ only slightly, the weights in the same input channel in the tensor of the convolution layer can be divided into the same dimension unit according to the input channel parameters in the structural parameters of the convolution layer. An input channel dimension unit as shown in fig. 6 can be obtained, and its mathematical expression is:
cell=W(c,:,r,s)
The above four manners of dividing the tensor of the convolution layer into dimension units are only given as examples, and the actual manner of dimension unit division is not limited to them: it is sufficient that weights with relatively similar values are divided into the same dimension unit, for example multiple rows may be divided into one dimension unit, multiple columns may be divided into one dimension unit, and so on. Through the division into dimension units, a certain regularity of the weights within the same dimension unit can be ensured, as illustrated in the sketch below.
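The following sketch collects the four division modes in one place (an illustration only, assuming the same C×K×R×S NumPy layout as in the earlier sketch; the helper names are chosen for this example and do not come from the patent):

```python
# Sketch of the four dimension-unit division modes, assuming W has shape (C, K, R, S).
import numpy as np

def row_units(W):            # cell = W(c, k, r, :)
    C, K, R, S = W.shape
    return [W[c, k, r, :] for c in range(C) for k in range(K) for r in range(R)]

def column_units(W):         # cell = W(c, k, :, s)
    C, K, R, S = W.shape
    return [W[c, k, :, s] for c in range(C) for k in range(K) for s in range(S)]

def spatial_units(W):        # cell = W(c, k, :, :)
    C, K, R, S = W.shape
    return [W[c, k, :, :] for c in range(C) for k in range(K)]

def input_channel_units(W):  # cell = W(c, :, r, s)
    C, K, R, S = W.shape
    return [W[c, :, r, s] for c in range(C) for r in range(R) for s in range(S)]

units = row_units(np.random.randn(2, 3, 3, 3))   # 2*3*3 = 18 row dimension units
```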
S203, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit.
After the tensor of the convolution layer has been divided into dimension units, each dimension unit can be regarded as a structured unit, because the weight values within the same dimension unit have a certain regularity. The same sparsification operation can therefore be performed on every weight in a dimension unit based on the weights in that unit, and after the sparsification operation the processed dimension unit is, as a whole, either all zero or all non-zero.
The basis for performing the sparsification operation on the weights in a dimension unit can take various forms. For example, it may be judged whether the maximum weight in the dimension unit is smaller than a preset threshold value; if so, the weights in the dimension unit are not important for the deep neural network operation and can be set to zero. It may be judged whether the minimum weight in the dimension unit is larger than a preset threshold value; if so, the weights in the dimension unit are important for the deep neural network operation and can be retained. The average of all weights in the dimension unit may be calculated and compared with a preset threshold value; if it is larger, the weights in the dimension unit are important and can all be retained, and if not, they are unimportant and can all be set to zero. The absolute values of the weights in the dimension unit may also be summed and the sum compared with a preset threshold value; if it is larger, the weights in the dimension unit are important and can be retained, and if not, they are unimportant and can be set to zero. The basis for the sparsification operation is not limited to the above manners; any manner that comprehensively evaluates the importance of the weights in a dimension unit to the deep neural network operation falls within the protection scope of the embodiments of the present application and is not described in detail here.
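As a hedged illustration of the alternative importance measures listed above (a sketch only; the function name and the idea of returning all measures side by side are assumptions made for this example, not the patented implementation):

```python
# Sketch of several per-unit importance measures mentioned above.
import numpy as np

def unit_scores(unit):
    """Candidate measures for deciding whether a dimension unit is important."""
    return {
        "max":     float(np.max(unit)),          # zero the unit if even the largest weight is below the threshold
        "min":     float(np.min(unit)),          # keep the unit if even the smallest weight exceeds the threshold
        "mean":    float(np.mean(unit)),         # average weight compared with the threshold
        "abs_sum": float(np.sum(np.abs(unit))),  # sum of absolute values, used in the example further below
    }

print(unit_scores(np.array([0.2, -0.1, 0.3])))
```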
Optionally, S203 may specifically be:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
Although the weight values within the same dimension unit are similar, occasionally there may be a weight whose value differs more noticeably; on the whole, however, the weights in the same dimension unit remain close to each other. To prevent such cases from affecting the final sparsification, the basis of the sparsification operation on the weights in a dimension unit can specifically be the sum of the absolute values of the weights in the unit, which evaluates the importance of the dimension unit to the whole deep neural network: a dimension unit with a large calculation result is regarded as important, and a dimension unit with a small calculation result as unimportant. All weights in an unimportant dimension unit can be set to zero, while the original weights in an important dimension unit are kept unchanged. In the subsequent training of the deep neural network, only the weights of the important dimension units need to be fine-tuned, which reduces the computation of the whole deep neural network operation.
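A minimal sketch of this per-unit sparsification step follows (illustrative only; the row-unit indexing, the helper names, and the threshold value are assumptions, and in practice the threshold would be a preset value chosen for the network):

```python
# Sketch: zero out or keep whole dimension units based on the sum of absolute weight values.
import numpy as np

def row_unit_indices(shape):
    """Index tuples selecting each row dimension unit W(c, k, r, :)."""
    C, K, R, S = shape
    return [(c, k, r, slice(None)) for c in range(C) for k in range(K) for r in range(R)]

def sparsify_units(W, unit_indices, threshold=1.0):
    W_sparse = W.copy()
    keep_mask = []
    for idx in unit_indices:
        if np.sum(np.abs(W_sparse[idx])) > threshold:   # important unit: keep weights unchanged
            keep_mask.append(True)
        else:                                           # unimportant unit: set every weight to zero
            W_sparse[idx] = 0.0
            keep_mask.append(False)
    return W_sparse, keep_mask

W = np.random.randn(4, 3, 3, 3)
W_sparse, keep_mask = sparsify_units(W, row_unit_indices(W.shape))
```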
By applying this embodiment, the tensor and structural parameters of a convolution layer in a deep neural network are acquired; according to the structural parameters of the convolution layer, weights in the same dimension in the tensor of the convolution layer are divided into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units; and, for each dimension unit, the same sparsification operation is carried out on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory. Moreover, because the convolutional layer sparsification of the neural network has a certain regularity, the operation of the sparse deep neural network can be realized on a general-purpose hardware acceleration platform, which greatly reduces the hardware design cost.
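The storage and read-back idea in the preceding paragraph could look roughly like the following (a sketch continuing the previous example under the same assumptions; the packed container layout is an assumption for illustration, not a format defined by the patent):

```python
# Sketch: store only non-zero dimension units together with a per-unit mark,
# so that reading from memory can skip the units that were set to zero.
import numpy as np

def pack_sparse(W_sparse, unit_indices, keep_mask):
    kept_units = [W_sparse[idx].copy() for idx, keep in zip(unit_indices, keep_mask) if keep]
    return {"shape": W_sparse.shape, "keep_mask": list(keep_mask), "units": kept_units}

def unpack_sparse(packed, unit_indices):
    W = np.zeros(packed["shape"])
    stored = iter(packed["units"])               # only the weights that were not zeroed are read
    for idx, keep in zip(unit_indices, packed["keep_mask"]):
        if keep:
            W[idx] = next(stored)
    return W
```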
Corresponding to the above method embodiment, an embodiment of the present application provides a convolutional layer sparsification apparatus of a deep neural network. As shown in fig. 7, the apparatus may include:
an obtaining module 710, configured to obtain tensors and structural parameters of a convolutional layer in the deep neural network;
the dividing module 720 is configured to divide weights in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method according to the structural parameter, so as to obtain a plurality of dimension units;
the sparse module 730 is configured to perform, for each dimension unit, the same sparse operation on each weight in the dimension unit according to the weight in the dimension unit.
Optionally, the dividing module 720 may specifically be configured to:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
Optionally, the dividing module 720 may specifically be configured to:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
Optionally, the dividing module 720 may specifically be configured to:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
Optionally, the sparse module 730 may specifically be configured to:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
By applying this embodiment, the tensor and structural parameters of a convolution layer in a deep neural network are acquired; according to the structural parameters of the convolution layer, weights in the same dimension in the tensor of the convolution layer are divided into units of the same dimension by using a preset dimension dividing method, so as to obtain a plurality of dimension units; and, for each dimension unit, the same sparsification operation is carried out on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
An embodiment of the present application also provides an electronic device which, as shown in fig. 8, may include a processor 801 and a machine-readable storage medium 802, the machine-readable storage medium 802 storing machine-executable instructions capable of being executed by the processor 801, the processor 801 being caused by the machine-executable instructions to implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
The machine-readable storage medium may include RAM (Random Access Memory) or NVM (Non-Volatile Memory), for example at least one disk memory. Alternatively, the machine-readable storage medium may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a CPU (Central Processing Unit), an NP (Network Processor), and the like; it may also be a DSP (Digital Signal Processor), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
The machine-readable storage medium 802 and the processor 801 may be in data communication by wired or wireless connection, and the electronic device may communicate with other devices via a wired or wireless communication interface. Fig. 8 is merely an example of data transmission between the processor 801 and the machine-readable storage medium 802 via a bus, and is not limited to a specific connection.
In this embodiment, by reading and executing the machine-executable instructions stored in the machine-readable storage medium 802, the processor 801 can implement the following: acquiring the tensor and structural parameters of a convolution layer in a deep neural network; dividing weights in the same dimension in the tensor of the convolution layer into units of the same dimension by using a preset dimension dividing method according to the structural parameters of the convolution layer, so as to obtain a plurality of dimension units; and, for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
An embodiment of the present application also provides a machine-readable storage medium storing machine-executable instructions which, when invoked and executed by a processor, implement all the steps of the convolutional layer sparsification method of the deep neural network described above.
In this embodiment, the machine-readable storage medium stores machine-executable instructions that, at runtime, execute the convolutional layer sparsification method of the deep neural network provided by the embodiment of the present application, and therefore can implement the following: acquiring the tensor and structural parameters of a convolution layer in a deep neural network; dividing weights in the same dimension in the tensor of the convolution layer into units of the same dimension by using a preset dimension dividing method according to the structural parameters of the convolution layer, so as to obtain a plurality of dimension units; and, for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit. Because the tensor of the convolution layer is divided by dimension into a plurality of dimension units and the same sparsification operation is applied to all weights in the same dimension unit, the sparsification of the convolution layer of the deep neural network has a certain regularity. When the sparsified weights are stored, the dimension units whose weights are set to zero can be marked, and when the weights are read from the memory, only the weights which are not set to zero need to be read based on the marks, which improves the access efficiency of the memory.
For the electronic device and the machine-readable storage medium embodiments, the description is relatively simple, and reference should be made to part of the description of the method embodiments for the relevant matters, since the method content involved is basically similar to the method embodiments described above.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between these entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for apparatus, electronic devices, and machine-readable storage medium embodiments, the description is relatively simple as it is substantially similar to method embodiments, with reference to the section descriptions of method embodiments being merely illustrative.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (10)

1. The convolutional layer sparsification method of the deep neural network is characterized by being applied to a hardware acceleration platform, wherein the hardware acceleration platform comprises a memory, and the method comprises the following steps:
tensor and structural parameters of a convolution layer in a deep neural network are obtained, wherein the deep neural network is used for target detection and segmentation, behavior detection and recognition and voice recognition;
according to the structural parameters, dividing weights in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method to obtain a plurality of dimension units;
for each dimension unit, carrying out the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit; when the sparsified weights are stored, the dimension units whose weights are set to zero are marked, and when the weights are read from the memory, only the weights which are not set to zero are read based on the marks.
2. The method according to claim 1, wherein the dividing weights in the same dimension in the tensor into the same dimension units by using a preset dimension dividing method according to the structural parameter to obtain a plurality of dimension units includes:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
3. The method according to claim 1, wherein the dividing weights in the same dimension in the tensor into the same dimension units by using a preset dimension dividing method according to the structural parameter to obtain a plurality of dimension units includes:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
4. The method according to claim 1, wherein the dividing weights in the same dimension in the tensor into the same dimension units by using a preset dimension dividing method according to the structural parameter to obtain a plurality of dimension units includes:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
5. The method according to claim 1, wherein, for each dimension unit, performing the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit comprises:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
6. The convolutional layer sparsification device of the deep neural network is characterized by being applied to a hardware acceleration platform, wherein the hardware acceleration platform comprises a memory, and the device comprises:
the acquisition module is used for acquiring tensors and structural parameters of a convolution layer in the deep neural network, wherein the deep neural network is used for target detection and segmentation, behavior detection and recognition and voice recognition;
the dividing module is used for dividing the weight value in the same dimension in the tensor into units in the same dimension by using a preset dimension dividing method according to the structural parameters to obtain a plurality of dimension units;
the sparse module is used for carrying out, for each dimension unit, the same sparsification operation on each weight in the dimension unit according to the weights in the dimension unit; when the sparsified weights are stored, the dimension units whose weights are set to zero are marked, and when the weights are read from the memory, only the weights which are not set to zero are read based on the marks.
7. The apparatus of claim 6, wherein the partitioning module is specifically configured to:
dividing weights in the same row in the tensor into units in the same dimension according to row parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of row dimension units;
or,
and dividing weights in the same column in the tensor into units in the same dimension according to column parameters of the spatial dimension of the filter in the structural parameters to obtain a plurality of column dimension units.
8. The apparatus of claim 6, wherein the partitioning module is specifically configured to:
and dividing weights in the same filter space in the tensor into the same dimension units according to the filter space dimension parameters in the structural parameters to obtain a plurality of space dimension units.
9. The apparatus of claim 6, wherein the partitioning module is specifically configured to:
and dividing weights in the same input channel in the tensor into units in the same dimension according to the input channel parameters in the structural parameters to obtain a plurality of input channel dimension units.
10. The apparatus of claim 6, wherein the sparse module is configured to:
for each dimension unit, summing the absolute values of all weights in the dimension unit to obtain a calculation result;
judging whether the calculation result is larger than a preset threshold value;
if not, setting each weight in the dimension unit to zero;
if so, keeping each weight in the dimension unit unchanged.
CN201811320668.9A 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network Active CN111160516B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811320668.9A CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Publications (2)

Publication Number Publication Date
CN111160516A CN111160516A (en) 2020-05-15
CN111160516B true CN111160516B (en) 2023-09-05

Family

ID=70555336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811320668.9A Active CN111160516B (en) 2018-11-07 2018-11-07 Convolutional layer sparsification method and device for deep neural network

Country Status (1)

Country Link
CN (1) CN111160516B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220147826A1 (en) * 2020-11-06 2022-05-12 Moffett Technologies Co., Limited Method and system for convolution with workload-balanced activation sparsity
CN114692847B (en) * 2020-12-25 2024-01-09 中科寒武纪科技股份有限公司 Data processing circuit, data processing method and related products
CN113344200A (en) * 2021-06-17 2021-09-03 阿波罗智联(北京)科技有限公司 Method for training separable convolutional network, road side equipment and cloud control platform

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
CN107480770A (en) * 2017-07-27 2017-12-15 中国科学院自动化研究所 The adjustable neutral net for quantifying bit wide quantifies the method and device with compression
CN107729995A (en) * 2017-10-31 2018-02-23 中国科学院计算技术研究所 Method and system and neural network processor for accelerans network processing unit
CN107784276A (en) * 2017-10-13 2018-03-09 中南大学 Microseismic event recognition methods and device
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108345831A (en) * 2017-12-28 2018-07-31 新智数字科技有限公司 The method, apparatus and electronic equipment of Road image segmentation based on point cloud data

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108351982B (en) * 2015-11-12 2022-07-08 谷歌有限责任公司 Convolution gated recurrent neural network
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
US11501131B2 (en) * 2016-09-09 2022-11-15 SK Hynix Inc. Neural network hardware accelerator architectures and operating method thereof
WO2018053464A1 (en) * 2016-09-19 2018-03-22 Vicarious Fpc, Inc. Systems and methods for deep learning with small training sets
US10037490B2 (en) * 2016-12-13 2018-07-31 Google Llc Performing average pooling in hardware
US11048997B2 (en) * 2016-12-27 2021-06-29 Texas Instruments Incorporated Reduced complexity convolution for convolutional neural networks

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127297A (en) * 2016-06-02 2016-11-16 中国科学院自动化研究所 The acceleration of degree of depth convolutional neural networks based on resolution of tensor and compression method
CN107480770A (en) * 2017-07-27 2017-12-15 中国科学院自动化研究所 The adjustable neutral net for quantifying bit wide quantifies the method and device with compression
CN107784276A (en) * 2017-10-13 2018-03-09 中南大学 Microseismic event recognition methods and device
CN107729995A (en) * 2017-10-31 2018-02-23 中国科学院计算技术研究所 Method and system and neural network processor for accelerans network processing unit
CN108345831A (en) * 2017-12-28 2018-07-31 新智数字科技有限公司 The method, apparatus and electronic equipment of Road image segmentation based on point cloud data
CN108280514A (en) * 2018-01-05 2018-07-13 中国科学技术大学 Sparse neural network acceleration system based on FPGA and design method
CN108073917A (en) * 2018-01-24 2018-05-25 燕山大学 A kind of face identification method based on convolutional neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
深度学习技术兴起将给安防带来什么 (What will the rise of deep learning technology bring to security); 谢迪; 《中国安防》; full text *

Also Published As

Publication number Publication date
CN111160516A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111160516B (en) Convolutional layer sparsification method and device for deep neural network
CN106779057B (en) Method and device for calculating binary neural network convolution based on GPU
CN111832437B (en) Building drawing identification method, electronic equipment and related products
CN111144561A (en) Neural network model determining method and device
KR20210032140A (en) Method and apparatus for performing pruning of neural network
CN112001110A (en) Structural damage identification monitoring method based on vibration signal space real-time recursive graph convolutional neural network
CN107133190A (en) The training method and training system of a kind of machine learning system
CN108805174A (en) clustering method and device
CN111709415B (en) Target detection method, device, computer equipment and storage medium
CN112132278A (en) Model compression method and device, computer equipment and storage medium
CN112990420A (en) Pruning method for convolutional neural network model
CN112947080B (en) Scene parameter transformation-based intelligent decision model performance evaluation system
CN111160517B (en) Convolutional layer quantization method and device for deep neural network
US11036980B2 (en) Information processing method and information processing system
CN113485848B (en) Deep neural network deployment method and device, computer equipment and storage medium
CN114462723B (en) Cloud layer migration minute-level photovoltaic power prediction method based on high-altitude wind resource influence
EP3293682A1 (en) Method and device for analyzing sensor data
KR102365270B1 (en) Method of generating sparse neural networks and system therefor
CN110097183B (en) Information processing method and information processing system
CN114494682A (en) Object position prediction method, device, equipment and storage medium
CN113255927A (en) Logistic regression model training method and device, computer equipment and storage medium
CN109362027B (en) Positioning method, device, equipment and storage medium
WO2021055364A1 (en) Efficient inferencing with fast pointwise convolution
EP2325755A2 (en) Method of searching a set of real numbers for a nearest neighbor
CN116738816A (en) Structure vulnerability analysis method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant