WO2023111109A1 - Class separation aware artificial neural network pruning method - Google Patents

Class separation aware artificial neural network pruning method

Info

Publication number
WO2023111109A1
WO2023111109A1 (PCT application No. PCT/EP2022/085998)
Authority
WO
WIPO (PCT)
Prior art keywords
class
neuron
ann
separation
neural network
Prior art date
Application number
PCT/EP2022/085998
Other languages
French (fr)
Inventor
Sindeer PREET
Oisin Boydell
John DEEPU
Original Assignee
University College Dublin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University College Dublin filed Critical University College Dublin
Publication of WO2023111109A1 publication Critical patent/WO2023111109A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

Disclosed is a method of pruning an artificial neural network (ANN) that includes recording activations of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes, scoring each neuron of the ANN based on a class-separation ability of a neuron in corresponding activations, and pruning the one or more neurons based on the scoring. The scoring of a neuron comprises computing a plurality of activation scores of a neuron for corresponding plurality of data inputs for each class, calculating a class centre of the neuron based on an average value of the plurality of activation scores for said class for each class, calculating a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair, and calculating a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances.

Description

Title
Class Separation Aware Artificial Neural Network Pruning Method
Field
The disclosure relates to artificial neural networks, and more specifically to pruning an artificial neural network.
Background of Invention
Deep Neural Networks (DNNs) have achieved state-of-the- art accuracy across a variety of tasks like classification, semantic segmentation, and robot navigation. Typically, these DNNs contain a few hundred thousand to a few million parameters which can take a lot of memory, computation resources and power, which typical devices at the edge are severely constrained in. Several techniques like quantization, tensor decomposition and pruning have been proposed to compress DNNs for easy deployment on these edge devices.
Neural network pruning has been deemed essential in the deployment of DNNs on resource-constrained edge devices, greatly reducing the number of network parameters without drastically compromising accuracy. Several pruning techniques in the literature are based on ℓ-norm metrics and diversity score-based approaches. The ℓ-norm based techniques rely too heavily on the magnitude of parameters or activations: the smaller the ℓ-norm (or the magnitude) of a parameter, the lower its importance is assumed to be, an abstraction which cannot be justified and can be detrimental to the accuracy of a classification task. An importance score is assigned to each parameter and those of least importance are pruned. However, most of these methods are based on generalised estimations of the importance of each parameter, ignoring the context of the specific task at hand.
On the other hand, diversity-based metrics do not take into account that if a filter/neuron has variable output for the same class, this leads to more diversity but at the same time decreases the classification accuracy contributed by that neuron. Thus, most of these methods give a higher diversity score only because the outputs are spread out. Some techniques use the mean and standard deviation of a feature map to measure information variance and remove the feature maps carrying redundant information, which eliminates the need for ℓ-norm based scores. However, when calculating the diversity, they aggregate statistics of all feature maps irrespective of their classes. In some other techniques, the geometric median of filters is used, which also suffers from the diversity problem. Further, simple magnitude-based pruning is effective: the weights are pruned based on their magnitude, and the pruned network is then fine-tuned to recover accuracy. However, this leads to unstructured pruning, and hardware can better utilise a pruned network whose sparsity is structured.
Further, most pruning algorithms prune individual connections or weights, leading to a sparse network, without taking into account whether the hardware on which the network is deployed can take advantage of that sparsity. Also, people skilled in architecting and training a neural network generally do not have embedded programming skills, and the true measure of the neural network's performance after deployment can only be ascertained once it has been deployed on an embedded system. Some methods try to anticipate this by simulating edge devices in software, but such simulations are often not accurate.
The document titled "Rethinking Class-Discrimination Based CNN Channel Pruning" by Yuchen Liu et al. discloses a FLOP-normalized sensitivity analysis scheme to automate the structural pruning procedure. Said document discloses calculating a score by taking one class and determining how it differs from the cluster of all the other classes taken together. This may not be justified: a neuron/filter might be able to differentiate between only a few classes rather than all of them, and provided other neurons can differentiate between the remaining classes, the overall accuracy would still be high.
Another document, US 2018/0181867, discloses a method and apparatus for configuring an artificial neural network for a particular surveillance situation in which the object classes form a subset of the total number of object classes for which the artificial neural network is trained. A database is accessed that includes activation frequency values for the neurons within the artificial neural network. Those neurons having activation frequency values lower than a threshold value for the subset of selected object classes are removed (pruned) from the artificial neural network. However, even if the activation of a neuron is high, it may fail to discriminate between all the classes, because the activations for all the classes might be similarly high.
Summary of the Invention
In an aspect of the present invention, there is provided, as set out in the appended claims, a method of pruning an artificial neural network (ANN) that comprises recording activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes, scoring each neuron of the ANN based on the class-separation ability of a neuron in corresponding activations, and pruning one or more neurons based on the scores. The scoring method can be further paired with different methods, such as an evolutionary algorithm, to fully automate or customize pruning.
The present invention provides a task-specific pruning approach that is based on how efficiently a neuron or a convolutional filter is able to separate classes. The pruning method of the present invention is based on an axiomatic approach that assigns an importance score based on how separable different classes are in the output activations or feature maps, preserving the separation of classes and thereby avoiding a reduction in classification accuracy. The pruning method of the present invention does not rely on the magnitudes of weights or activations or their ℓ-norms, which makes it invariant to the zero centre of the data; it depends only on the separation between classes. The present pruning method also solves the problem of lack of diversity by aggregating the statistics of different classes separately, unlike earlier works which kept them together. Whole neurons or filters are pruned, leading to a more structured pruned network whose sparsity can be more efficiently utilised by the hardware. The pruning method of the present invention is evaluated against a benchmark dataset and network architectures, and it is shown how the pruning method of the present invention outperforms comparable pruning techniques.
As compared to existing systems, the present invention succeeds in discriminating among the classes by relying not on the mere activation values but on the difference between the activation values of different classes, by separating the activations of the different classes before calculating the final score. So, even if a neuron has high activations, the present invention generates a lower score when the neuron is not able to differentiate between the different classes. As such, even a neuron with high activations that are similar for all classes will have a lower score than a corresponding neuron with low activations that is nevertheless able to differentiate between different classes.
Further, the present invention does not calculate a score by taking one class and determining how it differs from the cluster of all the other classes taken together. Instead, the present invention takes pairs of classes and calculates how well a neuron/filter is able to differentiate between those pairs, so if the neuron is able to differentiate between a sufficient number of pairs, the separation score reflects this and is appropriately high.
In an embodiment of the present invention, the method further includes: for each class, computing a plurality of activation scores of a neuron for the corresponding plurality of data inputs; for each class, calculating a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculating a set of Euclidean distances for the neuron, for corresponding sets of class pairs, after flattening the class centres, wherein a Euclidean distance is the distance between the class centres of the corresponding pair; calculating a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances and is a measure of the class-separation ability of the neuron for the plurality of classes; and, after choosing the number of neurons to be pruned, pruning the neurons with the lowest scores from the ANN.
In an embodiment of the present invention, the ANN is one of: a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected neural network.
In an embodiment of the present invention, the ANN includes a first number of layers, and each layer comprises a second number of neurons. In an embodiment of the present invention, the method further includes pruning a layer of the ANN based on class-separation scores of neurons of corresponding layer.
In an embodiment of the present invention, the class centre of a neuron for a given class is calculated based on the equation:
ccij(cn) = ( Σd Actij^d(cn) ) / mn
where ccij(cn) is the class centre of the ijth neuron for class cn, Actij^d(cn) is the activation score of the ijth neuron for class cn for the dth input of the input dataset, and mn is the number of data inputs belonging to class cn.
In an embodiment of the present invention, the set E of Euclidean distances is calculated based on the following:
Eij = { || ccij(cp) - ccij(cq) || : for every pair of classes (cp, cq) in C with p ≠ q }
where Eij is the set E of Euclidean distances for the ijth neuron, and ccij(cp) and ccij(cq) are the first and second class centres of a pair for the ijth neuron.
In a further aspect of the present invention, there is provided a system for pruning an artificial neural network (ANN) that includes a memory, and a processor communicatively coupled to the memory. The processor is configured to record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes, score each neuron of the ANN based on a class-separation ability of a neuron in corresponding activations, and prune the one or more neurons based on the scoring.
In an embodiment of the present invention, the processor is further configured to: for each class, compute a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculate a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculate a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; calculate a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances, wherein the class- separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and prune the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
In yet another aspect of the present invention, there is provided a non-transitory computer readable medium having stored thereon computer-executable instructions which, when executed by a processor, cause the processor to record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; score each neuron of the ANN based on a class-separation ability of a neuron in corresponding activations; and prune the one or more neurons based on the scoring.
Brief Description of the Drawings
The invention will be more clearly understood from the following description of an embodiment thereof, given by way of example only, with reference to the accompanying drawings, in which:-
FIG.1 illustrates an artificial neural network (ANN), in accordance with an embodiment of the present invention;
FIG.2 is a flowchart illustrating a method of pruning the ANN, in accordance with an embodiment of the present invention;
FIG.3 illustrates a table that show the layers and the number of filters pruned in each layer for an exemplary ANN for different compression ratios, in accordance with an embodiment of the present invention; and
FIG.4 illustrates accuracy vs. CR and speedup for the exemplary ANN based on the table of FIG.3, in accordance with an embodiment of the present invention.
Detailed Description of the drawings
FIG.1 illustrates an artificial neural network (ANN) 100, in accordance with an embodiment of the present invention. The ANN 100 is one of a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected NN. The ANN 100 includes k layers; in the illustrated example the ANN 100 includes an input layer, hidden layer 1, hidden layer 2, and an output layer, setting the value of k as 4.
The ANN 100 including k layers may be represented by the following:
ANN = {NN1, NN2, NN3, ..., NNk}    (1)
where NNi represents the ith layer.
Each layer of the ANN 100 includes a plurality of neurons/convolutional filters. The following invention has been described with reference to neurons, however, it would be apparent to one of ordinary skill in the art, that the present embodiments would be applicable to the convolutional filters as well. In an example, the input layer includes three neurons, each of the hidden layers 1 and 2 include four neurons, and the output layer includes a single neuron.
In the ANN 100, each neuron is represented by NNij, where ij is the address of the neuron, such that NNij is the jth neuron of the ith layer. For example, the neuron 102 may be represented as NN00, the neuron 104 may be represented as NN01, and the neuron 106 may be represented as NN10.
FIG.2 is a flowchart illustrating a method 200 of pruning the ANN 100, in accordance with an embodiment of the present invention.
Referring to FIGS. 1 and 2 together, at step 202, the method includes recording activation of the ANN 100 on an input dataset that includes a plurality of data inputs of a plurality of classes.
Let there be a dataset D with m data inputs, each belonging to one of the n classes in set C, such that
D = {d1, d2, d3, ..., dm}    (2)
where d1, d2, d3, ..., dm correspond to the plurality of data inputs of the dataset D, and
C = {c1, c2, c3, ..., cn}    (3)
where c1, c2, c3, ..., cn correspond to the plurality of classes of the data inputs.
In the context of the present invention, the activation of the ANN 100 on the input dataset D is recorded by computing an activation dataset of each neuron separately for each class. As a result, for each neuron, the number of activation datasets would be equal to the number of classes of the input dataset D. In an example, when there are three classes c1, c2, c3 of input data, then for the input dataset D, three activation datasets corresponding to the three classes are computed for a neuron 102. The computation of activation datasets of the neuron 102 independently for each class is based on the assumption that the activations of the neuron 102 on the input dataset D would be different for different classes.
Further, for a given class cn, the activation dataset Actij(cn) of the ijth neuron includes a plurality of activation scores corresponding to the plurality of data inputs of D, such that each activation score Actij^d(cn) corresponds to an output of the ijth neuron for the dth input for class cn. The exemplary activation datasets Actij(c1), Actij(c2), and Actij(c3) of the neuron 102 for an exemplary input dataset D, for the classes c1, c2 and c3, are represented by the following:
Actij(c1) = [10, 20, -110, 0, 232, 123, 0.12, ...]    (4)
Actij(c2) = [123, 231, -1, -2345, 0.12, ...]    (5)
Actij(c3) = [-100, 232, 232, 1, 1, 1, ...]    (6)
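By way of illustration, step 202 can be sketched in Python roughly as follows. The function name record_activations, the neuron_output_fn callable and the use of NumPy are assumptions made for the sketch and are not part of the disclosed method.

    from collections import defaultdict
    import numpy as np

    def record_activations(neuron_output_fn, dataset, labels):
        # Step 202 (sketch): build one activation dataset per class for a single
        # neuron/filter. neuron_output_fn(d) is assumed to return the activation
        # of the ijth neuron (a scalar, or a feature map for a convolutional
        # filter) for the data input d; dataset and labels are parallel sequences.
        act = defaultdict(list)
        for d, c in zip(dataset, labels):
            act[c].append(np.asarray(neuron_output_fn(d), dtype=float))
        return dict(act)   # {c1: [Act for each input of c1], c2: [...], ...}

The returned dictionary plays the role of the activation datasets Actij(c1), Actij(c2), ... of equations (4) to (6).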
Thereafter, at step 204, the method includes scoring each neuron of the ANN 100 based on a class-separation ability of a neuron in corresponding activations. The class-separation ability of a neuron is measured based on a class-separation score Css of the neuron. The class-separation score Css of a neuron is a measure of how much separation in classes the neuron has. The class-separation score Css of a neuron is measured based on class centres of the plurality of classes. The class centre ccij(cn) of the ijth filter/neuron for a given class cn is calculated as follows:
ccij(cn) = ( Σd Actij^d(cn) ) / mn    (7)
where ccij(cn) is the class centre of the ijth neuron for class cn, Actij^d(cn) is the activation score of the ijth neuron for class cn for the dth data input, and mn is the number of data inputs of the dataset belonging to class cn.
Thus, the class centre of a neuron for a given class is the mean of the activation scores of the corresponding activation dataset. In an example, the class centre of the neuron 102 for class c1 is computed by summing all the activation scores of the activation score dataset Actij(c1) (Equation 4), and then dividing the sum by the total number of recorded elements. In a similar manner, the class centre of each neuron for each class is computed based on the input dataset D, and each class centre is flattened to form a vector.
After that, a set E of Euclidean distances between all pairs of class centres within the same neuron is calculated based on the following equation:
Eij = { || ccij(cp) - ccij(cq) || : for every pair of classes (cp, cq) in C with p ≠ q }    (8)
where Eij is the set E of Euclidean distances for the ijth neuron, and ccij(cp) and ccij(cq) are the first and second class centres of a pair for the ijth neuron.
In an example, when a neuron has three class centres cc1, cc2, and cc3 corresponding to three classes c1, c2 and c3, then the pairs of class centres are (cc1, cc2), (cc2, cc3) and (cc1, cc3).
It is to be noted that an ℓ-norm is used for calculating the Euclidean distances in equation (8), but it is applied neither to the weights nor to the activations. It is used only to quantify the distance between class centres, which overcomes the problems posed by the calculation of ℓ-norms in existing pruning methods.
Once the set E of Euclidean distances is calculated for each neuron of the ANN 100, a class-separation score Css is estimated for each neuron based on the following:
Cssij = median(Eij)    (9)
where Cssij is the class separation score for the ijth neuron and Eij is the set of Euclidean distances for the ijth neuron.
Therefore, the class separation score of a neuron is estimated by calculating the median value of the Euclidean distances of the corresponding set. The median is chosen here because it is robust to noisy data. If the mean were chosen instead of the median, the class separation score could be incorrectly inflated for neurons where only a few classes have large separation while the rest overlap.
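A minimal Python sketch of the scoring in step 204, following equations (7) to (9), is given below. The helper name class_separation_score and the dictionary layout produced by the record_activations sketch above are assumptions, not part of the disclosed method.

    import numpy as np
    from itertools import combinations

    def class_separation_score(act_per_class):
        # Equation (7): the class centre is the mean activation for each class;
        # feature maps are flattened so every centre is a vector.
        centres = {c: np.mean(np.asarray(a, dtype=float).reshape(len(a), -1), axis=0)
                   for c, a in act_per_class.items()}
        # Equation (8): Euclidean distance between every pair of class centres.
        dists = [np.linalg.norm(centres[p] - centres[q])
                 for p, q in combinations(sorted(centres), 2)]
        # Equation (9): the median is robust to a few well-separated outlier pairs.
        return float(np.median(dists))

For the three-class example above, the list dists holds the distances for the pairs (cc1, cc2), (cc2, cc3) and (cc1, cc3), and the score is their median.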
The use of the class-separation score as a metric, and also the way it is calculated by first grouping the activations by class, is a novel aspect of the present invention.
At step 206, the method includes pruning the one or more neurons based on the scoring. A neuron is pruned from the ANN when the class-separation score of the neuron is less than a threshold class-separation score. In the context of the present invention, pruning a neuron means assigning a value of 0 to the weights of the neuron. Also, as the outputs of the pruned neurons are removed, the weights/connections in the subsequent layer are also removed. The pruning can also be extended to whole layers, instead of just neurons.
In the context of the present invention, a variety of algorithms may be used to prune the ANN 100 based on the class separation scores of the neurons. Some examples include:
• Global pruning may be considered, where the neurons with the least class separation scores are filtered out. The percentage of neurons to prune may also be chosen to make the process completely automatic (a sketch of the automatic options follows this list).
• Layer-wise structured pruning may also be chosen where percentage of neurons to prune in each layer may be specified. Percentage for each layer could be same or different. This algorithm may also be made completely automatic.
• A human may also look at the class separation scores and see which neurons and filters may be pruned out. This process is manual but hand-tuning pruning like this may give more control over how the model is pruned.
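As referenced in the list above, the first two options can be sketched as score-driven selection helpers. The function names and the dictionary layouts below are assumptions of the sketch, not part of the disclosed method.

    def select_neurons_to_prune(scores, fraction):
        # Global pruning (sketch): return the given fraction of neuron addresses
        # with the lowest class-separation scores.
        k = int(len(scores) * fraction)
        return sorted(scores, key=scores.get)[:k]

    def select_layerwise(scores_per_layer, fraction_per_layer):
        # Layer-wise structured pruning (sketch): apply a per-layer fraction,
        # which may be the same for every layer or different.
        return {layer: select_neurons_to_prune(layer_scores, fraction_per_layer[layer])
                for layer, layer_scores in scores_per_layer.items()}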
The method 200 may also be implemented using a human-in-the-loop algorithm, where, after looking at the class separation scores for all neurons, a user manually decides which neurons are to be pruned. The pseudo code of the method 200, implemented on a Deep Neural Network (DNN) using the human-in-the-loop algorithm, is summarized below:
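The pseudo-code listing is published only as an image, so it is not reproduced here; the following Python sketch reconstructs the human-in-the-loop variant from the surrounding description. The dnn.layers[i].neurons[j] interface, the choose_fn callback and the reuse of the record_activations and class_separation_score helpers sketched earlier are assumptions of this reconstruction.

    def prune_dnn_human_in_the_loop(dnn, dataset, labels, choose_fn):
        # Steps 202 and 204: record activations and score every neuron of the DNN.
        scores = {}
        for i, layer in enumerate(dnn.layers):
            for j, neuron in enumerate(layer.neurons):
                acts = record_activations(neuron.activation, dataset, labels)
                scores[(i, j)] = class_separation_score(acts)
        # Present the scores, lowest first, so the user can inspect them.
        for (i, j), css in sorted(scores.items(), key=lambda kv: kv[1]):
            print(f"neuron NN{i}{j}: Css = {css:.4f}")
        # Step 206: the user decides which neurons to prune; pruning a neuron
        # means assigning 0 to its weights, and the connections it feeds in the
        # subsequent layer then carry nothing.
        for (i, j) in choose_fn(scores):
            dnn.layers[i].neurons[j].weights[:] = 0.0
        return dnn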
Experiments
An input dataset (for example, the known dataset CIFAR-10) with 60,000 images of size 32x32 and 3 channels has been used for demonstrating the pruning method 200 of the present invention on exemplary neural networks. The input dataset has 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) and may be used for classification tasks. Two metrics, the theoretical speedup and the compression ratio (CR), are measured here because different kinds of layers of the neural network have different numbers of Floating-Point Operations (FLOPs), and though the number of parameters pruned might be the same, the reduction in execution time depends on the FLOPs. Hence two metrics are needed: one that quantifies the reduction in memory footprint (CR) and one that quantifies the reduction in execution time (speedup). The theoretical speedup and compression ratio (CR) are estimated based on the following:
Compression ratio (CR) = (number of parameters in the unpruned network) / (number of parameters in the pruned network)    (10)
Theoretical speedup = (FLOPs of the unpruned network) / (FLOPs of the pruned network)    (11)
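As a worked illustration, the two metrics reduce to simple ratios of parameter and FLOP counts; the numbers in the comment below are illustrative only and are not results from the application.

    def compression_ratio(params_unpruned, params_pruned):
        # Equation (10): reduction in memory footprint.
        return params_unpruned / params_pruned

    def theoretical_speedup(flops_unpruned, flops_pruned):
        # Equation (11): reduction in execution time.
        return flops_unpruned / flops_pruned

    # Pruning slightly more than half of the parameters gives, for example,
    # compression_ratio(1_000_000, 474_000) ≈ 2.11.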
The accuracy of an unpruned network can be set as a baseline, and a change in accuracy of the pruned network from the baseline is used to determine how the method 200 of the present invention affects the accuracy.
In a first example, an exemplary residual network, more popularly known as ResNet-56, is pruned using the method 200. ResNet-56 includes 27 blocks, and each block consists of two convolutional layers. When compared with the state of the art, the accuracy of the pruned network is found to remain reasonably high while achieving the best compression ratio. The theoretical speedup is found to be around 2.5x, and more than half of the parameters of the network have been pruned (CR = 2.11x).
In a second example, an exemplary residual network, more popularly known as ResNet-32, is pruned using the method 200. The pre-trained model has an accuracy of 92.63% on the input dataset. The network has 15 residual blocks and is divided into three groups, each group having 5 residual blocks. The groups are denoted layer1, layer2 and layer3. The two layers of each residual block are named conv1 and conv2. Table I shows these layers and the number of filters pruned in each layer for different compression ratios. Table I also illustrates the compression ratio (CR), speedup and change in accuracy from the baseline.
It is to be noted from Table I how quickly the accuracy drops when the CR goes above 2.00: it decreases by more than 2% for CRs of 2.52 and 3.05. At the same time, the pruned NN has fewer than half the parameters of the unpruned NN, so this drop is justified. As high losses in accuracy are usually avoided, the CR range 1.10-2.02 is the focus, and it can be clearly seen that the more the NN is compressed, the larger the decrease in accuracy, but in this range the drop is minuscule. Figure 4 illustrates accuracy vs. CR and speedup for ResNet-32 on the input dataset based on Table I, showing how the accuracy drops when the CR and speedup are increased. The accuracy actually increases for compression ratios of 1.10, 1.20 and 1.30. All these experiments show that the technique of the present invention works over a wide range of compression ratios and speedups with an acceptable drop in accuracy, and is able to prune large networks on an input dataset and achieve the highest compression rates.
Thus, an axiomatic approach based on a well-defined concept can prune a neural network effectively, achieving high compression ratios and speedups compared to existing studies. The pruning method of the present invention is envisioned to be used with other methods of pruning. The pruning method may also be extended to other deep learning tasks like regression, segmentation, etc.
Further, in an embodiment of the present invention, there is provided a tool that allows a neural network expert to automatically apply the network pruning method and deploy and evaluate their pruned models on an edge device directly. The edge hardware devices are devices that are constrained in terms of resources such as memory and processing power. The tool automates the application of the network pruning on a trained NN and deploys the NN. Performance measures such as change in accuracy and the memory occupied are calculated, allowing the neural network expert to understand the effects of the pruning method on the deployed NN.
In the specification, the terms "comprise", "comprises", "comprised" and "comprising", or any variation thereof, and the terms "include", "includes", "included" and "including", or any variation thereof, are considered to be interchangeable, and they should all be afforded the widest possible interpretation and vice versa.
The invention is not limited to the embodiments hereinbefore described but may be varied in both construction and detail.

Claims

Claims
1 . A method of pruning an artificial neural network (ANN) comprising: recording activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; for each class, computing a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculating a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculating a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; calculating a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances, wherein the class-separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and pruning the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
2. The method as claimed in any preceding claim, wherein the ANN is one of: a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected neural network.
3. The method as claimed in any preceding claim, wherein the ANN includes a first number of layers, and each layer comprises a second number of neurons.
4. The method as claimed in any preceding claim further comprising pruning a layer of the ANN based on class-separation scores of neurons of corresponding layer.
5. The method as claimed in any preceding claim, wherein the class centre of a neuron for a given class is calculated based on the equation
ccij(cn) = ( Σd Actij^d(cn) ) / mn
where ccij(cn) is the class centre of the ijth neuron for class cn, Σd Actij^d(cn) is the summation of activation scores of the ijth neuron for class cn for all the data inputs of the dataset belonging to class cn, and mn is the number of those data inputs.
6. The method as claimed in any preceding claim, wherein the set E of Euclidean distances is calculated based on the following:
Eij = { || ccij(cp) - ccij(cq) || : for every pair of classes (cp, cq) in C with p ≠ q }
where Eij is a set E of Euclidean distances for the ijth neuron, ccij(cp) is the first class centre of a pair for the ijth neuron, and ccij(cq) is the second class centre of the pair for the ijth neuron.
7. A system for pruning an artificial neural network (ANN) comprising a memory, and a processor communicatively coupled to the memory, and configured to: record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; for each class, compute a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculate a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculate a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; and calculate a class-separation score of the neuron, wherein the class- separation score is a median value of the plurality of Euclidean distances, wherein the class-separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and prune the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
8. The system as claimed in claim 7, wherein the ANN is one of: a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected neural network.
9. The system as claimed in claim 7, wherein the processor is further configured to prune a layer of the ANN based on class-separation scores of neurons of corresponding layer.
10. A non-transitory computer readable medium having stored thereon computer- executable instructions which, when executed by a processor, cause the processor to: record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; for each class, compute a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculate a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculate a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; and calculate a class-separation score of the neuron, wherein the class- separation score is a median value of the plurality of Euclidean distances, wherein the class-separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and prune the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
PCT/EP2022/085998 2021-12-14 2022-12-14 Class separation aware artificial neural network pruning method WO2023111109A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB2118066.6A GB202118066D0 (en) 2021-12-14 2021-12-14 Class separation aware artificial neural network pruning method
GB2118066.6 2021-12-14

Publications (1)

Publication Number Publication Date
WO2023111109A1

Family

ID=80080168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/085998 WO2023111109A1 (en) 2021-12-14 2022-12-14 Class separation aware artificial neural network pruning method

Country Status (2)

Country Link
GB (1) GB202118066D0 (en)
WO (1) WO2023111109A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181867A1 (en) 2016-12-21 2018-06-28 Axis Ab Artificial neural network class-based pruning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181867A1 (en) 2016-12-21 2018-06-28 Axis Ab Artificial neural network class-based pruning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUCHEN LIU ET AL: "Rethinking Class-Discrimination Based CNN Channel Pruning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 April 2020 (2020-04-29), XP081655141 *

Also Published As

Publication number Publication date
GB202118066D0 (en) 2022-01-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22835049

Country of ref document: EP

Kind code of ref document: A1