WO2023111109A1 - Class separation aware artificial neural network pruning method - Google Patents

Class separation aware artificial neural network pruning method

Info

Publication number
WO2023111109A1
WO2023111109A1 (PCT application No. PCT/EP2022/085998)
Authority
WO
WIPO (PCT)
Prior art keywords
class
neuron
ann
separation
neural network
Prior art date
Application number
PCT/EP2022/085998
Other languages
French (fr)
Inventor
Sindeer PREET
Oisin Boydell
John DEEPU
Original Assignee
University College Dublin
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University College Dublin filed Critical University College Dublin
Publication of WO2023111109A1 publication Critical patent/WO2023111109A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/086: Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming


Abstract

Disclosed is a method of pruning an artificial neural network (ANN) that includes recording activations of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes, scoring each neuron of the ANN based on a class-separation ability of a neuron in corresponding activations, and pruning the one or more neurons based on the scoring. The scoring of a neuron comprises computing a plurality of activation scores of a neuron for corresponding plurality of data inputs for each class, calculating a class centre of the neuron based on an average value of the plurality of activation scores for said class for each class, calculating a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair, and calculating a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances.

Description

Title
Class Separation Aware Artificial Neural Network Pruning Method
Field
The disclosure relates to artificial neural networks, and more specifically to pruning an artificial neural network.
Background of Invention
Deep Neural Networks (DNNs) have achieved state-of-the- art accuracy across a variety of tasks like classification, semantic segmentation, and robot navigation. Typically, these DNNs contain a few hundred thousand to a few million parameters which can take a lot of memory, computation resources and power, which typical devices at the edge are severely constrained in. Several techniques like quantization, tensor decomposition and pruning have been proposed to compress DNNs for easy deployment on these edge devices.
Neural network pruning has been deemed essential in the deployment of DNNs on resource-constrained edge devices, greatly reducing the number of network parameters without drastically compromising accuracy. Several pruning techniques in the literature are based on ℓ-norm metrics and diversity score-based approaches. The ℓ-norm based techniques rely too heavily on the magnitude of parameters or activations: the smaller the ℓ-norm (or the magnitude) of a parameter, the lower its importance is assumed to be, an abstraction which cannot be justified and can be detrimental to the accuracy of a classification task. An importance score is assigned to each parameter and those of least importance are pruned. However, most of these methods are based on generalised estimations of the importance of each parameter, ignoring the context of the specific task at hand.
On the other hand, diversity-based metrics do not take into account that if a filter/neuron has variable output for the same class, this leads to more diversity but at the same time decreases the classification accuracy contributed by that neuron. Thus, most of these methods give a higher diversity score only because the outputs are spread out. Some techniques use the mean and standard deviation of a feature map to measure information variance and remove the feature maps carrying redundant information, which eliminates the need for ℓ-norm based scores. However, when calculating the diversity, they aggregate statistics of all feature maps irrespective of their classes. In some other techniques, the geometric median of filters is used, which also suffers from the diversity problem. Further, simple magnitude-based pruning is effective: the weights are pruned based on their magnitude, and the pruned network is then fine-tuned to recover accuracy. However, this leads to unstructured pruning, and hardware can better utilise a pruned network whose sparsity is structured.
Further, most pruning algorithms prune individual connections or weights, leading to a sparse network, without taking into account whether the hardware on which the network is deployed can take advantage of that sparsity. Also, people skilled in architecting and training a neural network generally do not have embedded programming skills, and the true measure of the neural network's performance after deployment can only be ascertained once it has been deployed on an embedded system. Some methods try to anticipate this by simulating edge devices in software, but such simulations are often not accurate.
The document titled "Rethinking Class-Discrimination Based CNN Channel Pruning" by Yuchen Liu et al. discloses a FLOP-normalized sensitivity analysis scheme to automate the structural pruning procedure. Said document discloses calculating a score by taking one class and determining how it differs from the cluster of all the other classes taken together. This may not be justified: a neuron/filter might be able to differentiate between only a few classes rather than all of them, and provided other neurons can differentiate between the remaining classes, the overall accuracy would still be high.
Another document, US 2018/0181867, discloses a method and apparatus for configuring an artificial neural network for a particular surveillance situation in which the object classes form a subset of the total number of object classes for which the artificial neural network is trained. A database is accessed that includes activation frequency values for the neurons within the artificial neural network. Those neurons having activation frequency values lower than a threshold value for the subset of selected object classes are removed (pruned) from the artificial neural network. However, even if the activation of a neuron is high, it may fail to discriminate between all the classes, because the activations for all the classes might be similarly high.
Summary of the Invention
In an aspect of the present invention, there is provided, as set out in the appended claims, a method of pruning an artificial neural network (ANN) that comprises recording activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes, scoring each neuron of the ANN based on the class-separation ability of a neuron in corresponding activations, and pruning one or more neurons based on the scores. The scoring method can be further paired with different methods, such as an evolutionary algorithm, to fully automate or customize pruning.
The present invention provides a task-specific pruning approach that is based on how efficiently a neuron or a convolutional filter is able to separate classes. The pruning method of the present invention is based on an axiomatic approach that assigns an importance score based on how separable different classes are in the output activations or feature maps, preserving the separation of classes and thereby avoiding a reduction in classification accuracy. The pruning method of the present invention does not rely on the magnitudes of weights or activations or their ℓ-norms, which makes it invariant to the zero centre of the data; it depends only on the separation between classes. The present pruning method also solves the problem of lack of diversity by aggregating the statistics of different classes separately, unlike earlier works which kept them together. Whole neurons or filters are pruned, leading to a more structured pruned network whose sparsity can be more efficiently utilised by the hardware. The pruning method of the present invention is evaluated against a benchmark dataset and network architectures, and it is shown how the pruning method of the present invention outperforms comparable pruning techniques.
As compared to existing systems, the present invention succeeds in discriminating among the classes by relying not on the mere activation values but on the difference between the activation values of different classes, by separating the activations of the different classes before calculating the final score. So, even if a neuron has high activations, the present invention generates a lower score when the neuron is not able to differentiate between the different classes. As such, even a neuron with high activations that are similar for all classes will have a lower score than a corresponding neuron with low activations that is nevertheless able to differentiate between different classes.
Further, the present invention does not calculate a score by taking one class and determining how it differs from the cluster of all the other classes taken together. Instead, the present invention takes pairs of classes and calculates how well a neuron/filter is able to differentiate between those pairs, so if the neuron is able to differentiate between a sufficient number of pairs, the separation score reflects this and is appropriately high.
In an embodiment of the present invention, the method further includes: for each class, computing a plurality of activation scores of a neuron for the corresponding plurality of data inputs; for each class, calculating a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculating a set of Euclidean distances for the neuron, for corresponding sets of class pairs, after flattening the class centres, wherein a Euclidean distance is the distance between the class centres of the corresponding pair; calculating a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances and is a measure of the class-separation ability of the neuron for the plurality of classes; and, after choosing the number of neurons to be pruned, pruning the neurons with the lowest scores from the ANN.
In an embodiment of the present invention, the ANN is one of: a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected neural network.
In an embodiment of the present invention, the ANN includes a first number of layers, and each layer comprises a second number of neurons. In an embodiment of the present invention, the method further includes pruning a layer of the ANN based on class-separation scores of neurons of corresponding layer.
In an embodiment of the present invention, the class centre of a neuron for a given class is calculated based on the equation:
ccij(cn) = ( Σd Actij^d(cn) ) / mn
where ccij(cn) is the class centre of the ijth neuron for class cn, Actij^d(cn) is the activation score of the ijth neuron for class cn for the dth input of the input dataset, and mn is the number of data inputs belonging to class cn.
In an embodiment of the present invention, the set E of Euclidean distances is calculated based on the following:
Eij = { || ccij(cp) - ccij(cq) || : for every pair of classes (cp, cq) in C with p ≠ q }
where Eij is the set E of Euclidean distances for the ijth neuron, and ccij(cp) and ccij(cq) are the first and second class centres of a pair for the ijth neuron.
In a further aspect of the present invention, there is provided a system for pruning an artificial neural network (ANN) that includes a memory, and a processor communicatively coupled to the memory. The processor is configured to record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes, score each neuron of the ANN based on a class-separation ability of a neuron in corresponding activations, and prune the one or more neurons based on the scoring.
In an embodiment of the present invention, the processor is further configured to: for each class, compute a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculate a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculate a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; calculate a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances, wherein the class- separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and prune the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
In yet another aspect of the present invention, there is provided a non-transitory computer readable medium having stored thereon computer-executable instructions which, when executed by a processor, cause the processor to record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; score each neuron of the ANN based on a class-separation ability of a neuron in corresponding activations; and prune the one or more neurons based on the scoring.
Brief Description of the Drawings
The invention will be more clearly understood from the following description of an embodiment thereof, given by way of example only, with reference to the accompanying drawings, in which:-
FIG.1 illustrates an artificial neural network (ANN), in accordance with an embodiment of the present invention;
FIG.2 is a flowchart illustrating a method of pruning the ANN, in accordance with an embodiment of the present invention;
FIG.3 illustrates a table that show the layers and the number of filters pruned in each layer for an exemplary ANN for different compression ratios, in accordance with an embodiment of the present invention; and
FIG.4 illustrates accuracy vs. CR and speedup for the exemplary ANN based on the table of FIG.3, in accordance with an embodiment of the present invention.
Detailed Description of the drawings
FIG.1 illustrates an artificial neural network (ANN) 100, in accordance with an embodiment of the present invention. The ANN 100 is one of a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected NN. The ANN 100 includes k layers; in the illustrated example the ANN 100 includes an input layer, hidden layer 1, hidden layer 2, and an output layer, setting the value of k as 4.
The ANN 100 including k layers may be represented by the following:
ANN = {NN1, NN2, NN3, ..., NNk}    (1)
where NNi represents the ith layer.
Each layer of the ANN 100 includes a plurality of neurons/convolutional filters. The following invention has been described with reference to neurons, however, it would be apparent to one of ordinary skill in the art, that the present embodiments would be applicable to the convolutional filters as well. In an example, the input layer includes three neurons, each of the hidden layers 1 and 2 include four neurons, and the output layer includes a single neuron.
In the ANN 100, each neuron is represented by NNij, where ij is the address of the neuron, such that NNij is the jth neuron of the ith layer. For example, the neuron 102 may be represented as NN00, the neuron 104 may be represented as NN01, and the neuron 106 may be represented as NN10.
FIG.2 is a flowchart illustrating a method 200 of pruning the ANN 100, in accordance with an embodiment of the present invention.
Referring to FIGS. 1 and 2 together, at step 202, the method includes recording activation of the ANN 100 on an input dataset that includes a plurality of data inputs of a plurality of classes.
Let there be a dataset D with m data inputs, each belonging to one of the n classes in set C, such that
D = {d1, d2, d3, ..., dm}    (2)
where d1, d2, d3, ..., dm correspond to the plurality of data inputs of the dataset D, and
C = {c1, c2, c3, ..., cn}    (3)
where c1, c2, c3, ..., cn correspond to the plurality of classes of the data inputs.
In the context of the present invention, the activation of the ANN 100 on the input dataset D is recorded by computing an activation dataset of each neuron separately for each class. As a result, for each neuron, the number of activation datasets would be equal to the number of classes of the input dataset D. In an example, when there are three classes c1, c2, c3 of input data, then for the input dataset D, three activation datasets corresponding to the three classes are computed for a neuron 102. The computation of activation datasets of the neuron 102 independently for each class is based on the assumption that the activations of the neuron 102 on the input dataset D would be different for different classes.
Further, for a given class cn, the activation dataset Actij(cn) of the ijth neuron includes a plurality of activation scores corresponding to the plurality of data inputs of D, such that each activation score Actij^d(cn) corresponds to an output of the ijth neuron for the dth input for class cn. The exemplary activation datasets Actij(c1), Actij(c2), and Actij(c3) of the neuron 102 for an exemplary input dataset D, for the classes c1, c2 and c3, are represented by the following:
Actij(c1) = [10, 20, -110, 0, 232, 123, 0.12, ...]    (4)
Actij(c2) = [123, 231, -1, -2345, 0.12, ...]    (5)
Actij(c3) = [-100, 232, 232, 1, 1, 1, ...]    (6)
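By way of illustration, step 202 can be sketched in Python roughly as follows. The function name record_activations, the neuron_output_fn callable and the use of NumPy are assumptions made for the sketch and are not part of the disclosed method.

    from collections import defaultdict
    import numpy as np

    def record_activations(neuron_output_fn, dataset, labels):
        # Step 202 (sketch): build one activation dataset per class for a single
        # neuron/filter. neuron_output_fn(d) is assumed to return the activation
        # of the ijth neuron (a scalar, or a feature map for a convolutional
        # filter) for the data input d; dataset and labels are parallel sequences.
        act = defaultdict(list)
        for d, c in zip(dataset, labels):
            act[c].append(np.asarray(neuron_output_fn(d), dtype=float))
        return dict(act)   # {c1: [Act for each input of c1], c2: [...], ...}

The returned dictionary plays the role of the activation datasets Actij(c1), Actij(c2), ... of equations (4) to (6).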
Thereafter, at step 204, the method includes scoring each neuron of the ANN 100 based on a class-separation ability of a neuron in corresponding activations. The class-separation ability of a neuron is measured based on a class-separation score Css of the neuron. The class-separation score Css of a neuron is a measure of how much separation in classes the neuron has. The class-separation score Css of a neuron is measured based on class centres of the plurality of classes. The class centre ccij(cn) of the ijth filter/neuron for a given class cn is calculated as follows:
ccij(cn) = ( Σd Actij^d(cn) ) / mn    (7)
where ccij(cn) is the class centre of the ijth neuron for class cn, Actij^d(cn) is the activation score of the ijth neuron for class cn for the dth data input, and mn is the number of data inputs of the dataset belonging to class cn.
Thus, the class centre of a neuron for a given class is the mean of the activation scores of the corresponding activation dataset. In an example, the class centre of the neuron 102 for class c1 is computed by summing all the activation scores of the activation score dataset Actij(c1) (Equation 4), and then dividing the sum by the total number of recorded elements. In a similar manner, the class centre of each neuron for each class is computed based on the input dataset D, and each class centre is flattened to form a vector.
After that, a set E of Euclidean distances between all pairs of class centres within the same neuron is calculated based on the following equation:
Eij = { || ccij(cp) - ccij(cq) || : for every pair of classes (cp, cq) in C with p ≠ q }    (8)
where Eij is the set E of Euclidean distances for the ijth neuron, and ccij(cp) and ccij(cq) are the first and second class centres of a pair for the ijth neuron.
In an example, when a neuron has three class centres cc1, cc2, and cc3 corresponding to three classes c1, c2 and c3, then the pairs of class centres are (cc1, cc2), (cc2, cc3) and (cc1, cc3).
It is to be noted that an ℓ-norm is used for calculating the Euclidean distances in equation (8), but it is applied neither to the weights nor to the activations. It is used only to quantify the distance between class centres, which overcomes the problems posed by the calculation of ℓ-norms in existing pruning methods.
Once the set E of Euclidean distances is calculated for each neuron of the ANN 100, a class-separation score Css is estimated for each neuron based on the following:
Cssij = median(Eij)    (9)
where Cssij is the class separation score for the ijth neuron and Eij is the set of Euclidean distances for the ijth neuron.
Therefore, the class separation score of a neuron is estimated by calculating the median value of the Euclidean distances of the corresponding set. The median is chosen here because it is robust to noisy data. If the mean were chosen instead of the median, the class separation score could be incorrectly inflated for neurons where only a few classes have large separation while the rest overlap.
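A minimal Python sketch of the scoring in step 204, following equations (7) to (9), is given below. The helper name class_separation_score and the dictionary layout produced by the record_activations sketch above are assumptions, not part of the disclosed method.

    import numpy as np
    from itertools import combinations

    def class_separation_score(act_per_class):
        # Equation (7): the class centre is the mean activation for each class;
        # feature maps are flattened so every centre is a vector.
        centres = {c: np.mean(np.asarray(a, dtype=float).reshape(len(a), -1), axis=0)
                   for c, a in act_per_class.items()}
        # Equation (8): Euclidean distance between every pair of class centres.
        dists = [np.linalg.norm(centres[p] - centres[q])
                 for p, q in combinations(sorted(centres), 2)]
        # Equation (9): the median is robust to a few well-separated outlier pairs.
        return float(np.median(dists))

For the three-class example above, the list dists holds the distances for the pairs (cc1, cc2), (cc2, cc3) and (cc1, cc3), and the score is their median.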
The use of the class-separation score as a metric, and also the way it is calculated by first grouping the activations by class, is a novel aspect of the present invention.
At step 206, the method includes pruning the one or more neurons based on the scoring. A neuron is pruned from the ANN when the class-separation score of the neuron is less than a threshold class-separation score. In the context of the present invention, pruning a neuron means assigning a value of 0 to the weights of the neuron. Also, as the outputs of the pruned neurons are removed, the weights/connections in the subsequent layer are also removed. The pruning can also be extended to whole layers, instead of just neurons.
In the context of the present invention, a variety of algorithms may be used to prune the ANN 100 based on the class separation scores of the neurons. Some examples include:
• Global pruning may be considered, where the neurons with the least class separation scores are filtered out. The percentage of neurons to prune may also be chosen to make the process completely automatic (a sketch of the automatic options follows this list).
• Layer-wise structured pruning may also be chosen where percentage of neurons to prune in each layer may be specified. Percentage for each layer could be same or different. This algorithm may also be made completely automatic.
• A human may also look at the class separation scores and see which neurons and filters may be pruned out. This process is manual but hand-tuning pruning like this may give more control over how the model is pruned.
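As referenced in the list above, the first two options can be sketched as score-driven selection helpers. The function names and the dictionary layouts below are assumptions of the sketch, not part of the disclosed method.

    def select_neurons_to_prune(scores, fraction):
        # Global pruning (sketch): return the given fraction of neuron addresses
        # with the lowest class-separation scores.
        k = int(len(scores) * fraction)
        return sorted(scores, key=scores.get)[:k]

    def select_layerwise(scores_per_layer, fraction_per_layer):
        # Layer-wise structured pruning (sketch): apply a per-layer fraction,
        # which may be the same for every layer or different.
        return {layer: select_neurons_to_prune(layer_scores, fraction_per_layer[layer])
                for layer, layer_scores in scores_per_layer.items()}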
The method 200 may also be implemented using a human-in-the-loop algorithm, where, after looking at the class separation scores for all neurons, a user manually decides which neurons are to be pruned. The pseudo code of the method 200, implemented on a Deep Neural Network (DNN) using the human-in-the-loop algorithm, is summarized below:
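The pseudo-code listing is published only as an image, so it is not reproduced here; the following Python sketch reconstructs the human-in-the-loop variant from the surrounding description. The dnn.layers[i].neurons[j] interface, the choose_fn callback and the reuse of the record_activations and class_separation_score helpers sketched earlier are assumptions of this reconstruction.

    def prune_dnn_human_in_the_loop(dnn, dataset, labels, choose_fn):
        # Steps 202 and 204: record activations and score every neuron of the DNN.
        scores = {}
        for i, layer in enumerate(dnn.layers):
            for j, neuron in enumerate(layer.neurons):
                acts = record_activations(neuron.activation, dataset, labels)
                scores[(i, j)] = class_separation_score(acts)
        # Present the scores, lowest first, so the user can inspect them.
        for (i, j), css in sorted(scores.items(), key=lambda kv: kv[1]):
            print(f"neuron NN{i}{j}: Css = {css:.4f}")
        # Step 206: the user decides which neurons to prune; pruning a neuron
        # means assigning 0 to its weights, and the connections it feeds in the
        # subsequent layer then carry nothing.
        for (i, j) in choose_fn(scores):
            dnn.layers[i].neurons[j].weights[:] = 0.0
        return dnn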
Experiments
An input dataset (for example, the known dataset CIFAR-10) with 60,000 images of size 32x32 and 3 channels has been used for demonstrating the pruning method 200 of the present invention on exemplary neural networks. The input dataset has 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck) and may be used for classification tasks. Two metrics, the theoretical speedup and the compression ratio (CR), are measured here because different kinds of layers of the neural network have different numbers of Floating-Point Operations (FLOPs), and though the number of parameters pruned might be the same, the reduction in execution time depends on the FLOPs. Hence two metrics are needed: one that quantifies the reduction in memory footprint (CR) and one that quantifies the reduction in execution time (speedup). The theoretical speedup and compression ratio (CR) are estimated based on the following:
Compression ratio (CR) = (number of parameters in the unpruned network) / (number of parameters in the pruned network)    (10)
Theoretical speedup = (FLOPs of the unpruned network) / (FLOPs of the pruned network)    (11)
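As a worked illustration, the two metrics reduce to simple ratios of parameter and FLOP counts; the numbers in the comment below are illustrative only and are not results from the application.

    def compression_ratio(params_unpruned, params_pruned):
        # Equation (10): reduction in memory footprint.
        return params_unpruned / params_pruned

    def theoretical_speedup(flops_unpruned, flops_pruned):
        # Equation (11): reduction in execution time.
        return flops_unpruned / flops_pruned

    # Pruning slightly more than half of the parameters gives, for example,
    # compression_ratio(1_000_000, 474_000) ≈ 2.11.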
The accuracy of an unpruned network can be set as a baseline, and a change in accuracy of the pruned network from the baseline is used to determine how the method 200 of the present invention affects the accuracy.
In a first example, an exemplary residual network, more popularly known as ResNet-56, is pruned using the method 200. ResNet-56 includes 27 blocks, and each block consists of two convolutional layers. When compared with the state of the art, the accuracy of the pruned network is found to remain reasonably high while achieving the best compression ratio. The theoretical speedup is found to be around 2.5x, and more than half of the parameters of the network have been pruned (CR = 2.11x).
In a second example, an exemplary residual network, more popularly known as ResNet-32, is pruned using the method 200. The pre-trained model has an accuracy of 92.63% on the input dataset. The network has 15 residual blocks and is divided into three groups, each group having 5 residual blocks. The groups are denoted layer1, layer2 and layer3. The two layers of each residual block are named conv1 and conv2. Table I shows these layers and the number of filters pruned in each layer for different compression ratios. Table I also illustrates the compression ratio (CR), speedup and change in accuracy from the baseline.
It is to be noted from Table I how quickly the accuracy drops when the CR goes above 2.00: it decreases by more than 2% for CRs of 2.52 and 3.05. At the same time, the pruned NN has fewer than half the parameters of the unpruned NN, so this drop is justified. As high losses in accuracy are usually avoided, the CR range 1.10-2.02 is the focus, and it can be clearly seen that the more the NN is compressed, the larger the decrease in accuracy, but in this range the drop is minuscule. Figure 4 illustrates accuracy vs. CR and speedup for ResNet-32 on the input dataset based on Table I, showing how the accuracy drops when the CR and speedup are increased. The accuracy actually increases for compression ratios of 1.10, 1.20 and 1.30. All these experiments show that the technique of the present invention works over a wide range of compression ratios and speedups with an acceptable drop in accuracy, and is able to prune large networks on an input dataset and achieve the highest compression rates.
Thus, an axiomatic approach based on a well-defined concept can prune a neural network effectively, achieving high compression ratios and speedups compared to existing studies. The pruning method of the present invention is envisioned to be used with other methods of pruning. The pruning method may also be extended to other deep learning tasks like regression, segmentation, etc.
Further, in an embodiment of the present invention, there is provided a tool that allows a neural network expert to automatically apply the network pruning method and deploy and evaluate their pruned models on an edge device directly. The edge hardware devices are devices that are constrained in terms of resources such as memory and processing power. The tool automates the application of the network pruning on a trained NN and deploys the NN. Performance measures such as change in accuracy and the memory occupied are calculated, allowing the neural network expert to understand the effects of the pruning method on the deployed NN.
In the specification, the terms "comprise", "comprises", "comprised" and "comprising", or any variation thereof, and the terms "include", "includes", "included" and "including", or any variation thereof, are considered to be interchangeable, and they should all be afforded the widest possible interpretation and vice versa.
The invention is not limited to the embodiments hereinbefore described but may be varied in both construction and detail.

Claims

Claims
1 . A method of pruning an artificial neural network (ANN) comprising: recording activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; for each class, computing a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculating a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculating a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; calculating a class-separation score of the neuron, wherein the class-separation score is a median value of the plurality of Euclidean distances, wherein the class-separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and pruning the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
2. The method as claimed in any preceding claim, wherein the ANN is one of: a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected neural network.
3. The method as claimed in any preceding claim, wherein the ANN includes a first number of layers, and each layer comprises a second number of neurons.
4. The method as claimed in any preceding claim further comprising pruning a layer of the ANN based on class-separation scores of neurons of corresponding layer.
5. The method as claimed in any preceding claim, wherein the class centre of a neuron for a given class is calculated based on the equation
ccij(cn) = ( Σd Actij^d(cn) ) / mn
where ccij(cn) is the class centre of the ijth neuron for class cn, Σd Actij^d(cn) is the summation of activation scores of the ijth neuron for class cn for all the data inputs of the dataset belonging to class cn, and mn is the number of those data inputs.
6. The method as claimed in any preceding claim, wherein the set E of Euclidean distances is calculated based on the following:
Eij = { || ccij(cp) - ccij(cq) || : for every pair of classes (cp, cq) in C with p ≠ q }
where Eij is a set E of Euclidean distances for the ijth neuron, ccij(cp) is the first class centre of a pair for the ijth neuron, and ccij(cq) is the second class centre of the pair for the ijth neuron.
7. A system for pruning an artificial neural network (ANN) comprising a memory, and a processor communicatively coupled to the memory, and configured to: record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; for each class, compute a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculate a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculate a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; and calculate a class-separation score of the neuron, wherein the class- separation score is a median value of the plurality of Euclidean distances, wherein the class-separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and prune the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
8. The system as claimed in claim 7, wherein the ANN is one of: a deep neural network (DNN), a convolutional neural network (CNN), and a fully connected neural network.
9. The system as claimed in claim 7, wherein the processor is further configured to prune a layer of the ANN based on class-separation scores of neurons of corresponding layer.
10. A non-transitory computer readable medium having stored thereon computer- executable instructions which, when executed by a processor, cause the processor to: record activation of the ANN on an input dataset that includes a plurality of data inputs of a plurality of classes; for each class, compute a plurality of activation scores of a neuron for corresponding plurality of data inputs; for each class, calculate a class centre of the neuron, wherein the class centre is an average value of the plurality of activation scores for said class; calculate a set of Euclidean distances for the neuron, for corresponding sets of class pairs, wherein a Euclidean distance is distance between class centres of corresponding pair; and calculate a class-separation score of the neuron, wherein the class- separation score is a median value of the plurality of Euclidean distances, wherein the class-separation score is a measure of the class-separation ability of the neuron for the plurality of classes; and prune the neuron from the ANN, when the class-separation score of the neuron is less than a threshold class-separation score.
PCT/EP2022/085998 2021-12-14 2022-12-14 Class separation aware artificial neural network pruning method WO2023111109A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB2118066.6A GB202118066D0 (en) 2021-12-14 2021-12-14 Class separation aware artificial neural network pruning method
GB2118066.6 2021-12-14

Publications (1)

Publication Number Publication Date
WO2023111109A1

Family

ID=80080168

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/085998 WO2023111109A1 (en) 2021-12-14 2022-12-14 Class separation aware artificial neural network pruning method

Country Status (2)

Country Link
GB (1) GB202118066D0 (en)
WO (1) WO2023111109A1 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181867A1 (en) 2016-12-21 2018-06-28 Axis Ab Artificial neural network class-based pruning

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180181867A1 (en) 2016-12-21 2018-06-28 Axis Ab Artificial neural network class-based pruning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUCHEN LIU ET AL: "Rethinking Class-Discrimination Based CNN Channel Pruning", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 29 April 2020 (2020-04-29), XP081655141 *

Also Published As

Publication number Publication date
GB202118066D0 (en) 2022-01-26


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22835049

Country of ref document: EP

Kind code of ref document: A1