WO2023085968A1 - Device and method for neural network pruning - Google Patents

Device and method for neural network pruning

Info

Publication number
WO2023085968A1
WO2023085968A1 (PCT/RU2021/000505)
Authority
WO
WIPO (PCT)
Prior art keywords
denotes
neural network
parameter representation
continuous parameter
data processing
Prior art date
Application number
PCT/RU2021/000505
Other languages
French (fr)
Inventor
Kirill Igorevich SOLODSKIKH
Azim Edgarovich KURBANOV
Ruslan Daurenovich AYDARKHANOV
Dehua SONG
Alexander Nikolaevich Filippov
Youliang Yan
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to PCT/RU2021/000505
Publication of WO2023085968A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00: Computing arrangements based on specific mathematical models
    • G06N 7/01: Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to methods and devices for performing neural network pruning in the field of machine learning. In particular, each layer of a trained neural network that comprises a first set of discrete weights is represented by a continuous parameter representation. The continuous parameter representation is based on a linear combination of Riemann integrable functions. Then, said continuous parameter representation may be discretized to obtain a second set of discrete weights of a desired size for that layer. In this way, neural network pruning is performed and the size of the neural network may be changed at the inference phase. Moreover, there is no need to fine-tune the second set of discrete weights because of the continuous parameter representation.

Description

DEVICE AND METHOD FOR NEURAL NETWORK PRUNING
TECHNICAL FIELD
The present disclosure relates to devices and methods in the fields of computer science, in particular, artificial intelligence (AI). The disclosure relates especially to devices and methods for neural network pruning.
BACKGROUND
In the field of artificial intelligence, such as machine learning, pruning a neural network (or neural network pruning) refers to a process of deleting unnecessary or least important parameters, such as weights and neurons, from a trained neural network. Neural network pruning is widely used for neural network compression to achieve a lightweight trained neural network model of any desired size. The lightweight trained neural network may have the benefits of a reduced size and an accelerated execution speed during an inference phase. In this disclosure, neural network pruning may be simply referred to as pruning. A neural network may be simply referred to as a model.
SUMMARY
One issue with conventional neural network pruning is that retraining (or fine-tuning) is often required in order to reduce the accuracy drop. Sometimes, to ensure model performance similar to that of the original model, several iterations of pruning and retraining may be required.
A further issue with conventional neural network pruning is that changing the size of the neural network during the inference phase is not allowed after pruning is done.
In view of the above, an objective of this disclosure is to enable neural network pruning to an arbitrary size without retraining. Another objective is to allow the size of a neural network to change flexibly during the inference phase.
These and other objectives are achieved by the solutions of the present disclosure as described in the independent claims. Advantageous implementations are further defined in the dependent claims. An idea described in the present disclosure is to use a continuous representation, such as integral operations, to replace a discrete transformation denoted by weights of neural network layers. Then, this continuous representation of the neural network layer(s) may be discretized to an arbitrary size at any time, also during the inference phase.
A first aspect of the present disclosure provides a data processing apparatus configured to obtain a trained neural network comprising a plurality of neural network layers, in which each neural network layer comprises a first set of discrete weights. For each neural network layer, the data processing apparatus is configured to determine a continuous parameter representation for the first set of discrete weights based on a linear combination of Riemann integrable functions. Then, for each neural network layer, the data processing apparatus is configured to discretize the continuous parameter representation to obtain a second set of discrete weights, and generate a layer output by processing a layer input based on the second set of discrete weights.
As a result, each neural network layer of the trained neural network comprises the second set of discrete weights. For example, for each neural network layer, the size of the second set of discrete weights may be less than that of the first set of discrete weights. That is, the size of the neural network may be reduced and the trained neural network may be pruned.
Moreover, the first set of discrete weights may be represented by multidimensional continuous surfaces based on the linear combination of the Riemann integrable functions. This may allow generating a neural network of any desired size without fine-tuning or retraining. This is because trained parameters may have an inertial effect on the neighborhood of each parameter to some extent. Unlike conventional neural network pruning, where the information of the pruned (or deleted) elements is simply lost, a discretization based on the continuous parameter representation may include this inertial effect on the neighborhood for each weight.
A further advantage of the present disclosure is that, instead of conventionally storing the first set of discrete weights for each neural network layer, the data processing apparatus may ultimately only need to store the continuous parameter representation for each neural network layer in order to store the trained neural network. In this way, storage space of the data processing apparatus may be saved.
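How the continuous parameter representation is obtained from a trained layer's weights is described only at this general level. The following is a minimal sketch under simplifying assumptions: a 1-D toy layer, a fixed Gaussian basis, and a least-squares fit of only the combination coefficients. The names gaussian_basis, w_first and w_second are illustrative and not taken from the disclosure.

```python
import numpy as np

def gaussian_basis(x, mu, sigma):
    """Evaluate 1-D Gaussian basis functions at points x; columns index the basis."""
    d = x[:, None] - mu[None, :]
    return np.exp(-0.5 * (d / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# First set of discrete weights of one (toy, 1-D) layer, assumed to come from training,
# together with their positions on the unit interval.
w_first = np.array([0.1, 0.4, 0.9, 0.7, 0.3, 0.05, 0.2, 0.6])
x_first = (np.arange(len(w_first)) + 0.5) / len(w_first)

# Fixed Gaussian locations on a uniform grid; only the combination weights are fitted here.
mu = np.linspace(0.0, 1.0, 6)
sigma = 0.15
lam, *_ = np.linalg.lstsq(gaussian_basis(x_first, mu, sigma), w_first, rcond=None)

# Discretize the fitted continuous representation to a smaller, second set of weights.
x_second = (np.arange(5) + 0.5) / 5
w_second = gaussian_basis(x_second, mu, sigma) @ lam
print(w_second)
```

In the disclosure the locations and widths are themselves trainable; fixing them here only keeps the example short.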
Optionally, values of the first set of discrete weights are in the range of [0, 1]. Optionally, a Riemann integrable function, in general, may refer to a function whose lower and upper integrals are equal.
Optionally, the data processing apparatus may be configured to, only during the inference phase of the trained neural network, perform the discretization of the continuous parameter representation and the generation of the layer output. Alternatively, the data processing apparatus may be configured to perform the generation of the layer output only during the inference phase.
In a possible implementation form of the first aspect, for discretizing the continuous parameter representation, the data processing apparatus may be configured to:
- obtain a first discretization by applying a meshgrid operation to the continuous parameter representation within [0, 1]^n, wherein n denotes a dimension of the continuous parameter representation, and
- adjust the first discretization according to a numerical integration method to obtain the second set of discrete weights.
Optionally, applying the meshgrid operation may refer to using a meshgrid function to create a rectangular grid out of the continuous parameter representation. In the rectangular grid, the data processing apparatus may be configured to use the numerical integration method to compute a quadrature over a partition of the rectangular grid and obtain the result of the numerical integration method as a discrete weight of the second set of discrete weights.
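As an illustration of this meshgrid-and-quadrature discretization, the sketch below assumes a 2-D continuous representation on [0, 1]^2, uniform partitions and a midpoint rule; f_w and discretize are illustrative stand-ins rather than the disclosed implementation.

```python
import numpy as np

def f_w(x, y):
    # Toy stand-in for the continuous parameter representation on [0, 1]^2.
    return np.sin(2.0 * np.pi * x) * np.cos(2.0 * np.pi * y)

def discretize(f, shape):
    """Meshgrid the unit square into `shape` uniform cells and return one
    midpoint-rule quadrature per cell as a discrete weight."""
    rows, cols = shape
    h1, h2 = 1.0 / rows, 1.0 / cols
    xs = (np.arange(rows) + 0.5) * h1            # cell midpoints along dimension 1
    ys = (np.arange(cols) + 0.5) * h2            # cell midpoints along dimension 2
    gx, gy = np.meshgrid(xs, ys, indexing="ij")  # first discretization (the meshgrid)
    return f(gx, gy) * h1 * h2                   # adjusted by the quadrature rule

W_small = discretize(f_w, (4, 3))     # one possible second set of discrete weights
W_large = discretize(f_w, (16, 12))   # a different size from the same representation
print(W_small.shape, W_large.shape)
```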
In a possible implementation form of the first aspect, for discretizing the continuous parameter representation, the data processing apparatus may be configured to apply uniform partitions in each dimension of the continuous parameter representation.
It is noted that any suitable kind of partition may be used. By using the uniform partitions, computational complexity may be further reduced.
In a possible implementation form of the first aspect, the data processing apparatus may be further configured to adapt a size of the second set of discrete weights based on computational complexity of an inference phase of the trained neural network. Optionally, the size of the second set of discrete weights may be understood as the size of each layer.
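The disclosure leaves open how the target size follows from the complexity constraint. One possible heuristic, shown purely as an assumption, bounds the cost of a dense layer (roughly 2 · in · out multiply-accumulates per input) by a FLOPs budget and picks the largest output size that fits.

```python
def choose_layer_size(flops_budget: int, in_features: int, max_out_features: int) -> int:
    """Largest output size whose dense-layer cost (~2 * in * out FLOPs per input)
    stays within the budget; the continuous representation would then be
    discretized to exactly this size."""
    out_features = flops_budget // (2 * in_features)
    return max(1, min(max_out_features, out_features))

print(choose_layer_size(flops_budget=100_000, in_features=256, max_out_features=512))  # 195
```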
In a possible implementation form of the first aspect, the data processing apparatus may be further configured to:
- determine a further continuous parameter representation for a set of discrete inputs based on the linear combination of the Riemann integrable functions; and
- perform a numerical integration based on the continuous parameter representation and the further continuous parameter representation, to obtain, as a result, the layer output.
Optionally, the data processing apparatus may be configured to execute these two steps during the inference phase. In this way, a non-linear transformation performed by the neural network may be turned into a numerical integration. Thus, the size of the neural network and/or its computational complexity may be adapted on-demand during the inference phase.
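A minimal sketch of this idea for a single fully-connected layer, assuming midpoint quadrature on [0, 1] and toy stand-ins f_w and f_x for the two continuous representations; the layer output is obtained as a weighted sum approximating the integral of their product.

```python
import numpy as np

def f_w(s, t):
    # Toy continuous weight surface on [0, 1]^2 (input coordinate s, output coordinate t).
    return np.exp(-((s - t) ** 2) / 0.1)

def f_x(s):
    # Toy continuous representation of the layer input.
    return np.sin(np.pi * s)

def layer_output(f_w, f_x, n_in, n_out):
    """Approximate y(t_j) = integral of F_w(s, t_j) * F_x(s) ds with a uniform
    partition of [0, 1] and midpoint quadrature weights q = 1 / n_in."""
    s = (np.arange(n_in) + 0.5) / n_in       # integration points along the input axis
    t = (np.arange(n_out) + 0.5) / n_out     # output sampling points
    W = f_w(s[:, None], t[None, :])          # (n_in, n_out) evaluations of F_w
    return (1.0 / n_in) * (f_x(s) @ W)       # weighted sum approximates the integral

print(layer_output(f_w, f_x, n_in=64, n_out=8))    # layer size chosen on demand
print(layer_output(f_w, f_x, n_in=256, n_out=8))   # finer partition, same layer
```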
In a possible implementation form of the first aspect, the Riemann integrable functions may be based on wavelet functions.
It is noted that the wavelet functions may also simply refer to wavelets in the field of mathematics. Examples of the wavelet functions include but are not limited to: a Gaussian function, Morlet wavelet, and Ricker wavelet. Preferably, the Gaussian function may be used.
In a possible implementation form of the first aspect, the continuous parameter representation may be as follows:
F_W(\theta, x) = \sum_{i} \frac{\lambda_i}{(2\pi)^{k/2}\sqrt{\det(\sigma_i)}} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^{\top}\sigma_i^{-1}(x-\mu_i)\right)

wherein Fw() denotes the continuous parameter representation, θ denotes a vector of parameters including μi, σi and λi, det() denotes a determinant operation, μi denotes a location parameter, σi denotes a diagonal positive definite matrix, λi denotes weights of the linear combination, k denotes the number of dimensions, i denotes the number of Gaussian functions, and i, k are integers.

In a possible implementation form of the first aspect, the plurality of neural network layers may comprise at least one 2-dimensional, 2D, convolutional layer, and for discretizing the continuous parameter representation to obtain a second set of discrete weights, the data processing apparatus may be configured to perform:
W_{i,j,k,l} = F_W\!\left(\theta, (i \cdot h_1,\ j \cdot h_2,\ k \cdot h_3,\ l \cdot h_4)\right)

for each 2D convolutional layer, wherein W denotes the second set of discrete weights, i, j, k, l denote the four dimensions of the weights of the 2D convolutional layer, and h denotes a step along each dimension.
In a possible implementation form of the first aspect, the plurality of neural network layers may comprise at least one fully-connected layer, and for discretizing the continuous parameter representation to obtain a second set of discrete weights, the data processing apparatus may be configured to perform:
W_{i,j} = F_W\!\left(\theta, (i \cdot h_1,\ j \cdot h_2)\right)
for each fully-connected layer, wherein W denotes the second set of discrete weights, i,j denote two dimensions of the weights of the fully-connected layer, and h denotes a step along each dimension.
A second aspect of the present disclosure provides a data processing method. The method comprises the following steps:
- obtaining a trained neural network comprising a plurality of neural network layers, wherein each neural network layer comprises a first set of discrete weights,
- for each neural network layer, determining a continuous parameter representation for the first set of discrete weights based on a linear combination of Riemann integrable functions;
- discretizing the continuous parameter representation to obtain a second set of discrete weights; and
- generating a layer output by processing a layer input based on the second set of discrete weights. Optionally, the method may be performed by a single apparatus. Alternatively, the steps of the method may be performed by a plurality of distributed apparatus. For example, the method may be performed by the apparatus of the first aspect.
In a possible implementation form of the second aspect, the step of discretizing the continuous parameter representation may comprise:
- obtaining a first discretization by applying a meshgrid operation to the continuous parameter representation within [0, 1]^n, wherein n denotes a dimension of the continuous parameter representation, and
- adjusting the first discretization according to a numerical integration method to obtain the second set of discrete weights.
In a possible implementation form of the second aspect, the step of discretizing the continuous parameter representation may comprise applying uniform partitions in each dimension of the continuous parameter representation.
In a possible implementation form of the second aspect, the method may further comprise adapting a size of the second set of discrete weights based on a computational complexity of an inference phase of the trained neural network.
In a possible implementation form of the second aspect, the method may further comprise:
- determining a further continuous parameter representation for a set of discrete inputs based on the linear combination of the Riemann integrable functions; and
- performing a numerical integration based on the continuous parameter representation and the further continuous parameter representation to obtain as a result the layer output.
In a possible implementation form of the second aspect, the Riemann integrable functions may be based on wavelet functions. In a possible implementation form of the second aspect, the continuous parameter representation may be:
F_W(\theta, x) = \sum_{i} \frac{\lambda_i}{(2\pi)^{k/2}\sqrt{\det(\sigma_i)}} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^{\top}\sigma_i^{-1}(x-\mu_i)\right)
wherein Fw() denotes the continuous parameter representation, θ denotes a vector of parameters including μi, σi and λi, det() denotes a determinant operation, μi denotes a location parameter, σi denotes a diagonal positive definite matrix, λi denotes weights of the linear combination, k denotes the number of dimensions, i denotes the number of Gaussian functions, and i, k are integers.
In a possible implementation form of the second aspect, the plurality of neural network layers may comprise at least one 2-dimensional, 2D, convolutional layer, and the discretizing the continuous parameter representation to obtain a second set of discrete weights may comprise performing:
W_{i,j,k,l} = F_W\!\left(\theta, (i \cdot h_1,\ j \cdot h_2,\ k \cdot h_3,\ l \cdot h_4)\right)
for each 2D convolutional layer, wherein W denotes the second set of discrete weights, i, j, k, l denote the four dimensions of the weights of the 2D convolutional layer, and h denotes a step along each dimension.
In a possible implementation form of the second aspect, the plurality of neural network layers may comprise at least one fully-connected layer, and the step of discretizing the continuous parameter representation to obtain a second set of discrete weights may comprise performing:
W_{i,j} = F_W\!\left(\theta, (i \cdot h_1,\ j \cdot h_2)\right)
for each fully-connected layer, wherein W denotes the second set of discrete weights, i, j denote the two dimensions of the weights of the fully-connected layer, and h denotes a step along each dimension. A third aspect of the present disclosure provides a computer program or program product comprising a program code for performing the method according to the second aspect or any implementation form thereof, when executed on a computer.
A fourth aspect of the present disclosure provides a computer-readable medium comprising instructions which, when executed by a computer, cause the computer to carry out the method according to any one of the second aspect or any implementation form thereof.
A fifth aspect of the present disclosure provides a chipset comprising instructions which, when executed by the chipset, cause the chipset to carry out the method according to any one of the second aspect or any implementation form thereof.
It has to be noted that all apparatus, devices, elements, units, and means described in the present application could be implemented in software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity, which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof.
BRIEF DESCRIPTION OF DRAWINGS
The above-described aspects and implementation forms will be explained in the following description of specific embodiments in relation to the enclosed drawings, in which:
FIG. 1 shows an example of neural network pruning according to the present disclosure;
FIG. 2 shows an illustration of neural network pruning according to the present disclosure;
FIG. 3 shows a flow-diagram of a method for neural network pruning according to the present disclosure;
FIG. 4 shows an illustrative result of neural network pruning; FIG. 5 shows an application scenario of the present disclosure; and
FIG. 6 shows another application scenario of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
In FIGs. 1-6, corresponding elements are labelled with the same reference signs, may share the same features and may function likewise. Moreover, it is noted that the number of elements, graphs of functions depicted and values depicted in FIGs. 1-6 are for illustration purposes only and shall not be interpreted as limitations to embodiments of the present disclosure.
FIG. 1 shows an example of neural network pruning according to the present disclosure.
For performing neural network pruning, a data processing apparatus 100 is firstly configured to obtain a trained neural network. The trained neural network comprises a plurality of neural network layers, which may comprise an input layer, at least one hidden layer, and an output layer. In the present disclosure, the neural network layer may be simply referred to as a layer. Neural network layers are subsequently connected by establishing connections between neurons in neighboring layers. The input layer may be configured to receive an input. Each of the output layer and the at least one hidden layer may be configured to receive a layer output from its previous layer and apply a non-linear transformation based on its weights to generate its layer output. The layer output of the output layer may refer to an output of the neural network.
Each layer may comprise weights and optional biases. The trained neural network may refer to a neural network that has been trained based on a training set for a particular purpose, such as image processing. Training a neural network may be referred to as a training phase, while applying the trained neural network may be referred to as an inference phase. The first set of discrete weights may refer to parameters fine-tuned during the training phase.
The first set of discrete weights 111 shown in FIG. 1 is for illustration purposes, and may comprise a set of discrete values of weights associated with neurons of a layer. The trained neural network comprises multiple layers, wherein each layer may comprise a set of discrete weights similar to the first set of discrete weights 111 shown in FIG. 1. In the following, aspects referring to the first set of discrete weights 111 shall apply likewise to any other layer of the trained neural network. For each neural network layer, the data processing apparatus is configured to determine a continuous parameter representation 131 (labeled as Fw) for the first set of discrete weights. The continuous parameter representation 131 is based on a linear combination of a plurality of Riemann integrable functions 121, 122, 123. Optionally, the continuous parameter representation 131 may be based on any reasonable number of Riemann integrable functions. The data processing apparatus may be further configured to determine an upper limit of the number of Riemann integrable functions that can be used based on its computational capability.
Without prejudice to the commonly known meaning in the field of mathematics, a function is Riemann integrable under the following condition: let f(x_1, ..., x_n) be a multivariate function defined on the n-dimensional unit cube Ω = [0, 1]^n. The function f() is Riemann integrable if the following limit exists:

\int_{\Omega} f(x_1, \ldots, x_n)\, dx = \lim_{\Delta \to 0} \sum_{i} f(\xi_i)\, \Delta_i

where Δ_i denotes the volume of partition i, Δ denotes the maximum volume over all partitions, and ξ_i denotes a point inside partition i. The same notations apply in the following where the same symbols are used, unless otherwise specified.

If such a limit exists, then it is called the Riemann integral of the function f(x_1, ..., x_n). This definition may be intuitively interpreted as the volume under the surface defined by the function f. A finer partition of the cube Ω with smaller Δ may ensure a more precise integral estimation.
Optionally, to calculate a numerical value of a definite integral, different numerical quadratures may be used. Numerical integration methods can generally be described as combining evaluations of the integrand to get an approximation to the integral, which may be written as follows:

\int_{\Omega} f(x)\, dx \approx \sum_{i} q_i\, f(x_i)

where x_i denote the integration points and q_i the corresponding weights.
The integrand may be evaluated at a finite set of points called integration points and a weighted sum of these values is used to approximate the integral. The integration points and weights depend on the specific method used and the accuracy required from the approximation. The evaluation of multiple integrals can be reduced to iterated integral by iteratively applying such quadratures.
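For concreteness, a small sketch of two such quadratures over [0, 1] (midpoint and trapezoidal rules); the integrand is a toy example and the helper name quadrature is not from the disclosure.

```python
import numpy as np

def quadrature(f, points, weights):
    """Approximate an integral as a weighted sum of integrand evaluations
    at the integration points: sum_i q_i * f(x_i)."""
    return np.sum(weights * f(points))

f = lambda x: np.exp(-x ** 2)
n = 100

# Midpoint rule: uniform partition, equal weights 1/n.
x_mid = (np.arange(n) + 0.5) / n
print(quadrature(f, x_mid, np.full(n, 1.0 / n)))

# Trapezoidal rule: partition end points, halved weights at the two boundaries.
x_trap = np.linspace(0.0, 1.0, n + 1)
w_trap = np.full(n + 1, 1.0 / n)
w_trap[[0, -1]] = 0.5 / n
print(quadrature(f, x_trap, w_trap))
```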
A wavelet function is Riemann integrable. Therefore, in some embodiments of the present disclosure, the Riemann integrable functions may be based on wavelet functions.
In a preferred embodiment, the continuous parameter representation may be based on a linear combination of Gaussian functions as follows:
F_W(\theta, x) = \sum_{i} \frac{\lambda_i}{(2\pi)^{k/2}\sqrt{\det(\sigma_i)}} \exp\!\left(-\tfrac{1}{2}(x-\mu_i)^{\top}\sigma_i^{-1}(x-\mu_i)\right)    (1)

wherein Fw() denotes the continuous parameter representation and may define the parameter surface of the weights, θ denotes a vector of parameters including μi, σi and λi, det() denotes a determinant operation, μi denotes a location parameter, σi denotes a diagonal positive definite matrix, λi denotes weights of the linear combination, k denotes the number of dimensions, i denotes the number of Gaussian functions, and i, k are integers.
Optionally, μi, σi, and λi are trainable parameters and may be determined in a learnable way.
For training the trainable parameters, they may be randomly initialized. Alternatively, the initialization of the location parameters μi may be based on a uniform grid. The number of functions along each axis is defined by the shape of the discretization. The univariate Gaussian functions are mostly concentrated in the segment [μ - 3σ, μ + 3σ], so that neighboring Gaussian functions may have high support at the intersections to ensure the necessary gradient behavior.
An advantage of using Gaussian functions is that Gaussian functions are mostly localized in a bounded area, which makes it possible to concentrate the training of the trainable parameters within the cube Ω. Diagonal covariance matrices may lead to a fast evaluation while allowing complex surfaces to be trained with a small number of parameters.
When other Riemann integrable functions are used, FW(θ, x) is not limited to the form of equation (1).
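A minimal sketch of equation (1), assuming the standard multivariate Gaussian density with diagonal covariance; the class name GaussianSurface, the random initialisation and the fixed widths are illustrative assumptions and not the disclosed training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianSurface:
    """Continuous parameter representation F_w(theta, x) as a linear combination
    of k-dimensional Gaussian functions with diagonal covariances (equation (1));
    mu, sigma (diagonal entries) and lam play the roles of the trainable parameters."""

    def __init__(self, n_gaussians, k):
        self.mu = rng.uniform(0.0, 1.0, size=(n_gaussians, k))   # location parameters
        self.sigma = np.full((n_gaussians, k), 0.05)             # diagonals of the covariances
        self.lam = rng.normal(0.0, 1.0, size=n_gaussians)        # weights of the combination
        self.k = k

    def __call__(self, x):
        """Evaluate F_w at points x of shape (m, k) inside the unit cube."""
        d = x[:, None, :] - self.mu[None, :, :]                  # (m, n, k)
        quad = np.sum(d ** 2 / self.sigma[None, :, :], axis=-1)  # Mahalanobis terms
        norm = (2.0 * np.pi) ** (self.k / 2) * np.sqrt(np.prod(self.sigma, axis=-1))
        return np.sum(self.lam * np.exp(-0.5 * quad) / norm, axis=-1)

f_w = GaussianSurface(n_gaussians=16, k=2)
grid = np.stack(np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 4),
                            indexing="ij"), axis=-1).reshape(-1, 2)
print(f_w(grid).reshape(5, 4))   # a 5x4 set of discrete weights sampled from F_w
```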
After the data processing apparatus 100 determines the continuous parameter representation 131, the data processing apparatus 100 may be configured to store the linear combination Fw () instead of storing the first set of discrete weights for each layer. During the inference phase, the data processing apparatus 100 is configured to discretize the continuous parameter representation to obtain a second set of discrete weights. The second set of discrete weights may be seen as part of a pruned neural network, which is lightweight and may help to reduce the computational complexity of the inference phase.
FIG. 1 further illustrates an example of a discretization of the continuous parameter representation to obtain one of the second set of discrete weights. In particular, during the inference phase, the data processing apparatus 100 may be configured to determine the size of the second set of discrete weights 141, which is fourteen in this example. Then, the data processing apparatus 100 may be configured to determine a corresponding number of partitions and perform an integration over each partition to obtain, as a result, a discrete weight of the second set of discrete weights 141.
When a layer is a 2-dimensional (2D) convolutional (conv) layer, the second set of discrete weights may be obtained by:
W_{i,j,k,l} = F_W\!\left(\theta, (i \cdot h_1,\ j \cdot h_2,\ k \cdot h_3,\ l \cdot h_4)\right)    (2)

where i, j, k, l denote the four dimensions of the weights of the 2D convolutional layer, and h denotes a step along each dimension where a meshgrid is defined. For each point of the meshgrid, the function Fw is evaluated to obtain, as a result, a discrete weight of the second set of discrete weights.
When a layer is a fully connected (FC) layer, the second set of discrete weights may be obtained by:
W_{i,j} = F_W\!\left(\theta, (i \cdot h_1,\ j \cdot h_2)\right)    (3)
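The sketch below mirrors the form of equations (2) and (3): the continuous representation is evaluated on a uniform meshgrid of whichever shape is requested for the layer, with a step h = 1/size along each dimension; f_w and sample_weights are illustrative stand-ins, not the disclosed implementation.

```python
import numpy as np

def f_w(x):
    # Hypothetical continuous weight representation on the unit cube,
    # evaluated at points x of shape (..., n).
    return np.exp(-np.sum((x - 0.5) ** 2, axis=-1) / 0.1)

def sample_weights(f, shape):
    """Evaluate F_w on a uniform meshgrid of the requested shape, one grid
    point (i * h along each dimension) per output weight."""
    axes = [(np.arange(s) + 0.5) / s for s in shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    return f(grid)

conv_w = sample_weights(f_w, (8, 3, 3, 3))   # e.g. out-channels, in-channels, kernel H, kernel W
fc_w = sample_weights(f_w, (10, 32))         # e.g. out-features, in-features
print(conv_w.shape, fc_w.shape)
```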
FIG. 2 shows an illustration of neural network pruning according to the present disclosure.
In FIG. 2, neural network pruning on a three-dimensional cube is illustrated. Similar to FIG. 1, weights of higher dimensions (larger than two) of each layer may be represented by a linear combination of surfaces or wavelets on a high-dimensional unit cube, optionally via smooth integral kernel evaluation. The linear combination may be represented by the function Fw() and a continuous surface may be formed. Then, the data processing apparatus 100 may be configured to apply a meshgrid operation to obtain a meshgrid on the continuous surface. Then, the function Fw() may be discretized by performing an integration on partitions of the meshgrid according to a desired shape and quadrature (e.g., according to a desired neural network size).
FIG. 3 shows a flow-diagram of a method 300 according to the present disclosure. The method 300 may be performed by the apparatus 100.
The method 300 comprises the following steps:
- step 301: obtaining a trained neural network comprising a plurality of neural network layers, wherein each neural network layer comprises a first set of discrete weights; for each neural network layer:
- step 302: determining a continuous parameter representation for the first set of discrete weights based on a linear combination of Riemann integrable functions;
- step 303: discretizing the continuous parameter representation to obtain a second set of discrete weights; and during an inference phase of the trained neural network,
- step 304: generating a layer output by processing a layer input based on the second set of discrete weights. Optionally, a single apparatus may be configured to execute the method 300. Alternatively, multiple apparatus or components of a device may be configured to execute different steps of the method 300. It is noted that although an apparatus 100 is used with respect to FIG. 1, this does not exclude an embodiment where multiple apparatus may be configured to execute the steps mentioned in the method 300.
For example, a first apparatus may be configured to execute steps 301-302 once a trained neural network is obtained. Then, the first apparatus may be configured to provide the continuous parameter representation to a second apparatus. The second apparatus may be specifically configured to execute the trained neural network during the inference phase. During the inference phase, the second apparatus may be configured to execute steps 303 and 304. As an example, the first apparatus may be a server, while the second apparatus may be a terminal.
In another scenario, a first execution unit of a device may be configured to execute steps 301-302, while a second execution unit of the device may be configured to execute steps 303-304. For example, the device may be a mobile device and may comprise multiple cores in its CPU. The multiple cores may comprise relatively battery-saving and slower processor cores (known as ‘little cores’), and relatively more powerful and power-hungry processor cores (known as ‘big cores’). The one or more big cores may be configured to execute steps 301-302, while the one or more little cores may be configured to execute steps 303-304. The device may further comprise an AI accelerator chip, such as a tensor core, neural processing unit, tensor processing unit, or graphics processing unit. The AI accelerator chip may be used to assist AI-related computations in steps 301-304.
Moreover, the steps of the method 300 may share the same functions and details as described above with respect to FIGs. 1 and 2. Therefore, the corresponding method implementations are not described again at this point.
FIG. 4 shows an illustrative result of neural network pruning.
In FIG. 4, an initially obtained neural network 410 is pruned into a lightweight neural network 420. The initially obtained neural network 410 comprises exemplarily three layers. Each layer comprises a first set of discrete weights 411, 412 and 413. After performing the neural network pruning, each layer of the light-weight neural network 420 comprises a second set of discrete weights 421, 422 and 423.
FIG. 5 shows an application scenario of the present disclosure.
In FIG. 5, the first two 2D convolutional layers and the last fully connected (FC) layer of a trained neural network are represented as continuous parameter representations (original NN). These continuous parameter representations can be adapted into pruned neural networks (NNs 1-3) of different sizes, which are illustrated exemplarily in FIG. 5.
FIG. 6 shows another application scenario of the present disclosure.
In FIG. 6, the present disclosure may be applied to smartphones or self-driving cars where AI-based applications are often used. It can be seen that the present disclosure may allow conducting flexible inference strategies for AI-based applications depending on battery power, CPU usage, memory usage, environmental conditions, etc.
The present disclosure is described mainly with reference to the weights comprised in each layer of the neural network. It is noted that each layer may optionally comprise a set of biases, and embodiments of the present disclosure may apply similarly to the set of biases.
It is noted that the apparatus 100 in the present disclosure may comprise processing circuitry configured to perform, conduct or initiate the various operations of the device described herein, respectively. The processing circuitry may comprise hardware and software. The hardware may comprise analog circuitry or digital circuitry, or both analog and digital circuitry. The digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or multi-purpose processors. In one embodiment, the processing circuitry comprises one or more processors and a non-transitory memory connected to the one or more processors. The non-transitory memory may carry executable program code which, when executed by the one or more processors, causes the device to perform, conduct or initiate the operations or methods described herein, respectively. It is further noted that the apparatus 100 in the present disclosure may be a single electronic device capable of computing, or may comprise a set of connected electronic devices or modules capable of computing with shared system memory. It is well-known in the art that such computing capabilities may be incorporated into many different devices, and therefore the term “device” may comprise a chip, chipset, computer (including an in-vehicle computer), server, navigation equipment, radar microcontroller (MCU), advanced driver assistance system (ADAS), autonomous vehicle, drone, mobile terminal, tablet, wearable device, game console, graphics processing unit, graphics card, and the like.
The present disclosure has been described in conjunction with various embodiments as examples as well as implementations. However, other variations can be understood and effected by those skilled in the art practicing the claimed subject matter, from studying the drawings, this disclosure and the independent claims. In the claims as well as in the description, the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. A single element or another unit may fulfill the functions of several entities or items recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used in an advantageous implementation.

Claims

1. A data processing apparatus (100) configured to: obtain a trained neural network comprising a plurality of neural network layers, wherein each neural network layer comprises a first set of discrete weights (111); for each neural network layer, determine a continuous parameter representation (131) for the first set of discrete weights (111) based on a linear combination of Riemann integrable functions (121, 122, 123); discretize the continuous parameter representation (131) to obtain a second set of discrete weights (141); and generate a layer output by processing a layer input based on the second set of discrete weights (141).
2. The data processing apparatus (100) according to claim 1, wherein for discretizing the continuous parameter representation (131), the data processing apparatus (100) is configured to: obtain a first discretization by applying a meshgrid operation to the continuous parameter representation (131) within [0, 1]^n, wherein n denotes a dimension of the continuous parameter representation (131); and adjust the first discretization according to a numerical integration method to obtain the second set of discrete weights (141).
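For illustration, a minimal NumPy sketch of this two-step discretization follows; it assumes a hypothetical continuous representation fw that accepts one coordinate array per dimension and that every requested size is at least 2, and the trapezoidal rule stands in for the otherwise unspecified numerical integration method:

```python
import numpy as np

def discretize(fw, theta, shape):
    """Meshgrid sampling of a continuous representation on [0, 1]^n,
    followed by a trapezoidal-rule adjustment of the samples."""
    # Meshgrid operation with uniform partitions of [0, 1] along each dimension.
    axes = [np.linspace(0.0, 1.0, s) for s in shape]
    grid = np.meshgrid(*axes, indexing="ij")
    first = fw(theta, *grid)                      # first discretization
    # Adjustment according to a numerical integration method: composite
    # trapezoidal weights h * [1/2, 1, ..., 1, 1/2] applied along each axis.
    for axis, s in enumerate(shape):
        w = np.full(s, 1.0 / (s - 1))
        w[0] *= 0.5
        w[-1] *= 0.5
        first = first * w.reshape([-1 if a == axis else 1 for a in range(len(shape))])
    return first                                  # second set of discrete weights
```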
3. The data processing apparatus (100) according to claim 1 or 2, wherein for discretizing the continuous parameter representation (131), the data processing apparatus (100) is configured to apply uniform partitions in each dimension of the continuous parameter representation (131).
4. The data processing apparatus (100) according to any one of claims 1 to 3, further configured to adapt a size of the second set of discrete weights (141) based on computational complexity of an inference phase of the trained neural network.
5. The data processing apparatus (100) according to any one of claims 1 to 4, further configured to: determine a further continuous parameter representation (131) for a set of discrete inputs based on the linear combination of the Riemann integrable functions (121, 122, 123); and perform a numerical integration based on the continuous parameter representation (131) and the further continuous parameter representation (131), to obtain as a result the layer output.
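One possible one-dimensional reading of this claim is sketched below; fw_weight and fw_input are hypothetical continuous representations of the weights and of the inputs, and the sample count is an arbitrary choice:

```python
import numpy as np

def layer_output(fw_weight, fw_input, theta_w, theta_x, n_samples=128):
    """Layer output as a numerical integral over [0, 1] of the product of
    the weight representation and the input representation (1-D case)."""
    t = np.linspace(0.0, 1.0, n_samples)                   # integration variable
    integrand = fw_weight(theta_w, t) * fw_input(theta_x, t)
    return np.trapz(integrand, t)                          # trapezoidal-rule integration
```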
6. The data processing apparatus (100) according to any one of claims 1 to 5, wherein the Riemann integrable functions (121, 122, 123) are based on wavelet functions.
7. The data processing apparatus (100) according to claim 6, wherein the continuous parameter representation (131) is:

F_W(θ, x) = Σ_{i=1..n} λ_i · ((2π)^k · det(σ_i))^(−1/2) · exp(−½ · (x − μ_i)^T · σ_i^(−1) · (x − μ_i)),

wherein F_W() denotes the continuous parameter representation (131), θ denotes a vector of parameters including μ_i, σ_i and λ_i, det() denotes a determinant operation, μ_i denotes a location parameter, σ_i denotes a diagonal positive definite matrix, λ_i denotes weights of the linear combination, k denotes the number of dimensions, n denotes the number of the Gaussian functions, and i, k are integers.
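For illustration, a minimal NumPy sketch of such a linear combination of Gaussian functions with diagonal σ_i, under the reconstruction of the formula given above; the packing of θ as three parallel arrays is an assumption:

```python
import numpy as np

def f_w(theta, x):
    """Linear combination of k-dimensional Gaussian functions.
    theta = (mus, sigmas, lams): locations mu_i, the diagonals of sigma_i,
    and the combination weights lambda_i, one entry per Gaussian."""
    mus, sigmas, lams = theta
    x = np.asarray(x, dtype=float)
    k = x.shape[-1]                                   # number of dimensions
    out = np.zeros(x.shape[:-1])
    for mu, sigma, lam in zip(mus, sigmas, lams):
        diff = x - mu
        # det(sigma_i) of a diagonal matrix is the product of its diagonal entries.
        norm = (2.0 * np.pi) ** (-k / 2) * np.prod(sigma) ** -0.5
        out = out + lam * norm * np.exp(-0.5 * np.sum(diff ** 2 / sigma, axis=-1))
    return out
```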
8. The data processing apparatus (100) according to claim 7, wherein the plurality of neural network layers comprises at least one 2-dimensional, 2D, convolutional layer, and for discretizing the continuous parameter representation (131) to obtain a second set of discrete weights (141), the data processing apparatus (100) is configured to perform:

W(i, j, k, l) = F_W(θ, i·h_out, j·h_in, k·h_w1, l·h_w2)

for each 2D convolutional layer, wherein W denotes the second set of discrete weights (141), i, j, k, l denote the four dimensions of the weights of the 2D convolutional layer, and h denotes a step along each dimension.
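A sketch of sampling the representation on the 4-D grid above, reusing the hypothetical f_w from the previous sketch; the dimension names c_out, c_in, k_h, k_w and the choice of step h = 1/size per dimension are assumptions:

```python
import numpy as np

def discretize_conv2d(f_w, theta, c_out, c_in, k_h, k_w):
    """Sample the continuous representation on a uniform 4-D grid to obtain
    2D-convolution weights of shape (c_out, c_in, k_h, k_w)."""
    h = (1.0 / c_out, 1.0 / c_in, 1.0 / k_h, 1.0 / k_w)   # step along each dimension
    i, j, k, l = np.meshgrid(np.arange(c_out), np.arange(c_in),
                             np.arange(k_h), np.arange(k_w), indexing="ij")
    points = np.stack([i * h[0], j * h[1], k * h[2], l * h[3]], axis=-1)
    return f_w(theta, points)                              # shape (c_out, c_in, k_h, k_w)
```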
9. The data processing apparatus (100) according to claim 7, wherein the plurality of neural network layers comprises at least one fully-connected layer, and for discretizing the continuous parameter representation (131) to obtain a second set of discrete weights (141), the data processing apparatus (100) is configured to perform:

W(i, j) = F_W(θ, i·h_out, j·h_in)

for each fully-connected layer, wherein W denotes the second set of discrete weights (141), i, j denote the two dimensions of the weights of the fully-connected layer, and h denotes a step along each dimension.
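The fully-connected case reduces to a 2-D grid; a corresponding sketch, again reusing the hypothetical f_w and assuming steps h_out = 1/n_out and h_in = 1/n_in:

```python
import numpy as np

def discretize_fc(f_w, theta, n_out, n_in):
    """Sample the continuous representation on a uniform 2-D grid to obtain
    fully-connected weights of shape (n_out, n_in)."""
    i, j = np.meshgrid(np.arange(n_out), np.arange(n_in), indexing="ij")
    points = np.stack([i / n_out, j / n_in], axis=-1)
    return f_w(theta, points)                              # shape (n_out, n_in)
```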
10. A data processing method (300) comprising: obtaining (301) a trained neural network comprising a plurality of neural network layers, wherein each neural network layer comprises a first set of discrete weights (111); for each neural network layer, determining (302) a continuous parameter representation (131) for the first set of discrete weights (111) based on a linear combination of Riemann integrable functions (121, 122, 123); discretizing (303) the continuous parameter representation (131) to obtain a second set of discrete weights (141); and generating (304) a layer output by processing a layer input based on the second set of discrete weights (141).
11. The data processing method according to claim 10, wherein the discretizing (303) the continuous parameter representation (131) comprises: obtaining a first discretization by applying a meshgrid operation to the continuous parameter representation (131) within [0, 1]^n, wherein n denotes a dimension of the continuous parameter representation (131); and adjusting the first discretization according to a numerical integration method to obtain the second set of discrete weights (141).
12. The data processing method according to claim 10 or 11, wherein the discretizing (303) the continuous parameter representation (131) comprises applying uniform partitions in each dimension of the continuous parameter representation (131).
13. The data processing method according to any one of claims 10 to 12, further comprising adapting a size of the second set of discrete weights (141) based on computational complexity of an inference phase of the trained neural network.
14. The data processing method according to any one of claims 10 to 13, further comprising: determining a further continuous parameter representation (131) for a set of discrete inputs based on the linear combination of the Riemann integrable functions (121, 122, 123); and performing a numerical integration based on the continuous parameter representation (131) and the further continuous parameter representation (131) to obtain as a result the layer output.
15. The data processing method according to any one of claims 10 to 14, wherein the Riemann integrable functions (121, 122, 123) are based on wavelet functions.
16. The data processing method according to claim 15, wherein the continuous parameter representation (131) is:

F_W(θ, x) = Σ_{i=1..n} λ_i · ((2π)^k · det(σ_i))^(−1/2) · exp(−½ · (x − μ_i)^T · σ_i^(−1) · (x − μ_i)),

wherein F_W() denotes the continuous parameter representation (131), θ denotes a vector of parameters including μ_i, σ_i and λ_i, det() denotes a determinant operation, μ_i denotes a location parameter, σ_i denotes a diagonal positive definite matrix, λ_i denotes weights of the linear combination, k denotes the number of dimensions, n denotes the number of the Gaussian functions, and i, k are integers.
17. The data processing method according to claim 16, wherein the plurality of neural network layers comprises at least one 2-dimensional, 2D, convolutional layer, and the discretizing the continuous parameter representation (131) to obtain a second set of discrete weights (141) comprises performing
W(i, j, k, l) = F_W(θ, i·h_out, j·h_in, k·h_w1, l·h_w2)

for each 2D convolutional layer, wherein W denotes the second set of discrete weights (141), i, j, k, l denote the four dimensions of the weights of the 2D convolutional layer, and h denotes a step along each dimension.
18. The data processing method according to claim 16, wherein the plurality of neural network layers comprises at least one fully-connected layer, and the discretizing the continuous parameter representation (131) to obtain a second set of discrete weights (141) comprises performing
W(i, j) = F_W(θ, i·h_out, j·h_in)

for each fully-connected layer, wherein W denotes the second set of discrete weights (141), i, j denote the two dimensions of the weights of the fully-connected layer, and h denotes a step along each dimension.
19. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to perform the method according to any one of claims 10 to 18.