EP4309079A1 - Transfer learning between neural networks - Google Patents

Transfer learning between neural networks

Info

Publication number
EP4309079A1
EP4309079A1 (application EP21716362.5A)
Authority
EP
European Patent Office
Prior art keywords
neural network
logical
source
computing device
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP21716362.5A
Other languages
English (en)
French (fr)
Inventor
Henrique Koji MIYAMOTO
Apostolos Destounis
Jean-Claude Belfiore
Ingmar LAND
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of EP4309079A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Definitions

  • the disclosure relates to a method, and more particularly to a method for transfer learning associated with neural networks. Furthermore, the disclosure relates to a corresponding computing device and a computer program.
  • In image recognition, for example, deep neural networks (DNNs) may be trained to identify images that contain cars by analysing example images that have been manually labelled as “car” or “no car” and using the results to identify cars in other images. DNNs do this without any prior knowledge about cars. Instead, they automatically generate identifying features from the learning material that they process.
  • Transfer learning may also be applied with DNNs.
  • In transfer learning, learning acquired in a first neural network can be transferred to a second neural network.
  • a computing device is configured to initialize a source neural network; train the source neural network with training data of the source neural network; perform a semantic analysis of the source neural network; extract logical behaviour data of the source neural network based on the semantic analysis; and cause a transmission of the logical behaviour data.
  • the solution may, for example, significantly reduce the amount of bits transferred from the source neural network to the target neural network.
  • the logical behaviour data comprises a logical table. The solution may enable, for example, an efficient data structure for the logical behaviour data.
  • the computing device is further configured to semantically analyse neurons of the source neural network; and store, based on the semantic analysis, logical propositions corresponding to outputs of at least some of the neurons into the logical table.
  • the solution may enable, for example, an efficient analysis of the neurons.
  • the computing device is further configured to encode the logical propositions into binary vectors; and cause a transmission of the binary vectors.
  • the solution may enable, for example, to optimize the amount of data needed to be transferred to the target neural network.
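  • For illustration only, the following sketch shows one way the binary vectors mentioned above might be packed into a compact payload before transmission; the function name and the bit-packing scheme are assumptions of this sketch, not features defined by the present disclosure.

```python
import numpy as np

def pack_table(binary_vectors):
    """Pack a list of length-M binary vectors into a compact byte payload."""
    flat = np.concatenate([np.asarray(v, dtype=np.uint8) for v in binary_vectors])
    return np.packbits(flat).tobytes()   # 8 proposition bits per transmitted byte

# Example: the propositions "A" and "B or C" over M = 3 classes.
payload = pack_table([[1, 0, 0], [0, 1, 1]])
print(len(payload), "byte(s)")
```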
  • a computing device is configured to receive logical behaviour data associated with a source neural network, the logical behaviour data being based on a semantic analysis of the source neural network; pre-train a target neural network with the logical behaviour data associated with the source neural network; and train the target neural network with training data of the target neural network.
  • the solution may enable, for example, fast learning speed and improved final accuracy of the target neural network.
  • the logical behaviour data comprises a logical table.
  • the solution may enable, for example, an efficient data structure for the logical behaviour data.
  • the logical table comprises logical propositions corresponding to outputs of at least some of the neurons of the source neural network, the logical propositions being based on a semantic analysis of the neurons of the source neural network.
  • the solution may enable, for example, an efficient analysis of the neurons.
  • the computing device is further configured to compute an inverse logical table based on the received logical table, the inverse logical table being indicative of the desired logical behaviour for each neuron in the target neural network, wherein the computing device is configured to pre-train the target neural network by using a cost function that takes into account the outputs of the neurons of the target neural network and penalises deviations from desired outputs indicated by the inverted logical table.
  • the solution may enable, for example, increased efficiency.
  • the computing device is further configured to pre-train the target neural network successively layer by layer. The solution may enable, for example, increased efficiency.
  • the logical table comprises the logical propositions encoded into binary vectors.
  • the solution may enable, for example, to optimize the amount of data needed to be transferred to the target neural network.
  • the computing device is configured to pre-train the target neural network with a limited set of the training data of the target neural network and a limited number of epochs.
  • the solution may enable, for example, increased efficiency.
  • a method comprises initializing a source neural network; training the source neural network with training data of the source neural network; performing a semantic analysis of the source neural network; extracting logical behaviour data of the source neural network based on the semantic analysis; and causing a transmission of the logical behaviour data.
  • the solution may, for example, significantly reduce the amount of bits transferred from the source neural network to the target neural network.
  • the logical behaviour data comprises a logical table.
  • the solution may enable, for example, an efficient data structure for the logical behaviour data.
  • the method further comprises semantically analysing neurons of the source neural network; and storing, based on the semantic analysis, logical propositions corresponding to outputs of at least some of the neurons into the logical table.
  • the solution may enable, for example, an efficient analysis of the neurons.
  • the method further comprises encoding the logical propositions into binary vectors; and causing a transmission of the binary vectors.
  • the solution may enable, for example, to optimize the amount of data needed to be transferred to the target neural network.
  • a method comprises receiving logical behaviour data associated with a source neural network, the logical behaviour data being based on a semantic analysis of the source neural network; pre-training a target neural network with the logical behaviour data associated with the source neural network; and training the target neural network with training data of the target neural network.
  • the solution may enable, for example, fast learning speed and improved final accuracy of the target neural network.
  • the logical behaviour data comprises a logical table. The solution may enable, for example, an efficient data structure for the logical behaviour data.
  • the logical table comprises logical propositions corresponding to outputs of at least some of the neurons of the source neural network, the logical propositions being based on a semantic analysis of the neurons of the source neural network.
  • the solution may enable, for example, an efficient analysis of the neurons.
  • the method further comprises computing an inverse logical table based on the received logical table, the inverse logical table being indicative of the desired logical behaviour for each neuron in the target neural network; and pre-training the target neural network by using a cost function that takes into account the outputs of the neurons of the target neural network and penalises deviations from desired outputs indicated by the inverted logical table.
  • the solution may enable, for example, increased efficiency.
  • the method further comprises pre-training the target neural network successively layer by layer.
  • the solution may enable, for example, increased efficiency.
  • the logical table comprises the logical propositions encoded into binary vectors.
  • the solution may enable, for example, to optimize the amount of data needed to be transferred to the target neural network.
  • the method further comprises pre-training the target neural network with a limited set of the training data of the target neural network and a limited number of epochs.
  • the solution may enable, for example, increased efficiency.
  • a computer program comprising program code configured to perform a method according to the third aspect when the computer program is executed on a computer.
  • a computer program comprising program code configured to perform a method according to the fourth aspect when the computer program is executed on a computer.
  • a telecommunication device comprising the computing device of the first aspect.
  • a telecommunication device comprising the computing device of the second aspect.
  • a computing device comprising means for initializing a source neural network; means for training the source neural network with source neural network training data; means for performing a semantic analysis of the source neural network; means for extracting logical behaviour data of the source neural network based on the semantic analysis; and means for causing transmission of the logical behaviour data.
  • the solution may, for example, significantly reduce the amount of bits transferred from the source neural network to the target neural network.
  • a computing device which comprises means for receiving logical behaviour data associated with a source neural network, the logical behaviour data being obtained based on a semantic analysis of the source neural network; means for pre-training a target neural network with the logical behaviour data associated with the source neural network; and means for training the target neural network with target neural network training data.
  • the solution may enable, for example, fast learning speed and improved final accuracy of the target neural network.
  • FIG. 1 illustrates a schematic representation of a computing device according to an embodiment
  • FIG. 2 illustrates a schematic representation of a computing device according to an embodiment
  • FIG. 3 illustrates a flow chart representation of a method according to an embodiment
  • FIG. 4 illustrates a flow chart representation of a method according to an embodiment
  • FIG. 5 illustrates a schematic representation of a deep neural network according to an embodiment
  • FIG. 6A illustrates the problem of identifying the shape of a signal according to an embodiment
  • FIG. 6B illustrates the problem of identifying the shape of a signal according to an embodiment
  • FIGS. 7A-7C illustrate an example result of an analysis of a neuron according to an embodiment
  • FIG. 8 illustrates an example of a logical table according to an embodiment
  • FIG. 9 illustrates a numerical example of transferred bits according to an embodiment
  • FIG. 10 illustrates a performance example of the semantic transfer according to an embodiment.
  • Fig. 1 illustrates a schematic representation of a computing device 100 according to an embodiment.
  • the computing device 100 is configured to initialize a source neural network.
  • the computing device 100 may be further configured to train the source neural network with training data of the source neural network.
  • the computing device 100 may be further configured to perform a semantic analysis of the source neural network.
  • the computing device 100 may be further configured to extract logical behaviour data of the source neural network based on the semantic analysis.
  • the computing device 100 may be further configured to cause a transmission of the logical behaviour data.
  • the computing device 100 may comprise a processor 102.
  • the computing device 100 may further comprise a memory 104.
  • the computing device 100 may be implemented as a system on a chip (SoC).
  • the processor 102, the memory 104, and/or other components of computing device 100 may be implemented using a field-programmable gate array (FPGA).
  • Components of the computing device 100, such as the processor 102 and the memory 104, may not be discrete components.
  • the components may correspond to different units of the SoC.
  • the processor 102 may comprise, for example, one or more of various processing devices, such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the memory 104 may be configured to store, for example, computer programs and the like.
  • the memory 104 may include one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices.
  • the memory 104 may be embodied as magnetic storage devices (such as hard disk drives, floppy disks, magnetic tapes, etc.), optical magnetic storage devices, and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).
  • the memory 104 may comprise program code for performing any functionality disclosed herein, and the processor 102 may be configured to perform the functionality according to the program code comprised in the memory 104.
  • if the computing device 100 is configured to implement some functionality, some component and/or components of the computing device 100, such as the one or more processors 102 and/or the memory 104, may be configured to implement this functionality.
  • this functionality may be implemented using program code comprised, for example, in the memory 104. For example, if the computing device 100 is configured to perform an operation, the one or more memories 104 and the computer program code can be configured to, with the one or more processors 102, cause the computing device 100 to perform that operation.
  • a telecommunication device comprises the computing device 100.
  • Fig. 2 illustrates a schematic representation of a computing device 200 according to an embodiment.
  • the computing device 200 is configured to receive logical behaviour data associated with a source neural network, the logical behaviour data being based on a semantic analysis of the source neural network.
  • the computing device 200 may be further configured to pre-train a target neural network with the logical behaviour data associated with the source neural network.
  • the computing device 200 may be further configured to train the target neural network with training data of the target neural network.
  • the computing device 200 may comprise a processor 202.
  • the computing device 200 may further comprise a memory 204.
  • the computing device 200 may be implemented as a system on a chip (SoC).
  • the processor 202, the memory 204, and/or other components of computing device 200 may be implemented using a field-programmable gate array (FPGA).
  • Components of the computing device 200 may not be discrete components.
  • the components may correspond to different units of the SoC.
  • the processor 202 may comprise, for example, one or more of various processing devices, such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the memory 204 may be configured to store, for example, computer programs and the like.
  • the memory 204 may include one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices.
  • the memory 204 may be embodied as magnetic storage devices (such as hard disk drives, floppy disks, magnetic tapes, etc.), optical magnetic storage devices, and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).
  • the memory 204 may comprise program code for performing any functionality disclosed herein, and the processor 202 may be configured to perform the functionality according to the program code comprised in the memory 204.
  • if the computing device 200 is configured to implement some functionality, some component and/or components of the computing device 200, such as the one or more processors 202 and/or the memory 204, may be configured to implement this functionality.
  • this functionality may be implemented using program code comprised, for example, in the memory 204. For example, if the computing device 200 is configured to perform an operation, the one or more memories 204 and the computer program code can be configured to, with the one or more processors 202, cause the computing device 200 to perform that operation.
  • a telecommunication device comprises the computing device 200.
  • Fig. 3 illustrates a flow chart representation of a method 300 according to an embodiment.
  • the method 300 comprises initializing 302 a source neural network.
  • the method 300 may further comprise training 304 the source neural network with training data of the source neural network.
  • the method 300 may further comprise performing 306 a semantic analysis of the source neural network.
  • the method 300 may further comprise extracting 308 logical behaviour data of the source neural network based on the semantic analysis.
  • the logical behaviour data may comprise, for example, a logical table.
  • the method 300 may further comprise semantically analysing neurons of the source neural network and storing, based on the semantic analysis, logical propositions corresponding to outputs of at least some of the neurons into the logical table.
  • the method 300 may further comprise causing 310 a transmission of the logical behaviour data.
  • a recipient of the logical behaviour data may use the logical behaviour data for training a target neural network.
  • the method 300 may be performed, for example, by the computing device 100.
  • At least some operations of the method 300 may be performed by a computer program product when executed on a computer.
  • Fig. 4 illustrates a flow chart representation of a method 400 according to an embodiment.
  • the method 400 comprises receiving 402 logical behaviour data associated with a source neural network, the logical behaviour data being based on a semantic analysis of the source neural network.
  • the logical behaviour data may comprise, for example, a logical table.
  • the logical table may comprise logical propositions corresponding to outputs of at least some of the neurons of the source neural network, the logical propositions being obtained based on a semantic analysis of the neurons of the source neural network.
  • the logical propositions may be encoded into binary vectors.
  • the method 400 may further comprise pre-training 404 a target neural network with the logical behaviour data associated with the source neural network.
  • the method 400 may further comprise computing an inverse logical table based on the received logical table, the inverse logical table being indicative of the desired logical behaviour for each neuron in the target neural network, and pre-training the target neural network by using a cost function that takes into account the outputs of the neurons of the target neural network and penalises deviations from desired outputs indicated by the inverted logical table.
  • the inverse logical table may describe the desired logical behaviour for some neurons in the target neural network if the logical table only comprises entries for some neurons and not all neurons in the source network.
  • the target neural network may be pre-trained successively layer by layer. Further, the target neural network may be pre-trained with a limited set of the training data of the target neural network and a limited number of epochs.
  • the method 400 may further comprise training 406 the target neural network with training data of the target neural network.
  • the method 400 may be performed, for example, by the computing device 200.
  • At least some operations of the method 400 may be performed by a computer program product when executed on a computer.
  • Fig. 5 illustrates a schematic representation of neural network usage according to an embodiment.
  • a deep neural network (DNN) 500 may be based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain.
  • artificial neurons are aggregated into layers 502, 504, 506, 508 where different layers may perform different kinds of transformations on their inputs.
  • the layer 502 may be called the input layer
  • the layers 504, 506 may be called hidden layers
  • the layer 508 may be called the output layer.
  • the connections between artificial neurons have weights that are adjusted as learning proceeds. The weight increases or decreases the strength of the signal at a connection.
  • DNNs are efficient tools to solve various classification tasks.
  • an image may be presented to the input layer 502, one value per input neuron.
  • Each neuron in the network 500 then may compute a function (a nonlinear function of an affine transformation) of the input values and may forward the result to the following layer.
  • the functions are parameterised, and these parameters are to be optimised.
  • the output layer 508 may consist of one neuron per class, and the classification result corresponds to the class with the largest output value.
  • DNNs are trained using a training data set, consisting of data labelled with their corresponding classes.
  • An optimisation algorithm, for example stochastic gradient descent, may be used to find the network parameters (weights and biases for the affine transformation of the outputs of a layer to be used in the activation function of the next layer) that minimise a cost function that assigns a large cost to wrongly classified data points and a small cost to correctly classified data points.
  • the actual classification accuracy of the DNN in terms of the percentage of correctly classified data points is determined using a second labelled data set, the validation data set.
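  • As a purely illustrative formulation (the disclosure does not fix a particular cost function or optimiser beyond mentioning stochastic gradient descent), one stochastic gradient descent update of the network parameters θ (weights and biases) on a mini-batch B with learning rate η may be written as:

```latex
\theta \;\leftarrow\; \theta \;-\; \eta \, \nabla_{\theta} \, \frac{1}{|\mathcal{B}|} \sum_{(x_i, y_i) \in \mathcal{B}} C\!\left(f_{\theta}(x_i),\, y_i\right)
```

  • Here f_θ(x_i) denotes the network output for data point x_i, and C is a cost that is large for wrongly classified data points and small for correctly classified ones; the symbols θ, η, B, f_θ and C are introduced only for this illustration.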
  • transfer learning may be used.
  • In transfer learning, learning acquired in one neural network (i.e. a source network) can be transferred to a second neural network (i.e. a target network).
  • Reasons for transfer learning may comprise, for example, that the source and target tasks are similar and learning of the target network is expensive (for example, in terms of time or computational cost), or that the target network has access to insufficient training data and requires initialisation provided by the trained source network.
  • Single-class classification tasks are considered, where each data point is associated with exactly one class, as well as fully connected DNNs, where every neuron is connected to all neurons of the previous layer.
  • The set of classes is denoted by Y = {y_1, y_2, ..., y_M}.
  • The training and validation data sets consist of pairs of data points x_i and labels y_i.
  • Each layer l has N_l neurons, and the network size is characterised symbolically as N_0 - N_1 - ... - N_{L-1} - M.
  • Every neuron in the hidden layers 504, 506 and the output layer 508 combines its input values into output values (activations).
  • this function may be, for example, a_{k,l} = f( Σ_i w_{k,i} · a_{i,l-1} + b_{k,l} ), where w_{k,i} are called the weights, b_{k,l} the biases, and f(·) a nonlinear activation function.
  • Weights and biases are referred to as the network parameters, and they are to be optimised during training.
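  • For illustration only, a minimal sketch of the per-layer computation described above is given below; the sigmoid activation, the random initialisation and the NumPy-based implementation are assumptions of this sketch rather than requirements of the disclosure.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward pass of a fully connected network.

    weights[l] has shape (N_l, N_{l-1}) and biases[l] has shape (N_l,),
    matching the per-neuron rule a_{k,l} = f(sum_i w_{k,i} a_{i,l-1} + b_{k,l}).
    """
    a = x
    activations = [a]
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # affine transformation followed by the nonlinearity
        activations.append(a)
    return activations           # activations[-1] are the output-layer activations

# Example: a 100-40-20-10-6 network with random parameters (illustration only).
rng = np.random.default_rng(0)
sizes = [100, 40, 20, 10, 6]
weights = [rng.normal(0, 0.1, (n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]
outputs = forward(rng.normal(size=100), weights, biases)
predicted_class = int(np.argmax(outputs[-1]))   # classification = largest output value
```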
  • the source problem and the target problem are single class tasks on the same set of classes y.
  • the target problem may be more difficult than the source problem. This may mean, for example, that the data at the source and the data at the target may be different in a way such that the classes in the target problem are harder to classify correctly than in the source problem.
  • It may be assumed that the source and the target DNN have identical structures and that training data is available for both the source and the target DNN. In other embodiments, the structures may differ from each other.
  • the goal is to train first the source network, then transfer data carrying the learning to the target network, and finally train the target network.
  • the approach comprises six steps.
  • the main challenge relates to the 4th step, i.e. the transfer of network data.
  • the source and target network may not be co-located and this data needs to be transmitted over a wireless link, where network resources and transmit power may be costly.
  • weights and biases are real values that require a high precision.
  • the amount of data to be transferred is proportional to the network size, particularly to the number of connections in the network.
  • the use of this approach has disadvantages.
  • First, the amount of data is very large. Assuming a network of size 100-40-20-10-6 (6 layers, input length 100, 4 hidden layers, 6 classes at output), to be used in a small example later on, and a precision of 16 bits for each weight value, the data to be transferred amounts to 80000 bits.
  • the functionality represented by the weights is specific to the source problem. Source and target data, however, may be slightly different, and it would be preferable to transfer data that is more related to the classes rather than the source data.
  • the data to be transferred to the target network may be compressed by transferring only parts of the source network, for example, by quantising the weights to few bits only, or by applying other methods for model compression.
  • the transfer problem still applies, and the order of magnitude of the data to be transmitted will remain the same.
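  • As a rough check of the figure quoted above for the conventional approach (counting only the weights of the connections, at 16 bits per value), the following sketch reproduces the order of magnitude:

```python
# Bit count for transferring all weights of a 100-40-20-10-6 network at 16 bits each.
sizes = [100, 40, 20, 10, 6]
connections = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
print(connections)        # 5060 connections (weights)
print(connections * 16)   # 80960 bits, i.e. roughly 80000 bits
```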
  • a semantic transfer learning solution is illustrated. After training the source network, a semantic analysis is performed and the logical behaviour data of the neurons is extracted. The logical behaviour data is then transferred from the source network to the target network. In the target network, a logical pre-training using the logical behaviour data is performed, followed by a regular training.
  • the logical behaviour data may carry semantic information about the classification in the source network, and it may provide information about the source network functionality to the target network.
  • the illustrated solution may be applied whenever two or more neural networks are exchanging information about the functionality they are implementing.
  • the solution may also be used in an iterative fashion between two or more networks to enable collaborative learning while exchanging only small amounts of information.
  • FIGs 6A and 6B illustrate the problem of identifying the shape of a signal according to an embodiment.
  • two single-class classification tasks, the source and the target task, and two fully connected DNNs, the source and the target network, are assumed.
  • the signals are a rectangle, a triangle, and a half-circle, where the shapes may be in different positions and may have different heights.
  • the source network is given a simpler problem, where the signals are wider (FIG. 6A), and the target network is given a harder problem, where the signals are narrower and thus the shapes are more difficult to distinguish (FIG. 6B).
  • the positions and heights of the shapes may differ within the data set.
  • the three different shapes of signals represent three classes of the classification task.
  • the source network is first initialized and trained using source training data. Any method applicable to achieving this can be used.
  • the source network may be fed with some or all of the training data again, and neuron outputs (i.e. activations) of at least some neurons are analysed. In this example, it is assumed that the neuron outputs of all neurons are analysed.
  • FIGS. 7A-7C illustrate an example result of an analysis of a neuron according to an embodiment. For each class, A, B, C, the frequency of each output value is depicted. In the following, an example is explained of how the output of the neuron can be associated with semantics by assigning logical propositions to the outputs.
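  • For illustration only, per-class activation statistics such as those depicted in FIGS. 7A-7C could be collected with a sketch along the following lines; the variable names and data layout are assumptions of this sketch.

```python
import numpy as np
from collections import defaultdict

def activations_by_class(neuron_activations, labels):
    """Group one neuron's recorded activations by the class label of the input.

    neuron_activations: one activation value per training data point.
    labels: the corresponding class labels, e.g. 'A', 'B', 'C'.
    Returns a dict mapping each class to an array of activations, from which
    per-class frequency plots can be produced.
    """
    per_class = defaultdict(list)
    for a, y in zip(neuron_activations, labels):
        per_class[y].append(a)
    return {y: np.asarray(v) for y, v in per_class.items()}
```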
  • the neuron output (activation) is denoted by a.
  • the indices l for the layer and k for the neuron within the layer are omitted. It is also assumed that the same analysis applies to all neurons in the network.
  • a quantised neuron output is denoted as â, where â = 0 if the activation a < 1/2 and â = 1 if a ≥ 1/2.
  • A denotes both the class and the corresponding proposition, and the meaning is clear from the context.
  • The basic propositions A, B and C may be combined using logical operations, particularly negation (¬), conjunction (∧), disjunction (∨), and implication (→), to form new logical propositions.
  • Other logical systems, like predicate logic, may be applied.
  • a tolerance value ε may be introduced, and the logical propositions are said to be ε-true if the propositions hold for a relative fraction (1 - ε) of the training data set. In the following, they can still be called “true” for simplicity.
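  • For illustration only, the assignment of (ε-true) logical propositions to the two halves of the quantised output could be sketched as follows; the exact decision rule, the treatment of classes with mixed behaviour and the default values are assumptions of this sketch.

```python
import numpy as np

def neuron_propositions(per_class_activations, epsilon=0.05, threshold=0.5):
    """Return the disjunction of classes associated with each quantised output.

    A class y is included in the proposition for a_hat = 1 if at least a
    fraction (1 - epsilon) of its data points yield an activation >= threshold,
    and in the proposition for a_hat = 0 in the symmetric case. Classes with
    mixed behaviour are left out of both disjunctions in this sketch.
    """
    low, high = [], []
    for y, acts in per_class_activations.items():
        frac_high = float(np.mean(np.asarray(acts) >= threshold))
        if frac_high >= 1.0 - epsilon:
            high.append(y)
        elif frac_high <= epsilon:
            low.append(y)
    return {0: low, 1: high}   # e.g. {0: ['A'], 1: ['B', 'C']}
```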
  • all neurons of the source network are semantically analysed.
  • the logical propositions corresponding to each neuron output may be written into a logical table.
  • the logical characterisation may be more general.
  • the network may learn logical propositions that are not labels of the data points but rather relations, like A → B.
  • a network may identify animals in images, and a neuron may come up with the proposition “if mouse then cat”, even though the image labels are only conjunctions.
  • FIG. 8 illustrates an example of a logical table according to an embodiment.
  • the logical table illustrated in FIG. 8 has five layers (layer 0, layer 1, layer 2, layer 3, layer 4).
  • The entries may also include T (i.e. A or B or C or D, which is always true) and F (always false).
  • the logical table may have N_tot rows (one row for each neuron) with two entries each, one corresponding to the proposition for the activation being less than 1/2 and one for the activation being greater than 1/2.
  • The considered propositions are disjunctions of basic propositions, like A, A ∨ C, A ∨ B ∨ C, etc.
  • these may be encoded in binary vectors of length M, where M is the number of classes: a zero indicates that a class is not present in the disjunction and a one indicates that the class is present.
  • In this example, there are M = 3 classes.
  • the proposition A may then be encoded as 100, and the proposition B ∨ C may be encoded as 011.
  • the size of this logical table is therefore 2·M·N_tot bits.
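  • For illustration only, the binary-vector encoding and the resulting table size could be sketched as below; the assumption that only hidden-layer and output-layer neurons contribute rows is made here because it matches the figure of roughly 900 bits quoted further below.

```python
def encode_proposition(classes_in_disjunction, class_order=("A", "B", "C")):
    """Encode a disjunction of basic propositions as a length-M binary vector."""
    return [1 if c in classes_in_disjunction else 0 for c in class_order]

print(encode_proposition(["A"]))        # A        -> [1, 0, 0]
print(encode_proposition(["B", "C"]))   # B or C   -> [0, 1, 1]

# Size of the logical table for the 100-40-20-10-6 example with M = 6 classes:
M = 6
n_tot = 40 + 20 + 10 + 6          # hidden and output neurons
print(2 * M * n_tot)              # 912 bits, i.e. roughly 900 bits
```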
  • the general logical table (or the encoded version of it) may then be transferred from the source network to the target network.
  • The target network thus receives a logical table, i.e. logical behaviour data, relating to the source network.
  • the logical table carries semantic information about the classification in the source network.
  • the logical table represents semantically how the source network understands the data.
  • the logical table has a very compact representation, and it provides information about the source network functionality to the target network. Taking an example network of size 100-40-20-10-6, only about 900 bits are required for the logical behaviour data. As compared to 80000 bits in the conventional approach, the decrease in the amount of data to be transmitted to the target network is significant.
  • a pre-training is configured to be performed at the target network side based on the obtained logical behaviour data associated with the source network.
  • an inverse logical table may be computed based on the received logical table, the inverse logical table describing the desired logical behaviour for each neuron in the target neural network. While the logical table associates propositions with the activation, the inverse logical table associates the desired activation (according to the learning of the source network) with the proposition, as given by the class of the data point.
  • Here, â corresponds to the quantised neuron output, and ā_{k,l}(y) denotes the desired target value.
  • All other neurons may be processed similarly, and the end result will be an inverted logical table, in which ā_{k,l}(y) is determined for every neuron k in every layer l.
  • the inverted logical table describes the desired logical behaviour for each neuron in the target network.
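  • For illustration only, the inversion of the logical table into per-class desired outputs could be sketched as follows; the data structures and the handling of classes that appear in neither proposition are assumptions of this sketch.

```python
def invert_logical_table(logical_table, classes):
    """Map each neuron and class to the desired quantised output.

    logical_table: dict neuron_id -> {0: [classes for a_hat = 0],
                                      1: [classes for a_hat = 1]}.
    Returns dict neuron_id -> {class: 0, 1 or None}, where None marks classes
    for which no desired behaviour was transferred.
    """
    inverse = {}
    for neuron, props in logical_table.items():
        inverse[neuron] = {
            y: 1 if y in props[1] else 0 if y in props[0] else None
            for y in classes
        }
    return inverse

# Example for one neuron whose propositions are {0: ['A'], 1: ['B', 'C']}:
print(invert_logical_table({("layer1", 0): {0: ["A"], 1: ["B", "C"]}}, ["A", "B", "C"]))
```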
  • a cost function may be employed in the pre-training.
  • the cost function may take into account the outputs (activations) a_{k,l} of all neurons and may penalise deviations from the desired outputs ā_{k,l}, according to the inverted logical table. This may be done, for example, by using the cross-entropy (CE) cost function per neuron and taking a sum over all neurons for a given training data pair (x, y).
  • the logical pre-training may be performed successively layer by layer, starting with the first hidden layer and ending with the output layer.
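  • For illustration only, the per-neuron cross-entropy cost described above could be sketched as below; the exact form of the cost, the clipping of activations and the skipping of unconstrained neurons are assumptions of this sketch rather than definitions given by the disclosure. In a layer-by-layer schedule, the sum would first run over the neurons of the first hidden layer only and then successively include the deeper layers.

```python
import numpy as np

def pretraining_cost(activations, desired, eps=1e-9):
    """Sum of per-neuron cross-entropy terms for one training pair (x, y).

    activations: dict (layer, neuron) -> activation a in (0, 1) for input x.
    desired: dict (layer, neuron) -> desired quantised output (0 or 1) taken
             from the inverse logical table for the class y of x; entries
             without a desired value are skipped.
    """
    cost = 0.0
    for key, a in activations.items():
        a_bar = desired.get(key)
        if a_bar is None:
            continue
        a = float(np.clip(a, eps, 1.0 - eps))
        cost += -(a_bar * np.log(a) + (1.0 - a_bar) * np.log(1.0 - a))
    return cost
```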
  • the size of training data and number of epochs may be design parameters.
  • the pre-training may be performed with a limited or small set of the target neural network training data and a limited or small number of epochs.
  • the target network weights after the pre-training represent the initialisation for the following conventional training of the target network.
  • There the target network is trained using the target network training data. Any applicable method may be applied in the final training phase.
  • FIG. 9 illustrates a numerical example of transferred bits according to an embodiment.
  • the (easier) source problem uses wider shapes, while the (harder) target problem uses more narrow shapes.
  • the source and target networks have the size of 100-40-20-10-6.
  • the training in source and target network use 10000 epochs, and the logical pre-training in the target network uses 400 epochs out of the 10000 epochs.
  • the example also employs the cross-entropy cost function and gradient descent.
  • FIG. 10 illustrates a performance example of the semantic transfer according to an embodiment.
  • FIG. 10 illustrates four performance results: regular training, conventional training, semantic training A and semantic training B.
  • For the conventional transfer learning, the target network is initialised with the weights of the trained source network. As compared to the regular training, the accuracy increases faster, however it converges slightly below the regular training.
  • FIG. 10 shows two versions of the semantic transfer learning.
  • the semantic transfer A is the approach that has been discussed above in the various example embodiments.
  • At first, the accuracy is low and unstable, as the pre-training is performed successively from the first hidden layer towards the output layer, such that the output layer is only considered in the cost function in the last few periods of the pre-training.
  • the accuracy then increases very fast and even outperforms both the regular training and the full-weight transfer.
  • the semantic transfer B uses a modified cost function, where the relative frequency of data points leading to the corresponding logical propositions is also taken into account.
  • the discussed semantic transfer learning is advantageous both in terms of the amount of bits to be transferred from the source network to the target network and in terms of learning speed and final accuracy.
  • the functionality described herein can be performed, at least in part, by one or more computer program product components such as software components.
  • the functionality described herein can be performed, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include Field-programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and Graphics Processing Units (GPUs).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)
EP21716362.5A 2021-03-31 2021-03-31 Transfer learning between neural networks Pending EP4309079A1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2021/058462 WO2022207097A1 (en) 2021-03-31 2021-03-31 Transfer learning between neural networks

Publications (1)

Publication Number Publication Date
EP4309079A1 true EP4309079A1 (de) 2024-01-24

Family

ID=75377793

Family Applications (1)

Application Number Title Priority Date Filing Date
EP21716362.5A Pending EP4309079A1 (de) 2021-03-31 2021-03-31 Transferlernen zwischen neuronalen netzen

Country Status (3)

Country Link
EP (1) EP4309079A1 (de)
CN (1) CN117121020A (de)
WO (1) WO2022207097A1 (de)

Also Published As

Publication number Publication date
WO2022207097A1 (en) 2022-10-06
CN117121020A (zh) 2023-11-24


Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20231017

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)