GB2572949A - Neural network - Google Patents

Neural network

Info

Publication number
GB2572949A
GB2572949A
Authority
GB
United Kingdom
Prior art keywords
neural network
output
layer
subnetwork
hidden
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB1805973.3A
Other versions
GB201805973D0 (en)
Inventor
Aytekin Caglar
Cricri Francesco
Lixin Fan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to GB1805973.3A priority Critical patent/GB2572949A/en
Publication of GB201805973D0 publication Critical patent/GB201805973D0/en
Publication of GB2572949A publication Critical patent/GB2572949A/en
Withdrawn legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/10 - Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06N 3/105 - Shells for specifying net layout

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An apparatus (e.g. a mobile phone or an Internet of Things (IoT) enabled device) receives layer weight parameters of a trained neural network (NN) and uses a subnetwork part, e.g. 32, of the NN. The subnetwork has intermediate hidden layers 24, 25 whose weights correspond to the weights of layers of the trained NN. The subnetwork further comprises an output layer (intermediate output 36). The intermediate output layer may be used as the output of the whole system in the event that the subsequent layers of the NN are missing. In this way a scalable neural network may be maintained even though some neural network data may be missing. The pre-training of the NN may take place on a server, and the parameters may be transmitted to the apparatus in a message sequence. Alternatively, the output layer may be trained on the apparatus from scratch, or it may be fine-tuned from a base layer on the server.

Description

Neural Network
Field
The present specification relates to neural networks, in particular to instantiating or providing instructions for instantiating a neural network.
Background
Neural networks are being utilised in a wide range of applications and on many different devices, such as mobile phones. Example applications of neural networks include image and video analysis and processing, social media data analysis and device usage data analysis. Although sending data to a centralised computation server for processing is appealing in terms of end-user computational complexity and battery power saving, other considerations, such as data privacy and the weaknesses of centralised computation, argue in favour of a more distributed computation scenario. To this end, improvements in the communication and updating of neural networks would be advantageous.
Summary
In a first aspect, this specification describes an apparatus comprising: means for receiving neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network; means for instantiating a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output. Each subnetwork may comprise n hidden layers. The apparatus may further comprise means for selecting a subnetwork output as an output of the neural network in the event that no later subnetworks in the neural network are instantiated.
The apparatus may further comprise means for receiving output layer parameters corresponding to weights of the or each output layer. Alternatively, the apparatus may further comprise means for training the or each output layer using a training dataset.
In some embodiments, the means for receiving neural network parameters may be configured to receive said neural network parameters of said hidden layers layer-by-layer. In some other embodiments, the means for receiving neural network parameters may be configured to receive said neural network parameters of said hidden layers subnetwork-by-subnetwork.
In at least some forms of the invention, the means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
In a second aspect, this specification describes an apparatus comprising: means for receiving or generating neural network parameters corresponding to weights of a plurality of hidden layers of a neural network; means for providing instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output. Each subnetwork may comprise n hidden layers.
The apparatus may further comprise: means for generating parameters corresponding to weights of the or each output layer, wherein the means for providing instructions for instantiating the neural network includes means for providing said parameters corresponding to the weights of the or each output layer. The means for generating parameters corresponding to weights of the or each output layer may comprise generating said parameters by using a training dataset.
In some embodiments, the means for providing instructions for instantiating a neural network may be configured to provide said neural network parameters of said hidden layers layer-by-layer. In some other embodiments, the means for providing instructions for instantiating a neural network may be configured to provide said neural network parameters of said hidden layers subnetwork-by-subnetwork.
In at least some forms of the invention, the means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
In a third aspect, this specification describes a method comprising: receiving neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network (said parameters may, for example, be received layer-by-layer or subnetwork-by-subnetwork); instantiating a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output. Each subnetwork may comprise n hidden layers. The method may further comprise selecting a subnetwork output as an output of the neural network in the event that no later subnetworks in the neural network are instantiated.
In a fourth aspect, this specification describes a method comprising: receiving or generating neural network parameters corresponding to weights of a plurality of hidden layers of a neural network; providing instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output. Each subnetwork may comprise n hidden layers. In some embodiments, the neural network parameters of said hidden layers may be provided layer-by-layer. In some other embodiments, the neural network parameters of said hidden layers may be provided subnetwork-by-subnetwork.
In a fifth aspect, this specification describes an apparatus configured to perform any method as described with reference to the third or fourth aspect.
In a sixth aspect, this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform any method as described with reference to the third or fourth aspect.
In a seventh aspect, this specification describes a computer program comprising instructions stored thereon for performing at least the following: receiving neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network; instantiating a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
In an eighth aspect, this specification describes a computer program comprising instructions stored thereon for performing at least the following: receiving or generating neural network parameters corresponding to weights of a plurality of hidden layers of a neural network; providing instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
In a ninth aspect, this specification describes a non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following: receiving neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network; instantiating a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
In a tenth aspect, this specification describes a non-transitory computer readable medium comprising program instructions stored thereon for performing at least the following: receiving or generating neural network parameters corresponding to weights of a plurality of hidden layers of a neural network; providing instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
In an eleventh aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network; instantiate a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
In a twelfth aspect, this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to: receive or generate neural network parameters corresponding to weights of a plurality of hidden layers of a neural network; provide instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising: one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
Brief description of the drawings
Example embodiments will now be described, by way of example only, with reference to the following schematic drawings, in which:
FIG. 1 is a block diagram of a neural network system in accordance with an example embodiment;
FIG. 2 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 3 is a block diagram of a neural network system in accordance with an example embodiment;
FIG. 4 is a block diagram of a system in accordance with an example embodiment;
FIG. 5 illustrates message formats in accordance with example embodiments of the invention;
FIG. 6 is a message format in accordance with an example embodiment;
FIG. 7 is a flow chart showing an algorithm in accordance with an example embodiment;
FIG. 8 is a block diagram of a system in accordance with an example embodiment; and
FIGS. 9a and 9b show tangible media, respectively a removable memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to embodiments.
Detailed description
A neural network is a computational graph including one or more layers of computation. One or more layers represent the input layers, and one or more layers represent the output layers. During the forward pass, the typical execution order of layers is from the input layers to the output layers. Feedforward neural networks are such that there is no feedback loop: each layer takes one or more inputs from one or more of the layers before and provides one or more outputs to one or more of the subsequent layers. Initial layers of a neural network (those close to the input data) may extract semantically low-level features, such as edges and textures in images. Intermediate and final layers may extract more semantically high-level features.
FIG. 1 is a block diagram of a neural network system, indicated generally by the reference numeral 1. The system 1 includes an input layer 2, one or more intermediate (or hidden) layers 4, and an output layer 6. The system includes inputs that are provided to the input layer 2 and outputs provided by the output layer 6. As shown in FIG. 1, each layer takes one or more inputs from the layer before and provides one or more outputs to the subsequent layer, such that the system 1 is a feedforward neural network. Of course, although the neural network 1 includes three neural network layers, the principles described herein are applicable to neural networks having any number of layers, taking inputs from one or more previous layers and providing outputs to one or more subsequent layers.
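By way of illustration only, such a feedforward pass can be sketched in a few lines of Python. The layer sizes, the ReLU activation and the random weights below are assumptions made for the sketch; they are not taken from the specification.

```python
import numpy as np

def relu(x):
    # A common choice of activation; the specification does not mandate one.
    return np.maximum(0.0, x)

class DenseLayer:
    """A fully-connected layer: forward(x) = relu(W @ x + b)."""
    def __init__(self, in_dim, out_dim, rng):
        self.W = rng.standard_normal((out_dim, in_dim)) * 0.1
        self.b = np.zeros(out_dim)

    def forward(self, x):
        return relu(self.W @ x + self.b)

rng = np.random.default_rng(0)
# Input layer 2 -> hidden layer 4 -> output layer 6, as in FIG. 1
# (dimensions are illustrative).
network = [DenseLayer(8, 16, rng), DenseLayer(16, 16, rng), DenseLayer(16, 4, rng)]

x = rng.standard_normal(8)
for layer in network:      # feedforward: each layer feeds only subsequent
    x = layer.forward(x)   # layers, with no feedback loop
print(x.shape)             # (4,)
```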
Because neural network parameters are sometimes distributed amongst different parties over error-prone channels, it may be advantageous to provide a scalable neural network in which performance can be maintained at a reasonable level, even though some neural network data (such as parameters) may be missing.
FIG. 2 is a flow chart showing an algorithm, indicated generally by the reference numeral 10, in accordance with an example embodiment.
The algorithm 10 starts at operation 12 where a neural network is trained in order to generate neural network parameters. The operation 12 may, for example, be used to generate weights for multiple hidden layers of a neural network (such as the hidden layers 4 of the neural network 1 described above).
In operation 14, the hidden layers of the neural network trained in the operation 12 are divided into a number of subnetworks. An output layer is trained for each subnetwork (as discussed in detail below).
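A minimal sketch of this partitioning step, assuming the trained hidden layers are held as an ordered list and each subnetwork takes a fixed number of consecutive layers (the grouping size n = 2 below is an illustrative assumption):

```python
def split_into_subnetworks(hidden_layers, n=2):
    """Group an ordered list of hidden layers into subnetworks of n
    consecutive layers each (the last group may be smaller)."""
    return [hidden_layers[i:i + n] for i in range(0, len(hidden_layers), n)]

# Six hidden layers (cf. layers 24-29 of FIG. 3) grouped pairwise
# into subnetworks corresponding to 32, 33 and 34.
print(split_into_subnetworks(["w24", "w25", "w26", "w27", "w28", "w29"]))
# [['w24', 'w25'], ['w26', 'w27'], ['w28', 'w29']]
```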
Finally, at operation 16, the neural network parameters and output layer parameters are transmitted. As described further below, the neural network and output layer parameters may be transmitted layer-by-layer or subnetwork-by-subnetwork.
FIG. 3 is a block diagram of a neural network system, indicated generally by the reference numeral 20, in accordance with an example embodiment. The neural network system 20 may be generated using the algorithm 10 described above.
The neural network system 20 comprises an input layer 22, a plurality of intermediate (or hidden) layers and an output layer 30 (similar to the input layer 2, intermediate layers 4 and output layer 6 respectively of the system 1 described above). By way of example, the neural network system 20 shows a first intermediate layer 24, a second intermediate layer 25, a third intermediate layer 26, a fourth intermediate layer 27, a fifth intermediate layer 28 and a sixth intermediate layer 29.
The first and second intermediate layers 24 and 25 are arranged into a first subnetwork 32. Similarly, the third and fourth intermediate layers 26 and 27 are arranged into a second subnetwork 33, and the fifth and sixth intermediate layers 28 and 29 are arranged into a third subnetwork 34. The last hidden layer 25 of the first subnetwork 32 may be configured to provide its output to the first hidden layer 26 of the second subnetwork 33. The last hidden layer 27 of the second subnetwork 33 may be configured to provide its output to the first hidden layer 28 of the third subnetwork 34. Note that the number of intermediate layers, the number of subnetworks and the number of intermediate layers in each subnetwork are shown in FIG. 3 by way of example only. As indicated by the dots between the second and third subnetworks, for example, a number of additional subnetworks could be provided. Moreover, any number of intermediate layers could be provided in each subnetwork (and the number of layers may differ between different subnetworks).
The neural network system 20 also comprises a first intermediate output layer 36 and a second intermediate output layer 38. The first intermediate output layer 36 receives an output of the first subnetwork 32 and the second intermediate output layer 38 receives an output of the second subnetwork 33. The output layer of each subnetwork is configured to receive the output of the last intermediate layer in the respective subnetwork and to produce a subnetwork output based on that intermediate layer output.
In operation 12 of the algorithm 10 described above, the neural network may be trained in order to generate weights for the multiple intermediate layers of the neural network (such as the intermediate layers 24 to 29 of the neural network system 20) and the output layer 30. End-to-end training of the network may be implemented according to known training algorithms.
In operation 14, the hidden layers of the neural network trained in the operation 12 are divided into a number of subnetworks (such as the subnetworks 32, 33 and 34). An output layer is trained for each subnetwork (such as the intermediate output layers 36 and 38 and the output layer 30). In a similar way to the output layer 30, the intermediate output layers 36 and 38 may be trained such that the output of each intermediate output layer could be used as the output of the overall system, in the event that the following layers of the neural network are missing. By way of example, the outputs of the intermediate output layers may simply not be used in the event that the following layers of the neural network are present and operational.
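Read this way, the intermediate output layers behave like early exits: the forward pass runs through whichever subnetworks have been instantiated, and the head of the last available subnetwork supplies the system output. A sketch, assuming subnetworks and output layers are held as parallel lists of callables (a representation chosen purely for illustration):

```python
def forward(x, subnetworks, output_heads):
    """subnetworks may be shorter than the full network if some parameters
    never arrived; output_heads[i] is the intermediate output layer of
    subnetwork i (the final head plays the role of output layer 30)."""
    for subnet in subnetworks:
        x = subnet(x)
    # Use the head of the last instantiated subnetwork as the output of
    # the whole system; later heads simply do not exist yet.
    return output_heads[len(subnetworks) - 1](x)

# Toy example: only two of three subnetworks were received.
subnets = [lambda v: 2 * v, lambda v: 2 * v]
heads = [lambda v: v + 1, lambda v: v + 2, lambda v: v + 3]
print(forward(1.0, subnets, heads))  # 6.0: the second head is used
```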
There are a number of different options for training the output layers.
In one embodiment, each output layer is trained from scratch. For example, for the output layer 36, the input layer 22 and the first and second intermediate layers 24 and 25 (i.e. the layers preceding the output layer 36) may be kept constant and the output layer 36 trained. The process can then be repeated for the output layer 38, where the input layer 22 and the first, second, third and fourth intermediate layers 24 to 27 are kept constant whilst the output layer 38 is trained.
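A sketch of this "train from scratch with frozen preceding layers" procedure in PyTorch; the layer dimensions, optimiser, loss and synthetic data are assumptions made for the example:

```python
import torch
import torch.nn as nn

# Trunk standing in for input layer 22 and intermediate layers 24 and 25.
trunk = nn.Sequential(nn.Linear(8, 16), nn.ReLU(),
                      nn.Linear(16, 16), nn.ReLU())
head = nn.Linear(16, 4)  # standing in for intermediate output layer 36

# Keep the preceding layers constant; only the output layer is trained.
for p in trunk.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(32, 8), torch.randint(0, 4, (32,))

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(head(trunk(x)), y)
    loss.backward()
    opt.step()
```

The same loop could then be repeated for the output layer 38, freezing layers 22 and 24 to 27 instead.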
In another embodiment, the output layer may be fine-tuned from a base output layer.
For example, an output layer may be initialised by one of the following example methods (other methods are possible; a sketch of the first option follows this list):
• Random initialisation, such as Xavier initialisation, or some other known random initialisation;
• Pre-training the weights of the respective output layer using a different dataset (i.e. not the dataset for which the network is to be trained);
• Pre-training the weights of the respective output layer based on a different but related task.
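For instance, the random-initialisation option might look as follows in PyTorch (the head dimensions are an assumption):

```python
import torch.nn as nn

head = nn.Linear(16, 4)               # a base output layer to be fine-tuned
nn.init.xavier_uniform_(head.weight)  # Xavier (Glorot) initialisation
nn.init.zeros_(head.bias)
```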
FIG. 4 is a block diagram of a system, indicated generally by the reference numeral 40, in accordance with an example embodiment. The system 40 includes a first entity 42 (such as a server) and a second entity 44 (such as a remote device). The second entity 44 may, for example, be a user device, a mobile phone or some other device, such as an Internet of Things enabled device. The algorithm 10 described above may, for example, be carried out by the first entity 42, with the operation 16 transmitting the neural network and output layer parameters from the first entity 42 to the second entity 44.
Thus, the first entity 42 may have computational, memory and power capabilities for performing training of a neural network (such as the neural network systems 1 and 20 described above). The first entity may train the neural network itself, or may otherwise obtain an updated neural network (e.g. from some other device). The first entity may have one or more base neural networks, i.e. neural networks which may be updated.
Similarly, the second entity 44 may have computational, memory and power capabilities for using a neural network, or may otherwise be interested in receiving a neural network trained by the first entity 42. The second entity 44 may have one or more base neural networks, i.e. neural networks which may be updated. One or more of these base neural networks may be the same as some of those present in the first entity 42. In particular, the common base neural networks on the two devices may have the same topology and/or architecture (e.g. number and type of layers, number of units per layer, etc.) and the same initial values of each of the learnable and non-learnable parameters (e.g. the weights of the various layers).
FIG. 5 is a message sequence, indicated generally by the reference numeral 50, in accordance with an example embodiment. The message sequence 50 may be used to transmit data in the operation 16 of the algorithm 10.
The message sequence 50 includes a first message 52 and a second message 54. The first message 52 has a format including parameters 55 of one of the intermediate layers of the neural network. The second message 54 has a format including parameters 56 of the ith intermediate layer of the neural network and output layer parameters 57 for the relevant subnetwork. An output layer, which uses the output layer parameters 57, may take as its inputs the outputs from the previous layer, which may use the layer i parameters 56. The message sequence 50 relates to a subnetwork of the neural network (such as the first and second intermediate layers 24 and 25 and the first intermediate output layer 36 of the network 20). Of course, if the subnetwork includes more than two intermediate layers, then the message sequence 50 would include more instances of the first message 52.
Thus, for example, in the neural network system 20, the parameters of the first intermediate layer 24 may be provided in a first instance of the message 52, the parameters of the second intermediate layer 25 and the first intermediate output layer 36 may be provided by a first instance of the message 54, the parameters of the third intermediate layer 26 may be provided by a second instance of the message 52, and the parameters of the fourth intermediate layer 27 and the second intermediate output layer 38 may be provided by a second instance of the message 54.
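The specification does not fix a wire format, but a minimal sketch of assembling such a layer-by-layer sequence, using plain dictionaries with assumed field names, might look like this:

```python
def layer_message(layer_id, weights):
    # Corresponds to the first message 52: one intermediate layer only.
    return {"type": "layer", "layer": layer_id, "weights": weights}

def layer_and_head_message(layer_id, weights, head_weights):
    # Corresponds to the second message 54: the last layer of a subnetwork
    # together with the parameters of its intermediate output layer.
    return {"type": "layer+output", "layer": layer_id,
            "weights": weights, "output_weights": head_weights}

# First subnetwork of FIG. 3: layers 24 and 25 with output layer 36.
sequence = [layer_message(24, "w24"),
            layer_and_head_message(25, "w25", "w36")]
```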
FIG. 6 is a message format, indicated generally by the reference numeral 60, in accordance with an example embodiment. The message format 60 may be used to transmit data in the operation 16 of the algorithm 10. The message format 60 includes parameters 62 of i intermediate layers (relating to some or all of the intermediate layers in the relevant subnetwork) and output layer parameters 64 for the relevant subnetwork. An output layer, which uses the output layer parameters 64, may take as its inputs the outputs from the previous layer, which uses the layer i parameters included in the parameters 62. The message format 60 relates to a subnetwork of the neural network (such as the first and second intermediate layers 24 and 25 and the first intermediate output layer 36 of the network 20). Of course, if the subnetwork includes more than two intermediate layers, then the parameters 62 would simply include parameters relating to more intermediate layers.
Thus, for example, in the neural network system 20, the parameters of the first and second intermediate layers 24 and 25, and the first intermediate output layer 36, may be provided by a first instance of the message 60. Similarly, the parameters of the third and fourth intermediate layers 26 and 27, and the second intermediate output layer 38, may be provided by a second instance of the message 60.
FIG. 7 is a flow chart showing an algorithm, indicated generally by the reference numeral 70, in accordance with an example embodiment. The algorithm 70 may, for example, be carried out at the second entity 44 of the system 40 (such as a remote device).
The algorithm 70 starts at operation 72 where neural network update parameters are received. For example, the operation 72 may receive neural network parameters for the intermediate layers 24 to 29 described above. As described above with reference to FIGS. 5 and 6, the message formats 50 and 60 may provide both the neural network parameters and the intermediate output layer parameters. Accordingly, the operation 72 may provide both the neural network parameters and the intermediate output layer parameters.
In the example algorithm 70, a neural network is instantiated in two steps. First, at operation 74, the neural network parameters are used to instantiate the hidden layers of the neural network (such as the intermediate layers 24 to 29 described above). Next, at operation 76, the intermediate output layers of the neural network are instantiated. Of course, the operations 74 and 76 are provided by way of example and could readily be merged (or implemented in a different order).
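A compact sketch of the two instantiation steps, assuming each received parameter set is a (W, b) pair for a fully-connected layer (the representation is an assumption for illustration):

```python
import numpy as np

def make_layer(W, b):
    # Build a callable layer from received weight parameters.
    return lambda x: np.maximum(0.0, W @ x + b)

def instantiate_network(hidden_params, head_params):
    hidden = [make_layer(W, b) for W, b in hidden_params]  # operation 74
    heads = [make_layer(W, b) for W, b in head_params]     # operation 76
    return hidden, heads
```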
With the hidden layers and the output layers instantiated, the neural network is ready to use. Indeed, the algorithm 70 includes the use of the neural network (operation 78).
The algorithm 70 may be implemented by providing the relevant weights of each layer of a neural network. This is not essential to all embodiments. For example, the second entity 44 may already have a base neural network for updating. The operation 72 of the algorithm 70 may include providing update information (e.g. the difference between a previous layer weight and a new layer weight). In this arrangement, the second entity 44 may modify the weights of the corresponding layers accordingly (e.g. using a summation operation).
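A minimal sketch of such an update by differences, assuming the base network and the update are matching lists of per-layer weight arrays:

```python
import numpy as np

def apply_update(base_weights, deltas):
    """Modify each base layer by summing the received difference onto it."""
    return [w + d for w, d in zip(base_weights, deltas)]

base = [np.ones((2, 2)), np.zeros((2, 2))]
deltas = [np.full((2, 2), 0.5), np.eye(2)]
updated = apply_update(base, deltas)  # per-layer summation operation
```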
Some of the examples described above relate to layer-by-layer updating of neural networks. This is not essential. For example, the invention is applicable to unit-by-unit updates of neural networks, where the units are any of a neural network layer, a neural network filter or a neural network node.
Moreover, the examples described above generally provide two intermediate layers per intermediate output layer. This is not essential. Any number of intermediate layers (including one) could be provided before each intermediate output layer.
For completeness, FIG. 8 is a schematic diagram of components of one or more of the modules described previously (e.g. the first entity 42 and/or the second entity 44), which are hereafter referred to generically as processing systems 300. A processing system 300 may have a processor 302, a memory 304 coupled to the processor and comprised of a RAM 314 and a ROM 312, and, optionally, user inputs 310 and a display 318. The processing system 300 may comprise one or more network interfaces 308 for connection to a network, e.g. a modem which may be wired or wireless.
The processor 302 is connected to each of the other components in order to control operation thereof.
The memory 304 may comprise a non-volatile memory, a hard disk drive (HDD) or a solid state drive (SSD). The ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316. The RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data. The operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 10 or 70.
The processor 302 may take any suitable form. For instance, it may be a microcontroller, plural microcontrollers, a processor, or plural processors. The processor 302 may comprise processor circuitry.
The processing system 300 may be a standalone computer, a server, a console, or a network thereof.
In some embodiments, the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device and may run partly or exclusively on the remote server device. These applications may be termed cloud-hosted applications. The processing system 300 may be in communication with the remote server device in order to utilise the software application stored there.
FIG. 9a and FIG. 9b show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to embodiments described above. The removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code. The memory 366 may be accessed by a computer system via a connector 367. The CD 368 may be a CD-ROM or a DVD or similar. Other forms of tangible storage media may be used.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. The software, application logic and/or hardware may reside on memory, or any computer media. In an example embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
Reference to, where relevant, “computer-readable storage medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), signal processing devices and other devices. References to computer program, instructions, code etc. should be understood to encompass software for a programmable processor, or firmware such as the programmable content of a hardware device, whether instructions for a processor or configuration settings for a fixed function device, gate array, programmable logic device, etc.
As used in this application, the term “circuitry” refers to all of the following: (a) hardware-only circuit implementations (such as implementations in only analogue and/or digital circuitry); (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a server, to perform various functions; and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even if the software or firmware is not physically present.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined. Similarly, it will also be appreciated that the flow diagrams of FIGS. 2 and 7 are examples only and that various operations depicted therein may be omitted, reordered and/or combined.
It will be appreciated that the above described example embodiments are purely illustrative and are not limiting on the scope of the invention. Other variations and modifications will be apparent to persons skilled in the art upon reading the present specification.
Moreover, the disclosure of the present application should be understood to include any novel features or any novel combination of features either explicitly or implicitly disclosed herein, or any generalisation thereof. During the prosecution of the present application, or of any application derived therefrom, new claims may be formulated to cover any such features and/or combination of such features.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes various examples, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.

Claims (15)

Claims:

  1. An apparatus comprising:
     means for receiving neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network;
     means for instantiating a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising:
     one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and
     an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
  2. An apparatus as claimed in claim 1, further comprising:
     means for receiving output layer parameters corresponding to weights of the or each output layer.
  3. An apparatus as claimed in claim 1, further comprising:
     means for training the or each output layer using a training dataset.
  4. An apparatus as claimed in any one of claims 1 to 3, further comprising:
     means for selecting a subnetwork output as an output of the neural network in the event that no later subnetworks in the neural network are instantiated.
  5. An apparatus as claimed in any one of the preceding claims, wherein said means for receiving neural network parameters is configured to receive said neural network parameters of said hidden layers layer-by-layer.
  6. An apparatus as claimed in any one of claims 1 to 4, wherein said means for receiving neural network parameters is configured to receive said neural network parameters of said hidden layers subnetwork-by-subnetwork.
  7. An apparatus comprising:
     means for receiving or generating neural network parameters corresponding to weights of a plurality of hidden layers of a neural network;
     means for providing instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising:
     one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and
     an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
  8. An apparatus as claimed in claim 7, further comprising:
     means for generating parameters corresponding to weights of the or each output layer, wherein the means for providing instructions for instantiating the neural network includes means for providing said parameters corresponding to the weights of the or each output layer.
  9. An apparatus as claimed in claim 8, wherein the means for generating parameters corresponding to weights of the or each output layer comprises generating said parameters by using a training dataset.
  10. An apparatus as claimed in any one of claims 7 to 9, wherein said means for providing instructions for instantiating a neural network is configured to provide said neural network parameters of said hidden layers layer-by-layer.
  11. An apparatus as claimed in any one of claims 7 to 9, wherein said means for providing instructions for instantiating a neural network is configured to provide said neural network parameters of said hidden layers subnetwork-by-subnetwork.
  12. An apparatus as claimed in any one of the preceding claims, wherein each subnetwork comprises n hidden layers.
  13. An apparatus as claimed in any one of the preceding claims, wherein the means comprise:
     at least one processor; and
     at least one memory including computer program code, the at least one memory and computer program code configured to, with the at least one processor, cause the performance of the apparatus.
  14. A method comprising:
     receiving neural network parameters corresponding to weights of a plurality of hidden layers of a trained neural network;
     instantiating a neural network, based on the received neural network parameters, wherein the instantiated neural network comprises one or more subnetworks, each subnetwork comprising:
     one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the trained neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and
     an output layer, wherein the output layer is configured to receive the hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
  15. A method comprising:
     receiving or generating neural network parameters corresponding to weights of a plurality of hidden layers of a neural network;
     providing instructions for instantiating a neural network, based on said neural network parameters, wherein the neural network comprises one or more subnetworks, each subnetwork comprising:
     one or more hidden layers, wherein weights of the one or more hidden layers correspond to the weights of a proper subset of the plurality of hidden layers of the neural network, and wherein the subnetwork is configured to produce a hidden layer output on receiving an input; and
     an output layer, wherein the output layer is configured to receive a hidden layer output produced by the subnetwork and produce a subnetwork output based on the hidden layer output.
GB1805973.3A 2018-04-11 2018-04-11 Neural network Withdrawn GB2572949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB1805973.3A GB2572949A (en) 2018-04-11 2018-04-11 Neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB1805973.3A GB2572949A (en) 2018-04-11 2018-04-11 Neural network

Publications (2)

Publication Number Publication Date
GB201805973D0 GB201805973D0 (en) 2018-05-23
GB2572949A true GB2572949A (en) 2019-10-23

Family

ID=62202868

Family Applications (1)

Application Number Title Priority Date Filing Date
GB1805973.3A Withdrawn GB2572949A (en) 2018-04-11 2018-04-11 Neural network

Country Status (1)

Country Link
GB (1) GB2572949A (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325311B (en) * 2018-12-14 2024-03-29 深圳云天励飞技术有限公司 Neural network model generation method for image recognition and related equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170076195A1 (en) * 2015-09-10 2017-03-16 Intel Corporation Distributed neural networks for scalable real-time analytics
US20180075347A1 (en) * 2016-09-15 2018-03-15 Microsoft Technology Licensing, Llc Efficient training of neural networks

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170076195A1 (en) * 2015-09-10 2017-03-16 Intel Corporation Distributed neural networks for scalable real-time analytics
US20180075347A1 (en) * 2016-09-15 2018-03-15 Microsoft Technology Licensing, Llc Efficient training of neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Leroux, S. et al., "The cascading neural network: building the Internet of Smart Things", Knowledge and Information Systems, 52(3) (2017), pp. 791-814 *

Also Published As

Publication number Publication date
GB201805973D0 (en) 2018-05-23

Similar Documents

Publication Publication Date Title
US20210256403A1 (en) Recommendation method and apparatus
US11948075B2 (en) Generating discrete latent representations of input data items
US9854032B2 (en) Context-aware task offloading among multiple devices
US11093827B2 (en) Variable ISA vector-based compaction in distributed training of neural networks
US11948086B2 (en) Accelerated embedding layer computations
WO2019185981A1 (en) Generating or obtaining an updated neural network
US11843587B2 (en) Systems and methods for tree-based model inference using multi-party computation
EP3767549A1 (en) Delivery of compressed neural networks
US10642802B2 (en) Identifying an entity associated with an online communication
CN105094924A (en) Import method and device of mirror image files
US20190087722A1 (en) Isa-based compression in distributed training of neural networks
US20170230304A1 (en) Context-aware task processing for multiple devices
GB2572949A (en) Neural network
US20210266383A1 (en) Conversion system, method and program
EP4285292A1 (en) Distributed machine learning with new labels using heterogeneous label distribution
US9529855B2 (en) Systems and methods for point of interest data ingestion
WO2023038940A1 (en) Systems and methods for tree-based model inference using multi-party computation
CN114118358A (en) Image processing method, image processing apparatus, electronic device, medium, and program product
US20180359309A1 (en) Shadow agent projection in multiple places to reduce agent movement over nodes in distributed agent-based simulation
US11521091B2 (en) Leveraging correlation across agents for enhanced distributed machine learning
US20240154942A1 (en) Systems and methods for blind multimodal learning
Pham et al. Peer-to-peer mobile data flow in a crop field
CN113597605A (en) Multi-level data lineage view
CN116975905A (en) Data processing method, device, computer equipment and readable storage medium
TW202326532A (en) Federated learning for training machine learning models

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)