WO2019234291A1

WO2019234291A1 - An apparatus, a method and a computer program for selecting a neural network

Info

Publication number: WO2019234291A1
Application number: PCT/FI2019/050393
Authority: WO
Inventors: Francesco Cricri; Caglar AYTEKIN
Original assignee: Nokia Technologies Oy
Priority date: 2018-06-08
Filing date: 2019-05-21
Publication date: 2019-12-12
Also published as: EP3803712A4; EP3803712A1

Abstract

There is provided an apparatus comprising means for receiving data to be processed by one of a plurality of main neural networks (210). The apparatus comprises means for providing the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices (220). The apparatus comprises means for receiving, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task (230). The apparatus comprises means for selecting, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data (240).

Description

An apparatus, a method and a computer program for selecting a neural network

Technical field

Various example embodiments relate to selecting a neural network from a plurality of neural networks.

Background

Neural networks are being utilized in an ever increasing number of applications for many different types of device, such as mobile phones. Examples of applications comprise image and video analysis and processing, social media data analysis, device usage data analysis, etc.

Various different neural networks may be available for various different tasks. It may be difficult to choose between different neural networks which one is an optimal neural network for performing a specific task when ground-truth data or an entity providing guidance are not available.

There is, therefore, a need for a solution that enables the selection of the optimal neural network for performing a task without the need for ground-truth data.

Summary

Now there has been invented a method and technical equipment implementing the method, by which the above problems are alleviated. Various aspects include an apparatus, a method, and a computer program product comprising a computer program stored therein, which are characterized by what is stated in the independent claims. Various example embodiments are disclosed in the dependent claims.

According to a first aspect, there is provided an apparatus comprising means for receiving data to be processed by one of a plurality of main neural networks; providing the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices; receiving, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task; and selecting, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data.

According to an embodiment, the apparatus further comprises means for requesting the selected main neural network to perform the main task on the data; and receiving an output of the main task.

According to an embodiment, the apparatus further comprises means for requesting one of the plurality of devices comprising the selected main neural network to provide the selected main neural network; receiving the selected main neural network; and performing the main task on the data using the selected main neural network.

According to an embodiment, the apparatus is a cell phone.

According to a second aspect, there is provided an apparatus comprising a main neural network and an auxiliary neural network comprising a subset of layers of the main neural network, further comprising means for: receiving data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network; training the auxiliary network for performing the auxiliary task; providing an indication of performance of the auxiliary neural network for performing the auxiliary task; and receiving, in response to providing the indication of performance, a request to perform a main task by a selected main neural network or to provide the selected main neural network to another device.

According to an embodiment, the auxiliary task is an unsupervised task or a self-supervised task. According to an embodiment, the signalling information further comprises at least one of an identifier of the main task; one or more parameters for the auxiliary neural networks; and one or more training parameters for the auxiliary neural networks.

According to an embodiment, the indication of performance comprises a convergence speed of the auxiliary neural network.

According to an embodiment, the auxiliary networks are trained using as initial values for parameters of the subset of layers values of the subset of layers of the main neural network.

According to an embodiment, a learning rate of the subset of layers is lower than a learning rate of other layers of the auxiliary neural network.

According to an embodiment, the data is image data or video data and the auxiliary task is an image denoising task, an image inpainting task, an image compression task, a single-image super-resolution task, a next frame prediction task and/or sound generation task from image data or video data.

According to an embodiment, the data is image data or video data and the main task is an image classification task, an image segmentation task, an image object detection task, an image or a video captioning task, a salient object detection task and/or a video object tracking task.

According to an embodiment, the means comprises at least one processor; at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the performance of the apparatus.

According to a third aspect, there is provided a method comprising receiving data to be processed by one of a plurality of main neural networks; providing the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices; receiving, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task; and selecting, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data.

According to a fourth aspect, there is provided a method comprising receiving data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network; training the auxiliary network for performing the auxiliary task; providing an indication of performance of the auxiliary neural network for performing the auxiliary task; and receiving, in response to providing the indication of performance, a request to perform a main task by a selected main neural network or to provide the selected main neural network to another device.

According to a fifth aspect, there is provided a computer program comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: receive data to be processed by one of a plurality of main neural networks; provide the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices; receive, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task; and select, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data

According to a sixth aspect, there is provided a computer program comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: receive data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network; train the auxiliary network for performing the auxiliary task; provide an indication of performance of the auxiliary neural network for performing the auxiliary task; and receive, in response to providing the indication of performance, a request to perform a main task by a selected main neural network or to provide the selected main neural network to another device.

Description of the Drawings

In the following, various example embodiments will be described in more detail with reference to the appended drawings, in which

Fig . 1 a shows, by way of example, a system and devices for selecting a neural network;

Fig. 1 b shows, by way of example, a block diagram of an apparatus

Fig. 2 shows, by way of example, a flowchart of a method for selecting a neural network;

Fig. 3a, 3b and 3c show, by way of examples, communication and signalling between a user device and other devices;

Fig. 4 shows, by way of example, a process of transfer learning and training of an auxiliary network.

Fig. 5 shows, by way of example, a flowchart of a method for selecting a neural network;

Drawings are schematic.

Description of Example Embodiments

A neural network (NN) is a computation graph comprising several layers of computation. Each layer may comprise one or more units, where each unit performs an elementary computation. A unit is connected to one or more other units, and the connection may have associated a weight. The weight may be used for scaling a signal passing through the associated connection. Weights are usually learnable parameters, i.e. , values which may be learned from training data. There may be other learnable parameters, such as those of batch-normalization layers.

It is noted that the terms“model ,“neural network ,“neural net and“network may be used interchangeably. The weights of neural networks may be referred to as learnable parameters or simply as parameters.

Two of the most widely used architectures for neural networks are feed forward and recurrent architectures. Feed-forward neural networks are such that there is no feedback loop: each layer takes input from one or more of the layers before and provides its output as the input for one or more of the subsequent layers. Also, units inside a certain layers take input from units in one or more of preceding layers, and provide output to one or more of following layers.

Initial layers, i.e. layers close to the input data, may extract semantically low- level features. In case of image data, these low-level features may be e.g. edges and textures in images. The intermediate and final layers may extract more high-level features. There may be one or more layers after the feature extraction layers performing a certain task, such as classification, semantic segmentation, object detection, denoising, style transfer, super-resolution, etc. In recurrent neural nets, there is a feedback loop, so that the network may become stateful, i.e., it may be able to memorize information or a state.

The neural networks and other machine learning tools are able to learn properties from input data. Learning may be e.g. supervised or unsupervised or semi-supervised. Such learning is a result of a training algorithm, or of a meta-level neural network providing the training signal. The training algorithm may comprise changing some properties of the neural network so that its output is as close as possible to a desired output. For example, in the case of classification of objects in images, the output of the neural network may be used to derive a class or category index which indicates the class or category that the object in the input image belongs to. Training may be carried out by minimizing or decreasing the output’s error, also referred to as the loss. Examples of losses are mean squared error, cross-entropy, etc. Training may be an iterative process, where at each iteration the algorithm modifies the weights of the neural network to make a gradual improvement of the network’s output, i.e. to gradually decrease the loss.

Training a neural network is an optimization process. The goal of the optimization or training process is to make the model learn the properties of the data distribution from a limited training dataset. In other words, the goal is to learn to use a limited training dataset in order to learn to generalize to previously unseen data, i.e. data which was not used for training the model. This may be referred to as generalization. The data may be split into at least two sets, the training set and the validation set. The training set is used for training the network, i.e. to modify its learnable parameters in order to minimize the loss. The validation set is used for checking the performance of the network on data which was not used to minimize the loss, as an indication of the final performance of the model.

The errors on the training set and on the validation set may be monitored during the training process. The network is learning if the training set error decreases. Otherwise the model is considered to be in the regime of underfitting. The network is learning to generalize if also the validation set error decreases and is not too much higher than the training set error. If the training set error is low, but the validation set error is much higher than the training set error, or it does not decrease, or it even increases, the model is considered to be in the regime of overfitting. This means that the model has just memorized the training set’s properties and performs well on that set, but may perform poorly on a set not used for tuning its parameters.

Various different neural networks may be available for various different tasks, such as classification, segmentation, future prediction etc. Different neural networks may be trained for the same task. However, each neural network may be trained on a specific and/or narrow data-domain, rather than on a wide domain. Here, the domain means the context and/or conditions in which the data was captured. For example, an image scene classification task involves classes such as“outdoor”,“cityscape”,“nature”,“kitchen”,“inside car”, etc. Each network may be trained to perform the scene classification task on data captured in one of the following lighting and/or weather conditions:“rainy” domain,“dark” domain,“sunny” domain,“foggy” domain, etc. However, the domain is not limited to lighting conditions, but it may be one of the scene classes described above, i.e. “outdoor”, “cityscape”, “nature”,“kitchen”,“inside-car”, etc. For example, for the task of future human action prediction, one network may be trained on data from the“outdoor” domain, another network on the“indoor” domain, and still another network on the“kitchen” domain, etc.

There may be several reason for having narrow domain neural nets for the same tasks. For example, neural nets may perform better if they are trained on a narrow domain, as it is a simpler problem to be solved. The training device performing the training of a certain network may be able of capturing data only or mainly from a specific domain. For example, in The British Isles, the most common weather condition domain may be“cloudy”, whereas in California the most common weather condition domain may be“sunny”. A neural network trained on a narrow domain may need less weights to perform well, as opposed to another network trained on a wide domain which may need much more weights to perform well on that wide domain. A network with less weights may occupy less memory and storage and thus be more suitable for memory-limited devices such as Internet of Things (loT) devices and mobile devices.

There may be situations, that a user device receives content and the content needs to be processed or analyzed by a neural net. However, the user device may have limited memory and/or computational capabilities. Thus, if the user device comprises a neural network, this neural network may have been trained for narrow domains and thus probably not suitable for performing a task of interest on data which may be from a different domain. The user device may be connected to other devices having neural networks. These other devices may also have limited memory and/or computational capabilities. Thus, it may be difficult to choose between the neural networks which one is the optimal neural network for performing a specific task on a certain input data.

An apparatus performing a method disclosed herein is able to select a neural network to efficiently execute or obtain an output from the most optimal neural network among a plurality of neural networks without availability of any ground-truth data nor the availability of an oracle providing indications or approximations of the ground-truth labels. Herein, oracle refers to an entity, such as a human or a neural network trained on a big dataset, which may provide ground-truth data or guidance about the performance of other neural networks. The most optimal or the best neural network is the network having the best performance on the domain of the data provided by the user device. The approach proposed in this invention may provide an approximation of the most optimal neural network, i.e. , it may select a network which is not the most optimal but which may be one of the most optimal neural networks.

Fig. 1 a shows, by way of example, a system and devices for selecting a neural network. A user device 110 may be e.g. a mobile device such as a cell phone, e.g. smartphone 125, or the user device may be a personal computer or a laptop 120. The user device may be able to capture content or receive content from another entity, e.g. a database. The content, i.e. data, needs to be processed and/or analyzed by a neural network. A device may be a user device if it has availability of the content. Thus, a server 1 15 may be considered as a user device, if it has availability of the data. The different devices may be connected to each other via a communication connection 100, e.g. via Internet, a mobile communication network, Wireless Local Area Network (WLAN), Bluetooth®, or other contemporary and future networks. Different networks may be connected to each other by means of a communication interface. The server 1 15 may be connected to and controlled by another device, e.g. another user device. Devices 130, 131 , 132 are devices having at least one neural network (NN). The NN devices 130, 131 , 132 may be connected among themselves. The NN devices have and are able to run at least one neural network. The at least one neural network may be trained on a narrow domain and each NN device may have at least one neural network which has been trained on a different domain than the network on another NN device. The server 1 15 may be one of the NN devices.

Fig. 1 b shows, by way of example, a block diagram of an apparatus. The apparatus may be the user device 1 10 and/or the NN device. The apparatus may comprise a user interface 102. The user interface may receive user input e.g. through a touch screen and/or a keypad. Alternatively, the user interface may receive user input from internet or a personal computer or a smartphone via a communication interface 108. The apparatus may comprise means such as circuitry and electronics for handling, receiving and transmitting data. The apparatus may comprise a memory 106 for storing data and computer program code which can be executed by a processor 104 to carry out various embodiment of the method as disclosed herein. The elements of the method may be implemented as a software component residing in the apparatus or distributed across several apparatuses. Processor 104 may include processor circuitry. The computer program code may be embodied on a non- transitory computer readable medium.

Fig. 2 shows a flowchart of a method 200 for selecting a neural network. The method 200 may be carried out e.g. by the user device 110. The method may comprise receiving 210 data to be processed by one of a plurality of main neural networks. The method may comprise providing 220 the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network. The auxiliary neural network may comprise a subset of layers of the main neural network. The signalling information may comprise an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices. The method may comprise receiving 230, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task. The method may comprise selecting 240, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data.

Fig. 3a, 3b and 3c show, by way of examples, communication and signalling between a user device and other devices. The user device 1 10 may be connected by a bi-directional channel to the NN devices 130, 131 , 132. The NN devices may also be connected among themselves. The user device 1 10 has availability of data. The data may be received e.g. from a memory of the user device, by capturing the data by the user device or received from another entity, such as a database. The data may be any type of data which is, or is pre-processed to be in suitable format for inputting the data to a neural network. For example, the data may be pre-processed to tensor form i.e. to a multidimensional array. The data may be e.g. image data captured by a camera or video data captured by a video camera. Other examples of data may be e.g. text data or audio data, such as audio or speech signal. A task, i.e. a main task, needs to be performed on the data, for any reason. The task may be e.g. an analysis task such as classification, e.g. classification of image data, or a processing task such as denoising of an image. In case of audio data, or audio data extracted from video data, the task may be a speech recognition task. The task is to be performed by a neural network.

The user device 1 10 may have one or more neural networks. However, these networks may be trained on a narrow domain or for a different task. If the narrow domains on which these neural networks are trained do not correspond to the domain of the data on which the main task needs to be performed, the networks are not optimal for performing the task. The user device may determine that the data is from a different domain, i.e. the probability that the data is from the same domain as the training data of the neural network of the user device is less than 1. The determination of whether the domains are different may be carried out such that the neural network of the user device performs the task, for example a classification task. If the outputs, for example classes, have a low probability, it may be determined that the domain of the training data of the neural network of the user device is different from the domain of the input data. Alternatively, another neural network may determine the domain of the input data which may be compared with the domain of the training data. If the domain of the input data is different from the domain of the training data of the neural network of the user device, the user device may initiate a process for identifying the best or optimal neural network which is able to perform the main task, i.e. a task of interest, on the domain to which the data belongs. However, the user device may initiate the identification of the best neural network even without first verifying that the domain of the input data is different from the domain of the neural network(s) in the user device.

The NN devices have one or more neural networks. Furthermore, the NN devices have sufficient memory and computational capabilities for running the neural networks. The NN devices also have capabilities for training the neural networks. The neural networks on different devices may have been trained on different data domains and for the same task. Alternatively, the neural networks may have been trained for different tasks, but for each task there may be different networks trained on different data domains. In general, the assumption is that at least a sub-set of all NN devices has one or more neural networks trained for the task of interest for the user device. There is not availability of ground-truth labels or the availability of an oracle providing indications or approximations of the ground-truth labels.

Fig. 3a shows, by way of example, the signalling from the user device to the NN devices. The user device 1 10 provides the data 310 to a plurality of the NN devices 130, 131 , 132. The NN devices comprise a main neural network and an auxiliary neural network. The auxiliary network comprises a subset of layers of the main neural network. In addition to the data, signalling information 320 associated with the data may be provided. The signalling information comprises an auxiliary task ID i.e. an AuxTaskID. The AuxTaskID may be an identifier of an auxiliary task to be performed on the data by the auxiliary neural network to be trained. The signalling information may further comprise an identifier of the main task, i.e. a Task ID. The main task is the task of interest for the user device 1 10. The Task ID may be understood by the NN devices. Examples of main tasks may be object detection, image classification, image segmentation, image enhancement, image captioning, etc. The signalling information may further comprise e.g. hyper-parameters for the architecture of the auxiliary network, training hyper-parameters for the auxiliary network and/or the number K of initial layers to transfer from the main network to the auxiliary network.

Alternatively, the user device may send the data, whereas the additional information may be either sent to the NN devices by a third-party entity or is negotiated among the NN devices themselves. Fig. 3c shows an example, wherein there is an intermediate device 350, e.g. a third-party entity, communicating between the user device 1 10 and the NN devices 130, 131 , 132.

The auxiliary task ID and/or the main task ID may comprise e.g. a script to be run by the NN device, the script executing a task on data. The auxiliary task may be e.g. an image denoising task, an image inpainting task, an image compression task, a single-image super-resolution task, a next frame prediction task and/or sound generation task from image data or video data. The main task may be e.g. an image classification task, an image segmentation task, an image object detection task, an image or a video captioning task, a salient object detection task and/or a video object tracking task. Referring back to Fig. 3b, the NN devices receive data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network. The signalling information may further comprise other information as described above.

The NN devices comprise one or more neural networks for performing the main task, referred to as main neural networks. At least some of these main neural networks may have been trained on different data domains. The NN devices may use the data to train an auxiliary network for performing the auxiliary task. The training may be carried out in unsupervised way or in self- supervised way. The auxiliary network may be derived from the main network. Thus, the auxiliary network comprises the already-trained initial layers of the main network and the new layers which are not yet trained. Using the initial layers is an example, which may be very common in practice, but the method presented herein is not limited to such case. The auxiliary network may re use any combination of layers, or a subset thereof, from the main network. The training may comprise using as initial values for parameters of the re used subset of layers values of the subset of layers of the main neural network. Features extracted by the initial layers of the main neural network are better features when transferred to another neural network if the domain of the input data and the domain of the data used to train the main neural network are similar. Therefore, the features extracted by the re-used or transferred layers perform well also for an auxiliary task, which may be an unsupervised or a self-supervised task. The transfer learning is described in more detail in the context of Fig. 4.

The Fig. 3b shows, by way of example, the signalling from the NN devices 130, 131 , 132 to the user device 1 10. Indications of performance 330 of the auxiliary neural networks are sent from the NN devices to the user device. Indications of performance indicate how well the auxiliary neural network performed the auxiliary task. Indications of performance need to be comparable among different NN devices such that the most optimal neural network may be chosen. According to an embodiment, a method for determining performance of the auxiliary network and/or a format of providing the indication of performance may be indicated in signalling information 320 sent from the user device 1 10 to the plurality of NN devices 130, 132, 132. The indications of performance may comprise a convergence speed. The convergence speed may be measured e.g. by the number of iterations needed so that the loss becomes less than a certain, pre-defined threshold. The threshold may be indicated in signalling information 320. In addition or alternatively, the performance of the auxiliary net may be described by loss values. Examples of losses are mean squared error, cross-entropy, etc. For example, a loss may be computed based on the input to the auxiliary neural network, i.e. the data received from the user device, and the output of the neural network. In addition or alternatively, the performance of the auxiliary neural network may be described by how much the auxiliary neural network modified the weights of the initial layers, i.e. re-used or transferred layers.

Alternatively, the received data may be divided into two parts. For example, if the data is an image, it may be divided into two halves. One part is used to train the auxiliary neural network and the other part is used as validation data. The loss computed on the validation data is the value which will be compared among different NN devices. This approach is more robust in cases where the auxiliary task comprises reconstructing the input data and where the input data is not corrupted by any noise or other modification.

Based on the indications of performance of the auxiliary neural networks, a decision may be made about which NN device has the best main network for the main task. The decision may be made based on the training session of the auxiliary networks. Thus, it is selected, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data. The selection may be made by comparing the indication(s) of performance received from the plurality of NN devices to at least one predetermined criterion, e.g., which auxiliary neural network reached the lowest loss, which auxiliary neural net converged faster, or which auxiliary neural net’s training modified the least the weights of the re-used or transferred K layers in case the re-used or transferred K layers were also trained, i.e. their weights or parameters were tuned. See Fig. 4 for the K layers.

According to an embodiment, the user device may select the first NN device or network that fulfils the at least one predetermined criterion. In other words, the selected neural network is not necessarily the best according to at least one predetermined criterion mentioned above. For example, user device may select the NN device 130 if it provides an indication of sufficient performance before receiving the requested indications of performance from the other NN devices 131 , 132. Sufficient performance may be defined by a predetermined threshold, e.g. a specific convergence speed.

Once one of the neural networks is selected, the user device may request the selected main neural network to perform the main task on the data and receive an output of the main task. Alternatively, the user device may request one of the plurality of devices comprising the selected main neural network to provide the selected main neural network. The user device may receive the selected main neural network and perform the main task on the data using the selected main neural network.

Thus, in response to providing the indication of performance of the auxiliary network for performing the auxiliary task, the NN device may receive a request to perform a main task by the selected main neural network or to provide the selected main neural network to another device, e.g. to the user device 1 10 or to an intermediate device 350. In response to receiving the request, the NN device may perform the main task and transmit the results to user device 1 10, or transmit its main neural network to user device 1 10 or another device identified in the request.

As shown in Fig. 3b, output of the main task performed by the main neural network, i.e. MainNet’s output, may also be provided to the user device together with the indications of performance, i.e. AuxNet’s performance. In this case, the user device may compare the training information of the auxiliary networks, make a selection of the best main neural network, and use the output of the desired neural network, for example the best performing main neural network. Alternatively, the output of the main task performed by the main neural network may also be provided to the user device together with the main neural network. In this case, the user device may compare the training information of the auxiliary networks, make a selection of the best main neural network, and use this best main neural network to obtain the desired output. Fig. 3c shows, by way of example, signalling between the user device and the NN device through an intermediate device 350. The intermediate device may perform e.g. receiving the data and main task ID from the user device 1 10. The intermediate device 350 may broadcast the data and main task ID to the NN devices 130, 131 , 132, with associated signalling information similar to signalling information 320. The intermediate device 350 may receive the information on performance of the trained auxiliary networks from the NN devices and make the decision about the model. The intermediate device 350 may request the NN device having the selected model to provide the main NN’s output to the user device. Alternatively, if the NN devices already sent the main task output to the intermediate device 350, then this entity will just forward the selected main task output to the user device. If the NN devices already sent the main networks to the intermediate device 350, then this entity will either run the best main network on the given input and send back to the user device the obtained output, or forward the best main network to the user device.

The Fig. 4 shows, by way of example, a process of transfer learning and training of an auxiliary network. The main network comprises the initial layers 430, which may extract low/mid-level features 435. The layers 450 are layers which are more specific to the main task. For example the layers 450 may be one or few layers which are needed for performing classification, such as one or more convolutional layer, one or more fully-connected layers, and a final softmax layer. In the simplest case, for classification, the layers 450 may comprise one fully-connected layer and a softmax layer. In the example of Fig. 4 the layers 450 include the first K layers, but it is to be noted that any subset of K layers may be used. The NN device may transfer the initial K layers 430 from the main network MainNet 410 to the auxiliary network AuxNet 420. The transfer may be carried out, for example, by copying the initial layers 430 to obtain re-used or transferred layers 440. Thus, the auxiliary network may comprise a subset of layers of the main neural network. The first K layers 440 may also be trained during training of the auxiliary network. Any modification to these layers done during auxiliary training will not affect the original copy of these layers in the main network. These re-used or transferred K layers may function as feature extraction layers. In addition to these K layers, the device will add new layers to complete the auxiliary network. The new layers 460 of the auxiliary network may be chosen based on the received additional information, i.e. the signalling information. The new layers 460 may be completely untrained, for example initialized with one of the common initialization methods, or may be pre-trained on another dataset. The architecture of the auxiliary net may be the same in different devices. Similarity of the architectures may be agreed e.g. by receiving the architecture information from the user device or from a third-party entity. Alternatively the similarity of the architectures may be negotiated and agreed by the NN devices with each other. The NN device may set the learning rate and other training hyper-parameters as instructed in the signalling information received from the user device or from the third-party entity or as agreed with other NN devices. The NN device may train the auxiliary neural network for the number of iterations or epochs specified in the additional information.

In transfer learning, layers of a neural network which were trained for a certain task or domain are re-used for a different task or domain. A neural network may be first trained on a certain task and domain, then the initial K layers of the trained neural net are used for building a new neural net, where these K layers form the initial layers of the new network and new layers are added. This way, the new network, i.e. auxiliary network 420, may comprise the pre trained re-used or transferred layers 440, followed by new layers 460. The new layers may be pre-trained or only initialized with one of the methods known for a skilled person. The new network is then trained on a new task and/or a new data domain. The new layers 460 are trained with a sufficiently high learning rate, e.g., 0.01 , whereas the pre-trained layers 440 may be left un-modified, or be fine-tuned for example using a smaller learning rate, e.g., 0.001 , than for the new layers. Learning rate above 0.007 may be considered high. Learning rate below 0.002 may be considered low. Learning rate is an update step size. Smaller learning rate indicates that the weights are updated by a smaller amount. A learning rate of the subset of layers 440, i.e. the re used or transferred layers from the main neural network, may be lower than a learning rate of other layers 460 of the auxiliary neural network. Thus, the new layers 460 may be trained more than the subset of layers 440 transferred, e.g. copied, from the main neural network 410. With lower learning rate of the subset of layers 440, the subset of layers may be preserved close to the initial K layers 430. This may ensure that the performance of the auxiliary network provides a good estimate of the performance of the main neural network. The initial layers may be considered to be good if they transfer well to the new training, i.e. , if the new network, after being trained, performs well on the new task and/or domain. Transfer learning may be applied on the initial layers and/or to any subset of layers.

The NN devices may transfer the initial layers of the main network to a new network, referred to as the auxiliary network, which is then trained to perform an auxiliary task. This auxiliary task may be an unsupervised task or a self- supervised task. This means that a neural network is trained in an unsupervised way or in a self-supervised way. Unsupervised training or self- supervised training 470 are training approaches where the input data to the neural network is obtained by modifying the data 480, and using the original version of the data as the ground-truth desired data to which the neural network’s output will be compared for computing the loss and the weight updates.

Modification of data may be a degradation operation, such as addition of noise, removal of portions of the data, etc., or other modifications. In these examples, the auxiliary neural network’s unsupervised task is to recover the original version of the data, i.e., to denoise or to fill-in the missing portions.

Another type of modification may comprise splitting the data in different parts. Splitting may be done for example spatially, e.g. by splitting an image into different crops or blocks. Alternatively, splitting may be done e.g. temporally, e.g. by splitting a video into past frames and future frames. Such modifications may be used for the unsupervised task of predicting one split from the other. For example, a neural network may be trained to get as input one image’s crop/patch/block, and to output a prediction of the neighbouring block. Then, the loss may be computed by computing the mean squared error or other suitable loss measure between the network’s predicted block and the corresponding real block. As another example, a network may be trained to get as input the past frames of a video and to output a prediction of the following frames. Then, the loss may be computed by computing the mean squared error or other suitable loss measure between the predicted future frames and the real future frames. According to an embodiment, the data is image data or video data. The auxiliary task may be an image denoising task. The auxiliary task may be an image inpainting task. The auxiliary task may be an image compression task. The auxiliary task may be a single-image super-resolution task. Super resolution task may be realized by a neural network which is trained to perform upsampling of the input image, or to improve the quality, e.g. mean squared error, of a previously upsampled image. The auxiliary task may be a next frame prediction task. The auxiliary task may be a sound generation task, if the received data comprises also an audio track. The auxiliary task may be any combination of these tasks.

According to an embodiment, the data is image data or video data. The main task may be image classification task. The main task may be image segmentation task. The main task may be image object detection task. The main task may be image or video captioning task. The main task may be salient object detection task. The main task may be video object tracking task. The main task may be any combination of these tasks.

Fig. 5 shows, by way of example, a flowchart of a method 500 for selecting a neural network. The method 500 may be carried out e.g. by devices 130, 131 , 132 having at least one neural network. The method may comprise receiving 510 data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network. The method may comprise training 520 the auxiliary network for performing the auxiliary task. The method may comprise providing 530 an indication of performance of the auxiliary neural network for performing the auxiliary task. The method may comprise receiving 540, in response to providing the indication of performance, a request to perform a main task by a selected main neural network or to provide the selected main neural network to another device. The another device may be e.g. the user device 1 10 or the intermediate device 350.

Referring back to the determination of whether the domain of the training data of the neural network of the user device is different from the domain of the input data, it may be carried out by the user device such that the user device builds an auxiliary neural network. In this case, the user device needs to have capabilities for training neural networks. The auxiliary neural network may be built as described above in the context of the NN device. Then, the user device may train the auxiliary neural network in self-supervised way. If the performance is not high enough, the domain of the neural network of the user device may be determined to be different from the domain of the input data.

As used in this application, the term“circuitry” may refer to one or more or all of the following: (a) hardware-only circuit implementations (such as implementations in only analog and/or digital circuitry) and (b) combinations of hardware circuits and software, such as (as applicable):

(i) a combination of analog and/or digital hardware circuit(s) with software/firmware and (ii) any portions of hardware processor(s) with software (including digital signal processor(s)), software, and memory(ies) that work together to cause an apparatus, such as a mobile phone or server, to perform various functions) and (c) hardware circuit(s) and or processor(s), such as a microprocessor(s) or a portion of a microprocessor(s), that requires software (e.g., firmware) for operation, but the software may not be present when it is not needed for operation.”

This definition of circuitry applies to all uses of this term in this application, including in any claims. As a further example, as used in this application, the term circuitry also covers an implementation of merely a hardware circuit or processor (or multiple processors) or portion of a hardware circuit or processor and its (or their) accompanying software and/or firmware. The term circuitry also covers, for example and if applicable to the particular claim element, a baseband integrated circuit or processor integrated circuit for a mobile device or a similar integrated circuit in server, a cellular network device, or other computing or network device.

A system may comprise a first apparatus and a plurality of second apparatuses. The first apparatus may be a user device 1 10. The second apparatus may be the NN device 130, 131 , 132. The plurality of second apparatuses each comprise a main neural network and an auxiliary neural network. The system may comprise means for receiving, by the first apparatus, data to be processed by one of a plurality of main neural networks. The system may comprise means for providing the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices. The system may comprise means for receiving, by the first apparatus from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task. The system may comprise means for selecting, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data.

The system may comprise means for receiving, by the plurality of second apparatuses, data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network. The system may comprise means for training, by the plurality of second apparatuses, the auxiliary network for performing the auxiliary task. The system may comprise means for providing, by the plurality of second apparatuses to the first apparatus, an indication of performance of the auxiliary neural network for performing the auxiliary task. The system may comprise means for receiving by the second apparatus comprising a selected main neural network, in response to providing the indication of performance, a request to perform a main task by the selected main neural network or to provide the selected main neural network to another device.

The system may comprise means for requesting, by the first apparatus, the selected main neural network to perform the main task on the data. The system may comprise means for receiving an output of the main task from the second apparatus comprising the selected main neural network.

The system may comprise means for requesting, by the first apparatus, the second apparatus comprising the selected main neural network to provide the selected main neural network. The system may comprise means for receiving, by the first apparatus, the selected main neural network. The system may comprise means for performing, by the first apparatus, the main task on the data using the selected main neural network.

Claims

Claims:

1. An apparatus comprising means for:

receiving data to be processed by one of a plurality of main neural networks;

providing the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices;

receiving, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task; and

selecting, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data.

2. The apparatus according to claim 1 , further comprising means for requesting the selected main neural network to perform the main task on the data; and

receiving an output of the main task.

3. The apparatus according to claim 1 , further comprising means for requesting one of the plurality of devices comprising the selected main neural network to provide the selected main neural network;

receiving the selected main neural network; and

performing the main task on the data using the selected main neural network.

4. The apparatus according to any of the claims 1 to 3, wherein the apparatus is a cell phone.

5. An apparatus comprising a main neural network and an auxiliary neural network comprising a subset of layers of the main neural network, further comprising means for:

receiving data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network; training the auxiliary network for performing the auxiliary task; providing an indication of performance of the auxiliary neural network for performing the auxiliary task; and

receiving, in response to providing the indication of performance, a request to perform a main task by a selected main neural network or to provide the selected main neural network to another device.

6. The apparatus according to any of the claims 1 to 5, wherein the auxiliary task is an unsupervised task or a self-supervised task.

7. The apparatus according to any of the claims 1 to 6, wherein the signalling information further comprises at least one of

an identifier of the main task;

one or more parameters for the auxiliary neural networks; and one or more training parameters for the auxiliary neural networks.

8. The apparatus according to any of the claims 1 to 7, wherein the indication of performance comprises a convergence speed of the auxiliary neural network.

9. The apparatus according to any of the claims 5 to 8, wherein the auxiliary networks are trained using as initial values for parameters of the subset of layers values of the subset of layers of the main neural network.

10. The apparatus according to any of the claims 5 to 9, wherein a learning rate of the subset of layers is lower than a learning rate of other layers of the auxiliary neural network.

1 1. The apparatus according to any of the claims 1 to 10, wherein the data is image data or video data and the auxiliary task is an image denoising task, an image inpainting task, an image compression task, a single-image super-resolution task, a next frame prediction task and/or sound generation task from image data or video data.

12. The apparatus according to any of the claims 1 to 1 1 , wherein the data is image data or video data and the main task is an image classification task, an image segmentation task, an image object detection task, an image or a video captioning task, a salient object detection task and/or a video object tracking task.

13. The apparatus of any preceding claim wherein the means comprises at least one processor; at least one memory including computer program code; the at least one memory and the computer program code configured to, with the at least one processor, cause the performance of the apparatus.

14. A method for selecting a neural network, the method comprising

receiving data to be processed by one of a plurality of main neural networks;

15. A method for selecting a neural network, the method comprising

receiving data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network;

training the auxiliary network for performing the auxiliary task;

providing an indication of performance of the auxiliary neural network for performing the auxiliary task; and

16. A computer program comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to: receive data to be processed by one of a plurality of main neural networks;

provide the data and signalling information associated with the data to a plurality of devices each comprising a main neural network and an auxiliary neural network, the auxiliary neural network comprising a subset of layers of the main neural network, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural networks at the plurality of devices;

receive, from the plurality of devices, indications of performance of the auxiliary neural networks for performing the auxiliary task; and

select, based on the indications of performance of the auxiliary neural networks, one of the plurality of main neural networks for performing a main task on the data 17. A computer program comprising computer program code configured to, when executed on at least one processor, cause an apparatus or a system to:

receive data and signalling information associated with the data, wherein the signalling information comprises an identifier of an auxiliary task to be performed on the data by the auxiliary neural network;

train the auxiliary network for performing the auxiliary task;

provide an indication of performance of the auxiliary neural network for performing the auxiliary task; and

receive, in response to providing the indication of performance, a request to perform a main task by a selected main neural network or to provide the selected main neural network to another device.