WO2019141905A1 - An apparatus, a method and a computer program for running a neural network - Google Patents

An apparatus, a method and a computer program for running a neural network

Info

Publication number
WO2019141905A1
Authority: WO (WIPO/PCT)
Prior art keywords: baseline model, application, model, information, parts
Application number: PCT/FI2019/050032
Other languages: French (fr)
Inventors: Caglar AYTEKIN, Lixin Fan, Francesco Cricri, Emre Aksu
Original Assignee: Nokia Technologies Oy
Application filed by Nokia Technologies Oy
Publication of WO2019141905A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44 Arrangements for executing specific programs
    • G06F 9/445 Program loading or initiating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W 4/70 Services for machine-to-machine communication [M2M] or machine type communication [MTC]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/01 Protocols
    • H04L 67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M 1/00 Substation equipment, e.g. for use by subscribers
    • H04M 1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality

Definitions

  • Online training consists of learning the parameters of the neural network while new data becomes available, rather than in a single offline training phase on a fixed dataset.
  • each layer takes input from the layer before and provides its output as the input for the subsequent layer.
  • Initial layers (those close to the input data) extract semantically low-level features such as edges and textures in images, and intermediate and final layers extract more high-level features.
  • the common procedure consists of taking a neural network pre-trained on a large dataset (such as ImageNet, a large public visual database designed for object recognition research), which is capable of extracting high-quality generic visual features, especially at lower semantic levels, such as edges and textures. From the pre-trained baseline model, a new model is derived for a specific downstream task.
  • the derivation is application-specific and may comprise one or more of the following options:
  • fine-tuning refers to training the layer(s) from an existing training state of the layer’s weights.
  • the effect of fine-tuning on the weights may depend on several criteria, such as the learning rate used for fine-tuning. This hyper-parameter controls how strongly the existing weights are modified: a higher value modifies the existing weights to a major extent, whereas a lower value modifies the layer only slightly, thus preserving most of the existing information.
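By way of illustration (not part of the application), deriving and fine-tuning such a baseline could look as follows in PyTorch; the ResNet-18 baseline, the 10-class head and the learning rate are assumptions made for this sketch:

```python
# Sketch: derive an application-specific model from a pre-trained baseline.
import torch
import torch.nn as nn
from torchvision import models

baseline = models.resnet18(pretrained=True)  # generic ImageNet-pretrained baseline

# Freeze the early layers: their generic features (edges, textures) are reused "as such".
for param in baseline.parameters():
    param.requires_grad = False

# Replace the task-specific head to derive a new model for the downstream task.
baseline.fc = nn.Linear(baseline.fc.in_features, 10)  # e.g. 10 task-specific classes

# A low learning rate fine-tunes gently and preserves most of the existing
# information; a higher value would modify the existing weights to a major extent.
optimizer = torch.optim.SGD(baseline.fc.parameters(), lr=1e-3, momentum=0.9)
```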
  • Neural networks are used more and more in various types of devices, from smartphones to self-driving cars.
  • One very important category of devices is represented by very small devices, such as various mobile devices and IoT devices.
  • Many IoT devices, such as the ones mentioned above, are typically very constrained in terms of memory, bandwidth and computation capacity.
  • many mobile devices and IoT devices are configured to run applications which could benefit from running the application, or parts of it, as NN-based algorithms.
  • An example of such an application is object recognition in a mobile or an IoT device which is capable of media acquisition, such as mobile or IoT devices provided with a camera.
  • For utilizing NN-based algorithms efficiently in a mobile or an IoT device, it is expected that such devices have several applications that utilize a derivation of a particular neural network baseline. Therefore, storing the multiple derivations (i.e. at least one derived model per application) is a waste of storage capacity. The more applications there are in the mobile device that use the same parts of the baseline, the more inefficient the storage usage becomes, since the same parts of the baseline would be stored several times. This may also drastically reduce the inference speed of the NN-based algorithms.
  • inference speed may become very low if multiple models need to be run on the same device at the same time; e.g., when taking a picture or video, the camera may simultaneously run a camera-parameter-tuning neural net, a person-detection neural net, a style-transfer neural net, etc.
  • a further problem may arise from the communication between the application provider and the application in a device regarding the network updates.
  • the application provider can fine-tune/update these derivations and can provide the application with the updated derivations of the neural network.
  • the size of an entire updated derivation of the neural network may be very large, and therefore efficient communication of the updated derivation between the application provider and the application on the device may become a problem.
  • the method comprises obtaining (400), in a memory of a first apparatus, a first baseline model of a neural net; receiving (402), from a second apparatus, information for modifying the first baseline model so as to be used in a first application of the first apparatus; retrieving (404) the first baseline model from the memory to said first application; and applying (406) modifications based on said information for obtaining a first modified model to be used by the first application.
  • the first apparatus comprises a plurality of applications and the method further comprises receiving, from the second or a third apparatus, information for modifying the first baseline model so as to be used at least in a second application of the first apparatus; retrieving the first baseline model from the memory at least to said second application; and applying modifications based on said information for obtaining a second modified model to be used by the second application.
  • the first apparatus modifying the baseline model so as to be used in a first application may be any apparatus comprising at least one application, where at least a part of the application may be run as an NN-based algorithm.
  • the first apparatus may be, for example, a mobile device or an IoT device.
  • the second apparatus sending the information for modifying the first baseline model to the first apparatus may also be referred to as “an application provider device”. It is noted that, for the applications of the first apparatus, there may be a plurality of second apparatuses (hence, the “third apparatus” above), i.e. application provider devices, wherein one application provider device may supply information for modifying the first baseline model to be used in one or more of the plurality of applications of the first apparatus.
  • the embodiments may be illustrated by the simplified block diagram shown in Figure 5, which shows an example of the communication between the applications and the memory unit of the first device, and further the communication between the application provider devices and the applications of the first device.
  • the first apparatus 500 comprises four applications 502, 504, 506, 508 (App1, App2, App3, App4) and a memory unit 510.
  • the memory unit is configured to store one or more baseline models of a neural net.
  • the application provider devices 512, 514, 516, 518 indicate to their corresponding applications 502, 504, 506, 508 information about the necessary baseline model and how to modify the indicated baseline model.
  • each application 502, 504, 506, 508 checks the availability of the indicated baseline model from the memory unit 510 and, if available, loads it from the memory unit to the application.
  • each application 502, 504, 506, 508 applies the necessary modifications, and runs the modified model on the application-specific input data and uses the result on its specific task.
  • the information for modifying the baseline model comprises one or more of the following: identification of the baseline model to use; identification of the parts of the baseline model to use as such; identification of the parts of the baseline model to be modified; identification of the parts of the baseline model to be discarded; identification of the parts to be added to the baseline model and instructions on how to add them. An illustrative encoding of such information is sketched below.
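As a concrete illustration, such modification information could be serialized as a small descriptor; the field names and file names below are hypothetical, not defined by the embodiments:

```python
# Hypothetical descriptor sent by an application provider to its application.
modification_info = {
    "baseline_id": "baseline_v1",                 # identification of the baseline model to use
    "use_as_such": ["Conv1", "relu1", "pool1"],   # parts of the baseline model to use as such
    "modify": {"Conv2": "conv2_finetuned.bin"},   # parts to be modified (e.g. fine-tuned weights)
    "discard": ["fc"],                            # parts to be discarded
    "add": [{"name": "Conv3",                     # parts to be added, together with
             "after": "pool2",                    # instructions on how to add them
             "weights": "conv3.bin"}],
}
```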
  • Figure 6 illustrates an example of an operation of an application upon modifying the baseline model into a modified model.
  • the application 600 may be, for example, one of the plurality of applications comprised by the apparatus.
  • a dedicated application provider 602 provides the application with the information for modifying a baseline model. Said information comprises at least the identification of the baseline model to use, and one or more instructions for modifying one or more parts of the identified baseline model.
  • the application 600 checks the availability of the indicated baseline model from a memory unit 604, or from a register maintaining a list of baseline models stored in the memory unit 604. If the indicated baseline model is available in the memory unit 604, the application 600 loads it to a model assembler unit 606.
  • the model assembler 606 modifies the loaded baseline model according to the one or more instructions for modifying one or more parts of the identified baseline model received from the application provider 602. After completing the modifications, a modified model 608 is created. The modified model 608 may then be used for analyzing input data for accomplishing a specific application task 610.
  • the method further comprises requesting, upon noticing that the indicated baseline model is not available in the memory of the first apparatus, the second apparatus to send the indicated baseline model to the first apparatus.
  • the second apparatus may then send the indicated baseline model to the first apparatus.
  • the new baseline model may be registered on the register maintaining a list of baseline models stored in the memory of the first apparatus such that applications may use the new baseline model or parts of it.
  • the request to send the indicated baseline model to the first apparatus may be sent to a third apparatus. It is possible that the second apparatus may have access only to the first baseline model, or at least to a limited set of baseline models excluding the indicated baseline model.
  • the request may be sent to a third apparatus having access to a larger set of baseline models, or at least to the indicated baseline model.
  • the request sent to the second apparatus is forwarded to the third apparatus, which then responds to the request and sends the indicated baseline model to the first apparatus.
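For illustration only, the availability check and the fallback to a further provider described above might be organized as follows; the registry and the request helpers are hypothetical placeholders, not an API defined by the application:

```python
# Sketch: resolve a baseline model, fetching it if it is not stored locally.
def resolve_baseline(model_id, registry, provider, fallback_provider=None):
    if model_id in registry:                       # already in the memory of the first apparatus
        return registry[model_id]
    model = provider.request_baseline(model_id)    # ask the second apparatus (application provider)
    if model is None and fallback_provider is not None:
        # the second apparatus lacks the model: the request is served by a third apparatus
        model = fallback_provider.request_baseline(model_id)
    registry[model_id] = model                     # register it so other applications can use it
    return model
```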
  • the method further comprises obtaining, in the memory of the first apparatus, a second baseline model of a neural net; receiving, from the second apparatus, information for modifying the second baseline model so as to be used in the first application of the first apparatus; retrieving the second baseline model from the memory to said first application; and applying modifications based on said information for obtaining a second modified model to be used by the first application.
  • FIG. 7 illustrates an example of an operation of a model assembler upon modifying the baseline model into a modified model.
  • the model assembler 700 loads the baseline model 702 indicated by the respective application provider device from the memory unit of the apparatus.
  • the baseline model 702 comprises the parts: the first convolutional layer (Conv1), the first rectified linear unit layer (relu), the first pooling layer (pool), the second convolutional layer (Conv2), the second rectified linear unit layer (relu), the second pooling layer (pool) and a fully-connected layer (fc).
  • the model assembler 700 also receives the information 704 for modifying the baseline model from the respective application provider device.
  • the information 704 for modifying the baseline model identifies the fully-connected layer (fc) as a part of the baseline model to be discarded.
  • the block 706 shows the modified baseline model where the fully-connected layer (fc) has been removed.
  • the information 704 for modifying the baseline model further identifies the second convolutional layer (Conv2) as a part of the baseline model to be modified by fine-tuning.
  • the block 708 shows the modified baseline model where the second convolutional layer (Conv2) has been fine-tuned.
  • the information 704 for modifying the baseline model further identifies a third convolutional layer (Conv3) as a part to be added to the baseline model, together with instructions on how to add the third convolutional layer to the model.
  • the block 710 shows the modified baseline model where the third convolutional layer (Conv3) has been added.
  • the modified baseline model as shown in block 710 may then be used as the modified model 712 for analyzing input data for the specific application task.
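A minimal PyTorch sketch of the Figure 7 sequence; the layer shapes, the assumed input size and the commented weight file are invented for illustration:

```python
import torch
import torch.nn as nn

# The baseline model 702: Conv1, relu, pool, Conv2, relu, pool, fc.
baseline = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # Conv1, relu, pool
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # Conv2, relu, pool
    nn.Flatten(), nn.Linear(32 * 8 * 8, 100),                     # fc (32x32 inputs assumed)
)

layers = list(baseline.children())
del layers[-2:]                  # block 706: discard the fully-connected layer (fc)
# block 708: fine-tune Conv2, e.g. by loading the fine-tuned weights indicated by
# the application provider: layers[3].load_state_dict(torch.load("conv2_ft.pt"))
layers.append(nn.Conv2d(32, 64, 3, padding=1))   # block 710: add Conv3
modified_model = nn.Sequential(*layers)          # the modified model 712
```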
  • the method further comprises identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the first apparatus, and providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing.
  • an example shown in Figure 8 illustrates this embodiment.
  • Such batch processing would substantially reduce the computation time, especially if the apparatus is provided with a high-capability GPU.
  • the apparatus may comprise a specific application for identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the first apparatus, and for providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing.
  • the application shown in Figure 8 may be referred to as a batch processing application 800 (BP app).
  • the plurality of applications 802 may provide the BP app with information about the baseline model types and the parts of the baseline models that the applications use as unmodified (i.e. “as is”).
  • a common model type detector 804 may gather information about the applications that use a common baseline model and send this information to a common parts detector 806 and/or to a batch processor 808.
  • the common parts detector 806 obtains information about the parts of the baseline models that the applications use as unmodified.
  • the common parts detector 806 may combine this information with the information about the applications that use a common baseline model received from the common model type detector 804, and send the combined information to the batch processor 808. Alternatively, the common parts detector 806 may send the information about the parts of the baseline models that the applications use as unmodified to the batch processor as such, and the batch processor may combine this information with the information about the applications that use a common baseline model received from the common model type detector 804.
  • the batch processor 808 may load the required baseline model from the memory 810 (if not loaded previously).
  • the batch processor 808 receives the input data for the unmodified parts of the baseline model from said plurality of applications, carries out the batch processing and distributes the outputs to each of said applications. The applications may then continue their individual computation.
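As a sketch of this idea, assuming a PyTorch runtime and a `shared_trunk` module standing in for the common unmodified layers:

```python
import torch

def run_shared_trunk(shared_trunk, app_inputs):
    """app_inputs: dict mapping application id -> input tensor of a common shape."""
    ids = list(app_inputs)
    batch = torch.stack([app_inputs[i] for i in ids])  # one batch over all applications
    with torch.no_grad():
        features = shared_trunk(batch)                 # a single forward pass on the GPU
    # distribute the outputs; each application continues with its own modified layers
    return {app_id: features[k] for k, app_id in enumerate(ids)}
```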
  • according to a further aspect, a second apparatus provides information to a first apparatus for modifying a first baseline model of the first apparatus so as to be used in a first application of the first apparatus.
  • the communication between the application provider and the application is necessary in order to provide the application with the information regarding which baseline model to use and how to use it.
  • the application provider may occasionally or periodically supply updates.
  • the updates may follow a continuous training made on the server side (such as the application provider), and the updated weights should be sent to the application in an efficient way. In such a situation, sending the entire baseline model to the application would be inefficient in terms of used time and bandwidth.
  • the method further comprises updating parts of the first baseline model that are not used as unmodified by the first apparatus; and sending updated weights of said parts of the first baseline model to the first apparatus in response to the number and/or size of the updated weights reaching a predetermined threshold.
  • the baseline model modification can be obtained by fine-tuning the baseline model, where the aim of the fine-tuning is to maximize the performance of the application task in question and, simultaneously, minimize the amount of bits for describing the NN model updates.
  • This can be implemented, for example, by thresholding the number and/or size of the accumulated weight updates during the training phase.
  • sparse coding of the weight updates may be applied during the training phase, as sketched below.
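A minimal sketch of such thresholded, sparsified weight updates on the provider side; the threshold values and the update format are invented for illustration:

```python
import numpy as np

def maybe_emit_update(old_weights, new_weights, magnitude_thr=1e-3, count_thr=1000):
    """Emit a sparse weight update once the accumulated change is large enough."""
    delta = new_weights - old_weights
    indices = np.flatnonzero(np.abs(delta) > magnitude_thr)  # keep significant changes only
    if indices.size < count_thr:
        return None                                # below threshold: keep accumulating
    # sparse coding: only indices and values of the changed weights are sent
    return {"indices": indices, "values": delta.flat[indices]}
```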
  • the method further comprises sending data to the first apparatus for controlling the first apparatus to retrieve a second baseline model from the memory to said first application; and applying modifications based on said data for obtaining a second modified model to be used by the first application.
  • the application provider may send data, and possibly also labels, to the first apparatus, so that it is the first apparatus which retrieves a second baseline model (signaled by the application provider) and fine-tunes it.
  • This may be required in case the first apparatus would obtain better performance by switching from an initial (first) baseline model provided by an application provider A to a better (second) baseline model provided, for example, by an application provider B.
  • the application provider A may not have access to the second baseline model, and therefore the first apparatus may carry out the fine-tuning of the second baseline model.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disk or floppy disks, and optical media such as for example DVD and the data variants thereof, CD.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the invention may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

Abstract

A method comprising: obtaining, in a first apparatus implementing a neural net, data to be analyzed by the neural net; receiving, from a second apparatus, a set of computational elements of the neural net needed for analysis; storing the set of computational elements of the neural net in a memory of the first apparatus; retrieving, upon carrying out calculations relating to said analysis, the set of computational elements of the neural net from the memory; and deleting, after carrying out calculations relating to said analysis requiring said set of computational elements of the neural net, the set of computational elements of the neural net from the memory of the first apparatus.

Description

AN APPARATUS, A METHOD AND A COMPUTER PROGRAM FOR RUNNING A NEURAL NETWORK
TECHNICAL FIELD
[0001] The present invention relates to an apparatus, a method and a computer program for running a neural network.
BACKGROUND
[0002] Recently, the development of various artificial neural network (NN) techniques, especially the ones related to deep learning, has made it possible to learn algorithms for several tasks from the raw data; such algorithms may outperform algorithms which have been developed for many years using non-learning-based methods.
[0003] Neural networks are used more and more in various types of devices, from smartphones to self-driving cars. Many mobile devices and IoT devices, while being very constrained in terms of memory, bandwidth and computation capacity, are configured to run applications which could benefit from running the application, or part of it, as NN-based algorithms.
[0004] For utilizing NN-based algorithms efficiently in a mobile or an IoT device, it may be anticipated that such devices have several applications that utilize a derivation of a particular neural network baseline. Therefore, storing the multiple derivations (i.e. at least one derived model per application) is a waste of storage capacity. The more applications there are in the mobile device that use the same parts of the baseline, the more inefficient the storage usage becomes, since the same parts of the baseline would be stored several times. This may also drastically reduce the inference speed of the NN-based algorithms.
SUMMARY
[0005] Now in order to at least alleviate the above problems, an enhanced method for running a neural network is introduced herein.
[0006] A method according to a first aspect comprises obtaining, in a memory of a first apparatus, a first baseline model of a neural net; receiving, from a second apparatus, information for modifying the first baseline model so as to be used in a first application of the first apparatus; retrieving the first baseline model from the memory to said first application; and applying modifications based on said information for obtaining a first modified model to be used by the first application.
[0007] According to an embodiment, the first apparatus comprises a plurality of applications and the method further comprises receiving, from the second or a third apparatus, information for modifying the first baseline model so as to be used at least in a second application of the first apparatus; retrieving the first baseline model from the memory at least to said second application; and applying modifications based on said information for obtaining a second modified model to be used by the second application.
[0008] According to an embodiment, for the first and any subsequent application, the information for modifying the baseline model comprises one or more of the following: identification of the baseline model to use;
identification of the parts of the baseline model to use as such;
identification of the parts of the baseline model to be modified;
identification of the parts of the baseline model to be discarded;
identification of the parts to be added to the baseline model and instructions on how to add them.
[0009] According to an embodiment, the method further comprises requesting, upon noticing that the indicated baseline model is not available in the memory of the first apparatus, the second apparatus to send the indicated baseline model to the first apparatus.
[0010] According to an embodiment, the method further comprises obtaining, in the memory of the first apparatus, a second baseline model of a neural net; receiving, from the second apparatus, information for modifying the second baseline model so as to be used in the first application of the first apparatus; retrieving the second baseline model from the memory to said first application; and applying modifications based on said information for obtaining a second modified model to be used by the first application.
[0011] According to an embodiment, the method further comprises identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the first apparatus, and providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing.
[0012] An apparatus according to a second aspect comprises means for obtaining, in a memory of a first apparatus, a first baseline model of a neural net; means for receiving, from a second apparatus, information for modifying the first baseline model so as to be used in a first application of the first apparatus; means for retrieving the first baseline model from the memory to said first application; and means for applying modifications based on said information for obtaining a first modified model to be used by the first application.
A third aspect relates to a method comprising providing, by a second apparatus, information to a first apparatus for modifying a first baseline model of the first apparatus so as to be used in a first application of the first apparatus.
[0013] An apparatus according to a fourth aspect comprises means for providing information to a remote apparatus for modifying a first baseline model of the remote apparatus so as to be used in a first application of the remote apparatus.
[0014] The apparatuses as described above are arranged to carry out the above methods and one or more of the embodiments related thereto.
BRIEF DESCRIPTION OF THE DRAWINGS
[0015] For better understanding of the present invention, reference will now be made by way of example to the accompanying drawings in which:
[0016] Figure 1 shows schematically an electronic device employing embodiments of the invention;
[0017] Figure 2 shows schematically a user equipment suitable for employing embodiments of the invention;
[0018] Figure 3 further shows schematically electronic devices employing embodiments of the invention connected using wireless and wired network connections;
[0019] Figure 4 shows a flow chart of a method for running a neural network according to an embodiment of the invention;
[0020] Figure 5 shows a simplified block diagram for communication between application providers and application of an apparatus according to an embodiment of the invention;
[0021] Figure 6 shows a simplified block diagram for an operation of an application upon modifying a baseline model according to an embodiment of the invention;
[0022] Figure 7 shows a simplified block diagram for an operation of a model assembler unit upon modifying a baseline model according to an embodiment of the invention; and
[0023] Figure 8 shows a simplified block diagram for batch processing input data from a plurality of applications according to an embodiment of the invention.
DETAILED DESCRIPTION OF SOME EXAMPLE EMBODIMENTS
[0024] The following describes in further detail suitable apparatus and possible mechanisms for running a neural network according to embodiments. In this regard reference is first made to Figures 1 and 2, where Figure 1 shows an example block diagram of an apparatus 50 suitable for implementing the embodiments.
[0025] The apparatus 50 may be a so-called IoT apparatus. The Internet of Things (IoT) may be defined, for example, as an interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. The convergence of various technologies has enabled, and will enable, many fields of embedded systems, such as wireless sensor networks, control systems, home/building automation, etc., to be included in the Internet of Things (IoT). In order to utilize the Internet, IoT apparatuses are provided with an IP address as a unique identifier. An IoT apparatus may be provided with a radio transmitter, such as a WLAN or Bluetooth transmitter, or an RFID tag. Alternatively, an IoT apparatus may have access to an IP-based network via a wired network, such as an Ethernet-based network or a power-line connection (PLC).
[0026] The apparatus may be configured to perform various functions, such as gathering information by one or more sensors, receiving or transmitting information, analyzing information gathered or received by the apparatus, or the like.
[0027] Figure 2 shows a layout of an apparatus according to an example embodiment. The elements of Figs. 1 and 2 will be explained next.
[0028] The apparatus 50 may for example be a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or other low-power device. However, it would be appreciated that embodiments of the invention may be implemented within any electronic device or apparatus which may process data by neural networks.
[0029] The apparatus 50 may comprise a housing 30 for incorporating and protecting the device. The apparatus 50 further may comprise a display 32 in the form of a liquid crystal display. In other embodiments of the invention the display may be any suitable display technology suitable to display an image or video. The apparatus 50 may further comprise a keypad 34. In other embodiments of the invention any suitable data or user interface mechanism may be employed. For example the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
[0030] The apparatus may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input. The apparatus 50 may further comprise an audio output device which in embodiments of the invention may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection. The apparatus 50 may also comprise a battery (or in other embodiments of the invention the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator). The apparatus may further comprise a camera capable of recording or capturing images and/or video. The apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices. In other embodiments the apparatus 50 may further comprise any suitable short range communication solution such as for example a Bluetooth wireless connection or a USB/firewire wired connection.
[0031] The apparatus 50 may comprise a controller 56, processor or processor circuitry for controlling the apparatus 50. The controller 56 may be connected to memory 58 which in embodiments of the invention may store both data in the form of image and audio data and/or may also store instructions for implementation on the controller 56. The controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and decoding of audio and/or video data or assisting in coding and decoding carried out by the controller.
[0032] The apparatus 50 may further comprise a card reader 48 and a smart card 46, for example a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
[0033] The apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals for example for communication with a cellular communications network, a wireless communications system or a wireless local area network. The apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and for receiving radio frequency signals from other apparatus(es).
[0034] The apparatus 50 may comprise a camera capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing. The apparatus may receive the video image data for processing from another device prior to transmission and/or storage. The apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding. The apparatus may further comprise a video coding system incorporating a codec. The structural elements of apparatus 50 described above represent examples of means for performing a corresponding function.
[0035] With respect to Figure 3, an example of a system within which embodiments of the present invention can be utilized is shown. The system 10 comprises multiple communication devices which can communicate through one or more networks. The system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA, 4G, 5G network etc.), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
[0036] The system 10 may include both wired and wireless communication devices and/or apparatus 50 suitable for implementing embodiments of the invention.
[0037] For example, the system shown in Figure 3 shows a mobile telephone network 11 and a representation of the internet 28. Connectivity to the internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
[0038] The example communication devices shown in the system 10 may include, but are not limited to, an electronic device or apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22. The apparatus 50 may be stationary or mobile when carried by an individual who is moving. The apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
[0039] The embodiments may also be implemented in a set-top box, i.e. a digital TV receiver, which may or may not have a display or wireless capabilities; in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data; in various operating systems; and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
[0040] Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
The base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the internet 28. The system may include additional communication devices and communication devices of various types.
[0041 ] The communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth, IEEE 802.11 and any similar wireless communication technology. A
communications device involved in implementing various embodiments of the present invention may communicate using various media including, but not limited to, radio, infrared, laser, cable connections, and any suitable connection.
[0042] In telecommunications and data networks, a channel may refer either to a physical channel or to a logical channel. A physical channel may refer to a physical transmission medium such as a wire, whereas a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels. A channel may be used for conveying an information signal, for example a bitstream, from one or several senders (or transmitters) to one or several receivers.
[0043] Recently, the development of various artificial neural network (NN) techniques, especially those related to deep learning, has made it possible to learn algorithms for several tasks directly from raw data; such learned algorithms may outperform algorithms that have been developed over many years using traditional (non-learning-based) methods.
[0044] Artificial neural networks, or simply neural networks, are parametric computation graphs consisting of units and connections. The units may be arranged in successive layers, and in some neural network architectures only units in adjacent layers are connected. Each connection has an associated parameter or weight, which defines the strength of the connection. The weight is multiplied by the incoming signal in that connection. In fully-connected layers of a feedforward neural network, each unit in a layer is connected to each unit in the following layer. Thus, the signal output by a certain unit is multiplied by the weights of the connections from that unit to the units in the following layer. Each of the latter units may then perform a simple operation such as a sum of the weighted signals.
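As an illustration of the weighted-sum computation described above, the following minimal NumPy sketch passes a signal through two fully-connected layers; all sizes and values are arbitrary assumptions for illustration only:

```python
import numpy as np

def dense_layer(x, W, b, activation=np.tanh):
    """One fully-connected layer: each unit forms a weighted sum of all
    incoming signals and applies a simple non-linearity."""
    return activation(W @ x + b)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                          # input signal
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # connection weights, layer 1
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)   # connection weights, layer 2

h = dense_layer(x, W1, b1)   # output of layer 1 feeds the following layer
y = dense_layer(h, W2, b2)   # 3-dimensional network output
```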
[0045] The weights of the connections represent the largest part of the learnable parameters of a neural network. Other learnable parameters may be, for example, the parameters of the batch-normalization layer.
[0046] The parameters are learned by means of a training algorithm, where the goal is to minimize the loss function on a training dataset. The training dataset is regarded as a representative sample of the whole data. One popular learning approach is based on iterative local methods, where the loss is minimized by following the negative gradient direction. Here, the gradient is understood to be the gradient of the loss with respect to the weights of the neural network. The loss is represented by the reconstructed prediction error. Computing the gradient on the whole dataset may be computationally too heavy, so learning is performed in sub-steps, where at each step a mini-batch of data is sampled and the gradients are computed from that mini-batch. This is referred to as stochastic gradient descent. The gradients are usually computed by the back-propagation algorithm, where errors are propagated from the output layer to the input layer by using the chain rule for differentiation. If the loss function or some components of the neural network are not differentiable, it is still possible to estimate the gradient of the loss by using policy gradient methods, such as those used in reinforcement learning. The computed gradients are then used by one of the available optimization routines (such as stochastic gradient descent, Adam, RMSprop, etc.) to compute a weight update, which is then applied to update the weights of the network. After a full pass over the training dataset, the process is repeated several times until a convergence criterion, usually a generalization criterion, is met. The gradients of the loss, i.e., the gradients of the reconstructed prediction error with respect to the weights of the neural network, may be referred to as the training signal.
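The training procedure described above may be illustrated by a minimal PyTorch sketch; the toy data and hyper-parameters below are arbitrary assumptions, chosen only to show mini-batch sampling, back-propagation and a weight update per step:

```python
import torch
from torch import nn

# Toy dataset standing in for the training data.
X, Y = torch.randn(256, 4), torch.randn(256, 1)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(10):                    # several full passes over the data
    for i in range(0, len(X), 32):         # sample mini-batches of 32
        xb, yb = X[i:i + 32], Y[i:i + 32]
        loss = loss_fn(model(xb), yb)      # prediction error on the mini-batch
        optimizer.zero_grad()
        loss.backward()                    # back-propagation: gradients of the
                                           # loss w.r.t. the weights (training signal)
        optimizer.step()                   # apply the weight update
```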
[0047] Online training consists of learning the parameters of the neural network
continuously on a stream of data, as opposed to the case where the dataset is finite and given in its entirety and the network is used only after training has completed.
[0048] Thus, in a feed-forward neural network there is no feedback loop: each layer takes input from the layer before and provides its output as the input for the subsequent layer. Initial layers (those close to the input data) extract semantically low-level features such as edges and textures in images, and intermediate and final layers extract more high-level features. After the feature extraction layers there may be one or more layers performing a certain task, such as classification, semantic segmentation, object detection, denoising, style transfer, super resolution, etc.
[0049] Many applications utilize a neural network derived from a baseline model.
The common procedure consists of taking a neural network pre-trained on a large dataset (such as ImageNet, a large public visual database designed for object recognition research), which is capable of extracting high-quality generic visual features, especially at lower semantic levels, such as edges and textures. From the pre-trained baseline model, a new model is derived for a specific down-stream task. The derivation is application-specific and may comprise one or more of the following options (a sketch of these options follows the list):
fine-tuning all layers;
freezing (i.e., not training) a part of the layers/weights and fine-tuning only the remaining layers/weights;
freezing or fine-tuning one or more layers, and adding one or more new layers and branches (i.e., sets of layers) which may have been pre-trained or trained from scratch.
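The following PyTorch sketch illustrates these derivation options, assuming a recent torchvision and a hypothetical 10-class down-stream task; the choice of which layers to freeze is an arbitrary assumption for illustration:

```python
import torch
from torch import nn
from torchvision import models

# Pre-trained baseline (ImageNet weights; assumes torchvision >= 0.13).
backbone = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the early, generic feature extraction layers; fine-tune the rest.
for name, p in backbone.named_parameters():
    p.requires_grad = not name.startswith(("conv1", "bn1", "layer1"))

# Replace the head with a new task-specific branch, trained from scratch.
backbone.fc = nn.Linear(backbone.fc.in_features, 10)

# A lower learning rate preserves more of the existing information
# (cf. the fine-tuning discussion below); only trainable parameters
# are handed to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in backbone.parameters() if p.requires_grad], lr=1e-3)
```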
[0050] Herein, fine-tuning refers to training the layer(s) starting from an existing training state of the layer’s weights. The effect or impact of fine-tuning on the weights may depend on several criteria, such as the learning rate of the fine-tuning, a hyper-parameter: a higher value modifies the existing weights to a major extent, whereas a lower value modifies the layer only slightly, thus largely preserving the existing information.
[0051] In the neural networks described above, a large amount of memory is typically needed for storing the weights of the neural network. Moreover, the number of operations to be performed is also very high.
[0052] Neural networks are used more and more in various types of devices, from smartphones to self-driving cars. One very important category of devices is represented by very small devices, such as various mobile devices and IoT devices. Many IoT devices, such as the ones mentioned above, are typically very constrained in terms of memory, bandwidth and computation capacity. On the other hand, many mobile devices and IoT devices are configured to run applications, which could benefit from running the application or part of it as NN-based algorithms. An example of such an application is object recognition in a mobile or an IoT device which is capable of media acquisition, such as mobile or IoT devices provided with a camera.
[0053] For utilizing NN-based algorithms efficiently in a mobile or an IoT device, it is expected that such devices have several applications that utilize a derivation of a particular neural network baseline. Therefore, storing the multiple derivations (i.e. at least one derived model per application) is a waste of storage capacity. The more applications there are in the mobile device that use the same parts of the baseline, the more inefficient the storage usage becomes, since the same parts of the baseline would be stored several times. This may also drastically reduce the inference speed of the NN-based algorithms. In fact, inference speed may become very low if multiple models need to be run on the same device at the same time; e.g., when taking a picture or video, the device may simultaneously run a camera-parameter-tuning neural net, a person-detection neural net, a style-transfer neural net, etc.
[0054] A further problem may arise from the communication between the application provider and the application in a device regarding the network updates. In the above case, where a neural network baseline is used with some modifications, the application provider can fine-tune/update these derivations and can provide the application with the updated derivations of the neural network. However, an entire updated derivation of the neural network may be very large, and therefore efficient communication of the updated derivations between the application provider and the application on the device may become a problem.

[0055] An enhanced method for sharing a baseline neural network among a plurality of applications within a device is now introduced.
[0056] The method, which is depicted in the flow chart of Figure 4, comprises obtaining (400), in a memory of a first apparatus, a first baseline model of a neural net; receiving (402), from a second apparatus, information for modifying the first baseline model so as to be used in a first application of the first apparatus; retrieving (404) the first baseline model from the memory to said first application; and applying (406) modifications based on said information for obtaining a first modified model to be used by the first application.
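Steps 400-406 may be sketched in Python as follows; ModelStore, apply_modifications and the model identifiers are hypothetical placeholders for illustration, not names defined by the method:

```python
class ModelStore:
    """Hypothetical stand-in for the memory unit of the first apparatus."""
    def __init__(self):
        self.models = {}
    def get(self, model_id):
        return self.models.get(model_id)   # None if not stored locally

def apply_modifications(baseline, info):
    """Step 406: derive a modified model; copying keeps the shared baseline intact."""
    model = dict(baseline)
    for part in info.get("discard", []):
        model.pop(part, None)
    model.update(info.get("add", {}))
    return model

store = ModelStore()
store.models["baseline-v1"] = {"conv1": "...", "conv2": "...", "fc": "..."}  # step 400

info = {"discard": ["fc"], "add": {"conv3": "..."}}  # step 402: from the provider
baseline = store.get("baseline-v1")                  # step 404: retrieve from memory
modified = apply_modifications(baseline, info)       # step 406: conv1, conv2, conv3
```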
[0057] According to an embodiment, the first apparatus comprises a plurality of applications and the method further comprises receiving, from the second or a third apparatus, information for modifying the first baseline model so as to be used at least in a second application of the first apparatus; retrieving the first baseline model from the memory at least to said second application; and applying modifications based on said information for obtaining a second modified model to be used by the second application.
[0058] Herein, the first apparatus modifying the baseline model so as to be used in a first application may be any apparatus comprising at least one application, where at least a part of the application may be run as an NN-based algorithm. The first apparatus may be, for example, a mobile device or an IoT device. The second apparatus sending the information for modifying the first baseline model to the first apparatus may also be referred to as “an application provider device”. It is noted that, for the applications of the first apparatus, there may be a plurality of second apparatuses (hence, the “third apparatus” above), i.e. application provider devices, wherein one application provider device may supply information for modifying the first baseline model to be used in one or more of the plurality of applications of the first apparatus.
[0059] The embodiments may be illustrated by a simplified block chart shown in Figure 5, which shows an example of the communication between the applications and the memory unit of the first device, as well as the communication between the application provider devices and the applications of the first device. In the example of Figure 5, the first apparatus 500 comprises four applications 502, 504, 506, 508 (App1, App2, App3, App4) and a memory unit 510. The memory unit is configured to store one or more baseline models of a neural net. In the example of Figure 5, for each application 502, 504, 506, 508 there is a dedicated application provider device 512, 514, 516, 518 (App1 Provider, App2 Provider, App3
Provider, App4 Provider).

[0060] The application provider devices 512, 514, 516, 518 indicate to their corresponding applications 502, 504, 506, 508 information about the necessary baseline model and how to modify it. On the basis of application-specific information about the necessary baseline model, each application 502, 504, 506, 508 requests the availability of the indicated baseline model from the memory unit 510 and, if available, loads it from the memory unit to the application. On the basis of application-specific information about how to modify the indicated baseline model, each application 502, 504, 506, 508 applies the necessary modifications, runs the modified model on the application-specific input data and uses the result in its specific task.
[0061] According to an embodiment, for each application, the information for modifying the baseline model comprises one or more of the following (a sketch of one possible message structure follows the list):
identification of the baseline model to use;
identification of the parts of the baseline model to use as such (i.e., without further training or fine-tuning);
identification of the parts of the baseline model to be modified (i.e., by further training or fine-tuning);
identification of the parts of the baseline model to be discarded;
identification of the parts to be added to the baseline model and instructions on how to add them.
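One possible way to represent this information as a message structure is sketched below; the field names and format are illustrative assumptions, as the embodiment does not fix a concrete serialization:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModificationInfo:
    baseline_id: str                                     # which baseline model to use
    use_as_is: List[str] = field(default_factory=list)   # parts used without training
    fine_tune: List[str] = field(default_factory=list)   # parts to be modified
    discard: List[str] = field(default_factory=list)     # parts to be dropped
    add: Dict[str, dict] = field(default_factory=dict)   # new parts and placement

info = ModificationInfo(
    baseline_id="baseline-v1",
    use_as_is=["conv1", "pool1"],
    fine_tune=["conv2"],
    discard=["fc"],
    add={"conv3": {"insert_after": "pool2"}},
)
```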
[0062] Figure 6 illustrates an example of an operation of an application upon modifying the baseline model into a modified model. The application 600 may be, for example, one of the plurality of applications comprised by the apparatus. A dedicated application provider 602 provides the application with the information for modifying a baseline model. Said information comprises at least the identification of the baseline model to use, and one or more instructions for modifying one or more parts of the identified baseline model. Based on the identification of the baseline model (i.e. the baseline type), the application 600 requests the availability of the indicated baseline model from a memory unit 604, or from a register maintaining a list of baseline models stored in the memory unit 604. If the indicated baseline model is available in the memory unit 604, the application 600 loads it to a model assembler unit 606.
[0063] The model assembler 606 modifies the loaded baseline model according to the one or more instructions for modifying one or more parts of the identified baseline model received from the application provider 602. After completing the modifications, a modified model 608 is created. The modified model 608 may then be used for analyzing input data for accomplishing a specific application task 610.
[0064] According to an embodiment, the method further comprises requesting, upon noticing that the indicated baseline model is not available in the memory of the first apparatus, the second apparatus to send the indicated baseline model to the first apparatus. Thus, if the indicated baseline model is not stored in the first apparatus, the respective application provider is notified about the situation, and the application provider may then send the indicated baseline model to the first apparatus. The new baseline model may be registered on the register maintaining a list of baseline models stored in the memory of the first apparatus such that applications may use the new baseline model or parts of it.
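This fallback flow may be sketched as follows, with the Provider class and the dictionary-based memory unit being hypothetical stand-ins rather than interfaces defined by the embodiment:

```python
class Provider:
    """Hypothetical stand-in for the application provider device."""
    def send_baseline(self, model_id):
        return {"conv1": "...", "conv2": "...", "fc": "..."}

def load_baseline(store, provider, model_id):
    """Load from local memory; on a miss, request the model and register it."""
    if model_id not in store:                               # not available locally
        store[model_id] = provider.send_baseline(model_id)  # request + register
    return store[model_id]

store = {}                                               # local memory unit
model = load_baseline(store, Provider(), "baseline-v2")  # triggers the request
```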
[0065] According to an embodiment, instead of the request made to the second apparatus, the request to send the indicated baseline model to the first apparatus may be sent to a third apparatus. It is possible that the second apparatus may have access only to the first baseline model, or at least to a limited set of baseline models excluding the indicated baseline model.
In such a case, the request may be sent to the third apparatus having access to a larger set of baseline models, or at least to the indicated baseline model. According to an embodiment, it is also possible that in such a case the request sent to the second apparatus is forwarded to the third apparatus, which then responds to the request and sends the indicated baseline model to the first apparatus.
[0066] According to an embodiment, the method further comprises obtaining, in the memory of the first apparatus, a second baseline model of a neural net; receiving, from the second apparatus, information for modifying the second baseline model so as to be used in the first application of the first apparatus; retrieving the second baseline model from the memory to said first application; and applying modifications based on said information for obtaining a second modified model to be used by the first application.
[0067] Thus, it may be possible that, after the addition of the new baseline model in the memory of the first apparatus, one or more of the present applications of the first apparatus will switch from using a modified model derived from a previously stored baseline model to a new modified model derived at least partly from the new baseline model. This may happen, for example, if the new baseline model uploaded by an application provider provides better visual feature extraction (e.g., a better feature extractor, either in terms of feature quality or in terms of computational efficiency, i.e. fewer parameters and thus fewer computational operations).

[0068] Figure 7 illustrates an example of an operation of a model assembler upon modifying the baseline model into a modified model. The model assembler 700 loads the baseline model 702 indicated by the respective application provider device from the memory unit of the apparatus. In this example, the baseline model 702 comprises the following parts: the first convolutional layer (Conv1), the first rectified linear unit layer (relu), the first pooling layer (pool), the second convolutional layer (Conv2), the second rectified linear unit layer (relu), the second pooling layer (pool) and a fully-connected layer (fc).
[0069] The model assembler 700 also receives the information 704 for modifying the baseline model from the respective application provider device. The information 704 for modifying the baseline model, in this example, identifies the fully-connected layer (fc) as a part of the baseline model to be discarded. The block 706 shows the modified baseline model where the fully-connected layer (fc) has been removed. The information 704 for modifying the baseline model further identifies the second convolutional layer (Conv2) as a part of the baseline model to be modified by fine-tuning. The block 708 shows the modified baseline model where the second convolutional layer (Conv2) has been fine-tuned. The information 704 for modifying the baseline model further identifies a third convolutional layer (Conv3) as a part to be added to the baseline model and instructions on how to add the third
convolutional layer to the baseline model. The block 710 shows the modified baseline model where the third convolutional layer (Conv3) has been added. The modified baseline model as shown in block 710 may then be used as the modified model 712 for analyzing input data for the specific application task.
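The modifications of Figure 7 may be sketched in PyTorch as follows; the layer sizes are arbitrary assumptions and the indexing reflects this particular sketch only:

```python
import torch
from torch import nn

# Baseline of Figure 7: Conv1 - relu - pool - Conv2 - relu - pool - fc.
baseline = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.LazyLinear(10),
)

# Block 706: discard the fully-connected part (Flatten + fc).
layers = list(baseline.children())[:-2]

# Block 710: add a third convolutional layer (Conv3) at the end.
modified = nn.Sequential(*layers, nn.Conv2d(32, 64, 3, padding=1))

# Block 708: Conv2 (index 3) is fine-tuned, Conv3 (last) is trained from
# scratch; the remaining baseline layers stay frozen.
for i, layer in enumerate(modified):
    for p in layer.parameters():
        p.requires_grad = i in (3, len(modified) - 1)

out = modified(torch.randn(1, 3, 32, 32))   # (1, 64, 8, 8) feature map
```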
[0070] It is possible, or even likely, that multiple applications will use unmodified parts of the same baseline model simultaneously. According to an embodiment, in such situation, the method further comprises identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the first apparatus, and providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing.
[0071] An example shown in Figure 8 illustrates this embodiment. Thus, it is possible to batch-process the inputs of a plurality of applications regarding the one or more unmodified parts of the baseline model that are shared by said plurality of applications. Such a batch processing would substantially reduce the computation time, especially if the apparatus is provided with a high capability GPU.
[0072] The apparatus may comprise a specific application for identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the first apparatus, and for providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing. This application, shown in Figure 8, may be referred to as a batch processing application 800 (BP app). The plurality of applications 802 may provide the BP app with information about the baseline model types and the parts of the baseline models that the applications use as unmodified (i.e. “as is”). A common model type detector 804 may gather information about the applications that use a common baseline model and send this information to a common parts detector 806 and/or to a batch processor 808. The common parts detector 806 obtains information about the parts of the baseline models that the applications use as unmodified. The common parts detector 806 may combine this information with the information about the applications that use a common baseline model received from the common model type detector 804, and send the combined information to the batch processor 808. Alternatively, the common parts detector 806 may send the information about the parts of the baseline models that the applications use as unmodified to the batch processor as such, and the batch processor may combine this information with the information about the applications that use a common baseline model received from the common model type detector 804.
[0073] For carrying out the batch processing, the batch processor 808 may load the required baseline model from the memory 810 (if not loaded previously). The batch processor 808 receives the input data for the unmodified parts of the baseline model from said plurality of applications, carries out the batch processing and distributes the outputs to each of said applications. The applications may then continue their individual computation.
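A minimal PyTorch sketch of such batch processing over a shared, unmodified part is given below; the shapes and the shared layers are arbitrary assumptions:

```python
import torch
from torch import nn

# Unmodified common part of the baseline, shared by several applications.
shared = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())

# Inputs gathered by the batch processor from three applications.
inputs = [torch.randn(1, 3, 32, 32) for _ in range(3)]

batch = torch.cat(inputs, dim=0)       # stack into a single batch
with torch.no_grad():
    features = shared(batch)           # one pass through the shared part

# Distribute the outputs back; each application continues with its own layers.
per_app = features.split(1, dim=0)
```

Processing one stacked batch rather than three separate forward passes is what yields the speed-up, particularly on a high-capability GPU.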
[0074] Further aspects of the invention relate to the operation of the second apparatus (“application provider device”). In accordance with what has been disclosed above, the operation of the second apparatus, in general, may be defined by a method comprising:
providing, by a second apparatus, information to a first apparatus for modifying a first baseline model of the first apparatus so as to be used in a first application of the first apparatus.
[0075] The communication between the application provider and the application is necessary in order to provide the application with the information regarding which baseline model to use and how to use it. For the parts of the baseline model that are not used as unmodified (i.e. fine-tuned layers or layers added for the application), the application provider may occasionally or periodically supply updates. The updates may follow a continuous training made on the server side (such as the application provider), and the updated weights should be sent to the application in an efficient way. In such situation, sending the entire baseline model to the application would be inefficient in terms of used time and bandwidth.
[0076] According to an embodiment, the method further comprises updating parts of the first baseline model that are not used as unmodified by the first apparatus; and sending updated weights of said parts of the first baseline model to the first apparatus in response to the number and/or size of the updated weights reaching a predetermined threshold.
[0077] The baseline model modification can be obtained by fine-tuning the baseline model, wherein the aim of the fine-tuning is to maximize the performance of the application task in question and, simultaneously, minimize the amount of bits for describing the NN model updates. This can be implemented, for example, by thresholding the number and/or size of the accumulated weight updates during the training phase. According to another embodiment, sparse coding of the weight updates may be applied during the training phase.
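One way to combine the thresholding and sparse coding of weight updates is sketched below in NumPy; the threshold values and the (indices, values) update format are illustrative assumptions:

```python
import numpy as np

def pending_update(old_w, new_w, zero_threshold=0.01, send_fraction=0.1):
    """Sparsify accumulated weight deltas and decide whether to send them."""
    delta = new_w - old_w
    delta[np.abs(delta) < zero_threshold] = 0.0        # sparse coding of updates
    if np.count_nonzero(delta) / delta.size >= send_fraction:
        idx = np.nonzero(delta)                        # compact representation:
        return idx, delta[idx]                         # only the changed weights
    return None                                        # keep accumulating

rng = np.random.default_rng(0)
old = np.zeros((4, 4))
new = old + rng.normal(0.0, 0.05, size=(4, 4))   # server-side fine-tuning drift
update = pending_update(old, new)                 # None or (indices, values)
```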
[0078] According to yet another embodiment, the method further comprises sending data to the first apparatus for controlling the first apparatus to retrieve a second baseline model from the memory to said first application; and applying modifications based on said data for obtaining a second modified model to be used by the first application.
[0079] Thus, the application provider may send data, and eventually also labels, to the first apparatus so that it is the first apparatus which retrieves a second baseline model (signaled by the application provider) and fine-tunes it. This may be required in case the first apparatus would obtain better performance by switching from an initial (first) baseline model provided by an application provider A to a better (second) baseline model provided, for example, by an application provider B. In such case, the application provider A may not have access to the second baseline model, and therefore the first apparatus may carry out the fine-tuning of the second baseline model.
[0080] In general, the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. For example, some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

[0081] The embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware. Further in this regard it should be noted that any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on such physical media as memory chips, or memory blocks implemented within the processor, magnetic media such as hard disks or floppy disks, and optical media such as, for example, DVD and the data variants thereof, and CD.
[0082] The memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory. The data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.
[0083] Embodiments of the inventions may be practiced in various components such as integrated circuit modules. The design of integrated circuits is by and large a highly automated process. Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
[0084] Programs, such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules. Once the design for a semiconductor circuit has been completed, the resultant design, in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.
[0085] The foregoing description has provided, by way of exemplary and non-limiting examples, a full and informative description of the exemplary embodiment of this invention. However, various modifications and adaptations may become apparent to those skilled in the relevant arts in view of the foregoing description, when read in conjunction with the accompanying drawings and the appended claims. Nevertheless, all such and similar modifications of the teachings of this invention will still fall within the scope of this invention.

CLAIMS:
1. A method comprising
obtaining, in a memory of a first apparatus, a first baseline model of a neural net;
receiving, from a second apparatus, information for modifying the first baseline model so as to be used in a first application of the first apparatus;
retrieving the first baseline model from the memory to said first application; and
applying modifications based on said information for obtaining a first modified model to be used by the first application.
2. The method according to claim 1, wherein the first apparatus comprises a plurality of applications and the method further comprises
receiving, from the second or a third apparatus, information for modifying the first baseline model so as to be used at least in a second application of the first apparatus;
retrieving the first baseline model from the memory at least to said second application; and
applying modifications based on said information for obtaining a second modified model to be used by the second application.
3. The method according to claim 1 or 2, wherein for the first and any subsequent application, the information for modifying the baseline model comprises one or more of the following:
identification of the baseline model to use;
identification of the parts of the baseline model to use as such;
identification of the parts of the baseline model to be modified;
identification of the parts of the baseline model to be discarded;
identification of the parts to be added to the baseline model and instructions on how to add them.
4. The method according to any preceding claim, further comprising
requesting, upon noticing that the indicated baseline model is not available in the memory of the first apparatus, the second apparatus to send the indicated baseline model to the first apparatus.
5. The method according to claim 4, further comprising
obtaining, in the memory of the first apparatus, a second baseline model of a neural net;
receiving, from the second apparatus, information for modifying the second baseline model so as to be used in the first application of the first apparatus;
retrieving the second baseline model from the memory to said first application; and
applying modifications based on said information for obtaining a second modified model to be used by the first application.
6. The method according to any preceding claim, further comprising
identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the first apparatus, and
providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing.
7. An apparatus comprising
means for obtaining, in a memory of a first apparatus, a first baseline model of a neural net;
means for receiving, from a second apparatus, information for modifying the first baseline model so as to be used in a first application of the first apparatus;
means for retrieving the first baseline model from the memory to said first application; and
means for applying modifications based on said information for obtaining a first modified model to be used by the first application.
8. The apparatus according to claim 7, further comprising
a plurality of applications stored in the memory;
means for receiving, from the second or a third apparatus, information for modifying the first baseline model so as to be used at least in a second application of the apparatus;
means for retrieving the first baseline model from the memory at least to said second application; and
means for applying modifications based on said information for obtaining a second modified model to be used by the second application.
9. The apparatus according to claim 7 or 8, wherein for the first and any subsequent application, the information for modifying the baseline model comprises one or more of the following:
identification of the baseline model to use;
identification of the parts of the baseline model to use as such;
identification of the parts of the baseline model to be modified;
identification of the parts of the baseline model to be discarded;
identification of the parts to be added to the baseline model and instructions on how to add them.
10. The apparatus according to any of claims 7 - 9, further comprising
means for requesting, upon noticing that the indicated baseline model is not available in the memory of the apparatus, the second apparatus to send the indicated baseline model to the apparatus.
11. The apparatus according to claim 10, further comprising
means for obtaining, in the memory of the apparatus, a second baseline model of a neural net;
means for receiving, from the second apparatus, information for modifying the second baseline model so as to be used in the first application of the apparatus;
means for retrieving the second baseline model from the memory to said first application; and
means for applying modifications based on said information for obtaining a second modified model to be used by the first application.
12. The apparatus according to any of claims 7 - 11, further comprising
means for identifying one or more unmodified parts of the same baseline model used by a plurality of applications of the apparatus, and
means for providing input data of the one or more unmodified parts of the same baseline model from said plurality of applications to be processed as batch processing.
13. The apparatus according to any of claims 7 - 12, wherein the means comprises at least one processor and at least one memory, said at least one memory storing code thereon which, when executed by said at least one processor, causes the performance of the apparatus.
14. A method comprising:
providing, by a second apparatus, information to a first apparatus for modifying a first baseline model of the first apparatus so as to be used in a first application of the first apparatus.
15. The method according to claim 14, wherein for the first and any subsequent application, the information for modifying the baseline model comprises one or more of the following:
identification of the baseline model to use;
identification of the parts of the baseline model to use as such;
identification of the parts of the baseline model to be modified;
identification of the parts of the baseline model to be discarded;
identification of the parts to be added to the baseline model and instructions on how to add them.
16. The method according to claim 14 or 15, further comprising
receiving a request, from the first apparatus, to send the baseline model indicated in said information for modifying the baseline model to the first apparatus;
sending the requested baseline model to the first apparatus.
17. The method according to any of claims 14 - 16, further comprising
providing, by the second apparatus, information to the first apparatus for modifying a second baseline model so as to be used in the first application of the first apparatus.
18. The method according to any of claims 14 - 17, further comprising
updating parts of the first baseline model that are not used as unmodified by the first apparatus; and
sending updated weights of said parts of the first baseline model to the first apparatus in response to the number and/or size of the updated weights reaching a predetermined threshold.
19. The method according to any of claims 14 - 17, further comprising
sending data to the first apparatus for controlling the first apparatus to retrieve a second baseline model from the memory to said first application; and
applying modifications based on said data for obtaining a second modified model to be used by the first application.
20. An apparatus comprising:
means for providing information to a remote apparatus for modifying a first baseline model of the remote apparatus so as to be used in a first application of the remote apparatus.
21. The apparatus according to claim 20, wherein for the first and any subsequent application, the information for modifying the baseline model comprises one or more of the following:
identification of the baseline model to use;
identification of the parts of the baseline model to use as such;
identification of the parts of the baseline model to be modified;
identification of the parts of the baseline model to be discarded;
identification of the parts to be added to the baseline model and instructions on how to add them.
22. The apparatus according to claim 20 or 21, further comprising
means for receiving a request, from the remote apparatus, to send the baseline model indicated in said information for modifying the baseline model to the remote apparatus;
means for sending the requested baseline model to the remote apparatus.
23. The apparatus according to any of claims 20 - 22, further comprising
means for providing information to the remote apparatus for modifying a second baseline model so as to be used in the first application of the remote apparatus.
24. The apparatus according to any of claims 20 - 23, further comprising
means for updating parts of the first baseline model that are not used as unmodified by the remote apparatus; and
means for sending updated weights of said parts of the first baseline model to the remote apparatus in response to the number and/or size of the updated weights reaching a predetermined threshold.
25. The apparatus according to any of claims 20 - 23, further comprising
means for sending data to the remote apparatus for controlling the remote apparatus to retrieve a second baseline model from the memory to said first application; and
means for applying modifications based on said data for obtaining a second modified model to be used by the first application.
26. The apparatus according to any of claims 20 - 25, wherein the means comprises at least one processor and at least one memory, said at least one memory storing code thereon which, when executed by said at least one processor, causes the performance of the apparatus.
PCT/FI2019/050032 2018-01-19 2019-01-17 An apparatus, a method and a computer program for running a neural network WO2019141905A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FI20185052 2018-01-19
FI20185052 2018-01-19

Publications (1)

Publication Number Publication Date
WO2019141905A1 true WO2019141905A1 (en) 2019-07-25

Family

ID=67302017

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2019/050032 WO2019141905A1 (en) 2018-01-19 2019-01-17 An apparatus, a method and a computer program for running a neural network

Country Status (1)

Country Link
WO (1) WO2019141905A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5283418A (en) * 1992-02-27 1994-02-01 Westinghouse Electric Corp. Automated rotor welding processes using neural networks
US20150206065A1 (en) * 2013-11-22 2015-07-23 California Institute Of Technology Weight benefit evaluator for training data
US20160283864A1 (en) * 2015-03-27 2016-09-29 Qualcomm Incorporated Sequential image sampling and storage of fine-tuned features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CRICRI, F. ET AL.: "Use cases for Neural Network Representations", MOTION PICTURE EXPERT GROUP OR ISO/IEC JTC1/SC29/WG11, 17 January 2018 (2018-01-17), XP030070434, Retrieved from the Internet <URL:http://wg11.sc29.org> [retrieved on 20180905] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021107488A1 (en) * 2019-11-28 2021-06-03 Samsung Electronics Co., Ltd. Server and method for controlling server
CN114968602A (en) * 2022-08-01 2022-08-30 成都图影视讯科技有限公司 Architecture, method and apparatus for a dynamically resource-allocated neural network chip
CN114968602B (en) * 2022-08-01 2022-10-21 成都图影视讯科技有限公司 Architecture, method and apparatus for a dynamically resource-allocated neural network chip

Similar Documents

Publication Publication Date Title
WO2019141902A1 (en) An apparatus, a method and a computer program for running a neural network
CN107545889B (en) Model optimization method and device suitable for pattern recognition and terminal equipment
WO2020087974A1 (en) Model generation method and device
CN113362811B (en) Training method of voice recognition model, voice recognition method and device
EP3938965A1 (en) An apparatus, a method and a computer program for training a neural network
CN113436620A (en) Model training method, speech recognition method, device, medium and equipment
WO2019141905A1 (en) An apparatus, a method and a computer program for running a neural network
CN113191479A (en) Method, system, node and storage medium for joint learning
CN113327599A (en) Voice recognition method, device, medium and electronic equipment
WO2021026034A1 (en) Artificial intelligence job recommendation neural network machine learning training based on embedding technologies and actual and synthetic job transition latent information
Wang et al. Meta-learning with less forgetting on large-scale non-stationary task distributions
CN116562357B (en) Click prediction model training method and device
CN116662411A (en) Construction method of scene template library, object prediction method and device and electronic equipment
CN109753708A (en) A kind of payment amount prediction technique, device and readable storage medium storing program for executing
CN114648712B (en) Video classification method, device, electronic equipment and computer readable storage medium
CN111709784B (en) Method, apparatus, device and medium for generating user retention time
US20230093630A1 (en) System and method for adapting to changing constraints
CN114330239A (en) Text processing method and device, storage medium and electronic equipment
WO2019141896A1 (en) A method for neural networks
CN114416863A (en) Method, apparatus, and medium for performing model-based parallel distributed reasoning
CN111582456A (en) Method, apparatus, device and medium for generating network model information
CN111581455A (en) Text generation model generation method and device and electronic equipment
WO2020008108A1 (en) An apparatus, a method and a computer program for training a neural network
WO2024078252A1 (en) Feature data encoding and decoding methods and related apparatuses
US20240096063A1 (en) Integrating model reuse with model retraining for video analytics

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19741487

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19741487

Country of ref document: EP

Kind code of ref document: A1