WO2022269469A1 - Method, apparatus and computer program product for federated learning of non-independent and non-identically distributed data - Google Patents

Method, apparatus and computer program product for federated learning of non-independent and non-identically distributed data

Info

Publication number
WO2022269469A1
Authority
WO
WIPO (PCT)
Prior art keywords
group
weight
sparsity
updates
groups
Prior art date
Application number
PCT/IB2022/055719
Other languages
English (en)
Inventor
Homayun AFRABANDPEY
Hamed REZAZADEGAN TAVAKOLI
Honglei Zhang
Francesco Cricrì
Goutham RANGU
Emre Baris Aksu
Original Assignee
Nokia Technologies Oy
Priority date
Filing date
Publication date
Application filed by Nokia Technologies Oy
Publication of WO2022269469A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5066 Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • An example apparatus includes: at least one processor; and at least one non-transitory memory including computer program code; wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: define one or more groups for elements of a weight update; compress one or more weight updates based on a plurality of sparsity levels, wherein the one or more weight updates comprise a weight update corresponding to each group of the one or more groups; wherein to compress the one or more weight updates based on the plurality of sparsity levels, the apparatus is further caused to: apply an intra group sparsity that results in a sparse weight update for each group; and apply an inter group sparsity that results in updating weights for a subset of the one or more groups.
  • the example apparatus may further include, wherein the one or more groups comprise at least one of a group of convolution layers, a group of filters in a convolution layer, a group of features across multiple filters in a convolution layer, or a group of n × n blocks in a fully connected layer.
  • the example apparatus may be further caused to define an objective function, wherein the objective function is used by one or more client devices to train the one or more weight updates by using data available at each of the one or more client devices.
  • the example apparatus may further include, wherein the objective function is defined as follows: min_{ΔW^t} L(W^t + ΔW^t) + λ Σ_{l=1}^{L} R(ΔW_l^t); wherein a convolutional neural network (CNN) with L layers is considered, in which the layers are parameterized by weight tensors W_1, …, W_L, to train the CNN using a federated learning (FL) framework with a server and one or more clients; wherein the objective function is used by each client in the FL framework; wherein at each iteration t: a weight tensor W^t is received by each client of the one or more clients and is updated to W^t + ΔW^t by using the local training data set of each client; and the weight updates ΔW^t are signaled to the server; and wherein L(·) is the training loss, R(ΔW_l^t) is a regularization term for the weight updates at layer l, and λ is a scalar parameter that balances the training loss and the weight update regularization.
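The structure of such a regularizer can be illustrated with a small numerical sketch. The function and group layout below are hypothetical, not taken from the patent: an L1 term over each group's weight update encourages intra-group sparsity (sparse updates inside a group), while a group-lasso L2 term encourages entire groups to receive no update at all (inter-group sparsity).

```python
import numpy as np

def weight_update_regularizer(delta_w_groups, lam_intra=1e-3, lam_inter=1e-2):
    """Hypothetical sketch of a combined group-sparsity regularizer.

    delta_w_groups: list of arrays, one weight-update tensor per group
    (e.g. per filter, per layer, or per n x n block).

    The L1 term promotes intra-group sparsity (sparse updates inside
    each group); the group-lasso L2 term promotes inter-group sparsity
    (entire groups receiving no update at all).
    """
    intra = sum(np.abs(g).sum() for g in delta_w_groups)
    inter = sum(np.sqrt((g ** 2).sum()) for g in delta_w_groups)
    return lam_intra * intra + lam_inter * inter

# Example: two groups, one of them already all-zero (it contributes
# nothing to either term, so it stays un-updated).
groups = [np.array([0.5, 0.0, -0.25]), np.zeros(3)]
reg = weight_update_regularizer(groups)
```

In a client's training loop this term would be added to the task loss, scaled by the balancing parameter that the claim calls λ.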
  • the example apparatus may be further caused to define a final objective function by adding the regularization term to the objective function.
  • the example apparatus may be further caused to define a bitmask, wherein the bitmask determines the updated and non-updated weights.
  • the example apparatus may be further caused to communicate the bitmask and values of the updated weights to the one or more client devices.
  • the example apparatus may be further caused to encode the bitmask.
  • the example apparatus may be further caused to at least one of compress or quantize the values of the updated weights.
  • the example apparatus may be further caused to apply the training loss function in neural network compression techniques to impose both network compression and weight update compression simultaneously or substantially simultaneously.
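The bitmask signalling described above can be sketched as follows; the function names and the bit-packing via NumPy's `packbits` are illustrative assumptions, not the patent's prescribed encoding.

```python
import numpy as np

def pack_update(delta_w, threshold):
    """Sketch (illustrative names): keep only weight updates whose
    magnitude reaches `threshold`, and encode the result as a bitmask
    plus the values of the surviving (updated) weights."""
    flat = delta_w.ravel()
    mask = np.abs(flat) >= threshold              # 1 = updated, 0 = not updated
    values = flat[mask]                           # only updated weights are sent
    bitmask_bytes = np.packbits(mask).tobytes()   # simple bitmask encoding
    return bitmask_bytes, values

def unpack_update(bitmask_bytes, values, shape):
    """Receiver side: rebuild a dense weight-update tensor from the
    bitmask and the values of the updated weights."""
    n = int(np.prod(shape))
    mask = np.unpackbits(np.frombuffer(bitmask_bytes, np.uint8))[:n].astype(bool)
    flat = np.zeros(n, dtype=values.dtype)
    flat[mask] = values
    return flat.reshape(shape)

dw = np.array([[0.4, -0.01], [0.0, -0.6]])
bm, vals = pack_update(dw, threshold=0.1)
restored = unpack_update(bm, vals, dw.shape)
```

The bitmask costs one bit per weight, so the scheme pays off exactly when the sparsified update leaves most weights untouched, which is what the group-sparsity terms are designed to produce.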
  • the example apparatus may be further caused to apply the intra group sparsity and the inter group sparsity simultaneously or substantially simultaneously.
  • the example apparatus may be further caused to: calculate a threshold value from absolute values of the one or more weight updates; and set at least one weight update, of the one or more weight updates, to zero when an absolute value of the at least one weight update is less than the threshold.
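One way to realize the thresholding step above is sketched below; deriving the threshold as a percentile of the absolute weight updates is an assumption chosen so that a target sparsity level is reached, and the names are illustrative.

```python
import numpy as np

def sparsify_updates(delta_w, sparsity=0.9):
    """Sketch: derive a threshold from the absolute values of the
    weight updates themselves (here, the percentile that achieves a
    target sparsity level), then zero every update whose absolute
    value falls below that threshold."""
    threshold = np.percentile(np.abs(delta_w), sparsity * 100)
    out = delta_w.copy()
    out[np.abs(out) < threshold] = 0.0   # drop small-magnitude updates
    return out, threshold

dw = np.array([-5.0, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])
sparse_dw, thr = sparsify_updates(dw, sparsity=0.8)
```

With `sparsity=0.8` only the largest-magnitude fifth of the updates survive; the rest are set to zero and need not be transmitted.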
  • An example method includes: defining one or more groups for elements of a weight update; compressing one or more weight updates based on a plurality of sparsity levels, wherein the one or more weight updates comprise a weight update corresponding to each group of the one or more groups; wherein compressing the one or more weight updates based on the plurality of sparsity levels comprises: applying an intra group sparsity that results in a sparse weight update for each group; and applying an inter group sparsity that results in updating weights for a subset of the one or more groups.
  • the example method may further include, wherein the one or more groups comprise at least one of a group of convolution layers, a group of filters in a convolution layer, a group of features across multiple filters in a convolution layer, or a group of n × n blocks in a fully connected layer.
  • the example method may further include defining an objective function, wherein the objective function is used by one or more client devices to train the one or more weight updates by using data available at each of the one or more client devices.
  • the example method may further include, wherein the objective function is defined as follows: min_{ΔW^t} L(W^t + ΔW^t) + λ Σ_{l=1}^{L} R(ΔW_l^t); wherein a convolutional neural network (CNN) with L layers is considered, in which the layers are parameterized by weight tensors W_1, …, W_L, to train the CNN using a federated learning (FL) framework with a server and one or more clients; wherein the objective function is used by each client in the FL framework; wherein at each iteration t: a weight tensor W^t is received by each client of the one or more clients and is updated to W^t + ΔW^t by using the local training data set of each client; and the weight updates ΔW^t are signaled to the server; and wherein L(·) is the training loss, R(ΔW_l^t) is a regularization term for the weight updates at layer l, and λ is a scalar parameter that balances the training loss and the weight update regularization.
  • the example method may further include defining a final objective function by adding the regularization term to the objective function.
  • the example method may further include defining a bitmask, wherein the bitmask determines the updated and non-updated weights.
  • the example method may further include communicating the bitmask and values of the updated weights to the one or more client devices.
  • the example method may further include encoding the bitmask.
  • the example method may further include at least one of compressing or quantizing the values of the updated weights.
  • the example method may further include applying the training loss function in neural network compression techniques to impose both network compression and weight update compression simultaneously or substantially simultaneously.
  • the example method may further include applying the intra group sparsity and the inter group sparsity simultaneously or substantially simultaneously.
  • the example method may further include: calculating a threshold value from absolute values of the one or more weight updates; and setting at least one weight update, of the one or more weight updates, to zero when an absolute value of the at least one weight update is less than the threshold.
  • An example computer readable medium includes program instructions for causing an apparatus to perform at least the following: defining one or more groups for elements of a weight update; compressing one or more weight updates based on a plurality of sparsity levels, wherein the one or more weight updates comprise a weight update corresponding to each group of the one or more groups; wherein compressing the one or more weight updates based on the plurality of sparsity levels comprises: applying an intra group sparsity that results in a sparse weight update for each group; and applying an inter group sparsity that results in updating weights for a subset of the one or more groups.
  • the example computer readable medium may further cause the apparatus to perform the methods as described in any of the previous paragraphs.
  • FIG. 1 shows schematically an electronic device employing embodiments of the examples described herein.
  • FIG. 2 shows schematically a user equipment suitable for employing embodiments of the examples described herein.
  • FIG. 3 further shows schematically electronic devices employing embodiments of the examples described herein connected using wireless and wired network connections.
  • FIG. 4 shows schematically a block diagram of an encoder on a general level.
  • FIG. 5 is a block diagram showing an interface between an encoder and a decoder in accordance with the examples described herein.
  • FIG. 6 illustrates a system configured to support streaming of media data from a source to a client device.
  • FIG. 7 is a block diagram of an apparatus that may be specifically configured in accordance with an example embodiment.
  • FIG. 8 illustrates the effect of group sparsity on group-wise weight updates, in accordance with an embodiment.
  • FIG. 9 illustrates the effect of the inter-group sparsification term on the weight updates, according to an embodiment.
  • FIG. 10 illustrates an example pseudocode for efficient federated learning for non-iid data via compressed communication, in accordance with an embodiment.
  • FIG. 11 is an example apparatus, which may be implemented in hardware, configured to implement a model or architecture for efficient federated learning for non-iid data via compressed communication.
  • FIG. 12 illustrates an example method for federated learning for non-iid data, in accordance with an embodiment.
  • FIG. 13 is a block diagram of one possible and non-limiting system in which the example embodiments may be practiced.
  • AMF access and mobility management function
  • AVC advanced video coding
  • CABAC context-adaptive binary arithmetic coding
  • CDMA code-division multiple access
  • CE core experiment
  • CNN convolutional neural network
  • CU central unit
  • DASH dynamic adaptive streaming over HTTP
  • DCT discrete cosine transform
  • DSP digital signal processor
  • DU distributed unit
  • eNB (or eNodeB) evolved Node B (for example, an LTE base station)
  • EN-DC E-UTRA-NR dual connectivity
  • en-gNB or En-gNB node providing NR user plane and control plane protocol terminations towards the UE, and acting as secondary node in EN-DC
  • E-UTRA evolved universal terrestrial radio access, for example, the LTE radio access technology
  • FDMA frequency division multiple access
  • f(n) fixed-pattern bit string using n bits written (from left to right) with the left bit first
  • F1 or F1-C interface between CU and DU control interface
  • FL federated learning
  • gNB (or gNodeB) base station for 5G/NR, for example, a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC
  • GSM Global System for Mobile communications
  • H.222.0 MPEG-2 Systems, formally known as ISO/IEC 13818-1 and as ITU-T Rec. H.222.0
  • IBC intra block copy
  • ID identifier
  • IEC International Electrotechnical Commission
  • IEEE Institute of Electrical and Electronics Engineers
  • I/F interface
  • IMS instant messaging service
  • IoT internet of things
  • IP internet protocol
  • ISO International Organization for Standardization
  • ISOBMFF ISO base media file format
  • ITU International Telecommunication Union
  • ITU-T ITU Telecommunication Standardization Sector
  • LTE long-term evolution
  • LZMA Lempel–Ziv–Markov chain compression
  • LZMA2 simple container format that can include both uncompressed data and LZMA data
  • LZO Lempel–Ziv–Oberhumer compression
  • LZW Lempel–Ziv–Welch compression
  • MAC medium access control
  • mdat MediaDataBox
  • MME mobility management entity
  • MMS multimedia messaging service
  • moov MovieBox
  • MP4 file format for MPEG-4 Part 14 files
  • MPEG moving picture experts group
  • MPEG-2 H.222/H.262 as defined by the ITU
  • MPEG-4 audio and video coding standard for ISO/IEC 14496
  • circuitry refers to (a) hardware-only circuit implementations (e.g., implementations in analog circuitry and/or digital circuitry); (b) combinations of circuits and computer program product(s) comprising software and/or firmware instructions stored on one or more computer readable memories that work together to cause an apparatus to perform one or more functions described herein; and (c) circuits, such as, for example, a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation even when the software or firmware is not physically present.
  • This definition of ‘circuitry’ applies to all uses of this term herein, including in any claims.
  • the term ‘circuitry’ also includes an implementation comprising one or more processors and/or portion(s) thereof and accompanying software and/or firmware.
  • the term ‘circuitry’ as used herein also includes, for example, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, other network device, and/or other computing device.
  • a method, apparatus and computer program product are provided in accordance with an example embodiment in order to implement efficient federated learning for non-iid data via compressed communication.
  • data used to train a neural network within a federated learning framework may include a media frame, or other data, e.g., audio, text, and the like.
  • FIG. 1 shows an example block diagram of an apparatus 50.
  • the apparatus may be an Internet of Things (IoT) apparatus configured to perform various functions, for example, gathering information by one or more sensors, receiving, or transmitting information, analyzing information gathered or received by the apparatus, or the like.
  • the apparatus may comprise a video coding system, which may incorporate a codec.
  • FIG.2 shows a layout of an apparatus according to an example embodiment.
  • the apparatus 50 may, for example, be a mobile terminal or user equipment of a wireless communication system, a sensor device, a tag, or a low-power device. However, it would be appreciated that embodiments of the examples described herein may be implemented within any electronic device or apparatus which may process data by neural networks.
  • the apparatus 50 may comprise a housing 30 for incorporating and protecting the device.
  • the apparatus 50 may further comprise a display 32 in the form of, for example, a liquid crystal display, light emitting diode (LED) display, or an organic light emitting diode (OLED) display.
  • the display may be any suitable display technology suitable to display media or multimedia content, for example, an image or video.
  • the apparatus 50 may further comprise a keypad 34.
  • any suitable data or user interface mechanism may be employed.
  • the user interface may be implemented as a virtual keyboard or data entry system as part of a touch-sensitive display.
  • the apparatus 50 may comprise a microphone 36 or any suitable audio input which may be a digital or analogue signal input.
  • the apparatus 50 may further comprise an audio output device which in embodiments of the examples described herein may be any one of: an earpiece 38, speaker, or an analogue audio or digital audio output connection.
  • the apparatus 50 may also comprise a battery (or in other embodiments of the examples described herein the device may be powered by any suitable mobile energy device such as solar cell, fuel cell or clockwork generator).
  • the apparatus may further comprise a camera 42 capable of recording or capturing images and/or video.
  • the apparatus 50 may further comprise an infrared port for short range line of sight communication to other devices.
  • the apparatus 50 may further comprise any suitable short range communication solution such as, for example, a Bluetooth ® wireless connection or a USB/firewire wired connection.
  • the apparatus 50 may comprise a controller 56, a processor or a processor circuitry for controlling the apparatus 50.
  • the controller 56 may be connected to memory 58 which in embodiments of the examples described herein may store both data in the form of an image, audio data, video data, and/or may also store instructions for implementation on the controller 56.
  • the controller 56 may further be connected to codec circuitry 54 suitable for carrying out coding and/or decoding of audio, image, and/or video data or assisting in coding and/or decoding carried out by the controller.
  • the apparatus 50 may further comprise a card reader 48 and a smart card 46, for example, a UICC and UICC reader for providing user information and being suitable for providing authentication information for authentication and authorization of the user at a network.
  • the apparatus 50 may comprise radio interface circuitry 52 connected to the controller and suitable for generating wireless communication signals, for example, for communication with a cellular communications network, a wireless communications system or a wireless local area network.
  • the apparatus 50 may further comprise an antenna 44 connected to the radio interface circuitry 52 for transmitting radio frequency signals generated at the radio interface circuitry 52 to other apparatus(es) and/or for receiving radio frequency signals from other apparatus(es).
  • the apparatus 50 may comprise a camera 42 capable of recording or detecting individual frames which are then passed to the codec 54 or the controller for processing.
  • the apparatus may receive the video image data for processing from another device prior to transmission and/or storage.
  • the apparatus 50 may also receive either wirelessly or by a wired connection the image for coding/decoding.
  • the structural elements of the apparatus 50 described above represent examples of means for performing a corresponding function.
  • the system 10 comprises multiple communication devices which can communicate through one or more networks.
  • the system 10 may comprise any combination of wired or wireless networks including, but not limited to a wireless cellular telephone network (such as a GSM, UMTS, CDMA, LTE, 4G, 5G network, and the like), a wireless local area network (WLAN) such as defined by any of the IEEE 802.x standards, a Bluetooth ® personal area network, an Ethernet local area network, a token ring local area network, a wide area network, and the Internet.
  • the system 10 may include both wired and wireless communication devices and/or the apparatus 50 suitable for implementing embodiments of the examples described herein.
  • the system shown in FIG. 3 shows a mobile telephone network 11 and a representation of the Internet 28.
  • Connectivity to the Internet 28 may include, but is not limited to, long range wireless connections, short range wireless connections, and various wired connections including, but not limited to, telephone lines, cable lines, power lines, and similar communication pathways.
  • the example communication devices shown in the system 10 may include, but are not limited to, an electronic device or the apparatus 50, a combination of a personal digital assistant (PDA) and a mobile telephone 14, a PDA 16, an integrated messaging device (IMD) 18, a desktop computer 20, a notebook computer 22.
  • the apparatus 50 may be stationary or mobile when carried by an individual who is moving.
  • the apparatus 50 may also be located in a mode of transport including, but not limited to, a car, a truck, a taxi, a bus, a train, a boat, an airplane, a bicycle, a motorcycle or any similar suitable mode of transport.
  • the embodiments may also be implemented in a set-top box, for example, a digital TV receiver, which may or may not have a display or wireless capabilities; in tablets or (laptop) personal computers (PC), which have hardware and/or software to process neural network data; in various operating systems; and in chipsets, processors, DSPs and/or embedded systems offering hardware/software based coding.
  • Some or further apparatus may send and receive calls and messages and communicate with service providers through a wireless connection 25 to a base station 24.
  • the base station 24 may be connected to a network server 26 that allows communication between the mobile telephone network 11 and the Internet 28.
  • the system may include additional communication devices and communication devices of various types.
  • the communication devices may communicate using various transmission technologies including, but not limited to, code division multiple access (CDMA), global systems for mobile communications (GSM), universal mobile telecommunications system (UMTS), time divisional multiple access (TDMA), frequency division multiple access (FDMA), transmission control protocol-internet protocol (TCP-IP), short messaging service (SMS), multimedia messaging service (MMS), email, instant messaging service (IMS), Bluetooth ® , IEEE 802.11, 3GPP Narrowband IoT and any similar wireless communication technology.
  • a communications device involved in implementing various embodiments of the examples described herein may communicate over the wired and wireless connections described above.
  • a channel may refer either to a physical channel or to a logical channel.
  • a physical channel may refer to a physical transmission medium such as a wire
  • a logical channel may refer to a logical connection over a multiplexed medium, capable of conveying several logical channels.
  • a channel may be used for conveying an information signal, for example, a bitstream, from one or several senders (or transmitters) to one or several receivers.
  • the embodiments may also be implemented in internet of things (IoT) devices.
  • IoT may be defined, for example, as an interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure.
  • the convergence of various technologies has enabled, and may further enable, many fields of embedded systems, such as wireless sensor networks, control systems, home/building automation, and the like, to be included in the IoT.
  • the IoT devices are provided with an IP address as a unique identifier.
  • IoT devices may be provided with a radio transmitter, such as WLAN or Bluetooth ® transmitter or a RFID tag.
  • IoT devices may have access to an IP-based network via a wired network, such as an Ethernet-based network or a power-line connection (PLC).
  • An MPEG-2 transport stream (TS), specified in ISO/IEC 13818-1 or equivalently in ITU-T Recommendation H.222.0, is a format for carrying audio, video, and other media as well as program metadata or other metadata, in a multiplexed stream.
  • a packet identifier (PID) is used to identify an elementary stream (a.k.a. packetized elementary stream) within the TS.
  • a logical channel within an MPEG-2 TS may be considered to correspond to a specific PID value.
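The PID-to-logical-channel mapping can be illustrated with a small parser; the 188-byte packet size, the 0x47 sync byte, and the 13-bit PID field are defined by the MPEG-2 TS specification (ISO/IEC 13818-1), while the function name is illustrative.

```python
def ts_packet_pid(packet: bytes) -> int:
    """Extract the 13-bit PID from an MPEG-2 transport stream packet.

    A TS packet is 188 bytes and starts with the sync byte 0x47; the
    PID occupies the low 5 bits of byte 1 and all 8 bits of byte 2."""
    if len(packet) != 188 or packet[0] != 0x47:
        raise ValueError("not a valid TS packet")
    return ((packet[1] & 0x1F) << 8) | packet[2]

# A minimal packet carrying PID 0x0100; the payload bytes are
# irrelevant for header parsing, so they are left as zeros.
pkt = bytes([0x47, 0x01, 0x00]) + bytes(185)
pid = ts_packet_pid(pkt)
```

Demultiplexing a TS then amounts to routing each packet to the elementary stream associated with its PID, i.e. to its logical channel.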
  • A video codec consists of an encoder that transforms the input video into a compressed representation suited for storage/transmission, and a decoder that can decompress the compressed video representation back into a viewable form, or into a form that is suitable as an input to one or more algorithms for analysis or processing.
  • a video encoder and/or a video decoder may also be separate from each other, for example, need not form a codec.
  • Typical hybrid video encoders, for example, many encoder implementations of ITU-T H.263 and H.264, encode the video information in two phases. Firstly, pixel values in a certain picture area (or ‘block’) are predicted, for example, by motion compensation means (finding and indicating an area in one of the previously coded video frames that corresponds closely to the block being coded) or by spatial means (using the pixel values around the block to be coded in a specified manner). Secondly, the prediction error, for example, the difference between the predicted block of pixels and the original block of pixels, is coded.
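The two-phase idea can be sketched numerically; for brevity, plain uniform quantization of the residual stands in for the transform-plus-quantization stage of a real encoder, and the function names are illustrative.

```python
import numpy as np

def encode_block(block, prediction, qstep=4.0):
    """Sketch of the two-phase hybrid coding idea: the block is first
    predicted (by motion compensation or spatially), then only the
    prediction error is coded; here plain uniform quantization stands
    in for the transform + quantization stage of a real codec."""
    residual = block.astype(float) - prediction   # phase 2 operates on the error
    levels = np.round(residual / qstep)           # coarser qstep = fewer bits
    return levels

def decode_block(levels, prediction, qstep=4.0):
    """Decoder mirrors the encoder: dequantize and add the prediction."""
    return prediction + levels * qstep

pred = np.full((2, 2), 100.0)                     # predicted block of pixels
orig = np.array([[103.0, 99.0], [109.0, 100.0]])  # original block of pixels
lv = encode_block(orig, pred)
rec = decode_block(lv, pred)
```

A larger `qstep` shrinks the coded representation at the cost of larger reconstruction error, which is exactly the quality/bitrate balance described below.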
  • the prediction error is typically coded by transforming the difference with a specified transform (for example, the Discrete Cosine Transform (DCT) or a variant of it), quantizing the coefficients, and entropy coding the result. By varying the fidelity of the quantization process, the encoder can control the balance between the accuracy of the pixel representation (picture quality) and the size of the resulting coded video representation (file size or transmission bitrate).
  • In temporal prediction, the sources of prediction are previously decoded pictures (a.k.a. reference pictures).
  • In some cases, inter prediction may refer to temporal prediction only, while in other cases inter prediction may refer collectively to temporal prediction and any of intra block copy (IBC), inter-layer prediction, and inter-view prediction, provided that they are performed with the same or a similar process as temporal prediction.
  • Inter prediction, which may also be referred to as temporal prediction, motion compensation, or motion-compensated prediction, reduces temporal redundancy; in inter prediction, the sources of prediction are previously decoded pictures.
  • Intra prediction utilizes the fact that adjacent pixels within the same picture are likely to be correlated. Intra prediction can be performed in spatial or transform domain, for example, either sample values or transform coefficients can be predicted. Intra prediction is typically exploited in intra-coding, where no inter prediction is applied.
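As a minimal illustration of spatial-domain intra prediction, a DC-style predictor fills a block with the mean of its already-reconstructed neighbours; this is a simplification of the DC modes found in hybrid codecs, and the function name is illustrative.

```python
import numpy as np

def dc_intra_predict(left, above):
    """Predict every sample of a block as the mean of the already
    reconstructed neighbouring samples (the column to the left and
    the row above), exploiting the correlation between adjacent
    pixels within the same picture."""
    dc = (np.sum(left) + np.sum(above)) / (len(left) + len(above))
    return np.full((len(left), len(above)), dc)   # flat predicted block

pred = dc_intra_predict(left=np.array([10.0, 12, 14, 16]),
                        above=np.array([11.0, 13, 15, 17]))
```

Only the prediction error between this flat block and the actual samples then needs to be coded, as in the two-phase scheme above.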
  • One outcome of the coding procedure is a set of coding parameters, such as motion vectors and quantized transform coefficients. Many parameters can be entropy-coded more efficiently when they are predicted first from spatially or temporally neighboring parameters.
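Motion-vector coding is a typical example of such parameter prediction: a simplified, component-wise median over neighbouring motion vectors (loosely modelled on H.264-style spatial MV prediction) leaves only a small difference to entropy-code. The names below are illustrative.

```python
def predict_mv(left, above, above_right):
    """Component-wise median of three neighbouring motion vectors, a
    simplified form of the spatial MV prediction used by hybrid
    codecs; only the difference to this predictor is entropy-coded."""
    def median3(a, b, c):
        return sorted((a, b, c))[1]
    return tuple(median3(l, u, r)
                 for l, u, r in zip(left, above, above_right))

# Neighbouring blocks moved by similar amounts, so the predictor is
# close to the current block's true motion vector (5, -1) and the
# residual to code is small.
mvp = predict_mv(left=(4, 0), above=(6, -2), above_right=(5, 1))
mvd = (5 - mvp[0], -1 - mvp[1])
```

Small residuals like this concentrate the symbol distribution near zero, which is what makes the subsequent entropy coding efficient.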
  • FIG. 4 shows a block diagram of a general structure of a video encoder.
  • FIG. 4 presents an encoder for two layers, but it would be appreciated that the presented encoder could be similarly extended to encode more than two layers.
  • FIG. 4 illustrates a video encoder comprising a first encoder section 500 for a base layer and a second encoder section 502 for an enhancement layer.
  • Each of the first encoder section 500 and the second encoder section 502 may comprise similar elements for encoding incoming pictures.
  • the encoder sections 500, 502 may comprise a pixel predictor 302, 402, prediction error encoder 303, 403 and prediction error decoder 304, 404.
  • FIG.4 also shows an embodiment of the pixel predictor 302, 402 as comprising an inter-predictor 306, 406, an intra-predictor 308, 408, a mode selector 310, 410, a filter 316, 416, and a reference frame memory 318, 418.
  • the pixel predictor 302 of the first encoder section 500 receives a base layer image(s) 300 of a video stream to be encoded at both the inter-predictor 306 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 308 (which determines a prediction for an image block based only on the already processed parts of current frame or picture).
  • the output of both the inter-predictor and the intra-predictor are passed to the mode selector 310.
  • the intra-predictor 308 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 310.
  • the mode selector 310 also receives a copy of the base layer image 300.
  • the pixel predictor 402 of the second encoder section 502 receives enhancement layer image(s) 400 of a video stream to be encoded at both the inter-predictor 406 (which determines the difference between the image and a motion compensated reference frame) and the intra-predictor 408 (which determines a prediction for an image block based only on the already processed parts of current frame or picture).
  • the output of both the inter-predictor and the intra-predictor are passed to the mode selector 410.
  • the intra-predictor 408 may have more than one intra-prediction modes. Hence, each mode may perform the intra-prediction and provide the predicted signal to the mode selector 410.
  • the mode selector 410 also receives a copy of the enhancement layer image 400.
  • the output of the inter-predictor 306, 406 or the output of one of the optional intra-predictor modes or the output of a surface encoder within the mode selector is passed to the output of the mode selector 310, 410.
  • the output of the mode selector 310, 410 is passed to a first summing device 321, 421.
  • the first summing device may subtract the output of the pixel predictor 302, 402 from the base layer image 300 or the enhancement layer image 400 to produce a first prediction error signal 320, 420 which is input to the prediction error encoder 303, 403.
  • the pixel predictor 302, 402 further receives from a preliminary reconstructor 339, 439 the combination of the prediction representation of the image block 312, 412 and the output 338, 438 of the prediction error decoder 304, 404.
  • the preliminary reconstructed image 314, 414 may be passed to the intra-predictor 308, 408 and to the filter 316, 416.
  • the filter 316, 416 receiving the preliminary representation may filter the preliminary representation and output a final reconstructed image 340, 440 which may be saved in the reference frame memory 318, 418.
  • the reference frame memory 318 may be connected to the inter-predictor 306 to be used as the reference image against which a future base layer image 300 is compared in inter-prediction operations.
  • the reference frame memory 318 may also be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer image 400 is compared in inter-prediction operations. Moreover, the reference frame memory 418 may be connected to the inter-predictor 406 to be used as the reference image against which a future enhancement layer image 400 is compared in inter-prediction operations.
  • Filtering parameters from the filter 316 of the first encoder section 500 may be provided to the second encoder section 502, subject to the base layer being selected and indicated to be the source for predicting the filtering parameters of the enhancement layer, according to some embodiments.
  • the prediction error encoder 303, 403 comprises a transform unit 342, 442 and a quantizer 344, 444.
  • the transform unit 342, 442 transforms the first prediction error signal 320, 420 to a transform domain.
  • the transform is, for example, the DCT transform.
  • the quantizer 344, 444 quantizes the transform domain signal, for example, the DCT coefficients, to form quantized coefficients.
  • the prediction error decoder 304, 404 receives the output from the prediction error encoder 303, 403 and performs the opposite processes of the prediction error encoder 303, 403 to produce a decoded prediction error signal 338, 438 which, when combined with the prediction representation of the image block 312, 412 at the second summing device 339, 439, produces the preliminary reconstructed image 314, 414.
  • the prediction error decoder may be considered to comprise a dequantizer 346, 446, which dequantizes the quantized coefficient values, for example, DCT coefficients, to reconstruct the transform signal and an inverse transformation unit 348, 448, which performs the inverse transformation to the reconstructed transform signal wherein the output of the inverse transformation unit 348, 448 includes reconstructed block(s).
  • the prediction error decoder may also comprise a block filter which may filter the reconstructed block(s) according to further decoded information and filter parameters.
  • the entropy encoder 330, 430 receives the output of the prediction error encoder 303, 403 and may perform a suitable entropy encoding/variable length encoding on the signal to provide a compressed signal.
  • FIG.5 is a block diagram showing the interface between an encoder 501 implementing neural network encoding 503, and a decoder 504 implementing neural network decoding 505 in accordance with the examples described herein.
  • the encoder 501 may embody a device, software method or a hardware circuit.
  • the encoder 501 has the goal of compressing input data 511 (for example, an input video, a neural network model, or an update to the neural network model) into compressed data 512 (for example, a bitstream) such that the bitrate is minimized and the accuracy of an analysis or processing algorithm is maximized.
  • the encoder 501 uses an encoder or compression algorithm, for example, to perform neural network encoding 503, e.g., encoding the input data by using one or more neural networks.
  • the general analysis or processing algorithm may be part of the decoder 504.
  • the decoder 504 uses a decoder or decompression algorithm, for example, to perform the neural network decoding 505 (e.g., decoding by using one or more neural networks) to decode the compressed data 512 (for example, compressed video) which was encoded by the encoder 501.
  • the decoder 504 produces decompressed data 513 (for example, reconstructed data).
  • the encoder 501 and decoder 504 may be entities implementing an abstraction, may be separate entities or the same entities, or may be part of the same physical device.
  • the analysis/processing algorithm may be any algorithm, traditional or learned from data. In the case of an algorithm which is learned from data, it is assumed that this algorithm can be modified or updated, for example, by using optimization via gradient descent.
  • One example of the learned algorithm is a neural network.
  • the method and apparatus of an example embodiment may be utilized in a wide variety of systems, including systems that rely upon the compression and decompression of media data and possibly also the associated metadata.
  • the method and apparatus are configured to compress the media data and associated metadata streamed from a source via a content delivery network to a client device, at which point the compressed media data and associated metadata is decompressed or otherwise processed.
  • FIG.6 depicts an example of such a system 600 that includes a source 602 of media data and associated metadata.
  • the source may be, in one embodiment, a server. However, the source may be embodied in other manners when so desired.
  • the source is configured to stream the media data and associated metadata to a client device 604.
  • the client device may be embodied by a media player, a multimedia system, a video system, a smart phone, a mobile telephone or other user equipment, a personal computer, a tablet computer or any other computing device configured to receive and decompress the media data and process associated metadata.
  • media data and metadata are streamed via a network 606, such as any of a wide variety of types of wireless networks and/or wireline networks.
  • the client device is configured to receive structured information including media, metadata and any other relevant representation of information including the media and the metadata; and to decompress the media data and process the associated metadata (e.g. for proper playback timing of decompressed media data).
  • An apparatus 700 is provided in accordance with an example embodiment as shown in FIG.7.
  • the apparatus of FIG.7 may be embodied by a source 602, such as a file writer which, in turn, may be embodied by a server, that is configured to stream a compressed representation of the media data and associated metadata.
  • the apparatus may be embodied by the client device 604, such as a file reader which may be embodied, for example, by any of the various computing devices described above.
  • the apparatus of an example embodiment includes, is associated with or is in communication with a processing circuitry 702, one or more memory devices 704, a communication interface 706 and optionally a user interface.
  • the processing circuitry 702 may be in communication with the memory device 704 via a bus for passing information among components of the apparatus 700.
  • the memory device may be non-transitory and may include, for example, one or more volatile and/or non-volatile memories.
  • the memory device may be an electronic storage device (e.g., a computer readable storage medium) comprising gates configured to store data (e.g., bits) that may be retrievable by a machine (e.g., a computing device like the processing circuitry).
  • the memory device may be configured to store information, data, content, applications, instructions, or the like for enabling the apparatus to carry out various functions in accordance with an example embodiment of the present disclosure.
  • the apparatus 700 may, in some embodiments, be embodied in various computing devices as described above. However, in some embodiments, the apparatus may be embodied as a chip or chip set. In other words, the apparatus may comprise one or more physical packages (e.g., chips) including materials, components and/or wires on a structural assembly (e.g., a baseboard). The structural assembly may provide physical strength, conservation of size, and/or limitation of electrical interaction for component circuitry included thereon.
  • the apparatus may therefore, in some cases, be configured to implement an embodiment of the present disclosure on a single chip or as a single ‘system on a chip.’
  • a chip or chipset may constitute means for performing one or more operations for providing the functionalities described herein.
  • the processing circuitry 702 may be embodied in a number of different ways.
  • the processing circuitry may be embodied as one or more of various hardware processing means such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing element with or without an accompanying DSP, or various other circuitry including integrated circuits such as, for example, an ASIC (application specific integrated circuit), an FPGA (field programmable gate array), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like.
  • the processing circuitry may include one or more processing cores configured to perform independently.
  • a multi-core processing circuitry may enable multiprocessing within a single physical package.
  • the processing circuitry may include one or more processors configured in tandem via the bus to enable independent execution of instructions, pipelining and/or multithreading.
  • the processing circuitry 702 may be configured to execute instructions stored in the memory device 704 or otherwise accessible to the processing circuitry.
  • the processing circuitry may be configured to execute hard coded functionality.
  • the processing circuitry may represent an entity (e.g., physically embodied in circuitry) capable of performing operations according to an embodiment of the present disclosure while configured accordingly.
  • the processing circuitry may be specifically configured hardware for conducting the operations described herein.
  • when the processing circuitry is embodied as an executor of instructions, the instructions may specifically configure the processing circuitry to perform the algorithms and/or operations described herein when the instructions are executed.
  • the processing circuitry may be a processor of a specific device (e.g., an image or video processing system) configured to employ an embodiment of the present invention by further configuration of the processing circuitry by instructions for performing the algorithms and/or operations described herein.
  • the processing circuitry may include, among other things, a clock, an arithmetic logic unit (ALU) and logic gates configured to support operation of the processing circuitry.
  • the communication interface 706 may be any means such as a device or circuitry embodied in either hardware or a combination of hardware and software that is configured to receive and/or transmit data, including video bitstreams.
  • the communication interface may include, for example, an antenna (or multiple antennas) and supporting hardware and/or software for enabling communications with a wireless communication network. Additionally or alternatively, the communication interface may include the circuitry for interacting with the antenna(s) to cause transmission of signals via the antenna(s) or to handle receipt of signals received via the antenna(s). In some environments, the communication interface may alternatively or also support wired communication.
  • the communication interface may include a communication modem and/or other hardware/software for supporting communication via cable, digital subscriber line (DSL), universal serial bus (USB) or other mechanisms.
  • the apparatus 700 may optionally include a user interface that may, in turn, be in communication with the processing circuitry 702 to provide output to a user, such as by outputting an encoded video bitstream and, in some embodiments, to receive an indication of a user input.
  • the user interface may include a display and, in some embodiments, may also include a keyboard, a mouse, a joystick, a touch screen, touch areas, soft keys, a microphone, a speaker, or other input/output mechanisms.
  • the processing circuitry may comprise user interface circuitry configured to control at least some functions of one or more user interface elements such as a display and, in some embodiments, a speaker, ringer, microphone and/or the like.
  • the processing circuitry and/or user interface circuitry comprising the processing circuitry may be configured to control one or more functions of one or more user interface elements through computer program instructions (e.g., software and/or firmware) stored on a memory accessible to the processing circuitry (e.g., memory device, and/or the like).
  • Federated learning (FL) generally consists of multiple rounds of communication between a server and one or more clients. In each round, clients train their own models using their own data and then send their models to the server to be aggregated into a single global model. The clients then begin the next round of training using the global model as the initialization.
  • This procedure imposes a significant communication load and requires high communication bandwidth. To decrease the communication load, it is assumed that clients send only the updates to the weights they received from the server, rather than the full model. However, the procedure is still far from being communication efficient.
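The round structure described above can be sketched as follows. Here `local_train` is a stand-in for a client's local optimization, the model is a flat list of weights, and all names and data are illustrative rather than part of the disclosure:

```python
# Minimal sketch of federated rounds: clients train locally and send
# only weight-updates; the server averages the updates (illustrative).

def local_train(weights, client_data, lr=0.1):
    # Toy local step: nudge each weight toward the client's data mean.
    target = sum(client_data) / len(client_data)
    return [w + lr * (target - w) for w in weights]

def federated_round(global_weights, clients_data):
    updates = []
    for data in clients_data:
        local = local_train(global_weights, data)
        # Each client sends the difference from the received model.
        updates.append([lw - gw for lw, gw in zip(local, global_weights)])
    # Server aggregation: element-wise average of the updates.
    avg = [sum(col) / len(col) for col in zip(*updates)]
    return [gw + u for gw, u in zip(global_weights, avg)]

weights = [0.0, 0.0]
clients = [[1.0, 3.0], [5.0, 7.0]]   # two clients with non-iid data
for _ in range(3):
    weights = federated_round(weights, clients)
```

After each round, the aggregated model drifts toward a compromise between the two clients' local optima, which illustrates why non-iid client data slows convergence.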
  • the embodiments described herein are explained by using following non-limiting example use cases: (i) deployment of neural network updates from a server to a client, and (ii) federated learning with two or more clients and a server.
  • the embodiments consider convolutional neural networks (CNNs) as an example of neural network models to be trained in these use cases.
  • One of the non-limiting example objectives of the embodiments described herein is to achieve compression of weight-updates while preserving the accuracy (and/or other performance metric) of the neural networks that are updated by the compressed weight-updates, where FL is one example use case. It is to be noted that various embodiments are not limited to any specific use case, any specific type of neural network architecture, and/or any specific task of the neural networks.
  • - various embodiments may be applied to both use cases (i) and (ii) with more clients; - various embodiments may be applied to other use cases; - various embodiments may consider architectures other than CNNs, such as, transformer architecture, recurrent neural network architecture, and the like; and - various embodiments may consider any suitable task for the neural network, such as image classification, object detection, semantic segmentation, anomaly detection, and the like.
  • An example application where reducing the bitrate of weight-updates is important, is the use case of neural network based codecs, such as neural network based video codecs.
  • Video codecs may use one or more neural networks, or a neural network may be used as part of image/video decoding operations, or as part of image/video post-processing operations.
  • NN may be a loop filter or a post-processing filter.
  • the video codec may be a video codec such as the versatile video codec (VVC/H.266) that has been modified to include one or more neural networks.
  • Examples of these neural networks include, but are not limited to: - a neural network filter to be used as one of the in-loop filters of VVC; - a neural network filter to replace all the in-loop filters of VVC; - a neural network filter to be used as a post-processing filter; - a neural network to be used for performing intra-frame prediction; and - a neural network to be used for performing inter-frame prediction.
  • the video codec may comprise a neural network that transforms the input data into a more compressible representation.
  • the new representation may be quantized, losslessly compressed, then losslessly decompressed and dequantized, and then another neural network may transform its input into reconstructed or decoded data.
  • the encoder may finetune the neural network filter by using the ground-truth data (the uncompressed data), which is available at encoder side. Finetuning may be performed in order to improve the neural network filter when applied to the current input data. Finetuning may include running one or more optimization iterations on some or all the learnable weights of the neural network filter.
  • An optimization iteration may include computing gradients of a loss function with respect to some or all the learnable weights of the neural network filter, for example, by using the backpropagation algorithm, and then updating the some or all learnable weights by using an optimizer, such as the stochastic gradient descent optimizer.
  • the loss function may comprise one or more loss terms.
  • An example loss term may be the mean squared error (MSE).
  • Other distortion metrics may be used as the loss terms.
  • the loss function may be computed by providing one or more data to the input of the neural network filter, obtaining one or more corresponding outputs from the neural network filter, and computing a loss term by using the one or more outputs from the neural network filter and one or more ground-truth data.
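As a concrete illustration of such an optimization iteration, the following sketch finetunes a toy one-weight linear "filter": the MSE loss gradient with respect to the weight is computed analytically (standing in for backpropagation), and the weight is updated by plain stochastic gradient descent. The model and data here are hypothetical, not from the disclosure:

```python
# Sketch of finetuning: gradient of an MSE loss w.r.t. a learnable
# weight, followed by an SGD update, repeated for several iterations.

def finetune(w, inputs, ground_truth, lr=0.01, iters=200):
    n = len(inputs)
    for _ in range(iters):
        pred = [w * x for x in inputs]
        # d(MSE)/dw = (2/N) * sum((w*x - t) * x) for this linear model.
        grad = 2 * sum((p - t) * x
                       for p, t, x in zip(pred, ground_truth, inputs)) / n
        w -= lr * grad          # SGD step
    return w

inputs = [1.0, 2.0, 3.0]
ground_truth = [2.0, 4.0, 6.0]   # consistent with an ideal weight of 2
w0 = 0.5                         # weight before finetuning
w1 = finetune(w0, inputs, ground_truth)
weight_update = w1 - w0          # this difference is what gets encoded
```

The quantity `weight_update` corresponds to the weight-update discussed next: the difference between the finetuned and original weights.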
  • The difference between the weights of the finetuned neural network and the weights of the neural network before finetuning is referred to as a weight-update.
  • This weight-update needs to be encoded and used at the decoder side for updating the neural network filter. It is desirable to encode the weight-update such that it requires a small number of bits.
  • Various embodiments herein may be applied to compress these weight-updates.
  • FL has some challenges including, but not limited to: - non-iid data: the distribution of local datasets varies among clients. This fundamental property of the training data distribution is termed non-independent and non-identically distributed data (non-iid data). Non-iid data negatively affects the performance of model training in federated learning; and - communication overhead: sending model updates between the server and distributed devices results in communication overhead, which is a major obstacle for using FL in networks with limited communication bandwidth.
  • Existing approaches for FL are either not efficient in compression and bandwidth usage, or not able to handle non-iid data, or both. There is no systematic and theoretically proven technique to cope with both of the above-mentioned challenges simultaneously.
  • An example method for FL is FedAVG.
  • Weight update compression is an approach to obtain efficient communication in the FL setting. Weight updates are the changes in the model parameters (which are usually weights of deep neural networks) caused by an optimization approach adopted in a distributed device. To perform efficient weight update communication, several algorithms have been introduced. One example method is top-k sparsification, which preserves only the k weight updates with the highest magnitudes (both positive and negative) and discards the rest. This yields a sparse weight update, which reduces the communication bandwidth between clients and the server. However, top-k sparsification removes weight updates solely based on their magnitude, without taking advantage of the structural information of the weight updates, which results in inefficient sparsification.
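Top-k sparsification as described above can be sketched as follows (pure Python, operating on a flat list of weight updates; the data is illustrative):

```python
# Sketch of top-k sparsification: keep only the k weight updates with
# the largest magnitudes (positive or negative), zero out the rest.

def top_k_sparsify(updates, k):
    if k <= 0:
        return [0.0] * len(updates)
    # Magnitude of the k-th largest element serves as the keep-threshold.
    threshold = sorted((abs(u) for u in updates), reverse=True)[k - 1]
    kept, out = 0, []
    for u in updates:
        if abs(u) >= threshold and kept < k:   # kept < k resolves ties
            out.append(u)
            kept += 1
        else:
            out.append(0.0)
    return out

updates = [0.05, -0.9, 0.2, 0.0, 0.7, -0.1]
sparse = top_k_sparsify(updates, k=2)   # keeps only -0.9 and 0.7
```

Note that the selection here looks only at magnitudes, which is exactly the limitation the structural-grouping approach below addresses.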
  • Various embodiments propose an approach to simultaneously tackle sparsification of weight updates for efficient transport and handling the non-iid data distribution in clients.
  • the proposed embodiments: - efficiently compress weight updates, for example, for efficient bandwidth utilization; - effectively handle non-iid data; and/or - can be integrated with any quantization approach, such as sparse ternary or sparse binary, to provide even higher weight update compression.
  • the proposed approach adopts structural information of the weight updates to compress the weight updates as efficiently as possible, while at the same time forcing clients to keep their weight updates as close as possible to each other. Both are obtained via a loss term to be adopted by the clients. It thus improves over FedProx by improving bandwidth utilization.
  • the proposed loss term is orthogonal to other weight update compression techniques, including top-k sparse methods and quantization techniques. In other words, it can work independent of or in conjunction with such methods.
  • different structural groups for elements of a weight update are defined, e.g., a group of convolution layers, a group of filters in a convolution layer, a group of features across multiple filters in a convolution layer, a group of n×n blocks in a fully connected layer, and the like.
  • two levels of sparsity are considered, for example: 1. An intra group sparsity; and 2. An inter group sparsity.
  • Applying intra group sparsity results in sparse weight updates for each group, e.g., most elements of the weight updates of each group are zero.
  • each client receives the weights from the server, updates them using its local data, and sends the weight updates back to the server.
  • the weights may be weight tensors for convolution layers or weight matrices for fully connected layers.
  • Equation 1: min_{ΔW} L(D; W + ΔW) + λ · Σ_{l=1..L} R(ΔW_l), where W denotes the weights received from the server and ΔW denotes the weight updates.
  • one of the goals is to train a convolutional neural network (CNN) with L layers by using an FL framework with a server and several clients. The L layers are parameterized by weight tensors.
  • the objective function is used by each client in the FL framework.
  • each client receives the weights from the server, updates them using its local data, and sends the weight updates back to the server.
  • in Equation 1, D represents the local training data set, L(·) represents the training loss, R(ΔW_l) represents a regularization term for the weight updates at layer l, and λ is a scalar parameter that balances the training loss and the weight update regularization.
  • Equation 2: R(ΔW_l) = Σ_{g ∈ G_l} (Σ_{i ∈ g} |ΔW_{l,i}|)² + Σ_{g ∈ G_l} ||ΔW_{l,g}||₂, where G_l denotes the set of groups defined for layer l; the first term promotes intra-group sparsity and the second term promotes inter-group sparsity.
  • the first term of Equation 2 is the ℓ1,2-norm, a.k.a. exclusive group sparsity, of the weight updates of each group.
  • the exclusive group sparsity, e.g., the ℓ1,2-norm, forces the weight update to be as sparse as possible inside each group (via the ℓ1-norm applied to each weight update), while at the same time ensuring that, in each layer, there will be no update group with all zeros (e.g., each group in a layer will have something to update).
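One illustrative way to sketch the two sparsity levels is an exclusive-sparsity (ℓ1,2-style) term for intra-group sparsity combined with a group-norm (ℓ2,1-style) term for inter-group sparsity; the exact form and weighting used in the disclosure may differ, so this combination is an assumption for illustration:

```python
import math

# Sketch of a two-level group-sparsity regularizer on weight updates.
# Each group is a list of weight-update elements (e.g., one filter).

def exclusive_sparsity(groups):
    # l1,2-style term: (sum of |u|)^2 per group -> drives sparsity
    # *inside* each group while discouraging all-zero groups.
    return sum(sum(abs(u) for u in g) ** 2 for g in groups)

def group_sparsity(groups):
    # l2,1-style term: sum of per-group l2 norms -> drives entire
    # groups to zero (inter-group sparsity).
    return sum(math.sqrt(sum(u * u for u in g)) for g in groups)

def regularizer(groups, mu=1.0):
    # Combined per-layer penalty; mu is an assumed balancing scalar.
    return exclusive_sparsity(groups) + mu * group_sparsity(groups)

groups = [[0.0, 0.5, 0.0], [0.1, 0.1, 0.1]]
r = regularizer(groups)
```

In a training loop, this penalty would be added to the client's training loss, scaled by the parameter that balances loss and regularization.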
  • each weight update vector, e.g., a vector 802, is considered as a group.
  • values of the weight update elements may be represented by shades of a color.
  • a weight update value may vary from ‘0’ to ‘1’.
  • a color palette 806 shows weight values corresponding to different shades of color ‘grey’.
  • a square, e.g., an element, having a darker shade has a higher weight value as compared to a square having a lighter shade.
  • color white indicates a lowest value of a weight, for example, ‘0’.
  • the color palette is an example for representing weights; other representations, for example, actual values, may be used to represent a weight.
  • color ‘grey’ is an example color, other colors, such as color ‘blue’ may be used to generate a color palette.
  • vectors 802, 808, 810, and 812 shown in the left side of the arrow represent the original weight update vectors before applying intra-group sparsification.
  • Vectors 814, 816, 818, and 820 on the right hand side of the arrow represent weight update vectors after applying intra- group sparsification.
  • the weight update vectors are sparse, e.g., each weight update vector has a few nonzero (non-white) elements, while none of the weight update vectors has all elements equal to zero (e.g., there is at least one 'grey' colored element in each weight update vector).
  • the exclusive group sparsity term also addresses the issue of statistical heterogeneity by restricting the local updates to be closer to each other. This is done through the norm-2 of the weight updates which forces the current weights of each client to be close to the weights received from the server. In other words, all clients are forced to keep their weight updates close to the same reference and hence they are forced to keep their weight updates close to each other.
  • FIG. 9 illustrates the effect of the inter-group sparsification term on the weight updates, according to an embodiment. As shown in FIG. 9, the vectors 802, 808, 810, and 812 on the left side of the arrow represent the original weight update vectors before applying inter-group sparsification, and the vectors 814, 816, 818, 820, and 822 on the right side of the arrow represent the weight update vectors after applying inter-group sparsification.
  • Each weight update vector is considered as a group.
  • FIG.10 illustrates an example pseudocode 1000 for efficient federated learning for non- iid data via compressed communication, in accordance with an embodiment.
  • Example outputs of the embodiments are: (i) a binary bitmask that determines updated or non-updated weights, and (ii) the values of the updated weights.
  • the bitmask and the values of the updated weights can be communicated using the following semantic and later decoded by parsing the bitmask and reconstructing relevant data structures.
  • bit_mask_size
    for (i = 0; i < bit_mask_size; i++) {
        bitmask_value[i]
    }
    number_of_weight_updates
    for (i = 0; i < number_of_weight_updates; i++) {
        weight_update_value[i]
    }
  • a bitmask and valid values can be created for each data structure or data element. Further, the dimensions of the data structures are communicated, which, upon receiving the data at the decoder, are used to reconstruct the original matrix. It is also possible to communicate a single large bitmask and valid values, with the indices of each weight matrix in the large matrix signaled.
  • the bitmasks could be encoded with an encoding mechanism such as run-length encoding, Golomb encoding, and the like, to obtain a further compressed bitstream.
  • the weight update values could be further quantized and compressed, e.g., using uniform quantization and CABAC coding.
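The bitmask-plus-values signaling described by the syntax above can be sketched as follows; the run-length code for the bitmask is one simple illustrative choice among the encodings mentioned (run-length, Golomb, and the like), not the disclosure's exact scheme:

```python
# Sketch of bitmask + values signaling for a sparse weight update,
# with a simple run-length code for the bitmask (illustrative only).

def encode(updates):
    bitmask = [1 if u != 0.0 else 0 for u in updates]
    values = [u for u in updates if u != 0.0]
    return bitmask, values

def decode(bitmask, values):
    # Walk the bitmask, consuming one value per set bit.
    it = iter(values)
    return [next(it) if b else 0.0 for b in bitmask]

def run_length(bits):
    # (bit, run) pairs: sparse bitmasks compress to long zero-runs.
    runs, prev, count = [], bits[0], 0
    for b in bits:
        if b == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = b, 1
    runs.append((prev, count))
    return runs

updates = [0.0, -0.9, 0.0, 0.0, 0.7, 0.0]
bitmask, values = encode(updates)
rle = run_length(bitmask)
```

The decoder parses the bitmask first, then places each signaled value at the position of the corresponding set bit, recovering the original update vector exactly.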
  • Additional embodiments: achieving a specific sparsity ratio guarantee. In an embodiment, the proposed approach can guarantee a specific sparsity level when necessary, by applying a threshold after training is done. For example, once the training with the proposed loss function is done, if the desired sparsity ratio is not achieved, a threshold value is calculated from the absolute values of the weight updates.
  • the proposed loss function may be computed based at least on a training set, where the training set may be one or more media units to be encoded (e.g., one or more blocks or frames of a video).
  • the weight updates that have an absolute value below the calculated threshold are set to zero.
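The post-training thresholding step above can be sketched as follows; the convention that the sparsity ratio means the fraction of zeroed updates is an assumption for illustration:

```python
# Sketch: enforce a target sparsity ratio on weight updates after
# training by thresholding absolute values (ratio = fraction of zeros).

def enforce_sparsity(updates, target_ratio):
    n_zero = int(len(updates) * target_ratio)
    if n_zero <= 0:
        return list(updates)
    # Threshold = magnitude of the n_zero-th smallest |update|; every
    # update at or below it is zeroed out.
    mags = sorted(abs(u) for u in updates)
    threshold = mags[n_zero - 1]
    return [0.0 if abs(u) <= threshold else u for u in updates]

updates = [0.05, -0.9, 0.2, 0.01, 0.7, -0.1]
sparse = enforce_sparsity(updates, target_ratio=0.5)
ratio = sum(1 for u in sparse if u == 0.0) / len(sparse)
```

With ties in the magnitudes, this sketch may zero slightly more elements than the target; a production scheme would break ties deterministically.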
  • Neural compression: in another embodiment, the proposed training loss or loss function could be applied in combination with neural network compression techniques to impose both network compression and weight update compression simultaneously.
  • Combination with other weight update compression techniques: the proposed approach could work standalone or could be applied as a pre-processing step in combination with other weight update compression approaches such as top-k sparsification and quantization techniques.
  • FIG. 11 illustrates an example apparatus 1100, which may be implemented in hardware, configured to implement an inter-codec model or architecture for efficient federated learning for non-iid data via compressed communication.
  • the apparatus 1100 comprises at least one processor 1102, at least one non-transitory memory 1104 including computer program code 1105, wherein the at least one memory 1104 and the computer program code 1105 are configured to, with the at least one processor 1102, cause the apparatus to implement mechanisms for federated learning for non independent and non identically distributed data 1106 based on the examples described herein.
  • the apparatus 1100 optionally includes a display 1108 that may be used to display content during rendering.
  • the apparatus 1100 optionally includes one or more network (NW) interfaces (I/F(s)) 1110.
  • the NW I/F(s) 1110 may be wired and/or wireless and communicate over the Internet/other network(s) via any communication technique.
  • the NW I/F(s) 1110 may comprise one or more transmitters and one or more receivers.
  • the NW I/F(s) 1110 may comprise standard well-known components such as an amplifier, filter, frequency converter, (de)modulator, encoder/decoder circuitry(ies), and one or more antennas.
  • the apparatus 1100 may be a remote, virtual or cloud apparatus.
  • the apparatus 1100 may be either a coder or a decoder, or both a coder and a decoder.
  • the at least one memory 1104 may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the at least one memory 1104 may comprise a database for storing data.
  • the apparatus 1100 need not comprise each of the features mentioned, or may comprise other features as well.
  • the apparatus 1100 may correspond to or be another embodiment of the apparatus 50 shown in FIG. 1 and FIG. 2, or any of the apparatuses shown in FIG.3.
  • the apparatus 1100 may correspond to or be another embodiment of the apparatuses shown in FIG.13, including UE 110, RAN node 170, or network element(s) 190.
  • FIG.12 illustrates an example method 1200 for federated learning for non independent and non identically distributed data, in accordance with an embodiment.
  • the method includes defining one or more groups for elements of a weight update.
  • the method includes compressing one or more weight updates based on a plurality of sparsity levels, wherein the one or more weight updates comprise a weight update corresponding to each group of the one or more groups.
  • the method includes wherein compressing the one or more weight updates based on the plurality of sparsity levels comprises applying an intra group sparsity that results in a sparse weight update for each group.
  • the method includes wherein compressing the one or more weight updates based on the plurality of sparsity levels comprises applying an inter group sparsity that results in updating weights for a subset of the one or more groups.
  • 1206 and 1208 are performed simultaneously or substantially simultaneously. In an alternate embodiment, 1206 and 1208 may be performed sequentially.
  • FIG. 13 shows a block diagram of one possible and non-limiting example in which the examples may be practiced.
  • a user equipment (UE) 110, radio access network (RAN) node 170, and network element(s) 190 are illustrated.
  • the user equipment (UE) 110 is in wireless communication with a wireless network 100.
  • a UE is a wireless device that can access the wireless network 100.
  • the UE 110 includes one or more processors 120, one or more memories 125, and one or more transceivers 130 interconnected through one or more buses 127.
  • Each of the one or more transceivers 130 includes a receiver, Rx, 132 and a transmitter, Tx, 133.
  • the one or more buses 127 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, and the like.
  • the one or more transceivers 130 are connected to one or more antennas 128.
  • the one or more memories 125 include computer program code 123.
  • the UE 110 includes a module 140, comprising one of or both parts 140-1 and/or 140-2, which may be implemented in a number of ways.
  • the module 140 may be implemented in hardware as module 140-1, such as being implemented as part of the one or more processors 120.
  • the module 140-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
  • the module 140 may be implemented as module 140-2, which is implemented as computer program code 123 and is executed by the one or more processors 120.
  • the one or more memories 125 and the computer program code 123 may be configured to, with the one or more processors 120, cause the user equipment 110 to perform one or more of the operations as described herein.
  • the UE 110 communicates with RAN node 170 via a wireless link 111.
  • the RAN node 170 in this example is a base station that provides access by wireless devices such as the UE 110 to the wireless network 100.
  • the RAN node 170 may be, for example, a base station for 5G, also called New Radio (NR).
  • the RAN node 170 may be a NG-RAN node, which is defined as either a gNB or an ng-eNB.
  • a gNB is a node providing NR user plane and control plane protocol terminations towards the UE, and connected via the NG interface to a 5GC (such as, for example, the network element(s) 190).
  • the ng-eNB is a node providing E-UTRA user plane and control plane protocol terminations towards the UE, and connected via the NG interface to the 5GC.
  • the NG-RAN node may include multiple gNBs, which may also include a central unit (CU) (gNB-CU) 196 and distributed unit(s) (DUs) (gNB-DUs), of which DU 195 is shown.
  • the DU may include or be coupled to and control a radio unit (RU).
  • the gNB-CU is a logical node hosting radio resource control (RRC), SDAP and PDCP protocols of the gNB or RRC and PDCP protocols of the en-gNB that controls the operation of one or more gNB-DUs.
  • the gNB-CU terminates the F1 interface connected with the gNB-DU.
  • the F1 interface is illustrated as reference 198, although reference 198 also illustrates a link between remote elements of the RAN node 170 and centralized elements of the RAN node 170, such as between the gNB-CU 196 and the gNB-DU 195.
  • the gNB-DU is a logical node hosting RLC, MAC and PHY layers of the gNB or en-gNB, and its operation is partly controlled by gNB-CU.
  • One gNB-CU supports one or multiple cells.
  • One cell is supported by only one gNB-DU.
  • the gNB-DU terminates the F1 interface 198 connected with the gNB-CU.
  • the DU 195 is considered to include the transceiver 160, for example, as part of a RU, but some examples of this may have the transceiver 160 as part of a separate RU, for example, under control of and connected to the DU 195.
  • the RAN node 170 may also be an eNB (evolved NodeB) base station, for LTE (long term evolution), or any other suitable base station or node.
  • the RAN node 170 includes one or more processors 152, one or more memories 155, one or more network interfaces (N/W I/F(s)) 161, and one or more transceivers 160 interconnected through one or more buses 157.
  • Each of the one or more transceivers 160 includes a receiver, Rx, 162 and a transmitter, Tx, 163.
  • the one or more transceivers 160 are connected to one or more antennas 158.
  • the one or more memories 155 include computer program code 153.
  • the CU 196 may include the processor(s) 152, memories 155, and network interfaces 161.
  • the DU 195 may also include its own memory/memories and processor(s), and/or other hardware, but these are not shown.
  • the RAN node 170 includes a module 150, comprising one of or both parts 150-1 and/or 150-2, which may be implemented in a number of ways.
  • the module 150 may be implemented in hardware as module 150-1, such as being implemented as part of the one or more processors 152.
  • the module 150-1 may be implemented also as an integrated circuit or through other hardware such as a programmable gate array.
  • the module 150 may be implemented as module 150-2, which is implemented as computer program code 153 and is executed by the one or more processors 152.
  • the one or more memories 155 and the computer program code 153 are configured to, with the one or more processors 152, cause the RAN node 170 to perform one or more of the operations as described herein.
  • the functionality of the module 150 may be distributed, such as being distributed between the DU 195 and the CU 196, or be implemented solely in the DU 195.
  • the one or more network interfaces 161 communicate over a network such as via the links 176 and 131.
  • Two or more gNBs 170 may communicate using, for example, link 176.
  • the link 176 may be wired or wireless or both and may implement, for example, an Xn interface for 5G, an X2 interface for LTE, or other suitable interface for other standards.
  • the one or more buses 157 may be address, data, or control buses, and may include any interconnection mechanism, such as a series of lines on a motherboard or integrated circuit, fiber optics or other optical communication equipment, wireless channels, and the like.
  • the one or more transceivers 160 may be implemented as a remote radio head (RRH) 195 for LTE or a distributed unit (DU) 195 for gNB implementation for 5G, with the other elements of the RAN node 170 possibly being physically in a different location from the RRH/DU, and the one or more buses 157 could be implemented in part as, for example, fiber optic cable or other suitable network connection to connect the other elements (for example, a central unit (CU), gNB-CU) of the RAN node 170 to the RRH/DU 195.
  • Reference 198 also indicates those suitable network link(s).
  • the cell makes up part of a base station. That is, there can be multiple cells per base station. For example, there could be three cells for a single carrier frequency and associated bandwidth, each cell covering one-third of a 360 degree area so that the single base station’s coverage area covers an approximate oval or circle. Furthermore, each cell can correspond to a single carrier and a base station may use multiple carriers. So when there are three 120 degree cells per carrier and two carriers, then the base station has a total of 6 cells.
  • the wireless network 100 may include a network element or elements 190 that may include core network functionality, and which provides connectivity via a link or links 181 with a further network, such as a telephone network and/or a data communications network (for example, the Internet).
  • Such core network functionality for 5G may include access and mobility management function(s) (AMF(S)) and/or user plane functions (UPF(s)) and/or session management function(s) (SMF(s)).
  • LTE Long Term Evolution
  • MME Mobility Management Entity
  • SGW Serving Gateway
  • the network element 190 includes one or more processors 175, one or more memories 171, and one or more network interfaces (N/W I/F(s)) 180, interconnected through one or more buses 185.
  • the one or more memories 171 include computer program code 173.
  • the one or more memories 171 and the computer program code 173 are configured to, with the one or more processors 175, cause the network element 190 to perform one or more operations.
  • the wireless network 100 may implement network virtualization, which is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network. Network virtualization involves platform virtualization, often combined with resource virtualization.
  • Network virtualization is categorized as either external, combining many networks, or parts of networks, into a virtual unit, or internal, providing network-like functionality to software containers on a single system. Note that the virtualized entities that result from the network virtualization are still implemented, at some level, using hardware such as processors 152 or 175 and memories 155 and 171, and also such virtualized entities create technical effects.
  • the computer readable memories 125, 155, and 171 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, flash memory, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the computer readable memories 125, 155, and 171 may be means for performing storage functions.
  • the processors 120, 152, and 175 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on a multi-core processor architecture, as non- limiting examples.
  • the processors 120, 152, and 175 may be means for performing functions, such as controlling the UE 110, RAN node 170, network element(s) 190, and other functions as described herein.
  • the various embodiments of the user equipment 110 can include, but are not limited to, cellular telephones such as smart phones, tablets, personal digital assistants (PDAs) having wireless communication capabilities, portable computers having wireless communication capabilities, image capture devices such as digital cameras having wireless communication capabilities, gaming devices having wireless communication capabilities, music storage and playback appliances having wireless communication capabilities, Internet appliances permitting wireless Internet access and browsing, tablets with wireless communication capabilities, as well as portable units or terminals that incorporate combinations of such functions.
  • modules 140-1, 140-2, 150-1, and 150-2 may be configured to implement mechanisms for federated learning for non-independent and non-identically distributed data.
  • Computer program code 173 may also be configured to implement mechanisms for federated learning for non-independent and non-identically distributed data.
  • FIG. 12 includes a flowchart of an apparatus (e.g. 50, 100, 604, 700, or 1100), method, and computer program product according to certain example embodiments. It will be understood that each block of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by various means, such as hardware, firmware, processor, circuitry, and/or other devices associated with execution of software including one or more computer program instructions. For example, one or more of the procedures described above may be embodied by computer program instructions.
  • the computer program instructions which embody the procedures described above may be stored by a memory (e.g., 58, 125, 704, or 1104) of an apparatus employing an embodiment of the present invention and executed by processing circuitry (e.g., 56, 120, 702 or 1102) of the apparatus.
  • any such computer program instructions may be loaded onto a computer or other programmable apparatus (e.g., hardware) to produce a machine, such that the resulting computer or other programmable apparatus implements the functions specified in the flowchart blocks.
  • These computer program instructions may also be stored in a computer-readable memory that may direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture, the execution of which implements the function specified in the flowchart blocks.
  • the computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operations to be performed on the computer or other programmable apparatus to produce a computer-implemented process such that the instructions which execute on the computer or other programmable apparatus provide operations for implementing the functions specified in the flowchart blocks.
  • a computer program product is therefore defined in those instances in which the computer program instructions, such as computer-readable program code portions, are stored by at least one non-transitory computer-readable storage medium with the computer program instructions, such as the computer-readable program code portions, being configured, upon execution, to perform the functions described above, such as in conjunction with the flowchart of FIG. 12.
  • the computer program instructions, such as the computer-readable program code portions need not be stored or otherwise embodied by a non-transitory computer-readable storage medium, but may, instead, be embodied by a transitory medium with the computer program instructions, such as the computer-readable program code portions, still being configured, upon execution, to perform the functions described above.
  • blocks of the flowcharts support combinations of means for performing the specified functions and combinations of operations for performing the specified functions. It will also be understood that one or more blocks of the flowcharts, and combinations of blocks in the flowcharts, may be implemented by special purpose hardware-based computer systems which perform the specified functions, or combinations of special purpose hardware and computer instructions.
  • In some embodiments, certain ones of the operations above may be modified or further amplified. Furthermore, in some embodiments, additional optional operations may be included. Modifications, additions, or amplifications to the operations above may be performed in any order and in any combination.
  • In the above, some example embodiments have been described with reference to an SEI message or an SEI NAL unit.
  • references to a ‘computer’, ‘processor’, etc. should be understood to encompass not only computers having different architectures such as single/multi-processor architectures and sequential (Von Neumann)/parallel architectures but also specialized circuits such as field-programmable gate arrays (FPGA), application specific circuits (ASIC), signal processing devices and other processing circuitry. References to computer program, instructions, code etc.
  • circuitry may refer to any of the following: (a) hardware circuit implementations, such as implementations in analog and/or digital circuitry, and (b) combinations of circuits and software (and/or firmware), such as (as applicable): (i) a combination of processor(s) or (ii) portions of processor(s)/software including digital signal processor(s), software, and memory(ies) that work together to cause an apparatus to perform various functions, and (c) circuits, such as a microprocessor(s) or a portion of a microprocessor(s), that require software or firmware for operation, even when the software or firmware is not physically present.
  • circuitry applies to uses of this term in this application.
  • circuitry would also cover an implementation of merely a processor (or multiple processors) or a portion of a processor and its (or their) accompanying software and/or firmware.
  • circuitry would also cover, for example, and when applicable to the particular element, a baseband integrated circuit or applications processor integrated circuit for a mobile phone or a similar integrated circuit in a server, a cellular network device, or another network device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Various embodiments relate to an apparatus, a method, and a computer program product. A method comprises: defining one or more groups for elements of a weight update; compressing one or more weight updates based on a plurality of sparsity levels, the one or more weight updates comprising a weight update corresponding to each group of the one or more groups; wherein compressing the one or more weight updates based on the plurality of sparsity levels comprises: applying an intra-group sparsity that results in a sparse weight update for each group; and applying an inter-group sparsity that results in updating weights for a subset of the one or more groups.
PCT/IB2022/055719 2021-06-22 2022-06-20 Method, apparatus and computer program product for federated learning of non-independent and non-identically distributed data WO2022269469A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163202720P 2021-06-22 2021-06-22
US63/202,720 2021-06-22

Publications (1)

Publication Number Publication Date
WO2022269469A1 true WO2022269469A1 (fr) 2022-12-29

Family

ID=82385281

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2022/055719 WO2022269469A1 (fr) 2022-06-20 Method, apparatus and computer program product for federated learning of non-independent and non-identically distributed data

Country Status (1)

Country Link
WO (1) WO2022269469A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341689A (zh) * 2023-03-22 2023-06-27 深圳大学 Training method and apparatus for a machine learning model, electronic device, and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200311551A1 (en) * 2019-03-25 2020-10-01 Nokia Technologies Oy Compressing weight updates for decoder-side neural networks
US20210065002A1 (en) * 2018-05-17 2021-03-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for distributed learning of neural networks and/or transmission of parameterization updates therefor

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210065002A1 (en) * 2018-05-17 2021-03-04 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Concepts for distributed learning of neural networks and/or transmission of parameterization updates therefor
US20200311551A1 (en) * 2019-03-25 2020-10-01 Nokia Technologies Oy Compressing weight updates for decoder-side neural networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
AMIRHOSSEIN MALEKIJOO ET AL: "FEDZIP: A Compression Framework for Communication-Efficient Federated Learning", arXiv.org, Cornell University Library, 201 Olin Library, Cornell University, Ithaca, NY 14853, 2 February 2021 (2021-02-02), XP081871550 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116341689A (zh) * 2023-03-22 2023-06-27 深圳大学 Training method and apparatus for a machine learning model, electronic device, and storage medium
CN116341689B (zh) * 2023-03-22 2024-02-06 深圳大学 Training method and apparatus for a machine learning model, electronic device, and storage medium

Similar Documents

Publication Publication Date Title
US20220256227A1 (en) High-level syntax for signaling neural networks within a media bitstream
US20230217028A1 (en) Guided probability model for compressed representation of neural networks
US20230269387A1 (en) Apparatus, method and computer program product for optimizing parameters of a compressed representation of a neural network
US20210104076A1 (en) Guiding Decoder-Side Optimization of Neural Network Filter
CN117121480A (zh) 用于在媒体比特流内用信号通知神经网络的高级语法
US11849113B2 (en) Quantization constrained neural image coding
US11528498B2 (en) Alpha channel prediction
WO2023135518A1 (fr) High-level syntax of predictive residual coding in neural network compression
WO2022269415A1 (fr) Method, apparatus and computer program product for providing an attention block for neural-network-based image and video compression
WO2022238967A1 (fr) Method, apparatus and computer program product for providing a fine-tuned neural network
US20230325639A1 (en) Apparatus and method for joint training of multiple neural networks
WO2022269469A1 (fr) Method, apparatus and computer program product for federated learning of non-independent and non-identically distributed data
US20230325644A1 (en) Implementation Aspects Of Predictive Residual Encoding In Neural Networks Compression
CN114746870A (zh) 用于神经网络压缩中优先级信令的高级语法
US20230196072A1 (en) Iterative overfitting and freezing of decoder-side neural networks
US20220335269A1 (en) Compression Framework for Distributed or Federated Learning with Predictive Compression Paradigm
US20240013046A1 (en) Apparatus, method and computer program product for learned video coding for machine
WO2022224069A1 (fr) Syntax and semantics for weight update compression of neural networks
WO2022224113A1 (fr) Method, apparatus and computer program product for providing a fine-tuned neural network filter
US9756331B1 (en) Advance coded reference prediction
US20230209092A1 (en) High level syntax and carriage for compressed representation of neural networks
US20230169372A1 (en) Appratus, method and computer program product for probability model overfitting
US20240195969A1 (en) Syntax and semantics for weight update compression of neural networks
US20240146938A1 (en) Method, apparatus and computer program product for end-to-end learned predictive coding of media frames
US20230412806A1 (en) Apparatus, method and computer program product for quantizing neural networks

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22737533

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE