WO2022167547A1 - Dynamic feature size adaptation in splitable deep neural networks - Google Patents


Publication number
WO2022167547A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
dnn
compression factor
dnn model
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/EP2022/052633
Other languages
English (en)
French (fr)
Inventor
Suresh Kirthi Kumaraswamy
Quang Khanh Ngoc DUONG
Alexey Ozerov
Patrick Fontaine
Francois Schnitzler
Anne Lambert
Ghyslain Pelletier
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
InterDigital CE Patent Holdings SAS
Original Assignee
InterDigital CE Patent Holdings SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by InterDigital CE Patent Holdings SAS
Priority to US18/272,714 (US20240311621A1)
Priority to EP22707038.0A (EP4288907A1)
Priority to JP2023544040A (JP2024509670A)
Priority to CN202280013234.2A (CN116940946A)
Publication of WO2022167547A1


Classifications

    • G PHYSICS
      • G06 COMPUTING OR CALCULATING; COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
                • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                  • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
                • G06N3/0464 Convolutional networks [CNN, ConvNet]
                • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
                • G06N3/048 Activation functions
              • G06N3/08 Learning methods
                • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
                • G06N3/084 Backpropagation, e.g. using gradient descent
                • G06N3/088 Non-supervised learning, e.g. competitive learning
                • G06N3/09 Supervised learning
                • G06N3/096 Transfer learning
                • G06N3/098 Distributed learning, e.g. federated learning
    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L67/00 Network arrangements or protocols for supporting network services or applications
            • H04L67/01 Protocols
              • H04L67/10 Protocols in which an application is distributed across nodes in the network
                • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
                  • H04L67/1004 Server selection for load balancing
                    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
      • H03 ELECTRONIC CIRCUITRY
        • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
          • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
            • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
              • H03M7/60 General implementation details not specific to a particular type of compression
                • H03M7/6041 Compression optimized for errors
                • H03M7/6064 Selection of Compressor
                  • H03M7/6076 Selection between compressors of the same type

Definitions

  • the present embodiments generally relate to dynamic feature size adaptation in splitable Deep Neural Network (DNN).
  • DNN Deep Neural Network
  • a device comprising: a Wireless Transmit/Receive Unit (WTRU), comprising: a receiver configured to receive a part of a Deep Neural Network (DNN) model, wherein said part is before a split point of said DNN model, and wherein said part of said DNN model includes a neural network to compress feature at said split point of said DNN model; one or more processors configured to: obtain a compression factor for said neural network, determine which nodes in said neural network are to be connected responsive to said compression factor, configure said neural network responsive to said determining, and perform inference with said part of said DNN model to generate compressed feature; and a transmitter configured to transmit said compressed feature to another WTRU.
  • WTRU Wireless Transmit/Receive Unit
  • a device comprising: a Wireless Transmit/Receive Unit (WTRU), comprising: a receiver configured to receive a part of a Deep Neural Network (DNN) model, wherein said part is after a split point of said DNN model, and wherein said part of said DNN model includes a neural network to expand feature at said split point of said DNN model, wherein said receiver is also configured to receive one or more features output from another WTRU; and one or more processors configured to: obtain a compression factor for said neural network, determine which nodes in said neural network are to be connected responsive to said compression factor, configure said neural network responsive to said determining, and perform inference with said part of said DNN model, using said one or more features output from another WTRU as input to said neural network.
  • a method comprising: a method performed by a Wireless Transmit/Receive Unit (WTRU), the method comprising: receiving a part of a Deep Neural Network (DNN) model, wherein said part is before a split point of said DNN model, and wherein said part of said DNN model includes a neural network to compress feature at said split point of said DNN model; obtaining a compression factor for said neural network; determining which nodes in said neural network are to be connected responsive to said compression factor; configuring said neural network responsive to said determining; performing inference with said part of said DNN model to generate compressed feature; and transmitting said compressed feature to another WTRU.
  • a method comprising: receiving a part of a Deep Neural Network (DNN) model, wherein said part is after a split point of said DNN model, and wherein said part of said DNN model includes a neural network to expand feature at said split point of said DNN model; receiving one or more features output from another WTRU; obtaining a compression factor for said neural network; determining which nodes in said neural network are to be connected responsive to said compression factor; configuring said neural network responsive to said determining; and performing inference with said part of said DNN model, using said one or more features output from another WTRU as input to said neural network.
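The node-selection step recited above (obtain a compression factor, determine which nodes are connected, configure the network, run inference) can be illustrated with a minimal sketch. It assumes, purely for illustration, that the compressing and expanding networks are each a single fully connected layer and that a compression factor k keeps the first `size // k` nodes connected; the text above does not specify this rule. The names `connected_nodes`, `compress_feature`, and `expand_feature` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def connected_nodes(feature_size: int, compression_factor: int) -> List[int]:
    """Return the indices of the output nodes left connected.

    Assumption (not from the source): a factor k keeps the first
    feature_size // k nodes, and at least one node is always kept.
    """
    if compression_factor < 1:
        raise ValueError("compression factor must be >= 1")
    kept = max(1, feature_size // compression_factor)
    return list(range(kept))


def compress_feature(feature: List[float], weights: Matrix,
                     compression_factor: int) -> List[float]:
    """Sender side: evaluate only the connected output nodes."""
    nodes = connected_nodes(len(weights), compression_factor)
    return [sum(w * x for w, x in zip(weights[i], feature)) for i in nodes]


def expand_feature(compressed: List[float], weights: Matrix) -> List[float]:
    """Receiver side: use only the input connections that carry data."""
    n_in = len(compressed)
    return [sum(row[j] * compressed[j] for j in range(n_in)) for row in weights]


# Identity weights make the effect of the node selection easy to see.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
feature = [1.0, 2.0, 3.0, 4.0]

half = compress_feature(feature, identity, compression_factor=2)
print(half)                            # [1.0, 2.0]
print(expand_feature(half, identity))  # [1.0, 2.0, 0.0, 0.0]
```

In this sketch both sides derive the same connection pattern from the shared compression factor, so only the compressed feature itself needs to be transmitted. A trained network would use learned weights rather than the identity matrix used here for readability.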
  • FIG. 1A is a system diagram illustrating an example communications system in which one or more disclosed embodiments may be implemented.
  • FIG. 1B is a system diagram illustrating an example wireless transmit/receive unit (WTRU) that may be used within the communications system illustrated in FIG. 1A according to an embodiment.
  • WTRU wireless transmit/receive unit
  • FIG. 2 illustrates a mechanism for a distributed AI between two devices without feature size compression.
  • FIGs. 3A, 3B and 3C illustrate a DNN with one, two, and three candidate splits for feature compression, respectively.
  • FIG. 4 illustrates a DNN with a single split for feature compression.
  • FIG. 5A illustrates a feature size compression mechanism for a distributed AI between two devices, Device-1 and Device-2, using a bandwidth-reducer (BWR) and bandwidth-expander (BWE), where a single compression factor is supported.
  • FIG. 5B illustrates a feature size compression mechanism where multiple compression factors are supported.
  • FIG. 6A illustrates the total inference latency without the BWR and BWE.
  • FIG. 6B illustrates the total inference latency with the BWR and BWE, where the size of the intermediate data may be reduced.
  • FIG. 7 illustrates a process to dynamically switch between splits and compression factor (CF) configurations, according to an embodiment.
  • FIG. 8A shows Devices 1 and 2 estimating their compute capability and the transmission channel.
  • FIG. 8B illustrates the reception of the AI/ML model from each of the devices.
  • FIG. 8C illustrates the inference time operations of the devices.
  • FIG. 9 illustrates a method with a single split in a DNN for adaptive feature compression, according to an embodiment.
  • FIG. 10 illustrates an example DySw capable of reducing and expanding an input of size 4.
  • FIG. 11 illustrates the connections of DySw configurations shown in FIG. 9.
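The latency comparison of FIGs. 6A-6B and the switching between splits and compression factors of FIG. 7 can be sketched as a selection over candidate configurations. This is a minimal sketch under assumed inputs: the latency model (compute time on each device plus transmitted feature size divided by channel throughput), the `Config` fields, and all numeric values are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Config:
    split: int                # index of the candidate split point
    compression_factor: int   # CF applied by the BWR at that split
    feature_bits: int         # uncompressed feature size at the split
    t_device1_s: float        # compute time before the split, in seconds
    t_device2_s: float        # compute time after the split, in seconds


def total_latency(cfg: Config, throughput_bps: float) -> float:
    """Assumed model: compute on both devices plus transmission time."""
    sent_bits = cfg.feature_bits / cfg.compression_factor
    return cfg.t_device1_s + sent_bits / throughput_bps + cfg.t_device2_s


def pick_config(configs: Iterable[Config], throughput_bps: float) -> Config:
    """Choose the configuration with the lowest estimated latency."""
    return min(configs, key=lambda c: total_latency(c, throughput_bps))


# Hypothetical numbers: the compressed variant costs slightly more compute
# (BWR on Device-1, BWE on Device-2) but sends a quarter of the data.
candidates = [
    Config(split=1, compression_factor=1, feature_bits=8_000_000,
           t_device1_s=0.010, t_device2_s=0.020),
    Config(split=1, compression_factor=4, feature_bits=8_000_000,
           t_device1_s=0.012, t_device2_s=0.022),
]

fast = pick_config(candidates, throughput_bps=1e10)  # good channel
slow = pick_config(candidates, throughput_bps=1e7)   # poor channel
print(fast.compression_factor, slow.compression_factor)  # 1 4
```

With these assumed values, compression is skipped on the fast channel because its extra compute outweighs the transmission saving, and it is selected on the slow channel where transmission dominates. Re-running the selection whenever the throughput or compute estimates change gives a simple form of dynamic switching.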
  • FIG. 1A is a diagram illustrating an example communications system 100 in which one or more disclosed embodiments may be implemented.
  • the communications system 100 may be a multiple access system that provides content, such as voice, data, video, messaging, broadcast, etc., to multiple wireless users.
  • the communications system 100 may enable multiple wireless users to access such content through the sharing of system resources, including wireless bandwidth.
  • the communications system 100 may employ one or more channel access methods, such as code division multiple access (CDMA), time division multiple access (TDMA), frequency division multiple access (FDMA), orthogonal FDMA (OFDMA), single-carrier FDMA (SC-FDMA), zero-tail unique-word DFT-Spread OFDM (ZT UW DTS-s OFDM), unique word OFDM (UW-OFDM), resource block-filtered OFDM, filter bank multicarrier (FBMC), and the like.
  • CDMA code division multiple access
  • TDMA time division multiple access
  • FDMA frequency division multiple access
  • OFDMA orthogonal FDMA
  • SC-FDMA single-carrier FDMA
  • ZT UW DTS-s OFDM zero-tail unique-word DFT-Spread OFDM
  • UW-OFDM unique word OFDM
  • FBMC filter bank multicarrier
  • the communications system 100 may include wireless transmit/receive units (WTRUs) 102a, 102b, 102c, 102d, a RAN 104, a CN 106, a public switched telephone network (PSTN) 108, the Internet 110, and other networks 112, though it will be appreciated that the disclosed embodiments contemplate any number of WTRUs, base stations, networks, and/or network elements.
  • WTRUs 102a, 102b, 102c, 102d may be any type of device configured to operate and/or communicate in a wireless environment.
  • the WTRUs 102a, 102b, 102c, 102d may be configured to transmit and/or receive wireless signals and may include a user equipment (UE), a mobile station, a fixed or mobile subscriber unit, a subscription-based unit, a pager, a cellular telephone, a personal digital assistant (PDA), a smartphone, a laptop, a netbook, a personal computer, a wireless sensor, a hotspot or Mi-Fi device, an Internet of Things (IoT) device, a watch or other wearable, a head-mounted display (HMD), a vehicle, a drone, a medical device and applications (e.g., remote surgery), an industrial device and applications (e.g., a robot and/or other wireless devices operating in an industrial and/or an automated processing chain contexts), a consumer electronics device, a device operating on commercial and/or industrial wireless networks, and the like.
  • UE user equipment
  • PDA personal digital assistant
  • HMD head-mounted display
  • the communications system 100 may also include a base station 114a and/or a base station 114b.
  • Each of the base stations 114a, 114b may be any type of device configured to wirelessly interface with at least one of the WTRUs 102a, 102b, 102c, 102d to facilitate access to one or more communication networks, such as the CN 106, the Internet 110, and/or the other networks 112.
  • the base stations 114a, 114b may be a base transceiver station (BTS), a Node-B, an eNode B, a Home Node B, a Home eNode B, a gNB, a NR NodeB, a site controller, an access point (AP), a wireless router, and the like. While the base stations 114a, 114b are each depicted as a single element, it will be appreciated that the base stations 114a, 114b may include any number of interconnected base stations and/or network elements.
  • the base station 114a may be part of the RAN 104, which may also include other base stations and/or network elements (not shown), such as a base station controller (BSC), a radio network controller (RNC), relay nodes, etc.
  • BSC base station controller
  • RNC radio network controller
  • the base station 114a and/or the base station 114b may be configured to transmit and/or receive wireless signals on one or more carrier frequencies, which may be referred to as a cell (not shown). These frequencies may be in licensed spectrum, unlicensed spectrum, or a combination of licensed and unlicensed spectrum.
  • a cell may provide coverage for a wireless service to a specific geographical area that may be relatively fixed or that may change over time.
  • the cell may further be divided into cell sectors. For example, the cell associated with the base station 114a may be divided into three sectors.
  • the base station 114a may include three transceivers, i.e., one for each sector of the cell.
  • the base station 114a may employ multiple-input multiple output (MIMO) technology and may utilize multiple transceivers for each sector of the cell.
  • MIMO multiple-input multiple output
  • beamforming may be used to transmit and/or receive signals in desired spatial directions.
  • the base stations 114a, 114b may communicate with one or more of the WTRUs 102a, 102b, 102c, 102d over an air interface 116, which may be any suitable wireless communication link (e.g., radio frequency (RF), microwave, centimeter wave, micrometer wave, infrared (IR), ultraviolet (UV), visible light, etc.).
  • the air interface 116 may be established using any suitable radio access technology (RAT).
  • RAT radio access technology
  • the communications system 100 may be a multiple access system and may employ one or more channel access schemes, such as CDMA, TDMA, FDMA, OFDMA, SC-FDMA, and the like.
  • the base station 114a in the RAN 104 and the WTRUs 102a, 102b, 102c may implement a radio technology such as Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access (UTRA), which may establish the air interface 116 using wideband CDMA (WCDMA).
  • WCDMA may include communication protocols such as High-Speed Packet Access (HSPA) and/or Evolved HSPA (HSPA+).
  • HSPA may include High-Speed Downlink (DL) Packet Access (HSDPA) and/or High-Speed UL Packet Access (HSUPA).
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as Evolved UMTS Terrestrial Radio Access (E-UTRA), which may establish the air interface 116 using Long Term Evolution (LTE) and/or LTE-Advanced (LTE-A) and/or LTE-Advanced Pro (LTE-A Pro).
  • E-UTRA Evolved UMTS Terrestrial Radio Access
  • LTE Long Term Evolution
  • LTE-A LTE-Advanced
  • LTE-A Pro LTE-Advanced Pro
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement a radio technology such as NR Radio Access, which may establish the air interface 116 using New Radio (NR).
  • NR New Radio
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement multiple radio access technologies.
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement LTE radio access and NR radio access together, for instance using dual connectivity (DC) principles.
  • DC dual connectivity
  • the air interface utilized by WTRUs 102a, 102b, 102c may be characterized by multiple types of radio access technologies and/or transmissions sent to/from multiple types of base stations (e.g., an eNB and a gNB).
  • the base station 114a and the WTRUs 102a, 102b, 102c may implement radio technologies such as IEEE 802.11 (i.e., Wireless Fidelity (WiFi)), IEEE 802.16 (i.e., Worldwide Interoperability for Microwave Access (WiMAX)), CDMA2000, CDMA2000 1X, CDMA2000 EV-DO, Interim Standard 2000 (IS-2000), Interim Standard 95 (IS-95), Interim Standard 856 (IS-856), Global System for Mobile communications (GSM), Enhanced Data rates for GSM Evolution (EDGE), GSM EDGE (GERAN), and the like.
  • IEEE 802.11 i.e., Wireless Fidelity (WiFi)
  • IEEE 802.16 i.e., Worldwide Interoperability for Microwave Access (WiMAX)
  • CDMA2000, CDMA2000 1X, CDMA2000 EV-DO Code Division Multiple Access 2000
  • IS-2000 Interim Standard 2000
  • IS-856 Interim Standard 856
  • GSM Global System for Mobile communications
  • the base station 114b in FIG. 1A may be a wireless router, Home Node B, Home eNode B, or access point, for example, and may utilize any suitable RAT for facilitating wireless connectivity in a localized area, such as a place of business, a home, a vehicle, a campus, an industrial facility, an air corridor (e.g., for use by drones), a roadway, and the like.
  • the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.11 to establish a wireless local area network (WLAN).
  • WLAN wireless local area network
  • the base station 114b and the WTRUs 102c, 102d may implement a radio technology such as IEEE 802.15 to establish a wireless personal area network (WPAN).
  • the base station 114b and the WTRUs 102c, 102d may utilize a cellular-based RAT (e.g., WCDMA, CDMA2000, GSM, LTE, LTE-A, LTE-A Pro, NR, etc.) to establish a picocell or femtocell.
  • the base station 114b may have a direct connection to the Internet 110.
  • the base station 114b may not be required to access the Internet 110 via the CN 106.
  • the RAN 104 may be in communication with the CN 106, which may be any type of network configured to provide voice, data, applications, and/or voice over internet protocol (VoIP) services to one or more of the WTRUs 102a, 102b, 102c, 102d.
  • the data may have varying quality of service (QoS) requirements, such as differing throughput requirements, latency requirements, error tolerance requirements, reliability requirements, data throughput requirements, mobility requirements, and the like.
  • QoS quality of service
  • the CN 106 may provide call control, billing services, mobile location-based services, pre-paid calling, Internet connectivity, video distribution, etc., and/or perform high-level security functions, such as user authentication.
  • the RAN 104 and/or the CN 106 may be in direct or indirect communication with other RANs that employ the same RAT as the RAN 104 or a different RAT.
  • the CN 106 may also be in communication with another RAN (not shown) employing a GSM, UMTS, CDMA 2000, WiMAX, E-UTRA, or WiFi radio technology.
  • the CN 106 may also serve as a gateway for the WTRUs 102a, 102b, 102c, 102d to access the PSTN 108, the Internet 110, and/or the other networks 112.
  • the PSTN 108 may include circuit-switched telephone networks that provide plain old telephone service (POTS).
  • POTS plain old telephone service
  • the Internet 110 may include a global system of interconnected computer networks and devices that use common communication protocols, such as the transmission control protocol (TCP), user datagram protocol (UDP) and/or the internet protocol (IP) in the TCP/IP internet protocol suite.
  • the networks 112 may include wired and/or wireless communications networks owned and/or operated by other service providers.
  • the networks 112 may include another CN connected to one or more RANs, which may employ the same RAT as the RAN 104 or a different RAT.
  • Some or all of the WTRUs 102a, 102b, 102c, 102d in the communications system 100 may include multi-mode capabilities (e.g., the WTRUs 102a, 102b, 102c, 102d may include multiple transceivers for communicating with different wireless networks over different wireless links).
  • the WTRU 102c shown in FIG. 1A may be configured to communicate with the base station 114a, which may employ a cellular-based radio technology, and with the base station 114b, which may employ an IEEE 802 radio technology.
  • FIG. 1B is a system diagram illustrating an example WTRU 102.
  • the WTRU 102 may include a processor 118, a transceiver 120, a transmit/receive element 122, a speaker/microphone 124, a keypad 126, a display/touchpad 128, non-removable memory 130, removable memory 132, a power source 134, a global positioning system (GPS) chipset 136, and/or other peripherals 138, among others.
  • GPS global positioning system
  • the processor 118 may be a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), a state machine, and the like.
  • the processor 118 may perform signal coding, data processing, power control, input/output processing, and/or any other functionality that enables the WTRU 102 to operate in a wireless environment.
  • the processor 118 may be coupled to the transceiver 120, which may be coupled to the transmit/receive element 122. While FIG. 1B depicts the processor 118 and the transceiver 120 as separate components, it will be appreciated that the processor 118 and the transceiver 120 may be integrated together in an electronic package or chip.
  • the transmit/receive element 122 may be configured to transmit signals to, or receive signals from, a base station (e.g., the base station 114a) over the air interface 116.
  • the transmit/receive element 122 may be an antenna configured to transmit and/or receive RF signals.
  • the transmit/receive element 122 may be an emitter/detector configured to transmit and/or receive IR, UV, or visible light signals, for example.
  • the transmit/receive element 122 may be configured to transmit and/or receive both RF and light signals. It will be appreciated that the transmit/receive element 122 may be configured to transmit and/or receive any combination of wireless signals.
  • the WTRU 102 may include any number of transmit/receive elements 122. More specifically, the WTRU 102 may employ MIMO technology. Thus, in one embodiment, the WTRU 102 may include two or more transmit/receive elements 122 (e.g., multiple antennas) for transmitting and receiving wireless signals over the air interface 116.
  • the transceiver 120 may be configured to modulate the signals that are to be transmitted by the transmit/receive element 122 and to demodulate the signals that are received by the transmit/receive element 122.
  • the WTRU 102 may have multi-mode capabilities.
  • the transceiver 120 may include multiple transceivers for enabling the WTRU 102 to communicate via multiple RATs, such as NR and IEEE 802.11, for example.
  • the processor 118 of the WTRU 102 may be coupled to, and may receive user input data from, the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128 (e.g., a liquid crystal display (LCD) display unit or organic light-emitting diode (OLED) display unit).
  • the processor 118 may also output user data to the speaker/microphone 124, the keypad 126, and/or the display/touchpad 128.
  • the processor 118 may access information from, and store data in, any type of suitable memory, such as the non-removable memory 130 and/or the removable memory 132.
  • the non-removable memory 130 may include random-access memory (RAM), read-only memory (ROM), a hard disk, or any other type of memory storage device.
  • the removable memory 132 may include a subscriber identity module (SIM) card, a memory stick, a secure digital (SD) memory card, and the like.
  • SIM subscriber identity module
  • SD secure digital
  • the processor 118 may access information from, and store data in, memory that is not physically located on the WTRU 102, such as on a server or a home computer (not shown).
  • the processor 118 may receive power from the power source 134, and may be configured to distribute and/or control the power to the other components in the WTRU 102.
  • the power source 134 may be any suitable device for powering the WTRU 102.
  • the power source 134 may include one or more dry cell batteries (e.g., nickel-cadmium (NiCd), nickel-zinc (NiZn), nickel metal hydride (NiMH), lithium-ion (Li-ion), etc.), solar cells, fuel cells, and the like.
  • the processor 118 may also be coupled to the GPS chipset 136, which may be configured to provide location information (e.g., longitude and latitude) regarding the current location of the WTRU 102.
  • the WTRU 102 may receive location information over the air interface 116 from a base station (e.g., base stations 114a, 114b) and/or determine its location based on the timing of the signals being received from two or more nearby base stations. It will be appreciated that the WTRU 102 may acquire location information by way of any suitable location-determination method while remaining consistent with an embodiment.
  • the processor 118 may further be coupled to other peripherals 138, which may include one or more software and/or hardware modules that provide additional features, functionality and/or wired or wireless connectivity.
  • the peripherals 138 may include an accelerometer, an e-compass, a satellite transceiver, a digital camera (for photographs and/or video), a universal serial bus (USB) port, a vibration device, a television transceiver, a hands free headset, a Bluetooth® module, a frequency modulated (FM) radio unit, a digital music player, a media player, a video game player module, an Internet browser, a Virtual Reality and/or Augmented Reality (VR/AR) device, an activity tracker, and the like.
  • FM frequency modulated
  • the peripherals 138 may include one or more sensors. The sensors may be one or more of a gyroscope, an accelerometer, a hall effect sensor, a magnetometer, an orientation sensor, a proximity sensor, a temperature sensor, a time sensor, a geolocation sensor, an altimeter, a light sensor, a touch sensor, a barometer, a gesture sensor, a biometric sensor, and/or a humidity sensor.
  • the WTRU 102 may include a full duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for both the UL (e.g., for transmission) and downlink (e.g., for reception)) may be concurrent and/or simultaneous.
  • the full duplex radio may include an interference management unit to reduce and/or substantially eliminate self-interference via either hardware (e.g., a choke) or signal processing via a processor (e.g., a separate processor (not shown) or via processor 118).
  • the WTRU 102 may include a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
  • a half-duplex radio for which transmission and reception of some or all of the signals (e.g., associated with particular subframes for either the UL (e.g., for transmission) or the downlink (e.g., for reception)).
  • the WTRU is described in FIGs. 1A-1B as a wireless terminal, it is contemplated that in certain representative embodiments that such a terminal may use (e.g., temporarily or permanently) wired communication interfaces with the communication network.
  • one or more, or all, of the functions described herein with regard to one or more of: WTRU 102a-d, Base Station 114a-b, eNode-B 160a-c, MME 162, SGW 164, PGW 166, gNB 180a-c, AMF 182a-b, UPF 184a-b, SMF 183a-b, DN 185a-b, and/or any other device(s) described herein, may be performed by one or more emulation devices (not shown).
  • the emulation devices may be one or more devices configured to emulate one or more, or all, of the functions described herein.
  • the emulation devices may be used to test other devices and/or to simulate network and/or WTRU functions.
  • the emulation devices may be designed to implement one or more tests of other devices in a lab environment and/or in an operator network environment.
  • the one or more emulation devices may perform the one or more, or all, functions while being fully or partially implemented and/or deployed as part of a wired and/or wireless communication network in order to test other devices within the communication network.
  • the one or more emulation devices may perform the one or more, or all, functions while being temporarily implemented/deployed as part of a wired and/or wireless communication network.
  • the emulation device may be directly coupled to another device for purposes of testing and/or may perform testing using over-the-air wireless communications.
  • the one or more emulation devices may perform the one or more, including all, functions while not being implemented/deployed as part of a wired and/or wireless communication network.
  • the emulation devices may be utilized in a testing scenario in a testing laboratory and/or a non-deployed (e.g., testing) wired and/or wireless communication network in order to implement testing of one or more components.
  • the one or more emulation devices may be test equipment. Direct RF coupling and/or wireless communications via RF circuitry (e.g., which may include one or more antennas) may be used by the emulation devices to transmit and/or receive data.
  • DNN operations are often addressed by transferring the data from the mobile devices to the cloud server, where all the computations are done.
  • this is bandwidth demanding, time intensive (due to transmission latency), and raises data privacy concerns.
  • One way this can be solved is by doing all computation on the user devices (e.g., mobile phones) through lightweight and less accurate DNNs.
  • the other way is through DNN with high accuracy but by sharing the computation across single/multiple mobile devices and/or the cloud.
  • model compression techniques are widely exploited. They allow reducing the model memory footprint and runtime to fit a particular device. However, one might not know upfront on which device the model will be executed and, even if the device is known, its available resources might vary over time due to, e.g., other processes. To overcome these issues, a family of so-called Flexible AI models was proposed recently. Those models can instantly adapt to the available resources through, e.g., allowing early classification exits, adapting model width (slimming), or allowing switchable quantization of model weights.
  • FIG. 2 illustrates a mechanism for a distributed AI between two devices, Device-1 and Device-2, without feature size compression.
  • intermediate data feature
  • FIG. 3A illustrates a DNN with one candidate split for feature compression, where a1, a2, or a3 can be used as the split points.
  • FIG. 3B illustrates a DNN with two candidate splits (e.g., a1 and a2) for feature compression.
  • FIG. 3C illustrates a DNN with three candidate splits (e.g., c1, c2 and c3) for feature compression.
  • a feature may be considered as an individual measurable property or characteristic of data that may be used to represent a phenomenon.
  • One or more features may be related to the inputs and/or outputs of a machine learning algorithm, of a neural network and/or of one of its layers.
  • features may be organized as vectors.
  • features associated with wireless use cases may include time, transmitter identity, and measurements on Reference Signals (RS).
  • RS Reference Signals
  • features associated to an algorithm used to process Positioning information may include values associated with a measurement of a positioning RS (PRS), of a quantity such as Reference Signal Receive Power (RSRP), of a quantity such as Reference Signal Receive Quality (RSRQ), of a quantity related to a Received Signal Strength Indication (RSSI), a quantity related to a time difference measurement based on signals of separate sources (e.g., for time-based positioning methods), of a quantity related to an angle of arrival measurement, of a quantity related to the quality of a beam, and/or output from a sensor (WTRU rotation, imaging from a camera, or the likes).
  • PRS positioning RS
  • RSRP Reference Signal Receive Power
  • RSRQ Reference Signal Receive Quality
  • RSSI Received Signal Strength Indication
  • features associated to an algorithm used to process Channel State Information may include measurements of a quantity associated with reception of Channel State Reference Signal (CSI-RS), of a Synchronization Signal Block (SSB), Precoding Matrix Indication (PMI), Rank Indicator (RI), Channel Quality Indicator (CQI), RSRP, RSRQ, RSSI or the likes.
  • CSI-RS Channel State Reference Signal
  • SSB Synchronization Signal Block
  • PMI Precoding Matrix Indication
  • RI Rank Indicator
  • CQI Channel Quality Indicator
  • features associated to an algorithm used to process beam management and selection may include a quantity associated with similar measurements as for processing CSI, a Transmit/Receive Point (TRP) identity (ID), a beam ID and/or one or more parameters related to Beam Failure Detection (BFD) e.g., thresholds determination of sufficient beam quality.
  • TRP Transmit/Receive Point
  • BFD Beam Failure Detection
  • any method described herein may further be applied to, or include specific parameter settings for, hyperparameters used for the machine learning algorithm for a specific phase of the AI/ML processing e.g., training or inference.
  • FIG. 4 illustrates a DNN with a single split (a2, b2 respectively) for feature compression, where the feature size is reduced from (a) 4 to 2 and (b) 4 to 3.
  • (a3) is a subnetwork realizing 4 to 2 feature size reduction and (b3) reducing from 4 to 3.
  • the compression factor is the ratio of the feature size at the output of the compressor to the feature size at the input of the compressor. This means that whenever the compression factor needs to change, the devices and the cloud server have to coordinate and download a new model from the cloud server.
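  • As a minimal sketch of the ratio defined above, the following Python snippet computes the compression factor for the two reductions illustrated in FIG. 4. The function name `compression_factor` and its parameter names are hypothetical and are used only for illustration.

```python
def compression_factor(input_size: int, output_size: int) -> float:
    """Ratio of the compressor's output feature size to its input feature size."""
    if input_size <= 0 or output_size <= 0:
        raise ValueError("feature sizes must be positive")
    return output_size / input_size


# The two reductions illustrated in FIG. 4.
cf_a = compression_factor(4, 2)  # 4-to-2 reduction
cf_b = compression_factor(4, 3)  # 4-to-3 reduction
```

  • With this convention a smaller value means a stronger compression, and a value of 1 means that the feature is transmitted at its full size.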
  • FIG. 5A illustrates a feature size compression mechanism for a distributed AI between two devices, Device-1 and Device-2, using a bandwidth-reducer (BWR, 510) and bandwidth-expander (BWE, 520), where a single compression factor is supported.
  • BWR bandwidth-reducer
  • BWE bandwidth-expander
  • FD-AI Flexible and Distributed AI
  • the proposed approach is distributed since the DNN can be split among two or more devices.
  • the proposed approach is also flexible because the split points can be chosen among several possible split point candidates, depending upon the available resource in the devices.
  • the transmitted feature size at each split point can be compressed to suit the available network bandwidth for the transmission.
  • the bottleneck subnetworks are parts of the DNN architecture.
  • the bottleneck subnetworks are switchable as they may adapt to different transmission network bandwidths at the time of inference.
  • These bottleneck subnetworks can be incorporated at one or more split positions of any existing DNNs.
  • the first device may be either an edge device or a cloud server
  • the second device may be either an edge device or a cloud server.
  • the methods described herein may be applied to any device exchanging data over a communication link.
  • Such device may include processing of a split neural network, or an autoencoder function.
  • Methods described herein may be applicable to processing in a device e.g., for an end-user application (e.g., audio, video, or the likes) or for a function related to a processing for transmission and/or reception of data.
  • More generally, such a device may be a mobile terminal, a radio access network node such as a gNB, or the like.
  • Such communication link may be a wireless link and/or interface such as 3GPP Uu, 3GPP sidelink or a Wifi link.
  • the DNN layers up to the split point with the feature size reducing layers of bottleneck subnetwork are loaded on to the first device.
  • the remaining part, i.e., the bottleneck subnetwork expander and the rest of the DNN after the split point are loaded on to the second device.
  • DySw Dynamic feature size Switch
  • the feature to be transmitted to the second device is extracted at the middle of the DySw.
  • the DNN realizing this is referred to as a Dynamic Switchable Feature Size Network (DyFsNet).
  • DyFsNet generally applies to any DNN architecture such as convolutional neural network (CNN), and it is novel in design and training.
  • the inferencing in DyFsNet is simple and adjustable (with respect to the split-positions and available network bandwidths).
  • FIG. 5B illustrates an example of a feature size compression mechanism that supports multiple compression factors for a distributed AI between two devices, Device-1 and Device-2, using a bandwidth reducer (BWR) and bandwidth expander (BWE), where K1, K2, ..., KN specify the compression factors inside the trainable BWR (530) and BWE (540), which are exclusive and dynamically switchable at the time of inference.
  • BWR bandwidth reducer
  • BWE bandwidth expander
  • Device-1 and Device-2 monitor the channel conditions and device status, and select the compression factor and the feature size at the split location.
  • Device-1 receives the first part of a DNN model up to the split location and Device-2 receives the remaining part of the DNN model.
  • inference is performed to calculate the feature from the input, and the feature is then compressed by the BWR.
  • the BWE can control the compression factor by controlling the node connections in the BWE. Then Device-2 continues the inference and provides the final output.
  • FIG. 6A illustrates the total inference latency without the BWR and BWE.
  • FIG. 6B illustrates the total inference latency with the BWR and BWE, where the size of the intermediate data may be reduced.
  • FIG. 7 illustrates a process to dynamically switch between split/compression factor (CF) configurations, according to an embodiment.
  • the DyFsNet model is trained for different splits and CFs. This can be currently done offline in the cloud server.
  • the trained model is saved in the cloud server and is available for downloading for the devices.
  • the orchestrator (in the server side) manages the co-ordination of trained model selection and transmission to the end devices based on the request. Here it is assumed that the information about the bandwidth is available. Based on this, the CF is estimated as the ratio of the feature size and the available bandwidth.
  • an orchestrator or external control system determines the split location for the DNN based on the compute capability of the end devices (e.g., Device-1 and Device-2). This is communicated to the devices, which load the DNN for processing in accordance with the split information.
  • trained split models are received by the devices. Once received they are loaded on the device for inferencing.
  • the network (e.g., bandwidth) and/or device (e.g., available processing power) status are monitored (730).
  • the devices monitor the network channel between them and co-ordinate CFs among themselves. This is done without involving the server.
  • the CF selection (740) is done thus impacting the feature size at the split locations.
  • the available CF options depend on the number of channels in the filters of the DNN layer at which the split is realized. Normally the CF is chosen to nearly, rather than exactly, match the available bandwidth.
  • the split model inference is performed on the first device and second device (750). For example, the first device computes the intermediate feature using the DNN up to the split, compresses the feature, and transfers the compressed feature to the second device. The second device receives the compressed feature, decompresses it, and continues with the DNN inference.
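  • One possible reading of this selection step is sketched below in Python. It assumes that the supported feature sizes are known from the split layer, that the bandwidth is expressed as a byte budget per transmitted feature, and that "nearly match" means taking the largest supported size that still fits the budget. The function name `select_feature_size`, its parameters, and the numeric values are illustrative assumptions and are not taken from the source.

```python
def select_feature_size(supported_sizes, bytes_per_element, budget_bytes):
    """Pick the largest supported feature size whose payload fits the budget.

    Falls back to the smallest supported size when nothing fits
    (an assumed policy, not stated in the source).
    """
    fitting = [s for s in supported_sizes if s * bytes_per_element <= budget_bytes]
    return max(fitting) if fitting else min(supported_sizes)


# Hypothetical example: a split layer of size 4 that can send 1 to 4 elements.
supported = [1, 2, 3, 4]
chosen = select_feature_size(supported, bytes_per_element=4, budget_bytes=10)
cf = chosen / max(supported)  # output size over input size
```

  • Because only a discrete set of sizes is available, the selected size generally uses slightly less than the available bandwidth rather than matching it exactly.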
  • the device may perform at least one of the following:
  • the device may adapt the split processing points, the feature dimensions, the compression factor, inference latency, processing requirements, accuracy of function, or any other aspect proposed herein.
  • the device may trigger such adaptation for AI processing upon determination of at least one of the following in relation to L1/physical (PHY) layer operation:
  • the device may determine that a change in radio characteristics has occurred, where such characteristics may impact the transmission data rates over the interface, such as a change in the identity of a cell, a change in carrier frequency, a change of bandwidth part (BWP), a change in the number of physical resource blocks (PRB) of the BWP and/or of the cell, a change in sub-carrier spacing (SCS), a change in the number of aggregated carriers available for transmissions, a change in available transmission power, a change in measured quantities or the likes.
  • the device may determine that a change in the operating conditions over the wireless interface has occurred, such as a change of the control channel resources (CORESET) or identity, where a first identity may be associated to a first threshold and a second identity may be associated to a second threshold.
  • CORESET control channel resources
  • the device may determine that the change is above a specific, possibly configured, threshold indicating deterioration of the channel quality and may perform an adaptation that would lower the data rate associated with the AI processing.
  • the device may determine an improvement in radio conditions and perform an adaptation that may increase the data rate associated with the AI processing.
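  • The threshold-based trigger described above can be sketched as follows in Python. The sketch assumes a single scalar measurement (here an estimated available data rate) and a symmetric threshold for both directions; the function name `adapt_data_rate` and the returned labels are hypothetical.

```python
def adapt_data_rate(previous_rate, current_rate, threshold):
    """Return 'decrease', 'increase', or 'keep' for the AI-processing data rate.

    A drop of at least `threshold` is treated as a deterioration and a rise of
    at least `threshold` as an improvement (symmetric threshold is an assumption).
    """
    change = current_rate - previous_rate
    if change <= -threshold:
        return "decrease"
    if change >= threshold:
        return "increase"
    return "keep"


# Hypothetical measurements, in arbitrary rate units.
decision = adapt_data_rate(previous_rate=100, current_rate=60, threshold=20)
```

  • In the context of this document, a "decrease" decision could map to selecting a CF that yields a smaller feature size, and an "increase" decision to a larger one.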
  • the device may trigger such adaptation for AI processing upon determination of at least one of the following in relation to L2/Medium Access Control (MAC) layer operation:
  • the device may determine that a change in data processing, information bearer
  • the device may determine that the change is above a specific, possibly configured, threshold indicating a decrease in the available data rate for the AI processing and may perform an adaptation that would lower the data rate associated with the AI processing. Conversely, the device may determine an increase in available data rate and perform an adaptation that may increase the data rate associated with the AI processing.
  • PDB Packet Delay Budget
  • PBR Prioritized Bit Rate
  • TTI duration/numerology a change in the associated QoS flow ID
  • mapping restriction towards a set of resources enabling a different data rate or the likes.
  • this may be applicable to a system level function such as a positioning function of the device.
  • this may be applicable to a specific data radio bearer (DRB) and/or DRB type, e.g., a DRB associated with a specific AI-enabled application, such that a change in a DRB or its characteristics may trigger an adaptation of an AI-based processing at the associated application layer.
  • DRB data radio bearer
  • the device may trigger such adaptation for AI processing upon determination of at least one of the following in relation to L3/Radio Resource Control (RRC) layer operations:
  • the device may determine that a change in configuration has occurred e.g., impacting one or more of the L1/L2 configuration such as aspects described above that may change the available data rates.
  • the device may determine that it has received and/or that it shall apply (e.g., for a conditional handover command) a reconfiguration message, e.g., for mobility, where the message may include an indication of the applicable data rate for AI processing and/or its related radio bearers.
  • a radio link impairment has occurred, such as a radio link failure (RLF).
  • RLF radio link failure
  • the device may determine that the change is above a specific, possibly configured, threshold indicating a decrease in the available data rate for the AI processing and may perform an adaptation that would lower the data rate associated with the AI processing. Conversely, the device may determine an increase in available data rate and perform an adaptation that may increase the data rate associated with the AI processing. Alternatively, it may determine that the event itself may be associated with an increase (e.g., addition of a cell to the connectivity of the device’s configuration, e.g., dual connectivity) or a decrease (e.g., RLF and/or removal of a cell from the connectivity of the device’s configuration) of the available data rates for the AI processing.
  • the device may trigger such adaptation for AI processing upon determination of at least one of the following in relation to available processing resources:
  • the device may determine that a change in available hardware processing has occurred, e.g., based on a change in the number of instantiated and/or active AI processes, based on a change in dynamic device capabilities, or based on a change in processing requirement (e.g., inference latency, accuracy) for the AI processing.
  • a change in power state of the device has occurred.
  • the device may determine that it has transited from a first state to a second state, where such states may be related to a RRC connectivity state (IDLE, INACTIVE or CONNECTED), a DRX state (active, inactive) or a different configuration thereof.
  • the device may determine that the change is above a specific, possibly configured, threshold indicating a decrease in the available processing resources.
  • the device may determine an increase in available processing resources and perform an adaptation that may increase the data rate associated with the AI processing.
  • a specific state may be associated with a specific AI processing level, split point configuration and/or associated data rate.
  • the device may trigger such adaptation for AI processing upon determination that it receives control signalling according to at least one of the following:
  • the device may receive control information that indicates either an increase or a decrease in AI processing / available data rates for AI processing. This may be implicitly based on a signalled value and/or a modification of a control channel property or value such as described above for L1, L2, L3 processing and/or for power saving management, or explicitly using an indication in the control message.
  • Such control information may be received in an L1 signal, an L1 message (e.g., a DCI on PDCCH), an L2 MAC control element, or an RRC message.
  • the control information may include the specific split point configuration to apply for a given AI processing, hyperparameter settings, target resolution, target accuracy, target feature vector, or the like.
  • FIGs. 8A, 8B and 8C provide an alternate view of the process.
  • FIG. 8A shows that Devices 1 and 2 (840, 860) estimate their compute capability and the transmission channel (850). Their estimations are conveyed (820, 830) to the operator/edge/cloud and a suitable AI/ML model (810) is requested.
  • FIG. 8B shows the reception of the AI/ML model by each of the devices.
  • the operator/cloud/edge performs selection of the model and transmits the model by network (830), and the requested model is received by devices 1 and 2.
  • FIG. 8C depicts the inference time operations of the devices.
  • Device-1 calculates the feature and then, based on channel conditions, the feature of the appropriate dimension is transmitted to Device-2.
  • Device-1 performs inference on input data (870).
  • the input data could be one or many images from the device memory or that captured live from the camera of the device, or audio data on the device memory or captured live from the device microphone or any other data that needs to be processed by a DNN.
  • Device-1 outputs an intermediate or early output (880) processed by the DNN, such as in the case of an MSDNet type of DNN.
  • the information required for further processing of the feature is also communicated via channel (850) to Device-2.
  • Device-2 receives the feature, continues the inference, switching the CF if required, and provides the final output (890). Additionally, Device-1 transmits the feature to Device-2 along with control information to further process the feature. Device-2 receives the feature and control information, and continues with the inference.
  • FIG. 9 illustrates a proposed method with a single split in a DNN for feature compression.
  • FIG. 9(a) depicts jointly trained subnetwork DySw (a3) without compression factor selected.
  • FIG. 9(b) depicts jointly trained subnetwork (b3) with feature compression factor 4 to 2 selected.
  • FIG. 9(c) depicts jointly trained subnetwork (c3) with feature compression factor 4 to 3 selected.
  • the DySw can be trained together with the entire DNN.
  • the DNN without the DySw is pretrained, and the DySw subnetwork is added.
  • the pretrained DNN is augmented with the DySw (a3) subnetwork, and only the DySw is trained while the weights of the pretrained DNN are kept unchanged (i.e., fixed).
  • the DySw is reconfigurable to suit multiple compression factors.
  • the reconfiguration is realized through connection details of the DySw nodes.
  • for a DySw subnetwork as illustrated in FIG. 10, we can maintain a matrix of size 4x3 specifying the node connections, as shown in FIG. 11.
  • Each element (Eij) in the matrix represents whether input node i is connected to output node j, where ‘0’ represents disconnected, and ‘1’ connected.
  • the matrices as shown in FIG. 11(a), (b) and (c) correspond to FIG. 9(a), (b) and (c), respectively.
  • FIG. 9(a) specifies that none of the input nodes is connected to any output node, FIG. 9(b) specifies that only 2 of the output nodes (output node-2 and node-3) are connected to the input nodes, and FIG. 9(c) specifies that all the input nodes are connected to the output.
  • FIG. 11 shows the connection on the reducer side, and the expander can maintain matrices corresponding to different compression factors. In one example, the shape of the matrix at the expander side is transposed (with respect to the one at the reducer side) but the number of all-zero rows will remain the same.
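  • The connection matrices described above can be sketched in plain Python as 0/1 masks applied to the reducer weights. In this sketch the masks follow the textual description of FIG. 11 (4 input nodes, 3 output nodes), while the weight values, the input vector, and the helper names `masked_matvec`, `active_outputs`, and `transpose` are placeholders introduced only for illustration.

```python
def transpose(matrix):
    """Return the transposed matrix as a list of lists."""
    return [list(row) for row in zip(*matrix)]


def masked_matvec(x, weights, mask):
    """Compute y[j] = sum_i x[i] * weights[i][j] * mask[i][j]."""
    n_out = len(mask[0])
    return [sum(x[i] * weights[i][j] * mask[i][j] for i in range(len(x)))
            for j in range(n_out)]


def active_outputs(mask):
    """Indices of output nodes that have at least one enabled connection."""
    return [j for j in range(len(mask[0])) if any(row[j] for row in mask)]


# E[i][j] = 1 when input node i is connected to output node j.
MASK_4_TO_2 = [[0, 1, 1] for _ in range(4)]  # only output node-2 and node-3
MASK_4_TO_3 = [[1, 1, 1] for _ in range(4)]  # all output nodes connected

weights = [[1.0] * 3 for _ in range(4)]      # placeholder for learned weights
x = [1.0, 2.0, 3.0, 4.0]                     # placeholder input feature

reduced = masked_matvec(x, weights, MASK_4_TO_2)
sent = [reduced[j] for j in active_outputs(MASK_4_TO_2)]  # values to transmit
expander_mask = transpose(MASK_4_TO_2)       # 3x4 mask on the expander side
```

  • Switching the CF then amounts to swapping the mask while the learned weights stay loaded, which is consistent with the statement that no new model needs to be downloaded.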
  • the devices coordinate the CF.
  • an orchestrator or external control system informs Device-1 about the available bandwidth.
  • Device-1 determines the CF to be used based on the information about the bandwidth.
  • Device-1 then switches the DySw to realize the feature size compression corresponding to the determined CF.
  • Device-1 may also communicate the CF it is using, and Device-2 accordingly switches its side of the DNN to suit the communicated information.
  • Device-1 decides which connections between nodes should be disabled to provide the selected CF.
  • Device-2 also decides which connections should be disabled correspondingly in order to properly perform the expansion.
  • the CF determines how many output nodes are connected to the input nodes, but the way and how many will be determined through learning.
  • FIG. 10 illustrates an example DySw capable of reducing and expanding an input of size 4.
  • the illustrated DySw is capable of compression from 4-to-3, 4-to-2 and 4-to-1 and the corresponding expansions (i.e., 1-to-4, 2-to-4 and 3-to-4).
  • the DySw design may have additional layers if needed, for example, a BatchNorm layer for better training.
  • the BatchNorm layer is an optional layer used for efficient training, hence it is not shown here.
  • a typical DySw comprises four types of layers, namely feature dimensionality reducer and expander layers, non-linearity layers and batch normalization (BatchNorm) layers. Of these layers the BatchNorm layer is optional.
  • a simple DySw is shown in FIG. 10.
  • the DySw used in a DNN classifier can be trained using the conventional task-specific loss, for example, cross-entropy loss for classification tasks or mean-square error loss for regression tasks.
  • the DySw can be used for any task, namely classification, detection, or segmentation, and in any DNN architecture, namely CNN, GAN, Auto-encoder etc.
  • Training a DySw involves learning reducer-expander layer weights and the parameters of batch normalization layer (also denoted as “BatchNorm”). BatchNorm is used for faster convergence of training.
  • the DySw training allows for additional constraints on the loss objective. As an illustration, we show the addition of a reconstruction loss across the DySw. The reconstruction loss penalizes the disparity between the input to and the output of the DySw.
  • the DySw is an auxiliary and optional entity which can be added to a trained DNN.
  • in the DySw, the reduction factor is switchable on the fly at the time of inference.
  • in DyFsNet, the training iterations are modified to co-learn shared DySw weights with multiple reduction factors, as detailed further below.
  • the training of DySw can be offline or online, done on the cloud/operator/edge or it may be a federated training on the devices.
  • the training mechanism described here may be extended to multiple split cases.
  • DySw is a subnetwork represented by $h_{DySw}$.
  • the parameters of $h_{DySw}$ are $\theta_{DySw}$.
  • for the BWR and BWE, an example implementation of such a reducer and expander can include a convolutional layer, a non-linear layer (ReLU), and a batch normalization layer (BatchNorm), as summarized below:
  • $\theta_{DySw} = [\theta_{DySw_{BWR}};\ \theta_{DySw_{BWE}}]$
  • $DySw_{BWR} = [\mathrm{layerconv}_{DySw_{BWR}};\ \mathrm{ReLU};\ \mathrm{BatchNorm}]$
  • $DySw_{BWE} = [\mathrm{layerconv}_{DySw_{BWE}};\ \mathrm{ReLU};\ \mathrm{BatchNorm}]$
  • the DNN with DySw is referred to as DyFsNet.
  • let DyFsNet be represented by $h$.
  • the subnetwork of DyFsNet before the split point is $h_{device1_{BWR}}$ and the subnetwork after the split point is $h_{device2_{BWE}}$.
  • DySw switches among various compression factors (CF) of the feature size.
  • the CF switching is indexed by K.
  • the intermediate outputs, indexed by K, at the split of DyFsNet are as follows: $y'_{2K} = h_{DySw_{BWR}}(y_2)$, where $h_{DySw_{BWR}}$ and $h_{DySw_{BWE}}$ are the DySw subnetworks performing BWR and BWE respectively, and, for a DNN classifier, $N_c$ is the number of classes and the subscript K represents the compression factor. The output depends on the objective of the DNN, i.e., whether it is a classifier, regressor, or generator. Without loss of generality, we will assume the classifier case here.
  • the setup provides us with two types of supervision: one is through the ground-truth labels $Y_{true}$, and the other is the reconstruction loss (e.g., in the form of a mean-square error) between the input to the DySw subnetwork and the output of the DySw subnetwork.
  • when DyFsNet is initialized with a pretrained DNN, it is possible to use a knowledge distillation loss between the outputs of the pretrained DNN, $Y_{KD}$, and the output of the DySw subnetwork.
  • we refer to the loss calculated with $Y_{true}$ and $Y_{KD}$ supervision as the global loss, and to the reconstruction loss across the DySw as the local loss.
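  • A minimal sketch of how the global and local losses might be combined is given below in plain Python. It assumes the classifier case with a cross-entropy global loss, a mean-square-error local loss, a simple sum over the supported compression factors, and a weighting term `local_weight`; the combination rule, the weighting term, and all function names are assumptions made for illustration and are not specified in the source. The knowledge distillation term is omitted for brevity.

```python
import math


def cross_entropy(class_probs, true_class):
    """Global loss for one sample: negative log-probability of the true class."""
    return -math.log(class_probs[true_class])


def mse(a, b):
    """Local (reconstruction) loss: mean-square error between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def total_loss(per_cf_outputs, true_class, dysw_input, local_weight=1.0):
    """Sum, over compression factors, of global loss + weighted local loss.

    per_cf_outputs: list of (class_probs, dysw_output) pairs, one per CF.
    """
    total = 0.0
    for class_probs, dysw_output in per_cf_outputs:
        total += cross_entropy(class_probs, true_class)
        total += local_weight * mse(dysw_input, dysw_output)
    return total


# Hypothetical values for two compression factors and a 2-class problem.
dysw_in = [1.0, 2.0, 3.0, 4.0]
outputs = [([0.9, 0.1], [1.0, 2.0, 3.0, 4.0]),
           ([0.6, 0.4], [1.5, 2.0, 2.5, 4.0])]
loss = total_loss(outputs, true_class=0, dysw_input=dysw_in)
```

  • Summing over the compression factors in a single objective is one way to realize shared DySw weights that serve several reduction factors.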
  • the proposed approach deals with efficient use of transmission bandwidth for distributed AI, with a provision to switch among multiple feature bandwidths.
  • each device needs to load its part of the AI model only once, but the input/output features communicated between the devices can be flexibly configured depending on the available transmission bandwidth by enabling/disabling connections between nodes in the DySw.
  • other parameters of the DNN remain the same. That is, the same DNN model is used for different compression factors, and no new DNN model needs to be downloaded to adapt to the compression factor or the network bandwidth.
  • the AI processing can be used, for example, but not limited to, on images shot on a basic phone’s camera, or on images shot from a smart TV camera for UI interaction via gesture detection.
  • the proposed approach can be used in various scenarios.
  • the AI model can be split between device and cloud. In the following, we list several possible usage scenarios:
  • AI model split between two devices: for example, the user wants to process data captured in the smart watch, where a part of the processing can be done on the watch and the rest on the user’s mobile phone.
  • a terminal device that may communicate over a wireless link, where the AI processing is related to a function of a transmission and/or a reception of a radio processing chain (e.g., CSI compression, CSI autoencoding, positioning determination, or the like).
  • a terminal device that may communicate over a wireless link, where the AI processing is related to a function of scheduling or data processing, e.g., related to QoS processing (e.g., user plane data rate adaptation or the like).
  • ROM read only memory
  • RAM random access memory
  • register cache memory
  • semiconductor memory devices magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).
  • a processor in association with software may be used to implement a video encoder, a video decoder or both, a radio frequency transceiver for use in a UE, WTRU, terminal, base station, RNC, or any host computer.
  • processing platforms, computing systems, controllers, and other devices containing processors are noted. These devices may contain at least one Central Processing Unit (“CPU") and memory.
  • CPU Central Processing Unit
  • the acts and symbolically represented operations or instructions include the manipulation of electrical signals by the CPU.
  • An electrical system represents data bits that can cause a resulting transformation or reduction of the electrical signals and the maintenance of data bits at memory locations in a memory system to thereby reconfigure or otherwise alter the CPU's operation, as well as other processing of signals.
  • the memory locations where data bits are maintained are physical locations that have particular electrical, magnetic, or optical properties corresponding to or representative of the data bits. It should be understood that the exemplary embodiments are not limited to the above-mentioned platforms or CPUs and that other platforms and CPUs may support the provided methods.
  • the data bits may also be maintained on a computer readable medium including magnetic disks, optical disks, and any other volatile (e.g., Random Access Memory (“RAM”)) or non-volatile (e.g., Read-Only Memory (“ROM”)) mass storage system readable by the CPU.
  • RAM Random Access Memory
  • ROM Read-Only Memory
  • the computer readable medium may include cooperating or interconnected computer readable media, which exist exclusively on the processing system or are distributed among multiple interconnected processing systems that may be local or remote to the processing system. It is understood that the representative embodiments are not limited to the above-mentioned memories and that other platforms and memories may support the described methods.
  • any of the operations, processes, etc. described herein may be implemented as computer-readable instructions stored on a computer-readable medium.
  • the computer-readable instructions may be executed by a processor of a mobile unit, a network element, and/or any other computing device.
  • Suitable processors include, by way of example, a GPU (Graphics Processing Unit), a general purpose processor, a special purpose processor, a conventional processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Field Programmable Gate Array (FPGA) circuits, any other type of integrated circuit (IC), and/or a state machine.
  • GPU Graphics Processing Unit
  • DSP digital signal processor
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • FPGAs Field Programmable Gate Arrays
  • DSPs digital signal processors
  • examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a CD, a DVD, a digital tape, a computer memory, etc., and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • any two components so associated may also be viewed as being “operably connected”, or “operably coupled”, to each other to achieve the desired functionality, and any two components capable of being so associated may also be viewed as being “operably couplable” to each other to achieve the desired functionality.
  • specific examples of “operably couplable” include, but are not limited to, physically mateable and/or physically interacting components and/or wirelessly interactable and/or wirelessly interacting components and/or logically interacting and/or logically interactable components.
  • the phrase “A or B” will be understood to include the possibilities of “A” or “B” or “A and B.”
  • the terms “any of” followed by a listing of a plurality of items and/or a plurality of categories of items, as used herein, are intended to include “any of,” “any combination of,” “any multiple of,” and/or “any combination of multiples of” the items and/or the categories of items, individually or in conjunction with other items and/or other categories of items.
  • the term “set” or “group” is intended to include any number of items, including zero.
  • the term “number” is intended to include any number, including zero.
  • a range includes each individual member.
  • a group having 1-3 cells refers to groups having 1, 2, or 3 cells.
  • a group having 1-5 cells refers to groups having 1, 2, 3, 4, or 5 cells, and so forth.
  • the systems may be implemented in software on microprocessors/general purpose computers (not shown).
  • one or more of the functions of the various components may be implemented in software that controls a general-purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)
PCT/EP2022/052633 2021-02-05 2022-02-03 Dynamic feature size adaptation in splitable deep neural networks Ceased WO2022167547A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
US18/272,714 US20240311621A1 (en) 2021-02-05 2022-02-03 Dynamic feature size adaptation in splitable deep neural networks
EP22707038.0A EP4288907A1 (en) 2021-02-05 2022-02-03 Dynamic feature size adaptation in splitable deep neural networks
JP2023544040A JP2024509670A (ja) 2021-02-05 2022-02-03 分割可能なディープニューラルネットワークにおける動的特徴サイズ適応
CN202280013234.2A CN116940946A (zh) 2021-02-05 2022-02-03 可分割深度神经网络中的动态特征尺寸适配

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP21305156 2021-02-05
EP21305156.8 2021-02-05

Publications (1)

Publication Number Publication Date
WO2022167547A1 (en) 2022-08-11

Family

ID=74661327

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/052633 Ceased WO2022167547A1 (en) 2021-02-05 2022-02-03 Dynamic feature size adaptation in splitable deep neural networks

Country Status (5)

Country Link
US (1) US20240311621A1
EP (1) EP4288907A1
JP (1) JP2024509670A
CN (1) CN116940946A
WO (1) WO2022167547A1

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115499658A (zh) * 2022-09-20 2022-12-20 支付宝(杭州)信息技术有限公司 虚拟世界的数据传输方法及装置
WO2024164473A1 (zh) * 2023-02-09 2024-08-15 索尼集团公司 用于分割学习、模型分割的电子设备和方法
WO2024168748A1 (zh) * 2023-02-16 2024-08-22 富士通株式会社 模型发送和接收方法以及装置
WO2024226398A1 (en) * 2023-04-22 2024-10-31 Qualcomm Incorporated Rate adaptation for video coding for machines
WO2024255040A1 (en) * 2023-06-13 2024-12-19 Huawei Technologies Co., Ltd. Communication method and communication apparatus
WO2025019249A1 (en) * 2023-07-18 2025-01-23 Interdigital Vc Holdings, Inc. Tensor information for intermediate data
WO2025019540A1 (en) * 2023-07-19 2025-01-23 Interdigital Vc Holdings, Inc. Multi-layer split points output information
WO2025047742A1 (ja) * 2023-08-30 2025-03-06 京セラ株式会社 通信制御方法及びユーザ装置
EP4542387A4 (en) * 2022-08-24 2025-10-08 Huawei Tech Co Ltd METHOD FOR SEGMENTING A COMPUTING TASK AND ASSOCIATED APPARATUS
EP4651458A1 (en) * 2024-05-13 2025-11-19 InterDigital CE Patent Holdings, SAS Methods, apparatuses and systems related to transport partial results data with intermediate data
US12526439B2 (en) 2023-04-22 2026-01-13 Qualcomm Incorporated Rate adaptation for video coding for machines

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230422117A1 (en) * 2022-06-09 2023-12-28 Qualcomm Incorporated User equipment machine learning service continuity
US12587970B2 (en) * 2023-09-05 2026-03-24 Qualcomm Incorporated Decibel compression point information reporting
CN118741441B (zh) * 2024-07-18 2025-02-28 北京物资学院 无线蜂窝网络中终端选择大语言模型的方法和装置
WO2026016173A1 (en) * 2024-07-19 2026-01-22 Apple Inc. Performance monitoring of chained ai model in wireless communications
CN118843159B (zh) * 2024-09-23 2024-11-19 四川科锐得电力通信技术有限公司 一种基于无线网桥的无信号区输电线路数据传输方法及系统

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200351509A1 (en) * 2017-10-30 2020-11-05 Electronics And Telecommunications Research Institute Method and device for compressing image and neural network using hidden variable

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019134802A1 (en) * 2018-01-03 2019-07-11 Signify Holding B.V. System and methods to share machine learning functionality between cloud and an iot network
WO2019193660A1 (ja) * 2018-04-03 2019-10-10 株式会社ウフル 機械学習済みモデル切り替えシステム、エッジデバイス、機械学習済みモデル切り替え方法、及びプログラム
JP7056345B2 (ja) * 2018-04-18 2022-04-19 日本電信電話株式会社 データ分析システム、方法、及びプログラム
US11700518B2 (en) * 2019-05-31 2023-07-11 Huawei Technologies Co., Ltd. Methods and systems for relaying feature-driven communications

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200351509A1 (en) * 2017-10-30 2020-11-05 Electronics And Telecommunications Research Institute Method and device for compressing image and neural network using hidden variable

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CONINCK ELIAS DE ET AL: "DIANNE: a modular framework for designing, training and deploying deep neural networks on heterogeneous distributed infrastructure", JOURNAL OF SYSTEMS & SOFTWARE, vol. 141, 1 July 2018 (2018-07-01), US, pages 52 - 65, XP055933056, ISSN: 0164-1212, DOI: 10.1016/j.jss.2018.03.032 *
JIAWEI SHAO ET AL: "BottleNet++: An End-to-End Approach for Feature Compression in Device-Edge Co-Inference Systems", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 5 June 2020 (2020-06-05), XP081680641 *
LI EN ET AL: "Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing", IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 19, no. 1, 18 October 2019 (2019-10-18), pages 447 - 457, XP011766454, ISSN: 1536-1276, [retrieved on 20200107], DOI: 10.1109/TWC.2019.2946140 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4542387A4 (en) * 2022-08-24 2025-10-08 Huawei Tech Co Ltd METHOD FOR SEGMENTING A COMPUTING TASK AND ASSOCIATED APPARATUS
CN115499658B (zh) * 2022-09-20 2024-05-07 支付宝(杭州)信息技术有限公司 虚拟世界的数据传输方法及装置
CN115499658A (zh) * 2022-09-20 2022-12-20 支付宝(杭州)信息技术有限公司 虚拟世界的数据传输方法及装置
WO2024164473A1 (zh) * 2023-02-09 2024-08-15 索尼集团公司 用于分割学习、模型分割的电子设备和方法
WO2024168748A1 (zh) * 2023-02-16 2024-08-22 富士通株式会社 模型发送和接收方法以及装置
WO2024226398A1 (en) * 2023-04-22 2024-10-31 Qualcomm Incorporated Rate adaptation for video coding for machines
US12526439B2 (en) 2023-04-22 2026-01-13 Qualcomm Incorporated Rate adaptation for video coding for machines
WO2024255040A1 (en) * 2023-06-13 2024-12-19 Huawei Technologies Co., Ltd. Communication method and communication apparatus
WO2025019249A1 (en) * 2023-07-18 2025-01-23 Interdigital Vc Holdings, Inc. Tensor information for intermediate data
WO2025019540A1 (en) * 2023-07-19 2025-01-23 Interdigital Vc Holdings, Inc. Multi-layer split points output information
WO2025047742A1 (ja) * 2023-08-30 2025-03-06 京セラ株式会社 通信制御方法及びユーザ装置
EP4651458A1 (en) * 2024-05-13 2025-11-19 InterDigital CE Patent Holdings, SAS Methods, apparatuses and systems related to transport partial results data with intermediate data
WO2025237862A1 (en) * 2024-05-13 2025-11-20 Interdigital Ce Patent Holdings, Sas Methods, apparatuses and systems related to transport partial results data with intermediate data

Also Published As

Publication number Publication date
CN116940946A (zh) 2023-10-24
US20240311621A1 (en) 2024-09-19
JP2024509670A (ja) 2024-03-05
EP4288907A1 (en) 2023-12-13

Similar Documents

Publication Publication Date Title
US20240311621A1 (en) Dynamic feature size adaptation in splitable deep neural networks
US20230409963A1 (en) Methods for training artificial intelligence components in wireless systems
US20240224082A1 (en) Parameter selection method, parameter configuration method, terminal, and network side device
US10637551B2 (en) Generic reciprocity based channel state information acquisition frameworks for advanced networks
US11329710B2 (en) Facilitation of beam failure indication for multiple transmission points for 5G or other next generation network
US20260105363A1 (en) Aiml model life cycle management
WO2024076755A1 (en) Ai/ml-based joint denoising and compression of csi feedback
WO2022098629A1 (en) Methods, architectures, apparatuses and systems for adaptive multi-user noma selection and symbol detection
CN115865570B (zh) 通过信道状态信息报告增强多用户mimo的方法、基站
WO2019078864A1 (en) MAPPING OF SPECIFIC EU BEAMS WITH REFERENCE WEIGHT VECTORS
US20240429984A1 (en) Data-driven wtru-specific mimo pre-coder codebook design
WO2025225851A1 (en) Method and apparatus for a large channel model in a wireless communication system
WO2025222367A1 (en) Signaling of associations between dataset-identification and beam sets for model training
US20250056211A1 (en) Capability information transmission
US20220329989A1 (en) Facilitation of container management for internet of things devices for 5g or other next generation network
CN121463098A (zh) 反馈报告的配置方法、装置,网络侧设备和终端
CN121485735A (zh) Csi上报、接收方法、装置、设备、可读存储介质及计算机程序产品
WO2025041001A1 (en) Additional information reporting using mdt
CN118945701A (zh) 模型管理方法、装置及通信设备
WO2025209691A1 (en) Methods and apparatuses for csi reporting
CN120389768A (zh) 预编码矩阵的反馈方法、终端及网络侧设备
TW202608102A (zh) 用於無線網路的通用ue資料收集
WO2026072977A1 (en) End-to-end learning of channel state information feedback functions
WO2026012628A1 (en) Synchronization of twin models
WO2026012627A1 (en) Synchronization of twin models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22707038

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202317046596

Country of ref document: IN

WWE Wipo information: entry into national phase

Ref document number: 18272714

Country of ref document: US

WWE Wipo information: entry into national phase

Ref document number: 2023544040

Country of ref document: JP

WWE Wipo information: entry into national phase

Ref document number: 202280013234.2

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022707038

Country of ref document: EP

Effective date: 20230905