WO2021244756A1 - Communication system - Google Patents

Communication system

Info

Publication number
WO2021244756A1
WO2021244756A1 PCT/EP2020/065632 EP2020065632W WO2021244756A1 WO 2021244756 A1 WO2021244756 A1 WO 2021244756A1 EP 2020065632 W EP2020065632 W EP 2020065632W WO 2021244756 A1 WO2021244756 A1 WO 2021244756A1
Authority
WO
WIPO (PCT)
Prior art keywords
actions
user equipment
protocol data
model
information
Prior art date
Application number
PCT/EP2020/065632
Other languages
English (en)
Inventor
Alvaro VALCARCE RIAL
Jakob Hoydis
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Priority to PCT/EP2020/065632 priority Critical patent/WO2021244756A1/fr
Publication of WO2021244756A1 publication Critical patent/WO2021244756A1/fr

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W80/00 Wireless network protocols or protocol adaptations to wireless operation
    • H04W80/02 Data link layer protocols
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W88/00 Devices specially adapted for wireless communication networks, e.g. terminals, base stations or access point devices
    • H04W88/02 Terminal devices

Definitions

  • the present specification relates to communication systems, for example to training communication systems (e.g. using machine learning techniques).
  • this specification describes an apparatus (e.g. a MAC agent) comprising means for performing: training a first model at a first module of a user equipment based at least in part on information related to one or more protocols, wherein the information is received from a first communication node, and wherein the first model is configured to perform (e.g. periodically) at a first time interval: one of a plurality of first actions (e.g. signalling actions), wherein the plurality of first actions comprises generating a protocol data unit; and one of a plurality of second actions, wherein the plurality of second actions comprises transmitting a protocol data unit, wherein the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies; and generating, using the first model, outputs relating to one or more protocol data units.
  • the apparatus may further comprise means for performing: transmitting the outputs to a server (e.g. a machine-learning server).
  • Some example embodiments include means for performing: receiving the one or more policies from the server, wherein the one or more policies are based on outputs from a plurality of first modules.
  • the plurality of second actions may further comprise configuring a physical layer of the user equipment, and/or configuring one or more new physical channels (e.g. new RACH channels) for transmission of the protocol data unit.
  • the received protocol information may comprise one or more of: information of service data units received from a radio link control layer of the user equipment; information of protocol data units received from the first communication node; radio measurements received from a physical layer of the user equipment; and state of the first module of the user equipment.
  • Some example embodiments further comprise means for performing: receiving a scalar reward in the event that one or more second actions are performed.
  • the first module may be implemented at a media access control layer of the user equipment.
  • this specification describes an apparatus (e.g. a server) comprising means for performing: retrieving information relating to protocol data units, wherein the information is received from a plurality of user equipment; providing the retrieved information to a second model, wherein the second model is configured to generate one or more policies for distribution to one or more user equipment, wherein the one or more policies are used at the one or more user equipment to generate protocol data units; and distributing, at a second time interval, the one or more policies to one or more of the plurality of user equipment.
  • this specification describes a system comprising a plurality of user equipment, a server and a communication node (such as a base station).
  • One or more of the plurality of user equipment comprises means for performing: training a first model at a first module based at least in part on information related to one or more protocols, wherein the information is received from a first communication node, and wherein the first model is configured to periodically perform at a first time interval: one of a plurality of first actions, wherein the plurality of first actions comprises generating a protocol data unit, and one of a plurality of second actions, wherein the plurality of second actions comprises transmitting a protocol data unit; wherein the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies; and generating, using the first model, outputs relating to one or more protocol data units.
  • the server comprises means for performing: retrieving information relating to protocol data units, wherein the information is received from one or more of the plurality of user equipment, wherein the information comprises the outputs generated at the first module of one or more of the plurality of user equipment; providing the retrieved information to a second model, wherein the second model is configured to generate one or more policies for distribution to one or more user equipment, wherein the one or more policies are used at the one or more user equipment to generate protocol data units; and distributing, at a second time interval, the one or more policies to one or more of the plurality of user equipment.
  • the communication node comprises means for performing: receiving a plurality of protocol data units from one or more of the plurality of user equipment; aggregating information relating to the plurality of protocol data units to generate information relating to one or more protocols; and transmitting the information related to the one or more protocols to one or more of the plurality of user equipment based on the received plurality of protocol data units.
  • this specification describes a communication node (e.g. a base station such as a gNB) comprising means for performing: receiving a plurality of protocol data units from one or more of a plurality of user equipment; aggregating information related to a plurality of protocol data units to generate information relating to one or more protocols; and transmitting the information related to the one or more protocols to one or more of the plurality of user equipment based on the received plurality of protocol data units.
  • the said means may comprise: at least one processor; and at least one memory including computer program code, the at least one memory and the computer program configured, with the at least one processor, to cause the performance of the apparatus.
  • this specification describes a method comprising: training a first model at a first module of a user equipment based at least in part on information related to one or more protocols, wherein the information is received from a first communication node, and wherein the first model is configured to periodically perform at a first time interval: one of a plurality of first actions, wherein the plurality of first actions comprises generating a protocol data unit, and one of a plurality of second actions, wherein the plurality of second actions comprises transmitting a protocol data unit, wherein the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies; and generating, using the first model, outputs relating to one or more protocol data units.
  • the method may comprise transmitting the outputs to a server (e.g. a machine-learning server).
  • the method may comprise receiving the one or more policies from the server, wherein the one or more policies are based on outputs from a plurality of first modules.
  • the plurality of second actions may further comprise configuring a physical layer of the user equipment, and/or configuring one or more new physical channels (e.g. new RACH channels) for transmission of the protocol data unit.
  • the received protocol information may comprise one or more of: information of service data units received from a radio link control layer of the user equipment; information of protocol data units received from the first communication node; radio measurements received from a physical layer of the user equipment; and state of the first module of the user equipment.
  • Some example embodiments further comprise receiving a scalar reward in the event that one or more second actions are performed.
  • this specification describes a method comprising: retrieving information relating to protocol data units, wherein the information is received from a plurality of user equipment; providing the retrieved information to a second model, wherein the second model is configured to generate one or more policies for distribution to one or more user equipment, wherein the one or more policies are used at the one or more user equipment to generate protocol data units; and distributing, at a second time interval, the one or more policies to one or more of the plurality of user equipment.
  • this specification describes a method comprising performing, at one or more user equipment: training a first model at a first module based at least in part on information related to one or more protocols, wherein the information is received from a first communication node, and wherein the first model is configured to periodically perform at a first time interval: one of a plurality of first actions, wherein the plurality of first actions comprises generating a protocol data unit, and one of a plurality of second actions, wherein the plurality of second actions comprises transmitting a protocol data unit; wherein the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies; generating, using the first model, outputs relating to one or more protocol data units.
  • the method further comprises performing, at a server: retrieving information relating to protocol data units, wherein the information is received from one or more of the plurality of user equipment, wherein the information comprises the outputs generated at the first module of one or more of the plurality of user equipment; providing the retrieved information to a second model, wherein the second model is configured to generate one or more policies for distribution to one or more user equipment, wherein the one or more policies are used at the one or more user equipment to generate protocol data units; and distributing, at a second time interval, the one or more policies to one or more of the plurality of user equipment.
  • the method further comprises performing, at a communication node: receiving a plurality of protocol data units from one or more of the plurality of user equipment; aggregating information related to the plurality of protocol data units to generate information relating to one or more protocols; and transmitting the information related to the one or more protocols to one or more of the plurality of user equipment based on the received plurality of protocol data units.
  • this specification describes a method comprising: receiving (at a communication node) a plurality of protocol data units from one or more of a plurality of user equipment; aggregating information relating to a plurality of protocol data units to generate information relating to one or more protocols; and transmitting the information related to the one or more protocols to one or more of the plurality of user equipment based on the received plurality of protocol data units.
  • this specification describes an apparatus configured to perform (at least) any method as described with reference to the fourth to eighth aspects.
  • this specification describes computer-readable instructions which, when executed by computing apparatus, cause the computing apparatus to perform (at least) any method as described with reference to the fourth to eighth aspects.
  • this specification describes a computer-readable medium (such as a non-transitory computer-readable medium) comprising program instructions stored thereon for performing (at least) any method as described above with reference to the fourth to eighth aspects.
  • this specification describes an apparatus comprising: at least one processor; and at least one memory including computer program code which, when executed by the at least one processor, causes the apparatus to perform (at least) any method as described above with reference to the fourth to eighth aspects.
  • this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: training a first model at a first module of a user equipment based at least in part on information related to one or more protocols, wherein the information is received from a first communication node, and wherein the first model is configured to periodically perform at a first time interval: one of a plurality of first actions, wherein the plurality of first actions comprises generating a protocol data unit, and one of a plurality of second actions, wherein the plurality of second actions comprises transmitting a protocol data unit, wherein the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies; and generating, using the first model, outputs relating to one or more protocol data units.
  • this specification describes a computer program comprising instructions for causing an apparatus to perform at least the following: retrieving information relating to protocol data units, wherein the information is received from a plurality of user equipment; providing the retrieved information to a second model, wherein the second model is configured to generate one or more policies for distribution to one or more user equipment, wherein the one or more policies are used at the one or more user equipment to generate protocol data units; and distributing, at a second time interval, the one or more policies to one or more of the plurality of user equipment.
  • this specification describes an apparatus comprising: a first module (e.g. a MAC agent) of a user equipment for training a first model (e.g. a machine learning model) at the first module of the user equipment based at least in part on information related to one or more protocols, wherein the information is received from a first communication node, and wherein the first model is configured to periodically perform at a first time interval: one of a plurality of first actions at the first module, wherein the plurality of first actions comprises generating a protocol data unit, and one of a plurality of second actions at a physical layer of the user equipment, wherein the plurality of second actions comprises transmitting a protocol data unit, wherein the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies; and means (such as the first module) for generating, using the first model, outputs relating to one or more protocol data units.
  • this specification describes an apparatus comprising: means (such as a processing module) for retrieving information relating to protocol data units (e.g. from a database), wherein the information is received from a plurality of user equipment; the processing module providing the retrieved information to a second model (e.g. a machine learning model), wherein the second model is configured to generate one or more policies for distribution to one or more user equipment, wherein the one or more policies are used at the one or more user equipment to generate protocol data units; and means (such as a communication means) for distributing, at a second time interval, the one or more policies to one or more of the plurality of user equipment.
  • FIGS. 1 to 3 are block diagrams of systems in accordance with example embodiments;
  • FIG. 4 is a flow chart showing an algorithm in accordance with an example embodiment;
  • FIG. 5 is a block diagram of a system in accordance with an example embodiment;
  • FIG. 6 is a flow chart showing an algorithm in accordance with an example embodiment;
  • FIG. 7 is a block diagram of a system in accordance with an example embodiment;
  • FIG. 8 is a flow chart showing an algorithm in accordance with an example embodiment;
  • FIG. 9 is a block diagram showing a decision process in accordance with an example embodiment;
  • FIGS. 10 and 11 are flow charts showing algorithms in accordance with example embodiments;
  • FIG. 12 is a plot showing performance results in accordance with example embodiments;
  • FIG. 13 shows a neural network in accordance with an example embodiment;
  • FIG. 14 is a block diagram of a system in accordance with an example embodiment;
  • FIGS. 15A and 15B show tangible media, respectively a removable non-volatile memory unit and a compact disc (CD) storing computer-readable code which when run by a computer perform operations according to embodiments.
  • FIG. 1 is a block diagram of a system, indicated generally by the reference numeral 10, in accordance with an example embodiment.
  • the system 10 comprises a first user device 12, a second user device 13, a third user device 14 and a communication node 16 (such as a base station).
  • Each user device is in two-way communication with the communication node 16 using a protocol that may be the subject of a standard.
  • the implementation, testing and validation of standardized protocols in systems such as the system 10 is a costly activity. It is possible that a communication protocol learned by a radio node may be used more efficiently than an expert-coded protocol. Radio nodes trained this way may be freed from human intuitions and may thus be able to minimize control-plane traffic in an unprecedented fashion.
  • FIG. 2 is a block diagram of a system, indicated generally by the reference numeral 20, in accordance with an example embodiment.
  • the system 20 comprises a user equipment 21 (which is an example of the user devices 12, 13 and 14 of the system 10 described above) and a gNB 22 (which is an example of the communication node 16 of the system 10).
  • the gNB 22 includes a medium access control (MAC) layer 24 that is in communication with a similar layer at the user equipment 21.
  • the corresponding layer in the user equipment 21 is a MAC agent 25.
  • the MAC layer 24 and the MAC agent 25 communicate using MAC protocol data units (PDUs) 26.
  • the user equipment 21 may further comprise a radio resource control (RRC) module 28a, a packet data convergence protocol (PDCP) module 28b, a radio link control (RLC) module 28c, and a physical layer (PHY) module 27.
  • the gNB 22 may further comprise a radio resource control (RRC) module 29a, a packet data convergence protocol (PDCP) module 29b, a radio link control (RLC) module 29c, and a physical layer (PHY) module 29d.
  • the MAC agent 25 is an intelligent agent that can be trained to communicate with the MAC layer 24 using reinforcement learning or similar techniques. The training may take many forms.
  • the MAC agent 25 may use tabular methods (e.g. dynamic programming) or deep learning techniques by means of a deep neural network. Tabular methods may be appropriate in small state spaces, where they can converge quickly when learning simple MAC protocols (e.g. for sensor networks). Deep learning techniques are typically more computationally expensive, but may provide more degrees of freedom for learning complex protocols (e.g. for cellular networks).
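  • As an illustration of a tabular method, the following minimal sketch applies a single Q-learning update to a table of action values. Q-learning is chosen here only as a familiar example of a tabular reinforcement learning technique; the specification names dynamic programming as its example, and the state labels, action set, learning rate and discount factor below are all assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value.

    alpha (learning rate) and gamma (discount factor) are illustrative values.
    """
    # Best value achievable from the next state under the current table.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action agent: 0 = do nothing, 1 = act.
ACTIONS = (0, 1)
q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0

first = q_learning_update(q_table, "idle", 1, reward=1.0,
                          next_state="idle", actions=ACTIONS)
second = q_learning_update(q_table, "idle", 1, reward=1.0,
                           next_state="idle", actions=ACTIONS)
```

  • The table holds one entry per (state, action) pair, which is why this approach only remains practical when the state space is small.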
  • a MAC layer encapsulates upper-layer Service Data Units (SDUs) into Protocol Data Units (PDUs) by adding a header to the SDUs.
  • the MAC agent 25 may be configured to learn how to build the PDU headers progressively through interaction with an expert radio counterpart (i.e. the MAC layer 24).
  • a training method may allow the MAC agent 25 to update its header-construction strategy progressively. Through repeated interaction with the gNB 22, the UE 21 can develop a fully learned PDU-construction policy.
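  • The encapsulation step that the agent learns to perform can be pictured with a small sketch. The two-byte header used here (a logical channel identifier followed by a length) is a deliberately simplified, hypothetical layout chosen for illustration; it is not the header format of any standard, and the function names are assumptions.

```python
import struct


def build_subpdu(lcid: int, sdu: bytes) -> bytes:
    """Prepend an illustrative 2-byte header (LCID, length) to an SDU."""
    if not 0 <= lcid <= 0xFF:
        raise ValueError("lcid must fit in one byte")
    if len(sdu) > 0xFF:
        raise ValueError("sdu too long for the one-byte length field")
    header = struct.pack("!BB", lcid, len(sdu))
    return header + sdu


def parse_subpdu(pdu: bytes):
    """Recover (lcid, sdu) from a PDU produced by build_subpdu."""
    lcid, length = struct.unpack("!BB", pdu[:2])
    return lcid, pdu[2:2 + length]


pdu = build_subpdu(4, b"hello")
```

  • A learned policy would replace the fixed choice of header fields with values selected by the agent, while the receiving MAC layer continues to parse the result as usual.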
  • FIG. 3 is a block diagram of a system, indicated generally by the reference numeral 30, in accordance with an example embodiment.
  • the system 30 comprises a MAC layer 36 (such as the MAC layer 24 described above) that is in communication with a plurality of MAC agents 32, 33 and 34 (each of which may be similar to the MAC agent 25 described above).
  • FIG. 4 is a flow chart showing an algorithm, indicated generally by the reference numeral 40, in accordance with an example embodiment.
  • the algorithm 40 may be implemented using the system 30 described above.
  • the algorithm 40 may be used for training a first model at a first module of a user equipment (e.g. the MAC agent 25 of the UE 21).
  • the training may be performed based at least in part on information related to one or more protocols.
  • the first module may be implemented at a media access control layer of the user equipment.
  • the algorithm 40 starts at operation 42, where information related to one or more protocols is obtained, for example, at the MAC agent 25.
  • the information may be received from a first communication node (e.g. gNB 22).
  • the gNB 22 may comprise protocol information at a MAC layer (e.g. MAC layer 24 or 36) of the gNB received from one or more of a plurality of MAC agents (e.g. MAC agents 32 to 34 shown in FIG. 3).
  • at operation 44, one of a plurality of first actions may be performed, for example, at a first time interval.
  • the plurality of first actions may be signalling actions that comprise generating a protocol data unit.
  • the first time interval may correspond to a transmission time interval (TTI), such that one of a plurality of first actions is performed at each TTI.
  • the first time interval may be a periodic time interval.
  • at operation 46, one of a plurality of second actions may be performed, for example, at the first time interval.
  • the plurality of second actions may be physical layer (PHY) actions that comprise transmitting a protocol data unit (e.g. on an uplink shared channel (UL-SCH)).
  • the second action may be performed at the physical (PHY) layer 27 of the UE 21.
  • the physical layer actions may further comprise any possible interactions between the MAC agent 25 and the PHY layer 27, for example defined by an application programming interface for configuring physical channels and/or configuring the PHY layer 27.
  • the second actions may comprise configuring a new physical channel (e.g. random access channel (RACH)) for transmitting the protocol data unit.
  • the first time interval may correspond to a transmission time interval (TTI), such that one of a plurality of second actions is performed at each TTI.
  • the first time interval may be a periodic time interval.
  • the one of the plurality of first actions and/or the one of the plurality of second actions are selected based on one or more policies.
  • at operation 48, outputs relating to one or more protocol data units may be generated using the first model.
  • the outputs may be related to protocol data units for a protocol for which information is received from the first communication node at the operation 42.
  • the training process is described in further detail below.
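  • The four operations of the algorithm 40 can be sketched as a loop with one iteration per transmission time interval. This is only an outline under stated assumptions: the two policy functions, the field names `sdu_waiting` and `uplink_grant`, and the shape of the output records are hypothetical and stand in for whatever the trained first model would provide.

```python
def run_algorithm_40(protocol_info_per_tti, signalling_policy, phy_policy):
    """Run the loop once per TTI and collect one output record per TTI."""
    outputs = []
    for tti, info in enumerate(protocol_info_per_tti):
        # Operation 42: protocol information is obtained for this TTI.
        first_action = signalling_policy(info)   # operation 44: signalling action
        second_action = phy_policy(info)          # operation 46: PHY action
        # Operation 48: generate an output relating to this TTI.
        outputs.append({"tti": tti,
                        "first_action": first_action,
                        "second_action": second_action})
    return outputs


def example_signalling_policy(info):
    """Hypothetical rule: build a PDU (action 1) whenever an SDU is waiting."""
    return 1 if info["sdu_waiting"] else 0


def example_phy_policy(info):
    """Hypothetical rule: transmit (action 1) whenever a grant is available."""
    return 1 if info["uplink_grant"] else 0


trace = run_algorithm_40(
    [{"sdu_waiting": True, "uplink_grant": False},
     {"sdu_waiting": False, "uplink_grant": True}],
    example_signalling_policy,
    example_phy_policy,
)
```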
  • FIG. 5 is a block diagram of a system, indicated generally by the reference numeral 50, in accordance with an example embodiment.
  • the system 50 comprises a user equipment 51 (e.g. similar to the user equipment 21) and a gNB 52 (similar to gNB 22).
  • the user equipment 51 comprises a MAC agent 53 (similar to the MAC agent 25) and a physical layer module 54 (similar to the physical layer module 27).
  • the gNB 52 comprises a MAC layer 55 (similar to the MAC layer 24).
  • the MAC layer 55 and the MAC agent 53 may communicate using MAC protocol data units (PDUs) 57.
  • the MAC agent 53 comprises a first model 64 that may be used for generating (e.g.
  • the first model 64 may be a machine learning model (e.g. a neural network).
  • the MAC layer 55 is also in communication with a plurality of other MAC agents 56a to 56c, which plurality of other MAC agents 56a to 56c are comprised within a plurality of other user devices respectively.
  • the user equipment 51 may further comprise a radio resource control (RRC) module 68a (similar to the RRC module 28a), a packet data convergence protocol (PDCP) module 68b (similar to the PDCP module 28b), and a radio link control (RLC) module 68c (similar to the RLC module 28c).
  • the gNB 52 may further comprise a radio resource control (RRC) module 69a (similar to the RRC module 29a), a packet data convergence protocol (PDCP) module 69b (similar to the PDCP module 29b), a radio link control (RLC) module 69c (similar to the RLC module 29c), and a physical layer (PHY) module 69d (similar to the PHY module 29d).
  • the MAC agent 53 may, at a first time interval (e.g. TTI), execute a signalling action (e.g. first action, operation 44) and a physical layer action (e.g. second action, operation 46).
  • signalling actions are executed by the MAC agent 53 to construct zero or one MAC PDUs in each TTI.
  • the signalling actions may include the following:
  • Signalling Action 1: Build one MAC PDU and save it to a queue
  • signalling actions may comprise compound actions made of a function identifier $s^0 \in \mathcal{A}_s$ and a sequence of $L$ arguments $s^1, s^2, \ldots, s^L$.
  • the remaining arguments may indicate how many MAC service data units (SDUs) are to be multiplexed, what Logical Channel Identities (LCIDs) to use, how to configure the MAC Control Elements (CEs) in the various MAC subPDUs, or may indicate any other information regarding the MAC PDU structure.
  • the execution of signalling actions may not affect the physical layer (PHY) or the network in any significant way.
  • PDUs that are built as part of the signalling actions may be generated and stored in a queue. Once the PDUs are stored, one or more PDUs may be transmitted based on a physical layer (PHY) policy of the MAC agent.
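  • The separation between building a PDU and transmitting it can be sketched with a first-in, first-out queue. The class and method names are hypothetical, and the fixed `b"HDR"` prefix merely stands in for whatever header the signalling action would construct.

```python
from collections import deque


class PduQueue:
    """Holds PDUs built by signalling actions until a PHY action sends them."""

    def __init__(self):
        self._queue = deque()

    def build_and_store(self, sdu: bytes) -> None:
        """Signalling action: build one PDU and save it to the queue."""
        self._queue.append(b"HDR" + sdu)  # placeholder header, for illustration

    def transmit_next(self):
        """PHY action: hand over the oldest queued PDU, or None if empty."""
        return self._queue.popleft() if self._queue else None

    def __len__(self):
        return len(self._queue)


queue = PduQueue()
queue.build_and_store(b"a")
queue.build_and_store(b"b")
sent = queue.transmit_next()
```

  • Because building a PDU only touches the queue, a signalling action has no effect on the physical layer until a PHY action is selected to transmit.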
  • the signalling action to be performed (e.g. Signalling Action ‘0’ or Action ‘1’)
  • PHY actions are executed (e.g. second action, operation 46) by the MAC agent 53 to transmit zero or one MAC PDUs in each TTI.
  • let $\mathcal{A}_p$ denote the set of second actions that the MAC agent 53 can invoke on the underlying physical layer (PHY) 54.
  • This set of second actions may be a-priori and may be defined by a physical layer (PHY) application programming interface (API).
  • $\mathcal{A}_p$ may include second actions (e.g. uplink actions) such as:
  • PHY Action 0: No action
  • PHY Action 1: Transmission of the next MAC PDU on the UL-SCH
  • PHY Action 2: Configuring a new physical channel (e.g. RACH) for transmission
  • the MAC agent 53 can invoke these actions by means of the PHY API, which may provide a set of logical functions with configurable arguments.
  • each of the above second actions may therefore be considered as a compound action made of a function identifier $p^0 \in \mathcal{A}_p$ and a sequence of $L$ arguments $p^1, p^2, \ldots, p^L$.
  • for example, if action $p^0 = 1$ (e.g. Action 1, transmission of the next MAC PDU on the UL-SCH) is chosen, the argument $p^1$ may indicate the first argument demanded by the PHY API for this function call.
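  • A compound action of this kind can be represented as a function identifier plus a tuple of arguments, which is then dispatched to a function of the PHY API. The sketch below assumes a toy API: the three functions, their arguments and their return strings are hypothetical placeholders, not an actual PHY interface.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompoundAction:
    """A function identifier followed by a sequence of arguments."""
    function_id: int
    arguments: Tuple[int, ...] = ()


def no_action() -> str:
    return "no action"


def transmit_next_pdu(resource: int) -> str:
    # 'resource' is a made-up argument standing in for a first API argument.
    return f"transmit on resource {resource}"


def configure_channel(channel_id: int, period: int) -> str:
    return f"configure channel {channel_id} with period {period}"


# Hypothetical PHY API: function identifier -> callable.
PHY_API = {0: no_action, 1: transmit_next_pdu, 2: configure_channel}


def invoke(action: CompoundAction, api=PHY_API) -> str:
    """Look up the function by identifier and pass the arguments through."""
    return api[action.function_id](*action.arguments)


result = invoke(CompoundAction(function_id=1, arguments=(3,)))
```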
  • the PHY action to be performed (e.g. PHY Action ‘0’, Action ‘1’, or Action ‘2’) may be selected based on a PHY usage policy $\pi_p$ and its equivalent action-value function $Q_p(o, p)$.
  • information of the selected PHY action may be provided to the PHY layer 54, such that the selected PHY action may be performed at the PHY layer 54.
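  • One common way to turn an action-value function into a policy is epsilon-greedy selection, sketched below. The specification does not state which selection rule is used, so the rule itself, the dictionary representation of the action values and the observation labels are assumptions made for illustration.

```python
import random


def select_phy_action(q_values, observation, actions, epsilon=0.0, rng=random):
    """Pick a PHY action: explore with probability epsilon, else act greedily.

    q_values maps (observation, action) pairs to estimated values; pairs that
    are missing are treated as having value 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda p: q_values.get((observation, p), 0.0))


PHY_ACTIONS = (0, 1, 2)  # no action, transmit, configure a new channel
q_p = {("busy", 0): 0.5, ("busy", 1): -1.0, ("idle", 1): 2.0}

busy_choice = select_phy_action(q_p, "busy", PHY_ACTIONS)
idle_choice = select_phy_action(q_p, "idle", PHY_ACTIONS)
```

  • With `epsilon=0.0` the selection is purely greedy; a non-zero value would let the agent occasionally try other actions during training.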
  • the received protocol information (e.g. received at operation 42) may be provided to the first model 64, such that the first model 64 may use the received protocol information to generate the outputs (e.g. at operation 48).
  • the above protocol information may be aggregated into an observation vector $o \in \mathcal{O}$ which may comprise one or more of:
  • information of service data units (SDUs) received (e.g. as shown by the arrow 59) from a radio link control (RLC) layer of the UE 51 (e.g. MAC SDUs deposited by the RLC layer on logical channels);
  • information of protocol data units received (e.g. as shown by the arrow 60) from the gNB 52 (e.g. MAC PDUs received from the gNB through the PHY API);
  • radio measurements received (e.g. as shown by the arrow 61) from a physical layer (PHY) of the UE 51.
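  • The aggregation into an observation vector can be sketched as follows. The four fields mirror the kinds of protocol information named in this specification (SDU information, received PDU information, radio measurements and the state of the first module), but the specific field names, units and encodings are assumptions.

```python
from typing import List, NamedTuple


class Observation(NamedTuple):
    """Hypothetical per-TTI observation assembled by the MAC agent."""
    queued_sdus: int             # SDUs deposited by the RLC layer
    last_downlink_pdu_type: int  # coarse code for the last PDU from the gNB
    rsrp_dbm: float              # an example radio measurement from the PHY
    agent_state: int             # internal state of the first module

    def as_vector(self) -> List[float]:
        """Flatten the fields into a numeric vector for the first model."""
        return [float(value) for value in self]


obs = Observation(queued_sdus=2, last_downlink_pdu_type=1,
                  rsrp_dbm=-95.5, agent_state=0)
vector = obs.as_vector()
```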
  • the MAC agent 53 may receive a reward (e.g. a scalar reward) when a PHY action is performed.
  • the reward may be calculated (e.g. at a reward calculation module 63) based on one or more reward schemes, depending on one or more objectives to be achieved.
  • the rewards may comprise information of one or more of:
  • FIG. 6 is a flowchart of an algorithm, indicated generally by the reference numeral 65, in accordance with an example embodiment.
  • the algorithm 65 starts at operation 66, where the outputs generated at operation 48, for example by the MAC agent 53, may be transmitted to a server (e.g. a central machine learning server).
  • the server may receive outputs relating to one or more protocol data units from a plurality of MAC agents (similar to the MAC agent 53 and/or MAC agents 56a to 56c) of a plurality of UEs respectively.
  • the server may have a large dataset for training a model (e.g. machine learning model) for generating policies, which policies may then be provided to the plurality of MAC agents.
  • the MAC agent 53 may receive one or more policies from the server. The MAC agent 53 may then use the policies received from the server for performing one or more signalling actions (e.g. operation 44) and/or one or more PHY actions (e.g. operation 46).
  • the server may also distribute the policies to the plurality of MAC agents 56a to 56c.
  • One or more of the plurality of MAC agents 56a to 56c may then use the policies received from the server for performing one or more signalling actions (e.g. operation 44) and/or one or more PHY actions (e.g. operation 46).
  • FIG. 7 is a block diagram of a system, indicated generally by the reference numeral 70, in accordance with an example embodiment.
  • the system 70 comprises a server 71 (e.g. a machine learning server), one or more routers 72, a plurality of MAC agents 73a to 73c (e.g. similar to the MAC agent 53 and/or 56a to 56c, where the plurality of MAC agents 73a to 73c may be comprised in a plurality of UEs respectively), a channel 74, and a gNB 75 (e.g. similar to the gNB 52).
  • the server 71 may receive information regarding protocol data units from one or more of the plurality of MAC agents 73 via the router 72.
  • One or more of the plurality of MAC agents 73 may be in communication with a MAC layer (e.g. MAC layer 55) of the gNB 75, for example, for receiving (e.g. operation 42) information regarding one or more protocols.
  • system 70 may be used for implementing an algorithm described below with reference to FIG. 8.
  • FIG. 8 is a flowchart of an algorithm, indicated generally by the reference numeral 80, in accordance with an example embodiment.
  • the operations of the algorithm 80 may be performed at a server, such as the server 71.
  • the algorithm starts at operation 82, where information relating to protocol data units is retrieved.
  • the information may be received at the server from a plurality of MAC agents 73a to 73c (e.g. similar to the MAC agent 53 and/or 56a to 56c) of a plurality of UEs.
  • the retrieved information may be provided to a second model (e.g. a machine learning model).
  • the second model may be configured to generate one or more policies (e.g. signalling policies or PHY policies) for distribution to one or more user equipment.
  • the one or more policies may be distributed to one or more of the plurality of user equipment.
  • the distribution may be performed at a second time interval (e.g. a transmission time interval or a periodic time interval).
  • the policies may then be used by MAC agents of the one or more user equipment (e.g. MAC agents 73a to 73c, similar to the MAC agent 53 and/or 56a to 56c) to generate protocol data units (e.g. by performing signalling actions and/or PHY actions based on the policies).
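The server-side flow of the algorithm 80 (retrieve information from the MAC agents, generate policies with the second model, distribute them) can be sketched as below. The class `PolicyServer`, its method names and the stub training function are hypothetical; the actual second model and transport are not specified at this level of the text.

```python
from typing import Any, Callable, Dict, List

class PolicyServer:
    """Hypothetical central server: collects agent outputs, trains, publishes policies."""

    def __init__(self, train_fn: Callable[[List[Any]], Dict[str, Any]]):
        self._train_fn = train_fn          # stands in for the second model's training step
        self._samples: List[Any] = []      # information received from MAC agents
        self._subscribers: Dict[str, Callable[[Dict[str, Any]], None]] = {}

    def register(self, agent_id: str, deliver: Callable[[Dict[str, Any]], None]) -> None:
        """Register a callback used to deliver policies to one MAC agent."""
        self._subscribers[agent_id] = deliver

    def receive(self, agent_id: str, info: Any) -> None:
        """Store information relating to protocol data units (operation 82)."""
        self._samples.append((agent_id, info))

    def train_and_distribute(self) -> Dict[str, Any]:
        """Generate policies from the stored samples and send them to every agent."""
        policies = self._train_fn(self._samples)
        for deliver in self._subscribers.values():
            deliver(policies)
        return policies

# Usage with two agents whose "inboxes" are plain lists.
inboxes = {"73a": [], "73b": []}
server = PolicyServer(
    train_fn=lambda samples: {"pi_p": "stub", "pi_s": "stub", "n_samples": len(samples)}
)
for agent_id, inbox in inboxes.items():
    server.register(agent_id, inbox.append)
server.receive("73a", "pdu-1")
server.receive("73b", "pdu-2")
published = server.train_and_distribute()
```

Calling `train_and_distribute()` once per distribution interval would correspond to the periodic distribution described above; every registered agent receives the same policy object.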
  • MAC agents 53, 73 may be trained in a wireless environment with a plurality of MAC agents, where the plurality of MAC agents may be in communication with an ‘expert’ MAC counterpart, such as a commercial gNB (e.g. gNB 52, 75).
  • Experience samples, such as PDUs generated by a first model at the MAC agents, may be transmitted to a server.
  • the server may then use the PDUs to train a second model at the server for outputting a PHY usage policy $\pi_p$ and a signalling policy $\pi_s$, which policies may be periodically distributed to the MAC agents.
  • the second model may be trained using a reinforcement learning (RL) algorithm implemented by the server.
  • more advanced RL algorithms may be used for the training of the second model.
  • FIG. 9 is a block diagram showing a decision process, indicated generally by the reference numeral 90, in accordance with an example embodiment.
  • the decision process 90 comprises a Markov Decision Process. For example, on every Transmission Time Interval (TTI) $t$, each MAC agent may obtain an observation $o_t$ (e.g. operation 42) and choose a PHY action $p_t$ (e.g. PHY action performed at operation 46) and a signalling action $s_t$ (e.g. signalling action performed at operation 44); it then receives a reward $r_{t+1}$ and a new observation $o_{t+1}$.
  • the observation $o_t$ may include a downlink protocol data unit (DL PDU) received from the expert MAC counterpart (e.g.
  • the tuple $(o_t, p_t, s_t, r_{t+1}, o_{t+1})$ is denoted as an experience sample of the underlying Markov Decision Process (MDP). The index $t$ notation for the time dynamics of the MDP is illustrated in the decision process 90.
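The experience sample and the per-TTI loop of the decision process can be illustrated with a short sketch. The type `Experience`, the function `collect_episode` and the toy policy and environment are assumptions made for this example; here the observation is simply the number of buffered SDUs.

```python
from typing import Callable, Hashable, List, NamedTuple, Tuple

class Experience(NamedTuple):
    """One experience sample (o_t, p_t, s_t, r_{t+1}, o_{t+1})."""
    o_t: Hashable
    p_t: int
    s_t: int
    r_next: float
    o_next: Hashable

def collect_episode(
    env_step: Callable[[Hashable, int, int], Tuple[float, Hashable]],
    policy: Callable[[Hashable], Tuple[int, int]],
    o_0: Hashable,
    n_ttis: int,
) -> List[Experience]:
    """Run one agent for n_ttis TTIs and record an experience sample per TTI."""
    samples = []
    o_t = o_0
    for _ in range(n_ttis):
        p_t, s_t = policy(o_t)                    # PHY action and signalling action
        r_next, o_next = env_step(o_t, p_t, s_t)  # reward and next observation
        samples.append(Experience(o_t, p_t, s_t, r_next, o_next))
        o_t = o_next
    return samples

# Toy stand-ins (hypothetical).
def toy_policy(o_t):
    return (1 if o_t > 0 else 0, 0)  # transmit when data is buffered; no signalling

def toy_env_step(o_t, p_t, s_t):
    delivered = 1 if (p_t == 1 and o_t > 0) else 0
    return float(delivered), o_t - delivered

episode = collect_episode(toy_env_step, toy_policy, o_0=2, n_ttis=3)
```

The resulting list is the kind of batch that could be sent to a central server for training.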
  • FIG. 10 is a flowchart of an algorithm, indicated generally by the reference numeral 100, in accordance with an example embodiment.
  • the algorithm 100 may be performed at a server (e.g. server 71) for training the second model.
  • a batch of experience samples may be retrieved from a database (e.g. database 76).
  • the experience samples may comprise protocol data units received (e.g. operation 82) from a plurality of MAC agents.
  • a training action may be performed, for example, by updating a Q table (e.g. as described above) or by performing a gradient descent update.
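As an illustration of a training action on a batch of experience samples, the sketch below applies the standard tabular Q-learning update, $Q(o,a) \leftarrow Q(o,a) + \alpha\,[r + \gamma \max_{a'} Q(o',a') - Q(o,a)]$. Treating the pair of PHY and signalling actions as one joint action, and the values of `alpha` and `gamma`, are assumptions of this sketch rather than details taken from the text.

```python
from collections import defaultdict

def q_update_batch(q, batch, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update per sample in `batch`.

    q       : mapping (observation, action) -> value, defaulting to 0.0
    batch   : iterable of (o_t, a_t, r_next, o_next) tuples
    actions : all actions available in every observation (assumed fixed)
    """
    for o_t, a_t, r_next, o_next in batch:
        best_next = max(q[(o_next, a)] for a in actions)
        td_error = r_next + gamma * best_next - q[(o_t, a_t)]
        q[(o_t, a_t)] += alpha * td_error
    return q

q_table = defaultdict(float)
ACTIONS = (0, 1)  # hypothetical joint-action indices
batch = [("s0", 1, 1.0, "s1"), ("s0", 1, 1.0, "s1")]
q_update_batch(q_table, batch, ACTIONS)
```

After the two identical samples, the stored value moves from 0 to 0.1 and then to 0.19, showing how repeated rewards gradually raise the estimate.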
  • FIG. 11 is a flowchart of an algorithm, indicated generally by the reference numeral 110, in accordance with an example embodiment.
  • the operations of the algorithm 110 may be performed at a communication node (e.g. an expert MAC counterpart, such as gNB 22, 52, and/or 75).
  • the algorithm 110 starts at operation 112, where the communication node receives protocol data units, for example, from a plurality of user devices (e.g. plurality of UEs comprising MAC agents).
  • the information related to the received plurality of protocol data units may be aggregated to generate information (e.g. observations) relating to one or more protocols.
  • the information related to the one or more protocols may be transmitted to one or more of the plurality of user equipment based on the received plurality of protocol data units.
  • the algorithm 110 may enable a user device to obtain protocol information from the communication node, which protocol information comprises aggregated information from a plurality of user devices. Therefore, the pool of information (e.g. observations) available to each user device may be significantly larger than what each user device could collect on its own. This may improve the training of the models at the MAC agents of the individual UEs.
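The three steps of the algorithm 110 (receive PDUs from several UEs, aggregate, transmit the aggregate back) might look as follows. The text does not say what the aggregation computes, so the per-UE counts used here, together with the function names `aggregate_protocol_info` and `transmit_to_ues`, are purely illustrative assumptions.

```python
from typing import Dict, Iterable, List

def aggregate_protocol_info(received: Dict[str, List[bytes]]) -> Dict[str, object]:
    """Aggregate PDUs received from several UEs into shared protocol information.

    Hypothetical aggregate: PDU counts per UE, the total, and which UEs were active.
    """
    pdus_per_ue = {ue: len(pdus) for ue, pdus in received.items()}
    return {
        "pdus_per_ue": pdus_per_ue,
        "total_pdus": sum(pdus_per_ue.values()),
        "active_ues": sorted(ue for ue, n in pdus_per_ue.items() if n > 0),
    }

def transmit_to_ues(info: Dict[str, object], ue_ids: Iterable[str]) -> Dict[str, object]:
    """Model the transmission as an outbox mapping each UE to the same information."""
    return {ue: info for ue in ue_ids}

received = {"ue1": [b"\x01", b"\x02"], "ue2": [], "ue3": [b"\x03"]}
info = aggregate_protocol_info(received)
outbox = transmit_to_ues(info, received.keys())
```

Each UE, including one that sent nothing, receives information derived from all UEs, which is the point made above about the larger pool of observations.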
  • the above example embodiments provide decentralized execution (e.g. each UE individually performing signalling actions or PHY actions) but centralized learning (e.g. the server training the second model centrally, and deploying the policies from the trained second model to a plurality of UEs), which may assist in avoiding non-stationarity.
  • only one model e.g. second model at the server
  • This approach may provide the advantage of enabling cross-vendor training, where the state transitions collected from interactions between the MAC agents and expert MACs from different vendors may be jointly used for training in the server (e.g. server 71, central machine learning server).
  • the MAC agents at each UE may have some memory capacity to mitigate partial observability (where MAC agents do not have access to the observations of other agents). Techniques to deal with this may include memory-capable neural networks such as Long Short-Term Memories (LSTMs) or using a state representation that contains past observations and actions. Further, the reward scheme in the MAC agents provides the same reward to all MAC agents to avoid agent competition degrading network performance.
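A state representation that contains past observations and actions can be sketched with a fixed-length history buffer. The class name `HistoryState` and the history depth are assumptions; an LSTM would be the learned alternative mentioned above and is not shown.

```python
from collections import deque

class HistoryState:
    """Hypothetical agent memory: the last `depth` (observation, action) pairs."""

    def __init__(self, depth=2):
        self._history = deque(maxlen=depth)  # oldest entries are dropped automatically

    def push(self, observation, action):
        """Record what was observed and what was done in the current time step."""
        self._history.append((observation, action))

    def key(self):
        """Return a hashable state, usable as an index into a tabular model."""
        return tuple(self._history)

memory = HistoryState(depth=2)
memory.push("o0", 0)
memory.push("o1", 1)
memory.push("o2", 0)
state = memory.key()
```

Only the two most recent steps are kept, so the size of the state space stays bounded while the agent still sees some of its own past.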
  • FIG. 12 is a plot, indicated generally by the reference numeral 120, showing performance results in accordance with example embodiments.
  • the plot 120 shows results from experiments that have been performed with 2 UE MAC agents and one gNB MAC expert.
  • the dashed line 121 may represent the value of the reward per episode or iteration that may allow a model to generate optimal policy(ies).
  • SDU uplink service data units
  • BLER block error rate
  • PDUs protocol data units
  • the gNB selected a random schedule and communicated it to the UEs via pre-defined signalling unknown to the UEs.
  • the MAC agents at the UEs were trained using a variant of tabular Q-learning [9], where their Q tables at time t are conditioned not only on the current observation delivered by the environment, but also on the PDUs received (e.g. operation 42) from the gNB and on the MAC agent’s internal state (h) in the previous time step.
  • the MAC agents were not built with any expert knowledge about the MAC signalling received from the gNB and assigned no semantics to it.
  • the MAC agents learned to interpret gNB signalling as Scheduling Grants and to adjust their transmissions accordingly in a collision-avoidance manner. For example, approximately 25000 training episodes may suffice for the MAC agents to learn to interpret the gNB Scheduling Grants and successfully deliver their uplink SDUs (for example, in only 4 time steps, as shown in FIG. 12).
  • Table 1 below shows the messages exchanged by a trained MAC agent (e.g. MAC agent 25, 32 to 33, 53, 56, and/or 73) and an expert MAC counterpart (e.g. MAC layer 24 of gNB 22, MAC layer 55 of gNB 52, and/or a MAC layer of gNB 75).
  • the observation o t received by the agent at each time step may indicate the number of SDUs available in the MAC agent’s transmitting buffer.
  • the downlink (DL) PDUs received by the MAC agent from the gNB are denoted by m t .
  • the lack of random uplink (UL) SDU arrivals may render Scheduling Requests useless, as the MAC agent may know that there’s always just 1 SDU to transmit.
  • FIG. 13 shows a neural network, indicated generally by the reference numeral 130, used in some example embodiments.
  • One or more of a first model e.g. implemented at the MAC agent of a UE
  • a second model e.g. implemented at a central machine learning server
  • the neural network 130 comprises an input layer 131, one or more hidden layers 132, and an output layer 133.
  • the hidden layers 132 may comprise a plurality of hidden nodes, where the processing may be performed based on the inputs received.
  • at the output layer 133, one or more outputs may be generated.
  • the neural network 130 when used to implement the first model (e.g. at the MAC agent 25 or 53), the neural network 130 may be trained, at a training stage, with inputs comprising information related to one or more protocols received (e.g. operation 42) from a first communication node (e.g. the expert MAC system, gNB 22 or 52). For example, during an inference stage, at the input layer 131, similar protocol information may be received (e.g. operation 42).
  • the hidden layers 132 may perform processing (e.g. performing a signalling action at operation 44 and a PHY action at operation 46), and outputs, such as the protocol data units, may be generated (e.g. operation 48) at the output layer 133.
  • the neural network 130 when used to implement the second model (e.g. at the server 71), the neural network 130 may be trained, at a training stage, with inputs comprising information relating to protocol data units, wherein the information is received (e.g. operation 82) from a plurality of user equipment.
  • the inputs may be received at the input layer 131.
  • the hidden layers 132 may perform processing, and outputs, such as signalling policies and PHY policies, may be generated (e.g. at operation 84) at the output layer 133.
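The input, hidden and output layer structure of the neural network 130 can be illustrated with a dependency-free forward pass. The layer sizes, weights and activation functions below are arbitrary assumptions chosen so the arithmetic is easy to follow; training is not shown.

```python
def relu(x):
    return max(0.0, x)

def identity(x):
    return x

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ inputs + b), row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Propagate `inputs` through a list of (weights, biases, activation) layers."""
    values = inputs
    for weights, biases, activation in layers:
        values = dense(values, weights, biases, activation)
    return values

# Two inputs -> one hidden layer of two nodes -> one output (hypothetical sizes).
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer
    ([[1.0, 2.0]], [0.1], identity),                # output layer
]
outputs = forward([2.0, 1.0], layers)
```

For the input `[2.0, 1.0]` the hidden layer produces `[1.0, 1.5]` and the output layer combines them into a single value of about 4.1; in a model of the kind described, such outputs could be read as action values or policy parameters.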
  • FIG. 14 is a schematic diagram of components of one or more of the example embodiments described previously, which hereafter are referred to generically as a processing system 300.
  • the processing system 300 may, for example, be the apparatus referred to in the claims below.
  • the processing system 300 may have a processor 302, a memory 304 closely coupled to the processor and comprised of a RAM 314 and a ROM 312, and, optionally, a user input 310 and a display 318.
  • the processing system 300 may comprise one or more network/apparatus interfaces 308 for connection to a network/apparatus, e.g. a modem which may be wired or wireless.
  • the network/apparatus interface 308 may also operate as a connection to other apparatus, such as a device/apparatus which is not a network side apparatus. Thus, direct connection between devices/apparatus without network participation is possible.
  • the processor 302 is connected to each of the other components in order to control operation thereof.
  • the memory 304 may comprise a non-volatile memory, such as a hard disk drive (HDD) or a solid state drive (SSD).
  • the ROM 312 of the memory 304 stores, amongst other things, an operating system 315 and may store software applications 316.
  • the RAM 314 of the memory 304 is used by the processor 302 for the temporary storage of data.
  • the operating system 315 may contain code which, when executed by the processor, implements aspects of the algorithms 40, 65, 80, 100, and 110 described above. Note that in a small device/apparatus the memory may be of a kind suited to small-size usage, i.e. a hard disk drive (HDD) or a solid state drive (SSD) is not always used.
  • the processor 302 may take any suitable form. For instance, it may be a microcontroller, a plurality of microcontrollers, a processor, or a plurality of processors.
  • the processing system 300 may be a standalone computer, a server, a console, or a network thereof.
  • the processing system 300 and needed structural parts may be all inside device/apparatus such as IoT device/apparatus i.e. embedded to very small size.
  • the processing system 300 may also be associated with external software applications. These may be applications stored on a remote server device/apparatus and may run partly or exclusively on the remote server device/apparatus. These applications may be termed cloud-hosted applications.
  • the processing system 300 may be in communication with the remote server device/apparatus in order to utilize the software application stored there.
  • FIGS. 15A and 15B show tangible media, respectively a removable memory unit 365 and a compact disc (CD) 368, storing computer-readable code which when run by a computer may perform methods according to example embodiments described above.
  • the removable memory unit 365 may be a memory stick, e.g. a USB memory stick, having internal memory 366 storing the computer-readable code.
  • the internal memory 366 may be accessed by a computer system via a connector 367.
  • the CD 368 may be a
  • Tangible media can be any device/apparatus capable of storing data/information which data/information can be exchanged between devices/apparatus/network.
  • Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic.
  • the software, application logic and/or hardware may reside on memory, or any computer media.
  • the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media.
  • a “memory” or “computer-readable medium” may be any non-transitory media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer.
  • Reference to, where relevant, “computer-readable medium”, “computer program product”, “tangibly embodied computer program” etc., or a “processor” or “processing circuitry” etc. should be understood to encompass not only computers having differing architectures such as single/multi-processor architectures and sequencers/parallel architectures, but also specialised circuits such as field programmable gate arrays (FPGA), application specific integrated circuits (ASIC), signal processing devices/apparatus and other devices/apparatus.
  • references to computer program, instructions, code etc. should be understood to express software for a programmable processor, or firmware such as the programmable content of a hardware device/apparatus, whether as instructions for a processor or as configuration settings for a fixed function device/apparatus, gate array, programmable logic device/apparatus, etc.
  • the different functions discussed herein may be performed in a different order and/or concurrently with each other.
  • one or more of the above-described functions may be optional or may be combined.
  • the flow diagrams of Figures 4, 6, 8, 10, and 11 are examples only, and various operations depicted therein may be omitted, reordered and/or combined.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention relates to an apparatus, a system, a method and a computer program comprising: training a first model at a first module of a user equipment (UE) based, at least in part, on information relating to one or more protocols, the information being received from a first communication node, and the first model being configured to perform at a first time interval: a first action of a plurality of first actions, the plurality of first actions comprising generating a protocol data unit; and a second action of a plurality of second actions, the plurality of second actions comprising transmitting a protocol data unit, wherein said first action of the plurality of first actions and/or said second action of the plurality of second actions are selected based on one or more policies; and generating, using the first model, outputs relating to one or more protocol data units.
PCT/EP2020/065632 2020-06-05 2020-06-05 Communication system WO2021244756A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/065632 WO2021244756A1 (fr) 2020-06-05 2020-06-05 Communication system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/065632 WO2021244756A1 (fr) 2020-06-05 2020-06-05 Communication system

Publications (1)

Publication Number Publication Date
WO2021244756A1 true WO2021244756A1 (fr) 2021-12-09

Family

ID=71016546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/065632 WO2021244756A1 (fr) 2020-06-05 2020-06-05 Communication system

Country Status (1)

Country Link
WO (1) WO2021244756A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080103738A1 (en) * 2006-10-27 2008-05-01 Chandrashekar Karthikeyan Modeling and simulating wireless mac protocols
US20190138362A1 (en) * 2017-11-03 2019-05-09 Salesforce.Com, Inc. Dynamic segment generation for data-driven network optimizations
WO2020068127A1 (fr) * 2018-09-28 2020-04-02 Ravikumar Balakrishnan System and method using collaborative learning of interference environment and network topology for autonomous spectrum sharing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20731062

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20731062

Country of ref document: EP

Kind code of ref document: A1