WO2022060777A1 - Online reinforcement learning - Google Patents

Online reinforcement learning

Info

Publication number
WO2022060777A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
ric
host
training
processing circuitry
Application number
PCT/US2021/050379
Other languages
French (fr)
Inventor
Jaemin HAN
Meryem Simsek
Shu-Ping Yeh
Dawei YING
Jingwen BAI
Hosein Nikopour
Oner Orhan
Leifeng RUAN
Original Assignee
Intel Corporation
Priority date
2020-09-17
Application filed by Intel Corporation
Publication of WO2022060777A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 92/00 Interfaces specially adapted for wireless communication networks
    • H04W 92/04 Interfaces between hierarchically different network devices
    • H04W 92/12 Interfaces between hierarchically different network devices between access points and access point controllers

Definitions

  • aspects pertain to wireless communications. Some aspects relate to wireless networks including 3GPP (Third Generation Partnership Project) networks, 3GPP LTE (Long Term Evolution) networks, 3GPP LTE-A (LTE Advanced) networks, unlicensed LTE networks (MulteFire, LTE-U), and fifth-generation (5G) networks including 5G new radio (NR) (or 5G-NR) networks, 5G-LTE networks such as 5G NR unlicensed spectrum (NR-U) networks, and other unlicensed networks including Wi-Fi, CBRS (OnGo), etc.
  • 5G new radio (5G-NR) networks will continue to evolve based on 3GPP LTE-Advanced with additional potential new radio access technologies (RATs) to enrich people’s lives with seamless wireless connectivity solutions delivering fast, rich content and services.
  • Potential LTE operation in the unlicensed spectrum includes (and is not limited to) the LTE operation in the unlicensed spectrum via dual connectivity (DC), or DC-based LAA, and the standalone LTE system in the unlicensed spectrum, according to which LTE-based technology solely operates in the unlicensed spectrum without requiring an “anchor” in the licensed spectrum, called MulteFire.
  • MulteFire combines the performance benefits of LTE technology with the simplicity of Wi-Fi-like deployments.
  • FIG. 1 illustrates an example Open RAN (O-RAN) system architecture.
  • FIG. 2 illustrates a logical architecture of the O-RAN system of FIG. 1.
  • FIG. 3 illustrates a system where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments.
  • FIG. 4 illustrates a system where a non-RT RIC acts as the ML training host and a near-RT RIC acts as the ML inference host, in accordance with some embodiments.
  • FIG. 5 illustrates a system for online reinforcement learning, in accordance with some embodiments.
  • FIG. 6 illustrates a method for online reinforcement learning, in accordance with some embodiments.
  • FIG. 7 illustrates a method for online reinforcement learning, in accordance with some embodiments.
  • FIG. 1 provides a high-level view of an Open RAN (O-RAN) architecture 100.
  • the O-RAN architecture 100 includes four O-RAN defined interfaces - namely, the A1 interface, the O1 interface, the O2 interface, and the Open Fronthaul Management (M)-plane interface - which connect the Service Management and Orchestration (SMO) framework 102 to O-RAN network functions (NFs) 104 and the O-Cloud 106.
  • the SMO 102 (described in Reference [R13]) also connects with an external system 110, which provides enrichment data to the SMO 102.
  • the A1 interface terminates at an O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 112 in or at the SMO 102 and at the O-RAN Near-RT RIC 114 in or at the O-RAN NFs 104.
  • the O-RAN NFs 104 can be virtual network functions (VNFs) such as virtual machines (VMs) or containers, sitting above the O-Cloud 106 and/or Physical Network Functions (PNFs) utilizing customized hardware. All O-RAN NFs 104 are expected to support the O1 interface when interfacing with the SMO framework 102.
  • the O-RAN NFs 104 connect to the NG-Core 108 via the NG interface (which is a 3GPP defined interface).
  • the Open Fronthaul M-plane interface between the SMO 102 and the O-RAN Radio Unit (O-RU) 116 supports the O-RU 116 management in the O-RAN hybrid model as specified in Reference [R16].
  • the Open Fronthaul M-plane interface is an optional interface to the SMO 102 that is included for backward compatibility purposes as per Reference [R16] and is intended for management of the O-RU 116 in hybrid mode only.
  • the management architecture of flat mode (see Reference [R12]) and its relation to the O1 interface for the O-RU 116 is in development.
  • FIG. 2 shows an O-RAN logical architecture 200 corresponding to the O-RAN architecture 100 of FIG. 1.
  • the SMO 202 corresponds to the SMO 102
  • O-Cloud 206 corresponds to the O-Cloud 106
  • the non-RT RIC 212 corresponds to the non-RT RIC 112
  • the near-RT RIC 214 corresponds to the near-RT RIC 114
  • the O-RU 216 corresponds to the O-RU 116 of FIG. 1, respectively.
  • the O-RAN logical architecture 200 includes a radio portion and a management portion.
  • the management portion/side of the architecture 200 includes the SMO Framework 202 containing the non-RT RIC 212, and may include the O-Cloud 206.
  • the O-Cloud 206 is a cloud computing platform including a collection of physical infrastructure nodes to host the relevant O-RAN functions (e.g., the near-RT RIC 214, O-RAN Central Unit-Control Plane (O-CU-CP) 221, O-RAN Central Unit-User Plane (O-CU-UP) 222, and the O-RAN Distributed Unit (O-DU) 215), supporting software components (e.g., OSs, VMMs, container runtime engines, ML engines, etc.), and appropriate management and orchestration functions.
  • the radio portion/side of the logical architecture 200 includes the near-RT RIC 214, the O-DU 215, the O-RAN Radio Unit (O-RU) 216, the O-CU-CP 221, and the O-CU-UP 222 functions.
  • the radio portion/side of the logical architecture 200 may also include the O-e/gNB 210.
  • the O-DU 215 is a logical node hosting Radio Link Control (RLC), media access control (MAC), and higher physical (PHY) layer entities/elements (High-PHY layers) based on a lower layer functional split.
  • the O-RU 216 is a logical node hosting lower PHY layer entities/elements (Low-PHY layer) (e.g., FFT/iFFT, PRACH extraction, etc.) and RF processing elements based on a lower layer functional split. Virtualization of O-RU 216 is FFS.
  • the O-CU-CP 221 is a logical node hosting the RRC and the control plane (CP) part of the PDCP protocol.
  • the O-CU-UP 222 is a logical node hosting the user plane part of the PDCP protocol and the SDAP protocol.
  • An E2 interface terminates at a plurality of E2 nodes.
  • the E2 nodes are logical nodes/entities that terminate the E2 interface.
  • the E2 nodes include the O-CU-CP 221, O-CU-UP 222, O-DU 215, or any combination of elements as defined in Reference [R15].
  • the E2 nodes include the O-e/gNB 210.
  • the E2 interface also connects the O-e/gNB 210 to the Near-RT RIC 214.
  • the protocols over E2 interface are based exclusively on Control Plane (CP) protocols.
  • the E2 functions are grouped into the following categories: (a) near-RT RIC 214 services (REPORT, INSERT, CONTROL and POLICY, as described in Reference [R15]); and (b) near-RT RIC 214 support functions, which include E2 Interface Management (E2 Setup, E2 Reset, Reporting of General Error Situations, etc.) and Near-RT RIC Service Update (e.g., capability exchange related to the list of E2 Node functions exposed over E2).
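
The four near-RT RIC service types lend themselves to a small enumeration. The sketch below is illustrative only: the service names come from the grouping above (per Reference [R15]), while the one-line glosses reflect how these services are commonly described and are assumptions here, not text from this patent.

```python
from enum import Enum

# Near-RT RIC services exposed over E2 (names per the grouping above).
class E2Service(Enum):
    REPORT = "REPORT"    # E2 node reports requested information to the near-RT RIC
    INSERT = "INSERT"    # E2 node suspends a procedure and defers to the near-RT RIC
    CONTROL = "CONTROL"  # near-RT RIC commands an action at the E2 node
    POLICY = "POLICY"    # near-RT RIC installs a policy the E2 node applies autonomously
```
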
  • FIG. 2 shows the Uu interface between a UE 201 and O-e/gNB 210 as well as between the UE 201 and O-RAN components.
  • the Uu interface is a 3GPP defined interface (see e.g., sections 5.2 and 5.3 of Reference [R07]), which includes a complete protocol stack from L1 to L3 and terminates in the NG-RAN or E-UTRAN.
  • the O-e/gNB 210 is an LTE eNB (see Reference [R04]), a 5G gNB or ng-eNB (see Reference [R06]) that supports the E2 interface.
  • the O-e/gNB 210 may be the same or similar as discussed in FIGS. 3-7.
  • the UE 201 may correspond to UEs discussed with respect to FIGS. 3-7 and/or the like. There may be multiple UEs 201 and/or multiple O-e/gNBs 210, each of which may be connected to one another via respective Uu interfaces. Although not shown in FIG. 2, the O-e/gNB 210 supports O-DU 215 and O-RU 216 functions with an Open Fronthaul interface between them.
  • the Open Fronthaul (OF) interface(s) include(s) the Control User Synchronization (CUS) Plane and Management (M) Plane.
  • FIGS. 1 and 2 also show that the O-RU 216 terminates the OF M-Plane interface towards the O-DU 215 and optionally towards the SMO 202 as specified in Reference [R16].
  • the O-RU 216 terminates the OF CUS-Plane interface towards the O-DU 215 and the SMO 202.
  • the F1-c interface connects the O-CU-CP 221 with the O-DU 215.
  • the F1-c interface is between the gNB-CU-CP and gNB-DU nodes (see References [R07] and [R10]).
  • the F1-c interface is adopted between the O-CU-CP 221 and the O-DU 215 functions while reusing the principles and protocol stack defined by 3GPP and the definition of interoperability profile specifications.
  • the F1-u interface connects the O-CU-UP 222 with the O-DU 215.
  • the F1-u interface is between the gNB-CU-UP and gNB-DU nodes (see References [R07] and [R10]).
  • the F1-u interface is adopted between the O-CU-UP 222 and the O-DU 215 functions while reusing the principles and protocol stack defined by 3GPP and the definition of interoperability profile specifications.
  • the NG-c interface is defined by 3GPP as an interface between the gNB-CU-CP and the AMF in the 5GC (see Reference [R06]).
  • the NG-c is also referred to as the N2 interface (see Reference [R06]).
  • the NG-u interface is defined by 3GPP as an interface between the gNB-CU-UP and the UPF in the 5GC (see Reference [R06]).
  • the NG-u interface is referred to as the N3 interface (see Reference [R06]).
  • NG-c and NG-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
  • the X2-c interface is defined in 3GPP for transmitting control plane information between eNBs or between eNB and en-gNB in EN-DC.
  • the X2-u interface is defined in 3GPP for transmitting user plane information between eNBs or between eNB and en-gNB in EN-DC (see e.g., [005], [006]).
  • X2-c and X2-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
  • the Xn-c interface is defined in 3GPP for transmitting control plane information between gNBs, ng-eNBs, or between an ng-eNB and gNB.
  • the Xn-u interface is defined in 3GPP for transmitting user plane information between gNBs, ng-eNBs, or between ng-eNB and gNB (see e.g., References [R06] and [R08]).
  • Xn-c and Xn-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
  • the E1 interface is defined by 3GPP as being an interface between the gNB-CU-CP (e.g., gNB-CU-CP 3728) and gNB-CU-UP (see e.g., [007], [009]).
  • E1 protocol stacks defined by 3GPP are reused and adapted as being an interface between the O-CU-CP 221 and the O-CU-UP 222 functions.
  • the O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 212 is a logical function within the SMO framework 102, 202 that enables non-real-time control and optimization of RAN elements and resources; AI/machine learning (ML) workflow(s) including model training, inferences, and updates; and policy-based guidance of applications/features in the Near-RT RIC 214.
  • the O-RAN near-RT RIC 214 is a logical function that enables near-real-time control and optimization of RAN elements and resources via fine-grained data collection and actions over the E2 interface.
  • the near-RT RIC 214 may include one or more AI/ML workflows including model training, inferences, and updates.
  • the non-RT RIC 212 can be an ML training host to host the training of one or more ML models.
  • the ML data can be collected from one or more of the following: the Near-RT RIC 214, O-CU-CP 221, O-CU-UP 222, O-DU 215, O-RU 216, external enrichment source 110 of FIG. 1, and so forth.
  • the ML training host and/or ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214.
  • the ML training host and ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214.
  • the ML training host and ML inference host/actor are co-located as part of the near-RT RIC 214.
  • the non-RT RIC 212 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed.
  • the non-RT RIC 212 provides a query-able catalog for an ML designer/developer to publish/install trained ML models (e.g., executable software components).
  • the non-RT RIC 212 may provide a discovery mechanism to determine whether a particular ML model can be executed in a target ML inference host (MF), and what number and type of ML models can be executed in the target ML inference host.
  • the Near-RT RIC 214 is a managed function (MF).
  • three types of ML catalogs may be made discoverable by the non-RT RIC 212: a design-time catalog (e.g., residing outside the non-RT RIC 212 and hosted by some other ML platform(s)), a training/deployment-time catalog (e.g., residing inside the non-RT RIC 212), and a run-time catalog (e.g., residing inside the non-RT RIC 212).
  • the non-RT RIC 212 supports necessary capabilities for ML model inference in support of ML assisted solutions running in the non-RT RIC 212 or some other ML inference host. These capabilities enable executable software to be installed such as VMs, containers, etc.
  • the non-RT RIC 212 may also include and/or operate one or more ML engines, which are packaged software executable libraries that provide methods, routines, data types, etc., used to run ML models.
  • the non-RT RIC 212 may also implement policies to switch and activate ML model instances under different operating conditions.
  • the non-RT RIC 212 is able to access feedback data (e.g., FM, PM, and network KPI statistics) over the O1 interface on ML model performance and perform necessary evaluations. If the ML model fails during runtime, an alarm can be generated as feedback to the non-RT RIC 212. How well the ML model is performing in terms of prediction accuracy or other operating statistics it produces can also be sent to the non-RT RIC 212 over O1.
  • the non-RT RIC 212 can also scale ML model instances running in a target MF over the O1 interface by observing resource utilization in the MF.
  • the environment where the ML model instance is running (e.g., the MF) monitors resource utilization of the running ML model.
  • the scaling mechanism may include a scaling factor such as a number, percentage, and/or other like data used to scale up/down the number of ML instances.
  • ML model instances running in the target ML inference hosts may be automatically scaled by observing resource utilization in the MF. For example, the Kubernetes® (K8s) runtime environment typically provides an auto-scaling feature; a sketch of such scaling logic follows below.
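
As a rough illustration of the scaling behavior described above, the following sketch derives a new instance count from observed resource utilization in the MF. The thresholds, the single utilization metric, and the function name are assumptions made for illustration; O-RAN does not specify this logic.

```python
# Illustrative sketch (not O-RAN-specified): scale the number of ML model
# instances in a managed function (MF) from observed resource utilization,
# similar in spirit to the auto-scaling a K8s runtime provides.
def scale_instances(current_instances: int,
                    utilization: float,           # observed utilization in [0.0, 1.0]
                    scale_up_threshold: float = 0.8,
                    scale_down_threshold: float = 0.3,
                    scaling_factor: float = 1.5) -> int:
    """Return the new number of ML model instances for the target MF."""
    if utilization > scale_up_threshold:
        # Scale up by the scaling factor (a number/percentage per the text above).
        return max(current_instances + 1, int(current_instances * scaling_factor))
    if utilization < scale_down_threshold and current_instances > 1:
        # Scale down, but always keep at least one running instance.
        return max(1, int(current_instances / scaling_factor))
    return current_instances

# Example: 4 instances at 90% utilization scale up to 6.
assert scale_instances(4, 0.9) == 6
```
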
  • the A1 interface is between the non-RT RIC 212 (which is within or at the SMO 202) and the near-RT RIC 214.
  • the A1 interface supports three types of services as defined in Reference [R14], including a Policy Management Service, an Enrichment Information Service, and an ML Model Management Service.
  • A1 policies have the following characteristics compared to persistent configuration as defined in Reference [R14]: A1 policies are not critical to traffic; A1 policies have temporary validity; A1 policies may handle individual UE or dynamically defined groups of UEs; A1 policies act within and take precedence over the configuration; and A1 policies are non-persistent, i.e., do not survive a restart of the near-RT RIC.
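
To make these characteristics concrete, here is a hypothetical A1 policy object for a dynamically defined group of UEs. All field names are invented for illustration and do not follow the A1 policy type schemas of Reference [R14].

```python
# Hypothetical A1 policy instance (all field names are illustrative only).
a1_policy = {
    "policy_id": "qos-boost-001",
    "scope": {"ue_group": "cell-edge-ues"},         # individual UE or dynamic UE group
    "statement": {"target_dl_throughput_mbps": 50},
    "valid_for_seconds": 600,                       # temporary validity, not critical to traffic
    "persistent": False,                            # does not survive a near-RT RIC restart
}
```
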
  • a technical problem is how to train and maintain good AI/ML models to be used by an inference host to perform E2 control and other controls.
  • the disclosed examples address this issue by including two training hosts: one in the non-RT RIC and one in the near-RT RIC.
  • the training host in the near-RT RIC performs online learning, which may use different data than the offline learning, and ensures that an adequate AI/ML model is being used by the inference host.
  • the training host of the non-RT RIC performs offline learning and transfers an initial model and updated models to a model repository that is used to store AI/ML models that may be used by the training host of the near-RT RIC.
  • a deployment of online reinforcement learning in the Near-RT RIC is disclosed.
  • the AI/ML training host and inference host are located in the Near-RT RIC, while an offline learning host and ML model repository reside in the Non-RT RIC.
  • the deployment reduces communication and feedback delay between the ML training host and the ML inference host.
  • the delay reduction is essential for online reinforcement learning, especially for generating fast changing decision-making policies that adapt to highly dynamic environments.
  • the ML model repository ensures performance of the online reinforcement learning by saving the most accurate and best performing ML models.
  • Examples disclose a deployment scenario for online reinforcement learning in the Near-RT RIC, which incorporates an online training host and inference host in the Near-RT RIC, while an offline training host and ML model repository reside in SMO/Non-RT RIC.
  • FIG. 3 illustrates a system 300 where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments.
  • ML training information 322 is collected from the DU/O-CU 332 over the E2 interface and/or O1 interface and sent to data management 308.
  • ML online information 324 is collected from the E2 interface and/or O1 interface and sent to data management 308.
  • Data management 308 sends the information to the ML training 316 and ML inference 314.
  • the ML inference 314 uses a model and sends output to configuration management 306 (if a DU or CU is the subject of the action).
  • the ML inference 314 sends policy/intent 304 to the near-RT RIC 302 (if the near-RT RIC is the subject of the action).
  • the O1 management (MGMT) 310 sends data enrichment 330 (and deploy instructions and models).
  • the non-RT RIC 312 includes the ML training 316 and ML inference 314.
  • the ML inference 314 sends performance data 338 to the ML training 316.
  • the ML training 316 sends deploy instructions and models 336 to the ML inference 314.
  • FIG. 4 illustrates a system 400 where a non-RT RIC acts as the ML training and a near-RT RIC acts as the ML inference, in accordance with some embodiments.
  • the ML inference 314 resides in the near-RT RIC rather than the non-RT RIC 312.
  • O1 management sends ML deploy 404 instructions or models to the ML inference 314.
  • the same numbers as FIG. 3 are meant to indicate the same or similar information and/or function.
  • FIG. 5 illustrates a system 500 for online reinforcement learning, in accordance with some embodiments.
  • the performance feedback 530 is training data for online training, e.g., rewards, environment states, performed actions, and so forth, and data for performance monitoring.
  • the training host (online learning) (“training host near-RT”) 514 is configured for online learning.
  • the training host near-RT 514 resides in the near real-time RIC 510.
  • the inference host 512 is an AI/ML inference host.
  • the inference host 512 resides in the near-RT RIC.
  • the training host (offline) (“training host non-RT”) 506 is a training host for offline learning.
  • the training host non-RT 506 resides in the SMO or non-RT RIC 504.
  • the model repository 508 is an AI/ML model repository.
  • the model repository 508 resides in the SMO/non-RT RIC (“non-RT RIC”) 504.
  • the training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning.
  • the training host non-RT 506 trains the initial model based on the offline training.
  • the training host non-RT 506 transfers, via move model 518, the initial model 534 (an offline-trained model) to the model repository as a trained model 536.
  • the ML learning data 516 (e.g., O-CU/O-DU data collected over O1 for offline training as performed by training host non-RT 506) and the O-CU/O-DU data collected over E2 (e.g., inference data 526 for online learning by training host near-RT 514 and/or inference use by inference host 512) may be different.
  • the model repository 508 is associated with or stored in the SMO/Non-RT RIC 504.
  • the trained model 536 may be trained, validated, and tested.
  • the trained model 536 may be the initial model 534 transferred to the model repository 508.
  • the stored model, e.g., trained model 536, may be tested to be well-performing and may be used as a backup for online learning, in case the running model 538 drifts too much and leads to severe degradation.
  • the model repository 508 sends out a model download 522 notification to the training host near-RT 514, which is associated with or located in the near-RT RIC 510, and the model repository 508 sends the model, e.g., trained model 536, to the training host near-RT 514.
  • the model repository 508 receives a model download request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 sends, in response, a model, e.g., trained model 536 or updated model 540, to the training host near-RT 514.
  • the model repository 508 receives a model upload request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 receives a model upload 524 comprising the updated model 540 from the training host near-RT 514.
  • the model download 522 and the model upload 524 between the model repository 508 associated with or in the SMO/Non-RT RIC 504 and the training host near-RT 514 associated with or in the near-RT RIC 510 are communicated over the A1 interface.
  • the request model message for a model download 522 and the notification message for a model upload 524 are part of the A1-ML service.
  • in other embodiments, the model download 522 and the model upload 524 are communicated over the O1 interface.
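
The repository interactions above can be collected into a small sketch. The class and method names are assumptions made for illustration, and the transport (A1-ML or O1 messages) is abstracted away.

```python
# Minimal sketch of the model repository in the SMO/non-RT RIC 504
# (names are illustrative; A1/O1 transport is abstracted away).
class ModelRepository:
    def __init__(self):
        self.models = {}  # model name -> stored model object

    def store(self, name, model):
        """Store an offline-trained model (e.g., trained model 536 via move model 518)."""
        self.models[name] = model

    def notify_and_send(self, training_host, name):
        """Push path: send a model download 522 notification, then the model itself."""
        training_host.on_download_notification(name)
        return self.models[name]

    def handle_download_request(self, name):
        """Pull path: the near-RT training host requests a model download 522."""
        return self.models.get(name)

    def handle_upload(self, name, updated_model):
        """The near-RT training host performs a model upload 524 (e.g., updated model 540)."""
        self.models[name] = updated_model
```
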
  • the training host near-RT 514 in the near-RT RIC 510 collects learning data for online reinforcement learning from E2 nodes over the E2 interface and from the ML inference host (“inference host”) 512, e.g., the performance feedback 530, using an application program interface (API) of the near-RT RIC 510.
  • the training host near-RT 514 updates the model, e.g., running model 538, based on the above training data for online learning, and it deploys the AI/ML model, e.g., running model 538, to the inference host 512.
  • the training host near-RT 514 associated with or in the near-RT RIC 510 receives a model download notification from the model repository 508, and it receives the model, e.g., trained model 536 or updated model 540, from the model repository 508.
  • the training host near-RT 514 in the near-RT RIC 510 communicates or sends out a model download 522 request to the model repository 508, and the training host near-RT 514 receives the model from the model repository 508.
  • the training host near-RT 514 associated with or in the Near-RT RIC 510 sends out a model upload 524 request to the model repository 508, and the training host near-RT 514 sends the model to the model repository 508.
  • the training host near-RT 514 in the near-RT RIC 510 receives AI/ML performance feedback 530 from the inference host 512. If the training host near-RT 514 detects severe performance degradation, then the training host near-RT 514 can send a model download 522 request to the model repository 508. After a model download 522 of a previously well-performing model, e.g., updated model 540, from the model repository 508, the training host near-RT 514 communicates a model deploy 532 with this backup model to the inference host 512.
  • the training host near-RT 514 associated with or in the near-RT RIC 510 determines whether to communicate a model upload 524 of the latest trained model, e.g., trained running model 542, to the model repository 508.
  • a validation and testing procedure for the model is performed by the training host near-RT 514 associated with or in the near-RT RIC 510 before uploading; a sketch of this training host logic follows below.
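
The near-RT training host behavior in the preceding bullets can likewise be sketched. The degradation test, the threshold, and all names are illustrative assumptions rather than specified behavior, and the online-update step is left as a placeholder.

```python
# Illustrative near-RT RIC training host (reuses the ModelRepository sketch above).
class NearRtTrainingHost:
    def __init__(self, repository, inference_host, degradation_threshold=0.2):
        self.repo = repository
        self.inference_host = inference_host
        self.degradation_threshold = degradation_threshold
        self.running_model = None

    def on_download_notification(self, name):
        """Accept a model pushed by the repository (model download 522)."""
        self.running_model = self.repo.handle_download_request(name)

    def on_feedback(self, feedback):
        """Handle AI/ML performance feedback 530 from the inference host."""
        if feedback["performance_drop"] > self.degradation_threshold:
            # Severe degradation: fall back to a previously well-performing model.
            self.running_model = self.repo.handle_download_request("backup")
        else:
            # Otherwise keep learning online from the feedback/training data.
            self.running_model = self.online_update(self.running_model, feedback)
            if feedback["performance_drop"] < 0:  # the model actually improved
                # Validate/test, then upload the latest trained model (model upload 524).
                self.repo.handle_upload("backup", self.running_model)
        self.deploy(self.running_model)

    def online_update(self, model, feedback):
        """One online RL update from rewards/states/actions (placeholder)."""
        return model

    def deploy(self, model):
        """Model deploy 532 of the running model to the inference host."""
        self.inference_host.running_model = model
```
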
  • the inference host 512 is associated with or resides in the near-RT RIC 510.
  • the inference host 512, which may be termed an ML inference host, collects data, e.g., inference data 526, from E2 nodes over the E2 interface for inference.
  • the inference host 512 uses the model, e.g., the running model 538, deployed by the training host near-RT 514 associated with or residing in the Near-RT RIC 510 to infer E2 control, e.g., to generate decision-making policies.
  • the inference host 512 enforces the control actions/guidance via the E2 interface.
  • the inference host 512 sends performance feedback and training data 530 for online learning to the training host near-RT 514 in the near-RT RIC 510.
  • the SMO/non-RT RIC 504 and the near-RT RIC 510 may communicate over the O1/A1 interfaces.
  • the ML learning data 516 is data for offline training.
  • Performance feedback 530 includes data for online training.
  • the ML learning data 516, the inference data 526, and/or the performance feedback 530 may include one or more of the following (a sketch of such a record follows below): the size and number of downlink (DL) physical resource blocks (PRBs) used for data traffic; the size and number of uplink (UL) PRBs used for data traffic; an average DL user equipment (UE) throughput in a next generation Node-B (gNB) of the O-RAN network; an average UL UE throughput in the gNB; a number of protocol data unit (PDU) sessions requested for setup in the O-RAN network; a number of PDU sessions successfully set up in the O-RAN network; and/or a number of PDU sessions that failed to set up in the O-RAN network.
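
The listed data fields map naturally onto a record type. A minimal sketch follows; the patent names these quantities but no schema, so the field names here are assumptions.

```python
from dataclasses import dataclass

# Illustrative record of the O-RAN KPI fields listed above (field names assumed).
@dataclass
class RanKpiSample:
    dl_prbs_used: int                  # size/number of DL PRBs used for data traffic
    ul_prbs_used: int                  # size/number of UL PRBs used for data traffic
    avg_dl_ue_throughput_mbps: float   # average DL UE throughput in the gNB
    avg_ul_ue_throughput_mbps: float   # average UL UE throughput in the gNB
    pdu_sessions_requested: int        # PDU sessions requested for setup
    pdu_sessions_succeeded: int        # PDU sessions successfully set up
    pdu_sessions_failed: int           # PDU sessions that failed to set up
```
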
  • FIG. 6 illustrates a method 600 for online reinforcement learning, in accordance with some embodiments.
  • the method 600 begins at operation 602 (or step 1) with collecting training data for offline learning.
  • the training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning.
  • the method 600 continues at operation 604 (or step 2) with performing offline learning.
  • the training host non-RT 506 trains the initial model, e.g., initial model 534, based on the offline training using the ML learning data 516.
  • the method 600 continues at operation 606 (or step 3) with moving the initial model.
  • the initial model 534 is moved from the training host non-RT 506 (offline learning) to the model repository 508 in the non-RT RIC 504 as a trained model 536.
  • the method 600 continues at operation 608 (or step 4) with downloading the model to a near real time training host.
  • the AI/ML model, e.g., trained model 536, is downloaded to the training host near-RT 514 associated with or residing in the near-RT RIC 510.
  • the method 600 continues at operation 610 (or step 5) with deploying the model.
  • the AI/ML model, e.g., running model 538, trained running model 542, trained model 536, or updated model 540, is deployed to the inference host 512, e.g., an xApp.
  • the method 600 continues at operation 612 (or step 6) with collecting inference data.
  • inference data 526 is collected from E2 nodes via the E2 interface, e.g., E2 of FIGS. 1-4.
  • the method 600 continues at operation 614 (or step 7) with generating decision-making policies.
  • the ML inference host, e.g., inference host 512, generates decision-making policies using the ML model, e.g., running model 538, and the inference data, e.g., inference data 526.
  • the method 600 continues at operation 616 (or step 8) with enforcing E2 control actions/guidance via the E2 interface.
  • the ML inference host, e.g., inference host 512, enforces E2 control actions/guidance via the E2 interface, e.g., E2 control 528.
  • the method 600 continues at operation 618 (or step 9) with collecting training data over the E2 interface.
  • the training data, e.g., performance data 530, is collected by the training host, e.g., training host near-RT 514, in the near-RT RIC 510.
  • the method 600 continues at operation 620 (or step 10) with providing feedback.
  • the ML inference host, e.g., inference host 512, provides performance feedback and online training data, e.g., performance feedback 530, to the training host, e.g., training host near-RT 514, associated with or residing in the near-RT RIC 510.
  • the method 600 continues at operation 622 (or step 11) with performing online learning.
  • the training host, e.g., training host near-RT 514, in the near-RT RIC 510 performs online learning based on the online learning data, e.g., performance data 530, from the inference host, e.g., inference host 512, and the E2 nodes.
  • the method 600 continues at operation 624 (or step 12) with deploying the updated model.
  • the training host, e.g., training host near-RT 514, in the near-RT RIC, e.g., near-RT RIC 510, deploys, e.g., via model deploy 532, the updated model, e.g., trained running model 542, which becomes the running model 538, to the ML inference host, e.g., inference host 512.
  • after operation 624, the method 600 returns to operation 612.
  • the method 600 may return to operation 612 during operation of the inference host 512.
  • the method 600 may continue to operation 626 when the training host 514 determines that a new AI/ML model should be used.
  • the method 600 may return to operation 612 after operation 628.
  • the method 600 continues at operation 626 (or step 13) with sending an upload request.
  • if the training host, e.g., the training host near-RT 514, associated with or residing in the near-RT RIC 510 detects that the running model 538 performs well (based on performance feedback data), the training host near-RT 514 may send a model upload request to the model repository 508, and the updated AI/ML model, e.g., trained running model 542, is uploaded and stored in the model repository, e.g., as updated model 540.
  • the method 600 continues at operation 628 (or step 14) with requesting a model download.
  • the training host, e.g., training host near-RT 514, in the near-RT RIC, e.g., near-RT RIC 510, may request a model download from the model repository 508.
  • the training host 514 and/or inference host 512 may detect performance degradation that is above a threshold value and send a request for a different model.
  • the training host near-RT 514 may use the trained running model 542 or may request a model from the model repository 508, e.g., trained model 536 or updated model 540, which may be communicated or sent to the training host near-RT 514 via model download 522.
  • the training host near-RT 514 may select which model to deploy in the inference host 512 based on performance data 530 and/or previous performance information associated with the other models, e.g., running model 538, trained running model 542, trained model 536, or updated model 540.
  • the method 600 may include one or more additional operations.
  • the operations of method 600 may be performed in a different order.
  • One or more of the operations of method 600 may be optional.
  • Different steps may be performed by different functional entities such as training host 506, model repository 508, inference host 512, and/or training host 514.
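
Steps 5 through 12 of method 600 form a closed loop between the inference host and the near-RT training host. The sketch below strings them together, reusing the NearRtTrainingHost sketch above; the remaining function names and data shapes are assumptions for illustration.

```python
# Illustrative closed loop for method 600, steps 5-12 (all names assumed).
def online_learning_loop(training_host, inference_host, e2_nodes, iterations=100):
    training_host.deploy(training_host.running_model)        # step 5: deploy model
    for _ in range(iterations):
        inference_data = e2_nodes.collect()                  # step 6: inference data over E2
        actions = inference_host.infer(inference_data)       # step 7: decision-making policies
        e2_nodes.enforce(actions)                            # step 8: E2 control/guidance
        e2_training_data = e2_nodes.collect()                # step 9: training data over E2
        feedback = inference_host.performance_feedback()     # step 10: feedback to training host
        training_host.running_model = training_host.online_update(
            training_host.running_model,
            {**feedback, "e2_data": e2_training_data})       # step 11: online learning
        training_host.deploy(training_host.running_model)    # step 12: deploy updated model
```
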
  • FIG. 7 illustrates a method 700 for online reinforcement learning, in accordance with some embodiments.
  • the method 700 may be performed by a near-RT RIC in an O-RAN.
  • the method 700 begins at operation 702 with receiving a model such as an AI/ML model.
  • the training host 514 of the near-RT RIC 510 may receive a model such as trained model 536, updated model 540, running model 538, and/or initial model 534.
  • the method 700 continues at operation 704 with receiving training data.
  • the training host 514 of the near-RT RIC 510 may receive performance feedback 530 from the inference host 512.
  • the method 700 continues at operation 706 with updating the AI/ML model based on the training data.
  • training host 514 of the near-RT RIC 510 may perform training on the AI/ML model to update the model and generate the trained running model 542 with updates.
  • the training host 506 may be performing offline learning and may update AI/ML models.
  • the data used to update the models may be different as disclosed herein.
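
A minimal sketch of operations 702-706 follows, assuming the model is a table of state-action values and the feedback carries (state, action, reward, next state) tuples; both the representation and the action set are illustrative assumptions, since the patent does not fix either.

```python
# Illustrative method 700: receive a model (702), receive training data (704),
# update the model (706). Assumes a tabular value model and
# (state, action, reward, next_state) tuples in the feedback.
ACTIONS = ["increase_prbs", "decrease_prbs", "hold"]  # hypothetical action set

def update_model(model, feedback, alpha=0.1, gamma=0.9):
    """One online update pass over performance feedback 530 (operation 706)."""
    for state, action, reward, next_state in feedback:
        best_next = max(model.get((next_state, a), 0.0) for a in ACTIONS)
        old = model.get((state, action), 0.0)
        # Standard temporal-difference update toward reward + discounted future value.
        model[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return model

# Example: one feedback tuple nudges the value of the taken action upward.
running_model = update_model({}, [("congested", "increase_prbs", 1.0, "normal")])
assert running_model[("congested", "increase_prbs")] == 0.1
```
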
  • Example 1 includes a deployment scenario for online reinforcement learning where the training host for online learning and the AI/ML inference host reside in the Near-RT RIC, and the training host for offline learning and the AI/ML model repository are in the SMO/Non-RT RIC.
  • Example 2 the subject matter of Example 1 includes where the functionality of the training host in SMO/Non-RT RIC includes the following: (1) collecting learning data for offline reinforcement learning from E2 nodes over the 01 interface; (2) training the initial model based on the offline training; and, (3) transferring the offline trained model to the model repository.
  • Example 3 the subject matter of Examples 1 and 2 includes where the functionality of the model repository in SMO/Non-RT RIC includes the following: (1) storing trained models; (2) sending out model download notifications to the training host in the Near-RT RIC, and sending the model to that training host; (3) receiving model download requests from the training host in the Near-RT RIC, and sending the model to that training host; and (4) receiving the model upload request from the training host in the Near-RT RIC, and receiving the updated model from the training host.
  • Example 4 the subject matter of Examples 1-3 includes where the model download and upload are between the model repository in SMO/Non-RT RIC and ML training host in the Near-RT RIC.
  • the interactions are over the A1 interface, where the request and notification for model download and upload are part of the A1-ML service.
  • in other embodiments, the interactions are over the O1 interface.
  • Example 5 the subject matter of Examples 1-4 includes where the training host in the Near-RT RIC is configured to perform the following: (1) collect part of the learning data for online reinforcement learning from E2 nodes over the E2 interface; (2) collect part of the learning data for online reinforcement learning from the ML inference host over the Near-RT RIC’s internal API; (3) update the model based on the online training data; (4) deploy the AI/ML model to the inference host; (5) receive a model download notification from the model repository and receive the model from the model repository; (6) send out a model download request to the model repository and, in response, receive the model from the model repository; (7) send out a model upload request to the model repository and send the model to the model repository; and (8) receive AI/ML performance feedback from the ML inference host.
  • Example 6 the subject matter of Examples 1-5 includes where the ML inference host in the Near-RT RIC is configured to perform the following: (1) collect data from E2 nodes over the E2 interface for inference; (2) infer E2 control using the model deployed by the training host in the Near-RT RIC; (3) enforce the control actions/guidance via the E2 interface; and (4) send performance feedback and training data for online learning to the training host in the Near-RT RIC.
  • Example 7 includes a method for initial offline training including the following:
  • (1) Step 1: collecting training data from the E2 nodes over the O1 interface to the SMO/Non-RT RIC; (2) Step 2: the training host inside the SMO/Non-RT RIC performing offline learning; (3) Step 3: transferring the trained offline model to the model repository; (4) Step 4: the model repository sending a model download notification to the training host in the Near-RT RIC; and (5) Step 5: downloading the initial model to the training host in the Near-RT RIC.
  • Example 8 includes a method for online reinforcement learning including: (1) Step 1: the training host in the Near-RT RIC deploying the AI/ML model to the inference host; (2) Step 2: the inference host collecting inference data from E2 nodes over the E2 interface; (3) Step 3: the ML inference host performing inference, generating decision-making policies, using the deployed ML model; (4) Step 4: the ML inference host enforcing E2 control action/guidance via the E2 interface; (5) Step 5: the ML inference host providing online training data to the training host in the Near-RT RIC; (6) Step 6: the training host in the Near-RT RIC collecting online training data over the E2 interface from E2 nodes; (7) Step 7: the training host in the Near-RT RIC performing online reinforcement learning and updating the AI/ML model; and (8) Step 8: the training host in the Near-RT RIC deploying the updated model to the ML inference host.
  • Example 9 includes a method for uploading an updated model to the repository, including the following: (1) Step 1: the training host sends a model upload request to the model repository; and (2) Step 2: the training host in the Near-RT RIC uploads the updated model to the repository.
  • Example 10 includes a method for a backup model download from the repository, the method including the following operations: (1) Step 1: the ML inference host providing performance feedback to the training host in the near-RT RIC; (2) Step 2: the training host in the Near-RT RIC detecting performance degradation, which may be greater than average or severe, based on the feedback from the inference host; (3) Step 3: the training host sending a model download request to the model repository; and (4) Step 4: the backup model (previously well-performing model) being downloaded from the repository to the training host in the Near-RT RIC.
  • the term “application” may refer to a complete and deployable package or environment that achieves a certain function in an operational environment.
  • an AI/ML application or the like may be an application that contains some AI/ML models and application-level descriptions.
  • machine learning refers to the use of computer systems implementing algorithms and/or statistical models to perform specific task(s) without using explicit instructions, but instead relying on patterns and inferences.
  • ML algorithms build or estimate mathematical model(s) (referred to as “ML models” or the like) based on sample data (referred to as “training data,” “model training information,” or the like) in order to make predictions or decisions without being explicitly programmed to perform such tasks.
  • an ML algorithm is a computer program that learns from experience with respect to some task and some performance measure, and an ML model may be any object or data structure created after an ML algorithm is trained with one or more training datasets. After training, an ML model may be used to make predictions on new datasets.
  • although the term “ML algorithm” refers to different concepts than the term “ML model,” these terms as discussed herein may be used interchangeably for the purposes of the present disclosure.
  • ML model may also refer to ML methods and concepts used by an ML- assisted solution.
  • An “ML- assisted solution” is a solution that addresses a specific use case using ML algorithms during operation.
  • ML models include supervised learning (e.g., linear regression, k-nearest neighbor (KNN), decision tree algorithms, support vector machines, Bayesian algorithms, ensemble algorithms, etc.), unsupervised learning (e.g., K-means clustering, principal component analysis (PCA), etc.), reinforcement learning (e.g., Q-learning, multi-armed bandit learning, deep RL, etc.), neural networks, and the like.
  • An “ML pipeline” is a set of functionalities, functions, or functional entities specific for an ML-assisted solution; an ML pipeline may include one or several data sources in a data pipeline, a model training pipeline, a model evaluation pipeline, and an actor.
  • the “actor” is an entity that hosts an ML-assisted solution using the output of the ML model inference.
  • ML training host refers to an entity, such as a network function, that hosts the training of the model.
  • ML inference host refers to an entity, such as a network function, that hosts the model during inference mode (which includes both the model execution as well as any online learning, if applicable).
  • the ML-host informs the actor about the output of the ML algorithm, and the actor takes a decision for an action (an “action” is performed by an actor as a result of the output of an ML assisted solution).
  • model inference information refers to information used as an input to the ML model for determining inference(s); the data used to train an ML model and the data used to determine inferences may overlap, however, “training data” and “inference data” refer to different concepts.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An apparatus for a near real-time (RT) radio access network intelligent controller (RIC) in an open radio access network (O-RAN), the apparatus including a training host and an inference host. The training host of the near-RT RIC is configured to train artificial intelligence (AI)/machine learning (ML) models based on performance and feedback data. The training host of the near-RT RIC is configured to send AI/ML models to, and receive AI/ML models from, the model repository of the non-RT RIC. The training host of the near-RT RIC is configured to replace an AI/ML model being used by the inference host if the performance is below a threshold performance. An apparatus for a non-RT RIC in an O-RAN, the apparatus including a training host and a model repository. The training host of the non-RT RIC is configured to train initial models and update models based on ML offline learning data and other data.

Description

ONLINE REINFORCEMENT LEARNING
PRIORITY CLAIM
[0001] This application claims the benefit of priority to United States Provisional Patent Application 63/079,876, filed September 17, 2020, and entitled “DEPLOYMENT SCENARIO FOR ONLINE REINFORCEMENT LEARNING IN NEAR-RT RIC”, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] Aspects pertain to wireless communications. Some aspects relate to wireless networks including 3GPP (Third Generation Partnership Project) networks, 3GPP LTE (Long Term Evolution) networks, 3GPP LTE-A (LTE Advanced) networks, unlicensed LTE networks (MulteFire, LTE-U), and fifth-generation (5G) networks including 5G new radio (NR) (or 5G-NR) networks, 5G-LTE networks such as 5G NR unlicensed spectrum (NR-U) networks and other unlicensed networks including Wi-Fi, CBRS (OnGo), etc. Other aspects are directed to Open RAN (O-RAN) architectures and, more specifically, techniques for reinforcement learning for O-RAN networks.
BACKGROUND
[0003] Mobile communications have evolved significantly from early voice systems to today’s highly sophisticated integrated communication platform. With the increase in different types of devices communicating with various network devices, usage of 3GPP LTE systems has increased. The penetration of mobile devices (user equipment or UEs) in modern society has continued to drive demand for a wide variety of networked devices in many disparate environments. Fifth-generation (5G) wireless systems are forthcoming and are expected to enable even greater speed, connectivity, and usability. Next generation 5G networks are expected to increase throughput, coverage, and robustness and reduce latency and operational and capital expenditures. 5G new radio (5G-NR) networks will continue to evolve based on 3GPP LTE-Advanced with additional potential new radio access technologies (RATs) to enrich people’s lives with seamless wireless connectivity solutions delivering fast, rich content and services. As current cellular network frequency is saturated, higher frequencies, such as millimeter wave (mmWave) frequency, can be beneficial due to their high bandwidth. [0004] Potential LTE operation in the unlicensed spectrum includes (and is not limited to) the LTE operation in the unlicensed spectrum via dual connectivity (DC), or DC-based LAA, and the standalone LTE system in the unlicensed spectrum, according to which LTE-based technology solely operates in the unlicensed spectrum without requiring an “anchor” in the licensed spectrum, called MulteFire. MulteFire combines the performance benefits of LTE technology with the simplicity of Wi-Fi-like deployments.
[0005] Further enhanced operation of LTE and NR systems in the licensed, as well as unlicensed spectrum, is expected in future releases and 5G systems such as O-RAN systems. Such enhanced operations can include techniques for machine learning (ML) for O-RAN networks.
BRIEF DESCRIPTION OF THE FIGURES
[0006] In the figures, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. The figures illustrate generally, by way of example, but not by way of limitation, various aspects discussed in the present document.
[0007] FIG. 1 illustrates an example Open RAN (O-RAN) system architecture.
[0008] FIG. 2 illustrates a logical architecture of the O-RAN system of FIG. 1. [0009] FIG. 3 illustrates a system where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments.
[0010] FIG. 4 illustrates a system where a non-RT RIC acts as the ML training host and a near-RT RIC acts as the ML inference host, in accordance with some embodiments.
[0011] FIG. 5 illustrates a system for online reinforcement learning, in accordance with some embodiments.
[0012] FIG. 6 illustrates a method for online reinforcement learning, in accordance with some embodiments.
[0013] FIG. 7 illustrates a method for online reinforcement learning, in accordance with some embodiments.
DETAILED DESCRIPTION
[0014] The following description and the drawings sufficiently illustrate aspects to enable those skilled in the art to practice them. Other aspects may incorporate structural, logical, electrical, process, and other changes. Portions and features of some aspects may be included in, or substituted for, those of other aspects.
Aspects outlined in the claims encompass all available equivalents of those claims.
[0015] FIG. 1 provides a high-level view of an Open RAN (O-RAN) architecture 100. The O-RAN architecture 100 includes four O-RAN defined interfaces - namely, the A1 interface, the O1 interface, the O2 interface, and the Open Fronthaul Management (M)-plane interface - which connect the Service Management and Orchestration (SMO) framework 102 to O-RAN network functions (NFs) 104 and the O-Cloud 106. The SMO 102 (described in Reference [R13]) also connects with an external system 110, which provides enrichment data to the SMO 102. FIG. 1 also illustrates that the A1 interface terminates at an O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 112 in or at the SMO 102 and at the O-RAN Near-RT RIC 114 in or at the O-RAN NFs 104. The O-RAN NFs 104 can be virtual network functions (VNFs) such as virtual machines (VMs) or containers, sitting above the O-Cloud 106 and/or Physical Network Functions (PNFs) utilizing customized hardware. All O-RAN NFs 104 are expected to support the O1 interface when interfacing with the SMO framework 102. The O-RAN NFs 104 connect to the NG-Core 108 via the NG interface (which is a 3GPP defined interface). The Open Fronthaul M-plane interface between the SMO 102 and the O-RAN Radio Unit (O-RU) 116 supports the O-RU 116 management in the O-RAN hybrid model as specified in Reference [R16]. The Open Fronthaul M-plane interface is an optional interface to the SMO 102 that is included for backward compatibility purposes as per Reference [R16] and is intended for management of the O-RU 116 in hybrid mode only. The management architecture of flat mode (see Reference [R12]) and its relation to the O1 interface for the O-RU 116 is in development. The O-RU 116 terminates the O1 interface towards the SMO 102 as specified in Reference [R12].
[0016] FIG. 2 shows an O-RAN logical architecture 200 corresponding to the O-RAN architecture 100 of FIG. 1. In FIG. 2, the SMO 202 corresponds to the SMO 102, O-Cloud 206 corresponds to the O-Cloud 106, the non-RT RIC 212 corresponds to the non-RT RIC 112, the near-RT RIC 214 corresponds to the near-RT RIC 114, and the O-RU 216 corresponds to the O-RU 116 of FIG. 1, respectively. The O-RAN logical architecture 200 includes a radio portion and a management portion.
[0017] The management portion/side of the architecture 200 includes the SMO Framework 202 containing the non-RT RIC 212, and may include the O-Cloud 206. The O-Cloud 206 is a cloud computing platform including a collection of physical infrastructure nodes to host the relevant O-RAN functions (e.g., the near-RT RIC 214, O-RAN Central Unit-Control Plane (O-CU-CP) 221, O-RAN Central Unit-User Plane (O-CU-UP) 222, and the O-RAN Distributed Unit (O-DU) 215), supporting software components (e.g., OSs, VMMs, container runtime engines, ML engines, etc.), and appropriate management and orchestration functions.
[0018] The radio portion/side of the logical architecture 200 includes the near-RT RIC 214, the O-DU 215, the O-RAN Radio Unit (O-RU) 216, the O-CU-CP 221, and the O-CU-UP 222 functions. The radio portion/side of the logical architecture 200 may also include the O-e/gNB 210.
[0019] The O-DU 215 is a logical node hosting Radio Link Control (RLC), media access control (MAC), and higher physical (PHY) layer entities/elements (High- PHY layers) based on a lower layer functional split. The O-RU 216 is a logical node hosting lower PHY layer entities/elements (Low-PHY layer) (e.g., FFT/iFFT, PRACH extraction, etc.) and RF processing elements based on a lower layer functional split. Virtualization of O-RU 216 is FFS. The O-CU-CP 221 is a logical node hosting the RRC and the control plane (CP) part of the PDCP protocol. The O-CU-UP 222 is a logical node hosting the user plane part of the PDCP protocol and the SDAP protocol.
[0020] An E2 interface terminates at a plurality of E2 nodes. The E2 nodes are logical nodes/entities that terminate the E2 interface. For NR/5G access, the E2 nodes include the O-CU-CP 221, O-CU-UP 222, O-DU 215, or any combination of elements as defined in Reference [R15], For E-UTRA access the E2 nodes include the O-e/gNB 210. As shown in FIG. 2, the E2 interface also connects the O-e/gNB 210 to the Near-RT RIC 214. The protocols over E2 interface are based exclusively on Control Plane (CP) protocols. The E2 functions are grouped into the following categories: (a) near-RT RIC 214 services (REPORT, INSERT, CONTROL and POLICY, as described in Reference [R15]); and (b) near-RT RIC 214 support functions, which include E2 Interface Management (E2 Setup, E2 Reset, Reporting of General Error Situations, etc.) and Near-RT RIC Service Update (e.g., capability exchange related to the list of E2 Node functions exposed over E2).
[0021] FIG. 2 shows the Uu interface between a UE 201 and O-e/gNB 210 as well as between the UE 201 and O-RAN components. The Uu interface is a 3 GPP defined interface (see e.g., sections 5.2 and 5.3 of Reference [R07]), which includes a complete protocol stack from LI to L3 and terminates in the NG-RAN or E-UTRAN. The O-e/gNB 210 is an LTE eNB (see Reference [R04]), a 5G gNB or ng-eNB (see Reference [R06]) that supports the E2 interface. The O-e/gNB 210 may be the same or similar as discussed in FIGS. 3-7. The UE 201 may correspond to UEs discussed with respect to FIGS. 3-7 and/or the like. There may be multiple UEs 201 and/or multiple O-e/gNB 210, each of which may be connected to one another the via respective Uu interfaces. Although not shown in FIG. 2, the O- e/gNB 210 supports O-DU 215 and O-RU 216 functions with an Open Fronthaul interface between them.
[0022] The Open Fronthaul (OF) interfaced) i s/are between O-DU 215 and O- RU 216 functions (see References [R16] and [R17].) The OF interfaced) includes the Control User Synchronization (CUS) Plane and Management (M) Plane. FIGS. 1 and 2 also show that the O-RU 216 terminates the OF M-Plane interface towards the O-DU 215 and optionally towards the SMO 202 as specified in Reference [R16]. The O-RU 216 terminates the OF CUS-Plane interface towards the O-DU 215 and the SMO 202. [0023] The Fl-c interface connects the O-CU-CP 221 with the O-DU 215. As defined by 3 GPP, the Fl-c interface is between the gNB-CU-CP and gNB-DU nodes (see References [R07] and [R10].) However, for purposes of O-RAN, the Fl-c interface is adopted between the O-CU-CP 221 with the O-DU 215 functions while reusing the principles and protocol stack defined by 3 GPP and the definition of interoperability profile specifications.
[0024] The Fl-u interface connects the O-CU-UP 222 with the O-DU 215. As defined by 3 GPP, the Fl-u interface is between the gNB-CU-UP and gNB-DU nodes (see References [R07] and [R10]). However, for purposes of O-RAN, the Fl-u interface is adopted between the O-CU-UP 222 with the O-DU 215 functions while reusing the principles and protocol stack defined by 3GPP and the definition of interoperability profile specifications.
[0025] The NG-c interface is defined by 3GPP as an interface between the gNB- CU-CP and the AMF in the 5GC (see Reference [R06]). The NG-c is also referred as the N2 interface (see Reference [R06]). The NG-u interface is defined by 3GPP, as an interface between the gNB-CU-UP and the UPF in the 5GC (see Reference [R06]). The NG-u interface is referred as the N3 interface (see Reference [R06]). In O-RAN, NG-c and NG-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes. [0026] The X2-c interface is defined in 3GPP for transmitting control plane information between eNBs or between eNB and en-gNB in EN-DC. The X2-u interface is defined in 3GPP for transmitting user plane information between eNBs or between eNB and en-gNB in EN-DC (see e.g., [005], [006]). In O-RAN, X2- c and X2-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
[0027] The Xn-c interface is defined in 3GPP for transmitting control plane information between gNBs, ng-eNBs, or between an ng-eNB and gNB. The Xn-u interface is defined in 3GPP for transmitting user plane information between gNBs, ng-eNBs, or between an ng-eNB and gNB (see e.g., References [R06] and [R08]). In O-RAN, the Xn-c and Xn-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
[0028] The E1 interface is defined by 3GPP as being an interface between the gNB-CU-CP (e.g., gNB-CU-CP 3728) and gNB-CU-UP (see e.g., References [R07] and [R09]). In O-RAN, the E1 protocol stacks defined by 3GPP are reused and adapted as being an interface between the O-CU-CP 221 and the O-CU-UP 222 functions.
[0029] The O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 212 is a logical function within the SMO framework 102, 202 that enables non-real-time control and optimization of RAN elements and resources; AI/machine learning (ML) workflow(s) including model training, inferences, and updates; and policy-based guidance of applications/features in the Near-RT RIC 214.
[0030] The O-RAN near-RT RIC 214 is a logical function that enables near-real-time control and optimization of RAN elements and resources via fine-grained data collection and actions over the E2 interface. The near-RT RIC 214 may include one or more AI/ML workflows including model training, inferences, and updates.
[0031] The non-RT RIC 212 can be an ML training host to host the training of one or more ML models. The ML data can be collected from one or more of the following: the Near-RT RIC 214, O-CU-CP 221, O-CU-UP 222, O-DU 215, O-RU 216, external enrichment source 110 of FIG. 1, and so forth. For supervised learning, the ML training host and/or ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214. For unsupervised learning, the ML training host and ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214. For reinforcement learning, the ML training host and ML inference host/actor are co-located as part of the near-RT RIC 214. In some implementations, the non-RT RIC 212 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed.
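For illustration only, the placement rules above can be captured as a simple lookup. The following is a minimal Python sketch, not part of any O-RAN specification; every name in it is invented for this example.

```python
# Hypothetical encoding of the host-placement rules described above.
# For supervised and unsupervised learning, training and inference hosts may be
# in the non-RT RIC and/or near-RT RIC; for reinforcement learning they are
# co-located in the near-RT RIC.
ALLOWED_HOSTS = {
    "supervised":    {"training": {"non-RT RIC", "near-RT RIC"},
                      "inference": {"non-RT RIC", "near-RT RIC"}},
    "unsupervised":  {"training": {"non-RT RIC", "near-RT RIC"},
                      "inference": {"non-RT RIC", "near-RT RIC"}},
    "reinforcement": {"training": {"near-RT RIC"},
                      "inference": {"near-RT RIC"}},
}

def placement_is_valid(paradigm: str, training_host: str, inference_host: str) -> bool:
    rules = ALLOWED_HOSTS[paradigm]
    ok = training_host in rules["training"] and inference_host in rules["inference"]
    if paradigm == "reinforcement":
        ok = ok and training_host == inference_host  # co-location requirement
    return ok

assert placement_is_valid("reinforcement", "near-RT RIC", "near-RT RIC")
assert not placement_is_valid("reinforcement", "non-RT RIC", "near-RT RIC")
```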
[0032] In some implementations, the non-RT RIC 212 provides a query-able catalog for an ML designer/developer to publish/install trained ML models (e.g., executable software components). In these implementations, the non-RT RIC 212 may provide a discovery mechanism to determine whether a particular ML model can be executed in a target ML inference host (MF), and what number and types of ML models can be executed in the target ML inference host. The Near-RT RIC 214 is a managed function (MF). For example, there may be three types of ML catalogs made discoverable by the non-RT RIC 212: a design-time catalog (e.g., residing outside the non-RT RIC 212 and hosted by some other ML platform(s)), a training/deployment-time catalog (e.g., residing inside the non-RT RIC 212), and a run-time catalog (e.g., residing inside the non-RT RIC 212). The non-RT RIC 212 supports necessary capabilities for ML model inference in support of ML-assisted solutions running in the non-RT RIC 212 or some other ML inference host. These capabilities enable executable software to be installed, such as VMs, containers, etc. The non-RT RIC 212 may also include and/or operate one or more ML engines, which are packaged software executable libraries that provide methods, routines, data types, etc., used to run ML models. The non-RT RIC 212 may also implement policies to switch and activate ML model instances under different operating conditions.
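For illustration only, the catalog and discovery mechanism described above might be sketched as follows. This is a minimal Python sketch; the class and field names (ModelEntry, InferenceHostProfile, Catalog, required_engine, and so on) are invented here and do not come from any O-RAN interface definition.

```python
# Hypothetical sketch of a query-able ML model catalog with a discovery check
# against a target ML inference host (MF). All names are invented.
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    name: str
    version: str
    required_engine: str        # ML engine/library the model needs to run
    cpu_millicores: int         # illustrative resource requirement

@dataclass
class InferenceHostProfile:
    engines: set                # ML engines available in the managed function (MF)
    cpu_budget_millicores: int

@dataclass
class Catalog:
    kind: str                   # "design-time", "training/deployment-time", "run-time"
    entries: list = field(default_factory=list)

    def publish(self, entry: ModelEntry) -> None:
        self.entries.append(entry)

    def discover_executable(self, host: InferenceHostProfile) -> list:
        """Which of the published models could execute in the target host."""
        return [e for e in self.entries
                if e.required_engine in host.engines
                and e.cpu_millicores <= host.cpu_budget_millicores]

run_time = Catalog("run-time")
run_time.publish(ModelEntry("traffic-steering", "1.2", "onnxruntime", 500))
mf = InferenceHostProfile(engines={"onnxruntime"}, cpu_budget_millicores=2000)
print([e.name for e in run_time.discover_executable(mf)])   # ['traffic-steering']
```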
[0033] The non-RT RIC 212 is able to access feedback data (e.g., FM, PM, and network KPI statistics) over the O1 interface on ML model performance and perform necessary evaluations. If the ML model fails during runtime, an alarm can be generated as feedback to the non-RT RIC 212. How well the ML model is performing in terms of prediction accuracy or other operating statistics it produces can also be sent to the non-RT RIC 212 over O1. The non-RT RIC 212 can also scale ML model instances running in a target MF over the O1 interface by observing resource utilization in the MF. The environment where the ML model instance is running (e.g., the MF) monitors resource utilization of the running ML model. This can be done, for example, using an ORAN-SC component called ResourceMonitor in the near-RT RIC 214 and/or in the non-RT RIC 212, which continuously monitors resource utilization. If resources are low or fall below a certain threshold, the runtime environment in the near-RT RIC 214 and/or the non-RT RIC 212 provides a scaling mechanism to add more ML instances. The scaling mechanism may include a scaling factor such as a number, percentage, and/or other like data used to scale up/down the number of ML instances. ML model instances running in the target ML inference hosts may be automatically scaled by observing resource utilization in the MF. For example, the Kubernetes® (K8s) runtime environment typically provides an auto-scaling feature.
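For illustration only, the threshold-based scaling described above might be sketched as a single function. This is a minimal Python sketch under assumptions: the threshold, scaling-factor value, and instance cap are invented, and the utilization input stands in for whatever a ResourceMonitor-like component reports.

```python
# Hypothetical sketch of threshold-based scaling of ML model instances, driven
# by resource utilization reported by a monitor (e.g., a ResourceMonitor-like
# component). Threshold and scaling-factor values are invented.
def scale_instances(current_instances: int,
                    free_resource_fraction: float,
                    low_threshold: float = 0.2,
                    scaling_factor: float = 1.5,
                    max_instances: int = 16) -> int:
    """Grow the instance count by the scaling factor when free resources run low."""
    if free_resource_fraction < low_threshold:
        grown = max(current_instances + 1, round(current_instances * scaling_factor))
        return min(max_instances, grown)
    return current_instances

print(scale_instances(current_instances=4, free_resource_fraction=0.1))  # 6
```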
[0034] The A1 interface is between the non-RT RIC 212 (which is within or associated with the SMO 202) and the near-RT RIC 214. The A1 interface supports three types of services as defined in Reference [R14], including a Policy Management Service, an Enrichment Information Service, and an ML Model Management Service. A1 policies have the following characteristics compared to persistent configuration as defined in Reference [R14]: A1 policies are not critical to traffic; A1 policies have temporary validity; A1 policies may handle individual UEs or dynamically defined groups of UEs; A1 policies act within and take precedence over the configuration; and A1 policies are non-persistent, i.e., they do not survive a restart of the near-RT RIC.
[0035] A technical problem is how to train and maintain good AI/ML models to be used by an inference host to perform E2 control and other controls. The disclosed examples address this issue by including two training hosts: one in the non-RT RIC and one in the near-RT RIC. The training host in the near-RT RIC performs online learning, which may use different data than the offline learning, and ensures that an adequate AI/ML model is being used by the inference host. The training host of the non-RT RIC performs offline learning and transfers an initial model and updated models to a model repository that is used to store AI/ML models that may be used by the training host of the near-RT RIC.
[0036] A deployment is disclosed of online reinforcement learning in the Near-RT RIC. The AI/ML training host and inference host are located in the Near-RT RIC, while an offline learning host and ML model repository reside in the Non-RT RIC. The deployment reduces communication and feedback delay between the ML training host and the ML inference host. The delay reduction is essential for online reinforcement learning, especially for generating fast-changing decision-making policies that adapt to highly dynamic environments. The ML model repository ensures performance of the online reinforcement learning by saving the most accurate and best-performing ML models.
[0037] Examples disclose a deployment scenario for online reinforcement learning in the Near-RT RIC, which incorporates an online training host and inference host in the Near-RT RIC, while an offline training host and ML model repository reside in SMO/Non-RT RIC.
[0038] FIG. 3 illustrates a system 300 where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments. ML training information 322 is collected from the DU/O-CU 332 over the E2 interface and/or O1 interface and sent to data management 308. ML online information 324 is collected from the E2 interface and/or O1 interface and sent to data management 308. Data management 308 sends the information to the ML training 316 and ML inference 314. The ML inference 314 uses a model and sends configuration updates to configuration management 306 (if the DU or CU is the subject of the action). The ML inference 314 sends policy/intent 304 (if the near-RT RIC is the subject of the action) to the near-RT RIC 302. The O1 management (MGMT) 310 sends data enrichment 330 (and deploy instructions and models). The non-RT RIC 312 includes the ML training 316 and ML inference 314. The ML inference 314 sends performance data 338 to the ML training 316. The ML training 316 sends deploy instructions and models 336 to the ML inference 314. [0039] FIG. 4 illustrates a system 400 where a non-RT RIC acts as the ML training host and a near-RT RIC acts as the ML inference host, in accordance with some embodiments. In FIG. 4 the ML inference 314 resides in the near-RT RIC rather than the non-RT RIC 312. O1 management sends ML deploy 404 instructions or models to the ML inference 314. The same numbers as FIG. 3 are meant to indicate the same or similar information and/or function.
[0040] FIG. 5 illustrates a system 500 for online reinforcement learning, in accordance with some embodiments. The performance feedback 530 is training data for online training, e.g., rewards, environment states, performed actions, and so forth, and data for performance monitoring. The training host (online learning) (“training host near-RT”) 514 is configured for online learning. The training host near-RT 514 resides in the near real-time RIC 510. The inference host 512 is an AI/ML inference host. The inference host 512 resides in the near-RT RIC 510.
[0041] The training host (offline) (“training host non-RT”) 506 is a training host for offline learning. The training host non-RT 506 resides in the SMO or non-RT RIC 504. The model repository 508 is an AI/ML model repository. The model repository 508 resides in the SMO/non-RT RIC (“non-RT RIC”) 504. [0042] The training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning. The training host non-RT 506 trains the initial model based on the offline training. The training host non-RT 506 transfers, via move model 518, the initial model 534, which is an offline-trained model, to the model repository as a trained model 536. The ML learning data 516 (e.g., O-CU/O-DU data collected over O1) used for offline training by the training host non-RT 506 may differ from the O-CU/O-DU data collected over E2 (e.g., inference data 526) used for online learning by the training host near-RT 514 and/or for inference by the inference host 512. [0043] The model repository 508 is associated with or stored in the SMO/Non-RT RIC 504, which stores trained, validated, and tested models. The terms associated with or stored may mean implemented by, or located within or in, in accordance with some embodiments. The trained model 536 may be trained, validated, and tested. The trained model 536 may be the initial model 534 transferred to the model repository 508. The stored model, e.g., trained model 536, may be tested to be well-performing and may be used as a backup for online learning, in case the running model 538 drifts too much and leads to severe degradation. [0044] The model repository 508 sends out a model download 522 notification to the training host near-RT 514, which is associated with or located in the near-RT RIC 510, and the model repository 508 sends the model, e.g., trained model 536, to the training host near-RT 514. The model repository 508 receives a model download request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 sends, in response, a model, e.g., trained model 536 or updated model 540, to the training host near-RT 514.
[0045] The model repository 508 receives a model upload request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 receives a model upload 524 comprising the updated model 540 from the training host near-RT 514.
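For illustration only, the repository behavior of paragraphs [0043]-[0045] might be sketched as follows. This is a minimal Python sketch under assumed names (ModelRepository, handle_download_request, handle_upload_request are invented here); the actual A1/O1 message encodings are not shown.

```python
# Hypothetical sketch of the model repository behavior of paragraphs
# [0043]-[0045]: store trained models, push download notifications, and serve
# download/upload requests. A1/O1 message encodings are not shown.
from typing import Callable, Dict, List, Optional

class ModelRepository:
    def __init__(self) -> None:
        self._models: Dict[str, bytes] = {}        # model name -> serialized model
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, notify: Callable[[str], None]) -> None:
        self._subscribers.append(notify)            # e.g., a near-RT training host

    def store(self, name: str, blob: bytes) -> None:
        """Store a trained/validated/tested model and notify subscribed hosts."""
        self._models[name] = blob
        for notify in self._subscribers:
            notify(name)                             # model download notification

    def handle_download_request(self, name: str) -> Optional[bytes]:
        return self._models.get(name)                # send the model in response

    def handle_upload_request(self, name: str, blob: bytes) -> None:
        self._models[name] = blob                    # receive the updated model

repo = ModelRepository()
repo.subscribe(lambda name: print(f"download notification: {name}"))
repo.store("initial-model", b"offline-trained")     # offline-trained initial model
repo.handle_upload_request("initial-model", b"online-updated")
assert repo.handle_download_request("initial-model") == b"online-updated"
```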
[0046] In one embodiment, the model download 522 and the model upload 524 between the model repository 508 associated with or in the SMO/Non-RT RIC 504 and the training host near-RT 514 associated with or in the near-RT RIC 510 are communicated over the A1 interface. The request model message for a model download 522 and the notification message for a model upload 524 are part of the A1-ML service. In another embodiment, the model download 522 and the model upload 524 are communicated over the O1 interface. [0047] The training host near-RT 514 in the near-RT RIC 510 collects learning data for online reinforcement learning from E2 nodes over the E2 interface and from the ML inference host (“inference host”) 512, e.g., the performance feedback 530, using an application program interface (API) of the near-RT RIC 510. The training host near-RT 514 updates the model, e.g., running model 538, based on the above training data for online learning, and it deploys the AI/ML model, e.g., running model 538, to the inference host 512.
[0048] The training host near-RT 514 associated with or in the near-RT RIC 510 receives a model download notification from the model repository 508, and it receives the model, e.g., trained model 536 or updated model 540, from the model repository 508. The training host near-RT 514 in the near-RT RIC 510 communicates or sends out a model download 522 request to the model repository 508, and the training host near-RT 514 receives the model from the model repository 508. [0049] The training host near-RT 514 associated with or in the Near-RT RIC 510 sends out a model upload 524 request to the model repository 508, and the training host near-RT 514 sends the model to the model repository 508.
[0050] The training host near-RT 514 in the near-RT RIC 510 receives AI/ML performance feedback 530 from the inference host 512. If the training host near-RT 514 detects severe performance degradation, then the training host near-RT 514 can send a model download 522 request to the model repository 508. After a model download 522 of a previously well-performing model, e.g., updated model 540, from the model repository 508, the training host near-RT 514 communicates a model deploy 532 with this backup model to the inference host 512. The training host near-RT 514 associated with or in the near-RT RIC 510 determines whether to communicate a model upload 524 of the latest trained model, e.g., trained running model 542, to the model repository 508. A validation and testing procedure for the uploaded model is done by the training host near-RT 514 associated with or in the near-RT RIC 510 before uploading. [0051] The inference host 512 is associated with or resides in the near-RT RIC 510. The inference host 512, which may be termed an ML inference host, collects data, e.g., inference data 526, from E2 nodes over the E2 interface for inference. The inference host 512 uses the model, e.g., the running model 538, deployed by the training host near-RT 514 associated with or residing in the Near-RT RIC 510 to generate E2 control 528. The inference host 512 enforces the control actions/guidance via the E2 interface. The inference host 512 sends performance feedback and training data 530 for online learning to the training host near-RT 514 in the near-RT RIC 510. The SMO/Non-RT RIC 504 and the near-RT RIC 510 may communicate over the O1/A1 interfaces.
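For illustration only, the degradation-triggered fallback of paragraph [0050] might look like the following minimal Python sketch. The 30% threshold, the callable names, and the reward-based performance measure are all assumptions made for this example.

```python
# Hypothetical sketch of the fallback in paragraph [0050]: on severe
# degradation, download a previously well-performing backup model and deploy
# it to the inference host. Threshold and measure are invented.
SEVERE_DEGRADATION = 0.3    # fractional performance drop treated as "severe"

def on_performance_feedback(reward_baseline: float, reward_now: float,
                            download_backup, deploy) -> None:
    if reward_baseline <= 0:
        return
    drop = (reward_baseline - reward_now) / reward_baseline
    if drop > SEVERE_DEGRADATION:
        backup = download_backup()   # model download 522 from the repository
        deploy(backup)               # model deploy 532 to the inference host

on_performance_feedback(
    reward_baseline=1.0, reward_now=0.5,
    download_backup=lambda: "backup-model",
    deploy=lambda m: print(f"deploying {m}"),
)
```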
[0052] The ML learning data 516 is data for offline training. The performance feedback 530 includes data for online training. The ML learning data 516, the inference data 526, and/or the performance feedback 530 may include one or more of the following: the size and number of downlink (DL) physical resource blocks (PRBs) used for data traffic; the size and number of uplink (UL) PRBs used for data traffic; an average DL user equipment (UE) throughput in a next generation Node-B (gNB) of the O-RAN network; an average UL UE throughput in the gNB; a number of protocol data unit (PDU) sessions requested for setup in the O-RAN network; a number of PDU sessions successfully set up in the O-RAN network; and/or a number of PDU sessions that failed to set up in the O-RAN network.
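For illustration only, a record carrying the kinds of fields listed above might be sketched as follows; the Python field names are invented for this example and are not O-RAN counter names.

```python
# Hypothetical record for the fields listed in paragraph [0052].
from dataclasses import dataclass

@dataclass
class RanKpiRecord:
    dl_prbs_used: int                   # DL PRBs used for data traffic
    ul_prbs_used: int                   # UL PRBs used for data traffic
    avg_dl_ue_throughput_mbps: float    # average DL UE throughput in the gNB
    avg_ul_ue_throughput_mbps: float    # average UL UE throughput in the gNB
    pdu_sessions_requested: int
    pdu_sessions_setup_ok: int
    pdu_sessions_setup_failed: int

    def setup_success_rate(self) -> float:
        total = self.pdu_sessions_requested
        return self.pdu_sessions_setup_ok / total if total else 0.0

sample = RanKpiRecord(1200, 800, 95.3, 22.1, 100, 97, 3)
print(f"PDU session setup success rate: {sample.setup_success_rate():.2f}")
```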
[0053] FIG. 6 illustrates a method 600 for online reinforcement learning, in accordance with some embodiments. The method 600 begins at operation 602 (or step 1) with collecting training data for offline learning. For example, the training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning.
[0054] The method 600 continues at operation 604 (or step 2) with performing offline learning. For example, the training host non-RT 506 trains the initial model, e.g., initial model 534, based on the offline training using the ML learning data 516.
[0055] The method 600 continues at operation 606 (or step 3) with moving the initial model. For example, the initial model 534 is moved from the training host non-RT 506 (offline learning) to the model repository 508 in the non-RT RIC 504 as a trained model 536.
[0056] The method 600 continues at operation 608 (or step 4) with downloading the model to a near real-time training host. For example, the AI/ML model, e.g., trained model 536, is downloaded, e.g., model download 522, to the training host near-RT 514 associated with or residing in the near-RT RIC 510.
[0057] The method 600 continues at operation 610 (or step 5) with deploying the model. For example, the AI/ML model, e.g., running model 538, trained running model 542, trained model 536, or updated model 540, is deployed to the inference host 512, e.g., an xApp, in the near-RT RIC 510. [0058] The method 600 continues at operation 612 (or step 6) with collecting inference data. For example, inference data 526 is collected from E2 nodes via the E2 interface, e.g., E2 of FIGS. 1-4.
[0059] The method 600 continues at operation 614 (or step 7) with generating decision-making policies. For example, the ML inference host, e.g., inference host 512, generates decision-making policies based on the deployed ML model, e.g., running model 538, and the inference data, e.g., inference data 526.
[0060] The method 600 continues at operation 616 (or step 8) with enforcing E2 control actions/guidance via the E2 interface. For example, the ML inference host, e.g., inference host 512, enforces E2 control actions/guidance via the E2 interface, e.g., E2 control 528.
[0061] The method 600 continues at operation 618 (or step 9) with collecting training data over the E2 interface. For example, the training data, e.g., performance feedback 530, for online learning is collected over the E2 interface from E2 nodes to the training host, e.g., training host near-RT 514, in the near-RT RIC 510.
[0062] The method 600 continues at operation 620 (or step 10) with providing feedback. For example, the ML inference host, e.g., inference host 512, provides performance feedback and online training data, e.g., performance feedback 530, to the training host, e.g., training host near-RT 514 associated with or residing in the near-RT RIC 510.
[0063] The method 600 continues at operation 622 (or step 11) with performing online learning. For example, the training host, e.g., training host near-RT 514, in the near-RT RIC 510 performs online learning based on the online learning data, e.g., performance feedback 530, from the inference host, e.g., inference host 512, and the E2 nodes.
[0064] The method 600 continues at operation 624 (or step 12) with deploying the updated model. For example, the training host, e.g., training host near-RT 514, in the near-RT RIC 510 deploys, e.g., model deploy 532, the updated model, e.g., the trained running model 542 becomes the running model 538, to the ML inference host, e.g., inference host 512.
[0065] In some embodiments, after operation 624 the method 600 returns to operation 612. For example, the method 600 may return to operation 612 during operation of the inference host 512. The method 600 may continue to operation 626 when the training host 514 determines that a new AI/ML model should be used. The method 600 may return to operation 612 after operation 628.
[0066] The method 600 continues at operation 626 (or step 13) with sending an upload request. For example, if the training host, e.g., the training host near-RT 514, associated with or residing in the near-RT RIC 510 detects that the running model 538 performs well (based on performance feedback data), then the training host near-RT 514 may send a model upload request to the model repository 508, and the updated AI/ML model, e.g., trained running model 542, is uploaded and stored in the model repository, e.g., as updated model 540. [0067] The method 600 continues at operation 628 (or step 14) with requesting a model download. For example, if the training host, e.g., training host near-RT 514, in the near-RT RIC, e.g., near-RT RIC 510, detects that the running model 538 performs not as well as expected (based on performance feedback data), then it may request a model download from the model repository, e.g., when the running model leads to severe performance degradation. For example, the training host 514 and/or inference host 512 may detect performance degradation that is above a threshold value and send a request for a different model. The training host near-RT 514 may use the trained running model 542 or may request a model from the model repository 508, e.g., trained model 536 or updated model 540, which may be communicated or sent to the training host near-RT 514 via model download 522. The training host near-RT 514 may select which model to deploy in the inference host 512 based on performance feedback 530 and/or previous performance information associated with the other models, e.g., running model 538, trained running model 542, trained model 536, or updated model 540.
[0068] The method 600 may include one or more additional operations. The operations of method 600 may be performed in a different order. One or more of the operations of method 600 may be optional. Different steps may be performed by different functional entities such as the training host 506, model repository 508, inference host 512, and/or training host 514.
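For illustration only, the online portion of method 600 (roughly steps 5 through 14) can be sketched end to end. The following minimal Python sketch uses a toy in-memory repository and random numbers in place of real E2 data and feedback; the class names, thresholds, and string-based "models" are all invented for this example.

```python
# Hypothetical end-to-end sketch of the online part of method 600.
import random

class TinyRepo:
    """Stand-in for the model repository 508; stores serialized models."""
    def __init__(self):
        self.models = {"latest": b"backup-model"}
    def handle_upload_request(self, name, blob):
        self.models[name] = blob
    def handle_download_request(self, name):
        return self.models[name]

class TrainingHostNearRT:
    """Stand-in for training host 514; the 0.8/0.2 thresholds are invented."""
    def __init__(self, model: str, repo: TinyRepo):
        self.model, self.repo = model, repo
    def online_update(self, feedback: float) -> str:
        self.model = self.model + "+upd"             # step 11: online learning
        if feedback > 0.8:                            # step 13: performing well
            self.repo.handle_upload_request("latest", self.model.encode())
        elif feedback < 0.2:                          # step 14: severe degradation
            self.model = self.repo.handle_download_request("latest").decode()
        return self.model                             # step 12: deploy updated model

def run_loop(host: TrainingHostNearRT, rounds: int = 3) -> str:
    model = host.model                                # step 5: deploy to inference host
    for _ in range(rounds):
        inference_data = random.random()              # step 6: collect over E2
        policy = f"policy({model}, {inference_data:.2f})"  # step 7: decision-making
        _ = policy                                    # step 8: enforce via E2 (omitted)
        feedback = random.random()                    # steps 9-10: feedback 530
        model = host.online_update(feedback)
    return model

print(run_loop(TrainingHostNearRT("m0", TinyRepo())))
```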
[0069] FIG. 7 illustrates a method 700 for online reinforcement learning, in accordance with some embodiments. The method 700 may be performed by a near-RT RIC in an O-RAN. The method 700 begins at operation 702 with receiving a model such as an AI/ML model. The training host 514 of the near-RT RIC 510 may receive a model such as trained model 536, updated model 540, running model 538, and/or initial model 534. The method 700 continues at operation 704 with receiving training data. For example, the training host 514 of the near-RT RIC 510 may receive performance feedback 530 from the inference host 512. The method 700 continues at operation 706 with updating the AI/ML model based on the training data. For example, the training host 514 of the near-RT RIC 510 may perform training on the AI/ML model to update the model and generate the trained running model 542 with updates.
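For illustration only, one concrete possibility for the update of operation 706 is a tabular Q-learning step, since Q-learning is named among reinforcement-learning methods in the terminology section below. The disclosure does not mandate any particular algorithm, and the state and action labels here are invented.

```python
# Standard tabular Q-learning update, shown as one possible realization of the
# model update in operation 706: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
from collections import defaultdict

def q_update(q, state, action, reward, next_state,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    best_next = max(q[next_state].values(), default=0.0)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

q = defaultdict(lambda: defaultdict(float))
# e.g., performance feedback 530 supplies (state, action, reward, next_state):
q_update(q, state="high-load", action="add-prbs", reward=1.0, next_state="ok")
print(q["high-load"]["add-prbs"])   # 0.1
```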
[0070] Simultaneously with the training host 514 performing online learning, the training host 506 may perform offline learning and may update AI/ML models. The data used to update the models may be different, as disclosed herein.
[0071] The following describes further example embodiments. Example 1 includes a deployment scenario for online reinforcement learning in which the training host for online learning and the AI/ML inference host reside in the Near-RT RIC, and the training host for offline learning and the AI/ML model repository are in the SMO/Non-RT RIC.
[0072] In Example 2, the subject matter of Example 1 includes where the functionality of the training host in the SMO/Non-RT RIC includes the following: (1) collecting learning data for offline reinforcement learning from E2 nodes over the O1 interface; (2) training the initial model based on the offline training; and (3) transferring the offline trained model to the model repository.
[0073] In Example 3, the subject matter of Examples 1 and 2 includes where the functionality of the model repository in the SMO/Non-RT RIC includes the following: (1) storing trained models; (2) sending out model download notifications to the training host in the Near-RT RIC, and sending the model to that training host; (3) receiving model download requests from the training host in the Near-RT RIC, and sending the model to that training host; and (4) receiving the model upload request from the training host in the Near-RT RIC, and receiving the updated model from the training host.
[0074] In Example 4, the subject matter of Examples 1-3 includes where the model download and upload are between the model repository in the SMO/Non-RT RIC and the ML training host in the Near-RT RIC. In one embodiment, the interactions are over the A1 interface, where the request and notification for model download and upload are part of the A1-ML service. In another embodiment, the interactions are over the O1 interface.
[0075] In Example 5, the subject matter of Examples 1-4 includes where the training host in the Near-RT RIC is configured to perform the following: (1) collect part of the learning data for online reinforcement learning from E2 nodes over the E2 interface; (2) collect part of the learning data for online reinforcement learning from the ML inference host over the Near-RT RIC's internal API; (3) update the model based on the online training data; (4) deploy the AI/ML model to the inference host; (5) receive a model download notification from the model repository and receive the model from the model repository; (6) send out a model download request to the model repository and, in response, receive the model from the model repository; (7) send out a model upload request to the model repository and send the model to the model repository; and (8) receive AI/ML performance feedback from the ML inference host. [0076] In Example 6, the subject matter of Examples 1-5 includes where the ML inference host in the Near-RT RIC is configured to perform the following: (1) collect data from E2 nodes over the E2 interface for inference; (2) infer E2 control using the model deployed by the training host in the Near-RT RIC; (3) enforce the control actions/guidance via the E2 interface; and (4) send performance feedback and training data for online learning to the training host in the Near-RT RIC. [0077] Example 7 includes a method for initial offline training including the following:
(1) Step 1: collecting training data from the E2 nodes over the O1 interface to the SMO/Non-RT RIC; (2) Step 2: the training host inside the SMO/Non-RT RIC performing offline learning; (3) Step 3: transferring the offline-trained model to the model repository; (4) Step 4: the model repository sending a model download notification to the training host in the Near-RT RIC; and (5) Step 5: downloading the initial model to the training host in the Near-RT RIC.
[0078] Example 8 includes a method for online reinforcement learning including: (1) Step 1: the training host in the Near-RT RIC deploying the AI/ML model to the inference host; (2) Step 2: the inference host collecting inference data from E2 nodes over the E2 interface; (3) Step 3: the ML inference host performing inference, generating decision-making policies, using the deployed ML model; (4) Step 4: the ML inference host enforcing E2 control actions/guidance via the E2 interface; (5) Step 5: the ML inference host providing online training data to the training host in the Near-RT RIC; (6) Step 6: the training host in the Near-RT RIC collecting online training data over the E2 interface from E2 nodes; (7) Step 7: the training host in the Near-RT RIC performing online reinforcement learning and updating the AI/ML model; and (8) Step 8: the training host in the Near-RT RIC deploying the updated model to the ML inference host.
[0079] Example 9 includes a method for uploading an updated model to the repository, the method including the following: (1) Step 1: the training host sending a model upload request to the model repository; and (2) Step 2: the training host in the Near-RT RIC uploading the updated model to the repository. [0080] Example 10 includes a method for a backup model download from the repository, the method including the following operations: (1) Step 1: the ML inference host providing performance feedback to the training host in the Near-RT RIC; (2) Step 2: the training host in the Near-RT RIC detecting performance degradation, which may be greater than average or severe, based on the feedback from the inference host; (3) Step 3: the training host sending a model download request to the model repository; and (4) Step 4: the backup model (previously well-performing model) being downloaded from the repository to the training host in the Near-RT RIC.
[0081] REFERENCES
[0082] [R04] 3GPP TS 36.401 v15.1.0 (2019-01-09).
[0083] [R05] 3GPP TS 36.420 v15.2.0 (2020-01-09).
[0084] [R06] 3GPP TS 38.300 v16.0.0 (2020-01-08).
[0085] [R07] 3GPP TS 38.401 v16.0.0 (2020-01-09).
[0086] [R08] 3GPP TS 38.420 v15.2.0 (2019-01-08).
[0087] [R09] 3GPP TS 38.460 v16.0.0 (2020-01-09).
[0088] [R10] 3GPP TS 38.470 v16.0.0 (2020-01-09).
[0089] [R12] O-RAN Alliance Working Group 1, O-RAN Operations and Maintenance Architecture Specification, version 2.0 (Dec 2019) (“O-RAN-WG1.OAM-Architecture-v02.00”).
[0090] [R13] O-RAN Alliance Working Group 1, O-RAN Operations and Maintenance Interface Specification, version 2.0 (Dec 2019) (“O-RAN-WG1.O1-Interface-v02.00”).
[0091] [R14] O-RAN Alliance Working Group 2, O-RAN A1 interface: General Aspects and Principles Specification, version 1.0 (Oct 2019) (“ORAN-WG2.A1.GA&P-v01.00”).
[0092] [R15] O-RAN Alliance Working Group 3, Near-Real-time RAN Intelligent Controller Architecture & E2 General Aspects and Principles (“ORAN-WG3.E2GAP.0-v0.1”).
[0093] [R16] O-RAN Alliance Working Group 4, O-RAN Fronthaul Management Plane Specification, version 2.0 (July 2019) (“ORAN-WG4.MP.0-v02.00.00”).
[0094] [R17] O-RAN Alliance Working Group (WG) 4, O-RAN Fronthaul Control, User and Synchronization Plane Specification, version 2.0 (July 2019) (“ORAN-WG4.CUS.0-v02.00”).
[0095] [R18] O-RAN WG1, “O-RAN Architecture Description”.
[0096] [R19] O-RAN WG2, “AI/ML Workflow Description and Requirements”.
TERMINOLOGY
[0097] The term “application” may refer to a complete and deployable package or environment for achieving a certain function in an operational environment. The term “AI/ML application” or the like may be an application that contains some AI/ML models and application-level descriptions.
[0098] The term “machine learning” or “ML” refers to the use of computer systems implementing algorithms and/or statistical models to perform specific task(s) without using explicit instructions, but instead relying on patterns and inferences. ML algorithms build or estimate mathematical model(s) (referred to as “ML models” or the like) based on sample data (referred to as “training data,” “model training information,” or the like) in order to make predictions or decisions without being explicitly programmed to perform such tasks. Generally, an ML algorithm is a computer program that learns from experience with respect to some task and some performance measure, and an ML model may be any object or data structure created after an ML algorithm is trained with one or more training datasets. After training, an ML model may be used to make predictions on new datasets. Although the term “ML algorithm” refers to different concepts than the term “ML model,” these terms as discussed herein may be used interchangeably for the purposes of the present disclosure.
[0099] The term “machine learning model,” “ML model,” or the like may also refer to ML methods and concepts used by an ML-assisted solution. An “ML-assisted solution” is a solution that addresses a specific use case using ML algorithms during operation. ML models include supervised learning (e.g., linear regression, k-nearest neighbor (KNN), decision tree algorithms, support vector machines, Bayesian algorithms, ensemble algorithms, etc.), unsupervised learning (e.g., K-means clustering, principal component analysis (PCA), etc.), reinforcement learning (e.g., Q-learning, multi-armed bandit learning, deep RL, etc.), neural networks, and the like. Depending on the implementation, a specific ML model could have many sub-models as components, and the ML model may train all sub-models together. Separately trained ML models can also be chained together in an ML pipeline during inference. An “ML pipeline” is a set of functionalities, functions, or functional entities specific for an ML-assisted solution; an ML pipeline may include one or several data sources in a data pipeline, a model training pipeline, a model evaluation pipeline, and an actor. The “actor” is an entity that hosts an ML-assisted solution using the output of the ML model inference. The term “ML training host” refers to an entity, such as a network function, that hosts the training of the model. The term “ML inference host” refers to an entity, such as a network function, that hosts the model during inference mode (which includes both the model execution as well as any online learning, if applicable). The ML host informs the actor about the output of the ML algorithm, and the actor takes a decision for an action (an “action” is performed by an actor as a result of the output of an ML-assisted solution). The term “model inference information” refers to information used as an input to the ML model for determining inference(s); the data used to train an ML model and the data used to determine inferences may overlap; however, “training data” and “inference data” refer to different concepts. [00100] Although an aspect has been described with reference to specific exemplary aspects, it will be evident that various modifications and changes may be made to these aspects without departing from the broader scope of the present disclosure. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various aspects is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Claims

CLAIMS What is claimed is:
1. An apparatus for a near real-time (RT) radio access network intelligence controller (RIC) (Near-RT RIC) in an open radio access network (O-RAN), the apparatus comprising: processing circuitry configured to: receive an artificial intelligence (AI)/machine learning (ML) model; receive training data from E2 nodes over an E2 interface and from an ML inference host residing in the Near-RT RIC; and update the AI/ML model based on the training data.
2. The apparatus of claim 1 wherein the processing circuitry is an ML training host.
3. The apparatus of claim 1, wherein the processing circuitry is first processing circuitry, and wherein the apparatus further comprises: second processing circuitry configured to: receive data over the E2 interface; enforce control actions and guidance via the E2 interface; and send performance feedback and training data to the ML training host.
4. The apparatus of claim 3 wherein the second processing circuitry is an ML inference host and wherein the processing circuitry is further configured to: infer E2 controls using another AI/ML model deployed to the ML inference host.
5. The apparatus of claim 3 wherein the first processing circuitry is an ML training host and the second processing circuitry is an ML inference host, and wherein the second processing circuitry is further configured to: send training data and performance feedback data to the ML training host of the near-RT RIC.
6. The apparatus of claim 1 wherein the processing circuitry is an ML training host and wherein the processing circuitry is further configured to: detect severe performance degradation based on feedback data from an inference host residing in the near-RT RIC; send a model download request to a model repository, wherein the model repository resides in a non-RT RIC; and receive a backup model from the repository.
7. The apparatus of claim 1 wherein the training data from the ML inference host is received over a near-RT RIC internal application program interface (API), and wherein the processing circuitry is further configured to: deploy the updated AI/ML model to the ML inference host.
8. The apparatus of claim 1 wherein the processing circuitry is further configured to: receive a model download notification from a model repository, the model repository residing in a non-RT RIC; receive the AI/ML model from the model repository; send a model download request to the model repository; receive another AI/ML model from the model repository; send an AI/ML model upload request to the model repository; and send the updated AI/ML model to the model repository.
9. The apparatus of claim 8 wherein the model download notification, the AI/ML model, the model download request, the another AI/ML model, the AI/ML model upload request, and the updated AI/ML model are sent or received over an A1 interface or an O1 interface, and wherein the download request and upload request are part of an A1-ML service.
10. The apparatus of claim 1 wherein the processing circuitry is first processing circuitry, the training data is first training data, the AI/ML model is a first AI/ML model, and the apparatus is a first apparatus and further comprising: a second apparatus for a non-RT RIC, the second apparatus comprising: second processing circuitry configured to: receive a second AI/ML model; receive second training data; and update the second AI/ML model based on the second training data.
11. The apparatus of claim 10 wherein the second processing circuitry is an ML training module.
12. The apparatus of claim 11 wherein the second training data is received from the E2 nodes over an O1 interface and wherein the second processing circuitry is further configured to: train an initial AI/ML model based on the second training data; and deploy the trained initial AI/ML model to the ML inference host.
13. The apparatus of claim 1 wherein the processing circuitry is first processing circuitry, the training data is first training data, the AI/ML model is a first AI/ML model, and the apparatus is a first apparatus and further comprising: a second apparatus for a non-RT RIC, the second apparatus comprising: second processing circuitry configured to: receive second training data from the E2 nodes over an O1 interface; perform offline learning to train an initial AI/ML model; transfer the initial AI/ML model to a model repository of the non-RT RIC; send a model download notification to an ML training host of the near-RT RIC; and download the initial AI/ML model to the ML training host of the near-RT RIC, wherein the second processing circuitry is an ML training host of the non-RT RIC and the model repository of the non-RT RIC.
14. An apparatus for an open radio access network (O-RAN), the apparatus comprising a non-real-time (RT) radio access network intelligence controller (RIC), the non-RT RIC comprising first processing circuitry, and a near-RT RIC, the near-RT RIC comprising second processing circuitry, wherein the first processing circuitry is configured to: perform first training of a first artificial intelligence (AI)/machine learning (ML) model, and wherein the first processing circuitry is a training host of the non-RT RIC; and wherein the second processing circuitry is configured to: perform second training on a second AI/ML model, and wherein the second processing circuitry is a training host of the near-RT RIC.
15. The apparatus of claim 14, wherein the first processing circuitry is further configured to: receive first training data, and wherein the first training is based on the first training data, and wherein the second processing circuitry is further configured to: receive the second AI/ML model from a model repository of the non-RT RIC; and receive second training data, and wherein the second training is based on the second training data.
16. A non-transitory computer-readable storage medium that stores instructions for execution by one or more processors of a near real-time (RT) radio access network intelligence controller (RIC) in an Open RAN (O-RAN) network, the instructions to configure the one or more processors to perform the following operations: receive an artificial intelligence (AI)/machine learning (ML) model; receive training data from E2 nodes over an E2 interface and from an ML inference host residing in the near-RT RIC; and update the AI/ML model based on the training data.
17. The non-transitory computer-readable storage medium of claim 16 wherein the near-RT RIC comprises an ML training host.
18. The non-transitory computer-readable storage medium of claim 16 wherein the operations further comprise: at the ML inference host, receive data over an E2 interface; at the ML inference host, enforce control actions and guidance via the E2 interface; and at the ML inference host, send performance feedback and training data to the ML training host.
19. The non-transitory computer-readable storage medium of claim 18 wherein the operations further comprise: at the ML inference host, infer E2 controls using another AI/ML model deployed to the ML inference host.
20. The non-transitory computer-readable storage medium of claim 18 wherein the operations further comprise: at the ML inference host, send training data and performance feedback data to the ML training host of the near-RT RIC.
PCT/US2021/050379 2020-09-17 2021-09-15 Online reinforcement learning WO2022060777A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063079876P 2020-09-17 2020-09-17
US63/079,876 2020-09-17

Publications (1)

Publication Number Publication Date
WO2022060777A1 true WO2022060777A1 (en) 2022-03-24

Family ID=80775567

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/050379 WO2022060777A1 (en) 2020-09-17 2021-09-15 Online reinforcement learning

Country Status (1)

Country Link
WO (1) WO2022060777A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190380037A1 (en) * 2017-06-27 2019-12-12 Allot Communications Ltd. System, Device, and Method of Detecting, Mitigating and Isolating a Signaling Storm
WO2019183020A1 (en) * 2018-03-19 2019-09-26 Mavenir Networks, Inc. System and method for reduction in fronthaul interface bandwidth for cloud ran
CN111242304A (en) * 2020-03-05 2020-06-05 北京物资学院 Artificial intelligence model processing method and device based on federal learning in O-RAN system
CN111565418A (en) * 2020-07-13 2020-08-21 网络通信与安全紫金山实验室 O-RAN and MEC communication method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SOLMAZ NIKNAM; ABHISHEK ROY; HARPREET S. DHILLON; SUKHDEEP SINGH; RAHUL BANERJI; JEFFERY H. REED; NAVRATI SAXENA; SEUNGIL YOON: "Intelligent O-RAN for Beyond 5G and 6G Wireless Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 17 May 2020 (2020-05-17), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081675121 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024062717A1 (en) * 2022-09-20 2024-03-28 Kddi株式会社 Control device of radio access network
WO2024062716A1 (en) * 2022-09-20 2024-03-28 Kddi株式会社 Radio access network control device

Similar Documents

Publication Publication Date Title
US20220012645A1 (en) Federated learning in o-ran
US11917527B2 (en) Resource allocation and activation/deactivation configuration of open radio access network (O-RAN) network slice subnets
US20220014963A1 (en) Reinforcement learning for multi-access traffic management
NL2033617B1 (en) Resilient radio resource provisioning for network slicing
US20210184989A1 (en) Data-centric service-based network architecture
US20220014942A1 (en) Ml model management in o-ran
EP4214912A1 (en) Non-realtime services for ai/ml
WO2023091664A1 (en) Radio access network intelligent application manager
US10979986B1 (en) Facilitating adaptive power spectral density with chromatic spectrum optimization in fifth generation (5G) or other advanced networks
WO2022060777A1 (en) Online reinforcement learning
US20220217046A1 (en) Providing information
US20220417863A1 (en) Facilitating real-time power optimization in advanced networks
US20220368605A1 (en) Wireless multi-carrier configuration and selection
WO2022155511A1 (en) Data services for ric applications
EP4238289A1 (en) Online learning at a near-real time ric
WO2023069534A1 (en) Using ai-based models for network energy savings
US20230403606A1 (en) Managing resources in a radio access network
US20210297832A1 (en) Facilitating enablement of intelligent service aware access utilizing multiaccess edge computing in advanced networks
US11665686B2 (en) Facilitating a time-division multiplexing pattern-based solution for a dual-subscriber identity module with single radio in advanced networks
WO2023283102A1 (en) Radio resource planning and slice-aware scheduling for intelligent radio access network slicing
WO2023016653A1 (en) Method, apparatus, and computer program
EP4240050A1 (en) A1 enrichment information related functions and services in the non-real time ran intelligent controller
US20240086253A1 (en) Systems and methods for intent-based orchestration of a virtualized environment
US20230370879A1 (en) Measurement data collection to support radio access network intelligence
WO2024091862A1 (en) Artificial intelligence/machine learning (ai/ml) models for determining energy consumption in virtual network function instances

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21870097

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21870097

Country of ref document: EP

Kind code of ref document: A1