WO2022060777A1 - Online reinforcement learning - Google Patents

Online reinforcement learning

Info

Publication number
WO2022060777A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
ric
host
training
processing circuitry
Application number
PCT/US2021/050379
Other languages
French (fr)
Inventor
Jaemin HAN
Meryem Simsek
Shu-Ping Yeh
Dawei YING
Jingwen BAI
Hosein Nikopour
Oner Orhan
Leifeng RUAN
Original Assignee
Intel Corporation
Priority date
2020-09-17
Application filed by Intel Corporation
Publication of WO2022060777A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 24/00 Supervisory, monitoring or testing arrangements
    • H04W 24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W 92/00 Interfaces specially adapted for wireless communication networks
    • H04W 92/04 Interfaces between hierarchically different network devices
    • H04W 92/12 Interfaces between hierarchically different network devices between access points and access point controllers

Definitions

  • aspects pertain to wireless communications. Some aspects relate to wireless networks including 3GPP (Third Generation Partnership Project) networks, 3GPP LTE (Long Term Evolution) networks, 3GPP LTE-A (LTE Advanced) networks, unlicensed LTE networks (MulteFire, LTE-U), and fifth-generation (5G) networks including 5G new radio (NR) (or 5G-NR) networks, 5G-LTE networks such as 5G NR unlicensed spectrum (NR-U) networks, and other unlicensed networks including Wi-Fi, CBRS (OnGo), etc.
  • 5G new radio (5G-NR) networks will continue to evolve based on 3GPP LTE-Advanced with additional potential new radio access technologies (RATs) to enrich people’s lives with seamless wireless connectivity solutions delivering fast, rich content and services.
  • Potential LTE operation in the unlicensed spectrum includes (and is not limited to) the LTE operation in the unlicensed spectrum via dual connectivity (DC), or DC-based LAA, and the standalone LTE system in the unlicensed spectrum, according to which LTE-based technology solely operates in the unlicensed spectrum without requiring an “anchor” in the licensed spectrum, called MulteFire.
  • MulteFire combines the performance benefits of LTE technology with the simplicity of Wi-Fi-like deployments.
  • FIG. 1 illustrates an example Open RAN (O-RAN) system architecture.
  • FIG. 2 illustrates a logical architecture of the O-RAN system of FIG. 1.
  • FIG. 3 illustrates a system where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments.
  • FIG. 4 illustrates a system where a non-RT RIC acts as the ML training host and a near-RT RIC acts as the ML inference host, in accordance with some embodiments.
  • FIG. 5 illustrates a system for online reinforcement learning, in accordance with some embodiments.
  • FIG. 6 illustrates a method for online reinforcement learning, in accordance with some embodiments.
  • FIG. 7 illustrates a method for online reinforcement learning, in accordance with some embodiments.
  • FIG. 1 provides a high-level view of an Open RAN (O-RAN) architecture 100.
  • the O-RAN architecture 100 includes four O-RAN defined interfaces - namely, the A1 interface, the O1 interface, the O2 interface, and the Open Fronthaul Management (M)-plane interface - which connect the Service Management and Orchestration (SMO) framework 102 to O-RAN network functions (NFs) 104 and the O-Cloud 106.
  • the SMO 102 (described in Reference [R13]) also connects with an external system 110, which provides enrichment data to the SMO 102.
  • the A1 interface terminates at an O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 112 in or at the SMO 102 and at the O-RAN Near-RT RIC 114 in or at the O-RAN NFs 104.
  • the O-RAN NFs 104 can be virtual network functions (VNFs) such as virtual machines (VMs) or containers, sitting above the O-Cloud 106 and/or Physical Network Functions (PNFs) utilizing customized hardware. All O-RAN NFs 104 are expected to support the O1 interface when interfacing with the SMO framework 102.
  • the O-RAN NFs 104 connect to the NG-Core 108 via the NG interface (which is a 3GPP defined interface).
  • the Open Fronthaul M-plane interface between the SMO 102 and the O-RAN Radio Unit (O-RU) 116 supports the O-RU 116 management in the O-RAN hybrid model as specified in Reference [R16].
  • the Open Fronthaul M-plane interface is an optional interface to the SMO 102 that is included for backward compatibility purposes as per Reference [R16] and is intended for management of the O-RU 116 in hybrid mode only.
  • the management architecture of flat mode (see Reference [R12]) and its relation to the O1 interface for the O-RU 116 is in development.
  • FIG. 2 shows an O-RAN logical architecture 200 corresponding to the O-RAN architecture 100 of FIG. 1.
  • the SMO 202 corresponds to the SMO 102
  • O-Cloud 206 corresponds to the O-Cloud 106
  • the non-RT RIC 212 corresponds to the non-RT RIC 112
  • the near-RT RIC 214 corresponds to the near-RT RIC 114
  • the O-RU 216 corresponds to the O-RU 116 of FIG. 1, respectively.
  • the O-RAN logical architecture 200 includes a radio portion and a management portion.
  • the management portion/side of the architecture 200 includes the SMO Framework 202 containing the non-RT RIC 212, and may include the O-Cloud 206.
  • the O-Cloud 206 is a cloud computing platform including a collection of physical infrastructure nodes to host the relevant O-RAN functions (e.g., the near-RT RIC 214, O-RAN Central Unit-Control Plane (O-CU-CP) 221, O-RAN Central Unit-User Plane (O-CU-UP) 222, and the O-RAN Distributed Unit (O-DU) 215), supporting software components (e.g., OSs, VMMs, container runtime engines, ML engines, etc.), and appropriate management and orchestration functions.
  • the radio portion/side of the logical architecture 200 includes the near-RT RIC 214, the O-DU 215, the O-RAN Radio Unit (O-RU) 216, the O-CU-CP 221, and the O-CU-UP 222 functions.
  • the radio portion/side of the logical architecture 200 may also include the O-e/gNB 210.
  • the O-DU 215 is a logical node hosting Radio Link Control (RLC), media access control (MAC), and higher physical (PHY) layer entities/elements (High-PHY layers) based on a lower layer functional split.
  • the O-RU 216 is a logical node hosting lower PHY layer entities/elements (Low-PHY layer) (e.g., FFT/iFFT, PRACH extraction, etc.) and RF processing elements based on a lower layer functional split. Virtualization of O-RU 216 is FFS.
  • the O-CU-CP 221 is a logical node hosting the RRC and the control plane (CP) part of the PDCP protocol.
  • the O-CU-UP 222 is a logical node hosting the user plane part of the PDCP protocol and the SDAP protocol.
  • An E2 interface terminates at a plurality of E2 nodes.
  • the E2 nodes are logical nodes/entities that terminate the E2 interface.
  • the E2 nodes include the O-CU-CP 221, O-CU-UP 222, O-DU 215, or any combination of elements as defined in Reference [R15].
  • the E2 nodes include the O-e/gNB 210.
  • the E2 interface also connects the O-e/gNB 210 to the Near-RT RIC 214.
  • the protocols over E2 interface are based exclusively on Control Plane (CP) protocols.
  • the E2 functions are grouped into the following categories: (a) near-RT RIC 214 services (REPORT, INSERT, CONTROL and POLICY, as described in Reference [R15]); and (b) near-RT RIC 214 support functions, which include E2 Interface Management (E2 Setup, E2 Reset, Reporting of General Error Situations, etc.) and Near-RT RIC Service Update (e.g., capability exchange related to the list of E2 Node functions exposed over E2).
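
The four near-RT RIC service types lend themselves to a small enumeration. The sketch below is illustrative only: the service names come from the grouping above (per Reference [R15]), while the one-line glosses reflect how these services are commonly described and are assumptions here, not text from this patent.

```python
from enum import Enum

# Near-RT RIC services exposed over E2 (names per the grouping above).
class E2Service(Enum):
    REPORT = "REPORT"    # E2 node reports requested information to the near-RT RIC
    INSERT = "INSERT"    # E2 node suspends a procedure and defers to the near-RT RIC
    CONTROL = "CONTROL"  # near-RT RIC commands an action at the E2 node
    POLICY = "POLICY"    # near-RT RIC installs a policy the E2 node applies autonomously
```
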
  • FIG. 2 shows the Uu interface between a UE 201 and O-e/gNB 210 as well as between the UE 201 and O-RAN components.
  • the Uu interface is a 3GPP defined interface (see e.g., sections 5.2 and 5.3 of Reference [R07]), which includes a complete protocol stack from L1 to L3 and terminates in the NG-RAN or E-UTRAN.
  • the O-e/gNB 210 is an LTE eNB (see Reference [R04]), a 5G gNB or ng-eNB (see Reference [R06]) that supports the E2 interface.
  • the O-e/gNB 210 may be the same or similar as discussed in FIGS. 3-7.
  • the UE 201 may correspond to UEs discussed with respect to FIGS. 3-7 and/or the like. There may be multiple UEs 201 and/or multiple O-e/gNBs 210, each of which may be connected to one another via respective Uu interfaces. Although not shown in FIG. 2, the O-e/gNB 210 supports O-DU 215 and O-RU 216 functions with an Open Fronthaul interface between them.
  • the Open Fronthaul (OF) interface(s) include(s) the Control User Synchronization (CUS) Plane and Management (M) Plane.
  • FIGS. 1 and 2 also show that the O-RU 216 terminates the OF M-Plane interface towards the O-DU 215 and optionally towards the SMO 202 as specified in Reference [R16].
  • the O-RU 216 terminates the OF CUS-Plane interface towards the O-DU 215 and the SMO 202.
  • the F1-c interface connects the O-CU-CP 221 with the O-DU 215.
  • the F1-c interface is between the gNB-CU-CP and gNB-DU nodes (see References [R07] and [R10]).
  • the F1-c interface is adopted between the O-CU-CP 221 and the O-DU 215 functions while reusing the principles and protocol stack defined by 3GPP and the definition of interoperability profile specifications.
  • the F1-u interface connects the O-CU-UP 222 with the O-DU 215.
  • the F1-u interface is between the gNB-CU-UP and gNB-DU nodes (see References [R07] and [R10]).
  • the F1-u interface is adopted between the O-CU-UP 222 and the O-DU 215 functions while reusing the principles and protocol stack defined by 3GPP and the definition of interoperability profile specifications.
  • the NG-c interface is defined by 3GPP as an interface between the gNB-CU-CP and the AMF in the 5GC (see Reference [R06]).
  • the NG-c is also referred to as the N2 interface (see Reference [R06]).
  • the NG-u interface is defined by 3GPP as an interface between the gNB-CU-UP and the UPF in the 5GC (see Reference [R06]).
  • the NG-u interface is referred to as the N3 interface (see Reference [R06]).
  • NG-c and NG-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
  • the X2-c interface is defined in 3GPP for transmitting control plane information between eNBs or between eNB and en-gNB in EN-DC.
  • the X2-u interface is defined in 3GPP for transmitting user plane information between eNBs or between eNB and en-gNB in EN-DC (see e.g., [005], [006]).
  • X2-c and X2-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
  • the Xn-c interface is defined in 3GPP for transmitting control plane information between gNBs, ng-eNBs, or between an ng-eNB and gNB.
  • the Xn-u interface is defined in 3GPP for transmitting user plane information between gNBs, ng-eNBs, or between ng-eNB and gNB (see e.g., References [R06] and [R08]).
  • Xn-c and Xn-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
  • the E1 interface is defined by 3GPP as being an interface between the gNB-CU-CP (e.g., gNB-CU-CP 3728) and gNB-CU-UP (see e.g., [007], [009]).
  • E1 protocol stacks defined by 3GPP are reused and adapted as being an interface between the O-CU-CP 221 and the O-CU-UP 222 functions.
  • the O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 212 is a logical function within the SMO framework 102, 202 that enables non-real-time control and optimization of RAN elements and resources; AI/machine learning (ML) workflow(s) including model training, inferences, and updates; and policy-based guidance of applications/features in the Near-RT RIC 214.
  • the O-RAN near-RT RIC 214 is a logical function that enables near-real-time control and optimization of RAN elements and resources via fine-grained data collection and actions over the E2 interface.
  • the near-RT RIC 214 may include one or more AI/ML workflows including model training, inferences, and updates.
  • the non-RT RIC 212 can be an ML training host to host the training of one or more ML models.
  • the ML data can be collected from one or more of the following: the Near-RT RIC 214, O-CU-CP 221, O-CU-UP 222, O-DU 215, O-RU 216, external enrichment source 110 of FIG. 1, and so forth.
  • the ML training host and/or ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214.
  • the ML training host and ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214.
  • the ML training host and ML inference host/actor are co-located as part of the near-RT RIC 214.
  • the non-RT RIC 212 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed.
  • the non-RT RIC 212 provides a query-able catalog for an ML designer/developer to publish/install trained ML models (e.g., executable software components).
  • the non-RT RIC 212 may provide a discovery mechanism to determine whether a particular ML model can be executed in a target ML inference host (MF), and what number and type of ML models can be executed in the target ML inference host.
  • the Near-RT RIC 214 is a managed function (MF).
  • three types of ML catalogs may be made discoverable by the non-RT RIC 212: a design-time catalog (e.g., residing outside the non-RT RIC 212 and hosted by some other ML platform(s)), a training/deployment-time catalog (e.g., residing inside the non-RT RIC 212), and a run-time catalog (e.g., residing inside the non-RT RIC 212).
  • the non-RT RIC 212 supports necessary capabilities for ML model inference in support of ML assisted solutions running in the non-RT RIC 212 or some other ML inference host. These capabilities enable executable software to be installed such as VMs, containers, etc.
  • the non-RT RIC 212 may also include and/or operate one or more ML engines, which are packaged software executable libraries that provide methods, routines, data types, etc., used to run ML models.
  • the non-RT RIC 212 may also implement policies to switch and activate ML model instances under different operating conditions.
  • the non-RT RIC 212 is able to access feedback data (e.g., FM, PM, and network KPI statistics) over the O1 interface on ML model performance and perform necessary evaluations. If the ML model fails during runtime, an alarm can be generated as feedback to the non-RT RIC 212. How well the ML model is performing in terms of prediction accuracy or other operating statistics it produces can also be sent to the non-RT RIC 212 over O1.
  • the non-RT RIC 212 can also scale ML model instances running in a target MF over the O1 interface by observing resource utilization in the MF.
  • the environment where the ML model instance is running (e.g., the MF) monitors resource utilization of the running ML model.
  • the scaling mechanism may include a scaling factor such as a number, percentage, and/or other like data used to scale up/down the number of ML instances.
  • ML model instances running in the target ML inference hosts may be automatically scaled by observing resource utilization in the MF. For example, the Kubernetes® (K8s) runtime environment typically provides an auto-scaling feature; a sketch of such scaling logic follows below.
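
As a rough illustration of the scaling behavior described above, the following sketch derives a new instance count from observed resource utilization in the MF. The thresholds, the single utilization metric, and the function name are assumptions made for illustration; O-RAN does not specify this logic.

```python
# Illustrative sketch (not O-RAN-specified): scale the number of ML model
# instances in a managed function (MF) from observed resource utilization,
# similar in spirit to the auto-scaling a K8s runtime provides.
def scale_instances(current_instances: int,
                    utilization: float,           # observed utilization in [0.0, 1.0]
                    scale_up_threshold: float = 0.8,
                    scale_down_threshold: float = 0.3,
                    scaling_factor: float = 1.5) -> int:
    """Return the new number of ML model instances for the target MF."""
    if utilization > scale_up_threshold:
        # Scale up by the scaling factor (a number/percentage per the text above).
        return max(current_instances + 1, int(current_instances * scaling_factor))
    if utilization < scale_down_threshold and current_instances > 1:
        # Scale down, but always keep at least one running instance.
        return max(1, int(current_instances / scaling_factor))
    return current_instances

# Example: 4 instances at 90% utilization scale up to 6.
assert scale_instances(4, 0.9) == 6
```
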
  • the A1 interface is between the non-RT RIC 212 (which is within or at the SMO 202) and the near-RT RIC 214.
  • the A1 interface supports three types of services as defined in Reference [R14], including a Policy Management Service, an Enrichment Information Service, and an ML Model Management Service.
  • A1 policies have the following characteristics compared to persistent configuration as defined in Reference [R14]: A1 policies are not critical to traffic; A1 policies have temporary validity; A1 policies may handle individual UE or dynamically defined groups of UEs; A1 policies act within and take precedence over the configuration; and A1 policies are non-persistent, i.e., do not survive a restart of the near-RT RIC.
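
To make these characteristics concrete, here is a hypothetical A1 policy object for a dynamically defined group of UEs. All field names are invented for illustration and do not follow the A1 policy type schemas of Reference [R14].

```python
# Hypothetical A1 policy instance (all field names are illustrative only).
a1_policy = {
    "policy_id": "qos-boost-001",
    "scope": {"ue_group": "cell-edge-ues"},         # individual UE or dynamic UE group
    "statement": {"target_dl_throughput_mbps": 50},
    "valid_for_seconds": 600,                       # temporary validity, not critical to traffic
    "persistent": False,                            # does not survive a near-RT RIC restart
}
```
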
  • a technical problem is how to train and maintain good AI/ML models to be used by an inference host to perform E2 control and other controls.
  • the disclosed examples address this issue by including two training hosts: one in the non-RT RIC and one in the near-RT RIC.
  • the training host in the near-RT RIC performs online learning, which may use different data than the offline learning, and ensures that an adequate AI/ML model is being used by the inference host.
  • the training host of the non-RT RIC performs offline learning and transfers an initial model and updated models to a model repository that is used to store AI/ML models that may be used by the training host of the near-RT RIC.
  • a deployment of online reinforcement learning in the Near-RT RIC is disclosed.
  • the AI/ML training host and inference host are located in the Near-RT RIC, while an offline learning host and ML model repository reside in the Non-RT RIC.
  • the deployment reduces communication and feedback delay between the ML training host and the ML inference host.
  • the delay reduction is essential for online reinforcement learning, especially for generating fast changing decision-making policies that adapt to highly dynamic environments.
  • the ML model repository ensures performance of the online reinforcement learning by saving the most accurate and best performing ML models.
  • Examples disclose a deployment scenario for online reinforcement learning in the Near-RT RIC, which incorporates an online training host and inference host in the Near-RT RIC, while an offline training host and ML model repository reside in SMO/Non-RT RIC.
  • FIG. 3 illustrates a system 300 where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments.
  • ML training information 322 is collected from the DU/O-CU 332 over the E2 interface and/or O1 interface and sent to data management 308.
  • ML online information 324 is collected from the E2 interface and/or O1 interface and sent to data management 308.
  • Data management 308 sends the information to the ML training 316 and ML inference 314.
  • the ML inference 314 uses a model and sends output to configuration management 306 (if a DU or CU is the subject of the action).
  • the ML inference 314 sends policy/intent 304 to the near-RT RIC 302 (if the near-RT RIC is the subject of the action).
  • the O1 management (MGMT) 310 sends data enrichment 330 (and deploy instructions and models).
  • the non-RT RIC 312 includes the ML training 316 and ML inference 314.
  • the ML inference 314 sends performance data 338 to the ML training 316.
  • the ML training 316 sends deploy instructions and models 336 to the ML inference 314.
  • FIG. 4 illustrates a system 400 where a non-RT RIC acts as the ML training and a near-RT RIC acts as the ML inference, in accordance with some embodiments.
  • the ML inference 314 resides in the near-RT RIC rather than the non-RT RIC 312.
  • O1 management sends ML deploy 404 instructions or models to the ML inference 314.
  • the same numbers as FIG. 3 are meant to indicate the same or similar information and/or function.
  • FIG. 5 illustrates a system 500 for online reinforcement learning, in accordance with some embodiments.
  • the performance feedback 530 is training data for online training, e.g., rewards, environment states, performed actions, and so forth, and data for performance monitoring.
  • the training host (online learning) (“training host near-RT”) 514 is configured for online learning.
  • the training host near-RT 514 resides in the near real-time RIC 510.
  • the inference host 512 is an AI/ML inference host.
  • the inference host 512 resides in the near-RT RIC.
  • the training host (offline) (“training host non-RT”) 506 is a training host for offline learning.
  • the training host non-RT 506 resides in the SMO or non-RT RIC 504.
  • the model repository 508 is an AI/ML model repository.
  • the model repository 508 resides in the SMO/non-RT RIC (“non-RT RIC”) 504.
  • the training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning.
  • the training host non-RT 506 trains the initial model based on the offline training.
  • the training host non-RT 506 transfers, via move model 518, the initial model 534 (an offline-trained model) to the model repository as a trained model 536.
  • the ML learning data 516 (e.g., O-CU/O-DU data collected over O1 for offline training as performed by training host non-RT 506) and the O-CU/O-DU data collected over E2 (e.g., inference data 526 for online learning by training host near-RT 514 and/or inference use by inference host 512) may be different.
  • the model repository 508 is associated with or stored in the SMO/Non-RT RIC 504.
  • the trained model 536 may be trained, validated, and tested.
  • the trained model 536 may be the initial model 534 transferred to the model repository 508.
  • the stored model, e.g., trained model 536, may be tested to be well-performing and may be used as a backup for online learning, in case the running model 538 drifts too much and leads to severe degradation.
  • the model repository 508 sends out a model download 522 notification to the training host near-RT 514, which is associated with or located in the near-RT RIC 510, and the model repository 508 sends the model, e.g., trained model 536, to the training host near-RT 514.
  • the model repository 508 receives a model download request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 sends, in response, a model, e.g., trained model 536 or updated model 540, to the training host near-RT 514.
  • the model repository 508 receives a model upload request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 receives a model upload 524 comprising the updated model 540 from the training host near-RT 514.
  • the model download 522 and the model upload 524 between the model repository 508 associated with or in the SMO/Non-RT RIC 504 and the training host near-RT 514 associated with or in the near-RT RIC 510 are communicated over the A1 interface.
  • the request model message for a model download 522 and the notification message for a model upload 524 are part of the A1-ML service.
  • in other embodiments, the model download 522 and the model upload 524 are communicated over the O1 interface.
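
The repository interactions above can be collected into a small sketch. The class and method names are assumptions made for illustration, and the transport (A1-ML or O1 messages) is abstracted away.

```python
# Minimal sketch of the model repository in the SMO/non-RT RIC 504
# (names are illustrative; A1/O1 transport is abstracted away).
class ModelRepository:
    def __init__(self):
        self.models = {}  # model name -> stored model object

    def store(self, name, model):
        """Store an offline-trained model (e.g., trained model 536 via move model 518)."""
        self.models[name] = model

    def notify_and_send(self, training_host, name):
        """Push path: send a model download 522 notification, then the model itself."""
        training_host.on_download_notification(name)
        return self.models[name]

    def handle_download_request(self, name):
        """Pull path: the near-RT training host requests a model download 522."""
        return self.models.get(name)

    def handle_upload(self, name, updated_model):
        """The near-RT training host performs a model upload 524 (e.g., updated model 540)."""
        self.models[name] = updated_model
```
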
  • the training host near-RT 514 in the near-RT RIC 510 collects learning data for online reinforcement learning from E2 nodes over the E2 interface and from the ML inference host (“inference host”) 512, e.g., the performance feedback 530, using an application program interface (API) of the near-RT RIC 510.
  • the training host near-RT 514 updates the model, e.g., running model 538, based on the above training data for online learning, and it deploys the AI/ML model, e.g., running model 538, to the inference host 512.
  • the training host near-RT 514 associated with or in the near-RT RIC 510 receives a model download notification from the model repository 508, and it receives the model, e.g., trained model 536 or updated model 540, from the model repository 508.
  • the training host near-RT 514 in the near-RT RIC 510 communicates or sends out a model download 522 request to the model repository 508, and the training host near-RT 514 receives the model from the model repository 508.
  • the training host near-RT 514 associated with or in the Near-RT RIC 510 sends out a model upload 524 request to the model repository 508, and the training host near-RT 514 sends the model to the model repository 508.
  • the training host near-RT 514 in the near-RT RIC 510 receives AI/ML performance feedback 530 from the inference host 512. If the training host near-RT 514 detects severe performance degradation, then the training host near-RT 514 can send a model download 522 request to the model repository 508. After a model download 522 of a previously well-performing model, e.g., updated model 540, from the model repository 508, the training host near-RT 514 communicates a model deploy 532 with this backup model to the inference host 512.
  • the training host near-RT 514 associated with or in the near-RT RIC 510 determines whether to communicate a model upload 524 of the latest trained model, e.g., trained running model 542, to the model repository 508.
  • a validation and testing procedure for the model is performed by the training host near-RT 514 associated with or in the near-RT RIC 510 before uploading; a sketch of this training host logic follows below.
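
The near-RT training host behavior in the preceding bullets can likewise be sketched. The degradation test, the threshold, and all names are illustrative assumptions rather than specified behavior, and the online-update step is left as a placeholder.

```python
# Illustrative near-RT RIC training host (reuses the ModelRepository sketch above).
class NearRtTrainingHost:
    def __init__(self, repository, inference_host, degradation_threshold=0.2):
        self.repo = repository
        self.inference_host = inference_host
        self.degradation_threshold = degradation_threshold
        self.running_model = None

    def on_download_notification(self, name):
        """Accept a model pushed by the repository (model download 522)."""
        self.running_model = self.repo.handle_download_request(name)

    def on_feedback(self, feedback):
        """Handle AI/ML performance feedback 530 from the inference host."""
        if feedback["performance_drop"] > self.degradation_threshold:
            # Severe degradation: fall back to a previously well-performing model.
            self.running_model = self.repo.handle_download_request("backup")
        else:
            # Otherwise keep learning online from the feedback/training data.
            self.running_model = self.online_update(self.running_model, feedback)
            if feedback["performance_drop"] < 0:  # the model actually improved
                # Validate/test, then upload the latest trained model (model upload 524).
                self.repo.handle_upload("backup", self.running_model)
        self.deploy(self.running_model)

    def online_update(self, model, feedback):
        """One online RL update from rewards/states/actions (placeholder)."""
        return model

    def deploy(self, model):
        """Model deploy 532 of the running model to the inference host."""
        self.inference_host.running_model = model
```
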
  • the inference host 512 is associated with or resides in the near-RT RIC 510.
  • the inference host 512, which may be termed an ML inference host, collects data, e.g., inference data 526, from E2 nodes over the E2 interface for inference.
  • the inference host 512 uses the model, e.g., the running model 538, deployed by the training host near-RT 514 associated with or residing in the Near-RT RIC 510 to infer E2 control, e.g., to generate decision-making policies.
  • the inference host 512 enforces the control actions/guidance via the E2 interface.
  • the inference host 512 sends performance feedback and training data 530 for online learning to the training host near-RT 514 in the near-RT RIC 510.
  • the SMO/non-RT RIC 504 and the near-RT RIC 510 may communicate over the O1/A1 interfaces.
  • the ML learning data 516 is data for offline training.
  • Performance feedback 530 includes data for online training.
  • the ML learning data 516, the inference data 526, and/or the performance feedback 530 may include one or more of the following (a sketch of such a record follows below): the size and number of downlink (DL) physical resource blocks (PRBs) used for data traffic; the size and number of uplink (UL) PRBs used for data traffic; an average DL user equipment (UE) throughput in a next generation Node-B (gNB) of the O-RAN network; an average UL UE throughput in the gNB; a number of protocol data unit (PDU) sessions requested for setup in the O-RAN network; a number of PDU sessions successfully set up in the O-RAN network; and/or a number of PDU sessions that failed to set up in the O-RAN network.
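
The listed data fields map naturally onto a record type. A minimal sketch follows; the patent names these quantities but no schema, so the field names here are assumptions.

```python
from dataclasses import dataclass

# Illustrative record of the O-RAN KPI fields listed above (field names assumed).
@dataclass
class RanKpiSample:
    dl_prbs_used: int                  # size/number of DL PRBs used for data traffic
    ul_prbs_used: int                  # size/number of UL PRBs used for data traffic
    avg_dl_ue_throughput_mbps: float   # average DL UE throughput in the gNB
    avg_ul_ue_throughput_mbps: float   # average UL UE throughput in the gNB
    pdu_sessions_requested: int        # PDU sessions requested for setup
    pdu_sessions_succeeded: int        # PDU sessions successfully set up
    pdu_sessions_failed: int           # PDU sessions that failed to set up
```
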
  • FIG. 6 illustrates a method 600 for online reinforcement learning, in accordance with some embodiments.
  • the method 600 begins at operation 602 (or step 1) with collecting training data for offline learning.
  • the training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning.
  • the method 600 continues at operation 604 (or step 2) with performing offline learning.
  • the training host non-RT 506 trains the initial model, e.g., initial model 534, based on the offline training using the ML learning data 516.
  • the method 600 continues at operation 606 (or step 3) with moving the initial model.
  • the initial model 534 is moved from the training host non-RT 506 (offline learning) to the model repository 508 in the non-RT RIC 504 as a trained model 536.
  • the method 600 continues at operation 608 (or step 4) with downloading the model to a near real time training host.
  • the AI/ML model, e.g., trained model 536, is downloaded to the training host near-RT 514 associated with or residing in the near-RT RIC 510.
  • the method 600 continues at operation 610 (or step 5) with deploying the model.
  • the AI/ML model, e.g., running model 538, trained running model 542, trained model 536, or updated model 540, is deployed to the inference host 512, e.g., an xApp.
  • the method 600 continues at operation 612 (or step 6) with collecting inference data.
  • inference data 526 is collected from E2 nodes via the E2 interface, e.g., E2 of FIGS. 1-4.
  • the method 600 continues at operation 614 (or step 7) with generating decision-making policies.
  • the ML inference host, e.g., inference host 512, generates decision-making policies using the ML model, e.g., running model 538, and the inference data, e.g., inference data 526.
  • the method 600 continues at operation 616 (or step 8) with enforcing E2 control actions/guidance via the E2 interface.
  • the ML inference host, e.g., inference host 512, enforces E2 control actions/guidance via the E2 interface, e.g., E2 control 528.
  • the method 600 continues at operation 618 (or step 9) with collecting training data over the E2 interface.
  • the training data, e.g., performance data 530, is collected by the training host, e.g., training host near-RT 514, in the near-RT RIC 510.
  • the method 600 continues at operation 620 (or step 10) with providing feedback.
  • the ML inference host, e.g., inference host 512, provides performance feedback and online training data, e.g., performance feedback 530, to the training host, e.g., training host near-RT 514, associated with or residing in the near-RT RIC 510.
  • the method 600 continues at operation 622 (or step 11) with performing online learning.
  • the training host, e.g., training host near-RT 514, in the near-RT RIC 510 performs online learning based on the online learning data, e.g., performance data 530, from the inference host, e.g., inference host 512, and the E2 nodes.
  • the method 600 continues at operation 624 (or step 12) with deploying the updated model.
  • the training host, e.g., training host near-RT 514, in the near-RT RIC, e.g., near-RT RIC 510, deploys, e.g., via model deploy 532, the updated model, e.g., trained running model 542, which becomes the running model 538, to the ML inference host, e.g., inference host 512.
  • after operation 624, the method 600 returns to operation 612.
  • the method 600 may return to operation 612 during operation of the inference host 512.
  • the method 600 may continue to operation 626 when the training host 514 determines that a new AI/ML model should be used.
  • the method 600 may return to operation 612 after operation 628.
  • the method 600 continues at operation 626 (or step 13) with sending an upload request.
  • if the training host, e.g., the training host near-RT 514, associated with or residing in the near-RT RIC 510 detects that the running model 538 performs well (based on performance feedback data), the training host near-RT 514 may send a model upload request to the model repository 508, and the updated AI/ML model, e.g., trained running model 542, is uploaded and stored in the model repository, e.g., as updated model 540.
  • the method 600 continues at operation 628 (or step 14) with requesting a model download.
  • the training host, e.g., training host near-RT 514, in the near-RT RIC, e.g., near-RT RIC 510, may request a model download from the model repository 508.
  • the training host 514 and/or inference host 512 may detect performance degradation that is above a threshold value and send a request for a different model.
  • the training host near-RT 514 may use the trained running model 542 or may request a model from the model repository 508, e.g., trained model 536 or updated model 540, which may be communicated or sent to the training host near-RT 514 via model download 522.
  • the training host near-RT 514 may select which model to deploy in the inference host 512 based on performance data 530 and/or previous performance information associated with the other models, e.g., running model 538, trained running model 542, trained model 536, or updated model 540.
  • the method 600 may include one or more additional operations.
  • the operations of method 600 may be performed in a different order.
  • One or more of the operations of method 600 may be optional.
  • Different steps may be performed by different functional entities such as training host 506, model repository 508, inference host 512, and/or training host 514.
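
Steps 5 through 12 of method 600 form a closed loop between the inference host and the near-RT training host. The sketch below strings them together, reusing the NearRtTrainingHost sketch above; the remaining function names and data shapes are assumptions for illustration.

```python
# Illustrative closed loop for method 600, steps 5-12 (all names assumed).
def online_learning_loop(training_host, inference_host, e2_nodes, iterations=100):
    training_host.deploy(training_host.running_model)        # step 5: deploy model
    for _ in range(iterations):
        inference_data = e2_nodes.collect()                  # step 6: inference data over E2
        actions = inference_host.infer(inference_data)       # step 7: decision-making policies
        e2_nodes.enforce(actions)                            # step 8: E2 control/guidance
        e2_training_data = e2_nodes.collect()                # step 9: training data over E2
        feedback = inference_host.performance_feedback()     # step 10: feedback to training host
        training_host.running_model = training_host.online_update(
            training_host.running_model,
            {**feedback, "e2_data": e2_training_data})       # step 11: online learning
        training_host.deploy(training_host.running_model)    # step 12: deploy updated model
```
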
  • FIG. 7 illustrates a method 700 for online reinforcement learning, in accordance with some embodiments.
  • the method 700 may be performed by a near-RT RIC in an O-RAN.
  • the method 700 begins at operation 702 with receiving a model such as an AI/ML model.
  • the training host 514 of the near-RT RIC 510 may receive a model such as trained model 536, updated model 540, running model 538, and/or initial model 534.
  • the method 700 continues at operation 704 with receiving training data.
  • the training host 514 of the near-RT RIC 510 may receive performance feedback 530 from the inference host 512.
  • the method 700 continues at operation 706 with updating the AI/ML model based on the training data.
  • training host 514 of the near-RT RIC 510 may perform training on the AI/ML model to update the model and generate the trained running model 542 with updates.
  • the training host 506 may be performing offline learning and may update AI/ML models.
  • the data used to update the models may be different as disclosed herein.
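
A minimal sketch of operations 702-706 follows, assuming the model is a table of state-action values and the feedback carries (state, action, reward, next state) tuples; both the representation and the action set are illustrative assumptions, since the patent does not fix either.

```python
# Illustrative method 700: receive a model (702), receive training data (704),
# update the model (706). Assumes a tabular value model and
# (state, action, reward, next_state) tuples in the feedback.
ACTIONS = ["increase_prbs", "decrease_prbs", "hold"]  # hypothetical action set

def update_model(model, feedback, alpha=0.1, gamma=0.9):
    """One online update pass over performance feedback 530 (operation 706)."""
    for state, action, reward, next_state in feedback:
        best_next = max(model.get((next_state, a), 0.0) for a in ACTIONS)
        old = model.get((state, action), 0.0)
        # Standard temporal-difference update toward reward + discounted future value.
        model[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return model

# Example: one feedback tuple nudges the value of the taken action upward.
running_model = update_model({}, [("congested", "increase_prbs", 1.0, "normal")])
assert running_model[("congested", "increase_prbs")] == 0.1
```
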
  • Example 1 includes a deployment scenario for online reinforcement learning where the training host for online learning and the AI/ML inference host reside in the Near-RT RIC, and the training host for offline learning and the AI/ML model repository are in the SMO/Non-RT RIC.
  • Example 2 the subject matter of Example 1 includes where the functionality of the training host in SMO/Non-RT RIC includes the following: (1) collecting learning data for offline reinforcement learning from E2 nodes over the 01 interface; (2) training the initial model based on the offline training; and, (3) transferring the offline trained model to the model repository.
  • Example 3 the subject matter of Examples 1 and 2 includes where the functionality of the model repository in SMO/Non-RT RIC includes the following: (1) storing trained models; (2) sending out model download notifications to the training host in the Near-RT RIC, and sending the model to that training host; (3) receiving model download requests from the training host in the Near-RT RIC, and sending the model to that training host; and (4) receiving the model upload request from the training host in the Near-RT RIC, and receiving the updated model from the training host.
  • Example 4 the subject matter of Examples 1-3 includes where the model download and upload are between the model repository in SMO/Non-RT RIC and ML training host in the Near-RT RIC.
  • the interactions are over the A1 interface, where the request and notification for model download and upload are part of the A1-ML service.
  • in other embodiments, the interactions are over the O1 interface.
  • Example 5 the subject matter of Examples 1-4 includes where the training host in the Near-RT RIC is configured to perform the following: (1) collect part of the learning data for online reinforcement learning from E2 nodes over the E2 interface; (2) collect part of the learning data for online reinforcement learning from the ML inference host over the Near-RT RIC’s internal API; (3) update the model based on the online training data; (4) deploy the AI/ML model to the inference host; (5) receive a model download notification from the model repository and receive the model from the model repository; (6) send out a model download request to the model repository and, in response, receive the model from the model repository; (7) send out a model upload request to the model repository and send the model to the model repository; and (8) receive AI/ML performance feedback from the ML inference host.
  • Example 6 the subject matter of Examples 1-5 includes where the ML inference host in the Near-RT RIC is configured to perform the following: (1) collect data from E2 nodes over the E2 interface for inference; (2) infer E2 control using the model deployed by the training host in the Near-RT RIC; (3) enforce the control actions/guidance via the E2 interface; and (4) send performance feedback and training data for online learning to the training host in the Near-RT RIC.
  • Example 7 includes a method for initial offline training including the following:
  • (1) Step 1: collecting training data from the E2 nodes over the O1 interface to the SMO/Non-RT RIC; (2) Step 2: the training host inside the SMO/Non-RT RIC performing offline learning; (3) Step 3: transferring the trained offline model to the model repository; (4) Step 4: the model repository sending a model download notification to the training host in the Near-RT RIC; and (5) Step 5: downloading the initial model to the training host in the Near-RT RIC.
  • Example 8 includes a method for online reinforcement learning including: (1) Step 1: the training host in the Near-RT RIC deploying the AI/ML model to the inference host; (2) Step 2: the inference host collecting inference data from E2 nodes over the E2 interface; (3) Step 3: the ML inference host performing inference, generating decision-making policies, using the deployed ML model; (4) Step 4: the ML inference host enforcing E2 control action/guidance via the E2 interface; (5) Step 5: the ML inference host providing online training data to the training host in the Near-RT RIC; (6) Step 6: the training host in the Near-RT RIC collecting online training data over the E2 interface from E2 nodes; (7) Step 7: the training host in the Near-RT RIC performing online reinforcement learning and updating the AI/ML model; and (8) Step 8: the training host in the Near-RT RIC deploying the updated model to the ML inference host.
  • Example 9 includes a method for uploading an updated model to the repository, including the following: (1) Step 1: the training host sends a model upload request to the model repository; and (2) Step 2: the training host in the Near-RT RIC uploads the updated model to the repository.
  • Example 10 includes a method for a backup model download from the repository, the method including the following operations: (1) Step 1: the ML inference host providing performance feedback to the training host in the near-RT RIC; (2) Step 2: the training host in the Near-RT RIC detecting performance degradation, which may be greater than average or severe, based on the feedback from the inference host; (3) Step 3: the training host sending a model download request to the model repository; and (4) Step 4: the backup model (previously well-performing model) being downloaded from the repository to the training host in the Near-RT RIC.
  • the term “application” may refer to a complete and deployable package or environment that achieves a certain function in an operational environment.
  • an AI/ML application or the like may be an application that contains some AI/ML models and application-level descriptions.
  • machine learning refers to the use of computer systems implementing algorithms and/or statistical models to perform specific task(s) without using explicit instructions, but instead relying on patterns and inferences.
  • ML algorithms build or estimate mathematical model(s) (referred to as “ML models” or the like) based on sample data (referred to as “training data,” “model training information,” or the like) in order to make predictions or decisions without being explicitly programmed to perform such tasks.
  • an ML algorithm is a computer program that learns from experience with respect to some task and some performance measure, and an ML model may be any object or data structure created after an ML algorithm is trained with one or more training datasets. After training, an ML model may be used to make predictions on new datasets.
  • although the term “ML algorithm” refers to different concepts than the term “ML model,” these terms as discussed herein may be used interchangeably for the purposes of the present disclosure.
  • ML model may also refer to ML methods and concepts used by an ML- assisted solution.
  • An “ML- assisted solution” is a solution that addresses a specific use case using ML algorithms during operation.
  • ML models include supervised learning (e.g., linear regression, k-nearest neighbor (KNN), decision tree algorithms, support vector machines, Bayesian algorithms, ensemble algorithms, etc.), unsupervised learning (e.g., K-means clustering, principal component analysis (PCA), etc.), reinforcement learning (e.g., Q-learning, multi-armed bandit learning, deep RL, etc.), neural networks, and the like.
  • An “ML pipeline” is a set of functionalities, functions, or functional entities specific for an ML-assisted solution; an ML pipeline may include one or several data sources in a data pipeline, a model training pipeline, a model evaluation pipeline, and an actor.
  • the “actor” is an entity that hosts an ML-assisted solution using the output of the ML model inference.
  • ML training host refers to an entity, such as a network function, that hosts the training of the model.
  • ML inference host refers to an entity, such as a network function, that hosts the model during inference mode (which includes both the model execution as well as any online learning, if applicable).
  • the ML-host informs the actor about the output of the ML algorithm, and the actor takes a decision for an action (an “action” is performed by an actor as a result of the output of an ML assisted solution).
  • model inference information refers to information used as an input to the ML model for determining inference(s); the data used to train an ML model and the data used to determine inferences may overlap, however, “training data” and “inference data” refer to different concepts.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

An apparatus for a near real-time (RT) radio access network intelligent controller (RIC) in an open radio access network (O-RAN), the apparatus including a training host and an inference host. The training host of the near-RT RIC is configured to train artificial intelligence (AI)/machine learning (ML) models based on performance and feedback data. The training host of the near-RT RIC is configured to send AI/ML models to, and receive AI/ML models from, the model repository of the non-RT RIC. The training host of the near-RT RIC is configured to replace an AI/ML model being used by the inference host if the performance is below a threshold performance. An apparatus for a non-RT RIC in an O-RAN, the apparatus including a training host and a model repository. The training host of the non-RT RIC is configured to train initial models and update models based on ML offline learning data and other data.

Description

ONLINE REINFORCEMENT LEARNING
PRIORITY CLAIM
[0001] This application claims the benefit of priority to United States Provisional Patent Application 63/079,876, filed September 17, 2020, and entitled “DEPLOYMENT SCENARIO FOR ONLINE REINFORCEMENT LEARNING IN NEAR-RT RIC”, which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
[0002] Aspects pertain to wireless communications. Some aspects relate to wireless networks including 3GPP (Third Generation Partnership Project) networks, 3GPP LTE (Long Term Evolution) networks, 3GPP LTE-A (LTE Advanced) networks, unlicensed LTE networks (MulteFire, LTE-U), and fifth-generation (5G) networks including 5G new radio (NR) (or 5G-NR) networks, 5G-LTE networks such as 5G NR unlicensed spectrum (NR-U) networks and other unlicensed networks including Wi-Fi, CBRS (OnGo), etc. Other aspects are directed to Open RAN (O-RAN) architectures and, more specifically, techniques for reinforcement learning for O-RAN networks.
BACKGROUND
[0003] Mobile communications have evolved significantly from early voice systems to today’s highly sophisticated integrated communication platform. With the increase in different types of devices communicating with various network devices, usage of 3GPP LTE systems has increased. The penetration of mobile devices (user equipment or UEs) in modern society has continued to drive demand for a wide variety of networked devices in many disparate environments. Fifth-generation (5G) wireless systems are forthcoming and are expected to enable even greater speed, connectivity, and usability. Next generation 5G networks are expected to increase throughput, coverage, and robustness and reduce latency and operational and capital expenditures. 5G new radio (5G-NR) networks will continue to evolve based on 3GPP LTE-Advanced with additional potential new radio access technologies (RATs) to enrich people’s lives with seamless wireless connectivity solutions delivering fast, rich content and services. As current cellular network frequency is saturated, higher frequencies, such as millimeter wave (mmWave) frequency, can be beneficial due to their high bandwidth. [0004] Potential LTE operation in the unlicensed spectrum includes (and is not limited to) the LTE operation in the unlicensed spectrum via dual connectivity (DC), or DC-based LAA, and the standalone LTE system in the unlicensed spectrum, according to which LTE-based technology solely operates in the unlicensed spectrum without requiring an “anchor” in the licensed spectrum, called MulteFire. MulteFire combines the performance benefits of LTE technology with the simplicity of Wi-Fi-like deployments.
[0005] Further enhanced operation of LTE and NR systems in the licensed, as well as unlicensed spectrum, is expected in future releases and 5G systems such as O-RAN systems. Such enhanced operations can include techniques for machine learning (ML) for O-RAN networks.
BRIEF DESCRIPTION OF THE FIGURES
[0006] In the figures, which are not necessarily drawn to scale, like numerals may describe similar components in different views. Like numerals having different letter suffixes may represent different instances of similar components. The figures illustrate generally, by way of example, but not by way of limitation, various aspects discussed in the present document.
[0007] FIG. 1 illustrates an example Open RAN (O-RAN) system architecture.
[0008] FIG. 2 illustrates a logical architecture of the O-RAN system of FIG. 1. [0009] FIG. 3 illustrates a system where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments.
[0010] FIG. 4 illustrates a system where a non-RT RIC acts as the ML training host and a near-RT RIC acts as the ML inference host, in accordance with some embodiments.
[0011] FIG. 5 illustrates a system for online reinforcement learning, in accordance with some embodiments.
[0012] FIG. 6 illustrates a method for online reinforcement learning, in accordance with some embodiments.
[0013] FIG. 7 illustrates a method for online reinforcement learning, in accordance with some embodiments.
DETAILED DESCRIPTION
[0014] The following description and the drawings sufficiently illustrate aspects to enable those skilled in the art to practice them. Other aspects may incorporate structural, logical, electrical, process, and other changes. Portions and features of some aspects may be included in, or substituted for, those of other aspects.
Aspects outlined in the claims encompass all available equivalents of those claims.
[0015] FIG. 1 provides a high-level view of an Open RAN (O-RAN) architecture 100. The O-RAN architecture 100 includes four O-RAN defined interfaces - namely, the A1 interface, the O1 interface, the O2 interface, and the Open Fronthaul Management (M)-plane interface - which connect the Service Management and Orchestration (SMO) framework 102 to O-RAN network functions (NFs) 104 and the O-Cloud 106. The SMO 102 (described in Reference [R13]) also connects with an external system 110, which provides enrichment data to the SMO 102. FIG. 1 also illustrates that the A1 interface terminates at an O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 112 in or at the SMO 102 and at the O-RAN Near-RT RIC 114 in or at the O-RAN NFs 104. The O-RAN NFs 104 can be virtual network functions (VNFs) such as virtual machines (VMs) or containers, sitting above the O-Cloud 106 and/or Physical Network Functions (PNFs) utilizing customized hardware. All O-RAN NFs 104 are expected to support the O1 interface when interfacing with the SMO framework 102. The O-RAN NFs 104 connect to the NG-Core 108 via the NG interface (which is a 3GPP defined interface). The Open Fronthaul M-plane interface between the SMO 102 and the O-RAN Radio Unit (O-RU) 116 supports the O-RU 116 management in the O-RAN hybrid model as specified in Reference [R16]. The Open Fronthaul M-plane interface is an optional interface to the SMO 102 that is included for backward compatibility purposes as per Reference [R16] and is intended for management of the O-RU 116 in hybrid mode only. The management architecture of flat mode (see Reference [R12]) and its relation to the O1 interface for the O-RU 116 is in development. The O-RU 116 terminates the O1 interface towards the SMO 102 as specified in Reference [R12].
[0016] FIG. 2 shows an O-RAN logical architecture 200 corresponding to the O-RAN architecture 100 of FIG. 1. In FIG. 2, the SMO 202 corresponds to the SMO 102, O-Cloud 206 corresponds to the O-Cloud 106, the non-RT RIC 212 corresponds to the non-RT RIC 112, the near-RT RIC 214 corresponds to the near-RT RIC 114, and the O-RU 216 corresponds to the O-RU 116 of FIG. 1, respectively. The O-RAN logical architecture 200 includes a radio portion and a management portion.
[0017] The management portion/side of the architecture 200 includes the SMO Framework 202 containing the non-RT RIC 212, and may include the O-Cloud 206. The O-Cloud 206 is a cloud computing platform including a collection of physical infrastructure nodes to host the relevant O-RAN functions (e.g., the near-RT RIC 214, O-RAN Central Unit-Control Plane (O-CU-CP) 221, O-RAN Central Unit-User Plane (O-CU-UP) 222, and the O-RAN Distributed Unit (O-DU) 215), supporting software components (e.g., OSs, VMMs, container runtime engines, ML engines, etc.), and appropriate management and orchestration functions.
[0018] The radio portion/side of the logical architecture 200 includes the near-RT RIC 214, the O-DU 215, the O-RAN Radio Unit (O-RU) 216, the O-CU-CP 221, and the O-CU-UP 222 functions. The radio portion/side of the logical architecture 200 may also include the O-e/gNB 210.
[0019] The O-DU 215 is a logical node hosting Radio Link Control (RLC), media access control (MAC), and higher physical (PHY) layer entities/elements (High- PHY layers) based on a lower layer functional split. The O-RU 216 is a logical node hosting lower PHY layer entities/elements (Low-PHY layer) (e.g., FFT/iFFT, PRACH extraction, etc.) and RF processing elements based on a lower layer functional split. Virtualization of O-RU 216 is FFS. The O-CU-CP 221 is a logical node hosting the RRC and the control plane (CP) part of the PDCP protocol. The O-CU-UP 222 is a logical node hosting the user plane part of the PDCP protocol and the SDAP protocol.
[0020] An E2 interface terminates at a plurality of E2 nodes. The E2 nodes are logical nodes/entities that terminate the E2 interface. For NR/5G access, the E2 nodes include the O-CU-CP 221, O-CU-UP 222, O-DU 215, or any combination of elements as defined in Reference [R15], For E-UTRA access the E2 nodes include the O-e/gNB 210. As shown in FIG. 2, the E2 interface also connects the O-e/gNB 210 to the Near-RT RIC 214. The protocols over E2 interface are based exclusively on Control Plane (CP) protocols. The E2 functions are grouped into the following categories: (a) near-RT RIC 214 services (REPORT, INSERT, CONTROL and POLICY, as described in Reference [R15]); and (b) near-RT RIC 214 support functions, which include E2 Interface Management (E2 Setup, E2 Reset, Reporting of General Error Situations, etc.) and Near-RT RIC Service Update (e.g., capability exchange related to the list of E2 Node functions exposed over E2).
[0021] FIG. 2 shows the Uu interface between a UE 201 and O-e/gNB 210 as well as between the UE 201 and O-RAN components. The Uu interface is a 3 GPP defined interface (see e.g., sections 5.2 and 5.3 of Reference [R07]), which includes a complete protocol stack from LI to L3 and terminates in the NG-RAN or E-UTRAN. The O-e/gNB 210 is an LTE eNB (see Reference [R04]), a 5G gNB or ng-eNB (see Reference [R06]) that supports the E2 interface. The O-e/gNB 210 may be the same or similar as discussed in FIGS. 3-7. The UE 201 may correspond to UEs discussed with respect to FIGS. 3-7 and/or the like. There may be multiple UEs 201 and/or multiple O-e/gNB 210, each of which may be connected to one another the via respective Uu interfaces. Although not shown in FIG. 2, the O- e/gNB 210 supports O-DU 215 and O-RU 216 functions with an Open Fronthaul interface between them.
[0022] The Open Fronthaul (OF) interfaced) i s/are between O-DU 215 and O- RU 216 functions (see References [R16] and [R17].) The OF interfaced) includes the Control User Synchronization (CUS) Plane and Management (M) Plane. FIGS. 1 and 2 also show that the O-RU 216 terminates the OF M-Plane interface towards the O-DU 215 and optionally towards the SMO 202 as specified in Reference [R16]. The O-RU 216 terminates the OF CUS-Plane interface towards the O-DU 215 and the SMO 202. [0023] The Fl-c interface connects the O-CU-CP 221 with the O-DU 215. As defined by 3 GPP, the Fl-c interface is between the gNB-CU-CP and gNB-DU nodes (see References [R07] and [R10].) However, for purposes of O-RAN, the Fl-c interface is adopted between the O-CU-CP 221 with the O-DU 215 functions while reusing the principles and protocol stack defined by 3 GPP and the definition of interoperability profile specifications.
[0024] The Fl-u interface connects the O-CU-UP 222 with the O-DU 215. As defined by 3 GPP, the Fl-u interface is between the gNB-CU-UP and gNB-DU nodes (see References [R07] and [R10]). However, for purposes of O-RAN, the Fl-u interface is adopted between the O-CU-UP 222 with the O-DU 215 functions while reusing the principles and protocol stack defined by 3GPP and the definition of interoperability profile specifications.
[0025] The NG-c interface is defined by 3GPP as an interface between the gNB- CU-CP and the AMF in the 5GC (see Reference [R06]). The NG-c is also referred as the N2 interface (see Reference [R06]). The NG-u interface is defined by 3GPP, as an interface between the gNB-CU-UP and the UPF in the 5GC (see Reference [R06]). The NG-u interface is referred as the N3 interface (see Reference [R06]). In O-RAN, NG-c and NG-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes. [0026] The X2-c interface is defined in 3GPP for transmitting control plane information between eNBs or between eNB and en-gNB in EN-DC. The X2-u interface is defined in 3GPP for transmitting user plane information between eNBs or between eNB and en-gNB in EN-DC (see e.g., [005], [006]). In O-RAN, X2- c and X2-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
[0027] The Xn-c interface is defined in 3GPP for transmitting control plane information between gNBs, ng-eNBs, or between an ng-eNB and gNB. The Xn-u interface is defined in 3GPP for transmitting user plane information between gNBs, ng-eNBs, or between an ng-eNB and gNB (see e.g., References [R06] and [R08]). In O-RAN, the Xn-c and Xn-u protocol stacks defined by 3GPP are reused and may be adapted for O-RAN purposes.
[0028] The E1 interface is defined by 3GPP as being an interface between the gNB-CU-CP (e.g., gNB-CU-CP 3728) and gNB-CU-UP (see e.g., References [R07] and [R09]). In O-RAN, the E1 protocol stacks defined by 3GPP are reused and adapted as being an interface between the O-CU-CP 221 and the O-CU-UP 222 functions.
[0029] The O-RAN Non-Real Time (RT) RAN Intelligent Controller (RIC) 212 is a logical function within the SMO framework 102, 202 that enables non-real-time control and optimization of RAN elements and resources; AI/machine learning (ML) workflow(s) including model training, inferences, and updates; and policy-based guidance of applications/features in the Near-RT RIC 214.
[0030] The O-RAN near-RT RIC 214 is a logical function that enables near-real-time control and optimization of RAN elements and resources via fine-grained data collection and actions over the E2 interface. The near-RT RIC 214 may include one or more AI/ML workflows including model training, inferences, and updates.
[0031] The non-RT RIC 212 can be an ML training host to host the training of one or more ML models. The ML data can be collected from one or more of the following: the Near-RT RIC 214, O-CU-CP 221, O-CU-UP 222, O-DU 215, O-RU 216, external enrichment source 110 of FIG. 1, and so forth. For supervised learning, the ML training host and/or ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214. For unsupervised learning, the ML training host and ML inference host/actor can be part of the non-RT RIC 212 and/or the near-RT RIC 214. For reinforcement learning, the ML training host and ML inference host/actor are co-located as part of the near-RT RIC 214. In some implementations, the non-RT RIC 212 may request or trigger ML model training in the training hosts regardless of where the model is deployed and executed. ML models may be trained and not currently deployed.
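For illustration only, the placement rules above can be captured as a simple lookup. The following is a minimal Python sketch, not part of any O-RAN specification; every name in it is invented for this example.

```python
# Hypothetical encoding of the host-placement rules described above.
# For supervised and unsupervised learning, training and inference hosts may be
# in the non-RT RIC and/or near-RT RIC; for reinforcement learning they are
# co-located in the near-RT RIC.
ALLOWED_HOSTS = {
    "supervised":    {"training": {"non-RT RIC", "near-RT RIC"},
                      "inference": {"non-RT RIC", "near-RT RIC"}},
    "unsupervised":  {"training": {"non-RT RIC", "near-RT RIC"},
                      "inference": {"non-RT RIC", "near-RT RIC"}},
    "reinforcement": {"training": {"near-RT RIC"},
                      "inference": {"near-RT RIC"}},
}

def placement_is_valid(paradigm: str, training_host: str, inference_host: str) -> bool:
    rules = ALLOWED_HOSTS[paradigm]
    ok = training_host in rules["training"] and inference_host in rules["inference"]
    if paradigm == "reinforcement":
        ok = ok and training_host == inference_host  # co-location requirement
    return ok

assert placement_is_valid("reinforcement", "near-RT RIC", "near-RT RIC")
assert not placement_is_valid("reinforcement", "non-RT RIC", "near-RT RIC")
```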
[0032] In some implementations, the non-RT RIC 212 provides a query-able catalog for an ML designer/developer to publish/install trained ML models (e.g., executable software components). In these implementations, the non-RT RIC 212 may provide a discovery mechanism to determine whether a particular ML model can be executed in a target ML inference host (MF), and what number and types of ML models can be executed in the target ML inference host. The Near-RT RIC 214 is a managed function (MF). For example, there may be three types of ML catalogs made discoverable by the non-RT RIC 212: a design-time catalog (e.g., residing outside the non-RT RIC 212 and hosted by some other ML platform(s)), a training/deployment-time catalog (e.g., residing inside the non-RT RIC 212), and a run-time catalog (e.g., residing inside the non-RT RIC 212). The non-RT RIC 212 supports necessary capabilities for ML model inference in support of ML-assisted solutions running in the non-RT RIC 212 or some other ML inference host. These capabilities enable executable software to be installed, such as VMs, containers, etc. The non-RT RIC 212 may also include and/or operate one or more ML engines, which are packaged software executable libraries that provide methods, routines, data types, etc., used to run ML models. The non-RT RIC 212 may also implement policies to switch and activate ML model instances under different operating conditions.
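For illustration only, the catalog and discovery mechanism described above might be sketched as follows. This is a minimal Python sketch; the class and field names (ModelEntry, InferenceHostProfile, Catalog, required_engine, and so on) are invented here and do not come from any O-RAN interface definition.

```python
# Hypothetical sketch of a query-able ML model catalog with a discovery check
# against a target ML inference host (MF). All names are invented.
from dataclasses import dataclass, field

@dataclass
class ModelEntry:
    name: str
    version: str
    required_engine: str        # ML engine/library the model needs to run
    cpu_millicores: int         # illustrative resource requirement

@dataclass
class InferenceHostProfile:
    engines: set                # ML engines available in the managed function (MF)
    cpu_budget_millicores: int

@dataclass
class Catalog:
    kind: str                   # "design-time", "training/deployment-time", "run-time"
    entries: list = field(default_factory=list)

    def publish(self, entry: ModelEntry) -> None:
        self.entries.append(entry)

    def discover_executable(self, host: InferenceHostProfile) -> list:
        """Which of the published models could execute in the target host."""
        return [e for e in self.entries
                if e.required_engine in host.engines
                and e.cpu_millicores <= host.cpu_budget_millicores]

run_time = Catalog("run-time")
run_time.publish(ModelEntry("traffic-steering", "1.2", "onnxruntime", 500))
mf = InferenceHostProfile(engines={"onnxruntime"}, cpu_budget_millicores=2000)
print([e.name for e in run_time.discover_executable(mf)])   # ['traffic-steering']
```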
[0033] The non-RT RIC 212 is able to access feedback data (e.g., FM, PM, and network KPI statistics) over the O1 interface on ML model performance and perform necessary evaluations. If the ML model fails during runtime, an alarm can be generated as feedback to the non-RT RIC 212. How well the ML model is performing in terms of prediction accuracy or other operating statistics it produces can also be sent to the non-RT RIC 212 over O1. The non-RT RIC 212 can also scale ML model instances running in a target MF over the O1 interface by observing resource utilization in the MF. The environment where the ML model instance is running (e.g., the MF) monitors resource utilization of the running ML model. This can be done, for example, using an ORAN-SC component called ResourceMonitor in the near-RT RIC 214 and/or in the non-RT RIC 212, which continuously monitors resource utilization. If resources are low or fall below a certain threshold, the runtime environment in the near-RT RIC 214 and/or the non-RT RIC 212 provides a scaling mechanism to add more ML instances. The scaling mechanism may include a scaling factor such as a number, percentage, and/or other like data used to scale up/down the number of ML instances. ML model instances running in the target ML inference hosts may be automatically scaled by observing resource utilization in the MF. For example, the Kubernetes® (K8s) runtime environment typically provides an auto-scaling feature.
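For illustration only, the threshold-based scaling described above might be sketched as a single function. This is a minimal Python sketch under assumptions: the threshold, scaling-factor value, and instance cap are invented, and the utilization input stands in for whatever a ResourceMonitor-like component reports.

```python
# Hypothetical sketch of threshold-based scaling of ML model instances, driven
# by resource utilization reported by a monitor (e.g., a ResourceMonitor-like
# component). Threshold and scaling-factor values are invented.
def scale_instances(current_instances: int,
                    free_resource_fraction: float,
                    low_threshold: float = 0.2,
                    scaling_factor: float = 1.5,
                    max_instances: int = 16) -> int:
    """Grow the instance count by the scaling factor when free resources run low."""
    if free_resource_fraction < low_threshold:
        grown = max(current_instances + 1, round(current_instances * scaling_factor))
        return min(max_instances, grown)
    return current_instances

print(scale_instances(current_instances=4, free_resource_fraction=0.1))  # 6
```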
[0034] The A1 interface is between the non-RT RIC 212 (which is within or associated with the SMO 202) and the near-RT RIC 214. The A1 interface supports three types of services as defined in Reference [R14], including a Policy Management Service, an Enrichment Information Service, and an ML Model Management Service. A1 policies have the following characteristics compared to persistent configuration as defined in Reference [R14]: A1 policies are not critical to traffic; A1 policies have temporary validity; A1 policies may handle individual UEs or dynamically defined groups of UEs; A1 policies act within and take precedence over the configuration; and A1 policies are non-persistent, i.e., they do not survive a restart of the near-RT RIC.
[0035] A technical problem is how to train and maintain good AI/ML models to be used by an inference host to perform E2 control and other controls. The disclosed examples address this issue by including two training hosts: one in the non-RT RIC and one in the near-RT RIC. The training host in the near-RT RIC performs online learning, which may use different data than the offline learning, and ensures that an adequate AI/ML model is being used by the inference host. The training host of the non-RT RIC performs offline learning and transfers an initial model and updated models to a model repository that is used to store AI/ML models that may be used by the training host of the near-RT RIC.
[0036] A deployment is disclosed of online reinforcement learning in the Near-RT RIC. The AI/ML training host and inference host are located in the Near-RT RIC, while an offline learning host and ML model repository reside in the Non-RT RIC. The deployment reduces communication and feedback delay between the ML training host and the ML inference host. The delay reduction is essential for online reinforcement learning, especially for generating fast-changing decision-making policies that adapt to highly dynamic environments. The ML model repository ensures performance of the online reinforcement learning by saving the most accurate and best-performing ML models.
[0037] Examples disclose a deployment scenario for online reinforcement learning in the Near-RT RIC, which incorporates an online training host and inference host in the Near-RT RIC, while an offline training host and ML model repository reside in SMO/Non-RT RIC.
[0038] FIG. 3 illustrates a system 300 where a non-RT RIC acts as both the ML training and inference host, in accordance with some embodiments. ML training information 322 is collected from the DU/O-CU 332 over the E2 interface and/or O1 interface and sent to data management 308. ML online information 324 is collected from the E2 interface and/or O1 interface and sent to data management 308. Data management 308 sends the information to the ML training 316 and ML inference 314. The ML inference 314 uses a model and sends configuration updates to configuration management 306 (if the DU or CU is the subject of the action). The ML inference 314 sends policy/intent 304 (if the near-RT RIC is the subject of the action) to the near-RT RIC 302. The O1 management (MGMT) 310 sends data enrichment 330 (and deploy instructions and models). The non-RT RIC 312 includes the ML training 316 and ML inference 314. The ML inference 314 sends performance data 338 to the ML training 316. The ML training 316 sends deploy instructions and models 336 to the ML inference 314. [0039] FIG. 4 illustrates a system 400 where a non-RT RIC acts as the ML training host and a near-RT RIC acts as the ML inference host, in accordance with some embodiments. In FIG. 4 the ML inference 314 resides in the near-RT RIC rather than the non-RT RIC 312. O1 management sends ML deploy 404 instructions or models to the ML inference 314. The same numbers as FIG. 3 are meant to indicate the same or similar information and/or function.
[0040] FIG. 5 illustrates a system 500 for online reinforcement learning, in accordance with some embodiments. The performance feedback 530 is training data for online training, e.g., rewards, environment states, performed actions, and so forth, and data for performance monitoring. The training host (online learning) (“training host near-RT”) 514 is configured for online learning. The training host near-RT 514 resides in the near real-time RIC 510. The inference host 512 is an AI/ML inference host. The inference host 512 resides in the near-RT RIC 510.
[0041] The training host (offline) (“training host non-RT”) 506 is a training host for offline learning. The training host non-RT 506 resides in the SMO or non-RT RIC 504. The model repository 508 is an AI/ML model repository. The model repository 508 resides in the SMO/non-RT RIC (“non-RT RIC”) 504. [0042] The training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning. The training host non-RT 506 trains the initial model based on the offline training. The training host non-RT 506 transfers, via move model 518, the initial model 534, which is an offline-trained model, to the model repository as a trained model 536. The ML learning data 516 (e.g., O-CU/O-DU data collected over O1) used for offline training by the training host non-RT 506 may differ from the O-CU/O-DU data collected over E2 (e.g., inference data 526) used for online learning by the training host near-RT 514 and/or for inference by the inference host 512. [0043] The model repository 508 is associated with or stored in the SMO/Non-RT RIC 504, which stores trained, validated, and tested models. The terms associated with or stored may mean implemented by, or located within or in, in accordance with some embodiments. The trained model 536 may be trained, validated, and tested. The trained model 536 may be the initial model 534 transferred to the model repository 508. The stored model, e.g., trained model 536, may be tested to be well-performing and may be used as a backup for online learning, in case the running model 538 drifts too much and leads to severe degradation. [0044] The model repository 508 sends out a model download 522 notification to the training host near-RT 514, which is associated with or located in the near-RT RIC 510, and the model repository 508 sends the model, e.g., trained model 536, to the training host near-RT 514. The model repository 508 receives a model download request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 sends, in response, a model, e.g., trained model 536 or updated model 540, to the training host near-RT 514.
[0045] The model repository 508 receives a model upload request from the training host near-RT 514 associated with or in the near-RT RIC 510, and the model repository 508 receives a model upload 524 comprising the updated model 540 from the training host near-RT 514.
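For illustration only, the repository behavior of paragraphs [0043]-[0045] might be sketched as follows. This is a minimal Python sketch under assumed names (ModelRepository, handle_download_request, handle_upload_request are invented here); the actual A1/O1 message encodings are not shown.

```python
# Hypothetical sketch of the model repository behavior of paragraphs
# [0043]-[0045]: store trained models, push download notifications, and serve
# download/upload requests. A1/O1 message encodings are not shown.
from typing import Callable, Dict, List, Optional

class ModelRepository:
    def __init__(self) -> None:
        self._models: Dict[str, bytes] = {}        # model name -> serialized model
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, notify: Callable[[str], None]) -> None:
        self._subscribers.append(notify)            # e.g., a near-RT training host

    def store(self, name: str, blob: bytes) -> None:
        """Store a trained/validated/tested model and notify subscribed hosts."""
        self._models[name] = blob
        for notify in self._subscribers:
            notify(name)                             # model download notification

    def handle_download_request(self, name: str) -> Optional[bytes]:
        return self._models.get(name)                # send the model in response

    def handle_upload_request(self, name: str, blob: bytes) -> None:
        self._models[name] = blob                    # receive the updated model

repo = ModelRepository()
repo.subscribe(lambda name: print(f"download notification: {name}"))
repo.store("initial-model", b"offline-trained")     # offline-trained initial model
repo.handle_upload_request("initial-model", b"online-updated")
assert repo.handle_download_request("initial-model") == b"online-updated"
```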
[0046] In one embodiment, the model download 522 and the model upload 524 between the model repository 508 associated with or in the SMO/Non-RT RIC 504 and the training host near-RT 514 associated with or in the near-RT RIC 510 are communicated over the A1 interface. The request model message for a model download 522 and the notification message for a model upload 524 are part of the A1-ML service. In another embodiment, the model download 522 and the model upload 524 are communicated over the O1 interface. [0047] The training host near-RT 514 in the near-RT RIC 510 collects learning data for online reinforcement learning from E2 nodes over the E2 interface and from the ML inference host (“inference host”) 512, e.g., the performance feedback 530, using an application program interface (API) of the near-RT RIC 510. The training host near-RT 514 updates the model, e.g., running model 538, based on the above training data for online learning, and it deploys the AI/ML model, e.g., running model 538, to the inference host 512.
[0048] The training host near-RT 514 associated with or in the near-RT RIC 510 receives a model download notification from the model repository 508, and it receives the model, e.g., trained model 536 or updated model 540, from the model repository 508. The training host near-RT 514 in the near-RT RIC 510 communicates or sends out a model download 522 request to the model repository 508, and the training host near-RT 514 receives the model from the model repository 508. [0049] The training host near-RT 514 associated with or in the Near-RT RIC 510 sends out a model upload 524 request to the model repository 508, and the training host near-RT 514 sends the model to the model repository 508.
[0050] The training host near-RT 514 in the near-RT RIC 510 receives AI/ML performance feedback 530 from the inference host 512. If the training host near-RT 514 detects severe performance degradation, then the training host near-RT 514 can send a model download 522 request to the model repository 508. After a model download 522 of a previously well-performing model, e.g., updated model 540, from the model repository 508, the training host near-RT 514 communicates a model deploy 532 with this backup model to the inference host 512. The training host near-RT 514 associated with or in the near-RT RIC 510 determines whether to communicate a model upload 524 of the latest trained model, e.g., trained running model 542, to the model repository 508. A validation and testing procedure for the uploaded model is done by the training host near-RT 514 associated with or in the near-RT RIC 510 before uploading. [0051] The inference host 512 is associated with or resides in the near-RT RIC 510. The inference host 512, which may be termed an ML inference host, collects data, e.g., inference data 526, from E2 nodes over the E2 interface for inference. The inference host 512 uses the model, e.g., the running model 538, deployed by the training host near-RT 514 associated with or residing in the Near-RT RIC 510 to generate E2 control 528. The inference host 512 enforces the control actions/guidance via the E2 interface. The inference host 512 sends performance feedback and training data 530 for online learning to the training host near-RT 514 in the near-RT RIC 510. The SMO/Non-RT RIC 504 and the near-RT RIC 510 may communicate over the O1/A1 interfaces.
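For illustration only, the degradation-triggered fallback of paragraph [0050] might look like the following minimal Python sketch. The 30% threshold, the callable names, and the reward-based performance measure are all assumptions made for this example.

```python
# Hypothetical sketch of the fallback in paragraph [0050]: on severe
# degradation, download a previously well-performing backup model and deploy
# it to the inference host. Threshold and measure are invented.
SEVERE_DEGRADATION = 0.3    # fractional performance drop treated as "severe"

def on_performance_feedback(reward_baseline: float, reward_now: float,
                            download_backup, deploy) -> None:
    if reward_baseline <= 0:
        return
    drop = (reward_baseline - reward_now) / reward_baseline
    if drop > SEVERE_DEGRADATION:
        backup = download_backup()   # model download 522 from the repository
        deploy(backup)               # model deploy 532 to the inference host

on_performance_feedback(
    reward_baseline=1.0, reward_now=0.5,
    download_backup=lambda: "backup-model",
    deploy=lambda m: print(f"deploying {m}"),
)
```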
[0052] The ML learning data 516 is data for offline training. The performance feedback 530 includes data for online training. The ML learning data 516, the inference data 526, and/or the performance feedback 530 may include one or more of the following: the size and number of downlink (DL) physical resource blocks (PRBs) used for data traffic; the size and number of uplink (UL) PRBs used for data traffic; an average DL user equipment (UE) throughput in a next generation Node-B (gNB) of the O-RAN network; an average UL UE throughput in the gNB; a number of protocol data unit (PDU) sessions requested for setup in the O-RAN network; a number of PDU sessions successfully set up in the O-RAN network; and/or a number of PDU sessions that failed to set up in the O-RAN network.
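For illustration only, a record carrying the kinds of fields listed above might be sketched as follows; the Python field names are invented for this example and are not O-RAN counter names.

```python
# Hypothetical record for the fields listed in paragraph [0052].
from dataclasses import dataclass

@dataclass
class RanKpiRecord:
    dl_prbs_used: int                   # DL PRBs used for data traffic
    ul_prbs_used: int                   # UL PRBs used for data traffic
    avg_dl_ue_throughput_mbps: float    # average DL UE throughput in the gNB
    avg_ul_ue_throughput_mbps: float    # average UL UE throughput in the gNB
    pdu_sessions_requested: int
    pdu_sessions_setup_ok: int
    pdu_sessions_setup_failed: int

    def setup_success_rate(self) -> float:
        total = self.pdu_sessions_requested
        return self.pdu_sessions_setup_ok / total if total else 0.0

sample = RanKpiRecord(1200, 800, 95.3, 22.1, 100, 97, 3)
print(f"PDU session setup success rate: {sample.setup_success_rate():.2f}")
```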
[0053] FIG. 6 illustrates a method 600 for online reinforcement learning, in accordance with some embodiments. The method 600 begins at operation 602 (or step 1) with collecting training data for offline learning. For example, the training host non-RT 506 collects ML learning data 516 from the E2 nodes O-CU/O-DU 502 (“E2 nodes”) over the O1 interface for offline reinforcement learning.
[0054] The method 600 continues at operation 604 (or step 2) with performing offline learning. For example, the training host non-RT 506 trains the initial model, e.g., initial model 534, based on the offline training using the ML learning data 516.
[0055] The method 600 continues at operation 606 (or step 3) with moving the initial model. For example, the initial model 534 is moved from the training host non-RT 506 (offline learning) to the model repository 508 in the non-RT RIC 504 as a trained model 536.
[0056] The method 600 continues at operation 608 (or step 4) with downloading the model to a near real-time training host. For example, the AI/ML model, e.g., trained model 536, is downloaded, e.g., model download 522, to the training host near-RT 514 associated with or residing in the near-RT RIC 510.
[0057] The method 600 continues at operation 610 (or step 5) with deploying the model. For example, the AI/ML model, e.g., running model 538, trained running model 542, trained model 536, or updated model 540, is deployed to the inference host 512, e.g., an xApp, in the near-RT RIC 510. [0058] The method 600 continues at operation 612 (or step 6) with collecting inference data. For example, inference data 526 is collected from E2 nodes via the E2 interface, e.g., E2 of FIGS. 1-4.
[0059] The method 600 continues at operation 614 (or step 7) with generating decision-making policies. For example, the ML inference host, e.g., inference host 512, generates decision-making policies based on the deployed ML model, e.g., running model 538, and the inference data, e.g., inference data 526.
[0060] The method 600 continues at operation 616 (or step 8) with enforcing E2 control actions/guidance via the E2 interface. For example, the ML inference host, e.g., inference host 512, enforces E2 control actions/guidance via the E2 interface, e.g., E2 control 528.
[0061] The method 600 continues at operation 618 (or step 9) with collecting training data over the E2 interface. For example, the training data, e.g., performance feedback 530, for online learning is collected over the E2 interface from E2 nodes to the training host, e.g., training host near-RT 514, in the near-RT RIC 510.
[0062] The method 600 continues at operation 620 (or step 10) with providing feedback. For example, the ML inference host, e.g., inference host 512, provides performance feedback and online training data, e.g., performance feedback 530, to the training host, e.g., training host near-RT 514 associated with or residing in the near-RT RIC 510.
[0063] The method 600 continues at operation 622 (or step 11) with performing online learning. For example, the training host, e.g., training host near-RT 514, in the near-RT RIC 510 performs online learning based on the online learning data, e.g., performance feedback 530, from the inference host, e.g., inference host 512, and the E2 nodes.
[0064] The method 600 continues at operation 624 (or step 12) with deploying the updated model. For example, the training host, e.g., training host near-RT 514, in the near-RT RIC 510 deploys, e.g., model deploy 532, the updated model, e.g., the trained running model 542 becomes the running model 538, to the ML inference host, e.g., inference host 512.
[0065] In some embodiments, after operation 624 the method 600 returns to operation 612. For example, the method 600 may return to operation 612 during operation of the inference host 512. The method 600 may continue to operation 626 when the training host 514 determines that a new AI/ML model should be used. The method 600 may return to operation 612 after operation 628.
[0066] The method 600 continues at operation 626 (or step 13) with sending an upload request. For example, if the training host, e.g., the training host near-RT 514, associated with or residing in the near-RT RIC 510 detects that the running model 538 performs well (based on performance feedback data), then the training host near-RT 514 may send a model upload request to the model repository 508, and the updated AI/ML model, e.g., trained running model 542, is uploaded and stored in the model repository, e.g., as updated model 540. [0067] The method 600 continues at operation 628 (or step 14) with requesting a model download. For example, if the training host, e.g., training host near-RT 514, in the near-RT RIC, e.g., near-RT RIC 510, detects that the running model 538 performs not as well as expected (based on performance feedback data), then it may request a model download from the model repository, e.g., when the running model leads to severe performance degradation. For example, the training host 514 and/or inference host 512 may detect performance degradation that is above a threshold value and send a request for a different model. The training host near-RT 514 may use the trained running model 542 or may request a model from the model repository 508, e.g., trained model 536 or updated model 540, which may be communicated or sent to the training host near-RT 514 via model download 522. The training host near-RT 514 may select which model to deploy in the inference host 512 based on performance feedback 530 and/or previous performance information associated with the other models, e.g., running model 538, trained running model 542, trained model 536, or updated model 540.
[0068] The method 600 may include one or more additional operations. The operations of method 600 may be performed in a different order. One or more of the operations of method 600 may be optional. Different steps may be performed by different functional entities such as the training host 506, model repository 508, inference host 512, and/or training host 514.
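For illustration only, the online portion of method 600 (roughly steps 5 through 14) can be sketched end to end. The following minimal Python sketch uses a toy in-memory repository and random numbers in place of real E2 data and feedback; the class names, thresholds, and string-based "models" are all invented for this example.

```python
# Hypothetical end-to-end sketch of the online part of method 600.
import random

class TinyRepo:
    """Stand-in for the model repository 508; stores serialized models."""
    def __init__(self):
        self.models = {"latest": b"backup-model"}
    def handle_upload_request(self, name, blob):
        self.models[name] = blob
    def handle_download_request(self, name):
        return self.models[name]

class TrainingHostNearRT:
    """Stand-in for training host 514; the 0.8/0.2 thresholds are invented."""
    def __init__(self, model: str, repo: TinyRepo):
        self.model, self.repo = model, repo
    def online_update(self, feedback: float) -> str:
        self.model = self.model + "+upd"             # step 11: online learning
        if feedback > 0.8:                            # step 13: performing well
            self.repo.handle_upload_request("latest", self.model.encode())
        elif feedback < 0.2:                          # step 14: severe degradation
            self.model = self.repo.handle_download_request("latest").decode()
        return self.model                             # step 12: deploy updated model

def run_loop(host: TrainingHostNearRT, rounds: int = 3) -> str:
    model = host.model                                # step 5: deploy to inference host
    for _ in range(rounds):
        inference_data = random.random()              # step 6: collect over E2
        policy = f"policy({model}, {inference_data:.2f})"  # step 7: decision-making
        _ = policy                                    # step 8: enforce via E2 (omitted)
        feedback = random.random()                    # steps 9-10: feedback 530
        model = host.online_update(feedback)
    return model

print(run_loop(TrainingHostNearRT("m0", TinyRepo())))
```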
[0069] FIG. 7 illustrates a method 700 for online reinforcement learning, in accordance with some embodiments. The method 700 may be performed by a near-RT RIC in an O-RAN. The method 700 begins at operation 702 with receiving a model such as an AI/ML model. The training host 514 of the near-RT RIC 510 may receive a model such as trained model 536, updated model 540, running model 538, and/or initial model 534. The method 700 continues at operation 704 with receiving training data. For example, the training host 514 of the near-RT RIC 510 may receive performance feedback 530 from the inference host 512. The method 700 continues at operation 706 with updating the AI/ML model based on the training data. For example, the training host 514 of the near-RT RIC 510 may perform training on the AI/ML model to update the model and generate the trained running model 542 with updates.
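For illustration only, one concrete possibility for the update of operation 706 is a tabular Q-learning step, since Q-learning is named among reinforcement-learning methods in the terminology section below. The disclosure does not mandate any particular algorithm, and the state and action labels here are invented.

```python
# Standard tabular Q-learning update, shown as one possible realization of the
# model update in operation 706: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a)).
from collections import defaultdict

def q_update(q, state, action, reward, next_state,
             alpha: float = 0.1, gamma: float = 0.9) -> None:
    best_next = max(q[next_state].values(), default=0.0)
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])

q = defaultdict(lambda: defaultdict(float))
# e.g., performance feedback 530 supplies (state, action, reward, next_state):
q_update(q, state="high-load", action="add-prbs", reward=1.0, next_state="ok")
print(q["high-load"]["add-prbs"])   # 0.1
```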
[0070] Simultaneously with the training host 514 performing online learning, the training host 506 may perform offline learning and may update AI/ML models. The data used to update the models may be different, as disclosed herein.
[0071] The following describes further example embodiments. Example 1 includes a deployment scenario for online reinforcement learning in which the training host for online learning and the AI/ML inference host reside in the Near-RT RIC, and the training host for offline learning and the AI/ML model repository are in the SMO/Non-RT RIC.
[0072] In Example 2, the subject matter of Example 1 includes where the functionality of the training host in the SMO/Non-RT RIC includes the following: (1) collecting learning data for offline reinforcement learning from E2 nodes over the O1 interface; (2) training the initial model based on the offline training; and (3) transferring the offline trained model to the model repository.
[0073] In Example 3, the subject matter of Examples 1 and 2 includes where the functionality of the model repository in the SMO/Non-RT RIC includes the following: (1) storing trained models; (2) sending out model download notifications to the training host in the Near-RT RIC, and sending the model to that training host; (3) receiving model download requests from the training host in the Near-RT RIC, and sending the model to that training host; and (4) receiving the model upload request from the training host in the Near-RT RIC, and receiving the updated model from the training host.
[0074] In Example 4, the subject matter of Examples 1-3 includes where the model download and upload are between the model repository in the SMO/Non-RT RIC and the ML training host in the Near-RT RIC. In one embodiment, the interactions are over the A1 interface, where the request and notification for model download and upload are part of the A1-ML service. In another embodiment, the interactions are over the O1 interface.
[0075] In Example 5, the subject matter of Examples 1-4 includes where the training host in the Near-RT RIC is configured to perform the following: (1) collect part of the learning data for online reinforcement learning from E2 nodes over the E2 interface; (2) collect part of the learning data for online reinforcement learning from the ML inference host over the Near-RT RIC's internal API; (3) update the model based on the online training data; (4) deploy the AI/ML model to the inference host; (5) receive a model download notification from the model repository and receive the model from the model repository; (6) send out a model download request to the model repository and, in response, receive the model from the model repository; (7) send out a model upload request to the model repository and send the model to the model repository; and (8) receive AI/ML performance feedback from the ML inference host. [0076] In Example 6, the subject matter of Examples 1-5 includes where the ML inference host in the Near-RT RIC is configured to perform the following: (1) collect data from E2 nodes over the E2 interface for inference; (2) infer E2 control using the model deployed by the training host in the Near-RT RIC; (3) enforce the control actions/guidance via the E2 interface; and (4) send performance feedback and training data for online learning to the training host in the Near-RT RIC. [0077] Example 7 includes a method for initial offline training including the following:
(1) Step 1: collecting training data from the E2 nodes over the O1 interface to the SMO/Non-RT RIC; (2) Step 2: the training host inside the SMO/Non-RT RIC performing offline learning; (3) Step 3: transferring the offline-trained model to the model repository; (4) Step 4: the model repository sending a model download notification to the training host in the Near-RT RIC; and (5) Step 5: downloading the initial model to the training host in the Near-RT RIC.
[0078] Example 8 includes a method for online reinforcement learning including: (1) Step 1: the training host in the Near-RT RIC deploying the AI/ML model to the inference host; (2) Step 2: the inference host collecting inference data from E2 nodes over the E2 interface; (3) Step 3: the ML inference host performing inference, generating decision-making policies, using the deployed ML model; (4) Step 4: the ML inference host enforcing E2 control actions/guidance via the E2 interface; (5) Step 5: the ML inference host providing online training data to the training host in the Near-RT RIC; (6) Step 6: the training host in the Near-RT RIC collecting online training data over the E2 interface from E2 nodes; (7) Step 7: the training host in the Near-RT RIC performing online reinforcement learning and updating the AI/ML model; and (8) Step 8: the training host in the Near-RT RIC deploying the updated model to the ML inference host.
[0079] Example 9 includes a method for uploading an updated model to the repository, the method including the following: (1) Step 1: the training host sending a model upload request to the model repository; and (2) Step 2: the training host in the Near-RT RIC uploading the updated model to the repository. [0080] Example 10 includes a method for a backup model download from the repository, the method including the following operations: (1) Step 1: the ML inference host providing performance feedback to the training host in the Near-RT RIC; (2) Step 2: the training host in the Near-RT RIC detecting performance degradation, which may be greater than average or severe, based on the feedback from the inference host; (3) Step 3: the training host sending a model download request to the model repository; and (4) Step 4: the backup model (previously well-performing model) being downloaded from the repository to the training host in the Near-RT RIC.
[0081] REFERENCES
[0082] [R04] 3GPP TS 36.401 v15.1.0 (2019-01-09).
[0083] [R05] 3GPP TS 36.420 v15.2.0 (2020-01-09).
[0084] [R06] 3GPP TS 38.300 v16.0.0 (2020-01-08).
[0085] [R07] 3GPP TS 38.401 v16.0.0 (2020-01-09).
[0086] [R08] 3GPP TS 38.420 v15.2.0 (2019-01-08).
[0087] [R09] 3GPP TS 38.460 v16.0.0 (2020-01-09).
[0088] [R10] 3GPP TS 38.470 v16.0.0 (2020-01-09).
[0089] [R12] O-RAN Alliance Working Group 1, O-RAN Operations and Maintenance Architecture Specification, version 2.0 (Dec 2019) (“O-RAN-WG1.OAM-Architecture-v02.00”).
[0090] [R13] O-RAN Alliance Working Group 1, O-RAN Operations and Maintenance Interface Specification, version 2.0 (Dec 2019) (“O-RAN-WG1.O1-Interface-v02.00”).
[0091] [R14] O-RAN Alliance Working Group 2, O-RAN A1 interface: General Aspects and Principles Specification, version 1.0 (Oct 2019) (“ORAN-WG2.A1.GA&P-v01.00”).
[0092] [R15] O-RAN Alliance Working Group 3, Near-Real-time RAN Intelligent Controller Architecture & E2 General Aspects and Principles (“ORAN-WG3.E2GAP.0-v0.1”).
[0093] [R16] O-RAN Alliance Working Group 4, O-RAN Fronthaul Management Plane Specification, version 2.0 (July 2019) (“ORAN-WG4.MP.0-v02.00.00”).
[0094] [R17] O-RAN Alliance Working Group (WG) 4, O-RAN Fronthaul Control, User and Synchronization Plane Specification, version 2.0 (July 2019) (“ORAN-WG4.CUS.0-v02.00”).
[0095] [R18] O-RAN WG1, “O-RAN Architecture Description”.
[0096] [R19] O-RAN WG2, “AI/ML Workflow Description and Requirements”.
TERMINOLOGY
[0097] The term “application” may refer to a complete and deployable package or environment for achieving a certain function in an operational environment. The term “AI/ML application” or the like may be an application that contains some AI/ML models and application-level descriptions.
[0098] The term “machine learning” or “ML” refers to the use of computer systems implementing algorithms and/or statistical models to perform specific task(s) without using explicit instructions, but instead relying on patterns and inferences. ML algorithms build or estimate mathematical model(s) (referred to as “ML models” or the like) based on sample data (referred to as “training data,” “model training information,” or the like) in order to make predictions or decisions without being explicitly programmed to perform such tasks. Generally, an ML algorithm is a computer program that learns from experience with respect to some task and some performance measure, and an ML model may be any object or data structure created after an ML algorithm is trained with one or more training datasets. After training, an ML model may be used to make predictions on new datasets. Although the term “ML algorithm” refers to different concepts than the term “ML model,” these terms as discussed herein may be used interchangeably for the purposes of the present disclosure.
[0099] The term “machine learning model,” “ML model,” or the like may also refer to ML methods and concepts used by an ML-assisted solution. An “ML-assisted solution” is a solution that addresses a specific use case using ML algorithms during operation. ML models include supervised learning (e.g., linear regression, k-nearest neighbor (KNN), decision tree algorithms, support vector machines, Bayesian algorithms, ensemble algorithms, etc.), unsupervised learning (e.g., K-means clustering, principal component analysis (PCA), etc.), reinforcement learning (e.g., Q-learning, multi-armed bandit learning, deep RL, etc.), neural networks, and the like. Depending on the implementation, a specific ML model could have many sub-models as components, and the ML model may train all sub-models together. Separately trained ML models can also be chained together in an ML pipeline during inference. An “ML pipeline” is a set of functionalities, functions, or functional entities specific for an ML-assisted solution; an ML pipeline may include one or several data sources in a data pipeline, a model training pipeline, a model evaluation pipeline, and an actor. The “actor” is an entity that hosts an ML-assisted solution using the output of the ML model inference. The term “ML training host” refers to an entity, such as a network function, that hosts the training of the model. The term “ML inference host” refers to an entity, such as a network function, that hosts the model during inference mode (which includes both the model execution as well as any online learning, if applicable). The ML host informs the actor about the output of the ML algorithm, and the actor takes a decision for an action (an “action” is performed by an actor as a result of the output of an ML-assisted solution). The term “model inference information” refers to information used as an input to the ML model for determining inference(s); the data used to train an ML model and the data used to determine inferences may overlap; however, “training data” and “inference data” refer to different concepts. [00100] Although an aspect has been described with reference to specific exemplary aspects, it will be evident that various modifications and changes may be made to these aspects without departing from the broader scope of the present disclosure. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense. This Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various aspects is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.

Claims

CLAIMS What is claimed is:
1. An apparatus for a near real-time (RT) radio access network intelligence controller (RIC) (Near-RT RIC) in an open radio access network (O-RAN), the apparatus comprising: processing circuitry configured to: receive an artificial intelligence (AI)/machine learning (ML) model; receive training data from E2 nodes over an E2 interface and from an ML inference host residing in the Near-RT RIC; and update the AI/ML model based on the training data.
2. The apparatus of claim 1 wherein the processing circuitry is an ML training host.
3. The apparatus of claim 1, wherein the processing circuitry is first processing circuitry, and wherein the apparatus further comprises: second processing circuitry configured to: receive data over the E2 interface; enforce control actions and guidance via the E2 interface; and send performance feedback and training data to the ML training host.
4. The apparatus of claim 3 wherein the second processing circuitry is an ML inference host and wherein the processing circuitry is further configured to: infer E2 controls using another AI/ML model deployed to the ML inference host.
5. The apparatus of claim 3 wherein the first processing circuitry is an ML training host and the second processing circuitry is an ML inference host, and wherein the second processing circuitry is further configured to: send training data and performance feedback data to the ML training host of the near-RT RIC.
6. The apparatus of claim 1 wherein the processing circuitry is an ML training host and wherein the processing circuitry is further configured to: detect severe performance degradation based on feedback data from an inference host residing in the near-RT RIC; send a model download request to a model repository, wherein the model repository resides in a non-RT RIC; and receive a backup model from the repository.
7. The apparatus of claim 1 wherein the training data from the ML inference host is received over a near-RT RIC internal application program interface (API), and wherein the processing circuitry is further configured to: deploy the updated AI/ML model to the ML inference host.
8. The apparatus of claim 1 wherein the processing circuitry is further configured to: receive a model download notification from a model repository, the model repository residing in a non-RT RIC; receive the AI/ML model from the model repository; send a model download request to the model repository; receive another AI/ML model from the model repository; send an AI/ML model upload request to the model repository; and send the updated AI/ML model to the model repository.
9. The apparatus of claim 8 wherein the model download notification, the AI/ML model, the model download request, the another AI/ML model, the AI/ML model upload request, and the updated AI/ML model are sent or received over an A1 interface or an O1 interface, and wherein the download request and upload request are part of an A1-ML service.
10. The apparatus of claim 1 wherein the processing circuitry is first processing circuitry, the training data is first training data, the AI/ML model is a first AI/ML model, and the apparatus is a first apparatus and further comprising: a second apparatus for a non-RT RIC, the second apparatus comprising: second processing circuitry configured to: receive a second AI/ML model; receive second training data; and update the second AI/ML model based on the second training data.
11. The apparatus of claim 10 wherein the second processing circuitry is an ML training module.
12. The apparatus of claim 11 wherein the second training data is received from the E2 nodes over an O1 interface and wherein the second processing circuitry is further configured to: train an initial AI/ML model based on the second training data; and deploy the trained initial AI/ML model to the ML inference host.
13. The apparatus of claim 1 wherein the processing circuitry is first processing circuitry, the training data is first training data, the AI/ML model is a first AI/ML model, and the apparatus is a first apparatus and further comprising: a second apparatus for a non-RT RIC, the second apparatus comprising: second processing circuitry configured to: receive second training data from the E2 nodes over an O1 interface; perform offline learning to train an initial AI/ML model; transfer the initial AI/ML model to a model repository of the non-RT RIC; send a model download notification to an ML training host of the near-RT RIC; and download the initial AI/ML model to the ML training host of the near-RT RIC, wherein the second processing circuitry is an ML training host of the non-RT RIC and the model repository of the non-RT RIC.
14. An apparatus for an open radio access network (O-RAN), the apparatus comprising a non-real-time (RT) radio access network intelligence controller (RIC), the non-RT RIC comprising first processing circuitry, and a near-RT RIC, the near-RT RIC comprising second processing circuitry, wherein the first processing circuitry is configured to: perform first training of a first artificial intelligence (AI)/machine learning (ML) model, and wherein the first processing circuitry is a training host of the non-RT RIC; and wherein the second processing circuitry is configured to: perform second training on a second AI/ML model, and wherein the second processing circuitry is a training host of the near-RT RIC.
15. The apparatus of claim 14, wherein the first processing circuitry is further configured to: receive first training data, and wherein the first training is based on the first training data, and wherein the second processing circuitry is further configured to: receive the second AI/ML model from a model repository of the non-RT RIC; and receive second training data, and wherein the second training is based on the second training data.
16. A non-transitory computer-readable storage medium that stores instructions for execution by one or more processors of a near real-time (RT) radio access network intelligence controller (RIC) in an Open RAN (O-RAN) network, the instructions to configure the one or more processors to perform the following operations: receive an artificial intelligence (AI)/machine learning (ML) model; receive training data from E2 nodes over an E2 interface and from an ML inference host residing in the near-RT RIC; and update the AI/ML model based on the training data.
17. The non-transitory computer-readable storage medium of claim 16 wherein the near-RT RIC comprises an ML training host.
18. The non-transitory computer-readable storage medium of claim 16 wherein the operations further comprise: at the ML inference host, receive data over an E2 interface; at the ML inference host, enforce control actions and guidance via the E2 interface; and at the ML inference host, send performance feedback and training data to the ML training host.
19. The non-transitory computer-readable storage medium of claim 18 wherein the operations further comprise: at the ML inference host, infer E2 controls using another AI/ML model deployed to the ML inference host.
20. The non-transitory computer-readable storage medium of claim 18 wherein the operations further comprise: at the ML inference host, send training data and performance feedback data to the ML training host of the near-RT RIC.
PCT/US2021/050379 2020-09-17 2021-09-15 Online reinforcement learning WO2022060777A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063079876P 2020-09-17 2020-09-17
US63/079,876 2020-09-17

Publications (1)

Publication Number Publication Date
WO2022060777A1 true WO2022060777A1 (en) 2022-03-24

Family ID=80775567

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/050379 WO2022060777A1 (en) 2020-09-17 2021-09-15 Online reinforcement learning

Country Status (1)

Country Link
WO (1) WO2022060777A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190380037A1 (en) * 2017-06-27 2019-12-12 Allot Communications Ltd. System, Device, and Method of Detecting, Mitigating and Isolating a Signaling Storm
WO2019183020A1 (en) * 2018-03-19 2019-09-26 Mavenir Networks, Inc. System and method for reduction in fronthaul interface bandwidth for cloud ran
CN111242304A (en) * 2020-03-05 2020-06-05 北京物资学院 Artificial intelligence model processing method and device based on federal learning in O-RAN system
CN111565418A (en) * 2020-07-13 2020-08-21 网络通信与安全紫金山实验室 O-RAN and MEC communication method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SOLMAZ NIKNAM; ABHISHEK ROY; HARPREET S. DHILLON; SUKHDEEP SINGH; RAHUL BANERJI; JEFFERY H. REED; NAVRATI SAXENA; SEUNGIL YOON: "Intelligent O-RAN for Beyond 5G and 6G Wireless Networks", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 17 May 2020 (2020-05-17), 201 Olin Library Cornell University Ithaca, NY 14853 , XP081675121 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024062717A1 (en) * 2022-09-20 2024-03-28 Kddi株式会社 Control device of radio access network
WO2024062716A1 (en) * 2022-09-20 2024-03-28 Kddi株式会社 Radio access network control device

Similar Documents

Publication Publication Date Title
US20220012645A1 (en) Federated learning in o-ran
US11917527B2 (en) Resource allocation and activation/deactivation configuration of open radio access network (O-RAN) network slice subnets
US20220014963A1 (en) Reinforcement learning for multi-access traffic management
NL2033617B1 (en) Resilient radio resource provisioning for network slicing
US20210184989A1 (en) Data-centric service-based network architecture
US20220014942A1 (en) Ml model management in o-ran
EP4214912A1 (en) Non-realtime services for ai/ml
WO2023091664A1 (en) Radio access network intelligent application manager
US10979986B1 (en) Facilitating adaptive power spectral density with chromatic spectrum optimization in fifth generation (5G) or other advanced networks
WO2022060777A1 (en) Online reinforcement learning
US20220217046A1 (en) Providing information
US20220417863A1 (en) Facilitating real-time power optimization in advanced networks
US20220368605A1 (en) Wireless multi-carrier configuration and selection
WO2022155511A1 (en) Data services for ric applications
EP4238289A1 (en) Online learning at a near-real time ric
WO2023069534A1 (en) Using ai-based models for network energy savings
US20230403606A1 (en) Managing resources in a radio access network
US20210297832A1 (en) Facilitating enablement of intelligent service aware access utilizing multiaccess edge computing in advanced networks
US11665686B2 (en) Facilitating a time-division multiplexing pattern-based solution for a dual-subscriber identity module with single radio in advanced networks
WO2023283102A1 (en) Radio resource planning and slice-aware scheduling for intelligent radio access network slicing
WO2023016653A1 (en) Method, apparatus, and computer program
EP4240050A1 (en) A1 enrichment information related functions and services in the non-real time ran intelligent controller
US20240086253A1 (en) Systems and methods for intent-based orchestration of a virtualized environment
US20230370879A1 (en) Measurement data collection to support radio access network intelligence
WO2024091862A1 (en) Artificial intelligence/machine learning (ai/ml) models for determining energy consumption in virtual network function instances

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21870097

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21870097

Country of ref document: EP

Kind code of ref document: A1