WO2023140758A1 - Reinforcement learning model for selecting a network function producer

Info

Publication number: WO2023140758A1
Application number: PCT/SE2022/050398
Authority: WO
Inventors: Athanasios KARAPANTELAKIS; Maxim TESLENKO; Lackis ELEFTHERIADIS; Alessandro Previti; Ioannis Fikouras
Original assignee: Telefonaktiebolaget Lm Ericsson (Publ)
Priority date: 2022-01-20
Filing date: 2022-04-25
Publication date: 2023-07-27

Abstract

According to an aspect, there is provided a computer-implemented method of operating a first node to determine a model for selecting a Network Function Producer, NFP, to provide a first network function, NF, instance to a first Network Function Consumer, NFC, wherein a plurality of NFPs are capable of providing the first NF instance, and the plurality of NFPs comprise one or more non-terrestrial-based NFPs and one or more terrestrial-based NFPs, wherein one or more of the NFPs have intermittent availability for providing a reliable NF instance to the first NFC. The method comprises (i) receiving (601), from a NF Repository Function, NRF, information on the plurality of NFPs, wherein the information comprises availability information for the plurality of NFPs; (ii) selecting (603), based on the received information, a first candidate NFP to provide the first NF instance; (iii) establishing (605) a connection to the first candidate NFP, and using the first NF instance provided by the first candidate NFP; (iv) monitoring (607) one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, wherein the network availability performance metrics relate to and/or are affected by the availability of a candidate NFP to provide a reliable NF instance; (v) calculating (609) a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP; (vi) training (611) a reinforcement learning, RL, model that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance, wherein the RL model is trained using information on the candidate NFP and the calculated reward; and (vii) repeating (613) steps (ii)-(v) for one or more further selected candidate NFPs to calculate respective rewards and training the RL model using information on the one or more further candidate NFPs and the respective calculated rewards.

Description

REINFORCEMENT LEARNING MODEL FOR SELECTING A NETWORK FUNCTION PRODUCER

Technical Field

This disclosure relates to selecting a Network Function Producer (NFP) for providing a first network function (NF) instance to a Network Function Consumer (NFC). In particular, this disclosure relates to a method and apparatus for determining a model for selecting a NFP when one or more NFPs capable of providing the NF instance have intermittent availability.

Background

A new generation of multi-service mobile networks provides connectivity services with different Quality of Service (QoS) demands to enterprise customers. Technologies such as network slicing and software-defined networks ensure that the operation of each service is isolated from other services running on top of the same infrastructure. A certain class of low-latency, high-availability applications, known as ultra-reliable low latency communications (URLLC), requires low latency on the network link between the mobile devices (known as User Equipments, UEs) and the endpoint they communicate with (e.g., an Internet server). This necessitates the existence of a core network close to the location of the radio base station (RBS) or RBSs where UEs consuming the URLLC connectivity service are located, to avoid the time to backhaul to a centralised core network.

In 5th Generation (5G) networks, a service-based architecture has been introduced for the core network, which is broken down into communicating services known as Network Functions (NFs). Once deployed in an actual network, these are referred to as NF Instances. These instances can be hosted in any cloud infrastructure, either closer to the edge/radio access network (RAN), or further away from the edge in a centralised cloud.

Fig. 1 illustrates a 5G system reference architecture 101 showing service-based interfaces used within the Control Plane (CP). It will be appreciated that not all types of NF instances/NF services are depicted. Service-based interfaces are represented in the format Nxyz and point-to-point interfaces in the format Nx. The reference architecture 101 shown in Fig. 1 comprises the following types of NF instance: a Network Slice Selection Function (NSSF) 102 that has a Nnssf interface, a Network Exposure Function (NEF) 103 that has a Nnef interface, a Network Repository Function (NRF) 104 that has a Nnrf interface, a Policy Control Function (PCF) 105 that has a Npcf interface, a Unified Data Management (UDM) 106 that has a Nudm interface, an Application Function (AF) 107 that has a Naf interface, an Authentication Server Function (AUSF) 108 that has a Nausf interface, an Access and Mobility Management Function (AMF) 109 that has a Namf interface, a Session Management Function (SMF) 110 that has a Nsmf interface and a Service Communication Proxy (SCP) 111 with a Nscp interface. The AMF 109 has an N1 interface to a user equipment (UE) 112, and an N2 interface to an access network (AN) 113 (which can be a radio AN, RAN). The SMF 110 has an N4 interface to a User Plane Function (UPF) 114. The interface between the (R)AN 113 and the UPF 114 is the N3 interface, and the interface between the UPF 114 and a Data Network (DN) 115 is the N6 interface.

The Network Repository Function (NRF) 104 provides a NF discovery and selection service for NF instances. In this way, any NF instance can discover and select services offered by other NF instances. Eventually the requestor or consumer of the NF instance can access the selected NF instance without having to pass through other nodes. The process is further illustrated in Fig. 2, which shows the interaction between a NRF 201, a NF node 202 that is intending to consume or use a NF instance, and a set 203 of NF nodes that are available to provide a NF instance to be used or consumed. NF node 202 is referred to as a NF Consumer (NFC) 202, and the NF nodes in the set 203 available to provide an NF instance are referred to as NF Producers (NFP). In Fig. 2 the set 203 comprises four NFPs, respectively labelled 204, 205, 206 and 207. Each NFP may provide the same type of NF instance (e.g. NSSF, NEF, PCF, etc.), different types of NF instance, or some combination thereof.

As shown in Fig. 2, the NFPs 204-207 register with the NRF 201 so that the NRF 201 is aware of which NFPs are available to provide NF instances to NFCs (signal 211). The NRF 201 is also responsible for identifying the health of registered NFPs. Each NFP 204-207 contacts the NRF 201 periodically to demonstrate that it is still functioning properly. The way this is done is by means of an “update” functionality called NFUpdate, which updates the parameters of the NF instance - also known as the profile (or NFProfile). If the NRF 201 does not receive an update for an amount of time longer than the ‘heart-beat’ interval, then it marks the NFP as suspended and it is no longer discoverable via the NRF 201. In addition, the NRF 201 can adjust the time between health checks (heartbeats) and can inform NFPs as a response to the next heartbeat.
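The heartbeat rule described above can be illustrated with a minimal sketch. The class name `NrfRegistry`, its method names and the use of plain numeric timestamps are assumptions made purely for illustration; they are not taken from 3GPP TS 29.510 or from this disclosure.

```python
from dataclasses import dataclass, field


@dataclass
class NrfRegistry:
    """Hypothetical in-memory stand-in for the NRF's registration table."""
    heartbeat_interval: float  # seconds allowed between NFUpdate messages
    last_update: dict = field(default_factory=dict)  # NFP id -> time of last NFUpdate

    def nf_update(self, nfp_id: str, now: float) -> None:
        # A registration or NFUpdate refreshes the NFP's heartbeat timestamp.
        self.last_update[nfp_id] = now

    def is_suspended(self, nfp_id: str, now: float) -> bool:
        # Suspended when no update arrived within the heartbeat interval.
        return now - self.last_update[nfp_id] > self.heartbeat_interval

    def discoverable(self, now: float) -> list:
        # Only NFPs that are not suspended are returned to consumers.
        return sorted(n for n in self.last_update if not self.is_suspended(n, now))


nrf = NrfRegistry(heartbeat_interval=10.0)
nrf.nf_update("nfp-204", now=0.0)
nrf.nf_update("nfp-205", now=0.0)
nrf.nf_update("nfp-204", now=8.0)  # nfp-204 sends a heartbeat, nfp-205 stays silent
visible = nrf.discoverable(now=12.0)
```

In this sketch an NFP that misses its heartbeat simply drops out of the discoverable list until it sends another update.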

The consumer NF (NFC) 202, for example a Policy Control Function (PCF) node, performs a discovery request 212 to the NRF 201, which in turn returns a list of candidate NFPs. The discovery request 212 contains a list of parameters that include the type or instance name of the NFP(s) to discover, as well as network slice related identifiers (e.g., Network Slice Subnet Instance - NSSI) and service parameters (e.g. a list of features to be supported). Based on the received information, the NFC 202 can select 213 a particular NFP 204-207 to provide an NF instance, and communicate with the selected NFP 204-207 to receive the NF instance.
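As a rough sketch of the discovery step, the NRF can be thought of as filtering its registered profiles against the request parameters. The function `discover`, the dictionary layout and the `features` key are hypothetical; only the general idea of matching on NF type and supported features comes from the text above.

```python
def discover(profiles, nf_type, required_features=()):
    """Return the profiles of the requested type that support every required feature."""
    wanted = set(required_features)
    return [
        p for p in profiles
        if p["nfType"] == nf_type and wanted <= set(p.get("features", ()))
    ]


# Made-up registrations held by the NRF (illustrative values only).
profiles = [
    {"nfInstanceId": "nfp-204", "nfType": "AMF", "features": ["f1", "f2"]},
    {"nfInstanceId": "nfp-205", "nfType": "UDM", "features": ["f1"]},
    {"nfInstanceId": "nfp-206", "nfType": "AMF", "features": ["f1"]},
]

candidates = discover(profiles, "AMF", required_features=["f1", "f2"])
candidate_ids = [p["nfInstanceId"] for p in candidates]
```

The NFC would then choose one entry from the returned candidate list.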

Examples of NFs using a NRF 201 to discover and use services of other NFs include:

The Network Exposure Function (NEF) 103 making use of services from a mobility management function (AMF) 109 and Unified Data Management (UDM) 106 to expose mobility services to third parties (for example UE loss of connectivity, UE reachability, etc.).

The User Plane Function (UPF) 114 making use of services from the Session Management Function (SMF) 110, which provides rules for filtering traffic. These rules contain:

o means for filtering specific applications, using service data flow (SDF) filters, or 3-tuple <protocol, server IP address, port number> Packet Filter Description (PFD) filters;

o QoS information such as guaranteed bitrate, priority, latency ceiling, acceptable packet drop rate, etc.

For URLLC applications of global or wide-area reachability, the cost of maintaining NFs in multiple edge clouds (to reduce latency by negating the need for traversing a backhaul transport connection) is large, as instances of these NFs need to be replicated across the radio edge.

It has been suggested that, in such cases, it may be beneficial to use a satellite network to implement NF instances, which combines the benefits of a low-latency edge cloud with the advantage of global reachability. Some types of satellite have some compute capacity/capability to store NF instances (also known as a “regenerative” arrangement), while other types act as a traffic relay rather than implementing NF instances directly (this arrangement is also known as “bent pipe”). This disclosure relates to the regenerative type of satellite, which many low-earth orbit (LEO) satellites are.

Summary

However, identifying whether a satellite network or particular satellite could function as a NFP or not is a complex problem as it depends on multiple objective factors. Coverage of the satellite network can be global in outdoor spaces and may have low latency in lower orbits, but it is also typically limited in terms of compute power and/or bandwidth/throughput, it can be costly to the network operator when compared to a terrestrial data center, and it may also have intermittent availability to a potential NFC (i.e. there will only be coverage as long as a satellite is in range). On the other hand, when selecting an NFP, it would be useful for an NFC to have some future knowledge about the availability/reliability of this NFP, so the NFC does not have to keep (unnecessarily) changing providers of the NFP(s).

Briefly, this disclosure considers NFPs with temporal availability for providing NF instances, such as those NFPs based in satellite networks, or other non-terrestrial-based NFPs. Embodiments provide a method that uses reinforcement learning (RL) techniques, such as deep RL techniques, to create a model that can be used to select NFPs to provide a NF instance to an NFC. Embodiments provide for the model to learn over time to make a selection decision using one or more objective factors, such as the availability of an NFP, the cost and the quality of service as perceived by the NFC.

The techniques described herein allow for satellite networks (or other non-terrestrial-based devices) to play the role of compute platforms, hosting NFs for mobile telecommunications networks, thus opening new revenue streams and opportunities for satellite network vendors and operators of mobile telecommunication networks.

Beyond satellite networks, this disclosure allows for opportunistic use of temporally available compute resources (e.g., in data centers powered by renewable energy sources), thus leading to a more sustainable operation of the telecommunication network.

According to a first aspect, there is provided a computer-implemented method of operating a first node to determine a model for selecting a NFP to provide a first NF instance to a first NFC. A plurality of NFPs are capable of providing the first NF instance, and the plurality of NFPs comprise one or more non-terrestrial-based NFPs and one or more terrestrial-based NFPs. One or more of the NFPs have intermittent availability for providing a reliable NF instance to the first NFC. The method comprises: (i) receiving, from a NRF, information on the plurality of NFPs, wherein the information comprises availability information for the plurality of NFPs; (ii) selecting, based on the received information, a first candidate NFP to provide the first NF instance; (iii) establishing a connection to the first candidate NFP, and using the first NF instance provided by the first candidate NFP; (iv) monitoring one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, with the network availability performance metrics relating to and/or being affected by the availability of a candidate NFP to provide a reliable NF instance; (v) calculating a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP; (vi) training a reinforcement learning, RL, model that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance, with the RL model being trained using information on the candidate NFP and the calculated reward; and (vii) repeating steps (ii)-(v) for one or more further selected candidate NFPs to calculate respective rewards and training the RL model using information on the one or more further candidate NFPs and the respective calculated rewards.

According to a second aspect, there is provided a computer program product comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method according to the first aspect or any embodiment thereof.

According to a third aspect, there is provided a first node configured to determine a model for selecting a NFP to provide a first NF instance to a first NFC. A plurality of NFPs are capable of providing the first NF instance, and the plurality of NFPs comprise one or more non-terrestrial-based NFPs and one or more terrestrial-based NFPs. One or more of the NFPs have intermittent availability for providing a reliable NF instance to the first NFC. The first node is configured to: (i) receive, from a NRF, information on the plurality of NFPs, wherein the information comprises availability information for the plurality of NFPs; (ii) select, based on the received information, a first candidate NFP to provide the first NF instance; (iii) establish a connection to the first candidate NFP, and use the first NF instance provided by the first candidate NFP; (iv) monitor one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, with the network availability performance metrics relating to and/or being affected by the availability of a candidate NFP to provide a reliable NF instance; (v) calculate a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP; (vi) train a reinforcement learning, RL, model that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance, with the RL model being trained using information on the candidate NFP and the calculated reward; and (vii) repeat operations (ii)-(v) for one or more further selected candidate NFPs to calculate respective rewards and train the RL model using information on the one or more further candidate NFPs and the respective calculated rewards.

According to a fourth aspect, there is provided a first node configured to determine a model for selecting a NFP to provide a first NF instance to a first NFC. A plurality of NFPs are capable of providing the first NF instance, and the plurality of NFPs comprise one or more non-terrestrial-based NFPs and one or more terrestrial-based NFPs. One or more of the NFPs have intermittent availability for providing a reliable NF instance to the first NFC. The first node comprises a processor and a memory, the memory containing instructions executable by said processor whereby said first node is operative to: (i) receive, from a NRF, information on the plurality of NFPs, wherein the information comprises availability information for the plurality of NFPs; (ii) select, based on the received information, a first candidate NFP to provide the first NF instance; (iii) establish a connection to the first candidate NFP, and use the first NF instance provided by the first candidate NFP; (iv) monitor one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, with the network availability performance metrics relating to and/or being affected by the availability of a candidate NFP to provide a reliable NF instance; (v) calculate a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP; (vi) train a reinforcement learning, RL, model that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance, with the RL model being trained using information on the candidate NFP and the calculated reward; and (vii) repeat operations (ii)-(v) for one or more further selected candidate NFPs to calculate respective rewards and train the RL model using information on the one or more further candidate NFPs and the respective calculated rewards.

Other aspects and embodiments are described further below.

Brief Description of the Drawings

Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings, in which:

Fig. 1 illustrates a 5G system reference architecture showing service-based interfaces used within the control plane;

Fig. 2 is an illustration of the interactions between an NRF, an NFC, and a plurality of NFPs;

Fig. 3 is a simplified illustration of a terrestrial-based NFC, terrestrial-based NFPs and non-terrestrial-based NFPs;

Fig. 4 is a block diagram illustrating components of a system that can implement the techniques described herein;

Fig. 5 is a signalling diagram illustrating a process for training a model according to the techniques described herein;

Fig. 6 is a flow chart illustrating a method of operating a first node according to some embodiments;

Fig. 7 is a simplified block diagram of a first node according to some embodiments; and

Fig. 8 is a block diagram illustrating a virtualization environment in which functions implemented by some embodiments may be virtualized.

Detailed Description

Some of the embodiments contemplated herein will now be described more fully with reference to the accompanying drawings. Embodiments are provided by way of example to convey the scope of the subject matter to those skilled in the art.

Fig. 3 is a simplified (and not-to-scale) illustration of a scenario to which the techniques described herein can be applied. In particular, Fig. 3 shows a scenario in which a terrestrial-based NFC is to be provided an NF instance by a NFP that may be terrestrial-based or non-terrestrial-based. Fig. 3 shows the Earth 301, a terrestrial-based NFC 302 (i.e. an NFC 302 that is installed at a ground site, such as, e.g. a building), a plurality of non-terrestrial-based NFPs 303 (individually labelled 303a-303h) and a plurality of terrestrial-based NFPs 304 (two terrestrial-based NFPs are shown, labelled 304a and 304b). The non-terrestrial-based NFPs 303 may be implemented or provided by satellites in Earth’s orbit - typically a low Earth orbit. As such, individual non-terrestrial-based NFPs 303 will have intermittent availability for providing the NF instance to the NFC 302. For example, in the snapshot provided by Fig. 3, non-terrestrial-based NFPs 303a, 303b and 303h are able to communicate with NFC 302, terrestrial-based NFPs 304a and 304b are able to communicate with NFC 302, but non-terrestrial-based NFPs 303c-g are not able to communicate with the NFC 302 due to their orbital positions. However, the positions of the non-terrestrial-based NFPs 303 change over time due to their orbits, and so the particular non-terrestrial-based NFPs 303 that are ‘visible’ to the NFC 302 (i.e. able to communicate with the NFC 302) change frequently, e.g. over the course of a few minutes or tens of minutes.

For simplicity Fig. 3 shows the non-terrestrial-based NFPs 303a, 303b and 303h communicating directly with the NFC 302, but it will be appreciated that in practice the non-terrestrial-based NFPs 303 and the NFC 302 may communicate via a separately located ground station that manages the communication link between the Earth 301 and the satellites. However, the same intermittent availability problem occurs since the satellite-based NFPs 303 are only able to intermittently communicate with the ground station.

While Fig. 3 illustrates a particularly important scenario to which the techniques described herein can be applied (i.e. a terrestrial-based NFC 302) - and to which the techniques are described below in more detail - it will be appreciated that the techniques can also be applied to a scenario where the NFC 302 is non-terrestrial-based (e.g. in a satellite). In this scenario, it may be that the availability of the non-terrestrial-based NFPs to the NFC is relatively static (e.g. if they follow a similar orbital path) but the availability of the terrestrial-based NFPs will vary more significantly depending on the orbital position of the NFC 302.

It will be appreciated that although the NFC 302 is to consume a NF instance provided by one of the NFPs 303, 304, the NFC 302 may itself provide a NF instance to other nodes, and thus the NFC 302 may also be a NFP for other nodes. Likewise, any one or more of the NFPs 303, 304 may consume a NF instance from another node, and thus any of the NFPs 303, 304 may also be a NFC towards other nodes.

As noted above, this disclosure considers NFPs 303, 304 with temporal availability for providing NF instances, such as those NFPs 303 based in satellite networks, or other non-terrestrial-based NFPs 303. Embodiments provide a method that uses reinforcement learning (RL) techniques, such as deep RL techniques, to create a model that can be used to select NFPs 303, 304 to provide a NF instance to an NFC 302. Embodiments provide for the model to learn over time to make a selection decision using one or more objective factors, such as the availability of an NFP 303, 304, the cost and the quality of service as perceived by the NFC 302. In effect, the model learns to predict the availability of NFPs to provide a reliable NF instance to an NFC.

The model used to select an NFP to provide an NF instance to an NFC can be generated and/or used by the NFC itself to select an NFP. Alternatively, the model can be generated and/or used by an intermediate node, such as a Service Communication Proxy (SCP). In particular, a SCP can mediate the communication between a NFP and a NFC, and the SCP can be used to either forward selection of a NFC to a NFP, or select a NFP on behalf of the NFC. A SCP has the benefit of offering functionality such as load balancing and failover, and it can also function as an interoperability bridge between different vendors (e.g. in case NFPs are hosted by another vendor). An SCP can also enforce signalling policies and monitor NF use.

Fig. 4 is a block diagram illustrating components of a system 401 that can implement the techniques described herein. The system 401 comprises a NFC 402 that is terrestrial-based. The system 401 comprises several non-terrestrial-based NFPs 403a, 403b, and several terrestrial-based NFPs 404a, 404b. In this example, the NFC 402 and the terrestrial-based NFPs 404a, 404b are considered to be part of an ‘edge cloud’ of the communication network and the non-terrestrial-based NFPs 403a, 403b are considered to be part of a ‘satellite cloud’. Two RAN nodes 405a, 405b of the communication network are also shown, and these have a backhaul connection to the NFC 402/NFPs 403, 404. Possible backhaul connections are shown by the dashed arrows, and the specific backhaul connection used can depend on the locations of the NFs. The system 401 also comprises a NRF 406. As noted above, in some embodiments a SCP can be used as an intermediary between the NFC 402 and the NFPs 403, 404. The SCP is not shown in Fig. 4, but it could be located in either the edge cloud or the satellite cloud.

The NRF 406 operates to register and update availability information of NFPs 403, 404, either periodically on the initiative of the NFP itself, or by request. The availability information for an NFP 403, 404 can be embedded in a profile, e.g. in an NFProfile. The NFProfile is defined in, for example, Table 6.1.6.2.2-1 of 3GPP TS 29.510 V15.1.0 (2018-10), Release 15, “5G System; Network function repository services; Stage 3”. The registration/update of NFPs 403, 404 with the NRF 406 is shown by the solid arrows in Fig. 4.

The availability information for a NF instance provided by a NFP 403, 404 can comprise any of: a NF type (indicating the type of NF, e.g., AMF, UDM, etc. as described in Table 6.1.6.3.3-1 of 3GPP TS 29.510 cited above), and/or a NF instance identifier (e.g. a unique identity of a NF instance); for each NF instance identifier, the information can comprise a capacity indicator (relative to other NFPs), load information (e.g. as a percentage), and/or locality (e.g. geographical location of the data center in which the NFP is implemented/located); and for each NF instance identifier, a priority of the NF instance (e.g. in terms of preference over other NFPs).
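The availability information listed above could be held in a simple record such as the one sketched below. The class name `NfAvailabilityInfo` and the attribute names are illustrative assumptions and are not the attribute names of the NFProfile in 3GPP TS 29.510; the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NfAvailabilityInfo:
    """Hypothetical container for the availability information of one NF instance."""
    nf_type: str                     # e.g. "AMF", "UDM"
    nf_instance_id: str              # unique identity of the NF instance
    capacity: Optional[int] = None   # capacity indicator relative to other NFPs
    load: Optional[float] = None     # current load as a percentage
    locality: Optional[str] = None   # e.g. geographical location of the data center
    priority: Optional[int] = None   # preference relative to other NFPs


info = NfAvailabilityInfo(
    nf_type="UDM",
    nf_instance_id="nfp-403a",
    capacity=80,
    load=35.0,
    locality="satellite-cloud",
    priority=2,
)
```

All fields other than the type and identifier are optional here, reflecting that the text describes them as information that "can" be included.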

A NFC 402 or SCP can request not only the latest record of these parameters (the availability information) for an NFP 403, 404, as received by the NRF 406 in an update or register message from the NFP 403, 404, but also a time series of performance data to enable the NFC 402 or SCP to make a more informed decision on which NFP 403, 404 to use. In addition to load and capacity information, the information can also include information about the availability and latency of the connection to the NFP. In this discussion, the new parameter/availability information is referred to as metricValues.

The NRF 406 is a logical node and can be in the core network of the operator’s network, or it can also function in a distributed manner. For example the NRF 406 can manage NFPs in one (part of the) edge cloud, or in a particular geographical region.

The non-terrestrial-based NFPs 403 and terrestrial-based NFPs 404 register to the NRF 406 and provide information about the type of network functions they support and their availability, as discussed above with reference to the NRF 406.

The NFC 402 subscribes with the NRF 406 to updates of NFs it is interested in, and receives updates on the status of network functions from the NRF 406. The updates can be received when a trigger is met, as specified in the standards (e.g. in the other “Release 15” 3GPP standard documents), or the updates can be received as a time series of data, e.g. as the metricValues parameter/information. The subscription by the NFC 402 to NFP information updates from the NRF 406 is shown by the dotted arrow in Fig. 4.

The signalling diagram in Fig. 5 illustrates a process for training a model according to embodiments of the techniques described herein. Fig. 5 shows the signalling between a NFC 501, a NRF 502, a NFP 503 and a node 504 referred to as an ‘Experience Repository’ or ‘Experience_Repo’.

In these embodiments, the NFC 501 (and each other NFC) is equipped with an intelligent agent that executes a reinforcement learning (RL) algorithm. In alternative embodiments where a SCP determines and/or uses the model, the SCP can be provided with the intelligent agent for executing the algorithm. In general, in RL an agent, when given a state of an environment, takes actions in that environment and receives a reward, as well as receiving a new state of the environment as a result of the action taken. The goal of the agent is to learn the optimal policy over time, i.e., to be able to take, for any state of the environment, the action yielding the highest reward. This method of learning is suitable for NFCs, as it does not require any training data in advance.

The model learning can be achieved in the agent by means of a neural network. Both value-learning and policy-learning approaches are applicable for training the model. Value-learning can include techniques such as Deep Q-Network (DQN) or DQN derivatives such as Double-DQN, or Deep Recurrent Q-Network algorithms. Policy-learning can include techniques such as actor-critic type approaches like Deep Deterministic Policy Gradient (DDPG), Asynchronous Advantage Actor-Critic (A3C), Generalised Advantage Estimation (GAE) and Q-prop.

In the following, a value-learning approach is used, where the neural network is trained to indicate Q-Values for actions, with the highest Q-Value belonging to the action that is predicted to yield the highest reward. The Experience_Repo 504 is a (global) database which can be used by agents to exchange experiences, i.e. training data for their neural networks. This training data repository 504 can be implemented using a centralised database (e.g., some form of relational database), or a distributed database such as a ledger (e.g., a blockchain). The latter is computationally heavier but is better suited to scenarios where different NFCs may belong to different administrative domains and do not necessarily trust each other (e.g. in a multi-vendor scenario where NFCs belong to different mobile network operators).
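The two ideas in the preceding paragraph, picking the action with the highest Q-Value and sharing experiences through a repository, can be sketched as follows. The neural network is not modelled; the Q-Values are made-up numbers standing in for its output, and `Experience`, `ExperienceRepo` and `best_action` are hypothetical names for a simple centralised in-memory variant.

```python
from collections import namedtuple

# One training sample exchanged between agents (field names are assumptions).
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state"])


class ExperienceRepo:
    """Hypothetical centralised stand-in for the Experience_Repo."""

    def __init__(self):
        self._records = []

    def publish(self, experience):
        # An agent shares one experience with all other agents.
        self._records.append(experience)

    def fetch_all(self):
        # Agents read back a copy of the shared training data.
        return list(self._records)


def best_action(q_values):
    """Return the index of the action with the highest Q-Value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


q_values = [0.12, 0.80, 0.35]   # made-up outputs, one per candidate NFP
chosen = best_action(q_values)  # the second candidate has the highest Q-Value

repo = ExperienceRepo()
repo.publish(Experience(state=(0.4, 0.9), action=chosen, reward=0.7, next_state=(0.5, 0.8)))
shared = repo.fetch_all()
```

A ledger-based repository would replace the in-memory list with a distributed store, but the publish and fetch roles would stay the same.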

The process in Fig. 5 is split into three main sections: a bootstrapping/setup phase 511, a training phase 512 and an operation phase 513.

In the bootstrapping/setup phase 511, the process starts by the NFC 501 subscribing to the NRF 502 for a specific type of NF. This is shown in Fig. 5 by the NFC 501 sending a NFStatusSubscribe signal 515 to the NRF 502. For example, the NFC 501 can be a NEF subscribing to updates of status of NFInstances of types UDM and AMF. The NRF 502 activates the subscription for the NFC 501 and sends a notification 516 indicating this to the NFC 501.

In the training phase 512, when the NRF 502 detects a change in the status of the NFP 503 hosting the NFInstances subscribed to by the NFC 501, the NRF 502 notifies the NFC 501. This is shown by NFUpdate signal 521. The “status” of an NFInstance is encapsulated in an NFProfile (for example see Section 6.1.6.2.2-1 of 3GPP TS 29.510 cited above), but the present techniques make use of any of: the identity of the NFInstance, its type, capacity, load, as well as a set of metric values (metricValues) as described above (e.g. latency, jitter, packet drops and/or availability).

The NFC 501 stores the received update information (step 522).

At step 523, the agent in the NFC 501 takes an action (a_nfc) based on the aggregate “status” of all NFInstances that form the state of the RL environment. The action (a_nfc) is a choice of an NF instance (NFInstance) to use, for the next t number of minutes, where t is a period referred to as the observation period. The NF instance to use is selected from a list of NFPs that can provide the required type of NF instance.

The action selection in step 523 can be performed according to a selection policy. For example, early in the process, when the neural network of the agent is not sufficiently trained, the choice of an NFP 503 can be random, or can be based on a priority flag of the NFUpdate received in step 521 (as is done conventionally). It is only after the neural network is sufficiently trained that its predictions can be relied on.
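One possible form of such a selection policy is sketched below. The warm-up threshold `warmup_steps`, the function name `select_nfp` and the convention that a lower priority value means a more preferred NFP are all assumptions made for the example; the disclosure does not specify how "sufficiently trained" is decided.

```python
import random


def select_nfp(q_values, priorities, training_steps, warmup_steps,
               use_priority=True, rng=None):
    """Pick the index of the NFP to use for the next observation period."""
    n = len(q_values)
    if training_steps < warmup_steps:
        # Network not yet trusted: fall back to priority or a random choice.
        if use_priority:
            # Assumption: the lowest priority value is the most preferred NFP.
            return min(range(n), key=lambda i: priorities[i])
        return (rng or random).randrange(n)
    # Network considered trained: follow its highest Q-Value.
    return max(range(n), key=lambda i: q_values[i])


q_values = [0.2, 0.9, 0.4]   # made-up model outputs for three candidate NFPs
priorities = [3, 2, 1]       # made-up priority flags from the NFUpdate

early_choice = select_nfp(q_values, priorities, training_steps=10, warmup_steps=100)
late_choice = select_nfp(q_values, priorities, training_steps=500, warmup_steps=100)
random_choice = select_nfp(q_values, priorities, training_steps=10, warmup_steps=100,
                           use_priority=False, rng=random.Random(0))
```

A gradual hand-over, such as an epsilon-greedy schedule, would be an alternative to the hard threshold used here.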

The NFC 501 connects to the selected NFP 503 (as shown by signals 524) and uses the NF instance provided by the NFP 503. During t, the NFC 501 collects performance information about the NFP 503 and, after t elapses, uses this data to calculate a reward (step 525). The performance information can include network availability performance metrics that relate to and/or are affected by the availability of the NFP 503 to provide a reliable NF instance to the NFC 501. The network availability performance metrics can include any of latency, jitter, packet drop rate, and capacity of the NFP 503 to provide NF instances.

The reward calculated in step 525 is a function that returns a scalar (e.g. a floating-point number between 0 and 1), and its calculation can be based on several factors. The factors can include, for example:

the number of violations of a network metric in the observation period (e.g. the number of times that latency, jitter and/or packet drops were measured to be above a threshold, or availability and/or capacity below a threshold);

a cost associated with the action relating to, for example, service costs paid to the edge cloud or satellite provider. The cost could also factor in energy costs that contribute to sustainable development of the industry - for example, satellites have “free” energy from the sun, whereas cellular networks are powered by the electrical grid; and

the availability of the NFP 503, which is not always guaranteed. The availability can be indicated by a percentage that can be normalised from 0 to 1. According to 3GPP TS 29.510 cited above, an NRF 502 can continuously “ping” the NFP 503 to determine if it is available or not (i.e. by use of a heartbeat mechanism). It is therefore possible to quantify availability from the ratio of ‘failed’ heartbeats to the total number of heartbeats. Alternatively, in the case of aperiodic heartbeats, it is also possible to measure the time between successful and failed heartbeats, add it to the time between failed heartbeats, and divide by the total time.
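The two heartbeat-based availability measures mentioned above can be sketched as follows. This is one possible reading, under stated assumptions: availability is taken as one minus the failed fraction, and for aperiodic heartbeats every interval that ends in a failed heartbeat is counted as downtime. The function names and the `(timestamp, ok)` event format are hypothetical.

```python
from typing import List, Tuple


def availability_from_counts(failed: int, total: int) -> float:
    """Availability in [0, 1] from periodic heartbeat counts."""
    if total <= 0 or not 0 <= failed <= total:
        raise ValueError("need 0 <= failed <= total and total > 0")
    return 1.0 - failed / total


def availability_from_events(events: List[Tuple[float, bool]]) -> float:
    """Availability in [0, 1] from time-sorted (timestamp, ok) heartbeats.

    Assumption: an interval ending in a failed heartbeat (success-to-failure
    or failure-to-failure) counts as downtime.
    """
    if len(events) < 2:
        raise ValueError("need at least two heartbeat events")
    total_time = events[-1][0] - events[0][0]
    if total_time <= 0:
        raise ValueError("events must span a positive time range")
    downtime = sum(
        t1 - t0
        for (t0, _), (t1, ok1) in zip(events, events[1:])
        if not ok1
    )
    return 1.0 - downtime / total_time
```

For example, with heartbeats at times 0, 10, 25, 30 and 40 where only those at 25 and 30 fail, the downtime is 15 + 5 = 20 out of 40, giving an availability of 0.5.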

The calculation of the reward in step 525 can be a weighted or non-weighted averaging of the aforementioned factors. A weighted average can be used in the event that there is a preference or importance for one or more of the factors, e.g. cost.
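As an illustration of the weighted averaging described above, the following sketch combines three factors into a reward between 0 and 1. The normalisation of each factor (violation fraction, cost relative to a maximum cost) and all parameter names are assumptions made for the example; the description does not prescribe a particular normalisation.

```python
from typing import Sequence


def reward(
    violations: int,
    samples: int,
    cost: float,
    max_cost: float,
    availability: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Weighted average of three factor scores, each scaled to [0, 1]."""
    if samples <= 0 or max_cost <= 0:
        raise ValueError("samples and max_cost must be positive")
    if len(weights) != 3 or sum(weights) <= 0:
        raise ValueError("need three weights with a positive sum")

    # Fewer threshold violations in the observation period is better.
    violation_score = 1.0 - min(violations / samples, 1.0)
    # Lower cost (financial and/or energy) is better.
    cost_score = 1.0 - min(cost / max_cost, 1.0)
    # Availability is assumed to be already normalised to [0, 1].
    availability_score = min(max(availability, 0.0), 1.0)

    factors = (violation_score, cost_score, availability_score)
    return sum(w * f for w, f in zip(weights, factors)) / sum(weights)
```

With equal weights this reduces to a non-weighted average; raising the second weight expresses a preference for low cost.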

Fig. 5 shows that the NFC 501 subsequently receives another NF update (signal 526) from the NRF 502 for another NFP 503 and this is stored by the NFC 501 (step 527), although these steps are optional.

In embodiments where an Experience Repository 504 is used, the NFC 501 sends parts of the stored data to the Experience Repository 504 for storage and use by other NFCs 501 (as shown by signal 528). The information sent to the Experience Repository 504 can include information on a status of the NFC (s_nfc) and the corresponding action taken by the NFC (a_nfc).

Steps 521 to 528 can be repeated a number of times to collect a sufficiently large set of training data.

In embodiments where an Experience Repository 504 is used, then in sub-section 529, when the RL model is to be trained, the NFC 501 can retrieve experience information for other NFCs 501 from the Experience Repository 504 (step/signal 530). In step 531, the neural network/RL model is trained using the training data generated by the NFC 501 and any other experience data obtained in step 530.
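One way to picture the pooling of local and shared experience before training is a simple replay buffer, sketched below. The `Experience` tuple layout (state, action, reward), the class name `ReplayBuffer`, and the capacity are assumptions for illustration; how the Experience Repository 504 actually stores and serves data is not specified here.

```python
import random
from collections import deque
from typing import Iterable, List, NamedTuple, Optional, Tuple


class Experience(NamedTuple):
    state: Tuple[float, ...]  # s_nfc: aggregate status of the NFInstances
    action: int               # a_nfc: index of the chosen NFP
    reward: float             # reward calculated after the observation period


class ReplayBuffer:
    """Bounded pool of experiences; the oldest entries are dropped first."""

    def __init__(self, capacity: int = 10_000) -> None:
        self._items: deque = deque(maxlen=capacity)

    def extend(self, experiences: Iterable[Experience]) -> None:
        self._items.extend(experiences)

    def sample(self, k: int, rng: Optional[random.Random] = None) -> List[Experience]:
        """Draw up to k experiences without replacement for one training batch."""
        rng = rng or random
        return rng.sample(list(self._items), min(k, len(self._items)))

    def __len__(self) -> int:
        return len(self._items)
```

In use, the NFC would call `extend` once with its own stored experiences and once with those retrieved from the repository, then train on batches drawn with `sample`.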

It should be noted that for the Experience Repository 504 to be used, all NFCs 501 need to have access to the same NFPs 503 and share the same reward function.

In the operation phase 513, which occurs once the RL model has been trained, the NFC 501 receives an NF update (signal 541) from the NRF 502 for an NFP 503. The NFC 501 subsequently needs to select a new NFP 503 to provide the required NF instance, and so the NFC 501 executes the trained model to select an appropriate NFP 503 (step 542).

In sub-section 543, the NFP 503 selected by the RL model may be different to the one that is currently being used by the NFC 501, and therefore the NFC 501 may establish a connection to, and use, the newly-selected NFP 503 (as shown by signal 544).

In some embodiments, if new NFPs join the pool of available NF instance providers, or existing NFPs leave the pool, either temporarily or permanently, then the agent needs to retrain its neural network/RL model. This does not necessarily mean that training of the neural network has to start from the beginning. However, because the input layer of the neural network changes, the architecture of the neural network needs to change, for example by adding a convolutional layer at the beginning, or by using a padding approach.
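The padding approach mentioned above can be illustrated with a fixed-width state vector: the input is sized for a maximum number of NFPs, and unused slots are zero-filled and flagged in a mask. The function name `pad_state`, the zero padding value, and the mask are assumptions chosen for this sketch.

```python
from typing import List, Sequence, Tuple


def pad_state(
    per_nfp_features: Sequence[Sequence[float]],
    max_nfps: int,
    n_features: int,
) -> Tuple[List[float], List[int]]:
    """Flatten per-NFP features into a fixed-length vector plus a validity mask."""
    if len(per_nfp_features) > max_nfps:
        raise ValueError("more NFPs than available input slots")

    state: List[float] = []
    mask: List[int] = []
    for features in per_nfp_features:
        if len(features) != n_features:
            raise ValueError("every NFP must supply n_features values")
        state.extend(features)
        mask.append(1)  # slot holds a real NFP

    # Zero-fill the remaining slots so the input size never changes.
    for _ in range(max_nfps - len(per_nfp_features)):
        state.extend([0.0] * n_features)
        mask.append(0)  # slot is padding; its action should not be selected

    return state, mask
```

With this layout, an NFP joining or leaving the pool changes only the contents of a slot and its mask bit, not the dimensions of the input layer.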

In some embodiments, a non-terrestrial-based (i.e. satellite) NFP 503 may change role from an NFP to a ‘bent pipe’ device, i.e. rather than acting as an NFP itself, the satellite can just be used for relaying NF instance communications between an NFC and an NFP. If, for example, selected actions (a_nfc) from the agents of all (or most) NFCs show that the NFPs to be used are in terrestrial networks (i.e. they are terrestrial-based), then there is no need for compute capability to be provided by the satellites, which means that the satellites can, in this case, be used solely as data carriers. The opposite is also true, i.e. when a satellite starts to be assigned NF instances, its role can change from ‘bent pipe’ operation to that of an NFP. It is also possible for the NRF 502 to function within certain limits of satellite capacity, e.g. 40% bent pipe and 60% regenerative.

In some embodiments, the state description for the NFPs can be extended with information about satellite orbit dynamics. That is, since satellites are moving relative to each other and with respect to the Earth, the capacity of a communication channel between an NFC and an NFP (e.g. in terms of bandwidth, latency, jitter etc.) will change over time. While the RL agent has information about the capacity of each NFP at the time the performance information was obtained, and even information on the reliability of the NFP in delivering its NF instances in the past, the RL agent has little visibility on how this capacity can change in the future without having information about the relative movement of an NFP with respect to an NFC. Therefore, to increase the accuracy of the assessment of the expected reward when choosing a specific NFP, embodiments provide for the inclusion of information on the temporal availability of a communication channel between an NFC and an NFP in the information for the RL agent about the environmental state space. Two ways to provide this information are:

the information could be included as an input to the RL agent in the form of relative position coordinate(s) along with a direction of movement of all the potential NFPs with respect to the NFC. Over time, the RL agent would learn how the relative dynamics of the satellites impact communication channel characteristics. However, this approach may require a more complex neural network to learn and analyse the relationship, which can mean a longer convergence time for training the RL agent;

alternatively, the time period for which the communication channel between the NFC and the NFP will remain available, at a level that would not impact the NFP's function towards the NFC, could be assessed outside the RL agent, using information about the relative movement of satellites. In that case, the RL agent would receive a corresponding temporal availability assessment of the NFPs rather than satellite movement parameters as described in the first alternative above. This approach would reduce RL agent complexity and speed up its training convergence, at a cost of possible degradation in the accuracy of the decisions.
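The second alternative, in which temporal availability is assessed outside the RL agent, might look like the sketch below. It assumes that visibility windows for each NFP have already been computed elsewhere (for example from orbital data) as hypothetical `(start, end)` time pairs; the orbital computation itself is not shown, and the function names are illustrative.

```python
from typing import Sequence, Tuple


def remaining_availability(
    windows: Sequence[Tuple[float, float]], now: float
) -> float:
    """Time left in the visibility window that contains `now`, else 0."""
    for start, end in sorted(windows):
        if start <= now < end:
            return end - now
    return 0.0


def temporal_availability_feature(
    windows: Sequence[Tuple[float, float]],
    now: float,
    observation_period: float,
) -> float:
    """Fraction of the next observation period the channel is expected to last."""
    if observation_period <= 0:
        raise ValueError("observation_period must be positive")
    remaining = remaining_availability(windows, now)
    return min(remaining / observation_period, 1.0)
```

The resulting value in [0, 1] can be appended to the per-NFP state in place of raw position and direction parameters, which keeps the agent's input small.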

Fig. 6 is a flow chart illustrating a method of operating a first node according to some embodiments. The method in Fig. 6 may be performed by a first node as described later with reference to Figs. 7 and 8. The first node may be a NFC or a SCP.

The first node may perform the method in response to executing suitably formulated computer readable code. The computer readable code may be embodied or stored on a computer readable medium, such as a memory chip, optical disc, or other storage medium. The computer readable medium may be part of a computer program product.

The method in Fig. 6 enables a first node to determine a model for selecting a NFP to provide a first NF instance to a first NFC. The NFP is to be selected from a plurality of NFPs that are capable of providing the first NF instance, with the plurality of NFPs comprising at least one non-terrestrial-based NFP and at least one terrestrial-based NFP. One or more of these NFPs have intermittent availability for providing a reliable NF instance to the first NFC.

In some embodiments, the first node and/or the NFC can be terrestrial-based. In these embodiments, the one or more non-terrestrial-based NFPs may have intermittent availability for providing a reliable NF instance to the NFC, and the one or more terrestrial-based NFPs may have high availability for providing a reliable NF instance to the NFC.

In alternative embodiments, the first node and/or the NFC can be non-terrestrial-based. In these embodiments, the one or more terrestrial-based NFPs may have intermittent availability for providing a reliable NF instance to the NFC, and the one or more non-terrestrial-based NFPs may have high availability for providing a reliable NF instance to the NFC, depending on their respective orbital paths. The NFP and/or the NFC can be (or provide NF instances of) any of: an AUSF, Unified Data Repository (UDR), UDM, PCF, Short Message Service Function (SMSF), Location Management Function (LMF), Core Charging Function (CHF), NEF, AF, AMF, SMF, Network Data Analytics Function (NWDAF), UPF, a virtual network switch, a virtual network router and a virtual firewall.

In a first step, step 601, the first node receives, from an NRF, information on the plurality of NFPs. The information comprises availability information for the plurality of NFPs.

In some embodiments, the information received from the NRF comprises, for each of the NFPs, information indicating any one or more of: an identifier for the NFP, the types of NF instances provided by the NFP, the capacity of the NFP to provide NF instances, the load of the NFP (e.g. how much of the capacity of the NFP is currently being used), and previous values of network availability performance metrics for the NFP.
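The per-NFP information listed above can be pictured as a small record that is flattened into numeric features for the RL state. The class name `NfpInfo`, its field names, and the choice of 0.0 for a missing metric are hypothetical; this is not the NFProfile schema of 3GPP TS 29.510.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple


@dataclass
class NfpInfo:
    nfp_id: str                      # identifier for the NFP
    nf_types: Tuple[str, ...]        # types of NF instances it provides
    capacity: float                  # capacity to provide NF instances
    load: float                      # how much of the capacity is in use
    metrics: Dict[str, float] = field(default_factory=dict)  # previous metric values

    def features(self, metric_names: Sequence[str]) -> List[float]:
        """Numeric features in a fixed order; missing metrics default to 0.0."""
        return [self.capacity, self.load] + [
            self.metrics.get(name, 0.0) for name in metric_names
        ]
```

Fixing the order of `metric_names` across all NFPs keeps the feature positions consistent from one update to the next.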

In some embodiments, the information received for the non-terrestrial-based NFPs can comprise information indicating any of: relative positions and/or coordinates of an NFP (e.g. relative to the first node and/or the NFC), a direction of movement of an NFP, and one or more time periods in which the NFP is expected to be available for connection to the first NFC.

In step 603, the first node selects a first candidate NFP to provide the first NF instance based on the received information. In some embodiments, the first node randomly selects the first candidate NFP from the plurality of NFPs. In alternative embodiments, the first node selects the first candidate NFP as the NFP in the plurality of NFPs having a highest priority flag in the received information.

In step 605, the first node establishes a connection to the first candidate NFP, and uses the first NF instance provided by the first candidate NFP.

In step 607, while the first NF instance provided by the first candidate NFP is being used, the first node monitors one or more network availability performance metrics for the first NF instance. These network availability performance metrics relate to and/or are affected by the availability of a candidate NFP to provide a reliable NF instance. In some embodiments, the one or more network availability performance metrics comprise any one or more of: latency, jitter, packet drop rate, and capacity of the NFP to provide NF instances.

In step 609, the first node calculates a reward based on values of the one or more network availability performance metrics for the first NF instance.

Some embodiments of step 609 comprise calculating the reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, and a cost associated with using the first candidate NFP to provide the first NF instance to the first NFC. The cost associated with using the first candidate NFP can relate to a financial cost to an operator of the NFC. For example, since a non-terrestrial-based NFP can have a higher financial set up and maintenance cost (e.g. the cost of manufacturing the satellite and launching it into orbit) than a terrestrial-based NFP, the cost levied by an operator of a non-terrestrial-based NFP for use of an NF instance provided by that NFP may be higher than for an operator of a terrestrial-based NFP. As an example, the calculated reward may be higher when the (financial) cost of using the NFP is lower. In addition or alternatively, the cost can relate to an energy cost of using the NFP to provide the first NF instance. For example, the energy cost of using a satellite-based NFP can be zero or minimal if the satellite is solar-powered, whereas the energy cost may be much higher for a non-solar-powered terrestrial-based NFP. As an example, the calculated reward may be higher when the energy cost of using the NFP is lower.

In step 611, an RL model is trained that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance. The RL model is trained using information on the candidate NFP and the calculated reward. In some embodiments, the RL model is built using deep learning, value learning or policy learning. In particular, value-learning techniques can include techniques such as DQN, or DQN derivatives such as Double-DQN, or Deep Recurrent Q-Network algorithms. Policy-learning techniques can include actor-critic types of approaches like DDPG, A3C, GAE and Q-prop.

Then, in step 613, steps 603 to 611 are repeated for one or more further selected candidate NFPs to calculate respective rewards, and the RL model is trained using information on the one or more further candidate NFPs and the respective calculated rewards.
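The select, use, monitor, reward and train loop of Fig. 6 can be sketched with a deliberately simple stand-in learner. Instead of the deep RL techniques named above (DQN and similar), this example keeps one running value estimate per NFP with epsilon-greedy selection; the function name, the `observe_reward` callback, and the hyperparameters are all assumptions for illustration.

```python
import random
from typing import Callable, Dict, Sequence


def train_selector(
    nfp_ids: Sequence[str],
    observe_reward: Callable[[str], float],
    rounds: int = 500,
    epsilon: float = 0.2,
    learning_rate: float = 0.1,
    seed: int = 0,
) -> Dict[str, float]:
    """Return a learned value estimate per NFP (higher means preferred)."""
    rng = random.Random(seed)
    values = {nfp_id: 0.0 for nfp_id in nfp_ids}

    for _ in range(rounds):  # step 613: repeat for further candidates
        # Step 603: select a candidate, exploring with probability epsilon.
        if rng.random() < epsilon:
            choice = rng.choice(list(nfp_ids))
        else:
            choice = max(nfp_ids, key=lambda n: values[n])

        # Steps 605-609: connect, use, monitor and compute the reward.
        # Here that whole interaction is hidden behind a callback.
        r = observe_reward(choice)

        # Step 611: move the estimate for the chosen NFP towards the reward.
        values[choice] += learning_rate * (r - values[choice])

    return values
```

A tabular estimate like this ignores the state (load, capacity, orbital position), so it only illustrates the control flow; a state-dependent model would replace the `values` dictionary.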

In some embodiments, as part of, or prior to, step 613, the method can further comprise receiving further training data relating to the plurality of NFPs and respective rewards determined by other NFCs. This information can be received from a database (e.g. Experience Repository 504). In this case, step 613 comprises training the RL model using the information on the candidate NFPs, the calculated rewards, and the received further training data.

In some embodiments, the method further comprises using the trained RL model to identify one of the plurality of NFPs to provide the first NF instance when the first NFC requires the first NF instance to be provided. The method can further comprise establishing a connection to the identified NFP and using the identified NFP to provide the first NF instance.

Further embodiments provide for the continuous training of the RL model after the model has been deployed and is being used to select NFPs. In particular, the method can further comprise performing steps 607 and 609 for the first NF instance provided by the identified NFP, and updating the trained RL model based on the reward calculated for the first NF instance provided by the identified NFP.

In some embodiments, prior to step 601, the first node can subscribe to the NRF for information on NFPs. In these embodiments, the information on the plurality of NFPs received in step 601 is received as a result of the subscription. In some embodiments, the method can further comprise the steps of one or both of: (i) removing an existing NFP from the plurality of NFPs if the existing NFP is no longer able to provide the first NF instance (for example if the existing NFP has been deactivated or is faulty); and (ii) adding a new NFP to the plurality of NFPs if the new NFP is now able to provide the first NF instance (e.g. if a new NFP has been activated). In this case, the method can further comprise updating the trained RL model according to the NFPs in the updated plurality of NFPs.

Fig. 7 is a simplified block diagram of a first node 700 according to some embodiments that can be used to implement the techniques described herein. As described herein, the first node 700 can be any node that can operate as a NFC (i.e. that can make use of a NF instance provided by another NF node), or a SCP (i.e. that can act as an intermediary between a NFC and NFP(s)). Any of the NFC, SCP, and NFP(s) can be configured as shown in Fig. 7.

The first node 700 comprises processing circuitry (or logic) 701. It will be appreciated that the first node 700 may comprise one or more virtual machines running different software and/or processes. The first node 700 may therefore comprise, or be implemented in or as one or more servers, switches and/or storage devices and/or may comprise cloud computing infrastructure that runs the software and/or processes.

The processing circuitry 701 controls the operation of the first node 700 and can implement the methods described herein in relation to the first node 700. The processing circuitry 701 can comprise one or more processors, processing units, multi-core processors or modules that are configured or programmed to control the first node 700 in the manner described herein. In particular implementations, the processing circuitry 701 can comprise a plurality of software and/or hardware modules that are each configured to perform, or are for performing, individual or multiple steps of the method described herein in relation to the first node 700.

The first node 700 may also comprise a communications interface 702. The communications interface 702 is for use in enabling communications with other nodes, such as any of NFPs, a NRF, a SCP, and a NFC. For example, the communications interface 702 can be configured to transmit to and/or receive from other nodes requests, acknowledgements, information, data, signals, or similar. The communication interface 702 can use any suitable communication technology, and for example the communication technology used can depend on whether the first node 700 is terrestrial-based or non-terrestrial-based.

The processing circuitry 701 may be configured to control the communications interface 702 to transmit to and/or receive from other nodes requests, acknowledgements, information, data, signals, or similar, according to the methods described herein.

The first node 700 may comprise a memory 703. In some embodiments, the memory 703 can be configured to store program code that can be executed by the processing circuitry 701 to perform the method described herein in relation to the first node 700. Alternatively or in addition, the memory 703 can be configured to store any requests, acknowledgements, information, data, signals, or similar that are described herein. The processing circuitry 701 may be configured to control the memory 703 to store such information therein.

Fig. 8 is a block diagram illustrating a virtualization environment 800 in which functions implemented by some embodiments may be virtualized. In the present context, virtualizing means creating virtual versions of apparatuses or devices, which may include virtualizing hardware platforms, storage devices and networking resources. As used herein, virtualization can be applied to any device described herein, or components thereof, and relates to an implementation in which at least a portion of the functionality is implemented as one or more virtual components. In particular, virtualization can be applied to any of an NFC, a SCP, an NFP, an NRF, or more generally a ‘first node’ as described herein (i.e. the general term used to refer to an NFC or a SCP). Some or all of the functions described herein may be implemented as virtual components executed by one or more virtual machines (VMs) implemented in one or more virtual environments 800 hosted by one or more hardware nodes, such as a hardware computing device that operates as a core network node. In further embodiments, the node may be entirely virtualized. In further embodiments, the virtualization environment may be part of a terrestrial-based node, or part of a non-terrestrial-based node, such as a satellite.

Applications 802 (which may alternatively be called software instances, virtual appliances, network functions, virtual nodes, virtual network functions, etc.) are run in the virtualization environment 800 to implement some of the features, functions, and/or benefits of some of the embodiments disclosed herein.

Hardware 804 includes processing circuitry, memory that stores software and/or instructions executable by hardware processing circuitry, and/or other hardware devices as described herein, such as a network interface, input/output interface, and so forth. Software may be executed by the processing circuitry to instantiate one or more virtualization layers 806 (also referred to as hypervisors or virtual machine monitors (VMMs)), provide VMs 808a and 808b (one or more of which may be generally referred to as VMs 808), and/or perform any of the functions, features and/or benefits described in relation with some embodiments described herein. The virtualization layer 806 may present a virtual operating platform that appears like networking hardware to the VMs 808.

The VMs 808 comprise virtual processing, virtual memory, virtual networking or interface and virtual storage, and may be run by a corresponding virtualization layer 806. Different embodiments of the instance of a virtual appliance 802 may be implemented on one or more of VMs 808, and the implementations may be made in different ways. Virtualization of the hardware is in some contexts referred to as network function virtualization (NFV). NFV may be used to consolidate many network equipment types onto industry standard high volume server hardware, physical switches, and physical storage, which can be located in data centers, and customer premise equipment. In the context of NFV, a VM 808 may be a software implementation of a physical machine that runs programs as if they were executing on a physical, non-virtualized machine. Each of the VMs 808, and that part of hardware 804 that executes that VM, be it hardware dedicated to that VM and/or hardware shared by that VM with others of the VMs, forms separate virtual network elements. Still in the context of NFV, a virtual network function is responsible for handling specific network functions that run in one or more VMs 808 on top of the hardware 804 and corresponds to the application 802.

Hardware 804 may be implemented in a standalone network node with generic or specific components. Hardware 804 may implement some functions via virtualization. Alternatively, hardware 804 may be part of a larger cluster of hardware (e.g. in a data center or customer premises equipment, CPE) where many hardware nodes work together and are managed via management and orchestration 810, which, among others, oversees lifecycle management of applications 802. In some embodiments, hardware 804 is coupled to one or more radio units that each include one or more transmitters and one or more receivers that may be coupled to one or more antennas. Radio units may communicate directly with other hardware nodes via one or more appropriate network interfaces and may be used in combination with the virtual components to provide a virtual node with radio capabilities, such as a radio access node or a base station. In some embodiments, some signalling can be provided with the use of a control system 812 which may alternatively be used for communication between hardware nodes and radio units.

The foregoing merely illustrates the principles of the disclosure. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous systems, arrangements, and procedures that, although not explicitly shown or described herein, embody the principles of the disclosure and can be thus within the scope of the disclosure. Various exemplary embodiments can be used together with one another, as well as interchangeably therewith, as should be understood by those having ordinary skill in the art.

Claims

1. A computer-implemented method of operating a first node to determine a model for selecting a Network Function Producer, NFP, to provide a first network function, NF, instance to a first Network Function Consumer, NFC, wherein a plurality of NFPs are capable of providing the first NF instance, and the plurality of NFPs comprise one or more non-terrestrial-based NFPs and one or more terrestrial-based NFPs, wherein one or more of the NFPs have intermittent availability for providing a reliable NF instance to the first NFC; the method comprising:

(i) receiving (601), from a NF Repository Function, NRF, information on the plurality of NFPs, wherein the information comprises availability information for the plurality of NFPs;

(ii) selecting (603), based on the received information, a first candidate NFP to provide the first NF instance;

(iii) establishing (605) a connection to the first candidate NFP, and using the first NF instance provided by the first candidate NFP;

(iv) monitoring (607) one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, wherein the network availability performance metrics relate to and/or are affected by the availability of a candidate NFP to provide a reliable NF instance;

(v) calculating (609) a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP;

(vi) training (611) a reinforcement learning, RL, model that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance, wherein the RL model is trained using information on the candidate NFP and the calculated reward; and

(vii) repeating (613) steps (ii)-(v) for one or more further selected candidate NFPs to calculate respective rewards and training the RL model using information on the one or more further candidate NFPs and the respective calculated rewards.

2. A method as claimed in claim 1, wherein the method further comprises: when the first NFC requires the first NF instance to be provided, using the trained RL model to identify one of the plurality of NFPs to provide the first NF instance.

3. A method as claimed in claim 2, wherein the method further comprises: establishing a connection to the identified NFP and using the identified NFP to provide the first NF instance.

4. A method as claimed in claim 3, wherein the method further comprises: performing steps (iv) and (v) for the first NF instance provided by the identified NFP; and updating the trained RL model based on the reward calculated for the first NF instance provided by the identified NFP.

5. A method as claimed in any of claims 1-4, wherein the step of selecting (603) comprises randomly selecting the first candidate NFP from the plurality of NFPs.

6. A method as claimed in any of claims 1-4, wherein the step of selecting (603) comprises selecting the first candidate NFP as the NFP in the plurality of NFPs having a highest priority flag in the received information.

7. A method as claimed in any of claims 1-6, wherein step (v) comprises calculating (609) the reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP and a cost associated with using the first candidate NFP to provide the first NF instance to the first NFC.

8. A method as claimed in claim 7, wherein the cost associated with using the first candidate NFP relates to a financial cost to an operator of the NFC and/or to an energy cost of using the NFP to provide the first NF instance.

9. A method as claimed in any of claims 1-8, wherein the method further comprises: subscribing to the NRF for information on NFPs; wherein the information on the plurality of NFPs received in step (i) (601) is received as a result of the subscription.

10. A method as claimed in any of claims 1-9, wherein the method further comprises: receiving, from a database, further training data relating to the plurality of NFPs and respective rewards determined by other NFCs; and wherein step (vii) comprises training the RL model using the information on the candidate NFPs, the calculated rewards, and the received further training data.

11. A method as claimed in any of claims 1-10, wherein the information received from the NRF comprises, for each of the plurality of NFPs, information indicating any one or more of: an identifier for the NFP, types of NF instances provided by the NFP, capacity of the NFP, load of the NFP, and previous values of network availability performance metrics for the NFP.

12. A method as claimed in any of claims 1-11, wherein the one or more network availability performance metrics comprise any one or more of: latency, jitter, packet drop rate, and capacity of the NFP to provide NF instances.

13. A method as claimed in any of claims 1-12, wherein the RL model is built using deep learning, value learning or policy learning.

14. A method as claimed in any of claims 1-13, wherein the NFP and/or the NFC is any of: Authentication Server Function (AUSF), Unified Data Repository (UDR), Unified Data Management (UDM), Policy Control Function (PCF), Short Message Service Function (SMSF), Location Management Function (LMF), Core Charging Function (CHF), Network Exposure Function (NEF), Application Function (AF), Access and Mobility Function (AMF), Session Management Function (SMF), Network Data Analytics Function (NWDAF), User Plane Function (UPF), a virtual network switch, a virtual network router and a virtual firewall.

15. A method as claimed in any of claims 1-14, wherein the method further comprises: one or both of: (i) removing an existing NFP from the plurality of NFPs if the existing NFP is no longer able to provide the first NF instance; and (ii) adding a new NFP to the plurality of NFPs if the new NFP is now able to provide the first NF instance; and updating the trained RL model according to the NFPs in the plurality of NFPs.

16. A method as claimed in any of claims 1-15, wherein the received information for the non-terrestrial-based NFPs comprises information indicating any of: relative positions and/or coordinates of an NFP, a direction of movement of an NFP, and one or more time periods in which the NFP is expected to be available for connection to the first NFC.

17. A method as claimed in any of claims 1-16, wherein the one or more non-terrestrial-based NFPs have intermittent availability for providing a reliable NF instance to the NFC, and wherein the one or more terrestrial-based NFPs have high availability for providing a reliable NF instance to the NFC.

18. A method as claimed in any of claims 1-17, wherein the first node is the first NFC.

19. A method as claimed in any of claims 1-17, wherein the first node is a Service Communication Proxy, SCP.

20. A computer program product comprising a computer readable medium having computer readable code embodied therein, the computer readable code being configured such that, on execution by a suitable computer or processor, the computer or processor is caused to perform the method of any of claims 1-19.

21. A first node (501; 700; 800) configured to determine a model for selecting a Network Function Producer, NFP, (403, 404; 503) to provide a first network function, NF, instance to a first Network Function Consumer, NFC (402; 501), wherein a plurality of NFPs (403, 404; 503) are capable of providing the first NF instance, wherein the plurality of NFPs (403, 404; 503) comprise one or more non-terrestrial-based NFPs (403; 503) and one or more terrestrial-based NFPs (404; 503), wherein one or more of the NFPs (403, 404; 503) have intermittent availability for providing a reliable NF instance to the first NFC (402; 501); the first node (501; 700; 800) configured to:

(i) receive, from a NF Repository Function, NRF, (406; 502) information on the plurality of NFPs (403, 404; 503), wherein the information comprises availability information for the plurality of NFPs (403, 404; 503);

(ii) select, based on the received information, a first candidate NFP to provide the first NF instance;

(iii) establish a connection to the first candidate NFP (403, 404; 503), and use the first NF instance provided by the first candidate NFP (403, 404; 503);

(iv) monitor one or more network availability performance metrics for the first NF instance provided by the first candidate NFP (403, 404; 503), wherein the network availability performance metrics relate to and/or are affected by the availability of a candidate NFP (403, 404; 503) to provide a reliable NF instance;

(v) calculate a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP (403, 404; 503);

(vi) train a reinforcement learning, RL, model that is to be used to identify an NFP (403, 404; 503) in the plurality of NFPs (403, 404; 503) to provide the first NF instance, wherein the RL model is trained using information on the candidate NFP (403, 404; 503) and the calculated reward; and

(vii) repeat operations (ii)-(v) for one or more further selected candidate NFPs (403, 404; 503) to calculate respective rewards, and train the RL model using information on the one or more further candidate NFPs (403, 404; 503) and the respective calculated rewards.
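For illustration only, operations (i)-(vii) can be sketched in Python as a simple value-learning loop. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `NfpProfile`, `BanditModel`, `reward_from_metrics`, `train` and the `measure` callback are hypothetical, the reward shape is assumed, and an epsilon-greedy tabular estimate stands in for whichever RL technique is actually used.

```python
import random
from dataclasses import dataclass, field


@dataclass
class NfpProfile:
    """Hypothetical record of what the NRF reports for one NFP."""
    nfp_id: str
    available: bool


@dataclass
class BanditModel:
    """Tabular value estimate per NFP (running mean of observed rewards)."""
    values: dict = field(default_factory=dict)
    counts: dict = field(default_factory=dict)

    def update(self, nfp_id, reward):
        n = self.counts.get(nfp_id, 0) + 1
        old = self.values.get(nfp_id, 0.0)
        self.counts[nfp_id] = n
        self.values[nfp_id] = old + (reward - old) / n

    def best(self, candidates):
        return max(candidates, key=lambda c: self.values.get(c, 0.0))


def reward_from_metrics(latency_ms, packet_drop_rate):
    # Assumed reward shape: lower latency and fewer drops give a higher reward.
    return -(latency_ms / 100.0) - 10.0 * packet_drop_rate


def train(profiles, measure, episodes=200, epsilon=0.2, seed=0):
    """measure(nfp_id) -> (latency_ms, packet_drop_rate); hypothetical hook
    standing in for connecting to the NFP and monitoring the NF instance."""
    rng = random.Random(seed)
    model = BanditModel()
    # (i) keep only NFPs that the received information marks as available
    candidates = [p.nfp_id for p in profiles if p.available]
    for _ in range(episodes):  # (vii) repeat for further candidates
        untried = [c for c in candidates if c not in model.counts]
        if untried:
            nfp_id = untried[0]              # (ii) try every candidate once
        elif rng.random() < epsilon:
            nfp_id = rng.choice(candidates)  # (ii) explore
        else:
            nfp_id = model.best(candidates)  # (ii) exploit current estimate
        latency_ms, drop = measure(nfp_id)   # (iii) connect, (iv) monitor
        reward = reward_from_metrics(latency_ms, drop)  # (v) reward
        model.update(nfp_id, reward)         # (vi) train
    return model
```

After training, `model.best(candidates)` returns the NFP with the highest estimated reward, which corresponds to using the trained model to identify an NFP when the NF instance is next required.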

22. A first node (501; 700; 800) as claimed in claim 21, wherein the first node (501; 700; 800) is further configured to: when the first NFC (402; 501) requires the first NF instance to be provided, use the trained RL model to identify one of the plurality of NFPs (403, 404; 503) to provide the first NF instance.

23. A first node (501; 700; 800) as claimed in claim 22, wherein the first node (501; 700; 800) is further configured to: establish a connection to the identified NFP (403, 404; 503) and use the identified NFP (403, 404; 503) to provide the first NF instance.

24. A first node (501; 700; 800) as claimed in claim 23, wherein the first node (501; 700; 800) is further configured to: perform operations (iv) and (v) for the first NF instance provided by the identified NFP (403, 404; 503); and update the trained RL model based on the reward calculated for the first NF instance provided by the identified NFP (403, 404; 503).

25. A first node (501; 700; 800) as claimed in any of claims 21-24, wherein the first node (501; 700; 800) is configured to select a first candidate NFP by randomly selecting the first candidate NFP (403, 404; 503) from the plurality of NFPs (403, 404; 503).

26. A first node (501; 700; 800) as claimed in any of claims 21-24, wherein the first node (501; 700; 800) is configured to select a first candidate NFP (403, 404; 503) by selecting the first candidate NFP (403, 404; 503) as the NFP (403, 404; 503) in the plurality of NFPs (403, 404; 503) having a highest priority flag in the received information.

27. A first node (501; 700; 800) as claimed in any of claims 21-26, wherein the first node (501; 700; 800) is configured to calculate the reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP (403, 404; 503) and a cost associated with using the first candidate NFP (403, 404; 503) to provide the first NF instance to the first NFC (402; 501).

28. A first node (501; 700; 800) as claimed in claim 27, wherein the cost associated with using the first candidate NFP (403, 404; 503) relates to a financial cost to an operator of the NFC (402; 501) and/or to an energy cost of using the NFP (403, 404; 503) to provide the first NF instance.
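For illustration only, a reward that combines network availability performance metrics with a usage cost might look like the sketch below. The function name `reward_with_cost`, the weights, the scaling constants and the units are all assumptions chosen for the example; the claims do not fix a particular formula.

```python
def reward_with_cost(latency_ms, jitter_ms, packet_drop_rate,
                     financial_cost=0.0, energy_cost_j=0.0,
                     w_perf=1.0, w_fin=0.5, w_energy=0.1):
    """Return a scalar reward; higher is better.

    Assumed shape: a weighted performance penalty plus a weighted cost
    penalty, negated so that good, cheap NFPs score closer to zero.
    """
    # Performance penalty grows with latency, jitter and packet drops.
    perf_penalty = latency_ms / 100.0 + jitter_ms / 10.0 + 10.0 * packet_drop_rate
    # Cost penalty mixes financial cost to the operator and energy cost.
    cost_penalty = w_fin * financial_cost + w_energy * energy_cost_j
    return -(w_perf * perf_penalty + cost_penalty)


# Two NFPs with identical measured metrics but different usage costs.
cheap = reward_with_cost(100.0, 10.0, 0.0, financial_cost=1.0)
costly = reward_with_cost(100.0, 10.0, 0.0, financial_cost=4.0)
```

With identical metrics, the NFP that is cheaper to use receives the higher reward, so the trained model is steered towards it.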

29. A first node (501; 700; 800) as claimed in any of claims 21-28, wherein the first node (501; 700; 800) is further configured to: subscribe to the NRF (406; 502) for information on NFPs (403, 404; 503); wherein the information on the plurality of NFPs (403, 404; 503) received in operation (i) is received as a result of the subscription.

30. A first node (501; 700; 800) as claimed in any of claims 21-29, wherein the first node (501; 700; 800) is further configured to: receive, from a database (504), further training data relating to the plurality of NFPs (403, 404; 503) and respective rewards determined by other NFCs (402; 501); and wherein operation (vii) comprises training the RL model using the information on the candidate NFPs (403, 404; 503), the calculated rewards, and the received further training data.

31. A first node (501; 700; 800) as claimed in any of claims 21-30, wherein the information received from the NRF (406; 502) comprises, for each of the plurality of NFPs (403, 404; 503), information indicating any one or more of: an identifier for the NFP (403, 404; 503), types of NF instances provided by the NFP (403, 404; 503), capacity of the NFP (403, 404; 503), load of the NFP (403, 404; 503), and previous values of network availability performance metrics for the NFP (403, 404; 503).

32. A first node (501; 700; 800) as claimed in any of claims 21-31, wherein the one or more network availability performance metrics comprise any one or more of: latency, jitter, packet drop rate, and capacity of the NFP (403, 404; 503) to provide NF instances.

33. A first node (501; 700; 800) as claimed in any of claims 21-32, wherein the RL model is built using deep learning, value learning or policy learning.

34. A first node (501; 700; 800) as claimed in any of claims 21-33, wherein the NFP (403, 404; 503) and/or the NFC (402; 501) is any of: Authentication Server Function (AUSF), Unified Data Repository (UDR), Unified Data Management (UDM), Policy Control Function (PCF), Short Message Service Function (SMSF), Location Management Function (LMF), Core Charging Function (CHF), Network Exposure Function (NEF), Application Function (AF), Access and Mobility Function (AMF), Session Management Function (SMF), Network Data Analytics Function (NWDAF), User Plane Function (UPF), a virtual network switch, a virtual network router and a virtual firewall.

35. A first node (501; 700; 800) as claimed in any of claims 21-34, wherein the first node (501; 700; 800) is further configured to: one or both of: (i) remove an existing NFP (403, 404; 503) from the plurality of NFPs (403, 404; 503) if the existing NFP (403, 404; 503) is no longer able to provide the first NF instance; and (ii) add a new NFP (403, 404; 503) to the plurality of NFPs (403, 404; 503) if the new NFP (403, 404; 503) is now able to provide the first NF instance; and update the trained RL model according to the NFPs (403, 404; 503) in the plurality of NFPs (403, 404; 503).

36. A first node (501; 700; 800) as claimed in any of claims 21-35, wherein the received information for the non-terrestrial-based NFPs (403; 503) comprises information indicating any of: relative positions and/or coordinates of an NFP (403; 503), a direction of movement of an NFP (403; 503), and one or more time periods in which the NFP (403; 503) is expected to be available for connection to the first NFC (402; 501).
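For illustration only, the "time periods in which the NFP is expected to be available" could be used to narrow the candidate set before selection, as in the sketch below. The `AvailabilityWindow` type, the helper names and the use of seconds as the time unit are assumptions for the example and are not defined by the claims.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityWindow:
    """Hypothetical half-open interval [start_s, end_s) in seconds."""
    start_s: float
    end_s: float


def available_at(windows, t_s):
    # An NFP is reachable if the time falls inside any of its windows.
    return any(w.start_s <= t_s < w.end_s for w in windows)


def reachable_nfps(windows_by_nfp, t_s):
    # Keep the NFPs whose expected availability covers time t_s.
    return [nfp_id for nfp_id, windows in windows_by_nfp.items()
            if available_at(windows, t_s)]


windows_by_nfp = {
    # A satellite-hosted NFP visible during two passes (assumed values).
    "sat-1": [AvailabilityWindow(0.0, 600.0), AvailabilityWindow(5400.0, 6000.0)],
    # A terrestrial NFP treated as always available.
    "ground-1": [AvailabilityWindow(0.0, float("inf"))],
}
```

Here `reachable_nfps(windows_by_nfp, 300.0)` keeps both NFPs, while `reachable_nfps(windows_by_nfp, 1000.0)` keeps only the terrestrial one, because the satellite is between passes.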

37. A first node (501; 700; 800) as claimed in any of claims 21-36, wherein the one or more non-terrestrial-based NFPs (403; 503) have intermittent availability for providing a reliable NF instance to the NFC (402; 501), and wherein the one or more terrestrial-based NFPs (404; 503) have high availability for providing a reliable NF instance to the NFC (402; 501).

38. A first node (501; 700; 800) as claimed in any of claims 21-37, wherein the first node (501; 700; 800) is the first NFC (402; 501).

39. A first node (501; 700; 800) as claimed in any of claims 21-37, wherein the first node (501; 700; 800) is a Service Communication Proxy, SCP.

40. A first node configured to determine a model for selecting a Network Function Producer, NFP, to provide a first network function, NF, instance to a first Network Function Consumer, NFC, wherein a plurality of NFPs are capable of providing the first NF instance, wherein the plurality of NFPs comprise one or more non-terrestrial-based NFPs and one or more terrestrial-based NFPs, wherein one or more of the NFPs have intermittent availability for providing a reliable NF instance to the first NFC; wherein the first node comprises a processor and a memory, said memory containing instructions executable by said processor whereby said first node is operative to:

(i) receive, from a NF Repository Function, NRF, information on the plurality of NFPs, wherein the information comprises availability information for the plurality of NFPs;

(ii) select, based on the received information, a first candidate NFP to provide the first NF instance;

(iii) establish a connection to the first candidate NFP, and use the first NF instance provided by the first candidate NFP;

(iv) monitor one or more network availability performance metrics for the first NF instance provided by the first candidate NFP, wherein the network availability performance metrics relate to and/or are affected by the availability of a candidate NFP to provide a reliable NF instance;

(v) calculate a reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP;

(vi) train a reinforcement learning, RL, model that is to be used to identify an NFP in the plurality of NFPs to provide the first NF instance, wherein the RL model is trained using information on the candidate NFP and the calculated reward; and

(vii) repeat operations (ii)-(v) for one or more further selected candidate NFPs to calculate respective rewards, and train the RL model using information on the one or more further candidate NFPs and the respective calculated rewards.

41. A first node as claimed in claim 40, wherein the first node is further operative to: when the first NFC requires the first NF instance to be provided, use the trained RL model to identify one of the plurality of NFPs to provide the first NF instance.

42. A first node as claimed in claim 41, wherein the first node is further operative to: establish a connection to the identified NFP and use the identified NFP to provide the first NF instance.

43. A first node as claimed in claim 42, wherein the first node is further operative to: perform operations (iv) and (v) for the first NF instance provided by the identified NFP; and update the trained RL model based on the reward calculated for the first NF instance provided by the identified NFP.

44. A first node as claimed in any of claims 40-43, wherein the first node is operative to select a first candidate NFP by randomly selecting the first candidate NFP from the plurality of NFPs.

45. A first node as claimed in any of claims 40-43, wherein the first node is operative to select a first candidate NFP by selecting the first candidate NFP as the NFP in the plurality of NFPs having a highest priority flag in the received information.

46. A first node as claimed in any of claims 40-45, wherein the first node is operative to calculate the reward based on values of the one or more network availability performance metrics for the first NF instance provided by the first candidate NFP and a cost associated with using the first candidate NFP to provide the first NF instance to the first NFC.

47. A first node as claimed in claim 46, wherein the cost associated with using the first candidate NFP relates to a financial cost to an operator of the NFC and/or to an energy cost of using the NFP to provide the first NF instance.

48. A first node as claimed in any of claims 40-47, wherein the first node is further operative to: subscribe to the NRF for information on NFPs; wherein the information on the plurality of NFPs received in operation (i) is received as a result of the subscription.

49. A first node as claimed in any of claims 40-48, wherein the first node is further operative to: receive, from a database, further training data relating to the plurality of NFPs and respective rewards determined by other NFCs; and wherein operation (vii) comprises training the RL model using the information on the candidate NFPs, the calculated rewards, and the received further training data.

50. A first node as claimed in any of claims 40-49, wherein the information received from the NRF comprises, for each of the plurality of NFPs, information indicating any one or more of: an identifier for the NFP, types of NF instances provided by the NFP, capacity of the NFP, load of the NFP, and previous values of network availability performance metrics for the NFP.

51. A first node as claimed in any of claims 40-50, wherein the one or more network availability performance metrics comprise any one or more of: latency, jitter, packet drop rate, and capacity of the NFP to provide NF instances.

52. A first node as claimed in any of claims 40-51, wherein the RL model is built using deep learning, value learning or policy learning.

53. A first node as claimed in any of claims 40-52, wherein the NFP and/or the NFC is any of: Authentication Server Function (AUSF), Unified Data Repository (UDR), Unified Data Management (UDM), Policy Control Function (PCF), Short Message Service Function (SMSF), Location Management Function (LMF), Core Charging Function (CHF), Network Exposure Function (NEF), Application Function (AF), Access and Mobility Function (AMF), Session Management Function (SMF), Network Data Analytics Function (NWDAF), User Plane Function (UPF), a virtual network switch, a virtual network router and a virtual firewall.

54. A first node as claimed in any of claims 40-53, wherein the first node is further operative to: one or both of: (i) remove an existing NFP from the plurality of NFPs if the existing NFP is no longer able to provide the first NF instance; and (ii) add a new NFP to the plurality of NFPs if the new NFP is now able to provide the first NF instance; and update the trained RL model according to the NFPs in the plurality of NFPs.

55. A first node as claimed in any of claims 40-54, wherein the received information for the non-terrestrial-based NFPs comprises information indicating any of: relative positions and/or coordinates of an NFP, a direction of movement of an NFP, and one or more time periods in which the NFP is expected to be available for connection to the first NFC.

56. A first node as claimed in any of claims 40-55, wherein the one or more non-terrestrial-based NFPs have intermittent availability for providing a reliable NF instance to the NFC, and wherein the one or more terrestrial-based NFPs have high availability for providing a reliable NF instance to the NFC.

57. A first node as claimed in any of claims 40-56, wherein the first node is the first NFC.

58. A first node as claimed in any of claims 40-56, wherein the first node is a Service Communication Proxy, SCP.