WO2023227349A1 - Gestion de données destinées à être utilisées dans l'apprentissage d'un modèle - Google Patents

Gestion de données destinées à être utilisées dans l'apprentissage d'un modèle

Info

Publication number
WO2023227349A1
WO2023227349A1 (application no. PCT/EP2023/061912)
Authority
WO
WIPO (PCT)
Prior art keywords
training data
network node
metadata
training
information related
Prior art date
Application number
PCT/EP2023/061912
Other languages
English (en)
Inventor
Pablo SOLDATI
Euhanna GHADIMI
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ)
Publication of WO2023227349A1

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08Configuration management of networks or network elements
    • H04L41/0803Configuration setting
    • H04L41/0806Configuration setting for initial configuration or provisioning, e.g. plug-and-play
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/06Generation of reports
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/08Testing, supervising or monitoring using real traffic
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W24/00Supervisory, monitoring or testing arrangements
    • H04W24/10Scheduling measurement reports ; Arrangements for measurement reports

Definitions

  • [001] Disclosed are embodiments related to managing data for use in training a model (e.g., a neural network or other model).
  • FIG. 1 illustrates the Functional Framework for RAN Intelligence.
  • the framework includes the following functions: 1) a data collection function; 2) a model training function; 3) a model inference function; and 4) an actor function, or Actor.
  • the data collection function provides training data (e.g., a set of training data samples - i.e., one or more training data samples) to the model training function.
  • Training data is data that is used by the model training function to train a model (e.g., a neural network or other model), for example by updating the model's parameters (e.g., neural network weights).
  • the function approximated by the model is the Q-function, which assigns a value to a state-action pair.
  • the Q-function (hence the ML model) determines the behavior (or policy) of the reinforcement learning (RL) agent.
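  • For orientation, the standard textbook definition of the action-value (Q) function and of the greedy policy it induces is reproduced below; this is general reinforcement-learning background and not a formula stated in the application itself.

```latex
% Standard action-value function and induced greedy policy
% (textbook reinforcement-learning background, not taken from the application).
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\!\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \,\middle|\, S_t = s,\ A_t = a\right],
\qquad \pi(s) = \arg\max_{a} Q(s, a)
```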
  • the data collection function also provides inference data to the model inference function, which uses the inference data to produce an output (a.k.a., an inference).
  • ML model specific data preparation may also be carried out in the data collection function.
  • Examples of inference and training data may include measurements from user equipments (UEs) or different network entities, feedback from the Actor, and output from the model inference function.
  • the model training function performs the ML model training, validation, and testing which may generate model performance metrics as part of the model testing procedure.
  • the model training function is also responsible for data preparation (e.g. data pre-processing and cleaning, formatting, and transformation) based on training data delivered by a data collection function, if required.
  • the model training function deploys a trained, validated and tested model (e.g., a model that parameterizes or approximates at least one of a policy function, a value function and a Q-function in a deep reinforcement learning environment) to the model inference function or delivers an updated model to the model inference function.
  • the model inference function provides model inference output (e.g. predictions or decisions).
  • the model inference function may provide model performance feedback to the model training function when applicable.
  • the model inference function is also responsible for data preparation (e.g. data pre-processing and cleaning, formatting, and transformation) based on inference data delivered by a data collection function, if required.
  • the model inference function may provide model performance feedback information to the model training function, which uses this feedback information for monitoring the performance of the model.
  • the actor is a function that receives the output from the model inference function and triggers or performs corresponding actions.
  • the Actor may trigger actions directed to other entities or to itself.
  • the actions may generate feedback information, provided to the data collection function, that may be needed to derive training or inference data.
  • 3GPP Technical Report (TR) 37.817 states:
  • AI/ML Model Training is located in the OAM and AI/ML Model Inference is located in the gNB [5G base station].
  • AI/ML Model Training and AI/ML Model Inference are both located in the gNB. Note: gNB is also allowed to continue model training based on model trained in the OAM.
  • AI/ML Model Training is located in the OAM and AI/ML Model Inference is located in the gNB-CU.
  • AI/ML Model Training and Model Inference are both located in the gNB-CU.
  • TR 37.817 states:
  • the AI/ML Model Training function is deployed in the OAM, while the model inference function resides within the RAN node.
  • Both the AI/ML Model Training function and the AI/ML Model Inference function reside within the RAN node.
  • AI/ML Model Training is located in CU-CP or OAM and AI/ML Model Inference function is located in CU-CP.
  • gNB is also allowed to continue model training based on model trained in the OAM.
  • TR 37.817 states:
  • AI/ML Model Training is located in the OAM and AI/ML Model Inference is located in the gNB.
  • AI/ML Model Training and AI/ML Model Inference are both located in the gNB.
  • AI/ML Model Training is located in the OAM and AI/ML Model Inference is located in the gNB-CU.
  • AI/ML Model Training and Model Inference are both located in the gNB-CU.
  • gNB is also allowed to continue model training based on model trained in the OAM.
  • Tdoc R3-215244 proposes to introduce a model management function in the Functional Framework for RAN Intelligence, as shown in FIG. 2.
  • Tdoc R3-215244 states:
  • Model deployment/update should be decided by model management instead of model training.
  • the model management may also host a model repository.
  • the model deployment/update should be performed by model management.
  • Model performance monitoring is a key function to assist and control model inference.
  • the model performance feedback from model inference should first be sent to model management. If the performance is not ideal, the model management may decide to fall back to a traditional algorithm or to change/update the model.
  • the model training should be also controlled by model management.
  • the model management function may be taken by either OAM or CU or other network entities depending on the use cases. Clearly defining a model management function is useful for future signalling design and analysis.
  • Model management function supports the following roles: i) Requesting model training and receiving the model training result; ii) Model deployment/updates for inference; iii) Model performance monitoring, including receiving performance feedback from model inference and taking necessary action, e.g. keep the model, fall back to a traditional algorithm, change or update the model; iv) Model storage.
  • the main objective of model training is to produce a model (e.g., neural network that parameterizes or approximates at least one of a policy function, a value function and a Q-function) that can generalize to conditions and situations not directly experienced in the training data (i.e., a model that performs well when used with inference data that differs from the training data used in the training process).
  • This process is also known as a training process.
  • each "rollout worker" (i.e., a function that combines the functionality of the model inference function and the Actor function) receives a model update from a Model Training function.
  • the rollout worker uses the received model to interact with an external environment by selecting actions and applying the actions to the environment.
  • the rollout worker can collect experience samples that can be used for further training and improving the model.
  • an experience sample is a tuple that comprises: i) an observation (e.g., state vector) for time step t (denoted St), ii) an action (At) selected based on St, iii) an observation for time step t+1 (denoted St+1), and iv) a reward value Rt based on St and St+1.
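  • A minimal sketch of such an experience tuple is given below, assuming Python and illustrative field names; the application does not prescribe any particular encoding.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ExperienceSample:
    """One experience tuple (S_t, A_t, S_{t+1}, R_t) as described above."""
    state: Sequence[float]       # observation (e.g., state vector) at time step t
    action: int                  # action A_t selected based on the state
    next_state: Sequence[float]  # observation at time step t+1
    reward: float                # reward value R_t based on S_t and S_{t+1}
```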
  • Some techniques provide a shared storage memory, also known as “replay buffer” or “experience buffer,” in which the rollout workers store the experience samples (e.g., at each time step, the rollout worker generates and stores an experience in the replay buffer).
  • the Model Trainer function can then filter experiences from the replay buffer to train/update the model (e.g., a new set of weights of a neural network), which is then provided to the distributed rollout workers.
  • Parallel and distributed experience sample collection allows multiple versions of a model to be evaluated in parallel and a new model to be produced quickly. It also allows for improved diversity in the collected information, as different rollout workers can be tasked to test the model against different versions of the environment. This improves the quality of the collected experiences, which in turn enables: producing a model that better generalizes against conditions (e.g., events) unseen during the training process; improving the speed of learning, because updates of the model can be provided more frequently due to the high throughput of the training data generation; and improving learning efficiency (i.e., the improved data diversity provided by parallel and distributed rollout workers enables production of a better model for a given amount of experience samples compared to the case where a single rollout worker is used). Using these techniques in a RAN could achieve performance that would otherwise not be possible.
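  • A minimal sketch of the shared replay buffer described above, assuming Python; the class and method names are illustrative, and the uniform sampling shown is only one possible strategy.

```python
import random
from collections import deque


class ReplayBuffer:
    """Shared storage in which distributed rollout workers deposit experience samples."""

    def __init__(self, capacity: int = 100_000):
        # Oldest samples are evicted once the capacity is reached.
        self._buffer = deque(maxlen=capacity)

    def store(self, sample) -> None:
        """Store one experience sample (e.g., an (S_t, A_t, S_{t+1}, R_t) tuple)."""
        self._buffer.append(sample)

    def sample_batch(self, batch_size: int) -> list:
        """Draw a mini-batch for the model training function.

        Uniform random sampling is shown; a trainer could instead filter
        experiences according to other criteria (e.g., metadata).
        """
        return random.sample(list(self._buffer), min(batch_size, len(self._buffer)))
```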
  • a main objective of training a model is to produce a model that can generalize to conditions and situations not directly experienced in the data set that was used to train the model. That is, a main objective is to produce a model that performs well when used with input data that differs from the training data used in the training process.
  • the training process requires training data that provides as much diverse information as possible. For example, training a model with training data collected only in low load conditions or only with low interference will produce a model that works well in low load or low interference conditions, respectively, but the model will not work well in high load or high interference conditions.
  • Training a model with training data that comprises a mixture of training data samples collected in a diverse set of conditions, such as high/low load, high/low interference, etc., will provide better model generalization.
  • training a model to properly generalize further requires carefully balancing the different types of training data used during the training process, so that the trained model will not fit certain conditions better than others.
  • Existing technology does not provide a solution to these problems.
  • a method performed by a first network node in a communications network.
  • the method includes transmitting to a second network node a data collection formatting message comprising data formatting configuration information for configuring the second network node to provide a training data report comprising a training metadata payload comprising metadata associated with a set of one or more training data elements.
  • the method also includes receiving from the second network node a data collection report message comprising a training data report comprising a first training metadata payload comprising first metadata associated with a first set of one or more training data elements.
  • a method performed by a second network node includes transmitting to a network node a data collection report message comprising a training data report formatted according to data formatting configuration information.
  • a computer program comprising instructions which when executed by processing circuitry of a network node causes the network node to perform any of the methods disclosed herein.
  • a carrier containing the computer program wherein the carrier is one of an electronic signal, an optical signal, a radio signal, and a computer readable storage medium.
  • a network node that is configured to perform the methods disclosed herein.
  • the network node may include memory and processing circuitry coupled to the memory.
  • An advantage of the embodiments disclosed herein is that they enable an improvement in training of models by enabling efficient training data collection and management from multiple sources.
  • the ability to discriminate training data based on metadata enables producing a model for a certain task that can better generalize across different conditions and environments.
  • This is instrumental for boosting the performance of radio communication systems, where a function (a.k.a., process) that uses the model can be deployed to operate in multiple radio cells of the network or in multiple user devices, each experiencing a different radio environment and conditions.
  • Training data collected for a model from multiple network nodes in a radio communication network may consist of millions to trillions of training data elements being gathered across the network. It is therefore beneficial to discriminate which training data elements are more informative, whether there is strong correlation among the training data elements collected by different network entities, etc. This can reduce the resources necessary to store and manage the massive amount of training data elements produced in a radio access network, thereby reducing operational costs.
  • Another advantage of the embodiments is that they enable better generalization of models for RAN operations, where generalization can be achieved with respect to different conditions or environments.
  • the metadata that accompanies training data allows one to efficiently manage, filter, or categorize the training data samples included in the training data (or derived therefrom) according to different criteria, thereby enabling the production of a model that can generalize to and fit (i.e., operate well in) different conditions and environments.
  • the method allows efficient training of: 1) global models, such as a model that can generalize to and fit network-wide conditions (e.g., the same algorithm suitable to operate in all nodes or radio cells or user devices in the network); 2) regional models, such as different models that can generalize to and fit specific regions of the network (e.g., the same model suitable to operate in all nodes or radio cells or user devices of a region of the network); and 3) local models, such as different models that can generalize to and fit the specific radio environment observed in a particular network node or radio cell or user device.
  • Another advantage of the embodiments is that they enable improved diversity in the training data used for training a particular model.
  • metadata corresponding to particular training data can be used to efficiently manage, filter, categorize, and select training data elements from the training data to improve, update, or optimize a model to better generalize or fit certain conditions or events, thereby improving the system performance.
  • the embodiments can be used to ensure information diversity of training data used to train models for applications to radio access networks, communication networks, etc.
  • FIG. 1 illustrates a Functional Framework for RAN Intelligence.
  • FIG. 2 illustrates the introduction of a model management function in the Functional Framework for RAN Intelligence.
  • FIG. 3 illustrates a training architecture exploiting distributed collection of training data.
  • FIG. 4 is a message flow diagram according to an embodiment.
  • FIG. 5 illustrates a training report according to an embodiment.
  • FIG. 6 illustrates a training report according to an embodiment.
  • FIG. 7 illustrates a training report according to an embodiment.
  • FIG. 8 illustrates a training report according to an embodiment.
  • FIG. 9 illustrates a training report according to an embodiment.
  • FIG. 10 is a message flow diagram according to an embodiment.
  • FIG. 11 is a flowchart illustrating a process according to an embodiment.
  • FIG. 12 is a flowchart illustrating a process according to an embodiment.
  • FIG. 13 is a block diagram of a network node according to an embodiment.
  • a "network node" can be a RAN node, an OAM, a Core Network node, an SMO, a Network Management System (NMS), a Non-Real Time RAN Intelligent Controller (Non-RT RIC), a Real-Time RAN Intelligent Controller (RT-RIC), a gNB, eNB, en-gNB, ng-eNB, gNB-CU, gNB-CU-CP, gNB-CU-UP, eNB-CU, eNB-CU-CP, eNB-CU-UP, IAB-node, IAB-donor-DU, IAB-donor-CU, IAB-DU, IAB-MT, O-CU, O-CU-CP, O-CU-UP, O-DU, O-RU, O-eNB, or a UE.
  • a network node may be a physical node or a function or logical entity of any kind, e.g. a software entity implemented in a data center or a cloud, e.g. using one or more virtual machines, and two network nodes may well be implemented as logical software entities in the same data center or cloud.
  • Model training, model optimizing, model optimization, and model updating are herein used interchangeably with the same meaning unless explicitly specified otherwise.
  • Model changing, model modifying, or similar terms are herein used interchangeably with the same meaning unless explicitly specified otherwise. In particular, they refer to the fact that the type, structure, parameters, or connectivity of a model may have changed compared to a previous format/configuration of the model.
  • The terms AI model, ML model, AI/ML model, and AIML model are herein used interchangeably.
  • Data collection refers to a process of collecting data for the purpose of model training, data analytics, and/or inference.
  • AI/ML models may include supervised learning algorithms, deep learning algorithms, reinforcement learning type of algorithms (such as DQN, A2C, A3C, etc.), contextual multi-armed bandit algorithms, autoregression algorithms, etc., or combinations thereof.
  • Such algorithms may exploit functional approximation models, hereafter referred to as AI/ML models, such as neural networks (e.g. feedforward neural networks, deep neural networks, recurrent neural networks, convolutional neural networks, etc.).
  • reinforcement learning algorithms may include deep reinforcement learning (such as deep Q-network (DQN), proximal policy optimization (PPO), double Q-learning), actor-critic algorithms (such as advantage actor-critic algorithms, e.g. A2C or A3C, actor-critic with experience replay, etc.), policy gradient algorithms, off-policy learning algorithms, etc.
  • This disclosure provides a method for a first network node 402 (see FIG. 4) to configure (e.g., instruct, indicate, require, etc.) a second network node 404 to provide a training data report associated to a model (i.e., for potential use in training the model) according to a data formatting configuration provided by first network node 402.
  • the data formatting configuration may configure second network node 404 to produce a training data report comprising: 1) at least a first training data payload comprising a first set of training data elements (e.g., one or more IEs wherein each IE contains at least one training data element) and 2) a metadata payload comprising first metadata corresponding to the first training data payload (e.g., the first set of training data elements).
  • a "training data element” may be: a training data sample (e.g., an experience sample), one or more components of a training data sample, information that can be used to generate a training data sample, or information that can be used to generate one or more components of a training data sample.
  • a training data element may comprise a set of measurement values and an average of these measurement values can be calculated, wherein the calculated average value is a training data sample.
  • a training data sample may consist of N values (e.g., X1, X2, ..., XN) and a training data element may consist of a subset of these values (e.g., the training data element may consist of X1, or consist of X1 and X2).
  • a training data sample may describe a state used by an algorithm, which could consist of multiple measurements (e.g., Reference Signal Received Power (RSRP), Channel Quality Indicator (CQI), rank, signal-to-noise ratio (SNR), etc.), and a training data element may consist of just one of the components (e.g., the SNR value) of the training data sample.
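  • As an illustration of the distinction between a training data sample and a training data element, a Python sketch with hypothetical field names follows; it is not a format defined by the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingDataSample:
    """A full training data sample, e.g., a state vector of UE measurements."""
    rsrp_dbm: float   # Reference Signal Received Power
    cqi: int          # Channel Quality Indicator
    rank: int         # reported rank
    snr_db: float     # signal-to-noise ratio


@dataclass
class TrainingDataElement:
    """A training data element: one or more components of a training data sample."""
    snr_db: Optional[float] = None    # e.g., only the SNR component is reported
    rsrp_dbm: Optional[float] = None  # further components may or may not be present
```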
  • the data formatting configuration may configure second network node 404 to format a training data report (or "report" for short) such that each distinct training data payload (e.g., distinct set of training data elements) in the report is associated with corresponding metadata for that training data payload. Therefore, second network node 404 may report one or a group of training data payloads, each comprising training data elements and associated metadata.
  • the data formatting configuration may configure second network node 404 to format a specified group of training data payloads or all of the training data payloads with a single metadata payload. Therefore, second network node 404 may report a group of training data elements with the same metadata.
  • the metadata payload may comprise one or more metadata, such as tags (a.k.a., labels), associated to one or more data payloads, wherein the metadata can be used to manage the training data elements (such as filter, select, or categorize training data elements) associated to a model.
  • metadata is particularly relevant when training data is collected by a plurality of network nodes or user devices in association to the same model and stored in a shared storage memory/entity.
  • Examples of metadata that may be contained in a metadata payload corresponding to a training data payload may include one or more of:
  • the metadata may comprise one or more indications indicating which information elements (IEs) of the data payload can additionally be used as metadata.
  • FIG. 4 is a message flow diagram illustrating steps, according to an embodiment, performed by a first network node 402 and a second network node 404.
  • first network node 402 and/or second network node 404 can be different RAN nodes (e.g. two gNBs, or two eNBs, or two en-gNBs, or two ng-eNBs) or different core network nodes, or one can be a RAN node or core network node while the other is a UE.
  • first network node 402 and/or second network node 404 can be different nodes/functions of a same RAN node (e.g. a gNB-CU-CP and a gNB-DU, or a gNB-CU-CP and a gNB-CU-UP).
  • first network node 402 can be a first RAN node (e.g. a gNB, an eNB, an en-gNB, or an ng-eNB) and second network node 404 can be a component/node/function of a second RAN node (e.g. a gNB-CU-CP).
  • first network node 402 and/or second network node 404 can pertain to the same Radio Access Technology (e.g. E-UTRAN, NG-RAN, WiFi, etc.) or to different Radio Access Technologies (e.g. one to NR and the other to E-UTRAN or WiFi).
  • first network node 402 and/or second network node 404 can pertain to the same RAN system (e.g. E-UTRAN, NG-RAN, WiFi, etc.) or to different RAN systems (e.g. one to NG-RAN and the other to E-UTRAN).
  • first network node 402 and second network node 404 may be connected via a direct signaling connection (e.g. two gNBs via XnAP), or an indirect signaling connection (e.g. an eNB and a gNB via S1AP, NGAP and one or more Core Network nodes, e.g. an MME and an AMF).
  • first network node 402 can be a management system, such as the OAM system or the SMO, while second network node 404 can consist of a RAN node or function.
  • first network node 402 can be a RAN node or function while second network node 404 can be a management system, such as the OAM or the SMO.
  • first network node 402 can be a core network node or function, such as a 5GC function, while second network node 404 can consist of a RAN node or function.
  • first network node 402 can be a RAN node or function while second network node 404 can be a core network node or function, such as a 5GC function.
  • first network node 402 transmits a data collection formatting message m450 and second network node 404 receives message m450.
  • the message comprises data formatting configuration information for configuring (e.g., indicating, instructing, or requesting) second network node 404 to produce a training data report comprising: i) a training data payload (a.k.a., "data payload" for short) comprising one or more IEs containing training data elements and ii) a training metadata payload (a.k.a., "metadata payload" for short) comprising metadata associated with the data payload.
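  • A sketch of what the data formatting configuration information carried by message m450 could contain is given below; the field names are assumptions made for illustration, since the application does not define a concrete encoding.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataFormattingConfiguration:
    """Illustrative content of the data collection formatting message (m450)."""
    requested_training_data_ies: List[str]            # IEs to include as training data
    requested_metadata: List[str]                      # metadata (e.g., tags) to attach
    metadata_per_payload: bool = True                  # dedicated metadata per data payload (cf. FIG. 6)
    shared_metadata: bool = False                      # one metadata payload shared by all payloads (cf. FIG. 7)
    data_collection_process_id: Optional[str] = None   # data collection process the configuration refers to
    report_to_node_id: Optional[str] = None            # optional third node that should receive the report
```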
  • report 500 comprises at least a first data payload and at least a first metadata payload corresponding to the first data payload.
  • second network node 404 transmits a data collection report message m452 comprising the training data report comprising the data payload and the metadata payload.
  • first network node 402 receives the message m452, but in other embodiments another network node may receive message m452 (e.g., rather than transmitting message m452 to the first network node, the second network node transmits message m452 to a third network node (not shown)).
  • the data collection formatting message m450 may indicate, instruct, or request that each training data payload included in a training data report is associated with dedicated metadata. This is illustrated in FIG. 6, which shows a report 600 comprising i) a first data payload and a corresponding first metadata payload and ii) a second data payload and a corresponding second metadata payload.
  • the data collection formatting message may indicate, instruct, or request second network node 404 to produce a report that comprises multiple data payloads and a single metadata payload corresponding to all of the data payloads (i.e., the metadata is shared among the data payloads). This would enable reducing the information overhead due to the metadata.
  • each data payload reported by second network node 404 is associated with the same metadata.
  • FIG. 7 shows a non-limiting illustration of this embodiment.
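  • The three report formats of FIGS. 5-7 can be summarized with the following illustrative Python structures; the types and field names are assumptions, not formats specified by the application.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TrainingDataPayload:
    """One or more IEs carrying training data elements."""
    ies: Dict[str, object]


@dataclass
class MetadataPayload:
    """Tags/labels describing how and where the associated training data was collected."""
    tags: Dict[str, str]


@dataclass
class TrainingDataReport:
    """Report formats sketched in FIGS. 5-7.

    - FIG. 5: a single data payload with a single metadata payload
    - FIG. 6: per-payload metadata (per_payload_metadata aligned with payloads)
    - FIG. 7: one metadata payload shared by all data payloads
    """
    payloads: List[TrainingDataPayload]
    per_payload_metadata: Optional[List[MetadataPayload]] = None
    shared_metadata: Optional[MetadataPayload] = None
```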
  • the training data payload may consist of one or more IEs containing training data (e.g., experience samples) used to train or update a model deployed at second network node 404.
  • the IEs of the training data payload may represent "input features" (i.e., input information) used or required by the model (e.g., to perform inference and provide an output).
  • IEs of the training data payload may additionally include an action identifier (such as an action identity or an action index) or an action value and/or the output produced by the model, and a performance metric associated to the action (action identifier or action value) or to the output produced by the model, often referred to as a "reward."
  • the exact types of IEs comprising the training data payload depend on the specific design of the model, and the training data payload may consist of UE measurements and network measurements.
  • the training data payload may comprise two groups of IEs representing "input features": one set containing the information prior to taking the indicated action, and one set containing information measured or determined upon taking the indicated action. At least one IE, such as a binary flag or a special type of information (e.g., a "null" value), could be used to separate and identify the two sets of information.
  • first network node 402 configures (e.g., instructs or requests) a second network node to generate a training data report that includes one or more training data payloads and one or more metadata payloads (see, e.g., training data report 500 illustrated in FIG. 5).
  • second network node 404 may add or append a metadata payload to a training data payload.
  • the requested metadata payload may comprise metadata indicating, for instance, the conditions and/or the environment (such as radio environment and deployment environment) under which the training data in the corresponding training data payload has been collected, as well as information related to the network entity involved in the collection of the training data.
  • the metadata may comprise a set of data labels (a.k.a., "tags"), which enable proper and efficient management of training data collected by multiple network entities (e.g., access network nodes and UEs). For instance, when storing a training data element in a database, metadata corresponding to the training data element may be stored with the training data element, thereby enabling the later retrieval from the database of training data elements based on the metadata stored therewith. Therefore, in some examples, the individual metadata of a metadata payload may comprise control tags that can be used to manage training data associated to a model. As such, the metadata corresponding to a training data payload may allow a node to filter, select, categorize, or present training data of a large training data set. Such metadata is particularly relevant when training data is collected by a plurality of network nodes or user devices in association to the same model and stored in a shared storage memory/entity.
  • FIG. 8 illustrates a training data report format consisting of a training data payload and a corresponding metadata payload consisting of one or more control tags.
  • first network node 402 (or a third network node) could use information identifying a radio cell to select/filter training data with information gathered in a specific radio cell. For example, if a database stores a large number of training data elements and further stores, for each training data element, metadata indicating the radio cell to which the training data element pertains, then one can retrieve from the database only those samples pertaining to a particular radio cell. In this way, first network node 402 could train and optimize the model to generalize to and fit the data distribution produced by the intended radio cell. This would enable training a model that operates well in the environment experienced in a specific radio cell. This could be beneficial in the case of radio cells with very specific or critical surroundings.
  • first network node 402 (or a third network node) could use information identifying radio cells to select/filter a training data set with information gathered in a group of specific radio cells, so as to train a model for a radio feature intended to operate in such radio cells.
  • first network node 402 could train and optimize the model to generalize and fit the data distribution produced by the group of intended radio cells.
  • The advantage in this case is that a single model could be optimized to operate well in multiple radio cells, as illustrated by the filtering sketch below.
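  • A minimal sketch of metadata-based filtering, assuming Python and a hypothetical tag name "cell_id"; the record layout is illustrative only.

```python
def select_by_cell(records, cell_id):
    """Keep only training data elements whose metadata tags a given radio cell.

    `records` is assumed to be an iterable of (training_data_element, metadata_tags)
    pairs, where metadata_tags is a dict such as {"cell_id": "cell-17", "region": "north"}.
    """
    return [element for element, tags in records if tags.get("cell_id") == cell_id]


# Example: keep only the samples collected in cell "cell-17"; a similar helper
# keyed on a "region" tag could assemble data for a regional model.
records = [({"snr_db": 12.3}, {"cell_id": "cell-17"}),
           ({"snr_db": 4.1}, {"cell_id": "cell-08"})]
print(select_by_cell(records, "cell-17"))  # -> [{'snr_db': 12.3}]
```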
  • data collection formatting message m450 may indicate a data collection process, such as a data collection process identity, to which the data formatting configuration refers or is associated.
  • the metadata included in a metadata payload may include one or more of:
  • the metadata payload may comprise one or more indications of which IEs of the training data payload can be used as metadata.
  • the metadata payload may comprise a bitmap comprising one bit associated with each IE of the training data payload, wherein the value of this bit determines whether the IE contains metadata.
  • the bit may indicate whether or not the specific training data IE corresponding to this bit should be used as metadata (e.g., as a tag or label).
  • FIG. 9 illustrates an example where the metadata payload consists of a bitmap, with one binary entry associated with one IE in the training data payload, e.g., with association by indexing.
  • a value equal to one (1) in the metadata payload bitmap indicates that the associated IE in the training data payload is also a metadata IE.
  • in this example, the IEs indexed by i1 and i2 in the training data payload are indicated as also being metadata IEs.
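  • A small sketch of the bitmap association of FIG. 9, assuming Python; the IE names are hypothetical.

```python
def metadata_ies_from_bitmap(ie_names, bitmap):
    """Return the IEs of a training data payload flagged as metadata by the bitmap.

    `bitmap` is an iterable of 0/1 values, index-aligned with `ie_names`
    (the association by indexing described for FIG. 9).
    """
    return [name for name, bit in zip(ie_names, bitmap) if bit == 1]


# Example: the IEs at indices 1 and 2 are also usable as metadata.
ies = ["rsrp", "cell_id", "ue_speed", "cqi"]
print(metadata_ies_from_bitmap(ies, [0, 1, 1, 0]))  # -> ['cell_id', 'ue_speed']
```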
  • the metadata payload may comprise or consist of: one or more metadata or control IEs, such as tags or labels; and one or more indications of which IEs of the data payload can be used as metadata or control information, such as a bitmap.
  • second network node 404 may generate a data collection report message comprising one training data report formatted according to a data collection formatting message (e.g., message m450) provided by first network node 402. Therefore, the data collection report message may comprise a training data report that comprises: 1) a training data payload comprising a set of one or more training data elements and 2) a metadata payload comprising metadata associated with one or more of the one or more training data elements.
  • the metadata payload may comprise one or more of:
  • an indication of a data collection process, such as a data collection process identity, to which the corresponding data refers.
  • this information could be provided by the data collection formatting message itself and should be appended to the corresponding information;
  • second network node 404 may additionally transmit to first network node 402 a data collection formatting response message m1050, where the message m1050 indicates whether or not second network node 404 can provide all or part of the requested data according to all or part of the requested data formatting configuration.
  • the data collection formatting response message m1050 may report an error message indicating that second network node 404 cannot provide the requested data according to the requested data formatting.
  • the data collection formatting response message may additionally indicate which of the requested data or which of the requested formatting cannot be provided.
  • second network node 404 may not be able to provide one or more of the requested metadata, in which case second network node 404 could indicate which metadata can or cannot be provided.
  • second network node 404 may not be able to provide one or more of the requested training data, in which case second network node 404 could indicate which training data can or cannot be provided.
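  • An illustrative structure for the data collection formatting response message m1050 is sketched below; the field names are assumptions made for the purpose of the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataFormattingResponse:
    """Illustrative content of the data collection formatting response message (m1050)."""
    accepted: bool                                                        # whether the request can be served at all
    unsupported_metadata: List[str] = field(default_factory=list)         # requested metadata that cannot be provided
    unsupported_training_data: List[str] = field(default_factory=list)    # requested training data that cannot be provided
    error_cause: str = ""                                                 # optional error description
```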
  • second network node 404 may generate a training data report associated to a model according to a training data format configuration. For example, in some embodiments, second network node 404 performs a process that includes: 1) receiving a data collection formatting message m450 from a first network node comprising a data formatting configuration indicating, instructing, or requesting second network node 404 to provide a training data report comprising: i) a training data payload comprising a set of one or more training data elements and ii) a metadata payload comprising metadata corresponding to the set of training data elements; and 2) transmitting to first network node 402 or to a third network node a data collection report message comprising the training data report.
  • second network node 404 may transmit a data collection formatting response message m1050 to first network node 402 indicating whether second network node 404 can provide all or part of the requested data according to all or part of the requested data formatting configuration.
  • the third network node may be indicated by the data collection formatting message.
  • FIG. 13 is a block diagram of network node 1300, according to some embodiments, that can be used to implement first network node 402 or second network node 404.
  • network node 1300 may comprise: processing circuitry (PC) 1302, which may include one or more processors (P) 1355 (e.g., one or more general purpose microprocessors and/or one or more other processors, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like), which processors may be co-located in a single housing or in a single data center or may be geographically distributed (i.e., network node 1300 may be a distributed computing apparatus); and at least one network interface 1348 (e.g., a physical interface or air interface) comprising a transmitter (Tx) 1345 and a receiver (Rx) 1347 for enabling network node 1300 to transmit data to and receive data from other nodes connected to a network 110 (e.g., an Internet Protocol (IP) network).
  • a computer readable storage medium may be provided.
  • CRSM 1342 may store a computer program (CP) 1343 comprising computer readable instructions (CRI) 1344.
  • CRSM 1342 may be a non-transitory computer readable medium, such as magnetic media (e.g., a hard disk), optical media, memory devices (e.g., random access memory, flash memory), and the like.
  • the CRI 1344 of computer program 1343 is configured such that when executed by PC 1302, the CRI causes network node 1300 to perform steps described herein (e.g., steps described herein with reference to the flow charts).
  • network node 1300 may be configured to perform steps described herein without the need for code. That is, for example, PC 1302 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software. According to embodiments, a network node may also be deployed or implemented as a function or logical entity of any kind, e.g. as a software entity implemented in a data center or a cloud, e.g. using one or more virtual machines.
  • a network node may be a RAN node, an OAM, a Core Network node, an SMO, a Network Management System (NMS), a logic function in an Open RAN (O-RAN), a Non-Real Time RAN Intelligent Controller (Non-RT RIC), a Real-Time RAN Intelligent Controller (RT-RIC), a gNB, eNB, en-gNB, ng-eNB, gNB-CU, gNB-CU-CP, gNB-CU-UP, eNB-CU, eNB-CU-CP, eNB-CU-UP, IAB-node, IAB-donor-DU, IAB-donor-CU, IAB-DU, IAB-MT, O-CU, O-CU-CP, O-CU-UP, O-DU, O-RU, O-eNB, or a UE.
  • A1. A method 1100 performed by a first network node 402 in a communications network, the method comprising: transmitting (step s1102) to a second network node 404 a data collection formatting message m450 comprising data formatting configuration information for configuring (e.g., indicating, instructing, or requesting) the second network node 404 to provide a training data report comprising a training metadata payload comprising metadata associated with a set of one or more training data elements (e.g., a set of training data samples or information for generating a set of training data samples); and/or receiving (step s1106) from the second network node 404 a data collection report message m452 comprising a training data report comprising a first training metadata payload comprising first metadata associated with a first set of one or more training data elements.
  • wherein the training data report included in the data collection report message m452 further comprises a first training data payload comprising said first set of one or more training data elements, and said data formatting configuration information configured the second network node 404 to provide the training data report comprising both: i) the first training data payload and ii) said first training metadata payload.
  • A5. The method of embodiment A4, wherein the data collection formatting response message m1050 indicates whether or not the second network node 404 can provide all or part of the data requested according to all or part of the data formatting configuration.
  • A6. The method of any one of embodiments A1-A5, wherein the method comprises transmitting said data collection formatting message m450 to the second network node, the data formatting configuration information included in the message m450 indicates a third network node, and the data formatting configuration information included in the message m450 is for configuring the second network node to transmit the data collection report message to the third network node.
  • A7. The method of any one of embodiments A1-A6, wherein the method comprises receiving from the second network node 404 the data collection report message comprising the training data report, and the training data report comprises: i) a first training data payload comprising a first set of one or more training data elements and ii) a first metadata payload corresponding to the first training data payload, the first metadata payload comprising first metadata corresponding to the first set of training data elements.
  • A8. The method of embodiment A7, wherein the first metadata comprises an indication of a data collection process (e.g., a data collection process identity) associated with the collection of the first set of training data elements.
  • the first metadata comprises information related to at least a particular network node relevant to the first set of training data elements (e.g., information related to a network node that collected or created the first set of training data elements).
  • the information related to the particular network node comprises one or more of: an indication or identity of the particular network node; or information related to the configuration and/or characteristics of the particular network node.
  • A13. The method of any one of embodiments A7-A12, wherein the first metadata comprises information related to at least a particular radio cell relevant to the first set of training data elements (e.g., a radio cell where all or part of the one or more training data elements was produced or collected).
  • the information related to the particular radio cell comprises one or more of: an indication or identity of the radio cell; an indication or identity of a portion of the radio cell, such as the coverage area of a downlink reference signal beam; information related to the configuration and/or characteristics of the radio cell; information characterizing traffic in the radio cell; or information representing key performance indicators (KPIs), for uplink and/or downlink performance, associated with the radio cell.
  • A15. The method of any one of embodiments A7-A10, wherein the first metadata comprises information related to an area or environment surrounding a particular network node or the particular radio cell relevant to the first set of training data elements.
  • the information related to the area or environment comprises one or more of: an indication of at least one area from which the training data elements originated (e.g., one or more of: at least a tracking area or tracking area identifier, at least one RAN-based notification area or RAN-based notification area identifier, at least one public land mobile network (PLMN) identifier), or information related to interfering network nodes and/or cells.
  • A17. The method of any one of embodiments A7-A16, wherein the first metadata comprises information related to at least a first user device relevant to the first set of training data elements (e.g., a user device that produced or collected all or part of the training data elements).
  • the information related to at least the first user device comprises one or more of: the identity or identifier of the first user device, information indicating the number of user devices relevant to the first set of training data elements, information related to the type and/or characteristics of the first user device, information indicating a configuration of the first user device (e.g., energy saving configuration, discontinuous transmission (DTX) configuration, discontinuous reception (DRX) configuration, RRC state (e.g., ACTIVE, INACTIVE, CONNECTED)), information related to the mobility of the first user device (e.g., speed, acceleration, direction, elevation, etc.), or information related to traffic sent to and/or from the first user device.
  • A19. The method of any one of embodiments A7-A18, wherein the first metadata comprises information related to a model to which the first set of training data elements is associated or that was used to collect the first set of training data elements.
  • the information related to the model comprises one or more of: an identity or identifier of a process (a.k.a., function) that uses the model (e.g., a power management process, a link adaptation process, etc.), an identity or identifier of the model, a version identifier of the process, a version identifier of the model, or information related to a configuration or parameters used by the model or process.
  • A21. The method of any one of embodiments A1-A20, wherein the method comprises the step of receiving the data collection report message, the first set of training data elements comprises a first training data element, and the first training data element is: a training data sample (e.g., an experience sample), one or more components of a training data sample, information that can be used to generate a training data sample, or information that can be used to generate one or more components of a training data sample.
  • B1. A method 1200 performed by a second network node 404 in a communications network, the method comprising: transmitting (step s1206) to a network node a data collection report message m452 comprising a training data report comprising a first metadata payload comprising first metadata corresponding to a first set of training data elements.
  • the training data report further comprises a first training data payload comprising said first set of training data elements.
  • transmitting (step s1204) to the first network node a data collection formatting response message m1050 that is responsive to the data collection formatting message m450.
  • [00128] B6. The method of any one of embodiments B3-B5, wherein the data formatting configuration information is for configuring the second network node to provide a training data report comprising: i) a first training data payload comprising a first set of one or more training data elements and ii) a first metadata payload corresponding to the first training data payload, the first metadata payload comprising first metadata corresponding to the first set of training data elements.
  • [00130] B8. The method of any one of embodiments B1-B7, wherein the first metadata comprises an indication of a data collection process (e.g., a data collection process identity) associated with the collection of the first set of training data elements.
  • [00135] B13. The method of any one of embodiments B1-B12, wherein the first metadata comprises information related to at least a particular radio cell relevant to the first set of training data elements (e.g., a radio cell where all or part of the one or more training data is produced or collected).
  • the information related to the particular radio cell comprises one or more of: an indication or identity of the particular radio cell; information related to the configuration and/or characteristics of the particular radio cell; information characterizing traffic in the particular radio cell; or information representing key performance indicators (KPIs), for uplink and/or downlink performance, associated with the particular radio cell.
  • the first metadata comprises: information related to an area or environment surrounding a particular network node or a particular radio cell relevant to the first set of training data elements (e.g., relevant to the network node or cell that originated the first set of training data elements).
  • the information related to the area or environment comprises one or more of: an indication of at least one area from which the training data elements originated (e.g., one or more of: at least a tracking area or tracking area identifier, at least one RAN-based notification area or RAN-based notification area identifier, at least one public land mobile network (PLMN) identifier), or information related to interfering network nodes and/or cells.
  • the first metadata comprises: information related to at least a first user device relevant to the first set of training data elements (e.g., a user device that produced or collected all or part of the corresponding training data).
  • the information related to at least the first user device comprises one or more of: the identity or identifier of the first user device, information indicating the number of user devices relevant to the first set of training data elements, information related to the type and/or characteristics of the first user device, information indicating a configuration of the first user device (e.g., energy saving configuration, discontinuous transmission (DTX) configuration, discontinuous reception (DRX) configuration, RRC state (e.g., ACTIVE, INACTIVE, CONNECTED)), information related to the mobility of the first user device, or information related to traffic sent to and/or from the first user device.
  • the first metadata comprises information related to a model to which the first set of training data elements is associated or that was used to collect the first set of training data elements.
  • the information related to the model comprises one or more of: an identity or identifier of a process that uses the model (e.g., a power management process, a link adaptation process, etc.), an identity or identifier of the model, a version identifier of the process, a version identifier of the model, or information related to a configuration or parameters used by the model or process.
  • the first set of training data elements comprises a first training data element
  • the first training data element is: a training data sample (e.g., an experience sample), one or more components of a training data sample, information that can be used to generate a training data sample, or information that can be used to generate one or more components of a training data sample.
  • A computer program (1343) comprising instructions (1344) which when executed by processing circuitry (1302) of a network node (1300) causes the network node to perform the method of any one of embodiments A1-A21 or B1-B22.
  • D1. A first network node 402, the first network node being configured to perform the method of embodiment A1.
  • A second network node 404, the second network node being configured to perform the method of embodiment B1.
  • As used herein, "transmit to" means "transmit directly or indirectly to." Accordingly, transmitting a message to a node encompasses transmitting the message directly to the node or transmitting the message indirectly to the node such that the message is relayed to the node via one or more intermediate nodes.
  • As used herein, "receive from" means "receive directly or indirectly from." Accordingly, receiving a message from a node encompasses receiving the message directly from the node or receiving the message indirectly from the node such that the message is relayed from the sender to the node via one or more intermediate nodes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

A method performed by a first network node in a communications network is disclosed. The method comprises transmitting, to a second network node, a data collection formatting message comprising data formatting configuration information for configuring the second network node to provide a training data report comprising a training metadata payload comprising metadata associated with a set of one or more training data elements. The method also comprises receiving, from the second network node, a data collection report message comprising a training data report comprising a first training metadata payload comprising first metadata associated with a first set of one or more training data elements.
PCT/EP2023/061912 2022-05-25 2023-05-05 Gestion de données destinées à être utilisées dans l'apprentissage d'un modèle WO2023227349A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202263345617P 2022-05-25 2022-05-25
US63/345,617 2022-05-25

Publications (1)

Publication Number Publication Date
WO2023227349A1 true WO2023227349A1 (fr) 2023-11-30

Family

ID=86378637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2023/061912 WO2023227349A1 (fr) 2022-05-25 2023-05-05 Gestion de données destinées à être utilisées dans l'apprentissage d'un modèle

Country Status (1)

Country Link
WO (1) WO2023227349A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034829A1 (en) * 2017-12-28 2019-01-31 Intel Corporation Filtering training data for models in a data center
WO2020167223A1 (fr) * 2019-02-14 2020-08-20 Telefonaktiebolaget Lm Ericsson (Publ) Collecte de données initiée par ran
WO2022060923A1 (fr) * 2020-09-16 2022-03-24 Intel Corporation Services non en temps réel pour ia/ml

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190034829A1 (en) * 2017-12-28 2019-01-31 Intel Corporation Filtering training data for models in a data center
WO2020167223A1 (fr) * 2019-02-14 2020-08-20 Telefonaktiebolaget Lm Ericsson (Publ) Collecte de données initiée par ran
WO2022060923A1 (fr) * 2020-09-16 2022-03-24 Intel Corporation Services non en temps réel pour ia/ml

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
3GPP TECHNICAL DOCUMENT (TDOC) R3-215244
3GPP TECHNICAL REPORT (TR) 37.817

Similar Documents

Publication Publication Date Title
EP4099635A1 (fr) Procédé et dispositif de sélection de service dans un système de communication sans fil
US20220012645A1 (en) Federated learning in o-ran
US11451452B2 (en) Model update method and apparatus, and system
US20220052925A1 (en) Predicting Network Communication Performance using Federated Learning
EP4161128A1 (fr) Procédé d'optimisation de réseau, serveur, dispositif côté réseau, système, et support de stockage
EP3972339A1 (fr) Prévision et gestion de taux de réussite de transfert utilisant l'apprentissage machine pour réseaux 5g
CN112512058A (zh) 网络优化方法、服务器、客户端设备、网络设备和介质
CN111466103B (zh) 用于网络基线的生成和适配的方法和系统
US10361913B2 (en) Determining whether to include or exclude device data for determining a network communication configuration for a target device
WO2022060923A1 (fr) Services non en temps réel pour ia/ml
CN115486117A (zh) 机器学习辅助操作控制
CN111480318A (zh) 使用主动学习框架控制用于数据分析服务的数据报告
WO2022152515A1 (fr) Appareil et procédé permettant une rétroaction d'analyse
WO2022038760A1 (fr) Dispositif, procédé et programme de prédiction de qualité de communication
CN114765789A (zh) 无线通信网络中数据处理方法和装置
WO2023227349A1 (fr) Gestion de données destinées à être utilisées dans l'apprentissage d'un modèle
Fortuna et al. Software interfaces for control, optimization and update of 5G machine type communication networks
WO2023114017A1 (fr) Solutions basées sur un modèle de ressources de réseau pour formation de modèle ai-ml
US11863354B2 (en) Model transfer within wireless networks for channel estimation
US11805022B2 (en) Method and device for providing network analytics information in wireless communication network
US20220286365A1 (en) Methods for data model sharing for a radio access network and related infrastructure
EP3457634B1 (fr) Collecte de données de performance de plan de gestion
WO2024027911A1 (fr) Modèles spécifiques d'une tâche pour réseaux sans fil
US11997574B2 (en) Systems and methods for a micro-service data gateway
US20240037409A1 (en) Transfer models using conditional generative modeling

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23723575

Country of ref document: EP

Kind code of ref document: A1