WO2022028793A1

WO2022028793A1 - Instantiation, training, and/or evaluation of machine learning models

Info

Publication number: WO2022028793A1
Application number: PCT/EP2021/068560
Authority: WO
Inventors: Gábor HANNÁK; Péter SZILÁGYI
Original assignee: Nokia Technologies Oy
Priority date: 2020-08-07
Filing date: 2021-07-06
Publication date: 2022-02-10

Abstract

Systems, methods, apparatuses, and computer program products for instantiation, training, and/or evaluation of machine learning models. For example, certain embodiments may automatically create and maintain a set of machine learning (ML) model instances that are running on different parts of the input data samples. Certain operations described herein may start with a single (first) model instance trained on training/available data. This model instance may be evaluated on the data elements to detect the problematic elements on which the model instance's accuracy is poor. A new (second) model instance may be trained on those data elements for which the initial model performed poorly. Additional model instances may be created and/or evaluated in a recursive manner similar to that described with the first and second model instances until the various parts of the input data are covered by a sufficiently accurate model instance.

Description

INSTANTIATION, TRAINING, AND/OR EVALUATION OF MACHINE LEARNING MODELS

FIELD:

Some example embodiments may generally relate to mobile or wireless telecommunication systems, such as Long Term Evolution (LTE) or fifth generation (5G) radio access technology or new radio (NR) access technology, or other communications systems. For example, certain embodiments may relate to systems and/or methods for instantiation, training, and/or evaluation of machine learning models.

BACKGROUND:

Examples of mobile or wireless telecommunication systems may include the Universal Mobile Telecommunications System (UMTS) Terrestrial Radio Access Network (UTRAN), Long Term Evolution (LTE) Evolved UTRAN (E-UTRAN), LTE-Advanced (LTE-A), MulteFire, LTE-A Pro, and/or fifth generation (5G) radio access technology or new radio (NR) access technology. 5G wireless systems refer to the next generation (NG) of radio systems and network architecture. 5G is mostly built on a new radio (NR), but a 5G (or NG) network can also build on E-UTRA radio. It is estimated that NR may provide bitrates on the order of 10-20 Gbit/s or higher, and may support at least enhanced mobile broadband (eMBB) and ultra-reliable low-latency-communication (URLLC) as well as massive machine type communication (mMTC). NR is expected to deliver extreme broadband and ultra-robust, low latency connectivity and massive networking to support the Internet of Things (loT). With loT and machine-to-machine (M2M) communication becoming more widespread, there will be a growing need for networks that meet the needs of lower power, low data rate, and long battery life. It is noted that, in 5G, the nodes that can provide radio access functionality to a user equipment (i.e., similar to Node B in UTRAN or eNB in LTE) may be named gNB when built on NR radio and may be named NG-eNB when built on E-UTRA radio.

SUMMARY:

According to a first embodiment, a method may include initializing a model instance. The method may include training the model instance on an associated data set. The method may include evaluating an accuracy of the model instance with respect to data elements of the associated data set. The method may include determining whether the accuracy satisfies a threshold for the data elements of the associated data set. The method may include determining to not create one or more additional model instances when the accuracy satisfies the threshold for the data elements. The method may include creating one or more partitions of the associated data set for one or more data elements when the accuracy fails to satisfy the threshold for the one or more data elements, and initializing the one or more additional model instances for the one or more partitions of the associated data.

In a variant, the method may further include training the one or more initialized additional model instances on the one or more partitions. In a variant, creating the one or more partitions and initializing the one or more additional model instances may further include iteratively creating a partition and initializing an additional model instance for the partition when one or more previously initialized model instances fail to have an accuracy that satisfies a threshold for the partition. In a variant, the method may further include re-training the model instance on the associated data set excluding the one or more partitions.

According to a second embodiment, a method may include receiving a set of data elements and information identifying one or more states. Each of the set of data elements may be associated with a state of the one or more states. Each of the one or more states may be associated with a model instance. The method may include determining whether a corresponding entry for the one or more states is included in a data structure. The data structure may include information identifying at least one state of the one or more states and associated model instances of the at least one state. The method may include providing information that identifies, for the at least one state that has the corresponding entry in the data structure, the associated model instances. The method may include initializing, for at least one other state of the one or more states that does not have the corresponding entry in the data structure, at least one model instance.

In a variant, the data structure may further store information identifying an accuracy of the associated model instances with respect to the sets of data elements. In a variant, the method may further include, based on determining that the corresponding entry for the at least one other states is not included in the data structure, collecting data elements associated with the one or more states, training one or more new model instances on the collected data elements, and storing, in the data structure, an entry for an association between the one or more new model instances and the one or more states.

In a variant, the method may further include, based on determining that the corresponding entry for the at least one other state is not included in the data structure, selecting one or more related model instances that are associated with other states similar to the at least one other state, and storing, in the data structure, an entry for an association between the one or more related model instances and the at least one other state. In a variant, the method may further include, based on determining that the corresponding entry for the at least one other state is not included in the data structure, executing and evaluating, with respect to the at least one other state, one or more model instances associated with entries in the data structure, selecting a model instance with a highest relative accuracy for the at least one other state, and storing, in the data structure, an entry for an association between the model instance with the highest relative accuracy and the at least one other state.

A third embodiment may be directed to an apparatus including at least one processor and at least one memory comprising computer program code. The at least one memory and computer program code may be configured, with the at least one processor, to cause the apparatus at least to perform the method according to the first embodiment or the second embodiment, or any of the variants discussed above.

A fourth embodiment may be directed to an apparatus that may include circuitry configured to perform the method according to the first embodiment or the second embodiment, or any of the variants discussed above.

A fifth embodiment may be directed to an apparatus that may include means for performing the method according to the first embodiment or the second embodiment, or any of the variants discussed above. Examples of the means may include one or more processors, memory, and/or computer program codes for causing the performance of the operation.

A sixth embodiment may be directed to a computer readable medium comprising program instructions stored thereon for performing at least the method according to the first embodiment or the second embodiment, or any of the variants discussed above.

A seventh embodiment may be directed to a computer program product encoding instructions for performing at least the method according to the first embodiment or the second embodiment, or any of the variants discussed above.

BRIEF DESCRIPTION OF THE DRAWINGS:

For proper understanding of example embodiments, reference should be made to the accompanying drawings, wherein:

Fig. 1 illustrates an example of instantiation, training, and/or evaluation of machine learning models, according to some embodiments;

Fig. 2 illustrates an example flow diagram of a method ofexample operations of a computing node related to determining whether to create new data partitions and create new model instances for the new data partitions, according to some embodiments; Fig. 3 illustrates an example architecture of a computing node, according to some embodiments;

Fig. 4a illustrates example operations of a computing device related to handling new states, according to some embodiments;

Fig. 4b illustrates additional example operations of the computing node related to handling the new states, according to some embodiments;

Fig. 4c illustrates further example operations of the computing device related to handling the new states, according to some embodiments;

Fig. 5 illustrates an example data structure, according to some embodiments;

Fig. 6 illustrates an example deployment, according to some embodiments;

Fig. 7 illustrates an example process of implementation, according to some embodiments;

Fig. 8 illustrates an example flow diagram of a method, according to some embodiments; Fig. 9 illustrates an example flow diagram of a method, according to some embodiments;

Fig. 10 illustrates an example block diagram of an apparatus, according to an embodiment.

DETAILED DESCRIPTION:

It will be readily understood that the components of certain example embodiments, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of some example embodiments of systems, methods, apparatuses, and computer program products for instantiation, training, and/or evaluation of machine learning models is not intended to limit the scope of certain embodiments but is representative of selected example embodiments.

The features, structures, or characteristics of example embodiments described throughout this specification may be combined in any suitable manner in one or more example embodiments. For example, the usage of the phrases “certain embodiments,” “some embodiments,” or other similar language, throughout this specification refers to the fact that a particular feature, structure, or characteristic described in connection with an embodiment may be included in at least one embodiment. Thus, appearances of the phrases “in certain embodiments,” “in some embodiments,” “in other embodiments,” or other similar language, throughout this specification do not necessarily all refer to the same group of embodiments, and the described features, structures, or characteristics may be combined in any suitable manner in one or more example embodiments. In addition, the phrase “set of’ refers to a set that includes one or more of the referenced set members. As such, the phrases “set of,” “one or more of,” and “at least one of,” or equivalent phrases, may be used interchangeably. Further, “or” is intended to mean “and/or,” unless explicitly stated otherwise. Additionally, if desired, the different functions or operations discussed below may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the described functions or operations may be optional or may be combined. As such, the following description should be considered as merely illustrative of the principles and teachings of certain example embodiments, and not in limitation thereof.

Machine learning (ML) may involve model training, deployment, benchmarking, and selection. An ML model may have a pre-defined input, output, and internal structure with internal parameters (e.g., an input layer, an output layer, and hidden layers in an artificial neural network). A specific model instance may be the result of a training procedure. The training procedure may take a pre-defined model and may optimize the model’s internal parameters via an algorithm (e.g., gradient descent) so that it fits the training data (and possibly also data that is similar to the training data but is not a part of the training data). Based on this, model instances with the exact same input, output, and internal structures may be configured differently due the different impressions (training data and algorithms) shaping their internal parameters. Such model instances may perform different computation or inference tasks. Additionally, the structure of the model instances may also change while keeping their observable input and output unchanged, which may enable an even larger difference in their implemented compute logic compared to parameter-only differences.

ML may be applied in network functions and network management services to implement data driven use cases. Various functionality can be realized with supervised ML on the conditions that 1) the problem can be formulated as a mapping between a set of inputs and a set of outputs; and 2) that a sufficient amount of data (examples of the desired input-output mapping) can be collected for training. A single model instance may be trained on the entire training data set, or multiple model instances may be trained on different parts of the data set. When using a single model instance, the model may have to have a sufficiently complex structure so that it can cover the training data, and the model may have to be generalized to data that is not part of its training data set. If the data is diverse, no single model instance may be able to achieve suitable accuracy on every data point (regardless of hyper-parameter tuning, changing the model’s internal structure, regularization, or other ML optimizations). A single model instance may also be large (e.g., with many hidden layers in a neural network with parameters on the order of hundreds of millions, which may result in a model with a size of several megabytes). Thus, using a single model instance can consume significant space and compute resources (and time) for training and evaluation, while also overfitting the data and poorly generalizing to unseen data points.

Instead, splitting the data into multiple partitions (which may or may not be disjoint partitions), and training a dedicated model instance for each part (with the same input/output interface) may be more efficient and may provide more accurate results as model instances can specialize. However, in this case, finding the suitable data split may be a technical problem where additional knowledge of the data itself may have to be used. Sometimes an optimal data split (including the case of no split used with a single model instance) cannot be established ahead of time, but may have to have several iterations of: data splitting, model design, training, and evaluation. Additionally, in dynamically changing systems, the optimal data split and model coverage may change in time. Further, there may be a problem of selecting the model instance to be applied to a new data point for inference, without maintaining a history of the training data per model instance and finding the closest training data element (and its corresponding model).

Some embodiments described herein may provide for instantiation, training, and/or evaluation of machine learning models. For instance, certain embodiments may provide a computing node that automatically creates, evaluates, and/or maintains ML model instances that collectively implement an ML use case. The model instances may have the same structure, defined by the set of input data types that are to be consumed by the model instances and the output that is to be produced by the model (e.g., the inputs and output together may be a ML model template (MT)).

Certain embodiments may automatically create and maintain a set of ML model instances that are running on different (but not necessarily disjoint) partitions of input data samples. Certain operations described herein may start with a single (first) model instance trained on training/available data. This model instance may be evaluated on the data elements to detect the problematic elements on which the model instance’s accuracy is poor (e.g., fails to satisfy a threshold). A new (second) model instance may be trained on those data elements for which the initial model failed to have a threshold accuracy. The first model instance may also be retrained on the data, but excluding the data elements for which the first model instance failed to have a threshold accuracy. Additional model instances may be created and/or evaluated in a recursive manner similar to that described with the first and second model instances until the various parts of the input data are covered by a sufficiently accurate model instance (e.g., a model instance with an accuracy that satisfies a threshold). This may result in improved accuracy of a data processing task utilizing ML model instances.

Fig. 1 illustrates an example 100 of instantiation, training, and/or evaluation of machine learning models, according to some embodiments. The example 100 of Fig. 1 illustrates example operations of a computing node (e.g., similar to apparatus 10 of Fig. 10). In certain embodiments, the computing node may host one or more components described herein, such as a model supervisor or a state classifier entity that may perform the operations illustrated in, and described with respect to Fig. 1.

As illustrated in Fig. 1, the computing node may iteratively partition the data and may train a model instance for each partition. In iteration [1], a single model instance (Model Instance 1) may be trained for the data set (single partition that comprises the entire data set). The model instance’s accuracy may then be evaluated on the various data elements of the data set, which may result in either the accuracy satisfying a threshold (e.g., being greater than or equal to the threshold) or failing to satisfy the threshold (e.g., being less than the threshold). Satisfaction of the threshold may indicate that the model instance has a sufficient accuracy with respect to the corresponding data element(s) of the data set, and failing to satisfy the threshold may indicate that the model instance has poor (or insufficient) accuracy with respect to data element(s) of the data set.

As there may be data elements for which the model instance [1] fails to provide a threshold accuracy, the computing node may continue with a second iteration (iteration [2]). In iteration [2], model instance 2 may be trained on data elements for which the model instance in the previous iteration (model instance 1) failed to have a threshold accuracy. For example, the computing node may partition the data element(s) for which the model instance 1 failed to provide a threshold accuracy from the data set and may process the partitioned data using the model instance 2. The model instance 2 may be created from the model instance 1 by performing hyper-parameter tuning on the model instance 1, changing the model instance l’s internal structure, regularization with respect to the model instance 1, or other ML optimizations for the data element(s) in the partitioned data elements.

Similar to the iteration [1] described above, the model instance [2] may still fail to have a threshold accuracy with respect to data elements of the partitioned data elements. As there may still be data elements where both model instances 1 and 2 fail to provide a threshold accuracy, the computing node may continue with a third iteration (iteration [3]). In iteration [3], a third model instance (model instance 3) may be trained on the data elements for which none of the previously created model instances has a threshold accuracy (e.g., data elements in the partitioned data elements for which the model instance 2 also failed to provide a threshold accuracy). The computing node may determine to stop the iterations when the various data elements in the data set have been partitioned such that the data elements can be processed by at least one created model instance with a threshold accuracy. In certain embodiments, the previously created model instance(s) may be re-trained such that the data elements resulting in poor accuracy are excluded from the training data for the model instances. For example, after partitioning data elements from the data set, the computing node may retrain the model instance 1 on data elements for which the model instance 1 provides a threshold accuracy, and not on data elements for which the model instance fails to provide the threshold accuracy.

As described above, Fig. 1 is provided as an example. Other examples are possible, according to some embodiments.

Fig. 2 illustrates an example flow diagram of a method 200 of operations of a computing node related to determining whether to create new data partitions and create new model instances for the new data partitions, according to some embodiments. In certain embodiments, the method 200 may be performed by a component of the computing node described elsewhere herein.

As illustrated at 202, the computing node may initialize a model instance. In connection with initializing the model instance, the computing node may receive a training data set to be processed by one or more model instances with respect to a data processing task. As illustrated at 204, the computing node may train the model instance on an associated data set. For example, the computing node may train the model instance on the data set received in connection with the operations at 202. As illustrated at 206, the computing node may evaluate an accuracy of the model instance with respect to data elements of the associated data set. For example, the computing node may determine an accuracy of the model instance with respect to each of the data elements of the training data set.

As illustrated at 208, the computing node may determine whether an accuracy of the model instance satisfies a threshold for the data elements. For example, the computing node may determine whether the model instance provides a threshold accuracy for data elements of the training data set. If the computing node determines that the accuracy satisfies the threshold for the data elements (e.g., there are no data elements in the data set for which the model instance fails to have a threshold accuracy) (208-YES), then the computing node may determine to not create one or more additional model instances, as illustrated at 210. For example, the computing node may use the model instance for a data processing task, without partitioning the data set into multiple partitions and training one or more new model instances on the newly created partitions.

If the computing node determines that the accuracy does not satisfy the threshold for any of the data elements (208-NO), then the computing node may create a new data partition for data elements where the model instance failed to provide the threshold accuracy, as illustrated at 212. For example, the computing node may combine the data elements for which the accuracy failed to satisfy the threshold accuracy into a data partition. As illustrated at 214, the computing node may create a new model instance and may associate the new model instance with the new data partition. For example, the computing node may create a new model instance to process the data using the new model instance. Creating the new model instance may include training the first model instance specifically on the data partition and/or performing one or more ML optimizations on the first model instance. After creating the new model instance, the computing node may return to performing the operations at 204 with respect to the new data partition and may continue the partitioning and evaluation operations described herein iteratively. Once a data partition includes data elements for which a particular model instance provides a threshold accuracy, the computing node may associate the model instance and the partition of data elements. For example, the computing node may associate the new model instance and the new data partition by storing, in a data structure, information that identifies the new model instance and the new data partition.

As indicated above, Fig. 2 is provided as an example. Other examples are possible, according to some embodiments.

Fig. 3 illustrates an example architecture of a computing node, according to some embodiments. The example of Fig. 3 illustrates a computing node 300, a model supervisor, a state classifier entity of the computing node 300 (where the model supervisor and the state classifier may be components of the computing node 300), entities of the computing node 300 (e.g., an association table), model instances hosted on the computing node 300, a model template, and various data sources (e.g., other computing nodes). Although the model template and the data sources are illustrated in Fig. 3 as being external to the computing node 300, the model template and/or the data sources may be implemented in one or more components or entities of the computing node 300. Similarly, certain components, elements, and/or model instances may be implemented external to the computing node 300, in certain embodiments.

As illustrated in Fig. 3, the computing node may receive various types of inputs, such as the model template (at 302), which describes the inputs and output of the models (e.g., the model structure), and the input data from the data sources (at 304). The state classifier entity may automatically quantize/cluster input data elements according to similarity. For example, the computing node 300 may group, in the same cluster, data elements that are neighboring elements according to a distance measure, for example, Euclidean distance, Jaccard distance, and/or the like. Dissimilar data elements (e.g., that are not neigboring elements according to the distance measure, may be grouped in different clusters. By the quantization/clustering process, the state classifier entity may implicitly partition the space defined by the data elements into smaller sets referred to as states. Therefore, each data element may be associated with a state, and each state may be associated with a set of data elements, serving as a self-similar subset of the data set. For splitting numerical data, one or more various techniques may be used, such as K-means clustering, hierarchical clustering, neural gas-based algorithms, and/or the like. For partitioning categorical data, clustering techniques, such as a K-modes technique or an itemset mining-based technique, may be used.

The model supervisor may select, for each input data element, a model instance to be used for inference. Additionally, or alternatively, the model supervisor may train and evaluate model instance(s) derived from the model template. Each new input data point may be input to the state classifier entity, which may provide the state corresponding to the data element.

The model supervisor may maintain an association table as an example data structure. The table may record information that identifies the data partition that corresponds to each model instance, and the accuracy of the model instance on the data in the data partition. The data partition may be implemented as, or identified by, a list of states provided by the state classifier entity. Information that identifies the accuracy of the model instance(s) may be collected from the model instance(s) as output of a training process (e.g., ML model instances may produce an accuracy value with respect to the training data set, or another data set associated with a data processing task, which may be expressed as a percentage of data elements accurately processed, a likelihood of accurately processing the data elements, and/or the like).

The association table may enable the model supervisor to track the accuracy and data partition of the model instance(s) while other operations, such as those illustrated in Fig. 2, are performed (e.g., iteratively creating and training model instances and partitioning the data). Additionally, or alternatively, the table may enable the model supervisor to select a particular model instance to be used for a new data element, such as by checking which model instance’s data partition includes the same state associated with the new data element. Such model instance may provide a threshold accuracy on data elements associated with the same state.

Based on the data similarity assumptions within states, the computing node 300 (or a component or entity of the computing node 300) may assume that the same model instance will have a threshold accuracy with respect to a new data element from the same state. Using the association between states and data elements, new input data elements may not have to be processed by all model instances to select the model instance that may likely have at least a threshold accuracy with respect to the new input data. This conserves computing resources of the computing node 300 that would otherwise be consumed processing the new input data with all of the created model instances to identify which model instances have a threshold accuracy. Additionally, or alternatively, there may be no need to process every past data element to identify which past data element(s) are similar to the new input data so that the associated model instance(s) of the similar past data element(s) can be used to process the new input data.

As described above, Fig. 3 is provided as an example. Other examples are possible, according to some embodiments.

Fig. 4a illustrates an example flow diagram of a method 400 of operations of a computing node related to handling new states, according to some embodiments. For example Fig. 4a illustrates example operations of a computing node (e.g., apparatus 10 of Fig. 10) or of a component of the computing node.

As illustrated, the computing node may receive data elements, at 402, and information identifying a state of the data elements, at 404, as input data. As illustrated at 406, the computing node may determine whether there is an entry in a data structure for the state. For example, the data structure may store information that identifies various states and the model instances to be used to process each of the states. If the computing node determines that there is an entry in the data structure for the state (406-YES), then the computing node may execute a model instance identified in the data structure, at 408. For example, the computing node may use the model instance to process the input data received at 402. If the computing node determines that there is not an entry in the data structure for the state (406-NO), then the computing node may perform one or more actions, at 410. The one or more actions are described with respect to Fig. 4b, which illustrates additional example operations of the computing node with respect to the method 400.

In particular, Fig. 4b illustrates examples of the one or more actions at 410. At 412, the computing node may execute already created model instances on the input data elements received at 402. For example, the computing node may execute the model instances created for various partitions of a training data set to process the input data elements. At 414, the computing node may select a model instance from the already created model instances based on a performance of the already created model instances. For example, the computing node may select the already created model instance that has the highest accuracy relative to the other already created model instances. At 416, the computing node may create a new entry in the data structure for an association between the selected model instance and the input data elements (e.g., between the selected model instance and the state associated with the state identified in the information received at 404). At 418, the computing node may add data to a data collector or create a new data collector for the state as one example of the one or more actions at 410. For example, a data collector may be a component of the computing node, that includes a data structure, a database, and/or the like, that may gather and store data from one or more data sources for one or more states (e.g., the data collector may gather and store data along with information that identifies the state(s) associated with data elements of the data). If there is a data collector already created for the state, then the computing node may gather data and add it to the already created data collector. If a data collector has not already been created for the state, then the computing node may create the data collector and may add gathered data to the newly created data collector. At 420, the computing node may determine whether a sufficient amount of data has been gathered. For example, the computing node may determine whether a volume of data (e.g., in terms of size, such as megabytes, gigabytes, terabytes, etc.), a quantity of data elements, and/or the like that have been gathered satisfies a threshold. Further operations that the computing node may perform are illustrated in, and described with respect to, Fig. 4c.

In particular, Fig. 4c illustrates example operations that the computing node may perform after determining whether a sufficient amount of data has been gathered at 420. If the computing node determines that there is not a sufficient amount of data (420-NO), then the computing node may determine to wait for additional data elements for the state, at 422. For example, the computing node may determine to wait for a threshold quantity of data elements, a threshold amount data (e.g., by size, such as megabytes, gigabytes, or terabytes), and/or the like to be received. If the computing node determines that a sufficient amount of data has been gathered (420-YES), then the computing node may, at 424, evaluate already created model instances with respect to the data elements received at 402. For example, the computing node may process the data elements with the already created model instances and may determine an accuracy of each of the already created model instances with respect to the data elements received at 402.

At 426, the computing node may determine whether any of the already created model instances have a threshold accuracy with respect to the data elements received at 402. For example, the computing node may determine whether the accuracy of any of the already created model instances satisfies a threshold. If the computing node determines that any of the already created model instances have a threshold accuracy (426-YES), then the computing node may select a model instance with the threshold accuracy, at 428. For example, the computing node may select the model instance with the highest accuracy relative to other model instances that have the threshold accuracy. If the computing node determines that none of the already created model instances have the threshold accuracy (426-NO), then the computing node may train a new model instance based on the input data elements, at 430. For example, the computing node may train an initial model instance on the input data elements received at 402 after performing ML optimizations on the initial model instance.

At 432, the computing node may create a new entry in the data structure for an association between the received state (at 404) and the model instance selected or trained for the state. For example, the computing node may create a new record in an association table that identifies the state and the selected or trained model instance.

Returning to Fig. 4b, the computing node may, at 434 as an example of the one or more actions at 410, identify a similar state to the received state (the state received at 404) based on a state classifier entity. For example, the computing node may determine whether the received state is associated with data elements that are similar to data elements of another state included in the data structure. Similarity between data elements is described elsewhere herein in more detail. At 436, the computing node may select the model instance associated with the similar state to process the data elements received at 402. For example, the computing node may initialize the model instance and may process the data elements received at 402 using the model instance. At 438, the computing node may create a new entry in the data structure for an association between the selected model instance and the input data elements. For example, the computing node may create a new record in an association table that identifies the association between the selected model instance and the state received at 404.

As described above, Figs. 4a-4c are provided as examples. Other examples are possible, according to some embodiments.

With respect to the operations illustrated in, and described with respect to, Figs. 4a-4c, if, for an input data element, the state classifier entity outputs a state to which there is no model instance assigned yet, there may be one or more implementation options. For example, upon the first encounter with a new state, the model supervisor may start collecting data elements associated with this state. When a sufficient amount is collected, the model supervisor may train a new model instance from the model template for this new state. Upon successful training, a new entry in the association table may be created that associates the new state with a data partition and the new model instance.

As another example, a state modeling entity (which may be a component of the computing node) may, in certain embodiments, output not only the state of a particular data element, but also a list of states that are similar to the state of the data element. The model supervisor may execute a model instance that is associated with a similar state (if an associated entry in the association table exists), and may create an association entry between the new state and the similar model instance (e.g., if the similar state’s model instance provides a threshold accuracy for the data element(s) of the new state).

As another example, various trained model instances may be executed and evaluated for a new data element, and a model instance may be selected, as long as the model instance has a threshold accuracy. An entry for the respective state and this model instance may be created in a data structure. If no existing model instance satisfies a threshold accuracy, the model supervisor may fall back to collecting data to train a new model instance).

If a new state is input to the model supervisor and a model instance has to be assigned or trained, there may be one or more operations that can be performed. For example, the model supervisor may train a new model instance on the data elements associated with the new state. As another example, the model supervisor may apply transfer learning in order to reuse parts of an existing model instance (with a threshold accuracy). For example, the model supervisor may apply changes to an existing model instance to account for differences between the state associated with the selected model and the newly received state. This may have the advantage that a small amount of training data may have to be collected for the new state. As another example, the computing node may perform similar state approximation. For example, the computing node may search for, and select, a state that is similar to the new state and that has an existing model instance associated with the state.

Fig. 5 illustrates an example data structure 500, according to some embodiments. For example, the data structure 500 may include an association table described elsewhere herein. As illustrated in Fig. 5, the data structure 500 may include information that identifies various data partitions that include states grouped together (in a “Data Partition” column). For example, the data structure 500 identifies a first data partition for states [1,2] grouped together, where data elements for states [1,2] are included in the first data partition. In addition, the data structure 500 may include information that identifies a model instance for each of the data partitions identified in the data structure 500 (e.g., in a “Model Instance” column). For example, the data structure 500 may identify model instance 1 as being the model instance associated with the first data partition (e.g., the model instance 1 may be used to process data elements associated with the states [1,2]). Further, the data structure 500 may identify an accuracy of the model instance with respect to the associated data partition for the model instance (e.g., in an “Accuracy” column). For example, the data structure 500 may identify a percentage of the data elements of the data partition that are accurately processed, a likelihood of processing any given data element, and/or the like as an accuracy of the model instance. As described above, Fig. 5 is provided as an example. Other examples are possible, according to some embodiments.

Fig. 6 illustrates an example deployment 600, according to some embodiments. For example, Fig. 6 illustrates an example deployment 600 of a model supervisor host (e.g., that hosts or implements a model supervisor), a state classifier entity host (e.g., that hosts or implements a state classifier entity), a model instance training/execution host (e.g., that hosts or implements a model instance), and data sources. In certain embodiments, the model supervisor host, the state classifier entity host, the model instance training/execution host, and/or the data sources may be hosted on different computing nodes, on a same computing node, or on some combination of multiple computing nodes. In certain embodiments, these hosts may be separate network functions, cloud nodes (e.g., an edge cloud, a core cloud, a public cloud, etc.), or any other compute environments that can run the implementation of the functions.

The model supervisor may deploy a selected model instance at the model instance training/execution host for training and/or execution on a new data element. In some embodiments, a model instance that the model supervisor has created may be deployed once and may be continuously up and running in one or more model instance training/execution hosts, ready to receive input data. This conserves time and/or computing resources associated with de-initializing and re-initializing a model instance. In this case, the model supervisor may not need to re-deploy a model instance when a different model instance needs to be used for the next input data element. Rather, it may provide the input data to the appropriate model instance. In certain embodiments, one or more of the model supervisor hosts, state classifier entity hosts, and/or model instance training/execution hosts may be the same, making one or more of these functions co-located. For example, it may be possible to co-locate the state classifier entity with the training/execution of one or more of the model instances on the same host to benefit from the shared input data. In certain embodiments, it may be possible to co-locate the model supervisor with the training/execution of one or more model instances on the same host to reduce the communication overhead and/or computing resources of switching between model instances.

As described above, Fig. 6 is provided as an example. Other examples are possible, according to some embodiments. Fig. 7 illustrates an example process 700 of implementation, according to some embodiments. For example, Fig. 7 illustrates an example process 700 of implementation with respect to open radio access network (O-RAN) architecture elements.

As illustrated at 702 (operation 1), a ML modeler may use a designer environment along with ML toolkits to create the initial ML model. The ML modeller may create an ML model template. As illustrated at 704, the initial model may be sent to training hosts for training. As illustrated at 706, data sets may be collected from a near-real-time (near-RT) RAN intelligent controller (RIC), open centralized unit (O-CU), and open distributed unit (O-DU) to a data lake and passed to the ML training hosts. With respect to the operations illustrated at 704 and 706, the model supervisor may receive ML model templates and may collect training data from O-RAN architecture elements and data lakes, and may initiate training. The model supervisor can host the ML designer catalog, which may store trained model instances. As illustrated 708, the trained model/sub-models may be uploaded to the ML designer catalog. The ML model may be composed.

As illustrated at 710, the ML model may be published to a non-real-time (non-RT) RIC along with the associated license and metadata. As illustrated at 712, the non-RT RIC may create a containerized ML application containing model artifacts. As illustrated at 714, the non-RT RIC may deploy the ML application to the near-RT RIC, O-DU and open radio unit (O-RU) using the 01 interface. Policies may also be set using the Al interface. The model supervisor may deploy model instances and associated ML applications to the ML model inference host. As illustrated at 716, PM data may be sent back to ML training hosts from the near-RT RIC, O-DU and O-RU for retraining. The model supervisor may receive a PM data/model instance benchmark in order to select the suitable model instance and may monitor the model accuracy.

As described above, Fig. 7 is provided as an example. Other examples are possible, according to some embodiments.

Fig. 8 illustrates an example flow diagram of a method 800, according to some embodiments. For example, Fig. 8 shows example operations of a computing node (e.g., apparatus 10 illustrated in, and described with respect to, Fig. 10). Some of the operations illustrated in Fig. 8 may be similar to some operations shown in, and described with respect to, Figs. 1-7.

In an embodiment, the method may include, at 802, initializing a model instance. The method may include, at 804, training the model instance on an associated data set. The method may include, at 806, evaluating an accuracy of the model instance with respect to data elements of the associated data set. The method may include, at 808, determining whether the accuracy satisfies a threshold for the data elements of the associated data set. The method may include, at 810, determining to not create one or more additional model instances when the accuracy satisfies the threshold for the data elements. The method may include, at 812, creating one or more partitions of the associated data set for one or more data elements when the accuracy fails to satisfy the threshold for the one or more data elements, and initializing the one or more additional model instances for the one or more partitions of the associated data.

The computing node may perform one or more other operations in connection with the method 800 illustrated in Fig. 8. In some embodiments, the method may include training the one or more initialized additional model instances on the one or more partitions. In some embodiments, creating the one or more partitions and initializing the one or more additional model instances may further include iteratively creating a partition and initializing an additional model instance for the partition when one or more previously initialized model instances fail to have an accuracy that satisfies a threshold for the partition. In some embodiments, the method may include re-training the model instance on the associated data set excluding the one or more partitions.

As described above, Fig. 8 is provided as an example. Other examples are possible according to some embodiments.

Fig. 9 illustrates an example flow diagram of a method 900, according to some embodiments. For example, Fig. 9 shows example operations of a computer node (e.g., apparatus 10 illustrated in, and described with respect to, Fig. 10). Some of the operations illustrated in Fig. 9 may be similar to some operations shown in, and described with respect to, Figs. 1-7.

In an embodiment, the method may include, at 902, receiving a set of data elements and information identifying one or more states. Each of the set of data elements may be associated with a state of the one or more states. Each of the one or more states may be associated with a model instance. The method may include, at 904, determining whether a corresponding entry for the one or more states is included in a data structure. The data structure may include information identifying at least one state of the one or more states and associated model instances of the at least one state. The method may include, at 906, providing information that identifies, for the at least one state that has the corresponding entry in the data structure, the associated model instances. The method may include, at 908, initializing, for at least one other state of the one or more states that does not have the corresponding entry in the data structure, at least one model instance.

The computing node may perform one or more other operations in connection with the method 900 illustrated in Fig. 9. In some embodiments, the data structure may further store information identifying an accuracy of the associated model instances with respect to the sets of data elements. In some embodiments, based on determining that the corresponding entry for the at least one other states is not included in the data structure, the method may include collecting data elements associated with the one or more states, training one or more new model instances on the collected data elements, and storing, in the data structure, an entry for an association between the one or more new model instances and the one or more states.

In some embodiments, based on determining that the corresponding entry for the at least one other state is not included in the data structure, the method may include selecting one or more related model instances that are associated with other states similar to the at least one other state, and storing, in the data structure, an entry for an association between the one or more related model instances and the at least one other state. In some embodiments, based on determining that the corresponding entry for the at least one other state is not included in the data structure, the method may include executing and evaluating, with respect to the at least one other state, one or more model instances associated with entries in the data structure, selecting a model instance with a highest relative accuracy for the at least one other state, and storing, in the data structure, an entry for an association between the model instance with the highest relative accuracy and the at least one other state.

As described above, Fig. 9 is provided as an example. Other examples are possible according to some embodiments.

Fig. 10 illustrates an example of an apparatus 10 according to an embodiment. In an embodiment, apparatus 10 may be a computing node and/or a component hosted on a computing node described herein. For example, apparatus 10 may be a computing node, a model supervisor or a model supervisor host, a state classifier entity or a state classifier entity host, a model instance training/execution host, and/or the like described elsewhere herein, and/or the like. Additionally, or alternatively, apparatus 10 may be a data source.

In some example embodiments, apparatus 10 may include one or more processors, one or more computer-readable storage medium (for example, memory, storage, or the like), one or more radio access components (for example, a modem, a transceiver, or the like), and/or a user interface. In some embodiments, apparatus 10 may be configured to operate using one or more radio access technologies, such as GSM, LTE, LTE-A, NR, 5G, WLAN, WiFi, NB-IoT, Bluetooth, NFC, MulteFire, and/or any other radio access technologies. It should be noted that one of ordinary skill in the art would understand that apparatus 10 may include components or features not shown in, or different than, Fig . 10, depending on the device to which apparatus 10 corresponds.

As illustrated in the example of Fig. 10, apparatus 10 may include or be coupled to a processor 12 for processing information and executing instructions or operations. Processor 12 may be any type of general or specific purpose processor. In fact, processor 12 may include one or more of general-purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs), field- programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and processors based on a multi -core processor architecture, as examples. While a single processor 12 is shown in Fig. 10, multiple processors may be utilized according to other embodiments. For example, it should be understood that, in certain embodiments, apparatus 10 may include two or more processors that may form a multiprocessor system (e.g., in this case processor 12 may represent a multiprocessor) that may support multiprocessing. In certain embodiments, the multiprocessor system may be tightly coupled or loosely coupled (e.g., to form a computer cluster).

Processor 12 may perform functions associated with the operation of apparatus 10 including, as some examples, precoding of antenna gain/phase parameters, encoding and decoding of individual bits forming a communication message, formatting of information, and overall control of the apparatus 10, including processes related to management of communication resources.

Apparatus 10 may further include or be coupled to a memory 14 (internal or external), which may be coupled to processor 12, for storing information and instructions that may be executed by processor 12. Memory 14 may be one or more memories and of any type suitable to the local application environment, and may be implemented using any suitable volatile or nonvolatile data storage technology such as a semiconductor-based memory device, a magnetic memory device and system, an optical memory device and system, fixed memory, and/or removable memory. For example, memory 14 can be comprised of any combination of random access memory (RAM), read only memory (ROM), static storage such as a magnetic or optical disk, hard disk drive (HDD), or any other type of non-transitory machine or computer readable media. The instructions stored in memory 14 may include program instructions or computer program code that, when executed by processor 12, enable the apparatus 10 to perform tasks as described herein. In an embodiment, apparatus 10 may further include or be coupled to (internal or external) a drive or port that is configured to accept and read an external computer readable storage medium, such as an optical disc, USB drive, flash drive, or any other storage medium. For example, the external computer readable storage medium may store a computer program or software for execution by processor 12 and/or apparatus 10.

In some embodiments, apparatus 10 may also include or be coupled to one or more antennas 15 for receiving a signal and for transmitting another signal from apparatus 10. Apparatus 10 may further include a transceiver 18 configured to transmit and receive information. The transceiver 18 may also include a radio interface (e.g., a modem) coupled to the antenna 15. The radio interface may correspond to a plurality of radio access technologies including one or more of GSM, LTE, LTE-A, 5G, NR, WLAN, NB-IoT, Bluetooth, BT-LE, NFC, RFID, UWB, and the like.

In an embodiment, memory 14 stores software modules that provide functionality when executed by processor 12. The modules may include, for example, an operating system that provides operating system functionality for apparatus 10. The memory may also store one or more functional modules, such as an application or program, to provide additional functionality for apparatus 10. The components of apparatus 10 may be implemented in hardware, or as any suitable combination of hardware and software. According to an example embodiment, apparatus 10 may optionally be configured to communicate with another apparatus (which may be similar to apparatus 10) via a wireless or wired communications link.

According to some embodiments, processor 12 and memory 14 may be included in or may form a part of processing circuitry or control circuitry. In addition, in some embodiments, transceiver 18 may be included in or may form a part of transceiving circuitry.

As used herein, the term “circuitry” may refer to hardware-only circuitry implementations (e.g., analog and/or digital circuitry), combinations of hardware circuits and software, combinations of analog and/or digital hardware circuits with software/firmware, any portions of hardware processor(s) with software (including digital signal processors) that work together to cause an apparatus (e.g., apparatus 10) to perform various functions, and/or hardware circuit(s) and/or processor(s), or portions thereof, that use software for operation but where the software may not be present when it is not needed for operation. As a further example, as used herein, the term “circuitry” may also cover an implementation of merely a hardware circuit or processor (or multiple processors), or portion of a hardware circuit or processor, and its accompanying software and/or firmware. The term circuitry may also cover, for example, a baseband integrated circuit in a server, cellular network node or device, or other computing or network device.

According to certain embodiments, apparatus 10 may be controlled by memory 14 and processor 12 to perform the functions associated with example embodiments described herein. For example, in some embodiments, apparatus 10 may be configured to perform one or more of the processes described with respect to, or depicted in, Figs. 1-9. For instance, in one embodiment, apparatus 10 may be controlled by memory 14 and processor 12 to perform the method 200 of Fig. 2, the method 400 of Figs. 4a-4c, the method 800 of Fig. 8, and/or the method 900 of Fig. 9.

Therefore, certain example embodiments provide several technological improvements, enhancements, and/or advantages over existing technological processes. For example, one benefit of some example embodiments is improved handling of data with respect to selection and/or creation of a model instance for processing the data. Accordingly, the use of some example embodiments results in improved functioning of communications networks and their nodes and, therefore constitute an improvement at least to the technological field of machine learning, among others.

In some example embodiments, the functionality of any of the methods, processes, signaling diagrams, algorithms or flow charts described herein may be implemented by software and/or computer program code or portions of code stored in memory or other computer readable or tangible media, and executed by a processor.

In some example embodiments, an apparatus may be included or be associated with at least one software application, module, unit or entity configured as arithmetic operation(s), or as a program or portions of it (including an added or updated software routine), executed by at least one operation processor. Programs, also called program products or computer programs, including software routines, applets and macros, may be stored in any apparatus-readable data storage medium and may include program instructions to perform particular tasks.

A computer program product may include one or more computer-executable components which, when the program is run, are configured to carry out some example embodiments. The one or more computerexecutable components may be at least one software code or portions of code. Modifications and configurations used for implementing functionality of an example embodiment may be performed as routine(s), which may be implemented as added or updated software routine(s). In one example, software routine(s) may be downloaded into the apparatus.

As an example, software or a computer program code or portions of code may be in a source code form, object code form, or in some intermediate form, and it may be stored in some sort of carrier, distribution medium, or computer readable medium, which may be any entity or device capable of carrying the program. Such carriers may include a record medium, computer memory, read-only memory, photoelectrical and/or electrical carrier signal, telecommunications signal, and/or software distribution package, for example. Depending on the processing power needed, the computer program may be executed in a single electronic digital computer or it may be distributed amongst a number of computers. The computer readable medium or computer readable storage medium may be a non-transitory medium.

In other example embodiments, the functionality may be performed by hardware or circuitry included in an apparatus (e.g., apparatus 10 or apparatus 20), for example through the use of an application specific integrated circuit (ASIC), a programmable gate array (PGA), a field programmable gate array (FPGA), or any other combination of hardware and software. In yet another example embodiment, the functionality may be implemented as a signal, such as a non-tangible means that can be carried by an electromagnetic signal downloaded from the Internet or other network.

According to an example embodiment, an apparatus, such as a node, device, or a corresponding component, may be configured as circuitry, a computer or a microprocessor, such as single-chip computer element, or as a chipset, which may include at least a memory for providing storage capacity used for arithmetic operation(s) and/or an operation processor for executing the arithmetic operation(s).

Example embodiments described herein apply equally to both singular and plural implementations, regardless of whether singular or plural language is used in connection with describing certain embodiments. For example, an embodiment that describes operations of a single computing node equally applies to embodiments that include multiple instances of the computing node, and vice versa.

One having ordinary skill in the art will readily understand that the example embodiments as discussed above may be practiced with operations in a different order, and/or with hardware elements in configurations which are different than those which are disclosed. Therefore, although some embodiments have been described based upon these example preferred embodiments, it would be apparent to those of skill in the art that certain modifications, variations, and alternative constructions would be apparent, while remaining within the spirit and scope of example embodiments. PARTIAL GLOSSARY

KPI Key Performance Indicator

ML Machine Learning MT Model Template

O-RAN Open RAN

RAN Radio Access Network

RIC RAN Intelligent Controller

UC Use Case

Claims

24 CLAIMS

1. A method, comprising: initializing, by a computing node, a model instance; training the model instance on an associated data set; evaluating an accuracy of the model instance with respect to data elements of the associated data set; determining whether the accuracy satisfies a threshold for the data elements of the associated data set; and determining to not create one or more additional model instances when the accuracy satisfies the threshold for the data elements, or creating one or more partitions of the associated data set for one or more data elements when the accuracy fails to satisfy the threshold for the one or more data elements, and initializing the one or more additional model instances for the one or more partitions of the associated data.

2. The method according to claim 1, further comprising: training the one or more initialized additional model instances on the one or more partitions.

3. The method according to claims 1 or 2, wherein creating the one or more partitions and initializing the one or more additional model instances further comprises: iteratively creating a partition and initializing an additional model instance for the partition when one or more previously initialized model instances fail to have an accuracy that satisfies a threshold for the partition.

4. The method according to any of claims 1-3, further comprising: re-training the model instance on the associated data set excluding the one or more partitions.

5. A method, comprising: receiving, by a computing node, a set of data elements and information identifying one or more states, wherein each of the set of data elements is associated with a state of the one or more states, wherein each of the one or more states is associated with a model instance; determining whether a corresponding entry for the one or more states is included in a data structure, wherein the data structure comprises information identifying at least one state of the one or more states and associated model instances of the at least one state; and providing information that identifies, for the at least one state that has the corresponding entry in the data structure, the associated model instances, and initializing, for at least one other state of the one or more states that does not have the corresponding entry in the data structure, at least one model instance.

6. The method according to claim 5, wherein the data structure further stores information identifying an accuracy of the associated model instances with respect to the sets of data elements.

7. The method according to claims 5 or 6, further comprising: based on determining that the corresponding entry for the at least one other states is not included in the data structure, collecting data elements associated with the one or more states; training one or more new model instances on the collected data elements; and storing, in the data structure, an entry for an association between the one or more new model instances and the one or more states.

8. The method according to any of claims 5-7, further comprising: based on determining that the corresponding entry for the at least one other state is not included in the data structure, selecting one or more related model instances that are associated with other states similar to the at least one other state; and storing, in the data structure, an entry for an association between the one or more related model instances and the at least one other state.

9. The method according to any of claims 5-8, further comprising: based on determining that the corresponding entry for the at least one other state is not included in the data structure, executing and evaluating, with respect to the at least one other state, one or more model instances associated with entries in the data structure; selecting a model instance with a highest relative accuracy for the at least one other state; and storing, in the data structure, an entry for an association between the model instance with the highest relative accuracy and the at least one other state.

10. A method, comprising: initializing a model instance; training the model instance on an associated data set; evaluating an accuracy of the model instance with respect to data elements of the associated data set; determining whether the accuracy satisfies a threshold for the data elements of the associated data set; and determining to not create one or more additional model instances when the accuracy satisfies the threshold for the data elements, or creating one or more partitions of the associated data set for one or more data elements when the accuracy fails to satisfy the threshold for the one or more data elements, and initializing the one or more additional model instances for the one or more partitions of the associated data.

11. A method, comprising: receiving a set of data elements and information identifying one or more states, wherein each of the set of data elements is associated with a state of the one or more states, wherein each of the one or more states is associated with a model instance; determining whether a corresponding entry for the one or more states is included in a data structure, wherein the data structure comprises information identifying at least one state of the one or more states and associated model instances of the at least one state; and providing information that identifies, for the at least one state that has the corresponding entry in the data structure, the associated model instances, and initializing, for at least one other state of the one or more states that does not have the corresponding entry in the data structure, at least one model instance.

12. An apparatus, comprising: at least one processor; and at least one memory including computer program code, wherein the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform the method according to any of claims 1-11.

13. An apparatus, comprising: means for performing the method according to any of claims 1-11.

14. A non-transitory computer readable medium comprising program instructions for causing an apparatus to perform the method according to any of claims 1-11.

15. An apparatus, comprising: circuitry configured to perform the method according to any of claims 1-11. 27

16. A computer program product encoding instructions for performing the method according to any of claims 1-11.