WO2022153079A1 - Methods and apparatuses for providing candidate machine learning models - Google Patents

Methods and apparatuses for providing candidate machine learning models

Info

Publication number
WO2022153079A1
WO2022153079A1 (PCT/IB2021/050226)
Authority
WO
WIPO (PCT)
Prior art keywords
source
candidate
domain
application
target domain
Prior art date
Application number
PCT/IB2021/050226
Other languages
English (en)
Inventor
Dániel GÉHBERGER
Chunyan Fu
Martin Julien
Mohammad ABU LEBDEH
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/IB2021/050226 priority Critical patent/WO2022153079A1/fr
Publication of WO2022153079A1 publication Critical patent/WO2022153079A1/fr


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 - Machine learning
    • G06N 20/20 - Ensemble learning

Definitions

  • Embodiments described herein relate to methods and apparatuses for selecting one or more candidate source domains from a plurality of initial source domains, each of the one or more candidate source domains executing a candidate ML model suitable for reuse in a target domain.
  • The widespread deployment of the edge cloud is expected to play a major role in evolving 5G deployments. Besides becoming popular for problem solving in the application domain, machine learning is increasingly used for controlling IT systems (e.g. application scaling, control of physical infrastructure) due to the increased complexity of deployments, input data and parameters.
  • Various solutions have emerged in recent years to optimize the training of machine learning models (e.g. neural networks). However, the gains and revenue of applying machine learning come from applying the trained models, i.e. running inference in real systems.
  • A machine learning model may be trained centrally with data collected and aggregated from different sites, or federated learning may be applied, in which case the training is distributed but the local models are aggregated into one central model.
  • Some existing solutions require training a pre-model and evaluating all the possible models. However, particularly in an edge cloud setting, the number of sites running an application may change over time, so the pre-model would need to be retrained at every change. There may also be thousands of sites in the system with different machine learning models, making the evaluation process resource inefficient.
  • According to some embodiments there is provided a method, in a central coordinator node, for selecting one or more candidate source domains from a plurality of initial source domains, each of the one or more candidate source domains executing a candidate ML model suitable for reuse in a target domain, wherein each of the plurality of initial source domains is associated with source meta information relating to an application running in the initial source domain and/or to the initial source domain.
  • The method comprises receiving, from the target domain, a request for one or more candidate source domains or one or more candidate ML models, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; comparing the target meta information to the source meta information for each of the plurality of initial source domains; and selecting one or more candidate source domains from the plurality of initial source domains based on the comparison.
  • There is also provided a method, in a local coordinator node of a target domain, for determining a selected machine learning, ML, model for reuse of the ML model from a source domain in the target domain.
  • The method comprises transmitting a request to a central coordinator node for one or more candidate ML models or one or more candidate source domains, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; responsive to transmitting the request, receiving one or more candidate ML models; testing performance of the one or more candidate ML models in the target domain; and, responsive to at least one candidate ML model meeting a performance requirement, selecting a selected candidate ML model from the at least one candidate ML model meeting the performance requirement.
  • There is also provided a method, in a local coordinator node of a source domain, wherein the source domain is utilizing a machine learning, ML, model for an application.
  • The method comprises updating source meta information at a central coordinator node, wherein the source meta information comprises a plurality of information elements relating to the application and/or to the source domain; and, responsive to a request for the ML model, serializing the ML model and transmitting the serialized ML model to the central coordinator node or to a target domain.
  • There is also provided a central coordinator node for selecting one or more candidate source domains from a plurality of initial source domains, each of the one or more candidate source domains executing a candidate ML model suitable for reuse in a target domain, wherein each of the plurality of initial source domains is associated with source meta information relating to an application running in the initial source domain and/or to the initial source domain.
  • The central coordinator node comprises processing circuitry configured to cause the central coordinator node to: receive, from the target domain, a request for one or more candidate source domains or one or more candidate ML models, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; compare the target meta information to the source meta information for each of the plurality of initial source domains; and select one or more candidate source domains from the plurality of initial source domains based on the comparison.
  • There is also provided a local coordinator node of a target domain for determining a selected machine learning, ML, model for reuse of the ML model from a source domain in the target domain.
  • The local coordinator node of the target domain comprises processing circuitry configured to cause the local coordinator node of the target domain to: transmit a request to a central coordinator node for one or more candidate ML models or one or more candidate source domains, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; responsive to transmitting the request, receive one or more candidate ML models; test performance of the one or more candidate ML models in the target domain; and, responsive to at least one candidate ML model meeting a performance requirement, select a selected candidate ML model from the at least one candidate ML model meeting the performance requirement.
  • There is also provided a local coordinator node of a source domain, wherein the source domain is utilizing a machine learning, ML, model for an application.
  • The local coordinator node of the source domain comprises processing circuitry configured to cause the local coordinator node of the source domain to: update source meta information at a central coordinator node, wherein the source meta information comprises a plurality of information elements relating to the application and/or to the source domain; and, responsive to a request for the ML model, serialize the ML model and transmit the serialized ML model to the central coordinator node or to a local coordinator node of a target domain.
  • Figure 1 illustrates an example architecture of a system 100 according to some embodiments
  • Figure 2 illustrates a method in a central coordinator node for selecting one or more candidate source domains
  • Figure 3 illustrates a method in a target domain for determining a selected machine learning ML model for reuse of the machine learning model from a source domain in the target domain;
  • Figure 4 illustrates a method, in a source domain, wherein the source domain is utilizing a machine learning, ML, model for an application;
  • Figure 5 is a signaling diagram illustrating an example implementation of the methods of Figures 2, 3 and 4;
  • Figure 6 illustrates an example implementation of step 504 of Figure 5 in more detail
  • Figure 7 illustrates a method in a target domain for determining a selected machine learning, ML, model for reuse of the ML model from a source domain in the target domain;
  • Figure 8 illustrates an example model testing system
  • Figure 9 illustrates how feedback from the target domain may be handled at the central coordinator node
  • Figure 10 illustrates an example central coordinator node
  • Figure 11 illustrates an example local coordinator node of a target domain
  • Figure 12 illustrates an example local coordinator node of a source domain
  • Hardware implementation may include or encompass, without limitation, digital signal processor (DSP) hardware, a reduced instruction set processor, hardware (e.g., digital or analogue) circuitry including but not limited to application specific integrated circuit(s) (ASIC) and/or field programmable gate array(s) (FPGA(s)), and (where appropriate) state machines capable of performing such functions.
  • Embodiments described herein enable fast and automatic re-use of trained models.
  • The methods described herein may be advantageously applied in an edge cloud environment, as well as in other machine learning environments.
  • Methods described herein compare target meta information relating to an application running in the target domain and/or to the target domain, to source meta information relating to respective applications running on a pool of initial source domains and/or to the respective initial source domain.
  • The initial source domains may comprise source domains currently registered at a central coordinator node.
  • One or more candidate source domains may then be selected from the initial source domains.
  • ML models from the one or more candidate source domains may then be tested by the target domain, and an ML model may be selected based on performance.
  • Embodiments described herein may be applied in an edge cloud environment. It may be assumed that for many edge cloud applications a single centrally trained machine learning model cannot properly fulfill the requirements, i.e. its inference error is unexpectedly large, and a model that is trained on locally collected data can provide significantly better results.
  • Such applications include for example prediction of user activity for an application, anomaly detection, Quality of Service (QoS) prediction, or physical data center (e.g. cooling) control applications.
  • The term “application” used herein may comprise a cloud management application or a cloud user application.
  • An example application may be a scaling controller application that uses a machine learning model to predict user activity for the near future (e.g. time-series prediction).
  • The training of a machine learning model requires significant effort; in other words, it requires a large amount of training data, compute resources and time.
  • Instead, a trained machine learning model may be fetched from another, similar domain, so the cost of training from scratch is saved. With this method, the application can start working in a shorter time period.
  • The selected and re-used model may be further trained and fine-tuned in a later step, once enough data becomes available.
  • Figure 1 illustrates an example architecture of a system 100 according to some embodiments.
  • A central coordinator node 101 is in communication with a plurality of initial source domains 102a to 102n.
  • The central coordinator node 101 is also in communication with a target domain 103.
  • The term “target domain” is used to describe an environment in which an application starts running (for example, a new edge cloud site), where the target domain may be able to reuse a pre-trained ML model for the application rather than training one locally.
  • The term “source domain” is used to describe an environment (for example, an edge cloud site) from which such a pre-trained ML model may be obtained.
  • The central coordinator node 101 may communicate with a local coordinator node (104a and 104n) in each of the initial source domains 102a to 102n, and with a local coordinator node 105 in the target domain 103.
  • An application, “Application 1”, is running on a first initial source domain 102a.
  • The same application, “Application 1”, is also running on a second initial source domain 102n.
  • Application 1 includes a machine learning environment that is responsible for handling an encapsulated ML model (ML model 1 in the first initial source domain 102a and ML model 2 in the second initial source domain 102n).
  • ML model 1 and ML model 2 can both be used by Application 1; they can have the same purpose and solve the same problem.
  • The only difference between them is the data used for their training, i.e. local data from the respective initial source domains, which is expected to provide low inference error.
  • Embodiments described herein therefore provide a way of filtering the initial source domains 102a to 102n to provide one or more candidate source domains to the target domain that may have ML models that are suitable for reuse in the target domain 103.
  • Figure 2 illustrates a method in a central coordinator node (e.g. the central coordinator node 101 of Figure 1), for selecting one or more candidate source domains from a plurality of initial source domains, each of the one or more candidate source domains executing a candidate ML model suitable for reuse in a target domain, wherein each of the plurality of initial source domains is associated with source meta information relating to an application running in the initial source domain and/or to the initial source domain.
  • The central coordinator node may be a distributed node.
  • For example, a plurality of nodes may be capable of performing the method of Figure 2 in order to provide load sharing.
  • The central coordinator node may maintain a database of the source meta information for each of the plurality of initial source domains.
  • The initial source domains may transmit updates to the central coordinator node to update their associated source meta information.
  • The updates may, for example, be transmitted periodically.
  • In step 201, the central coordinator node receives a request from the target domain.
  • The request may request that the central coordinator node provide one or more candidate source domains (from which the target domain may retrieve candidate ML models), or it may request that the central coordinator node provide one or more candidate ML models.
  • The request comprises target meta information relating to an application running in the target domain and/or to the target domain.
  • The application running in the target domain 103 may be a cloud management application or a cloud user application. More specifically, the application running in the target domain 103 may comprise one of: prediction of user activity for an application, anomaly detection, Quality of Service (QoS) prediction, and a physical data center application.
  • The meta information may comprise a plurality of information elements.
  • Some information elements may relate to the domain (either target domain or source domain).
  • Some information elements may relate to the domain in that they relate to how the application runs on the domain.
  • For example, the plurality of information elements may comprise one or more of: a number of servers used by the application; a number of hardware accelerators used by the application; a detailed description of one or more servers used by the application; a software version used by the application; information relating to an uplink connection towards a central data center for the application; information relating to a downlink connection to the domain; a physical location of the domain; a number of users served by the domain; or traffic patterns observed at the domain.
  • Some information elements may relate specifically to the application.
  • The plurality of information elements may comprise one or more of: an identification of the application; a version of the application; a configuration or profile of the application (e.g. 5G mobile broadband or Internet of Things (IoT)).
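  • As an illustration only, the information elements above could be carried in a simple structured record, for example as in the following sketch; the field names and types are assumptions and are not prescribed by the embodiments.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class MetaInformation:
    """Illustrative container for the information elements described above."""
    # Application-related information elements
    app_id: str                        # identification of the application
    app_version: str                   # version of the application
    app_profile: Optional[str] = None  # configuration/profile, e.g. "5G mobile broadband" or "IoT"
    # Domain-related information elements
    num_servers: int = 0               # number of servers used by the application
    num_accelerators: int = 0          # number of hardware accelerators used by the application
    server_description: str = ""       # detailed description of the servers
    software_version: str = ""         # software version used by the application
    uplink_info: str = ""              # uplink connection towards a central data center
    downlink_info: str = ""            # downlink connection to the domain
    location: str = ""                 # physical location of the domain
    num_users: int = 0                 # number of users served by the domain
    traffic_pattern: List[float] = field(default_factory=list)  # e.g. observed hourly traffic
```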
  • In step 202, the central coordinator node compares the target meta information to the source meta information for each of the plurality of initial source domains.
  • The central coordinator node then selects one or more candidate source domains from the plurality of initial source domains based on the comparison.
  • The method executed in the central coordinator node therefore filters the initial source domains to determine the candidate source domains.
  • In some examples, the central coordinator node transmits an indication of the one or more candidate source domains to the target domain. In other examples (those in which the request in step 201 requests that the central coordinator node provide one or more candidate ML models to the target domain), the central coordinator node obtains a candidate ML model from each of the one or more candidate source domains and transmits the obtained one or more candidate ML models to the target domain.
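  • Purely as a sketch of this request handling, the central coordinator node's behaviour might be expressed as follows; the registry, the fetch_model callable and the overall_similarity function (sketched further below) are illustrative assumptions rather than defined interfaces.

```python
def handle_candidate_request(target_meta, registry, fetch_model, top_k=3, return_models=False):
    """Compare target meta information against each registered initial source domain
    and return either candidate source domain identifiers or candidate ML models.

    registry:    dict mapping source domain id -> MetaInformation
    fetch_model: callable(domain_id) -> candidate ML model obtained from that domain
    """
    # Compare the target meta information to each source's meta information,
    # considering only domains that run the same application as the target.
    scored = [
        (domain_id, overall_similarity(target_meta, source_meta))
        for domain_id, source_meta in registry.items()
        if source_meta.app_id == target_meta.app_id
    ]
    # Select the closest-matching initial source domains as candidates.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    candidates = [domain_id for domain_id, _ in scored[:top_k]]

    if return_models:
        # The request asked for candidate ML models rather than candidate domains.
        return [fetch_model(domain_id) for domain_id in candidates]
    return candidates
```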
  • Figure 3 illustrates a method in a target domain for determining a selected machine learning, ML, model for reuse of the ML model from a source domain in the target domain.
  • The target domain transmits a request to a central coordinator node for one or more candidate source domains or one or more candidate ML models, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain.
  • The target meta information may comprise a plurality of information elements as described above.
  • Responsive to transmitting the request, the target domain receives one or more candidate ML models.
  • In some examples, the one or more candidate ML models are received from the central coordinator node.
  • In other examples, the target domain receives an indication of the one or more candidate source domains from the central coordinator node, and then retrieves the one or more candidate ML models from the candidate source domains.
  • In step 303, the target domain tests the performance of the one or more candidate ML models in the target domain.
  • Responsive to at least one candidate ML model meeting a performance requirement, the target domain selects a selected candidate ML model from the at least one candidate ML model meeting the performance requirement.
  • In other words, the target domain may select one of the candidate ML models for reuse in the target domain.
  • In some cases, none of the candidate ML models meets the performance requirement, in which case no candidate ML model is selected and the target domain may train a new ML model locally at the target domain.
  • The target domain may transmit an indication of the selected candidate ML model to the central coordinator node.
  • Alternatively, the target domain may transmit an indication to the central coordinator node that no candidate source domain has been selected.
  • Figure 4 illustrates a method, in a source domain, wherein the source domain is utilizing a machine learning, ML, model for an application.
  • The source domain of Figure 4 may comprise an initial source domain.
  • In step 401, the source domain updates source meta information at a central coordinator node, wherein the source meta information comprises a plurality of information elements relating to the application and/or to the source domain.
  • The source meta information may comprise any of the plurality of information elements listed above.
  • Responsive to receiving a request for the ML model, the source domain serializes the ML model and transmits the serialized ML model to the central coordinator node or to a target domain.
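  • A minimal sketch of this behaviour at the local coordinator of a source domain is given below; the transport calls, the serialization format and the helper names are assumptions, and any serialization format agreed with the central coordinator node could be used instead of pickle.

```python
import pickle

def update_source_meta(central_coordinator, domain_id, source_meta):
    """Step 401: push the current source meta information to the central coordinator node."""
    central_coordinator.update_meta(domain_id, source_meta)  # assumed remote call

def handle_model_request(ml_model, send):
    """On a request for the ML model, serialize it and transmit it.

    send: callable that delivers the serialized bytes to the central coordinator
          node or directly to the target domain.
    """
    serialized = pickle.dumps(ml_model)  # illustrative; a framework-specific format could be used
    send(serialized)
```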
  • Figure 5 is a signaling diagram illustrating an example implementation of the methods of Figures 2, 3 and 4.
  • In steps 501a, 501b and 501c, the initial source domains 102a, 102b and 102c (only three initial source domains are illustrated, but it will be appreciated that there may be any number of initial source domains) transmit updates to their associated source meta information to the central coordinator node 101.
  • Steps 501a, 501b and 501c correspond to step 401 of Figure 4.
  • These updates to the source meta information may be sent periodically to the central coordinator node, or may, for example, be transmitted when a change to the source meta information occurs.
  • Some of the information elements of the meta information may be considered as static, e.g. the physical location. However, other information elements are dynamic, e.g. the traffic patterns. These dynamic information elements may be updated in steps 501a to 501c.
  • In step 502, an application, Application 1, starts at the target domain 103.
  • In step 503, the target domain transmits a request to the central coordinator node for one or more candidate ML models, wherein the request comprises target meta information relating to the application running in the target domain and/or to the target domain.
  • In step 504, the central coordinator node evaluates the received target meta information and selects candidate source domains. Step 504 will be described in more detail below with reference to Figure 6. In this example, the central coordinator node selects the initial source domains 102a and 102b as candidate source domains; the initial source domain 102c is not selected as a candidate source domain.
  • In some examples, step 504 comprises utilizing weighting values that place different priorities on the different information elements.
  • In steps 505a and 505b, the central coordinator node requests a candidate ML model from each of the candidate source domains 102a and 102b.
  • In steps 505c and 505d, the candidate source domains 102a and 102b return candidate ML models to the central coordinator node 101.
  • In step 506, the central coordinator node provides the candidate ML models received in steps 505c and 505d to the target domain 103.
  • In step 507, the target domain 103 tests the candidate ML models. Step 507 is described in more detail with reference to Figure 7 below.
  • In step 508, the target domain 103 indicates to the central coordinator node which of the one or more candidate source domains has been selected by the target domain, or that none of the one or more candidate source domains has been selected.
  • Step 508 may further comprise an indication of the performance (e.g. performance metrics such as KPIs) of the candidate ML models in the target domain.
  • In step 509, the central coordinator node updates the weighting values used in step 504 based on the indication received in step 508. An example of how this step may be performed will be described in more detail with reference to Figure 9.
  • Figure 6 illustrates an example implementation of step 504 of Figure 5 in more detail.
  • In step 601, the central coordinator node compares the identification of the application provided in the target meta information to the identification of the application provided in the source meta information for each initial source domain.
  • The identification of the application provided in the target meta information may be required to match exactly the identification of the application provided in the source meta information.
  • In other words, the central coordinator node may be configured to filter out initial source domains that are not running the same application as the target domain.
  • In some examples, initial source domains running different versions of the application running in the target domain may be accepted as candidate source domains.
  • The version of the application may also be used to determine which applications qualify as the same. For example, some versions of an application may be compatible with each other, while others may not.
  • The central coordinator node therefore determines whether any of the initial source domains are running the same application as the target domain.
  • For example, the initial source domains 102a and 102b may both be running Application 1.
  • Initial source domain 102a may be running Version 1 of Application 1.
  • Initial source domain 102b may be running Version 2 of Application 1.
  • Initial source domain 102c, however, may be running Application 2, and is therefore not running the same application as the target domain 103.
  • If none of the initial source domains is running the same application as the target domain, the method passes to step 603, in which it is determined that there are no possible candidate source domains.
  • In this case, an error may be transmitted to the target domain indicating that no candidate source domains could be found, and the target domain may train a new ML model locally.
  • In step 604, the central coordinator node compares the target meta information to the source meta information for each initial source domain running the same application as the target domain.
  • Step 604 may comprise, for each initial source domain (or each initial source domain running the application as determined by step 603), generating a plurality of numerical values each representative of how similar one of the information elements in the target meta information is to a corresponding information element in the source meta information associated with the initial source domain.
  • For each information element type, a comparison function may be defined that compares two values of that type to give a numerical value representing their similarity.
  • The numerical values may be normalised across information element types (for example, to a number between 0 and 1). For some information element types, e.g. the number of servers, the function gives 1 (or the highest value) if the numbers are identical and less if there is a difference.
  • For the traffic patterns observed, for example, the traffic may be compared hour by hour and an average similarity may be calculated.
  • In step 605, for each initial source domain (or each initial source domain running the application as determined by step 603), the central coordinator node determines an overall similarity value by combining the plurality of numerical values. For example, the central coordinator node may combine the plurality of numerical values by calculating a weighted sum of the plurality of numerical values. The weighted sum may be calculated based on weighting values each associated with a respective one of the information elements in the target meta information. The weighting values may be associated with an application type of the application.
  • The weighting values may be configured such that they place higher weight on the information elements that are relevant for the application with which they are associated. In other words, a match or mismatch in an information element may be prioritized more highly, using the weighting values, if that element is relevant for the application.
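  • The per-element comparison and the weighted combination of steps 604 and 605 might look like the following sketch; the specific comparison functions, element names and default weights are illustrative assumptions.

```python
def numeric_similarity(a, b):
    """Return 1.0 for identical values, decreasing towards 0.0 as the values differ."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b), 1)

def traffic_similarity(pattern_a, pattern_b):
    """Compare traffic patterns hour by hour and return the average similarity."""
    hours = min(len(pattern_a), len(pattern_b))
    if hours == 0:
        return 0.0
    return sum(numeric_similarity(pattern_a[h], pattern_b[h]) for h in range(hours)) / hours

# One comparison function per information element type, each returning a value in [0, 1].
COMPARATORS = {
    "num_servers": numeric_similarity,
    "num_users": numeric_similarity,
    "traffic_pattern": traffic_similarity,
    "software_version": lambda a, b: 1.0 if a == b else 0.0,
}

def overall_similarity(target_meta, source_meta, weights=None):
    """Combine the per-element similarities into a weighted overall similarity value."""
    if weights is None:
        weights = {name: 1.0 / len(COMPARATORS) for name in COMPARATORS}  # equal weights by default
    return sum(
        weights.get(name, 0.0) * compare(getattr(target_meta, name), getattr(source_meta, name))
        for name, compare in COMPARATORS.items()
    )
```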
  • The weighting values from previous ML model selections may be stored in the central coordinator node and used for subsequent candidate model selection requests.
  • The weighting values may evolve over time, using the evaluation feedback from the target domain, as more and more selection requests are fulfilled (this is described in more detail with reference to Figure 9).
  • The central coordinator node then selects the one or more candidate source domains based on the overall similarity values. For example, the central coordinator node may select, as the one or more candidate source domains, the initial source domains associated with overall similarity values greater than a predetermined threshold value. In some examples, the central coordinator node may select a fixed number of candidate source domains. For example, if 10 initial source domains are running the same application type as the target domain, the central coordinator node may select only the 3 candidate source domains that are closest matching based on the number of users served and the traffic patterns observed, since these properties have high weighting values.
  • In some examples, the central coordinator node may group initial source domains together if their associated source meta information shows significant similarity. The central coordinator node may then select one or more initial source domains from each of the closest matching groups as the one or more candidate source domains.
  • Figure 7 illustrates a method in a target domain for determining a selected machine learning, ML, model for reuse of the ML model from a source domain in the target domain. The method of Figure 7 may, for example, be performed by a local coordinator node in the target domain, as illustrated in Figure 1.
  • In step 701, a new application is started at a target domain.
  • The new application may benefit from the use of a pretrained ML model.
  • In step 702, the target domain transmits a request to a central coordinator node for one or more candidate ML models or one or more candidate source domains, wherein the request comprises target meta information relating to the application running in the target domain and/or to the target domain.
  • Responsive to transmitting the request in step 702, in step 703 the target domain receives the one or more candidate ML models.
  • In some examples, the candidate ML models are received from the central coordinator node.
  • In other examples, the target domain may receive an indication of one or more candidate source domains, and the target domain may then obtain the one or more candidate ML models from the one or more candidate source domains.
  • In step 704, the target domain determines whether the candidate ML model selection has failed, in other words whether any candidate ML models are available. If no candidate ML models are available (for example, if the same application is not running at any of the source domains in the system), the method passes to step 705, in which the target domain trains a new ML model for the application. Many possible methods for training a new ML model will be appreciated by those skilled in the art.
  • Otherwise, the method passes to step 706, in which the target domain tests the performance of the one or more candidate ML models in the target domain.
  • The target domain may test the one or more candidate ML models using locally collected data. An example of this testing is described in more detail with reference to Figure 8.
  • The purpose of the testing of step 706 is to find the best performing of the one or more candidate ML models in the context of the target domain and the application.
  • The test may be executed at the target domain.
  • A black box test method may be used; in other words, the model structure and parameters are not modified.
  • The testing may be performed either offline or online, based on the target domain’s available resources. In the former case, data is collected and pre-processed prior to the testing, and enough space may be required for storing the testing data; in the latter case, online data samples are pre-processed and fed into the models directly, so there is no need to store testing data.
  • The length of the testing phase may be based on the properties of the application; for example, the data samples collected for testing may need to be sufficient to incorporate seasonal traffic changes of the application.
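  • As an illustration of the offline and online testing modes described above, the testing loop could be sketched as follows; the model interface, the pre-processing step and the error metric are assumptions.

```python
def test_model_offline(model, stored_samples, preprocess, error_metric):
    """Offline testing: the data was collected and stored before the test starts."""
    errors = []
    for raw_input, actual in stored_samples:
        prediction = model.predict(preprocess(raw_input))  # black box: the model is not modified
        errors.append(error_metric(prediction, actual))
    return sum(errors) / len(errors)  # average inference error over the test set

def test_model_online(model, monitoring_system, preprocess, error_metric, num_samples):
    """Online testing: samples are requested from the monitoring system and fed to the
    model directly, so no testing data needs to be stored."""
    errors = []
    for _ in range(num_samples):
        raw_input, actual = monitoring_system.next_sample()  # assumed monitoring interface
        prediction = model.predict(preprocess(raw_input))
        errors.append(error_metric(prediction, actual))
    return sum(errors) / len(errors)
```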
  • In step 707, the target domain determines whether any of the one or more candidate ML models meets a performance requirement.
  • If so, the method passes to step 708, in which the target domain selects a selected candidate ML model from the at least one candidate ML model meeting the performance requirement.
  • The target domain may then utilize the selected candidate ML model in the target domain.
  • In some examples, the selected candidate ML model is updated based on data collected when using the selected candidate ML model in the target domain.
  • The performance requirement may be based on model Key Performance Indicators (KPIs), such as F1 score and model reaction time, and/or application KPIs, such as the application’s response time and throughput.
  • If no candidate ML model meets the performance requirement, the method may pass to step 705, in which the target domain trains a new ML model at the target domain.
  • In this case, the target domain may also transmit an indication to the central coordinator node that no candidate ML model has been selected.
  • Alternatively, the method may first pass to step 709, in which it is determined whether a maximum number of selection iterations has been reached. If the maximum number of selection iterations has been reached, the method passes to step 705.
  • Otherwise, the method may pass to step 710, in which the target domain sends a request for a new set of candidate ML models to the central coordinator node. Based on the local testing information, the central coordinator node may be able to change the weights used for the selection of the candidate source domains and may be able to provide a different set of candidate ML models to the target domain. The method may then return to step 703, and a count of the selection iterations is increased.
  • When a candidate ML model has been selected, the target domain transmits an indication of the selected candidate ML model to the central coordinator node. In some examples, the target domain also transmits information relating to the results of step 706.
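  • Putting the steps of Figure 7 together, the selection loop at the local coordinator of the target domain could be sketched along the following lines; the coordinator interface and the helper functions (test_model, meets_requirement, train_locally) are placeholders for the steps described above, not defined APIs.

```python
def obtain_model_for_application(target_meta, coordinator, test_model, meets_requirement,
                                 train_locally, max_iterations=3):
    """Request, test and select a candidate ML model, falling back to local training."""
    for _ in range(max_iterations):
        # Steps 702/703: request candidates and receive candidate ML models.
        candidates = coordinator.request_candidates(target_meta)
        if not candidates:
            break  # step 704: selection failed, no candidate ML models are available
        # Step 706: test the candidate ML models in the target domain.
        results = [(model, test_model(model)) for model in candidates]
        # Steps 707/708: select the best candidate that meets the performance requirement.
        acceptable = [(model, score) for model, score in results if meets_requirement(score)]
        if acceptable:
            selected, _ = max(acceptable, key=lambda pair: pair[1])
            coordinator.report_selection(selected, results)  # feedback used for weight updates
            return selected
        # Steps 709/710: no model was good enough; ask for a new set of candidates and retry.
        coordinator.report_failure(results)
    # Step 705: train a new ML model locally at the target domain.
    return train_locally(target_meta)
```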
  • Figure 8 illustrates an example model testing system in which the input is a list of the candidate ML models and the output is a selected candidate ML model or, if there is no model fulfilling the model KPI requirements, an error.
  • The testing system comprises a Model Distributor 801, which is responsible for initializing one or more model testers for the model test. This process can be done in parallel or round robin, based on the resource limits of the target domain.
  • A Model Tester 802 is responsible for testing one model in the context of the application. In the offline case, it retrieves the testing data from storage, pre-processes the data based on the source meta information associated with the model and tests the model. In the online case, the Model Tester 802 requests a data sample from the Monitoring System 803, pre-processes the data and performs the test.
  • The Monitoring System 803 is responsible for monitoring the target domain and is also responsible for monitoring the model’s performance metrics during the test, such as the number of model inferences per second and the model process CPU utilization.
  • The Model Evaluator 804 is responsible for evaluating models. It collects the inference results from the Model Tester 802 and the model performance results from the Monitoring System 803 once a test is done.
  • The Model Evaluator 804 may, for example, first eliminate candidate ML models that do not fulfil the performance requirement (e.g. model KPI requirements) and may then rank the remaining candidate ML models based on their scores.
  • The model score calculation can, for example, be based on a weight w (where 0 ≤ w ≤ 1) multiplying each model KPI value, with the sum of the weights being 1.
  • The weights can, for example, be pre-defined so as to be fair to each KPI.
  • The testing can be done by providing the monitoring information for the model and evaluating the error between the predicted and the actual values after a short time period. Based on the accuracy of the prediction, the best candidate ML model can be selected from the one or more candidate ML models.
  • The testing may be considered successful if there is at least one candidate ML model among the one or more candidate ML models that satisfies the performance requirement (e.g. an inference error related requirement). If there are multiple such models, then the best one is selected.
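  • A sketch of this score-based ranking is given below; the KPI names, the requirement check and the assumption that each KPI value is already normalized are illustrative.

```python
def score_model(kpi_values, kpi_weights):
    """Weighted sum of model KPIs, with 0 <= w <= 1 per KPI and the weights summing to 1.

    kpi_values:  e.g. {"f1_score": 0.91, "reaction_time": 0.80}
    kpi_weights: e.g. {"f1_score": 0.5, "reaction_time": 0.5}
    """
    return sum(kpi_weights[name] * value for name, value in kpi_values.items())

def pick_best_model(candidates, kpi_weights, meets_requirement):
    """Eliminate models that do not fulfil the performance requirement, rank the rest
    by score and return the best, or None if no model qualifies (i.e. an error case)."""
    qualifying = [
        (model, score_model(kpis, kpi_weights))
        for model, kpis in candidates          # candidates: list of (model, kpi_values) pairs
        if meets_requirement(kpis)
    ]
    if not qualifying:
        return None
    return max(qualifying, key=lambda pair: pair[1])[0]
```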
  • In some examples, the local testing results (e.g. performance metrics) are sent to the central coordinator node.
  • Figure 9 illustrates how feedback from the target domain may be handled at the central coordinator node.
  • When no previous information is available for an application type, the central coordinator node may use equal or random weighting values.
  • The weighting values may then be updated automatically.
  • Alternatively, expert opinion regarding the weighting values may be applied during the first deployment of a new application.
  • Certain typical weight profiles may also be applied, e.g. one that considers hardware-related properties with higher weighting values, another that favors software properties, and a third that prioritizes user activity. If the weighting values are unknown, the central coordinator node may select one or more models according to each profile to speed up the convergence of the weighting values for the given application.
  • In step 901, an indication of a selected candidate ML model is received from the target domain.
  • The source meta information of the source domain from which the selected candidate ML model originated is then compared with the target meta information in step 902.
  • The central coordinator node then adjusts the weighting values associated with the application type to prefer the information elements that are closest matching between the source meta information of the selected candidate source domain and the target meta information. In other words, the weighting values may be changed so that closely matching properties are given higher weighting values, while the weighting values of mismatching properties are lowered.
  • Alternatively, the central coordinator node may receive an indication that none of the one or more candidate ML models has been selected by the target domain.
  • In this case, the central coordinator node compares the source meta information of the best performing candidate ML models at the target domain to the target meta information. The best performing candidate ML models are the candidate ML models that performed best when tested, even though none performed well enough to be selected.
  • The central coordinator node may then adjust the weighting values associated with the application type to prefer the information elements that are closest matching between the source meta information of the best performing candidate source domain(s) and the target meta information.
  • As an example, the meta information of the selected candidate source domain and the target domain are compared. Where a close match is found for the properties describing the number of users served and the traffic patterns, the weighting values for these properties may be given higher values for future selections, while the rest of the information elements (e.g. available hardware) may not provide a close match and may therefore be given lower weighting values.
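  • The feedback handling of Figure 9 could adjust the weighting values roughly as in the sketch below; the per-element similarities, the matching threshold and the adjustment factor are assumptions.

```python
def update_weights(weights, element_similarities, adjustment=0.1, match_threshold=0.8):
    """Raise the weights of information elements that matched closely between the selected
    (or best performing) source domain and the target domain, lower the others, and
    renormalize so that the weights still sum to 1."""
    updated = {}
    for name, weight in weights.items():
        if element_similarities.get(name, 0.0) >= match_threshold:
            updated[name] = weight * (1.0 + adjustment)   # closely matching property: prefer
        else:
            updated[name] = weight * (1.0 - adjustment)   # mismatching property: de-prioritize
    total = sum(updated.values())
    return {name: weight / total for name, weight in updated.items()}
```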
  • Figure 10 illustrates a central coordinator node 1000 comprising processing circuitry (or logic) 1001.
  • The processing circuitry 1001 controls the operation of the central coordinator node 1000 and can implement the method described herein in relation to a central coordinator node 1000.
  • The processing circuitry 1001 can comprise one or more processors, processing units, multi-core processors or modules that are configured or programmed to control the central coordinator node 1000 in the manner described herein.
  • The processing circuitry 1001 can comprise a plurality of software and/or hardware modules that are each configured to perform, or are for performing, individual or multiple steps of the method described herein in relation to the central coordinator node 1000.
  • The processing circuitry 1001 of the central coordinator node 1000 is configured to: receive from the local coordinator node of the target domain a request for one or more candidate source domains or one or more candidate ML models, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; compare the target meta information to the source meta information for each of the plurality of initial source domains; and select one or more candidate source domains from the plurality of initial source domains based on the comparison.
  • The central coordinator node 1000 may optionally comprise a communications interface 1002.
  • The communications interface 1002 of the central coordinator node 1000 can be for use in communicating with other nodes, such as local coordinator nodes of the source and/or target domains, or with other virtual nodes.
  • The communications interface 1002 of the central coordinator node 1000 can be configured to transmit to and/or receive from other nodes requests, resources, information, data, signals, or similar.
  • The processing circuitry 1001 of the central coordinator node 1000 may be configured to control the communications interface 1002 of the central coordinator node 1000 to transmit to and/or receive from other nodes requests, resources, information, data, signals, or similar.
  • The central coordinator node 1000 may comprise a memory 1003.
  • The memory 1003 of the central coordinator node 1000 can be configured to store program code that can be executed by the processing circuitry 1001 of the central coordinator node 1000 to perform the method described herein in relation to the central coordinator node 1000.
  • The memory 1003 of the central coordinator node 1000 can be configured to store any requests, resources, information, data, signals, or similar that are described herein.
  • The processing circuitry 1001 of the central coordinator node 1000 may be configured to control the memory 1003 of the central coordinator node 1000 to store any requests, resources, information, data, signals, or similar that are described herein.
  • Figure 11 illustrates a local coordinator node 1100 of a target domain comprising processing circuitry (or logic) 1101.
  • The processing circuitry 1101 controls the operation of the local coordinator node 1100 and can implement the method described herein in relation to a local coordinator node 1100 (of a target domain).
  • The processing circuitry 1101 can comprise one or more processors, processing units, multi-core processors or modules that are configured or programmed to control the local coordinator node 1100 in the manner described herein.
  • The processing circuitry 1101 can comprise a plurality of software and/or hardware modules that are each configured to perform, or are for performing, individual or multiple steps of the method described herein in relation to the local coordinator node 1100.
  • The processing circuitry 1101 of the local coordinator node 1100 is configured to: transmit a request to a central coordinator node for one or more candidate ML models or one or more candidate source domains, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; responsive to transmitting the request, receive one or more candidate ML models; test performance of the one or more candidate ML models in the target domain; and, responsive to at least one candidate ML model meeting a performance requirement, select a selected candidate ML model from the at least one candidate ML model meeting the performance requirement.
  • The local coordinator node 1100 may optionally comprise a communications interface 1102.
  • The communications interface 1102 of the local coordinator node 1100 can be for use in communicating with other nodes, such as the central coordinator node and/or local coordinator nodes of source domains, or with other virtual nodes.
  • The communications interface 1102 of the local coordinator node 1100 can be configured to transmit to and/or receive from other nodes requests, resources, information, data, signals, or similar.
  • The processing circuitry 1101 of the local coordinator node 1100 may be configured to control the communications interface 1102 of the local coordinator node 1100 to transmit to and/or receive from other nodes requests, resources, information, data, signals, or similar.
  • The local coordinator node 1100 may comprise a memory 1103.
  • The memory 1103 of the local coordinator node 1100 can be configured to store program code that can be executed by the processing circuitry 1101 of the local coordinator node 1100 to perform the method described herein in relation to the local coordinator node 1100.
  • The memory 1103 of the local coordinator node 1100 can be configured to store any requests, resources, information, data, signals, or similar that are described herein.
  • The processing circuitry 1101 of the local coordinator node 1100 may be configured to control the memory 1103 of the local coordinator node 1100 to store any requests, resources, information, data, signals, or similar that are described herein.
  • Figure 12 illustrates a local coordinator node 1200 of a source domain comprising processing circuitry (or logic) 1201.
  • The processing circuitry 1201 controls the operation of the local coordinator node 1200 and can implement the method described herein in relation to a local coordinator node 1200 (of a source domain).
  • The processing circuitry 1201 can comprise one or more processors, processing units, multi-core processors or modules that are configured or programmed to control the local coordinator node 1200 in the manner described herein.
  • The processing circuitry 1201 can comprise a plurality of software and/or hardware modules that are each configured to perform, or are for performing, individual or multiple steps of the method described herein in relation to the local coordinator node 1200.
  • The processing circuitry 1201 of the local coordinator node 1200 is configured to: update source meta information at a central coordinator node, wherein the source meta information comprises a plurality of information elements relating to the application and/or to the source domain; and, responsive to a request for the ML model, serialize the ML model and transmit the serialized ML model to the central coordinator node or to a target domain.
  • The local coordinator node 1200 may optionally comprise a communications interface 1202.
  • The communications interface 1202 of the local coordinator node 1200 can be for use in communicating with other nodes, such as the central coordinator node and/or the local coordinator node of target domains, or with other virtual nodes.
  • The communications interface 1202 of the local coordinator node 1200 can be configured to transmit to and/or receive from other nodes requests, resources, information, data, signals, or similar.
  • The processing circuitry 1201 of the local coordinator node 1200 may be configured to control the communications interface 1202 of the local coordinator node 1200 to transmit to and/or receive from other nodes requests, resources, information, data, signals, or similar.
  • The local coordinator node 1200 may comprise a memory 1203.
  • The memory 1203 of the local coordinator node 1200 can be configured to store program code that can be executed by the processing circuitry 1201 of the local coordinator node 1200 to perform the method described herein in relation to the local coordinator node 1200.
  • The memory 1203 of the local coordinator node 1200 can be configured to store any requests, resources, information, data, signals, or similar that are described herein.
  • The processing circuitry 1201 of the local coordinator node 1200 may be configured to control the memory 1203 of the local coordinator node 1200 to store any requests, resources, information, data, signals, or similar that are described herein.
  • Embodiments described herein provide for automatic selection of machine learning models to re-use in a target domain such as an edge site in an edge cloud environment.
  • The embodiments described herein ensure selection of the best available option. Whilst the selected candidate ML model may be further trained, creating an even better model for the target domain, the application can start using a well-performing model in a short period of time.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the present invention relate to selecting one or more candidate source domains from a plurality of initial source domains, the or each candidate source domain executing a candidate ML model suitable for reuse in a target domain. Each of the plurality of initial source domains is associated with source meta information relating to an application running in the initial source domain and/or to the initial source domain. A method in a central coordinator node comprises receiving, from the target domain, a request for one or more candidate source domains or one or more candidate ML models, wherein the request comprises target meta information relating to an application running in the target domain and/or to the target domain; comparing the target meta information to the source meta information for each of the plurality of initial source domains; and selecting one or more candidate source domains from the plurality of initial source domains based on the comparison.
PCT/IB2021/050226 2021-01-13 2021-01-13 Methods and apparatuses for providing candidate machine learning models WO2022153079A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/IB2021/050226 WO2022153079A1 (fr) 2021-01-13 2021-01-13 Methods and apparatuses for providing candidate machine learning models

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2021/050226 WO2022153079A1 (fr) 2021-01-13 2021-01-13 Methods and apparatuses for providing candidate machine learning models

Publications (1)

Publication Number Publication Date
WO2022153079A1 (fr)

Family

ID=74191800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2021/050226 WO2022153079A1 (fr) 2021-01-13 2021-01-13 Methods and apparatuses for providing candidate machine learning models

Country Status (1)

Country Link
WO (1) WO2022153079A1 (fr)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134469A1 (en) * 2018-10-30 2020-04-30 Samsung Sds Co., Ltd. Method and apparatus for determining a base model for transfer learning
WO2020104072A1 (fr) * 2018-11-21 2020-05-28 Telefonaktiebolaget Lm Ericsson (Publ) Procédé et gestionnaire d'apprentissage automatique de gestion de la prédiction des caractéristiques d'un service

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200134469A1 (en) * 2018-10-30 2020-04-30 Samsung Sds Co., Ltd. Method and apparatus for determining a base model for transfer learning
WO2020104072A1 (fr) * 2018-11-21 2020-05-28 Telefonaktiebolaget Lm Ericsson (Publ) Procédé et gestionnaire d'apprentissage automatique de gestion de la prédiction des caractéristiques d'un service

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DAGA HARSHIT ET AL: "Cartel A System for Collaborative Transfer Learning at the Edge", PROCEEDINGS OF THE ACM SYMPOSIUM ON CLOUD COMPUTING, ACMPUB27, NEW YORK, NY, USA, 20 November 2019 (2019-11-20), pages 25 - 37, XP058477958, ISBN: 978-1-4503-6973-2, DOI: 10.1145/3357223.3362708 *
SAPRA DOLLY ET AL: "Deep Learning Model Reuse and Composition in Knowledge Centric Networking", 2020 29TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS (ICCCN), IEEE, 3 August 2020 (2020-08-03), pages 1 - 11, XP033833223, DOI: 10.1109/ICCCN49398.2020.9209668 *

Similar Documents

Publication Publication Date Title
CN107231436B (zh) Method and apparatus for service scheduling
US20210208983A1 (en) Multi-phase cloud service node error prediction
CN113361680B (zh) Neural network architecture search method, apparatus, device and medium
CN112528870B (zh) Multi-point vibration response prediction method based on a MIMO neural network and transfer learning
CN112364973B (zh) Multi-source frequency-domain load identification method based on neural networks and model transfer learning
US20210209481A1 (en) Methods and systems for dynamic service performance prediction using transfer learning
US11513851B2 (en) Job scheduler, job schedule control method, and storage medium
CN112052081B (zh) Task scheduling method, apparatus and electronic device
US20220012611A1 (en) Method and machine learning manager for handling prediction of service characteristics
CN113869521A (zh) Method, apparatus, computing device and storage medium for constructing a prediction model
WO2018040843A1 (fr) Use of dependent variable information to improve performance in learning the relationship between the dependent variable and independent variables
US20180357654A1 (en) Testing and evaluating predictive systems
US20240095529A1 (en) Neural Network Optimization Method and Apparatus
CN113886454A (zh) Cloud resource prediction method based on LSTM-RBF
CN111949530B (zh) Test result prediction method, apparatus, computer device and storage medium
WO2022153079A1 (fr) Methods and apparatuses for providing candidate machine learning models
US11750719B2 (en) Method of performing communication load balancing with multi-teacher reinforcement learning, and an apparatus for the same
US20230419172A1 (en) Managing training of a machine learning model
CN113132482B (zh) Reinforcement learning based adaptive parameter optimization method for a distributed message system
CN112637904B (zh) Load balancing method, apparatus and computing device
CN117480510A (zh) Generating confidence scores for machine learning model predictions
Majumdar et al. Improving scalability of 6G network automation with distributed deep Q-networks
CN115174681B (zh) Edge computing service request scheduling method, device and storage medium
US20230123841A1 (en) Automated application tiering among core and edge computing sites
CN113810212B (zh) Root cause localization method and apparatus for 5G slice user complaints

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21700999

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21700999

Country of ref document: EP

Kind code of ref document: A1