WO2022164454A1 - Machine learning model performance forecasting - Google Patents

Machine learning model performance forecasting

Info

Publication number
WO2022164454A1
Authority
WO
WIPO (PCT)
Prior art keywords
mlm
engine
hardware
model
procedures
Prior art date
Application number
PCT/US2021/015924
Other languages
English (en)
Inventor
Ravi Subramaniam
Adam Silva
Caio GUIMARAES
Hugo SECRETO
Henrique NISHI
Joao SOUZA
Leandro MENDES
Vinicius TREVISAN
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2021/015924 priority Critical patent/WO2022164454A1/fr
Publication of WO2022164454A1 publication Critical patent/WO2022164454A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • Figure 1 is a block and schematic diagram generally illustrating a machine learning model performance forecaster, according to one example.
  • Figure 2 is a schematic diagram generally illustrating a machine learning model, according to one example.
  • Figure 3 is a schematic diagram generally illustrating an operational flow diagram, according to one example.
  • Figure 4 is a flow diagram illustrating a method of forecasting performance of machine learning models on known hardware, according to one example.
  • Figure 5 is a block and schematic diagram generally illustrating a computing system for implementing a machine learning model performance forecaster, according to one example.
  • Figure 6 is a block and schematic diagram generally illustrating a computing system including a performance forecaster, according to one example.
  • ML engines are application-specific or domain-specific integrated circuits (ASICs) typically having multi-core designs (e.g., hundreds or thousands of processing cores or compute elements) and employing both low and high precision arithmetic along with optimized dataflow architectures and memory use (e.g., in-memory computing) to accelerate calculation and increase computational throughput when processing MLMs.
  • different ML engines are designed to optimize processing of one or more different types of MLMs.
  • MLMs are stored as model files having a representational data format which describes the architecture of the model (e.g., input, output, and hidden layers, layer weights, nodes of each layer, interconnections between nodes of different layers, and ML operations of each node/layer) along with operating parameters and, thus, describe or represent a process flow between input and output layers of an MLM.
  • it is advantageous for an MLM to be deployable in environments other than the environment or framework in which the model was initially trained (i.e., to be “portable”), such as on any number of different ML engines, for example.
  • an MLM must currently be installed and run on a given ML engine in order to determine/benchmark its performance thereon. If a model’s performance fails to meet expectations or requirements, the model’s architecture may be modified in attempts to improve its performance (e.g., layers may be combined/separated, parameters modified). However, the modified MLM must be re-run on the given ML engine in order to benchmark its adjusted performance. Such processes are time consuming and costly.
  • the present disclosure provides a technique to forecast the performance of MLMs on ML engines without installing and running the models thereon.
  • a model execution profile of a selected MLM, which is derived from a corresponding model file, is evaluated against a hardware execution profile of a selected ML engine to provide a prediction or forecast of the performance of the selected MLM on the selected ML engine.
  • such a performance forecast may include performance metrics such as an operational latency and energy consumption to execute the selected MLM on the selected ML engine, for example.
  • FIG. 1 is a block and schematic diagram generally illustrating an MLM performance forecaster 20 for predicting a performance of an MLM on an ML engine, according to one example of the present disclosure.
  • performance forecaster 20 includes a memory 22 to store forecasting instructions 24, and a processor 26 to execute forecasting instructions 24 to provide a performance forecast of a selected MLM on a selected ML engine.
  • forecasting instructions 24 include an input module 32, a translator module 34, and a predictor module 36.
  • processor 26 executes input module 32 to receive hardware execution profiles 38 (illustrated as hardware execution profiles 38-1 to 38-n) and model files 40 (illustrated as model files 40-1 to 40-n).
  • each hardware execution profile 38 defines the operation of a corresponding ML engine 39 (illustrated as ML engines 39-1 to 39-n), including different types, numbers, and dependencies of hardware embodied procedures along with corresponding operating metrics of each procedure, such as an execution latency, for example.
  • each model file 40 has a representational format describing an architecture (e.g., a neural network (NN) model) of a corresponding MLM 42 (illustrated as MLMs 42-1 to 42-n), the model architecture including input, output, and hidden layers, layer weights, numbers of nodes of each layer, interconnections between nodes, ML procedures of each node/layer, and parameters and ordering of ML procedures, for example.
  • each model file 40 may have one of any number of representational formats.
  • each model file 40 has an Open Neural Network Exchange (ONNX) file format.
  • the ONNX format is an open-source format (e.g., common sets of ML functions, operations, and sub-operations, and sets of parameters) which describes the architecture and parameters of an MLM 42, and which enables developers to more easily move MLMs between frameworks for MLM training and inferencing to provide network architecture flexibility.
  • model files 40 may have one of any number of suitable representational file formats other than the ONNX format, such as NNEF, and any number of proprietary representational formats.
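
As an illustration only (not part of the disclosure), a model file in the ONNX format can be inspected with the open-source `onnx` Python package to enumerate the ML procedures it describes; the file name below is a hypothetical placeholder:

```python
# Minimal sketch: enumerate the ML procedures (operator types) described by
# an ONNX model file, i.e., the representational format discussed above.
import onnx
from collections import Counter

model = onnx.load("selected_mlm.onnx")                 # a model file 40 (hypothetical name)
op_counts = Counter(node.op_type for node in model.graph.node)
for op_type, count in sorted(op_counts.items()):
    print(f"{op_type}: {count}")                       # e.g. "Conv: 5", "PRelu: 5"
```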
  • processor 26 executes translator module 34 to derive a model execution profile 44 of a selected MLM 42 from its corresponding model file 40 (e.g., an ONNX file).
  • the model execution profile 44 defines different types, numbers, ordering, and dependencies of ML procedures to execute the MLM 42 on a selected ML engine 39, with each ML procedure being mapped to a hardware embodied procedure, or a sequence of hardware embodied procedures, of the selected ML engine 39 as defined by the corresponding hardware execution profile 38.
  • ML procedures may include any number of ML functions (e.g., convolution function) and ML operations (e.g., mathematical operations, mathematical functions, memory operations, data transfers).
  • processor 26 executes predictor module 36 to derive a performance forecast 46 of the selected MLM 42 on the selected ML engine 39, where such derivation is performed based on the model execution profile 44 and the hardware execution profile 38 of the selected ML engine 39.
  • predictor module 36 derives the performance forecast 46 of the selected MLM 42 on the selected ML engine 39.
  • performance forecast 46 includes at least a predicted latency to execute the selected MLM 42 on the selected ML engine 39.
  • performance forecast 46 may include any number of other performance metrics such as an amount of energy consumed to execute the selected MLM 42, and whether the selected ML engine 39 has sufficient memory to execute the selected MLM 42, for example.
  • different ML engines 39 may be optimized for different types of MLMs 42 (e.g., convolutional models, deep stacking, stochastic, spiking, etc.) such that hardware architectures, including the processing types and capacities (e.g., numbers and types of discrete processing cores optimized to perform different operations, microcontrollers, CPUs), memory capacities and configurations, and data path structures, for example, vary between different ML engines 39.
  • the types, numbers, and dependencies of hardware embodied procedures and corresponding operating metrics, as defined by hardware execution profiles 38, will vary between different ML engines 39.
  • a first ML engine and a second ML engine may each include a large number of processing cores (sometimes referred to as neural processors) which are optimized to perform certain granular hardware-embodied procedures (e.g., add and multiply), where such optimized hardware-embodied procedures may vary between the first and second ML engines depending on the types of MLMs 42 the ML engines are optimized to execute.
  • the first and second ML engines may each include microcontrollers and CPUs to process more complex (or less frequently occurring) hardware-embodied procedures where, again, such microcontroller- and/or CPU-performed procedures may vary between the first and second ML engines.
  • Dependencies may also vary between the first and second ML engines.
  • Dependencies describe operating characteristics of an ML engine, such as a degree of processing parallelism (e.g., how many hardware-embodied procedures of a given type may be executed in parallel), ordering of procedures (e.g., whether procedures execute serially relative to one another, such as an “add” procedure needing to be performed before a “multiply” procedure, and, conversely, whether certain procedures cannot be processed consecutively relative to one another), memory operations (e.g., is an output of a first procedure provided directly to a second procedure or is such output transferred to memory before being provided to the second procedure), and operational constraints.
  • Constraints describe various operating restrictions of an ML engine, such as memory capacity (sufficient capacity to execute and store parameters of a given MLM), and minimum execution times for certain memory operations, for example.
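
For illustration, a hardware execution profile 38 of this kind could be represented as a small data structure. The following Python sketch is hypothetical: the field names, the example procedures, and all latency, energy, parallelism, and memory figures are invented placeholders, not values defined by the disclosure:

```python
from dataclasses import dataclass, field

@dataclass
class HardwareProcedure:
    name: str              # e.g., "add" or "multiply" (a hardware-embodied procedure)
    latency_ns: float      # execution latency per operation (operating metric)
    energy_nj: float       # energy consumed per operation (operating metric)
    parallelism: int       # how many instances can run in parallel (dependency)

@dataclass
class HardwareExecutionProfile:
    engine_name: str
    procedures: dict[str, HardwareProcedure]
    memory_capacity_bytes: int   # constraint: the MLM must fit in this memory
    serial_orderings: list[tuple[str, str]] = field(default_factory=list)
    # e.g., ("add", "multiply"): an add must complete before a multiply

# A hypothetical engine with 1024-wide parallel add/multiply lanes.
npu = HardwareExecutionProfile(
    engine_name="example-npu",
    procedures={
        "add":      HardwareProcedure("add", latency_ns=0.5, energy_nj=0.02, parallelism=1024),
        "multiply": HardwareProcedure("multiply", latency_ns=0.7, energy_nj=0.03, parallelism=1024),
    },
    memory_capacity_bytes=8 * 1024 * 1024,
)
```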
  • a same MLM 42 may execute differently on different ML engines 39.
  • a first ML engine may be optimized to process a given ML procedure of the MLM 42 via a single optimized hardware-embodied procedure (e.g., processing via a dedicated neural processor), while a second ML engine may process the given ML procedure via multiple hardware-embodied procedures.
  • translator module 34 derives a model execution profile 44 for a selected MLM 42 from its corresponding model file 40 in view of the hardware execution profile 38 of the selected ML engine 39 for which the performance of the selected MLM 42 is being forecast.
  • translator module 34 derives a model execution profile 44 including ML procedures which can be mapped to a hardware embodied procedure or to a sequence of hardware embodied procedures defined by the hardware execution profile of the selected ML engine 39.
  • the selected ML engine 39 may execute a given ML function (e.g., a convolution) using a sequence of hardware embodied procedures (e.g., add and multiply procedures).
  • translator module 34 may represent such ML function in the corresponding model execution profile 44 as multiple sub-procedures (e.g., add and multiply) that can be mapped to hardware-embodied procedures defined in the hardware execution profile 38 of the selected ML engine 39.
  • the selected ML engine 39 may execute a given ML function via a microcontroller such that translator module 34 may represent such ML function in the corresponding model execution profile 44 as a single ML procedure rather than multiple sub-procedures.
  • translator module 34 derives model execution profiles 44 defining ML procedures at a granular level which can be mapped to hardware embodied procedures at the granular level at which they are defined in the hardware execution profile of the selected ML engine 39.
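
Continuing the hypothetical profile sketch above, the granularity-matching behavior of translator module 34 might be sketched as follows; the decomposition table is an invented illustration, not the translator's actual mapping:

```python
# Minimal sketch of granularity matching: an ML procedure is kept whole when
# the engine's profile embodies it directly (e.g., a dedicated neural
# processor or a microcontroller-run function), and is otherwise expanded
# into a sequence of granular sub-procedures. The table is hypothetical.
DECOMPOSITIONS = {"Conv": ["multiply", "add"], "Gemm": ["multiply", "add"]}

def map_ml_procedure(op_type: str, profile: HardwareExecutionProfile) -> list[str]:
    if op_type.lower() in profile.procedures:
        return [op_type.lower()]          # single hardware-embodied procedure
    return DECOMPOSITIONS.get(op_type, [op_type])  # sequence of sub-procedures
```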
  • FIG. 2 is a schematic diagram generally illustrating an example of an MLM 42 implemented as a neural network (NN) 60, and graphically illustrates an architecture of NN 60 (such as input, output, and hidden layers; numbers of nodes at each layer, and interconnectivity between layers, for example) which is described by the representational format of the corresponding model file 40.
  • NN 60 includes an input layer 62 including a plurality of input nodes 64-1 to 64-n, an output layer 66 including a plurality of output nodes 68-1 to 68-k, and a plurality of hidden layers 70, such as illustrated by hidden layer 72 having a plurality of hidden nodes 74-1 to 74-m, and hidden layer 76 having a plurality of hidden nodes 78-1 to 78-p.
  • NN 60 may have any suitable number of hidden layers 70, and input layer 62, output layer 66, and each hidden layer 70 may have any suitable number of nodes, where each layer may have a different number of nodes.
  • outputs of each node of a layer are connected as weighted inputs to nodes of other layers of NN 60, such as illustrated by weighted connection 80, where different connections may have a different weight.
  • each node of each layer is connected to each node of the next layer of the NN to form what is referred to as a fully connected NN.
  • various ML procedures (e.g., functions, operations) are performed at the nodes of each layer such that NN 60 provides a set of output data 84 at output layer 66.
  • nodes of a given layer may not be connected to nodes of a next layer.
  • nodes of a given layer may be connected to nodes of a subsequent layer which is not the next layer (a so-called “skip connect” configuration).
  • a first portion of nodes of a given layer may be connected to nodes of a first subsequent layer, and a second portion of nodes of the given layer may be connected to nodes of a second subsequent layer. Any number of suitable interconnect arrangements may be employed between nodes of different layers.
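
For readers unfamiliar with the structure just described, a minimal numpy sketch of a fully connected arrangement follows; the layer sizes, weights, and activation are arbitrary illustrations, not those of NN 60:

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]                    # input, two hidden, and output layers
weights = [rng.standard_normal((m, n))        # one weight per node-to-node connection
           for m, n in zip(layer_sizes, layer_sizes[1:])]

x = rng.standard_normal(layer_sizes[0])       # data presented at the input layer
for w in weights:
    x = np.maximum(x @ w, 0.0)                # weighted sums plus an activation at each node
print(x)                                      # output data at the output layer
```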
  • Figure 3 is a graphical representation illustrating a derivation of an example model execution profile 144 from a corresponding model file 40 of a selected MLM 42 by translator module 34, according to one example.
  • the selected MLM 42 is a convolutional NN.
  • translator module 34 derives the parameters and ordering of ML functions and operations, on a layer-by-layer basis, for execution of the selected MLM 42.
  • Flow diagram 150 graphically represents the types and ordering of ML functions and operations, on a layer-by-layer basis, for the execution of the corresponding MLM 42. It is noted that, typically, the nodes of a given layer, such as nodes 74-1 to 74-m of hidden layer 72 of NN 60 (see Figure 2), each perform a same set of ML functions and/or operations.
  • an input layer and an output layer are respectively illustrated at 152 and 154, and hidden layers are illustrated at 160, 162, 164, 166, and 168.
  • input layer 152 is graphically illustrated as including a transpose operation to place data in a desired format for hidden layer 160.
  • Hidden layer 160 is graphically illustrated as including sequentially performed Convolution and PRelu functions followed by a MaxPool operation, with hidden layers 162 and 164 each including sequentially performed Convolution and PRelu functions.
  • Layers 166 and 168 are illustrated as including sequentially performed Convolution and PRelu functions, with layer 166 further including a Softmax operation.
  • the procedures of layers 152 and 160-164 are illustrated as being performed sequentially while the procedures of layers 166 and 168 may be performed in parallel (and being representative of a skip-layer architecture). While the Convolution and PRelu functions are illustrated and constructively treated as being part of a same layer for purposes of deriving model execution profile 44, it is noted that such functions may actually be arranged as separate layers in the architecture of the corresponding MLM 42.
  • A portion of model execution profile 144 is illustrated on the right-hand side of Figure 3, and illustrates ML procedures for executing the Convolution functions of each layer of the flow diagram illustrated by graph 150.
  • each of the Convolution functions is broken down in model execution profile 144 into more granular sub-operations of addition and multiplication, with such sub-operations being defined as distinct ML procedures which may be mapped to corresponding hardware-embodied procedures of the hardware execution profile 38 of the selected ML engine 39.
  • model execution profile 144 includes a total number of addition and multiplication operations executed for each layer 160-168, as respectively illustrated at 160-1 to 168-1, and total bytes of memory for each layer 160-168, as respectively illustrated at 160-2 to 168-2, to execute the corresponding Convolution functions.
  • in model execution profile 144, the convolution operations of the entire selected MLM 42 are determined to use about 100,780,118 add operations 170 and 101,801,166 multiply operations 172, and to consume about 4,084,192 bytes (about 4.1 MB) 174.
  • translator module 34 derives similar metrics for model execution profile 144 regarding the PRelu functions (and any other ML procedures). Additionally, each of the arrows between the ML procedures (e.g., transpose, convolution, PRelu, Softmax, etc.) indicates a transfer of data between ML procedures and/or layers. In one example, a total number of such data transfers, along with corresponding memory consumption, is derived from model file 40 and included in model execution profile 144.
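
The layer-by-layer counting illustrated in Figure 3 can be reproduced from shapes alone, since a Conv layer performs one multiply and (approximately) one accumulate per kernel element and output element. The following sketch uses hypothetical layer shapes; the totals quoted above come from the actual selected MLM 42:

```python
def conv2d_op_counts(out_h, out_w, out_ch, in_ch, k_h, k_w, dtype_bytes=4):
    """Count granular add/multiply sub-operations and weight memory for one
    Conv layer from its shapes, without executing the layer."""
    macs = out_h * out_w * out_ch * in_ch * k_h * k_w
    multiplies = macs                 # one product per kernel element and output
    adds = macs                       # roughly one accumulation per product
    weight_bytes = out_ch * in_ch * k_h * k_w * dtype_bytes
    return multiplies, adds, weight_bytes

# Hypothetical shapes for a single layer, for illustration only.
print(conv2d_op_counts(out_h=56, out_w=56, out_ch=64, in_ch=32, k_h=3, k_w=3))
```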
  • predictor module 36 determines from the hardware execution profile 38 of the selected ML engine 39 how many add operations (if any) can be processed in parallel and how many multiplication operations (if any) can be processed in parallel, and, based on the latency of each addition and multiplication operation, determines a total latency for the convolution functions.
  • predictor module 36 similarly determines a total latency for remaining ML procedures, such as for PRelu functions and memory operations. In one example, predictor module 36 aggregates such latencies to derive an overall predicted latency (a model latency) for executing the selected MLM 42 on the selected ML engine 39.
  • predictor module 36 aggregates a predicted total memory consumption to execute the selected MLM 42 from the memory consumption of each of the individual ML procedures and, based on a memory capacity included in the hardware execution profile of the selected ML engine 39, includes in the performance forecast 46 a prediction of whether the selected MLM 42 will execute on the selected ML engine 39.
  • based on an amount of power consumed by each hardware-embodied procedure, included as operating metrics in the hardware execution profile 38 of the selected ML engine 39, and based on an aggregation of the ML procedures to which hardware embodied procedures are mapped, predictor module 36 provides an estimate of the power and energy consumption of the selected ML engine 39 to execute the selected MLM 42.
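
A minimal sketch of this aggregation, reusing the hypothetical `HardwareExecutionProfile` and `npu` from earlier, might look as follows; dividing operation counts into parallel "waves" is one simple way to model parallelism and is an assumption, not the predictor's defined algorithm:

```python
import math

def forecast(proc_counts: dict[str, int],
             profile: HardwareExecutionProfile,
             model_bytes: int) -> dict:
    latency_ns = 0.0
    energy_nj = 0.0
    for name, count in proc_counts.items():
        hw = profile.procedures[name]
        waves = math.ceil(count / hw.parallelism)    # batches that can run in parallel
        latency_ns += waves * hw.latency_ns          # aggregate procedure latencies
        energy_nj += count * hw.energy_nj            # aggregate per-operation energy
    fits = model_bytes <= profile.memory_capacity_bytes
    return {"latency_ns": latency_ns, "energy_nj": energy_nj, "fits": fits}

# Using the add/multiply totals and ~4.1 MB quoted above with the
# hypothetical "npu" profile sketched earlier:
print(forecast({"add": 100_780_118, "multiply": 101_801_166}, npu, 4_084_192))
```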
  • MLM performance forecaster 20 calculates computational effort and memory consumption to provide a performance forecast of MLMs on ML engines (hardware) without running the model on the hardware.
  • the process is ML framework and hardware agnostic, and provides a quick and easy way to estimate whether an MLM will “fit” certain hardware (and vice versa) and, if so, predicts performance metrics of the MLM on the ML engine without the need for costly deployment thereon.
  • MLM performance forecaster 20 enables expected changes in performance resulting from modifications made to MLM architectures to be quickly and efficiently estimated.
  • MLM forecaster 20 simplifies the training and optimization of MLMs to particular hardware (e.g., a particular ML engine). By predicting a performance of the MLM on the hardware, changes can be made to the MLM’s network architecture while maintaining accuracy of the model’s output.
  • the performance of any number of configurations of the MLM may be predicted and, when combined with training, assist in optimizing the MLM architecture without having to run the MLM on the hardware, thereby saving time and reducing costs.
  • FIG. 4 is a flow diagram generally illustrating a process 200 for forecasting machine learning model (MLM) performance on machine learning (ML) engines.
  • Process 200 begins at 202 with receiving model files, each model file having a representational format describing an architecture of a corresponding MLM, such as processor 26 executing input module 32 to receive model files 40-1 to 40-n corresponding to MLMs 42-1 to 42-n, according to Figure 1, for example.
  • process 200 includes receiving hardware execution profiles, each hardware execution profile defining the operation of a corresponding ML engine and defining different types, numbers, and dependencies of hardware embodied procedures of the ML engine, along with operating metrics of each hardware embodied procedure, including an execution latency, such as processor 26 executing input module 32 to receive hardware execution profiles 38-1 to 38-n corresponding to ML engines 39-1 to 39-n, according to Figure 1, for example.
  • process 200 includes deriving a model execution profile of a selected MLM from its corresponding model file, the model execution profile defining different types, numbers, ordering, and dependencies of ML procedures to execute the MLM on a selected ML engine, each ML procedure mapped to a sequence of one or more hardware embodied procedures of the selected ML engine, such as processor 26 executing translator module 34 to derive a model execution profile 44 from a model file 40 corresponding to a selected MLM 42, according to Figure 1, for example.
  • process 200 includes deriving a performance forecast of the selected MLM on the selected ML engine based on the model execution profile of the selected MLM and the hardware execution profile of the selected ML engine, the performance forecast including a model latency, such as processor 26 executing predictor module 36 to derive a performance forecast 46 from the model execution profile 44 of the selected MLM 42 and the hardware execution profile 38 of the selected ML engine 39, according to Figure 1, for example.
  • process 200 includes deriving a performance forecast including additional performance metrics such as memory consumption, power consumption, and energy consumption of the selected MLM 42.
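
Tying the earlier sketches together, an end-to-end pass over process 200 might look like the following; it reuses the hypothetical `map_ml_procedure`, `HardwareExecutionProfile`, and `forecast` helpers sketched above, and it simplifies the translation step (each procedure is counted once per node rather than weighted by per-layer operation counts):

```python
from collections import Counter
import onnx

def process_200(model_path: str, profile: HardwareExecutionProfile) -> dict:
    model = onnx.load(model_path)                      # receive the model file
    counts: Counter = Counter()
    for node in model.graph.node:                      # translate: derive the
        for proc in map_ml_procedure(node.op_type, profile):  # model execution profile
            counts[proc] += 1    # simplification: a real translator would weight
                                 # each procedure by per-layer operation counts
    # assumes weights are stored as raw_data; ONNX can also use typed fields
    model_bytes = sum(len(t.raw_data) for t in model.graph.initializer)
    return forecast(counts, profile, model_bytes)      # predict: derive forecast 46
```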
  • FIG. 5 is a block and schematic diagram generally illustrating a computing system 300 for implementing MLM performance forecaster 20 according to one example.
  • computing system or computing device 300 includes processing units 302 and system memory 304, where system memory 304 may be volatile (e.g. RAM), non-volatile (e.g. ROM, flash memory, etc.), or some combination thereof.
  • Computing device 300 may also have additional features/functionality and additional or different hardware.
  • computing device 300 may include input devices 310 (e.g. keyboard, mouse, etc.), output devices 312 (e.g. display), and communication connections 314 that allow computing device 300 to communicate with other computers/applications 316, wherein the various elements of computing device 300 are communicatively coupled together via communication links 318.
  • computing device 300 may include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or tape. Such additional storage is illustrated as removable storage 306 and non-removable storage 308.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any suitable method or technology for non-transitory storage of information such as computer readable instructions, data structures, program modules, or other data, and does not include transitory storage media.
  • Computer storage media includes RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, and magnetic disc storage or other magnetic storage devices, for example.
  • System memory 304, removable storage 306, and non-removable storage 308 represent examples of computer storage media, including non-transitory computer readable storage media, storing computer executable instructions that when executed by one or more processor units of processing units 302 cause the one or more processors to perform the functionality of a system, such as MLM performance forecaster 20.
  • system memory 304 stores computer executable forecasting instructions 24 for MLM performance forecaster 20, including input module instructions 32, translator module instructions 34, and predictor module instructions 36, that when executed by one or more processing units of processing units 302 implement the functionalities of MLM performance forecaster 20, as described herein.
  • in some examples, one or more machine-readable media storing instructions for MLM performance forecaster 20, including input module instructions 32, translator module instructions 34, and predictor module instructions 36, may be separate from but accessible to computing device 300. In other examples, hardware and programming may be divided among multiple computing devices.
  • the computer executable instructions can be part of an installation package that, when installed, can be executed by at least one processing unit to implement the functionality of MLM performance forecaster 20.
  • the machine-readable storage medium may be a portable medium, such as a CD, DVD, or flash drive, for example, or a memory maintained by a server from which the installation package can be downloaded and installed.
  • the computer executable instructions may be part of an application, applications, or component already installed on computing device 300, including the processing resource.
  • the machine readable storage medium may include memory such as a hard drive, solid state drive, or the like.
  • FIG. 6 is a block and schematic diagram generally illustrating an example of a computing device 400 (e.g., laptop) including an ML platform 402 having performance forecasting instructions 24 for forecasting performance of MLMs on ML engines, according to one example of the present disclosure.
  • ML platform 402 includes an ML manager 404 (e.g., a microcontroller), a memory 406, and a plurality of ML engines 408 (illustrated as ML engines 408-1 to 408-n).
  • memory 406 stores a plurality of ML model files 410 (illustrated as ML files 410-1 to 410-n), where each ML model file 410 has a representational format (e.g., ONNX format) and corresponds to an MLM (such as MLMs 42, see Figure 1).
  • memory 406 stores a number of hardware execution profiles 412 (such as described above with respect to hardware execution profiles 38 of Figure 1), illustrated as hardware execution profiles 412-1 to 412-n, with each hardware execution profile 412 corresponding to a different one of the ML engines 408.
  • memory 406 also stores MLM performance forecasting instructions 24 including input module 32, translator module 34, and predictor module 36, as described above with respect to at least Figures 1-3.
  • each of the MLMs represented by ML files 410-1 to 410-n is configured and trained to perform a different machine learning task (such as voice recognition, face recognition, speech-to-text conversion, and text-to-speech conversion, for example).
  • ML manager 404, based on requests from computing device 400 for execution of a given machine learning task, loads the corresponding MLM 42 (see Figure 1) onto one of the ML engines 408.
  • ML manager 404 executes translator module 34 to derive a model execution profile 44 for the requested MLM, and executes predictor module 36 to derive a performance forecast 46 of the requested MLM on one or more of the still available ML engines 408 based on their corresponding hardware execution profiles 412.
  • ML manager 404 implements the MLM to perform the requested ML task on the ML engine 408 corresponding to the performance forecast having the most favorable performance metrics. In examples, the most favorable performance metrics may vary depending on operating policies of computing device 400.
  • the most favorable performance metric may be the lowest power consumption to execute an MLM. In another case, the most favorable performance metric may be the smallest latency to execute an MLM.
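
A minimal sketch of this policy-driven selection, built on the hypothetical helpers above, might look like the following; the policy names are invented placeholders:

```python
def select_engine(model_path: str,
                  available: list[HardwareExecutionProfile],
                  policy: str = "latency") -> str | None:
    """Forecast the requested MLM on each still-available engine and pick the
    one whose forecast best satisfies the operating policy; None if no fit."""
    key = {"latency": "latency_ns", "power": "energy_nj"}[policy]
    candidates = []
    for profile in available:
        fc = process_200(model_path, profile)
        if fc["fits"]:                        # engine must have enough memory
            candidates.append((fc[key], profile.engine_name))
    return min(candidates)[1] if candidates else None
```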

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A machine learning model (MLM) performance forecasting system includes a processor to execute forecasting instructions, including input, translator, and predictor modules. The input module executes to receive model files, each describing the architecture of a corresponding MLM, and hardware execution profiles, each corresponding to a machine learning (ML) engine and defining different types, numbers, and dependencies of hardware-embodied procedures along with operating metrics of each procedure. The translator module executes to derive, from the model file, a model execution profile of a selected MLM defining the types, numbers, ordering, and dependencies of ML procedures to execute the selected MLM on a selected ML engine, each ML procedure mapped to a sequence of one or more hardware-embodied procedures of the selected ML engine. The predictor module executes to derive a performance forecast of the selected MLM on the selected ML engine based on the model execution profile and the corresponding hardware execution profiles.
PCT/US2021/015924 2021-01-29 2021-01-29 Machine learning model performance forecasting WO2022164454A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2021/015924 WO2022164454A1 (fr) 2021-01-29 2021-01-29 Machine learning model performance forecasting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2021/015924 WO2022164454A1 (fr) 2021-01-29 2021-01-29 Machine learning model performance forecasting

Publications (1)

Publication Number Publication Date
WO2022164454A1 (fr)

Family

ID=82654871

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/015924 WO2022164454A1 (fr) 2021-01-29 2021-01-29 Prévision de performance de modèle d'apprentissage automatique

Country Status (1)

Country Link
WO (1) WO2022164454A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150317589A1 (en) * 2012-11-09 2015-11-05 The Trustees Of Columbia University In The City Of New York Forecasting system using machine learning and ensemble methods
US20180045855A1 (en) * 2016-08-15 2018-02-15 International Business Machines Corporation Correcting computer model weather forecasts using a hybrid analog method with dynamic time warping
US10319476B1 (en) * 2015-02-06 2019-06-11 Brain Trust Innovations I, Llc System, method and device for predicting an outcome of a clinical patient transaction
WO2020131187A2 (fr) * 2018-09-26 2020-06-25 Sofar Ocean Technologies, Inc. Système de prévision météorologique océanique


Similar Documents

Publication Publication Date Title
Kang et al. Neurosurgeon: Collaborative intelligence between the cloud and mobile edge
Zhu et al. A novel approach to workload prediction using attention-based LSTM encoder-decoder network in cloud environment
Liu et al. A speculative approach to spatial‐temporal efficiency with multi‐objective optimization in a heterogeneous cloud environment
EP3182280B1 (fr) Machine de développement de modèles analytiques
AU2016259298B2 (en) Machine for development and deployment of analytical models
US10970628B2 (en) Training neural networks represented as computational graphs
US20240007414A1 (en) Methods, systems, articles of manufacture and apparatus to optimize resources in edge networks
US9135069B2 (en) Application resource model composition from constituent components
Li et al. Machine learning based online performance prediction for runtime parallelization and task scheduling
Yang et al. Intelligent resource scheduling at scale: a machine learning perspective
US20190385045A1 (en) Systems And Methods For Generalized Adaptive Storage Endpoint Prediction
Liu et al. CORP: Cooperative opportunistic resource provisioning for short-lived jobs in cloud systems
Li et al. Efficient response time predictions by exploiting application and resource state similarities
Ullah et al. Cloud infrastructure estimation and auto-scaling using recurrent Cartesian genetic programming-based ANN
  • CN113220466A (zh) A general cloud service load prediction method based on a long short-term memory model
Dehraj et al. Maintenance assessment guidelines for autonomic system using ANP approach
Wang et al. Enabling energy-efficient and reliable neural network via neuron-level voltage scaling
  • WO2022164454A1 (fr) Machine learning model performance forecasting
US11989068B2 (en) Thermal and performance management
Shirzad et al. Scheduling optimization of parallel linear algebra algorithms using supervised learning
US11556384B2 (en) Dynamic allocation and re-allocation of learning model computing resources
Guo et al. Hierarchical design space exploration for distributed CNN inference at the edge
Liu et al. An optimized speculative execution strategy based on local data prediction in a heterogeneous hadoop environment
Patel et al. k stacked bidirectional LSTM for resource usage prediction in cloud data centers
Zasadziński et al. Early termination of failed HPC jobs through machine and deep learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21923530

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21923530

Country of ref document: EP

Kind code of ref document: A1