TWI770534B - Automatic machine learning system performance tuning method, device, electronic device and storage medium - Google Patents

Automatic machine learning system performance tuning method, device, electronic device and storage medium

Info

Publication number
TWI770534B
Authority
TW
Taiwan
Prior art keywords
machine learning
automatic machine learning system
performance
deep learning
Prior art date
Application number
TW109120932A
Other languages
Chinese (zh)
Other versions
TW202201284A (en)
Inventor
劉政岳
呂宜鴻
Original Assignee
新加坡商鴻運科股份有限公司
Priority date
Filing date
Publication date
Application filed by 新加坡商鴻運科股份有限公司
Priority to TW109120932A
Publication of TW202201284A
Application granted
Publication of TWI770534B

Abstract

A method for performance tuning in automatic machine learning (AutoML) includes obtaining a preset application programming interface and system resources of the automatic machine learning system. Performance index measurements are obtained through the preset application programming interface while the system pre-trains candidate deep learning training models. A distribution strategy and a resource allocation strategy are determined according to the performance index measurements and the system resources, and computing resources of the system are allocated according to the distribution strategy and the resource allocation strategy, so that the candidate deep learning training models are trained on those computing resources. This realizes dynamic allocation of computing resources for each candidate deep learning training model, ensures rational allocation of the automatic machine learning system's resources, and improves training performance. An electronic device and a storage medium are also disclosed.

Description

Automatic machine learning system performance tuning method, device, equipment and medium

The present invention relates to an automatic machine learning system performance tuning method, device, equipment and medium.

Automated Machine Learning (AutoML) is one of the most active and rapidly developing research directions in machine learning. It combines automation and machine learning into an automatic machine learning system: steps such as data preprocessing, feature selection, and algorithm selection in machine learning are combined with steps such as model architecture design and model training in deep learning, and placed inside a "black box". Through the black box, a user only needs to input data to obtain the desired prediction result. Many companies at home and abroad have integrated AutoML technology into their self-developed AI platforms to reduce the trial-and-error cost of parameter tuning for algorithm engineers and accelerate the construction and deployment of machine learning models. Existing AutoML platform products include Cloud AutoML, EasyDL, Cloud PAI, DarwinML, AI Prophet AutoML, and Zhiyi Technology.

Automated machine learning automates three aspects: feature engineering, model building, and hyperparameter optimization. It can be divided into two categories. When the supported model class is classification or regression, the techniques used include probabilistic matrix factorization and Bayesian optimization, which require relatively little computation and are therefore inexpensive to implement. The other category supports convolutional neural networks (CNN), recurrent neural networks (RNN), and long short-term memory networks (LSTM) for classification; the techniques used include reinforcement learning with gradient policy updates and efficient neural architecture search. This approach uses an RNN controller, trained in a loop, to sample candidate architectures (i.e., child models); each candidate architecture is then trained to measure its performance on the desired task, and the controller uses that performance as a guiding signal to find more promising architectures. Neural architecture search is computationally expensive and time-consuming.

In summary, when AutoML is used for deep learning, developing a neural network consumes a large amount of computing power, and the computing resources requested by each randomly selected candidate architecture differ, so computing resources may be over-allocated or under-allocated.

In view of the above, it is necessary to provide an automatic machine learning system performance tuning method, device, equipment, and medium that dynamically allocate computing resources to the deep learning training models in automatic machine learning, solve the problem of over-allocation or under-allocation of computing resources, and improve automatic machine learning training performance.

An embodiment of the present application provides an automatic machine learning system performance tuning method, applied to a performance tuning device connected to the automatic machine learning system, including: obtaining a preset application programming interface and system resources of the automatic machine learning system; when the automatic machine learning system pre-trains a candidate deep learning training model, obtaining the corresponding performance index measurements through the preset application programming interface; determining a distribution strategy and/or a resource allocation strategy according to the performance index measurements and the system resources; and allocating computing resources of the automatic machine learning system according to the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained based on the allocated computing resources.

An embodiment of the present application further provides a performance tuning device, including: a first acquisition module for obtaining a preset application programming interface and system resources of the automatic machine learning system; a second acquisition module for obtaining the corresponding performance index measurements through the preset application programming interface when the automatic machine learning system pre-trains a candidate deep learning training model; a strategy determination module for determining a distribution strategy and/or a resource allocation strategy according to the performance index measurements and the system resources; and an allocation module for allocating computing resources of the automatic machine learning system based on the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained based on the allocated computing resources.

An embodiment of the present application further provides an electronic device including one or more processors; when one or more programs are executed by the one or more processors, the one or more processors implement the automatic machine learning system performance tuning method described in any of the above.

An embodiment of the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the automatic machine learning system performance tuning method described in any of the above.

The automatic machine learning system performance tuning method, device, equipment, and medium provided by the embodiments of the present application dynamically optimize each candidate deep learning training model in the automatic machine learning system individually. For each candidate deep learning training model, the performance index measurements of the preset application programming interface are obtained while the candidate is pre-trained; a distribution strategy and/or a resource allocation strategy is determined according to the measurements and the system resources; and finally the computing resources of the automatic machine learning system are allocated based on the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained on the allocated computing resources. This realizes dynamic allocation of computing resources for each candidate deep learning training model, ensures rational allocation of the automatic machine learning system's computing resources, and improves training performance.

10: automatic machine learning system
73: performance tuning device
7: electronic device
71: processor
72: memory
101: first acquisition module
102: second acquisition module
103: strategy determination module
104: allocation module

FIG. 1 is a schematic diagram of performance tuning of an automatic machine learning system provided by an embodiment of the present application.

FIG. 2 is a block diagram of an electronic device provided by an embodiment of the present application.

FIG. 3 is a flowchart of an automatic machine learning system performance tuning method provided by an embodiment of the present application.

FIG. 4 is a schematic flowchart of a method for obtaining the preset application programming interface and system resources provided by an embodiment of the present application.

FIG. 5 is a schematic flowchart of a method for configuring an application programming interface provided by an embodiment of the present application.

FIG. 6 is a block diagram of a performance tuning device provided by an embodiment of the present application.

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

In order to more clearly understand the above objects, features, and advantages of the present invention, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.

Many specific details are set forth in the following description to facilitate a full understanding of the present invention; the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used herein in the description of the present invention are for the purpose of describing specific embodiments only and are not intended to limit the present invention.

To help those skilled in the art understand the embodiments of the present application in depth, the definitions of the technical terms involved in the embodiments of the present application are introduced first.

When AutoML develops a neural network, the training set is uploaded and the best neural network architecture is found through neural architecture search (NAS). The NAS workflow is as follows: it usually starts by defining a set of "building blocks" that the neural network may use, including various convolution and pooling blocks. A recurrent neural network (RNN) is then used as a controller, which picks from these building blocks and puts them together to form a new neural network architecture. The new architecture is trained on the training set until convergence and tested on the test set to obtain an accuracy, which is then used to update the controller through policy gradients, so that the controller generates better and better neural network architectures. Methods for learning convolutional neural network structures also include Efficient Neural Architecture Search via Parameter Sharing (ENAS) and Progressive Neural Architecture Search.
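The controller loop just described can be made concrete with a short sketch. This is a minimal illustration of the sample-train-reward cycle, not any real NAS library: the building-block list, the uniform-sampling controller, and the placeholder training and evaluation functions are all hypothetical stand-ins.

```python
import random

BUILDING_BLOCKS = ["conv3x3", "conv5x5", "maxpool", "avgpool"]

class Controller:
    """Stand-in for the RNN controller; here it samples blocks uniformly."""
    def sample_architecture(self, depth=4):
        return [random.choice(BUILDING_BLOCKS) for _ in range(depth)]

    def update(self, architecture, reward):
        """A real controller would take a policy-gradient step here."""

def train_until_convergence(architecture, train_set):
    """Placeholder: a real implementation builds and fits the network."""

def test_accuracy(architecture, test_set):
    return random.random()  # placeholder for evaluation on the test set

controller, best = Controller(), (None, 0.0)
for _ in range(100):  # search budget
    arch = controller.sample_architecture()
    train_until_convergence(arch, train_set=None)
    acc = test_accuracy(arch, test_set=None)
    controller.update(arch, reward=acc)  # accuracy guides the controller
    if acc > best[1]:
        best = (arch, acc)
```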

Referring to FIG. 1, in order to optimize the performance of the automatic machine learning system 10 and reduce its training time, the performance tuning device 73 is connected to the automatic machine learning system 10. The performance tuning device 73 of the embodiments of the present application automatically optimizes the automatic machine learning performance of the automatic machine learning system 10 and allocates computing resources to each candidate deep learning training model randomly selected by the automatic machine learning system 10, avoiding over-allocation or under-allocation of computing resources for each candidate deep learning training model.

In the embodiments of the present application, the underlying AutoML layer of the automatic machine learning system 10 can use tools such as Scikit-Learn, XGBoost, TensorFlow, Keras, and LightGBM to ensure runtime efficiency.

Referring to FIG. 2, the performance tuning device 73 may run on an electronic device. The electronic device includes, but is not limited to, a memory and at least one processor, which may be connected by a bus. The performance tuning device 73 runs on the processor; when executing a computer program, it implements the steps in the embodiments of the automatic machine learning system performance tuning method of the present application, or alternatively implements the functions of the modules/units in the embodiments of the performance tuning device 73 of the present application.

In this embodiment, the electronic device may include the performance tuning device 73 and a server. In other embodiments, the electronic device may be a computing device such as a cloud server. Those skilled in the art will understand that the schematic diagram is only an example of an electronic device and does not limit it; the electronic device may include more or fewer components than shown, combine certain components, or use different components. For example, it may further include input/output devices, network access devices, and buses. The automatic machine learning system performance tuning method of the present application is applied in one or more electronic devices. An electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), and embedded devices.

The electronic device may be a computing device such as a desktop computer, a notebook computer, a tablet computer, or a cloud server. The electronic device can interact with a user through a keyboard, a mouse, a remote control, a touchpad, or a voice-control device.

FIG. 3 is a flowchart of an automatic machine learning system performance tuning method provided by an embodiment of the present application. According to different requirements, the order of the steps in the flowchart may be changed, and some steps may be omitted.

Referring to FIG. 3, the automatic machine learning system performance tuning method is implemented by the performance tuning device 73 connected to the automatic machine learning system 10 and specifically includes the following steps:

Step S10: Obtain the preset application programming interface and system resources of the automatic machine learning system.

In the embodiments of the present application, the performance tuning device 73 is connected to the automatic machine learning system 10. If the computing backend of the automatic machine learning system 10 is one the performance tuning device 73 can recognize, such as TensorFlow, the performance tuning device 73 can obtain the relevant information of the automatic machine learning system 10, including its preset application programming interface and system resources.

In the embodiments of the present application, the performance tuning device 73 may include a tuning server and a performance analysis tool.

In one possible implementation, referring to FIG. 4, obtaining the preset application programming interface and system resources of the automatic machine learning system 10 may specifically proceed through the following steps:

Step S101: Record the performance-related application programming interfaces of the automatic machine learning system and the system resources of the automatic machine learning system in a database of a tuning server.

Step S102: The tuning server reads the system resources of the automatic machine learning system from the database.

Step S103: A performance analysis tool reads the preset application programming interface from the database of the tuning server.

In the embodiments of the present application, both the tuning server and the performance analysis tool in the performance tuning device 73 can identify the computing backend of the automatic machine learning system 10. The tuning server records in its database in advance the performance-related application programming interfaces (APIs) of the automatic machine learning system 10 and the available system resources, where the preset APIs are the performance-related APIs in the deep learning task stack of the automatic machine learning system 10.
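As a rough sketch of steps S101 through S103, the tuning server's database can be modeled with sqlite3; the table layout and the example rows below are assumptions for illustration, not a schema the patent prescribes.

```python
import sqlite3

db = sqlite3.connect("tuning_server.db")
db.execute("CREATE TABLE IF NOT EXISTS preset_apis (name TEXT, purpose TEXT)")
db.execute("CREATE TABLE IF NOT EXISTS system_resources (name TEXT, amount TEXT)")

# Step S101: record performance-related APIs and available system resources.
db.executemany("INSERT INTO preset_apis VALUES (?, ?)",
               [("data_preprocessing", "controls per-step batch size"),
                ("training_step", "forward/backward timing")])
db.executemany("INSERT INTO system_resources VALUES (?, ?)",
               [("cpu_cores", "32"), ("gpus", "4"), ("memory_gib", "256")])
db.commit()

# Step S102: the tuning server reads the system resources back.
resources = db.execute("SELECT name, amount FROM system_resources").fetchall()

# Step S103: the performance analysis tool reads the preset APIs.
preset_apis = db.execute("SELECT name, purpose FROM preset_apis").fetchall()
```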

In the embodiments of the present application, the tuning server may be a built-in server that generates the distribution strategy and the resource allocation strategy.

In the embodiments of the present application, the performance analysis tool may be a SOFA server with built-in performance measurement tools, and may also include a flame graph produced by the built-in measurement tools. Through the performance analysis tool, heterogeneous performance index measurements can be collected from the central processing unit (CPU), graphics processing unit (GPU), communication network, and storage devices.
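The kind of heterogeneous collection described here can be sketched with the psutil and pynvml packages; this is a stand-in for the SOFA measurement tools, not their actual implementation.

```python
import psutil
import pynvml

def collect_metrics(device_index=0):
    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
    gpu_util = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    pynvml.nvmlShutdown()
    net = psutil.net_io_counters()
    disk = psutil.disk_io_counters()
    return {
        "cpu_percent": psutil.cpu_percent(interval=0.1),   # CPU load
        "gpu_percent": gpu_util,                           # GPU utilization
        "net_bytes": net.bytes_sent + net.bytes_recv,      # network traffic
        "disk_bytes": disk.read_bytes + disk.write_bytes,  # storage traffic
    }
```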

Step S20: When the automatic machine learning system pre-trains a candidate deep learning training model, obtain the corresponding performance index measurements through the preset application programming interface.

In the embodiments of the present application, after a user logs the training data into the automatic machine learning system 10, the automatic machine learning system 10 performs a neural architecture search and pre-trains the found candidate deep learning training model on the training data; during the deep learning training process, the performance index measurements corresponding to the preset application programming interface are obtained through that interface.

In one possible implementation, obtaining the corresponding performance index measurements through the preset API when the automatic machine learning system 10 pre-trains a candidate deep learning training model includes: when the automatic machine learning system 10 pre-trains a candidate deep learning training model, the performance analysis tool obtains the performance index measurements of the preset API and transmits them to the tuning server through a communication method, for example a gRPC remote call, where the performance analysis tool is integrated and packaged with the automatic machine learning system 10.

In the embodiments of the present application, the performance analysis tool is integrated and packaged with the automatic machine learning system 10. By packaging the performance analysis tool, the automatic machine learning model, and the corresponding application programming interface software together, the performance index measurements of the automatic machine learning system 10 are obtained automatically and sent to the built-in server through remote calls, so that the built-in server can make computing resource decisions.

Illustratively, AutoKeras is selected as the engine of the automatic machine learning system 10. AutoKeras uses the Efficient Neural Architecture Search algorithm (ENAS) to select and evaluate candidate deep learning models, choosing the next, better candidate based on the evaluation of the previous one. The user logs the training data into AutoKeras; the AutoKeras data preprocessing API determines the amount of data per training step according to the batch-size hyperparameter of the deep neural network. While the candidate deep learning model is trained on the training data, the SOFA server obtains the performance index measurements of the preset AutoKeras APIs through its measurement tools, for example the number of data exchanges on the PCI Express (PCIe) bus. The SOFA server sends the obtained measurements to the built-in server through remote procedure calls (such as gRPC Remote Procedure Calls), so that the built-in server allocates computing resources to the candidate deep learning model according to the measurements and the system resources.
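A minimal sketch of driving AutoKeras as the engine follows, assuming MNIST-shaped image data; max_trials bounds how many candidate models the search explores, and batch_size is the per-step data amount mentioned above.

```python
import autokeras as ak
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Each trial is one candidate deep learning model proposed by the search.
clf = ak.ImageClassifier(max_trials=3, overwrite=True)
clf.fit(x_train, y_train, epochs=1, batch_size=64)
print(clf.evaluate(x_test, y_test))
```

A profiler such as the SOFA server would sit outside this call, timing each trial and forwarding the measurements to the built-in server.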

In one possible implementation, the performance indexes include the forward propagation time (FW) and backward propagation time (BW), the time to copy data from the host to the GPU device (H2D), the time to copy data from the GPU device to the host (D2H), and the time to copy data between peer devices (P2P).

In one possible implementation, the performance indexes include program execution time in user space and system space, the read/write bandwidth of the file system or storage media, network bandwidth usage, the hotspot distribution of function calls, and the time spent on system lock overhead.

Step S30: Determine a distribution strategy and/or a resource allocation strategy according to the performance index measurements and the system resources.

In the embodiments of the present application, the performance analyzer sends the performance index measurements to the built-in server through remote procedure calls, and the built-in server determines the distribution strategy and/or the resource allocation strategy according to the performance index measurements and the system resources.

In one possible implementation, the built-in server creates a YAML (YAML Ain't Markup Language) file used to generate a Kubernetes pod, where a pod is the smallest unit that can be created and deployed in Kubernetes, i.e., one application instance in a Kubernetes cluster. Kubernetes, an open-source platform for automating container operations, manages containerized applications on multiple hosts in a cloud platform. The YAML file records the Docker image to run, the allocation of hardware resources, and the settings of the virtual machine node for the corresponding container.
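A minimal sketch, assuming the PyYAML package, of the kind of pod specification the built-in server could emit; the pod name, image, and resource figures are hypothetical placeholders.

```python
import yaml

def pod_spec(trial_id, cpu="8", memory="32Gi", gpus=1):
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": f"automl-trial-{trial_id}"},
        "spec": {
            "containers": [{
                "name": "trainer",
                "image": "example.com/automl-trainer:latest",  # hypothetical image
                "resources": {"limits": {
                    "cpu": cpu,              # CPU cores for this trial
                    "memory": memory,        # main memory for this trial
                    "nvidia.com/gpu": gpus,  # standard Kubernetes GPU resource key
                }},
            }],
            "restartPolicy": "Never",
        },
    }

with open("trial-0.yaml", "w") as f:
    yaml.safe_dump(pod_spec(0), f)
```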

In one possible implementation, determining the distribution strategy according to the performance index measurements and the system resources includes: determining single-node training or multi-node training according to the performance index measurements and the system resources, where single-node training means the candidate deep learning training model is trained by a single node, and multi-node training means the candidate deep learning training model is trained jointly by a plurality of nodes that share the parameter variables of the candidate deep learning training model.

In the embodiments of the present application, whether the training task for a candidate deep learning training model is carried out by a single node or by multiple nodes can be determined according to the performance index measurements and the system resources. When the training task is heavy, distributing the training across multiple nodes allows the deep learning training task to be scaled up, so larger models can be learned or training can proceed faster. When the training task is light, training on a single node ensures reasonable allocation of computing resources and avoids over-allocation.

In one possible implementation, single-node training includes training the candidate deep learning training model by a single device or by a plurality of mirrored devices within a single node; multi-node training includes training the candidate deep learning training model by a plurality of nodes using a replication mode or a parameter server mode.

In the embodiments of the present application, when training is performed by a single node, it can be carried out by a single device, for example a single graphics processor, with the parameters stored by the graphics processor or the central processor. The candidate deep learning training model can also be trained by a plurality of mirrored devices, for example a plurality of mirrored graphics processors, with the parameters then stored by the graphics processors.

In the embodiments of the present application, joint training on multiple nodes, i.e., distributed training of the candidate deep learning training model, synchronizes a plurality of programs that train the candidate deep learning training model together and share its parameter variables, such as weights and bias values. Illustratively, in the replication mode, training is performed by a plurality of graphics processors on a plurality of nodes and the parameters are stored by the graphics processors. In the parameter server mode, the parameters of the candidate training model are stored separately from the training data on a parameter server; training is performed by the graphics processors and the parameters are stored by the central processor.

In one possible implementation, when the TensorFlow API is used for distributed deep learning training, if the SOFA server observes a small number of data exchanges on the PCI Express (PCIe) bus, the POTATO server recommends the parameter server mode, such as Parameter Server, which supports distributed storage of and coordination over large-scale parameters; otherwise, a replication mode such as Mirrored Replicated is used, which transfers a mirror image from a data center in one region to a data center in the target region over the network.
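This decision can be sketched against the tf.distribute API (TensorFlow 2.6 or later); the traffic threshold and the metric name are assumptions, and parameter server mode additionally needs a running cluster described by the TF_CONFIG environment variable, so the runnable path below takes the mirrored branch.

```python
import tensorflow as tf

def pick_strategy(pcie_exchanges, threshold=1_000_000):
    if pcie_exchanges < threshold:
        # Few PCIe exchanges: keep variables on parameter servers.
        resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
        return tf.distribute.ParameterServerStrategy(resolver)
    # Heavy PCIe traffic: replicate the model on every local device instead.
    return tf.distribute.MirroredStrategy()

strategy = pick_strategy(pcie_exchanges=5_000_000)
with strategy.scope():  # variables created here follow the chosen placement
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    model.compile(optimizer="adam", loss="mse")
```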

In one possible implementation, determining the resource allocation strategy according to the performance index measurements and the system resources includes: configuring an application programming interface, software resources, and hardware resources for the candidate deep learning training model according to the performance index measurements and the system resources.

In the embodiments of the present application, the built-in server determines the application programming interface, the accompanying software resources, and the hardware resources to be used by the candidate deep learning model currently selected by the automatic machine learning system 10.

In one possible implementation, referring to FIG. 5, configuring the application programming interface for the candidate deep learning training model according to the performance index measurements and the system resources may specifically proceed through the following steps:

Step S301: Determine the application programming interface type of the candidate deep learning training model.

In the embodiments of the present application, the application programming interface type required by the candidate deep learning training model is re-determined.

Step S302: Determine a new application programming interface for the candidate deep learning training model according to the application programming interface type.

In the embodiments of the present application, a new application programming interface is allocated to the candidate deep learning training model according to the re-determined application programming interface type.

Step S303: Adjust the parameters of the new application programming interface by sharing the environment variables of the automatic machine learning system, where the parameters include the batch size.

In the embodiments of the present application, the built-in server adjusts the new application programming interface by sharing the environment variables of the automatic machine learning system 10 and re-executes the new interface. Illustratively, the affordable maximum batch size of the deep learning model is determined according to the computing power of the GPU and its memory size, and is set in the new interface, so that when the interface is restarted, the per-step training data size of the candidate deep learning model can be adjusted.
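One way to derive the batch-size cap is from free GPU memory, sketched below with the pynvml package; the per-sample memory estimate and the environment-variable name are hypothetical.

```python
import os
import pynvml

def max_batch_size(bytes_per_sample, headroom=0.9, device_index=0):
    pynvml.nvmlInit()
    free_bytes = pynvml.nvmlDeviceGetMemoryInfo(
        pynvml.nvmlDeviceGetHandleByIndex(device_index)).free
    pynvml.nvmlShutdown()
    # Leave some headroom, then fit as many samples as the memory allows.
    return max(1, int(free_bytes * headroom) // bytes_per_sample)

# Publish the cap through a shared environment variable (name is hypothetical),
# assuming roughly 4 MiB of activations and gradients per sample.
os.environ["AUTOML_BATCH_SIZE"] = str(max_batch_size(bytes_per_sample=4 << 20))
```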

In one possible implementation, gRPC remote calls are used to share the same candidate deep learning model among all connected computing nodes for batch-level data-parallel computing.

Step S40: Allocate computing resources of the automatic machine learning system according to the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained based on the allocated computing resources.

In the embodiments of the present application, the candidate deep learning training model is trained based on the allocated computing resources, and the automatic machine learning system 10 evaluates its performance. According to the evaluation result, a new candidate deep learning training model is then selected for training, and steps S10 through S40 are repeated to allocate computing resources for that new candidate, until a candidate deep learning training model that meets the requirements is obtained.

In the embodiments of the present application, the built-in server sends the determined distribution strategy and/or resource allocation strategy to the automatic machine learning system 10, and the computing resources of the automatic machine learning system 10 are allocated based on them, so that the automatic machine learning system 10 configures computing resources, such as the number of CPU cores, the main memory capacity, and the number of GPUs, for the currently selected candidate deep learning training model, which is then trained on that configuration.

Each round, when the automatic machine learning system 10 tries a new candidate deep learning model, the automatic machine learning system performance tuning method dynamically optimizes performance according to the characteristics of that newly selected candidate.
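Putting steps S10 through S40 together, the per-candidate loop can be summarized as below; every object here is a hypothetical stand-in for the components described above, not an API the patent defines.

```python
def tune(automl_system, tuning_server, profiler, max_rounds=20):
    apis, resources = tuning_server.load(automl_system)  # step S10
    for _ in range(max_rounds):
        candidate = automl_system.next_candidate()
        metrics = profiler.measure(candidate, apis)      # step S20
        plan = tuning_server.decide(metrics, resources)  # step S30
        automl_system.allocate(candidate, plan)          # step S40
        score = automl_system.train_and_evaluate(candidate)
        if automl_system.meets_requirements(score):
            return candidate
```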

In one possible implementation, after the tuning of the automatic machine learning system 10 is completed, a corresponding automatic machine learning application programming interface is generated, and test data are logged into that interface for testing.

Referring to FIG. 6, an embodiment of the present application provides a performance tuning device 73 including a first acquisition module 101, a second acquisition module 102, a strategy determination module 103, and an allocation module 104.

The first acquisition module 101 is used to obtain the preset application programming interface and system resources of the automatic machine learning system.

The second acquisition module 102 is used to obtain the corresponding performance index measurements through the preset application programming interface when the automatic machine learning system pre-trains a candidate deep learning training model.

The strategy determination module 103 is used to determine a distribution strategy and/or a resource allocation strategy according to the performance index measurements and the system resources.

The allocation module 104 is used to allocate computing resources of the automatic machine learning system according to the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained based on the allocated computing resources.

The processor 71 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic element, a discrete gate or transistor logic element, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor 71 may be any conventional processor. The processor 71 is the control center of the electronic device 7 and connects the various parts of the entire electronic device 7 through various interfaces and lines.

The memory 72 may be used to store the computer programs and/or modules/units. The processor 71 implements the various functions of the electronic device 7 by running or executing the computer programs and/or modules/units stored in the memory 72 and by calling the data stored in the memory 72. The memory 72 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required for at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through the use of the electronic device 7. In addition, the memory 72 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, memory, a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another solid-state storage device.

If the modules/units integrated in the electronic device 7 are implemented in the form of software function modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the present invention implements all or part of the processes in the methods of the above embodiments, which may also be completed by instructing the relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

In the several embodiments provided by the present invention, it should be understood that the disclosed electronic device and method may be implemented in other ways. For example, the electronic device embodiments described above are only illustrative; the division of the modules is only a logical functional division, and other division methods may be used in actual implementation.

In addition, the functional modules in the various embodiments of the present invention may be integrated in the same processing module, each module may exist physically alone, or two or more modules may be integrated in the same module. The above integrated modules may be implemented in the form of hardware or in the form of hardware plus software function modules.

Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

7: electronic device
71: processor
72: memory
73: performance tuning device

Claims (10)

1. An automatic machine learning system performance tuning method, applied to a performance tuning device connected to the automatic machine learning system, wherein the method comprises: obtaining a preset application programming interface and system resources of the automatic machine learning system; when the automatic machine learning system pre-trains a candidate deep learning training model, obtaining corresponding performance index measurements through the preset application programming interface; determining a distribution strategy and/or a resource allocation strategy according to the performance index measurements and the system resources; and allocating computing resources of the automatic machine learning system according to the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained based on the allocated computing resources.

2. The method of claim 1, wherein determining the distribution strategy according to the performance index measurements and the system resources comprises: determining single-node training or multi-node training according to the performance index measurements and the system resources, wherein single-node training comprises training the candidate deep learning training model by a single node, and multi-node training comprises training the candidate deep learning training model jointly by a plurality of nodes that share the parameter variables of the candidate deep learning training model.

3. The method of claim 1, wherein single-node training comprises training the candidate deep learning training model by a single device or a plurality of mirrored devices in a single node, and multi-node training comprises training the candidate deep learning training model by a plurality of nodes using a replication mode or a parameter server mode.

4. The method of claim 3, wherein determining the resource allocation strategy according to the performance index measurements and the system resources comprises: configuring an application programming interface, software resources, and hardware resources for the candidate deep learning training model according to the performance index measurements and the system resources.

5. The method of claim 4, wherein configuring the application programming interface for the candidate deep learning training model according to the performance index measurements and the system resources comprises: determining the application programming interface type of the candidate deep learning training model; determining a new application programming interface for the candidate deep learning training model according to the application programming interface type; and adjusting the parameters of the new application programming interface by sharing the environment variables of the automatic machine learning system, wherein the parameters include the batch size.

6. The method of claim 1, wherein obtaining the preset application programming interface and system resources of the automatic machine learning system comprises: recording the performance-related application programming interfaces of the automatic machine learning system and the system resources of the automatic machine learning system in a database of a tuning server; the tuning server reading the system resources of the automatic machine learning system from the database; and a performance analysis tool reading the preset application programming interface from the database of the tuning server.
7. The method of claim 1, wherein obtaining the corresponding performance index measurements through the preset application programming interface when the automatic machine learning system pre-trains a candidate deep learning training model comprises: when the automatic machine learning system pre-trains a candidate deep learning training model, the performance analysis tool obtaining the performance index measurements of the preset application programming interface and transmitting them to the tuning server through a communication method, wherein the performance analysis tool is integrated and packaged with the automatic machine learning system.
8. A performance tuning device, comprising: a first acquisition module for obtaining a preset application programming interface and system resources of the automatic machine learning system; a second acquisition module for obtaining corresponding performance index measurements through the preset application programming interface when the automatic machine learning system pre-trains a candidate deep learning training model; a strategy determination module for determining a distribution strategy and/or a resource allocation strategy according to the performance index measurements and the system resources; and an allocation module for allocating computing resources of the automatic machine learning system according to the distribution strategy and/or the resource allocation strategy, so that the candidate deep learning training model is trained based on the allocated computing resources.

9. An electronic device, comprising one or more processors, wherein when one or more programs are executed by the one or more processors, the one or more processors implement the automatic machine learning system performance tuning method of any one of claims 1 to 7.

10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the automatic machine learning system performance tuning method of any one of claims 1 to 7 is implemented.
TW109120932A 2020-06-19 2020-06-19 Automatic machine learning system performance tuning method, device, electronic device and storage medium TWI770534B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109120932A TWI770534B (en) 2020-06-19 2020-06-19 Automatic machine learning system performance tuning method, device, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109120932A TWI770534B (en) 2020-06-19 2020-06-19 Automatic machine learning system performance tuning method, device, electronic device and storage medium

Publications (2)

Publication Number Publication Date
TW202201284A TW202201284A (en) 2022-01-01
TWI770534B true TWI770534B (en) 2022-07-11

Family

ID=80787965

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109120932A TWI770534B (en) 2020-06-19 2020-06-19 Automatic machine learning system performance tuning method, device, electronic device and storage medium

Country Status (1)

Country Link
TW (1) TWI770534B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI772884B (en) * 2020-09-11 2022-08-01 英屬維爾京群島商飛思捷投資股份有限公司 Positioning system and method integrating machine learning positioning model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190066133A1 (en) * 2016-11-11 2019-02-28 Jpmorgan Chase Bank, N.A. System and method for providing data science as a service
CN110443352A (en) * 2019-07-12 2019-11-12 阿里巴巴集团控股有限公司 Semi-automatic neural network tuning method based on transfer learning
US20200074273A1 (en) * 2018-09-04 2020-03-05 NEC Laboratories Europe GmbH Method for training deep neural network (dnn) using auxiliary regression targets
TWM592123U (en) * 2019-10-23 2020-03-11 治略資訊整合股份有限公司 Intelligent system for inferring system or product quality abnormality
CN110942086A (en) * 2019-10-30 2020-03-31 平安科技(深圳)有限公司 Data prediction optimization method, device and equipment and readable storage medium

Also Published As

Publication number Publication date
TW202201284A (en) 2022-01-01

Similar Documents

Publication Publication Date Title
US10325343B1 (en) Topology aware grouping and provisioning of GPU resources in GPU-as-a-Service platform
US20220179560A1 (en) Distributed storage system and data processing method
EP3736692A1 (en) Using computational cost and instantaneous load analysis for intelligent deployment of neural networks on multiple hardware executors
CN113821332B (en) Method, device, equipment and medium for optimizing efficiency of automatic machine learning system
WO2018103520A1 (en) Dynamic computation node grouping with cost based optimization for massively parallel processing
US9094404B2 (en) Reconfigurable cloud computing
JP2022058329A (en) Distributed model training method, apparatus, electronic device, storage medium, and computer program
CN108959510B (en) Partition level connection method and device for distributed database
CN109491934B (en) Storage management system control method integrating computing function
US11831706B1 (en) System and method for distributed management of storage systems based on intent
US20230394110A1 (en) Data processing method, apparatus, device, and medium
EP4209914A1 (en) Reconfigurable cache architecture and methods for cache coherency
TWI770534B (en) Automatic machine learning system performance tuning method, device, electronic device and storage medium
KR20230016044A (en) Deep learning large model training methods, systems, devices and media
US20220300323A1 (en) Job Scheduling Method and Job Scheduling Apparatus
US20210149729A1 (en) Task scheduling for machine-learning workloads
US11770456B1 (en) System and method for distributed management of storage systems based on subscription changes
CN110222410A (en) A kind of electromagnetic environment emulation method based on Hadoop MapReduce
WO2023050704A1 (en) Data caching method, system and device in ai cluster, and computer medium
WO2020155083A1 (en) Neural network distributed training method and device
WO2021051920A1 (en) Model optimization method and apparatus, storage medium, and device
Yu et al. A two steps method of resources utilization predication for large Hadoop data center
Souza et al. Thea-a qos, privacy, and power-aware algorithm for placing applications on federated edges
CN114625474A (en) Container migration method and device, electronic equipment and storage medium
Floratou et al. Towards building wind tunnels for data center design