WO2023083058A1 - Scheduling parameter adjusting method, devices, and storage medium - Google Patents

Scheduling parameter adjusting method, devices, and storage medium

Info

Publication number
WO2023083058A1
Authority
WO
WIPO (PCT)
Prior art keywords
scheduling
operator
parameters
target device
parameter
Prior art date
Application number
PCT/CN2022/129029
Other languages
French (fr)
Chinese (zh)
Inventor
裘瑞涛
金士英
刘涛
王永成
韩炳涛
屠要峰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023083058A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • The embodiments of the present application relate to the field of computer technology, and in particular to a scheduling parameter adjustment method, devices, and a storage medium.
  • Scheduling optimization for deep learning model inference is mainly done manually. Manual tuning usually cannot reach the optimal schedule, is very inefficient, and cannot allocate resources quickly and effectively.
  • The main purpose of the embodiments of this application is to propose a scheduling parameter adjustment method, devices, and a storage medium that overcome the dependence on manual work in various application scenarios and automatically carry out the scheduling design of the operator scheduling process for any target device, so that the optimal scheduling parameters are obtained more efficiently and quickly.
  • An embodiment of the present application provides a method for adjusting scheduling parameters, applied to a main control device, including: searching for an operator scheduling template that matches a target device; generating scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and sending the scheduling parameters to the target device so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters; and receiving performance data, fed back by the target device, of executing the scheduling process, adjusting the scheduling parameters according to the performance data, and sending the adjusted scheduling parameters to the target device.
  • The embodiment of the present application also proposes a scheduling parameter adjustment method, applied to a target device, including: receiving scheduling parameters sent by a main control device, where the scheduling parameters are generated according to an operator scheduling template matched with the target device and a scheduling parameter search algorithm; running the scheduling process corresponding to the operator according to the scheduling parameters; and feeding back the performance data of executing the scheduling process to the main control device, so that the main control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
  • The embodiment of the present application also proposes a main control device, including: a search module, configured to search for an operator scheduling template that matches the target device; a scheduling parameter generation module, configured to generate scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and to send the scheduling parameters to the target device so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters; and an iteration module, configured to receive the performance data of executing the scheduling process fed back by the target device, adjust the scheduling parameters according to the performance data, and send them to the target device.
  • An embodiment of the present application also proposes a target device, including: a receiving module, configured to receive scheduling parameters sent by a main control device, where the scheduling parameters are generated according to an operator scheduling template matched with the target device and a scheduling parameter search algorithm; a running module, configured to run the scheduling process corresponding to the operator according to the scheduling parameters; and a feedback module, configured to feed back the performance data of executing the scheduling process to the main control device, so that the main control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
  • An embodiment of the present application also proposes an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the scheduling parameter adjustment method described in any one of the preceding items.
  • The embodiment of the present application also proposes a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the scheduling parameter adjustment method described in any one of the preceding items is implemented.
  • FIG. 1 is a schematic flowchart of a method for adjusting scheduling parameters applied to a main control device provided in an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a method for adjusting scheduling parameters applied to a target device provided in an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of a main control device provided in another embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of a target device provided in another embodiment of the present application;
  • FIG. 5 is a schematic structural diagram of an electronic device provided in another embodiment of the present application.
  • In the scheduling parameter adjustment method proposed in the present application, an operator scheduling template matching the target device is searched for, scheduling parameters are generated according to the matched operator scheduling template and a scheduling parameter search algorithm, and the scheduling parameters are sent to the target device so that the target device runs the scheduling process corresponding to the operator according to them. The performance data of executing the scheduling process fed back by the target device is then received, and the scheduling parameters are adjusted according to the performance data and sent to the target device again, until the performance data converges. In other words, the scheduling design process for inference is decoupled into three parts that a machine can understand and execute: determining the operator scheduling template, determining the scheduling parameters, and running. The scheduling design that was originally done manually can thus be handed over to the machine, which overcomes the dependence on manual labor, reduces the manual workload, and improves design efficiency. It also covers as many actual application scenarios as possible, which enhances applicability and practicality, and makes it possible to perform scheduling design for any remote device in various application scenarios and to obtain the optimal scheduling parameters efficiently and quickly, thereby automatically accelerating inference.
  • In addition, determining the scheduling parameters, that is, generating them according to the matched operator scheduling template and the scheduling parameter search algorithm, is done on the main control device rather than on the target device. This avoids the problem that the search would be inefficient or even infeasible when the target device has poor computing performance, such as a user terminal or an edge device CPU.
  • the embodiments of the present application provide a method for adjusting scheduling parameters, which is applied to a master control device.
  • the master control device may be an electronic device such as a computer or a server, as shown in FIG. 1 , and specifically includes the following steps.
  • Step 101: search for an operator scheduling template matching the target device.
  • An operator refers to one of the various operations in the deep learning model, such as convolution, pooling, concatenation, and upsampling.
  • An operator scheduling template is a description of the scheduling process of an operator in a specific environment, including at least operator feature information and operator running information.
  • For example, an operator scheduling template can include the convolution kernel and the running environment, where the running environment refers to information such as the hardware that the runtime depends on.
  • The operator scheduling templates in the operator scheduling template database need to cover as many application scenarios as possible, that is, operators with various combinations of operator feature information and operator running information. For example, the operator scheduling template database must at least define a corresponding operator scheduling template for each subdivided scenario in which the convolution kernel length is equal to 1 or greater than 1 and the operating environment is an x86 or x64 CPU, an ARM processor, a GPU, and so on.
  • The operator scheduling template with the highest matching degree can be used as the matched operator scheduling template.
  • For example, the matched operator scheduling template may be the one corresponding to a 3×3 convolution kernel and an x86 CPU operating environment.
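  • The matching in step 101 can be illustrated with a minimal sketch. The `OperatorTemplate` fields, the `match_score` rule (count of agreeing fields), and the example database are assumptions made for illustration only; the application does not specify how the matching degree is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorTemplate:
    """Hypothetical template record: operator type plus feature and running information."""
    op_type: str      # e.g. "conv"
    kernel: tuple     # operator feature information, e.g. (3, 3)
    hardware: str     # operator running information, e.g. "x86_cpu"


def match_score(template, op_type, kernel, hardware):
    """Assumed matching degree: number of agreeing fields; a wrong operator type disqualifies."""
    if template.op_type != op_type:
        return -1
    return int(template.kernel == kernel) + int(template.hardware == hardware)


def find_template(database, op_type, kernel, hardware):
    """Return the template with the highest matching degree, or None if none qualifies."""
    best = max(
        database,
        key=lambda t: match_score(t, op_type, kernel, hardware),
        default=None,
    )
    if best is None or match_score(best, op_type, kernel, hardware) < 0:
        return None
    return best


database = [
    OperatorTemplate("conv", (1, 1), "x86_cpu"),
    OperatorTemplate("conv", (3, 3), "x86_cpu"),
    OperatorTemplate("conv", (3, 3), "arm"),
    OperatorTemplate("pool", (2, 2), "gpu"),
]
matched = find_template(database, "conv", (3, 3), "x86_cpu")
```

  • A real database would likely rank partial matches with a richer score; the field-counting rule here is only the simplest choice that yields a "highest matching degree".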
  • A deep learning model usually includes several operators, so the scheduling design actually has to cover the scheduling processes of multiple operators. Therefore, in one example, before searching for the operator scheduling template that matches the target device, the method for adjusting the scheduling parameters further includes: splitting the deep learning model for which scheduling parameters are to be obtained into individual operators.
  • Correspondingly, searching for an operator scheduling template that matches the target device includes: searching for operator scheduling templates that match both the target device and the operators obtained through splitting.
  • For example, the deep learning model that needs inference acceleration in the target device is a face recognition model trained with a Convolutional Neural Network (CNN), that is, the deep learning model for which scheduling parameters are to be obtained is the face recognition model. The face recognition model is first split, yielding 32 convolution operators with 11×11 convolution kernels, 1 pooling operator, 32 convolution operators with 9×9 kernels, 16 convolution operators with 7×7 kernels, 16 convolution operators with 5×5 kernels, 1 fully connected operator, and 1 loss function operator. The target device uses the CPU when running the deep learning model. Then, according to the operator feature information and operator running information of each of the above operators, the corresponding operator scheduling templates are searched for and matched in the operator scheduling template database.
  • This embodiment does not limit the number of operators or operator scheduling templates. The number of operator scheduling templates to search for depends on how many operators, or how many types of operators, the deep learning model in the target device involves. If the deep learning model consists of 78 operations, the 78 operator scheduling templates corresponding to those operations are searched for. Alternatively, if the deep learning model consists of 98 operations that correspond to 75 types of operators, the operator scheduling templates corresponding to those 75 types are searched for. Operators with different operator feature information can be considered different types of operators; for example, convolution operators with different convolution kernels can be considered different types. Of course, the above is only a specific example, and the number of operator scheduling templates can relate to the deep learning model in other ways, which will not be described here.
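  • The relationship between operations and operator types can be sketched as follows, using the face recognition example above. The layer dictionaries and the `split_into_operators` helper are hypothetical; how a real model is split depends on the framework, which the application does not name.

```python
from collections import Counter


def split_into_operators(model_layers):
    """Return one (operator type, feature information) key per operation in the model."""
    return [(layer["type"], layer.get("kernel")) for layer in model_layers]


# Hypothetical layer list mirroring the face recognition example in the text.
layers = (
    [{"type": "conv", "kernel": (11, 11)}] * 32
    + [{"type": "pool"}]
    + [{"type": "conv", "kernel": (9, 9)}] * 32
    + [{"type": "conv", "kernel": (7, 7)}] * 16
    + [{"type": "conv", "kernel": (5, 5)}] * 16
    + [{"type": "fc"}]
    + [{"type": "loss"}]
)

operators = split_into_operators(layers)   # one entry per operation
kinds = Counter(operators)                 # one entry per operator type
```

  • Under this keying, convolutions with different kernels count as different operator types, so one template lookup per key in `kinds` is enough.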
  • Step 102: generate scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and send the scheduling parameters to the target device, so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters.
  • The scheduling parameter search algorithm is an algorithm for finding an optimal solution to an optimization problem, such as simulated annealing, gradient descent, or global traversal. This embodiment does not limit the scheduling parameter search algorithm.
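  • As one illustration of such a search algorithm, the following is a minimal simulated annealing sketch over a discrete, ordered candidate list. The neighbour rule, the cooling schedule, the default values, and the `cost` callback are all assumptions; the application names simulated annealing only as an example and gives no details.

```python
import math
import random


def simulated_annealing(candidates, cost, steps=200, t0=1.0, cooling=0.97, seed=0):
    """Search an ordered candidate list; returns the best candidate seen and its cost."""
    rng = random.Random(seed)
    current = rng.randrange(len(candidates))
    current_cost = cost(candidates[current])
    best, best_cost = current, current_cost
    temp = t0
    for _ in range(steps):
        # Propose the previous or next candidate, clamped to the list bounds.
        neighbour = min(len(candidates) - 1, max(0, current + rng.choice((-1, 1))))
        neighbour_cost = cost(candidates[neighbour])
        delta = neighbour_cost - current_cost
        # Always accept improvements; accept worse moves with a temperature-dependent probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = neighbour, neighbour_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temp = max(temp * cooling, 1e-9)
    return candidates[best], best_cost


tiles = list(range(1, 33))  # hypothetical tile sizes
best_tile, best_cost = simulated_annealing(tiles, lambda t: abs(t - 20))
```

  • In the setting described here, `cost` would stand for a measurement fed back by the target device rather than a local function.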
  • Generating scheduling parameters based on the operator scheduling template and the scheduling parameter search algorithm can be realized as follows:
  • generate an operator scheduling parameter set according to the scheduling parameters exposed by the matched operator scheduling template, where the scheduling parameters exposed by the operator scheduling template are the preset feasible scheduling parameters of the operator scheduling process, and the operator scheduling parameter set includes multiple sets of scheduling parameters, each set containing the scheduling parameters required for one scheduling process of the operator;
  • use the scheduling parameter search algorithm to search out a set of scheduling parameters in the operator scheduling parameter set, and use the searched set as the generated scheduling parameters.
  • For example, the scheduling parameters of an operator include parameter A and parameter B. The feasible values of parameter A are {a1, a2, ..., an}, and the feasible values of parameter B are {b1, b2, ..., bm}. In the scheduling parameters exposed by the operator scheduling template corresponding to the operator, parameter A therefore includes a1, a2, ..., an and parameter B includes b1, b2, ..., bm, and the operator scheduling parameter set C is {(a1, b1), (a1, b2), ..., (a1, bm), (a2, b1), ..., (a2, bm), ...}.
  • The combination found by the search may be, for example, the combination of parameter A and parameter B that requires the least system resources.
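  • The operator scheduling parameter set for enumerable parameters is the Cartesian product of the exposed value lists. A minimal sketch follows; the `build_parameter_set` helper and the concrete value names are illustrative assumptions.

```python
from itertools import product


def build_parameter_set(exposed):
    """Combine every feasible value of every exposed parameter into one candidate per combination."""
    names = list(exposed)
    return [
        dict(zip(names, combo))
        for combo in product(*(exposed[name] for name in names))
    ]


# Hypothetical exposed parameters with n = 3 and m = 2.
exposed = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}
C = build_parameter_set(exposed)
```

  • The set grows as the product of the value counts (here 3 × 2 = 6), which is why the size of the search space matters when choosing a search algorithm.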
  • The exposed scheduling parameters can take specific, enumerable values, that is, they can be exhaustively listed.
  • The exposed scheduling parameters can also include continuous scheduling parameters within a certain range, which cannot be exhaustively listed.
  • In that case, the operator scheduling parameter set is still generated according to the exposed scheduling parameters, and the operator scheduling parameter set is then searched with the scheduling parameter search algorithm, which will not be described here.
  • In one example, the scheduling parameter adjustment method further includes: selecting a scheduling parameter search algorithm from a preset scheduling search algorithm database according to the size of the parameter search space, where the parameter search space is obtained from the operator scheduling parameter set and the scheduling search algorithm database includes a variety of scheduling parameter search algorithms.
  • Correspondingly, searching out a set of scheduling parameters in the operator scheduling parameter set with the scheduling parameter search algorithm includes: searching out a set of scheduling parameters in the operator scheduling parameter set with the selected scheduling parameter search algorithm.
  • Selecting a scheduling parameter search algorithm from the preset scheduling search algorithm database can be realized as follows: predict the time required for the performance data to converge according to the size of the parameter search space; when that time is greater than a preset threshold, select a scheduling parameter search algorithm that is biased towards global uniform search; when that time is less than or equal to the preset threshold, select a scheduling parameter search algorithm that searches for a local optimal solution within a specified time.
  • In one example, the scheduling parameters of an operator can be exhausted, that is, their feasible values can be enumerated. In this case the search space is considered relatively small, and the global traversal algorithm can be selected from the preset scheduling search algorithm database to obtain the generated scheduling parameters. It is worth mentioning that, because the global traversal algorithm compares every feasible solution before determining the optimal one, it guarantees that the obtained scheduling parameters are the optimal solution within the current search space, and the accuracy of the search is extremely high.
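  • Global traversal amounts to evaluating every candidate and keeping the best one. A minimal sketch follows, in which `fake_latency` is a made-up stand-in for a measurement on the target device and the `tile` and `unroll` parameter names are hypothetical.

```python
def global_traversal(parameter_set, evaluate):
    """Evaluate every candidate and return (best parameters, best cost)."""
    best_params, best_cost = None, float("inf")
    for params in parameter_set:
        cost = evaluate(params)
        if cost < best_cost:
            best_params, best_cost = params, cost
    return best_params, best_cost


# Hypothetical enumerable search space: 3 tile sizes x 2 unroll factors.
candidates = [{"tile": t, "unroll": u} for t in (4, 8, 16) for u in (1, 2)]


def fake_latency(params):
    """Made-up cost model standing in for performance data from the target device."""
    return abs(params["tile"] - 8) + 1.0 / params["unroll"]


best, cost = global_traversal(candidates, fake_latency)
```

  • Because every candidate is evaluated once, the number of device runs equals the size of the parameter set, which is acceptable only for small search spaces.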
  • In another example, the search space of the scheduling parameters of an operator is relatively large, and a limit on the execution time of the scheduling parameter search algorithm is specified in advance. In this case, the convergence time of the scheduling parameter search algorithm needs to be estimated.
  • If the estimated convergence time is less than the required execution time, a scheduling parameter search algorithm with high search accuracy, such as global traversal, can be preferentially selected from the scheduling search algorithm database. If the estimated convergence time is not less than the required execution time, a scheduling parameter search algorithm with high search efficiency, such as the steepest descent method, can be preferentially selected. For example, when the parameter search space corresponding to an operator scheduling template has size 100 and each evaluation on the target device takes 5 s, the total running time of the algorithm is about 500 s. If this time is less than the preset threshold T, the global traversal algorithm can be chosen; otherwise, an optimization algorithm such as simulated annealing can be chosen.
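  • The numeric example above can be written as a small selection rule. This sketch assumes the total time is simply the search space size multiplied by the per-evaluation time, as in the 100 × 5 s example; the function names and the returned algorithm labels are hypothetical.

```python
def estimate_total_time(space_size, eval_seconds):
    """Assumed estimate: one evaluation on the target device per candidate."""
    return space_size * eval_seconds


def choose_algorithm(space_size, eval_seconds, threshold_seconds):
    """Pick global traversal when a full sweep fits under the threshold T, else a heuristic."""
    total = estimate_total_time(space_size, eval_seconds)
    if total < threshold_seconds:
        return "global_traversal"
    return "simulated_annealing"


total = estimate_total_time(100, 5)            # the example from the text
fits = choose_algorithm(100, 5, threshold_seconds=600)
too_slow = choose_algorithm(100, 5, threshold_seconds=300)
```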
  • a suitable scheduling parameter search algorithm can be selected from the scheduling search algorithm database according to requirements, and details will not be repeated here.
  • It should be noted that the deep learning model in the target device usually contains several operators, so multiple operator scheduling templates may be obtained through the matching in step 101. Because of the actual relationships between operators, the scheduling processes of operators affect each other when the deep learning model runs on the target device, so the influence between the operator scheduling templates needs to be considered when generating scheduling parameters. That is to say, the scheduling parameter search algorithm operates on all matched operator scheduling templates rather than on a single one. In particular, when deep learning models contain different numbers of operators of the same type, the optimal scheduling parameters also differ.
  • Handling all matched operator scheduling templates mainly affects the objective function of the scheduling parameter search algorithm. The description above uses a single operator scheduling template only as an example; it extends to the case of multiple operator scheduling templates, and this embodiment is not limited to a single operator scheduling template.
  • In that case, the search space may be formed by combining the scheduling parameters of the various operators included in the deep learning model, which will not be described here.
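  • One way to combine the scheduling parameters of several operators into a single search space is a product over the per-operator candidate lists. The sketch below is only an assumption about how such a joint space could be built; the operator names and parameter names are hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical per-operator candidate lists.
per_operator = {
    "conv3x3": [{"tile": 4}, {"tile": 8}],
    "pool": [{"vec": 1}, {"vec": 2}, {"vec": 4}],
}


def joint_space(per_operator):
    """Yield one joint candidate per combination of per-operator parameter sets."""
    names = list(per_operator)
    for combo in product(*(per_operator[name] for name in names)):
        yield dict(zip(names, combo))


def joint_size(per_operator):
    """Size of the joint space: the product of the per-operator set sizes."""
    return prod(len(options) for options in per_operator.values())


all_candidates = list(joint_space(per_operator))
```

  • The joint size grows multiplicatively with each added operator, so a full sweep quickly becomes impractical and a heuristic search over the joint space is the more likely choice.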
  • Step 103: receive the performance data of executing the scheduling process fed back by the target device, adjust the scheduling parameters according to the performance data, and send them to the target device.
  • When performance data fed back by the target device is received, the currently received performance data is first compared with the historically received performance data to detect whether performance has improved. If performance has not improved, the search algorithm is determined to have converged, and the scheduling parameters corresponding to the test item with the best historical scheduling performance are selected as the optimal scheduling parameters. If performance has improved, the search algorithm is determined not to have converged: there may still be a better combination of scheduling parameters in the search space, so the scheduling parameter search algorithm is adjusted according to a certain strategy to select another combination of scheduling parameters and send it to the target device for execution. The adjustment strategy can be an optimization direction determined from the execution results, or a disturbance added to the scheduling parameter search algorithm so that it iteratively selects other scheduling parameter combinations in another direction, which will not be described here.
  • In other words, the main control device continuously receives the performance data returned by the target device, adjusts the scheduling parameters according to the performance data, and sends the adjusted scheduling parameters to the target device until the performance data converges, that is, until satisfactory scheduling parameters are obtained. Determining the scheduling parameters through this loop ensures their optimality.
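  • The feedback loop of step 103 can be sketched as follows. The `measure` callback stands in for sending parameters to the target device and receiving its performance data, and the adjustment strategy (simply moving to the next candidate) is an assumption chosen for brevity, not the strategy of the application.

```python
def tune(candidates, measure, start=0):
    """Iterate until a trial brings no improvement, then return the best trial seen."""
    history = []                  # (parameters, latency) for every trial sent out
    index = start
    best_latency = float("inf")
    while True:
        params = candidates[index]
        latency = measure(params)          # stands in for send + feedback
        history.append((params, latency))
        if latency >= best_latency:
            break                          # no improvement: treat the search as converged
        best_latency = latency
        if index + 1 >= len(candidates):
            break                          # search space exhausted
        index += 1                         # assumed strategy: try the next candidate
    best = min(history, key=lambda item: item[1])
    return best, history


tile_sizes = [2, 4, 8, 16, 32]             # hypothetical candidates
best, history = tune(tile_sizes, lambda t: abs(t - 8) + 1)
```

  • The loop stops at the first trial that fails to improve and then falls back to the best entry in the history, mirroring the "select the test item with the best historical scheduling performance" rule in the text.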
  • This embodiment realizes automatic scheduling design by decoupling the scheduling design process into three parts that can be understood and executed by a machine: determining the operator scheduling template, determining the scheduling parameters, and running. This overcomes the dependence on manual work. Because the scheduling design is realized automatically, it is not limited by manpower, and optimal scheduling can be designed for all operators on any hardware. A deep learning model usually contains a large number of operators of different types; operators of the same type with different parameters have different optimal scheduling implementations; the type of hardware used also affects the optimal scheduling of hardware resources; and even the same operator has different optimal scheduling on different models of the same type of hardware. Even so, the large amount of design work can be completed by the computing power of the machine, so that scheduling design covers the various application scenarios.
  • the embodiments of the present application also provide a method for adjusting scheduling parameters, which is applied to a target device.
  • the target device may be an electronic device such as a computer or a server, as shown in FIG. 2 , which specifically includes the following steps.
  • Step 201: receive the scheduling parameters sent by the main control device, where the scheduling parameters are generated according to the operator scheduling template matched with the target device and the scheduling parameter search algorithm.
  • The received scheduling parameters may be the scheduling parameters of a single operator or of multiple operators.
  • Step 202: run the scheduling process corresponding to the operator according to the scheduling parameters.
  • The target device also monitors the running process to obtain performance data.
  • Step 203: feed back the performance data of executing the scheduling process to the main control device, so that the main control device can adjust the scheduling parameters according to the performance data and send them to the target device.
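  • Steps 201 to 203 on the target device can be sketched as a timed run that produces a small report. The report fields, the use of wall-clock latency as the performance data, and the `blocked_sum` workload are assumptions for illustration; the application does not define the performance data format or the transport between devices.

```python
import time


def run_and_measure(operator, params, repeats=3):
    """Run the operator with the received parameters and collect latency as performance data."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        operator(**params)
        timings.append(time.perf_counter() - start)
    # The best of several runs reduces noise from other activity on the device.
    return {"params": params, "latency_s": min(timings), "runs": repeats}


def blocked_sum(n, block):
    """Toy operator whose schedule is controlled by a block size."""
    total = 0
    for lo in range(0, n, block):
        total += sum(range(lo, min(lo + block, n)))
    return total


report = run_and_measure(blocked_sum, {"n": 10_000, "block": 256})
```

  • The `report` dictionary is what would be fed back to the main control device in step 203.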
  • the embodiments of the present application also provide a master control device, as shown in FIG. 3 , including the following modules.
  • the search module 301 is configured to search for an operator scheduling template that matches the target device.
  • the scheduling parameter generating module 302 is configured to generate scheduling parameters according to the matching operator scheduling template and scheduling parameter search algorithm, and send the scheduling parameters to the target device for the target device to run the scheduling process corresponding to the operator according to the scheduling parameters.
  • the iteration module 303 is configured to receive the performance data of the execution scheduling process fed back by the target device, adjust the scheduling parameters according to the performance data and send it to the target device.
  • this embodiment is a device embodiment corresponding to the method embodiment applied to the main control device, and this embodiment can be implemented in cooperation with the method embodiment applied to the main control device.
  • the relevant technical details mentioned in the embodiment of the method applied to the master control device are still valid in this embodiment, and will not be repeated here to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied to the method embodiment applied to the master control device.
  • The modules involved in this embodiment are logical modules.
  • A logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • the embodiment of the present application provides a target device, as shown in FIG. 4 , including the following modules.
  • the receiving module 401 is configured to receive the scheduling parameters sent by the main control device; wherein, the scheduling parameters are generated according to the operator scheduling template and the scheduling parameter search algorithm matched with the target device.
  • the running module 402 is configured to run the scheduling process corresponding to the operator according to the scheduling parameters.
  • the feedback module 403 is configured to feed back the performance data of the scheduling process to the main control device, so that the main control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
  • this embodiment is a device embodiment corresponding to the method embodiment applied to the target device, and this embodiment can be implemented in cooperation with the method embodiment applied to the target device.
  • the relevant technical details mentioned in the embodiment of the method applied to the target device are still valid in this embodiment, and will not be repeated here in order to reduce repetition.
  • the relevant technical details mentioned in this embodiment can also be applied in the method embodiment applied to the target device.
  • The modules involved in this embodiment are logical modules.
  • A logical unit can be a physical unit, a part of a physical unit, or a combination of multiple physical units.
  • units that are not closely related to solving the technical problem proposed in the present application are not introduced in this embodiment, but this does not mean that there are no other units in this embodiment.
  • The embodiment of the present application also provides an electronic device, as shown in FIG. 5, including: at least one processor 501; and a memory 502 communicatively connected to the at least one processor 501; wherein the memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can execute the scheduling parameter adjustment method described in any one of the above method embodiments.
  • the memory 502 and the processor 501 are connected by a bus, and the bus may include any number of interconnected buses and bridges, and the bus connects one or more processors 501 and various circuits of the memory 502 together.
  • the bus may also connect together various other circuits such as peripherals, voltage regulators, and power management circuits, all of which are well known in the art and therefore will not be further described herein.
  • the bus interface provides an interface between the bus and the transceivers.
  • a transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, providing means for communicating with various other devices over a transmission medium.
  • the data processed by the processor 501 is transmitted on the wireless medium through the antenna, and further, the antenna also receives the data and transmits the data to the processor 501 .
  • Processor 501 is responsible for managing the bus and general processing, and may also provide various functions including timing, peripheral interface, voltage regulation, power management and other control functions. And the memory 502 may be used to store data used by the processor 501 when performing operations.
  • Another aspect of the embodiment of the present application provides a computer-readable storage medium storing a computer program.
  • when the computer program is executed by the processor, the scheduling parameter adjustment method described in any one of the above method embodiments is implemented.
  • a storage medium includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, optical disc, and other media that can store program code.
  • an embodiment of the present invention also provides a computer program product, the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the method in any of the above method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Factory Administration (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present application relate to the technical field of computers, and provide a scheduling parameter adjusting method, devices, and a storage medium. The scheduling parameter adjusting method is applied to a main control device, and comprises: searching for an operator scheduling template matched with a target device; generating a scheduling parameter according to the matched operator scheduling template and a scheduling parameter search algorithm, and sending the scheduling parameter to the target device, so that the target device runs, according to the scheduling parameter, a scheduling process corresponding to an operator; and receiving performance data, fed back by the target device, of executing the scheduling process, adjusting the scheduling parameter according to the performance data, and sending the adjusted scheduling parameter to the target device.

Description

Scheduling parameter adjusting method, devices, and storage medium
Technical Field
The embodiments of the present application relate to the field of computer technology, and in particular to a scheduling parameter adjustment method, a device, and a storage medium.
Background
In recent years, deep learning has achieved great success in computer vision, speech recognition, natural language processing, and other fields. Industry has accordingly begun to deploy deep learning model inference services on many types of hardware, such as central processing units (CPUs), graphics processing units (GPUs), and intelligent chips. Performance indicators of deep learning inference, such as latency and throughput, can be improved only when the computing and storage resources of the hardware are scheduled reasonably and fully.
However, in some cases the scheduling optimization of deep learning model inference is done mainly by hand. Manual optimization usually does not reach the optimal schedule, is inefficient, and cannot allocate resources quickly and reasonably.
Summary
The main purpose of the embodiments of the present application is to provide a scheduling parameter adjustment method, a device, and a storage medium that, across various application scenarios, remove the dependence on manual work and automatically design the operator scheduling process for any target device, obtaining optimal scheduling parameters more efficiently and quickly.
To achieve at least the above purpose, an embodiment of the present application provides a scheduling parameter adjustment method applied to a main control device, including: searching for an operator scheduling template that matches a target device; generating scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and sending the scheduling parameters to the target device so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters; and receiving performance data, fed back by the target device, of executing the scheduling process, adjusting the scheduling parameters according to the performance data, and sending the adjusted scheduling parameters to the target device.
To achieve at least the above purpose, an embodiment of the present application further provides a scheduling parameter adjustment method applied to a target device, including: receiving scheduling parameters sent by a main control device, where the scheduling parameters are generated according to an operator scheduling template matching the target device and a scheduling parameter search algorithm; running the scheduling process corresponding to the operator according to the scheduling parameters; and feeding back performance data of executing the scheduling process to the main control device, so that the main control device adjusts the scheduling parameters according to the performance data and sends the adjusted scheduling parameters to the target device.
To achieve at least the above purpose, an embodiment of the present application further provides a main control device, including: a search module, configured to search for an operator scheduling template that matches a target device; a scheduling parameter generation module, configured to generate scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and to send the scheduling parameters to the target device so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters; and an iteration module, configured to receive performance data, fed back by the target device, of executing the scheduling process, to adjust the scheduling parameters according to the performance data, and to send the adjusted scheduling parameters to the target device.
To achieve at least the above purpose, an embodiment of the present application further provides a target device, including: a receiving module, configured to receive scheduling parameters sent by a main control device, where the scheduling parameters are generated according to an operator scheduling template matching the target device and a scheduling parameter search algorithm; a running module, configured to run the scheduling process corresponding to the operator according to the scheduling parameters; and a feedback module, configured to feed back performance data of executing the scheduling process to the main control device, so that the main control device adjusts the scheduling parameters according to the performance data and sends the adjusted scheduling parameters to the target device.
To achieve at least the above purpose, an embodiment of the present application further provides an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the scheduling parameter adjustment method described in any one of the preceding items.
To achieve at least the above purpose, an embodiment of the present application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the scheduling parameter adjustment method described in any one of the preceding items is implemented.
Brief Description of the Drawings
One or more embodiments are illustrated by the figures in the corresponding accompanying drawings; these illustrations do not limit the embodiments.
FIG. 1 is a schematic flowchart of a scheduling parameter adjustment method applied to a main control device, provided in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a scheduling parameter adjustment method applied to a target device, provided in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a main control device provided in another embodiment of the present application;
FIG. 4 is a schematic structural diagram of a target device provided in another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided in another embodiment of the present application.
Detailed Description
In the scheduling parameter adjustment method provided by the embodiments of the present application, after an operator scheduling template matching the target device is found, scheduling parameters are generated according to the matched operator scheduling template and a scheduling parameter search algorithm and are sent to the target device, which runs the scheduling process corresponding to the operator according to them. Performance data of executing the scheduling process, fed back by the target device, is then received, and the scheduling parameters are adjusted according to the performance data and sent to the target device again, until the performance data converges. In other words, the scheduling design process for inference is decoupled into three parts that a machine can understand and execute: determining the operator scheduling template, determining the scheduling parameters, and running. The scheduling design process that was originally carried out manually can therefore be handed over to a machine. This removes the dependence on manual work, reduces the manual workload in scheduling design, improves design efficiency, and covers as many practical application scenarios as possible, which improves applicability and practicality. In various application scenarios, scheduling design can be performed for any remote device and optimal scheduling parameters can be obtained efficiently and quickly, so that the inference of any deep learning network model is accelerated automatically. In addition, the three processes of determining the operator scheduling template, determining the scheduling parameters, and running are divided between the main control device and the target device. Determining the scheduling parameters, that is, generating them according to the matched operator scheduling template and the scheduling parameter search algorithm, is done on the main control device and does not need to run on the target device. This avoids low efficiency, or outright infeasibility, when the target device has poor computing performance, such as a user terminal or an edge-device CPU.
To make the purpose, technical solutions, and advantages of the embodiments of the present application clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that many technical details are given in the embodiments to help the reader understand the application. The technical solutions claimed in this application can nevertheless be realized without these technical details and with various changes and modifications based on the following embodiments. The division into embodiments below is for convenience of description and does not limit the specific implementation of the present application; the embodiments can be combined with and refer to one another provided they do not contradict each other.
In one aspect, an embodiment of the present application provides a scheduling parameter adjustment method applied to a main control device. The main control device may be an electronic device such as a computer or a server. As shown in FIG. 1, the method includes the following steps.
Step 101: search for an operator scheduling template that matches the target device.
In this embodiment, an operator is any of the operations in a deep learning model, such as convolution, pooling, concatenation, or upsampling. An operator scheduling template describes the scheduling process of an operator in a specific environment and includes at least operator feature information and operator running information. For a convolution operation, for example, the template may include information such as the convolution kernel and the running environment, that is, the hardware the operator depends on at run time.
In one example, the target device is provided with an operator scheduling template database in which multiple operator scheduling templates are stored. In this case, searching for the operator scheduling template that matches the target device can be implemented as follows: according to the hardware information of the target device and a preset correspondence between operator scheduling templates and hardware information, the preset operator scheduling template database is queried to obtain the operator scheduling template that matches the target device. For example, for an operator with a 3×3 convolution kernel that depends at run time on an ARM processor, the corresponding operator scheduling template can be retrieved from the database with the query condition "type==conv and kernel_size>1 and env==arm".
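The query above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `OperatorTemplate` record, the in-memory list `TEMPLATE_DB`, its entries, and the helper `query_templates` are hypothetical and are not defined by the application; only the query condition itself comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorTemplate:
    name: str         # hypothetical template identifier
    type: str         # operator type, e.g. "conv" or "pool"
    kernel_size: int  # side length of the kernel
    env: str          # hardware the operator depends on at run time


# Hypothetical database contents, for illustration only.
TEMPLATE_DB = [
    OperatorTemplate("conv_k1_arm", "conv", 1, "arm"),
    OperatorTemplate("conv_k_gt1_arm", "conv", 3, "arm"),
    OperatorTemplate("conv_k_gt1_x86", "conv", 3, "x86"),
    OperatorTemplate("pool_arm", "pool", 2, "arm"),
]


def query_templates(db, condition):
    """Return every template in db for which condition(template) is true."""
    return [t for t in db if condition(t)]


# Equivalent of "type==conv and kernel_size>1 and env==arm".
matches = query_templates(
    TEMPLATE_DB,
    lambda t: t.type == "conv" and t.kernel_size > 1 and t.env == "arm",
)
print([t.name for t in matches])  # ['conv_k_gt1_arm']
```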
It should be noted that the templates in the operator scheduling template database should cover as many application scenarios as possible, that is, operators with a wide variety of operator feature information and operator running information. For convolution operators, for example, the database needs to define at least the templates corresponding to each sub-scenario in which the kernel length equals 1 or is greater than 1 and the running environment is an x86 or x84 CPU, an ARM processor, a GPU, and so on.
It should also be noted that, when the operator scheduling template database is queried, if no exactly matching template can be found, the template with the highest matching degree can be used as the matched operator scheduling template. For example, consider a convolution operator with a 7×7 kernel whose running environment is an x84 CPU. If the database contains only templates for 1×1 and 3×3 kernels, each for an x86 CPU, an ARM processor, and a GPU, then the matched template is the one for a 3×3 kernel on an x86 CPU.
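The application does not say how the matching degree is computed, so the following is only one possible sketch. The scoring weights, the `ENV_FAMILY` grouping (which treats the x84 and x86 CPUs as close relatives), and all names are assumptions made for illustration; with these assumptions the sketch reproduces the outcome of the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    op_type: str
    kernel_size: int
    env: str


# Hypothetical grouping of running environments; not taken from the application.
ENV_FAMILY = {"x86": "cpu_x8x", "x84": "cpu_x8x", "arm": "arm", "gpu": "gpu"}


def match_score(t, op_type, kernel_size, env):
    """Assumed matching degree; None means the template cannot be used at all."""
    if t.op_type != op_type:
        return None
    score = 0
    family = ENV_FAMILY.get(env)
    if t.env == env:
        score += 4                      # exact running environment
    elif family is not None and ENV_FAMILY.get(t.env) == family:
        score += 2                      # related running environment
    if (t.kernel_size > 1) == (kernel_size > 1):
        score += 1                      # same kernel class (==1 or >1)
    return score


def best_template(db, op_type, kernel_size, env):
    """Return the template with the highest matching degree, or None."""
    scored = [(match_score(t, op_type, kernel_size, env), t) for t in db]
    scored = [(s, t) for s, t in scored if s is not None]
    if not scored:
        return None
    return max(scored, key=lambda st: st[0])[1]


db = [Template("conv", k, e) for k in (1, 3) for e in ("x86", "arm", "gpu")]
print(best_template(db, "conv", 7, "x84"))
```

In this sketch an exact environment match outweighs everything else, so a template for the exact hardware is always preferred when one exists.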
It can be understood that a deep learning model usually includes several operators, so scheduling design is in practice the design of the scheduling processes of multiple operators. Therefore, in one example, before searching for the operator scheduling template that matches the target device, the scheduling parameter adjustment method further includes: splitting the deep learning model to which the scheduling parameters to be obtained relate into individual operators. Correspondingly, searching for the operator scheduling template that matches the target device includes: searching for the operator scheduling templates, matching the target device, of the operators obtained by splitting.
In one example, the deep learning model on the target device that needs inference acceleration is a face recognition model trained on a convolutional neural network (CNN); that is, the deep learning model to which the scheduling parameters to be obtained relate is the face recognition model. The face recognition model is first split, which yields, in order, 32 convolution operators with 11×11 kernels, 1 pooling operator, 32 convolution operators with 9×9 kernels, 16 convolution operators with 7×7 kernels, 16 convolution operators with 5×5 kernels, 1 fully connected operator, and 1 loss function operator. The target device uses a CPU when running the deep learning model. The operator scheduling template database is then searched for the template matching each of these operators according to its operator feature information and operator running information.
It should be noted that this embodiment does not limit the number of operators or operator scheduling templates. The number of templates to search for follows from the number of operators, or the number of kinds of operators, that make up the deep learning model to which the scheduling parameters to be obtained relate. For example, if the deep learning model consists of 78 operations, the 78 operator scheduling templates corresponding to those operations are searched for. Alternatively, if the deep learning model consists of 98 operations that correspond to 75 kinds of operators, the templates corresponding to those 75 kinds of operators are searched for. Operators with different operator feature information can be regarded as different kinds of operators; for example, convolution operators with different kernels can be regarded as different kinds. The above is only a specific example; the number of operator scheduling templates and the deep learning model may also be related in other ways, which are not enumerated here.
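The relation between the number of operators and the number of kinds of operators can be illustrated with the face recognition example. In this minimal sketch the split result is written out by hand and each operator is reduced to a hypothetical `(type, kernel_size)` pair; an actual implementation would obtain the operators by traversing the model.

```python
from collections import Counter


def split_face_model():
    """Hand-written split of the example model into (type, kernel_size) pairs."""
    ops = []
    ops += [("conv", 11)] * 32
    ops += [("pool", None)]
    ops += [("conv", 9)] * 32
    ops += [("conv", 7)] * 16
    ops += [("conv", 5)] * 16
    ops += [("fully_connected", None)]
    ops += [("loss", None)]
    return ops


operators = split_face_model()

# Operators with different feature information count as different kinds,
# so one template lookup per distinct (type, kernel_size) pair is enough.
kinds = Counter(operators)

print(len(operators), "operators,", len(kinds), "kinds")
for kind, count in kinds.items():
    print(kind, "x", count)
```

Keeping the count per kind is deliberate: the number of operators of the same kind can itself influence which scheduling parameters are optimal.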
Step 102: generate scheduling parameters according to the matched operator scheduling template and the scheduling parameter search algorithm, and send the scheduling parameters to the target device so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters.
In this embodiment, a scheduling parameter search algorithm is an algorithm that searches for the optimal solution of an optimization problem, such as simulated annealing, gradient descent, or global traversal. This embodiment does not limit the choice of scheduling parameter search algorithm.
In this embodiment, generating scheduling parameters according to the operator scheduling template and the scheduling parameter search algorithm can be implemented as follows. An operator scheduling parameter set is generated according to the scheduling parameters exposed by the matched operator scheduling template. The scheduling parameters exposed by a template are the preset scheduling parameters that are feasible in the operator's scheduling process. The operator scheduling parameter set includes multiple groups of scheduling parameters, and each group contains the scheduling parameters required for one scheduling run of the operator. The scheduling parameter search algorithm then searches the operator scheduling parameter set for one group of scheduling parameters, and that group is used as the generated scheduling parameters.
In one example, the scheduling parameters of an operator include parameter A and parameter B. In the scheduling process the feasible values of parameter A are {a1, a2, ..., an} and the feasible values of parameter B are {b1, b2, ..., bm}. In the scheduling parameters exposed by the corresponding operator scheduling template, parameter A therefore includes a1, a2, ..., an and parameter B includes b1, b2, ..., bm. The operator scheduling parameter set C is {(a1, b1), (a1, b2), ..., (a1, bm), (a2, b1), ..., (a2, bm), ..., (an, bm)}. The scheduling parameter search algorithm then looks for the optimal solution in set C, that is, the optimal combination of parameter A and parameter B. The optimal solution may be, for example, the combination with the shortest execution time or the combination that requires the fewest system resources.
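Set C is the Cartesian product of the feasible values of A and B, which can be sketched as follows. The concrete values and the `measured_latency` function are made up for illustration; in the method described here the cost of a combination would come from performance data fed back by the target device, not from a formula.

```python
from itertools import product

# Hypothetical feasible values exposed by an operator scheduling template.
A_VALUES = [1, 2, 4]     # a1..an
B_VALUES = [8, 16, 32]   # b1..bm

# Operator scheduling parameter set C = {(a, b) for every a and every b}.
C = list(product(A_VALUES, B_VALUES))


def measured_latency(a, b):
    """Stand-in for the execution time reported by the target device."""
    return (a - 2) ** 2 + (b - 16) ** 2 / 64 + 1.0


# Global traversal: evaluate every group of parameters and keep the best.
best = min(C, key=lambda ab: measured_latency(*ab))

print(len(C))  # 9
print(best)    # (2, 16)
```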
The description above covers the case in which the exposed scheduling parameters take specific values, that is, they are exhaustible. In this embodiment the exposed scheduling parameters may also include parameters that are continuous within a certain range, that is, inexhaustible. In that case the operator scheduling parameter set is still generated from the exposed scheduling parameters and is then searched with the scheduling parameter search algorithm; the details are not repeated here.
It should be noted that the target device may in practice have several scheduling parameter search algorithms available, so that a suitable one can be selected according to the actual situation.
Therefore, in one example, after the operator scheduling parameter set is generated and before the scheduling parameter search algorithm searches it for a group of scheduling parameters, the scheduling parameter adjustment method further includes: selecting one scheduling parameter search algorithm from a preset scheduling search algorithm database according to the size of the parameter search space formed from the operator scheduling parameter set, where the scheduling search algorithm database includes multiple scheduling parameter search algorithms. Correspondingly, searching the operator scheduling parameter set for a group of scheduling parameters with the scheduling parameter search algorithm includes: searching the operator scheduling parameter set for a group of scheduling parameters with the selected scheduling parameter search algorithm.
In particular, selecting a scheduling parameter search algorithm from the preset scheduling search algorithm database according to the size of the parameter search space can be implemented as follows: estimate the time required for the performance data to converge according to the size of the parameter search space; if that time is greater than a preset threshold, select a scheduling parameter search algorithm biased toward globally uniform search; if that time is less than or equal to the preset threshold, select a scheduling parameter search algorithm that searches for a locally optimal solution within a specified time.
In one example, the scheduling parameters of an operator are exhaustible, that is, their feasible values can be listed by enumeration. The search space is then considered relatively small, and the global traversal algorithm can be selected from the preset scheduling search algorithm database to produce the generated scheduling parameters. Notably, because the global traversal algorithm compares every feasible solution before determining the best one, it guarantees that the scheduling parameters obtained are the current optimal solution, so the search is highly accurate.
In another example, the search space of an operator's scheduling parameters is relatively large and a limit on the execution time of the scheduling parameter search algorithm has been set in advance. The convergence time of the scheduling parameter search algorithm then needs to be estimated. If the estimated convergence time is less than the allowed execution time, a scheduling parameter search algorithm with high search accuracy, such as global traversal, can be preferred from the scheduling search algorithm database. If the estimated convergence time is not less than the allowed execution time, a scheduling parameter search algorithm with high search efficiency, such as the steepest descent method, can be preferred. For example, when the parameter search space of an operator scheduling template has size 100 and each evaluation on the target device takes 5 s, the total running time of the algorithm is about 500 s. If this time is less than the preset threshold T, the global traversal algorithm can be selected; otherwise an optimization algorithm such as simulated annealing can be selected.
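The numeric example above amounts to a simple rule, sketched here. The function names and the returned algorithm labels are hypothetical, and the estimate assumes that a global traversal evaluates every point of the search space exactly once.

```python
def estimate_traversal_seconds(space_size, seconds_per_evaluation):
    """Estimated time for a global traversal: one evaluation per candidate."""
    return space_size * seconds_per_evaluation


def choose_algorithm(space_size, seconds_per_evaluation, threshold_seconds):
    """Pick global traversal if it fits within threshold T, else a heuristic."""
    estimate = estimate_traversal_seconds(space_size, seconds_per_evaluation)
    if estimate < threshold_seconds:
        return "global_traversal"
    return "simulated_annealing"


# Search space of size 100, 5 s per evaluation on the target device: about 500 s.
print(estimate_traversal_seconds(100, 5))       # 500
print(choose_algorithm(100, 5, 600))            # global_traversal
print(choose_algorithm(100, 5, 300))            # simulated_annealing
```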
The above is only a specific example. In an actual implementation, a suitable scheduling parameter search algorithm can be selected from the scheduling search algorithm database as required; the details are not repeated here.
It should be noted that the deep learning model on the target device usually contains several operators, so step 101 is likely to yield multiple matched operator scheduling templates. Operators are related to one another, and when the deep learning model runs on the target device the scheduling processes of the operators affect each other. The influence between the operator scheduling templates therefore needs to be considered when scheduling parameters are generated. In other words, the scheduling parameter search algorithm works on all matched operator scheduling templates together rather than on a single one. In particular, a different number of operator scheduling templates of the same kind in the deep learning model also leads to different optimal scheduling parameters.
可以理解的是,针对所有匹配到的算子调度模板主要与调度参数搜索算 法中的目标函数等有关。因此,以上仅是以单个算子调度模板为例进行说明,可以推广至多个算子调度模板的情况,而不意味着本实施例只能够针对单个算子调度模板实现。例如,在确定搜索空间时,可以将深度学习模型包含的各个算子的调度参数组合后形成的搜索空间,此处就不再一一赘述了。It can be understood that the scheduling template for all matched operators is mainly related to the objective function in the scheduling parameter search algorithm. Therefore, the above only uses a single operator scheduling template as an example for illustration, and can be extended to the case of multiple operator scheduling templates, which does not mean that this embodiment can only be implemented for a single operator scheduling template. For example, when determining the search space, the search space may be formed by combining the scheduling parameters of the various operators included in the deep learning model, which will not be described here.
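One way to read "combining the scheduling parameters of the operators" is as a Cartesian product over every parameter of every operator. The sketch below assumes that reading; the operator names, parameter names, and the function `build_search_space` are hypothetical and chosen only for illustration.

```python
from itertools import product


def build_search_space(per_operator_params):
    """Return every combination of scheduling parameters across operators.

    per_operator_params maps an operator name to a dict of
    parameter name -> list of candidate values (hypothetical layout).
    """
    keys = []
    value_lists = []
    for op_name, params in per_operator_params.items():
        for param_name, values in params.items():
            keys.append((op_name, param_name))
            value_lists.append(values)
    # One dict per joint assignment of all parameters of all operators.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


space = build_search_space({
    "conv2d": {"tile": [8, 16], "unroll": [1, 2]},
    "matmul": {"tile": [16, 32, 64]},
})
print(len(space))  # 2 * 2 * 3 = 12 combinations
```

The joint space grows multiplicatively with each added parameter, which is why the size of the search space matters when choosing a search algorithm.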
Step 103: receive the performance data of executing the scheduling process fed back by the target device, adjust the scheduling parameters according to the performance data, and send the adjusted scheduling parameters to the target device.
Specifically, when performance data fed back by the target device is received, the currently received performance data is first compared with the previously received performance data to detect whether performance has improved. If no improvement is detected, the search algorithm is judged to have converged, and the scheduling parameters of the test item with the best historical scheduling performance are selected as the optimal scheduling parameters. If an improvement is detected, the search algorithm is judged not to have converged, and a better combination of scheduling parameters may still exist in the search space. The scheduling parameter search algorithm is then adjusted according to a certain strategy to select another combination of scheduling parameters, which is sent to the target device for execution. The adjustment strategy may follow an optimization direction determined from the execution results, or may add a perturbation to the search algorithm so that it continues iterating in another direction and selects other combinations of scheduling parameters. Details are not repeated here.
It should be noted that, with respect to the target device, the process is in practice a loop: the performance data returned by the target device is received, the scheduling parameters are adjusted according to the performance data, and the adjusted scheduling parameters are sent to the target device, repeatedly, until the performance data converges, that is, until satisfactory scheduling parameters are obtained. Determining the scheduling parameters through this loop ensures that the final scheduling parameters are optimal.
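The feedback loop described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the method of the application: `tune`, `measure`, and `patience` are hypothetical names, `measure` stands in for sending parameters to the target device and receiving its performance data, lower values are assumed to be better, and the order of `candidates` stands in for whatever order the chosen search algorithm proposes.

```python
def tune(candidates, measure, patience=1):
    """Try candidates until performance stops improving; return the best trial.

    candidates must be non-empty. Returns a (latency, params) tuple.
    """
    history = []   # (latency, params) for every trial so far
    stale = 0      # consecutive trials without improvement
    for params in candidates:
        latency = measure(params)
        best_so_far = min((h[0] for h in history), default=float("inf"))
        history.append((latency, params))
        if latency < best_so_far:
            stale = 0          # improved: treat the search as not converged
        else:
            stale += 1         # no improvement: treat as converged after `patience`
            if stale >= patience:
                break
    # Select the historically best trial as the final scheduling parameters.
    return min(history, key=lambda h: h[0])


def fake_latency(params):
    # Toy stand-in for a measurement on the target device.
    return abs(params["tile"] - 16) + 1.0


best = tune([{"tile": 8}, {"tile": 16}, {"tile": 32}, {"tile": 64}], fake_latency)
print(best)
```

With `patience=1` the loop stops at the first trial that fails to improve, which mirrors the convergence test in the text; a larger `patience` is one simple way to tolerate noisy measurements.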
As noted in the background, implementing deep learning model inference through a manually designed scheduling process is often inefficient and fails to yield the optimal schedule. This embodiment decouples scheduling design into three parts that a machine can understand and execute: determining the operator scheduling template, determining the scheduling parameters, and running. Scheduling design is thereby automated and no longer depends on manual work. Because the design is automated, it avoids the limits of human effort and can design an optimal schedule for all operators on any hardware. This holds even though a deep learning model usually contains a large number of different kinds of operators, operators of the same kind with different parameters have different optimal schedules, different kinds of hardware affect the optimal scheduling of hardware resources, and even the same operator has different optimal schedules on different models of the same type of hardware. The large computing capacity of machines can complete this volume of design work and cover scheduling design for a wide range of application scenarios.
Another aspect of the embodiments of the present application provides a method for adjusting scheduling parameters, applied to a target device. The target device may be an electronic device such as a computer or a server. As shown in FIG. 2, the method includes the following steps.
Step 201: receive the scheduling parameters sent by the master control device, where the scheduling parameters are generated according to an operator scheduling template matched with the target device and a scheduling parameter search algorithm.
It should be noted that, because the target device may match multiple operator scheduling templates, the received scheduling parameters may be those of a single operator or of multiple operators.
Step 202: run the scheduling process corresponding to the operator according to the scheduling parameters.
Specifically, while the scheduling process runs, the target device also monitors it to obtain performance data.
Step 203: feed back the performance data of executing the scheduling process to the master control device, so that the master control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
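Steps 201 to 203 on the target device can be sketched as a single function that runs an operator with the received parameters, times it, and returns a report. This is a minimal sketch under assumptions: the function `run_and_report`, the report fields, and the toy operator are all hypothetical, and wall-clock time is used as the only performance metric, whereas the application does not specify which performance data is collected.

```python
import time


def run_and_report(scheduling_params, run_operator, repeats=3):
    """Run the operator with the received parameters and return performance data."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_operator(**scheduling_params)      # step 202: execute the schedule
        durations.append(time.perf_counter() - start)
    # Step 203: this dictionary is what would be fed back to the master control device.
    return {
        "params": scheduling_params,
        "runs": repeats,
        "best_seconds": min(durations),
        "mean_seconds": sum(durations) / len(durations),
    }


def blocked_sum(tile):
    # Toy operator: sum 0..9999 in blocks of `tile` elements.
    total = 0
    for begin in range(0, 10_000, tile):
        total += sum(range(begin, min(begin + tile, 10_000)))
    return total


report = run_and_report({"tile": 256}, blocked_sum)
print(report["runs"], report["params"])
```

Repeating the run and reporting both the best and the mean time is a common way to reduce measurement noise before the master control device compares results.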
In addition, it should be understood that the division of the above methods into steps is only for clarity of description. In an implementation, steps may be merged into one step, or a step may be split into several steps; as long as the same logical relationship is included, such variants fall within the protection scope of this patent. Adding insignificant modifications to the algorithm or process, or introducing insignificant designs, without changing the core design of the algorithm and process, also falls within the protection scope of this patent.
Another aspect of the embodiments of the present application provides a master control device which, as shown in FIG. 3, includes the following modules.
A search module 301, configured to search for an operator scheduling template that matches the target device.
A scheduling parameter generation module 302, configured to generate scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and to send the scheduling parameters to the target device, so that the target device runs the scheduling process corresponding to the operator according to the scheduling parameters.
An iteration module 303, configured to receive the performance data of executing the scheduling process fed back by the target device, adjust the scheduling parameters according to the performance data, and send them to the target device.
This embodiment is the device embodiment corresponding to the method embodiment applied to the master control device, and the two can be implemented in cooperation with each other. The technical details described in the method embodiment applied to the master control device remain valid in this embodiment and are not repeated here. Correspondingly, the technical details described in this embodiment also apply to the method embodiment applied to the master control device.
It is worth noting that the modules involved in this embodiment are all logical modules. In practice, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units that are not closely related to solving the technical problem addressed by the present application are not introduced in this embodiment; this does not mean that no other units exist in this embodiment.
Another aspect of the embodiments of the present application provides a target device which, as shown in FIG. 4, includes the following modules.
A receiving module 401, configured to receive the scheduling parameters sent by the master control device, where the scheduling parameters are generated according to an operator scheduling template matched with the target device and a scheduling parameter search algorithm.
A running module 402, configured to run the scheduling process corresponding to the operator according to the scheduling parameters.
A feedback module 403, configured to feed back the performance data of executing the scheduling process to the master control device, so that the master control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
This embodiment is the device embodiment corresponding to the method embodiment applied to the target device, and the two can be implemented in cooperation with each other. The technical details described in the method embodiment applied to the target device remain valid in this embodiment and are not repeated here. Correspondingly, the technical details described in this embodiment also apply to the method embodiment applied to the target device.
It is worth noting that the modules involved in this embodiment are all logical modules. In practice, a logical unit may be a physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the present application, units that are not closely related to solving the technical problem addressed by the present application are not introduced in this embodiment; this does not mean that no other units exist in this embodiment.
Another aspect of the embodiments of the present application provides an electronic device which, as shown in FIG. 5, includes at least one processor 501 and a memory 502 communicatively connected to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 so that the at least one processor 501 can perform the method for adjusting scheduling parameters described in any of the above method embodiments.
The memory 502 and the processor 501 are connected by a bus. The bus may include any number of interconnected buses and bridges, and connects the various circuits of the one or more processors 501 and the memory 502 together. The bus may also connect various other circuits, such as peripheral devices, voltage regulators, and power management circuits; these are well known in the art and are not described further here. A bus interface provides an interface between the bus and a transceiver. The transceiver may be a single element or multiple elements, such as multiple receivers and transmitters, and provides a unit for communicating with various other apparatuses over a transmission medium. Data processed by the processor 501 is transmitted over a wireless medium through an antenna; the antenna also receives data and passes it to the processor 501.
The processor 501 is responsible for managing the bus and general processing, and may also provide various functions, including timing, peripheral interfaces, voltage regulation, power management, and other control functions. The memory 502 may be used to store data used by the processor 501 when performing operations.
Another aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program. When executed by a processor, the computer program implements the method for adjusting scheduling parameters described in any of the above method embodiments.
That is, those skilled in the art will understand that all or part of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program is stored in a storage medium and includes several instructions that cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
In addition, an embodiment of the present invention provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform the method in any of the above method embodiments.
Those of ordinary skill in the art will understand that the above embodiments are specific embodiments for implementing the present application, and that in practice various changes in form and detail can be made to them without departing from the spirit and scope of the present application.

Claims (11)

  1. A method for adjusting scheduling parameters, applied to a master control device, comprising:
    searching for an operator scheduling template that matches a target device;
    generating scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and sending the scheduling parameters to the target device, so that the target device runs a scheduling process corresponding to an operator according to the scheduling parameters;
    receiving performance data of executing the scheduling process fed back by the target device, adjusting the scheduling parameters according to the performance data, and sending the adjusted scheduling parameters to the target device.
  2. The method for adjusting scheduling parameters according to claim 1, wherein the searching for an operator scheduling template that matches a target device comprises:
    querying a preset operator scheduling template database, according to hardware information of the target device and a preset correspondence between operator scheduling templates and hardware information, to obtain the operator scheduling template that matches the target device;
    wherein the operator scheduling template database stores a plurality of operator scheduling templates.
  3. The method for adjusting scheduling parameters according to claim 1, wherein the generating scheduling parameters according to the operator scheduling template and a scheduling parameter search algorithm comprises:
    generating an operator scheduling parameter set according to the scheduling parameters exposed by the matched operator scheduling template, wherein the operator scheduling parameter set comprises multiple groups of scheduling parameters, and each group of scheduling parameters comprises the scheduling parameters required for one scheduling process of the operator;
    searching out one group of scheduling parameters from the operator scheduling parameter set by using the scheduling parameter search algorithm, and using the searched-out group of scheduling parameters as the generated scheduling parameters.
  4. The method for adjusting scheduling parameters according to claim 3, wherein, after the generating an operator scheduling parameter set and before the searching out one group of scheduling parameters from the operator scheduling parameter set by using the scheduling parameter search algorithm, the method further comprises:
    selecting a scheduling parameter search algorithm from a preset scheduling search algorithm database according to the size of a parameter search space formed based on the operator scheduling parameter set, wherein the parameter search space is obtained based on the operator scheduling parameter set, and the scheduling search algorithm database comprises multiple scheduling parameter search algorithms;
    and the searching out one group of scheduling parameters from the operator scheduling parameter set by using the scheduling parameter search algorithm comprises:
    searching out one group of scheduling parameters from the operator scheduling parameter set by using the selected scheduling parameter search algorithm.
  5. The method for adjusting scheduling parameters according to claim 4, wherein the selecting a scheduling parameter search algorithm from a preset scheduling search algorithm database according to the size of a parameter search space formed based on the operator scheduling parameter set comprises:
    estimating, according to the size of the parameter search space, the time required for the performance data to converge;
    when the time required for the performance data to converge is greater than a preset threshold, selecting a scheduling parameter search algorithm biased toward global uniform search;
    when the time required for the performance data to converge is less than or equal to the preset threshold, selecting a scheduling parameter search algorithm that searches for a local optimal solution within a specified time.
  6. The method for adjusting scheduling parameters according to any one of claims 1 to 5, wherein, before the searching for an operator scheduling template that matches a target device, the method further comprises:
    splitting the deep learning model involved in the scheduling parameters to be obtained into individual operators;
    and the searching for an operator scheduling template that matches a target device comprises:
    searching for the operator scheduling templates, matching the target device, of the operators obtained by the splitting.
  7. A method for adjusting scheduling parameters, applied to a target device, comprising:
    receiving scheduling parameters sent by a master control device, wherein the scheduling parameters are generated according to an operator scheduling template matched with the target device and a scheduling parameter search algorithm;
    running a scheduling process corresponding to an operator according to the scheduling parameters;
    feeding back performance data of executing the scheduling process to the master control device, so that the master control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
  8. A master control device, comprising:
    a search module, configured to search for an operator scheduling template that matches a target device;
    a scheduling parameter generation module, configured to generate scheduling parameters according to the matched operator scheduling template and a scheduling parameter search algorithm, and to send the scheduling parameters to the target device, so that the target device runs a scheduling process corresponding to an operator according to the scheduling parameters;
    an iteration module, configured to receive performance data of executing the scheduling process fed back by the target device, adjust the scheduling parameters according to the performance data, and send them to the target device.
  9. A target device, comprising:
    a receiving module, configured to receive scheduling parameters sent by a master control device, wherein the scheduling parameters are generated according to an operator scheduling template matched with the target device and a scheduling parameter search algorithm;
    a running module, configured to run a scheduling process corresponding to an operator according to the scheduling parameters;
    a feedback module, configured to feed back performance data of executing the scheduling process to the master control device, so that the master control device adjusts the scheduling parameters according to the performance data and sends them to the target device.
  10. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the method for adjusting scheduling parameters according to any one of claims 1 to 6, or the method for adjusting scheduling parameters according to claim 7.
  11. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method for adjusting scheduling parameters according to any one of claims 1 to 6, or the method for adjusting scheduling parameters according to claim 7.
PCT/CN2022/129029 2021-11-12 2022-11-01 Scheduling parameter adjusting method, devices, and storage medium WO2023083058A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111354335.XA CN114064242A (en) 2021-11-12 2021-11-12 Method, device and storage medium for adjusting scheduling parameters
CN202111354335.X 2021-11-12

Publications (1)

Publication Number Publication Date
WO2023083058A1 true WO2023083058A1 (en) 2023-05-19

Family

ID=80272481

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129029 WO2023083058A1 (en) 2021-11-12 2022-11-01 Scheduling parameter adjusting method, devices, and storage medium

Country Status (2)

Country Link
CN (1) CN114064242A (en)
WO (1) WO2023083058A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170879A (en) * 2023-11-01 2023-12-05 之江实验室 Device management device and method for intelligent chip

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114064242A (en) * 2021-11-12 2022-02-18 中兴通讯股份有限公司 Method, device and storage medium for adjusting scheduling parameters
CN116304720B (en) * 2023-05-18 2023-08-25 之江实验室 Cost model training method and device, storage medium and electronic equipment
CN116755782B (en) * 2023-08-18 2023-10-20 腾讯科技(深圳)有限公司 Method, device, equipment, storage medium and program product for instruction scheduling

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180341851A1 (en) * 2017-05-24 2018-11-29 International Business Machines Corporation Tuning of a machine learning system
CN111796917A (en) * 2019-04-09 2020-10-20 华为技术有限公司 Operator operation scheduling method and device
WO2021051920A1 (en) * 2019-09-17 2021-03-25 华为技术有限公司 Model optimization method and apparatus, storage medium, and device
CN111752716A (en) * 2020-06-29 2020-10-09 北京小米松果电子有限公司 Model using method, data processing method and device
CN114064242A (en) * 2021-11-12 2022-02-18 中兴通讯股份有限公司 Method, device and storage medium for adjusting scheduling parameters

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117170879A (en) * 2023-11-01 2023-12-05 之江实验室 Device management device and method for intelligent chip
CN117170879B (en) * 2023-11-01 2024-03-12 之江实验室 Device management device and method for intelligent chip

Also Published As

Publication number Publication date
CN114064242A (en) 2022-02-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891854

Country of ref document: EP

Kind code of ref document: A1