WO2020207393A1 - Operator operation scheduling method and device - Google Patents

Operator operation scheduling method and device

Info

Publication number
WO2020207393A1
Authority
WO
WIPO (PCT)
Prior art keywords
scheduling
operator
parameters
feasible
strategies
Prior art date
Application number
PCT/CN2020/083635
Other languages
English (en)
French (fr)
Inventor
李琳
丁皓
杨康
张登程
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2020207393A1
Priority to US17/144,780 (US11934866B2)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/008Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0893Caches characterised by their organisation or structure
    • G06F12/0897Caches characterised by their organisation or structure with two or more cache hierarchy levels
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound

Definitions

  • This application relates to artificial intelligence technology, and in particular to an operator operation scheduling method and device.
  • Deep learning technology is used in real-life applications, and the computational efficiency of the Deep Neural Network (DNN) it relies on directly affects how well those applications work. For example, the computation time of algorithms such as target detection, target recognition, and motion prediction in autonomous driving determines the availability and safety of those algorithms. Performing AI computation with high performance is therefore an urgent need for industrial products.
  • AI computing usually uses two types of chips: general-purpose computing chips and dedicated chips.
  • General-purpose computing chips include the central processing unit (CPU) and the graphics processing unit (GPU).
  • As for dedicated chips, many companies have already begun to invest heavily in AI chips. Faced with multiple chip architectures, using general algorithms to improve the computing efficiency of heterogeneous chips is therefore a major challenge.
  • This application provides an operator operation scheduling method and device to realize general processing of processor architecture and operator types and improve operator operation performance.
  • This application provides an operator operation scheduling method, including: obtaining operator parameters and processor parameters corresponding to an operator operation; creating N scheduling strategies according to the operator parameters and the processor parameters, where the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and N and M are both natural numbers; filtering the M scheduling strategy subsets according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M; inputting the operator parameters and the K feasible scheduling strategies into a cost model to obtain K operator operation costs, where the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and determining the optimal scheduling strategy for the operator operation according to a target demand and the K operator operation costs.
  • This application uses factors in the processor and operator parameters that are difficult to model directly to create multiple scheduling strategies, classifies these scheduling strategies into multiple scheduling strategy subsets, and then filters each subset to obtain the optimal scheduling strategy in each one, which reduces the number of scheduling strategy subsets to be searched. Finally, the optimal scheduling strategy for the operator operation is obtained through the cost model. The operator operation scheduling method can therefore output a scheduling strategy for any operator parameters and processor parameters, which realizes general processing of processor architectures and operator types and improves the performance of operator operations.
  • The filtering of the M scheduling strategy subsets according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies includes: determining constraint conditions according to the operator parameters and the processor parameters, and filtering the M scheduling strategy subsets according to the constraint conditions to obtain the K feasible scheduling subsets; and inputting the scheduling strategies in the K feasible scheduling subsets into an objective function to obtain the K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
  • Determining the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs includes: selecting, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
  • The operator parameters include at least one of the following: operator type or operator size.
  • The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency.
  • The scheduling strategy includes at least one of the following: segmentation type, loop order, or data flow.
  • The method further includes: obtaining sample operator parameters and sample processor parameters, where the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters; creating X scheduling strategies according to the sample operator parameters and the sample processor parameters, where the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and X and Y are both natural numbers; filtering the Y scheduling strategy subsets according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, where the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y; and training the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • This application uses as many as possible of the factors in the processor and operator parameters that are difficult to model directly to create multiple scheduling strategies, classifies these scheduling strategies into multiple scheduling strategy subsets, and then filters each subset to obtain the optimal scheduling strategy in each one, which reduces the number of scheduling strategy subsets to be searched. Finally, the cost model is trained with the actual processor operating cost as the target, which improves the accuracy of the predicted cost and the efficiency of online processing.
  • Training the cost model according to the sample operator parameters and the Z feasible scheduling strategies includes: inputting the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, where the Z operator operation costs correspond one-to-one to the Z feasible scheduling strategies; and, taking the Z operator operation costs as the target, training the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • The present application provides an operator operation scheduling device, including: a general scheduling module, configured to obtain operator parameters and processor parameters corresponding to an operator operation, and to create N scheduling strategies according to the operator parameters and the processor parameters, where the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and N and M are both natural numbers; a global feature module, configured to filter the M scheduling strategy subsets according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M; a cost model, configured to take the operator parameters and the K feasible scheduling strategies as input and produce K operator operation costs, where the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and an optimal scheduling selection module, configured to determine the optimal scheduling strategy for the operator operation according to a target demand and the K operator operation costs.
  • The global feature module is specifically configured to determine constraint conditions according to the operator parameters and the processor parameters, and to filter the M scheduling strategy subsets according to the constraint conditions to obtain the K feasible scheduling subsets; and to input the scheduling strategies in the K feasible scheduling subsets into the objective function to obtain the K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
  • The optimal scheduling selection module is specifically configured to select, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
  • The operator parameters include at least one of the following: operator type or operator size.
  • The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency.
  • The scheduling strategy includes at least one of the following: segmentation type, loop order, or data flow.
  • The device further includes a training module. The general scheduling module is further configured to obtain sample operator parameters and sample processor parameters, where the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters, and to create X scheduling strategies according to the sample operator parameters and the sample processor parameters, where the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and X and Y are both natural numbers. The global feature module is further configured to filter the Y scheduling strategy subsets according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, where the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y. The training module is configured to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • The training module is specifically configured to input the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, where the Z operator operation costs correspond one-to-one to the Z feasible scheduling strategies; and, taking the Z operator operation costs as the target, to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • This application provides a computer device, including: one or more processors; and a memory, used to store one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method in any one of the above-mentioned first aspects.
  • The present application provides a computer-readable storage medium that stores instructions. When the instructions are run on a computer, they are used to execute the method of any one of the above-mentioned first aspects.
  • The present application provides a computer program that, when executed by a computer, is used to execute the method in any one of the above-mentioned first aspects.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of an operator operation scheduling method of this application
  • FIG. 3 is a schematic diagram of an embodiment of a general scheduling policy subset of this application.
  • FIG. 4 is a schematic diagram of an embodiment of a feasible scheduling strategy of this application.
  • FIG. 5 is a flowchart of Embodiment 2 of the operator operation scheduling method of this application;
  • FIG. 6 is a schematic structural diagram of an embodiment of an operator operation scheduling apparatus 600 of this application.
  • FIG. 7 is a schematic structural diagram of an embodiment of a cost model training apparatus 700 of this application.
  • FIG. 8 is a schematic structural diagram of an embodiment of a computer device of this application.
  • FIG. 1 is a schematic diagram of an application scenario of an embodiment of the operator operation scheduling method of this application. The method of this embodiment is executed by an operator operation scheduling system, which includes an operator operation scheduling device and a cost model training device.
  • The cost model training device trains the cost model of each scheduling strategy according to the implementable operator parameter data set and the available processor parameter data set. This is an offline process: outside of operator operation scheduling, the cost model training device collects as many operator parameters and processor parameters as possible, runs them on an actual processor to obtain the actual cost of each scheduling strategy, and then trains the cost model with the correspondence between the operator parameters, the processor parameters, and the actual cost as the target.
  • The operator operation scheduling device takes the operator parameters and processor parameters corresponding to an operator operation as input and, using the cost model, obtains the optimal scheduling strategy for that operator operation. This is an online process: for each actual operator operation being scheduled, the operator parameters and processor parameters are combined with the trained cost model to determine the corresponding optimal scheduling strategy.
  • The operator operation scheduling method of this application is implemented as program code included in the software of the operator operation scheduling system and deployed on processor hardware.
  • An operator operation is an abstract description of the operation process to be performed, including the steps of the operation method and the data operated on in each step.
  • A scheduling strategy is the set of specific steps used to actually carry out the operator operation, including the direction of the data flow, the size of the data segmentation, and the order of operations.
  • A scheduling strategy subset is a space composed of scheduling strategies, and each point in the space represents a scheduling strategy.
  • Fig. 2 is a flowchart of Embodiment 1 of the operator operation scheduling method according to this application. As shown in Fig. 2, the method of this embodiment is executed by the operator operation scheduling system, and the method may include:
  • Step 201: Obtain the operator parameters and processor parameters corresponding to the operator operation.
  • The operator operation scheduling system first needs to obtain the relevant operator parameters and processor parameters. The operator parameters include at least one of the following: operator type, operator size, and so on. The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing capability, processing frequency, and so on. These parameters can be obtained from data supplied at the input, or determined internally according to the model or type of the processor.
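  • As an illustration of Step 201, the two parameter groups could be held in simple records. This is a minimal sketch, not the data layout of the application: the class names, field names, the `KNOWN_PROCESSORS` table, and the `"example-accelerator"` entry with its numbers are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class OperatorParams:
    op_type: str              # operator type, e.g. "matmul" (hypothetical label)
    size: Tuple[int, ...]     # operator size, e.g. (A_H, A_W, B_H, B_W)


@dataclass(frozen=True)
class ProcessorParams:
    architecture: str
    cache_capacity_bytes: Tuple[int, ...]   # one entry per cache level
    bandwidth_gb_s: float
    peak_gflops: float                      # computing capability
    frequency_ghz: float                    # processing frequency

    @property
    def cache_levels(self) -> int:
        return len(self.cache_capacity_bytes)


# Hypothetical table used to look parameters up from the processor model.
KNOWN_PROCESSORS: Dict[str, ProcessorParams] = {
    "example-accelerator": ProcessorParams(
        architecture="example-arch",
        cache_capacity_bytes=(64 * 1024, 1024 * 1024, 8 * 1024 * 1024),
        bandwidth_gb_s=100.0,
        peak_gflops=2000.0,
        frequency_ghz=1.0,
    ),
}


def get_processor_params(model: str, **overrides) -> ProcessorParams:
    """Derive parameters from the processor model; explicit input overrides them."""
    return replace(KNOWN_PROCESSORS[model], **overrides)


op = OperatorParams(op_type="matmul", size=(1024, 1024, 1024, 1024))
proc = get_processor_params("example-accelerator", frequency_ghz=1.2)
```

  • The `overrides` argument mirrors the two sources named in the text: values derived from the processor model, and values supplied directly as input.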
  • Step 202: Create N scheduling strategies according to the operator parameters and the processor parameters.
  • The N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both N and M are natural numbers. The operator operation scheduling system in this application uses factors in the processor and operator parameters that are difficult to model directly, such as operator type, data format, and processor architecture, to create N scheduling strategies, and then classifies these N scheduling strategies into M scheduling strategy subsets. In each scheduling strategy subset, general filter conditions relating operator parameters and processor parameters can be set, which isolates hardware parameters from software parameters. A scheduling strategy can therefore be output for any operator parameters and processor parameters, which ensures that the scheduling method applies to any operator parameters and any processor.
  • FIG. 3 is a schematic diagram of an embodiment of a scheduling strategy subset of this application. As shown in FIG. 3, scheduling strategy subset i and scheduling strategy subset j differ in the following factors: segmentation type, loop order, and data flow. In scheduling strategy subset i, matrix A and matrix B are split by rows and by columns, respectively, and the whole segmentation process passes through three levels of cache. In scheduling strategy subset j, matrix A and matrix B are each split by rows and columns at the same time; matrix B is looped over first by column and then by row, then matrix A is looped over first by row and then by column, and the segmentation process passes through two levels of cache.
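  • One way to picture Step 202 is to enumerate strategies as combinations of segmentation type, loop order, and data flow, and to group them by those structural factors. The sketch below is only an illustration under assumptions: the `Strategy` fields, the option labels, and the candidate tile sizes are hypothetical and are not taken from the application.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Strategy:
    split: str          # segmentation type: "row", "col" or "row+col"
    loop_order: str     # which matrix is looped over first
    cache_levels: int   # data flow: number of cache levels passed through
    tile: int           # tile edge length in elements (numeric knob)


def create_strategies(max_cache_levels: int,
                      tiles: Tuple[int, ...] = (32, 64, 128)) -> List[Strategy]:
    """Enumerate the N candidate strategies from the structural factors."""
    splits = ("row", "col", "row+col")
    orders = ("A_then_B", "B_then_A")
    levels = range(1, max_cache_levels + 1)
    return [Strategy(s, o, l, t)
            for s, o, l, t in product(splits, orders, levels, tiles)]


def classify(strategies: List[Strategy]) -> Dict[tuple, List[Strategy]]:
    """Group strategies into M subsets sharing the same structural factors."""
    subsets: Dict[tuple, List[Strategy]] = defaultdict(list)
    for st in strategies:
        subsets[(st.split, st.loop_order, st.cache_levels)].append(st)
    return dict(subsets)


strategies = create_strategies(max_cache_levels=3)   # N strategies
subsets = classify(strategies)                       # M subsets
```

  • Grouping by the structural factors leaves only numeric knobs (here the tile size) varying inside a subset, which is what makes a per-subset filter condition possible.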
  • Step 203: Filter the M scheduling strategy subsets according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies.
  • The K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M.
  • The operator operation scheduling system determines the constraint conditions according to the operator parameters and the processor parameters, and filters the M scheduling strategy subsets according to the constraint conditions to obtain K feasible scheduling subsets. Because the constraint conditions may filter out one or more of the M scheduling strategy subsets, or may retain all of them, K is a natural number less than or equal to M. The scheduling strategies in the K feasible scheduling subsets are then input into the objective function to obtain the K feasible scheduling strategies, which are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
  • That is, the operator operation scheduling system first filters the M scheduling strategy subsets according to the constraint conditions to remove infeasible scheduling strategy subsets and/or scheduling strategies, which reduces the number of scheduling strategies to be searched. It then inputs the scheduling strategies in the K feasible scheduling subsets into the objective function for further filtering, so that the output scheduling strategy is the optimal scheduling strategy in its feasible scheduling strategy subset. The data reuse rate of a scheduling strategy is used as the indicator that guides the dimensionality reduction of a feasible scheduling strategy subset: the higher the data reuse rate, the less data is moved repeatedly, the better the performance, and the lower the power consumption.
  • Illustratively, FIG. 4 is a schematic diagram of an embodiment of a feasible scheduling strategy subset of this application. The scheduling strategy subset is filtered according to the operator parameters and the processor parameters to obtain the feasible scheduling strategy subset; the features of the operator parameters, the processor parameters, and the scheduling strategies are extracted; and dimensionality reduction is then performed on the feasible scheduling strategy subset with the data reuse rate as the target to obtain the optimal scheduling strategy in that subset. In this example, the subset includes two scheduling strategies, i and j. Scheduling strategy i splits the matrices (A and B) in only one dimension (row or column); the split matrices exceed the cache size, so it is an infeasible scheduling strategy. Scheduling strategy j splits the matrices (A and B) in two dimensions (rows and columns) at the same time; the split matrices meet the cache size requirement, so it is the optimal scheduling strategy in the feasible scheduling strategy subset. The operator parameters include the height and width of matrix A (A_H, A_W) and the height and width of matrix B (B_H, B_W).
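  • The two-stage filtering of Step 203 can be sketched for a matrix multiplication as follows. The constraint (tiles must fit in a cache of a given capacity) and the objective (operations performed per element loaded, used as a stand-in for the data reuse rate) are assumptions chosen for illustration; the application does not give these formulas, and the tile sizes and the 1 MiB cache are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Tile(NamedTuple):
    tm: int   # rows of A handled per step
    tn: int   # columns of B handled per step


def working_set_bytes(t: Tile, k: int, elem_bytes: int = 4) -> int:
    """Bytes held at once: an A tile (tm x k), a B tile (k x tn), a C tile (tm x tn)."""
    return (t.tm * k + k * t.tn + t.tm * t.tn) * elem_bytes


def data_reuse(t: Tile) -> float:
    """Assumed objective: 2*tm*tn*k operations over (tm + tn)*k loaded elements."""
    return 2.0 * t.tm * t.tn / (t.tm + t.tn)


def best_per_subset(subsets: Dict[str, List[Tile]], k: int,
                    cache_bytes: int) -> Dict[str, Tile]:
    """Drop infeasible strategies, then keep the highest-reuse one per subset."""
    best: Dict[str, Tile] = {}
    for name, tiles in subsets.items():
        feasible = [t for t in tiles if working_set_bytes(t, k) <= cache_bytes]
        if feasible:                      # a subset with no survivor is removed
            best[name] = max(feasible, key=data_reuse)
    return best


# A_H = A_W = B_H = B_W = 1024, 4-byte elements, 1 MiB of cache (all hypothetical).
subsets = {
    "i_one_dim_split": [Tile(64, 1024), Tile(32, 1024)],   # B is not split
    "j_two_dim_split": [Tile(32, 32), Tile(64, 64), Tile(128, 128)],
}
result = best_per_subset(subsets, k=1024, cache_bytes=1 << 20)
```

  • With these numbers the one-dimensional subset is removed entirely, because an unsplit B tile alone exceeds the cache, so K is smaller than M, and the two-dimensional subset contributes its 64 x 64 tiling.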
  • Step 204: Input the operator parameters and the K feasible scheduling strategies into the cost model to obtain K operator operation costs.
  • The K operator operation costs correspond one-to-one to the K feasible scheduling strategies. The operator operation scheduling system passes each of the K feasible scheduling strategies through the cost model to obtain the corresponding operator operation cost. The operator operation cost includes, but is not limited to, the time consumed by the operator operation and the total operating power consumption. The cost model is a model for predicting the operating cost of a scheduling strategy. As described above, the cost model is trained by the operator operation scheduling system during offline processing.
  • Step 205: Determine the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs.
  • The operator operation scheduling system selects, according to the K operator operation costs, the feasible scheduling strategy that meets the target demand as the optimal scheduling strategy. The target demand includes the customer's optimization criteria, for example, required levels of performance or power consumption. The operator operation scheduling system can select one of the K feasible scheduling strategies according to the target demand as the optimal scheduling strategy for the operator operation, for example, the one with the lowest power consumption whose performance meets the customer's standard.
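  • Steps 204 and 205 together amount to scoring each feasible strategy and picking one under the target demand. The sketch below assumes a target demand of "lowest power among strategies that meet a time limit"; the `Cost` fields, the `choose` function, and the lookup table standing in for a trained cost model are hypothetical.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Optional


class Cost(NamedTuple):
    time_ms: float    # predicted time consumed by the operator operation
    power_w: float    # predicted operating power consumption


def choose(strategies: Iterable[str],
           cost_model: Callable[[str], Cost],
           max_time_ms: float) -> Optional[str]:
    """Return the lowest-power strategy whose predicted time meets the limit."""
    costs: Dict[str, Cost] = {s: cost_model(s) for s in strategies}  # K costs
    meets = {s: c for s, c in costs.items() if c.time_ms <= max_time_ms}
    if not meets:
        return None               # no feasible strategy satisfies the demand
    return min(meets, key=lambda s: meets[s].power_w)


# Stand-in for a trained cost model: a fixed table of predictions.
predicted = {
    "s1": Cost(time_ms=2.0, power_w=9.0),
    "s2": Cost(time_ms=3.5, power_w=5.0),
    "s3": Cost(time_ms=6.0, power_w=3.0),
}
best = choose(predicted, predicted.__getitem__, max_time_ms=4.0)
none_found = choose(predicted, predicted.__getitem__, max_time_ms=1.0)
```

  • Here `s3` has the lowest power but misses the time limit, so `s2` is selected; a different target demand would only change the filter and the key passed to `min`.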
  • Factors in the processor and operator parameters that are difficult to model directly are used to create multiple scheduling strategies, these scheduling strategies are classified into multiple scheduling strategy subsets, and each scheduling strategy subset is then filtered to obtain the optimal scheduling strategy in each one, which reduces the number of scheduling strategy subsets to be searched. Finally, the optimal scheduling strategy for the operator operation is obtained through the cost model. The operator operation scheduling method can therefore output a scheduling strategy for any operator parameters and processor parameters, which realizes general processing of processor architectures and operator types and improves the performance of operator operations.
  • Embodiment 1 above is the process in which the operator operation scheduling system determines the optimal scheduling strategy online, based on the operator parameters and processor parameters corresponding to the operator operation. The cost model used in this process is trained offline.
  • FIG. 5 is a flowchart of Embodiment 2 of the operator operation scheduling method of this application. As shown in FIG. 5, the training process of the cost model in this application may include:
  • Step 501: Obtain sample operator parameters and sample processor parameters.
  • The sample operator parameters are obtainable operator parameters, and the sample processor parameters are obtainable processor parameters. The operator operation scheduling system can collect as many obtainable operator parameters and obtainable processor parameters as possible. The operator parameters include at least one of the following: operator type, operator size, and so on. The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency. These parameters can be obtained from the input data, or according to the model or type of the processor.
  • Step 502: Create X scheduling strategies according to the sample operator parameters and the sample processor parameters.
  • The X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both X and Y are natural numbers.
  • The operator operation scheduling system uses factors in the above parameters that are difficult to model directly, such as operator type, data format, and processor architecture, to create X scheduling strategies, and then classifies these X scheduling strategies into Y scheduling strategy subsets. In each scheduling strategy subset, general filter conditions relating operator parameters and processor parameters are set, which isolates hardware parameters from software parameters. A scheduling strategy can therefore be output for arbitrary operator parameters and processor parameters, which keeps the operator operation scheduling general across processors and operator types.
  • Step 503: Filter the Y scheduling strategy subsets according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies.
  • The Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y.
  • The operator operation scheduling system determines the constraint conditions according to the sample operator parameters and the sample processor parameters, and filters the Y scheduling strategy subsets according to the constraint conditions to obtain Z feasible scheduling subsets. Because the constraint conditions may filter out one or more of the Y scheduling strategy subsets, or may retain all of them, Z is a natural number less than or equal to Y. The scheduling strategies in the Z feasible scheduling subsets are then input into the objective function to obtain the Z feasible scheduling strategies, which are respectively the optimal scheduling strategies in the Z feasible scheduling subsets.
  • That is, the operator operation scheduling system first filters the Y scheduling strategy subsets according to the constraint conditions to remove infeasible scheduling strategy subsets and/or scheduling strategies, which reduces the number of scheduling strategies to be searched. It then inputs the scheduling strategies in the Z feasible scheduling subsets into the objective function for further filtering, so that the output scheduling strategy is the optimal scheduling strategy in its feasible scheduling strategy subset. The data reuse rate of a scheduling strategy is used as the indicator that guides the dimensionality reduction of a feasible scheduling strategy subset: the higher the data reuse rate, the less data is moved repeatedly, the better the performance, and the lower the power consumption.
  • Step 504: Train a cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • The operator operation scheduling system inputs the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, which correspond one-to-one to the Z feasible scheduling strategies. Then, with the Z operator operation costs as the target, the cost model is trained according to the sample operator parameters and the Z feasible scheduling strategies. That is, the operator operation cost of each feasible scheduling strategy is obtained by running the operator parameters and the Z feasible scheduling strategies on an actual processor, and these costs are in turn taken as the target to train a model that can predict the running cost of a scheduling strategy.
  • In this application, the cost model is trained during cost model training with operator operation costs measured on a real processor, whereas during operator operation scheduling the operator operation cost of a feasible scheduling strategy is output directly by the cost model. This saves the time that would otherwise be needed to test the feasible scheduling strategies for the operator parameters and processor parameters corresponding to the operator operation, and enables real-time output of the optimal scheduling strategy.
  • In this application, in an offline state, as many as possible of the factors in the processor and operator parameters that are difficult to model directly are used to create multiple scheduling strategies. These scheduling strategies are classified into multiple scheduling strategy subsets, and each subset is then filtered to obtain the optimal scheduling strategy in that subset, which reduces the number of scheduling strategy subsets to be searched. Finally, the cost model obtained by training with the actual processor running cost as the target improves the accuracy of the obtained cost and the efficiency of online processing.
  • FIG. 6 is a schematic structural diagram of an embodiment of an operator operation scheduling apparatus 600 of this application.
  • As shown in FIG. 6, the operator operation scheduling apparatus 600 includes a general scheduling module 601, a global feature module 602, a cost model 603, and an optimal scheduling selection module 604. The general scheduling module 601 is configured to obtain operator parameters and processor parameters corresponding to the operator operation, and to create N scheduling strategies according to the operator parameters and the processor parameters. The N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both N and M are natural numbers.
  • The global feature module 602 is configured to filter the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies. The K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M.
  • The cost model 603 is configured to take the operator parameters and the K feasible scheduling strategies as input and obtain K operator operation costs, which correspond one-to-one to the K feasible scheduling strategies.
  • The optimal scheduling selection module 604 is configured to determine the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs.
  • In a possible implementation, the global feature module 602 is specifically configured to determine constraint conditions according to the operator parameters and the processor parameters, and to filter the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets; and to input the scheduling strategies in the K feasible scheduling subsets respectively into the objective function to obtain the K feasible scheduling strategies, which are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
  • In a possible implementation, the optimal scheduling selection module 604 is specifically configured to select, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
  • In a possible implementation, the operator parameters include at least one of the following: operator type or operator size. The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency. The scheduling strategy includes at least one of the following: segmentation type, loop order, or data flow.
  • FIG. 7 is a schematic structural diagram of an embodiment of a cost model training apparatus 700 of this application.
  • As shown in FIG. 7, the cost model training apparatus 700 includes a general scheduling module 701, a global feature module 702, and a training module 703.
  • The general scheduling module 701 is configured to obtain sample operator parameters and sample processor parameters, where the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters; and to create X scheduling strategies according to the sample operator parameters and the sample processor parameters. The X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both X and Y are natural numbers.
  • The global feature module 702 is configured to filter the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies. The Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y.
  • The training module 703 is configured to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • In a possible implementation, the training module 703 is specifically configured to input the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, which correspond one-to-one to the Z feasible scheduling strategies; and, with the Z operator operation costs as the target, to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  • It should be noted that the general scheduling module 601 and the global feature module 602 in FIG. 6 may be the same functional modules as the general scheduling module 701 and the global feature module 702 in FIG. 7, and the model trained by the training module 703 in FIG. 7 may be provided to the cost model 603 in FIG. 6 for calculating costs.
  • FIG. 8 is a schematic structural diagram of an embodiment of a computer device of this application.
  • As shown in FIG. 8, the computer device includes a processor 801, a memory 802, an input device 803, and an output device 804. The computer device may have one or more processors 801; one processor 801 is taken as an example in FIG. 8. The processor 801, the memory 802, the input device 803, and the output device 804 in the computer device may be connected by a bus or in another manner; connection by a bus is taken as an example in FIG. 8.
  • As a computer-readable storage medium, the memory 802 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method in the embodiment shown in FIG. 2 or FIG. 5 of this application.
  • The processor 801 runs the software programs, instructions, and modules stored in the memory 802 to execute the various functional applications and data processing of the computer device, that is, to implement the aforementioned operator operation scheduling method.
  • The memory 802 may mainly include a program storage area and a data storage area.
  • The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the terminal, and the like.
  • In addition, the memory 802 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • In some examples, the memory 802 may further include memories located remotely from the processor 801, and these remote memories may be connected to the computer device through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • The input device 803 can be used to receive input digital or character information, and to generate key signal input related to user settings and function control of the computer device.
  • The output device 804 may include a display device such as a display screen.
  • In a possible implementation, this application provides a computer-readable storage medium that stores instructions; when the instructions are run on a computer, they are used to execute the method in the embodiment shown in FIG. 2 or FIG. 5.
  • In a possible implementation, this application provides a computer program; when the computer program is executed by a computer, it is used to execute the method in the embodiment shown in FIG. 2 or FIG. 5.
  • A person of ordinary skill in the art can understand that all or some of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware.
  • The aforementioned program can be stored in a computer-readable storage medium.
  • When the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes ROM, RAM, a magnetic disk, an optical disk, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computational Linguistics (AREA)
  • Robotics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Stored Programmes (AREA)

Abstract

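An operator operation scheduling method and apparatus. The method includes: obtaining operator parameters and processor parameters corresponding to an operator operation (201); creating N scheduling strategies according to the operator parameters and the processor parameters (202), where the N scheduling strategies are classified into M scheduling strategy subsets and each scheduling strategy subset includes at least one scheduling strategy; filtering the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies (203), where the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets; inputting the operator parameters and the K feasible scheduling strategies into a cost model to obtain K operator operation costs (204), where the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and determining, according to a target demand and the K operator operation costs, an optimal scheduling strategy for the operator operation (205). The method achieves general handling of processor architectures and operator types and improves operator operation performance.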

Description

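Operator operation scheduling method and apparatus
This application claims priority to Chinese Patent Application No. 201910282106.8, filed with the China Patent Office on April 9, 2019 and entitled "Operator operation scheduling method and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to artificial intelligence technologies, and in particular, to an operator operation scheduling method and apparatus.
Background
With the rapid development of artificial intelligence (AI), more and more practical applications have begun to use deep learning, for example, speech recognition, machine translation, and autonomous driving. Deep learning has great potential in real-life applications, which has drawn increasing attention to it. The computational efficiency of the deep neural network (DNN) used in deep learning directly affects how well practical applications work. For example, the computing time of algorithms such as object detection, object recognition, and motion prediction in autonomous driving determines the usability and safety of those algorithms. Therefore, performing AI computation with high performance is an urgent need for bringing industrial products to market.
AI computation typically uses two types of chips: general-purpose computing chips and dedicated chips. General-purpose computing chips include the central processing unit (CPU) and the graphics processing unit (GPU). As for dedicated chips, many companies have begun investing heavily in AI chips. Faced with multiple chip architectures, using a general algorithm to improve the computational efficiency of heterogeneous chips therefore poses great difficulties and challenges.
At present, some solutions have emerged in the industry. One is an operator operation scheduling method that uses an artificial neural network (ANN) to fit the impact of nested-loop tiling on performance and uses the fit to guide nested-loop tiling. The method takes the nested-loop tile sizes as parameters, samples the space of all possible nested-loop tile sizes to build an ANN model relating tile size to performance, then uses the ANN model in place of a real machine to traverse the space of all possible tile sizes and select candidate tile sizes, and finally selects the optimal tile size by running on an actual machine.
However, the above method targets only one kind of operator operation and is not very general. It also needs the ANN to traverse the entire space of possible operator operations, which is very time-consuming, and it needs to run scheduling strategies on an actual machine multiple times, so it cannot output a scheduling strategy in real time at the millisecond level.
Summary
This application provides an operator operation scheduling method and apparatus, to achieve general handling of processor architectures and operator types and to improve operator operation performance.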
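According to a first aspect, this application provides an operator operation scheduling method, including: obtaining operator parameters and processor parameters corresponding to an operator operation; creating N scheduling strategies according to the operator parameters and the processor parameters, where the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both N and M are natural numbers; filtering the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M; inputting the operator parameters and the K feasible scheduling strategies into a cost model to obtain K operator operation costs, where the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and determining, according to a target demand and the K operator operation costs, an optimal scheduling strategy for the operator operation.
In this application, the factors in the processor and operator parameters that are difficult to model directly are used to create multiple scheduling strategies, these scheduling strategies are classified into multiple scheduling strategy subsets, and each subset is then filtered to obtain the optimal scheduling strategy in that subset, which reduces the number of scheduling strategy subsets to be searched. Finally, the optimal scheduling strategy for the operator operation is obtained through the cost model. As a result, the operator operation scheduling method can output scheduling strategies for arbitrary operator parameters and processor parameters, which achieves general handling of processor architectures and operator types and improves operator operation performance.
In a possible implementation, the filtering the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies includes: determining constraint conditions according to the operator parameters and the processor parameters, and filtering the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets; and inputting the scheduling strategies in the K feasible scheduling subsets respectively into an objective function to obtain the K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
In a possible implementation, the determining the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs includes: selecting, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
In a possible implementation, the operator parameters include at least one of the following: operator type or operator size; the processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency; and the scheduling strategy includes at least one of the following: segmentation type, loop order, or data flow.
In a possible implementation, before the obtaining operator parameters and processor parameters corresponding to an operator operation, the method further includes: obtaining sample operator parameters and sample processor parameters, where the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters; creating X scheduling strategies according to the sample operator parameters and the sample processor parameters, where the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both X and Y are natural numbers; filtering the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, where the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y; and training the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
In this application, in an offline state, as many as possible of the factors in the processor and operator parameters that are difficult to model directly are used to create multiple scheduling strategies, these scheduling strategies are classified into multiple scheduling strategy subsets, and each subset is then filtered to obtain the optimal scheduling strategy in that subset, which reduces the number of scheduling strategy subsets to be searched. Finally, the cost model obtained by training with the actual processor running cost as the target improves the accuracy of the obtained cost and the efficiency of online processing.
In a possible implementation, the training the cost model according to the sample operator parameters and the Z feasible scheduling strategies includes: inputting the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, where the Z operator operation costs correspond one-to-one to the Z feasible scheduling strategies; and, with the Z operator operation costs as the target, training the cost model according to the sample operator parameters and the Z feasible scheduling strategies.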
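According to a second aspect, this application provides an operator operation scheduling apparatus, including: a general scheduling module, configured to obtain operator parameters and processor parameters corresponding to an operator operation, and to create N scheduling strategies according to the operator parameters and the processor parameters, where the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both N and M are natural numbers; a global feature module, configured to filter the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M; a cost model, configured to take the operator parameters and the K feasible scheduling strategies as input and obtain K operator operation costs, where the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and an optimal scheduling selection module, configured to determine the optimal scheduling strategy for the operator operation according to a target demand and the K operator operation costs.
In a possible implementation, the global feature module is specifically configured to determine constraint conditions according to the operator parameters and the processor parameters, and to filter the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets; and to input the scheduling strategies in the K feasible scheduling subsets respectively into an objective function to obtain the K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
In a possible implementation, the optimal scheduling selection module is specifically configured to select, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
In a possible implementation, the operator parameters include at least one of the following: operator type or operator size; the processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency; and the scheduling strategy includes at least one of the following: segmentation type, loop order, or data flow.
In a possible implementation, the apparatus further includes a training module. The general scheduling module is further configured to obtain sample operator parameters and sample processor parameters, where the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters; and to create X scheduling strategies according to the sample operator parameters and the sample processor parameters, where the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both X and Y are natural numbers. The global feature module is further configured to filter the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, where the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y. The training module is configured to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
In a possible implementation, the training module is specifically configured to input the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, where the Z operator operation costs correspond one-to-one to the Z feasible scheduling strategies; and, with the Z operator operation costs as the target, to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.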
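According to a third aspect, this application provides a computer device, including:
one or more processors; and
a memory, configured to store one or more programs;
where, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any implementation of the first aspect.
According to a fourth aspect, this application provides a computer-readable storage medium that stores instructions; when the instructions are run on a computer, they are used to execute the method according to any implementation of the first aspect.
According to a fifth aspect, this application provides a computer program; when the computer program is executed by a computer, it is used to execute the method according to any implementation of the first aspect.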
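Brief Description of the Drawings
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the operator operation scheduling method of this application;
FIG. 2 is a flowchart of Embodiment 1 of the operator operation scheduling method of this application;
FIG. 3 is a schematic diagram of an embodiment of general scheduling strategy subsets of this application;
FIG. 4 is a schematic diagram of an embodiment of feasible scheduling strategies of this application;
FIG. 5 is a flowchart of Embodiment 2 of the operator operation scheduling method of this application;
FIG. 6 is a schematic structural diagram of an embodiment of an operator operation scheduling apparatus 600 of this application;
FIG. 7 is a schematic structural diagram of an embodiment of a cost model training apparatus 700 of this application;
FIG. 8 is a schematic structural diagram of an embodiment of a computer device of this application.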
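Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the technical solutions in this application are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this application without creative effort fall within the protection scope of this application.
FIG. 1 is a schematic diagram of an application scenario of an embodiment of the operator operation scheduling method of this application. As shown in FIG. 1, the method of this embodiment is executed by an operator operation scheduling system, which includes an operator operation scheduling apparatus and a cost model training apparatus. The cost model training apparatus trains, from a data set of realizable operator parameters and a data set of usable processor parameters, a cost model for obtaining the cost of each scheduling strategy. This is an offline process: when no operator operation is being scheduled, the cost model training apparatus collects as many operator parameters and processor parameters as possible, tests them on actual processors to obtain the actual costs of the scheduling strategies, and then trains the cost model with the correspondence between the operator and processor parameters and the actual costs as the target. The operator operation scheduling apparatus takes the operator parameters and processor parameters corresponding to an operator operation as input and obtains the optimal scheduling strategy for the operator operation through the cost model. This is an online process: for the operator parameters and processor parameters in each actual operator operation scheduling, the corresponding optimal scheduling strategy is determined with the trained cost model. The operator operation scheduling method of this application is implemented as program code that is included in the software of the operator operation scheduling system and deployed on processor hardware.
It should be noted that an operator operation is an abstract description of a computation to be performed, including the steps of the computation method and the data operated on in each step. A scheduling strategy is the set of concrete steps by which an operator operation is actually implemented, including the direction of the data flow, the data split sizes, and the order of operations. A scheduling strategy subset is a space made up of scheduling strategies, in which each point represents one scheduling strategy.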
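FIG. 2 is a flowchart of Embodiment 1 of the operator operation scheduling method of this application. As shown in FIG. 2, the method of this embodiment is executed by the operator operation scheduling system and may include the following steps.
Step 201: Obtain operator parameters and processor parameters corresponding to an operator operation.
To determine the optimal scheduling strategy for an operator operation, the operator operation scheduling system first needs to obtain the relevant operator parameters and processor parameters. The operator parameters include at least one of the following: operator type, operator size, and the like. The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, processing frequency, and the like. These parameters may be obtained from data supplied at the input, or obtained internally according to the model or type of the processor.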
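The two parameter groups of step 201 can be pictured as plain records. The following is a minimal sketch only: the field names, the `KNOWN_PROCESSORS` table, the `demo-npu` entry and all of its values are hypothetical placeholders, not taken from the application. It shows the two acquisition paths described above, explicit input versus lookup by processor model.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class OperatorParams:
    op_type: str              # operator type, e.g. "matmul"
    shape: Tuple[int, ...]    # operator size


@dataclass(frozen=True)
class ProcessorParams:
    architecture: str
    cache_levels: int
    cache_bytes: Tuple[int, ...]  # capacity of each cache level
    bandwidth_gbps: float
    peak_gflops: float            # computing power
    frequency_mhz: int            # processing frequency


# Hypothetical lookup table keyed by processor model (made-up values).
KNOWN_PROCESSORS = {
    "demo-npu": ProcessorParams("npu", 2, (64 * 1024, 1024 * 1024),
                                100.0, 512.0, 1000),
}


def get_processor_params(model: str,
                         supplied: Optional[ProcessorParams] = None) -> ProcessorParams:
    """Prefer parameters supplied at the input; otherwise look up by model."""
    if supplied is not None:
        return supplied
    return KNOWN_PROCESSORS[model]


op = OperatorParams("matmul", (1024, 1024, 1024))
proc = get_processor_params("demo-npu")
```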
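Step 202: Create N scheduling strategies according to the operator parameters and the processor parameters.
The N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both N and M are natural numbers. In this application, the operator operation scheduling system uses the factors in the processor and operator parameters that are difficult to model directly, such as operator type, data format, and processor architecture, to create N scheduling strategies, and then classifies these N scheduling strategies into M scheduling strategy subsets. In each scheduling strategy subset, general filter conditions that relate the operator parameters to the processor parameters are set. This isolates hardware parameters from software parameters, so that scheduling strategies can be output for arbitrary operator parameters and processor parameters, which ensures that the scheduling method is applicable to any operator parameters and any processor. Because the operator operation scheduling system does not model the processor architecture and operator type directly but instead constructs M scheduling strategy subsets, it avoids the influence of modeling accuracy on the scheduling strategy, the time consumed by modeling, and the non-transferability of such models, and achieves generality across processor architectures and operator types. For example, FIG. 3 is a schematic diagram of an embodiment of scheduling strategy subsets of this application. As shown in FIG. 3, scheduling strategy subset i and scheduling strategy subset j differ in the following factors: segmentation type, loop order, and data flow. In scheduling strategy subset i, matrix A and matrix B are split by row and by column respectively; during computation, the loop first iterates over the splits of matrix A and then over the splits of matrix B, and the whole splitting process passes through three cache levels. In scheduling strategy subset j, matrix A and matrix B are each split by both row and column; during computation, the loop first iterates over the splits of matrix B, column first and then row, and then over the splits of matrix A, row first and then column, and the whole splitting process passes through two cache levels.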
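One way to picture step 202 is to enumerate strategies over the discrete factors and group them by those factors. The sketch below is illustrative only: the `Strategy` fields, the candidate values, and the choice to key each subset on (split type, loop order, cache levels) are assumptions made for the example, not the application's actual enumeration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Strategy:
    split_type: str    # segmentation type: "1d" (row or column) or "2d" (row and column)
    loop_order: str    # which matrix the outer loop iterates over first
    cache_levels: int  # data flow: number of cache levels traversed
    tile: int          # tile edge length; the tunable point inside a subset


def create_strategies(tile_sizes):
    """Enumerate N strategies as the product of all candidate factor values."""
    return [
        Strategy(split, order, levels, tile)
        for split, order, levels, tile in product(
            ("1d", "2d"), ("A_then_B", "B_then_A"), (2, 3), tile_sizes
        )
    ]


def classify(strategies):
    """Group strategies into M subsets that share the hard-to-model factors."""
    subsets = defaultdict(list)
    for s in strategies:
        subsets[(s.split_type, s.loop_order, s.cache_levels)].append(s)
    return dict(subsets)


strategies = create_strategies([16, 32, 64])  # N = 2 * 2 * 2 * 3 = 24
subsets = classify(strategies)                # M = 8 subsets of 3 strategies each
```

Within a subset only the tile size varies, which is what makes a per-subset search over a simple objective practical in the next step.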
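Step 203: Filter the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies.
The K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M. The operator operation scheduling system determines constraint conditions according to the operator parameters and the processor parameters, and filters the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets. Because the constraint conditions may filter out one or more of the M scheduling strategy subsets, or may retain all of them, K is a natural number less than or equal to M. The scheduling strategies in the K feasible scheduling subsets are then respectively input into an objective function to obtain the K feasible scheduling strategies, which are respectively the optimal scheduling strategies in the K feasible scheduling subsets. In this application, the operator operation scheduling system first filters the M scheduling strategy subsets according to the constraint conditions to remove infeasible scheduling strategy subsets and/or scheduling strategies, which reduces the number of scheduling strategies to be searched. It then inputs the scheduling strategies in the K feasible scheduling subsets into the objective function for further filtering, so that each output scheduling strategy is the optimal scheduling strategy in its feasible scheduling strategy subset. The data reuse rate of a scheduling strategy thus serves as the indicator that guides the dimensionality reduction of the feasible scheduling strategy subsets: the higher the data reuse rate, the less repeated data movement, the better the performance, and the lower the power consumption. For example, FIG. 4 is a schematic diagram of an embodiment of a feasible scheduling strategy subset of this application. As shown in FIG. 4, a scheduling strategy subset is filtered according to the operator parameters and the processor parameters to obtain a feasible scheduling strategy subset, the features of the operator parameters, processor parameters, and scheduling strategies are extracted, and dimensionality reduction is then performed on the feasible scheduling strategy subset with the data reuse rate as the objective to obtain the optimal scheduling strategy in that subset. The feasible scheduling strategy subset includes two scheduling strategies (i and j). Scheduling strategy i splits the matrices (A and B) in only one dimension (row or column), and the resulting blocks exceed the cache size, so it is an infeasible scheduling strategy. Scheduling strategy j splits the matrices (A and B) in two dimensions (row and column), and the resulting blocks meet the cache size requirement, so it is the optimal scheduling strategy in the feasible scheduling strategy subset. The parameters of this scheduling strategy include the height and width of matrix A (A_H, A_W) and the height and width of matrix B (B_H, B_W).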
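The two-stage filtering of step 203 can be sketched for a tiled matrix multiplication. Everything concrete here is an assumption made for illustration: the constraint (the A, B and output tiles must fit in one cache level), the data reuse formula (multiply-accumulates per loaded element), the 32 KiB cache size, and the subset names. The application does not spell out its constraint conditions or objective function at this level of detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tiling:
    a_h: int  # height of the A tile
    a_w: int  # width of the A tile (equals the height of the B tile)
    b_w: int  # width of the B tile


def footprint_bytes(t, elem_bytes=4):
    """Bytes needed to hold the A tile, the B tile and the output tile."""
    return elem_bytes * (t.a_h * t.a_w + t.a_w * t.b_w + t.a_h * t.b_w)


def data_reuse(t):
    """Multiply-accumulates performed per element loaded (higher is better)."""
    return (t.a_h * t.a_w * t.b_w) / (t.a_h * t.a_w + t.a_w * t.b_w)


def filter_subsets(subsets, cache_bytes):
    """Return the best feasible tiling of each subset; drop empty subsets."""
    best = {}
    for name, tilings in subsets.items():
        # Stage 1: constraint filtering against the cache capacity.
        feasible = [t for t in tilings if footprint_bytes(t) <= cache_bytes]
        # Stage 2: objective function, keep the highest data reuse.
        if feasible:
            best[name] = max(feasible, key=data_reuse)
    return best


subsets = {
    "one_dim_split": [Tiling(64, 1024, 64)],   # full-width strips
    "two_dim_split": [Tiling(16, 16, 16), Tiling(32, 32, 32), Tiling(64, 64, 64)],
}
best = filter_subsets(subsets, cache_bytes=32 * 1024)
```

With these numbers the one-dimensional subset is removed entirely because its only tiling exceeds the cache, so K = 1 while M = 2, and the 32 x 32 tiling wins in the remaining subset because the 64 x 64 tiling does not fit.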
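Step 204: Input the operator parameters and the K feasible scheduling strategies into the cost model to obtain K operator operation costs.
The K operator operation costs correspond one-to-one to the K feasible scheduling strategies. The operator operation scheduling system passes each of the K feasible scheduling strategies through the cost model to obtain the corresponding operator operation cost, which includes but is not limited to the time consumed by the operator operation and the total running power consumption. The cost model is a model that predicts the running cost of a scheduling strategy. As described above, the cost model is obtained by the operator operation scheduling system through training in the offline process.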
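Step 205: Determine the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs.
The operator operation scheduling system selects, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy. The target demand includes the customer's criteria for optimality, for example, standards to be met for performance or power consumption. On the premise that the operator operation cost is as low as possible, the operator operation scheduling system may refer to the target demand and select one of the K feasible scheduling strategies as the optimal scheduling strategy for the operator operation, for example, the one with the lowest power consumption or the one whose performance meets the customer's standard.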
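Steps 204 and 205 together amount to scoring K candidates and picking one against the target demand. The sketch below assumes each predicted cost is a pair of latency and power figures and that the target demand is either "lowest latency" or "lowest power within a latency budget". Both the cost layout and the target names are hypothetical, and the numbers are placeholders standing in for cost model output.

```python
def select_optimal(predicted, target="latency", latency_budget_ms=None):
    """Pick one strategy from the K candidates.

    predicted maps a strategy name to {"latency_ms": float, "power_w": float}.
    """
    if not predicted:
        raise ValueError("no feasible scheduling strategy")
    if target == "latency":
        return min(predicted, key=lambda s: predicted[s]["latency_ms"])
    if target == "power":
        # Keep only candidates whose performance meets the stated standard.
        candidates = [
            s for s in predicted
            if latency_budget_ms is None
            or predicted[s]["latency_ms"] <= latency_budget_ms
        ]
        if not candidates:
            raise ValueError("no strategy meets the latency budget")
        return min(candidates, key=lambda s: predicted[s]["power_w"])
    raise ValueError(f"unknown target: {target}")


# Placeholder predictions for K = 3 feasible strategies.
costs = {
    "subset_i_best": {"latency_ms": 1.8, "power_w": 3.0},
    "subset_j_best": {"latency_ms": 1.2, "power_w": 4.5},
    "subset_k_best": {"latency_ms": 2.6, "power_w": 2.1},
}
fastest = select_optimal(costs)
frugal = select_optimal(costs, target="power", latency_budget_ms=2.0)
```

Because only K candidates remain at this point, the selection is a single pass over a short list, which is consistent with the real-time output the description aims for.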
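In this application, the factors in the processor and operator parameters that are difficult to model directly are used to create multiple scheduling strategies, these scheduling strategies are classified into multiple scheduling strategy subsets, and each subset is then filtered to obtain the optimal scheduling strategy in that subset, which reduces the number of scheduling strategy subsets to be searched. Finally, the optimal scheduling strategy for the operator operation is obtained through the cost model. As a result, the operator operation scheduling method can output scheduling strategies for arbitrary operator parameters and processor parameters, which achieves general handling of processor architectures and operator types and improves operator operation performance.
Embodiment 1 above is the process in which the operator operation scheduling system determines the optimal scheduling strategy online according to the operator parameters and processor parameters corresponding to the operator operation. The cost model involved in that process is obtained through training in an offline process. FIG. 5 is a flowchart of Embodiment 2 of the operator operation scheduling method of this application. As shown in FIG. 5, the training process of the cost model in this application may include the following steps.
Step 501: Obtain sample operator parameters and sample processor parameters.
The sample operator parameters are obtainable operator parameters, and the sample processor parameters are obtainable processor parameters. To train a cost model applicable to arbitrary operator parameters and processor parameters, the operator operation scheduling system may collect as many obtainable operator parameters and obtainable processor parameters as possible. The operator parameters include at least one of the following: operator type, operator size, and the like. The processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, processing frequency, and the like. These parameters may be obtained from data supplied at the input, or according to the model or type of the processor.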
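Step 502: Create X scheduling strategies according to the sample operator parameters and the sample processor parameters.
The X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both X and Y are natural numbers. The operator operation scheduling system uses the factors among the above parameters that are difficult to model directly, such as operator type, data format, and processor architecture, to create X scheduling strategies, and then classifies these X scheduling strategies into Y scheduling strategy subsets. In each scheduling strategy subset, general filter conditions that relate the operator parameters to the processor parameters are set. This isolates hardware parameters from software parameters, so that scheduling strategies can be output for arbitrary operator parameters and processor parameters, which ensures that the operator operation is general across processors and operator types.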
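Step 503: Filter the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies.
The Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y. The operator operation scheduling system determines constraint conditions according to the operator parameters and the processor parameters, and filters the Y scheduling strategy subsets respectively according to the constraint conditions to obtain the Z feasible scheduling subsets. Because the constraint conditions may filter out one or more of the Y scheduling strategy subsets, or may retain all of them, Z is a natural number less than or equal to Y. The scheduling strategies in the Z feasible scheduling subsets are then respectively input into the objective function to obtain the Z feasible scheduling strategies, which are respectively the optimal scheduling strategies in the Z feasible scheduling subsets. In this application, the operator operation scheduling system first filters the Y scheduling strategy subsets according to the constraint conditions to remove infeasible scheduling strategy subsets and/or scheduling strategies, which reduces the number of scheduling strategies to be searched. It then inputs the scheduling strategies in the Z feasible scheduling subsets into the objective function for further filtering, so that each output scheduling strategy is the optimal scheduling strategy in its feasible scheduling strategy subset. The data reuse rate of a scheduling strategy thus serves as the indicator that guides the dimensionality reduction of the feasible scheduling strategy subsets: the higher the data reuse rate, the less repeated data movement, the better the performance, and the lower the power consumption.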
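Step 504: Train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
The operator operation scheduling system inputs the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, which correspond one-to-one to the Z feasible scheduling strategies. Then, with the Z operator operation costs as the target, the cost model is trained according to the sample operator parameters and the Z feasible scheduling strategies. That is, the operator operation cost of each feasible scheduling strategy is obtained by running the operator parameters and the Z feasible scheduling strategies on an actual processor, and these costs are in turn taken as the target to train a model that can predict the running cost of a scheduling strategy. In this application, the cost model is trained during cost model training with operator operation costs measured on a real processor, whereas during operator operation scheduling the operator operation cost of a feasible scheduling strategy is output directly by the cost model. This saves the time that would otherwise be needed to test the feasible scheduling strategies for the operator parameters and processor parameters corresponding to the operator operation, and enables real-time output of the optimal scheduling strategy.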
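The training idea of step 504, measured costs as the regression target, can be shown with the simplest possible model. The application does not state what form the cost model takes, so the single-feature linear least-squares fit below is a stand-in chosen for brevity, not the application's model. The feature (an estimate of data moved per strategy) and the "measured" numbers are synthetic placeholders for values that would come from running each strategy on a real processor.

```python
def fit_linear_cost_model(features, measured_costs):
    """Fit cost = intercept + slope * feature by ordinary least squares."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(measured_costs) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(features, measured_costs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def predict(feature):
        # Online use: no processor run is needed, only this evaluation.
        return intercept + slope * feature

    return predict


# One feature value per feasible strategy (synthetic): data moved, in MB.
data_moved_mb = [1.0, 2.0, 3.0, 4.0]
# Synthetic stand-ins for costs measured on the processor, in ms.
measured_ms = [3.0, 5.0, 7.0, 9.0]

cost_model = fit_linear_cost_model(data_moved_mb, measured_ms)
predicted_ms = cost_model(5.0)
```

The split between the expensive offline measurements and the cheap online `predict` call is the point of the example; a practical model would use many more features drawn from the operator parameters and the strategy itself.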
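In this application, in an offline state, as many as possible of the factors in the processor and operator parameters that are difficult to model directly are used to create multiple scheduling strategies, these scheduling strategies are classified into multiple scheduling strategy subsets, and each subset is then filtered to obtain the optimal scheduling strategy in that subset, which reduces the number of scheduling strategy subsets to be searched. Finally, the cost model obtained by training with the actual processor running cost as the target improves the accuracy of the obtained cost and the efficiency of online processing.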
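On the basis of the schematic diagram shown in FIG. 1, FIG. 6 is a schematic structural diagram of an embodiment of an operator operation scheduling apparatus 600 of this application. As shown in FIG. 6, the operator operation scheduling apparatus 600 includes a general scheduling module 601, a global feature module 602, a cost model 603, and an optimal scheduling selection module 604. The general scheduling module 601 is configured to obtain operator parameters and processor parameters corresponding to an operator operation, and to create N scheduling strategies according to the operator parameters and the processor parameters, where the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both N and M are natural numbers. The global feature module 602 is configured to filter the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, where the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M. The cost model 603 is configured to take the operator parameters and the K feasible scheduling strategies as input and obtain K operator operation costs, which correspond one-to-one to the K feasible scheduling strategies. The optimal scheduling selection module 604 is configured to determine the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs.
In a possible implementation, the global feature module 602 is specifically configured to determine constraint conditions according to the operator parameters and the processor parameters, and to filter the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets; and to input the scheduling strategies in the K feasible scheduling subsets respectively into the objective function to obtain the K feasible scheduling strategies, which are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
In a possible implementation, the optimal scheduling selection module 604 is specifically configured to select, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
In a possible implementation, the operator parameters include at least one of the following: operator type or operator size; the processor parameters include at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency; and the scheduling strategy includes at least one of the following: segmentation type, loop order, or data flow.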
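On the basis of the schematic diagram shown in FIG. 1, FIG. 7 is a schematic structural diagram of an embodiment of a cost model training apparatus 700 of this application. As shown in FIG. 7, the cost model training apparatus 700 includes a general scheduling module 701, a global feature module 702, and a training module 703. The general scheduling module 701 is configured to obtain sample operator parameters and sample processor parameters, where the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters; and to create X scheduling strategies according to the sample operator parameters and the sample processor parameters, where the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset includes at least one scheduling strategy, and both X and Y are natural numbers. The global feature module 702 is configured to filter the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, where the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y. The training module 703 is configured to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
In a possible implementation, the training module 703 is specifically configured to input the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, which correspond one-to-one to the Z feasible scheduling strategies; and, with the Z operator operation costs as the target, to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
It should be noted that the general scheduling module 601 and the global feature module 602 in FIG. 6 may be the same functional modules as the general scheduling module 701 and the global feature module 702 in FIG. 7, and the model trained by the training module 703 in FIG. 7 may be provided to the cost model 603 in FIG. 6 for calculating costs.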
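FIG. 8 is a schematic structural diagram of an embodiment of a computer device of this application. As shown in FIG. 8, the computer device includes a processor 801, a memory 802, an input device 803, and an output device 804. The computer device may have one or more processors 801; one processor 801 is taken as an example in FIG. 8. The processor 801, the memory 802, the input device 803, and the output device 804 in the computer device may be connected by a bus or in another manner; connection by a bus is taken as an example in FIG. 8.
As a computer-readable storage medium, the memory 802 can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the method in the embodiment shown in FIG. 2 or FIG. 5 of this application. The processor 801 runs the software programs, instructions, and modules stored in the memory 802 to execute the various functional applications and data processing of the computer device, that is, to implement the aforementioned operator operation scheduling method.
The memory 802 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 802 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 802 may further include memories located remotely from the processor 801, and these remote memories may be connected to the computer device through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 803 can be used to receive input digital or character information, and to generate key signal input related to user settings and function control of the computer device. The output device 804 may include a display device such as a display screen.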
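In a possible implementation, this application provides a computer-readable storage medium that stores instructions; when the instructions are run on a computer, they are used to execute the method in the embodiment shown in FIG. 2 or FIG. 5.
In a possible implementation, this application provides a computer program; when the computer program is executed by a computer, it is used to execute the method in the embodiment shown in FIG. 2 or FIG. 5.
A person of ordinary skill in the art can understand that all or some of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware. The aforementioned program can be stored in a computer-readable storage medium. When the program is executed, the steps of the foregoing method embodiments are performed. The foregoing storage medium includes ROM, RAM, a magnetic disk, an optical disk, and other media that can store program code.
Finally, it should be noted that the foregoing embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some or all of their technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.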

Claims (15)

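  1. An operator operation scheduling method, characterized by comprising:
    obtaining operator parameters and processor parameters corresponding to an operator operation;
    creating N scheduling strategies according to the operator parameters and the processor parameters, wherein the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset comprises at least one scheduling strategy, and both N and M are natural numbers;
    filtering the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, wherein the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M;
    inputting the operator parameters and the K feasible scheduling strategies into a cost model to obtain K operator operation costs, wherein the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and
    determining, according to a target demand and the K operator operation costs, an optimal scheduling strategy for the operator operation.
  2. The method according to claim 1, characterized in that the filtering the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies comprises:
    determining constraint conditions according to the operator parameters and the processor parameters, and filtering the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets; and
    inputting the scheduling strategies in the K feasible scheduling subsets respectively into an objective function to obtain the K feasible scheduling strategies, wherein the K feasible scheduling strategies are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
  3. The method according to claim 1 or 2, characterized in that the determining the optimal scheduling strategy for the operator operation according to the target demand and the K operator operation costs comprises:
    selecting, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
  4. The method according to any one of claims 1 to 3, characterized in that the operator parameters comprise at least one of the following: operator type or operator size; the processor parameters comprise at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency; and the scheduling strategy comprises at least one of the following: segmentation type, loop order, or data flow.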
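  5. The method according to any one of claims 1 to 4, characterized in that, before the obtaining operator parameters and processor parameters corresponding to an operator operation, the method further comprises:
    obtaining sample operator parameters and sample processor parameters, wherein the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters;
    creating X scheduling strategies according to the sample operator parameters and the sample processor parameters, wherein the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset comprises at least one scheduling strategy, and both X and Y are natural numbers;
    filtering the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, wherein the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y; and
    training the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  6. The method according to claim 5, characterized in that the training the cost model according to the sample operator parameters and the Z feasible scheduling strategies comprises:
    inputting the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, wherein the Z operator operation costs correspond one-to-one to the Z feasible scheduling strategies; and
    with the Z operator operation costs as the target, training the cost model according to the sample operator parameters and the Z feasible scheduling strategies.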
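  7. An operator operation scheduling apparatus, characterized by comprising:
    a general scheduling module, configured to obtain operator parameters and processor parameters corresponding to an operator operation, and to create N scheduling strategies according to the operator parameters and the processor parameters, wherein the N scheduling strategies are classified into M scheduling strategy subsets, each scheduling strategy subset comprises at least one scheduling strategy, and both N and M are natural numbers;
    a global feature module, configured to filter the M scheduling strategy subsets respectively according to the operator parameters and the processor parameters to obtain K feasible scheduling strategies, wherein the K feasible scheduling strategies are respectively the optimal scheduling strategies of K feasible scheduling subsets among the M scheduling strategy subsets, and K is a natural number less than or equal to M;
    a cost model, configured to take the operator parameters and the K feasible scheduling strategies as input and obtain K operator operation costs, wherein the K operator operation costs correspond one-to-one to the K feasible scheduling strategies; and
    an optimal scheduling selection module, configured to determine, according to a target demand and the K operator operation costs, an optimal scheduling strategy for the operator operation.
  8. The apparatus according to claim 7, characterized in that the global feature module is specifically configured to determine constraint conditions according to the operator parameters and the processor parameters, and to filter the M scheduling strategy subsets respectively according to the constraint conditions to obtain the K feasible scheduling subsets; and to input the scheduling strategies in the K feasible scheduling subsets respectively into an objective function to obtain the K feasible scheduling strategies, wherein the K feasible scheduling strategies are respectively the optimal scheduling strategies in the K feasible scheduling subsets.
  9. The apparatus according to claim 7 or 8, characterized in that the optimal scheduling selection module is specifically configured to select, according to the K operator operation costs, a feasible scheduling strategy that satisfies the target demand as the optimal scheduling strategy.
  10. The apparatus according to any one of claims 7 to 9, characterized in that the operator parameters comprise at least one of the following: operator type or operator size; the processor parameters comprise at least one of the following: processor architecture, cache level, cache capacity and bandwidth, computing power, or processing frequency; and the scheduling strategy comprises at least one of the following: segmentation type, loop order, or data flow.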
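  11. The apparatus according to any one of claims 7 to 10, characterized by further comprising a training module, wherein:
    the general scheduling module is further configured to obtain sample operator parameters and sample processor parameters, wherein the sample operator parameters are obtainable operator parameters and the sample processor parameters are obtainable processor parameters; and to create X scheduling strategies according to the sample operator parameters and the sample processor parameters, wherein the X scheduling strategies are classified into Y scheduling strategy subsets, each scheduling strategy subset comprises at least one scheduling strategy, and both X and Y are natural numbers;
    the global feature module is further configured to filter the Y scheduling strategy subsets respectively according to the sample operator parameters and the sample processor parameters to obtain Z feasible scheduling strategies, wherein the Z feasible scheduling strategies are respectively the optimal scheduling strategies of Z feasible scheduling subsets among the Y scheduling strategy subsets, and Z is a natural number less than or equal to Y; and
    the training module is configured to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.
  12. The apparatus according to claim 11, characterized in that the training module is specifically configured to input the sample operator parameters and the Z feasible scheduling strategies into the corresponding processor to obtain Z operator operation costs, wherein the Z operator operation costs correspond one-to-one to the Z feasible scheduling strategies; and, with the Z operator operation costs as the target, to train the cost model according to the sample operator parameters and the Z feasible scheduling strategies.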
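  13. A computer device, characterized by comprising:
    one or more processors; and
    a memory, configured to store one or more programs;
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method according to any one of claims 1 to 6.
  14. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions, and when the instructions are run on a computer, they are used to execute the method according to any one of claims 1 to 6.
  15. A computer program, characterized in that, when the computer program is executed by a computer, it is used to execute the method according to any one of claims 1 to 6.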
PCT/CN2020/083635 2019-04-09 2020-04-08 Operator operation scheduling method and apparatus WO2020207393A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/144,780 US11934866B2 (en) 2019-04-09 2021-01-08 Operator operation scheduling method and apparatus to determine an optimal scheduling policy for an operator operation

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910282106.8A CN111796917B (zh) 2019-04-09 2019-04-09 Operator operation scheduling method and apparatus
CN201910282106.8 2019-04-09

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/144,780 Continuation US11934866B2 (en) 2019-04-09 2021-01-08 Operator operation scheduling method and apparatus to determine an optimal scheduling policy for an operator operation

Publications (1)

Publication Number Publication Date
WO2020207393A1 true WO2020207393A1 (zh) 2020-10-15

Family

ID=72750916

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083635 WO2020207393A1 (zh) 2019-04-09 2020-04-08 Operator operation scheduling method and apparatus

Country Status (3)

Country Link
US (1) US11934866B2 (zh)
CN (1) CN111796917B (zh)
WO (1) WO2020207393A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112698897A (zh) * 2020-12-29 2021-04-23 长威信息科技发展股份有限公司 一种可视化大数据算子编排的方法及系统
CN113342631B (zh) * 2021-07-02 2022-08-26 厦门美图之家科技有限公司 分发管理优化方法、装置和电子设备
CN114064242A (zh) * 2021-11-12 2022-02-18 中兴通讯股份有限公司 调度参数的调整方法、设备及存储介质
CN117785492B (zh) * 2024-02-28 2024-05-17 上海燧原智能科技有限公司 一种算子的切分方式确定方法、装置、设备及介质
CN118093455A (zh) * 2024-04-23 2024-05-28 北京壁仞科技开发有限公司 数据加载方法、数据加载装置、处理器和电子设备

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013080152A1 (en) * 2011-12-01 2013-06-06 International Business Machines Corporation Dynamically configurable placement engine
CN104683488A (zh) * 2015-03-31 2015-06-03 百度在线网络技术(北京)有限公司 流式计算系统及其调度方法和装置
CN106708838A (zh) * 2015-11-12 2017-05-24 华为技术有限公司 用于流数据查询的方法和装置
CN107273193A (zh) * 2017-04-28 2017-10-20 中国科学院信息工程研究所 一种基于dag的面向多计算框架的数据处理方法及系统
CN108491274A (zh) * 2018-04-02 2018-09-04 深圳市华傲数据技术有限公司 分布式数据管理的优化方法、装置、存储介质及设备
CN109189572A (zh) * 2018-08-02 2019-01-11 中兴飞流信息科技有限公司 一种资源预估方法及系统、电子设备和存储介质

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05250377A (ja) * 1992-03-04 1993-09-28 Fujitsu Ltd スケジューリング方式
EP1630671A1 (en) * 2004-08-24 2006-03-01 International Business Machines Corporation Framework for pluggable schedulers
US8028293B2 (en) * 2007-06-28 2011-09-27 Microsoft Corporation Optimal policies for load balancing for distributed and strategic agents (more technically, optimal coordination mechanisms for machine scheduling)
US10740330B2 (en) * 2013-03-15 2020-08-11 Teradata Us, Inc. Multi-platform optimization
US10802876B2 (en) * 2013-05-22 2020-10-13 Massachusetts Institute Of Technology Multiprocessor scheduling policy with deadline constraint for determining multi-agent schedule for a plurality of agents
GB201515318D0 (en) * 2015-08-20 2015-10-14 Servicepower Business Solutions Ltd Infeasible schedules in a quantum annealing optimisation process
WO2018165607A1 (en) * 2017-03-10 2018-09-13 Rigetti & Co, Inc. Event scheduling in a hybrid computing system
WO2020040763A1 (en) * 2018-08-23 2020-02-27 Siemens Aktiengesellschaft Real-time production scheduling with deep reinforcement learning and monte carlo tree search
US11042405B2 (en) * 2019-01-10 2021-06-22 Vmware, Inc. Scheduling and executing functions across different functions-as-a-service (FAAS) infrastructures

Also Published As

Publication number Publication date
US20210132990A1 (en) 2021-05-06
US11934866B2 (en) 2024-03-19
CN111796917A (zh) 2020-10-20
CN111796917B (zh) 2024-06-25

Similar Documents

Publication Publication Date Title
WO2020207393A1 (zh) Operator operation scheduling method and apparatus
McDanel et al. Embedded binarized neural networks
CN109840589A (zh) 一种在fpga上运行卷积神经网络的方法、装置及系统
CN111819580A (zh) 用于密集图像预测任务的神经架构搜索
CN109711528A (zh) 基于特征图变化对卷积神经网络剪枝的方法
CN114549563A (zh) 一种基于DeepLabV3+的复合绝缘子实时分割方法及系统
CN113627389A (zh) 一种目标检测的优化方法及设备
CN109523022A (zh) 终端数据处理方法、装置及终端
Patel et al. MAG-D: A multivariate attention network based approach for cloud workload forecasting
CN112052027A (zh) 一种处理ai任务的方法及装置
US20220335293A1 (en) Method of optimizing neural network model that is pre-trained, method of providing a graphical user interface related to optimizing neural network model, and neural network model processing system performing the same
Long et al. Complexity-aware adaptive training and inference for edge-cloud distributed AI systems
CN114169506A (zh) 一种基于工业物联网平台的深度学习边缘计算系统框架
CN116432736A (zh) 神经网络模型优化方法、装置及计算设备
CN110855474B (zh) Kqi数据的网络特征提取方法、装置、设备及存储介质
CN104732278A (zh) 一种基于海云协同架构的深度神经网络训练方法
Ding et al. JMDC: A joint model and data compression system for deep neural networks collaborative computing in edge-cloud networks
CN113485848B (zh) 深度神经网络部署方法、装置、计算机设备和存储介质
Wang et al. Edge computing for artificial intelligence
US20210241068A1 (en) Convolutional neural network
Song et al. Residual Squeeze-and-Excitation Network for Battery Cell Surface Inspection
CN115202879A (zh) 基于多类型智能模型的云边协同调度方法及应用
CN115052154A (zh) 一种模型训练和视频编码方法、装置、设备及存储介质
CN110989040B (zh) 一种基于切片处理的人工智能雷电临近预警方法及系统
KR20220144281A (ko) 신경망 모델의 최적화 방법 및 이를 수행하는 신경망 모델 처리 시스템

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20788184

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20788184

Country of ref document: EP

Kind code of ref document: A1