WO2024060916A1 - Operator operation mode configuration method and apparatus, and related system - Google Patents

Operator operation mode configuration method and apparatus, and related system

Info

Publication number
WO2024060916A1
WO2024060916A1 PCT/CN2023/114411 CN2023114411W WO2024060916A1 WO 2024060916 A1 WO2024060916 A1 WO 2024060916A1 CN 2023114411 W CN2023114411 W CN 2023114411W WO 2024060916 A1 WO2024060916 A1 WO 2024060916A1
Authority
WO
WIPO (PCT)
Prior art keywords
operator
operators
configuration
operation mode
configuration parameters
Prior art date
Application number
PCT/CN2023/114411
Other languages
French (fr)
Chinese (zh)
Inventor
俞郑中
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024060916A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/30Creation or generation of source code
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/70Software maintenance or management
    • G06F8/71Version control; Configuration management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present application relates to the field of compilers, and in particular to a configuration method, device and related systems for an operator operation mode.
  • the computing power of an AI chip is fixed.
  • the Ascend 710 AI processor can calculate float16 twice as fast as float32 in vector calculations. In some calculations, the computing power gap relative to float32 is even greater than 2 times.
  • the industry uses mixed-precision methods. For example, the calculations of some layers of an AI model are converted to fp16, while the calculations of other layers remain in fp32. Because the AI model as a whole then mixes fp16 and fp32 calculations, its execution efficiency improves without a significant loss of accuracy.
  • the computing efficiency of the same chip also differs greatly between types of operations. For example, on these chips exp, log, and division are computed much less efficiently than addition, subtraction, multiplication, and reciprocal operations.
  • the existing AI framework does not provide a good expansion capability and can only perform operations according to a fixed processing method, which brings inconvenience to the performance and accuracy tuning of the AI model.
  • Embodiments of the present application provide an operator operation mode configuration method, device and related systems to ensure the operation accuracy of the operators in the operator model and improve the operation performance of the operators.
  • embodiments of the present application provide a method for configuring operator operation modes, including: determining M operators corresponding to M operator nodes based on an algorithm model, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operating modes, and the operating modes differ from one another in operating accuracy and/or operating speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1; and configuring the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node, wherein the configuration parameters or the configuration options are used to indicate the operating mode of the corresponding operator.
  • after determining the operator corresponding to each of the M operator nodes of the operator model and the N types of operators to which the M operators belong, the operation modes of the M operators in the algorithm model can be configured according to the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • the operators of each of the N types can have a variety of operation modes with different operation accuracy and/or operation speed, so the algorithm model as a whole also has multiple operation modes.
  • to ensure the operation accuracy of the operators in the operator model while improving their operation performance, the embodiments of this application can, according to the operation characteristics of the operator model, configure the operation modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • the method also includes: setting the configuration parameters corresponding to each type of operator in the operator interface corresponding to each type of operator in the N types of operators; configuring the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator, including: calling the N types of operators and obtaining the corresponding configuration parameters through the operator interface corresponding to each type of operator in the N types of operators; configuring the operating modes of each type of operator in the algorithm model based on the configuration parameters corresponding to each type of operator in the N types of operators.
  • setting the configuration parameters corresponding to each type of operator in the operator implementation interface can realize operator-level precision mode setting.
  • operators with different operating modes can be compiled according to different configuration parameters in the operator implementation interface, and finally the accuracy and/or performance of the operator model can be adjusted. This makes the optimization of algorithm model performance and accuracy more flexible.
  • the configuration parameters are one of target configuration parameters, default configuration parameters, and tuning configuration parameters;
  • the target configuration parameter is used to instruct the corresponding operator to run in the target operation mode, and the target operation mode is one of the multiple operation modes;
  • the default configuration parameter is used to instruct the corresponding operator to run in the operation mode with the highest operation accuracy or the fastest operation speed.
  • the tuning configuration parameters are used to instruct the corresponding operator to run in the optimal operation mode, where the optimal operation mode is the fastest operation mode whose operation result error, determined based on the operator model and relative to the operation mode with the highest operation accuracy, is within the preset threshold range.
  • the configuration parameters can be divided into three categories, for example, the configuration parameters can be any one of the target configuration parameters, the default configuration parameters, and the tuning configuration parameters.
  • the target configuration parameters can be used to indicate an operation mode set by the user;
  • the default configuration parameters can be used to indicate that the operator of this type is operated in the operation mode with the highest computing accuracy or the fastest computing speed;
  • the tuning configuration parameters can be used to indicate that the operator is operated in the optimal operation mode under the computing characteristics of the current operator model.
  • the configuration parameters include a configuration path corresponding to the operating mode; and configuring the operating mode of each type of operator in the algorithm model based on the configuration parameters corresponding to each of the N types of operators includes: obtaining, based on the configuration parameters corresponding to each of the N types of operators, the configuration path of the operating mode corresponding to that type of operator; and reading the corresponding configuration file based on the configuration path to configure the operating mode of the corresponding type of operator during the operation of the algorithm model.
  • the configuration parameters may include corresponding operating mode configuration paths.
  • the configuration path is used to indicate the configuration file corresponding to the operation mode.
  • the configuration file includes relevant configuration information, so that after reading the configuration file, the operation mode of the corresponding type of operator can be configured during the operation of the algorithm model. For example: when the configuration parameters of the first type of operator are target configuration parameters, they can include the configuration path of the target operating mode; the configuration file of the target operating mode is read based on that configuration path, and the first type of operator is then configured, based on the configuration file, to run in the target operating mode.
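  • the configuration-path flow described above can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify a file format, so the JSON layout, the key names `kind`, `path`, and `mode`, and the function `load_operator_mode` are all hypothetical.

```python
import json
import os
import tempfile


def load_operator_mode(config_param: dict) -> str:
    """Follow the configuration path carried by a configuration parameter
    and return the operating mode named in the configuration file.
    (Hypothetical helper; the key names are assumptions.)"""
    with open(config_param["path"], encoding="utf-8") as f:
        config_file = json.load(f)
    return config_file["mode"]


def demo() -> str:
    # Write a stand-in configuration file for the target operating mode.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "softmax_target_mode.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"mode": "high_performance"}, f)
        # A "target" configuration parameter that carries the configuration path.
        softmax_param = {"kind": "target", "path": path}
        return load_operator_mode(softmax_param)
```

  Keeping the mode in a file and only the path in the parameter means the operator interface stays unchanged when a mode is retuned; only the file contents change.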
  • determining the M operators corresponding to the M operator nodes based on the algorithm model includes: receiving input information and determining the M operator nodes and the configuration options corresponding to each of the M operator nodes based on the input information.
  • configuring the operation mode of the M operators in the algorithm model includes: calling the N types of operators respectively through the operator interface corresponding to each type of operator in the N types of operators; and configuring, based on the configuration options corresponding to each of the operator nodes, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
  • the configuration options corresponding to each operator node can also be set based on the user's input information to achieve operator-node-level precision mode setting; that is, operators of the same type in different operator nodes can be configured with different operating modes.
  • like the configuration parameter, the configuration option can be set to one of the three kinds of configuration parameters.
  • when each operator node calls an operator through the operator interface of the corresponding type, the operation mode of that node's operator can be configured according to the configuration options corresponding to the node, which ultimately adjusts the accuracy and/or performance of the operator model and makes the optimization of the performance and accuracy of the algorithm model more flexible.
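  • a minimal sketch of how type-level configuration parameters and node-level configuration options might be combined is given below. The class `OperatorNode`, the function `resolve_modes`, the mode names, and in particular the precedence rule (a node-level option overrides the type-level parameter, which overrides a default) are assumptions made for illustration; the application only states that either source may be used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OperatorNode:
    name: str                     # node name in the model graph
    op_type: str                  # operator type, e.g. "Softmax"
    option: Optional[str] = None  # node-level configuration option, if any


def resolve_modes(nodes, type_params, default_mode="high_precision"):
    """Return {node name: operating mode} for every operator node."""
    modes = {}
    for node in nodes:
        if node.option is not None:
            # Assumed precedence: the node-level option wins.
            modes[node.name] = node.option
        else:
            # Otherwise fall back to the parameter for the operator type.
            modes[node.name] = type_params.get(node.op_type, default_mode)
    return modes


# M = 3 operator nodes that use N = 2 operator types.
nodes = [
    OperatorNode("matmul_0", "MatMul"),
    OperatorNode("softmax_0", "Softmax"),
    OperatorNode("softmax_1", "Softmax", option="high_precision"),
]
type_params = {"Softmax": "high_performance"}
modes = resolve_modes(nodes, type_params)
```

  In this example the two Softmax nodes end up in different operating modes, which is the node-level behaviour described above.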
  • the method further includes: compiling and running the algorithm model based on the operation modes of the M operators.
  • the operator model can be compiled according to the operating modes of the M operators, so that the accuracy and/or performance of the running operator model is finally adjusted.
  • an operator operation mode configuration device which may include:
  • a determination unit configured to determine M operators corresponding to M operator nodes based on an algorithm model, wherein one operator node corresponds to one operator, each of the operators belongs to one of N types of operators, each type of operator corresponds to a plurality of operation modes, and each operation mode has a different operation precision and/or operation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1;
  • a configuration unit configured to configure the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node; wherein the configuration parameters or configuration options are used to indicate the operation mode of the corresponding operator.
  • the device further includes: a setting unit configured to set the configuration parameters corresponding to each type of operator in the operator interface corresponding to each type of operator in the N types of operators;
  • the configuration unit is specifically used to: call the N types of operators through the operator interface corresponding to each type of operator in the N types of operators and obtain the corresponding configuration parameters; and configure the operating mode of each type of operator in the algorithm model based on the configuration parameters corresponding to each type of operator in the N types of operators.
  • the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter; wherein the target configuration parameter is used to instruct the corresponding operator to run in the target operating mode, and the target operating mode is one of the multiple operating modes; the default configuration parameter is used to instruct the corresponding operator to run in the operating mode with the highest operating accuracy or the fastest operating speed; and the tuning configuration parameter is used to instruct the corresponding operator to run in the optimal operating mode, where the optimal operating mode is the fastest operating mode whose operation result error, determined based on the operator model and relative to the operating mode with the highest operating accuracy, is within the preset threshold range.
  • the configuration parameters include a configuration path corresponding to the operating mode; the configuration unit is specifically configured to: obtain, based on the configuration parameters corresponding to each of the N types of operators, the configuration path of the operating mode corresponding to that type of operator; and read the corresponding configuration file based on the configuration path to configure the operating mode of the corresponding type of operator during the operation of the algorithm model.
  • the determining unit is further configured to: receive input information and determine the M operator nodes and the configuration options corresponding to each operator node based on the input information;
  • the configuration options include one of target configuration parameters, default configuration parameters, and tuning configuration parameters;
  • the configuration unit is specifically used to: call the N types of operators respectively through the operator interface corresponding to each type of operator in the N types of operators; and configure, based on the configuration options corresponding to each of the operator nodes, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
  • the device further includes: a compilation unit, configured to compile and run the algorithm model based on the operation mode of the M operators.
  • embodiments of the present application provide an operator operation mode configuration system, including: a model compiler, used to determine M operators corresponding to M operator nodes based on the algorithm model, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operating modes, and each operating mode has a different operating accuracy and/or operating speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1; and an operator compiler, used to configure the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node; wherein the configuration parameters or the configuration options are used to indicate the operating modes of the corresponding operators.
  • embodiments of the present application provide a computer storage medium for storing computer software instructions used by the operator operation mode configuration device provided in the second aspect, which include a program designed to execute the above aspect.
  • embodiments of the present application provide a computer program.
  • the computer program includes instructions.
  • the computer program can execute the process performed by the operator operation mode configuration device in the second aspect.
  • the present application provides a chip system, which includes a processor and is used to support a terminal device to implement the functions involved in the above-mentioned first aspect, for example, generating or processing the information involved in the above-mentioned configuration method of the operator operation mode.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data for the data sending device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Figure 1 is a partial structural schematic diagram of an algorithm model provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of the configuration architecture of an operator operation mode provided by the embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a method for configuring an operator operation mode provided by an embodiment of the present application.
  • Figure 4 is a partial structural schematic diagram of an algorithm model provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of an algorithm model of several operating modes provided by embodiments of the present application.
  • Figure 6 is a configuration device for an operator operation mode provided by an embodiment of the present application.
  • "At least one (item)" refers to one or more.
  • "plural" refers to two or more.
  • "and/or" is used to describe the association of associated objects, indicating that three relationships are possible.
  • "A and/or B" can mean: only A exists, only B exists, or A and B exist simultaneously, where A and B can be singular or plural.
  • the character "/" generally indicates that the related objects are in an "or" relationship.
  • "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single or plural items.
  • "at least one of a, b or c" can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, c can be single or multiple.
  • an embodiment means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application.
  • the appearances of this phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
  • a component may be, but is not limited to, a process, a processor, an object, an executable file, a thread of execution, a program and/or a computer running on a processor.
  • applications running on the computing device and the computing device may be components.
  • One or more components can reside in a process and/or thread of execution and a component can be localized on one computer and/or distributed between 2 or more computers. Additionally, these components can execute from various computer-readable media having various data structures stored thereon.
  • a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from two components interacting with another component, a local system, a distributed system, and/or a network, such as the Internet, which interacts with other systems via signals).
  • fp16 refers to a data type that uses 2 bytes (16 bits) for encoding and storage; similarly, fp32 refers to using 4 bytes (32 bits).
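  • the storage sizes above, and the precision that is given up by the shorter encoding, can be checked with a short sketch. It assumes that fp16 and fp32 follow the IEEE 754 half-precision and single-precision formats, which the text does not state explicitly; the helper `round_trip` is a hypothetical name.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Encode value in the given struct format and decode it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


fp16_bytes = struct.calcsize("<e")  # "e" is IEEE 754 half precision
fp32_bytes = struct.calcsize("<f")  # "f" is IEEE 754 single precision

# Rounding error introduced when 0.1 is stored in each format.
fp16_error = abs(round_trip(0.1, "<e") - 0.1)
fp32_error = abs(round_trip(0.1, "<f") - 0.1)
```

  The fp16 rounding error is larger than the fp32 one, which is the accuracy cost that mixed-precision methods trade against speed.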
  • the industry uses mixed-precision methods. For example, the calculations of some layers of an AI model are converted to fp16, while the calculations of other layers remain in fp32. Because the AI model as a whole then mixes fp16 and fp32 calculations, its execution efficiency improves without a significant loss of accuracy.
  • Figure 1 is a partial structural schematic diagram of an algorithm model provided by an embodiment of the present application. As shown in Figure 1, this part of the algorithm model includes three operators, namely MatMul, Add and Softmax. During the compilation process, if the accuracy mode of an operator is not specified, each operator node uses the high-precision operation mode by default (the high-precision operation mode ensures correct accuracy over the entire value range), such as fp32.
  • the calculation logic of the Div operator is: find the division result of two tensors (Tensor), that is, find x ÷ y, where x and y are a set of Tensors that satisfy the broadcast relationship.
  • the calculation efficiency of division on some AI chips is relatively low (i.e., the calculation performance is poor), but the calculation accuracy is high (i.e., the calculation result is accurate and the error is small), while the calculation efficiency of reciprocal calculation on the above-mentioned AI chips is relatively high, but the calculation accuracy is slightly worse.
  • the AI chip can use the Newton iteration method to improve calculation accuracy. The more Newton iterations are used, the higher the accuracy but the lower the performance. Due to these characteristics, the Div operator can provide multiple implementation versions suitable for this AI chip. For example:
  • High-precision operation mode: use the Div instruction of the AI chip for calculation, which features high precision but poor performance.
  • High-performance operation mode: use the AI chip to find the reciprocal and then multiply, that is, calculate x*(1/y), with slightly lower accuracy and higher performance.
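  • the two Div modes and the effect of Newton iterations can be sketched in plain Python as follows. This is a scalar illustration only, not the chip's implementation: the function names are hypothetical, and the linear starting estimate 48/17 - 32/17·m is a textbook choice for Newton-Raphson reciprocals, assumed here because the text does not describe how the chip produces its initial reciprocal.

```python
import math


def approx_reciprocal(y: float, newton_iters: int = 0) -> float:
    """Approximate 1/y, refined by Newton iterations r <- r * (2 - m * r)."""
    if y == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    m, e = math.frexp(abs(y))           # abs(y) = m * 2**e, 0.5 <= m < 1
    r = 48.0 / 17.0 - 32.0 / 17.0 * m   # coarse linear estimate of 1/m
    for _ in range(newton_iters):
        r = r * (2.0 - m * r)           # each step squares the relative error
    return math.copysign(math.ldexp(r, -e), y)


def div_high_precision(x: float, y: float) -> float:
    """High-precision mode: a direct division."""
    return x / y


def div_high_performance(x: float, y: float, newton_iters: int = 1) -> float:
    """High-performance mode: multiply by an approximate reciprocal."""
    return x * approx_reciprocal(y, newton_iters)


exact = div_high_precision(1.0, 3.0)
# Relative error of the high-performance mode for 0, 1, 2, 3 iterations.
errors = [abs(div_high_performance(1.0, 3.0, n) - exact) / exact
          for n in range(4)]
```

  Each extra iteration costs two multiplications and one subtraction and shrinks the error, which mirrors the accuracy-versus-performance trade-off between the implementation versions.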
  • Illustrative example: the Softmax operator.
  • the calculation logic of the Softmax operator is:
  • Step 1: Find the max value on the reduce axis.
  • Step 2: Subtract the max value from the data on the reduce axis.
  • Step 3: Find the exp value of the data on the reduce axis.
  • Step 4: Find the cumulative sum of exp values of the data on the reduce axis.
  • Step 5: Find the exp value of the data on the reduce axis divided by its cumulative sum.
  • the efficiency of calculating the max value in fp32 on the AI chip is much lower than in fp16. Therefore, when calculating the Softmax result of an fp32 Tensor, efficiency is low if the max-value step is calculated in fp32. Converting fp32 to fp16 before performing the calculation improves performance considerably, but may reduce calculation accuracy. For example: when the magnitude of the input data does not exceed the maximum value of fp16 (65504), this conversion has no effect on the final accuracy; when the magnitude of the input data exceeds the maximum value of fp16 (65504), this conversion makes the final calculation result incorrect. Due to these characteristics, the Softmax operator can provide 2 implementation versions:
  • High-precision operation mode: for max value calculation, no conversion from fp32 to fp16 is performed.
  • High-performance operation mode: convert fp32 to fp16 for max value calculation.
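  • the five Softmax steps and the two modes can be sketched as follows. The sketch works on a plain Python list rather than a Tensor, emulates fp16 with the standard `struct` module, and assumes that an out-of-range value becomes infinity on conversion; the mode names and the helper `to_fp16` are hypothetical, and the real conversion behaviour on the AI chip is not described in the text.

```python
import math
import struct


def to_fp16(v: float) -> float:
    """Round v to the nearest fp16 value; assume overflow becomes +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", v))[0]
    except OverflowError:
        return math.copysign(math.inf, v)


def softmax(values, mode="high_precision"):
    # Step 1: max value on the reduce axis (in fp16 for high performance).
    if mode == "high_performance":
        max_value = max(to_fp16(v) for v in values)
    else:
        max_value = max(values)
    # Step 2: subtract the max value.
    shifted = [v - max_value for v in values]
    # Step 3: exp of each shifted element.
    exps = [math.exp(s) for s in shifted]
    # Step 4: cumulative sum of the exp values.
    total = sum(exps)
    # Step 5: divide each exp value by the sum.
    return [e / total for e in exps]


precise = softmax([1.0, 2.0, 3.0], mode="high_precision")
fast = softmax([1.0, 2.0, 3.0], mode="high_performance")
```

  For inputs within the fp16 range the two modes agree. Once an input exceeds 65504, `to_fp16` no longer returns a finite max value in this sketch, so the high-performance path stops producing a valid result, which is the accuracy risk described above.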
  • Model developers need to provide customized operators in the form of custom operators, and then use the customized operators in the model to improve model running performance.
  • the model developer needs to provide a "MySoftmax" operator that implements a high-performance operation mode in the AI framework, and then use the new operator in the model.
  • Disadvantage 1: Although operators with different operating modes are implemented, redefining an operator end-to-end requires many modifications. For example, a MySoftmax operator needs to be fully defined and implemented, and the AI model also needs to be modified to use this new operator instead of the old one.
  • Disadvantage 2: The AI framework does not have the built-in capability of this new operator, that is, the new operator is not stored in the operator library. Operator developers need to identify which operators can have multiple operating modes and, during development, provide new operators with multiple operating modes. This method increases learning costs and requires AI model developers to be familiar with chip characteristics for optimization.
  • the embodiments of the present application provide a configuration method, device and related system for operator operation modes. There is no need to define new operators, and multiple operation modes of operators can be implemented on the original operator framework.
  • the embodiment of this application provides a configuration method of operator operation mode: after the operator corresponding to each of the M operator nodes of the operator model, and the N types of operators to which the M operators belong, are determined, the operation modes of the M operators in the algorithm model can be configured according to the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • the operators of each of the N types can have a variety of operation modes with different operation accuracy and/or operation speed, so the algorithm model as a whole also has multiple operation modes.
  • the embodiments of this application can, according to the operation characteristics of the operator model, configure the operation modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • Figure 2 is a schematic diagram of the configuration architecture of an operator operation mode provided by an embodiment of the present application.
  • the configuration system architecture of this operator operation mode includes: a model compiler 101 and an operator compiler 102; it may also include: an operator library 103 and a tuner 104. Among them:
  • the model compiler 101 is used to determine M operators corresponding to M operator nodes based on the algorithm model, where one operator node corresponds to one operator, and each of the operators belongs to one of the N types of operators, Each type of operator corresponds to multiple operation modes, and the operation accuracy and/or operation speed between each operation mode are different; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1.
  • the model compiler 101 can be an AI model compiler (Ascend Tensor Compiler, ATC).
  • the operator development interface is provided to model developers, so that model developers can use ATC to compile their models and the models can run on platforms such as Ascend.
  • the operator compiler 102 is used to configure the operation mode of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node; wherein, The configuration parameters or configuration options are used to indicate the operation mode of the corresponding operator.
  • the operator compiler 102 can be an operator compiler included in a graph compilation engine and a fusion engine (Graph Engine & Fusion Engine, GE&FE).
  • the operator compiler needs to be used to compile operators that can be run on the AI chip.
  • the operator compiler 102 provided in the embodiment of this application can read the corresponding operator precision mode configuration (that is, the corresponding operating mode of the operator) according to the configuration of each operator node in the model, and compile, according to that configuration, an implementation version that can run on the AI chip.
  • the operator library 103 provides multiple types of operators and multiple operating modes of each type of operator for the operator compiler 102 to compile. That is, in this embodiment of the present application, the operator library 103 can provide multiple versions of an operator (that is, each type of operator corresponds to multiple operating modes) and implementation information (that is, the configuration information corresponding to each of the multiple operating modes). A default high-performance mode configuration file and a default high-precision mode configuration file covering all types of operators can also be provided to achieve high-precision or high-performance operation of the operators.
  • the tuner 104 can be called Auto Tune.
  • the tuner can read the multi-version implementation information provided in the operator library 103 and generate the precision mode configuration by automatically tuning the operators based on the computing features of the model or the computing power features of the chip. For example: compared with division operations, reciprocal calculation is relatively efficient on the related AI chips.
  • the tuner 104 can read the configuration file in the operator library 103 and obtain the multiple operating modes corresponding to the various operators. That is, it learns which operators can have their operating mode (precision mode configuration) adjusted, and which precision modes can be configured.
  • the configuration file can include multiple operating modes, which may, for example, be ordered from left to right so that modes further to the left have higher performance and modes further to the right have higher accuracy.
  • the tuner 104 can run all the operation modes of an operator in parallel, compare the output of each operation mode with the output of the high-precision operation mode, and, from the operation modes whose accuracy comparison result is within a preset threshold, select the one with the highest performance as the tuning operation mode of the operator. The user can also specify the threshold; the default threshold is double one-thousandth, that is, the values of each element in the two result Tensors are compared point by point, the difference of each pair should not exceed one thousandth (abs(x-y)/min(abs(x), abs(y)) ≤ 0.001), and the elements whose difference exceeds one thousandth should account for no more than one thousandth of the total number of elements in the Tensor. For example: if the comparison between the high-performance and high-precision implementations of an operator meets the threshold requirements, the operator is tuned to the high-performance implementation to improve the operation performance of the final AI model.
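  • the default double one-thousandth check and the selection of a tuning operation mode can be sketched as follows. The function names, the mode names, and the `speeds` table (higher meaning faster) are hypothetical; the sketch takes already-computed outputs instead of running the modes in parallel, and treating a zero denominator as an infinite error is an assumption, since the text does not say how zero-valued elements are handled.

```python
import math


def relative_error(x: float, y: float) -> float:
    """abs(x - y) / min(abs(x), abs(y)); 0 for equal values."""
    if x == y:
        return 0.0
    denom = min(abs(x), abs(y))
    # Assumption: a mismatch against an exact zero counts as a failure.
    return math.inf if denom == 0.0 else abs(x - y) / denom


def meets_double_thousandth(result, reference, tol=0.001):
    """True if at most a fraction tol of elements differ by more than tol."""
    bad = sum(1 for x, y in zip(result, reference)
              if relative_error(x, y) > tol)
    return bad <= tol * len(reference)


def pick_tuned_mode(outputs, speeds, reference_mode="high_precision"):
    """Return the fastest mode whose output passes the accuracy check."""
    reference = outputs[reference_mode]
    passing = [mode for mode, out in outputs.items()
               if meets_double_thousandth(out, reference)]
    return max(passing, key=lambda mode: speeds[mode])


outputs = {
    "high_precision": [1.0, 2.0, 4.0],
    "high_performance": [1.0005, 2.0, 4.0],  # small error, within threshold
    "fastest": [1.5, 2.0, 4.0],              # large error, rejected
}
speeds = {"high_precision": 1.0, "high_performance": 2.0, "fastest": 3.0}
tuned = pick_tuned_mode(outputs, speeds)
```

  Because the high-precision mode always passes the comparison against itself, the selection never comes back empty; it falls back to high precision when no faster mode is accurate enough.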
  • the embodiment of the present application provides an operator operation mode configuration system, which does not need to define new operators and can implement multiple operation modes of operators on the original operator framework. For example: after the model compiler determines the operator corresponding to each of the M operator nodes of the operator model and the N types of operators corresponding to the M operators, the operator compiler can configure the operation modes of the M operators in the algorithm model according to the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node. Each of the N types of operators can have multiple operation modes with different operation accuracy and/or operation speed, so the algorithm model can also have multiple operation modes.
  • To ensure the operation accuracy of the operators in the operator model while improving their operation performance, embodiments of this application can, according to the operation characteristics of the operator model, configure the operation modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • Configuring the operation mode of the algorithm model through configuration parameters or configuration options avoids the limitation of the existing technology, in which an algorithm model can only operate according to a fixed processing method, and makes optimizing the performance and accuracy of the algorithm model more flexible.
  • each functional module in the configuration architecture of the operator operation mode described in the embodiment of the present application is only an exemplary description in the embodiment of the present application, and the embodiment of the present application does not limit this.
  • Figure 3 is a schematic flow chart of a method for configuring an operator operating mode provided by an embodiment of the present application.
  • This method can be applied to the configuration system architecture of the operator operating mode described in Figure 1. The method includes steps S201 to S204 of the method flow shown in Figure 3.
  • the model compiler 101 in the configuration system architecture of the operator operation mode shown in Figure 1 can be used to support and execute steps S201 to S202 and S204 of the method flow shown in Figure 3.
  • the operator compiler 102 may be used to support and execute step S203 of the method flow shown in FIG. 3.
  • Step S201: Determine M operators corresponding to M operator nodes based on the algorithm model.
  • M operators corresponding to M operator nodes are determined based on the algorithm model, wherein one operator node corresponds to one operator, each of the operators belongs to one of N types of operators, each type of operator corresponds to multiple operating modes, and the operation accuracy and/or operation speed of each operating mode is different; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1.
  • FIG. 4 is a partial structural diagram of an algorithm model provided by an embodiment of the present application.
  • this part of the algorithm model includes three operator nodes, namely, a first operator node 301, a second operator node 302, and a third operator node 303.
  • the first operator node 301 corresponds to the MatMul operator
  • the second operator node 302 corresponds to the RealDiv operator
  • the third operator node 303 corresponds to the Softmax operator.
  • the operators corresponding to the first operator node 301, the second operator node 302, and the third operator node 303 respectively belong to three different types of operators.
  • Each type of operator can correspond to multiple operation modes, such as: high-precision operation mode, high-performance operation mode, half-precision operation mode, etc.
  • Each of the above operation modes has different operation accuracy and/or operation speed.
  • For example, the RealDiv operator can provide the following operation modes. High-precision operation mode: use the Div instruction with fp32 precision for calculation.
  • Half-precision operation mode: use the Div instruction with fp16 precision for calculation.
  • High-performance operation mode: use fp16 precision to compute the reciprocal and then multiply, that is, calculate x*(1/y).
  • the embodiments of this application do not specifically limit the operating modes of various operators.
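  • The three RealDiv modes above can be illustrated numerically. The sketch below only emulates fp32 and fp16 rounding on the host with Python's `struct` module; it does not use any chip instruction, and the function names are hypothetical.

```python
import struct


def to_fp32(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


def realdiv_high_precision(x: float, y: float) -> float:
    # High-precision mode: a division carried out at fp32 precision.
    return to_fp32(to_fp32(x) / to_fp32(y))


def realdiv_half_precision(x: float, y: float) -> float:
    # Half-precision mode: a division carried out at fp16 precision.
    return to_fp16(to_fp16(x) / to_fp16(y))


def realdiv_high_performance(x: float, y: float) -> float:
    # High-performance mode: fp16 reciprocal followed by a multiply, x * (1 / y).
    reciprocal = to_fp16(1.0 / to_fp16(y))
    return to_fp16(to_fp16(x) * reciprocal)


exact = 10.0 / 3.0
errors = {
    "high_precision": abs(realdiv_high_precision(10.0, 3.0) - exact),
    "half_precision": abs(realdiv_half_precision(10.0, 3.0) - exact),
    "high_performance": abs(realdiv_high_performance(10.0, 3.0) - exact),
}
```

  • For 10/3 the reciprocal-then-multiply variant rounds twice at fp16, so its error is larger than that of the single fp16 division, which is the accuracy cost traded for speed.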
  • Step S202: Set the configuration parameters corresponding to each type of operator in the operator interface corresponding to each type of operator among the N types of operators.
  • Each type of operator corresponds to multiple operating modes. Setting the configuration parameters corresponding to each type of operator in the operator interface corresponding to each type of operator among the N types of operators makes it possible to configure the operating mode without changing the operator structure.
  • besides setting the configuration parameters in the operator interface corresponding to each type of operator, as described in the embodiment of the present application, the configuration parameters can also be set through the global parameter global_build_context in the compilation context information before the operator interface is called for compilation, so that when the operator interface is called, the operation mode of the operator can be configured directly from the configuration parameters held by the global parameter.
  • For example, the global parameter set in advance before calling the Softmax operator is: global_build_context.set_impl_mode("high_performance")
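  • A minimal sketch of how such a global setting might be consumed is shown below. Only the call `global_build_context.set_impl_mode(...)` comes from the text; the `BuildContext` class, `get_impl_mode`, `compile_softmax`, and the fallback to a high-precision mode when nothing is set are assumptions made for illustration.

```python
from typing import Dict, Optional


class BuildContext:
    """Hypothetical stand-in for the compilation context information."""

    def __init__(self) -> None:
        self._impl_mode: Optional[str] = None

    def set_impl_mode(self, mode: str) -> None:
        # Record the operation mode to use for subsequent operator compilation.
        self._impl_mode = mode

    def get_impl_mode(self, default: str = "high_precision") -> str:
        # Assumption: fall back to a high-precision mode when nothing is set.
        return self._impl_mode if self._impl_mode is not None else default


global_build_context = BuildContext()


def compile_softmax() -> Dict[str, str]:
    """Hypothetical operator interface that reads the mode from the global context."""
    mode = global_build_context.get_impl_mode()
    return {"op": "Softmax", "impl_mode": mode}


before = compile_softmax()
global_build_context.set_impl_mode("high_performance")
after = compile_softmax()
```

  • The operator interface itself is unchanged between the two calls; only the value held by the global context differs.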
  • the configuration parameters are one of target configuration parameters, default configuration parameters, and tuning configuration parameters; wherein the target configuration parameters are used to instruct the corresponding operator to run in the target operation mode, and the target operation mode is one of the multiple operating modes; the default configuration parameters are used to instruct the corresponding operator to run in the operating mode with the highest operating accuracy or the fastest operating speed; the tuning configuration parameters are used to instruct the corresponding operator to run in the optimal operation mode, which is the operation mode, determined based on the operator model, whose operation result error relative to the operation mode with the highest operation accuracy is within a preset threshold range and whose operation speed is the fastest.
  • the configuration parameters in the embodiments of the present application can be divided into three categories.
  • the configuration parameters can be any one of target configuration parameters, default configuration parameters, and tuning configuration parameters.
  • the target configuration parameters can be used to indicate an operating mode set by the user; that is, the user can directly set the configuration parameters to the target configuration parameters of the required target operating mode.
  • For example, the target configuration parameters can indicate the path of an ini file, meaning that the configuration information in this ini file is used to select the corresponding precision implementation version (that is, the operation mode) for the corresponding operator.
  • Default configuration parameters can be used to instruct this type of operator to run in the mode with the highest operation accuracy or the fastest operation speed; for example, the default configuration parameters can indicate the default high-precision and high-performance configurations built into the operator library, such as the corresponding configuration parameters in the operator interface of the above Softmax operator. Tuning configuration parameters can be used to indicate running in the optimal operation mode under the operating characteristics of the current operator model. For example, the tuning configuration parameters represent the operation mode obtained after automatic tuning with the tuner (auto_tune module); this parameter can be used to instruct the tuner to select, from the multiple operation modes, the one whose error is within a preset threshold and whose performance is the highest. Please refer to Figure 5.
  • Figure 5 is a schematic diagram of an algorithm model of several operating modes provided by embodiments of the present application. As shown in Figure 5, this application runs the high-precision operation mode and the high-performance operation mode respectively based on the algorithm model structure shown in Figure 4 above. The results of the two operation modes can be compared to select the optimal operation mode.
  • the preset threshold range can be a specified threshold or a default threshold.
  • the default threshold is the "double one-thousandth" criterion: the values in the two result Tensors (the operation result of the optimal operation mode and the operation result of the high-precision operation mode) are compared point by point; the relative difference between each pair of element values should not exceed one thousandth (abs(x-y)/min(abs(x), abs(y)) ≤ 0.001), and the number of elements whose difference exceeds one thousandth should account for no more than one thousandth of the total number of elements in the Tensor.
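  • The default threshold can be written compactly as follows. This is one reading of the text, under the assumption that both result Tensors are flattened to n elements, with x_i taken from the mode under test and y_i from the high-precision mode; the symbols e_i and n are introduced here only for illustration.

```latex
e_i = \frac{\lvert x_i - y_i \rvert}{\min\bigl(\lvert x_i \rvert, \lvert y_i \rvert\bigr)},
\qquad
\frac{\bigl\lvert \{\, i : e_i > 10^{-3} \,\} \bigr\rvert}{n} \le 10^{-3}
```

  • In words: an element passes when its relative difference e_i is at most one thousandth, and the Tensor passes when the failing elements make up at most one thousandth of all elements.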
  • input information is received, and the M operator nodes and the configuration options corresponding to each operator node are determined based on the input information; wherein the configuration options include one of target configuration parameters, default configuration parameters, and tuning configuration parameters.
  • the input information may be a user input or selection operation, and in response to the operation, the configuration options corresponding to each operator node may be set.
  • the configuration option also includes one of target configuration parameters, default configuration parameters, and tuning configuration parameters.
  • for the above three configuration parameters, refer to the relevant descriptions of the above embodiments; they are not described again here.
  • setting configuration parameters at the operator interface means that, when this type of operator is called and compiled, the operator model runs it in the mode indicated by the configuration parameters; setting configuration options at an operator node means that the operator corresponding to that operator node of the operator model is compiled in the operation mode indicated by the configuration option.
  • Step S203: Configure the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • after the operator compiler reads/obtains the operation modes corresponding to the M operators, it reads the corresponding configuration files in the operator library according to the operation modes to compile the operators.
  • the operation mode indicated by the configuration option corresponding to the operator node is used as the operation mode of the operator corresponding to the current operator node.
  • the operation mode indicated by the configuration parameters corresponding to each type of operator is used as the operation mode of the operator corresponding to the current operator node.
  • the high-precision operation mode in the configuration parameters corresponding to each type of operator is used as the operation mode of the operator corresponding to the current operator node.
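  • The text lists three sources for an operator's operation mode (the node's configuration option, the configuration parameters of its operator type, and the high-precision mode) but does not state when each applies. The sketch below assumes one plausible precedence: the node-level option first, then the type-level parameter, then high precision as a fallback. That ordering, and every name in the code, is an assumption.

```python
from typing import Dict

DEFAULT_MODE = "high_precision"  # assumed fallback when nothing else is configured


def resolve_mode(node_id: int,
                 op_type: str,
                 node_options: Dict[int, str],
                 type_params: Dict[str, str]) -> str:
    """Resolve the operation mode for one operator node (assumed precedence)."""
    if node_id in node_options:
        return node_options[node_id]   # per-node configuration option
    if op_type in type_params:
        return type_params[op_type]    # per-type configuration parameter
    return DEFAULT_MODE                # neither is set


# Operator nodes from Figure 4: 301 MatMul, 302 RealDiv, 303 Softmax.
graph = {301: "MatMul", 302: "RealDiv", 303: "Softmax"}
node_options = {302: "half_precision"}
type_params = {"RealDiv": "high_performance", "Softmax": "high_performance"}

resolved = {node: resolve_mode(node, op, node_options, type_params)
            for node, op in graph.items()}
```

  • With these hypothetical settings, node 302 takes its own option, node 303 inherits the Softmax type parameter, and node 301 falls back to the default.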
  • configuring the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator includes: calling the N types of operators respectively through the operator interface corresponding to each type of operator among the N types of operators and obtaining the corresponding configuration parameters; and configuring, based on the configuration parameters corresponding to each type of operator among the N types of operators, the operation mode corresponding to each type of operator among the N types of operators in the algorithm model.
  • the configuration parameters include a configuration path corresponding to the operating mode; and configuring, based on the configuration parameters corresponding to each type of operator among the N types of operators, the operation mode corresponding to each type of operator among the N types of operators in the algorithm model includes: obtaining, based on the configuration parameters corresponding to each type of operator among the N types of operators, the configuration path of the operation mode corresponding to each type of operator; and reading the corresponding configuration file based on the configuration path, to configure the operating mode of the corresponding type of operator during the operation of the algorithm model.
  • the configuration parameters may include the configuration path of the corresponding operation mode. The configuration path is used to indicate the configuration file corresponding to the operation mode.
  • the configuration file includes relevant configuration information, so that after reading the configuration file, the operation mode of the corresponding type of operator can be configured during the operation of the algorithm model.
  • For example, if the configuration parameters of the first type of operator are target configuration parameters, they can include the configuration path of the target operating mode; the configuration file of the target operating mode is read based on that path, and the first type of operator is then configured, based on the configuration file, to run in the target operation mode.
  • the target operation mode may be a high-precision operation mode, which is not specifically limited in this application.
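  • Reading an operation mode from a configuration file located by a configuration path can be sketched with Python's standard `configparser`. The ini layout used here (an `[impl_mode]` section mapping operator types to modes) and the file name are hypothetical; the text only says that the path may point to an ini file.

```python
import configparser
import os
import tempfile
from typing import Dict


def load_impl_modes(config_path: str) -> Dict[str, str]:
    """Read operator-type to operation-mode pairs from an ini file."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep operator names case-sensitive
    with open(config_path, encoding="utf-8") as handle:
        parser.read_file(handle)
    if not parser.has_section("impl_mode"):
        return {}
    return dict(parser.items("impl_mode"))


# Write a small example file, then load it through its path.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "op_impl_mode.ini")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("[impl_mode]\n"
                     "Softmax = high_performance\n"
                     "RealDiv = high_precision\n")
    loaded_modes = load_impl_modes(path)
```

  • A compiler following this scheme would look up each operator type in the loaded mapping before compiling it.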
  • the configuration options corresponding to each operator node can also be set by selecting one of the three configuration parameters.
  • the operation mode of the operator of the operator node can be configured according to the configuration options corresponding to the operator node, and finally the precision and/or performance of the operator model can be adjusted, making the optimization of the performance and precision of the algorithm model more flexible.
  • Step S204: Compile and run the algorithm model based on the operation modes of the M operators.
  • the algorithm model is compiled and run. After obtaining the operation mode of each operator, the operators that can be run on the chip can be compiled sequentially or in parallel to obtain and run the algorithm model.
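  • Compiling the operators sequentially or in parallel can be sketched as below. `compile_operator` is a placeholder that only returns a label; a real operator compiler would generate code that runs on the chip. A thread pool is used purely for illustration and is not something the text prescribes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def compile_operator(op_type: str, mode: str) -> str:
    """Placeholder for compiling one operator in the given operation mode."""
    return f"{op_type}[{mode}]"


def compile_model(operators: List[Tuple[str, str]], parallel: bool = True) -> List[str]:
    """Compile every (operator type, operation mode) pair, preserving order."""
    if not parallel:
        return [compile_operator(op_type, mode) for op_type, mode in operators]
    with ThreadPoolExecutor() as pool:
        # Executor.map returns results in input order.
        return list(pool.map(lambda pair: compile_operator(*pair), operators))


operators = [("MatMul", "high_precision"),
             ("RealDiv", "high_performance"),
             ("Softmax", "high_performance")]
sequential_result = compile_model(operators, parallel=False)
parallel_result = compile_model(operators, parallel=True)
```

  • Both paths yield the same compiled list, so the choice between them affects only compilation time, not the resulting model.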
  • after the operator corresponding to each of the M operator nodes of the operator model, and the N types of operators corresponding to the M operators, are determined, the operation modes of the M operators in the algorithm model can be configured according to the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • each of the N types of operators can have multiple operation modes with different operation accuracy and/or operation speed, so the algorithm model can also have multiple operation modes. To ensure the operation accuracy of the operators in the operator model while improving their operation performance, the embodiments of this application can, according to the operation characteristics of the operator model, configure the operation modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node.
  • Figure 6 shows an operator operation mode configuration device provided by an embodiment of the present application.
  • the operator operation mode configuration device 60 is applied to the configuration system architecture of the operator operation mode and is applicable to the above operator operation mode configuration method.
  • the operator operation mode configuration device 60 may include a determination unit 401 and a configuration unit 402, and may also include a setting unit 403 and a compilation unit 404. Among them, the detailed description of each unit is as follows.
  • the determination unit 401 is used to determine M operators corresponding to M operator nodes based on the algorithm model, where one operator node corresponds to one operator, each operator belongs to one of the N types of operators, each type of operator corresponds to multiple operating modes, and the operation accuracy and/or operation speed of each of the operation modes is different; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1;
  • the configuration unit 402 is configured to configure the operating modes of the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node; wherein the configuration parameters or configuration options are used to indicate the operation mode of the corresponding operator.
  • the device further includes: a setting unit 403, configured to set the configuration parameters corresponding to each type of operator in the operator interface corresponding to each type of operator in the N types of operators;
  • the configuration unit 402 is specifically configured to: call the N types of operators respectively through the operator interface corresponding to each type of operator in the N types of operators and obtain the corresponding configuration parameters; and configure, based on the configuration parameters corresponding to each type of operator in the N types of operators, the operation mode corresponding to each type of operator in the N types of operators in the algorithm model.
  • the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter; wherein the target configuration parameter is used to instruct the corresponding operator to run in the target operation mode, and the target operating mode is one of the multiple operating modes; the default configuration parameter is used to instruct the corresponding operator to run in the operating mode with the highest operating accuracy or the fastest operating speed; the tuning configuration parameter is used to instruct the corresponding operator to run in the optimal operation mode, which is the operation mode, determined based on the operator model, whose operation result error relative to the operation mode with the highest operation accuracy is within the preset threshold range and whose operation speed is the fastest.
  • the configuration parameters include a configuration path corresponding to the operating mode; the configuration unit 402 is specifically configured to: obtain, based on the configuration parameters corresponding to each type of operator in the N types of operators, the configuration path of the operation mode corresponding to each type of operator in the N types of operators; and read the corresponding configuration file based on the configuration path, to configure the operation mode of the corresponding type of operator during the operation of the algorithm model.
  • the determination unit 401 is also used to: receive input information and determine the M operator nodes and the configuration options corresponding to each of the operator nodes based on the input information; wherein the configuration options include one of target configuration parameters, default configuration parameters, and tuning configuration parameters; the configuration unit 402 is specifically used to: respectively call the N types of operators through the operator interfaces corresponding to each type of operator in the N types of operators; and configure the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model based on the configuration options corresponding to each of the operator nodes.
  • the device further includes: a compiling unit 404, configured to compile and run the algorithm model based on the operating modes of the M operators.
  • for the functions of each functional unit in the operator operation mode configuration device 60 described in the embodiment of the present application, refer to the relevant descriptions of steps S201 to S204 in the embodiment of the operator operation mode configuration method described in Figure 3, which are not repeated here.
  • Embodiments of the present application provide a computer storage medium for storing computer software instructions used for configuring a device for an operator operation mode provided in the above-mentioned related embodiments, which includes a program designed for executing the above aspect.
  • An embodiment of the present application provides a computer program.
  • the computer program includes instructions.
  • the computer program can execute the process executed by the configuration device of the operator operation mode in the above related embodiments.
  • the present application provides a chip system.
  • the chip system includes a processor and is used to support the terminal device to implement the functions involved in the above-mentioned related embodiments, for example, generating or processing the information involved in the configuration method of the above-mentioned operator operation mode.
  • the chip system further includes a memory, and the memory is used to store necessary program instructions and data for the data sending device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the disclosed device can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above units is only a logical function division. In actual implementation, there may be other divisions.
  • multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical or other forms.
  • the units described above as separate components may or may not be physically separated.
  • the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • If the above integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause a computer device (which can be a personal computer, a server, a network device, etc., specifically a processor in a computer device) to execute all or part of the steps of the above methods in the various embodiments of the present application.
  • the aforementioned storage media may include: U disk, mobile hard disk, magnetic disk, optical disk, read-only memory (Read-Only Memory, abbreviation: ROM) or random access memory (Random Access Memory, abbreviation: RAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Stored Programmes (AREA)

Abstract

Embodiments of the present application provide an operator operation mode configuration method and apparatus, and a related system. The operator operation mode configuration method comprises: on the basis of an algorithm model, determining M operators corresponding to M operator nodes, wherein one operator node corresponds to one operator, each operator belongs to one type of N types of operators, each type of operators correspond to a plurality of operation modes, the operation precision and/or the operation speed of the operation modes are different, M is an integer greater than or equal to N, and N is an integer greater than or equal to 1; and on the basis of configuration parameters corresponding to each type of operators or configuration options corresponding to the operator nodes, configuring operation modes of the M operators in the algorithm model, wherein the configuration parameters or the configuration options are used for indicating operation modes of corresponding operators. By implementing the embodiments of the present application, the operation performance of operators can be improved while the operation precision of the operators in an operator model is ensured.

Description

Configuration method, device and related system for operator operation mode
This application claims priority to Chinese patent application No. 202211155932.4, filed with the China Patent Office on September 22, 2022 and entitled "A configuration method, device and related system for operator operation mode", the entire content of which is incorporated herein by reference.
Technical field
The present application relates to the field of compilers, and in particular to a configuration method, device and related system for an operator operation mode.
Background
When AI models are run on dedicated artificial intelligence (AI) chips, the two main concerns are the accuracy and performance of the chip, that is, whether the calculation results are accurate and whether the calculation is fast. The computing power of an AI chip is fixed. For example, for Vector-type calculations the Ascend 710 AI processor computes float16 twice as fast as float32, and for some calculations the gap is even greater than 2 times. To deal with this common gap in computing power, the industry uses mixed precision: for example, the calculations of some layers of the AI model are converted to fp16, while the calculations of other layers remain in fp32. This mixed-precision approach, in which the AI model as a whole contains both fp16 and fp32 calculations, improves the execution efficiency of the entire AI model without a significant loss of accuracy.
However, the computing efficiency of the same chip also differs greatly across types of operations. For example, the chip's computing efficiency for operations such as exp, log, and division is far lower than for addition, subtraction, multiplication, and reciprocal operations. Existing AI frameworks do not provide good extensibility and can only perform operations according to a fixed processing method, which makes tuning the performance and accuracy of AI models inconvenient.
发明内容Contents of the invention
本申请实施例提供一种算子运行方式的配置方法、装置及相关系统,以保证算子模型中算子的运算精度同时提高算子的运算性能。Embodiments of the present application provide an operator operation mode configuration method, device and related systems to ensure the operation accuracy of the operators in the operator model and improve the operation performance of the operators.
第一方面,本申请实施例提供了一种算子运行方式的配置方法,包括:基于算法模型确定M个算子节点对应的M个算子,其中,一个算子节点对应一个算子,每个所述算子属于N类算子中的一类,每类所述算子分别对应多种运行方式,每种所述运行方式之间的运算精度和/或运算速度不同;M为大于或等于N的整数,N为大于或等于1的整数;基于每类所述算子对应的配置参数或每个所述算子节点对应的配置选项,配置所述算法模型中所述M个算子的运行方式;其中,所述配置参数或所述配置选项用于指示对应算子的运行方式。In the first aspect, embodiments of the present application provide a method for configuring operator operation modes, including: determining M operators corresponding to M operator nodes based on an algorithm model, where one operator node corresponds to one operator, and each The operators belong to one of the N types of operators. Each type of operator corresponds to multiple operating modes. The operating accuracy and/or operating speed between each of the operating modes are different; M is greater than or An integer equal to N, where N is an integer greater than or equal to 1; configure the M operators in the algorithm model based on the configuration parameters corresponding to each type of operator or the configuration options corresponding to each operator node The operating mode; wherein the configuration parameters or the configuration options are used to indicate the operating mode of the corresponding operator.
在第一方面提供的实施例中,在确定算子模型M个算子节点的每个算子节点对应的算子,以及,M个算子对应的N类算子后,可以根据每类所述算子对应的配置参数或每个所述算子节点对应的配置选项,配置所述算法模型中所述M个算子的运行方式。其中,N类算子每类所述算子均可以有多种不同运算精度和/或运算速度的运行方式,进而导致算法模型也可以有多种运行方式,也为了保证算子模型中算子的运算精度同时提高算子的运算性能,本申请实施例可以根据算子模型的运算特点,基于算法模型中每类所述算子对应的配置参数或者基于算法模型中每个所述算子节点对应的配置选项,配置所述算法模型中M个算子的运行方式。上述通过配置参数或者配置选项,配置所述算法模型的运行方式的方法可以避免现有技术中算法模型只能按照固定的处理方式进行运算,使得算法模型的性能与精度的优化更加灵活。In the embodiment provided in the first aspect, after determining the operator corresponding to each of the M operator nodes of the operator model and the N types of operators corresponding to the M operators, the operator can be Configuration parameters corresponding to the operator or configuration options corresponding to each operator node configure the operation mode of the M operators in the algorithm model. Among them, the operators of each type of N types of operators can have a variety of operation modes with different operation accuracy and/or operation speed, which leads to the algorithm model also having multiple operation modes. In order to ensure that the operators in the operator model The operation accuracy and the operation performance of the operator can be improved at the same time. The embodiments of this application can be based on the operation characteristics of the operator model, based on the configuration parameters corresponding to each type of operator in the algorithm model or based on each operator node in the algorithm model. The corresponding configuration options configure the operation mode of the M operators in the algorithm model. The above-mentioned method of configuring the operation mode of the algorithm model through configuring parameters or configuration options can prevent the algorithm model in the existing technology from only performing operations according to a fixed processing method, making the optimization of the performance and accuracy of the algorithm model more flexible.
In a possible implementation, the method further includes: setting the configuration parameter corresponding to each type of operator in the operator interface corresponding to that type among the N types of operators. Configuring the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator includes: calling the N types of operators respectively, and obtaining the corresponding configuration parameters, through the operator interface corresponding to each of the N types of operators; and configuring, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model.
In the embodiments of the present application, setting the configuration parameter corresponding to each type of operator in the operator implementation interface enables precision-mode setting at the operator level. When a type of operator is called through its operator interface, operators with different operation modes can be compiled according to the configuration parameter in the operator implementation interface. This ultimately adjusts the accuracy and/or performance of the operator model and makes optimization of the performance and accuracy of the algorithm model more flexible.
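By way of illustration only, the operator-level setting described above can be sketched in Python as follows. The registry `IMPLEMENTATIONS`, the table `TYPE_CONFIG`, the function `compile_operator` and the mode names are hypothetical names chosen for this sketch; the application does not specify any of them.

```python
from typing import Callable, Dict

# Hypothetical registry: operator type -> {operation mode -> implementation}.
# The mode names "high_precision" and "high_performance" are illustrative only.
IMPLEMENTATIONS: Dict[str, Dict[str, Callable[[float, float], float]]] = {
    "Div": {
        "high_precision": lambda x, y: x / y,
        "high_performance": lambda x, y: x * (1.0 / y),
    },
}

# Hypothetical configuration parameters, one per operator type, standing in
# for the parameter set in the operator implementation interface.
TYPE_CONFIG: Dict[str, str] = {"Div": "high_performance"}


def compile_operator(op_type: str) -> Callable[[float, float], float]:
    """Return the implementation selected by the type-level configuration parameter."""
    mode = TYPE_CONFIG[op_type]
    return IMPLEMENTATIONS[op_type][mode]


div = compile_operator("Div")
result = div(6.0, 3.0)
```

In this sketch every node that calls the Div operator receives the same implementation, because the selection is made once per operator type.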
In a possible implementation, the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The target configuration parameter instructs the corresponding operator to run in a target operation mode, the target operation mode being one of the multiple operation modes. The default configuration parameter instructs the corresponding operator to run in the operation mode with the highest operation accuracy or in the operation mode with the fastest operation speed. The tuning configuration parameter instructs the corresponding operator to run in an optimal operation mode, the optimal operation mode being the operation mode, determined based on the operator model, that has the fastest operation speed among those whose result error relative to the highest-accuracy operation mode is within a preset threshold range.
In the embodiments of the present application, configuration parameters can be divided into three kinds; that is, a configuration parameter can be any one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The target configuration parameter can indicate an operation mode set by the user; the default configuration parameter can indicate that operators of the type run in the operation mode with the highest operation accuracy or in the operation mode with the fastest operation speed; and the tuning configuration parameter can indicate running in the operation mode that is optimal for the operation characteristics of the current operator model.
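The three kinds of configuration parameter described above can be sketched as a small selection routine. This is a minimal Python sketch under stated assumptions: `ModeProfile`, `resolve_mode`, the mode names and the numbers in `profiles` are all hypothetical, and the error of each mode is assumed to have been measured relative to the highest-accuracy mode, which therefore has error 0.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ModeProfile:
    error: float  # result error relative to the highest-accuracy mode (assumed metric)
    time: float   # measured run time, arbitrary units (assumed metric)


def resolve_mode(kind: str, profiles: Dict[str, ModeProfile],
                 target: Optional[str] = None, threshold: float = 0.0,
                 default_prefers: str = "accuracy") -> str:
    """Map a configuration parameter kind to a concrete operation mode."""
    if kind == "target":
        # The user names one of the available operation modes directly.
        if target not in profiles:
            raise ValueError(f"unknown target mode: {target!r}")
        return target
    if kind == "default":
        # Highest accuracy or fastest speed, depending on the chosen default.
        if default_prefers == "accuracy":
            return min(profiles, key=lambda m: profiles[m].error)
        return min(profiles, key=lambda m: profiles[m].time)
    if kind == "tuning":
        # Fastest mode among those whose error stays within the threshold.
        candidates = [m for m in profiles if profiles[m].error <= threshold]
        return min(candidates, key=lambda m: profiles[m].time)
    raise ValueError(f"unknown configuration parameter kind: {kind!r}")


# Made-up profile data for one operator type, for illustration only.
profiles = {
    "fp32": ModeProfile(error=0.0, time=4.0),
    "mixed": ModeProfile(error=1e-4, time=2.5),
    "fp16": ModeProfile(error=1e-2, time=2.0),
}
tuned = resolve_mode("tuning", profiles, threshold=1e-3)
```

With these made-up numbers, the tuning kind selects the "mixed" mode: "fp16" is faster but exceeds the threshold, and "fp32" is within the threshold but slower.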
In a possible implementation, the configuration parameter includes a configuration path of the corresponding operation mode. Configuring, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model includes: obtaining, based on the configuration parameter corresponding to each of the N types of operators, the configuration path of the operation mode corresponding to each type; and reading the corresponding configuration file based on the configuration path, so as to configure the operation mode of the corresponding type of operator during computation of the algorithm model.
In the embodiments of the present application, the configuration parameter may include the configuration path of the corresponding operation mode. The configuration path points to the configuration file of that operation mode. The configuration file contains the relevant configuration information, so that after the file is read, the operation mode of the corresponding type of operator during computation of the algorithm model can be configured. For example, when the configuration parameter of a first type of operator is a target configuration parameter, it can include the configuration path of the target operation mode; the configuration file of the target operation mode is read based on that path, and the first type of operator is then configured, according to the configuration file, to run in the target operation mode.
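As a hedged illustration of the configuration path, the sketch below writes a throwaway configuration file and reads it back through the path carried in a configuration parameter. The JSON format, the file name, the field names and the function `load_mode_config` are assumptions made for this example; the application does not state the file format.

```python
import json
import os
import tempfile


def load_mode_config(config_path: str) -> dict:
    """Read the configuration file that a configuration path points to."""
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical configuration file for one operation mode of one operator type.
    path = os.path.join(tmp, "softmax_high_performance.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"op_type": "Softmax", "mode": "high_performance"}, f)

    # Hypothetical target configuration parameter carrying the configuration path.
    config_parameter = {"kind": "target", "config_path": path}
    loaded = load_mode_config(config_parameter["config_path"])
```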
In a possible implementation, determining, based on the algorithm model, the M operators corresponding to the M operator nodes includes: receiving input information and determining, based on the input information, the M operator nodes and the configuration option corresponding to each operator node, where the configuration option includes one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. Configuring the operation modes of the M operators in the algorithm model based on the configuration option corresponding to each operator node includes: calling the N types of operators respectively through the operator interface corresponding to each of the N types of operators; and configuring, based on the configuration option corresponding to each operator node, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
In the embodiments of the present application, in addition to setting the configuration parameter corresponding to each type of operator in the operator implementation interface, the configuration option corresponding to each operator node can be set based on the user's input information. This enables precision-mode setting at the operator-node level; that is, operators of the same type located at different operator nodes can be configured with different operation modes. Like the configuration parameter, the configuration option can be set by selecting one of the three kinds of configuration parameters. During model compilation, when an operator node calls an operator through the operator interface of the corresponding type, the operation mode of the operator at that node can be configured according to the configuration option corresponding to that node. This ultimately adjusts the accuracy and/or performance of the operator model and makes optimization of the performance and accuracy of the algorithm model more flexible.
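The node-level setting can be sketched as follows. The node names, the mode names and the function `configure_nodes` are hypothetical. Falling back to the type-level parameter when a node has no option of its own is an assumption of this sketch; the application describes the two mechanisms as alternatives and does not state how they combine.

```python
from typing import Dict, List, Tuple

# Hypothetical operator nodes as (node name, operator type) pairs.
nodes: List[Tuple[str, str]] = [
    ("matmul_0", "MatMul"),
    ("add_0", "Add"),
    ("softmax_0", "Softmax"),
    ("softmax_1", "Softmax"),
]

# Type-level configuration parameters (one per operator type).
type_params: Dict[str, str] = {
    "MatMul": "high_precision",
    "Add": "high_precision",
    "Softmax": "high_precision",
}

# Node-level configuration options taken from user input (only one node here).
node_options: Dict[str, str] = {"softmax_1": "high_performance"}


def configure_nodes(nodes: List[Tuple[str, str]],
                    type_params: Dict[str, str],
                    node_options: Dict[str, str]) -> Dict[str, str]:
    """Choose an operation mode per node; a node option, if present, wins (assumed policy)."""
    return {name: node_options.get(name, type_params[op_type])
            for name, op_type in nodes}


modes = configure_nodes(nodes, type_params, node_options)
```

Here the two Softmax nodes end up with different operation modes even though they call the same type of operator.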
In a possible implementation, the method further includes: compiling and running the algorithm model based on the operation modes of the M operators.
In the embodiments of the present application, after the operation modes of the M operators are determined, the operator model can be compiled according to those operation modes, ultimately realizing the accuracy and/or performance of the operator model at run time.
In a second aspect, an embodiment of the present application provides an apparatus for configuring operator operation modes, which may include:
a determination unit, configured to determine, based on an algorithm model, M operators corresponding to M operator nodes, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in operation accuracy and/or operation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1; and
a configuration unit, configured to configure the operation modes of the M operators in the algorithm model based on a configuration parameter corresponding to each type of operator or a configuration option corresponding to each operator node, where the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
In a possible implementation, the apparatus further includes a setting unit, configured to set the configuration parameter corresponding to each type of operator in the operator interface corresponding to that type among the N types of operators. The configuration unit is specifically configured to: call the N types of operators respectively, and obtain the corresponding configuration parameters, through the operator interface corresponding to each of the N types of operators; and configure, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model.
In a possible implementation, the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The target configuration parameter instructs the corresponding operator to run in a target operation mode, the target operation mode being one of the multiple operation modes. The default configuration parameter instructs the corresponding operator to run in the operation mode with the highest operation accuracy or in the operation mode with the fastest operation speed. The tuning configuration parameter instructs the corresponding operator to run in an optimal operation mode, the optimal operation mode being the operation mode, determined based on the operator model, that has the fastest operation speed among those whose result error relative to the highest-accuracy operation mode is within a preset threshold range.
In a possible implementation, the configuration parameter includes a configuration path of the corresponding operation mode. The configuration unit is specifically configured to: obtain, based on the configuration parameter corresponding to each of the N types of operators, the configuration path of the operation mode corresponding to each type; and read the corresponding configuration file based on the configuration path, so as to configure the operation mode of the corresponding type of operator during computation of the algorithm model.
In a possible implementation, the determination unit is further configured to: receive input information and determine, based on the input information, the M operator nodes and the configuration option corresponding to each operator node, where the configuration option includes one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The configuration unit is specifically configured to: call the N types of operators respectively through the operator interface corresponding to each of the N types of operators; and configure, based on the configuration option corresponding to each operator node, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
In a possible implementation, the apparatus further includes a compilation unit, configured to compile and run the algorithm model based on the operation modes of the M operators.
In a third aspect, an embodiment of the present application provides a system for configuring operator operation modes, including: a model compiler, configured to determine, based on an algorithm model, M operators corresponding to M operator nodes, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in operation accuracy and/or operation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1; and an operator compiler, configured to configure the operation modes of the M operators in the algorithm model based on a configuration parameter corresponding to each type of operator or a configuration option corresponding to each operator node, where the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
In a fourth aspect, an embodiment of the present application provides a computer storage medium for storing computer software instructions used by the apparatus for configuring operator operation modes provided in the second aspect, the instructions including a program designed to carry out the above aspects.
In a fifth aspect, an embodiment of the present application provides a computer program. The computer program includes instructions that, when the computer program is executed by a computer, enable the computer to perform the procedure performed by the apparatus for configuring operator operation modes in the second aspect.
In a sixth aspect, the present application provides a chip system. The chip system includes a processor configured to support a terminal device in implementing the functions involved in the first aspect, for example, generating or processing the information involved in the above method for configuring operator operation modes. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the data sending device. The chip system may consist of a chip, or may include a chip and other discrete devices.
Description of the drawings
To describe the technical solutions in the embodiments of the present application or in the background more clearly, the drawings used in the embodiments or the background are described below.
Figure 1 is a schematic diagram of part of the structure of an algorithm model provided by an embodiment of the present application.
Figure 2 is a schematic diagram of a configuration architecture for operator operation modes provided by an embodiment of the present application.
Figure 3 is a schematic flowchart of a method for configuring operator operation modes provided by an embodiment of the present application.
Figure 4 is a schematic diagram of part of the structure of an algorithm model provided by an embodiment of the present application.
Figure 5 is a schematic diagram of an algorithm model in several operation modes provided by an embodiment of the present application.
Figure 6 shows an apparatus for configuring operator operation modes provided by an embodiment of the present application.
Detailed description
The embodiments of the present application are described below with reference to the drawings in the embodiments of the present application.
The terms "first", "second", "third", "fourth" and the like in the description, the claims and the drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to such a process, method, product or device.
It should be understood that, in the present application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: only A exists, only B exists, or both A and B exist, where A and B can each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it. "At least one of the following (items)" or a similar expression refers to any combination of those items, including any combination of a single item or multiple items. For example, at least one of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b and c can each be single or multiple.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the description do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The terms "component", "module", "system" and the like used in this description denote a computer-related entity: hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component can be, but is not limited to, a process running on a processor, a processor, an object, an executable file, a thread of execution, a program and/or a computer. By way of illustration, both an application running on a computing device and the computing device itself can be components. One or more components can reside within a process and/or thread of execution, and a component can be located on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer-readable media having various data structures stored thereon. Components can communicate by way of local and/or remote processes, for example according to a signal having one or more data packets (for example, data from two components interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by way of the signal).
First, to facilitate understanding of the embodiments of the present application, the technical problem to be solved by the embodiments and the application scenarios are analyzed in detail below.
In the prior art, when an AI model runs on a dedicated artificial intelligence (AI) chip, the two main concerns are the accuracy and the performance of the chip, that is, whether the computation results are accurate and whether the computation is fast. The computing power of an AI chip for a given kind of computation is fixed. For example, on Vector-type computations the Ascend 710 AI processor computes in fp16 twice as fast as in fp32, and for some computations the gap exceeds a factor of two. fp16 is a data type encoded and stored in 2 bytes (16 bits); similarly, fp32 is a data type encoded and stored in 4 bytes (32 bits).
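As a small illustration of the two data types, the following Python sketch uses the standard `struct` module, whose `e` and `f` format characters correspond to IEEE 754 half precision and single precision, to show the storage size of each type and the rounding error each introduces for one sample value. The sample value 0.1 and the helper `roundtrip` are arbitrary choices for this sketch, and the sketch says nothing about the speed of any chip.

```python
import struct

# Storage size in bytes of one value in each format.
fp16_bytes = struct.calcsize("<e")  # IEEE 754 half precision
fp32_bytes = struct.calcsize("<f")  # IEEE 754 single precision


def roundtrip(value: float, fmt: str) -> float:
    """Round a Python float to the given format and convert it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


x = 0.1
err16 = abs(roundtrip(x, "<e") - x)  # rounding error when stored as fp16
err32 = abs(roundtrip(x, "<f") - x)  # rounding error when stored as fp32
```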
To deal with this widespread gap in computing power, the industry uses mixed precision: the computation of some layers of the AI model is converted to fp16, while the computation of other layers stays in fp32. Mixing fp16 and fp32 computation within one AI model improves the execution efficiency of the whole model without a large loss of accuracy.
Refer to Figure 1, which is a schematic diagram of part of the structure of an algorithm model provided by an embodiment of the present application. As shown in Figure 1, this part of the algorithm model includes three kinds of operators: MatMul, Add and Softmax. During compilation, if the precision mode of an operator is not specified, every operator node uses the high-precision operation mode by default, for example fp32 (the high-precision operation mode ensures that accuracy is correct across the entire value range).
However, the computational efficiency of the same chip also differs greatly across different kinds of operations. The following are examples of different computation methods for some operators:
Example: the Div operator. The computation logic of the Div operator is to compute the quotient of two tensors, that is, x ÷ y, where x and y are a pair of tensors that satisfy the broadcast relationship.
On some AI chips, division has relatively low computational efficiency (poor performance) but high accuracy (accurate results with small error), whereas computing a reciprocal on such chips has relatively high computational efficiency but somewhat lower accuracy. The AI chip can, however, use Newton iteration to improve accuracy: more Newton iterations raise the accuracy but lower the performance. Given these characteristics, the Div operator can provide multiple implementation versions suited to the AI chip. For example:
High-precision operation mode: compute with the Div instruction of the AI chip; high accuracy, lower performance.
High-performance operation mode: compute the reciprocal on the AI chip and then multiply, that is, compute x*(1/y); slightly lower accuracy, higher performance.
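The two Div versions, and the effect of Newton iteration on the reciprocal, can be sketched in Python as follows. This is an illustration under stated assumptions, not the chip's actual instruction sequence: `coarse_reciprocal` imitates a fast, low-accuracy hardware reciprocal by rounding 1/y to fp16, and all function names are hypothetical. The refinement step r ← r·(2 − y·r) is the standard Newton iteration for 1/y.

```python
import struct


def coarse_reciprocal(y: float) -> float:
    """Stand-in for a fast, low-accuracy reciprocal: 1/y rounded to fp16."""
    return struct.unpack("<e", struct.pack("<e", 1.0 / y))[0]


def div_high_precision(x: float, y: float) -> float:
    """High-precision version: a direct division."""
    return x / y


def div_high_performance(x: float, y: float, newton_steps: int = 0) -> float:
    """High-performance version: x * (1/y), optionally refined by Newton iteration."""
    r = coarse_reciprocal(y)
    for _ in range(newton_steps):
        # Newton step for f(r) = 1/r - y; each step roughly squares the relative error.
        r = r * (2.0 - y * r)
    return x * r


exact = div_high_precision(1.0, 3.0)
# Absolute error of the high-performance version after 0, 1 and 2 Newton steps.
errors = [abs(div_high_performance(1.0, 3.0, n) - exact) for n in range(3)]
```

Each additional Newton step costs extra multiplications, which mirrors the trade-off described above: more iterations give higher accuracy at lower performance.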
Example: the Softmax operator. The computation logic of the Softmax operator is:
Step 1: find the max value along the reduce axis.
Step 2: subtract this max value from the data along the reduce axis.
Step 3: compute the exp value of the data along the reduce axis.
Step 4: compute the sum of the exp values of the data along the reduce axis.
Step 5: divide the exp values of the data along the reduce axis by their sum.
Regarding Step 1, computing the max value in fp32 on the AI chip is far less efficient than computing it in fp16. Therefore, when the Softmax result of an fp32 tensor is computed, efficiency is low if the second-step computation is performed in fp32, and performance improves considerably if the fp32 data is converted to fp16 before the computation. However, converting to fp16 may severely degrade accuracy. For example, when the magnitude of the input data does not exceed the maximum fp16 value (65504), the conversion has no effect on the final accuracy; when the magnitude of the input data exceeds the maximum fp16 value (65504), the conversion makes the final result incorrect. Given these characteristics, the Softmax operator can provide two implementation versions:
High-precision operation mode: for the max-value computation, do not convert fp32 to fp16.
High-performance operation mode: for the max-value computation, convert fp32 to fp16.
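The difference between the two Softmax versions at the max-value step can be illustrated with a minimal Python sketch. The helper `to_fp16` and the two `reduce_max_*` functions are hypothetical names. Mapping out-of-range values to infinity is an assumption that follows IEEE 754 round-to-nearest behaviour; the application does not say how the chip handles values beyond the fp16 range. The sample inputs are chosen to be exactly representable in fp16 where they are in range.

```python
import math
import struct


def to_fp16(value: float) -> float:
    """Round a value to fp16; values beyond the fp16 range become infinity (assumed)."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct refuses to pack values that are too large for half precision.
        return math.copysign(math.inf, value)


def reduce_max_high_precision(xs):
    """Max-value step without converting the data to fp16."""
    return max(xs)


def reduce_max_high_performance(xs):
    """Max-value step after converting the data to fp16."""
    return max(to_fp16(v) for v in xs)


in_range = [1.0, 2.5, 1024.0]       # all magnitudes within the fp16 range
out_of_range = [1.0, 2.5, 70000.0]  # 70000 exceeds the fp16 maximum of 65504

same = reduce_max_high_performance(in_range) == reduce_max_high_precision(in_range)
overflowed = reduce_max_high_performance(out_of_range)
```

For the in-range data both versions return the same max value, while for the out-of-range data the fp16 version no longer returns the true maximum, so the subsequent Softmax steps would start from a wrong value.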
For the above scenarios, existing AI frameworks offer no effective way to support this kind of extension. The model developer has to provide a customized operator by defining a custom operator, and then use that customized operator in the model to improve the running performance of the model. For example, for the Softmax operator above, the model developer has to implement in the AI framework a "MySoftmax" operator with a high-performance operation mode, and then use the new operator in the model.
Disadvantage 1: although operators with different operation modes are obtained, redefining an operator requires many end-to-end changes. For example, a MySoftmax operator has to be fully defined and implemented, and the AI model also has to be modified to use the new operator instead of the old one.
Disadvantage 2: the AI framework has no built-in capability for such a new operator; that is, the operator library does not store the new operator. Operator developers have to identify which operators can have multiple operation modes and provide new operators for those modes at operator development time. This approach raises the learning cost, because AI model developers have to become familiar with the characteristics of the chip in order to optimize.
Therefore, the embodiments of the present application provide a method and an apparatus for configuring operator operation modes, and a related system, which implement multiple operation modes for an operator on the original operator framework without defining a new operator. For example, an embodiment of the present application provides a method for configuring operator operation modes in which, after the operator corresponding to each of the M operator nodes of the operator model, and the N types of operators to which the M operators belong, have been determined, the operation modes of the M operators in the algorithm model can be configured according to the configuration parameter corresponding to each type of operator or the configuration option corresponding to each operator node. Each of the N types of operators can have multiple operation modes with different operation accuracy and/or operation speed, so the algorithm model can also have multiple operation modes. To preserve the operation accuracy of the operators in the operator model while improving their operation performance, the embodiments of the present application can, according to the operation characteristics of the operator model, configure the operation modes of the M operators based on the configuration parameter corresponding to each type of operator in the algorithm model or based on the configuration option corresponding to each operator node in the algorithm model. Configuring the operation mode of the algorithm model through configuration parameters or configuration options avoids the prior-art limitation that an algorithm model can only compute in a fixed processing manner, and makes optimization of the performance and accuracy of the algorithm model more flexible.
Based on the technical problem stated above and the corresponding application scenarios, and to make the embodiments of this application easier to understand, one system architecture for configuring operator operation modes on which the embodiments are based is described first. Refer to Figure 2, which is a schematic diagram of an architecture for configuring operator operation modes provided by an embodiment of this application. As shown in Figure 2, the architecture includes a model compiler 101 and an operator compiler 102, and may further include an operator library 103 and a tuner 104, where:
The model compiler 101 is configured to determine, based on the algorithm model, the M operators corresponding to M operator nodes, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in operation precision and/or operation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1.
For example, the model compiler 101 may be an AI model compiler (Ascend Tensor Compiler, ATC). The operator development interface is provided to model developers, so that they can compile their models with ATC and run them on platforms such as Ascend.
The operator compiler 102 is configured to configure the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator or the configuration option corresponding to each operator node, where the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
For example, the operator compiler 102 may be the operator compiler contained in the graph engine and fusion engine (Graph Engine & Fusion Engine, GE&FE). After various processing steps, each operator node in the AI model must finally be compiled by the operator compiler into an operator that can run on the AI chip. The operator compiler 102 provided in the embodiments of this application can read, according to the configuration of each operator node in the model, the corresponding operator precision mode configuration (that is, the operation mode of the operator), and compile an implementation version that can run on the AI chip according to that configuration.
It should be noted that the AI model mentioned in this embodiment and in the related embodiments below is equivalent to the algorithm model.
The operator library 103 provides multiple types of operators, and multiple operation modes for each type, for the operator compiler 102 to compile. That is, in the embodiments of this application, the operator library 103 can provide implementation information for multiple versions of one operator (each type of operator corresponds to multiple operation modes, and the implementation information is the configuration information of each mode). It can also provide one default high-performance mode configuration file and one default high-precision mode configuration file covering all types of operators, so that operators can be run with high performance or high precision. For example, the default high-performance mode configuration file high_performance.ini provided by the operator library includes SoftmaxV2=high_performance and RealDiv=high_performance; that is, it contains the configuration information for the high-performance operation modes of the SoftmaxV2 and RealDiv operators. As another example, the default high-precision mode configuration file high_precision.ini includes SoftmaxV2=high_precision and RealDiv=high_precision; that is, it contains the configuration information for the high-precision operation modes of the SoftmaxV2 and RealDiv operators.
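The default configuration file described above can be read with a few lines of Python. This is only a minimal sketch: the "[impl_mode]" section header and the function name load_impl_modes are assumptions introduced for illustration (Python's configparser requires a section header, whereas the text lists bare key=value pairs), and the parsing used by the actual operator library is not described in this application.

```python
import configparser

# Hypothetical contents of high_performance.ini. The "[impl_mode]" header is an
# assumption added only because configparser requires a section.
HIGH_PERFORMANCE_INI = """
[impl_mode]
SoftmaxV2=high_performance
RealDiv=high_performance
"""


def load_impl_modes(ini_text):
    """Return {operator type: operation mode} parsed from ini-style text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep operator names case-sensitive
    parser.read_string(ini_text)
    return dict(parser["impl_mode"])


modes = load_impl_modes(HIGH_PERFORMANCE_INI)
print(modes)
```

A mapping of this shape is what an operator compiler would need in order to look up the mode for each operator type.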
The tuner 104 may be called Auto Tune. It can read the multi-version implementation information provided in the operator library 103 and, based on the computational characteristics of the model or the computing-power characteristics of the chip, generate a precision mode configuration by automatically tuning the operators. For example, on the relevant AI chips, computing a reciprocal is more efficient than performing a division.
For example, the tuner 104 can read the configuration file in the operator library 103 to obtain the multiple operation modes corresponding to each of the operators, that is, to learn which operators allow their operation mode (precision mode configuration) to be adjusted and which precision modes can be configured. The configuration file may list multiple operation modes, for example ordered from left to right so that performance is higher toward the left and precision is higher toward the right.
In the embodiments of this application, the tuner 104 may run all operation modes of an operator in parallel and compare the output of each mode against the output of the high-precision operation mode. Among the modes whose deviation is within a preset threshold, it selects the one with the highest performance as the tuned operation mode of that operator. The threshold may be specified by the user; the default is the "double one-thousandth" criterion: comparing the two result tensors element by element, the absolute difference between corresponding elements, relative to the smaller magnitude, must not exceed one thousandth (abs(x-y)/min(abs(x),abs(y))<0.001), and the elements whose difference does exceed one thousandth must make up no more than one thousandth of all elements in the tensor. For example, if the high-performance implementation of an operator meets the threshold when compared with the high-precision implementation, the operator is switched to the high-performance implementation to improve the final running performance of the AI model.
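The selection rule described above can be sketched in plain Python. The function names, the flat-list representation of a tensor, the sample outputs and runtimes, and the handling of zero-valued elements are all assumptions made for illustration; this application does not specify them.

```python
def within_double_thousandth(candidate, reference, rel_tol=1e-3, ratio_tol=1e-3):
    """Check the default criterion on two flat lists of floats."""
    if len(candidate) != len(reference):
        raise ValueError("tensors must have the same number of elements")
    if not reference:
        return True
    bad = 0
    for x, y in zip(candidate, reference):
        denom = min(abs(x), abs(y))
        if denom == 0.0:
            # Assumption: the text does not cover zeros; count any mismatch as bad.
            exceeds = x != y
        else:
            exceeds = abs(x - y) / denom >= rel_tol
        bad += exceeds
    return bad / len(reference) <= ratio_tol


def pick_tuned_mode(results, reference_mode="high_precision"):
    """results maps mode -> (output list, runtime in seconds).

    Returns the fastest mode whose output passes the criterion against the
    reference mode. The reference always passes against itself.
    """
    reference_out = results[reference_mode][0]
    acceptable = [
        (runtime, mode)
        for mode, (out, runtime) in results.items()
        if within_double_thousandth(out, reference_out)
    ]
    return min(acceptable)[1]


# Made-up outputs and runtimes, purely for demonstration.
results = {
    "high_precision": ([1.0, 2.0, 4.0], 3.0),
    "half_precision": ([1.0002, 2.0001, 4.0003], 2.0),
    "high_performance": ([1.01, 2.0, 4.0], 1.0),
}
print(pick_tuned_mode(results))
```

In this made-up data the fastest mode is rejected because one of its three elements deviates by about one percent, so the next-fastest acceptable mode is chosen.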
Therefore, the embodiments of this application provide a system for configuring operator operation modes, which requires no new operator definitions and implements multiple operation modes on the existing operator framework. For example, after the model compiler determines the operator corresponding to each of the M operator nodes of the algorithm model, together with the N types of operators to which the M operators belong, the operator compiler can configure the operation modes of the M operators in the algorithm model according to the configuration parameter corresponding to each type of operator or the configuration option corresponding to each operator node. Each of the N types of operators can have multiple operation modes that differ in operation precision and/or operation speed, so the algorithm model as a whole can also run in multiple ways. To preserve the operation precision of the operators in the model while improving their performance, the embodiments configure the operation modes of the M operators according to the computational characteristics of the model, based either on the per-type configuration parameters or on the per-node configuration options. Configuring the operation mode of the algorithm model through configuration parameters or configuration options avoids the prior-art limitation that an algorithm model can only compute in a fixed manner, and makes optimizing its performance and precision more flexible.
It should be noted that the functional modules in the architecture for configuring operator operation modes described in the embodiments of this application are only an exemplary description, and the embodiments of this application are not limited thereto.
It should also be noted that, for the functions of the functional modules in this architecture, reference may be made to the description of steps S201 to S204 in the method embodiment below; details are not repeated here.
Based on the system architecture for configuring operator operation modes provided in the foregoing embodiment, and in combination with the configuration method provided in this application, the technical problem raised in this application is analyzed and solved in detail below.
Refer to Figure 3, which is a schematic flowchart of a method for configuring operator operation modes provided by an embodiment of this application. The method can be applied to the system architecture for configuring operator operation modes described above with reference to Figure 2, and includes steps S201 to S204 shown in Figure 3. The model compiler 101 in that architecture may be used to support and perform steps S201, S202, and S204, and the operator compiler 102 may be used to support and perform step S203, where:
Step S201: Determine, based on the algorithm model, the M operators corresponding to M operator nodes.
Specifically, the M operators corresponding to the M operator nodes are determined based on the algorithm model, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in operation precision and/or operation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1.
Refer to Figure 4, which is a schematic diagram of part of the structure of an algorithm model provided by an embodiment of this application. As shown in Figure 4, this part of the algorithm model includes three operator nodes: a first operator node 301, a second operator node 302, and a third operator node 303. The first operator node 301 corresponds to the MatMul operator, the second operator node 302 to the RealDiv operator, and the third operator node 303 to the Softmax operator, so the three operators belong to three different types. Each type of operator can correspond to multiple operation modes, such as a high-precision operation mode, a high-performance operation mode, and a half-precision operation mode, which differ in operation precision and/or operation speed. For example, the RealDiv operator can provide a high-precision operation mode, which computes with the Div instruction at fp32 precision; a half-precision operation mode, which computes with the Div instruction at fp16 precision; and a high-performance operation mode, which takes the reciprocal at fp16 precision and then multiplies, that is, computes x*(1/y). The embodiments of this application do not specifically limit the operation modes of the operators.
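The three RealDiv operation modes can be imitated on a CPU to see how their results differ. The sketch below is only an emulation under stated assumptions: it rounds Python floats to fp32 or fp16 with the standard struct module rather than issuing chip instructions, and the function name real_div and the mode strings used as an argument are illustrative choices, not the operator library's actual implementation.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_fp32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def real_div(x, y, impl_mode="high_precision"):
    """Emulate x / y under one of the three modes described in the text."""
    if impl_mode == "high_precision":
        # Division with operands and result held at fp32 precision.
        return to_fp32(to_fp32(x) / to_fp32(y))
    if impl_mode == "half_precision":
        # Division with operands and result held at fp16 precision.
        return to_fp16(to_fp16(x) / to_fp16(y))
    if impl_mode == "high_performance":
        # fp16 reciprocal followed by a multiplication: x * (1 / y).
        reciprocal = to_fp16(1.0 / to_fp16(y))
        return to_fp16(to_fp16(x) * reciprocal)
    raise ValueError(f"unknown impl_mode: {impl_mode!r}")


for mode in ("high_precision", "half_precision", "high_performance"):
    result = real_div(10.0, 3.0, mode)
    print(mode, result, abs(result - 10.0 / 3.0))
```

The reciprocal-then-multiply path rounds twice at fp16, which is why it can deviate more from the exact quotient than a single fp16 division.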
Step S202: Set the configuration parameter corresponding to each type of operator in the operator interface corresponding to that type among the N types of operators.
Specifically, the configuration parameter corresponding to each type of operator is set in the operator interface corresponding to that type among the N types of operators. Each type of operator corresponds to multiple operation modes, and setting the configuration parameter in the operator interface makes it possible to configure the operation mode without changing the structure of the operator.
For example, the operator interface of the Softmax operator in the prior art may be def softmax_v2(input_x, output_y, axis=-1, kernel_name="softmax_v2"), whereas in the embodiments of this application it may be def softmax_v2(input_x, output_y, axis=-1, kernel_name="softmax_v2", impl_mode="high_performance"). Compared with the prior art, this application adds the configuration parameter impl_mode="high_performance" to the operator interface, indicating that the Softmax operator is configured to run in the high-performance operation mode when it is called.
As another example, setting the configuration parameter of each type of operator in the corresponding operator interface may also mean setting it, before the operator interface is called for compilation, through the global parameter global_build_context in the compilation context information, so that the operation mode of the operator can be configured directly from that global parameter when the interface is called. For example, the global parameter is set in advance with global_build_context.set_impl_mode("high_performance"); the operator compiler then calls the Softmax operator interface def softmax_v2(input_x, output_y, axis=-1, kernel_name="softmax_v2") under that compilation context, with global_build_context.set_impl_mode("high_performance") in effect, which indicates that the Softmax operator is configured to run in the high-performance operation mode when it is called.
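The two ways of passing the configuration parameter, an impl_mode argument on the operator interface or a value held in a global compilation context, can be sketched together. Everything beyond the names softmax_v2, impl_mode, global_build_context, and set_impl_mode is an assumption: the BuildContext class, the get_impl_mode method, the None default for impl_mode, and the fallback to high precision are illustrative, and no real kernel is compiled.

```python
class BuildContext:
    """Hypothetical stand-in for the global compilation context named in the text."""

    def __init__(self):
        self._impl_mode = None

    def set_impl_mode(self, mode):
        self._impl_mode = mode

    def get_impl_mode(self):  # assumed accessor, not named in the text
        return self._impl_mode


global_build_context = BuildContext()


def softmax_v2(input_x, output_y, axis=-1, kernel_name="softmax_v2", impl_mode=None):
    """Report the operation mode this call would be compiled with.

    Assumption: an explicit impl_mode wins, then the global context, and
    high precision is used when neither is set.
    """
    mode = impl_mode or global_build_context.get_impl_mode() or "high_precision"
    return {"kernel_name": kernel_name, "axis": axis, "impl_mode": mode}


# Mode passed through the operator interface.
explicit = softmax_v2("x", "y", impl_mode="high_performance")

# Mode taken from the global compilation context.
global_build_context.set_impl_mode("high_performance")
from_context = softmax_v2("x", "y")

print(explicit["impl_mode"], from_context["impl_mode"])
```

Either route leaves the operator's inputs, outputs, and structure unchanged; only the mode that is compiled differs.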
Optionally, the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The target configuration parameter instructs the corresponding operator to run in a target operation mode, which is one of the multiple operation modes. The default configuration parameter instructs the corresponding operator to run in the operation mode with the highest operation precision or in the operation mode with the fastest operation speed. The tuning configuration parameter instructs the corresponding operator to run in an optimal operation mode, which is the operation mode, determined based on the algorithm model, that is fastest among those whose result error relative to the highest-precision operation mode is within a preset threshold range.
It can be understood that the configuration parameters in the embodiments of this application fall into three categories: a configuration parameter may be any one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The target configuration parameter indicates an operation mode set by the user; that is, the user can directly set the configuration parameter to the target configuration parameter of the desired target operation mode. For example, the target configuration parameter may indicate the path of an ini file, meaning that the configuration information in that ini file is used to select the precision implementation version (that is, the operation mode) for the corresponding operator. The default configuration parameter instructs this type of operator to run in the operation mode with the highest operation precision or the fastest operation speed; for example, it may indicate the default high-precision or high-performance configuration built into the operator library, like the configuration parameter in the Softmax operator interface above. The tuning configuration parameter instructs the operator to run in the optimal operation mode under the computational characteristics of the current algorithm model. As another example, the tuning configuration parameter denotes the operation mode obtained after automatic tuning by the tuner (the auto_tune module); it instructs the tuner to select, from the multiple operation modes, the one with the highest performance whose error is within the preset threshold range. Refer to Figure 5, which is a schematic diagram of the algorithm model under several operation modes provided by an embodiment of this application. As shown in Figure 5, based on the algorithm model structure shown in Figure 4, this application runs the high-precision operation mode and the high-performance operation mode separately, and the results of the two can be compared to select the optimal operation mode.
The preset threshold range may be a specified threshold or a default threshold. The default threshold is the "double one-thousandth" criterion: comparing the two result tensors (the result of the optimal operation mode and the result of the high-precision operation mode) element by element, the absolute difference between corresponding elements, relative to the smaller magnitude, must not exceed one thousandth (abs(x-y)/min(abs(x),abs(y))<0.001), and the elements whose difference does exceed one thousandth must make up no more than one thousandth of all elements in the tensor.
Optionally, input information is received, and the M operator nodes and the configuration option corresponding to each operator node are determined based on the input information, where the configuration option includes one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The input information may be a user input or selection operation, in response to which the configuration option corresponding to each operator node can be set. For these three kinds of configuration parameters, refer to the related description in the foregoing embodiment; details are not repeated here.
It can be understood that setting a configuration parameter at the operator interface means that, whenever the algorithm model calls and compiles an operator of that type, the operator uses the operation mode indicated by the configuration parameter; setting a configuration option at an operator node means that, when the algorithm model compiles the operator corresponding to that node, the operator uses the operation mode indicated by the configuration option.
Step S203: Configure the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator or the configuration option corresponding to each operator node.
Specifically, the operation modes of the M operators in the algorithm model are configured based on the configuration parameter corresponding to each type of operator, or based on the configuration option corresponding to each operator node. Accordingly, after reading or obtaining the operation modes corresponding to the M operators, the operator compiler reads the corresponding configuration files from the operator library according to those operation modes in order to compile the operators.
In one possible implementation, when both the configuration option corresponding to an operator node and the configuration parameter corresponding to the operator type are set, the operation mode indicated by the configuration option of the operator node is used as the operation mode of the operator at that node.
In another possible implementation, when both the configuration option corresponding to an operator node and the configuration parameter corresponding to the operator type are set, the operation mode indicated by the configuration parameter of the operator type is used as the operation mode of the operator at that node.
In another possible implementation, when neither the configuration option corresponding to an operator node nor the configuration parameter corresponding to the operator type is set, the high-precision operation mode of that type of operator is used as the operation mode of the operator at that node.
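The three implementations above amount to a small precedence rule, sketched below for the three nodes of Figure 4. The function name resolve_mode, the prefer argument that switches between the first two implementations, and the sample per-type and per-node settings are assumptions made for illustration.

```python
DEFAULT_MODE = "high_precision"


def resolve_mode(node_option, type_param, prefer="node"):
    """Pick the operation mode for one operator node.

    node_option and type_param are mode names, or None when unset.
    prefer chooses which setting wins when both are set: "node" or "type".
    """
    if prefer not in ("node", "type"):
        raise ValueError(f"unknown precedence: {prefer!r}")
    if node_option is not None and type_param is not None:
        return node_option if prefer == "node" else type_param
    if node_option is not None:
        return node_option
    if type_param is not None:
        return type_param
    return DEFAULT_MODE


# Nodes of the partial model in Figure 4: (node id, operator type).
nodes = [(301, "MatMul"), (302, "RealDiv"), (303, "Softmax")]
# Hypothetical settings: per-type parameters and one per-node option.
type_params = {"RealDiv": "high_performance", "Softmax": "high_performance"}
node_options = {303: "high_precision"}

plan = {
    node_id: resolve_mode(node_options.get(node_id), type_params.get(op_type))
    for node_id, op_type in nodes
}
print(plan)
```

With node-level precedence, the Softmax node keeps high precision even though its type is set to high performance, while the unconfigured MatMul node falls back to high precision.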
In one possible implementation, configuring the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator includes: calling each of the N types of operators through the operator interface corresponding to that type and obtaining the corresponding configuration parameter; and configuring, based on the configuration parameter corresponding to each of the N types of operators, the operation mode of each type of operator in the algorithm model.
Setting the configuration parameter of each type of operator in the operator implementation interface enables precision mode setting at the operator level. When an operator is called through the operator interface of its type, operators with different operation modes can be compiled according to the configuration parameter in the interface, which ultimately adjusts the precision and/or performance of the algorithm model and makes optimizing its performance and precision more flexible.
In one possible implementation, the configuration parameter includes the configuration path of the corresponding operation mode, and configuring the operation mode of each of the N types of operators in the algorithm model based on the configuration parameter of that type includes: obtaining, based on the configuration parameter of each of the N types of operators, the configuration path of the operation mode corresponding to that type; and reading the corresponding configuration file based on the configuration path, so as to configure the operation mode of that type of operator during the computation of the algorithm model. For example, the configuration parameter may include the configuration path of the corresponding operation mode. The configuration path points to the configuration file of that operation mode, and the configuration file contains the relevant configuration information, so that after the file is read, the operation mode of the corresponding type of operator during the computation of the algorithm model can be configured. For example, when the configuration parameter of a first type of operator is a target configuration parameter, it may include the configuration path of the target operation mode; the configuration file of the target operation mode is read based on that path, and the first type of operator is then configured to run in the target operation mode according to the file.
It should be noted that the target operation mode may be the high-precision operation mode, which is not specifically limited in this application.
Optionally, each of the N types of operators is called through the operator interface corresponding to that type, and the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model is configured based on the configuration option of that node. That is, each operator node calls the operator interface of the corresponding operator type, and the operation mode of the called operator is configured based on the configuration option of that node.
除了在算子实现接口中设置每类算子对应的配置参数之外,还可以基于用户的输入信息设置每个算子节点对应的配置选项,实现算子节点级别的精度模式设置,即处于不同算子节点的同类型算子可以被配置不同的运行方式。该配置选项同配置参数也可以在三种配置参数中选择一种进行设置,在进行模型编译时,每个算子节点通过对应类型的算子接口调用该类算子时,可以根据该算子节点对应的配置选项配置该算子节点的算子的运行方式,最终实现算子模型的精度和/或性能的调整,使得算法模型的性能与精度的优化更加灵活。In addition to setting the configuration parameters corresponding to each type of operator in the operator implementation interface, you can also set the configuration options corresponding to each operator node based on the user's input information to implement the precision mode setting at the operator node level, that is, operators of the same type in different operator nodes can be configured to run in different ways. The configuration options and configuration parameters can also be set by selecting one of the three configuration parameters. When compiling the model, when each operator node calls the operator of this type through the operator interface of the corresponding type, the operation mode of the operator of the operator node can be configured according to the configuration options corresponding to the operator node, and finally the precision and/or performance of the operator model can be adjusted, making the optimization of the performance and precision of the algorithm model more flexible.
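Node-level configuration can be sketched as follows in Python. The `OperatorNode` class, the mode names, and in particular the rule that a node-level option takes precedence over the type-level parameter are assumptions made for this sketch; the application only states that either the per-type parameter or the per-node option may be used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class OperatorNode:
    name: str                             # node identifier in the model graph
    op_type: str                          # which of the N operator types it calls
    config_option: Optional[str] = None   # per-node option, e.g. from user input


def resolve_modes(nodes: List[OperatorNode],
                  type_params: Dict[str, str]) -> Dict[str, str]:
    """Pick an operation mode per node.

    Assumption: a node-level option, when present, overrides the
    type-level configuration parameter for that node only.
    """
    return {
        node.name: node.config_option or type_params[node.op_type]
        for node in nodes
    }


nodes = [
    OperatorNode("n0", "matmul", "high_precision"),
    OperatorNode("n1", "matmul", "high_performance"),
    OperatorNode("n2", "softmax"),
]
type_params = {"matmul": "default", "softmax": "default"}
modes = resolve_modes(nodes, type_params)
```

In this example the two `matmul` nodes end up with different operation modes even though they share an operator type, which is the node-level flexibility the paragraph describes.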
Step S204: Compile and run the algorithm model based on the operation modes of the M operators.
Specifically, the algorithm model is compiled and run based on the operation modes of the M operators. After the operation mode of each operator is obtained, the operators that can run on the chip may be compiled sequentially or in parallel, so as to obtain and run the algorithm model.
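The choice between sequential and parallel operator compilation in step S204 can be sketched in Python as follows. `compile_operator` is a hypothetical stand-in for a real operator compiler and simply returns a descriptive string; the thread pool is one possible way to parallelize, not the mechanism described in the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

NodeSpec = Tuple[str, str, str]  # (node name, operator type, operation mode)


def compile_operator(node_name: str, op_type: str, mode: str) -> str:
    """Hypothetical stand-in for compiling one operator in a given mode."""
    return f"{op_type}[{mode}]@{node_name}"


def compile_model(node_specs: List[NodeSpec], parallel: bool = True) -> List[str]:
    """Compile every operator, either in parallel or one after another."""
    if parallel:
        with ThreadPoolExecutor() as pool:
            # Executor.map returns results in input order.
            return list(pool.map(lambda spec: compile_operator(*spec), node_specs))
    return [compile_operator(*spec) for spec in node_specs]


specs = [
    ("n0", "matmul", "high_precision"),
    ("n1", "softmax", "high_performance"),
]
kernels = compile_model(specs)
```

Because each operator's mode is fixed before compilation starts, the per-operator compilations are independent of one another, which is what makes the parallel path possible.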
In the embodiments provided in this application, after the operator corresponding to each of the M operator nodes of the operator model, and the N types of operators to which the M operators belong, are determined, the operation modes of the M operators in the algorithm model can be configured according to the configuration parameter corresponding to each type of operator or the configuration option corresponding to each operator node. Each of the N types of operators may have multiple operation modes that differ in computational precision and/or computation speed, so the algorithm model may in turn have multiple operation modes. To preserve the computational precision of the operators in the operator model while improving their computational performance, the embodiments of this application can, according to the computational characteristics of the operator model, configure the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator in the algorithm model, or based on the configuration option corresponding to each operator node in the algorithm model. Configuring the operation mode of the algorithm model through configuration parameters or configuration options in this way avoids the prior-art limitation that an algorithm model can only compute in a fixed processing manner, and makes optimization of the performance and precision of the algorithm model in the embodiments of this application more flexible.
The methods of the embodiments of this application are described in detail above; the related apparatuses of the embodiments of this application are provided below.
Referring to Figure 6, Figure 6 shows an apparatus for configuring an operator operation mode according to an embodiment of this application. The apparatus 60 for configuring an operator operation mode is applied to the structure of the system for configuring an operator operation mode, and is suitable for the foregoing method for configuring an operator operation mode. The apparatus 60 may include a determination unit 401 and a configuration unit 402, and may further include a setting unit 403 and a compilation unit 404. The units are described in detail below.
The determination unit 401 is configured to determine, based on an algorithm model, M operators corresponding to M operator nodes, where one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in computational precision and/or computation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1.
The configuration unit 402 is configured to configure the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator or the configuration option corresponding to each operator node, where the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
In a possible implementation, the apparatus further includes a setting unit 403, configured to set, in the operator interface corresponding to each of the N types of operators, the configuration parameter corresponding to that type of operator. The configuration unit 402 is specifically configured to: call the N types of operators through the operator interface corresponding to each of the N types of operators, and obtain the corresponding configuration parameters; and configure, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model.
In a possible implementation, the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The target configuration parameter instructs the corresponding operator to run in a target operation mode, which is one of the multiple operation modes. The default configuration parameter instructs the corresponding operator to run in the operation mode with the highest computational precision or the operation mode with the fastest computation speed. The tuning configuration parameter instructs the corresponding operator to run in an optimal operation mode, which is the operation mode, determined based on the operator model, that has the fastest computation speed among those whose computation-result error relative to the operation mode with the highest computational precision is within a preset threshold range.
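The three kinds of configuration parameter can be sketched in Python as a single selection function. The `RunMode` fields, the example mode names, the error and latency figures, and the threshold value are all invented for illustration; the sketch also assumes that each mode's error has already been measured against the highest-precision mode (whose own error is therefore 0).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RunMode:
    name: str
    error: float       # result error relative to the highest-precision mode
    latency_ms: float  # lower means faster


def select_mode(modes: List[RunMode], kind: str, target: Optional[str] = None,
                threshold: float = 1e-3, default_prefers: str = "precision") -> RunMode:
    """Choose an operation mode for one operator from its configuration parameter."""
    if kind == "target":
        # Target parameter: run in the explicitly named mode.
        for mode in modes:
            if mode.name == target:
                return mode
        raise ValueError(f"unknown target mode: {target!r}")
    if kind == "default":
        # Default parameter: highest precision or fastest speed.
        if default_prefers == "precision":
            return min(modes, key=lambda m: m.error)
        return min(modes, key=lambda m: m.latency_ms)
    if kind == "tuning":
        # Tuning parameter: fastest mode whose error stays within the threshold.
        candidates = [m for m in modes if m.error <= threshold]
        return min(candidates, key=lambda m: m.latency_ms)
    raise ValueError(f"unknown configuration parameter kind: {kind!r}")


# Hypothetical modes for one operator type.
modes = [
    RunMode("fp32", error=0.0, latency_ms=10.0),
    RunMode("fp16", error=5e-4, latency_ms=4.0),
    RunMode("int8", error=2e-2, latency_ms=1.5),
]
tuned = select_mode(modes, "tuning")
```

With these made-up numbers, the tuning choice is `fp16`: `int8` is faster but its error exceeds the threshold, while `fp32` is within the threshold but slower.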
In a possible implementation, the configuration parameter includes a configuration path for the corresponding operation mode. The configuration unit 402 is specifically configured to: obtain, based on the configuration parameter corresponding to each of the N types of operators, the configuration path of the operation mode corresponding to each type of operator; and read the corresponding configuration file based on the configuration path, so as to configure the operation mode of the corresponding type of operator during computation of the algorithm model.
In a possible implementation, the determination unit 401 is further configured to: receive input information and determine, based on the input information, the M operator nodes and the configuration option corresponding to each operator node, where the configuration option includes one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter. The configuration unit 402 is specifically configured to: call the N types of operators through the operator interface corresponding to each of the N types of operators; and configure, based on the configuration option corresponding to each operator node, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
In a possible implementation, the apparatus further includes a compilation unit 404, configured to compile and run the algorithm model based on the operation modes of the M operators.
It should be noted that, for the functions of the functional units in the apparatus 60 for configuring an operator operation mode described in the embodiments of this application, reference may be made to the description of steps S201 to S204 in the method embodiment for configuring an operator operation mode shown in Figure 3. Details are not repeated here.
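The relationship between the M operator nodes and the N operator types handled by the determination unit can be sketched in Python as follows. Representing the algorithm model as a list of `(node name, operator type)` pairs is an assumption made for this sketch, as are the function and variable names.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def determine_operators(model_graph: List[Tuple[str, str]]):
    """Map each operator node to its operator and group the nodes by type.

    model_graph is a hypothetical flat view of the algorithm model:
    one (node name, operator type) pair per operator node.
    """
    operators: Dict[str, str] = dict(model_graph)  # one operator per node
    types: Dict[str, List[str]] = defaultdict(list)
    for node, op_type in model_graph:
        types[op_type].append(node)
    m, n = len(operators), len(types)
    # The constraint stated for the determination unit: M >= N >= 1.
    if not (m >= n >= 1):
        raise ValueError("expected at least one operator node")
    return operators, dict(types)


graph = [("n0", "conv"), ("n1", "relu"), ("n2", "conv")]
operators, types = determine_operators(graph)
```

Here M is 3 and N is 2: two nodes share the `conv` type, so a per-type configuration parameter would affect both of them, whereas a per-node configuration option could distinguish them.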
An embodiment of this application provides a computer storage medium for storing computer software instructions used by the apparatus for configuring an operator operation mode provided in the foregoing related embodiments, the instructions including a program designed to carry out the foregoing aspects.
An embodiment of this application provides a computer program. The computer program includes instructions which, when the computer program is executed by a computer, enable the computer to perform the procedures performed by the apparatus for configuring an operator operation mode in the foregoing related embodiments.
This application provides a chip system. The chip system includes a processor configured to support a terminal device in implementing the functions involved in the foregoing related embodiments, for example, generating or processing the information involved in the foregoing method for configuring an operator operation mode. In a possible design, the chip system further includes a memory, and the memory is configured to store the program instructions and data necessary for the data sending device. The chip system may consist of a chip, or may include a chip and other discrete components.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. However, a person skilled in the art should understand that this application is not limited by the described order of actions, because according to this application some steps may be performed in another order or simultaneously. A person skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into the foregoing units is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like, and specifically may be a processor in the computer device) to perform all or some of the steps of the foregoing methods in the embodiments of this application. The foregoing storage medium may include any medium that can store program code, such as a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
In conclusion, the foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.

Claims (15)

  1. A method for configuring an operator operation mode, characterized by comprising:
    determining, based on an algorithm model, M operators corresponding to M operator nodes, wherein one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in computational precision and/or computation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1;
    configuring operation modes of the M operators in the algorithm model based on a configuration parameter corresponding to each type of operator or a configuration option corresponding to each operator node, wherein the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
  2. The method according to claim 1, characterized in that the method further comprises:
    setting, in the operator interface corresponding to each of the N types of operators, the configuration parameter corresponding to that type of operator;
    wherein configuring the operation modes of the M operators in the algorithm model based on the configuration parameter corresponding to each type of operator comprises:
    calling the N types of operators through the operator interface corresponding to each of the N types of operators, and obtaining the corresponding configuration parameters;
    configuring, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model.
  3. The method according to claim 1 or 2, characterized in that the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter;
    wherein the target configuration parameter instructs the corresponding operator to run in a target operation mode, the target operation mode being one of the multiple operation modes;
    the default configuration parameter instructs the corresponding operator to run in the operation mode with the highest computational precision or the operation mode with the fastest computation speed;
    the tuning configuration parameter instructs the corresponding operator to run in an optimal operation mode, the optimal operation mode being the operation mode, determined based on the operator model, that has the fastest computation speed among those whose computation-result error relative to the operation mode with the highest computational precision is within a preset threshold range.
  4. The method according to any one of claims 1-3, characterized in that the configuration parameter comprises a configuration path for the corresponding operation mode;
    wherein configuring, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model comprises:
    obtaining, based on the configuration parameter corresponding to each of the N types of operators, the configuration path of the operation mode corresponding to each of the N types of operators;
    reading the corresponding configuration file based on the configuration path, so as to configure the operation mode of the corresponding type of operator during computation of the algorithm model.
  5. The method according to claim 1, characterized in that determining, based on the algorithm model, the M operators corresponding to the M operator nodes comprises:
    receiving input information, and determining, based on the input information, the M operator nodes and the configuration option corresponding to each operator node, wherein the configuration option comprises one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter;
    wherein configuring the operation modes of the M operators in the algorithm model based on the configuration option corresponding to each operator node comprises:
    calling the N types of operators through the operator interface corresponding to each of the N types of operators;
    configuring, based on the configuration option corresponding to each operator node, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
  6. The method according to any one of claims 1-5, characterized in that the method further comprises:
    compiling and running the algorithm model based on the operation modes of the M operators.
  7. An apparatus for configuring an operator operation mode, characterized by comprising:
    a determination unit, configured to determine, based on an algorithm model, M operators corresponding to M operator nodes, wherein one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in computational precision and/or computation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1;
    a configuration unit, configured to configure operation modes of the M operators in the algorithm model based on a configuration parameter corresponding to each type of operator or a configuration option corresponding to each operator node, wherein the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    a setting unit, configured to set, in the operator interface corresponding to each of the N types of operators, the configuration parameter corresponding to that type of operator;
    wherein the configuration unit is specifically configured to:
    call the N types of operators through the operator interface corresponding to each of the N types of operators, and obtain the corresponding configuration parameters;
    configure, based on the configuration parameter corresponding to each of the N types of operators, the operation mode corresponding to each of the N types of operators in the algorithm model.
  9. The apparatus according to claim 7 or 8, characterized in that the configuration parameter is one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter;
    wherein the target configuration parameter instructs the corresponding operator to run in a target operation mode, the target operation mode being one of the multiple operation modes;
    the default configuration parameter instructs the corresponding operator to run in the operation mode with the highest computational precision or the operation mode with the fastest computation speed;
    the tuning configuration parameter instructs the corresponding operator to run in an optimal operation mode, the optimal operation mode being the operation mode, determined based on the operator model, that has the fastest computation speed among those whose computation-result error relative to the operation mode with the highest computational precision is within a preset threshold range.
  10. The apparatus according to any one of claims 7-9, characterized in that the configuration parameter comprises a configuration path for the corresponding operation mode;
    wherein the configuration unit is specifically configured to:
    obtain, based on the configuration parameter corresponding to each of the N types of operators, the configuration path of the operation mode corresponding to each of the N types of operators;
    read the corresponding configuration file based on the configuration path, so as to configure the operation mode of the corresponding type of operator during computation of the algorithm model.
  11. The apparatus according to claim 7, characterized in that the determination unit is further configured to:
    receive input information, and determine, based on the input information, the M operator nodes and the configuration option corresponding to each operator node, wherein the configuration option comprises one of a target configuration parameter, a default configuration parameter, and a tuning configuration parameter;
    wherein the configuration unit is specifically configured to:
    call the N types of operators through the operator interface corresponding to each of the N types of operators;
    configure, based on the configuration option corresponding to each operator node, the operation mode of the operator corresponding to each of the M operator nodes in the algorithm model.
  12. The apparatus according to any one of claims 7-11, characterized in that the apparatus further comprises:
    a compilation unit, configured to compile and run the algorithm model based on the operation modes of the M operators.
  13. A system for configuring an operator operation mode, characterized by comprising:
    a model compiler, configured to determine, based on an algorithm model, M operators corresponding to M operator nodes, wherein one operator node corresponds to one operator, each operator belongs to one of N types of operators, each type of operator corresponds to multiple operation modes, and the operation modes differ from one another in computational precision and/or computation speed; M is an integer greater than or equal to N, and N is an integer greater than or equal to 1;
    an operator compiler, configured to configure operation modes of the M operators in the algorithm model based on a configuration parameter corresponding to each type of operator or a configuration option corresponding to each operator node, wherein the configuration parameter or the configuration option indicates the operation mode of the corresponding operator.
  14. A computer program product, characterized in that the computer program product comprises a computer program which, when executed by a computer or a processor, causes the computer or the processor to perform the method according to any one of claims 1-6.
  15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer or a processor, implements the method according to any one of claims 1-6.
PCT/CN2023/114411 2022-09-22 2023-08-23 Operator operation mode configuration method and apparatus, and related system WO2024060916A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211155932.4A CN117785260A (en) 2022-09-22 2022-09-22 Operator operation mode configuration method, device and related system
CN202211155932.4 2022-09-22

Publications (1)

Publication Number Publication Date
WO2024060916A1 true WO2024060916A1 (en) 2024-03-28

Family

ID=90391565

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/114411 WO2024060916A1 (en) 2022-09-22 2023-08-23 Operator operation mode configuration method and apparatus, and related system

Country Status (2)

Country Link
CN (1) CN117785260A (en)
WO (1) WO2024060916A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458294A (en) * 2019-08-19 2019-11-15 Oppo广东移动通信有限公司 Model running method, apparatus, terminal and storage medium
CN110569984A (en) * 2019-09-10 2019-12-13 Oppo广东移动通信有限公司 configuration information generation method, device, equipment and storage medium
CN110750298A (en) * 2019-10-29 2020-02-04 南京星环智能科技有限公司 AI model compiling method, equipment and storage medium
US20220207330A1 (en) * 2020-12-30 2022-06-30 Qatar University Operational neural networks and self-organized operational neural networks with generative neurons

Also Published As

Publication number Publication date
CN117785260A (en) 2024-03-29

Similar Documents

Publication Publication Date Title
US11106437B2 (en) Lookup table optimization for programming languages that target synchronous digital circuits
CN113641701B (en) Data query method, system, heterogeneous acceleration platform and storage medium
US11775269B2 (en) Generating a synchronous digital circuit from a source code construct defining a function call
US20130139137A1 (en) Systems and Methods for Customizing Optimization/Transformation/ Processing Strategies
US8880573B2 (en) System and method of dynamic precision operations
US20150135171A1 (en) Information processing apparatus and compilation method
JP2019523942A (en) Query optimizer for CPU usage and code refactoring
CN105204837B (en) Method and device for realizing logic programming
US20230350653A1 (en) Computational Graph Optimization Method and Apparatus
CN118350325B (en) Method for optimizing digital logic circuit, computer device and storage medium
CN108920149B (en) Compiling method and compiling device
CN111984300B (en) Code copying method and device, electronic equipment and computer readable storage medium
WO2024060916A1 (en) Operator operation mode configuration method and apparatus, and related system
CN109284222B (en) Software unit, project testing method, device and equipment in data processing system
CN111736870B (en) Industrial camera adaptation method
WO2023207973A1 (en) Compiler test method and apparatus, case generation method and apparatus, and instruction storage structure
CN114756211B (en) Model training method and device, electronic equipment and storage medium
CN113050952B (en) Pseudo instruction compiling method, pseudo instruction compiling device, computer equipment and storage medium
CN112540750B (en) Self-adaptive built-in function and instruction operation selection translation method
CN110209397B (en) Data processing method, device and system
CN114443042A (en) Service arrangement execution method based on rule engine and related equipment
US11188324B2 (en) Methods, systems, and articles of manufacture to perform heterogeneous data structure selection via programmer annotations
CN111782183B (en) Method and device for judging component dependency, electronic device and medium
CN115951936B (en) Chip adaptation method, device, equipment and medium of vectorization compiler
CN113031914B (en) Floating point rounding mode control method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 23867209; Country of ref document: EP; Kind code of ref document: A1