CN111580826B - Compilation optimization method and apparatus for a machine learning model - Google Patents


Info

Publication number
CN111580826B
CN111580826B (application CN202010366299.8A)
Authority
CN
China
Prior art keywords
operations
machine learning
learning model
tensor
running
Prior art date
Legal status (assumed; not a legal conclusion)
Active
Application number
CN202010366299.8A
Other languages
Chinese (zh)
Other versions
CN111580826A (en)
Inventor
姜曦楠
朱子霖
周飞虎
郭振宇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010366299.8A priority Critical patent/CN111580826B/en
Publication of CN111580826A publication Critical patent/CN111580826A/en
Application granted granted Critical
Publication of CN111580826B publication Critical patent/CN111580826B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/443 Optimisation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Abstract

The invention discloses a compilation optimization method and apparatus for a machine learning model. The method comprises: running a first machine learning model multiple times to process a set of warm-up data, obtaining the shapes of the input and output tensors of each operation in a first set of operations during those runs, and comparing whether those shapes change across the runs; determining each operation whose input and output tensor shapes do not change across the runs to be a stable operation; in the case that the first set of operations includes multiple stable operations, dividing the stable operations into one or more compilation regions; and merging, by a target compiler, the stable operations in a target compilation region of the first machine learning model to obtain a second machine learning model.

Description

Compilation optimization method and apparatus for a machine learning model
Technical Field
The invention relates to the field of artificial intelligence, and in particular to a compilation optimization method and apparatus for a machine learning model.
Background
Compilation optimization in machine learning systems relies on two main techniques: ahead-of-time compilation and just-in-time compilation. Existing machine learning systems do not introduce runtime sampling information when compiling. Because changes in tensor shapes at runtime are not sampled, a shape change in even part of an operation's tensors forces recompilation of most operations in the computation graph over the course of multiple iterations. Such recompilation is very time-consuming and memory-consuming.
For the problems in the related art of long compilation cycles and wasted resources caused by not introducing runtime sampling information during the compilation of machine learning models, no effective solution has yet been proposed.
Disclosure of Invention
Embodiments of the invention provide a compilation optimization method and apparatus for a machine learning model, so as to at least solve the technical problems of long compilation cycles and wasted resources caused by not introducing runtime sampling information when compiling in machine learning.
According to one aspect of the embodiments of the invention, a compilation optimization method for a machine learning model is provided, comprising: running a first machine learning model multiple times to process a set of warm-up data, obtaining the shapes of the input and output tensors of each operation in a first set of operations during those runs, and comparing whether those shapes change across the runs, wherein the first machine learning model comprises the first set of operations, and the operations in the first set are operations that the target compiler is allowed to process; determining each operation whose input and output tensor shapes both remain unchanged across the multiple runs to be a stable operation; in the case that the first set of operations includes multiple stable operations, dividing the stable operations into one or more compilation regions; and merging, by the target compiler, the stable operations in a target compilation region of the first machine learning model to obtain a second machine learning model, wherein the target compilation region is one of the one or more compilation regions and contains multiple stable operations, the second machine learning model comprises a second set of operations, and the number of operations in the second set is smaller than the number of operations in the first set.
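The claimed warm-up procedure can be sketched in a few lines. The sketch below is a hypothetical illustration, not the patented implementation: `run_model`, the operation names, and the shape dictionaries are invented for the example; only the rule that an operation is dropped from the stable set as soon as its recorded tensor shapes differ between consecutive warm-up runs comes from the method above.

```python
def find_stable_operations(run_model, ops, warmup_batches):
    """Run warm-up batches and keep only ops whose tensor shapes never change.

    run_model(batch) -> {op: (input_shape, output_shape)} for one run.
    """
    recorded = None          # shapes from the previous run
    stable = set(ops)        # every op starts out presumed stable
    for batch in warmup_batches:
        shapes = run_model(batch)
        if recorded is not None:
            for op in list(stable):
                if shapes[op] != recorded[op]:
                    stable.discard(op)   # shape changed between runs
        recorded = shapes
    return stable
```

Operations surviving all warm-up runs are the "stable operations" that become candidates for compilation regions; the rest are the "morphing operations" that would otherwise trigger recompilation.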
Optionally, running the first machine learning model multiple times to process a set of warm-up data, obtaining the shapes of the input and output tensors of each operation in the first set of operations, and comparing whether those shapes change, comprises: for each run of the first machine learning model, recording the input and output tensors of each operation in the first set of operations; and, starting from the second run of the first machine learning model, each time the model is run, comparing the shapes of the currently recorded input and output tensors of each operation in the first set with the shapes recorded in the previous run.
Optionally, running the first machine learning model multiple times to process a set of warm-up data, obtaining the shapes of the input and output tensors of each operation in the first set of operations, and comparing whether those shapes change, comprises: for each run of the first machine learning model, recording the input and output tensors of the operations in the first set that are marked as unchanged, wherein during the first run every operation in the first set is marked as unchanged; and, starting from the second run, each time the model is run, comparing the shapes of the currently recorded input and output tensors of the operations marked as unchanged with the shapes recorded in the previous run, re-marking as changed any operation whose input and/or output tensor shape has changed, and keeping the unchanged mark for operations whose input and output tensor shapes remain the same. Determining the stable operations then comprises: after running the first machine learning model multiple times, determining the operations marked as changed to be morphing operations and the operations marked as unchanged to be stable operations.
Optionally, merging the stable operations in the target compilation region of the first machine learning model by the target compiler comprises: merging the stable operations in the target compilation region into a single operation by the target compiler, and converting the input and output tensors of those stable operations into the input and output tensors of that single operation, to obtain the second machine learning model.
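A minimal sketch of this merge step, under the assumption (made for the example, not stated in the patent) that the graph is represented as per-operation sets of input and output tensor names: tensors both produced and consumed inside the region become internal to the fused operation, while tensors crossing the region boundary become the fused operation's inputs and outputs.

```python
def merge_region(region_ops, inputs_of, outputs_of):
    """Fuse the stable ops of one compilation region into a single operation.

    region_ops: set of op names in the region.
    inputs_of / outputs_of: op name -> set of tensor names.
    """
    produced = set().union(*(outputs_of[op] for op in region_ops))
    consumed = set().union(*(inputs_of[op] for op in region_ops))
    fused_inputs = consumed - produced   # tensors coming from outside the region
    # Conservatively expose every produced tensor, since a consumer outside
    # the region may need any of them.
    return {
        "name": "+".join(sorted(region_ops)),
        "inputs": fused_inputs,
        "outputs": produced,
    }
```

In the fig. 7 scenario, merging operations A and B this way would yield one fused operation whose inputs are tensors A and B and whose outputs include tensors D and E.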
Optionally, the set of warm-up data includes a plurality of warm-up data sets, and running the first machine learning model multiple times to process the set of warm-up data and comparing whether the shapes of the input and output tensors of each operation in the first set of operations change comprises repeatedly executing the following steps: inputting the input tensors of one warm-up data set into some or all of the operations in the first set, and running the first machine learning model once to obtain the input and output tensors of each operation in the first set; and comparing whether the shapes of the input and output tensors of each operation in the first set change across the multiple runs of the first machine learning model.
Optionally, running the first machine learning model multiple times to process a set of warm-up data and comparing whether the shapes of the input and output tensors of each operation in the first set of operations change comprises repeatedly executing the following steps: inputting the input tensors of the set of warm-up data into some or all of the operations in the first set, and running the first machine learning model once to obtain the input and output tensors of each operation in the first set; and comparing whether the shapes of the input and output tensors of each operation in the first set change across the multiple runs of the first machine learning model.
Optionally, the method further comprises: obtaining running state information of each operation in the first set of operations while the first machine learning model is run multiple times to process the set of warm-up data, wherein the running state information includes the running time and/or the running resources required by each operation; and allocating corresponding running resources to each operation in the second set of operations according to the running state information.
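This allocation step might be realized as follows. The `merged_from` mapping and the `memory_mb` field are assumptions made for the sketch, not details from the patent: each operation of the second model receives the summed resources recorded during warm-up for the first-model operations it replaces.

```python
def allocate_resources(second_ops, runtime_info, merged_from):
    """Assign resources to the optimized model's ops from warm-up statistics.

    second_ops: op names of the second (optimized) model.
    runtime_info: first-model op -> {"memory_mb": ...} recorded during warm-up.
    merged_from: second-model op -> list of first-model ops it replaces;
                 ops absent from the mapping are unmerged pass-throughs.
    """
    alloc = {}
    for op in second_ops:
        members = merged_from.get(op, [op])
        alloc[op] = sum(runtime_info[m]["memory_mb"] for m in members)
    return alloc
```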
Optionally, after obtaining the second machine learning model, the method further comprises: inputting formal data to the second machine learning model, wherein the formal data includes a set of formal input tensors; and compiling the second machine learning model by the target compiler to obtain the processing result output by the second machine learning model after it processes the set of formal input tensors.
Optionally, dividing the stable operations into one or more compilation regions comprises: grouping stable operations that are connected to one another into the same compilation region, to obtain the one or more compilation regions.
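This grouping of mutually connected stable operations can be implemented as connected components over the computation-graph edges, restricted to stable operations. A hedged sketch follows; the edge-list graph representation is an assumption of the example, not the patent's data structure.

```python
from collections import deque

def partition_regions(stable_ops, edges):
    """Split stable ops into compilation regions via connected components.

    edges: iterable of (op, op) pairs from the computation graph; edges that
    touch a non-stable op are ignored, so morphing ops break regions apart.
    """
    adj = {op: set() for op in stable_ops}
    for u, v in edges:
        if u in adj and v in adj:      # keep only edges between stable ops
            adj[u].add(v)
            adj[v].add(u)
    regions, seen = [], set()
    for op in stable_ops:
        if op in seen:
            continue
        comp, queue = set(), deque([op])   # BFS over one component
        while queue:
            cur = queue.popleft()
            if cur in comp:
                continue
            comp.add(cur)
            seen.add(cur)
            queue.extend(adj[cur] - comp)
        regions.append(comp)
    return regions
```

With stable ops {A, B, D} and a chain A-B-C-D in which C is a morphing operation, this yields two regions, {A, B} and {D}, since C severs the connection.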
According to another aspect of the embodiments of the invention, a compilation optimization apparatus for a machine learning model is also provided, comprising: a processing module, configured to run a first machine learning model multiple times to process a set of warm-up data, obtain the shapes of the input and output tensors of each operation in a first set of operations during those runs, and compare whether those shapes change across the runs, wherein the first machine learning model comprises the first set of operations, and the operations in the first set are operations that the target compiler is allowed to process; a determining module, configured to determine each operation whose input and output tensor shapes both remain unchanged across the multiple runs to be a stable operation; a dividing module, configured to divide the stable operations into one or more compilation regions in the case that the first set of operations includes multiple stable operations; and a merging module, configured to merge, by the target compiler, the stable operations in a target compilation region of the first machine learning model to obtain a second machine learning model, wherein the target compilation region is one of the one or more compilation regions and contains multiple stable operations, the second machine learning model comprises a second set of operations, and the number of operations in the second set is smaller than the number of operations in the first set.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the compiling optimization method of the machine learning model when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the compiling optimization method of the machine learning model through the computer program.
In the embodiments of the invention, a first machine learning model is run multiple times to process a set of warm-up data; the shapes of the input and output tensors of each operation in a first set of operations are obtained during those runs, and whether those shapes change across the runs is compared; each operation whose input and output tensor shapes do not change across the runs is determined to be a stable operation; in the case that the first set of operations includes multiple stable operations, the stable operations are divided into one or more compilation regions; and the stable operations in a target compilation region of the first machine learning model are merged by the target compiler to obtain a second machine learning model, wherein the target compilation region is one of the one or more compilation regions and contains multiple stable operations, the second machine learning model comprises a second set of operations, and the number of operations in the second set is smaller than the number of operations in the first set. By introducing tensor information sampled at runtime into compilation and optimizing the machine learning model accordingly, the technical effect of improving the compilation efficiency of the machine learning model is achieved, and the technical problems of long compilation cycles and wasted resources caused by not introducing runtime sampling information during compilation in machine learning are solved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment for a method of compilation optimization of a machine learning model according to an embodiment of the invention;
FIG. 2 is a diagram of an application framework of an alternative method for compiling optimization of a machine learning model, according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a method for compilation optimization of a machine learning model according to an embodiment of the present invention;
FIG. 4 is a first schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 5 is a second schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 6 is a third schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 7 is a fourth schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 8 is a fifth schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 9 is a sixth schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 10 is a seventh schematic diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 11 is a block diagram of an apparatus for compiling optimization of a machine learning model according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to one aspect of the embodiments of the present invention, a compilation optimization method for a machine learning model is provided. Optionally, as an optional implementation, the method may be applied, but is not limited, to the environment shown in fig. 1. As shown in fig. 1, a server 102 is connected to a terminal 104 via a network, which includes, but is not limited to, a wide area network, a metropolitan area network, or a local area network; the terminal 104 may be, but is not limited to, a PC, a mobile phone, a tablet computer, or the like. The compilation optimization method of the machine learning model according to the embodiments of the present invention may be executed by the server 102, by the terminal 104, or jointly by the server 102 and the terminal 104.
As an alternative embodiment, the compiling optimization method of the machine learning model can be applied to an optimization framework as shown in fig. 2.
The above is merely an example, and this is not limited in this embodiment.
Optionally, as an optional implementation manner, as shown in fig. 3, the compiling optimization method of the machine learning model includes the following steps:
Step S302: run a first machine learning model multiple times to process a set of warm-up data, obtain the shapes of the input and output tensors of each operation in a first set of operations during those runs, and compare whether those shapes change across the runs, wherein the first machine learning model comprises the first set of operations, and the operations in the first set are operations that the target compiler is allowed to process;
Step S304: determine each operation whose input and output tensor shapes both remain unchanged across the multiple runs to be a stable operation;
Step S306: in the case that the first set of operations includes multiple stable operations, divide the stable operations into one or more compilation regions;
Step S308: merge, by the target compiler, the stable operations in a target compilation region of the first machine learning model to obtain a second machine learning model, wherein the target compilation region is one of the one or more compilation regions and contains multiple stable operations, the second machine learning model comprises a second set of operations, and the number of operations in the second set is smaller than the number of operations in the first set.
As an alternative embodiment, a tensor is a data structure that can be understood as a vector or a multidimensional array. Generally, a one-dimensional tensor is a vector, and a tensor of two or more dimensions is an array (matrix). The shape of a tensor refers to the dimensions of the array, such as an m × n array.
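As a concrete illustration of tensor shape, the shape of a nested Python list can be computed the same way. This helper is a plain-Python stand-in for the example, not part of the patent:

```python
def shape(tensor):
    """Return the shape of a (regular) nested list as a tuple of dimensions."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]       # assumes a regular (non-ragged) tensor
    return tuple(dims)

vector = [1.0, 2.0, 3.0]                  # 1-D tensor: a vector, shape (3,)
matrix = [[0.0] * 4 for _ in range(3)]    # 2-D tensor: a 3 x 4 matrix
```

Two tensors with the same shape (e.g. both 3 × 4) compare as shape-unchanged in the method above, even if their element values differ.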
Optionally, the method further includes: obtaining a first machine learning model to be compiled, wherein the first machine learning model comprises a set of warm-up operations, and the set of warm-up operations together with their input and output tensors form a first computation graph; determining a third operation in the first computation graph that is not supported by the target compiler; and determining the operations in the set of warm-up operations other than the third operation to be the first set of operations.
As an alternative embodiment, fig. 4 is a block diagram of an alternative first computation graph. In this embodiment, operations A, B, C, and D, together with the input and output tensors of these operations (tensors A, B, C, D, E, F, G), constitute the first computation graph of the first machine learning model in fig. 4. In the first computation graph, operations A, B, C, and D are connected in sequence, and the output tensor D of operation A serves as the input tensor of operation B.
As an alternative embodiment, the target compiler may be a JIT compiler. A JIT compiler supports some operations and not others. For example, if operations A, B, C, and D in the first computation graph shown in fig. 4 are all operations the JIT compiler supports compiling, the first set of operations of the first machine learning model includes operations A, B, C, and D. For another example, if the JIT compiler supports compiling only some of the operations in the first computation graph shown in fig. 4, say operations A, B, and C but not operation D, then the first set of operations includes operations A, B, and C, while operation D, which the JIT compiler does not support, may first be partitioned into its own separate compilation area.
As an optional implementation, by running the first machine learning model multiple times, the input and output tensors of each operation in the first computation graph can be obtained for each run, and the tensors obtained in different runs can be compared to determine the stable operations and morphing operations in the first set of operations. In this embodiment, taking the first computation graph shown in fig. 4 as an example, assume that operations A, B, C, and D are all supported by the target compiler, so the first set of operations includes operations A, B, C, and D. When the first machine learning model is run for the first time, the input and output tensors of each operation in the first set are tensors A1, B1, C1, D1, E1, G1, and F1, respectively; fig. 5 shows the structure of the first computation graph during the first run. When the model is run for the second time, the input and output tensors are tensors A2, B2, C2, D2, E2, G2, and F2, respectively; fig. 6 shows the structure of the first computation graph during the second run. Comparing the input and output tensors of each operation between the first and second runs determines whether the shape of that operation's input or output tensor has changed. For example, for operation A in figs. 5 and 6, tensor A1 and tensor A2 are compared to determine whether their shapes are the same: if they are, the shape of input tensor A of operation A has not changed; if they differ, it has changed. By analogy, after comparing the corresponding input and output tensors of the corresponding operations in figs. 5 and 6 one by one, it can be determined whether the input and output tensors of each operation in the first set change during the multiple runs of the first machine learning model.
As an alternative embodiment, an operation in which the shapes of both the input and output tensors do not change across the multiple runs is taken as a stable operation. If the shape of any input or output tensor has changed, each operation that takes the changed tensor as input or output is determined to be a morphing operation. In this embodiment, for example, in figs. 5 and 6, if tensors E1 and E2 have different shapes, then operation B, which produces tensor E as output, is determined to be a morphing operation, and operation C, which takes tensor E as input, is also determined to be a morphing operation. For the determination of a stable operation, taking two runs of the first machine learning model as an example: for operation A in figs. 5 and 6, tensor A1 has the same shape as tensor A2, tensor B1 the same shape as tensor B2, and tensor D1 the same shape as tensor D2, so operation A is determined to be a stable operation.
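The two-run classification in this example can be written out directly. The shape values below are illustrative assumptions matching the scenario where only tensor E changes between figs. 5 and 6; the mapping from operations to their tensor names is likewise invented for the sketch.

```python
def classify(op_tensors, shapes_run1, shapes_run2):
    """Label each op 'stable' or 'morphing' from two recorded runs.

    op_tensors: op -> list of its input/output tensor names.
    shapes_run1 / shapes_run2: tensor name -> shape tuple for each run.
    """
    result = {}
    for op, tensors in op_tensors.items():
        changed = any(shapes_run1[t] != shapes_run2[t] for t in tensors)
        result[op] = "morphing" if changed else "stable"
    return result
```

A single changed tensor marks every operation touching it, which is why both the producer (operation B) and the consumer (operation C) of tensor E become morphing operations.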
As an optional implementation, the stable operations in the first set of operations are merged and optimized within the same compilation region according to the static information of the first computation graph, where the static information represents the connection relationships between the operations in the first computation graph as well as device and environment information. In this embodiment, taking operations A, B, C, and D in the first computation graph shown in fig. 4 as an example, if operations A and B are determined to be stable operations after the first machine learning model has been run multiple times, the optimized second computation graph shown in fig. 7 is obtained.
Optionally, running the first machine learning model multiple times to process a set of warm-up data, obtaining the shapes of the input and output tensors of each operation in the first set of operations, and comparing whether those shapes change, comprises: for each run of the first machine learning model, recording the input and output tensors of each operation in the first set of operations; and, starting from the second run of the first machine learning model, each time the model is run, comparing the shapes of the currently recorded input and output tensors of each operation in the first set with the shapes recorded in the previous run.
As an optional implementation, the first machine learning model is run once to obtain the input and output tensors of each operation in the first group of operations; the obtained tensors are compared with the shapes of the corresponding tensors from the previous run to determine whether the input and output tensors of each operation have changed, thereby obtaining model running information. According to this information, an operation whose input and output tensor shapes do not change across the multiple runs of the first machine learning model is determined to be a stable operation, and an operation whose input and/or output tensor shape changes is determined to be a morphing operation. In this embodiment, taking the first computation graph shown in fig. 4 as an example, assume that operations A, B, C, and D are all operations supported by the target compiler, and assume the shapes of the tensors in the first computation graph collected during the first run of the first machine learning model are TensorShapeA1, TensorShapeB1, TensorShapeC1, TensorShapeD1, TensorShapeE1, TensorShapeF1, and TensorShapeG1, respectively. Here TensorShapeA1 represents the shape of the input tensor of operation A at the first run, and TensorShapeD1 represents the shape of the output tensor of operation A, which is also the input tensor of operation B, at the first run. By analogy, TensorShapeB1, TensorShapeC1, TensorShapeE1, TensorShapeF1, and TensorShapeG1 represent the shapes of the input or output tensors of the other operations in the first computation graph.
During the second run of the first machine learning model, the shapes of the input and output tensors of the operations collected in the first computation graph are TensorShapeA2, TensorShapeB2, TensorShapeC2, TensorShapeD2, TensorShapeE2, TensorShapeF2, and TensorShapeG2, respectively. For example, comparing TensorShapeA1 with TensorShapeA2 determines whether tensor A morphs. If the first machine learning model is run a third time, the shapes collected during the third run are TensorShapeA3, TensorShapeB3, TensorShapeC3, TensorShapeD3, TensorShapeE3, TensorShapeF3, and TensorShapeG3, respectively. Taking the input tensor A of operation A as an example, if TensorShapeA1 == TensorShapeA2 && TensorShapeA1 == TensorShapeA3, that is, TensorShapeA1, TensorShapeA2, and TensorShapeA3 are identical in shape (all m × n dimensions), then tensor A is determined to be a shape-stable tensor. If the input tensor A, the input tensor B, and the output tensor D of operation A are all shape-stable tensors, operation A is determined to be a stable operation, that is, a target operation.
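The per-tensor check just described can be sketched as two small helpers. The names and the `shape_log` layout are illustrative assumptions; the point is only that an operation is stable exactly when every one of its input and output tensors is shape-stable across all recorded runs:

```python
# Illustrative sketch: a tensor is shape-stable if its recorded shape is
# identical in every run; an operation is stable when all of its input and
# output tensors are shape-stable.
def tensor_is_stable(shapes_per_run):
    return all(s == shapes_per_run[0] for s in shapes_per_run[1:])

def op_is_stable(op_tensor_names, shape_log):
    # shape_log: {tensor_name: [shape_in_run_1, shape_in_run_2, ...]}
    return all(tensor_is_stable(shape_log[t]) for t in op_tensor_names)
```

With three runs logged for tensors A, B, and D of operation A, `op_is_stable(["A", "B", "D"], shape_log)` mirrors the TensorShapeA1 == TensorShapeA2 && TensorShapeA1 == TensorShapeA3 test above.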
Optionally, running the first machine learning model multiple times to process a set of warm-up data, obtaining the shapes of the input and output tensors of each operation in the first set of operations, and comparing whether those shapes change includes: for each run of the first machine learning model, recording the input and output tensors of the operations in the first set of operations currently marked as unchanged, where every operation in the first set is marked as unchanged during the first run of the first machine learning model; starting from the second run, each time the first machine learning model is run, comparing the shapes of the currently recorded input and output tensors of the operations marked as unchanged with those recorded in the previous run, keeping the unchanged mark on operations whose input and output tensor shapes have not changed, and re-marking as changed any operation whose input and/or output tensor shape has changed. Determining an operation whose input and output tensor shapes never change during the multiple runs as a stable operation then includes: after running the first machine learning model multiple times, determining the operations marked as changed to be morphing operations and the operations marked as unchanged to be stable operations.
As an alternative embodiment, in the first run of the first machine learning model, all operations are marked as unchanged. Starting from the second run, the input and output tensors of each operation in the first group are compared with those of the corresponding operation in the previous run, and if an operation's input or output tensor changes, its mark is changed to changed. Each comparison covers only the operations still marked as unchanged. In the present embodiment, taking the first computation graphs shown in figs. 5 and 6 as an example, during the first run of the first machine learning model the input/output tensors A1, B1, C1, D1, E1, F1, and G1 of the operations shown in fig. 5 are recorded and all marked as unchanged. During the second run, the input and output tensors of the operations in fig. 6 are recorded as A2, B2, C2, D2, E2, F2, and G2, and each tensor's shape in the two runs is compared to determine whether it has changed; operations whose tensors changed are marked as changed. For example, if comparing A1 with A2 shows that the tensor's shape changed, the operation taking tensor A as input is marked as changed, and the next comparison covers only the tensors other than tensor A. This reduces the number of comparisons, simplifies the procedure, and saves resources.
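The marking scheme above, in which an operation marked as changed is excluded from later comparisons, could be sketched like this (class and attribute names are assumptions for illustration):

```python
# Sketch of the marking scheme: once an operation is marked "changed",
# subsequent runs skip it, reducing the number of shape comparisons.
class ShapeTracker:
    def __init__(self):
        self.last_shapes = {}   # op -> shapes recorded in the previous run
        self.changed = set()    # ops already marked as changed

    def record_run(self, shapes):
        # shapes: {op: tuple of input/output tensor shapes for this run}
        for op, shape in shapes.items():
            if op in self.changed:
                continue  # already marked changed; no further comparison
            prev = self.last_shapes.get(op)
            if prev is not None and prev != shape:
                self.changed.add(op)
            self.last_shapes[op] = shape

    def stable_ops(self):
        # After all warm-up runs, ops never marked changed are stable.
        return set(self.last_shapes) - self.changed
```

After the warm-up runs complete, `stable_ops()` yields the stable operations and `changed` the morphing operations, matching the final marking step described above.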
Optionally, merging, by the target compiler, the stable operations on the target compilation region in the first machine learning model includes: merging the stable operations on the target compiling region into one operation through the target compiler, and converting the input and output tensors of the stable operations into the input and output tensors of that one operation, to obtain the second machine learning model.
As an alternative embodiment, a compilation region may include one or more stable operations; for a compilation region that includes a plurality of stable operations, those operations may be merged into one operation, and the input and output tensors of the stable operations are converted into the input and output tensors of the merged operation.
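One way to picture the merge is that tensors produced and consumed entirely inside the region become internal to the merged operation, while the remaining tensors become its inputs and outputs. The sketch below, with assumed names, shows that bookkeeping only (the actual code generation is left to the target compiler):

```python
# Hypothetical sketch of merging one compilation region: internal tensors
# disappear, and the merged op inherits the region's external inputs/outputs.
def merge_region(region_ops, inputs_of, outputs_of):
    """region_ops: ops in one compilation region.
    inputs_of / outputs_of: {op: set of tensor names}."""
    all_in = set().union(*(inputs_of[o] for o in region_ops))
    all_out = set().union(*(outputs_of[o] for o in region_ops))
    internal = all_in & all_out          # produced and consumed inside
    merged_name = "_".join(sorted(region_ops))
    return merged_name, all_in - internal, all_out - internal
```

For a region holding operations A and B where A's output D feeds B, tensor D is internal, so the merged operation exposes only A's external inputs and B's output.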
Optionally, the set of warm-up data includes a plurality of warm-up data sets, wherein the running the first machine learning model a plurality of times processes the set of warm-up data, and comparing whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the running the first machine learning model a plurality of times includes: repeatedly executing the following steps for a plurality of times: respectively inputting a plurality of input tensors in one preheating data set into partial or all operations in the first group of operations, and running the first machine learning model once to obtain an input tensor and an output tensor of each operation in the first group of operations; comparing whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the plurality of runs of the first machine learning model.
As an alternative embodiment, the set of warm-up data includes a plurality of warm-up data sets, and one data set is input to the first machine learning model per run. For example, if the set of warm-up data includes data set A and data set B, data set A is input when the first machine learning model is run for the first time, and data set B is input for the second run. In this embodiment, the tensor shapes may be partly the same and partly different between the data sets. For a given operation in the computation graph, if the shapes of the tensors it receives differ between iteration runs, the operation is a morphing operation. For example, when the first machine learning model is run the first time with data set A, the input tensor of operation A is tensor A1; when it is run the second time with data set B, the input tensor of operation A is B1. If A1 and B1 differ in shape, operation A can be determined to be a morphing operation.
Optionally, the running the first machine learning model multiple times processes a set of warm-up data, and comparing whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the running the first machine learning model multiple times includes: repeatedly executing the following steps for a plurality of times: respectively inputting a plurality of input tensors in the group of preheating data into partial or all operations in the first group of operations, and running the first machine learning model once to obtain an input tensor and an output tensor of each operation in the first group of operations; comparing whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the plurality of runs of the first machine learning model.
As an alternative, the same warm-up data is input for every run of the first machine learning model, i.e., the input data is identical across runs. The input tensors in the warm-up data are input to part or all of the operations in the first computation graph; after each run, the input and output tensors of each operation in the first computation graph are obtained, and their shapes are compared across runs. Operations whose shapes changed are determined to be morphing operations, and operations whose input and output tensor shapes are both unchanged are determined to be stable operations. For example, suppose warm-up data A is input for both runs of the first machine learning model. Taking operation A as an example, since the input tensors are identical in both runs, it is only necessary to determine whether the output tensor of operation A changes, that is, whether the tensor D1 output in the first run has the same shape as the tensor D2 output in the second run. If so, operation A is determined to be a stable operation; if not, operation A is determined to be a morphing operation.
Optionally, the method further comprises: acquiring running state information of each operation in the first group of operations in the process of processing a group of preheating data by running the first machine learning model for multiple times, wherein the running state information comprises running time and/or required running resources of each operation; and allocating corresponding running resources to each operation in the second group of operations according to the running state information.
As an optional implementation, the running state information collected while repeatedly executing the first machine learning model further includes the running time and running resources of each operation, and running resources are allocated to each operation in the optimized second group of operations according to the resources each operation requires; the allocated resources may be memory resources. In this embodiment, the operations included in the first computation graph of the first machine learning model may also be optimized in other ways: for example, operations with large resource requirements may be converted into operations with small resource requirements, and time-consuming operations may be converted into faster ones. Resource scheduling may likewise be optimized according to resource information such as the bandwidth and latency of the current network, the number of streaming processors in the device, the device model, and the size of the device's cache and/or video memory. For example, when network conditions are good, some computation-heavy operators may be converted into network-heavy operators; conversely, if network conditions are poor, the data may be compressed before transmission and decompressed after reception, trading higher computation cost for lower network consumption. As another example, a multiplication a × a requires two inputs; if analysis shows the two operands are identical, it can be converted into a single-input square-of-a operation, reducing the input storage requirement.
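The a × a example above is a simple operand-analysis rewrite, which could be sketched as follows (the tuple encoding of operations is an assumption for illustration):

```python
# Sketch of the operand-analysis rewrite: if both operands of a multiply
# are the same tensor, convert it into a single-input square operation,
# halving the input storage requirement for that operation.
def rewrite_mul(op):
    # op is encoded as ("mul", lhs, rhs); returns ("square", lhs) when
    # the operands coincide, otherwise the operation unchanged.
    kind, lhs, rhs = op
    if kind == "mul" and lhs == rhs:
        return ("square", lhs)
    return op
```

A graph-level pass would apply such a rewrite to every matching operation during the optimization step described above.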
Optionally, after the obtaining the second machine learning model, the method further comprises: inputting formal data to the second machine learning model, wherein the formal data includes a set of formal input tensors; and compiling the second machine learning model through the target compiler to obtain a processing result output by the second machine learning model after the second machine learning model processes the set of formal input tensors.
As an optional implementation, the second machine learning model is obtained after the first machine learning model is optimized. The formal data is input into the second machine learning model and compiled by the target compiler to obtain the output result.
Optionally, the dividing the plurality of the stable operations into one or more compiling regions includes: dividing the stable operation connected with each other in the stable operations into one compiling area to obtain the one or more compiling areas.
As an alternative embodiment, the first computation graph shown in fig. 8 is taken as an example, assuming that operations A, B, C, D, E, and F are all operations supported by JIT compilation and are connected in sequence. In this embodiment, if operations A and B are stable operations and the rest are morphing operations, the stable operations A and B are divided into the same compiling region and merged to obtain operation AB, and the resulting optimized second computation graph is shown in fig. 9.
If operations A, B, D, E, and F are stable operations, then operations A and B are divided into one compiling region and operations D, E, and F into another; operations A and B are merged into operation AB, and operations D, E, and F are merged into operation DEF. The optimized second computation graph is shown in fig. 10.
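The region partitioning above amounts to a connected-components pass restricted to stable operations: stable operations directly connected to each other land in one compiling region, while morphing operations (like C here) split the chain. A sketch under assumed names:

```python
# Sketch: partition stable operations into compiling regions by grouping
# stable ops that are directly connected in the computation graph.
def build_regions(stable, edges):
    """stable: set of stable op names; edges: iterable of (src, dst) pairs."""
    adj = {op: set() for op in stable}
    for s, d in edges:
        if s in stable and d in stable:   # edges through morphing ops split regions
            adj[s].add(d)
            adj[d].add(s)
    regions, seen = [], set()
    for op in sorted(stable):
        if op in seen:
            continue
        stack, region = [op], set()
        while stack:                      # flood-fill one connected component
            cur = stack.pop()
            if cur in region:
                continue
            region.add(cur)
            stack.extend(adj[cur] - region)
        seen |= region
        regions.append(region)
    return regions
```

For the chain A-B-C-D-E-F with C as a morphing operation, this yields the two regions {A, B} and {D, E, F} from the fig. 10 example.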
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided a compilation optimization device of a machine learning model for implementing the compilation optimization method of a machine learning model. As shown in fig. 11, the apparatus includes: a processing module 1102, configured to run a first machine learning model multiple times to process a set of warm-up data, obtain shapes of an input tensor and an output tensor of each operation in a first set of operations in the process of running the first machine learning model multiple times, and compare whether shapes of the input tensor and the output tensor of each operation in the first set of operations in the process of running the first machine learning model multiple times are changed, where the first machine learning model includes a first set of operations, and the operations in the first set of operations are operations that are allowed to be processed by a target compiler; a determining module 1104, configured to determine, as a stable operation, an operation in which shapes of both an input tensor and an output tensor are unchanged in the process of running the first machine learning model for the plurality of times; a dividing module 1106, configured to, if a plurality of the stable operations are included in the first set of operations, divide the plurality of the stable operations into one or more compilation regions; a merging module 1108, configured to merge, by the target compiler, the stable operations on a target compiling region in the first machine learning model to obtain a second machine learning model, where the target compiling region is a compiling region in the one or more compiling regions, the stable operations on the target compiling region are multiple, the second machine learning model includes a second group of operations, and a number of the operations in the second group of operations is smaller than a number of the operations in the first group of operations.
Optionally, the processing module includes: a first recording unit configured to record an input tensor and an output tensor of each operation in the first group of operations in the process of running the first machine learning model each time the first machine learning model is run; a first comparing unit configured to compare the input tensor and the output tensor of each operation in the first set of operations recorded at the current time with the shape of the input tensor and the output tensor of each operation in the first set of operations recorded at the last time, every time the first machine learning model is run, starting from the second time the first machine learning model is run.
Optionally, the processing module includes: a second recording unit, configured to record, for each run of the first machine learning model, the input and output tensors of the operations in the first set of operations that are marked as unchanged, where every operation in the first set is marked as unchanged during the first run of the first machine learning model; and, starting from the second run, each time the first machine learning model is run, to compare the shapes of the currently recorded input and output tensors of the operations marked as unchanged with those recorded in the previous run, keeping the unchanged mark on operations whose input and output tensor shapes have not changed and re-marking as changed any operation whose input and/or output tensor shape has changed. The determining module includes: a determining unit configured to determine, after running the first machine learning model multiple times, the operations marked as unchanged to be the stable operations.
Optionally, the merging module includes a merging unit, configured to merge the stable operation on the target compiling region into one operation through the target compiler, and convert an input tensor and an output tensor of the stable operation into an input tensor and an output tensor of the one operation, so as to obtain the second machine learning model.
Optionally, the set of pre-heating data includes a plurality of pre-heating data sets, and the processing module is further configured to repeatedly perform the following steps for a plurality of times: respectively inputting a plurality of input tensors in one preheating data set into partial or all operations in the first group of operations, and running the first machine learning model once to obtain an input tensor and an output tensor of each operation in the first group of operations; comparing whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the plurality of runs of the first machine learning model.
Optionally, the processing module is further configured to repeatedly execute the following steps for multiple times: respectively inputting a plurality of input tensors in the group of preheating data into partial or all operations in the first group of operations, and running the first machine learning model once to obtain an input tensor and an output tensor of each operation in the first group of operations; comparing whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the plurality of runs of the first machine learning model.
Optionally, the apparatus is further configured to obtain running state information of each operation in the first set of operations during the process of processing a set of preheating data by running the first machine learning model multiple times, where the running state information includes a running time and/or required running resources of each operation; and allocating corresponding running resources to each operation in the second group of operations according to the running state information.
Optionally, the apparatus is further configured to, after the obtaining of the second machine learning model, input formal data into the second machine learning model, where the formal data includes a set of formal input tensors; and compiling the second machine learning model through the target compiler to obtain a processing result output by the second machine learning model after the second machine learning model processes the set of formal input tensors.
Optionally, the dividing module is further configured to divide the stable operations connected to each other in the plurality of stable operations into one compiling region, so as to obtain the one or more compiling regions.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the compiling optimization method of the machine learning model, where the electronic device may be a terminal device or a server. The present embodiment takes the electronic device as a server as an example for explanation. As shown in fig. 12, the electronic device comprises a memory 1202 and a processor 1204, the memory 1202 having stored therein a computer program, the processor 1204 being arranged to perform the steps of any of the above-described method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
S1, a first machine learning model is run for a plurality of times to process a group of preheating data, the shape of an input tensor and an output tensor of each operation in a first group of operations in the process of running the first machine learning model for the plurality of times is obtained, and whether the shape of the input tensor and the output tensor of each operation in the first group of operations in the process of running the first machine learning model for the plurality of times is changed or not is compared, wherein the first machine learning model comprises a first group of operations, and the operations in the first group of operations are the operations which are allowed to be processed by a target compiler;
S2, determining an operation in which neither the shapes of the input tensor nor the output tensor change during the multiple running of the first machine learning model as a stable operation;
S3, dividing a plurality of stable operations into one or more compiling areas when the stable operations are included in the first group of operations;
S4, merging the stable operations on the target compiling region in the first machine learning model through the target compiler to obtain a second machine learning model, where the target compiling region is a compiling region in the one or more compiling regions, the stable operations on the target compiling region are multiple, the second machine learning model includes a second group of operations, and the number of operations in the second group of operations is less than the number of operations in the first group of operations.
Alternatively, as those skilled in the art will appreciate, the structure shown in fig. 12 is only an illustration and does not limit the structure of the electronic device; the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, or a Mobile Internet Device (MID). For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 12, or have a different configuration than shown in fig. 12.
The memory 1202 may be used to store software programs and modules, such as program instructions/modules corresponding to the method and apparatus for compiling and optimizing a machine learning model in the embodiment of the present invention, and the processor 1204 executes various functional applications and data processing by running the software programs and modules stored in the memory 1202, that is, implements the above-mentioned method for compiling and optimizing a machine learning model. The memory 1202 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1202 can further include memory located remotely from the processor 1204, which can be connected to a terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1202 may specifically be used to store, but is not limited to, information such as the input tensors and output tensors of the first set of operations and of each operation in the first set of operations. As an example, as shown in fig. 12, the memory 1202 may include, but is not limited to, the processing module 1102, the determining module 1104, the dividing module 1106, and the merging module 1108 of the compiling and optimizing device of the machine learning model. In addition, other module units in the compiling and optimizing device of the machine learning model may also be included, which are not described again in this example.
In addition, the electronic device further includes: a display 1208 and a connection bus 1210 for connecting the various modular components of the electronic device described above.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through a network communication. Nodes can form a Peer-To-Peer (P2P, Peer To Peer) network, and any type of computing device, such as a server, a terminal, and other electronic devices, can become a node in the blockchain system by joining the Peer-To-Peer network.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
S1, a first machine learning model is run for a plurality of times to process a group of preheating data, the shape of an input tensor and an output tensor of each operation in a first group of operations in the process of running the first machine learning model for the plurality of times is obtained, and whether the shape of the input tensor and the output tensor of each operation in the first group of operations in the process of running the first machine learning model for the plurality of times is changed or not is compared, wherein the first machine learning model comprises a first group of operations, and the operations in the first group of operations are the operations which are allowed to be processed by a target compiler;
S2, determining an operation in which neither the shapes of the input tensor nor the output tensor change during the multiple running of the first machine learning model as a stable operation;
S3, dividing a plurality of stable operations into one or more compiling areas when the stable operations are included in the first group of operations;
S4, merging the stable operations on the target compiling region in the first machine learning model through the target compiler to obtain a second machine learning model, where the target compiling region is a compiling region in the one or more compiling regions, the stable operations on the target compiling region are multiple, the second machine learning model includes a second group of operations, and the number of operations in the second group of operations is less than the number of operations in the first group of operations.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a division by logical function, and other divisions are possible in actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, units, or modules, and may be electrical or take another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements shall also fall within the protection scope of the present invention.

Claims (11)

1. A compilation optimization method for a machine learning model, comprising:
running a first machine learning model a plurality of times to process a set of warm-up data, obtaining shapes of an input tensor and an output tensor of each operation in a first set of operations during the plurality of runs of the first machine learning model, and comparing whether the shapes of the input tensor and the output tensor of each operation in the first set of operations change during the plurality of runs, wherein the first machine learning model comprises the first set of operations, and the operations in the first set of operations are operations that a target compiler is permitted to process;
determining, as a stable operation, an operation whose input-tensor and output-tensor shapes both remain unchanged during the plurality of runs of the first machine learning model;
in a case that the first set of operations comprises a plurality of the stable operations, dividing the plurality of the stable operations into one or more compilation regions;
merging, by the target compiler, the stable operations in a target compilation region into one operation, and converting the input tensors and output tensors of the stable operations into an input tensor and an output tensor of the one operation, to obtain a second machine learning model, wherein the target compilation region is one of the one or more compilation regions and contains a plurality of the stable operations, the second machine learning model comprises a second set of operations, and a number of operations in the second set of operations is smaller than a number of operations in the first set of operations.
2. The method of claim 1, wherein running the first machine learning model the plurality of times to process the set of warm-up data, obtaining the shapes of the input tensor and the output tensor of each operation in the first set of operations during the plurality of runs, and comparing whether the shapes change during the plurality of runs comprises:
each time the first machine learning model is run, recording the input tensor and the output tensor of each operation in the first set of operations during that run of the first machine learning model;
starting from the second run of the first machine learning model, each time the first machine learning model is run, comparing the shapes of the currently recorded input tensor and output tensor of each operation in the first set of operations with the shapes of the input tensor and output tensor of each operation recorded in the previous run.
3. The method of claim 1, wherein
running the first machine learning model the plurality of times to process the set of warm-up data, obtaining the shapes of the input tensor and the output tensor of each operation in the first set of operations during the plurality of runs, and comparing whether the shapes change during the plurality of runs comprises: for each run of the first machine learning model, recording the input tensors and output tensors of the operations in the first set of operations that are marked as unchanged, wherein each operation in the first set of operations is marked as unchanged during the first run of the first machine learning model; and, starting from the second run of the first machine learning model, each time the first machine learning model is run, comparing the shapes of the currently recorded input tensors and output tensors of the operations marked as unchanged with the shapes recorded in the previous run, and keeping the unchanged mark only on those operations whose input-tensor and output-tensor shapes have not changed;
and wherein determining, as a stable operation, an operation whose input-tensor and output-tensor shapes both remain unchanged during the plurality of runs comprises: after the first machine learning model has been run the plurality of times, determining the operations still marked as unchanged as the stable operations.
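The marking scheme of claim 3 — compare only operations still marked as unchanged, and drop the mark permanently once a shape differs — could be sketched as follows. The trace format and the function name are assumptions for illustration, not part of the claim.

```python
# Hedged sketch of claim 3's marking scheme: every operation starts
# marked "unchanged"; from the second run onward only still-marked
# operations are compared with the previous run, and an operation whose
# shapes differ loses the mark permanently.

def stable_ops_by_marking(shape_traces):
    """shape_traces: one dict per warm-up run, mapping
    op name -> (input_shape, output_shape)."""
    unchanged = set(shape_traces[0])      # all ops marked on run one
    previous = dict(shape_traces[0])
    for trace in shape_traces[1:]:        # runs two and onward
        for name in list(unchanged):
            if trace[name] != previous[name]:
                unchanged.discard(name)   # mark dropped, never re-added
        # only still-marked operations need recording for the next run
        previous = {n: trace[n] for n in unchanged}
    # Operations still marked after all runs are the stable operations.
    return unchanged
```

Compared with re-checking every operation on every run, this scheme shrinks the comparison (and recording) work as unstable operations are eliminated.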
4. The method of claim 1, wherein the set of warm-up data comprises a plurality of warm-up data sets, and wherein running the first machine learning model the plurality of times to process the set of warm-up data and comparing whether the shapes of the input tensor and the output tensor of each operation in the first set of operations change during the plurality of runs comprises:
repeatedly performing the following steps a plurality of times: inputting a plurality of input tensors from one of the warm-up data sets into some or all of the operations in the first set of operations, and running the first machine learning model once to obtain the input tensor and the output tensor of each operation in the first set of operations;
comparing whether the shapes of the input tensor and the output tensor of each operation in the first set of operations change during the plurality of runs of the first machine learning model.
5. The method of claim 1, wherein running the first machine learning model the plurality of times to process the set of warm-up data and comparing whether the shapes of the input tensor and the output tensor of each operation in the first set of operations change during the plurality of runs comprises:
repeatedly performing the following steps a plurality of times: inputting a plurality of input tensors from the set of warm-up data into some or all of the operations in the first set of operations, and running the first machine learning model once to obtain the input tensor and the output tensor of each operation in the first set of operations;
comparing whether the shapes of the input tensor and the output tensor of each operation in the first set of operations change during the plurality of runs of the first machine learning model.
6. The method according to any one of claims 1 to 5, further comprising:
acquiring running state information of each operation in the first set of operations during the plurality of runs of the first machine learning model that process the set of warm-up data, wherein the running state information comprises the running time and/or the required running resources of each operation;
and allocating corresponding running resources to each operation in the second set of operations according to the running state information.
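As an illustration of claim 6, one simple policy is to give each operation a resource budget proportional to the running time observed during warm-up. The proportional model and every name below are assumptions for the sketch, not a policy stated in the claim.

```python
# Hedged illustration of claim 6: record each operation's running time
# during the warm-up runs, then allocate running resources to the
# operations in the second set in proportion to that time.

def allocate_resources(op_runtimes_ms, total_budget):
    """op_runtimes_ms: op name -> average warm-up running time in ms.
    Returns op name -> share of total_budget, proportional to time."""
    total_time = sum(op_runtimes_ms.values())
    return {name: total_budget * t / total_time
            for name, t in op_runtimes_ms.items()}
```

An operation (or merged region) that consumed three quarters of the warm-up time would receive three quarters of the budget under this policy.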
7. The method of any of claims 1 to 5, wherein after obtaining the second machine learning model, the method further comprises:
inputting formal data into the second machine learning model, wherein the formal data comprises a set of formal input tensors;
and compiling the second machine learning model by the target compiler to obtain a processing result output by the second machine learning model after the second machine learning model processes the set of formal input tensors.
8. The method according to any one of claims 1 to 5, wherein dividing the plurality of the stable operations into one or more compilation regions comprises:
dividing the stable operations that are connected with each other among the plurality of the stable operations into one compilation region, to obtain the one or more compilation regions.
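Grouping mutually connected stable operations, as in claim 8, amounts to finding the connected components of the subgraph induced by the stable operations. The sketch below assumes a simple edge-list representation of the model graph; all names are illustrative, not from the patent.

```python
# Hedged sketch of claim 8: keep only edges whose endpoints are both
# stable operations, then return one compilation region per connected
# component of that subgraph (breadth-first traversal).

from collections import deque

def split_into_regions(stable_ops, edges):
    """stable_ops: set of op names; edges: iterable of (src, dst) pairs
    from the model graph. Returns a list of regions (sets of op names)."""
    adj = {op: set() for op in stable_ops}
    for a, b in edges:
        if a in stable_ops and b in stable_ops:
            adj[a].add(b)
            adj[b].add(a)
    regions, seen = [], set()
    for op in stable_ops:
        if op in seen:
            continue
        region, queue = set(), deque([op])
        while queue:
            cur = queue.popleft()
            if cur in seen:
                continue
            seen.add(cur)
            region.add(cur)
            queue.extend(adj[cur] - seen)
        regions.append(region)
    return regions
```

An unstable operation sitting between two stable ones thus splits them into separate regions, which matches the claim: only stable operations connected to each other share a compilation region.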
9. An apparatus for compiling and optimizing a machine learning model, comprising:
the processing module is configured to run a first machine learning model a plurality of times to process a set of warm-up data, obtain the shapes of an input tensor and an output tensor of each operation in a first set of operations during the plurality of runs of the first machine learning model, and compare whether the shapes of the input tensor and the output tensor of each operation in the first set of operations change during the plurality of runs, wherein the first machine learning model comprises the first set of operations, and the operations in the first set of operations are operations that a target compiler is permitted to process;
a determining module, configured to determine, as a stable operation, an operation whose input-tensor and output-tensor shapes both remain unchanged during the plurality of runs of the first machine learning model;
a dividing module, configured to divide a plurality of the stable operations into one or more compilation regions in a case that the first set of operations includes the plurality of the stable operations;
a merging module, configured to merge, by the target compiler, the stable operations in a target compilation region into one operation, and convert the input tensors and output tensors of the stable operations into an input tensor and an output tensor of the one operation, to obtain a second machine learning model, wherein the target compilation region is one of the one or more compilation regions and contains a plurality of the stable operations, the second machine learning model comprises a second set of operations, and a number of operations in the second set of operations is smaller than a number of operations in the first set of operations.
10. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 8.
11. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 8 by means of the computer program.
CN202010366299.8A 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model Active CN111580826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366299.8A CN111580826B (en) 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010366299.8A CN111580826B (en) 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model

Publications (2)

Publication Number Publication Date
CN111580826A CN111580826A (en) 2020-08-25
CN111580826B true CN111580826B (en) 2021-08-06

Family

ID=72112032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366299.8A Active CN111580826B (en) 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model

Country Status (1)

Country Link
CN (1) CN111580826B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633516B (en) * 2020-12-18 2023-06-27 上海壁仞智能科技有限公司 Performance prediction and machine learning compiling optimization method and device
CN115495095B (en) * 2022-11-18 2023-03-21 上海燧原科技有限公司 Whole program compiling method, device, equipment, medium and cluster of tensor program

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106097A (en) * 2013-03-12 2013-05-15 无锡江南计算技术研究所 Stack operation optimization method in just-in-time compiling system
CN108228187A (en) * 2018-01-02 2018-06-29 南京大学 A kind of global optimization method of mathematical program
CN110659728A (en) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 Neural network optimization method and device, computer equipment and storage medium
CN110717584A (en) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 Neural network compiling method, compiler, computer device, and readable storage medium
CN110851787A (en) * 2020-01-14 2020-02-28 中科寒武纪科技股份有限公司 Merging instruction processing method and device, electronic equipment and storage medium
US10592213B2 (en) * 2016-10-19 2020-03-17 Intel Corporation Preprocessing tensor operations for optimal compilation

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105511867B (en) * 2015-11-30 2019-04-23 华为技术有限公司 A kind of Optimizing Mode automatic generation method and optimization device
US11636327B2 (en) * 2017-12-29 2023-04-25 Intel Corporation Machine learning sparse computation mechanism for arbitrary neural networks, arithmetic compute microarchitecture, and sparsity for training mechanism
US10628138B2 (en) * 2018-02-09 2020-04-21 International Business Machines Corporation Automated management of undesired code use based on predicted valuation and risk analysis
US11256481B2 (en) * 2018-06-03 2022-02-22 Apple Inc. Software development environment with compilation and read-evaluate-print-loop operations
CN109656566B (en) * 2018-12-14 2020-01-10 中科寒武纪科技股份有限公司 Method for obtaining executable file of heterogeneous computing system, operation method and related products
CN110119275B (en) * 2019-05-13 2021-04-02 电子科技大学 Distributed memory column type database compiling executor architecture

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103106097A (en) * 2013-03-12 2013-05-15 无锡江南计算技术研究所 Stack operation optimization method in just-in-time compiling system
US10592213B2 (en) * 2016-10-19 2020-03-17 Intel Corporation Preprocessing tensor operations for optimal compilation
CN108228187A (en) * 2018-01-02 2018-06-29 南京大学 A kind of global optimization method of mathematical program
CN110659728A (en) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 Neural network optimization method and device, computer equipment and storage medium
CN110717584A (en) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 Neural network compiling method, compiler, computer device, and readable storage medium
CN110851787A (en) * 2020-01-14 2020-02-28 中科寒武纪科技股份有限公司 Merging instruction processing method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Compilation optimizers for deep learning model inference optimization" ("深度学习模型inference优化之编译优化器"); 百思视界; 《https://zhuanlan.zhihu.com/p/117674377》; 2020-03-26; pp. 1-8 *

Also Published As

Publication number Publication date
CN111580826A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
Date et al. GPU-accelerated Hungarian algorithms for the linear assignment problem
JP5298117B2 (en) Data merging in distributed computing
US8099584B2 (en) Methods for scalably exploiting parallelism in a parallel processing system
CN111580826B (en) Compiling optimization method and device of machine learning model
CN103838626A (en) Data processing device and method for processing serial tasks
CN112925587A (en) Method and apparatus for initializing applications
González-Domínguez et al. Parallel feature selection for distributed-memory clusters
CN111580827B (en) Compiling optimization method and device of machine learning model
CN115858205A (en) Memory blackboard mechanism-based simulation component interaction method, device and equipment
CN115437637A (en) Compiling method and related device
Mayer Future trends in model management systems: parallel and distributed extensions
CN110018831B (en) Program processing method, program processing apparatus, and computer-readable storage medium
CN111580828B (en) Compiling optimization method and device of machine learning model
Ngai et al. Granula: Toward fine-grained performance analysis of large-scale graph processing platforms
Walker et al. Composing and executing parallel data-flow graphs with shell pipes
de Mello Junior et al. Fluid computing: Interest-based communication in dataflow/multiset rewriting computing
US12021997B2 (en) Blockchain tokenization of aircraft and other complex machinery
US11714614B2 (en) Code generation tool for cloud-native high-performance computing
Vallejo Mancero et al. Design and evaluation of a heuristic optimization tool based on evolutionary grammars using PSoCs
Gurgul et al. Cloud-native alternating directions solver for isogeometric analysis
Pal et al. GSFRS: Giant Signal File Random Sampler
Pathak Towards automated design of multicomputer system for real-time applications (architecture, task division)
CN114625377A (en) Frame item conversion method, frame item conversion device, equipment and storage medium
Verma et al. Network motif analysis in clouds-subgraph enumeration with iterative hadoop mapreduce
Wang et al. Extending τ-Lop to model MPI blocking primitives on shared memory

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027368

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant