CN111580828A - Compiling optimization method and device of machine learning model - Google Patents

Compiling optimization method and device of machine learning model

Info

Publication number
CN111580828A
CN111580828A (application CN202010366380.6A)
Authority
CN
China
Prior art keywords
operations
subset
machine learning
learning model
tensor
Prior art date
Legal status
Granted
Application number
CN202010366380.6A
Other languages
Chinese (zh)
Other versions
CN111580828B (en)
Inventor
姜曦楠
朱子霖
周飞虎
郭振宇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010366380.6A
Publication of CN111580828A
Application granted
Publication of CN111580828B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/44Encoding
    • G06F8/443Optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Devices For Executing Special Programs (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a compiling optimization method and device for a machine learning model. The method comprises the following steps: running a first machine learning model to process N data sets respectively to obtain N groups of model operation information; determining N operation sets in a first group of operations according to the N groups of model operation information; determining, according to the N operation sets, a plurality of operation subsets in the first group of operations that are allowed to be compiled separately; and compiling the operation subset corresponding to each compilation region with a target compiler. The invention solves the technical problems of long compilation cycles and wasted resources caused by the failure to introduce runtime sampling information in the machine learning compilation process.

Description

Compiling optimization method and device of machine learning model
Technical Field
The invention relates to the field of artificial intelligence, in particular to a compiling optimization method and device of a machine learning model.
Background
Compilation optimization in machine learning systems relies on two main techniques: ahead-of-time compilation and just-in-time compilation. Existing machine learning systems do not introduce runtime sampling information when compiling. Because changes in tensor shapes at runtime are not sampled, a change in the tensor shapes of some operations across multiple iterations causes most operations of the computation graph to be recompiled. Such recompilation is very time-consuming and memory-consuming.
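The recompilation cost described above can be illustrated with a minimal, hypothetical sketch of a shape-keyed compilation cache (the cache design and all names below are illustrative, not the patent's implementation): a JIT-style compiler typically caches compiled kernels keyed by tensor shape, so every previously unseen shape triggers a fresh, expensive compile.

```python
compile_count = 0

def jit_compile(op_name, shape):
    """Stand-in for an expensive JIT compile step; counts invocations."""
    global compile_count
    compile_count += 1
    return f"kernel[{op_name}{shape}]"

cache = {}

def run_op(op_name, shape):
    """Look up a compiled kernel by (op, shape); recompile on a cache miss."""
    key = (op_name, shape)
    if key not in cache:
        cache[key] = jit_compile(op_name, shape)
    return cache[key]

# An op whose input shape drifts across iterations recompiles on every step,
# while a fixed-shape op compiles exactly once and then hits the cache.
for batch in (32, 33, 34, 35):
    run_op("matmul", (batch, 128))   # 4 distinct shapes -> 4 compiles
    run_op("relu", (128,))           # 1 shape -> 1 compile

print(compile_count)  # 5
```

Sampling shape changes ahead of time, as the method below does, is precisely what lets a compiler avoid placing shape-unstable operations inside large compiled regions.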
For the problems in the related art of long compilation cycles and wasted resources caused by the failure to introduce runtime sampling information in the machine learning compilation process, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the invention provide a compiling optimization method and device for a machine learning model, so as to at least solve the technical problems of long compilation cycles and wasted resources caused by the failure to introduce runtime sampling information when compiling in machine learning.
According to an aspect of the embodiments of the present invention, a compiling optimization method for a machine learning model is provided, including: running a first machine learning model to process N data sets respectively to obtain N groups of model operation information, where for each data set the first machine learning model is run multiple times to obtain one group of model operation information, the first machine learning model includes a first group of operations, the operations in the first group of operations are operations allowed to be processed by a target compiler, and each group of model operation information indicates whether the shapes of the input tensor and output tensor of each operation in the first group of operations change during the multiple runs of the first machine learning model on one data set, N being a natural number greater than 1; determining N operation sets in the first group of operations according to the N groups of model operation information; determining, according to the N operation sets, a plurality of operation subsets in the first group of operations that are allowed to be compiled separately, where the plurality of operation subsets are set to be compiled separately by the target compiler; and compiling the operation subset corresponding to each compilation region with the target compiler respectively.
Optionally, each operation set includes a first operation subset and a second operation subset, where each first operation subset includes operations whose input-tensor and output-tensor shapes do not change during the multiple runs of the first machine learning model on one data set, and each second operation subset includes operations whose input-tensor and/or output-tensor shapes change during those runs. Determining, according to the N operation sets, the plurality of operation subsets in the first group of operations that are allowed to be compiled separately includes: for each of the N operation sets, performing the following, with each operation set treated in turn as the current operation set: determining whether a third operation subset exists in the first group of operations, where the operations in the third operation subset occur only in the second operation subset of the current operation set and do not occur in the second operation subsets of the N operation sets other than the current operation set; and, when the third operation subset exists, taking the third operation subset as one of the plurality of operation subsets.
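The third-subset test above can be sketched with plain set operations (an illustrative interpretation; all names are assumptions): given the morphing (second) subsets observed for each of the N data sets, an operation belongs to the third subset of the current set when it morphs there and nowhere else.

```python
def third_subset(second_subsets, current):
    """Ops that appear in the morphing subset of data set `current` only."""
    others = set().union(*(s for i, s in enumerate(second_subsets) if i != current))
    return second_subsets[current] - others

# Morphing (second) subsets observed on three warm-up data sets:
second_subsets = [{"A", "B"}, {"B", "C"}, {"B"}]

print(third_subset(second_subsets, 0))  # {'A'}: morphs only on data set 0
print(third_subset(second_subsets, 1))  # {'C'}: morphs only on data set 1
```

Such an operation's shape instability is tied to a single data set, so it can still be grouped into its own separately compiled subset.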
Optionally, each operation set includes a first operation subset and a second operation subset, where each first operation subset includes operations whose input-tensor and output-tensor shapes do not change during the multiple runs of the first machine learning model on one data set, and each second operation subset includes operations whose input-tensor and/or output-tensor shapes change during those runs. Determining, according to the N operation sets, the plurality of operation subsets in the first group of operations that are allowed to be compiled separately includes: taking the intersection of the first operation subsets of the N operation sets to obtain a fourth operation subset, where the fourth operation subset is one of the plurality of operation subsets.
Optionally, the method further comprises: taking the intersection of the second operation subsets of the N operation sets to obtain a fifth operation subset; and setting the operations in the fifth operation subset to forgo compilation by the target compiler.
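The fourth and fifth operation subsets described above can be sketched with set intersections (an illustrative interpretation with assumed names): intersecting the stable (first) subsets yields the operations stable on every data set, while intersecting the morphing (second) subsets yields the operations that morph on every data set, which are marked to skip the target compiler.

```python
# Stable (first) and morphing (second) subsets per warm-up data set:
first_subsets = [{"A", "B", "C"}, {"A", "B", "D"}, {"A", "B", "C", "E"}]
second_subsets = [{"D", "E"}, {"C", "E"}, {"D", "E"}]

# Fourth subset: stable on every data set -> safe to compile as one region.
fourth_subset = set.intersection(*first_subsets)

# Fifth subset: morphing on every data set -> forgo target-compiler compilation.
fifth_subset = set.intersection(*second_subsets)

print(sorted(fourth_subset), sorted(fifth_subset))  # ['A', 'B'] ['E']
```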
Optionally, determining, according to the N operation sets, the plurality of operation subsets in the first group of operations that are allowed to be compiled separately includes: determining whether a sixth operation subset exists in the first group of operations, where the operations in the sixth operation subset include: operations whose input-tensor and output-tensor shapes do not change during the multiple runs of the first machine learning model on the first of the N data sets but whose input-tensor and/or output-tensor shapes change during the processing of the other N-1 data sets; and operations whose input-tensor and/or output-tensor shapes change during the multiple runs of the first machine learning model on the Mth of the N data sets but do not change during the processing of the other N-1 data sets, where the Mth data set is a data set of the N data sets other than the first data set; and, when the sixth operation subset exists, taking the sixth operation subset as one of the plurality of operation subsets.
Optionally, determining, according to the N operation sets, the plurality of operation subsets in the first group of operations that are allowed to be compiled separately includes: determining whether a seventh operation subset exists in the first group of operations, where the operations in the seventh operation subset are operations whose input-tensor and/or output-tensor shapes change during the multiple runs of the first machine learning model on the first of the N data sets, but whose input-tensor and output-tensor shapes do not change during the processing of the other N-1 data sets; and, when the seventh operation subset exists, taking the seventh operation subset as one of the plurality of operation subsets.
Optionally, determining, according to the N operation sets, the plurality of operation subsets in the first group of operations that are allowed to be compiled separately includes: determining whether an eighth operation subset exists in the first group of operations, where the operations in the eighth operation subset are operations whose input-tensor and output-tensor shapes do not change during the multiple runs of the first machine learning model on the Mth of the N data sets, but whose input-tensor and/or output-tensor shapes change during the processing of the other N-1 data sets, where the Mth data set is a data set of the N data sets other than the first data set; and, when the eighth operation subset exists, taking the eighth operation subset as one of the plurality of operation subsets.
Optionally, determining, according to the N operation sets, the plurality of operation subsets in the first group of operations that are allowed to be compiled separately includes: determining whether a ninth operation subset exists in the first group of operations, where the operations in the ninth operation subset are operations whose input-tensor and/or output-tensor shapes change during the multiple runs of the first machine learning model on each of the N data sets; and, when the ninth operation subset exists, taking the ninth operation subset as one of the plurality of operation subsets.
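The sixth through ninth subsets above all classify operations by the pattern of data sets on which their tensor shapes changed. One hedged way to sketch this (the grouping strategy and names are illustrative assumptions, not the patent's exact rules) is to key each operation by a boolean morph pattern across the N data sets; operations sharing a pattern can form one separately compiled subset.

```python
def group_by_morph_pattern(second_subsets, all_ops):
    """Group ops by which data sets' morphing (second) subsets they appear in."""
    groups = {}
    for op in sorted(all_ops):
        pattern = tuple(op in s for s in second_subsets)
        groups.setdefault(pattern, set()).add(op)
    return groups

second_subsets = [{"A"}, {"A", "B"}, {"A", "B"}]  # morphing ops per data set
groups = group_by_morph_pattern(second_subsets, {"A", "B", "C"})

# (False, False, False): never morphs; (True, True, True): morphs everywhere; etc.
for pattern, ops in sorted(groups.items()):
    print(pattern, sorted(ops))
```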
Optionally, running the first machine learning model multiple times for each data set to obtain one group of model operation information includes repeating the following steps: inputting a plurality of input tensors in the data set to some or all of the operations in the first group of operations; modifying the values of one or more variables in the data set on each run of the first machine learning model, inputting the plurality of input tensors of the modified data set to some or all of the operations in the first group of operations on the next run, and running the first machine learning model once to obtain the input tensor and output tensor of each operation in the first group of operations; and comparing whether the shapes of the input tensor and output tensor of each operation in the first group of operations changed across the multiple runs, so as to obtain one group of model operation information.
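The warm-up loop just described can be sketched as follows (ToyDataset, the per-run variable modification, and all other names are illustrative assumptions): each run perturbs the data set's variables, feeds the resulting input tensors to the operations, and records the observed input/output shape pairs; an operation's shapes "changed" if more than one distinct pair was seen.

```python
def warm_up(run_ops, dataset, num_runs=3):
    observed = {}  # op -> set of (input_shape, output_shape) pairs seen
    for step in range(num_runs):
        inputs = dataset.make_inputs(step)      # variables modified per run
        for op, io_shapes in run_ops(inputs).items():
            observed.setdefault(op, set()).add(io_shapes)
    # One group of model operation information: op -> did its shapes change?
    return {op: len(pairs) > 1 for op, pairs in observed.items()}

class ToyDataset:
    def make_inputs(self, step):
        return {"x": (2 + step, 4)}             # input length varies per run

def toy_ops(inputs):
    in_shape = inputs["x"]
    return {
        "OpA": (in_shape, in_shape),            # shape follows the input: morphs
        "OpB": ((4, 4), (4, 4)),                # fixed shapes: stable
    }

print(warm_up(toy_ops, ToyDataset()))  # {'OpA': True, 'OpB': False}
```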
Optionally, the method further comprises: in the case that an operation subset allowed to be compiled separately includes a plurality of target operations, merging the target operations into one operation with the target compiler, retaining the operations in the first machine learning model other than the target operations, and converting the input tensors and output tensors of the target operations into the input tensors and output tensors of the merged operation to obtain a second machine learning model; or merging the target operations in the first machine learning model into a plurality of operations with the target compiler, retaining the operations in the first machine learning model other than the target operations, and converting the input tensors and output tensors of the target operations into the input tensors and output tensors of the merged operations to obtain a second machine learning model, where the number of merged operations is less than the number of target operations.
Optionally, compiling, with the target compiler, the operation subset corresponding to each compilation region includes: inputting formal data to the second machine learning model; and compiling the second machine learning model with the target compiler to obtain the processing result output after the second machine learning model processes the formal data.
According to another aspect of the embodiments of the present invention, a compiling optimization device for a machine learning model is also provided, including: a running module, configured to run a first machine learning model to process N data sets respectively to obtain N groups of model operation information, where for each data set the first machine learning model is run multiple times to obtain one group of model operation information, the first machine learning model includes a first group of operations, the operations in the first group of operations are operations allowed to be processed by a target compiler, and each group of model operation information indicates whether the shapes of the input tensor and output tensor of each operation in the first group of operations change during the multiple runs of the first machine learning model on one data set, N being a natural number greater than 1; a first determining module, configured to determine N operation sets in the first group of operations according to the N groups of model operation information; a second determining module, configured to determine, according to the N operation sets, a plurality of operation subsets in the first group of operations that are allowed to be compiled separately, where the plurality of operation subsets are set to be compiled separately by the target compiler; and a compiling module, configured to compile the operation subset corresponding to each compilation region with the target compiler.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, where the computer program is configured to execute the compiling optimization method of the machine learning model when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor is configured to execute the compiling optimization method of the machine learning model through the computer program.
In the embodiments of the present invention, a first machine learning model is run to process N data sets respectively, obtaining N groups of model operation information, the model being run multiple times for each data set to obtain one group of model operation information; the first machine learning model includes a first group of operations, the operations in the first group of operations are operations allowed to be processed by a target compiler, and each group of model operation information indicates whether the shapes of the input tensor and output tensor of each operation in the first group of operations change during the multiple runs of the first machine learning model on one data set, N being a natural number greater than 1. N operation sets are determined in the first group of operations according to the N groups of model operation information; a plurality of operation subsets allowed to be compiled separately are determined in the first group of operations according to the N operation sets; and the operation subset corresponding to each compilation region is compiled with the target compiler. By introducing tensor information of the machine learning model at compile time and optimizing the machine learning model, the compilation efficiency of the machine learning model is improved, which solves the technical problems of long compilation cycles and wasted resources caused by the failure to introduce runtime sampling information when compiling in machine learning.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment for a method of compilation optimization of a machine learning model according to an embodiment of the invention;
FIG. 2 is a diagram of an application framework of an alternative method for compiling optimization of a machine learning model, according to an embodiment of the present invention;
FIG. 3 is a flow diagram of a method for compilation optimization of a machine learning model according to an embodiment of the present invention;
FIG. 4 is a first diagram illustrating a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 5 is a diagram of a compilation optimization method of a machine learning model according to an alternative embodiment of the present invention;
FIG. 6 is a diagram of a method for compiling an optimization of a machine learning model according to an alternative embodiment of the invention;
FIG. 7 is a block diagram of an apparatus for compiling optimization of a machine learning model according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an alternative electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present invention, a compiling optimization method for a machine learning model is provided. Optionally, as an optional implementation, the method may be applied, but is not limited, to the environment shown in fig. 1. As shown in fig. 1, a server 102 is connected to a terminal 104 via a network, which includes but is not limited to a wide area network, a metropolitan area network, or a local area network; the terminal 104 includes but is not limited to a PC, a mobile phone, a tablet computer, and the like. The compiling optimization method of the machine learning model according to the embodiment of the present invention may be executed by the server 102, by the terminal 104, or by the server 102 and the terminal 104 together.
As an alternative embodiment, the compiling optimization method of the machine learning model can be applied to the optimization framework shown in fig. 2. In the framework shown in fig. 2, the input warm-up data includes N data sets; the warm-up data is fed through the not-yet-optimized computation graph, executing each operation in the graph. Run information is collected during execution, including tensor shapes, operation run time, input/output transfer time, the amount of memory and video memory occupied, and the like. The results of processing the warm-up data may generally be discarded. The warm-up may be performed over multiple iteration steps, collecting the run information of each operation at each step. Static-information preprocessing can extract static information of the computation graph, such as the device a node is planned to be assigned to, the topological neighborhood information of the node, the device, the running environment, and the like.
The information collected during warm-up running and the static information of the computation graph are analyzed together to delimit compilation regions and obtain an optimized computation graph; the shape-change information is consulted when dividing the compilation regions. After the compilation-region division is finished, a new optimized computation graph is obtained. At this point the formal graph is run, that is, the formal model is run and training or inference begins. Formal data is input, and each operation in the graph is executed according to the optimized computation graph to complete the model computation.
The above is merely an example, and this is not limited in this embodiment.
Optionally, as an optional implementation manner, as shown in fig. 3, the compiling optimization method of the machine learning model includes the following steps:
step S302, running a first machine learning model to process N data sets respectively to obtain N groups of model operation information, where for each data set the first machine learning model is run multiple times to obtain one group of model operation information, the first machine learning model includes a first group of operations, the operations in the first group of operations are operations allowed to be processed by a target compiler, and each group of model operation information indicates whether the shapes of the input tensor and output tensor of each operation in the first group of operations change during the multiple runs of the first machine learning model on one data set, N being a natural number greater than 1;
step S304, determining N operation sets in the first group of operations according to the N groups of model operation information;
step S306, according to the N operation sets, determining a plurality of operation subsets which are allowed to be compiled respectively in the first group of operations, wherein the plurality of operation subsets are set to be allowed to be compiled respectively by the target compiler;
step S308, compiling the operation subsets corresponding to each compiling region by using the target compiler.
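Steps S302 through S308 can be sketched end-to-end as follows (a simplified, hypothetical implementation with illustrative names; the subset derivation here keeps only the always-stable region and treats the remaining operations individually, which is far cruder than the region division the embodiments describe):

```python
def profile(run_model, dataset):
    # S302: run the model once per data-set variant; record shapes per op.
    shapes = {}
    for variant in dataset:
        for op, shape in run_model(variant).items():
            shapes.setdefault(op, set()).add(shape)
    return {op: len(s) > 1 for op, s in shapes.items()}  # True => shape changed

def partition(run_info):
    # S304: split this data set's ops into stable / morphing subsets.
    stable = {op for op, changed in run_info.items() if not changed}
    return stable, set(run_info) - stable

def derive_compile_subsets(op_sets):
    # S306: ops stable on every data set form one jointly compilable subset;
    # remaining ops become single-op regions (a deliberate simplification).
    always_stable = set.intersection(*(stable for stable, _ in op_sets))
    rest = set().union(*(morphing for _, morphing in op_sets))
    return [always_stable] + [{op} for op in sorted(rest)]

def toy_model(batch):
    # Ops A and B keep fixed shapes; op C's shape follows the batch size.
    return {"A": (4, 4), "B": (8, 8), "C": (batch, 4)}

datasets = [[1, 2, 3], [4, 5, 6]]  # two warm-up data sets, three runs each
op_sets = [partition(profile(toy_model, ds)) for ds in datasets]
subsets = derive_compile_subsets(op_sets)
print(subsets)  # S308 would hand each subset to the target compiler
```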
As an alternative embodiment, a tensor is a data structure that can be understood as a vector or an array matrix. Generally, a one-dimensional tensor is a vector, and a tensor of two or more dimensions is an array matrix. The shape of a tensor refers to the dimensions of the array, for example an m×n array.
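For instance, with NumPy (used here only as a familiar stand-in for the tensors of any machine learning framework):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])   # one-dimensional tensor: a vector
m = np.zeros((2, 3))            # two-dimensional tensor: a 2x3 array matrix

print(v.shape)  # (3,)
print(m.shape)  # (2, 3)
```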
As an optional implementation, each of the N operation sets includes a first operation subset and a second operation subset; each first operation subset includes operations whose input-tensor and output-tensor shapes do not change during the multiple runs of the first machine learning model on one data set, and each second operation subset includes operations whose input-tensor and/or output-tensor shapes change during those runs.
Optionally, the method further includes: obtaining a first machine learning model to be compiled, where the first machine learning model includes a group of warm-up operations, and the group of warm-up operations together with their input tensors and output tensors form a first computation graph; determining a third operation in the first computation graph that is not supported by the target compiler; and determining that the operations in the group of warm-up operations other than the third operation are the first group of operations.
As an alternative embodiment, fig. 4 is a block diagram of an alternative first computation graph. In this embodiment, operation A, operation B, operation C, and operation D, together with the input tensors and output tensors of the respective operations (tensors A, B, C, D, E, G, F), constitute the first computation graph of the first machine learning model in fig. 4. In the first computation graph, operation A, operation B, operation C, and operation D are connected in sequence, and the output tensor D of operation A serves as the input tensor of operation B.
As an alternative embodiment, the target compiler may be a JIT compiler. A JIT compiler supports some operations and does not support others. For example, when operation A, operation B, operation C, and operation D in the first computation graph shown in fig. 4 are all operations the JIT compiler supports compiling, the first group of operations of the first machine learning model includes: operation A, operation B, operation C, and operation D. As another example, when the JIT compiler supports compiling only some of the operations in the first computation graph shown in fig. 4, assuming it supports compiling OperationA, OperationB, and OperationC but does not support OperationD, the first group of operations of the first machine learning model includes: operation A, operation B, and operation C, while operation D, which the JIT compiler does not support, may first be partitioned into a separate compilation region.
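A sketch of this partition (the supported-operation list below is an assumption for illustration, not a real compiler's capability table):

```python
# Operations the (hypothetical) JIT compiler is assumed to support compiling:
SUPPORTED = {"OperationA", "OperationB", "OperationC"}

graph_ops = ["OperationA", "OperationB", "OperationC", "OperationD"]

first_group = [op for op in graph_ops if op in SUPPORTED]
unsupported = [op for op in graph_ops if op not in SUPPORTED]

print(first_group)   # ['OperationA', 'OperationB', 'OperationC']
print(unsupported)   # ['OperationD']: gets its own compilation region
```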
As an alternative embodiment, the warm-up data input to the first machine learning model includes N data sets that differ only in some of their variables. While each data set is processed over multiple runs, the input tensor and output tensor of each operation are sampled on every run. After the multiple runs on a data set, the sampled shapes are compared to determine whether the shape of each operation's input tensor or output tensor changed, yielding the model operation information corresponding to that data set. According to this model operation information, the operations in the first group of operations of the first machine learning model can be divided into two operation subsets. The first operation subset contains operations whose input and output tensors never change shape during the multiple runs on the data set; these may be called stable operations. The second operation subset contains operations whose input-tensor and/or output-tensor shapes change during those runs; these may be called morphing operations. After performing this procedure on each of the N data sets, the N groups of model operation information and the N operation sets corresponding to the N data sets are obtained.
As an optional implementation, according to the N operation sets, the group of operations in the first computation graph of the first machine learning model is divided into a plurality of operation subsets, each of which may contain multiple operations. The operations are merged and optimized, a corresponding compilation region is allocated to the operations in each operation subset, and a second computation graph of the second machine learning model, the optimized form of the first machine learning model, is obtained. Because the computation graph has been merged and optimized, compiling the optimized graph saves time and improves efficiency.
Optionally, each of the operation sets includes a first operation subset and a second operation subset. Each first operation subset includes the operations whose input and output tensor shapes do not change while the first machine learning model is run multiple times to process one of the data sets; each second operation subset includes the operations whose input and/or output tensor shapes change during that process. Determining, according to the N operation sets, the plurality of operation subsets allowed to be compiled respectively in the first group of operations then includes: performing the following for each of the N operation sets, the set being processed at each pass being regarded as the current operation set: determining whether a third operation subset exists in the first group of operations, where the operations in the third operation subset occur only in the second operation subset of the current operation set and do not occur in the second operation subset of any of the N operation sets other than the current operation set; and, in a case where the third operation subset exists, combining the third operation subset into one of the plurality of operation subsets.
As an alternative embodiment, the factors that drive the change in the shape of the input and/or output tensors differ from operation to operation in an operation set. Taking the computation graph shown in fig. 4 as an example, assume that operations A, B and C are all morphing operations, that the tensor shape changes of operations A and B are related only to variable x and occur at a low frequency, and that the tensor shape change of operation C is related only to variable y and likewise occurs at a low frequency. If operation A, operation B and operation C are fused into a new operation ABC, however, the tensor shape of operation ABC is related to both x and y. If x and y are uncorrelated, the tensor shape of operation ABC changes at a high frequency. For example, if x takes 120 different values and y takes 100 different values, the tensor shape of operation ABC may change 120 × 100 times, whereas the shapes of operations A and B, related only to x, change at most 120 times, and the shape of operation C, related only to y, changes at most 100 times. The number of recompilations after such a fusion is thus greatly increased. In this embodiment, merge optimization may therefore be performed on the operations in an operation set based on the factors that affect the changes in their tensor shapes.
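The arithmetic of the example above can be made explicit. The value counts 120 and 100 are the figures from the paragraph, used purely for illustration:

```python
# Illustrative recompilation arithmetic: if the fused operation ABC's shape
# depends on both x (120 distinct values) and y (100 distinct values), and
# x and y vary independently, the fused region can see up to 120 * 100
# distinct shape combinations, versus 120 + 100 total recompilations when
# the x-dependent and y-dependent operations are kept in separate regions.

x_values, y_values = 120, 100
recompiles_separate = x_values + y_values   # regions {A, B} and {C} apart
recompiles_fused = x_values * y_values      # a single fused region {A, B, C}
```

This is why the embodiment groups operations by the variables that drive their shape changes rather than fusing indiscriminately.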
As an optional embodiment, during the processing of each of the N data sets, one or more variables in the data set are changed, the changed data set is input each time the first machine learning model is run, the tensor shape information of every operation is sampled in each of the multiple runs, and the operations whose tensor shapes are affected by the changes in the data set are recorded. This embodiment is explained taking the computation graph shown in fig. 5 as an example, assuming that all operations included in the computation graph are supported by the target compiler. Assume that the tensor shapes of operation A, operation B and operation C change with the parameters x and y in data set 1; then the operations determined by data set 1 to have changing tensor shapes are operation A, operation B and operation C, and the remaining operations have stable tensor shapes, as shown in table 1 below. Assume further that the tensor shapes of operation C and operation D change with the parameter z in data set 2; then the operations determined by data set 2 to have changing tensor shapes are operation C and operation D, and the remaining operations have stable tensor shapes, as shown in table 2 below.
TABLE 1
  Data set 1:
    Tensor-shape-changing operations: operation A, operation B, operation C
    Tensor-shape-stable operations: operation D, operation E, operation F
TABLE 2
  Data set 2:
    Tensor-shape-changing operations: operation C, operation D
    Tensor-shape-stable operations: operation A, operation B, operation E, operation F
The tensor shape information of each operation in the first group of operations during the runs over data set 1 and data set 2 can be determined by combining table 1 and table 2 above, as shown in table 3 below.
TABLE 3
  Shape changes only in data set 1: operation A, operation B
  Shape changes only in data set 2: operation D
  Shape changes in both data sets: operation C
  Shape stable in both data sets: operation E, operation F
In this embodiment, the third operation subset may be the set consisting of operation A and operation B, or the set containing only operation D. Operation A and operation B, which change only under data set 1, are divided into one compilation area P, and operation D, which changes only under data set 2, is divided into another compilation area Q. Compilation area P and compilation area Q may be compiled separately, so that the tensor shapes of the two compilation areas do not change frequently.
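The third-subset rule above can be sketched with ordinary set arithmetic. The data mirrors table 3; the helper name is an illustrative assumption:

```python
# Hypothetical sketch: for each data set, collect the operations that morph
# ONLY under that data set. Table 3's example yields {A, B} (compilation
# area P) for data set 1 and {D} (compilation area Q) for data set 2.

def third_subsets(morphing_per_dataset):
    """morphing_per_dataset: list of sets of morphing ops, one per data set.
    Returns, per data set, the ops morphing only under that data set."""
    result = []
    for i, current in enumerate(morphing_per_dataset):
        # Union of every other data set's morphing operations.
        others = set().union(
            *(s for j, s in enumerate(morphing_per_dataset) if j != i))
        result.append(current - others)
    return result

p, q = third_subsets([{"A", "B", "C"}, {"C", "D"}])
```

Each resulting subset becomes its own compilation area, so a shape change in one data set's variables does not force recompilation of the other area.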
Optionally, each of the operation sets includes a first operation subset and a second operation subset. Each first operation subset includes the operations whose input and output tensor shapes do not change while the first machine learning model is run multiple times to process one of the data sets; each second operation subset includes the operations whose input and/or output tensor shapes change during that process. Determining, according to the N operation sets, the plurality of operation subsets allowed to be compiled respectively in the first group of operations then includes: taking the intersection of the first operation subsets of the N operation sets to obtain a fourth operation subset, where the fourth operation subset is one of the plurality of operation subsets.
As an alternative embodiment, the first operation subset contains the operations in which neither the input tensor nor the output tensor changes shape. In this embodiment, an operation whose input and output tensor shapes do not change during the multiple runs over any of the N data sets is an operation in the fourth operation subset; that is, the operations in the fourth operation subset are those whose input and output tensor shapes remain unchanged throughout the processing of all N data sets. In table 3 above, the set consisting of operation E and operation F is such a fourth operation subset. Operation E and operation F, which undergo no shape change in either warm-up run, are divided into a compilation area R.
Optionally, the method further includes: taking the intersection of the second operation subsets of the N operation sets to obtain a fifth operation subset; and setting the operations in the fifth operation subset to forgo compilation by the target compiler.
As an optional implementation, the fifth operation subset contains the operations whose tensor shapes (i.e., the shapes of the input and/or output tensors) change during the warm-up processing of every one of the N data sets. Such an operation is frequently affected by changes in its tensor shape, so compilation of such operations can be abandoned. Among the operations shown in table 3, operation C, whose shape changed in both warm-up runs, is excluded from compilation because its change frequency is too high. As an alternative embodiment, after merge optimization is performed on the operations included in the computation graph shown in fig. 5, the optimized computation graph shown in fig. 6 is obtained.
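The fourth- and fifth-subset rules above are both plain intersections. A minimal sketch, with data mirroring table 3:

```python
# Hedged sketch: the fourth subset is the intersection of the per-data-set
# stable subsets (compiled together as area R); the fifth subset is the
# intersection of the per-data-set morphing subsets (compilation abandoned).
from functools import reduce

stable_sets = [{"D", "E", "F"}, {"A", "B", "E", "F"}]   # one set per data set
morphing_sets = [{"A", "B", "C"}, {"C", "D"}]

area_r = reduce(set.intersection, stable_sets)      # stable in every run
abandoned = reduce(set.intersection, morphing_sets)  # morphs in every run
```

In table 3's example this reproduces area R = {E, F} and marks operation C as not worth compiling.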
Optionally, determining, according to the N operation sets, the plurality of operation subsets allowed to be compiled respectively in the first group of operations includes: determining whether a sixth operation subset exists in the first group of operations, where the operations in the sixth operation subset include: operations whose input and output tensor shapes do not change while the first machine learning model is run multiple times to process the first data set of the N data sets, but whose input and/or output tensor shapes change while the other N-1 data sets are processed; and operations whose input and/or output tensor shapes change while the Mth data set of the N data sets is processed, but whose input and output tensor shapes do not change while the other N-1 data sets are processed, where the Mth data set is a data set of the N data sets other than the first data set; and, in a case where the sixth operation subset exists, combining the sixth operation subset into one of the plurality of operation subsets.
As an alternative embodiment, when there are more data sets, each class need not be divided into a different compilation area purely according to the orthogonal result. For example, N warm-up data sets yield 2^N orthogonal classes. Dividing each class into its own compilation area may then be too scattered, so several classes may be merged.
In this embodiment, N is 3, that is, there are 3 data sets. When N is 3 there are 8 classes of operations, which can be denoted 000, 001, 010, 100, 011, 101, 110 and 111, where 000 represents an operation that is stable in all three warm-up runs, 001 a shape change only in the third run, 010 only in the second run, 100 only in the first run, 011 shape stability only in the first run, 101 only in the second run, 110 only in the third run, and 111 a shape change in all runs. Classes 001, 010 and 011 can be divided into the same compilation area. In this embodiment, the operation whose input and output tensor shapes do not change while the first data set of the N data sets is processed and whose input and/or output tensor shapes change while the other N-1 data sets are processed corresponds to 011, shape stability only in the first run. Likewise, the operation whose input and/or output tensor shapes change while the Mth data set is processed and whose shapes do not change while the other N-1 data sets are processed corresponds to 001 (shape change only in the third run) and 010 (shape change only in the second run).
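The 2^N classification for N = 3 can be sketched as a bit-string key, the leftmost bit corresponding to the first data set. The key encoding and area names are illustrative assumptions:

```python
# Hypothetical sketch of the 2**N classes for N = 3 warm-up data sets:
# each operation gets a 3-character key, bit i set iff its tensor shape
# changed while data set i was processed; classes 001, 010 and 011 are
# then merged into one compilation area as described in the embodiment.

def shape_change_key(changed_flags):
    """changed_flags: list of N booleans, True if the op's shape changed
    while the corresponding data set was processed."""
    return "".join("1" if f else "0" for f in changed_flags)

def assign_area(key):
    if key in {"001", "010", "011"}:   # merged per the embodiment above
        return "merged-area"
    if key == "111":                   # morphs in every run: forgo compiling
        return "no-compile"
    return "area-" + key               # e.g. 000, 100, 101, 110 each alone

k = shape_change_key([False, True, True])   # stable only in the first run
```

With this encoding, class 000 (always stable) still receives a compilation area of its own, as the next paragraph describes.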
As an optional implementation, during the processing of the N data sets, an operation whose input and output tensors do not change shape while any of the data sets is processed may be taken as an operation in the fourth operation subset, and the operations in the fourth subset are separately divided into a compilation area of their own. In the above embodiment where N is 3, that is, there are 3 data sets, the operations represented by 000, which are stable in all three warm-up runs, are separately divided into one compilation area.
Optionally, the determining, according to the N operation sets, a plurality of operation subsets that are allowed to be compiled respectively in the first group of operations includes: determining whether a seventh subset of operations exists in the first set of operations, wherein operations in the seventh subset of operations are operations in which shapes of an input tensor and/or an output tensor change during the processing of a first one of the N data sets by the multiple running of the first machine learning model, and shapes of neither an input tensor nor an output tensor change during the processing of N-1 data sets other than the first one of the N data sets; wherein, in a case where the seventh subset of operations exists, the seventh subset of operations is combined into one of the plurality of subsets of operations.
As an optional embodiment, during the warm-up processing of the N data sets, an operation whose tensor shape changes only while the first data set is processed is taken as an operation in the seventh operation subset. In this embodiment, the operations represented by 100, whose shapes change only while the first data set is processed, are divided into the same compilation area.
Optionally, the determining, according to the N operation sets, a plurality of operation subsets that are allowed to be compiled respectively in the first group of operations includes: determining whether an eighth subset of operations exists in the first set of operations, wherein the operations in the eighth subset of operations are operations in which neither the shapes of the input tensor nor the output tensor changes during the processing of the mth data set of the N data sets by the multiple running of the first machine learning model, and the shapes of the input tensor and/or the output tensor change during the processing of N-1 data sets other than the mth data set, wherein the mth data set is a data set other than the first data set of the N data sets; wherein, in a case where the eighth subset of operations exists, the eighth subset of operations is combined into one of the plurality of subsets of operations.
As an alternative embodiment, an operation whose tensor shape remains unchanged while exactly one data set other than the first of the N data sets is processed is taken as an operation in the eighth operation subset. In the above example where N is 3, the operations denoted 101, which are shape-stable only in the second run, are separately divided into one compilation area, and the operations denoted 110, which are shape-stable only in the third run, are separately divided into another compilation area.
Optionally, determining, according to the N operation sets, the plurality of operation subsets allowed to be compiled respectively in the first group of operations includes: determining whether a ninth operation subset exists in the first group of operations, where the operations in the ninth operation subset are operations whose input and/or output tensor shapes change while the first machine learning model is run multiple times to process every one of the N data sets; and, in a case where the ninth operation subset exists, combining the ninth operation subset into one of the plurality of operation subsets.
As an optional implementation, during the processing of the N data sets, an operation whose tensor shape changes during the processing of every data set is taken as an operation in the ninth operation subset. That is, the ninth operation subset contains the operations whose tensor shapes change throughout the processing of all N data sets. In the above example where N is 3, the operations denoted 111, whose shapes change in every run, are separately divided into one compilation area.
Optionally, for each of the data sets, running the first machine learning model multiple times to obtain a group of model running information includes repeating the following steps multiple times: inputting a plurality of input tensors of the data set to some or all of the operations in the first group of operations and running the first machine learning model once; modifying the variable values of one or more variables in the data set; and, the next time the first machine learning model is run, inputting the plurality of input tensors of the modified data set to some or all of the operations in the first group of operations and running the first machine learning model once to obtain the input tensor and output tensor of each operation in the first group of operations. Whether the shapes of the input tensor and output tensor of each operation in the first group of operations changed across the multiple runs of the first machine learning model is then compared, to obtain a group of model running information.
As an optional embodiment, each of the N data sets is processed by multiple runs, and during the multiple runs over a data set some variables of the warm-up data in that data set are changed in a controlled way. For example, after the first run over data set 1, some variable values of data set 1 are changed, for example the variables x and y, or the variable z. The variables that are varied may be the same or different across the N data sets. For example, while the first data set is processed over multiple runs, the variation of its parameters x and y is controlled; while the second data set is processed over multiple runs, the variation of its parameter z is controlled.
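The warm-up loop just described can be sketched as follows. The `run_model` callback, the data-set layout, and the toy model are illustrative assumptions, not the patented implementation:

```python
# Hedged sketch of the warm-up loop: run the model once, mutate one or more
# controlled variables of the data set, run again, and record every
# operation's sampled shapes, then report which operations' shapes changed.

def warm_up(run_model, dataset, mutate, runs=3):
    """run_model(dataset) -> {op: (input_shape, output_shape)};
    mutate(dataset) -> data set with some variable values modified."""
    history = {}
    for _ in range(runs):
        for op, shapes in run_model(dataset).items():
            history.setdefault(op, []).append(shapes)
        dataset = mutate(dataset)
    # An op's shape "changed" if any sampled run disagrees with the first.
    return {op: any(s != ss[0] for s in ss) for op, ss in history.items()}

def fake_run(ds):   # toy model: operation A depends on x, operation D does not
    return {"A": ((ds["x"], 8), (ds["x"], 4)), "D": ((4,), (2,))}

changed = warm_up(fake_run, {"x": 1}, lambda ds: {"x": ds["x"] + 1})
```

The resulting per-operation change flags are exactly the model running information used to build the operation sets.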
Optionally, in a case where a plurality of target operations are included in one of the operation subsets allowed to be compiled, the target operations are merged into one operation using the target compiler, the operations in the first machine learning model other than the target operations are retained, and the input tensors and output tensors of the target operations are converted into the input tensors and output tensors of the merged operation, to obtain a second machine learning model; or the target operations in the first machine learning model are merged into a plurality of operations using the target compiler, the operations in the first machine learning model other than the target operations are retained, and the input tensors and output tensors of the target operations are converted into the input tensors and output tensors of the merged operations, to obtain a second machine learning model, where the number of merged operations is less than the number of target operations.
As an optional implementation, in the case that a plurality of operations are included in one operation subset, the plurality of operations may be combined into one operation, or may be combined into a plurality of operations.
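A minimal sketch of merging several operations of one subset into a single fused node; the edge-set graph encoding and helper name are illustrative assumptions, not the patented implementation:

```python
# Hypothetical sketch: the target operations are replaced by one fused node
# whose inputs/outputs are the subset's external tensors; edges internal to
# the subset disappear, edges crossing its boundary are redirected.

def fuse(edges, subset, fused_name):
    """edges: set of (src, dst) op-to-op tensor edges in the graph."""
    new_edges = set()
    for src, dst in edges:
        s = fused_name if src in subset else src
        d = fused_name if dst in subset else dst
        if s != d:                     # drop edges internal to the subset
            new_edges.add((s, d))
    return new_edges

graph = {("in", "A"), ("A", "B"), ("B", "out")}
fused = fuse(graph, {"A", "B"}, "AB")
```

Merging the subset into several fused nodes rather than one follows the same pattern, applied once per partition of the subset.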
Optionally, compiling the operation subset corresponding to each compilation area respectively by using the target compiler includes: inputting formal data into the second machine learning model; and compiling the second machine learning model using the target compiler to obtain a processing result output after the second machine learning model processes the formal data.
As an alternative embodiment, the second machine learning model is obtained after merge optimization is performed on the first group of operations of the first machine learning model. The formal data is then processed through the compiled second machine learning model; because the computation graph has been optimized based on the information about the input and output tensors of the operations, compilation efficiency is improved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiment of the present invention, there is also provided a compilation optimization device of a machine learning model for implementing the compilation optimization method of a machine learning model. As shown in fig. 7, the apparatus includes: an operation module 702, configured to operate a first machine learning model to process N data sets respectively to obtain N sets of model operation information, where for each data set, the first machine learning model is operated multiple times to obtain one set of model operation information, the first machine learning model includes a first set of operations, an operation in the first set of operations is an operation that is allowed to be processed by a target compiler, each set of model operation information is used to indicate whether a shape of an input tensor and an output tensor of each operation in the first set of operations changes during the process of operating the first machine learning model multiple times to process one data set, and N is a natural number greater than 1; a first determining module 704, configured to determine N operation sets in the first group of operations according to the N groups of model operation information; a second determining module 706, configured to determine, according to the N operation sets, a plurality of operation subsets that are allowed to be compiled respectively in the first group of operations, where the plurality of operation subsets are set to be allowed to be compiled respectively by the target compiler; a compiling module 708, configured to compile, using the target compiler, the subset of operations corresponding to each compiling region respectively.
Optionally, each of the operation sets includes a first operation subset and a second operation subset, each of the first operation subset includes an operation in which the shapes of the input tensor and the output tensor are not changed during the process of processing one of the data sets by the first machine learning model by the multiple times, and each of the second operation subset includes an operation in which the shapes of the input tensor and/or the output tensor are changed during the process of processing one of the data sets by the first machine learning model by the multiple times, wherein the second determining module includes: an execution unit, configured to perform the following operations for each operation set of the N operation sets, where each operation set is regarded as a current operation set in a process of performing the following operations: determining whether a third subset of operations exists in the first set of operations, wherein operations in the third subset of operations occur only in the second subset of operations in the current set of operations and do not occur in the second subset of operations in the N sets of operations other than the current set of operations; wherein, in the presence of the third subset of operations, the third subset of operations is combined into one of the plurality of subsets of operations.
Optionally, the apparatus is further configured to take the intersection of the first operation subsets of the N operation sets to obtain a fourth operation subset, where the fourth operation subset is one of the plurality of operation subsets.
Optionally, the apparatus is further configured to take the intersection of the second operation subsets of the N operation sets to obtain a fifth operation subset, and to set the operations in the fifth operation subset to forgo compilation by the target compiler.
Optionally, the apparatus is further configured to determine whether a sixth subset of operations exists in the first set of operations, where operations in the sixth subset of operations include: the shapes of both the input tensor and the output tensor are unchanged during the processing of the first one of the N data sets by the multiple runs of the first machine learning model, and the shape of the input tensor and/or the output tensor changes during the processing of the other N-1 data sets of the N data sets except the first data set, and the shape of the input tensor and/or the output tensor changes during the processing of the mth data set of the N data sets by the multiple runs of the first machine learning model, and the shapes of the input tensor and the output tensor are not changed during the processing of the other N-1 data sets except the Mth data set, wherein the Mth data set is a data set of the N data sets other than the first data set; wherein, in a case where the sixth subset of operations exists, the sixth subset of operations is combined into one of the plurality of subsets of operations.
Optionally, the apparatus is further configured to determine, according to the N operation sets, a plurality of operation subsets allowed to be compiled respectively in the first group of operations, and includes: determining whether a seventh subset of operations exists in the first set of operations, wherein operations in the seventh subset of operations are operations in which shapes of an input tensor and/or an output tensor change during the processing of a first one of the N data sets by the multiple running of the first machine learning model, and shapes of neither an input tensor nor an output tensor change during the processing of N-1 data sets other than the first one of the N data sets; wherein, in a case where the seventh subset of operations exists, the seventh subset of operations is combined into one of the plurality of subsets of operations.
Optionally, the apparatus is further configured to determine whether an eighth subset of operations exists in the first set of operations, where the operations in the eighth subset of operations are operations in which shapes of an input tensor and an output tensor are not changed during the processing of an mth dataset of the N datasets by the multiple running of the first machine learning model, and shapes of the input tensor and/or the output tensor are changed during processing of N-1 datasets other than the mth dataset, where the mth dataset is a dataset of the N datasets other than the first dataset; wherein, in a case where the eighth subset of operations exists, the eighth subset of operations is combined into one of the plurality of subsets of operations.
Optionally, the apparatus is further configured to determine whether a ninth subset of operations exists in the first set of operations, where the operations in the ninth subset of operations are operations in which shapes of input tensors and/or output tensors change during the processing of the N data sets by running the first machine learning model multiple times; wherein, in a case where the ninth subset of operations exists, the ninth subset of operations is combined into one of the plurality of subsets of operations.
Optionally, the above apparatus is further configured to, for each of the data sets, execute the first machine learning model multiple times to obtain a set of model execution information, including: repeatedly executing the following steps for a plurality of times: inputting a plurality of input tensors in the data set to some or all of the operations in the first set of operations, respectively; modifying variable values of one or more variables in the dataset each time the first machine learning model is run, respectively inputting a plurality of input tensors of the modified dataset to partial or all operations in the first group of operations when the first machine learning model is run next time, and running the first machine learning model once to obtain an input tensor and an output tensor of each operation in the first group of operations; and comparing whether the shapes of the input tensor and the output tensor of each operation in the first group of operations are changed in the process of running the first machine learning model for multiple times to obtain a group of model running information.
Optionally, the apparatus is further configured to, in a case that a plurality of target operations are included in the subset of operations allowed to be compiled respectively, merge the plurality of target operations into one operation using the target compiler, retain operations other than the plurality of target operations in the first machine learning model, and convert input tensors and output tensors of the plurality of target operations into input tensors and output tensors of the one operation, so as to obtain a second machine learning model; or merging the target operations in the first machine learning model into a plurality of operations by using the target compiler, reserving the operations in the first machine learning model except the target operations, and converting the input tensor and the output tensor of the target operations into the input tensor and the output tensor of the merged operations to obtain a second machine learning model, wherein the number of the operations in the merged operations is less than the number of the operations in the target operations.
Optionally, the apparatus is further configured to input formal data into the second machine learning model; and compiling the second machine learning model by using the target compiler to obtain a processing result output after the second machine learning model processes the formal data.
According to another aspect of the embodiment of the present invention, there is also provided an electronic device for implementing the compiling optimization method of the machine learning model, where the electronic device may be a terminal device or a server. The present embodiment takes the electronic device as a server as an example for explanation. As shown in fig. 8, the electronic device comprises a memory 802 and a processor 804, the memory 802 having a computer program stored therein, the processor 804 being arranged to perform the steps of any of the above-described method embodiments by means of the computer program.
Optionally, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, running a first machine learning model to process N data sets respectively to obtain N sets of model running information, where for each data set, the first machine learning model is run multiple times to obtain one set of the model running information, the first machine learning model includes a first set of operations, the operations in the first set of operations are operations allowed to be processed by a target compiler, and each set of the model running information is used to indicate: during the process of processing one data set by running the first machine learning model for multiple times, whether the shapes of an input tensor and an output tensor of each operation in the first group of operations are changed or not is judged, and N is a natural number greater than 1;
s2, determining N operation sets in the first set of operations according to the N sets of model running information;
s3, determining, according to the N operation sets, a plurality of operation subsets in the first set of operations that are allowed to be compiled separately, wherein the plurality of operation subsets are each set to be compiled separately by the target compiler;
and S4, compiling the operation subset corresponding to each compilation region separately by using the target compiler.
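As an illustration, the steps S1 to S4 above can be sketched in a few lines of Python. The shape-record format, the function names (`collect_run_info`, `determine_subsets`), and the grouping heuristic are assumptions made for the sketch, not the patented implementation:

```python
def collect_run_info(runs):
    # S1: "runs" holds one shape record per run of the model on a single
    # data set, each a dict {op_name: (input_shape, output_shape)}.
    # Returns, per operation, whether its shapes changed across the runs.
    baseline = runs[0]
    return {op: any(run[op] != baseline[op] for run in runs[1:])
            for op in baseline}

def determine_subsets(model_run_info):
    # S2/S3: "model_run_info" holds N dicts (one per data set) mapping
    # op_name -> shapes_changed.  Operations that stay shape-static on
    # every data set form one subset that can be compiled as a whole;
    # operations that are shape-dynamic on exactly one data set form
    # their own per-data-set subsets.
    ops = set(model_run_info[0])
    static_everywhere = {op for op in ops
                         if not any(info[op] for info in model_run_info)}
    subsets = [static_everywhere]
    for i, info in enumerate(model_run_info):
        only_here = {op for op in ops
                     if info[op] and not any(
                         other[op] for j, other in enumerate(model_run_info)
                         if j != i)}
        if only_here:
            subsets.append(only_here)
    # S4 would then hand each subset to the target compiler as an
    # independently compiled region.
    return subsets
```

In this toy form, an operation whose input or output shape varies between runs on one data set but not on the others ends up in its own region, so recompilation triggered by shape changes stays confined to that region.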
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 8 is only illustrative, and the electronic device may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 8 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces) than shown in fig. 8, or have a different configuration from that shown in fig. 8.
The memory 802 may be used to store software programs and modules, such as the program instructions/modules corresponding to the compiling optimization method and apparatus of the machine learning model in the embodiments of the present invention, and the processor 804 executes various functional applications and data processing by running the software programs and modules stored in the memory 802, that is, implements the compiling optimization method of the machine learning model described above. The memory 802 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 802 may further include memory located remotely from the processor 804, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 802 may be specifically used to store, but is not limited to, information such as the first set of operations and the input tensor and output tensor of each operation in the first set of operations. As an example, as shown in fig. 8, the memory 802 may include, but is not limited to, the execution module 702, the first determination module 704, the second determination module 706, and the compiling module 708 of the compiling optimization apparatus of the machine learning model described above. It may further include other module units of the compiling optimization apparatus of the machine learning model, which are not described again in this example.
In addition, the electronic device further includes: a display 808 and a connection bus 810 for connecting the various modular components of the electronic device described above.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The nodes may form a peer-to-peer (P2P) network, and any type of computing device, such as a server, a terminal, or other electronic device, may become a node in the blockchain system by joining the peer-to-peer network.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, running a first machine learning model to process N data sets respectively to obtain N sets of model running information, where for each data set, the first machine learning model is run multiple times to obtain one set of model running information, the first machine learning model includes a first set of operations, the operations in the first set of operations are operations allowed to be processed by a target compiler, each set of model running information is used to indicate whether shapes of an input tensor and an output tensor of each operation in the first set of operations change during the process of running the first machine learning model multiple times to process one data set, and N is a natural number greater than 1;
s2, determining N operation sets in the first set of operations according to the N sets of model running information;
s3, determining, according to the N operation sets, a plurality of operation subsets in the first set of operations that are allowed to be compiled separately, wherein the plurality of operation subsets are each set to be compiled separately by the target compiler;
and S4, compiling the operation subset corresponding to each compilation region separately by using the target compiler.
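Two of the operation subsets in step S3 follow from plain set intersections: operations whose shapes never change on any data set can be grouped into one region compiled as a whole, while operations whose shapes change on every data set can be marked to forgo compilation. A minimal sketch in Python (the function name and the dict-based record format are hypothetical, not taken from the patent):

```python
def static_and_dynamic_subsets(model_run_info):
    # "model_run_info" holds N dicts (one per data set) mapping
    # op_name -> shapes_changed, as recorded in step S1.
    static_sets = [{op for op, changed in info.items() if not changed}
                   for info in model_run_info]
    dynamic_sets = [{op for op, changed in info.items() if changed}
                    for info in model_run_info]
    # Intersection of the per-data-set static subsets: safe to compile
    # together as one region on every data set.
    always_static = set.intersection(*static_sets)
    # Intersection of the per-data-set dynamic subsets: shapes change on
    # every data set, so compilation is skipped for these operations.
    always_dynamic = set.intersection(*dynamic_sets)
    return always_static, always_dynamic
```

The intuition is that compiling an always-static region pays off once and is reused across runs, whereas compiling an always-dynamic operation would force a recompile on nearly every run.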
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the relevant hardware of the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the division of the units is only one kind of logical function division, and there may be other divisions in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only the preferred embodiments of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (14)

1. A compilation optimization method for a machine learning model, comprising:
operating a first machine learning model to respectively process N data sets to obtain N groups of model running information, wherein for each data set, the first machine learning model is run multiple times to obtain one group of model running information, the first machine learning model comprises a first group of operations, the operations in the first group of operations are operations allowed to be processed by a target compiler, each group of model running information is used for indicating whether the shapes of the input tensor and the output tensor of each operation in the first group of operations change during the process of running the first machine learning model multiple times to process one data set, and N is a natural number greater than 1;
determining N operation sets in the first group of operations according to the N groups of model running information;
determining, according to the N operation sets, a plurality of operation subsets in the first group of operations that are allowed to be compiled separately, wherein the plurality of operation subsets are each set to be compiled separately by the target compiler;
compiling the operation subset corresponding to each compilation region separately by using the target compiler.
2. The method of claim 1, wherein each of the operation sets comprises a first operation subset and a second operation subset, each first operation subset comprises operations for which neither the shape of the input tensor nor the shape of the output tensor changes during the multiple runs of the first machine learning model to process one of the data sets, each second operation subset comprises operations for which the shape of the input tensor and/or the output tensor changes during the multiple runs of the first machine learning model to process one of the data sets, and wherein the determining, from the N operation sets, a plurality of operation subsets that are allowed to be compiled separately in the first set of operations comprises:
for each of the N sets of operations, performing the following operations, wherein each set of operations is considered a current set of operations in performing the following operations:
determining whether a third subset of operations exists in the first set of operations, wherein operations in the third subset of operations occur only in the second subset of operations in the current set of operations and do not occur in the second subset of operations in the N sets of operations other than the current set of operations;
wherein, in the presence of the third subset of operations, the third subset of operations is combined into one of the plurality of subsets of operations.
3. The method of claim 1, wherein each of the operation sets comprises a first operation subset and a second operation subset, each first operation subset comprises operations for which neither the shape of the input tensor nor the shape of the output tensor changes during the multiple runs of the first machine learning model to process one of the data sets, each second operation subset comprises operations for which the shape of the input tensor and/or the output tensor changes during the multiple runs of the first machine learning model to process one of the data sets, and wherein the determining, from the N operation sets, a plurality of operation subsets that are allowed to be compiled separately in the first set of operations comprises:
and taking the intersection of the first operation subsets in the N operation sets to obtain a fourth operation subset, wherein the fourth operation subset is one of the plurality of operation subsets.
4. The method of claim 2, further comprising:
taking the intersection of the second operation subsets in the N operation sets to obtain a fifth operation subset;
setting operations in the fifth subset of operations to forgo compilation using the target compiler.
5. The method of claim 1, wherein determining, from the N sets of operations, a plurality of subsets of operations in the first set of operations that are allowed to be separately compiled comprises:
determining whether a sixth subset of operations exists in the first set of operations, wherein the operations in the sixth subset of operations include: operations for which the shapes of both the input tensor and the output tensor are unchanged during the processing of the first data set of the N data sets by the multiple runs of the first machine learning model, while the shape of the input tensor and/or the output tensor changes during the processing of the other N-1 data sets of the N data sets except the first data set; and operations for which the shape of the input tensor and/or the output tensor changes during the processing of the Mth data set of the N data sets by the multiple runs of the first machine learning model, while the shapes of the input tensor and the output tensor are unchanged during the processing of the other N-1 data sets except the Mth data set, wherein the Mth data set is a data set of the N data sets other than the first data set;
wherein, in a case where the sixth subset of operations exists, the sixth subset of operations is combined into one of the plurality of subsets of operations.
6. The method of claim 1, wherein determining, from the N sets of operations, a plurality of subsets of operations in the first set of operations that are allowed to be separately compiled comprises:
determining whether a seventh subset of operations exists in the first set of operations, wherein the operations in the seventh subset of operations are operations for which the shape of the input tensor and/or the output tensor changes during the processing of the first data set of the N data sets by the multiple runs of the first machine learning model, while neither the shape of the input tensor nor the shape of the output tensor changes during the processing of the N-1 data sets other than the first data set;
wherein, in a case where the seventh subset of operations exists, the seventh subset of operations is combined into one of the plurality of subsets of operations.
7. The method of claim 1, wherein determining, from the N sets of operations, a plurality of subsets of operations in the first set of operations that are allowed to be separately compiled comprises:
determining whether an eighth subset of operations exists in the first set of operations, wherein the operations in the eighth subset of operations are operations for which neither the shape of the input tensor nor the shape of the output tensor changes during the processing of the Mth data set of the N data sets by the multiple runs of the first machine learning model, while the shape of the input tensor and/or the output tensor changes during the processing of the N-1 data sets other than the Mth data set, wherein the Mth data set is a data set of the N data sets other than the first data set;
wherein, in a case where the eighth subset of operations exists, the eighth subset of operations is combined into one of the plurality of subsets of operations.
8. The method of claim 1, wherein determining, from the N sets of operations, a plurality of subsets of operations in the first set of operations that are allowed to be separately compiled comprises:
determining whether a ninth subset of operations exists in the first set of operations, wherein the operations in the ninth subset of operations are operations for which the shape of the input tensor and/or the output tensor changes during the processing of each of the N data sets by the multiple runs of the first machine learning model;
wherein, in a case where the ninth subset of operations exists, the ninth subset of operations is combined into one of the plurality of subsets of operations.
9. The method of any of claims 1 to 8, wherein, for each of the data sets, running the first machine learning model multiple times to obtain a group of model running information comprises:
repeatedly executing the following steps for a plurality of times:
inputting a plurality of input tensors in the data set to some or all of the operations in the first group of operations, respectively;
modifying the variable values of one or more variables in the data set each time the first machine learning model is run, inputting the plurality of input tensors of the modified data set to some or all of the operations in the first group of operations at the next run of the first machine learning model, and running the first machine learning model once to obtain the input tensor and the output tensor of each operation in the first group of operations;
and comparing the shapes of the input tensor and the output tensor of each operation in the first group of operations across the multiple runs of the first machine learning model to determine whether they change, thereby obtaining one group of model running information.
10. The method according to any one of claims 1 to 8, further comprising:
in a case where a plurality of target operations are included in the operation subsets allowed to be compiled separately, merging the plurality of target operations into one operation by using the target compiler, retaining the operations in the first machine learning model other than the target operations, and converting the input tensors and output tensors of the plurality of target operations into the input tensor and output tensor of the one operation, to obtain a second machine learning model; or
merging the plurality of target operations in the first machine learning model into a plurality of merged operations by using the target compiler, retaining the operations in the first machine learning model other than the target operations, and converting the input tensors and output tensors of the target operations into the input tensors and output tensors of the merged operations, to obtain a second machine learning model, wherein the number of the merged operations is smaller than the number of the target operations.
11. The method of claim 10, wherein compiling the subset of operations corresponding to each of the compilation regions using the target compiler separately comprises:
inputting formal data into the second machine learning model;
and compiling the second machine learning model by using the target compiler to obtain a processing result output after the second machine learning model processes the formal data.
12. An apparatus for compiling and optimizing a machine learning model, comprising:
the running module is used for running a first machine learning model to respectively process N data sets to obtain N groups of model running information, wherein for each data set, the first machine learning model is run for multiple times to obtain one group of model running information, the first machine learning model comprises a first group of operations, the operations in the first group of operations are operations allowed to be processed by a target compiler, each group of model running information is used for indicating whether the shapes of the input tensor and the output tensor of each operation in the first group of operations are changed or not in the process of running the first machine learning model for multiple times to process one data set, and N is a natural number greater than 1;
a first determining module, configured to determine N operation sets in the first group of operations according to the N groups of model operation information;
a second determining module, configured to determine, according to the N operation sets, a plurality of operation subsets in the first group of operations that are allowed to be compiled separately, wherein the plurality of operation subsets are each set to be compiled separately by the target compiler;
and a compiling module, configured to compile the operation subset corresponding to each compilation region separately by using the target compiler.
13. A computer-readable storage medium comprising a stored program, wherein the program when executed performs the method of any of claims 1 to 11.
14. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 11 by means of the computer program.
CN202010366380.6A 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model Active CN111580828B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010366380.6A CN111580828B (en) 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model

Publications (2)

Publication Number Publication Date
CN111580828A true CN111580828A (en) 2020-08-25
CN111580828B CN111580828B (en) 2021-08-27

Family

ID=72124618

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010366380.6A Active CN111580828B (en) 2020-04-30 2020-04-30 Compiling optimization method and device of machine learning model

Country Status (1)

Country Link
CN (1) CN111580828B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106650922A (en) * 2016-09-29 2017-05-10 清华大学 Hardware neural network conversion method, computing device, compiling method and neural network software and hardware collaboration system
US20180157470A1 (en) * 2016-12-05 2018-06-07 Fujitsu Limited Compilation method and information processing apparatus
CN109656566A (en) * 2018-12-14 2019-04-19 北京中科寒武纪科技有限公司 Heterogeneous computing system executable file acquisition methods, operation method and Related product
CN109871322A (en) * 2019-01-28 2019-06-11 华南理工大学 A kind of program topic automatic scoring method based on machine learning
CN110515626A (en) * 2019-08-20 2019-11-29 Oppo广东移动通信有限公司 The code compiling method and Related product of deep learning Computational frame
CN110537193A (en) * 2018-10-24 2019-12-03 阿里巴巴集团控股有限公司 The quick calculating of convolutional neural networks
CN110580527A (en) * 2018-06-08 2019-12-17 上海寒武纪信息科技有限公司 method and device for generating universal machine learning model and storage medium
CN110659728A (en) * 2019-09-24 2020-01-07 上海寒武纪信息科技有限公司 Neural network optimization method and device, computer equipment and storage medium
CN110727437A (en) * 2019-09-10 2020-01-24 平安普惠企业管理有限公司 Code optimization item acquisition method and device, storage medium and electronic equipment
CN110766147A (en) * 2018-07-25 2020-02-07 赛灵思公司 Neural network compiler architecture and compiling method
US10592213B2 (en) * 2016-10-19 2020-03-17 Intel Corporation Preprocessing tensor operations for optimal compilation
CN110908667A (en) * 2019-11-18 2020-03-24 北京迈格威科技有限公司 Method and device for joint compilation of neural network and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
量子位 (QbitAI): "Facebook releases tensor comprehension library, automatically compiling high-performance machine learning kernels", 《HTTPS://BLOG.CSDN.NET/YH0VLDE8VG8EP9VGE/ARTICLE/DETAILS/79329700》 *

Also Published As

Publication number Publication date
CN111580828B (en) 2021-08-27

Similar Documents

Publication Publication Date Title
Chiang et al. Cluster-gcn: An efficient algorithm for training deep and large graph convolutional networks
Ji et al. ispan: Parallel identification of strongly connected components with spanning trees
Sethi et al. HFIM: a Spark-based hybrid frequent itemset mining algorithm for big data processing
CN112287015B (en) Image generation system, image generation method, electronic device, and storage medium
Kumar et al. Undersampled K-means approach for handling imbalanced distributed data
CN111382347A (en) Object feature processing and information pushing method, device and equipment
CN106557307B (en) Service data processing method and system
Zdravevski et al. Feature ranking based on information gain for large classification problems with mapreduce
Xie et al. Graphiler: Optimizing graph neural networks with message passing data flow graph
CN111580826B (en) Compiling optimization method and device of machine learning model
CN114139022B (en) Subgraph extraction method and device
CN111580828B (en) Compiling optimization method and device of machine learning model
CN101635001A (en) Method and apparatus for extracting information from a database
CN111580827B (en) Compiling optimization method and device of machine learning model
Schiller et al. Compile-and run-time approaches for the selection of efficient data structures for dynamic graph analysis
CN111028092A (en) Community discovery method based on Louvain algorithm, computer equipment and readable storage medium thereof
JP5555238B2 (en) Information processing apparatus and program for Bayesian network structure learning
CN107291439A (en) A kind of target delta data construction method and device
CN111901500A (en) Image processing method and apparatus, storage medium, and electronic apparatus
CN112116403A (en) Information recommendation method, device and equipment
Yuan et al. Scalable training of sparse linear svms
Verma et al. Iterative hadoop mapreduce-based subgraph enumeration in network motif analysis
CN116069676B (en) Version comparison method, device, terminal equipment and storage medium
CN113760489B (en) Resource allocation method and device
Dias et al. Experiencing dfanalyzer for runtime analysis of phylogenomic dataflows

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40027367

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant