CN116126341A - Model compiling method, device, computer equipment and computer readable storage medium - Google Patents


Info

Publication number
CN116126341A
CN116126341A (application CN202211733228.2A)
Authority
CN
China
Prior art keywords
optimization
tensor
program
cost prediction
statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211733228.2A
Other languages
Chinese (zh)
Inventor
白杨
沈小勇
吕江波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Smartmore Technology Co Ltd
Original Assignee
Shenzhen Smartmore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Smartmore Technology Co Ltd filed Critical Shenzhen Smartmore Technology Co Ltd
Priority to CN202211733228.2A
Publication of CN116126341A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G06F 8/44 Encoding
    • G06F 8/443 Optimisation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The application relates to a model compiling method and apparatus, a computer device, and a computer readable storage medium. The method includes: acquiring a plurality of tensor programs, the tensor programs being intermediate programs generated by a deep learning compiler compiling a model to be compiled according to different compilation optimization strategies; for each tensor program, performing feature extraction on the loop optimization statements in the tensor program to obtain loop optimization features; encoding, by an attention encoder, the loop optimization features and the operator class information corresponding to the loop optimization statements to obtain feature coding data; performing cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result; and determining a target tensor program from the plurality of tensor programs based on the cost prediction results corresponding to the tensor programs, the target tensor program being used to instruct the deep learning compiler to compile the model to be compiled into machine language. The method can improve the efficiency of model compilation.

Description

Model compiling method, device, computer equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method and apparatus for compiling a model, a computer device, and a computer readable storage medium.
Background
With the continued development of deep learning, deep learning models have become a standard component of automated machine-learning systems. Manually adapting the operators in a deep learning model to a wide variety of hardware is tedious and error-prone. To address this problem, model compilation has become a technical direction of wide interest.
Conventional compilation methods typically use a cost prediction function based on statistical methods and explore the optimal compilation scheme in a predefined solution space through a search algorithm. However, such cost prediction functions do not predict cost well, so the search has to be repeated many times, which makes the whole compilation process inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a model compiling method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the efficiency of model compiling.
In a first aspect, the present application provides a model compiling method, including:
acquiring a plurality of tensor programs; the tensor programs are a plurality of intermediate programs generated by a deep learning compiler compiling a model to be compiled according to different compilation optimization strategies;
for each tensor program, performing feature extraction on the loop optimization statements in the tensor program to obtain loop optimization features; the loop optimization statements are program statements having loop optimization characteristics;
encoding, by an attention encoder, the loop optimization features and operator class information corresponding to the loop optimization statements to obtain feature coding data;
performing cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result;
determining a target tensor program from the plurality of tensor programs based on the cost prediction results corresponding to the tensor programs; the target tensor program is used to instruct the deep learning compiler to compile the model to be compiled into machine language.
In a second aspect, the present application further provides a model compiling apparatus, including:
an acquisition module, configured to acquire a plurality of tensor programs; the tensor programs are a plurality of intermediate programs generated by a deep learning compiler compiling a model to be compiled according to different compilation optimization strategies;
an extraction module, configured to, for each tensor program, perform feature extraction on the loop optimization statements in the tensor program to obtain loop optimization features; the loop optimization statements are program statements having loop optimization characteristics;
a cost prediction module, configured to encode, by an attention encoder, the loop optimization features and operator class information corresponding to the loop optimization statements to obtain feature coding data, and to perform cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result;
a compiling module, configured to determine a target tensor program from the plurality of tensor programs based on the cost prediction results corresponding to the tensor programs; the target tensor program is used to instruct the deep learning compiler to compile the model to be compiled into machine language.
In a third aspect, the present application further provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and where the processor implements the steps in the above-described model compilation method when executing the computer program.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above model compilation method.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the above model compilation method.
According to the model compiling method and apparatus, computer device, storage medium, and computer program product, the deep learning compiler generates a plurality of tensor programs for the model to be compiled according to different compilation optimization strategies, and feature extraction is performed on the loop optimization statements in each tensor program to obtain loop optimization features. The loop optimization features reflect the compilation optimization strategy applied to the corresponding tensor program, so a cost prediction result obtained from them reflects the quality of that strategy. The attention encoder then selectively focuses, based on the attention mechanism, on the more important information in the loop optimization features and the operator class information corresponding to the loop optimization statements, and encodes them to obtain feature coding data. Compared with simple statistical prediction based on data distribution, the cost prediction result obtained from the feature coding data is more accurate, and an accurate cost prediction result can accurately indicate the better tensor program. The target tensor program is therefore determined from the plurality of tensor programs directly according to the cost prediction results, repeated searching is not needed, and the efficiency of model compilation is improved.
Drawings
Fig. 1 is a schematic flow chart of a model compiling method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of merging loop optimization features and operator class information according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an encoding process by an attention encoder according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a cost prediction model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of migrating a cost prediction model according to an embodiment of the present application;
FIG. 6 is a block diagram of a model compiling apparatus according to an embodiment of the present application;
FIG. 7 is an internal block diagram of a computer device according to an embodiment of the present application;
FIG. 8 is an internal block diagram of another computer device according to an embodiment of the present application;
fig. 9 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In some embodiments, as shown in fig. 1, a model compiling method is provided. The method is described below by taking its application to a computer device as an example, and includes the following steps:
s102, acquiring a plurality of tensor programs.
The tensor programs are intermediate programs generated by compiling the model to be compiled through the deep learning compiler according to different compiling optimization strategies.
Illustratively, the computer device may parse the source program of the model to be compiled through the deep learning compiler to obtain a parsed intermediate program. It will be appreciated that the parsed intermediate program is an intermediate language between the source program and the machine language, corresponding to an internal representation of the source program. The computer device may, through the deep learning compiler, apply equivalent transformations to the parsed intermediate program according to different compilation optimization strategies to obtain a plurality of tensor programs. It is to be understood that an equivalent transformation is a transformation that does not change the execution result of the parsed intermediate program, and each compilation optimization strategy corresponds to one equivalent transformation.
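For illustration only, the following Python sketch shows one way the candidate tensor programs of step S102 could be enumerated by combining loop-level equivalent transformations. The patent does not specify a compiler API; the class and parameter names below (TensorProgram, the tile sizes, the unroll factors) are assumptions used purely to make the idea concrete.

```python
# Hypothetical sketch: enumerating candidate tensor programs from a parsed
# intermediate program by combining loop-level optimization strategies.
from dataclasses import dataclass
from itertools import product
from typing import List

@dataclass
class TensorProgram:
    ir: str          # parsed intermediate representation of the model
    strategy: dict   # the compilation optimization strategy (one equivalent transformation)

def generate_candidates(parsed_ir: str) -> List[TensorProgram]:
    tile_sizes = [8, 16, 32]       # illustrative loop tiling factors
    unroll_factors = [1, 2, 4]     # illustrative loop unrolling factors
    candidates = []
    for tile, unroll in product(tile_sizes, unroll_factors):
        strategy = {"tile": tile, "unroll": unroll}
        candidates.append(TensorProgram(ir=parsed_ir, strategy=strategy))
    return candidates
```

Each candidate corresponds to one equivalent transformation of the same intermediate program, so all candidates compute the same result but are expected to differ in performance.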
In some embodiments, the computer device may comprise at least one of a terminal and a server. It is understood that the model compiling method provided in the present application may be executed by a terminal or a server alone or by a system including the terminal and the server.
In some embodiments, the terminal may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, or portable wearable devices, which may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, etc. The portable wearable device may be a smart watch, smart bracelet, headset, or the like.
In some embodiments, the server may be implemented as a stand-alone server or as a cluster of servers.
S104, for each tensor program, performing feature extraction on the loop optimization statements in the tensor program to obtain loop optimization features.
The loop optimization statements are program statements having loop optimization characteristics. Loop optimization refers to optimization performed on loop statements; its characteristics include, for example, loop unrolling, loop parallelization, loop interchange, loop fusion, tiling (blocking), and loop reordering (reorder). The loop optimization feature characterizes the loop optimization characteristics possessed by a loop optimization statement.
For example, the computer device may determine program statements in respective loop bodies included in each tensor program, resulting in loop optimization statements. The computer device may perform feature extraction on each loop optimization statement in each tensor program, to obtain a loop optimization feature corresponding to each loop optimization statement.
S106, encoding, by an attention encoder, the loop optimization features and operator class information corresponding to the loop optimization statements to obtain feature coding data.
The operator class refers to the category of an operator; different network layers in the model to be compiled may include different operators, such as convolution layer operators, fully connected layer operators, and activation layer operators. The operator class information is used to indicate the class of the operator corresponding to a loop optimization statement. It will be appreciated that the intermediate program includes a program statement for each operator, and the program statement of an operator corresponds to an expression of that operator.
For example, the computer device may determine the operator class information and the loop optimization feature corresponding to each loop optimization statement, resulting in matched pairs of operator class information and loop optimization features. The computer device may take the matched operator class information and loop optimization features as the input of a cost prediction model, and the attention encoder in the cost prediction model encodes them to obtain feature coding data.
S108, carrying out cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result.
The cost prediction result is used to indicate the program performance corresponding to the tensor program. It will be appreciated that the tensor program is, in effect, pseudocode guiding the compilation of the model to be compiled into machine language, so the program performance referred to above is actually the program performance of the machine language compiled based on the tensor program.
The computer device can implement cost prediction of the tensor program by performing feature mapping processing on the feature coded data, so as to obtain a cost prediction result.
In some embodiments, the computer device may perform feature mapping processing on the feature encoded data through a fully connected layer in the cost prediction model.
In some embodiments, the cost prediction result corresponding to each tensor program is used to indicate the running speed or running time, on the processor, of the machine language compiled based on the tensor program. It will be appreciated that cost prediction for a tensor program is in effect performance prediction for the machine language compiled based on that tensor program.
S110, determining a target tensor program from a plurality of tensor programs based on cost prediction results corresponding to the tensor programs.
The target tensor program is used to instruct the deep learning compiler to compile the model to be compiled into machine language. The program performance corresponding to the target tensor program is superior to the program performance corresponding to the other tensor programs.
Illustratively, the cost prediction result may include a cost prediction score. The cost prediction score is positively correlated with the program performance of the corresponding tensor program, that is, the larger the cost prediction score of a tensor program, the better the corresponding program performance. The computer device may determine the tensor program with the highest cost prediction score among the plurality of tensor programs as the target tensor program, and may convert the target tensor program into machine language through the deep learning compiler.
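As a minimal illustration of step S110, the helper below picks the candidate with the highest predicted score; `predict_cost` stands in for the cost prediction model described later and is an assumed callable, not part of any specific compiler's API.

```python
# Illustrative sketch of step S110: choose the tensor program whose predicted
# cost score is highest (a higher score means better expected program performance).
def select_target_program(tensor_programs, predict_cost):
    scores = [predict_cost(program) for program in tensor_programs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return tensor_programs[best_index]
```

The selected target tensor program is then handed back to the deep learning compiler, which lowers it to machine language.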
In the model compiling method, the deep learning compiler generates a plurality of tensor programs for the model to be compiled according to different compilation optimization strategies, and feature extraction is performed on the loop optimization statements in each tensor program to obtain loop optimization features. The loop optimization features reflect the compilation optimization strategy applied to the corresponding tensor program, so a cost prediction result obtained from them reflects the quality of that strategy. The attention encoder then selectively focuses, based on the attention mechanism, on the more important information in the loop optimization features and the operator class information corresponding to the loop optimization statements, and encodes them to obtain feature coding data. Compared with simple statistical prediction based on data distribution, the cost prediction result obtained from the feature coding data is more accurate, and an accurate cost prediction result can accurately indicate the better tensor program. The target tensor program is therefore determined from the plurality of tensor programs directly according to the cost prediction results, repeated searching is not needed, and the efficiency of model compilation is improved.
In some embodiments, feature extraction is performed on a loop optimization statement in a tensor program to obtain a loop optimization feature, including:
determining a loop body in the tensor program; each loop body includes at least one program statement;
determining a loop optimization statement corresponding to the loop body from at least one program statement included in the loop body for each loop body;
and performing feature extraction on the loop optimization statement to obtain the loop optimization feature corresponding to the loop body.
The loop structure in the tensor program consists of a loop body and a loop judgment condition.
For example, the computer device may determine each loop body in the tensor program, and determine the innermost program statement of each loop body to obtain the loop optimization statement corresponding to that loop body. The computer device may perform feature extraction on each loop optimization statement and map its loop optimization characteristics into a loop optimization vector, obtaining the loop optimization feature corresponding to each loop optimization statement. It can be appreciated that feature extraction of a loop optimization statement amounts to vector modeling of the statement, i.e., mapping the features of the loop optimization statement into a vector space.
In some embodiments, the dimension of the loop optimization feature may be a preset feature dimension. The computer device may perform feature extraction on the loop optimization statement to obtain a loop optimization feature of the feature dimension. For example, the feature dimension may be 1×164.
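A minimal sketch of this extraction step is given below, assuming the 1×164 dimension mentioned above. The patent does not enumerate the individual features, so the particular slots filled here (loop extent and unroll/parallel/vectorize flags) are illustrative assumptions only.

```python
# Hypothetical sketch: mapping one innermost loop statement to a fixed-length
# loop optimization feature vector (1 x 164, matching the example dimension).
import numpy as np

FEATURE_DIM = 164

def extract_loop_features(loop_stmt) -> np.ndarray:
    features = np.zeros(FEATURE_DIM, dtype=np.float32)
    # Example slots; the real feature layout is not specified in the text.
    features[0] = float(getattr(loop_stmt, "extent", 0))           # loop trip count
    features[1] = float(getattr(loop_stmt, "is_unrolled", False))  # unrolling applied
    features[2] = float(getattr(loop_stmt, "is_parallel", False))  # parallelized
    features[3] = float(getattr(loop_stmt, "is_vectorized", False))
    return features.reshape(1, FEATURE_DIM)
```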
In this embodiment, the loop bodies in the tensor program are determined; for each loop body, the loop optimization statement corresponding to the loop body is determined from the at least one program statement included in the loop body; and feature extraction is performed on the loop optimization statement to obtain the loop optimization feature corresponding to the loop body. The loop body often determines the running speed of the whole operator and is the computational bottleneck in the compilation optimization process, so performing cost prediction based on the loop optimization features helps optimize the whole operator and can effectively improve the compilation efficiency of the model.
In some embodiments, encoding, by the attention encoder, the loop optimization features and the operator class information corresponding to the loop optimization statements to obtain feature coding data includes:
for each loop optimization statement, determining the class of the operator corresponding to the loop optimization statement to obtain operator class information, and merging the loop optimization feature and the operator class information corresponding to the loop optimization statement to obtain a comprehensive feature;
and encoding the comprehensive feature of each loop optimization statement through the attention encoder to obtain feature coding data.
For example, the computer device may determine the class of the operator corresponding to each loop optimization statement, obtaining the operator class information corresponding to that statement. The computer device may concatenate the loop optimization feature and the operator class information corresponding to each loop optimization statement to obtain the comprehensive feature corresponding to that statement. The computer device may encode the comprehensive feature of each loop optimization statement through the plurality of attention encoders in the cost prediction model to obtain the feature coding data corresponding to the tensor program.
In some embodiments, a schematic diagram of merging loop optimization features and operator class information is provided as shown in FIG. 2. The program statements in the for-loop structure shown in FIG. 2 are all used to express operators, and include loop optimization statement 1 (Computation statement 1) and loop optimization statement 2 (Computation statement 2). The computer device may perform feature extraction on loop optimization statement 1 to obtain a loop optimization feature vector of length 164 (1×164 dimensions), and may determine an operator class information vector of length 10 (1×10 dimensions) corresponding to the loop optimization statement. It will be appreciated that the data dimensions of the loop optimization feature and the operator class information may be preset and are not limited to the length-164 and length-10 dimensions used in this embodiment. Concatenating the loop optimization feature vector and the operator class information vector yields a comprehensive feature vector of length 174 (1×174 dimensions). It should be noted that the length of the operator class information should be no less than the total number of operator classes included in any model, so that no operator class is missed.
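The following sketch illustrates this merging step under the dimensions of FIG. 2; the ten operator class names listed are an assumption made for illustration, since the patent does not fix a particular class vocabulary.

```python
# Sketch of FIG. 2: concatenate a 1 x 164 loop optimization feature with a
# 1 x 10 one-hot operator class vector to form a 1 x 174 comprehensive feature.
import numpy as np

OPERATOR_CLASSES = ["conv", "dense", "activation", "pool", "batch_norm",
                    "softmax", "add", "mul", "reshape", "other"]   # length 10 (assumed)

def build_comprehensive_feature(loop_feature: np.ndarray, op_class: str) -> np.ndarray:
    one_hot = np.zeros((1, len(OPERATOR_CLASSES)), dtype=np.float32)
    one_hot[0, OPERATOR_CLASSES.index(op_class)] = 1.0
    return np.concatenate([loop_feature, one_hot], axis=1)         # shape (1, 174)
```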
In this embodiment, for each loop optimization statement, the class of the operator corresponding to the statement is determined to obtain operator class information; the loop optimization feature and the operator class information corresponding to the statement are merged to obtain a comprehensive feature; and the comprehensive feature of each loop optimization statement is encoded through the attention encoder to obtain feature coding data. Performing cost prediction by combining the loop optimization features with the corresponding operator class information fully accounts for the different operators in the model to be compiled and improves the accuracy of the cost prediction.
In some embodiments, encoding the comprehensive feature of each loop optimization statement through the attention encoder to obtain feature coding data includes:
encoding, through the attention encoder, the comprehensive feature of each loop optimization statement based on the dependency relationships among the plurality of loop optimization statements to obtain feature coding data.
The dependency relationships among loop optimization statements are determined by the loop bodies in which the statements are located, and may be direct or indirect. If two loop bodies directly depend on each other, the corresponding loop optimization statements have a direct dependency relationship; if two loop bodies indirectly depend on each other, the corresponding loop optimization statements have an indirect dependency relationship.
Illustratively, the computer device performs self-attention computation on the comprehensive features of the loop optimization statements through the attention encoder to learn the dependency relationships among the plurality of loop optimization statements. It can be appreciated that the higher the degree of dependency between a loop optimization statement and the other loop optimization statements, the greater the attention weight assigned to the comprehensive feature of that statement. Through the attention encoder, the computer device can assign attention weights to the comprehensive features according to the degrees of dependency among the plurality of loop optimization statements, thereby encoding the comprehensive feature of each loop optimization statement and obtaining the feature coding data corresponding to the tensor program.
In some embodiments, a schematic diagram of the encoding process performed by the attention encoder is provided as shown in FIG. 3. The attention encoder may be the attention encoder in a Transformer. The tensor program includes loop optimization statement 1, loop optimization statement 2, loop optimization statement 3, and loop optimization statement 4. The computer device can merge the loop optimization feature and the operator class information corresponding to each loop optimization statement to obtain the comprehensive features. V1 denotes the comprehensive feature of loop optimization statement 1, V2 that of loop optimization statement 2, V3 that of loop optimization statement 3, and V4 that of loop optimization statement 4. Through the attention encoder, the computer device can learn the dependency relationships among V1, V2, V3, and V4 to obtain key data K (Key), query data Q (Query), and value data V (Value). The computer device may compute, through the attention encoder, the correlation between the key data and the query data to obtain the weights corresponding to the value data, and weight the value data with these weights to obtain the feature coding data. The computer device may obtain the cost prediction score by performing feature mapping on the feature coding data.
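A minimal PyTorch sketch of this self-attention computation over the statement features V1..V4 is shown below; the projection layers and dimensions are assumptions made only to illustrate the Key/Query/Value weighting described above.

```python
# Minimal sketch of the self-attention step in FIG. 3 over a stack of
# comprehensive features (one row per loop optimization statement).
import torch
import torch.nn.functional as F

def self_attention(stmt_features: torch.Tensor,
                   w_q: torch.nn.Linear,
                   w_k: torch.nn.Linear,
                   w_v: torch.nn.Linear) -> torch.Tensor:
    # stmt_features: (num_statements, feature_dim), e.g. (4, 174) for V1..V4
    q, k, v = w_q(stmt_features), w_k(stmt_features), w_v(stmt_features)
    scores = q @ k.transpose(-2, -1) / (k.shape[-1] ** 0.5)  # correlation of Q and K
    weights = F.softmax(scores, dim=-1)                      # attention weights
    return weights @ v                                       # weighted value data
```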
In this embodiment, the comprehensive feature of each loop optimization statement is encoded through the attention encoder based on the dependency relationships among the plurality of loop optimization statements to obtain feature coding data. Rather than analyzing each loop optimization statement in isolation, the encoding fully accounts for the dependency relationships among the statements, so the cost prediction result obtained from the feature coding data is more accurate.
In some embodiments, encoding the comprehensive feature of each loop optimization statement through the attention encoder to obtain feature coding data includes:
determining the comprehensive feature of each loop optimization statement as the input of a cost prediction model, and encoding the comprehensive feature of each loop optimization statement through a plurality of attention encoders in the cost prediction model to obtain feature coding data; wherein the output of each attention encoder is used as the input of the next attention encoder.
For example, the computer device may determine the comprehensive feature of each loop optimization statement as the input of the cost prediction model, encode the comprehensive features through the first attention encoder in the cost prediction model, take the output of the first attention encoder as the input of the next attention encoder, encode it through that encoder, and obtain the feature coding data output by the last attention encoder. All the attention encoders have the same structure.
In this embodiment, the comprehensive feature of each loop optimization statement is determined as the input of the cost prediction model, the comprehensive features are encoded through the plurality of attention encoders in the cost prediction model to obtain feature coding data, and an accurate cost prediction result can then be obtained based on the feature coding data.
In some embodiments, the attention encoder is disposed in a cost prediction model; the cost prediction model also comprises a multi-layer perceptron; carrying out cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result, wherein the cost prediction result comprises the following steps:
and determining the feature coding data as the input of the multi-layer perceptron, and performing feature mapping processing on the feature coding data through the multi-layer perceptron to obtain a cost prediction result corresponding to the tensor program.
Illustratively, the cost prediction model includes the attention encoder and a fully connected layer, and the fully connected layer may be a multi-layer perceptron. The computer device may perform a linear feature mapping on the feature coding data through the multi-layer perceptron in the cost prediction model to obtain the cost prediction result corresponding to the tensor program. It will be appreciated that the multi-layer perceptron functions as a fully connected layer.
In some embodiments, a schematic structural diagram of the cost prediction model is provided as shown in fig. 4. The cost prediction model includes a computation layer and a regression layer. The computation layer includes two attention encoders of identical structure. Each attention encoder includes a first normalization layer, an embedding layer, a multi-head attention layer, a second normalization layer, and a first multi-layer perceptron. It can be understood that the cost prediction model provided in this embodiment has a very simple structure and is simple to implement; it can be embedded into the deep learning compiler using the PyTorch framework without overly complex programming work.
The computer device can normalize the input of the attention encoder through the first normalization layer, perform a preliminary encoding of the output of the first normalization layer through the embedding layer, perform self-attention computation on the output of the embedding layer through the multi-head attention layer, normalize the sum of the output of the multi-head attention layer and the output of the embedding layer (a residual connection) through the second normalization layer, and perform a linear feature mapping on the output of the second normalization layer through the first multi-layer perceptron to obtain the output of the attention encoder. The number of attention heads and the width of the multi-head attention layer may be preset; for example, the number of attention heads may be 4 and the width may be 512.
The regression layer includes a second multi-layer perceptron. The computer device can take the feature coding data as the input of the regression layer and perform a linear feature mapping on it through the second multi-layer perceptron to obtain the cost prediction result. The cost prediction model may be deployed on a processor included in the computer device, with the model parameters continually adjusted through online training on that processor.
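Taken together, the structure of FIG. 4 can be sketched in PyTorch roughly as follows. The sketch follows the layout described above (two identical attention encoders, 4 heads, width 512, then a regression multi-layer perceptron); the exact layer sizes, the mean pooling over statements, and the hidden width of the regressor are assumptions not fixed by the text.

```python
# Hedged PyTorch sketch of the cost prediction model in FIG. 4 (sizes assumed).
import torch
import torch.nn as nn

class AttentionEncoder(nn.Module):
    def __init__(self, in_dim=174, hidden=512, heads=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(in_dim)                        # first normalization layer
        self.embed = nn.Linear(in_dim, hidden)                   # embedding (preliminary encoding)
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(hidden)                        # second normalization layer
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))      # first multi-layer perceptron

    def forward(self, x):                                        # x: (batch, statements, in_dim)
        h = self.embed(self.norm1(x))
        a, _ = self.attn(h, h, h)                                # self-attention over statements
        h = self.norm2(h + a)                                    # residual sum, then normalize
        return self.mlp(h)

class CostPredictionModel(nn.Module):
    def __init__(self, in_dim=174, hidden=512):
        super().__init__()
        self.encoder1 = AttentionEncoder(in_dim, hidden)         # computation layer, encoder 1
        self.encoder2 = AttentionEncoder(hidden, hidden)         # computation layer, encoder 2
        self.regressor = nn.Sequential(nn.Linear(hidden, hidden // 2), nn.ReLU(),
                                       nn.Linear(hidden // 2, 1))  # regression layer (second MLP)

    def forward(self, comprehensive_features):
        encoded = self.encoder2(self.encoder1(comprehensive_features))
        pooled = encoded.mean(dim=1)                             # aggregate statement encodings (assumed)
        return self.regressor(pooled)                            # cost prediction score

# Example: CostPredictionModel()(torch.randn(1, 4, 174)) returns a (1, 1) score
# for a tensor program with four loop optimization statements, as in FIG. 3.
```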
In this embodiment, the feature encoding data is determined as input of the multi-layer perceptron, feature mapping processing is performed on the feature encoding data by the multi-layer perceptron to obtain a cost prediction result corresponding to the tensor program, and then the target tensor program is determined from a plurality of tensor programs according to the accurate cost prediction result, so that repeated searching is not needed, and the efficiency of model compiling is improved.
In some embodiments, the cost prediction result is an output result obtained by taking the cyclic optimization feature and the operator class information corresponding to the cyclic optimization statement as the input of the cost prediction model; the method further comprises the steps of:
acquiring an initial cost prediction model; the initial cost prediction model is obtained by offline training on the first processor based on a labeled sample tensor program;
And migrating the initial cost prediction model to a second processor, and performing online training on the initial cost prediction model to obtain a cost prediction model.
Wherein the first processor and the second processor may be different processors. The labels of the sample tensor program are derived by the first processor executing the sample machine language. The sample machine language is compiled by the deep learning compiler from a sample tensor program.
For example, the computer device may generate a sample machine language by compiling a sample tensor program through the deep learning compiler. The computer device obtains the execution speed or execution time of the first processor executing the sample machine language, thereby obtaining the label of the sample tensor program. The computer device may perform offline training of the cost prediction model to be trained on the first processor based on the labeled sample tensor programs to obtain the initial cost prediction model. The computer device may then obtain the initial cost prediction model, migrate it to the second processor, and perform online training of the initial cost prediction model on the second processor to obtain the cost prediction model.
It should be noted that the first processor is the source processor, that is, the initial cost prediction model is the model in the source domain, and the second processor is the target processor, that is, the cost prediction model is the model in the target domain. The initial cost prediction model is trained offline on the first processor and then migrated to the second processor for online training. Through offline training, the initial cost prediction model learns sufficient knowledge in a sufficiently large source domain, and this knowledge is transferred into the cost prediction model in the target domain with as few training iterations as possible. It will be appreciated that the domain of the first processor is not smaller than the domain of the second processor, that is, the source domain is not smaller than the target domain.
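For illustration, the offline-then-online training described above might look like the following sketch, using the cost prediction model sketched earlier. The loss function, optimizer, and iteration count are assumptions; the text only specifies that the target-domain training should use as few iterations as possible.

```python
# Hedged sketch: migrate the offline-trained model and fine-tune it online
# on the target processor using labels measured on that processor.
import torch
import torch.nn as nn

def online_finetune(initial_model: nn.Module,
                    target_features: torch.Tensor,   # comprehensive features of sample tensor programs
                    target_labels: torch.Tensor,     # measured speed/time on the target processor
                    steps: int = 100, lr: float = 1e-4) -> nn.Module:
    model = initial_model                            # weights learned offline in the source domain
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()                           # assumed regression loss
    for _ in range(steps):                           # kept small: few iterations in the target domain
        pred = model(target_features).squeeze(-1)
        loss = loss_fn(pred, target_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return model
```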
In some embodiments, a schematic diagram of a migration cost prediction model is provided as shown in fig. 5. The domain of the first processor is a source domain and the domains of the second processor and the third processor are target domains. The first processor, the second processor, and the third processor may be graphics processors. The computer device may migrate the migration features in the initial cost prediction model in the source domain to the target domain in the second processor or the third processor.
In this embodiment, an initial cost prediction model is obtained, migrated to the second processor, and trained online to obtain the cost prediction model. By training offline on the source processor and then online on the target processor, the migration of the cost prediction model is realized, so that the cost prediction model can adapt to various processors.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a model compiling device. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the model compiling apparatus provided below may refer to the limitation of the model compiling method hereinabove, and will not be repeated herein.
In some embodiments, as shown in fig. 6, there is provided a model compiling apparatus 600 including:
an acquisition module 602, configured to acquire a plurality of tensor programs; the tensor programs are a plurality of intermediate programs generated by compiling the model to be compiled through a deep learning compiler according to different compiling optimization strategies;
the extracting module 604 is configured to perform feature extraction on the loop optimization statement in the tensor program for each tensor program, so as to obtain a loop optimization feature; the loop optimization statement is a program statement with loop optimization characteristics;
the cost prediction module 606 is configured to perform encoding processing on the cyclic optimization feature and the operator class information corresponding to the cyclic optimization statement through the attention encoder, so as to obtain feature encoded data; carrying out cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result;
a compiling module 608, configured to determine a target tensor program from the plurality of tensor programs based on the cost prediction results corresponding to the tensor programs; the target tensor program is used to instruct the deep learning compiler to compile the model to be compiled into machine language.
In some embodiments, in extracting features of the loop optimization statement in the tensor program, the extracting module 604 is specifically configured to:
determining a loop body in the tensor program; each loop body includes at least one program statement;
determining a loop optimization statement corresponding to the loop body from at least one program statement included in the loop body for each loop body;
and extracting the characteristics of the circulation optimization statement to obtain the circulation optimization characteristics corresponding to the circulation body.
In some embodiments, in terms of encoding the cyclic optimization feature and the operator class information corresponding to the cyclic optimization statement by the attention encoder to obtain feature encoded data, the cost prediction module 606 is specifically configured to:
determining the class of an operator corresponding to each cyclic optimization statement to obtain operator class information; combining the cyclic optimization features corresponding to the cyclic optimization sentences and operator category information to obtain comprehensive features;
And (3) carrying out coding processing on the comprehensive characteristics of each cycle optimization statement through the attention coder to obtain characteristic coding data.
In some embodiments, in terms of encoding the composite feature of each loop optimization statement by the attention encoder to obtain feature encoded data, the cost prediction module 606 is specifically configured to:
and carrying out coding processing on the comprehensive characteristics of each loop optimization statement based on the dependency relationship among the loop optimization statements through the attention coder to obtain characteristic coding data.
In some embodiments, in terms of encoding the composite feature of each loop optimization statement by the attention encoder to obtain feature encoded data, the cost prediction module 606 is specifically configured to:
determining the comprehensive characteristics of each cycle optimization statement as the input of a cost prediction model, and carrying out coding processing on the comprehensive characteristics of each cycle optimization statement through a plurality of attention encoders in the cost prediction model to obtain characteristic coding data; wherein the output of the last attention encoder is used as the input of the next attention encoder.
In some embodiments, the attention encoder is set in a cost prediction model; the cost prediction model also comprises a multi-layer perceptron; in terms of performing cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result, the cost prediction module 606 is specifically configured to:
And determining the feature coding data as the input of the multi-layer perceptron, and performing feature mapping processing on the feature coding data through the multi-layer perceptron to obtain a cost prediction result corresponding to the tensor program.
In some embodiments, the cost prediction result is an output result obtained by taking the cyclic optimization feature and the operator class information corresponding to the cyclic optimization statement as the input of the cost prediction model; the acquisition module 602 is further configured to: acquiring an initial cost prediction model; the initial cost prediction model is obtained by offline training on the first processor based on a labeled sample tensor program; and migrating the initial cost prediction model to a second processor, and performing online training on the initial cost prediction model to obtain a cost prediction model.
The respective modules in the above model compiling apparatus may be realized in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In some embodiments, a computer device is provided, which may be a server, whose internal structure may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store tensor programs. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external computer device through a network connection. The computer program is executed by a processor to implement the steps in the model compilation method described above.
In some embodiments, a computer device is provided, which may be a computer device, the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for conducting wired or wireless communication with external computer devices, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement the steps in the model compilation method described above. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 7 or 8 are merely block diagrams of portions of structures related to the aspects of the present application and are not intended to limit the computer devices to which the aspects of the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or may have a different arrangement of components.
In some embodiments, a computer device is provided, the computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor performing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, an internal structural diagram of a computer-readable storage medium is provided as shown in fig. 9, where the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the method embodiments described above.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data are required to comply with the related laws and regulations and standards of the related countries and regions.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer readable storage medium and which, when executed, may comprise the steps of the above-described embodiments of the methods. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as there is no contradiction in a combination of technical features, it should be considered to fall within the scope of this specification.
The foregoing examples represent only a few embodiments of the present application, which are described in relative detail, but they are not thereby to be construed as limiting the scope of the application. It should be noted that various modifications and improvements can be made by those skilled in the art without departing from the concept of the present application, and such modifications and improvements fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (10)

1. A model compilation method, comprising:
acquiring a plurality of tensor programs; the tensor programs are intermediate programs generated by compiling the model to be compiled through a deep learning compiler according to different compiling optimization strategies;
performing feature extraction on the loop optimization statement in the tensor program aiming at each tensor program to obtain loop optimization features; the loop optimization statement is a program statement with loop optimization characteristics;
encoding, by an attention encoder, the loop optimization features and operator class information corresponding to the loop optimization statements to obtain feature coding data;
carrying out cost prediction on the tensor program according to the feature coding data to obtain a cost prediction result;
determining a target tensor program from the plurality of tensor programs based on a cost prediction result corresponding to each tensor program; the target tensor program is used to instruct the deep learning compiler to compile the model to be compiled into machine language.
2. The method according to claim 1, wherein the feature extraction of the loop optimization statement in the tensor program to obtain the loop optimization feature includes:
determining a loop body in the tensor program; each of the loop bodies includes at least one program statement;
determining a loop optimization statement corresponding to each loop body from at least one program statement included in the loop body according to each loop body;
and performing feature extraction on the loop optimization statement to obtain the loop optimization feature corresponding to the loop body.
3. The method according to claim 1, wherein the encoding, by the attention encoder, the loop optimization feature and the operator class information corresponding to the loop optimization statement to obtain feature encoded data includes:
for each loop optimization statement, determining the class of the operator corresponding to the loop optimization statement to obtain operator class information, and merging the loop optimization feature and the operator class information corresponding to the loop optimization statement to obtain a comprehensive feature;
and encoding the comprehensive feature of each loop optimization statement through an attention encoder to obtain feature coding data.
4. A method according to claim 3, wherein said encoding, by an attention encoder, of said integrated features of each of said loop optimization statements to obtain feature encoded data comprises:
and carrying out coding processing on the comprehensive characteristics of each loop optimization statement based on the dependency relationship among a plurality of loop optimization statements through an attention coder to obtain characteristic coding data.
5. The method according to claim 3, wherein encoding, by the attention encoder, the combined features of each loop optimization statement to obtain the feature-encoded data comprises:
taking the combined features of each loop optimization statement as input to a cost prediction model, and encoding the combined features of each loop optimization statement through a plurality of attention encoders in the cost prediction model to obtain the feature-encoded data, wherein the output of a preceding attention encoder serves as the input of the next attention encoder.
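The cascade of attention encoders in claim 5 can be approximated with stacked Transformer encoder layers, as in the sketch below; the layer count, model width, and head count are illustrative assumptions rather than values from the application.

```python
# Hedged sketch of stacked attention encoders (claim 5) using PyTorch layers.
import torch.nn as nn

class StackedAttentionEncoder(nn.Module):
    def __init__(self, feature_dim=64, num_heads=4, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(d_model=feature_dim, nhead=num_heads,
                                       batch_first=True)
            for _ in range(num_layers)
        ])

    def forward(self, combined_features):
        # combined_features: (batch, num_loop_statements, feature_dim)
        x = combined_features
        for layer in self.layers:
            x = layer(x)  # output of the preceding encoder feeds the next one
        return x          # feature-encoded data
```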
6. The method according to claim 1, wherein the attention encoder is disposed in a cost prediction model, the cost prediction model further comprises a multi-layer perceptron, and performing cost prediction on the tensor program according to the feature-encoded data to obtain the cost prediction result comprises:
taking the feature-encoded data as input to the multi-layer perceptron, and performing feature mapping on the feature-encoded data through the multi-layer perceptron to obtain the cost prediction result corresponding to the tensor program.
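A small multi-layer perceptron head of the kind recited in claim 6 is sketched below. Mean-pooling the per-statement encodings before the MLP is an assumption introduced so that each tensor program maps to a single scalar cost.

```python
# Hedged sketch of the MLP cost prediction head in claim 6.
import torch.nn as nn

class CostPredictionHead(nn.Module):
    def __init__(self, feature_dim=64, hidden_dim=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, encoded):
        # encoded: (batch, num_loop_statements, feature_dim) feature-encoded data
        pooled = encoded.mean(dim=1)          # aggregate statement encodings per program
        return self.mlp(pooled).squeeze(-1)   # predicted cost, one scalar per program
```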
7. The method according to any one of claims 1 to 6, wherein the cost prediction result is an output obtained by taking the loop optimization features and the operator category information corresponding to the loop optimization statements as input to a cost prediction model, and the method further comprises:
acquiring an initial cost prediction model, wherein the initial cost prediction model is obtained by offline training on a first processor based on labelled sample tensor programs; and
migrating the initial cost prediction model to a second processor, and performing online training on the initial cost prediction model to obtain the cost prediction model.
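The two-stage training in claim 7 amounts to offline pre-training on labelled sample tensor programs followed by migration and online fine-tuning on the target processor. The sketch below assumes a regression loss against measured costs; the optimizer, learning rates, and single-pass loops are illustrative choices only.

```python
# Hedged sketch of offline training followed by online fine-tuning (claim 7).
import torch

def train_cost_model(model, offline_loader, online_loader, target_device):
    loss_fn = torch.nn.MSELoss()

    # Stage 1: offline training, e.g. on the first processor
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for features, measured_cost in offline_loader:
        optimizer.zero_grad()
        loss_fn(model(features), measured_cost).backward()
        optimizer.step()

    # Stage 2: migrate the initial cost prediction model and fine-tune online
    model = model.to(target_device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # gentler updates
    for features, measured_cost in online_loader:
        features = features.to(target_device)
        measured_cost = measured_cost.to(target_device)
        optimizer.zero_grad()
        loss_fn(model(features), measured_cost).backward()
        optimizer.step()
    return model
```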
8. A model compiling apparatus, comprising:
an acquisition module, configured to acquire a plurality of tensor programs, wherein the plurality of tensor programs are intermediate programs generated by a deep learning compiler compiling a model to be compiled according to different compilation optimization strategies;
an extraction module, configured to, for each tensor program, perform feature extraction on loop optimization statements in the tensor program to obtain loop optimization features, wherein a loop optimization statement is a program statement having a loop optimization characteristic;
a cost prediction module, configured to encode, by an attention encoder, the loop optimization features and operator category information corresponding to the loop optimization statements to obtain feature-encoded data, and to perform cost prediction on the tensor program according to the feature-encoded data to obtain a cost prediction result; and
a compiling module, configured to determine a target tensor program from the plurality of tensor programs based on the cost prediction result corresponding to each tensor program, wherein the target tensor program is used to instruct the deep learning compiler to generate the machine language for the model to be compiled.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202211733228.2A 2022-12-30 2022-12-30 Model compiling method, device, computer equipment and computer readable storage medium Pending CN116126341A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211733228.2A CN116126341A (en) 2022-12-30 2022-12-30 Model compiling method, device, computer equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211733228.2A CN116126341A (en) 2022-12-30 2022-12-30 Model compiling method, device, computer equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116126341A true CN116126341A (en) 2023-05-16

Family

ID=86304058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211733228.2A Pending CN116126341A (en) 2022-12-30 2022-12-30 Model compiling method, device, computer equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116126341A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116931955A (en) * 2023-09-18 2023-10-24 之江实验室 Compiler automatic optimization method and device based on artificial intelligence
CN116931955B (en) * 2023-09-18 2024-01-09 之江实验室 Compiler automatic optimization method and device based on artificial intelligence
CN116991388A (en) * 2023-09-26 2023-11-03 之江实验室 Graph optimization sequence generation method and device of deep learning compiler
CN116991388B (en) * 2023-09-26 2024-01-09 之江实验室 Graph optimization sequence generation method and device of deep learning compiler
CN116991428A (en) * 2023-09-28 2023-11-03 飞腾信息技术有限公司 Compiling method, compiling device, compiler, computing device and storage medium
CN116991428B (en) * 2023-09-28 2023-12-15 飞腾信息技术有限公司 Compiling method, compiling device, compiler, computing device and storage medium
CN117170685A (en) * 2023-11-02 2023-12-05 腾讯科技(深圳)有限公司 Data processing method, device, equipment and medium
CN117170685B (en) * 2023-11-02 2024-02-23 腾讯科技(深圳)有限公司 Data processing method, device, equipment and medium

Similar Documents

Publication Publication Date Title
Lin et al. Toward compact convnets via structure-sparsity regularized filter pruning
WO2022022173A1 (en) Drug molecular property determining method and device, and storage medium
CN116126341A (en) Model compiling method, device, computer equipment and computer readable storage medium
CN113535984B (en) Knowledge graph relation prediction method and device based on attention mechanism
CN112257858A (en) Model compression method and device
KR20210106398A (en) Conversation-based recommending method, conversation-based recommending apparatus, and device
CN115269512A (en) Object recommendation method, device and storage medium for realizing IA by combining RPA and AI
CN116541492A (en) Data processing method and related equipment
CN116992008B (en) Knowledge graph multi-hop question-answer reasoning method, device and computer equipment
CN116661852A (en) Code searching method based on program dependency graph
CN115952266A (en) Question generation method and device, computer equipment and storage medium
CN114819140A (en) Model pruning method and device and computer equipment
CN115994541B (en) Interface semantic data generation method, device, computer equipment and storage medium
CN116702784B (en) Entity linking method, entity linking device, computer equipment and storage medium
CN117151247B (en) Method, apparatus, computer device and storage medium for modeling machine learning task
CN117909503A (en) Text classification method, apparatus, computer device and storage medium
CN117093874A (en) Text generation method, apparatus, computer device, medium, and program product
Zhao et al. A novel initialization method of fixed point continuation for recommendation systems
CN117668246A (en) Multi-mode-based time knowledge graph reasoning method and device
He et al. User Context-Aware Attention Networks for Answer Selection
CN117992193A (en) Resource scheduling list generation method, device and computer equipment
Yang et al. Subgraph Reconstruction via Reversible Subgraph Embedding
CN116894195A (en) Text similarity calculation method, device, computer equipment and storage medium
CN115658899A (en) Text classification method and device, computer equipment and storage medium
CN117010334A (en) Text information generation method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination