WO2021057807A1 - Deep learning model generation method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2021057807A1
WO2021057807A1 · PCT/CN2020/117196 · CN2020117196W
Authority
WO
WIPO (PCT)
Prior art keywords
file
deep learning
learning model
source file
model
Prior art date
Application number
PCT/CN2020/117196
Other languages
French (fr)
Chinese (zh)
Inventor
谭志鹏
刘耀勇
蒋燚
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2021057807A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models
    • G06N 5/041: Abduction
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to the field of deep learning, and in particular to a deep learning model generation method, apparatus, device, and storage medium.
  • The network structure used in deep learning is a multilayer neural network, and most of the data in a model consists of the values of its weight matrices.
  • the deep learning model will use a suitable data structure to define the neural network structure.
  • When a deep learning model performs inference, the model must first be loaded into the neural network structure that the deep learning model adopts.
  • The conventional loading method treats the model as a file: when the code of the neural network structure runs, the model file is loaded into memory, and the data is then copied from memory into the neural network structure.
  • the embodiments of the present application provide a method, device, device, and storage medium for generating a deep learning model.
  • the data loading of the weight matrix can be completed during the compilation stage of the deep learning model, thereby improving the efficiency of deep learning model inference.
  • an embodiment of the present application provides a method for generating a deep learning model, and the method includes:
  • the second source file is a source file of a neural network structure adopted by the deep learning model
  • an embodiment of the present application provides an apparatus for generating a deep learning model, and the apparatus includes:
  • the first generating module is configured to generate a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
  • a first acquisition module configured to acquire a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
  • the second generating module is configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • An embodiment of the present application provides a computer device that includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is executed by the processor to implement the deep learning model generation method described in the foregoing aspect.
  • A computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor to implement the deep learning model generation method described in the foregoing aspect.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the deep learning model generation method provided in the various optional implementations of the foregoing aspects.
  • Fig. 1 is a schematic diagram of a neural network data structure provided by an exemplary embodiment of the present application
  • Figure 2 is a schematic diagram of the data loading implementation of the deep learning model inference process in related technologies
  • Fig. 3 is a flowchart of a method for generating a deep learning model provided by an exemplary embodiment of the present application
  • Fig. 4 is a flowchart of a method for generating a deep learning model provided by another exemplary embodiment of the present application.
  • Fig. 5 is a flowchart of a method for generating a deep learning model provided by another exemplary embodiment of the present application.
  • Fig. 6 is a schematic diagram of an implementation of a deep learning model generation process provided by an exemplary embodiment of the present application.
  • Fig. 7 is a flowchart of a method for generating a deep learning model provided by another exemplary embodiment of the present application.
  • Fig. 8 is a structural block diagram of a deep learning model generating device provided by an exemplary embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • The "plurality" mentioned herein means two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone.
  • The character "/" generally indicates an "or" relationship between the objects before and after it.
  • Deep learning model inference: the process of using a trained deep learning model to draw conclusions about unknown samples. More specifically, a trained deep learning model can apply the knowledge it has learned to tasks such as image recognition, speech recognition, and spam filtering; in deep learning terminology, deriving results for unknown samples on the basis of the training content is called inference.
  • Source file: a code file written in assembly language or a high-level language; a computer cannot directly execute the code in a source file.
  • Object (target) file: the binary file, directly executable by the central processing unit (CPU), produced when a source file is compiled by the compiler; it contains machine code, data used by the code at runtime, debugging information, and so on.
  • Rule file: because the code of a neural network structure is composed of multiple source files, a rule file is required to describe to the compilation system how these source files are compiled.
  • Tensor: in deep learning, a tensor is at its core a data container, which can be an array of any dimension; it holds a name and a memory pointer, and the memory pointer points to the address of the data that needs to be loaded.
  • Before performing inference with a deep learning model, the computer device needs to load the model into the adopted neural network structure, and most of the loaded data is the model's weight matrix.
  • the computer equipment will use a suitable data structure to define the neural network.
  • the general definition method is shown in Figure 1.
  • the neural network contains multiple operators.
  • the data is uniformly packaged and input into the neural network.
  • the computer equipment usually saves the deep learning model as a file.
  • The model file 21 must be loaded into memory when the deep learning model performs inference; because the memory pointer of Tensor 23 in the neural network 22 points to the memory address of the weight matrix 24, the computer device must copy the data of the weight matrix 24 into Tensor 23 according to that memory address while model inference runs.
  • When the neural network structure adopted by the deep learning model runs as a special version, such as a graphics processing unit (GPU) version or a digital signal processor (DSP) version, the computer device must additionally copy the data of the deep learning model from the CPU to the GPU or DSP at runtime.
  • the computer device completes data copy when the deep learning model is compiled.
  • The computer device first generates the first source file from the data of the weight matrix in the deep learning model file, and compiles it together with the source file of the neural network structure adopted by the deep learning model (i.e., the second source file) to generate the target file corresponding to the deep learning model; deep learning model inference is then performed on the basis of this target file.
  • Because the first source file is generated from the weight matrix of the deep learning model, the data-loading step is completed while the deep learning model is compiled.
  • The deep learning model therefore does not need to open the model file or copy data at inference time, which greatly improves the running efficiency of the neural network structure and, in turn, the inference efficiency of the deep learning model.
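As an illustration of what such a generated first source file might contain (a hypothetical sketch in C; the patent does not fix a programming language, and the array name and values here are invented), the weights of one matrix are baked into the compiled binary as a fixed-size static array:

```c
/* Hypothetical generated "first source file": the weight values of one
 * 2x2 weight matrix are embedded as a static array, so the compiled
 * binary carries the data and no model file needs to be opened at
 * inference time. */
static const float conv1_weights[2 * 2] = {
    0.5f, -0.25f,
    1.0f,  0.75f,
};

/* Accessor so other translation units can reach the embedded data. */
const float *get_conv1_weights(void) { return conv1_weights; }
```

The data is part of the program image itself, which is exactly why the runtime open-and-copy step described above disappears.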
  • the deep learning model generation method provided in the embodiments of the present application can be used in computer devices with strong data processing capabilities such as personal computers or servers.
  • The deep learning model obtained through this generation method can be implemented as an application (or part of one) and installed on a terminal to give the terminal deep learning capabilities, or it can be deployed on the application's back-end server so that the server provides deep learning model inference services for the application on the terminal.
  • each embodiment of the present application is described by taking a deep learning model generation method applied to a computer device as an example.
  • the second source file is the source file of the neural network structure adopted by the deep learning model
  • the first source file is generated according to the model file of the deep learning model, including:
  • the rule file is used to describe the way of compiling the source file to the compilation system
  • the first source file is generated through the target script.
  • the first source file is generated through the target script, including:
  • the first source file is generated according to the static array corresponding to each weight matrix.
  • the static array corresponding to each weight matrix is generated through the target script, including:
  • According to the matrix size and data type of the weight matrix, the static array is set through the target script.
  • the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
  • the array name of the static array is generated through the target script
  • the array value of the static array is generated through the target script.
  • the first source file and the second source file are compiled to generate the target file corresponding to the deep learning model, including:
  • the target Tensor points to the target static array in the first source file, and the target static array has the same name as the target Tensor.
  • the method further includes:
  • When a deep learning model inference request is received, the target file is loaded into memory, and the target file is executed to perform deep learning model inference.
  • Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes: obtaining the data volume of the model file;
  • if the data volume is greater than a threshold, the step of generating the first source file according to the model file of the deep learning model is executed.
  • Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes: obtaining the running version of the neural network structure adopted by the deep learning model;
  • if the running version belongs to a preset running version, the step of generating the first source file according to the model file of the deep learning model is executed, the preset running version including at least one of the GPU running version and the DSP running version.
  • the storage directory of the first source file is the same as the storage directory of the second source file.
  • FIG. 3 shows a flowchart of a method for generating a deep learning model according to an embodiment of the present application.
  • the method for generating a deep learning model is used in a computer device as an example for description, and the method includes:
  • Step 301 Generate a first source file according to the model file of the deep learning model, and the model file contains the weight matrix in the deep learning model.
  • The deep learning model may be a model used for image recognition (recognizing objects contained in an input image), speech recognition (recognizing the content of input speech), or video description generation (generating description information based on an input video); the embodiments of this application do not limit the use of the deep learning model.
  • the data loading of the deep learning model is mainly the numerical loading of its weight matrix.
  • Before compiling the neural network structure adopted by the deep learning model, the computer device first generates the first source file from the values of the weight matrix in the model file, so that this source file can be used to load the data directly when the neural network structure is compiled.
  • Step 302 Obtain a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model.
  • the computer device before compiling the neural network structure adopted by the deep learning model, the computer device needs to obtain the code of the neural network structure first, and the code of the neural network structure is saved in the second source file.
  • The neural network structure adopted by the deep learning model may be a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, and so on; the embodiment of the present application does not limit this.
  • Step 303 Compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • In related technologies, since a first source file corresponding to the deep learning model is not generated in advance, the computer device directly compiles the source files of the neural network structure to generate the target file.
  • In the embodiment of the present application, because the computer device generates the first source file in advance, once the first source file and the second source file are ready, the computer device uses the compilation system to compile them together according to certain rules.
  • During compilation, the value of each weight matrix in the model file is loaded from the first source file into the second source file, so the data loading of the model file is completed before compilation finishes.
  • the target file corresponding to the deep learning model is generated.
  • The content of the target file is the machine code obtained by compiling the code in the first source file and the second source file; it can be directly executed by the computer device, and subsequent model inference is performed on this basis.
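The effect of compiling the two source files together can be sketched in C (all identifiers here are hypothetical). The second source file references the weight data through an external declaration, and the definition in the first source file satisfies that reference when the two are compiled and linked, so at run time the inference code reads the weights directly, with no file I/O or copying. Both "files" are shown in one fragment for brevity:

```c
#include <stddef.h>

/* --- contents of the generated first source file (hypothetical) --- */
const float fc1_weights[4] = { 0.25f, 0.25f, 0.25f, 0.25f };

/* --- contents of the second source file (network structure) --- */
extern const float fc1_weights[4]; /* resolved at link time, not at run time */

/* Inference-side code can use the weight values immediately. */
float fc1_weight_sum(void) {
    float s = 0.0f;
    for (size_t i = 0; i < 4; ++i)
        s += fc1_weights[i];
    return s;
}
```

The linker binds the reference to the definition when the target file is produced, which is the compile-stage "data loading" the text describes.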
  • In summary, the first source file is generated in advance according to the weight matrix in the deep learning model, so that during compilation the first source file and the second source file corresponding to the neural network structure are compiled together to generate the target file of the deep learning model. Compared with related technologies, which must load the weight matrix from the model file into the neural network structure during the inference phase, the embodiment of the present application completes the data loading of the weight matrix during the compilation phase; the weight matrix does not need to be reloaded in the subsequent model inference process, thereby improving the efficiency of deep learning model inference.
  • FIG. 4 shows a flowchart of a method for generating a deep learning model according to another embodiment of the present application.
  • the method for generating a deep learning model is used in a computer device as an example for description, and the method includes:
  • Step 401: In the process of compiling the source code corresponding to the rule file, run the target script in the rule file; the rule file is used to describe to the compilation system how the source files are compiled.
  • Because the code of the neural network structure adopted by the deep learning model is composed of multiple source files, a rule file is needed to describe to the compilation system how these source files are compiled.
  • code for running a target script is added to the source code of the rule file.
  • the target script is used to generate the first source file from the value of the weight matrix in the deep learning model.
  • Optionally, the target script may be a shell script.
  • the developer adds the code for running the target script prepare.sh to the source code of the rule file, and the computer device runs the target script during the process of compiling the source code of the rule file.
  • the rule file can be Android.mk.
  • Step 402 According to the model file, a first source file is generated through the target script.
  • the computer device reads the data in the model file during the execution of the target script, so as to generate the first source file according to the read data.
  • step 402 includes the following steps 402A and 402B.
  • Step 402A For each weight matrix in the model file, a static array corresponding to each weight matrix is generated through the target script.
  • The purpose of running the target script is to save the values of the weight matrices in the model file as static arrays.
  • The size of a static array is fixed when it is declared, that is, the number of array elements cannot change, so static arrays correspond one-to-one with weight matrices, which facilitates the subsequent data loading when the neural network structure is compiled.
  • the developer adds the code for running the target script prepare.sh to the source code of the rule file, and the computer device runs prepare.sh when compiling the rule file to generate a static array corresponding to the value of the weight matrix of the model file.
  • When the compilation is complete, all the values of the weight matrices are saved in the first source file in the form of static arrays.
  • generating the static array according to the weight matrix may include the following steps:
  • First, the static array is set by the target script according to the matrix size and data type of the weight matrix.
  • the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type.
  • the size and data type of the static array need to be consistent with its corresponding weight matrix.
  • the size of the static array in the target script is determined according to the matrix size of the corresponding weight matrix, and the data type of the static array is the same as the data type of the weight matrix.
  • the array name of the static array is generated through the target script.
  • To ensure each static array is loaded into the correct Tensor in the subsequent compilation process, the computer device needs to set a unique name for the static array according to the matrix name of the weight matrix.
  • a preset naming rule is set in the target script, and the target script generates a corresponding array name based on the matrix name of the weight matrix according to the preset naming rule.
  • For example, the generated static array is declared as MobilenetV1_Conv2d_0_weights[32*3*3*3].
  • the array value of the static array is generated through the target script.
  • After the name and data type of the static array are set, the computer device further loads the weight data contained in the weight matrix into the corresponding static array. In the embodiment of the present application, the computer device does this by running the target script.
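A minimal sketch of what this step of the target script emits (the patent uses a shell script, prepare.sh; this C helper, its name, and the exact output formatting are assumptions for illustration). Given a matrix name, its float values, and its element count, it produces one static-array definition whose size and element type match the matrix:

```c
#include <stdio.h>
#include <string.h>

/* Emit one weight matrix as a C static-array definition into buf.
 * The array size follows the matrix size, the element type matches the
 * matrix's data type (float here), and the array name is derived from
 * the matrix name. Assumes buf is large enough; returns the number of
 * characters written. */
int emit_static_array(char *buf, size_t cap, const char *name,
                      const float *values, int count) {
    size_t n = (size_t)snprintf(buf, cap, "static const float %s[%d] = {",
                                name, count);
    for (int i = 0; i < count; ++i)
        n += (size_t)snprintf(buf + n, cap - n, "%s%g",
                              i ? ", " : " ", values[i]);
    n += (size_t)snprintf(buf + n, cap - n, " };");
    return (int)n;
}
```

Running this for every weight matrix in the model file and concatenating the results yields the body of the first source file.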
  • Step 402B Generate a first source file according to the static array corresponding to each weight matrix.
  • the target script saves all the static arrays in the source file format, thereby generating the first source file.
  • When the computer device generates static arrays according to the weight data of the weight matrix 74 in the model file 71, the static arrays are saved as the first source file 75 and stored in the storage directory where the second source file is located.
  • Step 403 Obtain a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model.
  • Step 404 Compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • the computer equipment uses the compilation system to compile the first source file and the second source file to generate the target file corresponding to the deep learning model.
  • During compilation, the compilation system sets the memory pointer corresponding to the target Tensor in the second source file so that the target Tensor points to the target static array in the first source file; the target static array has the same name as the target Tensor.
  • the neural network structure loads the data of the deep learning model through Tensor at compile time.
  • the name of the Tensor is set to be consistent with the name of the corresponding static array.
  • Tensor 66 in the neural network 62 points to the corresponding static array in the first source file 65.
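The Tensor-to-array binding can be sketched as follows (the struct layout and identifiers are assumptions; the text above only specifies that a Tensor carries a name and a memory pointer, and that the pointer is aimed at the same-named static array during compilation):

```c
#include <string.h>

/* A Tensor as described in the text: a name plus a memory pointer. */
typedef struct {
    const char  *name;
    const float *data;
} Tensor;

/* Definition that would live in the generated first source file. */
const float conv2d_0_weights[4] = { 0.5f, 1.0f, -1.0f, 2.0f };

/* In the second source file, the same-named Tensor is initialized to
 * point at the static array, so the binding is fixed at compile/link
 * time and no data copy happens during inference. */
Tensor conv2d_0 = { "conv2d_0_weights", conv2d_0_weights };
```

The matching names are what let the compilation step wire each Tensor to the right array.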
  • the computer device can use the deep learning model to perform inference through the following step 405.
  • Step 405 When a deep learning model inference request is received, load the target file into the memory, and execute the target file to perform deep learning model inference.
  • When the computer device receives a deep learning model inference request, it first loads the target file 63, compiled from the first source file 65 and the second source file, into memory, and then runs the target file 63 to perform deep learning model inference. Because the memory pointer of Tensor 66 was already pointed at the static array during the compilation stage (that is, the data is already loaded), there is no need to open the model file or copy data; inference can start directly, which improves inference efficiency.
  • the value of the weight matrix in the model file is generated by running the target script as a static array and saved as the first source file.
  • the computer device compiles the first source file and the second source file according to the rule file.
  • the data of the static array is loaded into the Tensor, so that the work of data loading is completed in the compilation stage, and the model inference can be directly performed, thereby improving the efficiency of the deep learning model inference.
  • The computer device can choose different deep learning model generation methods according to the currently adopted neural network structure and the type of deep learning model.
  • For model files with a large data volume and a heavy data-copy workload, the method of the embodiments of this application can be used to generate the deep learning model, thereby improving the efficiency of model inference; for model files with a small data volume and a light data-copy workload, the loading method of the deep learning model in related technologies can be adopted, which allows the weight matrix in the model file to be changed flexibly.
  • Optionally, the following steps 300a and 300b (or 300c and 300d) may be included before step 301.
  • Step 300a Obtain the data volume of the model file.
  • Before compiling the deep learning model, the computer device obtains the data volume of the current deep learning model (that is, the data volume of the model file) and compares it with a preset threshold. If the data volume is greater than the threshold, step 300b is executed; if the data volume is less than the threshold, the deep learning model is compiled using the method provided by related technologies (the first source file does not need to be generated).
  • For example, the threshold is 100 MB; that is, when the model file is larger than 100 MB, the computer device needs to generate the first source file according to the model file.
  • Step 300b: If the data volume is greater than the threshold, execute the step of generating the first source file according to the model file of the deep learning model.
  • That is, when the data volume of the model file is greater than the threshold, the deep learning model generation method of the embodiment of the present application is used, continuing with the step of generating the first source file according to the model file and the subsequent steps; if the data volume of the model file is less than the threshold, the deep learning model loading method of related technologies can be selected.
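The size check in steps 300a and 300b might be implemented as below (a sketch only: the 100 MB figure comes from the example above, while querying the size via fseek/ftell and the function name are assumptions):

```c
#include <stdio.h>

#define EMBED_THRESHOLD_BYTES (100L * 1024 * 1024) /* 100 MB, per the example */

/* Return 1 if the model file is large enough that the first source
 * file should be generated (compile-time embedding), 0 if the
 * conventional runtime-loading path suffices, and -1 on error. */
int should_embed_weights(const char *model_path) {
    FILE *f = fopen(model_path, "rb");
    if (!f) return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f); /* file size in bytes */
    fclose(f);
    if (size < 0) return -1;
    return size > EMBED_THRESHOLD_BYTES ? 1 : 0;
}
```

A return of 1 would trigger the first-source-file generation path; 0 would fall back to the related-technology loading method.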
  • Step 300c Obtain the running version of the neural network structure adopted by the deep learning model.
  • the computer equipment can also select a suitable deep learning model generation method according to the running version of the neural network structure adopted by the deep learning model.
  • the running version of the neural network structure is used to indicate the hardware for executing the deep learning model
  • the running version includes at least one of a CPU running version, a GPU running version, and a DSP running version.
  • Step 300d: If the running version belongs to the preset running version, execute the step of generating the first source file according to the model file of the deep learning model.
  • the preset running version includes at least one of the GPU running version and the DSP running version.
  • The computer device is preconfigured with the running versions that require the deep learning model generation method of this embodiment; it determines whether the current running version belongs to the preset running versions and, if so, selects the deep learning model generation method of the embodiment of this application.
  • In related technologies, when a GPU or DSP running version is used, the computer device must not only copy the data of the model file into memory, but also copy the data from the CPU to the GPU or DSP, which seriously affects the inference efficiency of the deep learning model.
  • the preset running version set by the computer device includes at least one of the GPU running version and the DSP running version.
  • steps 300a to 300b and steps 300c to 300d can be executed alternatively or simultaneously, which is not limited in the embodiment of the present application.
  • an appropriate compilation method is selected according to the data volume of the model file or the running version of the neural network structure, which helps to improve the efficiency and flexibility of the deep learning model inference.
  • Fig. 8 is a structural block diagram of an apparatus for generating a deep learning model according to an exemplary embodiment of the present application.
  • the apparatus may be set in the computer equipment described in the above embodiment. As shown in Fig. 8, the apparatus includes:
  • the first generating module 801 is configured to generate a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
  • the first obtaining module 802 is configured to obtain a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
  • the second generating module 803 is configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • the first generating module 801 includes:
  • the running unit is used to run the target script in the rule file in the process of compiling the source code corresponding to the rule file, and the rule file is used to describe the way of compiling the source file to the compilation system;
  • the first generating unit is configured to generate the first source file through the target script according to the model file.
  • the first generating unit is further configured to:
  • the first source file is generated according to the static array corresponding to each weight matrix.
  • the first generating unit is further configured to:
  • the static array is set by the target script according to the matrix size and data type of the weight matrix, the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
  • the array value of the static array is generated through the target script.
  • the first generating unit is further configured to:
  • the target Tensor is pointed to the target static array in the first source file, and the target static array and the target Tensor have the same name.
  • the device further includes:
  • the reasoning module is used to load the target file into the memory when receiving a deep learning model reasoning request, and execute the target file to perform deep learning model reasoning.
  • the device further includes:
  • the second acquiring module is used to acquire the data volume of the model file;
  • the third generation module is configured to perform the step of generating the first source file according to the model file of the deep learning model if the amount of data is greater than the threshold;
  • the device further includes: a third acquisition module, configured to acquire the running version of the neural network structure adopted by the deep learning model;
  • the fourth generation module is configured to perform the step of generating the first source file according to the model file of the deep learning model if the running version belongs to the preset running version, where the preset running version includes at least one of a GPU running version and a DSP running version.
  • the storage directory of the first source file is the same as the storage directory of the second source file.
  • the deep learning model generation device provided in the above embodiment is illustrated only with the division of the above functional modules as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the deep learning model generation device provided in the foregoing embodiment and the embodiment of the deep learning model generation method belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • the computer device 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (RAM) 902 and a read-only memory (ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901.
  • the computer device 900 also includes a basic input/output system (I/O system) 906 that helps to transfer information between the various devices in the computer, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.
  • the basic input/output system 906 includes a display 908 for displaying information and an input device 909 such as a mouse and a keyboard for the user to input information.
  • the display 908 and the input device 909 are both connected to the central processing unit 901 through the input and output controller 910 connected to the system bus 905.
  • the basic input/output system 906 may also include an input and output controller 910 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 910 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905.
  • the mass storage device 907 and its associated computer-readable medium provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
  • the computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technology, CD-ROM, digital versatile disc (DVD) or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
  • the aforementioned system memory 904 and mass storage device 907 may be collectively referred to as a memory.
  • the memory stores one or more programs configured to be executed by one or more central processing units 901; the one or more programs contain instructions for implementing the above-mentioned deep learning model generation method, and the central processing unit 901 executes the one or more programs to implement the methods provided in the foregoing method embodiments.
  • the computer device 900 may also run by connecting to a remote computer through a network such as the Internet. That is, the computer device 900 can be connected to the network 912 through the network interface unit 911 connected to the system bus 905; in other words, the network interface unit 911 can also be used to connect to other types of networks or remote computer systems (not shown).
  • the memory further includes one or more programs stored in the memory, and the one or more programs include instructions for performing the steps executed by the computer device in the method provided in the embodiments of the present application.
  • the embodiments of the present application also provide a computer-readable storage medium that stores at least one instruction, at least one program, a code set or an instruction set; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the deep learning model generation method described in any of the foregoing embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the deep learning model generation method provided in the various optional implementations of the foregoing aspects.
  • the program can be stored in a computer-readable storage medium.
  • the medium may be a computer-readable storage medium included in the memory in the foregoing embodiment; or may be a computer-readable storage medium that exists alone and is not assembled into the terminal.
  • the computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the deep learning model generation method described in any of the foregoing method embodiments.
  • the computer-readable storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), solid state drive (SSD, Solid State Drives), or optical disk.
  • random access memory may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory).
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.
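Among the optional features listed above, the two pre-checks (the model file's data volume against a threshold, and whether the running version is a preset GPU or DSP version) decide whether the first source file is generated at all. The sketch below is only an illustration of that decision, in Python; the threshold value, version strings, and function name are placeholder assumptions, not values from the embodiments.

```python
PRESET_RUNNING_VERSIONS = {"gpu", "dsp"}  # preset running versions per the text
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024   # placeholder threshold, not specified here

def should_generate_first_source_file(model_file_size, running_version):
    """Return True if compile-time weight embedding should be used."""
    if model_file_size > SIZE_THRESHOLD_BYTES:
        return True  # large models gain the most from skipping the runtime copy
    if running_version.lower() in PRESET_RUNNING_VERSIONS:
        return True  # GPU/DSP versions also avoid a CPU-to-device copy at runtime
    return False

print(should_generate_first_source_file(50 * 1024 * 1024, "cpu"))  # large model
print(should_generate_first_source_file(1024, "gpu"))              # GPU version
```

Either condition alone triggers generation, matching the separate third and fourth generation modules described above.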

Abstract

Embodiments of the present application relate to the field of deep learning. Disclosed are a deep learning model generation method and apparatus, a device, and a storage medium. The method comprises: generating a first source file according to a model file of a deep learning model, the model file comprising a weight matrix in the deep learning model; obtaining a second source file corresponding to the deep learning model; and compiling the first source file and the second source file to generate a target file corresponding to the deep learning model. With the method provided in the embodiments of the present application, the first source file is generated in advance according to the weight matrix in the deep learning model, so that during compilation the first source file and the second source file corresponding to a neural network structure are compiled to generate the target file corresponding to the deep learning model. The data loading of the weight matrix can thus be completed in the compilation stage of the deep learning model, and the weight matrix does not need to be reloaded in the subsequent model inference process, thereby improving the inference efficiency of the deep learning model.

Description

Deep learning model generation method, device, equipment and storage medium
This application claims priority to Chinese patent application No. 201910897445.7, filed on September 23, 2019 and entitled "Deep learning model generation method, device, equipment and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of deep learning, and in particular to a deep learning model generation method, device, equipment, and storage medium.
Background
The network structure of deep learning is a type of multilayer neural network, and most of the data in such a model consists of the values of weight matrices. To complete model inference, a deep learning model uses a suitable data structure to define the neural network structure.
When a deep learning model performs inference, the model first needs to be loaded into the neural network structure adopted by the model. The common approach to model loading is to treat the model as a file: when the code of the neural network structure runs, the model file is loaded into memory, and the data is then copied from memory into the neural network structure.
Summary
The embodiments of the present application provide a deep learning model generation method, device, equipment, and storage medium, with which the data loading of the weight matrix can be completed during the compilation stage of the deep learning model, thereby improving the efficiency of deep learning model inference. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a deep learning model generation method, the method including:
generating a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
acquiring a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model;
compiling the first source file and the second source file to generate a target file corresponding to the deep learning model.
In another aspect, an embodiment of the present application provides a deep learning model generation device, the device including:
a first generating module, configured to generate a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
a first acquisition module, configured to acquire a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model;
a second generating module, configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
In another aspect, an embodiment of the present application provides a computer device, the computer device including a processor and a memory; the memory stores at least one instruction, and the at least one instruction is executed by the processor to implement the deep learning model generation method described in the foregoing aspect.
In another aspect, a computer-readable storage medium is provided; the storage medium stores at least one instruction, and the at least one instruction is executed by a processor to implement the deep learning model generation method described in the foregoing aspect.
According to one aspect of the present application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the deep learning model generation method provided in the various optional implementations of the foregoing aspects.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a schematic diagram of a neural network data structure provided by an exemplary embodiment of the present application;
Fig. 2 is a schematic diagram of the data loading implementation of the deep learning model inference process in the related art;
Fig. 3 is a flowchart of a deep learning model generation method provided by an exemplary embodiment of the present application;
Fig. 4 is a flowchart of a deep learning model generation method provided by another exemplary embodiment of the present application;
Fig. 5 is a flowchart of a deep learning model generation method provided by another exemplary embodiment of the present application;
Fig. 6 is a schematic diagram of an implementation of a deep learning model generation process provided by an exemplary embodiment of the present application;
Fig. 7 is a flowchart of a deep learning model generation method provided by another exemplary embodiment of the present application;
Fig. 8 is a structural block diagram of a deep learning model generation device provided by an exemplary embodiment of the present application;
Fig. 9 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
The term "plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
To facilitate understanding, some terms involved in the embodiments of the present application are briefly introduced below.
Deep learning model inference: the process of making predictions and inferences about unknown samples using a trained deep learning model is called deep learning model inference. More specifically, a trained deep learning model can apply the knowledge it has learned to tasks in the digital world, such as image recognition, speech recognition, and spam filtering; the deep learning model draws conclusions about the unknown samples it receives based on what it was trained on, which in deep learning terminology is called inference.
Source file: a source file is a code file written in assembly language or a high-level language; a computer cannot directly recognize the code in a source file.
Target file (object file): a target file is a binary file, produced by compiling a source file, that can be directly recognized by the central processing unit (CPU); it contains machine code, data used by the code at runtime, debugging information, and so on.
Rule file: since the code of a neural network structure consists of multiple source files, a rule file is needed to describe to the compilation system how these source files are to be compiled.
Tensor: in the field of deep learning, the core of a tensor is a data container, which can be an array of any dimension; it contains a name and a memory pointer, and the memory pointer points to the address of the data that needs to be loaded.
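As a rough illustration of the Tensor definition above, a tensor can be modeled as a named container plus a reference to its backing data. The Python dataclass below is only a sketch (the real structure would be defined in the framework's own language), with a list standing in for the memory pointer:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str          # identifies the tensor, e.g. "conv1_weight"
    shape: tuple       # a tensor can be an array of any dimension
    data: list = None  # stands in for the memory pointer to the loaded data

t = Tensor(name="conv1_weight", shape=(2, 3))
t.data = [0.0] * (2 * 3)  # "loading": pointing the tensor at its weight values
```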
Before performing inference with a deep learning model, a computer device needs to load the deep learning model into the adopted neural network structure, and most of the loaded data is the weight matrices of the deep learning model. To complete inference over the deep learning model, the computer device defines the neural network with a suitable data structure. A typical definition is shown in Fig. 1: the neural network contains multiple operators, and each operator uses Tensors to uniformly encapsulate the various data fed into the neural network.
In the related art, a computer device usually saves the deep learning model as a file. As shown in Fig. 2, the model file 21 needs to be loaded into memory before deep learning model inference, and because the memory pointer of Tensor 23 in the neural network 22 points to the memory address of the corresponding weight matrix 24, the computer device needs to copy the data of the weight matrix 24 into Tensor 23 according to that memory address while model inference runs. In addition, if the neural network structure adopted by the deep learning model runs a special version such as a graphics processing unit (GPU) version or a digital signal processor (DSP) version, the computer device also needs to copy the data of the deep learning model from the CPU to the GPU or DSP at runtime.
Since the neural network structure is extremely sensitive to operating efficiency, this kind of data copying seriously reduces operating efficiency; especially for models with a large amount of data, it seriously affects the inference efficiency of the deep learning model.
To improve the inference efficiency of the deep learning model, in the deep learning model generation method provided by the embodiments of the present application, the computer device completes the data copy when the deep learning model is compiled. The computer device first generates a first source file from the weight-matrix data in the deep learning model file, compiles it together with the source file of the neural network structure adopted by the deep learning model (that is, the second source file) to generate the target file corresponding to the deep learning model, and performs deep learning model inference on this basis.
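The first step of this flow can be sketched in Python. This is a minimal illustration under assumptions not in the original: the model file has already been parsed into a mapping from matrix names to flattened float weights, and `emit_first_source_file` is an invented name. The script renders the "first source file" as C text, one static array per weight matrix:

```python
def emit_first_source_file(model):
    """Render the 'first source file': one C static array per weight matrix."""
    lines = []
    for name, weights in model.items():
        values = ", ".join(f"{w}f" for w in weights)
        # The array size follows the matrix size; the array name matches the
        # Tensor name so the compiler can link them later.
        lines.append(f"static const float {name}[{len(weights)}] = {{{values}}};")
    return "\n".join(lines) + "\n"

# Hypothetical model file already parsed into name -> flattened weights:
model = {"conv1_weight": [0.5, -1.25], "fc_weight": [2.0, 3.0, 4.0]}
first_source = emit_first_source_file(model)
print(first_source)
```

In the build flow this text would be written into the same storage directory as the second source file (as the embodiments note) and compiled together with it, so the weights are already in the binary when inference starts.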
Compared with the deep learning model loading method provided in the related art, in the embodiments of the present application the first source file is generated from the weight matrices of the deep learning model, so that the data loading step is completed during the compilation of the deep learning model. The deep learning model no longer needs to open the model file and copy data at inference time, which greatly improves the operating efficiency of the neural network structure and in turn the inference efficiency of the deep learning model.
The deep learning model generation method provided by the embodiments of the present application can be used in computer devices with strong data processing capabilities, such as personal computers or servers. The deep learning model obtained through the method can be implemented as an application program, or as part of one, and installed in a terminal to give it deep learning capabilities; alternatively, the deep learning model can be deployed on the background server of an application program, so that the server provides deep learning model inference services for the application program in the terminal. For ease of presentation, the embodiments of the present application are described by taking the application of the deep learning model generation method to a computer device as an example.
The deep learning model generation method provided by the embodiments of the present application includes:
generating a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
acquiring a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model;
compiling the first source file and the second source file to generate a target file corresponding to the deep learning model.
Optionally, generating the first source file according to the model file of the deep learning model includes:
in the process of compiling the source code corresponding to a rule file, running a target script in the rule file, where the rule file is used to describe to the compilation system the way of compiling the source files;
generating the first source file through the target script according to the model file.
Optionally, generating the first source file through the target script according to the model file includes:
for each weight matrix in the model file, generating a static array corresponding to the weight matrix through the target script;
generating the first source file according to the static arrays corresponding to the weight matrices.
Optionally, generating the static array corresponding to each weight matrix through the target script includes:
setting the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined according to the matrix size and the array type of the static array is the same as the data type;
generating the array name of the static array through the target script according to the matrix name of the weight matrix;
generating the array values of the static array through the target script according to the weight data contained in the weight matrix.
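The three generation steps above (array size and type from the matrix, array name from the matrix name, array values from the weight data) can be sketched as follows. The `float32`/`int8` type mapping, the identifier sanitization, and the function name are illustrative assumptions, not mandated by the embodiments:

```python
C_TYPES = {"float32": "float", "int8": "signed char"}  # assumed type mapping

def static_array_for(matrix_name, shape, dtype, flat_values):
    # Step 1: the array size is determined by the matrix size, and the array
    # type is the same as the matrix's data type.
    size = 1
    for dim in shape:
        size *= dim
    c_type = C_TYPES[dtype]
    # Step 2: the array name is derived from the matrix name (sanitized into a
    # valid C identifier).
    array_name = matrix_name.replace("/", "_").replace(".", "_")
    # Step 3: the array values are generated from the weight data.
    body = ", ".join(str(v) for v in flat_values)
    return f"static const {c_type} {array_name}[{size}] = {{{body}}};"

decl = static_array_for("fc1.weight", (2, 2), "float32", [1.0, 0.0, 0.0, 1.0])
print(decl)
```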
Optionally, compiling the first source file and the second source file to generate the target file corresponding to the deep learning model includes:
during compilation, pointing the target Tensor to the target static array in the first source file according to the memory pointer corresponding to the target Tensor in the second source file, where the target static array and the target Tensor have the same name.
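One way the name-based link described above could be realized is to emit, together with the static arrays, C code that assigns each Tensor's data pointer to the identically named array; the compiler then resolves the symbol at build time and no runtime copy is needed. In the sketch below the `struct Tensor` layout and the `bind_weights` name are invented, and because the arrays are `static`, such binding code would have to be emitted into the same translation unit as the arrays:

```python
def emit_tensor_binding(tensor_names):
    """Generate C code pointing each Tensor's data pointer at the static array
    with the identical name (struct/field names here are illustrative)."""
    lines = ["static void bind_weights(struct Tensor *tensors) {"]
    for i, name in enumerate(tensor_names):
        # The symbol `name` resolves to the same-named static array in the
        # first source file, so the link is fixed at compile time.
        lines.append(f"    tensors[{i}].data = (void *){name};")
    lines.append("}")
    return "\n".join(lines)

binding = emit_tensor_binding(["conv1_weight", "fc_weight"])
print(binding)
```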
Optionally, after compiling the first source file and the second source file to generate the target file corresponding to the deep learning model, the method further includes:
when a deep learning model inference request is received, loading the target file into memory and executing the target file to perform deep learning model inference.
Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes:
obtaining the data volume of the model file;
if the data volume is greater than a threshold, performing the step of generating the first source file according to the model file of the deep learning model.
Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes:
obtaining the running version of the neural network structure adopted by the deep learning model;
if the running version belongs to a preset running version, performing the step of generating the first source file according to the model file of the deep learning model, where the preset running version includes at least one of a GPU running version and a DSP running version.
Optionally, the storage directory of the first source file is the same as the storage directory of the second source file.
Please refer to Fig. 3, which shows a flowchart of a deep learning model generation method according to an embodiment of the present application. This embodiment is described by taking the application of the method to a computer device as an example, and the method includes:
Step 301: generate a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model.
The deep learning model may be a model for image recognition (recognizing objects contained in an input image), speech recognition (recognizing the content of input speech), or video description generation (generating description information from an input video); the purpose of the deep learning model is not restricted in the embodiments of the present application.
The data loading of a deep learning model is mainly the loading of the values of its weight matrices. In a possible implementation, before compiling the neural network structure adopted by the deep learning model, the computer device first generates the first source file from the values of the weight matrices in the model file, so that this source file can be used directly to complete data loading when the neural network structure is subsequently compiled.
Step 302: acquire a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model.
In a possible implementation, before compiling the neural network structure adopted by the deep learning model, the computer device needs to obtain the code of the neural network structure, and this code is stored in the second source file.
The neural network structure adopted by the deep learning model may be a convolutional neural network (CNN), a recursive neural network (RNN), a long short-term memory (LSTM) network, or the like; this is not limited in the embodiments of the present application.
Step 303: compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
In the related art, since no first source file corresponding to the deep learning model is generated in advance, the computer device directly compiles the source files of the neural network structure to generate the target file.
In the embodiments of the present application, however, the computer device generates the first source file in advance. After the first source file and the second source file are ready, the computer device compiles them together through the compilation system according to certain rules. During compilation, the values of the weight matrices in the model file are loaded from the first source file into the second source file, so the data loading of the model file is completed before compilation ends. After compilation, the target file corresponding to the deep learning model is generated; its content is the machine code obtained by compiling the code in the first and second source files, which can be directly recognized by the computer device, and subsequent model inference is performed on this basis.
综上所述,本申请实施例中,通过预先根据深度学习模型中的权值矩阵生成第一源文件,从而在编译过程中,同时对第一源文件以及神经网络结构对应的第二源文件进行编译,生成深度学习模型对应的目标文件;相较于相关技术中需要在推理阶段将模型文件中的权值矩阵加载至神经网络结构,本申请实施例中,在深度学习模型的编译阶段即可完成权值矩阵的数据加载,后续模型推理过程中不需要重新加载权值矩阵,进而提高了深度学习模型推理的效率。In summary, in the embodiment of the present application, the first source file is generated according to the weight matrix in the deep learning model in advance, so that during the compilation process, the first source file and the second source file corresponding to the neural network structure Compile to generate the target file corresponding to the deep learning model; compared to the need to load the weight matrix in the model file into the neural network structure in the inference phase in related technologies, in the embodiment of the application, the deep learning model is compiled in the phase The data loading of the weight matrix can be completed, and the weight matrix does not need to be reloaded in the subsequent model inference process, thereby improving the efficiency of the deep learning model inference.
Please refer to FIG. 4, which shows a flowchart of a deep learning model generation method according to another embodiment of the present application. This embodiment is described by taking the method being applied to a computer device as an example. The method includes:
Step 401: In the process of compiling the source code corresponding to a rule file, run a target script in the rule file, where the rule file is used to describe to the compilation system how source files are to be compiled.

Since the code of the neural network structure adopted by the deep learning model is composed of multiple source files, a rule file is needed to describe to the compilation system how these source files are to be compiled. In one possible implementation, code for running a target script is added to the source code of the rule file; the target script is used to generate the first source file from the values of the weight matrices in the deep learning model, and may be a shell script.

Illustratively, the developer adds code that runs the target script prepare.sh to the source code of the rule file, and the computer device runs the target script while compiling the source code of the rule file. On the Android system, the rule file may be Android.mk.
Step 402: Generate the first source file through the target script according to the model file.

In one possible implementation, the computer device reads the data in the model file while the target script is running, and generates the first source file from the data it reads.

Optionally, on the basis of FIG. 4, as shown in FIG. 5, step 402 includes the following steps 402A and 402B.
Step 402A: For each weight matrix in the model file, generate a static array corresponding to that weight matrix through the target script.

The purpose of running the target script is to save the values of the weight matrices in the model file as static arrays. The size of a static array is fixed when it is declared, that is, the number of array elements does not change, so the static arrays correspond one-to-one with the weight matrices, which facilitates data loading when the neural network structure is subsequently compiled.

Illustratively, the developer adds code that runs the target script prepare.sh to the source code of the rule file. The computer device runs prepare.sh when compiling the rule file, generating a static array for the values of each weight matrix in the model file; when compilation is complete, the values of all weight matrices are saved in the first source file in the form of static arrays.
In one possible implementation, generating a static array from a weight matrix may include the following steps:

First, set the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined by the matrix size and the array type of the static array is the same as the data type.

Since the static arrays are loaded directly when the second source file is compiled, the size and data type of each static array must be consistent with its corresponding weight matrix. Optionally, the size of a static array in the target script is determined by the matrix size of its corresponding weight matrix, and the data type of the static array is the same as that of the weight matrix.

Illustratively, for a floating-point weight matrix with a matrix size of 32*3*3*3, when the computer device sets up the corresponding static array, it sets the size of the static array to 32*3*3*3 and its data type to floating point.
Second, generate the array name of the static array through the target script according to the matrix name of the weight matrix.

To facilitate loading the static arrays into the correct Tensors during the subsequent compilation process, the computer device needs to set a unique name for each static array according to the matrix name of its weight matrix.

In one possible implementation, a preset naming rule is set in the target script, and the target script generates the corresponding array name based on the matrix name of the weight matrix according to that preset naming rule.

Illustratively, for a floating-point weight matrix in the deep learning model named MobilenetV1/Conv2d_0/weights with a matrix size of 32*3*3*3, the array name of the corresponding generated static array is MobilenetV1_Conv2d_0_weights[32*3*3*3].
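As a concrete illustration of such a preset naming rule, the sketch below (an assumption for illustration only; the patent describes the target script as a shell script and does not fix the rule itself) replaces the `/` separators in a matrix name with underscores to produce a legal C/C++ identifier:

```cpp
#include <cassert>
#include <string>

// Hypothetical naming rule: turn a weight-matrix name such as
// "MobilenetV1/Conv2d_0/weights" into a legal C++ identifier by
// replacing every '/' with '_'. The exact rule is an assumption;
// the text only requires that array names be unique and derived
// from the matrix names.
std::string arrayNameFor(std::string matrixName) {
    for (char& c : matrixName) {
        if (c == '/') c = '_';
    }
    return matrixName;
}
```

Applied to the example above, this yields MobilenetV1_Conv2d_0_weights, matching the array name shown in the text.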
Third, generate the array values of the static array through the target script according to the weight data contained in the weight matrix.

After the name and data type of a static array have been set, the computer device further needs to load the weight data contained in the weight matrix into the corresponding static array. In the embodiments of the present application, the computer device loads all the weight data contained in the weight matrix into the corresponding static array by running the target script.

Illustratively, for a static array named MobilenetV1_Conv2d_0_weights[32*3*3*3], the 32*3*3*3 floating-point weight matrix MobilenetV1/Conv2d_0/weights={0.31435529,xxx,...,xxx} is found by name; after the weight data is added, the final generated static array is float MobilenetV1_Conv2d_0_weights[32*3*3*3]={0.31435529,xxx,...,xxx}.
Step 402B: Generate the first source file according to the static arrays corresponding to the weight matrices.

Optionally, after the computer device has converted all the weight matrices in the model file into static arrays, the target script saves all the static arrays in source-file format, thereby generating the first source file.

As shown in FIG. 7, after the computer device generates the static arrays from the weight data of the weight matrices 74 in the model file 71, it saves the static arrays as the first source file 75, which is stored in the same directory as the second source file.

Illustratively, if the deep learning project uses C++, the generated first source file is saved as Model.cpp.
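As a hedged sketch, a fragment of such a generated Model.cpp might look like the following. The array name and dimensions follow the example in the text; the first value is the one given there, and the remaining weights are placeholders, not real model data:

```cpp
// Hypothetical fragment of a generated Model.cpp. The dimensions and the
// first value follow the example in the text; the remaining 863 weights
// are omitted here (elements not listed are zero-initialized).
static const float MobilenetV1_Conv2d_0_weights[32 * 3 * 3 * 3] = {
    0.31435529f, /* ... remaining 863 weights ... */
};
```

Because the element count is fixed in the declaration, the array corresponds one-to-one with the 32*3*3*3 weight matrix, and its size is checked at compile time rather than at inference time.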
Step 403: Obtain a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model.

For the implementation of this step, reference may be made to step 302 above; details are not repeated in this embodiment.

Step 404: Compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
The computer device compiles the first source file and the second source file through the compilation system to generate the target file corresponding to the deep learning model. To ensure that the static arrays in the first source file are correctly loaded into the Tensors in the neural network structure, in one possible implementation, during compilation the compilation system in the computer device points each target Tensor in the second source file to a target static array in the first source file according to the memory pointer corresponding to that target Tensor, where the target static array has the same name as the target Tensor.

Optionally, the neural network structure loads the data of the deep learning model through Tensors at compile time. To enable the computer device to accurately find the data to be loaded into each Tensor, the name of the Tensor is set to be consistent with the name of the corresponding static array. As shown in FIG. 6, the Tensor 66 in the neural network 62 points to the corresponding static array in the first source file 65.

Illustratively, for a Tensor named MobilenetV1_Conv2d_0_weights[32*3*3*3], while the computer device compiles the first and second source files, the Tensor's memory pointer is pointed at the static array in the first source file that has the same name MobilenetV1_Conv2d_0_weights[32*3*3*3], and the data in that static array is thereby loaded into this Tensor.
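A minimal sketch of this name-matched pointer binding, under the assumption of a simple Tensor structure (the patent does not specify the actual framework types, so the struct, function name, and placeholder weight values below are illustrative):

```cpp
#include <string>

// Hypothetical Tensor type: a name plus a pointer to its weight data.
// Real frameworks use richer tensor types; this only illustrates the
// name-matched pointer binding described in the text.
struct Tensor {
    std::string name;
    const float* data = nullptr;
};

// Weights already compiled into the binary as a static array
// (placeholder values, small size for illustration).
static const float MobilenetV1_Conv2d_0_weights[4] = {0.31435529f, 0.1f, 0.2f, 0.3f};

// The Tensor whose name matches the static array is simply pointed at
// the array -- no file open or data copy is needed at inference time.
void bindWeights(Tensor& t) {
    if (t.name == "MobilenetV1_Conv2d_0_weights") {
        t.data = MobilenetV1_Conv2d_0_weights;
    }
}
```

After binding, reading the Tensor's data dereferences the static array directly, which is why the later inference step can skip model-file loading entirely.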
After the deep learning model has been compiled through steps 401 to 404 above, the computer device can perform inference with the deep learning model through the following step 405.

Step 405: When a deep learning model inference request is received, load the target file into memory and execute the target file to perform deep learning model inference.

Illustratively, as shown in FIG. 6, when the computer device receives an inference request for the deep learning model, it first loads the target file 63, compiled from the first source file 65 and the second source file, into memory, and then runs the target file 63 to perform deep learning model inference. Since the memory pointer of the Tensor 66 was already pointed at the static array during the compilation phase (that is, data loading is complete), there is no need to open and copy the model file, and inference can start directly, thereby improving inference efficiency.
In the embodiments of the present application, the values of the weight matrices in the model file are turned into static arrays by running the target script and saved as the first source file. When the computer device compiles the first and second source files according to the rule file, the data of the static arrays is loaded into the Tensors, so that data loading is completed during the compilation phase, model inference can be performed directly, and the efficiency of deep learning model inference is improved.

Since neural network structures are complex and diverse, the computer device can select different deep learning model generation methods according to the currently adopted neural network structure and the type of the deep learning model. When the data volume of the model file is large or the running version requires additional data copying, the method of the embodiments of the present application can be used to generate the deep learning model, thereby improving the efficiency of model inference; when the data volume of the model file is small and the data copying workload is small, the deep learning model loading method in the related art can be used, so that the weight matrices in the model file can be changed flexibly.
Optionally, on the basis of FIG. 3, as shown in FIG. 8, the following steps may further be included before step 301.

Step 300a: Obtain the data volume of the model file.

In one possible implementation, before compiling the deep learning model, the computer device obtains the data volume of the current deep learning model (that is, the data volume of the model file) and compares it with a preset threshold. If the data volume is greater than the threshold, step 300b is executed; if the data volume is less than the threshold, the deep learning model is compiled by the method provided in the related art (without generating the first source file).

Illustratively, the threshold is 100 MB; that is, when the model file is larger than 100 MB, the computer device needs to generate the first source file from the model file.
Step 300b: If the data volume is greater than the threshold, execute the step of generating the first source file from the model file of the deep learning model.

If the data volume of the model file is greater than the threshold, the deep learning model generation method of the embodiments of the present application is used, and the step of generating the first source file from the model file of the deep learning model and the subsequent steps continue to be executed. If the data volume of the model file is less than the threshold, the deep learning model loading method in the related art can be selected.
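The selection logic of steps 300a and 300b can be sketched as follows. The 100 MB threshold comes from the illustrative example in the text; the function and enum names are assumptions for illustration:

```cpp
#include <cstdint>

// Hypothetical sketch of the model-size check in steps 300a/300b.
enum class CompileMode {
    kPrecompiledWeights,  // generate the first source file and compile the weights in
    kRuntimeLoad          // related-art path: load the model file at inference time
};

// Illustrative 100 MB threshold from the example in the text.
constexpr std::uint64_t kThresholdBytes = 100ull * 1024 * 1024;

CompileMode selectCompileMode(std::uint64_t modelFileBytes) {
    return modelFileBytes > kThresholdBytes ? CompileMode::kPrecompiledWeights
                                            : CompileMode::kRuntimeLoad;
}
```

A 200 MB model file would take the precompiled-weights path, while a 50 MB model file would keep the related-art runtime-loading path so its weight matrices remain easy to swap.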
Step 300c: Obtain the running version of the neural network structure adopted by the deep learning model.

In addition to judging by the data volume of the model file, the computer device may also select an appropriate deep learning model generation method according to the running version of the neural network structure adopted by the deep learning model.

The running version of the neural network structure indicates the hardware that executes the deep learning model, and includes at least one of a CPU running version, a GPU running version, and a DSP running version.
Step 300d: If the running version belongs to a preset running version, execute the step of generating the first source file from the model file of the deep learning model, where the preset running version includes at least one of a GPU running version and a DSP running version.

In one possible implementation, the running versions that require the deep learning model generation method of the embodiments of the present application are preset in the computer device, and the computer device determines whether the current running version belongs to the preset running versions; if so, the deep learning model generation method of the embodiments of the present application is selected.

When a deep learning model of the GPU running version or the DSP running version runs, the computer device not only needs to copy the data of the model file into memory but also needs to further copy the data from the CPU to the GPU or DSP, which seriously affects the efficiency of deep learning model inference. Therefore, the preset running versions set in the computer device include at least one of the GPU running version and the DSP running version.
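Steps 300c and 300d thus reduce to a membership test against the preset running versions; a sketch under the same illustrative naming assumptions as above:

```cpp
// Hypothetical sketch of the running-version check in steps 300c/300d.
enum class RunVersion { kCpu, kGpu, kDsp };

// GPU and DSP versions need an extra copy from the CPU to the accelerator
// at load time, so they are the preset versions that take the
// precompiled-weights path; the CPU version may keep the related-art path.
bool usePrecompiledWeights(RunVersion v) {
    return v == RunVersion::kGpu || v == RunVersion::kDsp;
}
```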
It should be noted that steps 300a to 300b and steps 300c to 300d above may be executed alternatively or simultaneously, which is not limited in the embodiments of the present application.

In the embodiments of the present application, before the deep learning model is compiled, an appropriate compilation method is selected according to the data volume of the model file or the running version of the neural network structure, which helps to improve the efficiency and flexibility of deep learning model inference.
FIG. 8 is a structural block diagram of a deep learning model generation apparatus according to an exemplary embodiment of the present application. The apparatus may be provided in the computer device described in the above embodiments. As shown in FIG. 8, the apparatus includes:

a first generation module 801, configured to generate a first source file according to a model file of a deep learning model, where the model file contains the weight matrices in the deep learning model;

a first acquisition module 802, configured to obtain a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model; and

a second generation module 803, configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
Optionally, the first generation module 801 includes:

a running unit, configured to run a target script in a rule file in the process of compiling the source code corresponding to the rule file, where the rule file is used to describe to the compilation system how source files are to be compiled; and

a first generation unit, configured to generate the first source file through the target script according to the model file.
Optionally, the first generation unit is further configured to:

for each weight matrix in the model file, generate a static array corresponding to that weight matrix through the target script; and

generate the first source file according to the static arrays corresponding to the weight matrices.
Optionally, the first generation unit is further configured to:

set the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined by the matrix size and the array type of the static array is the same as the data type;

generate the array name of the static array through the target script according to the matrix name of the weight matrix; and

generate the array values of the static array through the target script according to the weight data contained in the weight matrix.
Optionally, the first generation unit is further configured to:

during compilation, point the target Tensor to a target static array in the first source file according to the memory pointer corresponding to the target Tensor in the second source file, where the target static array has the same name as the target Tensor.
Optionally, the apparatus further includes:

an inference module, configured to load the target file into memory when a deep learning model inference request is received, and execute the target file to perform deep learning model inference.
Optionally, the apparatus further includes:

a second acquisition module, configured to obtain the data volume of the model file; and

a third generation module, configured to execute the step of generating the first source file according to the model file of the deep learning model if the data volume is greater than a threshold.
Optionally, the apparatus further includes: a third acquisition module, configured to obtain the running version of the neural network structure adopted by the deep learning model; and

a fourth generation module, configured to execute the step of generating the first source file according to the model file of the deep learning model if the running version belongs to a preset running version, where the preset running version includes at least one of a GPU running version and a DSP running version.

Optionally, the storage directory of the first source file is the same as the storage directory of the second source file.
It should be noted that the deep learning model generation apparatus provided in the above embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the deep learning model generation apparatus provided in the above embodiment and the deep learning model generation method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Please refer to FIG. 9, which shows a schematic structural diagram of a computer device according to an exemplary embodiment of the present application. Specifically, the computer device 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (Random Access Memory, RAM) 902 and a read-only memory (Read-Only Memory, ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901. The computer device 900 further includes a basic input/output system (I/O system) 906 that helps transfer information between the various components in the computer, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.

The basic input/output system 906 includes a display 908 for displaying information and an input device 909, such as a mouse or keyboard, for the user to input information. Both the display 908 and the input device 909 are connected to the central processing unit 901 through an input/output controller 910 connected to the system bus 905. The basic input/output system 906 may further include the input/output controller 910 for receiving and processing input from a plurality of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 910 also provides output to a display screen, a printer, or other types of output devices.

The mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 907 and its associated computer-readable medium provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), flash memory or other solid-state storage technologies, CD-ROM, digital versatile disc (Digital Versatile Disc, DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above. The system memory 904 and the mass storage device 907 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by one or more central processing units 901; the one or more programs contain instructions for implementing the above deep learning model generation method, and the central processing unit 901 executes the one or more programs to implement the methods provided by the foregoing method embodiments.

According to various embodiments of the present application, the computer device 900 may also run by being connected through a network, such as the Internet, to a remote computer on the network. That is, the computer device 900 may be connected to a network 912 through a network interface unit 911 connected to the system bus 905, or the network interface unit 911 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs stored in the memory, and the one or more programs contain instructions for performing the steps executed by the computer device in the methods provided by the embodiments of the present application.
The embodiments of the present application further provide a computer-readable storage medium that stores at least one instruction, at least one program, a code set, or an instruction set, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the deep learning model generation method described in any of the above embodiments.

According to one aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the deep learning model generation method provided in the various optional implementations of the above aspects.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成，该程序可以存储于一计算机可读存储介质中，该计算机可读存储介质可以是上述实施例中的存储器中所包含的计算机可读存储介质；也可以是单独存在，未装配入终端中的计算机可读存储介质。该计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集，所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器加载并执行以实现上述任一方法实施例所述的深度学习模型生成方法。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, which may be the computer-readable storage medium included in the memory in the foregoing embodiments, or a computer-readable storage medium that exists separately and is not assembled into the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the deep learning model generation method described in any of the foregoing method embodiments.
可选的,该计算机可读存储介质可以包括:只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)、固态硬盘(SSD,Solid State Drives)或光盘等。其中,随机存取记忆体可以包括电阻式随机存取记忆体(ReRAM,Resistance Random Access Memory)和动态随机存取存储器(DRAM,Dynamic Random Access Memory)。上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。Optionally, the computer-readable storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), solid state drive (SSD, Solid State Drives), or optical disk. Among them, random access memory may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory). The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments.
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。A person of ordinary skill in the art can understand that all or part of the steps in the above embodiments can be implemented by hardware, or by a program to instruct relevant hardware. The program can be stored in a computer-readable storage medium. The storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.
以上所述仅为本申请的较佳实施例，并不用以限制本申请，凡在本申请的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本申请的保护范围之内。The above descriptions are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims (20)

  1. 一种深度学习模型生成方法,其中,所述方法包括:A method for generating a deep learning model, wherein the method includes:
    根据深度学习模型的模型文件生成第一源文件,所述模型文件包含所述深度学习模型中的权值矩阵;Generating a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
    获取所述深度学习模型对应的第二源文件,所述第二源文件为所述深度学习模型所采用神经网络结构的源文件;Acquiring a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
    对所述第一源文件和所述第二源文件进行编译,生成所述深度学习模型对应的目标文件。Compiling the first source file and the second source file to generate a target file corresponding to the deep learning model.
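As a hedged illustration of the three claimed steps (all file names, function names, and weight values below are hypothetical, not taken from the patent), a build-time script might serialize each weight matrix in the model file into a C static array, emit the result as the first source file, and let the compiler combine it with the network-structure source:

```python
# Hypothetical sketch of claim 1: model file -> first source file -> compile
# together with the network-structure source into one target file.

def generate_first_source(weights):
    """Emit C source text whose static arrays hold the model weights.

    `weights` maps a matrix name to a flat list of float values.
    """
    lines = []
    for name, values in weights.items():
        body = ", ".join(f"{v}f" for v in values)
        lines.append(f"static const float {name}[{len(values)}] = {{{body}}};")
    return "\n".join(lines) + "\n"

# Example model file content (hypothetical weights).
model = {"conv1_weight": [0.5, -1.25], "fc_bias": [0.0]}
first_source = generate_first_source(model)
print(first_source)
# A build system would then compile this output alongside the
# network-structure source, e.g.:  cc -c weights.c network.c
```

Because the weights become ordinary compiled data, the resulting target file needs no separate model file at run time.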
  2. 根据权利要求1所述的方法,其中,所述根据深度学习模型的模型文件生成第一源文件,包括:The method according to claim 1, wherein said generating the first source file according to the model file of the deep learning model comprises:
    编译规则文件对应源代码的过程中，运行所述规则文件中的目标脚本，所述规则文件用于向编译系统描述编译源文件的方式；During compilation of the source code corresponding to the rule file, running the target script in the rule file, where the rule file is used to describe to the compilation system how source files are compiled;
    根据所述模型文件,通过所述目标脚本生成所述第一源文件。According to the model file, the first source file is generated through the target script.
  3. 根据权利要求2所述的方法,其中,所述根据所述模型文件,通过所述目标脚本生成所述第一源文件,包括:The method according to claim 2, wherein said generating said first source file through said target script according to said model file comprises:
    对于所述模型文件中的各个权值矩阵,通过所述目标脚本生成各个所述权值矩阵对应的静态数组;For each weight matrix in the model file, generate a static array corresponding to each weight matrix through the target script;
    根据各个所述权值矩阵对应的所述静态数组生成所述第一源文件。The first source file is generated according to the static array corresponding to each weight matrix.
  4. 根据权利要求3所述的方法,其中,所述通过所述目标脚本生成各个所述权值矩阵对应的静态数组,包括:The method according to claim 3, wherein said generating static arrays corresponding to each of said weight matrixes through said target script comprises:
    根据所述权值矩阵的矩阵尺寸和数据类型，通过所述目标脚本设置所述静态数组，所述静态数组的数组大小根据所述矩阵尺寸确定，且所述静态数组的数组类型与数据类型相同；Setting the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
    根据所述权值矩阵的矩阵名称,通过所述目标脚本生成所述静态数组的数组名称;Generating the array name of the static array through the target script according to the matrix name of the weight matrix;
    根据所述权值矩阵中包含的权值数据,通过所述目标脚本生成所述静态数组的数组值。According to the weight data contained in the weight matrix, the array value of the static array is generated through the target script.
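Claim 4 derives the static array's size from the matrix size, its element type from the matrix's data type, its name from the matrix name, and its values from the weight data. A minimal sketch of such a generator (the dtype-to-C-type mapping and the matrix metadata are assumed examples):

```python
# Hypothetical sketch of claim 4: derive the static array's size, element
# type, name, and values from one weight matrix's metadata.

C_TYPES = {"float32": "float", "int8": "signed char"}  # assumed mapping

def matrix_to_static_array(name, shape, dtype, data):
    size = 1
    for dim in shape:
        size *= dim                      # array size from the matrix size
    c_type = C_TYPES[dtype]              # array type from the data type
    values = ", ".join(str(v) for v in data)
    # array name from the matrix name; array values from the weight data
    return f"static const {c_type} {name}[{size}] = {{{values}}};"

decl = matrix_to_static_array("fc1_weight", (2, 2), "float32",
                              [1.0, 0.0, 0.0, 1.0])
print(decl)
```

One declaration like this per weight matrix, concatenated, would form the first source file of claim 3.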
  5. 根据权利要求3所述的方法，其中，所述对所述第一源文件和所述第二源文件进行编译，生成所述深度学习模型对应的目标文件，包括：The method according to claim 3, wherein said compiling said first source file and said second source file to generate a target file corresponding to said deep learning model comprises:
    编译过程中，根据所述第二源文件中目标张量Tensor对应的内存指针，将所述目标Tensor指向所述第一源文件中的目标静态数组，所述目标静态数组与所述目标Tensor具有相同的名称。During the compilation process, according to the memory pointer corresponding to the target tensor (Tensor) in the second source file, the target Tensor is pointed to the target static array in the first source file, where the target static array and the target Tensor have the same name.
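Claim 5's same-name rule means each tensor in the network-structure source can be bound to its weights without any extra lookup table. A hedged simulation of that binding step (the `Tensor` class and the names are illustrative stand-ins for the C-level structures):

```python
# Hypothetical sketch of claim 5: each Tensor's data pointer is redirected
# to the static array that carries the identical name.

class Tensor:
    def __init__(self, name):
        self.name = name
        self.data = None        # stands in for the C memory pointer

static_arrays = {               # arrays generated from the model file
    "conv1_weight": [0.5, -1.25],
}

def bind_tensors(tensors, arrays):
    for t in tensors:
        # same-name rule: the target static array and the target Tensor
        # share one name, so the lookup needs no separate mapping table
        t.data = arrays[t.name]

net = [Tensor("conv1_weight")]
bind_tensors(net, static_arrays)
print(net[0].data)
```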
  6. 根据权利要求1至5任一所述的方法，其中，所述对所述第一源文件和所述第二源文件进行编译，生成所述深度学习模型对应的目标文件之后，所述方法还包括：The method according to any one of claims 1 to 5, wherein after the compiling the first source file and the second source file to generate the target file corresponding to the deep learning model, the method further includes:
    当接收到深度学习模型推理请求时，将所述目标文件加载至内存，并执行所述目标文件进行深度学习模型推理。When a deep learning model inference request is received, loading the target file into the memory, and executing the target file to perform deep learning model inference.
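Claim 6 loads the compiled target file into memory on an inference request and executes it. As a rough analogy only (the patent does not specify the target file's format or entry point; the math library below merely stands in for some compiled artifact), dynamically loading a binary and calling into it might look like:

```python
# Hypothetical analogy for claim 6: load a compiled binary into memory and
# call a function in it, as the claimed step does with the model target file.
import ctypes
import ctypes.util

lib_path = ctypes.util.find_library("m")  # any compiled library stands in
libm = ctypes.CDLL(lib_path)              # "load the target file into memory"

libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
result = libm.cos(0.0)                    # "execute the target file"
print(result)
```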
  7. 根据权利要求1至5任一所述的方法,其中,所述根据深度学习模型的模型文件生成第一源文件之前,所述方法还包括:The method according to any one of claims 1 to 5, wherein before the generating the first source file according to the model file of the deep learning model, the method further comprises:
    获取所述模型文件的数据量;Acquiring the data volume of the model file;
    若所述数据量大于阈值,则执行所述根据深度学习模型的模型文件生成第一源文件的步骤。If the amount of data is greater than the threshold, the step of generating the first source file according to the model file of the deep learning model is executed.
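Claim 7 embeds the weights as source only when the model file's data volume exceeds a threshold, so small models need not pay the extra compilation cost. A minimal sketch (the 1 MiB threshold is an assumed example, not a value from the patent):

```python
# Hypothetical sketch of claim 7: generate the first source file only when
# the model file's data volume exceeds a threshold.
import os
import tempfile

THRESHOLD_BYTES = 1 << 20          # assumed 1 MiB threshold

def should_embed(model_path, threshold=THRESHOLD_BYTES):
    return os.path.getsize(model_path) > threshold

# Demo with a small temporary "model file" well below the threshold.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 128)
    path = f.name
small = should_embed(path)
os.unlink(path)
print(small)
```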
  8. 根据权利要求1至5任一所述的方法,其中,所述根据深度学习模型的模型文件生成第一源文件之前,所述方法还包括:The method according to any one of claims 1 to 5, wherein before the generating the first source file according to the model file of the deep learning model, the method further comprises:
    获取所述深度学习模型所采用神经网络结构的运行版本;Obtaining a running version of the neural network structure adopted by the deep learning model;
    若所述运行版本属于预设运行版本，则执行所述根据深度学习模型的模型文件生成第一源文件的步骤，所述预设运行版本包括图形处理器GPU运行版本和数字信号处理器DSP运行版本中的至少一种。If the running version belongs to a preset running version, the step of generating the first source file according to the model file of the deep learning model is executed, where the preset running version includes at least one of a graphics processing unit (GPU) running version and a digital signal processor (DSP) running version.
  9. 根据权利要求1至5任一所述的方法,其中,所述第一源文件的存储目录与所述第二源文件的存储目录相同。The method according to any one of claims 1 to 5, wherein the storage directory of the first source file is the same as the storage directory of the second source file.
  10. 一种深度学习模型生成装置,所述装置包括:A device for generating a deep learning model, the device comprising:
    第一生成模块,用于根据深度学习模型的模型文件生成第一源文件,所述模型文件包含所述深度学习模型中的权值矩阵;The first generating module is configured to generate a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
    第一获取模块,用于获取所述深度学习模型对应的第二源文件,所述第二源文件为所述深度学习模型所采用神经网络结构的源文件;A first acquisition module, configured to acquire a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
    第二生成模块,用于对所述第一源文件和所述第二源文件进行编译,生成所述深度学习模型对应的目标文件。The second generating module is configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  11. 根据权利要求10所述的装置,其中,所述第一生成模块包括:The apparatus according to claim 10, wherein the first generating module comprises:
    运行单元,用于编译规则文件对应源代码的过程中,运行所述规则文件中的目标脚本,所述规则文件用于向编译系统描述编译源文件的方式;The running unit is used to run the target script in the rule file in the process of compiling the source code corresponding to the rule file, and the rule file is used to describe the way of compiling the source file to the compilation system;
    第一生成单元,用于根据所述模型文件,通过所述目标脚本生成所述第一源文件。The first generating unit is configured to generate the first source file through the target script according to the model file.
  12. 根据权利要求11所述的装置,其中,所述第一生成单元还用于:The device according to claim 11, wherein the first generating unit is further configured to:
    对于所述模型文件中的各个权值矩阵,通过所述目标脚本生成各个所述权值矩阵对应的静态数组;For each weight matrix in the model file, generate a static array corresponding to each weight matrix through the target script;
    根据各个所述权值矩阵对应的所述静态数组生成所述第一源文件。The first source file is generated according to the static array corresponding to each weight matrix.
  13. 根据权利要求12所述的装置,其中,所述第一生成单元还用于:The device according to claim 12, wherein the first generating unit is further configured to:
    根据所述权值矩阵的矩阵尺寸和数据类型，通过所述目标脚本设置所述静态数组，所述静态数组的数组大小根据所述矩阵尺寸确定，且所述静态数组的数组类型与数据类型相同；Setting the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
    根据所述权值矩阵的矩阵名称,通过所述目标脚本生成所述静态数组的数组名称;Generating the array name of the static array through the target script according to the matrix name of the weight matrix;
    根据所述权值矩阵中包含的权值数据,通过所述目标脚本生成所述静态数组的数组值。According to the weight data contained in the weight matrix, the array value of the static array is generated through the target script.
  14. 根据权利要求12所述的装置,其中,所述第一生成单元还用于:The device according to claim 12, wherein the first generating unit is further configured to:
    编译过程中，根据所述第二源文件中目标Tensor对应的内存指针，将所述目标Tensor指向所述第一源文件中的目标静态数组，所述目标静态数组与所述目标Tensor具有相同的名称。During the compilation process, according to the memory pointer corresponding to the target Tensor in the second source file, the target Tensor is pointed to the target static array in the first source file, and the target static array and the target Tensor have the same name.
  15. 根据权利要求10至14任一所述的装置,其中,所述装置还包括:The device according to any one of claims 10 to 14, wherein the device further comprises:
    推理模块，用于当接收到深度学习模型推理请求时，将所述目标文件加载至内存，并执行所述目标文件进行深度学习模型推理。The inference module is configured to, when a deep learning model inference request is received, load the target file into the memory and execute the target file to perform deep learning model inference.
  16. 根据权利要求10至14任一所述的装置,其中,所述装置还包括:The device according to any one of claims 10 to 14, wherein the device further comprises:
    第二获取模块,用于获取所述模型文件的数据量;The second acquiring module is used to acquire the data volume of the model file;
    第三生成模块,用于若所述数据量大于阈值,则执行所述根据深度学习模型的模型文件生成第一源文件的步骤。The third generation module is configured to perform the step of generating the first source file according to the model file of the deep learning model if the amount of data is greater than the threshold.
  17. 根据权利要求10至14任一所述的装置,其中,所述装置还包括:The device according to any one of claims 10 to 14, wherein the device further comprises:
    第三获取模块,用于获取所述深度学习模型所采用神经网络结构的运行版本;The third acquisition module is used to acquire the running version of the neural network structure adopted by the deep learning model;
    第四生成模块，用于若所述运行版本属于预设运行版本，则执行所述根据深度学习模型的模型文件生成第一源文件的步骤，所述预设运行版本包括GPU运行版本和DSP运行版本中的至少一种。The fourth generation module is configured to execute the step of generating the first source file according to the model file of the deep learning model if the running version belongs to a preset running version, where the preset running version includes at least one of a GPU running version and a DSP running version.
  18. 根据权利要求10至14任一所述的装置,其中,所述第一源文件的存储目录与所述第二源文件的存储目录相同。The apparatus according to any one of claims 10 to 14, wherein the storage directory of the first source file is the same as the storage directory of the second source file.
  19. 一种计算机设备，所述计算机设备包括处理器和存储器；所述存储器存储有至少一条指令，所述至少一条指令用于被所述处理器执行以实现如权利要求1至9任一所述的深度学习模型生成方法。A computer device, the computer device including a processor and a memory; the memory stores at least one instruction, and the at least one instruction is configured to be executed by the processor to implement the deep learning model generation method according to any one of claims 1 to 9.
  20. 一种计算机可读存储介质,所述存储介质存储有至少一条指令,所述至少一条指令用于被处理器执行以实现如权利要求1至9任一所述的深度学习模型生成方法。A computer-readable storage medium storing at least one instruction, and the at least one instruction is used to be executed by a processor to implement the deep learning model generation method according to any one of claims 1 to 9.
PCT/CN2020/117196 2019-09-23 2020-09-23 Deep learning model generation method and apparatus, device, and storage medium WO2021057807A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910897445.7 2019-09-23
CN201910897445.7A CN110598855B (en) 2019-09-23 2019-09-23 Deep learning model generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021057807A1 true WO2021057807A1 (en) 2021-04-01

Family

ID=68862253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117196 WO2021057807A1 (en) 2019-09-23 2020-09-23 Deep learning model generation method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110598855B (en)
WO (1) WO2021057807A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080240A (en) * 2022-06-29 2022-09-20 美的集团(上海)有限公司 Deployment method of voice processing model, electronic equipment and storage medium
CN116257286A (en) * 2023-03-13 2023-06-13 北京百度网讯科技有限公司 File processing method and device, electronic equipment and storage medium
WO2024061287A1 (en) * 2022-09-23 2024-03-28 维沃移动通信有限公司 Artificial intelligence (ai) model transmission method and apparatus, and terminal and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598855B (en) * 2019-09-23 2023-06-09 Oppo广东移动通信有限公司 Deep learning model generation method, device, equipment and storage medium
CN113269323B (en) * 2020-02-17 2024-03-12 北京达佳互联信息技术有限公司 Data processing method, processing device, electronic equipment and storage medium
CN111338693B (en) * 2020-02-22 2023-07-14 深圳市魔数智擎人工智能有限公司 Model construction-based target file generation method, server and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633295A (en) * 2017-09-25 2018-01-26 北京地平线信息技术有限公司 Method and apparatus for adapting the parameters of a neural network
CN107958285A (en) * 2017-11-21 2018-04-24 深圳普思英察科技有限公司 Mapping method and device for the neural network of an embedded system
US20190057036A1 (en) * 2018-10-15 2019-02-21 Amrita MATHURIYA Programmable interface to in-memory cache processor
CN109754073A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method, device, electronic equipment and readable storage medium
CN110598855A (en) * 2019-09-23 2019-12-20 Oppo广东移动通信有限公司 Deep learning model generation method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157045B2 (en) * 2016-11-17 2018-12-18 The Mathworks, Inc. Systems and methods for automatically generating code for deep learning systems
US10956500B2 (en) * 2017-01-19 2021-03-23 Google Llc Dynamic-length stateful tensor array
CN106951926B (en) * 2017-03-29 2020-11-24 山东英特力数据技术有限公司 Deep learning method and device of hybrid architecture
WO2019086104A1 (en) * 2017-10-30 2019-05-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Neural network representation
CN109496294A (en) * 2018-01-15 2019-03-19 深圳鲲云信息科技有限公司 The Compilation Method and system of artificial intelligence process device, storage medium and terminal
CN109033309B (en) * 2018-07-17 2023-04-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN110033086B (en) * 2019-04-15 2022-03-22 广州异构智能科技有限公司 Hardware accelerator for neural network convolution operations

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080240A (en) * 2022-06-29 2022-09-20 美的集团(上海)有限公司 Deployment method of voice processing model, electronic equipment and storage medium
CN115080240B (en) * 2022-06-29 2023-10-10 美的集团(上海)有限公司 Voice processing model deployment method, electronic equipment and storage medium
WO2024061287A1 (en) * 2022-09-23 2024-03-28 维沃移动通信有限公司 Artificial intelligence (ai) model transmission method and apparatus, and terminal and medium
CN116257286A (en) * 2023-03-13 2023-06-13 北京百度网讯科技有限公司 File processing method and device, electronic equipment and storage medium
CN116257286B (en) * 2023-03-13 2023-09-15 北京百度网讯科技有限公司 File processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110598855B (en) 2023-06-09
CN110598855A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
WO2021057807A1 (en) Deep learning model generation method and apparatus, device, and storage medium
US11150877B2 (en) Automatically generating machine learning models for software tools that operate on source code
US9383978B2 (en) Apparatus and method for on-demand optimization of applications
US20210158131A1 (en) Hierarchical partitioning of operators
US11816545B2 (en) Optimizing machine learning models
Zhou et al. {PetS}: A unified framework for {Parameter-Efficient} transformers serving
CN111194437A (en) Data processing offload using in-memory code execution
WO2012092211A2 (en) Emulating pointers
CN114237714A (en) Command packet generation method and device, electronic equipment and storage medium
US20180329729A1 (en) Software-defined microservices
CN112990461B (en) Method, device, computer equipment and storage medium for constructing neural network model
US20230062336A1 (en) String localization for universal use
US20210182041A1 (en) Method and apparatus for enabling autonomous acceleration of dataflow ai applications
CN112269606B (en) Application processing program dynamic loading method of brain-like computer operating system
CN112527264B (en) Constant data access optimization method based on heterogeneous platform
KR20200108789A (en) Method and computer program of processing program for single accelerator using dnn framework on plural accelerators
JP7329662B1 (en) System, method, program and information processing device for efficient control and use of computational resources
Temple Lang Enhancing R with advanced compilation tools and methods
US11876681B2 (en) Topology recommendation platform for application architecture
US11775655B2 (en) Risk assessment of a container build
US20220067502A1 (en) Creating deep learning models from kubernetes api objects
US11379353B1 (en) Platform for test environments
JP2019036278A (en) System and method of emulating execution of files
US20230130627A1 (en) Method for collaboration using cell-based computational notebooks
US10402306B2 (en) Parallel tracing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867804

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867804

Country of ref document: EP

Kind code of ref document: A1