WO2021057807A1 - Deep learning model generation method and apparatus, device, and storage medium - Google Patents


Info

Publication number
WO2021057807A1
WO2021057807A1 · PCT/CN2020/117196 · CN2020117196W
Authority
WO
WIPO (PCT)
Prior art keywords
file
deep learning
learning model
source file
model
Prior art date
Application number
PCT/CN2020/117196
Other languages
French (fr)
Chinese (zh)
Inventor
谭志鹏
刘耀勇
蒋燚
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2021057807A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00: Arrangements for software engineering
    • G06F 8/40: Transformation of program code
    • G06F 8/41: Compilation
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 5/00: Computing arrangements using knowledge-based models
    • G06N 5/04: Inference or reasoning models
    • G06N 5/041: Abduction
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The embodiments of the present application relate to the field of deep learning, and in particular to a deep learning model generation method, apparatus, device, and storage medium.
  • The network structure used in deep learning is a multilayer neural network, and most of the data in a model consists of the values of its weight matrices.
  • the deep learning model will use a suitable data structure to define the neural network structure.
  • When a deep learning model performs inference, the model must first be loaded into the neural network structure that the deep learning model adopts.
  • The conventional loading method treats the model as a file: when the code of the neural network structure runs, the model file is loaded into memory, and the data is then copied from memory into the neural network structure.
  • the embodiments of the present application provide a method, device, device, and storage medium for generating a deep learning model.
  • the data loading of the weight matrix can be completed during the compilation stage of the deep learning model, thereby improving the efficiency of deep learning model inference.
  • an embodiment of the present application provides a method for generating a deep learning model, and the method includes:
  • the second source file is a source file of a neural network structure adopted by the deep learning model
  • an embodiment of the present application provides an apparatus for generating a deep learning model, and the apparatus includes:
  • the first generating module is configured to generate a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
  • a first acquisition module configured to acquire a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
  • the second generating module is configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • An embodiment of the present application provides a computer device that includes a processor and a memory; the memory stores at least one instruction, and the at least one instruction is executed by the processor to implement the deep learning model generation method described in the foregoing aspect.
  • A computer-readable storage medium stores at least one instruction, and the at least one instruction is executed by a processor to implement the deep learning model generation method described in the foregoing aspect.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the deep learning model generation method provided in the various optional implementations of the foregoing aspects.
  • Fig. 1 is a schematic diagram of a neural network data structure provided by an exemplary embodiment of the present application
  • Figure 2 is a schematic diagram of the data loading implementation of the deep learning model inference process in related technologies
  • Fig. 3 is a flowchart of a method for generating a deep learning model provided by an exemplary embodiment of the present application
  • Fig. 4 is a flowchart of a method for generating a deep learning model provided by another exemplary embodiment of the present application.
  • Fig. 5 is a flowchart of a method for generating a deep learning model provided by another exemplary embodiment of the present application.
  • Fig. 6 is a schematic diagram of an implementation of a deep learning model generation process provided by an exemplary embodiment of the present application.
  • Fig. 7 is a flowchart of a method for generating a deep learning model provided by another exemplary embodiment of the present application.
  • Fig. 8 is a structural block diagram of a deep learning model generating device provided by an exemplary embodiment of the present application.
  • Fig. 9 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
  • The "plurality" mentioned herein means two or more.
  • "And/or" describes an association relationship between associated objects and indicates that three relationships are possible; for example, "A and/or B" can mean: A alone, both A and B, or B alone.
  • The character "/" generally indicates an "or" relationship between the objects before and after it.
  • Deep learning model inference: the process of using a trained deep learning model to draw conclusions about unknown samples. More specifically, a trained deep learning model can apply the knowledge it has learned to tasks such as image recognition, speech recognition, and spam filtering; in deep learning terminology, deriving results for unknown samples on the basis of the training content is called inference.
  • Source file: a code file written in assembly language or a high-level language; a computer cannot directly execute the code in a source file.
  • Object (target) file: the binary file, directly executable by the central processing unit (CPU), produced when a source file is compiled by the compiler; it contains machine code, data used by the code at runtime, debugging information, and so on.
  • Rule file: because the code of a neural network structure is composed of multiple source files, a rule file is required to describe to the compilation system how these source files are compiled.
  • Tensor: in deep learning, a tensor is at its core a data container, which can be an array of any dimension; it holds a name and a memory pointer, and the memory pointer points to the address of the data that needs to be loaded.
  • Before performing inference with a deep learning model, the computer device needs to load the model into the adopted neural network structure, and most of the loaded data is the model's weight matrix.
  • the computer equipment will use a suitable data structure to define the neural network.
  • the general definition method is shown in Figure 1.
  • the neural network contains multiple operators.
  • the data is uniformly packaged and input into the neural network.
  • the computer equipment usually saves the deep learning model as a file.
  • The model file 21 must be loaded into memory when the deep learning model performs inference; because the memory pointer of Tensor 23 in the neural network 22 points to the memory address of the weight matrix 24, the computer device must copy the data of the weight matrix 24 into Tensor 23 according to that memory address while model inference runs.
  • When the neural network structure adopted by the deep learning model runs as a special version, such as a graphics processing unit (GPU) version or a digital signal processor (DSP) version, the computer device must additionally copy the data of the deep learning model from the CPU to the GPU or DSP at runtime.
  • the computer device completes data copy when the deep learning model is compiled.
  • The computer device first generates the first source file from the data of the weight matrix in the deep learning model file, and compiles it together with the source file of the neural network structure adopted by the deep learning model (i.e., the second source file) to generate the target file corresponding to the deep learning model; deep learning model inference is then performed on the basis of this target file.
  • Because the first source file is generated from the weight matrix of the deep learning model, the data-loading step is completed while the deep learning model is compiled.
  • The deep learning model therefore does not need to open the model file or copy data at inference time, which greatly improves the running efficiency of the neural network structure and, in turn, the inference efficiency of the deep learning model.
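As an illustration of what such a generated first source file might contain (a hypothetical sketch in C; the patent does not fix a programming language, and the array name and values here are invented), the weights of one matrix are baked into the compiled binary as a fixed-size static array:

```c
/* Hypothetical generated "first source file": the weight values of one
 * 2x2 weight matrix are embedded as a static array, so the compiled
 * binary carries the data and no model file needs to be opened at
 * inference time. */
static const float conv1_weights[2 * 2] = {
    0.5f, -0.25f,
    1.0f,  0.75f,
};

/* Accessor so other translation units can reach the embedded data. */
const float *get_conv1_weights(void) { return conv1_weights; }
```

The data is part of the program image itself, which is exactly why the runtime open-and-copy step described above disappears.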
  • the deep learning model generation method provided in the embodiments of the present application can be used in computer devices with strong data processing capabilities such as personal computers or servers.
  • The deep learning model obtained through this generation method can be implemented as an application (or part of one) and installed on a terminal to give the terminal deep learning capabilities, or it can be deployed on the application's back-end server so that the server provides deep learning model inference services for the application on the terminal.
  • each embodiment of the present application is described by taking a deep learning model generation method applied to a computer device as an example.
  • the second source file is the source file of the neural network structure adopted by the deep learning model
  • the first source file is generated according to the model file of the deep learning model, including:
  • the rule file is used to describe the way of compiling the source file to the compilation system
  • the first source file is generated through the target script.
  • the first source file is generated through the target script, including:
  • the first source file is generated according to the static array corresponding to each weight matrix.
  • the static array corresponding to each weight matrix is generated through the target script, including:
  • According to the matrix size and data type of the weight matrix, the static array is set through the target script.
  • the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
  • the array name of the static array is generated through the target script
  • the array value of the static array is generated through the target script.
  • the first source file and the second source file are compiled to generate the target file corresponding to the deep learning model, including:
  • the target Tensor points to the target static array in the first source file, and the target static array has the same name as the target Tensor.
  • the method further includes:
  • When a deep learning model inference request is received, the target file is loaded into memory, and the target file is executed to perform deep learning model inference.
  • Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes: obtaining the data volume of the model file;
  • if the data volume is greater than a threshold, the step of generating the first source file according to the model file of the deep learning model is executed.
  • Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes: obtaining the running version of the neural network structure adopted by the deep learning model;
  • if the running version belongs to a preset running version, the step of generating the first source file according to the model file of the deep learning model is executed, the preset running version including at least one of the GPU running version and the DSP running version.
  • the storage directory of the first source file is the same as the storage directory of the second source file.
  • FIG. 3 shows a flowchart of a method for generating a deep learning model according to an embodiment of the present application.
  • the method for generating a deep learning model is used in a computer device as an example for description, and the method includes:
  • Step 301 Generate a first source file according to the model file of the deep learning model, and the model file contains the weight matrix in the deep learning model.
  • The deep learning model may be a model used for image recognition (recognizing objects contained in an input image), speech recognition (recognizing the content of input speech), or video description generation (generating description information based on an input video); the embodiments of this application do not limit the use of the deep learning model.
  • the data loading of the deep learning model is mainly the numerical loading of its weight matrix.
  • Before compiling the neural network structure adopted by the deep learning model, the computer device first generates the first source file from the values of the weight matrix in the model file, so that this source file can be used to load the data directly when the neural network structure is compiled.
  • Step 302 Obtain a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model.
  • the computer device before compiling the neural network structure adopted by the deep learning model, the computer device needs to obtain the code of the neural network structure first, and the code of the neural network structure is saved in the second source file.
  • The neural network structure adopted by the deep learning model may be a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Long Short-Term Memory (LSTM) network, and so on; the embodiment of the present application does not limit this.
  • Step 303 Compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • In related technologies, since a first source file corresponding to the deep learning model is not generated in advance, the computer device directly compiles the source files of the neural network structure to generate the target file.
  • In the embodiment of the present application, because the computer device generates the first source file in advance, once the first source file and the second source file are ready, the computer device uses the compilation system to compile them together according to certain rules.
  • During compilation, the value of each weight matrix in the model file is loaded from the first source file into the second source file, so the data loading of the model file is completed before compilation finishes.
  • the target file corresponding to the deep learning model is generated.
  • The content of the target file is the machine code obtained by compiling the code in the first source file and the second source file; it can be directly executed by the computer device, and subsequent model inference is performed on this basis.
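The effect of compiling the two source files together can be sketched in C (all identifiers here are hypothetical). The second source file references the weight data through an external declaration, and the definition in the first source file satisfies that reference when the two are compiled and linked, so at run time the inference code reads the weights directly, with no file I/O or copying. Both "files" are shown in one fragment for brevity:

```c
#include <stddef.h>

/* --- contents of the generated first source file (hypothetical) --- */
const float fc1_weights[4] = { 0.25f, 0.25f, 0.25f, 0.25f };

/* --- contents of the second source file (network structure) --- */
extern const float fc1_weights[4]; /* resolved at link time, not at run time */

/* Inference-side code can use the weight values immediately. */
float fc1_weight_sum(void) {
    float s = 0.0f;
    for (size_t i = 0; i < 4; ++i)
        s += fc1_weights[i];
    return s;
}
```

The linker binds the reference to the definition when the target file is produced, which is the compile-stage "data loading" the text describes.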
  • In summary, the first source file is generated in advance according to the weight matrix in the deep learning model, so that during compilation the first source file and the second source file corresponding to the neural network structure are compiled together to generate the target file of the deep learning model. Compared with related technologies, which must load the weight matrix from the model file into the neural network structure during the inference phase, the embodiment of the present application completes the data loading of the weight matrix during the compilation phase; the weight matrix does not need to be reloaded in the subsequent model inference process, thereby improving the efficiency of deep learning model inference.
  • FIG. 4 shows a flowchart of a method for generating a deep learning model according to another embodiment of the present application.
  • the method for generating a deep learning model is used in a computer device as an example for description, and the method includes:
  • Step 401: In the process of compiling the source code corresponding to the rule file, run the target script in the rule file; the rule file is used to describe to the compilation system how the source files are compiled.
  • Because the code of the neural network structure adopted by the deep learning model is composed of multiple source files, a rule file is needed to describe to the compilation system how these source files are compiled.
  • code for running a target script is added to the source code of the rule file.
  • the target script is used to generate the first source file from the value of the weight matrix in the deep learning model.
  • Optionally, the target script may be a shell script.
  • the developer adds the code for running the target script prepare.sh to the source code of the rule file, and the computer device runs the target script during the process of compiling the source code of the rule file.
  • the rule file can be Android.mk.
  • Step 402 According to the model file, a first source file is generated through the target script.
  • the computer device reads the data in the model file during the execution of the target script, so as to generate the first source file according to the read data.
  • step 402 includes the following steps 402A and 402B.
  • Step 402A For each weight matrix in the model file, a static array corresponding to each weight matrix is generated through the target script.
  • The purpose of running the target script is to save the values of the weight matrices in the model file as static arrays.
  • The size of a static array is fixed when it is declared, that is, the number of array elements cannot change, so static arrays correspond one-to-one with weight matrices, which facilitates the subsequent data loading when the neural network structure is compiled.
  • the developer adds the code for running the target script prepare.sh to the source code of the rule file, and the computer device runs prepare.sh when compiling the rule file to generate a static array corresponding to the value of the weight matrix of the model file.
  • When the compilation is complete, all the values of the weight matrices are saved in the first source file in the form of static arrays.
  • generating the static array according to the weight matrix may include the following steps:
  • First, the static array is set by the target script according to the matrix size and data type of the weight matrix.
  • the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type.
  • the size and data type of the static array need to be consistent with its corresponding weight matrix.
  • the size of the static array in the target script is determined according to the matrix size of the corresponding weight matrix, and the data type of the static array is the same as the data type of the weight matrix.
  • the array name of the static array is generated through the target script.
  • To ensure each static array is loaded into the correct Tensor in the subsequent compilation process, the computer device needs to set a unique name for the static array according to the matrix name of the weight matrix.
  • a preset naming rule is set in the target script, and the target script generates a corresponding array name based on the matrix name of the weight matrix according to the preset naming rule.
  • For example, the generated static array is declared as MobilenetV1_Conv2d_0_weights[32*3*3*3].
  • the array value of the static array is generated through the target script.
  • After the name and data type of the static array are set, the computer device further loads the weight data contained in the weight matrix into the corresponding static array. In the embodiment of the present application, the computer device does this by running the target script.
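A minimal sketch of what this step of the target script emits (the patent uses a shell script, prepare.sh; this C helper, its name, and the exact output formatting are assumptions for illustration). Given a matrix name, its float values, and its element count, it produces one static-array definition whose size and element type match the matrix:

```c
#include <stdio.h>
#include <string.h>

/* Emit one weight matrix as a C static-array definition into buf.
 * The array size follows the matrix size, the element type matches the
 * matrix's data type (float here), and the array name is derived from
 * the matrix name. Assumes buf is large enough; returns the number of
 * characters written. */
int emit_static_array(char *buf, size_t cap, const char *name,
                      const float *values, int count) {
    size_t n = (size_t)snprintf(buf, cap, "static const float %s[%d] = {",
                                name, count);
    for (int i = 0; i < count; ++i)
        n += (size_t)snprintf(buf + n, cap - n, "%s%g",
                              i ? ", " : " ", values[i]);
    n += (size_t)snprintf(buf + n, cap - n, " };");
    return (int)n;
}
```

Running this for every weight matrix in the model file and concatenating the results yields the body of the first source file.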
  • Step 402B Generate a first source file according to the static array corresponding to each weight matrix.
  • the target script saves all the static arrays in the source file format, thereby generating the first source file.
  • When the computer device generates static arrays according to the weight data of the weight matrix 74 in the model file 71, the static arrays are saved as the first source file 75 and stored in the storage directory where the second source file is located.
  • Step 403 Obtain a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model.
  • Step 404 Compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • the computer equipment uses the compilation system to compile the first source file and the second source file to generate the target file corresponding to the deep learning model.
  • During compilation, the compilation system sets the memory pointer corresponding to the target Tensor in the second source file so that the target Tensor points to the target static array in the first source file; the target static array has the same name as the target Tensor.
  • the neural network structure loads the data of the deep learning model through Tensor at compile time.
  • the name of the Tensor is set to be consistent with the name of the corresponding static array.
  • Tensor 66 in the neural network 62 points to the corresponding static array in the first source file 65.
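The Tensor-to-array binding can be sketched as follows (the struct layout and identifiers are assumptions; the text above only specifies that a Tensor carries a name and a memory pointer, and that the pointer is aimed at the same-named static array during compilation):

```c
#include <string.h>

/* A Tensor as described in the text: a name plus a memory pointer. */
typedef struct {
    const char  *name;
    const float *data;
} Tensor;

/* Definition that would live in the generated first source file. */
const float conv2d_0_weights[4] = { 0.5f, 1.0f, -1.0f, 2.0f };

/* In the second source file, the same-named Tensor is initialized to
 * point at the static array, so the binding is fixed at compile/link
 * time and no data copy happens during inference. */
Tensor conv2d_0 = { "conv2d_0_weights", conv2d_0_weights };
```

The matching names are what let the compilation step wire each Tensor to the right array.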
  • the computer device can use the deep learning model to perform inference through the following step 405.
  • Step 405 When a deep learning model inference request is received, load the target file into the memory, and execute the target file to perform deep learning model inference.
  • When the computer device receives a deep learning model inference request, it first loads the target file 63, compiled from the first source file 65 and the second source file, into memory, and then runs the target file 63 to perform deep learning model inference. Because the memory pointer of Tensor 66 was already pointed at the static array during the compilation stage (that is, the data is already loaded), there is no need to open the model file or copy data; inference can start directly, which improves inference efficiency.
  • the value of the weight matrix in the model file is generated by running the target script as a static array and saved as the first source file.
  • the computer device compiles the first source file and the second source file according to the rule file.
  • the data of the static array is loaded into the Tensor, so that the work of data loading is completed in the compilation stage, and the model inference can be directly performed, thereby improving the efficiency of the deep learning model inference.
  • The computer device can choose different deep learning model generation methods according to the currently adopted neural network structure and the type of deep learning model.
  • For model files with a large data volume and a heavy data-copy workload, the method of the embodiments of this application can be used to generate the deep learning model, thereby improving the efficiency of model inference; for model files with a small data volume and a light data-copy workload, the loading method of the deep learning model in related technologies can be adopted, which allows the weight matrix in the model file to be changed flexibly.
  • Optionally, the following steps 300a and 300b (or 300c and 300d) may be included before step 301.
  • Step 300a Obtain the data volume of the model file.
  • Before compiling the deep learning model, the computer device obtains the data volume of the current deep learning model (that is, the data volume of the model file) and compares it with a preset threshold. If the data volume is greater than the threshold, step 300b is executed; if the data volume is less than the threshold, the deep learning model is compiled using the method provided by related technologies (the first source file does not need to be generated).
  • For example, the threshold is 100 MB; that is, when the model file is larger than 100 MB, the computer device needs to generate the first source file according to the model file.
  • Step 300b: If the data volume is greater than the threshold, execute the step of generating the first source file according to the model file of the deep learning model.
  • That is, when the data volume of the model file is greater than the threshold, the deep learning model generation method of the embodiment of the present application is used, continuing with the step of generating the first source file according to the model file and the subsequent steps; if the data volume of the model file is less than the threshold, the deep learning model loading method of related technologies can be selected.
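The size check in steps 300a and 300b might be implemented as below (a sketch only: the 100 MB figure comes from the example above, while querying the size via fseek/ftell and the function name are assumptions):

```c
#include <stdio.h>

#define EMBED_THRESHOLD_BYTES (100L * 1024 * 1024) /* 100 MB, per the example */

/* Return 1 if the model file is large enough that the first source
 * file should be generated (compile-time embedding), 0 if the
 * conventional runtime-loading path suffices, and -1 on error. */
int should_embed_weights(const char *model_path) {
    FILE *f = fopen(model_path, "rb");
    if (!f) return -1;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f); /* file size in bytes */
    fclose(f);
    if (size < 0) return -1;
    return size > EMBED_THRESHOLD_BYTES ? 1 : 0;
}
```

A return of 1 would trigger the first-source-file generation path; 0 would fall back to the related-technology loading method.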
  • Step 300c Obtain the running version of the neural network structure adopted by the deep learning model.
  • the computer equipment can also select a suitable deep learning model generation method according to the running version of the neural network structure adopted by the deep learning model.
  • the running version of the neural network structure is used to indicate the hardware for executing the deep learning model
  • the running version includes at least one of a CPU running version, a GPU running version, and a DSP running version.
  • Step 300d: If the running version belongs to the preset running version, execute the step of generating the first source file according to the model file of the deep learning model.
  • the preset running version includes at least one of the GPU running version and the DSP running version.
  • The computer device is preconfigured with the running versions that require the deep learning model generation method of this embodiment; it determines whether the current running version belongs to the preset running versions and, if so, selects the deep learning model generation method of the embodiment of this application.
  • In related technologies, when a GPU or DSP running version is used, the computer device must not only copy the data of the model file into memory, but also copy the data from the CPU to the GPU or DSP, which seriously affects the inference efficiency of the deep learning model.
  • the preset running version set by the computer device includes at least one of the GPU running version and the DSP running version.
  • steps 300a to 300b and steps 300c to 300d can be executed alternatively or simultaneously, which is not limited in the embodiment of the present application.
  • an appropriate compilation method is selected according to the data volume of the model file or the running version of the neural network structure, which helps to improve the efficiency and flexibility of the deep learning model inference.
  • Fig. 8 is a structural block diagram of an apparatus for generating a deep learning model according to an exemplary embodiment of the present application.
  • the apparatus may be set in the computer equipment described in the above embodiment. As shown in Fig. 8, the apparatus includes:
  • the first generating module 801 is configured to generate a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
  • the first obtaining module 802 is configured to obtain a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
  • the second generating module 803 is configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  • the first generating module 801 includes:
  • the running unit is used to run the target script in the rule file in the process of compiling the source code corresponding to the rule file, and the rule file is used to describe the way of compiling the source file to the compilation system;
  • the first generating unit is configured to generate the first source file through the target script according to the model file.
  • the first generating unit is further configured to:
  • the first source file is generated according to the static array corresponding to each weight matrix.
  • the first generating unit is further configured to:
  • the static array is set by the target script according to the matrix size and data type of the weight matrix, the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
  • the array value of the static array is generated through the target script.
  • the first generating unit is further configured to:
  • the target Tensor is pointed to the target static array in the first source file, and the target static array and the target Tensor have the same name.
  • the device further includes:
  • the reasoning module is used to load the target file into the memory when receiving a deep learning model reasoning request, and execute the target file to perform deep learning model reasoning.
  • the device further includes:
  • the second acquiring module is used to acquire the data volume of the model file;
  • the third generation module is configured to perform the step of generating the first source file according to the model file of the deep learning model if the amount of data is greater than the threshold;
  • the device further includes: a third acquisition module, configured to acquire the running version of the neural network structure adopted by the deep learning model;
  • the fourth generation module is configured to perform the step of generating the first source file according to the model file of the deep learning model if the running version belongs to the preset running version, where the preset running version includes at least one of a GPU running version and a DSP running version.
  • the storage directory of the first source file is the same as the storage directory of the second source file.
  • the deep learning model generation device provided in the above embodiment is illustrated only with the division of the above functional modules as an example. In practical applications, the above functions can be allocated to different functional modules as needed; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.
  • the deep learning model generation device provided in the foregoing embodiment and the embodiment of the deep learning model generation method belong to the same concept. For the specific implementation process, please refer to the method embodiment, which will not be repeated here.
  • the computer device 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (RAM) 902 and a read-only memory (ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901.
  • the computer device 900 also includes a basic input/output system (I/O system) 906 that helps to transfer information between the various devices in the computer, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.
  • the basic input/output system 906 includes a display 908 for displaying information and an input device 909 such as a mouse and a keyboard for the user to input information.
  • the display 908 and the input device 909 are both connected to the central processing unit 901 through the input and output controller 910 connected to the system bus 905.
  • the basic input/output system 906 may also include an input and output controller 910 for receiving and processing input from multiple other devices such as a keyboard, a mouse, or an electronic stylus.
  • the input and output controller 910 also provides output to a display screen, a printer, or other types of output devices.
  • the mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905.
  • the mass storage device 907 and its associated computer-readable medium provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (CD-ROM) drive.
  • the computer-readable media may include computer storage media and communication media.
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer readable instructions, data structures, program modules or other data.
  • computer storage media include RAM, ROM, erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other solid-state storage technology, CD-ROM, digital versatile disc (DVD) or other optical storage, tape cartridges, magnetic tape, disk storage or other magnetic storage devices.
  • the aforementioned system memory 904 and mass storage device 907 may be collectively referred to as a memory.
  • the memory stores one or more programs configured to be executed by one or more central processing units 901; the one or more programs contain instructions for implementing the above-mentioned deep learning model generation method, and the central processing unit 901 executes the one or more programs to implement the methods provided in the foregoing method embodiments.
  • the computer device 900 may also run by connecting to a remote computer through a network such as the Internet. That is, the computer device 900 can be connected to the network 912 through the network interface unit 911 connected to the system bus 905; in other words, the network interface unit 911 can also be used to connect to other types of networks or remote computer systems (not shown).
  • the memory further includes one or more programs stored in the memory, and the one or more programs include instructions for performing the steps executed by the computer device in the method provided in the embodiments of the present application.
  • the embodiments of the present application also provide a computer-readable storage medium that stores at least one instruction, at least one program, a code set or an instruction set; the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the deep learning model generation method described in any of the foregoing embodiments.
  • a computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the deep learning model generation method provided in the various optional implementations of the foregoing aspects.
  • the program can be stored in a computer-readable storage medium.
  • the medium may be a computer-readable storage medium included in the memory in the foregoing embodiment; or may be a computer-readable storage medium that exists alone and is not assembled into the terminal.
  • the computer-readable storage medium stores at least one instruction, at least one program, a code set or an instruction set, and the at least one instruction, the at least one program, the code set or the instruction set is loaded and executed by a processor to implement the deep learning model generation method described in any of the foregoing method embodiments.
  • the computer-readable storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), solid state drive (SSD, Solid State Drives), or optical disk.
  • random access memory may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory).
  • the program can be stored in a computer-readable storage medium.
  • the storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.
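Among the optional features listed above, the two pre-checks (the model file's data volume against a threshold, and whether the running version is a preset GPU or DSP version) decide whether the first source file is generated at all. The sketch below is only an illustration of that decision, in Python; the threshold value, version strings, and function name are placeholder assumptions, not values from the embodiments.

```python
PRESET_RUNNING_VERSIONS = {"gpu", "dsp"}  # preset running versions per the text
SIZE_THRESHOLD_BYTES = 10 * 1024 * 1024   # placeholder threshold, not specified here

def should_generate_first_source_file(model_file_size, running_version):
    """Return True if compile-time weight embedding should be used."""
    if model_file_size > SIZE_THRESHOLD_BYTES:
        return True  # large models gain the most from skipping the runtime copy
    if running_version.lower() in PRESET_RUNNING_VERSIONS:
        return True  # GPU/DSP versions also avoid a CPU-to-device copy at runtime
    return False

print(should_generate_first_source_file(50 * 1024 * 1024, "cpu"))  # large model
print(should_generate_first_source_file(1024, "gpu"))              # GPU version
```

Either condition alone triggers generation, matching the separate third and fourth generation modules described above.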

Abstract

Embodiments of the present application relate to the field of deep learning. Disclosed are a deep learning model generation method and apparatus, a device, and a storage medium. The method comprises: generating a first source file according to a model file of a deep learning model, the model file comprising a weight matrix in the deep learning model; obtaining a second source file corresponding to the deep learning model; and compiling the first source file and the second source file to generate a target file corresponding to the deep learning model. With the method provided in the embodiments of the present application, the first source file is generated in advance according to the weight matrix in the deep learning model, so that during compilation the first source file and the second source file corresponding to a neural network structure are compiled to generate the target file corresponding to the deep learning model. The data loading of the weight matrix can thus be completed in the compilation stage of the deep learning model, and the weight matrix does not need to be reloaded in the subsequent model inference process, thereby improving the inference efficiency of the deep learning model.

Description

Deep learning model generation method, device, equipment and storage medium
This application claims priority to Chinese patent application No. 201910897445.7, filed on September 23, 2019 and entitled "Deep learning model generation method, device, equipment and storage medium", the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the field of deep learning, and in particular to a deep learning model generation method, device, equipment, and storage medium.
Background
The network structure of deep learning is a type of multilayer neural network, and most of the data in such a model consists of the values of weight matrices. To complete model inference, a deep learning model uses a suitable data structure to define the neural network structure.
When a deep learning model performs inference, the model first needs to be loaded into the neural network structure adopted by the model. The common approach to model loading is to treat the model as a file: when the code of the neural network structure runs, the model file is loaded into memory, and the data is then copied from memory into the neural network structure.
Summary
The embodiments of the present application provide a deep learning model generation method, device, equipment, and storage medium, with which the data loading of the weight matrix can be completed during the compilation stage of the deep learning model, thereby improving the efficiency of deep learning model inference. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a deep learning model generation method, the method including:
generating a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
acquiring a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model;
compiling the first source file and the second source file to generate a target file corresponding to the deep learning model.
In another aspect, an embodiment of the present application provides a deep learning model generation device, the device including:
a first generating module, configured to generate a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
a first acquisition module, configured to acquire a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model;
a second generating module, configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
In another aspect, an embodiment of the present application provides a computer device, the computer device including a processor and a memory; the memory stores at least one instruction, and the at least one instruction is executed by the processor to implement the deep learning model generation method described in the foregoing aspect.
In another aspect, a computer-readable storage medium is provided; the storage medium stores at least one instruction, and the at least one instruction is executed by a processor to implement the deep learning model generation method described in the foregoing aspect.
According to one aspect of the present application, a computer program product or computer program is provided, which includes computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the deep learning model generation method provided in the various optional implementations of the foregoing aspects.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be derived from these drawings without creative effort.
Fig. 1 is a schematic diagram of a neural network data structure provided by an exemplary embodiment of the present application;
Fig. 2 is a schematic diagram of the data loading implementation of the deep learning model inference process in the related art;
Fig. 3 is a flowchart of a deep learning model generation method provided by an exemplary embodiment of the present application;
Fig. 4 is a flowchart of a deep learning model generation method provided by another exemplary embodiment of the present application;
Fig. 5 is a flowchart of a deep learning model generation method provided by another exemplary embodiment of the present application;
Fig. 6 is a schematic diagram of an implementation of a deep learning model generation process provided by an exemplary embodiment of the present application;
Fig. 7 is a flowchart of a deep learning model generation method provided by another exemplary embodiment of the present application;
Fig. 8 is a structural block diagram of a deep learning model generation device provided by an exemplary embodiment of the present application;
Fig. 9 is a schematic structural diagram of a computer device provided by an exemplary embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the present application clearer, the implementations of the present application are described in further detail below with reference to the accompanying drawings.
The term "plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
To facilitate understanding, some terms involved in the embodiments of the present application are briefly introduced below.
Deep learning model inference: the process of making predictions and inferences about unknown samples using a trained deep learning model is called deep learning model inference. More specifically, a trained deep learning model can apply the knowledge it has learned to tasks in the digital world, such as image recognition, speech recognition, and spam filtering; the deep learning model draws conclusions about the unknown samples it receives based on what it was trained on, which in deep learning terminology is called inference.
Source file: a source file is a code file written in assembly language or a high-level language; a computer cannot directly recognize the code in a source file.
Target file (object file): a target file is a binary file, produced by compiling a source file, that can be directly recognized by the central processing unit (CPU); it contains machine code, data used by the code at runtime, debugging information, and so on.
Rule file: since the code of a neural network structure consists of multiple source files, a rule file is needed to describe to the compilation system how these source files are to be compiled.
Tensor: in the field of deep learning, the core of a tensor is a data container, which can be an array of any dimension; it contains a name and a memory pointer, and the memory pointer points to the address of the data that needs to be loaded.
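As a rough illustration of the Tensor definition above, a tensor can be modeled as a named container plus a reference to its backing data. The Python dataclass below is only a sketch (the real structure would be defined in the framework's own language), with a list standing in for the memory pointer:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str          # identifies the tensor, e.g. "conv1_weight"
    shape: tuple       # a tensor can be an array of any dimension
    data: list = None  # stands in for the memory pointer to the loaded data

t = Tensor(name="conv1_weight", shape=(2, 3))
t.data = [0.0] * (2 * 3)  # "loading": pointing the tensor at its weight values
```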
Before performing inference with a deep learning model, a computer device needs to load the deep learning model into the adopted neural network structure, and most of the loaded data is the weight matrices of the deep learning model. To complete inference over the deep learning model, the computer device defines the neural network with a suitable data structure. A typical definition is shown in Fig. 1: the neural network contains multiple operators, and each operator uses Tensors to uniformly encapsulate the various data fed into the neural network.
In the related art, a computer device usually saves the deep learning model as a file. As shown in Fig. 2, the model file 21 needs to be loaded into memory before deep learning model inference, and because the memory pointer of Tensor 23 in the neural network 22 points to the memory address of the corresponding weight matrix 24, the computer device needs to copy the data of the weight matrix 24 into Tensor 23 according to that memory address while model inference runs. In addition, if the neural network structure adopted by the deep learning model runs a special version such as a graphics processing unit (GPU) version or a digital signal processor (DSP) version, the computer device also needs to copy the data of the deep learning model from the CPU to the GPU or DSP at runtime.
Since the neural network structure is extremely sensitive to operating efficiency, this kind of data copying seriously reduces operating efficiency; especially for models with a large amount of data, it seriously affects the inference efficiency of the deep learning model.
To improve the inference efficiency of the deep learning model, in the deep learning model generation method provided by the embodiments of the present application, the computer device completes the data copy when the deep learning model is compiled. The computer device first generates a first source file from the weight-matrix data in the deep learning model file, compiles it together with the source file of the neural network structure adopted by the deep learning model (that is, the second source file) to generate the target file corresponding to the deep learning model, and performs deep learning model inference on this basis.
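The first step of this flow can be sketched in Python. This is a minimal illustration under assumptions not in the original: the model file has already been parsed into a mapping from matrix names to flattened float weights, and `emit_first_source_file` is an invented name. The script renders the "first source file" as C text, one static array per weight matrix:

```python
def emit_first_source_file(model):
    """Render the 'first source file': one C static array per weight matrix."""
    lines = []
    for name, weights in model.items():
        values = ", ".join(f"{w}f" for w in weights)
        # The array size follows the matrix size; the array name matches the
        # Tensor name so the compiler can link them later.
        lines.append(f"static const float {name}[{len(weights)}] = {{{values}}};")
    return "\n".join(lines) + "\n"

# Hypothetical model file already parsed into name -> flattened weights:
model = {"conv1_weight": [0.5, -1.25], "fc_weight": [2.0, 3.0, 4.0]}
first_source = emit_first_source_file(model)
print(first_source)
```

In the build flow this text would be written into the same storage directory as the second source file (as the embodiments note) and compiled together with it, so the weights are already in the binary when inference starts.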
Compared with the deep learning model loading method provided in the related art, in the embodiments of the present application the first source file is generated from the weight matrices of the deep learning model, so that the data loading step is completed during the compilation of the deep learning model. The deep learning model no longer needs to open the model file and copy data at inference time, which greatly improves the operating efficiency of the neural network structure and in turn the inference efficiency of the deep learning model.
The deep learning model generation method provided by the embodiments of the present application can be used in computer devices with strong data processing capabilities, such as personal computers or servers. The deep learning model obtained through the method can be implemented as an application program, or as part of one, and installed in a terminal to give it deep learning capabilities; alternatively, the deep learning model can be deployed on the background server of an application program, so that the server provides deep learning model inference services for the application program in the terminal. For ease of presentation, the embodiments of the present application are described by taking the application of the deep learning model generation method to a computer device as an example.
The deep learning model generation method provided by the embodiments of the present application includes:
generating a first source file according to a model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
acquiring a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model;
compiling the first source file and the second source file to generate a target file corresponding to the deep learning model.
Optionally, generating the first source file according to the model file of the deep learning model includes:
in the process of compiling the source code corresponding to a rule file, running a target script in the rule file, where the rule file is used to describe to the compilation system the way of compiling the source files;
generating the first source file through the target script according to the model file.
Optionally, generating the first source file through the target script according to the model file includes:
for each weight matrix in the model file, generating a static array corresponding to the weight matrix through the target script;
generating the first source file according to the static arrays corresponding to the weight matrices.
Optionally, generating the static array corresponding to each weight matrix through the target script includes:
setting the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined according to the matrix size and the array type of the static array is the same as the data type;
generating the array name of the static array through the target script according to the matrix name of the weight matrix;
generating the array values of the static array through the target script according to the weight data contained in the weight matrix.
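The three generation steps above (array size and type from the matrix, array name from the matrix name, array values from the weight data) can be sketched as follows. The `float32`/`int8` type mapping, the identifier sanitization, and the function name are illustrative assumptions, not mandated by the embodiments:

```python
C_TYPES = {"float32": "float", "int8": "signed char"}  # assumed type mapping

def static_array_for(matrix_name, shape, dtype, flat_values):
    # Step 1: the array size is determined by the matrix size, and the array
    # type is the same as the matrix's data type.
    size = 1
    for dim in shape:
        size *= dim
    c_type = C_TYPES[dtype]
    # Step 2: the array name is derived from the matrix name (sanitized into a
    # valid C identifier).
    array_name = matrix_name.replace("/", "_").replace(".", "_")
    # Step 3: the array values are generated from the weight data.
    body = ", ".join(str(v) for v in flat_values)
    return f"static const {c_type} {array_name}[{size}] = {{{body}}};"

decl = static_array_for("fc1.weight", (2, 2), "float32", [1.0, 0.0, 0.0, 1.0])
print(decl)
```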
Optionally, compiling the first source file and the second source file to generate the target file corresponding to the deep learning model includes:
during compilation, pointing the target Tensor to the target static array in the first source file according to the memory pointer corresponding to the target Tensor in the second source file, where the target static array and the target Tensor have the same name.
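One way the name-based link described above could be realized is to emit, together with the static arrays, C code that assigns each Tensor's data pointer to the identically named array; the compiler then resolves the symbol at build time and no runtime copy is needed. In the sketch below the `struct Tensor` layout and the `bind_weights` name are invented, and because the arrays are `static`, such binding code would have to be emitted into the same translation unit as the arrays:

```python
def emit_tensor_binding(tensor_names):
    """Generate C code pointing each Tensor's data pointer at the static array
    with the identical name (struct/field names here are illustrative)."""
    lines = ["static void bind_weights(struct Tensor *tensors) {"]
    for i, name in enumerate(tensor_names):
        # The symbol `name` resolves to the same-named static array in the
        # first source file, so the link is fixed at compile time.
        lines.append(f"    tensors[{i}].data = (void *){name};")
    lines.append("}")
    return "\n".join(lines)

binding = emit_tensor_binding(["conv1_weight", "fc_weight"])
print(binding)
```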
Optionally, after compiling the first source file and the second source file to generate the target file corresponding to the deep learning model, the method further includes:
when a deep learning model inference request is received, loading the target file into memory and executing the target file to perform deep learning model inference.
Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes:
obtaining the data volume of the model file;
if the data volume is greater than a threshold, performing the step of generating the first source file according to the model file of the deep learning model.
Optionally, before generating the first source file according to the model file of the deep learning model, the method further includes:
obtaining the running version of the neural network structure adopted by the deep learning model;
if the running version belongs to a preset running version, performing the step of generating the first source file according to the model file of the deep learning model, where the preset running version includes at least one of a GPU running version and a DSP running version.
Optionally, the storage directory of the first source file is the same as the storage directory of the second source file.
Please refer to Fig. 3, which shows a flowchart of a deep learning model generation method according to an embodiment of the present application. This embodiment is described by taking the application of the method to a computer device as an example, and the method includes:
Step 301: generate a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model.
The deep learning model may be a model for image recognition (recognizing objects contained in an input image), speech recognition (recognizing the content of input speech), or video description generation (generating description information from an input video); the purpose of the deep learning model is not restricted in the embodiments of the present application.
The data loading of a deep learning model is mainly the loading of the values of its weight matrices. In a possible implementation, before compiling the neural network structure adopted by the deep learning model, the computer device first generates the first source file from the values of the weight matrices in the model file, so that this source file can be used directly to complete data loading when the neural network structure is subsequently compiled.
Step 302: acquire a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model.
In a possible implementation, before compiling the neural network structure adopted by the deep learning model, the computer device needs to obtain the code of the neural network structure, and this code is stored in the second source file.
The neural network structure adopted by the deep learning model may be a convolutional neural network (CNN), a recursive neural network (RNN), a long short-term memory (LSTM) network, or the like; this is not limited in the embodiments of the present application.
Step 303: compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
In the related art, since no first source file corresponding to the deep learning model is generated in advance, the computer device directly compiles the source files of the neural network structure to generate the target file.
In the embodiments of the present application, however, the computer device generates the first source file in advance. After the first source file and the second source file are ready, the computer device compiles them together through the compilation system according to certain rules. During compilation, the values of the weight matrices in the model file are loaded from the first source file into the second source file, so the data loading of the model file is completed before compilation ends. After compilation, the target file corresponding to the deep learning model is generated; its content is the machine code obtained by compiling the code in the first and second source files, which can be directly recognized by the computer device, and subsequent model inference is performed on this basis.
综上所述,本申请实施例中,通过预先根据深度学习模型中的权值矩阵生成第一源文件,从而在编译过程中,同时对第一源文件以及神经网络结构对应的第二源文件进行编译,生成深度学习模型对应的目标文件;相较于相关技术中需要在推理阶段将模型文件中的权值矩阵加载至神经网络结构,本申请实施例中,在深度学习模型的编译阶段即可完成权值矩阵的数据加载,后续模型推理过程中不需要重新加载权值矩阵,进而提高了深度学习模型推理的效率。In summary, in the embodiment of the present application, the first source file is generated according to the weight matrix in the deep learning model in advance, so that during the compilation process, the first source file and the second source file corresponding to the neural network structure Compile to generate the target file corresponding to the deep learning model; compared to the need to load the weight matrix in the model file into the neural network structure in the inference phase in related technologies, in the embodiment of the application, the deep learning model is compiled in the phase The data loading of the weight matrix can be completed, and the weight matrix does not need to be reloaded in the subsequent model inference process, thereby improving the efficiency of the deep learning model inference.
Please refer to FIG. 4, which shows a flowchart of a deep learning model generation method according to another embodiment of the present application. This embodiment is described by taking the method being applied to a computer device as an example. The method includes:
Step 401: In the process of compiling the source code corresponding to a rule file, run a target script in the rule file, where the rule file is used to describe to the compilation system how source files are to be compiled.

Since the code of the neural network structure adopted by the deep learning model is composed of multiple source files, a rule file is needed to describe to the compilation system how these source files are to be compiled. In one possible implementation, code for running a target script is added to the source code of the rule file; the target script is used to generate the first source file from the values of the weight matrices in the deep learning model, and may be a shell script.

Illustratively, the developer adds code that runs the target script prepare.sh to the source code of the rule file, and the computer device runs the target script while compiling the source code of the rule file. On the Android system, the rule file may be Android.mk.
Step 402: Generate the first source file through the target script according to the model file.

In one possible implementation, the computer device reads the data in the model file while the target script is running, and generates the first source file from the data it reads.

Optionally, on the basis of FIG. 4, as shown in FIG. 5, step 402 includes the following steps 402A and 402B.
Step 402A: For each weight matrix in the model file, generate a static array corresponding to that weight matrix through the target script.

The purpose of running the target script is to save the values of the weight matrices in the model file as static arrays. The size of a static array is fixed when it is declared, that is, the number of array elements does not change, so the static arrays correspond one-to-one with the weight matrices, which facilitates data loading when the neural network structure is subsequently compiled.

Illustratively, the developer adds code that runs the target script prepare.sh to the source code of the rule file. The computer device runs prepare.sh when compiling the rule file, generating a static array for the values of each weight matrix in the model file; when compilation is complete, the values of all weight matrices are saved in the first source file in the form of static arrays.
In one possible implementation, generating a static array from a weight matrix may include the following steps:

First, set the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined by the matrix size and the array type of the static array is the same as the data type.

Since the static arrays are loaded directly when the second source file is compiled, the size and data type of each static array must be consistent with its corresponding weight matrix. Optionally, the size of a static array in the target script is determined by the matrix size of its corresponding weight matrix, and the data type of the static array is the same as that of the weight matrix.

Illustratively, for a floating-point weight matrix with a matrix size of 32*3*3*3, when the computer device sets up the corresponding static array, it sets the size of the static array to 32*3*3*3 and its data type to floating point.
Second, generate the array name of the static array through the target script according to the matrix name of the weight matrix.

To facilitate loading the static arrays into the correct Tensors during the subsequent compilation process, the computer device needs to set a unique name for each static array according to the matrix name of its weight matrix.

In one possible implementation, a preset naming rule is set in the target script, and the target script generates the corresponding array name based on the matrix name of the weight matrix according to that preset naming rule.

Illustratively, for a floating-point weight matrix in the deep learning model named MobilenetV1/Conv2d_0/weights with a matrix size of 32*3*3*3, the array name of the corresponding generated static array is MobilenetV1_Conv2d_0_weights[32*3*3*3].
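As a concrete illustration of such a preset naming rule, the sketch below (an assumption for illustration only; the patent describes the target script as a shell script and does not fix the rule itself) replaces the `/` separators in a matrix name with underscores to produce a legal C/C++ identifier:

```cpp
#include <cassert>
#include <string>

// Hypothetical naming rule: turn a weight-matrix name such as
// "MobilenetV1/Conv2d_0/weights" into a legal C++ identifier by
// replacing every '/' with '_'. The exact rule is an assumption;
// the text only requires that array names be unique and derived
// from the matrix names.
std::string arrayNameFor(std::string matrixName) {
    for (char& c : matrixName) {
        if (c == '/') c = '_';
    }
    return matrixName;
}
```

Applied to the example above, this yields MobilenetV1_Conv2d_0_weights, matching the array name shown in the text.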
Third, generate the array values of the static array through the target script according to the weight data contained in the weight matrix.

After the name and data type of a static array have been set, the computer device further needs to load the weight data contained in the weight matrix into the corresponding static array. In the embodiments of the present application, the computer device loads all the weight data contained in the weight matrix into the corresponding static array by running the target script.

Illustratively, for a static array named MobilenetV1_Conv2d_0_weights[32*3*3*3], the 32*3*3*3 floating-point weight matrix MobilenetV1/Conv2d_0/weights={0.31435529,xxx,...,xxx} is found by name; after the weight data is added, the final generated static array is float MobilenetV1_Conv2d_0_weights[32*3*3*3]={0.31435529,xxx,...,xxx}.
Step 402B: Generate the first source file according to the static arrays corresponding to the weight matrices.

Optionally, after the computer device has converted all the weight matrices in the model file into static arrays, the target script saves all the static arrays in source-file format, thereby generating the first source file.

As shown in FIG. 7, after the computer device generates the static arrays from the weight data of the weight matrices 74 in the model file 71, it saves the static arrays as the first source file 75, which is stored in the same directory as the second source file.

Illustratively, if the deep learning project uses C++, the generated first source file is saved as Model.cpp.
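As a hedged sketch, a fragment of such a generated Model.cpp might look like the following. The array name and dimensions follow the example in the text; the first value is the one given there, and the remaining weights are placeholders, not real model data:

```cpp
// Hypothetical fragment of a generated Model.cpp. The dimensions and the
// first value follow the example in the text; the remaining 863 weights
// are omitted here (elements not listed are zero-initialized).
static const float MobilenetV1_Conv2d_0_weights[32 * 3 * 3 * 3] = {
    0.31435529f, /* ... remaining 863 weights ... */
};
```

Because the element count is fixed in the declaration, the array corresponds one-to-one with the 32*3*3*3 weight matrix, and its size is checked at compile time rather than at inference time.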
Step 403: Obtain a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model.

For the implementation of this step, reference may be made to step 302 above; details are not repeated in this embodiment.

Step 404: Compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
The computer device compiles the first source file and the second source file through the compilation system to generate the target file corresponding to the deep learning model. To ensure that the static arrays in the first source file are correctly loaded into the Tensors in the neural network structure, in one possible implementation, during compilation the compilation system in the computer device points each target Tensor in the second source file to a target static array in the first source file according to the memory pointer corresponding to that target Tensor, where the target static array has the same name as the target Tensor.

Optionally, the neural network structure loads the data of the deep learning model through Tensors at compile time. To enable the computer device to accurately find the data to be loaded into each Tensor, the name of the Tensor is set to be consistent with the name of the corresponding static array. As shown in FIG. 6, the Tensor 66 in the neural network 62 points to the corresponding static array in the first source file 65.

Illustratively, for a Tensor named MobilenetV1_Conv2d_0_weights[32*3*3*3], while the computer device compiles the first and second source files, the Tensor's memory pointer is pointed at the static array in the first source file that has the same name MobilenetV1_Conv2d_0_weights[32*3*3*3], and the data in that static array is thereby loaded into this Tensor.
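A minimal sketch of this name-matched pointer binding, under the assumption of a simple Tensor structure (the patent does not specify the actual framework types, so the struct, function name, and placeholder weight values below are illustrative):

```cpp
#include <string>

// Hypothetical Tensor type: a name plus a pointer to its weight data.
// Real frameworks use richer tensor types; this only illustrates the
// name-matched pointer binding described in the text.
struct Tensor {
    std::string name;
    const float* data = nullptr;
};

// Weights already compiled into the binary as a static array
// (placeholder values, small size for illustration).
static const float MobilenetV1_Conv2d_0_weights[4] = {0.31435529f, 0.1f, 0.2f, 0.3f};

// The Tensor whose name matches the static array is simply pointed at
// the array -- no file open or data copy is needed at inference time.
void bindWeights(Tensor& t) {
    if (t.name == "MobilenetV1_Conv2d_0_weights") {
        t.data = MobilenetV1_Conv2d_0_weights;
    }
}
```

After binding, reading the Tensor's data dereferences the static array directly, which is why the later inference step can skip model-file loading entirely.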
After the deep learning model has been compiled through steps 401 to 404 above, the computer device can perform inference with the deep learning model through the following step 405.

Step 405: When a deep learning model inference request is received, load the target file into memory and execute the target file to perform deep learning model inference.

Illustratively, as shown in FIG. 6, when the computer device receives an inference request for the deep learning model, it first loads the target file 63, compiled from the first source file 65 and the second source file, into memory, and then runs the target file 63 to perform deep learning model inference. Since the memory pointer of the Tensor 66 was already pointed at the static array during the compilation phase (that is, data loading is complete), there is no need to open and copy the model file, and inference can start directly, thereby improving inference efficiency.
In the embodiments of the present application, the values of the weight matrices in the model file are turned into static arrays by running the target script and saved as the first source file. When the computer device compiles the first and second source files according to the rule file, the data of the static arrays is loaded into the Tensors, so that data loading is completed during the compilation phase, model inference can be performed directly, and the efficiency of deep learning model inference is improved.

Since neural network structures are complex and diverse, the computer device can select different deep learning model generation methods according to the currently adopted neural network structure and the type of the deep learning model. When the data volume of the model file is large or the running version requires additional data copying, the method of the embodiments of the present application can be used to generate the deep learning model, thereby improving the efficiency of model inference; when the data volume of the model file is small and the data copying workload is small, the deep learning model loading method in the related art can be used, so that the weight matrices in the model file can be changed flexibly.
Optionally, on the basis of FIG. 3, as shown in FIG. 8, the following steps may further be included before step 301.

Step 300a: Obtain the data volume of the model file.

In one possible implementation, before compiling the deep learning model, the computer device obtains the data volume of the current deep learning model (that is, the data volume of the model file) and compares it with a preset threshold. If the data volume is greater than the threshold, step 300b is executed; if the data volume is less than the threshold, the deep learning model is compiled by the method provided in the related art (without generating the first source file).

Illustratively, the threshold is 100 MB; that is, when the model file is larger than 100 MB, the computer device needs to generate the first source file from the model file.
Step 300b: If the data volume is greater than the threshold, execute the step of generating the first source file from the model file of the deep learning model.

If the data volume of the model file is greater than the threshold, the deep learning model generation method of the embodiments of the present application is used, and the step of generating the first source file from the model file of the deep learning model and the subsequent steps continue to be executed. If the data volume of the model file is less than the threshold, the deep learning model loading method in the related art can be selected.
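The selection logic of steps 300a and 300b can be sketched as follows. The 100 MB threshold comes from the illustrative example in the text; the function and enum names are assumptions for illustration:

```cpp
#include <cstdint>

// Hypothetical sketch of the model-size check in steps 300a/300b.
enum class CompileMode {
    kPrecompiledWeights,  // generate the first source file and compile the weights in
    kRuntimeLoad          // related-art path: load the model file at inference time
};

// Illustrative 100 MB threshold from the example in the text.
constexpr std::uint64_t kThresholdBytes = 100ull * 1024 * 1024;

CompileMode selectCompileMode(std::uint64_t modelFileBytes) {
    return modelFileBytes > kThresholdBytes ? CompileMode::kPrecompiledWeights
                                            : CompileMode::kRuntimeLoad;
}
```

A 200 MB model file would take the precompiled-weights path, while a 50 MB model file would keep the related-art runtime-loading path so its weight matrices remain easy to swap.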
Step 300c: Obtain the running version of the neural network structure adopted by the deep learning model.

In addition to judging by the data volume of the model file, the computer device may also select an appropriate deep learning model generation method according to the running version of the neural network structure adopted by the deep learning model.

The running version of the neural network structure indicates the hardware that executes the deep learning model, and includes at least one of a CPU running version, a GPU running version, and a DSP running version.
Step 300d: If the running version belongs to a preset running version, execute the step of generating the first source file from the model file of the deep learning model, where the preset running version includes at least one of a GPU running version and a DSP running version.

In one possible implementation, the running versions that require the deep learning model generation method of the embodiments of the present application are preset in the computer device, and the computer device determines whether the current running version belongs to the preset running versions; if so, the deep learning model generation method of the embodiments of the present application is selected.

When a deep learning model of the GPU running version or the DSP running version runs, the computer device not only needs to copy the data of the model file into memory but also needs to further copy the data from the CPU to the GPU or DSP, which seriously affects the efficiency of deep learning model inference. Therefore, the preset running versions set in the computer device include at least one of the GPU running version and the DSP running version.
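Steps 300c and 300d thus reduce to a membership test against the preset running versions; a sketch under the same illustrative naming assumptions as above:

```cpp
// Hypothetical sketch of the running-version check in steps 300c/300d.
enum class RunVersion { kCpu, kGpu, kDsp };

// GPU and DSP versions need an extra copy from the CPU to the accelerator
// at load time, so they are the preset versions that take the
// precompiled-weights path; the CPU version may keep the related-art path.
bool usePrecompiledWeights(RunVersion v) {
    return v == RunVersion::kGpu || v == RunVersion::kDsp;
}
```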
It should be noted that steps 300a to 300b and steps 300c to 300d above may be executed alternatively or simultaneously, which is not limited in the embodiments of the present application.

In the embodiments of the present application, before the deep learning model is compiled, an appropriate compilation method is selected according to the data volume of the model file or the running version of the neural network structure, which helps to improve the efficiency and flexibility of deep learning model inference.
FIG. 8 is a structural block diagram of a deep learning model generation apparatus according to an exemplary embodiment of the present application. The apparatus may be provided in the computer device described in the above embodiments. As shown in FIG. 8, the apparatus includes:

a first generation module 801, configured to generate a first source file according to a model file of a deep learning model, where the model file contains the weight matrices in the deep learning model;

a first acquisition module 802, configured to obtain a second source file corresponding to the deep learning model, where the second source file is a source file of the neural network structure adopted by the deep learning model; and

a second generation module 803, configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
Optionally, the first generation module 801 includes:

a running unit, configured to run a target script in a rule file in the process of compiling the source code corresponding to the rule file, where the rule file is used to describe to the compilation system how source files are to be compiled; and

a first generation unit, configured to generate the first source file through the target script according to the model file.
Optionally, the first generation unit is further configured to:

for each weight matrix in the model file, generate a static array corresponding to that weight matrix through the target script; and

generate the first source file according to the static arrays corresponding to the weight matrices.
Optionally, the first generation unit is further configured to:

set the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined by the matrix size and the array type of the static array is the same as the data type;

generate the array name of the static array through the target script according to the matrix name of the weight matrix; and

generate the array values of the static array through the target script according to the weight data contained in the weight matrix.
Optionally, the first generation unit is further configured to:

during compilation, point the target Tensor to a target static array in the first source file according to the memory pointer corresponding to the target Tensor in the second source file, where the target static array has the same name as the target Tensor.
Optionally, the apparatus further includes:

an inference module, configured to load the target file into memory when a deep learning model inference request is received, and execute the target file to perform deep learning model inference.
Optionally, the apparatus further includes:

a second acquisition module, configured to obtain the data volume of the model file; and

a third generation module, configured to execute the step of generating the first source file according to the model file of the deep learning model if the data volume is greater than a threshold.
Optionally, the apparatus further includes: a third acquisition module, configured to obtain the running version of the neural network structure adopted by the deep learning model; and

a fourth generation module, configured to execute the step of generating the first source file according to the model file of the deep learning model if the running version belongs to a preset running version, where the preset running version includes at least one of a GPU running version and a DSP running version.

Optionally, the storage directory of the first source file is the same as the storage directory of the second source file.
It should be noted that the deep learning model generation apparatus provided in the above embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the deep learning model generation apparatus provided in the above embodiment and the deep learning model generation method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
Please refer to FIG. 9, which shows a schematic structural diagram of a computer device according to an exemplary embodiment of the present application. Specifically, the computer device 900 includes a central processing unit (CPU) 901, a system memory 904 including a random access memory (Random Access Memory, RAM) 902 and a read-only memory (Read-Only Memory, ROM) 903, and a system bus 905 connecting the system memory 904 and the central processing unit 901. The computer device 900 further includes a basic input/output system (I/O system) 906 that helps transfer information between the various components in the computer, and a mass storage device 907 for storing an operating system 913, application programs 914, and other program modules 915.

The basic input/output system 906 includes a display 908 for displaying information and an input device 909, such as a mouse or keyboard, for the user to input information. Both the display 908 and the input device 909 are connected to the central processing unit 901 through an input/output controller 910 connected to the system bus 905. The basic input/output system 906 may further include the input/output controller 910 for receiving and processing input from a plurality of other devices such as a keyboard, a mouse, or an electronic stylus. Similarly, the input/output controller 910 also provides output to a display screen, a printer, or other types of output devices.

The mass storage device 907 is connected to the central processing unit 901 through a mass storage controller (not shown) connected to the system bus 905. The mass storage device 907 and its associated computer-readable medium provide non-volatile storage for the computer device 900. That is, the mass storage device 907 may include a computer-readable medium (not shown) such as a hard disk or a compact disc read-only memory (Compact Disc Read-Only Memory, CD-ROM) drive.
Without loss of generality, the computer-readable media may include computer storage media and communication media. Computer storage media include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include RAM, ROM, erasable programmable read-only memory (Erasable Programmable Read-Only Memory, EPROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), flash memory or other solid-state storage technologies, CD-ROM, digital versatile disc (Digital Versatile Disc, DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage, or other magnetic storage devices. Of course, those skilled in the art will know that the computer storage media are not limited to the above. The system memory 904 and the mass storage device 907 described above may be collectively referred to as memory.
The memory stores one or more programs configured to be executed by one or more central processing units 901; the one or more programs contain instructions for implementing the above deep learning model generation method, and the central processing unit 901 executes the one or more programs to implement the methods provided by the foregoing method embodiments.

According to various embodiments of the present application, the computer device 900 may also run by being connected through a network, such as the Internet, to a remote computer on the network. That is, the computer device 900 may be connected to a network 912 through a network interface unit 911 connected to the system bus 905, or the network interface unit 911 may be used to connect to other types of networks or remote computer systems (not shown).

The memory further includes one or more programs stored in the memory, and the one or more programs contain instructions for performing the steps executed by the computer device in the methods provided by the embodiments of the present application.
The embodiments of the present application further provide a computer-readable storage medium that stores at least one instruction, at least one program, a code set, or an instruction set, where the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the deep learning model generation method described in any of the above embodiments.

According to one aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. The processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the deep learning model generation method provided in the various optional implementations of the above aspects.
本领域普通技术人员可以理解上述实施例的各种方法中的全部或部分步骤是可以通过程序来指令相关的硬件来完成，该程序可以存储于一计算机可读存储介质中，该计算机可读存储介质可以是上述实施例中的存储器中所包含的计算机可读存储介质；也可以是单独存在，未装配入终端中的计算机可读存储介质。该计算机可读存储介质中存储有至少一条指令、至少一段程序、代码集或指令集，所述至少一条指令、所述至少一段程序、所述代码集或指令集由所述处理器加载并执行以实现上述任一方法实施例所述的深度学习模型生成方法。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing relevant hardware. The program can be stored in a computer-readable storage medium, which may be the computer-readable storage medium included in the memory in the foregoing embodiments, or a computer-readable storage medium that exists separately and is not assembled into the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the deep learning model generation method described in any of the foregoing method embodiments.
可选的,该计算机可读存储介质可以包括:只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)、固态硬盘(SSD,Solid State Drives)或光盘等。其中,随机存取记忆体可以包括电阻式随机存取记忆体(ReRAM,Resistance Random Access Memory)和动态随机存取存储器(DRAM,Dynamic Random Access Memory)。上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。Optionally, the computer-readable storage medium may include: read only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), solid state drive (SSD, Solid State Drives), or optical disk. Among them, random access memory may include resistive random access memory (ReRAM, Resistance Random Access Memory) and dynamic random access memory (DRAM, Dynamic Random Access Memory). The serial numbers of the foregoing embodiments of the present application are only for description, and do not represent the advantages and disadvantages of the embodiments.
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。A person of ordinary skill in the art can understand that all or part of the steps in the above embodiments can be implemented by hardware, or by a program to instruct relevant hardware. The program can be stored in a computer-readable storage medium. The storage medium mentioned can be a read-only memory, a magnetic disk or an optical disk, etc.
以上所述仅为本申请的较佳实施例，并不用以限制本申请，凡在本申请的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本申请的保护范围之内。The above descriptions are only preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included within the protection scope of this application.

Claims (20)

  1. 一种深度学习模型生成方法,其中,所述方法包括:A method for generating a deep learning model, wherein the method includes:
    根据深度学习模型的模型文件生成第一源文件,所述模型文件包含所述深度学习模型中的权值矩阵;Generating a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
    获取所述深度学习模型对应的第二源文件,所述第二源文件为所述深度学习模型所采用神经网络结构的源文件;Acquiring a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
    对所述第一源文件和所述第二源文件进行编译,生成所述深度学习模型对应的目标文件。Compiling the first source file and the second source file to generate a target file corresponding to the deep learning model.
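As a hedged illustration of the three claimed steps (all file names, function names, and weight values below are hypothetical, not taken from the patent), a build-time script might serialize each weight matrix in the model file into a C static array, emit the result as the first source file, and let the compiler combine it with the network-structure source:

```python
# Hypothetical sketch of claim 1: model file -> first source file -> compile
# together with the network-structure source into one target file.

def generate_first_source(weights):
    """Emit C source text whose static arrays hold the model weights.

    `weights` maps a matrix name to a flat list of float values.
    """
    lines = []
    for name, values in weights.items():
        body = ", ".join(f"{v}f" for v in values)
        lines.append(f"static const float {name}[{len(values)}] = {{{body}}};")
    return "\n".join(lines) + "\n"

# Example model file content (hypothetical weights).
model = {"conv1_weight": [0.5, -1.25], "fc_bias": [0.0]}
first_source = generate_first_source(model)
print(first_source)
# A build system would then compile this output alongside the
# network-structure source, e.g.:  cc -c weights.c network.c
```

Because the weights become ordinary compiled data, the resulting target file needs no separate model file at run time.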
  2. 根据权利要求1所述的方法,其中,所述根据深度学习模型的模型文件生成第一源文件,包括:The method according to claim 1, wherein said generating the first source file according to the model file of the deep learning model comprises:
    编译规则文件对应源代码的过程中，运行所述规则文件中的目标脚本，所述规则文件用于向编译系统描述编译源文件的方式；During compilation of the source code corresponding to the rule file, running the target script in the rule file, where the rule file is used to describe to the compilation system how source files are compiled;
    根据所述模型文件,通过所述目标脚本生成所述第一源文件。According to the model file, the first source file is generated through the target script.
  3. 根据权利要求2所述的方法,其中,所述根据所述模型文件,通过所述目标脚本生成所述第一源文件,包括:The method according to claim 2, wherein said generating said first source file through said target script according to said model file comprises:
    对于所述模型文件中的各个权值矩阵,通过所述目标脚本生成各个所述权值矩阵对应的静态数组;For each weight matrix in the model file, generate a static array corresponding to each weight matrix through the target script;
    根据各个所述权值矩阵对应的所述静态数组生成所述第一源文件。The first source file is generated according to the static array corresponding to each weight matrix.
  4. 根据权利要求3所述的方法,其中,所述通过所述目标脚本生成各个所述权值矩阵对应的静态数组,包括:The method according to claim 3, wherein said generating static arrays corresponding to each of said weight matrixes through said target script comprises:
    根据所述权值矩阵的矩阵尺寸和数据类型，通过所述目标脚本设置所述静态数组，所述静态数组的数组大小根据所述矩阵尺寸确定，且所述静态数组的数组类型与数据类型相同；Setting the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
    根据所述权值矩阵的矩阵名称,通过所述目标脚本生成所述静态数组的数组名称;Generating the array name of the static array through the target script according to the matrix name of the weight matrix;
    根据所述权值矩阵中包含的权值数据,通过所述目标脚本生成所述静态数组的数组值。According to the weight data contained in the weight matrix, the array value of the static array is generated through the target script.
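Claim 4 derives the static array's size from the matrix size, its element type from the matrix's data type, its name from the matrix name, and its values from the weight data. A minimal sketch of such a generator (the dtype-to-C-type mapping and the matrix metadata are assumed examples):

```python
# Hypothetical sketch of claim 4: derive the static array's size, element
# type, name, and values from one weight matrix's metadata.

C_TYPES = {"float32": "float", "int8": "signed char"}  # assumed mapping

def matrix_to_static_array(name, shape, dtype, data):
    size = 1
    for dim in shape:
        size *= dim                      # array size from the matrix size
    c_type = C_TYPES[dtype]              # array type from the data type
    values = ", ".join(str(v) for v in data)
    # array name from the matrix name; array values from the weight data
    return f"static const {c_type} {name}[{size}] = {{{values}}};"

decl = matrix_to_static_array("fc1_weight", (2, 2), "float32",
                              [1.0, 0.0, 0.0, 1.0])
print(decl)
```

One declaration like this per weight matrix, concatenated, would form the first source file of claim 3.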
  5. 根据权利要求3所述的方法，其中，所述对所述第一源文件和所述第二源文件进行编译，生成所述深度学习模型对应的目标文件，包括：The method according to claim 3, wherein said compiling said first source file and said second source file to generate a target file corresponding to said deep learning model comprises:
    编译过程中，根据所述第二源文件中目标张量Tensor对应的内存指针，将所述目标Tensor指向所述第一源文件中的目标静态数组，所述目标静态数组与所述目标Tensor具有相同的名称。During the compilation process, according to the memory pointer corresponding to the target tensor (Tensor) in the second source file, the target Tensor is pointed to the target static array in the first source file, where the target static array and the target Tensor have the same name.
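Claim 5's same-name rule means each tensor in the network-structure source can be bound to its weights without any extra lookup table. A hedged simulation of that binding step (the `Tensor` class and the names are illustrative stand-ins for the C-level structures):

```python
# Hypothetical sketch of claim 5: each Tensor's data pointer is redirected
# to the static array that carries the identical name.

class Tensor:
    def __init__(self, name):
        self.name = name
        self.data = None        # stands in for the C memory pointer

static_arrays = {               # arrays generated from the model file
    "conv1_weight": [0.5, -1.25],
}

def bind_tensors(tensors, arrays):
    for t in tensors:
        # same-name rule: the target static array and the target Tensor
        # share one name, so the lookup needs no separate mapping table
        t.data = arrays[t.name]

net = [Tensor("conv1_weight")]
bind_tensors(net, static_arrays)
print(net[0].data)
```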
  6. 根据权利要求1至5任一所述的方法，其中，所述对所述第一源文件和所述第二源文件进行编译，生成所述深度学习模型对应的目标文件之后，所述方法还包括：The method according to any one of claims 1 to 5, wherein after the compiling the first source file and the second source file to generate the target file corresponding to the deep learning model, the method further includes:
    当接收到深度学习模型推理请求时，将所述目标文件加载至内存，并执行所述目标文件进行深度学习模型推理。When a deep learning model inference request is received, loading the target file into the memory, and executing the target file to perform deep learning model inference.
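Claim 6 loads the compiled target file into memory on an inference request and executes it. As a rough analogy only (the patent does not specify the target file's format or entry point; the math library below merely stands in for some compiled artifact), dynamically loading a binary and calling into it might look like:

```python
# Hypothetical analogy for claim 6: load a compiled binary into memory and
# call a function in it, as the claimed step does with the model target file.
import ctypes
import ctypes.util

lib_path = ctypes.util.find_library("m")  # any compiled library stands in
libm = ctypes.CDLL(lib_path)              # "load the target file into memory"

libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]
result = libm.cos(0.0)                    # "execute the target file"
print(result)
```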
  7. 根据权利要求1至5任一所述的方法,其中,所述根据深度学习模型的模型文件生成第一源文件之前,所述方法还包括:The method according to any one of claims 1 to 5, wherein before the generating the first source file according to the model file of the deep learning model, the method further comprises:
    获取所述模型文件的数据量;Acquiring the data volume of the model file;
    若所述数据量大于阈值,则执行所述根据深度学习模型的模型文件生成第一源文件的步骤。If the amount of data is greater than the threshold, the step of generating the first source file according to the model file of the deep learning model is executed.
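Claim 7 embeds the weights as source only when the model file's data volume exceeds a threshold, so small models need not pay the extra compilation cost. A minimal sketch (the 1 MiB threshold is an assumed example, not a value from the patent):

```python
# Hypothetical sketch of claim 7: generate the first source file only when
# the model file's data volume exceeds a threshold.
import os
import tempfile

THRESHOLD_BYTES = 1 << 20          # assumed 1 MiB threshold

def should_embed(model_path, threshold=THRESHOLD_BYTES):
    return os.path.getsize(model_path) > threshold

# Demo with a small temporary "model file" well below the threshold.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 128)
    path = f.name
small = should_embed(path)
os.unlink(path)
print(small)
```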
  8. 根据权利要求1至5任一所述的方法,其中,所述根据深度学习模型的模型文件生成第一源文件之前,所述方法还包括:The method according to any one of claims 1 to 5, wherein before the generating the first source file according to the model file of the deep learning model, the method further comprises:
    获取所述深度学习模型所采用神经网络结构的运行版本;Obtaining a running version of the neural network structure adopted by the deep learning model;
    若所述运行版本属于预设运行版本，则执行所述根据深度学习模型的模型文件生成第一源文件的步骤，所述预设运行版本包括图形处理器GPU运行版本和数字信号处理器DSP运行版本中的至少一种。If the running version belongs to a preset running version, the step of generating the first source file according to the model file of the deep learning model is executed, where the preset running version includes at least one of a graphics processing unit (GPU) running version and a digital signal processor (DSP) running version.
  9. 根据权利要求1至5任一所述的方法,其中,所述第一源文件的存储目录与所述第二源文件的存储目录相同。The method according to any one of claims 1 to 5, wherein the storage directory of the first source file is the same as the storage directory of the second source file.
  10. 一种深度学习模型生成装置,所述装置包括:A device for generating a deep learning model, the device comprising:
    第一生成模块,用于根据深度学习模型的模型文件生成第一源文件,所述模型文件包含所述深度学习模型中的权值矩阵;The first generating module is configured to generate a first source file according to the model file of the deep learning model, the model file containing the weight matrix in the deep learning model;
    第一获取模块,用于获取所述深度学习模型对应的第二源文件,所述第二源文件为所述深度学习模型所采用神经网络结构的源文件;A first acquisition module, configured to acquire a second source file corresponding to the deep learning model, where the second source file is a source file of a neural network structure adopted by the deep learning model;
    第二生成模块,用于对所述第一源文件和所述第二源文件进行编译,生成所述深度学习模型对应的目标文件。The second generating module is configured to compile the first source file and the second source file to generate a target file corresponding to the deep learning model.
  11. 根据权利要求10所述的装置,其中,所述第一生成模块包括:The apparatus according to claim 10, wherein the first generating module comprises:
    运行单元,用于编译规则文件对应源代码的过程中,运行所述规则文件中的目标脚本,所述规则文件用于向编译系统描述编译源文件的方式;The running unit is used to run the target script in the rule file in the process of compiling the source code corresponding to the rule file, and the rule file is used to describe the way of compiling the source file to the compilation system;
    第一生成单元,用于根据所述模型文件,通过所述目标脚本生成所述第一源文件。The first generating unit is configured to generate the first source file through the target script according to the model file.
  12. 根据权利要求11所述的装置,其中,所述第一生成单元还用于:The device according to claim 11, wherein the first generating unit is further configured to:
    对于所述模型文件中的各个权值矩阵,通过所述目标脚本生成各个所述权值矩阵对应的静态数组;For each weight matrix in the model file, generate a static array corresponding to each weight matrix through the target script;
    根据各个所述权值矩阵对应的所述静态数组生成所述第一源文件。The first source file is generated according to the static array corresponding to each weight matrix.
  13. 根据权利要求12所述的装置,其中,所述第一生成单元还用于:The device according to claim 12, wherein the first generating unit is further configured to:
    根据所述权值矩阵的矩阵尺寸和数据类型，通过所述目标脚本设置所述静态数组，所述静态数组的数组大小根据所述矩阵尺寸确定，且所述静态数组的数组类型与数据类型相同；Setting the static array through the target script according to the matrix size and data type of the weight matrix, where the array size of the static array is determined according to the matrix size, and the array type of the static array is the same as the data type;
    根据所述权值矩阵的矩阵名称,通过所述目标脚本生成所述静态数组的数组名称;Generating the array name of the static array through the target script according to the matrix name of the weight matrix;
    根据所述权值矩阵中包含的权值数据,通过所述目标脚本生成所述静态数组的数组值。According to the weight data contained in the weight matrix, the array value of the static array is generated through the target script.
  14. 根据权利要求12所述的装置,其中,所述第一生成单元还用于:The device according to claim 12, wherein the first generating unit is further configured to:
    编译过程中，根据所述第二源文件中目标Tensor对应的内存指针，将所述目标Tensor指向所述第一源文件中的目标静态数组，所述目标静态数组与所述目标Tensor具有相同的名称。During the compilation process, according to the memory pointer corresponding to the target Tensor in the second source file, the target Tensor is pointed to the target static array in the first source file, and the target static array and the target Tensor have the same name.
  15. 根据权利要求10至14任一所述的装置,其中,所述装置还包括:The device according to any one of claims 10 to 14, wherein the device further comprises:
    推理模块，用于当接收到深度学习模型推理请求时，将所述目标文件加载至内存，并执行所述目标文件进行深度学习模型推理。The inference module is configured to, when a deep learning model inference request is received, load the target file into the memory and execute the target file to perform deep learning model inference.
  16. 根据权利要求10至14任一所述的装置,其中,所述装置还包括:The device according to any one of claims 10 to 14, wherein the device further comprises:
    第二获取模块,用于获取所述模型文件的数据量;The second acquiring module is used to acquire the data volume of the model file;
    第三生成模块,用于若所述数据量大于阈值,则执行所述根据深度学习模型的模型文件生成第一源文件的步骤。The third generation module is configured to perform the step of generating the first source file according to the model file of the deep learning model if the amount of data is greater than the threshold.
  17. 根据权利要求10至14任一所述的装置,其中,所述装置还包括:The device according to any one of claims 10 to 14, wherein the device further comprises:
    第三获取模块,用于获取所述深度学习模型所采用神经网络结构的运行版本;The third acquisition module is used to acquire the running version of the neural network structure adopted by the deep learning model;
    第四生成模块，用于若所述运行版本属于预设运行版本，则执行所述根据深度学习模型的模型文件生成第一源文件的步骤，所述预设运行版本包括GPU运行版本和DSP运行版本中的至少一种。The fourth generation module is configured to execute the step of generating the first source file according to the model file of the deep learning model if the running version belongs to a preset running version, where the preset running version includes at least one of a GPU running version and a DSP running version.
  18. 根据权利要求10至14任一所述的装置,其中,所述第一源文件的存储目录与所述第二源文件的存储目录相同。The apparatus according to any one of claims 10 to 14, wherein the storage directory of the first source file is the same as the storage directory of the second source file.
  19. 一种计算机设备，所述计算机设备包括处理器和存储器；所述存储器存储有至少一条指令，所述至少一条指令用于被所述处理器执行以实现如权利要求1至9任一所述的深度学习模型生成方法。A computer device, the computer device including a processor and a memory; the memory stores at least one instruction, and the at least one instruction is configured to be executed by the processor to implement the deep learning model generation method according to any one of claims 1 to 9.
  20. 一种计算机可读存储介质,所述存储介质存储有至少一条指令,所述至少一条指令用于被处理器执行以实现如权利要求1至9任一所述的深度学习模型生成方法。A computer-readable storage medium storing at least one instruction, and the at least one instruction is used to be executed by a processor to implement the deep learning model generation method according to any one of claims 1 to 9.
PCT/CN2020/117196 2019-09-23 2020-09-23 Deep learning model generation method and apparatus, device, and storage medium WO2021057807A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910897445.7 2019-09-23
CN201910897445.7A CN110598855B (en) 2019-09-23 2019-09-23 Deep learning model generation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2021057807A1 true WO2021057807A1 (en) 2021-04-01

Family

ID=68862253

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117196 WO2021057807A1 (en) 2019-09-23 2020-09-23 Deep learning model generation method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110598855B (en)
WO (1) WO2021057807A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080240A (en) * 2022-06-29 2022-09-20 美的集团(上海)有限公司 Deployment method of voice processing model, electronic equipment and storage medium
CN116257286A (en) * 2023-03-13 2023-06-13 北京百度网讯科技有限公司 File processing method and device, electronic equipment and storage medium
WO2024061287A1 (en) * 2022-09-23 2024-03-28 维沃移动通信有限公司 Artificial intelligence (ai) model transmission method and apparatus, and terminal and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598855B (en) * 2019-09-23 2023-06-09 Oppo广东移动通信有限公司 Deep learning model generation method, device, equipment and storage medium
CN113269323B (en) * 2020-02-17 2024-03-12 北京达佳互联信息技术有限公司 Data processing method, processing device, electronic equipment and storage medium
CN111338693B (en) * 2020-02-22 2023-07-14 深圳市魔数智擎人工智能有限公司 Model construction-based target file generation method, server and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633295A (en) * 2017-09-25 2018-01-26 北京地平线信息技术有限公司 Method and apparatus for adapting the parameters of a neural network
CN107958285A (en) * 2017-11-21 2018-04-24 深圳普思英察科技有限公司 Mapping method and device for the neural network of an embedded system
US20190057036A1 (en) * 2018-10-15 2019-02-21 Amrita MATHURIYA Programmable interface to in-memory cache processor
CN109754073A (en) * 2018-12-29 2019-05-14 北京中科寒武纪科技有限公司 Data processing method, device, electronic equipment and readable storage medium
CN110598855A (en) * 2019-09-23 2019-12-20 Oppo广东移动通信有限公司 Deep learning model generation method, device, equipment and storage medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157045B2 (en) * 2016-11-17 2018-12-18 The Mathworks, Inc. Systems and methods for automatically generating code for deep learning systems
US10956500B2 (en) * 2017-01-19 2021-03-23 Google Llc Dynamic-length stateful tensor array
CN106951926B (en) * 2017-03-29 2020-11-24 山东英特力数据技术有限公司 Deep learning method and device of hybrid architecture
WO2019086104A1 (en) * 2017-10-30 2019-05-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Neural network representation
CN109496294A (en) * 2018-01-15 2019-03-19 深圳鲲云信息科技有限公司 The Compilation Method and system of artificial intelligence process device, storage medium and terminal
CN109033309B (en) * 2018-07-17 2023-04-07 腾讯科技(深圳)有限公司 Data processing method, device, equipment and storage medium
CN110033086B (en) * 2019-04-15 2022-03-22 广州异构智能科技有限公司 Hardware accelerator for neural network convolution operations

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080240A (en) * 2022-06-29 2022-09-20 美的集团(上海)有限公司 Deployment method of voice processing model, electronic equipment and storage medium
CN115080240B (en) * 2022-06-29 2023-10-10 美的集团(上海)有限公司 Voice processing model deployment method, electronic equipment and storage medium
WO2024061287A1 (en) * 2022-09-23 2024-03-28 维沃移动通信有限公司 Artificial intelligence (ai) model transmission method and apparatus, and terminal and medium
CN116257286A (en) * 2023-03-13 2023-06-13 北京百度网讯科技有限公司 File processing method and device, electronic equipment and storage medium
CN116257286B (en) * 2023-03-13 2023-09-15 北京百度网讯科技有限公司 File processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110598855B (en) 2023-06-09
CN110598855A (en) 2019-12-20

Similar Documents

Publication Publication Date Title
WO2021057807A1 (en) Deep learning model generation method and apparatus, device, and storage medium
US11150877B2 (en) Automatically generating machine learning models for software tools that operate on source code
US9383978B2 (en) Apparatus and method for on-demand optimization of applications
US20210158131A1 (en) Hierarchical partitioning of operators
US11816545B2 (en) Optimizing machine learning models
Zhou et al. {PetS}: A unified framework for {Parameter-Efficient} transformers serving
CN111194437A (en) Data processing offload using in-memory code execution
WO2012092211A2 (en) Emulating pointers
CN114237714A (en) Command packet generation method and device, electronic equipment and storage medium
US20180329729A1 (en) Software-defined microservices
CN112990461B (en) Method, device, computer equipment and storage medium for constructing neural network model
US20230062336A1 (en) String localization for universal use
US20210182041A1 (en) Method and apparatus for enabling autonomous acceleration of dataflow ai applications
CN112269606B (en) Application processing program dynamic loading method of brain-like computer operating system
CN112527264B (en) Constant data access optimization method based on heterogeneous platform
KR20200108789A (en) Method and computer program of processing program for single accelerator using dnn framework on plural accelerators
JP7329662B1 (en) System, method, program and information processing device for efficient control and use of computational resources
Temple Lang Enhancing R with advanced compilation tools and methods
US11876681B2 (en) Topology recommendation platform for application architecture
US11775655B2 (en) Risk assessment of a container build
US20220067502A1 (en) Creating deep learning models from kubernetes api objects
US11379353B1 (en) Platform for test environments
JP2019036278A (en) System and method of emulating execution of files
US20230130627A1 (en) Method for collaboration using cell-based computational notebooks
US10402306B2 (en) Parallel tracing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867804

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867804

Country of ref document: EP

Kind code of ref document: A1