CN111178517A - Model deployment method, system, chip, electronic device and medium - Google Patents


Info

Publication number
CN111178517A
Authority
CN
China
Prior art keywords
model
deployment
data
training
deployment method
Prior art date
Legal status
Granted
Application number
CN202010064768.0A
Other languages
Chinese (zh)
Other versions
CN111178517B (en)
Inventor
黄文豪
黄杰
杨忠程
Current Assignee
Shanghai Yitu Network Science and Technology Co Ltd
Original Assignee
Shanghai Yitu Network Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Yitu Network Science and Technology Co Ltd
Priority to CN202010064768.0A
Publication of CN111178517A
Application granted
Publication of CN111178517B
Active legal status: Current
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Neurology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Stored Programmes (AREA)

Abstract

A model deployment method, system, chip, electronic device and medium. The method comprises the following steps: obtaining a training result of the model configuration to be converted; splitting the training result; looking up the split results in a database and converting the code; and reconnecting the converted model code to obtain the deployment model. After the specified task type and the input/output form of the model are obtained, training data and verification data are input; during training, the model parameters are trained and adjusted automatically while an optimization algorithm adjusts the hyper-parameters and structure of the model, and the optimal model is finally selected through iteration.

Description

Model deployment method, system, chip, electronic device and medium
Technical Field
The invention relates to computer data processing systems and methods, and in particular to a model deployment method, system, chip, electronic device and medium.
Background
The essence of an existing artificial intelligence (AI) research and development system is to provide a library of software, so the user needs some programming background to use it; to obtain a good computational model, the user must adjust and select parameters on their own, which requires knowledge of deep learning optimization. Moreover, an existing AI research and development system generally supports only a single task for a single user; with multiple users, the various resources can only be allocated manually, so the utilization of computing resources is poor.
In an artificial intelligence (AI) research and development system, existing training tools generally only support training a preset model after the data are input, and the results are often poor on a new task. If the model needs to be optimized, knowledge of model optimization is required and the optimization must be implemented through manual design and programming, so the range of potential users is narrow. Other existing automatic model optimization algorithms generally design a network model automatically with methods such as RNNs (recurrent neural networks); this process is slow, parallelizes poorly and requires a large amount of data, so it is unsuitable when the data volume is moderate and computing resources are limited.
In addition, an application system is deployed and run on a cloud platform to provide services externally. In the current model deployment stage, conversion code must be written manually according to the deployment environment and the target format, and there is no way to verify whether the conversion failed; a test script must be written by hand for testing, which is very tedious and consumes a lot of time.
Disclosure of Invention
The invention aims to overcome the above defects of existing artificial intelligence research and development systems, and provides a model deployment method, system, electronic device and medium suitable for a model optimization and deployment system, so as to solve the existing problems.
In order to achieve the above object, the present invention provides a model deployment method for a model optimization and deployment system, comprising:
s1, obtaining the training result of the network model to be converted;
s2, splitting the training result;
s3, searching the split result in the database, and converting the code;
and S4, reconnecting the converted model codes to obtain the deployment model.
In some embodiments, the method further includes S5: using test data, extracting feature layers from the network model to be converted and from the deployment model in the hardware environment to be deployed, and calculating a given vector distance between each corresponding pair of feature layers in turn; if the differences are smaller than a preset threshold, the results are considered aligned, and a deployable model is output as the conversion result.
In some embodiments, in S2, the splitting is performed according to a network structure of the training results.
In some embodiments, the network structure is a computing layer or sub-structure.
In some embodiments, in S2, if a preset structure in the database is found during splitting, the preset structure is saved; and when the system is deployed, selecting a conversion method according to the stored preset structure.
In some embodiments, in S3, the lookup is performed by table lookup to obtain the code.
In some embodiments, in S3, the database is a database of a hardware environment-computing framework.
The present invention also provides a model deployment system, comprising:
the data preprocessing module receives the backflow data and then outputs a preprocessed data set;
the model training module is used for executing a model training method on the data set to obtain an optimal configuration model;
and the model deployment module is used for adapting and converting the optimal configuration model according to the hardware environment to be deployed, and finally deploying in a cross-platform mode.
In some embodiments, the system further comprises a data annotation module for annotating the auxiliary data.
In some embodiments, the system further comprises a storage module for performing unified storage of the data of the whole system.
In addition, based on the model deployment system, the invention also provides a model training method, which comprises the following steps:
step one, training a group of models on a preset basic neural network, and selecting the model T0 with the best performance on the verification set;
step two, optimizing on the basis of the test results of T0 to obtain a plurality of candidate experimental configurations, training the candidate experimental configurations, and obtaining the model T1' with the best performance;
step three, retraining T1' and checking that the average of the retraining performance results is greater than the performance of T0, thereby obtaining the optimal model configuration.
In some embodiments, in step two, the optimization method for T0 is to adjust parameters.
In some embodiments, the adjustment parameters include one or a combination of: model width, learning rate, optimization strategy, whether data enhancement is used, parameters of data enhancement and network unit module.
In some embodiments, in the retraining of T1' in step three, if the retrained performance result is greater than that of T0, T1' replaces T0 as the new candidate experimental configuration, i.e., iterative optimization.
In some embodiments, the process of iterative optimization loops repeatedly until performance is optimized.
In some embodiments, the number of iterative optimizations is two rounds.
The invention also provides a chip, which comprises a processor and is used for calling and running the computer program from the memory so that the equipment provided with the chip executes any one of the model deployment methods.
The invention also provides an electronic device, which comprises a processor and a memory for storing executable instructions of the processor, wherein the processor executes any one of the model deployment methods when running.
The present invention also provides a computer readable medium having stored thereon computer program instructions which, when executed by a processor, implement any of the model deployment methods described above.
Compared with the prior art, the invention has the advantage that, after the specified task type and the input/output form of the model are obtained, training data and verification data are input; during training, the model parameters are trained and adjusted automatically while an optimization algorithm adjusts the hyper-parameters and structure of the model, and the optimal model is finally selected through iteration.
The invention provides a system with which personnel without a programming background or a deep learning algorithm background can optimize and deploy a model on their own data. The system can automatically perform data preprocessing, automatic training and tuning of the AI model, evaluation and verification of the AI model, and automatic deployment of the AI model, and supports multi-user, multi-task model research and development through a computing resource containerization module and a storage resource module.
Based on the model optimization and deployment system, the invention provides a method that can automatically convert the trained model into Caffe, TensorFlow pb or other common model formats, automatically verify whether the model can be deployed on the corresponding hardware system on which deployment is expected, and automatically perform some optimizations.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive labor.
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the architecture design of the model optimized deployment system.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. It is to be understood that the described embodiments are only some, and not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art without inventive step based on these embodiments fall within the scope of the present invention.
Referring to fig. 2, the present embodiment is based on a model deployment system whose architecture design is shown in fig. 2; the system mainly includes a training flow part and a resource scheduling part. The training flow part comprises a data annotation module for annotating the auxiliary data, a data preprocessing module for preprocessing, cleaning and splitting the annotated data, and a model training module for automatically training and tuning parameters on the processed data.
After repeated data iteration and model tuning, the scheduling part uses the model deployment module to automatically adapt and convert the obtained model according to the hardware environment on which deployment is expected, so that the whole training process is carried out efficiently and simply.
Wherein: the data preprocessing module is used for receiving the backflow data and outputting a preprocessed data set;
the model training module is used for carrying out automatic training and parameter tuning on the processed data set to obtain an optimal configuration model;
and the model deployment module is used for automatically adapting and converting the optimal configuration model according to the environment of the hardware to be deployed and finally deploying the optimal configuration model in the hardware.
The system also comprises a data labeling module for labeling the auxiliary data. In order to support multi-user multitasking, it is preferable that the unified scheduling of the computing resources is performed by the containerized shared module, and the storage module performs unified storage of data, so as to efficiently utilize the computing resources.
The model optimization and deployment system realizes automatic optimization of the algorithm configuration and can select a better technical scheme according to the planned deployment environment; meanwhile, structures that cannot be used in the target scenario can be avoided by placing them on a blacklist, as sketched below.
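A purely illustrative reading of the blacklist idea, assuming made-up scenario names and structure names: candidate network structures are filtered against a per-scenario set of structures known not to work on the target.

```python
# Hypothetical blacklist: target scenario -> network structures known to be unusable there.
# The scenario names and structure names below are illustrative assumptions only.
BLACKLIST = {
    "arm_mobile": {"deformable_conv", "custom_cuda_op"},
    "edge_ai_chip": {"dynamic_shape_rnn"},
}

def usable_structures(candidates, target_scene):
    """Drop candidate network structures that are blacklisted for the target scenario."""
    banned = BLACKLIST.get(target_scene, set())
    return [s for s in candidates if s not in banned]

# Example: usable_structures(["conv2d", "deformable_conv"], "arm_mobile") -> ["conv2d"]
```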
Based on the system, in the model training module, the model training is as follows:
step one, training a group of models on a preset basic neural network, and selecting the model T0 with the best performance on the verification set;
step two, optimizing on the basis of the test results of T0 to obtain a plurality of candidate experimental configurations, training the candidate experimental configurations, and obtaining the model T1' with the best performance;
preferably, the optimization method is to try to adjust various preset parameters in sequence. And the adjustment parameters include one or a combination of the following: model width, learning rate, optimization strategy, whether data enhancement is used, parameters of data enhancement, and network unit module selection.
And step three, retraining T1' and checking that the average of its performance results is greater than the performance of T0; the optimal model configuration is the final optimization result. In particular, during the retraining of T1', if its performance result is greater than that of T0, T1' replaces T0 as the new experimental configuration and the iterative optimization continues; the iteration stops when the performance of the model configuration can no longer be stably improved.
Preferably, the number of iterative optimizations is generally two.
During optimization, the attempted optimization directions are limited according to the preset hardware environment to be deployed, for example by requiring that the running speed not fall below certain lower limits, or by selecting, for the network modules, only those supported by, or better optimized for, the specific hardware platform. A minimal sketch of such a constrained tuning loop follows.
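The constrained, iterative tuning of steps one to three can be pictured with the following Python sketch; the callables train_model, evaluate, perturb_config and hardware_constraint, and the default round counts, are hypothetical placeholders rather than the patented implementation.

```python
from statistics import mean

def automatic_tuning(base_configs, train_model, evaluate, perturb_config,
                     hardware_constraint, max_rounds=2, retrain_runs=3):
    """Sketch of the iterative tuning loop: pick the best baseline T0, derive
    candidate configurations, train them, and adopt a candidate T1' only when
    its retrained average performance beats T0."""
    # Step one: train a group of models and keep the best one on the validation set (T0).
    scored = [(cfg, evaluate(train_model(cfg))) for cfg in base_configs]
    t0, t0_score = max(scored, key=lambda pair: pair[1])
    for _ in range(max_rounds):
        # Step two: derive candidates by adjusting width, learning rate, augmentation,
        # network unit modules, etc., filtered by the deployment-hardware constraint.
        candidates = [c for c in perturb_config(t0) if hardware_constraint(c)]
        scored = [(cfg, evaluate(train_model(cfg))) for cfg in candidates]
        t1, _ = max(scored, key=lambda pair: pair[1])
        # Step three: retrain T1' several times; adopt it only if the mean beats T0.
        retrained_mean = mean(evaluate(train_model(t1)) for _ in range(retrain_runs))
        if retrained_mean > t0_score:
            t0, t0_score = t1, retrained_mean  # T1' becomes the new baseline (iteration)
        else:
            break  # performance no longer improves stably; stop iterating
    return t0
```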
Referring to fig. 1, after performing multiple iterative optimizations on a network model to obtain a final optimization result, a model deployment method is performed, including:
s1, obtaining the training result of the network model to be converted, i.e. the result obtained after stopping the iterative optimization in the third step.
S2, splitting the training result; the splitting is generally performed by computing layer. If a preset structure from the database is identified during splitting, that structure is saved.
S3, looking up the split results in the database and converting the code. The database covers hardware environment-computing framework combinations, including the types of servers that can be deployed as well as the types of computing frameworks. Servers include, but are not limited to, a graphics processing unit (GPU), a central processing unit (CPU), an ARM processor, an AI chip, a mobile phone and the like; computing frameworks include, but are not limited to, TensorFlow (TF), Caffe and Torch. As many types as possible should be covered to meet different requirements. One possible shape of such a database is sketched below, before the individual frameworks are described.
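One way to picture this hardware environment-computing framework database is a lookup table keyed by target hardware, target framework and layer type whose values are conversion routines; the schema and the registered converter below are illustrative assumptions, not the patent's actual data layout.

```python
from typing import Callable, Dict, Tuple

# Hypothetical shape of the hardware environment / computing framework database.
Layer = Dict[str, object]           # e.g. {"name": "conv1", "type": "conv2d", "weights": ...}
ConvertFn = Callable[[Layer], str]  # returns target-framework code for one layer

# Lookup table: (target hardware, target framework, layer type) -> conversion routine.
CONVERSION_DB: Dict[Tuple[str, str, str], ConvertFn] = {}

def register(hardware: str, framework: str, layer_type: str):
    """Decorator used to populate the table with per-layer converters."""
    def wrapper(fn: ConvertFn) -> ConvertFn:
        CONVERSION_DB[(hardware, framework, layer_type)] = fn
        return fn
    return wrapper

@register("gpu", "caffe", "conv2d")
def conv2d_to_caffe(layer: Layer) -> str:
    # Illustrative only: emit a Caffe-style layer definition for a convolution.
    return f'layer {{ name: "{layer["name"]}" type: "Convolution" }}'
```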
Caffe is a deep learning framework designed with expressiveness, speed and modularity in mind; it supports various deep learning architectures, is oriented toward image classification and image segmentation, and supports CNN, R-CNN, LSTM and fully connected neural network designs.
TF, i.e. TensorFlow, is a symbolic mathematics system based on dataflow programming and is widely used to implement various machine learning algorithms; the framework has a multi-level architecture, can be deployed on various servers, PC terminals and web pages, and supports high-performance numerical computation on GPUs and TPUs.
Torch is a scientific computing framework with wide support for machine learning algorithms; it is particularly flexible and uses the Lua programming language.
Then, each split computing layer or sub-structure is looked up in the database (table lookup) and converted according to preset code. For the sub-structures in the database, the optimal conversion method is selected based on the planned target format or the platform on which deployment is desired.
And S4, reconnecting the converted model codes to obtain the deployment model.
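Tying S1-S4 together, a single deployment pass might look like the following sketch; it assumes the keying convention of the lookup-table sketch above and a simple in-memory representation of the split training result, both of which are assumptions for illustration.

```python
def deploy_model(training_result, conversion_db, hardware: str, framework: str) -> str:
    """S1-S4 sketch: split the trained network, convert each piece by table lookup,
    then reconnect the converted pieces into one deployment model."""
    # S2: split the training result by its network structure (here: computing layers).
    layers = training_result["layers"]  # hypothetical in-memory representation
    converted = []
    for layer in layers:
        # S3: look up each split result in the hardware/framework database
        # and convert its code with the registered routine.
        convert = conversion_db[(hardware, framework, layer["type"])]
        converted.append(convert(layer))
    # S4: reconnect the converted model code to obtain the deployment model.
    return "\n".join(converted)

# Usage sketch (reusing the hypothetical CONVERSION_DB from the previous example):
# deploy_model({"layers": [{"name": "conv1", "type": "conv2d"}]}, CONVERSION_DB, "gpu", "caffe")
```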
Because the experimental environment and the actual deployment environment differ, different hardware platforms and different computing frameworks are possible. Therefore, in the last stage of the overall model training process, the trained network and its corresponding weights need to be converted into the model format of the target environment for final deployment. Traditionally, this step requires conversion code written specifically for the network architecture and the source/target model formats; as part of an end-to-end AI system, it should be automated.
Finally, preset test data (images, videos and the like) are run through both the original model in the experimental environment and the deployment model in the environment to be deployed; some feature layers are extracted from each, and a given vector distance is calculated between each corresponding pair in turn. If the differences are smaller than an acceptable threshold, the results are considered aligned, the conversion is successful, and the model is output as the final conversion result.
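In code, one hedged reading of this alignment check is: run the same test data through both models, flatten the selected feature layers into vectors, and compare corresponding pairs with a chosen vector distance against a preset threshold. The Euclidean distance and the default threshold below are illustrative choices, not values prescribed by the patent.

```python
import math

def vector_distance(a, b):
    """Euclidean distance between two flattened feature vectors (one possible choice)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def aligned(original_features, deployed_features, threshold=1e-3):
    """Compare corresponding feature layers of the original model and the deployment
    model, computed on the same test data; alignment means every pairwise distance
    stays below the preset threshold."""
    return all(
        vector_distance(o, d) < threshold
        for o, d in zip(original_features, deployed_features)
    )
```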
The method can automatically convert the trained model into Caffe, TensorFlow pb or other common model formats, automatically verify whether the model can be deployed on the corresponding hardware platform on which deployment is expected, and automatically perform some optimizations.
Referring to fig. 2, the present invention further provides a model optimization and deployment system, including: a data preprocessing module for receiving the backflow data and then outputting a preprocessed data set.
The model training module executes a training method on the data set to obtain an optimal configuration model. Specifically, a group of models is trained on a preset basic neural network, and the model T0 with the best performance on the verification set is selected; optimization is then performed on T0 to obtain a plurality of candidate experimental configurations, which are trained to obtain the model T1' with the best performance. The optimization of T0 adjusts parameters comprising one or a combination of the following: model width, learning rate, optimization strategy, whether data enhancement is used, parameters of data enhancement, and the network unit module.
Finally, T1' is retrained, and the average of the retrained performance results should be greater than the performance of T0; this yields the optimal model configuration. In the retraining of T1', if the retrained performance result is greater than that of T0, T1' replaces T0 as the new candidate experimental configuration, i.e. iterative optimization. The loop repeats until performance no longer improves, typically for two rounds.
The model deployment module adapts and converts the optimal configuration model according to the hardware environment to be deployed and finally deploys it on the hardware. Specifically, the training result of the network model to be converted is obtained; the training result is split according to its network structure; the split results are looked up, by table lookup, in the hardware environment-computing framework database and the code is converted; finally, the converted model code is reconnected to obtain the deployment model. Using test data, feature layers are extracted from the network model to be converted and from the deployment model in the hardware environment to be deployed, and a given vector distance is calculated between each corresponding pair in turn; if the differences are smaller than a preset threshold, the results are considered aligned, and a deployable model is output as the conversion result.
In addition, the present invention also provides an electronic device including: at least one processor; a memory coupled to the at least one processor, the memory storing executable instructions, wherein the executable instructions, when executed by the at least one processor, cause the method of the present invention as described above to be implemented.
For example, the memory may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, registers, and the like. The processor may be a central processing unit (CPU), a graphics processing unit (GPU), or the like. The memory may store executable instructions, and the processor may execute the executable instructions stored in the memory to implement the various processes described herein.
It will be appreciated that the memory in this embodiment can be either volatile memory or non-volatile memory, or can include both. The non-volatile memory may be a ROM (read-only memory), a PROM (programmable read-only memory), an EPROM (erasable programmable read-only memory), an EEPROM (electrically erasable programmable read-only memory), or a flash memory. The volatile memory may be a RAM (random access memory), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as SRAM (static random access memory), DRAM (dynamic random access memory), SDRAM (synchronous DRAM), DDR SDRAM (double data rate synchronous DRAM), ESDRAM (enhanced synchronous DRAM), SLDRAM (SyncLink DRAM), and DRRAM (Direct Rambus RAM). The memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory stores the following elements, upgrade packages, executable units or data structures, or a subset or extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs comprise various application programs and are used for realizing various application services. The program for implementing the method of the embodiment of the present invention may be included in the application program.
In the embodiment of the present invention, the processor is configured to execute the above method steps by calling a program or an instruction stored in the memory, specifically, a program or an instruction stored in the application program.
The embodiment of the invention also provides a chip for executing the above method. Specifically, the chip includes a processor for calling and running the computer program from the memory, so that a device equipped with the chip executes the method.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the above-described method of the present invention.
For example, the machine-readable storage medium may include, but is not limited to, various known and unknown types of non-volatile memory.
Embodiments of the present invention also provide a computer program product, which includes computer program instructions, and the computer program instructions enable a computer to execute the above method.
Those of skill in the art would understand that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments of the present application, the disclosed system, electronic device, and method may be implemented in other ways. For example, the division of the unit is only one logic function division, and there may be another division manner in actual implementation. For example, multiple units or components may be combined or may be integrated into another system. In addition, the coupling between the respective units may be direct coupling or indirect coupling. In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or may exist separately and physically.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a machine-readable storage medium. Therefore, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a machine-readable storage medium and may include several instructions to cause an electronic device to perform all or part of the processes of the technical solution described in the embodiments of the present application. The storage medium may include various media that can store program codes, such as ROM, RAM, a removable disk, a hard disk, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, and the scope of the present application is not limited thereto. Those skilled in the art can make changes or substitutions within the technical scope disclosed in the present application, and such changes or substitutions should be within the protective scope of the present application.

Claims (10)

1. A method of model deployment, comprising:
s1, obtaining the training result of the network model to be converted;
s2, splitting the training result;
s3, searching the split result in the database, and converting the code;
s4, reconnecting the converted model codes to obtain a deployment model;
s5, extracting feature layers from the network model to be converted and the deployment model in the hardware environment to be deployed according to the test data, and sequentially and correspondingly calculating the given vector distance between every two models; and if the difference is smaller than a preset threshold value, the results are considered to be aligned, and a deployable model is output as a conversion result.
2. The model deployment method of claim 1, wherein: in S2, the splitting is performed according to the network structure of the training result.
3. The model deployment method of claim 2, wherein: the network structure is a computing layer or sub-structure.
4. The model deployment method of any of claims 1-3, characterized in that: in S2, if a preset structure in the database is found during splitting, the preset structure is saved; and when the system is deployed, selecting a conversion method according to the stored preset structure.
5. The model deployment method of claim 1, wherein: in S3, the lookup is performed as a table lookup to obtain the code.
6. A model deployment system, comprising:
the data preprocessing module receives the backflow data and then outputs a preprocessed data set;
the model training module is used for performing training optimization on the data set to obtain an optimal configuration model;
the model deployment module executes the model deployment method of any one of claims 1 to 6 according to the environment of the hardware to be deployed, adapts and converts the optimal configuration model, and finally deploys the optimal configuration model on the hardware.
7. The model deployment system of claim 6, wherein: the system also comprises a data labeling module for labeling the auxiliary data.
8. A chip, comprising a processor for calling and running a computer program from a memory so that a device on which the chip is installed performs the model deployment method of any one of claims 1-5.
9. An electronic device, characterized in that: comprising a processor and a memory for storing executable instructions of the processor, the processor performing the model deployment method of any of claims 1-5 when running.
10. A computer-readable medium, characterized in that: computer program instructions are stored thereon which, when executed by a processor, implement the model deployment method of any one of claims 1-5.
CN202010064768.0A 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium Active CN111178517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010064768.0A CN111178517B (en) 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111178517A true CN111178517A (en) 2020-05-19
CN111178517B CN111178517B (en) 2023-12-05

Family

ID=70658036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010064768.0A Active CN111178517B (en) 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111178517B (en)



Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1601510A (en) * 2003-03-06 2005-03-30 微软公司 Architecture for distributed computing system and automated design, deployment, and management of distributed applications
US20090265683A1 (en) * 2008-04-21 2009-10-22 Oracle International Corporation Methods and systems for supporting multiple deployment models of enterprise business applications
CN104468716A (en) * 2014-11-04 2015-03-25 中科同德(厦门)物联网科技有限公司 Method and system for continuously collecting indoor and outdoor environment information of residential area
US20160162800A1 (en) * 2014-12-04 2016-06-09 Bin Qin Parallel Development and Deployment for Machine Learning Models
CN206517618U (en) * 2017-02-28 2017-09-22 通号(郑州)轨道交通科技有限公司 A kind of track group network system of two kinds of wireless signals of compatible WLAN and LTE M
CN110073301A (en) * 2017-08-02 2019-07-30 强力物联网投资组合2016有限公司 The detection method and system under data collection environment in industrial Internet of Things with large data sets
CN108197633A (en) * 2017-11-24 2018-06-22 百年金海科技有限公司 Deep learning image classification based on TensorFlow is with applying dispositions method
CN108764487A (en) * 2018-05-29 2018-11-06 北京百度网讯科技有限公司 For generating the method and apparatus of model, the method and apparatus of information for identification
CN108898222A (en) * 2018-06-26 2018-11-27 郑州云海信息技术有限公司 A kind of method and apparatus automatically adjusting network model hyper parameter
CN110659740A (en) * 2018-06-28 2020-01-07 国际商业机器公司 Ordering and updating machine learning models based on data input at edge nodes
CN109409528A (en) * 2018-09-10 2019-03-01 平安科技(深圳)有限公司 Model generating method, device, computer equipment and storage medium
CN109558952A (en) * 2018-11-27 2019-04-02 北京旷视科技有限公司 Data processing method, system, equipment and storage medium
CN109558674A (en) * 2018-11-28 2019-04-02 北京超萌国际文化有限公司 Method for Sales Forecast and its model training method, device
CN110033091A (en) * 2018-12-13 2019-07-19 阿里巴巴集团控股有限公司 A kind of method and apparatus predicted based on model
CN109740560A (en) * 2019-01-11 2019-05-10 济南浪潮高新科技投资发展有限公司 Human cellular protein automatic identifying method and system based on convolutional neural networks
CN109685160A (en) * 2019-01-18 2019-04-26 创新奇智(合肥)科技有限公司 A kind of on-time model trained and dispositions method and system automatically
CN110378463A (en) * 2019-07-15 2019-10-25 北京智能工场科技有限公司 A kind of artificial intelligence model standardized training platform and automated system
CN110418354A (en) * 2019-08-06 2019-11-05 北京邮电大学 It is a kind of that propagation model wireless network planning method is exempted from based on machine learning
CN110443321A (en) * 2019-08-14 2019-11-12 北京百度网讯科技有限公司 Model structure method of adjustment and device
CN110543950A (en) * 2019-09-27 2019-12-06 宁波和利时智能科技有限公司 Industrial big data modeling platform

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDRES MILIOTO et al.: "Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs", pages 7094-7100 *
郭佳豪 et al.: "Research and Implementation of a Web-based Collaborative Computing Method for Object Detection Neural Networks" (基于Web的目标检测神经网络协同计算方法的研究与实现), pages 372-378 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780568A (en) * 2020-06-09 2021-12-10 子长科技(北京)有限公司 Automatic model training framework, device and storage medium
CN113780568B (en) * 2020-06-09 2024-05-14 子长科技(北京)有限公司 Automatic model training system, apparatus, and storage medium
CN112015519A (en) * 2020-08-28 2020-12-01 江苏银承网络科技股份有限公司 Model online deployment method and device
WO2021151334A1 (en) * 2020-09-09 2021-08-05 平安科技(深圳)有限公司 Model deployment method and apparatus, and device and storage medium
CN112463301A (en) * 2020-11-30 2021-03-09 常州微亿智造科技有限公司 Container-based model training test tuning and deployment method and device
CN113313239A (en) * 2021-06-25 2021-08-27 展讯通信(上海)有限公司 Artificial intelligence model design optimization method and device
CN114168232A (en) * 2021-12-20 2022-03-11 南京星云数字技术有限公司 Algorithm model result conversion configuration method, device, equipment and medium
CN115187821A (en) * 2022-07-05 2022-10-14 阿波罗智能技术(北京)有限公司 Method for verifying correctness before and after model conversion, related device and program product
CN115187821B (en) * 2022-07-05 2024-03-22 阿波罗智能技术(北京)有限公司 Method, related device and program product for verifying correctness of model before and after conversion
CN116542344A (en) * 2023-07-05 2023-08-04 浙江大华技术股份有限公司 Model automatic deployment method, platform and system

Also Published As

Publication number Publication date
CN111178517B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
CN111178517A (en) Model deployment method, system, chip, electronic device and medium
CN111191789A (en) Model training method, system, chip, electronic device and medium
Catanzaro et al. A map reduce framework for programming graphics processors
US8225074B2 (en) Methods and systems for managing computations on a hybrid computing platform including a parallel accelerator
US20200349464A1 (en) Multi-module and multi-task machine learning system based on an ensemble of datasets
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN110991556B (en) Efficient image classification method, device, equipment and medium based on multi-student cooperative distillation
US20220188645A1 (en) Using generative adversarial networks to construct realistic counterfactual explanations for machine learning models
JP6521440B2 (en) Neural network and computer program therefor
US20210374477A1 (en) Method for training image classification model and apparatus for executing the same
US20200342265A1 (en) Adaptive sampling for imbalance mitigation and dataset size reduction in machine learning
CN113610232A (en) Network model quantization method and device, computer equipment and storage medium
CN112381079A (en) Image processing method and information processing apparatus
CN113434699A (en) Pre-training method of BERT model, computer device and storage medium
US20220198277A1 (en) Post-hoc explanation of machine learning models using generative adversarial networks
WO2020195940A1 (en) Model reduction device of neural network
Seman et al. Improving energy aware nanosatellite task scheduling by a branch-cut-and-price algorithm
CN110889316B (en) Target object identification method and device and storage medium
CN117132890A (en) Remote sensing image target detection method and system based on Kubernetes edge computing cluster
WO2023160290A1 (en) Neural network inference acceleration method, target detection method, device, and storage medium
CN113656669B (en) Label updating method and device
Tiwari et al. NCS based ultra low power optimized machine learning techniques for image classification
CN111736463B (en) Adaptive deep learning control method based on operation platform
KR20210115832A (en) Method and apparatus for learning predictive model and apparatus for predicting time series data
Li et al. Accelerating gpu computing at runtime with binary optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant