CN111178517B - Model deployment method, system, chip, electronic equipment and medium - Google Patents

Info

Publication number
CN111178517B
CN111178517B (application number CN202010064768.0A)
Authority
CN
China
Prior art keywords
model
deployment
data
training
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010064768.0A
Other languages
Chinese (zh)
Other versions
CN111178517A (en)
Inventor
黄文豪
黄杰
杨忠程
Current Assignee
Shanghai Yitu Technology Co ltd
Original Assignee
Shanghai Yitu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Yitu Technology Co ltd filed Critical Shanghai Yitu Technology Co ltd
Priority to CN202010064768.0A
Publication of CN111178517A
Application granted
Publication of CN111178517B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/06 — Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/08 — Learning methods
    • G06N 3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 20/00 — Machine learning

Abstract

The model deployment method, system, chip, electronic equipment and medium comprise the following steps: obtaining a training result of the model configuration to be converted; splitting the training result; looking up the split results in a database and converting them into code; and reconnecting the converted model code to obtain a deployment model. According to the application, after the designated task type and the input and output forms of the model are obtained, training data and verification data are input; during training, the model parameters are automatically trained and adjusted, the hyperparameters and structure of the model are tuned by an optimization algorithm, and the optimal model is finally selected by iteration.

Description

Model deployment method, system, chip, electronic equipment and medium
Technical Field
The application relates to a computer data processing system and method, and in particular to a model deployment method, system, chip, electronic equipment and medium.
Background
Existing Artificial Intelligence (AI) research and development systems are, in essence, software libraries, which require the user to have some programming background; obtaining a good computation model further requires the user to tune and select parameters, and therefore relevant deep-learning optimization knowledge. Meanwhile, existing AI development systems generally support only a single user and a single task; with multiple users, the various resources can only be allocated manually, and the utilization of computing resources is poor.
In an Artificial Intelligence (AI) development system, an existing training tool generally supports training only with a preset model after data is input, and such a tool often performs poorly on a new task. Optimizing the model further requires model-optimization knowledge and is realized by manual design and programming, so the range of possible users is narrow. Other existing automatic model-optimization algorithms generally design a network model automatically through methods such as RNNs; this process is slow, parallelizes poorly, and needs a large amount of data, so it is not applicable when the data volume is moderate and computing resources are limited.
In addition, the application system is deployed and run on a cloud platform to provide services externally. At present, in the model deployment step, conversion code is written manually for the deployment environment and target format; whether the conversion succeeds or fails cannot be verified automatically, and test scripts must be written manually for testing, which is tedious and takes a lot of time.
Disclosure of Invention
Aiming at the above defects of artificial intelligence research and development systems, the application aims to provide a model deployment method, system, electronic equipment and medium suitable for a model optimization deployment system, so as to solve the existing problems.
In order to achieve the above object, the present application provides a model deployment method of a model optimization deployment system, comprising:
s1, obtaining a training result of a network model to be converted;
s2, splitting a training result;
s3, searching the split result in a database, and converting the split result into codes;
s4, reconnecting the converted model codes to obtain a deployment model.
In some embodiments, the method further includes S5: for the network model to be converted and the deployment model in the hardware environment to be deployed, extracting feature layers from the test data and computing a given vector distance between each corresponding pair of feature layers in turn; if each distance is smaller than a preset threshold, the results are considered aligned, and the deployable model is output as the conversion result.
In some embodiments, in S2, the splitting is performed according to a network structure of training results.
In some embodiments, the network structure is a computational layer or sub-structure.
In some embodiments, in S2, if a preset structure in the database is identified and found during splitting, the preset structure is saved; and when the method is deployed, selecting a conversion method according to the stored preset structure.
In some embodiments, in S3, the lookup is performed in a look-up table to obtain the code.
In some embodiments, in S3, the database is a database of a hardware environment-computing framework.
The application also provides a model deployment system, comprising:
the data preprocessing module receives the reflow data and then outputs a preprocessed data set;
the model training module is used for executing a model training method on the data set to obtain an optimal configuration model;
the model deployment module is used for adapting and converting the optimal configuration model according to the hardware environment to be deployed, and finally deploying the optimal configuration model in a cross-platform mode.
In some embodiments, the system further comprises a data labeling module for labeling the auxiliary data.
In some embodiments, the system further comprises a storage module for performing unified storage of the whole system data.
In addition, the application also provides a model training method based on the model deployment system, which comprises the following steps:
training a group of models on a preset basic neural network, and selecting a model T0 with the best performance on a verification set;
step two, optimizing on T0 and then obtaining a plurality of alternative experimental configurations, training the plurality of alternative experimental configurations and obtaining a model T1' with optimal performance;
step three, retraining T1', and ensuring that the average of the retraining performance results is larger than the performance of T0, thereby obtaining the optimal model configuration.
In some embodiments, in the second step, the optimization method for T0 is an adjustment parameter.
In some embodiments, the adjustment parameters include one or a combination of the following: model width, learning rate, optimization strategy, whether data enhancement is used, parameters of data enhancement, and network unit modules.
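A minimal sketch of how alternative experimental configurations might be generated from a base configuration by varying the adjustment parameters listed above; the parameter names and perturbation rules are assumptions for illustration only:

```python
# Hypothetical candidate generation: vary one adjustable parameter at a time
# (model width, learning rate, data-enhancement switch) around a base config.

BASE = {"width": 64, "lr": 0.1, "augment": False}

def candidates(base):
    out = []
    for width in (base["width"] // 2, base["width"] * 2):   # model width
        out.append({**base, "width": width})
    for lr in (base["lr"] * 0.5, base["lr"] * 2):           # learning rate
        out.append({**base, "lr": lr})
    out.append({**base, "augment": not base["augment"]})    # data enhancement
    return out

print(len(candidates(BASE)))  # 5 candidate configurations
```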
In some embodiments, in the retraining of T1' in step three, if the performance result of the retraining is greater than the performance result of T0, then T1' replaces T0 as the new base for alternative experimental configurations, i.e., iterative optimization.
In some embodiments, the iterative optimization process loops repeatedly until performance is optimal.
In some embodiments, the number of iterative optimizations is two.
The application also provides a chip comprising a processor for calling and running a computer program from a memory, so that a device on which the chip is installed performs the model deployment method according to any one of the above.
The application also provides an electronic device comprising a processor and a memory for storing executable instructions of the processor, the processor executing any of the model deployment methods described above when running.
The application also provides a computer readable medium having stored thereon computer program instructions which, when processed and executed, implement any of the above described model deployment methods.
Compared with the prior art, after the designated task type and the input and output forms of the model are obtained, training data and verification data are input; during training, the model parameters can be automatically trained and adjusted while the hyperparameters and structure of the model are tuned by an optimization algorithm, and the optimal model is finally selected by iteration.
The application provides a system for optimizing and deploying models for personnel who have data but no programming or deep-learning-algorithm background: it can automatically preprocess data, automatically train and optimize an AI model, evaluate and verify it, and automatically deploy it; through a containerized computing-resource module and a storage resource module, it supports multi-user, multi-task model research and development.
The application also provides a method based on the model optimization deployment system, which can automatically convert the trained model into Caffe, TF pb, or other common model formats, automatically verify whether the model can be deployed on the corresponding hardware system on which deployment is expected, and automatically perform some optimizations.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of the present application;
FIG. 2 is a schematic diagram of a framework design of a model-optimized deployment system.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application; it will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 2, the embodiment is based on a model deployment system whose architecture design is shown in fig. 2; the system mainly comprises a training flow part and a resource scheduling part. The training flow part comprises a data labeling module for assisting data labeling, a data preprocessing module for preprocessing, cleaning and segmenting the labeled data, and a model training module for automatically training and tuning parameters on the processed data.
The resource scheduling part, after repeated data iteration and model optimization, automatically adapts and converts the obtained model through the model deployment module according to the hardware environment on which deployment is expected. The whole training process is thus carried out efficiently and simply.
Wherein: the data preprocessing module receives the reflow data and outputs a preprocessed data set;
the model training module is used for automatically training and optimizing parameters of the processed data set to obtain an optimal configuration model;
the model deployment module is used for automatically adapting and converting the optimal configuration model according to the environment of the hardware to be deployed and finally deploying the optimal configuration model on the hardware.
The system also comprises a data labeling module for labeling the auxiliary data. In order to support multi-user multitasking, the unified scheduling of the computing resources is preferably performed through the containerized sharing module, and the unified storage of the data is preferably performed by the storage module, so that the computing resources are efficiently utilized.
The model optimization deployment system realizes automatic optimization of algorithm configuration and can select a better technical scheme according to the planned deployment environment; meanwhile, structures unusable in the target scenario can be excluded and listed in a blacklist.
Based on the system, in the model training module, the model training is as follows:
training a group of models on a preset basic neural network, and selecting a model T0 with the best performance on a verification set;
step two, optimizing on T0 and then obtaining a plurality of alternative experimental configurations, training the plurality of alternative experimental configurations and obtaining a model T1' with optimal performance;
preferably, the optimization method is to sequentially try to adjust various preset parameters. And the adjustment parameters include one or a combination of the following: model width, learning rate, optimization strategy, whether data enhancement is used, parameters of data enhancement, network unit module selection.
And step three, retraining T1' to ensure that the average of its performance results is larger than that of T0; the optimal model configuration is the final optimal result. In particular, in the retraining of T1', if the performance result is larger than that of T0, T1' replaces T0 as the new base configuration, and the iterative optimization of this process continues; the iteration stops when the performance of the model configuration can no longer be stably improved.
Preferably, the number of iterative optimizations is generally two.
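The train-select-retrain loop of steps one to three might look as follows; `train()` is a hypothetical stand-in for real training that returns a validation score, and `perturb()` stands in for candidate-configuration generation — neither is the application's actual code:

```python
# Minimal sketch of the iterative optimization loop: pick the best
# candidate T1', keep it only if it beats the current best T0, and repeat.

def train(config):
    # Stand-in for real training: the score peaks at width 128
    return 1.0 - abs(config["width"] - 128) / 128

def perturb(config):
    # Stand-in candidate generation: halve or double the model width
    return [{**config, "width": config["width"] // 2},
            {**config, "width": config["width"] * 2}]

def optimize(base, max_iters=2):
    best = base                              # step one: T0
    for _ in range(max_iters):               # typically two iterations
        trials = perturb(best)               # step two: alternative configs
        t1 = max(trials, key=train)          # best-performing candidate T1'
        if train(t1) > train(best):          # step three: compare with T0
            best = t1                        # T1' replaces T0
        else:
            break                            # no stable improvement: stop
    return best

best = optimize({"width": 32})
print(best["width"])  # 128
```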
In the optimization, the directions attempted are limited according to the preset hardware environment to be deployed; for example, the running speed is constrained to be no lower than certain lower bounds, and network modules are preferred that are supported by, or better optimized for, the specific hardware platform.
Referring to fig. 1, after performing iterative optimization on a network model for a plurality of times to obtain a final optimized result, a model deployment method is performed, including:
s1, obtaining a training result of the network model to be converted, namely stopping iterative optimization in the step three to obtain a result.
S2, splitting a training result; splitting is typically done at the compute layer. If the preset structure in the existing library is identified and found during splitting, the structure is saved.
S3, looking up the split results in a database and converting them into code. The database is a database of hardware environment - computing framework pairs, covering the types of servers on which deployment is possible and the types of computing frameworks. The server hardware includes, but is not limited to, Graphics Processing Units (GPUs), Central Processing Units (CPUs), ARM processors, AI chips, mobile phones, etc.; the computing framework includes, but is not limited to, TF, Caffe, Torch, and the like. As many types as possible should be included to meet different requirements.
Caffe is a deep learning framework notable for expressiveness, speed, and modularity; it supports multiple types of deep learning architectures, is oriented toward image classification and image segmentation, and supports CNN, RCNN, LSTM and fully connected neural network designs.
TF (TensorFlow) is a symbolic mathematical system based on dataflow programming, widely used to implement various machine learning algorithms; the TF framework has a multi-level structure, can be deployed on various servers, PC terminals and web pages, and supports high-performance numerical computation on GPUs and TPUs.
Torch is a scientific computing framework that supports a large number of machine learning algorithms; it is notably flexible and uses the Lua programming language.
The split computation layers or substructures are then looked up (via a table lookup) in the database and converted according to preset code. For substructures in the database, the optimal conversion method is selected according to the planned conversion format or the platform on which deployment is expected.
S4, reconnecting the converted model codes to obtain a deployment model.
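One plausible shape for the hardware environment - computing framework database used in S3 is a table keyed by (hardware, framework) pairs; the entries below are illustrative assumptions, not the actual database contents:

```python
# Hypothetical conversion database keyed by deployment hardware and
# computing framework; each entry maps a layer type to target-side code.

DB = {
    ("gpu", "tf"):    {"conv": "tf.nn.conv2d", "relu": "tf.nn.relu"},
    ("cpu", "caffe"): {"conv": "Convolution",  "relu": "ReLU"},
    ("arm", "caffe"): {"conv": "Convolution",  "relu": "ReLU"},
}

def lookup(hardware, framework, layer_type):
    table = DB.get((hardware, framework))
    if table is None:
        # The target pair is not covered by the database
        raise KeyError(f"no conversion table for {hardware}/{framework}")
    return table[layer_type]

print(lookup("gpu", "tf", "conv"))  # tf.nn.conv2d
```

This is why the description stresses including as many server and framework types as possible: any uncovered pair makes the lookup, and hence the conversion, fail.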
Because of the large differences between the experimental environment and the actual deployment environment — possibly a different hardware platform or a different computing framework — the last step of the overall model training process requires converting the trained network and its corresponding weights to the model format of the target environment for final deployment. This step traditionally requires conversion code to be written specifically for the network architecture and the source/target model formats; as part of an end-to-end AI system, it should be automated.
Finally, preset test data (pictures, videos and the like) are run through both the original model in the experimental environment and the deployment model in the environment to be deployed; part of the feature layers are extracted from each, and a given vector distance is computed between each corresponding pair in turn. If each distance is smaller than a certain acceptable threshold, the results are considered aligned, the conversion is successful, and the model is output as the final conversion result.
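The alignment check just described can be sketched as a pairwise vector-distance comparison; the distance metric (Euclidean) and the threshold value are assumptions, since the application does not fix either:

```python
# Sketch of the conversion-verification step: compare corresponding feature
# layers of the original and converted models one pair at a time.

import numpy as np

def aligned(feats_orig, feats_deploy, threshold=1e-3):
    for a, b in zip(feats_orig, feats_deploy):
        dist = np.linalg.norm(np.asarray(a) - np.asarray(b))
        if dist >= threshold:
            return False          # conversion failed: outputs diverge
    return True                   # all pairs within threshold: aligned

orig = [[0.10, 0.20], [0.30, 0.40]]
dep  = [[0.10, 0.20], [0.30, 0.40005]]
print(aligned(orig, dep))  # True
```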
The method can automatically convert the trained model into caffe or TF pb or other common model formats, automatically verify whether the model can be deployed on a corresponding hardware platform which is expected to be deployed, and automatically perform some optimization.
Referring to fig. 2, the present application further provides a model optimized deployment system, comprising: and the data preprocessing module receives the reflow data and then outputs a preprocessed data set.
The model training module is used for executing a training method on the data set to obtain an optimal configuration model; specifically, a group of models are trained on a preset basic neural network, and a model T0 with the best performance on a verification set is selected; then optimizing on T0 and then obtaining a plurality of alternative experimental configurations, training the plurality of alternative experimental configurations and obtaining a model T1' with optimal performance, and adjusting parameters of an optimization method of T0, wherein the parameters comprise one or a combination of the following steps: model width, learning rate, optimizing strategy, whether to use data enhancement, parameters of data enhancement and a network unit module;
Finally, T1' is retrained, ensuring that the average of the retraining performance results is larger than the performance of T0, and the optimal model configuration is obtained. In retraining T1', if the retrained performance result is greater than that of T0, T1' replaces T0 as the new base configuration, i.e., iterative optimization, which is repeated in a loop until performance is optimal — typically twice.
The model deployment module adapts and converts the optimal configuration model according to the hardware environment to be deployed and finally deploys it on the hardware. Specifically, a training result of the network model to be converted is obtained; the training result is split according to its network structure; the split results are then looked up in a hardware environment - computing framework database by table lookup and converted into code; and finally the converted model code is reconnected to obtain a deployment model. Feature layers are extracted from the test data for the network model to be converted and for the deployment model in the hardware environment to be deployed, and a given vector distance is computed between each corresponding pair in turn; if each distance is smaller than a preset threshold, the results are considered aligned, and the deployable model is output as the conversion result.
In addition, the application also provides electronic equipment, which comprises: at least one processor; a memory coupled to the at least one processor, the memory storing executable instructions that when executed by the at least one processor cause the method of the present application described above to be implemented.
For example, the memory may include random access memory, flash memory, read-only memory, programmable read-only memory, non-volatile memory, registers, or the like. The processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or the like. The memory may store executable instructions, and the processor may execute the executable instructions stored in the memory to implement the various processes described herein.
It will be appreciated that the memory in this embodiment may be either volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable programmable read-only memory), EEPROM (electrically erasable EPROM), or flash memory. The volatile memory may be RAM (random access memory), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as SRAM (static RAM), DRAM (dynamic RAM), SDRAM (synchronous DRAM), DDR SDRAM (double data rate synchronous DRAM), ESDRAM (enhanced SDRAM), SLDRAM (SyncLink DRAM), and DRRAM (Direct Rambus RAM). The memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, the memory stores the following elements, an upgrade package, an executable unit, or a data structure, or a subset thereof, or an extended set thereof: an operating system and application programs.
The operating system includes various system programs, such as a framework layer, a core library layer, a driving layer, and the like, and is used for realizing various basic services and processing hardware-based tasks. And the application programs comprise various application programs and are used for realizing various application services. The program for implementing the method of the embodiment of the application can be contained in an application program.
In the embodiment of the present application, the processor is configured to execute the above method steps by calling a program or an instruction stored in the memory, specifically, a program or an instruction stored in an application program.
The embodiment of the application also provides a chip for executing the above method. Specifically, the chip includes a processor for calling and running the computer program from the memory, so that the device on which the chip is mounted performs the above method.
The present application also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor implements the steps of the above-described method of the present application.
For example, machine-readable storage media may include, but are not limited to, various known and unknown types of non-volatile memory.
Embodiments of the present application also provide a computer program product comprising computer program instructions for causing a computer to perform the above method.
Those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In embodiments of the present application, the disclosed systems, electronic devices, and methods may be implemented in other ways. For example, the division of the units is only one logic function division, and other division manners are also possible in actual implementation. For example, multiple units or components may be combined or may be integrated into another system. In addition, the coupling between the individual units may be direct coupling or indirect coupling. In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or may exist alone physically, or the like.
It should be understood that, in various embodiments of the present application, the size of the sequence number of each process does not mean that the execution sequence of each process should be determined by its functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The functions, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored on a machine-readable storage medium. Accordingly, aspects of the present application may be embodied in a software product, which may be stored on a machine-readable storage medium and may include instructions for causing an electronic device to perform all or part of the processes of the aspects described in the embodiments of the present application. The storage medium may include a ROM, a RAM, a removable disk, a hard disk, a magnetic disk, an optical disk, or various other media in which program code can be stored.
The above is merely an embodiment of the present application, and the scope of the present application is not limited thereto. Those skilled in the art can make changes or substitutions within the technical scope of the present disclosure, and such changes or substitutions should be included in the scope of the present disclosure.

Claims (8)

1. A method of model deployment, comprising:
s1, obtaining a training result of a network model to be converted;
s2, splitting the training result according to a network structure of the training result, wherein the network structure is a calculation layer;
s3, searching the split result in a hardware environment-computing framework database in a table look-up mode, and converting the code;
s4, reconnecting the converted model codes to obtain a deployment model;
s5, extracting feature layers from the network model to be converted and the deployment model in the hardware environment to be deployed according to the test data, and correspondingly calculating given vector distances between every two pairs in sequence; if the given vector distance is smaller than the preset threshold value, the results are considered to be aligned, and the deployable model is output as a conversion result.
2. The model deployment method of claim 1, wherein: s2, if a preset structure in the database is identified and found during splitting, the preset structure is stored; and when the method is deployed, selecting a conversion method according to the stored preset structure.
3. A model deployment system, comprising:
the data preprocessing module receives the reflow data and then outputs a preprocessed data set;
the model training module is used for executing training optimization on the data set to obtain an optimal configuration model;
the model deployment module executes the model deployment method of claim 1 or 2 according to the environment of the hardware to be deployed, adapts and converts the optimal configuration model, and finally deploys the optimal configuration model on the hardware.
4. A model deployment system as in claim 3, wherein: the system also comprises a data labeling module for labeling the auxiliary data.
5. The model deployment system of claim 4, wherein: the system also comprises a storage module for executing unified storage of the whole system data.
6. A chip comprising a processor for calling and running a computer program from a memory, so that a device on which the chip is mounted performs the model deployment method of claim 1 or 2.
7. An electronic device, characterized by: comprising a processor and a memory for storing executable instructions of the processor, the processor performing the model deployment method of claim 1 or 2 when run.
8. A computer-readable medium, characterized by: on which computer program instructions are stored which, when processed and executed, implement the model deployment method of claim 1 or 2.
CN202010064768.0A 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium Active CN111178517B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010064768.0A CN111178517B (en) 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010064768.0A CN111178517B (en) 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium

Publications (2)

Publication Number Publication Date
CN111178517A CN111178517A (en) 2020-05-19
CN111178517B true CN111178517B (en) 2023-12-05

Family

ID=70658036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010064768.0A Active CN111178517B (en) 2020-01-20 2020-01-20 Model deployment method, system, chip, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN111178517B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015519A (en) * 2020-08-28 2020-12-01 江苏银承网络科技股份有限公司 Model online deployment method and device
CN112015470B (en) * 2020-09-09 2022-02-01 平安科技(深圳)有限公司 Model deployment method, device, equipment and storage medium
CN112463301B (en) * 2020-11-30 2022-02-11 常州微亿智造科技有限公司 Container-based model training test tuning and deployment method and device
CN113313239A (en) * 2021-06-25 2021-08-27 展讯通信(上海)有限公司 Artificial intelligence model design optimization method and device
CN114168232A (en) * 2021-12-20 2022-03-11 南京星云数字技术有限公司 Algorithm model result conversion configuration method, device, equipment and medium
CN115187821B (en) * 2022-07-05 2024-03-22 阿波罗智能技术(北京)有限公司 Method, related device and program product for verifying correctness of model before and after conversion
CN116542344A (en) * 2023-07-05 2023-08-04 浙江大华技术股份有限公司 Model automatic deployment method, platform and system

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1601510A (en) * 2003-03-06 2005-03-30 微软公司 Architecture for distributed computing system and automated design, deployment, and management of distributed applications
CN104468716A (en) * 2014-11-04 2015-03-25 中科同德(厦门)物联网科技有限公司 Method and system for continuously collecting indoor and outdoor environment information of residential area
CN206517618U (en) * 2017-02-28 2017-09-22 通号(郑州)轨道交通科技有限公司 A kind of track group network system of two kinds of wireless signals of compatible WLAN and LTE M
CN108197633A (en) * 2017-11-24 2018-06-22 百年金海科技有限公司 Deep learning image classification based on TensorFlow is with applying dispositions method
CN108764487A (en) * 2018-05-29 2018-11-06 北京百度网讯科技有限公司 For generating the method and apparatus of model, the method and apparatus of information for identification
CN108898222A (en) * 2018-06-26 2018-11-27 郑州云海信息技术有限公司 A kind of method and apparatus automatically adjusting network model hyper parameter
CN109409528A (en) * 2018-09-10 2019-03-01 平安科技(深圳)有限公司 Model generating method, device, computer equipment and storage medium
CN109558674A (en) * 2018-11-28 2019-04-02 北京超萌国际文化有限公司 Method for Sales Forecast and its model training method, device
CN109558952A (en) * 2018-11-27 2019-04-02 北京旷视科技有限公司 Data processing method, system, equipment and storage medium
CN109685160A (en) * 2019-01-18 2019-04-26 创新奇智(合肥)科技有限公司 A kind of on-time model trained and dispositions method and system automatically
CN109740560A (en) * 2019-01-11 2019-05-10 济南浪潮高新科技投资发展有限公司 Human cellular protein automatic identifying method and system based on convolutional neural networks
CN110033091A (en) * 2018-12-13 2019-07-19 阿里巴巴集团控股有限公司 A kind of method and apparatus predicted based on model
CN110073301A (en) * 2017-08-02 2019-07-30 强力物联网投资组合2016有限公司 The detection method and system under data collection environment in industrial Internet of Things with large data sets
CN110378463A (en) * 2019-07-15 2019-10-25 北京智能工场科技有限公司 A kind of artificial intelligence model standardized training platform and automated system
CN110418354A (en) * 2019-08-06 2019-11-05 北京邮电大学 It is a kind of that propagation model wireless network planning method is exempted from based on machine learning
CN110443321A (en) * 2019-08-14 2019-11-12 北京百度网讯科技有限公司 Model structure method of adjustment and device
CN110543950A (en) * 2019-09-27 2019-12-06 宁波和利时智能科技有限公司 Industrial big data modeling platform
CN110659740A (en) * 2018-06-28 2020-01-07 国际商业机器公司 Ordering and updating machine learning models based on data input at edge nodes

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8244696B2 (en) * 2008-04-21 2012-08-14 Oracle International Corporation Methods and systems for supporting multiple deployment models of enterprise business applications
US10482389B2 (en) * 2014-12-04 2019-11-19 Sap Se Parallel development and deployment for machine learning models


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Andres Milioto等.Bonnet: An Open-Source Training and Deployment Framework for Semantic Segmentation in Robotics using CNNs.《2019 International Conference on Robotics and Automation (ICRA)》.2019,7094-7100. *
Guo Jiahao et al. Research and Implementation of a Web-based Collaborative Computing Method for Object Detection Neural Networks. Proceedings of the 2019 China Information and Communication Conference (CICC 2019). 2019, 372-378. *

Also Published As

Publication number Publication date
CN111178517A (en) 2020-05-19

Similar Documents

Publication Publication Date Title
CN111178517B (en) Model deployment method, system, chip, electronic equipment and medium
CN111191789B (en) Model optimization deployment system, chip, electronic equipment and medium
US8225074B2 (en) Methods and systems for managing computations on a hybrid computing platform including a parallel accelerator
US20200050443A1 (en) Optimization and update system for deep learning models
CN107958285A (en) Neural network mapping method and device for embedded systems
CN113128478B (en) Model training method, pedestrian analysis method, device, equipment and storage medium
US20220188645A1 (en) Using generative adversarial networks to construct realistic counterfactual explanations for machine learning models
CN113610232A (en) Network model quantization method and device, computer equipment and storage medium
CN114116236B (en) Construction method and system of heterogeneous computing system
CN109684088B (en) Remote sensing big data rapid processing task scheduling method based on cloud platform resource constraint
CN112099848B (en) Service processing method, device and equipment
US20210295158A1 (en) End-to-end optimization
KR20200052417A (en) Apparatus and method for selecting inference module of target device
CN111831355A (en) Weight precision configuration method, device, equipment and storage medium
CN111831359A (en) Weight precision configuration method, device, equipment and storage medium
CN114429208A (en) Model compression method, device, equipment and medium based on residual structure pruning
CN110889316B (en) Target object identification method and device and storage medium
Tiwari et al. NCS based ultra low power optimized machine learning techniques for image classification
US20210383258A1 (en) Systems and Methods for Dynamically Configuring Multidimensional Data Processing Pipelines in Software-controlled Hardware-Software Embedded Systems
US20220067453A1 (en) Adaptive and hierarchical convolutional neural networks using partial reconfiguration on fpga
CN114092313A (en) Model reasoning acceleration method and system based on GPU (graphics processing Unit) equipment
Tong et al. Study on mindspore deep learning framework
Chichin et al. Capability to embed deep neural networks: Study on cpu processor in avionics context
CN113297855A (en) Embedded remote sensing image text generation method for satellite in-orbit application
CN112633516A (en) Performance prediction and machine learning compilation optimization method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant