CN110991643B - Model deployment method and device, electronic equipment and storage medium - Google Patents

Model deployment method and device, electronic equipment and storage medium

Info

Publication number
CN110991643B
CN110991643B (application CN201911359655.7A)
Authority
CN
China
Prior art keywords
target
model
model parameters
dimension
target model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911359655.7A
Other languages
Chinese (zh)
Other versions
CN110991643A (en)
Inventor
陈可 (Chen Ke)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201911359655.7A priority Critical patent/CN110991643B/en
Publication of CN110991643A publication Critical patent/CN110991643A/en
Application granted granted Critical
Publication of CN110991643B publication Critical patent/CN110991643B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a model deployment method and device, electronic equipment, and a storage medium. The method comprises: optimizing and compiling an obtained initial model to obtain target model parameters; determining a target index according to the target model parameters; reducing the target model parameters according to a preset standard and a preset dimension when the target index does not meet the preset standard; and deploying the initial model according to the reduced target model parameters to obtain a target model. Because the model parameters obtained by optimizing and compiling the model are reduced according to the preset standard and the preset dimension before the model is deployed, and the model is then deployed according to the reduced model parameters, the data volume of the model is reduced, the restriction that limited storage resources place on model deployment is overcome, and the efficiency of model deployment is improved.

Description

Model deployment method and device, electronic equipment and storage medium
Technical Field
The invention belongs to the field of information technology, and in particular relates to a model deployment method and device, electronic equipment, and a storage medium.
Background
With the continuing development of deep learning technology, deep learning models have been widely applied in many aspects of production and daily life.
At present, one method for optimizing and accelerating a deep learning model is to apply a deep learning inference optimization framework: the deep learning inference model is first optimized and compiled by a network optimization compiler to generate an intermediate expression file of the model, and the intermediate expression file is then imported into the model accelerator of the deep learning inference optimization framework for deployment. However, for some dimension-sensitive deep learning models, if only a single fixed intermediate expression file is used during optimization, the error becomes too large, so multiple intermediate expression files of different dimensions usually need to be prepared.
However, in this approach the data volume of the intermediate expression files multiplies, raising the storage resource requirements of model deployment. Because storage resources are limited in some scenarios or devices, such as edge devices, insufficient storage resources prevent normal deployment of the model; too many intermediate expression files also mean a longer model import time, which reduces model deployment efficiency.
Disclosure of Invention
In view of the above, the present invention provides a model deployment method and apparatus, an electronic device, and a storage medium, so as to solve the problem in the prior art that an excessive data volume of intermediate expression files restricts normal deployment of a model and reduces the efficiency of model deployment.
According to a first aspect of the present invention, there is provided a model deployment method, comprising:
optimizing and compiling the obtained initial model to obtain target model parameters;
determining a target index according to the target model parameters;
under the condition that the target index does not accord with a preset standard, reducing the target model parameters according to the preset standard and a preset dimension;
and deploying the initial model according to the reduced target model parameters to obtain a target model.
Optionally, the target index includes: a target error; and the step of determining a target index according to the target model parameters includes:
adjusting the initial model according to the target model parameters;
inputting the test data into the adjusted initial model to obtain a target prediction result;
comparing the target prediction result with a standard prediction result to obtain a target error;
the step of reducing the target model parameters according to the preset standard and the preset dimension when the target index does not meet the preset standard includes:
and under the condition that the target error is smaller than an allowable error threshold, reducing the target model parameters according to the allowable error threshold and a preset dimension.
Optionally, the preset dimension includes: at least two dimension values; the target model parameters include: target model subparameters corresponding to the at least two dimension values; and the step of reducing the target model parameters according to the preset dimension includes:
determining at least one intermediate dimension value from the at least two dimension values;
and reducing the dimension value of the target model subparameter to the nearest intermediate dimension value so as to reduce the target model parameter.
Optionally, the dimension value includes: a size; and the step of determining at least one intermediate dimension value from the at least two dimension values includes:
determining an intermediate size according to at least two sizes when the initial model is an image prediction model;
the step of reducing the dimension value of the target model subparameter to the nearest intermediate dimension value so as to reduce the target model parameters includes:
and reducing the size of the target model subparameter to the nearest intermediate size so as to reduce the target model parameters.
According to a second aspect of the present invention, there is provided a model deployment apparatus comprising:
The compiling module is used for optimizing and compiling the obtained initial model to obtain target model parameters;
the determining module is used for determining a target index according to the target model parameters;
the adjusting module is used for reducing the target model parameters according to the preset standard and the preset dimension under the condition that the target index does not accord with the preset standard;
the deployment module is used for deploying the initial model according to the reduced target model parameters to obtain a target model.
Optionally, the target index includes: a target error, the determination module comprising:
the adjustment sub-module is used for adjusting the initial model according to the target model parameters;
the prediction sub-module is used for inputting the test data into the adjusted initial model to obtain a target prediction result;
the comparison sub-module is used for comparing the target prediction result with the standard prediction result to obtain a target error;
the adjustment module comprises:
and the first reduction submodule is used for reducing the target model parameters according to the allowable error threshold and a preset dimension under the condition that the target error is smaller than the allowable error threshold.
Optionally, the target index further includes: a target data volume, the determination module comprising:
a determining submodule, configured to determine a target data amount of the target model parameter;
the adjustment module comprises:
and the second reduction submodule is used for reducing the target model parameters according to the preset dimension and the data quantity threshold value under the condition that the target data quantity is larger than the data quantity threshold value.
Optionally, the preset dimension includes: at least two dimension values; the target model parameters include: target model subparameters corresponding to the at least two dimension values; and the adjusting module includes:
a processing sub-module for determining at least one intermediate dimension value from the at least two dimension values;
and the reduction submodule is used for reducing the dimension value of the target model subparameter to the nearest intermediate dimension value so as to reduce the target model parameter.
Optionally, the dimension value includes: a size, and the processing submodule includes:
the processing unit is used for determining an intermediate size according to at least two sizes when the initial model is an image prediction model;
the downscaling submodule includes:
and the reduction unit is used for reducing the size of the target model subparameter to the nearest intermediate size so as to reduce the target model parameter.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the model deployment method according to any of the first aspects when executing the computer program.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model deployment method of any of the first aspects described above.
Compared with the prior art, the invention has the following advantages:
the invention provides a model deployment method, a device, electronic equipment and a storage medium, wherein the method comprises the following steps: optimizing and compiling the obtained initial model to obtain target model parameters; determining a target index according to the target model parameters; under the condition that the target index does not accord with a preset standard, reducing the target model parameters according to the preset standard and a preset dimension; and deploying the initial model according to the reduced target model parameters to obtain a target model. The model parameters obtained by optimizing and compiling the model are reduced according to the preset standard and the preset dimension before the model is deployed, and then the model is deployed according to the reduced model parameters, so that the data volume of the model is reduced, the limitation of limited storage resources on the model deployment is overcome, and the efficiency of the model deployment is improved.
The foregoing is merely an overview of the technical solution of the present invention. It is provided so that the technical means of the present invention can be understood more clearly and implemented in accordance with the contents of the description, and so that the above and other objects, features, and advantages of the present invention will be more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a flow chart of steps of a first model deployment method provided by an embodiment of the present invention;
FIG. 2 is a flowchart illustrating steps of a method for model parameter reduction according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a method for model parameter reduction for an image prediction model according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating steps of a second model deployment method according to an embodiment of the present invention;
FIG. 5 is a flow chart of steps of a third model deployment method provided by an embodiment of the present invention;
FIG. 6 is a logic flow diagram of a model deployment method provided by an embodiment of the present invention;
fig. 7 is a block diagram of a model deployment device according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Fig. 1 is a flowchart of steps of a first model deployment method according to an embodiment of the present invention, including:
and step 101, optimizing and compiling the obtained initial model to obtain target model parameters.
In the embodiment of the present invention, when a trained initial model is deployed to an edge device, in order to improve the performance of the initial model, the initial model first needs to be imported into a network optimization compiler serving as the back end for optimization compiling, generating an intermediate expression file that is independent of the front-end deep learning development framework. The intermediate expression file generally contains the topological structure information of the model and the model parameters. The topological structure information occupies relatively little storage space, typically on the order of KB (kilobytes), whereas the model parameters are typically on the order of MB (megabytes). Limited by the fact that existing optimization compilers require the input dimensions of the model to be specified in advance, the intermediate expression file often has to contain multiple copies of model parameters corresponding to different dimensions in order to guarantee model performance, so the data volume of the model parameters is multiplied compared with a single-copy scheme. Therefore, if the intermediate expression file of the model needs to be reduced to fit the data storage capacity of the edge device, it is mainly the multiple copies of model parameters corresponding to different dimensions in the intermediate expression file that need to be reduced.
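By way of illustration only, the intermediate expression file described above can be pictured as a small topology description plus one parameter blob per pre-specified input dimension. The following Python sketch shows a hypothetical data layout; the class and field names are assumptions for illustration, not part of any actual compiler's format:
```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class IntermediateExpressionFile:
    # serialized network topology, typically KB-scale
    topology: bytes
    # one compiled parameter blob per fixed input size, e.g. (640, 480) -> bytes,
    # typically MB-scale each; this is the part whose volume multiplies
    params_by_dim: Dict[Tuple[int, int], bytes]

    def total_param_bytes(self) -> int:
        # the quantity that grows as more input dimensions are supported
        return sum(len(blob) for blob in self.params_by_dim.values())
```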
The network optimization compiler optimizes the topological structure of the model through an optimization algorithm framework and directly derives an intermediate expression file, corresponding to the model, that contains the topological structure information and the model parameters; the intermediate expression file can then be input into a model accelerator for model deployment. In practical applications, the type of network optimization compiler may be chosen according to the user's requirements and the type of model to be optimized, which is not limited herein.
Step 102, determining a target index according to the target model parameters.
In the embodiment of the present invention, the reason why the model cannot be smoothly deployed on an edge device with limited storage resources is that the data volume of the intermediate expression file corresponding to the model is too large, so the target index may be the data volume of the target model parameters. In addition, as the data volume of the model parameters decreases, the error of the model deployed with the target model parameters increases, so the target index may also combine the target error of the model with the data volume, taking the data volume corresponding to the target model parameters and the target error together as the target index. The data volume can be obtained by directly counting the target model parameters. The target error can be obtained by preparing test data, inputting the target model parameters and the topological structure information into the model accelerator for deployment, inputting the test data into the deployed model for prediction, and comparing the obtained prediction data with standard prediction data.
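A minimal sketch of how the two target indexes above could be computed, assuming a callable `predict_fn` for the already-deployed model and pre-computed reference (standard) predictions are available; the function name and signature are illustrative assumptions:
```python
import numpy as np

def target_index(param_blobs, predict_fn, test_inputs, reference_outputs):
    """Combine the two indexes discussed above: the total data volume of the
    target model parameters, and the target error measured against the
    standard (reference) predictions."""
    data_volume = sum(len(blob) for blob in param_blobs)  # bytes
    predictions = [predict_fn(x) for x in test_inputs]
    # mean absolute deviation from the reference predictions as the target error
    target_error = float(np.mean([np.abs(p - r).mean()
                                  for p, r in zip(predictions, reference_outputs)]))
    return data_volume, target_error
```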
Step 103, under the condition that the target index does not meet a preset standard, reducing the target model parameters according to the preset standard and a preset dimension.
In the embodiment of the present invention, a user can configure in advance a preset standard for the target index according to actual requirements. The preset standard may be the maximum error allowed for the model deployed with the target model parameters, namely an allowable error threshold; the maximum data volume supported by the edge device, namely a data volume threshold; or a composite index obtained by a weighted sum of the allowable error threshold and the data volume threshold.
If the target index does not reach the preset standard, the target model parameters need to be reduced. Specifically, since multiple copies of model parameters adapted to different dimensions exist, the target model parameters may be reduced according to the preset dimension related to the model parameters. For example, if the model parameters correspond to an image recognition model, the preset dimension related to the model parameters may be image resolution, size, and so on, and the resolution or size of the model parameters may be reduced to reduce their data volume; if the model parameters correspond to a semantic recognition model, the preset dimension related to the model parameters may be data length, and so on, and the length of the model parameters may be reduced to reduce their data volume. The specific preset dimension may be selected according to the actual requirements of the user, which is not limited herein.
After each reduction, the target index of the reduced target model parameters is determined; if the target index still does not reach the preset standard, the target model parameters continue to be reduced according to the preset dimension until the target index of the reduced target model parameters meets the preset standard.
In practical applications, the target model parameters can also be reduced step by step according to the preset dimension without considering the preset standard, recording the target index corresponding to the reduced target model parameters after each reduction, until the reduced target model parameters correspond to only one dimension value. The target indexes of the target model parameters after each reduction are then compared against the preset standard set by the user, and the target model parameters closest to the preset standard are determined.
Optionally, the preset dimension includes: at least two dimension values; the target model parameters include: target model subparameters corresponding to the at least two dimension values. Step 103, referring to FIG. 2, includes:
step 1031, determining at least one intermediate dimension value from the at least two dimension values.
Step 1032, reducing the dimension values of the target model sub-parameters to the nearest intermediate dimension values, so as to reduce the target model parameters.
In the embodiment of the present invention, an intermediate dimension value is determined according to the dimension values corresponding to at least two target model subparameters, and the dimension values of the at least two target model subparameters are reduced to intermediate dimension values according to a preset rule, so as to reduce the target model parameters. An intermediate dimension value may be generated from two adjacent dimension values, and the preset rule may be to reduce the dimension value of a target model subparameter to the intermediate dimension value at the smallest distance from it, or to reduce it to the adjacent intermediate dimension value, so that the dimension value of each target model subparameter is reduced and the data volume of the target model parameters is reduced. It can be seen that the precondition for generating an intermediate dimension value is that at least two dimension values exist, so once the target model parameters have been reduced to a single copy, their data volume cannot be reduced any further.
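A minimal sketch of the merging rule described above, assuming the dimension values are comparable scalars; the helper names are illustrative, not taken from the patent:
```python
def intermediate_values(dim_values):
    """Midpoint between each pair of adjacent dimension values."""
    ordered = sorted(dim_values)
    return [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]

def snap_to_nearest(dim_values):
    """Reduce each dimension value to its nearest intermediate value,
    collapsing several original values onto fewer shared ones."""
    mids = intermediate_values(dim_values)
    if not mids:  # only one value left: cannot reduce any further
        return sorted(set(dim_values))
    snapped = {min(mids, key=lambda m: abs(m - v)) for v in dim_values}
    return sorted(snapped)

# e.g. [240, 480, 600, 768] -> midpoints [360.0, 540.0, 684.0]
# -> snapped to [360.0, 540.0, 684.0]: four values collapse to three
```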
Optionally, referring to FIG. 3, a flowchart of the steps of a model parameter reduction method for an image prediction model is shown, including the following steps A1 to A2:
Step A1, determining an intermediate size according to at least two sizes when the initial model is an image prediction model.
In the embodiment of the present invention, if the initial model is an image prediction model, the target model parameters output by the model optimization compiler will include target model subparameters corresponding to multiple sizes, i.e. each target model subparameter corresponds to one size, and the intermediate size between two adjacent sizes can be determined from those two adjacent sizes. For example, suppose there are target model subparameters of 8 sizes: 320×240, 640×480, 800×600, 1024×768, 1280×720, 1600×1200, 1920×1080, and 3840×2160. Since image sizes are generally common sizes, the intermediate size between two adjacent sizes can be selected directly from the common sizes, for example: 480×360, 720×540, 912×684, 1152×744, 1440×960, 1760×1440, and 2880×1620.
Step A2, reducing the size of the target model subparameter to the nearest intermediate size so as to reduce the target model parameters.
In the embodiment of the present invention, the distance between each intermediate size and the size corresponding to each target model subparameter is calculated, and the intermediate size at the smallest distance from the size corresponding to the target model subparameter is determined as the target intermediate size of that subparameter. For example, the intermediate sizes on either side of 640×480 are 480×360 and 720×540; the pixel count of 640×480 is 307200, that of 480×360 is 172800, and that of 720×540 is 388800, so the distance between 640×480 and 480×360 is 134400 and the distance between 640×480 and 720×540 is 81600. Because 81600 is smaller than 134400, 720×540 is confirmed as the target intermediate size corresponding to 640×480. Note that 720×540 is larger than 640×480; in that case the blank area can be filled to enlarge the input of the target model subparameter to the target size. It can be appreciated that, since the data volume of the target model subparameter corresponding to each size is fixed, reducing the number of target model subparameters effectively reduces the data volume of the target model parameters.
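The numerical example above can be reproduced with the following sketch, which measures the distance between sizes as the difference in pixel count and pads the input with blanks when the chosen intermediate size is larger; `pad_to` is an illustrative assumption about how the blank filling could be done:
```python
import numpy as np

def nearest_intermediate_size(size, candidates):
    """Pick the candidate whose pixel count is closest to the given size, as in
    the 640x480 example: |307200-172800| = 134400 vs |307200-388800| = 81600."""
    area = size[0] * size[1]
    return min(candidates, key=lambda c: abs(c[0] * c[1] - area))

def pad_to(image, target_size):
    """If the chosen intermediate size is larger, fill the extra area with
    blanks (zeros) so the input matches the target size."""
    w, h = target_size
    padded = np.zeros((h, w) + image.shape[2:], dtype=image.dtype)
    padded[:image.shape[0], :image.shape[1]] = image
    return padded

print(nearest_intermediate_size((640, 480), [(480, 360), (720, 540)]))  # (720, 540)
```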
Step 104, deploying the initial model according to the reduced target model parameters to obtain a target model.
In the embodiment of the present invention, the target model parameters and the topological structure information corresponding to the initial model are input into the model accelerator deployed on the edge device, completing the deployment of the target model at the edge. It will be appreciated that the target model parameters of the deployed target model meet the preset standard set by the user, and therefore the user's requirements are satisfied.
The first model deployment method provided by the invention comprises the following steps: optimizing and compiling the obtained initial model to obtain target model parameters; determining a target index according to the target model parameters; under the condition that the target index does not accord with a preset standard, reducing the target model parameters according to the preset standard and a preset dimension; and deploying the initial model according to the reduced target model parameters to obtain a target model. The model parameters obtained by optimizing and compiling the model are reduced according to the preset standard and the preset dimension before the model is deployed, and then the model is deployed according to the reduced model parameters, so that the data volume of the model is reduced, the limitation of limited storage resources on the model deployment is overcome, and the efficiency of the model deployment is improved.
Fig. 4 is a flowchart of steps of a second model deployment method according to an embodiment of the present invention, including:
Step 201, optimizing and compiling the obtained initial model to obtain target model parameters.
This step is described in detail with reference to step 101, and will not be described here.
Optionally, the target index includes: target error.
Step 202, adjusting the initial model according to the target model parameters.
In the embodiment of the present invention, the target model parameters are the model parameters in the intermediate expression file obtained after the model framework is optimized by the model optimization compiler, and the intermediate expression file of the initial model is input into the model accelerator to obtain the adjusted initial model.
Step 203, inputting the test data into the adjusted initial model to obtain a target prediction result.
In the embodiment of the present invention, test data, i.e. sample data with the same dimensions as the target model parameters, is input into the adjusted initial model for prediction, thereby obtaining the target prediction result. It will be appreciated that the adjusted initial model has already been trained; the test data is input here to obtain prediction data for evaluating the model, not to further train the model.
Step 204, comparing the target prediction result with a standard prediction result to obtain a target error.
In the embodiment of the invention, the standard prediction result may be a standard result labeled in advance for the test data according to actual requirements, or may be a prediction result obtained by inputting the test data into an initial model for prediction, as an evaluation reference of a subsequent index. And comparing the target prediction result with the standard prediction result, and determining a target error corresponding to the target model parameter according to the difference between the prediction results.
Step 205, in the case that the target error is smaller than the allowable error threshold, reducing the target model parameter according to the allowable error threshold and a preset dimension.
In the embodiment of the present invention, although reducing the target model parameters facilitates model deployment when the storage resources of the edge device are limited, the error of the deployed model also increases. The user can therefore set, as the preset standard, an acceptable allowable error threshold, i.e. the maximum error permitted for the deployed model, according to his or her own requirements. The number of copies of the target model parameters is reduced step by step according to the allowable error threshold and the preset dimension until the target error of the reduced target model parameters is greater than or equal to the allowable error threshold. Of course, since the target error of the target model parameters after each reduction is discrete, it is difficult to find a target error exactly equal to the allowable error threshold; in that case, among the target errors corresponding to the target model parameters obtained by each reduction, the model parameters whose error is closest to the allowable error threshold may be used as the final target model parameters for deployment.
In case the target error is greater than or equal to the allowable error threshold, the following step 206 is performed.
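A minimal sketch of the selection rule of step 205, assuming the target error after each reduction has already been recorded; the tuple layout and names are assumptions for illustration:
```python
def select_by_error(recorded, allowable_error):
    """recorded: list of (num_copies, target_error, params) tuples, one per
    reduction, ordered from most to fewest copies. Among the candidates whose
    error stays below the allowable threshold, return the one whose error is
    closest to that threshold."""
    under = [r for r in recorded if r[1] < allowable_error]
    if not under:
        return recorded[0]  # no reduction keeps the error acceptable
    return min(under, key=lambda r: allowable_error - r[1])
```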
Step 206, deploying the initial model according to the reduced target model parameters to obtain a target model.
This step is described in detail with reference to step 104, and will not be described in detail here.
The second model deployment method provided by the invention comprises the following steps: optimizing and compiling the obtained initial model to obtain target model parameters; determining a target index according to the target model parameters; under the condition that the target index does not accord with a preset standard, reducing the target model parameters according to the preset standard and a preset dimension; and deploying the initial model according to the reduced target model parameters to obtain a target model. Before model deployment, when the error of the model parameters obtained by model optimization and compiling does not reach the allowable error threshold, the model parameters are reduced according to the preset dimension, and then the model deployment is carried out according to the reduced model parameters, so that the data volume of the model is reduced, the limitation of limited storage resources on the model deployment is overcome, and the efficiency of the model deployment is improved.
Fig. 5 is a flowchart of steps of a third model deployment method according to an embodiment of the present invention, including:
Step 301, optimizing and compiling the obtained initial model to obtain target model parameters.
This step is described in detail with reference to step 101, and will not be described here.
Optionally, the target index further includes: target data volume.
Step 302, determining a target data volume of the target model parameters.
In the embodiment of the invention, the target data volume can be obtained by carrying out data statistics on the target model parameters.
Step 303, in the case that the target data volume is greater than the data volume threshold, reducing the target model parameters according to the preset dimension and the data volume threshold.
In the embodiment of the present invention, the storage resources of an edge device are generally limited and the data volume of each copy of the target model parameters is fixed; if the total data volume of the target model parameters exceeds the maximum data volume supported by the edge device, the number of copies of the target model parameters needs to be reduced. For example, if an edge device has 1 GB (gigabyte) of space and there are 8 copies of the target model parameters, each occupying 200 MB, then to deploy the model the target model parameters need to be reduced from 8 copies to 5 copies. Of course, the specific data volume threshold, and the number of copies of the target model parameters after adjustment, can be set by the user according to actual requirements. When the data volume of the target model parameters is greater than the data volume threshold, the target model parameters are reduced according to the preset dimension until the data volume of the reduced target model parameters is less than or equal to the data volume threshold.
In case the target data amount is less than or equal to the data amount threshold, the following step 304 is performed.
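The 1 GB / 200 MB example above reduces to simple integer arithmetic; the following sketch is illustrative only:
```python
def copies_that_fit(storage_bytes, bytes_per_copy):
    """Maximum number of parameter copies that fit within the edge storage budget."""
    return storage_bytes // bytes_per_copy

# 1 GB of edge storage, 200 MB per copy of the target model parameters:
print(copies_that_fit(1024 * 1024**2, 200 * 1024**2))  # 5 -> reduce from 8 to 5 copies
```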
Step 304, deploying the initial model according to the reduced target model parameters to obtain a target model.
This step is described in detail with reference to step 104, and will not be described in detail here.
Further, referring to FIG. 6, a logic flow diagram of a model deployment method of the present invention is shown. Test data is input into the original model constructed from the model file (the model file contains the model topology and N copies of model parameters) for inference, and the resulting inference data is used as reference data. A reduction loop over the model parameters is then entered: i is initialized to N; before each reduction, i is decremented by 1; model network optimization compiling is performed with the intermediate sizes of adjacent model parameters serving as the new model input sizes, yielding i copies of model parameters; the obtained i copies of model parameters are imported into the model accelerator, the resulting model is used to run inference on the test data, and the inference results are compared with the reference data to obtain the model error E(i) for i copies of model parameters; if i is not equal to 1, the flow returns to the step of decrementing i by 1 before reduction; when i = 1, all results have been traversed, and all model parameter combinations (N-1 in total) and the error corresponding to each combination have been recorded.
When a specified edge-device storage space is received, the corresponding number of copies p of model parameters is calculated according to the data size of the model parameters; the model parameter combination with p copies is selected from the traversal results (1 to N-1 copies), its corresponding error is provided as a reference, and the p copies of model parameters are output.
When a specified allowable error range E is received, the smallest i satisfying E(i) < E is selected from the traversal results, the model parameter combination with i copies is determined as the q copies of model parameters, the corresponding error is provided, and the q copies of model parameters are output.
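The overall flow of FIG. 6 can be sketched as below; `compile_with_sizes`, `deploy` and `infer` stand in for the network optimization compiler and model accelerator, which are not implemented here, so the block is an assumption-laden illustration rather than the patent's actual implementation:
```python
def reduction_loop(compile_with_sizes, deploy, infer, test_data, reference,
                   initial_sizes):
    """Reduce the model parameters copy by copy (i = N-1 .. 1), recording the
    error E(i) of each combination, as in FIG. 6."""
    results = {}                                  # i -> (sizes, error)
    sizes = sorted(initial_sizes)
    while len(sizes) > 1:
        # merge adjacent sizes into intermediate sizes and recompile
        sizes = sorted({((a[0] + b[0]) // 2, (a[1] + b[1]) // 2)
                        for a, b in zip(sizes, sizes[1:])})
        params = compile_with_sizes(sizes)
        model = deploy(params)
        error = max(abs(infer(model, x) - r) for x, r in zip(test_data, reference))
        results[len(sizes)] = (sizes, error)
    return results

def select(results, max_copies=None, allowable_error=None):
    """Pick a recorded combination either by a specified storage budget
    (max_copies) or by a specified allowable error."""
    if max_copies is not None:
        feasible = [i for i in results if i <= max_copies]
        return results[max(feasible)] if feasible else None
    if allowable_error is not None:
        ok = [i for i in results if results[i][1] < allowable_error]
        return results[min(ok)] if ok else None
```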
In this embodiment, before the model is deployed on the edge device, the model parameters are reduced copy by copy and the corresponding model errors are recorded; the model parameters that meet the requirements among the several groups of progressively reduced model parameters are then queried and output according to the specified storage space or allowable error, which improves the flexibility of model parameter reduction and ensures the efficiency of model deployment.
The third model deployment method provided by the invention comprises the following steps: optimizing and compiling the obtained initial model to obtain target model parameters; determining a target index according to the target model parameters; under the condition that the target index does not accord with a preset standard, reducing the target model parameters according to the preset standard and a preset dimension; and deploying the initial model according to the reduced target model parameters to obtain a target model. Before model deployment, when the data quantity of the model parameters obtained by model optimization and compiling does not reach a data quantity threshold, the model parameters are reduced according to preset dimensions, and then model deployment is carried out by using the reduced model parameters, so that the data quantity of the model is reduced, the limitation of limited storage resources on model deployment is overcome, and the efficiency of model deployment is improved.
Fig. 7 is a block diagram of a model deployment apparatus 40 according to an embodiment of the present invention, including:
the compiling module 401 is configured to perform optimization compiling on the obtained initial model to obtain the target model parameters.
A determining module 402, configured to determine a target indicator according to the target model parameter.
And the adjustment module 403 is configured to reduce the target model parameter according to the preset standard and the preset dimension when the target index does not meet the preset standard.
The deployment module 404 is configured to deploy the initial model according to the reduced target model parameter to obtain a target model.
Optionally, the target index includes: target error, the determining module 402 includes:
an adjustment submodule 4021 is configured to adjust the initial model according to the target model parameter.
The prediction submodule 4022 is configured to input the test data into the adjusted initial model to obtain a target prediction result.
And a comparison sub-module 4023, configured to compare the target prediction result with the standard prediction result to obtain a target error.
The adjustment module 403 includes:
the first downscaling submodule 4031 is configured to downscale the target model parameter according to the allowable error threshold and a preset dimension when the target error is smaller than the allowable error threshold.
Optionally, the target index further includes: a target data amount, the determining module 402 includes:
a determination submodule 4024 is configured to determine a target data amount of the target model parameter.
The adjustment module 403 includes:
and a second reduction submodule 4032, configured to reduce the target model parameter according to a preset dimension and a data volume threshold when the target data volume is greater than the data volume threshold.
Optionally, the preset dimension includes: at least two dimension values, the target model parameters comprising: the adjusting module 403 includes:
a processing submodule 4033 for determining at least one intermediate dimension value from the at least two dimension values.
And a reduction submodule 4034, configured to reduce the dimension value of the target model subparameter to the nearest intermediate dimension value, so as to reduce the target model parameter.
Optionally, the dimension value includes: size, the processing sub-module 4033, comprising:
a processing unit 40331, configured to determine an intermediate size according to at least two sizes when the initial model is an image prediction model.
The downscaling submodule 4034 includes:
A reduction unit 40341, configured to reduce the size of the target model sub-parameter to the nearest intermediate size, so as to reduce the target model parameter.
The first model deployment device provided by the invention comprises: the compiling module is used for optimizing and compiling the obtained initial model to obtain target model parameters; the determining module is used for determining a target index according to the target model parameters; the adjusting module is used for reducing the target model parameters according to the preset standard and the preset dimension under the condition that the target index does not accord with the preset standard; the deployment module is used for deploying the initial model according to the reduced target model parameters to obtain a target model. The model parameters obtained by optimizing and compiling the model are reduced according to the preset standard and the preset dimension before the model is deployed, and then the model is deployed according to the reduced model parameters, so that the data volume of the model is reduced, the limitation of limited storage resources on the model deployment is overcome, and the efficiency of the model deployment is improved.
For the apparatus embodiment described above, since it is substantially similar to the method embodiments, the description is relatively brief; for relevant details, reference is made to the description of the method embodiments.
In addition, an embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the model deployment method embodiments above and can achieve the same technical effects; to avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the model deployment method embodiments above and can achieve the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
As will be readily appreciated by those skilled in the art: any combination of the above embodiments is possible, and thus is an embodiment of the present invention, but the present specification is not limited by the text.
A model deployment method provided herein is not inherently related to any particular computer, virtual system, or other apparatus. Various general-purpose systems may also be used with the teachings herein. The required structure for a system constructed with aspects of the present invention will be apparent from the description above. In addition, the present invention is not directed to any particular programming language. It will be appreciated that the teachings of the present invention described herein may be implemented in a variety of programming languages, and the above description of specific languages is provided for disclosure of enablement and best mode of the present invention.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be construed as reflecting the intention that: i.e., the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the apparatus of the embodiments may be adaptively changed and disposed in one or more apparatuses different from the embodiments. The modules or units or components of the embodiments may be combined into one module or unit or component and, furthermore, they may be divided into a plurality of sub-modules or sub-units or sub-components. Any combination of all features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be used in combination, except insofar as at least some of such features and/or processes or units are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some or all of the components in a model deployment method according to embodiments of the present invention may be implemented in practice using microprocessors or Digital Signal Processors (DSPs). The present invention can also be implemented as an apparatus or device program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer readable medium, or may have the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several servers, several of these servers can be embodied by one and the same item of hardware. The use of the words first, second, third, etc. do not denote any order. These words may be interpreted as names.

Claims (9)

1. A method of model deployment, comprising:
optimizing and compiling the obtained initial model to obtain target model parameters; the target model parameters are model parameters with different dimensionalities in the intermediate expression file; the intermediate expression file is obtained after optimizing and compiling the topological structure of the initial model;
determining a target index according to the target model parameters;
reducing the dimension value of the target model parameter according to the preset standard and the preset dimension under the condition that the target index does not accord with the preset standard;
deploying the initial model according to the reduced target model parameters to obtain a target model;
the preset dimension comprises: at least two dimension values; the target model parameters comprise: target model subparameters corresponding to the at least two dimension values; and the step of reducing the target model parameters according to the preset standard and the preset dimension includes:
determining at least one intermediate dimension value from the at least two dimension values;
and reducing the dimension value of the target model subparameter to the nearest intermediate dimension value so as to reduce the target model parameter.
2. The method of claim 1, wherein the target index comprises: a target error; and the step of determining a target index according to the target model parameters comprises:
adjusting the initial model according to the target model parameters;
inputting the test data into the adjusted initial model to obtain a target prediction result;
comparing the target prediction result with a standard prediction result to obtain a target error;
the step of reducing the target model parameters according to the preset standard and the preset dimension when the target index does not meet the preset standard includes:
and under the condition that the target error is smaller than an allowable error threshold, reducing the target model parameters according to the allowable error threshold and a preset dimension.
3. The method of claim 1, wherein the target index further comprises: a target data volume; and the step of determining a target index according to the target model parameters comprises:
determining a target data amount of the target model parameters;
the step of reducing the target model parameters according to the preset standard and the preset dimension when the target index does not meet the preset standard includes:
And under the condition that the target data volume is larger than a data volume threshold, reducing the target model parameters according to a preset dimension and the data volume threshold.
4. A method as claimed in claim 3, wherein the dimension value comprises: a size; and the step of determining at least one intermediate dimension value from the at least two dimension values comprises:
determining an intermediate size according to at least two sizes when the initial model is an image prediction model;
the step of reducing the dimension value of the target model subparameter to the nearest intermediate dimension value so as to reduce the target model parameters comprises:
and reducing the size of the target model subparameter to the nearest intermediate size so as to reduce the target model parameters.
5. A model deployment apparatus, comprising:
the compiling module is used for optimizing and compiling the obtained initial model to obtain target model parameters; the target model parameters are model parameters with different dimensionalities in the intermediate expression file; the intermediate expression file is obtained after optimizing and compiling the topological structure of the initial model;
the determining module is used for determining a target index according to the target model parameters;
The adjusting module is used for reducing the dimension value of the target model parameter according to the preset standard and the preset dimension under the condition that the target index does not accord with the preset standard;
the deployment module is used for deploying the initial model according to the reduced target model parameters to obtain a target model; the preset dimension comprises: at least two dimension values; the target model parameters include: target model subparameters corresponding to the at least two dimension values;
the adjustment module comprises:
a processing sub-module for determining at least one intermediate dimension value from the at least two dimension values;
and the reduction submodule is used for reducing the dimension value of the target model subparameter to the nearest intermediate dimension value so as to reduce the target model parameter.
6. The apparatus of claim 5, wherein the target index comprises: a target error, the determination module comprising:
the adjustment sub-module is used for adjusting the initial model according to the target model parameters;
the prediction sub-module is used for inputting the test data into the adjusted initial model to obtain a target prediction result;
The comparison sub-module is used for comparing the target prediction result with the standard prediction result to obtain a target error;
the adjustment module comprises:
and the first reduction submodule is used for reducing the target model parameters according to the allowable error threshold and a preset dimension under the condition that the target error is smaller than the allowable error threshold.
7. The apparatus of claim 5, wherein the target index further comprises: a target data volume, the determination module comprising:
a determining submodule, configured to determine a target data amount of the target model parameter;
the adjustment module comprises:
and the second reduction submodule is used for reducing the target model parameters according to the preset dimension and the data quantity threshold value under the condition that the target data quantity is larger than the data quantity threshold value.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the model deployment method of any of claims 1 to 4 when the computer program is executed.
9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the model deployment method of any of claims 1 to 4.
CN201911359655.7A 2019-12-25 2019-12-25 Model deployment method and device, electronic equipment and storage medium Active CN110991643B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911359655.7A CN110991643B (en) 2019-12-25 2019-12-25 Model deployment method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911359655.7A CN110991643B (en) 2019-12-25 2019-12-25 Model deployment method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110991643A CN110991643A (en) 2020-04-10
CN110991643B 2024-01-30

Family

ID=70076712

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911359655.7A Active CN110991643B (en) 2019-12-25 2019-12-25 Model deployment method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110991643B (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MY202037A (en) * 2016-10-30 2024-03-29 Huei Meng Chang Communication network with control plane network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013008199A (en) * 2011-06-24 2013-01-10 Nippon Telegr & Teleph Corp <Ntt> Model reduction device, method for the same and program
CN108053034A (en) * 2018-01-02 2018-05-18 武汉斗鱼网络科技有限公司 Model parameter processing method, device, electronic equipment and storage medium
CN109086722A (en) * 2018-08-06 2018-12-25 汉王科技股份有限公司 Mix licence plate recognition method, device, electronic equipment
CN109460613A (en) * 2018-11-12 2019-03-12 北京迈格威科技有限公司 Model method of cutting out and device
CN110188795A (en) * 2019-04-24 2019-08-30 华为技术有限公司 Image classification method, data processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on an intelligent prediction model for alloy addition based on dimension reduction; 王希娟 (Wang Xijuan) et al.; Steelmaking (炼钢), Issue 04; full text *

Also Published As

Publication number Publication date
CN110991643A (en) 2020-04-10

Similar Documents

Publication Publication Date Title
US20190370659A1 (en) Optimizing neural network architectures
US10699055B2 (en) Generative adversarial networks for generating physical design layout patterns
CN110188360B (en) Model training method and device
US11455523B2 (en) Risk evaluation method, computer-readable recording medium, and information processing apparatus
CN111027428B (en) Training method and device for multitasking model and electronic equipment
US20210089909A1 (en) High fidelity speech synthesis with adversarial networks
CN115145812B (en) Test case generation method and device, electronic equipment and storage medium
CN113807353B (en) Image conversion model training method, device, equipment and storage medium
CN114004352B (en) Simulation implementation method, neural network compiler and computer readable storage medium
US11544568B2 (en) Method for optimizing a data model and device using the same
CN110991643B (en) Model deployment method and device, electronic equipment and storage medium
CN106599623A (en) Method and device for calculating application similarity
US11966851B2 (en) Construction of a machine learning model
CN112596868A (en) Model training method and device
CN110889316B (en) Target object identification method and device and storage medium
US11847867B2 (en) Alarm offset optimization apparatus of AUTOSAR operating system
CN111078877B (en) Data processing method, training method of text classification model, and text classification method and device
CN107423210A (en) Software performance testing method and device
CN112015426A (en) Code management method, device and equipment
CN112733433A (en) Equipment testability strategy optimization method and device
CN117112734B (en) Semantic-based intellectual property text representation and classification method and terminal equipment
CN110020670B (en) Model iteration method, device and equipment
US11488056B2 (en) Learning program, learning apparatus, and learning method
CN117332703B (en) Artificial seismic wave generation method, equipment and storage medium
CN113066486B (en) Data identification method, device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant