CN115034367A - Model deployment method and device - Google Patents


Info

Publication number
CN115034367A
CN115034367A
Authority
CN
China
Prior art keywords
model
network
input data
pseudo
branches
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210651363.6A
Other languages
Chinese (zh)
Inventor
李亮
张勃
田值
初祥祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202210651363.6A priority Critical patent/CN115034367A/en
Publication of CN115034367A publication Critical patent/CN115034367A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/08 Learning methods


Abstract

The specification discloses a model deployment method and apparatus. The method comprises: determining each network branch contained in a preset model that needs to be replaced as a target network branch; determining, according to the initial network parameters corresponding to the target network branches, the network parameters that a designated network branch would have under the assumption that the target network branches are equivalently replaced by the designated network branch; updating the initial network parameters according to the network parameters corresponding to the designated network branch, and performing pseudo-quantization processing on the updated initial network parameters; and equivalently replacing each target network branch contained in the pseudo-quantized model with the designated network branch, and deploying the model according to the replaced model.

Description

Model deployment method and device
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method and an apparatus for model deployment.
Background
With the development of deep learning technology, neural networks have matured and are now widely applied across industry and in many business scenarios of daily life. Some fields (such as industrial fields) place very strict requirements on both the precision and the latency of a neural network model: a model deployed on a terminal must retain high precision while keeping latency low enough to fully meet service requirements. To achieve this goal, many neural network compression and optimization techniques have been studied intensively, among them two classical model optimization methods: structural re-parameterization and model quantization.
Quantizing the network parameters of a model (i.e., compressing high-precision parameters into low-precision ones, such as floating-point parameters into integer parameters) introduces a certain error. Therefore, current practice is usually to train the model first, then fuse the multiple network branches of the trained model, and then train the fused single-branch model. During that training, the network parameters of the single-branch model are pseudo-quantized: each high-precision parameter is compressed into a low-precision parameter carrying a certain quantization error, and then restored into a high-precision parameter that still carries that error. After training is finished, the corresponding network parameters in the model are quantized. However, this optimization process reduces the accuracy of the model and cannot meet the service requirements of services with strict accuracy demands.
Therefore, how to optimize a model and improve its operating efficiency while ensuring its precision is a problem to be solved urgently.
Disclosure of Invention
The present specification provides a model deployment method and apparatus to partially solve the above problems in the prior art.
The technical scheme adopted by the specification is as follows:
the present specification provides a method of model deployment, comprising:
determining each network branch contained in a preset model that needs to be replaced as a target network branch;
determining, according to the initial network parameters corresponding to the target network branches, the network parameters corresponding to a designated network branch under the assumption that the target network branches are equivalently replaced by the designated network branch;
updating the initial network parameters according to the network parameters corresponding to the designated network branch, and performing pseudo-quantization processing on the updated initial network parameters;
and equivalently replacing each target network branch contained in the pseudo-quantized model with the designated network branch, and performing model deployment according to the replaced model.
Optionally, before equivalently replacing each target network branch included in the model after the pseudo quantization processing with a specified network branch, the method further includes:
training the model after pseudo-quantization to obtain an optimized model;
performing model deployment according to the replaced model, specifically comprising:
and carrying out model deployment on the optimized model.
Optionally, before performing model deployment according to the replaced model, the method further includes:
carrying out quantization processing on the network parameters corresponding to the specified network branches;
performing model deployment according to the replaced model, specifically comprising:
and deploying the quantized model.
Optionally, training the model subjected to the pseudo-quantization processing to obtain an optimized model, specifically including:
acquiring input data;
inputting the input data into the model after pseudo-quantization processing, and determining an output result corresponding to the input data;
and training the model after pseudo-quantization processing by taking the minimized deviation between the output result and the actual label corresponding to the input data as an optimization target to obtain the optimized model.
Optionally, the acquiring the input data specifically includes:
acquiring initial input data;
and carrying out pseudo quantization processing on the initial input data to obtain the input data.
Optionally, the inputting the input data into the model after the pseudo-quantization processing, and determining an output result corresponding to the input data specifically include:
determining a data distribution corresponding to the input data according to the input data;
and performing regularization processing on the input data according to the data distribution, and determining an output result corresponding to the input data according to the processed input data.
Optionally, the method further comprises:
determining data distribution corresponding to historical input data as historical data distribution;
updating the historical data distribution according to the input data to obtain updated data distribution;
deploying the optimized model, specifically comprising:
and deploying the optimized model according to the updated data distribution.
The present specification provides an apparatus for model deployment, comprising:
the first determining module is used for determining each network branch which needs to be replaced and is contained in the preset model as each target network branch;
the second determining module is used for determining the network parameters corresponding to the specified network branches after the assumption that the target network branches are equivalently replaced by the specified network branches is made according to the initial network parameters corresponding to the target network branches;
the processing module is used for updating the initial network parameters according to the network parameters corresponding to the specified network branches and performing pseudo-quantization processing on the updated initial network parameters;
and the deployment module is used for equivalently replacing each target network branch contained in the model subjected to the pseudo-quantization processing with a specified network branch and performing model deployment according to the replaced model.
The present specification provides a computer-readable storage medium, which stores a computer program that, when executed by a processor, implements the above-described method of model deployment.
The present specification provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above-described method of model deployment when executing the program.
The technical scheme adopted by the specification can achieve the following beneficial effects:
in the model deployment method provided in this specification, the network parameters that a designated network branch would have after equivalently replacing each target network branch are determined according to the initial network parameters of those target branches; the initial network parameters are updated according to the designated branch's parameters and the updated parameters are pseudo-quantized; each target network branch contained in the pseudo-quantized model is then equivalently replaced with the designated network branch, and the finally obtained model is deployed.
In the prior art, after the many target network branches are equivalently replaced with fewer designated network branches, the network parameters of those fewer branches are quantized, which produces a comparatively large error. By contrast, the method above trains the model with pseudo-quantized parameters while it still has its full multi-branch structure; therefore, compared with the prior-art approach of training a pseudo-quantized model that has only the few designated network branches, the accuracy of the model is further improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of it, illustrate embodiments of the specification and together with the description serve to explain it; they are not intended to limit the specification. In the drawings:
FIG. 1 is a schematic diagram of a prior art model optimization method provided herein;
FIG. 2 is a schematic flow chart diagram of a method for model deployment provided herein;
FIG. 3 is a schematic diagram of a model deployment method provided herein;
FIG. 4 is a schematic diagram of a model deployment apparatus provided herein;
FIG. 5 is a schematic diagram of an electronic device corresponding to FIG. 1 provided in the present specification.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present disclosure clearer, the technical solutions of the present disclosure are described clearly and completely below with reference to the specific embodiments of the present disclosure and the accompanying drawings. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort fall within the protection scope of this specification.
Because the network parameters of the model's different network branches differ, pseudo-quantizing them first would further amplify those differences during training, making it impossible to equivalently replace the model's network branches when the model is later optimized. For this reason, the prior art usually deploys a model using the method shown in fig. 1.
Fig. 1 is a schematic diagram of a conventional model optimization method provided in this specification.
The existing method first trains a model containing many target network branches, then equivalently replaces each target network branch in the trained model with a designated network branch to obtain a model with the designated network branches, then pseudo-quantizes the parameters of that model and trains it, obtaining a pseudo-quantized model with the designated-branch network structure; finally the model is quantized and deployed.
In this process, the model with pseudo-quantized network parameters can only be trained through the network structure that has the few designated network branches; compared with training through the model with the many target network branches, the achievable model precision is very limited.
The technical solutions provided by the embodiments of the present description are described in detail below with reference to the accompanying drawings.
Fig. 2 is a schematic flowchart of a method for model deployment provided in this specification, including the following steps:
s201: and determining each network branch which needs to be replaced and is contained in the preset model as each target network branch.
A neural network model is usually trained in a server or other cloud data center with high processing capacity and ample memory resources, so that even a complex neural network model can be trained smoothly while its precision is ensured. After the model is trained, the server generally needs to install the model in a client on a terminal such as a mobile phone, notebook computer, tablet computer, or desktop computer to complete the deployment of the model.
In practical applications, however, the configuration (such as processor and memory) of some devices is limited, so the processing capability and memory resources of a terminal device differ greatly from those of the server. If the trained model is deployed on the terminal device directly, the terminal's low processing capability will often make the model run slowly or even fail to start, and the model will also occupy a large amount of the terminal's memory.
For example, the processing capacity and memory resources of a mobile phone are generally far lower than those of a server; if a model trained on the server is deployed directly in a client installed on the phone, the phone's configuration may prevent the model from running smoothly in the client, and may even crash the client.
Therefore, it is necessary to optimize the trained model to simplify its logical structure and reduce the memory resources it occupies, and then deploy the optimized model.
Based on this, the present specification provides a method of model deployment: while the server trains a preset model, each network branch contained in the preset model that needs to be replaced is determined as a target network branch.
Specifically, after model training is completed, the server usually optimizes the trained model: equivalence formulas or equivalence algorithms are used to equivalently replace those network branches of the trained model that can be replaced with designated network branches, thereby fusing and simplifying the model's multiple branches and improving its running speed. The server can therefore determine, as the target network branches, both the branches in the preset model that need equivalent replacement and the branches that are needed during training but serve no function after deployment.
It should be noted that there may be one designated network branch or several, but the number of designated network branches is necessarily smaller than the number of target network branches; otherwise the model would not be optimized.
In this specification, the execution subject of the model deployment method may be a designated device such as a server deployed on a service platform. For convenience of description, the specification illustrates the method with the server as the execution subject.
S202: and determining the network parameters corresponding to the specified network branches after the assumption that the target network branches are equivalently replaced by the specified network branches is made according to the initial network parameters corresponding to the target network branches.
During model training, the network parameters (such as convolution weights) corresponding to the different network branches differ to some degree. If the initial network parameters of each target network branch were pseudo-quantized directly, those differences would usually be amplified further by the time training completes, so the target network branches could not be equivalently replaced by the designated network branch during model optimization, and forcing the replacement would introduce a large error. The pseudo-quantization procedure is described in detail below and is not elaborated here.
Therefore, the server may first determine the network parameters that the designated network branch would have after the replacement, and then update the initial network parameters of the target network branches with those parameters. After model training is completed, the target network branches then differ little or not at all, so the target branches whose parameters were updated in this way can be equivalently replaced by the designated network branch.
Specifically, after determining the target network branches, the server may simulate their equivalent replacement and, through that simulation, determine the network parameters that the designated network branch would have after equivalently replacing the target network branches.
It should be noted that the simulation only determines the network parameters corresponding to the designated network branch; in the actual training process, the preset model is trained with its initial network structure (i.e., the structure containing the target network branches).
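To make the simulation step concrete, the following minimal NumPy sketch (our own illustration, not taken from the patent) merges three hypothetical parallel linear branches whose outputs are summed into a single designated branch. Because the operation is linear, the merged weight is simply the sum of the branch weights, so the replacement is exactly equivalent.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# Three target network branches applied in parallel, with their outputs summed.
W1, W2, W3 = (rng.standard_normal((4, 5)) for _ in range(3))
multi_branch_out = W1 @ x + W2 @ x + W3 @ x

# Simulated equivalent replacement: one designated branch with the summed weight.
W_designated = W1 + W2 + W3
single_branch_out = W_designated @ x

# The designated branch reproduces the multi-branch output exactly.
assert np.allclose(multi_branch_out, single_branch_out)
```

Real re-parameterization schemes handle convolutions, identity shortcuts, and BN layers with the same algebra, but the parallel-sum topology above is only an assumed example.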
S203: and updating the initial network parameters according to the network parameters corresponding to the specified network branches, and performing pseudo-quantization processing on the updated initial network parameters.
After determining the network parameters that the designated network branch would have under the assumed equivalent replacement, the server may update the target network parameters accordingly, for example by replacing or adjusting at least part of them with the designated branch's parameters, thereby obtaining the updated network parameters corresponding to each target network branch.
After the model is deployed in a client, the terminal device running the client (such as a mobile phone or tablet computer) generally has a lower configuration than the server, so the corresponding network parameters in the replaced model need to be quantized, compressing them to a range that matches the terminal device's configuration.
However, quantizing the network parameters of the trained model introduces certain errors, and these errors affect the accuracy of the deployed model. To reduce them, the server may set corresponding pseudo-quantization nodes in the model, so that while training the model the updated initial network parameters are pseudo-quantized through those nodes, reducing or eliminating the errors produced when the model is subsequently quantized.
Specifically, through a pseudo-quantization node the server may quantize a network parameter to obtain a quantized parameter carrying a quantization error, and then restore it. Comparing the restored parameter with the parameter before quantization determines the quantization error produced in the quantization process, and combining that error with the original parameter yields a pseudo-quantized network parameter carrying the error. Training with such parameters can offset at least part of the error produced when the parameters are quantized during later optimization, further improving the model's recognition precision.
For example, for a 32-bit floating-point (float) network parameter in a target network branch, the server may quantize it with certain quantization parameters (such as a quantization scale or offset) so that it is represented by an 8-bit integer (int) parameter, which carries a certain quantization error. The integer parameter is then restored with the same quantization parameters, the restored floating-point value is compared with the floating-point value before quantization to determine the quantization error produced in the quantization process, and that error is combined with the original floating-point parameter to obtain a floating-point parameter carrying the quantization error.
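The float32-to-int8 round trip just described can be sketched as follows. This is a minimal illustration under common assumptions (symmetric per-tensor scaling), not the patent's exact procedure:

```python
import numpy as np

def pseudo_quantize(w, num_bits=8):
    """Quantize then restore, so the returned float weights carry the int rounding error."""
    qmax = 2 ** (num_bits - 1) - 1                 # 127 for int8
    scale = np.abs(w).max() / qmax                 # symmetric per-tensor quantization scale
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)  # lossy int8 form
    return q.astype(np.float32) * scale            # restored floats, error baked in

w = np.array([0.42, -1.3, 0.007, 0.99], dtype=np.float32)
w_pq = pseudo_quantize(w)
# Training then continues on w_pq, so the network learns to tolerate the error.
quant_error = w_pq - w
```

The restored values differ from the originals by at most half a quantization step, which is exactly the error the trained model learns to absorb.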
Of course, in practical applications, the server may also pseudo-quantize the network parameters corresponding to the designated network branch first, and then update the initial network parameters of each target network branch according to the pseudo-quantized parameters.
S204: and equivalently replacing each target network branch contained in the pseudo-quantized model with a specified network branch, and deploying the model according to the replaced model.
After the server pseudo-quantizes the updated initial network parameters, it obtains the pseudo-quantized model, in which the network parameters corresponding to each target network branch have been updated with the designated network branch's parameters and the updated parameters have then been pseudo-quantized.
The server may then train the pseudo-quantized model. During this training, the server may first obtain input data, input it into the pseudo-quantized model, and regularize it through a Batch Normalization (BN) layer in the model, obtaining processed input data; this limits the distribution range of the data during training and prevents the trained model from overfitting.
Specifically, the server may determine a data distribution of the input data through the BN layer in the model, where the distribution may include the mean, variance, and so on of the input data fed into the pseudo-quantized model, and then regularize the input data according to that distribution.
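The BN step can be illustrated with the standard batch-normalization computation; the `gamma`/`beta` scale-and-shift parameters and the `eps` stabilizer below are conventional BN ingredients, not values from the patent:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature with the batch's own mean/variance, then scale and shift."""
    mean = x.mean(axis=0)                    # per-feature mean of this batch
    var = x.var(axis=0)                      # per-feature variance of this batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized input
    return gamma * x_hat + beta, mean, var

# A batch of 32 samples with 8 features, deliberately off-center and spread out.
batch = np.random.default_rng(1).standard_normal((32, 8)) * 3.0 + 5.0
normed, batch_mean, batch_var = batch_norm(batch)
# After normalization each feature has ~zero mean and ~unit variance.
```

With `gamma=1, beta=0` this is pure normalization; learned `gamma`/`beta` let the layer restore whatever scale and offset training finds useful.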
After the model is deployed in a client, the input data is usually quantized during the model's actual operation. For example, when pictures are recognized through a designated client installed on a mobile phone, the model deployed in the client usually compresses a picture of higher initial resolution into one of lower resolution to improve recognition efficiency. Moreover, because every initial network parameter is pseudo-quantized during training, pseudo-quantizing the input data as well helps ensure the training effect of the model.
Therefore, in order to simulate the input as it appears in actual operation, to better match the trained model to its deployment environment, and to ensure the training effect, the server may first obtain initial input data and then pseudo-quantize it, obtaining the pseudo-quantized input data. The method of pseudo-quantizing the initial input data is the same as the method of pseudo-quantizing the updated initial network parameters described above and is not repeated here.
In addition, for each round of training the pseudo-quantized model, the server may determine the data distribution of the historical input data (e.g., the mean of the distributions of the input data in previous training rounds) as the historical data distribution, and update it with the current round's input data to obtain an updated data distribution. Within each training round itself, the input data is normalized using only the distribution of that round's input data.
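This historical-distribution update behaves like the running statistics that BN layers commonly keep for inference. A hedged sketch, with a hypothetical momentum value of 0.1:

```python
def update_running_stats(hist_mean, hist_var, batch_mean, batch_var, momentum=0.1):
    """Blend the historical distribution with the current round's batch statistics."""
    new_mean = (1.0 - momentum) * hist_mean + momentum * batch_mean
    new_var = (1.0 - momentum) * hist_var + momentum * batch_var
    return new_mean, new_var

# Historical distribution from earlier rounds, updated with the current round's batch.
hist_mean, hist_var = 0.0, 1.0
hist_mean, hist_var = update_running_stats(hist_mean, hist_var,
                                           batch_mean=0.5, batch_var=2.0)
# hist_mean is now 0.9*0.0 + 0.1*0.5 = 0.05; hist_var is 0.9*1.0 + 0.1*2.0 = 1.1
```

At deployment time it is this accumulated distribution, not any single batch's statistics, that the folded BN computation would use.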
The pseudo-quantized model can then determine the recognition result corresponding to the processed input data, and the model is trained with the optimization objective of minimizing the deviation between the recognition result and the actual information corresponding to the input data, until the training target is met. The training target may be that the model converges within a preset threshold range, or that a preset number of training iterations is reached, so as to ensure the model's recognition accuracy; both the threshold range and the iteration count may be set according to the actual situation, and this specification does not specifically limit them.
In an actual application process, the initial input data may be image data, voiceprint data, text data, and the like, and correspondingly, the actual information corresponding to the initial input data may be image information included in the image data, voiceprint information included in the voiceprint data, text information included in the text data, and the like, which is not limited in this specification.
After the model is trained, the server may optimize the trained model to obtain an optimized model.
Specifically, the server may equivalently replace each network branch contained in the trained model with the designated network branch to obtain the replaced model, simplifying the model's network structure and improving its operating efficiency. In this process, each BN layer contained in the model is also equivalently merged into the designated network branch.
In addition, the server may quantize the network parameters corresponding to the designated branch of the replaced model, reducing the model's memory footprint and obtaining the optimized model.
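One common way a BN layer can be equivalently merged into the preceding layer is the standard "BN folding" algebra, sketched below for a linear layer. This is illustrative only; the patent does not spell out this exact formula:

```python
import numpy as np

def fold_bn(W, b, mean, var, gamma, beta, eps=1e-5):
    """Fold y = gamma * (W@x + b - mean) / sqrt(var + eps) + beta into one affine layer."""
    s = gamma / np.sqrt(var + eps)          # per-output-channel rescaling factor
    return W * s[:, None], (b - mean) * s + beta

rng = np.random.default_rng(2)
W, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
mean, var = rng.standard_normal(3), rng.random(3) + 0.5
gamma, beta = rng.standard_normal(3), rng.standard_normal(3)
x = rng.standard_normal(4)

eps = 1e-5
bn_out = gamma * ((W @ x + b) - mean) / np.sqrt(var + eps) + beta
W_f, b_f = fold_bn(W, b, mean, var, gamma, beta, eps)
# The folded single affine layer matches the linear-plus-BN pair exactly.
assert np.allclose(bn_out, W_f @ x + b_f)
```

Because the fold is exact, it removes the BN layer from the deployed graph without any accuracy cost; only the subsequent integer quantization introduces error.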
The server may then deploy the optimized model in the client.
It should be emphasized that, because the updated network parameters were pseudo-quantized during model training, the parameters of each target network branch in the trained model carry a certain quantization error. After the target branches are equivalently replaced by the designated network branch during optimization, the designated branch's parameters carry those same errors, which can offset at least part of the error produced when the replaced model's parameters are quantized, thereby improving model accuracy.
It should be noted that all the actions of acquiring signals, information or data in this specification are performed under the premise of complying with the corresponding data protection regulation policy of the country of the location and obtaining the authorization given by the owner of the corresponding device.
It can be seen from the above that, in the model deployment method provided in this specification, the initial network parameters can be updated according to the network parameters corresponding to the designated network branches. As a result, even though the model applies pseudo-quantization processing to these updated network parameters during training, the target network branches can still be replaced by the designated network branches in the subsequent optimization process. In other words, a model with pseudo-quantized parameters can be trained using a model structure containing multiple network branches.
For ease of understanding, this specification also provides a schematic diagram of the model deployment method, to distinguish it clearly from existing methods.
Fig. 3 is a schematic diagram of a model deployment method provided in this specification.
Network branch 1, network branch 2, and network branch 3 are target network branches contained in the preset model that can be replaced in the subsequent optimization stage. In this scheme, the server may first determine the network parameters that the designated network branch would have after the target network branches are equivalently replaced by it, then update the initial network parameters of each target branch with these network parameters to obtain updated parameters, and perform pseudo-quantization processing on the updated parameters. This yields a pseudo-quantized model containing the target network branches and their corresponding pseudo-quantized parameters, and that model is then trained.
After training is completed, the server may equivalently replace each target network branch contained in the model with the designated network branch, perform quantization processing on the corresponding network parameters in the designated network branch, and then deploy the resulting quantized model.
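The final quantization of the replaced branch's parameters can be sketched as storing integer weights plus one scale factor, again under assumed symmetric int8 conventions rather than anything the patent prescribes:

```python
import numpy as np

def quantize(w, num_bits=8):
    # Store integers and one scale; dequantization is q * scale.
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale
```

If the merged weights were already pushed onto the quantization grid by the pseudo-quantization applied during training, this final step reproduces them with little or no additional error, which is the intuition behind the error-counteraction claim above.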
Compared with the prior art, in which a model with pseudo-quantized parameters is trained on a network structure containing only the fewer designated network branches, training the pseudo-quantized model on a network structure containing more target network branches markedly improves the accuracy of the deployed model.
Based on the same idea, the present specification also provides a device for implementing model deployment, as shown in fig. 4.
Fig. 4 is a schematic diagram of a model deployment apparatus provided in the present specification, including:
a first determining module 401, configured to determine each network branch that needs to be replaced and is included in the preset model, as each target network branch;
a second determining module 402, configured to determine, according to the initial network parameter corresponding to each target network branch, a network parameter corresponding to a designated network branch after assuming that each target network branch is equivalently replaced with the designated network branch;
a processing module 403, configured to update the initial network parameter according to the network parameter corresponding to the designated network branch, and perform pseudo-quantization processing on the updated initial network parameter;
a deployment module 404, configured to equivalently replace each target network branch contained in the pseudo-quantized model with a designated network branch, and perform model deployment according to the replaced model.
Optionally, the apparatus further comprises:
a training module 405, configured to train the model subjected to the pseudo quantization processing to obtain an optimized model;
the deployment module 404 is specifically configured to perform model deployment on the optimized model.
Optionally, the deployment module 404 is further configured to perform quantization processing on the network parameter corresponding to the specified network branch;
the deployment module 404 is specifically configured to deploy the quantized model.
Optionally, the training module 405 is specifically configured to obtain input data; inputting the input data into the model after pseudo-quantization processing, and determining an output result corresponding to the input data; and training the model after pseudo-quantization processing by taking the minimized deviation between the output result and the actual label corresponding to the input data as an optimization target to obtain the optimized model.
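The training objective — minimizing the deviation between the pseudo-quantized model's output and the actual label — can be sketched as one gradient step on a linear model, with a straight-through estimator for the rounding; the MSE loss, learning rate, and helper names are all illustrative assumptions:

```python
import numpy as np

def _fake_quant(w, num_bits=8):
    # Quantize-dequantize so the forward pass carries quantization error.
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.abs(w).max()
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def train_step(W, x, y, lr=0.01):
    # Forward pass uses the pseudo-quantized weights, so the loss
    # already reflects the error that deployment-time quantization adds.
    Wq = _fake_quant(W)
    pred = Wq @ x
    loss = float(np.mean((pred - y) ** 2))
    # Straight-through estimator: treat d(Wq)/d(W) as identity, so the
    # gradient w.r.t. Wq updates the underlying float weights W.
    grad = np.outer(pred - y, x)
    return W - lr * grad, loss
```

In practice this step would run over mini-batches inside a training framework; the sketch only shows where the pseudo-quantized weights enter the forward pass and where the straight-through gradient is applied.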
Optionally, the training module 405 is specifically configured to obtain initial input data; and carrying out pseudo quantization processing on the initial input data to obtain the input data.
Optionally, the training module 405 is specifically configured to determine, according to the input data, data distribution corresponding to the input data; and performing regularization processing on the input data according to the data distribution, and determining an output result corresponding to the input data according to the processed input data.
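The regularization of the input data according to its data distribution can be sketched as standardizing each feature with the batch's own mean and variance, in the style of batch normalization; the exact transform used by the patent is not specified, so this is an assumption:

```python
import numpy as np

def normalize_batch(x, eps=1e-5):
    # x: (batch, features). Standardize each feature using the
    # distribution (mean, variance) estimated from this batch.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mu) / np.sqrt(var + eps)
```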
Optionally, the training module 405 is further configured to determine a data distribution corresponding to the historical input data, as a historical data distribution; updating the historical data distribution according to the input data to obtain updated data distribution;
the deployment module 404 is specifically configured to deploy the optimized model according to the updated data distribution.
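Updating a historical data distribution with each new batch is commonly done via an exponential moving average, as BN layers do for their running statistics; the momentum value below is an illustrative assumption:

```python
import numpy as np

def update_running_stats(run_mean, run_var, batch, momentum=0.1):
    # Blend the historical distribution with this batch's statistics.
    new_mean = (1 - momentum) * run_mean + momentum * batch.mean(axis=0)
    new_var = (1 - momentum) * run_var + momentum * batch.var(axis=0)
    return new_mean, new_var
```

At deployment time the final running statistics are frozen, so they can be folded into the preceding layer's weights and no batch statistics need to be computed at inference.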
The present specification also provides a computer-readable storage medium storing a computer program, where the computer program is operable to perform the model deployment method provided above with respect to fig. 1.
This specification also provides a schematic block diagram, shown in fig. 5, of an electronic device corresponding to fig. 1. As shown in fig. 5, at the hardware level the electronic device includes a processor, an internal bus, a network interface, memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into memory and then runs it to implement the model deployment method described in fig. 1. Besides this software implementation, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow below is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to circuit structures such as diodes, transistors, and switches) or a software improvement (an improvement to a method flow). As technology has developed, however, many of today's method-flow improvements can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement to a method flow cannot be realized with hardware entity modules. For example, a programmable logic device (PLD), such as a field programmable gate array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compilers used in program development, except that the source code to be compiled must be written in a particular programming language called a hardware description language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), of which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are the most commonly used at present.
It will also be apparent to those skilled in the art that hardware circuitry that implements the logical method flows can be readily obtained by merely slightly programming the method flows into an integrated circuit using the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, or an embedded microcontroller; examples of such microcontrollers include, but are not limited to, the ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A memory controller may also be implemented as part of the control logic of the memory. Those skilled in the art will also appreciate that, besides implementing the controller purely as computer-readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means contained within it for performing various functions may likewise be regarded as structures within that hardware component. Indeed, means for performing the functions may be regarded both as software modules implementing the method and as structures within a hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The description has been presented with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the description. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory on a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the system embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and reference may be made to the partial description of the method embodiment for relevant points.
The above description is only an example of the present disclosure, and is not intended to limit the present disclosure. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (10)

1. A method of model deployment, comprising:
determining each network branch which needs to be replaced and is contained in a preset model as each target network branch;
according to the initial network parameters corresponding to the target network branches, determining network parameters corresponding to the designated network branches after the assumption that the target network branches are equivalently replaced by the designated network branches is made;
updating the initial network parameters according to the network parameters corresponding to the specified network branches, and performing pseudo-quantization processing on the updated initial network parameters;
and equivalently replacing each target network branch contained in the pseudo-quantized model with a specified network branch, and deploying the model according to the replaced model.
2. The method of claim 1, wherein before equivalently replacing each target network branch contained in the pseudo-quantized model with a designated network branch, the method further comprises:
training the model after pseudo-quantization to obtain an optimized model;
performing model deployment according to the replaced model, specifically comprising:
and carrying out model deployment on the optimized model.
3. The method of claim 1 or 2, wherein prior to model deployment according to the replaced model, the method further comprises:
carrying out quantization processing on the network parameters corresponding to the specified network branches;
performing model deployment according to the replaced model, specifically comprising:
and deploying the quantized model.
4. The method of claim 2, wherein training the model after the pseudo-quantization process to obtain an optimized model comprises:
acquiring input data;
inputting the input data into the model after pseudo-quantization processing, and determining an output result corresponding to the input data;
and training the model after the pseudo-quantization processing by taking the minimized deviation between the output result and the actual label corresponding to the input data as an optimization target to obtain the optimized model.
5. The method of claim 4, wherein obtaining input data specifically comprises:
acquiring initial input data;
and carrying out pseudo quantization processing on the initial input data to obtain the input data.
6. The method of claim 4, wherein inputting the input data into the model after the pseudo quantization process, and determining an output result corresponding to the input data specifically comprises:
determining data distribution corresponding to the input data according to the input data;
and carrying out regularization processing on the input data according to the data distribution, and determining an output result corresponding to the input data according to the processed input data.
7. The method of claim 4, wherein the method further comprises:
determining data distribution corresponding to historical input data as historical data distribution;
updating the historical data distribution according to the input data to obtain updated data distribution;
deploying the optimized model, specifically comprising:
and deploying the optimized model according to the updated data distribution.
8. An apparatus for model deployment, comprising:
the first determining module is used for determining each network branch which needs to be replaced and is contained in the preset model as each target network branch;
the second determining module is used for determining the network parameters corresponding to the specified network branches after the assumption that the target network branches are equivalently replaced by the specified network branches is made according to the initial network parameters corresponding to the target network branches;
the processing module updates the initial network parameters according to the network parameters corresponding to the specified network branches and performs pseudo-quantization processing on the updated initial network parameters;
and the deployment module is used for equivalently replacing each target network branch contained in the model subjected to the pseudo-quantization processing with a specified network branch and performing model deployment according to the replaced model.
9. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 7 when executing the program.
CN202210651363.6A 2022-06-09 2022-06-09 Model deployment method and device Pending CN115034367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210651363.6A CN115034367A (en) 2022-06-09 2022-06-09 Model deployment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210651363.6A CN115034367A (en) 2022-06-09 2022-06-09 Model deployment method and device

Publications (1)

Publication Number Publication Date
CN115034367A true CN115034367A (en) 2022-09-09

Family

ID=83123198

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210651363.6A Pending CN115034367A (en) 2022-06-09 2022-06-09 Model deployment method and device

Country Status (1)

Country Link
CN (1) CN115034367A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116028069A (en) * 2023-02-07 2023-04-28 之江实验室 Model deployment method and device, storage medium and electronic equipment


Similar Documents

Publication Publication Date Title
CN107957989B (en) Cluster-based word vector processing method, device and equipment
CN113011483B (en) Method and device for model training and business processing
CN115981870A (en) Data processing method and device, storage medium and electronic equipment
CN114418128B (en) Model deployment method and device
CN115034367A (en) Model deployment method and device
CN114546973A (en) Method and device for converting model parameters
CN107391564B (en) Data conversion method and device and electronic equipment
CN113792889B (en) Model updating method, device and equipment
CN117194992B (en) Model training and task execution method and device, storage medium and equipment
CN115545572B (en) Method, device, equipment and storage medium for business wind control
CN111507726B (en) Message generation method, device and equipment
CN116402165B (en) Operator detection method and device, storage medium and electronic equipment
CN116402113B (en) Task execution method and device, storage medium and electronic equipment
CN116308738B (en) Model training method, business wind control method and device
CN116304704A (en) Model training method and device, storage medium and electronic equipment
CN115543945A (en) Model compression method and device, storage medium and electronic equipment
CN115221523A (en) Data processing method, device and equipment
CN110704742B (en) Feature extraction method and device
CN113887719A (en) Model compression method and device
CN111984247A (en) Service processing method and device and electronic equipment
CN109614388B (en) Budget deduction method and device
CN116996397B (en) Network packet loss optimization method and device, storage medium and electronic equipment
CN115880527A (en) Model compression method and device, storage medium and electronic equipment
CN114861665A (en) Method and device for training reinforcement learning model and determining data relation
CN115269562B (en) Database management method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination