CN116167431A - Service processing method and device based on hybrid precision model acceleration - Google Patents


Info

Publication number
CN116167431A
CN116167431A (application number CN202310454434.8A)
Authority
CN
China
Prior art keywords
model
service
network layer
adjusted
neural network
Prior art date
Legal status (assumption, not a legal conclusion)
Granted
Application number
CN202310454434.8A
Other languages
Chinese (zh)
Other versions
CN116167431B (en)
Inventor
朱闻韬
李少杰
黄海亮
Current Assignee (listed assignee may be inaccurate)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310454434.8A priority Critical patent/CN116167431B/en
Publication of CN116167431A publication Critical patent/CN116167431A/en
Application granted granted Critical
Publication of CN116167431B publication Critical patent/CN116167431B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Telephonic Communication Services (AREA)
  • Stored Programmes (AREA)

Abstract

The specification discloses a service processing method and device based on hybrid precision model acceleration. First, sample data and a pre-trained service model are acquired. The sample data is input into the service model to obtain a standard result. The service model then undergoes model framework conversion to obtain a model to be adjusted. Next, for each network layer of the model to be adjusted, the parameter precision corresponding to that layer is adjusted, under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after the adjustment meets a preset condition. A target model is then obtained and deployed. Finally, after service data is received, it is input into the target model to obtain an output result, according to which service processing is executed. The method can improve the inference efficiency of a deep learning model while guaranteeing the accuracy of its output.

Description

Service processing method and device based on hybrid precision model acceleration
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a service based on hybrid precision model acceleration.
Background
With the development of deep learning, developers can improve the accuracy of a deep learning model's output by training it, and can also improve its efficiency by optimizing its inference process. Model inference refers to the process of computing an output from input data with a pre-trained deep learning model.
At present, most deep learning models store their parameters in the high-precision FP32 or FP64 formats during training. However, high-precision model parameters make the model's computation very expensive, and the calculation process takes a long time.
Existing optimizations of the inference process convert all of a deep learning model's parameters to FP16 or INT8 precision to improve inference efficiency. However, this necessarily changes the model's output, reducing the accuracy of its results.
Therefore, how to improve the inference efficiency of a deep learning model while guaranteeing the accuracy of its output is a problem to be solved urgently.
Disclosure of Invention
The present disclosure provides a method, an apparatus, a storage medium, and an electronic device for processing services based on hybrid precision model acceleration, so as to partially solve the foregoing problems in the prior art.
The technical scheme adopted in the specification is as follows:
the specification provides a business processing method based on hybrid precision model acceleration, which comprises the following steps:
acquiring sample data and a pre-trained service model;
inputting the sample data into the service model to obtain an output result corresponding to the service model, wherein the output result is used as a standard result;
performing model framework conversion on the service model to obtain a model to be adjusted;
for each network layer of the model to be adjusted, adjusting the parameter precision corresponding to the network layer, under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting that layer's parameter precision meets a preset condition, wherein the parameter precision corresponding to the network layer represents the precision required for the data input and/or output by the network layer, and the parameter precision corresponding to the network layer before adjustment is higher than the parameter precision corresponding to the network layer after adjustment;
After the parameter precision corresponding to each network layer contained in the model to be adjusted is adjusted, a target model is obtained and deployed;
after receiving the service data, inputting the service data into the target model to obtain an output result aiming at the service data, and executing service processing according to the output result aiming at the service data.
Optionally, performing model framework conversion on the service model to obtain a model to be adjusted specifically includes:
converting the service model to obtain an open neural network exchange model corresponding to the service model;
and converting the open neural network exchange model to obtain the model to be adjusted.
Optionally, converting the service model to obtain an open neural network exchange model corresponding to the service model specifically includes:
analyzing the service model to obtain parameter information of each neural network layer and an operation relation among the neural network layers in the service model;
and converting the parameter information of each neural network layer and the operation relation among the neural network layers in the service model to obtain an open neural network exchange model corresponding to the service model.
Optionally, before converting the service model to obtain an open neural network exchange model corresponding to the service model, the method further includes:
determining each operation name in the service model through a model visualization tool;
determining, from the operation names in the service model, operation names that are not supported by the open neural network exchange model;
replacing the operations that are not supported by the open neural network exchange model to obtain a replaced service model;
converting the service model to obtain an open neural network exchange model corresponding to the service model then specifically includes:
converting the replaced service model to obtain an open neural network exchange model corresponding to the replaced service model.
Optionally, converting the replaced service model to obtain an open neural network exchange model corresponding to the replaced service model specifically includes:
determining a data size corresponding to service data and a target processor, wherein the target processor is used for executing operations required by the model to be adjusted;
and converting the replaced service model according to the data size corresponding to the service data, the target processor and the operation relation among the neural network layers to obtain an open neural network exchange model corresponding to the replaced service model.
Optionally, converting the open neural network exchange model to obtain a model to be adjusted, which specifically includes:
acquiring a processor version corresponding to the target processor, a model version corresponding to the service model, and a model version corresponding to the open neural network exchange model;
determining a model version corresponding to the model to be adjusted according to the processor version corresponding to the target processor, the model version corresponding to the service model, and the model version corresponding to the open neural network exchange model;
and converting the open neural network exchange model according to the model version corresponding to the model to be adjusted to obtain the model to be adjusted.
Optionally, converting the open neural network exchange model to obtain a model to be adjusted, which specifically includes:
storing the serialized data of the open neural network exchange model to obtain a model file corresponding to the open neural network exchange model;
and converting the open neural network exchange model according to a model file corresponding to the open neural network exchange model to obtain a model to be adjusted.
Optionally, the model visualization tool includes: netron visualization tool.
Optionally, for each network layer of the model to be adjusted in turn, adjusting the parameter precision corresponding to the network layer under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting that layer's parameter precision meets a preset condition specifically includes:
for each network layer of the model to be adjusted in turn, if the deviation between the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer and the standard result is smaller than a set threshold, determining that the preset condition is met, retaining the adjusted parameter precision corresponding to the network layer, and taking the model obtained after the adjustment as the model to be adjusted for the next network layer.
Optionally, for each network layer of the model to be adjusted in turn, adjusting the parameter precision corresponding to the network layer under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting that layer's parameter precision meets a preset condition specifically includes:
for each network layer of the model to be adjusted in turn, if the deviation between the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer and the standard result is not smaller than the set threshold, determining that the preset condition is not met, and not adjusting the parameter precision corresponding to the network layer.
The present specification provides a service processing device based on hybrid precision model acceleration, including:
the acquisition module is used for acquiring sample data and a pre-trained service model;
the input module is used for inputting the sample data into the service model to obtain an output result corresponding to the service model, and the output result is used as a standard result;
the conversion module is used for performing model framework conversion on the service model to obtain a model to be adjusted;
the adjustment module is used for, for each network layer of the model to be adjusted in turn, adjusting the parameter precision corresponding to the network layer under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting that layer's parameter precision meets a preset condition, wherein the parameter precision corresponding to the network layer represents the precision required for the data input and/or output by that layer, and the parameter precision before adjustment is higher than the parameter precision after adjustment;
the deployment module is used for obtaining a target model after the parameter precision corresponding to each network layer contained in the model to be adjusted is adjusted, and deploying the target model;
And the execution module is used for inputting the service data into the target model after receiving the service data, obtaining an output result aiming at the service data, and executing service processing according to the output result aiming at the service data.
Optionally, the conversion module is specifically configured to convert the service model to obtain an open neural network exchange model corresponding to the service model, and to convert the open neural network exchange model to obtain the model to be adjusted.
Optionally, the conversion module is specifically configured to parse the service model to obtain parameter information of each neural network layer in the service model and an operational relationship between the neural network layers, and convert the parameter information of each neural network layer in the service model and the operational relationship between the neural network layers to obtain an open neural network exchange model corresponding to the service model.
The present specification provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the above-described hybrid precision model acceleration-based business processing method.
The present specification provides an electronic device including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the above-described hybrid precision model acceleration-based business processing method when executing the program.
The above-mentioned at least one technical scheme that this specification adopted can reach following beneficial effect:
In the service processing method based on hybrid precision model acceleration provided in this specification, sample data and a pre-trained service model are first acquired. The sample data is input into the service model, and the corresponding output is taken as a standard result. The service model then undergoes model framework conversion to obtain a model to be adjusted. Next, for each network layer of the model to be adjusted, the parameter precision corresponding to that layer is adjusted under the constraint that the deviation between the standard result and the result output for the sample data by the adjusted model meets a preset condition, where the parameter precision corresponding to a network layer represents the precision required for the data input and/or output by that layer, and the precision before adjustment is higher than the precision after adjustment. After the parameter precision of each network layer contained in the model to be adjusted has been adjusted, a target model is obtained and deployed. Finally, after service data is received, it is input into the target model to obtain an output result, and service processing is executed according to that result.
In this method, each network layer's parameter precision is adjusted individually, under the constraint that the adjusted model's output for the sample data stays close to the standard result. Once every layer has been processed, the resulting target model is deployed and incoming service data is processed with it. The method can therefore improve the inference efficiency of a deep learning model while guaranteeing the accuracy of its output.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification, illustrate and explain the exemplary embodiments of the present specification and their description, are not intended to limit the specification unduly. In the drawings:
Fig. 1 is a flow chart of a business processing method based on hybrid precision model acceleration according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of adjusting a service model according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a service processing device based on hybrid precision model acceleration according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present specification more apparent, the technical solutions of the present specification will be clearly and completely described below with reference to specific embodiments of the present specification and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present specification. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
The following describes in detail the technical solutions provided by the embodiments of the present specification with reference to the accompanying drawings.
Fig. 1 is a flow chart of a service processing method based on hybrid precision model acceleration according to an embodiment of the present disclosure, which specifically includes the following steps:
S100: sample data is acquired and a pre-trained business model is obtained.
In the embodiments of this specification, the executing entity of the service processing method based on hybrid precision model acceleration may be an electronic device such as a server or a desktop computer. For ease of description, the method is described below with a server as the executing entity.
In the present description embodiment, the server may obtain sample data and a pre-trained business model. The business model mentioned herein may be trained based on business requirements. Sample data as referred to herein may refer to data used to train a business model.
For example, if the service is the detection and identification of non-small cell lung cancer, the sample data consists of positron emission tomography (PET) images and computed tomography (CT) images, where the CT images correspond one-to-one to the PET images. The server may segment the lung region from a CT image with an image segmentation model (e.g., UNet), then determine the lung region in the corresponding PET image from the lung region in the CT image. The lung regions in the CT image and the PET image are then input into a non-small cell lung cancer detection model to obtain a detection result, and the model is trained with the optimization target of minimizing the deviation between the detection result and the label corresponding to the sample data. The trained non-small cell lung cancer detection model serves as the pre-trained service model for subsequent model adjustment.
S102: and inputting the sample data into the service model to obtain an output result corresponding to the service model, and taking the output result as a standard result.
In the embodiment of the present disclosure, the server may input the sample data into the service model, to obtain an output result corresponding to the service model, as a standard result.
S104: and performing model frame conversion on the service model to obtain a model to be adjusted.
In the embodiment of the present disclosure, the server may perform model framework conversion on the service model to obtain a to-be-adjusted model.
In practical applications, a large number of deep learning frameworks have been developed, each with its own advantages. However, there is no unified standard format among these frameworks, and they are not mutually compatible. When a neural network model built under one deep learning framework needs to be ported to another, a developer must re-implement it.
Based on this, the server can help developers convert deep learning models between different frameworks through the Open Neural Network Exchange (ONNX) format. The open neural network exchange model is an intermediate model representation used for conversion between different deep learning frameworks.
In the embodiment of the present disclosure, the server may convert the service model to obtain an open neural network exchange model corresponding to the service model.
Then, the server can convert the open neural network exchange model to obtain the model to be adjusted.
The deep learning framework of the service model mentioned here may be PyTorch, and the framework of the model to be adjusted may be TensorRT.
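The role of an intermediate representation such as ONNX can be illustrated with a toy sketch in plain Python (this is not the real ONNX API; the node-dict schema and function names here are invented for illustration). Each framework-specific layer graph is mapped into a neutral node list, so a target framework only needs a converter from that neutral form rather than from every other framework:

```python
# Toy sketch of an intermediate representation (IR) for model conversion.
# Real ONNX stores a protobuf graph; here a neutral list of node dicts
# stands in for it. Names are illustrative, not the ONNX schema.

def source_to_ir(source_layers):
    """Map framework-specific layer records into neutral IR nodes."""
    ir = []
    for layer in source_layers:
        ir.append({
            "op": layer["op_type"],           # e.g. "Conv", "Relu"
            "params": dict(layer["params"]),  # kernel shape, stride, padding...
            "inputs": list(layer["inputs"]),
            "outputs": list(layer["outputs"]),
        })
    return ir

def ir_to_target(ir):
    """Map neutral IR nodes into a target-framework layer list."""
    return [(node["op"], node["params"], node["inputs"], node["outputs"])
            for node in ir]

# A two-layer "model" in the source framework's own record format:
pytorch_like = [
    {"op_type": "Conv", "params": {"kernel": (3, 3), "stride": 1},
     "inputs": ["x"], "outputs": ["h"]},
    {"op_type": "Relu", "params": {}, "inputs": ["h"], "outputs": ["y"]},
]

ir = source_to_ir(pytorch_like)     # neutral form, framework-independent
target = ir_to_target(ir)           # target framework's own layer list
```

Because every converter goes through the neutral form, supporting N frameworks needs 2N converters instead of N² pairwise ones.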
Specifically, the server may analyze the service model to obtain the parameter information of each neural network layer and the operational relationships among the layers. The parameter information mentioned here may refer to model parameters such as convolution kernel shape, stride, padding, weights, and biases. The operational relationship referred to here may refer to the order of operations among the neural network layers.
And then, the server can convert the parameter information of each neural network layer and the operation relation among the neural network layers in the service model to obtain an open neural network exchange model corresponding to the service model.
In practical applications, service models contain more and more kinds of operations, and a service model may contain operations that the open neural network exchange model does not support. When that happens, the service model cannot be converted into an open neural network exchange model. Based on this, the server can check whether the service model contains unsupported operations, replace any such operations, and then convert the replaced service model to obtain an open neural network exchange model.
In the embodiment of the present specification, the server may determine each operation name in the service model through a model visualization tool. The model visualization tool referred to here may be Netron, a visualization tool for neural network, deep learning, and machine learning models that generates a descriptive visual graph of a model's architecture, showing information such as each neural network layer's input and output names and the dimensions of its input and output data.
Second, the server may determine, from the operation names in the service model, those that are not supported by the open neural network exchange model.
Specifically, the server may match each operation name in the service model against the operation names supported by the open neural network exchange model to determine which are unsupported.
Then, the server can replace the operations not supported by the open neural network exchange model to obtain a replaced service model. For example, tensor.view() in the model code is replaced with tensor.reshape(), torch.floor_divide() is replaced with the "/" division operation, and slice assignment is replaced with torch.cat(). Note that the replacement here is an equivalent substitution: the operation after replacement achieves the same effect as the operation before replacement.
Finally, the server can convert the replaced service model to obtain an open neural network exchange model corresponding to it.
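The operator-support check and replacement described above reduce to name matching against a supported-operator set. A minimal sketch follows; the SUPPORTED set and REPLACEMENTS table are illustrative stand-ins for the real ONNX operator registry and the hand-written equivalent rewrites, not actual ONNX data:

```python
# Sketch of the unsupported-operator check and equivalent-replacement step.
# SUPPORTED and REPLACEMENTS are illustrative stand-ins, not the real
# ONNX operator registry.

SUPPORTED = {"Conv", "Relu", "Reshape", "Concat", "Div"}

# Equivalent rewrites, mirroring e.g. tensor.view() -> tensor.reshape():
REPLACEMENTS = {"View": "Reshape", "FloorDivide": "Div", "SliceAssign": "Concat"}

def find_unsupported(op_names, supported=SUPPORTED):
    """Return the operator names the exchange format cannot express."""
    return [name for name in op_names if name not in supported]

def replace_ops(op_names, replacements=REPLACEMENTS):
    """Apply equivalent replacements so every operator becomes exportable."""
    return [replacements.get(name, name) for name in op_names]

model_ops = ["Conv", "View", "Relu", "FloorDivide"]
unsupported = find_unsupported(model_ops)  # ops that would block export
exportable = replace_ops(model_ops)        # every name now in SUPPORTED
```

After replacement, a second support check over the rewritten operator list should come back empty, which is exactly the condition for the conversion to succeed.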
In practical application, before converting the service model, the server needs to determine the data size corresponding to the service data and a processor for processing the service data, so as to be used in the conversion process of the service model.
In the embodiment of the present disclosure, the server may determine a data size corresponding to the service data and a target processor, where the target processor may be used to perform operations required by the model to be adjusted. The target processor referred to herein may refer to a graphics processor (Graphics Processing Unit, GPU), a central processing unit (Central Processing Unit, CPU), or the like. The server may determine the required processors based on the traffic demand.
And then, the server can convert the replaced service model according to the data size corresponding to the service data, the target processor and the operation relation among the neural network layers to obtain an open neural network exchange model corresponding to the replaced service model.
In practical applications, with the development of deep learning technology, there are various versions of the service model, the open neural network exchange model and the model to be adjusted. The different versions of the business model, the open neural network switching model, and the model to be tuned may not be compatible, which may result in the business model eventually failing to be converted into the model to be tuned.
Based on the above, the server may obtain the version corresponding to the service model and the version corresponding to the open neural network exchange model, and determine the version of the model to be adjusted from them.
In the embodiment of the present disclosure, the server may obtain the processor version corresponding to the target processor, the model version corresponding to the service model, and the model version corresponding to the open neural network exchange model. A processor version here may refer to the versions of the processing libraries the processor uses, for example the CUDA version and the cuDNN version.
Then, the server can determine the model version corresponding to the model to be adjusted according to the processor version corresponding to the target processor, the model version corresponding to the service model and the model version corresponding to the open neural network exchange model.
And finally, the server can convert the open neural network exchange model according to the model version corresponding to the model to be adjusted to obtain the model to be adjusted.
In the embodiment of the present disclosure, the server may store the serialized data of the open neural network exchange model to obtain a model file corresponding to it.
Then, the server can convert the open neural network exchange model according to this model file to obtain the model to be adjusted.
For example, if the model to be adjusted is TensorRT, the server may create an IBuilder pointer storing the builder instance through the createInfo builder interface, and create an empty network object.
Then, the server may call createpraser to create a parser object corresponding to the open neural network switching model, so as to store model parameters corresponding to the open neural network switching model.
Then, the server can call the parseFromFile function to fill the network object with the model parameters held in the parser object. The data in the open neural network exchange model is thereby parsed into TensorRT data, and the number of operation layers corresponding to TensorRT in the network object is obtained.
Finally, the server may build an ICudaEngine with the builder to obtain the TensorRT model.
The server can then set the parameter precision of the TensorRT model. TensorRT defaults to FP32, and the server can set the parameter precision to FP16 or INT8. FP32 referred to herein may refer to the 32-bit single-precision floating-point format, FP16 to the 16-bit half-precision floating-point format, and INT8 to the 8-bit fixed-point integer format.
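The effect of lowering parameter precision can be illustrated with a short numpy sketch. The symmetric per-tensor INT8 quantizer below is one common scheme chosen for illustration; the disclosure does not prescribe a particular quantizer.

```python
import numpy as np

# Casting FP32 values to FP16 or INT8 changes them; this is why the method
# checks each layer's output deviation before keeping a lower precision.
x = np.float32(0.1)
fp16_error = abs(float(np.float16(x)) - float(x))  # half-precision rounding

# Simple symmetric per-tensor INT8 quantization (illustrative scheme only).
w = np.array([0.5, -1.25, 2.0], dtype=np.float32)
scale = float(np.abs(w).max()) / 127.0             # map max |w| to 127
w_int8 = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
w_restored = w_int8.astype(np.float32) * scale     # dequantized values
int8_error = float(np.abs(w - w_restored).max())   # worst-case rounding loss
```

Both `fp16_error` and `int8_error` are small but nonzero, which is exactly the deviation the per-layer check in S106 bounds against the standard result.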
Further, the server may also set parameters such as the number of samples processed in a single pass (the batch size), the maximum workspace size, and so on.
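The build flow described above might be sketched as follows with the TensorRT Python API (whose names parallel the C++ createInferBuilder / parser / ICudaEngine flow). This is a hedged illustration that assumes a working TensorRT installation with ONNX parser support; the function name is hypothetical.

```python
def build_trt_engine(onnx_path: str, use_fp16: bool = True):
    """Sketch: parse an ONNX model file and build a serialized TensorRT
    engine. Requires an NVIDIA GPU stack, so tensorrt is imported lazily."""
    import tensorrt as trt  # assumption: TensorRT Python bindings installed

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)                    # builder instance
    flags = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    network = builder.create_network(flags)          # empty network object
    parser = trt.OnnxParser(network, logger)         # parser object
    if not parser.parse_from_file(onnx_path):        # fill the network object
        raise RuntimeError("failed to parse the ONNX model")

    config = builder.create_builder_config()
    if use_fp16 and builder.platform_has_fast_fp16:
        config.set_flag(trt.BuilderFlag.FP16)        # default precision is FP32
    # Cap the scratch memory the builder may use (maximum workspace size).
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
    return builder.build_serialized_network(network, config)
```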
S106: for each network layer of the model to be adjusted in sequence, adjusting the parameter precision corresponding to the network layer under the constraint that the deviation between the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer and the standard result meets a preset condition, wherein the parameter precision corresponding to the network layer is used for representing the parameter precision required by the data input and/or output by the network layer, and the parameter precision corresponding to the network layer before adjustment is greater than the parameter precision corresponding to the network layer after adjustment.
S108: after the parameter precision corresponding to each network layer contained in the model to be adjusted has been adjusted, obtaining a target model and deploying the target model.
S110: after receiving the service data, inputting the service data into the target model to obtain an output result aiming at the service data, and executing service processing according to the output result aiming at the service data.
In practical application, the existing way to optimize the inference process of a deep learning model is to convert all of its model parameters to FP16 or INT8 precision so as to improve inference efficiency. However, this necessarily changes the output of the deep learning model and reduces the accuracy of its output result.
Based on the above, the server may sequentially adjust the parameter precision corresponding to each network layer of the model to be adjusted, and use the deviation between the result the adjusted model outputs for the sample data and the standard result to decide whether to keep the model obtained after the adjustment.
In this embodiment of the present disclosure, for each network layer of the model to be adjusted, the server may adjust the parameter precision corresponding to the network layer under the constraint that the deviation between the result output for the sample data by the model obtained after the adjustment and the standard result meets a preset condition. The parameter precision corresponding to a network layer represents the precision required by the data input to and/or output by that network layer, and the precision before adjustment is greater than the precision after adjustment.
The server can then obtain a target model after the parameter precision corresponding to each network layer included in the model to be adjusted has been adjusted, and deploy the target model.
Finally, the server may input the service data to the target model after receiving the service data, obtain an output result for the service data, and perform service processing according to the output result for the service data.
Specifically, for each network layer of the model to be adjusted in sequence, if the deviation between the result output for the sample data by the model obtained after adjusting that layer's parameter precision and the standard result is smaller than a set threshold, the server determines that the preset condition is met, keeps the adjusted parameter precision of that network layer, and uses the resulting model as the model to be adjusted for the next network layer.
If the deviation between the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer and the standard result is not smaller than the set threshold, the server determines that the preset condition is not met and leaves the parameter precision of that network layer unchanged.
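The greedy per-layer adjustment of S106 can be simulated end to end with a toy model. Everything here (the three-layer matmul-plus-ReLU model, layer sizes, and the 1e-2 threshold) is illustrative rather than taken from the disclosure; the loop structure is what matters.

```python
import numpy as np

rng = np.random.default_rng(0)
layers = [rng.standard_normal((8, 8)).astype(np.float32) for _ in range(3)]
x = rng.standard_normal((1, 8)).astype(np.float32)  # sample data

def forward(weights, inp):
    """Run the toy model: a chain of matmul + ReLU layers."""
    out = inp
    for w in weights:
        out = np.maximum(out @ w.astype(np.float32), 0.0)
    return out

standard_result = forward(layers, x)  # FP32 output = "standard result"
threshold = 1e-2                      # set threshold for the deviation
kept_fp16 = []                        # indices of layers kept at FP16

for i in range(len(layers)):
    trial = list(layers)
    trial[i] = layers[i].astype(np.float16)  # lower this layer's precision
    deviation = float(np.abs(forward(trial, x) - standard_result).max())
    if deviation < threshold:   # preset condition met: keep the adjustment
        layers = trial
        kept_fp16.append(i)
    # otherwise the layer stays at FP32 and the search moves on

target_model = layers  # mixed-precision "target model"
```

By construction, the final mixed-precision model's deviation from the standard result stays below the threshold, mirroring the keep-or-revert decision described above.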
Fig. 2 is a schematic flow chart of adjusting a service model according to an embodiment of the present disclosure.
In fig. 2, the server may obtain sample data as well as a pre-trained business model.
Secondly, the server can input the sample data into the service model to obtain an output result corresponding to the service model as a standard result.
Then, the server can convert the service model to obtain an open neural network exchange model corresponding to the service model, and then convert the open neural network exchange model to obtain a model to be adjusted.
Then, the server may adjust the parameter precision corresponding to the first network layer of the model to be adjusted, input the sample data into the resulting model, and obtain the result it outputs for the sample data. If the deviation between that result and the standard result is smaller than the set threshold, the server determines that the preset condition is met and keeps the adjusted parameter precision of the first network layer.
Next, with the adjusted precision of the first network layer retained, the server adjusts the parameter precision of the second network layer, inputs the sample data into the resulting model, and obtains the result it outputs for the sample data. If the deviation between that result and the standard result is not smaller than the set threshold, the server determines that the preset condition is not met and leaves the second network layer's parameter precision unchanged. Proceeding in the same way through every network layer included in the model to be adjusted yields the target model, which is then deployed.
Finally, after receiving service data, the server may input the service data into the target model, obtain an output result for the service data, and perform service processing according to that output result. In this way, the inference efficiency of the service model is improved while the accuracy of its output result is preserved.
As can be seen from the above process, the method sequentially adjusts the parameter precision corresponding to each network layer of the model to be adjusted, taking as a constraint the deviation between the result output for the sample data by the model obtained after each adjustment and the standard result. After the parameter precision of every network layer included in the model to be adjusted has been adjusted, a target model is obtained and deployed. Finally, after service data is received, it is input into the target model to obtain an output result, and service processing is performed according to that result. The method can thus improve the inference efficiency of a deep learning model while preserving the accuracy of its output result.
Based on the same concept as the service processing method based on hybrid precision model acceleration described above, one or more embodiments of the present disclosure further provide a corresponding service processing device based on hybrid precision model acceleration, as shown in fig. 3.
Fig. 3 is a schematic structural diagram of a service processing device based on hybrid precision model acceleration according to an embodiment of the present disclosure, which specifically includes:
an acquisition module 300, configured to acquire sample data and a pre-trained service model;
the input module 302 is configured to input the sample data into the service model, and obtain an output result corresponding to the service model as a standard result;
the conversion module 304 is configured to perform model frame conversion on the service model to obtain a model to be adjusted;
the adjustment module 306 is configured to adjust, for each network layer of the model to be adjusted, the parameter precision corresponding to the network layer under the constraint that the deviation between the result output for the sample data by the model obtained after the adjustment and the standard result meets a preset condition, where the parameter precision corresponding to a network layer represents the precision required by the data input to and/or output by that network layer, and the precision before adjustment is greater than the precision after adjustment;
The deployment module 308 is configured to obtain a target model after the parameter precision corresponding to each network layer included in the model to be adjusted is adjusted, and deploy the target model;
and the execution module 310 is configured to input the service data to the target model after receiving the service data, obtain an output result for the service data, and execute service processing according to the output result for the service data.
Optionally, the conversion module 304 is specifically configured to convert the service model to obtain an open neural network exchange model corresponding to the service model, and convert the open neural network exchange model to obtain a model to be adjusted.
Optionally, the conversion module 304 is specifically configured to parse the service model to obtain the parameter information of each neural network layer and the operational relationships between the neural network layers in the service model, and convert the parameter information of each neural network layer and the operational relationships between the neural network layers to obtain an open neural network exchange model corresponding to the service model.
Optionally, the conversion module 304 is further specifically configured to determine each operation name in the service model through a model visualization tool, determine, from the operation names in the service model, the operation names not supported by an open neural network exchange model, replace the unsupported operations to obtain a replaced service model, and convert the replaced service model to obtain an open neural network exchange model corresponding to the replaced service model.
Optionally, the conversion module 304 is specifically configured to determine the data size corresponding to the service data and a target processor, where the target processor is configured to execute the operations required by the model to be adjusted, and convert the replaced service model according to the data size corresponding to the service data, the target processor, and the operational relationships between the neural network layers, so as to obtain an open neural network exchange model corresponding to the replaced service model.
Optionally, the conversion module 304 is specifically configured to obtain a processor version corresponding to the target processor, a model version corresponding to the service model, and a model version corresponding to the open neural network exchange model, determine the model version corresponding to the model to be adjusted according to these versions, and convert the open neural network exchange model according to the model version corresponding to the model to be adjusted to obtain the model to be adjusted.
Optionally, the conversion module 304 is specifically configured to store the serialized data of the open neural network exchange model to obtain a corresponding model file, and convert the open neural network exchange model according to that model file to obtain a model to be adjusted.
Optionally, the model visualization tool includes: netron visualization tool.
Optionally, the adjustment module 306 is specifically configured to, for each network layer of the model to be adjusted in sequence, determine that the preset condition is met if the deviation between the result output for the sample data by the model obtained after adjusting that layer's parameter precision and the standard result is smaller than a set threshold, keep the adjusted parameter precision of that network layer, and use the resulting model as the model to be adjusted for the next network layer.
Optionally, the adjustment module 306 is specifically configured to, for each network layer of the model to be adjusted in sequence, determine that the preset condition is not met if the deviation between the result output for the sample data by the model obtained after adjusting that layer's parameter precision and the standard result is not smaller than the set threshold, and leave that network layer's parameter precision unchanged.
The present specification also provides a computer readable storage medium storing a computer program, where the computer program is operable to perform the service processing method based on hybrid precision model acceleration provided in fig. 1 above.
The present specification also provides a schematic structural diagram of the electronic device shown in fig. 4. At the hardware level, as shown in fig. 4, the electronic device includes a processor, an internal bus, a network interface, a memory, and non-volatile storage, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and runs it to implement the service processing method based on hybrid precision model acceleration provided in fig. 1 above.
Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present description, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.
In the 1990s, an improvement to a technology could clearly be distinguished as an improvement in hardware (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or an improvement in software (an improvement to a method flow). With the development of technology, however, many improvements to method flows can now be regarded as direct improvements to hardware circuit structures. Designers almost always obtain a corresponding hardware circuit structure by programming an improved method flow into a hardware circuit. Therefore, it cannot be said that an improvement of a method flow cannot be realized by a hardware entity module. For example, a programmable logic device (Programmable Logic Device, PLD) (e.g., a field programmable gate array (Field Programmable Gate Array, FPGA)) is an integrated circuit whose logic function is determined by the user's programming of the device. A designer programs to "integrate" a digital system onto a PLD without requiring the chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually manufacturing integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to the software compiler used in program development; the source code to be compiled is written in a specific programming language called a hardware description language (Hardware Description Language, HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), with VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog currently the most commonly used.
It will also be apparent to those skilled in the art that a hardware circuit implementing the logic method flow can be readily obtained by merely slightly programming the method flow into an integrated circuit using several of the hardware description languages described above.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer readable medium storing computer readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, or an embedded microcontroller. Examples of such controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer readable program code, the same functionality can be achieved by logically programming the method steps so that the controller takes the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may thus be regarded as a hardware component, and the means included therein for performing various functions may also be regarded as structures within the hardware component. The means for achieving the various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present specification.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present description is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random Access Memory (RAM) and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description can take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The foregoing is merely exemplary of the present disclosure and is not intended to limit the disclosure. Various modifications and alterations to this specification will become apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of the present description, are intended to be included within the scope of the claims of the present description.

Claims (15)

1. A service processing method based on hybrid precision model acceleration, characterized by comprising the following steps:
acquiring sample data and a pre-trained service model;
inputting the sample data into the service model to obtain an output result corresponding to the service model, wherein the output result is used as a standard result;
performing model frame conversion on the service model to obtain a model to be adjusted;
for each network layer of the model to be adjusted, adjusting the parameter precision corresponding to the network layer under the constraint that a deviation between a result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer and the standard result meets a preset condition, wherein the parameter precision corresponding to the network layer is used for representing the parameter precision required by the data input and/or output by the network layer, and the parameter precision corresponding to the network layer before adjustment is greater than the parameter precision corresponding to the network layer after adjustment;
after the parameter precision corresponding to each network layer contained in the model to be adjusted is adjusted, a target model is obtained and deployed;
after receiving the service data, inputting the service data into the target model to obtain an output result aiming at the service data, and executing service processing according to the output result aiming at the service data.
2. The method of claim 1, wherein performing model framework conversion on the service model to obtain a model to be adjusted specifically comprises:
converting the service model to obtain an open neural network exchange model corresponding to the service model;
and converting the open neural network exchange model to obtain a model to be adjusted.
3. The method of claim 2, wherein converting the service model to obtain an open neural network exchange model corresponding to the service model specifically comprises:
analyzing the service model to obtain parameter information of each neural network layer and an operation relation among the neural network layers in the service model;
and converting the parameter information of each neural network layer and the operation relation among the neural network layers in the service model to obtain an open neural network exchange model corresponding to the service model.
4. The method of claim 3, wherein prior to transforming the service model to obtain an open neural network switching model corresponding to the service model, the method further comprises:
determining each operation name in the service model through a model visualization tool;
determining operation names which are not supported by an open neural network exchange model from the operation names in the service model;
replacing the operations which are not supported by the open neural network exchange model to obtain a replaced service model;
converting the service model to obtain an open neural network exchange model corresponding to the service model specifically comprises the following steps:
converting the replaced service model to obtain an open neural network exchange model corresponding to the replaced service model.
5. The method of claim 4, wherein converting the replaced service model to obtain an open neural network exchange model corresponding to the replaced service model specifically comprises:
determining a data size corresponding to service data and a target processor, wherein the target processor is used for executing operations required by the model to be adjusted;
and converting the replaced service model according to the data size corresponding to the service data, the target processor and the operation relation among the neural network layers to obtain an open neural network exchange model corresponding to the replaced service model.
6. The method of claim 5, wherein converting the open neural network exchange model to obtain a model to be adjusted specifically comprises:
acquiring a processor version corresponding to the target processor, a model version corresponding to the service model, and a model version corresponding to the open neural network exchange model;
determining a model version corresponding to a model to be adjusted according to a processor version corresponding to the target processor, a model version corresponding to the service model and a model version corresponding to the open neural network exchange model;
and converting the open neural network exchange model according to the model version corresponding to the model to be adjusted to obtain the model to be adjusted.
7. The method of claim 2, wherein converting the open neural network exchange model to obtain the model to be adjusted specifically comprises:
storing serialized data of the open neural network exchange model to obtain a model file corresponding to the open neural network exchange model;
and converting the open neural network exchange model according to the model file corresponding to the open neural network exchange model, to obtain the model to be adjusted.
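Claim 7's serialize-then-convert step can be sketched as a file round trip. With the real `onnx` package this would be `onnx.save(model, path)` / `onnx.load(path)` over protobuf; the JSON stand-in below keeps the sketch dependency-free and is purely illustrative.

```python
import json

def save_exchange_model(model_dict, path):
    """Persist the exchange model's serialized data as a model file."""
    with open(path, "w") as f:
        json.dump(model_dict, f)

def convert_from_file(path):
    """Rebuild the graph from the model file; a real pipeline would hand
    this to a runtime-specific converter to produce the model to be adjusted."""
    with open(path) as f:
        return json.load(f)
```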
8. The method of claim 4, wherein the model visualization tool comprises: netron visualization tool.
9. The method of claim 1, wherein, for each network layer of the model to be adjusted, adjusting the parameter precision corresponding to the network layer under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer satisfies a preset condition, specifically comprises:
for each network layer of the model to be adjusted in sequence, if the deviation between the standard result and the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer is smaller than a set threshold, determining that the preset condition is satisfied, retaining the adjusted parameter precision corresponding to the network layer, and taking the model obtained after adjusting the parameter precision corresponding to the network layer as the model to be adjusted for the next network layer.
10. The method of claim 1, wherein, for each network layer of the model to be adjusted, adjusting the parameter precision corresponding to the network layer under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer satisfies a preset condition, specifically comprises:
for each network layer of the model to be adjusted in sequence, if the deviation between the standard result and the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer is not smaller than the set threshold, determining that the preset condition is not satisfied, and leaving the parameter precision corresponding to the network layer unadjusted.
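The accept/reject logic of claims 9 and 10 amounts to a greedy per-layer loop: lower one layer's precision, compare the model's output on the sample data against the full-precision standard result, and keep the change only if the deviation stays below the threshold. The toy model below simulates "lower precision" by rounding a layer's weight; a real deployment would cast weights or activations to FP16/INT8.

```python
def run_model(layers, x):
    """Toy forward pass: each layer is a single multiplicative weight."""
    for w in layers:
        x = x * w
    return x

def mixed_precision_adjust(layers, sample, threshold):
    """Greedy per-layer precision lowering gated by output deviation."""
    standard = run_model(layers, sample)       # full-precision standard result
    adjusted = list(layers)
    lowered = []                               # indices kept at low precision
    for i, w in enumerate(adjusted):
        trial = list(adjusted)
        trial[i] = round(w, 2)                 # stand-in for an FP32 -> FP16 cast
        deviation = abs(run_model(trial, sample) - standard)
        if deviation < threshold:
            adjusted = trial                   # claim 9: keep; base for next layer
            lowered.append(i)
        # claim 10: otherwise leave this layer at full precision
    return adjusted, lowered
```

Note the sequential dependence: once a layer's lowered precision is accepted, later layers are evaluated on top of the already-adjusted model, matching "taking the model obtained after adjustment as the model to be adjusted for the next network layer".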
11. A hybrid precision model acceleration-based service processing apparatus, comprising:
the acquisition module is used for acquiring sample data and a pre-trained service model;
the input module is used for inputting the sample data into the service model to obtain an output result corresponding to the service model, and the output result is used as a standard result;
the conversion module is used for carrying out model frame conversion on the service model to obtain a model to be adjusted;
the adjustment module is used for sequentially adjusting, for each network layer of the model to be adjusted, the parameter precision corresponding to the network layer, under the constraint that the deviation between the standard result and the result output for the sample data by the model obtained after adjusting the parameter precision corresponding to the network layer satisfies a preset condition, wherein the parameter precision corresponding to the network layer is used for representing the parameter precision required by the data input and/or output by the network layer, and the parameter precision corresponding to the network layer before adjustment is higher than the parameter precision corresponding to the network layer after adjustment;
the deployment module is used for obtaining a target model after the parameter precision corresponding to each network layer contained in the model to be adjusted has been adjusted, and deploying the target model;
and the execution module is used for inputting the service data into the target model after receiving the service data, obtaining an output result aiming at the service data, and executing service processing according to the output result aiming at the service data.
12. The apparatus of claim 11, wherein the conversion module is specifically configured to convert the service model to obtain an open neural network exchange model corresponding to the service model, and to convert the open neural network exchange model to obtain the model to be adjusted.
13. The apparatus of claim 11, wherein the conversion module is specifically configured to parse the service model to obtain parameter information of each neural network layer and an operational relationship between the neural network layers in the service model, and convert the parameter information of each neural network layer and the operational relationship between the neural network layers in the service model to obtain an open neural network exchange model corresponding to the service model.
14. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1-10.
15. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the method of any one of claims 1-10.
CN202310454434.8A 2023-04-25 2023-04-25 Service processing method and device based on hybrid precision model acceleration Active CN116167431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310454434.8A CN116167431B (en) 2023-04-25 2023-04-25 Service processing method and device based on hybrid precision model acceleration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310454434.8A CN116167431B (en) 2023-04-25 2023-04-25 Service processing method and device based on hybrid precision model acceleration

Publications (2)

Publication Number Publication Date
CN116167431A true CN116167431A (en) 2023-05-26
CN116167431B CN116167431B (en) 2023-08-04

Family

ID=86416748

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310454434.8A Active CN116167431B (en) 2023-04-25 2023-04-25 Service processing method and device based on hybrid precision model acceleration

Country Status (1)

Country Link
CN (1) CN116167431B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110717585A (en) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 Training method of neural network model, data processing method and related product
CN111445025A (en) * 2020-06-12 2020-07-24 支付宝(杭州)信息技术有限公司 Method and device for determining hyper-parameters of business model
CN111832693A (en) * 2019-04-16 2020-10-27 杭州海康威视数字技术股份有限公司 Neural network layer operation and model training method, device and equipment
WO2020233130A1 (en) * 2019-05-23 2020-11-26 深圳先进技术研究院 Deep neural network compression method and related device
CN114049530A (en) * 2021-10-20 2022-02-15 阿里巴巴(中国)有限公司 Hybrid precision neural network quantization method, device and equipment
CN114546973A (en) * 2022-02-23 2022-05-27 北京三快在线科技有限公司 Method and device for converting model parameters
CN114580610A (en) * 2022-01-30 2022-06-03 阿里巴巴(深圳)技术有限公司 Neural network model quantification method and device
WO2023272987A1 (en) * 2021-06-29 2023-01-05 深圳市商汤科技有限公司 Model recommendation method and apparatus, and device and computer storage medium
CN115688905A (en) * 2022-11-09 2023-02-03 阿里巴巴(中国)有限公司 Method for accelerating deep learning model
CN115937167A (en) * 2022-12-21 2023-04-07 凌云光技术股份有限公司 Method and device for detecting defects of battery pole piece ceramic and electronic equipment
CN115983349A (en) * 2023-02-10 2023-04-18 哲库科技(上海)有限公司 Method and device for quantizing convolutional neural network, electronic device and storage medium


Non-Patent Citations (4)

Title
JOSE NUNEZ-YANEZ et al.: "Sparse and dense matrix multiplication hardware for heterogeneous multi-precision neural networks", Array *
YAO Weiwei; ZHANG Jie: "Real-time detection of driver violations based on the YOLOv3-tiny algorithm improved with model pruning and half-precision acceleration", Computer Systems & Applications, no. 04 *
ZHANG Zhenglu, HUANG Quanyi, WEN Hongyan et al. (eds.): "Deformation Monitoring Analysis and Prediction in Engineering", Surveying and Mapping Press, pages 207-210 *
CHEN Peng; CHEN Qingqing; WANG Haixia; ZHANG Yilong; LIU Yipeng; LIANG Ronghua: "An optimization method for FPGA convolutional neural network accelerators based on improved dynamic configuration", High Technology Letters, no. 03 *

Also Published As

Publication number Publication date
CN116167431B (en) 2023-08-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant