CN110414679A - Model training method, device, electronic equipment and computer readable storage medium - Google Patents

Model training method, device, electronic equipment and computer readable storage medium

Info

Publication number
CN110414679A
CN110414679A
Authority
CN
China
Prior art keywords
network
floating
quantization
point
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910713733.2A
Other languages
Chinese (zh)
Inventor
陈宝林
吴志洋
龚秋棠
傅松林
张伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meitu Technology Co Ltd
Original Assignee
Xiamen Meitu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meitu Technology Co Ltd filed Critical Xiamen Meitu Technology Co Ltd
Priority to CN201910713733.2A priority Critical patent/CN110414679A/en
Publication of CN110414679A publication Critical patent/CN110414679A/en

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/04 - Inference or reasoning models
    • G06N 5/046 - Forward inferencing; Production systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a model training method and apparatus, an electronic device, and a computer-readable storage medium, relating to the technical field of data processing. The method includes reconstructing a trained floating-point model into a quantized network and a floating-point network; based on the same input data, executing forward inference of the floating-point network to obtain the output data of the floating-point network, and executing forward inference of the quantized network to obtain the output data of the quantized network. The loss between the output data of the floating-point network and the output data of the quantized network is then calculated, and it is judged whether the loss satisfies a set condition. If the set condition is not satisfied, the quantized network is iteratively trained based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition. Convenient training of a quantized model is thereby achieved.

Description

Model training method, device, electronic equipment and computer readable storage medium
Technical field
The present invention relates to the technical field of data processing, and in particular to a model training method and apparatus, an electronic device, and a computer-readable storage medium.
Background art
Models trained with deep neural networks, such as quantized models, have strong feature-extraction capability and can be applied to a variety of tasks such as object classification, target detection, and portrait segmentation. However, existing methods for training quantized models (networks) are relatively complicated, and their convenience needs to be improved.
Summary of the invention
In view of this, the purpose of the present invention is to provide a model training method and apparatus, an electronic device, and a computer-readable storage medium.
To achieve the above goals, the technical solutions adopted in the embodiments of the present invention are as follows:
In a first aspect, an embodiment of the present invention provides a model training method, comprising:
reconstructing a trained floating-point model into a quantized network and a floating-point network;
based on the same input data, executing forward inference of the floating-point network to obtain the output data of the floating-point network, and executing forward inference of the quantized network to obtain the output data of the quantized network;
calculating the loss between the output data of the floating-point network and the output data of the quantized network;
judging whether the loss satisfies a set condition, and if not, iteratively training the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
In an optional embodiment, the trained floating-point model is a floating-point Open Neural Network Exchange (ONNX) model, and the step of reconstructing the trained floating-point model into a quantized network and a floating-point network comprises:
parsing the floating-point model, and constructing weight nodes, operation nodes, input nodes, and output nodes based on the weights and per-layer output activation values of the floating-point model, wherein the weights are stored in the weight nodes and the per-layer output activation values are stored in the operation nodes;
constructing a static graph node of the network, and defining a network node module and operation-layer modules in the static graph node, wherein the weight nodes and operation nodes are stored in the network node module, and each operation-layer module is built from the parameters of the network node module;
constructing the floating-point network, adding each operation-layer module to the floating-point network, and determining the number of inputs for forward inference of the floating-point network according to the input nodes;
constructing the quantized network, adding each operation-layer module to the quantized network, and determining the number of inputs for forward inference of the quantized network according to the input nodes, wherein weights and activation values are quantized for the quantized network.
In an optional embodiment, the weights are quantized through the following steps:
determining the quantization interval of the weights as [w_min, w_max], where w_min = min(WeightMin, 0) and w_max = WeightMax, WeightMin being the minimum value of the current layer's weights and WeightMax being the maximum value of the current layer's weights;
determining the quantization scale factor of the weights as S_w = (w_max - w_min) / 255;
quantizing a weight by the formula q_w = int((r - w_min) / S_w + 0.5), where q_w is fixed-point data in [0, 255] and r is a floating-point weight in the floating-point model;
dequantizing the quantized weight back to floating-point data by the formula f_w = S_w * q_w + w_min, where f_w is the floating-point data.
In an optional embodiment, the activation values are quantized through the following steps:
determining the quantization interval of the activation values as [act_min, act_max], where act_min = min(ActMin, 0) and act_max = ActMax, ActMin being the minimum value of the current layer's activation values and ActMax being the maximum value of the current layer's activation values;
determining the quantization scale factor of the activation values as S_act = (act_max - act_min) / 255;
quantizing an activation value by the formula q_act = int((R - act_min) / S_act + 0.5), where q_act is fixed-point data in [0, 255] and R is the floating-point data output by the floating-point model for each layer's input data;
dequantizing the quantized activation value back to floating-point data by the formula f_act = S_act * q_act + act_min, where f_act is the floating-point data.
In an optional embodiment, the step of, based on the same input data, executing forward inference of the floating-point network to obtain the output data of the floating-point network and executing forward inference of the quantized network to obtain the output data of the quantized network comprises:
obtaining input data;
quantizing the input data to a set quantization interval and then dequantizing it back to floating-point data;
taking the input data as the input of the floating-point network, and executing forward inference of the floating-point network to obtain the output data of the floating-point network; taking the dequantized floating-point data as the input of the quantized network, and executing forward inference of the quantized network to obtain the output data of the quantized network; and taking the data quantized to the set quantization interval as the input of an actual forward-inference module, and obtaining the output data of the actual forward-inference module.
In an optional embodiment, the step of iteratively training the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition comprises:
setting all gradients in the quantized network to zero;
executing back-propagation of the loss and a gradient update;
repeating the training loop until the computed loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
In an optional embodiment, during the iterative training, the method further comprises:
overwriting the output data of the corresponding layer of the quantized network with the output data of the actual forward-inference module; and
keeping the means and variances in the BN layers of the floating-point network and the quantized network fixed and not updated, and keeping the Dropout layers of the floating-point network and the quantized network fixed and not updated.
In a second aspect, an embodiment of the present invention provides a model training apparatus, comprising:
a network reconstruction unit, configured to reconstruct a trained floating-point model into a quantized network and a floating-point network;
an inference execution unit, configured to, based on the same input data, execute forward inference of the floating-point network to obtain the output data of the floating-point network, and execute forward inference of the quantized network to obtain the output data of the quantized network;
a loss calculation unit, configured to calculate the loss between the output data of the floating-point network and the output data of the quantized network; and
a model training unit, configured to judge whether the loss satisfies a set condition and, if not, iteratively train the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, the processor being capable of executing the machine-executable instructions to implement any of the foregoing methods.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the method of any one of the foregoing embodiments is implemented.
With the model training method and apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present invention, a trained floating-point model is reconstructed into a quantized network and a floating-point network, and the quantized network is iteratively trained based on the loss between the output data of the floating-point network and the output data of the quantized network, so that a quantized network whose loss satisfies the set condition can be obtained in a more convenient manner.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief description of the drawings
In order to describe the technical solutions of the embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be regarded as limiting the scope; for a person of ordinary skill in the art, other related drawings may be obtained from these drawings without creative effort.
Fig. 1 shows a block diagram of an electronic device provided by an embodiment of the present invention.
Fig. 2 shows a schematic flowchart of a model training method provided by an embodiment of the present invention.
Fig. 3 shows a schematic diagram of an implementation architecture, provided by an embodiment of the present invention, for reconstructing a trained floating-point model into a quantized network and a floating-point network.
Fig. 4 shows a schematic diagram, provided by an embodiment of the present invention, of using the output data of two actual forward-inference modules to respectively overwrite the output data of two operation-layer modules.
Fig. 5 shows another schematic flowchart of a model training method provided by an embodiment of the present invention.
Fig. 6 shows a schematic diagram of a storage procedure of a quantized model provided by an embodiment of the present invention.
Fig. 7 shows a schematic diagram of a storage procedure of the structural parameters and weights of a quantized model provided by an embodiment of the present invention.
Fig. 8 shows a functional block diagram of a model training apparatus provided by an embodiment of the present invention.
Reference numerals: 100 - electronic device; 110 - memory; 120 - processor; 130 - communication module; 140 - model training apparatus; 141 - network reconstruction unit; 142 - inference execution unit; 143 - loss calculation unit; 144 - model training unit.
Detailed description of the embodiments
Because a model stores a large number of parameters and involves a large amount of computation, its forward inference is time-consuming and power-hungry, making it difficult to meet the requirements of common mobile devices and embedded application scenarios. Moreover, because the model has many parameters, its training procedure is complicated and time-consuming, requiring a large amount of graphics processing unit (GPU) resources and a great deal of parameter tuning to reach the target accuracy, so model training is rather inconvenient.
In the prior art, low-bit quantization of a FLOAT32 floating-point model is mainly used to reduce model storage, latency, power consumption, and so on. There are two main schemes. One is direct post-processing, e.g., NVIDIA TensorRT; the other is Google's simulated quantization training scheme. The former requires a batch of representative data to calibrate forward inference so as to minimize the accuracy loss; it is not suitable for some scenarios with high accuracy requirements, and it relies heavily on empirical parameter tuning. The latter mainly perceives the loss of the fixed-point forward-inference process through back-propagation; specifically, simulated quantization operations are applied to the weights and to the activation values output by each layer, and the perceived quantization loss is applied to the weight updates. This approach achieves higher accuracy, but it requires a longer quantization training time, depends on parameter-tuning skills, and has a long development cycle; under some complex tasks its performance is poor. For example, performing gradient updates based on training data sets with different distributions can cause abnormal quantized activation values and a sharp drop in accuracy. Moreover, both approaches share the following problem: when the fixed-point operators of the deep neural network implemented by various vendors, companies, and individuals (collectively, third parties) differ significantly, the quantized model produces errors during forward inference, which greatly reduces accuracy.
In view of this, embodiments of the present invention provide a model training method and apparatus, an electronic device, and a computer-readable storage medium. A trained floating-point model is reconstructed into a quantized network and a floating-point network, and the quantized network is iteratively trained based on the loss between the output data of the floating-point network and the output data of the quantized network, thereby obtaining a quantized network whose loss satisfies the set condition and improving the convenience of training the model (quantized network).
The defects in the above prior-art solutions are results obtained by the inventor after practice and careful study. Therefore, the process of discovering the above problems and the solutions proposed below by the embodiments of the present invention should all be regarded as contributions made by the inventor in the course of the invention.
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present invention provided in the accompanying drawings is not intended to limit the claimed scope of the present invention, but merely represents selected embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device including that element.
Referring to Fig. 1, which is a block diagram of an electronic device 100, the electronic device 100 in the embodiments of the present invention may be a server, a processing platform, or the like capable of performing model training. The electronic device 100 includes a memory 110, a processor 120, and a communication module 130. The memory 110, the processor 120, and the communication module 130 are electrically connected to one another, directly or indirectly, to enable the transmission or interaction of data. For example, these elements may be electrically connected to one another through one or more communication buses or signal lines.
The memory 110 is used to store programs or data. The memory 110 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), etc.
The processor 120 is used to read and write the data or programs stored in the memory 110 and to perform the corresponding functions.
The communication module 130 is used to establish a communication connection between the electronic device 100 and other communication terminals through a network, and to send and receive data through the network.
It should be understood that the structure shown in Fig. 1 is only a schematic structural diagram of the electronic device 100. The electronic device 100 may include more or fewer components than shown in Fig. 1, or have a configuration different from that shown in Fig. 1. The components shown in Fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to Fig. 2, which is a schematic flowchart of a model training method provided by an embodiment of the present invention, the model training method is applied to the electronic device 100 shown in Fig. 1 and may be implemented by the processor 120 in the electronic device 100. The model training method includes steps S110 to S150.
Step S110: reconstruct a trained floating-point model into a quantized network and a floating-point network.
Step S120: based on the same input data, execute forward inference of the floating-point network to obtain the output data of the floating-point network, and execute forward inference of the quantized network to obtain the output data of the quantized network.
Step S130: calculate the loss between the output data of the floating-point network and the output data of the quantized network.
Step S140: judge whether the loss satisfies a set condition; if not, execute step S150.
Step S150: iteratively train the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
In this embodiment of the present invention, the trained floating-point model may be stored in the Open Neural Network Exchange (ONNX) format, which serves as an intermediary between the networks of different frameworks and facilitates conversion between them.
Referring to Fig. 3, taking the trained floating-point model being a floating-point ONNX model as an example, reconstructing the trained floating-point model into a quantized network and a floating-point network in step S110 may be implemented as follows.
(1) Parse the floating-point model, and construct weight nodes (WeightNode), operation nodes (OperationNode), input nodes, and output nodes based on the weights and per-layer output activation values of the floating-point model. The weights are stored in the weight nodes, and the per-layer output activation values are stored in the operation nodes.
A weight node mainly stores the weight data of the floating-point model, the weight maximum (w_max), the weight minimum (w_min), the weight type (e.g., weight, bias, running_mean, running_var), the tensor dimensions of the weight, and the weight node id (identity document) name. An operation node mainly stores the activation value data of the feature map output by each layer, the maximum of the activation data (act_max), the minimum of the activation data (act_min), the parameters of the operation layer (e.g., the strides and kernel_size of a convolutional layer), the type of the operation layer (e.g., convolution, pooling, ReLU), the front and back connection relationships of the operation layer (inputs and top), the number of inputs of the operation layer (not counting weight nodes as inputs to the layer), the tensor dimensions of the activation value, and the node id name of the activation value.
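For illustration, the two node types described above might be laid out as in the following minimal sketch; the field names mirror the ones listed here, but the dataclass layout itself is an assumption, not the patent's actual implementation.

```python
from dataclasses import dataclass, field
from typing import List
import numpy as np

@dataclass
class WeightNode:
    """Hypothetical container for the per-weight information described above."""
    name: str            # weight node id name
    data: np.ndarray     # floating-point weight data
    w_min: float         # weight minimum
    w_max: float         # weight maximum
    kind: str            # e.g. "weight", "bias", "running_mean", "running_var"
    shape: tuple = ()    # tensor dimensions of the weight

@dataclass
class OperationNode:
    """Hypothetical container for the per-layer information described above."""
    name: str                                        # node id name of the activation value
    op_type: str                                     # e.g. "Conv", "MaxPool", "Relu"
    params: dict = field(default_factory=dict)       # e.g. strides, kernel_size
    inputs: List[str] = field(default_factory=list)  # preceding layers
    top: List[str] = field(default_factory=list)     # following layers
    num_inputs: int = 1                              # input count, excluding weight nodes
    act_min: float = 0.0                             # running minimum of output activations
    act_max: float = 0.0                             # running maximum of output activations
    shape: tuple = ()                                # tensor dimensions of the activation
```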
(2) Construct the static graph node of the network, and define a network node module (graph_node) and operation-layer modules in the static graph node. The weight nodes and operation nodes are stored in the network node module, and each operation-layer module is built from the parameters of the network node module. It can be understood that the network node module and the operation-layer modules defined in the static graph node are ordered dictionaries (OrderedDict()); each operation-layer module defined in the static graph node corresponds to an operation layer of the trained floating-point model, and accordingly, based on this correspondence, the parameters of each operation layer of the trained floating-point model are stored in the corresponding operation-layer module defined in the static graph node.
(3) Construct the floating-point network: add each operation-layer module to the floating-point network, and determine the number of inputs for forward inference of the floating-point network according to the input nodes. The floating-point network may be constructed by defining a class named FloatModel derived from nn.Module of the PyTorch framework, adding the operation-layer modules constructed in step (2) to FloatModel in order, and determining the number of inputs for forward inference of the floating-point network according to the input nodes constructed in step (1). With this setup, the forward inference process of the floating-point network may be: iterating in order through the ordered dictionary graph_node constructed in step (2) and executing each operation-layer module.
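A minimal sketch of such a FloatModel class is given below, assuming graph_node is the OrderedDict of instantiated operation-layer modules from step (2) with names that are valid module identifiers; the wiring is simplified to a single-input chain, whereas a real multi-branch graph would dispatch on each node's recorded inputs.

```python
import collections
import torch
import torch.nn as nn

class FloatModel(nn.Module):
    """Sketch: executes the operation-layer modules in graph order."""
    def __init__(self, graph_node: "collections.OrderedDict"):
        super().__init__()
        self.graph_node = graph_node
        # Register each operation-layer module so its parameters are tracked.
        for name, module in graph_node.items():
            self.add_module(name, module)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Iterate the ordered dictionary and execute each layer in turn.
        for name, module in self.graph_node.items():
            x = module(x)
        return x
```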
(4) Construct the quantized network: add each operation-layer module to the quantized network, and determine the number of inputs for forward inference of the quantized network according to the input nodes. The quantized network may be constructed by defining a class named QuantizedModel derived from nn.Module of the PyTorch framework, adding the operation-layer modules constructed in step (2) to QuantizedModel in order, and determining the number of inputs for forward inference of the quantized network according to the input nodes constructed in step (1). With this setup, the forward inference process of the quantized network may likewise be: iterating in order through the ordered dictionary graph_node constructed in step (2) and executing each operation-layer module. The difference is that, for the quantized network, additional quantization operations are applied to the weights and activation values. To clearly illustrate how these additional quantization operations are applied in this embodiment of the present invention, the following description is given.
The process of quantizing the weights may include:
determining the quantization interval of the weights as [w_min, w_max], where w_min = min(WeightMin, 0) and w_max = WeightMax, WeightMin being the minimum value of the current layer's weights and WeightMax being the maximum value of the current layer's weights;
determining the quantization scale factor of the weights as S_w = (w_max - w_min) / 255;
quantizing a weight by the formula q_w = int((r - w_min) / S_w + 0.5), where q_w is fixed-point data in [0, 255] and r is a floating-point weight in the floating-point model (static);
dequantizing the quantized weight back to floating-point data by the formula f_w = S_w * q_w + w_min, where f_w is the floating-point data.
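As a minimal sketch, the weight quantize/dequantize round trip described above can be written as follows; the function name is illustrative, and the clamp and epsilon guard are safety assumptions not stated in the formulas.

```python
import torch

def fake_quantize_weight(r: torch.Tensor) -> torch.Tensor:
    """Quantize a floating-point weight tensor to [0, 255] and dequantize it back."""
    w_min = min(float(r.min()), 0.0)          # w_min = min(WeightMin, 0)
    w_max = float(r.max())                    # w_max = WeightMax
    s_w = max((w_max - w_min) / 255.0, 1e-8)  # S_w; epsilon guards the all-equal case
    q_w = torch.clamp(((r - w_min) / s_w + 0.5).int(), 0, 255)  # q_w in [0, 255]
    f_w = s_w * q_w.float() + w_min           # f_w = S_w * q_w + w_min
    return f_w
```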
The process of quantizing the activation values output by each layer may include:
determining the quantization interval of the activation values as [act_min, act_max], where act_min = min(ActMin, 0) and act_max = ActMax, ActMin being the minimum value of the current layer's activation values and ActMax being the maximum value of the current layer's activation values; during training, act_min and act_max are updated by a moving average over the number of statistical sample data specified by the user;
determining the quantization scale factor of the activation values as S_act = (act_max - act_min) / 255;
quantizing an activation value by the formula q_act = int((R - act_min) / S_act + 0.5), where q_act is fixed-point data in [0, 255] and R is the floating-point data output by the floating-point model for each layer's input data (dynamic); it can be understood that different input data at each layer produce different floating-point outputs;
dequantizing the quantized activation value back to floating-point data by the formula f_act = S_act * q_act + act_min, where f_act is the floating-point data.
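A corresponding sketch for activations is given below; here the moving-average update of act_min/act_max uses a hypothetical momentum parameter, since the patent only states that the user specifies the number of statistical samples.

```python
import torch

class ActivationFakeQuant:
    """Sketch: tracks [act_min, act_max] by moving average, then fake-quantizes."""
    def __init__(self, momentum: float = 0.99):  # momentum is an assumed knob
        self.momentum = momentum
        self.act_min = None
        self.act_max = None

    def __call__(self, R: torch.Tensor) -> torch.Tensor:
        batch_min = min(float(R.min()), 0.0)  # act_min = min(ActMin, 0)
        batch_max = float(R.max())            # act_max = ActMax
        if self.act_min is None:              # first batch initializes the range
            self.act_min, self.act_max = batch_min, batch_max
        else:                                 # sliding-average range update
            m = self.momentum
            self.act_min = m * self.act_min + (1 - m) * batch_min
            self.act_max = m * self.act_max + (1 - m) * batch_max
        s_act = max((self.act_max - self.act_min) / 255.0, 1e-8)
        q_act = torch.clamp(((R - self.act_min) / s_act + 0.5).int(), 0, 255)
        return s_act * q_act.float() + self.act_min  # dequantized activation
```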
In the embodiments of the present invention, "activation value" refers not only to the data output by activation functions such as ReLU and PReLU, but also to the data output by an operation layer after layer fusion with an activation function, e.g., the outputs of modules such as Conv, Conv+BN (BN fused into the convolution), Conv+BN+ReLU (BN+ReLU fused into the convolution), and MaxPool.
In this embodiment of the present invention, step S120 (based on the same input data, executing forward inference of the floating-point network to obtain the output data of the floating-point network, and executing forward inference of the quantized network to obtain the output data of the quantized network) may be implemented as follows: obtain input data, quantize the input data to a set quantization interval, and then dequantize it back to floating-point data; take the input data as the input of the floating-point network, and execute forward inference of the floating-point network to obtain the output data of the floating-point network; take the dequantized floating-point data as the input of the quantized network, and execute forward inference of the quantized network to obtain the output data of the quantized network; and take the data quantized to the set quantization interval as the input of an actual forward-inference module, and obtain the output data of the actual forward-inference module.
The set quantization interval may be 0-255: the input data may be quantized directly to 0-255 according to the maximum and minimum values, and then dequantized back to floating-point data to serve as the input of the quantized network, so that the data flowing through the entire quantized network is floating-point.
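For the 0-255 case, the input round trip might look like the following sketch (function and variable names are illustrative, not from the patent); the fixed-point form feeds the actual forward-inference module, and the dequantized form feeds the quantized network.

```python
import torch

def quantize_input(x: torch.Tensor):
    """Quantize input to [0, 255]; return both the fixed-point and dequantized forms."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = max((x_max - x_min) / 255.0, 1e-8)
    q_x = torch.clamp(((x - x_min) / scale + 0.5).int(), 0, 255)  # fixed-point input
    f_x = scale * q_x.float() + x_min                             # dequantized input
    return q_x, f_x
```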
The actual forward-inference module may come from a third party and is used to assist in training the quantized network so as to improve its accuracy. Referring to Fig. 4, in one implementation, during the quantized-network training stage, when the quantization of activation values is processed, the actual forward-inference module is called first, and its output data (after being dequantized to floating-point data) is used to overwrite the output data of the corresponding layer of the quantized network. Fig. 4 gives a schematic diagram of using the output data of two actual forward-inference modules to respectively overwrite the output data of two operation-layer modules. It can be understood that the principle of overwriting the output data of three or more operation-layer modules is similar and is therefore not illustrated one by one.
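One way to realize this overwriting in PyTorch is a forward hook, sketched below under the assumption that fixed_point_module wraps the third-party fixed-point operator and that the layer's scale and offset are known for dequantization; this is an illustrative interface, not the patent's concrete one.

```python
import torch

def attach_overwrite_hook(layer, fixed_point_module, scale: float, offset: float):
    """Replace the layer's simulated-quantization output with the real
    fixed-point module's output, dequantized back to floating point."""
    def hook(module, inputs, output):
        q_out = fixed_point_module(*inputs)    # actual fixed-point forward inference
        return scale * q_out.float() + offset  # returning a value overwrites the output
    return layer.register_forward_hook(hook)
```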
To improve model training efficiency and training effect, in the embodiments of the present invention the means and variances in the BN layers of the floating-point network and the quantized network may be kept fixed and not updated, and the Dropout layers of the floating-point network and the quantized network may be kept fixed and not updated, so as to avoid larger disturbances to the weights.
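A common way to keep the BN statistics and Dropout behavior fixed during this training is to switch just those modules to eval mode, as in the sketch below (assuming a standard PyTorch model). Note that this must be re-applied after any call to model.train(), which resets the mode.

```python
import torch.nn as nn

def freeze_bn_and_dropout(model: nn.Module):
    """Keep BN running mean/variance and Dropout behavior fixed during training."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()  # stop updating running_mean / running_var
        elif isinstance(m, (nn.Dropout, nn.Dropout2d, nn.Dropout3d)):
            m.eval()  # disable random dropping
```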
In this embodiment of the present invention, step S150 (iteratively training the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition) may be implemented as follows: set all gradients in the quantized network to zero, execute back-propagation of the loss and a gradient update, and repeat the training loop until the computed loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition; the accuracy of the iteratively trained quantized network is then determined to meet the requirement, and the required quantized model is obtained.
Referring to Fig. 5, which is an exemplary flowchart of a model training method provided by an embodiment of the present invention: first, obtain the input data (InputData), quantize it to 0-255, and then dequantize it to floating-point data. Then execute forward inference of the floating-point network and obtain its output: OutputFloat = FloatModel(InputData). Execute forward inference of the quantized network and obtain its output: OutputQuantized = QuantizedModel(InputData). Next, calculate the loss between the outputs of the floating-point network and the quantized network, e.g., a mean-squared-error loss or another loss function: mt_loss = MSELoss(OutputQuantized, OutputFloat). Then execute back-propagation and the gradient update: optimizer.zero_grad() # set all gradients in the quantized network to zero; mt_loss.backward() # execute back-propagation of the loss; optimizer.step() # gradient update. Finally, loop over several batches, evaluating the accuracy of the quantized network at intervals, until a quantized model whose accuracy meets the requirement is obtained.
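Putting the pieces of Fig. 5 together, a minimal training-loop sketch might read as follows, assuming the FloatModel and QuantizedModel instances and a data loader are already built; the choice of SGD and its learning rate are assumptions, not the patent's prescription.

```python
import torch
import torch.nn as nn

def train_quantized(float_model: nn.Module, quantized_model: nn.Module,
                    data_loader, lr: float = 1e-4, epochs: int = 1):
    criterion = nn.MSELoss()
    optimizer = torch.optim.SGD(quantized_model.parameters(), lr=lr)  # assumed optimizer
    float_model.eval()  # the floating-point network only provides reference outputs
    for _ in range(epochs):
        for input_data in data_loader:
            with torch.no_grad():
                output_float = float_model(input_data)      # OutputFloat
            output_quantized = quantized_model(input_data)  # OutputQuantized
            mt_loss = criterion(output_quantized, output_float)
            optimizer.zero_grad()  # zero all gradients in the quantized network
            mt_loss.backward()     # execute back-propagation of the loss
            optimizer.step()       # gradient update
    return quantized_model
```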
In this embodiment of the present invention, after the quantized model whose accuracy meets the requirement is obtained (i.e., the quantized network for which the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition), the quantized model may be stored. Referring to Fig. 6 and Fig. 7, the quantized model may be stored in the following manner.
Layer fusion: if BN layers appear in the quantized network and a BN layer is placed after a convolutional or deconvolutional layer, the parameters of the BN layer are fused into the convolutional layer; if the layer before a BN layer is not a convolutional or deconvolutional layer, precision conversion is performed on the data before and after the BN layer, and that layer operates on floating-point data during forward inference.
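The standard algebra for folding a BN layer into the preceding convolution is sketched below; this is the well-known fusion formula, given here for illustration rather than taken from the patent text.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BN into the convolution: y = gamma * (conv(x) - mean) / std + beta."""
    std = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight / std  # gamma / std, one factor per output channel
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups, bias=True)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```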
Activation-function fusion: if activation functions such as ReLU or PReLU exist in the quantized network, a flag bit is added to the layer preceding the activation function to indicate the type of the activation function, and the activation layer is fused into the preceding layer.
Obtaining the quantization parameters: w_max and w_min saved by the weight nodes of the quantized network, and act_max and act_min saved by the operation nodes, are mapped to the scale coefficients S_w = (w_max - w_min) / 255 and S_act = (act_max - act_min) / 255.
Storage of the quantized network's structural parameters and weights: (1) if the input data is not fixed-point data in 0-255, an additional precision-conversion layer is added to convert it to fixed-point data in 0-255; (2) the weights are all stored as unsigned 8-bit integer data and placed in the initialization nodes; (3) the number of scale coefficients is determined according to the numbers of inputs and outputs of each operation layer and stored in the corresponding operation layer; (4) the front and back connection relationships and parameters of each layer are obtained from the constructed quantized network and stored, and the model is saved to a specified path.
Referring to Fig. 8, in order to execute the corresponding steps in the above embodiments and each possible implementation, an implementation of a model training apparatus 140 is given below; optionally, the model training apparatus 140 may be applied to the electronic device 100 shown in Fig. 1. Further, Fig. 8 is a functional block diagram of a model training apparatus 140 provided by an embodiment of the present invention. It should be noted that the basic principles and technical effects of the model training apparatus 140 provided by this embodiment are the same as those of the above method embodiments; for brevity, for what is not mentioned in this embodiment, refer to the corresponding content in the above method embodiments. The model training apparatus 140 includes: a network reconstruction unit 141, an inference execution unit 142, a loss calculation unit 143, and a model training unit 144.
The network reconstruction unit 141 is configured to reconstruct a trained floating-point model into a quantized network and a floating-point network.
The inference execution unit 142 is configured to, based on the same input data, execute forward inference of the floating-point network to obtain the output data of the floating-point network, and execute forward inference of the quantized network to obtain the output data of the quantized network.
The loss calculation unit 143 is configured to calculate the loss between the output data of the floating-point network and the output data of the quantized network.
The model training unit 144 is configured to judge whether the loss satisfies a set condition and, if not, iteratively train the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
Optionally, the above units may be stored in the memory 110 shown in Fig. 1 in the form of software or firmware, or solidified in the operating system (OS) of the electronic device 100, and may be executed by the processor 120 in Fig. 1. Meanwhile, the data, program code, and the like required to execute the above units may be stored in the memory 110.
The solutions in the embodiments of the present invention are applicable to all current CNN (convolutional neural network) structures, for example: classification (VGG (Visual Geometry Group Network), MobileNet, ResNet, SENet, etc.), detection (SSD (Single Shot MultiBox Detector), Faster RCNN (Region-CNN), RetinaNet, etc.), segmentation (FCN (Fully Convolutional Networks), UNet, SegNet, etc.), and image-generation tasks (GAN (Generative Adversarial Networks), etc.). It has been verified that the test effect and accuracy of the quantized model obtained by this training are essentially on par with the floating-point model (no accuracy error exceeding 1% compared with the floating-point model).
With the quantization-interval collection method and training method adopted by this solution, the errors introduced by different quantization schemes can be appropriately tolerated; for example, the parameters S and Z required by the quantization scheme proposed by Google can be converted and adjusted according to the quantization parameters required by actual inference, without affecting accuracy. During quantization training, a fixed-point forward-inference operator is inserted after the execution of each floating-point operation operator; the input of the floating-point operation operator is re-obtained as the input of the fixed-point operator so as to obtain the output of the actual forward inference, and that output overwrites the output of the original floating-point operation operator; finally, through back-propagation, the gradient produced at each layer is fed back to the weights of the corresponding layer, and training is iterated until convergence. This mitigates the error between the operation operators implemented by current mainstream deep learning frameworks and the fixed-point operators implemented in individual practical application scenarios, ensures the consistency between the output of the simulated quantization training process and the output of the real fixed-point inference process, compresses model storage by a set factor, achieves lower forward-inference latency, and improves model accuracy and development efficiency, avoiding the huge workload of re-developing and designing a forward-inference library. Moreover, the quantization training method of this solution can, without losing accuracy, quickly obtain from the originally trained floating-point model a quantized model of nearly the same accuracy. Compared with NVIDIA TensorRT and the like, it has higher accuracy and requires less manual parameter tuning; compared with Google's quantization training scheme, it has faster development efficiency and richer application scenarios, avoiding the need for additional labeled data and complicated loss functions for weight fine-tuning, and it is more convenient to implement.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of the apparatuses, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two consecutive blocks may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A model training method, characterized by comprising:
reconstructing a trained floating-point model into a quantized network and a floating-point network;
based on the same input data, executing forward inference of the floating-point network to obtain the output data of the floating-point network, and executing forward inference of the quantized network to obtain the output data of the quantized network;
calculating the loss between the output data of the floating-point network and the output data of the quantized network;
judging whether the loss satisfies a set condition, and if not, iteratively training the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
2. The model training method according to claim 1, characterized in that the trained floating-point model is a floating-point Open Neural Network Exchange model, and the step of reconstructing the trained floating-point model into a quantized network and a floating-point network comprises:
parsing the floating-point model, and constructing weight nodes, operation nodes, input nodes, and output nodes based on the weights and per-layer output activation values of the floating-point model, wherein the weights are stored in the weight nodes and the per-layer output activation values are stored in the operation nodes;
constructing a static graph node of the network, and defining a network node module and operation-layer modules in the static graph node, wherein the weight nodes and operation nodes are stored in the network node module, and each operation-layer module is built from the parameters of the network node module;
constructing the floating-point network, adding each operation-layer module to the floating-point network, and determining the number of inputs for forward inference of the floating-point network according to the input nodes;
constructing the quantized network, adding each operation-layer module to the quantized network, and determining the number of inputs for forward inference of the quantized network according to the input nodes, wherein weights and activation values are quantized for the quantized network.
3. The model training method according to claim 2, characterized in that the weights are quantized through the following steps:
determining the quantization interval of the weights as [w_min, w_max], where w_min = min(WeightMin, 0) and w_max = WeightMax, WeightMin being the minimum value of the current layer's weights and WeightMax being the maximum value of the current layer's weights;
determining the quantization scale factor of the weights as S_w = (w_max - w_min) / 255;
quantizing a weight by the formula q_w = int((r - w_min) / S_w + 0.5), where q_w is fixed-point data in [0, 255] and r is a floating-point weight in the floating-point model;
dequantizing the quantized weight back to floating-point data by the formula f_w = S_w * q_w + w_min, where f_w is the floating-point data.
4. The model training method according to claim 2, characterized in that the activation values are quantized through the following steps:
determining the quantization interval of the activation values as [act_min, act_max], where act_min = min(ActMin, 0) and act_max = ActMax, ActMin being the minimum value of the current layer's activation values and ActMax being the maximum value of the current layer's activation values;
determining the quantization scale factor of the activation values as S_act = (act_max - act_min) / 255;
quantizing an activation value by the formula q_act = int((R - act_min) / S_act + 0.5), where q_act is fixed-point data in [0, 255] and R is the floating-point data output by the floating-point model for each layer's input data;
dequantizing the quantized activation value back to floating-point data by the formula f_act = S_act * q_act + act_min, where f_act is the floating-point data.
5. The model training method according to claim 1, characterized in that the step of, based on the same input data, executing forward inference of the floating-point network to obtain the output data of the floating-point network and executing forward inference of the quantized network to obtain the output data of the quantized network comprises:
obtaining input data;
quantizing the input data to a set quantization interval and then dequantizing it back to floating-point data;
taking the input data as the input of the floating-point network, and executing forward inference of the floating-point network to obtain the output data of the floating-point network; taking the dequantized floating-point data as the input of the quantized network, and executing forward inference of the quantized network to obtain the output data of the quantized network; and taking the data quantized to the set quantization interval as the input of an actual forward-inference module, and obtaining the output data of the actual forward-inference module.
6. The model training method according to claim 5, characterized in that the step of iteratively training the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition comprises:
setting all gradients in the quantized network to zero;
executing back-propagation of the loss and a gradient update;
repeating the training loop until the computed loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
7. The model training method according to claim 5, characterized in that, during the iterative training, the method further comprises:
overwriting the output data of the corresponding layer of the quantized network with the output data of the actual forward-inference module; and
keeping the means and variances in the BN layers of the floating-point network and the quantized network fixed and not updated, and keeping the Dropout layers of the floating-point network and the quantized network fixed and not updated.
8. A model training apparatus, characterized by comprising:
a network reconstruction unit, configured to reconstruct a trained floating-point model into a quantized network and a floating-point network;
an inference execution unit, configured to, based on the same input data, execute forward inference of the floating-point network to obtain the output data of the floating-point network, and execute forward inference of the quantized network to obtain the output data of the quantized network;
a loss calculation unit, configured to calculate the loss between the output data of the floating-point network and the output data of the quantized network; and
a model training unit, configured to judge whether the loss satisfies a set condition and, if not, iteratively train the quantized network based on the loss until the loss between the output data of the floating-point network and the output data of the iteratively trained quantized network satisfies the set condition.
9. An electronic device, characterized by comprising a processor and a memory, wherein the memory stores machine-executable instructions executable by the processor, and the processor can execute the machine-executable instructions to implement the method of any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the method according to any one of claims 1-7 is implemented.
CN201910713733.2A 2019-08-02 2019-08-02 Model training method, device, electronic equipment and computer readable storage medium Pending CN110414679A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910713733.2A CN110414679A (en) 2019-08-02 2019-08-02 Model training method, device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN110414679A true CN110414679A (en) 2019-11-05

Family

ID=68365603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910713733.2A Pending CN110414679A (en) 2019-08-02 2019-08-02 Model training method, device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110414679A (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852438B (en) * 2019-11-11 2023-08-04 北京百度网讯科技有限公司 Model generation method and device
CN110852438A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
WO2021128293A1 (en) * 2019-12-27 2021-07-01 华为技术有限公司 Model training method and apparatus, and storage medium and program product
CN111598237A (en) * 2020-05-21 2020-08-28 上海商汤智能科技有限公司 Quantization training method, image processing device, and storage medium
CN111598237B (en) * 2020-05-21 2024-06-11 上海商汤智能科技有限公司 Quantization training, image processing method and device, and storage medium
CN113780513A (en) * 2020-06-10 2021-12-10 杭州海康威视数字技术股份有限公司 Network model quantification and inference method and device, electronic equipment and storage medium
CN111429142A (en) * 2020-06-10 2020-07-17 腾讯科技(深圳)有限公司 Data processing method and device and computer readable storage medium
CN113780513B (en) * 2020-06-10 2024-05-03 杭州海康威视数字技术股份有限公司 Network model quantization and reasoning method and device, electronic equipment and storage medium
CN112200296A (en) * 2020-07-31 2021-01-08 厦门星宸科技有限公司 Network model quantification method and device, storage medium and electronic equipment
CN112200296B (en) * 2020-07-31 2024-04-05 星宸科技股份有限公司 Network model quantization method and device, storage medium and electronic equipment
CN112149828A (en) * 2020-09-29 2020-12-29 北京百度网讯科技有限公司 Operator precision detection method and device based on deep learning framework
CN113487014A (en) * 2021-07-05 2021-10-08 上海西井信息科技有限公司 Method and equipment for quantizing any bit based on semantic segmentation neural network model
CN113610709A (en) * 2021-07-30 2021-11-05 Oppo广东移动通信有限公司 Model quantization method, model quantization device, electronic equipment and computer-readable storage medium
CN113610709B (en) * 2021-07-30 2023-12-05 Oppo广东移动通信有限公司 Model quantization method, apparatus, electronic device, and computer-readable storage medium
CN113642711A (en) * 2021-08-16 2021-11-12 北京百度网讯科技有限公司 Network model processing method, device, equipment and storage medium
CN113642711B (en) * 2021-08-16 2023-10-31 北京百度网讯科技有限公司 Processing method, device, equipment and storage medium of network model
CN114239792A (en) * 2021-11-01 2022-03-25 荣耀终端有限公司 Model quantization method, device and storage medium
CN114239792B (en) * 2021-11-01 2023-10-24 荣耀终端有限公司 System, apparatus and storage medium for image processing using quantization model
CN116187420B (en) * 2023-05-04 2023-07-25 上海齐感电子信息科技有限公司 Training method, system, equipment and medium for lightweight deep neural network
CN116187420A (en) * 2023-05-04 2023-05-30 上海齐感电子信息科技有限公司 Training method, system, equipment and medium for lightweight deep neural network
CN116757260B (en) * 2023-08-14 2024-03-19 北京向量栈科技有限公司 Training method and system for large pre-training model
CN116757260A (en) * 2023-08-14 2023-09-15 北京向量栈科技有限公司 Training method and system for large pre-training model
CN118095373A (en) * 2024-03-27 2024-05-28 荣耀终端有限公司 Quantification method of large language model, electronic device, chip system and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191105