CN112990457B - Offline quantization optimization method, device, equipment, medium and program product - Google Patents


Info

Publication number
CN112990457B
CN112990457B
Authority
CN
China
Prior art keywords
tuning
network
tuned
convolution layer
weight parameter
Prior art date
Legal status
Active
Application number
CN202110324266.1A
Other languages
Chinese (zh)
Other versions
CN112990457A (en)
Inventor
陈泓昊
黄明飞
王海涛
Current Assignee
Open Intelligent Machine Shanghai Co ltd
Original Assignee
Open Intelligent Machine Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Open Intelligent Machine Shanghai Co ltd filed Critical Open Intelligent Machine Shanghai Co ltd
Priority to CN202110324266.1A priority Critical patent/CN112990457B/en
Publication of CN112990457A publication Critical patent/CN112990457A/en
Application granted granted Critical
Publication of CN112990457B publication Critical patent/CN112990457B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the present application provides an offline quantization tuning method, device, equipment, medium and program product. The method comprises the following steps: obtaining a network to be tuned that comprises a plurality of convolution layers from a preset network model; adjusting the weight parameters of each convolution layer in the network to be tuned; determining tuning output results of the network to be tuned in different weight parameter distribution states; and determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters, so as to realize efficient tuning of the model on the basis of a small amount of input data.

Description

Offline quantization optimization method, device, equipment, medium and program product
Technical Field
Embodiments of the present application relate to the technical field of artificial intelligence, and in particular to an offline quantization tuning method, device, equipment, medium and program product for models.
Background
With the rapid development of artificial intelligence technology, neural network models have wide application in the fields of system identification, pattern recognition, intelligent control and the like.
At present, quantization of neural network models mainly takes two forms: offline quantization and quantization retraining. Existing offline quantization approaches mostly adopt an iterative weight updating scheme to obtain higher quantization precision, and offline quantization has the advantages of requiring only a small amount of input data and being convenient to use.
However, when performing offline quantization, there is a need for an efficient way to tune the model while evaluating it with only such a small amount of input data.
Disclosure of Invention
The offline quantization tuning method, device, equipment, medium and program product provided by the embodiments of the present application can realize efficient tuning of a model on the basis of a small amount of input data.
In a first aspect, an embodiment of the present application provides an offline quantization tuning method, including:
Obtaining a network to be tuned in a preset network model, wherein the network to be tuned comprises a plurality of convolution layers;
Adjusting weight parameters of all convolution layers in the network to be tuned, wherein the output precision of the network to be tuned is kept unchanged;
Determining tuning output results of the network to be tuned in different weight parameter distribution states;
And determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters.
In one possible design, the adjusting the weight parameters of each convolution layer in the network to be tuned includes:
Allocating a basic weight parameter combination to the network to be tuned, wherein the network to be tuned comprises N convolution layers, the basic weight parameter combination comprises N basic weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the basic weight parameters in the basic weight parameter combination, and N is a positive integer;
And allocating a tuning weight parameter combination to the network to be tuned, wherein the tuning weight parameter combination comprises N tuning weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the tuning weight parameters in the tuning weight parameter combination, and the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is a preset fixed value.
In one possible design, determining the tuning output result of the network to be tuned in different weight parameter allocation states includes:
And determining a tuning output result of the network to be tuned according to different tuning weight parameter combinations, wherein each tuning weight parameter combination is used as a basic weight parameter combination for next tuning weight parameter distribution.
In one possible design, the determining the tuning output result of the network to be tuned according to different tuning weight parameter combinations includes:
If the network to be tuned comprises a first common convolution layer and a second common convolution layer, wherein the second common convolution layer follows the first common convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations; or
If the network to be tuned comprises a first common convolution layer, an intermediate specific convolution layer and a second common convolution layer, wherein the intermediate specific convolution layer follows the first common convolution layer and the second common convolution layer follows the intermediate specific convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations.
In one possible design, at least one linear rectification function is provided between the first common convolution layer and the second common convolution layer.
In one possible design, the determining the tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters includes:
Respectively determining the cosine similarity between the simulated quantization output result and the tuning output result corresponding to each tuning weight parameter combination;
And determining the tuning weight parameters in the tuning weight parameter combination with the highest cosine similarity as the tuning weight parameters of the convolution layers in the network to be tuned.
In one possible design, the obtaining the network to be tuned in the preset network model includes:
Sequentially acquiring each network to be selected in the preset network model, and determining the number of continuous convolution layers in the network to be selected and the convolution type of each convolution layer;
And if the number of the continuous convolution layers meets the preset layer number condition and the convolution type meets the preset type condition, determining the network to be selected as the network to be tuned.
In a second aspect, an embodiment of the present application provides an offline quantization tuning device, including:
an acquisition module, used for acquiring a network to be tuned in a preset network model, wherein the network to be tuned comprises a plurality of convolution layers;
The processing module is used for adjusting the weight parameters of each convolution layer in the network to be tuned, wherein the output precision of the network to be tuned is kept unchanged;
the processing module is further used for determining tuning output results of the network to be tuned in different weight parameter distribution states;
and a tuning module, used for determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters.
In one possible design, the processing module is specifically configured to:
Allocating a basic weight parameter combination to the network to be tuned, wherein the network to be tuned comprises N convolution layers, the basic weight parameter combination comprises N basic weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the basic weight parameters in the basic weight parameter combination, and N is a positive integer;
And allocating a tuning weight parameter combination to the network to be tuned, wherein the tuning weight parameter combination comprises N tuning weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the tuning weight parameters in the tuning weight parameter combination, and the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is a preset fixed value.
In one possible design, the tuning module is specifically configured to:
And determining a tuning output result of the network to be tuned according to different tuning weight parameter combinations, wherein each tuning weight parameter combination is used as a basic weight parameter combination for next tuning weight parameter distribution.
In one possible design, the processing module is specifically configured to:
If the network to be tuned comprises a first common convolution layer and a second common convolution layer, wherein the second common convolution layer follows the first common convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations; or
If the network to be tuned comprises a first common convolution layer, an intermediate specific convolution layer and a second common convolution layer, wherein the intermediate specific convolution layer follows the first common convolution layer and the second common convolution layer follows the intermediate specific convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations.
In one possible design, at least one linear rectification function is provided between the first common convolution layer and the second common convolution layer.
In one possible design, the tuning module is specifically configured to:
Respectively determining the cosine similarity between the simulated quantization output result and the tuning output result corresponding to each tuning weight parameter combination;
And determining the tuning weight parameters in the tuning weight parameter combination with the highest cosine similarity as the tuning weight parameters of the convolution layers in the network to be tuned.
In one possible design, the acquisition module is specifically configured to:
Sequentially acquiring each network to be selected in the preset network model, and determining the number of continuous convolution layers in the network to be selected and the convolution type of each convolution layer;
And if the number of the continuous convolution layers meets the preset layer number condition and the convolution type meets the preset type condition, determining the network to be selected as the network to be tuned.
In a third aspect, an embodiment of the present application further provides an electronic device, including: the device comprises a processor and a memory, wherein the processor is respectively connected with the memory;
The memory is used for storing a computer program of the processor;
Wherein the processor is configured to implement any one of the possible offline quantization tuning methods of the first aspect by executing the computer program.
In a fourth aspect, embodiments of the present application also provide a machine-readable storage medium having stored thereon executable instructions that when executed by a machine cause the implementation of any of the possible offline quantization tuning methods of the first aspect.
In a fifth aspect, embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements any one of the possible offline quantization tuning methods of the first aspect.
In the above technical solution, a network to be tuned that includes a plurality of convolution layers is obtained from the preset network model, the weight parameters of each convolution layer in the network to be tuned are adjusted, the tuning output results of the network to be tuned in different weight parameter distribution states are determined, and the tuning weight parameters of each convolution layer in the network to be tuned are determined according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters, so that efficient tuning of the model can be realized on the basis of a small amount of input data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly explain the drawings used as needed in the embodiments or the description of the prior art. However, it should be understood by those skilled in the art that the drawings in the following description are only some examples of the present application and do not limit the scope thereof.
FIG. 1 is a diagram of an application network architecture of an off-line quantization tuning method according to an exemplary embodiment of the present application;
FIG. 2 is a flow chart of an off-line quantization tuning method according to an exemplary embodiment of the present application;
FIG. 3 is a flow chart of an off-line quantization tuning method according to an exemplary embodiment of the present application;
Fig. 4 is a schematic structural diagram of an off-line quantization tuning apparatus according to another exemplary embodiment of the present application;
fig. 5 is a schematic structural view of an electronic device according to an exemplary embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. It will be appreciated by those of ordinary skill in the art that the embodiments described are some, but not all, of the embodiments of the application. Based on the embodiments in the application, any suitable modification or variation may be made by a person skilled in the art, so as to obtain all other embodiments.
At present, quantization of neural network models mainly takes two forms: offline quantization and quantization retraining. Existing offline quantization approaches mostly adopt an iterative weight updating scheme to obtain higher quantization precision, and offline quantization has the advantages of requiring only a small amount of input data and being convenient to use. However, when offline quantization is performed, evaluating with such a small number of inputs carries a considerable risk of overfitting: since the number of pictures loaded for offline quantization is far smaller than the number of training pictures, existing offline quantization methods may not only lose FP32 precision but also introduce a large risk of overfitting.
Therefore, existing quantization strategies cannot adequately guarantee both the offline quantization tuning effect and the generalization effect, and the large number of weight tuning schemes that cannot be evaluated means that the final result may be severely overfitted.
In view of this, the embodiments of the present application provide an offline quantization optimization method, apparatus, device, medium, and program product, which aim to efficiently perform optimization on a quantization model based on a small amount of input data. The above technical scheme will be described in detail with reference to specific embodiments.
Fig. 1 is a diagram of an application network architecture of an offline quantization tuning method according to an exemplary embodiment of the present application. As shown in Fig. 1, the offline quantization tuning method provided in this embodiment is applied to a preset neural network model, which may include a plurality of convolution layers. An inter-layer detection module may be used to judge whether two layers containing weight parameters meet the joint tuning requirements; an inter-layer search module then scales the inter-layer weight parameters, so that the final output of the jointly tuned layers remains the same as the original FP32 output while the weight parameter distribution is changed, which also reduces the risk of overfitting to a certain extent. The inter-layer search module may then be iterated a plurality of times by an inter-layer evaluation module, which evaluates the quantization loss. Finally, an update module screens the results obtained by the search module according to the evaluation results to obtain the final quantized tuning distribution.
Fig. 2 is a flow chart illustrating an offline quantization tuning method according to an exemplary embodiment of the present application. As shown in fig. 2, the offline quantization tuning method provided in this embodiment includes:
S101, obtaining a network to be tuned in a preset network model.
In this step, a network to be tuned in a preset network model is obtained, wherein the network to be tuned includes a plurality of convolution layers. It should be noted that the network to be tuned may include two continuous convolution layers or three continuous convolution layers, and a linear rectification function (Rectified Linear Unit, ReLU) activation may further be included between the convolution layers. The ReLU, also known as the rectified linear unit, is a commonly used activation function in artificial neural networks and usually refers to the nonlinear functions represented by the ramp function and its variants.
S102, adjusting weight parameters of all convolution layers in the network to be tuned.
In this step, the weight parameters of each convolution layer in the network to be tuned are adjusted, while the output precision of the network to be tuned is kept unchanged. The method can be applied to offline quantization tuning at 8-bit and lower bit widths: the weights of the multiple layers of the neural network are jointly equalized, so that the weight distribution is modified, without changing the final FP32 output, to meet the requirements of the quantization distribution, thereby realizing an efficient offline quantization approach that is lossless in FP32 precision.
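As a concrete illustration of the simulated quantization used in the evaluation below, the following minimal Python sketch assumes symmetric per-tensor quantization; the bit width, rounding scheme and the helper name fake_quantize are illustrative assumptions rather than a prescribed implementation of this embodiment.

    import numpy as np

    def fake_quantize(tensor: np.ndarray, num_bits: int = 8) -> np.ndarray:
        # Quantize to num_bits and immediately dequantize, so the returned values
        # carry the rounding error that real low-bit inference would introduce.
        qmax = 2 ** (num_bits - 1) - 1            # e.g. 127 for signed 8-bit
        scale = float(np.max(np.abs(tensor))) / qmax
        if scale == 0.0:
            scale = 1.0                           # all-zero tensor: nothing to quantize
        q = np.clip(np.round(tensor / scale), -qmax, qmax)
        return q * scale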
S103, determining tuning output results of the network to be tuned in different weight parameter distribution states.
After the weight parameters of each convolution layer in the network to be tuned have been adjusted with different weight parameters, the tuning output results of the network to be tuned in the different weight parameter distribution states are determined.
S104, determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters.
Finally, the cosine similarity between the simulated quantization output result and the tuning output result corresponding to each tuning weight parameter combination is determined respectively, and the tuning weight parameter combination with the highest cosine similarity is then taken as the tuning weight parameters of the convolution layers in the network to be tuned.
In this embodiment, a network to be tuned that includes a plurality of convolution layers is obtained from a preset network model, the weight parameters of each convolution layer in the network to be tuned are adjusted, the tuning output results of the network to be tuned in different weight parameter distribution states are then determined, and the tuning weight parameters of each convolution layer in the network to be tuned are determined according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters, so that efficient tuning of the model can be achieved on the basis of a small amount of input data.
In addition, the output precision of the network to be tuned is always kept unchanged in the process of adjusting the weight parameters of each convolution layer in the network to be tuned, so that the distribution of weights can be modified on the basis of not changing final output so as to meet the requirement of quantization distribution, and a precision lossless and efficient offline quantization mode is realized.
Fig. 3 is a flow chart illustrating an offline quantization tuning method according to an exemplary embodiment of the present application. As shown in fig. 3, the offline quantization tuning method provided in this embodiment includes:
S201, sequentially acquiring each network to be selected in a preset network model, and determining the number of continuous convolution layers in the network to be selected and the convolution type of each convolution layer.
In this step, each network to be selected in the preset network model is sequentially obtained, and the number of continuous convolution layers in the network to be selected and the convolution type of each convolution layer are determined. The network to be tuned may include two continuous convolution layers or three continuous convolution layers, and the convolution type of a convolution layer may be a common convolution layer or a specific convolution layer (e.g., a depthwise convolution layer).
S202, if the number of continuous convolution layers meets the preset layer number condition and the convolution type meets the preset type condition, determining the network to be selected as the network to be tuned.
In this step, for example, the network to be tuned may include two consecutive common convolutions (a ReLU may be located between them); specifically, it may include a first common convolution layer and a second common convolution layer, where the second common convolution layer follows the first common convolution layer. Alternatively, the network to be tuned may include three convolution layers, specifically a first common convolution layer, an intermediate specific convolution layer and a second common convolution layer, where the intermediate specific convolution layer follows the first common convolution layer and the second common convolution layer follows the intermediate specific convolution layer. A network to be selected whose number of continuous convolution layers meets the preset layer number condition and whose convolution types meet the preset type condition is taken as a network to be tuned for joint tuning.
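The layer-count and type check of S201-S202 can be sketched as follows; the type tags "common" and "depthwise" and the helper name is_network_to_be_tuned are illustrative assumptions, with the accepted patterns taken from the two examples above.

    from typing import List, Tuple

    ACCEPTED_PATTERNS: Tuple[Tuple[str, ...], ...] = (
        ("common", "common"),               # first common conv -> second common conv
        ("common", "depthwise", "common"),  # common conv -> intermediate specific conv -> common conv
    )

    def is_network_to_be_tuned(layer_types: List[str]) -> bool:
        # The candidate qualifies if its consecutive convolution layers match a preset
        # layer-number condition and type condition.
        return tuple(layer_types) in ACCEPTED_PATTERNS

    print(is_network_to_be_tuned(["common", "depthwise", "common"]))  # True
    print(is_network_to_be_tuned(["common"]))                         # False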
S203, basic weight parameter combinations are distributed for the network to be tuned.
S204, distributing tuning weight parameter combinations for the network to be tuned.
In S203-S204, a basic weight parameter combination may be allocated to the network to be tuned, wherein the network to be tuned includes N convolution layers, the basic weight parameter combination includes N basic weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the basic weight parameters in the basic weight parameter combination, and N is a positive integer. A tuning weight parameter combination is then allocated to the network to be tuned, wherein the tuning weight parameter combination includes N tuning weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the tuning weight parameters in the tuning weight parameter combination, and the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is a preset fixed value.
In one possibility, the network to be tuned includes two convolution layers. Through relative scaling of the continuous convolutions, for example enlarging the weights of the first convolution layer 10 times and reducing those of the second convolution layer 10 times, the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is 1, so that the FP32 output of the continuous convolutions remains consistent with the original FP32 output.
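A minimal numeric check of this relative scaling, assuming positive scale factors, no bias terms, and the two convolutions collapsed to matrix multiplications with a ReLU in between; per-channel scaling, which a practical implementation would likely use, is omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))       # calibration input activations
    w1 = rng.standard_normal((8, 16))     # weights of the first convolution layer
    w2 = rng.standard_normal((16, 3))     # weights of the second convolution layer

    def forward(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return np.maximum(x @ a, 0.0) @ b  # conv1 -> ReLU -> conv2, collapsed to matmuls

    s = 10.0                               # tuning multiple of the first layer
    y_base = forward(w1, w2)               # original FP32 output
    y_scaled = forward(w1 * s, w2 / s)     # product of the tuning multiples is s * (1/s) = 1
    print(np.allclose(y_base, y_scaled))   # True: the FP32 output is unchanged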
S205, determining a tuning output result of the network to be tuned according to different tuning weight parameter combinations.
Specifically, the tuning output result of the network to be tuned is determined according to different tuning weight parameter combinations, wherein each tuning weight parameter combination is used as the basic weight parameter combination for the next round of tuning weight parameter allocation. That is, the corresponding quantized output is obtained under each scaling multiple, and all observation results are recorded.
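Continuing the sketches above (fake_quantize, forward, w1, w2), the search of S205 can be pictured as follows; the candidate scale list is purely illustrative.

    candidate_scales = [0.25, 0.5, 1.0, 2.0, 4.0, 10.0]
    outputs_per_scale = {}
    for s in candidate_scales:
        qw1 = fake_quantize(w1 * s)               # first layer scaled up, then simulated-quantized
        qw2 = fake_quantize(w2 / s)               # second layer scaled down by the same multiple
        outputs_per_scale[s] = forward(qw1, qw2)  # record the observed output for this distribution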
Optionally, if the network to be tuned includes a first common convolution layer and a second common convolution layer, where the second common convolution layer follows the first common convolution layer, the output of the second common convolution layer is determined as the tuning output result according to different tuning weight parameter combinations.
Or, if the network to be tuned includes a first common convolution layer, an intermediate specific convolution layer (for example, a depthwise convolution) and a second common convolution layer, where the intermediate specific convolution layer follows the first common convolution layer and the second common convolution layer follows the intermediate specific convolution layer, the output of the second common convolution layer is determined as the tuning output result according to different tuning weight parameter combinations.
S206, respectively determining the cosine similarity between the simulated quantization output result and the tuning output result corresponding to each tuning weight parameter combination.
S207, determining a tuning weight parameter combination with highest cosine similarity as a tuning weight parameter of each convolution layer in the network to be tuned.
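The selection in S206-S207 can then be sketched as below, assuming the evaluation compares each combination's simulated-quantized output against the FP32 reference y_base from the earlier sketch; the helper cosine_similarity is an illustrative assumption.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        a, b = a.ravel(), b.ravel()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    best_scale = max(outputs_per_scale,
                     key=lambda s: cosine_similarity(y_base, outputs_per_scale[s]))
    w1, w2 = w1 * best_scale, w2 / best_scale   # keep the best-scoring weight distribution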
On the basis of the above embodiment, the weight quantization tuning of all continuous convolutions of the full model can be completed by repeating S202-S207. The weight parameters among the multiple layers of the neural network are thereby jointly equalized, so that the weight distribution is modified without changing the final FP32 output, meeting the requirements of the quantization distribution and achieving the effect of quantization tuning.
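Repeating the per-group search over every detected group can be pictured as the hypothetical full-model pass below; it reuses fake_quantize and cosine_similarity from the sketches above, and representing each group as a plain pair of weight matrices is a simplifying assumption.

    import numpy as np

    def tune_group(x, w_a, w_b, scales=(0.25, 0.5, 1.0, 2.0, 4.0, 10.0)):
        # One round of S203-S207 for a conv -> ReLU -> conv group.
        fp32_ref = np.maximum(x @ w_a, 0.0) @ w_b
        def quantized_output(s):
            return np.maximum(x @ fake_quantize(w_a * s), 0.0) @ fake_quantize(w_b / s)
        best = max(scales, key=lambda s: cosine_similarity(fp32_ref, quantized_output(s)))
        return w_a * best, w_b / best            # tuning weight parameters for this group

    def tune_all_groups(x, groups):
        # groups: iterable of (w_first, w_second) weight pairs detected as in S201-S202.
        return [tune_group(x, w_a, w_b) for w_a, w_b in groups]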
Fig. 4 is a schematic structural diagram of an off-line quantization tuning apparatus according to another exemplary embodiment of the present application. As shown in fig. 4, the offline quantization tuning device 300 provided in this embodiment includes:
An obtaining module 301, configured to obtain a network to be tuned in a preset network model, where the network to be tuned includes a plurality of convolution layers;
The processing module 302 is configured to adjust weight parameters of each convolution layer in the network to be tuned, where output accuracy of the network to be tuned remains unchanged;
the processing module 302 is further configured to determine tuning output results of the network to be tuned in different weight parameter allocation states;
The tuning module 303 is configured to determine tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters.
In one possible design, the processing module 302 is specifically configured to:
Allocating a basic weight parameter combination to the network to be tuned, wherein the network to be tuned comprises N convolution layers, the basic weight parameter combination comprises N basic weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the basic weight parameters in the basic weight parameter combination, and N is a positive integer;
And allocating a tuning weight parameter combination to the network to be tuned, wherein the tuning weight parameter combination comprises N tuning weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the tuning weight parameters in the tuning weight parameter combination, and the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is a preset fixed value.
In one possible design, the tuning module 303 is specifically configured to:
And determining a tuning output result of the network to be tuned according to different tuning weight parameter combinations, wherein each tuning weight parameter combination is used as a basic weight parameter combination for next tuning weight parameter distribution.
In one possible design, the processing module 302 is specifically configured to:
If the network to be tuned comprises a first common convolution layer and a second common convolution layer, wherein the second common convolution layer follows the first common convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations; or
If the network to be tuned comprises a first common convolution layer, an intermediate specific convolution layer and a second common convolution layer, wherein the intermediate specific convolution layer follows the first common convolution layer and the second common convolution layer follows the intermediate specific convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations.
In one possible design, at least one linear rectification function is provided between the first common convolution layer and the second common convolution layer.
In one possible design, the tuning module 303 is specifically configured to:
Respectively determining the cosine similarity between the simulated quantization output result and the tuning output result corresponding to each tuning weight parameter combination;
And determining the tuning weight parameters in the tuning weight parameter combination with the highest cosine similarity as the tuning weight parameters of the convolution layers in the network to be tuned.
In one possible design, the obtaining module 301 is specifically configured to:
Sequentially acquiring each network to be selected in the preset network model, and determining the number of continuous convolution layers in the network to be selected and the convolution type of each convolution layer;
And if the number of the continuous convolution layers meets the preset layer number condition and the convolution type meets the preset type condition, determining the network to be selected as the network to be tuned.
In the embodiment of the application, the division of the modules is only one logic function division, and other division modes can be adopted in actual implementation. For example, multiple modules or components may be combined or may be integrated into another system. In addition, the coupling between the various modules may be direct coupling or indirect coupling. In addition, each functional module in the embodiment of the present application may be integrated in one processing module, or may exist separately and physically.
If the described functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored on a machine-readable storage medium. Accordingly, aspects of the present application may be embodied in a software product, which may be stored on a machine-readable storage medium and may include instructions for causing an electronic device to perform all or part of the processes of the aspects described in the embodiments of the present application. The storage medium may include a ROM, a RAM, a removable disk, a hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
Fig. 5 is a schematic structural view of an electronic device according to an exemplary embodiment of the present application. As shown in fig. 5, the electronic device 400 provided in this embodiment includes:
a processor 401 and a memory 402, the processor 401 being connected to the memory 402;
the memory 402 is configured to store a computer program of the processor 401;
Wherein the processor 401 is configured to implement the steps of any of the method embodiments described above by executing the computer program.
Alternatively, the memory 402 may be separate or integrated with the processor 401.
When the memory 402 is a device independent from the processor 401, the electronic apparatus 400 may further include:
a bus 403 for connecting the processor 401 and the memory 402.
In addition, the embodiment of the application also provides a machine-readable storage medium. The machine-readable storage medium may store executable instructions that, when executed by a machine, cause the machine to perform the specific processes in the above method embodiments.
The machine-readable storage medium of the present application described above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The embodiments of the present application also provide a program product comprising a computer program stored in a readable storage medium. The computer program may be read from a readable storage medium by at least one processor of an electronic device, the at least one processor executing the computer program to cause the electronic device to perform the steps of the method described above.
Furthermore, those of skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above is merely an embodiment of the present application, and the scope of the present application is not limited thereto. Those skilled in the art can make changes or substitutions within the technical scope of the present disclosure, and such changes or substitutions should be included in the scope of the present disclosure.

Claims (10)

1. An offline quantization tuning method, applied to a preset neural network model, wherein what the offline quantization loads is pictures, the method comprising the following steps:
Obtaining a network to be tuned in a preset network model, wherein the network to be tuned comprises a plurality of convolution layers;
Adjusting weight parameters of each convolution layer in the network to be tuned and jointly equalizing the weights among the multiple layers of the neural network, wherein the output precision of the network to be tuned is kept unchanged;
After the weight parameters of each convolution layer in the network to be tuned have been adjusted with different weight parameters, determining tuning output results of the network to be tuned in different weight parameter distribution states;
Determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters;
the adjusting the weight parameters of each convolution layer in the network to be tuned comprises the following steps:
Allocating a basic weight parameter combination to the network to be tuned, wherein the network to be tuned comprises N convolution layers, the basic weight parameter combination comprises N basic weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the basic weight parameters in the basic weight parameter combination, and N is a positive integer;
And allocating a tuning weight parameter combination to the network to be tuned, wherein the tuning weight parameter combination comprises N tuning weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the tuning weight parameters in the tuning weight parameter combination, and the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is a preset fixed value.
2. The offline quantization tuning method according to claim 1, wherein determining tuning output results of the network to be tuned in different weight parameter allocation states comprises:
And determining a tuning output result of the network to be tuned according to different tuning weight parameter combinations, wherein each tuning weight parameter combination is used as a basic weight parameter combination for next tuning weight parameter distribution.
3. The offline quantization tuning method according to claim 2, wherein the determining the tuning output result of the network to be tuned according to different tuning weight parameter combinations includes:
If the network to be tuned comprises a first common convolution layer and a second common convolution layer, wherein the second common convolution layer follows the first common convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations; or
If the network to be tuned comprises a first common convolution layer, an intermediate specific convolution layer and a second common convolution layer, wherein the intermediate specific convolution layer follows the first common convolution layer and the second common convolution layer follows the intermediate specific convolution layer, determining the output of the second common convolution layer as the tuning output result according to different tuning weight parameter combinations.
4. The offline quantization tuning method according to claim 3, wherein at least one linear rectification function is provided between the first common convolution layer and the second common convolution layer.
5. The offline quantization tuning method according to any one of claims 1-4, wherein determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters includes:
Respectively determining the cosine similarity between the simulated quantization output result and the tuning output result corresponding to each tuning weight parameter combination;
And determining the tuning weight parameters in the tuning weight parameter combination with the highest cosine similarity as the tuning weight parameters of the convolution layers in the network to be tuned.
6. The method for offline quantization tuning according to any one of claims 1-4, wherein the obtaining a network to be tuned in a preset network model includes:
Sequentially acquiring each network to be selected in the preset network model, and determining the number of continuous convolution layers in the network to be selected and the convolution type of each convolution layer;
And if the number of the continuous convolution layers meets the preset layer number condition and the convolution type meets the preset type condition, determining the network to be selected as the network to be tuned.
7. An offline quantization tuning device, applied to a preset neural network model, wherein what the offline quantization loads is pictures, the device comprising:
an acquisition module, used for acquiring a network to be tuned in a preset network model, wherein the network to be tuned comprises a plurality of convolution layers;
a processing module, used for adjusting weight parameters of each convolution layer in the network to be tuned and jointly equalizing the weights among the multiple layers of the neural network, wherein the output precision of the network to be tuned is kept unchanged;
The processing module is further used for determining tuning output results of the to-be-tuned network in different weight parameter distribution states after the weight parameters of all convolution layers in the to-be-tuned network are adjusted by using different weight parameters;
and a tuning module, used for determining tuning weight parameters of each convolution layer in the network to be tuned according to the similarity between the simulated quantization output result and the tuning output results corresponding to different weight parameters;
the adjusting the weight parameters of each convolution layer in the network to be tuned comprises the following steps:
Allocating a basic weight parameter combination to the network to be tuned, wherein the network to be tuned comprises N convolution layers, the basic weight parameter combination comprises N basic weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the basic weight parameters in the basic weight parameter combination, and N is a positive integer;
And allocating a tuning weight parameter combination to the network to be tuned, wherein the tuning weight parameter combination comprises N tuning weight parameters, the convolution layers of the network to be tuned correspond one-to-one to the tuning weight parameters in the tuning weight parameter combination, and the product of the tuning multiples of the tuning weight parameters relative to their corresponding basic weight parameters is a preset fixed value.
8. An electronic device, comprising: the device comprises a processor and a memory, wherein the processor is respectively connected with the memory;
The memory is used for storing a computer program of the processor;
Wherein the processor is configured to implement the off-line quantization tuning method of any one of claims 1 to 6 by executing the computer program.
9. A machine-readable storage medium having stored thereon executable instructions that when executed by a machine cause the offline quantization tuning method according to any one of claims 1 to 6 to be implemented.
10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the off-line quantitative tuning method of any one of claims 1 to 6.
CN202110324266.1A 2021-03-26 2021-03-26 Offline quantization optimization method, device, equipment, medium and program product Active CN112990457B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110324266.1A CN112990457B (en) 2021-03-26 2021-03-26 Offline quantization optimization method, device, equipment, medium and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110324266.1A CN112990457B (en) 2021-03-26 2021-03-26 Offline quantization optimization method, device, equipment, medium and program product

Publications (2)

Publication Number Publication Date
CN112990457A CN112990457A (en) 2021-06-18
CN112990457B true CN112990457B (en) 2024-05-03

Family

ID=76333790

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110324266.1A Active CN112990457B (en) 2021-03-26 2021-03-26 Offline quantization optimization method, device, equipment, medium and program product

Country Status (1)

Country Link
CN (1) CN112990457B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN111144511A (en) * 2019-12-31 2020-05-12 上海云从汇临人工智能科技有限公司 Image processing method, system, medium and electronic terminal based on neural network
CN111368978A (en) * 2020-03-02 2020-07-03 开放智能机器(上海)有限公司 Precision improving method for offline quantization tool
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN112400176A (en) * 2019-06-12 2021-02-23 上海寒武纪信息科技有限公司 Neural network quantitative parameter determination method and related product

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146064B (en) * 2018-09-05 2023-07-25 腾讯科技(深圳)有限公司 Neural network training method, device, computer equipment and storage medium
US11676029B2 (en) * 2019-06-12 2023-06-13 Shanghai Cambricon Information Technology Co., Ltd Neural network quantization parameter determination method and related products

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644254A (en) * 2017-09-09 2018-01-30 复旦大学 Convolutional neural network weight parameter quantization training method and system
CN111723901A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Training method and device of neural network model
CN112400176A (en) * 2019-06-12 2021-02-23 上海寒武纪信息科技有限公司 Neural network quantitative parameter determination method and related product
CN111144511A (en) * 2019-12-31 2020-05-12 上海云从汇临人工智能科技有限公司 Image processing method, system, medium and electronic terminal based on neural network
CN111368978A (en) * 2020-03-02 2020-07-03 开放智能机器(上海)有限公司 Precision improving method for offline quantization tool

Also Published As

Publication number Publication date
CN112990457A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
Yang et al. Netadapt: Platform-aware neural network adaptation for mobile applications
CN106485316A (en) Neural network model compression method and device
CN110956202B (en) Image training method, system, medium and intelligent device based on distributed learning
CN111723901B (en) Training method and device for neural network model
CN112101525A (en) Method, device and system for designing neural network through NAS
CN110414630A (en) The training method of neural network, the accelerated method of convolutional calculation, device and equipment
CN112906294A (en) Quantization method and quantization device for deep learning model
CN112200296B (en) Network model quantization method and device, storage medium and electronic equipment
CN114444668A (en) Network quantization method, network quantization system, network quantization apparatus, network quantization medium, and image processing method
CN112990457B (en) Offline quantization optimization method, device, equipment, medium and program product
CN112906883A (en) Hybrid precision quantization strategy determination method and system for deep neural network
CN115906927B (en) Data access analysis method and system based on artificial intelligence and cloud platform
Yang et al. Resource-aware pareto-optimal automated machine learning platform
CN116402123A (en) Pre-training model fine tuning method and system based on learning strategy
CN111797991A (en) Deep network model compression system, method and device
CN115392441A (en) Method, apparatus, device and medium for on-chip adaptation of quantized neural network model
CN115345303A (en) Convolutional neural network weight tuning method, device, storage medium and electronic equipment
CN114004334A (en) Model compression method, model compression system, server and storage medium
CN114626284A (en) Model processing method and related device
CN111930670A (en) Heterogeneous intelligent processing quantization device, quantization method, electronic device and storage medium
Cai et al. ACF: An Adaptive Compression Framework for Multimodal Network in Embedded Devices
CN116739049A (en) Network compression method and device and storage medium
CN117454943A (en) Automatic model compression method, device and medium
Khoram et al. TOCO: A framework for compressing neural network models based on tolerance analysis
CN110298438A (en) The method of adjustment and adjustment device of neural network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant