CN114418097A - Neural network quantization processing method and device, electronic equipment and storage medium


Info

Publication number
CN114418097A
CN114418097A
Authority
CN
China
Prior art keywords
interval
neural network
sample
data
distribution value
Prior art date
Legal status
Pending
Application number
CN202210249363.3A
Other languages
Chinese (zh)
Inventor
徐祥 (Xu Xiang)
艾国 (Ai Guo)
杨作兴 (Yang Zuoxing)
房汝明 (Fang Ruming)
向志宏 (Xiang Zhihong)
Current Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Shenzhen MicroBT Electronics Technology Co Ltd
Application filed by Shenzhen MicroBT Electronics Technology Co Ltd
Priority to CN202210249363.3A
Publication of CN114418097A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Abstract

The present disclosure relates to a neural network quantization processing method, apparatus, electronic device, and storage medium. The method includes: inputting each sample in a sample set into a neural network model containing at least one normalization layer, to obtain the normalized output data that the at least one normalization layer outputs for each sample; obtaining the distribution range of all samples from the normalized output data; dividing the distribution range into at least two interval ranges; calculating distribution value sub-data from the output data falling into each interval range to obtain distribution value data, and acquiring a preset number of distribution value data from each interval segment; acquiring, from the sample set, the samples corresponding to the acquired distribution value data to form a post-training quantization sample set; training the neural network model with the post-training quantization sample set to obtain a quantized neural network model; and performing at least one of classification and detection based on the quantized neural network model.

Description

Neural network quantization processing method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a neural network quantization processing method and apparatus, an electronic device, and a storage medium.
Background
In order to meet the detection precision requirements of various AI (Artificial Intelligence) applications, the width, depth, number of layers, and parameter counts of deep neural network structures have grown rapidly, so deep learning models demand more memory and suffer lower inference efficiency. Quantization of a neural network model converts floating-point computation into low-bit fixed-point computation, effectively reducing the computational intensity, parameter size, and memory consumption of the model; reasonable quantization incurs almost no precision loss and suits most models and use scenarios.
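For background only, a minimal sketch of the kind of conversion meant here, uniform affine quantization of a floating-point tensor to 8-bit integers, is shown below; the scale and zero-point convention is one common choice, all names are illustrative, and this is not part of the disclosed method.

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Uniform affine quantization of a float tensor to uint8 (illustrative)."""
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / 255.0
    if scale == 0.0:          # constant tensor: any scale works
        scale = 1.0
    zero_point = int(round(-x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map the integers back to approximate floats."""
    return (q.astype(np.float32) - zero_point) * scale
```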
However, achieving reasonable quantization remains challenging. For example, when quantization is introduced for a neural network model that has already been trained, post-training quantization is required: although the model need not be retrained, a subset of samples from the sample set is still needed for quantization calibration, and because of how samples are distributed in the sample set, the samples selected for post-training quantization affect the performance of the quantized model. How to select the post-training quantization sample set so as to meet requirements such as fast and accurate classification and detection therefore remains an urgent problem.
Disclosure of Invention
In view of this, the present disclosure provides a neural network quantization processing method, apparatus, electronic device, and storage medium that improve the selection of the post-training quantization sample set, so that the samples drawn from the sample set are more uniformly distributed. This improves the performance of the post-training quantized neural network model and ensures that operations such as classification and detection are executed quickly and accurately.
The technical solutions of the present disclosure are realized as follows:
a neural network quantization processing method comprises the following steps:
step A, inputting a kth sample in a sample set into at least one normalization layer in a neural network model containing N normalization layers, wherein the sample set comprises M samples, the value of M is a positive integer greater than 1, the kth sample is any one of the M samples, and the value of N is a positive integer greater than 0;
step B, obtaining a plurality of output data of each normalization layer in the at least one normalization layer;
step C, for each normalization layer, determining at least two interval ranges according to the plurality of output data, wherein each interval range has a corresponding reference value;
step D, for each normalization layer, obtaining the distribution value sub-data according to the reference value of the interval range in which each output data is located;
step E, calculating to obtain the distribution value data of the kth sample according to the distribution value sub-data in the at least one normalization layer;
step F, executing the steps A to E on part or all of the samples to obtain the distribution value data corresponding to each sample;
step G, dividing the obtained partial or all distribution value data into a plurality of interval segments, acquiring at least one sample corresponding to each interval segment, and forming a post-training quantization sample set;
and step H, training the neural network model with the post-training quantization sample set to obtain a quantized neural network model.
Further, the step C includes:
calculating a mean value and a standard deviation of the plurality of output data according to the plurality of output data;
determining a standard range interval according to the average value and the standard deviation;
and dividing the region outside the standard range interval into a plurality of non-standard intervals according to the distance from the standard range interval, and respectively setting the reference value of the standard range interval and the reference values of the non-standard intervals.
Further, the determining a standard range interval according to the average value and the standard deviation comprises:
subtracting the standard deviation from the average value to obtain a left boundary;
adding the average value and the standard deviation to obtain a right boundary;
determining an interval between the left boundary and the right boundary as the standard range interval;
and,
the dividing a region outside the standard range interval into a plurality of non-standard intervals according to the distance between the region and the standard range interval, and setting the reference value of the standard range interval and the reference value of the non-standard intervals respectively, includes:
dividing a plurality of non-standard intervals from the left boundary and the right boundary respectively to the direction far away from the standard range interval according to the set interval step length;
and setting a reference value of each non-standard interval according to an interval step length between each non-standard interval and the standard range interval.
Further, the neural network quantization processing method further includes:
setting the standard deviation as the interval step.
Further, the step D includes:
adding the reference values of the interval ranges in which the respective output data are located, to obtain the distribution value sub-data corresponding to each normalization layer;
alternatively,
weighting at least one of the reference values of the interval ranges in which the respective output data are located before adding, to obtain the distribution value sub-data corresponding to each normalization layer.
Further, the neural network quantization processing method further includes:
dividing the result obtained by adding the reference values by the total number of the plurality of output data of the corresponding normalization layer, to obtain the distribution value sub-data;
alternatively,
weighting at least one of the reference values of the interval ranges in which the respective output data are located before adding, and dividing the result by the total number of output data of the corresponding normalization layer, to obtain the distribution value sub-data.
Further, in the step G, the dividing into a plurality of interval segments according to the obtained partial or all distribution value data, and the acquiring of at least one sample corresponding to each interval segment, further includes:
monotonically sorting part or all of the distribution value data;
dividing the monotonically sorted distribution value data into a plurality of interval segments, and acquiring at least one sample corresponding to each interval segment.
Further, the numbers of samples acquired for the respective interval segments are equal.
A neural network quantization processing apparatus, comprising:
a sample input module configured to perform: inputting a kth sample in a sample set into at least one normalization layer in a neural network model containing N normalization layers, wherein the sample set comprises M samples, the value of M is a positive integer greater than 1, the kth sample is any one of the M samples, and the value of N is a positive integer greater than 0;
an output data acquisition module configured to perform: obtaining a plurality of output data for each of the at least one normalization layer;
an interval range determination module configured to perform: for each normalization layer, determining at least two interval ranges according to the output data, wherein each interval range has a corresponding reference value;
a distribution value sub-data obtaining module configured to perform: for each normalization layer, obtaining sub-data of the distribution values according to the reference value of the interval range in which each output data is located;
a distribution value data acquisition module configured to perform: calculating to obtain the distribution value data of the kth sample according to the distribution value sub-data in the at least one normalization layer;
a multi-sample distribution value data acquisition module configured to perform: calling a sample input module, an output data acquisition module, an interval range determination module, a distribution value sub-data acquisition module and a distribution value data acquisition module to respectively obtain distribution value data corresponding to each sample;
an interval segment dividing and sample acquiring module configured to perform: dividing the obtained partial or all distribution value data into a plurality of interval segments, acquiring at least one sample corresponding to each interval segment, and forming a post-training quantization sample set;
a training module configured to perform: and training the neural network model by utilizing the post-training quantization sample set to obtain a quantization neural network model.
A non-transitory computer readable storage medium storing instructions that, when executed by a processor, cause the processor to perform a neural network quantization processing method as any one of the above.
An electronic device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a neural network quantization processing method as claimed in any one of the preceding claims.
A computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps in the neural network quantization processing method as described above.
It can be seen from the above solutions that, in the neural network quantization processing method, apparatus, electronic device, and storage medium of the embodiments of the present disclosure, all samples in a sample set are input into a neural network model containing at least one normalization layer, and the normalized output data output by the at least one normalization layer for each sample is obtained; samples are then selected according to the distribution of each sample's normalized output data to form the post-training quantization sample set, ensuring that the selected samples are uniformly distributed with respect to the individuality expressed in the sample set. Specifically, sample distribution value data is determined from the elements of each sample's normalized output data that fall outside a standard range interval and thus reflect the sample's individuality, so the distribution value data reflects that individuality; a plurality of interval segments is then divided according to the differing distribution value data, and the same number of samples is acquired from each segment to form the post-training quantization sample set, so that the set contains a variety of representative samples and the precision of post-training quantization is improved. On this basis, the neural network model is trained with the post-training quantization sample set to obtain a quantized neural network model, and at least one of classification and detection is performed based on the quantized neural network model, ensuring that operations such as classification and detection are executed quickly and accurately by the post-training quantized neural network model.
Drawings
Fig. 1 is a schematic diagram of a neural network quantization processing method according to an embodiment of the present disclosure;
fig. 2 is a diagram illustrating a flow example of a neural network quantization processing method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a dividing structure of a standard interval and a non-standard interval according to an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of a neural network quantization processing apparatus according to an embodiment of the disclosure;
FIG. 5 is a flow chart of another neural network quantization processing method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another neural network quantization processing apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure more apparent, the present disclosure is further described in detail below with reference to the accompanying drawings and examples.
As shown in fig. 1, the neural network quantization processing method according to the embodiment of the present disclosure includes the following steps 1 to 7.
Step 1, respectively inputting all samples in a sample set into a neural network model containing at least one normalization layer to obtain normalized output data which is output by the at least one normalization layer and respectively corresponds to each sample;
step 2, obtaining the distribution range of all samples according to normalized output data which is output by at least one normalization layer and respectively corresponds to each sample;
step 3, dividing the distribution range into at least two interval ranges;
step 4, for any one sample, calculating the distribution value sub-data of the normalized output data output by all normalization layers that falls into each interval range, obtaining the distribution value data from the distribution value sub-data, dividing a plurality of interval segments, and setting a preset number of distribution value data to be acquired from each interval segment;
step 5, acquiring the samples corresponding to the distribution value data of each interval segment from the sample set to form a post-training quantization sample set;
step 6, training the neural network model by utilizing a post-training quantized sample set to obtain a quantized neural network model;
and step 7, performing at least one of classification and detection based on the quantized neural network model.
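Read together, steps 1 to 7 can be sketched as the following minimal skeleton. It assumes a hypothetical helper norm_layer_outputs(model, sample) that returns one array per normalization layer, and the per-layer scoring subdata_for_layer detailed in steps 211 to 215 below; names and structure are illustrative, not the patent's reference implementation.

```python
import numpy as np

def distribution_value(model, sample, norm_layer_outputs, subdata_for_layer) -> float:
    """Steps 1 and 4: sum the distribution value sub-data contributed by
    every normalization layer for one sample."""
    return float(sum(subdata_for_layer(out)
                     for out in norm_layer_outputs(model, sample)))

def build_calibration_set(model, samples, norm_layer_outputs, subdata_for_layer,
                          n_segments: int = 8, per_segment: int = 4):
    """Steps 4 and 5: sort samples by distribution value, divide the sorted
    order into interval segments, and draw an equal number from each."""
    scores = np.array([distribution_value(model, s, norm_layer_outputs,
                                          subdata_for_layer) for s in samples])
    segments = np.array_split(np.argsort(scores), n_segments)
    return [samples[int(i)] for seg in segments for i in seg[:per_segment]]
```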
Here, the neural network model is one that has already been trained before quantization is introduced. In some embodiments, the sample set may be a sample set for quantization composed of a portion of the samples in a validation set or test set of the neural network model.
The neural network quantization processing method of the embodiment of the present disclosure is suited to post-training quantization of a neural network model. To keep the accuracy of the quantized neural network model consistent with that of the unquantized model, a post-training quantization sample set needs to be obtained from the sample set of the neural network model.
In some examples, to make the distribution of the samples in the post-training quantization sample set more uniform, the samples must be screened from all samples in the sample set, which requires considering the data representation of all samples in the normalized neural network model. Accordingly, in step 1, all samples in the sample set are input into the neural network model containing at least one normalization layer, and the normalized output data output by the at least one normalization layer for each sample is obtained, so that samples can be screened in the subsequent steps to form the post-training quantization sample set.
In some examples, the normalized output data is expressed as an array or matrix and includes a plurality of elements. In this case, step 4 calculates the distribution value sub-data of the normalized output data of any one sample output by all normalization layers that falls within each interval range, and obtains the distribution value data from the sub-data, through the following steps 211 to 215.
Step 211, obtaining an average value of all elements of the normalized output data (i.e. normalized output data of any sample output by any normalization layer) and a standard deviation of all elements according to the normalized output data of any sample output by any normalization layer.
And step 212, determining a standard range interval according to the average value and the standard deviation.
For example, take a standard range interval [-σ, σ], where σ is a positive number. Starting from the right boundary σ of the interval and moving outward in steps of σ (where nσ denotes the product of n and σ, and n is a positive integer), the non-standard intervals (σ, 2σ], (2σ, 3σ], ..., ((n-1)σ, nσ] are divided. Similarly, starting from the left boundary -σ of the interval and moving outward in steps of σ (where m is a positive integer), the non-standard intervals [-2σ, -σ), [-3σ, -2σ), ..., [-mσ, -(m-1)σ) are divided.
In step 213, the region outside the standard range interval is divided into a plurality of non-standard intervals according to the distance from the standard range interval, and a reference value is set for each non-standard interval.
For example, the non-standard intervals include (σ, 2σ], (2σ, 3σ], ..., ((n-1)σ, nσ], and [-2σ, -σ), [-3σ, -2σ), ..., [-mσ, -(m-1)σ).
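Under this construction (standard range interval [-σ, σ] with step σ), mapping an element to its interval reduces to counting whole steps of σ from the mean; a minimal sketch, with illustrative names:

```python
import numpy as np

def interval_index(x: float, mu: float, sigma: float) -> int:
    """Return 0 if x lies in the standard range interval [mu - sigma, mu + sigma],
    else k for the k-th non-standard interval (k steps of width sigma away)."""
    distance = abs(x - mu)
    if distance <= sigma:
        return 0
    # (sigma, 2*sigma] -> 1, (2*sigma, 3*sigma] -> 2, ...
    return int(np.ceil(distance / sigma)) - 1
```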
Step 214, obtaining the distribution value sub-data of the normalized output data of any one sample output by any one normalization layer, according to which interval each element of that output data falls into and the reference value corresponding to each interval.
In the embodiment of the present disclosure, the distribution value sub-data is obtained by adding the reference values of the elements of the normalized output data falling into the standard range interval and the reference values of the elements falling into the respective non-standard intervals.
In another embodiment of the present disclosure, the reference value of elements falling within the standard range interval, for example [-σ, σ], is reasonably counted as 0; the distribution value sub-data then depends only on the elements falling into the non-standard intervals, and is obtained by adding the reference values of those elements.
In another embodiment of the present disclosure, a plurality of non-standard range intervals is divided according to the distance from the standard range interval, and a reference value is set for each. For example, starting from the left or right boundary of the standard range interval, non-standard intervals are divided sequentially at a certain step length, and the reference value of each is set. The step length may be the same for all non-standard intervals, or different step lengths may be set as needed; for example, the farther an interval lies from the standard range interval, the longer its step length may be, and those skilled in the art should understand that this does not limit the present disclosure. When setting the reference values, a different reference value may be set for every non-standard interval, or symmetrical non-standard intervals (those at the same distance from the center of the standard range interval) may be given the same reference value; for example, for a standard range interval [-σ, σ] with step length σ, the same reference value may be set for (2σ, 3σ] and [-3σ, -2σ). Of course, different reference values may be set as needed. The step length may also take another value, such as A, and the intervals around 3A and -3A may likewise be given the same or different reference values.
In an embodiment of the disclosure, the reference value of each non-standard interval may be set according to how far that interval lies from the left or right boundary of the standard range interval. For example, if the standard range interval is [-σ, σ] and the step length is σ, the non-standard intervals (σ, 2σ], (2σ, 3σ], ..., ((n-1)σ, nσ] may be given the reference values 1, 2, 3, and so on, respectively; other settings may of course be made as needed.
Step 215, obtaining the distribution value data of any sample according to the distribution value subdata of the normalized output data of any sample output by all the normalization layers.
In step 211, first, the average value of all elements of the normalized output data is obtained; then, the squared differences between each element and the average value are averaged to obtain the variance of the elements; finally, the arithmetic square root of the variance is taken to obtain the standard deviation of all elements of the normalized output data.
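Step 211 is the ordinary population mean and standard deviation. A minimal numpy sketch under that reading, with illustrative names, followed by the step 212 interval:

```python
import numpy as np

def mean_and_std(output: np.ndarray):
    """Mean and standard deviation over all elements of one normalization
    layer's output, following the three sub-steps of step 211."""
    elements = output.ravel()
    mu = elements.mean()                    # average of all elements
    var = ((elements - mu) ** 2).mean()     # mean of squared deviations
    return float(mu), float(np.sqrt(var))   # arithmetic square root

# Step 212: the standard range interval [mu - sigma, mu + sigma].
mu, sigma = mean_and_std(np.random.randn(4, 16))
standard_interval = (mu - sigma, mu + sigma)
```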
In some examples, the elements of the normalized output data generally conform to a normal distribution: the number of elements close to the average value is large, the number of elements far from the average value is small, and some elements may lie very far from the average value. These distant elements are the ones that reflect how the sample's characteristics differ from those of other samples.
Based on this, the determining of the standard range interval according to the mean and the standard deviation of step 212 includes:
step 2121, subtracting the standard deviation from the average value to obtain a left boundary;
step 2122, adding the average value and the standard deviation to obtain a right boundary;
and step 2123, determining the interval between the left boundary and the right boundary as a standard range interval.
In the embodiment of the disclosure, the elements within the standard range interval reflect the commonality among samples, and the elements outside the standard range interval reflect the individuality of each sample, that is, what distinguishes it from other samples. According to the embodiment of the disclosure, samples of different representativeness are selected from the sample set as the post-training quantization sample set, so that the set contains a variety of representative samples, further improving the precision of post-training quantization.
In some examples, the farther an element lies from the average value, the more it contributes to the sample's individuality. In this case, the elements falling outside the standard range interval are scored, and a total score for the sample is obtained from the scores of all such elements; the total score thus reflects the sample's individuality, and samples are then selected according to their total scores. This requires specifying how the elements falling outside the standard range interval are scored.
In this case, in some examples, dividing the region outside the standard range interval into a plurality of non-standard intervals according to the distance from the standard range interval and setting the reference value of each non-standard interval in step 213 includes:
step 2131, dividing a plurality of non-standard intervals from both sides of the standard range interval in the direction away from it, according to the set interval step length;
step 2132, setting the reference value of each non-standard interval according to the interval step length between that non-standard interval and the standard range interval.
In some embodiments, the neural network quantization processing method of the present disclosure may further include:
the standard deviation is set as the interval step.
With this scheme, the elements can be scored according to the reference values of the non-standard intervals into which they fall. In this case, in some examples, step 214 obtains the distribution value sub-data of the normalized output data of any one sample output by any one normalization layer from the elements falling into each non-standard interval and the corresponding reference values, where in one case the reference value of elements falling into the standard range interval may be set to 0, through the following steps:
step 2141, for any element of the normalized output data of any one sample output by any one normalization layer that falls into a non-standard interval, obtaining the distribution value corresponding to that element;
step 2142, adding the distribution values corresponding to all elements of that normalized output data falling into non-standard intervals, to obtain the distribution value sub-data of the normalized output data of that sample output by that normalization layer;
step 2143, adding the distribution value sub-data of the normalized output data of the sample output by all normalization layers, to obtain the distribution value data of the sample.
In step 2142, the sum of the distribution values corresponding to all the elements is used as the distribution value sub-data of the normalized output data. Where needed, the sub-data may instead be obtained by other operations such as multiplication or averaging. For example, obtaining the distribution value sub-data in step 214 may include: step 2142', obtaining the distribution value of the normalized output data of any one sample output by any one normalization layer, and dividing it by the total number of outputs of that normalization layer to obtain the distribution value sub-data.
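Putting steps 2141 to 2142 and the averaging variant 2142' together, a minimal sketch of the per-layer sub-data computation, with reference values taken as interval indices as in the earlier sketch (illustrative, not the patent's reference implementation):

```python
import numpy as np

def subdata_for_layer(output: np.ndarray, normalize: bool = False) -> float:
    """Sum, over all elements of one normalization layer's output, the
    reference value of the interval each element falls into (0 for the
    standard range interval, k for the k-th non-standard interval);
    optionally divide by the element count as in step 2142'."""
    elements = output.ravel()
    mu = elements.mean()
    sigma = np.sqrt(((elements - mu) ** 2).mean())
    if sigma == 0.0:                         # degenerate layer output: no spread
        return 0.0
    k = np.abs(elements - mu) / sigma        # distance in units of sigma
    refs = np.maximum(np.ceil(k) - 1, 0.0)   # 0 inside [mu - sigma, mu + sigma]
    total = float(refs.sum())
    return total / elements.size if normalize else total
```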
After the distribution value sub-data of the normalized output data of any one sample output by each normalization layer is obtained, the total distribution value data that the quantized neural network model produces for the sample must be obtained. In this case, in some examples, obtaining the distribution value data of any one sample in step 215 from the sub-data output by all normalization layers includes:
and adding the sub-data of the distribution value of the normalized output data of any sample output by all the normalization layers to obtain the data of the distribution value of any sample.
The distribution value data of each sample can be obtained as above. Obviously, since samples differ, their distribution value data are not all equal; therefore, before selecting samples, the interval segments of the distribution value data of all samples must be determined. In this case, in some examples, in order to obtain samples of distinct individuality from the sample set, rather than samples whose individuality converges, and thereby ensure the comprehensiveness of the post-training quantization sample set, the dividing of a plurality of interval segments in step 4, with a preset number of distribution value data set to be acquired from each, includes:
monotonically sorting the distribution value data of all samples;
in the monotonically sorted distribution value data, dividing a plurality of interval segments according to the number of distribution value data. The interval segments referred to here may differ from the interval ranges mentioned earlier.
To keep the number of samples acquired from each interval segment consistent and thereby ensure the comprehensiveness of the composed post-training quantization sample set, in some examples the number of distribution value data is equal in each interval segment.
In step 4, a preset number of distribution value data is acquired from each interval segment, and in step 5, the samples corresponding to the acquired distribution value data are acquired from the sample set to form the post-training quantization sample set. With this scheme, an equal number of samples is acquired from each interval segment, so the number of samples exhibiting each kind of individuality is equal; the samples composing the post-training quantization sample set cover the various individualities present in the sample set with uniform distribution, the set contains a variety of representative samples, and the precision of post-training quantization is improved.
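As a concrete illustration of this equal-count selection, with made-up distribution values purely for exposition:

```python
import numpy as np

# Hypothetical distribution values for 12 samples (illustrative numbers).
scores = np.array([3.1, 0.2, 7.9, 4.4, 1.0, 6.5, 2.2, 5.8, 0.9, 8.3, 4.0, 6.9])
order = np.argsort(scores)                  # monotonic ordering of samples by score
segments = np.array_split(order, 4)         # four interval segments of equal size
picked = [int(seg[0]) for seg in segments]  # one sample per segment (equal counts)
print(picked)  # indices of four samples spanning low to high distribution values
```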
After step 5 is completed, in step 6 the neural network model is trained with the post-training quantization sample set to obtain a quantized neural network model, completing the quantization of the model. In practical application scenarios, the quantized neural network model can serve applications such as classification and detection while ensuring fast and accurate inference. Further, after step 6, at least one of classification and detection is performed in step 7 based on the quantized neural network model, so that such operations are executed quickly and accurately by the post-training quantized neural network model.
In the neural network quantization processing method of the embodiment of the present disclosure, all samples in the sample set are input into a neural network model containing at least one normalization layer, the normalized output data output by the at least one normalization layer for each sample is obtained, and samples are selected according to the distribution of each sample's normalized output data to form the post-training quantization sample set, ensuring that the selected samples are uniformly distributed with respect to the individuality expressed in the sample set. Specifically, sample distribution value data is determined from the elements of each sample's normalized output data that fall outside the standard range interval and thus reflect the sample's individuality; the distribution value data therefore reflects that individuality, and, according to the differing distribution value data, the same number of samples is acquired from each interval segment of the sample set to form the post-training quantization sample set, so that the set contains a variety of representative samples and the precision of post-training quantization is improved. On this basis, the neural network model is trained with the post-training quantization sample set to obtain a quantized neural network model, and at least one of classification and detection is performed based on the quantized neural network model, ensuring that operations such as classification and detection are executed quickly and accurately by the post-training quantized neural network model.
Fig. 2 shows an example flow of a neural network quantization processing method according to an embodiment of the present disclosure; the example includes the following steps.
Step a, obtaining a sample set, and executing the following steps b to g on each sample in the sample set.
Step b, inputting the sample into a neural network model containing at least one normalization layer, obtaining the normalized output data output by each normalization layer for the sample, and executing the following steps c to f on the normalized output data output by each normalization layer for the sample.
The sample set is a sample set for quantization, composed of a portion of the samples in a verification set or test set of the neural network model.
Step c, obtaining the average value and the standard deviation of all elements of the normalized output data of the sample output by the normalization layer.
Step d, determining the standard range interval according to the average value and the standard deviation of all elements of the normalized output data.
In an alternative embodiment, in step d, the standard deviation is subtracted from the average value to obtain the left boundary of the standard range interval, the standard deviation is added to the average value to obtain the right boundary, and the interval between the left boundary and the right boundary is taken as the standard range interval.
Step e, dividing the region outside the standard range interval into a plurality of non-standard intervals according to the distance from the standard range interval, and setting the reference value of each non-standard interval.
In an alternative embodiment, the standard deviation is set as the interval step; a plurality of non-standard intervals is divided from both sides of the standard range interval in the direction away from it according to the interval step, and the reference value of each non-standard interval is set according to the number of interval steps between it and the standard range interval.
Fig. 3 shows the division of the standard interval and the non-standard intervals in the embodiment of the present disclosure. As shown in Fig. 3, the average value of all elements of the normalized output data is μ and the standard deviation is σ; the standard range interval is the region between μ-σ and μ+σ, centered on μ. The first non-standard interval, extending one interval step (σ) outward from either side of the standard range interval, has a reference value of 1σ; the second non-standard interval, extending a further σ outward, has a reference value of 2σ; the third non-standard interval, extending another σ outward, has a reference value of 3σ; and so on, dividing a plurality of non-standard intervals and obtaining their reference values. In the aforementioned embodiment with the standard range interval [-σ, σ], the average value of all elements of the normalized output data is μ = 0.
Step f, obtaining the distribution value sub-data of the normalized output data according to the elements of the normalized output data falling into the non-standard intervals and the reference values corresponding to those intervals.
In an alternative embodiment, the elements of the normalized output data falling within the standard interval are either not considered or assigned a distribution value of 0. Each element falling into a non-standard interval is compared with the non-standard intervals, and its distribution value is assigned as the reference value of the interval it falls into: an element in the first non-standard interval is assigned 1σ, an element in the second non-standard interval is assigned 2σ, an element in the third non-standard interval is assigned 3σ, and so on.
In an alternative embodiment, when the elements falling into the standard interval are not considered, the distribution values of all elements falling into the non-standard intervals are added to obtain the distribution value sub-data of the normalized output data.
In an alternative embodiment, when the distribution value of elements falling into the standard interval is set to 0, the distribution values of all elements of the normalized output data are added to obtain the distribution value sub-data.
Thus, the acquisition of the distribution value sub-data of the normalized output data of one sample by one normalization layer is completed.
Step g, obtaining the distribution value data of the sample according to the distribution value sub-data of the normalized output data of the sample output by all normalization layers.
In an alternative embodiment, the quantized neural network model may have more than one normalization layer. After a sample is input into the model, each normalization layer produces its own normalized output data; the distribution value sub-data obtained from the normalized output data of all normalization layers for that sample are added to obtain the sample's distribution value data. The distribution value data characterizes the total contribution made by all normalization layers of the quantized neural network model to the sample's distribution value.
Inputting all samples in the sample set into the neural network model containing at least one normalization layer and executing steps b to g yields the distribution value data of all samples.
Step h, determining the overall range spanned by the distribution value data of all samples as the distribution range.
Step i, monotonically sorting the distribution value data of all samples.
In alternative embodiments, the distribution value data of all samples may be monotonically sorted in descending or ascending order.
Step j, dividing the distribution range into at least two interval segments according to the quantity of distribution value data in the monotonically sorted distribution value data.
In an alternative embodiment, the number of distribution value data is equal in each interval segment. Each distribution value datum corresponds one-to-one with a sample, so the number of distribution value data equals the number of samples, and sorting the distribution value data amounts to sorting the samples. Dividing the distribution range into at least two interval segments according to the number of distribution value data thus ranks all samples by the size of their distribution value data.
The distribution value data is obtained from the elements of the normalized output data that reflect the sample's individuality, so the distribution value data also reflects that individuality, and ranking all samples by distribution value data ranks them by individuality.
Step k, acquiring the same quantity of distribution value data from each interval segment.
In an alternative embodiment, the same number of distribution value data may be acquired from each interval segment at random.
In an alternative embodiment, the same number of distribution value data may be acquired from each interval segment according to a preset condition, for example, a predetermined number of distribution value data at the start and/or middle and/or end position within each interval segment.
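A minimal sketch of the preset-position variant, taking the start, middle, and end of each monotonically sorted interval segment (names and example values are illustrative):

```python
import numpy as np

def preset_positions(segment: np.ndarray) -> list:
    """Pick the distribution values at the start, middle, and end of one
    monotonically sorted interval segment."""
    idx = sorted({0, len(segment) // 2, len(segment) - 1})
    return [float(segment[i]) for i in idx]

sorted_scores = np.sort(np.array([3.1, 0.2, 7.9, 4.4, 1.0, 6.5]))
for seg in np.array_split(sorted_scores, 2):
    print(preset_positions(seg))   # e.g. three values per segment
```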
Step l, acquiring, from the sample set, the samples corresponding to the acquired distribution value data to form a post-training quantization sample set.
The distribution value data reflects sample individuality, so samples whose distribution value data fall into the same interval segment have similar individuality. Acquiring the same number of distribution value data from each interval segment, and then the corresponding samples from the sample set, therefore covers the various individualities of the samples in the sample set; the resulting post-training quantization sample set contains a variety of representative samples, which can improve the precision of post-training quantization.
Step m, training the neural network model with the post-training quantization sample set to obtain a quantized neural network model.
Step n, performing at least one of classification and detection based on the quantized neural network model.
The neural network model is trained with the post-training quantization sample set to obtain a quantized neural network model, and at least one of classification and detection is performed based on the quantized neural network model, ensuring that operations such as classification and detection are executed quickly and accurately by the post-training quantized neural network model.
Fig. 4 is a schematic structural diagram of a neural network quantization processing apparatus according to an embodiment of the present disclosure. As shown in Fig. 4, the apparatus includes a normalized data obtaining unit 401, a distribution data obtaining unit 402, an interval dividing unit 403, a distribution value selecting unit 404, a sample set generating unit 405, a training unit 406, and an executing unit 407. The normalized data obtaining unit 401 is configured to input all samples in the sample set into a neural network model containing at least one normalization layer and obtain the normalized output data output by the at least one normalization layer for each sample. The distribution data obtaining unit 402 is configured to obtain the distribution value data and distribution range of all samples from that normalized output data. The interval dividing unit 403 is configured to divide the distribution range into at least two interval ranges. The distribution value selecting unit 404 is configured to calculate the distribution value sub-data of the normalized output data of any one sample output by all normalization layers that falls within each interval range, obtain the distribution value data from the sub-data, and acquire a preset number of distribution value data from each interval segment. The sample set generating unit 405 is configured to acquire, from the sample set, the samples corresponding to the acquired distribution value data and compose the post-training quantization sample set. The training unit 406 is configured to train the neural network model with the post-training quantization sample set to obtain a quantized neural network model. The executing unit 407 is configured to perform at least one of classification and detection based on the quantized neural network model.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
With regard to the neural network quantization processing apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the neural network quantization processing method, and will not be described in detail here.
It should be noted that the division into the above functional modules is merely illustrative; in practical applications, the functions may be assigned to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
Based on the above embodiment related to the flowchart shown in fig. 2, the embodiment of the present disclosure further provides another neural network quantization processing method, fig. 5 is a flowchart schematic diagram of another neural network quantization processing method in the embodiment of the present disclosure, and as shown in fig. 5, the neural network quantization processing method includes the following steps.
Step A, inputting a kth sample in a sample set into at least one normalization layer in a neural network model containing N normalization layers, wherein the sample set comprises M samples, the value of M is a positive integer larger than 1, the kth sample is any one of the M samples, and the value of N is a positive integer larger than 0.
Step B, obtaining a plurality of output data of each normalization layer in the at least one normalization layer.
Step C, for each normalization layer, determining at least two interval ranges according to the plurality of output data, wherein each interval range has a corresponding reference value.
Step D, for each normalization layer, obtaining the distribution value sub-data according to the reference value of the interval range in which each output data is located.
Step E, calculating the distribution value data of the kth sample according to the distribution value sub-data in the at least one normalization layer.
Step F, executing the steps A to E on part or all of the samples to obtain the distribution value data corresponding to each sample.
Step G, dividing the obtained partial or all distribution value data into a plurality of interval segments, acquiring at least one sample corresponding to each interval segment, and forming a post-training quantization sample set.
Step H, training the neural network model with the post-training quantization sample set to obtain a quantized neural network model.
In some embodiments, step C comprises:
calculating an average value and a standard deviation of the plurality of output data according to the plurality of output data;
determining a standard range interval according to the average value and the standard deviation;
dividing the region outside the standard range interval into a plurality of non-standard intervals according to the distance between the region and the standard range interval, and respectively setting the reference value of the standard range interval and the reference value of the non-standard interval.
In some embodiments, the determining of the standard range interval from the mean and the standard deviation in the above description comprises:
subtracting the standard deviation from the average value to obtain a left boundary;
adding the average value and the standard deviation to obtain a right boundary;
determining the interval between the left boundary and the right boundary as a standard range interval;
and,
in the above description, the method of dividing the region other than the standard range section into a plurality of non-standard sections according to the distance from the standard range section, and setting the reference value of the standard range section and the reference value of the non-standard section respectively includes:
dividing a plurality of non-standard intervals from the left boundary and the right boundary respectively to the direction far away from the standard range interval according to the set interval step length;
and setting a reference value of each non-standard interval according to the interval step length between each non-standard interval and the standard range interval.
In some embodiments, the neural network quantization processing method of the embodiments of the present disclosure further includes:
the standard deviation is set as the interval step.
In some embodiments, step D comprises:
adding the reference values of the interval ranges in which the respective output data are located, to obtain the distribution value sub-data corresponding to each normalization layer;
alternatively,
weighting at least one of the reference values of the interval ranges in which the respective output data are located before adding, to obtain the distribution value sub-data corresponding to each normalization layer.
In some embodiments, the neural network quantization processing method of the embodiment of the present disclosure further includes:
dividing the result obtained by adding the reference values by the total number of the plurality of output data of the corresponding normalization layer, to obtain the distribution value sub-data;
alternatively,
weighting at least one of the reference values of the interval ranges in which the respective output data are located before adding, and dividing the result by the total number of output data of the corresponding normalization layer, to obtain the distribution value sub-data.
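A minimal sketch of the weighted variant of step D: each reference value is scaled by a weight before summation, and the result may be divided by the layer's output count. The weights here are an illustrative assumption; the disclosure does not fix a particular weighting.

```python
import numpy as np

def weighted_subdata(refs: np.ndarray, weights: np.ndarray,
                     normalize: bool = True) -> float:
    """refs[i] is the reference value of the interval range that output i of a
    normalization layer fell into; weights[i] scales its contribution.
    Optionally divide by the layer's total number of outputs."""
    total = float((refs * weights).sum())
    return total / refs.size if normalize else total

# Example: emphasize outputs far from the mean twice as much (the weighting
# rule is an illustrative assumption, not fixed by the disclosure).
refs = np.array([0.0, 1.0, 0.0, 2.0, 3.0])
weights = np.where(refs >= 2.0, 2.0, 1.0)
print(weighted_subdata(refs, weights))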
In some embodiments, in the step G, the dividing into a plurality of interval segments according to the obtained partial or all distribution value data, and the acquiring of at least one sample corresponding to each interval segment, further includes:
monotonically sorting part or all of the distribution value data;
dividing the monotonically sorted distribution value data into a plurality of interval segments, and acquiring at least one sample corresponding to each interval segment.
In some embodiments, the numbers of samples acquired for the respective interval segments are equal.
All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
With regard to the neural network quantization processing method in the above embodiment, further specific implementations of each step can be found in the descriptions of the embodiments of the neural network quantization processing method related to fig. 1, fig. 2, and fig. 3 above, and are not detailed here.
Fig. 6 is a schematic structural diagram of another neural network quantization processing apparatus according to an embodiment of the present disclosure, and as shown in fig. 6, the neural network quantization processing apparatus includes a sample input module 601, an output data acquisition module 602, an interval range determination module 603, a distribution value sub-data acquisition module 604, a distribution value data acquisition module 605, a multi-sample distribution value data acquisition module 606, an interval division and sample acquisition module 607, and a training module 608.
A sample input module 601 configured to perform: inputting a kth sample in a sample set into at least one normalization layer in a neural network model containing N normalization layers, wherein the sample set comprises M samples, the value of M is a positive integer larger than 1, the kth sample is any one of the M samples, and the value of N is a positive integer larger than 0.
An output data acquisition module 602 configured to perform: obtaining a plurality of output data for each of the at least one normalization layer.
An interval range determination module 603 configured to perform: determining, for each normalization layer, at least two interval ranges according to the output data, wherein each interval range has a corresponding reference value.
A distribution value sub-data acquisition module 604 configured to perform: obtaining, for each normalization layer, the distribution value sub-data according to the reference values of the interval ranges in which the respective output data are located.
A distribution value data acquisition module 605 configured to perform: calculating the distribution value data of the kth sample according to the distribution value sub-data of the at least one normalization layer.
A multi-sample distribution value data acquisition module 606 configured to perform: calling the sample input module 601, the output data acquisition module 602, the interval range determination module 603, the distribution value sub-data acquisition module 604, and the distribution value data acquisition module 605 to obtain the distribution value data respectively corresponding to each sample.
An interval division and sample acquisition module 607 configured to perform: dividing part or all of the obtained distribution value data into a plurality of block sections, acquiring at least one sample corresponding to each block section, and forming a post-training quantization sample set.
A training module 608 configured to perform: training the neural network model by using the post-training quantization sample set to obtain a quantized neural network model.
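Read end to end, the modules above amount to the following pipeline. This is a hedged sketch only: model.normalization_outputs is a hypothetical hook returning one output array per normalization layer for a given sample, and the plain sum used to aggregate the per-layer distribution value sub-data into a sample's distribution value data is one possible reading of the aggregation step. It reuses build_intervals, distribution_subvalue, and select_calibration_samples from the earlier sketches.

```python
import numpy as np

def build_ptq_sample_set(model, samples, num_sections=8, per_section=1):
    dist_values = []
    for sample in samples:
        sub_values = []
        # One output array per normalization layer (hypothetical hook).
        for outputs in model.normalization_outputs(sample):
            edges, refs = build_intervals(outputs)
            sub_values.append(distribution_subvalue(outputs, edges, refs))
        # Hypothetical aggregation across layers: a plain sum.
        dist_values.append(sum(sub_values))
    ids = list(range(len(samples)))
    keep = select_calibration_samples(np.array(dist_values), ids,
                                      num_sections, per_section)
    return [samples[i] for i in keep]
```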
With regard to the neural network quantization processing apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the embodiments of the neural network quantization processing method related to fig. 1, fig. 2, and fig. 3, and will not be elaborated here.
It should be noted that the division into the above functional modules is merely illustrative; in practical applications, the above functions may be assigned to different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above.
The embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the steps of the neural network quantization processing method described above.
The embodiments of the present disclosure further provide an electronic device. As shown in fig. 7, the electronic device includes: at least one processor 701 and a memory 702. The memory 702 is communicatively coupled to the at least one processor 701, for example via a bus. The memory 702 stores instructions executable by the at least one processor 701 to cause the at least one processor 701 to perform the steps of the neural network quantization processing method described above.
The embodiments of the present disclosure also provide a computer program product including a computer program/instructions which, when executed by a processor, implements the steps of the neural network quantization processing method described above.
The present disclosure is not limited to the specific embodiments described herein; all changes, equivalents, and modifications that come within the spirit and scope of the disclosure are intended to be protected.

Claims (11)

1. A neural network quantization processing method comprises the following steps:
step A, inputting a kth sample in a sample set into at least one normalization layer in a neural network model containing N normalization layers, wherein the sample set comprises M samples, the value of M is a positive integer greater than 1, the kth sample is any one of the M samples, and the value of N is a positive integer greater than 0;
step B, obtaining a plurality of output data of each normalization layer in the at least one normalization layer;
step C, determining, for each normalization layer, at least two interval ranges according to the output data, wherein each interval range has a corresponding reference value;
step D, obtaining, for each normalization layer, the distribution value sub-data according to the reference values of the interval ranges in which the respective output data are located;
step E, calculating the distribution value data of the kth sample according to the distribution value sub-data of the at least one normalization layer;
step F, executing the steps A to E on part or all of the samples to respectively obtain the distribution value data corresponding to each sample;
step G, dividing part or all of the obtained distribution value data into a plurality of block sections, acquiring at least one sample corresponding to each block section, and forming a post-training quantization sample set;
and step H, training the neural network model by using the post-training quantization sample set to obtain a quantized neural network model.
2. The neural network quantization processing method according to claim 1, wherein the step C includes:
calculating a mean value and a standard deviation of the plurality of output data;
determining a standard range interval according to the mean value and the standard deviation;
and dividing the region outside the standard range interval into a plurality of non-standard intervals according to the distance from the standard range interval, and respectively setting the reference value of the standard range interval and the reference values of the non-standard intervals.
3. The neural network quantization processing method of claim 2, wherein:
determining a standard range interval according to the mean value and the standard deviation, including:
subtracting the standard deviation from the mean value to obtain a left boundary;
adding the mean value and the standard deviation to obtain a right boundary;
determining an interval between the left boundary and the right boundary as the standard range interval;
and,
the dividing of the region outside the standard range interval into a plurality of non-standard intervals according to the distance from the standard range interval, and the respective setting of the reference value of the standard range interval and the reference values of the non-standard intervals, includes:
dividing a plurality of non-standard intervals, according to a set interval step length, from the left boundary and the right boundary respectively in directions away from the standard range interval;
and setting the reference value of each non-standard interval according to the interval step length between the non-standard interval and the standard range interval.
4. The neural network quantization processing method according to claim 3, wherein the neural network quantization processing method further includes:
setting the standard deviation as the interval step.
5. The neural network quantization processing method according to claim 1, wherein the step D includes:
adding the reference values of the interval ranges in which the respective output data are located, to obtain the distribution value sub-data corresponding to each normalization layer;
or,
weighting at least one of the reference values of the interval ranges in which the respective output data are located and then adding, to obtain the distribution value sub-data corresponding to each normalization layer.
6. The neural network quantization processing method according to claim 5, wherein the neural network quantization processing method further includes:
dividing the result obtained by adding the reference values by the total number of the plurality of output data of the corresponding normalization layer, to obtain the distribution value sub-data;
or,
weighting at least one of the reference values of the interval ranges in which the respective output data are located, adding, and dividing the result by the total number of the plurality of output data of the corresponding normalization layer, to obtain the distribution value sub-data.
7. The neural network quantization processing method according to claim 1, wherein the dividing of part or all of the obtained distribution value data into a plurality of block sections in step G, and the acquiring of at least one sample corresponding to each block section, further comprises:
monotonically sorting part or all of the distribution value data;
dividing the monotonically sorted distribution value data into a plurality of block sections, and acquiring at least one sample corresponding to each block section.
8. The neural network quantization processing method of claim 7, wherein:
the numbers of the sample data respectively acquired for the block sections are equal.
9. A neural network quantization processing apparatus, comprising:
a sample input module configured to perform: inputting a kth sample in a sample set into at least one normalization layer in a neural network model containing N normalization layers, wherein the sample set comprises M samples, the value of M is a positive integer greater than 1, the kth sample is any one of the M samples, and the value of N is a positive integer greater than 0;
an output data acquisition module configured to perform: obtaining a plurality of output data for each of the at least one normalization layer;
an interval range determination module configured to perform: for each normalization layer, determining at least two interval ranges according to the output data, wherein each interval range has a corresponding reference value;
a distribution value sub-data acquisition module configured to perform: obtaining, for each normalization layer, the distribution value sub-data according to the reference values of the interval ranges in which the respective output data are located;
a distribution value data acquisition module configured to perform: calculating the distribution value data of the kth sample according to the distribution value sub-data of the at least one normalization layer;
a multi-sample distribution value data acquisition module configured to perform: calling a sample input module, an output data acquisition module, an interval range determination module, a distribution value sub-data acquisition module and a distribution value data acquisition module to respectively obtain distribution value data corresponding to each sample;
an interval division and sample acquisition module configured to perform: dividing part or all of the obtained distribution value data into a plurality of block sections, acquiring at least one sample corresponding to each block section, and forming a post-training quantization sample set;
a training module configured to perform: training the neural network model by using the post-training quantization sample set to obtain a quantized neural network model.
10. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the neural network quantization processing method of any one of claims 1 to 8.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the neural network quantization processing method of any one of claims 1 to 8.
CN202210249363.3A 2022-03-15 2022-03-15 Neural network quantization processing method and device, electronic equipment and storage medium Pending CN114418097A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210249363.3A CN114418097A (en) 2022-03-15 2022-03-15 Neural network quantization processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114418097A true CN114418097A (en) 2022-04-29

Family

ID=81263917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210249363.3A Pending CN114418097A (en) 2022-03-15 2022-03-15 Neural network quantization processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114418097A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114819149A (en) * 2022-06-28 2022-07-29 深圳比特微电子科技有限公司 Data processing method, device and medium based on transforming neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination