CN112541857A - Image characterization method and system based on performance enhancement neural network batch normalization - Google Patents
Image characterization method and system based on performance enhancement neural network batch normalization
- Publication number
- CN112541857A CN112541857A CN202011551847.0A CN202011551847A CN112541857A CN 112541857 A CN112541857 A CN 112541857A CN 202011551847 A CN202011551847 A CN 202011551847A CN 112541857 A CN112541857 A CN 112541857A
- Authority
- CN
- China
- Prior art keywords
- result
- normalization
- scaling
- calibration
- centering
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
- 238000010606 normalization Methods 0.000 title claims abstract description 100
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 22
- 238000012512 characterization method Methods 0.000 title claims abstract description 22
- 238000000034 method Methods 0.000 claims abstract description 42
- 238000012545 processing Methods 0.000 claims abstract description 20
- 230000009466 transformation Effects 0.000 claims abstract description 17
- 230000015654 memory Effects 0.000 claims description 18
- 238000012549 training Methods 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 11
- 230000006870 function Effects 0.000 claims description 10
- 230000008569 process Effects 0.000 claims description 6
- 230000002708 enhancing effect Effects 0.000 abstract description 3
- 238000003062 neural network model Methods 0.000 abstract description 3
- 238000009826 distribution Methods 0.000 description 6
- 230000004913 activation Effects 0.000 description 4
- 238000012360 testing method Methods 0.000 description 4
- 230000008901 benefit Effects 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 2
- 230000006872 improvement Effects 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 238000012854 evaluation process Methods 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 230000000717 retained effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/14—Transformations for image registration, e.g. adjusting or mapping for alignment of images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/60—Rotation of whole images or parts thereof
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an image characterization method and system based on performance-enhanced neural network batch normalization, comprising the following steps: acquiring a target image; extracting features of the target image; performing centering calibration on the features of the target image to obtain a centering-calibrated result; performing centering normalization on the centering-calibrated result to obtain a centering normalization result; performing scaling normalization on the centering normalization result to obtain a scaling normalization result; performing scaling calibration on the scaling normalization result to obtain a scaling calibration result; performing affine transformation on the scaling calibration result to obtain output features; and characterizing the image based on the output features. The method achieves accurate representation of images by applying an improved batch normalization scheme to the neural network model.
Description
Technical Field
The application relates to the technical field of computer vision, in particular to an image characterization method and system based on performance enhancement neural network batch normalization.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Convolutional Neural Networks (CNNs) improve the performance of various computer vision tasks with their powerful representation capabilities. As structural complexity and model parameters grow, CNNs become harder to train. Batch normalization (BatchNorm) alleviates this training difficulty by constraining intermediate features to a normalized distribution using mini-batch statistics. Various normalization techniques have been proposed in the prior art to achieve more efficient feature normalization and task-specific feature transformation, using statistics computed from different dimensions and regions. BatchNorm uses mini-batch statistics to normalize intermediate features and stabilize training. In contrast, GhostNorm computes statistics over small virtual batches to reduce generalization error. EvalNorm re-estimates the normalization statistics during evaluation. KalmanNorm estimates the statistics of one layer from those of its previous layers. LayerNorm, InstanceNorm and GroupNorm normalize features with statistics from the channel, sample and channel-group dimensions, respectively. Instead of using all pixels in one dimension to compute the statistics, local normalization techniques use the statistics of neighboring regions. Normalization using mini-batch-independent statistics can improve model stability when the mini-batch statistics are particularly inaccurate, but owing to the lack of batch information, training instability makes its performance inferior to BatchNorm in many cases. MixtureNorm decomposes the feature distribution into different modes with a Gaussian mixture model and normalizes features independently within each mode. Methods that combine multiple normalizations in this way typically require additional computational cost to normalize features across different dimensions.
In BatchNorm, the dependence on mini-batch information rests on the following assumption: within a channel, elements generated from different instances follow the same distribution. However, this assumption does not always hold, for two reasons: i) there may be inconsistency between the mini-batch statistics used in training and the running statistics used in testing; ii) instances in the test set may not belong to the distribution of the training set. To avoid the side effects of these two inconsistencies, some work uses instance-specific statistics rather than mini-batch statistics to normalize intermediate features. However, owing to the lack of batch information, training instability makes its performance inferior to BatchNorm in many cases. Other works exploit both mini-batch and instance statistics by combining multiple normalization techniques or introducing an attention mechanism. However, these methods usually incur more overhead, which makes them unfriendly to practical use.
The inventor finds that neural network models using the general normalization layers of the prior art lack either batch statistics or sample-level information, and therefore cannot accurately characterize images.
Disclosure of Invention
In order to overcome the defects of the prior art, the application provides an image characterization method and system based on performance enhancement neural network batch normalization;
in a first aspect, the application provides an image characterization method based on enhanced expression neural network batch normalization;
the image characterization method based on the enhanced expression neural network batch normalization comprises the following steps:
acquiring a target image; extracting the characteristics of a target image;
centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration;
performing centering normalization processing on the result after centering calibration to obtain a centering normalization result;
carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result;
carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result;
carrying out affine transformation on the scaling calibration result to obtain output characteristics; the image is characterized based on the output features.
In a second aspect, the application provides an image characterization system based on enhanced performance neural network batch normalization;
the image characterization system based on the enhanced expression neural network batch normalization comprises:
an acquisition module configured to: acquiring a target image; extracting the characteristics of a target image;
a centering calibration module configured to: centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration;
a centering normalization module configured to: performing centering normalization processing on the result after centering calibration to obtain a centering normalization result;
a scaling normalization module configured to: carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result;
a scaling calibration module configured to: carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result;
an affine transformation module configured to: carrying out affine transformation on the scaling calibration result to obtain output characteristics; the image is characterized based on the output features.
In a third aspect, the present application further provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
In a fifth aspect, the present application also provides a computer program (product) comprising a computer program for implementing the method of any of the preceding first aspects when run on one or more processors.
Compared with the prior art, the beneficial effects of this application are:
the method realizes accurate representation of the image through a neural network batch normalization mode of the neural network model based on the enhanced expressive force.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart of a normalization method for enhancing expressiveness according to a first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and "comprising", and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
We focus on the normalization operation consisting of feature centering (zero-mean) and variance scaling. During training, based on mini-batch statistics, the centering operation gives elements a zero-mean property and the scaling operation gives them unit variance. At test time, the zero-mean and unit-variance properties cannot always be maintained because of statistical and instance inconsistencies. Centering with an inaccurate running mean can cause the centered feature to contain extra noise, or to lose some useful representation after activation: when the mean of a test instance's features is below the running mean, the activation may mistakenly remove some representative features; when the mean is above the running mean, the activation cannot filter out small-valued noise. Likewise, an inaccurate running variance may cause the scaling operation to produce scaled features whose intensity is too small or too large, resulting in unstable feature distributions across channels. In this work, we propose a performance-enhanced batch normalization approach to solve these problems.
Example one
The embodiment provides an image characterization method based on enhancing expressive force neural network batch normalization;
the image characterization method based on the enhanced expression neural network batch normalization comprises the following steps:
s101: acquiring a target image; extracting the characteristics of a target image;
s102: centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration;
s103: performing centering normalization processing on the result after centering calibration to obtain a centering normalization result;
s104: carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result;
s105: carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result;
s106: carrying out affine transformation on the scaling calibration result to obtain output characteristics; the image is characterized based on the output features.
As one or more embodiments, the S102: centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration; the method comprises the following specific steps:
X_cm(n,c,h,w) = X(n,c,h,w) + w_m ⊙ K_m    (1)

wherein X(n,c,h,w) denotes the features of the target image, and X_cm(n,c,h,w) denotes the result after centering calibration;

N denotes the current feature batch size; C denotes the number of channels; H denotes the feature height; W denotes the feature width;

w_m ∈ R^(1×C×1×1) is a learnable weight vector;

K_m is a feature statistic of X(n,c,h,w); the subscript m carries no special meaning;

K_m ∈ R^(N×C×1×1) or K_m ∈ R^(N×1×H×W);

⊙ is a dot-product operator that broadcasts the two operands to the same shape and then performs element-wise multiplication; here K_m is used as the mean μ_c.
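For illustration, the following is a minimal sketch of the centering calibration of equation (1), assuming PyTorch and choosing K_m as the per-instance channel mean (one of the two statistic shapes the text permits); the function name is an assumption of this sketch:

```python
import torch

def centering_calibration(x: torch.Tensor, w_m: torch.Tensor) -> torch.Tensor:
    # x: input features, shape (N, C, H, W); w_m: learnable weights, shape (1, C, 1, 1).
    # K_m is taken here as the per-instance channel mean (shape (N, C, 1, 1)),
    # i.e. the instance statistic mu_c; the other shape the text permits,
    # (N, 1, H, W), would be a spatial statistic instead.
    k_m = x.mean(dim=(2, 3), keepdim=True)   # instance statistic K_m
    return x + w_m * k_m                     # Eq. (1); * broadcasts like the ⊙ operator
```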
As one or more embodiments, the S103: performing centering normalization processing on the result after centering calibration to obtain a centering normalization result; the method comprises the following specific steps:
centering the centering-calibrated result obtained in step S102 to obtain the centered feature X_m:

X_m = X_cm − E(X_cm)

where E denotes the expectation estimated statistically during training (the running mean).
As one or more embodiments, the S104: carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result; the method comprises the following specific steps:
scaling and normalizing the centered feature X_m obtained in step S103 to obtain the feature X_s:

X_s = X_m / √(Var(X_cm) + ε)

where Var denotes the variance estimated statistically during training (the running variance) and ε is a small constant for numerical stability.
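The following sketch covers the centering normalization (S103) and scaling normalization (S104) steps together, assuming BatchNorm-style running statistics stored as (1, C, 1, 1) buffers; the function name and momentum convention are assumptions of this sketch:

```python
import torch

def center_and_scale(x_cm, running_mean, running_var,
                     training=True, momentum=0.1, eps=1e-5):
    # running_mean / running_var: buffers of shape (1, C, 1, 1) tracking
    # E(X_cm) and Var(X_cm) over training, as in ordinary BatchNorm.
    if training:
        mean = x_cm.mean(dim=(0, 2, 3), keepdim=True)
        var = x_cm.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        with torch.no_grad():  # keep buffer updates out of autograd
            running_mean.mul_(1 - momentum).add_(momentum * mean)
            running_var.mul_(1 - momentum).add_(momentum * var)
    else:
        mean, var = running_mean, running_var
    x_m = x_cm - mean                   # S103: centering normalization
    return x_m / torch.sqrt(var + eps)  # S104: scaling normalization
```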
As one or more embodiments, the S105: carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result; the method comprises the following specific steps:

given an input feature X_s(n,c,h,w), the calibrated feature X_cs(n,c,h,w) is written as:

X_cs(n,c,h,w) = X_s(n,c,h,w) · R(w_v ⊙ K_s + w_b)    (2)

wherein w_v, w_b ∈ R^(1×C×1×1) are learnable weight vectors; R is a restricting function, and the Sigmoid function is used as R; ⊙ is a dot-product operator that broadcasts two operands to the same shape and then performs element-wise multiplication; here K_s is a feature statistic of X_s(n,c,h,w) and is used as the variance σ_c².
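A minimal sketch of the scaling calibration of equation (2), again assuming PyTorch and taking K_s as the per-instance channel variance of X_s:

```python
import torch

def scaling_calibration(x_s, w_v, w_b):
    # w_v, w_b: learnable weights of shape (1, C, 1, 1).
    # K_s is taken here as the per-instance channel variance of X_s,
    # matching the text's use of K_s as the variance statistic.
    k_s = x_s.var(dim=(2, 3), unbiased=False, keepdim=True)  # (N, C, 1, 1)
    return x_s * torch.sigmoid(w_v * k_s + w_b)              # Eq. (2), R = Sigmoid
```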
As one or more embodiments, in step S106, affine transformation is performed on the scaling calibration result to obtain the output features; the method comprises the following specific steps:

performing affine transformation on the scaling calibration result obtained in step S105 to obtain Y:

Y = X_cs · γ + β

where γ and β respectively denote the scale parameter and the shift parameter, obtained by learning during training.
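For illustration, the following is a minimal PyTorch sketch assembling steps S102–S106 into a single layer, mirroring the flow of Fig. 1. PyTorch itself, the class name RepresentativeBatchNorm2d, the parameter initializations, and the choice of per-instance channel statistics for K_m and K_s are assumptions of this sketch rather than details fixed by the disclosure:

```python
import torch
import torch.nn as nn

class RepresentativeBatchNorm2d(nn.Module):
    """Sketch of the S102-S106 pipeline; names and initializations are illustrative."""

    def __init__(self, num_channels: int, eps: float = 1e-5, momentum: float = 0.1):
        super().__init__()
        shape = (1, num_channels, 1, 1)
        # calibration weights for Eq. (1) and Eq. (2); zero init is illustrative
        self.w_m = nn.Parameter(torch.zeros(shape))
        self.w_v = nn.Parameter(torch.zeros(shape))
        self.w_b = nn.Parameter(torch.zeros(shape))
        # affine parameters gamma / beta (S106)
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        self.register_buffer("running_mean", torch.zeros(shape))
        self.register_buffer("running_var", torch.ones(shape))
        self.eps, self.momentum = eps, momentum

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # S102: centering calibration, K_m = per-instance channel mean
        k_m = x.mean(dim=(2, 3), keepdim=True)
        x_cm = x + self.w_m * k_m
        # S103-S104: BatchNorm-style centering and scaling
        if self.training:
            mean = x_cm.mean(dim=(0, 2, 3), keepdim=True)
            var = x_cm.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
            with torch.no_grad():  # keep buffer updates out of autograd
                self.running_mean.mul_(1 - self.momentum).add_(self.momentum * mean)
                self.running_var.mul_(1 - self.momentum).add_(self.momentum * var)
        else:
            mean, var = self.running_mean, self.running_var
        x_s = (x_cm - mean) / torch.sqrt(var + self.eps)
        # S105: scaling calibration, K_s = per-instance channel variance
        k_s = x_s.var(dim=(2, 3), unbiased=False, keepdim=True)
        x_cs = x_s * torch.sigmoid(self.w_v * k_s + self.w_b)
        # S106: affine transformation
        return x_cs * self.gamma + self.beta
```

Keeping w_m, w_v and w_b at shape (1, C, 1, 1) adds only 3C parameters per layer, consistent with the application's aim of avoiding the overhead incurred by multi-normalization combinations.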
We intend to enhance instance-specific representation capabilities while retaining the benefits of BatchNorm. In this work, we focus on the feature normalization operation consisting of feature centering (zero-mean) and feature scaling. Our proposed "performance enhancement batch normalization" provides a simple and efficient feature calibration scheme that enhances instance-specific features and produces a more stable feature distribution.
Instance-specific statistics are necessary to calibrate the running statistics of BatchNorm. This method mainly studies channel-dimension statistics, since BatchNorm computes its statistics along the channel dimension. The channel-dimension statistics, mean μ_c and variance σ_c², of the feature channels are given as follows:

μ_c = (1/(H·W)) Σ_h Σ_w X(n,c,h,w)

σ_c² = (1/(H·W)) Σ_h Σ_w (X(n,c,h,w) − μ_c)²
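As a sketch of how these per-instance channel statistics might be computed in PyTorch (the tensor shape is an illustrative assumption):

```python
import torch

x = torch.randn(8, 64, 32, 32)  # a hypothetical feature batch (N, C, H, W)
mu_c = x.mean(dim=(2, 3), keepdim=True)                     # (N, C, 1, 1)
sigma2_c = x.var(dim=(2, 3), unbiased=False, keepdim=True)  # (N, C, 1, 1)
```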
in light of the above, two calibration mechanisms are proposed:
(1) feature centering calibration
(2) Feature scaling calibration
(1) And (3) feature centralization calibration:
to reduce the dependence of the centering operation on the running mean, we add a centering calibration scheme driven by instance statistics. The centering calibration is defined as follows; it is added before the centering operation of the original BatchNorm layer. Given an input feature X, the centering calibration of the feature is:

X_cm(n,c,h,w) = X(n,c,h,w) + w_m ⊙ K_m    (1)

wherein w_m ∈ R^(1×C×1×1) is a learnable weight vector, K_m ∈ R^(N×C×1×1) or K_m ∈ R^(N×1×H×W) is a feature statistic of X, and ⊙ is a dot-product operator that broadcasts two operands to the same shape and then performs element-wise multiplication. Here K_m can be used as μ_c.
(2) Feature scaling calibration:

unlike the centering operation, which determines which features are retained after activation, the scaling operation alters the feature intensity while leaving the effect of the affine transformation unchanged. The scaling operation scales elements to unit variance using the running variance. However, scaling features with an inaccurate running variance results in unstable feature intensities, so we propose scaling calibration to calibrate feature intensities based on instance statistics.

The scaling calibration is defined as follows; we add it after the original scaling operation.

Given the input feature X_s, the calibrated feature is written as:

X_cs(n,c,h,w) = X_s(n,c,h,w) · R(w_v ⊙ K_s + w_b)    (2)

wherein w_v, w_b ∈ R^(1×C×1×1) are learnable weight vectors; in this algorithm, the Sigmoid function is used as the restricting function R, and ⊙ is a dot-product operator that broadcasts two operands to the same shape and then performs element-wise multiplication. Here K_s is a feature statistic of X_s and is used as the variance σ_c².
Combining (1) and (2), the overall flow of the algorithm is shown in Fig. 1.
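A hypothetical usage example, showing the RepresentativeBatchNorm2d sketch above used as a drop-in replacement for a standard batch normalization layer in a convolution block:

```python
import torch
import torch.nn as nn

block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    RepresentativeBatchNorm2d(64),   # in place of nn.BatchNorm2d(64)
    nn.ReLU(inplace=True),
)
y = block(torch.randn(8, 3, 224, 224))  # -> shape (8, 64, 224, 224)
```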
We tested the method on the public ImageNet dataset and compared it with the previous BatchNorm (BN); our enhanced-performance batch normalization operation (RBN) brings a large performance improvement. The error rate and top-5 error rate are shown in Table 1.
Table 1: Error rate and top-5 error rate performance
Example two
The embodiment provides an image characterization system based on enhanced expressive force neural network batch normalization;
the image characterization system based on the enhanced expression neural network batch normalization comprises:
an acquisition module configured to: acquiring a target image; extracting the characteristics of a target image;
a centering calibration module configured to: centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration;
a centering normalization module configured to: performing centering normalization processing on the result after centering calibration to obtain a centering normalization result;
a scaling normalization module configured to: carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result;
a scaling calibration module configured to: carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result;
an affine transformation module configured to: carrying out affine transformation on the scaling calibration result to obtain output characteristics; the image is characterized based on the output features.
It should be noted here that the above-mentioned acquisition module, centering calibration module, centering normalization module, scaling normalization module, scaling calibration module and affine transformation module correspond to steps S101 to S106 in the first embodiment; the modules are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the contents disclosed in the first embodiment. It should be noted that the modules described above, as part of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed.
EXAMPLE III
The present embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, a processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first embodiment.
It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The method in the first embodiment may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the method in combination with its hardware. To avoid repetition, details are not described here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Example four
The present embodiments also provide a computer-readable storage medium for storing computer instructions, which when executed by a processor, perform the method of the first embodiment.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (10)
1. The image characterization method based on the enhanced expression neural network batch normalization is characterized by comprising the following steps of:
acquiring a target image; extracting the characteristics of a target image;
centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration;
performing centering normalization processing on the result after centering calibration to obtain a centering normalization result;
carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result;
carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result;
carrying out affine transformation on the scaling calibration result to obtain output characteristics; the image is characterized based on the output features.
2. The method for image characterization based on enhanced performance neural network batch normalization of claim 1,
centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration; the method comprises the following specific steps:
X_cm(n,c,h,w) = X(n,c,h,w) + w_m ⊙ K_m    (1)

wherein X(n,c,h,w) denotes the features of the target image, and X_cm(n,c,h,w) denotes the result after centering calibration;

N denotes the current feature batch size; C denotes the number of channels; H denotes the feature height; W denotes the feature width;

w_m ∈ R^(1×C×1×1) is a learnable weight vector;

K_m is a feature statistic of X(n,c,h,w); the subscript m carries no special meaning;

K_m ∈ R^(N×C×1×1) or K_m ∈ R^(N×1×H×W);

⊙ is a dot-product operator that broadcasts the two operands to the same shape and then performs element-wise multiplication; here K_m is used as the mean μ_c;
3. The method for image characterization based on enhanced performance neural network batch normalization of claim 1,
performing centering normalization processing on the result after centering calibration to obtain a centering normalization result; the method comprises the following specific steps:
centering the obtained centering calibration result to obtain the centered feature X_m:

X_m = X_cm − E(X_cm)

where E denotes the expectation estimated statistically during training.
4. The method for image characterization based on enhanced performance neural network batch normalization of claim 1,
carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result; the method comprises the following specific steps:
scaling and normalizing the obtained centered feature X_m to obtain the feature X_s:

X_s = X_m / √(Var(X_cm) + ε)

where Var denotes the variance statistic during training and ε is a small constant for numerical stability.
5. The method for image characterization based on enhanced performance neural network batch normalization of claim 1,
carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result; the method comprises the following specific steps:

given an input feature X_s(n,c,h,w), the calibrated feature X_cs(n,c,h,w) is written as:

X_cs(n,c,h,w) = X_s(n,c,h,w) · R(w_v ⊙ K_s + w_b)    (2)

wherein w_v, w_b ∈ R^(1×C×1×1) are learnable weight vectors; R is a restricting function, and the Sigmoid function is used as R; ⊙ is a dot-product operator that broadcasts two operands to the same shape and then performs element-wise multiplication; here K_s is a feature statistic of X_s(n,c,h,w) and is used as the variance σ_c².
6. The method for image characterization based on enhanced performance neural network batch normalization of claim 1,
carrying out affine transformation on the scaling calibration result to obtain output features; the method comprises the following specific steps:

performing affine transformation on the obtained scaling calibration result to obtain Y:

Y = X_cs · γ + β

where γ and β respectively denote the scale parameter and the shift parameter, obtained by learning during training.
7. The image characterization system based on the enhanced expression neural network batch normalization is characterized by comprising the following steps:
an acquisition module configured to: acquiring a target image; extracting the characteristics of a target image;
a centering calibration module configured to: centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration;
a centering normalization module configured to: performing centering normalization processing on the result after centering calibration to obtain a centering normalization result;
a scaling normalization module configured to: carrying out scaling normalization processing on the centered normalization result to obtain a scaling normalization result;
a scaling calibration module configured to: carrying out scaling calibration on the scaling normalization result to obtain a scaling calibration result;
an affine transformation module configured to: carrying out affine transformation on the scaling calibration result to obtain output characteristics; the image is characterized based on the output features.
8. The image characterization system based on enhanced performance neural network batch normalization of claim 7,
centering calibration is carried out on the characteristics of the target image to obtain a result after centering calibration; the method comprises the following specific steps:
X_cm(n,c,h,w) = X(n,c,h,w) + w_m ⊙ K_m    (1)

wherein X(n,c,h,w) denotes the features of the target image, and X_cm(n,c,h,w) denotes the result after centering calibration;

N denotes the current feature batch size; C denotes the number of channels; H denotes the feature height; W denotes the feature width;

w_m ∈ R^(1×C×1×1) is a learnable weight vector;

K_m is a feature statistic of X(n,c,h,w); the subscript m carries no special meaning;

K_m ∈ R^(N×C×1×1) or K_m ∈ R^(N×1×H×W);

⊙ is a dot-product operator that broadcasts the two operands to the same shape and then performs element-wise multiplication; here K_m is used as the mean μ_c;
9. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of any of the preceding claims 1-6.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011551847.0A CN112541857B (en) | 2020-12-24 | 2020-12-24 | Image characterization method and system based on performance enhancement neural network batch normalization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011551847.0A CN112541857B (en) | 2020-12-24 | 2020-12-24 | Image characterization method and system based on performance enhancement neural network batch normalization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112541857A true CN112541857A (en) | 2021-03-23 |
CN112541857B CN112541857B (en) | 2022-09-16 |
Family
ID=75017305
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011551847.0A Active CN112541857B (en) | 2020-12-24 | 2020-12-24 | Image characterization method and system based on performance enhancement neural network batch normalization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112541857B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109961102A (en) * | 2019-03-30 | 2019-07-02 | 北京市商汤科技开发有限公司 | Image processing method, device, electronic equipment and storage medium |
US20190251441A1 (en) * | 2018-02-13 | 2019-08-15 | Adobe Systems Incorporated | Reducing architectural complexity of convolutional neural networks via channel pruning |
CN110321773A (en) * | 2018-03-30 | 2019-10-11 | 托比股份公司 | Use the neural metwork training for watching prediction attentively for three-dimensional (3D) of calibration parameter |
CN110390394A (en) * | 2019-07-19 | 2019-10-29 | 深圳市商汤科技有限公司 | Criticize processing method and processing device, electronic equipment and the storage medium of normalization data |
CN111489364A (en) * | 2020-04-08 | 2020-08-04 | 重庆邮电大学 | Medical image segmentation method based on lightweight full convolution neural network |
US20200302298A1 (en) * | 2019-03-22 | 2020-09-24 | Qualcomm Incorporated | Analytic And Empirical Correction Of Biased Error Introduced By Approximation Methods |
-
2020
- 2020-12-24 CN CN202011551847.0A patent/CN112541857B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190251441A1 (en) * | 2018-02-13 | 2019-08-15 | Adobe Systems Incorporated | Reducing architectural complexity of convolutional neural networks via channel pruning |
CN110321773A (en) * | 2018-03-30 | 2019-10-11 | 托比股份公司 | Use the neural metwork training for watching prediction attentively for three-dimensional (3D) of calibration parameter |
US20200302298A1 (en) * | 2019-03-22 | 2020-09-24 | Qualcomm Incorporated | Analytic And Empirical Correction Of Biased Error Introduced By Approximation Methods |
CN109961102A (en) * | 2019-03-30 | 2019-07-02 | 北京市商汤科技开发有限公司 | Image processing method, device, electronic equipment and storage medium |
CN110390394A (en) * | 2019-07-19 | 2019-10-29 | 深圳市商汤科技有限公司 | Criticize processing method and processing device, electronic equipment and the storage medium of normalization data |
CN111489364A (en) * | 2020-04-08 | 2020-08-04 | 重庆邮电大学 | Medical image segmentation method based on lightweight full convolution neural network |
Non-Patent Citations (1)
Title |
---|
Liu Jianwei et al.: "Research Progress on Batch Normalization and Related Algorithms in Deep Learning", Acta Automatica Sinica *
Also Published As
Publication number | Publication date |
---|---|
CN112541857B (en) | 2022-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110033026B (en) | Target detection method, device and equipment for continuous small sample images | |
CN112232426A (en) | Training method, device and equipment of target detection model and readable storage medium | |
CN110969600A (en) | Product defect detection method and device, electronic equipment and storage medium | |
CN114694143B (en) | Cell image recognition method and device based on optical means | |
CN116894985A (en) | Semi-supervised image classification method and semi-supervised image classification system | |
CN115239946A (en) | Small sample transfer learning training and target detection method, device, equipment and medium | |
CN111522736A (en) | Software defect prediction method and device, electronic equipment and computer storage medium | |
CN113222043B (en) | Image classification method, device, equipment and storage medium | |
CN110717601A (en) | Anti-fraud method based on supervised learning and unsupervised learning | |
CN112541857B (en) | Image characterization method and system based on performance enhancement neural network batch normalization | |
Lee et al. | Testing for time stochastic dominance | |
CN113111708B (en) | Vehicle matching sample generation method, device, computer equipment and storage medium | |
CN111860361B (en) | Automatic identifier and identification method for green channel cargo scanning image entrainment | |
US20240013516A1 (en) | Method and system for deep learning based image feature extraction | |
CN111091163B (en) | Minimum distance classification method and device, computer equipment and storage medium | |
CN115641201B (en) | Data anomaly detection method, system, terminal equipment and storage medium | |
CN115952834A (en) | Batch layer normalization method and device applied to Transformer | |
CN114254143A (en) | Distributed image searching method and system based on clickhouse | |
CN111127440A (en) | Tire specification detection method and device, electronic equipment and readable storage medium | |
CN118196567B (en) | Data evaluation method, device, equipment and storage medium based on large language model | |
CN118587642A (en) | Campus violation detection method and system based on computer vision | |
CN113570046B (en) | Data enhancement method, system, device and computer readable storage medium | |
CN117671428A (en) | Processing method and device for optimizing detection result of target detection large model | |
US20220019797A1 (en) | Methods and systems for agricultural block mapping | |
CN116342686A (en) | Cell classification counting method and device based on OpenCV image processing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20240207 Address after: 518000 801 Hangsheng science and technology building, Gaoxin South liudao, Nanshan District, Shenzhen City, Guangdong Province Patentee after: SHENZHEN BITE MICROELECTRONICS TECHNOLOGY Co.,Ltd. Country or region after: China Address before: 300071 Tianjin City, Nankai District Wei Jin Road No. 94 Patentee before: NANKAI University Country or region before: China |
|
TR01 | Transfer of patent right |