CN112270343A - Image classification method and device and related components - Google Patents

Image classification method and device and related components

Info

Publication number
CN112270343A
Authority
CN
China
Prior art keywords
image classification
data
mean
batch
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011110384.4A
Other languages
Chinese (zh)
Inventor
杨宏斌
金良
赵雅倩
董刚
刘海威
蒋东东
胡克坤
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011110384.4A priority Critical patent/CN112270343A/en
Publication of CN112270343A publication Critical patent/CN112270343A/en
Priority to PCT/CN2021/089922 priority patent/WO2022077894A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application discloses an image classification method, an image classification device, an electronic device and a computer-readable storage medium. The image classification method comprises: constructing a TELU activation function from the function corresponding to the ELU negative half-axis and the TLU function; performing convolution calculation on input image data to obtain feature data; obtaining, according to the feature data, target data after mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network; obtaining a feature map corresponding to the target data through the TELU activation function; and obtaining an image classification result according to all the feature maps. The application can solve the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, as well as the neuron death caused by the vanishing gradient of the negative input part; the TELU has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, improves the overall performance of the neural network, and makes the image classification result more accurate.

Description

Image classification method and device and related components
Technical Field
The present application relates to the field of image classification, and in particular, to an image classification method, an image classification device, and related components.
Background
BN (Batch Normalization) is a milestone technique in the field of deep learning; it makes many kinds of networks trainable and has greatly promoted the development of computer vision. BN normalizes the features by computing the mean and variance within a (mini-)batch, which simplifies the optimization of the network and allows deeper neural networks to converge during training. However, normalizing along the batch dimension brings many problems: when the batch size becomes small, the batch statistics are estimated inaccurately, the error rate of network models for tasks such as image classification rises sharply, and the use of BN in training larger models is limited.
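As a concrete illustration of that batch dependence (a minimal NumPy sketch for this background discussion, not code from the application itself), the statistics below are computed across the batch axis, so a small B yields noisy estimates:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-6):
    # Illustrative BN over feature data of shape [B, W, H, C]: the per-channel
    # mean and variance are taken over the batch and spatial axes, so their
    # quality depends directly on the batch size B.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# With a tiny batch (B = 2) the estimated statistics are noisy, which is the
# failure mode described above.
x = np.random.randn(2, 8, 8, 16).astype(np.float32)
y = batch_norm(x, gamma=np.ones(16, np.float32), beta=np.zeros(16, np.float32))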
To address the above technical problem, the prior art adopts a TLU-based FRN normalization scheme. FRN has no batch dependency and operates independently on each activation channel (filter response) of each sample, so its accuracy is stable and consistent across batch sizes. However, TLU is an improvement based on ReLU and therefore inherits some of ReLU's own drawbacks. Because the output of ReLU has no negative values, its output mean is greater than 0; when the mean of the activation values is not 0, a bias is passed to the next layer, and if the activation values do not cancel each other out (i.e. the mean is not 0), a bias shift is caused in the activation units of the next layer. As such shifts accumulate, the more units there are, the larger the bias shift becomes, so very deep networks may fail to converge during training. Moreover, due to inherent characteristics of the ReLU activation function, the gradient of the negative-half-axis input vanishes and the weights cannot be updated: the derivative of ReLU becomes 0 when the input is negative, and this vanishing gradient causes the problem of neuron death, which affects the overall performance of the neural network and leads to low image classification accuracy.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide an image classification method, an image classification device, an electronic device and a computer-readable storage medium, which can solve the deviation problem that activations stay far from the 0 value because FRN lacks mean centering and the neuron death caused by the vanishing gradient of the negative input part, have a soft-saturation characteristic when the input takes a small value, reduce the variation and information of forward propagation, improve the overall performance of the neural network, and make the image classification result more accurate.
In order to solve the above technical problem, the present application provides an image classification method, including:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a c channel and a b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and obtaining an image classification result according to all the feature maps.
Preferably, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
Preferably, the tensor of the feature data is represented as [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
Preferably, the process of obtaining the target data after the mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network according to the feature data specifically includes:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of the response of the filter;
and carrying out linear transformation on the average value to obtain target data.
Preferably, the process of respectively performing mean-square normalization processing on each vector to obtain the mean value of the filter response specifically includes:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is upsilon2=∑ixi 2/N;
The second relation is
Figure BDA0002728410510000031
Wherein X ═ Xb,;,;c∈RNVector of response to filter for the c channel, the b batch, xiIs the ith element in the x set, upsilon2Is a mean-square normalization of x,
Figure BDA0002728410510000032
for the mean value, ε is a normal number, and N ═ W × H.
Preferably, the process of performing linear transformation on the mean value to obtain the target data includes:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
In order to solve the above technical problem, the present application further provides an image classification apparatus, including:
the construction module is used for constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
the convolution calculation module is used for carrying out convolution calculation on the input image data to obtain characteristic data;
the normalization processing module is used for acquiring target data after mean square normalization processing corresponding to the c channel and the b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
the activation module is used for obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and the classification module is used for obtaining an image classification result according to all the characteristic graphs.
Preferably, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
Preferably, the tensor of the feature data is represented as [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
Preferably, the normalization processing module includes:
the acquisition unit is used for acquiring the vector of the filter response of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of the filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
Preferably, the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
Preferably, the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
In order to solve the above technical problem, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the image classification method as claimed in any one of the above when said computer program is executed.
To solve the above technical problem, the present application further provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the image classification method according to any one of the above.
The application provides an image classification method. Because the newly constructed TELU function introduces the function corresponding to the ELU negative half-axis, it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate. The application also provides an image classification device, an electronic device and a computer-readable storage medium, which have the same beneficial effects as the image classification method.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flowchart illustrating steps of an image classification method according to the present application;
FIG. 2a is a diagram of an ELU function provided herein;
FIG. 2b is a TELU function curve provided herein;
fig. 3a is a schematic diagram of a ReLU activation function provided in the present application;
FIG. 3b is a schematic diagram of a ReLU (y- τ) activation function provided herein;
FIG. 3c is a schematic diagram of a max (y, τ) activation function provided herein;
FIG. 4 is a schematic diagram of an improved FRN layer normalization and activation process provided herein;
fig. 5 is a schematic structural diagram of an image classification apparatus provided in the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The core of the application is to provide an image classification method, an image classification device, an electronic device and a computer-readable storage medium, which can solve the deviation problem that activations stay far from the 0 value because FRN lacks mean centering and the neuron death caused by the vanishing gradient of the negative input part, have a soft-saturation characteristic when the input takes a small value, reduce the variation and information of forward propagation, improve the overall performance of the neural network, and make the image classification result more accurate.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an image classification method according to the present application, the image classification method including:
S101: constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
first, please refer to fig. 2a, where fig. 2a is an ELU function curve provided by the present application, and the expression thereof is:
ELU(x) = x for x > 0, and ELU(x) = α·(eˣ − 1) for x ≤ 0,
where α > 0. Considering that the lack of mean centering in FRN causes activations to deviate arbitrarily from the 0 value, and that this deviation, combined with ReLU, leads to a decrease in accuracy, a learnable threshold τ is added to ReLU, defining TLU as z = max(y, τ). Because max(y, τ) = max(y − τ, 0) + τ = ReLU(y − τ) + τ, the effect of TLU activation is equivalent to having a shared bias before and after ReLU. TLU-based FRN normalization performs significantly better than other normalization methods, especially when the batch size is small. TLU is a slight improvement on ReLU, so it keeps some of ReLU's advantages, such as fast convergence: the convergence speed under the SGD algorithm is clearly faster than with sigmoid and tanh, because ReLU avoids the vanishing gradient that sigmoid and tanh exhibit far from 0; the computational complexity is low, no exponential operation is needed, and the activation value can be obtained through a single threshold. The activation functions ReLU, ReLU(y − τ) and max(y, τ) are shown in fig. 3a to fig. 3c, respectively.
Specifically, on the basis of the above scheme, the application introduces the function curve of the ELU negative half-axis and improves the TLU to obtain the TELU (Thresholded Exponential Linear Unit) function. The curve of the TELU function is shown in fig. 2b; the TELU combines the advantages of the TLU and the ELU, and its expression is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
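Since the two expressions above appear only as formula images in the published text, the following NumPy sketch shows one plausible reading of a TELU-style activation built from the TLU and the ELU negative half-axis; the exact piecewise form, in particular the branch below −τ, is an assumption rather than the patent's verbatim formula:

import numpy as np

def telu(y, tau=0.5, alpha=1.0):
    # Assumed TELU-style activation (tau >= 0 assumed for continuity):
    #   y >= tau         -> y                               (identity, as in TLU)
    #   -tau <= y < tau  -> tau                              (thresholded, as in TLU)
    #   y < -tau         -> tau + alpha*(exp(y + tau) - 1)   (soft-saturating ELU-style branch)
    y = np.asarray(y, dtype=np.float64)
    upper = np.maximum(y, tau)
    lower = tau + alpha * (np.exp(y + tau) - 1.0)
    return np.where(y >= -tau, upper, lower)

# Unlike TLU, the output can be negative (about -0.49 for y = -5 here), which
# pulls the unit activation mean closer to 0.
print(telu(np.array([-5.0, -1.0, 0.2, 3.0]), tau=0.5, alpha=1.0))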
Accordingly, referring to fig. 4, fig. 4 is a schematic diagram of an improved FRN layer normalization and activation process provided by an embodiment of the present application.
S102: performing convolution calculation on input image data to obtain characteristic data;
Specifically, convolution calculation is performed on the input image data to obtain feature data, where the tensor of the feature data is represented as [B, W, H, C], B is the mini-batch size, C is the number of filters in the convolution, H is the height of the feature data, and W is the width of the feature data.
The input image data can be image data corresponding to a security monitoring video, image data correspondingly acquired in an automatic driving process and image data corresponding to a streaming media online video, and the application field of the input image data is not specifically limited.
S103: acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
Specifically, let x = x_{b,:,:,c} ∈ R^N denote the vector of the filter response of the c-th channel and the b-th batch, where N = W × H. Mean square normalization is performed on the vector x of each batch and each channel according to the first relation ν² = Σᵢ xᵢ² / N, and the mean value of the filter response is then calculated by the second relation
x̂ = x / √(ν² + ε).
Then, the mean value is linearly transformed according to the third relation to compensate for the loss of representation capability possibly caused by normalization:
y = γ·x̂ + β,
where xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant that prevents divide-by-zero errors, γ is a trainable scaling factor, and β is a trainable offset.
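A minimal NumPy sketch of this normalization step is given below; the second and third relations are reconstructed from the surrounding text (the published formula images are not reproduced here), so the exact expressions should be treated as assumptions:

import numpy as np

def frn_normalize(x, gamma, beta, eps=1e-6):
    # Mean square normalization per sample and per channel over N = W*H elements,
    # so there is no dependence on the batch size.
    nu2 = np.mean(np.square(x), axis=(1, 2), keepdims=True)  # first relation: nu^2 = sum_i(x_i^2)/N
    x_hat = x / np.sqrt(nu2 + eps)                            # second relation (assumed)
    return gamma * x_hat + beta                               # third relation (assumed): y = gamma*x_hat + beta

x = np.random.randn(4, 8, 8, 16).astype(np.float32)          # feature data [B, W, H, C]
y = frn_normalize(x, gamma=np.ones(16, np.float32), beta=np.zeros(16, np.float32))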
S104: obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
S105: and obtaining an image classification result according to all the feature maps.
It can be understood that the activation operation performed on the target data by the TELU function yields the corresponding feature map. S101-S104 complete the normalization and activation of one layer of the neural network; S101-S104 are likewise executed for the other convolutional layers, feature classification is performed on all feature maps, and the output layer of the neural network outputs the image classification result.
Specifically, compared with ReLU, the TELU activation function formed by combining the TLU with the ELU inherits the TLU's correction of the problem that FRN activations deviate from the 0 value due to the lack of mean centering, and introduces many advantages of the ELU. The TELU can take negative values and can bring the unit activation mean closer to 0, similar to the effect of Batch Normalization but with lower computational complexity. Since the TELU adds a treatment for the part where the gradient would otherwise be 0, it can solve the problem of neuron death and speeds up training; evidence suggests that a mean activation close to 0 makes training faster. In addition, the slope of the negative half-segment of the TELU is small, so it has a soft-saturation characteristic when a small value is input, which reduces the variation and information of forward propagation, enhances robustness to noise, further improves the accuracy of the whole network, and makes the image classification result more accurate. As can be seen from fig. 2a, α is a tunable parameter that controls when the negative part of the ELU saturates, and the TELU activation function has a clear saturation plateau in its negative region, enabling it to learn a more robust and stable representation.
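Putting S102-S104 together, the sketch below (same assumptions as the earlier sketches: reconstructed normalization relations and an assumed TELU form) shows how one improved FRN layer would turn feature data of shape [B, W, H, C] into a feature map before the feature maps are passed on for classification:

import numpy as np

def improved_frn_layer(x, gamma, beta, tau=0.5, alpha=1.0, eps=1e-6):
    # One improved FRN layer: mean square normalization, linear transform, TELU.
    # gamma, beta and tau are per-channel learnable vectors in the application;
    # scalars are used here for brevity.
    nu2 = np.mean(np.square(x), axis=(1, 2), keepdims=True)  # per (batch, channel) statistics
    y = gamma * (x / np.sqrt(nu2 + eps)) + beta               # normalization + linear transform
    upper = np.maximum(y, tau)                                # TLU part of the assumed TELU
    lower = tau + alpha * (np.exp(y + tau) - 1.0)             # assumed ELU-style negative branch
    return np.where(y >= -tau, upper, lower)                  # feature map of this layer

feat = np.random.randn(4, 8, 8, 16).astype(np.float32)
fmap = improved_frn_layer(feat, gamma=1.0, beta=0.0)
print(fmap.shape)   # (4, 8, 8, 16); the feature maps of all layers then feed the classifier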
Specifically, the improved FRN layer gradient calculation process is as follows:
Because all of the transformations are performed along the channels, the present embodiment only derives the per-channel gradient. For a given layer in the neural network, according to the second relation, the TELU function and the fourth relation
[formula image]
x is sent to the FRN layer for computation and its output is z (see fig. 4). Let f(z) be the mapping that the rest of the neural network applies to z, with backward gradient ∂f/∂z. The parameters γ, β and τ are per-channel vectors; then:
[formula image]
[formula image]
The gradient used to update τ is:
[formula image]
The gradient used to update y is:
[formula image]
where z_b is the per-channel activation vector of the b-th batch.
Further, when y is more than or equal to tau,
Figure BDA0002728410510000087
when y ∈ [ - τ, τ),
Figure BDA0002728410510000088
when y < - τ is greater than the value,
Figure BDA0002728410510000089
The gradient with respect to γ is as follows:
[formula image]
Further, when y ≥ τ,
[formula image]
when y ∈ [−τ, τ),
[formula image]
and when y < −τ,
[formula image]
The gradient with respect to β is as follows:
[formula image]
Further, when y ≥ τ,
[formula image]
when y ∈ [−τ, τ),
[formula image]
and when y < −τ,
[formula image]
From
[formula image]
it can be derived that:
[formula image]
[formula image]
According to the above formula and
[formula image]
it follows that
[formula image]
When y ≥ τ,
[formula image]
when y ∈ [−τ, τ),
[formula image]
and when y < −τ,
[formula image]
It can be seen from the above relations that when yᵢ < −τ the gradient of the TELU is not 0; by backward derivation, for yᵢ < −τ the gradient with respect to the input
[formula image]
is also not 0. This solves the problem of neuron death caused by the vanishing gradient of the TLU's negative-half input. Moreover, the slope of the TELU's negative half-segment is small, so it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation and enhances robustness to noise. In addition, α serves as a tunable parameter that controls when the ELU negative part saturates, further enhancing the flexibility of gradient control in that region.
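A quick numerical check of that claim, again under the assumed TELU form used in the earlier sketches (not the patent's exact formula), shows a strictly positive derivative below −τ and a zero derivative only inside [−τ, τ):

import numpy as np

def telu(y, tau=0.5, alpha=1.0):
    # Assumed TELU form from the earlier sketch.
    upper = np.maximum(y, tau)
    lower = tau + alpha * (np.exp(y + tau) - 1.0)
    return np.where(y >= -tau, upper, lower)

def numeric_grad(f, y, h=1e-5):
    # Central finite difference as a stand-in for the analytic gradients above.
    return (f(y + h) - f(y - h)) / (2.0 * h)

print(numeric_grad(telu, np.array([-4.0, -2.0, -1.0])))   # ~alpha*exp(y + tau): small but nonzero
print(numeric_grad(telu, np.array([0.0])))                # inside [-tau, tau): 0, as in TLU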
Referring to fig. 5, fig. 5 is a schematic structural diagram of an image classification apparatus provided in the present application, the image classification apparatus including:
the construction module 11 is used for constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
a convolution calculation module 12, configured to perform convolution calculation on input image data to obtain feature data;
the normalization processing module 13 is configured to obtain target data after mean square normalization processing corresponding to a c-th channel and a b-th batch in a current layer of the neural network according to the feature data, where b and c are positive integers;
the activation module 14 is configured to obtain a feature map corresponding to the target data through a TELU activation function;
and the classification module 15 is used for obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half-axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate.
As a preferred embodiment, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
As a preferred embodiment, the tensor of the feature data is represented as [ B, W, H, C ], B is the mini batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
As a preferred embodiment, the normalization processing module 13 includes:
the acquisition unit is used for acquiring the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
As a preferred embodiment, the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
As a preferred embodiment, the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
On the other hand, the present application further provides an electronic device, as shown in fig. 6, which shows a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device according to the embodiment may include: a processor 21 and a memory 22.
Optionally, the electronic device may further comprise a communication interface 23, an input unit 24 and a display 25 and a communication bus 26.
The processor 21, the memory 22, the communication interface 23, the input unit 24 and the display 25 are all communicated with each other through a communication bus 26.
In the embodiment of the present application, the processor 21 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array or another programmable logic device, etc.
The processor may call a program stored in the memory 22. Specifically, the processor may perform operations performed on the electronic device side in the embodiments of the image classification method described below.
The memory 22 is used for storing one or more programs, the program may include program codes, the program codes include computer operation instructions, and in the embodiment of the present application, the memory stores at least the program for realizing the following functions:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
and obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half-axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate.
In one possible implementation, the memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a mean square normalization calculation function, etc.), and the like; the storage data area may store data created according to the use of the computer.
Further, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 23 may be an interface of a communication module, such as an interface of a GSM module.
The electronic device may also include a display 25 and an input unit 24, etc.
Of course, the structure of the electronic device shown in fig. 6 does not constitute a limitation on the electronic device in the embodiment of the present application, and in practical applications, the electronic device may include more or fewer components than those shown in fig. 6, or some components in combination.
In another aspect, embodiments of the present application also disclose a computer-readable storage medium, where the computer-readable storage medium includes Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
and obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half-axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate.
In some specific embodiments, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
In some specific embodiments, the tensor for the feature data is represented as [ B, W, H, C ], B is the mini batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
In some specific embodiments, the computer subprogram stored in the computer readable storage medium, when executed by the processor, may specifically implement the following steps:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of filter response;
and carrying out linear transformation on the average value to obtain target data.
In some specific embodiments, the process of respectively performing the mean-square normalization processing on each vector to obtain the average value of the filter response specifically includes:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
In some specific embodiments, the process of obtaining the target data by performing linear transformation on the mean value includes:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. An image classification method, comprising:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a c channel and a b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and obtaining an image classification result according to all the feature maps.
2. The image classification method according to claim 1, characterized in that the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
3. The image classification method according to claim 1, wherein the tensor of the feature data is represented by [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
4. The image classification method according to claim 3, wherein the process of obtaining the target data after the mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network according to the feature data specifically includes:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of the response of the filter;
and carrying out linear transformation on the average value to obtain target data.
5. The image classification method according to claim 4, wherein the process of respectively performing mean-square normalization processing on each vector to obtain the mean value of the filter response specifically comprises:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
6. The image classification method according to claim 5, wherein the process of linearly transforming the mean value to obtain the target data comprises:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
7. An image classification apparatus, comprising:
the construction module is used for constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
the convolution calculation module is used for carrying out convolution calculation on the input image data to obtain characteristic data;
the normalization processing module is used for acquiring target data after mean square normalization processing corresponding to the c channel and the b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
the activation module is used for obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and the classification module is used for obtaining an image classification result according to all the characteristic graphs.
8. The image classification device according to claim 7, wherein the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
9. The image classification apparatus according to claim 7, wherein the tensor of the feature data is represented by [ B, W, H, C ], B is a mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
10. The image classification device according to claim 9, wherein the normalization processing module includes:
the acquisition unit is used for acquiring the vector of the filter response of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of the filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
11. The image classification device according to claim 10, wherein the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
12. The image classification device according to claim 11, wherein the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
13. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the image classification method according to any one of claims 1 to 6 when executing said computer program.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the image classification method according to any one of claims 1 to 6.
CN202011110384.4A 2020-10-16 2020-10-16 Image classification method and device and related components Pending CN112270343A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011110384.4A CN112270343A (en) 2020-10-16 2020-10-16 Image classification method and device and related components
PCT/CN2021/089922 WO2022077894A1 (en) 2020-10-16 2021-04-26 Image classification and apparatus, and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011110384.4A CN112270343A (en) 2020-10-16 2020-10-16 Image classification method and device and related components

Publications (1)

Publication Number Publication Date
CN112270343A true CN112270343A (en) 2021-01-26

Family

ID=74337398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110384.4A Pending CN112270343A (en) 2020-10-16 2020-10-16 Image classification method and device and related components

Country Status (2)

Country Link
CN (1) CN112270343A (en)
WO (1) WO2022077894A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077894A1 (en) * 2020-10-16 2022-04-21 苏州浪潮智能科技有限公司 Image classification and apparatus, and related components
CN114708460A (en) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 Image classification method, system, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358257A (en) * 2017-07-07 2017-11-17 华南理工大学 Under a kind of big data scene can incremental learning image classification training method
CN109753983A (en) * 2017-11-07 2019-05-14 北京京东尚科信息技术有限公司 Image classification method, device and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10290085B2 (en) * 2016-12-14 2019-05-14 Adobe Inc. Image hole filling that accounts for global structure and local texture
CN110569965A (en) * 2019-08-27 2019-12-13 中山大学 Neural network model optimization method and system based on ThLU function
CN112270343A (en) * 2020-10-16 2021-01-26 苏州浪潮智能科技有限公司 Image classification method and device and related components

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358257A (en) * 2017-07-07 2017-11-17 华南理工大学 Under a kind of big data scene can incremental learning image classification training method
CN109753983A (en) * 2017-11-07 2019-05-14 北京京东尚科信息技术有限公司 Image classification method, device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DJORK-ARNÉ CLEVERT et al.: "FAST AND ACCURATE DEEP NETWORK LEARNING BY EXPONENTIAL LINEAR UNITS (ELUS)", ICLR 2016 *
SAURABH SINGH et al.: "Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks", arXiv *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077894A1 (en) * 2020-10-16 2022-04-21 苏州浪潮智能科技有限公司 Image classification and apparatus, and related components
CN114708460A (en) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 Image classification method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022077894A1 (en) 2022-04-21

Similar Documents

Publication Publication Date Title
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
US11657254B2 (en) Computation method and device used in a convolutional neural network
US10789734B2 (en) Method and device for data quantization
JP6724869B2 (en) Method for adjusting output level of neurons in multilayer neural network
CN112949678B (en) Deep learning model countermeasure sample generation method, system, equipment and storage medium
Yan et al. An adaptive surrogate modeling based on deep neural networks for large-scale Bayesian inverse problems
CN112488104A (en) Depth and confidence estimation system
CN112270343A (en) Image classification method and device and related components
KR20200144398A (en) Apparatus for performing class incremental learning and operation method thereof
JP3979007B2 (en) Pattern identification method and apparatus
WO2018235449A1 (en) Artificial neural network circuit training method, training program, and training device
CN116245015A (en) Data change trend prediction method and system based on deep learning
CN111105017A (en) Neural network quantization method and device and electronic equipment
KR20200129458A (en) A computing device for training an artificial neural network model, a method for training an artificial neural network model, and a memory system for storing the same
CN111144560B (en) Deep neural network operation method and device
US20230025626A1 (en) Method and apparatus for generating process simulation models
CN114137967B (en) Driving behavior decision method based on multi-network joint learning
CN107967691B (en) Visual mileage calculation method and device
CN111612816B (en) Method, device, equipment and computer storage medium for tracking moving target
Kawashima et al. The aleatoric uncertainty estimation using a separate formulation with virtual residuals
CN116630697B (en) Image classification method based on biased selection pooling
Hameed A dynamic annealing learning for PLSOM neural networks: Applications in medicine and applied sciences
CN116205138B (en) Wind speed forecast correction method and device
CN111105020B (en) Feature representation migration learning method and related device
KR102539876B1 (en) Layer optimization system for 3d rram device using artificial intelligence technology and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210126