CN112270343A - Image classification method and device and related components - Google Patents

Image classification method and device and related components

Info

Publication number
CN112270343A
Authority
CN
China
Prior art keywords
image classification
data
mean
batch
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011110384.4A
Other languages
Chinese (zh)
Inventor
杨宏斌
金良
赵雅倩
董刚
刘海威
蒋东东
胡克坤
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202011110384.4A priority Critical patent/CN112270343A/en
Publication of CN112270343A publication Critical patent/CN112270343A/en
Priority to PCT/CN2021/089922 priority patent/WO2022077894A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The application discloses an image classification method, an image classification device, an electronic device and a computer-readable storage medium. The image classification method comprises: constructing a TELU activation function from the function corresponding to the ELU negative half-axis and the TLU function; performing convolution calculation on input image data to obtain feature data; obtaining, according to the feature data, target data after mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network; obtaining a feature map corresponding to the target data through the TELU activation function; and obtaining an image classification result according to all the feature maps. The application can solve the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, as well as the neuron death caused by the vanishing gradient of the negative input part; the TELU has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, improves the overall performance of the neural network, and makes the image classification result more accurate.

Description

Image classification method and device and related components
Technical Field
The present application relates to the field of image classification, and in particular, to an image classification method, an image classification device, and related components.
Background
BN (Batch Normalization) is a milestone technique in the field of deep learning; it makes many kinds of networks trainable and has greatly promoted the development of computer vision. BN normalizes the features by computing the mean and variance within a (mini-)batch, which simplifies the optimization of the network and allows deeper neural networks to converge during training. However, normalizing along the batch dimension brings many problems: when the batch size becomes small, the batch statistics are estimated inaccurately, the error rate of network models for tasks such as image classification rises sharply, and the use of BN in training larger models is limited.
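As a concrete illustration of that batch dependence (a minimal NumPy sketch for this background discussion, not code from the application itself), the statistics below are computed across the batch axis, so a small B yields noisy estimates:

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-6):
    # Illustrative BN over feature data of shape [B, W, H, C]: the per-channel
    # mean and variance are taken over the batch and spatial axes, so their
    # quality depends directly on the batch size B.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# With a tiny batch (B = 2) the estimated statistics are noisy, which is the
# failure mode described above.
x = np.random.randn(2, 8, 8, 16).astype(np.float32)
y = batch_norm(x, gamma=np.ones(16, np.float32), beta=np.zeros(16, np.float32))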
To address the above technical problem, the prior art adopts a TLU-based FRN normalization scheme. FRN has no batch dependency and operates independently on each activation channel (filter response) of each sample, so its accuracy is stable and consistent across batch sizes. However, TLU is an improvement based on ReLU and therefore inherits some of ReLU's own drawbacks. Because the output of ReLU has no negative values, its output mean is greater than 0; when the mean of the activation values is not 0, a bias is passed to the next layer, and if the activation values do not cancel each other out (i.e. the mean is not 0), a bias shift is caused in the activation units of the next layer. As such shifts accumulate, the more units there are, the larger the bias shift becomes, so very deep networks may fail to converge during training. Moreover, due to inherent characteristics of the ReLU activation function, the gradient of the negative-half-axis input vanishes and the weights cannot be updated: the derivative of ReLU becomes 0 when the input is negative, and this vanishing gradient causes the problem of neuron death, which affects the overall performance of the neural network and leads to low image classification accuracy.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide an image classification method, an image classification device, an electronic device and a computer-readable storage medium, which can solve the deviation problem that activations stay far from the 0 value because FRN lacks mean centering and the neuron death caused by the vanishing gradient of the negative input part, have a soft-saturation characteristic when the input takes a small value, reduce the variation and information of forward propagation, improve the overall performance of the neural network, and make the image classification result more accurate.
In order to solve the above technical problem, the present application provides an image classification method, including:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a c channel and a b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and obtaining an image classification result according to all the feature maps.
Preferably, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
Preferably, the tensor of the feature data is represented as [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
Preferably, the process of obtaining the target data after the mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network according to the feature data specifically includes:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of the response of the filter;
and carrying out linear transformation on the average value to obtain target data.
Preferably, the process of respectively performing mean-square normalization processing on each vector to obtain the mean value of the filter response specifically includes:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is upsilon2=∑ixi 2/N;
The second relation is
Figure BDA0002728410510000031
Wherein X ═ Xb,;,;c∈RNVector of response to filter for the c channel, the b batch, xiIs the ith element in the x set, upsilon2Is a mean-square normalization of x,
Figure BDA0002728410510000032
for the mean value, ε is a normal number, and N ═ W × H.
Preferably, the process of performing linear transformation on the mean value to obtain the target data includes:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
In order to solve the above technical problem, the present application further provides an image classification apparatus, including:
the construction module is used for constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
the convolution calculation module is used for carrying out convolution calculation on the input image data to obtain characteristic data;
the normalization processing module is used for acquiring target data after mean square normalization processing corresponding to the c channel and the b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
the activation module is used for obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and the classification module is used for obtaining an image classification result according to all the characteristic graphs.
Preferably, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
Preferably, the tensor of the feature data is represented as [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
Preferably, the normalization processing module includes:
the acquisition unit is used for acquiring the vector of the filter response of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of the filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
Preferably, the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
Preferably, the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
In order to solve the above technical problem, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the image classification method as claimed in any one of the above when said computer program is executed.
To solve the above technical problem, the present application further provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the image classification method according to any one of the above.
The application provides an image classification method. Because the newly constructed TELU function introduces the function corresponding to the ELU negative half-axis, it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate. The application also provides an image classification device, an electronic device and a computer-readable storage medium, which have the same beneficial effects as the image classification method.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flowchart illustrating steps of an image classification method according to the present application;
FIG. 2a is a diagram of an ELU function provided herein;
FIG. 2b is a TELU function curve provided herein;
fig. 3a is a schematic diagram of a ReLU activation function provided in the present application;
FIG. 3b is a schematic diagram of a ReLU (y- τ) activation function provided herein;
FIG. 3c is a schematic diagram of a max (y, τ) activation function provided herein;
FIG. 4 is a schematic diagram of an improved FRN layer normalization and activation process provided herein;
fig. 5 is a schematic structural diagram of an image classification apparatus provided in the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The core of the application is to provide an image classification method, an image classification device, an electronic device and a computer-readable storage medium, which can solve the deviation problem that activations stay far from the 0 value because FRN lacks mean centering and the neuron death caused by the vanishing gradient of the negative input part, have a soft-saturation characteristic when the input takes a small value, reduce the variation and information of forward propagation, improve the overall performance of the neural network, and make the image classification result more accurate.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an image classification method according to the present application, the image classification method including:
S101: constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
first, please refer to fig. 2a, where fig. 2a is an ELU function curve provided by the present application, and the expression thereof is:
ELU(x) = x for x > 0, and ELU(x) = α·(eˣ − 1) for x ≤ 0,
where α > 0. Considering that the lack of mean centering in FRN causes activations to deviate arbitrarily from the 0 value, and that this deviation, combined with ReLU, leads to a decrease in accuracy, a learnable threshold τ is added to ReLU, defining TLU as z = max(y, τ). Because max(y, τ) = max(y − τ, 0) + τ = ReLU(y − τ) + τ, the effect of TLU activation is equivalent to having a shared bias before and after ReLU. TLU-based FRN normalization performs significantly better than other normalization methods, especially when the batch size is small. TLU is a slight improvement on ReLU, so it keeps some of ReLU's advantages, such as fast convergence: the convergence speed under the SGD algorithm is clearly faster than with sigmoid and tanh, because ReLU avoids the vanishing gradient that sigmoid and tanh exhibit far from 0; the computational complexity is low, no exponential operation is needed, and the activation value can be obtained through a single threshold. The activation functions ReLU, ReLU(y − τ) and max(y, τ) are shown in fig. 3a to fig. 3c, respectively.
Specifically, on the basis of the above scheme, the application introduces the function curve of the ELU negative half-axis and improves the TLU to obtain the TELU (Thresholded Exponential Linear Unit) function. The curve of the TELU function is shown in fig. 2b; the TELU combines the advantages of the TLU and the ELU, and its expression is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
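Since the two expressions above appear only as formula images in the published text, the following NumPy sketch shows one plausible reading of a TELU-style activation built from the TLU and the ELU negative half-axis; the exact piecewise form, in particular the branch below −τ, is an assumption rather than the patent's verbatim formula:

import numpy as np

def telu(y, tau=0.5, alpha=1.0):
    # Assumed TELU-style activation (tau >= 0 assumed for continuity):
    #   y >= tau         -> y                               (identity, as in TLU)
    #   -tau <= y < tau  -> tau                              (thresholded, as in TLU)
    #   y < -tau         -> tau + alpha*(exp(y + tau) - 1)   (soft-saturating ELU-style branch)
    y = np.asarray(y, dtype=np.float64)
    upper = np.maximum(y, tau)
    lower = tau + alpha * (np.exp(y + tau) - 1.0)
    return np.where(y >= -tau, upper, lower)

# Unlike TLU, the output can be negative (about -0.49 for y = -5 here), which
# pulls the unit activation mean closer to 0.
print(telu(np.array([-5.0, -1.0, 0.2, 3.0]), tau=0.5, alpha=1.0))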
Accordingly, referring to fig. 4, fig. 4 is a schematic diagram of an improved FRN layer normalization and activation process provided by an embodiment of the present application.
S102: performing convolution calculation on input image data to obtain characteristic data;
Specifically, convolution calculation is performed on the input image data to obtain feature data, where the tensor of the feature data is represented as [B, W, H, C], B is the mini-batch size, C is the number of filters in the convolution, H is the height of the feature data, and W is the width of the feature data.
The input image data can be image data corresponding to a security monitoring video, image data correspondingly acquired in an automatic driving process and image data corresponding to a streaming media online video, and the application field of the input image data is not specifically limited.
S103: acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
Specifically, let x = x_{b,:,:,c} ∈ R^N denote the vector of the filter response of the c-th channel and the b-th batch, where N = W × H. Mean square normalization is performed on the vector x of each batch and each channel according to the first relation ν² = Σᵢ xᵢ² / N, and the mean value of the filter response is then calculated by the second relation
x̂ = x / √(ν² + ε).
Then, the mean value is linearly transformed according to the third relation to compensate for the loss of representation capability possibly caused by normalization:
y = γ·x̂ + β,
where xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant that prevents divide-by-zero errors, γ is a trainable scaling factor, and β is a trainable offset.
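A minimal NumPy sketch of this normalization step is given below; the second and third relations are reconstructed from the surrounding text (the published formula images are not reproduced here), so the exact expressions should be treated as assumptions:

import numpy as np

def frn_normalize(x, gamma, beta, eps=1e-6):
    # Mean square normalization per sample and per channel over N = W*H elements,
    # so there is no dependence on the batch size.
    nu2 = np.mean(np.square(x), axis=(1, 2), keepdims=True)  # first relation: nu^2 = sum_i(x_i^2)/N
    x_hat = x / np.sqrt(nu2 + eps)                            # second relation (assumed)
    return gamma * x_hat + beta                               # third relation (assumed): y = gamma*x_hat + beta

x = np.random.randn(4, 8, 8, 16).astype(np.float32)          # feature data [B, W, H, C]
y = frn_normalize(x, gamma=np.ones(16, np.float32), beta=np.zeros(16, np.float32))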
S104: obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
S105: and obtaining an image classification result according to all the feature maps.
It can be understood that the activation operation performed on the target data by the TELU function yields the corresponding feature map. S101-S104 complete the normalization and activation of one layer of the neural network; S101-S104 are likewise executed for the other convolutional layers, feature classification is performed on all feature maps, and the output layer of the neural network outputs the image classification result.
Specifically, compared with ReLU, the TELU activation function formed by combining the TLU with the ELU inherits the TLU's correction of the problem that FRN activations deviate from the 0 value due to the lack of mean centering, and introduces many advantages of the ELU. The TELU can take negative values and can bring the unit activation mean closer to 0, similar to the effect of Batch Normalization but with lower computational complexity. Since the TELU adds a treatment for the part where the gradient would otherwise be 0, it can solve the problem of neuron death and speeds up training; evidence suggests that a mean activation close to 0 makes training faster. In addition, the slope of the negative half-segment of the TELU is small, so it has a soft-saturation characteristic when a small value is input, which reduces the variation and information of forward propagation, enhances robustness to noise, further improves the accuracy of the whole network, and makes the image classification result more accurate. As can be seen from fig. 2a, α is a tunable parameter that controls when the negative part of the ELU saturates, and the TELU activation function has a clear saturation plateau in its negative region, enabling it to learn a more robust and stable representation.
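Putting S102-S104 together, the sketch below (same assumptions as the earlier sketches: reconstructed normalization relations and an assumed TELU form) shows how one improved FRN layer would turn feature data of shape [B, W, H, C] into a feature map before the feature maps are passed on for classification:

import numpy as np

def improved_frn_layer(x, gamma, beta, tau=0.5, alpha=1.0, eps=1e-6):
    # One improved FRN layer: mean square normalization, linear transform, TELU.
    # gamma, beta and tau are per-channel learnable vectors in the application;
    # scalars are used here for brevity.
    nu2 = np.mean(np.square(x), axis=(1, 2), keepdims=True)  # per (batch, channel) statistics
    y = gamma * (x / np.sqrt(nu2 + eps)) + beta               # normalization + linear transform
    upper = np.maximum(y, tau)                                # TLU part of the assumed TELU
    lower = tau + alpha * (np.exp(y + tau) - 1.0)             # assumed ELU-style negative branch
    return np.where(y >= -tau, upper, lower)                  # feature map of this layer

feat = np.random.randn(4, 8, 8, 16).astype(np.float32)
fmap = improved_frn_layer(feat, gamma=1.0, beta=0.0)
print(fmap.shape)   # (4, 8, 8, 16); the feature maps of all layers then feed the classifier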
Specifically, the improved FRN layer gradient calculation process is as follows:
Because all of the transformations are performed along the channels, the present embodiment only derives the per-channel gradient. For a given layer in the neural network, according to the second relation, the TELU function and the fourth relation
[formula image]
x is sent to the FRN layer for computation and its output is z (see fig. 4). Let f(z) be the mapping that the rest of the neural network applies to z, with backward gradient ∂f/∂z. The parameters γ, β and τ are per-channel vectors; then:
[formula image]
[formula image]
The gradient used to update τ is:
[formula image]
The gradient used to update y is:
[formula image]
where z_b is the per-channel activation vector of the b-th batch.
Further, when y is more than or equal to tau,
Figure BDA0002728410510000087
when y ∈ [ - τ, τ),
Figure BDA0002728410510000088
when y < - τ is greater than the value,
Figure BDA0002728410510000089
The gradient with respect to γ is as follows:
[formula image]
Further, when y ≥ τ,
[formula image]
when y ∈ [−τ, τ),
[formula image]
and when y < −τ,
[formula image]
The gradient with respect to β is as follows:
[formula image]
Further, when y ≥ τ,
[formula image]
when y ∈ [−τ, τ),
[formula image]
and when y < −τ,
[formula image]
From
[formula image]
it can be derived that:
[formula image]
[formula image]
According to the above formula and
[formula image]
it follows that
[formula image]
When y ≥ τ,
[formula image]
when y ∈ [−τ, τ),
[formula image]
and when y < −τ,
[formula image]
It can be seen from the above relations that when yᵢ < −τ the gradient of the TELU is not 0; by backward derivation, for yᵢ < −τ the gradient with respect to the input
[formula image]
is also not 0. This solves the problem of neuron death caused by the vanishing gradient of the TLU's negative-half input. Moreover, the slope of the TELU's negative half-segment is small, so it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation and enhances robustness to noise. In addition, α serves as a tunable parameter that controls when the ELU negative part saturates, further enhancing the flexibility of gradient control in that region.
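A quick numerical check of that claim, again under the assumed TELU form used in the earlier sketches (not the patent's exact formula), shows a strictly positive derivative below −τ and a zero derivative only inside [−τ, τ):

import numpy as np

def telu(y, tau=0.5, alpha=1.0):
    # Assumed TELU form from the earlier sketch.
    upper = np.maximum(y, tau)
    lower = tau + alpha * (np.exp(y + tau) - 1.0)
    return np.where(y >= -tau, upper, lower)

def numeric_grad(f, y, h=1e-5):
    # Central finite difference as a stand-in for the analytic gradients above.
    return (f(y + h) - f(y - h)) / (2.0 * h)

print(numeric_grad(telu, np.array([-4.0, -2.0, -1.0])))   # ~alpha*exp(y + tau): small but nonzero
print(numeric_grad(telu, np.array([0.0])))                # inside [-tau, tau): 0, as in TLU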
Referring to fig. 5, fig. 5 is a schematic structural diagram of an image classification apparatus provided in the present application, the image classification apparatus including:
the construction module 11 is used for constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
a convolution calculation module 12, configured to perform convolution calculation on input image data to obtain feature data;
the normalization processing module 13 is configured to obtain target data after mean square normalization processing corresponding to a c-th channel and a b-th batch in a current layer of the neural network according to the feature data, where b and c are positive integers;
the activation module 14 is configured to obtain a feature map corresponding to the target data through a TELU activation function;
and the classification module 15 is used for obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half-axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate.
As a preferred embodiment, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
As a preferred embodiment, the tensor of the feature data is represented as [ B, W, H, C ], B is the mini batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
As a preferred embodiment, the normalization processing module 13 includes:
the acquisition unit is used for acquiring the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
As a preferred embodiment, the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
As a preferred embodiment, the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
On the other hand, the present application further provides an electronic device, as shown in fig. 6, which shows a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device according to the embodiment may include: a processor 21 and a memory 22.
Optionally, the electronic device may further comprise a communication interface 23, an input unit 24 and a display 25 and a communication bus 26.
The processor 21, the memory 22, the communication interface 23, the input unit 24 and the display 25 are all communicated with each other through a communication bus 26.
In the embodiment of the present application, the processor 21 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array or another programmable logic device, etc.
The processor may call a program stored in the memory 22. Specifically, the processor may perform operations performed on the electronic device side in the embodiments of the image classification method described below.
The memory 22 is used for storing one or more programs, the program may include program codes, the program codes include computer operation instructions, and in the embodiment of the present application, the memory stores at least the program for realizing the following functions:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
and obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half-axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate.
In one possible implementation, the memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a mean square normalization calculation function, etc.), and the like; the storage data area may store data created according to the use of the computer.
Further, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or another non-volatile solid-state storage device.
The communication interface 23 may be an interface of a communication module, such as an interface of a GSM module.
The electronic device may also include a display 25 and an input unit 24, etc.
Of course, the structure of the electronic device shown in fig. 6 does not constitute a limitation on the electronic device in the embodiment of the present application, and in practical applications, the electronic device may include more or fewer components than those shown in fig. 6, or some components in combination.
In another aspect, embodiments of the present application also disclose a computer-readable storage medium, where the computer-readable storage medium includes Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
and obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half-axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the deviation problem that activations stay far from the 0 value because FRN lacks mean centering, and at the same time solves the neuron death caused by the vanishing gradient of the negative input part, increasing the expressive ability of the activation function. Because the slope of the negative half-segment of the TELU function is small, it has a soft-saturation characteristic when the input takes a small value, which reduces the variation and information of forward propagation, enhances robustness to noise and improves the overall performance of the neural network, so that the image classification result is more accurate.
In some specific embodiments, the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
In some specific embodiments, the tensor for the feature data is represented as [ B, W, H, C ], B is the mini batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
In some specific embodiments, the computer subprogram stored in the computer readable storage medium, when executed by the processor, may specifically implement the following steps:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of filter response;
and carrying out linear transformation on the average value to obtain target data.
In some specific embodiments, the process of respectively performing the mean-square normalization processing on each vector to obtain the average value of the filter response specifically includes:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
In some specific embodiments, the process of obtaining the target data by performing linear transformation on the mean value includes:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (14)

1. An image classification method, comprising:
constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a c channel and a b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and obtaining an image classification result according to all the feature maps.
2. The image classification method according to claim 1, characterized in that the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
3. The image classification method according to claim 1, wherein the tensor of the feature data is represented by [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
4. The image classification method according to claim 3, wherein the process of obtaining the target data after the mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network according to the feature data specifically includes:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of the response of the filter;
and carrying out linear transformation on the average value to obtain target data.
5. The image classification method according to claim 4, wherein the process of respectively performing mean-square normalization processing on each vector to obtain the mean value of the filter response specifically comprises:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
6. The image classification method according to claim 5, wherein the process of linearly transforming the mean value to obtain the target data comprises:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
7. An image classification apparatus, comprising:
the construction module is used for constructing a TELU activation function according to the function corresponding to the ELU negative half-axis and the TLU function;
the convolution calculation module is used for carrying out convolution calculation on the input image data to obtain characteristic data;
the normalization processing module is used for acquiring target data after mean square normalization processing corresponding to the c channel and the b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
the activation module is used for obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and the classification module is used for obtaining an image classification result according to all the characteristic graphs.
8. The image classification device according to claim 7, wherein the TELU activation function is:
[formula image]
or, alternatively,
[formula image]
where yᵢ is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
9. The image classification apparatus according to claim 7, wherein the tensor of the feature data is represented by [ B, W, H, C ], B is a mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
10. The image classification device according to claim 9, wherein the normalization processing module includes:
the acquisition unit is used for acquiring the vector of the filter response of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of the filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
11. The image classification device according to claim 10, wherein the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ² / N;
the second relation is x̂ = x / √(ν² + ε);
where x = x_{b,:,:,c} ∈ R^N is the vector of the filter response of the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
12. The image classification device according to claim 11, wherein the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain the target data, wherein the third relational expression is y = γ·x̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
13. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the image classification method according to any one of claims 1 to 6 when executing said computer program.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the image classification method according to any one of claims 1 to 6.
CN202011110384.4A 2020-10-16 2020-10-16 Image classification method and device and related components Pending CN112270343A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011110384.4A CN112270343A (en) 2020-10-16 2020-10-16 Image classification method and device and related components
PCT/CN2021/089922 WO2022077894A1 (en) 2020-10-16 2021-04-26 Image classification and apparatus, and related components

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011110384.4A CN112270343A (en) 2020-10-16 2020-10-16 Image classification method and device and related components

Publications (1)

Publication Number Publication Date
CN112270343A true CN112270343A (en) 2021-01-26

Family

ID=74337398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110384.4A Pending CN112270343A (en) 2020-10-16 2020-10-16 Image classification method and device and related components

Country Status (2)

Country Link
CN (1) CN112270343A (en)
WO (1) WO2022077894A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077894A1 (en) * 2020-10-16 2022-04-21 苏州浪潮智能科技有限公司 Image classification and apparatus, and related components
CN114708460A (en) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 Image classification method, system, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358257A (en) * 2017-07-07 2017-11-17 华南理工大学 Under a kind of big data scene can incremental learning image classification training method
CN109753983A (en) * 2017-11-07 2019-05-14 北京京东尚科信息技术有限公司 Image classification method, device and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10290085B2 (en) * 2016-12-14 2019-05-14 Adobe Inc. Image hole filling that accounts for global structure and local texture
CN110569965A (en) * 2019-08-27 2019-12-13 中山大学 Neural network model optimization method and system based on ThLU function
CN112270343A (en) * 2020-10-16 2021-01-26 苏州浪潮智能科技有限公司 Image classification method and device and related components

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107358257A (en) * 2017-07-07 2017-11-17 华南理工大学 Under a kind of big data scene can incremental learning image classification training method
CN109753983A (en) * 2017-11-07 2019-05-14 北京京东尚科信息技术有限公司 Image classification method, device and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DJORK-ARNÉ CLEVERT et al.: "FAST AND ACCURATE DEEP NETWORK LEARNING BY EXPONENTIAL LINEAR UNITS (ELUS)", ICLR 2016 *
SAURABH SINGH et al.: "Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks", arXiv *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022077894A1 (en) * 2020-10-16 2022-04-21 苏州浪潮智能科技有限公司 Image classification and apparatus, and related components
CN114708460A (en) * 2022-04-12 2022-07-05 济南博观智能科技有限公司 Image classification method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022077894A1 (en) 2022-04-21

Similar Documents

Publication Publication Date Title
US11373087B2 (en) Method and apparatus for generating fixed-point type neural network
US11657254B2 (en) Computation method and device used in a convolutional neural network
US10789734B2 (en) Method and device for data quantization
JP6724869B2 (en) Method for adjusting output level of neurons in multilayer neural network
CN112949678B (en) Deep learning model countermeasure sample generation method, system, equipment and storage medium
Yan et al. An adaptive surrogate modeling based on deep neural networks for large-scale Bayesian inverse problems
CN112488104A (en) Depth and confidence estimation system
CN112270343A (en) Image classification method and device and related components
KR20200144398A (en) Apparatus for performing class incremental learning and operation method thereof
JP3979007B2 (en) Pattern identification method and apparatus
WO2018235449A1 (en) Artificial neural network circuit training method, training program, and training device
CN116245015A (en) Data change trend prediction method and system based on deep learning
CN111105017A (en) Neural network quantization method and device and electronic equipment
KR20200129458A (en) A computing device for training an artificial neural network model, a method for training an artificial neural network model, and a memory system for storing the same
CN111144560B (en) Deep neural network operation method and device
US20230025626A1 (en) Method and apparatus for generating process simulation models
CN114137967B (en) Driving behavior decision method based on multi-network joint learning
CN107967691B (en) Visual mileage calculation method and device
CN111612816B (en) Method, device, equipment and computer storage medium for tracking moving target
Kawashima et al. The aleatoric uncertainty estimation using a separate formulation with virtual residuals
CN116630697B (en) Image classification method based on biased selection pooling
Hameed A dynamic annealing learning for PLSOM neural networks: Applications in medicine and applied sciences
CN116205138B (en) Wind speed forecast correction method and device
CN111105020B (en) Feature representation migration learning method and related device
KR102539876B1 (en) Layer optimization system for 3d rram device using artificial intelligence technology and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20210126