CN112270343A - Image classification method and device and related components - Google Patents
- Publication number
- CN112270343A CN112270343A CN202011110384.4A CN202011110384A CN112270343A CN 112270343 A CN112270343 A CN 112270343A CN 202011110384 A CN202011110384 A CN 202011110384A CN 112270343 A CN112270343 A CN 112270343A
- Authority
- CN
- China
- Prior art keywords
- image classification
- data
- mean
- batch
- target data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The application discloses an image classification method, an image classification device, an electronic device and a computer-readable storage medium. The image classification method comprises: constructing a TELU activation function from the function corresponding to the ELU negative half axis and the TLU function; performing convolution calculation on input image data to obtain feature data; acquiring, according to the feature data, target data after mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network; obtaining a feature map corresponding to the target data through the TELU activation function; and obtaining an image classification result from all the feature maps. The application solves the bias problem in which FRN's lack of mean centering drives activations away from zero, as well as the neuron death caused by vanishing gradients on the negative input half-axis; the TELU function has a soft saturation characteristic for small input values, which reduces the variation and information propagated forward, improves the overall performance of the neural network, and makes the image classification results more accurate.
Description
Technical Field
The present application relates to the field of image classification, and in particular, to an image classification method, an image classification device, and related components.
Background
BN (Batch Normalization) is a milestone technique in deep learning: it makes a wide range of networks trainable and has greatly advanced the field of computer vision. BN normalizes features using the mean and variance computed within a (mini-)batch, which simplifies network optimization and allows deeper neural networks to converge during training. However, normalizing along the batch dimension has many problems: when the batch size becomes small, the batch statistics are estimated inaccurately and the error rate of network models for tasks such as image classification rises sharply, which limits the use of BN when training larger models.
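As an illustration of this batch dependence, a minimal sketch (not part of the patent; the function name is hypothetical) computes batch normalization statistics over a one-dimensional batch and shows that the normalized value of the same activation changes when the batch shrinks:

```python
import math

def batch_norm_1d(batch, eps=1e-5):
    """Normalize a list of scalar activations using the batch mean and variance."""
    n = len(batch)
    mean = sum(x for x in batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

# The statistics depend on the whole batch: shrinking the batch changes
# the normalized value of the very same input activation.
full = batch_norm_1d([1.0, 2.0, 3.0, 4.0])
tiny = batch_norm_1d([1.0, 2.0])
```

The first activation (1.0) normalizes to roughly -1.34 in the larger batch but roughly -1.0 in the smaller one, which is exactly the small-batch estimation problem described above.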
In order to solve the above technical problem, the prior art adopts a TLU-based FRN normalization scheme. FRN has no batch dependency: it operates independently on each activation channel (filter response) of each sample, and its accuracy is stable and consistent across batch sizes. However, TLU is an improvement built on ReLU and therefore inherits some of ReLU's drawbacks. Because the output of ReLU has no negative values, its output mean is greater than 0; when the mean of the activation values is not 0, a bias is passed to the next layer, and if the activations do not cancel each other out (i.e., the mean is not 0), a bias shift is induced in the activation units of the next layer. As these effects stack, the more units there are, the larger the bias shift becomes, so ultra-deep networks may fail to converge when trained with ReLU. Moreover, due to inherent characteristics of the ReLU activation function, the gradient of the negative input half-axis vanishes and the corresponding weights cannot be updated: the derivative of ReLU is 0 for negative inputs, and this vanishing gradient causes the neuron death problem, which degrades the overall performance of the neural network and lowers image classification accuracy.
Therefore, how to provide a solution to the above technical problem is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide an image classification method, an image classification device, an electronic device and a computer-readable storage medium, which can solve the bias problem in which FRN's lack of mean centering drives activations away from zero, as well as the neuron death caused by vanishing gradients on the negative input half-axis; the method has a soft saturation characteristic for small input values, reduces the variation and information propagated forward, improves the overall performance of the neural network, and makes the image classification results more accurate.
In order to solve the above technical problem, the present application provides an image classification method, including:
constructing a TELU activation function according to the function corresponding to the ELU negative half axis and the TLU function;
performing convolution calculation on input image data to obtain characteristic data;
acquiring target data after mean square normalization processing corresponding to a c channel and a b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and obtaining an image classification result according to all the feature maps.
Preferably, the TELU activation function is:
f(y_i) = y_i, if y_i > τ;
f(y_i) = α(e^(y_i − τ) − 1) + τ, if y_i ≤ τ;
or, alternatively, f(y_i) = ELU(y_i − τ) + τ;
where y_i is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
Preferably, the tensor of the feature data is represented as [B, W, H, C], where B is the mini-batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
Preferably, the process of obtaining the target data after the mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network according to the feature data specifically includes:
obtaining the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of the response of the filter;
and carrying out linear transformation on the average value to obtain target data.
Preferably, the process of respectively performing mean-square normalization processing on each vector to obtain the mean value of the filter response specifically includes:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is upsilon2=∑ixi 2/N;
Wherein X ═ Xb,;,;c∈RNVector of response to filter for the c channel, the b batch, xiIs the ith element in the x set, upsilon2Is a mean-square normalization of x,for the mean value, ε is a normal number, and N ═ W × H.
Preferably, the process of performing linear transformation on the mean value to obtain the target data includes:
carrying out linear transformation on the average value through a third relational expression to obtain target data, wherein the third relational expression is z = γx̂ + β;
Where γ is a trainable scaling factor and β is a trainable offset.
In order to solve the above technical problem, the present application further provides an image classification apparatus, including:
the construction module is used for constructing a TELU activation function according to the function corresponding to the ELU negative half axis and the TLU function;
the convolution calculation module is used for carrying out convolution calculation on the input image data to obtain characteristic data;
the normalization processing module is used for acquiring target data after mean square normalization processing corresponding to the c channel and the b batch in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
the activation module is used for obtaining a characteristic diagram corresponding to the target data through the TELU activation function;
and the classification module is used for obtaining an image classification result according to all the characteristic graphs.
Preferably, the TELU activation function is:
f(y_i) = y_i, if y_i > τ;
f(y_i) = α(e^(y_i − τ) − 1) + τ, if y_i ≤ τ;
or, alternatively, f(y_i) = ELU(y_i − τ) + τ;
where y_i is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
Preferably, the tensor of the feature data is represented as [B, W, H, C], where B is the mini-batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
Preferably, the normalization processing module includes:
the acquisition unit is used for acquiring the vector of the filter response of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of the filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
Preferably, the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is upsilon2=∑ixi 2/N;
Wherein X ═ Xb,;,;c∈RNVector of response to filter for the c channel, the b batch, xiIs the ith element in the x set, upsilon2Is a mean-square normalization of x,for the mean value, ε is a normal number, and N ═ W × H.
Preferably, the transformation unit is specifically configured to:
carrying out linear transformation on the average value through a third relational expression to obtain target data, wherein the third relational expression is z = γx̂ + β;
Where γ is a trainable scaling factor and β is a trainable offset.
In order to solve the above technical problem, the present application further provides an electronic device, including:
a memory for storing a computer program;
a processor for implementing the steps of the image classification method as claimed in any one of the above when said computer program is executed.
To solve the above technical problem, the present application further provides a computer-readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the image classification method according to any one of the above.
The application provides an image classification method. Because the newly constructed TELU function introduces the function corresponding to the ELU negative half axis, it can take negative values, which brings the unit activation mean closer to 0; this solves the bias problem in which FRN's lack of mean centering drives activations away from zero, and at the same time solves the neuron death problem caused by the vanishing gradient on the negative input half-axis, improving the expressive power of the activation function. Because the slope of the negative half-segment of the TELU function is small, the function has a soft saturation characteristic for small input values, which reduces the variation and information propagated forward, enhances robustness to noise, and improves the overall performance of the neural network, thereby making the image classification results more accurate. The application also provides an image classification device, an electronic device and a computer-readable storage medium, which have the same beneficial effects as the image classification method.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a flowchart illustrating steps of an image classification method according to the present application;
FIG. 2a is a diagram of an ELU function provided herein;
FIG. 2b is a TELU function curve provided herein;
fig. 3a is a schematic diagram of a ReLU activation function provided in the present application;
FIG. 3b is a schematic diagram of a ReLU (y- τ) activation function provided herein;
FIG. 3c is a schematic diagram of a max (y, τ) activation function provided herein;
FIG. 4 is a schematic diagram of an improved FRN layer normalization and activation process provided herein;
fig. 5 is a schematic structural diagram of an image classification apparatus provided in the present application;
fig. 6 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
The core of the application is to provide an image classification method, an image classification device, an electronic device and a computer-readable storage medium, which can solve the bias problem in which FRN's lack of mean centering drives activations away from zero, as well as the neuron death caused by vanishing gradients on the negative input half-axis; the method has a soft saturation characteristic for small input values, reduces the variation and information propagated forward, improves the overall performance of the neural network, and makes the image classification results more accurate.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating steps of an image classification method according to the present application, the image classification method including:
S101: constructing a TELU activation function according to the function corresponding to the ELU negative half axis and the TLU function;
First, please refer to fig. 2a, which shows the ELU function curve provided by the present application; its expression is:
f(x) = x, if x > 0; f(x) = α(e^x − 1), if x ≤ 0;
where α > 0. Because FRN lacks mean centering, activations may deviate arbitrarily from 0, and this deviation, combined with ReLU, reduces accuracy. A learnable threshold τ is therefore added to ReLU, defining TLU as z = max(y, τ). Since max(y, τ) = max(y − τ, 0) + τ = ReLU(y − τ) + τ, the effect of TLU activation is equivalent to placing a shared bias before and after ReLU. The normalization performance of TLU-based FRN is significantly better than that of other normalization methods, especially when the batch size is small. TLU is a slight improvement on ReLU and therefore keeps some of ReLU's advantages, such as fast convergence: convergence under the SGD algorithm is notably faster than with sigmoid and tanh, because ReLU avoids the vanishing gradient that sigmoid and tanh suffer far from 0. Its computational complexity is also low: no exponential operation is needed, and the activation value is obtained with a single threshold comparison. The activation functions ReLU, ReLU(y − τ) and max(y, τ) are shown in fig. 3a to fig. 3c, respectively.
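The identity max(y, τ) = ReLU(y − τ) + τ invoked here can be checked numerically; the following sketch (function names are illustrative, not from the patent) verifies it over a few values:

```python
def relu(y):
    """Standard rectified linear unit."""
    return max(y, 0.0)

def tlu(y, tau):
    """Thresholded linear unit: a learnable floor tau instead of 0."""
    return max(y, tau)

# TLU is exactly a ReLU with a shared bias tau applied before and after it.
for y in (-3.0, -0.5, 0.0, 0.7, 5.0):
    for tau in (-1.0, 0.0, 0.4):
        assert tlu(y, tau) == relu(y - tau) + tau
```

The loop passing for positive, negative and zero thresholds confirms that TLU is equivalent to a shifted ReLU, as stated above.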
Specifically, on the basis of the above scheme, the application introduces the function curve of the ELU negative half axis and improves TLU to obtain the TELU (Thresholded Exponential Linear Unit) function; its curve is shown in fig. 2b. TELU combines the advantages of TLU and ELU, and its expression is:
f(y_i) = y_i, if y_i > τ;
f(y_i) = α(e^(y_i − τ) − 1) + τ, if y_i ≤ τ;
or, alternatively, f(y_i) = ELU(y_i − τ) + τ;
where y_i is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
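A minimal sketch of this activation follows, assuming TELU takes the form ELU(y − τ) + τ, i.e. the TLU threshold with the ELU negative half-axis grafted below it (an assumption reconstructed from the combination described in the text; the function names are illustrative):

```python
import math

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, exponential saturation below 0."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def telu(y, tau=0.0, alpha=1.0):
    """Assumed TELU form: an ELU shifted so its knee sits at the
    learnable threshold tau, i.e. telu(y) = elu(y - tau) + tau."""
    return y if y > tau else alpha * (math.exp(y - tau) - 1.0) + tau

# Continuous at the threshold, negative-valued below it,
# and bounded below by tau - alpha (soft saturation).
```

Under this form the function is continuous at y = τ and approaches τ − α for very negative inputs, matching the soft-saturation behaviour described for fig. 2b.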
Accordingly, referring to fig. 4, fig. 4 is a schematic diagram of an improved FRN layer normalization and activation process provided by an embodiment of the present application.
S102: performing convolution calculation on input image data to obtain characteristic data;
specifically, convolution calculation is performed on the input image data to obtain feature data, where the tensor of the feature data is represented as [B, W, H, C]: B is the mini-batch size, C is the number of filters in the convolution, H is the height of the feature data, and W is the width of the feature data.
The input image data may be image data from a security surveillance video, image data acquired during autonomous driving, or image data from an online streaming video; the application field of the input image data is not specifically limited here.
S103: acquiring target data after mean square normalization processing corresponding to a channel c and a batch b in the current layer of the neural network according to the characteristic data, wherein b and c are positive integers;
specifically, let x = x_(b,:,:,c) ∈ R^N denote the filter response vector of the c-th channel and the b-th batch, where N = W × H. The vector x of each batch and each channel is subjected to mean square normalization according to the first relational expression ν² = Σ_i x_i² / N; the mean value of the filter response is then calculated by the second relational expression x̂ = x / √(ν² + ε); finally, the mean value is linearly transformed according to the third relational expression z = γx̂ + β to compensate for the loss of representational capability that normalization may cause. Here x_i is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant that prevents division-by-zero errors, γ is a trainable scaling factor, and β is a trainable offset.
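The three relational expressions of S103 can be sketched for a single channel as follows (a stdlib-only illustration, not the patent's implementation; names are hypothetical):

```python
import math

def frn_channel(x, gamma=1.0, beta=0.0, eps=1e-6):
    """Mean square normalization of one channel's filter responses
    (the N = W*H values of one sample), followed by the learned affine map."""
    n = len(x)
    nu2 = sum(v * v for v in x) / n                # first relation:  nu^2 = sum(x_i^2) / N
    xhat = [v / math.sqrt(nu2 + eps) for v in x]   # second relation: xhat = x / sqrt(nu^2 + eps)
    return [gamma * v + beta for v in xhat]        # third relation:  z = gamma * xhat + beta

z = frn_channel([3.0, 4.0])
```

With γ = 1 and β = 0 the output has unit mean square, which is the point of the normalization: no batch statistics are involved at all.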
S104: obtaining a characteristic diagram corresponding to the target data through a TELU activation function;
S105: obtaining an image classification result according to all the feature maps.
It can be understood that activating the target data with the TELU function yields the corresponding feature map; S101 to S104 complete the normalized activation of one layer of the neural network, and S101 to S104 are likewise executed for the other convolutional layers. All feature maps then undergo feature classification, and the output layer of the neural network outputs the image classification result.
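Putting S103 and S104 together for one channel, a minimal sketch might look like the following (illustrative names; the TELU form is assumed to be ELU(y − τ) + τ, reconstructed from the combination of TLU and ELU described earlier):

```python
import math

def frn_telu_layer(feature_map, gamma=1.0, beta=0.0, tau=0.0, alpha=1.0, eps=1e-6):
    """One channel of the improved FRN layer: mean square normalization,
    learned affine transform, then the assumed TELU activation (S103 + S104)."""
    flat = [v for row in feature_map for v in row]
    nu2 = sum(v * v for v in flat) / len(flat)     # nu^2 over the W*H responses
    def act(v):
        y = gamma * v / math.sqrt(nu2 + eps) + beta  # normalize + affine
        return y if y > tau else alpha * (math.exp(y - tau) - 1.0) + tau  # TELU
    return [[act(v) for v in row] for row in feature_map]

fmap = frn_telu_layer([[1.0, -2.0], [3.0, -4.0]])
```

Unlike a TLU layer with τ = 0, the negative responses come out as small negative values rather than being clipped to the threshold.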
Specifically, compared with ReLU, the TELU activation function formed by combining TLU with ELU inherits TLU's correction of the problem that FRN activations deviate from 0 for lack of mean centering, and additionally gains many advantages of ELU. TELU can take negative values and brings the unit activation mean closer to 0, similar to the effect of Batch Normalization but at lower computational cost. Since TELU adds treatment for the zero-gradient region, it solves the neuron death problem and speeds up training; evidence suggests that a mean activation close to 0 makes training faster. Moreover, the slope of the negative half-segment of TELU is small, so the function has a soft saturation characteristic for small input values, which reduces the variation and information propagated forward, enhances robustness to noise, and further improves the accuracy of the whole network, making the image classification results more accurate. As can be seen from fig. 2a, α is a tunable parameter controlling when the negative part of the ELU saturates; the TELU activation function has a pronounced saturation plateau in its negative region, enabling it to learn a more robust and stable representation.
Specifically, the improved FRN layer gradient calculation process is as follows:
Because all transformations of x̂ are performed along the channel dimension, this embodiment only derives the gradients per channel. Consider a certain layer of the neural network: according to the second relational expression, the third relational expression y = γx̂ + β and the TELU function z = TELU(y), x is sent into the FRN layer for computation and its output is z (see fig. 4). Let f(z) be the mapping that the rest of the neural network applies to z, with backward gradient ∂f/∂z. The parameters γ, β and τ are per-channel vectors; then:
the gradient of the update to τ is:
∂f/∂τ = Σ_i (∂f/∂z_i)(∂z_i/∂τ), where ∂z_i/∂τ = 0 if y_i > τ and ∂z_i/∂τ = 1 − α e^(y_i − τ) if y_i ≤ τ;
the gradient updated to y is:
∂z_i/∂y_i = 1 if y_i > τ and ∂z_i/∂y_i = α e^(y_i − τ) if y_i ≤ τ;
where z_b is the per-channel activation vector of the b-th batch.
the gradient for γ is as follows:
∂f/∂γ = Σ_i (∂f/∂z_i)(∂z_i/∂y_i) x̂_i;
the gradient for β is as follows:
∂f/∂β = Σ_i (∂f/∂z_i)(∂z_i/∂y_i).
It can be seen from the above relations that when y_i ≤ τ the gradient of TELU is not 0, and by backward derivation the gradient reaching the input is likewise not 0. This solves the neuron death problem caused by the vanishing gradient on the negative input half-axis of TLU. Because the slope of the negative half-segment is small, the function has a soft saturation characteristic for small input values, reducing the variation and information propagated forward and enhancing robustness to noise. In addition, α serves as a tunable parameter controlling when the ELU negative part saturates, further enhancing the flexibility of gradient control in that region.
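A finite-difference check makes the contrast with TLU concrete (illustrative code, again assuming the ELU(y − τ) + τ form of TELU): below the threshold, TLU's numerical gradient is zero while TELU's is not.

```python
import math

def telu(y, tau=0.0, alpha=1.0):
    """Assumed TELU form: ELU shifted so its knee sits at tau."""
    return y if y > tau else alpha * (math.exp(y - tau) - 1.0) + tau

def tlu(y, tau=0.0):
    """TLU: hard floor at tau, zero gradient below it."""
    return max(y, tau)

def num_grad(f, y, h=1e-6):
    """Central-difference estimate of df/dy."""
    return (f(y + h) - f(y - h)) / (2.0 * h)

# At y = -2 (below the threshold tau = 0):
g_tlu = num_grad(tlu, -2.0)    # clipped region: gradient vanishes
g_telu = num_grad(telu, -2.0)  # exponential tail: gradient stays positive
```

The TELU gradient at y = −2 matches the analytic value α·e^(y−τ) = e^(−2), confirming that inputs in the saturated region still receive a learning signal.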
Referring to fig. 5, fig. 5 is a schematic structural diagram of an image classification apparatus provided in the present application, the image classification apparatus including:
the construction module 11 is used for constructing a TELU activation function according to a function corresponding to the ELU negative half shaft and the TLU function;
a convolution calculation module 12, configured to perform convolution calculation on input image data to obtain feature data;
the normalization processing module 13 is configured to obtain target data after mean square normalization processing corresponding to a c-th channel and a b-th batch in a current layer of the neural network according to the feature data, where b and c are positive integers;
the activation module 14 is configured to obtain a feature map corresponding to the target data through a TELU activation function;
and the classification module 15 is used for obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the ELU negative half axis, so it can take negative values and brings the unit activation mean closer to 0. This solves the bias problem in which FRN's lack of mean centering drives activations away from zero, and at the same time solves the neuron death problem caused by the vanishing gradient on the negative input half-axis, increasing the expressive power of the activation function. Because the slope of the negative half-segment of the TELU function is small, the function has a soft saturation characteristic for small input values, which reduces the variation and information propagated forward, enhances robustness to noise, and improves the overall performance of the neural network, thereby making the image classification results more accurate.
As a preferred embodiment, the TELU activation function is:
f(y_i) = y_i, if y_i > τ;
f(y_i) = α(e^(y_i − τ) − 1) + τ, if y_i ≤ τ;
or, alternatively, f(y_i) = ELU(y_i − τ) + τ;
where y_i is the input image data, τ is a learnable threshold, and α is an adjustable parameter.
As a preferred embodiment, the tensor of the feature data is represented as [B, W, H, C], where B is the mini-batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
As a preferred embodiment, the normalization processing module 13 includes:
the acquisition unit is used for acquiring the filter response vectors of the c channel and the b batch in the current layer of the neural network according to the characteristic data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
As a preferred embodiment, the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relational expression is ν² = Σ_i x_i² / N, and the second relational expression is x̂ = x / √(ν² + ε);
wherein x = x_(b,:,:,c) ∈ R^N is the filter response vector of the c-th channel and the b-th batch, x_i is the i-th element of x, ν² is the mean square norm of x, x̂ is the mean value, ε is a small positive constant, and N = W × H.
As a preferred embodiment, the transformation unit is specifically configured to:
carrying out linear transformation on the mean value through a third relational expression to obtain target data, wherein the third relational expression is z = γx̂ + β;
Where γ is a trainable scaling factor and β is a trainable offset.
On the other hand, the present application further provides an electronic device, as shown in fig. 6, which shows a schematic structural diagram of an electronic device according to an embodiment of the present application, where the electronic device according to the embodiment may include: a processor 21 and a memory 22.
Optionally, the electronic device may further comprise a communication interface 23, an input unit 24 and a display 25 and a communication bus 26.
The processor 21, the memory 22, the communication interface 23, the input unit 24 and the display 25 are all communicated with each other through a communication bus 26.
In the embodiment of the present application, the processor 21 may be a central processing unit (CPU), an application-specific integrated circuit, a digital signal processor, a field-programmable gate array (FPGA) or another programmable logic device, etc.
The processor may call a program stored in the memory 22. Specifically, the processor may perform operations performed on the electronic device side in the embodiments of the image classification method described below.
The memory 22 is used for storing one or more programs, which may include program code comprising computer operation instructions. In the embodiment of the present application, the memory stores at least a program for realizing the following functions:
constructing a TELU activation function according to the function corresponding to the negative half-axis of the ELU and the TLU function;
performing convolution calculation on input image data to obtain feature data;
acquiring, according to the feature data, target data after mean-square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network, wherein b and c are positive integers;
obtaining a feature map corresponding to the target data through a TELU activation function;
and obtaining an image classification result according to all the feature maps.
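The five steps above can be wired together in a single schematic forward pass. In the sketch below, the image size, the 3×3 kernels, the linear classification head, and the naive convolution loops are all illustrative assumptions rather than details from the application, and the activation is left pluggable (a ReLU stand-in here, where the application would use its TELU function):

```python
import numpy as np

def forward(image, kernels, W_cls, activation, eps=1e-6):
    # Step 2: convolution (valid padding, stride 1) -> feature data [H', W', C]
    kH, kW, _, C = kernels.shape
    H, W = image.shape
    out = np.zeros((H - kH + 1, W - kW + 1, C))
    for c in range(C):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, c] = np.sum(image[i:i + kH, j:j + kW] * kernels[:, :, 0, c])
    # Step 3: mean-square normalization, one nu^2 per channel over N = H' * W'
    nu2 = np.mean(out ** 2, axis=(0, 1), keepdims=True)
    out = out / np.sqrt(nu2 + eps)
    # Step 4: activation -> one feature map per filter
    out = activation(out)
    # Step 5: global average pooling + linear classifier -> class scores
    pooled = out.mean(axis=(0, 1))   # one value per feature map
    return W_cls @ pooled

rng = np.random.default_rng(0)
scores = forward(rng.normal(size=(8, 8)),        # single-channel input image
                 rng.normal(size=(3, 3, 1, 4)),  # four 3x3 filters
                 rng.normal(size=(10, 4)),       # 10-class linear head
                 activation=lambda y: np.maximum(y, 0.0))  # ReLU stand-in
```

The result is one score per class; an argmax over `scores` would give the predicted class label.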
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the negative half-axis of the ELU, so the activation can take negative values and the unit activation mean is pulled closer to 0. This alleviates the deviation of activations away from zero caused by FRN's lack of mean centering, and mitigates the neuron death caused by gradient disappearance on the negative input part, thereby increasing the expressive ability of the activation function. Moreover, because the slope of the negative segment of the TELU function is small, the function exhibits soft saturation when the input takes small values, which reduces the variation and information propagated forward, enhances robustness to noise, and improves the overall performance of the neural network, making the image classification result more accurate.
In one possible implementation, the memory 22 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a mean-square normalization calculation function), and the like; the data storage area may store data created according to the use of the computer.
Further, the memory 22 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device or other non-volatile solid-state storage device.
The communication interface 23 may be an interface of a communication module, such as an interface of a GSM module.
The electronic device may also include the input unit 24 and the display 25, etc.
Of course, the structure of the electronic device shown in fig. 6 does not constitute a limitation on the electronic device in the embodiments of the present application; in practical applications, the electronic device may include more or fewer components than those shown in fig. 6, or combine some components.
In another aspect, embodiments of the present application also disclose a computer-readable storage medium, which may include random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of:
constructing a TELU activation function according to the function corresponding to the negative half-axis of the ELU and the TLU function;
performing convolution calculation on input image data to obtain feature data;
acquiring, according to the feature data, target data after mean-square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network, wherein b and c are positive integers;
obtaining a feature map corresponding to the target data through a TELU activation function;
and obtaining an image classification result according to all the feature maps.
It can be seen that the newly constructed TELU function in this embodiment introduces the function corresponding to the negative half-axis of the ELU, so the activation can take negative values and the unit activation mean is pulled closer to 0. This alleviates the deviation of activations away from zero caused by FRN's lack of mean centering, and mitigates the neuron death caused by gradient disappearance on the negative input part, thereby increasing the expressive ability of the activation function. Moreover, because the slope of the negative segment of the TELU function is small, the function exhibits soft saturation when the input takes small values, which reduces the variation and information propagated forward, enhances robustness to noise, and improves the overall performance of the neural network, making the image classification result more accurate.
In some specific embodiments, the TELU activation function is:
or, alternatively,
where yi is the input image data, τ is a learning threshold, and α is an adjustable parameter.
In some specific embodiments, the tensor of the feature data is represented as [B, W, H, C], where B is the mini-batch size, C is the number of filters in the convolution, W is the width of the feature data, and H is the height of the feature data.
In some specific embodiments, the computer program stored in the computer-readable storage medium, when executed by the processor, may specifically implement the following steps:
obtaining the filter response vectors of the c-th channel and the b-th batch in the current layer of the neural network according to the feature data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of filter response;
and carrying out linear transformation on the average value to obtain target data.
In some specific embodiments, the process of respectively performing the mean-square normalization processing on each vector to obtain the average value of the filter response specifically includes:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ²/N, and the second relation is x̂ = x/√(ν² + ε);
wherein x = X_{b,:,:,c} ∈ R^N is the vector of the filter response for the c-th channel and the b-th batch, xᵢ is the i-th element of x, ν² is the mean square norm of x, x̂ is the average value (normalized response) of the filter, ε is a small positive constant, and N = W × H.
In some specific embodiments, the process of obtaining the target data by performing linear transformation on the mean value includes:
carrying out linear transformation on the mean value through a third relational expression to obtain target data, wherein the third relational expression is y = γx̂ + β,
where γ is a trainable scaling factor and β is a trainable offset.
It is further noted that, in the present specification, relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (14)
1. An image classification method, comprising:
constructing a TELU activation function according to the function corresponding to the negative half-axis of the ELU and the TLU function;
performing convolution calculation on input image data to obtain feature data;
acquiring, according to the feature data, target data after mean-square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network, wherein b and c are positive integers;
obtaining a feature map corresponding to the target data through the TELU activation function;
and obtaining an image classification result according to all the feature maps.
3. The image classification method according to claim 1, wherein the tensor of the feature data is represented by [ B, W, H, C ], B is mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
4. The image classification method according to claim 3, wherein the process of obtaining the target data after the mean square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network according to the feature data specifically includes:
obtaining the filter response vectors of the c-th channel and the b-th batch in the current layer of the neural network according to the feature data;
respectively carrying out mean square normalization processing on each vector to obtain a mean value of the response of the filter;
and carrying out linear transformation on the average value to obtain target data.
5. The image classification method according to claim 4, wherein the process of respectively performing mean-square normalization processing on each vector to obtain the mean value of the filter response specifically comprises:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ²/N;
6. The image classification method according to claim 5, wherein the process of linearly transforming the mean value to obtain the target data comprises:
carrying out linear transformation on the average value through a third relational expression to obtain target data, wherein the third relational expression is y = γx̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
7. An image classification apparatus, comprising:
the construction module is used for constructing a TELU activation function according to the function corresponding to the negative half-axis of the ELU and the TLU function;
the convolution calculation module is used for performing convolution calculation on the input image data to obtain feature data;
the normalization processing module is used for acquiring, according to the feature data, target data after mean-square normalization processing corresponding to the c-th channel and the b-th batch in the current layer of the neural network, wherein b and c are positive integers;
the activation module is used for obtaining a feature map corresponding to the target data through the TELU activation function;
and the classification module is used for obtaining an image classification result according to all the feature maps.
9. The image classification apparatus according to claim 7, wherein the tensor of the feature data is represented by [ B, W, H, C ], B is a mini batch size, C is the number of filters in convolution, W is the width of the feature data, and H is the height of the feature data.
10. The image classification device according to claim 9, wherein the normalization processing module includes:
the acquisition unit is used for acquiring the vector of the filter response of the c-th channel and the b-th batch in the current layer of the neural network according to the feature data;
the processing unit is used for respectively carrying out mean square normalization processing on each vector to obtain a mean value of the filter response;
and the transformation unit is used for carrying out linear transformation on the average value to obtain target data.
11. The image classification device according to claim 10, wherein the processing unit is specifically configured to:
respectively carrying out mean square normalization processing on each vector through a first relational expression;
obtaining the average value of the filter response through a second relational expression;
the first relation is ν² = Σᵢ xᵢ²/N;
12. The image classification device according to claim 11, wherein the transformation unit is specifically configured to:
carrying out linear transformation on the average value through a third relational expression to obtain target data, wherein the third relational expression is y = γx̂ + β, where γ is a trainable scaling factor and β is a trainable offset.
13. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the image classification method according to any one of claims 1 to 6 when executing said computer program.
14. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the image classification method according to any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011110384.4A CN112270343A (en) | 2020-10-16 | 2020-10-16 | Image classification method and device and related components |
PCT/CN2021/089922 WO2022077894A1 (en) | 2020-10-16 | 2021-04-26 | Image classification and apparatus, and related components |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011110384.4A CN112270343A (en) | 2020-10-16 | 2020-10-16 | Image classification method and device and related components |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112270343A true CN112270343A (en) | 2021-01-26 |
Family
ID=74337398
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011110384.4A Pending CN112270343A (en) | 2020-10-16 | 2020-10-16 | Image classification method and device and related components |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112270343A (en) |
WO (1) | WO2022077894A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022077894A1 (en) * | 2020-10-16 | 2022-04-21 | 苏州浪潮智能科技有限公司 | Image classification and apparatus, and related components |
CN114708460A (en) * | 2022-04-12 | 2022-07-05 | 济南博观智能科技有限公司 | Image classification method, system, electronic equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107358257A (en) * | 2017-07-07 | 2017-11-17 | 华南理工大学 | Under a kind of big data scene can incremental learning image classification training method |
CN109753983A (en) * | 2017-11-07 | 2019-05-14 | 北京京东尚科信息技术有限公司 | Image classification method, device and computer readable storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10290085B2 (en) * | 2016-12-14 | 2019-05-14 | Adobe Inc. | Image hole filling that accounts for global structure and local texture |
CN110569965A (en) * | 2019-08-27 | 2019-12-13 | 中山大学 | Neural network model optimization method and system based on ThLU function |
CN112270343A (en) * | 2020-10-16 | 2021-01-26 | 苏州浪潮智能科技有限公司 | Image classification method and device and related components |
Non-Patent Citations (2)
Title |
---|
Djork-Arné Clevert et al.: "Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs)", ICLR 2016 |
Saurabh Singh et al.: "Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks", arXiv |
Also Published As
Publication number | Publication date |
---|---|
WO2022077894A1 (en) | 2022-04-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210126