WO2023092938A1 - 一种图像识别方法、装置、设备及介质 - Google Patents

一种图像识别方法、装置、设备及介质 Download PDF

Info

Publication number
WO2023092938A1
Authority
WO
WIPO (PCT)
Prior art keywords
function
neural network
activation function
network model
preset
Prior art date
Application number
PCT/CN2022/089350
Other languages
English (en)
French (fr)
Inventor
陈静静
吴睿振
黄萍
王凛
Original Assignee
苏州浪潮智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Publication of WO2023092938A1 publication Critical patent/WO2023092938A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to an image recognition method, device, equipment and medium.
  • the activation function is a function added to the artificial neural network to help the network learn complex patterns in the data.
  • for the commonly used activation functions sigmoid and tanh (i.e., the hyperbolic tangent function), the gradients at both ends gradually approach zero, so as the network depth increases the computed magnitudes become smaller and smaller and the gradient eventually vanishes, which affects the convergence speed of the model and the accuracy of image recognition.
  • an image recognition method including:
  • the image training sample data set includes image training sample data and label information corresponding to the image training sample data;
  • the preset bias adjustment function is a function that combines a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship;
  • the trained neural network model is used to output a recognition result corresponding to the image to be recognized.
  • constructing the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship includes:
  • the basic activation function, the preset bias adjustment function, and the preset linear function are constructed as the activation function in an additive relationship to obtain the preset activation function;
  • the preset linear function includes a second trainable parameter.
  • the basic activation function, the preset bias adjustment function, and the preset linear function are constructed as the activation function in an additive relationship to obtain the preset activation function, including:
  • according to a trainable weight parameter, the basic activation function, the preset bias adjustment function, and the preset linear function are constructed as an activation function in an additive relationship to obtain the preset activation function.
  • the preset activation function is:
  • φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
  • where h(x) is the basic activation function, u(x) is the preset linear function, η(x) is the preset bias adjustment function, and α is the trainable weight parameter, with
  • u(x)=b*x+c
  • η(x)=sign(x)*a*x²
  • where b and c are the second trainable parameters, and a is the first trainable parameter.
  • the basic activation function is a hyperbolic tangent function or a sigmoid function.
  • determining the preset activation function as the activation function of the neural network model to obtain the initial neural network model includes:
  • determining the preset activation function as the activation function of a recurrent neural network model to obtain the initial neural network model.
  • after inputting the image training sample data set into the initial neural network model for training until the model converges and obtaining the trained neural network model, the method further includes:
  • acquiring a test data set; inputting the test data set into the trained neural network model to obtain test results corresponding to the test data set; and using the test results to evaluate the accuracy of the trained neural network model.
  • an image recognition device including:
  • a training sample data acquisition module configured to acquire an image training sample data set; wherein, the image training sample data set includes image training sample data and label information corresponding to the image training sample data;
  • an initial neural network model acquisition module configured to construct the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship, and to determine the preset activation function as the activation function of the neural network model to obtain the initial neural network model; wherein the preset bias adjustment function is a function that combines a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship;
  • a neural network model training module configured to input the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model
  • the image recognition module is configured to use the trained neural network model to output a recognition result corresponding to the image to be recognized when the image to be recognized is acquired.
  • the embodiment of the present application also provides an electronic device, including a memory and one or more processors, where computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors are caused to execute the steps of any one of the above image recognition methods.
  • the embodiment of the present application further provides one or more non-volatile computer-readable storage media storing computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of any one of the image recognition methods described above.
  • Fig. 1 is a flow chart of an image recognition method provided by one or more embodiments of the present application
  • FIG. 2 is an example diagram of a MNIST data set provided by one or more embodiments of the present application
  • Fig. 3 is a schematic diagram of a recurrent neural network structure provided by one or more embodiments of the present application.
  • FIG. 4 is a comparison diagram for training based on a recurrent neural network model provided by one or more embodiments of the present application using the tanh function and the preset activation function provided by the scheme of the present application;
  • Fig. 5 is a schematic structural diagram of an image recognition device provided by one or more embodiments of the present application.
  • FIG. 6 is a structural diagram of an electronic device provided by one or more embodiments of the present application.
  • Fig. 7 is a schematic diagram of an activation function provided by one or more embodiments of the present application.
  • Fig. 8 is a Sigmoid function curve diagram provided by one or more embodiments of the present application.
  • FIG. 9 is a graph of the derivative of the Sigmoid function provided by one or more embodiments of the present application.
  • Figure 10 is a graph of the tanh function provided by one or more embodiments of the present application.
  • FIG. 11 is a graph of the derivative of the tanh function provided by one or more embodiments of the present application.
  • Fig. 12 is a schematic diagram of an activation function construction in the prior art.
  • the activation function is a function added to the artificial neural network to help the network learn complex patterns in the data.
  • for the commonly used activation functions sigmoid and tanh (i.e., the hyperbolic tangent function), the gradients at both ends gradually approach zero, so as the network depth increases the computed magnitudes become smaller and smaller and the gradient eventually vanishes, which affects the convergence speed of the model and the accuracy of image recognition.
  • to this end, this application provides an image recognition scheme that can avoid gradient vanishing, thereby improving the convergence speed of the model and the accuracy of image recognition.
  • an image recognition method including:
  • Step S11 Acquire an image training sample data set; wherein, the image training sample data set includes image training sample data and label information corresponding to the image training sample data.
  • in a specific implementation, the MNIST data set can be obtained; a part of the data is used as the image training sample data set and another part as the test set.
  • of course, in some other embodiments, other data sets may be used as the training set.
  • the MNIST data set is a classic small-scale image classification data set. It collects handwritten digit pictures from 250 different people, 50% of whom were high school students and 50% staff of the Census Bureau. The purpose of collecting the data set is to enable recognition of handwritten digits through algorithms.
  • MNIST contains 70,000 handwritten digit pictures; each picture is composed of 28 x 28 pixels, and each pixel is represented by a gray value. In the embodiment of the present application, 60,000 samples may be used as the training data set and 10,000 samples as the test data set. Each sample has a corresponding label, represented by a single decimal number, indicating the category of the picture.
  • this data set is widely used in the fields of machine learning and deep learning to test the effect of algorithms, such as linear classifiers, K-nearest neighbors, support vector machines (SVMs), neural networks, convolutional networks, and so on.
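  • For illustration only, one common way to obtain such a training/testing split in code is sketched below; the use of torchvision and the data directory name are assumptions not taken from this application.

```python
from torchvision import datasets, transforms

# 60,000 training samples and 10,000 test samples of 28 x 28 grayscale digits,
# each labelled with a single decimal digit
train_set = datasets.MNIST(root='./data', train=True, download=True,
                           transform=transforms.ToTensor())
test_set = datasets.MNIST(root='./data', train=False, download=True,
                          transform=transforms.ToTensor())
```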
  • FIG. 2 is an example diagram of an MNIST data set provided in an embodiment of the present application.
  • Step S12 Construct the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship, and determine the preset activation function as the activation function of the neural network model to obtain the initial neural network model; wherein, the The preset bias adjustment function is a function that constructs a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship.
  • in a specific implementation, the basic activation function, the preset bias adjustment function, and the preset linear function can be constructed as an activation function in an additive relationship to obtain the preset activation function; wherein the preset linear function includes a second trainable parameter.
  • further, in the embodiment of the present application, the basic activation function, the preset bias adjustment function, and the preset linear function can be constructed as an activation function in an additive relationship according to the trainable weight parameter to obtain the preset activation function.
  • the preset activation function may be:
  • φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
  • where h(x) is the basic activation function, u(x) is the preset linear function, η(x) is the preset bias adjustment function, and α is the trainable weight parameter, with
  • u(x)=b*x+c
  • η(x)=sign(x)*a*x²
  • where b and c are the second trainable parameters, and a is the first trainable parameter.
  • it can be understood that taking the derivative of u(x) gives u′(x)=b, so u(x) is used to shift the basic activation function so that the gradient is largest where the data distribution is dense; this yields an activation function with trainable parameters that can be trained according to the model, the task, and the data distribution, thereby improving the convergence speed and accuracy of the model. In addition, taking the derivative of η(x) gives η′(x)=2*sign(x)*a*x, which adds to the gradient a bias proportional to the value of x; when x tends to either end, this effectively avoids gradient vanishing.
  • the basic activation function is a hyperbolic tangent function or a sigmoid function.
  • the existing general activation function has a fixed functional form and its parameters are fixed and cannot be trained.
  • in the embodiment of the present application, the activation function is constructed as a function with a fixed structure, but its parameters, like the neuron parameters, can be trained according to the model, the task, and the data distribution. Therefore, on the basis of the original general activation function, the embodiment of this application proposes an activation function with trainable parameters that can be trained according to the model, the task, and the data distribution, and the gradient vanishing problem is taken into account in the construction process.
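  • As an illustration, a minimal PyTorch sketch of such a trainable activation is given below; the module name and the initial parameter values are assumptions, while the formula follows φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)] as defined above.

```python
import torch
import torch.nn as nn

class TrainableActivation(nn.Module):
    """phi(x) = alpha*h(x) + (1-alpha)*[u(x) + eta(x)], with
    u(x) = b*x + c and eta(x) = sign(x)*a*x**2; all parameters are trainable."""
    def __init__(self, base=torch.tanh):
        super().__init__()
        self.base = base                               # basic activation h(x): tanh or sigmoid
        self.alpha = nn.Parameter(torch.tensor(0.5))   # trainable weight parameter (initial value assumed)
        self.a = nn.Parameter(torch.tensor(0.01))      # first trainable parameter (initial value assumed)
        self.b = nn.Parameter(torch.tensor(1.0))       # second trainable parameters (initial values assumed)
        self.c = nn.Parameter(torch.tensor(0.0))

    def forward(self, x):
        u = self.b * x + self.c                        # preset linear function u(x)
        eta = torch.sign(x) * self.a * x ** 2          # preset bias adjustment function eta(x)
        return self.alpha * self.base(x) + (1 - self.alpha) * (u + eta)
```

  • In use, an instance of this module would simply replace the tanh call inside the recurrent cell, which is the substitution the embodiment describes for the classic RNN.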
  • the embodiment of the present application may determine the preset activation function as the activation function of the recurrent neural network model to obtain the initial neural network model.
  • FIG. 3 is a schematic structural diagram of a recurrent neural network provided by an embodiment of the present application.
  • the left side of the equals sign is a schematic diagram of the recurrent neural network model not expanded along the time series
  • the right side of the equals sign is a schematic diagram expanded along the time series.
  • Figure 3 describes the RNN model near the time series index number t.
  • x_t represents the input of the training sample at sequence index number t.
  • similarly, x_{t-1} and x_{t+1} represent the inputs of the training samples at sequence index numbers t-1 and t+1.
  • h_t represents the hidden state of the model at sequence index number t.
  • h_t is jointly determined by x_t and h_{t-1}.
  • o_t represents the output of the model at sequence index number t.
  • o_t is determined only by the model's current hidden state h_t.
  • the following is a strict mathematical definition of the classic RNN structure:
  • given inputs x_1, x_2, ..., x_t, the corresponding hidden states h_1, h_2, ..., h_t, and outputs y_1, y_2, ..., y_t, the operation of the classic RNN can be expressed as:
  • h_t = f(U*x_t + W*h_{t-1} + b)
  • y_t = softmax(V*h_t + c)
  • where U, W, V, b, and c are all parameters, and f(·) represents the activation function, which is generally the tanh function.
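  • For reference, one step of this recurrence could be sketched as follows; the function name and the convention that U, W, V, b, c are supplied by the caller are assumptions.

```python
import torch

def rnn_step(x_t, h_prev, U, W, V, b, c, f=torch.tanh):
    """One step of the classic RNN: h_t = f(U*x_t + W*h_{t-1} + b), y_t = softmax(V*h_t + c)."""
    h_t = f(U @ x_t + W @ h_prev + b)            # new hidden state from current input and previous state
    y_t = torch.softmax(V @ h_t + c, dim=-1)     # output determined only by the current hidden state
    return h_t, y_t
```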
  • the embodiment of the present application takes the classic RNN as an example, replaces the activation function in the classic RNN with the preset activation function provided by the present application, and uses the RNN to realize MNIST handwriting classification.
  • the network structure is as follows:
  • input: 28*28;
  • the first layer: RNN(100, activation='tanh', return_sequences=True);
  • the second layer: RNN(200, activation='tanh', return_sequences=True);
  • the third layer: RNN(50, activation='tanh');
  • the fourth layer: Dense(100);
  • the fifth layer: Dense(10);
  • the sixth layer: softmax.
  • loss function: the cross-entropy loss function, torch.nn.CrossEntropyLoss, which describes the distance between the actual output ŷ and the expected output y, where n is the batchsize (batch size) and i denotes the i-th sample: loss = -(1/n)*Σ_i y_i*log(ŷ_i).
  • the optimizer chooses Adam.
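  • A rough PyTorch sketch of this stack is given below for illustration; the layer sizes follow the structure listed above, while the batch handling and the use of the last hidden state are assumptions. The final softmax is folded into torch.nn.CrossEntropyLoss, which expects raw logits.

```python
import torch
import torch.nn as nn

class MnistRNN(nn.Module):
    """Sketch of the described stack: three RNN layers (100, 200, 50 units),
    two Dense layers (100, 10); each 28x28 image is fed as a sequence of 28 rows."""
    def __init__(self):
        super().__init__()
        self.rnn1 = nn.RNN(28, 100, nonlinearity='tanh', batch_first=True)
        self.rnn2 = nn.RNN(100, 200, nonlinearity='tanh', batch_first=True)
        self.rnn3 = nn.RNN(200, 50, nonlinearity='tanh', batch_first=True)
        self.fc1 = nn.Linear(50, 100)
        self.fc2 = nn.Linear(100, 10)

    def forward(self, x):                 # x: (batch, 28, 28) image rows as a sequence
        x, _ = self.rnn1(x)               # full sequence passed on (return_sequences=True)
        x, _ = self.rnn2(x)
        _, h = self.rnn3(x)
        x = self.fc1(h[-1])               # last hidden state of the third RNN layer
        return self.fc2(x)                # raw logits; softmax is applied inside the loss

model = MnistRNN()
criterion = nn.CrossEntropyLoss()         # cross-entropy loss, as described
optimizer = torch.optim.Adam(model.parameters())
```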
  • Step S13 Input the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model.
  • the training loss is calculated, and the model is updated based on the loss until the model converges to obtain the trained neural network model.
  • further, a test data set is acquired; the test data set is input into the trained neural network model to obtain the test results corresponding to the test data set, and the test results are used to evaluate the accuracy of the trained neural network model.
  • a part of the data in the MNIST dataset can be used as a test set to evaluate the accuracy of the neural network model after training.
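  • Continuing the sketches above, a schematic training-and-evaluation loop might look as follows; the batch size, the fixed epoch count standing in for "until the model converges", and the tensor reshaping are assumptions.

```python
import torch
from torch.utils.data import DataLoader

train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=64)

for epoch in range(10):                                       # fixed epoch count as a stand-in for convergence
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images.squeeze(1)), labels)    # (batch, 1, 28, 28) -> (batch, 28, 28)
        loss.backward()                                       # compute the training loss and update the model
        optimizer.step()

correct = 0
with torch.no_grad():                                         # evaluate accuracy on the held-out test set
    for images, labels in test_loader:
        preds = model(images.squeeze(1)).argmax(dim=1)
        correct += (preds == labels).sum().item()
print('test accuracy:', correct / len(test_set))
```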
  • FIG. 4 is a comparison diagram of a training based on a recurrent neural network model disclosed in the embodiment of the present application using the tanh function and the preset activation function provided by the scheme of the present application.
  • using the same recurrent neural network model and the same hyperparameters, the tanh activation function and the preset activation function provided by this application, i.e., φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)], are used respectively.
  • the above-mentioned recurrent neural network model is trained and tested on the MNIST dataset. It can be seen from Figure 4 that the activation function provided by the application scheme converges faster than the original tanh function, and the accuracy of the model is higher than the original tanh function.
  • when the trained models are applied to the same test set for inference, the accuracy of the model using the tanh activation function is 0.9842, while the accuracy of the model using the activation function provided by this application is 0.9921. It can be seen that both the convergence speed and the model accuracy of the scheme provided by the present application are better than those of the original tanh function.
  • Step S14 When the image to be recognized is acquired, use the trained neural network model to output a recognition result corresponding to the image to be recognized.
  • preset activation functions provided in this application can also be applied to other data sets and models to achieve model training and model applications, such as weather prediction and so on.
  • it can be seen that the embodiment of the present application first acquires an image training sample data set, where the image training sample data set includes image training sample data and label information corresponding to the image training sample data; constructs the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship, and determines the preset activation function as the activation function of the neural network model to obtain an initial neural network model, where the preset bias adjustment function is a function that combines a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship; then inputs the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model; and, when an image to be recognized is acquired, uses the trained neural network model to output a recognition result corresponding to the image to be recognized.
  • that is, the activation function used by the neural network model in this application is an activation function obtained by adding the preset bias adjustment function to the basic activation function, and the preset bias adjustment function combines the sign function, the first trainable parameter, and the quadratic term in a multiplicative relationship. In this way, when computing the gradient, a bias that is linearly proportional to the independent variable is added to the gradient; because the sign function is used, this bias is not negative, so when the independent variable tends to either end, gradient vanishing can be avoided, thereby improving the convergence speed of the model and the accuracy of image recognition.
  • an image recognition device including:
  • the training sample data acquisition module 11 is configured to acquire an image training sample data set; wherein, the image training sample data set includes image training sample data and label information corresponding to the image training sample data.
  • the initial neural network model acquisition module 12 is used to construct the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship, and determine the preset activation function as the activation function of the neural network model to obtain the initial A neural network model; wherein, the preset bias adjustment function is a function that constructs a symbolic function, a first trainable parameter, and a quadratic term in a multiplicative relationship.
  • a neural network model training module 13 configured to input the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model.
  • the image recognition module 14 is configured to use the trained neural network model to output a recognition result corresponding to the image to be recognized when the image to be recognized is acquired.
  • it can be seen that the embodiment of the present application first acquires an image training sample data set, where the image training sample data set includes image training sample data and label information corresponding to the image training sample data; constructs the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship, and determines the preset activation function as the activation function of the neural network model to obtain an initial neural network model, where the preset bias adjustment function is a function that combines a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship; then inputs the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model; and, when an image to be recognized is acquired, uses the trained neural network model to output a recognition result corresponding to the image to be recognized.
  • that is, the activation function used by the neural network model in this application is an activation function obtained by adding the preset bias adjustment function to the basic activation function, and the preset bias adjustment function combines the sign function, the first trainable parameter, and the quadratic term in a multiplicative relationship. In this way, when computing the gradient, a bias that is linearly proportional to the independent variable is added to the gradient; because the sign function is used, this bias is not negative, so when the independent variable tends to either end, gradient vanishing can be avoided, thereby improving the convergence speed of the model and the accuracy of image recognition.
  • the initial neural network model acquisition module 12 is specifically configured to construct the basic activation function, the preset bias adjustment function, and the preset linear function as an activation function in an additive relationship to obtain the preset activation function; wherein the preset linear function includes a second trainable parameter.
  • the initial neural network model acquisition module 12 is specifically used to construct the basic activation function, the preset bias adjustment function, and the preset linear function into an activation function in an additive relationship according to the trainable weight parameters to obtain the preset activation function.
  • the preset activation function is:
  • φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
  • where h(x) is the basic activation function, u(x) is the preset linear function, η(x) is the preset bias adjustment function, and α is the trainable weight parameter, with
  • u(x)=b*x+c
  • η(x)=sign(x)*a*x²
  • where b and c are the second trainable parameters, and a is the first trainable parameter.
  • the basic activation function is a hyperbolic tangent function or a sigmoid function.
  • the initial neural network model acquisition module 12 is specifically configured to determine the preset activation function as the activation function of the recurrent neural network model to obtain the initial neural network model.
  • the device also includes a model evaluation module for:
  • after the image training sample data set is input into the initial neural network model for training until the model converges and the trained neural network model is obtained, a test data set is acquired; the test data set is input into the trained neural network model to obtain test results corresponding to the test data set; and the test results are used to evaluate the accuracy of the trained neural network model.
  • the embodiment of the present application discloses an electronic device 20, including a memory 22 and one or more processors 21.
  • computer-readable instructions are stored in the memory 22, and when the computer-readable instructions are executed by the one or more processors 21, the one or more processors 21 are caused to perform the steps of any one of the above-mentioned image recognition methods.
  • the memory 22, as a carrier for resource storage, may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like, and the storage may be temporary or permanent.
  • the electronic device 20 also includes a power supply 23, a communication interface 24, an input and output interface 25, and a communication bus 26; wherein the power supply 23 is used to provide operating voltages for each hardware device on the electronic device 20; the The communication interface 24 can create a data transmission channel between the electronic device 20 and external devices, and the communication protocol it follows is any communication protocol applicable to the technical solution of the present application, which is not specifically limited here;
  • the input and output interface 25 is used to obtain external input data or output data to the external, and its specific interface type can be selected according to specific application needs, and is not specifically limited here.
  • the embodiment of the present application also discloses one or more non-volatile computer-readable storage media storing computer-readable instructions.
  • when the computer-readable instructions are executed by one or more processors, the one or more processors execute the steps of any one of the above image recognition methods.
  • An Activation Function is a function added to an artificial neural network to help the network learn complex patterns in data. Similar to neuron-based models in the human brain, the activation function ultimately determines what gets fired to the next neuron.
  • a node's activation function defines the node's output given an input or set of inputs.
  • a standard computer chip circuit can be thought of as a digital circuit activation function that produces an output that is on (1) or off (0) depending on the input.
  • an activation function is a mathematical equation that determines the output of a neural network. The mathematical process of the activation function can be described as shown in FIG. 7 , which is a schematic diagram of an activation function provided in an embodiment of the present application.
  • X represents the sample feature
  • m represents the number of samples
  • i represents the i-th sample
  • the input is X
  • the operation performed on each X in the convolution kernel is its weight multiplied by the sample feature; after the products are summed, the offset value is added to obtain the final output, and the operation is described as z = Σ_i(w_i*X_i) + b.
  • this output is used as the input of the activation function.
  • the activation function is f(x) in FIG. 7 , and the final output result is y through the operation of the activation function.
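  • In code form, this two-step computation (weighted sum plus offset, then activation) is simply the following; the feature and weight values are arbitrary illustrative numbers.

```python
import numpy as np

x = np.array([0.2, -1.0, 0.5])    # sample features X (illustrative values)
w = np.array([0.4, 0.1, -0.3])    # weights applied to the features
bias = 0.05                       # offset value

z = np.dot(w, x) + bias           # weighted sum of the features plus the offset
y = np.tanh(z)                    # the activation function f produces the final output y
```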
  • although the largest share of operations in an ANN (Artificial Neural Network) comes from the multiply-accumulate (MAC) operations of the MAC array, it is the activation function that has the greatest impact on the accuracy of the final calculation result.
  • different activation functions are applied to different AI (Artificial Intelligence) models and are suitable for different computing tasks. Two common activation functions are described below:
  • the Sigmoid function is also called the Logistic function because the Sigmoid function can be inferred from logistic regression (LR) and is also the activation function specified by the LR model.
  • the value range of the Sigmoid function is (0,1), and the output of the network can be mapped to this range for easy analysis. Its formula is expressed as S(x) = 1/(1+e^(-x)), and its derivative as S′(x) = S(x)*(1-S(x)).
  • FIG. 8 is a graph of the Sigmoid function provided by the embodiment of the present application.
  • FIG. 9 is a graph of the derivative of the Sigmoid function provided by the embodiment of the present application.
  • the sigmoid function has the advantages of smoothness and easy differentiation, and it keeps both the function and its derivative continuous. Correspondingly, sigmoid also has the following disadvantages: 1. the amount of computation is large; 2. when backpropagating to find the error gradient, the derivative operation involves division; 3. the derivatives at both ends approach 0, so gradient vanishing may occur in deep operations; 4. the function is not symmetric about 0, and it tends to change the distribution characteristics of the data as the operations deepen.
  • tanh is the hyperbolic tangent function. Tanh and sigmoid are similar, both being saturating activation functions; the difference is that the output range changes from (0,1) to (-1,1), and the tanh function can be regarded as the result of shifting the sigmoid down and stretching it. Its formula is expressed as tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)), and its derivative as tanh′(x) = 1 - tanh²(x).
  • FIG. 10 is a graph of the tanh function provided by the embodiment of the present application.
  • FIG. 11 is a graph of the derivative of the tanh function provided by the embodiment of the present application.
  • compared with the sigmoid function, the tanh function solves the problem of symmetry about 0, and its derivative curve is steeper, indicating a better convergence speed.
  • however, the tanh function still has the following disadvantages: 1. the amount of computation is large; 2. when backpropagating to find the error gradient, the derivative operation involves division; 3. the derivatives at both ends approach 0, so gradient vanishing may occur in deep operations.
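  • The saturation described above is easy to verify numerically; the small sketch below evaluates both derivatives at a few arbitrary probe points (the chosen values are assumptions).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (0.0, 2.0, 5.0, 10.0):
    d_sigmoid = sigmoid(x) * (1.0 - sigmoid(x))   # derivative of the sigmoid function
    d_tanh = 1.0 - np.tanh(x) ** 2                # derivative of the tanh function
    print(f"x={x:5.1f}  sigmoid'={d_sigmoid:.2e}  tanh'={d_tanh:.2e}")

# Already at x = 5 the derivatives are roughly 6.6e-03 (sigmoid) and 1.8e-04 (tanh),
# so deep stacks of these activations quickly stop propagating a useful gradient signal.
```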
  • continuity: the activation function needs to be continuous on both its curve and its derivative curve (continuously differentiable) so that the activation exhibits smooth properties.
  • for a function that is not continuously differentiable, such as ReLU, when data falls at its point of discontinuity (such as 0), the classification is adversely affected; and because the probability of data landing on such a point varies with the number of discontinuities of the function, a constructed activation function can accept discontinuities but needs to keep their number as small as possible.
  • the common gradient disappearance exists in sigmoid and tanh, because the gradients of sigmoid and tanh at both ends gradually approach zero, so as the depth increases, the magnitude of its calculation becomes smaller and smaller, and finally the gradient disappears.
  • according to existing research conclusions, once the gradient of an activation function falls below about 0.024, gradient vanishing generally occurs.
  • a common solution in the industry to solve the gradient disappearance problem is to use a Gaussian or random distribution near the zero end of the gradient to make it jitter and reduce the gradient disappearance.
  • the existing activation function can be constructed according to the core operation unit composed of unary operation and binary operation, as shown in FIG. 12 , which is a schematic diagram of an activation function construction in the prior art. As shown in Figure 12, the research on the activation function divides it into two parts: unary operation and binary operation. For different data x, the operations are continuously combined with each other, and finally an activation function is formed to realize the operation output for all data. According to the existing successful way of constructing activation functions, unary operations and binary operations have the following combinations:
  • the distinction is that a unary function represents a single-input, single-output operation, while a binary function represents an operation with two inputs and one output. Any activation function can be obtained using a combination of unary and binary functions.
  • in order to achieve fast classification activation near the zero point, a good activation function requires a significant gradient change, that is, the closer to the 0 point, the higher the gradient, and the farther from the 0 point, the lower the gradient.
  • among the unary functions that can satisfy such a squashing characteristic, only the exponential exp is effective, so most activation functions more or less use the exponential exp in their construction.
  • both the sigmoid and tanh activation functions perform very well in neural networks; they are general-purpose activation functions, but not optimal ones. Gradient vanishing is even more pronounced in RNNs, and LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) networks were invented precisely to solve the gradient vanishing problem of RNNs; this application instead looks for a way to solve the gradient vanishing of the RNN model from the perspective of the activation function.
  • each embodiment in this specification is described in a progressive manner, each embodiment focuses on the difference from other embodiments, and the same or similar parts of each embodiment can be referred to each other.
  • as the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and for relevant details please refer to the description of the method part.
  • the steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses an image recognition method, apparatus, device and medium: acquiring an image training sample data set, where the image training sample data set includes image training sample data and label information corresponding to the image training sample data; constructing a basic activation function and a preset bias adjustment function as a preset activation function in an additive relationship, and determining the preset activation function as the activation function of a neural network model to obtain an initial neural network model, where the preset bias adjustment function is a function constructed from a sign function, a first trainable parameter and a quadratic term in a multiplicative relationship; inputting the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model; and, when an image to be recognized is acquired, using the trained neural network model to output a recognition result corresponding to the image to be recognized.

Description

一种图像识别方法、装置、设备及介质
相关申请的交叉引用
本申请要求于2021年11月24日提交中国专利局,申请号为202111398690.7,申请名称为“一种图像识别方法、装置、设备及介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,特别涉及一种图像识别方法、装置、设备及介质。
背景技术
随着人工智能技术的发展,如何利用人工神经网络模型进行图像识别得到了广泛的研究,激活函数是一种添加到人工神经网络中的函数,旨在帮助网络学习数据中的复杂模式。目前,通用的激活函数sigmoid和tanh(即双曲正切函数),在两端的梯度都逐渐趋近于零,因此随着深度的增加,其计算导致的量级越来越小,最终发生梯度消失,从而影响模型的收敛速度以及图像识别的准确度。
发明内容
第一方面,本申请公开了一种图像识别方法,包括:
获取图像训练样本数据集;其中,所述图像训练样本数据集包括图像训练样本数据和所述图像训练样本数据对应的标签信息;
将基础激活函数和预设偏置调整函数以加法关系构造为预设激活函数,并将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网络模型;其中,所述预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数;
将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型;和
当获取到待识别图像,利用所述训练后神经网络模型输出所述待识别图像对应的识别结果。
可选的,所述将基础激活函数和预设偏置增加函数偏置调整函数以加法关系构造为预设激活函数,包括:
将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数;
其中,所述预设线性函数包括第二可训练参数。
可选的,所述将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数,包括:
根据可训练权重参数,将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数。
可选的,所述预设激活函数为:
φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
其中,h(x)为基础激活函数,u(x)为预设线性函数,η(x)为预设偏置调整函数,α为可训练权重参数,并且,
u(x)=b*x+c
η(x)=sign(x)*a*x²
其中,b、c为第二可训练参数,a为第一可训练参数。
可选的,所述基础激活函数为双曲正切函数或sigmoid函数。
可选的,所述将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网络模型,包括:
将所述预设激活函数确定为循环神经网络模型的激活函数,得到初始神经网络模型。
可选的,所述将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型之后,还包括:
获取测试数据集;
将所述测试数据集输入所述训练后神经网络模型,得到所述测试数据集对应的测试结果;和
利用所述测试结果评估所述训练后神经网络模型的准确度。
第二方面,本申请公开了一种图像识别装置,包括:
训练样本数据获取模块,用于获取图像训练样本数据集;其中,所述图像训练样本数据集包括图像训练样本数据和所述图像训练样本数据对应的标签信息;
初始神经网络模型获取模块,用于将基础激活函数和预设偏置调整函数以加法关系构造为预设激活函数,并将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网 络模型;其中,所述预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数;
神经网络模型训练模块,用于将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型;和
图像识别模块,用于当获取到待识别图像,利用所述训练后神经网络模型输出所述待识别图像对应的识别结果。
第三方面,本申请实施例还提供了一种电子设备,包括存储器及一个或多个处理器,所述存储器中储存有计算机可读指令,所述计算机可读指令被所述一个或多个处理器执行时,使得所述一个或多个处理器执行上述任一项图像识别方法的步骤。
第四方面,本申请实施例最后还提供了一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行上述任一项图像识别方法的步骤。
本申请的一个或多个实施例的细节在下面的附图和描述中提出。本申请的其它特征和优点将从说明书、附图以及权利要求书变得明显。
附图说明
为了更清楚地说明本申请实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据提供的附图获得其他的附图。
图1为本申请一个或多个实施例提供的一种图像识别方法流程图;
图2为本申请一个或多个实施例提供的一种MNIST数据集示例图;
图3为本申请一个或多个实施例提供的一种循环神经网络结构示意图;
图4为本申请一个或多个实施例提供的一种基于循环神经网络模型的采用tanh函数和本申请方案提供的预设激活函数进行训练的对比图;
图5为本申请一个或多个实施例提供的一种图像识别装置结构示意图;
图6为本申请一个或多个实施例提供的一种电子设备结构图;
图7为本申请一个或多个实施例提供的一种激活函数示意图;
图8为本申请一个或多个实施例提供的Sigmoid函数曲线图;
图9为本申请一个或多个实施例提供的Sigmoid函数导数曲线图;
图10为本申请一个或多个实施例提供的tanh函数曲线图;
图11为本申请一个或多个实施例提供的tanh函数导数曲线图;
图12为现有技术中的一种激活函数构造示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
随着人工智能技术的发展,如何利用人工神经网络模型进行图像识别得到了广泛的研究,激活函数是一种添加到人工神经网络中的函数,旨在帮助网络学习数据中的复杂模式。目前,通用的激活函数sigmoid和tanh(即双曲正切函数),在两端的梯度都逐渐趋近于零,因此随着深度的增加,其计算导致的量级越来越小,最终发生梯度消失,从而影响模型的收敛速度以及图像识别的准确度。为此,本申请提供了一种图像识别方案,能够避免梯度消失,从而提升模型收敛速度以及图像识别的准确度
参见图1所示,本申请实施例公开了一种图像识别方法,包括:
步骤S11:获取图像训练样本数据集;其中,所述图像训练样本数据集包括图像训练样本数据和所述图像训练样本数据对应的标签信息。
在具体的实施方式中,可以获取MNIST数据集,将其中的一部分数据作为图像训练样本数据集,另一部分数据作为测试集。当然,在另外一些实施例中,采用其他的数据集作为训练集。
需要指出的是,MNIST数据集是经典的小型图像分类数据集,一共统计了来自250个不同的人手写数字图片,其中50%是高中生,50%来自人口普查局的工作人员。该数据集的收集目的是希望通过算法,实现对手写数字的识别。MNIST包含70,000张手写体数字图片,每张图片由28 x 28个像素点构成,每个像素点用一个灰度值表示。本申请实施例可以以其中60000个样本作为训练数据集,10000张样本作为测试数据集。每个样本都有其对应的标签,用单个十进制数表示,对应图片对应的类别。该数据集被广泛地应用于机器学习和深度学习领域,用来测试算法的效果,例如线性分类器(Linear Classifiers)、K-近邻算法(K-Nearest Neighbors)、支持向量机(SVMs)、神经网络(Neural Nets)、卷积神经网络(Convolutional nets)等等。例如,参见图2所示,图2为本申请实施例提供的一种MNIST 数据集示例图。
步骤S12:将基础激活函数和预设偏置调整函数以加法关系构造为预设激活函数,并将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网络模型;其中,所述预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数。
在具体的实施方式中,可以将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数;其中,所述预设线性函数包括第二可训练参数。
进一步的,本申请实施例可以根据可训练权重参数,将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数。
在具体的实施方式中,所述预设激活函数可以为:
φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
其中,h(x)为基础激活函数,u(x)为预设线性函数,η(x)为预设偏置调整函数,α为可训练权重参数,并且,
u(x)=b*x+c
η(x)=sign(x)*a*x²
其中,b、c为第二可训练参数,a为第一可训练参数。
可以理解的是,对u(x)求导可以得到:u(x)′=b,因此u(x)用来移动基础激活函数,使其满足在数据分布密集的地方梯度最大,这样得到了可以根据模型、任务及数据的分布情况进行训练的含有可训练参数的激活函数,从而提升模型收敛速度和模型准确度。另外,对η(x)求导可以得到:
η(x)′=2* sign(x)*a*x
这样,给梯度加一个与x值成比例的偏置,当x趋于两端时,能有效的避免梯度消失的情况。
其中,所述基础激活函数为双曲正切函数或sigmoid函数。
需要指出的是,现有通用的激活函数具有固定的函数形式且参数固定不可训练,本申请实施例中,将激活函数构造为有固定结构的函数,但是它的参数和神经的参数一样,是可以根据模型、任务及数据的分布情况进行训练的。因此本申请实施例在原有通用的激活函数的基础上,提出了一种可以根据模型、任务及数据的分布情况进行训练的含有可训练参数的激 活函数,并且在构造的过程中考虑到了梯度消失的问题。
并且,在具体的实施方式中,本申请实施例可以将所述预设激活函数确定为循环神经网络模型的激活函数,得到初始神经网络模型。
需要指出的是,梯度消失的情况在RNN(即Recurrent Neural Network,循环神经网络)中表现得更为明显,因此,本申请实施例采用循环神经网络模型,但是在另外一些实施例中,可以应用于其他神经网络模型,以解决其梯度消失问题。
参见图3所示,图3为本申请实施例提供的一种循环神经网络结构示意图。等号左侧是循环神经网络模型没有按时间序列展开的示意图,等号右侧是按时间序列展开的示意图,图3描述了在时间序列索引号t附近RNN的模型。其中,xt代表在序列索引号t时训练样本的输入。同样的xt-1和xt+1代表在序列索引号t-1和t+1时训练样本的输入。ht代表在序列索引号t时模型的隐藏状态。ht由xt和ht-1共同决定。ot代表在序列索引号t时模型的输出。ot只由模型当前的隐藏状态ht决定。下面为经典RNN结构的严格数学定义:
输入为x 1,x 2,…x t,对应的隐藏状态为h 1,h 2,…h t
输出为y 1,y 2,…y t,如,则经典RNN的运算过程可以表示为:
h_t=f(U*x_t+W*h_{t-1}+b)
y_t=softmax(V*h_t+c)
其中,U,W,V,b,c均为参数,而f(·)表示激活函数,一般为tanh函数。
也即,本申请实施例以经典RNN为例,将经典RNN中的激活函数替换为本申请提供的预设激活函数,利用RNN实现MNIST手写体分类。网络结构如下:
输入:28*28;
第一层:RNN(100,activation='tanh',return_sequences=True);
第二层:RNN(200,activation='tanh',return_sequences=True);
第三层:RNN(50,activation='tanh');
第四层:Dense(100)
第五层:Dense(10)
第六层:softmax
损失函数：交叉熵损失函数，torch.nn.CrossEntropyLoss，刻画的是实际输出ŷ与期望输出y的距离，其中，n为batchsize（批尺寸），i表示第i个样本数据：
loss = -(1/n)*Σ_i y_i*log(ŷ_i)
优化器选择Adam。
步骤S13:将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型。
在训练的过程中,计算训练损失,基于损失更新模型,直到模型收敛,得到训练后神经网络模型。
进一步的,获取测试数据集;将所述测试数据集输入所述训练后神经网络模型,得到所述测试数据集对应的测试结果;利用所述测试结果评估所述训练后神经网络模型的准确度。
如前述内容可知,本申请实施例可以采用MNIST数据集中的一部分数据作为测试集,评估训练后神经网络模型的准确度。
参见图4所示,图4为本申请实施例公开的一种基于循环神经网络模型的采用tanh函数和本申请方案提供的预设激活函数进行训练的对比图。使用相同的循环神经网络模型、相同的超参数,分别使用tanh激活函数和本申请方案提供的预设激活函数,即:
φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
对上述循环神经网络模型在MNIST数据集上进行模型训练及测试。从图4可以看出,本申请方案提供的激活函数收敛速度比原始tanh函数要快,且模型的准确率比原始的tanh函数要高。将训练好的模型应用到相同测试集进行推断的时候,使用tanh激活函数的模型的准确率为0.9842,使用本申请提供的激活函数的模型的准确率为0.9921。由此可见,本申请提供的方案的收敛速度和模型的准确率都要优于原始的tanh函数。
步骤S14:当获取到待识别图像,利用所述训练后神经网络模型输出所述待识别图像对应的识别结果。
需要指出的是,本申请提供的预设激活函数也可以应用于其他数据集和模型,实现模型训练以及模型应用,比如天气预测等等。
可见,本申请实施例先获取图像训练样本数据集;其中,所述图像训练样本数据集包括图像训练样本数据和所述图像训练样本数据对应的标签信息,以及将基础激活函数和预设偏置调整函数以加法关系构造为预设激活函数,并将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网络模型;其中,所述预设偏置调整函数为将符号函数、第一可训 练参数、二次项以乘法关系构造的函数,然后将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型,当获取到待识别图像,利用所述训练后神经网络模型输出所述待识别图像对应的识别结果。也即,本申请中神经网络模型所采用的激活函数为在基础激活函数的基础上增加了预设偏置调整函数的激活函数,且预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数,这样,在求梯度时,给梯度增加了一个与自变量成线性比例的偏置,且由于利用了符号函数,该偏置不为负数,当自变量趋向两端时,能够避免梯度消失,从而提升模型收敛速度以及图像识别的准确度。
参见图5所示,本申请实施例公开了一种图像识别装置,包括:
训练样本数据获取模块11,用于获取图像训练样本数据集;其中,所述图像训练样本数据集包括图像训练样本数据和所述图像训练样本数据对应的标签信息。
初始神经网络模型获取模块12,用于将基础激活函数和预设偏置调整函数以加法关系构造为预设激活函数,并将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网络模型;其中,所述预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数。
神经网络模型训练模块13,用于将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型。
图像识别模块14,用于当获取到待识别图像,利用所述训练后神经网络模型输出所述待识别图像对应的识别结果。
可见,本申请实施例先获取图像训练样本数据集;其中,所述图像训练样本数据集包括图像训练样本数据和所述图像训练样本数据对应的标签信息,以及将基础激活函数和预设偏置调整函数以加法关系构造为预设激活函数,并将所述预设激活函数确定为神经网络模型的激活函数,得到初始神经网络模型;其中,所述预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数,然后将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型,当获取到待识别图像,利用所述训练后神经网络模型输出所述待识别图像对应的识别结果。也即,本申请中神经网络模型所采用的激活函数为在基础激活函数的基础上增加了预设偏置调整函数的激活函数,且预设偏置调整函数为将符号函数、第一可训练参数、二次项以乘法关系构造的函数,这样,在求梯度时,给梯度增加了一个与自变量成线性比例的偏置,且由于利用了符号函数,该偏置 不为负数,当自变量趋向两端时,能够避免梯度消失,从而提升模型收敛速度以及图像识别的准确度。
其中,初始神经网络模型获取模块12,具体用于将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数;其中,所述预设线性函数包括第二可训练参数。
进一步的,初始神经网络模型获取模块12,具体用于根据可训练权重参数,将基础激活函数和预设偏置调整函数、预设线性函数以加法关系构造为激活函数,得到预设激活函数。
在具体的实施方式中,所述预设激活函数为:
φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
其中,h(x)为基础激活函数,u(x)为预设线性函数,η(x)为预设偏置调整函数,α为可训练权重参数,并且,
u(x)=b*x+c
η(x)=sign(x)*a*x²
其中,b、c为第二可训练参数,a为第一可训练参数。
并且,所述基础激活函数为双曲正切函数或sigmoid函数。
初始神经网络模型获取模块12,具体用于将所述预设激活函数确定为循环神经网络模型的激活函数,得到初始神经网络模型。
所述装置还包括模型评估模块,用于:
将所述图像训练样本数据集输入所述初始神经网络模型进行训练,直到模型收敛,得到训练后神经网络模型之后,获取测试数据集;将所述测试数据集输入所述训练后神经网络模型,得到所述测试数据集对应的测试结果;利用所述测试结果评估所述训练后神经网络模型的准确度。
参见图6所示,本申请实施例公开了一种电子设备20,包括存储器22及一个或多个处理器21,存储器22中储存有计算机可读指令,计算机可读指令被一个或多个处理器21执行时,使得一个或多个处理器21执行上述任一项图像识别方法的步骤。
关于上述图像识别方法的具体过程可以参考前述实施例中公开的相应内容,在此不再进行赘述。
并且,所述存储器22作为资源存储的载体,可以是只读存储器、随机存储器、磁盘或者光盘等,存储方式可以是短暂存储或者永久存储。
另外,所述电子设备20还包括电源23、通信接口24、输入输出接口25和通信总线26;其中,所述电源23用于为所述电子设备20上的各硬件设备提供工作电压;所述通信接口24能够为所述电子设备20创建与外界设备之间的数据传输通道,其所遵循的通信协议是能够适用于本申请技术方案的任意通信协议,在此不对其进行具体限定;所述输入输出接口25,用于获取外界输入数据或向外界输出数据,其具体的接口类型可以根据具体应用需要进行选取,在此不进行具体限定。
进一步的,本申请实施例还公开了一个或多个存储有计算机可读指令的非易失性计算机可读存储介质,计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行上述任一项图像识别方法的步骤。
关于上述图像识别方法的具体过程可以参考前述实施例中公开的相应内容,在此不再进行赘述。
下面，为了使本领域技术人员充分理解本申请实施例所提供的技术方案所具有的技术效果和实际应用价值，对现有技术中存在的问题进行进一步说明。
激活函数(Activation Function)是一种添加到人工神经网络中的函数,旨在帮助网络学习数据中的复杂模式。类似于人类大脑中基于神经元的模型,激活函数最终决定了要发射给下一个神经元的内容。在人工神经网络中,一个节点的激活函数定义了该节点在给定的输入或输入集合下的输出。标准的计算机芯片电路可以看作是根据输入得到开(1)或关(0)输出的数字电路激活函数。因此,激活函数是确定神经网络输出的数学方程式。激活函数的数学过程可以描述为图7所示,图7为本申请实施例提供的一种激活函数示意图。如图7所示,X表示样本特征,m表示样本数量,i表示第i个样本,输入为X,每个X在卷积核中所做的运算为其权重乘以样本特征,所得经过连加后,再加上偏移值,得到最终的输出,其运算描述为:
z = Σ_i(w_i*X_i) + b
该输出作为激活函数的输入,激活函数在图7中即为f(x),通过激活函数的运算,最终输出结果为y。
如上述内容可知,在人工智能计算中,数据的分布绝大多数是非线性的,而一般神经网络的计算是线性的,引入激活函数,是在神经网络中引入非线性,强化网络的学习能力。所以激活函数的最大特点就是非线性。
虽然ANN(即Artificial Neural Network,人工神经网络)中最大量的运算来源于MAC(即Multiply Accumulate,乘积累加运算)阵列的乘加运算,但是对最终运算结果准确影响最大的是激活函数的应用。不同的激活函数应用于不同AI(即Artificial Intelligence,人工智能)模型,适用于不同的运算任务。下面介绍两种常见的激活函数:
Sigmoid函数也称为Logistic函数,因为Sigmoid函数可以从逻辑回归(LR)中推理得到,也是LR模型指定的激活函数。Sigmoid函数的取值范围在(0,1)之间,可以将网络的输出映射在这一范围,方便分析。其公式表示为:
S(x) = 1/(1+e^(-x))
其导数的公式表示为:
S′(x) = S(x)*(1-S(x))
参见图8所示,图8为本申请实施例提供的Sigmoid函数曲线图,参见图9所示,图9为本申请实施例提供的Sigmoid函数导数曲线图。
如上述可知,sigmoid函数具有平滑和易于求导的优点,并且解决了函数及其导数连续性的问题。但相应的,sigmoid也具有以下缺点:1.运算量大;2.反向传播求误差梯度时,导数运算涉及除法;3.两端的导数无限趋近于0而可能在深层次的运算中发生梯度消失;4.函数不基于0对称,容易在运算加深时改变数据的分布特征。
Tanh为双曲正切函数,其英文读作Hyperbolic Tangent。Tanh和sigmoid相似,都属于饱和激活函数,区别在于输出值范围由(0,1)变为了(-1,1),可以把tanh函数看做是sigmoid向下平移和拉伸后的结果。其公式表示为:
tanh(x) = (e^x - e^(-x))/(e^x + e^(-x))
其导数的公式表示为:
tanh′(x) = 1 - tanh²(x)
参见图10所示,图10为本申请实施例提供的tanh函数曲线图,参见图11所示,图11为本申请实施例提供的tanh函数导数曲线图。
可知,相比sigmoid函数,tanh函数解决了0对称的问题,且其导数曲线更加陡峭,表示其具有更好的收敛速度。但是tanh函数依然具有以下缺点:1.运算量大;2.反向传播求误差梯度时,导数运算涉及除法;3.两端的导数无限趋近于0而可能在深层次的运算中发生梯度消失。
以上两种激活函数被使用的最为广泛,但这两种激活函数依然具有明显的缺点就是容易导致梯度消失。本申请提供的方案可以在解决这类激活函数梯度消失的问题。
基于上述几种激活函数的描述，考虑到激活函数在ANN中所需完成的工作，可以总结出一个激活函数需要满足的基本特性：
1、连续性。激活函数需要在其曲线和导数曲线(连续可导)上都是连续的,这样其激活功能才能表现出平滑特性。对于不连续的函数,比如Relu,当数据落在其不连续点,比如0点时,就会对其分类产生不理想的特性影响,而因为这种落点的几率随着函数的不连续点数量而变化,因此所构建出的激活函数可以接受不连续点,但是需要尽量减少其数量。
2、梯度爆炸或梯度消失。当神经网络的权重计算随着深入逐渐向某一个方向越来越深时,其权重的更新也会跟着激活函数的梯度(导数)递增或递减,于是这个更新会对数据集产生巨大影响。当梯度是递增的,导致权重呈指数级增加,导致数据过大,无法进行正确的分类计算,此时被称为是梯度爆炸。常见的梯度爆炸可以见于Relu,随着数据更新的深度增加,其梯度不断增大,导致最后无法计算,因此成为梯度爆炸。相应的,假如权重随着梯度的更新逐渐减小,导致不同数据之间无法区分,叫做梯度消失。常见的梯度消失存在于sigmoid和tanh,因为在两端sigmoid和tanh的梯度都逐渐趋近于零,因此随着深度的增加,其计算导致的量级越来越小,最终发生梯度消失。按照已有研究结论,一般梯度小于0.024的激活函数,即会发生梯度消失情况。业界常见的为解决梯度消失问题,使用的方案为在梯度近零端,使用高斯或随机分布,使其产生抖动,减少梯度消失。
3.、饱和性,激活函数的曲线本身在两端趋近于0时,即被成为饱和。饱和特性有左饱 和和右饱和两类,分别表示的是激活函数的曲线向左趋近于0或是向右趋近于0。对于梯度消失问题,假如激活函数本身还具有非饱和性,则可以一定范围内解决“梯度消失”问题,实现激活函数的快速收敛。
现有的激活函数的构造可以按照一元运算和二元运算所组合成的核心运算单元构造而成,如图12所示,图12为现有技术中的一种激活函数构造示意图。如图12所示,对激活函数的研究将其分为了一元运算和二元运算两部分,对于不同的数据x,运算不断的相互组合,最终形成激活函数,实现针对所有数据的运算输出。按照现有成功的构造激活函数的方式,一元运算和二元运算有以下组合:
一元函数:
x, -x, |x|, x², x³,
Figure PCTCN2022089350-appb-000008
βx, x+β, log(|x|+α), exp(x), sin(x), cos(x), tanh(x), sinh(x), cosh(x)...
二元函数:
x 1+x 2,x 1*x 2,x 1-x 2
Figure PCTCN2022089350-appb-000009
max(x 1,x 2),exp(x 1)*x 2...
其区分方式为,一元函数的运算代表单输入单输出,二元函数代表两个输入得到一个输出的运算。使用一元函数和二元函数的组合可以得到任何激活函数。
此外,经过长期的运算可知,上述一元函数和二元函数虽然代表了所有激活函数的构造方式,但是二元函数主要表示对于多输入的单输出选择情况。而真正影响激活函数连续性、梯度特性以及饱和性的,主要由一元函数构造特性决定。
基于万能近似理论(universal approximation theorem)可知,在大量的一元函数中,一个好的激活函数需要有近中点的快速梯度下降特性,以及两端的梯度逐渐平缓特性。而能够满足这样挤压特性的一元函数中,只有exp可以有效满足,因此大部分的激活函数都会或多或少的用到exp进行激活函数的构造。
基于上面的激活函数特性描述可知,一个好的激活函数,在近零点为了能够实现快速分类激活,需要有明显的梯度变化,即越靠近0点梯度越高,而在越远离0点梯度越下降。而能够满足这样挤压特性的一元函数中,只有指数exp可以有效满足,因此大部分的激活函数都会或多或少的用到指数exp进行激活函数的构造。例如,sigmoid和tanh激活函数。sigmoid和tanh激活函数在神经网络中,都有非常优秀的表现。是一种通用的激活函数,但不是最优的激活函数。并且,sigmoid和tanh的两端的梯度都逐渐趋近于零,因此随着深度的增加,其计算导致的量级越来越小,最终发生梯度消失。梯度消失的情况在RNN网络中表现得更为明显。LSTM(Long-Short Term Memory,长短型记忆神经网络)和GRU(即 Gated Recurrent Unit,门控循环单元)网络的发明,就是为了解决RNN梯度消失的问题。本申请则是从激活函数的角度出发,寻找解决RNN模型梯度消失的方法。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件模块,或者二者的结合来实施。软件模块可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
以上对本申请所提供的一种图像识别方法、装置、设备及介质进行了详细介绍,本文中应用了具体个例对本申请的原理及实施方式进行了阐述,以上实施例的说明只是用于帮助理解本申请的方法及其核心思想;同时,对于本领域的一般技术人员,依据本申请的思想,在具体实施方式及应用范围上均会有改变之处,综上所述,本说明书内容不应理解为对本申请的限制。

Claims (10)

  1. An image recognition method, characterized by comprising:
    acquiring an image training sample data set; wherein the image training sample data set includes image training sample data and label information corresponding to the image training sample data;
    constructing a basic activation function and a preset bias adjustment function as a preset activation function in an additive relationship, and determining the preset activation function as the activation function of a neural network model to obtain an initial neural network model; wherein the preset bias adjustment function is a function constructed from a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship;
    inputting the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model; and
    when an image to be recognized is acquired, using the trained neural network model to output a recognition result corresponding to the image to be recognized.
  2. The image recognition method according to claim 1, characterized in that constructing the basic activation function and the preset bias adjustment function as a preset activation function in an additive relationship comprises:
    constructing the basic activation function, the preset bias adjustment function, and a preset linear function as an activation function in an additive relationship to obtain the preset activation function;
    wherein the preset linear function includes a second trainable parameter.
  3. The image recognition method according to claim 2, characterized in that constructing the basic activation function, the preset bias adjustment function, and the preset linear function as an activation function in an additive relationship to obtain the preset activation function comprises:
    constructing, according to a trainable weight parameter, the basic activation function, the preset bias adjustment function, and the preset linear function as an activation function in an additive relationship to obtain the preset activation function.
  4. The image recognition method according to claim 3, characterized in that the preset activation function is:
    φ(x,α,a,b)=α*h(x)+(1-α)*[u(x)+η(x)]
    wherein h(x) is the basic activation function, u(x) is the preset linear function, η(x) is the preset bias adjustment function, α is the trainable weight parameter, and
    u(x)=b*x+c
    η(x)=sign(x)*a*x²
    wherein b and c are the second trainable parameters, and a is the first trainable parameter.
  5. The image recognition method according to claim 1, characterized in that the basic activation function is a hyperbolic tangent function or a sigmoid function.
  6. The image recognition method according to claim 1, characterized in that determining the preset activation function as the activation function of the neural network model to obtain the initial neural network model comprises:
    determining the preset activation function as the activation function of a recurrent neural network model to obtain the initial neural network model.
  7. The image recognition method according to any one of claims 1 to 6, characterized in that after inputting the image training sample data set into the initial neural network model for training until the model converges and obtaining the trained neural network model, the method further comprises:
    acquiring a test data set;
    inputting the test data set into the trained neural network model to obtain test results corresponding to the test data set; and
    using the test results to evaluate the accuracy of the trained neural network model.
  8. An image recognition device, characterized by comprising:
    a training sample data acquisition module configured to acquire an image training sample data set; wherein the image training sample data set includes image training sample data and label information corresponding to the image training sample data;
    an initial neural network model acquisition module configured to construct a basic activation function and a preset bias adjustment function as a preset activation function in an additive relationship, and to determine the preset activation function as the activation function of a neural network model to obtain an initial neural network model; wherein the preset bias adjustment function is a function constructed from a sign function, a first trainable parameter, and a quadratic term in a multiplicative relationship;
    a neural network model training module configured to input the image training sample data set into the initial neural network model for training until the model converges to obtain a trained neural network model; and
    an image recognition module configured to, when an image to be recognized is acquired, use the trained neural network model to output a recognition result corresponding to the image to be recognized.
  9. An electronic device, characterized by comprising a memory and one or more processors, wherein computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by the one or more processors, the one or more processors are caused to execute the steps of the method according to any one of claims 1 to 7.
  10. One or more non-volatile computer-readable storage media storing computer-readable instructions, characterized in that when the computer-readable instructions are executed by one or more processors, the one or more processors are caused to execute the steps of the method according to any one of claims 1 to 7.
PCT/CN2022/089350 2021-11-24 2022-04-26 一种图像识别方法、装置、设备及介质 WO2023092938A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111398690.7 2021-11-24
CN202111398690.7A CN113822386B (zh) 2021-11-24 2021-11-24 一种图像识别方法、装置、设备及介质

Publications (1)

Publication Number Publication Date
WO2023092938A1 true WO2023092938A1 (zh) 2023-06-01

Family

ID=78919800

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/089350 WO2023092938A1 (zh) 2021-11-24 2022-04-26 一种图像识别方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN113822386B (zh)
WO (1) WO2023092938A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822386B (zh) * 2021-11-24 2022-02-22 苏州浪潮智能科技有限公司 一种图像识别方法、装置、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108898213A (zh) * 2018-06-19 2018-11-27 浙江工业大学 一种面向深度神经网络的自适应激活函数参数调节方法
KR20190048274A (ko) * 2017-10-31 2019-05-09 전자부품연구원 다중 아날로그 제어 변수 생성을 위한 다중 뉴럴 네트워크 구성 방법
CN110059741A (zh) * 2019-04-15 2019-07-26 西安电子科技大学 基于语义胶囊融合网络的图像识别方法
CN112613581A (zh) * 2020-12-31 2021-04-06 广州大学华软软件学院 一种图像识别方法、系统、计算机设备和存储介质
CN113822386A (zh) * 2021-11-24 2021-12-21 苏州浪潮智能科技有限公司 一种图像识别方法、装置、设备及介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537387A (zh) * 2014-12-16 2015-04-22 广州中国科学院先进技术研究所 利用神经网络实现车型分类的方法和系统
CN106056595B (zh) * 2015-11-30 2019-09-17 浙江德尚韵兴医疗科技有限公司 基于深度卷积神经网络自动识别甲状腺结节良恶性的辅助诊断系统
CN106845401B (zh) * 2017-01-20 2020-11-03 中国科学院合肥物质科学研究院 一种基于多空间卷积神经网络的害虫图像识别方法
US11074430B2 (en) * 2018-05-29 2021-07-27 Adobe Inc. Directional assistance for centering a face in a camera field of view
CN111091175A (zh) * 2018-10-23 2020-05-01 北京嘀嘀无限科技发展有限公司 神经网络模型训练方法、分类方法、装置和电子设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190048274A (ko) * 2017-10-31 2019-05-09 전자부품연구원 다중 아날로그 제어 변수 생성을 위한 다중 뉴럴 네트워크 구성 방법
CN108898213A (zh) * 2018-06-19 2018-11-27 浙江工业大学 一种面向深度神经网络的自适应激活函数参数调节方法
CN110059741A (zh) * 2019-04-15 2019-07-26 西安电子科技大学 基于语义胶囊融合网络的图像识别方法
CN112613581A (zh) * 2020-12-31 2021-04-06 广州大学华软软件学院 一种图像识别方法、系统、计算机设备和存储介质
CN113822386A (zh) * 2021-11-24 2021-12-21 苏州浪潮智能科技有限公司 一种图像识别方法、装置、设备及介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIU, HUA: "Adaptive Activation Functions in Deep Convolutional Networks", INFORMATION & TECHNOLOGY, CHINA DOCTORAL DISSERTATIONS/MASTER'S THESES FULL-TEXT DATABASE (MASTER), no. 12, 1 March 2018 (2018-03-01), CN, pages 1 - 58, XP009545890 *

Also Published As

Publication number Publication date
CN113822386B (zh) 2022-02-22
CN113822386A (zh) 2021-12-21

Similar Documents

Publication Publication Date Title
Messikommer et al. Event-based asynchronous sparse convolutional networks
WO2022022163A1 (zh) 文本分类模型的训练方法、装置、设备及存储介质
US20190303535A1 (en) Interpretable bio-medical link prediction using deep neural representation
Godin et al. Dual rectified linear units (DReLUs): A replacement for tanh activation functions in quasi-recurrent neural networks
CN109948149B (zh) 一种文本分类方法及装置
WO2021184902A1 (zh) 图像分类方法、装置、及其训练方法、装置、设备、介质
WO2021089013A1 (zh) 空间图卷积网络的训练方法、电子设备及存储介质
CN109086654B (zh) 手写模型训练方法、文本识别方法、装置、设备及介质
CN112164391A (zh) 语句处理方法、装置、电子设备及存储介质
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
KR20190099927A (ko) 심층 신경망의 학습을 수행시키는 방법 및 그에 대한 장치
WO2021089012A1 (zh) 图网络模型的节点分类方法、装置及终端设备
Zhang et al. Sequential active learning using meta-cognitive extreme learning machine
CN114677565B (zh) 特征提取网络的训练方法和图像处理方法、装置
US20230113318A1 (en) Data augmentation method, method of training supervised learning system and computer devices
WO2023092938A1 (zh) 一种图像识别方法、装置、设备及介质
Liu et al. Structured learning of tree potentials in CRF for image segmentation
WO2020106871A1 (en) Image processing neural networks with dynamic filter activation
Zhu et al. Improved self-paced learning framework for nonnegative matrix factorization
Mendonça et al. Machine learning: Review and trends
WO2021253938A1 (zh) 一种神经网络的训练方法、视频识别方法及装置
Kumar APTx: better activation function than MISH, SWISH, and ReLU's variants used in deep learning
CN113515519A (zh) 图结构估计模型的训练方法、装置、设备及存储介质
US20240257512A1 (en) Image recognition method and apparatus, and device and medium
CN116109834A (zh) 一种基于局部正交特征注意力融合的小样本图像分类方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897038

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18565043

Country of ref document: US