CN113469271A - Image classification method based on Echo State Network - Google Patents


Info

Publication number
CN113469271A
CN113469271A
Authority
CN
China
Prior art keywords
image classification
echo state
network
state network
classification method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110813222.5A
Other languages
Chinese (zh)
Inventor
李丽香
孙婧瑜
彭海朋
孟寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110813222.5A priority Critical patent/CN113469271A/en
Publication of CN113469271A publication Critical patent/CN113469271A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image classification method based on an Echo State Network, comprising the steps of constructing a network structure, transmitting and updating data during training, training the output, and performing image classification. The parameters of the reserve pool are set as follows: the spectral radius λ of the reserve pool is 1.25, the size Nx of the reserve pool is 2000, the scale IS of the reserve pool input unit is 1, the sparsity SD of the reserve pool is 1e-8, the leak rate α is 0.15, and the idle-run (washout) images account for 10% of the data. By using an Echo State Network (ESN) to classify and process images, the method provided by the invention avoids the high complexity and low speed of convolutional networks.

Description

Image classification method based on Echo State Network
Technical Field
The invention relates to the technical field of image processing, and in particular to an image classification method based on an Echo State Network.
Background
Most conventional methods for processing images use a Convolutional Neural Network (CNN), but the CNN has a complex structure and three major disadvantages. 1) The biological support for CNNs is weak, and they have no memory function: in biological systems a feature is represented not by a single neuron but by a group of neurons, whose output is a vector, whereas a single CNN neuron outputs only one value. 2) The fully connected layers of a CNN are redundant and inefficient; they require large amounts of data and extensive parameter tuning, making the network complex and slow. 3) CNNs excel at feature detection but are weaker at feature comprehension.
The Echo State Network (ESN) has the advantages of speed and structural simplicity. However, current ESN applications focus mainly on sequence prediction, and the conventional ESN processes a time series as a single input. Many researchers study echo state networks, but most of them study the prediction capability of the Echo State Network.
For example, "A combined model based on seasonal adjustment method and echo state network for energy consumption prediction in USA" proposes a hybrid energy consumption prediction model built on the Echo State Network (ESN). Establishing a scientific and accurate model of American energy consumption is important for formulating energy policy and allocating energy resources. The seasonal adjustment method follows the divide-and-conquer idea and decomposes the original time series into a seasonal subsequence and a residual subsequence, rather than the conventional three parts (seasonality, trend, and residual), thereby avoiding the difficult task of modeling the residual subsequence. The seasonal subsequence and the remaining subsequence are then modeled and predicted with ESN and EEMD-GOA-ESN respectively, and the two parts are summed to produce the final prediction. Empirical studies on fossil fuel, nuclear power, and renewable energy consumption show that the model is superior to other benchmarks in effectiveness and scalability; out-of-sample prediction results show the method keeps the monthly energy consumption error within 3.3%.
"Effective energy consumption forecasting using enhanced bagged echo state network" proposes a bagged echo state network improved by a differential evolution algorithm to estimate energy consumption. Accurate analysis and prediction of energy consumption not only affect a country's energy security and environment but also give decision-makers a useful basis for policy. To reduce prediction error and improve the generalization of the network, the Bagging algorithm is adopted, and three parameters of the echo state network are optimized with a differential evolution algorithm. The model combines the advantages of the echo state network, and its accuracy and reliability are verified through two comparative examples and an extended application example. The comparisons show that the model has better prediction performance than the basic echo state network and popular existing models; for Chinese energy consumption prediction it achieves a mean absolute percentage error of 0.215%, with high precision and stability, making it a satisfactory energy consumption prediction tool.
Chinese patent CN111553415A discloses a memristor-based ESN neural network image classification method in the technical field of image processing. Using the unique memory characteristics and computing capability of the memristor combined with the echo state network, a memristor-based ESN neural network circuit is designed to meet the storage requirements of image processing, reducing read/write operations on training data and ultimately improving the performance and efficiency of the whole neural network. CN111553415A fuses data storage and computation in the memristor: image data serve as the training object, convolution operations on the image realize the preprocessing function, the basic logic operations required for preprocessing are identified, and their memristor circuit design is carried out, so that storage and computation of image data are combined, accesses to the training data are reduced, and the performance of the whole neural network improves. Its disadvantages are high complexity and low operation speed.
Disclosure of Invention
The invention aims to provide an image classification method based on an Echo State Network, which classifies images with an Echo State Network (ESN) and thereby avoids the high complexity and low speed of convolutional networks.
In order to achieve the above purpose, the invention provides the following technical scheme:
the invention provides an image classification method based on an Echo State Network, which comprises the following steps:
s1, constructing a network structure: the Echo State Network consists of three parts, including input layer with K neurons and one bias unit, reserve pool with L neurons and output layer with N neuronsx×NxThe various parameters of the reserve pool are set as: the spectrum radius lambda of the reserve pool IS 1.25, the size Nx of the reserve pool IS 2000, the scale IS of the input unit of the reserve pool IS 1, the sparsity SD of the reserve pool IS 1e-8, the leakage rate alpha IS 0.15, and the number of idle running images accounts for 10%;
s2, data transmission and updating in training: when training is started, idling needs to be carried out for a certain time to form a stable internal state, and then the output is stored in a matrix;
s3, outputting training; the output y (n) has a dimension for each class, and ylabel(n) equals 1 in the dimension corresponding to the correct class, and equals zero everywhere else;
s4, image classification processing: the classification values are mapped to integer values, each of which is represented as a binary vector, all of which are 0 values except for an integer having an index of 1.
Further, the node state update equation of the ESN neural network is x(t+1) = f(W_in·u(t+1) + W·x(t)), where x(t) is the node state, u(t) is the input signal, t is the time step, and W_in is the input connection weight matrix.
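A direct transcription of this update equation might look as follows; the function name and the default activation are our choices (the patent selects a sigmoid later, see Table 1).

```python
import numpy as np

def esn_update(x_t, u_next, W_in, W, f=np.tanh):
    """One step of x(t+1) = f(W_in * u(t+1) + W * x(t)).

    x_t: current node state (Nx,); u_next: next input u(t+1) (K,);
    W_in: input connection weights (Nx x K); W: internal weights (Nx x Nx).
    """
    return f(W_in @ u_next + W @ x_t)
```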
Further, the activation function f in the reserve pool is a sigmoid function.
Further, the image classification process is formulated as
Figure BDA0003169229060000031
yj(m) denotes the j-th dimension of y (m), ω denotes a time interval, ξ y denotes the average value of y over this interval.
Further, in the image classification processing step, the data are normalized: a dimensional expression is converted into a dimensionless scalar by the formula a_norm = (a - a_min)/(a_max - a_min), where a is the data to be normalized and a_norm is the normalized value of a.
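A one-function sketch of this min-max normalization; the function name is ours.

```python
import numpy as np

def minmax_normalize(a):
    """a_norm = (a - a_min) / (a_max - a_min), mapping the data into [0, 1]."""
    a_min, a_max = a.min(), a.max()
    return (a - a_min) / (a_max - a_min)

# e.g. minmax_normalize(np.array([3.0, 5.0, 9.0])) -> array([0., 0.333..., 1.])
```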
Compared with the prior art, the invention has the beneficial effects that:
according to the image classification method based on the Echo State Network, provided by the invention, the problems of high complexity and low speed of a convolution Network in use are solved by using an Echo State Network (ESN) to classify and process images.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description are obviously only some embodiments of the present invention, and other drawings can be derived from them by those skilled in the art.
Fig. 1 is a diagram of an ESN network structure and its parameters according to an embodiment of the present invention.
FIG. 2 shows the effect of the pool size on the Micro-F1 Score of MNIST data set classification, according to an embodiment of the present invention.
Fig. 3 shows the classification results of the echo state network on the MNIST data set according to an embodiment of the present invention, where TP denotes positive cases classified correctly (true positives), FP denotes cases incorrectly classified as positive (false positives), and FN denotes positive cases incorrectly classified as negative (false negatives).
Detailed Description
The ESN is a simple three-layer recurrent neural network. In recent years many researchers have studied network structures such as CNN, LSTM, and Capsule Net, and excellent results in the field keep appearing. In the present invention, however, we study the ESN, a simple network that has not been deeply mined for image processing and that holds an advantage over CNNs in training time cost.
In the present invention we performed several trials and tests aimed at further exploring the performance of ESNs. First, data prediction with the ESN verified its superiority in sequence prediction; then the ESN was applied to images, and the accuracy was gradually improved through parameter tuning, feature extraction, and similar processes, yielding some regular findings.
In terms of accuracy, because the ESN is a simple three-layer network, performance improves as the number of neural nodes in the reserve pool grows, but only up to a threshold D: once N exceeds D, the accuracy improves more slowly, and it never matches a convolutional network. We summarize the reasons for the weaker performance as follows. 1) The ESN uses more weight and bias parameters than a convolutional network, which benefits from (a) parameter sharing, where a feature detector useful in one part of an image is also useful in another part, and (b) sparse connections, where each output value in a layer depends on only a small number of inputs. 2) The drawback of an RNN: when pictures are fed into the structure they are regularly broken up and rearranged, which destroys the overall feature structure; the memory of the ESN offers no advantage on single, unrelated pictures, only on time-dependent sequences.
The ESN is further described below. The ESN consists of three parts, an input layer, a reserve pool, and an output layer, where the input layer has K neurons and the output layer has L neurons, with states e(m) ∈ R^K and y(m) ∈ R^L respectively, m = 1, 2, 3, … being the discrete time index over the training data points. The intermediate reserve pool is an Nx×Nx matrix. Feeding the input data e(m) into the ESN and updating the reservoir internal state x(m) can be regarded as mapping data from a low-dimensional space into a high-dimensional space.
The most important structure in the ESN is the intermediate reservoir, whose performance is determined by several parameters: the spectral radius λ, the reservoir size Nx, the reservoir input unit scale IS, the reservoir sparsity SD, and the leak rate α. Wherein:
the spectral radius λ is an important parameter to ensure that the echo state network has echo state properties. The spectral radius of the reservoir's connection matrix W is the maximum absolute value of W, which is the width of the matrix W or non-zero elements. The size of the spectral radius affects the speed of the state over time, i.e. the memory strength of the network and the stability of the reservoir. λ <1 is a necessary condition to ensure network stability.
The pool size Nx is the number of neurons in the reservoir; it is related to the number of samples and has a large impact on network performance. A large reservoir space helps provide the linearly combined signal ylabel(t). Provided proper regularization measures prevent overfitting, the larger the reservoir, the more accurately the ESN describes a given dynamical system, although the computational cost rises accordingly. In general, good parameter settings carry over to larger reservoirs, so when choosing a reservoir size it saves time to test and tune the parameters on a small reservoir and then expand it.
The reservoir input unit scale IS is a scale factor applied to the input signal before it is connected to the reservoir neurons, i.e. the input signal is scaled to a certain extent. In general, the stronger the nonlinearity of the object being processed, the larger IS should be. For a uniformly distributed W_input the scaling value defines the range [-s, s]; for a normally distributed W_input it is measured by the standard deviation. To reduce the number of parameters that need adjusting, all columns of W_input are scaled with a single value. The first column corresponds to the bias input unit of the pool and can therefore be split out and scaled separately.
The sparsity SD of the reserve pool describes the connections between the reservoir neurons: not all neurons are connected. SD is the percentage of interconnected neurons among the total N neurons (SD = m/N, with m the number of interconnected neurons); the larger SD is, the stronger the ability to approximate nonlinearity. Sparsity generally has little impact on performance and SD is not optimized first, but a sparse matrix allows the reserve pool to be updated quickly, as the sketch below illustrates.
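As an illustration of why sparsity helps, the sketch below builds the reservoir with scipy.sparse so that the product W·x stays cheap in the update loop; the weight range, seed, and dense eigenvalue computation are our choices, not prescribed by the patent.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse import csr_matrix

def make_sparse_reservoir(nx, sd, lam, seed=0):
    """Random reservoir with connection density sd, rescaled to spectral radius lam."""
    rng = np.random.default_rng(seed)
    W = sparse_random(nx, nx, density=sd, random_state=rng,
                      data_rvs=lambda n: rng.uniform(-0.5, 0.5, n))
    W = csr_matrix(W)  # CSR keeps W @ x cheap in the update loop
    if W.nnz:
        # Dense eigvals is fine at this scale and avoids ARPACK convergence issues
        rho = np.abs(np.linalg.eigvals(W.toarray())).max()
        if rho > 0.0:
            W = W.multiply(lam / rho).tocsr()
    return W
```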
The leak rate α is a parameter that must be tuned by trial and error; it governs the dynamic update of the reservoir. In continuous time the state update can be expressed as formula (1); discretized, it becomes formula (2):

dx/dt = -α·x(t) + f(W_input·[1; e(t)] + W·x(t))   (1)

x(t+1) = (1 - α)·x(t) + α·f(W_input·[1; e(t+1)] + W·x(t))   (2)
The pool parameters have a very large influence on the output weights (Wout); three main parameters matter: the spectral radius λ of the reservoir, the size of the reservoir, and the scaling of the reservoir. The spectral radius of the reserve pool is the absolute value of the largest eigenvalue of the internal matrix; it is generally less than 1 but can be adjusted according to the actual situation. The size of the reserve pool is one of the main factors influencing performance: while the pool is small, performance improves as the size grows, but a larger pool also adds computation and increases training time. When the pool becomes too large, performance not only improves slowly but can even degrade. The scaling of the pool can be used to resize the values of the input weight matrix and thus limit the data entering the pool.
For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.
The network structure of the invention is shown in Figure 1 and comprises an input layer, a reserve pool, and an output layer. The input layer has K neurons and a bias unit whose value is set to 1, the output layer has L neurons, and the reserve pool is an Nx×Nx matrix. When the ESN is used for static classification rather than time-series prediction, the continuous input signal is replaced by discrete image samples. Therefore the input of one picture does not influence the training of the next picture, and there is no feedback of output neurons into the pool.
The parameters of the reserve pool of the invention are set as follows: the spectral radius λ of the reserve pool is 1.25, the size Nx of the reserve pool is 2000, the scale IS of the reserve pool input unit is 1, the sparsity SD of the reserve pool is 1e-8, the leak rate α is 0.15, and the idle-run images account for 10% of the data.
The data transmission and update rules during training are given by equations (3) and (4), where x̃(m) represents the update, x(m) is the activation vector of the pool neurons, f is the activation function, and [·;·] denotes vertical vector (or matrix) concatenation. α is the leak rate and should lie between 0 and 1. W_input is an Nx×(1+Ke) matrix and W is an Nx×Nx matrix. A sketch of this update loop follows the equations.

x̃(m) = f(W_input·[1; e(m)] + W·x(m-1))   (3)

x(m) = (1 - α)·x(m-1) + α·x̃(m)   (4)
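Read concretely, equations (3) and (4) can be run column by column to harvest the state matrix, as in the sketch below; the function and variable names are ours, and the sigmoid default reflects the activation function the invention selects in Table 1.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def harvest_states(E, W_in, W, alpha, f=sigmoid):
    """Run equations (3)-(4) over the input columns E (shape K x T) and
    collect the concatenated vectors [1; e(m); x(m)] as columns of Q."""
    nx = W.shape[0]
    K, T = E.shape
    x = np.zeros(nx)                                  # initial reservoir state
    Q = np.empty((1 + K + nx, T))
    for m in range(T):
        u = np.concatenate(([1.0], E[:, m]))          # [1; e(m)]
        x_tilde = f(W_in @ u + W @ x)                 # equation (3)
        x = (1.0 - alpha) * x + alpha * x_tilde       # equation (4)
        Q[:, m] = np.concatenate((u, x))              # column [1; e(m); x(m)]
    return Q
```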
The mathematical expression in equation (5) is the linear readout layer of the echo state network. Wout, from the reservoir layer to the output layer, is an Ly×(1+Ke+Nx) matrix, since the input layer is a Ke-dimensional vector (plus bias) and the reservoir layer is an Nx-dimensional vector.
y(m) = Wout·[1; e(m); x(m)]   (5)
The most important part of the echo state network is the training of the output layer. Through training, the output layer makes Y approach Ylabel, thereby improving the accuracy of the network structure. The output is calculated by equation (6), where Y is the set of all y(m) and Q ∈ R^((1+Ke+Nx)×t) collects all vectors [1; e(m); x(m)], all generated by the input unit e(m) through the intermediate layer. In training, linear regression is generally used, such as the ridge regression algorithm in equation (7), where Q is treated as a matrix, μ is the regularization coefficient, and A is the identity matrix. A sketch of this computation follows the equations.
Y = Wout·Q   (6)

Wout = Ylabel·Q^T·(Q·Q^T + μA)^(-1)   (7)
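A minimal sketch of the ridge readout of equation (7). The regularization coefficient μ is not specified in the patent, so the default below is an assumed placeholder, and a linear solve replaces the explicit inverse for numerical stability.

```python
import numpy as np

def train_readout(Y_label, Q, mu=1e-6):
    """Equation (7): Wout = Ylabel Q^T (Q Q^T + mu*A)^(-1).

    Y_label: (Ly x T) one-hot targets; Q: ((1+Ke+Nx) x T) collected states;
    mu is the regularization coefficient (value assumed, not from the patent).
    """
    A = np.eye(Q.shape[0])  # identity matrix A
    M = Q @ Q.T + mu * A
    # Solve Wout * M = Ylabel Q^T instead of forming an explicit inverse
    return np.linalg.solve(M.T, (Y_label @ Q.T).T).T
```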
During training, to evaluate the quality of the training it is necessary to observe and adjust Wout. When as much feature extraction as possible is performed on the input data before training, the result is often overfitting; regularization is an effective way to solve this problem. In addition, if the output weights are too large, the nuances of x(m) are amplified and training becomes overly sensitive. In an ESN the output becomes the next input, so this sensitivity increases the instability of the system and harms training. For this case the Tikhonov regularization term μA is used to compensate, as in equation (8), although the ridge regression algorithm performs better:

Wi-out = argmin [ Σ_m (yi(m) - yi-label(m))² + μ·||Wi-out||² ]   (8)

where Wi-out is the i-th row of Wout and ||·|| is the Euclidean norm. The term μ·||Wi-out|| is the so-called penalty term of the regularization, which imposes a constraint on Wout alongside the squared error between y(t) and ylabel(t). The expression is the sum of two objectives, the error and the weights, whose combined value must be minimized; the value of μ plays a decisive role in the ratio between these two targets. When μ is zero, the regularization vanishes and the ridge regression reduces to ordinary linear regression.
In the invention, the reserve pool weights and input weights of the echo state network are randomly generated. At the very start of training the internal state of the reserve pool changes constantly, which greatly affects the output, so the network must idle for a certain time to form a stable internal state before the outputs are stored in the matrix; a small continuation sketch follows the equation below.
x(t+1) = f(W_in·u(t+1) + W·x(t))
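Continuing the sketches above, the idle run can be implemented by discarding the first 10% of the collected columns before training the readout; the names Q, Y_label, WASHOUT_FRAC, and train_readout refer to the earlier illustrative snippets.

```python
# The first 10% of the presented samples form the idle run: their states
# settle the reservoir and are discarded before the readout is trained.
n_washout = int(WASHOUT_FRAC * Q.shape[1])
Q_train = Q[:, n_washout:]
Y_train = Y_label[:, n_washout:]
W_out = train_readout(Y_train, Q_train)
```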
The activation function f in the reserve pool strongly influences the next state of the pool. The three commonly used activation functions are the tanh, sigmoid, and softmax functions; each was tested under otherwise identical conditions at different reserve pool sizes, with the results shown in Table 1.
TABLE 1 Effect of the activation function on Micro-F1 Score
(Table 1 is reproduced as an image in the original publication; the individual scores are not recoverable from the text.)
As can be seen from Table 1, the sigmoid function scores highest throughout, the tanh function is slightly inferior to it, and the softmax function performs worst; therefore the activation function selected in the present invention is the sigmoid function.
Since the input of an ESN is generally a one-dimensional timing signal, the image data set must be processed accordingly before classification. First, the original image pixel values are linearly mapped into [0,1] by normalization, then standardized to mean 0 and variance 1 so that the pixel values follow a standard normal distribution. The image data set is stacked horizontally, i.e. the images are arranged from left to right into a matrix with many columns, while the image labels are one-hot encoded, transposed to match the matrix, and likewise stacked horizontally into a new matrix.
After this processing, the image matrix and the label matrix correspond column by column: each column of the image matrix is one pixel column of an image, and each column of the label matrix is the one-hot label value of that image. Each pixel column input to the ESN at one time still corresponds to one label, i.e. a multi-dimensional input maps to a single label, which satisfies the ESN's linear input characteristic and enables the image classification operation.
When classifying an image, the invention outputs y(n) with one dimension for each class, and ylabel(n) equals 1 in the dimension corresponding to the correct class and zero everywhere else, as expressed by the formula below. In our experiments on handwritten digit pictures, one-hot encoding (1-bit-effective encoding) uses an N-bit state register to encode N states; each state has its own register bit and only one bit is active at any time. First the classification values are mapped to integer values; each integer value is then represented as a binary vector that is 0 everywhere except for a 1 at the integer's index. A sketch of this preprocessing follows.
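A sketch of this preprocessing under our assumptions (square images, integer labels); prepare_dataset and its shapes are illustrative names, not from the patent.

```python
import numpy as np

def prepare_dataset(images, labels, n_classes=10):
    """Stack images column-wise and repeat each one-hot label per pixel column.

    images: (n, H, W) array of normalized images; labels: (n,) integer array.
    Returns E of shape (H, n*W) and Y_label of shape (n_classes, n*W).
    """
    n, H, Wd = images.shape
    E = np.hstack(list(images))                 # images arranged left to right
    one_hot = np.eye(n_classes)[labels]         # (n, n_classes) one-hot rows
    Y_label = np.repeat(one_hot.T, Wd, axis=1)  # same label for every pixel column
    return E, Y_label
```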
ξy = (1/|ω|) · Σ_{m∈ω} y(m)

where yj(m) denotes the j-th dimension of y(m) and ω denotes a time interval, so that ξy denotes the average value of y over this interval. Thus ξx and ξy can be used in place of x(m) and y(m) in the experiments. Correspondingly, ξx represents the average of [1; e(m); x(m)] over the time interval ω. Since y(m) may deviate from ylabel, we only need ξy to approach ylabel; using ξx we multiply by Wout once, and evaluating F(ξy, ylabel) over the interval ω is easier than evaluating F(y(m), ylabel(m)) for every m in ω.
ξx = (1/|ω|) · Σ_{m∈ω} [1; e(m); x(m)]

To preserve the weights of different times in the sequence, the averages ξ1x, ξ2x, …, ξjx over different time periods ω1, ω2, …, ωj can be concatenated into ξx, where ξx = [ξ1x; ξ2x; …; ξjx]. Different features in different feature vectors often have different dimensions and dimensional units, which affects the result of data analysis. To eliminate the dimensional influence between indicators, the data are normalized: a dimensional expression is converted into a dimensionless scalar by the formula a_norm = (a - a_min)/(a_max - a_min). A small sketch of the interval averaging is given below.
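A small sketch of the interval averaging described here; the helper names and the bounds convention are ours.

```python
import numpy as np

def interval_average(M, start, stop):
    """xi: the average of the columns of M over the interval [start, stop)."""
    return M[:, start:stop].mean(axis=1)

def stacked_interval_average(M, bounds):
    """Concatenate averages over consecutive sub-intervals:
    xi_x = [xi_1 x; xi_2 x; ...; xi_j x]."""
    return np.concatenate([interval_average(M, a, b)
                           for a, b in zip(bounds[:-1], bounds[1:])])
```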
In the embodiment of the invention, the data set used is the MNIST handwritten digit image data set. The data set is normalized and the images are standardized to mean 0 and variance 1, subtracting the mean and then dividing by the standard deviation, which allows the bias term in the network to take effect and improves the convergence rate of the model.
The spectral radius of the reservoir was fixed at 1.25 and the scaling was 1 (i.e. no scaling); the effect of the reservoir size on the Micro-F1 Score of MNIST data set classification was explored by increasing the size of the reservoir, and the experimental results are shown in Fig. 2.
As can be seen from Fig. 2, the pool size initially has a large effect on the Micro-F1 Score: the score increases significantly as the pool size grows, but once the pool size reaches 1400 the improvement slows markedly.
Because of the nature of the echo state network, the input weight matrix, the internal reserve pool matrix, and the initial state matrix are all generated randomly, which affects network training and classification and makes the accuracy fluctuate. This cannot be avoided; each run carries a slight error, and as long as the error is small it does not affect what the parameter choices contribute to the network. At the same time, these characteristics of the echo state network also reduce training time, making it faster than a convolutional neural network, although its final Micro-F1 is inferior to that of a convolutional network.
Fig. 3 shows the per-digit classification counts of the MNIST data set under the echo state network. The classification results differ greatly between digits: the accuracy for the digit "5" is much lower than for the digit "1", possibly because after a "5" is split into 28 columns the features between the columns are not very distinctive, so more "5"s are assigned to other categories and the accuracy drops. Overall, however, the number of true positives (TP) is much higher than the numbers of false positives (FP) and false negatives (FN); the classification performance of the echo state network is admittedly well below that of current Convolutional Neural Networks (CNN), but for a traditional neural network the effect is still considerable.
The image classification method of the invention exploits the ESN's simple structure, speed, and memory to classify images, and realizes multi-input ESN classification of time sequences. It holds a time advantage over conventional convolutional networks and over conventional single-input ESN processing of time series.
In addition, the Long Short-Term Memory network (LSTM) is a special type of RNN that can learn long-term dependency information. The single-layer test of the present invention is more accurate (greater than 85%) than LSTM image classification.
The above examples are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their technical features, and such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. An image classification method based on an Echo State Network, characterized by comprising the following steps: constructing a network structure, transmitting and updating data during training, training the output, and performing image classification, wherein the parameters of the reserve pool are set as follows: the spectral radius λ of the reserve pool is 1.25, the size Nx of the reserve pool is 2000, the scale IS of the reserve pool input unit is 1, the sparsity SD of the reserve pool is 1e-8, the leak rate α is 0.15, and the idle-run images account for 10% of the data.
2. The image classification method based on an Echo State Network according to claim 1, characterized in that in the step of constructing the network structure, the Echo State Network consists of three parts, namely an input layer, a reservoir, and an output layer, wherein the input layer has K neurons and a bias unit, the output layer has L neurons, and the reservoir is an Nx×Nx matrix.
3. The image classification method based on an Echo State Network according to claim 1, characterized in that in the data transmission and update step during training, when training has just started, the network idles for a certain time to form a stable internal state, and the outputs are then stored in the matrix.
4. The image classification method based on an Echo State Network according to claim 1, characterized in that the node state update equation of the ESN neural network is x(t+1) = f(W_in·u(t+1) + W·x(t)), where x(t) is the node state, u(t) is the input signal, t is the time step, and W_in is the input connection weight matrix.
5. The image classification method based on an Echo State Network according to claim 1, characterized in that the activation function f in the pool is a sigmoid function.
6. The image classification method based on an Echo State Network according to claim 1, characterized in that in the output training step, the output y(n) has one dimension for each class, and ylabel(n) equals 1 in the dimension corresponding to the correct class and zero everywhere else.
7. The image classification method based on an Echo State Network according to claim 1, characterized in that in the image classification processing, the classification values are mapped to integer values, and each integer value is represented as a binary vector that is 0 everywhere except for a 1 at the integer's index.
8. The image classification method based on an Echo State Network according to claim 1, characterized in that the formula of the image classification processing is

ξy = (1/|ω|) · Σ_{m∈ω} y(m)

where yj(m) denotes the j-th dimension of y(m), ω denotes a time interval, and ξy denotes the average value of y over this interval.
9. The image classification method based on an Echo State Network according to claim 1, characterized in that in the image classification processing step, the data are normalized, converting a dimensional expression into a dimensionless scalar by the formula a_norm = (a - a_min)/(a_max - a_min).
CN202110813222.5A 2021-07-19 2021-07-19 Image classification method based on Echo State Network Pending CN113469271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813222.5A CN113469271A (en) 2021-07-19 2021-07-19 Image classification method based on Echo State Network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110813222.5A CN113469271A (en) 2021-07-19 2021-07-19 Image classification method based on Echo State Network

Publications (1)

Publication Number Publication Date
CN113469271A true CN113469271A (en) 2021-10-01

Family

ID=77880977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813222.5A Pending CN113469271A (en) 2021-07-19 2021-07-19 Image classification method based on Echo State Network

Country Status (1)

Country Link
CN (1) CN113469271A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10514694B1 (en) * 2015-07-21 2019-12-24 Hrl Laboratories, Llc System and method for classifying agents based on agent movement patterns
CN109155136A (en) * 2016-04-01 2019-01-04 奥誓公司 The computerized system and method for highlight are detected and rendered automatically from video
US20180322393A1 (en) * 2017-05-02 2018-11-08 Stmicroelectronics S.R.L. Neural network, corresponding device, apparatus and method
CN107194437A (en) * 2017-06-22 2017-09-22 重庆大学 Image classification method based on Gist feature extractions Yu conceptual machine recurrent neural network
CN108197653A (en) * 2018-01-03 2018-06-22 华南理工大学 A kind of time series classification method based on convolution echo state network
US20200342290A1 (en) * 2018-06-15 2020-10-29 Deep Insight Solutions, Inc. d/b/a/ Lucd Optimistic Execution of Reservoir Computing Machine Learning Models
CN109919188A (en) * 2019-01-29 2019-06-21 华南理工大学 Timing classification method based on sparse local attention mechanism and convolution echo state network
CN111553415A (en) * 2020-04-28 2020-08-18 哈尔滨理工大学 Memristor-based ESN neural network image classification processing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘鑫, 鲍长春: "基于回声状态网络的音频频带扩展方法" [Audio bandwidth extension method based on echo state network], 《电子学报》 (Acta Electronica Sinica), pages 2 *
李军, 岳文琦: "基于泄漏积分型回声状态网络的软测量动态建模方法及应用" [Soft-sensor dynamic modeling method based on leaky-integrator echo state network and its application], 《化工学报》 (CIESC Journal), pages 2 *

Similar Documents

Publication Publication Date Title
Park et al. 7.6 A 65nm 236.5 nJ/classification neuromorphic processor with 7.5% energy overhead on-chip learning using direct spike-only feedback
Chang et al. AntisymmetricRNN: A dynamical system view on recurrent neural networks
Fung et al. Jfb: Jacobian-free backpropagation for implicit networks
Ročková et al. Posterior concentration for Bayesian regression trees and forests
Lee et al. Deep asymmetric multi-task feature learning
CN112115998B (en) Method for overcoming catastrophic forgetting based on anti-incremental clustering dynamic routing network
CN109389151B (en) Knowledge graph processing method and device based on semi-supervised embedded representation model
JP2019032808A (en) Mechanical learning method and device
WO2020095321A2 (en) Dynamic structure neural machine for solving prediction problems with uses in machine learning
CN113962358A (en) Information diffusion prediction method based on time sequence hypergraph attention neural network
Zhang et al. An in-memory-computing DNN achieving 700 TOPS/W and 6 TOPS/mm 2 in 130-nm CMOS
CN112800344B (en) Deep neural network-based movie recommendation method
CN111027619A (en) Memristor array-based K-means classifier and classification method thereof
Liu et al. EACP: An effective automatic channel pruning for neural networks
CN113826117A (en) Efficient binary representation from neural networks
CN113627471A (en) Data classification method, system, equipment and information data processing terminal
Wang et al. Echo state network with a global reversible autoencoder for time series classification
Al-sharoa et al. Community detection in networks through a deep robust auto-encoder nonnegative matrix factorization
CN114757169A (en) Self-adaptive small sample learning intelligent error correction method based on ALBERT model
Lyu et al. Designing efficient bit-level sparsity-tolerant memristive networks
Sun et al. A fuzzy brain emotional learning classifier design and application in medical diagnosis
Tang et al. HAWIS: Hardware-Aware automated WIdth Search for accurate, energy-efficient and robust binary neural network on ReRAM dot-product engine
Chu et al. On regularized square-root regression problems: distributionally robust interpretation and fast computations
CN113469271A (en) Image classification method based on Echo State Network
Wu et al. Fault diagnosis of TE process based on incremental learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination