CN111652271A - Nonlinear feature selection method based on neural network - Google Patents

Nonlinear feature selection method based on neural network

Info

Publication number
CN111652271A
CN111652271A (application CN202010331361.XA)
Authority
CN
China
Prior art keywords
neural network
function
norm
feature selection
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010331361.XA
Other languages
Chinese (zh)
Inventor
朱建勇
杨辉
黄鑫
聂飞平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Jiaotong University
Original Assignee
East China Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Jiaotong University
Priority to CN202010331361.XA
Publication of CN111652271A
Legal status: Pending

Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 18/00: Pattern recognition
            • G06F 18/20: Analysing
              • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F 18/211: Selection of the most significant subset of features
                  • G06F 18/2113: Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
                • G06F 18/213: Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
                  • G06F 18/2136: Feature extraction based on sparsity criteria, e.g. with an overcomplete basis
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/04: Architecture, e.g. interconnection topology
                • G06N 3/045: Combinations of networks
                • G06N 3/048: Activation functions
              • G06N 3/08: Learning methods
                • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a nonlinear feature selection method based on a neural network. It addresses two problems: an unsupervised learning method that only analyzes the relationships among features cannot exploit the important information carried by sample labels; and industrial processes are nonlinear, strongly coupled, and subject to time lags, so accurate feature weights cannot be obtained from a linear error function. A neural network error function is proposed to replace the linear error function of the sparse regularization model, and a group-sparse constraint is imposed on the input-layer weights of the network according to the complexity of the network weights, improving the prediction accuracy of the sparse regularization model on nonlinear problems. In addition, when solving the neural network, the L_{2,1} norm is used in the error function, reducing the influence of outliers on the feature selection result.

Description

Nonlinear feature selection method based on neural network
Technical Field
The invention relates to the field of feature selection in engineering, and particularly to a nonlinear feature selection method that enhances the robustness of a feature selection model by adjusting a neural network regression function.
Background
With the continuous improvement of the automation level of industrial processes, more and more control tasks have shifted from manual control to computer monitoring, and machine vision has become an important component of these processes. Owing to its efficiency, accuracy, and objectivity, machine vision helps address problems such as high labor costs, difficulty of real-time monitoring, and low control precision, and is widely used in soft sensing, fault diagnosis, product classification, and other industrial applications. However, during image digitization in machine vision, high-dimensional samples inevitably suffer from the curse of dimensionality and data noise. To avoid the curse of dimensionality and reduce the influence of noise, researchers have proposed feature selection techniques, which remove irrelevant or redundant features to reduce model complexity and improve learning performance.
At present, most feature selection methods commonly used in industry are filter methods: by analyzing the correlation and divergence among features they retain most of the feature information, but they ignore the label information of the samples and are therefore unsupervised learning methods. The sparse regularization model is a simple and efficient feature selection method: it establishes a linear error function between the features and the label quantity while imposing a sparse constraint on the feature weights, and obtains the optimal feature selection result by solving the weight error function under this constraint. However, industrial processes are nonlinear, strongly coupled, and subject to time lags, and the linear error function of the classical sparse regularization model cannot accurately describe their feature relationships, so the resulting feature selection is not accurate enough.
Disclosure of Invention
In order to overcome the defects of the existing method, the invention provides a nonlinear feature selection method based on a neural network, which is called RSBP for short.
The invention aims to solve two problems: an unsupervised learning method that only analyzes the relationships among features cannot exploit the important information carried by sample labels; and industrial processes are nonlinear, strongly coupled, and subject to time lags, so accurate feature weights cannot be obtained from a linear error function. A neural network error function is proposed to replace the linear error function of the sparse regularization model, and a group-sparse constraint is imposed on the input-layer weights of the network according to the complexity of the network weights, improving the prediction accuracy of the sparse regularization model on nonlinear problems. In addition, when solving the neural network, the L_{2,1} norm is used in the error function, reducing the influence of outliers on the feature selection result.
The technical scheme of the invention is as follows:
a nonlinear feature selection method based on a neural network comprises the following steps: (1) neural network embedding: replacing a linear error function of the sparse regularization model with a neural network error function, and simultaneously carrying out group sparse constraint on weights of input layers of the neural network to establish a nonlinear feature selection model; (2) and (3) robustness optimization: by means of L2,1Robustness of the norm, robustness optimization is carried out on the neural network, and an RSBP target function is established; (3) and (3) optimizing the strategy: introduction of class L2,1And (3) solving an iterative function of the neural network according to a projection gradient descent method to obtain an optimal weight matrix, and taking the sum of absolute values of weight vectors corresponding to each input layer neuron as a characteristic importance index to solve the provided problem.
In the above nonlinear feature selection method, step (1) proceeds as follows:
the classical sparse regularization model (Lasso) can be understood to satisfy L1Solving β the optimal solution of the linear regression problem under the norm constraint condition, where y ═ y1,…,yN) Representing an N-dimensional response vector, the input quantity X is an N × p matrix, and a lagrange function can be constructed as follows:
Figure BDA0002465054010000021
Replacing the linear error function of the sparse regularization model with a neural network error function, and imposing a group-sparse constraint on the input-layer weights according to the complexity of the network weights, yields the nonlinear feature selection model:
\min_{W,b} \; E(y, a^M) + \lambda \, h(W^1), \qquad h(W^1) = \sum_{j=1}^{p} \|w_j\|_2    (2)

a^m = f^{m-1}\big((W^{m-1})^T a^{m-1} + b^{m-1}\big)    (3)

where E denotes the error function, h the group-sparse constraint function, w_j the weight vector of the j-th input feature, f the activation function, λ the sparse coefficient, M the number of layers of the neural network, a^m the output of the m-th layer, and b^m the bias of the m-th layer.
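As an illustration of the layer recursion in equation (3), the following minimal sketch computes the network output for a batch of samples; it is an assumption-laden illustration rather than part of the patent text, and the names `forward`, `weights`, `biases`, and `activations` are ours:

```python
import numpy as np

def forward(a1, weights, biases, activations):
    """Forward pass per equation (3): a^m = f^{m-1}((W^{m-1})^T a^{m-1} + b^{m-1}).

    a1: input features, one column per sample (shape p x N).
    weights[k], biases[k]: parameters of layer k; activations[k]: its activation f.
    """
    a = a1
    for W, b, f in zip(weights, biases, activations):
        a = f(W.T @ a + b[:, None])  # net input of the next layer, then elementwise activation
    return a
```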
Gradient vanishing and gradient explosion are phenomena that a deep neural network must avoid during solving, so attention must be paid to the hidden-layer activation function when setting the network parameters; for example, the ELU (Exponential Linear Unit) function can be adopted as the activation function:

f(x) = x \ \text{for}\ x > 0, \qquad f(x) = \alpha\,(e^{x} - 1) \ \text{for}\ x \le 0    (4)

The advantage of this function is that it prevents the gradient from vanishing in extreme cases while keeping the function continuous and differentiable, which facilitates the subsequent solution.
In the above nonlinear feature selection method, step (2) proceeds as follows: the L_{2,1} norm is more robust than the L_1 norm, so a robustness constraint can be added to the original sparse neural network to improve the accuracy of feature selection:

\min_{W,b} \; \|y - a^M\|_{2,1} + \lambda \sum_{j=1}^{p} \|w_j\|_2    (5)
the nonlinear feature selection method based on the neural network, wherein the step (3): due to L2,1Norm is a non-smooth function, and a class L needs to be established2,1And (3) a smooth function of the norm, solving an iterative function of the neural network according to a projection gradient descent method, taking the sum of absolute values of weight vectors corresponding to neurons of each input layer as a characteristic importance index, and carrying out the following process:
due to, L2,1There is an immeasurable point in the norm, which increases the difficulty of the solution process. To solve this problem, we introduce an LR,1Norm (class L)2,1Norm) that makes the entire function differentiable by adding a smoothing term at points that are not differentiable. L isR,1The norm is as follows:
Figure BDA0002465054010000033
Figure BDA0002465054010000046
the minimum value is expressed, and the above formula is easy to see when
Figure BDA0002465054010000047
When L isR,1Norm and L2,1The norm is equivalent. At the same time, due to the addition of a minimum value
Figure BDA0002465054010000048
It can be ensured that the derivative of the function is not 0, i.e. differentiable within the whole domain of definition;
To solve this expression, assume the number of neural network layers is 4 with the ELU function as the activation function, and denote the network structure by L-i-j (input dimension L, hidden-layer sizes i and j). Replacing the L_{2,1} norm with the L_{R,1} norm leads to the following objective:

\min_{W,b} \; \|y - a^4\|_{R,1} + \lambda \|W^1\|_{R,1}    (7)
Taking the overall derivative of the objective function J yields the weight update formula:

W^m \leftarrow W^m - \eta \, \frac{\partial J}{\partial W^m}    (8)
Compute the neuron outputs o^m of each layer:

n^m = (W^{m-1})^T o^{m-1} + b^{m-1}, \qquad o^m = f^{m-1}(n^m)    (9)
Then compute the partial derivatives of each layer backward from the error:

\frac{\partial J}{\partial W^{m-1}} = o^{m-1} (s^m)^T, \qquad \frac{\partial J}{\partial b^{m-1}} = s^m    (10)
Let s^m denote the sensitivity:

s^m = \frac{\partial J}{\partial n^m}    (11)
Introduce the derivative matrix:

F^m(n^m) = \mathrm{diag}\big(f'(n^m_1), f'(n^m_2), \ldots, f'(n^m_{S_m})\big)    (12)
This yields the backward recursion for the sensitivities:

s^m = F^m(n^m)\, W^m\, s^{m+1}    (13)
Finally, update the weights and bias values:

W^1 \leftarrow W^1 - \eta \big( o^1 (s^2)^T + \lambda \, \partial \|W^1\|_{R,1} / \partial W^1 \big)    (14)

W^m \leftarrow W^m - \eta \, o^m (s^{m+1})^T, \quad m > 1    (15)

b^m \leftarrow b^m - \eta \, s^{m+1}    (16)
The above formulas are iterated, updated, and optimized repeatedly until the stopping condition is met, yielding the optimal input-layer weight matrix of the neural network. Finally, the sum of the absolute values of the weight vector associated with each input-layer neuron is taken as the feature importance index, giving the feature selection result.
In summary, addressing the facts that existing industrial feature selection methods do not utilize sample label information and that industrial processes are nonlinear and strongly coupled, the method replaces the linear error function of the sparse regularization model with a neural network error function and imposes a group-sparse constraint on the input-layer weights according to the complexity of the network weights, improving the prediction accuracy of the sparse regularization model on nonlinear problems. In addition, when solving the neural network, the L_{2,1} norm is used in the error function, reducing the influence of outliers on the feature selection result.
The method is suitable for supervised feature selection of nonlinear processes such as complex industrial processes.
Drawings
FIG. 1 shows the variation of each index of the nonlinear feature selection model under different sparse coefficients: (a) sparse-parameter sensitivity under the MSE index; (b) sparse-parameter sensitivity under the ARE index; (c) sparse-parameter sensitivity under the R^2 index;
FIG. 2 shows the regression results for different selected dimensions: (a) algorithm comparison under the MSE index; (b) algorithm comparison under the ARE index; (c) algorithm comparison under the R^2 index;
FIG. 3 shows the regression results in different dimensions after adding a perturbation: (a) algorithm comparison under the MSE index; (b) algorithm comparison under the ARE index; (c) algorithm comparison under the R^2 index;
Detailed Description
The present invention will be described in detail with reference to specific examples.
First, the neural network is combined with the sparse regularization model: the linear error function of the sparse regularization model is replaced with a neural network error function, a group-sparse constraint is imposed on the input-layer weights according to the complexity of the network weights, and the nonlinear feature selection model is established. Then, the robustness of the L_{2,1} norm is used to optimize the neural network and establish the RSBP objective function. Finally, the iterative function of the neural network is solved by projected gradient descent, and the sum of the absolute values of the weight vector associated with each input-layer neuron is taken as the feature importance index, thereby solving the stated problem. Furthermore, because the L_{2,1} norm is a non-smooth function, a smooth class-L_{2,1} function must be established to avoid non-differentiable points. The technical scheme is described in detail as follows:
(1) neural network embedding:
the classical sparse regularization model (Lasso) can be understood to satisfy L1Solving β the optimal solution of the linear regression problem under the norm constraint condition, where y ═ y1,…,yN) Representing an N-dimensional response vector, the input quantity X is an N × p matrix, and a lagrange function can be constructed as follows:
Figure BDA0002465054010000071
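As a point of reference for equation (1), the classical Lasso baseline can be reproduced in a few lines; the sketch below uses scikit-learn and synthetic data purely for illustration (the library choice, data, and alpha value are our assumptions), with the absolute coefficients |β_j| serving as the linear feature importances:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))                             # N x p input matrix, as in equation (1)
y = X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)   # N-dimensional response vector

lasso = Lasso(alpha=0.1)                 # alpha plays the role of lambda in (1)
lasso.fit(X, y)
importance = np.abs(lasso.coef_)         # |beta_j| as linear feature importance
print(np.argsort(importance)[::-1][:5])  # indices of the five most important features
```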
Replacing the linear error function of the sparse regularization model with a neural network error function, and imposing a group-sparse constraint on the input-layer weights according to the complexity of the network weights, yields the nonlinear feature selection model:
\min_{W,b} \; E(y, a^M) + \lambda \, h(W^1), \qquad h(W^1) = \sum_{j=1}^{p} \|w_j\|_2    (2)

a^m = f^{m-1}\big((W^{m-1})^T a^{m-1} + b^{m-1}\big)    (3)

where E denotes the error function, h the group-sparse constraint function, w_j the weight vector of the j-th input feature, f the activation function, λ the sparse coefficient, M the number of layers of the neural network, a^m the output of the m-th layer, and b^m the bias of the m-th layer.
Gradient vanishing and gradient explosion are phenomena that a deep neural network must avoid during solving, so attention must be paid to the hidden-layer activation function when setting the network parameters; for example, the ELU (Exponential Linear Unit) function can be adopted as the activation function:

f(x) = x \ \text{for}\ x > 0, \qquad f(x) = \alpha\,(e^{x} - 1) \ \text{for}\ x \le 0    (4)

The advantage of this function is that it prevents the gradient from vanishing in extreme cases while keeping the function continuous and differentiable, which facilitates the subsequent solution.
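A minimal sketch of the ELU activation of equation (4) and its derivative follows (the function names and the common choice α = 1.0 are assumptions); the derivative stays strictly positive for x ≤ 0, which is what keeps the gradient from vanishing at the extremes:

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU per equation (4): x for x > 0, alpha*(exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, alpha*exp(x) otherwise (continuous at 0 when alpha = 1)."""
    return np.where(x > 0, 1.0, alpha * np.exp(x))
```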
(2) Robustness optimization:
Measurement disturbances and human errors in complex industrial processes hinder the establishment of a feature selection model, and mathematical experiments by researchers have shown that the L_{2,1} norm is more robust than the L_1 norm. Therefore, a robustness constraint can be added to the original sparse neural network to improve the accuracy of feature selection:

\min_{W,b} \; \|y - a^M\|_{2,1} + \lambda \sum_{j=1}^{p} \|w_j\|_2    (5)
(3) Optimization strategy:
Because the L_{2,1} norm is a non-smooth function, a smooth class-L_{2,1} function must be established; the iterative function of the neural network is then solved by projected gradient descent, and the sum of the absolute values of the weight vector associated with each input-layer neuron is taken as the feature importance index, thereby solving the stated practical industrial problem. The process is as follows:

The L_{2,1} norm contains non-differentiable points, which increases the difficulty of the solution process. To solve this problem, we introduce the L_{R,1} norm (a class-L_{2,1} norm), which makes the whole function differentiable by adding a smoothing term at the non-differentiable points. The L_{R,1} norm is defined as:

\|W\|_{R,1} = \sum_{i} \sqrt{\|w_i\|_2^2 + \varepsilon}    (6)

where ε denotes a small positive constant. From the formula it is easy to see that as ε → 0, the L_{R,1} norm becomes equivalent to the L_{2,1} norm. At the same time, the added small constant ε ensures that the denominator of the derivative is never zero, i.e., the function is differentiable over the whole domain.
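The smoothed norm of equation (6) and its gradient translate directly into code; the sketch below is an illustration under the reconstruction above (the row-wise grouping of W and the name `eps` for ε are assumptions) and checks numerically that the value approaches the L_{2,1} norm as ε → 0:

```python
import numpy as np

def lr1_norm(W, eps=1e-8):
    """L_{R,1} norm per equation (6): sum_i sqrt(||w_i||^2 + eps) over the rows of W."""
    return np.sum(np.sqrt(np.sum(W * W, axis=1) + eps))

def lr1_grad(W, eps=1e-8):
    """Gradient of the L_{R,1} norm: row i is w_i / sqrt(||w_i||^2 + eps).
    The eps term keeps the denominator nonzero, so the gradient exists even for all-zero rows."""
    denom = np.sqrt(np.sum(W * W, axis=1, keepdims=True) + eps)
    return W / denom

W = np.array([[3.0, 4.0], [0.0, 0.0]])
print(lr1_norm(W, eps=1e-12))  # ~5.0: close to the L_{2,1} value sum_i ||w_i||
print(lr1_grad(W))             # finite everywhere, including the zero row
```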
To solve this expression, assume the number of neural network layers is 4 with the ELU function as the activation function, and denote the network structure by L-i-j (input dimension L, hidden-layer sizes i and j). Replacing the L_{2,1} norm with the L_{R,1} norm leads to the following objective:

\min_{W,b} \; \|y - a^4\|_{R,1} + \lambda \|W^1\|_{R,1}    (7)
Taking the overall derivative of the objective function J yields the weight update formula:

W^m \leftarrow W^m - \eta \, \frac{\partial J}{\partial W^m}    (8)
Compute the neuron outputs o^m of each layer:

n^m = (W^{m-1})^T o^{m-1} + b^{m-1}, \qquad o^m = f^{m-1}(n^m)    (9)
Then compute the partial derivatives of each layer backward from the error:

\frac{\partial J}{\partial W^{m-1}} = o^{m-1} (s^m)^T, \qquad \frac{\partial J}{\partial b^{m-1}} = s^m    (10)
Let s^m denote the sensitivity:

s^m = \frac{\partial J}{\partial n^m}    (11)
Introduce the derivative matrix:

F^m(n^m) = \mathrm{diag}\big(f'(n^m_1), f'(n^m_2), \ldots, f'(n^m_{S_m})\big)    (12)
This yields the backward recursion for the sensitivities:

s^m = F^m(n^m)\, W^m\, s^{m+1}    (13)
Finally, update the weights and bias values:

W^1 \leftarrow W^1 - \eta \big( o^1 (s^2)^T + \lambda \, \partial \|W^1\|_{R,1} / \partial W^1 \big)    (14)

W^m \leftarrow W^m - \eta \, o^m (s^{m+1})^T, \quad m > 1    (15)

b^m \leftarrow b^m - \eta \, s^{m+1}    (16)
The above formulas are iterated, updated, and optimized repeatedly until the stopping condition is met, yielding the optimal input-layer weight matrix of the neural network. Finally, the sum of the absolute values of the weight vector associated with each input-layer neuron is taken as the feature importance index, giving the feature selection result.
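Putting the pieces together, the following NumPy sketch runs the whole loop as reconstructed above: a 4-layer ELU network trained by gradient descent on a squared-error term plus the λ·L_{R,1} group-sparse penalty on the input-layer weights, with feature importance taken as the sum of absolute input-layer weights per feature. It is our assumption-laden reading of the method (a plain squared error stands in for the robust error term, and all names are ours), not the patent's reference implementation:

```python
import numpy as np

def elu(x, a=1.0):  return np.where(x > 0, x, a * (np.exp(x) - 1.0))
def delu(x, a=1.0): return np.where(x > 0, 1.0, a * np.exp(x))

def lr1_grad(W, eps=1e-8):
    return W / np.sqrt(np.sum(W * W, axis=1, keepdims=True) + eps)

def rsbp_train(X, y, hidden=(8, 8), lam=1.0, eta=3e-4, iters=4000, tol=0.01, patience=20):
    """Sketch of RSBP: group-sparse input-layer weights trained by backpropagation."""
    rng = np.random.default_rng(0)
    sizes = [X.shape[1], *hidden, 1]
    Ws = [rng.normal(scale=0.1, size=(sizes[m], sizes[m + 1])) for m in range(3)]
    bs = [np.zeros(sizes[m + 1]) for m in range(3)]
    prev, still = np.inf, 0
    for _ in range(iters):
        # forward pass per equation (3); hidden layers use ELU, the output layer is linear
        ns, os = [], [X]
        for m, (W, b) in enumerate(zip(Ws, bs)):
            n = os[-1] @ W + b
            ns.append(n)
            os.append(elu(n) if m < 2 else n)
        err = os[-1].ravel() - y
        J = np.mean(err ** 2) + lam * np.sum(np.sqrt(np.sum(Ws[0] ** 2, axis=1) + 1e-8))
        # backward pass: sensitivities per equations (11)-(13)
        s = (2.0 / len(y)) * err[:, None]        # output-layer sensitivity (linear output)
        for m in range(2, -1, -1):
            gW = os[m].T @ s
            if m == 0:
                gW += lam * lr1_grad(Ws[0])      # group-sparse term, input layer only (eq. 14)
            gb = s.sum(axis=0)
            if m > 0:
                s = (s @ Ws[m].T) * delu(ns[m - 1])  # propagate through the ELU layer
            Ws[m] -= eta * gW                    # updates per equations (14)-(16)
            bs[m] -= eta * gb
        # stop if the objective changes by no more than tol for `patience` iterations
        still = still + 1 if abs(prev - J) <= tol else 0
        if still >= patience:
            break
        prev = J
    importance = np.abs(Ws[0]).sum(axis=1)       # sum of |w| per input-layer neuron
    return importance, Ws, bs
```

Because the group penalty is applied only to the input layer, a feature whose entire weight row is driven toward zero is effectively deselected, which is what makes the input-layer weights usable as an importance index.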
Real froth-image data from the first roughing tank of the copper flotation process of a flotation plant are selected for simulation experiments. After preprocessing, outlier removal, and normalization, 245 groups of copper flotation samples are obtained; each group contains 14 features (mean, peak value, standard deviation, skewness, R, G, B, hue, red component, yellow component, speed, stability, bearing rate, and gray level) and 1 label (mineral grade). According to the distribution characteristics of the mineral grade, 195 groups are selected as training samples and 49 groups as test samples. To make the solution fast and effective, the number of hidden layers is set to 2, the number of neurons to 8, and the learning rate to η = 0.0003. The convergence criterion is set as follows: if the change of the objective function does not exceed 0.01 for 20 consecutive iterations, the objective function is considered converged; the maximum number of iterations is 4000. Solving the objective function yields the input-layer weights, and taking the absolute values of the weights as the importance basis gives the feature importance ranking for the froth images. Feature subsets of different dimensions are generated according to the ranking, tested with an SVR (support vector regression) model, and compared to obtain the optimal combination under the fixed feature ranking.
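Under the setup just described, a usage sketch might wire in the stated hyperparameters (2 hidden layers of 8 neurons, η = 0.0003, λ in [90, 120], at most 4000 iterations) and check the resulting ranking with a support vector regression model; the data arrays below are random placeholders, and `rsbp_train` is the sketch above:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# X_train: 195 x 14, X_test: 49 x 14, y_*: mineral grade -- placeholder arrays here
rng = np.random.default_rng(1)
X_train, y_train = rng.normal(size=(195, 14)), rng.normal(size=195)
X_test, y_test = rng.normal(size=(49, 14)), rng.normal(size=49)

importance, _, _ = rsbp_train(X_train, y_train, hidden=(8, 8),
                              lam=100.0, eta=3e-4, iters=4000)  # lambda in [90, 120]
order = np.argsort(importance)[::-1]  # feature ranking, most important first

for k in (3, 5, 7, 10, 14):           # test feature subsets of several dimensions
    cols = order[:k]
    svr = SVR().fit(X_train[:, cols], y_train)
    print(k, mean_squared_error(y_test, svr.predict(X_test[:, cols])))
```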
First, to demonstrate the soundness of the selection results, the change of the feature importance index is obtained by varying the sparse coefficient, as shown in Tables 1-2. As the sparse coefficient gradually increases, each feature index gradually decreases, and sparse solutions appear one by one until all are zero (when the sparse coefficient is large enough). Throughout this process the importance order is fixed and consistent with the order in which the sparse solutions appear; the result agrees with subjective analysis, and the algorithm is reasonable and interpretable.
Table 1: importance ranking of the top 7 features [table not reproduced in this text]

Table 2: importance ranking of the last 7 features [table not reproduced in this text]
Meanwhile, the influence of the sparse parameter λ on the performance of the proposed technique is analyzed, choosing λ ∈ [0, 240], which represents the process from no sparsity to full sparsity of the objective function. Using the mean square error (MSE), mean relative error (ARE), and squared correlation coefficient R^2 as evaluation criteria, the influence of the sparse coefficient on algorithm performance under feature-subset inputs of different dimensions is obtained; the results are shown in FIG. 1, where (a), (b), and (c) show the three evaluation indices. As can be seen from FIG. 1, the most suitable sparse-coefficient interval for the froth-image feature selection problem of the copper flotation roughing tank is λ ∈ [90, 120].
Then, existing feature selection algorithms are compared; the comparison algorithms comprise Sequential Forward Selection (SFS), Principal Component Analysis (PCA), Sparse PCA (SPCA), the Sparse Artificial Neural Network (SANN), and the Robust Feature Selection algorithm (RFS), and a new data set is obtained with the same steps for each. To ensure fair comparison conditions, an SVR model with the same parameters is adopted as the test model for the feature selection results. The new data sets are used to train the SVR model, with the mean square error, mean relative error, and coefficient of determination as evaluation criteria; the comparison results are shown in FIG. 2. Under the different evaluation criteria, RSBP is generally superior to all compared feature selection methods, and its error is lower than that of the 14-dimensional original feature data. The method can thus effectively improve model prediction accuracy.
Finally, to compare the anti-interference capability of the algorithms, 25% of the training samples are randomly selected and a 5% disturbance is added. As can be seen from FIG. 3, the redundancy rate of the feature subset selected by RSBP on this data set is significantly lower than that of the other methods. At the same time, the error of RSBP remains at a fairly low level across different dimensions, indicating that the proposed feature selection method is more robust. Comparing FIG. 2 and FIG. 3, the errors of the other methods fluctuate significantly when interference data are added to the training set, while the proposed method always keeps the error at a low level. With 25% of the samples disturbed, the proposed method is 5%-12% more accurate than the other methods.
It will be understood that modifications and variations can be made by persons skilled in the art in light of the above teachings and all such modifications and variations are intended to be included within the scope of the invention as defined in the appended claims.

Claims (5)

1. A nonlinear feature selection method based on a neural network, characterized by comprising the following steps: (1) neural network embedding: replacing the linear error function of the sparse regularization model with a neural network error function, imposing a group-sparse constraint on the input-layer weights of the network, and establishing a nonlinear feature selection model; (2) robustness optimization: using the robustness of the L_{2,1} norm to optimize the neural network and establish the RSBP objective function; (3) optimization strategy: introducing a smooth class-L_{2,1} function, solving the iterative function of the neural network by projected gradient descent to obtain the optimal weight matrix, and taking the sum of the absolute values of the weight vector associated with each input-layer neuron as the feature importance index, thereby solving the stated problem.
2. The neural network-based nonlinear feature selection method according to claim 1, wherein in step (1):
the classical sparse regularization model (Lasso) can be understood to satisfy L1Solving β the optimal solution of the linear regression problem under the norm constraint condition, where y ═ y1,…,yN) Representing an N-dimensional response vector, the input quantity X is an N × p matrix, and a lagrange function can be constructed as follows:
Figure FDA0002465050000000011
replacing the linear error function of the sparse regularization model with a neural network error function, and imposing a group-sparse constraint on the input-layer weights according to the complexity of the network weights, yields the nonlinear feature selection model:
\min_{W,b} \; E(y, a^M) + \lambda \, h(W^1), \qquad h(W^1) = \sum_{j=1}^{p} \|w_j\|_2    (2)

a^m = f^{m-1}\big((W^{m-1})^T a^{m-1} + b^{m-1}\big)    (3)

where E denotes the error function, h the group-sparse constraint function, w_j the weight vector of the j-th input feature, f the activation function, λ the sparse coefficient, M the number of layers of the neural network, a^m the output of the m-th layer, and b^m the bias of the m-th layer.
3. The neural network-based nonlinear feature selection method according to claim 2, wherein an ELU function is adopted as the activation function when setting the neural network parameters:

f(x) = x \ \text{for}\ x > 0, \qquad f(x) = \alpha\,(e^{x} - 1) \ \text{for}\ x \le 0    (4)
4. The neural network-based nonlinear feature selection method according to claim 1, wherein in step (2): the L_{2,1} norm is more robust than the L_1 norm, and a robustness constraint is added on the basis of the original sparse neural network:

\min_{W,b} \; \|y - a^M\|_{2,1} + \lambda \sum_{j=1}^{p} \|w_j\|_2    (5)
5. The neural network-based nonlinear feature selection method according to claim 1, wherein in step (3): because the L_{2,1} norm is a non-smooth function, a smooth class-L_{2,1} function must be established; the iterative function of the neural network is solved by projected gradient descent, and the sum of the absolute values of the weight vector associated with each input-layer neuron is taken as the feature importance index; the process is as follows:
a class-L_{2,1} norm, the L_{R,1} norm, is introduced, which makes the whole function differentiable by adding a smoothing term at the non-differentiable points; the L_{R,1} norm is defined as:

\|W\|_{R,1} = \sum_{i} \sqrt{\|w_i\|_2^2 + \varepsilon}    (6)

where ε denotes a small positive constant; as ε → 0, the L_{R,1} norm is equivalent to the L_{2,1} norm; at the same time, the added small constant ε ensures that the denominator of the derivative is never zero, i.e., the function is differentiable over the whole domain;
assuming the number of neural network layers is 4 with the ELU function as the activation function, and denoting the network structure by L-i-j, replacing the L_{2,1} norm with the L_{R,1} norm leads to the following objective:

\min_{W,b} \; \|y - a^4\|_{R,1} + \lambda \|W^1\|_{R,1}    (7)
taking the overall derivative of the objective function J yields the weight update formula:

W^m \leftarrow W^m - \eta \, \frac{\partial J}{\partial W^m}    (8)
computing the neuron outputs o^m of each layer:

n^m = (W^{m-1})^T o^{m-1} + b^{m-1}, \qquad o^m = f^{m-1}(n^m)    (9)
then computing the partial derivatives of each layer backward from the error:

\frac{\partial J}{\partial W^{m-1}} = o^{m-1} (s^m)^T, \qquad \frac{\partial J}{\partial b^{m-1}} = s^m    (10)
letting s^m denote the sensitivity:

s^m = \frac{\partial J}{\partial n^m}    (11)
introducing the derivative matrix:

F^m(n^m) = \mathrm{diag}\big(f'(n^m_1), f'(n^m_2), \ldots, f'(n^m_{S_m})\big)    (12)
this gives the backward recursion for the sensitivities:

s^m = F^m(n^m)\, W^m\, s^{m+1}    (13)
finally updating the weights and bias values:

W^1 \leftarrow W^1 - \eta \big( o^1 (s^2)^T + \lambda \, \partial \|W^1\|_{R,1} / \partial W^1 \big)    (14)

W^m \leftarrow W^m - \eta \, o^m (s^{m+1})^T, \quad m > 1    (15)

b^m \leftarrow b^m - \eta \, s^{m+1}    (16)
repeatedly iterating, updating, and optimizing through the above formulas until the stopping condition is met, to obtain the optimal input-layer weight matrix of the neural network; and finally taking the sum of the absolute values of the weight vector associated with each input-layer neuron as the feature importance index, to obtain the feature selection result.
CN202010331361.XA 2020-04-24 2020-04-24 Nonlinear feature selection method based on neural network Pending CN111652271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010331361.XA CN111652271A (en) 2020-04-24 2020-04-24 Nonlinear feature selection method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010331361.XA CN111652271A (en) 2020-04-24 2020-04-24 Nonlinear feature selection method based on neural network

Publications (1)

Publication Number Publication Date
CN111652271A true CN111652271A (en) 2020-09-11

Family

ID=72349283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010331361.XA Pending CN111652271A (en) 2020-04-24 2020-04-24 Nonlinear feature selection method based on neural network

Country Status (1)

Country Link
CN (1) CN111652271A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033416A (en) * 2021-03-26 2021-06-25 深圳市华杰智通科技有限公司 Millimeter wave radar gesture recognition method based on sparse function
CN113064489A (en) * 2021-04-02 2021-07-02 深圳市华杰智通科技有限公司 Millimeter wave radar gesture recognition method based on L1-Norm
CN112884136A (en) * 2021-04-21 2021-06-01 江南大学 Bounded clustering projection synchronous regulation control method and system for coupled neural network
CN112884136B (en) * 2021-04-21 2022-05-13 江南大学 Bounded clustering projection synchronous regulation control method and system for coupled neural network
CN113313175A (en) * 2021-05-28 2021-08-27 北京大学 Image classification method of sparse regularization neural network based on multivariate activation function
CN113313175B (en) * 2021-05-28 2024-02-27 北京大学 Image classification method of sparse regularized neural network based on multi-element activation function
CN115294406A (en) * 2022-09-30 2022-11-04 华东交通大学 Method and system for attribute-based multimodal interpretable classification
CN115796244A (en) * 2022-12-20 2023-03-14 广东石油化工学院 CFF-based parameter identification method for super-nonlinear input/output system
CN115796244B (en) * 2022-12-20 2023-07-21 广东石油化工学院 Parameter identification method based on CFF for ultra-nonlinear input/output system


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 20200911)