CN1945602A

CN1945602A - Characteristic selecting method based on artificial nerve network

Info

Publication number: CN1945602A
Application number: CNA2006100195700A
Authority: CN
Inventors: 桑农; 曹治国; 张天序; 谢衍涛; 张�荣; 贾沛
Original assignee: Huazhong University of Science and Technology
Current assignee: Streamax Technology Co Ltd
Priority date: 2006-07-07
Filing date: 2006-07-07
Publication date: 2007-04-11
Anticipated expiration: 2026-07-07
Also published as: CN100367300C

Abstract

The invention discloses a feature selection method based on an artificial neural network, which includes: ① The user specifies all the features that need to be selected for feature selection, and provides samples for training the artificial neural network; The number of nodes in each layer of the artificial neural network and the initial value of the connection weights between layers and the fuzzy membership function parameters are set; ③Use the backpropagation algorithm to train and adjust the Network connection weights and parameters of the fuzzy membership function; ④ Calculate the importance measure of all features and sort the features. The invention avoids the difficult problem of data normalization; the calculation is simple, and the network only needs to be trained once; it is easy to combine with various search algorithms to form a complete feature selection system. The invention has been successfully applied to various pattern recognition and object classification with multi-dimensional features, and can also be applied to various pattern recognition fields involving data-type features.

Description

A Feature Selection Method Based on Artificial Neural Network

技术领域technical field

本发明属于模式识别领域，涉及一种特征选择方法，具体为一种基于人工神经网络的特征选择方法。The invention belongs to the field of pattern recognition and relates to a feature selection method, in particular to a feature selection method based on an artificial neural network.

背景技术Background technique

特征选择(feature selection)技术是模式识别领域中的一个重要方面，因为模式识别算法的复杂度往往随着数据维数的增长而以指数的形式增长，如果不设法降低数据的维数，分类器的规模将变得异常庞大，进行分类所需要的运算开销也会大得无法承受。因此对数据特征进行选择，选出其中的重要特征，降低数据特征的维数是不可缺少的环节。而且，目前多数模式识别算法所用的特征大多是由机器自动提取的，这就不可避免地存在冗余、噪声等特征，利用特征选择可以有效地消除这一问题。Feature selection technology is an important aspect in the field of pattern recognition, because the complexity of pattern recognition algorithms tends to grow exponentially with the growth of data dimensions. If the dimensionality of data is not reduced, the classifier The scale of will become extremely large, and the computational overhead required for classification will be unbearably large. Therefore, it is an indispensable link to select data features, select important features, and reduce the dimensionality of data features. Moreover, the features used in most pattern recognition algorithms are mostly automatically extracted by machines, which inevitably have features such as redundancy and noise, which can be effectively eliminated by using feature selection.

特征选择是在不降低或较少降低分类器识别率的条件下，从所有特征构成的集合中选出一个子集的过程。特征选择技术的关键点是选用什么准则来度量特征的重要性。传统的度量准则，如基于距离的度量、基于信息(或不确定性)的度量、基于依赖性的度量等等，侧重于分析数据的特性，这类方法在实践中效果并不十分理想。随着人工智能领域的不断进步，一些利用人工神经网络(artificialneuron networks)和模糊数学(fuzzy math)等技术的特征度量方法被提了出来。这一类方法都是基于分类错误率的，即根据特征对分类错误率的贡献来度量其重要性大小，因此比前一类方法更加有效。在具体操作上，这类方法大多数利用人工神经网络技术来进行特征选择。Feature selection is the process of selecting a subset from the set of all features without reducing or reducing the recognition rate of the classifier. The key point of feature selection technology is what criteria to choose to measure the importance of features. Traditional measurement criteria, such as distance-based measurement, information-based (or uncertainty)-based measurement, dependency-based measurement, etc., focus on analyzing the characteristics of data, and the effect of such methods is not very satisfactory in practice. With the continuous progress in the field of artificial intelligence, some feature measurement methods using techniques such as artificial neural networks and fuzzy mathematics have been proposed. This type of method is based on the classification error rate, that is, the importance of the feature is measured according to the contribution of the feature to the classification error rate, so it is more effective than the previous method. In terms of specific operations, most of these methods use artificial neural network technology for feature selection.

基于人工神经网络的特征选择可以看作是剪枝算法(prune algorithm)的一个特例，即剪除输入层的节点而不是隐含层的节点或权值，如文献1的Reed R.Pruning Algorithms-A Survey.IEEE Transactions on Neural Networks，1993，4(5)：740~746介绍的。常见的思路是利用剪枝前后人工神经网络的输出值的变化作为特征的敏感性度量，如文献2的Verikas A，Baeauskiene M.Featureselection with neural networks.Pattern Recognition Letters，2002，23(11)：1323~1335。这种思路的基本假设是：一个学习良好的神经网络，对于越重要的特征的变化，其相应的输出值变化也越大，即越敏感，反之亦然。基于敏感性度量Aj的特征选择方法最直接而准确地反映了这一假设，如文献3的Ruck D W，Rogers S K and Kabrisky M.Feature selection using a multilayerperceptron，Journal of Neural Network Computing，1990，9(1)：40~48所述。Feature selection based on artificial neural network can be regarded as a special case of pruning algorithm (prune algorithm), that is, the nodes of the input layer are pruned instead of the nodes or weights of the hidden layer, such as Reed R.Pruning Algorithms-A of literature 1 Introduced by Survey.IEEE Transactions on Neural Networks, 1993, 4(5): 740~746. The common idea is to use the change of the output value of the artificial neural network before and after pruning as the sensitivity measure of the feature, such as Verikas A, Baeauskiene M. Feature selection with neural networks. Pattern Recognition Letters in Document 2, 2002, 23(11): 1323 ~1335. The basic assumption of this idea is: a well-learned neural network, for the change of the more important features, the greater the change of its corresponding output value, that is, the more sensitive it is, and vice versa. The feature selection method based on the sensitivity measure Aj most directly and accurately reflects this assumption, such as Ruck D W, Rogers S K and Kabrisky M. Feature selection using a multilayer perceptron in literature 3, Journal of Neural Network Computing, 1990, 9 (1): described in 40~48.

具体考察某个特征的重要性时，通过计算该特征删除前后人工神经网络的输出的变化作为特征度量。所谓删除特征，就是令样本中该特征的观察值恒为零，如文献4的De R.K，Basak J and Pal S K.Neuro-Fuzzy Feature EvaluationWith Theoretical Analysis.Neural Networks，1999，12(10)：1429~1455。这种方法要求首先对数据进行归一化，这可能会破坏数据。为了避开归一化的问题，可以在人工神经网络中增加一个模糊映射(fuzzy mapping)层，该层将每一个特征按一对多映射，映射后的新特征，即模糊特征，其定义域限定为[0，1]，因此就避开了归一化的问题，如文献5的Jia P and Sang N.Feature selectionusing a radial basis function networks and fuzzy set theoretic measures.In：Proceedings of SPIE 5281(1)-the Third International Symposium onMultispectral Image Processing and Pattern Recognition，Beijing，China：The International Society of Optical Engineering Press，2003.109~114。在这种方法里，模糊隶属度函数(fuzzy membership function)是在人工神经网络进行学习之前就已经获得的，它依赖的仍然是数据的一、二阶矩，这与文献4的归一化其实有同样的问题。事实上，完全可以把文献5提出的模糊映射层从网络中独立出来，作为一种归一化方法进行数据的预处理。When specifically examining the importance of a feature, the change in the output of the artificial neural network before and after the feature is deleted is calculated as the feature measure. The so-called deletion of features is to make the observation value of the feature in the sample constant to zero, such as De R.K, Basak J and Pal S K.Neuro-Fuzzy Feature Evaluation With Theoretical Analysis.Neural Networks, 1999, 12(10): 1429 in literature 4 ~1455. This approach requires the data to be normalized first, which can corrupt the data. In order to avoid the problem of normalization, a fuzzy mapping (fuzzy mapping) layer can be added to the artificial neural network, which maps each feature one-to-many, and the new feature after mapping, that is, the fuzzy feature, its definition domain It is limited to [0, 1], so it avoids the problem of normalization, such as Jia P and Sang N. Feature selection using a radial basis function networks and fuzzy set theoretical measures. In: Proceedings of SPIE 5281(1 )-the Third International Symposium on Multispectral Image Processing and Pattern Recognition, Beijing, China: The International Society of Optical Engineering Press, 2003.109~114. In this method, the fuzzy membership function (fuzzy membership function) is obtained before the artificial neural network learns, and it still relies on the first and second order moments of the data, which is actually the same as the normalization of literature 4. Had the same problem. In fact, it is entirely possible to separate the fuzzy mapping layer proposed in Document 5 from the network and use it as a normalization method for data preprocessing.

发明内容Contents of the invention

本发明的目的在于提供一种基于人工神经网络的特征选择方法，该方法避免了数据归一化的难题，鲁棒性高，对噪声特征和冗余特征具有好的效果。The purpose of the present invention is to provide a feature selection method based on artificial neural network, which avoids the problem of data normalization, has high robustness, and has good effect on noise features and redundant features.

本发明提供的一种基于人工神经网络的特征选择方法，包括以下步骤：A kind of feature selection method based on artificial neural network provided by the invention comprises the following steps:

(1)用户指定需要进行特征选择的特征f_i，i＝1，…，N，给出对人工神经网络进行训练用的训练样本集：(1) The user specifies the feature fi that needs to be selected for feature selection, _i =1,...,N, and gives the training sample set used for training the artificial neural network:

训练样本有相同的维数R，R＝N，分为K个类别：ω₁，…，ω_K，第q个训练样本x_q的第i维x_qi即指定的第i个特征f_i的第q次观测值；The training samples have the same dimension R, R=N, and are divided into K categories: ω ₁ ,..., ω _K , the i-th dimension x _qi of the q-th training sample x _q is the specified i-th feature f _i The qth observation value;

(2)根据训练样本，构造依次由输入层、模糊映射层、隐含层和输出层组成的人工神经网络；神经网络数据由输入层输入神经网络，通过连接权w²传递到模糊映射层，经过模糊映射层作用之后再通过连接权w³传递到隐含层，经过隐含层作用之后再通过连接权w⁴传递到输出层获得输出，其中，m＝2，3，4；(2) According to the training samples, construct an artificial neural network composed of input layer, fuzzy mapping layer, hidden layer and output layer; the neural network data is input into the neural network from the input layer, and passed to the fuzzy mapping layer through the connection weight ^w2 , After the action of the fuzzy mapping layer, it is passed to the hidden layer through the connection weight w ³ , and then passed to the output layer through the connection weight w ⁴ to obtain the output after the action of the hidden layer, wherein, m=2, 3, 4;

(3)使用用户给出的训练样本集训练初始化之后的人工神经网络，其处理过程为：(3) Use the training sample set given by the user to train the artificial neural network after initialization, and the processing process is as follows:

(3.1)选用均方误差的估计量e作为学习过程中的性能指数：(3.1) Select the estimator e of the mean square error as the performance index in the learning process:

$e e = = \frac{11}{Q Q} {Σ Σ}_{q q = = 11}^{Q Q} {Σ Σ}_{i i = = 11}^{G G} {(({t t}_{i i}^{m m} ((q q)) - - {a a}_{i i}^{m m} ((q q))))}^{22}$

其中，t_i ^m(q)是第m层的节点i在输入第q个样本时的输出的目标值，a_i ^m(q)是第m层的节点i在输入第q个样本时的实际输出，G为该层的节点数；Among them, t _i ^m (q) is the target value of the output of node i in the mth layer when inputting the qth sample, and a _i ^m (q) is the actual output value of the node i in the mth layer when inputting the qth sample Output, G is the number of nodes in this layer;

(3.2)采用反向传播算法对人工神经网络各层之间的连接权矩阵w^m进行训练，其中m＝3，4；(3.2) The connection weight matrix ^w between the layers of the artificial neural network is trained by using the backpropagation algorithm, where m=3,4;

(3.3)对模糊映射层节点的作用函数中的参数ξ，σ，τ进行更新；(3.3) update the parameters ξ, σ, and τ in the action function of the fuzzy mapping layer node;

(3.4)当e满足收敛条件时，进入步骤(4)，否则重复步骤(3.2)-(3.3)；(3.4) When e meets the convergence condition, enter step (4), otherwise repeat steps (3.2)-(3.3);

(4)使用已训练好的人工神经网络对特征进行模糊剪枝，计算每个特征的重要性度量，并按重要性的度量值对特征排序。(4) Use the trained artificial neural network to perform fuzzy pruning on the features, calculate the importance measure of each feature, and sort the features according to the measure value of importance.

本发明只需要用户给出原始特征集和训练用的样本，能够从中获得原始特征集中的所有特征对分类的重要性的排序。本发明的特征选择方法与现有的特征选择方法相比的优点在于：较好地避免了数据归一化的难题；计算简单，神经网络只需训练一次；容易和各种搜索算法结合起来组成一个完整的特征选择系统。本发明已成功应用于多种具有多维特征的模式识别和目标分类，也可应用于各类涉及数据型特征的模式识别领域。The present invention only needs the user to provide the original feature set and training samples, and can obtain the ranking of the importance of all features in the original feature set for classification. Compared with the existing feature selection method, the feature selection method of the present invention has the advantages of: better avoiding the difficult problem of data normalization; simple calculation, and the neural network only needs to be trained once; easy to combine with various search algorithms to form A complete feature selection system. The invention has been successfully applied to various pattern recognition and object classification with multi-dimensional features, and can also be applied to various pattern recognition fields involving data-type features.

附图说明Description of drawings

图1为基于带自适应模糊映射层的人工神经网络的特征选择方法的流程图；Fig. 1 is the flow chart of the feature selection method based on the artificial neural network of band adaptive fuzzy mapping layer;

图2为带有自适应模糊映射层的人工神经网络的结构示意图；Fig. 2 is the structural representation of the artificial neural network with adaptive fuzzy mapping layer;

图3为实例中建立的带有自适应模糊映射层的人工神经网络的结构示意图Figure 3 is a schematic diagram of the structure of the artificial neural network with adaptive fuzzy mapping layer established in the example

图4为特征seqal length的模糊隶属度函数图(初始值)。Fig. 4 is the fuzzy membership function diagram (initial value) of the feature seqal length.

具体实施方式Detailed ways

本发明的特征选择方法在用户给出训练用的数据集和需要进行选择的特征集的前提下，开始特征选择过程，下面详细介绍特征选择流程。The feature selection method of the present invention starts the feature selection process on the premise that the user provides the data set for training and the feature set to be selected. The feature selection process will be described in detail below.

进行特征选择，就是要获得对特征的重要性的度量。本发明提出的特征选择方法中，使用用户提供的数据集训练带有模糊映射层的人工神经网络，再借助训练好的网络计算出各个特征的重要性度量值，达到特征选择的目的。如图1所示，本发明方法包括以下步骤：To perform feature selection is to obtain a measure of the importance of features. In the feature selection method proposed by the present invention, the data set provided by the user is used to train the artificial neural network with the fuzzy mapping layer, and then the importance measure value of each feature is calculated by means of the trained network, so as to achieve the purpose of feature selection. As shown in Figure 1, the inventive method comprises the following steps:

(1)用户指定需要进行特征选择的特征f_i(i＝1，…，N)，给出对人工神经网络进行训练用的训练样本。(1) The user specifies the feature f _i (i=1, .

(1.1)特征的指定(1.1) Designation of features

指定的特征必须是数据型的特征，直接反映对象的实际物理意义或者几何意义，如重量、速度、长度等。特征的个数N为自然数，也就是说特征的个数为一个或者多个。The specified features must be data-type features that directly reflect the actual physical or geometric meaning of the object, such as weight, speed, length, etc. The number N of features is a natural number, that is to say, the number of features is one or more.

(1.2)训练样本的限定(1.2) Limitation of training samples

对人工神经网络进行训练用的训练样本也为数据型，所有样本有着相同的维数R(R＝N)，分为K个类别：ω₁，…，ω_K。维数R等于步骤(1.1)中指定的特征个数。第q个训练样本x_q的第i维x_qi就是指定的第i个特征f_i的第q次观测值。训练样本集的具体数学描述为：The training samples used for training the artificial neural network are also data types, all samples have the same dimension R (R=N), and are divided into K categories: ω ₁ , . . . , ω _K . The dimension R is equal to the number of features specified in step (1.1). The i-th dimension x _qi of the q-th training sample x _q is the q-th observation value of the specified i-th feature _fi . The specific mathematical description of the training sample set is:

其中，Q是训练样本的个数，且Q≥K，每一个类ω_l(l＝1，…，K)至少有一个样本，表示实数集，R是样本x_q的维数，等于训练样本集X的特征数N。Among them, Q is the number of training samples, and Q≥K, each class ω _l (l=1,...,K) has at least one sample, Represents a real number set, R is the dimension of the sample x _q , which is equal to the feature number N of the training sample set X.

(2)根据训练样本，构造由特征层A、模糊映射层B、隐含层C和输出层D组成的人工神经网络，并初始化。(2) Construct and initialize an artificial neural network consisting of feature layer A, fuzzy mapping layer B, hidden layer C and output layer D according to the training samples.

如图2所示，人工神经网络结构包括输入层A(即特征层)、模糊映射层B、隐含层C和输出层D，层与层之间用连接权w^m(m＝2，3，4)相连接。数据由输入层输入神经网络，然后通过连接权传递到模糊映射层，经过模糊映射层作用之后再通过连接权传递到隐含层，经过隐含层作用之后再通过连接权传递到输出层从而获得输出。构造带有模糊映射层的人工神经网络需要设置输入层(特征层)、隐含层和输出层的节点数；确定每个特征f_i对应的模糊隶属度函数的个数m_i，并定义这些模糊隶属度函数。初始化工作则需要确定人工神经网络各层之间的连接权值的初始值和模糊映射层中每个节点内的模糊隶属度函数的参数的初始值。As shown in Figure 2, the artificial neural network structure includes input layer A (i.e. feature layer), fuzzy mapping layer B, hidden layer C and output layer D, and the connection weight w ^m (m=2,3 , 4) are connected. The data is input into the neural network from the input layer, and then passed to the fuzzy mapping layer through the connection weight, and then passed to the hidden layer through the connection weight after the function of the fuzzy mapping layer, and then passed to the output layer through the connection weight after the action of the hidden layer to obtain output. Constructing an artificial neural network with a fuzzy mapping layer needs to set the number of nodes in the input layer (feature layer), hidden layer and output layer; determine the number m _i of fuzzy membership functions corresponding to each feature f _i , and define these Fuzzy membership function. The initialization work needs to determine the initial value of the connection weight between the layers of the artificial neural network and the initial value of the parameters of the fuzzy membership function in each node in the fuzzy mapping layer.

具体过程如下：The specific process is as follows:

(2.1)输入层A(2.1) Input layer A

(2.1.1)输入层节点个数的选取(2.1.1) Selection of the number of input layer nodes

输入层A的节点数S¹等于训练样本的维数R。The number of nodes ^S1 of the input layer A is equal to the dimensionality R of the training samples.

(2.1.2)输入层节点的输入和输出(2.1.2) Input and output of input layer nodes

每个节点输入训练样本的某一维。当神经网络输入第q个样本时，输入层的节点A_i的输入为：Each node inputs a certain dimension of the training sample. When the neural network inputs the qth sample, the input of the node A _i of the input layer is:

${n no}_{i i}^{11} ((q q)) = = {x x}_{qi qi},,$

输出为：The output is:

${a a}_{i i}^{11} ((q q)) = = {x x}_{qi qi} . .$

(2.2)模糊映射层B(2.2) Fuzzy Mapping Layer B

(2.2.1)各特征对应的模糊隶属度函数的个数的选取(2.2.1) Selection of the number of fuzzy membership functions corresponding to each feature

对于特征f_i，可以根据其具体的物理意义来定义f_i对应的m_i个模糊隶属度函数，每一个模糊隶属度函数构成一个模糊映射层节点。也就是说，模糊映射层B的节点数 $S^{2} = Σ_{i = 1}^{S^{1}} m_{i},$ m_i值的选取需要满足如下条件：For feature f _i , _mi fuzzy membership functions corresponding to f _i can be defined according to its specific physical meaning, and each fuzzy membership function constitutes a fuzzy mapping layer node. That is, the number of nodes in the fuzzy mapping layer B $S$ $^{2} = Σ_{i = 1}^{S^{1}} m_{i},$ The selection of the value of m _i needs to meet the following conditions:

$\frac{{Q Q}_{min min}}{{Σ Σ}_{i i = = 11}^{{S S}^{11}} {m m}_{i i}} > > 33$

其中，Q_min＝min{Q_l}，Q_l表示用户给出的训练样本中属于类ω_l的样本数。Among them, Q _min =min{Q _l }, Q _l represents the number of samples belonging to class ω _l in the training samples given by the user.

(2.2.2)输入层与模糊映射层之间的连接权(2.2.2) The connection weight between the input layer and the fuzzy mapping layer

输入层的节点A_i与模糊映射层的节点B_i1，…，B_imi用连接权相连接，而且模糊映射层的节点B_i1，…，B_imi不与除A_i之外的输入层其他节点连接，即所谓的1对多的连接方式。特征层A节点A_i和模糊映射层B节点B_ij之间的连接权值恒为1，即特征层A和模糊映射层B之间的连接权矩阵w²不参加人工神经网络的训练。The node A _i of the input layer is connected with the nodes B _i1 ,...,B _imi of the fuzzy mapping layer with connection weights, and the nodes B _i1 ,...,B _imi of the fuzzy mapping layer are not connected with other nodes of the input layer except A _i Connection, the so-called 1-to-many connection. The connection weight between node A _i of feature layer A and node B _ij of fuzzy mapping layer is always 1, that is, the connection weight matrix w ² between feature layer A and fuzzy mapping layer B does not participate in the training of artificial neural network.

(2.2.3)模糊映射层的节点B_ij的输入(2.2.3) The input of the node B _ij of the fuzzy mapping layer

当神经网络输入第q个样本时，模糊映射层的节点B_ij的输入为：When the neural network inputs the qth sample, the input of the node B _ij of the fuzzy mapping layer is:

${n no}_{ij ij}^{22} ((q q)) - - {x x}_{qi qi} \times \times 11 = = {x x}_{qi qi} . .$

(2.2.4)模糊映射层的节点B_ij的作用函数(2.2.4) The action function of the node B _ij of the fuzzy mapping layer

模糊映射层的节点B_ij的作用函数为模糊隶属度函数μ_ij，即特征f_i的第j个隶属度函数。在本发明中，所谓给定第i维特征f_i的一个模糊隶属度函数，是指给定了一个映射μ_i：f_i→[0，1]。The action function of the node B _ij of the fuzzy mapping layer is the fuzzy membership function μ _ij , which is the jth membership function of the feature f _i . In the present invention, the so-called fuzzy membership function given the i-th dimension feature f _i means that a mapping μ _i is given: f _i → [0, 1].

节点B_ij的模糊隶属度函数形式如下：The form of fuzzy membership function of node B _ij is as follows:

$a_{ij}^{2} (q) = \frac{1}{1 + {(\frac{n_{ij}^{2} (q) - ξ_{ij}}{σ_{ij}})}^{2 τ}},$ σ_ij≠0，τ_ij≥0. $a_{ij}^{2} (q) = \frac{1}{1 + {(\frac{{no}_{ij}^{2} (q) - ξ_{ij}}{σ_{ij}})}^{2 τ}},$ σ _ij ≠0, τ _ij ≥0.

在这里，n_ij ²(q)为输入第q个样本时模糊映射层的节点B_ij的输入，a_ij ²(q)为相应的实际输出。ξ_ij为节点B_ij的类条件概率密度的期望，σ_ij为节点B_ij的类条件概率密度的标准差，τ_ij为节点B_ij的一个参数。τ的作用表现在：即使两个隶属度函数的ξ和σ是相等的，仍然可以通过对τ的调整避免两个隶属度函数完全相同。Here, n _ij ² (q) is the input of node B _ij of the fuzzy mapping layer when the qth sample is input, and a _ij ² (q) is the corresponding actual output. ξ _ij is the expectation of class conditional probability density of node B _ij , σ _ij is the standard deviation of class conditional probability density of node B _ij , τ _ij is a parameter of node B _ij . The role of τ is as follows: Even if the ξ and σ of the two membership functions are equal, the adjustment of τ can still prevent the two membership functions from being identical.

对于σ_ij和τ_ij的初始化没有特别的限制，ξ_ij一般采取在对应的特征f_i的值域上随机选取的方法。There are no special restrictions on the initialization of σ _ij and τ _ij , and ξ _ij generally adopts a method of randomly selecting the value range of the corresponding feature f _i .

(2.3)隐含层C(2.3) hidden layer C

(2.3.1)隐含层节点个数的选取(2.3.1) Selection of the number of hidden layer nodes

隐含层C的节点个数S³的选取没有特别的要求，一般只要不小于训练样本的类别数K即可。There is no special requirement for the selection of the number ^S3 of nodes in the hidden layer C, generally as long as it is not less than the number K of categories of training samples.

(2.3.2)模糊映射层与隐含层之间的连接权(2.3.2) Connection weight between fuzzy mapping layer and hidden layer

模糊映射层B与隐含层C之间为全连接，也就是说模糊映射层B的每一个节点都与隐含层C的所有节点相连接，隐含层C的每一个节点也与模糊映射层B的所有节点相连接。模糊映射层B与隐含层C之间的连接权There is a full connection between the fuzzy mapping layer B and the hidden layer C, that is to say, each node of the fuzzy mapping layer B is connected to all nodes of the hidden layer C, and each node of the hidden layer C is also connected to the fuzzy mapping layer All nodes of layer B are connected. Connection weight between fuzzy mapping layer B and hidden layer C

$w^{3} = {[\begin{matrix} w_{11}^{3} & \cdot \cdot \cdot & w_{1 u}^{3} & \cdot \cdot \cdot & w_{{1 S}^{3}}^{3} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{p 1}^{3} & \cdot \cdot \cdot & w_{pu}^{3} & \cdot \cdot \cdot & w_{{pS}^{3}}^{3} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{S^{2} 1}^{3} & \cdot \cdot \cdot & w_{S^{2} u}^{3} & \cdot \cdot \cdot & w_{S^{2} S^{3}}^{3} \end{matrix}]}_{S^{2} \times S^{3}}$ (p＝1，…，S²，u＝1，…，S³)的初始化采取随机的方法，连接权值的取值范围为[0，1]。 $w^{3} = {[\begin{matrix} w_{11}^{3} & &Center Dot; &Center Dot; &Center Dot; & w_{1 u}^{3} & \cdot \cdot \cdot & w_{{1 S}^{3}}^{3} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{p 1}^{3} & &Center Dot; &Center Dot; &Center Dot; & w_{pu}^{3} & \cdot \cdot \cdot & w_{{PS}^{3}}^{3} \\ &Center Dot; & &Center Dot; & &Center Dot; \\ &Center Dot; & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{S}^{^{2} 1} & &Center Dot; &Center Dot; &Center Dot; & w_{S^{2} u}^{3} & \cdot \cdot &Center Dot; & w_{S^{2} S^{3}}^{3} \end{matrix}]}_{S^{2} \times S^{3}}$ The initialization of (p=1, ..., S ² , u=1, ..., S ³ ) adopts a random method, and the value range of the connection weight is [0, 1].

(2.3.3)隐含层节点的输入(2.3.3) Input of hidden layer nodes

当神经网络输入第q个样本时，隐含层的节点C_u(u＝1，…，S³)的输入为：When the neural network inputs the qth sample, the input of the hidden layer node C _u (u=1,..., S ³ ) is:

${n no}_{u u}^{33} ((q q)) = = {Σ Σ}_{p p = = 11}^{{S S}^{22}} {a a}_{p p}^{22} ((q q)) \times \times {w w}_{pu pu}^{33} . .$

其中，a_p ²(q)是模糊映射层的节点B_p(p＝1，…，S²)在神经网络输入第q个样本时的输出，w_pu ³是模糊映射层的节点B_p与隐含层节点C_u之间的连接权。Among them, a _p ² (q) is the output of the node B _p (p=1,..., S ² ) of the fuzzy mapping layer when the neural network inputs the qth sample, w _pu ³ is the node B _p of the fuzzy mapping layer and Connection weight between hidden layer nodes C _u .

(2.3.4)隐含层节点的作用函数(2.3.4) Action function of hidden layer nodes

隐含层节点的作用函数选择为Sigmoid函数：The function function of the hidden layer node is selected as the Sigmoid function:

$a_{u}^{3} (q) = \frac{1}{1 + \exp (- n_{u}^{3} (q))},$ (u＝1，…，S³). $a_{u}^{3} (q) = \frac{1}{1 + \exp (- {no}_{u}^{3} (q))},$ (u=1, . . . , S ³ ).

其中，n_u ³(q)为神经网络输入第q个样本时隐含层的节点C_u的输入，a_u ³(q)为相应的输出。Among them, n _u ³ (q) is the input of node C _u of the hidden layer when the neural network inputs the qth sample, and a _u ³ (q) is the corresponding output.

也可以选择为双曲线正切函数：It can also optionally be a hyperbolic tangent function:

$a_{u}^{3} (q) = \frac{1 - \exp (- n_{u}^{3} (q))}{1 + \exp (- n_{u}^{3} (q))},$ (u＝1，…，S³). $a_{u}^{3} (q) = \frac{1 - \exp (- {no}_{u}^{3} (q))}{1 + \exp (- {no}_{u}^{3} (q))},$ (u=1, . . . , S ³ ).

(2.4)输出层D(2.4) Output layer D

(2.4.1)输出层节点个数的选择(2.4.1) Selection of the number of nodes in the output layer

输出层D的节点数S⁴等于训练样本的类别数K。The number of nodes S ⁴ of the output layer D is equal to the number K of categories of training samples.

(2.4.2)隐含层与输出层之间的连接权(2.4.2) Connection weight between hidden layer and output layer

隐含层C与输出层D之间为全连接，也就是说隐含层C的每一个节点都与输出层D的所有节点相连接，输出层D的每一个节点也与隐含层C的所有节点相连接。隐含层C与输出层D之间的连接权值 $w^{4} = {[\begin{matrix} w_{11}^{4} & \cdot \cdot \cdot & w_{1 l}^{4} & \cdot \cdot \cdot & w_{{lS}^{4}}^{4} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{u 1}^{4} & \cdot \cdot \cdot & w_{ul}^{4} & \cdot \cdot \cdot & w_{{uS}^{4}}^{4} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{S^{3} 1}^{4} & \cdot \cdot \cdot & w_{S^{3} l}^{4} & \cdot \cdot \cdot & w_{S^{3} S^{4}}^{4} \end{matrix}]}_{S^{3} \times S^{4}}$ (u＝1，…，S³，l＝1，…，S⁴)的初始化采取随机的方法，权值的取值范围为[0，1]。There is a full connection between the hidden layer C and the output layer D, that is to say, each node of the hidden layer C is connected to all nodes of the output layer D, and each node of the output layer D is also connected to all nodes of the hidden layer C. All nodes are connected. Connection weight between hidden layer C and output layer D $w$ $^{4} = {[\begin{matrix} w_{11}^{4} & &Center Dot; &Center Dot; &Center Dot; & w_{1 l}^{4} & \cdot &Center Dot; &Center Dot; & w_{l}^{^{4}} \\ \cdot & &Center Dot; & &Center Dot; \\ &Center Dot; & &Center Dot; & &Center Dot; \\ &Center Dot; & &Center Dot; & &Center Dot; \\ w_{u 1}^{4} & \cdot &Center Dot; &Center Dot; & w_{ul}^{} & \cdot &Center Dot; &Center Dot; & w_{u}^{^{4}} \\ &Center Dot; & &Center Dot; & \cdot \\ \cdot & &Center Dot; & &Center Dot; \\ &Center Dot; & \cdot & &Center Dot; \\ w_{S^{3} 1}^{4} & \cdot \cdot \cdot & w_{S^{3} l}^{4} & \cdot \cdot \cdot & w_{S^{3} S^{4}}^{4} \end{matrix}]}_{S^{3} \times S^{4}}$ (u=1, . . . , S ³ , l= ¹ , .

(2.4.3)输出层节点的输入和输出(2.4.3) Input and output of output layer nodes

输出层节点D_l(l＝1，…，S⁴)的输入和输出相等，D_l的输出值n_l ⁴(q)就是神经网络输入的第q个样本属于类ω_l的概率。The input and output of the output layer node D _l (l=1,..., S ⁴ ) are equal, and the output value n _l ⁴ (q) of D _l is the probability that the qth sample input by the neural network belongs to the class ω _l .

${n no}_{l l}^{44} ((q q)) = = {a a}_{l l}^{44} ((q q)) = = {Σ Σ}_{u u = = 11}^{{S S}^{33}} {n no}_{u u}^{33} ((q q)) \times \times {w w}_{}^{ul ul} . .$

其中，w_ul ⁴是隐含层的节点C_u与输出层节点D_l之间的连接权。Among them, w _ul ⁴ is the connection weight between the node C _u of the hidden layer and the node D _l of the output layer.

(3)使用用户给出的训练样本集训练初始化之后的人工神经网络。(3) Use the training sample set given by the user to train the artificial neural network after initialization.

利用反向传播算法，以批处理的学习方式，依据用户给出的训练样本集对人工神经网络进行训练，在每一次训练中更新神经网络各层之间的连接权值和模糊隶属度函数的参数，直至人工神经网络满足用户设定的收敛条件。Using the backpropagation algorithm, the artificial neural network is trained according to the training sample set given by the user in a batch learning method, and the connection weights between the layers of the neural network and the fuzzy membership function are updated in each training. parameters until the artificial neural network meets the convergence conditions set by the user.

具体的训练方法如下。The specific training method is as follows.

(3.1)收敛条件的选取(3.1) Selection of convergence conditions

首先，选用均方误差的估计量e作为学习过程中的性能指数：First, the estimator e of the mean square error is selected as the performance index in the learning process:

$e e = = \frac{11}{Q Q} {Σ Σ}_{q q = = 11}^{Q Q} {Σ Σ}_{i i = = 11}^{G G} {(({t t}_{i i}^{m m} ((q q)) - - {a a}_{i i}^{m m} ((q q))))}^{22} . .$

其中，t_i ^m(q)是第m层的节点i在输入第q个样本时的输出的目标值，a_i ^m(q)是第m层的节点i在输入第q个样本时的实际输出，G为该层的节点数。Among them, t _i ^m (q) is the target value of the output of node i in the mth layer when inputting the qth sample, and a _i ^m (q) is the actual output value of the node i in the mth layer when inputting the qth sample Output, G is the number of nodes in this layer.

用户可以根据对运算精度的要求，设定e小于某个很小的正数为收敛条件。比如，设定e＜0.001为收敛条件，则人工神经网络在某次训练中完成步骤(3.2)和(3.3)之后，计算e的值，如果小于0.001，就停止训练；否则进行下一次训练。The user can set e to be smaller than a small positive number as the convergence condition according to the requirements for operation accuracy. For example, if e<0.001 is set as the convergence condition, the artificial neural network calculates the value of e after completing steps (3.2) and (3.3) in a training session, and if it is less than 0.001, stop training; otherwise, proceed to the next training.

(3.2)各层之间连接权值的更新(3.2) Update of connection weights between layers

输入层A与模糊映射层B之间的连接权值恒为1，不参加训练。模糊映射层B与隐含层C之间的连接权w³，隐含层C与输出层D之间的连接权w⁴都需要参加训练，而且w³和w⁴在训练中的更新方法相同。The connection weight between the input layer A and the fuzzy mapping layer B is always 1, and does not participate in training. The connection weight w ³ between the fuzzy mapping layer B and the hidden layer C, and the connection weight w ⁴ between the hidden layer C and the output layer D need to participate in the training, and the update method of w ³ and w ⁴ in the training is the same .

反向传播算法中的均方误差的估计量e对于第m层输入的敏感性定义为The sensitivity of the estimator e of the mean square error in the backpropagation algorithm to the input of the mth layer is defined as

${g g}^{m m} = = \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}^{m m}} {[\begin{matrix} \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{m m} ((11))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{m m} ((q q))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{m m} ((Q Q))} \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot \cdot & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot \cdot & \cdot &Center Dot; \\ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{i i}^{m m} ((11))} & \cdot \cdot \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{i i}^{m m} ((q q))} & \cdot \cdot \cdot \cdot \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{i i}^{m m} ((Q Q))} \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot \cdot & \cdot &Center Dot; & \cdot &Center Dot; \\ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{m m}}^{m m} ((11))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{m m}}^{m m} ((q q))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{m m}}^{m m} ((Q Q))} \end{matrix}]}_{}$

其中，S^m是人工神经网络第m层的节点数，n^m是一个大小为S^m×Q的矩阵，表示人工神经网络第m层的输入；n_i ^m(q)表示第m层的节点i在神经网络输入第q个样本时的输入。而且， $\frac{&PartialD; e}{{&PartialD; n}_{i}^{m} (q)} = \frac{&PartialD; e}{{&PartialD; a}_{i}^{m} (q)} \times \frac{{&PartialD; a}_{i}^{m} (q)}{{&PartialD; n}_{i}^{m} (q)} .$ Among them, S ^m is the number of nodes in the m-th layer of the artificial neural network, n ^m is a matrix with a size of S ^m ×Q, which represents the input of the m-th layer of the artificial neural network; n _i ^m (q) represents the nodes of the m-th layer i is the input when the neural network inputs the qth sample. and, $\frac{&PartialD; e}{{&PartialD; no}_{i}^{m} (q)} = \frac{&PartialD; e}{{&PartialD; a}_{i}^{m} (q)} \times \frac{{&PartialD; a}_{i}^{m} (q)}{{&PartialD; no}_{i}^{m} (q)} .$

按最速下降法对连接权值进行更新，在此处也可采用共轭梯度法等最小模估计算法。人工神经网络第m层和第m-1层(m＝3，4)之间的连接权矩阵w^m(维数是S^m-1×S^m)，在第(r+1)次训练开始时更新为The connection weights are updated according to the steepest descent method, and the minimum modulus estimation algorithm such as the conjugate gradient method can also be used here. The connection weight matrix w ^m (dimension is S m ^- 1 ×S ^m ) between the mth layer and the m-1th layer (m=3, 4) of the artificial neural network, the (r+1)th training starts when updated to

w^m(r+1)＝w^m(r)-αg^m(a^m-1)^T.w ^m (r+1)＝w ^m (r)-αg ^m (a ^m-1 ) ^T .

其中，α是权值学习速率，取值范围是0＜α≤1，一般选择为0.05。r是训练的次数。a^m是一个大小为S^m×Q的矩阵，表示人工神经网络第m层的实际输出：Among them, α is the weight learning rate, and the value range is 0<α≤1, and it is generally selected as 0.05. r is the number of training times. a ^m is a matrix of size S ^m ×Q, representing the actual output of the mth layer of the artificial neural network:

${a a}^{m m} = = {[\begin{matrix} {a a}_{11}^{m m} ((11)) & \cdot \cdot \cdot \cdot \cdot &Center Dot; & {a a}_{11}^{m m} ((q q)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{11}^{m m} ((Q Q)) \\ \cdot &Center Dot; & \cdot \cdot & \cdot \cdot \\ \cdot \cdot & \cdot \cdot & \cdot \cdot \\ \cdot \cdot & \cdot \cdot & \cdot &Center Dot; \\ {a a}_{i i}^{m m} ((11)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{i i}^{m m} ((q q)) & \cdot \cdot \cdot &Center Dot; \cdot &Center Dot; & {a a}_{i i}^{m m} ((Q Q)) \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot \cdot & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ {a a}_{{S S}^{m m}}^{m m} ((11)) & \cdot &Center Dot; \cdot \cdot \cdot &Center Dot; & {a a}_{{S S}^{m m}}^{m m} ((q q)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{{S S}^{m m}}^{m m} ((Q Q)) \end{matrix}]}_{{S S}^{m m} \times \times Q Q}$

(3.3)模糊映射层节点的作用函数的参数ξ，σ，τ的更新(3.3) The update of the parameters ξ, σ, τ of the action function of the fuzzy mapping layer node

模糊映射层B的节点B_p(p＝1，…，S²)的作用函数的三个参数ξ_p，σ_p，τ_p按以下式子更新，其中θ是ξ_p的学习速率，

是σ_p的学习速率，ρ是τ_p的学习速率，采用试错法等参数选取方法。The three parameters ξ _p , σ _p , τ _p of the action function of node B _p (p=1, ..., S ² ) of fuzzy mapping layer B are updated according to the following formula, where θ is the learning rate of ξ _p ,

is the learning rate of σ _p , and ρ is the learning rate of τ _p , using trial and error and other parameter selection methods.

${ξ ξ}_{p p} ((r r + + 11)) = = {ξ ξ}_{p p} ((r r)) - - θ θ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD;}_{p p} {a a}^{22}} {((\frac{{&PartialD; &PartialD;}_{p p} {a a}^{22}}{{&PartialD; &PartialD; ξ ξ}_{p p} ((r r))}))}^{T T},,$

${τ τ}_{p p} ((r r + + 11)) = = {τ τ}_{p p} ((r r)) - - ρ ρ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD;}_{p p} {a a}^{22}} {((\frac{{&PartialD; &PartialD;}_{p p} {a a}^{22}}{{&PartialD; &PartialD; τ τ}_{p p} ((r r))}))}^{T T} . .$

其中，_pa²是对人工神经网络输入Q个样本时模糊映射层B的输出矩阵a²的第p行。而且Among them, _p a ² is the pth row of the output matrix a ² of the fuzzy mapping layer B when Q samples are input to the artificial neural network. and

$\frac{{&PartialD; &PartialD;}_{p p} {a a}^{22}}{{&PartialD; &PartialD; ξ ξ}_{p p} ((r r))} = = {[[\cdot &Center Dot; \cdot \cdot \cdot \cdot,, \frac{{22 τ τ}_{p p} ((r r)) (({a a}_{i i}^{11} ((q q)) - - {ξ ξ}_{p p} ((r r))))}{{(({σ σ}_{p p} ((r r))))}^{22}} {(({a a}_{p p}^{22} ((q q))))}^{22} {((\frac{11}{{a a}_{p p}^{22} ((q q))}))}^{11 - - \frac{11}{{τ τ}_{p p} ((r r))}},, \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot]]}_{11 \times \times Q Q},,$

$\frac{{&PartialD; &PartialD;}_{p p} {a a}^{22}}{{&PartialD; &PartialD; σ σ}_{p p} ((r r))} = = {[[\cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;,, \frac{{22 τ τ}_{p p} ((r r))}{{σ σ}_{p p} ((r r))} {a a}_{p p}^{22} ((q q)) ((11 - - {a a}_{p p}^{22} ((q q)))) \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot;]]}_{11 \times \times Q Q},,$

$\frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD;}_{p p} {a a}^{22}} = = 11_{11 \times \times {s the s}^{33}} \times \times {[\begin{matrix} \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{33} ((11))} \frac{{&PartialD; &PartialD; n no}_{11}^{33} ((11))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((11))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{33} ((q q))} \frac{{&PartialD; &PartialD; n no}_{11}^{33} ((q q))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((q q))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{33} ((Q Q))} \frac{{&PartialD; &PartialD; n no}_{11}^{33} ((Q Q))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((Q Q))} \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{u u}^{33} ((11))} \frac{{&PartialD; &PartialD;}_{u u}^{33} ((11))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((11))} & \cdot &Center Dot; \cdot \cdot \cdot \cdot & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{u u}^{33} ((q q))} \frac{{&PartialD; &PartialD; n no}_{u u}^{33} ((q q))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((q q))} & \cdot \cdot \cdot \cdot \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{u u}^{33} ((Q Q))} \frac{{&PartialD; &PartialD; n no}_{u u}^{33} ((Q Q))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((Q Q))} \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot \cdot \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot \cdot \\ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{33}}^{33} ((11))} \frac{{&PartialD; &PartialD; n no}_{{S S}^{33}}^{33}}{{&PartialD; &PartialD; a a}_{p p}^{22} ((11))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{33}}^{33} ((q q))} \frac{&PartialD; &PartialD; {n no}_{{S S}^{33}}^{33} ((q q))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((q q))} & \cdot \cdot \cdot \cdot \cdot \cdot & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{33}}^{33} ((Q Q))} \frac{{&PartialD; &PartialD; n no}_{{S S}^{33}}^{33} ((Q Q))}{{&PartialD; &PartialD; a a}_{p p}^{22} ((Q Q))} \end{matrix}]}_{{S S}^{33} \times \times Q Q}$

其中，a_i ¹(q)是与节点B_p相连的输入层节点A_i在神经网络输入第q个样本时的输出，也就是x_qi。Among them, a _i ¹ (q) is the output of the input layer node A _i connected to the node B _p when the neural network inputs the qth sample, that is, x _qi .

(3.4)训练的终止(3.4) Termination of training

人工神经网络在每一次训练中都进行步骤(3.2)和(3.3)的操作。每一次训练完成之后，计算e的值，如果满足在步骤(3.1)中设置的收敛条件，就停止训练；否则进行下一次训练。The artificial neural network performs steps (3.2) and (3.3) in each training. After each training is completed, the value of e is calculated, and if the convergence condition set in step (3.1) is satisfied, the training is stopped; otherwise, the next training is performed.

(4)使用已训练好的人工神经网络对特征进行模糊剪枝，计算每个特征的重要性度量，并排序。(4) Use the trained artificial neural network to perform fuzzy pruning on the features, calculate the importance measure of each feature, and sort them.

(4.1)对特征f_i进行模糊剪枝(4.1) Perform fuzzy pruning on feature f _i

所谓对特征f_i的模糊剪枝(fuzzy prune algorithm)，就是将特征f_i对应的所有模糊隶属度函数的输出值置为0.5，也就是使得模糊映射层的输出为The so-called fuzzy prune algorithm for feature f _i is to set the output values of all fuzzy membership functions corresponding to feature f _i to 0.5, that is, to make the output of the fuzzy mapping layer be

然后，获得这种条件下人工神经网络对于输入样本x_q时输出层给出的输出向量a⁴(x_q，i)。Then, obtain the output vector a ⁴ (x _q , i) given by the output layer of the artificial neural network for the input sample x _q under this condition.

(4.2)计算特征的重要性度量FQJ(i)(4.2) Calculate the importance measure FQJ(i) of the feature

本发明提出的特征度量函数FQJ(i)表示第i维特征f_i对于分类的重要性，特征f_i的FQJ(i)值越大则表明该特征对分类而言越重要。FQJ(i)的定义如下：The feature measurement function FQJ(i) proposed by the present invention represents the importance of the i-th dimension feature f _i for classification, and the larger the FQJ(i) value of feature f _i is, the more important the feature is for classification. The definition of FQJ(i) is as follows:

$FQJ FQJ ((i i)) = = \frac{11}{Q Q} {Σ Σ}_{q q = = 11}^{Q Q} {| | | | {a a}^{44} (({x x}_{q q},, i i)) - - {a a}^{44} (({x x}_{q q})) | | | |}^{22}$

其中，a⁴(x_q)表示人工神经网络对于输入样本x_q时输入层给出的输出向量，a⁴(x_q，i)表示对特征f_i进行模糊剪枝后的人工神经网络对于输入样本x_q给出的输出向量。使用在步骤(3)中已训练好的人工神经网络对在步骤(1.1)中用户给出的所有特征f_i，按照公式上述公式计算其相应的FQJ(i)，特征f_i的FQJ(i)值就是其重要性的度量。Among them, a ⁴ (x _q ) represents the output vector given by the input layer of the artificial neural network for the input sample x _q , and a ⁴ (x _q , i) represents the fuzzy pruning of the feature f _i by the artificial neural network for the input The output vector given by sample x _q . Use the artificial neural network that has been trained in step (3) to calculate its corresponding FQJ(i) for all the features f _i given by the user in step (1.1) according to the above formula, and the FQJ(i) of feature f _i ) value is the measure of its importance.

(4.2)对所有特征f_i，按照其重要性度量FQJ(i)排序(4.2) For all features f _i , sort them according to their importance measure FQJ(i)

对所有特征f_i，按照对应的FQJ(i)值的大小降序排列，就得到了所有特征对于分类的重要性的排序。用户可以根据实际需要或者客观条件的约束，选取排名靠前的一个或者多个特征用于识别，从而达到了特征选择的目的。For all features f _i , sort them in descending order according to the corresponding FQJ(i) value, and then get the ranking of the importance of all features for classification. Users can select one or more top-ranked features for identification according to actual needs or constraints of objective conditions, thus achieving the purpose of feature selection.

实例：Example:

用户希望考察以下四个特征：Sepal length、Sepal width、Petal length和Petal width对分类人物的重要性，并给出了训练样本：数据集IRIS。IRIS数据集被很多研究者用于模式识别方面的研究，已经成为一种基准数据。该数据集包含3类，每类有50个样本，每个样本有4个特征，依次是Sepal length、Sepal width、Petal length和Petal width。The user wants to investigate the following four features: the importance of Sepal length, Sepal width, Petal length and Petal width to classify characters, and gives a training sample: the data set IRIS. The IRIS data set has been used by many researchers in the study of pattern recognition and has become a benchmark data. The data set contains 3 categories, each category has 50 samples, and each sample has 4 features, which are Sepal length, Sepal width, Petal length and Petal width.

进行特征选择的具体步骤如下：The specific steps for feature selection are as follows:

(1.1)特征的指定(1.1) Designation of features

用户指定的4个特征：Sepal length、Sepal width、Petal length和Petalwidth都是数据性特征。则N＝4。The four features specified by the user: Sepal length, Sepal width, Petal length, and Petalwidth are all data features. Then N=4.

(1.2)给出训练样本(1.2) Given training samples

用户给出的训练样本分为3个类：Iris Setosa，Iris Versicolor和IrisVirginica，即K＝3。每一个类有50个样本，一共150个样本，也即Q＝150。每个样本有4维特征：Sepal length、Sepal width、Petal length和Petal width。样本的维数R＝N＝4。The training samples given by the user are divided into 3 categories: Iris Setosa, Iris Versicolor and IrisVirginica, that is, K=3. Each class has 50 samples, a total of 150 samples, that is, Q=150. Each sample has 4-dimensional features: Sepal length, Sepal width, Petal length, and Petal width. The dimension of the sample is R=N=4.

(2.1)构造输入层A(2.1) Constructing the input layer A

输入层A的节点数S¹等于训练样本的维数R，即S¹＝4。The number S ¹ of nodes in the input layer A is equal to the dimension R of the training samples, ie S ¹ =4.

(2.2)构造模糊映射层B(2.2) Construct fuzzy mapping layer B

给每个特征定义3个模糊隶属度函数，即m₁＝m₂＝m₃＝m₄＝3，这样模糊映射层的节点个数是 $S^{2} = Σ_{i = 1}^{S^{1}} m_{i} = 3 + 3 + 3 + 3 = 12,$ 有 $\frac{Q_{\min}}{Σ_{i = 1}^{s^{1}} m_{i}} = \frac{50}{12} > 3,$ 满足约束。Define 3 fuzzy membership functions for each feature, namely m ₁ =m ₂ =m ₃ =m ₄ =3, so the number of nodes in the fuzzy mapping layer is $S^{2} = Σ_{i = 1}^{S^{1}} m_{i} = 3 + 3 + 3 + 3 = 12,$ have $\frac{Q_{\min}}{Σ_{i = 1}^{{the s}^{1}} m_{i}} = \frac{50}{12} > 3,$ Satisfy the constraints.

输入层的节点A₁只与模糊映射层的节点B₁₁，B₁₂，B₁₃用连接权相连，输入层的节点A₂只与模糊映射层的节点B₂₁，B₂₂，B₂₃用连接权相连，输入层的节点A₃只与模糊映射层的节点B₃₁，B₃₂，B₃₃用连接权相连，输入层的节点A₄只与模糊映射层的节点B₄₁，B₄₂，B₄₃用连接权相连。The node A ₁ of the input layer is only connected with the nodes B ₁₁ , B ₁₂ , and B ₁₃ of the fuzzy mapping layer with connection weights, and the node A ₂ of the input layer is only connected with the nodes B ₂₁ , B ₂₂ , and B ₂₃ of the fuzzy mapping layer with connection weights The node A ₃ of the input layer is only connected with the nodes B ₃₁ , B ₃₂ , and B ₃₃ of the fuzzy mapping layer with connection weights, and the node A ₄ of the input layer is only connected with the nodes B ₄₁ , B ₄₂ , and B ₄₃ of the fuzzy mapping layer. Connect right to connect.

(2.2.3)选择模糊映射层节点的作用函数(2.2.3) Select the action function of the fuzzy mapping layer node

选择节点B_ij的模糊隶属度函数：Choose the fuzzy membership function of node B _ij :

隶属度函数的参数ξ_ij的初始值一般在特征f_i的值域上随机选取。以特征Sepal length为例，该特征的取值范围为[4.3，7.9]，则f₁对应的3个模糊隶属度函数中，选取的ξ的初始值可能是：ξ₁₁＝5.2，ξ₁₂＝6.1，ξ₁₃＝7.0。σ可以设置为σ₁₁＝σ₁₂＝σ₁₃＝0.45，τ可以设置为τ₁₁＝τ₁₂＝τ₁₃＝2，得到的隶属度函数如下图4所示。The initial value of the parameter ξ _ij of the membership function is generally randomly selected on the value range of the feature f _i . Taking the feature Sepal length as an example, the value range of this feature is [4.3, 7.9], then among the three fuzzy membership functions corresponding to f ₁ , the selected initial value of ξ may be: ξ ₁₁ =5.2, ξ ₁₂ = 6.1, ξ ₁₃ =7.0. σ can be set as σ ₁₁ =σ ₁₂ =σ ₁₃ =0.45, τ can be set as τ ₁₁ =τ ₁₂ =τ ₁₃ =2, and the obtained membership function is shown in Figure 4 below.

(2.3)隐含层C(2.3) hidden layer C

根据经验，选择S³＝6。According to experience, choose S ³ =6.

模糊映射层B与隐含层C之间的连接权 $w^{3} = {[\begin{matrix} w_{11}^{3} & \cdot \cdot \cdot & w_{1 u}^{3} & \cdot \cdot \cdot & w_{1,6}^{3} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{p 1}^{3} & \cdot \cdot \cdot & w_{pu}^{3} & \cdot \cdot \cdot & w_{p, 6}^{3} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{12,1}^{3} & \cdot \cdot \cdot & w_{12, u}^{3} & \cdot \cdot \cdot & w_{12,6}^{3} \end{matrix}]}_{12 \times 6}$ (p＝1，…，12，u＝1，…，6)的初始化采取随机的方法，连接权值的取值范围为[0，1]。可以令w_pu＝0.5。Connection weight between fuzzy mapping layer B and hidden layer C $w^{3} = {[\begin{matrix} w_{11}^{3} & &Center Dot; \cdot \cdot & w_{1 u}^{3} & &Center Dot; &Center Dot; &Center Dot; & w_{1,6}^{3} \\ \cdot & \cdot & \cdot \\ &Center Dot; & \cdot & \cdot \\ &Center Dot; & &Center Dot; & &Center Dot; \\ w_{p 1}^{3} & \cdot &Center Dot; \cdot & w_{pu}^{3} & &Center Dot; &Center Dot; &Center Dot; & w_{p, 6}^{3} \\ &Center Dot; & &Center Dot; & &Center Dot; \\ \cdot & \cdot & \cdot \\ &Center Dot; & \cdot & \cdot \\ w_{12,1}^{3} & &Center Dot; &Center Dot; &Center Dot; & w_{12, u}^{3} & &Center Dot; &Center Dot; &Center Dot; & w_{12,6}^{3} \end{matrix}]}_{12 \times 6}$ (p=1, . . . , 12, u=1, . It is possible to set w _pu =0.5.

(2.3.3)选择隐含层节点的作用函数(2.3.3) Select the action function of hidden layer nodes

$a_{u}^{3} (q) = \frac{1}{1 + \exp (- n_{u}^{3} (q))},$ (u＝1，…，6). $a_{u}^{3} (q) = \frac{1}{1 + \exp (- {no}_{u}^{3} (q))},$ (u=1,...,6).

(2.4)输出层D(2.4) Output layer D

输出层D的节点数S⁴等于训练样本的类别数K，即S⁴＝K＝3。The number S ⁴ of nodes in the output layer D is equal to the number K of categories of training samples, that is, S ⁴ =K=3.

隐含层C与输出层D之间的连接权值 $w^{4} = {[\begin{matrix} w_{11}^{4} & \cdot \cdot \cdot & w_{12}^{4} & \cdot \cdot \cdot & w_{13}^{4} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{u 1}^{4} & \cdot \cdot \cdot & w_{u 2}^{4} & \cdot \cdot \cdot & w_{u 3}^{4} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ w_{61}^{4} & \cdot \cdot \cdot & w_{62}^{4} & \cdot \cdot \cdot & w_{63}^{4} \end{matrix}]}_{6 \times 3}$ (u＝1，…，6)的初始化采取随机的方法，权值的取值范围为[0，1]。可以令w_ul＝0.5(l＝1，2，3)。Connection weight between hidden layer C and output layer D $w$ $^{4} = {[\begin{matrix} w_{11}^{4} & \cdot \cdot \cdot & w_{12}^{4} & &Center Dot; &Center Dot; &Center Dot; & w_{13}^{4} \\ &Center Dot; & &Center Dot; & &Center Dot; \\ &Center Dot; & \cdot & &Center Dot; \\ &Center Dot; & &Center Dot; & &Center Dot; \\ w_{u 1}^{4} & &Center Dot; \cdot \cdot & w_{u}^{2} & &Center Dot; &Center Dot; &Center Dot; & w_{u 3}^{4} \\ \cdot & \cdot & \cdot \\ \cdot & \cdot & \cdot \\ \cdot & &Center Dot; & &Center Dot; \\ w_{61}^{4} & \cdot \cdot &Center Dot; & w_{62}^{4} & \cdot \cdot \cdot & w_{63}^{4} \end{matrix}]}_{6 \times 3}$ The initialization of (u=1, . . . , 6) adopts a random method, and the value range of the weight is [0, 1]. It is possible to set w _ul =0.5 (l=1, 2, 3).

至此，带有模糊映射层的人工神经网络构造完毕，其结构图如图3所示。So far, the artificial neural network with fuzzy mapping layer has been constructed, and its structure diagram is shown in Figure 3.

(3.1)收敛条件的选取(3.1) Selection of convergence conditions

设定e＜0.001为收敛条件。Set e<0.001 as the convergence condition.

根据经验选择权值学习速率α＝0.05。The weight learning rate α=0.05 is selected according to experience.

按最速下降法，人工神经网络第m层和第m-1层(m＝3，4)之间的连接权矩阵w^m(维数是S^m-1×S^m)，在第(r+1)次训练开始时更新为According to the steepest descent method, the connection weight matrix w ^m (dimension is S ^m-1 × S ^m ) between the mth layer and the m-1th layer (m=3, 4) of the artificial neural network, at the (r+ 1) At the beginning of training, it is updated as

w^m(r+1)＝w^m(r)-0.05g^m(a^m-1)^T.w ^m (r+1)＝w ^m (r)-0.05g ^m (a ^m-1 ) ^T .

其中in

${g g}^{m m} = = \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}^{m m}} {[\begin{matrix} \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{m m} ((11))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{m m} ((q q))} & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{11}^{m m} ((Q Q))} \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot \cdot & \cdot &Center Dot; & \cdot &Center Dot; \\ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{i i}^{m m} ((11))} & \cdot \cdot \cdot \cdot \cdot &Center Dot; & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{i i}^{m m} ((q q))} & \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{i i}^{m m} ((Q Q))} \\ \cdot \cdot & \cdot &Center Dot; & \cdot \cdot \\ \cdot \cdot & \cdot &Center Dot; & \cdot \cdot \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{m m}}^{m m} ((11))} & \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{m m}}^{m m} ((q q))} & \cdot &Center Dot; \cdot &Center Dot; \cdot \cdot & \frac{&PartialD; &PartialD; e e}{{&PartialD; &PartialD; n no}_{{S S}^{m m}}^{m m} ((Q Q))} \end{matrix}]}_{}$

${a a}^{m m} = = {[\begin{matrix} {a a}_{11}^{m m} ((11)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{11}^{m m} ((q q)) & \cdot \cdot \cdot &Center Dot; \cdot &Center Dot; & {a a}_{11}^{m m} ((Q Q)) \\ \cdot \cdot & \cdot \cdot & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot &Center Dot; & \cdot &Center Dot; \\ \cdot \cdot & \cdot &Center Dot; & \cdot &Center Dot; \\ {a a}_{i i}^{m m} ((11)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{i i}^{m m} ((q q)) & \cdot &Center Dot; \cdot \cdot \cdot &Center Dot; & {a a}_{i i}^{m m} ((Q Q)) \\ \cdot &Center Dot; & \cdot \cdot & \cdot &Center Dot; \\ \cdot \cdot & \cdot \cdot & \cdot &Center Dot; \\ \cdot &Center Dot; & \cdot \cdot & \cdot &Center Dot; \\ {a a}_{{S S}^{m m}}^{m m} ((11)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{{S S}^{m m}}^{m m} ((q q)) & \cdot &Center Dot; \cdot &Center Dot; \cdot &Center Dot; & {a a}_{{S S}^{m m}}^{m m} ((Q Q)) \end{matrix}]}_{{S S}^{m m} \times \times Q Q}$

(3.3)模糊映射层节点的作用函数的参数ξ，σ，τ的更新选择各参数的学习速率为θ＝0.1，

ρ＝0.1。(3.3) The parameters ξ of the action function of the fuzzy mapping layer node, σ, the update selection of τ The learning rate of each parameter is θ=0.1,

p = 0.1.

使用如下公式更新模糊映射层B的节点B_p(p＝1，…，S²)的作用函数的三个参数ξ_p，σ_p，τ_p。The following formulas are used to _update the three parameters ξ p , σ _p , τ _p of the action function of node B _p (p=1, . . . , S ² ) of fuzzy mapping layer B.

其中，_pa²是对人工神经网络输入Q个样本时模糊映射层B的输出矩阵a²的第p行。Among them, _p a ² is the pth row of the output matrix a ² of the fuzzy mapping layer B when Q samples are input to the artificial neural network.

(3.4)训练的终止(3.4) Termination of training

在第1037次训练结束之后，计算发现e＝0.000999，满足收敛条件，终止训练。After the 1037th training, the calculation finds that e=0.000999, the convergence condition is satisfied, and the training is terminated.

(4.1)对特征f_i进行模糊剪枝(4.1) Perform fuzzy pruning on feature f _i

以特征Sepal length为例，对f_i进行剪枝，也就是使得模糊映射层的节点B₁₁，B₁₂，B₁₃的输出值置为0。比如，特征Sepal length的观察值是5.1，剪枝前模糊映射层节点B₁₂，B₁₂，B₁₃的输出是[0.117，0.005，0.009]，特征Sepal width的观察值是3.5，剪枝前模糊映射层节点B₂₁，B₂₂，B₂₃的输出是[0.100，0.500，0.500]，特征Petal length的观察值是1.4，剪枝前模糊映射层节点B₃₁，B₃₂，B₃₃的输出是[0.141，0.974，0.028]，特征Petal width的观察值是0.2，剪枝前模糊映射层节点B₄₁，B₄₂，B₄₃的输出是[0.265，0.069，0.030]，因此样本[5.1，3.5，1.4，0.2]在剪枝前模糊映射层的输出是Taking the feature Sepal length as an example, _fi is pruned, that is, the output values of nodes B ₁₁ , B ₁₂ , and B ₁₃ of the fuzzy mapping layer are set to 0. For example, the observed value of the feature Sepal length is 5.1, the output of the fuzzy mapping layer nodes B ₁₂ , B ₁₂ , and B ₁₃ before pruning is [0.117, 0.005, 0.009], the observed value of the feature Sepal width is 3.5, and the blurred before pruning The output of mapping layer nodes B ₂₁ , B ₂₂ , B ₂₃ is [0.100, 0.500, 0.500], the observation value of the feature Petal length is 1.4, and the output of fuzzy mapping layer nodes B ₃₁ , B ₃₂ , B ₃₃ before pruning is [ 0.141, 0.974, 0.028], the observation value of the feature Petal width is 0.2, the output of the fuzzy mapping layer nodes B ₄₁ , B ₄₂ , and B ₄₃ before pruning is [0.265, 0.069, 0.030], so the sample [5.1, 3.5, 1.4 , 0.2] The output of the fuzzy mapping layer before pruning is

[0.117，0.005，0.009，0.100，0.500，0.500，0.141，0.974，0.028，0.265，0.069，0.030]。[0.117, 0.005, 0.009, 0.100, 0.500, 0.500, 0.141, 0.974, 0.028, 0.265, 0.069, 0.030].

而要进行剪枝，就是要把这个输出修改为To perform pruning, it is necessary to modify this output to

[0.500，0.500，0.500，0.100，0.500，0.500，0.141， 0.974，0.028，0.265，0.069，0.030]。[0.500, 0.500, 0.500, 0.100, 0.500, 0.500, 0.141, 0.974, 0.028, 0.265, 0.069, 0.030].

然后，计算进行这种修改后人工神经网络对于输入样本x_q时输出层给出的输出向量a⁴(x_q，1)。对其他特征的剪枝以此类推。Then, calculate the output vector a ⁴ (x _q , 1) given by the output layer of the artificial neural network for the input sample x _q after such modification. The pruning of other features can be deduced by analogy.

仍然以特征Sepal length为例，对f₁计算FQJ(1)：Still taking the feature Sepal length as an example, calculate FQJ(1) for f ₁ :

$FQJ FQJ ((11)) = = \frac{11}{105105} {Σ Σ}_{q q = = 11}^{150150} {| | | | {a a}^{44} (({x x}_{q q},, 11)) - - {a a}^{44} (({x x}_{q q})) | | | |}^{22} = = 0.08171 0.08171 . .$

类似的，计算得到FQJ(2)＝0.095858，FQJ(3)＝0.491984，FQJ(4)＝0.511002。Similarly, FQJ(2)=0.095858, FQJ(3)=0.491984, FQJ(4)=0.511002 are calculated.

对所有特征f_i，按照对应的FQJ(i)值的大小降序排列，得到特征对于分类任务的重要性排序为：Petal width，Petal length，Sepal width，Sepal length。For all features f _i , they are arranged in descending order according to the corresponding FQJ(i) values, and the importance of the features for the classification task is sorted as follows: Petal width, Petal length, Sepal width, Sepal length.

Claims

1. A feature selection method based on an artificial neural network comprises the following steps:

(1) user specifies the feature f to be selected_iI-1, …, N, giving a training sample set for training an artificial neural network:

the training samples have the same dimension R, R ═ N, and are classified into K categories: omega₁，…，ω_KThe qth training sample x_qX of the ith dimension of_qiI.e. the specified i-th feature f_iThe q-th observation of (1);

(2) constructing an artificial neural network sequentially consisting of an input layer, a fuzzy mapping layer, a hidden layer and an output layer according to the training sample; the neural network data is input into the neural network from the input layer through the connection weight w²Transferring to the fuzzy mapping layer, acting by the fuzzy mapping layer, and passing through the connection weight w³Transferred to the hidden layer, acted by the hidden layer and then passed through the connection weight w⁴Transmitting the output layer to obtain an output, wherein m is 2, 3, 4;

(3) training the initialized artificial neural network by using a training sample set given by a user, wherein the processing procedure is as follows:

(3.1) selecting the estimator e of mean square error as a performance index in the learning process:

wherein, t_i ^m(q) is a target value of an output of the node i of the mth layer when the qth sample is input, a_i ^m(q) is the actual output of node i at the mth level when the qth sample is input, and G is the number of nodes at that level;

(3.2) adopting a back propagation algorithm to carry out connection weight matrix w between each layer of the artificial neural network^mTraining, wherein m is 3, 4;

(3.3) updating parameters xi, sigma and tau in the function of the fuzzy mapping layer node;

(3.4) when e meets the convergence condition, entering the step (4), and otherwise, repeating the steps (3.2) - (3.3);

(4) and carrying out fuzzy pruning on the features by using the trained artificial neural network, calculating the importance measurement of each feature, and sequencing the features according to the measurement values of the importance.

2. The method of claim 1, wherein: the step (2) comprises the following steps:

(2.1) input layer A

Input layer A node number S¹Equal to the dimension R of the training sample, each node inputs a certain dimension of the training sample, and when the q sample is input by the neural network, the node A of the input layer_iThe inputs of (a) are:

n_{i}^{1} (q) = x_{qi},

the output is:

a_{i}^{1} (q) = x_{qi} .

(2.2) fuzzy mapping layer B

Fuzzy mapping of node number of layer B

m_iThe selection of the value needs to satisfy the following conditions:

wherein Q is_min＝min{Q_l}，Q_lRepresenting the class omega in the training sample given by the user_lThe number of samples of (a);

node A of the input layer_iNode B with fuzzy mapping layer_i1，…，B_imiNode B connected by connection weights and having fuzzy mapping layer_i1，…，B_imlDo not interact with except A_iOther nodes of the other input layer are connected; feature level a node a_iAnd fuzzy mapping layer node B_ijThe connection weight between the two is constantly 1; blurring node B of mapping layer when q sample is inputted by neural network_ijThe inputs of (a) are:

node B of fuzzy mapping layer_ijThe function of (a) is a fuzzy membership function mu_ijGiven the ith dimension feature f_iA fuzzy membership function of (1) means that a mapping mu is given_i：f_i→[0，1]；

Node B_ijThe fuzzy membership function of (a) is of the form:

<math> <mrow> <msubsup> <mi>a</mi> <mi>ij</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <mn>1</mn> <mo>+</mo> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msubsup> <mi>n</mi> <mi>ij</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>-</mo> <msub> <mi>ξ</mi> <mi>ij</mi> </msub> </mrow> <msub> <mi>σ</mi> <mi>ij</mi> </msub> </mfrac> <mo>)</mo> </mrow> <mrow> <mn>2</mn> <mi>τ</mi> </mrow> </msup> </mrow> </mfrac> <mo>,</mo> <msub> <mi>σ</mi> <mi>ij</mi> </msub> <mo>&NotEqual;</mo> <mn>0</mn> <mo>,</mo> <msub> <mi>σ</mi> <mi>ij</mi> </msub> <mo>&GreaterEqual;</mo> <mn>0</mn> <mo>.</mo> </mrow> </math>

n_ij ²(q) node B of the fuzzy mapping layer when the q-th sample is input_ijInput of a_ij ²(q) is the corresponding actual output, ξ_ijIs a node B_ijClass of conditional probability density of σ_ijIs a node B_ijStandard deviation of the class of conditional probability densities, τ_ijIs a node B_ijOne of the parameters of (a);

(2.3) hidden layer C

Number of nodes S of hidden layer C³The number of classes K of the samples is greater than or equal to; the fuzzy mapping layer B and the hidden layer C are all connected, and the connection weight between the fuzzy mapping layer B and the hidden layer C

Wherein p is 1, …, S²，u＝1，…，S³The initialization of (a) takes a random approach,the value range of the connection weight is [0, 1]]；

When the q sample is input to the neural network, node C of the hidden layer_u(u＝1，…，S³) The inputs of (a) are:

wherein, a_p ²(q) node B which is a fuzzy mapping layer_p(p＝1，…，S²) Output at the input of the q sample of the neural network, w_pu ³Node B being a fuzzy mapping layer_pAnd hidden layer node C_uThe right of connection between;

the role function of the hidden layer node is selected as a Sigmoid function or a hyperbolic tangent function:

wherein n is_u ³(q) node C of the hidden layer at the input of the q-th sample for the neural network_uInput of a_u ³(q) is the corresponding output;

(2.4) output layer D

Node number S of output layer D⁴Is equal to the number of classes K of the training sample; the hidden layer C and the output layer D are fully connected; the connection weight between the hidden layer C and the output layer D is as follows:

wherein u is 1, …, S³，l＝1，…，S⁴The initialization of (2) adopts a random method, and the value range of the weight is [0, 1]]；

Output layer node D_l(l＝1，…，S⁴) Input and output of (D) are equal_lOutput value n of_l ⁴(q) is that the q-th sample of the neural network input belongs to the class ω_lProbability of (c):

wherein, w_ul ⁴Node C being a hidden layer_uAnd an output layer node D_lThe right of connection between.

3. The method according to claim 1 or 2, characterized in that: the treatment processes of the steps (3.2) and (3.3) are as follows:

(3.2) training the connection weight w between the fuzzy mapping layer B and the hidden layer C^m：

The sensitivity of the estimator of the mean square error e to the input of the mth layer is defined as

<math> <mrow> <msup> <mi>g</mi> <mi>m</mi> </msup> <mo>=</mo> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <msup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>m</mi> </msup> </mfrac> <mo>=</mo> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mi>m</mi> </msup> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mi>m</mi> </msup> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mi>m</mi> </msup> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> </mtr> </mtable> </mfenced> <mo>.</mo> </mrow> </math>

Wherein S is^mIs the number of nodes at the mth layer of the artificial neural network, n^mIs one size of S^mA matrix of xQ representing the input of the mth layer of the artificial neural network; n is_i ^m(q) represents a node i of the m-th layerThe input at the time when the q-th sample is input to the neural network, and,

<math> <mrow> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>=</mo> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>×</mo> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>;</mo> </mrow> </math>

connection weight matrix w between mth layer and m-1 layer of artificial neural network^mIs S^m-1×S^mAnd m is 3, 4, and is updated to be at the beginning of the (r +1) th training

w^m(r+1)＝w^m(r)-αg^m(a^m-1)^T.

Wherein alpha is weight learning rate, the value range is more than 0 and less than or equal to 1, r is the training frequency, a^mIs one size of S^mA matrix of x Q, representing the actual output of the mth layer of the artificial neural network:

(3.3) node B of fuzzy mapping layer B_p(p＝1，…，S²) Three parameters xi of the action function of_p，σ_p，τ_pUpdated according to the following formula, wherein theta is xi_pThe learning rate of (a) is determined,is σ_pRho is τ_pLearning rate of (d):

<math> <mrow> <msub> <mi>ξ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>ξ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>-</mo> <mi>θ</mi> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> </mfrac> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>&PartialD;</mo> <mi>ξ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>,</mo> </mrow> </math>

<math> <mrow> <msub> <mi>σ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>σ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>-</mo> <mi>&upsi;</mi> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> </mfrac> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>&PartialD;</mo> <mi>σ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>,</mo> </mrow> </math>

<math> <mrow> <msub> <mi>τ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>τ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>-</mo> <mi>ρ</mi> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> </mfrac> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>&PartialD;</mo> <mi>τ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>.</mo> </mrow> </math>

wherein,_pa²fuzzy mapping layer B output matrix a when inputting Q samples to artificial neural network²Line p of (2), and

<math> <mrow> <mfrac> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>&PartialD;</mo> <mi>ξ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>=</mo> <msub> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> <mo>,</mo> </mtd> <mtd> <mfrac> <mrow> <msub> <mrow> <mn>2</mn> <mi>τ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>i</mi> <mn>1</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>-</mo> <msub> <mi>ξ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mrow> <msup> <mrow> <mo>(</mo> <msub> <mi>σ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mn>2</mn> </msup> </mfrac> <msup> <mrow> <mo>(</mo> <msubsup> <mi>a</mi> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> <mn>2</mn> </msup> <msup> <mrow> <mo>(</mo> <mfrac> <mn>1</mn> <mrow> <msubsup> <mi>a</mi> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mrow> <mn>1</mn> <mo>-</mo> <mfrac> <mn>1</mn> <mrow> <msub> <mi>τ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mrow> </msup> </mtd> <mtd> <mo>,</mo> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> </mtr> </mtable> </mfenced> <mrow> <mn>1</mn> <mo>×</mo> <mi>Q</mi> </mrow> </msub> <mo>,</mo> </mrow> </math>

<math> <mrow> <mfrac> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>&PartialD;</mo> <mi>σ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>=</mo> <msub> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <msub> <mrow> <mn>2</mn> <mi>τ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msub> <mi>σ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <msubsup> <mi>a</mi> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mrow> <mo>(</mo> <mn>1</mn> <mo>-</mo> <msubsup> <mi>a</mi> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>)</mo> </mrow> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> </mtr> </mtable> </mfenced> <mrow> <mn>1</mn> <mo>×</mo> <mi>Q</mi> </mrow> </msub> <mo>,</mo> </mrow> </math>

<math> <mrow> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msub> <mo>&PartialD;</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> </mfrac> <mo>=</mo> <msub> <mn>1</mn> <mrow> <mn>1</mn> <mo>×</mo> <msup> <mi>S</mi> <mn>3</mn> </msup> </mrow> </msub> <mo>×</mo> <msub> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mn>1</mn> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>u</mi> <mi>e</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <mfrac> <mrow> <mo>&PartialD;</mo> <mi>e</mi> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mfrac> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>n</mi> </mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> <mrow> <msubsup> <mrow> <mo>&PartialD;</mo> <mi>a</mi> </mrow> <mi>p</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>Q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mtd> </mtr> </mtable> </mfenced> <mrow> <msup> <mi>S</mi> <mn>3</mn> </msup> <mo>×</mo> <mi>Q</mi> </mrow> </msub> </mrow> </math>

wherein, a_i ¹(q) is with node B_pConnected input layer node A_iThe output when the q sample is input into the neural network is x_qi。

4. The method according to claim 1 or 2, characterized in that: the processing procedure of the step (4) is as follows:

(4.1) pairs of features f_iPerforming fuzzy pruning to make the output of the fuzzy mapping layer as

Obtaining the input sample x of the artificial neural network under the condition_qOutput vector a given by time-output layer⁴(x_q，i)；

(4.2) calculating the importance measure of the feature FQJ (i)

The feature metric function FQJ (i) represents the ith dimension feature f_iFor the importance of classification, fqj (i) is defined as follows:

wherein, a⁴(x_q) Representing an artificial neural network for an input sample x_qOutput vector given by time input layer, a⁴(x_qI) represents perthSign f_iInput sample x for artificial neural network after fuzzy pruning_qThe given output vector uses the artificial neural network trained in the step (3) to give all the features f given by the user in the step (1.1)_iCalculating the corresponding FQJ (i), characteristic f according to the formula_iThe value of (fqj), (i) is a measure of its importance;

(4.3) for all characteristics f_iSorted by their importance measure fqj (i).

5. The method of claim 3, wherein: the processing procedure of the step (4) is as follows:

(4.2) calculating the importance measure of the feature FQJ (i)

wherein, a⁴(x_q) Representing an artificial neural network for an input sample x_qOutput vector given by time input layer, a⁴(x_qI) represents a pair of features f_iInput sample x for artificial neural network after fuzzy pruning_qThe given output vector uses the artificial neural network trained in the step (3) to give all the features f given by the user in the step (1.1)_iCalculating the corresponding FQJ (i), characteristic f according to the formula_iThe value of (fqj), (i) is a measure of its importance;

(4.3) for all characteristics f_iSorted by their importance measure fqj (i).