Detailed Description
The feature selection method of the present invention starts from the premise that the user provides a data set for training and the feature set from which features are to be selected. The feature selection process is described in detail below.
Feature selection requires a measure of the importance of each feature. In the feature selection method provided by the invention, an artificial neural network with a fuzzy mapping layer is trained on the data set provided by the user, and the importance metric of each feature is then calculated by means of the trained network, thereby achieving feature selection. As shown in fig. 1, the method of the present invention comprises the steps of:
(1) The user specifies the features $f_i$ ($i = 1, \ldots, N$) to be selected and provides the training samples for training the artificial neural network.
(1.1) specification of features
The specified features must be data-type features that directly reflect an actual physical or geometric property of the object, such as weight, speed, or length. The number of features N is a natural number; that is, there may be one or more features.
(1.2) definition of training samples
Training samples for training the artificial neural network are also of data type. All samples have the same dimension R (R = N) and are classified into K categories $\omega_1, \ldots, \omega_K$. The dimension R equals the number of features specified in step (1.1). The i-th dimension $x_{qi}$ of the q-th training sample $x_q$ is the q-th observation of the specified i-th feature $f_i$. The mathematical description of the training sample set is:

$$X = \{\, x_q \in \mathbb{R}^R \mid q = 1, \ldots, Q \,\}$$
where Q is the number of training samples and Q ≥ K, each class $\omega_l$ ($l = 1, \ldots, K$) containing at least one sample; $\mathbb{R}$ denotes the set of real numbers, and R, the dimension of a sample $x_q$, equals the number of features N of the training sample set X.
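As a concrete illustration, the training sample set can be arranged as a matrix with one row per sample. The sketch below uses NumPy with hypothetical toy values; the data, the two features and the class sizes are invented for illustration only and are not part of the invention:

```python
import numpy as np

# Hypothetical toy data: Q = 6 samples of R = N = 2 features
# (say weight and length), split into K = 2 classes.
X = np.array([
    [1.0, 2.0],
    [1.1, 1.9],
    [0.9, 2.2],   # three samples of class omega_1
    [5.0, 7.0],
    [5.2, 6.8],
    [4.8, 7.1],   # three samples of class omega_2
])
labels = np.array([0, 0, 0, 1, 1, 1])   # class index of each sample x_q

Q, R = X.shape                           # Q samples, dimension R
K = len(np.unique(labels))               # number of classes
# The method requires Q >= K and at least one sample per class.
assert Q >= K and all((labels == l).sum() >= 1 for l in range(K))
```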
(2) Construct an artificial neural network consisting of a feature layer A, a fuzzy mapping layer B, a hidden layer C and an output layer D according to the training samples, and initialize it.
As shown in FIG. 2, the artificial neural network comprises an input layer A (i.e. the feature layer), a fuzzy mapping layer B, a hidden layer C and an output layer D, the layers being linked by connection weights $w^m$ ($m = 2, 3, 4$). Data enters the network at the input layer, is passed through connection weights to the fuzzy mapping layer, then, after the action of the fuzzy mapping layer, through connection weights to the hidden layer, and finally, after the action of the hidden layer, through connection weights to the output layer, which produces the output. Constructing an artificial neural network with a fuzzy mapping layer requires setting the node numbers of the input layer (feature layer), the hidden layer and the output layer, determining for each feature $f_i$ the number $m_i$ of corresponding fuzzy membership functions, and defining those membership functions. Initialization determines the initial values of the connection weights between the layers and of the parameters of the fuzzy membership functions in the nodes of the fuzzy mapping layer.
The specific process is as follows:
(2.1) input layer A
(2.1.1) selection of number of input layer nodes
The number of nodes $S^1$ of input layer A equals the dimension R of the training samples.
(2.1.2) input and output of input layer nodes
Each node takes as input one dimension of the training sample. When the q-th sample is input to the neural network, the input of input-layer node $A_i$ is:

$$n_i^1(q) = x_{qi}$$

and the output is:

$$a_i^1(q) = n_i^1(q) = x_{qi}.$$
(2.2) fuzzy mapping layer B
(2.2.1) selection of the number of fuzzy membership functions corresponding to each feature
For feature $f_i$, $m_i$ corresponding fuzzy membership functions can be defined according to its specific physical meaning, each fuzzy membership function forming one node of the fuzzy mapping layer. That is, the number of nodes of fuzzy mapping layer B is

$$S^2 = \sum_{i=1}^{S^1} m_i,$$

and the choice of the values $m_i$ must satisfy the following condition:
$$\frac{Q_{\min}}{\sum_{i=1}^{S^1} m_i} > 3$$
where $Q_{\min} = \min_l \{Q_l\}$ and $Q_l$ is the number of samples of class $\omega_l$ in the training set given by the user.
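The condition above is easy to check programmatically. The following sketch (function name and sample counts are hypothetical, not from the invention) verifies a proposed choice of the $m_i$:

```python
def valid_membership_counts(m, class_counts):
    """Check Q_min / sum(m_i) > 3, where m lists the m_i per feature
    and class_counts lists the per-class sample counts Q_l."""
    q_min = min(class_counts)
    return q_min / sum(m) > 3

# Three classes with 100, 50 and 80 samples; five features.
print(valid_membership_counts([2, 2, 2, 2, 2], [100, 50, 80]))  # 50/10 = 5 > 3: True
print(valid_membership_counts([8, 8, 8, 8, 8], [100, 50, 80]))  # 50/40 < 3: False
```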
(2.2.2) connection weights between input layer and fuzzy mapping layer
Node $A_i$ of the input layer is connected by connection weights to nodes $B_{i1}, \ldots, B_{im_i}$ of the fuzzy mapping layer, and the nodes $B_{i1}, \ldots, B_{im_i}$ are not connected to any input-layer node other than $A_i$; i.e., the connection is one-to-many. The connection weight between feature-layer node $A_i$ and fuzzy-mapping-layer node $B_{ij}$ is fixed at 1, so the connection weight matrix $w^2$ between feature layer A and fuzzy mapping layer B does not participate in the training of the artificial neural network.
(2.2.3) Input of fuzzy-mapping-layer node $B_{ij}$
When the q-th sample is input to the neural network, the input of fuzzy-mapping-layer node $B_{ij}$ is:

$$n_{ij}^2(q) = x_{qi} \times 1 = x_{qi}.$$
(2.2.4) Action function of fuzzy-mapping-layer node $B_{ij}$
The action function of fuzzy-mapping-layer node $B_{ij}$ is the fuzzy membership function $\mu_{ij}$, i.e. the j-th membership function of feature $f_i$. In the present invention, giving a fuzzy membership function for the i-th feature $f_i$ means giving a mapping $\mu_i: f_i \to [0, 1]$.
The fuzzy membership function of node $B_{ij}$ has the form:

$$a_{ij}^2(q) = \frac{1}{1 + \left( \dfrac{n_{ij}^2(q) - \xi_{ij}}{\sigma_{ij}} \right)^{2\tau_{ij}}}, \qquad \sigma_{ij} \neq 0, \; \tau_{ij} \geq 0.$$
Here $n_{ij}^2(q)$ is the input of fuzzy-mapping-layer node $B_{ij}$ when the q-th sample is input, and $a_{ij}^2(q)$ is the corresponding actual output. $\xi_{ij}$ is the mean of the class-conditional probability density of node $B_{ij}$, $\sigma_{ij}$ is the standard deviation of that class-conditional probability density, and $\tau_{ij}$ is a shape parameter of node $B_{ij}$. The role of $\tau$ is that even if the $\xi$ and $\sigma$ of two membership functions are equal, the two functions can still be kept from being identical by adjusting $\tau$.
There is no particular restriction on the initial values of $\sigma_{ij}$ and $\tau_{ij}$; $\xi_{ij}$ is generally chosen at random over the value range of the corresponding feature $f_i$.
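The membership function above can be sketched as follows (a minimal illustration; the parameter values are hypothetical):

```python
def fuzzy_membership(x, xi, sigma, tau):
    """a_ij^2 = 1 / (1 + ((x - xi)/sigma)^(2*tau)), sigma != 0, tau >= 0."""
    assert sigma != 0 and tau >= 0
    return 1.0 / (1.0 + ((x - xi) / sigma) ** (2 * tau))

# The membership peaks at 1 when x equals the centre xi and decays
# symmetrically; sigma sets the width and tau the steepness.
print(fuzzy_membership(3.0, xi=3.0, sigma=1.0, tau=1.0))  # 1.0
print(fuzzy_membership(4.0, xi=3.0, sigma=1.0, tau=1.0))  # 0.5
```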
(2.3) hidden layer C
(2.3.1) selection of the number of hidden layer nodes
There is no particular requirement on the number of nodes $S^3$ of hidden layer C; generally it is chosen to be no less than the number K of classes of the training samples.
(2.3.2) Connection weights between the fuzzy mapping layer and the hidden layer
The fuzzy mapping layer B and the hidden layer C are fully connected: each node of fuzzy mapping layer B is connected to every node of hidden layer C, and each node of hidden layer C is likewise connected to every node of fuzzy mapping layer B. The connection weight matrix between fuzzy mapping layer B and hidden layer C is

$$w^3 = \left[ w^3_{pu} \right]_{S^2 \times S^3}, \qquad p = 1, \ldots, S^2, \; u = 1, \ldots, S^3.$$

The connection weights are initialized by a random method, with values in the range $[0, 1]$.
(2.3.3) input of hidden layer node
When the q-th sample is input to the neural network, the input of hidden-layer node $C_u$ ($u = 1, \ldots, S^3$) is:

$$n_u^3(q) = \sum_{p=1}^{S^2} a_p^2(q) \times w_{pu}^3.$$

where $a_p^2(q)$ is the output of fuzzy-mapping-layer node $B_p$ ($p = 1, \ldots, S^2$) when the q-th sample is input to the neural network, and $w_{pu}^3$ is the connection weight between fuzzy-mapping-layer node $B_p$ and hidden-layer node $C_u$.
(2.3.4) Effect function of hidden layer nodes
The action function of the hidden-layer nodes is chosen as the Sigmoid function:

$$a_u^3(q) = \frac{1}{1 + e^{-n_u^3(q)}}, \qquad u = 1, \ldots, S^3.$$

where $n_u^3(q)$ is the input of hidden-layer node $C_u$ when the q-th sample is input to the neural network and $a_u^3(q)$ is the corresponding output.
It can also be chosen as the hyperbolic tangent function:

$$a_u^3(q) = \tanh\!\left(n_u^3(q)\right) = \frac{e^{n_u^3(q)} - e^{-n_u^3(q)}}{e^{n_u^3(q)} + e^{-n_u^3(q)}}, \qquad u = 1, \ldots, S^3.$$

where $n_u^3(q)$ is the input of hidden-layer node $C_u$ when the q-th sample is input to the neural network and $a_u^3(q)$ is the corresponding output.
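Putting steps (2.3.3) and (2.3.4) together, a hidden-layer forward step might be sketched as below. This is a NumPy illustration with invented shapes; `np.tanh` could be substituted for the Sigmoid as described:

```python
import numpy as np

def sigmoid(n):
    """Sigmoid action function: a = 1 / (1 + exp(-n))."""
    return 1.0 / (1.0 + np.exp(-n))

def hidden_forward(a2, w3, act=sigmoid):
    """Hidden-layer step: n_u^3(q) = sum_p a_p^2(q) * w_pu^3, then act()."""
    n3 = w3.T @ a2        # S3 x Q matrix of hidden-layer inputs
    return act(n3)        # S3 x Q matrix of hidden-layer outputs a_u^3(q)

# Toy shapes: S2 = 4 fuzzy nodes, S3 = 3 hidden nodes, Q = 2 samples.
a2 = np.zeros((4, 2))                      # all-zero fuzzy-layer outputs
w3 = np.random.uniform(0.0, 1.0, (4, 3))   # weights initialized in [0, 1]
print(hidden_forward(a2, w3))              # sigmoid(0) = 0.5 everywhere
```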
(2.4) output layer D
(2.4.1) selection of number of output layer nodes
The number of nodes $S^4$ of output layer D equals the number of classes K of the training samples.
(2.4.2) Connection weights between hidden layer and output layer
The hidden layer C and the output layer D are fully connected: each node of hidden layer C is connected to every node of output layer D, and each node of output layer D is likewise connected to every node of hidden layer C. The connection weight matrix between hidden layer C and output layer D is

$$w^4 = \left[ w^4_{ul} \right]_{S^3 \times S^4}, \qquad u = 1, \ldots, S^3, \; l = 1, \ldots, S^4.$$

It is initialized by a random method, with weight values in the range $[0, 1]$.
(2.4.3) input and output of output layer nodes
The input and output of output-layer node $D_l$ ($l = 1, \ldots, S^4$) are equal. The output value $n_l^4(q)$ of $D_l$ is the probability that the q-th sample input to the neural network belongs to class $\omega_l$:

$$n_l^4(q) = a_l^4(q) = \sum_{u=1}^{S^3} a_u^3(q) \times w_{ul}^4.$$

where $w_{ul}^4$ is the connection weight between hidden-layer node $C_u$ and output-layer node $D_l$.
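The output-layer computation can be sketched in the same style (toy shapes and an identity weight matrix, chosen only so the result is easy to read):

```python
import numpy as np

def output_forward(a3, w4):
    """Output-layer step: n_l^4(q) = a_l^4(q) = sum_u a_u^3(q) * w_ul^4."""
    return w4.T @ a3      # S4 x Q matrix; row l scores class omega_l

a3 = np.array([[0.5, 1.0],
               [0.5, 0.0]])    # S3 = 2 hidden outputs for Q = 2 samples
w4 = np.eye(2)                 # S3 x S4 identity, purely for illustration
print(output_forward(a3, w4))  # identical to a3 with identity weights
```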
(3) Train the initialized artificial neural network with the training sample set given by the user.
Using the training sample set given by the user, the artificial neural network is trained with the back-propagation algorithm in batch-learning mode. In each training iteration, the connection weights between the layers and the parameters of the fuzzy membership functions are updated, until the artificial neural network satisfies the convergence condition set by the user.
The specific training method is as follows.
(3.1) selection of Convergence Condition
First, an estimator e of the mean square error is selected as the performance index for the learning process:

$$e = \frac{1}{Q} \sum_{q=1}^{Q} \sum_{i=1}^{G} \left( t_i^m(q) - a_i^m(q) \right)^2.$$

where $t_i^m(q)$ is the target value of the output of node i of the m-th layer when the q-th sample is input, $a_i^m(q)$ is the actual output of node i of the m-th layer when the q-th sample is input, and G is the number of nodes in that layer.
The user can set e smaller than a small positive number as the convergence condition, according to the required calculation precision. For example, with e < 0.001 as the convergence condition, the value of e is calculated after the artificial neural network completes steps (3.2) and (3.3) in a given training iteration; if e is less than 0.001, training stops, otherwise the next training iteration is carried out.
(3.2) updating of connection weights between layers
The connection weight between input layer A and fuzzy mapping layer B is fixed at 1 and does not participate in training. The connection weights $w^3$ between fuzzy mapping layer B and hidden layer C and the connection weights $w^4$ between hidden layer C and output layer D both participate in training, and $w^3$ and $w^4$ are updated in the same way during training.
In the back-propagation algorithm, the sensitivity of the mean-square-error estimator e to the input of the m-th layer is defined as

$$g^m = \frac{\partial e}{\partial n^m} = \left[ \frac{\partial e}{\partial n_i^m(q)} \right]_{S^m \times Q}$$

where $S^m$ is the number of nodes in the m-th layer of the artificial neural network, and $n^m$ is an $S^m \times Q$ matrix representing the input of the m-th layer; $n_i^m(q)$ is the input of node i of the m-th layer when the q-th sample is input to the neural network. Furthermore,

$$\frac{\partial e}{\partial n_i^m(q)} = \frac{\partial e}{\partial a_i^m(q)} \times \frac{\partial a_i^m(q)}{\partial n_i^m(q)}.$$
the connection weight is updated according to the steepest descent method, and a minimum modulus estimation algorithm such as a conjugate gradient method can also be adopted here. Connection weight matrix w between mth layer and m-1 layer (m is 3, 4) of artificial neural networkm(dimension is S)m-1×Sm) Updated to (r +1) th training start
wm(r+1)=wm(r)-αgm(am-1)T.
Wherein, alpha is weight learning rate, the value range is more than 0 and less than or equal to 1, and is generally selected to be 0.05. r is the number of training sessions. a ismIs one size of SmA matrix of x Q, representing the actual output of the mth layer of the artificial neural network:
$$a^m = \left[ a_i^m(q) \right]_{S^m \times Q}$$
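A single steepest-descent update of one weight matrix might look as follows. This sketch stores w as an $S^m \times S^{m-1}$ array so that the product $g^m (a^{m-1})^T$ in the update formula matches the weight shape; that storage convention, and all the toy values, are assumptions for illustration:

```python
import numpy as np

def update_weights(w, g, a_prev, alpha=0.05):
    """Steepest-descent step w(r+1) = w(r) - alpha * g @ a_prev.T,
    with g the S_m x Q sensitivity matrix and a_prev the
    S_{m-1} x Q output of the previous layer."""
    return w - alpha * g @ a_prev.T

w = np.zeros((2, 3))        # weights stored as S_m x S_{m-1}
g = np.ones((2, 4))         # S_m = 2 nodes, Q = 4 samples
a_prev = np.ones((3, 4))    # S_{m-1} = 3 previous-layer outputs
print(update_weights(w, g, a_prev))   # every entry: 0 - 0.05*4 = -0.2
```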
(3.3) Updating of the parameters $\xi$, $\sigma$, $\tau$ of the action functions of the fuzzy-mapping-layer nodes
The three parameters $\xi_p$, $\sigma_p$, $\tau_p$ of the action function of node $B_p$ ($p = 1, \ldots, S^2$) of fuzzy mapping layer B are updated as follows, where $\theta$ is the learning rate of $\xi_p$, $\eta$ is the learning rate of $\sigma_p$, and $\rho$ is the learning rate of $\tau_p$; the learning rates are chosen by parameter-selection methods such as trial and error.
$$\xi_p(r+1) = \xi_p(r) - \theta \, \frac{\partial e}{\partial\, {}_p a^2} \left( \frac{\partial\, {}_p a^2}{\partial \xi_p(r)} \right)^T,$$

$$\sigma_p(r+1) = \sigma_p(r) - \eta \, \frac{\partial e}{\partial\, {}_p a^2} \left( \frac{\partial\, {}_p a^2}{\partial \sigma_p(r)} \right)^T,$$

$$\tau_p(r+1) = \tau_p(r) - \rho \, \frac{\partial e}{\partial\, {}_p a^2} \left( \frac{\partial\, {}_p a^2}{\partial \tau_p(r)} \right)^T.$$
where ${}_p a^2$ denotes row p of the output matrix $a^2$ of fuzzy mapping layer B when the Q samples are input to the artificial neural network. Furthermore,
$$\frac{\partial\, {}_p a^2}{\partial \xi_p(r)} = \left[ \cdots, \; \frac{2\tau_p(r) \left( a_i^1(q) - \xi_p(r) \right)}{\left( \sigma_p(r) \right)^2} \left( a_p^2(q) \right)^2 \left( \frac{1}{a_p^2(q)} \right)^{1 - \frac{1}{\tau_p(r)}}, \; \cdots \right]_{1 \times Q},$$
$$\frac{\partial\, {}_p a^2}{\partial \sigma_p(r)} = \left[ \cdots, \; \frac{2\tau_p(r)}{\sigma_p(r)} \, a_p^2(q) \left( 1 - a_p^2(q) \right), \; \cdots \right]_{1 \times Q},$$
$$\frac{\partial e}{\partial\, {}_p a^2} = \mathbf{1}_{1 \times S^3} \times \left[ \frac{\partial e}{\partial n_u^3(q)} \, \frac{\partial n_u^3(q)}{\partial a_p^2(q)} \right]_{S^3 \times Q}$$

where the $(u, q)$ entry of the $S^3 \times Q$ matrix is $\dfrac{\partial e}{\partial n_u^3(q)} \dfrac{\partial n_u^3(q)}{\partial a_p^2(q)}$, for $u = 1, \ldots, S^3$ and $q = 1, \ldots, Q$.
Wherein a_i^1(q) is the output of the input-layer node A_i connected to node B_p when the q-th sample is fed to the neural network, and equals x_qi.
(3.4) termination of training
The operations of steps (3.2) and (3.3) are carried out in each training epoch of the artificial neural network. After each epoch is completed, the value of e is calculated; if the convergence condition set in step (3.1) is met, training stops; otherwise, the next epoch is carried out.
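The epoch loop described in steps (3.2)-(3.4) can be sketched as follows. `StubNetwork`, `update_weights`, `update_fuzzy_params`, and `compute_error` are hypothetical placeholders standing in for the update rules and error criterion of steps (3.1)-(3.3); the stub's error simply halves each epoch so the loop demonstrably terminates:

```python
class StubNetwork:
    """Hypothetical stand-in for the fuzzy-mapping network; its error halves
    each epoch, mimicking a converging training run."""
    def __init__(self):
        self.e = 1.0

    def update_weights(self, X, y):
        # step (3.2): w^m(r+1) = w^m(r) - alpha * g^m (a^{m-1})^T
        pass

    def update_fuzzy_params(self, X, y):
        # step (3.3): update xi, sigma, tau of the fuzzy mapping layer
        pass

    def compute_error(self, X, y):
        # step (3.4): evaluate the error criterion e after this epoch
        self.e *= 0.5
        return self.e


def train(network, X, y, tol=0.001, max_epochs=10000):
    """Run training epochs until e < tol, as in step (3.4)."""
    for epoch in range(1, max_epochs + 1):
        network.update_weights(X, y)
        network.update_fuzzy_params(X, y)
        e = network.compute_error(X, y)
        if e < tol:          # convergence condition of step (3.1)
            return epoch, e
    return max_epochs, e
```

With the halving stub, training stops at the first epoch where 0.5^epoch drops below the 0.001 tolerance.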
(4) Fuzzy pruning is performed on the features using the trained artificial neural network, the importance measure of each feature is calculated, and the features are sorted.
(4.1) Performing fuzzy pruning on feature f_i
So-called fuzzy pruning of feature f_i means setting the output values of all fuzzy membership functions corresponding to feature f_i to 0.5, i.e., the corresponding outputs of the fuzzy mapping layer are set to a_ij^2(q) = 0.5 (j = 1, …, m_i).
Then, the output vector a^4(x_q, i) given by the output layer of the artificial neural network under this condition for input sample x_q is obtained.
(4.2) Calculating the importance measure FQJ(i) of each feature
The feature metric function FQJ(i) provided by the invention expresses the importance of the i-th feature f_i for classification; a larger value of FQJ(i) indicates that feature f_i is more important for classification. FQJ(i) is defined as follows:
<math> <mrow> <mi>FQJ</mi> <mrow> <mo>(</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mi>Q</mi> </mfrac> <munderover> <mi>Σ</mi> <mrow> <mi>q</mi> <mo>=</mo> <mn>1</mn> </mrow> <mi>Q</mi> </munderover> <msup> <mrow> <mo>|</mo> <mo>|</mo> <msup> <mi>a</mi> <mn>4</mn> </msup> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>q</mi> </msub> <mo>,</mo> <mi>i</mi> <mo>)</mo> </mrow> <mo>-</mo> <msup> <mi>a</mi> <mn>4</mn> </msup> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>q</mi> </msub> <mo>)</mo> </mrow> <mo>|</mo> <mo>|</mo> </mrow> <mn>2</mn> </msup> </mrow> </math>
wherein a^4(x_q) represents the output vector given by the output layer of the artificial neural network for input sample x_q, and a^4(x_q, i) represents the output vector given for input sample x_q after fuzzy pruning of feature f_i. Using the artificial neural network trained in step (3), the corresponding FQJ(i) is calculated according to the above formula for every feature f_i given by the user in step (1.1); the FQJ(i) value of feature f_i is the measure of its importance.
(4.3) Sorting all features f_i by their importance measures FQJ(i)
All features f_i are sorted in descending order of their corresponding FQJ(i) values, giving the ranking of the importance of all features for classification. The user can select the one or more top-ranked features for recognition according to actual needs or the constraints of objective conditions, thereby achieving the purpose of feature selection.
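The definition of FQJ(i) above amounts to a mean squared distance between pruned and unpruned output vectors. A minimal sketch, with `outputs` and `pruned_outputs` as assumed names for the Q output-layer vectors before and after pruning feature f_i:

```python
import numpy as np

def fqj(outputs, pruned_outputs):
    """FQJ(i) = (1/Q) * sum_q || a4(x_q, i) - a4(x_q) ||^2.

    outputs        : (Q, K) array of output-layer vectors a4(x_q)
    pruned_outputs : (Q, K) array of outputs a4(x_q, i) after fuzzy-pruning f_i
    """
    diff = pruned_outputs - outputs
    # squared Euclidean norm per sample, then average over the Q samples
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

A feature whose pruning barely moves the output vectors yields a small FQJ(i) and is therefore unimportant for classification.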
Example:
The user wishes to investigate the importance of the following four features for classification: Sepal length, Sepal width, Petal length, and Petal width, and provides training samples: the IRIS data set. The IRIS data set has been used by many researchers in pattern recognition and has become a benchmark. The data set contains 3 classes, each with 50 samples; each sample has 4 features, in order: Sepal length, Sepal width, Petal length, and Petal width.
The specific steps for feature selection are as follows:
(1) The user specifies the features f_i (i = 1, …, N) to be selected and gives training samples for training the artificial neural network.
(1.1) specification of features
The user specifies 4 features: Sepal length, Sepal width, Petal length, and Petal width, all of which are data-type features. Thus N = 4.
(1.2) giving a training sample
The training samples given by the user are divided into 3 classes: Iris Setosa, Iris Versicolor, and Iris Virginica, i.e., K = 3. Each class has 50 samples, for a total of 150 samples, i.e., Q = 150. Each sample has 4-dimensional features: Sepal length, Sepal width, Petal length, and Petal width. The dimension of the samples is R = 4.
(2) And constructing an artificial neural network consisting of a characteristic layer A, a fuzzy mapping layer B, a hidden layer C and an output layer D according to the training sample, and initializing.
(2.1) construction of the input layer A
(2.1.1) selection of number of input layer nodes
The number of nodes S^1 of input layer A equals the dimension R of the training samples, i.e., S^1 = R = 4.
(2.2) constructing a fuzzy mapping layer B
(2.2.1) selection of the number of fuzzy membership functions corresponding to each feature
Define 3 fuzzy membership functions for each feature, i.e., m_1 = m_2 = m_3 = m_4 = 3; the number of nodes in the fuzzy mapping layer is then <math> <mrow> <msup> <mi>S</mi> <mn>2</mn> </msup> <mo>=</mo> <munderover> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <msup> <mi>S</mi> <mn>1</mn> </msup> </munderover> <msub> <mi>m</mi> <mi>i</mi> </msub> <mo>=</mo> <mn>3</mn> <mo>+</mo> <mn>3</mn> <mo>+</mo> <mn>3</mn> <mo>+</mo> <mn>3</mn> <mo>=</mo> <mn>12</mn> <mo>,</mo> </mrow> </math> Since <math> <mrow> <mfrac> <msub> <mi>Q</mi> <mi>min</mi> </msub> <mrow> <munderover> <mi>Σ</mi> <mrow> <mi>i</mi> <mo>=</mo> <mn>1</mn> </mrow> <msup> <mi>s</mi> <mn>1</mn> </msup> </munderover> <msub> <mi>m</mi> <mi>i</mi> </msub> </mrow> </mfrac> <mo>=</mo> <mfrac> <mn>50</mn> <mn>12</mn> </mfrac> <mo>></mo> <mn>3</mn> <mo>,</mo> </mrow> </math> the constraint is satisfied.
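The node count and the sample-size constraint above reduce to simple arithmetic, checked here for the IRIS figures:

```python
# Arithmetic check of step (2.2.1): fuzzy-mapping-layer size and the
# per-class sample-count constraint Q_min / sum(m_i) > 3 for the IRIS example.
m = [3, 3, 3, 3]        # m_i: membership functions per feature (S^1 = 4 features)
S2 = sum(m)             # S^2: number of fuzzy-mapping-layer nodes
Q_min = 50              # smallest per-class sample count in IRIS
constraint_ok = Q_min / S2 > 3   # 50 / 12 ≈ 4.17 > 3
```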
(2.2.2) connection weights between input layer and fuzzy mapping layer
Input-layer node A_1 is connected by connection weights only to fuzzy-mapping-layer nodes B_11, B_12, B_13; input-layer node A_2 only to B_21, B_22, B_23; input-layer node A_3 only to B_31, B_32, B_33; and input-layer node A_4 only to B_41, B_42, B_43.
(2.2.3) selecting Functions of fuzzy mapping layer nodes
Select the fuzzy membership function of node B_ij:
<math> <mrow> <msubsup> <mi>a</mi> <mi>ij</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <mn>1</mn> <mo>+</mo> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msubsup> <mi>n</mi> <mi>ij</mi> <mn>2</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>-</mo> <msub> <mi>ξ</mi> <mi>ij</mi> </msub> </mrow> <msub> <mi>σ</mi> <mi>ij</mi> </msub> </mfrac> <mo>)</mo> </mrow> <mrow> <mn>2</mn> <mi>τ</mi> </mrow> </msup> </mrow> </mfrac> <mo>,</mo> </mrow> </math> where σ_ij ≠ 0 and τ_ij ≥ 0.
The parameter ξ_ij of the membership function is generally selected at random over the value range of feature f_i. Taking the feature Sepal length as an example, its value range is [4.3, 7.9]; for the 3 fuzzy membership functions corresponding to f_1, the selected initial values of ξ may be ξ_11 = 5.2, ξ_12 = 6.1, ξ_13 = 7.0. σ may be set to σ_11 = σ_12 = σ_13 = 0.45, and τ may be set to τ_11 = τ_12 = τ_13 = 2. The resulting membership functions are shown in fig. 4.
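The membership function of step (2.2.3) with these initial parameters can be evaluated directly. Note these are the initial (untrained) parameters, so the values differ from the trained-network outputs quoted later in step (4.1):

```python
def membership(n, xi, sigma, tau):
    """Node B_ij action function: 1 / (1 + ((n - xi)/sigma)**(2*tau)),
    with sigma != 0 and tau >= 0; it peaks at 1 when n == xi."""
    return 1.0 / (1.0 + ((n - xi) / sigma) ** (2 * tau))

# Initial parameters chosen above for the three membership functions of f_1
xi_vals, sigma, tau = [5.2, 6.1, 7.0], 0.45, 2
# Membership grades of the Sepal-length observation 5.1 under each function
grades = [membership(5.1, xi, sigma, tau) for xi in xi_vals]
```

The observation 5.1 lies closest to the center ξ_11 = 5.2, so the first grade is largest and the grades fall off as the centers move away.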
(2.3) hidden layer C
(2.3.1) selection of the number of hidden layer nodes
Empirically, S^3 = 6 is selected.
(2.3.2) obfuscating the connection weights between the mapping layer and the hidden layer
The connection weight matrix between fuzzy mapping layer B and hidden layer C, <math> <mrow> <msup> <mi>w</mi> <mn>3</mn> </msup> <mo>=</mo> <msub> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <msubsup> <mi>w</mi> <mn>11</mn> <mn>3</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mrow> <mn>1</mn> <mi>u</mi> </mrow> <mn>3</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mn>1,6</mn> <mn>3</mn> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>w</mi> <mrow> <mi>p</mi> <mn>1</mn> </mrow> <mn>3</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mi>pu</mi> <mn>3</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mrow> <mi>p</mi> <mo>,</mo> <mn>6</mn> </mrow> <mn>3</mn> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>w</mi> <mn>12,1</mn> <mn>3</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mrow> <mn>12</mn> <mo>,</mo> <mi>u</mi> </mrow> <mn>3</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mn>12,6</mn> <mn>3</mn> </msubsup> </mtd> </mtr> </mtable> </mfenced> <mrow> <mn>12</mn> <mo>×</mo> <mn>6</mn> </mrow> </msub> </mrow> </math> (p = 1, …, 12; u = 1, …, 6), is initialized by a random method; the value range of the connection weights is [0, 1]. One may set w_pu = 0.5.
(2.3.3) selecting the Functions of the hidden layer nodes
The action function of the hidden layer nodes is selected as a Sigmoid function:
<math> <mrow> <msubsup> <mi>a</mi> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mrow> <mn>1</mn> <mo>+</mo> <msup> <mi>e</mi> <mrow> <mo>-</mo> <msubsup> <mi>n</mi> <mi>u</mi> <mn>3</mn> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </msup> </mrow> </mfrac> <mo>,</mo> </mrow> </math> (u=1,…,6).
wherein n_u^3(q) is the input of hidden-layer node C_u when the q-th sample is fed to the neural network, and a_u^3(q) is the corresponding output.
(2.4) output layer D
(2.4.1) selection of number of output layer nodes
The number of nodes S^4 of output layer D equals the number of classes K of the training samples, i.e., S^4 = K = 3.
(2.4.2) connection rights between hidden layer and output layer
The connection weight matrix between hidden layer C and output layer D, <math> <mrow> <msup> <mi>w</mi> <mn>4</mn> </msup> <mo>=</mo> <msub> <mfenced open='[' close=']'> <mtable> <mtr> <mtd> <msubsup> <mi>w</mi> <mn>11</mn> <mn>4</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mn>12</mn> <mn>4</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mn>13</mn> <mn>4</mn> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>w</mi> <mrow> <mi>u</mi> <mn>1</mn> </mrow> <mn>4</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mrow> <mi>u</mi> <mn>2</mn> </mrow> <mn>4</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mrow> <mi>u</mi> <mn>3</mn> </mrow> <mn>4</mn> </msubsup> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> <mtd> </mtd> <mtd> <mo>·</mo> </mtd> </mtr> <mtr> <mtd> <msubsup> <mi>w</mi> <mn>61</mn> <mn>4</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mn>62</mn> <mn>4</mn> </msubsup> </mtd> <mtd> <mo>·</mo> <mo>·</mo> <mo>·</mo> </mtd> <mtd> <msubsup> <mi>w</mi> <mn>63</mn> <mn>4</mn> </msubsup> </mtd> </mtr> </mtable> </mfenced> <mrow> <mn>6</mn> <mo>×</mo> <mn>3</mn> </mrow> </msub> </mrow> </math> (u = 1, …, 6), is initialized by a random method; the value range of the weights is [0, 1]. One may set w_ul = 0.5 (l = 1, 2, 3).
So far, the artificial neural network with the fuzzy mapping layer is constructed, and the structure diagram is shown in fig. 3.
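A forward pass through the constructed 4-12-6-3 network can be sketched as follows. This is a minimal illustration; the output-layer activation is not specified in this excerpt and is assumed here to be a Sigmoid as well:

```python
import numpy as np

def sigmoid(n):
    return 1.0 / (1.0 + np.exp(-n))

def forward(x, xi, sigma, tau, w3, w4):
    """One forward pass through the 4-12-6-3 network sketched above.

    x : (4,) sample; xi, sigma : (12,) membership parameters (3 per feature);
    tau : scalar exponent; w3 : (12, 6) weights B->C; w4 : (6, 3) weights C->D.
    """
    # Layer B: each input feature feeds only its own 3 membership nodes,
    # so n2_ij(q) = x_qi for all three nodes of feature i.
    n2 = np.repeat(x, 3)
    a2 = 1.0 / (1.0 + ((n2 - xi) / sigma) ** (2 * tau))  # fuzzy mapping layer B
    a3 = sigmoid(a2 @ w3)                                # hidden layer C
    a4 = sigmoid(a3 @ w4)                                # output layer D (assumed Sigmoid)
    return a4
```

The centers used below for all four features reuse f_1's initial values purely for illustration; only f_1's centers are given in the example above.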
(3) And training the artificial neural network after initialization by using a training sample set given by a user.
(3.1) selection of Convergence Condition
The convergence condition is set to e < 0.001.
(3.2) updating of connection weights between layers
The weight learning rate α = 0.05 is selected empirically.
According to the steepest descent method, the connection weight matrix w^m (of dimension S^{m-1} × S^m) between the (m-1)-th layer and the m-th layer (m = 3, 4) of the artificial neural network is updated at the start of the (r+1)-th training epoch as
w^m(r+1) = w^m(r) - 0.05 g^m (a^{m-1})^T.
Wherein
<math> <mrow> <msup> <mi>g</mi> <mi>m</mi> </msup> <mo>=</mo> <mfrac> <mrow> <mo>∂</mo> <mi>e</mi> </mrow> <mrow> <mo>∂</mo> <msup> <mi>n</mi> <mi>m</mi> </msup> </mrow> </mfrac> <mo>=</mo> <msub> <mfenced open='[' close=']'> <mfrac> <mrow> <mo>∂</mo> <mi>e</mi> </mrow> <mrow> <mo>∂</mo> <msubsup> <mi>n</mi> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfrac> </mfenced> <mrow> <msup> <mi>S</mi> <mi>m</mi> </msup> <mo>×</mo> <mi>Q</mi> </mrow> </msub> </mrow> </math>
<math> <mrow> <msup> <mi>a</mi> <mi>m</mi> </msup> <mo>=</mo> <msub> <mfenced open='[' close=']'> <mrow> <msubsup> <mi>a</mi> <mi>i</mi> <mi>m</mi> </msubsup> <mrow> <mo>(</mo> <mi>q</mi> <mo>)</mo> </mrow> </mrow> </mfenced> <mrow> <msup> <mi>S</mi> <mi>m</mi> </msup> <mo>×</mo> <mi>Q</mi> </mrow> </msub> </mrow> </math>
(the bracketed S^m × Q matrices collect the entries shown for rows i = 1, …, S^m and columns q = 1, …, Q).
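The weight update of step (3.2) is a single matrix operation. A sketch under the dimension convention stated above, where w^m is S^{m-1} × S^m, so the gradient product g^m (a^{m-1})^T, which is S^m × S^{m-1}, needs a transpose to align with w^m:

```python
import numpy as np

def update_weight(w, g, a_prev, alpha=0.05):
    """Steepest-descent step w^m(r+1) = w^m(r) - alpha * g^m (a^{m-1})^T.

    w      : (S^{m-1}, S^m) connection-weight matrix
    g      : (S^m, Q) matrix of partial derivatives de/dn_i^m(q)
    a_prev : (S^{m-1}, Q) outputs of layer m-1 for all Q samples
    """
    return w - alpha * (g @ a_prev.T).T
```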
(3.3) Updating the parameters ξ, σ, and τ of the action functions of the fuzzy-mapping-layer nodes. The learning rate of each parameter is selected as θ = 0.1, ρ = 0.1.
The following formulas update the three parameters ξ_p, σ_p, τ_p of the action function of node B_p (p = 1, …, S^2) of fuzzy mapping layer B.
<math> <mrow> <msub> <mi>ξ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>ξ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>-</mo> <mi>θ</mi> <mfrac> <mrow> <mo>∂</mo> <mi>e</mi> </mrow> <mrow> <msub> <mo>∂</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> </mfrac> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msub> <mo>∂</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>∂</mo> <mi>ξ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>,</mo> </mrow> </math>
<math> <mrow> <msub> <mi>τ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>+</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <msub> <mi>τ</mi> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> <mo>-</mo> <mi>ρ</mi> <mfrac> <mrow> <mo>∂</mo> <mi>e</mi> </mrow> <mrow> <msub> <mo>∂</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> </mfrac> <msup> <mrow> <mo>(</mo> <mfrac> <mrow> <msub> <mo>∂</mo> <mi>p</mi> </msub> <msup> <mi>a</mi> <mn>2</mn> </msup> </mrow> <mrow> <msub> <mrow> <mo>∂</mo> <mi>τ</mi> </mrow> <mi>p</mi> </msub> <mrow> <mo>(</mo> <mi>r</mi> <mo>)</mo> </mrow> </mrow> </mfrac> <mo>)</mo> </mrow> <mi>T</mi> </msup> <mo>.</mo> </mrow> </math>
Wherein ∂_p a^2 denotes the p-th row of the output matrix a^2 of fuzzy mapping layer B when the Q samples are input to the artificial neural network.
(3.4) termination of training
After the 1037th training epoch, e = 0.000999 is calculated; the convergence condition is met, and training is terminated.
(4) Fuzzy pruning is performed on the features using the trained artificial neural network, the importance measure of each feature is calculated, and the features are sorted.
(4.1) Performing fuzzy pruning on feature f_i
Take the feature Sepal length as an example. Fuzzy pruning of f_1 means setting the output values of fuzzy-mapping-layer nodes B_11, B_12, B_13 to 0.5. For example, for the observed Sepal length value 5.1, the outputs of nodes B_11, B_12, B_13 before pruning are [0.117, 0.005, 0.009]; for the observed Sepal width value 3.5, the outputs of B_21, B_22, B_23 before pruning are [0.100, 0.500, 0.500]; for the observed Petal length value 1.4, the outputs of B_31, B_32, B_33 before pruning are [0.141, 0.974, 0.028]; and for the observed Petal width value 0.2, the outputs of B_41, B_42, B_43 before pruning are [0.265, 0.069, 0.030]. Thus, for the sample [5.1, 3.5, 1.4, 0.2], the output of the fuzzy mapping layer before pruning is
[0.117,0.005,0.009,0.100,0.500,0.500,0.141,0.974,0.028,0.265,0.069,0.030]。
When f_1 is pruned, this output is modified to
[0.500,0.500,0.500,0.100,0.500,0.500,0.141, 0.974,0.028,0.265,0.069,0.030]。
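The modification above is a simple slice replacement. A minimal sketch using the fuzzy-mapping-layer output quoted above:

```python
def fuzzy_prune(a2, feature_index, m=3):
    """Return a copy of the fuzzy-mapping-layer output a2 with the m
    membership outputs of the given feature replaced by 0.5."""
    pruned = list(a2)
    start = feature_index * m
    pruned[start:start + m] = [0.5] * m
    return pruned

# Fuzzy-mapping-layer output for the sample [5.1, 3.5, 1.4, 0.2] before pruning
a2 = [0.117, 0.005, 0.009, 0.100, 0.500, 0.500,
      0.141, 0.974, 0.028, 0.265, 0.069, 0.030]
pruned = fuzzy_prune(a2, 0)   # prune f_1 (Sepal length)
```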
Then, the output vector a^4(x_q, 1) given by the output layer of the artificial neural network after this modification for input sample x_q is calculated. Pruning of the other features proceeds analogously.
(4.2) Calculating the importance measure FQJ(i) of each feature
Still taking the feature Sepal length as an example, FQJ(1) is calculated for f_1:
<math> <mrow> <mi>FQJ</mi> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>=</mo> <mfrac> <mn>1</mn> <mn>150</mn> </mfrac> <munderover> <mi>Σ</mi> <mrow> <mi>q</mi> <mo>=</mo> <mn>1</mn> </mrow> <mn>150</mn> </munderover> <msup> <mrow> <mo>|</mo> <mo>|</mo> <msup> <mi>a</mi> <mn>4</mn> </msup> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>q</mi> </msub> <mo>,</mo> <mn>1</mn> <mo>)</mo> </mrow> <mo>-</mo> <msup> <mi>a</mi> <mn>4</mn> </msup> <mrow> <mo>(</mo> <msub> <mi>x</mi> <mi>q</mi> </msub> <mo>)</mo> </mrow> <mo>|</mo> <mo>|</mo> </mrow> <mn>2</mn> </msup> <mo>=</mo> <mn>0.08171</mn> <mo>.</mo> </mrow> </math>
similarly, FQJ (2) ═ 0.095858, FQJ (3) ═ 0.491984, and FQJ (4) ═ 0.511002 were calculated.
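Sorting the four computed FQJ values in descending order reproduces the ranking; a minimal sketch:

```python
# Descending sort of the computed FQJ values gives the importance ranking.
fqj_values = {
    "Sepal length": 0.08171,
    "Sepal width":  0.095858,
    "Petal length": 0.491984,
    "Petal width":  0.511002,
}
ranking = sorted(fqj_values, key=fqj_values.get, reverse=True)
```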
All features f_i are sorted in descending order of their corresponding FQJ(i) values, giving the following ranking of feature importance for the classification task: Petal width, Petal length, Sepal width, Sepal length.