CN111353525A - Modeling and missing value filling method for unbalanced incomplete data set - Google Patents


Info

Publication number
CN111353525A
Authority
CN
China
Prior art keywords
data set
sample
formula
filling
model
Prior art date
Legal status
Withdrawn
Application number
CN202010085969.9A
Other languages
Chinese (zh)
Inventor
刘辉
张立勇
陆艺丹
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Priority date
Filing date
Publication date
Application filed by Dalian University of Technology
Priority: CN202010085969.9A
Publication: CN111353525A
Legal status: Withdrawn


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/21 — Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/22 — Matching criteria, e.g. proximity measures
    • G06F 18/23 — Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a modeling and missing value filling method for an unbalanced incomplete data set, belonging to the technical field of data mining. The invention comprises a model-construction part and a filling-scheme part. In the model-construction part, to address the imbalance of the data, a distance density algorithm is designed and applied to the antecedent identification of TS modeling; in the filling-scheme part, to address the incompleteness of the data, the missing values are treated as variables that participate in an iterative-learning filling scheme for conclusion-parameter identification: during filling, the conclusion parameters are computed from the filled data set, the padding values are then updated from the adjusted conclusion parameters, and filling is complete when the iteration converges. The invention reduces the influence of data-set imbalance on TS modeling, makes full use of the data information in the incomplete data set, and achieves ideal filling precision on unbalanced incomplete data sets.

Description

Modeling and missing value filling method for unbalanced incomplete data set
Technical Field
The invention belongs to the technical field of data mining, and relates to a modeling and missing value filling method for an unbalanced incomplete data set.
Background
Missing data and data-set imbalance are two unavoidable problems in the field of data mining. Missing data refers to data values or attributes lost to environmental and other factors when a data set is collected or stored; data-set imbalance means that the class distribution of the data set is uneven, with the numbers of samples in different classes differing greatly. Both problems are widespread in data analysis and mining, so research on such data sets has received increasing attention.
The imbalance of data sets creates difficulties for data mining. When fuzzy partitioning is applied to unbalanced data sets, the "uniform effect" (Zhou K., Yang S. Exploring the uniform effect of FCM clustering: A data distribution perspective [J]. Knowledge-Based Systems, 2016, 96:76-83) easily arises: samples of the majority classes are assigned to the minority classes, so that the clusters in the result contain approximately the same number of samples. To counter this phenomenon, researchers have proposed fuzzy partitioning methods based on undersampling data-preprocessing models, kernel-function-based clustering algorithms, multi-prototype representations, and the like.
Missing data is likewise an inevitable problem in the field of data mining. Directly discarding the incomplete samples and using only the remaining complete samples for analysis biases the results through insufficient data. In contrast, studying the existing data to obtain reasonable padding values for the missing values gives better results in most cases. Researchers have proposed a variety of padding methods. The principle of regression filling is to establish regression equations, from the regression relationship between the existing values and the pre-filled missing values in the data set, to estimate the missing values; regression filling is widely applied in work on incomplete data.
However, the traditional regression filling method cannot capture the varying correlations among sample attributes. To identify the relationships between attributes, one approach is to divide the data into subsets with similar regression relationships using fuzzy clustering and to approximate each subset with a linear model. This yields a rule-based fuzzy model from the existing fuzzy partition matrix and addresses the problem that the correlations among attributes in a real data set are unknown.
The Takagi-Sugeno model (TS model for short) is a typical fuzzy model. It consists of several if-then rules, and its modeling process divides into antecedent identification and consequent identification (T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst. Man Cybern. SMC-15 (1985) 116-132). It is a nonlinear model expressed by "IF-THEN" fuzzy rules. When modeling data, the input space is first divided into several fuzzy subspaces, a local linear model is then built in each fuzzy subspace, and all local models are connected through membership functions. The ith rule of the TS model is given by formula (1):

R^(i): IF x_j1 is A_1^(i) and x_j2 is A_2^(i) and ... and x_js is A_s^(i),
THEN y_j^(i) = p_0^(i) + p_1^(i) x_j1 + ... + p_s^(i) x_js    (1)

where R^(i) denotes the ith fuzzy rule, i = 1, 2, ..., k, with k the number of rules of the TS model; x_j = {x_j1, x_j2, ..., x_js} is the jth input variable of the system, also called the antecedent variable, j = 1, 2, ..., n, and x_js denotes the sth attribute of the jth sample; A_m^(i) is the fuzzy set of the mth attribute in the ith rule, m = 1, 2, ..., s; p_0^(i), p_1^(i), ..., p_s^(i) are the conclusion parameters, also called consequent parameters, with p_s^(i) the conclusion parameter of the sth attribute in the ith rule; y_j^(i) denotes the output of the jth input variable under the ith rule.
The final output y_j of the jth input variable in the fuzzy system is:

y_j = ( Σ_{i=1}^{k} v_j^(i) y_j^(i) ) / ( Σ_{i=1}^{k} v_j^(i) )    (2)

where v_j^(i), the weight of the jth input variable in the ith rule, is given by formula (3):

v_j^(i) = Π_{m=1}^{s} A_m^(i)(x_jm)    (3)

where A_m^(i)(x_jm) is the degree to which the mth attribute x_jm of the jth sample belongs to the fuzzy set A_m^(i), m = 1, 2, ..., s.
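As a concrete illustration of formulas (1)-(3), the following sketch evaluates a two-rule TS model in Python; the Gaussian membership functions and all numeric values (centers, widths, conclusion parameters) are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def ts_output(x, centers, widths, P):
    """TS inference per formulas (1)-(3): each rule weight is the product of
    per-attribute memberships, and the output is the weighted average of the
    local linear models.  Gaussian memberships are an assumed example."""
    k = centers.shape[0]
    # v_j^(i) = prod_m A_m^(i)(x_jm), formula (3)
    v = np.array([np.prod(np.exp(-((x - centers[i]) ** 2) / (2 * widths[i] ** 2)))
                  for i in range(k)])
    # y_j^(i) = p_0^(i) + sum_m p_m^(i) x_jm, formula (1)
    y_local = P[:, 0] + P[:, 1:] @ x
    # y_j = sum_i v^(i) y^(i) / sum_i v^(i), formula (2)
    return float(v @ y_local / v.sum())

# two rules over two attributes (all values illustrative)
centers = np.array([[0.0, 0.0], [1.0, 1.0]])
widths = np.array([[0.5, 0.5], [0.5, 0.5]])
P = np.array([[0.0, 1.0, 1.0],    # rule 1: y = x1 + x2
              [1.0, 2.0, 0.0]])   # rule 2: y = 1 + 2*x1
print(ts_output(np.array([0.0, 0.0]), centers, widths, P))
```

Near the center of rule 1 the output is dominated by rule 1's local model, and near the center of rule 2 by rule 2's, which is exactly the smooth blending that formula (2) provides.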
The TS-model-based filling method (Missing Value Imputations by Rule-based Incomplete Data Fuzzy Modeling. Xiaochen Lai, Xin Liu, Liyong Zhang, et al. IEEE International Conference on Communications (IEEE ICC 2019)) obtains the membership degrees of each rule through the FCM-PDS clustering algorithm and uses the fuzzy sets A_m^(i) as antecedent parameters, divides the incomplete data set into several subsets, and establishes local linear regression models containing only the important input variables of each subset. A global nonlinear model is then obtained by weighted summation of the local linear models, and its output is used as the padding value. Compared with the traditional regression filling method, this method makes full use of the existing values and describes the relationships between attributes more accurately. However, data imbalance is unavoidable in real data sets, and the above fuzzy partitioning method does not consider the influence of data-set imbalance on the fuzzy partition.
Disclosure of Invention
Reasonably partitioning an unbalanced data set improves the accuracy of the regression equations; on this basis the invention provides a modeling and missing value filling method for unbalanced incomplete data sets built on the TS model. The invention comprises two parts: a model-construction part and a filling-scheme part. The former improves the antecedent-parameter identification of the TS model to reduce the influence of data imbalance on the fuzzy partition; the latter uses the incomplete samples in the training process to improve the data utilization of the incomplete data set.
In the antecedent identification of the model, the antecedent parameters of the unbalanced incomplete data set are identified with an idea combining distance density and maximum-minimum distance (the SD algorithm), and the number of rules is determined, reducing the influence of data imbalance on the fuzzy partition. For the incomplete input data in the modeling process, input variables are first selected to fix the model structure; a least-squares method and an iterative-update strategy are then applied to estimate the conclusion parameters and fill the missing values, making full use of the existing data. When the iteration converges, the parameters and the padding values become fixed, completing the missing-value filling.
The padding accuracy of a missing-value filling method can be measured by the root mean square error (RMSE):

RMSE = sqrt( (1/N) Σ_{x_i ∈ X_M} (x_i − x̂_i)² )    (4)

where N is the number of missing values, x_i ∈ X_M is the original actual data value, and x̂_i is the padding value of the missing value under the filling scheme. The smaller the RMSE value, the better the filling effect, and vice versa.
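Formula (4) can be sketched directly; the function name is illustrative.

```python
import numpy as np

def rmse(true_vals, filled_vals):
    """Root mean square error of formula (4) over the N missing positions."""
    true_vals = np.asarray(true_vals, dtype=float)
    filled_vals = np.asarray(filled_vals, dtype=float)
    return float(np.sqrt(np.mean((true_vals - filled_vals) ** 2)))

print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3) ≈ 1.1547
```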
The technical scheme of the invention is as follows:
a modeling and missing value filling method for an unbalanced incomplete data set comprises a model building part and a filling scheme, and specifically comprises the following steps:
(1) building models
Combining local density and local distance, a distance density ds_ij is defined for each pair of samples, and a distance density algorithm for antecedent identification (SD algorithm for short) is designed:
Let the incomplete data set be X = {X_M, X_C}, where X_M is the subset formed by the missing values in the data set and X_C is the subset of non-missing values. For arbitrary samples x_i, x_j ∈ X, their distance density ds_ij is:

ds_ij = exp(S(x_i)) × pd(x_i, x_j)    (5)

where S(x_i) is the local density of sample x_i defined in formula (6) and pd(x_i, x_j) is the local distance between x_i and x_j obtained from formula (7).
The local density of sample x_i in data set X is defined as:

[Formula (6) — given only as an image in the original: the local density S(x_i), computed over the set N_i of the K nearest neighbouring samples x_j of x_i, where i = 1, 2, ..., n, n is the number of samples, and j = 1, 2, ..., K with K a user-defined constant.]

pd(x_i, x_j) is the local distance, computed with the partial distance strategy:

pd(x_i, x_j) = sqrt( ( s / Σ_{m=1}^{s} I_im I_jm ) · Σ_{m=1}^{s} I_im I_jm (x_im − x_jm)² )    (7)

where s is the number of sample attributes and I_im marks whether the mth attribute value x_im of the ith sample is missing:

I_im = 1 if x_im exists, and I_im = 0 if x_im is missing    (8)
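The local distance of formulas (7)-(8), with missing entries encoded as NaN, can be sketched as follows; this follows the classical partial distance strategy and the reconstruction given above, so treat it as an assumed reading rather than the patent's exact computation.

```python
import numpy as np

def partial_distance(xi, xj):
    """Local distance pd(x_i, x_j) of formula (7): compare only attributes
    present in both samples (the indicators I of formula (8)), and rescale
    by s / (number of jointly present attributes)."""
    xi, xj = np.asarray(xi, dtype=float), np.asarray(xj, dtype=float)
    s = xi.size
    both = ~np.isnan(xi) & ~np.isnan(xj)        # I_im * I_jm
    if not both.any():
        return float("nan")                     # no comparable attribute
    sq = np.sum((xi[both] - xj[both]) ** 2)
    return float(np.sqrt(s / both.sum() * sq))

print(partial_distance([1.0, np.nan, 3.0], [2.0, 5.0, np.nan]))  # sqrt(3) ≈ 1.732
```

The rescaling keeps distances between sparsely overlapping samples comparable to fully observed ones, which is what allows the SD algorithm to work directly on the incomplete data set.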
The SD algorithm is used to compute the cluster centers of the samples and their number; the membership degrees are then computed from the obtained cluster centers, finally yielding the antecedent parameters of the model.
(2) Filling scheme
The invention updates the conclusion parameters and padding values of the TS model by iterative updating (IU). For an incomplete data set X whose samples have s attributes, each attribute is taken in turn as the output and s TS models are built. The input of the mth TS model is D^(m) = {D_1, D_2, ..., D_{m−1}, D_{m+1}, ..., D_s} and the desired output is D_m, m = 1, 2, ..., s. The incomplete data set is first randomly initialized to obtain a complete data set, and the conclusion parameters are then computed by least squares. In each TS model, the weighted input H_j^(i) of the jth sample x_j under rule R^(i) is obtained from formula (9):

H_j^(i) = v_j^(i) Γ^(i)    (9)

where v_j^(i) denotes the weight and Γ^(i) = [1, x_j1^(i), ..., x_j(q−1)^(i), x_j(q+1)^(i), ..., x_js^(i)] denotes the input of R^(i) after variable selection, in which the input variable x_jq^(i) has been rejected, i = 1, 2, ..., k, j = 1, 2, ..., n, 1 < q < s.
The actual output value ŷ_j of the model is then calculated:

ŷ_j = ( Σ_{i=1}^{k} H_j^(i) P^(i) ) / ( Σ_{i=1}^{k} v_j^(i) )    (10)

where P^(i) is the conclusion-parameter vector of rule R^(i) derived by least squares.
Formulas (9) and (10) yield the output sets Ŷ^(l) of the s TS models, where l denotes the lth iteration: the outputs at missing positions are the padding values to be updated, while the model outputs for the existing data are used, together with the corresponding true values, to compute the root mean square error f^(l). The difference |Δf| = |f^(l) − f^(l−1)| from the root mean square error of the previous round of iterative learning is then computed; if |Δf| is larger than the threshold ε, the above steps are repeated for a new round of learning, otherwise the iteration ends and the filled data set is output.
The invention has the following beneficial effects. First, the distance-density-based algorithm replaces the original FCM method for identifying the antecedent parameters of the TS model and reconstructs the membership degrees, reducing the influence of data imbalance on the fuzzy partition. Second, for the incomplete input data in the modeling process, the missing values are treated as variables and an iterative-learning filling scheme dynamically updates both the missing values and the model conclusion parameters, making full use of the existing data.
Drawings
Fig. 1 is a schematic diagram of the operation of the present invention.
In fig. 1: 1 — the unbalanced incomplete data set containing missing values is input to the model; 2 — the data set is partitioned by the distance density algorithm (SD); 3 — the distance pd(x_i, c_t) between each sample and each center is computed with the local distance strategy; 4 — input variables are selected; 5 — conclusion parameters and padding values are dynamically updated by iterative learning (IU); 6 — the complete data set containing the padding values is output.
FIG. 2 is a workflow diagram of the distance density algorithm (SD) of the present invention.
FIG. 3 is a diagram of an implementation of the iterative learning method (IU) of the present invention.
In fig. 3: step 1, random pre-filling is carried out on an incomplete data set; step 2, inputting the filled data set into an iterative learning model; step 3, when the output condition is not met, continuously updating the filling value; and 4, outputting the data set containing the final filling value when the output condition is reached.
Detailed Description
The following detailed description of the embodiments of the invention is provided in conjunction with the accompanying drawings.
FIG. 1 is a schematic diagram of the operation of the invention. In the unbalanced incomplete data set shown in the first row, D_1, D_2, ..., D_s denote the attribute names, black marks denote missing values, and gray marks denote padding values. Based on fig. 1, the invention uses the distance density algorithm to identify the antecedent parameters, and then uses an iterative learning method to dynamically realize conclusion-parameter identification and missing-value filling. First, the unbalanced incomplete data set containing missing values is input to the model. In model construction, the n samples of the data set are divided into k classes by the distance density algorithm, with class centers c_1, c_2, ..., c_k. Because data-set attributes may be missing, the invention computes the distance pd(x_i, c_t) between sample and center by formula (7), where i = 1, 2, ..., n and t = 1, 2, ..., k, completing the antecedent-parameter identification of the model. Second, input variables are selected so that the model contains only regression equations of the significant variables. In the filling scheme, the conclusion parameters and padding values are dynamically updated to complete the iterative learning of the model; when the iteration converges, the unbalanced complete data set containing the final padding values is output.
The details of the technical scheme of the invention are explained by taking three data sets of a UCI machine learning database as an example. An incomplete data set is constructed by manually deleting portions of the data in the data set.
(1) Building models
The distance density algorithm (SD algorithm) divides the input unbalanced incomplete data set into k subsets. To handle the imbalance of the data set, its principle is to ensure that each newly obtained cluster center is relatively far from the existing cluster centers, avoiding cluster centers that lie too close together, several cluster centers being selected within the same class, or small clusters receiving no cluster center at all.
Let B denote the cluster-center index set, recording the indices of the cluster centers selected from the data set samples. A sample farthest from the selected class centers is then chosen from the non-center samples; denoting its index by q, q satisfies:

q = arg max_{j ∉ B} ( min_{t ∈ B} pd(x_j, c_t) )    (11)

x_q is then taken as a new cluster center and its index is added to the set B, where c_t denotes the tth cluster center of the data set.
The algorithm does not need to give the number of clusters in advance, and can determine the number of initial cluster centers according to a certain calculation rule. The number of the clustering centers is the regular number of the TS model.
The workflow of the distance density (SD) algorithm is detailed in fig. 2; the specific steps are as follows:
Step 1: input the incomplete data set;
Step 2: initialize an empty set B, the number K of neighbour samples, and a parameter θ, θ < 1;
Step 3: compute the local distances pd(x_i, x_j) from x_i to the remaining samples, j = 1, ..., i−1, i+1, ..., n; then sort the obtained local distances and select the K nearest samples to form the set N_i;
Step 4: compute the local density of each sample by formula (6), take the sample with the largest local density as the first class center c_1, and record c_1 = x_i, B = B + {i};
Step 5: compute the distance densities from the remaining samples to c_1 by formula (5), select the sample with the largest distance density as the second class center c_2, and record c_2 = x_j, B = B + {j};
Step 6: if the maximum minimum distance max_{j∉B} min_{t∈B} pd(x_j, c_t) is still greater than θ × pd(c_1, c_2), go to Step 7, otherwise go to Step 9;
Step 7: record the newly selected center as c_q, where q satisfies formula (11);
Step 8: compute the distance densities from the remaining samples to the new center c_q by formula (5), select the sample with the largest distance density as the next class center c_next, and record c_next = x_l, B = B + {l}; return to Step 6;
Step 9: output the cluster centers {c_1, c_2, ..., c_|B|} and the number |B| of cluster centers.
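Steps 1-9 can be sketched as follows on a complete data set; since formula (6) is only reproduced as an image in the original, the local density used here (negative mean distance to the K nearest neighbours) is an explicit assumption, and Euclidean distance stands in for the local distance pd.

```python
import numpy as np

def sd_centers(X, K=3, theta=0.5):
    """Sketch of SD Steps 1-9 on a complete data set.  Formula (6) is only
    an image in the original patent, so the local density is ASSUMED here
    as S(x_i) = -(mean distance to the K nearest neighbours), so that
    exp(S(x_i)) in formula (5) favours samples in dense regions."""
    n = X.shape[0]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # stand-in for pd
    # Steps 3-4: K nearest neighbours and (assumed) local density S
    S = np.array([-np.sort(D[i])[1:K + 1].mean() for i in range(n)])
    ds = np.exp(S)[:, None] * D              # formula (5): ds_ij
    B = [int(np.argmax(S))]                  # Step 4: densest sample -> c1
    B.append(int(np.argmax(ds[:, B[0]])))    # Step 5: largest ds to c1 -> c2
    d12 = D[B[0], B[1]]
    while True:
        rest = [j for j in range(n) if j not in B]
        if not rest:
            break
        # Step 6: maximum-minimum distance test against theta * pd(c1, c2)
        mins = np.array([min(D[j, t] for t in B) for j in rest])
        q = rest[int(np.argmax(mins))]       # Step 7: candidate c_q, formula (11)
        if mins.max() <= theta * d12:
            break                            # -> Step 9
        # Step 8: next centre = largest distance density w.r.t. c_q
        B.append(max(rest, key=lambda j: ds[j, q]))
    return B                                 # Step 9: centre indices, k = |B|
```

On two well-separated dense groups the sketch picks one center per group and stops, since every remaining sample is much closer to an existing center than θ × pd(c_1, c_2).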
The number |B| of cluster centers equals the number k of fuzzy rules, i.e., |B| = k. The membership degrees are then computed from the cluster centers obtained in Steps 1-9. A^(t)(x_i) denotes the degree to which sample x_i belongs to A^(t), where A^(t) is the multi-dimensional fuzzy set centered on c_t; A^(t)(x_i) is obtained from formula (12):

[Formula (12) — given only as an image in the original: the membership of the ith sample to the tth fuzzy set, computed from the local distances pd(c_t, x_i).]

where pd(c_t, x_i) is the local distance between the tth cluster center and the ith sample, t = 1, 2, ..., k, i = 1, 2, ..., n. The fuzzy sets A^(t) thus obtained complete the identification of the antecedent parameters of the model.
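A sketch of the membership computation follows; because formula (12) is only reproduced as an image in the original, an FCM-type membership based on the local distances pd(c_t, x_i) is assumed here purely for illustration.

```python
import numpy as np

def memberships(dist_to_centers):
    """ASSUMED FCM-type stand-in for formula (12), which is only an image in
    the original: u_t(x_i) = 1 / sum_r (pd(c_t, x_i) / pd(c_r, x_i))^2.
    dist_to_centers: array of pd(c_t, x_i) for one sample, t = 1..k."""
    d = np.asarray(dist_to_centers, dtype=float)
    if np.any(d == 0):                      # sample coincides with a centre
        hits = (d == 0)
        return hits.astype(float) / hits.sum()
    return 1.0 / np.array([np.sum((d[t] / d) ** 2) for t in range(d.size)])
```

Whatever the exact form of formula (12), the memberships sum to one over the k fuzzy sets and decrease with pd(c_t, x_i), which is the property the antecedent identification relies on.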
(2) Filling scheme
After the antecedent parameters are obtained, stepwise regression is first used to select the input variables, so that only significant variables remain in the model. The iterative-learning (IU) method for filling and conclusion-parameter identification is shown in fig. 3. In the first row of fig. 3, D_1, D_2, ..., D_s denote the attribute names; black marks denote the dynamic padding values x̂^(l), where l denotes the lth iteration; gray marks denote the final padding values. v^(i) is the weight of each rule R^(i), i = 1, 2, ..., k; H denotes the weighted input of all rules; P denotes the conclusion parameters, computed as:

P = (H^T H)^{-1} H^T Y    (13)

where Y = [x_1m, x_2m, ..., x_nm]^T denotes all samples in the mth attribute, m = 1, 2, ..., s. |Δf| denotes the absolute difference between the root mean square errors obtained from the existing data and the corresponding model outputs in two adjacent rounds of iterative learning, used to judge whether the iterative learning has finished, and ε denotes the threshold for stopping the iteration. f is computed as:

f = sqrt( (1/|X_C|) Σ_{x_i ∈ X_C} (x_i − ŷ_i)² )    (14)

where |X_C| is the number of existing data values, ŷ_i is the model output for x_i, and x_i ∈ X_C. The specific steps of iterative learning (IU) are as follows:
Step 1: randomly pre-fill the incomplete data set to obtain a data set containing the dynamic padding values x̂^(l);
Step 2: compute the conclusion parameters P from the filled data set and formulas (9) and (13), and obtain the model outputs from formula (10);
Step 3: update the padding values with the outputs ŷ^(l); compute f^(l) from formula (14), compare it with the f^(l−1) obtained in the previous iteration, and take the difference |Δf|; if |Δf| > ε, return to Step 2 and enter the next round of iterative learning;
Step 4: if |Δf| ≤ ε, terminate the iteration and output the data set containing the final padding values.
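The iterative learning Steps 1-4 can be sketched as follows; for brevity a single rule (k = 1) is assumed, so each TS model degenerates to one least-squares regression per attribute via formula (13), with f computed as in formula (14). This is a simplification of the full rule-weighted method, not the patent's exact procedure.

```python
import numpy as np

def iterative_fill(X, eps=1e-6, max_iter=100, seed=0):
    """Sketch of IU Steps 1-4 with a single rule (k = 1): one linear
    regression per attribute, refit on the filled data each round."""
    rng = np.random.default_rng(seed)
    miss = np.isnan(X)
    Xf = X.copy()
    # Step 1: random pre-fill of the missing positions
    for m in range(X.shape[1]):
        col = Xf[:, m]
        obs = col[~miss[:, m]]
        col[miss[:, m]] = rng.uniform(obs.min(), obs.max(), miss[:, m].sum())
    f_prev = np.inf
    for _ in range(max_iter):
        Xnew = Xf.copy()
        sq_err, n_obs = 0.0, 0
        for m in range(X.shape[1]):          # one TS model per attribute
            others = [c for c in range(X.shape[1]) if c != m]
            H = np.hstack([np.ones((X.shape[0], 1)), Xf[:, others]])
            Y = Xf[:, m]
            # Step 2 / formula (13): least-squares conclusion parameters
            P, *_ = np.linalg.lstsq(H, Y, rcond=None)
            Yhat = H @ P                      # formula (10) with k = 1
            Xnew[miss[:, m], m] = Yhat[miss[:, m]]   # Step 3: update fills
            sq_err += np.sum((Y[~miss[:, m]] - Yhat[~miss[:, m]]) ** 2)
            n_obs += int((~miss[:, m]).sum())
        f = np.sqrt(sq_err / n_obs)           # formula (14) on existing data
        Xf = Xnew
        if abs(f - f_prev) <= eps:            # Step 4: |Δf| <= ε -> stop
            break
        f_prev = f
    return Xf
```

On data with an exact linear relation between attributes, the fill converges to the value predicted by that relation regardless of the random pre-fill.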
(3) Experiment of
3 data sets are selected from the UCI machine learning repository to verify the filling performance of the method; the data sets are described in Table 1. To allow computing the error between the missing-value estimates and the true values, the selected data sets are all complete; the experiments construct incomplete data sets by manually deleting part of the data at specified missing rates of 5%, 10%, 15%, 20%, 25%, 30%, 40%, 45%, and 50%.
Table 1 data set description
[Table 1 — given as an image in the original.]
The experiments fill the incomplete data with the proposed method and compare the padding values with the actual values. For the complete data set at each specified missing rate, 5 incomplete data sets are randomly generated and the average RMSE is taken as the final experimental result. The invention compares the following five filling schemes: the traditional regression-model-based filling method (REG); the traditional TS-modeling-based filling method (Basic-TS); the TS-modeling filling method whose model is built with the distance density algorithm (SD-TS); the TS-modeling filling method with iterative learning (TS-IU); and the TS-modeling filling method that builds the model with the distance density algorithm and adopts iterative learning (SD-TS-IU). In each set of comparison experiments, all methods use the same initialization data set. Table 2 gives the RMSE results of the five filling methods, where the best results are bold and underlined and the second-best results are bold.
TABLE 2 RMSE indices of five filling methods
[Table 2 — given as images in the original.]
As can be seen from Table 2, the padding precision of Basic-TS is generally higher than that of REG, showing that filling based on TS modeling is more effective than filling based on regression. Further examination of the data shows that the RMSEs of SD-TS are generally lower than those of Basic-TS, and SD-TS-IU likewise generally obtains better results than TS-IU; moreover, as the degree of imbalance of the data set increases, the effect of the distance density algorithm becomes more pronounced. Comparing the RMSEs of TS-IU and Basic-TS, TS-IU is superior in all but a few special cases, showing that the iterative update strategy effectively improves the filling precision.
In conclusion, the SD-TS-IU of the invention achieves the best results, showing that its filling precision is superior to that of the other compared methods.

Claims (3)

1. A modeling and missing value filling method for an unbalanced incomplete data set is characterized by comprising the following steps:
(1) building models
combining local density and local distance to define a distance density ds_ij for each pair of samples, and designing a distance density algorithm for antecedent identification, namely the SD algorithm:
let the incomplete data set be X = {X_M, X_C}, where X_M is the subset formed by the missing values in the data set and X_C is the subset of non-missing values; for arbitrary samples x_i, x_j ∈ X, their distance density ds_ij is:

ds_ij = exp(S(x_i)) × pd(x_i, x_j)    (5)

where S(x_i) is the local density of sample x_i defined in formula (6) and pd(x_i, x_j) is the local distance between x_i and x_j obtained from formula (7);
the local density of sample x_i in data set X is defined as:

[Formula (6) — given only as an image in the original: the local density S(x_i), computed over the set N_i of the K nearest neighbouring samples x_j of x_i, where i = 1, 2, ..., n, n denotes the number of samples, j = 1, 2, ..., K, and K is a user-defined constant.]

pd(x_i, x_j) is the local distance, computed with the partial distance strategy:

pd(x_i, x_j) = sqrt( ( s / Σ_{m=1}^{s} I_im I_jm ) · Σ_{m=1}^{s} I_im I_jm (x_im − x_jm)² )    (7)

where s is the number of sample attributes and I_im marks whether the mth attribute value x_im of the ith sample is missing:

I_im = 1 if x_im exists, and I_im = 0 if x_im is missing    (8)
calculating the cluster centers of the samples and their number with the SD algorithm, then calculating the membership degrees from the obtained cluster centers, and finally obtaining the antecedent parameters of the model;
(2) filling scheme
updating the conclusion parameters and padding values of the TS model by iterative updating: for the incomplete data set X whose samples have s attributes, taking each attribute in turn as the output and building s TS models, the input of the mth TS model being D^(m) = {D_1, D_2, ..., D_{m−1}, D_{m+1}, ..., D_s} and the desired output D_m, m = 1, 2, ..., s; randomly initializing the incomplete data set to obtain a complete data set, then computing the conclusion parameters by least squares; in each TS model, the weighted input H_j^(i) of the jth sample x_j under rule R^(i) is obtained from formula (9):

H_j^(i) = v_j^(i) Γ^(i)    (9)

where v_j^(i) denotes the weight and Γ^(i) = [1, x_j1^(i), ..., x_j(q−1)^(i), x_j(q+1)^(i), ..., x_js^(i)] denotes the input of R^(i) after variable selection, in which the input variable x_jq^(i) has been rejected, i = 1, 2, ..., k, j = 1, 2, ..., n, 1 < q < s; the actual output value ŷ_j of the model is then calculated:

ŷ_j = ( Σ_{i=1}^{k} H_j^(i) P^(i) ) / ( Σ_{i=1}^{k} v_j^(i) )    (10)

where P^(i) is the conclusion-parameter vector of rule R^(i) derived by least squares;
formulas (9) and (10) yield the output sets Ŷ^(l) of the s TS models, where l denotes the lth iteration: the outputs at missing positions are the padding values to be updated, while the model outputs for the existing data are used, together with the corresponding true values, to compute the root mean square error f^(l); the difference |Δf| = |f^(l) − f^(l−1)| from the previous round of iterative learning is then computed; if |Δf| is larger than the threshold ε, the above steps are repeated for a new round of learning, otherwise the iteration ends and the filled data set is output; thus, TS modeling of the unbalanced incomplete data with each of the s attributes as output is realized.
2. The modeling and missing value filling method for an unbalanced incomplete data set according to claim 1, characterized in that:
B denotes the cluster-center index set, recording the indices of the class centers selected from the data set samples; a sample farthest from the selected class centers is then chosen from the non-center samples; denoting its index by q, q satisfies:

q = arg max_{j ∉ B} ( min_{t ∈ B} pd(x_j, c_t) )    (11)

x_q is then taken as a new cluster center and its index is added to the set B, where c_t denotes the tth cluster center of the data set;
The specific process of constructing the model is as follows:
Step 1: input the incomplete data set;
Step 2: initialize the empty set B, the number of neighbor samples K, and the parameter θ, where θ < 1;
Step 3: calculate the local distance pd(x_i, x_j) from x_i to each of the remaining samples, where j = 1, ..., i-1, i+1, ..., n; sort the obtained local distances and select the K nearest samples to form the set N_i;
Step 4: calculate the local density of each sample according to formula (6), and take the sample with the maximum local density as the first class center c_1, recording c_1 = x_i, B = B + {i};
Step 5: calculate, according to formula (5), the distance-density attribute of the remaining samples with respect to c_1, and select the sample with the largest distance-density attribute as the second class center c_2, recording c_2 = x_j, B = B + {j};
Step 6: if the maximum minimum distance max_{j∉B} min_{t∈B} pd(x_j, c_t) is still greater than θ × pd(c_1, c_2), go to step 7; otherwise go to step 9;
Step 7: record the newly selected center as c_q, where q satisfies formula (11);
Step 8: calculate, according to formula (5), the distance-density attribute of the remaining samples with respect to the new center c_q, and select the sample with the largest distance-density attribute as the next class center c_next, recording c_next = x_l, B = B + {l}; return to step 6;
Step 9: output the cluster centers {c_1, c_2, ..., c_|B|} and the number of cluster centers |B|.
The number of cluster centers |B| equals the number of fuzzy rules k, i.e. |B| = k.
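Steps 1-9 above can be sketched as follows. Formulas (5) and (6) are not reproduced in this excerpt, so the sketch substitutes assumed stand-ins: Euclidean distance for pd(·,·), the inverse mean distance to the K nearest neighbours for the local density of formula (6), and density times minimum distance to the chosen centers for the distance-density attribute of formula (5); under that last assumption, steps 7-8 collapse into a single max-min selection.

```python
import numpy as np

def select_centers(X, K=5, theta=0.5):
    """Max-min cluster-center selection (steps 1-9), a sketch under the
    assumptions stated above; function and parameter names are illustrative."""
    pd = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    knn = np.sort(pd, axis=1)[:, 1:K + 1]          # K nearest, excluding self
    density = 1.0 / (knn.mean(axis=1) + 1e-12)     # formula (6), assumed form

    c1 = int(np.argmax(density))                   # step 4: densest sample is c1
    B = [c1]
    d_min = pd[c1].copy()                          # min distance to chosen centers
    attr = density * d_min                         # formula (5), assumed form
    c2 = int(np.argmax(attr))                      # step 5: second center c2
    B.append(c2)
    d_min = np.minimum(d_min, pd[c2])
    sep = theta * pd[c1, c2]                       # step 6 threshold

    while True:
        cand = d_min.copy()
        cand[B] = -np.inf
        q = int(np.argmax(cand))                   # formula (11): max-min sample
        if cand[q] <= sep:                         # step 6: centers no longer
            break                                  # well separated -> stop
        B.append(q)                                # steps 7-8 (simplified)
        d_min = np.minimum(d_min, pd[q])
    return B, len(B)                               # step 9: center indices, |B|
```

On a data set with well-separated groups, the loop keeps adding centers until the farthest remaining sample is within θ × pd(c_1, c_2) of some chosen center, so |B| tracks the number of natural clusters without being fixed in advance.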
The membership degrees are then calculated using the cluster centers obtained in steps 1-9. Let u^(t)(x_i) denote the degree to which sample x_i belongs to A^(t), where A^(t) denotes a multi-dimensional fuzzy set centered at c_t; u^(t)(x_i) is obtained from formula (12), in which pd(c_t, x_i) denotes the local distance between the t-th cluster center and the i-th sample, t = 1, 2, ..., |B|. The fuzzy sets A^(1), ..., A^(|B|) are thereby obtained, completing the identification of the antecedent parameters of the model.
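Formula (12) itself is an image in the source, so the mapping from local distances to memberships is sketched here under an assumption: an FCM-style inverse-distance form with fuzzifier m. The function name `memberships` and the input layout (a k × n matrix of local distances pd(c_t, x_i)) are illustrative choices, not from the claim.

```python
import numpy as np

def memberships(pd_centers, m=2.0):
    """Membership u^(t)(x_i) of each sample in each fuzzy set A^(t);
    assumed FCM-style inverse-distance form, not the claim's formula (12)."""
    d = np.asarray(pd_centers, dtype=float) + 1e-12  # guard against zero distance
    inv = d ** (-2.0 / (m - 1.0))                    # closer center -> larger weight
    return inv / inv.sum(axis=0, keepdims=True)      # each column sums to 1
```

Whatever the exact form of formula (12), the useful invariants are the ones checked here: memberships of one sample across all k fuzzy sets sum to 1, and the nearest center receives the largest membership.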
3. The method of claim 1, wherein the conclusion parameters and the iterative learning are determined as follows:
H denotes the weighted input of all rules, and P denotes the conclusion parameters, calculated as:

P = (H^T H)^(-1) H^T Y   (13)

where Y = [x_1m, x_2m, ..., x_nm]^T denotes all samples in the m-th dimensional attribute, m = 1, 2, ..., s. |Δf| denotes the absolute value of the difference between the root mean square errors, obtained from the existing data and the corresponding model outputs, of two adjacent rounds of iterative learning; it is used to judge whether the iterative learning has finished, and ε denotes the threshold for stopping the iteration. f is calculated as:

f = sqrt( (1/|X_C|) Σ_{x_i ∈ X_C} (ŷ_i − x_i)^2 )   (14)

where |X_C| denotes the number of existing data and ŷ_i denotes the model output corresponding to the existing datum x_i ∈ X_C.
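The least-squares step of formula (13) and the RMSE of formula (14) can be sketched directly in NumPy. `np.linalg.lstsq` is used in place of forming (H^T H)^(-1) explicitly; it computes the same least-squares solution but is numerically more stable. Function names are illustrative.

```python
import numpy as np

def conclusion_params(H, Y):
    # P = (H^T H)^(-1) H^T Y, formula (13); lstsq solves the same
    # least-squares problem without inverting H^T H explicitly
    P, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return P

def rmse(pred, true):
    # formula (14): root mean square error over the |X_C| existing data
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.sqrt(np.mean((pred - true) ** 2)))
```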
The specific process of the iterative learning is as follows:
Step 1: randomly pre-fill the incomplete data set to obtain a data set containing dynamic filling values;
Step 2: calculate the conclusion parameters based on the filled data set and formulas (9) and (13), and obtain the model outputs from formula (10);
Step 3: update the filling values with the corresponding model outputs; calculate f^(l) based on the model outputs and formula (14), compare it with f^(l-1) obtained from the last round of iteration, and compute the difference Δf; if |Δf| > ε, return to step 2 and enter the next round of iterative learning;
Step 4: if |Δf| ≤ ε, terminate the iteration and output the data set containing the final filling values.
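The fill, fit, refill control flow of steps 1-4 can be sketched as follows. The s TS fuzzy models of formulas (9)-(10) are replaced here by one plain linear least-squares model per attribute, and the random pre-filling of step 1 by column means, purely to keep the sketch self-contained; only the |Δf| ≤ ε stopping logic follows the claim.

```python
import numpy as np

def iterative_fill(X, eps=1e-4, max_iter=50):
    """Iterative missing-value filling (steps 1-4), a sketch with a linear
    stand-in for the TS models; names and defaults are illustrative."""
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                     # True where values are missing
    filled = X.copy()
    col_mean = np.nanmean(X, axis=0)
    filled[mask] = np.take(col_mean, np.nonzero(mask)[1])  # step 1: pre-fill

    f_prev = np.inf
    for _ in range(max_iter):
        pred = np.empty_like(filled)
        for m in range(X.shape[1]):        # one model per attribute (step 2)
            others = np.delete(filled, m, axis=1)
            H = np.column_stack([others, np.ones(len(filled))])
            P, *_ = np.linalg.lstsq(H, filled[:, m], rcond=None)  # cf. formula (13)
            pred[:, m] = H @ P             # model output, cf. formula (10)
        filled[mask] = pred[mask]          # step 3: update filling values
        # cf. formula (14): RMSE over the existing (non-missing) data
        f = np.sqrt(np.mean((pred[~mask] - X[~mask]) ** 2))
        if abs(f_prev - f) <= eps:         # step 4: |Δf| <= ε -> stop
            break
        f_prev = f
    return filled
```

Because the filling values feed back into the next fit, the RMSE over the existing data settles down as the fills stabilize, which is exactly what the |Δf| criterion detects; on data with a strong linear relation the converged fills approach the values that relation implies.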
CN202010085969.9A 2020-02-11 2020-02-11 Modeling and missing value filling method for unbalanced incomplete data set Withdrawn CN111353525A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010085969.9A CN111353525A (en) 2020-02-11 2020-02-11 Modeling and missing value filling method for unbalanced incomplete data set


Publications (1)

Publication Number Publication Date
CN111353525A true CN111353525A (en) 2020-06-30

Family

ID=71197960

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010085969.9A Withdrawn CN111353525A (en) 2020-02-11 2020-02-11 Modeling and missing value filling method for unbalanced incomplete data set

Country Status (1)

Country Link
CN (1) CN111353525A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034042A (en) * 2021-04-19 2021-06-25 上海数禾信息科技有限公司 Data processing method and device for construction of wind control model
CN113034042B (en) * 2021-04-19 2024-04-26 上海数禾信息科技有限公司 Data processing method and device for wind control model construction
CN114328742A (en) * 2021-12-31 2022-04-12 广东泰迪智能科技股份有限公司 Missing data preprocessing method for central air conditioner


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200630