CN111507365A - Confidence rule automatic generation method based on fuzzy clustering - Google Patents

Confidence rule automatic generation method based on fuzzy clustering

Info

Publication number
CN111507365A
Authority
CN
China
Prior art keywords
rule
confidence
attribute
weight
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910821869.5A
Other languages
Chinese (zh)
Inventor
王晓丽
吕星晓
阳春华
周佳怡
桂卫华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority to CN201910821869.5A
Publication of CN111507365A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a confidence rule automatic generation method based on fuzzy clustering, and belongs to the technical field of data mining. Aimed at the problem that building a confidence rule base entirely from manual experience is time-consuming and labor-intensive, the method, on the basis of the extended confidence rule base method, first performs fuzzy clustering on the sample data and uses the fuzzy clustering centers of the samples, together with the samples lying within a set distance of each clustering center, as the basic data for generating the extended confidence rule base; the premise attribute weights and the rule weights are then calculated from the association between the variables and the sample membership matrix. The invention aims to solve the problems that, in the extended confidence rule base method, sample data are difficult to select and the method is not suitable for modeling with large sample data sets. The invention also reduces the dependence on expert experience when empirically setting the premise attribute weights and the initial rule weights in the existing extended confidence rule base method. With the method, the confidence rules are extracted from actual production data, and compared with manually setting the initial confidence rules, the generation efficiency and inference precision of the confidence rule base are improved.

Description

Confidence rule automatic generation method based on fuzzy clustering
Technical Field
The invention relates to a confidence rule automatic generation method based on fuzzy clustering, and belongs to the technical field of data mining.
Background
Automatically transforming data into knowledge is a necessary technology for the development of intelligent manufacturing. The confidence rule base method is a knowledge-based method developed from D-S evidence theory, decision theory, fuzzy theory and the traditional IF-THEN rule base, and is capable of modeling various kinds of uncertain and nonlinear data. Compared with the traditional generative IF-THEN rule, the confidence rule adds weight parameters such as the rule weight, the premise attribute weights and the result confidences, which expands the descriptive range of a rule, allows the randomness and fuzziness of knowledge to be fully combined, and provides a knowledge representation that is closer to reality and can carry more information. The confidence rule base method can exploit both the large amount of uncertain quantitative production data and the qualitative expert knowledge of a production process; it has strong modeling capacity for data with fuzzy or probabilistic uncertainty, for incompletely collected data, and for data with pronounced nonlinear characteristics, and can handle these situations well. The confidence rule base method has been successfully applied to detection, evaluation and prediction problems in fields such as engineering management, financial decision-making, military affairs and medicine. However, the values of the weight parameters of a confidence rule base system are initially given by experts, and their accuracy directly affects the prediction performance of the system. Because the personal experience and judgment of experts carry a degree of subjectivity and are not sufficiently accurate, especially when modeling complex industrial problems, the initial confidence rule base established by experts often cannot meet the precision requirement and cannot adapt to changing conditions. Therefore, a parameter optimization learning model of the confidence rule base system usually has to be established to optimize parameters such as the rule weights, premise attribute weights and result confidences and thereby improve the inference precision of the confidence rule base system. In other words, building a confidence rule base requires not only expert knowledge but also a time-consuming parameter learning process.
The extended confidence rule base method addresses the problem that building a traditional confidence rule base entirely from manual experience is time-consuming and labor-intensive: it needs neither a complicated mechanism for generating the initial confidence rule base nor a time-consuming iterative parameter optimization training process, and it can automatically generate the extended confidence rule base from sample data in a direct, simple and efficient way. The extended confidence rule base method generates the confidence rules in a data-driven manner: each sample datum generates one extended confidence rule, so the number of sample data determines the number of rules in the extended confidence rule base. When the extended confidence rule base method is applied to complex problems, if too few samples are selected, the rules are easily incomplete and the inference precision of the established rule base is low; if too many samples are selected, the rules are susceptible to noisy data and become redundant, which affects the inference precision when the rules are activated for inference. In addition, the quality of the sample data also affects the reasoning ability of the extended confidence rule base; in practice, sample data often suffer from incompleteness caused by missing data and inconsistency caused by mutually contradictory data. Therefore, in the extended confidence rule base method, the selection of sample data is a difficult problem, and the method is not suitable for modeling with large sample data sets. Moreover, it still requires experts to give the premise attribute weights and the initial rule weights, so the dependence on expert experience cannot be completely avoided.
In view of these problems, it is of great significance to study, on the basis of the extended confidence rule base method, how to avoid manual selection of sample data so as to improve inference accuracy, to reduce the dependence on expert experience, and to establish the confidence rule base model so as to achieve truly automatic reasoning. This improves the generation efficiency and inference precision of the confidence rule base and advances knowledge automation and intelligent control of industrial processes.
Disclosure of Invention
The invention aims to solve the problems that, in the extended confidence rule base method, sample data are difficult to select and the method is not suitable for modeling with large sample data sets, and to reduce the dependence on expert experience when empirically setting the premise attribute weights and the initial rule weights in the existing extended confidence rule base method.
The technical scheme of the invention is as follows:
First, fuzzy clustering is performed on a large amount of historical sample data using the fuzzy C-means clustering algorithm to obtain fuzzy clustering centers V1; the sample data whose membership degrees to all clusters in the membership matrix are smaller than 0.01 are then re-clustered to obtain fuzzy clustering centers V2, and this re-clustering of the samples with all membership degrees smaller than 0.01 is continued until no such sample remains, thereby obtaining the final fuzzy clustering center data (V1 ∪ V2 ∪ …).
The obtained fuzzy clustering center data (V1 ∪ V2 ∪ …) are used as the basic data for constructing the extended confidence rule base, and the number of reference levels and the utility values of each premise attribute and the conclusion attribute are determined.
The input-output pairs in the basic data are converted into the confidence distribution representation form of the premise attributes and conclusion attribute of the extended confidence rule base, as shown in formula (1):

R_k: IF U_1 is {(A_1j, α_1j^k), j = 1, 2, …, J_1} ∧ … ∧ U_Tk is {(A_Tkj, α_Tkj^k), j = 1, 2, …, J_Tk},
THEN {(D_1, β_1,k), (D_2, β_2,k), …, (D_N, β_N,k)},
with rule weight θ_k and attribute weights δ_1, δ_2, …, δ_Tk    (1)

wherein T_k is the number of premise attributes in the k-th rule; J_i is the number of reference levels of the i-th premise attribute; N is the number of output result levels; θ_k is the rule weight of the k-th rule; δ_i is the weight of the i-th premise attribute; A_ij^k is the utility value of the j-th reference level of the i-th premise attribute in the k-th rule, i = 1, 2, …, T_k, j = 1, 2, …, J_i; α_ij^k is the confidence of the corresponding reference level; D_n, n = 1, 2, …, N, are the evaluation levels of the output result; and β_n,k is the confidence of evaluation result D_n in the k-th rule.
For an input value x_i in an input-output pair, whose premise attribute is U_i, x_i can be converted into the confidence distribution form of U_i according to formula (3), as shown in formula (2):

S(x_i) = {(A_ij, α_ij); j = 1, 2, …, J_i}    (2)

wherein, for the i-th premise attribute U_i, the number of reference levels is J_i, and among these reference levels the utility value of the j-th reference level is A_ij; α_ij is the confidence with which the given value x_i is converted to the j-th reference level of the premise attribute U_i, obtained by formula (3) from the closeness of x_i to the corresponding reference-level utility values.
Similarly, for the output value y_i in an input-output pair, whose conclusion attribute is V_i, y_i can be converted into the confidence distribution form of V_i according to formula (5), as shown in formula (4):

S(y_i) = {(D_ij, β_ij); j = 1, 2, …, N}    (4)

wherein N is the number of evaluation levels of the conclusion attribute; D_ij is the utility value of the j-th of the N evaluation levels of the i-th conclusion attribute; and β_ij is the confidence with which the value y_i is converted to the j-th evaluation level of the conclusion attribute V_i, likewise obtained by formula (5) from the closeness of y_i to the corresponding evaluation-level utility values.
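For illustration, a minimal Python sketch of this confidence-distribution conversion is given below. The exact forms of formulas (3) and (5) are referenced above but not reproduced, so the sketch uses the commonly used piecewise-linear utility-based transformation as an assumed realisation; the function name to_confidence_distribution and all variable names are illustrative and not part of the invention.

```python
import numpy as np

def to_confidence_distribution(x, utilities):
    """Convert a crisp value x into a confidence distribution over reference levels.

    utilities : increasing utility values A_i1 < A_i2 < ... < A_iJ of one attribute.
    Returns alpha, with alpha[j] the confidence of the j-th reference level.
    The piecewise-linear form below is an assumed, commonly used realisation of
    formulas (3)/(5); the patent gives the exact formulas only by reference.
    """
    A = np.asarray(utilities, dtype=float)
    alpha = np.zeros(len(A))
    if x <= A[0]:
        alpha[0] = 1.0                          # below the lowest reference level
    elif x >= A[-1]:
        alpha[-1] = 1.0                         # above the highest reference level
    else:
        j = np.searchsorted(A, x) - 1           # A[j] <= x < A[j+1]
        alpha[j] = (A[j + 1] - x) / (A[j + 1] - A[j])
        alpha[j + 1] = 1.0 - alpha[j]           # the two adjacent levels share the belief
    return alpha
```

For example, with reference-level utilities (0, 5, 10), an input value of 5.5 would be converted to the distribution (0, 0.9, 0.1).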
Then, the premise attribute weights are calculated from the association between the variables: using the historical sample data, the Pearson correlation coefficient between each premise attribute and the conclusion attribute is calculated, and the correlation coefficients are normalized to obtain the premise attribute weights.
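A minimal sketch of this premise attribute weight calculation, with illustrative names; the absolute value of the Pearson coefficient and the sum normalization are assumptions, since the text does not state how negative correlations are handled or which normalization is used.

```python
import numpy as np

def premise_attribute_weights(X, y):
    """Premise attribute weights from normalized Pearson correlation coefficients.

    X : (n_samples, T) matrix of premise attribute values from historical data.
    y : (n_samples,) vector of the conclusion attribute.
    """
    # absolute Pearson correlation of each premise attribute with the conclusion attribute
    # (taking the absolute value is an assumption not stated in the text)
    corr = np.array([abs(np.corrcoef(X[:, t], y)[0, 1]) for t in range(X.shape[1])])
    return corr / corr.sum()   # normalized here by the sum; the text does not specify the normalization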
The rule weights are calculated from the sample membership matrix. First, the initial rule weight θ_p is calculated: assuming the p-th rule is generated by the p-th clustering center, the ratio of the number of sample data belonging to the p-th class to the total number of sample data is taken as the initial rule weight θ_p. Then, following the extended confidence rule base method, the rule weight is updated according to the degree of inconsistency of the rule. The steps of updating the rule weights are as follows:
The first step: calculate the rule similarity.
First, the similarity calculation formula (7) is obtained from the Euclidean distance. Let X = {x_1, x_2, …, x_n} and Y = {y_1, y_2, …, y_n}; the Euclidean distance is defined as

d(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )    (6)

and then:

Sim(X, Y) = 1 − d(X, Y)    (7)
The rule similarity comprises the antecedent similarity (SRA) and the consequent similarity (SRC). Assume there are two rules with only one conclusion attribute:

R_i: IF U_1 is X_1^i ∧ U_2 is X_2^i ∧ … ∧ U_T is X_T^i, THEN D_i    (8)
R_k: IF U_1 is X_1^k ∧ U_2 is X_2^k ∧ … ∧ U_T is X_T^k, THEN D_k    (9)

wherein U_T denotes the T-th premise attribute and T is the number of premise attributes; X_t^i denotes the confidence distribution of the t-th premise attribute in the i-th rule; and D_i denotes the confidence distribution of the conclusion attribute of the i-th rule.
Then, the front piece similarity SRA and the back piece similarity SRC of the above two rules are defined as follows:
Figure BDA0002187776990000041
SRC(i,k)=Sim(Di,Dk) (11)
wherein, the similarity between the ith rule and the tth precondition attribute of the kth rule in the formula (10) can be calculated according to the formula (7)
Figure BDA0002187776990000042
And similarity Sim (D) of conclusion attributes of the ith rule and the kth rule in equation (11)i,Dk)。
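A sketch of the similarity computations of formulas (6)-(11) under the following assumptions: a rule is represented as a dictionary holding its antecedent confidence distributions and its consequent confidence distribution, and the antecedent similarity SRA is aggregated as the mean over the T attributes, since formula (10) itself is not reproduced above. All names are illustrative.

```python
import numpy as np

def sim(x, y):
    """Similarity of two confidence distributions, formulas (6)-(7): Sim = 1 - Euclidean distance."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return 1.0 - np.linalg.norm(x - y)

def sra(rule_i, rule_k):
    """Antecedent similarity SRA(i, k): aggregation of the T attribute-wise similarities.
    The mean is an assumed aggregation; formula (10) is not reproduced in the text."""
    sims = [sim(xi, xk) for xi, xk in zip(rule_i["antecedents"], rule_k["antecedents"])]
    return float(np.mean(sims))

def src(rule_i, rule_k):
    """Consequent similarity SRC(i, k) = Sim(D_i, D_k), formula (11)."""
    return sim(rule_i["consequent"], rule_k["consequent"])
```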
The second step is that: and calculating the rule inconsistency.
First, using SRA and SRC, the degree of agreement between two rules is calculated as follows:
Cons(Ri,Rk)=exp{-(SRA(i,k)/SRC(i,k)-1.0)2/(1/SRA(i,k))2} (12)
wherein, Cons (R)i,Rk)∈[0,1]。
Then, the inconsistency of the ith rule for the whole rule base is calculated:
Figure BDA0002187776990000043
wherein, L*For the number of generated rules, then, the overall degree of inconsistency of the rule base is calculated:
Figure BDA0002187776990000044
The third step: update the rule weights.
The rule weight of the p-th rule is computed by formula (15) from its initial weight θ_p and its update weight, which is derived from the rule inconsistency; the constant λ in formula (15), whose value range is (0, 1), is chosen according to the particular application.
Therefore, a confidence rule base can be constructed and completed, and automatic generation of the confidence rules is realized.
The invention has the following beneficial effects. First, fuzzy clustering is performed on the sample data, and the fuzzy clustering centers of the samples, together with the samples lying within a set distance of each clustering center, are used as the basic data for generating the extended confidence rule base, which solves the problem that the data samples are difficult to select. Then, the premise attribute weights and the rule weights are calculated from the association between the variables and the sample membership matrix, which reduces the degree of dependence on expert experience and overcomes this shortcoming of extended confidence rule base construction. With the method, the confidence rules are extracted from actual production data, and compared with manually setting the initial confidence rules, the generation efficiency and inference precision of the confidence rule base are improved.
Drawings
FIG. 1 is a flow chart of the fuzzy C-means clustering algorithm of the present invention;
fig. 2 is a flowchart of the confidence rule automatic generation method based on fuzzy clustering according to the present invention.
Detailed Description
Specific embodiments of the present invention will now be described with reference to the accompanying drawings, it being understood that the embodiments shown and described in the drawings are merely illustrative and are intended to illustrate the principles and methods of the present invention and not to limit the scope of the invention.
The invention relates to a confidence rule automatic generation method based on fuzzy clustering. First, fuzzy clustering is performed several times on the sample data using the fuzzy C-means clustering algorithm, and the obtained fuzzy clustering centers are used as the basic data for generating the extended confidence rule base; the premise attribute weights and the rule weights are then calculated from the association between the variables and the sample membership matrix, thereby realizing automatic generation of the confidence rules. Using industrial production data, the invention applies the fuzzy-clustering-based automatic confidence rule generation method to the optimized control of reagent dosage in antimony rougher flotation. As a case study, 530 groups of industrial historical data continuously collected from the antimony rougher flotation process of a gold-antimony flotation plant in China are analyzed, of which 330 groups are used as training data and the remaining 200 groups as test data. The data comprise feed ore conditions, froth image features, reagent dosages and other process data, where the reagent dosages include xanthate, black reagent (aerofloat), copper sulfate, lead nitrate and No. 2 oil.
FIG. 1 shows the flow chart of the fuzzy C-means clustering algorithm of the present invention. Suppose the data set to be clustered is X = {x_1, x_2, …, x_k, …, x_n}, which is to be divided into c classes. The objective function of the fuzzy C-means clustering algorithm is shown in formula (1):

J_m(U, V) = Σ_{i=1}^{c} Σ_{k=1}^{n} (μ_ik)^m (d_ik)²    (1)

wherein

d_ik = ||x_k − v_i||    (2)

and

Σ_{i=1}^{c} μ_ik = 1, k = 1, 2, …, n    (3)

wherein x_k ∈ R^P and R^P is the P-dimensional space; n is the number of data; c is the number of classes, 2 ≤ c < n; U = [μ_ik], i = 1, 2, …, c, k = 1, 2, …, n, is the fuzzy partition matrix; μ_ik denotes the membership degree of sample x_k to the i-th class and satisfies formula (3); V = (v_1, v_2, …, v_c) is the cluster center matrix; m ∈ [1, ∞) is the fuzzy weighting exponent, usually taking values 1.1 ≤ m ≤ 2.5; and d_ik is the Euclidean distance from sample x_k to the i-th cluster center v_i.
The fuzzy C-means clustering algorithm minimizes the objective function to obtain the optimal clustering result. To obtain the extreme values of the objective function, derivatives are taken with respect to μ_ik and v_i; under the constraint condition shown in formula (3), the following update formulas are obtained:

μ_ik = 1 / Σ_{j=1}^{c} (d_ik / d_jk)^(2/(m−1))    (4)

v_i = Σ_{k=1}^{n} (μ_ik)^m x_k / Σ_{k=1}^{n} (μ_ik)^m    (5)
the specific implementation process of the fuzzy C-means clustering algorithm can be summarized as follows:
Step1firstly, setting parameters of a fuzzy C-means clustering algorithm: the cluster category number c, the fuzzy weighting index m, the iteration stop threshold value and the maximum iteration number, and initializing a cluster center V(0)And let b equal to 0.
Step2Sequentially calculating a membership matrix U according to a formula (4) and a formula (5)(b)And a cluster center V(b+1)
Step3If V | |(b+1)-V(b)If | is less, stopping the operation; otherwise, let b be b +1, go to Step2
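For illustration, a minimal NumPy sketch of this fuzzy C-means iteration, following formulas (1)-(5); the function name fuzzy_c_means, the initialization by random sample selection and the default parameter values are illustrative choices, not part of the invention.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Minimal fuzzy C-means sketch following formulas (1)-(5).

    X : (n, P) data matrix, one sample per row.
    c : number of clusters, 2 <= c < n.
    m : fuzzy weighting exponent, typically 1.1 <= m <= 2.5.
    Returns cluster centers V (c, P) and membership matrix U (c, n).
    """
    def memberships(V):
        # d_ik = ||x_k - v_i||, formula (2); small floor avoids division by zero
        d = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2) + 1e-12
        # membership update, formula (4)
        return 1.0 / np.sum((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0)), axis=1)

    n = X.shape[0]
    rng = np.random.default_rng(seed)
    V = X[rng.choice(n, size=c, replace=False)]           # Step 1: initialize centers V(0)
    for _ in range(max_iter):
        U = memberships(V)                                # Step 2: membership matrix U(b)
        Um = U ** m
        V_new = (Um @ X) / Um.sum(axis=1, keepdims=True)  # Step 2: centers V(b+1), formula (5)
        if np.linalg.norm(V_new - V) < eps:               # Step 3: stopping criterion
            V = V_new
            break
        V = V_new
    return V, memberships(V)
```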
Fig. 2 is a flowchart illustrating an automatic confidence rule generation method based on fuzzy clustering according to the present invention, and the specific implementation steps thereof can be summarized as follows:
Step 1: Perform fuzzy clustering on a large amount of historical sample data using the fuzzy C-means clustering algorithm to obtain the fuzzy clustering centers V1.
Step 2: Re-cluster the sample data whose membership degrees to all clusters are smaller than 0.01 to obtain the fuzzy clustering centers V2.
Step 3: Repeat Step 2 until no sample remains whose membership degrees to all clusters are smaller than 0.01; take the obtained fuzzy clustering center data (V1 ∪ V2 ∪ …) as the basic data for constructing the extended confidence rule base, and determine the number of reference levels and the utility values of each premise attribute and the conclusion attribute.
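A sketch of the Step 1-Step 3 re-clustering loop, reusing the fuzzy_c_means function from the previous sketch; the 0.01 threshold is the one stated above, while the number of clusters per round and the safeguards are illustrative.

```python
import numpy as np

def collect_cluster_centers(X, c, threshold=0.01, max_rounds=10):
    """Iteratively cluster poorly covered samples (Steps 1-3).

    Returns the stacked center data V1 U V2 U ... used as the basic data
    for building the extended confidence rule base.
    """
    centers = []
    data = X
    for _ in range(max_rounds):
        V, U = fuzzy_c_means(data, c)                 # fuzzy_c_means from the previous sketch
        centers.append(V)
        # samples whose membership to every cluster is below the threshold
        poorly_covered = np.all(U < threshold, axis=0)
        if not poorly_covered.any():
            break                                     # no sample left with all memberships < 0.01
        data = data[poorly_covered]
        if len(data) <= c:                            # illustrative safeguard: too few samples to re-cluster
            centers.append(data)
            break
    return np.vstack(centers)
```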
Step 4: Convert the input-output pairs in the basic data into the confidence distribution representation form of the premise attributes and conclusion attribute of the extended confidence rule base, as shown in formula (6):

R_k: IF U_1 is {(A_1j, α_1j^k), j = 1, 2, …, J_1} ∧ … ∧ U_Tk is {(A_Tkj, α_Tkj^k), j = 1, 2, …, J_Tk},
THEN {(D_1, β_1,k), (D_2, β_2,k), …, (D_N, β_N,k)},
with rule weight θ_k and attribute weights δ_1, δ_2, …, δ_Tk    (6)

wherein T_k is the number of premise attributes in the k-th rule; J_i is the number of reference levels of the i-th premise attribute; N is the number of output result levels; θ_k is the rule weight of the k-th rule; δ_i is the weight of the i-th premise attribute; A_ij^k is the utility value of the j-th reference level of the i-th premise attribute in the k-th rule, i = 1, 2, …, T_k, j = 1, 2, …, J_i; α_ij^k is the confidence of the corresponding reference level; D_n, n = 1, 2, …, N, are the evaluation levels of the output result; and β_n,k is the confidence of evaluation result D_n in the k-th rule.
For an input value x_i in an input-output pair, whose premise attribute is U_i, x_i can be converted into the confidence distribution form of U_i according to formula (8), as shown in formula (7):

S(x_i) = {(A_ij, α_ij); j = 1, 2, …, J_i}    (7)

wherein, for the i-th premise attribute U_i, the number of reference levels is J_i, and among these reference levels the utility value of the j-th reference level is A_ij; α_ij is the confidence with which the given value x_i is converted to the j-th reference level of the premise attribute U_i, obtained by formula (8) from the closeness of x_i to the corresponding reference-level utility values.
Similarly, for the output value y_i in an input-output pair, whose conclusion attribute is V_i, y_i can be converted into the confidence distribution form of V_i according to formula (10), as shown in formula (9):

S(y_i) = {(D_ij, β_ij); j = 1, 2, …, N}    (9)

wherein N is the number of evaluation levels of the conclusion attribute; D_ij is the utility value of the j-th of the N evaluation levels of the i-th conclusion attribute; and β_ij is the confidence with which the value y_i is converted to the j-th evaluation level of the conclusion attribute V_i, likewise obtained by formula (10) from the closeness of y_i to the corresponding evaluation-level utility values.
Step 5: Continue executing Step 4; each input-output pair generates one extended confidence rule, until all the basic data have generated their corresponding confidence rules.
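A sketch of Steps 4-5: assembling one extended confidence rule from a cluster-center input-output pair, reusing the to_confidence_distribution function from the earlier sketch; the dictionary layout of a rule is an illustrative choice.

```python
def build_rule(center_inputs, center_output, input_utilities, output_utilities):
    """Build one extended confidence rule from a cluster-center input-output pair.

    center_inputs    : T crisp premise attribute values of one cluster center.
    center_output    : crisp conclusion attribute value of that center.
    input_utilities  : list of T utility-value vectors (reference levels per attribute).
    output_utilities : utility values of the N conclusion evaluation levels.
    """
    return {
        # antecedent confidence distributions, one per premise attribute (formulas (7)/(8))
        "antecedents": [to_confidence_distribution(x, A)
                        for x, A in zip(center_inputs, input_utilities)],
        # consequent confidence distribution over the N evaluation levels (formulas (9)/(10))
        "consequent": to_confidence_distribution(center_output, output_utilities),
    }
```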
Step 6: Finally, calculate the rule weight and the premise attribute weights of each rule.
The premise attribute weights are calculated from the association between the variables: using the historical sample data, the Pearson correlation coefficient between each premise attribute and the conclusion attribute is calculated, and the correlation coefficients are normalized to obtain the premise attribute weights.
The rule weights are calculated from the sample membership matrix. First, the initial rule weight θ_p is calculated: assuming the p-th rule is generated by the p-th clustering center, the ratio of the number of sample data belonging to the p-th class to the total number of sample data is taken as the initial rule weight θ_p. Then, following the extended confidence rule base method, the rule weights are updated according to the degree of inconsistency of the rules. The steps of updating the rule weights are as follows:
The first step: calculate the rule similarity.
First, the similarity calculation formula (12) is obtained from the Euclidean distance. Let X = {x_1, x_2, …, x_n} and Y = {y_1, y_2, …, y_n}; the Euclidean distance is defined as

d(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )    (11)

and then:

Sim(X, Y) = 1 − d(X, Y)    (12)
The rule similarity comprises the antecedent similarity (SRA) and the consequent similarity (SRC). Assume there are two rules with only one conclusion attribute:

R_i: IF U_1 is X_1^i ∧ U_2 is X_2^i ∧ … ∧ U_T is X_T^i, THEN D_i    (13)
R_k: IF U_1 is X_1^k ∧ U_2 is X_2^k ∧ … ∧ U_T is X_T^k, THEN D_k    (14)

wherein T is the number of premise attributes; X_t^i denotes the confidence distribution of the t-th premise attribute in the i-th rule; and D_i denotes the confidence distribution of the conclusion attribute of the i-th rule.
Then, the front piece similarity SRA and the back piece similarity SRC of the above two rules are defined as follows:
Figure BDA0002187776990000081
SRC(i,k)=Sim(Di,Dk) (16)
wherein, according toEquation (12) can calculate the similarity between the ith rule and the tth premise attribute of the kth rule in equation (15)
Figure BDA0002187776990000082
And similarity Sim (D) of conclusion attributes of the ith rule and the kth rule in equation (16)i,Dk)。
The second step is that: and calculating the rule inconsistency.
First, using SRA and SRC, the degree of agreement between two rules is calculated as follows:
Cons(Ri,Rk)=exp{-(SRA(i,k)/SRC(i,k)-1.0)2/(1/SRA(i,k))2} (17)
wherein, Cons (R)i,Rk)∈[0,1]。
Then, the inconsistency of the ith rule for the whole rule base is calculated:
Figure BDA0002187776990000083
wherein, L*For the number of generated rules, then, the overall degree of inconsistency of the rule base is calculated:
Figure BDA0002187776990000084
The third step: update the rule weights.
The rule weight of the p-th rule is computed by formula (20) from its initial weight θ_p and its update weight, which is derived from the rule inconsistency; the constant λ in formula (20), whose value range is (0, 1), is chosen according to the particular application.
According to the above 6 steps, a confidence rule base can be constructed.
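Putting the six steps together, an end-to-end usage sketch built from the functions in the earlier sketches; the randomly generated placeholder data, the number of clusters, the reference-level utilities and the equal initial rule weights are all illustrative stand-ins (the invention takes the initial weights from the cluster-size proportions and the data from the 330 training groups described above).

```python
import numpy as np

# placeholders standing in for the historical training data (330 groups in the case study)
X_train = np.random.rand(330, 4)          # premise attribute values
y_train = np.random.rand(330)             # conclusion attribute (reagent dosage)

# Steps 1-3: fuzzy clustering and re-clustering of the input-output pairs to obtain the basic data
centers = collect_cluster_centers(np.column_stack([X_train, y_train]), c=10)

# Steps 4-5: one extended confidence rule per cluster center
input_utils = [np.linspace(0, 1, 5) for _ in range(4)]   # illustrative reference-level utilities
output_utils = np.linspace(0, 1, 5)
rules = [build_rule(v[:-1], v[-1], input_utils, output_utils) for v in centers]

# Step 6: premise attribute weights and rule weights
delta = premise_attribute_weights(X_train, y_train)
SRA = np.array([[sra(ri, rk) for rk in rules] for ri in rules])
SRC = np.array([[src(ri, rk) for rk in rules] for ri in rules])
SRA = np.clip(SRA, 1e-6, None)            # guard: Sim can be non-positive for very dissimilar rules
SRC = np.clip(SRC, 1e-6, None)
theta0 = np.full(len(rules), 1.0 / len(rules))   # illustrative stand-in for the cluster-size proportions
theta = update_rule_weights(theta0, SRA, SRC, lam=0.5)
```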
The performance of the method provided by the invention is compared with that of a confidence rule base model with local parameter optimization (LOBRB, Local Optimization Belief Rule Base) and a confidence rule base model with global parameter optimization (GOBRB, Global Optimization Belief Rule Base); the results are shown in Table 2.
The invention designs an automatic confidence rule extraction method based on fuzzy clustering, which effectively solves the problem that sample data are difficult to select in the extended confidence rule base method, and designs a data-driven calculation method for the rule weights and the premise attribute weights, thereby reducing the dependence on expert experience. The method helps to improve the generation efficiency and inference precision of the confidence rule base, and lays a good foundation for realizing knowledge automation and intelligent control of industrial processes.

Claims (1)

1. A confidence rule automatic generation method based on fuzzy clustering, characterized in that: fuzzy clustering is performed on a large amount of historical sample data using a fuzzy C-means clustering algorithm to obtain fuzzy clustering centers V1; the sample data whose membership degrees to all clusters in the membership matrix are smaller than 0.01 are re-clustered to obtain fuzzy clustering centers V2; the sample data whose membership degrees are all smaller than 0.01 are continuously re-clustered until no such sample remains, thereby obtaining the final fuzzy clustering center data (V1 ∪ V2 ∪ …);
the obtained fuzzy clustering center data (V1 ∪ V2 ∪ …) are used as the basic data for constructing an extended confidence rule base, and the input-output pairs in the basic data are converted into the confidence distribution representation form of the premise attributes and conclusion attribute of the extended confidence rule base, as shown in formula (1):

R_k: IF U_1 is {(A_1j, α_1j^k), j = 1, 2, …, J_1} ∧ … ∧ U_Tk is {(A_Tkj, α_Tkj^k), j = 1, 2, …, J_Tk},
THEN {(D_1, β_1,k), (D_2, β_2,k), …, (D_N, β_N,k)},
with rule weight θ_k and attribute weights δ_1, δ_2, …, δ_Tk    (1)

wherein R_k denotes the k-th rule; T_k is the number of premise attributes in the k-th rule; J_1 is the number of reference levels of the 1st premise attribute and J_Tk is the number of reference levels of the T_k-th premise attribute; N is the number of output result levels; θ_k is the rule weight of the k-th rule; δ_i is the attribute weight of the i-th attribute; A_ij^k is the utility value of the j-th reference level of the i-th premise attribute in the k-th rule, i = 1, 2, …, T_k, j = 1, 2, …, J_i, where J_i is the number of reference levels of the i-th premise attribute; α_ij^k is the confidence of the corresponding reference level, i = 1, 2, …, T_k, j = 1, 2, …, J_i; D_n, n = 1, 2, …, N, are the evaluation levels of the output result; and β_n,k is the confidence of evaluation result D_n in the k-th rule;
the premise attribute weights are calculated from the association between the variables: using the historical sample data, the Pearson correlation coefficient between each premise attribute and the conclusion attribute is calculated and normalized to obtain the premise attribute weights;
then, the rule weights are calculated from the sample membership matrix: first, the initial rule weight θ_p is calculated; assuming that the p-th rule is generated by the p-th clustering center, the ratio of the number of sample data belonging to the p-th class to the total number of sample data is taken as the initial rule weight θ_p; then, according to the extended confidence rule base method, the rule weights are updated according to the inconsistency of the rules; the steps of updating the rule weights are as follows:
the first step: calculate the rule similarity;
first, the similarity calculation formula (3) is obtained from the Euclidean distance: let X = {x_1, x_2, …, x_n} and Y = {y_1, y_2, …, y_n}; the Euclidean distance is defined as formula (2):

d(X, Y) = sqrt( Σ_{i=1}^{n} (x_i − y_i)² )    (2)

then:

Sim(X, Y) = 1 − d(X, Y)    (3)
the rule similarity comprises the antecedent similarity SRA and the consequent similarity SRC; two rules with only one conclusion attribute are represented by formulas (4) and (5):

R_i: IF U_1 is X_1^i ∧ U_2 is X_2^i ∧ … ∧ U_T is X_T^i, THEN D_i    (4)
R_k: IF U_1 is X_1^k ∧ U_2 is X_2^k ∧ … ∧ U_T is X_T^k, THEN D_k    (5)

wherein U_T denotes the T-th premise attribute and T is the number of premise attributes; X_t^i denotes the confidence distribution of the t-th premise attribute in the i-th rule; and D_i denotes the confidence distribution of the conclusion attribute of the i-th rule;
the antecedent similarity SRA and the consequent similarity SRC of the two rules of formulas (4) and (5) are defined by formulas (6) and (7), respectively: SRA(i, k) is obtained by formula (6) from the similarities Sim(X_t^i, X_t^k) of the T premise attributes of the two rules, and

SRC(i, k) = Sim(D_i, D_k)    (7)

wherein the similarity Sim(X_t^i, X_t^k) between the t-th premise attributes of the i-th rule and the k-th rule in formula (6), and the similarity Sim(D_i, D_k) of the conclusion attributes of the i-th and k-th rules in formula (7), are calculated according to formula (3);
the second step: calculate the rule inconsistency;
first, using SRA and SRC, the degree of consistency between two rules is calculated as follows:

Cons(R_i, R_k) = exp{ −[SRA(i, k)/SRC(i, k) − 1.0]² / [1/SRA(i, k)]² }    (8)

wherein Cons(R_i, R_k) ∈ [0, 1]; then, the inconsistency of the i-th rule with respect to the whole rule base is calculated by formula (9) from Cons(R_i, R_k), k = 1, 2, …, L*, where L* is the number of generated rules; the overall degree of inconsistency of the rule base is then calculated by formula (10) from the inconsistencies of all L* rules;
the third step: update the rule weights;
the rule weight of the p-th rule is calculated by formula (11) from its initial weight θ_p and its update weight, which is derived from the rule inconsistency; the constant λ in formula (11) has a value range of (0, 1);
according to the steps, a confidence rule base is directly constructed from the data, and the automatic generation of the confidence rules is realized.
CN201910821869.5A 2019-09-02 2019-09-02 Confidence rule automatic generation method based on fuzzy clustering Pending CN111507365A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910821869.5A CN111507365A (en) 2019-09-02 2019-09-02 Confidence rule automatic generation method based on fuzzy clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910821869.5A CN111507365A (en) 2019-09-02 2019-09-02 Confidence rule automatic generation method based on fuzzy clustering

Publications (1)

Publication Number Publication Date
CN111507365A true CN111507365A (en) 2020-08-07

Family

ID=71874022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910821869.5A Pending CN111507365A (en) 2019-09-02 2019-09-02 Confidence rule automatic generation method based on fuzzy clustering

Country Status (1)

Country Link
CN (1) CN111507365A (en)



Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6148099A (en) * 1997-07-03 2000-11-14 Neopath, Inc. Method and apparatus for incremental concurrent learning in automatic semiconductor wafer and liquid crystal display defect classification
CN102740340A (en) * 2012-07-19 2012-10-17 中南大学 Data gathering method oriented to same configuration nodes in network cluster of wireless sensor
US20150286862A1 (en) * 2014-04-07 2015-10-08 Basware Corporation Method for Statistically Aided Decision Making
CN104820708A (en) * 2015-05-15 2015-08-05 成都睿峰科技有限公司 Cloud computing platform based big data clustering method and device
CN109655393A (en) * 2018-12-12 2019-04-19 昆明理工大学 A kind of air permeability of tipping paper detection method based on confidence rule base
CN110132603A (en) * 2019-05-16 2019-08-16 杭州电子科技大学 Boat diesel engine Fault Locating Method based on union confidence rule base and ant group algorithm

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LONG-HAO YANG et al.: "A disjunctive belief rule-based expert system for bridge risk assessment with dynamic parameter optimization model", Computers & Industrial Engineering *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330093A (en) * 2020-09-30 2021-02-05 合肥工业大学 Intelligent early warning method and system oriented to cooperation of adaptive scheduling and unmanned production line
CN112330093B (en) * 2020-09-30 2022-09-02 合肥工业大学 Intelligent early warning method and system for cooperation of adaptive scheduling and unmanned production line
CN113162498A (en) * 2021-04-27 2021-07-23 谷芯(广州)技术有限公司 Permanent magnet synchronous motor vector control method and system based on fuzzy PI control
CN113379182A (en) * 2021-04-27 2021-09-10 云南电网有限责任公司昆明供电局 Middle and low voltage equipment health state assessment method based on multi-dimensional state parameters
CN113162498B (en) * 2021-04-27 2021-12-21 谷芯(广州)技术有限公司 Permanent magnet synchronous motor vector control method and system based on fuzzy PI control
CN113379182B (en) * 2021-04-27 2022-09-16 云南电网有限责任公司昆明供电局 Middle and low voltage equipment health state assessment method based on multi-dimensional state parameters
CN114239738A (en) * 2021-12-21 2022-03-25 中国人民解放军国防科技大学 Medical data classification method for small sample and related equipment
CN114239738B (en) * 2021-12-21 2023-10-24 中国人民解放军国防科技大学 Medical data classification method and related equipment for small samples

Similar Documents

Publication Publication Date Title
CN111507365A (en) Confidence rule automatic generation method based on fuzzy clustering
Khayyam et al. A novel hybrid machine learning algorithm for limited and big data modeling with application in industry 4.0
Lin et al. Parameter tuning, feature selection and weight assignment of features for case-based reasoning by artificial immune system
CN109472088B (en) Shale gas-conditioned production well production pressure dynamic prediction method
CN114678080B (en) Converter end point phosphorus content prediction model, construction method and phosphorus content prediction method
CN110555459A (en) Score prediction method based on fuzzy clustering and support vector regression
Kumar et al. Advanced prediction of performance of a student in an university using machine learning techniques
Pei et al. Genetic algorithms for classification and feature extraction
Rabuñal et al. A new approach to the extraction of ANN rules and to their generalization capacity through GP
Tembusai et al. K-nearest neighbor with k-fold cross validation and analytic hierarchy process on data classification
CN114239400A (en) Multi-working-condition process self-adaptive soft measurement modeling method based on local double-weighted probability hidden variable regression model
Chandiok et al. Machine learning techniques for cognitive decision making
Hariri et al. Tipburn disorder detection in strawberry leaves using convolutional neural networks and particle swarm optimization
CN109214500B (en) Transformer fault identification method based on hybrid intelligent algorithm
Sun et al. A fuzzy brain emotional learning classifier design and application in medical diagnosis
CN117521063A (en) Malicious software detection method and device based on residual neural network and combined with transfer learning
Abdelkhalek et al. WE-AIRS: a new weighted evidential artificial immune recognition system
CN110766144B (en) Scalar coupling constant prediction system between atoms based on multi-layer decomposition fuzzy neural network
CN112836431A (en) Penicillin fermentation process fault prediction method based on PSO-LSTM
Jabbari et al. Obtaining accurate probabilistic causal inference by post-processing calibration
Chen Brain Tumor Prediction with LSTM Method
CN113743652B (en) Sugarcane squeezing process prediction method based on depth feature recognition
Listiana et al. Optimization of support vector machine using information gain and adaboost to improve accuracy of chronic kidney disease diagnosis
Wang et al. Data-driven Extraction Method of Belief Rule for Reagent Addition in Antimony Rougher Flotation
CN116226629B (en) Multi-model feature selection method and system based on feature contribution

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200807