CN115018006A - Dempster-Shafer framework-based classification method
Abstract
The invention discloses a classification method based on the Dempster-Shafer framework, comprising a discrimination frame, a mass function, the combination rule of evidence theory, posterior probability, fuzzy set theory, the BJS divergence between two BBAs, PPT and its single-case form, an identification attribute frame, membership calculation, single generation probability, training sample distribution, test sample distribution, BJS divergence discrimination, group discrimination probability, a weighted integration mechanism and BPA construction. The accuracy of the method is statistically superior to that of 11 comparison methods, its decision results are more reliable and robust, and the model is effective and reasonable. In addition, the introduction of the BJS divergence gives the method high sensitivity to changes in the data, which is convenient for practical applications. WFIG-DSF is expected to be particularly prominent in data fusion applications, as it attends to data from different sensor sources and considers the difference between the test-sample and training-sample distributions in order to make better decisions.
Description
Technical Field
The invention belongs to the technical field of weighted fuzzy individual generation and group discrimination classification based on the Dempster-Shafer framework, and particularly relates to a classification method based on the Dempster-Shafer framework.
Background
In D-S theory, an identification frame is defined as a complete set of mutually incompatible basic propositions, and each subset of the frame is called a proposition. D-S theory provides a combination rule for multi-source information and can synthesize the basic results of multiple sensors into a single output.
Disclosure of Invention
The invention aims to provide a classification method based on a Dempster-Shafer framework.
In order to achieve the purpose, the invention provides the following technical scheme:
a classification method based on a Dempster-Shafer framework comprises a discrimination frame, a mass function, the combination rule of evidence theory, posterior probability, fuzzy set theory, the BJS divergence between two BBAs, PPT and its single-case form, an identification attribute frame, membership calculation, single generation probability, training sample distribution, test sample distribution, BJS divergence discrimination, group discrimination probability, a weighted integration mechanism and BPA construction.
Preferably, the discrimination framework flow is as follows:
let Θ denote a set of mutually exclusive elements, which may contain any number of elements, e.g., Θ = {θ_1, θ_2, …, θ_j, …, θ_N}, where θ_j is an element or event of the identification frame Θ, N is the number of elements, and j = 1, 2, …, N. The set of all subsets of Θ is called the power set of Θ and is defined as follows:

2^Θ = {∅, {θ_1}, …, {θ_N}, {θ_1, θ_2}, …, Θ},

where ∅ represents the empty set and {θ_i, θ_j} indicates that event θ_i or θ_j occurs; whatever state the system is in, it can be represented by an element of 2^Θ;
preferably, the quality function flow is as follows:
let m be a mapping from the set 2^Θ to the interval [0, 1], m: 2^Θ → [0, 1], with the focal elements A being the subsets of the identification frame Θ to which m assigns positive mass; m is defined so that

m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1.

The basic probability distribution function represents the initial allocation of belief established by the evidence: the basic probability of an event A is recorded as m(A) and represents the degree of belief the evidence places in A. m(A) is also commonly called a mass function, and it is constructed from data obtained by detection or given by experts from experience;
preferably, the flow of the combination rule of the evidence theory is as follows:
let E_1 and E_2 be two pieces of evidence under the identification frame Θ, with corresponding BPAs m_1 and m_2, and let A_i and B_j be their focal elements. Dempster's combination rule is defined as

m(A) = (1 / (1 − K)) · Σ_{A_i ∩ B_j = A} m_1(A_i) m_2(B_j) for A ≠ ∅, with m(∅) = 0,

where K = Σ_{A_i ∩ B_j = ∅} m_1(A_i) m_2(B_j).
The value of K represents the degree of conflict between the evidence sources. In most cases different data sources yield two or more different BPAs for the same question, and to calculate the belief and plausibility functions these BPAs must first be synthesized into a single BPA;
preferably, the posterior probability process is as follows:
let X_i (i = 1, 2, …, p) be p independent features, and let X denote the p-dimensional feature vector. Y ∈ C = {C_1, C_2, …, C_N} is the class label associated with the state of X. The posterior probability P(Y | X_i), i = 1, …, p, Y ∈ {C_1, C_2, …, C_N}, can then be defined by Bayes' rule:

P(Y | X_i) = P(X_i | Y) P(Y) / Σ_{Y′ ∈ C} P(X_i | Y′) P(Y′).

In this study, we construct models using both individual generative models and group generative models. This approach provides a workable way to balance individual differences against overall differences, and fuzzy set theory is used as the criterion for describing membership.
Preferably, the fuzzy set theory process is as follows:
let C_i (i = 1, 2, …, n) denote the classes; the basic probability of DST is defined over these classes.
FST is an extension of exact (crisp) set theory. Compared with exact sets, FST provides a criterion for judging the concepts of membership and non-membership; compared with standard Bayesian theory, it provides another flexible framework for modeling the uncertainty and complexity of practical applications. Because real-world samples are fuzzy, such factors sometimes cannot be modeled by classical probability theory and their definitions are not strict, which gives FST an advantage in use.
The BJS divergence theory is a generalization of the KL divergence theory: it measures the similarity of two probability distributions and resolves the asymmetry of the KL result. The BJS divergence has good symmetry and boundedness and is therefore well suited to evidence theory. Although its gradient is 0 at points where two distributions are far apart, most of the problems encountered in this research concern decisions between highly similar distributions, so using the BJS divergence has a definite advantage;
preferably, the BJS difference flow between the two BBAs is as follows:
suppose A_i is one of the elements of the identification frame, and suppose there are two BBAs m_1 and m_2 under the same frame. The BJS divergence between m_1 and m_2 is defined as:

BJS(m_1, m_2) = (1/2) [ Σ_i m_1(A_i) log( 2 m_1(A_i) / (m_1(A_i) + m_2(A_i)) ) + Σ_i m_2(A_i) log( 2 m_2(A_i) / (m_1(A_i) + m_2(A_i)) ) ].

After transformation, BJS can be represented as

BJS(m_1, m_2) = H( (m_1 + m_2) / 2 ) − (1/2) H(m_1) − (1/2) H(m_2),

wherein H(m_j) = −Σ_i m_j(A_i) log m_j(A_i) (i = 1, 2, …, M; j = 1, 2).
The BJS divergence is similar in form to the JS divergence, but it replaces the probability distribution function with a mass function; when all of the belief is assigned to singleton elements, the BBA becomes a probability distribution, and the BJS divergence then degenerates to the JS divergence;
preferably, the process of identifying the attribute framework is as follows:
let Θ = {C_1, C_2, …, C_N} be an identification frame comprising N mutually exclusive hypotheses, with the information sources assigning their values over 2^Θ. Considering compound elements {C_i, C_j} with i ≠ j and cardinality no greater than 2, the identification frame is represented as follows:
Ω = {{C_1}, …, {C_N}, {C_1, C_2}, …, {C_i, C_j}, …, {C_{N−1}, C_N}}.
Since some attributes are similar, their representations on the Gaussian profiles overlap, and compound hypotheses are therefore introduced to represent such cases.
Preferably, the member calculation process is as follows:
for each object x, membership degrees in the different classes are calculated from the class variance and the sample mean ε, as follows:
the generation probabilities calculated from these membership degrees are divided into single-membership generation probabilities and composite-membership generation probabilities: the single-membership generation probability is represented directly by the membership degree of the class, while the composite-membership generation probability is calculated as the minimum t-norm combination of the two classes' membership degrees;
preferably, the single generation probability process is as follows:
let the combined membership be given, and denote the sample to be tested accordingly; its generation probability is then expressed as follows:
preferably, the training sample distribution process is as follows:
there are n groups of training samples, with n a multiple of m; ε_P(C_i) and σ_P(C_i) denote in turn the sample mean and variance of attribute C_i for the j-th group of samples, and the training set distribution is defined as
Wherein,
the distribution of the training set samples follows a Gaussian distribution. Supposing the k-th sample in the database is to be tested, a test set is constructed from m elements according to the time-series model; the classification criterion of this paper groups the samples m at a time and is defined as
Preferably, the distribution flow of the test sample is as follows:
let the test set contain n elements, with ε_Q({C_i}) and σ_Q({C_i}) denoting in turn the mean and variance of the samples, which determine the quality of C_i for the j-th sample group; like the training set distribution, the test set distribution is defined as
the distribution of these samples likewise follows a Gaussian distribution, and the method uses the BJS divergence to measure the difference between the training set and the test set;
preferably, the BJS divergence determination process is as follows:
the single attribute BJS divergence discrimination is expressed as:
the multi-attribute BJS divergence discrimination is defined as follows:
wherein,
preferably, the population discrimination probability process is as follows:
let the group discrimination probability under a single attribute be given; the corresponding group discrimination probability for a composite attribute is then represented as follows:
preferably, the weighted integration mechanism flow is as follows:
let Γ denote the weighted integration result; the heuristic algorithm is expressed through the factors α and β, defined as follows
Note that the learning factors (α, β) are different for different attribute classes;
preferably, the BPA construction flow is as follows:
let M_k be the final BPA, which is also a weighted normalized expression, as follows:
wherein,
note that since Γ itself does not sum to 1, it is not a valid BPA; a normalization factor is therefore introduced to convert Γ into the final BPA.
Compared with the prior art, the invention provides a classification method based on a Dempster-Shafer framework, which has the following beneficial effects:
the WFIG-DSF proposed by the invention uses the BJS divergence theory and the characteristics of the data to determine the attributes of the evidence and capture the uncertainty between classes. Applied to electroencephalogram data, it controls the distribution characteristics of the whole data set while also focusing on the characteristics of representative individual data, so the method effectively measures the uncertainty between pieces of evidence and reduces harmful conflict between classifiers. It captures highly conflicting information well, maintains the important complementarity between classifiers, and improves classifier fusion performance. In addition, WFIG-DSF was evaluated on 12 data sets from the UCI machine learning repository and compared with other existing classification methods. The conclusion is that the method has wide application value: on the UCI accuracy tests its accuracy is statistically superior to that of 11 methods, its decision results are more reliable and robust, and the model is effective and reasonable. Moreover, the introduction of the BJS divergence gives the method high sensitivity to changes in the data, which is convenient for practical applications. WFIG-DSF is expected to be particularly prominent in data fusion applications, as it attends to data from different sensor sources and considers the difference between the test-sample and training-sample distributions in order to make better decisions.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a technical scheme that:
a classification method based on a Dempster-Shafer framework comprises a discrimination frame, a mass function, the combination rule of evidence theory, posterior probability, fuzzy set theory, the BJS divergence between two BBAs, PPT and its single-case form, an identification attribute frame, membership calculation, single generation probability, training sample distribution, test sample distribution, BJS divergence discrimination, group discrimination probability, a weighted integration mechanism and BPA construction.
The discrimination framework flow is as follows:
let Θ denote a set of mutually exclusive elements, which may contain any number of elements, e.g., Θ = {θ_1, θ_2, …, θ_j, …, θ_N}, where θ_j is an element or event of the identification frame Θ, N is the number of elements, and j = 1, 2, …, N. The set of all subsets of Θ is called the power set of Θ and is defined as follows:

2^Θ = {∅, {θ_1}, …, {θ_N}, {θ_1, θ_2}, …, Θ},

where ∅ represents the empty set and {θ_i, θ_j} indicates that event θ_i or θ_j occurs; whatever state the system is in, it can be represented by an element of 2^Θ;
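As an illustrative sketch (an addition for clarity, not part of the claimed method), the power set 2^Θ of a small finite frame can be enumerated directly in Python:

from itertools import combinations

def power_set(theta):
    # Enumerate 2^Theta: every subset of the identification frame,
    # including the empty set and Theta itself.
    elements = list(theta)
    subsets = []
    for r in range(len(elements) + 1):
        subsets.extend(frozenset(c) for c in combinations(elements, r))
    return subsets

# A three-element frame yields 2^3 = 8 subsets.
print(len(power_set({"C1", "C2", "C3"})))  # 8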
the quality function flow is as follows:
let m be a mapping from the set 2^Θ to the interval [0, 1], m: 2^Θ → [0, 1], with the focal elements A being the subsets of the identification frame Θ to which m assigns positive mass; m is defined so that

m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1.

The basic probability distribution function represents the initial allocation of belief established by the evidence: the basic probability of an event A is recorded as m(A) and represents the degree of belief the evidence places in A. m(A) is also commonly called a mass function, and it is constructed from data obtained by detection or given by experts from experience;
the flow of the combination rule of evidence theory is as follows:
let E_1 and E_2 be two pieces of evidence under the identification frame Θ, with corresponding BPAs m_1 and m_2, and let A_i and B_j be their focal elements. Dempster's combination rule is defined as

m(A) = (1 / (1 − K)) · Σ_{A_i ∩ B_j = A} m_1(A_i) m_2(B_j) for A ≠ ∅, with m(∅) = 0,

where K = Σ_{A_i ∩ B_j = ∅} m_1(A_i) m_2(B_j).
The value of K represents the degree of conflict between the evidence sources. In most cases different data sources yield two or more different BPAs for the same question, and to calculate the belief and plausibility functions these BPAs must first be synthesized into a single BPA;
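A minimal Python sketch of Dempster's rule as defined above, representing each BPA as a mapping from focal elements (frozensets) to masses; the conflict mass K is accumulated over empty intersections and removed by normalization:

def dempster_combine(m1, m2):
    # Combine two BPAs with Dempster's rule; K is the total mass
    # assigned to conflicting (empty) intersections.
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

m1 = {frozenset({"C1"}): 0.6, frozenset({"C1", "C2"}): 0.4}
m2 = {frozenset({"C2"}): 0.5, frozenset({"C1", "C2"}): 0.5}
print(dempster_combine(m1, m2))  # here K = 0.3, masses renormalized by 0.7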
the posterior probability flow is as follows:
let X_i (i = 1, 2, …, p) be p independent features, and let X denote the p-dimensional feature vector. Y ∈ C = {C_1, C_2, …, C_N} is the class label associated with the state of X. The posterior probability P(Y | X_i), i = 1, …, p, Y ∈ {C_1, C_2, …, C_N}, can then be defined by Bayes' rule:

P(Y | X_i) = P(X_i | Y) P(Y) / Σ_{Y′ ∈ C} P(X_i | Y′) P(Y′).

In this study, we construct models using both individual generative models and group generative models. This approach provides a workable way to balance individual differences against overall differences, and fuzzy set theory is used as the criterion for describing membership.
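The posterior above is ordinary Bayes' rule applied per feature. A minimal sketch, assuming the class-conditional likelihoods P(X_i | Y) and the priors P(Y) are already available:

import numpy as np

def posterior(likelihoods, priors):
    # Bayes' rule for one feature X_i: P(Y = c | X_i) is proportional
    # to P(X_i | Y = c) * P(Y = c), normalized over all classes.
    joint = np.asarray(likelihoods, float) * np.asarray(priors, float)
    return joint / joint.sum()

# Two classes with equal priors; the class that explains the
# observation better receives the larger posterior.
print(posterior([0.8, 0.2], [0.5, 0.5]))  # [0.8 0.2]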
The fuzzy set theory process is as follows:
let C_i (i = 1, 2, …, n) denote the classes; the basic probability of DST is defined over these classes.
FST is an extension of exact (crisp) set theory. Compared with exact sets, FST provides a criterion for judging the concepts of membership and non-membership; compared with standard Bayesian theory, it provides another flexible framework for modeling the uncertainty and complexity of practical applications. Because real-world samples are fuzzy, such factors sometimes cannot be modeled by classical probability theory and their definitions are not strict, which gives FST an advantage in use.
The BJS divergence theory is a generalization of the KL divergence theory: it measures the similarity of two probability distributions and resolves the asymmetry of the KL result. The BJS divergence has good symmetry and boundedness and is therefore well suited to evidence theory. Although its gradient is 0 at points where two distributions are far apart, most of the problems encountered in this research concern decisions between highly similar distributions, so using the BJS divergence has a definite advantage;
the BJS difference flow between the two BBAs is as follows:
suppose A_i is one of the elements of the identification frame, and suppose there are two BBAs m_1 and m_2 under the same frame. The BJS divergence between m_1 and m_2 is defined as:

BJS(m_1, m_2) = (1/2) [ Σ_i m_1(A_i) log( 2 m_1(A_i) / (m_1(A_i) + m_2(A_i)) ) + Σ_i m_2(A_i) log( 2 m_2(A_i) / (m_1(A_i) + m_2(A_i)) ) ].

After transformation, BJS can be represented as

BJS(m_1, m_2) = H( (m_1 + m_2) / 2 ) − (1/2) H(m_1) − (1/2) H(m_2),

wherein H(m_j) = −Σ_i m_j(A_i) log m_j(A_i) (i = 1, 2, …, M; j = 1, 2).
The BJS divergence is similar in form to the JS divergence, but it replaces the probability distribution function with a mass function; when all of the belief is assigned to singleton elements, the BBA becomes a probability distribution, and the BJS divergence then degenerates to the JS divergence;
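A sketch of the entropy form of the BJS divergence given above, assuming the two BBAs are supplied as mass vectors aligned over the same list of focal elements; base-2 logarithms keep the divergence bounded by 1:

import numpy as np

def bjs_divergence(m1, m2, eps=1e-12):
    # BJS(m1, m2) = H((m1 + m2) / 2) - H(m1) / 2 - H(m2) / 2,
    # where H is the Shannon entropy of a mass vector.
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)

    def H(m):
        m = np.clip(m, eps, None)  # guard against log(0)
        return -np.sum(m * np.log2(m))

    return H((m1 + m2) / 2) - H(m1) / 2 - H(m2) / 2

# Identical BBAs give 0; completely disjoint ones approach 1.
print(bjs_divergence([0.7, 0.3], [0.7, 0.3]))  # 0.0
print(bjs_divergence([1.0, 0.0], [0.0, 1.0]))  # ~1.0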
the process of identifying the attribute framework is as follows:
let Θ = {C_1, C_2, …, C_N} be an identification frame comprising N mutually exclusive hypotheses, with the information sources assigning their values over 2^Θ. Considering compound elements {C_i, C_j} with i ≠ j and cardinality no greater than 2, the identification frame is represented as follows:
Ω = {{C_1}, …, {C_N}, {C_1, C_2}, …, {C_i, C_j}, …, {C_{N−1}, C_N}}.
Since some attributes are similar, their representations on the Gaussian profiles overlap, and compound hypotheses are therefore introduced to represent such cases.
The member calculation process is as follows:
for each object x, membership degrees in the different classes are calculated from the class variance and the sample mean ε, as follows:
the generation probabilities calculated from these membership degrees are divided into single-membership generation probabilities and composite-membership generation probabilities: the single-membership generation probability is represented directly by the membership degree of the class, while the composite-membership generation probability is calculated as the minimum t-norm combination of the two classes' membership degrees;
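The membership formula itself is not reproduced above, so the following sketch rests on an assumption: a Gaussian-style membership built from a class's sample mean and variance, with the composite membership of a two-class hypothesis taken as the minimum t-norm of the two single memberships. The function names are hypothetical:

import math

def gaussian_membership(x, mean, var):
    # Assumed Gaussian-style membership of sample x in a class
    # summarized by its sample mean and variance.
    return math.exp(-((x - mean) ** 2) / (2.0 * var))

def composite_membership(mu_i, mu_j):
    # Composite membership of a two-class hypothesis {C_i, C_j},
    # taken as the minimum t-norm of the single memberships.
    return min(mu_i, mu_j)

mu1 = gaussian_membership(5.1, mean=5.0, var=0.25)
mu2 = gaussian_membership(5.1, mean=6.0, var=0.25)
print(mu1, mu2, composite_membership(mu1, mu2))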
the single generation probability flow is as follows:
let the combined membership be given, and denote the sample to be tested accordingly; its generation probability is then expressed as follows:
the training sample distribution process is as follows:
there are n groups of training samples, with n a multiple of m; ε_P(C_i) and σ_P(C_i) denote in turn the sample mean and variance of attribute C_i for the j-th group of samples, and the training set distribution is defined as
Wherein,
the distribution of the training set samples follows a Gaussian distribution. Supposing the k-th sample in the database is to be tested, a test set is constructed from m elements according to the time-series model; the classification criterion of this paper groups the samples m at a time and is defined as
The distribution flow of the test samples is as follows:
let the test set contain n elements, with ε_Q({C_i}) and σ_Q({C_i}) denoting in turn the mean and variance of the samples, which determine the quality of C_i for the j-th sample group; like the training set distribution, the test set distribution is defined as
the distribution of these samples likewise follows a Gaussian distribution, and the method uses the BJS divergence to measure the difference between the training set and the test set;
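A sketch of this step under the stated Gaussian assumption: each group is summarized by its sample mean and variance, the two Gaussians are discretized over a shared grid so they become mass vectors (the discretization is an assumption, not specified by the filing), and the vectors can then be compared with the bjs_divergence sketch given earlier:

import numpy as np

def gaussian_summary(samples):
    # Summarize one group of samples for one attribute by its
    # sample mean and variance, per the Gaussian assumption.
    samples = np.asarray(samples, float)
    return samples.mean(), samples.var()

def discretized_gaussian(mean, var, grid):
    # Discretize a Gaussian over a shared grid and normalize it,
    # so two distributions can be compared with bjs_divergence.
    pdf = np.exp(-((grid - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
    return pdf / pdf.sum()

grid = np.linspace(0.0, 10.0, 101)
p = discretized_gaussian(*gaussian_summary([4.8, 5.0, 5.2]), grid)
q = discretized_gaussian(*gaussian_summary([5.9, 6.1, 6.0]), grid)
# print(bjs_divergence(p, q))  # reuse the earlier BJS sketch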
the BJS divergence judging process comprises the following steps:
the single attribute BJS divergence discrimination is expressed as:
the multi-attribute BJS divergence discrimination is defined as follows:
wherein,
the population discrimination probability flow is as follows:
let the group discrimination probability under a single attribute be given; the corresponding group discrimination probability for a composite attribute is then represented as follows:
the weighted integration mechanism flow is as follows:
let Γ denote the weighted integration result; the heuristic algorithm is expressed through the factors α and β, defined as follows
Note that the learning factors (α, β) are different for different attribute classes;
the BPA build flow is as follows:
let M_k be the final BPA, which is also a weighted normalized expression, as follows:
wherein,
note that since Γ itself does not sum to 1, it is not a valid BPA; a normalization factor is therefore introduced to convert Γ into the final BPA;
combining BPA according to D-S rules
In combining the BPA, we combine the BPA for each individual information source generated according to equation [ eq14] using D-S rules, resulting in an overall BPA.
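Combining the per-source BPAs is an iterated application of Dempster's rule; a short sketch reusing the dempster_combine function given earlier:

from functools import reduce

def combine_all(bpas):
    # Fold Dempster's rule over the BPAs of all information sources
    # to obtain the overall BPA; the rule is commutative and
    # associative, so the order of combination does not matter.
    return reduce(dempster_combine, bpas)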
Probability transformation
After the total BPA is obtained, it is converted into individual decision probabilities according to equation [ eq11 ].
Terminal decision making
The final decision result is obtained through PPT, and the most probable sample category is output.
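Assuming that PPT here denotes the standard pignistic probability transformation of D-S theory, a minimal sketch of the final decision step is:

def pignistic_transform(bpa):
    # PPT: each singleton hypothesis receives an equal share of the
    # mass of every focal element containing it,
    # BetP(c) = sum over A with c in A of m(A) / |A|.
    betp = {}
    for focal, mass in bpa.items():
        share = mass / len(focal)
        for c in focal:
            betp[c] = betp.get(c, 0.0) + share
    return betp

bpa = {frozenset({"C1"}): 0.5, frozenset({"C2"}): 0.2,
       frozenset({"C1", "C2"}): 0.3}
betp = pignistic_transform(bpa)
print(max(betp, key=betp.get))  # most probable class: C1 (BetP = 0.65)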
The UCI repository (https://archive-beta.ics.uci.edu) is a classic source of machine learning test data, suitable for pattern recognition and machine learning research. We used 12 data sets from the UCI repository (Iris, Heart, distance, Wine, Australian, Climate, Hepatitis, Waveform, Parkinsons, Forest, Ionosphere, Spambase and Sonar) to compare WFIG-DSF with eight state-of-the-art classifiers, namely Naive Bayes (NB), the Nearest Mean Classifier (NMC), k-nearest neighbor (k-NN), decision tree (REPTree), Support Vector Machine (SVM), support vector machine with radial basis function kernel (SVM-RBF), multilayer perceptron (MLP) and Radial Basis Function Network (RBFN), and with four classification algorithms based on D-S theory, finding that the classification accuracy of WFIG-DSF is better than that of the other methods. The four D-S-theory-based classification algorithms are the k-nearest-neighbor D-S theory (kNNDST), the classifier based on the Normal Distribution (NDBC), evidence correction (Evicalib) and the weighted fuzzy D-S framework (WFDSF).
In classifier fusion, the classification results of different classifiers tend to be highly conflicting, sometimes even completely contradictory, which easily produces unreasonable fusion results, i.e., errors. The proposed fusion method handles such conflict situations very well, which is a major advantage of the method. When the selected data sets contain missing information, the proposed algorithm can handle it because it contains a time series and a BJS divergence signature: the missing values can be fitted with the corresponding variables in D-S theory. Specifically, if a datum is missing, the missing value is treated as an uncertain problem, its confidence in any specific class is 0, and the whole mass is assigned to the full set of classes, i.e., m(C) = 1. This step increases the group probability and decreases the individual probability. Apart from the missing value, the remaining attributes are collected periodically, in order to bring the classification results of the different classifiers closer to the true accuracy. By properly fitting and modeling partial inaccuracies, the degree of conflict between classifiers can be reduced. By combining the conventional attributes of the other classifiers, the relative reliability of each classifier is evaluated from its incompatibility with the others; finally, a classifier with a higher degree of conflict with the other classifiers is assigned a smaller relative reliability value.
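A one-line sketch of the missing-value handling described above, assuming it corresponds to the vacuous BPA of D-S theory (total ignorance, all mass assigned to the full set of classes):

def bpa_for_missing_value(classes):
    # Vacuous BPA for a missing attribute value: m(C) = 1 on the
    # whole frame, zero belief in any specific class.
    return {frozenset(classes): 1.0}

print(bpa_for_missing_value({"C1", "C2", "C3"}))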
As information becomes increasingly diversified, the problem of classifying information attributes has come onto the agenda. Machine learning and classifier methods are widely used in data fusion. This work provides a weighted fuzzy individual-generation and group-discrimination classification rule based on the Dempster-Shafer framework: BPAs act as multiple classifiers supporting different attribute sets, a new target classification and identification method is constructed, and fuzzy Bayes and BJS divergence theories are introduced to aggregate and classify the information sources. The probability that a data point enters a cluster is determined by the distance between the attribute's feature and the class centroid, and the fuzzy membership of the cluster is calculated for classification.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (7)
1. A classification method based on a Dempster-Shafer framework, characterized in that: the classification method comprises a discrimination frame, a mass function, the combination rule of evidence theory, posterior probability, fuzzy set theory, the BJS divergence between two BBAs, PPT and its single-case form, an identification attribute frame, membership calculation, single generation probability, training sample distribution, test sample distribution, BJS divergence judgment, group discrimination probability, a weighted integration mechanism and BPA construction.
2. The Dempster-Shafer framework-based classification method according to claim 1, characterized in that: the discrimination framework flow is as follows:
let Θ denote a set of mutually exclusive elements, which may contain any number of elements, e.g., Θ = {θ_1, θ_2, …, θ_j, …, θ_N}, where θ_j is an element or event of the identification frame Θ, N is the number of elements, and j = 1, 2, …, N. The set of all subsets of Θ is called the power set of Θ and is defined as follows:

2^Θ = {∅, {θ_1}, …, {θ_N}, {θ_1, θ_2}, …, Θ},

where ∅ represents the empty set and {θ_i, θ_j} indicates that event θ_i or θ_j occurs; whatever state the system is in, it can be represented by an element of 2^Θ;
the quality function flow is as follows:
let m be a mapping from the set 2^Θ to the interval [0, 1], m: 2^Θ → [0, 1], with the focal elements A being the subsets of the identification frame Θ to which m assigns positive mass; m is defined so that

m(∅) = 0 and Σ_{A⊆Θ} m(A) = 1.

The basic probability distribution function represents the initial allocation of belief established by the evidence: the basic probability of an event A is recorded as m(A) and represents the degree of belief the evidence places in A. m(A) is also commonly called a mass function, and it is constructed from data obtained by detection or given by experts from experience;
the flow of the combination rule of the evidence theory is as follows:
let E_1 and E_2 be two pieces of evidence under the identification frame Θ, with corresponding BPAs m_1 and m_2, and let A_i and B_j be their focal elements. Dempster's combination rule is defined as

m(A) = (1 / (1 − K)) · Σ_{A_i ∩ B_j = A} m_1(A_i) m_2(B_j) for A ≠ ∅, with m(∅) = 0,

where K = Σ_{A_i ∩ B_j = ∅} m_1(A_i) m_2(B_j).
The value of K represents the degree of conflict between the evidence sources. In most cases different data sources yield two or more different BPAs for the same question, and to calculate the belief and plausibility functions these BPAs must first be synthesized into a single BPA;
the posterior probability flow is as follows:
let X_i (i = 1, 2, …, p) be p independent features, and let X denote the p-dimensional feature vector. Y ∈ C = {C_1, C_2, …, C_N} is the class label associated with the state of X. The posterior probability P(Y | X_i), i = 1, …, p, Y ∈ {C_1, C_2, …, C_N}, can then be defined by Bayes' rule:

P(Y | X_i) = P(X_i | Y) P(Y) / Σ_{Y′ ∈ C} P(X_i | Y′) P(Y′).

In this study, we construct models using both individual generative models and group generative models. This approach provides a workable way to balance individual differences against overall differences, and fuzzy set theory is used as the criterion for describing membership.
3. The Dempster-Shafer framework-based classification method according to claim 1, characterized in that: the fuzzy set theory process is as follows:
let C_i (i = 1, 2, …, n) denote the classes; the basic probability of DST is defined over these classes.
FST is an extension of exact (crisp) set theory. Compared with exact sets, FST provides a criterion for judging the concepts of membership and non-membership; compared with standard Bayesian theory, it provides another flexible framework for modeling the uncertainty and complexity of practical applications. Because real-world samples are fuzzy, such factors sometimes cannot be modeled by classical probability theory and their definitions are not strict, which gives FST an advantage in use.
The BJS divergence theory is a generalization of the KL divergence theory: it measures the similarity of two probability distributions and resolves the asymmetry of the KL result. The BJS divergence has good symmetry and boundedness and is therefore well suited to evidence theory. Although its gradient is 0 at points where two distributions are far apart, most of the problems encountered in this research concern decisions between highly similar distributions, so using the BJS divergence has a definite advantage;
the BJS difference flow between the two BBAs is as follows:
suppose A_i is one of the elements of the identification frame, and suppose there are two BBAs m_1 and m_2 under the same frame. The BJS divergence between m_1 and m_2 is defined as:

BJS(m_1, m_2) = (1/2) [ Σ_i m_1(A_i) log( 2 m_1(A_i) / (m_1(A_i) + m_2(A_i)) ) + Σ_i m_2(A_i) log( 2 m_2(A_i) / (m_1(A_i) + m_2(A_i)) ) ].

After transformation, BJS can be represented as

BJS(m_1, m_2) = H( (m_1 + m_2) / 2 ) − (1/2) H(m_1) − (1/2) H(m_2),

wherein H(m_j) = −Σ_i m_j(A_i) log m_j(A_i) (i = 1, 2, …, M; j = 1, 2).
The BJS divergence is similar in form to the JS divergence, but it replaces the probability distribution function with a mass function; when all of the belief is assigned to singleton elements, the BBA becomes a probability distribution, and the BJS divergence then degenerates to the JS divergence;
the process of identifying the attribute framework is as follows:
let Θ = {C_1, C_2, …, C_N} be an identification frame comprising N mutually exclusive hypotheses, with the information sources assigning their values over 2^Θ. Considering compound elements {C_i, C_j} with i ≠ j and cardinality no greater than 2, the identification frame is represented as follows:
Ω = {{C_1}, …, {C_N}, {C_1, C_2}, …, {C_i, C_j}, …, {C_{N−1}, C_N}}.
Since some attributes are similar, their representations on the Gaussian profiles overlap, and compound hypotheses are therefore introduced to represent such cases.
4. The Dempster-Shafer framework-based classification method according to claim 1, characterized in that: the member calculation process is as follows:
for each object x, membership degrees in the different classes are calculated from the class variance and the sample mean ε, as follows:
the generation probabilities calculated from these membership degrees are divided into single-membership generation probabilities and composite-membership generation probabilities: the single-membership generation probability is represented directly by the membership degree of the class, while the composite-membership generation probability is calculated as the minimum t-norm combination of the two classes' membership degrees;
the single generation probability process is as follows:
let the combined membership be given, and denote the sample to be tested accordingly; its generation probability is then expressed as follows:
the training sample distribution process is as follows:
there are n groups of training samples, with n a multiple of m; ε_P(C_i) and σ_P(C_i) denote in turn the sample mean and variance of attribute C_i for the j-th group of samples, and the training set distribution is defined as
Wherein,
the distribution of the training set samples follows a Gaussian distribution. Supposing the k-th sample in the database is to be tested, a test set is constructed from m elements according to the time-series model; the classification criterion of this paper groups the samples m at a time and is defined as
5. The Dempster-Shafer framework-based classification method according to claim 1, characterized in that: the distribution flow of the test samples is as follows:
let the test set contain n elements, with ε_Q({C_i}) and σ_Q({C_i}) denoting in turn the mean and variance of the samples, which determine the quality of C_i for the j-th sample group; like the training set distribution, the test set distribution is defined as
the distribution of these samples likewise follows a Gaussian distribution, and the method uses the BJS divergence to measure the difference between the training set and the test set;
the BJS divergence judging process is as follows:
the single attribute BJS divergence discrimination is expressed as:
the multi-attribute BJS divergence discrimination is defined as follows:
wherein,
6. the Dempster-Shafer framework-based classification method according to claim 1, characterized in that: the population discrimination probability process is as follows:
let the group discrimination probability under a single attribute be given; the corresponding group discrimination probability for a composite attribute is then represented as follows:
7. the Dempster-Shafer framework-based classification method according to claim 1, characterized in that: the weighted integration mechanism flow is as follows:
let Γ denote the weighted integration result; the heuristic algorithm is expressed through the factors α and β, defined as follows
Note that the learning factors (α, β) are different for different attribute classes;
the BPA configuration flow is as follows:
let M_k be the final BPA, which is also a weighted normalized expression, as follows:
wherein,
note that since Γ itself does not sum to 1, it is not a valid BPA; a normalization factor is therefore introduced to convert Γ into the final BPA.