CN108921299A - Cost-sensitive classification method based on sequential three decisions - Google Patents

Publication number
CN108921299A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810623608.8A
Other languages
Chinese (zh)
Inventor
方宇
闵帆
杨新
刘忠慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a cost-sensitive classification method based on sequential three-way decisions. Sequential three-way decisions combine the advantages of information granulation and cost-sensitive learning: information granulation is the basis of human cognition and decision making, and cost is an important factor in information processing. A sequential three-way decision model for cost-sensitive learning is proposed. First, the relationship between information granulation and decision cost is defined and described. Then, from the perspective of the sequential decision process, a cost function is constructed from the cost matrices of different granularity levels. Finally, to balance the decision result cost against the decision process cost, two optimization problems are proposed and their meaning is explained theoretically. Experimental results demonstrate the validity of the algorithms and show the advantage of sequential three-way decisions in cost-sensitive classification problems.

Description

Cost-sensitive classification method based on sequential three-way decisions
Technical Field
The invention belongs to the technical field of data mining and machine learning, and relates to a cost-sensitive classification method based on sequential three-way decisions.
Background
Cost-sensitive learning is an important research topic in data mining and machine learning. Its main purpose is to deal with the various costs that arise in the decision-making process. Cost-sensitive learning problems are common in practice, for example in medical diagnosis, robotics, industrial processes and communication network fault diagnosis. According to Hunt et al., two main categories of cost are of concern in cost-sensitive learning research: the misclassification cost and the cost of testing object attributes. Turney classified the costs that arise in inductive concept learning and provided context for cost-sensitive learning research. Many studies indicate that cost-sensitive learning is important and necessary in the decision-making process.
Three-way decisions (3WD) have developed vigorously over the last decade as an important methodology for human decision cognition and rule learning. A three-way decision consists mainly of two closely interleaved tasks: trisecting and acting. Trisecting means dividing a universe of discourse into three pairwise disjoint regions (for example region I, region II and region III); acting means applying different strategies (for example strategy I, strategy II and strategy III) to the objects in the three regions. Many studies have constructed and interpreted these two tasks in different research contexts, producing a large number of specific models and applications of three-way decisions. Studies on extended and optimized models include decision-theoretic rough sets, probabilistic rough sets, game-theoretic rough sets, interval sets, fuzzy interval sets, three-way decisions in incomplete information systems, statistics-based three-way decisions, three-way concept lattices, and so on. Applications include clinical diagnosis, paper peer review, government and investment decisions, text classification, mail filtering, recommendation systems, cluster analysis, face recognition, attribute reduction, and so on.
Disclosure of Invention
The invention aims to provide a sequential three-way decision (S3WD) model for cost-sensitive learning, and thereby a cost-sensitive classification method based on sequential three-way decisions, and to verify the advantages of sequential three-way decisions on the cost-sensitive classification problem.
The invention is realized by the following technical scheme:
The cost-sensitive classification method based on sequential three-way decisions comprises the following operations:
1) The relationship between information granulation and decision cost is defined and described as follows:
1.1) In the S3WD model, the universe is assumed to consist of independent elements and to have n+1 (n ≥ 1) levels of granularity, indexed by the set {0, 1, 2, …, n}. The level sequence from n to 0 marks the granularity levels of the information granules from the coarsest to the finest. The descriptions of an object at the granularity levels are totally ordered, namely Des_0(x) ≤ Des_1(x) ≤ … ≤ Des_n(x), where Des_0(x) is the finest description of object x and Des_n(x) is the coarsest;
To trisect U_i at a particular level i (i ≥ 0, n ≥ 1), an evaluation function v_i(Des_i(x)) and a threshold pair (α_i, β_i) are introduced; Definition 1 and Definition 2 are given for the S3WD model;
1.2) The cost function in S3WD is defined from the perspective of granular computing, and information granulation is interpreted. Aggregating information granules of the same granularity gives an overall description of a system or problem; the set of these granules forms a granularity level, and the process of constructing a granularity level is called the granulation of the system or problem at that level;
Let [x]_A denote an information granule and g(A) the partition of the universe U, where A is a subset of the condition attribute set C; for the decision table, Definition 3 and Definition 4 give the construction and interpretation of its multi-granularity space;
1.3) In the proposed S3WD model, a series of attribute tests and deferred decisions precede a definite decision; the corresponding costs are the test cost and the delay cost, and the change of the cost function across granularity levels can be regarded as a repeatable sequential operation between two adjacent levels;
Given a decision table S with n+1 (n ≥ 1) levels of granularity, the S3WD cost structure on S is given by Definition 5;
2) From the perspective of the sequential decision process, a cost function is constructed from the cost matrices of the different granularity levels;
The division into three regions at 4 different granularities and the corresponding decision costs are shown in the table.
3) Two optimization problems and explanatory algorithms are provided for balancing the decision result cost and the decision process cost;
The decision result cost and the decision process cost are in a trade-off relationship, and a balance point between the two costs is sought with the following two models;
3.1) Minimum decision result cost sequential three-way decision model: given an upper limit on the decision process cost set by the decision maker, the object partition at the granularity level with the minimum decision result cost in the S3WD process is found, and the partition is made according to Definition 6;
3.2) Minimum decision process cost sequential three-way decision model: the object partition at the granularity level associated with the minimum decision process cost is made according to Definition 7;
By using the two models and the corresponding algorithms, the two costs can be balanced in practice so as to make decisions that fit the actual situation.
Compared with the prior art, the invention has the following beneficial technical effects:
(1) The method provides a cost-sensitive sequential three-way decision model under the concept of granular computing (GrC), motivated by, interpreted through and implemented with three-way decisions.
(2) Because different numbers of attributes give the system different descriptive power, equivalence classes are constructed on growing attribute sets, so that the information system is granulated and a multi-granularity space with an order relation is formed.
(3) The method studies the decision costs involved when sequential three-way decisions construct a multi-granularity space, provides a cost function based on the granulation process and the coarse-to-fine principle of information acquisition, constructs a reasonable cost structure for evaluating sequential three-way decisions by using cost matrices, and gives a reasonable interpretation of the threshold pair (α, β).
(4) As the granularity changes from coarse to fine, the decision result cost decreases non-monotonically while the decision process cost increases monotonically. Two optimization problems are therefore proposed, each with the following optimization target: the user sets an upper limit on one cost, the other cost is minimized, and the objects of the information system are trisected. The two problems reflect two different attitudes that people take toward risk in real life.
Detailed Description
The present invention will now be described in further detail with reference to specific examples, which are intended to be illustrative, but not limiting, of the invention.
1 Cost-sensitive sequential three-way decision model
This section discusses a cost-sensitive sequential three-way decision method driven by granular computing. First, the definition of sequential three-way decisions is introduced from the viewpoint of granular computing; then the structure of the multi-granularity space of the sequential three-way decision model is given; finally, the cost structure in sequential three-way decisions is given and the cost-sensitive sequential three-way decision model is proposed.
1.1 Sequential three-way decisions
The theory of sequential three-way decisions (S3WD) is a product of granular computing. Its goal is to provide a flexible mechanism and method that lets the user make appropriate decisions during sequential information granulation. Because the S3WD process requires less time and information, S3WD costs less and decides faster than 3WD. For example, the true state of an object is determined by examining its description or related information. However, such information is often incomplete and ambiguous: the set of conditions may be non-standard, and the description of the object may be vague. In that case the decision has to be deferred, and a definite decision can be made only when the information is detailed enough to support an accurate decision. This way of deciding is a basic principle of human cognition and decision making, and it is also the basis of the S3WD model.
In the S3WD model, the universe is assumed to consist of independent elements and to have n+1 (n ≥ 1) levels of granularity, indexed by the set {0, 1, 2, …, n}. The level sequence from n to 0 marks the granularity levels of the information granules from the coarsest to the finest. If an object x ∈ U has one description at each granularity level, these descriptions are totally ordered, namely Des_0(x) ≤ Des_1(x) ≤ … ≤ Des_n(x), where Des_0(x) is the finest description of object x and Des_n(x) is the coarsest. To trisect U_i at a particular level i (i ≥ 0, n ≥ 1), an evaluation function v_i(Des_i(x)) and a threshold pair (α_i, β_i) are introduced. The following basic definitions (Definitions 1 and 2) are given for the S3WD model.
Definition 1 Suppose the universe U has n+1 (n ≥ 1) levels of granularity, and v_i is a mapping from U_i to the totally ordered set (L_i, ≤_i). Given a threshold pair (α_i, β_i ∈ L_i) with β_i < α_i, at a particular level i, 1 ≤ i ≤ n, the objects can be divided into three pairwise disjoint regions, POS, NEG and BND,
where BND is the boundary region, and decisions on the objects in it are deferred. As more detailed information is obtained from the lower levels, the boundary region gradually shrinks and its objects are moved from BND into POS and NEG. Finally, S3WD makes a simple two-way decision at level 0.
Definition 2 At level 0, the objects can be divided into two disjoint regions,
where the threshold γ_0 ∈ L_0 is the value on which the division into the two regions is based.
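The region formulas of Definitions 1 and 2 are not reproduced in this text. The following is a minimal sketch of how such a division is commonly written, assuming the usual three-way rule: an object goes to POS when its evaluation is at least α, to NEG when it is at most β, and to BND otherwise. The function names trisect and two_way_split, the numeric evaluation values and the comparison directions are assumptions for illustration, not the patent's own formulas.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple


def trisect(
    objects: Iterable[Hashable],
    evaluate: Callable[[Hashable], float],
    alpha: float,
    beta: float,
) -> Tuple[Set[Hashable], Set[Hashable], Set[Hashable]]:
    """Split objects into (POS, BND, NEG) using thresholds beta < alpha.

    Assumed rule: value >= alpha -> POS, value <= beta -> NEG,
    anything in between -> BND (decision deferred).
    """
    if not beta < alpha:
        raise ValueError("expected beta < alpha")
    pos, bnd, neg = set(), set(), set()
    for x in objects:
        value = evaluate(x)
        if value >= alpha:
            pos.add(x)
        elif value <= beta:
            neg.add(x)
        else:
            bnd.add(x)
    return pos, bnd, neg


def two_way_split(
    objects: Iterable[Hashable],
    evaluate: Callable[[Hashable], float],
    gamma: float,
) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Level-0 split into (POS, NEG); assumed rule: value >= gamma -> POS."""
    items = list(objects)
    pos = {x for x in items if evaluate(x) >= gamma}
    neg = set(items) - pos
    return pos, neg


# Hypothetical evaluation values for three objects.
scores = {"x1": 0.9, "x2": 0.5, "x3": 0.2}
pos, bnd, neg = trisect(scores, scores.get, alpha=0.75, beta=0.4)
final_pos, final_neg = two_way_split(bnd, scores.get, gamma=0.5)
```

In a sequential setting, only the objects left in BND would be passed on to the next, finer level, and the two-way split would be applied to whatever remains at level 0.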
1.2 Construction of the multi-granularity space for sequential three-way decisions
To define the cost function in S3WD from the perspective of granular computing, information granulation must first be given a reasonable interpretation. An information granule is a subset of the objects of U that describes a part of a system or problem, and the granularity of an information granule expresses its level of generalization or abstraction. Aggregating information granules of the same granularity gives a global description of the system or problem; the set of these granules forms a granularity level, and the process of constructing a granularity level is called the granulation of the system or problem at that level. A coarse granule, in turn, is made up of several finer granules.
An equivalence class [x]_A is usually used as an information granule. The descriptive power over the objects grows as the number of attributes increases, so the coarse-to-fine granulation process reflects the concept and basic method of multi-granularity. The multi-granularity space of S3WD can be constructed from different equivalence classes and the partitions they induce.
Let [x]_A denote an information granule and g(A) the partition of the universe U, where A is a subset of the condition attribute set C. For the decision table, granulation is defined as follows:
Definition 3 Given a decision table S = (U, At = C ∪ D, {V_a | a ∈ At}, {I_a | a ∈ At}) with n+1 (n ≥ 1) levels of granularity, the granulation G on the decision table S is defined in terms of the partitions g(A_i),
where g(A_i) is the partition of U into information granules of one uniform granularity, and A_i, 1 ≤ i ≤ n, is a subset of the condition attributes.
Example 1 Given a decision table S, as shown in Table 1, a multi-granularity space can be constructed for the decision system according to Definition 3.
TABLE 1 Decision information table
Table 1 contains 6 objects, 2 decision labels and 4 condition attributes. Equivalence classes are constructed incrementally on attribute sets that grow from 1 to 4 attributes. Let g(A_i), 0 ≤ i ≤ 3, be the partition into information granules at the corresponding granularity level. A 4-level granularity space can then be constructed. For example, assume the simple attribute-adding sequence A_0 = {a_1}, A_1 = {a_1, a_2}, A_2 = {a_1, a_2, a_3}, A_3 = {a_1, a_2, a_3, a_4}. The coarsest granularity is g(A_0) and the finest is g(A_3). The granulation is as follows:
Layer 3, g(A_0) = g({a_1}) = {{x_1, x_2, x_4, x_6}, {x_3, x_5}};
Layer 2, g(A_1) = g({a_1, a_2}) = {{x_1, x_2, x_6}, {x_3, x_5}, {x_4}};
Layer 1, g(A_2) = g({a_1, a_2, a_3}) = {{x_1, x_2, x_6}, {x_3}, {x_5}, {x_4}};
Layer 0, g(A_3) = g({a_1, a_2, a_3, a_4}) = {{x_1, x_2}, {x_3}, {x_5}, {x_4}, {x_6}}.
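The granulation in Example 1 can be sketched in a few lines of Python. Table 1 itself is not reproduced in this text, so the attribute values below are hypothetical: they were chosen only so that the resulting partitions match the four layers listed above. The names TABLE, ATTRIBUTE_CHAIN, LEVELS and granulate are likewise illustrative.

```python
from collections import defaultdict

# Hypothetical attribute values (Table 1 is not available); they are chosen
# only so that the partitions below agree with the layers of Example 1.
TABLE = {
    "x1": {"a1": 0, "a2": 0, "a3": 0, "a4": 0},
    "x2": {"a1": 0, "a2": 0, "a3": 0, "a4": 0},
    "x3": {"a1": 1, "a2": 0, "a3": 0, "a4": 0},
    "x4": {"a1": 0, "a2": 1, "a3": 0, "a4": 0},
    "x5": {"a1": 1, "a2": 0, "a3": 1, "a4": 0},
    "x6": {"a1": 0, "a2": 0, "a3": 0, "a4": 1},
}


def granulate(table, attributes):
    """Partition the objects into equivalence classes on `attributes`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)  # objects with equal keys are indiscernible
        blocks[key].add(obj)
    return {frozenset(block) for block in blocks.values()}


# A_0 ... A_3: one more attribute is added at each step (coarse to fine).
ATTRIBUTE_CHAIN = [
    ["a1"],
    ["a1", "a2"],
    ["a1", "a2", "a3"],
    ["a1", "a2", "a3", "a4"],
]
LEVELS = [granulate(TABLE, attrs) for attrs in ATTRIBUTE_CHAIN]

for index, partition in enumerate(LEVELS):
    print(f"g(A_{index}):", sorted(sorted(block) for block in partition))
```

Each added attribute can only split existing granules, which is why the number of granules never decreases along the chain.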
In Example 1, it can be observed that two adjacent levels are related by a partial order, and that this partial order is transitive.
Definition 4 Given a decision table S with n+1 (n ≥ 1) levels of granularity, let g(A_i) denote the granulation at level i and g(A_j) the granulation at level j. The order relation between the two granulations is defined through containment between their granules,
where P and Q are granules of the respective granulations.
In Example 1, g(A_0), g(A_1), g(A_2), g(A_3) is a granulation process that forms 4 levels of granularity. By Definition 4, g(A_1) is a refinement of g(A_0): g(A_0) has two granules, {x_1, x_2, x_4, x_6} and {x_3, x_5}, and g(A_1) has three granules, {x_1, x_2, x_6}, {x_3, x_5} and {x_4}. For any granule Q in g(A_0), a granule P in g(A_1) can be found such that P ⊆ Q. For example, {x_1, x_2, x_6} ⊆ {x_1, x_2, x_4, x_6} and {x_4} ⊆ {x_1, x_2, x_4, x_6}. The same relation is obtained for the other adjacent levels, and from this it is easy to show that it also holds between non-adjacent levels (proof omitted). The relation defined in Definition 4 is therefore transitive.
Definitions 3 and 4 give the basic semantics for constructing and interpreting the multi-granularity space of a decision table. The cost structure and the threshold pair are introduced on the basis of this construction and understanding of the S3WD granularity levels.
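The order relation between granulation levels can be checked mechanically. The sketch below uses the common definition of partition refinement (every granule of the finer partition lies inside some granule of the coarser one), which is assumed here to be what Definition 4 expresses; the function name refines is illustrative. The four partitions are the ones listed in Example 1.

```python
def refines(fine, coarse):
    """Return True if every granule of `fine` is contained in a granule of `coarse`."""
    return all(any(p <= q for q in coarse) for p in fine)


# Partitions g(A_0) ... g(A_3) as listed in Example 1.
G0 = [{"x1", "x2", "x4", "x6"}, {"x3", "x5"}]
G1 = [{"x1", "x2", "x6"}, {"x3", "x5"}, {"x4"}]
G2 = [{"x1", "x2", "x6"}, {"x3"}, {"x5"}, {"x4"}]
G3 = [{"x1", "x2"}, {"x3"}, {"x5"}, {"x4"}, {"x6"}]

# Adjacent levels are ordered, and the order carries over to non-adjacent levels.
adjacent_ok = refines(G1, G0) and refines(G2, G1) and refines(G3, G2)
transitive_ok = refines(G2, G0) and refines(G3, G1) and refines(G3, G0)
print(adjacent_ok, transitive_ok)
```

The relation is not symmetric: the coarse partition g(A_0) does not refine g(A_1), because the granule containing x_4 is split at the finer level.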
1.3 Cost structure design of the sequential three-way decision model
There are mainly two kinds of cost in the S3WD model. One is the decision result cost (which can be seen as the risk caused by misclassifying objects), as shown in Table 2; the other is the decision process cost (which can be seen as the cost of describing an object or obtaining its attribute values). In most 3WD studies, however, the decision process cost is neglected. In the design of the cost structure the decision process cost is a key factor, and the advantage of S3WD as a decision method can be shown only if this cost is well defined.
TABLE 2 decision result cost matrix
In the proposed S3WD model, a series of attribute tests and deferred decisions (the decision process) precede a definite decision. The corresponding costs are the test cost and the delay cost, as shown in Table 3. The change of the cost function across granularity levels can be regarded as a repeatable sequential operation between two adjacent levels.
TABLE 3 decision process cost vector
Definition 5 Given a decision table S with n+1 (n ≥ 1) levels of granularity, the S3WD cost structure on S can be defined as:
COST = (COST_R, COST_P) (5)
where COST_R and COST_P denote the decision result cost and the decision process cost, respectively.
Assume that granulation on S follows the sequence of attribute sets A_0, A_1, …, A_n, where g(A_i), 0 ≤ i ≤ n, denotes the set of information granules at level n−i. At this level, the decision result cost for object x is:
COST_R = λ(a(x) | g(A_i)) (6)
The decision result cost of the set of information granules g(A_i) is given by equation (7),
where g(A_i) is as in Definition 3, a(x) denotes the decision made for object x, that is, one of a_P, a_N, a_B, and λ ∈ {λ_PP, λ_PN, λ_BP, λ_BN, λ_NP, λ_NN}.
The decision process cost for object x is:
COST_P(x) = (tc(x), dc(x)) (8)
where tc(x) and dc(x) denote the attribute test cost and the object delay cost, respectively, each expressed by its own cost function.
Note that at a given granularity the attributes of several information granules are tested simultaneously, and the attributes of the objects are considered mutually independent. The decision process cost of the set of information granules g(A_i) therefore follows as equation (11), under the assumption that S3WD starts dividing at level n, the coarsest level.
2 Example of the cost-sensitive sequential three-way decision model
This section demonstrates the decision-making process of S3WD through a worked example. The construction of the cost structure provides semantics and interpretation for the S3WD model, and the cost-sensitive S3WD model has both theoretical significance and application scenarios.
In Example 1, g(A_0), g(A_1), g(A_2), g(A_3) form the granulation. Assume that the cost functions are given in Table 2 and Table 3 and that, to simplify the computation, the decision result cost matrix is the same at every granularity level. The thresholds α = 0.75 and β = 0.4 can then be computed. The detailed computation of the thresholds is described in the literature on decision-theoretic rough sets and three-way decisions [7-8,11,14] and is not repeated here. Now, according to equation (7), the decision result cost at each granularity level of the decision table (shown in Table 1) is calculated.
(1) At level 3, the conditional probability for object x_1 is computed and, according to Definition 1, compared with the thresholds. All objects at this level are judged in the same way, which gives the division into three regions. According to equation (7):
COST_R(g(A_0)) = 0 × ($5 × 0.5) + 4 × ($1 × 0.5 + $2 × 0.5) + 2 × ($4 × 0.5) = $10
(2) At level 2, objects already assigned to POS or NEG are no longer considered; only the objects in BND at level 3 need further judgment. With the same calculation as at level 3, the three regions are obtained, and:
COST_R(g(A_1)) = $6.67
(3) At level 1, the three regions are obtained in the same way, and:
COST_R(g(A_2)) = $4
(4) At level 0, the three regions are obtained in the same way, and:
COST_R(g(A_3)) = $4
From equation (11), the decision process cost at each granularity level can be calculated.
1. At level 3, 1 attribute {a_1} is tested and 6 objects are judged; therefore COST_P(g(A_0)) = ($18, 12 min).
2. At level 2, 2 attributes {a_1, a_2} are tested and 4 objects are judged; therefore COST_P(g(A_1)) = ($32, 32 min).
3. At level 1, 3 attributes {a_1, a_2, a_3} are tested and 3 objects are judged; therefore COST_P(g(A_2)) = ($36, 30 min).
4. At level 0, 4 attributes {a_1, a_2, a_3, a_4} are tested and 3 objects are judged; therefore COST_P(g(A_3)) = ($51, 30 min).
Finally, the division into three regions at the 4 granularities and the corresponding decision costs are obtained, as shown in Table 4.
TABLE 4 Results of the sequential three-way decision example
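The thresholds α = 0.75 and β = 0.4 used in this example can be reproduced with the threshold formulas commonly used in decision-theoretic rough sets. Table 2 is not reproduced in this text, so the cost values below are an assumption: they are inferred from the dollar amounts that appear in the level-3 calculation ($5, $1, $2, $4, with zero cost for correct decisions) and happen to yield exactly the stated thresholds. The function name thresholds and the mapping of dollar amounts to λ symbols are illustrative.

```python
def thresholds(l_pp, l_bp, l_np, l_pn, l_bn, l_nn):
    """Threshold pair (alpha, beta) from a 3x2 decision result cost matrix.

    l_xy is the cost of taking action x (P accept, B defer, N reject)
    when the object's true state is y (P positive, N negative).
    """
    alpha = (l_pn - l_bn) / ((l_pn - l_bn) + (l_bp - l_pp))
    beta = (l_bn - l_nn) / ((l_bn - l_nn) + (l_np - l_bp))
    return alpha, beta


# Assumed cost matrix (in dollars), inferred from the worked example, not from Table 2.
COSTS = dict(l_pp=0, l_bp=1, l_np=4, l_pn=5, l_bn=2, l_nn=0)
ALPHA, BETA = thresholds(**COSTS)

# Level-3 decision result cost, reproducing the arithmetic of the example:
# 0 objects in POS, 4 in BND, 2 in NEG, each term weighted by 0.5.
level3_cost = (
    0 * (COSTS["l_pn"] * 0.5)
    + 4 * (COSTS["l_bp"] * 0.5 + COSTS["l_bn"] * 0.5)
    + 2 * (COSTS["l_np"] * 0.5)
)
print(ALPHA, BETA, level3_cost)
```

With a different cost matrix the same function gives different thresholds; a useful sanity check is that β < α must hold for the three regions to be well defined.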
3 Decision cost
In this section, the relationship between the decision result cost and the decision process cost and their respective characteristics are discussed; on that basis two optimization problems are proposed, and an explanatory algorithm is given for each.
3.1 Decision result cost and decision process cost
From the example in Section 2, it can be observed that COST_R shows an overall decreasing trend as the granularity becomes finer. Further analysis shows, however, that some attributes are reducible: even if they are tested, the classification accuracy does not improve, which makes COST_R non-monotonic. This non-monotonicity has been proved in the literature and is not detailed here.
COST_P shows an overall monotonically increasing trend, namely COST_P(g(A_n)) > COST_P(g(A_{n-1})) > … > COST_P(g(A_0)). There are two reasons for this property: 1. the test cost increases as the number of attributes increases; 2. the delay cost accumulates as the number of levels increases.
In general, the decision result cost and the decision process cost are in a trade-off relationship. A decision maker can decide quickly at a coarse granularity, where the decision process cost is low but the decision result cost is high; conversely, a decision maker can decide at a fine granularity, where the decision result cost is low and the classification accuracy is high but the decision process cost is high. Finding a balance point between the two costs is therefore the key to using the S3WD method effectively.
3.2 Minimum decision result cost sequential three-way decision model
In some practical applications the decision result cost is of more concern than the decision process cost, because high-risk decisions must be avoided. For example, when a suspected cancer patient is diagnosed, the patient first undergoes an X-ray examination. If the information obtained is not enough for the doctor to make a judgment, the patient needs a further MRI examination to avoid misdiagnosis (a high risk), which increases the decision process cost.
In this case, the optimization goal is the minimum decision result cost. One feasible approach is to let the decision maker set an upper limit on the decision process cost and then find the object partition (that is, the 3WD) at the granularity level with the smallest decision result cost in the S3WD process.
Definition 6 Given a decision table S with n+1 (n ≥ 1) levels of granularity and an upper limit on the decision process cost, in S3WD the object partition at a granularity level is associated with the minimum decision result cost if and only if conditions (1) and (2) are satisfied,
where G is as in Definition 3 and the two conditions refer to the three-way decisions at levels j and i, with 0 ≤ i < j ≤ n.
Algorithm 1 Minimum decision result cost S3WD algorithm
Input: a decision table S with n+1 (n ≥ 1) levels of granularity; the cost structure COST = (COST_R, COST_P); the upper limit on the decision process cost.
Output: min COST_R and the three regions POS, BND, NEG at the associated granularity level.
① begin
③ i = n + 1; U_{n+1} = U;
④ Compute the pair of thresholds (α, β);
⑤ while … and i > 0 do
⑥ i = i - 1;
⑦ for each … do
⑧ if … then
⑨ if … then
⑩ BND = U_i - POS - NEG;
U_i = BND;
Compute COST_P and COST_R
if … and … then
for each … do
if … then
if … then
Compute COST_R
return COST_R, POS, BND, NEG.
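The selection step behind the minimum decision result cost model can be illustrated with the costs from the example in Section 2. This is a simplified sketch, not Algorithm 1 itself: it assumes the per-level costs have already been computed and only picks the level, whereas the algorithm interleaves partitioning and cost evaluation. The names LayerCost, EXAMPLE and min_result_cost_layer, the separate limits for money and time, and the tie-breaking rule are assumptions.

```python
from typing import NamedTuple, Optional, Sequence


class LayerCost(NamedTuple):
    level: int          # granularity level (3 = coarsest, 0 = finest)
    result_cost: float  # COST_R in dollars
    test_cost: float    # monetary part of COST_P in dollars
    delay_min: float    # time part of COST_P in minutes


# Per-level costs taken from the worked example in Section 2.
EXAMPLE = [
    LayerCost(3, 10.0, 18.0, 12.0),
    LayerCost(2, 6.67, 32.0, 32.0),
    LayerCost(1, 4.0, 36.0, 30.0),
    LayerCost(0, 4.0, 51.0, 30.0),
]


def min_result_cost_layer(
    layers: Sequence[LayerCost], max_test_cost: float, max_delay_min: float
) -> Optional[LayerCost]:
    """Pick the level with the lowest COST_R whose COST_P stays within both limits."""
    feasible = [
        c for c in layers
        if c.test_cost <= max_test_cost and c.delay_min <= max_delay_min
    ]
    if not feasible:
        return None
    # Assumption: ties on COST_R are broken in favour of the cheaper process.
    return min(feasible, key=lambda c: (c.result_cost, c.test_cost))


best = min_result_cost_layer(EXAMPLE, max_test_cost=40.0, max_delay_min=35.0)
print(best)
```

With these hypothetical limits the finest level is excluded by its $51 test cost, and level 1 is chosen because it already reaches the lowest decision result cost of $4.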
3.3 Minimum decision process cost sequential three-way decision model
Like the minimum decision result cost S3WD model, the minimum decision process cost S3WD model also has application scenarios. For reasons of space, only the definition and the algorithm are given below.
Definition 7 Given a decision table S with n+1 (n ≥ 1) levels of granularity and an upper limit on the decision result cost, in S3WD the object partition at a granularity level is associated with the minimum decision process cost if conditions (1) and (2) are satisfied,
where G is as in Definition 3 and the two conditions refer to the three-way decisions at levels j and i, with 0 ≤ i < j ≤ n.
Algorithm 2 Minimum decision process cost S3WD algorithm
Input: a decision table S with n+1 (n ≥ 1) levels of granularity; the cost structure COST = (COST_R, COST_P); the upper limit on the decision result cost.
Output: min COST_P and the three regions POS, BND, NEG at the associated granularity level.
① begin
③ i = n + 1; U_{n+1} = U;
④ Compute the pair of thresholds (α, β);
⑤ while … and i > 0 do
⑥ for each … do
⑦ if … then
⑧ if … then
⑨ BND = U_i - POS - NEG;
⑩ U_i = BND;
i = i - 1;
Compute COST_P and COST_R
if … and … then
for each … do
if … then
if … then
Compute COST_P
return COST_P, POS, BND, NEG.
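The dual selection, minimizing the decision process cost under an upper limit on the decision result cost, can be sketched in the same way. Again this is a simplified illustration on precomputed per-level costs from the example in Section 2, not Algorithm 2 itself. The name min_process_cost_layer, the tuple layout and the rule of comparing process costs by money first and time second are assumptions.

```python
# (level, COST_R in dollars, test cost in dollars, delay in minutes),
# taken from the worked example in Section 2.
EXAMPLE_COSTS = [
    (3, 10.0, 18.0, 12.0),
    (2, 6.67, 32.0, 32.0),
    (1, 4.0, 36.0, 30.0),
    (0, 4.0, 51.0, 30.0),
]


def min_process_cost_layer(layers, max_result_cost):
    """Pick the level with the lowest COST_P whose COST_R stays within the limit.

    Returns the chosen (level, COST_R, test cost, delay) tuple, or None
    when no level satisfies the limit.
    """
    feasible = [c for c in layers if c[1] <= max_result_cost]
    if not feasible:
        return None
    # Assumption: process costs are compared by money first, then by time.
    return min(feasible, key=lambda c: (c[2], c[3]))


choice = min_process_cost_layer(EXAMPLE_COSTS, max_result_cost=7.0)
print(choice)
```

With a hypothetical limit of $7 on the decision result cost, the coarsest level is ruled out and level 2 is chosen as the cheapest remaining process; tightening the limit to $5 moves the choice to level 1.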
The two models proposed in sections 3.2 and 3.3 reflect two risk attitudes and two decision strategies in the decision making process of a decision maker, and by applying the two models and corresponding algorithms, two costs can be balanced in real life so as to make a decision which accords with reality.
4 Experiments
Four UCI data sets were selected to verify the algorithms. The decision attribute of each data set is assumed to be binary, and the data sets have no missing values. The basic information of the data sets and the settings of the decision result cost and decision process cost are shown in Table 5.
TABLE 5 Basic information of the UCI data sets and cost upper-limit settings
Algorithms 1 and 2 were each run 3000 times on every data set. The chi-squared value χ² was used to verify the validity of the algorithms, the approximation quality to evaluate the classification performance, and the number of granularity levels to verify the effectiveness of sequential three-way decisions on the classification problem. Matlab R2016a was used as the implementation platform. The experimental statistics are shown in Table 6.
TABLE 6 Mean and standard deviation of the chi-squared value, approximation quality and number of granularity levels
The invention considers cost-sensitive learning and the S3WD process from the viewpoint of granular computing, and provides a cost-sensitive classification model and classification method under S3WD. The two types of cost in the S3WD model and the relationship between them are discussed, and on that basis two optimization problems and the related algorithms are proposed. Theoretical analysis and experimental results show that cost-sensitive S3WD has advantages in practical applications.

Claims (6)

1. The cost-sensitive classification method based on three sequential decisions is characterized by comprising the following operations:
1) the relationship between information graining and decision cost is defined and described as follows:
1.1) in the S3WD model, a domain is assumed to be composed of independent elements, a domain space has n +1, n is more than or equal to 1 layer of granularity, and a {0,1, 2.., n } index set identifies an n +1 layer; layer sequences n to 0, marking the granularity layers of the information particles from the thickest to the finest; there is a global ordering relationship ≦ for multiple descriptions of the corresponding granularity layer, i.e.: des0(x)≤Des1(x)≤…≤Desn(x),Des0(x) Is the finest description of object x, Desn(x) Is the coarsest description;
for a specific layer pair UiI is more than or equal to 0 and n is more than or equal to 1, the three divisions are carried out, and an evaluation function v is introducedi(Desi(x) And threshold value pairs (α)i,βi) Definition 1 and definition 2 are given for the S3WD model;
1.2) defining a cost function in S3WD from the perspective of grain calculation, and interpreting information granulation; obtaining an overall description of a system or problem by aggregating information grains having the same grain size, the aggregation of these grains forming a grain size, the process of constructing a grain size being referred to as granulation of the system or problem at a specific layer;
let [ x ] in]AExpressed as information particles, g (a) is the partition of domain of discourse U, where a is expressed as a subset of conditional attribute C; for the decision table, giving a multi-granularity space construction and interpretation definition 3 and a definition 4 of the decision table;
1.3) in the proposed S3WD model, a series of attribute tests and delayed decisions take place before a definite decision is made, and the corresponding costs are test costs and delay costs; the change of the cost function across different granularity layers can be regarded as a repeatable sequential operation between two adjacent layers;
given a decision table $S$ with $n+1$ ($n \ge 1$) granularity layers, the S3WD cost structure on $S$ is given by definition 5;
2) from the perspective of the sequential decision process, a cost function is constructed using the cost matrices of different granularity levels;
the partition into three regions at 4 different granularities and the corresponding decision costs are shown in the table;
3) two optimization problems and an illustrative algorithm are provided for balancing the decision result cost and the decision process cost;
the decision result cost and the decision process cost are in a trade-off relationship, and a balance point between the two costs is sought by adopting the following two models;
3.1) a minimum-decision-result-cost sequential three-way decision model: given an upper limit on the decision process cost set by the decision maker, the object partition at the granularity layer with the minimum decision result cost in the S3WD process is found, the partition being determined by definition 6;
3.2) a minimum-decision-process-cost sequential three-way decision model, in which the object partition at the granularity layer with the minimum decision process cost is determined by definition 7;
by using the two models and the corresponding algorithms, the two costs can be balanced in practical applications so as to make decisions that match reality.
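By way of non-limiting illustration, the coarse-to-fine procedure of step 1) can be sketched as follows. The sketch assumes numeric evaluation scores per layer, acceptance when a score is at least the upper threshold, rejection when it is at most the lower threshold, and a single threshold at layer 0. The function name `sequential_three_way` and all data values are hypothetical and do not appear in the claims.

```python
def sequential_three_way(objects, layer_scores, thresholds, gamma):
    """Run a coarse-to-fine three-way decision and return (POS, NEG).

    layer_scores: {layer: {object: score}} for layers n down to 0 (assumed input).
    thresholds:   {layer: (alpha, beta)} for layers n down to 1, with beta < alpha.
    gamma:        single threshold used for the final two-way decision at layer 0.
    """
    pos, neg = set(), set()
    boundary = set(objects)  # every object starts undecided
    n = max(layer_scores)
    for layer in range(n, 0, -1):  # coarsest layer first
        alpha, beta = thresholds[layer]
        if not beta < alpha:
            raise ValueError("each threshold pair must satisfy beta < alpha")
        scores = layer_scores[layer]
        accepted = {x for x in boundary if scores[x] >= alpha}
        rejected = {x for x in boundary if scores[x] <= beta}
        pos |= accepted
        neg |= rejected
        boundary -= accepted | rejected  # the rest is deferred to a finer layer
    # Layer 0: the remaining boundary objects receive a two-way decision.
    final_scores = layer_scores[0]
    accepted = {x for x in boundary if final_scores[x] >= gamma}
    pos |= accepted
    neg |= boundary - accepted
    return pos, neg


# Hypothetical scores: only still-undecided objects need a score at finer layers.
layer_scores = {
    2: {"x1": 0.9, "x2": 0.5, "x3": 0.6, "x4": 0.1},
    1: {"x2": 0.85, "x3": 0.5},
    0: {"x3": 0.4},
}
thresholds = {2: (0.8, 0.2), 1: (0.8, 0.2)}
pos, neg = sequential_three_way(["x1", "x2", "x3", "x4"], layer_scores, thresholds, gamma=0.5)
print(sorted(pos), sorted(neg))
```

In this example the boundary region shrinks from two objects after layer 2 to one object after layer 1, and layer 0 resolves the last object.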
2. The sequential three-decision based cost-sensitive classification method according to claim 1, characterized in that said definition 1 is:
suppose that the universe of discourse $U$ has $n+1$ ($n \ge 1$) granularity layers and that $v_i$ is an evaluation function from $U_i$ to the totally ordered set $(L_i, \le_i)$; given a threshold pair $(\alpha_i, \beta_i)$ with $\alpha_i, \beta_i \in L_i$ and $\beta_i \prec_i \alpha_i$, at a specific layer $i$, $1 \le i \le n$, $L_i$ can be divided into three pairwise disjoint regions:
wherein BND is the boundary region, and decisions on all objects in the boundary region are deferred; as more detailed information is obtained from the lower layers, the boundary region gradually shrinks, and objects move from BND into POS and NEG; finally, S3WD makes a simple two-way decision at layer 0;
definition 2: at layer 0, $L_0$ can be divided into two disjoint regions:
wherein the threshold $\gamma_0 \in L_0$, indicating that the two regions are partitioned based on this threshold.
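As a non-limiting sketch of definitions 1 and 2, the two partitions can be written as small functions. The region formulas are not reproduced in the text above, so the comparison rules used here (accept at or above the upper threshold, reject at or below the lower threshold, and a single cut at layer 0) are assumptions taken from the usual three-way decision convention. The function names are hypothetical.

```python
def three_way_partition(objects, evaluate, alpha, beta):
    """Return (POS, BND, NEG) for one layer, given a threshold pair with beta < alpha."""
    if not beta < alpha:
        raise ValueError("the threshold pair must satisfy beta < alpha")
    pos, bnd, neg = set(), set(), set()
    for x in objects:
        value = evaluate(x)
        if value >= alpha:      # assumed acceptance rule
            pos.add(x)
        elif value <= beta:     # assumed rejection rule
            neg.add(x)
        else:                   # strictly between the thresholds: defer
            bnd.add(x)
    return pos, bnd, neg


def two_way_partition(objects, evaluate, gamma):
    """Return (POS, NEG) at layer 0 using a single threshold gamma."""
    items = list(objects)
    pos = {x for x in items if evaluate(x) >= gamma}
    neg = set(items) - pos
    return pos, neg


# Hypothetical evaluation values for three objects.
scores = {"x1": 0.9, "x2": 0.5, "x3": 0.1}
pos, bnd, neg = three_way_partition(scores, scores.get, alpha=0.7, beta=0.3)
final_pos, final_neg = two_way_partition(bnd, scores.get, gamma=0.5)
```

The three regions returned by `three_way_partition` are pairwise disjoint and together cover the input, which matches the requirement stated in definition 1.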
3. The sequential three-decision based cost-sensitive classification method according to claim 1, characterized in that said definition 3 is:
given a decision table $S = (U, At = C \cup D, \{V_a \mid a \in At\}, \{I_a \mid a \in At\})$, assuming that there are $n+1$ ($n \ge 1$) granularity layers, the granulation on the decision table $S$ is defined as follows:
wherein $g(A_i)$ is a partition of $U$ into a set of information granules of uniform granularity, and $A_i$, $1 \le i \le n$, is a subset of the condition attributes and satisfies a condition,
the definition 4 is:
given a decision table $S$ with $n+1$ ($n \ge 1$) granularity layers, in which one granulation is that of the i-th layer and the other, i-1-j-1-n, is that of the j-th layer, if the following relationship is satisfied, the following holds:
wherein P and Q are granules of the two granulations, respectively. (4)
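As a non-limiting illustration of definitions 3 and 4, granulation by an attribute subset can be sketched as grouping objects that agree on every attribute in the subset. The sketch assumes that the layers use nested attribute subsets, so that a larger subset yields a finer partition; this nesting and the names `granulate` and `is_refinement` are assumptions, not text from the claims.

```python
from collections import defaultdict


def granulate(table, attributes):
    """Partition the objects of `table` into granules that agree on `attributes`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)  # description of obj on the subset
        blocks[key].add(obj)
    return [frozenset(block) for block in blocks.values()]


def is_refinement(fine, coarse):
    """True if every granule of `fine` lies inside some granule of `coarse`."""
    return all(any(p <= q for q in coarse) for p in fine)


# Hypothetical decision table restricted to two condition attributes.
table = {
    "x1": {"a": 0, "b": 0},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 1},
    "x4": {"a": 1, "b": 1},
}
coarse = granulate(table, ["a"])       # fewer attributes, larger granules
fine = granulate(table, ["a", "b"])    # more attributes, smaller granules
print(is_refinement(fine, coarse))     # True
print(is_refinement(coarse, fine))     # False
```

Adding attribute `b` splits the granule containing `x1` and `x2`, which is the sense in which a lower layer carries more detailed information.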
4. The sequential three-decision based cost-sensitive classification method according to claim 1, characterized in that said definition 5 is:
$COST = (COST_R, COST_P)$ (5)
wherein $COST_R$ and $COST_P$ respectively denote the decision result cost and the decision process cost;
for the granulation on $S$, there is an attribute set sequence $A_0, A_1, \dots, A_n$, where $g(A_i)$, $0 \le i \le n$, denotes the set of information granules at layer $n-i$; at this layer, the decision result cost for object $x$ is:
$COST_R = \lambda(a(x) \mid g(A_i))$ (6)
the decision result cost of the set of information granules $g(A_i)$ is:
wherein $g(A_i)$ is as in definition 3, and $a(x)$ denotes the decision made for object $x$, namely one of $a_P$, $a_N$, $a_B$, with $\lambda \in \{\lambda_{PP}, \lambda_{PN}, \lambda_{BP}, \lambda_{BN}, \lambda_{NP}, \lambda_{NN}\}$;
the decision process cost for object $x$ is:
$COST_P(x) = (tc(x), dc(x))$ (8)
wherein $tc(x)$ and $dc(x)$ respectively denote the attribute test cost and the object delay cost, and the cost function is expressed as:
at a given granularity, the attributes of multiple information granules are tested simultaneously, and the attributes of the objects considered are mutually independent, so that the decision process cost of the set of information granules $g(A_i)$ at the i-th layer can be obtained (assuming that S3WD starts partitioning at the n-th layer, i.e. the coarsest layer, let
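As a non-limiting illustration of the cost structure in definition 5, the two cost components can be computed as below. The aggregate formulas are not reproduced in the text above, so this sketch assumes that the result cost of a layer is the sum of the per-object matrix entries, that the test cost is the sum of the costs of the attributes tested, and that each deferred object incurs a fixed delay cost. The cost values and function names are hypothetical.

```python
# Hypothetical cost matrix, indexed by (action, true class); P = accept,
# B = defer, N = reject, matching the six lambda entries named in definition 5.
COST_MATRIX = {
    ("P", "P"): 0.0, ("P", "N"): 8.0,
    ("B", "P"): 2.0, ("B", "N"): 2.0,
    ("N", "P"): 10.0, ("N", "N"): 0.0,
}


def result_cost(decisions, labels, cost_matrix=COST_MATRIX):
    """Sum the matrix entry lambda(action, true class) over all decided objects."""
    return sum(cost_matrix[(decisions[x], labels[x])] for x in decisions)


def process_cost(tested_attributes, test_costs, delayed_objects, delay_cost):
    """Return the pair (tc, dc): total test cost and total delay cost (assumed additive)."""
    tc = sum(test_costs[a] for a in tested_attributes)
    dc = delay_cost * len(delayed_objects)
    return tc, dc


decisions = {"x1": "P", "x2": "B", "x3": "N"}
labels = {"x1": "P", "x2": "N", "x3": "P"}
cost_r = result_cost(decisions, labels)  # 0.0 + 2.0 + 10.0
cost_p = process_cost(["a", "b"], {"a": 1.0, "b": 3.0, "c": 5.0}, {"x2"}, delay_cost=0.5)
cost = (cost_r, cost_p)  # the pair (COST_R, COST_P) of equation (5)
```

Keeping the two components separate, rather than summing them, leaves the trade-off between them open for the optimization models of definitions 6 and 7.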
5. The sequential three-decision based cost-sensitive classification method according to claim 1, characterized in that said definition 6 is:
given a decision table $S$ with $n+1$ ($n \ge 1$) granularity layers and an upper limit on the decision process cost, in S3WD the object partition at a granularity layer is the one associated with the minimum decision result cost if and only if the following conditions are satisfied:
(1)
(2)
wherein $G$ has the meaning given in definition 3, and the remaining symbols respectively denote the three-way decisions at layers $j$ and $i$, where $0 \le i < j \le n$.
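As a non-limiting illustration of the model in definition 6, layer selection can be sketched as a constrained minimum over per-layer cost pairs. The two conditions of the definition are not reproduced in the text above, so the sketch assumes that the process cost of each layer has been reduced to a single number comparable with the upper limit. The function name and the cost values are hypothetical.

```python
def min_result_cost_layer(layer_costs, process_cost_limit):
    """Pick the layer with the smallest result cost whose process cost is within the limit.

    layer_costs: {layer: (result_cost, process_cost)}, assumed precomputed per layer.
    Returns the chosen layer index, or None when no layer satisfies the limit.
    Ties on result cost are broken in favour of the lower process cost.
    """
    feasible = [layer for layer, (_, pc) in layer_costs.items() if pc <= process_cost_limit]
    if not feasible:
        return None
    return min(feasible, key=lambda layer: layer_costs[layer])


# Hypothetical costs: finer layers lower the result cost but raise the process cost.
layer_costs = {3: (20.0, 2.0), 2: (14.0, 5.0), 1: (9.0, 9.0), 0: (6.0, 15.0)}
print(min_result_cost_layer(layer_costs, process_cost_limit=10.0))  # 1
```

With a limit of 10.0, layer 0 is excluded because its process cost is 15.0, and layer 1 has the smallest result cost among the remaining layers.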
6. The sequential three-decision based cost-sensitive classification method according to claim 1, characterized in that said definition 7 is:
given a decision table $S$ with $n+1$ ($n \ge 1$) granularity layers and an upper limit on the decision result cost, in S3WD the object partition at a granularity layer is the one associated with the minimum decision process cost when the following conditions are satisfied:
(1)
(2)
wherein $G$ has the meaning given in definition 3, and the remaining symbols respectively denote the three-way decisions at layers $j$ and $i$, where $0 \le i < j \le n$.
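As a non-limiting illustration of the model in definition 7, the search can proceed from the coarsest layer downward and stop at the first layer whose result cost is within the upper limit. This early stop returns the minimum process cost only under the assumption, made here and not stated in the claims, that the process cost never decreases as the layers become finer. The function name and the cost values are hypothetical.

```python
def first_layer_within_result_cost(layers_coarse_to_fine, result_cost_limit):
    """Scan layers from coarsest to finest and stop at the first acceptable one.

    layers_coarse_to_fine: sequence of (layer, result_cost, process_cost) tuples,
    ordered from layer n down to layer 0, with non-decreasing process cost (assumed).
    Returns (layer, process_cost), or None when no layer meets the limit.
    """
    for layer, rc, pc in layers_coarse_to_fine:
        if rc <= result_cost_limit:
            return layer, pc  # finer layers could only cost more to reach
    return None


# Hypothetical per-layer costs, coarsest layer first.
layers = [(3, 20.0, 2.0), (2, 14.0, 5.0), (1, 9.0, 9.0), (0, 6.0, 15.0)]
print(first_layer_within_result_cost(layers, result_cost_limit=15.0))  # (2, 5.0)
```

Stopping early also reflects the sequential character of S3WD: attributes of finer layers are never tested once an acceptable partition has been reached.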
CN201810623608.8A 2018-06-15 2018-06-15 Cost-sensitive classification method based on sequential three decisions Pending CN108921299A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810623608.8A CN108921299A (en) 2018-06-15 2018-06-15 Cost-sensitive classification method based on sequential three decisions

Publications (1)

Publication Number Publication Date
CN108921299A (en) 2018-11-30

Family

ID=64421286

Country Status (1)

Country Link
CN (1) CN108921299A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111461241A (en) * 2020-04-08 2020-07-28 南通大学 Multi-target cost sensitive attribute reduction algorithm based on specific classes
CN111814737A (en) * 2020-07-27 2020-10-23 西北工业大学 Target intention identification method based on three sequential decisions
CN111814737B (en) * 2020-07-27 2022-02-18 西北工业大学 Target intention identification method based on three sequential decisions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181130
