CN107273912A - An active learning method based on three-way decision theory - Google Patents

An active learning method based on three-way decision theory

Info

Publication number
CN107273912A
CN107273912A (application CN201710326684.8A)
Authority
CN
China
Prior art keywords
sample
data
value
active learning
threshold
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710326684.8A
Other languages
Chinese (zh)
Inventor
胡峰
张苗
张清华
于洪
程麟焰
余春霖
靳义林
李智星
王进
雷大江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Posts and Telecommunications
Original Assignee
Chongqing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Posts and Telecommunications
Priority to CN201710326684.8A priority Critical patent/CN107273912A/en
Publication of CN107273912A publication Critical patent/CN107273912A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An active learning algorithm based on three-way decisions is claimed in the present invention, which uses the idea of three-way decisions to solve the problem posed by currently unlabeled samples. It relates to rough sets, data mining, and related fields. First, the uncertainty of the unlabeled data is determined by the margin strategy. The unlabeled data are then divided by uncertainty into three different regions: the positive region, the negative region, and the boundary region. The data in each region are handled with a corresponding solution; the purpose is to select highly informative, strongly representative data for labeling. The labeled data are added to the training set and a new classifier is created. Learning proceeds by repeated training iterations until the preset number of iterations is reached or the desired evaluation criterion is met. The present invention can better improve all aspects of classifier performance.

Description

An active learning method based on three-way decision theory
Technical field
The invention belongs to the fields of rough sets, machine learning, and data mining, and in particular relates to an active learning method.
Background technology
During data analysis and processing, a model must be trained from known data (a training set). In concrete practice, we find that data can be obtained easily and efficiently, but these data are often unlabeled. With the arrival of the big-data era, data affect every aspect of people's lives, yet these data are redundant, cumbersome, and unlabeled. Obtaining labeled data directly is often infeasible: not only are the conditions not in place, but substantial money and time are required. However, we can start from the unlabeled data; how to process these data and bring their latent value into full play poses new requirements for existing technology.
Active learning is a kind of machine learning method that can effectively solve the above problem. An active learning method selects the most useful data and hands them to an expert for labeling, expanding the training set and creating a more effective model. Compared with traditional passive learning, this method selects highly informative, representative data for labeling, avoiding redundant and unnecessary additions of data. At the same time, it reduces the manpower and material resources needed to label data in bulk, lowering the cost of data analysis.
In 1974, Simon proposed ideas related to active learning. Valiant proved from a statistical perspective that selecting training examples can effectively reduce the amount of data needed for training. In recent years, more and more scholars have locked onto active learning as a research direction and proposed concepts and theories related to it. Compared with passive learning's way of randomly drawing data, an active learning algorithm selects useful samples for the user to label rather than passively receiving data. According to experimental results, under the premise of achieving the same accuracy, active selection can greatly reduce the number of samples required compared with random selection. The execution of an active learning algorithm can be divided into two processes. First: a sample selection algorithm picks out the most informative, most valuable samples for labeling, and the labeled samples are added to the labeled set. Second: a base classifier is created from the existing labeled data, a suitable evaluation index is selected, and the classification performance of the classifier is measured by supervised learning. The two processes are executed alternately, iterating continuously so that the performance of the classifier becomes optimal, or a number of iterations is set and iteration continues until the preset condition is reached.
According to the form in which unlabeled data are selected, active learning can be divided into two classes: stream-based active learning and pool-based active learning.
Stream-based active learning: this method is widely applied in natural language processing directions such as part-of-speech tagging and word-sense disambiguation. The learning algorithm judges the samples one by one in the form of a stream, and the judgment has only two possible outcomes: label, or do not label. The samples that need labeling are passed to an expert for labeling; the samples not labeled are discarded directly and never used again. A representative method is Query By Committee (QBC).
Pool-based active learning: whereas stream-based active learning requires sequential sampling and judges samples one by one, this method makes a unified judgment over the data in a certain range, selecting the top-k samples by some index for labeling. Of course, pool-based samples can also be handled with the stream-based learning method, drawing a small number of samples from the pool each time and judging them individually. Pool-based active learning is currently the most studied, most followed, and most widely applied method.
Sample selection strategies for active learning:
1. Uncertainty-based sample selection
Using an uncertainty measure, the samples whose class is hardest to determine are picked out and handed to a human expert for labeling. For a binary classification problem, a probabilistic model can be built and the samples whose posterior probability is closest to 0.5 are selected for labeling. For multi-class problems, the samples with the lowest confidence are usually selected, or, under the margin criterion, the samples with the smallest difference between the largest and the second-largest posterior probabilities. The uncertainty of a sample can also be computed from information entropy: the larger the entropy, the greater the uncertainty.
2. Sample selection based on expected error reduction
Each unlabeled sample is tentatively labeled, added to the training set, and a new classifier is trained; the sample whose addition reduces the classifier's expected error the most is selected for labeling. Because estimating the expected error reduction requires a very large amount of computation, this method is usually applied only to binary classification problems.
3. Sample selection based on committee query
The committee-query approach creates multiple models to form a committee; each model makes a decision on the unlabeled samples, and the samples on which the decisions are least consistent are regarded as the samples whose class is hardest to determine. Adding such samples to the set to be labeled minimizes the version space.
Three-way decision theory was developed by Yao Y. Y. on the basis of probabilistic rough sets and decision-theoretic rough sets. The probabilistic rough set model introduces two parameters, α and β, and divides the whole space into three regions: the positive region, the negative region, and the boundary region. Yao Y. Y. first proposed the related concepts of three-way decisions: the positive region means affirmation of a thing, the negative region means negation of a thing, and the boundary region means there is no full assurance about a thing and no decision can be made immediately. Three-way decisions endow probabilistic rough sets with new semantics, and their appearance provides a new way of thinking for decision problems.
Summary of the invention
The present invention seeks to address the above problems of the prior art by proposing an active learning method based on three-way decision theory that better improves all aspects of classifier performance and can be effectively used to handle the current situation in which a large number of class labels are missing. The technical scheme of the invention is as follows:
An active learning method based on three-way decision theory, comprising the following steps:
1) obtain a data set and call a random function to shuffle it, then divide it proportionally into a labeled data set, an unlabeled data set, and a test set;
2) train a naive Bayes classifier on the labeled data set, estimate posterior probabilities of the unlabeled data with the naive Bayes classifier, and compute the uncertainty of each unlabeled sample;
3) divide the unlabeled data by the magnitude of their uncertainty into three regions, namely the positive region, the negative region, and the boundary region; two division methods are involved: division by thresholds and division by sorting the uncertainty values;
4) handle the samples in the different regions of step 3) separately, select the informative, valuable samples for labeling, add the labeled samples to the training set, and train a new classifier for result testing on the test set; to test classifier performance, different evaluation criteria are used for verification.
Further, step 2) uses the margin strategy to estimate posterior probabilities of the unlabeled data, computes the difference of the posterior probabilities, and determines the uncertainty of each unlabeled sample:
D_value(x) = p(y_first | x, L) - p(y_second | x, L)   (1)
where D_value(x) represents the magnitude of the uncertainty, p(y_first | x, L) is the largest posterior probability, and p(y_second | x, L) is the second-largest posterior probability.
Further, dividing the unlabeled data into three regions by thresholds in step 3) specifically comprises: dividing according to thresholds threshold_α and threshold_β, defined as follows:
if D_value(x) ≤ threshold_α, then x ∈ POS(X)
if threshold_α < D_value(x) < threshold_β, then x ∈ BND(X)   (2)
if D_value(x) ≥ threshold_β, then x ∈ NEG(X)
where 0 ≤ threshold_α < threshold_β ≤ 1, and threshold_α and threshold_β can be determined from empirical values or a chosen confidence level.
Further, dividing by sorting the uncertainty values in step 3) specifically comprises: sorting the samples by the uncertainty measure D_value(x) from small to large and dividing the sample space by controlling the rank; the first top-k samples belong to the positive region, the last top-k samples belong to the negative region, and the middle part belongs to the boundary region; the value of k is determined by the selection quantity selectNum.
Further, handling the samples in the different regions of step 3) separately in step 4) specifically comprises the steps of:
For the positive region, i.e. the sample set with x ∈ POS(X): append the samples directly to the sequence to be labeled and delete them from the unlabeled data. For the negative region, i.e. the sample set with x ∈ NEG(X): do no processing of such samples. For the boundary region, x ∈ BND(X): whether to label must be determined further, comprising the steps of: determining the neighborhood radius of the boundary-region samples from the pairwise distances between samples; computing the neighborhood density and selecting the most representative samples; and sorting the samples by representativeness in descending order, adding the top-k samples to the sequence select to be labeled.
Further, computing the pairwise distances between boundary-region samples comprises: if an attribute is continuous, the Euclidean distance is used, defined for samples X = {x_1, x_2, ..., x_j, ..., x_m} and Y = {y_1, y_2, ..., y_j, ..., y_m} as
dis(X, Y) = sqrt( Σ_{j=1}^{m} (x_j - y_j)² )
where x_j is the j-th attribute of sample X and y_j the j-th attribute of sample Y; if an attribute is discrete, the Value Difference Metric (VDM) is used, defined as follows: let V_1 and V_2 be two values of a discrete attribute taken by samples x_1 and x_2; then
VDM(V_1, V_2) = Σ_i | C_{1i}/C_1 - C_{2i}/C_2 |^K
where C_1 is the number of samples whose value on this attribute is V_1, C_{1i} the number of those belonging to class i, C_2 the number of samples whose value is V_2, C_{2i} the number of those belonging to class i, and K is a constant, usually taken as 1.
Further, the formula for determining the neighborhood radius of a sample is:
δ = min(dis(x_i, s)) + w × range(dis(x_i, s)), 0 ≤ w ≤ 1   (5)
where min(dis(x_i, s)) is the distance from x_i to its nearest neighbor, range(dis(x_i, s)) is the span of its distances within the specified data set, and w controls the size of the radius. The δ-neighborhood of x_i is defined as δ(x_i) = { x_j ∈ U | dis(x_i, x_j) ≤ δ }, where δ is a predefined distance threshold.
Further, the representative point is defined as follows: D_value(x) represents the magnitude of the uncertainty, a smaller value meaning greater uncertainty, so 1 - D_value(x) serves as the uncertainty weight; x_k denotes the unlabeled samples within the neighborhood radius, and N is the number of x_k for which dis(x, x_k) ≤ δ holds. Assuming sample x and sample x_k have attribute vectors x = {x_1, x_2, ..., x_j, ..., x_m} and x_k = {x_k^1, x_k^2, ..., x_k^j, ..., x_k^m}, the similarity of the two is computed with the cosine formula
cos(x, x_k) = ( Σ_{j=1}^{m} x_j x_k^j ) / ( sqrt(Σ_{j=1}^{m} x_j²) × sqrt(Σ_{j=1}^{m} (x_k^j)²) )
and the representativeness of x is obtained by weighting the cosine similarities of the N neighbors by the uncertainty weight.
Advantages and beneficial effects of the present invention:
The present invention applies three-way decision theory to active learning, dividing the sample space by the magnitude of uncertainty into three regions: the positive region, the boundary region, and the negative region. Two division methods are proposed: method one divides by thresholds; method two divides by sorting the uncertainty values. Different processing is chosen for the samples in each region: samples in the positive region are appended directly to the sequence select to be labeled; samples in the negative region receive no processing; for samples in the boundary region, the neighborhood density is computed on the basis of the neighborhood to determine their representativeness, and the top-k samples are added to the sequence select to be labeled; finally an expert labels the samples in select. This targeted sample selection, starting from the differences between individual samples, can pick out the samples worth labeling more finely and thus better improve all aspects of classifier performance, such as classification accuracy, ROC, and F-value. Extending this method to active learning makes it effective for handling the current situation in which a large number of class labels are missing.
Brief description of the drawings
Fig. 1 is a flow chart of active learning according to a preferred embodiment of the present invention;
Fig. 2: schematic diagram of dividing the positive and negative regions;
Fig. 3: schematic diagram of selecting representative samples in the boundary region;
Fig. 4: schematic diagram of region division based on uncertainty;
Fig. 5: flow chart of active learning based on three-way decisions.
Detailed description of the embodiments
The technical scheme in the embodiments of the present invention is described clearly and in detail below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the invention.
The technical scheme by which the present invention solves the above technical problem is as follows. The method of the present invention comprises the following steps:
Step 1: divide the experimental data. The data are divided into labeled data (the training set), unlabeled data, and data to be tested (the test set).
Step 2: train a base classifier, a naive Bayes classifier, on the existing labeled data, i.e. the training set, and estimate the posterior probability values of each sample in the unlabeled data with the created classifier. If the classifier is a binary classifier, subtract the posterior probability of the other class from the posterior probability of the largest class. For multi-class problems, select the largest and second-largest posterior probability values and compute their difference D_value.
Step 3: based on the uncertainty, divide the whole unlabeled data space into three different regions according to a threshold. Data whose D_value is small are regarded as highly uncertain: they are divided into the positive region, appended to the sequence select to be labeled, left for the human expert to label, and deleted from the unlabeled set. Data whose D_value is large are regarded as having a determinable class: they are divided into the negative region and receive no processing. The remaining data are divided into the boundary region. It is not that labeling the boundary-region data is completely meaningless; to select the data with more labeling value, whether to label them must be decided anew.
Step 4: process the points in the boundary region. For the data in the boundary region, to make labeling more valuable, the concept of the neighborhood is introduced and the density of unlabeled data is computed within the neighborhood, so that the distribution of the samples is taken into account and the most representative data are selected.
Step 5: re-sort by representativeness and informativeness, and add the top-k samples to the data set select to be labeled.
Step 6: hand the data in select to a domain expert for labeling, and update the training set and the unlabeled set according to the result of select. Create a new classifier with the updated training set and test the results on the test set.
Steps 2 through 6 are repeated until the preset number of iterations is met or the required performance level is reached; the sketch below illustrates how these steps fit together.
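For illustration only, the following minimal Python sketch (not part of the patent; all identifiers are hypothetical, scikit-learn's GaussianNB stands in for the naive Bayes classifier, and the boundary-region handling of Steps 4-5 is elided) shows one way the Step 1 through Step 6 loop could be wired up:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def margin_d_value(clf, X_unlab):
    """D_value(x) = p(y_first|x,L) - p(y_second|x,L); smaller = more uncertain."""
    proba = np.sort(clf.predict_proba(X_unlab), axis=1)
    return proba[:, -1] - proba[:, -2]

def active_learning_loop(X_lab, y_lab, X_unlab, oracle, X_test, y_test,
                         alpha=0.05, max_iter=10):
    clf = GaussianNB().fit(X_lab, y_lab)              # Step 2: base classifier
    for _ in range(max_iter):
        d_value = margin_d_value(clf, X_unlab)
        # Step 3: positive region by threshold (division method one);
        # boundary-region scoring (Steps 4-5) is elided in this sketch.
        select = np.where(d_value <= alpha)[0]        # most uncertain -> label
        if select.size == 0:
            break
        y_new = oracle(X_unlab[select])               # Step 6: expert labels
        X_lab = np.vstack([X_lab, X_unlab[select]])
        y_lab = np.concatenate([y_lab, y_new])
        X_unlab = np.delete(X_unlab, select, axis=0)
        clf = GaussianNB().fit(X_lab, y_lab)          # retrain, test, iterate
        print("test accuracy:", clf.score(X_test, y_test))
    return clf
```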
Further, regarding Step 1: to estimate the learner's generalization performance well, the test result obtained from a single data division is often not stable or reliable enough to be convincing. If instead the data are randomly divided several times, the experiment is repeated, and the results are finally averaged, the resulting assessment is clearly more reasonable and more convincing. When experimenting, the number of repeated runs is specified and the test results under the different data divisions in each run are observed. Therefore, when dividing the data, a random function must be called to realize a random division.
Further, regarding Step 2: a naive Bayes classifier is created. Naive Bayes is premised on conditional attribute independence, assuming that each attribute influences the classification result independently. This avoids estimating the joint probability distribution of all attributes in the class-conditional probability P(x | c): computing the joint probability directly from a finite sample faces combinatorial explosion in computation and sample sparsity in the data, and the problem becomes even more serious when the data set has especially many attributes.
The principle by which a naive Bayes classifier determines the class of a sample is as follows.
The set of attribute features: x = {a_1, a_2, a_3, ..., a_m}
The set of class attributes: C = {y_1, y_2, y_3, ..., y_n}
P(x) is the "evidence" factor used for normalization. For a given sample x, the evidence factor p(x) has no relation to the class attribute and does not change for any class value, so it suffices to maximize the numerator. The expression of the naive Bayes classifier is usually defined as follows:
p(y_i | x) = p(y_i) Π_{j=1}^{m} p(a_j | y_i) / p(x)   (2)
The y_i that maximizes p(y_i) Π_{j=1}^{m} p(a_j | y_i) is taken as the decision result. From this result it can be seen that if the output probability values were used directly, taking the difference between the largest joint probability p(y_first, x) and the second-largest joint probability p(y_second, x), different probability values could give the same difference. Take binary classification as an example. Scenario one: p(y_first, x) and p(y_second, x) are 0.4 and 0.2, so D_value = p(y_first, x) - p(y_second, x) = 0.2. Scenario two: p(y_first, x) and p(y_second, x) are 0.5 and 0.3, so again D_value = 0.2. In fact, after the evidence factor is applied, scenario one gives 0.4/0.6 - 0.2/0.6 ≈ 0.333, whereas scenario two gives 0.5/0.8 - 0.3/0.8 = 0.25. The difference in scenario two is clearly smaller than in scenario one, so the uncertainty in scenario two is somewhat larger. Therefore, when naive Bayes is chosen as the classifier, the probability values obtained from formula (2) must be normalized by the evidence factor p(x); the output probability values cannot be used to take differences directly.
Further, regarding the computation of the class-conditional probability p(x_j | y_i): for a discrete attribute, the formula p(x_j | y_i) = |D_{y_i, x_j}| / |D_{y_i}| is used, where D_{y_i, x_j} denotes the set of samples of class y_i whose j-th attribute value is x_j. For a continuous attribute, p(x_j | y_i) is assumed to follow a normal distribution, i.e., p(x_j | y_i) ~ N(μ_{y_i,j}, σ²_{y_i,j}), so the conditional probability for a continuous attribute is computed as
p(x_j | y_i) = 1/(sqrt(2π) σ_{y_i,j}) × exp( -(x_j - μ_{y_i,j})² / (2σ²_{y_i,j}) )   (3)
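As an aside, a hand-rolled version of the Gaussian posterior computation (a sketch under the stated normality assumption, continuous attributes only; not code from the patent) makes the role of the evidence factor explicit:

```python
import numpy as np

def gaussian_nb_posteriors(X_train, y_train, x):
    """p(y_i|x) for continuous attributes: prior times the product of
    N(mu, sigma^2) densities per formula (3), then normalized by p(x)."""
    classes = np.unique(y_train)
    joint = []
    for c in classes:
        Xc = X_train[y_train == c]
        mu, var = Xc.mean(axis=0), Xc.var(axis=0) + 1e-9    # per-attribute moments
        dens = np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        joint.append(len(Xc) / len(X_train) * dens.prod())  # p(y_i) * prod_j p(x_j|y_i)
    joint = np.array(joint)
    return classes, joint / joint.sum()                     # divide by evidence p(x)
```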
Further, regarding Step 2 and Step 3: a naive Bayes classifier is constructed and the most uncertain samples are selected for labeling. Here the margin strategy is chosen as the uncertainty measure; the margin formula is as follows:
x* = argmin_x ( p(y_first | x, L) - p(y_second | x, L) )   (4)
where y_first is the class with the largest posterior probability and y_second the class with the second-largest posterior probability; the smaller the difference between the two values, the greater the uncertainty.
Take binary classification (y, n) as an example. If D_value is very small, the probabilities that the sample belongs to class y and to class n are close; the classifier cannot make an accurate decision on this sample, and labeling it can improve classification performance to a large extent. This is exactly how the samples most worth labeling are selected on the basis of uncertainty. If D_value is large, for example the posterior probability of class y is far larger than that of class n, then within the allowed error the classifier is already fully confident of the sample's class, and in this case such samples need no processing.
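Formula (4) in isolation might be transcribed as follows (an illustrative sketch; predict_proba is scikit-learn's API, assumed here as the source of the posteriors):

```python
import numpy as np

def most_uncertain(clf, X_unlab):
    """x* = argmin_x (p(y_first|x,L) - p(y_second|x,L)), formula (4)."""
    proba = np.sort(clf.predict_proba(X_unlab), axis=1)
    d_value = proba[:, -1] - proba[:, -2]   # margin between top two posteriors
    return int(np.argmin(d_value))          # index of the most uncertain sample
```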
Further, regarding Step 3: based on the uncertainty, the data are divided according to thresholds; two division methods are referred to here. Method one: determine threshold_α and threshold_β from empirical values or a chosen confidence level and divide accordingly. Method two: sort by the uncertainty measure from small to large and divide by controlling the rank. To better illustrate the division of the different regions, Fig. 2 is drawn to explain the problem further.
For sample A, posterior probability values are obtained from the classifier and the largest probability is far larger than the second-largest; its class is considered determined. Even if the density around A is very large, A is no longer labeled; A is divided into the negative region.
For sample B, the largest and second-largest posterior probabilities obtained from the classifier are very close, so the probability that the classifier misclassifies it is large, even though the density around sample B is not as large as around sample A. Clearly, labeling B is more meaningful; B is divided into the positive region.
Further, regarding Step 4: after the three-way division of the data based on uncertainty, most samples attain neither extreme; their uncertainty is intermediate. If such a sample is known to have many unlabeled samples around it, the sample is representative, and labeling it can reduce the uncertainty of the surrounding samples and improve classifier performance.
The processing of the data in the boundary region is illustrated in Fig. 3: the constructed classifier is uncertain about the classes of the unlabeled samples A and B, but sample A is clearly more representative, and labeling A benefits learning more. For the samples in the boundary region, the distribution of the surrounding samples must be taken into account so that the most representative samples are labeled.
A specific embodiment is described below, comprising the following steps:
(1) data are divided
A random function is called to shuffle the data, and the data are divided; the ratio labeled data set : unlabeled data set : test set can be set to 1 : 69 : 30, i.e., 1% of the data serve as labeled data, 69% as unlabeled data, and 30% for testing. In each iteration, samples from the 69% unlabeled set are selectively added to the 1% labeled set (the training set): each iteration of the active learning method selects from the unlabeled set, picking out the most valuable, most meaningful samples for labeling. The labeled samples are added to the training set, a new classifier is trained and tested on the test set, and the classifier's performance after each addition is compared.
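One way to realize this division (the 1:69:30 proportions follow the text; the function name and seeding are illustrative):

```python
import numpy as np

def split_1_69_30(X, y, seed=0):
    """Shuffle with a random function, then split into 1% labeled,
    69% unlabeled, and 30% test data (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_lab, n_unlab = int(0.01 * len(X)), int(0.69 * len(X))
    lab, unlab = idx[:n_lab], idx[n_lab:n_lab + n_unlab]
    test = idx[n_lab + n_unlab:]
    return (X[lab], y[lab]), X[unlab], (X[test], y[test])
```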
(2) Compute the uncertainty of the unlabeled data and divide it into different regions by uncertainty
A naive Bayes classifier is constructed and, following the margin strategy, the difference of the posterior probabilities is computed to determine the uncertainty of each unlabeled sample:
D_value(x) = p(y_first | x, L) - p(y_second | x, L)   (1)
Based on the magnitude of the uncertainty, the whole sample space is divided into three regions: the positive region, the negative region, and the boundary region.
Division method one: divide according to thresholds threshold_α and threshold_β, defined as follows:
if D_value(x) ≤ threshold_α, then x ∈ POS(X)
if threshold_α < D_value(x) < threshold_β, then x ∈ BND(X)   (2)
if D_value(x) ≥ threshold_β, then x ∈ NEG(X)
where 0 ≤ threshold_α < threshold_β ≤ 1. The thresholds threshold_α and threshold_β can be determined from empirical values or a chosen confidence level. Take binary classification as an example, with threshold_α = 0.05 and threshold_β = 0.95.
When D_value = 0.05: solving the equations p(y_1|x) + p(y_2|x) = 1 and p(y_1|x) - p(y_2|x) = 0.05 gives p(y_1|x) = 0.525 and p(y_2|x) = 0.475; that is, when the posterior probabilities of the two classes are 0.525 and 0.475, the sample is divided into the positive region and its class is regarded as completely undeterminable.
When D_value = 0.95: solving p(y_1|x) + p(y_2|x) = 1 and p(y_1|x) - p(y_2|x) = 0.95 gives p(y_1|x) = 0.975 and p(y_2|x) = 0.025; that is, when the posterior probabilities of the two classes are 0.975 and 0.025, the sample is divided into the negative region and its class is regarded as determinable.
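Division method one is a direct transcription of formula (2) (a sketch; the threshold defaults follow the worked example above):

```python
import numpy as np

def three_way_partition(d_value, alpha=0.05, beta=0.95):
    """Formula (2): split sample indices into POS / BND / NEG by D_value."""
    pos = np.where(d_value <= alpha)[0]                      # label immediately
    bnd = np.where((d_value > alpha) & (d_value < beta))[0]  # decide further
    neg = np.where(d_value >= beta)[0]                       # leave unprocessed
    return pos, bnd, neg
```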
Division method two (see Fig. 4): sort the samples by the uncertainty measure D_value(x) from small to large and divide the sample space by controlling the rank: the first top-k samples belong to the positive region, the last top-k samples belong to the negative region, and the middle part belongs to the boundary region.
Taking k = selectNum as an example, and letting n be the number of unlabeled samples:
if top(x) ≤ selectNum, then x ∈ POS(X)
if selectNum < top(x) < n - selectNum, then x ∈ BND(X)
if top(x) ≥ n - selectNum, then x ∈ NEG(X)
where selectNum is the number of samples expected to be labeled in each iteration, and top(x) is a function returning the rank of sample x in the sorted queue.
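Division method two might look as follows (a sketch; the exact cut-offs for the middle and tail ranks are an assumption reconstructed from the text):

```python
import numpy as np

def three_way_partition_by_rank(d_value, select_num):
    """Division method two: sort by D_value ascending; the first select_num
    ranks form POS, the last select_num form NEG, the middle forms BND."""
    order = np.argsort(d_value)           # top(x): rank in the sorted queue
    pos = order[:select_num]              # most uncertain -> positive region
    neg = order[-select_num:]             # most certain  -> negative region
    bnd = order[select_num:-select_num]   # middle part   -> boundary region
    return pos, bnd, neg
```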
(3) Corresponding processing of the data in the different regions
Positive region, i.e. the sample set with x ∈ POS(X): append directly to the sequence to be labeled and delete from the unlabeled data. Negative region, i.e. the sample set with x ∈ NEG(X): labeling such samples is of little significance, so no processing is done. Boundary region, x ∈ BND(X): whether to label must be determined further, in the three steps below (a consolidated sketch follows step 3).
1. Compute the pairwise distances between samples.
If an attribute is continuous, the Euclidean distance is used, defined for samples X = {x_1, ..., x_j, ..., x_m} and Y = {y_1, ..., y_j, ..., y_m} as
dis(X, Y) = sqrt( Σ_{j=1}^{m} (x_j - y_j)² )   (3)
If an attribute is discrete, the Value Difference Metric (VDM) is used, defined as follows: let V_1 and V_2 be two values of a discrete attribute taken by samples x_1 and x_2; then
VDM(V_1, V_2) = Σ_i | C_{1i}/C_1 - C_{2i}/C_2 |^K
where C_1 is the number of samples whose value on this attribute is V_1, C_{1i} the number of those belonging to class i, C_2 the number of samples whose value is V_2, C_{2i} the number of those belonging to class i, and K is a constant, usually taken as 1.
2. Determine the neighborhood radius of a sample:
δ = min(dis(x_i, s)) + w × range(dis(x_i, s)), 0 ≤ w ≤ 1   (5)
where min(dis(x_i, s)) is the distance from x_i to its nearest neighbor, range(dis(x_i, s)) is the span of its distances within the specified data set, and w controls the size of the radius.
3. Select the most representative samples.
Because a smaller D_value means greater uncertainty, the opposite of D_value is taken so that a larger value means greater uncertainty, and 1 is added to avoid negative values, giving the uncertainty weight 1 - D_value(x). The representative point is then defined by weighting the cosine similarities of the unlabeled δ-neighbors by this uncertainty weight, where N is the number of samples x_k for which dis(x, x_k) ≤ δ holds. The samples are sorted by representativeness in descending order, and the top-k samples are added to the sequence select to be labeled.
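Steps 1-3 for the boundary region might be consolidated as below (a sketch under stated assumptions: continuous attributes only, hence Euclidean distance; the representativeness score is taken as (1 - D_value(x)) times the summed cosine similarity of the δ-neighbors, one plausible reading of the definition above):

```python
import numpy as np

def boundary_topk(X_bnd, d_value_bnd, w=0.2, k=10):
    """Boundary-region processing, steps 1-3 (illustrative sketch)."""
    n = len(X_bnd)
    # 1. pairwise Euclidean distances, formula (3)
    diff = X_bnd[:, None, :] - X_bnd[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    d_off = np.where(np.eye(n, dtype=bool), np.nan, dist)  # mask self-distances

    # 2. per-sample neighborhood radius, formula (5), with 0 <= w <= 1
    d_min = np.nanmin(d_off, axis=1)                 # nearest-neighbor distance
    delta = d_min + w * (np.nanmax(d_off, axis=1) - d_min)

    # 3. representativeness: uncertainty weight times cosine similarity
    #    summed over the N samples inside the delta-neighborhood (assumed form)
    norms = np.linalg.norm(X_bnd, axis=1) + 1e-12
    cosine = (X_bnd @ X_bnd.T) / np.outer(norms, norms)
    in_nbhd = d_off <= delta[:, None]                # nan <= x is False: self excluded
    rep = (1.0 - d_value_bnd) * np.where(in_nbhd, cosine, 0.0).sum(axis=1)

    return np.argsort(rep)[::-1][:k]                 # descending; top-k to label
```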
(4) Label the sequence to be labeled and create a new classifier
The samples selected from each region are merged, i.e., their union is taken as the sequence select to be labeled; it is handed to an expert for labeling, the labeled samples are added to the training set, and a new classifier is created.
(5) Test the results
The active learning algorithm is an iterative process: in each iteration, selectNum samples are selected, added to the set to be labeled, and labeled; the training set is updated, a new classifier is created, and its classification performance is tested on the test set. Unlabeled data are added iteration by iteration until the preset number of iterations or the evaluation criterion is met. The evaluation index here can be accuracy, ROC, F-value, and so on. To make the experimental results more reliable, the data are tested, e.g., 10 times by the random-division method, and the average of the 10 results is finally taken.
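The repeated evaluation might be realized as follows (a sketch; scikit-learn's metrics stand in for the accuracy, ROC, and F-value indices named above, `run_active_learning` is a hypothetical driver returning a fitted classifier, and `split_1_69_30` is the split sketch from part (1)):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate_over_splits(X, y, run_active_learning, n_runs=10):
    """Average accuracy / F-value / ROC AUC over n_runs random divisions
    (binary classification assumed for the ROC computation)."""
    scores = []
    for seed in range(n_runs):
        (X_lab, y_lab), X_unlab, (X_te, y_te) = split_1_69_30(X, y, seed=seed)
        clf = run_active_learning(X_lab, y_lab, X_unlab, X_te, y_te)
        y_pred = clf.predict(X_te)
        scores.append((accuracy_score(y_te, y_pred),
                       f1_score(y_te, y_pred),
                       roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])))
    return np.mean(scores, axis=0)   # mean over the repeated tests
```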
The above embodiments are to be understood as merely illustrating the present invention and not limiting its scope. After reading the contents recorded herein, a person skilled in the art can make various changes or modifications to the invention, and such equivalent changes and modifications likewise fall within the scope of the claims of the invention.

Claims (8)

1. An active learning method based on three-way decision theory, characterized by comprising the following steps:
1) obtaining a data set and calling a random function to shuffle it, then dividing it proportionally into a labeled data set, an unlabeled data set, and a test set;
2) training a naive Bayes classifier on the labeled data set, estimating posterior probabilities of the unlabeled data with the naive Bayes classifier, and computing the uncertainty of each unlabeled sample;
3) dividing the unlabeled data by the magnitude of their uncertainty into three regions, namely the positive region, the negative region, and the boundary region, wherein two division methods are involved: division by thresholds and division by sorting the uncertainty values;
4) handling the samples in the different regions of step 3) separately, selecting the informative, valuable samples for labeling, adding the labeled samples to the training set, and training a new classifier for result testing on the test set, wherein different evaluation criteria are used for verification in order to test classifier performance.
2. The active learning method based on three-way decision theory according to claim 1, characterized in that step 2) uses the margin strategy to estimate posterior probabilities of the unlabeled data, computes the difference of the posterior probabilities, and determines the uncertainty of each unlabeled sample:
D_value(x) = p(y_first | x, L) - p(y_second | x, L)   (1)
where D_value(x) represents the magnitude of the uncertainty, p(y_first | x, L) is the largest posterior probability, and p(y_second | x, L) is the second-largest posterior probability.
3. The active learning method based on three-way decision theory according to claim 2, characterized in that dividing the unlabeled data into three regions by thresholds in step 3) specifically comprises: dividing according to thresholds threshold_α and threshold_β, defined as follows:
if D_value(x) ≤ threshold_α, then x ∈ POS(X)
if threshold_α < D_value(x) < threshold_β, then x ∈ BND(X)
if D_value(x) ≥ threshold_β, then x ∈ NEG(X)
where 0 ≤ threshold_α < threshold_β ≤ 1, and threshold_α and threshold_β can be determined from empirical values or a chosen confidence level.
4. The active learning method based on three-way decision theory according to claim 2, characterized in that dividing by sorting the uncertainty values in step 3) specifically comprises: sorting the samples by the uncertainty measure D_value(x) from small to large and dividing the sample space by controlling the rank, wherein the first top-k samples belong to the positive region, the last top-k samples belong to the negative region, and the middle part belongs to the boundary region, the value of k being determined by the selection quantity selectNum.
5. The active learning method based on three-way decision theory according to claim 4, characterized in that handling the samples in the different regions of step 3) separately in step 4) specifically comprises the steps of: for the positive region, i.e. the sample set with x ∈ POS(X), appending the samples directly to the sequence to be labeled and deleting them from the unlabeled data; for the negative region, i.e. the sample set with x ∈ NEG(X), doing no processing of such samples; and for the boundary region, x ∈ BND(X), further determining whether to label, comprising the steps of: determining the neighborhood radius of the boundary-region samples from the pairwise distances between samples; computing the neighborhood density and selecting the most representative samples; and sorting the samples by representativeness in descending order and adding the top-k samples to the sequence select to be labeled.
6. The active learning method based on three-way decision theory according to claim 5, characterized in that computing the pairwise distances between boundary-region samples comprises: if an attribute is continuous, using the Euclidean distance, defined for samples X = {x_1, x_2, ..., x_j, ..., x_m} and Y = {y_1, y_2, ..., y_j, ..., y_m} as
dis(X, Y) = sqrt( Σ_{j=1}^{m} (x_j - y_j)² )
where x_j is the j-th attribute of sample X and y_j the j-th attribute of sample Y; and if an attribute is discrete, using the Value Difference Metric (VDM), defined as follows: let V_1 and V_2 be two values of a discrete attribute taken by samples x_1 and x_2; then
VDM(V_1, V_2) = Σ_i | C_{1i}/C_1 - C_{2i}/C_2 |^K
where C_1 is the number of samples whose value on this attribute is V_1, C_{1i} the number of those belonging to class i, C_2 the number of samples whose value on this attribute is V_2, C_{2i} the number of those belonging to class i, and K is a constant, usually taken as 1.
7. The active learning method based on three-way decision theory according to claim 5, characterized in that the formula for determining the neighborhood radius of a sample is:
δ = min(dis(x_i, s)) + w × range(dis(x_i, s)), 0 ≤ w ≤ 1   (5)
where min(dis(x_i, s)) is the distance from x_i to its nearest neighbor, range(dis(x_i, s)) is the span of its distances within the specified data set, and w controls the size of the radius; the δ-neighborhood of x_i is defined as δ(x_i) = { x_j ∈ U | dis(x_i, x_j) ≤ δ }, where δ is a predefined distance threshold.
8. The active learning method based on three-way decision theory according to claim 5, characterized in that the representative point is defined as follows: D_value(x) represents the magnitude of the uncertainty, a smaller value meaning greater uncertainty, so 1 - D_value(x) serves as the uncertainty weight; x_k denotes the unlabeled samples within the neighborhood radius, and N is the number of x_k for which dis(x, x_k) ≤ δ holds; assuming sample x and sample x_k have attribute vectors x = {x_1, x_2, ..., x_j, ..., x_m} and x_k = {x_k^1, x_k^2, ..., x_k^j, ..., x_k^m}, the similarity of the two is computed with the cosine formula
cos(x, x_k) = ( Σ_{j=1}^{m} x_j x_k^j ) / ( sqrt(Σ_{j=1}^{m} x_j²) × sqrt(Σ_{j=1}^{m} (x_k^j)²) )
and the representativeness of x is obtained by weighting the cosine similarities of the N neighbors by the uncertainty weight.
CN201710326684.8A 2017-05-10 2017-05-10 An active learning method based on three-way decision theory Pending CN107273912A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710326684.8A CN (en) An active learning method based on three-way decision theory


Publications (1)

Publication Number Publication Date
CN107273912A true CN107273912A (en) 2017-10-20

Family

ID=60074134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710326684.8A Pending An active learning method based on three-way decision theory

Country Status (1)

Country Link
CN (1) CN107273912A (en)


Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110058576A (en) * 2018-01-19 2019-07-26 临沂矿业集团有限责任公司 Equipment fault prognostics and health management method based on big data
CN108875768A (en) * 2018-01-23 2018-11-23 北京迈格威科技有限公司 Data mask method, device and system and storage medium
CN109543707A (en) * 2018-09-29 2019-03-29 南京航空航天大学 Semi-supervised change level Software Defects Predict Methods based on three decisions
CN109543707B (en) * 2018-09-29 2020-09-25 南京航空航天大学 Semi-supervised change-level software defect prediction method based on three decisions
CN109820479A (en) * 2019-01-08 2019-05-31 西北大学 A kind of fluorescent molecular tomography feasible zone optimization method
CN109977994B (en) * 2019-02-02 2021-04-09 浙江工业大学 Representative image selection method based on multi-example active learning
CN109977994A (en) * 2019-02-02 2019-07-05 浙江工业大学 A kind of presentation graphics choosing method based on more example Active Learnings
CN110784481A (en) * 2019-11-04 2020-02-11 重庆邮电大学 DDoS detection method and system based on neural network in SDN network
CN110784481B (en) * 2019-11-04 2021-09-07 重庆邮电大学 DDoS detection method and system based on neural network in SDN network
CN111914061B (en) * 2020-07-13 2021-04-16 上海乐言科技股份有限公司 Radius-based uncertainty sampling method and system for text classification active learning
CN111914061A (en) * 2020-07-13 2020-11-10 上海乐言信息科技有限公司 Radius-based uncertainty sampling method and system for text classification active learning
CN112365120A (en) * 2020-09-29 2021-02-12 重庆邮电大学 Intelligent business strategy generation method based on three decisions
CN112365120B (en) * 2020-09-29 2022-05-03 重庆邮电大学 Intelligent business strategy generation method based on three decisions
CN113240007A (en) * 2021-05-14 2021-08-10 西北工业大学 Target feature selection method based on three-branch decision
CN113240007B (en) * 2021-05-14 2024-05-14 西北工业大学 Target feature selection method based on three decisions
CN113327131A (en) * 2021-06-03 2021-08-31 太原理工大学 Click rate estimation model for feature interactive selection based on three-branch decision theory
CN114927239A (en) * 2022-04-21 2022-08-19 厦门大学 Decision rule automatic generation method and system applied to medicine analysis
CN114927239B (en) * 2022-04-21 2024-07-02 厦门大学 Automatic decision rule generation method and system applied to drug analysis
CN116452320A (en) * 2023-04-12 2023-07-18 西南财经大学 Credit risk prediction method based on continuous learning
CN116452320B (en) * 2023-04-12 2024-04-30 西南财经大学 Credit risk prediction method based on continuous learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20171020