CN110059752A - A statistical learning query method based on information entropy sampling estimation - Google Patents

A statistical learning query method based on information entropy sampling estimation

Info

Publication number
CN110059752A
Authority
CN
China
Prior art keywords
sample
information entropy
label
probability distribution
statistical learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910319193.XA
Other languages
Chinese (zh)
Inventor
曲豫宾
李芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong Textile Vocational Technology College
Original Assignee
Nantong Textile Vocational Technology College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong Textile Vocational Technology College filed Critical Nantong Textile Vocational Technology College
Priority to CN201910319193.XA priority Critical patent/CN110059752A/en
Publication of CN110059752A publication Critical patent/CN110059752A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a statistical learning query method based on information entropy sampling estimation. The method uses a model trained on labeled samples to compute the information entropy of each instance in the unlabeled pool, selects the several samples with the highest uncertainty, computes the expected empirical risk of the corresponding data distribution for each, and selects the sample that minimizes the expected empirical risk for labeling. The advantages of the present invention are that it selects samples from the microscopic perspective of the individual sample and makes full use of the information content of the sample itself; adequately combining the two criteria helps select samples that both carry high information content and minimize the expected loss. At the same time, the selection strategy effectively reduces the computational complexity of selection strategies based on statistical learning.

Description

A statistical learning query method based on information entropy sampling estimation
Technical field
The present invention relates to statistical learning query methods, and in particular to a statistical learning query method based on information entropy sampling estimation.
Background art
Traditional supervised learning trains a model on a labeled data set; however, labeling a data set can require considerable time and cost. The active learning framework selects a small number of instances from the unlabeled set to be labeled, achieving good classification performance at low labeling cost. Common pool-based active learning query strategies fall into several families: Uncertainty Sampling, Query-By-Committee, Expected Model Change, Expected Error Reduction, Variance Reduction, Density-Weighted Methods, and so on; the classification models used include naive Bayes, random forests, support vector machines, etc. Uncertainty Sampling selects unlabeled samples from the perspective of uncertainty; the strategy has proven robust in practice, but it is prone to selecting outliers. Query-By-Committee maintains a set of classifiers and uses the disagreement among the different classifiers as the criterion for selecting unlabeled samples; common disagreement measures include vote entropy and Kullback-Leibler divergence, and the strategy essentially realizes sample selection by shrinking the hypothesis space. Expected Model Change uses decision-theoretic methods to select the unlabeled instance with the greatest influence on the model. Expected Error Reduction computes, directly from statistical learning theory, the expected risk brought by the different possible labelings of an unlabeled sample, and selects unlabeled samples according to the expected-risk-minimization criterion; this strategy directly optimizes the expected risk, but its computational complexity is high. Variance Reduction selects unlabeled samples indirectly, by reducing the output variance rather than by directly optimizing the expected risk. Density-Weighted Methods consider the representativeness of an unlabeled sample together with its information content, assign different weights to the two, and select samples according to the weighted value. The QUIRE algorithm proposed in combination with support vector machines by S.-J. Huang, R. Jin, and Z.-H. Zhou, "Active learning by querying informative and representative examples", Advances in Neural Information Processing Systems, 2010, also belongs to this family; it achieves good classification performance in multiple domains, yet its computational complexity remains high.
Selecting unlabeled samples with statistical learning methods has been studied in depth. D. MacKay (1992), "Information-based objective functions for active data selection", Neural Computation 4(4): 590-604, and Cohn, D. A., Ghahramani, Z., & Jordan, M. I. (1996), "Active learning with statistical models", Journal of Artificial Intelligence Research, 4, 129-145, proposed optimizing an objective function with statistical learning methods, building models with classifiers such as feedforward neural networks. N. Roy and A. McCallum, "Toward optimal active learning through sampling estimation of error reduction", Proc. 18th Int. Conf. Mach. Learn., pp. 441-448, 2001, proposed selecting unlabeled samples directly with a statistical learning method that minimizes the expected risk function; however, this method still suffers from a large computational load. Z. Wang and J. Ye, "Querying discriminative and representative samples for batch mode active learning", Proc. ACM SIGKDD, pp. 158-166, 2013, select informative and representative unlabeled samples by minimizing the empirical risk. Y.-P. Tang and S.-J. Huang, "Self-paced active learning: query the right thing at the right time", Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019, introduce self-paced learning to simultaneously select unlabeled instances that are easy to classify and unlabeled instances with potential value, such as high information content, achieving good classification performance.
Therefore, it is necessary to research and develop a statistical learning query method based on information entropy sampling estimation with lower time complexity and higher effectiveness.
Summary of the invention
The technical problem to be solved by the present invention is to provide a statistical learning query method based on information entropy sampling estimation with lower time complexity and higher effectiveness.
In order to solve the above technical problem, the technical solution of the present invention is as follows: a statistical learning query method based on information entropy sampling estimation, wherein the innovation is that the statistical learning query method comprises the following steps:
Step 1: let a training instance be x ∈ D = R^n, with label y ∈ Y = {y1, y2, ... yk}, and let the conditional probability distribution of a training instance x be P(y|x); the labeled sample set D is obtained by i.i.d. sampling from the joint probability distribution P(x, y) = P(y|x)P(x), and the model trained on D then produces, for an input sample x, the posterior probability estimate P_D(y|x); the expected risk R based on statistical learning is therefore:
R(P_D) = ∫ L(P(y|x), P_D(y|x)) P(x) dx (1);
Step 2: the loss function L measures the difference between the true probability distribution P(x, y) of a sample (x, y) and the posterior probability estimate P_D(y|x); taking the log loss, the loss function L is:
L = -∑_{y∈Y} P(y|x) log P_D(y|x) (2);
Step 3: the goal of optimizing the expected risk R is to select the optimal unlabeled sample sequence k = {x1, x2, x3, ... xk}, where k denotes the number of samples drawn from the unlabeled pool; for each unlabeled sample (x*, y*) in the sample sequence k,
x* = argmin_{x*∈M} R(P_{D+(x*, y*)}) (3);
Step 4: learning is based on the unlabeled sample pool M, so over the range being learned the unlabeled samples have a fixed estimate P(x); define the new labeled set obtained by adding an unlabeled sample (x*, y*) to the labeled sample set D as D* = D + (x*, y*); the distribution function of the new labeled sample set D* is unknown, so in order to evaluate formula (2) effectively, the probability distribution of the labeled sample set is used to estimate the current unlabeled sample (x*, y*); the empirical risk of the current classifier over M is then:
R_{D*}(M) = -(1/|M|) ∑_{x∈M} ∑_{y∈Y} P_{D*}(y|x) log P_{D*}(y|x) (4);
Step 5: from R_{D*}(M), the expected risk value of the unlabeled sample x* is computed for each case y* ∈ Y; the true value of y* is unknown, so the known probability distribution P(x, y) is used to compute the estimated probability distribution values, and taking the different label probabilities as weights, the final expected value of R_{D*}(M) is:
E(x*) = ∑_{y*∈Y} P_D(y*|x*) R_{D+(x*, y*)}(M) (5);
Step 6: select the sample x_{U,max} with the highest information entropy from the unlabeled instance set M:
x_{U,max} = argmax_x (-∑_i P_D(y_i|x) log P_D(y_i|x)) (6);
Step 7: compute the information entropy of each sample on the basis of the labeled sample set D; according to the information entropy, select the Q samples with the highest uncertainty, compute the corresponding expected risk for each of the Q samples, and select the sample with the smallest expected risk value for manual labeling.
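As an illustration of step 6 (not part of the patent text), the following is a minimal Python sketch of selecting the Q highest-entropy candidates from the unlabeled pool per formula (6); it assumes a fitted probabilistic classifier with a scikit-learn-style predict_proba method, and the function name and parameters are hypothetical:

```python
import numpy as np

def entropy_candidates(model, X_pool, Q=20):
    """Indices of the Q pool samples with the highest predictive
    entropy -sum_i P_D(y_i|x) log P_D(y_i|x), per formula (6)."""
    proba = model.predict_proba(X_pool)               # shape (n, k)
    entropy = -np.sum(proba * np.log(proba + 1e-12), axis=1)
    return np.argsort(entropy)[-Q:][::-1]             # highest entropy first
```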
Further, the specific steps of step 7 are as follows:
Step 1: input: the initial labeled data set D = {x1, ... xl}, the unlabeled data set M = {xl+1, ... xl+u}, the data labels y1, ... yl, and the maximum number of iterations Umax;
Step 2: output: the conditional probability distribution P_D(y|x);
Step 3: initialize the training model P(x, y) with the labeled data;
Step 4: while the maximum number of iterations Umax has not been reached, compute the corresponding information entropy on the unlabeled training set M according to formula (6);
Step 5: according to the information entropy, select the Q samples with the largest information entropy;
Step 6: label a sample with each class in the set Y in turn, add it to the training set, retrain the model, and compute the corresponding loss function according to formula (2);
Step 7: compute the corresponding empirical risk function according to formula (4);
Step 8: according to the different classes, compute the expected value of the expected risk function according to formula (5); if all Q samples have been computed, go to step 9; otherwise, return to step 6;
Step 9: select the sample with the smallest expected value for manual labeling; if there is no sample with the smallest expected value, return to step 2.
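For concreteness, here is a minimal, hedged sketch of the selection loop of steps 1-9 above — a sketch under assumptions, not the patent's own implementation. It assumes a scikit-learn-style classifier, the log-loss risk of formulas (4) and (5), and the entropy_candidates helper sketched earlier; all identifiers are hypothetical:

```python
import numpy as np
from sklearn.base import clone

def expected_risk(model, X_pool):
    """Empirical risk over the unlabeled pool M, formula (4):
    mean predictive entropy of the (re)trained classifier."""
    proba = model.predict_proba(X_pool)
    return -np.mean(np.sum(proba * np.log(proba + 1e-12), axis=1))

def select_query(model, X_lab, y_lab, X_pool, Q=20):
    """Steps 4-9: among the Q highest-entropy candidates, return the
    index whose label-weighted post-retraining risk, formula (5),
    is smallest."""
    cand = entropy_candidates(model, X_pool, Q)
    proba = model.predict_proba(X_pool[cand])  # columns follow model.classes_
    best_idx, best_E = cand[0], np.inf
    for row, i in enumerate(cand):
        E = 0.0
        for col, y_star in enumerate(model.classes_):
            # hypothetically label x* as y*, retrain, and weight the
            # resulting pool risk by the current estimate P_D(y*|x*)
            m = clone(model).fit(np.vstack([X_lab, X_pool[i]]),
                                 np.append(y_lab, y_star))
            E += proba[row, col] * expected_risk(m, X_pool)
        if E < best_E:
            best_idx, best_E = i, E
    return best_idx
```

In use, the selected index is labeled manually, moved from M to D, and the model retrained, for at most Umax rounds.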
The advantages of the present invention are as follows. In the statistical learning query method based on information entropy sampling estimation of the present invention, a model trained on labeled samples computes the information entropy of each instance in the unlabeled pool, the several samples with the highest uncertainty are selected, the expected empirical risk of the corresponding data distribution is computed for each, and the sample that minimizes the expected empirical risk is labeled. The method selects samples from the microscopic perspective of the individual sample and makes full use of the information content of the sample itself; adequately combining the two criteria helps select samples that both carry high information content and minimize the expected loss. At the same time, the selection strategy effectively reduces the computational complexity of selection strategies based on statistical learning.
Description of the drawings
The present invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a flow chart of the specific selection process of step 7 in the statistical learning query method based on information entropy sampling estimation of the present invention.
Fig. 2 is the accuracy curve on the data set tic-tac-toe.
Fig. 3 is the accuracy curve on the data set transfusion.
Fig. 4 is the accuracy curve on the data set kr-vs-kp.
Fig. 5 is the accuracy curve on the data set diagnosis.
Fig. 6 is the accuracy curve on the data set breast-cancer.
Specific embodiment
The following examples enable those skilled in the art to understand the present invention more fully, but do not limit the present invention to the scope of the described embodiments.
Embodiment
The statistical learning query method based on information entropy sampling estimation of this embodiment comprises the following steps:
Step 1: let a training instance be x ∈ D = R^n, with label y ∈ Y = {y1, y2, ... yk}, and let the conditional probability distribution of a training instance x be P(y|x); the labeled sample set D is obtained by i.i.d. sampling from the joint probability distribution P(x, y) = P(y|x)P(x), and the model trained on D then produces, for an input sample x, the posterior probability estimate P_D(y|x); the expected risk R based on statistical learning is therefore:
R(P_D) = ∫ L(P(y|x), P_D(y|x)) P(x) dx (1);
Step 2: the loss function L measures the difference between the true probability distribution P(x, y) of a sample (x, y) and the posterior probability estimate P_D(y|x); taking the log loss, the loss function L is:
L = -∑_{y∈Y} P(y|x) log P_D(y|x) (2);
Step 3: the goal of optimizing the expected risk R is to select the optimal unlabeled sample sequence k = {x1, x2, x3, ... xk}, where k denotes the number of samples drawn from the unlabeled pool; for each unlabeled sample (x*, y*) in the sample sequence k,
x* = argmin_{x*∈M} R(P_{D+(x*, y*)}) (3);
Step 4: learning is based on the unlabeled sample pool M, so over the range being learned the unlabeled samples have a fixed estimate P(x); define the new labeled set obtained by adding an unlabeled sample (x*, y*) to the labeled sample set D as D* = D + (x*, y*); the distribution function of the new labeled sample set D* is unknown, so in order to evaluate formula (2) effectively, the probability distribution of the labeled sample set is used to estimate the current unlabeled sample (x*, y*); the empirical risk of the current classifier over M is then:
R_{D*}(M) = -(1/|M|) ∑_{x∈M} ∑_{y∈Y} P_{D*}(y|x) log P_{D*}(y|x) (4);
Step 5: from R_{D*}(M), the expected risk value of the unlabeled sample x* is computed for each case y* ∈ Y; the true value of y* is unknown, so the known probability distribution P(x, y) is used to compute the estimated probability distribution values, and taking the different label probabilities as weights, the final expected value of R_{D*}(M) is:
E(x*) = ∑_{y*∈Y} P_D(y*|x*) R_{D+(x*, y*)}(M) (5);
Step 6: select the sample x_{U,max} with the highest information entropy from the unlabeled instance set M:
x_{U,max} = argmax_x (-∑_i P_D(y_i|x) log P_D(y_i|x)) (6);
Step 7: compute the information entropy of each sample on the basis of the labeled sample set D; according to the information entropy, select the Q samples with the highest uncertainty, compute the corresponding expected risk for each of the Q samples, and select the sample with the smallest expected risk value for manual labeling.
As an embodiment, the specific implementation of step 7, shown in Fig. 1, is as follows:
Step 1: input: the initial labeled data set D = {x1, ... xl}, the unlabeled data set M = {xl+1, ... xl+u}, the data labels y1, ... yl, and the maximum number of iterations Umax;
Step 2: output: the conditional probability distribution P_D(y|x);
Step 3: initialize the training model P(x, y) with the labeled data;
Step 4: while the maximum number of iterations Umax has not been reached, compute the corresponding information entropy on the unlabeled training set M according to formula (6);
Step 5: according to the information entropy, select the Q samples with the largest information entropy;
Step 6: label a sample with each class in the set Y in turn, add it to the training set, retrain the model, and compute the corresponding loss function according to formula (2);
Step 7: compute the corresponding empirical risk function according to formula (4);
Step 8: according to the different classes, compute the expected value of the expected risk function according to formula (5); if all Q samples have been computed, go to step 9; otherwise, return to step 6;
Step 9: select the sample with the smallest expected value for manual labeling; if there is no sample with the smallest expected value, return to step 2.
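As a hedged illustration of how this embodiment's loop could be driven (assuming the arrays X_lab, y_lab, X_pool, y_pool and the select_query helper from the sketches above, none of which come from the patent; here the held-back pool labels stand in for manual labeling):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

U_max = 10  # maximum number of query rounds (Umax)
for _ in range(U_max):
    i = select_query(model, X_lab, y_lab, X_pool, Q=20)
    X_lab = np.vstack([X_lab, X_pool[i]])   # move x* from M to D
    y_lab = np.append(y_lab, y_pool[i])     # oracle label for x*
    X_pool = np.delete(X_pool, i, axis=0)
    y_pool = np.delete(y_pool, i)
    model = RandomForestClassifier().fit(X_lab, y_lab)  # retrain
```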
In order to verify the effectiveness of the statistical active learning strategy based on information entropy sampling estimation, it is compared with a random sampling strategy on multiple data sets. The random sampling strategy randomly selects several samples from the unlabeled instances and selects the sample that minimizes the expected risk for manual labeling.
The experimental data come from the machine learning data sets published by the University of California, Irvine (the UCI Machine Learning Repository). The two-class data sets tic-tac-toe, transfusion, kr-vs-kp, diagnosis, and breast-cancer are selected; these data sets are frequently used in research on active learning query strategies (Y.-P. Tang and S.-J. Huang, "Self-paced active learning: query the right thing at the right time", Proceedings of the 33rd AAAI Conference on Artificial Intelligence, 2019). The specific data set descriptions are shown in Table 1.
Table 1. The data sets used in the experiment
The experimental data are divided by stratified sampling: 50% is used as training data and 50% as test data. From the training data, 10% is taken out as the initial labeled data set, used to build the model. The experiment is randomly repeated 5 times with cross-validation, so each data set yields 5×2 groups of results; the average over all runs is taken as the prediction result for each labeled-data point. The experiments use the sklearn toolkit; the classifiers are the random forest classifier and the logistic regression classifier, with system default parameters.
The categorical attributes in the UCI data sets are encoded through sklearn's LabelEncoder class, which converts attributes into corresponding integer values. Selecting a candidate subset from the unlabeled instances requires setting the hyperparameter C, which denotes the number of samples drawn from the unlabeled instance set (the Q of step 7); in this strategy, C is set to 20. The evaluation metric of the algorithm is accuracy, expressed as the ratio of true positives to the sum of true positives and false positives.
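A minimal sketch of this experimental setup with the scikit-learn APIs named above (the file name and variable names are hypothetical; the patent does not publish code):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("tic-tac-toe.csv")   # hypothetical local copy of the UCI data
X = df.iloc[:, :-1].apply(LabelEncoder().fit_transform).to_numpy()
y = LabelEncoder().fit_transform(df.iloc[:, -1])

# 50/50 stratified split; 10% of the training half seeds the labeled set
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0)
X_lab, X_pool, y_lab, y_pool = train_test_split(
    X_tr, y_tr, train_size=0.1, stratify=y_tr, random_state=0)

model = RandomForestClassifier().fit(X_lab, y_lab)  # default parameters
print("initial accuracy:", model.score(X_te, y_te))
```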
With the random forest classifier as the classification model, the performance of the two compared algorithms on the data sets tic-tac-toe, transfusion, kr-vs-kp, diagnosis, and breast-cancer as the number of labeled samples increases is shown in Fig. 2, Fig. 3, Fig. 4, Fig. 5, and Fig. 6.
In order to further investigate the effectiveness of the proposed strategy, a win/draw/loss analysis is performed on the results of the two algorithms at labeled-data ratios of 20%, 40%, 60%, 80%, and 100%. Win/draw/loss analysis describes the difference between algorithms on the same data set. For example, when the labeled-data ratio is 20%, the mean classifier accuracy of the information entropy sampling strategy on data set tic-tac-toe is denoted A_ie, and the mean classifier accuracy of the random sampling strategy on data set tic-tac-toe is denoted A_r; if A_ie > A_r then win = 1, if A_ie = A_r then draw = 1, and if A_ie < A_r then loss = 1. Table 2 shows the win/draw/loss comparison of the information entropy sampling strategy against the random strategy.
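The tally just described can be expressed compactly (a sketch; the array names are hypothetical, with one paired entry per data set):

```python
def win_draw_loss(acc_entropy, acc_random):
    """Count wins/draws/losses of the entropy strategy against the
    random strategy over paired mean accuracies."""
    win = sum(a > b for a, b in zip(acc_entropy, acc_random))
    draw = sum(a == b for a, b in zip(acc_entropy, acc_random))
    loss = sum(a < b for a, b in zip(acc_entropy, acc_random))
    return win, draw, loss
```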
Table 2. Win/draw/loss analysis comparing the information entropy sampling strategy with the random strategy at labeled-data ratios of 20%, 40%, 60%, 80%, and 100%
From Fig. 2 to Fig. 6 and the results in Table 2, the entropy-based sampling strategy achieves better classification performance than the random strategy in most cases, fully demonstrating the effectiveness of the entropy-based strategy. It also shows that, when subsampling based on statistical learning, selection oriented to the information content of individual samples is better than random selection. As the number of labeled samples increases, the two active learning sampling strategies, sampling from their different angles, both improve the precision of the classification algorithm, which also demonstrates the effectiveness of the active learning framework. The strategies do, however, behave differently. On data set diagnosis, the entropy-based strategy quickly reaches good classification performance as labeled samples are added, while the random sampling strategy not only fails to reach good classification performance but even shows large fluctuations in classification performance, illustrating that the entropy-based strategy is more conducive to improving the precision of the model's classifier. From the performance trend on data set transfusion, as labeled samples increase, the algorithm's performance grows faster and converges to a more stable classification level, whereas the random sampling strategy even declines in performance after reaching good classification performance.
In order to fully study how performance changes under different sampling strategies, other classifiers are also used to model the data sets, comparing the performance of the different sampling strategies on different classifiers. To save space, only the performance comparison at labeled-data ratios of 20%, 40%, 60%, 80%, and 100% is reported. Table 3 shows the performance comparison of the different sampling strategies on the different classifiers.
Table 3. Performance comparison of different sampling strategies on different classifiers; the comparison test is based on a paired t-test, and the better performance is shown in bold
The results in Table 3 show that, whether the random forest classifier or the logistic regression classifier is used, the proposed sampling estimation strategy based on information entropy achieves the best result in most cases, and even where it is not optimal it is not much weaker than the best performance. This shows that, for different classifiers and different labeled-instance ratios, the proposed entropy-based sampling strategy delivers a stable performance improvement.
In addition, comparing the performance across the two different classifiers, we can also see that the strategy based on information entropy sampling estimation has more stable performance: as labeled samples increase, classification performance improves steadily until it converges to the best level, whereas the random sampling strategy clearly exhibits stronger randomness and performance instability.
Experiments on common machine learning data sets show that the method can effectively select the instances that need manual labeling from the unlabeled instances.
The basic principles, main features, and advantages of the present invention have been shown and described above. Those skilled in the art should understand that the present invention is not limited to the above embodiments, which, together with the description, only illustrate the principles of the present invention; various changes and improvements may be made to the present invention without departing from its spirit and scope, and all such changes and improvements fall within the scope of the claimed invention. The claimed scope of the invention is defined by the appended claims and their equivalents.

Claims (2)

1. A statistical learning query method based on information entropy sampling estimation, characterized in that the statistical learning query method comprises the following steps:
Step 1: let a training instance be x ∈ D = R^n, with label y ∈ Y = {y1, y2, ... yk}, and let the conditional probability distribution of a training instance x be P(y|x); the labeled sample set D is obtained by i.i.d. sampling from the joint probability distribution P(x, y) = P(y|x)P(x), and the model trained on D then produces, for an input sample x, the posterior probability estimate P_D(y|x); the expected risk R based on statistical learning is therefore:
R(P_D) = ∫ L(P(y|x), P_D(y|x)) P(x) dx (1);
Step 2: the loss function L measures the difference between the true probability distribution P(x, y) of a sample (x, y) and the posterior probability estimate P_D(y|x); taking the log loss, the loss function L is:
L = -∑_{y∈Y} P(y|x) log P_D(y|x) (2);
Step 3: the goal of optimizing the expected risk R is to select the optimal unlabeled sample sequence k = {x1, x2, x3, ... xk}, where k denotes the number of samples drawn from the unlabeled pool; for each unlabeled sample (x*, y*) in the sample sequence k,
x* = argmin_{x*∈M} R(P_{D+(x*, y*)}) (3);
Step 4: learning is based on the unlabeled sample pool M, so over the range being learned the unlabeled samples have a fixed estimate P(x); define the new labeled set obtained by adding an unlabeled sample (x*, y*) to the labeled sample set D as D* = D + (x*, y*); the distribution function of the new labeled sample set D* is unknown, so in order to evaluate formula (2) effectively, the probability distribution of the labeled sample set is used to estimate the current unlabeled sample (x*, y*); the empirical risk of the current classifier over M is then:
R_{D*}(M) = -(1/|M|) ∑_{x∈M} ∑_{y∈Y} P_{D*}(y|x) log P_{D*}(y|x) (4);
Step 5: from R_{D*}(M), the expected risk value of the unlabeled sample x* is computed for each case y* ∈ Y; the true value of y* is unknown, so the known probability distribution P(x, y) is used to compute the estimated probability distribution values, and taking the different label probabilities as weights, the final expected value of R_{D*}(M) is:
E(x*) = ∑_{y*∈Y} P_D(y*|x*) R_{D+(x*, y*)}(M) (5);
Step 6: select the sample x_{U,max} with the highest information entropy from the unlabeled instance set M:
x_{U,max} = argmax_x (-∑_i P_D(y_i|x) log P_D(y_i|x)) (6);
Step 7: compute the information entropy of each sample on the basis of the labeled sample set D; according to the information entropy, select the Q samples with the highest uncertainty, compute the corresponding expected risk for each of the Q samples, and select the sample with the smallest expected risk value for manual labeling.
2. The statistical learning query method based on information entropy sampling estimation according to claim 1, characterized in that the specific steps of step 7 are as follows:
Step 1: input: the initial labeled data set D = {x1, ... xl}, the unlabeled data set M = {xl+1, ... xl+u}, the data labels y1, ... yl, and the maximum number of iterations Umax;
Step 2: output: the conditional probability distribution P_D(y|x);
Step 3: initialize the training model P(x, y) with the labeled data;
Step 4: while the maximum number of iterations Umax has not been reached, compute the corresponding information entropy on the unlabeled training set M according to formula (6);
Step 5: according to the information entropy, select the Q samples with the largest information entropy;
Step 6: label a sample with each class in the set Y in turn, add it to the training set, retrain the model, and compute the corresponding loss function according to formula (2);
Step 7: compute the corresponding empirical risk function according to formula (4);
Step 8: according to the different classes, compute the expected value of the expected risk function according to formula (5); if all Q samples have been computed, go to step 9; otherwise, return to step 6;
Step 9: select the sample with the smallest expected value for manual labeling; if there is no sample with the smallest expected value, return to step 2.
CN201910319193.XA 2019-04-19 2019-04-19 A statistical learning query method based on information entropy sampling estimation Pending CN110059752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910319193.XA CN110059752A (en) 2019-04-19 2019-04-19 A statistical learning query method based on information entropy sampling estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910319193.XA CN110059752A (en) 2019-04-19 2019-04-19 A statistical learning query method based on information entropy sampling estimation

Publications (1)

Publication Number Publication Date
CN110059752A true CN110059752A (en) 2019-07-26

Family

ID=67319780

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910319193.XA Pending CN110059752A (en) 2019-04-19 2019-04-19 A statistical learning query method based on information entropy sampling estimation

Country Status (1)

Country Link
CN (1) CN110059752A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111914061A (en) * 2020-07-13 2020-11-10 上海乐言信息科技有限公司 Radius-based uncertainty sampling method and system for text classification active learning
CN111914061B (en) * 2020-07-13 2021-04-16 上海乐言科技股份有限公司 Radius-based uncertainty sampling method and system for text classification active learning
CN114169470A (en) * 2022-02-15 2022-03-11 南京航空航天大学 Artificial intelligence learning method based on target model and sample double sampling


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190726)