CN104102716A - Imbalance data predicting method based on cluster stratified sampling compensation logic regression - Google Patents
- Publication number
- CN104102716A (application CN201410341930.3A)
- Authority
- CN
- China
- Prior art keywords
- data
- stratified
- sample
- class
- logic regression
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Complex Calculations (AREA)
Abstract
The invention relates to a method for predicting imbalanced data based on cluster-stratified sampling and compensated logistic regression. It belongs to the field of imbalanced data prediction and addresses the poor performance of traditional prediction models on imbalanced data. The method comprises: first, clustering the sample set to be predicted with the k-means algorithm to obtain K classes of data; second, performing stratified sampling on the K classes of data to extract n data; third, performing maximum likelihood estimation on the parameters of the stratified-sample logistic regression model to obtain its parameter estimation formula and thereby determine the model; and fourth, inputting the n extracted data into the stratified-sample logistic regression model to determine whether the sample set to be predicted is an imbalanced data set. The method is applicable to fields that require imbalanced data prediction, such as biology, medicine, engineering, and computing.
Description
Technical field
The invention belongs to the field of imbalanced data prediction.
Background art
As is well known, decision-making depends on prediction. Prediction is the estimation and inference of the future. To this end, the real world (or the object of study) is often imitated or abstracted, a process called modeling. A "good" model should therefore not only express reality but also accurately reflect, through real data, the laws by which reality develops. A prediction model is thus a prediction or forecast expressed in quantitative form.
Prediction on imbalanced data sets is a difficult problem in the natural sciences and has important practical value in many fields such as biology, medicine, engineering, and computing. Experience shows that when the class distribution is imbalanced, directly applying a traditional prediction model does not achieve acceptable prediction performance.
Current stratified sampling techniques mainly include a stratified sampling method for network traffic data, a stratified data sampling method for an IT system application evaluation extension platform, and a stratified sampling method for data with high attribute dimensionality. All three are aimed at real data from a specific field, and each formulates its stratification strategy manually, according to the characteristics of the data, to guide the stratified sampling.
Existing logistic regression prediction techniques have mostly been applied in methods such as screening plant embryos by quality with a penalized logistic regression (PLR) model, predicting the biodegradability of organic chemicals with a logistic regression algorithm, and detecting artifacts in ICU patient records with multivariate logistic regression. Logistic regression prediction has not been applied to the prediction of imbalanced data sets.
Summary of the invention
The object of the invention is to solve the problem that traditional prediction models perform poorly on imbalanced data. To this end, the invention provides an imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression.
The imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression of the invention comprises the following steps:
Step 1: cluster the sample set to be predicted with the k-means algorithm to obtain K classes of data;
Step 2: perform stratified sampling on the K classes of data to extract n data;
Step 3: perform maximum likelihood estimation on the parameters of the stratified-sample logistic regression model, obtain the parameter estimation formula of the stratified-sample logistic regression model, and thereby determine the model;
Step 4: input the n extracted data into the stratified-sample logistic regression model and determine whether the sample set to be predicted is an imbalanced data set.
The beneficial effects of the invention are as follows. First, the invention resamples the imbalanced data by cluster-stratified sampling, which removes a large amount of noise data that affects prediction, lowers the imbalance ratio, and reduces the occurrence of the data submersion phenomenon. Second, to account for the change in data distribution after sampling, a parameter-compensated logistic regression prediction model is proposed, which corrects the predicted probability values while effectively improving prediction performance. Experimental verification shows that the prediction method of the invention can significantly improve the prediction accuracy for imbalanced data.
Description of the drawings
Fig. 1 is a schematic flowchart of the imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression described in embodiment one.
Fig. 2 is a schematic diagram of the principle of cluster-based level division in embodiment two.
Embodiments
Embodiment one: This embodiment is described with reference to Fig. 1. The imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression of this embodiment comprises the following steps:
Step 1: cluster the sample set to be predicted with the k-means algorithm to obtain K classes of data;
Step 2: perform stratified sampling on the K classes of data to extract n data;
Step 3: perform maximum likelihood estimation on the parameters of the stratified-sample logistic regression model, obtain the parameter estimation formula of the stratified-sample logistic regression model, and thereby determine the model;
Step 4: input the n extracted data into the stratified-sample logistic regression model and determine whether the sample set to be predicted is an imbalanced data set.
Stratified sampling is also called type sampling. The units of the population are first divided into several types or layers according to some important attribute, and sample units are then drawn within each type or layer by simple random sampling or systematic sampling. Because dividing the population into layers increases the commonality among the units within each type, a representative survey sample is easy to draw. Stratified sampling is more accurate than simple random sampling and systematic sampling: more accurate inferences can be obtained by surveying fewer sample units. It often gives satisfactory results, particularly when the population is large and its internal structure is complex. In addition to inference about the population as a whole, stratified sampling also allows inference about each layer. The method suits populations that are complex, whose units differ widely, and that contain many units. Stratified random sampling can estimate the attributes of the population more accurately than random sampling.
Stratified sampling divides a strongly heterogeneous population into subpopulations that are each more homogeneous. The samples drawn from each subpopulation represent that subpopulation, and all the samples together represent the population. Compared with simple random sampling, stratified sampling must first divide the population into levels, that is, stratify it. In practice, the most important task in stratified sampling is to divide the samples into levels reasonably, so that the sample drawn after stratification represents the distribution and characteristics of the population more precisely. Level division is both the key point and the main difficulty of stratified sampling. This embodiment therefore performs level division by clustering.
Clustering is one of the most common techniques in data mining and is used to discover unknown data classes in a database; each group formed by the clustering process is called a class. Before clustering, the number and types of the data classes are unknown. The classes are divided on the principle that "birds of a feather flock together": the objects of study are divided into several classes according to the similarity between individuals or data objects. Clustering groups a set of objects into classes by similarity, with the aim that objects in the same class have features that are as similar as possible while objects in different classes are as dissimilar as possible. Clustering therefore provides sound theoretical guidance and a feasible method for the level division in stratified sampling.
Embodiment two: This embodiment is described with reference to Fig. 2. It further limits the imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression described in embodiment one. In step 1, clustering the sample set to be predicted with the k-means algorithm to obtain K classes of data comprises:
Step 1.1: randomly select K data from the sample set to be predicted, each serving as the center of one class;
Step 1.2: assign the other data in the sample set to be predicted to the corresponding classes according to the principle of nearest distance to a class center;
Step 1.3: for each class, calculate the mean attribute values of all data in the class and take them as the new center of the class;
Step 1.4: reassign the data in the sample set to be predicted to the corresponding classes according to the principle of nearest distance to the new class centers, and judge whether the resulting classes are the same as those obtained in step 1.2; if they are the same, stop and determine the K classes of data; if not, return to step 1.3.
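The four sub-steps above can be sketched in plain Python as follows. This is a minimal illustration rather than the patent's implementation: the function names, the squared Euclidean distance, the fixed random seed, the iteration cap, and the rule of keeping the old center when a class becomes empty are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def k_means(points, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1.1: pick K samples at random as the initial class centers.
    centers = [list(p) for p in rng.sample(points, k)]
    # Step 1.2: assign every sample to its nearest center.
    labels = assign(points, centers)
    for _ in range(max_iter):
        # Step 1.3: the new center of a class is the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumed rule: an empty class keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        # Step 1.4: reassign and stop once the classes no longer change.
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = k_means(points, k=2)
```

Because K is passed in explicitly, the number of layers used later for stratified sampling is fixed before clustering starts.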
This embodiment applies the k-means clustering algorithm to level division in stratified sampling. K-means is chosen not only because it is simple and effective but, most importantly, because the number of clusters can be set in advance. In terms of level division, applying this algorithm means that the number of layers can be predefined, which makes the sampling process easy to control.
Embodiment three: This embodiment further limits the imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression described in embodiment one. In step 3,
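The parameter estimation formula of the stratified-sample logistic regression model is

$$\sum_{i=1}^{n}\left[y_i-\frac{\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}{1+\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}\right]=0,\qquad \sum_{i=1}^{n}\left[y_i-\frac{\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}{1+\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}\right]x_{ij}=0,\quad j=1,\ldots,m$$

where $\alpha_1$ and $\beta'$ are the unknown parameters of the stratified-sample logistic regression model; $\beta'$ is a $1\times m$ vector, $\beta'=(\beta_1,\ldots,\beta_m)^T$; $x_{ij}$ is the $j$-th feature of the $i$-th extracted datum; $m$ is the number of features of each extracted datum; $i=1,2,3,\ldots,n$; and $y_i$ is the predicted value of the $i$-th extracted datum, taking values in $\{0,1\}$.
The logistic regression model is defined on the feature vector $X=(x_1,x_2,\ldots,x_m)$ of each extracted datum, where $x_m$ is the $m$-th feature of the extracted datum.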
Existing logistic regression prediction models are mostly applied directly to the sampled data subset. Because different types of data enter the sample with different probabilities, the distribution of the sample points is no longer the same as that of the population. Under stratified sampling, this inconsistency between the sample distribution and the population distribution means that directly applying maximum likelihood estimation biases the estimates of the model parameters and probabilities, so the predicted probability values are inaccurate. This embodiment adopts a parameter compensation for logistic regression under stratified sampling: the bias introduced into the maximum likelihood estimate is reasonably compensated so that logistic regression prediction adapts to the inconsistency of the data distributions, and the predicted probability values come closer to the actual probabilities of occurrence.
The logistic regression model is a nonlinear model, so its parameters are usually estimated by maximum likelihood. It can be proved that, under random sampling, the maximum likelihood estimator of the logistic regression model is consistent, asymptotically efficient, and asymptotically normal. In many studies, however, sampling is not completely random but stratified, so parameter estimation for the logistic regression model under stratified sampling must be considered.
In logistic regression, the dependent variable $Y_i$ ($i=1,2,3,\ldots,n$) follows a Bernoulli distribution: the probability that the dependent variable equals 1 is $P_i$, and the probability that it equals 0 is $1-P_i$. The ratio $P_i/(1-P_i)$ is the odds that the event occurs. The vector $X_i$ ($i=1,2,3,\ldots,n$) is the vectorized representation of the observed sample, and the constant $K$ is the number of attributes of the sample, that is, the number of feature types.

$$Y_i\sim\mathrm{Bernoulli}(Y_i\mid P_i)\qquad(1)$$

$$\ln\frac{P_i}{1-P_i}=\alpha_0+\beta' X_i\qquad(2)$$

The expression above is the log odds. Taking the antilogarithm of both sides expresses it as the odds:

$$\frac{P_i}{1-P_i}=\exp(\alpha_0+\beta' X_i)$$
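The relations among probability, odds, and log odds can be checked numerically. The sketch below is only an illustration; the function names and the example probability 0.8 are assumptions, not values from the source.

```python
import math


def odds_from_probability(p):
    """Odds of an event that occurs with probability p."""
    return p / (1.0 - p)


def probability_from_log_odds(z):
    """Invert the log odds z back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


p = 0.8
odds = odds_from_probability(p)   # about 4: occurring is four times as likely as not
log_odds = math.log(odds)         # the quantity that the linear predictor models
recovered = probability_from_log_odds(log_odds)
```

Going from probability to log odds and back returns the starting probability, which is why a model of the log odds is also a model of the probability.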
A logistic regression model can be expressed in several alternative ways for a particular application. Logistic regression is also relatively easy computationally, many tools are available for estimating its parameters, and it performs well in practice. Note that once the odds or the log odds are known, the corresponding probability of occurrence is easy to calculate:

$$P_i=\frac{\exp(\alpha_0+\beta' X_i)}{1+\exp(\alpha_0+\beta' X_i)}$$

where the unknown parameter $\alpha_0$ is a constant and $\beta'$ is a $K\times 1$ vector with one component for each independent variable. The parameters of the model are estimated by maximum likelihood:

$$L(\alpha_0,\beta')=\prod_{i=1}^{n}P_i^{\,y_i}(1-P_i)^{1-y_i}$$

For a random sample $(x_i,y_i)$, $i=1,2,\ldots,n$, taking the logarithm of both sides and using formula (2), the log-likelihood function simplifies to:

$$\ln L=\sum_{i=1}^{n}\left[y_i(\alpha_0+\beta' x_i)-\ln\bigl(1+\exp(\alpha_0+\beta' x_i)\bigr)\right]$$

The values of the unknown parameters $\alpha_0$ and $\beta'$ are obtained from the following maximum likelihood equations:

$$\sum_{i=1}^{n}\left[y_i-\frac{\exp(\alpha_0+\beta' x_i)}{1+\exp(\alpha_0+\beta' x_i)}\right]=0,\qquad \sum_{i=1}^{n}\left[y_i-\frac{\exp(\alpha_0+\beta' x_i)}{1+\exp(\alpha_0+\beta' x_i)}\right]x_{ij}=0,\quad j=1,\ldots,K\qquad(8)$$
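A minimal sketch of maximum likelihood estimation for a logistic regression model follows. It assumes plain gradient ascent on the log-likelihood as the solver; the source does not name a solver at this point, and the function names, learning rate, step count, and toy data are all hypothetical. At the maximum, the standard likelihood equations for logistic regression hold: the residuals $y_i-P_i$ sum to zero, and so do the residuals weighted by each feature.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=20000):
    """Maximise the log-likelihood by gradient ascent; returns (alpha, beta)."""
    n, m = len(xs), len(xs[0])
    alpha, beta = 0.0, [0.0] * m
    for _ in range(steps):
        residuals = [
            y - sigmoid(alpha + sum(b * v for b, v in zip(beta, x)))
            for x, y in zip(xs, ys)
        ]
        # Averaged gradient of the log-likelihood; setting these
        # components to zero gives the likelihood equations.
        g_alpha = sum(residuals) / n
        g_beta = [sum(r * x[j] for r, x in zip(residuals, xs)) / n for j in range(m)]
        alpha += lr * g_alpha
        beta = [b + lr * g for b, g in zip(beta, g_beta)]
    return alpha, beta


# Hypothetical one-feature data; the classes overlap, so a finite maximum exists.
xs = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [7.0]]
ys = [0, 0, 1, 0, 1, 1, 0, 1]
alpha_hat, beta_hat = fit_logistic(xs, ys)
```

Gradient ascent is used here only because it needs no linear algebra library; any standard logistic regression routine would solve the same equations.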
Under random sampling, the maximum likelihood estimator of the logistic regression model is consistent, asymptotically efficient, and asymptotically normal. In the study of some problems, however, sampling is not completely random but stratified. Under random sampling, the distribution of the sample points is the same as the population distribution. Under stratified sampling, different types of data enter the sample with different probabilities, so the two distributions are no longer the same, and directly applying maximum likelihood estimation biases the estimates of the model parameters and probabilities. The present invention studies the estimation bias of the stratified-sampling logistic regression prediction model on imbalanced data sets and proposes a method for compensating this bias.
In a population of $N$ samples, the number of small-class samples is $P_0N$ and the number of large-class samples is $(1-P_0)N$. Stratified sampling draws $n_1$ samples from the small class and $n_2$ samples from the large class. Let $\lambda_0$ be the ratio of the number of small-class samples to the number of large-class samples in the population, $\lambda_0=P_0N/\bigl((1-P_0)N\bigr)=P_0/(1-P_0)$, and let $\lambda_1$ be the same ratio in the sample, $\lambda_1=n_1/n_2$. By theoretical derivation, for the stratified sample $(x_i,y_i)$, $i=1,2,\ldots,n$, the log-likelihood function is:
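Using formula (5), one obtains an expression containing a quantity that is independent of the parameters to be estimated. Letting $\alpha_1=\alpha_0+\lambda$, the maximum likelihood estimates of the parameters $\alpha_1$ and $\beta'$ are obtained from the following system of equations:

$$\sum_{i=1}^{n}\left[y_i-\frac{\exp(\alpha_1+\beta' x_i)}{1+\exp(\alpha_1+\beta' x_i)}\right]=0,\qquad \sum_{i=1}^{n}\left[y_i-\frac{\exp(\alpha_1+\beta' x_i)}{1+\exp(\alpha_1+\beta' x_i)}\right]x_{ij}=0,\quad j=1,\ldots,m\qquad(11)$$

Formula (11) is the parameter estimation formula of the stratified-sample logistic regression model. Under random sampling, the sample distribution is consistent with the population distribution, so $\lambda_1=\lambda_0$, hence $\lambda=0$ and $\alpha_1=\alpha_0$, and formula (11) is identical to formula (8). Formula (11) can therefore be regarded as the generalization of formula (8) to stratified sampling.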
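The compensation of the constant term can be sketched as follows. The source's own expression for $\lambda$ is not reproduced in this text, so the sketch assumes $\lambda=\ln(\lambda_1/\lambda_0)$, a common form of intercept correction that agrees with the statement that $\lambda=0$ when $\lambda_1=\lambda_0$. The function names and the example numbers are hypothetical.

```python
import math


def intercept_offset(p0, n_small, n_large):
    """lambda = ln(lambda1 / lambda0); the logarithmic form is an assumption."""
    lambda0 = p0 / (1.0 - p0)      # small-to-large ratio in the population
    lambda1 = n_small / n_large    # small-to-large ratio in the stratified sample
    return math.log(lambda1 / lambda0)


def compensate_intercept(alpha1, p0, n_small, n_large):
    """Recover alpha0 from the stratified-sample estimate alpha1 = alpha0 + lambda."""
    return alpha1 - intercept_offset(p0, n_small, n_large)


# Hypothetical case: 2% small class in the population, balanced sample.
lam = intercept_offset(p0=0.02, n_small=100, n_large=100)

# Random-sampling case: the sample ratio equals the population ratio.
no_shift = compensate_intercept(1.0, p0=0.5, n_small=50, n_large=50)
```

Only the constant term is shifted; the slope estimates $\beta'$ are used unchanged, which matches the substitution $\alpha_1=\alpha_0+\lambda$ in the text.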
The estimation bias of the parameters and probabilities is analyzed theoretically below. Comparing formula (11) with formula (8) shows that, under stratified sampling, estimating the model with formula (8) amounts to treating the estimates of $\alpha_1$ and $\beta'$ as estimates of $\alpha_0$ and $\beta'$. This causes:
1) Bias in the estimate of the constant term
Since $\alpha_1=\alpha_0+\lambda$, the larger $\lambda$ is, the larger $\alpha_1$ is. Estimating the model from a stratified sample with formula (8) therefore produces an estimate of the constant term that is positively correlated with $\lambda$. The estimate of the constant term thus depends on the sampling design: the larger the value of $\lambda$ in the stratified sample, the larger the estimated constant term.
2) Bias in the probability estimate
Let $Z=\alpha_0+\beta' X$. Since $\alpha_1>\alpha_0$, replacing $\alpha_0,\beta'$ with $\alpha_1,\beta'$ increases $Z$ and therefore increases the predicted probability $e^{Z}/(1+e^{Z})$. The probability of occurrence of the small class is thus overestimated, and the larger $\lambda$ is, the greater the overestimation.
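The overestimation can be illustrated numerically. In the sketch below, the coefficient values, the feature value, and the list of offsets are hypothetical; the point is only that the predicted probability grows as the constant term is shifted upward by $\lambda$.

```python
import math


def probability(alpha, beta, x):
    """Logistic probability for constant term alpha and weights beta."""
    z = alpha + sum(b * v for b, v in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


alpha0, beta, x = -4.0, [0.5], [2.0]   # hypothetical true parameters and input
true_p = probability(alpha0, beta, x)  # uses the population constant term

# Using alpha1 = alpha0 + lambda in place of alpha0 inflates the probability.
biased = [probability(alpha0 + lam, beta, x) for lam in (1.0, 2.0, 3.0)]
```

With these numbers the unshifted probability is below 0.05, while an offset of 3 raises it to 0.5.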
An imbalanced data set has two intrinsic factors: the imbalance ratio and the lack of information. The imbalance ratio (denoted S) is the ratio of the large class to the small class and expresses the degree of imbalance of the data. The number of levels into which the data are divided during stratified sampling is denoted H. For the stratified sampling process, the present invention proposes cluster-based stratification suited to the characteristics of imbalanced data sets. The sampling strategy adopted is to collect all samples of the small class and to collect from each layer a number of large-class samples equal to the number of small-class samples. Combining this sampling strategy with the discussion above yields formula (12).
Formula (12) shows that the larger the imbalance ratio S, the larger $\lambda$: for an imbalanced data set, the more severe the imbalance, the larger $\lambda$ and the larger the overestimation bias. Formula (12) also guides the level-division strategy: the more severe the imbalance, the more layers the stratified sampling should use, which reduces the overestimation bias.
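The sampling strategy described above can be sketched as follows. This is one reading of the text, not a confirmed implementation: it assumes that the layers are lists of large-class samples produced by clustering, that each layer contributes as many samples as there are small-class samples, and that a layer smaller than this quota is taken whole. All names and data are hypothetical.

```python
import random


def sample_by_layer(minority, majority_layers, seed=0):
    """Keep every small-class sample; draw an equal quota from each layer."""
    rng = random.Random(seed)
    quota = len(minority)  # assumed quota: one small-class count per layer
    majority_sample = []
    for layer in majority_layers:
        # Assumed fallback: a layer smaller than the quota is taken whole.
        majority_sample.extend(rng.sample(layer, min(quota, len(layer))))
    return list(minority), majority_sample


# Hypothetical data: 10 small-class samples, H = 3 layers of 100 large-class samples.
minority = [("small", i) for i in range(10)]
layers = [[("large", h, i) for i in range(100)] for h in range(3)]
small, large = sample_by_layer(minority, layers)
lambda1 = len(small) / len(large)   # sample ratio n1 / n2, here 1 / H
```

Under this reading the sample ratio is $\lambda_1=1/H$ and the population ratio is $\lambda_0=1/S$. If the offset is additionally assumed to be $\lambda=\ln(\lambda_1/\lambda_0)$, it becomes $\ln(S/H)$, which grows with the imbalance ratio and shrinks as more layers are used, in line with the two observations in the text.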
The invention has been implemented successfully in a concrete application, answer extraction. Answer extraction is a subfield of information extraction research and an important core component of a question answering system; it is what distinguishes a question answering system from a text retrieval system in the ordinary sense. Answer extraction is a typical binary classification problem: a candidate answer can take only one of two forms, either it is the answer or it is not. In theory, this kind of problem is therefore well suited to analysis by logistic regression. In practice, the number of correct answers is far smaller than the number of distractor answers, which makes the sample data severely imbalanced. These characteristics suit the prediction method based on cluster-stratified sampling and compensated logistic regression proposed by the present invention. The method is therefore used to extract correct answers in the answer extraction of the InsunQA system.
The information retrieval part of the InsunQA system returns 70 relevant paragraphs for each question. These paragraphs may contain the correct answer to the question, and they also contain a large number of distractor answers. All candidate answers are represented as vectors of the features extracted above, and each sample contains 15 feature attribute values. The main purpose of answer extraction is to pick the correct answer out of the candidate answers. The logistic-regression-based answer extraction method is in essence a process of ranking candidate answers, which requires a ranking formula containing the above feature attributes, namely the answer extraction formula.
Because the answer extraction data set is a typical imbalanced data set, the cluster-stratified sampling method proposed here can be used to draw the sample set for estimating the answer extraction parameters. Using SPSS software together with the proposed estimation-bias compensation method, the values of the parameters $\alpha_0$ and $\beta'$ to be estimated in formula (13) are obtained, where $\beta'$ is the set of feature weights. Once the values of $\alpha_0$ and $\beta'$ are known, the prediction formula, formula (14), is fully determined.
Formula (14) is thus the generated answer extraction formula. With formula (14), the probability that a candidate answer is the correct answer can be predicted, the candidate answers can be ranked by probability, and the candidate answer with the highest probability is taken as the final correct answer.
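Ranking candidate answers by predicted probability can be sketched as follows. The coefficient values, the two-feature vectors, and the candidate labels are hypothetical (the source uses 15 features and does not list its fitted coefficients here), and the logistic form of the score is assumed from the model described earlier.

```python
import math


def answer_probability(alpha0, beta, features):
    """Probability that a candidate is the correct answer (logistic form assumed)."""
    z = alpha0 + sum(b * f for b, f in zip(beta, features))
    return 1.0 / (1.0 + math.exp(-z))


def rank_candidates(alpha0, beta, candidates):
    """Sort (text, features) pairs by descending predicted probability."""
    scored = [(answer_probability(alpha0, beta, feats), text) for text, feats in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical compensated constant term, feature weights, and candidates.
candidates = [("A", [0.1, 0.0]), ("B", [0.9, 1.0]), ("C", [0.5, 0.2])]
ranked = rank_candidates(-2.0, [1.5, 2.0], candidates)
best_probability, best_answer = ranked[0]
```

The first element of the ranked list plays the role of the final answer; the remaining order can be kept when several candidates need to be shown.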
Claims (3)
1. An imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression, characterized in that it comprises the following steps:
Step 1: cluster the sample set to be predicted with the k-means algorithm to obtain K classes of data;
Step 2: perform stratified sampling on the K classes of data to extract n data;
Step 3: perform maximum likelihood estimation on the parameters of the stratified-sample logistic regression model, obtain the parameter estimation formula of the stratified-sample logistic regression model, and thereby determine the model;
Step 4: input the n extracted data into the stratified-sample logistic regression model and determine whether the sample set to be predicted is an imbalanced data set.
2. The imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression according to claim 1, characterized in that, in step 1, clustering the sample set to be predicted with the k-means algorithm to obtain K classes of data comprises:
Step 1.1: randomly select K data from the sample set to be predicted, each serving as the center of one class;
Step 1.2: assign the other data in the sample set to be predicted to the corresponding classes according to the principle of nearest distance to a class center;
Step 1.3: for each class, calculate the mean attribute values of all data in the class and take them as the new center of the class;
Step 1.4: reassign the data in the sample set to be predicted to the corresponding classes according to the principle of nearest distance to the new class centers, and judge whether the resulting classes are the same as those obtained in step 1.2; if they are the same, stop and determine the K classes of data; if not, return to step 1.3.
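3. The imbalanced data prediction method based on cluster-stratified sampling and compensated logistic regression according to claim 1, characterized in that,
in step 3,
the parameter estimation formula of the stratified-sample logistic regression model is

$$\sum_{i=1}^{n}\left[y_i-\frac{\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}{1+\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}\right]=0,\qquad \sum_{i=1}^{n}\left[y_i-\frac{\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}{1+\exp\left(\alpha_1+\sum_{k=1}^{m}\beta_k x_{ik}\right)}\right]x_{ij}=0,\quad j=1,\ldots,m$$

where $\alpha_1$ and $\beta'$ are the unknown parameters of the stratified-sample logistic regression model; $\beta'$ is a $1\times m$ vector, $\beta'=(\beta_1,\ldots,\beta_m)^T$; $x_{ij}$ is the $j$-th feature of the $i$-th extracted datum; $m$ is the number of features of each extracted datum; $i=1,2,3,\ldots,n$; and $y_i$ is the predicted value of the $i$-th extracted datum, taking values in $\{0,1\}$;
the logistic regression model is defined on the feature vector $X=(x_1,x_2,\ldots,x_m)$ of each extracted datum, where $x_m$ is the $m$-th feature of the extracted datum.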
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410341930.3A CN104102716A (en) | 2014-07-17 | 2014-07-17 | Imbalance data predicting method based on cluster stratified sampling compensation logic regression |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410341930.3A CN104102716A (en) | 2014-07-17 | 2014-07-17 | Imbalance data predicting method based on cluster stratified sampling compensation logic regression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104102716A true CN104102716A (en) | 2014-10-15 |
Family
ID=51670870
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410341930.3A Pending CN104102716A (en) | 2014-07-17 | 2014-07-17 | Imbalance data predicting method based on cluster stratified sampling compensation logic regression |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104102716A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104636447A (en) * | 2015-01-21 | 2015-05-20 | 上海天呈医流科技股份有限公司 | Intelligent evaluation method and system for medical instrument B2B website users |
CN106407706A (en) * | 2016-09-29 | 2017-02-15 | 北京理工大学 | Boruta algorithm-based multi-level old people physical state quantization level calculation method |
CN106982230A (en) * | 2017-05-10 | 2017-07-25 | 深信服科技股份有限公司 | A kind of flow rate testing methods and system |
CN110458199A (en) * | 2019-07-16 | 2019-11-15 | 中国传媒大学 | Based on the kohonen neural network clustering methods of sampling |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014060753A1 (en) * | 2012-10-16 | 2014-04-24 | Randox Laboratories Ltd. | Diagnosis and risk stratification of bladder cancer |
-
2014
- 2014-07-17 CN CN201410341930.3A patent/CN104102716A/en active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014060753A1 (en) * | 2012-10-16 | 2014-04-24 | Randox Laboratories Ltd. | Diagnosis and risk stratification of bladder cancer |
Non-Patent Citations (2)
Title |
---|
张吉凯 等: ""聚类在流行病学分层分析中的应用"", 《基础理论与方法》 * |
彭寿康: ""分层抽样条件下Logistic回归模型"", 《统计研究》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104636447A (en) * | 2015-01-21 | 2015-05-20 | 上海天呈医流科技股份有限公司 | Intelligent evaluation method and system for medical instrument B2B website users |
CN104636447B (en) * | 2015-01-21 | 2017-12-29 | 上海天呈医流科技股份有限公司 | A kind of intelligent Evaluation method and system towards medicine equipment B2B websites user |
CN106407706A (en) * | 2016-09-29 | 2017-02-15 | 北京理工大学 | Boruta algorithm-based multi-level old people physical state quantization level calculation method |
CN106982230A (en) * | 2017-05-10 | 2017-07-25 | 深信服科技股份有限公司 | A kind of flow rate testing methods and system |
CN110458199A (en) * | 2019-07-16 | 2019-11-15 | 中国传媒大学 | Based on the kohonen neural network clustering methods of sampling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20141015 |