CN108170695A - A self-adaptive ensemble classification method for data streams based on information entropy - Google Patents

Info

Publication number
CN108170695A
Authority
CN
China
Prior art keywords
concept
classifier
data
drift
window
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611158475.9A
Other languages
Chinese (zh)
Inventor
孙艳歌
卲罕
刘宏兵
冯岩
王淑礼
姚建峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xinyang Normal University
Original Assignee
Xinyang Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xinyang Normal University
Priority to CN201611158475.9A
Publication of CN108170695A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries

Abstract

The invention discloses a self-adaptive ensemble classification method for data streams based on information entropy. The method both detects concept drift and identifies recurring concepts. A new classifier is built and put into the classifier pool only when a new concept is detected, which avoids the repeated training that recurring concepts would otherwise cause, lowers the model update frequency, and improves the model's real-time classification ability and classification quality. The method was compared with classical data stream algorithms on synthetic and real data sets. The experiments show that it copes with multiple types of concept drift, improves the noise resistance of the classification model, and consumes less time while maintaining relatively high classification accuracy. The method can be applied to practical problems such as anomaly detection in sensor networks, credit card fraud detection, weather forecasting and electricity price prediction.

Description

A self-adaptive ensemble classification method for data streams based on information entropy
Technical field
The invention belongs to the field of data mining and machine learning and relates to an ensemble classification method for data streams in concept drift environments; in particular, it proposes a detection system that can handle recurring concepts. Experimental results show that the proposed method has a clear advantage in average classification accuracy, consumes less time than other ensemble algorithms, suits environments with multiple types of concept drift, and has relatively high noise immunity. The system can be applied to practical problems such as anomaly detection in sensor networks, credit card fraud detection, weather forecasting and electricity price prediction.
Background art
In many real-world applications, data are generated continuously in streaming form. Such a fast-arriving, real-time, continuous and unbounded sequence of data is called a data stream. In a real data stream environment the data distribution often changes over time, which reflects that the stream may be inherently non-stationary. For example, the rules underlying weather forecasting may change with the seasons; an analysis of customers' online shopping preferences may change with factors such as the interests of the customer base, merchant reputation and service type; industrial electricity consumption may vary periodically with the seasons. In general, the phenomenon in which the data distribution of a stream changes in some way over time is called concept drift. Many practical applications therefore need a learning mechanism designed for the changing characteristics of data streams, so that these problems can be handled quickly and in real time.
According to the speed of change, concept drift can be divided into abrupt concept drift and gradual concept drift. If, within a short period, the data distribution of a stream is suddenly replaced by a completely different distribution, abrupt concept drift is said to have occurred. This type of drift usually happens without warning (for example, a sensor fails suddenly) and can make accuracy drop sharply or even make the model fail completely. Gradual concept drift is a slow change (for example, a sensor degrades gradually); it can usually be observed only after a longer period, and the concepts before and after the drift are more or less similar. In real environments it is also common for concepts in a data stream to repeat. Recurring concept drift is a special type of concept drift: besides possibly having the characteristics of the two types above, a concept reappears regularly or irregularly, so that the classification model must be retrained again and again to adapt to the change. For example, electricity consumption data vary periodically with the seasons throughout the year, and in a social network a topic may reappear at fixed times (such as holidays or elections).
Concept drift is a major challenge in data stream mining. In recent years, researchers in China and abroad have done a great deal of work on the problem; the approaches fall mainly into three kinds: instance selection, instance weighting and ensemble learning. Most of these algorithms handle only one type of concept drift and do not fully consider that concepts can recur. For recurring drift, the model must be able to use historical data and, when a concept recurs, classify with a previously trained model, thereby avoiding repeated training. An ideal classification model should learn incrementally and adapt to many types of change. Designing a classification algorithm that copes with multiple types of concept drift is therefore of significant research value. Ensemble methods retain historical concepts by training individual classifiers on data from different periods, and are thus an effective way to handle concept drift. The focus here is on how to build a data stream ensemble classification model for data whose distribution changes over time.
Most current concept drift detection methods detect changes in the data distribution from the classification error rate of the model. For example, "Learning with drift detection" (Gama, J., Medas, P., Castillo, G., et al. Learning with drift detection. In: Proceedings of the 17th Brazilian Symposium on Artificial Intelligence. Berlin: Springer-Verlag, 2004, pp. 286-295) proposes the DDM (Drift Detection Method) algorithm, which detects change by monitoring the error rate of the current model but cannot detect gradual concept drift effectively. Subsequently, "Learning from time-changing data with adaptive windowing" (Bifet, A., Gavalda, R. Learning from time-changing data with adaptive windowing. In: Apte, C., Skillicorn, D., Liu, B., et al. (eds.). Proceedings of the 7th SIAM International Conference on Data Mining (SDM 2007). Philadelphia, PA: SIAM, 2007, pp. 443-448) proposes EDDM (Early Drift Detection Method), a concept drift detection method based on the Bernoulli distribution, which improves the detection of gradual concept drift while preserving the detection of abrupt concept drift. Nishida et al. propose the STEPD algorithm, which detects concept drift by comparing the classification accuracy on the most recent training samples with the classification accuracy on all training samples. Bifet et al. propose the adaptive sliding window algorithm ADWIN (ADaptive WINdowing), which decides whether concept drift has occurred by comparing the mean error rates of different sub-windows. Ross et al. propose the ECDD (EWMA for Concept Drift Detection) algorithm, which monitors the error rate with an exponentially weighted moving average (EWMA) control chart; when the error rate exceeds a certain threshold, concept drift is deemed to have occurred.
However, most of the algorithms above do not consider that concepts can recur. Widmer raised the problem of recurring concepts as early as 1996, but it has attracted academic attention only in recent years. Widmer et al. propose the FLORA3 algorithm, which stores descriptions of historical concepts and reactivates the stored classifier when a concept reappears. Nishida et al. propose an online ensemble algorithm, ACE (Adaptive Classifier Ensemble), to cope with recurring concepts. A similar approach is EB (Ensemble Building), an ensemble learning method proposed by Ramamurthy et al. EB builds a set of global classifiers over a sequence of data blocks; it never deletes historical classifiers but selects the relevant classifiers from the global set. Katakis et al. propose a method based on a concept representation model that uses clustering to discover new concepts. Yang et al. treat concepts as states of a Markov chain, learn the pattern of concept drift from the transitions between concepts, describe concept transitions with a Markov model, and select the concept most similar to the current one. Gama et al. use a two-layer classifier model: the first layer trains a classifier on the current concept, and the second layer holds classifiers created for existing concepts. When concept drift is detected, a second-layer classifier is reused. A general framework for handling recurring concept drift, RCD (Recurring Concept Drifts), has also been proposed; it uses multivariate nonparametric statistical methods to identify whether the old and new concepts come from the same distribution.
Summary of the invention
Technical problem: Two problems need to be solved for concept drift: first, how to detect drift quickly and accurately; second, once drift is detected, how to revise the model according to the type of change so as to adapt to it. To this end, the invention designs and implements an adaptive ensemble classification method and system that copes with multiple kinds of concept change. The main contributions are as follows:
(1) For the first problem, a concept drift detection method based on information entropy is proposed. From the perspective of information entropy, the distance between the data distributions of the old and new windows is measured by the Jensen-Shannon divergence, which makes it possible not only to detect concept drift but also to identify recurring concepts effectively.
(2) For the second problem, a classifier pool mechanism is designed. After concept drift is detected, a classifier for a new concept is added to the classifier pool, whereas a recurring concept reuses an existing classifier.
(3) An ensemble system is proposed that detects concept drift and exploits recurring concepts at the same time. Experiments on synthetic and real data streams examine it from several angles, including classification accuracy, run time and noise immunity, and verify the feasibility and effectiveness of the proposed method.
Technical solution: Because concepts recur, and identifying a historical concept costs much less than building a model for a new concept, the essential information about the historical concepts in the data stream needs to be stored. A classifier pool stores the historical concepts, with each classifier representing one concept; when a recurring concept is detected, the relevant information is recalled quickly and used, which reduces unnecessary retraining. An internal concept detection method is therefore added to make the algorithm more adaptive to concept drift, giving the proposed adaptive ensemble classification algorithm with a concept drift detection mechanism (Ensemble with Internal Change Detection, ECD). A new classifier is built and put into the classifier pool only when a new concept is detected, which avoids the repeated training that recurring concepts would otherwise cause, lowers the model update frequency, and improves the model's real-time classification ability and classification quality. The proposed adaptive ensemble algorithm consists of two stages: the concept detection stage and the ensemble classification stage.
The invention proposes a self-adaptive ensemble classification method for data streams based on information entropy. Its specific steps are as follows:
Step 1: Initialize the ensemble classifier and the buffer.
Step 2: Move instances into the sliding window one by one.
Step 3: Apply the proposed two-window detection model, described as follows. Let $W_1 = \{x_{t+1}, x_{t+2}, \ldots, x_{t+n}\}$ and $W_2 = \{x_{t+n+1}, \ldots, x_{t+2n}\}$ denote two consecutive, equal-sized windows at time $t$, where $W_1$ is the reference window and $W_2$ is the current window. $JSD(W_1 \| W_2)$ measures the distance between the distributions of the two windows. When this value is less than or equal to $10^{-5}$ (very close to zero), the data distributions of the two windows are taken to be identical, that is, a recurring concept is found. When it is greater than $10^{-5}$ but less than the threshold $\tau$, the distributions of the two windows are considered not significantly different. When it exceeds the threshold, concept drift has occurred. The threshold is computed by the bootstrap method. Because the window slides forward by one instance at a time, abrupt concept drift can be detected promptly.
Step 4: When concept drift is detected, compare the current distribution with the distributions of the data on which the classifiers in the classifier pool were built. For a new concept, create a classifier, add it to the classifier pool, and place the corresponding data in the buffer; for a recurring concept, reuse the existing classifier. Classifiers are sorted from high to low by reuse frequency; when the number of classifiers stored in the pool reaches its maximum, the least frequently used classifier is replaced.
Step 5: Predict each instance by weighted voting, with weights based on each base classifier's classification error rate on the instances in the latest window.
Description of the drawings
Fig. 1 Comparison of classification accuracy for different window sizes.
Fig. 2 Comparison of classification accuracy on the SEA data set.
Fig. 3 Comparison of classification accuracy on the Elist data set.
Detailed description of embodiments
The technical solution of the invention is further described below with reference to the drawings and embodiments.
(1) Concept detection algorithm based on information entropy
In information theory, relative entropy, also called the Kullback-Leibler divergence, measures the difference between two probability distributions over the same event space $X$. The relative entropy of two probability distributions $p(x)$ and $q(x)$ is defined as:
$$D_{KL}(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)} \qquad (1)$$
However, the Kullback-Leibler divergence is not symmetric, so it is not a distance in the strict sense. The Jensen-Shannon divergence is a measure built on the Kullback-Leibler divergence that resolves its asymmetry. Because the Jensen-Shannon divergence represents the relationship between two data distributions well, the invention proposes a concept detection algorithm based on it: concept drift is detected by checking whether the data distributions in two adjacent windows differ significantly. The Jensen-Shannon divergence is defined in formula (2).
$$JSD(p \| q) = \frac{1}{2} D_{KL}(p \| m) + \frac{1}{2} D_{KL}(q \| m), \quad m = \frac{1}{2}(p + q) \qquad (2)$$
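The two divergences can be computed directly from window contents. The following is a minimal sketch, assuming each instance has already been mapped to a discrete, hashable symbol (the text does not state how the window distributions are estimated) and that logarithms are taken base 2, so the result lies in [0, 1]. The function names are illustrative only.

```python
import math
from collections import Counter

def kl_divergence(p, q):
    """D_KL(p || q) in bits; terms with p(x) == 0 contribute nothing."""
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def js_divergence(window1, window2):
    """Jensen-Shannon divergence between the empirical distributions
    of two non-empty windows of discrete symbols."""
    c1, c2 = Counter(window1), Counter(window2)
    n1, n2 = len(window1), len(window2)
    support = set(c1) | set(c2)
    p = {x: c1[x] / n1 for x in support}
    q = {x: c2[x] / n2 for x in support}
    # The mixture m is positive wherever p or q is positive,
    # so the KL terms below never divide by zero.
    m = {x: 0.5 * (p[x] + q[x]) for x in support}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

Unlike the Kullback-Leibler divergence, the result does not depend on the order of the two windows, and it stays finite even when a symbol appears in only one window.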
The proposed two-window detection model uses $JSD(W_1 \| W_2)$ to measure the distance between the distributions of the two windows. When this value is less than or equal to $10^{-5}$ (very close to zero), the data distributions of the two windows are taken to be identical, that is, a recurring concept is found. When it is greater than $10^{-5}$ but less than the threshold $\tau$, the distributions of the two windows are considered not significantly different. When it exceeds the threshold, concept drift has occurred. The threshold is computed by the bootstrap method. Because the window slides forward by one instance at a time, abrupt concept drift can be detected promptly. Pseudocode is shown in Algorithm 1.
Algorithm 1: Concept detection algorithm based on information entropy
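A minimal sketch of the three-way decision rule described above, together with one possible bootstrap estimate of the threshold. The bootstrap procedure, its parameters (`n_resamples`, `quantile`, `seed`) and all function names are assumptions made for illustration; the text only states that the threshold is computed by the bootstrap method. Instances are assumed to be discrete symbols.

```python
import math
import random
from collections import Counter

EPSILON = 1e-5  # "identical distributions" bound given in the text

def jsd(w1, w2):
    """Jensen-Shannon divergence (base 2) of two non-empty symbol windows."""
    c1, c2 = Counter(w1), Counter(w2)
    total = 0.0
    for x in set(c1) | set(c2):
        p, q = c1[x] / len(w1), c2[x] / len(w2)
        m = 0.5 * (p + q)
        if p > 0:
            total += 0.5 * p * math.log2(p / m)
        if q > 0:
            total += 0.5 * q * math.log2(q / m)
    return total

def bootstrap_threshold(reference, n_resamples=200, quantile=0.95, seed=0):
    """Assumed procedure: resample pairs of windows from data of a single
    concept and take a high quantile of their JSD values as tau."""
    rng = random.Random(seed)
    n = len(reference)
    scores = sorted(
        jsd(rng.choices(reference, k=n), rng.choices(reference, k=n))
        for _ in range(n_resamples)
    )
    return scores[min(n_resamples - 1, int(quantile * n_resamples))]

def classify_change(w1, w2, tau):
    """Return 'same', 'no_significant_difference' or 'drift'."""
    d = jsd(w1, w2)
    if d <= EPSILON:
        return "same"
    if d <= tau:  # a value exactly equal to tau is treated as no drift here
        return "no_significant_difference"
    return "drift"
```

In a streaming setting the two windows would advance by one instance per step and `classify_change` would be called after each step; recomputing the counts from scratch, as done here, keeps the sketch short but costs O(n) per instance.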
(2) Adaptive ensemble classification system based on information entropy
Specifically, let $E = \{C_1, C_2, \ldots, C_k\}$ denote the classifier pool of $k$ classifiers; each classifier also carries a variable recording how many times it has been reused. $B = \{B_1, B_2, \ldots, B_k\}$ denotes the corresponding stored data, and $C'$ denotes a newly built classifier. The latest data are maintained with a sliding window model; for the continually arriving instances $(x_i, y_i)$, $W_1$ denotes the reference (old) window and $W_2$ the current window. Concept drift is detected by comparing the distance between the data distributions of the old and new windows. When concept drift is detected, the current distribution is compared with the distributions of the data on which the classifiers in the pool were built. For a new concept, a classifier is created and added to the classifier pool, and the corresponding data are placed in the buffer; for a recurring concept, the existing classifier is reused. Classifiers are sorted from high to low by reuse frequency; when the number of classifiers stored in the pool reaches its maximum, the least frequently used classifier is replaced. Then each base classifier $C_i$ ($i = 1, 2, \ldots, k$) is weighted by formula (4) according to its classification error rate on the instances in the latest window, and each instance is predicted by weighted voting.
$$Weight(C_i) = MSE_r - MSE_{ij} \qquad (4)$$
Here $MSE_r$ is the mean square error of a classifier that predicts at random, and $MSE_{ij}$ is the mean square error of base classifier $C_i$ on the current window. The quantities involved are the probability with which classifier $C_i$ predicts class $y_i$ for an instance with attribute values $x_i$, and $p(y_i)$, the prior probability of $y_i$. In this way, the sub-model representing the current concept is looked up in the classifier pool, which reduces the computational cost of learning a new model and also improves adaptation to concept drift. Pseudocode is shown in Algorithm 2.
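A minimal sketch of the weight in formula (4). The two error terms are written here in the form commonly used for accuracy-weighted ensembles: the random-prediction error is summed over the class priors, and the classifier error averages the squared shortfall of the probability given to the true class. This form is an assumption, since the text describes the terms in words only; the function names are illustrative.

```python
def mse_random(class_priors):
    """Assumed MSE_r: error of a classifier that predicts at random
    according to the class priors p(y)."""
    return sum(p * (1.0 - p) ** 2 for p in class_priors.values())

def mse_classifier(true_class_probs):
    """Assumed MSE_ij: mean of (1 - f)^2 over the current window, where f is
    the probability the classifier gave to the true class of each instance."""
    return sum((1.0 - f) ** 2 for f in true_class_probs) / len(true_class_probs)

def classifier_weight(class_priors, true_class_probs):
    """Formula (4): Weight(C_i) = MSE_r - MSE_ij."""
    return mse_random(class_priors) - mse_classifier(true_class_probs)
```

Under these definitions a classifier that is no better than random guessing gets a weight of zero or less, so it contributes nothing useful to the vote, while a classifier that always gives probability 1 to the true class gets the maximum weight `mse_random(class_priors)`.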
Algorithm 2: ECD pseudocode
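A minimal sketch of the classifier pool behaviour described above: reuse on a recurring concept, creation on a new concept, replacement of the least reused classifier when the pool is full, and weighted voting. The class names and the `train` and `same_concept` callables are hypothetical placeholders; in the described method the concept comparison would be the Jensen-Shannon test against the buffered data, and the base learner is left open here.

```python
from dataclasses import dataclass

@dataclass(eq=False)
class PoolEntry:
    classifier: object   # any object with a predict(x) method (assumed interface)
    data: list           # buffered window the classifier was built from
    reuse_count: int = 0

class ClassifierPool:
    def __init__(self, max_size, train, same_concept):
        self.max_size = max_size
        self.train = train                # hypothetical: window -> classifier
        self.same_concept = same_concept  # hypothetical: (window, data) -> bool
        self.entries = []

    def on_drift(self, window):
        """Call when drift is detected; returns the pool entry to use."""
        for entry in self.entries:
            if self.same_concept(window, entry.data):
                entry.reuse_count += 1    # recurring concept: reuse
                return entry
        new_entry = PoolEntry(self.train(window), list(window))
        if len(self.entries) >= self.max_size:
            # Pool is full: replace the least frequently reused classifier.
            least_used = min(self.entries, key=lambda e: e.reuse_count)
            self.entries.remove(least_used)
        self.entries.append(new_entry)
        return new_entry

    def predict(self, x, weights):
        """Weighted vote; weights[i] belongs to self.entries[i]."""
        votes = {}
        for entry, w in zip(self.entries, weights):
            label = entry.classifier.predict(x)
            votes[label] = votes.get(label, 0.0) + w
        return max(votes, key=votes.get)
```

Keeping the buffered data next to each classifier is what allows a later window to be compared against every stored concept without retraining anything.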
Simulation results of the invention
The system was tested on a PC with a 2.8 GHz CPU, 8 GB of memory and the Windows 7 operating system. Three synthetic data sets were chosen to validate the proposed model, as shown in Table 1.
Table 1. Characteristics of the synthetic data sets
Three types of concept drift are generated with data stream generators: abrupt, gradual and recurring concept drift.
HyperPlane is the most widely used data stream data set; it simulates concept drift by changing the weights of the attributes of the data samples. In the experiment, the HyperplaneGenerator data stream generator in MOA is used to generate a gradual concept drift data set with a change probability of 0.001.
SEA, proposed by Street in 2001, contains only continuous attributes and is a classical abrupt concept drift data set. Its basic structure is <f1, f2, f3, C>, where f1, f2 and f3 are condition attributes and C is the class attribute; only f1 and f2 are relevant to C. When the attributes of an instance satisfy f1 + f2 ≤ θ, the instance belongs to the first class; otherwise it belongs to the second class. In the experiment, a data stream generator is first used to generate a data set with 3 abrupt changes, occurring at 250K, 500K and 750K respectively; then a data stream generator capable of producing recurring concepts is used to generate a data set containing 3 recurring concepts.
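A minimal sketch of an SEA-style stream with abrupt changes of θ at equal intervals. The attribute range [0, 10], the example θ values and the label-flipping noise model are assumptions taken from the commonly used SEA setup, not values stated in the text; `sea_stream` is an illustrative name and not the MOA generator.

```python
import random

def sea_stream(n_instances, thetas, noise=0.0, seed=0):
    """Yield ((f1, f2, f3), label) pairs. The stream is split into
    len(thetas) equal blocks, and theta switches abruptly between blocks."""
    rng = random.Random(seed)
    block = max(1, n_instances // len(thetas))
    for i in range(n_instances):
        theta = thetas[min(i // block, len(thetas) - 1)]
        f1, f2, f3 = (rng.uniform(0.0, 10.0) for _ in range(3))
        label = 1 if f1 + f2 <= theta else 2   # f3 is irrelevant to the class
        if rng.random() < noise:
            label = 3 - label                  # flip 1 <-> 2 with probability `noise`
        yield (f1, f2, f3), label
```

With four θ values and one million instances, for example `sea_stream(1_000_000, [8.0, 9.0, 7.0, 9.5], noise=0.1)`, the three switches fall at 250K, 500K and 750K, matching the change points described above.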
The Waveform data set is built from 3 base waveforms (each base waveform consists of 21 numeric attributes), and each class is a combination of two or three of them. In the experiment, a data stream generator is used to produce a Waveform data stream with 40 numeric attributes, 19 of which are irrelevant.
Emailing list (Elist for short) is a data set containing abrupt concept drift and recurring concepts, and Spam Filtering (Spam for short) is a data set containing gradual concept drift; both are represented with a Boolean bag-of-words model. The data sets can be downloaded from http://mlkd.csd.auth.gr/concept_driff.html, and the ArffFileStream generator in MOA is used to turn the static data into a simulated data stream.
Elist simulates a continuous stream of e-mail messages from different topics, which the user labels as junk or interesting according to personal interest. It contains 1500 instances and 913 attributes in total. The data are divided into 5 stages, and concept drift is simulated by changing the user's interests. Table 2 lists which types of mail are regarded as interesting or as junk in each stage, where (+) denotes interesting and (-) denotes junk. Let C1 denote that the user is interested only in mail about medicine, and C2 that the user is interested in aviation and baseball; the data stream then follows the concept sequence C1, C2, C1, C2, C1.
Table 2. Characteristics of the Emailing list data set
Spam contains 9324 instances and 500 attributes. Each instance represents one e-mail message and belongs to one of two types: spam (only 20% of the instances) and legitimate mail. The characteristics of the spam in the data set change slowly over time.
The experiments use the prequential evaluation strategy: every instance is first used as test data and then as training data, so the accuracy is updated incrementally. This strategy does not require holding out part of the data set for testing, so the information in every instance is used to the full, and the accuracy changes smoothly over time.
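A minimal sketch of prequential (test-then-train) evaluation as described above. The `predict`/`learn` interface and the toy `MajorityClass` model are assumptions made only so the example runs; any incremental classifier with those two methods could be substituted.

```python
def prequential_accuracy(stream, model):
    """Test each instance first, then train on it.
    Returns the running accuracy after every instance."""
    correct = 0
    seen = 0
    history = []
    for x, y in stream:
        if model.predict(x) == y:   # test before the label is revealed
            correct += 1
        seen += 1
        history.append(correct / seen)
        model.learn(x, y)           # then use the same instance for training
    return history

class MajorityClass:
    """Toy incremental model: always predicts the most frequent label so far."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

Because every instance is scored before the model sees its label, no separate test set is needed, and the running accuracy reflects how the model would have performed on the stream as it arrived.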
Tests were run on the SEA data set with the sliding window size n varied over [500, 2000] to examine how the window size setting affects the performance of the algorithm. As Fig. 1 shows, at first a larger window provides more data for building the classifier, so classification accuracy rises with it. As the window keeps growing, however, concept drift is discovered later, while the classification accuracy of the classifier reaches a plateau, so the average classification accuracy decreases slightly. Table 3 shows that the proposed method is affected very little by the window size setting; with a window size of 1000 the algorithm achieves relatively high classification accuracy.
Table 3. Classification accuracy using different window sizes
The proposed algorithm is then compared with the following 3 algorithms: Hoeffding Tree (HT), RCD and Accuracy Updated Ensemble (AUE). HT and AUE are implemented in MOA, and RCD can be obtained from https://sites.google.com/site/moaextensions/. For ease of comparison, the number of classifiers in the classifier pool is k = 15; the base classifier of the compared ensemble algorithms is the Hoeffding Tree, whose leaf nodes predict the class value with adaptive Bayes, with n_min = 100, confidence δ = 0.01 and τ = 0.05. The comparison covers two aspects: classification accuracy and run time.
Table 4 shows the classification accuracy of the algorithms on the 5 data sets. Overall, the ensemble algorithms achieve higher classification accuracy than the single-classifier algorithm. HT performs worst on the data sets that contain concept drift because it has no mechanism for handling concept drift and is therefore unsuited to such environments. On Waveform, which has no concept drift, the data distribution is relatively stable, so the algorithms differ little and the proposed algorithm has no clear advantage. On the gradual drift data set HyperPlane, AUE achieves the highest accuracy, followed by the proposed algorithm. The reason is that AUE keeps training new classifiers on the latest data blocks and can therefore cope with gradual concept drift. On SEA, which contains recurring concepts, the proposed algorithm achieves the highest accuracy, because the added recurring-concept drift detection mechanism lets it build a new classifier in time to adapt to sudden concept changes. On the real data set Elist, the proposed algorithm performs best. On Spam, RCD performs best, followed by the proposed algorithm.
Run times are shown in Table 5. The comparison shows that HT has the shortest run time, followed by the proposed algorithm, while AUE takes the longest. This is because the proposed algorithm is a detection algorithm based on the data distribution: it detects concept drift and identifies recurring concepts quickly and selects an existing model, which avoids repeated training and therefore saves time. Although the single-classifier algorithm HT is the fastest, it performs worst in classification accuracy.
Table 4. Comparison of classification accuracy (%)
Table 5. Comparison of run time (seconds)
Three abrupt changes occur in the SEA data set. Fig. 2 shows that all algorithms follow broadly the same trend. In the initial stage the data are fairly stable, all algorithms keep a high and stable accuracy, and the proposed algorithm has no clear advantage. As the data volume grows and the number of concept drifts increases, the classification accuracy of all algorithms declines; HT declines most and fluctuates widely. When the concept changes abruptly at 250K, 500K and 750K, the accuracy of the algorithms drops sharply, while the proposed algorithm remains comparatively high and stable. The average accuracy of the proposed algorithm is about 20% higher than that of HT. This is because the proposed algorithm captures concept changes quickly and builds a new classifier, so it can respond to the change in time. Because 10% noise was added to the data, this also shows that the proposed algorithm is resistant to noise.
In a real data stream environment, concept changes are unpredictable and uncertain, so such an environment is a better test of an algorithm's generalization ability. The accuracy on Elist is shown in Fig. 3. The accuracy curves of all algorithms fluctuate to different degrees, which indicates that concept drift is present in the data set. The accuracy curve of the proposed algorithm is comparatively steady: it makes full use of the knowledge held by historical classifiers to handle the periodic recurrence of concepts in the data stream. This shows that the proposed method is the least affected by concept drift in the data and adapts well to real data environments.
The experimental comparison leads to the following conclusions: (1) the algorithm has a clear advantage on data sets that contain recurring concepts; (2) it consumes relatively little time while keeping high classification accuracy; (3) it is reasonably robust to noise.

Claims (1)

  1. A self-adaptive ensemble classification method for data streams based on information entropy, characterized in that the adaptive ensemble classification method consists of two stages, a concept detection stage and an ensemble classification stage, and comprises the following steps:
    Step 1: Initialize the ensemble classifier and the buffer;
    Step 2: Move instances into the sliding window one by one;
    Step 3: Apply the proposed two-window detection model, described as follows. Let $W_1 = \{x_{t+1}, x_{t+2}, \ldots, x_{t+n}\}$ and $W_2 = \{x_{t+n+1}, \ldots, x_{t+2n}\}$ denote two consecutive, equal-sized windows at time $t$, where $W_1$ is the reference window and $W_2$ is the current window. $JSD(W_1 \| W_2)$ measures the distance between the distributions of the two windows. When this value is less than or equal to $10^{-5}$ (very close to zero), the data distributions of the two windows are taken to be identical, that is, a recurring concept is found. When it is greater than $10^{-5}$ but less than the threshold $\tau$, the distributions of the two windows are considered not significantly different. When it exceeds the threshold, concept drift has occurred. The threshold is computed by the bootstrap method. Because the window slides forward by one instance at a time, abrupt concept drift can be detected promptly;
    Step 4: When concept drift is detected, compare the current distribution with the distributions of the data on which the classifiers in the classifier pool were built. For a new concept, create a classifier, add it to the classifier pool, and place the corresponding data in the buffer; for a recurring concept, reuse the existing classifier. Classifiers are sorted from high to low by reuse frequency; when the number of classifiers stored in the pool reaches its maximum, the least frequently used classifier is replaced;
    Step 5: Predict each instance by weighted voting, with weights based on each base classifier's classification error rate on the instances in the latest window.
CN201611158475.9A 2016-12-07 2016-12-07 A self-adaptive ensemble classification method for data streams based on information entropy Pending CN108170695A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611158475.9A CN108170695A (en) 2016-12-07 2016-12-07 One data stream self-adapting Ensemble classifier method based on comentropy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611158475.9A CN108170695A (en) 2016-12-07 2016-12-07 One data stream self-adapting Ensemble classifier method based on comentropy

Publications (1)

Publication Number Publication Date
CN108170695A true CN108170695A (en) 2018-06-15

Family

ID=62527185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611158475.9A Pending CN108170695A (en) 2016-12-07 2016-12-07 One data stream self-adapting Ensemble classifier method based on comentropy

Country Status (1)

Country Link
CN (1) CN108170695A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359677A (en) * 2018-10-09 2019-02-19 中国石油大学(华东) A noise-resistant online multi-classification kernel learning method
CN109615075A (en) * 2018-12-14 2019-04-12 大连海事大学 A kind of resident's daily behavior recognition methods based on multi-level clustering model
CN110445726A (en) * 2019-08-16 2019-11-12 山东浪潮人工智能研究院有限公司 A kind of adaptive network stream concept drift detection method based on comentropy
CN110705646A (en) * 2019-10-09 2020-01-17 南京大学 Mobile equipment streaming data identification method based on model dynamic update
CN111639694A (en) * 2020-05-25 2020-09-08 南京航空航天大学 Concept drift detection method based on classifier diversity and Mcdiarmid inequality
CN112765324A (en) * 2021-01-25 2021-05-07 四川虹微技术有限公司 Concept drift detection method and device
CN113033643A (en) * 2021-03-17 2021-06-25 上海交通大学 Concept drift detection method and system based on weighted sampling and electronic equipment
US11080352B2 (en) 2019-09-20 2021-08-03 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11157776B2 (en) 2019-09-20 2021-10-26 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11188320B2 (en) 2019-09-20 2021-11-30 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
US11216268B2 (en) 2019-09-20 2022-01-04 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
CN114422450A (en) * 2022-01-21 2022-04-29 中国人民解放军国防科技大学 Network flow analysis method and device based on multi-source network flow data
CN114513328A (en) * 2021-12-31 2022-05-17 西安电子科技大学 Network traffic intrusion detection method based on concept drift and deep learning
EP4027277A1 (en) * 2021-01-11 2022-07-13 Fundación Tecnalia Research & Innovation Method, system and computer program product for drift detection in a data stream
US20230025677A1 (en) * 2021-07-26 2023-01-26 Raytheon Company Architecture for ml drift evaluation and visualization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020288A (en) * 2012-12-28 2013-04-03 大连理工大学 Method for classifying data streams under dynamic data environment
CN105809190A (en) * 2016-03-03 2016-07-27 南京邮电大学 Characteristic selection based SVM cascade classifier method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020288A (en) * 2012-12-28 2013-04-03 大连理工大学 Method for classifying data streams under dynamic data environment
CN103020288B (en) * 2012-12-28 2016-03-02 大连理工大学 Method for classifying data streams under dynamic data environment
CN105809190A (en) * 2016-03-03 2016-07-27 南京邮电大学 Characteristic selection based SVM cascade classifier method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
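Pan Wubin (潘吴斌) et al.: "Adaptive network traffic concept drift classification method based on information entropy" (基于信息熵的自适应网络流概念漂移分类方法), Chinese Journal of Computers (《计算机学报》) *
Huang Li (黄莉) et al.: "Study on the influence of two similarity calculation methods on KNN classification performance" (两种相似度计算方法对KNN分类效果的影响研究), Journal of Intelligence (《情报杂志》) *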

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109359677A (en) * 2018-10-09 2019-02-19 中国石油大学(华东) A noise-resistant online multi-classification kernel learning method
CN109359677B (en) * 2018-10-09 2021-11-23 中国石油大学(华东) Noise-resistant online multi-classification kernel learning algorithm
CN109615075A (en) * 2018-12-14 2019-04-12 大连海事大学 A kind of resident's daily behavior recognition methods based on multi-level clustering model
CN109615075B (en) * 2018-12-14 2022-08-19 大连海事大学 Resident daily behavior identification method based on multilayer clustering model
CN110445726A (en) * 2019-08-16 2019-11-12 山东浪潮人工智能研究院有限公司 A kind of adaptive network stream concept drift detection method based on comentropy
US11216268B2 (en) 2019-09-20 2022-01-04 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
US11080352B2 (en) 2019-09-20 2021-08-03 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11157776B2 (en) 2019-09-20 2021-10-26 International Business Machines Corporation Systems and methods for maintaining data privacy in a shared detection model system
US11188320B2 (en) 2019-09-20 2021-11-30 International Business Machines Corporation Systems and methods for updating detection models and maintaining data privacy
CN110705646A (en) * 2019-10-09 2020-01-17 南京大学 Mobile equipment streaming data identification method based on model dynamic update
CN110705646B (en) * 2019-10-09 2021-11-23 南京大学 Mobile equipment streaming data identification method based on model dynamic update
CN111639694A (en) * 2020-05-25 2020-09-08 南京航空航天大学 Concept drift detection method based on classifier diversity and Mcdiarmid inequality
EP4027277A1 (en) * 2021-01-11 2022-07-13 Fundación Tecnalia Research & Innovation Method, system and computer program product for drift detection in a data stream
CN112765324A (en) * 2021-01-25 2021-05-07 四川虹微技术有限公司 Concept drift detection method and device
CN113033643A (en) * 2021-03-17 2021-06-25 上海交通大学 Concept drift detection method and system based on weighted sampling and electronic equipment
CN113033643B (en) * 2021-03-17 2022-11-22 上海交通大学 Concept drift detection method and system based on weighted sampling and electronic equipment
US20230025677A1 (en) * 2021-07-26 2023-01-26 Raytheon Company Architecture for ml drift evaluation and visualization
US11816186B2 (en) * 2021-07-26 2023-11-14 Raytheon Company Architecture for dynamic ML model drift evaluation and visualization on a GUI
CN114513328A (en) * 2021-12-31 2022-05-17 西安电子科技大学 Network traffic intrusion detection method based on concept drift and deep learning
CN114513328B (en) * 2021-12-31 2023-02-10 西安电子科技大学 Network traffic intrusion detection method based on concept drift and deep learning
CN114422450A (en) * 2022-01-21 2022-04-29 中国人民解放军国防科技大学 Network flow analysis method and device based on multi-source network flow data
CN114422450B (en) * 2022-01-21 2024-01-19 中国人民解放军国防科技大学 Network traffic analysis method and device based on multi-source network traffic data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180615