CN103678512A - Data stream merge sorting method under dynamic data environment - Google Patents


Info

Publication number
CN103678512A
CN103678512A (application CN201310608553.0A)
Authority
CN
China
Prior art keywords
data
model
data stream
module
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310608553.0A
Other languages
Chinese (zh)
Inventor
姚远
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Minzu University
Original Assignee
Dalian Nationalities University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Nationalities University filed Critical Dalian Nationalities University
Priority to CN201310608553.0A priority Critical patent/CN103678512A/en
Publication of CN103678512A publication Critical patent/CN103678512A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • G06F16/24568Data stream processing; Continuous queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of intelligent information processing and discloses a data stream hybrid classification method under a dynamic data environment. The method builds a data stream classification model using ensemble learning and a mixed-model framework, so that it can meet the requirements imposed by the massiveness, real-time arrival and dynamic change of data streams, and improves the accuracy of data stream classification. The ensemble learning model draws on ensemble learning theory and classifies with multiple classifiers, improving both the classification performance and the ability to adapt to the dynamic nature of the stream. In addition, the classification results are clustered with a clustering method, which effectively exploits the internal relationships among the results, helps to improve classification accuracy and shortens the time spent on classification.

Description

Data stream hybrid classification method under a dynamic data environment
Technical field
The present invention relates to the field of intelligent information processing, and in particular to a data stream hybrid classification method under a dynamic data environment, applicable to network intrusion detection, network security monitoring, sensor data monitoring, power supply monitoring and similar applications.
Background technology
With the development of the Internet of Things and the arrival of the "big data" era, traditional data mining methods face new challenges, among which the change in data form is the most important and fundamental. Traditional data mainly take a static form: their volume is finite, they can be stored, and they remain essentially unchanged. Traditional data mining algorithms are therefore usually designed under the assumption that the data are static, and the design effort focuses on the algorithm itself rather than on adapting to the form of the data.
In recent years, however, as informatization has deepened, a brand-new data form, the data stream, has gradually become the mainstream. Unlike static data, a data stream has three essential characteristics: massiveness, real-time arrival and dynamic change. Simply reusing traditional data mining methods therefore often fails to produce satisfactory results, or fails altogether. For this very reason, research on data stream mining has become a new research hotspot.
For the data stream classification problem, the key issue is to design a classification method that adapts to the characteristics of data streams (massiveness, real-time arrival and dynamic change). Specifically, compared with traditional classification methods, the massiveness of a data stream requires the method to train and classify without storing historical data; the real-time requirement means that, besides classification accuracy, the classification time must be optimized and compressed so that the whole classification process finishes, as far as possible, before new stream data arrive, which places new demands on the operational efficiency of the model; and the dynamic change of the stream requires the classification model to have a degree of extensibility and self-adaptation so that it can follow changes in the stream. For these reasons, designing a classification model that fully satisfies all three characteristics of data streams has long been a goal of the academic community, while most of the classification methods proposed so far satisfy only one or two of these characteristics and meet the classification requirements only to a limited extent.
At present, no classification method, domestic or international, fully adapts to the characteristics of data streams; a data stream hybrid classification method under a dynamic data environment is therefore urgently needed.
Summary of the invention
The object of the invention is to solve the above problems of the prior art by providing a data stream hybrid classification method under a dynamic data environment that can satisfy the massiveness, real-time and dynamic-change characteristics of data streams and meet the classification requirements.
To achieve the above object, the technical solution adopted by the invention is a data stream hybrid classification method under a dynamic data environment, comprising the following steps:
Step 1: a dynamic data stream collection module (102) collects data in chronological order from a massive real-time data stream (101).
Step 2: a data stream division module (103) reads the data collected in Step 1 and divides the data stream according to the temporal order of the samples. The data blocks produced by the data stream division module (103) contain three kinds of data sets: a training set, a validation set and a test set, each containing N data samples; N is a fixed value set in advance by the user.
Step 3: the three static data sets obtained from the data stream division module (103), namely the training set, test set and validation set, are input to a data initialization module (104), which normalizes them.
Step 4: the normalized training set output by the data initialization module (104) is input to an integrated classifier module (105), which trains on the training set and builds the integrated classifier.
Step 5: a parameter optimization module (106) performs parameter optimization on the integrated classifier model built in Step 4.
Step 6: the validation set processed by the data initialization module (104) is input to the integrated classifier optimized in Step 5; the resulting class labels form a label data set L.
Step 7: the data set L is input to a clustering module (107), and the clustering model it uses is trained on L.
Step 8: the test set data produced by the data initialization module (104) are input to the constructed hybrid classification model, completing the data stream classification process.
In Step 2, the division of the data stream by the data stream division module (103) comprises the following steps:
Step 2.1: first, the massive real-time data stream is converted to static form using a sliding-window technique; the window slides by n samples at a time, and each resulting static subset also contains n samples;
Step 2.2: the subsets obtained in Step 2.1 are shuffled by random sampling to produce three data sets, namely a training set, a test set and a validation set, where the training set and test set each have size 4n.
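The following Python sketch (not part of the original disclosure) illustrates Steps 2.1–2.2: cutting the stream into fixed-size static subsets with a sliding window and then drawing the data sets from whole subsets by random sampling. Function and variable names such as make_static_subsets are illustrative assumptions.

```python
# Illustrative sketch, not the patent's reference implementation.
import random
from typing import List, Sequence, Tuple

def make_static_subsets(stream: Sequence, window_size: int) -> List[list]:
    """Cut the time-ordered stream into consecutive blocks of window_size samples."""
    return [list(stream[i:i + window_size])
            for i in range(0, len(stream) - window_size + 1, window_size)]

def split_subsets(subsets: List[list], n_train: int, n_valid: int,
                  seed: int = 0) -> Tuple[list, list, list]:
    """Randomly assign whole subsets to training/validation; the rest form the test set."""
    rng = random.Random(seed)
    order = list(range(len(subsets)))
    rng.shuffle(order)
    train = [s for i in order[:n_train] for s in subsets[i]]
    valid = [s for i in order[n_train:n_train + n_valid] for s in subsets[i]]
    test = [s for i in order[n_train + n_valid:] for s in subsets[i]]
    return train, valid, test
```

With a window size of 10 and the 690-sample data set of Example 1 below, make_static_subsets yields the 69 subsets mentioned there.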
In Step 3, the data initialization module (104) normalizes the data with the MapMinMax method, comprising the following steps:
Step 3.1: for the obtained training set, test set and validation set, the values of each attribute are scanned to find the minimum and maximum value of every attribute;
Step 3.2: each attribute of the data sets is normalized; the normalization formula is:
y = (ymax − ymin) × (x_i − min(x_i)) / (max(x_i) − min(x_i)) + ymin
where x_i denotes the i-th attribute value of the current sample, min(x_i) and max(x_i) denote the minimum and maximum of the i-th attribute respectively, and ymax and ymin denote the upper and lower bounds of the normalized range; for normalization to the interval [0, 1], ymax is 1 and ymin is 0.
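A minimal sketch of the MapMinMax normalization just described, assuming per-attribute (per-column) scaling with NumPy; the function name map_min_max is an illustrative assumption.

```python
import numpy as np

def map_min_max(X: np.ndarray, ymin: float = 0.0, ymax: float = 1.0) -> np.ndarray:
    """Rescale every column (attribute) of X into [ymin, ymax]."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (ymax - ymin) * (X - col_min) / span + ymin

# Worked value from Example 1 below: attribute range [50, 100], value 66 -> 0.32
print(map_min_max(np.array([[50.0], [66.0], [100.0]]))[1, 0])
```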
In Step 4, the integrated classifier module (105) uses support vector machine (SVM) models as base classifiers to classify the data stream and to build the integrated classifier, comprising the following steps:
Step 4.1: two kinds of SVM models are used as base classification models, namely the C-SVM and ν (nu)-SVM models;
Step 4.2: three kernel functions are combined with the above two SVM models, giving six different SVM classification models; the kernels used are the linear kernel, the Gaussian radial basis function (RBF) kernel and the sigmoid kernel;
Step 4.3: the resulting ensemble learning model is trained.
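One possible realization of the six base classifiers of Steps 4.1–4.2, assuming scikit-learn's SVC (C-SVM) and NuSVC (ν-SVM); the patent does not prescribe a particular library, and the parameter values shown are placeholders.

```python
from sklearn.svm import SVC, NuSVC

def build_base_classifiers(C: float = 1.0, nu: float = 0.5, gamma="scale") -> dict:
    """Six SVM models: {C-SVM, nu-SVM} x {linear, RBF, sigmoid} kernels (Model1..Model6)."""
    models = {}
    for kernel in ("linear", "rbf", "sigmoid"):
        models[f"C-SVM/{kernel}"] = SVC(C=C, kernel=kernel, gamma=gamma)
        models[f"nu-SVM/{kernel}"] = NuSVC(nu=nu, kernel=kernel, gamma=gamma)
    return models
```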
In Step 5, the parameter optimization module (106) performs parameter optimization on the constructed integrated classifier; the optimization method used is the particle swarm optimization (PSO) algorithm, and the optimization process comprises the following steps:
Step 5.1: first, the parameters c and g used in the classification model built from C-SVM with the Gaussian RBF kernel are extracted;
Step 5.2: the validation set normalized by the data initialization module (104) is input to this model, and the PSO algorithm is used to optimize the parameters; the fitness function used during optimization follows an m-fold cross-validation scheme and can be expressed as:
fitness = (1/m) × Σ_{i=1..m} ( l_i^t / l_i )
where the parameter m is the number of sample subsets drawn from the validation set, l_i denotes the number of samples in the i-th subset, and l_i^t denotes the number of correctly classified samples in that subset;
Step 5.3: the optimized parameters c and g are placed back into the model and used as internal model parameters.
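A sketch of one reading of the m-fold cross-validation fitness of Step 5.2 (average accuracy of a candidate C-SVM/RBF model with parameters c and g over m validation subsets); this interpretation, and names such as fitness and subsets, are assumptions rather than the patent's definition.

```python
import numpy as np
from sklearn.svm import SVC

def fitness(c: float, g: float, subsets) -> float:
    """subsets: iterable of (X_train, y_train, X_val, y_val) drawn from the validation set."""
    scores = []
    for X_train, y_train, X_val, y_val in subsets:
        model = SVC(C=c, kernel="rbf", gamma=g).fit(X_train, y_train)
        scores.append(np.mean(model.predict(X_val) == y_val))
    return float(np.mean(scores))  # (1/m) * sum(l_i^t / l_i)
```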
In Step 7, the clustering module (107) clusters the classification results produced by the integrated classifier, i.e. the data set L, to obtain the final classification result; the clustering method used is the self-organizing map (SOM), comprising the following steps:
Step 7.1: the SOM model is trained first, giving a trained SOM model;
Step 7.2: the test set is input to the constructed ensemble classification model, which outputs the class label data set corresponding to the test set;
Step 7.3: the class label data set is input to the trained SOM model; the model computes the distance between the input sample and each node and finds the activated node; the computation is as follows:
d_j = || x − w_j || = sqrt( Σ_i (x_i − w_ij)² ), the activated node being the node j with the smallest distance d_j
where x denotes the input sample and w_j denotes the weight vector of node j of the SOM model;
Step 7.4: Steps 7.2 and 7.3 are repeated until all data have been classified.
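A sketch of the winning-node (activated node) computation of Step 7.3, using the Euclidean distance reconstructed above; the array shapes and names are assumptions.

```python
import numpy as np

def winning_node(x: np.ndarray, weights: np.ndarray) -> int:
    """weights has shape (num_nodes, dim); returns the index of the closest node."""
    return int(np.argmin(np.linalg.norm(weights - x, axis=1)))

# Worked numbers from Example 1 below: distance sqrt(2.71), roughly 1.65
x = np.array([1, 0, 1, 1, 1, 0], dtype=float)
w = np.array([[0.1, 0.5, 0.3, 0.4, 0.2, 0.4]])
print(np.linalg.norm(w - x, axis=1))
```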
The test set used in Step 2.2 is the set of subsets outside the validation set and the training set; its size equals the sliding-window size n, and the parameter n is set manually in advance.
The training method for the ensemble learning model used in Step 4.3 comprises the following sub-steps:
Step 4.3.1: the training set is first divided into six data subsets using an equal-split method;
Step 4.3.2: the divided subsets are input respectively into the six classifiers of the ensemble learning model for training.
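A sketch of Steps 4.3.1–4.3.2 under the assumption that the equal split is done by shuffling the training indices and cutting them into six equal parts (the embodiments below use random sampling); build_base_classifiers refers to the earlier illustrative sketch.

```python
import numpy as np

def train_ensemble(models: dict, X: np.ndarray, y: np.ndarray, seed: int = 0) -> dict:
    """Split the training data into len(models) equal subsets and train one model on each."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(X)), len(models))
    for idx, model in zip(parts, models.values()):
        model.fit(X[idx], y[idx])
    return models
```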
The PSO optimization method used in Step 5.2 comprises the following sub-steps:
Step 5.2.1: the variables to be optimized are first assigned random values;
Step 5.2.2: the two quantities v[] and present[] are then updated repeatedly during the optimization process; the update rule is as follows,
v[] = v[] + c1 × rand() × (pbest[] − present[]) + c2 × rand() × (gbest[] − present[])
present[] = present[] + v[]
where v[] denotes the search velocity of the PSO algorithm, present[] denotes the position and direction of the current solution in the solution space, pbest[] and gbest[] denote the individual and global best positions found so far, rand() denotes a random function returning values in the range (0, 1), and the variables c1 and c2 denote the learning factors;
Step 5.2.3: the above steps are repeated until the fitness function in Step 5.2 is satisfied.
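A sketch of the PSO update of Step 5.2.2 for the two scalar parameters c and g, following the standard velocity/position rule reconstructed above; the default learning factors are taken from Example 1 below and are otherwise arbitrary.

```python
import random

def pso_step(present, velocity, pbest, gbest, c1: float = 0.3, c2: float = 0.4):
    """One PSO iteration over lists of parameter values; returns updated positions and velocities."""
    new_present, new_velocity = [], []
    for x, v, pb, gb in zip(present, velocity, pbest, gbest):
        v = v + c1 * random.random() * (pb - x) + c2 * random.random() * (gb - x)
        new_velocity.append(v)
        new_present.append(x + v)
    return new_present, new_velocity
```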
The training process of the SOM clustering model used in Step 7.1 comprises the following steps:
Step 7.1.1: the validation data set is first input into the ensemble learning classification model, yielding the class label data set L corresponding to the validation data;
Step 7.1.2: the SOM model is trained on the resulting class label data set.
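A bare-bones sketch of training the SOM on the label data set L (Steps 7.1.1–7.1.2). It uses a plain competitive-learning update without a neighbourhood function, so it is a simplification; the grid size, learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

def train_som(L: np.ndarray, num_nodes: int = 4, lr: float = 0.5,
              epochs: int = 20, seed: int = 0) -> np.ndarray:
    """L has shape (num_samples, 6): one row of class labels per sample."""
    rng = np.random.default_rng(seed)
    weights = rng.random((num_nodes, L.shape[1]))
    for _ in range(epochs):
        for x in L:
            j = int(np.argmin(np.linalg.norm(weights - x, axis=1)))  # winning node
            weights[j] += lr * (x - weights[j])  # pull the winner toward the sample
    return weights
```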
The data stream (101) includes data from network intrusion detection, network security monitoring, sensor data monitoring, power supply monitoring and similar sources.
The beneficial effects of the invention are as follows. The invention builds the data stream classification model with ensemble learning and a mixed-model framework, so that it can adapt to the three characteristics of data streams, namely massiveness, real-time arrival and dynamic change, and improves the accuracy of data stream classification. The ensemble learning model draws on ensemble learning theory and classifies with multiple classifiers, improving both the classification performance and the ability to adapt to the dynamic nature of the stream. In addition, the clustering method aggregates the classification results and effectively exploits the internal relationships among them, which helps to improve classification accuracy and reduces the time spent on classification.
Brief description of the drawings
Fig. 1 is a flow block diagram of the data stream hybrid classification method under a dynamic data environment according to the invention.
Fig. 2 shows an embodiment of building the classifier with ensemble learning according to the invention.
Fig. 3 is a flowchart of converting a data set into a label set according to the invention.
Reference signs: 101 - data stream, 102 - data stream collection module, 103 - data stream division module, 104 - data initialization module, 105 - integrated classifier module, 106 - parameter optimization module, 107 - clustering module.
Detailed description of the embodiments
The invention is described in detail below with reference to the drawings and embodiments.
With reference to Fig. 1, the framework of the data stream hybrid classification method under a dynamic data environment comprises a data stream (101), a data stream collection module (102), a data stream division module (103), a data initialization module (104), an integrated classifier module (105), a parameter optimization module (106) and a clustering module (107).
The data stream collection module (102) obtains streaming data from the data stream (101) in chronological order. The data stream (101) includes data streams of any type known to those of ordinary skill in the art, in particular network intrusion detection data streams, network security monitoring data streams, sensor monitoring data streams and power supply data streams. Because the stream is produced in real time and in huge volume, the data cannot be kept in physical storage and are deleted as soon as they have been used.
The data stream division module (103) obtains streaming samples from the data stream collection module (102) and, according to a manually preset sliding-window capacity, divides the stream by the temporal order of the samples into a number of static data subsets. These subsets contain the same number of samples and do not overlap. The subset size used by the data stream division module (103) is specified in advance by the user, and the raw data sets handled by the data initialization module (104), the integrated classifier module (105), the parameter optimization module (106) and the clustering module (107) are obtained from the division result of the data stream division module (103).
The data blocks produced by the data stream division module (103) are input into the data initialization module (104), which initializes them as follows: first, the original data blocks are normalized with an existing normalization method; then, by random sampling, two new data sets are obtained, namely a training set and a validation set. The training set is used to train the ensemble classification model, and the validation set is used to train the clustering model.
The training set obtained by the data initialization module (104) is input into the integrated classifier module (105), and the integrated classifier is trained. The integrated classifier module (105) uses support vector machine models as base classifiers and builds six different classification models by combining two SVM types (C-SVM and ν-SVM) with three kernel functions (linear, Gaussian RBF and sigmoid), training them on the training set.
The classifiers built by the integrated classifier module (105) are parameter-optimized by the parameter optimization module (106). First, the parameters c and g used in the classification model built from C-SVM with the Gaussian RBF kernel are extracted; then the validation data set normalized by the data initialization module (104) is input to this model and the PSO algorithm is used to optimize the parameters, where the fitness function is
fitness = (1/m) × Σ_{i=1..m} ( l_i^t / l_i )
in which the parameter m is the number of sample subsets drawn from the validation set, l_i denotes the number of samples in the i-th subset, and l_i^t denotes the number of correctly classified samples in that subset; finally, the optimized parameters c and g are placed back into the model and used as internal model parameters.
The validation data set is input into the optimized ensemble classifier module (105) to obtain the label data set, and the resulting label data set is then used to train the clustering module (107), completing the construction of the hybrid classification model.
The new data produced by the data stream (101) serve as test data: the data stream collection module (102) converts them to static form, the data stream division module (103) divides them into static data sets, and the data initialization module (104) normalizes the test sets; the processed data are input into the hybrid classification model built in the preceding steps, and the final classification result is obtained.
With reference to Fig. 2, an embodiment of building the classifier with ensemble learning is described. The multi-classifier is built on the idea of ensemble learning, using several support vector machine models to construct the ensemble classification model. Two SVM types (C-SVM and ν-SVM) are used and combined with three kernel functions (linear, Gaussian RBF and sigmoid) to construct six different classification models, which are integrated into the overall ensemble classification model.
With reference to Fig. 3, the flowchart of converting a data set into a label set is described. After the training data pass through the constructed ensemble classification model, each classifier gives a classification result, i.e. a class label. The ensemble classification model of the invention contains six classifiers, so each input sample receives six class labels after passing through the model. These labels correspond to the input data and are related to them through the ensemble classification model. The resulting class label data set serves as the input data of the clustering model, providing data support for the subsequent work.
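A sketch of the conversion shown in Fig. 3: every sample is passed through the six trained base classifiers and mapped to a six-dimensional label vector that becomes the input of the clustering model; the helper name to_label_set is an assumption.

```python
import numpy as np

def to_label_set(models: dict, X: np.ndarray) -> np.ndarray:
    """Returns an array of shape (len(X), len(models)): one class label per base classifier."""
    return np.column_stack([model.predict(X) for model in models.values()])
```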
Example 1
A data stream hybrid sorting process under dynamic data environment, specifically comprises the following steps:
Step 1: the Australian personal credit data set is used as the data stream; the dynamic data stream collection module (102) collects the data from the stream in chronological order. The data set contains 690 samples with 15 attributes, of which the first 14 are data attributes and the 15th is the class attribute; 55.5% of the samples carry class label "1" and 44.5% carry class label "0".
Step 2: the data stream division module (103) reads the Australian personal credit data stream and divides it according to the temporal order of the data;
The division of the data stream by the data stream division module (103) comprises the following steps:
Step 2.1: the data stream is first converted to static form with the sliding-window method, dividing the stream in chronological order; the window size is set to 10, giving 69 data subsets, of which the first 30 are taken out.
Step 2.2: the above subsets are sampled randomly to obtain the training set and the validation set, each containing 120 samples; the remaining 39 data subsets serve as the test set for the subsequent tests.
Step 3: the three static data sets obtained from the data stream division module (103), namely the training set, test set and validation set, are input into the data initialization module (104), which normalizes them;
The data initialization module (104) normalizes the data with the MapMinMax method, comprising the following steps:
Step 3.1: the attribute values are mapped to the interval [0, 1]. Referring to the MapMinMax formula, suppose the maximum and minimum of the 1st attribute are 100 and 50 respectively, ymax is 1, ymin is 0, and consider a sample whose attribute value is 66;
Step 3.2: the above attribute is normalized; after normalization it becomes
(1 − 0) × (66 − 50) / (100 − 50) + 0 = 0.32.
Step 4: the normalized training set from the data initialization module (104) is input into the integrated classifier module (105), which trains on it and builds the integrated classifier model.
Step 4.1: the support vector machine (SVM) model is adopted as the base classifier.
Step 4.2: six classification models are built with different classifiers and kernel functions, namely C-SVM with the linear kernel (Model1), C-SVM with the Gaussian RBF kernel (Model2), C-SVM with the sigmoid kernel (Model3), ν-SVM with the linear kernel (Model4), ν-SVM with the Gaussian RBF kernel (Model5) and ν-SVM with the sigmoid kernel (Model6).
Step 4.3: the resulting ensemble learning model is trained; the training method comprises the following sub-steps:
Step 4.3.1: the training set is divided by random sampling into six subsets (X1, X2, ..., X6) of equal size; the size can be set manually in advance, for example 100.
Step 4.3.2: each subset is input into the corresponding one of the six classifiers for training, completing the training process.
Step 5: the parameter optimization module (106) performs parameter optimization on Model2 of the integrated classifier model built in Step 4; the optimization method used is the particle swarm optimization (PSO) algorithm, and the optimization process comprises the following steps:
Step 5.1: the parameters c and g used in the classification model built from C-SVM with the Gaussian RBF kernel are first extracted;
Step 5.2: the normalized validation set is input into the ensemble classification model, and the PSO algorithm is used to optimize the parameters;
The PSO optimization process comprises the following sub-steps:
Step 5.2.1: the parameters c and g are assigned random values, say c = 0.5 and g = 0.7, which are then substituted into the model for classification;
Step 5.2.3: the fitness value is computed and checked against the requirement. Suppose the numbers of correctly classified samples are 60 for Model1, 80 for Model2, 30 for Model3, 50 for Model4, 78 for Model5 and 88 for Model6, with 100 samples in total:
fitness = (60 + 80 + 30 + 50 + 78 + 88) / (6 × 100) ≈ 64.3%
If the pre-set fitness threshold is 50%, the parameters meet the requirement and the optimization process ends.
If the pre-set fitness threshold is 80%, the parameters do not meet the requirement and the PSO algorithm is used to update them. Suppose the search velocity v[] is 0.6 and the learning factors c1 and c2 are 0.3 and 0.4 respectively; for parameter c, the current value present[] is 0.5, the random value rand() is 0.1, the individual best pbest[] is 0.5 and the global best gbest[] is 0.6:
v[] = 0.6 + 0.3 × 0.1 × (0.5 − 0.5) + 0.4 × 0.1 × (0.6 − 0.5) = 0.604; present[] = 0.5 + 0.604 = 1.104
The new value of parameter c is therefore 1.104; the update of parameter g is analogous and is not repeated here.
The above process is repeated until the fitness requirement is met, completing the optimization.
Step 6: the validation data set is input into the optimized ensemble classification model to obtain the label data set L;
Step 7: the label data set L is used to train the self-organizing map (SOM) clustering model; the training process is as follows:
Step 7.1: the SOM model is initialized with random values;
Step 7.2: suppose a vector in the class label data set is L_i = {l_1, l_2, ..., l_6} and the edge weights are w_j = {w_1j, w_2j, ..., w_6j};
Step 7.3: the activated node is computed. Suppose the current sample vector is {1, 0, 1, 1, 1, 0} and the edge weights are {0.1, 0.5, 0.3, 0.4, 0.2, 0.4}:
d = sqrt((1 − 0.1)² + (0 − 0.5)² + (1 − 0.3)² + (1 − 0.4)² + (1 − 0.2)² + (0 − 0.4)²) = sqrt(2.71) ≈ 1.65
The activated node is associated with the class of the input sample vector, completing the training of the SOM model.
Step 8: the test set data produced by the data initialization module (104) are input into the constructed hybrid classification model, completing the data stream classification process. The test set is classified as follows:
First step: the test set is input into the integrated classifier model, and the class label given by each sub-classifier is collected, yielding the class label data set L.
Second step: the class label data set L is input into the SOM model to find the activated node.
Third step: the class of the activated node is taken as the class of the data, completing the classification process.
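A quick numerical check of the worked values in Example 1, using the formulas reconstructed earlier; all input numbers are taken from the text above.

```python
correct = [60, 80, 30, 50, 78, 88]                              # correctly classified per model
fitness = sum(correct) / (6 * 100)                              # ~0.643, i.e. about 64.3 %
v_new = 0.6 + 0.3 * 0.1 * (0.5 - 0.5) + 0.4 * 0.1 * (0.6 - 0.5) # = 0.604
c_new = 0.5 + v_new                                             # = 1.104, as stated above
print(fitness, v_new, c_new)
```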
Example 2
A data stream hybrid sorting process under dynamic data environment, specifically comprises the following steps:
Step 1: the German personal credit data set is used as the data stream; the dynamic data stream collection module (102) collects the data from the stream in chronological order. The data set contains 1000 samples with 20 attributes, of which the first 19 are data attributes and the 20th is the class attribute; 70% of the samples carry class label "1" and 30% carry class label "0".
Step 2: the data stream division module (103) reads the German personal credit data stream and divides it according to the temporal order of the data;
The division of the data stream by the data stream division module (103) comprises the following steps:
Step 2.1: the data stream is first converted to static form with the sliding-window method, dividing the stream in chronological order; the window size is set to 10, giving 100 data subsets, of which the first 40 are taken out.
Step 2.2: the above subsets are sampled randomly to obtain the training set and the validation set, each containing 400 samples; the remaining 60 data subsets serve as the test set for the subsequent tests.
Step 3: the three static data sets obtained from the data stream division module (103), namely the training set, test set and validation set, are input into the data initialization module (104), which normalizes them;
The data initialization module (104) normalizes the data with the MapMinMax method, comprising the following steps:
Step 3.1: the attribute values are mapped to the interval [0, 1]. Referring to the MapMinMax formula, suppose the maximum and minimum of the 1st attribute are 350 and 120 respectively, ymax is 1, ymin is 0, and consider a sample whose attribute value is 136;
Step 3.2: the above attribute is normalized; after normalization it becomes (1 − 0) × (136 − 120) / (350 − 120) + 0 ≈ 0.07.
Step 4: the normalized training set from the data initialization module (104) is input into the integrated classifier module (105), which trains on it and builds the integrated classifier model.
Step 4.1: the support vector machine (SVM) model is adopted as the base classifier.
Step 4.2: six classification models are built with different classifiers and kernel functions, namely C-SVM with the linear kernel (Model1), C-SVM with the Gaussian RBF kernel (Model2), C-SVM with the sigmoid kernel (Model3), ν-SVM with the linear kernel (Model4), ν-SVM with the Gaussian RBF kernel (Model5) and ν-SVM with the sigmoid kernel (Model6).
Step 4.3: the resulting ensemble learning model is trained; the training method comprises the following sub-steps:
Step 4.3.1: the training set is divided by random sampling into six subsets (X1, X2, ..., X6) of equal size; the size can be set manually in advance, for example 300.
Step 4.3.2: each subset is input into the corresponding one of the six classifiers for training, completing the training process.
Step 5: the parameter optimization module (106) performs parameter optimization on Model2 of the integrated classifier model built in Step 4; the optimization method used is the particle swarm optimization (PSO) algorithm, and the optimization process comprises the following steps:
Step 5.1: the parameters c and g used in the classification model built from C-SVM with the Gaussian RBF kernel are first extracted;
Step 5.2: the normalized validation set is input into the ensemble classification model, and the PSO algorithm is used to optimize the parameters;
The PSO optimization process comprises the following sub-steps:
Step 5.2.1: the parameters c and g are assigned random values, say c = 12 and g = 15, which are then substituted into the model for classification;
Step 5.2.3: the fitness value is computed and checked against the requirement. Suppose the numbers of correctly classified samples are 100 for Model1, 200 for Model2, 250 for Model3, 247 for Model4, 232 for Model5 and 189 for Model6, with 300 samples in total:
fitness = (100 + 200 + 250 + 247 + 232 + 189) / (6 × 300) ≈ 67.7%
If the pre-set fitness threshold is 50%, the parameters meet the requirement and the optimization process ends.
If the pre-set fitness threshold is 90%, the parameters do not meet the requirement and the PSO algorithm is used to update them. Suppose the search velocity v[] is 0.45 and the learning factors c1 and c2 are 0.2 and 0.3 respectively; for parameter c, the current value present[] is 12, the random value rand() is 0.1, the individual best pbest[] is 12 and the global best gbest[] is 15:
v[] = 0.45 + 0.2 × 0.1 × (12 − 12) + 0.3 × 0.1 × (15 − 12) = 0.54; present[] = 12 + 0.54 = 12.54
The new value of parameter c is therefore 12.54; the update of parameter g is analogous and is not repeated here.
The above process is repeated until the fitness requirement is met, completing the optimization.
Step 6: the validation data set is input into the optimized ensemble classification model to obtain the label data set L.
Step 7: the label data set L is used to train the self-organizing map (SOM) clustering model; the training process is as follows:
Step 7.1: the SOM model is initialized with random values;
Step 7.2: suppose a vector in the class label data set is L_i = {l_1, l_2, ..., l_6} and the edge weights are w_j = {w_1j, w_2j, ..., w_6j};
Step 7.3: the activated node is computed. Suppose the current sample vector is {1, 1, 0, 0, 1, 0} and the edge weights are {0.7, 0.5, 0.8, 0.2, 0.6, 0.9}:
d = sqrt((1 − 0.7)² + (1 − 0.5)² + (0 − 0.8)² + (0 − 0.2)² + (1 − 0.6)² + (0 − 0.9)²) = sqrt(1.99) ≈ 1.41
The activated node is associated with the class of the input sample vector, completing the training of the SOM model.
Step 8: the test set data produced by the data initialization module (104) are input into the constructed hybrid classification model, completing the data stream classification process. The test set is classified as follows:
First step: the test set is input into the integrated classifier model, and the class label given by each sub-classifier is collected, yielding the class label data set L.
Second step: the class label data set L is input into the SOM model to find the activated node.
Third step: the class of the activated node is taken as the class of the data, completing the classification process.
The above is a further detailed description of the invention in connection with preferred technical solutions, and the specific implementation of the invention shall not be regarded as limited to these descriptions. For persons of ordinary skill in the technical field of the invention, simple deductions and substitutions may be made without departing from the concept of the invention, and all such should be considered to fall within the protection scope of the invention.

Claims (10)

1. A data stream hybrid classification method under a dynamic data environment, comprising the following steps:
Step 1: a dynamic data stream collection module (102) collects data in chronological order from a massive real-time data stream (101);
Step 2: a data stream division module (103) reads the data stream data of Step 1 and divides the data stream according to the temporal order of the data; the data blocks produced by the data stream division module (103) contain three kinds of data sets, namely a training set, a validation set and a test set, each containing N data samples, where N is a fixed value set in advance by the user;
Step 3: the three static data sets obtained from the data stream division module (103), namely the training set, test set and validation set, are input to a data initialization module (104), which normalizes them;
Step 4: the normalized training set output by the data initialization module (104) is input to an integrated classifier module (105), which trains on the training set and builds an integrated classifier model;
Step 5: a parameter optimization module (106) performs parameter optimization on the integrated classifier model of Step 4;
Step 6: the validation set processed by the data initialization module (104) is input to the integrated classifier optimized in Step 5, and the resulting class labels form a data set L;
Step 7: the data set L is input to a clustering module (107), and the clustering model it uses is trained;
Step 8: the test set data produced by the data initialization module (104) are input to the constructed hybrid classification model, completing the data stream classification process.
2. The data stream hybrid classification method under a dynamic data environment according to claim 1, characterized in that, in Step 2, the division of the data stream by the data stream division module (103) comprises the following steps:
Step 2.1: first, the massive real-time data stream is converted to static form with a sliding-window technique, the window sliding by n samples at a time and each static subset also containing n samples;
Step 2.2: the subsets obtained in Step 2.1 are shuffled by random sampling to obtain three data sets, namely a training set, a test set and a validation set, the training set and test set each having size 4n.
3. The data stream hybrid classification method under a dynamic data environment according to claim 1, characterized in that, in Step 3, the data initialization module (104) normalizes the data with the MapMinMax method, comprising the following steps:
Step 3.1: for the obtained training set, test set and validation set, the values of each attribute are scanned to find the minimum and maximum value of every attribute;
Step 3.2: each attribute of the data sets is normalized; the normalization formula is:
y = (ymax − ymin) × (x_i − min(x_i)) / (max(x_i) − min(x_i)) + ymin
where x_i denotes the i-th attribute value of the current sample, min(x_i) and max(x_i) denote the minimum and maximum of the i-th attribute respectively, and ymax and ymin denote the upper and lower bounds of the normalized range; for normalization to the interval [0, 1], ymax is 1 and ymin is 0.
4. The data stream hybrid classification method under a dynamic data environment according to claim 1, characterized in that, in Step 4, the integrated classifier module (105) uses support vector machine models as base classification models to classify the data stream and builds the integrated classifier, comprising the following steps:
Step 4.1: two kinds of support vector machine models are used as base classification models, namely the C-SVM and ν (nu)-SVM models;
Step 4.2: three kernel functions are combined with the above two support vector machine models, giving six different SVM classification models, the kernel functions used being the linear kernel, the Gaussian radial basis function kernel and the sigmoid kernel;
Step 4.3: the resulting ensemble learning model is trained.
5. The data stream hybrid classification method under a dynamic data environment according to claim 1, characterized in that, in Step 5, the parameter optimization module (106) performs parameter optimization on the constructed integrated classifier, the optimization method used being the particle swarm optimization algorithm, the optimization process comprising the following steps:
Step 5.1: the parameters c and g used in the classification model built from C-SVM with the Gaussian radial basis function kernel are first extracted;
Step 5.2: the validation data set normalized by the data initialization module (104) is input to this model, and the PSO algorithm is used to optimize the parameters, the fitness function in the optimization process following an m-fold cross-validation scheme expressed as:
fitness = (1/m) × Σ_{i=1..m} ( l_i^t / l_i )
where the parameter m is the number of sample subsets drawn from the validation set, l_i denotes the number of samples in the i-th subset, and l_i^t denotes the number of correctly classified samples in that subset;
Step 5.3: the optimized parameters c and g are placed into the model and used as internal model parameters.
6. The data stream hybrid classification method under a dynamic data environment according to claim 1, characterized in that, in Step 7, the clustering module (107) clusters the classification results provided by the integrated classifier, i.e. the data set L, to obtain the final classification result, the clustering method used being the self-organizing map, comprising the following steps:
Step 7.1: the SOM model is trained first, giving a trained SOM model;
Step 7.2: the test set is input into the constructed ensemble classification model to obtain the class label data set corresponding to the test set;
Step 7.3: the class label data set is input into the trained SOM model, which computes the distance between the input sample and each node to find the activated node; the computation is as follows:
d_j = || x − w_j || = sqrt( Σ_i (x_i − w_ij)² ), the activated node being the node j with the smallest distance d_j,
where x denotes the input sample and w_j denotes the weight vector of node j of the SOM model;
Step 7.4: Steps 7.2 and 7.3 are repeated until all data have been classified.
7. The data stream hybrid classification method under a dynamic data environment according to claim 2, characterized in that the test set used in Step 2.2 is the set outside the validation set and the training set, its size being equal to the sliding-window size n, the parameter n being set manually in advance.
8. The data stream hybrid classification method under a dynamic data environment according to claim 4, characterized in that the training method of the ensemble learning model used in Step 4.3 comprises the following sub-steps:
Step 4.3.1: the training set is first divided into six data subsets using an equal-split method;
Step 4.3.2: the divided subsets are input respectively into the six classifiers of the ensemble learning model for training.
9. The data stream hybrid classification method under a dynamic data environment according to claim 5, characterized in that the PSO optimization method used in Step 5.2 comprises the following sub-steps:
Step 5.2.1: the variables to be optimized are first assigned random values;
Step 5.2.2: the two quantities v[] and present[] are then updated repeatedly during the optimization process, the update rule being:
v[] = v[] + c1 × rand() × (pbest[] − present[]) + c2 × rand() × (gbest[] − present[]), present[] = present[] + v[],
where v[] denotes the search velocity of the PSO algorithm, present[] denotes the position and direction of the current solution in the solution space, pbest[] and gbest[] denote the individual and global best positions, rand() denotes a random function returning values in the range (0, 1), and the variables c1 and c2 denote the learning factors;
Step 5.2.3: the above steps are repeated until the fitness function in Step 5.2 is satisfied.
10. The data stream hybrid classification method under a dynamic data environment according to claim 6, characterized in that the training process of the SOM clustering model used in Step 7.1 comprises the following steps:
Step 7.1.1: the validation data set is first input into the ensemble learning classification model to obtain the class label data set L corresponding to the validation data;
Step 7.1.2: the SOM model is trained with the resulting class label data set.
CN201310608553.0A 2013-12-26 2013-12-26 Data stream merge sorting method under dynamic data environment Pending CN103678512A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310608553.0A CN103678512A (en) 2013-12-26 2013-12-26 Data stream merge sorting method under dynamic data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310608553.0A CN103678512A (en) 2013-12-26 2013-12-26 Data stream merge sorting method under dynamic data environment

Publications (1)

Publication Number Publication Date
CN103678512A true CN103678512A (en) 2014-03-26

Family

ID=50316057

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310608553.0A Pending CN103678512A (en) 2013-12-26 2013-12-26 Data stream merge sorting method under dynamic data environment

Country Status (1)

Country Link
CN (1) CN103678512A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106100885A (en) * 2016-06-23 2016-11-09 浪潮电子信息产业股份有限公司 A kind of network security warning system and design
WO2018054342A1 (en) * 2016-09-22 2018-03-29 华为技术有限公司 Method and system for classifying network data stream
CN107948147A (en) * 2017-08-31 2018-04-20 上海财经大学 Network connection data sorting technique
CN107958327A (en) * 2017-11-21 2018-04-24 国网四川省电力公司凉山供电公司 A kind of project process Risk Forecast Method based on factorial analysis and SOM networks
CN109510811A (en) * 2018-07-23 2019-03-22 中国科学院计算机网络信息中心 Intrusion detection method, device and storage medium based on data packet
CN109541639A (en) * 2018-12-25 2019-03-29 天津珞雍空间信息研究院有限公司 A kind of inversion boundary layer height method based on particle cluster
CN109543746A (en) * 2018-11-20 2019-03-29 河海大学 A kind of sensor network Events Fusion and decision-making technique based on node reliability
CN110796200A (en) * 2019-10-30 2020-02-14 深圳前海微众银行股份有限公司 Data classification method, terminal, device and storage medium
CN111464529A (en) * 2020-03-31 2020-07-28 山西大学 Network intrusion detection method and system based on cluster integration
CN113033683A (en) * 2021-03-31 2021-06-25 中南大学 Industrial system working condition monitoring method and system based on static and dynamic joint analysis
CN113114266A (en) * 2021-04-30 2021-07-13 上海智大电子有限公司 Real-time data simplifying and compressing method for comprehensive monitoring system
CN113205184A (en) * 2021-04-28 2021-08-03 清华大学 Invariant learning method and device based on heterogeneous hybrid data
CN114615207A (en) * 2022-03-10 2022-06-10 四川三思德科技有限公司 Method and device for oriented processing of data before plug flow

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060004754A1 (en) * 2004-06-30 2006-01-05 International Business Machines Corporation Methods and apparatus for dynamic classification of data in evolving data stream
CN103020288A (en) * 2012-12-28 2013-04-03 大连理工大学 Method for classifying data streams under dynamic data environment

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060004754A1 (en) * 2004-06-30 2006-01-05 International Business Machines Corporation Methods and apparatus for dynamic classification of data in evolving data stream
CN103020288A (en) * 2012-12-28 2013-04-03 大连理工大学 Method for classifying data streams under dynamic data environment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
姚远: "Research on Classification Methods for Massive Dynamic Data Streams" (海量动态数据流分类方法研究), China Doctoral Dissertations Full-text Database (中国博士学位论文全文数据库) *
郑伟平: "Pattern Classification and Its Applications in Dynamic Data Environments" (动态数据环境下的模式分类及应用), Journal of Computer Applications (计算机应用) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106100885A (en) * 2016-06-23 2016-11-09 浪潮电子信息产业股份有限公司 A kind of network security warning system and design
US10999175B2 (en) 2016-09-22 2021-05-04 Huawei Technologies Co., Ltd. Network data flow classification method and system
WO2018054342A1 (en) * 2016-09-22 2018-03-29 华为技术有限公司 Method and system for classifying network data stream
CN107948147A (en) * 2017-08-31 2018-04-20 上海财经大学 Network connection data sorting technique
CN107948147B (en) * 2017-08-31 2020-01-17 上海财经大学 Network connection data classification method
CN107958327A (en) * 2017-11-21 2018-04-24 国网四川省电力公司凉山供电公司 A kind of project process Risk Forecast Method based on factorial analysis and SOM networks
CN109510811A (en) * 2018-07-23 2019-03-22 中国科学院计算机网络信息中心 Intrusion detection method, device and storage medium based on data packet
CN109543746A (en) * 2018-11-20 2019-03-29 河海大学 A kind of sensor network Events Fusion and decision-making technique based on node reliability
CN109541639A (en) * 2018-12-25 2019-03-29 天津珞雍空间信息研究院有限公司 A kind of inversion boundary layer height method based on particle cluster
CN110796200A (en) * 2019-10-30 2020-02-14 深圳前海微众银行股份有限公司 Data classification method, terminal, device and storage medium
CN111464529A (en) * 2020-03-31 2020-07-28 山西大学 Network intrusion detection method and system based on cluster integration
CN113033683A (en) * 2021-03-31 2021-06-25 中南大学 Industrial system working condition monitoring method and system based on static and dynamic joint analysis
CN113205184A (en) * 2021-04-28 2021-08-03 清华大学 Invariant learning method and device based on heterogeneous hybrid data
CN113114266A (en) * 2021-04-30 2021-07-13 上海智大电子有限公司 Real-time data simplifying and compressing method for comprehensive monitoring system
CN114615207A (en) * 2022-03-10 2022-06-10 四川三思德科技有限公司 Method and device for oriented processing of data before plug flow
CN114615207B (en) * 2022-03-10 2022-11-25 四川三思德科技有限公司 Method and device for oriented processing of data before plug flow

Similar Documents

Publication Publication Date Title
CN103678512A (en) Data stream merge sorting method under dynamic data environment
CN107846392B (en) Intrusion detection algorithm based on improved collaborative training-ADBN
Bhattacharya et al. From smart to deep: Robust activity recognition on smartwatches using deep learning
CN110472817A (en) A kind of XGBoost of combination deep neural network integrates credit evaluation system and its method
CN104063472B (en) KNN text classifying method for optimizing training sample set
CN104866829A (en) Cross-age face verify method based on characteristic learning
CN105868775A (en) Imbalance sample classification method based on PSO (Particle Swarm Optimization) algorithm
CN106973057A (en) A kind of sorting technique suitable for intrusion detection
Huang et al. A graph neural network-based node classification model on class-imbalanced graph data
CN104391860A (en) Content type detection method and device
CN103679012A (en) Clustering method and device of portable execute (PE) files
CN109446986A (en) A kind of validity feature extraction and wood recognition method towards trees laser point cloud
CN110225055A (en) A kind of network flow abnormal detecting method and system based on KNN semi-supervised learning model
CN106095101A (en) Human bodys' response method based on power-saving mechanism and client
CN104933445A (en) Mass image classification method based on distributed K-means
CN104156729B (en) A kind of classroom demographic method
CN107944460A (en) One kind is applied to class imbalance sorting technique in bioinformatics
CN107562722A (en) Internet public feelings monitoring analysis system based on big data
CN110288028A (en) ECG detecting method, system, equipment and computer readable storage medium
CN103473556A (en) Hierarchical support vector machine classifying method based on rejection subspace
CN104268507A (en) Manual alphabet identification method based on RGB-D image
CN102103691A (en) Identification method for analyzing face based on principal component
CN106650558A (en) Facial recognition method and device
CN104966075A (en) Face recognition method and system based on two-dimensional discriminant features
CN103077405A (en) Bayes classification method based on Fisher discriminant analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20140326