CN103577602A - Secondary clustering method and system - Google Patents
- Publication number
- CN103577602A (application CN201310581217.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- reference point
- cluster
- data stream
- density
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24553—Query execution of query operations
- G06F16/24561—Intermediate data storage techniques for performance improvement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a secondary clustering method for use in the technical field of data mining. The method comprises the following steps: partitioning a data stream into blocks and reading in the data blocks; clustering with the DBSCAN algorithm to obtain density cluster reference points; applying k-means clustering to the obtained density cluster reference points; and storing the k-means reference points produced by the k-means clustering in a layered structure. The method effectively improves clustering quality, clustering precision, and efficiency for data streams that are irregularly distributed and contain noise.
Description
Technical field
The invention belongs to the technical field of data stream mining and relates in particular to a secondary clustering method and system.
Background
In recent years, with the development of hardware technology, more and more applications produce data streams. A data stream differs from traditional static data stored on disk: it is a new class of data object that is unbounded, continuous, ordered, fast-changing, and massive. Typical data streams include monitoring data from network and road-traffic monitoring systems, call records of telecommunication operators, monitoring data returned by sensors, stock price data from stock exchanges, and ambient temperature monitoring data. These characteristics mean that a data stream can be scanned only once or twice during processing and that only a small amount of data can be stored temporarily. As a result, many mature techniques for data mining, data analysis, and data querying no longer apply to data streams, and new solutions are needed.
Consequently, data streams attracted researchers' attention as soon as they emerged, and many results have been published that study data streams from the perspectives of management, querying, analysis, and mining algorithms. Data stream mining is a new problem in the field of data mining, and many mining algorithms have to be adapted for data streams. Data stream clustering, an important research direction within data stream mining, faces the same challenges; it has drawn wide attention from researchers, and many related results have appeared and been applied in practice.
Traditional clustering is built on the database operating model. A traditional database can store all of the data and supports complex query operations. Under this model, classical methods can read the data repeatedly and access it randomly in order to cluster the stored data. In a data stream environment these operations are infeasible, and the characteristics of the data stream make it difficult, or even impossible, to apply traditional clustering algorithms directly to data stream clustering.
Accordingly, compared with traditional clustering methods, a data stream clustering algorithm should have the following characteristics:
First, it uses limited memory and storage space. A data stream is continuous and unbounded, and its total volume far exceeds the space (main memory) available to the clustering algorithm, so storing the whole stream is infeasible. A data stream clustering algorithm cannot store every data object that needs to be processed; it can only keep the space it uses bounded and reasonable by summarizing the data or selectively discarding it.
Second, it processes data incrementally with a linear scan, or with a single scan. For the very large volumes of data in a stream, a linear scan is the only effective way to read the data, because random reads are computationally expensive. Even repeated linear scans are costly, because the data is usually stored on external devices with slow read speeds. Moreover, in many data stream environments the data changes very quickly and does not need to be stored; it must be processed as it is produced and then discarded. A data stream clustering algorithm should therefore scan the data only once, or at the very least process it incrementally with a linear scan.
Third, it processes data records in real time. Data in a stream changes very quickly, and the demands on response time are high. The procedure a data stream clustering algorithm uses to handle each record must therefore be fast enough to avoid missing records that need to be processed.
However, most known data stream clustering algorithms are suited to data with a specific distribution and are sensitive to noise. Data streams in practical applications are mostly irregularly distributed and contain noise, which makes it hard for existing data stream clustering algorithms to achieve satisfactory clustering quality.
Summary of the invention
The invention provides a secondary clustering method and system to address the above problem.
The invention provides a secondary clustering method comprising the following steps:
Partitioning a data stream into blocks and reading in the data blocks;
Clustering with the DBSCAN algorithm to obtain density cluster reference points;
Applying k-means clustering to the obtained density cluster reference points and storing the k-means reference points produced by the k-means clustering in a layered structure.
The invention also provides a secondary clustering system comprising: a block reading module, a density cluster reference point acquisition module, and a k-means reference point acquisition module; the block reading module is connected to the k-means reference point acquisition module through the density cluster reference point acquisition module;
The block reading module is used to partition the data stream into blocks and read in the data blocks;
The density cluster reference point acquisition module is used to cluster with the DBSCAN algorithm and obtain density cluster reference points;
The k-means reference point acquisition module is used to apply k-means clustering to the obtained density cluster reference points and to store the k-means reference points produced by the k-means clustering in a layered structure.
The present invention proposes a secondary clustering method. The data stream is first partitioned into blocks, and DBSCAN clustering is applied to produce an intermediate clustering result, the set of density cluster reference points. These density cluster reference points are then clustered with k-means, and the cluster reference points obtained at each clustering step are stored in a layered structure. This effectively improves clustering quality, clustering precision, and efficiency for data streams that are irregularly distributed and contain noise.
Brief description of the drawings
The drawings described here provide a further understanding of the invention and form part of this application. The illustrative embodiments of the invention and their descriptions are used to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is the processing flowchart of Embodiment 1 of the present invention;
Fig. 2 is the system implementation diagram of Embodiment 2 of the present invention.
Detailed description of embodiments
The invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, provided they do not conflict, the embodiments in this application and the features of those embodiments may be combined with one another.
Fig. 1 is the processing flowchart of Embodiment 1 of the present invention, which comprises the following steps:
Step 101: partition the data stream into blocks and read in the data blocks;
Step 102: cluster with the DBSCAN algorithm to obtain density cluster reference points;
Step 103: apply k-means clustering to the obtained density cluster reference points and store the k-means reference points produced by the k-means clustering in a layered structure.
The data stream is partitioned and read in as follows: within a sliding window, the blocks of the data stream are processed in a loop until the final clustering result is obtained; if data remains unprocessed, the next data block is read in, and this continues until the whole data stream has been processed.
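The block-reading loop of Step 101 can be sketched as a generator. This is a minimal illustration, not the patent's implementation: the function name `read_blocks` and the use of a Python iterable as the stream are assumptions made for the example, and the sliding window is reduced to "the block currently in memory".

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def read_blocks(stream: Iterable[T], m: int) -> Iterator[List[T]]:
    """Yield consecutive blocks of at most m records from a stream.

    Hypothetical helper: only one block is held in memory at a time,
    and each record of the stream is read exactly once.
    """
    if m <= 0:
        raise ValueError("block size m must be positive")
    it = iter(stream)
    while True:
        block = list(islice(it, m))
        if not block:  # stream exhausted: stop reading blocks
            return
        yield block


# Illustrative stream of 7 records split into blocks of 3.
blocks = list(read_blocks(range(7), 3))
```

The last block may be shorter than m when the stream length is not a multiple of m; how such a partial block is handled is not specified in the text.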
The density cluster reference point is defined as follows:
Assume that the data in the stream arrives as blocks $X_1, X_2, \ldots, X_n$, that each block can be processed in main memory, and that every data block contains the same number of data points;
Definition 1: density cluster reference point. The data block $X_t$ arriving at time $t$ is clustered with a density-based clustering algorithm, producing $k_t$ ($k_t = 1, 2, \ldots$) clusters whose mean points are $c_1, \ldots, c_i, \ldots, c_{k_t}$. The data block is then represented by $k_t$ two-tuples of the form $(c_i, n_i)$, where $n_i$ is the number of data points in $X_t$ that belong to $c_i$. Each $r_d(c_i, n_i)$ is called a density cluster reference point of the data stream.
The k-means reference point is defined as follows:
Assume that the data in the stream arrives as blocks $X_1, X_2, \ldots, X_n$, that each block can be processed in main memory, and that every data block contains the same number of data points;
Definition 2: k-means reference point. k-means clustering of $m$ density cluster reference points produces $2k$ clusters whose mean points are $c_1, \ldots, c_i, \ldots, c_{2k}$. The data is then represented by $2k$ two-tuples of the form $(c_i, n_i)$, where $n_i$ is the number of data points that belong to $c_i$. Each $r_k(c_i, n_i)$ is called a k-means reference point of the data stream.
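Definition 1 can be made concrete with a small sketch that turns one data block into (mean point, count) tuples. The brute-force DBSCAN below is a simplified stand-in written for this example, and the names `dbscan` and `density_reference_points` are hypothetical; the patent does not specify an implementation. Noise points are dropped, matching the statement that DBSCAN is used to remove outliers.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Sequence[float]
NOISE = -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Brute-force O(m^2) DBSCAN; returns one label per point, -1 for noise."""
    n = len(points)
    # A point's neighborhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels: List[Optional[int]] = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # core point: keep expanding
                frontier.extend(neighbors[j])
        cluster += 1
    return [lab if lab is not None else NOISE for lab in labels]


def density_reference_points(
    block: Sequence[Point], eps: float, min_pts: int
) -> List[Tuple[Tuple[float, ...], int]]:
    """Return the (c_i, n_i) tuples of Definition 1 for one data block."""
    labels = dbscan(block, eps, min_pts)
    refs = []
    for c in range(max(labels, default=NOISE) + 1):
        members = [p for p, lab in zip(block, labels) if lab == c]
        mean = tuple(sum(col) / len(members) for col in zip(*members))
        refs.append((mean, len(members)))
    return refs


# Illustrative block: two dense squares and one outlier.
block = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
         (50.0, 50.0)]
refs = density_reference_points(block, eps=1.5, min_pts=3)
```

Here the outlier at (50, 50) is labeled as noise and contributes to no reference point, so the block of nine points is summarized by two tuples.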
The present invention proposes a secondary clustering method for data streams (Twice Clustering Streaming Algorithm, TCLUSA) to improve clustering precision.
TCLUSA is based on the idea of partitioning. It clusters each data block with DBSCAN, which removes outliers, and takes the mean point of each cluster as that cluster's representative point, yielding a local clustering result (the density cluster reference points). It then clusters the representative points with k-means to obtain the final result (the k-means reference points).
Description of TCLUSA:
Assume that the data in the stream arrives as blocks $X_1, X_2, \ldots, X_n$, that each block can be processed in main memory, and that every data block contains the same number of data points.
Definition 1: density cluster reference point. The data block $X_t$ arriving at time $t$ is clustered with a density-based clustering algorithm, producing $k_t$ ($k_t = 1, 2, \ldots$) clusters whose mean points are $c_1, \ldots, c_i, \ldots, c_{k_t}$. The data block is then represented by $k_t$ two-tuples of the form $(c_i, n_i)$, where $n_i$ is the number of data points in $X_t$ that belong to $c_i$. Each $r_d(c_i, n_i)$ is called a density cluster reference point of the data stream.
Definition 2: k-means reference point. k-means clustering of $m$ density cluster reference points produces $2k$ clusters whose mean points are $c_1, \ldots, c_i, \ldots, c_{2k}$. The data is then represented by $2k$ two-tuples of the form $(c_i, n_i)$, where $n_i$ is the number of data points that belong to $c_i$. Each $r_k(c_i, n_i)$ is called a k-means reference point of the data stream.
In Definition 1 and Definition 2, $n_i$ can be interpreted as the weight of the reference point.
Following the partitioning idea, the TCLUSA algorithm processes the data stream block by block and stores the cluster reference points obtained at each clustering step in a layered structure, with each layer holding m reference points. This allows the algorithm to cluster the data stream within limited memory. Every m data points of the stream, taken in chronological order, form one data block.
The algorithm clusters each data block with DBSCAN and computes the density cluster reference points, which gives a local clustering result; it then clusters the density cluster reference points with k-means until the final result (the k-means reference points) is obtained.
The specific implementation of the above principle is described in detail below.
The pseudocode is as follows:
Procedure TCLUSA
/* Function: accumulate the data stream for a period of time in a sliding window, then cluster the data stream with the TCLUSA algorithm and divide it into several cluster blocks according to the result.
Input: number of records per data block oncepattern, number of clusters k, neighbourhood radius eps, minimum number of points in the neighbourhood of a core point minpits.
Output: k clusters
*/
(1) do while (data stream has not ended)
(2)   i = 1; // i is the layer being processed
(3)   read a data block;
(4)   cluster the data block with the DBSCAN algorithm;
(5)   compute the density cluster reference points and store them in layer i;
(6)   if (number of density cluster reference points == m) // m is the maximum number of intermediate cluster reference points stored
(7)     f = 0; // f is a binary flag: f = 1 means no layer is full,
        // f = 0 means some layer is full
(8)     do while (f == 0)
(9)       cluster the layer-i cluster reference points with the k-means algorithm;
(10)      i++;
(11)      store the resulting 2k k-means reference points in layer i;
(12)      if (number of cluster reference points in layer i != m)
(13)        f = 1;
(14)      end if
(15)    end // end while
(16)  end if
(17) end // end while
(18) cluster all stored density cluster reference points with the k-means algorithm to generate the final k clusters;
Step (1) processes the blocks of the data stream in a loop within the sliding window until the final clustering result is obtained: if data remains unprocessed, the next data block is read in, and this continues until the whole data stream has been processed. Steps (3)-(5) cluster the data block that was read in with the DBSCAN algorithm, compute the density cluster reference points, and save them to layer i.
Steps (6)-(8) check whether layer i is full of density cluster reference points. If it is, the k-means algorithm is applied to the reference points layer by layer and each result is saved to the next layer up, until no layer is full. Steps (9)-(11) cluster the layer-i reference points with the k-means algorithm, obtain 2k k-means reference points, and save them to layer i+1. Step (12) checks whether that upper layer is still not full; if it is not full, the layer processing ends. When the data stream ends, step (18) clusters all stored density cluster reference points with the k-means algorithm and outputs the resulting k clusters (k-means reference points).
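The layer handling of steps (6)-(12) and the final step (18) can be sketched as follows. This is an illustration under stated assumptions rather than the patent's code: the class `LayeredStore` and the function `weighted_kmeans` are hypothetical names, k-means is initialized deterministically from the first reference points, a layer is treated as full when it holds at least m points (the pseudocode tests for equality), a full layer is emptied after it has been compressed, and 2k < m is required so that the promotion loop terminates. The count n_i is used as the weight of each reference point, as the definitions suggest.

```python
from typing import List, Tuple

Ref = Tuple[Tuple[float, ...], int]  # (mean point c_i, weight n_i)


def weighted_kmeans(refs: List[Ref], k: int, iters: int = 20) -> List[Ref]:
    """Lloyd's algorithm on weighted points; returns merged (mean, weight)."""
    centers = [c for c, _ in refs[:k]]  # assumption: first k refs as seeds
    result: List[Ref] = []
    for _ in range(iters):
        groups: List[List[Ref]] = [[] for _ in centers]
        for c, n in refs:
            dists = [sum((a - b) ** 2 for a, b in zip(c, ctr))
                     for ctr in centers]
            groups[dists.index(min(dists))].append((c, n))
        result = []
        for g in groups:
            if g:  # empty groups are dropped
                w = sum(n for _, n in g)
                dim = len(g[0][0])
                mean = tuple(sum(c[j] * n for c, n in g) / w
                             for j in range(dim))
                result.append((mean, w))
        new_centers = [c for c, _ in result]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return result


class LayeredStore:
    """Hypothetical layered storage of cluster reference points."""

    def __init__(self, m: int, k: int) -> None:
        if 2 * k >= m:
            raise ValueError("need 2k < m so that promotion terminates")
        self.m, self.k = m, k
        self.layers: List[List[Ref]] = [[]]  # layers[0] is layer 1

    def add(self, refs: List[Ref]) -> None:
        """Store new density cluster reference points, promoting full layers."""
        self.layers[0].extend(refs)
        i = 0
        while len(self.layers[i]) >= self.m:
            compressed = weighted_kmeans(self.layers[i], 2 * self.k)
            self.layers[i] = []  # assumption: free the compressed layer
            if i + 1 == len(self.layers):
                self.layers.append([])
            self.layers[i + 1].extend(compressed)
            i += 1

    def final_clusters(self) -> List[Ref]:
        """Step (18): cluster everything still stored into k clusters."""
        stored = [r for layer in self.layers for r in layer]
        return weighted_kmeans(stored, self.k)


# Illustrative one-dimensional reference points, each with weight 1.
store = LayeredStore(m=6, k=2)
store.add([((0.0,), 1), ((1.0,), 1), ((10.0,), 1),
           ((11.0,), 1), ((20.0,), 1), ((21.0,), 1)])
final = store.final_clusters()
```

Because every merge sums the weights of the points it absorbs, the total weight across all layers always equals the number of non-noise data points summarized so far.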
The algorithm is analyzed below.
Let m be the number of data points in a block, k the number of clusters, c the number of iterations, and t the number of data blocks needed to obtain m density cluster mean reference points. The time cost of the algorithm has two parts. First, the data stream is partitioned so that every m consecutive data points form one data block, and DBSCAN clustering is applied to the m data points of each block to generate the density cluster mean reference points; the cost of this part is $O(m^2)$ per block. Second, k-means clustering is applied to the m density cluster mean reference points; the cost of this part is $O(mkc)$. The total time cost is therefore $O(m^2 t + mkc)$.
Because the number of data points m in a block is generally small, the total time cost is small. After m density cluster mean reference points have been generated, k-means clustering is applied to these m reference points. k-means is simple and efficient, with computational complexity $O(mkc)$; usually $k \ll m$ and $c < m$, and this part is executed fewer times than the computation of the density cluster mean reference points, so its time cost is small. As for space cost, the algorithm uses a layered structure and stores the data as mean-point information, which allows it to cluster the data stream within limited memory.
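The two cost terms can be written out side by side, using only the symbols defined in the analysis. The bound in the second line is a direct consequence of the stated condition $c < m$ and is added here as a reading aid, not as a result claimed by the source.

```latex
\begin{align*}
T_{\text{total}}
  &= \underbrace{t \cdot O(m^2)}_{\text{DBSCAN on } t \text{ blocks of } m \text{ points}}
   + \underbrace{O(mkc)}_{\text{k-means on } m \text{ reference points}}
   = O(m^2 t + mkc), \\
mkc &< k\,m^2 \quad \text{whenever } c < m .
\end{align*}
```

The first term grows with the number of blocks t, while the second is paid only once for every m reference points collected, which is consistent with the remark that the k-means part runs less often.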
As for clustering quality, TCLUSA uses the DBSCAN algorithm for the first-level clustering of the data stream as a preprocessing step. This handles data that is irregularly distributed and contains a large amount of noise, so the density cluster reference points it generates have higher precision. The density cluster reference points are then clustered with k-means, a complete partitioning algorithm, so the information of all cluster reference points is retained during clustering and the clustering result better reflects the overall distribution of the data.
A concrete experiment is described below to explain the technical solution of the invention in detail.
The hardware and software configuration used to implement the proposed data stream clustering algorithm TCLUSA is as follows: the CPU is a Pentium 4 at 2.80 GHz, the memory is 256 MB of DDR-II, and the hard disk is 80 GB; the operating system is Windows XP, and the development language is C++. Clustering quality is measured with the Scat Comp coefficient [38] (the smaller the coefficient, the better the clustering quality). The experiments use the network intrusion detection data set KDD-CUP99 as the data source. This data set consists of 494020 LAN connection records collected over 2 weeks at the MIT Lincoln Laboratory; it contains noise and is irregularly distributed. Each record in the data set has 42 attributes and corresponds to a normal pattern or to a particular intrusion pattern. 120000 of the records are used to simulate stream data for clustering; the non-numeric attributes are deleted, and only 34 numeric attributes are used.
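The preprocessing step of deleting non-numeric attributes can be sketched as below. The source does not say which attributes were removed or how they were identified, so the rule used here (keep a column only if every value parses as a number) is an assumption, `numeric_columns` is a hypothetical helper, and the two sample rows are made-up values in the general style of connection records rather than actual KDD-CUP99 data.

```python
from typing import List, Sequence


def _is_number(text: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def numeric_columns(rows: Sequence[Sequence[str]]) -> List[List[float]]:
    """Keep only the columns whose values are numeric in every row."""
    if not rows:
        return []
    keep = [j for j in range(len(rows[0]))
            if all(_is_number(r[j]) for r in rows)]
    return [[float(r[j]) for j in keep] for r in rows]


# Made-up records: duration, protocol, service, byte count, label.
rows = [
    ["0", "tcp", "http", "181", "normal."],
    ["2", "udp", "domain_u", "105", "normal."],
]
features = numeric_columns(rows)
```

A rule like this decides per column from the rows it has seen, so on a real stream the set of kept columns would have to be fixed in advance rather than inferred block by block.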
The data stream clustering algorithm TCLUSA is compared with the k-means-based data stream clustering algorithm of document [39] (called the KSCDC algorithm here). TCLUSA groups every 100 data records into one block, with k set to 5, minpits set to 7, and eps set to 0.5. The KSCDC and TCLUSA algorithms are each used to cluster data sets of 20,000 to 100,000 records, and the experimental results are shown in Table 1. Table 1 shows that TCLUSA obtains better clustering quality than KSCDC under all of the tested parameter settings. Because the first-level clustering uses the DBSCAN algorithm, which takes more time than k-means, TCLUSA is slightly slower than KSCDC; in settings where clustering quality matters more, this loss in time performance is within an acceptable range.
Table 1: Comparison of the time performance and clustering quality of algorithms KSCDC and TCLUSA
Fig. 2 is the system implementation diagram of Embodiment 2 of the present invention. The system comprises: a block reading module 201, a density cluster reference point acquisition module 202, and a k-means reference point acquisition module 203; the block reading module 201 is connected to the k-means reference point acquisition module 203 through the density cluster reference point acquisition module 202;
The block reading module 201 is used to partition the data stream into blocks and read in the data blocks;
The density cluster reference point acquisition module 202 is used to cluster with the DBSCAN algorithm and obtain density cluster reference points;
The k-means reference point acquisition module 203 is used to apply k-means clustering to the obtained density cluster reference points and to store the k-means reference points produced by the k-means clustering in a layered structure.
The block reading module 201 partitions and reads the data stream as follows: within a sliding window, the blocks of the data stream are processed in a loop until the final clustering result is obtained; if data remains unprocessed, the next data block is read in, and this continues until the whole data stream has been processed.
The present invention proposes a secondary clustering method. The data stream is first partitioned into blocks, and DBSCAN clustering is applied to produce an intermediate clustering result, the set of density cluster reference points. These density cluster reference points are then clustered with k-means, and the cluster reference points (k-means reference points) obtained at each clustering step are stored in a layered structure. This effectively improves clustering quality, clustering precision, and efficiency for data streams that are irregularly distributed and contain noise.
The above are only preferred embodiments of the present invention and do not limit it; for a person skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. A secondary clustering method, characterized in that it comprises the following steps:
Partitioning a data stream into blocks and reading in the data blocks;
Clustering with the DBSCAN algorithm to obtain density cluster reference points;
Applying k-means clustering to the obtained density cluster reference points and storing the k-means reference points produced by the k-means clustering in a layered structure.
2. The method according to claim 1, characterized in that:
The data stream is partitioned and read in as follows: within a sliding window, the blocks of the data stream are processed in a loop until the final clustering result is obtained; if data remains unprocessed, the next data block is read in, and this continues until the whole data stream has been processed.
3. The method according to claim 1, characterized in that the density cluster reference point is defined as follows:
Assume that the data in the stream arrives as blocks $X_1, X_2, \ldots, X_n$, that each block can be processed in main memory, and that every data block contains the same number of data points;
Definition 1: density cluster reference point. The data block $X_t$ arriving at time $t$ is clustered with a density-based clustering algorithm, producing $k_t$ ($k_t = 1, 2, \ldots$) clusters whose mean points are $c_1, \ldots, c_i, \ldots, c_{k_t}$. The data block is then represented by $k_t$ two-tuples of the form $(c_i, n_i)$, where $n_i$ is the number of data points in $X_t$ that belong to $c_i$. Each $r_d(c_i, n_i)$ is called a density cluster reference point of the data stream.
4. The method according to claim 1, characterized in that the k-means reference point is defined as follows:
Assume that the data in the stream arrives as blocks $X_1, X_2, \ldots, X_n$, that each block can be processed in main memory, and that every data block contains the same number of data points;
Definition 2: k-means reference point. k-means clustering of $m$ density cluster reference points produces $2k$ clusters whose mean points are $c_1, \ldots, c_i, \ldots, c_{2k}$. The data is then represented by $2k$ two-tuples of the form $(c_i, n_i)$, where $n_i$ is the number of data points that belong to $c_i$. Each $r_k(c_i, n_i)$ is called a k-means reference point of the data stream.
5. A secondary clustering system, characterized in that it comprises:
A block reading module, a density cluster reference point acquisition module, and a k-means reference point acquisition module; the block reading module is connected to the k-means reference point acquisition module through the density cluster reference point acquisition module;
The block reading module is used to partition the data stream into blocks and read in the data blocks;
The density cluster reference point acquisition module is used to cluster with the DBSCAN algorithm and obtain density cluster reference points;
The k-means reference point acquisition module is used to apply k-means clustering to the obtained density cluster reference points and to store the k-means reference points produced by the k-means clustering in a layered structure.
6. The system according to claim 5, characterized in that:
The block reading module partitions and reads the data stream as follows: within a sliding window, the blocks of the data stream are processed in a loop until the final clustering result is obtained; if data remains unprocessed, the next data block is read in, and this continues until the whole data stream has been processed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310581217.1A CN103577602A (en) | 2013-11-18 | 2013-11-18 | Secondary clustering method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310581217.1A CN103577602A (en) | 2013-11-18 | 2013-11-18 | Secondary clustering method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103577602A true CN103577602A (en) | 2014-02-12 |
Family
ID=50049378
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310581217.1A Pending CN103577602A (en) | 2013-11-18 | 2013-11-18 | Secondary clustering method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103577602A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202335A (en) * | 2016-06-28 | 2016-12-07 | 银江股份有限公司 | A kind of big Data Cleaning Method of traffic based on cloud computing framework |
CN107392220A (en) * | 2017-05-31 | 2017-11-24 | 阿里巴巴集团控股有限公司 | The clustering method and device of data flow |
CN107657277A (en) * | 2017-09-22 | 2018-02-02 | 上海斐讯数据通信技术有限公司 | A kind of human body unusual checking based on big data and decision method and system |
CN108520023A (en) * | 2018-03-22 | 2018-09-11 | 合肥佳讯科技有限公司 | A kind of identification of thunderstorm core and method for tracing based on Hybrid Clustering Algorithm |
CN109344171A (en) * | 2018-12-21 | 2019-02-15 | 中国计量大学 | A kind of nonlinear system characteristic variable conspicuousness mining method based on Data Stream Processing |
CN110287244A (en) * | 2019-07-03 | 2019-09-27 | 武汉中海庭数据技术有限公司 | It is a kind of based on the traffic lights localization method repeatedly clustered |
CN110298558A (en) * | 2019-06-11 | 2019-10-01 | 欧拉信息服务有限公司 | Vehicle resources dispositions method and device |
CN110367969A (en) * | 2019-07-05 | 2019-10-25 | 复旦大学 | A kind of improved electrocardiosignal K-Means Cluster |
CN111179592A (en) * | 2019-12-31 | 2020-05-19 | 合肥工业大学 | Urban traffic prediction method and system based on spatio-temporal data flow fusion analysis |
CN111832791A (en) * | 2019-11-27 | 2020-10-27 | 北京中交兴路信息科技有限公司 | Gas station prediction method based on machine learning logistic regression |
CN111860554A (en) * | 2019-04-28 | 2020-10-30 | 杭州海康威视数字技术股份有限公司 | Risk monitoring method and device, storage medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060015215A1 (en) * | 2004-07-15 | 2006-01-19 | Howard Michael D | System and method for automated search by distributed elements |
CN101505314A (en) * | 2008-12-29 | 2009-08-12 | 成都市华为赛门铁克科技有限公司 | P2P data stream recognition method, apparatus and system |
CN101853291A (en) * | 2010-05-24 | 2010-10-06 | 合肥工业大学 | Data flow based car fault diagnosis method |
CN101989289A (en) * | 2009-08-06 | 2011-03-23 | 富士通株式会社 | Data clustering method and device |
CN102289478A (en) * | 2011-08-01 | 2011-12-21 | 江苏广播电视大学 | System and method for recommending video on demand based on fuzzy clustering |
-
2013
- 2013-11-18 CN CN201310581217.1A patent/CN103577602A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060015215A1 (en) * | 2004-07-15 | 2006-01-19 | Howard Michael D | System and method for automated search by distributed elements |
CN101505314A (en) * | 2008-12-29 | 2009-08-12 | 成都市华为赛门铁克科技有限公司 | P2P data stream recognition method, apparatus and system |
CN101989289A (en) * | 2009-08-06 | 2011-03-23 | 富士通株式会社 | Data clustering method and device |
CN101853291A (en) * | 2010-05-24 | 2010-10-06 | 合肥工业大学 | Data flow based car fault diagnosis method |
CN102289478A (en) * | 2011-08-01 | 2011-12-21 | 江苏广播电视大学 | System and method for recommending video on demand based on fuzzy clustering |
Non-Patent Citations (1)
Title |
---|
Hu Xuegang et al.: "An effective secondary clustering algorithm for data streams", Journal of Southwest Jiaotong University, vol. 44, no. 4, 31 August 2009 (2009-08-31) * |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106202335B (en) * | 2016-06-28 | 2019-06-28 | 银江股份有限公司 | Traffic big data cleaning method based on cloud computing framework |
CN106202335A (en) * | 2016-06-28 | 2016-12-07 | 银江股份有限公司 | Traffic big data cleaning method based on cloud computing framework |
CN107392220A (en) * | 2017-05-31 | 2017-11-24 | 阿里巴巴集团控股有限公司 | Data stream clustering method and device |
WO2018219284A1 (en) * | 2017-05-31 | 2018-12-06 | 阿里巴巴集团控股有限公司 | Method and apparatus for clustering data stream |
US11226993B2 (en) | 2017-05-31 | 2022-01-18 | Advanced New Technologies Co., Ltd. | Method and apparatus for clustering data stream |
CN107392220B (en) * | 2017-05-31 | 2020-05-05 | 创新先进技术有限公司 | Data stream clustering method and device |
CN107657277A (en) * | 2017-09-22 | 2018-02-02 | 上海斐讯数据通信技术有限公司 | Human body abnormal behavior detection and judgment method and system based on big data |
CN107657277B (en) * | 2017-09-22 | 2022-02-01 | 金言 | Human body abnormal behavior detection and judgment method and system based on big data |
CN108520023A (en) * | 2018-03-22 | 2018-09-11 | 合肥佳讯科技有限公司 | Thunderstorm kernel identification and tracking method based on hybrid clustering algorithm |
CN108520023B (en) * | 2018-03-22 | 2021-07-20 | 合肥佳讯科技有限公司 | Thunderstorm kernel identification and tracking method based on hybrid clustering algorithm |
CN109344171A (en) * | 2018-12-21 | 2019-02-15 | 中国计量大学 | Method for mining the significance of nonlinear system characteristic variables based on data stream processing |
CN111860554A (en) * | 2019-04-28 | 2020-10-30 | 杭州海康威视数字技术股份有限公司 | Risk monitoring method and device, storage medium and electronic equipment |
CN110298558A (en) * | 2019-06-11 | 2019-10-01 | 欧拉信息服务有限公司 | Vehicle resource allocation method and device |
CN110287244A (en) * | 2019-07-03 | 2019-09-27 | 武汉中海庭数据技术有限公司 | Traffic light localization method based on repeated clustering |
CN110367969A (en) * | 2019-07-05 | 2019-10-25 | 复旦大学 | Improved electrocardiosignal K-Means clustering |
CN111832791A (en) * | 2019-11-27 | 2020-10-27 | 北京中交兴路信息科技有限公司 | Gas station prediction method based on machine learning logistic regression |
CN111179592B (en) * | 2019-12-31 | 2021-06-11 | 合肥工业大学 | Urban traffic prediction method and system based on spatio-temporal data flow fusion analysis |
CN111179592A (en) * | 2019-12-31 | 2020-05-19 | 合肥工业大学 | Urban traffic prediction method and system based on spatio-temporal data flow fusion analysis |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103577602A (en) | Secondary clustering method and system | |
Pahins et al. | Hashedcubes: Simple, low memory, real-time visual exploration of big data | |
CN103345514B (en) | Streaming data processing method under big data environment | |
CN106202569A (en) | Cleaning method based on large data volumes | |
CN107066476A (en) | Real-time recommendation method based on article similarity | |
CN105808358B (en) | Data dependence thread group mapping method for many-core systems | |
CN103118132B (en) | Distributed cache system and method for spatio-temporal data | |
CN106372190A (en) | Method and device for querying OLAP (on-line analytical processing) in real time | |
CN106708989A (en) | Skyline query method for spatial time-series data stream applications | |
JP2019204472A (en) | Method for reading a plurality of small files of 2 MB or smaller from HDFS having a data merge module and an HBase cache module on the basis of Hadoop | |
CN103916478B (en) | Method and apparatus for streaming construction of data cubes based on a distributed system | |
CN107103068A (en) | Service cache update method and device | |
WO2015062540A9 (en) | Driving amount model event-based storage and index methods and system | |
CN104036029A (en) | Big data consistency comparison method and system | |
Zhong et al. | VegaIndexer: A distributed composite index scheme for big spatio-temporal sensor data on cloud | |
CN102012946A (en) | High-efficiency safety monitoring video/image data storage method | |
Xia et al. | SW-BiLSTM: a Spark-based weighted BiLSTM model for traffic flow forecasting | |
Jin et al. | Association rules redundancy processing algorithm based on hypergraph in data mining | |
Xia et al. | DAPR-tree: a distributed spatial data indexing scheme with data access patterns to support Digital Earth initiatives | |
CN107426315A (en) | Improved method for the distributed cache system Memcached based on BP neural network | |
CN103118102A (en) | System and method for counting and controlling spatial data access laws under cloud computing environment | |
Demir et al. | Clustering spatial networks for aggregate query processing: A hypergraph approach | |
Gaurav et al. | An outline on big data and big data analytics | |
CN105956816A (en) | Intelligent processing method for cargo transportation information | |
Qin et al. | Towards a smart, internet-scale cache service for data intensive scientific applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20140212 |