CN110163255A - A data stream clustering method and device based on density peaks - Google Patents

Publication number
CN110163255A
Authority
CN
China
Prior art keywords
data
cluster
clustered
density
data flow
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910324141.1A
Other languages
Chinese (zh)
Other versions
CN110163255B (en)
Inventor
孙红卫
张瑞
杜韬
王信堂
许婧文
朱连江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Application filed by University of Jinan
Priority to CN201910324141.1A
Publication of CN110163255A
Application granted
Publication of CN110163255B
Current legal status: Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The present disclosure provides a data stream clustering method and device based on density peaks. Building on the density peaks algorithm and fuzzy clustering, it proposes for the first time the concept of the doubtful outlier and takes a sampling window model with adaptive width and a space-time decay mechanism as its main innovations. Its main goal is to improve the efficiency of data stream clustering, so that a more efficient clustering result is obtained while considerable clustering precision is maintained.

Description

A data stream clustering method and device based on density peaks
Technical field
The present disclosure belongs to the technical field of data stream clustering and relates to a data stream clustering method and device based on density peaks.
Background
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
The world today is in the midst of a fourth technological revolution led by advanced technologies such as artificial intelligence, machine learning, big data analysis and virtual reality. The arrival of the intelligent era is unstoppable, and every industry is actively joining this wave in an effort to improve production efficiency and competitiveness.
Data is the raw material of the intelligent era. Massive high-dimensional data contains rich information and knowledge. With the rapid development of personal terminals and network technology, information exchange has become more frequent and traffic has grown greatly, so that data flows through networks without interruption. This fast-moving data constitutes a new form of data, the data stream. For most enterprises and institutions it is impossible to capture all network data, save it to storage media and analyze it afterwards: first, the hardware requirements are high; second, network data is time-sensitive, and results and knowledge obtained by analysis after everything has been stored may already be out of date.
Another characteristic of data streams is that they are unlabeled, which makes them an object of unsupervised learning, and clustering is an important part of unsupervised learning. Traditional global clustering algorithms are no longer applicable to data streams; an efficient stream-oriented clustering algorithm is needed to analyze data in real time and feed back the results.
The classic data stream clustering algorithm is CluStream, derived from K-means, which is also the starting point of data stream clustering algorithms. HPStream, an improved version of CluStream, appeared later and is more robust on high-dimensional data streams. Because their core is still K-means, both can only find spherical clusters, a weakness that is exposed on non-spherical clusters. The density-based data stream clustering algorithm DenStream was then proposed, and the D-Stream algorithm, based on a grid model of the data stream, is also a density-based method. In addition, high-dimensional data in a stream is not uniform in form and inevitably contains mixed-type data, which traditional clustering algorithms cannot handle effectively; HCLuStream, a data stream clustering algorithm for mixed attributes, was therefore proposed so that clustering can be applied in real data stream environments.
However, the inventors found during research and development that, although each of these algorithms has contributed to data stream clustering and has improved dynamic clustering methods so that they increasingly meet practical requirements, they share one problem: their main concern is to propose improvements according to the data type, that is, they focus on clustering precision. For data streams, clustering efficiency is also very important. Making an algorithm adjust itself adaptively so that it processes data efficiently while losing as little information as possible is therefore a meaningful research direction.
Summary of the invention
In view of the deficiencies of the prior art, one or more embodiments of the present disclosure provide a data stream clustering method and device based on density peaks. Fuzzy clustering guarantees basic clustering efficiency and the density peaks algorithm guarantees basic clustering precision. On this basis, the concept of the doubtful outlier is proposed to improve the accuracy of the clustering method, and a space-time decay mechanism and an adaptive sampling data window model are introduced to ensure its efficiency. The disclosure can be applied effectively to data analysis in enterprises and institutions, especially in application environments with strict timeliness requirements and large data volumes, where clustering results can be obtained in real time, efficiently and intuitively.
According to one aspect of one or more embodiments of the present disclosure, a data stream clustering method based on density peaks is provided.
A data stream clustering method based on density peaks, the method comprising:
receiving a first batch of data stream data to be clustered, and initializing parameters and data structures;
receiving a new batch of data stream data to be clustered as new data and pre-clustering the new data, while decaying the old data stream data;
merging the new batch of data stream data to be clustered with the old data stream data and their data structures, and clustering the merged data, the merged data becoming the old data;
screening and updating the doubtful outliers in the old data, a doubtful outlier being an object point whose weight calculated from the old data is greater than a threshold;
determining the width of the data sampling window for the next iteration according to the decay of the old data;
adding a cluster according to the maximum-density center of the doubtful outliers obtained with the density peaks algorithm, or merging clusters according to the spatial positions of the existing cluster centers;
returning to the step of receiving a new batch of data stream data to be clustered and continuing the data stream clustering iteration, and ending data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
Further, in the method, the initialized parameters include the number of clusters, the cluster center matrix, the first batch of data stream data to be clustered, the clustering result sequence, the lifetime sequence, the decay weight sequence and the doubtful outlier screening weight sequence.
Further, in the method, initializing the parameters also includes performing a fuzzy clustering calculation on the first batch of data stream data to be clustered to obtain the initial membership matrix and the adjusted cluster center matrix.
Further, in the method, initializing the data structures means adjusting the value of each data structure according to the first batch of data stream data to be clustered.
Further, in the method, pre-clustering the new data means performing fuzzy clustering on the new batch of data stream data to be clustered, based on the initial value of the cluster center matrix in the initialization parameters or on the cluster center matrix from the previous iteration, to obtain its membership matrix.
Further, in the method, decaying the old data stream data means decaying the old data according to the decay weight, removing old data points whose weight is below a preset decay threshold, and adjusting each data structure of the old data and its values; the decay weight is calculated from a spatial factor and a temporal factor.
Further, in the method, the specific steps of adding a cluster include:
obtaining the density center with the maximum density among the doubtful outliers using the density peaks algorithm;
when the density of a density center is greater than a preset density threshold, adding a new cluster with that density center as its cluster center;
The specific steps of merging clusters include:
judging, according to the spatial positions of the existing cluster centers, whether the distance between any two cluster centers is less than a preset cluster center distance threshold;
when the distance between two cluster centers is less than the preset cluster center distance threshold, merging the two clusters.
Further, in the method, the specific steps of judging whether the sampling window has reached the tail of the data stream to be clustered include:
controlling the data volume of the next batch of data stream data to be clustered according to the width of the data sampling window in the next iteration;
judging whether the data volume of the remaining data stream to be clustered is less than the data volume of the next batch of data stream data to be clustered; if so, assigning the remaining data volume to the data volume of the next batch, performing the last iteration and ending the iteration; otherwise, continuing.
According to one aspect of one or more embodiments of the present disclosure, a computer-readable storage medium is provided.
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to execute the data stream clustering method based on density peaks.
According to one aspect of one or more embodiments of the present disclosure, a terminal device is provided.
A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions, and the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor to execute the data stream clustering method based on density peaks.
According to one aspect of one or more embodiments of the present disclosure, a data stream clustering device based on density peaks is provided.
A data stream clustering device based on density peaks, based on the data stream clustering method based on density peaks described above, comprising:
an initialization module, configured to receive a first batch of data stream data to be clustered and to initialize parameters and data structures;
a parallel pre-clustering and decay module, configured to receive a new batch of data stream data to be clustered as new data and pre-cluster the new data, while decaying the old data stream data;
a new and old data merging module, configured to merge the new batch of data stream data to be clustered with the old data stream data and their data structures, and to cluster the merged data, the merged data becoming the old data;
a doubtful outlier screening module, configured to screen and update the doubtful outliers in the old data, a doubtful outlier being an object point whose weight calculated from the old data is greater than a threshold;
a next-batch data volume determining module, configured to determine the width of the data sampling window for the next iteration according to the decay of the old data;
a cluster adding and merging module, configured to add a cluster according to the maximum-density center of the doubtful outliers obtained with the density peaks algorithm, or to merge clusters according to the spatial positions of the existing cluster centers;
a data stream clustering termination detection module, configured to return to the step of receiving a new batch of data stream data to be clustered and continue the data stream clustering iteration, and to end data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
Beneficial effects of the present disclosure:
The data stream clustering method and device based on density peaks provided by the present disclosure use fuzzy clustering to guarantee basic clustering efficiency and the density peaks algorithm to guarantee basic clustering precision. On this basis, the concept of the doubtful outlier is proposed to improve the accuracy of the algorithm, and a space-time decay mechanism and an adaptive sampling data window model are introduced to ensure its efficiency. The disclosure can be applied effectively to data analysis in enterprises and institutions, especially in application environments with strict timeliness requirements and large data volumes, so that clustering results are obtained in real time, efficiently and intuitively.
Brief description of the drawings
The accompanying drawings, which form a part of the present disclosure, are provided for further understanding of the disclosure. The illustrative embodiments of the disclosure and their descriptions are used to explain the disclosure and do not unduly limit it.
Fig. 1 is a flow chart of a data stream clustering method based on density peaks according to one or more embodiments;
Fig. 2 is a detailed flow chart of a data stream clustering method based on density peaks according to one or more embodiments;
Fig. 3 is a schematic diagram of doubtful outliers according to one or more embodiments;
Fig. 4 is a schematic diagram of doubtful outliers in dynamic clustering according to one or more embodiments;
Fig. 5 is a schematic diagram of real outliers according to one or more embodiments.
Detailed description:
The technical solutions in one or more embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are obviously only some, not all, of the embodiments of the disclosure. All other embodiments obtained by a person of ordinary skill in the art on the basis of one or more embodiments of the present disclosure without creative effort fall within the scope of protection of the disclosure.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the disclosure. Unless otherwise indicated, all technical and scientific terms used in the embodiments have the same meaning as commonly understood by a person of ordinary skill in the technical field to which the disclosure belongs.
It should be noted that the terminology used here is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the disclosure. As used here, the singular form is intended to include the plural form as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components and/or combinations thereof.
It should be noted that the flowcharts and block diagrams in the drawings show the architecture, functions and operations of possible implementations of methods and systems according to various embodiments of the disclosure. Each box in a flowchart or block diagram may represent a module, a program segment or a part of code, which may include one or more executable instructions for implementing the logic function specified in each embodiment. In some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each box in the flowcharts and/or block diagrams, and combinations of such boxes, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Where no conflict arises, the embodiments of the disclosure and the features in the embodiments may be combined with one another. The disclosure is further described below with reference to the drawings and embodiments.
Embodiment 1
To achieve fully online data stream clustering and improve algorithm efficiency, according to one aspect of one or more embodiments of the present disclosure, a data stream clustering method based on density peaks is provided.
As shown in Fig. 1 and Fig. 2, a data stream clustering method based on density peaks comprises:
Step S1: receive a first batch of data stream data to be clustered, and initialize parameters and data structures;
Step S2: receive a new batch of data stream data to be clustered as new data and pre-cluster the new data, while decaying the old data stream data;
Step S3: merge the new batch of data stream data to be clustered with the old data stream data and their data structures, and cluster the merged data; the merged data becomes the old data;
Step S4: screen and update the doubtful outliers in the old data; a doubtful outlier is an object point whose weight calculated from the old data is greater than a threshold;
Step S5: determine the width of the data sampling window for the next iteration according to the decay of the old data;
Step S6: add a cluster according to the maximum-density center of the doubtful outliers obtained with the density peaks algorithm, or merge clusters according to the spatial positions of the existing cluster centers;
Step S7: return to step S2, which receives a new batch of data stream data to be clustered, and continue the data stream clustering iteration; when the sampling window reaches the tail of the data stream to be clustered, end data stream clustering.
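The control flow of steps S1 to S7 can be sketched as a small driver loop. This is only an illustrative skeleton, not the patented implementation: the function name `run_stream`, the `hooks` dictionary and its keys are hypothetical, the step bodies are supplied by the caller, and an in-memory list stands in for the data stream.

```python
def run_stream(stream, first_width, hooks):
    """Drive steps S1-S7 over a list that stands in for the data stream.

    `hooks` maps hypothetical step names to caller-supplied callables.
    """
    pos = first_width
    state = hooks["init"](stream[:pos])                       # S1
    width = first_width
    while pos < len(stream):                                  # S7: stop at the tail
        # S7: the last batch is shrunk to whatever data remains.
        width = max(1, min(width, len(stream) - pos))
        batch = stream[pos:pos + width]
        pos += width
        state = hooks["precluster_and_decay"](state, batch)   # S2
        state = hooks["merge_and_cluster"](state, batch)      # S3
        state = hooks["screen_outliers"](state)               # S4
        width = hooks["next_width"](state)                    # S5
        state = hooks["add_or_merge_clusters"](state)         # S6
    return state


# Toy hooks that only record batch sizes, to make the control flow visible.
demo_hooks = {
    "init": lambda first: {"sizes": [len(first)]},
    "precluster_and_decay": lambda s, b: s,
    "merge_and_cluster": lambda s, b: {"sizes": s["sizes"] + [len(b)]},
    "screen_outliers": lambda s: s,
    "next_width": lambda s: 4,
    "add_or_merge_clusters": lambda s: s,
}
print(run_stream(list(range(10)), 3, demo_hooks)["sizes"])
```

Passing the steps in as callables keeps the skeleton independent of how each step is actually computed.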
I. Preparation stage:
In step S1 of this embodiment, the method is initialized: the parameters and data structures are assigned and constructed.
Step S101: the initialized parameters include the number of clusters, the cluster center matrix, the first batch of data stream data to be clustered, the clustering result sequence, the lifetime sequence, the decay weight sequence and the doubtful outlier screening weight sequence;
Step S102: fuzzy clustering is performed on the first batch of data X to be processed to obtain the initial membership matrix Mem_xv and the adjusted cluster center matrix V, and the values of the other data structures are adjusted accordingly;
In this embodiment, some necessary data structures and parameters must be initialized before the core loop body of the system starts working. First, the initial number of cluster centers k is assigned; because the algorithm adjusts the number of clusters adaptively, k can be randomly initialized to a suitable value. Then the first batch of data X = [x1, x2, ……, xn] is loaded into memory, where xi is an m-dimensional attribute vector and 1 ≤ i ≤ n. According to the volume of the first batch, the accompanying data structures are constructed: the clustering result sequence Class, the lifetime sequence Time, the decay weight sequence D_weight, the doubtful outlier screening weight sequence O_weight, and so on. The elements of Time and O_weight are initialized to 0 and the elements of D_weight to 1. If the data volume is L, each of these structures is a one-dimensional vector of length L. Finally, k objects are selected at random from X as the initial cluster center matrix V, and X is processed with the fuzzy clustering algorithm to obtain the initial membership matrix Mem_xv. This completes the preparatory work of the algorithm.
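A minimal sketch of this preparation stage follows. It assumes the standard fuzzy c-means membership formula with fuzzifier m = 2, which the text does not specify, and it performs only the membership computation, not the iterative adjustment of V. The function names `fcm_membership` and `initialise` and the way Class is filled in are assumptions made for illustration.

```python
import math
import random


def fcm_membership(points, centers, m=2.0):
    """Standard fuzzy c-means membership: u[i][j] is the degree of point i in cluster j."""
    p = 2.0 / (m - 1.0)
    u = []
    for x in points:
        d = [math.dist(x, v) for v in centers]
        if 0.0 in d:
            # The point coincides with one or more centers: share membership among them.
            row = [1.0 if dj == 0.0 else 0.0 for dj in d]
            total = sum(row)
            row = [r / total for r in row]
        else:
            row = [1.0 / sum((dj / dl) ** p for dl in d) for dj in d]
        u.append(row)
    return u


def initialise(X, k, seed=0):
    """Build the companion sequences of length L and the initial V and Mem_xv."""
    L = len(X)
    rng = random.Random(seed)
    V = rng.sample(X, k)                 # k objects chosen at random as initial centers
    Mem_xv = fcm_membership(X, V)
    state = {
        # Assumption: Class holds the index of the cluster with the largest membership.
        "Class": [row.index(max(row)) for row in Mem_xv],
        "Time": [0] * L,                 # lifetime sequence, initialised to 0
        "D_weight": [1.0] * L,           # decay weights, initialised to 1
        "O_weight": [0.0] * L,           # doubtful outlier screening weights, initialised to 0
    }
    return state, V, Mem_xv
```

A full fuzzy clustering run would alternate this membership step with a center update until V stabilises.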
II. Core loop body:
In step S2 of this embodiment, the specific steps include:
Step S201: for the newly arrived batch of data, fuzzy clustering is performed on the basis of the cluster center matrix V obtained in the previous step to obtain its membership matrix Mem_xv_n;
Step S202: the old data is decayed according to the decay weight; old data points whose weight is below the threshold are removed, and each data structure of the old data and its values are adjusted;
Steps S201 and S202 are processed in parallel.
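Because S201 touches only the new batch and S202 only the old data, the two can be submitted as independent tasks. The sketch below uses Python's `concurrent.futures.ThreadPoolExecutor`; the function name `run_s2` and the `precluster` and `decay` callables are hypothetical placeholders for the two steps.

```python
from concurrent.futures import ThreadPoolExecutor


def run_s2(precluster, decay, new_batch, V, old_state):
    """Run S201 (pre-cluster new data) and S202 (decay old data) as two tasks."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        f_new = pool.submit(precluster, new_batch, V)   # S201: uses only the new batch
        f_old = pool.submit(decay, old_state)           # S202: uses only the old data
        # result() blocks until each task finishes and re-raises any exception.
        return f_new.result(), f_old.result()
```

In CPython, threads do not run CPU-bound Python code simultaneously, so a process pool or vectorised numerical code may be needed to see a real speed-up; the structure of the two independent tasks stays the same.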
In step S3 of this embodiment, the initial membership matrix Mem_xv is merged with the membership matrix Mem_xv_n of the new batch, and the new and old data are merged together with their data structures. Fuzzy clustering then continues from the cluster center matrix V to obtain a new membership matrix Mem_xv and cluster center matrix V. The merged data set dataset now becomes the old data;
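The bookkeeping part of this merge can be sketched as a concatenation of parallel sequences. The dictionary layout and the function name `merge_batches` are assumptions for illustration; the initial decay weight of 1 for new data follows the text, while the initial Time and O_weight values of 0 for later batches are assumed by analogy with the first batch.

```python
def merge_batches(old, new_points, Mem_xv_n):
    """Append a new batch and its membership rows to the old data structures."""
    n = len(new_points)
    return {
        "dataset": old["dataset"] + list(new_points),
        "Mem_xv": old["Mem_xv"] + [list(row) for row in Mem_xv_n],
        "Time": old["Time"] + [0] * n,            # assumed initial lifetime
        "D_weight": old["D_weight"] + [1.0] * n,  # new data starts with decay weight 1
        "O_weight": old["O_weight"] + [0.0] * n,  # assumed initial screening weight
    }
```

The merged structure is then handed to the next round of fuzzy clustering, which is not shown here.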
In step S4 of this embodiment, the doubtful outlier weights of the old data are updated, and doubtful outliers are filtered using the screening weight as the criterion;
In step S5 of this embodiment, the width of the data sampling window for the next iteration is determined according to the amount of old data decayed in the current iteration, which controls the quantity in_num of the next batch of data;
In step S6 of this embodiment, the specific steps include:
Step S601: judge whether any doubtful outlier has a density greater than the threshold o_threshold; if so, the point with the maximum density is taken as the cluster center of a new cluster, whose members are obtained in the next iteration;
Step S602: judge whether the distance between any two existing cluster centers is less than the threshold d_threshold; if so, the two clusters are merged.
In this embodiment, the core loop body is the core of the system. It implements dynamic clustering, data merging, doubtful outlier screening, adaptive adjustment of the sampling data window width, decay of old data, and the addition and merging of clusters. First, the newly arrived batch of in_num data points is divided by fuzzy clustering with V as the initial cluster center matrix, giving its membership matrix Mem_xv_n. At the same time, the decay weights of the old data are updated and the old data is decayed. The old data remaining after decay is merged with the new data and the corresponding data structures, and fuzzy clustering continues, giving the merged membership matrix Mem_xv and the updated old data dataset. The decay weight of the old data takes both spatial and temporal factors into account and is given by formula (1).
Here w and z are adjustment parameters that control the degree of temporal and spatial influence respectively, and are generally set to w = 2 and z = 1. Dc(t, i) denotes the distance between the i-th data point and the cluster center for which its membership is greatest in the t-th iteration. The decay weight of all newly arrived data is initialized to 1. The decay threshold λ is 0.2; when D_weight(t, i) < λ, the corresponding data point is decayed, that is, removed from memory and excluded from further calculation.
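The removal rule D_weight(t, i) < λ can be sketched as a filter applied to every parallel sequence at once. The sketch takes the decay weights as given and does not compute formula (1); the function name `prune_decayed` and the dictionary layout are assumptions for illustration.

```python
LAMBDA = 0.2  # decay threshold from the text


def prune_decayed(state, lam=LAMBDA):
    """Drop every data point whose decay weight is below lam.

    `state` maps names to parallel lists of equal length and must contain "D_weight".
    Returns the pruned state and the number of removed points.
    """
    keep = [i for i, w in enumerate(state["D_weight"]) if w >= lam]
    pruned = {name: [seq[i] for i in keep] for name, seq in state.items()}
    removed = len(state["D_weight"]) - len(keep)
    return pruned, removed
```

The returned count of removed points plays the role of the number of old data points decayed in the current iteration, which the window-width adjustment uses later.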
The subsequent screening of doubtful outliers is likewise based on assigning a doubtful outlier screening weight. The doubtful outlier is a concept proposed here for the first time. It has the following characteristics: 1. it lies between two clusters or where they intersect; 2. it is far from every cluster; 3. its memberships in several clusters are similar; 4. during dynamic clustering it is temporarily far from the cluster it belongs to. As shown in Fig. 3, the overlapping objects between the triangle cluster and the diamond cluster can be classified as doubtful outliers. These characteristics show that the range of doubtful outliers covers the outliers: a real outlier is always hidden among the doubtful outliers. Doubtful outliers therefore need to be screened first, using the screening weight given by formula (2).
Here 1 ≤ i, j ≤ L_t, where L_t is the data volume in memory at the t-th iteration. In the membership matrix each data point has one membership for each cluster. Taking the differences between these memberships gives a series of values; the sequence they form has length sumd = 1 + 2 + ... + (k - 1). The smaller these values are, the more likely the point is a doubtful outlier. The formula also uses the distance from data object x_i to the cluster center for which its membership is greatest; the smaller the value of the last fraction in the formula, the more likely the point is a doubtful outlier. ξ is a weight parameter taking values in {0.001, 0.01, 0.1, 1}. The screening threshold used as the criterion for filtering doubtful outliers is set as shown in formula (3):
When the screening weight of x_i meets this threshold condition, the object x_i is classified as a doubtful outlier.
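The membership-difference part of the screening weight can be illustrated on its own. The sketch below only forms the pairwise differences of one membership row and confirms that there are 1 + 2 + ... + (k - 1) of them; it does not reproduce formulas (2) or (3). Taking absolute values and the function name `membership_differences` are assumptions.

```python
from itertools import combinations


def membership_differences(row):
    """Pairwise absolute differences between the k memberships of one data point."""
    return [abs(a - b) for a, b in combinations(row, 2)]


ambiguous = [0.34, 0.33, 0.33]   # similar memberships: candidate doubtful outlier
confident = [0.90, 0.05, 0.05]   # one dominant membership

k = len(ambiguous)
assert len(membership_differences(ambiguous)) == sum(range(1, k))  # sumd = 1 + ... + (k-1)
print(sum(membership_differences(ambiguous)) < sum(membership_differences(confident)))
```

A point whose memberships are nearly equal yields small differences, which matches the third characteristic of doubtful outliers listed above.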
Fig. 4 is a schematic diagram of doubtful outliers, also called pseudo outliers, that appear during dynamic clustering. The left side of Fig. 4 shows the distribution of the global data set; cluster three is singled out to make it easier to explain what a doubtful outlier is. When a static global data set is processed as a stream, only part of the data is in memory at any time, so the situation in the lower-right subgraph arises: individual members of a cluster are temporarily far from the main body of the cluster and are regarded as doubtful outliers. In this case it is more apt to call them pseudo outliers.
Fig. 5 is a schematic diagram of real outliers. During dynamic clustering, the characteristics of a real outlier are a subset of those of a doubtful outlier, so it is temporarily classified as a doubtful outlier. It is finally filtered out using the decision graph obtained by the density peaks algorithm, shown on the right of the figure, which improves the precision of the algorithm. Although the space complexity of the density peaks algorithm is relatively high, the doubtful outliers are few, so dividing them does not affect the efficiency of the algorithm.
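The two quantities plotted in a density peaks decision graph can be sketched as follows. This is the commonly used cutoff-kernel form of the density peaks algorithm, not necessarily the exact variant used in the embodiment; the cutoff distance `dc` and the function name `density_peaks` are assumptions.

```python
import math


def density_peaks(points, dc):
    """Return (rho, delta) for each point.

    rho[i]   = number of other points closer than the cutoff dc (local density)
    delta[i] = distance to the nearest point of higher density, or the largest
               distance from i if no point has higher density
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < dc) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


pts = [(0.0, 0.0), (0.2, 0.0), (-0.2, 0.0), (0.0, 0.2), (3.0, 3.0)]
rho, delta = density_peaks(pts, dc=0.25)
print(rho)    # the central point has the highest density, the isolated point the lowest
```

In the decision graph, a point with high rho and high delta is a candidate cluster center, while a point with low rho and high delta, such as the isolated point here, looks like a real outlier.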
The number of new data points entering the system in the next iteration, that is, the adaptively adjusted width of the sampling data window model, is calculated by formulas (4) and (5):
Dif(t) = count_d(t) - in_num(t)    (4)
Here count_d(t) denotes the number of old data points decayed in the current iteration. The rounding function used in formula (5) is described as rounding up, for example 1.43 becomes 2; in Matlab this is the behavior of ceil(), whereas floor() rounds down (floor(1.43) = 1). The parameter e is a tuning parameter whose value is greater than 1. The volume of the next batch of new data is in_num(t) = Width(t+1).
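Formula (4) and the rounding behaviour can be checked with a few lines. The sketch does not compute the window width, because formula (5) is not reproduced in this text; the function name `dif` is an assumption. Python's `math.floor` and `math.ceil` round in the same directions as the Matlab functions of the same names.

```python
import math


def dif(count_d, in_num):
    """Formula (4): old points decayed in this iteration minus the size of the current batch."""
    return count_d - in_num


print(dif(30, 20))        # more points left memory than entered it
print(dif(10, 20))        # fewer points left memory than entered it
print(math.floor(1.43))   # rounds down
print(math.ceil(1.43))    # rounds up
```

A positive Dif(t) means memory is shrinking and a negative one means it is growing, which is the signal the window-width adjustment works from.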
The last important part is the addition and merging of clusters. As the number of doubtful outliers grows, the density center with the maximum density is obtained by the density peaks algorithm. When the density of a density center is greater than the threshold, this center is taken as the cluster center of a new cluster, and the members of the new cluster are divided in the next iteration. The threshold is set as shown in formula (6):
Here density_o(t) denotes the density sequence of the doubtful outliers in the t-th iteration; it stores the local density value of each doubtful outlier, and its length equals the number of doubtful outliers. τ is a scaling parameter used to adjust the threshold, with a value range of [0.9, 1.1]. density(vi, t) denotes the local density of cluster center vi in the t-th iteration. When max(density_o(t)) ≥ d_threshold(t), a cluster is added and k = k + 1. Real outliers remain doubtful outliers throughout the iterations and are eventually decayed as the data iterates, which avoids the feature drift they would cause. In addition, new or old clusters may move closer to each other in subsequent iterations. When their centers are close enough, they are merged into one new cluster. The distances between cluster centers d_v(t, vi, vj) are calculated, and DC = 0.2*max(d_v(t)) is the distance threshold. When the distance between two cluster centers is less than DC, the corresponding two clusters are merged and k = k - 1.
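The two decisions described here can be sketched as small predicates. The merge test uses the stated threshold DC = 0.2*max(d_v); the add test takes the density threshold as an input because formula (6) is not reproduced in this text. The function names `clusters_to_merge` and `should_add_cluster` are assumptions.

```python
import math
from itertools import combinations


def clusters_to_merge(V, ratio=0.2):
    """Return index pairs of cluster centers closer than DC = ratio * max pairwise distance."""
    pairs = list(combinations(range(len(V)), 2))
    if not pairs:
        return []
    d_v = {(i, j): math.dist(V[i], V[j]) for i, j in pairs}
    DC = ratio * max(d_v.values())
    return [pair for pair in pairs if d_v[pair] < DC]


def should_add_cluster(density_o, threshold):
    """True when the densest doubtful outlier reaches the given density threshold."""
    return bool(density_o) and max(density_o) >= threshold


centers = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0)]
print(clusters_to_merge(centers))            # the first two centers are close together
print(should_add_cluster([1, 5, 3], 4))
```

Because DC is a fraction of the largest center-to-center distance, this rule never merges when only two clusters exist: their single distance cannot be smaller than 0.2 times itself.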
3. Algorithm termination:
In step S7 of this embodiment, it is judged whether the amount of remaining data is less than the size in_num of the next batch of data. If so, the amount of remaining data is assigned to in_num, the last iteration is performed, and the loop body ends; otherwise the loop continues.
In this embodiment, after the core algorithm body completes the current iteration, it must be determined whether the data stream has ended, in order to decide whether to exit the loop and terminate the program. When the amount of remaining data is greater than the computed in_num value, the next iteration proceeds normally; when it is less than in_num, in_num is set to the amount of remaining data before the next iteration is entered. Once the normal iterations are complete, the density peaks algorithm is applied one final time to partition the remaining suspected outliers and screen out the true outliers, and the other points are then assigned by fuzzy clustering. This completes the algorithm's processing of the data stream.
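The termination check described above can be sketched as a small batch iterator. This is a minimal sketch under stated assumptions: the stream is held in a list, and `next_width` is a hypothetical callback standing in for the Width(t+1) computation, which is not reproduced here.

```python
def iterate_batches(stream, first_width, next_width):
    """Yield successive batches of a finite stream.

    When fewer than in_num items remain, in_num is set to the remaining
    amount, the last batch is yielded, and the loop ends.
    `next_width(batch)` is a hypothetical stand-in for Width(t+1).
    """
    pos = 0
    in_num = max(1, first_width)
    while pos < len(stream):
        remaining = len(stream) - pos
        if remaining < in_num:
            in_num = remaining  # last iteration: take whatever is left
        batch = stream[pos:pos + in_num]
        pos += in_num
        yield batch
        # Guard against a non-positive width, which would stall the loop.
        in_num = max(1, next_width(batch))


if __name__ == "__main__":
    for batch in iterate_batches(list(range(10)), 4, lambda b: 4):
        print(batch)
```

The final pass over the remaining suspected outliers would run after this loop is exhausted.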
Embodiment two
According to one aspect of one or more embodiments of the present disclosure, a computer-readable storage medium is provided.
A computer-readable storage medium stores a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to execute the data stream clustering method based on density peaks.
Embodiment three
According to one aspect of one or more embodiments of the present disclosure, a terminal device is provided.
A terminal device comprises a processor and a computer-readable storage medium. The processor is configured to implement instructions; the computer-readable storage medium is configured to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the data stream clustering method based on density peaks.
When run in a device, these computer-executable instructions cause the device to perform the methods or processes described in the embodiments of the present disclosure.
In this embodiment, a computer program product may include a computer-readable storage medium carrying computer-readable program instructions for carrying out aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium as used herein is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGA) or programmable logic arrays (PLA), is personalized by using state information of the computer-readable program instructions; the electronic circuitry can execute the computer-readable program instructions in order to implement aspects of the present disclosure.
Embodiment four
According to one aspect of one or more embodiments of the present disclosure, a data stream clustering device based on density peaks is provided.
A data stream clustering device based on density peaks, based on the data stream clustering method based on density peaks, comprises:
a prediction target acquisition module, configured to receive the prediction targets of the different energy substitution schemes of an energy substitution conversion enterprise, and to obtain the frequent item sets of each class of target;
a prediction model building module, configured to receive, from the electric power enterprise database, the data of each class of enterprise before and after energy substitution, and to construct a Gaussian regression combined prediction model;
a predicted target value calculation module, configured to perform clustered prediction on the frequent item sets of each class of target of the different energy substitution schemes according to the Gaussian regression combined prediction model, to obtain the predicted target values of the different energy substitution schemes, and to linearly combine the predicted target values to obtain the annual cost value predicted target value;
an electric energy substitution scheme prediction module, configured to, based on the principle of equal annual cost values, perform data calculation on the substitutable energy schemes according to the annual cost value predicted target value, obtain the boundary electricity price of the electric energy substitution scheme, compute an uncertainty estimate for the electric energy substitution scheme, and obtain the electric energy substitution scheme prediction result;
a parameter adjustment module, configured to feed back data based on the prediction result, compare it with the actual data received by the power system business application platform, adjust the parameters of the Gaussian regression combined prediction model, and perform electric energy substitution scheme prediction using the Gaussian regression combined prediction model with the adjusted parameters.
It should be noted that although several modules or sub-modules of the device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in a single module. Conversely, the features and functions of one module described above may be further divided so as to be embodied by multiple modules.
Beneficial effects of the present disclosure:
The data stream clustering method and device based on density peaks provided by the present disclosure address the time-consuming and laborious nature of traditional statistical methods in electric energy substitution scheme prediction, and reduce the large prediction error of those methods. In addition, based on the principle of equal annual cost values, the disclosure obtains the boundary electricity price of the electric energy substitution scheme and computes an uncertainty estimate for it, providing effective support for electric energy substitution scheme prediction. Finally, the disclosure provides practically meaningful support for responding to national policy and protecting the ecological environment.
The foregoing is merely a description of preferred embodiments of the present disclosure and is not intended to limit the disclosure. For those skilled in the art, the disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the disclosure shall fall within its scope of protection. Therefore, the disclosure is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A data stream clustering method based on density peaks, characterized in that the method comprises:
receiving a first batch of data stream data to be clustered, and initializing parameters and data structures;
receiving a new batch of data stream data to be clustered as new data, pre-clustering the new data, and at the same time attenuating the old data stream data;
merging the new batch of data stream data to be clustered with the old data stream data and their data structures, and clustering the merged data, the merged data becoming the old data;
updating the screening of suspected outliers in the old data, a suspected outlier being an object point whose weight, calculated from the old data, is greater than a threshold;
determining the width of the data sampling window for the next iteration according to the amount of old data attenuated;
adding a cluster according to the maximum-density center of the suspected outliers obtained using the density peaks algorithm, or merging clusters according to the spatial positions of the existing cluster centers;
returning to the step of receiving a new batch of data stream data to be clustered and continuing the data stream clustering iteration, and ending data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
2. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, the initialized parameters include the number of clusters, the cluster center matrix, the first batch of data stream data to be clustered, the clustering result sequence, the survival time sequence, the attenuation weight sequence, and the suspected outlier screening weight sequence.
3. The data stream clustering method based on density peaks according to claim 2, characterized in that, in the method, initializing the parameters further includes performing a fuzzy clustering calculation on the first batch of data stream data to be clustered to obtain an initial membership matrix and an adjusted cluster center matrix.
Further, in the method, initializing the data structures means adjusting the value of each data structure according to the first batch of data stream data to be clustered.
4. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, pre-clustering the new data means performing fuzzy clustering on the new batch of data stream data to be clustered, based on the initial value of the cluster center matrix in the initialized parameters or on the cluster center matrix from the previous iteration, to obtain its membership matrix.
5. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, attenuating the old data stream data means attenuating the old data according to attenuation weights, discarding old data points whose weight is below a preset discard threshold, and adjusting each data structure of the old data and its values; the attenuation weight is calculated from a spatial factor and a time factor.
6. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, the specific steps of adding a cluster include:
obtaining, using the density peaks algorithm, the density center with the highest density among the suspected outliers;
when the density of this density center is greater than a preset density threshold, adding a new cluster with the density center as its cluster center;
the specific steps of merging clusters include:
judging, according to the spatial positions of the existing cluster centers, whether any two cluster centers are separated by a distance less than a preset cluster center distance threshold;
when the distance between two cluster centers is less than the preset cluster center distance threshold, merging the two clusters.
7. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, the specific steps of judging whether the sampling window has reached the tail of the data stream to be clustered include:
controlling the data volume of the next batch of data stream data to be clustered according to the width of the data sampling window for the next iteration;
judging whether the data volume of the remaining data stream to be clustered is less than the data volume of the next batch of data stream data to be clustered; if so, assigning the data volume of the remaining data stream to be clustered to the data volume of the next batch, performing the last iteration, and ending the iteration; otherwise continuing.
8. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor of a terminal device to execute the data stream clustering method based on density peaks according to any one of claims 1-7.
9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions and the computer-readable storage medium being configured to store a plurality of instructions, characterized in that the instructions are adapted to be loaded by the processor to execute the data stream clustering method based on density peaks according to any one of claims 1-7.
10. A data stream clustering device based on density peaks, characterized in that it is based on the data stream clustering method based on density peaks according to any one of claims 1-7 and comprises:
an initialization module, configured to receive a first batch of data stream data to be clustered and to initialize parameters and data structures;
a parallel pre-clustering and attenuation module, configured to receive a new batch of data stream data to be clustered as new data, pre-cluster the new data, and at the same time attenuate the old data stream data;
a new and old data merging module, configured to merge the new batch of data stream data to be clustered with the old data stream data and their data structures, and to cluster the merged data, the merged data becoming the old data;
a suspected outlier screening module, configured to update the screening of suspected outliers in the old data, a suspected outlier being an object point whose weight, calculated from the old data, is greater than a threshold;
a next-batch data volume determining module, configured to determine the width of the data sampling window for the next iteration according to the amount of old data attenuated;
a cluster adding and removing module, configured to add a cluster according to the maximum-density center of the suspected outliers obtained using the density peaks algorithm, or to merge clusters according to the spatial positions of the existing cluster centers;
a data stream clustering end detection module, configured to return to the step of receiving a new batch of data stream data to be clustered and continue the data stream clustering iteration, and to end data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
CN201910324141.1A 2019-04-22 2019-04-22 Data stream clustering method and device based on density peak value Expired - Fee Related CN110163255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910324141.1A CN110163255B (en) 2019-04-22 2019-04-22 Data stream clustering method and device based on density peak value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910324141.1A CN110163255B (en) 2019-04-22 2019-04-22 Data stream clustering method and device based on density peak value

Publications (2)

Publication Number Publication Date
CN110163255A true CN110163255A (en) 2019-08-23
CN110163255B CN110163255B (en) 2021-11-16

Family

ID=67639909

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910324141.1A Expired - Fee Related CN110163255B (en) 2019-04-22 2019-04-22 Data stream clustering method and device based on density peak value

Country Status (1)

Country Link
CN (1) CN110163255B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110488259A (en) * 2019-08-30 2019-11-22 成都纳雷科技有限公司 A kind of classification of radar targets method and device based on GDBSCAN
CN114861729A (en) * 2022-05-20 2022-08-05 西安邮电大学 Method and device for detecting time sequence abnormity in wireless sensor network
CN116227538A (en) * 2023-04-26 2023-06-06 国网山西省电力公司晋城供电公司 Clustering and deep learning-based low-current ground fault line selection method and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139035A (en) * 2015-08-31 2015-12-09 浙江工业大学 Mixed attribute data flow clustering method for automatically determining clustering center based on density
CN105868266A (en) * 2016-01-27 2016-08-17 电子科技大学 Clustering model based high-dimensional data stream outlier detection method
CN109409400A (en) * 2018-08-28 2019-03-01 西安电子科技大学 Merge density peaks clustering method, image segmentation system based on k nearest neighbor and multiclass

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
RUI ZHANG 等: "A Principal Component Analysis Algorithm Based on Dimension Reduction Window", 《IEEE》 *
谢娟英 等: "K近邻优化的密度峰值快速搜索聚类算法", 《中国科学:信息科学》 *

Also Published As

Publication number Publication date
CN110163255B (en) 2021-11-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20211116