CN110163255A - Data stream clustering method and device based on density peaks
- Publication number: CN110163255A (application CN201910324141.1A)
- Authority: CN (China)
- Prior art keywords
- data
- cluster
- clustered
- density
- data flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Abstract
The present disclosure provides a data stream clustering method and device based on density peaks. Building on the density peaks algorithm and fuzzy clustering, it introduces the concept of the suspected outlier for the first time and takes a sampling window model with adaptive width and a spatio-temporal decay mechanism as its main innovations. With improving the efficiency of data stream clustering as its main goal, the method and device achieve more efficient data stream clustering while maintaining considerable clustering accuracy.
Description
Technical field
The present disclosure belongs to the technical field of data stream clustering and relates to a data stream clustering method and device based on density peaks.
Background art
The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.
The world today is in the midst of a fourth technological revolution led by advanced technologies such as artificial intelligence, machine learning, big data analysis, and virtual reality. The arrival of the intelligent era is unstoppable, and every industry is actively joining this wave in an effort to improve production efficiency and competitiveness.
Data is the raw material of the intelligent era, and massive high-dimensional data contains rich information and knowledge. With the rapid development of personal terminal and network technology, information exchange has become increasingly frequent and traffic has risen sharply. Data flows through networks continuously, and this rapidly propagating data forms a new kind of data: the data stream. For most enterprises and organizations it is impractical to intercept all network data, save it to storage media, and then analyze it as a whole. First, the hardware requirements are high; second, network data is time-sensitive, so the results and knowledge obtained by analyzing it after everything has been stored may already be out of date.
Another characteristic of data streams is that they are unlabeled, which makes them an object of unsupervised learning. Cluster analysis is an important part of unsupervised learning, but traditional global clustering algorithms are no longer applicable to data streams. An efficient stream-oriented clustering algorithm is needed to analyze the data in real time and report the results.
The classic data stream clustering algorithm is CluStream, which is derived from the K-means algorithm and is also the starting point of data stream clustering algorithms. An improved version, HPStream, later made it more robust on high-dimensional data streams. Because their core is still based on K-means, both can only find spherical clusters, and this weakness is exposed on non-spherical clusters. The density-based data stream clustering algorithm DenStream was then proposed, and the D-Stream algorithm, based on a grid model of the data stream, is another well-known method. In addition, because high-dimensional data in a stream is not uniform in form, mixed-type data is inevitable and traditional clustering algorithms cannot handle it effectively. HCLuStream, a data stream clustering algorithm for mixed attributes, was therefore proposed so that clustering could be applied in real data stream environments.
However, the inventors found during research and development that, although these algorithms have each contributed to data stream clustering and refined dynamic clustering methods so that they increasingly meet practical requirements, they share one problem: their main concern is to propose improvements for particular data types and data conditions, that is, they focus on clustering accuracy. For data streams, clustering efficiency is also extremely important. Making an algorithm adjust itself adaptively so that it processes data efficiently while losing as little information as possible is a significant research topic.
Summary of the invention
To address the deficiencies of the prior art, one or more embodiments of the present disclosure provide a data stream clustering method and device based on density peaks. Fuzzy clustering guarantees basic clustering efficiency, and the density peaks algorithm guarantees basic clustering accuracy. On this basis, the concept of the suspected outlier is proposed to improve the accuracy of the clustering method, and a spatio-temporal decay mechanism and an adaptive sampling data window model are introduced to ensure its efficiency. The disclosure can be applied effectively to data analysis in enterprises and institutions, especially in environments with demanding timeliness and large data volumes, and yields clustering results in real time, efficiently, and intuitively.
According to one aspect of one or more embodiments of the present disclosure, a data stream clustering method based on density peaks is provided.
A data stream clustering method based on density peaks, the method comprising:
Receiving the first batch of data stream data to be clustered, and initializing parameters and data structures;
Receiving a new batch of data stream data to be clustered as new data and pre-clustering the new data, while decaying the old data stream data;
Merging the new batch of data stream data to be clustered with the old data stream data and their data structures, and clustering the merged data, the merged data becoming the old data;
Screening and updating the suspected outliers in the old data, a suspected outlier being an object point whose weight, calculated from the old data, is greater than a threshold;
Determining the width of the data sampling window for the next iteration according to the amount of old data that was decayed;
Adding a cluster according to the maximum density center of the suspected outliers obtained with the density peaks algorithm, or merging clusters according to the spatial positions of the existing cluster centers;
Returning to the step of receiving a new batch of data stream data to be clustered to continue the data stream clustering iteration, and ending the data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
Further, in the method, the initialized parameters include the number of clusters, the cluster center matrix, the first batch of data stream data to be clustered, the cluster result sequence, the lifetime sequence, the decay weight sequence, and the suspected outlier screening weight sequence.
Further, in the method, initializing the parameters also includes performing a fuzzy clustering calculation on the first batch of data stream data to be clustered to obtain the initial membership matrix and the adjusted cluster center matrix.
Further, in the method, initializing the data structures means adjusting the value of each data structure according to the first batch of data stream data to be clustered.
Further, in the method, pre-clustering the new data means performing fuzzy clustering on the new batch of data stream data to be clustered, based on the initial value of the cluster center matrix in the initialization parameters or on the cluster center matrix from the previous iteration, to obtain its membership matrix.
Further, in the method, decaying the old data stream data means decaying the old data according to the decay weight: old data points whose weight falls below a preset decay threshold are decayed away, and each data structure of the old data and its values are adjusted. The decay weight is calculated from a spatial factor and a temporal factor.
Further, in the method, the specific steps of adding a cluster include:
Obtaining the density center with the maximum density among the suspected outliers using the density peaks algorithm;
When the density of that density center is greater than a preset density threshold, adding a new cluster with the density center as its cluster center.
The specific steps of merging clusters include:
Judging, according to the spatial positions of the existing cluster centers, whether the distance between any two cluster centers is less than a preset cluster center distance threshold;
When the distance between two cluster centers is less than the preset cluster center distance threshold, merging the two clusters.
Further, in the method, the specific steps of judging whether the sampling window has reached the tail of the data stream to be clustered include:
Controlling the data volume of the next batch of data stream data to be clustered according to the width of the data sampling window in the next iteration;
Judging whether the data volume of the remaining data stream to be clustered is less than the data volume of the next batch; if so, assigning the remaining data volume to the data volume of the next batch, performing the final iteration, and ending the iteration; otherwise continuing.
According to one aspect of one or more embodiments of the present disclosure, a computer-readable storage medium is provided.
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device and to execute the data stream clustering method based on density peaks.
According to one aspect of one or more embodiments of the present disclosure, a terminal device is provided.
A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement the instructions, and the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor and to execute the data stream clustering method based on density peaks.
According to one aspect of one or more embodiments of the present disclosure, a data stream clustering device based on density peaks is provided.
A data stream clustering device based on density peaks, built on the data stream clustering method based on density peaks described above, comprising:
An initialization module, configured to receive the first batch of data stream data to be clustered and to initialize parameters and data structures;
A pre-clustering and decay parallel module, configured to receive a new batch of data stream data to be clustered as new data and pre-cluster the new data, while decaying the old data stream data;
A new and old data merging module, configured to merge the new batch of data stream data to be clustered with the old data stream data and their data structures, and to cluster the merged data, the merged data becoming the old data;
A suspected outlier screening module, configured to screen and update the suspected outliers in the old data, a suspected outlier being an object point whose weight, calculated from the old data, is greater than a threshold;
A next-batch data volume determination module, configured to determine the width of the data sampling window for the next iteration according to the amount of old data that was decayed;
A cluster adding and removing module, configured to add a cluster according to the maximum density center of the suspected outliers obtained with the density peaks algorithm, or to merge clusters according to the spatial positions of the existing cluster centers;
A data stream clustering termination detection module, configured to return to the step of receiving a new batch of data stream data to be clustered to continue the data stream clustering iteration, and to end the data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
Beneficial effects of the present disclosure:
The data stream clustering method and device based on density peaks provided by the disclosure use fuzzy clustering to guarantee basic clustering efficiency and the density peaks algorithm to guarantee basic clustering accuracy. On this basis, the concept of the suspected outlier is proposed to improve the accuracy of the algorithm, and a spatio-temporal decay mechanism and an adaptive sampling data window model are introduced to ensure its efficiency. The disclosure can be applied effectively to data analysis in enterprises and institutions, especially in environments with demanding timeliness and large data volumes, and yields clustering results in real time, efficiently, and intuitively.
Brief description of the drawings
The accompanying drawings, which form a part of the present disclosure, are provided for further understanding of the disclosure. The illustrative embodiments of the disclosure and their description are used to explain the disclosure and do not unduly limit it.
Fig. 1 is a flowchart of a data stream clustering method based on density peaks according to one or more embodiments;
Fig. 2 is a detailed flowchart of a data stream clustering method based on density peaks according to one or more embodiments;
Fig. 3 is a schematic diagram of suspected outliers according to one or more embodiments;
Fig. 4 is a schematic diagram of suspected outliers during dynamic clustering according to one or more embodiments;
Fig. 5 is a schematic diagram of real outliers according to one or more embodiments.
Detailed description:
The technical solutions in one or more embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the disclosure, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of one or more embodiments of the present disclosure without creative effort fall within the scope of protection of the disclosure.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the disclosure. Unless otherwise indicated, all technical and scientific terms used in the embodiments have the same meaning as commonly understood by those of ordinary skill in the art to which the disclosure belongs.
It should be noted that the terms used here serve only to describe specific embodiments and are not intended to limit the exemplary embodiments of the disclosure. As used here, unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. It should also be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of methods and systems according to various embodiments of the present disclosure. Each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which may include one or more executable instructions for implementing the logical function specified in each embodiment. In some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. Each box in a flowchart and/or block diagram, and any combination of boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other. The disclosure is described further below with reference to the drawings and embodiments.
Embodiment one
To achieve fully online data stream clustering and improve the efficiency of the algorithm, a data stream clustering method based on density peaks is provided according to one aspect of one or more embodiments of the present disclosure.
As shown in Fig. 1 and Fig. 2, a data stream clustering method based on density peaks comprises:
Step S1: receiving the first batch of data stream data to be clustered, and initializing parameters and data structures;
Step S2: receiving a new batch of data stream data to be clustered as new data and pre-clustering the new data, while decaying the old data stream data;
Step S3: merging the new batch of data stream data to be clustered with the old data stream data and their data structures, and clustering the merged data, the merged data becoming the old data;
Step S4: screening and updating the suspected outliers in the old data, a suspected outlier being an object point whose weight, calculated from the old data, is greater than a threshold;
Step S5: determining the width of the data sampling window for the next iteration according to the amount of old data that was decayed;
Step S6: adding a cluster according to the maximum density center of the suspected outliers obtained with the density peaks algorithm, or merging clusters according to the spatial positions of the existing cluster centers;
Step S7: returning to step S2, which receives a new batch of data stream data to be clustered, to continue the data stream clustering iteration, and ending the data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
One, the preparation stage:
In step S1 of this embodiment, the method is initialized: the parameters and data structures are assigned and constructed.
Step S101: the initialized parameters include the number of clusters, the cluster center matrix, the first batch of data stream data to be clustered, the cluster result sequence, the lifetime sequence, the decay weight sequence, and the suspected outlier screening weight sequence;
Step S102: fuzzy clustering is performed on the first batch of data X to be processed to obtain the initial membership matrix Mem_xv and the adjusted cluster center matrix V, and the values of the other data structures are adjusted accordingly.
In this embodiment, before the core loop body of the system runs, some necessary data structures and parameters must be initialized. First, the initial number of cluster centers k is assigned. Because the algorithm adjusts the number of clusters adaptively, k can be randomly initialized to an appropriate value. Then the first batch of data X = [x_1, x_2, ..., x_n] is loaded into memory, where x_i is an m-dimensional attribute vector and 1 ≤ i ≤ n. The supporting data structures are constructed according to the size of the first batch: the cluster result sequence Class, the lifetime sequence Time, the decay weight sequence D_weight, the suspected outlier screening weight sequence O_weight, and so on. The elements of Time and O_weight are initialized to 0, and the elements of D_weight are all set to 1. If the data volume is L, each data structure is a one-dimensional vector of length L. Finally, k objects are selected at random from X as the initial cluster center matrix V, and X is processed with the fuzzy clustering algorithm to obtain the initial membership matrix Mem_xv. This completes the preparation stage.
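The preparation stage can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the implementation of the disclosure: the function name `init_structures`, the dictionary layout, and the `seed` parameter are hypothetical, and the initial value of Class is not given in the text, so -1 (unassigned) is assumed. The names Class, Time, D_weight, O_weight, and V follow the description above.

```python
import random

def init_structures(X, k, seed=0):
    """Build the supporting structures for the first batch X (a list of m-dimensional points)."""
    L = len(X)                     # data volume of the first batch
    rng = random.Random(seed)      # seeded only so that the sketch is reproducible
    return {
        "Class": [-1] * L,         # cluster result sequence; -1 = unassigned (assumption)
        "Time": [0] * L,           # lifetime sequence, initialized to 0
        "D_weight": [1.0] * L,     # decay weight sequence, initialized to 1
        "O_weight": [0.0] * L,     # suspected outlier screening weights, initialized to 0
        # k objects chosen at random from X as the initial cluster center matrix V
        "V": [list(x) for x in rng.sample(X, k)],
    }

X = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [9.0, 0.5]]
state = init_structures(X, k=2)
```

Keeping all sequences the same length as the data makes it straightforward to drop or append entries in lockstep when points are decayed away or new batches are merged in.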
Two, core loop body:
In step S2 of this embodiment, the specific steps include:
Step S201: for the newly arrived batch of data, fuzzy clustering is performed on the basis of the cluster center matrix V obtained in the previous step to obtain its membership matrix Mem_xv_n;
Step S202: the old data is decayed according to the decay weight; old data points whose weight is below the threshold are decayed away, and each data structure of the old data and its values are adjusted;
Steps S201 and S202 are processed in parallel.
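The pre-clustering of step S201 can be illustrated with the membership update of standard fuzzy c-means. The text only says "fuzzy clustering", so the choice of fuzzy c-means, the fuzzifier m = 2, and the function name `memberships` are assumptions made for this sketch.

```python
import math

def memberships(X, V, m=2.0):
    """Membership matrix of points X with respect to fixed centers V (fuzzy c-means rule)."""
    power = 2.0 / (m - 1.0)
    U = []
    for x in X:
        d = [math.dist(x, v) for v in V]
        zeros = [i for i, di in enumerate(d) if di == 0.0]
        if zeros:
            # A point lying exactly on a center belongs entirely to that center.
            row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(V))]
        else:
            row = [1.0 / sum((di / dj) ** power for dj in d) for di in d]
        U.append(row)
    return U

V = [[0.0, 0.0], [4.0, 0.0]]
Mem_xv_n = memberships([[1.0, 0.0], [2.0, 0.0], [4.0, 0.0]], V)
```

Each row sums to 1. The point midway between the two centers receives equal memberships, which is the kind of ambiguity that the later suspected outlier screening looks for.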
In step S3 of this embodiment, the initial membership matrix Mem_xv, the membership matrix Mem_xv_n of the newly arrived batch, the new and old data, and their data structures are merged. Fuzzy clustering then continues on the basis of the cluster center matrix V to obtain a new initial membership matrix Mem_xv and cluster center matrix V. The merged dataset, dataset, now becomes the old data;
In step S4 of this embodiment, the suspected outlier weights of the old data are updated, and suspected outliers are filtered using the screening weight as the criterion;
In step S5 of this embodiment, the width of the data sampling window for the next iteration is determined according to the amount of old data decayed in the current iteration, which controls the quantity in_num of the next batch of data;
In step S6 of this embodiment, the specific steps include:
Step S601: judging whether any suspected outlier has a density greater than the threshold o_threshold; if so, the point with the maximum density is taken as the cluster center of a new cluster, and the members of the cluster are obtained in the next iteration;
Step S602: judging whether the distance between any two existing cluster centers is less than the threshold d_threshold; if so, the two clusters are merged.
In this embodiment, the core loop body is the core of the system. It mainly implements dynamic clustering, data merging, the screening of suspected outliers, the adaptive adjustment of the sampling data window width, the decay of old data, and the addition and merging of clusters. First, a newly arrived batch of in_num data points is partitioned by fuzzy clustering with V as the initial cluster center matrix, which yields its membership matrix Mem_xv_n. At the same time, the decay weights of the old data are updated and the data decay is carried out. The old data remaining after decay is merged with the new data and the corresponding data structures, and fuzzy clustering continues, yielding the new merged membership matrix Mem_xv and the updated old data dataset. The decay weight of the old data takes both spatial and temporal factors into account and is expressed by formula (1).
where the parameters w and z are adjustment parameters that control the degree of temporal and spatial influence respectively; they are generally set to w = 2 and z = 1. Dc(t, i) denotes the distance, in the t-th iteration, between the i-th data point and the cluster center for which its membership is largest. The decay weight of every newly arrived data point is initialized to 1. The decay threshold λ is 0.2: when D_weight(t, i) < λ, the corresponding data point is decayed away, that is, removed from memory and excluded from further calculation.
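The pruning rule can be sketched as below. Formula (1) is not reproduced in this text, so the sketch takes the decay function as an argument; `placeholder_decay` is a purely hypothetical stand-in that only mimics the stated dependence on time (via w) and distance (via z) and must not be read as formula (1). The names `decay_and_prune` and `count_d` are likewise assumptions; only the threshold λ = 0.2 and the defaults w = 2, z = 1 come from the description.

```python
LAMBDA = 0.2  # decay threshold λ from the description

def decay_and_prune(points, time, dist_to_center, decay_fn, lam=LAMBDA):
    """Recompute decay weights and drop every point whose weight falls below lam."""
    kept_points, kept_time, kept_weight = [], [], []
    dropped = 0
    for x, age, dc in zip(points, time, dist_to_center):
        weight = decay_fn(age, dc)
        if weight < lam:
            dropped += 1           # decayed away: removed from memory
            continue
        kept_points.append(x)
        kept_time.append(age)
        kept_weight.append(weight)
    return kept_points, kept_time, kept_weight, dropped

def placeholder_decay(age, dc, w=2.0, z=1.0):
    # Hypothetical stand-in, NOT formula (1): older and more distant points weigh less.
    return 1.0 / (1.0 + w * age + z * dc)

pts = [[0.0], [1.0], [2.0]]
kept, ages, weights, count_d = decay_and_prune(
    pts, [0, 1, 5], [0.0, 0.5, 1.0], placeholder_decay)
```

The number of dropped points corresponds to the quantity that the window-width calculation later refers to as the number of old data points decayed in the current iteration.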
The subsequent screening of suspected outliers is likewise built on the assignment of a suspected outlier screening weight. The suspected outlier is a concept proposed here for the first time. It has the following characteristics: 1. it lies between two clusters or in their overlap; 2. it is far from every cluster; 3. its memberships in several clusters are similar; 4. during dynamic clustering it is temporarily far from the cluster it belongs to. As shown in Fig. 3, the objects in the overlap between the triangle cluster and the diamond cluster fall within the scope of suspected outliers. These characteristics show that the scope covers outliers: real outliers are necessarily hidden among the suspected outliers, so the suspected outliers must be screened first. The screening weight is given by formula (2).
where 1 ≤ i, j ≤ L_t, and L_t denotes the amount of data in memory at the t-th iteration. In the membership matrix, each data point has one membership value for each cluster. Taking the differences between these membership values yields a series of differences, which form a sequence of length sumd = 1 + 2 + ... + (k − 1). The smaller these differences are, the more likely the point is a suspected outlier. The distance term in the formula is the distance from data object x_i to the cluster center for which its membership is largest. The smaller the value of the last fraction in the formula, the more likely the point is a suspected outlier. ξ is a weight parameter taking values ξ ∈ {0.001, 0.01, 0.1, 1}. The screening threshold, which serves as the criterion for filtering suspected outliers, is set as shown in formula (3):
When the screening weight of object x_i meets the condition defined by this threshold, x_i is assigned to the suspected outliers.
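The membership-difference part of the screening can be illustrated in isolation. The sketch below does not reproduce formula (2) or formula (3), which are not available in this text; it only shows that the pairwise differences of one point's k memberships form a sequence of length 1 + 2 + ... + (k − 1), and that an ambiguous point has small differences. Using absolute differences and the name `membership_differences` are assumptions.

```python
from itertools import combinations

def membership_differences(row):
    """Absolute differences between every pair of memberships of one data point."""
    return [abs(a - b) for a, b in combinations(row, 2)]

k = 4
ambiguous = [0.26, 0.25, 0.25, 0.24]   # similar memberships: suspected outlier candidate
clear = [0.91, 0.03, 0.03, 0.03]       # dominated by a single cluster

diffs_ambiguous = membership_differences(ambiguous)
diffs_clear = membership_differences(clear)
sumd = k * (k - 1) // 2                # closed form of 1 + 2 + ... + (k - 1)
```

For k = 4 there are six differences per point, and their sum is far smaller for the ambiguous point than for the clearly assigned one.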
Fig. 4 shows suspected outliers, which may also be called pseudo-outliers, appearing during dynamic clustering. The left side of Fig. 4 is the distribution of the global dataset; cluster three is singled out to make it easier to explain what a suspected outlier is. When a static global dataset is processed as a stream, only part of the data is in memory at any time. The situation in the lower-right subgraph then arises: individual points of a cluster are temporarily far from the main cluster and are regarded as suspected outliers; calling them pseudo-outliers is more apt in this case.
Fig. 5 shows real outliers. During dynamic clustering, the characteristics of real outliers are a subset of those of suspected outliers, so real outliers are temporarily classified as suspected outliers. They are finally filtered out using the decision graph obtained by the density peaks algorithm, shown on the right of the figure, which improves the accuracy of the algorithm. Because the density peaks algorithm has a high space complexity, it is applied only to the small number of suspected outliers, so it does not affect the efficiency of the algorithm.
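The two quantities plotted in a density peaks decision graph, the local density ρ and the distance δ to the nearest point of higher density, can be computed as in the following sketch. The Gaussian kernel, the cutoff distance `dc`, and the function name `density_peaks` are assumptions for illustration; the description does not specify which kernel or parameters are used.

```python
import math

def density_peaks(points, dc):
    """Return (rho, delta) for each point: local density and distance to a denser point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Gaussian-kernel local density, excluding the point itself.
    rho = [sum(math.exp(-(dist[i][j] / dc) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # The densest point has no denser neighbor; it takes its largest distance instead.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

suspected = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [-0.2, 0.0], [0.0, -0.2], [3.0, 3.0]]
rho, delta = density_peaks(suspected, dc=0.3)
center = max(range(len(rho)), key=rho.__getitem__)
```

In this toy set the first point has the highest density and is the natural density center, while the isolated last point combines a very low ρ with a large δ, which is the signature of a real outlier in the decision graph.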
The number of new data points that enter the system in the next iteration, that is, the adaptively adjusted width of the sampling data window model, is calculated by formulas (4) and (5):
Dif(t) = count_d(t) − in_num(t)    (4)
where count_d(t) denotes the number of old data points decayed away in the current iteration. Formula (5) uses a Matlab rounding function. Note that floor() rounds down, so floor(1.43) = 1; rounding up, which gives 2 for 1.43, is performed by ceil(). The parameter e is a tuning parameter with a value greater than 1. The size of the next batch of new data is in_num(t) = Width(t+1).
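The window adjustment can be sketched as follows. Only formula (4) is available in this text; the update of the width is therefore a hypothetical stand-in for formula (5), chosen so that the window moves a fraction 1/e of the way toward the number of decayed points. The function name `next_window_width` and the `min_width` guard are also assumptions.

```python
import math

def next_window_width(in_num, count_d, e=2.0, min_width=1):
    """Return (Dif, width) for the next iteration."""
    dif = count_d - in_num                   # formula (4): Dif(t) = count_d(t) - in_num(t)
    # Hypothetical stand-in for formula (5): widen when more points decayed than arrived.
    width = in_num + math.floor(dif / e)
    return dif, max(min_width, width)

dif_a, width_a = next_window_width(in_num=100, count_d=130)   # more decayed than arrived
dif_b, width_b = next_window_width(in_num=100, count_d=63)    # fewer decayed than arrived

# Rounding behavior for reference: floor rounds down, ceil rounds up.
rounded_down = math.floor(1.43)
rounded_up = math.ceil(1.43)
```

With e = 2 the first call widens the window from 100 to 115 and the second narrows it to 81, so the amount of data in memory stays roughly balanced against the amount that decays.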
The last important part is the addition and merging of clusters. As the number of suspected outliers grows, the density center with the maximum density is obtained by the density peaks algorithm. When the density of this density center exceeds a threshold, the center is taken as the cluster center of a new cluster, and the members of the new cluster are assigned in the next iteration. The threshold is set as shown in formula (6):
where density_o(t) denotes the density sequence of the suspected outliers in the t-th iteration; it stores the local density value of each suspected outlier, and its length equals the number of suspected outliers. τ is a scaling parameter used to adjust the threshold, with a value range of [0.9, 1.1]. density(v_i, t) denotes the local density of cluster center v_i in the t-th iteration. When max(density_o(t)) ≥ d_threshold(t), a new cluster is added and k = k + 1. Real outliers remain within the suspected outliers throughout the iterations. As the data iterates, they are eventually decayed away, which avoids the feature drift they would otherwise cause. In addition, new or old clusters may move closer to each other in subsequent iterations. When their centers are close enough, they are merged into one new cluster. The distances d_v(t, v_i, v_j) between cluster centers are calculated, and DC = 0.2 * max(d_v(t)) is the distance threshold. When the distance between two cluster centers is less than DC, the corresponding two clusters are merged and k = k − 1.
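Adding and merging clusters can be sketched as below. Formula (6) is not reproduced in this text, so the density threshold is passed in as a plain argument. The description does not say how the center of a merged cluster is computed; the midpoint of the two centers is assumed here. The function names are hypothetical, and one call merges at most one pair, matching k = k − 1.

```python
import math
from itertools import combinations

def maybe_add_cluster(V, suspected, density_o, density_threshold):
    """Promote the densest suspected outlier to a new cluster center if it is dense enough."""
    if density_o and max(density_o) >= density_threshold:
        best = density_o.index(max(density_o))
        return V + [list(suspected[best])]        # k = k + 1
    return V

def merge_close_centers(V):
    """Merge the closest pair of centers if their distance is below DC = 0.2 * max distance."""
    if len(V) < 2:
        return V
    pairs = {(i, j): math.dist(V[i], V[j]) for i, j in combinations(range(len(V)), 2)}
    DC = 0.2 * max(pairs.values())
    (i, j), d = min(pairs.items(), key=lambda item: item[1])
    if d >= DC:
        return V
    merged = [(a + b) / 2.0 for a, b in zip(V[i], V[j])]   # midpoint: an assumption
    return [v for idx, v in enumerate(V) if idx not in (i, j)] + [merged]   # k = k - 1

V = [[0.0, 0.0], [0.5, 0.0], [10.0, 0.0], [10.0, 8.0]]
V_merged = merge_close_centers(V)
V_grown = maybe_add_cluster(V, [[5.0, 5.0], [6.0, 6.0]], [1.0, 3.0], density_threshold=2.5)
```

Because DC is relative to the largest center distance, two clusters are never merged when only two distinct centers exist, since their distance is then the maximum itself.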
III. Algorithm termination:
In step S7 of this embodiment, it is judged whether the amount of remaining data is less than the size in_num of the next batch. If so, the amount of remaining data is assigned to in_num, the last iteration is carried out, and the loop body ends; otherwise the loop continues.
In this embodiment, after the core algorithm body completes the current iteration, it must judge whether the data stream has ended in order to decide whether to exit the loop and end the program. When the amount of remaining data is greater than the calculated in_num, the next iteration proceeds normally; when it is less than in_num, in_num is set to the amount of remaining data before the next iteration begins. Once the normal iterations are complete, the density peaks algorithm is applied one final time to partition the remaining suspected outliers: the real outliers are screened out and the other points are assigned by fuzzy clustering. This completes the algorithm's processing of the data stream.
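The termination check can be sketched as a batch generator in Python. This is a minimal sketch under stated assumptions: the stream is a finite list, next_in_num is a hypothetical callback standing in for the window-width calculation, and a batch size of at least 1 is enforced so that the loop always advances. The final density peaks pass over the remaining suspected outliers is not shown.

```python
def iterate_batches(stream, first_batch, next_in_num):
    """Yield successive batches until the data stream is exhausted.

    stream: a list holding the whole (finite) data stream.
    first_batch: size of the initial batch.
    next_in_num: hypothetical callback returning in_num for the next
        iteration (the source derives it from the sampling-window width).
    """
    pos = min(first_batch, len(stream))
    yield stream[:pos]
    while pos < len(stream):
        # Assumption: clamp to at least 1 so the loop cannot stall.
        in_num = max(1, next_in_num())
        remaining = len(stream) - pos
        if remaining < in_num:
            # Fewer points left than requested: the last batch takes
            # exactly the remaining data, then the loop ends.
            in_num = remaining
        yield stream[pos:pos + in_num]
        pos += in_num
```

For a stream of ten points, a first batch of three and a constant in_num of four, the batches have sizes 3, 4 and 3: the last batch is shrunk to the three points that remain.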
Embodiment two
According to one aspect of one or more embodiments of the present disclosure, a computer-readable storage medium is provided.
A computer-readable storage medium stores a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to execute the data stream clustering method based on density peaks.
Embodiment three
According to one aspect of one or more embodiments of the present disclosure, a terminal device is provided.
A terminal device comprises a processor and a computer-readable storage medium. The processor is configured to implement the instructions; the computer-readable storage medium is configured to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the data stream clustering method based on density peaks.
When run in a device, these computer-executable instructions cause the device to perform the methods or processes described in the embodiments of the present disclosure.
In this embodiment, a computer program product may include a computer-readable storage medium carrying computer-readable program instructions for carrying out various aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored thereon, and any suitable combination of the above. A computer-readable storage medium as used herein is not to be interpreted as a transient signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that computing/processing device.
The computer program instructions for carrying out the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA) or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
Embodiment four
According to one aspect of one or more embodiments of the present disclosure, a data stream clustering device based on density peaks is provided.
A data stream clustering device based on density peaks, based on the data stream clustering method based on density peaks, comprising:
A prediction target acquisition module, configured to receive the prediction targets of the different energy substitution schemes of an energy substitution conversion enterprise and to obtain the frequent item sets of each type of target;
A prediction model establishment module, configured to receive the data before and after energy substitution for each type of enterprise from the electric power enterprise database and to construct a Gaussian regression combination prediction model;
A prediction target value calculation module, configured to perform cluster-wise prediction on the frequent item sets of each type of target of the different energy substitution schemes according to the Gaussian regression combination prediction model, obtain the prediction target values of the different energy substitution schemes, and linearly combine the prediction target values to obtain the annual cost value prediction target value;
An electric energy substitution scheme prediction module, configured to, based on the principle of equal annual cost values, perform data calculation on the substitutable energy schemes according to the annual cost value prediction target value, obtain the boundary electricity price of the electric energy substitution scheme, and calculate an uncertainty estimate of the electric energy substitution scheme to obtain the electric energy substitution scheme prediction result;
A parameter adjustment module, configured to feed data back based on the prediction result, compare it with the actual data received by the power system related business application platform, adjust the parameters of the Gaussian regression combination prediction model, and use the parameter-adjusted Gaussian regression combination prediction model to predict electric energy substitution schemes.
It should be noted that although several modules or sub-modules of the device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in one module. Conversely, the features and functions of one module described above may be further divided so as to be embodied by multiple modules.
Beneficial effects of the present disclosure:
The data stream clustering method and device based on density peaks provided by the present disclosure solve the problem that traditional statistical methods are time-consuming and laborious in predicting electric energy substitution schemes, and mitigate the large prediction errors of traditional statistical methods. In addition, based on the principle of equal annual cost values, the present disclosure obtains the boundary electricity price of the electric energy substitution scheme and calculates an uncertainty estimate of the scheme, providing effective support for predicting electric energy substitution schemes. Finally, the present disclosure provides practical support for responding to the national call and protecting the ecological environment.
The above are merely preferred embodiments of the present disclosure and are not intended to limit it; for those skilled in the art, the present disclosure may have various modifications and variations. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present disclosure shall be included within its scope of protection. Therefore, the present disclosure is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A data stream clustering method based on density peaks, characterized in that the method comprises:
Receiving the first batch of data stream data to be clustered, and initializing parameters and data structures;
Receiving a new batch of data stream data to be clustered as new data, pre-clustering the new data, and at the same time decaying the old data stream data;
Merging the new batch of data stream data to be clustered with the old data stream data and its data structures, and clustering the merged data, the merged data becoming the old data;
Screening and updating the suspected outliers in the old data, a suspected outlier being a data point whose weight, calculated from the old data, is greater than a threshold;
Determining the width of the data sampling window for the next iteration according to the amount of decay of the old data;
Adding a cluster according to the maximum-density center of the suspected outliers obtained with the density peaks algorithm, or merging clusters according to the spatial positions of the existing cluster centers;
Returning to the step of receiving a new batch of data stream data to be clustered and continuing the data stream clustering iteration, and ending the data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
2. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, the initialized parameters include the number of clusters, the cluster center matrix, the first batch of data stream data to be clustered, the clustering result sequence, the survival time sequence, the decay weight sequence, and the suspected outlier screening weight sequence.
3. The data stream clustering method based on density peaks according to claim 2, characterized in that, in the method, initializing the parameters further includes performing a fuzzy clustering calculation on the first batch of data stream data to be clustered to obtain the initial membership matrix and the adjusted cluster center matrix.
Further, in the method, initializing the data structures means adjusting the value of each data structure according to the first batch of data stream data to be clustered.
4. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, pre-clustering the new data means performing fuzzy clustering on the new batch of data stream data to be clustered on the basis of the cluster center matrix, taken either from its initial value among the initialized parameters or from the previous iteration, to obtain its membership matrix.
5. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, decaying the old data stream data means decaying the old data according to a decay weight, discarding old data points whose weight falls below a preset discard threshold, and adjusting each data structure of the old data and its values; the decay weight is calculated from a spatial factor and a time factor.
6. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, the specific steps of adding a cluster include:
Obtaining the density center with the highest density among the suspected outliers using the density peaks algorithm;
When the density of that density center is greater than a preset density threshold, adding a new cluster with the density center as its cluster center;
The specific steps of merging clusters include:
Judging, according to the spatial positions of the existing cluster centers, whether the distance between any two cluster centers is less than a preset cluster center distance threshold;
When the distance between two cluster centers is less than the preset cluster center distance threshold, merging the two clusters.
7. The data stream clustering method based on density peaks according to claim 1, characterized in that, in the method, the specific steps of judging whether the sampling window has reached the tail of the data stream to be clustered include:
Controlling the amount of data in the next batch of data stream data to be clustered according to the width of the data sampling window for the next iteration;
Judging whether the amount of data remaining in the data stream to be clustered is less than the amount of data in the next batch; if so, assigning the amount of remaining data to the amount of data in the next batch, carrying out the last iteration, and ending the iteration; otherwise, continuing.
8. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor of a terminal device to execute the data stream clustering method based on density peaks according to any one of claims 1-7.
9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement the instructions and the computer-readable storage medium being configured to store a plurality of instructions, characterized in that the instructions are adapted to be loaded by the processor to execute the data stream clustering method based on density peaks according to any one of claims 1-7.
10. A data stream clustering device based on density peaks, characterized in that it is based on the data stream clustering method based on density peaks according to any one of claims 1-7 and comprises:
An initialization module, configured to receive the first batch of data stream data to be clustered and to initialize parameters and data structures;
A pre-clustering and decay parallel module, configured to receive a new batch of data stream data to be clustered as new data, pre-cluster the new data, and at the same time decay the old data stream data;
A new and old data merging module, configured to merge the new batch of data stream data to be clustered with the old data stream data and its data structures, and cluster the merged data, the merged data becoming the old data;
A suspected outlier screening module, configured to screen and update the suspected outliers in the old data, a suspected outlier being a data point whose weight, calculated from the old data, is greater than a threshold;
A next-batch data amount determining module, configured to determine the width of the data sampling window for the next iteration according to the amount of decay of the old data;
A cluster adding and removing module, configured to add a cluster according to the maximum-density center of the suspected outliers obtained with the density peaks algorithm, or merge clusters according to the spatial positions of the existing cluster centers;
A data stream clustering termination detection module, configured to return to the step of receiving a new batch of data stream data to be clustered and continue the data stream clustering iteration, and to end the data stream clustering when the sampling window reaches the tail of the data stream to be clustered.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910324141.1A CN110163255B (en) | 2019-04-22 | 2019-04-22 | Data stream clustering method and device based on density peak value |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910324141.1A CN110163255B (en) | 2019-04-22 | 2019-04-22 | Data stream clustering method and device based on density peak value |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110163255A true CN110163255A (en) | 2019-08-23 |
CN110163255B CN110163255B (en) | 2021-11-16 |
Family
ID=67639909
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910324141.1A Expired - Fee Related CN110163255B (en) | 2019-04-22 | 2019-04-22 | Data stream clustering method and device based on density peak value |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110163255B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110488259A (en) * | 2019-08-30 | 2019-11-22 | 成都纳雷科技有限公司 | A kind of classification of radar targets method and device based on GDBSCAN |
CN114861729A (en) * | 2022-05-20 | 2022-08-05 | 西安邮电大学 | Method and device for detecting time sequence abnormity in wireless sensor network |
CN116227538A (en) * | 2023-04-26 | 2023-06-06 | 国网山西省电力公司晋城供电公司 | Clustering and deep learning-based low-current ground fault line selection method and equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105139035A (en) * | 2015-08-31 | 2015-12-09 | 浙江工业大学 | Mixed attribute data flow clustering method for automatically determining clustering center based on density |
CN105868266A (en) * | 2016-01-27 | 2016-08-17 | 电子科技大学 | Clustering model based high-dimensional data stream outlier detection method |
CN109409400A (en) * | 2018-08-28 | 2019-03-01 | 西安电子科技大学 | Merge density peaks clustering method, image segmentation system based on k nearest neighbor and multiclass |
Non-Patent Citations (2)
Title |
---|
RUI ZHANG 等: "A Principal Component Analysis Algorithm Based on Dimension Reduction Window", 《IEEE》 * |
谢娟英 等: "K近邻优化的密度峰值快速搜索聚类算法", 《中国科学:信息科学》 * |
Also Published As
Publication number | Publication date |
---|---|
CN110163255B (en) | 2021-11-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
De Toro et al. | PSFGA: a parallel genetic algorithm for multiobjective optimization | |
CN103365727B (en) | Host load forecasting method in cloud computing environment | |
CN107220217A (en) | Characteristic coefficient training method and device that logic-based is returned | |
CN110163255A (en) | A kind of data stream clustering method and device based on density peaks | |
CN109933306A (en) | Mix Computational frame generation, data processing method, device and mixing Computational frame | |
CN106527381B (en) | A kind of fast evaluation method towards parallel batch processing machine dynamic dispatching | |
CN104951425A (en) | Cloud service performance adaptive action type selection method based on deep learning | |
CN109840154A (en) | A kind of computation migration method that task based access control relies under mobile cloud environment | |
CN110389824A (en) | Handle method, equipment and the computer program product of calculating task | |
CN110533484A (en) | A kind of product Method for Sales Forecast method based on PCA and improved BP | |
CN117472587B (en) | Resource scheduling system of AI intelligent computation center | |
CN109523178A (en) | A kind of O&M method and device towards power communication scene | |
CN109409746A (en) | A kind of production scheduling method and device | |
El‐Ghandour et al. | Survey of information technology applications in construction | |
Gong et al. | Evolutionary computation in China: A literature survey | |
Peng et al. | Reliability-aware computation offloading for delay-sensitive applications in mec-enabled aerial computing | |
Shahin | Memetic multi-objective particle swarm optimization-based energy-aware virtual network embedding | |
CN114650321A (en) | Task scheduling method for edge computing and edge computing terminal | |
Shi et al. | Analytics for IoT‐enabled human–robot hybrid sortation: an online optimization approach | |
CN107038244A (en) | A kind of data digging method and device, a kind of computer-readable recording medium and storage control | |
Duca et al. | An overview of non-Gaussian state-space models for wind speed data | |
Mamdouh et al. | Airport resource allocation using machine learning techniques | |
CN109857817A (en) | The whole network domain electronic mutual inductor frequent continuous data is screened and data processing method | |
CN108256694A (en) | Based on Fuzzy time sequence forecasting system, the method and device for repeating genetic algorithm | |
CN111046321B (en) | Photovoltaic power station operation and maintenance strategy optimization method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
| CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20211116 |