CN110019543A

CN110019543A - Method and device for time series clustering

Info

Publication number: CN110019543A
Application number: CN201710817446.7A
Authority: CN
Inventors: 刘建伟
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2017-09-12
Filing date: 2017-09-12
Publication date: 2019-07-16

Abstract

The invention discloses a method and device for time series clustering, relating to the field of intelligent information and communication technology. The method includes: an algorithm component receives, from a parameter component, algorithm parameters for clustering time series data; the algorithm component reads time series data from a time series database and clusters the read data according to the received algorithm parameters, obtaining a clustering result that contains a time series feature matrix and time series cluster labels; the algorithm component saves the clustering result in a clustering result database and presents it through a display component.

Description

Method and device for time series clustering

Technical field

The present invention relates to the field of intelligent ICT (Information and Communication Technology), and in particular to a method and device for time series clustering.

Background

IT (Internet Technology) clusters are widely used across industries. For a telecom operator, for example, the core network, the network management center and the data center all rely on IT clusters. An IT cluster is usually very large, with numerous and varied hardware and software configurations. It is also a system with strict requirements for uninterrupted uptime: software errors and hardware faults not only degrade the user experience sharply but also incur substantial maintenance costs. Cluster management and operations and maintenance (O&M) are therefore important and challenging tasks, requiring continuous monitoring of cluster performance data in order to detect potential faults or anomalies.

With the introduction of technologies such as virtualization and SDN (Software Defined Network), traditional IT clusters are migrating to the cloud. Cluster scale keeps growing, upper-layer software applications and service types multiply, and the number of performance indicators to monitor reaches the millions or more. Monitoring with manually set thresholds can therefore no longer meet application demands: labor costs rise while O&M efficiency and accuracy fall. Automated O&M based on machine learning is of great significance for solving this problem and has drawn wide attention in the industry. One key to automated O&M is using data mining methods to detect anomalies in performance data. Because cluster performance data comes in many varieties, and by the no-free-lunch theorem of machine learning, no single anomaly detection algorithm can handle all time series; a suitable anomaly detection algorithm must be selected for time series with different characteristics.

Summary of the invention

The technical problem solved by the scheme provided by the embodiments of the present invention is the automatic classification of performance data collected from IT clusters, which provides a basis for selecting a suitable anomaly detection algorithm for each class of data.

A method for time series clustering provided according to an embodiment of the present invention includes:

An algorithm component receives, from a parameter component, algorithm parameters for clustering time series data;

The algorithm component reads time series data from a time series database and clusters the read time series data according to the received algorithm parameters, obtaining a clustering result that contains a time series feature matrix and time series cluster labels;

The algorithm component saves the clustering result containing the time series feature matrix and the time series cluster labels in a clustering result database, and presents the clustering result through a display component.

A device for time series clustering provided according to an embodiment of the present invention includes:

A receiving module, for receiving algorithm parameters for clustering time series data;

A clustering module, for reading time series data from a time series database and clustering the read time series data according to the received algorithm parameters, obtaining a clustering result that contains a time series feature matrix and time series cluster labels;

A saving and display module, for saving the clustering result containing the time series feature matrix and the time series cluster labels in a clustering result database, and presenting the clustering result through a display component.

An electronic device for time series clustering provided according to an embodiment of the present invention includes a processor and a memory, wherein the memory stores executable program code, and the processor reads the executable program code stored in the memory and runs the corresponding program in order to execute the following steps:

According to the scheme provided by the embodiments of the present invention: (1) The parameter component notifies the algorithm component of the algorithm parameters, which improves the flexibility of the algorithm component. Users can select suitable algorithm parameters for the problem at hand and thereby obtain an algorithm service that solves their own problem. (2) The algorithm component works as a service: besides providing the clustering result to the display component, it can serve multiple application components, such as a front-end display interface or an anomaly detection component (which provides a suitable anomaly detection algorithm for each class of time series), which improves the reusability of the algorithm component. (3) The algorithm component uses hierarchical clustering: time series are first divided into two major classes, periodic and aperiodic, and cluster analysis is then performed within each major class, which reduces the complexity of the cluster analysis.

Brief description of the drawings

Fig. 1 is a flow chart of a method for time series clustering provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of a device for time series clustering provided by an embodiment of the present invention;

Fig. 3 is a processing flow diagram of the time series clustering system provided by an embodiment of the present invention;

Fig. 4 is a flow chart of the clustering performed by the algorithm component provided by an embodiment of the present invention;

Fig. 5 is a flow chart of the periodicity discrimination method provided by an embodiment of the present invention;

Fig. 6 shows the time series plots of each periodic class provided by an embodiment of the present invention;

Fig. 7 shows the time series plots of each aperiodic class provided by an embodiment of the present invention.

Specific embodiments

Preferred embodiments of the present invention are described in detail below with reference to the drawings. It should be understood that the preferred embodiments described below serve only to illustrate and explain the present invention and are not intended to limit it.

Fig. 1 is a flow chart of a method for time series clustering provided by an embodiment of the present invention. As shown in Fig. 1, the method includes:

Step S101: the algorithm component receives, from the parameter component, algorithm parameters for clustering time series data;

Step S102: the algorithm component reads time series data from the time series database and clusters the read time series data according to the received algorithm parameters, obtaining a clustering result that contains a time series feature matrix and time series cluster labels;

Step S103: the algorithm component saves the clustering result containing the time series feature matrix and the time series cluster labels in the clustering result database, and presents the clustering result through the display component.

The algorithm parameters include a preset time series period, a time series feature set and the number of time series clusters. The time series feature set includes one or more of: seasonality index, trend index, skewness, kurtosis, autocorrelation coefficient, relative entropy, sample entropy, self-similarity and Lyapunov coefficient. The number of time series clusters includes the number of periodic time series clusters and the number of aperiodic time series clusters.
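The parameter set described above can be pictured as a small validated record. The following is a minimal sketch only; the class name `AlgorithmParams`, the feature identifiers and the choice to express the preset period in samples are assumptions made for illustration and do not appear in the original text. The names `pt`, `k1` and `k2` follow the symbols used later in the description.

```python
from dataclasses import dataclass
from typing import Tuple

# The nine features named in the text; the identifier spellings are hypothetical.
ALLOWED_FEATURES = {
    "seasonality", "trend", "skewness", "kurtosis", "autocorrelation",
    "relative_entropy", "sample_entropy", "self_similarity", "lyapunov",
}


@dataclass(frozen=True)
class AlgorithmParams:
    pt: int                    # preset time series period (assumed unit: samples)
    features: Tuple[str, ...]  # selected time series feature set
    k1: int                    # number of periodic time series clusters
    k2: int                    # number of aperiodic time series clusters

    def __post_init__(self) -> None:
        unknown = set(self.features) - ALLOWED_FEATURES
        if not self.features or unknown:
            raise ValueError(f"invalid feature set: {sorted(unknown)}")
        if self.pt <= 0 or self.k1 <= 0 or self.k2 <= 0:
            raise ValueError("pt, k1 and k2 must be positive")


# Example message a parameter component might send to the algorithm component.
params = AlgorithmParams(pt=96, features=("seasonality", "trend", "skewness"), k1=4, k2=5)
```

Validating the record at construction time means the algorithm component can reject a malformed message before reading any time series data.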

The clustering of the read time series data by the algorithm component according to the received algorithm parameters includes: the algorithm component extracts the structural feature set of the time series data according to the time series feature set in the algorithm parameters, and determines the periodic time series data and the aperiodic time series data among the time series data according to the preset time series period in the algorithm parameters; the algorithm component then clusters the periodic time series data and the aperiodic time series data separately, according to the extracted structural feature set and the number of time series clusters.

Specifically, the algorithm component determines the periodic and aperiodic time series data according to the preset time series period as follows. The algorithm component calculates the amplitude spectrum maximum, the amplitude spectrum mean and the amplitude spectrum standard deviation of the time series data, and judges whether the difference between the maximum and the mean is greater than a multiple of the standard deviation. If the difference is not greater than the multiple of the standard deviation, the algorithm component classifies the time series data as aperiodic. If the difference is greater, the algorithm component further judges whether the time series period corresponding to the amplitude spectrum maximum equals the preset time series period: if it does, the time series data is classified as periodic; if it does not, the time series data is classified as aperiodic.

Specifically, the separate clustering of the periodic and aperiodic time series data includes: the algorithm component clusters the periodic time series data according to the extracted structural feature set and the number of periodic time series clusters; and the algorithm component clusters the aperiodic time series data according to the extracted structural feature set and the number of aperiodic time series clusters.

Fig. 2 is a schematic diagram of a device for time series clustering provided by an embodiment of the present invention. As shown in Fig. 2, the device includes: a receiving module 201, for receiving algorithm parameters for clustering time series data; a clustering module 202, for reading time series data from a time series database and clustering the read time series data according to the received algorithm parameters, obtaining a clustering result that contains a time series feature matrix and time series cluster labels; and a saving and display module 203, for saving the clustering result containing the time series feature matrix and the time series cluster labels in a clustering result database and presenting the clustering result through a display component.

The clustering module 202 includes: a processing unit, for extracting the structural feature set of the time series data according to the time series feature set in the algorithm parameters, and for determining the periodic time series data and the aperiodic time series data among the time series data according to the preset time series period in the algorithm parameters; and a clustering unit, for clustering the periodic time series data and the aperiodic time series data separately, according to the extracted structural feature set and the number of time series clusters.

Specifically, the processing unit includes: a feature extraction subunit, for extracting the structural feature set of the time series data according to the time series feature set in the algorithm parameters; and a period discrimination subunit, for calculating the amplitude spectrum maximum, the amplitude spectrum mean and the amplitude spectrum standard deviation of the time series data, and judging whether the difference between the maximum and the mean is greater than a multiple of the standard deviation. If the difference is not greater than the multiple of the standard deviation, the subunit classifies the time series data as aperiodic. If the difference is greater, the subunit further judges whether the time series period corresponding to the amplitude spectrum maximum equals the preset time series period: if it does, the time series data is classified as periodic; if it does not, the time series data is classified as aperiodic.

Specifically, the clustering unit includes: a first clustering subunit, for clustering the periodic time series data according to the extracted structural feature set and the number of periodic time series clusters; and a second clustering subunit, for clustering the aperiodic time series data according to the extracted structural feature set and the number of aperiodic time series clusters.

Fig. 3 is a processing flow diagram of the time series clustering system provided by an embodiment of the present invention. As shown in Fig. 3, the system includes a parameter component, an algorithm component, a time series database, a clustering result database and a display component. The flow is as follows:

Step 301: the parameter component notifies the algorithm component of the relevant algorithm parameters by message. The algorithm parameters include the preset time series period pt, the selected time series feature set and the number of time series clusters.

Step 302: the algorithm component reads time series data from the time series database and completes the time series clustering according to the algorithm parameters sent by the parameter component.

Step 303: the algorithm component stores the clustering result in the clustering result database. The clustering result includes the time series feature matrix, the time series cluster labels, etc.

Step 304: the algorithm component sends the clustering result to the display component.

Step 305: the display component presents the result to the user, including the time series plots and the time series labels.
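Steps 303 to 305 amount to persisting one feature vector and one label per time series and reading them back for display. The following is a minimal sketch under stated assumptions: the original does not name a database product or schema, so an in-memory SQLite database, the table name `cluster_result` and the helper names `save_cluster_result` and `load_labels` are all hypothetical.

```python
import json
import sqlite3


def save_cluster_result(conn, series_ids, feature_matrix, labels):
    """Store one row per time series: id, cluster label, feature vector (as JSON)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cluster_result ("
        "series_id TEXT PRIMARY KEY, label TEXT, features TEXT)"
    )
    rows = [
        (sid, str(lab), json.dumps([float(v) for v in feats]))
        for sid, feats, lab in zip(series_ids, feature_matrix, labels)
    ]
    conn.executemany("INSERT OR REPLACE INTO cluster_result VALUES (?, ?, ?)", rows)
    conn.commit()


def load_labels(conn):
    """Return {series_id: label}, e.g. for a display or anomaly detection component."""
    return dict(conn.execute("SELECT series_id, label FROM cluster_result"))


conn = sqlite3.connect(":memory:")
save_cluster_result(conn, ["port-1", "port-2"], [[0.1, 1.7], [0.9, 3.2]], ["P0", "N1"])
```

Keeping the result in a separate store is what lets components other than the display component consume the same clustering output.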

Specifically, the algorithm component in step 302 first divides the time series data into two major classes, periodic time series data and aperiodic time series data, and then performs cluster analysis within each major class, so as to reduce the complexity of the cluster analysis.

Fig. 4 is a flow chart of the clustering performed by the algorithm component provided by an embodiment of the present invention. As shown in Fig. 4, the flow includes:

Step 401: read time series data from the time series database and preprocess it, including filling in missing values, removing noise, etc.

Step 402: extract the structural features of the time series data.

The structural features of the time series data are extracted according to the time series feature set passed in by the parameter component. The time series feature set consists of one or more of the following features: seasonality index, trend index, skewness, kurtosis, autocorrelation coefficient, relative entropy, sample entropy, self-similarity, Lyapunov coefficient.
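Three of the listed features have standard textbook definitions and can illustrate how one row of the time series feature matrix might be built. The sketch below assumes population moments, non-excess kurtosis and a lag-1 autocorrelation coefficient; the original does not give formulas, so these choices and the function name `structural_features` are assumptions. The remaining features (seasonality, trend, entropies, self-similarity, Lyapunov coefficient) are omitted because the text does not define them.

```python
import numpy as np


def structural_features(x, lag=1):
    """Return [skewness, kurtosis, autocorrelation at `lag`] for one series."""
    x = np.asarray(x, dtype=float)
    if lag < 1 or lag >= x.size:
        raise ValueError("lag must satisfy 1 <= lag < len(x)")
    d = x - x.mean()                  # deviations from the mean
    var = np.mean(d ** 2)             # population variance
    if var == 0:
        raise ValueError("constant series: moments are undefined")
    skew = np.mean(d ** 3) / var ** 1.5
    kurt = np.mean(d ** 4) / var ** 2  # non-excess kurtosis (normal distribution = 3)
    acf = np.sum(d[:-lag] * d[lag:]) / np.sum(d ** 2)
    return np.array([skew, kurt, acf])


# One row per time series gives the feature matrix used for clustering.
series = [[1, 2, 3, 4, 5], [2, 9, 1, 8, 3, 7]]
feature_matrix = np.vstack([structural_features(s) for s in series])
```

Because the features have different scales, a practical implementation would probably normalize the columns of the matrix before clustering; the text does not say whether this is done.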

Step 403: perform periodicity discrimination on the time series data, first dividing it into two major classes: periodic and aperiodic.

As shown in Fig. 5, the periodicity discrimination method is:

1) Apply a Fourier transform (FFT) to the time series; let the length of the FFT be fft_size.

2) Take the magnitude of the FFT coefficients to obtain the corresponding amplitude spectrum.

3) Apply a 20-times logarithmic transform to the amplitude spectrum to smooth the spectrum.

4) Over the subscript interval [1, fft_size/4] of the transformed spectrum, find the maximum value MAX, the mean m and the standard deviation std. Denote the subscript corresponding to MAX as Id.

5) If MAX > m+3.2*std, go to step 6); otherwise judge the time series to be aperiodic.

6) Calculate the period corresponding to the maximum spectrum point as p = fft_size/(Id*fs) * fs = fft_size/Id, where fs is the sampling frequency. If the period equals the preset value pt, the time series is judged to be periodic and its period p is returned; otherwise it is judged to be aperiodic.
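Steps 1) to 6) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the patented implementation: fft_size is taken to be the series length, pt is expressed in samples, the 20-times logarithmic transform is read as 20*log10 and treated as the whole of the smoothing step, and a small epsilon is added to avoid log(0). The function name `is_periodic` and the `factor` argument are hypothetical; `MAX`, `m`, `std`, `Id`, `p` and `pt` follow the text.

```python
import numpy as np


def is_periodic(x, pt, factor=3.2):
    """Return (True, p) if the series has period pt (in samples), else (False, None)."""
    x = np.asarray(x, dtype=float)
    fft_size = x.size                    # step 1: FFT length (assumed = series length)
    amp = np.abs(np.fft.fft(x))          # step 2: amplitude spectrum
    db = 20.0 * np.log10(amp + 1e-12)    # step 3: 20*log10; epsilon avoids log(0)
    band = db[1 : fft_size // 4 + 1]     # step 4: subscripts 1 .. fft_size/4
    MAX, m, std = band.max(), band.mean(), band.std()
    Id = int(band.argmax()) + 1          # +1 because the band starts at subscript 1
    if not MAX > m + factor * std:       # step 5: peak must stand out from the band
        return False, None
    p = fft_size / Id                    # step 6: period in samples
    return (True, p) if p == pt else (False, None)


t = np.arange(1344)                      # two weeks at 96 samples per day
daily = np.sin(2 * np.pi * t / 96)
print(is_periodic(daily, pt=96))         # (True, 96.0)
```

The exact equality test in step 6 follows the text literally; with real data whose length is not a multiple of the period, a tolerance around pt may be needed.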

Step 404: cluster the time series within each major class using a clustering algorithm; clustering is performed on the extracted structural features.

The number of clusters is determined by the number of time series clusters passed in by the parameter component, i.e. the number of periodic time series clusters k1 and the number of aperiodic time series clusters k2.

Step 405: according to the clustering result, output the class to which each time series belongs.

Besides sending the clustering result to the display component, the algorithm component can also send the result to further application components. For example, the clustering result can be sent to an anomaly detection component, which applies the anomaly detection algorithm corresponding to the class of each time series.
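How an anomaly detection component might pick a detector by class can be sketched as a simple lookup. Everything in the block below is hypothetical: the original does not say which detectors are used, so a plain three-sigma rule for aperiodic series and a three-sigma rule on period-over-period differences for periodic series are stand-ins, as are the names `three_sigma`, `seasonal_three_sigma`, `DETECTORS` and `detect`.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence


def three_sigma(x: Sequence[float]) -> List[int]:
    """Indices of points more than 3 standard deviations from the mean."""
    mu, sd = mean(x), pstdev(x)
    return [i for i, v in enumerate(x) if sd > 0 and abs(v - mu) > 3 * sd]


def seasonal_three_sigma(x: Sequence[float], period: int = 4) -> List[int]:
    """Apply the three-sigma rule to differences against the previous period."""
    diffs = [x[i] - x[i - period] for i in range(period, len(x))]
    return [i + period for i in three_sigma(diffs)]


# Hypothetical mapping from major class to detector.
DETECTORS: Dict[str, Callable[[Sequence[float]], List[int]]] = {
    "periodic": seasonal_three_sigma,
    "aperiodic": three_sigma,
}


def detect(x: Sequence[float], series_class: str) -> List[int]:
    """Dispatch to the detector registered for the class of the series."""
    return DETECTORS[series_class](x)


flat = [0.0] * 20 + [100.0]
print(detect(flat, "aperiodic"))  # [20]
```

Note that the seasonal variant also flags the point one period after a spike, because the difference swings back; a production detector would need to handle that.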

Embodiment:

In this embodiment, the time series data stored in the time series database of Fig. 3 is traffic data from 407 network ports, collected over 2 weeks at a granularity of 15 minutes (i.e. 96 points per day).

The preset time series period pt that the parameter component passes to the algorithm component in Fig. 3 equals 1 day, i.e. the periodicity discrimination algorithm judges whether a time series has a daily period. The feature set passed is: seasonality index, trend index, skewness, relative entropy, sample entropy, self-similarity and Lyapunov coefficient; the algorithm component therefore extracts 7 structural features. The numbers of clusters passed are k1=4 for periodic time series and k2=5 for aperiodic time series, so periodic time series are divided into 4 classes and aperiodic time series into 5 classes.
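The figures in this embodiment fix how a one-day period shows up in the spectrum. The short calculation below is an illustration that assumes fft_size equals the series length, which the text does not state; the helper name `samples_per_period` is hypothetical.

```python
def samples_per_period(period_minutes: int, granularity_minutes: int) -> int:
    """Convert a duration in minutes into a number of samples."""
    if period_minutes % granularity_minutes:
        raise ValueError("period is not a whole number of samples")
    return period_minutes // granularity_minutes


granularity = 15                                    # minutes between samples
pt = samples_per_period(24 * 60, granularity)       # 1 day   -> 96 samples
n = samples_per_period(14 * 24 * 60, granularity)   # 2 weeks -> 1344 samples
expected_Id = n // pt                               # subscript of a daily cycle: 14
```

Under that assumption, a series with a daily cycle should peak at spectrum subscript 14, well inside the searched interval [1, fft_size/4] = [1, 336], and p = fft_size/Id = 1344/14 = 96 matches pt.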

The basic process of the algorithm component used in this embodiment includes:

Step 1: read the time series data and preprocess it, i.e. fill in missing values by linear interpolation and remove noise.

Step 2: extract 7 structural features of the time series data: seasonality index, trend index, skewness, relative entropy, sample entropy, self-similarity and Lyapunov coefficient.

Step 3: perform periodicity discrimination on the time series data, first dividing it into two major classes: periodic and aperiodic.

Step 4: cluster the time series within each major class using the K-means clustering algorithm; the periodic data is clustered into 4 classes and the aperiodic data into 5 classes.

Step 5: according to the clustering result, output the class to which each time series belongs.
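The linear interpolation in step 1 can be sketched as follows, assuming missing samples are marked as NaN and the samples are evenly spaced (neither is stated in the text). The function name `fill_missing` is hypothetical, and the noise removal part of step 1 is not shown because the original does not specify a method.

```python
import numpy as np


def fill_missing(values):
    """Fill NaN gaps by linear interpolation between the nearest known samples."""
    y = np.asarray(values, dtype=float)
    idx = np.arange(y.size)
    known = ~np.isnan(y)
    if not known.any():
        raise ValueError("series has no observed values")
    # np.interp interpolates linearly between known points and holds the
    # nearest known value for gaps at either end of the series.
    return np.interp(idx, idx[known], y[known])


filled = fill_missing([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
```

Filling gaps before the FFT matters because the periodicity test assumes an evenly sampled series with no missing points.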

The periodicity discrimination in step 3 proceeds as follows:

1) Apply a Fourier transform (FFT) to the time series; the length of the FFT is fft_size.

2) Take the magnitude of the FFT coefficients to obtain the corresponding amplitude spectrum.

3) Apply a 20-times logarithmic transform to the amplitude spectrum to smooth the spectrum.

The periodicity discrimination algorithm divides the 407 time series into 165 periodic series and 242 aperiodic series. The k-means algorithm is then used to cluster the data within each major class. Fig. 6 shows typical waveforms of each class of periodic time series; Fig. 7 shows typical waveforms of each class of aperiodic time series. The number of time series in each subclass is summarized in Table 1.
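The two-level scheme, splitting by periodicity and then running k-means inside each major class, can be sketched as below. This is a plain Lloyd's-algorithm k-means written for illustration; the random initialization, the seed, the iteration cap and the label format ("P0", "N1", ...) are assumptions, as are the function names `kmeans` and `cluster_by_class`. `k1` and `k2` follow the text.

```python
import numpy as np


def kmeans(X, k, iters=100, seed=0):
    """Basic Lloyd's k-means; returns (labels, centers)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Distance of every point to every center, then nearest-center assignment.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def cluster_by_class(F, periodic, k1, k2):
    """Cluster periodic rows of F into k1 groups and aperiodic rows into k2 groups."""
    F = np.asarray(F, dtype=float)
    periodic = np.asarray(periodic, dtype=bool)
    out = np.empty(len(F), dtype=object)
    for flag, k, prefix in ((True, k1, "P"), (False, k2, "N")):
        idx = np.flatnonzero(periodic == flag)
        if idx.size == 0:
            continue
        sub, _ = kmeans(F[idx], min(k, idx.size))
        for i, lab in zip(idx, sub):
            out[i] = f"{prefix}{lab}"
    return out


F = [[0, 0], [0, 1], [10, 10], [10, 11], [100, 100], [100, 101], [200, 200], [200, 201]]
labels = cluster_by_class(F, [True] * 4 + [False] * 4, k1=2, k2=2)
```

Clustering each major class separately keeps periodic and aperiodic series from competing for the same centers, which is the complexity reduction the description refers to.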

Table 1: Summary of the number of time series in each subclass

After clustering is complete, the clustering result (the feature vector and class label of each time series) is stored in the clustering result database. In this embodiment, the display component implements the front-end display function and presents the clustering result to the user, i.e. it displays the waveforms of each class of time series as in Fig. 6 and Fig. 7.

In other embodiments, the technical solution of the summary of the invention can be realized with different implementation details. For example: the preset period pt passed in by application component 1 equals 1 week; the number of periodic clusters k1 passed in equals 2 and the number of aperiodic clusters equals 4; the selected time series feature set is seasonality index, trend index, autocorrelation coefficient, relative entropy and sample entropy; and further application components are added that call the clustering result of the algorithm component.

A computer storage medium provided according to an embodiment of the present invention stores a program for time series clustering which, when executed by a processor, performs the following steps:

According to the scheme provided by the embodiments of the present invention, the parameter component passes the relevant parameters to the algorithm component, which allows flexible deployment of the time series clustering algorithm. The algorithm component uses hierarchical clustering, which reduces the complexity of the cluster analysis. The algorithm component provides the clustering result to other application components as a service, which improves its reusability.

Although the present invention has been described in detail above, it is not limited thereto, and those skilled in the art can make various modifications according to the principle of the present invention. All modifications made according to the principle of the present invention should therefore be understood as falling within the protection scope of the present invention.

Claims

1. A method for time series clustering, comprising:

an algorithm component reading time series data from a time series database and clustering the read time series data according to received algorithm parameters, to obtain a clustering result containing a time series feature matrix and time series cluster labels;

the algorithm component saving the clustering result containing the time series feature matrix and the time series cluster labels in a clustering result database, and presenting the clustering result through a display component.

2. The method according to claim 1, wherein the algorithm parameters include a preset time series period, a time series feature set and the number of time series clusters; the time series feature set includes one or more of seasonality index, trend index, skewness, kurtosis, autocorrelation coefficient, relative entropy, sample entropy, self-similarity and Lyapunov coefficient; and the number of time series clusters includes the number of periodic time series clusters and the number of aperiodic time series clusters.

3. The method according to claim 2, wherein the clustering, by the algorithm component, of the read time series data according to the received algorithm parameters includes:

the algorithm component extracting the structural feature set of the time series data according to the time series feature set in the algorithm parameters, and determining the periodic time series data and the aperiodic time series data among the time series data according to the preset time series period in the algorithm parameters;

the algorithm component clustering the periodic time series data and the aperiodic time series data separately, according to the extracted structural feature set and the number of time series clusters.

4. The method according to claim 3, wherein the determining, by the algorithm component, of the periodic time series data and the aperiodic time series data among the time series data according to the preset time series period in the algorithm parameters includes:

the algorithm component calculating the amplitude spectrum maximum, the amplitude spectrum mean and the amplitude spectrum standard deviation of the time series data, and judging whether the difference between the amplitude spectrum maximum and the amplitude spectrum mean is greater than a multiple of the amplitude spectrum standard deviation;

if the difference between the amplitude spectrum maximum and the amplitude spectrum mean is not greater than the multiple of the amplitude spectrum standard deviation, the algorithm component classifying the time series data as aperiodic time series data;

if the difference between the amplitude spectrum maximum and the amplitude spectrum mean is greater than the multiple of the amplitude spectrum standard deviation, the algorithm component further judging whether the time series period corresponding to the amplitude spectrum maximum equals the preset time series period;

if the time series period corresponding to the amplitude spectrum maximum equals the preset time series period, the algorithm component classifying the time series data as periodic time series data;

if the time series period corresponding to the amplitude spectrum maximum does not equal the preset time series period, the algorithm component classifying the time series data as aperiodic time series data.

5. The method according to claim 3, wherein the clustering, by the algorithm component, of the periodic time series data and the aperiodic time series data separately according to the extracted structural feature set and the number of time series clusters includes:

the algorithm component clustering the periodic time series data according to the extracted structural feature set and the number of periodic time series clusters;

the algorithm component clustering the aperiodic time series data according to the extracted structural feature set and the number of aperiodic time series clusters.

6. A device for time series clustering, comprising:

a clustering module, for reading time series data from a time series database and clustering the read time series data according to received algorithm parameters, to obtain a clustering result containing a time series feature matrix and time series cluster labels;

a saving and display module, for saving the clustering result containing the time series feature matrix and the time series cluster labels in a clustering result database, and presenting the clustering result through a display component.

7. The device according to claim 6, wherein the algorithm parameters include a preset time series period, a time series feature set and the number of time series clusters; the time series feature set includes one or more of seasonality index, trend index, skewness, kurtosis, autocorrelation coefficient, relative entropy, sample entropy, self-similarity and Lyapunov coefficient; and the number of time series clusters includes the number of periodic time series clusters and the number of aperiodic time series clusters.

8. The device according to claim 7, wherein the clustering module includes:

a processing unit, for extracting the structural feature set of the time series data according to the time series feature set in the algorithm parameters, and for determining the periodic time series data and the aperiodic time series data among the time series data according to the preset time series period in the algorithm parameters;

a clustering unit, for clustering the periodic time series data and the aperiodic time series data separately, according to the extracted structural feature set and the number of time series clusters.

9. The device according to claim 8, wherein the processing unit includes:

a feature extraction subunit, for extracting the structural feature set of the time series data according to the time series feature set in the algorithm parameters;

a period discrimination subunit, for calculating the amplitude spectrum maximum, the amplitude spectrum mean and the amplitude spectrum standard deviation of the time series data, and judging whether the difference between the amplitude spectrum maximum and the amplitude spectrum mean is greater than a multiple of the amplitude spectrum standard deviation; if the difference is not greater than the multiple of the amplitude spectrum standard deviation, classifying the time series data as aperiodic time series data; if the difference is greater than the multiple of the amplitude spectrum standard deviation, further judging whether the time series period corresponding to the amplitude spectrum maximum equals the preset time series period; if the time series period corresponding to the amplitude spectrum maximum equals the preset time series period, classifying the time series data as periodic time series data; and if it does not equal the preset time series period, classifying the time series data as aperiodic time series data.

10. An electronic device for time series clustering, the electronic device comprising a processor and a memory, wherein the memory stores executable program code, and the processor reads the executable program code stored in the memory and runs the corresponding program in order to execute the following steps: