CN111814897A - Time series data classification method based on multi-level shapelet - Google Patents

Time series data classification method based on multi-level shapelet

Info

Publication number
CN111814897A
Authority
CN
China
Prior art keywords
shapelet
candidate
time series
data
subsequence
Prior art date
Legal status
Pending
Application number
CN202010696976.2A
Other languages
Chinese (zh)
Inventor
丁琳琳
脱乃元
曹鲁杰
张翰林
宋宝燕
Current Assignee
Liaoning University
Original Assignee
Liaoning University
Priority date
Filing date
Publication date
Application filed by Liaoning University
Priority to CN202010696976.2A
Publication of CN111814897A
Legal status: Pending


Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F18/00 Pattern recognition > G06F18/20 Analysing
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06F18/23213 Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A time series data classification method based on multi-level shapelets includes: step 1) preprocessing the time series data: performing dimensionality reduction on the original time series with the SAX method; step 2) obtaining the initial subsequences of the time series: extracting the subsequence set from the time series with a sliding window, and indirectly controlling the extracted subsequence length by adjusting the window size; step 3) discovering and extracting the multi-level shapelet candidate set: filtering and merging candidates through the proposed multi-level shapelet framework, and selecting the shapelets with large information gain as the candidate set; step 4) shapelet transformation and classifier construction. The method provides an efficient multi-level shapelet candidate-set filtering model that effectively reduces the number of shapelet candidates and rapidly screens out shapelet sets with strong classification ability, and then achieves effective classification of the time series data through an ELM classifier.

Description

Time series data classification method based on multi-level shapelet
Technical Field
The invention belongs to the field of time series data mining, relates to a time series data classification method, and particularly relates to a time series data classification method based on multi-level shapelets.
Background
Time series data generally represent observations of an underlying process sampled at a set frequency over equally spaced time periods, and arise in fields such as medical diagnosis, disaster prediction, and commercial monitoring. Time series data are typically voluminous, high-dimensional, and rapidly updated. Time series classification has long been a central problem in time series data mining and has a wide range of applications. Within this task, the shapelet technique is an effective approach: shapelet-based classification offers strong interpretability, high operating efficiency, and high classification accuracy, can clearly reflect the category to which the data belong, and makes the expected classification effect intuitive. However, existing shapelet acquisition methods still suffer from overly large candidate sets and excessive computation. On the one hand, the shapelet method must judge the discriminative ability of the subsequences in the time series one by one, and the distance-similarity calculations incur a large computational cost, increasing the complexity of the discovery process; on the other hand, the excessive number of subsequences generates a large number of candidate and alternative shapelet sequences, so direct calculation consumes a great deal of time and poses a major challenge to shapelet discovery and extraction. In addition, the ELM, a single-hidden-layer feedforward neural network (SLFN), trains quickly and classifies accurately, is widely applied in many fields, and allows the time series classification problem to be combined with an existing classifier. The invention therefore proposes a multi-level shapelet candidate-set extraction method addressing the huge cost of shapelet candidate-set computation, and applies an ELM classifier to the classification of time series data.
Since time series data generally lack direct features, and even latent features remain high-dimensional after complex feature selection, dimensionality reduction is usually required before time series classification. Widely used dimensionality-reduction and classification methods include PAA, SAX, and the shapelet method. The essence of the shapelet method is to map the time series data from the original input space to a new feature space; however, it ignores the temporal order of different variables in the original series, and in generating the shapelet candidate set, judging subsequences one by one may overlook approximate relations between shapelets, at a very large cost in time and computation.
Disclosure of Invention
In order to overcome the defects of conventional shapelet classification methods for time series, the invention provides a time series data classification method based on multi-level shapelets, which can quickly and effectively solve the problem of accurately classifying high-dimensional time series data.
The purpose of the invention is realized by the following technical scheme:
A time series data classification method based on multi-level shapelets is characterized by comprising the following steps:
step 1) preprocessing the time series data: performing dimensionality reduction on the original time series with the SAX method;
step 2) obtaining the initial subsequences of the time series: extracting the subsequence set from the time series with a sliding window, and indirectly controlling the extracted subsequence length by adjusting the window size;
step 3) discovering and extracting the multi-level shapelet candidate set: filtering and merging candidates through the proposed multi-level shapelet framework, and selecting the shapelets with large information gain as the candidate set;
step 4) shapelet transformation and classifier construction:
4.1) shapelet transformation: first, a simple initialized data matrix is established for the initial N time series data sets according to their number, and all shapelet candidate sets obtained by the multi-level framework method are arranged into a matrix in the order of the time series to which they belong; second, according to the many-to-many mapping between the initial N time series sets and the shapelet matrix, Euclidean-distance similarity is computed to obtain the feature values of each time series, where each feature attribute represents one shapelet and the value of each attribute is the distance from that shapelet to the original series; finally, the feature values are assembled into N feature vectors, completing the feature-vector representation of the time series data set;
4.2) after the classifier is established for the time series, the subsequent training samples are fed into the classifier for training; during training, the ELM first randomly generates the input weights and hidden-node thresholds, and then computes the output weights of the SLFN from the training data.
In step 1), the concrete steps are as follows:
1.1) normalized piecewise approximation of the data: the initial time series data are transformed into a data set with mean 0 and variance 1 by the zero-mean (z-score) standardization method;
1.2) character representation of the processed data: the mean of each segment is mapped into a Gaussian distribution table whose range represents the expression range of the dimensionality-reduced time series, and the symbolization operation is performed according to the initialized parameters: the number of segments w, the alphabet size r, and the breakpoints β, completing the symbolic aggregate approximation.
In step 2), the concrete steps are as follows:
First, the size of the sliding window is set, fixing the length and range of each extracted subsequence; second, the window is slid one position to the right at a time, changing its location in the time series and extracting the subsequences at different positions; finally, the window size is adjusted to extract all subsequences of different lengths, and the extracted subsequences are stored in a set.
In step 3), the concrete steps are as follows:
3.1) initial subsequence clustering based on k-means: after the subsequences of all time series are extracted, the candidate subsequences are clustered. The DTW distance is introduced as the measurement index to filter and screen the subsequence set: the DTW distance expresses the similarity of subsequence shapes, and all alternative shapelet candidates are partitioned with the DTW-based algorithm so that the candidates within the same cluster have similar shape characteristics;
the method comprises the steps of calculating the similarity of shape based on DTW distance, setting two different shape sequences, namely X1 { X1, X2, … xM }, Y1 { Y1, Y2, …, yN }, and firstly calculating a distance matrix
Figure BDA0002591600970000031
Then calculating the accumulated distance matrix Sij=Dij+min(si,j-1,Si,j-1,Si-1,j-1)
3.2) updating the clustering result: after the subsequence candidate set is clustered by combining the k-means and DTW methods, the clustering result is iterated and updated in real time to ensure that the subsequence clusters satisfy the shape-approximation property, enabling a definite partition in the subsequent shapelet candidate extraction;
3.3) establishing the multi-level shapelet extraction framework.
In step 3.3), the concrete steps are as follows:
3.3.1) intra-level candidate set merging: first, the hierarchical division of all clustered subsequences is completed according to the 'heaps' (clusters) produced by subsequence clustering; second, the candidates within a level are consolidated through their inherent 'approximation' relation: screening by the approximate shape characteristics, candidates with similar shapes are merged, while candidates with distinctive shape features, which carry discriminative ability and interpretability, are updated and retained. A threshold on the DTW distance is given; two candidates whose distance is below the threshold are very similar in shape, and only a representative of the candidates within the threshold range is retained, reducing the set. Finally, a simplified shapelet candidate set is obtained at each level;
3.3.2) inter-level candidate set merging: in the SH-ELM model, the Levenshtein Distance algorithm is used to merge candidate sets across levels. Let the lengths of two strings a and b be |a| and |b|; the Levenshtein distance is computed by the recurrence

$$\mathrm{lev}_{a,b}(i,j)=\begin{cases}\max(i,j), & \text{if } \min(i,j)=0\\ \min\bigl(\mathrm{lev}_{a,b}(i-1,j)+1,\ \mathrm{lev}_{a,b}(i,j-1)+1,\ \mathrm{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\bigr), & \text{otherwise}\end{cases}$$

where the indicator $1_{(a_i\neq b_j)}$ takes the value 0 when $a_i=b_j$ and 1 otherwise, and $\mathrm{lev}_{a,b}(i,j)$ is the edit distance between the first i characters of a and the first j characters of b. The similarity of a and b is then

$$\mathrm{Sim}_{a,b}=1-\mathrm{lev}_{a,b}(|a|,|b|)/\max(|a|,|b|)$$
In the merging process, candidate sets in adjacent levels of the framework are joined and compared: candidates between levels are screened with the Levenshtein Distance method by means of this character-based approximate distance calculation;
3.3.3) multi-level top-k candidate set confirmation: the information gain is taken as the criterion for measuring classification ability, and the k shapelets with the largest information gain within a single level are selected to complete the extraction task; the extracted candidate set is finally confirmed and used to complete the classification of the candidate time series. This is the process of extracting the k best shapelets from the data set: initially the k-shapelet set is empty; then a candidate shapelet sequence is obtained in each level, and its distance to the series in that level must be computed; once the distance values are obtained, the corresponding information gain is computed, the candidates are sorted by information gain, the replacement of the best shapelet candidates is completed, and the best k shapelets are finally output.
The beneficial effects of the invention are as follows: the invention provides a time series data classification method based on multi-level shapelets and designs an efficient multi-level shapelet candidate-set filtering model, which effectively reduces the number of shapelet candidates, rapidly screens out shapelet sets with strong classification ability, and then achieves effective classification of time series data through an ELM classifier.
Drawings
FIG. 1: workflow of the time series SH-ELM model of the invention.
FIG. 2: staged schematic diagram of the time series data classification model of the invention.
FIG. 3: schematic diagram of the time series SAX symbolization dimensionality-reduction representation method of the invention.
FIG. 4: schematic diagram of the execution of subsequence candidate set extraction in the invention.
FIG. 5: schematic diagram of the model structure of the multi-level shapelet framework of the invention.
FIG. 6a: effect of varying the k value on classification running time on the real data set in the experiments of the invention.
FIG. 6b: effect of varying the k value on classification running time on the synthetic data set in the experiments of the invention.
FIG. 7a: effect of varying the k value on classification accuracy on the real data set in the experiments of the invention.
FIG. 7b: effect of varying the k value on classification accuracy on the synthetic data set in the experiments of the invention.
FIG. 8a: effect of varying the number of levels on classification running time on the real data set in the experiments of the invention.
FIG. 8b: effect of varying the number of levels on classification running time on the synthetic data set in the experiments of the invention.
FIG. 9: influence of time series length on classification running time in the experiments of the invention.
FIG. 10: influence of time series length on classification accuracy in the experiments of the invention.
Detailed Description
First, some related concepts are given. If the neighborhood of a shapelet object contains at least a minimum number of data objects, that shapelet object is the cluster center in its shape direction. Definitions 1 and 2 give the meaning of time series and subsequences, and Definitions 3 and 4 define the candidate set and the distance calculation.
Definition 1: time series and subsequence. Given a time series T of length m, each subsequence S of T is a continuous segment starting from any position of T. Assuming the subsequence has length l and extraction begins at point p, the subsequence is denoted $S = t_p, \ldots, t_{p+l-1}$, where the extraction range of p is $1 \le p \le m - l + 1$.
Definition 2: sliding window. Given a time series T of length m and a defined subsequence length l, all possible subsequences can be extracted by sliding a window of size l over T. The superscript l and the subscript p denote the subsequence extraction length and the starting position of the sliding window in the time series, respectively. The set of all subsequences of length l extracted from T is defined as

$$ST^{l}=\{\,S^{l}_{p}\mid 1\le p\le m-l+1\,\}$$

and the set of subsequences represented by sliding windows of all lengths is

$$ST=\bigcup_{l} ST^{l}$$
Definition 3: distance between time series. Dist(T, R) is a distance function that performs the distance operation on two time series T and R of the same length and returns a distance value d, the distance between the two series. The Dist function can also measure the distance between two subsequences of the same length.
Definition 4: distance from a time series to a subsequence. SubsequenceDist(T, S) is a distance function that takes the time series T and the subsequence S as inputs and returns a non-negative value d, the distance between the time series and the subsequence: SubsequenceDist(T, S) = min(Dist(S, S′)) over all subsequences S′ of T of the same length as S.
Definition 5: dimensionality-reduction normalization of the time series. To obtain an effective reduced-dimension feature representation of the time series data, a z-normalization is applied and the time series of length m is then converted into w symbols. The time series data T are initialized and converted into a standard sequence, and the series is divided into w equal-sized segments, i.e. $C = c_1, c_2, \ldots, c_w$. The dimension-reduction normalization (the mean of the i-th segment) is expressed as

$$\bar{c}_i = \frac{w}{m}\sum_{j=\frac{m}{w}(i-1)+1}^{\frac{m}{w}i} t_j$$
A time series data classification method based on multi-level shapelets is characterized by comprising the following steps:
step 1) preprocessing the time series data: performing dimensionality reduction on the original time series with the SAX method:
1.1) normalized piecewise approximation of the data: the initial time series data are transformed into a data set with mean 0 and variance 1 by the zero-mean (z-score) standardization method;
1.2) character representation of the processed data: the mean of each segment is mapped into a Gaussian distribution table whose range represents the expression range of the dimensionality-reduced time series, and the symbolization operation is performed according to the initialized parameters: the number of segments w, the alphabet size r, and the breakpoints β, completing the symbolic aggregate approximation.
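As an illustration of step 1, the following is a minimal Python sketch of the SAX preprocessing just described: z-normalization, piecewise aggregate segmentation into w segments, and symbolization against Gaussian breakpoints. The breakpoints shown are the standard quartiles of N(0, 1) for an alphabet of size r = 4; the function name and parameter defaults are illustrative assumptions, not values from the patent.

```python
import numpy as np

def sax_transform(series, w=8, breakpoints=(-0.6745, 0.0, 0.6745)):
    """Z-normalize, reduce to w PAA segment means, map each mean to a symbol."""
    t = np.asarray(series, dtype=float)
    t = (t - t.mean()) / t.std()                 # 1.1) mean 0, variance 1
    segments = np.array_split(t, w)              # piecewise aggregate segments
    means = np.array([seg.mean() for seg in segments])
    # 1.2) symbolize: breakpoints split N(0,1) into r equiprobable regions
    indices = np.searchsorted(breakpoints, means)
    return "".join(chr(ord("a") + i) for i in indices)
```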
Step 2) obtaining the initial subsequences of the time series: extracting the subsequence set from the time series with a sliding window, and indirectly controlling the extracted subsequence length by adjusting the window size; the concrete steps are as follows:
First, the size of the sliding window is set, fixing the length and range of each extracted subsequence; second, the window is slid one position to the right at a time, changing its location in the time series and extracting the subsequences at different positions; finally, the window size is adjusted to extract all subsequences of different lengths, and the extracted subsequences are stored in a set.
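A minimal sketch of step 2, continuing the assumptions above: a stride-1 window slides over the symbolic series, and the window size is varied to collect subsequences of every candidate length (the length bounds here are illustrative choices).

```python
def sliding_subsequences(symbolic_series, min_len=3, max_len=5):
    """Extract every subsequence of every candidate length with stride 1."""
    subsequences = []
    for l in range(min_len, max_len + 1):              # adjust the window size
        for p in range(len(symbolic_series) - l + 1):  # slide right by one
            subsequences.append(symbolic_series[p:p + l])
    return subsequences

# e.g. sliding_subsequences("dcbbacd", 3, 3) yields ['dcb', 'cbb', 'bba', 'bac', 'acd']
```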
Step 3), discovery and extraction of a multi-level shape candidate set: filtering and combining the candidate set through the proposed multi-level shape frame, and selecting a shape with large information gain as a candidate set;
3.1) initial subsequence clustering based on k-means: after extracting subsequences of all time sequences, clustering candidate subsequences, introducing a DTW distance measurement calculation mode as a measurement index, filtering and screening a subsequence set, wherein DTW distances represent the similarity of subsequence shapes, and dividing all alternative shape candidate sets by adopting a DTW algorithm to enable the shape candidate sets in the same cluster to have similar characteristics in shape;
The shapelet similarity is computed from the DTW distance. Given two different shapelet sequences $X = \{x_1, x_2, \ldots, x_M\}$ and $Y = \{y_1, y_2, \ldots, y_N\}$, first compute the distance matrix

$$D_{ij} = (x_i - y_j)^2$$

and then the accumulated distance matrix

$$S_{ij} = D_{ij} + \min(S_{i-1,j},\ S_{i,j-1},\ S_{i-1,j-1})$$
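The following sketch implements the two matrices just defined: the pointwise distance matrix D and the accumulated matrix S, whose final entry is the DTW distance used in step 3.1. The squared pointwise distance is an assumption consistent with the recurrence above; in the clustering step this function would replace the Euclidean metric when assigning subsequences to k-means cluster centers.

```python
import numpy as np

def dtw_distance(x, y):
    """D[i,j] = (x_i - y_j)^2, then S[i,j] = D[i,j] + min of the three neighbors."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    D = (x[:, None] - y[None, :]) ** 2
    S = np.full((len(x), len(y)), np.inf)
    S[0, 0] = D[0, 0]
    for i in range(len(x)):
        for j in range(len(y)):
            if i == 0 and j == 0:
                continue
            prev = min(S[i - 1, j] if i > 0 else np.inf,
                       S[i, j - 1] if j > 0 else np.inf,
                       S[i - 1, j - 1] if i > 0 and j > 0 else np.inf)
            S[i, j] = D[i, j] + prev
    return S[-1, -1]    # accumulated warping cost between the two shapelets
```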
3.2) updating the clustering result: after the subsequence candidate set is clustered by combining the k-means and DTW methods, the clustering result is iterated and updated in real time to ensure that the subsequence clusters satisfy the shape-approximation property, enabling a definite partition in the subsequent shapelet candidate extraction;
3.3) establishing the multi-level shapelet extraction framework:
3.3.1) intra-level candidate set merging: first, the hierarchical division of all clustered subsequences is completed according to the 'heaps' (clusters) produced by subsequence clustering; second, the candidates within a level are consolidated through their inherent 'approximation' relation: screening by the approximate shape characteristics, candidates with similar shapes are merged, while candidates with distinctive shape features, which carry discriminative ability and interpretability, are updated and retained. A threshold on the DTW distance is given; two candidates whose distance is below the threshold are very similar in shape, and only a representative of the candidates within the threshold range is retained, reducing the set. Finally, a simplified shapelet candidate set is obtained at each level;
3.3.2) inter-level candidate set merging: in the SH-ELM model, the Levenshtein Distance algorithm is used to merge candidate sets across levels. Let the lengths of two strings a and b be |a| and |b|; the Levenshtein distance is computed by the recurrence

$$\mathrm{lev}_{a,b}(i,j)=\begin{cases}\max(i,j), & \text{if } \min(i,j)=0\\ \min\bigl(\mathrm{lev}_{a,b}(i-1,j)+1,\ \mathrm{lev}_{a,b}(i,j-1)+1,\ \mathrm{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\bigr), & \text{otherwise}\end{cases}$$

where the indicator $1_{(a_i\neq b_j)}$ takes the value 0 when $a_i=b_j$ and 1 otherwise, and $\mathrm{lev}_{a,b}(i,j)$ is the edit distance between the first i characters of a and the first j characters of b. The similarity of a and b is then

$$\mathrm{Sim}_{a,b}=1-\mathrm{lev}_{a,b}(|a|,|b|)/\max(|a|,|b|)$$
In the merging process, candidate sets in adjacent levels of the framework are joined and compared: candidates between levels are screened with the Levenshtein Distance method by means of this character-based approximate distance calculation;
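A minimal sketch of the inter-level merge test of step 3.3.2, implementing the recurrence and the similarity Sim above with the usual two-row dynamic program; the 0.8 merge threshold is an illustrative assumption, not a value given in the patent.

```python
def levenshtein(a, b):
    """Edit distance lev_{a,b}(|a|, |b|) via the recurrence above."""
    prev = list(range(len(b) + 1))                 # lev(0, j) = j
    for i, ca in enumerate(a, 1):
        cur = [i]                                  # lev(i, 0) = i
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1            # indicator 1_(a_i != b_j)
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + cost))    # substitution
        prev = cur
    return prev[-1]

def should_merge(a, b, threshold=0.8):
    """Merge two candidates from adjacent levels when Sim_{a,b} is high."""
    sim = 1 - levenshtein(a, b) / max(len(a), len(b))
    return sim >= threshold
```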
3.3.3) multi-level top-k candidate set confirmation: the information gain is taken as the criterion for measuring classification ability, and the k shapelets with the largest information gain within a single level are selected to complete the extraction task; the extracted candidate set is finally confirmed and used to complete the classification of the candidate time series. This is the process of extracting the k best shapelets from the data set: initially the k-shapelet set is empty; then a candidate shapelet sequence is obtained in each level, and its distance to the series in that level must be computed; once the distance values are obtained, the corresponding information gain is computed, the candidates are sorted by information gain, the replacement of the best shapelet candidates is completed, and the best k shapelets are finally output.
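A minimal sketch of the top-k confirmation of step 3.3.3: each surviving candidate is scored by the information gain of its best distance split over the labelled training series, and the k highest-gain candidates are kept. Here `distances_to`, assumed to return the SubsequenceDist from every training series to a candidate as a NumPy array, is a hypothetical callback standing in for the rest of the model.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def best_split_gain(distances, labels):
    """Try each threshold d_th; return the largest information gain Gain(S, d_th)."""
    distances, labels = np.asarray(distances), np.asarray(labels)
    base, n, best = entropy(labels), len(labels), 0.0
    for d_th in np.unique(distances):
        left, right = labels[distances < d_th], labels[distances >= d_th]
        if len(left) == 0 or len(right) == 0:
            continue                               # degenerate split, skip
        gain = base - (len(left) / n) * entropy(left) \
                    - (len(right) / n) * entropy(right)
        best = max(best, gain)
    return best

def top_k_shapelets(candidates, distances_to, labels, k):
    """Rank candidates by best-split information gain and keep the top k."""
    return sorted(candidates,
                  key=lambda c: best_split_gain(distances_to(c), labels),
                  reverse=True)[:k]
```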
Step 4) shapelet transformation and classifier construction:
4.1) shapelet transformation: first, a simple initialized data matrix is established for the initial N time series data sets according to their number, and all shapelet candidate sets obtained by the multi-level framework method are arranged into a matrix in the order of the time series to which they belong; second, according to the many-to-many mapping between the initial N time series sets and the shapelet matrix, Euclidean-distance similarity is computed to obtain the feature values of each time series, where each feature attribute represents one shapelet and the value of each attribute is the distance from that shapelet to the original series; finally, the feature values are assembled into N feature vectors, completing the feature-vector representation of the time series data set;
4.2) after the classifier is established for the time series, the subsequent training samples are fed into the classifier for training; during training, the ELM first randomly generates the input weights and hidden-node thresholds, and then computes the output weights of the SLFN from the training data.
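A minimal sketch of the ELM training of step 4.2, assuming a single hidden layer with sigmoid activation: the input weights W and hidden-node biases b are drawn at random and never trained, and the output weights beta are the least-squares solution obtained with the pseudo-inverse of the hidden-layer output H. The layer size and seed are illustrative. Here X would be the N x k shapelet-transformed feature matrix of step 4.1 and Y a one-hot label matrix, with the predicted class taken as the argmax of each output row.

```python
import numpy as np

class ELM:
    def __init__(self, n_hidden=50, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))  # sigmoid layer H

    def fit(self, X, Y):
        # input weights and hidden-node thresholds: random, never trained
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ Y            # output weights of the SLFN
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta           # argmax per row gives the class
```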
Example 1:
(1) A multi-level time series shapelet classification model is constructed, comprising three stages; the overall workflow of the model is shown in FIG. 1 and its staged work distribution in FIG. 2, the stages being a candidate set acquisition stage, a multi-level screening stage, and a shapelet conversion stage. First, the time series is reduced in dimension with the SAX algorithm and subsequences are extracted from the series with the sliding-window method; then the shapelet candidate set is clustered with the DTW-based clustering method.
FIG. 3 shows an example of extracting subsequences with the sliding-window method. The symbolic representation of the time series data after SAX dimensionality reduction is {dcbbacdcbdcacd}; the sliding window size W is set to 3, and the subsequences extracted in order from the left are {dcb, cbb, bba, ..., acd}. The extracted subsequences are assembled into a set, which is then partitioned into shape candidates by the DTW-based clustering method.
(2) The multi-level shapelet framework is constructed; candidate subsequences with similar shapes can form within each time series. These very similar candidate subsequences are mapped into one level by the DTW clustering method, and through the hierarchical combination of multiple levels the SH-ELM framework model is constructed.
Definition 6: information gain. Given a splitting strategy sp that divides the data set D into two subsets D1 and D2, let I(D) be the entropy before splitting and $\hat{I}(D)$ the weighted entropy after splitting, where f(D1) and f(D2) are the fractions of objects assigned to D1 and D2. The information gain of the splitting rule is

$$\mathrm{Gain}(sp) = I(D) - \hat{I}(D) = I(D) - \bigl(f(D_1)\,I(D_1) + f(D_2)\,I(D_2)\bigr)$$
Definition 7: optimal split point. The time series data set D consists of two classes A and B. For a shapelet candidate S, a distance threshold $d_{th}$ is selected that divides D into D1 and D2 such that for each time series object T1 in D1, SubsequenceDist(T1, S) < $d_{th}$, and for each time series object T2 in D2, SubsequenceDist(T2, S) ≥ $d_{th}$. The optimal split point $d_{osp}(D, S)$ is the distance threshold satisfying Gain(S, $d_{osp}(D, S)$) ≥ Gain(S, $d'_{th}$) for every other threshold $d'_{th}$.
Definition 8: shapelet with the optimal split point. Given a time series data set D consisting of two classes A and B, shapelet(D) is the subsequence whose corresponding optimal split point satisfies Gain(shapelet(D), $d_{osp}(D, \mathrm{shapelet}(D))$) ≥ Gain(S, $d_{osp}(D, S)$) for every candidate subsequence S.
FIG. 4 shows the architecture of the multi-level shapelet framework model. It can be seen that candidate sets with very similar shapes in the 5 time series (T1, T2, T3, T4, T5) are assigned to the same level, and the candidate shapelet set of each time series is effectively partitioned by shape, forming the layout after DTW clustering; the shapelet candidates in each level are then selected with the information-gain computation of Definitions 6-8, keeping the k shapelets with the largest information gain.
FIG. 5 shows the hierarchical relations generated in the multi-level model; the candidate shapelet sets in each level are processed with the Levenshtein Distance algorithm. Computing the distance between the subsequences dbdbdb and dbdcb produced in FIG. 4 allows the candidate set to be filtered further, so that the filtered subsequence set still preserves the information-gain effect: the retained subsequence reduces the later computation through these string operations, again reducing the later classification time to a certain extent and improving classification efficiency.
(3) Using the shapelet transformation technique, matrix conversion is performed between the obtained shapelet set and the initial time series set to obtain the spatial feature vectors used as the input of the ELM classifier, directly linking the constructed SH-ELM model to the classifier. For each time series $T_i$, the distances to the k shapelets are computed in turn and combined into a distance vector; this vector is the representation of the series among the shapelets, and each value in the vector reflects the sequential relation of the data in the time series.
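A minimal sketch of this transformation step, under the same assumptions as the sketches above and with numeric (or symbol-decoded) sequences: SubsequenceDist of Definition 4 is realized as the minimum Euclidean distance over all alignments of the shapelet within the series, assuming each shapelet is no longer than the series, and each series becomes its k-dimensional distance vector. The resulting N x k matrix is the feature representation fed to the ELM classifier.

```python
import numpy as np

def subsequence_dist(series, shapelet):
    """SubsequenceDist(T, S): minimum Euclidean distance over all alignments."""
    t, s = np.asarray(series, float), np.asarray(shapelet, float)
    l = len(s)
    return min(float(np.linalg.norm(t[p:p + l] - s))
               for p in range(len(t) - l + 1))

def shapelet_transform(dataset, shapelets):
    """Map each series T_i to its distance vector over the k shapelets."""
    return np.array([[subsequence_dist(t, s) for s in shapelets]
                     for t in dataset])
```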
In terms of parameter settings, the shapelet candidate generation process involves many factors. Because the influence of shapelet length is removed by clustering the shapelets with the DTW distance, the influence of the k value is considered instead: the number of shapelets is determined by the key parameter k, and k is also a factor affecting the experiment time, so the effect of different k values on classification accuracy is tested.
(4) Performing time series data classification
The performance of the screening optimization method is evaluated in detail under various experimental settings, including the influence of different parameters of the SH-ELM model on classification accuracy and extraction speed. The parameter settings include the choice of k, the size of the data sets, and the length of each data set; the classification experiments of the SH-ELM model cover many types of data sets. The proposed SH-ELM model is evaluated against several current mainstream time series shapelet extraction and classification algorithms, including FSH (Fast-Shapelet), DivShap-ELM (Diversified Top-k Shapelet), and FLAG. The experiments cover 4 aspects, described below.
varying influence of changing k-value parameter
FIGS. 6a-6b show the effect of varying the k value of the SH-ELM model on running time on the real and synthetic data sets, with the k value on the horizontal axis and the running time on the vertical axis. As k increases, the computation time of the proposed SH-ELM algorithm on both data sets is better than that of the DivShap-ELM algorithm, both early and late in the range of k values. The proposed SH-ELM algorithm screens redundant candidates while building the subsequence candidate set and reduces the amount of shapelet-extraction computation, thereby reducing the time consumed.
The experimental graphs in FIGS. 7a-7b show the influence of varying k on time series classification accuracy and compare the proposed SH-ELM model with the other algorithms. As k varies, classification accuracy on the time series data sets improves, and the algorithm's accuracy is better than that of the FSH and FLAG algorithms, indicating that the shapelet set at this point has good discriminative ability on the data sets. The SH-ELM algorithm applies the candidate-set update strategy, replacing and optimizing the shapelet set in the model, which improves the adaptivity and accuracy of the classifier.
Influence of varying the number of levels
In the experimental graphs of FIGS. 8a-8b, the horizontal axis is the number of levels and the vertical axis the running time; the figures show the effect of the number of levels on classification running time on the real and synthetic data sets, respectively. As the number of levels increases, the running time of classification rises: with more levels, the assignment of candidates to levels becomes more precise and the required computation grows, so the running time of the model increases accordingly.
Comparison of model classification time and speed
Since the SH-ELM model spends the most time in the shapelet classification experiment stage, the experiments compare the time and speed of this stage with other mainstream shapelet extraction methods. FIG. 9 compares the classification running times; the algorithm of the invention is compared with the Fast-Shapelet (FSH), DivShap-ELM, and FLAG algorithms. In the SH-ELM model, the SAX dimensionality-reduction method first effectively reduces the dimension of the whole time series data set, making the search in the shapelet discovery phase faster; second, in the shapelet search stage, the DTW-clustering shapelet method replaces the original algorithm, traversing the subsequences in turn and then searching out the candidate set during shapelet discovery. The DivShap-ELM algorithm starts computing subsequences at the initial part of the whole time series. As the length of the time series data increases, the running time of the proposed algorithm is shorter than that of the original Fast-Shapelet and DivShap-ELM algorithms.
Comparison of model classification accuracy
FIG. 10 shows the classification accuracy of the different time series classification algorithms. As the time series length increases, the accuracy of the various algorithms fluctuates, and the other comparison algorithms outperform the proposed SH-ELM model in the intermediate stage of the classification task, because changing the length of the time series affects the extraction of shapelet subsequences to a certain extent. It can also be seen that, as the length of the time series data increases, the classification accuracy of the SH-ELM model is higher in the initial and later stages, because the multi-level model can select subsequences of lower mutual similarity, making time series classification more accurate, slightly above FSH and DivShap-ELM.

Claims (5)

1. A time series data classification method based on multi-level shapelets, characterized by comprising the following steps:
step 1) preprocessing the time series data: performing dimensionality reduction on the original time series with the SAX method;
step 2) obtaining the initial subsequences of the time series: extracting the subsequence set from the time series with a sliding window, and indirectly controlling the extracted subsequence length by adjusting the window size;
step 3) discovering and extracting the multi-level shapelet candidate set: filtering and merging candidates through the proposed multi-level shapelet framework, and selecting the shapelets with large information gain as the candidate set;
step 4) shapelet transformation and classifier construction:
4.1) shapelet transformation: first, a simple initialized data matrix is established for the initial N time series data sets according to their number, and all shapelet candidate sets obtained by the multi-level framework method are arranged into a matrix in the order of the time series to which they belong; second, according to the many-to-many mapping between the initial N time series sets and the shapelet matrix, Euclidean-distance similarity is computed to obtain the feature values of each time series, where each feature attribute represents one shapelet and the value of each attribute is the distance from that shapelet to the original series; finally, the feature values are assembled into N feature vectors, completing the feature-vector representation of the time series data set;
4.2) after the classifier is established for the time series, the subsequent training samples are fed into the classifier for training; during training, the ELM first randomly generates the input weights and hidden-node thresholds, and then computes the output weights of the SLFN from the training data.
2. The time series data classification method based on multi-level shapelets according to claim 1, characterized in that in step 1) the concrete steps are as follows:
1.1) normalized piecewise approximation of the data: the initial time series data are transformed into a data set with mean 0 and variance 1 by the zero-mean (z-score) standardization method;
1.2) character representation of the processed data: the mean of each segment is mapped into a Gaussian distribution table whose range represents the expression range of the dimensionality-reduced time series, and the symbolization operation is performed according to the initialized parameters: the number of segments w, the alphabet size r, and the breakpoints β, completing the symbolic aggregate approximation.
3. The time series data classification method based on multi-level shapelets according to claim 1, characterized in that in step 2) the concrete steps are as follows:
First, the size of the sliding window is set, fixing the length and range of each extracted subsequence; second, the window is slid one position to the right at a time, changing its location in the time series and extracting the subsequences at different positions; finally, the window size is adjusted to extract all subsequences of different lengths, and the extracted subsequences are stored in a set.
4. The time series data classification method based on multi-level shapelets according to claim 1, characterized in that in step 3) the concrete steps are as follows:
3.1) initial subsequence clustering based on k-means: after the subsequences of all time series are extracted, the candidate subsequences are clustered. The DTW distance is introduced as the measurement index to filter and screen the subsequence set: the DTW distance expresses the similarity of subsequence shapes, and all alternative shapelet candidates are partitioned with the DTW-based algorithm so that the candidates within the same cluster have similar shape characteristics;
The shapelet similarity is computed from the DTW distance. Given two different shapelet sequences $X = \{x_1, x_2, \ldots, x_M\}$ and $Y = \{y_1, y_2, \ldots, y_N\}$, first compute the distance matrix

$$D_{ij} = (x_i - y_j)^2$$

and then the accumulated distance matrix

$$S_{ij} = D_{ij} + \min(S_{i-1,j},\ S_{i,j-1},\ S_{i-1,j-1})$$
3.2) updating the clustering result: after the subsequence candidate set is clustered by combining the k-means and DTW methods, the clustering result is iterated and updated in real time to ensure that the subsequence clusters satisfy the shape-approximation property, enabling a definite partition in the subsequent shapelet candidate extraction;
3.3) establishing the multi-level shapelet extraction framework.
5. The time series data classification method based on multi-level shapelets according to claim 4, characterized in that in step 3.3) the concrete steps are as follows:
3.3.1) intra-level candidate set merging: first, the hierarchical division of all clustered subsequences is completed according to the 'heaps' (clusters) produced by subsequence clustering; second, the candidates within a level are consolidated through their inherent 'approximation' relation: screening by the approximate shape characteristics, candidates with similar shapes are merged, while candidates with distinctive shape features, which carry discriminative ability and interpretability, are updated and retained. A threshold on the DTW distance is given; two candidates whose distance is below the threshold are very similar in shape, and only a representative of the candidates within the threshold range is retained, reducing the set. Finally, a simplified shapelet candidate set is obtained at each level;
3.3.2) inter-level candidate set merging: in the SH-ELM model, the Levenshtein Distance algorithm is used to merge candidate sets across levels. Let the lengths of two strings a and b be |a| and |b|; the Levenshtein distance is computed by the recurrence

$$\mathrm{lev}_{a,b}(i,j)=\begin{cases}\max(i,j), & \text{if } \min(i,j)=0\\ \min\bigl(\mathrm{lev}_{a,b}(i-1,j)+1,\ \mathrm{lev}_{a,b}(i,j-1)+1,\ \mathrm{lev}_{a,b}(i-1,j-1)+1_{(a_i\neq b_j)}\bigr), & \text{otherwise}\end{cases}$$

where the indicator $1_{(a_i\neq b_j)}$ takes the value 0 when $a_i=b_j$ and 1 otherwise, and $\mathrm{lev}_{a,b}(i,j)$ is the edit distance between the first i characters of a and the first j characters of b. The similarity of a and b is then

$$\mathrm{Sim}_{a,b}=1-\mathrm{lev}_{a,b}(|a|,|b|)/\max(|a|,|b|)$$
In the merging process, candidate sets in adjacent levels of the framework are joined and compared: candidates between levels are screened with the Levenshtein Distance method by means of this character-based approximate distance calculation;
3.3.3) multi-level top-k candidate set confirmation: the information gain is taken as the criterion for measuring classification ability, and the k shapelets with the largest information gain within a single level are selected, the top-k shapelets completing the extraction task; the extracted candidate set is finally confirmed and used to complete the classification of the candidate time series. This is the process of extracting the k best shapelets from the data set: initially the k-shapelet set is empty; then a candidate shapelet sequence is obtained in each level, and its distance to the series in that level must be computed; once the distance values are obtained, the corresponding information gain is computed, the candidates are sorted by information gain, the replacement of the best shapelet candidates is completed, and the best k shapelets are finally output.
CN202010696976.2A 2020-07-20 2020-07-20 Time series data classification method based on multi-level shapelet Pending CN111814897A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010696976.2A CN111814897A (en) 2020-07-20 2020-07-20 Time series data classification method based on multi-level shapelet


Publications (1)

Publication Number Publication Date
CN111814897A true CN111814897A (en) 2020-10-23

Family

ID=72864946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010696976.2A Pending CN111814897A (en) 2020-07-20 Time series data classification method based on multi-level shapelet

Country Status (1)

Country Link
CN (1) CN111814897A (en)



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105975932A (en) * 2016-05-04 2016-09-28 广东工业大学 Gait recognition and classification method based on time sequence shapelet
CN110389975A (en) * 2019-08-01 2019-10-29 中南大学 Time series early stage classification method and equipment based on shapelet

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
CE0B74704937: "Levenshtein distance (edit distance)", Jianshu
QIUYAN YAN et al.: "Adapting ELM to Time Series Classification: A Novel Diversified Top-k Shapelets Extraction Method", ADC 2016: Databases Theory and Applications, vol. 9877, pages 1-13
余思琴; 闫秋艳; 闫欣鸣: "Time series clustering algorithm based on best u-shapelets", Journal of Computer Applications, no. 08
孙其法; 闫秋艳; 闫欣鸣: "Time series classification method based on diversified top-k shapelets transformation", Journal of Computer Applications, no. 02
文必龙; 李菲; 马强: "Research on K-means clustering algorithm for linear text", Computer Technology and Development, no. 09
闫欣鸣; 孟凡荣; 闫秋艳: "Shapelet classification method based on trend feature representation", Journal of Computer Applications, no. 08

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112416661B (en) * 2020-11-18 2022-02-01 清华大学 Multi-index time sequence anomaly detection method and device based on compressed sensing
CN112416661A (en) * 2020-11-18 2021-02-26 清华大学 Multi-index time sequence anomaly detection method and device based on compressed sensing
CN113239990A (en) * 2021-04-27 2021-08-10 中国银联股份有限公司 Method and device for performing feature processing on sequence data and storage medium
CN113409054A (en) * 2021-06-18 2021-09-17 浙大城市学院 Suspicious transaction identification model construction method
CN113435381A (en) * 2021-07-07 2021-09-24 贵州东方世纪科技股份有限公司 Method for identifying flood in field aiming at different watersheds
CN113608146A (en) * 2021-08-06 2021-11-05 云南电网有限责任公司昆明供电局 Fault line selection method suitable for forest fire high-resistance grounding condition
CN113608146B (en) * 2021-08-06 2023-12-19 云南电网有限责任公司昆明供电局 Fault line selection method suitable for forest fire under high-resistance grounding condition
CN113759278B (en) * 2021-08-10 2023-12-19 云南电网有限责任公司昆明供电局 Ground fault line selection method suitable for small-current grounding system
CN113759278A (en) * 2021-08-10 2021-12-07 云南电网有限责任公司昆明供电局 Ground fault line selection method suitable for small current grounding system
CN113836240B (en) * 2021-09-07 2024-02-20 招商银行股份有限公司 Time sequence data classification method, device, terminal equipment and storage medium
CN113836240A (en) * 2021-09-07 2021-12-24 招商银行股份有限公司 Time sequence data classification method and device, terminal equipment and storage medium
CN113986674A (en) * 2021-10-28 2022-01-28 建信金融科技有限责任公司 Method and device for detecting abnormity of time sequence data and electronic equipment
CN114726589A (en) * 2022-03-17 2022-07-08 南京科技职业学院 Alarm data fusion method
CN114372538A (en) * 2022-03-22 2022-04-19 中国海洋大学 Method for convolution classification of scale vortex time series in towed sensor array
CN115357716B (en) * 2022-08-30 2023-07-04 中南民族大学 Learning time sequence data classification method integrating word bag model and graph embedding
CN115357716A (en) * 2022-08-30 2022-11-18 中南民族大学 Time sequence data representation learning method integrating bag-of-words model and graph embedding
CN117493857A (en) * 2023-11-15 2024-02-02 国网四川省电力公司眉山供电公司 Electric energy metering abnormality judging method, system, equipment and medium
CN117407733A (en) * 2023-12-12 2024-01-16 南昌科晨电力试验研究有限公司 Flow anomaly detection method and system based on countermeasure generation shapelet
CN117407733B (en) * 2023-12-12 2024-04-02 南昌科晨电力试验研究有限公司 Flow anomaly detection method and system based on countermeasure generation shapelet


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201023