CN113705715B - Time sequence classification method based on LSTM and multi-scale FCN - Google Patents


Info

Publication number
CN113705715B
CN113705715B (application CN202111034788.4A)
Authority
CN
China
Prior art keywords
scale
lstm
time
convolution
fcn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111034788.4A
Other languages
Chinese (zh)
Other versions
CN113705715A (en)
Inventor
陈志奎 (Chen Zhikui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Juzhi Information Technology Co ltd
Original Assignee
Dalian Juzhi Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Juzhi Information Technology Co ltd filed Critical Dalian Juzhi Information Technology Co ltd
Priority to CN202111034788.4A priority Critical patent/CN113705715B/en
Publication of CN113705715A publication Critical patent/CN113705715A/en
Application granted granted Critical
Publication of CN113705715B publication Critical patent/CN113705715B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a time series classification method based on LSTM and multi-scale FCNs, belonging to the field of time series classification. The method sets a general structure for the multi-modal network; extracts time-dependent features with a long short-term memory network; fully mines geometric spatial features of the time series curve at multiple granularities with a fully convolutional module; integrates the spatio-temporal features and distinguishes samples according to them; and fully trains the model with the back-propagation algorithm. The method can comprehensively explore spatial features over large-scale, multi-scale receptive fields while adaptively learning long- and short-term time dependencies, so the useful information it learns is more comprehensive than that of existing models. With a more comprehensive grasp of the discriminative characteristics of time series, it can give more accurate judgments.

Description

Time sequence classification method based on LSTM and multi-scale FCN
Technical Field
The invention relates to the field of time sequence classification, in particular to a time sequence classification method based on LSTM and multi-scale FCN.
Background
In the big data age, structured and unstructured data of all kinds are everywhere. Time series data, like images and text, is an extremely common form: numerical data obtained by continuously sampling one or more physical quantities at equal time intervals. Time Series Classification (TSC) is a ubiquitous and significant task. For example, in industry, the pressure applied to mechanical equipment and the data collected by vibration sensors are time series, from which one can judge whether a part or the whole machine is faulty and what kind of fault has occurred, and thus give maintenance suggestions; in medicine, waveform data such as electrocardiograms are time series, and classifying them with artificial-intelligence methods can improve the efficiency of medical workers; in finance, prediction and classification of the historical trends of securities, stocks and other products can assist investors in making decisions. In addition, non-sequence data can be converted into a time series representation and the problem solved by classification: for example, researchers have extracted the edge curves of plant-leaf and animal-bone images and determined the species by classifying such curves.
The current methods applied to time series classification tasks can be grouped into the following categories:
Distance-metric-based methods: these rely on the distances between the samples to be classified as the information on which the classification task is based. Current research on distance-based TSC methods focuses mainly on optimizing the distance measure and on new ways of using the distance information; the classifiers used are conventional, and there is rarely innovation on the classifier itself, because it is not critical to the performance of such methods. The inter-sample distance is typically computed with elastic metrics, most typically Dynamic Time Warping (DTW). Many schemes alleviate DTW's pathological-alignment problem by imposing constraints on the warping path, such as weighting, windowing, improved step patterns, and limits on the time-step difference between aligned points. These overcome DTW's shortcomings to some extent, but no method yet perfectly solves the problem of measuring distance between unaligned time series, and distance-based TSC methods, while simple to apply, perform relatively poorly.
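As an illustration of the elastic metrics discussed above, here is a minimal pure-Python sketch of DTW with an optional Sakoe-Chiba window constraint; the function name, squared-error local cost, and `window` parameter are choices of this sketch, not taken from the patent:

```python
def dtw_distance(a, b, window=None):
    """Dynamic Time Warping distance between two sequences.

    `window` is a Sakoe-Chiba band half-width, illustrating the kind of
    warping-path constraint mentioned above; None means unconstrained.
    """
    n, m = len(a), len(b)
    w = max(window, abs(n - m)) if window is not None else max(n, m)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # monotone step pattern: match, insertion, or deletion
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m] ** 0.5
```

Unlike a pointwise (Euclidean) comparison, the warping path lets a shifted peak align with its counterpart at zero cost, which is exactly why elastic metrics tolerate temporal misalignment.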
Feature-representation-based methods: these convert the original sequence into a feature space in which differences are easier to detect. The feature representation of a time series can be discrete and of lower dimension, which overcomes the excessive attribute dimensionality of time series data and the temporal misalignment of samples, so that more conventional classifiers can be used for time series classification. Many such methods exist, building a variety of feature representations; the most common are the Shapelet transform and symbolic representations, along with trend features, spike extraction, domain transforms, segment statistics, and others. However, this class of methods has limitations: some feature-extraction operations are complex and time-consuming; some detail of the original data is inevitably lost in the conversion; and whether the manually designed feature-selection scheme is sound has a large influence on the result.
Ensemble-learning-based methods: ensemble learning combines multiple weak base classifiers into a stronger classification model, reducing variance or improving task performance. Once a TSC method has reached a certain accuracy, ensembling it is a classical way to improve performance further. Methods in this class include COTE and HIVE-COTE, based on domain-transform feature representations; NNE, an ensemble of neural networks; and PROP, which integrates different elastic distance measures. Ensemble TSC models achieve relatively high accuracy, but that accuracy is bounded by the base models, and ensembles typically require training tens of base models or extracting many kinds of features, so ensemble-based methods are usually large and relatively complex.
Deep-learning-based methods: deep models commonly used in the time series classification field include the multi-layer perceptron (MLP), convolutional neural network (CNN), recurrent neural network (RNN), and autoencoder. Many methods combine these base models into multi-modal neural networks, such as MC-DCNN, MCNN, and LSTM-FCN. Such methods have some fault tolerance to the input data, so sequence misalignment affects them less, and they can still achieve high accuracy with the raw time series as input; they learn features in the data automatically, avoiding the blindness of manual extraction; and they can learn directly from the raw data without losing detail, while also accepting manually extracted key features as additional input to reinforce the learning of key information. Compared with non-deep-learning ensembles, a single deep model achieves similar accuracy with a simpler, lighter structure, and multi-modal architectures achieve the current best results in the TSC field.
Time series classification can solve many practical problems across a variety of fields. However, time series data presents some challenges:
(1) The attribute dimension is too high (i.e., the time series is too long). Long sequences make global regularities inconspicuous and increase the learning difficulty. In addition, excessive attribute dimensionality makes some high-complexity methods impractical to apply.
(2) The samples are not aligned in time. Although time series are sampled at equal intervals, different degrees of delay can occur in real environments, so the resulting sequences are not perfectly aligned across time steps and dislocation often occurs. One therefore cannot directly use the difference between samples at the same position as a basis for classification, and the global similarity between samples becomes harder to compute.
Both of the above challenges make it difficult to obtain information useful for classification directly from the original time series.
Disclosure of Invention
The fully convolutional network (FCN) is one of the most powerful tools in the TSC field, with strong feature-learning capability. Supplementing it with time-dependent features, or organizing multiple FCNs of different structures to fully mine multi-scale spatial features, can each enhance task performance, but no existing method combines both advantages at the same time to give a more comprehensive depiction of the time series. In view of this, the present invention proposes a time series classification method based on LSTM and multi-scale FCN (LSTM Multi-scale FCN, LSTM-MFCN). The model consists of a multi-scale-convolution FCN module and an LSTM module, so it can perceive the shape features of the time series curve at multiple scales while retaining the gain brought by temporal features. This multi-modal architecture can fully mine the multi-scale features contained in high-dimensional time series from two angles, and thus give more accurate judgments.
In order to achieve the above purpose, the specific technical scheme adopted by the invention is as follows:
A time series classification method based on LSTM and multi-scale FCN comprises a multi-scale FCN module that performs multi-scale convolution and an LSTM module, and specifically comprises the following steps:
(1) Extract time-dependent features with a long short-term memory network, obtaining the dependency between the current value of the observed variable and its history and describing the temporal associations inside the sequence. These associations differ somewhat between instances of different categories, so time dependence is itself an important discriminative feature for the classification task.
(2) Fully mine spatial features of the time series curve at multiple granularities with a fully convolutional module. The convolution kernels of the first two layers of the classical FCN structure are split into several groups, the number of scales being denoted M; the larger-scale part is realized by dilated (atrous) convolution and the smaller-scale part by ordinary convolution. The multi-granularity abstract features extracted at each layer are merged by the MFCN module and passed uniformly to the next layer. At the last layer of the MFCN module, single-scale convolution is used for feature extraction and global pooling integrates the features across convolution kernels as the final output features.
Time series data generally contains shape features of various sizes. Large-scale features reflect the long-range trends of the sequence, while small-scale features indicate subtle changes in local regions; an excellent TSC model should be able to capture features at different scales.
In general, the range of data a convolution layer can perceive can be enlarged by pooling or by increasing the kernel size, but the former loses information and the latter increases the number of trainable parameters. Dilated convolution enlarges the receptive field without compressing information or adding parameters. The invention realizes multi-scale receptive fields with fixed-size kernels at different dilation rates, so that larger scales can be constructed at the same parameter scale.
Specifically, the invention splits the convolution kernels of the first two layers of the classical FCN structure into several groups, the number of scales being denoted M, where the larger-scale part is realized by dilated convolution and the smaller-scale part by ordinary convolution. Since the feature extraction performed at each layer of a deep convolutional neural network may depend on the previous layer's features at multiple scales, the multi-granularity abstract features extracted at each layer are merged by the model's MFCN (Multi-scale FCN) module and passed uniformly to the next layer. At the last layer of the MFCN module, the features are already sufficiently abstract, so single-scale convolution is used for feature extraction and global pooling integrates the features across convolution kernels as the final output.
(3) Integrate and discriminate on the spatio-temporal features: concatenate the temporal and spatial features and adaptively learn the mapping between them and the sample categories with a fully connected neural network, yielding the LSTM-MFCN model. The geometric-spatial and time-dependent features given in the two parts above are information learned from the time series from different angles; combining the two lets the model grasp the discriminative characteristics of the time series more comprehensively and thus give more accurate judgments.
Preferably, the lateral sizes of the convolution kernels of the layers of the multi-scale FCN module are 8, 5 and 3, the total numbers of convolution kernels are 128, 256 and 128, the dilation rate d is at most 4, and the dilation rates of the layers form a pyramid structure.
Preferably, the proportion of receptive fields of each scale per layer of the multi-scale FCN module is variable; a hyperparameter w_i is set as the adjustable proportion, and the actual number of convolution kernels NF_{L,i} of the i-th scale of layer L is calculated as NF_{L,i} = (w_i / Σ_{j=1}^{M} w_j) · NF_L, where NF_L is the total number of convolution kernels of layer L.
Preferably, the LSTM module transposes the dimensions of the univariate time series data, converting it into serial input of one value per time step, and then the number of LSTM neurons is adjusted according to the complexity of the temporal features in the specific data and the training capacity of the model.
Preferably, the LSTM neuron number is 8, 64, or 128.
Preferably, the LSTM module further performs pruning operations.
Preferably, the Dropout rate of the pruning operation is set to 0.8.
Preferably, in step (3), the concatenation and integration of the temporal and spatial features specifically uses one fully connected layer with a SoftMax activation function.
Preferably, step (3) further comprises training the LSTM-MFCN model using an error back propagation algorithm, preserving the model with minimal error.
The beneficial effects of the invention are as follows: aiming at the difficulty of obtaining discriminative information directly from the original time series, the invention designs a time series classification method based on LSTM and multi-scale FCN. The method can comprehensively explore spatial features over large-scale, multi-scale receptive fields while adaptively learning long- and short-term time dependencies, and the useful information it learns is more comprehensive than that of existing models.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of the structure of LSTM-MFCN of the present invention;
FIG. 2 is a flow chart of a time series classification method based on LSTM and multi-scale FCNs in accordance with the present invention;
FIG. 3 is a graph of model critical differences for experimental comparisons;
FIG. 4 is a graph of accuracy versus structure for a FCN-dependent series of models or structures;
FIGS. 5a and 5b are LSTM-MFCN experimental results and their comparison to baseline models.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, other embodiments that may be obtained by those of ordinary skill in the art without making any inventive effort are within the scope of the present invention.
As shown in figs. 1-5b, the invention provides a time series classification method based on LSTM and multi-scale FCN, taking dual-scale and triple-scale convolution structures as specific embodiments. It consists of a multi-scale FCN module performing multi-scale convolution and an LSTM module: the multi-scale FCN module fully extracts geometric-spatial features of the time series curve at multiple granularities, the LSTM module learns features of how the sequence values change over time, and the feature vectors output by the two modules are concatenated, learned by one layer of neurons, and converted into the classification result. The method specifically comprises the following steps:
(1) Extract time-dependent features with a long short-term memory network, obtaining the dependency between the current value of the observed variable and its history and describing the temporal associations inside the sequence;
(2) Fully mine spatial features of the time series curve at multiple granularities with a fully convolutional module: split the convolution kernels of the first two layers of the classical FCN structure into several groups, the number of scales being denoted M, with the larger-scale part realized by dilated convolution and the smaller-scale part by ordinary convolution; merge the multi-granularity abstract features extracted at each layer in the MFCN module and pass them uniformly to the next layer; and at the last layer of the MFCN module, extract features with single-scale convolution and integrate the features across convolution kernels with global pooling as the final output features;
(3) Integrate and discriminate on the spatio-temporal features: concatenate the temporal and spatial features and adaptively learn the mapping between them and the sample categories with a fully connected neural network, yielding the LSTM-MFCN model.
The method according to the invention is explained in detail below:
A deep neural network requires structural settings before training. For neural networks, especially multi-modal networks, the usual practice is to fix the general structure of the model directly, keeping that part's hyperparameters unchanged across all data sets, and then to perform a limited grid search over the remaining hyperparameters for each data set, evaluating by overall performance across all data sets. Previous studies have shown the network structures at which the FCN and LSTM-FCN models are near optimal; the invention builds on this basic structure, with the constant hyperparameters detailed in Table 1.
Table 1 Structural hyperparameters of the classical multi-modal neural network LSTM-FCN
The lateral kernel sizes of the multi-scale FCN convolution module are 8, 5 and 3, and the total numbers of convolution kernels are 128, 256 and 128. The multiple scales are realized by adjusting the dilation rate: the larger-scale part uses dilated convolution and the smaller-scale part ordinary convolution. This allows more diverse features to be extracted at limited depth on an equivalent parameter scale.
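The effect of the dilation rate on the receptive field can be sketched in a few lines of Python (a "valid" 1-D convolution; the function name and list-based layout are illustrative, not from the patent):

```python
def dilated_conv1d(x, kernel, d=1):
    """1-D 'valid' convolution with dilation rate d.

    With d = 1 this is ordinary convolution; with d > 1 the kernel taps
    are spaced d steps apart, so a size-k kernel covers (k-1)*d + 1
    time steps without adding any parameters.
    """
    k = len(kernel)
    span = (k - 1) * d + 1  # receptive field of one kernel application
    return [sum(kernel[j] * x[i + j * d] for j in range(k))
            for i in range(len(x) - span + 1)]
```

With the first-layer kernel size of 8 used here, a dilation rate of 4 covers (8-1)*4+1 = 29 time steps with the same 8 weights as an ordinary convolution.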
Too high a dilation rate makes the span between adjacent perceived data points too large, so the convolution operation cannot extract valid features; hence the dilation rate d in this method is at most 4. In addition, CNN-family models mostly keep the per-layer receptive fields in a pyramid structure, with the feature size shrinking after layer-by-layer extraction. The choice of dilation rate per layer follows the same principle, for example the three-layer combination 4, 2, 1.
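The growth of the receptive field under such a pyramid of dilation rates can be checked numerically; the following small sketch assumes the kernel sizes 8, 5, 3 and the example dilation combination 4, 2, 1 named above:

```python
def stacked_receptive_field(kernel_sizes, dilations):
    """Receptive field, in time steps, of a stack of stride-1 dilated
    convolution layers: each layer adds (k - 1) * d steps."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf
```

For the kernel sizes 8, 5, 3, ordinary convolution (all d = 1) yields a receptive field of 14 time steps, while the pyramid 4, 2, 1 enlarges it to 39 with the same parameter count.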
Considering that the proportions of different-scale features exhibited by each time series data set are not necessarily the same, the proportion of receptive fields of the various scales is also adjustable; a hyperparameter w_i is set as the adjustable ratio. The actual number of convolution kernels NF_{L,i} of the i-th scale of layer L is calculated as in equation 1:

NF_{L,i} = (w_i / Σ_{j=1}^{M} w_j) · NF_L,    (1)

where NF_L is the total number of convolution kernels of layer L.
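Equation 1 can be made concrete with a small helper; integer rounding via floor division is an assumption of this sketch, since the patent does not specify how fractional kernel counts would be handled:

```python
def kernels_per_scale(ratios, total):
    """Split a layer's `total` convolution kernels across the M scales
    in proportion to the ratio w_1 : ... : w_M, as in equation 1."""
    s = sum(ratios)
    return [total * w // s for w in ratios]
```

For example, with 128 first-layer kernels and the dual-scale ratio 3:1, the split is 96 and 32 kernels.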
Based on the above structural setup, two specific multi-scale structures will be used as embodiments of the proposed method:
1) When M=2, the model can be denoted LSTM-DFCN (Dual-scale FCN). In the multi-scale FCN module of LSTM-DFCN, each layer has receptive fields of two scales. The large-receptive-field kernels of the first layer use a dilation rate of 4 or 2, those of the second layer a dilation rate of 2, and the small receptive fields of the first two layers use conventional convolution. The kernel ratio w1:w2 of the two scales is one of 2:2, 3:1, or 1:3. Ultimately, only one of the cases with a first-layer d of 2 or 4 is retained as the method's behavior at the dual scale.
2) The structure with M=3 can be denoted LSTM-TFCN (Triple-scale FCN). The large-scale receptive fields of the first two layers use a dilation rate of 4, the middle receptive fields a dilation rate of 2, and the small receptive fields conventional convolution, i.e. d=1. The ratios w1:w2:w3 explored are 2:1:1, 1:2:1, and 1:1:2.
The long short-term memory network LSTM can adaptively capture the dependency between each current value of the time series' observed variable and the historical data.
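The gate mechanism by which an LSTM carries history forward can be sketched for a scalar input and scalar state; the weight layout `W` and function shape are illustrative choices, not the patent's parameterization:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step. W maps each gate name to a (w_x, w_h, b) triple."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])    # input gate
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])    # forget gate
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])    # output gate
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])  # candidate
    c = f * c_prev + i * g   # cell state mixes retained history with new input
    h = o * math.tanh(c)     # hidden state exposed to the rest of the model
    return h, c
```

Iterating `lstm_step` over the serialized time steps is what lets the module describe the temporal associations inside the sequence.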
Since the LSTM is not a point-to-point model but depends on historical state, the univariate time series data is first dimension-transposed and converted into serial input of one value per time step. The number of LSTM neurons can then be adjusted among 8, 64 and 128 depending on the complexity of the temporal features in the particular data and the model's training capacity. To reduce this part's complexity, pruning is used to compress it, culling the portions that contribute little while enhancing the robustness of the extracted features. The Dropout rate of the pruning operation is set to 0.8.
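The Dropout-based pruning described above can be sketched as inverted dropout; the rescaling-by-1/keep convention and the function name are assumptions of this sketch:

```python
import random

def dropout(features, rate=0.8, training=True, rng=random):
    """Inverted Dropout: during training, zero each feature with
    probability `rate` and rescale the survivors by 1/(1 - rate) so the
    expected activation is unchanged; at inference time, pass through."""
    if not training:
        return list(features)
    keep = 1.0 - rate
    return [f / keep if rng.random() >= rate else 0.0 for f in features]
```

A rate of 0.8 means roughly four out of five features are zeroed on each training pass, which is the aggressive compression the text describes.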
A fully connected layer integrates the temporal and spatial features and makes the classification prediction. The two features extracted by LSTM and MFCN are not direct classification results and may differ somewhat in scale, so the two kinds of information must be adaptively integrated and converted into a predicted class. The fully connected structure, the most traditional neural network, can fit arbitrarily complex nonlinear functional relationships; here one fully connected layer with a SoftMax activation function serves as the final output layer, so its number of neurons should equal the number of classes to be distinguished. During training, the probability values output by the model are compared directly with the one-hot class labels to compute the model's error; at test time, the class with the maximum probability is output as the prediction.
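The final integration step, concatenation followed by one dense layer and SoftMax, can be sketched as follows; the weight layout `W[class][feature]` is a choice of this sketch:

```python
import math

def softmax(z):
    m = max(z)                        # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def classify(h_time, h_space, W, b):
    """Concatenate the LSTM and MFCN feature vectors, apply one fully
    connected layer, and return SoftMax class probabilities."""
    h = h_time + h_space              # feature concatenation
    z = [sum(wi * hi for wi, hi in zip(row, h)) + bi
         for row, bi in zip(W, b)]
    return softmax(z)
```

During training the returned probabilities would be compared against the one-hot labels; at test time `max(range(len(p)), key=p.__getitem__)` gives the predicted class index.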
Based on the preceding steps, the complete LSTM-MFCN model is obtained; the flow of the time series through the model is shown in Algorithm 1. Table 2 gives the specific structural parameters of the two proposed embodiments.
The time series data is first split into two parts for processing in LSTM-MFCN. In the first two layers of the MFCN module (rows 2-8 of Algorithm 1), the data is convolved under the configured multi-scale structure, and the results are batch-normalized and passed through a rectified linear unit (ReLU) activation to obtain multi-granularity features (rows 4-5), which are merged as the input of the next layer (row 7). At the last MFCN layer, single-scale convolution extracts features and global pooling integrates and concatenates the features across kernels (rows 9-10). The LSTM network learns the data and outputs time-dependent features (rows 11-13). Finally, the fully connected layer integrates the spatial and temporal features output by the two parts, and a SoftMax function converts them into the classification result (rows 14-15).
Data flow in Algorithm 1 LSTM-MFCN
Table 2 LSTM-DFCN and LSTM-TFCN structural parameters
The model is then trained with the error back-propagation algorithm, retaining the model with the smallest error. Algorithm 2 gives the model construction process.
Some structural parameters of LSTM-MFCN are constant and are given in the first line of Algorithm 2. The parameters to be set are the number of scales M, the dilation rates d_1, d_2, …, d_M of each scale, and the set W of all possible multi-scale proportion combinations. The method first selects a candidate proportion w in W and computes the undetermined hyperparameters of the MFCN module under that proportion (Algorithm 2, rows 3-7); it then performs a limited search over the number of LSTM neurons, determining a specific model structure for each selected number N. Under each structure, the training set D is used to train LSTM-MFCN and the test set T is predicted (Algorithm 2, rows 8-14); the best result of this structure exploration is retained as the model's behavior at scale M.
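The outer loop of Algorithm 2 amounts to a small grid search over the proportion w and the LSTM width N. A sketch, where the callback `train_eval` is a stand-in for actually training LSTM-MFCN under that structure and scoring it on the test set:

```python
import itertools

def structure_search(ratios_W, lstm_sizes, train_eval):
    """Try every multi-scale proportion w and LSTM width N, keeping the
    configuration with the best score, as in Algorithm 2's outer loop."""
    best_cfg, best_score = None, float("-inf")
    for w, n in itertools.product(ratios_W, lstm_sizes):
        score = train_eval(w, n)   # stand-in: train the model, return accuracy
        if score > best_score:
            best_cfg, best_score = (w, n), score
    return best_cfg, best_score
```

For LSTM-TFCN this would enumerate the three ratios 2:1:1, 1:2:1, 1:1:2 against the LSTM widths 8, 64, 128.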
Algorithm 2 A time sequence classification method based on LSTM and multi-scale FCN
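The exploration loop of Algorithm 2 can be sketched as a bounded grid search over the candidate proportions and LSTM widths. Here `train_and_eval` and the score table are hypothetical stand-ins for the actual training and test runs (lines 8-14 of Algorithm 2).

```python
import itertools

def explore_structures(ratio_combos, neuron_counts, train_and_eval):
    """Bounded grid exploration in the spirit of Algorithm 2: for every
    candidate multi-scale proportion w in W and every LSTM width N, build,
    train and evaluate one concrete structure, keeping the best result.
    `train_and_eval(ratio, n)` returns a test accuracy; it is supplied by
    the caller and stands in for the real training procedure."""
    best = {"accuracy": -1.0, "ratio": None, "neurons": None}
    for ratio, n in itertools.product(ratio_combos, neuron_counts):
        acc = train_and_eval(ratio, n)
        if acc > best["accuracy"]:
            best = {"accuracy": acc, "ratio": ratio, "neurons": n}
    return best

# Hypothetical scores in place of real training runs:
scores = {((1, 1), 8): 0.90, ((1, 1), 64): 0.92,
          ((2, 1), 8): 0.91, ((2, 1), 64): 0.89}
best = explore_structures([(1, 1), (2, 1)], [8, 64],
                          lambda r, n: scores[(r, n)])
print(best)
```

With these made-up scores the search retains the (1, 1) proportion with 64 LSTM neurons, exactly the "keep the best structure found" behavior the text describes.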
In connection with the scheme of the invention, experimental analysis was performed as follows:
To verify the validity of the model, two specific structures of the proposed LSTM-MFCN were evaluated on the UCR time series classification benchmark. Because numerous deep-learning TSC models have been compared on the datasets used in the baseline experiments of the literature, this experiment likewise selects the same 44 datasets.
(1) Verification experiment
For the representative structures M=2 and M=3 (for M=2, only one variant is retained in the results), the numbers of receptive fields at the different scales admit 3 candidate ratios, and under each ratio the LSTM network can be given different neuron counts for bounded tuning; the specific flow is shown in Algorithm 2. To ensure fair comparison, 6 replicate experiments of equal budget were performed on LSTM-FCN (matching LSTM-MFCN's 3 multi-scale ratios with 2 replicates per structure, 6 runs in total). For LSTM-FCN, LSTM-DFCN and LSTM-TFCN alike, the model with the minimum training loss is used to evaluate the test set, and the best result on each dataset is taken as that model's performance in the experiment. Other hyper-parameter settings for the experiments are given in Table 3.
Table 3 LSTM-MFCN hyper-parameter selection for the verification experiments
After 100 epochs of training, the learning rate is reduced whenever the validation result fails to improve, until a final learning rate is reached.
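A minimal sketch of such a reduce-on-plateau schedule follows; the decay factor, patience and learning-rate floor below are illustrative assumptions, not the values used in the experiments.

```python
class PlateauDecay:
    """Scale the learning rate by `factor` whenever the validation metric
    fails to improve for `patience` consecutive checks, never dropping
    below `min_lr`. A sketch of the schedule described above; the default
    hyper-parameters are assumptions."""

    def __init__(self, lr, factor=0.5, patience=1, min_lr=1e-4):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best, self.wait = -float("inf"), 0

    def step(self, val_metric):
        if val_metric > self.best:          # improvement: reset the counter
            self.best, self.wait = val_metric, 0
        else:                               # stall: count and maybe decay
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

sched = PlateauDecay(0.1)
print(sched.step(0.80))   # improves: lr stays 0.1
print(sched.step(0.80))   # stalls: lr halves to 0.05
print(sched.step(0.90))   # improves again: lr stays 0.05
```

Deep-learning frameworks ship equivalent schedulers (e.g. reduce-on-plateau callbacks), so in practice this logic would not be hand-rolled.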
Since the proposed LSTM-MFCN is a multi-modal neural network, the baseline models are mainly deep-learning TSC methods, together with several representative, well-performing non-deep-learning methods for comparison. FCN and ResNet represent the basic single-mode deep learning methods; MCNN, MFCN, LSTM-FCN and GRU-FCN represent classical multi-modal neural network TSC models, where the input to MCNN contains representations of the time series at various down-sampling rates, which is equivalent to multi-scale spatial feature learning. MFCN is a multi-scale convolutional FCN model that differs structurally from the algorithm of this chapter and lacks the enhancement of an LSTM module. LSTM-FCN is one of the sources of inspiration for the proposed model, and GRU-FCN is an optimization attempt based on LSTM-FCN. Non-deep-learning TSC models, single-mode or ensemble, based on distance metrics, shapelets, symbolization, frequency-domain information and the like, are represented by LWDTW, BOSS, COTE, PROP and HIVE-COTE.
Common evaluation indicators in the TSC field include the overall classification accuracy or error rate, the number of times the best result is achieved, the average ranking of results, and the mean per-class error MPEC (mean per-class error). MPEC estimates the expected classification error rate of a model on an individual class and is defined by Equations 2 and 3:

PEC_i = E_i / C_i (2)

MPEC = (1/N) * sum_{i=1..N} PEC_i (3)

where N denotes the number of datasets and i indexes the i-th dataset; E_i denotes the error rate on the i-th dataset; C_i denotes the number of classes of the i-th dataset; and PEC_i denotes the error rate of dataset i averaged over its classes.
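The two definitions translate directly into code; the dataset figures in the example are made up for illustration.

```python
def per_class_error(error_rate, n_classes):
    """PEC_i = E_i / C_i (Equation 2)."""
    return error_rate / n_classes

def mpec(error_rates, class_counts):
    """MPEC = (1/N) * sum of PEC_i over the N datasets (Equation 3)."""
    pecs = [per_class_error(e, c) for e, c in zip(error_rates, class_counts)]
    return sum(pecs) / len(pecs)

# Two hypothetical datasets: 10% error over 2 classes, 20% error over 4
print(mpec([0.10, 0.20], [2, 4]))   # 0.05
```

Dividing by the class count keeps a model from being rewarded merely because a dataset has many classes, which is why MPEC complements the plain average error rate.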
In addition to scoring each model with the four indicators above, the Friedman rank-sum test, based on the ordering of the algorithms, can be used to evaluate overall whether multiple models perform differently across multiple datasets. The rank is the position of a model, among all compared models, in the ordering by classification accuracy on a given dataset; the rank sums can be obtained from the average ranks and the total number of datasets. Assuming k algorithms are compared on N datasets, the statistic FF constructed in the Friedman test is calculated as:

chi_F^2 = (12N / (k(k+1))) * (sum_j R_j^2 - k(k+1)^2 / 4) (4)

FF = ((N-1) * chi_F^2) / (N(k-1) - chi_F^2) (5)

where R_j is the average rank of the j-th algorithm over all datasets.
The null hypothesis H0 of the Friedman test is that the samples come from populations with no significant differences, i.e., there is no difference among the compared models. When FF exceeds the critical value, H0 is rejected, indicating that the algorithms differ significantly in performance. The differences should then be analyzed further, and the post-hoc Nemenyi test is commonly used for pairwise comparison between algorithms. The critical value CD (Critical Distance) for the average-rank difference in the Nemenyi test is calculated according to Equation 6:

CD = q_alpha * sqrt(k(k+1) / (6N)) (6)

where q_alpha is the test coefficient obtained by table look-up. When the average-rank difference of two algorithms is greater than CD, the difference is considered significant. The specific performance differences of all compared models are usually presented in the form of a critical-difference diagram.
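Both statistics are easy to compute once the average ranks are known; the following sketch uses the standard Friedman (Iman-Davenport form) and Nemenyi formulas given above, with small made-up inputs.

```python
import math

def friedman_ff(avg_ranks, n_datasets):
    """FF statistic (Equations 4-5) from the algorithms' average ranks."""
    k = len(avg_ranks)
    chi2 = (12 * n_datasets / (k * (k + 1))) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4)
    return (n_datasets - 1) * chi2 / (n_datasets * (k - 1) - chi2)

def nemenyi_cd(q_alpha, k, n_datasets):
    """Nemenyi critical distance (Equation 6)."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))

# Toy values: 3 algorithms ranked over 10 datasets
print(friedman_ff([1.5, 2.0, 2.5], 10))   # 3.0
print(nemenyi_cd(2.0, 3, 8))              # 1.0
```

In practice FF is compared against the F-distribution with (k-1) and (k-1)(N-1) degrees of freedom, and q_alpha comes from the studentized-range table, as the text notes.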
Experiments show that the dual-scale structure performs slightly better when the dilation rate of its first layer is 4, so the result in this case is taken as the performance of LSTM-DFCN. Figs. 5a and 5b present the full experimental results of the two specific structures of the proposed LSTM-MFCN on the UCR datasets. Max denotes the better of the two but does not participate in the ranking comparison. Bold font indicates that the model achieves the highest classification accuracy among all compared models on that dataset.
After the rank of each algorithm is calculated, the differences can be analyzed further. At a significance level of alpha=0.05, the experiment requires the post-hoc Nemenyi test, yielding the critical-difference diagram shown in Fig. 3, where CD=2.591. The algorithms are arranged by their average rank R_j, and there is no statistically significant difference between a set of algorithms covered by the same line (whose length is the CD value). It can be seen that LSTM-TFCN shows a significant improvement over LSTM-FCN. While LSTM-MFCN (here denoting LSTM-DFCN and LSTM-TFCN collectively) does not differ significantly from the stronger GRU-FCN and HIVE-COTE, within this group the two LSTM-MFCN structures perform best on average accuracy, average rank and MPEC, and are also leading in the number of times the highest accuracy is achieved.
Combining the detailed scores of Figs. 5a and 5b with the rankings and performance differences presented in Fig. 3, the following analysis can be made:
Among the baseline models, LWDTW is an excellent improved algorithm based on DTW, and PROP is a classifier integrating 11 different elastic distance measures, yet they achieve the worst performance in this comparison. The reason is that distance-based methods have difficulty giving a correct distance measure when sequences are misaligned. Methods based on feature representations and the ensemble non-deep-learning models are slightly inferior to the deep-learning methods. The classical BOSS and COTE ensemble methods perform close to the single-mode neural networks ResNet and FCN, but inferior to the multi-modal neural networks; HIVE-COTE, a newer breakthrough, maintains performance similar to the multi-modal networks MFCN and LSTM-FCN by virtue of its large and complex structure, but remains inferior to the latest method proposed in this chapter.
LSTM-MFCN is superior to LSTM-FCN and GRU-FCN in average accuracy, rank, expected class error rate and so on, demonstrating that the large receptive fields implemented by dilated convolution are effective on time series data, and that extracting multi-granularity spatial features with multi-scale receptive fields is superior to extracting fixed-scale features with a single-scale receptive field. Both LSTM-FCN and LSTM-MFCN outperform MFCN and MCNN, which shows that the time-varying characteristics of the data are also important information for TSC problems, and that attending to temporal features together with multi-scale spatial features learns the data more comprehensively and thus performs better. More specifically, GRU-FCN leads all other models in the number of times it obtains the highest classification accuracy, but is slightly inferior in average accuracy, rank and the like; this indicates that it is outstanding on some datasets but degrades more on others, so its robustness and range of application are inferior to the LSTM-MFCN proposed in this chapter.
For the two specific structures of the proposed model, LSTM-DFCN and LSTM-TFCN both improve on the single-scale LSTM-FCN, but LSTM-DFCN lags behind LSTM-TFCN. This indirectly shows that under multi-scale convolution, receptive fields of the various scales each contribute to learning spatial features of a different granularity. LSTM-TFCN, with three scales, can better cover the diverse spatial features, whereas the dual-scale structure, whether its large-receptive-field convolution kernel has a dilation rate of 2 or of 4, pays less attention to features of certain intermediate lengths. If the best results of LSTM-DFCN under the two dilation rates of the first convolutional layer are combined, the average accuracy of LSTM-DFCN reaches 0.925, even slightly better than LSTM-TFCN; in that case LSTM-DFCN also perceives d=1, 2, 4, and tries 6 hyper-parameter combinations, so its search is more detailed. Likewise, in Figs. 5a and 5b, Max represents the effect attainable by a more detailed structural exploration within the multi-scale structure; taking Max as representative, the model proposed in this chapter leads by an even larger margin on the three evaluation indicators.
(2) Ablation experiments
To demonstrate the effect exerted by multi-scale convolution and by the multi-scale receptive fields realized with dilated convolution, this section centers on the LSTM-DFCN model and carries out ablation experiments on two similar structures realized with conventional convolution kernels, as follows:
1) Maintain the same parameter scale as LSTM-DFCN, but realize the multi-scale convolution with conventional convolution kernels;
2) Maintain the same receptive field sizes as LSTM-DFCN, but realize the multi-scale convolution with conventional convolution kernels.
The experiment for each structure was repeated twice, and the best result is retained as representative of the performance under that structure. The specific structure settings are shown in Table 4, where LSTM-DFCN(1) and LSTM-DFCN(2) denote the two similar comparison structures. Three comparisons can be made from the experimental results. The first, between LSTM-DFCN(1) and the single-scale LSTM-FCN, explores whether multi-scale convolution has an advantage over single-scale convolution. In the second, the results of LSTM-DFCN(1) are compared with those of LSTM-DFCN to show whether the larger receptive-field combination achieved by dilated convolution further improves the model. In the third, LSTM-DFCN(2) is compared with LSTM-DFCN to verify whether the large receptive-field combination formed by dilated convolution can be replaced by merely enlarging the convolution kernel size.
Table 4 structural setup and experimental results relating to the model in ablation experiments
Table 4 also lists the average accuracy of these models over the 44 UCR datasets; the overall classification accuracy of both comparison structures is lower than that of the proposed LSTM-DFCN. In the first comparison, at the same parameter scale, the multi-scale LSTM-DFCN(1) achieves higher classification accuracy than LSTM-FCN. In fact, each convolution kernel can extract features no larger than its own size and cannot extract features larger than itself. Thus, when part of the convolution kernels are reduced in size, they can still extract small-scale detail features; the other part is enlarged, increasing those kernels' receptive fields so that larger-granularity features, imperceptible before, can now be learned. Multi-scale convolution therefore amounts to a reasonable reallocation of kernel sizes and proportions, giving a better feature extraction effect. In the second comparison, the overall classification accuracy achieved by the conventionally convolved LSTM-DFCN(1) structure is lower than that of LSTM-DFCN. As Table 4 shows, the multi-scale receptive-field combinations achievable without introducing dilated convolution are about (10, 6)-(6, 4), whereas those achievable with dilation are (32, 8)-(10, 5). Using dilated convolution, LSTM-DFCN can realize multiple scales at larger receptive-field sizes under a limited parameter budget. Meanwhile, dilated convolution suits time series data with continuous values: by sensing the data at equidistant skips through a kernel with holes, it can still learn the data's characteristics in outline.
Thus, LSTM-DFCN has the opportunity to learn larger-scale features that LSTM-DFCN(1) cannot. Comparing LSTM-DFCN(2) with LSTM-DFCN demonstrates that simply relying on conventional convolution to expand the receptive field is not as effective as using dilated convolution, because that operation increases the total number of parameters to be trained. Many datasets in the TSC field have few training samples and limited training capacity, which constrains the training of such a structure, so LSTM-DFCN(2) obtains relatively poor results under its large parameter count.
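For reference, the input span covered by a dilated kernel follows the standard formula k + (k-1)(d-1); the receptive-field figures quoted above may follow a slightly different accounting, so this helper is illustrative rather than a reproduction of Table 4.

```python
def effective_receptive_field(kernel_size, dilation):
    """Number of input steps spanned by a dilated 1-D kernel:
    k + (k - 1) * (d - 1), with d = 1 recovering the plain kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

# A kernel of 8 taps widens from 8 to 29 input steps as d grows,
# without adding a single trainable parameter:
for d in (1, 2, 4):
    print(d, effective_receptive_field(8, d))
```

This is the trade-off exploited above: LSTM-DFCN(2) buys the same span by enlarging the kernel itself (more parameters), while dilation buys it for free.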
The above ablation results and analysis confirm the correctness and effectiveness of the optimization ideas proposed in this chapter: under the same conditions, extracting features of time series data with multi-scale receptive fields is superior to single-scale convolution, and the large receptive fields realized by dilated convolution can effectively perceive the diverse characteristics of time series data without harming the model's trainability.
In addition, the results of this ablation experiment can be placed alongside all the preceding related studies. Fig. 4 shows more intuitively the differences in classification accuracy among the FCN-based family of models and the gains brought by the various improvement ideas. FCN is the basic single-mode neural network; MFCN and LSTM-FCN refine it from different angles and improve its accuracy; LSTM-DFCN(1) and LSTM-DFCN(2) take both improvement ideas into account simultaneously and push task performance further; LSTM-DFCN and LSTM-TFCN retain both optimizations and additionally introduce dilated convolution, which enables larger receptive fields under the constraint of equal trainability. They can therefore perceive multi-scale geometric features at a larger granularity level while also learning temporal features, integrating all the advantages and obtaining the highest classification accuracy.
With the above description of the preferred embodiments according to the present invention as a teaching, those skilled in the art can make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the description, but must be determined according to the scope of the claims.

Claims (9)

1. A time sequence classification method based on LSTM and multi-scale FCN, characterized in that it is realized by a system consisting of a multi-scale FCN module performing multi-scale convolution and an LSTM module, wherein the time sequence data are data acquired by pressure and vibration sensors applied to mechanical equipment, and the method specifically comprises the following steps:
(1) Extracting time dependency characteristics by using a long-short-term memory network, obtaining dependency relationship between the current value of the observed variable and historical data, and describing time sequence association inside the sequence;
(2) Fully mining the multi-granularity spatial features of the time sequence curve with a fully convolutional module: the convolution kernels of the first two layers in the classical FCN structure are split into multiple groups, the number of scales adopted being denoted M; the large-scale part is realized by dilated convolution and the small-scale part by ordinary convolution; at each layer of the MFCN module, the extracted multi-granularity abstract features are concatenated and delivered uniformly to the next layer; the last layer of the MFCN module extracts features with single-scale convolution, and global pooling integrates the features of the multiple convolution kernels as the finally output features;
When M=2, each layer of the multi-scale FCN module has receptive fields of two scales, wherein the large-receptive-field convolution kernel of the first layer is realized with a dilation rate of 4 or 2, that of the second layer with a dilation rate of 2, and the small receptive fields of the first two layers are realized by conventional convolution; or alternatively
When M=3, the large-scale receptive fields of the first two layers are realized with dilation rate 4, the middle receptive fields with dilation rate 2, and the small receptive fields again use conventional convolution;
(3) Integrating and discriminating the spatio-temporal features: the temporal and spatial features are spliced and integrated, and a fully connected neural network adaptively learns the mapping between the spatio-temporal features and the sample categories, yielding the LSTM-MFCN model.
2. The time sequence classification method based on LSTM and multi-scale FCN according to claim 1, wherein the transverse sizes of the convolution kernels of each layer of the multi-scale FCN module are 8, 5 and 3, the total numbers of convolution kernels are 128, 256 and 128, the dilation rate d is not more than 4, and the dilation rates of the layers form a pyramid structure.
3. The time sequence classification method based on LSTM and multi-scale FCN according to claim 1, wherein the number proportion of the scale receptive fields of each layer of the multi-scale FCN module is variable; a hyper-parameter w_1, w_2, …, w_M is set as the adjustable proportion, and the actual number of convolution kernels NF_Li of the i-th scale of the L-th layer is calculated as NF_Li = NF_L * w_i / (w_1 + w_2 + … + w_M), where NF_L is the total number of convolution kernels of the L-th layer.
4. The time sequence classification method based on LSTM and multi-scale FCN according to claim 1, wherein the LSTM module transposes the dimensions of the univariate time sequence data, converting it into a serial input of one value per time step, and the number of LSTM neurons is then adjusted according to the complexity of the temporal features in the specific data and the training capacity of the model.
5. The method of claim 4, wherein the number of LSTM neurons is 8, 64 or 128.
6. The method of claim 4, wherein the LSTM module further performs pruning.
7. The method of time series classification based on LSTM and multi-scale FCN according to claim 6, wherein Dropout rate of pruning operation is set to 0.8.
8. The method of claim 1, wherein in the step (3), the step of splicing and integrating the time and space features specifically refers to using a layer of full connection structure together with a SoftMax activation function to splice and integrate the time and space features.
9. The method of claim 1, wherein step (3) further comprises training the LSTM-MFCN model using an error back propagation algorithm, preserving a model with minimal error.
CN202111034788.4A 2021-09-04 2021-09-04 Time sequence classification method based on LSTM and multi-scale FCN Active CN113705715B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111034788.4A CN113705715B (en) 2021-09-04 2021-09-04 Time sequence classification method based on LSTM and multi-scale FCN

Publications (2)

Publication Number Publication Date
CN113705715A CN113705715A (en) 2021-11-26
CN113705715B true CN113705715B (en) 2024-04-19

Family

ID=78659645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111034788.4A Active CN113705715B (en) 2021-09-04 2021-09-04 Time sequence classification method based on LSTM and multi-scale FCN

Country Status (1)

Country Link
CN (1) CN113705715B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114755529A (en) * 2022-04-06 2022-07-15 重庆大学 Multi-feature fusion single-phase earth fault type identification method based on deep learning
CN115169504B (en) * 2022-09-06 2022-11-25 山东洲蓝环保科技有限公司 Equipment abnormity identification method in coal gas fine desulfurization process
CN116628473A (en) * 2023-05-17 2023-08-22 国网上海市电力公司 Power equipment state trend prediction method based on multi-factor neural network algorithm

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106096531A * 2016-05-31 2016-11-09 安徽省云力信息技术有限公司 A multi-type vehicle detection method for traffic images based on deep learning
CN108776969A (en) * 2018-05-24 2018-11-09 复旦大学 Breast ultrasound image lesion segmentation approach based on full convolutional network
CN110632572A (en) * 2019-09-30 2019-12-31 中国人民解放军战略支援部队信息工程大学 Radar radiation source individual identification method and device based on unintentional phase modulation characteristics
CN111275113A (en) * 2020-01-20 2020-06-12 西安理工大学 Skew time series abnormity detection method based on cost sensitive hybrid network
CN111626267A (en) * 2019-09-17 2020-09-04 山东科技大学 Hyperspectral remote sensing image classification method using void convolution
CN111985533A (en) * 2020-07-14 2020-11-24 中国电子科技集团公司第三十六研究所 Incremental underwater sound signal identification method based on multi-scale information fusion
CN112101220A (en) * 2020-09-15 2020-12-18 哈尔滨理工大学 Rolling bearing service life prediction method based on unsupervised model parameter migration
US10950352B1 (en) * 2020-07-17 2021-03-16 Prince Mohammad Bin Fahd University System, computer-readable storage medium and method of deep learning of texture in short time series
CN112668494A (en) * 2020-12-31 2021-04-16 西安电子科技大学 Small sample change detection method based on multi-scale feature extraction
CN112712117A (en) * 2020-12-30 2021-04-27 银江股份有限公司 Full convolution attention-based multivariate time series classification method and system
CN112989107A (en) * 2021-05-18 2021-06-18 北京世纪好未来教育科技有限公司 Audio classification and separation method and device, electronic equipment and storage medium
CN113035361A (en) * 2021-02-09 2021-06-25 北京工业大学 Neural network time sequence classification method based on data enhancement


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
(n, m)-Layer MC-MHLF: Deep Neural Network for Classifying Time Series; Keiichi Tamura et al.; 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC); pp. 198-204 *
LSTM Fully Convolutional Networks for Time Series Classification; Fazle Karim et al.; IEEE Access; Vol. 6; pp. 1662-1669 *
LSTM-MFCN: A time series classifier based on multi-scale spatial–temporal features; Liang Zhao et al.; Computer Communications; pp. 52-59 *
Time Series Classification Based on FCN Multi-scale Feature Ensemble Learning; Wenshuo Zhou et al.; 2019 IEEE 8th Data Driven Control and Learning Systems Conference; pp. 901-906 *
Research on time series classification methods based on multi-modal neural networks; Mo Chunyang; China Master's Theses Full-text Database, Information Science and Technology (No. 01, 2022); I138-728 *
Improved fully convolutional network for insulator defect detection; Yang Yongjiao et al.; Machinery Design & Manufacture (No. 3); pp. 177-180 *

Also Published As

Publication number Publication date
CN113705715A (en) 2021-11-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant