CN110751169B

CN110751169B - Time sequence classification method based on relation change among multiple variables

Info

Publication number: CN110751169B
Application number: CN201910833290.0A
Authority: CN
Inventors: 蔡瑞初; 陈嘉伟; 温雯; 郝志峰; 陈炳丰; 李梓健
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2019-09-04
Filing date: 2019-09-04
Publication date: 2023-09-29
Anticipated expiration: 2039-09-04
Also published as: CN110751169A

Abstract

The invention provides a time series classification method based on changes in the relationships among multiple variables, comprising the following steps: obtaining sample data from an observation dataset, calculating the partial correlation coefficient between every pair of variables in the sample data, and constructing a partial correlation coefficient matrix; encoding the partial correlation coefficient matrix with a convolutional neural network to obtain a corresponding feature map; flattening each feature map into a feature vector, feeding the feature vectors sequentially into a long short-term memory (LSTM) neural network, and obtaining a hidden state that captures the pattern of change among the variable relationships; and inputting the hidden state into a label classifier, which outputs the corresponding sample class, completing the classification of the time series. The method fully considers the relationships among different variables in time series data, classifies the data based on the relationship patterns of the variables, fully expresses how those relationships change over time, is more robust to noisy input values, and achieves high classification accuracy.

Description

Time sequence classification method based on relation change among multiple variables

Technical Field

The invention relates to the technical field of data mining, in particular to a time sequence classification method based on relation change among multiple variables.

Background

Time series data are increasingly common in industrial systems, information systems, medical health, financial markets, and other fields. Time series classification, for example for anomaly detection, has therefore become an important and valuable research topic. Traditional time series classification methods are based on similarity, such as K-nearest neighbor (KNN) and Dynamic Time Warping (DTW). However, such methods are sensitive only to the values of the variables and do not take into account the relationships between different variables.

Another currently popular type of method applies a series of feature transformations to the time series data in order to mine patterns for classification, for example the multi-layer perceptron (MLP), the long short-term memory neural network (LSTM), and the convolutional neural network (CNN). Although such methods implicitly capture relationships between different variables in feature space, they have difficulty characterizing how those relationships change. In time series classification, a particular pattern of change in the relationships between variables often corresponds to a particular class. For example, in an information system, an increase in the "CPU usage" of a server generally causes an increase in "CPU temperature", and an increase in "CPU temperature" causes an increase in "fan speed", so that "CPU temperature" remains relatively stable while "CPU usage" continues to increase. The relationship between "CPU usage" and "CPU temperature" therefore changes from dependent to independent during this period. However, when the server's fan fails, "CPU temperature" and "fan speed" may become unrelated, and an increase in "CPU usage" leads to a continuous increase in "CPU temperature" and even to server downtime. In that case, "CPU usage" and "CPU temperature" remain dependent throughout the period.

The way the relationships between the variables change differs between the two classes, yet current methods cannot express this change well or use it for classification.

Disclosure of Invention

The invention provides a time series classification method based on changes in the relationships among multiple variables, which aims to overcome the technical defect that existing time series classification methods cannot effectively express the patterns of change in the relationships among variables and use them for classification.

In order to solve the technical problems, the technical scheme of the invention is as follows:

A method of time series classification based on changes in the relationships among multiple variables, comprising the following steps:

S1: acquiring a labeled observation dataset;

S2: obtaining sample data from the observation dataset, calculating the partial correlation coefficient between every pair of variables in the sample data, and constructing a partial correlation coefficient matrix, so as to obtain a partial correlation coefficient matrix at each moment;

S3: taking the partial correlation coefficient matrix at each moment as the input to a convolutional neural network (CNN), which encodes the partial correlation coefficient matrix to obtain a corresponding feature map;

S4: flattening each feature map into a feature vector and feeding the feature vectors sequentially into a long short-term memory neural network (LSTM), thereby obtaining a hidden state that captures the pattern of change among the variable relationships;

S5: inputting the hidden state into a label classifier, which outputs the corresponding sample class, completing the classification of the time series.

The step S1 specifically includes:

sampling at fixed time intervals using the data acquisition device of an industrial system or an information system; obtaining the values of the different indicators at each sampling moment while representing the system state at that moment with a label variable; and acquiring an observation dataset after the system has run for a period of time, wherein:

the observation dataset is denoted $X = [x_1, x_2, \ldots, x_m]$, where $m$ is the number of samples; the sample data at time $t$ is $x_t \in \mathbb{R}^n$, i.e. it has $n$ variables, and each sample has one label variable $y_t$, where $y_t \in \mathbb{R}$.

The step S2 specifically includes:

S21: sample data $X_t = [x_{t-w+1}, x_{t-w+2}, \ldots, x_t]$ of length $w$ is taken from the observation dataset, where $X_t$ is a time slice of $X$ used to calculate the partial correlation coefficient matrix $P_t \in \mathbb{R}^{n \times n}$, which serves as the relation matrix between variables at time $t$;

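S22: let the time series of two variables $i$ and $j$ over this period be $x^i = [x^i_{t-w+1}, \ldots, x^i_t]$ and $x^j = [x^j_{t-w+1}, \ldots, x^j_t]$ respectively; then a coefficient $P_t^{ij}$ of the partial correlation coefficient matrix is calculated as follows:

$$P_t^{ij} = -\frac{\omega_t^{ij}}{\sqrt{\omega_t^{ii}\,\omega_t^{jj}}}$$

where $\omega_t^{ij}$ is an element of the inverse of the covariance matrix $\Sigma_t$, and the element $\sigma_t^{ij}$ of the covariance matrix $\Sigma_t$ is calculated as follows:

$$\sigma_t^{ij} = \frac{1}{w-1}\sum_{k=t-w+1}^{t}\left(x_k^i - \bar{x}^i\right)\left(x_k^j - \bar{x}^j\right)$$

where $\bar{x}^i$ and $\bar{x}^j$ denote the means of the two variables over this period, respectively.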

S23: the partial correlation coefficient matrix $P_t$ at each moment is obtained using the calculation of step S22, and represents the relationships between the different variables at that moment.
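Steps S21 to S23 can be illustrated with a minimal NumPy sketch. It assumes the standard relation between partial correlation and the inverse covariance matrix, P_ij = -ω_ij / sqrt(ω_ii · ω_jj), where ω are the entries of the inverse covariance. The function names, the `ridge` regularizer, the unit diagonal, and the toy data are assumptions made for illustration and are not taken from the patent. `np.cov` normalizes by w - 1; the normalization constant cancels in the partial correlation, so it does not affect the result.

```python
import numpy as np

def partial_correlation_matrix(window, ridge=1e-6):
    """Partial correlation matrix of one window X_t of shape (w, n)."""
    window = np.asarray(window, dtype=float)
    n = window.shape[1]
    # Sample covariance between the n variables (columns), shape (n, n).
    sigma = np.atleast_2d(np.cov(window, rowvar=False))
    # Assumption: a small ridge term keeps sigma invertible for short windows.
    omega = np.linalg.inv(sigma + ridge * np.eye(n))
    scale = np.sqrt(np.diag(omega))
    P = -omega / np.outer(scale, scale)
    np.fill_diagonal(P, 1.0)  # convention chosen here for the diagonal
    return P

def relation_matrices(X, w):
    """Return P_t for every t that has a full window; shape (m - w + 1, n, n)."""
    X = np.asarray(X, dtype=float)
    m = X.shape[0]
    if not 2 <= w <= m:
        raise ValueError("w must satisfy 2 <= w <= m")
    return np.stack([partial_correlation_matrix(X[t - w + 1 : t + 1])
                     for t in range(w - 1, m)])

# Toy chain x -> y -> z (illustrative data only): z depends on x only through y.
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = x + 0.5 * rng.standard_normal(500)
z = y + 0.5 * rng.standard_normal(500)
X = np.column_stack([x, y, z])

P_full = partial_correlation_matrix(X)   # one matrix over all 500 samples
P_seq = relation_matrices(X, w=200)      # one matrix per sliding window
```

In this toy chain the entry `P_full[0, 2]` should be close to zero even though x and z are strongly correlated, because their dependence is explained by y. This is the kind of relationship the matrix is meant to expose.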

The step S3 specifically includes: the partial correlation coefficient matrices $P_{t-l+1}, P_{t-l+2}, \ldots, P_t$ over a period of length $l$ are input into the convolutional neural network, which encodes them to obtain the corresponding $l$ feature maps, together with the corresponding label $y_t$ at each moment.
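As a rough illustration of step S3, the sketch below applies a single convolution layer with ReLU to each matrix in a sequence of l relation matrices. The patent does not specify the network architecture, so the layer count, kernel size, channel count, and random weights here are all assumptions; in practice the kernels would be learned during training.

```python
import numpy as np

def conv2d_relu(P, kernels, bias):
    """Valid 2D cross-correlation of one (n, n) matrix with (c, k, k) kernels, then ReLU."""
    P = np.asarray(P, dtype=float)
    k = kernels.shape[-1]
    # Every k x k patch of P: shape (n - k + 1, n - k + 1, k, k).
    patches = np.lib.stride_tricks.sliding_window_view(P, (k, k))
    fmap = np.einsum("ijab,cab->cij", patches, kernels) + bias[:, None, None]
    return np.maximum(fmap, 0.0)

def encode_sequence(P_seq, kernels, bias):
    """Encode l matrices into l feature maps; shape (l, c, n - k + 1, n - k + 1)."""
    return np.stack([conv2d_relu(P, kernels, bias) for P in P_seq])

# Hypothetical sizes: l = 4 matrices of n = 5 variables, c = 2 kernels of size 3.
rng = np.random.default_rng(0)
l, n, c, k = 4, 5, 2, 3
P_seq = rng.uniform(-1.0, 1.0, size=(l, n, n))   # stand-in for real P_t matrices
kernels = 0.1 * rng.standard_normal((c, k, k))   # untrained, illustrative weights
bias = np.zeros(c)
feature_maps = encode_sequence(P_seq, kernels, bias)
```

The same kernels are applied to every matrix in the sequence, so each time step is encoded in a comparable way before the sequence model sees it.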

In step S4, the hidden state $h_t$ is used to capture the pattern of change among the $l$ variable relationships.
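Step S4 can be sketched as a plain LSTM recurrence over the flattened feature maps, returning the final hidden state. This is a minimal sketch of the standard LSTM cell equations; the gate ordering (input, forget, candidate, output), the hidden size, and the random weights are assumptions for illustration, not details given in the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(features, W, U, b):
    """Run an LSTM over features of shape (l, d) and return the final hidden state.

    W: (4h, d) input weights, U: (4h, h) recurrent weights, b: (4h,) biases.
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in features:
        z = W @ x + U @ h + b
        i, f, g, o = np.split(z, 4)          # assumed gate order: i, f, g, o
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                    # update the cell state
        h = o * np.tanh(c)                   # update the hidden state
    return h

# Hypothetical feature maps: l = 4 time steps, 2 channels of 3 x 3 each.
rng = np.random.default_rng(0)
feature_maps = rng.uniform(0.0, 1.0, size=(4, 2, 3, 3))
vectors = feature_maps.reshape(feature_maps.shape[0], -1)   # flatten to (4, 18)

hidden_size = 8
W = 0.1 * rng.standard_normal((4 * hidden_size, vectors.shape[1]))
U = 0.1 * rng.standard_normal((4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)
h_t = lstm_last_hidden(vectors, W, U, b)
```

Because the cell state is carried across all l steps, `h_t` depends on the order of the relation matrices and not only on their individual values.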

In step S5, the label classifier uses a fully connected layer and outputs the predicted sample class.

The method further comprises step S6: the cross entropy of the output sample class is used as the loss function, and steps S3 to S5 are repeated using gradient descent to improve the classification accuracy.
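Steps S5 and S6 can be illustrated with a simplified sketch in which only a fully connected softmax layer is trained by gradient descent on fixed hidden states. In the method described here the CNN and LSTM parameters would also be updated by the same loss; that part is omitted for brevity. The function names, learning rate, step count, integer class labels, and toy hidden states are assumptions made for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # subtract the row maximum for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-likelihood of the true classes."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def train_classifier(H, labels, num_classes, lr=0.1, steps=200):
    """Fit a fully connected softmax layer on hidden states H of shape (N, h)."""
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((H.shape[1], num_classes))
    b = np.zeros(num_classes)
    losses = []
    for _ in range(steps):
        probs = softmax(H @ W + b)
        losses.append(cross_entropy(probs, labels))
        grad = probs.copy()
        grad[np.arange(len(labels)), labels] -= 1.0   # d(loss)/d(logits)
        grad /= len(labels)
        W -= lr * (H.T @ grad)                        # gradient descent update
        b -= lr * grad.sum(axis=0)
    return W, b, losses

# Toy hidden states for two classes (illustrative data only).
rng = np.random.default_rng(1)
H = np.vstack([rng.normal(-2.0, 0.5, size=(20, 4)),
               rng.normal(2.0, 0.5, size=(20, 4))])
labels = np.array([0] * 20 + [1] * 20)

W, b, losses = train_classifier(H, labels, num_classes=2)
predicted = softmax(H @ W + b).argmax(axis=1)
accuracy = float((predicted == labels).mean())
```

The gradient of the cross entropy with respect to the logits is simply the predicted probabilities minus the one-hot labels, which is why the update takes such a compact form.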

Compared with the prior art, the technical scheme of the invention has the beneficial effects that:

The time series classification method based on changes in the relationships among multiple variables fully considers the relationships among different variables in time series data and classifies the time series based on the relationship patterns of the variables.

Drawings

FIG. 1 is a schematic flow chart of the method of the present invention.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;

for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;

it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.

The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.

Example 1

As shown in FIG. 1, a method of time series classification based on changes in the relationships among multiple variables includes the following steps:

S1: acquiring a labeled observation dataset;

The step S1 specifically includes:

More specifically, the step S2 specifically includes:

More specifically, the step S3 specifically includes: the partial correlation coefficient matrices $P_{t-l+1}, P_{t-l+2}, \ldots, P_t$ over a period of length $l$ are input into the convolutional neural network, which encodes them to obtain the corresponding $l$ feature maps, together with the corresponding label $y_t$ at each moment.

More specifically, in step S4, the hidden state $h_t$ is used to capture the pattern of change among the $l$ variable relationships.

More specifically, in step S5, the label classifier uses a fully connected layer and outputs the predicted sample class.

More specifically, the method further comprises step S6: the cross entropy of the output sample class is used as the loss function, and steps S3 to S5 are repeated using gradient descent to improve the classification accuracy.

In a specific implementation, the time series classification method based on changes in the relationships among multiple variables fully considers the relationships among different variables in time series data and classifies based on the relationship patterns of the variables.

It is to be understood that the above examples of the present invention are provided by way of illustration only and not by way of limitation of the embodiments of the present invention. Other variations or modifications based on the above teachings will be apparent to those of ordinary skill in the art. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the following claims.

Claims

1. A method of time series classification based on a change in a relationship between multiple variables, comprising the steps of:

S1: acquiring a labeled observation dataset;

the step S1 specifically comprises the following steps:

the observation dataset is denoted $X = [x_1, x_2, \ldots, x_m]$, where $m$ is the number of samples; the sample data at time $t$ is $x_t \in \mathbb{R}^n$, i.e. it has $n$ variables, and each sample has one label variable $y_t$, where $y_t \in \mathbb{R}$;

the step S2 specifically comprises the following steps:

S21: sample data $X_t = [x_{t-w+1}, x_{t-w+2}, \ldots, x_t]$ of length $w$ is taken from the observation dataset, where $X_t$ is a time slice of $X$ used to calculate the partial correlation coefficient matrix $P_t \in \mathbb{R}^{n \times n}$, which serves as the relation matrix between variables at time $t$;

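S22: let the time series of two variables $i$ and $j$ over this period be $x^i = [x^i_{t-w+1}, \ldots, x^i_t]$ and $x^j = [x^j_{t-w+1}, \ldots, x^j_t]$ respectively; then a coefficient $P_t^{ij}$ of the partial correlation coefficient matrix is calculated as follows:

$$P_t^{ij} = -\frac{\omega_t^{ij}}{\sqrt{\omega_t^{ii}\,\omega_t^{jj}}}$$

where $\omega_t^{ij}$ is an element of the inverse of the covariance matrix $\Sigma_t$, and the element $\sigma_t^{ij}$ of the covariance matrix $\Sigma_t$ is calculated as follows:

$$\sigma_t^{ij} = \frac{1}{w-1}\sum_{k=t-w+1}^{t}\left(x_k^i - \bar{x}^i\right)\left(x_k^j - \bar{x}^j\right)$$

where $\bar{x}^i$ and $\bar{x}^j$ respectively denote the means of the two variables over this period;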

S23: the partial correlation coefficient matrix $P_t$ at each moment is obtained using the calculation of step S22, and represents the relationships between the different variables at that moment;

S3: taking the partial correlation coefficient matrix at each moment as the input to a convolutional neural network, which encodes the partial correlation coefficient matrix to obtain a corresponding feature map;

the step S3 specifically comprises the following steps: the partial correlation coefficient matrices $P_{t-l+1}, P_{t-l+2}, \ldots, P_t$ over a period of length $l$ are input into the convolutional neural network, which encodes them to obtain the corresponding $l$ feature maps, together with the corresponding label $y_t$ at each moment;

S4: flattening each feature map into a feature vector, feeding the feature vectors sequentially into a long short-term memory neural network, and obtaining a hidden state that captures the pattern of change among the variable relationships; in said step S4, the hidden state $h_t$ is used to capture the pattern of change among the variable relationships;

2. The method according to claim 1, wherein in said step S5, the label classifier uses a fully connected layer and outputs the predicted sample class.

3. The method of time series classification based on changes in the relationships among multiple variables according to any one of claims 1-2, further comprising step S6: using the cross entropy of the output sample class as the loss function, and repeating steps S3 to S5 using gradient descent to improve the classification accuracy.