CN112990372A - Data processing method, model training device and electronic equipment - Google Patents

Data processing method, model training device and electronic equipment

Info

Publication number
CN112990372A
Authority
CN
China
Prior art keywords
data
window
processing
sequence
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110462699.3A
Other languages
Chinese (zh)
Other versions
CN112990372B (en)
Inventor
不公告发明人
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Real AI Technology Co Ltd
Original Assignee
Beijing Real AI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Real AI Technology Co Ltd filed Critical Beijing Real AI Technology Co Ltd
Priority to CN202110462699.3A priority Critical patent/CN112990372B/en
Publication of CN112990372A publication Critical patent/CN112990372A/en
Application granted granted Critical
Publication of CN112990372B publication Critical patent/CN112990372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Testing And Monitoring For Control Systems (AREA)

Abstract

The application relates to a data processing method, a model training device and electronic equipment, and belongs to the technical field of data processing. The method includes: acquiring sequence data that are collected by a sensor at different times and arranged in time order; preprocessing the sequence data, the preprocessing comprising standardization or normalization; performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data; inputting the window data into a pre-trained VAE model for data anomaly detection for processing, to obtain abnormal data in the window data and predicted values of the abnormal data; and, for each abnormal data, correcting the abnormal data according to its predicted value. The method can quickly and accurately detect abnormal values in the data collected by the sensor and correct them.

Description

Data processing method, model training device and electronic equipment
Technical Field
The application belongs to the technical field of data processing, and particularly relates to a data processing method, a model training device and electronic equipment.
Background
The detection and correction of abnormal values of dam sensors (sensors applied to dams, such as pressure sensors, displacement sensors, flow sensors and the like) bear on the overall healthy operation of the dam and even on the personal safety of surrounding residents. At present, two methods are mainly used for detecting and correcting data anomalies of dam sensors:
1. Manual detection and correction based on expert experience. This method is highly accurate, but it also has several problems: a) Poor reproducibility. The judgment differs from person to person, and even the same person may make opposite judgments on the same data at different times. b) High labor cost. c) When the volume of data is massive, the work cannot be completed by experts alone, so the method is usually adopted only for important sensor channel data.
2. Methods based on statistical tests, such as the 3-sigma anomaly detection method. Such methods assume that normal data follow some particular distribution and that the vast majority of the data are normal values. The definition of an outlier is correspondingly simple: data with a large offset from the location of the normal points are regarded as abnormal. The 3-sigma method can only identify outliers and cannot correct them. In addition, it cannot identify an abnormal region that is not an outlier, which results in low detection accuracy.
Because dam sensors are numerous, manual expert diagnosis cannot meet the demand, and automated detection and diagnosis by computer is becoming a necessity. However, because changes in the environment surrounding a dam are complex, the patterns in the sensor data are also complex: for example, large amounts of data are missing or jump abruptly, and the distribution of the data may change over time. As a result, no single model currently achieves detection results comparable to manual diagnosis by experts.
Disclosure of Invention
In view of the above, an object of the present application is to provide a data processing method, a model training device, and an electronic device, so as to solve the problems that the existing detection method cannot accurately detect an abnormal value in data collected by a sensor, and cannot correct the abnormal value.
The embodiment of the application is realized as follows:
In a first aspect, an embodiment of the present application provides a data processing method, including: acquiring sequence data that are collected by a sensor at different times and arranged in time order; preprocessing the sequence data, the preprocessing comprising standardization or normalization; performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data; inputting the window data into a pre-trained VAE (Variational Auto-Encoder) model for data anomaly detection for processing, to obtain abnormal data in the window data and predicted values of the abnormal data; and, for each abnormal data, correcting the abnormal data according to the predicted value of the abnormal data. In the embodiment of the application, sequence data collected by the sensor at different times and arranged in time order are acquired, the sequence data are standardized or normalized, sliding window processing is performed on the preprocessed sequence data to obtain a plurality of window data, and the window data are input into the pre-trained VAE model for data anomaly detection, which yields the abnormal data in the window data and their predicted values. Each abnormal data is then corrected according to its predicted value, so that abnormal values in the data collected by the sensor can be detected quickly and accurately and can also be corrected.
With reference to a possible implementation manner of the embodiment of the first aspect, inputting the window data into the pre-trained VAE model for data anomaly detection for processing includes: performing a reverse-order operation on each of the plurality of window data to obtain corresponding reverse-order window data; and inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain the abnormal data and the predicted values of the abnormal data. In the embodiment of the application, because each window data and its reverse-order counterpart are both input into the VAE model, the data are predicted bidirectionally, from the forward direction and the reverse direction, which can further improve the accuracy of identifying abnormal data.
With reference to a possible implementation manner of the embodiment of the first aspect, if standardization preprocessing is performed on the sequence data, the method further includes: performing a negation operation on the data in each window data to obtain corresponding negated window data; and performing a negation operation on the data in the reverse-order window data corresponding to each window data to obtain negated reverse-order window data. Correspondingly, inputting each window data and the corresponding reverse-order window data into the VAE model for processing includes: inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing. In the embodiment of the application, if the sequence data have been standardized, the data in each window data can be negated to obtain the corresponding negated window data, and the data in the reverse-order window data corresponding to each window data can be negated to obtain the negated reverse-order window data. The data are thus predicted from multiple directions, forward/reverse and upright/inverted (negated), so that abnormal data are identified more accurately.
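The four inputs derived from one window can be sketched as follows. This is a minimal illustration, assuming a window is a one-dimensional array of standardized values; the function name `build_views` and the dictionary keys are hypothetical and do not come from the application.

```python
import numpy as np

def build_views(window):
    """Return the four views of one standardized window.

    Hypothetical helper: the names are illustrative only.
    """
    w = np.asarray(window, dtype=float)
    return {
        "forward": w,                 # original window data
        "reversed": w[::-1],          # reverse-order window data
        "negated": -w,                # negated window data
        "negated_reversed": -w[::-1]  # negated reverse-order window data
    }

views = build_views([0.5, -1.0, 2.0])
```

Negation is only meaningful here because standardized data are centered on zero, which matches the condition stated above that this variant applies when the sequence data have been standardized.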
With reference to a possible implementation manner of the embodiment of the first aspect, inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing includes: inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model, and outputting predicted window data corresponding to each window data, predicted reverse-order window data corresponding to each reverse-order window data, predicted negated window data corresponding to each negated window data, and predicted negated reverse-order window data corresponding to each negated reverse-order window data; and, for data i whose data number is not smaller than the sliding window size s, if the absolute value of the difference between the data and each of the predicted values of data i in the (i-s+1)-th predicted window data, the (i-s+1)-th predicted negated window data, the i-th predicted reverse-order window data and the i-th predicted negated reverse-order window data is larger than a preset threshold, determining that the data is abnormal data. In the embodiment of the application, when the input data include the window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data, 4 anomaly prediction results can be obtained for data i whose data number is not smaller than the sliding window size s from two windows, the i-th and the (i-s+1)-th (a prediction result is abnormal if the absolute value of the difference between the predicted value and the data is larger than the preset threshold). The data point is marked as abnormal only when all four prediction results are abnormal; otherwise it is marked as a normal value, which can improve the detection accuracy.
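The unanimous-vote rule described above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, data numbers are assumed to be 1-based, and the predicted values are assumed to have already been mapped back to the scale and sign of the original window data (the text does not spell out that step).

```python
import numpy as np

def window_indices(i, s):
    """For data number i (1-based, i >= s), return the numbers of the two
    windows used for prediction: the (i-s+1)-th and the i-th."""
    if i < s:
        raise ValueError("data number must not be smaller than the window size")
    return i - s + 1, i

def is_abnormal(value, predictions, threshold):
    """Mark a point abnormal only if every prediction deviates from the
    observed value by more than the preset threshold."""
    errors = np.abs(np.asarray(predictions, dtype=float) - value)
    return bool(np.all(errors > threshold))

# All four predictions disagree with the observation: abnormal.
flag_a = is_abnormal(5.0, [1.0, 1.2, 0.9, 1.1], threshold=2.0)
# One prediction is close to the observation: treated as normal.
flag_b = is_abnormal(5.0, [1.0, 4.5, 0.9, 1.1], threshold=2.0)
```

Requiring agreement among all views trades some sensitivity for fewer false alarms, which is consistent with the stated aim of improving detection accuracy.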
With reference to a possible implementation manner of the embodiment of the first aspect, correcting the abnormal data according to the predicted value of the abnormal data includes: acquiring a first predicted value of the abnormal data in the window data, a second predicted value of the abnormal data in the reverse-order window data, a third predicted value of the abnormal data in the negated window data and a fourth predicted value of the abnormal data in the negated reverse-order window data; obtaining the average value of the first, second, third and fourth predicted values; and performing inverse standardization on the average value, and correcting the abnormal data to the data obtained by the inverse standardization. In the embodiment of the application, the average of the predicted values of the abnormal data in the different channels is obtained and inverse-standardized, and the abnormal data is corrected to the result, so that the corrected data are as accurate as possible.
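A minimal sketch of this correction step follows, assuming the four predicted values are expressed on the standardized scale of the original (non-negated) data and that μ and σ are the mean and standard deviation used during standardization. The function name `corrected_value` is hypothetical.

```python
import numpy as np

def corrected_value(predictions, mu, sigma):
    """Average the predicted values and undo standardization: x = x' * sigma + mu."""
    average = float(np.mean(predictions))
    return average * sigma + mu

# Four standardized predictions whose mean is 1.0, with mu = 10 and sigma = 2.
replacement = corrected_value([0.5, 1.5, 1.0, 1.0], mu=10.0, sigma=2.0)
```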
With reference to a possible implementation manner of the embodiment of the first aspect, correcting the abnormal data according to the predicted value of the abnormal data includes: acquiring a first predicted value of the abnormal data in the window data and a second predicted value of the abnormal data in the reverse-order window data; obtaining the average value of the first predicted value and the second predicted value; and applying the inverse operation of the preprocessing to the average value, and correcting the abnormal data to the data obtained by that inverse operation. In the embodiment of the application, the average of the predicted values of the abnormal data in the window data and the reverse-order window data is obtained, the inverse operation of the preprocessing is applied to the average, and the abnormal data is corrected to the result, so that the corrected data are as accurate as possible.
With reference to a possible implementation manner of the embodiment of the first aspect, inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain the abnormal data and the predicted values of the abnormal data includes: inputting each window data and the corresponding reverse-order window data into the VAE model, and outputting predicted window data corresponding to each window data and predicted reverse-order window data corresponding to each reverse-order window data; and, for data i whose data number is not smaller than the sliding window size s, if the absolute value of the difference between the data and each of the predicted values of data i in the (i-s+1)-th predicted window data and the i-th predicted reverse-order window data is larger than a preset threshold, determining that the data is abnormal data. In the embodiment of the application, when the input data include the window data and the corresponding reverse-order window data, 2 anomaly prediction results can be obtained for data i whose data number is not smaller than the sliding window size s from two windows, the i-th and the (i-s+1)-th (a prediction result is abnormal if the absolute value of the difference between the predicted value and the data is larger than the preset threshold). The data point is marked as abnormal only when both prediction results are abnormal; otherwise it is marked as a normal value, which can improve the detection accuracy.
With reference to one possible implementation manner of the embodiment of the first aspect, acquiring sequence data that are collected by a sensor at different times and arranged in time order includes: acquiring initial sequence data that are collected by the sensor at different times and arranged in time order; and treating abnormal values in the initial sequence data as missing values to obtain the sequence data. In the embodiment of the application, treating the abnormal values in the initial sequence data as missing values can improve the accuracy of the subsequent processing results.
With reference to one possible implementation manner of the embodiment of the first aspect, acquiring sequence data that are collected by a sensor at different times and arranged in time order includes: if the sequence data collected by the sensor are not sampled at equal time intervals, resampling the sequence data collected by the sensor to obtain sequence data sampled at equal time intervals. In the embodiment of the application, resampling data that were not sampled at equal time intervals into equally spaced sequence data allows the method to be applied in more complex environments.
With reference to a possible implementation manner of the embodiment of the first aspect, performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data includes: if the length of a run of consecutive missing values in the preprocessed sequence data exceeds a preset length, dividing the preprocessed sequence data into a plurality of sub-data segments at the runs of consecutive missing values that exceed the preset length; performing interpolation completion on the missing values in those sub-data segments whose data length is not less than the sliding window size; and performing sliding window processing on the sub-data segments after interpolation completion to obtain the plurality of window data. In the embodiment of the application, the sequence is cut at runs of consecutive missing values longer than the preset length so that those runs are eliminated, the missing values in the sub-data segments whose data length is not less than the sliding window size are then filled in by interpolation, and sliding window processing is finally performed on the completed sub-data segments, which can improve the accuracy of data detection.
With reference to a possible implementation manner of the embodiment of the first aspect, before the window data are input into the pre-trained VAE model for data anomaly detection for processing, the method further includes: acquiring a training data set, wherein the training data set comprises a plurality of window data; and training an initial VAE model with the training data set to obtain the VAE model. In the embodiment of the application, the VAE model is trained with the training data set so that, by learning from a large amount of data, it can detect abnormal values in input window data, which yields the VAE model for detecting sensor data anomalies.
With reference to a possible implementation manner of the embodiment of the first aspect, the training data set further includes reverse-order window data obtained by performing a reverse-order operation on each of the plurality of window data; acquiring the training data set includes: acquiring training sequence data that are collected by a sensor at different times and arranged in time order; standardizing or normalizing the training sequence data; performing sliding window processing on the standardized or normalized training sequence data to obtain a plurality of window data; and performing a reverse-order operation on each window data to obtain the training data set.
In a second aspect, an embodiment of the present application further provides a model training method, including: acquiring sequence data that are collected by a sensor at different times and arranged in time order; preprocessing the sequence data, the preprocessing comprising standardization or normalization; performing sliding window processing on the preprocessed sequence data to obtain a training data set, wherein the training data set comprises a plurality of window data; and training a VAE model with the training data set to obtain a trained VAE model for detecting sensor data anomalies.
In a third aspect, an embodiment of the present application further provides a data processing apparatus, including: an acquisition module, a preprocessing module, a sliding window module and a processing module. The acquisition module is used for acquiring sequence data that are collected by a sensor at different times and arranged in time order. The preprocessing module is used for preprocessing the sequence data, the preprocessing comprising standardization or normalization. The sliding window module is used for performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data. The processing module is used for inputting the window data into a pre-trained VAE model for data anomaly detection for processing, to obtain abnormal data in the window data and predicted values of the abnormal data, and, for each abnormal data, correcting the abnormal data according to the predicted value of the abnormal data.
In a fourth aspect, an embodiment of the present application further provides a model training apparatus, including: an acquisition module, a preprocessing module and a training module. The acquisition module is used for acquiring sequence data that are collected by a sensor at different times and arranged in time order. The preprocessing module is used for preprocessing the sequence data, the preprocessing comprising standardization or normalization; the preprocessing module is further used for performing sliding window processing on the preprocessed sequence data to obtain a training data set, the training data set comprising a plurality of window data. The training module is used for training a VAE model with the training data set to obtain a trained VAE model for detecting sensor data anomalies.
In a fifth aspect, an embodiment of the present application further provides an electronic device, including: a memory and a processor, the processor coupled to the memory; the memory is used for storing programs; the processor is configured to invoke a program stored in the memory to perform the method according to the first aspect and/or any possible implementation manner of the first aspect, or to perform the method according to the second aspect.
In a sixth aspect, embodiments of the present application further provide a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the method provided in the foregoing first aspect and/or any one of the possible implementation manners of the first aspect, or to perform the method provided in the foregoing second aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort. The foregoing and other objects, features and advantages of the application will be apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale; emphasis is instead placed on illustrating the subject matter of the present application.
Fig. 1 shows a schematic flow chart of a data processing method provided in an embodiment of the present application.
Fig. 2 shows a schematic flowchart of a model training method provided in an embodiment of the present application.
Fig. 3 shows a block diagram of a data processing apparatus according to an embodiment of the present application.
Fig. 4 shows a block diagram of a model training apparatus according to an embodiment of the present application.
Fig. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that like reference numbers and letters refer to like items in the following figures; thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, relational terms such as "first," "second," and the like may be used in the description herein solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
Further, the term "and/or" in the present application merely describes an association relationship between associated objects and means that three kinds of relationships may exist. For example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone.
Existing detection methods cannot accurately detect abnormal values in the data collected by dam sensors and cannot correct those abnormal values. To solve these problems, the embodiment of the application provides a data processing method by which abnormal values in data collected by a dam sensor can be quickly and accurately detected and corrected, so that changes in the environment surrounding a dam can be monitored in real time.
For ease of understanding, the data processing method provided in the embodiment of the present application will be described below with reference to fig. 1. The data processing method comprises the following steps:
step S101: acquiring sequence data which are acquired by the sensor at different times and are arranged in time sequence.
The sequence data include a plurality of data arranged in order of acquisition time; for example, the sequence data may include tens of thousands of data.
The sensor may refer to various sensors applied to a dam, such as a pressure sensor, a displacement sensor, a flow sensor, etc. And the data in the sequence data is the data acquired by the same sensor.
In an alternative embodiment, the process of acquiring the sequence data that are collected by the sensor at different times and arranged in time order may be: acquiring initial sequence data that are collected by the sensor at different times and arranged in time order, and treating abnormal values in the initial sequence data as missing values to obtain the sequence data. In this embodiment, preliminary abnormal value detection is performed on the acquired initial sequence data, and obvious abnormal values are screened out and then handled as missing values, which improves the accuracy of subsequent processing. The preliminary abnormal value detection can be performed with an n-sigma (generally n ≥ 3) abnormal value detection method. Other methods capable of preliminarily screening abnormal values can be used instead, for example the box-plot-based interquartile range (IQR) method, the DBSCAN clustering algorithm, and the like.
The processes of abnormal value detection using the n-sigma method, the interquartile range method and the DBSCAN clustering algorithm are well known to those skilled in the art and are not described here.
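As an illustration of the preliminary screening, the following sketch applies the n-sigma rule and replaces the flagged values with NaN so that they are handled as missing values afterwards. It is a minimal example under stated assumptions: missing values are represented as NaN, the population standard deviation is used, and the function name `screen_outliers` is hypothetical.

```python
import numpy as np

def screen_outliers(values, n=3.0):
    """Replace values farther than n standard deviations from the mean with NaN."""
    x = np.asarray(values, dtype=float).copy()
    mu = np.nanmean(x)      # existing missing values are ignored
    sigma = np.nanstd(x)
    if sigma > 0:
        x[np.abs(x - mu) > n * sigma] = np.nan
    return x

raw = [1.0] * 20 + [100.0]   # one obvious spike at the end
screened = screen_outliers(raw, n=3.0)
```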
If the sequence data acquired by the sensor is not the sequence data sampled at equal time intervals, the sequence data acquired by the sensor needs to be resampled to obtain the sequence data sampled at equal time intervals. This process usually occurs in the case of variable frequency sampling of the sensor, and if the sequence data collected by the sensor is already sequence data at equal time intervals, it is not necessary to resample the data.
In one embodiment, if the sequence data acquired by the sensor is not sequence data sampled at equal time intervals, the sequence data acquired by the sensor needs to be resampled to obtain the sequence data sampled at equal time intervals, and then abnormal values in the sequence data are processed according to missing values to obtain final sequence data.
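The application does not specify how the resampling is carried out. One possible approach, shown here purely as an assumption, is linear interpolation onto an equally spaced time grid with `numpy.interp`; the function name `resample_equal_interval` and the `step` parameter are hypothetical.

```python
import numpy as np

def resample_equal_interval(times, values, step):
    """Linearly interpolate irregularly sampled data onto a grid with spacing `step`."""
    t = np.asarray(times, dtype=float)
    v = np.asarray(values, dtype=float)
    # Half a step of slack keeps the last timestamp when it falls on the grid.
    grid = np.arange(t[0], t[-1] + step / 2, step)
    return grid, np.interp(grid, t, v)

# Samples at t = 0, 1, 3, 4 (the sample at t = 2 was never taken).
grid, resampled = resample_equal_interval([0, 1, 3, 4], [0, 10, 30, 40], step=1)
```

Interpolating in this way fills short sampling gaps; a deployment that wants long gaps to remain visible as missing values would need a different strategy, such as snapping samples to the grid and leaving empty slots as NaN.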
Step S202: preprocessing the sequence data, the preprocessing comprising standardization or normalization.
After the sequence data collected by the sensor are obtained, standardization or normalization preprocessing is performed on the sequence data.
When the sequence data are standardized, the mean μ and the standard deviation σ of the sequence data can be calculated first, and each data x other than missing values in the sequence data is then converted into new data x' using the formula x' = (x - μ)/σ, so that standardized sequence data are obtained. When the mean μ and the standard deviation σ of the sequence data are calculated, only data that have a specific numerical value take part in the calculation; missing data have no numerical value and therefore do not participate.
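The standardization step, with missing values excluded from the statistics, can be sketched as follows. The sketch assumes missing values are stored as NaN and uses the population standard deviation (the text does not say which estimator is intended); the function name `standardize` is hypothetical.

```python
import numpy as np

def standardize(values):
    """Return (x - mu) / sigma, with mu and sigma computed over non-missing data only."""
    x = np.asarray(values, dtype=float)
    mu = np.nanmean(x)
    sigma = np.nanstd(x)
    return (x - mu) / sigma, mu, sigma

# The NaN entry stays NaN and does not affect mu or sigma.
z, mu, sigma = standardize([1.0, np.nan, 3.0])
```

Returning μ and σ alongside the transformed data keeps them available for the later inverse standardization of predicted values.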
When the sequence data is normalized, normalized sequence data with values between 0 and 1 is obtained. Each data x in the sequence data other than missing values is converted into new data x' using the formula $x' = [x - \min(x)]/[\max(x) - \min(x)]$, where max(x) and min(x) represent the maximum and minimum values in the raw sequence data, respectively. Only data having a specific numerical value is normalized; missing data has no numerical value and is not normalized.
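The two preprocessing options can be sketched as follows. This is an illustrative example only: missing values are assumed to be stored as NaN, the population standard deviation (NumPy's default) is assumed because the text does not say which estimator is used, and the function names are hypothetical.

```python
import numpy as np

def standardize(x):
    """x' = (x - mu) / sigma, with mu and sigma computed over non-missing data."""
    x = np.asarray(x, dtype=float)
    mu = np.nanmean(x)      # NaN (missing) entries are ignored
    sigma = np.nanstd(x)    # population standard deviation (assumption)
    return (x - mu) / sigma, mu, sigma

def min_max_normalize(x):
    """x' = (x - min(x)) / (max(x) - min(x)), ignoring missing data."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    return (x - lo) / (hi - lo), lo, hi

series = np.array([1.0, np.nan, 3.0, 5.0])
z, mu, sigma = standardize(series)        # missing entry stays NaN
u, lo, hi = min_max_normalize(series)     # values fall between 0 and 1
```

Returning μ, σ, min(x) and max(x) alongside the transformed data keeps the parameters available for the inverse operation used later when abnormal data is corrected.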
Step S103: performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data.
After the data is standardized or normalized, sliding window processing can be performed on the preprocessed data to obtain a plurality of window data. Each window data obtained by the sliding window operation has the same data length as the selected sliding window size s, e.g., 50 data each.
The process of performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data comprises the following steps: if the length of the continuous missing value in the preprocessed sequence data exceeds the preset length, dividing the preprocessed sequence data into a plurality of subdata fragments according to the length of the continuous missing value exceeding the preset length, performing interpolation completion processing on the missing value in the subdata fragments of which the data length is not less than the size of the sliding window, and performing sliding window processing on the subdata fragments after the interpolation completion processing to obtain a plurality of window data.
If the length of a run of consecutive missing values in the preprocessed sequence data exceeds a preset length (for example, the preset length is 60% of the sliding window size s, i.e., 50 × 60% = 30), the preprocessed sequence data is divided into a plurality of sub-data segments at the runs of consecutive missing values whose length exceeds the preset length. If 1 run of consecutive missing values exceeds the preset length, the preprocessed sequence data can be divided into 2 sub-data segments; if 3 runs of consecutive missing values exceed the preset length, the preprocessed sequence data can be divided into 4 sub-data segments. For ease of understanding, it is assumed that the length of the preprocessed sequence data is 1000 and that its data are numbered 1-1000, wherein 100-. After the preprocessed sequence data is divided into a plurality of sub-data segments in this way, the sub-data segments whose data length is smaller than the sliding window size are discarded, interpolation completion processing is performed on the missing values in the sub-data segments whose data length is not smaller than the sliding window size, and sliding window processing is performed on the sub-data segments after the interpolation completion processing to obtain a plurality of window data.
The interpolation and completion processing on the missing value may be interpolation and completion processing by using a linear interpolation method. It should be noted that the completion method is not limited to the linear interpolation method, and for example, the data may be completed using mean completion or machine learning using correlation features.
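The splitting, completion and windowing described above can be sketched as below. This is a minimal illustration under stated assumptions: missing values are stored as NaN, linear interpolation is used for completion, the window stride is 1, and all function names are hypothetical.

```python
import numpy as np

def split_on_long_gaps(x, max_gap):
    """Split x into segments at runs of more than max_gap consecutive NaNs."""
    x = np.asarray(x, dtype=float)
    segments, start, i, n = [], 0, 0, len(x)
    while i < n:
        if np.isnan(x[i]):
            j = i
            while j < n and np.isnan(x[j]):
                j += 1                      # j is one past the end of the NaN run
            if j - i > max_gap:             # run too long: cut the sequence here
                if i > start:
                    segments.append(x[start:i])
                start = j
            i = j
        else:
            i += 1
    if start < n:
        segments.append(x[start:])
    return segments

def interpolate_missing(seg):
    """Fill the remaining (short) NaN runs by linear interpolation."""
    seg = seg.copy()
    idx = np.arange(len(seg))
    missing = np.isnan(seg)
    seg[missing] = np.interp(idx[missing], idx[~missing], seg[~missing])
    return seg

def make_windows(x, s, max_gap):
    """Discard short segments, complete the rest, and slide a window of size s."""
    windows = []
    for seg in split_on_long_gaps(x, max_gap):
        if len(seg) < s or np.isnan(seg).all():
            continue
        seg = interpolate_missing(seg)
        windows.extend(seg[k:k + s] for k in range(len(seg) - s + 1))
    return np.array(windows)

x = np.arange(20.0)
x[3] = np.nan          # short gap: completed by interpolation
x[8:14] = np.nan       # gap of 6 > max_gap: the sequence is split here
windows = make_windows(x, s=5, max_gap=3)   # max_gap = 60% of s
```

In this example the sequence is split into a segment of length 8 and a segment of length 6, giving 4 + 2 = 6 windows.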
Step S104: inputting the plurality of window data into a pre-trained VAE model for data anomaly detection for processing, so as to obtain the abnormal data in the plurality of window data and the predicted value of the abnormal data.
After obtaining the plurality of window data, inputting the obtained plurality of window data into a pre-trained VAE model for data anomaly detection for processing, so that the anomaly data in the plurality of window data and the predicted value of the anomaly data can be obtained.
For data i (i is a data number), if the absolute value of the difference between the predicted value of data i in the prediction window data and data i is greater than a preset threshold, data i can be considered abnormal data.
Optionally, the preset threshold may be three times the standard deviation of the set $\{|y_i - x_i|,\ i = 1, 2, \ldots\}$ and can be adjusted later as needed; the preset threshold can be regarded as an adjustable hyper-parameter of the model. Here $y_i$ is the predicted value of data i, and $x_i$ is the value of data i in the window data. The preset threshold may also be set to a fixed value; therefore, three times the standard deviation of the above set should not be understood as a limitation on the preset threshold of the present application.
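The thresholding rule can be sketched as follows, assuming the predicted values $y_i$ are already available as an array aligned with the data $x_i$. The function name and the multiplier parameter `k` are hypothetical; `k = 3` corresponds to three times the standard deviation.

```python
import numpy as np

def detect_anomalies(x, y, k=3.0):
    """Flag data whose absolute prediction error exceeds k standard deviations
    of the set {|y_i - x_i|}."""
    residual = np.abs(np.asarray(y, dtype=float) - np.asarray(x, dtype=float))
    threshold = k * residual.std()
    return residual > threshold, threshold

# 100 data points with one spike; the stand-in predictions are all zero.
x = np.zeros(100)
x[40] = 10.0
y = np.zeros(100)
flags, threshold = detect_anomalies(x, y)
```

Because the threshold is a hyper-parameter, `k` can be tuned, or the computed threshold can be replaced by a fixed value.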
In an optional embodiment, the process of inputting the plurality of window data into the pre-trained VAE model for data anomaly detection may be: performing a reverse-order operation on each of the plurality of window data to obtain the corresponding reverse-order window data, and inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain the abnormal data and the predicted value of the abnormal data. Because the reverse-order window data is also taken into account, the data is effectively processed in both directions, so the judgment of abnormal data is more accurate.
Optionally, in the case of considering the reverse-order window data, the process of inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain the abnormal data and the predicted value of the abnormal data may be: inputting each window data and the corresponding reverse-order window data into the VAE model, and outputting the prediction window data corresponding to each window data and the prediction reverse-order window data corresponding to each reverse-order window data; for data i (i is a data number), if the absolute values of the differences between data i and its predicted values in the prediction window data and in the prediction reverse-order window data are greater than a preset threshold, data i can be considered abnormal data. That is, data i is considered abnormal data if the absolute value of the difference between the predicted value of data i in the prediction window data and data i is greater than the preset threshold, and the absolute value of the difference between the predicted value of data i in the prediction reverse-order window data and data i is also greater than the preset threshold.
In the standard case, the input of the VAE model is one window data, such as $X_1 = \{x_1, x_2, x_3, \ldots, x_{50}\}$, and the output is a set of values corresponding to $X_1$, namely $Y_1 = \{y_1, y_2, y_3, \ldots, y_{50}\}$. If only the last value of the prediction data corresponding to a window data, e.g. $y_{50}$, is taken as the predicted value of the last data of that window data, e.g. $x_{50}$, then, in the case of considering the reverse-order window data, the process of inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain the abnormal data and the predicted value of the abnormal data may be: inputting each window data and the corresponding reverse-order window data into the VAE model, and outputting the prediction window data corresponding to each window data and the prediction reverse-order window data corresponding to each reverse-order window data; for data i (i is a data number) whose data number is not less than the sliding window size s, if the absolute values of the differences between the data and the predicted values of data i in the (i-s+1)-th prediction window data and in the i-th prediction reverse-order window data are greater than a preset threshold, the data is abnormal data. That is, data i is considered abnormal data if the absolute value of the difference between the predicted value of data i in the (i-s+1)-th prediction window data and data i is greater than the preset threshold, and the absolute value of the difference between the predicted value of data i in the i-th prediction reverse-order window data and data i is also greater than the preset threshold.
In this embodiment, only the last value of the prediction data corresponding to a window data, e.g. $y_{50}$ in the prediction data $Y_1$ corresponding to $X_1$, is taken as the predicted value of the last data of that window data, e.g. $x_{50}$ of $X_1$. For data i whose data number is not less than the sliding window size s, the last data in the (i-s+1)-th prediction window data and the last data in the i-th prediction reverse-order window data both correspond to data i.
For ease of understanding, a sliding window of size s = 50 applied to 100 data is taken as an example. The 1st window data is $X_1 = \{x_1, x_2, x_3, \ldots, x_{50}\}$, the 2nd window data is $X_2 = \{x_2, x_3, x_4, \ldots, x_{51}\}$, ..., the 50th window data is $X_{50} = \{x_{50}, x_{51}, x_{52}, \ldots, x_{99}\}$, and the 51st window data is $X_{51} = \{x_{51}, x_{52}, x_{53}, \ldots, x_{100}\}$. Correspondingly, with an overbar denoting reverse order, the reverse-order window data corresponding to the 1st window data is $\bar{X}_1 = \{x_{50}, x_{49}, x_{48}, \ldots, x_1\}$, the reverse-order window data corresponding to the 2nd window data is $\bar{X}_2 = \{x_{51}, x_{50}, x_{49}, \ldots, x_2\}$, ..., the reverse-order window data corresponding to the 50th window data is $\bar{X}_{50} = \{x_{99}, x_{98}, x_{97}, \ldots, x_{50}\}$, and the reverse-order window data corresponding to the 51st window data is $\bar{X}_{51} = \{x_{100}, x_{99}, x_{98}, \ldots, x_{51}\}$.
Suppose it is to be judged whether data $x_{50}$ is abnormal; then i is 50. Correspondingly, the (i-s+1)-th prediction window data is the 1st prediction window data, i.e. the prediction data corresponding to the 1st window data, which can be represented as $Y_1 = \{y_1, y_2, y_3, \ldots, y_{50}\}$, and the i-th prediction reverse-order window data is the prediction data corresponding to the 50th reverse-order window data, which can be represented as $\bar{Y}_{50} = \{\bar{y}_{99}, \bar{y}_{98}, \bar{y}_{97}, \ldots, \bar{y}_{50}\}$. If the absolute value of the difference between the predicted value $\bar{y}_{50}$ of data $x_{50}$ in the 50th prediction reverse-order window data and the data $x_{50}$ is greater than a preset threshold, and the absolute value of the difference between the predicted value $y_{50}$ of data $x_{50}$ in the 1st prediction window data and the data $x_{50}$ is greater than the preset threshold, the data $x_{50}$ can be considered abnormal data. In this embodiment, since the data following data i is also taken into account, more data is considered, resulting in higher prediction accuracy.
In this embodiment, for data i whose data number is not less than the sliding window size s, 2 abnormality prediction results can be obtained through 2 window data (i.e., $X_{i-s+1}$ and $X_i$); the data point is marked as abnormal only when both prediction results are abnormal, and otherwise it is marked as a normal value. It should be noted that, when the input data includes window data and reverse-order window data, whether the data is abnormal may also be judged from only one window data: if the absolute values of the differences between the data and the predicted values of data i in the (i-s+1)-th prediction window data and in the (i-s+1)-th prediction reverse-order window data are greater than a preset threshold, the data is marked as abnormal.
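The two-direction check can be sketched as follows. The trained VAE is not reproduced here: `stand_in_model` is a hypothetical placeholder that reconstructs every element of a window as the window median, and the function names and 0-based indexing are choices of this sketch. Data whose number exceeds n-s+1 has no i-th reverse-order window and is simply left unflagged here.

```python
import numpy as np

def stand_in_model(windows):
    """Hypothetical placeholder for the trained VAE (not the real model):
    predicts every element of a window as the window median."""
    med = np.median(windows, axis=1, keepdims=True)
    return np.repeat(med, windows.shape[1], axis=1)

def bidirectional_flags(x, s, threshold, model=stand_in_model):
    """Mark data i as abnormal only if both the forward and the reverse-order
    predictions differ from it by more than the threshold."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Row k (0-based) holds window data X_{k+1} = {x_{k+1}, ..., x_{k+s}}.
    windows = np.stack([x[k:k + s] for k in range(n - s + 1)])
    pred_fwd = model(windows)              # prediction window data
    pred_rev = model(windows[:, ::-1])     # prediction reverse-order window data
    flags = np.zeros(n, dtype=bool)
    # 0-based position p is data number i = p + 1, with s <= i <= n - s + 1.
    for p in range(s - 1, n - s + 1):
        # Data i is the last element of window i-s+1 (row p-s+1) ...
        err_fwd = abs(pred_fwd[p - s + 1, -1] - x[p])
        # ... and the last element of reverse-order window i (row p).
        err_rev = abs(pred_rev[p, -1] - x[p])
        flags[p] = (err_fwd > threshold) and (err_rev > threshold)
    return flags

x = np.zeros(30)
x[15] = 5.0                                 # a single spike
flags = bidirectional_flags(x, s=5, threshold=1.0)
```

Requiring both errors to exceed the threshold is the conjunction described in the text; the forward window sees the data before data i, and the reverse-order window sees the data after it.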
In an optional embodiment, if standardization preprocessing is performed on the sequence data, the method further comprises: performing a negation operation on the data in each window data to obtain the corresponding negated window data, and performing a negation operation on the data in the reverse-order window data corresponding to each window data to obtain the negated reverse-order window data. For example, negating the 1st window data $X_1 = \{x_1, x_2, x_3, \ldots, x_{50}\}$ yields the corresponding negated window data $X_{-1} = \{-x_1, -x_2, -x_3, \ldots, -x_{50}\}$; negating the reverse-order window data $\bar{X}_1 = \{x_{50}, x_{49}, x_{48}, \ldots, x_1\}$ corresponding to the 1st window data (the overbar denoting reverse order) yields the corresponding negated reverse-order window data $\bar{X}_{-1} = \{-x_{50}, -x_{49}, -x_{48}, \ldots, -x_1\}$. Correspondingly, the process of inputting each window data and the corresponding reverse-order window data into the VAE model for processing comprises: inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing.
Optionally, the process of inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing may be: inputting them into the VAE model, and outputting the prediction window data corresponding to each window data, the prediction reverse-order window data corresponding to each reverse-order window data, the prediction negated window data corresponding to each negated window data, and the prediction negated reverse-order window data corresponding to each negated reverse-order window data; for data i whose data number is not less than the sliding window size s, if the absolute values of the differences between the data and the respective predicted values of data i in the (i-s+1)-th prediction window data, the (i-s+1)-th prediction negated window data, the i-th prediction reverse-order window data and the i-th prediction negated reverse-order window data are all greater than a preset threshold, the data is abnormal data. That is, data i is abnormal data only when the absolute values of the differences between all 4 predicted values and the data are greater than the preset threshold.
In this embodiment, for data i whose data number is not less than the sliding window size s, 4 abnormality prediction results can be obtained through 2 window data (i.e., $X_{i-s+1}$ and $X_i$); the data point is marked as abnormal only when all 4 prediction results are abnormal, and otherwise it is marked as a normal value. It should be noted that, when the input data includes window data, reverse-order window data, negated window data and negated reverse-order window data, whether the data is abnormal may also be judged from only one window data: if the absolute values of the differences between the data and the predicted values of data i in the prediction data obtained from the (i-s+1)-th window data (the prediction window data, the prediction reverse-order window data, the prediction negated window data and the prediction negated reverse-order window data) are all greater than a preset threshold, the data is marked as abnormal.
To facilitate understanding of the above window data, reverse-order window data, negated window data and negated reverse-order window data, and of the corresponding prediction window data, prediction reverse-order window data, prediction negated window data and prediction negated reverse-order window data, the 1st window data $X_1 = \{x_1, x_2, x_3, \ldots, x_{50}\}$ is taken as an example. Performing the reverse-order operation on $X_1$ yields the reverse-order window data $\bar{X}_1 = \{x_{50}, x_{49}, x_{48}, \ldots, x_1\}$, the overbar denoting reverse order. Performing the negation operation on the window data $X_1$ and on the reverse-order window data $\bar{X}_1$ respectively yields the negated window data $X_{-1} = \{-x_1, -x_2, -x_3, \ldots, -x_{50}\}$ and the negated reverse-order window data $\bar{X}_{-1} = \{-x_{50}, -x_{49}, -x_{48}, \ldots, -x_1\}$. The prediction window data corresponding to the window data $X_1$ may be represented as $Y_1 = \{y_1, y_2, y_3, \ldots, y_{50}\}$; the prediction negated window data corresponding to the negated window data $X_{-1}$ may be represented as $Y_{-1} = \{y_{-1,1}, y_{-1,2}, y_{-1,3}, \ldots, y_{-1,50}\}$; the prediction reverse-order window data corresponding to the reverse-order window data $\bar{X}_1$ may be represented as $\bar{Y}_1 = \{\bar{y}_{50}, \bar{y}_{49}, \bar{y}_{48}, \ldots, \bar{y}_1\}$; and the prediction negated reverse-order window data corresponding to the negated reverse-order window data $\bar{X}_{-1}$ may be represented as $\bar{Y}_{-1} = \{\bar{y}_{-1,50}, \bar{y}_{-1,49}, \bar{y}_{-1,48}, \ldots, \bar{y}_{-1,1}\}$.
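The four variants of a window can be illustrated with a short sketch. The function name `four_views` and the dictionary keys are hypothetical; a window of 5 values stands in for the 50-value window of the example.

```python
import numpy as np

def four_views(window):
    """Return a window together with its reverse-order, negated and
    negated reverse-order variants."""
    w = np.asarray(window, dtype=float)
    return {
        "window": w,                    # {x_1, ..., x_s}
        "reverse": w[::-1],             # {x_s, ..., x_1}
        "negated": -w,                  # {-x_1, ..., -x_s}
        "negated_reverse": -w[::-1],    # {-x_s, ..., -x_1}
    }

views = four_views([0.5, -1.0, 2.0, 0.0, 1.5])

# Data i is the last element of the window and of the negated window,
# and the first element of the two reverse-order variants of the same window.
last_value = views["window"][-1]
```

Negation only makes sense for standardized data, which is centered on zero; min-max normalized data lies between 0 and 1 and would leave that range when negated.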
The VAE model described above may also be used only to output prediction data, without being used for abnormal data detection. Therefore, the above exemplary embodiments, in which the model both outputs prediction data and is used for abnormal data detection after the prediction data is obtained, should not be construed as limiting the VAE model of the present application.
It should be noted that the VAE model is a VAE model trained in advance for data anomaly detection; that is, before step S104, the method further includes: acquiring a training data set, wherein the training data set comprises a plurality of window data, and training an initial VAE model with the training data set to obtain the VAE model. In this embodiment, the training data may include only a plurality of window data, and the data input into the model when the model is applied may likewise include only a plurality of window data. In this embodiment, the process of obtaining the training data set may be: acquiring training sequence data which is acquired by a sensor at different times and arranged in time sequence; performing standardization or normalization preprocessing on the training sequence data; and performing sliding window processing on the preprocessed training sequence data to obtain a plurality of window data, thereby obtaining the training data set.
In an optional embodiment, the training data set further includes, in addition to the plurality of window data, the reverse-order window data corresponding to each of the plurality of window data. In this embodiment, the process of obtaining the training data set may be: acquiring training sequence data which is acquired by a sensor at different times and arranged in time sequence; performing standardization or normalization preprocessing on the training sequence data; performing sliding window processing on the preprocessed training sequence data to obtain a plurality of window data; and performing a reverse-order operation on each window data to obtain a training data set comprising the plurality of window data and the corresponding reverse-order window data.
If the training sequence data is standardized, in an optional embodiment, the training data set includes, in addition to the plurality of window data and the reverse-order window data corresponding to each window data, negated window data obtained by negating each window data and negated reverse-order window data obtained by negating each reverse-order window data. In this embodiment, the process of obtaining the training data set may be: acquiring training sequence data which is acquired by a sensor at different times and arranged in time sequence; performing standardization preprocessing on the training sequence data; performing sliding window processing on the preprocessed training sequence data to obtain a plurality of window data; and performing the reverse-order operation and the negation operation on each window data to obtain the corresponding reverse-order window data and negated window data, and performing the negation operation on each reverse-order window data to obtain the corresponding negated reverse-order window data, thereby obtaining a training data set comprising the plurality of window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data.
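The three training-set configurations described above can be sketched as follows. The function name `build_training_set` and the `variant` labels are hypothetical, and the windows are assumed to be already preprocessed and stacked as rows of a 2-D array.

```python
import numpy as np

def build_training_set(windows, variant="plain"):
    """Assemble a training data set from window data.

    variant = "plain":          window data only
    variant = "reverse":        plus reverse-order window data
    variant = "reverse_negate": plus negated and negated reverse-order window
                                data (intended for standardized data)
    """
    windows = np.asarray(windows, dtype=float)
    parts = [windows]
    if variant in ("reverse", "reverse_negate"):
        parts.append(windows[:, ::-1])        # reverse each window
    if variant == "reverse_negate":
        parts.append(-windows)                # negate each window
        parts.append(-windows[:, ::-1])       # negate each reverse-order window
    if variant not in ("plain", "reverse", "reverse_negate"):
        raise ValueError(f"unknown variant: {variant}")
    return np.concatenate(parts, axis=0)

windows = np.array([[1.0, 2.0, 3.0],
                    [2.0, 3.0, 4.0]])
train = build_training_set(windows, variant="reverse_negate")
```

With the full variant the training set is four times the size of the original set of windows, which is one way to read the data-augmentation effect of the reverse-order and negation operations.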
The terms training sequence data and sequence data are used merely for convenience of distinction (that is, in the model training stage, a data group acquired by the sensor at different times and arranged in time sequence is referred to as training sequence data, and in the model application stage, such a data group is referred to as sequence data). There is no substantial difference between the two; both refer to a data group acquired by the sensor at different times and arranged in time sequence.
Step S105: for each abnormal data, correcting the abnormal data according to the predicted value of the abnormal data.
After the abnormal data is obtained, the abnormal data can be corrected according to the predicted value of the abnormal data. The predicted value corresponding to the abnormal data may be subjected to inverse operation processing of preprocessing, and then the abnormal data may be corrected according to the data obtained by the inverse operation processing.
In an optional embodiment, if the input data only includes a plurality of window data, in this embodiment, the abnormal data is corrected to data obtained by performing inverse operation processing of preprocessing the predicted value corresponding to the abnormal data.
If the input data includes a plurality of window data and corresponding reverse-order window data, in this embodiment, the process of correcting the abnormal data according to the predicted value of the abnormal data may be: acquiring a first predicted value of the abnormal data in window data and a second predicted value of the abnormal data in reverse-order window data; obtaining an average value of the first predicted value and the second predicted value; and carrying out the inverse operation of preprocessing on the average value, and correcting the abnormal data into the data obtained by carrying out the inverse operation of preprocessing on the average value. Of course, the first predicted value and the second predicted value may be subjected to the inverse operation of the preprocessing, and then 2 data obtained after the inverse operation processing may be averaged, where the average value is the correction value of the abnormal data.
If the input data includes a plurality of window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data, in this embodiment, the process of correcting the abnormal data according to the predicted value of the abnormal data may be: acquiring a first predicted value of the abnormal data in the window data, a second predicted value of the abnormal data in the reverse-order window data, a third predicted value of the abnormal data in the negated window data and a fourth predicted value of the abnormal data in the negated reverse-order window data; obtaining the average value of the first, second, third and fourth predicted values; and performing inverse standardization processing on the average value, and correcting the abnormal data to the data obtained by performing inverse standardization processing on the average value. Of course, the inverse operation of the preprocessing may instead be performed on the first, second, third and fourth predicted values first, and the average value of the 4 data obtained after the inverse operation may then be calculated; this average value is the correction value of the abnormal data.
The formula of the inverse operation of standardization is $\hat{x} = \sigma \cdot x_i + \mu$; the formula of the inverse operation of normalization is $\hat{x} = [\max(x) - \min(x)] \cdot x_i + \min(x)$, where $\hat{x}$ is the correction value and $x_i$ is a predicted value or an average value of predicted values.
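The correction step can be sketched as follows. The function names are hypothetical, and the predicted values are assumed to be expressed on the preprocessed scale of the data being corrected. The example also checks that averaging before or after the inverse operation gives the same result, since both inverse operations are affine.

```python
import numpy as np

def inverse_standardize(x, mu, sigma):
    """Inverse of x' = (x - mu) / sigma."""
    return sigma * x + mu

def inverse_min_max(x, lo, hi):
    """Inverse of x' = (x - min) / (max - min)."""
    return (hi - lo) * x + lo

def corrected_value(predictions, inverse, *params):
    """Average the predicted values, then undo the preprocessing."""
    return inverse(np.mean(predictions), *params)

# Two predicted values for one abnormal data point (e.g. forward and reverse).
preds = [0.4, 0.6]
mu, sigma = 10.0, 2.0

average_then_invert = corrected_value(preds, inverse_standardize, mu, sigma)
invert_then_average = np.mean([inverse_standardize(p, mu, sigma) for p in preds])

# Same idea for min-max normalization with min(x) = 2 and max(x) = 6.
corrected_min_max = corrected_value([0.5], inverse_min_max, 2.0, 6.0)
```

The abnormal data would then be replaced by the corrected value in the original sequence.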
Based on the same inventive concept, the embodiment of the present application further provides a model training method, and the following describes the model training method provided by the embodiment of the present application with reference to fig. 2.
Step S201: acquiring sequence data which are acquired by the sensor at different times and are arranged in time sequence.
This step is the same as step S101 described above, and step S201 will not be described here in order to avoid redundancy.
Step S202: preprocessing the sequence data, the preprocessing comprising standardization or normalization.
This step is the same as step S102 described above, and step S202 will not be described here in order to avoid redundancy.
Step S203: performing sliding window processing on the preprocessed sequence data to obtain a training data set, wherein the training data set comprises a plurality of window data.
The step is the same as step S103, and for avoiding redundancy, the same parts as step S103 will not be described here, and the detailed parts refer to the corresponding contents in step S103.
It should be noted that, in an embodiment, the training data set includes, in addition to the plurality of window data, reverse-order window data corresponding to each of the plurality of window data. At this time, the process of performing sliding window processing on the preprocessed sequence data to obtain the training data set may be: performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data; and performing reverse order operation on each window data to obtain a training data set comprising a plurality of window data and corresponding reverse order window data.
In one embodiment, the training data set includes, in addition to the plurality of window data and the corresponding reverse-order window data, negated window data obtained by negating each window data and negated reverse-order window data obtained by negating each reverse-order window data. In this case, the process of performing sliding window processing on the preprocessed sequence data to obtain the training data set may be: performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data; and performing the reverse-order operation and the negation operation on each window data to obtain the corresponding reverse-order window data and negated window data, and performing the negation operation on each reverse-order window data to obtain the corresponding negated reverse-order window data, thereby obtaining a training data set comprising the plurality of window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data.
Step S204: training the VAE model with the training data set to obtain a trained VAE model for detecting sensor data abnormality.
The VAE model is trained with the training data set so that it learns the information of all the data and, according to the learned information, converts input window data into another group of data that is as close as possible to the original data and outputs it. Since the large majority of all the data are normal values, new data is converted mainly according to the learned information of normal data. Throughout the process, the role of the VAE is to use the learned information to convert data into the new data of maximum probability, on the basis of which abnormal value detection and abnormal value correction are then performed.
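The text does not state the network architecture or the loss function used to train the VAE. As a hedged illustration only, the sketch below shows the objective commonly used for a VAE with a Gaussian latent variable: a squared-error reconstruction term plus the KL divergence to a standard normal prior, together with the reparameterization step. The function names, the choice of squared error and the array shapes are assumptions of this sketch, not details taken from the source.

```python
import numpy as np

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def vae_loss(x, x_recon, mu, log_var):
    """Mean over the batch of (reconstruction error + KL divergence).

    x, x_recon: (batch, window_size) input windows and their reconstructions
    mu, log_var: (batch, latent_dim) encoder outputs
    """
    recon = np.sum((x - x_recon) ** 2, axis=1)
    # KL( N(mu, sigma^2) || N(0, 1) ), summed over latent dimensions.
    kl = -0.5 * np.sum(1.0 + log_var - mu ** 2 - np.exp(log_var), axis=1)
    return np.mean(recon + kl)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 50))          # 4 windows of size 50
mu = np.zeros((4, 8))                     # encoder mean (8 latent dimensions)
log_var = np.zeros((4, 8))                # encoder log-variance
z = reparameterize(mu, log_var, rng)      # latent sample fed to a decoder
perfect = vae_loss(x, x, mu, log_var)     # perfect reconstruction, prior-matched code
worse = vae_loss(x, np.zeros_like(x), mu + 1.0, log_var)
```

Minimizing such an objective on mostly normal data pushes reconstructions toward typical patterns, which is consistent with the behaviour described above, where abnormal values are reconstructed poorly and can be detected by their prediction error.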
After the trained VAE model is obtained, window data are input, and abnormal data and predicted values of the abnormal data in the window data can be obtained. Please refer to the detailed process of step S104 above for how to obtain the abnormal data in the window data.
Based on the same inventive concept, the embodiment of the present application further provides a data processing apparatus 100, as shown in fig. 3, where the data processing apparatus 100 includes: the device comprises an acquisition module 110, a preprocessing module 120, a sliding window module 130 and a processing module 140.
The obtaining module 110 is configured to obtain sequence data that is collected by the sensor at different times and arranged in time sequence. Optionally, the obtaining module 110 is specifically configured to: acquire initial sequence data which is acquired by a sensor at different times and arranged in time sequence; and treat abnormal values in the initial sequence data as missing values to obtain the sequence data. Optionally, the obtaining module 110 is specifically configured to: if the sequence data acquired by the sensor is not sequence data sampled at equal time intervals, resample the sequence data acquired by the sensor to obtain sequence data sampled at equal time intervals.
The preprocessing module 120 is configured to preprocess the sequence data, wherein the preprocessing includes standardization or normalization.
The sliding window module 130 is configured to perform sliding window processing on the preprocessed sequence data to obtain a plurality of window data. Optionally, the sliding window module 130 is specifically configured to: if the length of a run of consecutive missing values in the preprocessed sequence data exceeds a preset length, divide the preprocessed sequence data into a plurality of sub-data segments at the runs of consecutive missing values whose length exceeds the preset length; perform interpolation completion processing on the missing values in those sub-data segments whose data length is not less than the sliding window size; and perform sliding window processing on the sub-data segments after the interpolation completion processing to obtain the plurality of window data.
The processing module 140 is configured to input the multiple window data into a pre-trained VAE model for data anomaly detection to perform processing, so as to obtain anomaly data in the multiple window data and a predicted value of the anomaly data; and for each abnormal data, correcting the abnormal data according to the predicted value of the abnormal data.
Optionally, the processing module 140 is specifically configured to: performing reverse order operation on each window data in the plurality of window data to obtain corresponding reverse order window data; and inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain abnormal data and a predicted value of the abnormal data. The processing module 140 is specifically configured to: inputting each window data and the corresponding reverse-order window data into the VAE model, and outputting the prediction window data corresponding to each window data and the prediction reverse-order window data corresponding to each reverse-order window data; and aiming at data i with the data number not smaller than the sliding window size s, if the absolute value of the difference value between the data i and the predicted value of the data i in the (i-s + 1) th prediction window data and the ith prediction reverse order window data is larger than a preset threshold value, the data is abnormal data. The processing module 140 is specifically configured to: acquiring a first predicted value of the abnormal data in window data and a second predicted value of the abnormal data in reverse-order window data; obtaining an average value of the first predicted value and the second predicted value; and carrying out the inverse operation of the pretreatment on the average value, and correcting the abnormal data into data obtained by carrying out the inverse operation of the pretreatment on the average value.
If standardization preprocessing is performed on the sequence data, optionally, the processing module 140 is further configured to: perform a negation operation on the data in each window data to obtain corresponding negated window data; and perform a negation operation on the data in the reverse-order window data corresponding to each window data to obtain negated reverse-order window data. Accordingly, the processing module 140 is specifically configured to: input each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing. The processing module 140 is specifically configured to: input each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model, and output prediction window data corresponding to each window data, prediction reverse-order window data corresponding to each reverse-order window data, prediction negated window data corresponding to each negated window data and prediction negated reverse-order window data corresponding to each negated reverse-order window data; and, for data i whose data number is not smaller than the sliding window size s, if the absolute values of the differences between data i and its respective predicted values in the (i-s+1)-th prediction window data, the (i-s+1)-th prediction negated window data, the i-th prediction reverse-order window data and the i-th prediction negated reverse-order window data are larger than a preset threshold, determine that data i is abnormal data.
The processing module 140 is specifically configured to: acquire a first predicted value of the abnormal data in the window data, a second predicted value of the abnormal data in the reverse-order window data, a third predicted value of the abnormal data in the negated window data and a fourth predicted value of the abnormal data in the negated reverse-order window data; obtain the average value of the first predicted value, the second predicted value, the third predicted value and the fourth predicted value; and perform inverse standardization processing on the average value, and correct the abnormal data to the data obtained by performing inverse standardization processing on the average value.
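A minimal sketch of the four inputs built from one window, and of the averaging followed by inverse standardization, is given below. The function names are hypothetical. One point is an assumption not spelled out above: predictions obtained from the negated inputs are sign-flipped back before averaging, so that all four predicted values are on the same scale as the original data.

```python
import numpy as np


def four_views(window):
    """Build the four model inputs derived from one standardized window."""
    w = np.asarray(window, dtype=float)
    return {
        "window": w,
        "reverse": w[::-1],             # reverse-order window data
        "negated": -w,                  # negated window data
        "negated_reverse": -w[::-1],    # negated reverse-order window data
    }


def corrected_value(p_window, p_reverse, p_negated, p_negated_reverse, mean, std):
    """Average the four predicted values and undo the standardization.

    Assumption: the two predictions made on negated inputs are negated back
    before averaging.
    """
    avg = (p_window + p_reverse - p_negated - p_negated_reverse) / 4.0
    return avg * std + mean             # inverse of z = (x - mean) / std


views = four_views([1.0, 2.0, 3.0])
value = corrected_value(0.5, 0.7, -0.4, -0.8, mean=20.0, std=10.0)
```

Negation is only meaningful here because standardized data is centred on zero, which is consistent with the text restricting this option to the standardization case.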
Optionally, the data processing apparatus 100 further comprises a model training sub-apparatus. Before the plurality of window data are input into the pre-trained VAE model for data anomaly detection for processing, the model training sub-apparatus is configured to: acquire a training data set, wherein the training data set comprises a plurality of window data; and train an initial VAE model by using the training data set to obtain the VAE model. The training data set further includes reverse-order window data obtained by performing a reverse-order operation on each window data in the plurality of window data, and the model training sub-apparatus is specifically configured to: acquire training sequence data which are acquired by a sensor at different times and are arranged in time order; normalize or standardize the training sequence data; perform sliding window processing on the normalized or standardized training sequence data to obtain a plurality of window data; and perform a reverse-order operation on each window data to obtain the training data set.
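The construction of a training data set that contains both the window data and the reverse-order window data can be sketched as follows. The sketch assumes that the training sequence has already been normalized or standardized and contains no missing values, and that the window stride is 1; `build_training_set` is a hypothetical name.

```python
import numpy as np


def build_training_set(z, s):
    """Stack all length-s windows of z together with their reversed copies."""
    z = np.asarray(z, dtype=float)
    if len(z) < s:
        raise ValueError("sequence is shorter than the window size")
    windows = np.array([z[k:k + s] for k in range(len(z) - s + 1)])
    reversed_windows = windows[:, ::-1]          # reverse-order window data
    return np.concatenate([windows, reversed_windows], axis=0)


train = build_training_set([0.0, 0.25, 0.5, 0.75, 1.0], s=3)
```

The resulting array has twice as many rows as there are windows; each row would be one training sample for the initial VAE model.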
The data processing apparatus 100 provided in the embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, for the parts not mentioned in the apparatus embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Based on the same inventive concept, the embodiment of the present application further provides a model training apparatus 200. As shown in fig. 4, the model training apparatus 200 includes: an acquisition module 210, a preprocessing module 220, a sliding window module 230, and a training module 240.
The acquiring module 210 is configured to acquire sequence data acquired by the sensor at different times and arranged in a time sequence.
The preprocessing module 220 is configured to preprocess the sequence data, the preprocessing including normalization or standardization.
The sliding window module 230 is configured to perform sliding window processing on the preprocessed sequence data to obtain a training data set, where the training data set includes a plurality of window data.
The training module 240 is configured to train the VAE model by using the training data set, so as to obtain a trained VAE model for detecting sensor data anomalies.
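The preprocessing performed by the preprocessing module, together with the inverse operation that is later needed to map a predicted value back to the original scale, can be illustrated with a small sketch. It assumes that "normalization" means min-max scaling to [0, 1] and "standardization" means z-scoring; the class name `Preprocessor` and its methods are hypothetical, and NaN-aware statistics are used on the assumption that missing values may be present.

```python
import numpy as np


class Preprocessor:
    """Invertible scaling: 'normalize' (min-max) or 'standardize' (z-score)."""

    def __init__(self, mode="standardize"):
        if mode not in ("standardize", "normalize"):
            raise ValueError("mode must be 'standardize' or 'normalize'")
        self.mode = mode

    def fit(self, x):
        x = np.asarray(x, dtype=float)
        if self.mode == "standardize":
            self.shift, self.scale = np.nanmean(x), np.nanstd(x)
        else:
            self.shift = np.nanmin(x)
            self.scale = np.nanmax(x) - self.shift
        if self.scale == 0:
            raise ValueError("constant sequence cannot be scaled")
        return self

    def transform(self, x):
        return (np.asarray(x, dtype=float) - self.shift) / self.scale

    def inverse(self, z):
        # Inverse operation of the preprocessing, applied to predicted values.
        return np.asarray(z, dtype=float) * self.scale + self.shift


norm = Preprocessor("normalize").fit([10.0, np.nan, 30.0])
scaled = norm.transform([10.0, 20.0, 30.0])
restored = norm.inverse(scaled)
```

Keeping the fitted shift and scale in one object makes it straightforward to apply the same parameters at training time, at detection time, and when converting a corrected value back to sensor units.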
The model training apparatus 200 provided in the embodiment of the present application has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, for the parts not mentioned in the apparatus embodiment, reference may be made to the corresponding content in the foregoing method embodiments.
Fig. 5 is a block diagram illustrating the structure of an electronic device 300 according to an embodiment of the present application. The electronic device 300 includes: a transceiver 310, a memory 320, a communication bus 330, and a processor 340.
The transceiver 310, the memory 320 and the processor 340 are electrically connected to one another, directly or indirectly, to enable data transmission or interaction. For example, these components may be electrically connected to one another via one or more communication buses 330 or signal lines. The transceiver 310 is used for transmitting and receiving data. The memory 320 is used for storing a computer program, such as the software functional modules shown in fig. 3 or fig. 4, that is, the data processing apparatus 100 in fig. 3 or the model training apparatus 200 in fig. 4. The data processing apparatus 100 or the model training apparatus 200 includes at least one software functional module that may be stored in the memory 320 in the form of software or firmware, or solidified in an Operating System (OS) of the electronic device 300. The processor 340 is configured to execute executable modules stored in the memory 320, such as the software functional modules or computer programs included in the data processing apparatus 100. For example, the processor 340 is configured to: acquire sequence data which are acquired by a sensor at different times and are arranged in time order; preprocess the sequence data, the preprocessing including normalization or standardization; perform sliding window processing on the preprocessed sequence data to obtain a plurality of window data; input the plurality of window data into a pre-trained VAE model for data anomaly detection for processing, to obtain abnormal data in the plurality of window data and a predicted value of the abnormal data; and, for each piece of abnormal data, correct the abnormal data according to the predicted value of the abnormal data.
The processor 340 is also configured to execute the software functional modules or computer programs included in the model training apparatus 200. For example, the processor 340 is configured to: acquire sequence data which are acquired by a sensor at different times and are arranged in time order; preprocess the sequence data, the preprocessing including normalization or standardization; perform sliding window processing on the preprocessed sequence data to obtain a training data set, wherein the training data set comprises a plurality of window data; and train the VAE model by using the training data set to obtain a trained VAE model for detecting sensor data anomalies.
The memory 320 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 340 may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor, such as a Central Processing Unit (CPU) or a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the various methods, steps and logic blocks disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor 340 may be any conventional processor or the like.
The electronic device 300 includes, but is not limited to, a computer, a server, and the like.
The present embodiment also provides a non-volatile computer-readable storage medium (hereinafter referred to as a storage medium) storing a computer program which, when executed by a computer such as the electronic device 300, performs the data processing method or the model training method described above.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part thereof that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or an electronic device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (17)

1. A data processing method, comprising:
acquiring sequence data which are acquired by a sensor at different times and are arranged according to a time sequence;
pre-processing the sequence data, the pre-processing comprising normalization or standardization;
performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data;
inputting the plurality of window data into a pre-trained VAE model for data anomaly detection for processing, to obtain abnormal data in the plurality of window data and a predicted value of the abnormal data;
and for each abnormal data, correcting the abnormal data according to the predicted value of the abnormal data.
2. The method of claim 1, wherein inputting the window data into a pre-trained VAE model for data anomaly detection comprises:
performing reverse order operation on each window data in the plurality of window data to obtain corresponding reverse order window data;
and inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain abnormal data and a predicted value of the abnormal data.
3. The method of claim 2, wherein if the sequence data is subjected to standardization preprocessing, the method further comprises:
performing a negation operation on the data in each window data to obtain corresponding negated window data;
performing a negation operation on the data in the reverse-order window data corresponding to each window data to obtain negated reverse-order window data; accordingly,
inputting each window data and the corresponding reverse-order window data into the VAE model for processing comprises:
inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing.
4. The method of claim 3, wherein inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model for processing comprises:
inputting each window data and the corresponding reverse-order window data, negated window data and negated reverse-order window data into the VAE model, and outputting prediction window data corresponding to each window data, prediction reverse-order window data corresponding to each reverse-order window data, prediction negated window data corresponding to each negated window data and prediction negated reverse-order window data corresponding to each negated reverse-order window data;
for data i whose data number is not smaller than the sliding window size s, if the absolute values of the differences between data i and its respective predicted values in the (i-s+1)-th prediction window data, the (i-s+1)-th prediction negated window data, the i-th prediction reverse-order window data and the i-th prediction negated reverse-order window data are larger than a preset threshold, determining that data i is abnormal data.
5. The method of claim 3, wherein modifying the anomaly data based on the predicted value of the anomaly data comprises:
acquiring a first predicted value of the abnormal data in the window data, a second predicted value of the abnormal data in the reverse-order window data, a third predicted value of the abnormal data in the negated window data and a fourth predicted value of the abnormal data in the negated reverse-order window data;
obtaining an average value of the first predicted value, the second predicted value, the third predicted value and the fourth predicted value;
and carrying out inverse standardization processing on the average value, and correcting the abnormal data into data obtained by carrying out inverse standardization processing on the average value.
6. The method of claim 2, wherein modifying the anomaly data based on the predicted value of the anomaly data comprises:
acquiring a first predicted value of the abnormal data in window data and a second predicted value of the abnormal data in reverse-order window data;
obtaining an average value of the first predicted value and the second predicted value;
and carrying out the inverse operation of the pretreatment on the average value, and correcting the abnormal data into data obtained by carrying out the inverse operation of the pretreatment on the average value.
7. The method according to claim 2, wherein inputting each window data and the corresponding reverse-order window data into the VAE model for processing to obtain abnormal data and a predicted value of the abnormal data comprises:
inputting each window data and the corresponding reverse-order window data into the VAE model, and outputting the prediction window data corresponding to each window data and the prediction reverse-order window data corresponding to each reverse-order window data;
and, for data i whose data number is not smaller than the sliding window size s, if the absolute values of the differences between data i and its predicted values in the (i-s+1)-th prediction window data and in the i-th prediction reverse-order window data are larger than a preset threshold, determining that data i is abnormal data.
8. The method of claim 1, wherein obtaining sequence data collected by the sensor at different times and arranged in a chronological order comprises:
acquiring initial sequence data which are acquired by a sensor at different times and are arranged according to a time sequence;
and treating abnormal values in the initial sequence data as missing values to obtain the sequence data.
9. The method of claim 1, wherein obtaining sequence data collected by the sensor at different times and arranged in a chronological order comprises:
and if the sequence data acquired by the sensor are not the sequence data sampled at equal time intervals, resampling the sequence data acquired by the sensor to obtain the sequence data sampled at equal time intervals.
10. The method of claim 1, wherein performing a sliding window process on the pre-processed sequence data to obtain a plurality of window data comprises:
if the length of a run of consecutive missing values in the preprocessed sequence data exceeds a preset length, dividing the preprocessed sequence data into a plurality of sub-data segments at each such run; performing interpolation completion processing on the missing values in those sub-data segments whose data length is not less than the sliding window size;
and performing sliding window processing on the sub-data segments after interpolation completion processing to obtain the plurality of window data.
11. The method of claim 1, wherein before inputting the plurality of window data into a pre-trained VAE model for data anomaly detection for processing, the method further comprises:
acquiring a training data set, wherein the training data set comprises a plurality of window data;
and training an initial VAE model by using the training data set to obtain the VAE model.
12. The method of claim 11, wherein the training data set further comprises reverse-order window data obtained by performing a reverse-order operation on each of the plurality of window data; acquiring a training data set comprising:
acquiring training sequence data which are acquired by a sensor at different times and are arranged according to a time sequence;
normalizing or standardizing the training sequence data;
performing sliding window processing on the normalized or standardized training sequence data to obtain a plurality of window data;
and performing a reverse-order operation on each window data to obtain the training data set.
13. A method of model training, comprising:
acquiring sequence data which are acquired by a sensor at different times and are arranged according to a time sequence;
pre-processing the sequence data, the pre-processing comprising: normalization or standardization;
performing sliding window processing on the preprocessed sequence data to obtain a training data set, wherein the training data set comprises a plurality of window data;
and training the VAE model by using the training data set to obtain a trained VAE model for detecting sensor data anomalies.
14. A data processing apparatus, comprising:
the acquisition module is used for acquiring sequence data which are acquired by the sensor at different moments and are arranged according to a time sequence;
a pre-processing module for pre-processing the sequence data, the pre-processing comprising normalization or standardization;
the sliding window module is used for performing sliding window processing on the preprocessed sequence data to obtain a plurality of window data;
the processing module is used for inputting the plurality of window data into a pre-trained VAE model for data anomaly detection for processing, to obtain abnormal data in the plurality of window data and a predicted value of the abnormal data; and for each abnormal data, correcting the abnormal data according to the predicted value of the abnormal data.
15. A model training apparatus, comprising:
the acquisition module is used for acquiring sequence data which are acquired by the sensor at different moments and are arranged according to a time sequence;
a pre-processing module for pre-processing the sequence data, the pre-processing comprising normalization or standardization;
the sliding window module is used for performing sliding window processing on the preprocessed sequence data to obtain a training data set, and the training data set comprises a plurality of window data;
and the training module is used for training the VAE model by utilizing the training data set to obtain a trained VAE model for detecting sensor data anomalies.
16. An electronic device, comprising:
a memory and a processor, the processor coupled to the memory;
the memory is used for storing programs;
the processor for invoking a program stored in the memory to perform the method of any one of claims 1-12 or to perform the method of claim 13.
17. A storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-12 or performs the method of claim 13.
CN202110462699.3A 2021-04-27 2021-04-27 Data processing method, model training device and electronic equipment Active CN112990372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110462699.3A CN112990372B (en) 2021-04-27 2021-04-27 Data processing method, model training device and electronic equipment

Publications (2)

Publication Number Publication Date
CN112990372A true CN112990372A (en) 2021-06-18
CN112990372B CN112990372B (en) 2021-08-06

Family

ID=76340396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110462699.3A Active CN112990372B (en) 2021-04-27 2021-04-27 Data processing method, model training device and electronic equipment

Country Status (1)

Country Link
CN (1) CN112990372B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537352A (en) * 2021-07-15 2021-10-22 杭州鲁尔物联科技有限公司 Sensor abnormal value monitoring method and device, computer equipment and storage medium
CN115717590A (en) * 2022-11-22 2023-02-28 西安交通大学 Intelligent abnormity detection method for compressor and related device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107480341A (en) * 2017-07-21 2017-12-15 河海大学 A kind of dam safety comprehensive method based on deep learning
CN108628281A (en) * 2017-03-23 2018-10-09 株式会社日立制作所 Abnormality detection system and method for detecting abnormality
CN109727446A (en) * 2019-01-15 2019-05-07 华北电力大学(保定) A kind of identification and processing method of electricity consumption data exceptional value
CN110427996A (en) * 2019-07-24 2019-11-08 清华大学 The recognition methods of time series abnormal patterns and device based on fuzzy matching
US20200104639A1 (en) * 2018-09-28 2020-04-02 Applied Materials, Inc. Long short-term memory anomaly detection for multi-sensor equipment monitoring
CN111369804A (en) * 2019-07-05 2020-07-03 杭州海康威视系统技术有限公司 Vehicle data processing method and device, electronic equipment and storage medium
CN111931713A (en) * 2020-09-21 2020-11-13 成都睿沿科技有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN112179691A (en) * 2020-09-04 2021-01-05 西安交通大学 Mechanical equipment running state abnormity detection system and method based on counterstudy strategy
CN112651435A (en) * 2020-12-22 2021-04-13 中国南方电网有限责任公司 Self-learning-based detection method for flow abnormity of power network probe

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
彭萍萍: "高速列车状态识别系统设计与应用", 《中国优秀硕士学位论文全文数据库》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537352A (en) * 2021-07-15 2021-10-22 杭州鲁尔物联科技有限公司 Sensor abnormal value monitoring method and device, computer equipment and storage medium
CN113537352B (en) * 2021-07-15 2023-08-11 杭州鲁尔物联科技有限公司 Sensor abnormal value monitoring method, device, computer equipment and storage medium
CN115717590A (en) * 2022-11-22 2023-02-28 西安交通大学 Intelligent abnormity detection method for compressor and related device
CN115717590B (en) * 2022-11-22 2024-03-29 西安交通大学 Intelligent abnormality detection method and related device for compressor

Also Published As

Publication number Publication date
CN112990372B (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant