WO2022095434A1

WO2022095434A1 - Auto-encoder-based data anomaly identification method and apparatus and computer device

Info

Publication number: WO2022095434A1
Application number: PCT/CN2021/097550
Authority: WO
Inventors: 邓悦; 郑立颖; 徐亮
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-11-09
Filing date: 2021-05-31
Publication date: 2022-05-12
Also published as: CN112329865A; CN112329865B

Abstract

The present application relates to the technical field of artificial intelligence, and provided therein are an auto-encoder-based data anomaly identification method and apparatus, a computer device, and a storage medium. The method comprises: receiving an inputted time sequence to be detected; performing, on the basis of the time sequence and according to a preset rule, integration training on a specified quantity of pre-generated and sparsely connected auto-encoders so as to generate a corresponding auto-encoder integrated framework; calculating, by means of the auto-encoder integrated framework, an abnormal score value corresponding to each vector comprised in the time sequence; and identifying, according to the abnormal score value, whether an abnormal data value is present in the time sequence. By means of the present application, it can be accurately identified whether an abnormal data value is present in a time sequence, thus effectively improving the accuracy of identifying abnormal data values in time sequences. The present application further relates to the field of blockchains, and the auto-encoder integrated framework can be stored in a blockchain.

Description

Data anomaly identification method, device and computer equipment based on autoencoder

This application claims priority to the Chinese patent application filed on November 9, 2020, with application number 202011242143.5 and the invention title "Auto-encoder-based data anomaly identification method, device and computer equipment", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to a method, device and computer equipment for identifying data anomalies based on an autoencoder.

Background technique

With the advent of the big data era, emerging topics such as cloud computing and the Internet of Things have appeared, and mining the data that people ultimately need from massive data sets has become increasingly important. Traditional data mining focuses mainly on models of the bulk of the data and pays less attention to detecting abnormal data. Analyzing and mining useful data is important, but outliers that deviate significantly from the rest of the data also carry a lot of useful information, and they can distort the data so that correct results cannot be obtained. Therefore, the detection of abnormal data cannot be ignored either.

In the prior art, most anomaly detection methods are statistical, mainly including deviation-based methods, methods based on the distribution of specified recommendation scores, distance-based methods, and density-based methods. However, these methods need to know the distribution of the data in advance. In addition, most statistical anomaly detection algorithms are only suitable for mining univariate numerical data and are not suitable for time series data. If they are applied directly to time series data, the results are not ideal and the recognition accuracy for abnormal data is low.

Technical problem

The main purpose of this application is to provide an autoencoder-based data anomaly identification method, device, computer equipment and storage medium, aiming to solve the problem that existing anomaly detection methods are not suited to time series data: when applied directly to time series data, their results are not ideal and the recognition accuracy for abnormal data is low.

Technical solutions

The present application proposes a method for identifying data anomalies based on an autoencoder, the method comprising the steps of:

Receive the input time series to be detected;

Based on the time series, perform integrated training on a pre-generated specified number of sparsely connected autoencoders according to preset rules to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion on each of the specified number of recurrent-neural-network-based autoencoders;

Calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

According to the abnormal score value, identify whether an abnormal data value is present in the time series.

The present application also provides a device for identifying data anomalies based on an autoencoder, including:

a receiving module for receiving the input time series to be detected;

a training module, configured to perform integrated training on a pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, and to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion on each of the specified number of recurrent-neural-network-based autoencoders;

a calculation module, configured to calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

An identification module, configured to identify whether there is an abnormal data value in the time series according to the abnormal score value.

The present application also provides a computer device, including a memory and a processor, wherein a computer program is stored in the memory and the processor, when executing the computer program, implements an autoencoder-based data anomaly identification method, wherein the autoencoder-based data anomaly identification method includes the following steps:

Receive the input time series to be detected;

The present application also provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements an autoencoder-based data anomaly identification method, wherein the autoencoder-based data anomaly identification method includes the following steps:

Receive the input time series to be detected;

Based on the time series, perform integrated training on the pre-generated specified number of sparsely connected autoencoders according to preset rules to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion on each of the specified number of recurrent-neural-network-based autoencoders;

Beneficial effect

The autoencoder-based data anomaly identification method, device, computer equipment and storage medium provided in this application effectively improve the accuracy of identifying abnormal data values in time series, and identify abnormal data values in time series with high processing efficiency.

Description of drawings

FIG. 1 is a schematic flowchart of a method for identifying data anomalies based on an autoencoder according to an embodiment of the present application;

FIG. 2 is a schematic structural diagram of an apparatus for identifying data anomalies based on an autoencoder according to an embodiment of the present application;

FIG. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.

BEST MODE FOR CARRYING OUT THE INVENTION

It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application.

In order to facilitate the explanation of the embodiments of the present application, some concepts are briefly introduced below:

Recurrent Neural Network (RNN): the essence of an RNN is that it has a human-like ability to remember, so its output depends on both the current input and its memory. An RNN introduces directed cycles, which allow it to handle contextual dependencies between inputs. It departs from the traditional neural network structure (input layer, hidden layer, output layer), in which adjacent layers are fully connected and the nodes within a layer are not connected to each other. Purpose of an RNN: to process sequence data. Principle of an RNN: the current output of a sequence also depends on the previous inputs. Specifically, the network memorizes previous information and applies it to the calculation of the current output; that is, the nodes within the hidden layer are no longer unconnected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. Functional characteristics of an RNN: 1. hidden layer nodes can be interconnected or self-connected; 2. in an RNN, an output is not required at every step, and an input is not required at every step. Uses of RNNs: language modeling and text generation, machine translation, speech recognition, and image caption generation.

Autoencoder: a kind of neural network that, after training, attempts to copy its input to its output. An autoencoder contains a hidden layer h that produces an encoded representation of the input. The network can be regarded as consisting of two parts: an encoder represented by a function h=f(x), and a decoder that generates a reconstruction r=g(h). A traditional autoencoder processes a time series as follows: for a time series $T=\langle s_1, s_2, \ldots, s_C\rangle$, each vector $s_t$ in the time series is fed to an RNN unit in the encoder of the autoencoder, which performs the following computation:

$$h_t^{(E)} = f\left(s_t, h_{t-1}^{(E)}\right)$$

where $s_t$ is the vector at time step t in the time series, the hidden state $h_{t-1}^{(E)}$ is the output of the previous RNN unit at time step t-1 in the encoder, and f(·) is a nonlinear function. The above formula yields the hidden state $h_t^{(E)}$ of the encoder's current RNN unit at time step t, which is then fed into the next RNN unit at time step t+1. In the decoder of the autoencoder, the time series is reconstructed in reverse order, i.e. $\hat{T}=\langle \hat{s}_C, \hat{s}_{C-1}, \ldots, \hat{s}_1\rangle$. First, the last hidden state of the encoder is used as the first hidden state of the decoder. Based on the previous hidden state $h_{t+1}^{(D)}$ of the decoder and the previously reconstructed vector $\hat{s}_{t+1}$, the decoder reconstructs the current vector $\hat{s}_t$ and calculates the current hidden state

$$h_t^{(D)} = g\left(\hat{s}_{t+1}, h_{t+1}^{(D)}\right)$$

where g(·) is a nonlinear function.
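The encoder and decoder passes described above can be sketched in plain Python. This is only an illustration under stated assumptions: the nonlinear functions f and g are taken to be tanh RNN units, a linear output layer `W_out` (not mentioned in the text) maps a decoder hidden state to a reconstructed vector, and all weights are random and untrained. The names `rnn_step`, `rand_matrix` and `autoencode` are hypothetical.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def rnn_step(w_in, w_h, x, h):
    """One assumed tanh RNN unit: tanh(w_in @ x + w_h @ h)."""
    a = matvec(w_in, x)
    b = matvec(w_h, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def rand_matrix(rows, cols, rng):
    """Small random weights; stand-ins for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def autoencode(series, hidden_size=4, seed=0):
    """Encode a series forward in time, then reconstruct it in reverse order."""
    rng = random.Random(seed)
    dim = len(series[0])
    enc_in = rand_matrix(hidden_size, dim, rng)
    enc_h = rand_matrix(hidden_size, hidden_size, rng)
    dec_in = rand_matrix(hidden_size, dim, rng)
    dec_h = rand_matrix(hidden_size, hidden_size, rng)
    w_out = rand_matrix(dim, hidden_size, rng)  # assumed output layer

    # Encoder: h_t = f(s_t, h_{t-1}), starting from a zero hidden state.
    h = [0.0] * hidden_size
    for s_t in series:
        h = rnn_step(enc_in, enc_h, s_t, h)

    # Decoder: the last encoder state is the first decoder state.
    recon = [None] * len(series)
    recon[-1] = matvec(w_out, h)
    # Walk backwards: h_t = g(s_hat_{t+1}, h_{t+1}), then emit s_hat_t.
    for t in range(len(series) - 2, -1, -1):
        h = rnn_step(dec_in, dec_h, recon[t + 1], h)
        recon[t] = matvec(w_out, h)
    return recon


if __name__ == "__main__":
    toy_series = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
    for original, rebuilt in zip(toy_series, autoencode(toy_series)):
        print(original, rebuilt)
```

Because the weights are untrained, the reconstructions here are not meaningful; the sketch only shows the order of computation, in particular that the decoder fills in the last vector first and works back to the first.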

Referring to FIG. 1, an autoencoder-based data anomaly identification method according to an embodiment of the present application includes:

S1: Receive the input time series to be detected;

S2: Based on the time series, perform integrated training on the pre-generated specified number of sparsely connected autoencoders according to preset rules to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion on each of the specified number of recurrent-neural-network-based autoencoders;

S3: Calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

S4: Identify whether there is an abnormal data value in the time series according to the abnormal score value.

As described in steps S1 to S4 above, the execution body of this method embodiment is an autoencoder-based data anomaly identification apparatus. In practical applications, the apparatus can be implemented as a virtual device, such as software code, or as a physical device into which the relevant execution code is written or integrated, and it can interact with users by means of a keyboard, mouse, remote control, touchpad or voice control device. The apparatus in this embodiment can quickly and accurately identify abnormal data values in the time series to be detected. Specifically, the input time series to be detected is received first. The time series to be detected is a time series that is to be checked for abnormal data values. For example, it may be a KPI (Key Performance Indicator) time series of a server, and the data it contains is in vector form. Then, based on the time series, integrated training is performed on the pre-generated specified number of sparsely connected autoencoders according to preset rules to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion on each of a specified number of RNN-based autoencoders. Specifically, the generation process of the sparsely connected autoencoders may include first obtaining a specified number of recurrent-neural-network-based autoencoders.
The recurrent-neural-network-based autoencoder may specifically be a recurrent neural network autoencoder with additional auxiliary connections (RSCN), which adds auxiliary connections between RNN units. The specified number is not specifically limited and can be set according to actual needs; in this embodiment it is denoted N. Then, unit connection deletion is performed on each of the recurrent-neural-network-based autoencoders to generate a corresponding number of sparsely connected autoencoders. Because the autoencoder with additional auxiliary connections adds auxiliary connections between RNN units, some of the auxiliary connections between RNN units can be cut off to introduce certain differences into the network structure. Specifically, the unit connection deletion performed on each recurrent-neural-network-based autoencoder may include: for a recurrent-neural-network-based autoencoder with additional auxiliary connections, introducing a sparse weight vector that controls which auxiliary connections are removed at each time step t.

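Let $w_t = \left(w_t^{(f)}, w_t^{(f')}\right)$ denote the sparse weight vector, where $w_t^{(f)} \in \{0,1\}$ and $w_t^{(f')} \in \{0,1\}$ are the elements contained in the sparse weight vector. At least one element of the sparse weight vector $w_t$ is not equal to 0, so there are three cases: $w_t=(0,1)$, $(1,0)$ or $(1,1)$. A sparsely connected autoencoder can therefore be generated from the sparse weight vector $w_t$. With f(·) and f'(·) denoting the nonlinear functions applied along the connection from time step t-1 and the auxiliary connection from time step t-L, respectively, the hidden state of each RNN unit in the resulting sparsely connected autoencoder is calculated as follows:

$$h_t = \frac{f\left(s_t, h_{t-1}\right)\, w_t^{(f)} + f'\left(s_t, h_{t-L}\right)\, w_t^{(f')}}{\left\|w_t\right\|_0}$$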

where $s_t$ is the vector at time step t in the input time series, $h_{t-1}$ is the hidden state at time step t-1 in the encoder of the sparsely connected recurrent autoencoder, $h_{t-L}$ is the hidden state at time step t-L in the same encoder, $w_t$ is the sparse weight vector, and $\|w_t\|_0$ is the number of nonzero elements in $w_t$. Further, unit connection deletion can also be performed by randomly deleting connections according to actual needs: for each RNN-based autoencoder, the connections of some RNN units are randomly deleted to obtain a sparsely connected autoencoder. The sparsely connected autoencoders therefore produce different reconstruction errors when reconstructing the time series, which effectively broadens the applicability of the autoencoders and improves their reliability, accuracy and generalization. In addition, assuming the specified number is N, N sparsely connected recurrent autoencoders are obtained; each consists of an encoder $E_i$ and a decoder $D_i$, 1≤i≤N, and each has its own sparse weight vector. The autoencoder integration framework may be an independent framework or a shared framework. For the independent framework, a first objective function can be generated from all the vectors contained in the time series and the reconstruction vectors that a sparsely connected autoencoder generates for those vectors, and each sparsely connected autoencoder is then trained separately on the first objective function to obtain the autoencoder integration framework.
For the shared framework, a second objective function can be generated from all the vectors contained in the time series, the reconstruction vectors that the sparsely connected autoencoders generate for those vectors, and a preset shared hidden state, and all sparsely connected autoencoders are then trained jointly on the second objective function to obtain the autoencoder integration framework. After the autoencoder integration framework is obtained, it is used to calculate the abnormal score value corresponding to each vector in the time series. Each autoencoder in the framework calculates a reconstruction error for each vector in the time series; then, for any specified vector in the time series, the median of all reconstruction errors corresponding to that vector is calculated and taken as the abnormal score value of the specified vector. Finally, whether the time series contains abnormal data values is identified according to the abnormal score values. This can be done with a preset abnormal threshold: if the abnormal score value of a specified vector in the time series is greater than the abnormal threshold, the specified vector is determined to be an abnormal data value; if it is not greater than the abnormal threshold, the specified vector is determined to be a normal data value, that is, not an abnormal data value.
Unlike existing anomaly detection methods, this embodiment uses an autoencoder-based integration framework to identify data anomalies in time series. When the input time series to be detected is received, sparsely connected autoencoders, generated in advance by modifying the original recurrent neural network autoencoders, are integrated and trained on the time series to generate an autoencoder integration framework that can be used for identifying outliers in time series data. The framework is then used to calculate the abnormal score value of each vector in the time series, and whether the time series contains abnormal data values can be identified quickly and accurately from those scores. This effectively improves the accuracy of identifying abnormal data values in time series, with high processing efficiency.
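The sparse-connection update described in this embodiment can be sketched as follows. This is a minimal illustration, not the patented implementation: hidden states are scalars rather than vectors, the two nonlinear units are stand-in tanh functions with made-up coefficients, hidden states before the start of the series are assumed to be 0, and the names `sample_sparse_weights` and `sparse_hidden_states` are hypothetical.

```python
import math
import random

# The three allowed sparse weight vectors: at least one element is nonzero.
ALLOWED_WEIGHTS = [(0, 1), (1, 0), (1, 1)]


def sample_sparse_weights(num_steps, rng):
    """Randomly pick one allowed weight vector w_t for every time step."""
    return [rng.choice(ALLOWED_WEIGHTS) for _ in range(num_steps)]


def sparse_hidden_states(series, weights, skip=2):
    """Compute scalar hidden states with some connections switched off.

    series  : list of scalar inputs s_t
    weights : list of (w_direct, w_aux) pairs with 0/1 entries
    skip    : the lag L of the auxiliary connection (assumed value)
    """
    hidden = []
    for t, (s_t, (w_direct, w_aux)) in enumerate(zip(series, weights)):
        h_prev = hidden[t - 1] if t >= 1 else 0.0
        h_skip = hidden[t - skip] if t >= skip else 0.0
        # Stand-in nonlinear units for the direct and auxiliary connections.
        direct = math.tanh(0.5 * s_t + 0.8 * h_prev)
        aux = math.tanh(0.5 * s_t + 0.3 * h_skip)
        # For 0/1 entries the count of nonzero elements equals their sum.
        nonzero = w_direct + w_aux
        hidden.append((w_direct * direct + w_aux * aux) / nonzero)
    return hidden


if __name__ == "__main__":
    rng = random.Random(42)
    toy_series = [0.1, 0.2, 0.3, 0.4, 0.5]
    # Two autoencoders get different weight vectors, hence different structures.
    for _ in range(2):
        w = sample_sparse_weights(len(toy_series), rng)
        print(w, sparse_hidden_states(toy_series, w))
```

Drawing a separate list of weight vectors for each of the N autoencoders is what gives the ensemble members different network structures.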

Further, in an embodiment of the present application, the above step S2 includes:

S200: Acquire all the first vectors included in the time series; and,

S201: Obtain the first reconstruction vectors generated by each of the sparsely connected autoencoders, in one-to-one correspondence with the first vectors;

S202: Based on the first vector and the first reconstruction vector, generate a corresponding first objective function;

S203: Train each of the sparsely connected autoencoders based on the first objective function to obtain trained first autoencoders, wherein the number of first autoencoders is the same as the number of sparsely connected autoencoders;

S204: Integrate all the first autoencoders to generate a corresponding independent framework, wherein the independent framework includes the specified number of first autoencoders, and the first autoencoders do not interact with each other;

S205: Determine the independent framework as the autoencoder integration framework.

As described in steps S200 to S205 above, the autoencoder integration framework may be an independent framework generated from all the sparsely connected autoencoders. The independent framework is trained by training each of the different sparsely connected autoencoders independently, so the sparsely connected autoencoders do not interact during the training phase, and neither do the autoencoders contained in the resulting independent framework. Specifically, the step of performing integrated training on the pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, and generating the corresponding autoencoder integration framework, may include the following. First, all first vectors included in the time series are obtained. The input time series to be detected may be $T=\langle s_1, s_2, \ldots, s_C\rangle$, and the vectors $s_1, s_2, \ldots, s_C$ included in the time series T can be regarded as the first vectors. At the same time, the first reconstruction vectors generated by each sparsely connected autoencoder, in one-to-one correspondence with the first vectors, are obtained. After each sparsely connected autoencoder reconstructs the time series, a reconstructed time series $\hat{T}$ corresponding to the time series is generated, and the vectors $\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_C$ contained in the reconstructed time series $\hat{T}$ can be regarded as the first reconstruction vectors corresponding to the respective first vectors. Then, based on the first vectors and the first reconstruction vectors, a corresponding first objective function is generated. The difference between an input vector in the time series and the corresponding reconstruction vector generated by a sparsely connected autoencoder can be used as the first objective function $J_i$, and the first objective function $J_i$ is used to train each sparsely connected autoencoder independently. Specifically, the first objective function may be:

$$J_i = \sum_{t=1}^{C} \left\| s_t - \hat{s}_t^{(D_i)} \right\|_2^2$$

where $J_i$ is the first objective function, $s_t$ is the vector at time step t in the time series, $\hat{s}_t^{(D_i)}$ is the reconstruction of the vector $s_t$ generated at time step t by the decoder $D_i$ of the i-th sparsely connected autoencoder, and $\|\cdot\|_2$ is the L2-norm of a vector. After the first objective function is obtained, each sparsely connected autoencoder is trained on it to obtain trained first autoencoders, whose number equals the number of sparsely connected autoencoders. The first autoencoders are then integrated to generate the corresponding independent framework, which includes the specified number of first autoencoders with no interaction among them. Specifically, all the first autoencoders can be integrated into a preset integration framework to generate the independent framework. In addition, each decoder $D_i$ in the independent framework has its own independent hidden state, used as its initial hidden state, which is linearly combined with a corresponding weight matrix. Finally, once the independent framework is obtained, it is determined as the autoencoder integration framework. In this embodiment, an independent framework composed of a specified number of sparsely connected autoencoders with different network structures is generated through training. Because the reconstruction errors from multiple autoencoders are considered when the independent framework is used for anomaly detection, the variance of the overall reconstruction error is reduced. The abnormal score value of each vector in the time series can then be calculated accurately with the independent framework, and whether the time series contains abnormal data values can be identified quickly and accurately from those scores, which effectively improves the efficiency and accuracy of identifying abnormal data values in time series.
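As a rough illustration of the first objective function, the sketch below computes one reconstruction loss per autoencoder, which is what independent training would minimize separately for each model. It assumes the loss is the sum over time steps of the squared L2 distance between each vector and its reconstruction; the function names `reconstruction_loss` and `independent_losses` are hypothetical, and the reconstructions are hand-written toy values rather than outputs of trained models.

```python
def reconstruction_loss(series, recon):
    """Sum over time steps of the squared L2 distance (assumed form of J_i)."""
    total = 0.0
    for s_t, s_hat_t in zip(series, recon):
        total += sum((a - b) ** 2 for a, b in zip(s_t, s_hat_t))
    return total


def independent_losses(series, recons):
    """One loss per autoencoder; each is minimized on its own, with no coupling."""
    return [reconstruction_loss(series, recon) for recon in recons]


if __name__ == "__main__":
    toy_series = [[1.0, 2.0], [3.0, 4.0]]
    toy_recons = [
        [[1.0, 2.0], [3.0, 5.0]],  # toy output of autoencoder 1
        [[0.0, 2.0], [3.0, 4.0]],  # toy output of autoencoder 2
    ]
    print(independent_losses(toy_series, toy_recons))
```

Keeping the losses in a list rather than summing them reflects that the autoencoders in the independent framework are trained without interacting.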

Further, in an embodiment of the present application, the above step S2 includes:

S210: Acquire a preset shared layer, wherein the shared layer includes a shared hidden state;

S211: Perform weight sharing processing on all the sparsely connected autoencoders through the sharing layer;

S212: Perform L1 regularization processing on the shared hidden state to obtain a processed shared hidden state;

S213: Acquire all second vectors included in the time series; and,

S214: Obtain the second reconstruction vectors generated by each of the sparsely connected autoencoders, in one-to-one correspondence with the second vectors;

S215: Generate a corresponding second objective function according to the processed shared hidden state, the second vector and the second reconstruction vector;

S216: Perform joint training on all the sparsely connected autoencoders based on the second objective function to obtain trained second autoencoders, wherein the number of second autoencoders is the same as the number of sparsely connected autoencoders;

S217: Integrate all the second autoencoders to generate a corresponding shared framework, wherein the shared framework includes the specified number of second autoencoders, and the second autoencoders interact with each other;

S218: Determine the shared framework as the autoencoder integration framework.

As described in steps S210 to S218 above, the autoencoder integration framework may be a shared framework generated from all the sparsely connected autoencoders and a preset shared layer. Because the shared framework includes interaction between the different autoencoders, it can further improve the accuracy of identifying abnormal data values in time series compared with the independent framework. Specifically, the step of performing integrated training on the pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, and generating the corresponding autoencoder integration framework, may include the following. First, a preset shared layer is obtained, and weight sharing is performed on all the sparsely connected autoencoders through the shared layer, wherein the shared layer includes a shared hidden state. The shared layer is a linear combination of the last hidden states $h_C^{(E_i)}$ of the encoders of all the sparsely connected autoencoders with their corresponding weight matrices; this linear combination is the shared hidden state $h_C^{(E)}$. Then, L1 regularization is performed on the shared hidden state to obtain the processed shared hidden state. L1 regularization makes the shared hidden state $h_C^{(E)}$ sparse. This in turn prevents some encoders from overfitting the time series, making the decoders more widely applicable and less susceptible to abnormal data values. After the processed shared hidden state is obtained, all second vectors included in the time series are obtained. The input time series to be detected may be $T=\langle s_1, s_2, \ldots, s_C\rangle$, and the vectors $s_1, s_2, \ldots, s_C$ included in the time series T can be regarded as the second vectors. At the same time, the second reconstruction vectors generated by each sparsely connected autoencoder, in one-to-one correspondence with the second vectors, are obtained. After each sparsely connected autoencoder reconstructs the time series, a reconstructed time series $\hat{T}$ corresponding to the time series is generated, and the vectors $\hat{s}_1, \hat{s}_2, \ldots, \hat{s}_C$ contained in the reconstructed time series $\hat{T}$ can be regarded as the second reconstruction vectors corresponding to the respective second vectors. Then, a corresponding second objective function is generated from the processed shared hidden state, the second vectors and the second reconstruction vectors. Specifically, the second objective function may be:

$$J = \sum_{i=1}^{N} J_i + \lambda \left\| h_C^{(E)} \right\|_1 = \sum_{i=1}^{N} \sum_{t=1}^{C} \left\| s_t - \hat{s}_t^{(D_i)} \right\|_2^2 + \lambda \left\| h_C^{(E)} \right\|_1$$

where λ is the weight parameter controlling the importance of the L1 regularization term, $s_t$ is the vector at time step t in the time series, $\hat{s}_t^{(D_i)}$ is the reconstructed vector generated by the decoder $D_i$ at time step t, $h_C^{(E)}$ is the shared hidden state subject to L1 regularization, $\|\cdot\|_2$ is the L2-norm of a vector, and $J_i$ is the first objective function above. After the second objective function is obtained, all the sparsely connected autoencoders are trained jointly on it to obtain trained second autoencoders, whose number equals the number of sparsely connected autoencoders. The second autoencoders are then integrated to generate the corresponding shared framework, which includes the specified number of second autoencoders with interaction among them. All the second autoencoders can be integrated into a preset integration framework to generate the shared framework. Finally, the shared framework is determined as the autoencoder integration framework. In this embodiment, a shared framework composed of a specified number of sparsely connected autoencoders with different network structures is generated through training. Because the reconstruction errors from multiple autoencoders are considered when the shared framework is used for anomaly detection, and the sparsely connected autoencoders can interact with one another, the variance of the overall reconstruction error is reduced further. The abnormal score value of each vector in the time series can then be calculated accurately with the shared framework, and whether the time series contains abnormal data values can be identified quickly and accurately from those scores, which effectively improves the efficiency and accuracy of identifying abnormal data values in time series.
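The joint objective of the shared framework can be sketched in the same style. The sketch assumes the second objective is the sum of the per-autoencoder reconstruction losses plus λ times the L1 norm of the shared hidden state, with each reconstruction loss taken as a sum of squared L2 distances. The names `squared_error`, `l1_norm` and `shared_objective`, the value of `lam`, and the toy shared hidden state are all illustrative assumptions.

```python
def squared_error(series, recon):
    """Sum over time steps of the squared L2 distance to the reconstruction."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(s_t, s_hat_t))
        for s_t, s_hat_t in zip(series, recon)
    )


def l1_norm(vector):
    """Sum of absolute values; penalizing it pushes entries toward zero."""
    return sum(abs(x) for x in vector)


def shared_objective(series, recons, shared_hidden, lam=0.01):
    """Joint loss: all reconstruction losses plus an L1 penalty on the shared state."""
    recon_term = sum(squared_error(series, recon) for recon in recons)
    return recon_term + lam * l1_norm(shared_hidden)


if __name__ == "__main__":
    toy_series = [[1.0], [2.0]]
    toy_recons = [[[1.0], [2.0]], [[0.0], [2.0]]]  # two toy autoencoders
    toy_shared_hidden = [0.5, -0.5]                # toy shared hidden state
    print(shared_objective(toy_series, toy_recons, toy_shared_hidden, lam=0.1))
```

Unlike the independent case, a single scalar is returned: all autoencoders contribute to one loss, and the penalty on the shared hidden state couples their training.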

Further, in an embodiment of the present application, the above step S3 includes:

S300: Calculate, with each autoencoder included in the autoencoder integration framework, a reconstruction error corresponding to a specified vector, wherein the specified vector is any one of all vectors included in the time series;

S301: Calculate the median of all the reconstruction errors;

S302: Determine the median as a specified abnormal score value corresponding to the specified vector in the time series.

As described in the above steps S300 to S302, the step of calculating the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework may specifically include the following. First, each autoencoder included in the autoencoder integration framework calculates and generates a reconstruction error corresponding to a specified vector, where the specified vector is any one of all the vectors included in the time series. Specifically, assuming that the specified number is N, for any vector $s_k$ in the original time series $T = \langle s_1, s_2, \ldots, s_C \rangle$, the N autoencoders included in the autoencoder integration framework generate N reconstruction errors $\{a_1, a_2, \ldots, a_N\}$ corresponding to the vector $s_k$. The reconstruction errors may be generated as follows: the N autoencoders included in the autoencoder integration framework each generate a reconstructed time series corresponding to the time series, the reconstruction vector corresponding to the vector $s_k$ is extracted from each reconstructed time series, and the calculation formula relating the vector $s_k$ to its reconstruction vector is then called to calculate the reconstruction error corresponding to the vector $s_k$. Next, the median of all the reconstruction errors is calculated by the formula $OS(s_k) = \operatorname{median}\{a_1, a_2, \ldots, a_N\}$. Finally, the median is determined as the specified abnormal score value corresponding to the specified vector in the time series. The median of the N reconstruction errors is used as the final abnormal score value of the vector $s_k$ in order to reduce the influence of the reconstruction errors from individual autoencoders.
It should be noted that the independent framework and the shared framework use the same calculation formula to calculate the abnormal score value corresponding to each vector included in the time series. In this embodiment, each autoencoder included in the autoencoder integration framework calculates and generates a reconstruction error corresponding to the specified vector, and the median of all the reconstruction errors is used as the specified abnormal score value corresponding to the specified vector in the time series. The abnormal score value corresponding to each vector included in the time series can thus be calculated accurately, which helps to identify quickly and accurately, according to the abnormal score values, whether an abnormal data value exists in the time series, and effectively improves the efficiency and accuracy of identifying abnormal data values in the time series.
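Steps S300 to S302 can be sketched as follows. This is an illustrative example rather than the implementation of this application: the reconstructed time series are supplied directly as arrays instead of being produced by trained autoencoders, the reconstruction error is assumed to be the squared L2 norm of the difference, and the function name `abnormal_scores` is hypothetical.

```python
import numpy as np

def abnormal_scores(series, reconstructions):
    """Return OS(s_k) = median{a_1, ..., a_N} for every vector s_k of the series.

    series:          array of shape (C, D), the C vectors of the time series
    reconstructions: array of shape (N, C, D), one reconstructed series per autoencoder
    """
    series = np.asarray(series, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    # S300: reconstruction error a_i of every vector under every autoencoder, shape (N, C).
    errors = np.sum((reconstructions - series) ** 2, axis=2)
    # S301 and S302: the median over the N autoencoders is the abnormal score value.
    return np.median(errors, axis=0)

series = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
reconstructions = np.array([
    series + [[0, 0], [0, 0], [2, 2]],   # autoencoder 1: errors 0, 0, 8
    series + [[1, 0], [0, 0], [3, 0]],   # autoencoder 2: errors 1, 0, 9
    series + [[0, 0], [0, 1], [0, 1]],   # autoencoder 3: errors 0, 1, 1
])
scores = abnormal_scores(series, reconstructions)
print(scores)
```

In this toy input the third autoencoder reconstructs the last vector well while the other two do not; taking the median keeps a single autoencoder from dominating the score in either direction.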

Further, in an embodiment of the present application, the above step S300 includes:

S3000: Perform reconstruction processing on the time series by using a specific autoencoder to obtain a specific reconstructed time series corresponding to the time series, where the specific autoencoder is any one of all the autoencoders included in the autoencoder integration framework;

S3001: Extract a specific reconstruction vector corresponding to the specified vector from the specific reconstructed time series;

S3002: Calculate a specific reconstruction error corresponding to the specified vector according to the specified vector and the specific reconstruction vector.

As described in the above steps S3000 to S3002, the step of calculating and generating the reconstruction error corresponding to the specified vector by each autoencoder included in the autoencoder integration framework may specifically include the following. First, the time series is reconstructed by a specific autoencoder to obtain a specific reconstructed time series corresponding to the time series, wherein the specific autoencoder is any one of all the autoencoders included in the autoencoder integration framework. The input time series to be detected may be $T = \langle s_1, s_2, \ldots, s_C \rangle$, and after reconstructing this time series the $i$-th specific autoencoder generates a corresponding reconstructed time series $\hat{T}^{(i)} = \langle \hat{s}_1^{(i)}, \hat{s}_2^{(i)}, \ldots, \hat{s}_C^{(i)} \rangle$, where $1 \le i \le N$. Then, a specific reconstruction vector corresponding to the specified vector is extracted from the specific reconstructed time series: for the specified vector $s_k$ in the time series, the specific reconstruction vector $\hat{s}_k^{(i)}$ is extracted from the reconstructed time series $\hat{T}^{(i)}$ generated by the specific autoencoder. Finally, the specific reconstruction error $a_i$ corresponding to the specified vector is calculated from the specified vector $s_k$ and the specific reconstruction vector $\hat{s}_k^{(i)}$ by means of the L2 norm of their difference, for example $a_i = \lVert s_k - \hat{s}_k^{(i)} \rVert_2^2$. Further, the specified abnormal score value corresponding to the specified vector in the time series is calculated by the formula $OS(s_k) = \operatorname{median}\{a_1, a_2, \ldots, a_N\}$. In this way, the reconstruction error corresponding to the specified vector can subsequently be calculated and generated by each autoencoder included in the autoencoder integration framework, so that the abnormal score value corresponding to each vector included in the time series can be calculated quickly. This in turn helps to identify quickly and accurately, according to the abnormal score values, whether an abnormal data value exists in the time series, and effectively improves the efficiency and accuracy of identifying abnormal data values in the time series.
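The three steps S3000 to S3002 for a single autoencoder can be sketched as follows. The example is a hedged illustration: `reconstruct` is a hypothetical callable standing in for a trained autoencoder, `mean_autoencoder` is a toy stand-in that is not part of this application, the squared L2 norm is an assumed error formula, and the index `k` is zero-based.

```python
import numpy as np

def specific_reconstruction_error(series, reconstruct, k):
    """Reconstruction error of the k-th vector under one specific autoencoder."""
    series = np.asarray(series, dtype=float)
    # S3000: reconstruct the whole time series with the specific autoencoder.
    reconstructed_series = np.asarray(reconstruct(series), dtype=float)
    # S3001: extract the specific reconstruction vector that corresponds to s_k.
    reconstruction_vector = reconstructed_series[k]
    # S3002: assumed error formula, the squared L2 norm of the difference.
    difference = series[k] - reconstruction_vector
    return float(np.dot(difference, difference))

def mean_autoencoder(series):
    """Toy stand-in for a trained autoencoder: every vector is reconstructed as the series mean."""
    return np.tile(series.mean(axis=0), (len(series), 1))

series = np.array([[1.0, 1.0], [2.0, 2.0], [9.0, 9.0]])
error_first = specific_reconstruction_error(series, mean_autoencoder, k=0)
error_last = specific_reconstruction_error(series, mean_autoencoder, k=2)
print(error_first, error_last)
```

Repeating this call for each of the N autoencoders yields the errors $a_1, \ldots, a_N$ whose median is the abnormal score value of the vector.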

Further, in an embodiment of the present application, the above step S4 includes:

S400: Obtain a preset abnormal threshold;

S401: Determine whether there is a specified score value with a value greater than the abnormal threshold value among all the abnormal score values;

S402: If yes, filter out the specified score value from all the abnormal score values;

S403: Find a third vector corresponding to the specified score value from the time series;

S404: Determine the third vector as the abnormal data value.

As described in the above steps S400 to S404, the step of identifying whether there is an abnormal data value in the time series according to the abnormal score value may specifically include the following. First, a preset abnormal threshold is obtained. The value of the abnormal threshold is not specifically limited; it can be generated by statistical calculation on historical time series data, or it can be set according to actual needs. Then, it is judged whether, among all the abnormal score values, there is a specified score value greater than the abnormal threshold. If such a specified score value exists, it is filtered out from all the abnormal score values. Next, the third vector corresponding to the specified score value is found in the time series. Finally, the third vector is determined as the abnormal data value. In this embodiment, the autoencoder integration framework is used to calculate the abnormal score value corresponding to each vector included in the time series. Each abnormal score value is compared with the preset abnormal threshold, the specified score values that are greater than the abnormal threshold are found among all the abnormal score values, and the third vector corresponding to each specified score value in the time series is determined as an abnormal data value. This realizes accurate identification of the abnormal data values contained in the time series and effectively improves the efficiency of identifying abnormal data in the time series.
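The threshold comparison of steps S400 to S404 can be sketched as follows. The function name, the toy scores, and the threshold value of 3.0 are hypothetical and chosen only for illustration; vector positions are zero-based.

```python
def identify_abnormal_values(series, scores, abnormal_threshold):
    """Return (position, vector) pairs whose abnormal score exceeds the threshold."""
    if len(series) != len(scores):
        raise ValueError("series and scores must have the same length")
    # S401 and S402: keep the positions of the score values greater than the threshold.
    flagged = [k for k, score in enumerate(scores) if score > abnormal_threshold]
    # S403 and S404: look up the corresponding vectors and report them as abnormal data values.
    return [(k, series[k]) for k in flagged]

series = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
scores = [0.0, 0.0, 8.0]
abnormal = identify_abnormal_values(series, scores, abnormal_threshold=3.0)
print(abnormal)
```

An empty result means that no score exceeds the threshold, that is, no abnormal data value is identified in the time series.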

Further, in an embodiment of the present application, after the above step S404, the method further includes:

S405: Screen out a fourth vector other than the third vector from the time series;

S406: Mark the fourth vector as a normal data value;

S407: Obtain the first quantity corresponding to the third vector; and,

S408: Obtain a second quantity corresponding to the fourth vector;

S409: Generate an abnormality analysis report corresponding to the time series according to the abnormal data value, the first quantity, the normal data value, and the second quantity;

S410: Display the abnormality analysis report.

As described in the above steps S405 to S410, after the abnormal data value in the time series is obtained, a corresponding abnormality analysis report may further be generated according to the abnormal data value and related data. Specifically, after the step of determining the third vector as the abnormal data value, the method may further include the following. First, a fourth vector other than the third vector is screened out from the time series, and the fourth vector is marked as a normal data value. Then the first quantity corresponding to the third vector is obtained, and the second quantity corresponding to the fourth vector is obtained at the same time. Next, an abnormality analysis report corresponding to the time series is generated according to the abnormal data value, the first quantity, the normal data value, and the second quantity, wherein the abnormality analysis report includes at least the abnormal data value, the first quantity, the normal data value, and the second quantity. Finally, the abnormality analysis report is displayed, so that the user can clearly understand from it the specific distribution and scale of the abnormal data values contained in the time series to be detected, as well as the specific distribution and scale of the normal data values. The display method of the abnormality analysis report is not specifically limited and can be set according to implementation requirements.
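A minimal sketch of steps S405 to S410 is given below. The report is represented as a plain dictionary and displayed by printing; both choices, as well as the function and key names, are assumptions made for illustration, since the display method is not limited.

```python
def build_abnormality_report(series, abnormal_indices):
    """Assemble the four items that the abnormality analysis report must contain."""
    abnormal_set = set(abnormal_indices)
    # Third vectors: the vectors already identified as abnormal data values.
    abnormal_values = [series[k] for k in sorted(abnormal_set)]
    # S405 and S406: every other vector is a fourth vector, marked as a normal data value.
    normal_values = [v for k, v in enumerate(series) if k not in abnormal_set]
    # S407 to S409: the first and second quantities are the sizes of the two groups.
    return {
        "abnormal_data_values": abnormal_values,
        "first_quantity": len(abnormal_values),
        "normal_data_values": normal_values,
        "second_quantity": len(normal_values),
    }

report = build_abnormality_report([[1.0], [2.0], [50.0], [1.5]], abnormal_indices=[2])
# S410: display the report.
for key, value in report.items():
    print(f"{key}: {value}")
```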

The autoencoder-based data anomaly identification method in the embodiments of the present application can also be applied to the blockchain field; for example, data such as the above-mentioned autoencoder integration framework can be stored on a blockchain. Using a blockchain to store and manage the autoencoder integration framework effectively guarantees the security and immutability of the framework.

The above-mentioned blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. The blockchain can include an underlying blockchain platform, a platform product service layer, and an application service layer.

The underlying platform of the blockchain can include processing modules such as user management, basic services, smart contracts, and operation monitoring. The user management module is responsible for managing the identity information of all blockchain participants, including maintaining public and private key generation (account management), key management, and maintaining the correspondence between a user's real identity and blockchain address (authority management); when authorized, it supervises and audits the transactions of certain real identities and provides rule configuration for risk control (risk control audit). The basic service module is deployed on all blockchain node devices to verify the validity of business requests and, after consensus is reached on a valid request, to record it in storage. For a new business request, the basic service first performs interface adaptation, parsing, and authentication (interface adaptation), then encrypts the business information through the consensus algorithm (consensus management), transmits the encrypted information completely and consistently to the shared ledger (network communication), and records and stores it. The smart contract module is responsible for contract registration and issuance, contract triggering, and contract execution. Developers can define contract logic in a programming language and publish it on the blockchain (contract registration); execution is triggered by a key or another event according to the logic of the contract terms to complete the contract logic, and the module also provides functions for upgrading and cancelling contracts. The operation monitoring module is mainly responsible for deployment during product release, configuration modification, contract settings, cloud adaptation, and visual output of real-time status during product operation, such as alarms, monitoring of network conditions, and monitoring of node device health.
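The idea of data blocks linked by cryptographic methods can be illustrated with a very small hash chain. This is a sketch only: it is not a real blockchain platform (there is no consensus, networking, or smart contract layer), and the block layout, the function names, and the choice to store only a SHA-256 digest of a serialized autoencoder integration framework are assumptions made for illustration.

```python
import hashlib
import json

def block_hash(payload, previous_hash):
    """Hash the block content together with the hash of the previous block."""
    body = json.dumps({"payload": payload, "previous_hash": previous_hash}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def append_block(chain, payload):
    """Append a block that is linked to the current last block of the chain."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({
        "payload": payload,
        "previous_hash": previous_hash,
        "hash": block_hash(payload, previous_hash),
    })
    return chain

def chain_is_valid(chain):
    """Check every block's own hash and its link to the preceding block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block["payload"], block["previous_hash"]):
            return False
        if i > 0 and block["previous_hash"] != chain[i - 1]["hash"]:
            return False
    return True

# Hypothetical serialized framework; only its digest is recorded in the block.
framework_bytes = b"serialized autoencoder integration framework"
chain = append_block([], {"framework_sha256": hashlib.sha256(framework_bytes).hexdigest()})
chain = append_block(chain, {"note": "second record"})
print(chain_is_valid(chain))
```

Any later change to a stored payload alters that block's hash and breaks the link checked by `chain_is_valid`, which is the tamper-evidence property referred to above.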

Referring to FIG. 2 , an embodiment of the present application also provides a device for identifying data anomalies based on an autoencoder, including:

A receiving module 1, for receiving the input time series to be detected;

The training module 2 is configured to perform, based on the time series and according to preset rules, integrated training processing on a pre-generated specified number of sparsely connected autoencoders to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion processing on each of a specified number of recurrent neural network-based autoencoders;

Calculation module 3, for calculating the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

The identification module 4 is configured to identify whether there is an abnormal data value in the time series according to the abnormal score value.

In this embodiment, for details of how the functions and roles of the receiving module, the training module, the calculation module, and the identification module in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S1 to S4 in the above autoencoder-based data anomaly identification method, which will not be repeated here.
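How the four modules cooperate can be sketched as a small pipeline object. The class name, the callable signatures, and the toy training and scoring functions below are hypothetical stand-ins; in the device described here, the training module would produce the autoencoder integration framework and the calculation module would compute median reconstruction errors.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = Sequence[float]

@dataclass
class AnomalyIdentificationDevice:
    train: Callable[[List[Vector]], object]               # role of training module 2
    score: Callable[[object, List[Vector]], List[float]]  # role of calculation module 3
    threshold: float                                      # used by identification module 4

    def run(self, series: List[Vector]) -> List[Vector]:
        received = list(series)                    # receiving module 1
        framework = self.train(received)           # training module 2
        scores = self.score(framework, received)   # calculation module 3
        # identification module 4: keep the vectors whose score exceeds the threshold
        return [v for v, s in zip(received, scores) if s > self.threshold]

# Toy stand-ins: the "framework" is the mean of the series, the score is the squared distance to it.
def toy_train(series):
    return sum(v[0] for v in series) / len(series)

def toy_score(framework, series):
    return [(v[0] - framework) ** 2 for v in series]

device = AnomalyIdentificationDevice(train=toy_train, score=toy_score, threshold=10.0)
anomalies = device.run([[1.0], [2.0], [1.5], [9.5]])
print(anomalies)
```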

Further, in an embodiment of the present application, the above-mentioned training module includes:

a first acquiring unit, configured to acquire all the first vectors included in the time series; and,

a second obtaining unit, configured to obtain a one-to-one corresponding first reconstruction vector generated by each of the sparsely connected autoencoders based on each of the first vectors;

a first generating unit for generating a corresponding first objective function based on the first vector and the first reconstruction vector;

a first training unit, configured to separately train each of the sparsely connected autoencoders based on the first objective function to obtain a trained first autoencoder, wherein the number of the first autoencoders is the same as the number of sparsely connected autoencoders;

a first processing unit, configured to perform integrated processing on all the first autoencoders to generate a corresponding independent framework, wherein the independent framework includes a specified number of the first autoencoders, and there is no interaction between the first autoencoders;

a first determining unit, configured to determine the independent framework as the autoencoder integration framework.

In this embodiment, for details of how the functions and roles of the first acquiring unit, the second obtaining unit, the first generating unit, the first training unit, the first processing unit, and the first determining unit in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S200 to S205 in the above autoencoder-based data anomaly identification method, which will not be repeated here.
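The independent framework, in which each sparsely connected autoencoder is trained separately on its own objective, can be sketched with a deliberately simplified model. The example below replaces the recurrent neural network-based autoencoders with masked linear maps trained by gradient descent; this simplification, the fixed binary masks standing in for deleted unit connections, and all names and hyperparameters are assumptions made purely for illustration.

```python
import numpy as np

def first_objective(series, weights):
    """Assumed J_i: sum of squared L2 norms between each vector and its reconstruction."""
    X = np.asarray(series, dtype=float)
    return float(np.sum((X @ weights - X) ** 2))

def train_independent(series, masks, steps=200, lr=0.05):
    """Train one masked linear autoencoder per mask; the models never interact."""
    X = np.asarray(series, dtype=float)
    trained = []
    for mask in masks:
        mask = np.asarray(mask, dtype=float)
        W = np.zeros_like(mask)
        for _ in range(steps):
            residual = X @ (W * mask) - X
            # Gradient of the mean squared reconstruction error; masked entries stay deleted.
            grad = (2.0 / len(X)) * (X.T @ residual) * mask
            W -= lr * grad
        trained.append(W * mask)
    return trained

series = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]])
masks = [
    np.array([[1.0, 1.0], [1.0, 1.0]]),  # fully connected
    np.array([[1.0, 0.0], [0.0, 1.0]]),  # cross connections deleted
]
independent_framework = train_independent(series, masks)
losses = [first_objective(series, W) for W in independent_framework]
print(losses)
```

Each model sees only its own objective and its own mask, which mirrors the statement that there is no interaction between the first autoencoders in the independent framework.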

Further, in an embodiment of the present application, the above-mentioned training module includes:

a third acquiring unit, configured to acquire a preset shared layer, wherein the shared layer includes a shared hidden state;

a second processing unit, configured to perform weight sharing processing on all the sparsely connected autoencoders through the shared layer;

a third processing unit, configured to perform L1 regularization processing on the shared hidden state to obtain the processed shared hidden state;

a fourth acquiring unit, configured to acquire all the second vectors contained in the time series; and,

a fifth obtaining unit, configured to obtain a one-to-one corresponding second reconstruction vector generated by each of the sparsely connected autoencoders based on each of the second vectors;

a second generating unit, configured to generate a corresponding second objective function according to the processed shared hidden state, the second vector and the second reconstruction vector;

a second training unit, configured to jointly train all the sparsely connected autoencoders based on the second objective function to obtain trained second autoencoders, wherein the number of the second autoencoders is the same as the number of the sparsely connected autoencoders;

a fourth processing unit, configured to perform integrated processing on all the second autoencoders to generate a corresponding shared framework, wherein the shared framework includes a specified number of the second autoencoders, and there is interaction between the second autoencoders;

a second determining unit, configured to determine the shared framework as the autoencoder integration framework.

In this embodiment, for details of how the functions and roles of the third acquiring unit, the second processing unit, the third processing unit, the fourth acquiring unit, the fifth obtaining unit, the second generating unit, the second training unit, the fourth processing unit, and the second determining unit in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S210 to S218 in the above autoencoder-based data anomaly identification method, which will not be repeated here.

Further, in an embodiment of the present application, the above calculation module includes:

a first calculation unit, configured to calculate and generate a reconstruction error corresponding to a specified vector through each autoencoder included in the autoencoder integration framework, where the specified vector is any one of all the vectors included in the time series;

a second computing unit for computing the median of all the reconstruction errors;

A third determining unit, configured to determine the median as a specified abnormal score value corresponding to the specified vector in the time series.

In this embodiment, for details of how the functions and roles of the first calculation unit, the second calculation unit, and the third determining unit in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S300 to S302 in the above autoencoder-based data anomaly identification method, which will not be repeated here.

Further, in an embodiment of the present application, the above-mentioned first computing unit includes:

a processing subunit, configured to perform reconstruction processing on the time series by using a specific autoencoder to obtain a specific reconstructed time series corresponding to the time series, wherein the specific autoencoder is any one of all the autoencoders included in the autoencoder integration framework;

an extraction subunit for extracting a specific reconstruction vector corresponding to the specified vector from the specific reconstruction time series;

a calculation subunit, configured to calculate a specific reconstruction error corresponding to the specified vector according to the specified vector and the specific reconstruction vector.

In this embodiment, for details of how the functions and roles of the processing subunit, the extraction subunit, and the calculation subunit in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S3000 to S3002 in the above autoencoder-based data anomaly identification method, which will not be repeated here.

Further, in an embodiment of the present application, the above-mentioned identification module includes:

a sixth obtaining unit, used for obtaining a preset abnormal threshold;

a judging unit, configured to judge whether, among all the abnormal score values, there is a specified score value greater than the abnormal threshold;

a first screening unit, configured to filter out the specified score value from all the abnormal score values if so;

a search unit, configured to search out a third vector corresponding to the specified score value from the time series;

a fourth determination unit, configured to determine the third vector as the abnormal data value.

In this embodiment, for details of how the functions and roles of the sixth obtaining unit, the judging unit, the first screening unit, the search unit, and the fourth determination unit in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S400 to S404 in the above autoencoder-based data anomaly identification method, which will not be repeated here.

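Further, in an embodiment of the present application, the above-mentioned identification module further includes:

a second screening unit, configured to screen out a fourth vector other than the third vector from the time series;

a marking unit, configured to mark the fourth vector as a normal data value;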

a seventh obtaining unit, for obtaining the first quantity corresponding to the third vector; and,

an eighth obtaining unit, configured to obtain a second quantity corresponding to the fourth vector;

a third generating unit, configured to generate an abnormality analysis report corresponding to the time series according to the abnormal data value, the first quantity, the normal data value, and the second quantity;

The display unit is used to display the abnormality analysis report.

In this embodiment, for details of how the functions and roles of the second screening unit, the marking unit, the seventh obtaining unit, the eighth obtaining unit, the third generating unit, and the display unit in the above autoencoder-based data anomaly identification device are implemented, refer to the implementation process corresponding to steps S405 to S410 in the above autoencoder-based data anomaly identification method, which will not be repeated here.

Referring to FIG. 3, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, a display screen, an input device, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store data such as the time series to be detected, the sparsely connected autoencoders, the autoencoder integration framework, the abnormal score values, and the abnormal data values. The network interface of the computer device is used to communicate with an external terminal through a network connection. The display screen of the computer device is a graphic and text output device that converts digital signals into optical signals so that text and graphics can be displayed on the screen. The input device of the computer device is the main device for exchanging information between the computer and the user or other devices, and is used to transmit data, instructions, and certain flag information to the computer. When the computer program is executed by the processor, an autoencoder-based data anomaly identification method is implemented.

The above-mentioned processor performs the following steps of the above autoencoder-based data anomaly identification method:

Receive the input time series to be detected;

Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the apparatus or computer equipment to which the solution of the present application is applied.

An embodiment of the present application further provides a computer-readable storage medium, which may be non-volatile or volatile and on which a computer program is stored. When executed by a processor, the computer program implements the autoencoder-based data anomaly identification method shown in any of the above exemplary embodiments, and the autoencoder-based data anomaly identification method comprises the following steps:

Receive the input time series to be detected;

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

It should be noted that, herein, the terms "comprising", "including", or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, apparatus, article, or method comprising a series of elements includes not only those elements but also other elements not expressly listed or inherent to such a process, apparatus, article, or method. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, apparatus, article, or method that includes the element.

The above are only the preferred embodiments of the present application and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the contents of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims

An autoencoder-based data anomaly identification method, comprising:

Receive the input time series to be detected;

Based on the time series, perform integrated training processing on a pre-generated specified number of sparsely connected autoencoders according to preset rules to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by performing unit connection deletion processing on each of a specified number of recurrent neural network-based autoencoders;

Calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

According to the abnormal score value, identify whether there is an abnormal data value in the time series.
The autoencoder-based data anomaly identification method according to claim 1, wherein the step of performing, based on the time series, integrated training processing on the pre-generated specified number of sparsely connected autoencoders according to preset rules to generate the corresponding autoencoder integration framework includes:

obtain all first vectors contained in the time series; and,

obtaining a one-to-one corresponding first reconstruction vector generated by each of the sparsely connected autoencoders based on each of the first vectors;

generating a corresponding first objective function based on the first vector and the first reconstruction vector;

Based on the first objective function, each of the sparsely connected autoencoders is trained separately to obtain a trained first autoencoder, wherein the number of the first autoencoders is the same as the number of the sparsely connected autoencoders;

Perform integrated processing on all the first autoencoders to generate a corresponding independent framework, wherein the independent framework includes a specified number of the first autoencoders, and there is no interaction between the first autoencoders;

The independent framework is determined to be the autoencoder integrated framework.
The autoencoder-based data anomaly identification method according to claim 1, wherein the step of performing, based on the time series, integrated training processing on the pre-generated specified number of sparsely connected autoencoders according to preset rules to generate the corresponding autoencoder integration framework includes:

obtaining a preset shared layer, wherein the shared layer includes a shared hidden state;

Perform weight sharing processing on all the sparsely connected autoencoders through the shared layer;

L1 regularization processing is performed on the shared hidden state to obtain the processed shared hidden state;

obtain all second vectors contained in the time series; and,

obtaining a one-to-one corresponding second reconstruction vector generated by each of the sparsely connected autoencoders based on each of the second vectors;

generating a corresponding second objective function according to the processed shared hidden state, the second vector and the second reconstruction vector;

jointly training all the sparsely connected autoencoders based on the second objective function to obtain trained second autoencoders, wherein the number of second autoencoders is the same as the number of sparsely connected autoencoders;

performing integration processing on all the second autoencoders to generate a corresponding shared framework, wherein the shared framework includes the specified number of second autoencoders, and there is interaction between the second autoencoders;

determining the shared framework as the autoencoder integration framework.
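The joint objective of the shared framework can be sketched as follows. This is a minimal illustration under stated assumptions: the reconstruction term is taken to be a sum of squared errors over all autoencoders, the L1 term is the sum of absolute values of the shared hidden state, and the weighting parameter `l1_weight` is hypothetical; the claim does not fix any of these forms.

```python
from typing import Sequence

Vector = Sequence[float]


def squared_error(vectors: Sequence[Vector], reconstructions: Sequence[Vector]) -> float:
    """Sum of squared differences between vectors and their reconstructions."""
    return sum(
        (x - r) ** 2
        for vec, rec in zip(vectors, reconstructions)
        for x, r in zip(vec, rec)
    )


def second_objective(
    series: Sequence[Vector],
    reconstructions_per_autoencoder: Sequence[Sequence[Vector]],
    shared_hidden_state: Sequence[float],
    l1_weight: float,
) -> float:
    """Single objective shared by all autoencoders (assumed form)."""
    reconstruction_term = sum(
        squared_error(series, recs) for recs in reconstructions_per_autoencoder
    )
    # L1 penalty on the shared hidden state encourages a sparse shared representation.
    l1_term = sum(abs(h) for h in shared_hidden_state)
    return reconstruction_term + l1_weight * l1_term


series = [[1.0, 2.0]]
reconstructions = [[[1.0, 2.0]], [[0.0, 0.0]]]  # two autoencoders, one vector each
shared_hidden_state = [0.5, -1.5]
loss = second_objective(series, reconstructions, shared_hidden_state, l1_weight=0.5)
```

Because every autoencoder contributes to the same scalar, a gradient step on `loss` updates all of them together, in contrast to the per-autoencoder objectives of the independent framework.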
The method for identifying anomalies in data based on an autoencoder according to claim 1, wherein the step of calculating an anomaly score value corresponding to each vector included in the time series through the autoencoder integration framework comprises:

calculating, by each autoencoder included in the autoencoder integration framework, a reconstruction error corresponding to a specified vector, wherein the specified vector is any one of all vectors included in the time series;

calculating the median of all said reconstruction errors;

The median is determined as the specified anomaly score value corresponding to the specified vector in the time series.
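The median-based scoring step can be sketched in a few lines of Python. The function name `anomaly_scores` and the layout of the input (one list of errors per autoencoder) are assumptions made for this example.

```python
from statistics import median
from typing import List, Sequence


def anomaly_scores(errors_per_autoencoder: Sequence[Sequence[float]]) -> List[float]:
    """errors_per_autoencoder[i][t] is the error of autoencoder i on vector t.

    The score of vector t is the median of its errors across all autoencoders.
    """
    num_vectors = len(errors_per_autoencoder[0])
    return [
        median(errors[t] for errors in errors_per_autoencoder)
        for t in range(num_vectors)
    ]


# Three autoencoders, two vectors.
errors = [
    [0.1, 4.0],
    [0.3, 5.0],
    [9.0, 6.0],
]
scores = anomaly_scores(errors)
```

In this toy input the third autoencoder reports an error of 9.0 on the first vector, yet the score of that vector stays at 0.3: taking the median keeps a single poorly fitting autoencoder from dominating the result.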
The method for identifying data anomalies based on an autoencoder according to claim 4, wherein the step of calculating, by each autoencoder included in the autoencoder integration framework, a reconstruction error corresponding to a specified vector comprises:

reconstructing the time series by a specific autoencoder to obtain a specific reconstructed time series corresponding to the time series, wherein the specific autoencoder is any one of the autoencoders included in the autoencoder integration framework;

extracting a specific reconstruction vector corresponding to the specified vector from the specific reconstruction time series;

According to the specified vector and the specific reconstruction vector, a specific reconstruction error corresponding to the specified vector is calculated.
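The three steps above (reconstruct the whole series, extract the reconstruction at the position of the specified vector, compare the two) can be sketched as follows. The squared Euclidean distance used as the error, and the toy `mean_reconstructor` standing in for a trained autoencoder, are assumptions made for this example.

```python
from typing import Callable, List, Sequence

Series = Sequence[Sequence[float]]
# Hypothetical interface: an autoencoder maps a whole series to its reconstruction.
SeriesAutoencoder = Callable[[Series], List[List[float]]]


def reconstruction_error(series: Series, autoencoder: SeriesAutoencoder, index: int) -> float:
    """Error of one autoencoder on the vector at position `index`."""
    reconstructed_series = autoencoder(series)            # step 1: reconstruct the series
    specified_vector = series[index]
    specific_reconstruction = reconstructed_series[index]  # step 2: extract the matching vector
    # Step 3: squared Euclidean distance (assumed error measure).
    return sum((x - r) ** 2 for x, r in zip(specified_vector, specific_reconstruction))


def mean_reconstructor(series: Series) -> List[List[float]]:
    """Toy stand-in: reconstructs every vector as the element-wise mean of the series."""
    dim = len(series[0])
    mean = [sum(vec[d] for vec in series) / len(series) for d in range(dim)]
    return [list(mean) for _ in series]


series = [[1.0], [1.0], [10.0]]
errors = [reconstruction_error(series, mean_reconstructor, t) for t in range(len(series))]
```

The vector that deviates most from the rest of the series receives the largest error, which is the property the anomaly score relies on.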
The method for identifying data anomalies based on an autoencoder according to claim 1, wherein the step of identifying whether there is an abnormal data value in the time series according to the abnormal score value comprises:

Get the preset abnormal threshold;

Judging whether there is a specified score value with a value greater than the abnormal threshold value among all the abnormal score values;

If so, filter out the specified score value from all the abnormal score values;

Find a third vector corresponding to the specified score value from the time series;

The third vector is determined to be the abnormal data value.
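The threshold comparison can be sketched as a simple filter. The function name `find_abnormal_vectors`, the returned (position, vector) pairs, and the example threshold are assumptions made for this illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def find_abnormal_vectors(
    series: Sequence[Vector],
    scores: Sequence[float],
    threshold: float,
) -> List[Tuple[int, Vector]]:
    """Return (position, vector) pairs whose score is strictly greater than the threshold."""
    return [
        (t, vec)
        for t, (vec, score) in enumerate(zip(series, scores))
        if score > threshold
    ]


series = [[1.0], [1.2], [10.0]]
scores = [0.2, 0.3, 7.5]
abnormal = find_abnormal_vectors(series, scores, threshold=1.0)
```

An empty result corresponds to the case in which no score exceeds the threshold, that is, no abnormal data value is present in the time series.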
The method for identifying data anomalies based on an autoencoder according to claim 6, wherein after the step of determining the third vector as the abnormal data value, the method comprises:

Filter out a fourth vector other than the third vector from the time series;

marking the fourth vector as a normal data value;

obtaining a first quantity corresponding to the third vector; and,

obtaining a second quantity corresponding to the fourth vector;

generating an anomaly analysis report corresponding to the time series according to the abnormal data value, the first quantity, the normal data and the second quantity;

Display the anomaly analysis report.
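A minimal sketch of the report generation follows. The dictionary layout, its key names, and the use of `print` as the display step are assumptions made for this example; the claim does not prescribe a report format.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def build_report(series: Sequence[Vector], abnormal_positions: Sequence[int]) -> Dict[str, object]:
    """Split the series into abnormal and normal vectors and count each group."""
    abnormal_set = set(abnormal_positions)
    abnormal: List[Vector] = [vec for t, vec in enumerate(series) if t in abnormal_set]
    normal: List[Vector] = [vec for t, vec in enumerate(series) if t not in abnormal_set]
    return {
        "abnormal_values": abnormal,      # the third vectors
        "abnormal_count": len(abnormal),  # the first quantity
        "normal_values": normal,          # the fourth vectors
        "normal_count": len(normal),      # the second quantity
    }


report = build_report([[1.0], [1.2], [10.0]], abnormal_positions=[2])
print(report)  # stand-in for displaying the report
```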
A device for identifying data anomalies based on an autoencoder, comprising:

a receiving module for receiving the input time series to be detected;

a training module, configured to perform integrated training on a pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, and to generate a corresponding autoencoder integration framework, wherein the sparsely connected autoencoders are generated by respectively performing unit connection deletion on the specified number of recurrent-neural-network-based autoencoders;

a calculation module, configured to calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

An identification module, configured to identify whether there is an abnormal data value in the time series according to the abnormal score value.
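One possible way to realize the unit connection deletion mentioned for the training module is sketched below. The claim does not state which connections are deleted; this example assumes that each recurrent unit has two incoming recurrent connections (one from the previous step and one from an earlier "skip" step) and that at least one of them is kept at every step. The function names and the seed parameter are hypothetical.

```python
import random
from typing import List, Tuple

# (keep connection from previous step, keep skip connection); 1 = kept, 0 = deleted.
Mask = Tuple[int, int]

# Assumed rule: never delete both connections of a unit.
ALLOWED_MASKS: List[Mask] = [(1, 0), (0, 1), (1, 1)]


def sparse_connection_masks(num_steps: int, rng: random.Random) -> List[Mask]:
    """Randomly choose, for every time step, which recurrent connections survive."""
    return [rng.choice(ALLOWED_MASKS) for _ in range(num_steps)]


def build_ensemble_masks(num_autoencoders: int, num_steps: int, seed: int = 0) -> List[List[Mask]]:
    """One independent set of masks per autoencoder, so that the autoencoders differ."""
    rng = random.Random(seed)
    return [sparse_connection_masks(num_steps, rng) for _ in range(num_autoencoders)]


masks = build_ensemble_masks(num_autoencoders=3, num_steps=5)
```

Drawing the masks independently for each autoencoder gives every member of the ensemble a different connection pattern.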
A computer device, comprising a memory and a processor, wherein a computer program is stored in the memory, wherein, when the processor executes the computer program, a method for identifying data anomalies based on an autoencoder is implemented:

Wherein, the data anomaly identification method based on the autoencoder includes:

Receive the input time series to be detected;

Based on the time series, an integrated training process is performed on a pre-generated specified number of sparsely connected autoencoders according to preset rules, and a corresponding autoencoder integration framework is generated, wherein the sparsely connected autoencoders are generated by respectively performing unit connection deletion on the specified number of recurrent-neural-network-based autoencoders;

Calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

According to the abnormal score value, it is identified whether there is abnormal data value in the time series.
The computer device according to claim 9, wherein the step of performing integrated training on the pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, to generate the corresponding autoencoder integration framework, comprises:

obtain all first vectors contained in the time series; and,

obtaining a one-to-one corresponding first reconstruction vector generated by each of the sparsely connected autoencoders based on each of the first vectors;

generating a corresponding first objective function based on the first vector and the first reconstruction vector;

training each of the sparsely connected autoencoders based on the first objective function to obtain trained first autoencoders, wherein the number of first autoencoders is the same as the number of sparsely connected autoencoders;

performing integration processing on all the first autoencoders to generate a corresponding independent framework, wherein the independent framework includes the specified number of first autoencoders, and there is no interaction between the first autoencoders;

determining the independent framework as the autoencoder integration framework.
The computer device according to claim 9, wherein the step of performing integrated training on the pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, to generate the corresponding autoencoder integration framework, comprises:

obtaining a preset shared layer, wherein the shared layer includes a shared hidden state;

Perform weight sharing processing on all the sparsely connected autoencoders through the shared layer;

L1 regularization processing is performed on the shared hidden state to obtain the processed shared hidden state;

obtain all second vectors contained in the time series; and,

obtaining a one-to-one corresponding second reconstruction vector generated by each of the sparsely connected autoencoders based on each of the second vectors;

generating a corresponding second objective function according to the processed shared hidden state, the second vector and the second reconstruction vector;

jointly training all the sparsely connected autoencoders based on the second objective function to obtain trained second autoencoders, wherein the number of second autoencoders is the same as the number of sparsely connected autoencoders;

performing integration processing on all the second autoencoders to generate a corresponding shared framework, wherein the shared framework includes the specified number of second autoencoders, and there is interaction between the second autoencoders;

determining the shared framework as the autoencoder integration framework.
The computer device according to claim 9, wherein the step of calculating the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework comprises:

calculating, by each autoencoder included in the autoencoder integration framework, a reconstruction error corresponding to a specified vector, wherein the specified vector is any one of all vectors included in the time series;

calculating the median of all said reconstruction errors;

The median is determined as the specified anomaly score value corresponding to the specified vector in the time series.
The computer device according to claim 12, wherein the step of calculating and generating a reconstruction error corresponding to a specified vector by each autoencoder included in the autoencoder integration framework comprises:

reconstructing the time series by a specific autoencoder to obtain a specific reconstructed time series corresponding to the time series, wherein the specific autoencoder is any one of the autoencoders included in the autoencoder integration framework;

extracting a specific reconstruction vector corresponding to the specified vector from the specific reconstruction time series;

According to the specified vector and the specific reconstruction vector, a specific reconstruction error corresponding to the specified vector is calculated.
The computer device according to claim 9, wherein the step of identifying whether there is an abnormal data value in the time series according to the abnormal score value comprises:

Get the preset abnormal threshold;

Judging whether there is a specified score value with a value greater than the abnormal threshold value among all the abnormal score values;

If so, filter out the specified score value from all the abnormal score values;

Find a third vector corresponding to the specified score value from the time series;

The third vector is determined to be the abnormal data value.
The computer device according to claim 14, wherein, after the step of determining the third vector as the abnormal data value, the method further comprises:

Filter out a fourth vector other than the third vector from the time series;

marking the fourth vector as a normal data value;

obtaining a first quantity corresponding to the third vector; and,

obtaining a second quantity corresponding to the fourth vector;

generating an anomaly analysis report corresponding to the time series according to the abnormal data value, the first quantity, the normal data and the second quantity;

Display the anomaly analysis report.
A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, a method for identifying data anomalies based on an autoencoder is implemented, wherein the method for identifying data anomalies based on an autoencoder includes the following steps:

Receive the input time series to be detected;

Based on the time series, an integrated training process is performed on a pre-generated specified number of sparsely connected autoencoders according to preset rules, and a corresponding autoencoder integration framework is generated, wherein the sparsely connected autoencoders are generated by respectively performing unit connection deletion on the specified number of recurrent-neural-network-based autoencoders;

Calculate the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework;

According to the abnormal score value, it is identified whether there is abnormal data value in the time series.
The computer-readable storage medium according to claim 16, wherein the step of performing integrated training on the pre-generated specified number of sparsely connected autoencoders based on the time series and according to preset rules, to generate the corresponding autoencoder integration framework, comprises:

obtain all first vectors contained in the time series; and,

obtaining a one-to-one corresponding first reconstruction vector generated by each of the sparsely connected autoencoders based on each of the first vectors;

generating a corresponding first objective function based on the first vector and the first reconstruction vector;

training each of the sparsely connected autoencoders based on the first objective function to obtain trained first autoencoders, wherein the number of first autoencoders is the same as the number of sparsely connected autoencoders;

performing integration processing on all the first autoencoders to generate a corresponding independent framework, wherein the independent framework includes the specified number of first autoencoders, and there is no interaction between the first autoencoders;

determining the independent framework as the autoencoder integration framework.
The computer-readable storage medium according to claim 16, wherein the step of calculating the abnormal score value corresponding to each vector included in the time series through the autoencoder integration framework comprises:

calculating, by each autoencoder included in the autoencoder integration framework, a reconstruction error corresponding to a specified vector, wherein the specified vector is any one of all vectors included in the time series;

calculating the median of all said reconstruction errors;

The median is determined as the specified anomaly score value corresponding to the specified vector in the time series.
The computer-readable storage medium according to claim 18, wherein the step of calculating and generating a reconstruction error corresponding to a specified vector by each autoencoder included in the autoencoder integration framework comprises:

reconstructing the time series by a specific autoencoder to obtain a specific reconstructed time series corresponding to the time series, wherein the specific autoencoder is any one of the autoencoders included in the autoencoder integration framework;

extracting a specific reconstruction vector corresponding to the specified vector from the specific reconstruction time series;

According to the specified vector and the specific reconstruction vector, a specific reconstruction error corresponding to the specified vector is calculated.
The computer-readable storage medium according to claim 16, wherein the step of identifying whether there is an abnormal data value in the time series according to the abnormal score value comprises:

Get the preset abnormal threshold;

Judging whether there is a specified score value whose value is greater than the abnormal threshold value among all the abnormal score values;

If so, filter out the specified score value from all the abnormal score values;

Find a third vector corresponding to the specified score value from the time series;

The third vector is determined to be the abnormal data value.