CN116627708B - Storage fault analysis system and method thereof - Google Patents

Info

Publication number
CN116627708B
Authority
CN
China
Prior art keywords
feature vector
training
operation state
classification
error log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310906536.9A
Other languages
Chinese (zh)
Other versions
CN116627708A (en
Inventor
王建东
曾德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Weichu Information Technology Co ltd
Original Assignee
Hunan Weichu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Weichu Information Technology Co ltd
Priority to CN202310906536.9A
Publication of CN116627708A
Application granted
Publication of CN116627708B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3037 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a memory, e.g. virtual memory, cache
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The application relates to the technical field of intelligent analysis, and in particular to a storage fault analysis system and a storage fault analysis method. The storage fault analysis system comprises an operation state monitoring module, a log acquisition module, an operation data arrangement module, an operation time sequence feature extraction module, a log semantic understanding module, a packaging semantic matching fusion module and a fault type dividing module. The system combines operation state data (temperature, voltage, current and transmission rate) of a diagnosed storage module at a plurality of preset time points in a preset time period with the error log of the diagnosed storage module, and uses deep learning and artificial intelligence technology to automatically identify and classify faults in a storage system, so that the efficiency and the accuracy of storage fault analysis are improved.

Description

Storage fault analysis system and method thereof
Technical Field
The present application relates to the field of intelligent analysis technology, and more particularly, to a storage failure analysis system and a method thereof.
Background
Storage failure analysis is a method of detecting and diagnosing failures occurring in a storage system. The storage fault analysis can help a storage administrator to quickly locate and solve faults, improve the reliability and performance of a storage system, and reduce the risk of data loss and service interruption.
Existing storage failure analysis schemes lack automated failure analysis tools, so a storage administrator must manually collect and analyze large amounts of failure data, which is time-consuming, labor-intensive, and prone to error. Thus, an optimized solution is desired.
Disclosure of Invention
The application aims to solve the technical problem of providing a storage fault analysis system and a storage fault analysis method so as to improve the efficiency and the accuracy of storage fault analysis.
The embodiment of the application provides a storage fault analysis system and a storage fault analysis method, which comprehensively utilize operation state data (temperature, voltage, current and transmission rate) of a diagnosed storage module at a plurality of preset time points in a preset time period and error logs of the diagnosed storage module, and automatically identify and classify faults in the storage system by combining deep learning and artificial intelligence technology so as to improve the efficiency and accuracy of storage fault analysis.
In a first aspect, there is provided a storage failure analysis system comprising:
the operation state monitoring module is used for acquiring operation state data of the diagnosed storage module at a plurality of preset time points in a preset time period, wherein the operation state data comprise temperature, voltage, current and transmission rate;
the log acquisition module is used for acquiring the error log of the diagnosed storage module;
the operation data arrangement module is used for arranging the operation state data of the plurality of preset time points into an operation state data time sequence input matrix according to a time dimension and a sample dimension;
the operation time sequence feature extraction module is used for enabling the operation state data time sequence input matrix to pass through a convolutional neural network model serving as a filter so as to obtain an operation state time sequence associated feature vector;
the log semantic understanding module is used for obtaining an error log semantic understanding feature vector through a semantic encoder comprising a word embedding layer after word segmentation processing is carried out on the error log;
the packaging semantic matching fusion module is used for fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and
the fault type dividing module is used for enabling the classification feature vector to pass through a classifier to obtain a classification result, and the classification result is used for representing a fault type label of the diagnosed storage module.
In the above storage failure analysis system, the operation timing characteristic extraction module is configured to: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer: carrying out convolution processing on the input data to obtain a convolution characteristic diagram; carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; performing nonlinear activation on the pooled feature map to obtain an activated feature map; the output of the last layer of the convolutional neural network model serving as the filter is the operation state time sequence associated feature vector, and the input of the first layer of the convolutional neural network model serving as the filter is the operation state data time sequence input matrix.
In the above storage failure analysis system, the log semantic understanding module includes: a word segmentation unit, configured to perform word segmentation processing on the error log so as to convert the error log into a word sequence consisting of a plurality of words; an embedded encoding unit, configured to map each word in the word sequence to a word vector using the word embedding layer of the semantic encoder to obtain a sequence of word vectors; and a context encoding unit, configured to perform global context semantic encoding on the sequence of word vectors using the Transformer of the semantic encoder to obtain the error log semantic understanding feature vector.
In the above storage failure analysis system, the context encoding unit includes: the vector construction subunit is used for carrying out one-dimensional arrangement on the sequence of the word vectors to obtain global word feature vectors; a self-attention subunit, configured to calculate a product between the global word feature vector and a transpose vector of each word vector in the sequence of word vectors to obtain a plurality of self-attention association matrices; the normalization subunit is used for respectively performing normalization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; the attention calculating subunit is used for obtaining a plurality of probability values through a Softmax classification function by each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and an attention applying subunit, configured to weight each word vector in the sequence of word vectors with each probability value in the plurality of probability values as a weight to obtain the error log semantic understanding feature vector.
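By way of illustration only, one possible reading of the five subunits above can be sketched in Python. Several details are assumptions made for the example and are not stated in the application: each association matrix is normalized by its Frobenius norm, each normalized matrix is reduced to a single score by averaging its entries before the Softmax function is applied, and the weighted word vectors are summed into one feature vector. The function name `encode_context` and the sample word vectors are likewise hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable Softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_context(word_vectors: List[Vector]) -> Tuple[List[float], Vector]:
    """Return (attention probabilities, weighted feature vector)."""
    # Vector construction: arrange the word vectors one-dimensionally.
    global_vec = [value for vec in word_vectors for value in vec]

    scores = []
    for vec in word_vectors:
        # Self-attention: product of the global vector and the transposed word vector.
        assoc = [[g * x for x in vec] for g in global_vec]
        # Normalization (assumption): divide by the Frobenius norm when it is non-zero.
        norm = math.sqrt(sum(a * a for row in assoc for a in row))
        if norm > 0.0:
            assoc = [[a / norm for a in row] for row in assoc]
        # Reduction to one score per word (assumption): mean of the matrix entries.
        count = len(assoc) * len(assoc[0])
        scores.append(sum(a for row in assoc for a in row) / count)

    # Attention calculation: Softmax turns the scores into probability values.
    probs = softmax(scores)

    # Attention application (assumption): probability-weighted sum of word vectors.
    dim = len(word_vectors[0])
    fused = [sum(p * vec[k] for p, vec in zip(probs, word_vectors)) for k in range(dim)]
    return probs, fused


# Hypothetical two-dimensional word vectors for three words.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs, fused = encode_context(word_vectors)
```

In this reading, a word whose vector aligns more strongly with the global vector receives a larger probability value and therefore contributes more to the resulting feature vector.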
In the above storage failure analysis system, the package semantic matching fusion module is configured to: fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector by using the following optimization formula to obtain a classification feature vector; wherein, the optimization formula is:
where the formula is expressed in terms of: the operation state time sequence associated feature vector; the error log semantic understanding feature vector; $\|\cdot\|_1$ and $\|\cdot\|_2$, the 1-norm and 2-norm of a vector; the weight and bias hyperparameters; the position-wise distance matrix between the operation state time sequence associated feature vector and the error log semantic understanding feature vector; an identity matrix; position-wise multiplication, matrix multiplication, position-wise addition and position-wise subtraction; and the resulting classification feature vector.
In the above storage failure analysis system, the failure type classification module includes: the full-connection coding unit is used for carrying out full-connection coding on the classification characteristic vectors by using a plurality of full-connection layers of the classifier so as to obtain coded classification characteristic vectors; and the classification unit is used for passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
The storage failure analysis system further includes a training module for training the convolutional neural network model as a filter, the semantic encoder comprising a word embedding layer, and the classifier; wherein the training module includes:
the training running state monitoring unit is used for acquiring training running state data of the training diagnosed storage module at a plurality of preset time points in a preset time period;
the training log obtaining unit is used for obtaining the training error log of the training diagnosed storage module;
the training operation data arrangement unit is used for arranging the training operation state data of the plurality of preset time points into a training operation state data time sequence input matrix according to the time dimension and the sample dimension;
the training operation time sequence feature extraction unit is used for enabling the training operation state data time sequence input matrix to pass through the convolutional neural network model serving as a filter so as to obtain training operation state time sequence associated feature vectors;
the training log semantic understanding unit is used for obtaining training error log semantic understanding feature vectors through the semantic encoder comprising the word embedding layer after word segmentation processing is carried out on the training error log;
the training fusion unit is used for fusing the training operation state time sequence associated feature vector and the training error log semantic understanding feature vector to obtain a training classification feature vector;
the classification loss unit is used for passing the training classification feature vector through the classifier to obtain a classification loss function value; and
the training unit is used for training the convolutional neural network model as a filter, the semantic encoder comprising a word embedding layer, and the classifier based on the classification loss function value and with back propagation of gradient descent, wherein in each iteration of the training, a cross-domain attention transfer optimization of feature distribution is performed on a weight matrix of the classifier.
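By way of illustration only, the classification loss and the gradient descent update can be sketched in Python for the final Softmax layer alone. The choice of cross-entropy as the classification loss, the learning rate, the three-class setup and the sample feature vector are assumptions made for the example; the sketch does not train the convolutional neural network model or the semantic encoder, and it leaves out the per-iteration weight matrix optimization.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable Softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights: List[List[float]], x: List[float], target: int,
             lr: float = 0.1) -> float:
    """One gradient descent step on a linear Softmax classifier.

    Updates `weights` in place and returns the cross-entropy loss
    measured before the update.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[target])
    for c, row in enumerate(weights):
        # Gradient of cross-entropy w.r.t. logit c is (probability - one-hot target).
        grad_logit = probs[c] - (1.0 if c == target else 0.0)
        for k in range(len(row)):
            row[k] -= lr * grad_logit * x[k]
    return loss


# Hypothetical training classification feature vector with true class index 1.
weights = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
feature = [1.0, 2.0]
losses = [sgd_step(weights, feature, target=1) for _ in range(20)]
```

With all weights initialized to zero, the first loss equals the logarithm of the number of classes, and repeated steps on the same sample reduce it.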
In the above storage failure analysis system, performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier includes: performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier by using the following transfer optimization formula to obtain an optimized weight matrix;
the transfer optimization formula is as follows:
where the formula is expressed in terms of: the weight matrix of the classifier and its scale; the row vectors of the weight matrix; $\|\cdot\|_2$, the 2-norm of a feature vector; the feature value at row $i$, column $j$ of the weight matrix; a row vector obtained by arranging the sum values of the row vectors of the weight matrix; the transpose of a matrix; matrix multiplication; two single-layer convolution operations; and the optimized weight matrix.
In a second aspect, there is provided a storage failure analysis method, comprising:
acquiring operation state data of a diagnosed storage module at a plurality of preset time points in a preset time period, wherein the operation state data comprise temperature, voltage, current and transmission rate;
obtaining an error log of the diagnosed storage module;
arranging the running state data of the plurality of preset time points into a running state data time sequence input matrix according to a time dimension and a sample dimension;
passing the operation state data time sequence input matrix through a convolutional neural network model serving as a filter to obtain an operation state time sequence associated feature vector;
after word segmentation is carried out on the error log, a semantic encoder comprising a word embedding layer is used for obtaining an error log semantic understanding feature vector;
fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and
passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for representing the fault type label of the diagnosed storage module.
Compared with the prior art, the storage fault analysis system and the storage fault analysis method provided by the application comprehensively utilize the operation state data (temperature, voltage, current and transmission rate) of the diagnosed storage module at a plurality of preset time points in the preset time period and the error log of the diagnosed storage module, and automatically identify and classify faults in the storage system by combining deep learning and artificial intelligence technology so as to improve the efficiency and accuracy of storage fault analysis.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an application scenario diagram of a storage failure analysis system according to an embodiment of the present application.
Fig. 2 is a block diagram of a storage failure analysis system according to an embodiment of the present application.
Fig. 3 is a block diagram of the log semantic understanding module in the storage failure analysis system according to an embodiment of the present application.
Fig. 4 is a block diagram of the context encoding unit in the storage failure analysis system according to an embodiment of the present application.
Fig. 5 is a block diagram of the failure type classification module in the storage failure analysis system according to an embodiment of the present application.
Fig. 6 is a flowchart of a storage failure analysis method according to an embodiment of the present application.
Fig. 7 is a schematic diagram of a system architecture of a storage failure analysis method according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions according to the embodiments of the present application will be given with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In describing embodiments of the present application, unless otherwise indicated and limited thereto, the term "connected" should be construed broadly, for example, it may be an electrical connection, or may be a communication between two elements, or may be a direct connection, or may be an indirect connection via an intermediate medium, and it will be understood by those skilled in the art that the specific meaning of the term may be interpreted according to circumstances.
It should be noted that the terms "first/second/third" in the embodiments of the present application merely distinguish similar objects and do not represent a specific order for the objects. Where permitted, the specific order or sequence of "first/second/third" may be interchanged, so that the embodiments of the application described herein may be practiced in sequences other than those illustrated or described herein.
Aiming at the technical problems, the technical concept of the application is to comprehensively utilize the operation state data (temperature, voltage, current and transmission rate) of the diagnosed storage module at a plurality of preset time points in a preset time period and the error log of the diagnosed storage module, and automatically identify and classify faults in a storage system by combining deep learning and artificial intelligence technology, thereby improving the efficiency and accuracy of storage fault analysis.
Specifically, in the technical scheme of the application, firstly, operation state data of a diagnosed storage module at a plurality of preset time points in a preset time period are obtained, wherein the operation state data comprise temperature, voltage, current and transmission rate. And simultaneously, obtaining an error log of the diagnosed storage module. The running state data of the storage system is an important index reflecting the performance and health condition of the storage system, and can be used for monitoring and analyzing the working condition of the storage system and finding and early warning the abnormality and the fault of the storage system. By acquiring the operation state data of the diagnosed storage module at a plurality of preset time points in a preset time period, the operation characteristics of the storage module can be observed from a plurality of dimensions and angles, and the change trend and rule of the operation state data and the relevance of the operation state data to the fault type can be found. More specifically, the temperature, voltage, current, and transmission rate reflect the temperature condition, power supply condition, circuit condition, and data transmission condition, respectively, of the memory module, and have important reference values for diagnosing the failure of the memory module. In addition, the error log is an important data source for recording fault information in the storage system, and can reflect abnormal conditions, such as hardware faults, software errors, network problems and the like, of the storage module in the operation process. By analyzing the error log, the fault cause and the influence range of the storage module can be known.
Then, the operation state data of the plurality of preset time points are arranged into an operation state data time sequence input matrix according to the time dimension and the sample dimension. Arranged in this way, the matrix preserves the temporal trend of the operation state of the storage module and reflects the difference in state before and after a fault occurs.
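By way of illustration only, the arrangement step might be sketched in Python as follows. The field names, the orientation of the matrix (one row per operation state parameter, one column per time point) and the readings are assumptions made for the example; the application does not specify them.

```python
from typing import Dict, List

# Hypothetical field names for the four monitored parameters.
FIELDS = ("temperature", "voltage", "current", "transmission_rate")


def build_input_matrix(samples: List[Dict[str, float]]) -> List[List[float]]:
    """Arrange time-ordered readings into a parameter-by-time matrix.

    Each row holds one parameter (sample dimension); each column holds
    one preset time point (time dimension).
    """
    return [[sample[field] for sample in samples] for field in FIELDS]


# Illustrative readings at three consecutive preset time points.
readings = [
    {"temperature": 41.0, "voltage": 12.1, "current": 0.52, "transmission_rate": 510.0},
    {"temperature": 42.5, "voltage": 12.0, "current": 0.55, "transmission_rate": 498.0},
    {"temperature": 47.0, "voltage": 11.6, "current": 0.71, "transmission_rate": 301.0},
]
matrix = build_input_matrix(readings)
```

Keeping the readings in time order along one axis is what lets a later convolution see how each parameter changes between neighbouring time points.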
As previously described, the operation state data time sequence input matrix includes a plurality of operation state parameters of the diagnosed storage module at different points in time. There may be complex spatiotemporal correlations between these parameters. In the technical scheme of the application, the operation state data time sequence input matrix is passed through a convolutional neural network model serving as a filter to obtain the operation state time sequence associated feature vector. Here, the convolutional neural network model can effectively extract local features in the input data, thereby realizing high-level abstraction and representation of the input data. Specifically, by taking the convolutional neural network model as a filter, multi-level convolution operations can be performed on the operation state data time sequence input matrix, so that key features in the operation state data, such as abnormal fluctuations, trend changes and periodic patterns, are extracted. These features can reflect the operation condition and the fault risk of the storage module.
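By way of illustration only, a single layer of such a filter (convolution, then mean pooling, then nonlinear activation) can be sketched in plain Python. The 2x2 kernel, the 2x2 pooling window, the choice of ReLU as the nonlinear activation and the input values are assumptions made for the example; a practical implementation would use a deep learning framework with learned kernels.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2D convolution as used in deep learning (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]


def mean_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping mean pooling; trailing rows/columns that do not fit are dropped."""
    out_h, out_w = len(x) // size, len(x[0]) // size
    return [[sum(x[i * size + a][j * size + b]
                 for a in range(size) for b in range(size)) / (size * size)
             for j in range(out_w)] for i in range(out_h)]


def relu(x: Matrix) -> Matrix:
    """Element-wise ReLU, standing in for the unspecified nonlinear activation."""
    return [[max(0.0, v) for v in row] for row in x]


def layer_forward(x: Matrix, kernel: Matrix) -> Matrix:
    """One layer: convolution, mean pooling, then activation."""
    return relu(mean_pool(conv2d_valid(x, kernel)))


# Hypothetical 4 x 5 input (4 parameters, 5 time points) and an averaging kernel.
x = [[1.0] * 5 for _ in range(4)]
kernel = [[0.25, 0.25], [0.25, 0.25]]
features = layer_forward(x, kernel)
```

Stacking several such layers and flattening the last output would give a feature vector of the kind passed on to the fusion step.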
Then, word segmentation is performed on the error log, and the result is passed through a semantic encoder comprising a word embedding layer to obtain an error log semantic understanding feature vector. Word segmentation splits the text in the error log into the smallest meaningful units so as to facilitate subsequent semantic analysis. It can improve the information density of the error log, remove irrelevant stop words, and reduce noise and redundancy. The word embedding layer then converts the segmented words into numerical vectors so as to facilitate subsequent model operations. Furthermore, the semantic encoder can perform context-related semantic understanding of the error log, extract its high-level semantic feature information, and improve its semantic representation capability.
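By way of illustration only, word segmentation followed by an embedding lookup can be sketched in Python. The regular-expression tokenizer, the stop-word list, the embedding dimension, the randomly initialized embedding table and the sample log line are all assumptions made for the example; in practice the embedding table would be learned during training.

```python
import random
import re
from typing import Dict, List

# Hypothetical stop-word list; a real system would tune this to its log format.
STOP_WORDS = {"on", "the", "a", "of", "at"}


def tokenize(log_line: str) -> List[str]:
    """Split a log line into lowercase word tokens and drop stop words."""
    words = re.findall(r"[a-z0-9_]+", log_line.lower())
    return [w for w in words if w not in STOP_WORDS]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign an index to each distinct token; index 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


def embed(tokens: List[str], vocab: Dict[str, int],
          table: List[List[float]]) -> List[List[float]]:
    """Map each token to its word vector; unseen tokens use the <unk> row."""
    return [table[vocab.get(token, 0)] for token in tokens]


# Hypothetical error log line.
tokens = tokenize("ERROR: read timeout on disk sda1")
vocab = build_vocab(tokens)

# Randomly initialized 4-dimensional embedding table (seeded for repeatability).
rng = random.Random(0)
table = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(len(vocab))]
vectors = embed(tokens, vocab, table)
```

The resulting sequence of word vectors is the kind of input a context encoder would then process.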
As described above, the operation state time sequence associated feature vector reflects the dynamic change condition of the diagnosed storage module, and the error log semantic understanding feature vector reflects the abnormal information of the diagnosed storage module. The two feature vectors describe the fault features of the storage module from different angles and dimensions respectively, and in the technical scheme of the application, the operation state time sequence associated feature vector and the error log semantic understanding feature vector are fused to obtain a classification feature vector with more comprehensive and rich information expression, so that different types of faults can be better distinguished.
Then, the classification feature vector is passed through a classifier to obtain a classification result, wherein the classification result is used for representing the fault type label of the diagnosed storage module. Here, the classifier outputs a predicted class label based on the input feature vector. The fault type label can be a disk fault, a controller fault, a power supply fault, a memory fault, a heat dissipation problem, a data line fault, a software fault and the like. In subsequent applications, fault type labels can be added or removed according to actual conditions. In this way, faults in the storage system are automatically identified and classified.
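By way of illustration only, a classifier made of fully connected layers followed by a Softmax function can be sketched in Python. The label list follows the examples given above; the layer sizes, the weights, the ReLU activation between layers and the sample feature vector are assumptions made for the example.

```python
import math
from typing import List, Tuple

FAULT_LABELS = [
    "disk fault", "controller fault", "power supply fault", "memory fault",
    "heat dissipation problem", "data line fault", "software fault",
]

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable Softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x: List[float], layers: List[Layer]) -> Tuple[str, List[float]]:
    """Encode x with the hidden layers, then return (label, class probabilities)."""
    hidden = x
    for weights, bias in layers[:-1]:
        # ReLU between layers is an assumption, not taken from the application.
        hidden = [max(0.0, v) for v in dense(hidden, weights, bias)]
    weights, bias = layers[-1]
    probs = softmax(dense(hidden, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return FAULT_LABELS[best], probs


# Hypothetical weights: an identity hidden layer and an output layer
# that favours the third label for this input.
hidden_layer: Layer = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
output_layer: Layer = ([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0],
                        [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [0.0] * 7)
label, probs = classify([1.0, 2.0], [hidden_layer, output_layer])
```

Adding or removing a fault type amounts to changing the label list and the number of rows in the output layer.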
In the technical scheme of the application, the operation state time sequence associated feature vector expresses the local associated semantic feature under the sample-time sequence cross dimension of the operation state data, and the error log semantic understanding feature vector expresses the text semantic coding feature of the error log text, so that fusion based on semantic matching between feature vectors needs to be carried out in a depth feature space obtained by a neural network model of the operation state time sequence associated feature vector and the error log semantic understanding feature vector.
The applicant therefore performs deep-space encapsulation semantic matching fusion on the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain the classification feature vector, expressed as follows:
where $\|\cdot\|_1$ and $\|\cdot\|_2$ denote the 1-norm and 2-norm of a vector, the weight and the bias are hyperparameters, the distance matrix is the position-wise distance matrix between the operation state time sequence associated feature vector and the error log semantic understanding feature vector, the remaining matrix term is an identity matrix, and the result is the classification feature vector.
Here, the semantic expressions of the operation state time sequence associated feature vector and the error log semantic understanding feature vector in the depth feature space are encapsulated into a deep space, so that the fine-grained features in the overall distribution of the feature vectors contain both low-level and high-level semantic distributions. Through the deep-space encapsulation semantic matching fusion, semantic levels can therefore be matched at the classification pattern level by balancing the low-level and high-level semantic distributions, which realizes semantically controlled fusion of the features in the feature space. This yields semantic collaboration between the two feature vectors in the feature fusion space, improves how well the classification feature vector fuses them, and thereby improves the accuracy of the classification result obtained by passing the classification feature vector through the classifier.
In the technical scheme of the application, the operation state time sequence associated feature vector expresses the local associated semantic feature under the sample-time sequence cross dimension of the operation state data, and the error log semantic understanding feature vector expresses the text semantic coding feature of the error log text, so that after the operation state time sequence associated feature vector and the error log semantic understanding feature vector are fused based on semantic matching in a depth feature space obtained by a neural network model, the classification feature vector can have diversified feature expression.
Thus, when the classification feature vector is classified by a classifier, the distribution transferability difference of the diversified feature distribution expression in the domain transfer process of classification is considered, for example, when the weight matrix of the classifier is adapted with respect to a feature expression of a certain class, it will have better distribution transferability than a feature expression of another class, and vice versa. Therefore, the weight matrix of the classifier needs to be adaptively optimized for the classification feature vector, so as to improve the training effect of the classification training of the classification feature vector through the classifier, namely, improve the classification speed and the accuracy of the obtained classification result.
Thus, the applicant of the present application, in each iteration of the weighting matrix of the classifier, has a function of the weighting matrixPerforming cross-domain attention transfer optimization of feature distribution, wherein the cross-domain attention transfer optimization is specifically expressed as follows:
[Formula image not reproduced in this text.]
wherein the formula is defined over the following quantities: the weight matrix of the classifier and its scale (number of rows by number of columns); the respective row vectors of the weight matrix; the two-norm of a feature vector; the feature value at a given row and column of the weight matrix; a row vector obtained by arranging the sum values of the respective row vectors of the weight matrix; the transpose of a matrix; matrix multiplication; two single-layer convolution operations; and the optimized weight matrix.
Here, the feature-distribution-based cross-domain attention transfer optimization addresses the different representations that the feature distribution of the classification feature vector has in the feature space domain and in the classification target domain. Based on the cross-domain diverse feature representation of the weight matrix of the classifier relative to the classification feature vector to be classified, convolution operations apply attention to the weight matrix, which strengthens the transferability across the domain gap of the well-transferring feature distributions among the diversified feature distributions while suppressing negative transfer of the poorly transferring feature distributions. The weight matrix is thereby adaptively optimized for unsupervised domain transfer based on the distribution structure of the weight matrix itself relative to the classification feature vector to be classified, which improves the training effect when the classification feature vector is trained through the classifier.
The application has the following technical effects: 1. an automated storage failure analysis scheme is provided. 2. The scheme can automatically identify and classify faults in the storage system, thereby helping a storage manager to discover and solve the fault problems in time, ensuring the normal operation of the storage system and avoiding data loss and service interruption.
Fig. 1 is an application scenario diagram of a storage failure analysis system according to an embodiment of the present application. As shown in fig. 1, in the application scenario, first, operation state data of a diagnosed storage module at a plurality of predetermined time points within a predetermined period of time (e.g., C1 as illustrated in fig. 1) is acquired, and an error log of the diagnosed storage module is acquired (e.g., C2 as illustrated in fig. 1); the acquired operation state data and error log are then input to a server (e.g., S as illustrated in fig. 1) deployed with a storage failure analysis algorithm, wherein the server is capable of processing the operation state data and the error log based on the storage failure analysis algorithm to generate a classification result representing a failure type label of the diagnosed storage module.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
In one embodiment of the present application, FIG. 2 is a block diagram of a storage failure analysis system according to an embodiment of the present application. As shown in fig. 2, a storage failure analysis system 100 according to an embodiment of the present application includes: an operation state monitoring module 110 for acquiring operation state data of the diagnosed storage module at a plurality of predetermined time points within a predetermined time period, wherein the operation state data includes temperature, voltage, current and transmission rate; a log obtaining module 120, configured to obtain an error log of the diagnosed storage module; an operation data arrangement module 130, configured to arrange the operation state data of the plurality of predetermined time points into an operation state data time sequence input matrix according to a time dimension and a sample dimension; an operation time sequence feature extraction module 140, configured to pass the operation state data time sequence input matrix through a convolutional neural network model serving as a filter to obtain an operation state time sequence associated feature vector; a log semantic understanding module 150, configured to obtain an error log semantic understanding feature vector through a semantic encoder including a word embedding layer after performing word segmentation processing on the error log; an encapsulation semantic matching fusion module 160, configured to fuse the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and a fault type classification module 170, configured to pass the classification feature vector through a classifier to obtain a classification result, where the classification result is used to represent a fault type label of the diagnosed storage module.
Specifically, in the embodiment of the present application, the operation state monitoring module 110 and the log obtaining module 120 are configured to obtain operation state data of the diagnosed storage module at a plurality of predetermined time points within a predetermined period of time, where the operation state data includes a temperature, a voltage, a current, and a transmission rate; and obtaining an error log of the diagnosed storage module.
To address the above technical problems, the technical concept of the application is to comprehensively utilize the operation state data (temperature, voltage, current and transmission rate) of the diagnosed storage module at a plurality of predetermined time points within a predetermined time period together with the error log of the diagnosed storage module, and to automatically identify and classify faults in a storage system by combining deep learning and artificial intelligence technology, thereby improving the efficiency and accuracy of storage fault analysis.
Specifically, in the technical scheme of the application, firstly, operation state data of a diagnosed storage module at a plurality of preset time points in a preset time period are obtained, wherein the operation state data comprise temperature, voltage, current and transmission rate. And simultaneously, obtaining an error log of the diagnosed storage module. The running state data of the storage system is an important index reflecting the performance and health condition of the storage system, and can be used for monitoring and analyzing the working condition of the storage system and finding and early warning the abnormality and the fault of the storage system. By acquiring the operation state data of the diagnosed storage module at a plurality of preset time points in a preset time period, the operation characteristics of the storage module can be observed from a plurality of dimensions and angles, and the change trend and rule of the operation state data and the relevance of the operation state data to the fault type can be found.
More specifically, the temperature, voltage, current, and transmission rate reflect the temperature condition, power supply condition, circuit condition, and data transmission condition, respectively, of the memory module, and have important reference values for diagnosing the failure of the memory module. In addition, the error log is an important data source for recording fault information in the storage system, and can reflect abnormal conditions, such as hardware faults, software errors, network problems and the like, of the storage module in the operation process. By analyzing the error log, the fault cause and the influence range of the storage module can be known.
Specifically, in the embodiment of the present application, the operation data arrangement module 130 is configured to arrange the operation state data of the plurality of predetermined time points into an operation state data time sequence input matrix according to a time dimension and a sample dimension. Arranged in this way, the operation state data time sequence input matrix preserves the temporal trend of the operation state of the storage module and reflects the state difference before and after a fault occurs.
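The arrangement step can be illustrated with a minimal sketch. The readings, the parameter names and the choice of rows for the time dimension and columns for the sample dimension are assumptions made for illustration; the application does not fix a particular layout.

```python
import numpy as np

# Hypothetical readings: one dict per predetermined time point (values are made up).
readings = [
    {"temperature": 41.0, "voltage": 12.1, "current": 0.82, "transmission_rate": 520.0},
    {"temperature": 43.5, "voltage": 12.0, "current": 0.85, "transmission_rate": 515.0},
    {"temperature": 47.2, "voltage": 11.7, "current": 0.93, "transmission_rate": 402.0},
]
# The four operation state parameters named in the text (identifier names are assumed).
PARAMETERS = ["temperature", "voltage", "current", "transmission_rate"]


def arrange_input_matrix(readings, parameters=PARAMETERS):
    """Stack readings so rows follow time and columns follow the parameters."""
    return np.array([[r[p] for p in parameters] for r in readings], dtype=np.float64)


matrix = arrange_input_matrix(readings)
print(matrix.shape)  # (3, 4): 3 time points, 4 parameters
```

Keeping the rows in chronological order is what lets a later convolution see neighbouring time points together.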
Specifically, in the embodiment of the present application, the operation time sequence feature extraction module 140 is configured to pass the operation state data time sequence input matrix through a convolutional neural network model serving as a filter to obtain an operation state time sequence associated feature vector. As previously described, the operation state data time sequence input matrix contains a plurality of operation state parameters of the diagnosed storage module at different time points, and there may be complex spatiotemporal correlations between these parameters. The convolutional neural network model serving as a filter is therefore applied to this matrix to obtain the operation state time sequence associated feature vector.
Here, the convolutional neural network model can effectively extract local features in the input data, thereby realizing high-level abstraction and representation of the input data. Specifically, by taking the convolutional neural network model as a filter, multi-level convolutional operation can be performed on the time sequence input matrix of the operation state data, so that key features in the operation state data, such as abnormal fluctuation, trend change, periodicity law and the like, are extracted, and the features can reflect the operation condition and the fault risk of the storage module.
Wherein, the operation timing feature extraction module 140 is configured to: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer: carrying out convolution processing on the input data to obtain a convolution characteristic diagram; carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; performing nonlinear activation on the pooled feature map to obtain an activated feature map; the output of the last layer of the convolutional neural network model serving as the filter is the operation state time sequence associated feature vector, and the input of the first layer of the convolutional neural network model serving as the filter is the operation state data time sequence input matrix.
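The per-layer steps described above (convolution, mean pooling, nonlinear activation) can be sketched as follows. This is a minimal single-channel illustration in NumPy; the kernel size, the 2x2 pooling window, the ReLU activation and the random inputs are assumptions, not values given in the application.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """Single-channel 'valid' convolution (cross-correlation, as in most deep learning code)."""
    kh, kw = kernel.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out


def mean_pool(x, size=2):
    """Non-overlapping mean pooling; trailing rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    trimmed = x[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).mean(axis=(1, 3))


def relu(x):
    """Nonlinear activation (ReLU is an assumed choice)."""
    return np.maximum(x, 0.0)


def layer_forward(x, kernel):
    """One layer: convolution, then mean pooling, then nonlinear activation."""
    return relu(mean_pool(conv2d_valid(x, kernel)))


rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))        # assumed input: 8 time points x 4 parameters
kernel = rng.normal(size=(3, 3))   # assumed 3x3 kernel (random stand-in for learned weights)
feature_map = layer_forward(x, kernel)
feature_vector = feature_map.ravel()  # flattened output of the last layer
```

In a trained model the kernels would be learned and several such layers would be stacked; the flattened output of the last layer plays the role of the operation state time sequence associated feature vector.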
The convolutional neural network (Convolutional Neural Network, CNN) is an artificial neural network and has wide application in the fields of image recognition and the like. The convolutional neural network may include an input layer, a hidden layer, and an output layer, where the hidden layer may include a convolutional layer, a pooling layer, an activation layer, a full connection layer, etc., where the previous layer performs a corresponding operation according to input data, outputs an operation result to the next layer, and obtains a final result after the input initial data is subjected to a multi-layer operation.
The convolutional neural network model has excellent performance in the aspect of image local feature extraction by taking a convolutional kernel as a feature filtering factor, and has stronger feature extraction generalization capability and fitting capability compared with the traditional image feature extraction algorithm based on statistics or feature engineering.
Specifically, in the embodiment of the present application, the log semantic understanding module 150 is configured to perform word segmentation processing on the error log and then obtain the error log semantic understanding feature vector through a semantic encoder including a word embedding layer. Word segmentation splits the text of the error log into the smallest meaningful units to facilitate subsequent semantic analysis; it can raise the information density of the error log, remove irrelevant stop words, and reduce noise and redundancy. The word embedding layer converts the segmented words into numerical vectors to facilitate subsequent model operations. Furthermore, the semantic encoder performs context-aware semantic understanding of the error log, extracts its high-level semantic feature information, and improves its semantic representation capability.
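A minimal sketch of word segmentation followed by a word embedding lookup is given below. The regular-expression tokenizer, the example log line, the embedding size of 16 and the random embedding table are all assumptions for illustration; a real system would use a segmenter suited to the log language and a trained embedding layer.

```python
import re

import numpy as np


def segment(log_text):
    """Lowercase and split on non-alphanumeric characters (a stand-in for a real segmenter)."""
    return [t for t in re.split(r"[^a-z0-9]+", log_text.lower()) if t]


def build_vocab(tokens):
    """Assign an integer id to each distinct token; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for t in tokens:
        vocab.setdefault(t, len(vocab))
    return vocab


def embed(tokens, vocab, table):
    """Word embedding layer: look up one row of the table per token."""
    ids = [vocab.get(t, 0) for t in tokens]
    return table[ids]


log_line = "ERROR disk0: I/O timeout, retry count exceeded"  # made-up log line
tokens = segment(log_line)
vocab = build_vocab(tokens)
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), 16))  # random stand-in for learned embeddings
word_vectors = embed(tokens, vocab, table)
print(tokens)
print(word_vectors.shape)  # (8, 16)
```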
Fig. 3 is a block diagram of the log semantic understanding module in the storage fault analysis system according to an embodiment of the present application, and as shown in fig. 3, the log semantic understanding module 150 includes: a word segmentation unit 151 for performing word segmentation processing on the error log to convert the error log into a word sequence composed of a plurality of words; an embedded encoding unit 152, configured to map each word in the word sequence to a word vector using a word embedding layer of the semantic encoder including the word embedding layer to obtain a sequence of word vectors; and a context encoding unit 153, configured to perform global-based context semantic encoding on the sequence of word vectors using the converter of the semantic encoder including the word embedding layer to obtain the error log semantic understanding feature vector.
Wherein, fig. 4 is a block diagram of the context encoding unit in the storage failure analysis system according to an embodiment of the present application. As shown in fig. 4, the context encoding unit 153 includes: a vector construction subunit 1531, configured to one-dimensionally arrange the sequence of word vectors to obtain a global word feature vector; a self-attention subunit 1532, configured to calculate a product between the global word feature vector and a transpose vector of each word vector in the sequence of word vectors to obtain a plurality of self-attention correlation matrices; a normalization subunit 1533, configured to perform normalization processing on each of the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices; an attention calculating subunit 1534, configured to pass each normalized self-attention correlation matrix in the plurality of normalized self-attention correlation matrices through a Softmax classification function to obtain a plurality of probability values; and an attention applying subunit 1535, configured to weight each word vector in the sequence of word vectors with each probability value in the plurality of probability values as a weight to obtain the error log semantic understanding feature vector.
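The five subunits can be read as the following sketch. The text does not specify how each correlation matrix is normalized or how a matrix is reduced to a single score before Softmax, so the scaling by the square root of the word vector dimension and the mean reduction below are assumptions, as are the random word vectors.

```python
import numpy as np


def softmax(z):
    """Numerically stable Softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def context_encode(word_vectors):
    """Attention-style pooling of word vectors following the subunit order in the text."""
    n, d = word_vectors.shape
    # 1531: one-dimensional arrangement of the sequence into a global word feature vector.
    global_vec = word_vectors.reshape(-1)
    scores = np.empty(n)
    for i, w in enumerate(word_vectors):
        # 1532: product of the global vector with the transpose of each word vector.
        assoc = np.outer(global_vec, w)      # shape (n*d, d)
        # 1533: normalization (assumed here: scale by sqrt(d)).
        assoc = assoc / np.sqrt(d)
        # Assumed reduction of each matrix to one score.
        scores[i] = assoc.mean()
    # 1534: Softmax turns the scores into probability values.
    probs = softmax(scores)
    # 1535: weight each word vector by its probability and sum.
    return probs @ word_vectors, probs


rng = np.random.default_rng(0)
word_vectors = rng.normal(size=(5, 8))  # assumed: 5 words, 8-dimensional embeddings
log_vector, probs = context_encode(word_vectors)
```

The result has the same dimension as a single word vector, so error logs of different lengths map to feature vectors of the same size.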
The context encoder aims to mine hidden patterns between contexts in the word sequence. Optionally, the encoder includes a CNN (Convolutional Neural Network), a Recursive NN (Recursive Neural Network), a language model, and the like. CNN-based methods extract local features well but handle the long-term dependency problem in sentences poorly, so encoders based on Bi-LSTM (bidirectional Long Short-Term Memory) are widely used. A Recursive NN processes a sentence as a tree structure rather than a sequence; it has stronger representation capability in theory, but its samples are difficult to annotate, its gradients vanish at depth, and it is difficult to compute in parallel, so it is rarely used in practice. The Transformer (the converter referred to above) is a widely used network structure that combines characteristics of the CNN and the RNN (Recurrent Neural Network), extracts global features well, and has a certain advantage over the RNN in parallel computation.
Specifically, in the embodiment of the present application, the package semantic matching fusion module 160 is configured to fuse the operation state timing related feature vector and the error log semantic understanding feature vector to obtain a classification feature vector. As described above, the operation state time sequence associated feature vector reflects the dynamic change condition of the diagnosed storage module, and the error log semantic understanding feature vector reflects the abnormal information of the diagnosed storage module. The two feature vectors describe the fault features of the storage module from different angles and dimensions respectively, and in the technical scheme of the application, the operation state time sequence associated feature vector and the error log semantic understanding feature vector are fused to obtain a classification feature vector with more comprehensive and rich information expression, so that different types of faults can be better distinguished.
In the technical scheme of the application, the operation state time sequence associated feature vector expresses the local associated semantic feature under the sample-time sequence cross dimension of the operation state data, and the error log semantic understanding feature vector expresses the text semantic coding feature of the error log text, so that fusion based on semantic matching between feature vectors needs to be carried out in a depth feature space obtained by a neural network model of the operation state time sequence associated feature vector and the error log semantic understanding feature vector.
Thus, the applicant of the present application performs deep space encapsulation semantic matching fusion on the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain the classification feature vector. Specifically, the operation state time sequence associated feature vector and the error log semantic understanding feature vector are fused using the following optimization formula to obtain the classification feature vector; wherein the optimization formula is:
[Formula image not reproduced in this text.]
wherein the formula is defined over the following quantities: the operation state time sequence associated feature vector; the error log semantic understanding feature vector; the 1-norm and the 2-norm of a vector; a weight hyperparameter and a bias hyperparameter; a position-wise distance matrix between the operation state time sequence associated feature vector and the error log semantic understanding feature vector; an identity matrix; position-wise multiplication, matrix multiplication, position-wise addition and position-wise subtraction; and the resulting classification feature vector.
Here, the semantic expressions of the operation state time sequence associated feature vector and the error log semantic understanding feature vector in the depth feature space are encapsulated in deep space, so that the fine-grained features in the overall distribution of the feature vectors contain both low-level and high-level semantic distributions. By balancing the low-level and high-level semantic distributions, the deep space encapsulation semantic matching fusion matches the semantic levels at the classification mode level and thereby achieves a semantically controlled compilation and fusion of the features in the feature space. This yields semantic collaboration between the operation state time sequence associated feature vector and the error log semantic understanding feature vector in the feature fusion space, improves how well the classification feature vector fuses the two feature vectors, and thus improves the accuracy of the classification result obtained when the classification feature vector is passed through the classifier.
Specifically, in the embodiment of the present application, the fault type classification module 170 is configured to pass the classification feature vector through a classifier to obtain a classification result, where the classification result is used to represent a fault type label of the diagnosed storage module. Here, the classifier outputs a predicted class label based on the input feature vector. The fault type label may be a disk fault, a controller fault, a power supply fault, a memory fault, a heat dissipation problem, a data line fault, a software fault, and the like. In subsequent applications, fault type labels can be added or removed according to actual conditions. In this way, faults in the storage system are automatically identified and classified.
Fig. 5 is a block diagram of the fault type classification module in the storage fault analysis system according to an embodiment of the present application, and as shown in fig. 5, the fault type classification module 170 includes: a full-connection encoding unit 171, configured to perform full-connection encoding on the classification feature vector by using a plurality of full-connection layers of the classifier to obtain an encoded classification feature vector; and a classification unit 172, configured to pass the encoded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
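The two units of the classifier, full-connection encoding followed by a Softmax classification function, can be sketched as below. The layer sizes, the ReLU between layers, the random weights and the reduced label set are assumptions for illustration; the labels themselves are taken from the fault types listed in the text.

```python
import numpy as np

# A subset of the fault type labels named in the text.
FAULT_LABELS = ["disk fault", "controller fault", "power supply fault", "memory fault"]


def softmax(z):
    """Numerically stable Softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()


def classify(feature_vector, layers):
    """Full-connection encoding through several layers, then Softmax over the labels."""
    h = feature_vector
    for idx, (weight, bias) in enumerate(layers):
        h = weight @ h + bias
        if idx < len(layers) - 1:
            h = np.maximum(h, 0.0)  # assumed ReLU between fully connected layers
    probs = softmax(h)
    return FAULT_LABELS[int(np.argmax(probs))], probs


rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(16, 32)), np.zeros(16)),                 # assumed hidden layer
    (rng.normal(size=(len(FAULT_LABELS), 16)), np.zeros(len(FAULT_LABELS))),
]
label, probs = classify(rng.normal(size=32), layers)
print(label, probs.round(3))
```

With untrained random weights the predicted label is arbitrary; the sketch only shows the data flow from classification feature vector to fault type label.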
Further, the storage failure analysis system further comprises a training module for training the convolutional neural network model serving as a filter, the semantic encoder including a word embedding layer, and the classifier; wherein the training module includes: a training operation state monitoring unit for acquiring training operation state data of a training diagnosed storage module at a plurality of predetermined time points within a predetermined time period; a training log obtaining unit for obtaining a training error log of the training diagnosed storage module; a training operation data arrangement unit for arranging the training operation state data of the plurality of predetermined time points into a training operation state data time sequence input matrix according to the time dimension and the sample dimension; a training operation time sequence feature extraction unit for passing the training operation state data time sequence input matrix through the convolutional neural network model serving as a filter to obtain a training operation state time sequence associated feature vector; a training log semantic understanding unit for obtaining a training error log semantic understanding feature vector through the semantic encoder including the word embedding layer after word segmentation processing is performed on the training error log; a training fusion unit for fusing the training operation state time sequence associated feature vector and the training error log semantic understanding feature vector to obtain a training classification feature vector; a classification loss unit for passing the training classification feature vector through the classifier to obtain a classification loss function value; and a training unit for training the convolutional neural network model serving as a filter, the semantic encoder including a word embedding layer, and the classifier based on the classification loss function value through back propagation of gradient descent, wherein in each iteration of the training, cross-domain attention transfer optimization of feature distribution is performed on the weight matrix of the classifier.
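The role of the classification loss function value and gradient descent can be illustrated with a deliberately reduced sketch: only a single linear Softmax classifier is trained, on synthetic data, with a cross-entropy loss. The data, the learning rate and the cross-entropy choice are assumptions. The cross-domain attention transfer optimization of the weight matrix is marked in a comment but not implemented, because its formula is not reproduced in this text.

```python
import numpy as np


def softmax_rows(z):
    """Row-wise numerically stable Softmax."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def cross_entropy(W, X, y):
    """Classification loss function value (assumed: mean cross-entropy)."""
    probs = softmax_rows(X @ W.T)
    return -np.mean(np.log(probs[np.arange(len(y)), y] + 1e-12))


def gradient_step(W, X, y, lr=0.1):
    """One gradient descent update of the classifier weight matrix."""
    probs = softmax_rows(X @ W.T)
    probs[np.arange(len(y)), y] -= 1.0      # d(loss)/d(logits) for Softmax + cross-entropy
    grad = probs.T @ X / len(y)
    return W - lr * grad


rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))          # synthetic training classification feature vectors
y = rng.integers(0, 3, size=64)       # synthetic fault type labels (3 classes)
W = np.zeros((3, 8))                  # classifier weight matrix

loss_before = cross_entropy(W, X, y)
for _ in range(50):
    W = gradient_step(W, X, y)
    # In the described scheme, the weight matrix would additionally be passed through the
    # cross-domain attention transfer optimization here (omitted: formula not available).
loss_after = cross_entropy(W, X, y)
print(loss_before, loss_after)
```

In the full scheme the same loss value would also be back-propagated into the convolutional neural network model and the semantic encoder, not only into the classifier.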
In the technical scheme of the application, the operation state time sequence associated feature vector expresses the local associated semantic feature under the sample-time sequence cross dimension of the operation state data, and the error log semantic understanding feature vector expresses the text semantic coding feature of the error log text, so that after the operation state time sequence associated feature vector and the error log semantic understanding feature vector are fused based on semantic matching in a depth feature space obtained by a neural network model, the classification feature vector can have diversified feature expression.
Thus, when the classification feature vector is classified by a classifier, the difference in distribution transferability of the diversified feature distribution expression during the domain transfer of classification must be considered; for example, when the weight matrix of the classifier is adapted to a feature expression of a certain class, that feature expression will have better distribution transferability than a feature expression of another class, and vice versa. Therefore, the weight matrix of the classifier needs to be adaptively optimized for the classification feature vector, so as to improve the training effect of the classification training of the classification feature vector through the classifier, namely, to improve the classification speed and the accuracy of the obtained classification result. Thus, in each iteration of the weight matrix of the classifier, the applicant of the present application performs cross-domain attention transfer optimization of feature distribution on the weight matrix.
Specifically, in the embodiment of the present application, performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier includes: performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier by using the following transfer optimization formula to obtain an optimized weight matrix; the transfer optimization formula is as follows:
[Formula image not reproduced in this text.]
wherein the formula is defined over the following quantities: the weight matrix of the classifier and its scale (number of rows by number of columns); the respective row vectors of the weight matrix; the two-norm of a feature vector; the feature value at a given row and column of the weight matrix; a row vector obtained by arranging the sum values of the respective row vectors of the weight matrix; the transpose of a matrix; matrix multiplication; two single-layer convolution operations; and the optimized weight matrix.
Here, the feature-distribution-based cross-domain attention transfer optimization addresses the different representations that the feature distribution of the classification feature vector has in the feature space domain and in the classification target domain. Based on the cross-domain diverse feature representation of the weight matrix of the classifier relative to the classification feature vector to be classified, convolution operations apply attention to the weight matrix, which strengthens the transferability across the domain gap of the well-transferring feature distributions among the diversified feature distributions while suppressing negative transfer of the poorly transferring feature distributions. The weight matrix is thereby adaptively optimized for unsupervised domain transfer based on the distribution structure of the weight matrix itself relative to the classification feature vector to be classified, which improves the training effect when the classification feature vector is trained through the classifier.
In summary, the storage failure analysis system 100 according to the embodiment of the present application is illustrated, which comprehensively utilizes the operation state data (temperature, voltage, current and transmission rate) of the diagnosed storage module at a plurality of predetermined time points within a predetermined period of time and the error log of the diagnosed storage module, and automatically identifies and classifies the failure in the storage system in combination with deep learning and artificial intelligence technology, so as to improve the efficiency and accuracy of the storage failure analysis.
As described above, the storage failure analysis system 100 according to the embodiment of the present application may be implemented in various terminal devices, such as a server or the like for storage failure analysis. In one example, the storage failure analysis system 100 according to embodiments of the present application may be integrated into a terminal device as a software module and/or hardware module. For example, the storage failure analysis system 100 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the storage failure analysis system 100 could equally be one of a number of hardware modules of the terminal device.
Alternatively, in another example, the storage failure analysis system 100 and the terminal device may be separate devices, and the storage failure analysis system 100 may be connected to the terminal device through a wired and/or wireless network and transmit the interaction information in an agreed data format.
In one embodiment of the present application, FIG. 6 is a flow chart of a storage failure analysis method according to an embodiment of the present application. As shown in fig. 6, a storage failure analysis method according to an embodiment of the present application includes: 210, acquiring operation state data of a diagnosed storage module at a plurality of preset time points in a preset time period, wherein the operation state data comprises temperature, voltage, current and transmission rate; 220, obtaining an error log of the diagnosed storage module; 230, arranging the operation state data of the plurality of preset time points into an operation state data time sequence input matrix according to a time dimension and a sample dimension; 240, passing the operation state data time sequence input matrix through a convolutional neural network model serving as a filter to obtain an operation state time sequence associated feature vector; 250, performing word segmentation on the error log, and then obtaining an error log semantic understanding feature vector through a semantic encoder comprising a word embedding layer; 260, fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and 270, passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result is used for representing the fault type label of the diagnosed storage module.
Fig. 7 is a schematic diagram of a system architecture of a storage failure analysis method according to an embodiment of the present application. As shown in fig. 7, in the system architecture of the storage failure analysis method, first, operation state data of a diagnosed storage module at a plurality of predetermined time points within a predetermined period of time is acquired, wherein the operation state data includes temperature, voltage, current and transmission rate; then, obtaining an error log of the diagnosed storage module; then, arranging the operation state data of the plurality of preset time points into an operation state data time sequence input matrix according to a time dimension and a sample dimension; then, the operation state data time sequence input matrix passes through a convolutional neural network model serving as a filter to obtain an operation state time sequence associated feature vector; then, performing word segmentation on the error log, and then obtaining an error log semantic understanding feature vector through a semantic encoder comprising a word embedding layer; then, fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and finally, the classification feature vector is passed through a classifier to obtain a classification result, wherein the classification result is used for representing the fault type label of the diagnosed storage module.
In a specific example, in the above storage failure analysis method, passing the operation state data time sequence input matrix through the convolutional neural network model serving as a filter to obtain the operation state time sequence associated feature vector includes: each layer of the convolutional neural network model serving as the filter performs the following steps on its input data in the forward pass of that layer: performing convolution processing on the input data to obtain a convolution feature map; performing feature-matrix-based mean pooling on the convolution feature map to obtain a pooled feature map; and performing nonlinear activation on the pooled feature map to obtain an activated feature map; wherein the output of the last layer of the convolutional neural network model serving as the filter is the operation state time sequence associated feature vector, and the input of the first layer of the convolutional neural network model serving as the filter is the operation state data time sequence input matrix.
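By way of illustration only, a single layer performing convolution, mean pooling, and nonlinear activation in that order may be sketched as follows. The kernel values, the 2x2 pooling window, the choice of ReLU as the activation, and the final flattening into a vector are assumptions made for the sketch; the present application does not specify them.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution as used in CNNs (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def mean_pool(x: Matrix, size: int) -> Matrix:
    """Non-overlapping mean pooling; rows/columns that do not fill a window are dropped."""
    out_h, out_w = len(x) // size, len(x[0]) // size
    return [
        [
            sum(x[i * size + a][j * size + b] for a in range(size) for b in range(size))
            / (size * size)
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(x: Matrix) -> Matrix:
    """Element-wise ReLU, assumed here as the nonlinear activation."""
    return [[max(0.0, v) for v in row] for row in x]


def cnn_layer(x: Matrix, kernel: Matrix, pool_size: int = 2) -> Matrix:
    """One layer in the order described: convolution, mean pooling, activation."""
    return relu(mean_pool(conv2d_valid(x, kernel), pool_size))


# 5 time points x 4 quantities, with made-up values x[i][j] = i + j + 1.
x = [[float(i + j + 1) for j in range(4)] for i in range(5)]
kernel = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical fixed kernel; learned in practice
activated = cnn_layer(x, kernel)
feature_vector = [v for row in activated for v in row]  # flattening is an assumption
```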
In a specific example, in the above storage failure analysis method, performing word segmentation on the error log and then obtaining the error log semantic understanding feature vector through the semantic encoder comprising a word embedding layer includes: performing word segmentation on the error log to convert the error log into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word vector using the word embedding layer of the semantic encoder to obtain a sequence of word vectors; and performing global-based context semantic coding on the sequence of word vectors using a converter of the semantic encoder to obtain the error log semantic understanding feature vector.
In a specific example, in the above storage failure analysis method, performing global-based context semantic coding on the sequence of word vectors using a converter of the semantic encoder including a word embedding layer to obtain the error log semantic understanding feature vector includes: one-dimensional arrangement is carried out on the sequence of the word vectors to obtain global word feature vectors; calculating the product between the global word feature vector and the transpose vector of each word vector in the sequence of word vectors to obtain a plurality of self-attention association matrices; respectively carrying out standardization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices; obtaining a plurality of probability values by using a Softmax classification function through each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and weighting each word vector in the sequence of word vectors by taking each probability value in the plurality of probability values as a weight to obtain the error log semantic understanding feature vector.
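By way of illustration only, one possible reading of the five steps above may be sketched as follows. Several points are assumptions rather than details taken from the present application: the standardization is taken to be a scaling by the inverse square root of the matrix size, each standardized matrix is reduced to a single score by summing its entries, the Softmax is applied across the per-word scores, and the weighted word vectors are summed into one vector.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode(word_vectors: List[Vector]) -> Tuple[Vector, Vector]:
    # Step 1: one-dimensional arrangement (concatenation) of all word vectors.
    global_vec = [v for vec in word_vectors for v in vec]

    scores = []
    for vec in word_vectors:
        # Step 2: product of the global vector and the transposed word vector,
        # i.e. an outer product of shape len(global_vec) x len(vec).
        assoc = [[g * v for v in vec] for g in global_vec]
        # Step 3 (assumption): scale entries by 1 / sqrt(number of entries).
        scale = 1.0 / math.sqrt(len(global_vec) * len(vec))
        standardized = [[a * scale for a in row] for row in assoc]
        # Assumption: reduce each matrix to a scalar score by summing entries.
        scores.append(sum(sum(row) for row in standardized))

    # Step 4: softmax over the per-word scores gives one probability per word.
    probs = softmax(scores)

    # Step 5: weight each word vector by its probability; summing the weighted
    # vectors into a single feature vector is an assumption.
    dim = len(word_vectors[0])
    encoded = [sum(p * vec[d] for p, vec in zip(probs, word_vectors)) for d in range(dim)]
    return probs, encoded


# Made-up 2-dimensional embeddings for a three-word log fragment.
word_vectors = [[0.1, 0.2], [0.4, 0.1], [0.3, 0.3]]
probs, encoded = encode(word_vectors)
```

Under these assumptions, the word whose vector aligns most strongly with the global vector receives the largest weight in the final feature vector.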
In a specific example, in the above storage fault analysis method, fusing the operation state timing related feature vector and the error log semantic understanding feature vector to obtain a classification feature vector includes: fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector by using the following optimization formula to obtain a classification feature vector; wherein, the optimization formula is:
[Optimization formula not reproduced in this text.] Wherein the symbols of the formula denote, in the order defined: the operation state time sequence associated feature vector; the error log semantic understanding feature vector; the first norm and the second norm of a vector, respectively; the weight and bias hyperparameters, respectively; the per-position distance matrix between the operation state time sequence associated feature vector and the error log semantic understanding feature vector; a unitary matrix; multiplication by position; matrix multiplication; addition by position; subtraction by position; and the classification feature vector.
In a specific example, in the above storage failure analysis method, passing the classification feature vector through a classifier to obtain a classification result, wherein the classification result represents the fault type label of the diagnosed storage module, includes: performing full-connection coding on the classification feature vector using a plurality of fully connected layers of the classifier to obtain a coded classification feature vector; and passing the coded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
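By way of illustration only, a classifier made of several fully connected layers followed by a Softmax function may be sketched as follows. The layer sizes, the weights, the ReLU between layers, and the fault type label names are hypothetical and serve only to make the sketch runnable.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weight rows, bias)


def dense(x: Vector, weights: List[Vector], bias: Vector) -> Vector:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(x: Vector, layers: Sequence[Layer], labels: Sequence[str]) -> Tuple[str, Vector]:
    hidden = x
    for index, (weights, bias) in enumerate(layers):
        hidden = dense(hidden, weights, bias)
        if index < len(layers) - 1:
            # Assumption: ReLU between fully connected layers.
            hidden = [max(0.0, h) for h in hidden]
    probs = softmax(hidden)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical fault type labels and hand-picked weights.
labels = ["overheating", "power_fault", "link_degradation"]
layers = [
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0]),
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]], [0.0, 0.0, 0.0]),
]
label, probs = classify([1.0, 2.0], layers, labels)
```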
In a specific example, in the storage failure analysis method, the storage failure analysis method further includes: a training phase for training the convolutional neural network model as a filter, the semantic encoder comprising a word embedding layer, and the classifier; wherein the training phase comprises: acquiring training running state data of a plurality of preset time points of a training diagnosed storage module in a preset time period; acquiring a training error log of the training diagnosed storage module; arranging the training operation state data of the plurality of preset time points into a training operation state data time sequence input matrix according to a time dimension and a sample dimension; the training operation state data time sequence input matrix passes through the convolutional neural network model serving as a filter to obtain a training operation state time sequence associated feature vector; after word segmentation is carried out on the training error log, a semantic encoder containing a word embedding layer is used for obtaining a training error log semantic understanding feature vector; fusing the training operation state time sequence associated feature vector and the training error log semantic understanding feature vector to obtain a training classification feature vector; passing the training classification feature vector through the classifier to obtain a classification loss function value; and training the convolutional neural network model as a filter, the semantic encoder comprising a word embedding layer, and the classifier based on the classification loss function value and with back propagation of gradient descent, wherein a cross-domain attention transfer optimization of feature distribution is performed on a weight matrix of the classifier in each iteration of the training.
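By way of illustration only, the computation of a classification loss function value and one gradient descent update may be sketched as follows. The sketch assumes a cross-entropy loss and, for brevity, updates only a single Softmax layer, whereas the training phase described above also back-propagates through the convolutional neural network model and the semantic encoder. The cross-domain attention transfer optimization of the weight matrix is omitted, the learning rate is hypothetical, and the feature vector is made up.

```python
import math
from typing import List

Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_step(weights: List[Vector], x: Vector, target: int, lr: float) -> float:
    """One gradient descent step on a softmax layer; returns the loss before the update."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    probs = softmax(logits)
    loss = -math.log(probs[target])  # cross-entropy (assumed loss)
    for cls, row in enumerate(weights):
        # Gradient of cross-entropy with respect to each logit: p - one_hot.
        grad_logit = probs[cls] - (1.0 if cls == target else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * x[j]
    return loss


# Made-up training classification feature vector with true class index 1.
weights = [[0.0, 0.0, 0.0] for _ in range(3)]
x = [0.5, -1.0, 2.0]
losses = [train_step(weights, x, target=1, lr=0.1) for _ in range(20)]
```

With all-zero initial weights the three classes start equally likely, so the first loss value equals the natural logarithm of 3, and repeated updates on the same sample reduce it.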
In a specific example, in the storage failure analysis method, performing cross-domain attention transfer optimization of feature distribution on a weight matrix of the classifier includes: performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier by using the following transfer optimization formula to obtain an optimized weight matrix; the transfer optimization formula is as follows:
[Transfer optimization formula not reproduced in this text.] Wherein the symbols of the formula denote, in the order defined: the weight matrix; the scale of the weight matrix; the respective row vectors of the weight matrix; the two-norm of a feature vector; the feature value at a given row and column of the weight matrix; a row vector obtained by arranging the sum values of the respective row vectors of the weight matrix; the transpose of a matrix; matrix multiplication; two operators that each represent a single-layer convolution operation; and the optimized weight matrix.
It will be appreciated by those skilled in the art that the specific operation of the respective steps in the above-described storage failure analysis method has been described in detail in the above description of the storage failure analysis system with reference to fig. 1 to 5, and thus, repetitive descriptions thereof will be omitted.
The present application also provides a computer program product comprising instructions which, when executed, cause an apparatus to perform operations corresponding to the above-described method.
In one embodiment of the present application, there is also provided a computer-readable storage medium storing a computer program for executing the above-described method.
It should be appreciated that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Methods, systems, and computer program products of embodiments of the present application are described in the flow diagrams and/or block diagrams. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The basic principles of the present application have been described above in connection with specific embodiments, however, it should be noted that the advantages, benefits, effects, etc. mentioned in the present application are merely examples and not intended to be limiting, and these advantages, benefits, effects, etc. are not to be considered as essential to the various embodiments of the present application. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, as the application is not necessarily limited to practice with the above described specific details.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in the present application are only illustrative examples and are not intended to require or imply that the connections, arrangements, or configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or," unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
It is also noted that in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered as equivalent aspects of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal device that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or terminal device comprising the element.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (7)

1. A storage failure analysis system, comprising:
the operation state monitoring module is used for acquiring operation state data of the diagnosed storage module at a plurality of preset time points in a preset time period, wherein the operation state data comprise temperature, voltage, current and transmission rate;
the log acquisition module is used for acquiring the error log of the diagnosed storage module;
the operation data arrangement module is used for arranging the operation state data of the plurality of preset time points into an operation state data time sequence input matrix according to a time dimension and a sample dimension;
the operation time sequence feature extraction module is used for enabling the operation state data time sequence input matrix to pass through a convolutional neural network model serving as a filter so as to obtain an operation state time sequence associated feature vector;
the log semantic understanding module is used for obtaining an error log semantic understanding feature vector through a semantic encoder comprising a word embedding layer after word segmentation processing is carried out on the error log;
the packaging semantic matching fusion module is used for fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and
The fault type dividing module is used for passing the classification feature vector through a classifier to obtain a classification result, and the classification result is used for representing a fault type label of the diagnosed storage module;
the storage failure analysis system further comprises a training module for training the convolutional neural network model serving as a filter, the semantic encoder comprising a word embedding layer and the classifier;
wherein the training module includes:
the training running state monitoring unit is used for acquiring training running state data of the training diagnosed storage module at a plurality of preset time points in a preset time period;
the training log obtaining unit is used for obtaining the training error log of the training diagnosed storage module;
the training operation data arrangement unit is used for arranging the training operation state data of the plurality of preset time points into a training operation state data time sequence input matrix according to the time dimension and the sample dimension;
the training operation time sequence feature extraction unit is used for enabling the training operation state data time sequence input matrix to pass through the convolutional neural network model serving as a filter so as to obtain training operation state time sequence associated feature vectors;
The training log semantic understanding unit is used for obtaining training error log semantic understanding feature vectors through the semantic encoder comprising the word embedding layer after word segmentation processing is carried out on the training error log;
the training fusion unit is used for fusing the training operation state time sequence associated feature vector and the training error log semantic understanding feature vector to obtain a training classification feature vector;
the classification loss unit is used for passing the training classification feature vector through the classifier to obtain a classification loss function value; and
a training unit, configured to train the convolutional neural network model as a filter, the semantic encoder including a word embedding layer, and the classifier based on the classification loss function value and with back propagation of gradient descent, where in each iteration of the training, a cross-domain attention transfer optimization of feature distribution is performed on a weight matrix of the classifier;
the cross-domain attention transfer optimization of feature distribution is carried out on the weight matrix of the classifier, and the cross-domain attention transfer optimization comprises the following steps: performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier by using the following transfer optimization formula to obtain an optimized weight matrix;
The transfer optimization formula is as follows:
[Transfer optimization formula not reproduced in this text.] Wherein the symbols of the formula denote, in the order defined: the weight matrix; the scale of the weight matrix; the respective row vectors of the weight matrix; the two-norm of a feature vector; the feature value at a given row and column of the weight matrix; a row vector obtained by arranging the sum values of the respective row vectors of the weight matrix; the transpose of a matrix; matrix multiplication; two operators that each represent a single-layer convolution operation; and the optimized weight matrix;
the package semantic matching fusion module is used for: fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector by using the following optimization formula to obtain a classification feature vector;
wherein, the optimization formula is:
[Optimization formula not reproduced in this text.] Wherein the symbols of the formula denote, in the order defined: the operation state time sequence associated feature vector; the error log semantic understanding feature vector; the first norm and the second norm of a vector, respectively; the weight and bias hyperparameters, respectively; the per-position distance matrix between the operation state time sequence associated feature vector and the error log semantic understanding feature vector; a unitary matrix; multiplication by position; matrix multiplication; addition by position; subtraction by position; and the classification feature vector.
2. The storage failure analysis system of claim 1, wherein the operation timing feature extraction module is configured to: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer:
carrying out convolution processing on the input data to obtain a convolution characteristic diagram;
carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; and
non-linear activation is carried out on the pooled feature map so as to obtain an activated feature map;
the output of the last layer of the convolutional neural network model serving as the filter is the operation state time sequence associated feature vector, and the input of the first layer of the convolutional neural network model serving as the filter is the operation state data time sequence input matrix.
3. The storage failure analysis system of claim 2, wherein the log semantic understanding module comprises:
the word segmentation unit is used for carrying out word segmentation processing on the error log so as to convert the error log into a word sequence consisting of a plurality of words;
An embedded encoding unit, configured to map each word in the word sequence to a word vector using a word embedding layer of the semantic encoder including the word embedding layer to obtain a sequence of word vectors; and
the context coding unit is used for carrying out global context semantic coding on the sequence of the word vectors by using the converter of the semantic encoder comprising the word embedding layer so as to obtain the error log semantic understanding feature vector.
4. The storage failure analysis system of claim 3, wherein the context encoding unit comprises:
the vector construction subunit is used for carrying out one-dimensional arrangement on the sequence of the word vectors to obtain global word feature vectors;
a self-attention subunit, configured to calculate a product between the global word feature vector and a transpose vector of each word vector in the sequence of word vectors to obtain a plurality of self-attention association matrices;
the normalization subunit is used for respectively performing normalization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of normalized self-attention correlation matrices;
the attention calculating subunit is used for obtaining a plurality of probability values through a Softmax classification function by each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and
the attention applying subunit is used for weighting each word vector in the sequence of word vectors by taking each probability value in the plurality of probability values as a weight so as to obtain the error log semantic understanding feature vector.
5. The storage failure analysis system of claim 4, wherein the failure type classification module comprises:
the full-connection coding unit is used for carrying out full-connection coding on the classification characteristic vectors by using a plurality of full-connection layers of the classifier so as to obtain coded classification characteristic vectors; and
the classification unit is used for passing the coded classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
6. A storage failure analysis method, comprising:
acquiring operation state data of a diagnosed storage module at a plurality of preset time points in a preset time period, wherein the operation state data comprise temperature, voltage, current and transmission rate;
obtaining an error log of the diagnosed storage module;
arranging the running state data of the plurality of preset time points into a running state data time sequence input matrix according to a time dimension and a sample dimension;
The operation state data time sequence input matrix is processed through a convolutional neural network model serving as a filter to obtain operation state time sequence associated feature vectors;
after word segmentation is carried out on the error log, a semantic encoder comprising a word embedding layer is used for obtaining an error log semantic understanding feature vector;
fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector; and
the classification feature vector passes through a classifier to obtain a classification result, and the classification result is used for representing a fault type label of a diagnosed storage module;
the storage failure analysis method further comprises a training phase for training the convolutional neural network model serving as a filter, the semantic encoder comprising a word embedding layer and the classifier;
wherein the training phase comprises:
acquiring training running state data of a plurality of preset time points of a training diagnosed storage module in a preset time period;
acquiring a training error log of the training diagnosed storage module;
arranging the training operation state data of the plurality of preset time points into a training operation state data time sequence input matrix according to a time dimension and a sample dimension;
The training operation state data time sequence input matrix passes through the convolutional neural network model serving as a filter to obtain a training operation state time sequence associated feature vector;
after word segmentation is carried out on the training error log, a semantic encoder containing a word embedding layer is used for obtaining a training error log semantic understanding feature vector;
fusing the training operation state time sequence associated feature vector and the training error log semantic understanding feature vector to obtain a training classification feature vector;
passing the training classification feature vector through the classifier to obtain a classification loss function value; and
training the convolutional neural network model as a filter, the semantic encoder comprising a word embedding layer and the classifier based on the classification loss function value and with back propagation of gradient descent, wherein in each iteration of the training, a cross-domain attention transfer optimization of feature distribution is performed on a weight matrix of the classifier;
the cross-domain attention transfer optimization of feature distribution is carried out on the weight matrix of the classifier, and the cross-domain attention transfer optimization comprises the following steps: performing cross-domain attention transfer optimization of feature distribution on the weight matrix of the classifier by using the following transfer optimization formula to obtain an optimized weight matrix;
The transfer optimization formula is as follows:
[Transfer optimization formula not reproduced in this text.] Wherein the symbols of the formula denote, in the order defined: the weight matrix; the scale of the weight matrix; the respective row vectors of the weight matrix; the two-norm of a feature vector; the feature value at a given row and column of the weight matrix; a row vector obtained by arranging the sum values of the respective row vectors of the weight matrix; the transpose of a matrix; matrix multiplication; two operators that each represent a single-layer convolution operation; and the optimized weight matrix;
the step of fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector to obtain a classification feature vector comprises the following steps: fusing the operation state time sequence associated feature vector and the error log semantic understanding feature vector by using the following optimization formula to obtain a classification feature vector;
wherein, the optimization formula is:
[Optimization formula not reproduced in this text.] Wherein the symbols of the formula denote, in the order defined: the operation state time sequence associated feature vector; the error log semantic understanding feature vector; the first norm and the second norm of a vector, respectively; the weight and bias hyperparameters, respectively; the per-position distance matrix between the operation state time sequence associated feature vector and the error log semantic understanding feature vector; a unitary matrix; multiplication by position; matrix multiplication; addition by position; subtraction by position; and the classification feature vector.
7. The storage failure analysis method of claim 6, wherein passing the operational state data timing input matrix through a convolutional neural network model as a filter to obtain an operational state timing correlation eigenvector, comprising: each layer of the convolutional neural network model used as the filter performs the following steps on input data in forward transfer of the layer:
carrying out convolution processing on the input data to obtain a convolution characteristic diagram;
carrying out mean pooling treatment based on a feature matrix on the convolution feature map to obtain a pooled feature map; and
non-linear activation is carried out on the pooled feature map so as to obtain an activated feature map;
the output of the last layer of the convolutional neural network model serving as the filter is the operation state time sequence associated feature vector, and the input of the first layer of the convolutional neural network model serving as the filter is the operation state data time sequence input matrix.
CN202310906536.9A 2023-07-24 2023-07-24 Storage fault analysis system and method thereof Active CN116627708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310906536.9A CN116627708B (en) 2023-07-24 2023-07-24 Storage fault analysis system and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310906536.9A CN116627708B (en) 2023-07-24 2023-07-24 Storage fault analysis system and method thereof

Publications (2)

Publication Number Publication Date
CN116627708A CN116627708A (en) 2023-08-22
CN116627708B true CN116627708B (en) 2023-09-19

Family

ID=87617438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310906536.9A Active CN116627708B (en) 2023-07-24 2023-07-24 Storage fault analysis system and method thereof

Country Status (1)

Country Link
CN (1) CN116627708B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116777892B (en) * 2023-07-03 2024-01-26 东莞市震坤行胶粘剂有限公司 Method and system for detecting dispensing quality based on visual detection
CN116778430B (en) * 2023-08-24 2023-11-28 吉林省牛人网络科技股份有限公司 Disease monitoring system and method for beef cattle cultivation
CN117034093A (en) * 2023-10-10 2023-11-10 尚宁智感(北京)科技有限公司 Intrusion signal identification method based on optical fiber system
CN117076915B (en) * 2023-10-17 2024-01-09 中海油能源发展股份有限公司采油服务分公司 Intelligent fault attribution analysis method and system for FPSO crude oil process system
CN117093996B (en) * 2023-10-18 2024-02-06 湖南惟储信息技术有限公司 Safety protection method and system for embedded operating system
CN117520993B (en) * 2023-12-29 2024-03-08 杭州杭叉桥箱有限公司 Electric drive axle test system and method thereof
CN117494056B (en) * 2023-12-29 2024-03-22 长春黄金设计院有限公司 Equipment fault early warning system and method based on big data technology

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107741568A (en) * 2017-11-08 2018-02-27 中南大学 Lithium battery SOC estimation method based on a state-transition-optimized RBF neural network
CN110163414A (en) * 2019-04-18 2019-08-23 中南大学 Decomposition-based multi-objective state transition optimization method and system
US10459962B1 (en) * 2018-09-19 2019-10-29 Servicenow, Inc. Selectively generating word vector and paragraph vector representations of fields for machine learning
WO2020250247A2 (en) * 2019-06-13 2020-12-17 Tata Consultancy Services Limited Method and system for industrial anomaly detection
CN112668307A (en) * 2020-12-30 2021-04-16 清华大学 Automatic bilingual sentence alignment method and device
EP3879429A2 (en) * 2020-06-16 2021-09-15 Baidu USA LLC Cross-lingual unsupervised classification with multi-view transfer learning
EP3940608A1 (en) * 2020-07-15 2022-01-19 Accenture Global Solutions Limited Utilizing machine learning models with a centralized repository of log data to predict events and generate alerts and recommendations
CN114580383A (en) * 2022-03-03 2022-06-03 中国工商银行股份有限公司 Log analysis model training method and device, electronic equipment and storage medium

Family Cites Families (11)

Publication number Priority date Publication date Assignee Title
US7082394B2 (en) * 2002-06-25 2006-07-25 Microsoft Corporation Noise-robust feature extraction using multi-layer principal component analysis
US10348828B2 (en) * 2016-06-20 2019-07-09 Cisco Technology, Inc. Method and apparatus for optimizing data transfers utilizing machine learning
US10761952B2 (en) * 2018-04-13 2020-09-01 International Business Machines Corporation Intelligent failover migration across multiple high availability stacks based on quality of prior failover migrations
US10990758B2 (en) * 2018-05-04 2021-04-27 Dell Products L.P. Linguistic semantic analysis monitoring/alert integration system
US11741560B2 (en) * 2019-09-09 2023-08-29 Deckard Technologies, Inc. Detecting and validating improper homeowner exemptions through data mining, natural language processing, and machine learning
US11113048B1 (en) * 2020-02-26 2021-09-07 Accenture Global Solutions Limited Utilizing artificial intelligence and machine learning models to reverse engineer an application from application artifacts
US11900071B2 (en) * 2020-05-29 2024-02-13 Fmr Llc Generating customized digital documents using artificial intelligence
US11360843B2 (en) * 2020-07-10 2022-06-14 Hitachi, Ltd. System and signal processing method for failure mode identification
US20220101115A1 (en) * 2020-09-28 2022-03-31 International Business Machines Corporation Automatically converting error logs having different format types into a standardized and labeled format having relevant natural language information
US20220179764A1 (en) * 2020-12-03 2022-06-09 International Business Machines Corporation Multi-source data correlation extraction for anomaly detection
CN115129679A (en) * 2021-03-29 2022-09-30 戴尔产品有限公司 Service request remediation through machine-learning based identification of critical areas of log files

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
CN107741568A (en) * 2017-11-08 2018-02-27 中南大学 Lithium battery SOC estimation method based on a state-transition-optimized RBF neural network
US10459962B1 (en) * 2018-09-19 2019-10-29 Servicenow, Inc. Selectively generating word vector and paragraph vector representations of fields for machine learning
CN110163414A (en) * 2019-04-18 2019-08-23 中南大学 Decomposition-based multi-objective state transition optimization method and system
WO2020250247A2 (en) * 2019-06-13 2020-12-17 Tata Consultancy Services Limited Method and system for industrial anomaly detection
EP3879429A2 (en) * 2020-06-16 2021-09-15 Baidu USA LLC Cross-lingual unsupervised classification with multi-view transfer learning
EP3940608A1 (en) * 2020-07-15 2022-01-19 Accenture Global Solutions Limited Utilizing machine learning models with a centralized repository of log data to predict events and generate alerts and recommendations
CN112668307A (en) * 2020-12-30 2021-04-16 清华大学 Automatic bilingual sentence alignment method and device
CN114580383A (en) * 2022-03-03 2022-06-03 中国工商银行股份有限公司 Log analysis model training method and device, electronic equipment and storage medium

Non-Patent Citations (1)

Title
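Lithium battery SOC estimation method using an RBFNN optimized by dual state transition; 阳春华; 李学鹏; 陈宁; 周晓君; 控制工程 (Control Engineering of China), No. 12; full text *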

Also Published As

Publication number Publication date
CN116627708A (en) 2023-08-22

Similar Documents

Publication Publication Date Title
CN116627708B (en) Storage fault analysis system and method thereof
CN116245513A (en) Automatic operation and maintenance system and method based on rule base
CN113094200A (en) Application program fault prediction method and device
CN116405326B (en) Information security management method and system based on block chain
CN115951883B (en) Service component management system of distributed micro-service architecture and method thereof
CN115859437A (en) Jacket underwater stress detection system based on distributed optical fiber sensing system
CN116679890B (en) Storage device security management system and method thereof
CN114615019A (en) Anomaly detection method and system based on micro-service topological relation generation
CN112257263A (en) Equipment residual life prediction system based on self-attention mechanism
CN115344414A (en) Log anomaly detection method and system based on LSTM-Transformer
CN116663540A (en) Financial event extraction method based on small sample
CN114416479A (en) Log sequence anomaly detection method based on out-of-stream regularization
CN112394973B (en) Multi-language code plagiarism detection method based on pseudo-twin network
CN117076931B (en) Time sequence data prediction method and system based on conditional diffusion model
CN117231590A (en) Fault prediction system and method for hydraulic system
CN116663499A (en) Intelligent industrial data processing method and system
CN115859989A (en) Entity identification method and system based on remote supervision
CN115982037A (en) Software defect prediction method based on abstract syntax tree
CN115757062A (en) Log anomaly detection method based on sentence embedding and Transformer-XL
CN115587007A (en) RoBERTa-based weblog security detection method and system
CN111737107B (en) Repeated defect report detection method based on heterogeneous information network
CN110650130B (en) Industrial control intrusion detection method based on multi-classification GoogLeNet-LSTM model
CN109886119B (en) Industrial control signal-based control function classification method and system
Tao et al. Biglog: Unsupervised large-scale pre-training for a unified log representation
CN115238805B (en) Training method of abnormal data recognition model and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant