CN116149895A - Big data cluster performance prediction method and device and computer equipment - Google Patents

Big data cluster performance prediction method and device and computer equipment

Info

Publication number
CN116149895A
Authority
CN
China
Prior art keywords
target
data
matrix
cluster
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310198721.7A
Other languages
Chinese (zh)
Inventor
杨济银
沈贇
黄萌
阳万里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310198721.7A
Publication of CN116149895A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G06F11/0793 Remedial or corrective actions
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to the field of big data technologies, and in particular to a method, an apparatus, and a computer device for predicting the performance of a big data cluster. The method includes: acquiring attribute data of a target cluster and behavior data of a target application at each target moment; constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target moment; determining a performance prediction matrix of the target cluster based on the input matrix; performing abnormality judgment on the performance prediction matrix according to a preset threshold matrix to determine a judgment result, wherein the judgment result indicates whether the attribute data of the target cluster is abnormal in the N future time units; and, when the judgment result indicates that the attribute data of the target cluster is abnormal in the N future time units, processing the performance prediction matrix based on a preset repair strategy to obtain a simulated repair result for the target cluster. The method and apparatus enable intelligent monitoring of a big data cluster with a self-repair attempt mechanism.

Description

Big data cluster performance prediction method and device and computer equipment
Technical Field
The present disclosure relates to the field of big data technologies, and in particular, to a method, an apparatus, and a computer device for predicting performance of a big data cluster.
Background
With the development of the big data field, traditional operation and maintenance faces problems such as larger cluster scale and more complex monitoring, visualization, and intelligence requirements, and intelligent operation and maintenance has emerged in response.
In the field of intelligent operation and maintenance, intelligent monitoring modeling of big data clusters based on multi-angle and multi-dimensional monitoring indexes is a very important fundamental problem. The conventional technology generally adopts a traditional time series model or a neural network model to establish a big data cluster model. However, this intelligent monitoring modeling approach lacks a reliable repair mechanism: it cannot realize reliable self-repair by predicting abnormalities, simulating emergency repairs, and evaluating the simulated emergency effect. Abnormal conditions can only be handled through manual intervention by operation and maintenance personnel, which is inefficient.
It can be seen that existing modeling schemes for intelligent monitoring of big data clusters do not have a self-repair attempt mechanism.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a big data cluster performance prediction method, apparatus, computer device, computer-readable storage medium, and computer program product to implement intelligent monitoring of a big data cluster with a self-repair attempt mechanism.
In a first aspect, the present application provides a method for predicting performance of a big data cluster, where the method includes:
acquiring attribute data of a target cluster and behavior data of a target application at each target moment, wherein the target application runs on the target cluster;
constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target time;
determining a performance prediction matrix of the target cluster based on the input matrix, wherein the performance prediction matrix comprises attribute data of the target cluster and behavior data of the target application under N time units in the future, and N is a positive integer;
performing abnormality judgment on the performance prediction matrix according to a preset threshold matrix, and determining a judgment result, wherein the judgment result is used for representing whether attribute data of the target cluster are abnormal or not under N time units in the future;
and under the condition that the judging result represents that the attribute data of the target cluster is abnormal in N time units in the future, processing the performance prediction matrix based on a preset repair strategy to obtain a simulation repair result aiming at the target cluster.
In one embodiment, the constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target time includes:
for any target moment, constructing a data vector corresponding to the target moment according to the target moment, the attribute data of the target cluster at the target moment, and the behavior data of the target application;
and arranging the K rows of data vectors corresponding to the target moments in chronological order, and concatenating them with N rows of zero vectors to obtain the input matrix, wherein K is a positive integer.
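The construction described in this embodiment can be illustrated with a minimal sketch. It assumes that each data vector has the layout [time, attributes..., behaviors...]; the function name `build_input_matrix` and the sample values are hypothetical and do not come from the application.

```python
from typing import List, Sequence


def build_input_matrix(
    timestamps: Sequence[float],
    attribute_rows: Sequence[Sequence[float]],
    behavior_rows: Sequence[Sequence[float]],
    n_future: int,
) -> List[List[float]]:
    """Stack K chronologically ordered data vectors on top of N zero rows."""
    if not timestamps:
        raise ValueError("at least one target moment is required")
    if not (len(timestamps) == len(attribute_rows) == len(behavior_rows)):
        raise ValueError("each target moment needs a time, attributes and behaviors")
    if n_future <= 0:
        raise ValueError("N must be a positive integer")

    # One data vector per target moment: [time, attributes..., behaviors...].
    vectors = [
        [float(t), *map(float, attrs), *map(float, behs)]
        for t, attrs, behs in zip(timestamps, attribute_rows, behavior_rows)
    ]
    vectors.sort(key=lambda row: row[0])  # arrange the K rows in time order

    # N zero rows act as placeholders for the future time units.
    width = len(vectors[0])
    zero_rows = [[0.0] * width for _ in range(n_future)]
    return vectors + zero_rows


# Hypothetical data: K = 3 target moments, 2 attributes, 1 behavior, N = 2.
matrix = build_input_matrix(
    timestamps=[2, 1, 3],
    attribute_rows=[[0.6, 0.4], [0.5, 0.3], [0.7, 0.5]],
    behavior_rows=[[120], [100], [150]],
    n_future=2,
)
```

The result has K + N rows; the zero rows mark the positions that the prediction model is expected to fill.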
In one embodiment, the determining the performance prediction matrix of the target cluster based on the input matrix includes:
processing the input matrix by using a target Informer model to obtain the performance prediction matrix of the target cluster.
In one embodiment, the performing anomaly judgment on the performance prediction matrix according to a preset threshold matrix, and determining the judgment result includes:
performing a position-by-position (element-wise) scalar calculation on the preset threshold matrix and the performance prediction matrix to determine a state vector;
and, when the norm (modular length) of the state vector is greater than zero, obtaining a judgment result indicating that the attribute data of the target cluster is abnormal in the N future time units.
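One possible reading of this judgment is sketched below. The exact element-wise calculation is not specified in this passage, so the sketch assumes that each entry of the state vector counts how many predicted values in one future time unit exceed their thresholds, and that both matrices contain only the metric columns (no time column). The function name `judge_anomaly` and all values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def judge_anomaly(
    prediction: Sequence[Sequence[float]],
    threshold: Sequence[Sequence[float]],
) -> Tuple[List[int], bool]:
    """Return the state vector and whether its norm is greater than zero."""
    if len(prediction) != len(threshold):
        raise ValueError("both matrices must have N rows")

    state: List[int] = []
    for pred_row, thr_row in zip(prediction, threshold):
        if len(pred_row) != len(thr_row):
            raise ValueError("rows must describe the same metrics")
        # Assumed rule: a value is abnormal when it exceeds its threshold.
        state.append(sum(1 for p, t in zip(pred_row, thr_row) if p > t))

    norm = math.sqrt(sum(s * s for s in state))
    return state, norm > 0


# Hypothetical metrics per row: [disk usage ratio, SQL access volume].
thresholds = [[0.9, 500.0], [0.9, 500.0]]
state, abnormal = judge_anomaly([[0.5, 100.0], [0.95, 120.0]], thresholds)
healthy_state, healthy_abnormal = judge_anomaly([[0.5, 100.0], [0.6, 120.0]], thresholds)
```

Under this assumption the norm is zero only when no predicted value exceeds its threshold, so a single check covers all N future time units.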
In one embodiment, the performance prediction matrix includes N rows of prediction vectors, where the prediction vectors include attribute data of the target cluster and behavior data of the target application corresponding to any prediction time under N future time units, and the processing the performance prediction matrix based on a preset repair policy to obtain a simulated repair result for the target cluster includes:
changing any numerical value in a prediction vector of the performance prediction matrix based on a preset repair strategy to obtain a simulation prediction matrix;
selecting the first a rows of prediction vectors from the simulation prediction matrix, and concatenating them with the last K-a rows of data vectors and the N rows of zero vectors of the input matrix to obtain a simulation input matrix, wherein a is a positive integer;
and processing the simulation input matrix by using the target Informer model to obtain a simulated repair result for the target cluster.
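The assembly of the simulation input matrix can be sketched as follows. This passage does not state the row order of the concatenation, so the sketch assumes chronological order: the retained K-a history rows come first, then the a simulated prediction rows, then the N zero rows. The helper names `apply_repair` and `build_simulation_input` are hypothetical, and the "repair" shown (overwriting one predicted value) is only a stand-in for a preset repair strategy.

```python
from typing import List, Sequence


def apply_repair(
    prediction: Sequence[Sequence[float]], row: int, col: int, new_value: float
) -> List[List[float]]:
    """Return a copy of the prediction matrix with one value changed."""
    simulated = [list(r) for r in prediction]
    simulated[row][col] = new_value
    return simulated


def build_simulation_input(
    input_matrix: Sequence[Sequence[float]],
    simulated_prediction: Sequence[Sequence[float]],
    k: int,
    n: int,
    a: int,
) -> List[List[float]]:
    """Combine the last K-a history rows, the first a simulated rows and N zero rows."""
    if len(input_matrix) != k + n:
        raise ValueError("input matrix must have K + N rows")
    if not 0 < a <= min(k, len(simulated_prediction)):
        raise ValueError("a must be a positive integer no larger than K and N")

    history = input_matrix[:k]             # the K real data vectors
    kept_history = [list(r) for r in history[a:]]              # last K - a rows
    leading = [list(r) for r in simulated_prediction[:a]]      # first a rows
    width = len(input_matrix[0])
    zero_rows = [[0.0] * width for _ in range(n)]
    # Assumed chronological order: history, simulated predictions, placeholders.
    return kept_history + leading + zero_rows


# Hypothetical example with K = 3, N = 2, a = 1 and rows of [time, metric].
input_matrix = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [0.0, 0.0], [0.0, 0.0]]
prediction = [[4.0, 99.0], [5.0, 50.0]]
repaired = apply_repair(prediction, row=0, col=1, new_value=40.0)
simulation_input = build_simulation_input(input_matrix, repaired, k=3, n=2, a=1)
```

The simulation input keeps the same K + N shape as the original input matrix, so it can be passed to the same prediction model.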
In one embodiment, the method further comprises:
acquiring sample attribute data of the target cluster and sample behavior data of the target application at each sample moment, wherein the last N sample moments are taken as labeling moments, and the remaining sample moments are taken as training moments;
according to sample attribute data of the target cluster at each training moment and sample behavior data of the target application, a sample input matrix is constructed, wherein the sample input matrix comprises a plurality of rows of sample data vectors which are arranged in time sequence, and the sample data vectors comprise the training moment, the sample attribute data of the target cluster at the training moment and the sample behavior data of the target application;
and training an initial Informer model based on the sample input matrix to obtain the target Informer model.
In one embodiment, the training the initial Informer model based on the sample input matrix to obtain the target Informer model includes:
taking each row of the sample data vectors in the sample input matrix as an encoder input sequence;
concatenating the last K rows of sample data vectors of the sample input matrix with N rows of zero vectors to obtain a decoder input sequence;
and training the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain the target Informer model.
In one embodiment, the training the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain the target Informer model includes:
inputting the encoder input sequence into an encoder of the initial Informer model to obtain hidden layer features;
inputting the decoder input sequence and the hidden layer features into a decoder of the initial Informer model to obtain prediction data corresponding to each labeling moment;
and training the initial Informer model according to the difference between, on the one hand, the sample attribute data of the target cluster and the sample behavior data of the target application at each labeling moment and, on the other hand, the prediction data corresponding to each labeling moment, to obtain the target Informer model.
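The preparation of the training sequences can be sketched as follows, assuming that the sample rows are already in chronological order and share the layout [time, metrics...]. The function names `split_samples` and `make_encoder_decoder_inputs` are hypothetical; the model itself is not shown.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def split_samples(rows: Sequence[Sequence[float]], n: int) -> Tuple[Matrix, Matrix]:
    """Use the last N sample moments as labels and the rest for training."""
    if not 0 < n < len(rows):
        raise ValueError("N must be positive and smaller than the sample count")
    training = [list(r) for r in rows[:-n]]
    labels = [list(r) for r in rows[-n:]]
    return training, labels


def make_encoder_decoder_inputs(
    sample_input: Sequence[Sequence[float]], k: int, n: int
) -> Tuple[Matrix, Matrix]:
    """Encoder sees every training row; decoder sees the last K rows plus N zero rows."""
    if not 0 < k <= len(sample_input):
        raise ValueError("K must be positive and no larger than the row count")
    width = len(sample_input[0])
    encoder_input = [list(r) for r in sample_input]
    decoder_input = [list(r) for r in sample_input[-k:]]
    decoder_input += [[0.0] * width for _ in range(n)]
    return encoder_input, decoder_input


# Hypothetical data: 8 sample moments, rows of [time, metric], N = 2, K = 3.
samples = [[float(t), float(t * 10)] for t in range(1, 9)]
training_rows, label_rows = split_samples(samples, n=2)
encoder_input, decoder_input = make_encoder_decoder_inputs(training_rows, k=3, n=2)
```

The N zero rows at the end of the decoder input line up with the N labeling moments, whose real values serve as the training targets.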
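This passage does not say how the "difference" between labels and predictions is measured. As an illustration only, the sketch below assumes mean squared error, a common choice for forecasting models; the function name `mean_squared_error` and the values are hypothetical.

```python
from typing import Sequence


def mean_squared_error(
    labels: Sequence[Sequence[float]],
    predictions: Sequence[Sequence[float]],
) -> float:
    """Average squared difference over all labeling moments and all metrics."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must both have N rows")

    total = 0.0
    count = 0
    for label_row, pred_row in zip(labels, predictions):
        if len(label_row) != len(pred_row):
            raise ValueError("rows must describe the same metrics")
        for y, y_hat in zip(label_row, pred_row):
            total += (y - y_hat) ** 2
            count += 1
    return total / count


# Hypothetical values for N = 2 labeling moments and 2 metrics.
loss = mean_squared_error(
    labels=[[1.0, 2.0], [3.0, 4.0]],
    predictions=[[1.0, 2.0], [3.0, 6.0]],
)
```

During training, a value of this kind would be minimized by adjusting the model parameters; other difference measures could be substituted without changing the overall procedure.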
In a second aspect, the present application further provides a big data cluster performance prediction apparatus, the apparatus including:
the data acquisition module is used for acquiring attribute data of a target cluster and behavior data of a target application at each target moment, wherein the target application runs on the target cluster;
the matrix construction module is used for constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target time;
the prediction module is used for determining a performance prediction matrix of the target cluster based on the input matrix, wherein the performance prediction matrix comprises attribute data of the target cluster and behavior data of the target application under N time units in the future, and N is a positive integer;
the judging module is used for carrying out abnormal judgment on the performance prediction matrix according to a preset threshold matrix, and determining a judging result, wherein the judging result is used for representing whether the attribute data of the target cluster is abnormal or not under N time units in the future;
and the simulation repair module is used for processing the performance prediction matrix based on a preset repair strategy under the condition that the judging result represents that the attribute data of the target cluster is abnormal in N time units in the future, so as to obtain a simulation repair result aiming at the target cluster.
In one embodiment, the matrix construction module is further configured to construct, for any one of the target moments, a data vector corresponding to the target moment according to the target moment, the attribute data of the target cluster at the target moment, and the behavior data of the target application; and to arrange the K rows of data vectors corresponding to the target moments in chronological order and concatenate them with N rows of zero vectors to obtain the input matrix, wherein K is a positive integer.
In one embodiment, the prediction module is further configured to process the input matrix by using a target Informer model to obtain the performance prediction matrix of the target cluster.
In one embodiment, the judging module is further configured to perform a position-by-position (element-wise) scalar calculation on the preset threshold matrix and the performance prediction matrix to determine a state vector; and, when the norm (modular length) of the state vector is greater than zero, to obtain a judgment result indicating that the attribute data of the target cluster is abnormal in the N future time units.
In one embodiment, the performance prediction matrix includes N rows of prediction vectors, where each prediction vector includes the attribute data of the target cluster and the behavior data of the target application corresponding to one prediction moment within the N future time units. The simulation repair module is further configured to change any numerical value in a prediction vector of the performance prediction matrix based on a preset repair strategy to obtain a simulation prediction matrix; to select the first a rows of prediction vectors from the simulation prediction matrix and concatenate them with the last K-a rows of data vectors and the N rows of zero vectors of the input matrix to obtain a simulation input matrix, wherein a is a positive integer; and to process the simulation input matrix by using the target Informer model to obtain a simulated repair result for the target cluster.
In one embodiment, the big data cluster performance prediction apparatus further includes a training module, where the training module is configured to acquire sample attribute data of the target cluster and sample behavior data of the target application at each sample moment, wherein the last N sample moments are taken as labeling moments and the remaining sample moments are taken as training moments; to construct a sample input matrix according to the sample attribute data of the target cluster and the sample behavior data of the target application at each training moment, wherein the sample input matrix comprises a plurality of rows of sample data vectors arranged in chronological order, and each sample data vector comprises a training moment, the sample attribute data of the target cluster at that training moment, and the sample behavior data of the target application; and to train an initial Informer model based on the sample input matrix to obtain the target Informer model.
In one embodiment, the training module is further configured to use each row of the sample data vectors in the sample input matrix as an encoder input sequence; to concatenate the last K rows of sample data vectors of the sample input matrix with N rows of zero vectors to obtain a decoder input sequence; and to train the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain the target Informer model.
In one embodiment, the training module is further configured to input the encoder input sequence into an encoder of the initial Informer model to obtain hidden layer features; to input the decoder input sequence and the hidden layer features into a decoder of the initial Informer model to obtain prediction data corresponding to each labeling moment; and to train the initial Informer model according to the difference between, on the one hand, the sample attribute data of the target cluster and the sample behavior data of the target application at each labeling moment and, on the other hand, the prediction data corresponding to each labeling moment, to obtain the target Informer model.
In a third aspect, the present application further provides a computer device, where the computer device includes a memory and a processor, where the memory stores a computer program, and where the processor implements the steps of the method embodiments described above when the processor executes the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
The big data cluster performance prediction method, the big data cluster performance prediction device, the computer equipment, the computer readable storage medium and the computer program product acquire attribute data of a target cluster and behavior data of a target application at each target moment, wherein the target application runs on the target cluster; constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target time; determining a performance prediction matrix of the target cluster based on the input matrix, wherein the performance prediction matrix comprises attribute data of the target cluster and behavior data of the target application under N time units in the future, and N is a positive integer; performing abnormality judgment on the performance prediction matrix according to a preset threshold matrix, and determining a judgment result, wherein the judgment result is used for representing whether attribute data of the target cluster are abnormal or not under N time units in the future; and under the condition that the judging result represents that the attribute data of the target cluster is abnormal in N time units in the future, processing the performance prediction matrix based on a preset repair strategy to obtain a simulation repair result aiming at the target cluster. 
According to the big data cluster performance prediction method, apparatus, computer device, computer-readable storage medium, and computer program product described above, the attribute data of the target cluster and the behavior data of the target application in the N future time units are predicted to obtain a performance prediction matrix; a predefined threshold matrix is used to judge whether the future running state of the target cluster is normal; and simulated repair of the target cluster is then carried out according to the judgment result and a preset repair strategy, thereby realizing intelligent monitoring of a big data cluster with a self-repair attempt mechanism.
Drawings
FIG. 1 is a flow chart of a method for predicting performance of a big data cluster in one embodiment;
FIG. 2 is a flow chart of step 104 in one embodiment;
FIG. 3 is a flow chart of step 108 in one embodiment;
FIG. 4 is a flow chart of step 110 in one embodiment;
FIG. 5 is a flow chart of a method for predicting performance of a big data cluster in one embodiment;
FIG. 6 is a flow chart of step 506 in one embodiment;
FIG. 7 is a schematic diagram of the structure of an Informer model in one embodiment;
FIG. 8 is a flow chart of step 606 in one embodiment;
FIG. 9 is a flow chart of a method for big data cluster performance prediction in one embodiment;
FIG. 10 is a flow chart of a method for big data cluster performance prediction in one embodiment;
FIG. 11 is a schematic diagram of a big data cluster performance prediction apparatus according to one embodiment;
FIG. 12 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
A time series is a sequence of data points arranged in chronological order. Typically, the time interval of a time series is a constant value (e.g., 1 second, 5 minutes, 12 hours, 7 days, or 1 year), so the time series can be analyzed as discrete-time data. The basic tasks of time series analysis include: (1) single-index time series prediction: given the historical changes of a certain index, predicting the change of that index over a future period; (2) multi-index time series prediction: given the historical changes of several indexes, predicting their changes over a future period. This task differs from single-index prediction in that the indexes are not necessarily independent of each other but influence one another to some degree; (3) time series anomaly detection: the process of identifying abnormal events or behaviors in a normal time series. Anomalies can be detected from historical data, and early warnings can be issued, based on time series prediction, for anomalies that have not yet occurred; (4) time series index clustering: grouping time series indexes with similar trends into the same category. In actual operation and maintenance work there may be hundreds or thousands of indexes; analyzing each one separately would be too much work, so modeling and analysis can be performed on the basis of clustering; (5) index association analysis: analyzing whether index A affects index B and in what way (positive or negative, which comes first, by how many time steps, and so on).
Long sequence time-series forecasting (LSTF) problems arise frequently in many fields, such as power consumption planning, network monitoring, energy management, economics and finance, and disease propagation analysis, where long-term predictions of the future need to be made from large amounts of past data. LSTF requires the model to have a high prediction capacity so that it can capture long-range dependencies between inputs and outputs.
In the big data era of explosive data growth, intelligent operation and maintenance has emerged to address the difficulties of traditional operation and maintenance, such as larger cluster scale and more complex monitoring, visualization, and intelligence requirements. In the field of intelligent operation and maintenance, intelligent monitoring is an important issue, because the monitored objects are often not single systems but large and complex ones. Their indexes influence one another, and events may even occur in a particular order. Therefore, intelligent monitoring modeling of big data clusters based on multi-angle and multi-dimensional monitoring indexes is a very important fundamental problem. A good intelligent monitoring model should at least meet the following requirements: (1) by combining the current cluster running state with the behavior data (such as access volume) of the applications running on the system, it can detect in advance, and raise an alarm about, whether the cluster will remain stable over a future period, so that proactive handling can be carried out (or operation and maintenance personnel can be assisted in carrying it out) to keep the cluster stable over that period; (2) stress tests can be simulated based on the established model and a reliable stress test conclusion can be obtained, providing a reference report for stress tests actually performed on the cluster. At present, the conventional technology establishes a big data cluster model based on a traditional time series model or a neural network model. Such models have many application scenarios in the intelligent monitoring of big data clusters, such as predicting disk usage, network traffic, and the number of online players of a game.
Different scenarios place different requirements on prediction: short-term versus long-term prediction, prediction efficiency, and even prediction accuracy. For example, a trend prediction alarm only needs to raise the alarm early enough, and the requirement on the specific predicted values is not high. Therefore, in practice, the modeling for intelligent monitoring of a big data cluster generally follows a time series modeling workflow, which is usually divided into four steps: data acquisition, data preprocessing, period detection, and model establishment and training. First, data acquisition: monitoring indexes are preset in the big data cluster, and index data is acquired by means including, but not limited to, scripts that collect monitoring data or instrumentation points embedded in the service. Second, data preprocessing, including data smoothing, normalization, missing data filling, and the like. Third, period detection: because the running data of a cluster in most cases exhibits a certain periodicity, the period of the data needs to be determined in advance. However, the period of the data to be predicted is not fixed, so it needs to be identified by a mathematical method such as Fourier series or autocorrelation coefficients. Finally, model establishment and training: models are selected according to the prediction target and can be divided into short-term, medium-term, and long-term prediction models.
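The period detection step can be illustrated with a minimal autocorrelation-based sketch. It picks the lag with the highest autocorrelation coefficient among lags from 2 up to a chosen maximum; starting at lag 2 is a simplification that avoids the trivially high correlation of adjacent points. The function names and the sample series are hypothetical, and a production system might prefer a Fourier-based method or a more careful peak search.

```python
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Autocorrelation coefficient of the series at the given lag (biased estimate)."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0  # a constant series has no detectable period
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def detect_period(series: Sequence[float], max_lag: int) -> int:
    """Return the lag in [2, max_lag] with the highest autocorrelation."""
    if max_lag < 2 or max_lag >= len(series):
        raise ValueError("max_lag must be at least 2 and shorter than the series")
    return max(range(2, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical index values that repeat every 6 samples.
pattern = [1.0, 3.0, 5.0, 3.0, 1.0, 0.0]
series = pattern * 8
period = detect_period(series, max_lag=20)
```

For the repeating pattern above the detected period equals the pattern length; on noisy real data the highest peak is less clear-cut and usually needs smoothing or a significance check.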
In the prediction process, different scenarios and different data may require different models. Consequently, traditional models may exhibit a certain lag and low prediction accuracy when applied to requirements of different time spans, while a long short-term memory (LSTM) neural network model requires longer training time and more hardware resources. Training a robust, unified model requires balancing resource cost against the accuracy target. In addition, because of the model selection, the intelligent monitoring modeling method in the conventional technology lacks reliability and cannot achieve a reliable self-repair mechanism that predicts abnormalities, simulates emergency repairs, evaluates the simulated emergency effect, and implements an emergency scheme. In summary, the conventional technology builds the overall intelligent monitoring model of a big data cluster from several sub-models, which has the following drawbacks: it is difficult to establish the correlation among multiple indexes, and difficult to capture the correlation among long-term, medium-term, and short-term prediction targets; predictions may lag, prediction accuracy is low, and neural network training consumes considerable resources; periodic parameters for different time spans must be established for different prediction targets, so many hyperparameters are needed during modeling; and because it is difficult to build a reliable medium- and long-term prediction model, it is difficult to build a reliable self-repair mechanism.
Based on the above, the embodiment of the application provides a big data cluster performance prediction method to solve the above problem and realize big data cluster intelligent monitoring with a self-repairing attempt mechanism.
In one embodiment, as shown in FIG. 1, a method for predicting the performance of a big data cluster is provided. This embodiment is described as applied to a server for illustration; it can be understood that the method may also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step 102, acquiring attribute data of a target cluster and behavior data of a target application at each target moment, wherein the target application runs on the target cluster.
The target cluster is the cluster that requires intelligent big data monitoring, and a target moment is a moment at which data is collected. The attribute data of the target cluster may include server operation data, database physical machine operation data, resource utilization, network traffic, and the like. The behavior data of the target application may include, for example, SQL (Structured Query Language, a language for managing relational database management systems) access volume, the volume of data accessed by SQL, SQL execution time, and the like. For example, an ELK architecture (a free, open-source log analysis and management system composed of three parts, Elasticsearch, Logstash, and Kibana, abbreviated as ELK) may be used to collect the attribute data of the target cluster and the behavior data of the target application running on the target cluster; the data is gathered into a collection center for unified management and consolidated into a data set. ELK is a framework for handling log data; it mainly addresses the storage and retrieval of log data, the collection, filtering, and formatting of logs, and the statistical display and visualization of logs.
Step 104, constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target moment.
The attribute data of the target cluster and the behavior data of the target application at each target moment can be integrated into timestamped aggregation results whose finest granularity is one minute, and organized into data rows. The data rows are then restructured to obtain the input matrix.
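The minute-level aggregation described above can be sketched as follows. This is a minimal illustration in plain Python, not the collection pipeline of the application: the function name `aggregate_by_minute`, the tuple layout of the raw samples, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_by_minute(samples):
    """Average raw metric samples that fall within the same minute.

    samples: iterable of (timestamp, [metric values]) tuples (assumed layout).
    Returns a chronologically ordered list of (minute, [mean values]) rows.
    """
    buckets = defaultdict(list)
    for ts, values in samples:
        # Truncate the timestamp to the minute to obtain the bucket key.
        buckets[ts.replace(second=0, microsecond=0)].append(values)

    rows = []
    for minute in sorted(buckets):
        # zip(*...) regroups the samples column by column (one column per metric).
        columns = zip(*buckets[minute])
        rows.append((minute, [sum(col) / len(col) for col in columns]))
    return rows


# Hypothetical samples: [resource utilization, SQL access count].
raw = [
    (datetime(2022, 9, 1, 19, 8, 13), [0.5, 100.0]),
    (datetime(2022, 9, 1, 19, 8, 43), [0.75, 140.0]),
    (datetime(2022, 9, 1, 19, 9, 5), [0.5, 90.0]),
]
rows = aggregate_by_minute(raw)
for minute, means in rows:
    print(minute, means)
```

Each returned row corresponds to one data row of the kind described above: a minute-level timestamp followed by the per-minute averages.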
Step 106, determining a performance prediction matrix of the target cluster based on the input matrix, wherein the performance prediction matrix comprises attribute data of the target cluster and behavior data of the target application under N time units in the future, and N is a positive integer.
The input matrix can be input into a pre-trained target Informer model, which outputs the performance prediction matrix. The performance prediction matrix may include the prediction vectors of the N time units arranged in time order from top to bottom, and each prediction vector includes a prediction time together with the attribute data of the target cluster and the behavior data of the target application corresponding to that prediction time.
Step 108, performing abnormality judgment on the performance prediction matrix according to a preset threshold matrix, and determining a judgment result, wherein the judgment result is used for representing whether the attribute data of the target cluster is abnormal or not under N time units in the future.
Predefined rules (including but not limited to conventional logic rules or neural networks) may be used to determine whether the future operating state of the target cluster is normal. For example, the predefined rule may be a preset threshold matrix, and the abnormality judgment is performed on the performance prediction matrix according to this threshold matrix. It should be noted that the embodiments of the present application do not specifically limit the individual thresholds in the threshold matrix, as long as corresponding positions of the threshold matrix and the performance prediction matrix refer to the same item of attribute data or behavior data. If any item of attribute data of the target cluster in the performance prediction matrix is larger than the threshold at the corresponding position in the threshold matrix, the judgment result for the performance prediction matrix is abnormal.
Step 110, under the condition that the judging result represents that the attribute data of the target cluster is abnormal in the future N time units, the performance prediction matrix is processed based on a preset repair strategy, and a simulation repair result aiming at the target cluster is obtained.
The embodiment of the application does not specifically limit the preset repair strategy. The preset repair strategy can be a plurality of emergency schemes formulated by experts, each emergency scheme can adjust the value of each position in the performance prediction matrix to simulate the change of attribute data of the target cluster when the emergency scheme is implemented on the target cluster, and a simulated repair result aiming at the target cluster is obtained.
According to the large data cluster performance prediction method provided by the embodiment of the application, the attribute data of the target cluster and the behavior data of the target application under N time units in the future are predicted to obtain the performance prediction matrix, whether the future running state of the target cluster is normal or not is judged by using the predefined threshold matrix, and then the simulation repair of the target cluster is carried out according to the judgment result and a preset repair strategy, so that the intelligent monitoring of the large data cluster with a self-repair attempt mechanism is realized.
In one embodiment, as shown in fig. 2, in step 104, constructing an input matrix according to attribute data of the target cluster and behavior data of the target application at each target time may include:
Step 202, for any target time, constructing a data vector corresponding to the target time according to the target time, attribute data of the target cluster under the target time and behavior data of the target application.
The average values, within the minute corresponding to the target moment, of the attribute data of the target cluster and the behavior data of the target application can be obtained, and these averages are combined with the target moment into a data row. For example, if the attribute data includes a database operation index and a server operation index, and the behavior data includes an application behavior data index, the data row may be <2022-09-01 19:08:13, database operation index, server operation index, application behavior data index>. The target moment in the data row is then mapped to a low-dimensional vector to obtain the data vector. Each element of this vector indicates a component of the timestamp: year, month, week, day, hour, minute, a special flag for sudden events, and so on. For example, 2022-05-01:00:00 may be expressed as [2022,5,1,1,2,1,5], meaning year 2022, month 5, week 1, day 1, hour 2, minute 1, and the fifth national holiday. The special flag for sudden events can be defined according to the specific circumstances of the cluster monitoring service, for example statutory holidays, fixed production days, or flags for major public opinion events.
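The mapping from a target moment to a low-dimensional vector can be sketched as follows. This is an illustrative sketch only: the function names, the `SPECIAL_EVENT_FLAGS` lookup table, the interpretation of "week" as the week within the month, and the metric values are assumptions, chosen so that the encoded result matches the example vector [2022,5,1,1,2,1,5] given above.

```python
from datetime import datetime

# Hypothetical lookup of special-event flags keyed by calendar date.
# 0 means "no special event"; the value 5 mirrors the example in the text.
SPECIAL_EVENT_FLAGS = {(2022, 5, 1): 5}


def encode_timestamp(ts):
    """Map a timestamp to [year, month, week, day, hour, minute, flag]."""
    week_of_month = (ts.day - 1) // 7 + 1  # assumption: week counted within the month
    flag = SPECIAL_EVENT_FLAGS.get((ts.year, ts.month, ts.day), 0)
    return [ts.year, ts.month, week_of_month, ts.day, ts.hour, ts.minute, flag]


def build_data_vector(ts, attribute_means, behavior_means):
    """Concatenate the encoded time with the per-minute mean metrics."""
    return encode_timestamp(ts) + list(attribute_means) + list(behavior_means)


# Hypothetical per-minute means: two attribute indexes and one behavior index.
vec = build_data_vector(datetime(2022, 5, 1, 2, 1), [0.62, 0.40], [1200.0])
print(vec)
```

Keeping the time encoding separate from the metric columns makes it easy to change the special-flag definition without touching the rest of the data vector.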
Step 204, arranging the K rows of data vectors corresponding to the target moments in time order, and splicing them with N rows of zero vectors to obtain the input matrix, wherein K is a positive integer.
It should be noted that the embodiments of the present application do not specifically limit the value of K, that is, the number of target moments at which data is collected, nor the value of N, that is, the number of moments to be predicted. Each zero vector can include the time point to be predicted, and the data vectors and zero vectors can be spliced from top to bottom in time order to obtain the input matrix.
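The splicing of K data vectors with N zero vectors can be sketched as follows. This is a simplified illustration: the function name `build_input_matrix` is hypothetical, and each row is assumed to start with a single integer time code (rather than the multi-element time encoding) so that the example stays short.

```python
def build_input_matrix(data_vectors, future_times):
    """Stack K observed rows on top of N placeholder rows.

    data_vectors: rows of the form [time_code, metric_1, metric_2, ...].
    future_times: the N time codes to be predicted.
    """
    # Arrange the observed rows in time order (first column is the time code).
    ordered = sorted(data_vectors, key=lambda row: row[0])
    n_metrics = len(ordered[0]) - 1
    # Each placeholder row keeps its time point and zeroes out every metric.
    placeholders = [[t] + [0.0] * n_metrics for t in sorted(future_times)]
    return ordered + placeholders


observed = [[2, 0.7, 130.0], [1, 0.6, 120.0], [3, 0.8, 150.0]]  # K = 3
matrix = build_input_matrix(observed, future_times=[4, 5])      # N = 2
for row in matrix:
    print(row)
```

The resulting matrix has K + N rows; the last N rows carry only the time points, leaving the metric positions for the model to fill in.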
In the embodiment of the disclosure, the target time is integrated into the data vector, so that convenience is provided for realizing intelligent monitoring of big data.
In one embodiment, determining the performance prediction matrix of the target cluster based on the input matrix in step 106 may include:
processing the input matrix with a target Informer model to obtain the performance prediction matrix of the target cluster.
The Informer time-series prediction model is an improved Transformer-based model designed specifically for long sequence time-series forecasting (LSTF). It addresses serious problems that arise when the Transformer is applied to LSTF, such as quadratic time complexity, high memory usage, and inherent limitations of the encoder-decoder structure. A main feature of the Informer model is its one-shot generative prediction, which offers faster inference than step-by-step prediction with comparable accuracy. The target Informer model is a pre-trained model of the relationship among application behavior, time, and cluster performance attributes of the big data cluster. The target Informer model may be implemented in Python (version 3.6), using the PyTorch framework as the supporting deep learning library. After the input matrix is input into the target Informer model, the performance prediction matrix of the target cluster is output.
In the embodiment of the disclosure, the performance prediction matrix of the target cluster is predicted with the target Informer model, which performs a single forward computation over the long time series and outputs all prediction results at once. This greatly improves the inference speed of long-sequence prediction and thus the efficiency of big data cluster performance prediction.
In one embodiment, as shown in fig. 3, in step 108, performing anomaly judgment on the performance prediction matrix according to a preset threshold matrix, and determining the judgment result may include:
Step 302, performing scalar calculation on the corresponding positions of the preset threshold matrix and the performance prediction matrix, and determining a state vector.
After the target Informer model has established a model of the relationship among application behavior, time, and cluster performance attributes of the big data cluster, the output of the Informer model is a reliable long-horizon performance prediction matrix, that is, a performance prediction matrix with a time window of length N. The performance prediction matrix satisfies the following Formula (1).
$$Y_{predict} = [y_1, \ldots, y_N] \quad \text{Formula (1)}$$
where $Y_{predict}$ is the performance prediction matrix and $y_1$ is the prediction vector for the 1st future time unit, which includes the attribute data of the target cluster and the behavior data of the target application corresponding to that time unit.
The threshold matrix has the same shape as the performance prediction matrix, and scalar (element-by-element) calculation is performed between corresponding positions of the two. Corresponding positions of the threshold matrix and the performance prediction matrix refer to the same item of attribute data or behavior data. If an item of attribute data of the target cluster in the performance prediction matrix is larger than the threshold at the corresponding position in the threshold matrix, that position is set to 1; otherwise it is set to 0. The result is a scalar calculation matrix.
Illustratively, the scalar calculation between the threshold matrix and the performance prediction matrix yields a scalar calculation matrix whose elements satisfy the following Formula (2).
$$s_{t,data} = \begin{cases} 1, & y_{t,data} > \tau_{data} \\ 0, & \text{otherwise} \end{cases} \quad \text{Formula (2)}$$
where $s_{t,data}$ is the element of the scalar calculation matrix for a given item of attribute data at prediction time t; $y_{t1,data1}$ is the data1 attribute data at prediction time t1 and $\tau_{data1}$ is the threshold for the data1 attribute data; $y_{t2,data2}$ is the data2 attribute data at prediction time t2 and $\tau_{data2}$ is the threshold for the data2 attribute data. In this example only $y_{t1,data1}$ is greater than $\tau_{data1}$, so that position is set to 1 and all other positions are set to 0.
The non-time dimension of the scalar calculation matrix is held fixed and the time dimension is compressed to obtain the state vector A. Illustratively, in the scalar calculation matrix of this example, the entries in each column correspond to different prediction times but to the same item of attribute data; summing the scalar calculation matrix from top to bottom, column by column, yields the state vector A. The state vector A may satisfy the following Formula (3).
$$A = [1, 0] \quad \text{Formula (3)}$$
Step 304, obtaining a judgment result indicating that the attribute data of the target cluster is abnormal when the modulus of the state vector is greater than zero.
If the modulus of the state vector is greater than 0, the target cluster will become abnormal within the next N time units. The positions of the non-zero values in the state vector indicate which attribute data of the target cluster will become abnormal, and the magnitude of each non-zero value indicates for how many time units the abnormality persists, reflecting its severity. The modulus of the state vector is its norm, i.e., the square root of the sum of the squares of its elements; a vector is a special case of a matrix (1 x n or n x 1).
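The threshold comparison, the compression into a state vector, and the modulus test can be sketched as follows. The function names and the numeric values are assumptions made for illustration; the values are chosen so that only the first attribute exceeds its threshold at the first prediction time, which reproduces the state vector [1, 0] of the example above.

```python
import math


def scalar_calculation(prediction, thresholds):
    """Element-wise comparison: 1 where a prediction exceeds its threshold, else 0."""
    return [
        [1 if y > tau else 0 for y, tau in zip(pred_row, thr_row)]
        for pred_row, thr_row in zip(prediction, thresholds)
    ]


def state_vector(indicator):
    """Sum over the time dimension (rows), keeping one entry per attribute (column)."""
    return [sum(column) for column in zip(*indicator)]


def is_abnormal(state):
    """The cluster is judged abnormal when the modulus of the state vector is > 0."""
    return math.sqrt(sum(a * a for a in state)) > 0


# Rows are prediction times t1, t2; columns are attributes data1, data2 (hypothetical values).
prediction = [[0.95, 0.30], [0.70, 0.40]]
thresholds = [[0.90, 0.80], [0.90, 0.80]]

indicator = scalar_calculation(prediction, thresholds)
state = state_vector(indicator)
print(indicator, state, is_abnormal(state))
```

Because every entry of the state vector is a non-negative count, the modulus is zero exactly when no attribute exceeds its threshold at any prediction time.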
According to the embodiment of the disclosure, the judgment of the abnormal condition of the target cluster is realized based on scalar calculation of the threshold matrix and the performance prediction matrix.
In one embodiment, as shown in fig. 4, the performance prediction matrix includes N rows of prediction vectors, where the prediction vectors include attribute data of a target cluster and behavior data of a target application corresponding to any prediction time under N time units in the future. In step 110, processing the performance prediction matrix based on a preset repair policy to obtain a simulated repair result for the target cluster may include:
Step 402, changing any numerical value in the prediction vectors in the performance prediction matrix based on a preset repair strategy to obtain a simulation prediction matrix.
The preset repair strategy can comprise various emergency plans formulated by experts, and may be executed according to the actual state vector A. To ensure that an emergency plan is effective, it needs to be verified. The change in cluster performance data that implementing the emergency plan on the target cluster would cause is simulated by adjusting the values at the relevant positions in the performance prediction matrix. Illustratively, emergency plan 1 is executed, certain attribute data corresponding to the first a prediction times is changed, and the number 1 of the emergency plan is added to the plan queue.
Step 404, selecting the first a rows of prediction vectors from the simulation prediction matrix, and splicing them with the last K-a rows of data vectors and the N rows of zero vectors of the input matrix to obtain a simulation input matrix, wherein a is a positive integer.
Here a is the number of rows of prediction vectors changed by the preset repair strategy. The first a rows of prediction vectors are selected from the simulation prediction matrix and spliced with the last K-a rows of data vectors and the N rows of zero vectors of the input matrix to obtain the simulation input matrix.
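The construction of the simulation input matrix can be sketched as follows. This sketch makes two assumptions that the text does not spell out: the rows are placed in chronological order (the last K-a observed rows first, then the first a simulated prediction rows, then the N zero rows), and the zero rows are written as all-zero rows without a time column. The function name `build_simulated_input` and the values are hypothetical.

```python
def build_simulated_input(input_matrix, simulated_prediction, k, n, a):
    """Shift the observation window forward by a rows using simulated predictions."""
    if not 0 < a <= k:
        raise ValueError("a must satisfy 0 < a <= K")
    kept_rows = input_matrix[a:k]             # last K - a observed data vectors
    repaired_rows = simulated_prediction[:a]  # first a rows changed by the repair plan
    width = len(input_matrix[0])
    zero_rows = [[0.0] * width for _ in range(n)]  # N rows still to be predicted
    return kept_rows + repaired_rows + zero_rows


# K = 4 observed rows followed by N = 2 zero rows (one metric per row for brevity).
input_matrix = [[0.5], [0.6], [0.7], [0.8], [0.0], [0.0]]
simulated_prediction = [[0.65], [0.6]]  # prediction matrix after applying a plan

sim_input = build_simulated_input(input_matrix, simulated_prediction, k=4, n=2, a=1)
print(sim_input)
```

The simulation input matrix keeps the same K + N shape as the original input matrix, so it can be passed to the same model without any change to the model interface.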
Step 406, processing the simulation input matrix with the target Informer model to obtain a simulated repair result for the target cluster.
The behavior data of the target application in the first a rows of prediction vectors selected from the simulation prediction matrix is used together with the behavior data from the first K time units in the input matrix. This step amounts to asking the target Informer model, under the current application behavior pattern, for a prediction of the cluster performance attributes over the next N time units after the emergency plan has been adopted. After the simulation input matrix is input into the target Informer model, the performance prediction matrix after execution of the repair strategy is output. Scalar calculation is performed between corresponding positions of this performance prediction matrix and the threshold matrix to obtain the judgment result after execution of the repair strategy. If the judgment result indicates that the attribute data of the target cluster is still abnormal in the next N time units, a simulated repair result indicating that the simulated repair failed is obtained for the target cluster. Step 402 is then repeated: emergency plan 2 is executed, other attribute data is changed, and the number 2 of the emergency plan is added to the plan queue.
Repeating the steps until the judging result represents that the attribute data of the target cluster is normal under the condition of N time units in the future, obtaining a simulation repairing result which represents that the simulation repairing is successful and aims at the target cluster, and sequentially executing schemes 1 and 2 which are already enqueued in the scheme queue to repair the target cluster; or repeating the steps until the repetition times reach a threshold times B, wherein the target cluster still cannot be recovered to be normal after B times of self-repairing attempts, and then sending an alarm to operation and maintenance personnel to prompt the operation and maintenance personnel to perform manual intervention, wherein B is a positive integer.
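The repeat-until-normal loop with an upper bound of B attempts can be sketched as follows. This is a toy illustration: the function and parameter names are hypothetical, the emergency plans simply lower every predicted value by a fixed amount, and the call to the target Informer model is replaced by an identity function that stands in for re-prediction.

```python
def attempt_self_repair(prediction, plans, repredict, is_abnormal, max_attempts):
    """Try emergency plans in order until the forecast is normal or B attempts are used.

    plans: list of (plan_number, apply_plan) pairs.
    repredict: stand-in for building the simulation input and running the model.
    Returns ("repaired", queue) or ("alert", queue).
    """
    plan_queue = []
    current = prediction
    for number, apply_plan in plans[:max_attempts]:
        current = repredict(apply_plan(current))  # simulate the plan, then re-predict
        plan_queue.append(number)                 # enqueue the plan number
        if not is_abnormal(current):
            return "repaired", plan_queue         # execute the queued plans in order
    return "alert", plan_queue                    # notify operations staff


def lower_by(delta):
    """Hypothetical emergency plan: reduce every predicted value by delta."""
    return lambda matrix: [[value - delta for value in row] for row in matrix]


plans = [(1, lower_by(0.05)), (2, lower_by(0.2))]
identity_model = lambda matrix: matrix  # placeholder for the target Informer model
over_threshold = lambda matrix: any(v > 0.9 for row in matrix for v in row)

result = attempt_self_repair([[0.99], [0.97]], plans, identity_model, over_threshold, max_attempts=3)
print(result)
```

Returning the plan queue together with the outcome lets the caller either execute the queued plans on the real cluster or attach them to the alarm sent to operation and maintenance personnel.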
In the embodiment of the disclosure, a reliable cluster self-repair mechanism can be established based on the target Informer model.
In one embodiment, as shown in fig. 5, the large data cluster performance prediction method may further include:
Step 502, obtaining sample attribute data of the target cluster and sample behavior data of the target application at each sample time, taking the last N sample times as labeling times, and taking the remaining sample times as training times.
A sample time is a historical moment at which data was collected. The sample attribute data may include server operation data, database physical machine operation data, resource utilization, network traffic, and the like; the sample behavior data of the target application may include, for example, the number of SQL accesses, the total volume of data accessed by SQL, SQL execution time, and the like. The sample attribute data and sample behavior data corresponding to the training times are input into the initial Informer model for training, and the sample attribute data and sample behavior data corresponding to the labeling times are used as labels to correct the output of the initial Informer model. For example, with the sample times sorted in time order, the first L sample times are the training times and the last N sample times are the labeling times.
Step 504, constructing a sample input matrix according to the sample attribute data of the target cluster and the sample behavior data of the target application at each training time, wherein the sample input matrix comprises a plurality of rows of sample data vectors arranged in time order, and each sample data vector comprises a training time together with the sample attribute data of the target cluster and the sample behavior data of the target application at that training time.
The average values, within the minute corresponding to each training time, of the sample attribute data of the target cluster and the sample behavior data of the target application can be obtained and combined with the training time into a data row. The training time in the data row is mapped to a low-dimensional vector to obtain a sample data vector. The sample data vectors are spliced from top to bottom in time order to obtain the sample input matrix.
Step 506, training the initial Informer model based on the sample input matrix to obtain a target Informer model.
The sample input matrix can be input into the initial Informer model to obtain prediction data corresponding to each labeling time, and the initial Informer model is trained according to the prediction data corresponding to each labeling time and the sample attribute data of the target cluster and sample behavior data of the target application at each labeling time, to obtain the target Informer model. In the embodiment of the disclosure, the target Informer model, a model of the relationship among application behavior, time, and cluster performance attributes of the big data cluster, is established from the sample times, sample attribute data, and sample behavior data.
In one embodiment, as shown in FIG. 6, step 506, training the initial Informer model based on the sample input matrix to obtain the target Informer model, may include:
Step 602, using each row of sample data vectors in the sample input matrix as an encoder input sequence.
The Informer model builds on the classic Transformer model proposed by Google and surpasses it in extracting long-range dependency information. Compared with the Transformer model, it has lower computational complexity and lower memory usage, and it can process long-sequence data efficiently. When generating predictions, the Informer model uses a parallel generative decoder mechanism that performs a single forward computation over the long time series and outputs all prediction results at once, which greatly improves the inference speed of long-sequence prediction. In contrast, the Transformer model usually has to compute the result for each time step one at a time.
Illustratively, as shown in FIG. 7, the Informer model uses an encoder-decoder structure. The encoder accepts the long sequence input $X_{feed\_en}$ and obtains a hidden-layer feature representation through a self-attention module and a self-attention distilling module. The decoder accepts the long sequence input $X_{feed\_de}$ and, through the interaction of multi-head self-attention with the hidden-layer feature representation, predicts the target output directly in a single pass. The rows of sample data vectors in the sample input matrix are used as the encoder input sequence $X_{feed\_en}$. The encoder input sequence may satisfy the following Formula (4).
$$X_{feed\_en} = \{X_1, \ldots, X_L \mid X_i \in \mathbb{R}^{d}\} \quad \text{Formula (4)}$$
where L is the number of training times, that is, the number of rows of sample data vectors in the sample input matrix; $X_i$ is the real-valued sample data vector in row i; $\mathbb{R}$ denotes the real numbers; and d is the dimension spanned by the sample attribute data and sample behavior data in a sample data vector.
Step 604, the last K rows of sample data vectors in the sample input matrix are spliced with N rows of zero vectors to obtain the decoder input sequence.
The last K rows of sample data vectors in the sample input matrix and the N rows of zero vectors can be spliced from top to bottom in time order to obtain the decoder input sequence $X_{feed\_de}$. The decoder input sequence may satisfy the following Formula (5).
$$X_{feed\_de} = \mathrm{concat}(X_{1\text{-}start},\ X_{start\text{-}L}) \quad \text{Formula (5)}$$
where concat denotes splicing the last K rows of sample data vectors in the sample input matrix with the N rows of zero vectors; start denotes the sequence of length K truncated from the tail of the encoder input sequence $X_{feed\_en}$, which serves as the start condition; 1-start denotes the last K rows of sample data vectors in the sample input matrix; and start-L denotes the N rows of zero vectors.
Illustratively, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 is a sequence of ten consecutive sample times. Each sample time has the data collected at that time (for example, sample time 1 has 5 collected data items), and the same data items are collected at every sample time. If L is 8, N is 2, and K is 4, the encoder receives the sequence of training times 1-8 as the encoder input sequence $X_{feed\_en}$, which serves as the "context" for prediction. The decoder receives the sequence 5-10 as the decoder input sequence $X_{feed\_de}$, which can be divided into two parts: 5-8 are the last K rows of sample data vectors in the sample input matrix and serve as the "transition"; 9-10 correspond to the two labeling times, whose collected data must all be replaced with 0, giving the N zero vectors.
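The example with L = 8, N = 2, and K = 4 can be sketched as follows. The function name `build_training_sequences` is hypothetical, and each sample row holds a single value equal to its sample time so that the slices are easy to read.

```python
def build_training_sequences(samples, n_label, k_token):
    """Split chronologically ordered samples into encoder input, decoder input, and labels."""
    if not 0 < n_label < len(samples):
        raise ValueError("n_label must be between 1 and len(samples) - 1")
    train = samples[:-n_label]    # first L rows: training times
    labels = samples[-n_label:]   # last N rows: labeling times
    if not 0 < k_token <= len(train):
        raise ValueError("k_token must be between 1 and the number of training rows")
    width = len(samples[0])
    encoder_input = train
    # Last K training rows ("transition") followed by N zero rows to be predicted.
    decoder_input = train[-k_token:] + [[0.0] * width for _ in labels]
    return encoder_input, decoder_input, labels


# Ten consecutive sample times; each row holds one value equal to its time index.
samples = [[float(t)] for t in range(1, 11)]
enc, dec, labels = build_training_sequences(samples, n_label=2, k_token=4)
print(enc)
print(dec)
print(labels)
```

The labels are returned separately because they never enter the model as input; they are used only to compare against the decoder output for the last N positions.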
The length of the sequences received by the encoder and the decoder is different, the training length of the encoder is generally long, and the decoding length of the decoder is short, so that the training effectiveness can be ensured.
Step 606, training the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain a target Informer model.
The Informer model code defines the input and output formats, so the processed encoder input sequence and decoder input sequence can be fed directly into the training entry point of the initial Informer model. The mean square error (MSE) can be used as the loss function during training to guide back-propagation through the initial Informer model. Training of the target Informer model can be regarded as complete when the MSE falls below a given threshold; the smaller the MSE, the better.
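The MSE criterion can be illustrated with a small plain-Python sketch. It computes only the loss value and the stopping test; it does not perform back-propagation, which in practice would be handled by the deep learning framework. The function name, the matrices, and the threshold value are assumptions for the example.

```python
def mean_squared_error(predicted, target):
    """Mean of squared differences over all elements of two equally shaped matrices."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Hypothetical predictions and labels for N = 2 labeling times and 2 metrics.
predicted = [[1.0, 2.0], [3.0, 4.0]]
target = [[1.0, 2.0], [3.0, 6.0]]

mse = mean_squared_error(predicted, target)
mse_threshold = 0.5  # assumed stopping threshold
training_done = mse < mse_threshold
print(mse, training_done)
```

Here a single element differs by 2, so the squared error 4 is spread over 4 elements and training would continue because the loss is still above the assumed threshold.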
In the embodiment of the disclosure, the effectiveness of training the target Informer model is ensured by having the encoder and the decoder receive sequences of different lengths.
In one embodiment, as shown in FIG. 8, step 606, training the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain a target Informer model, may include:
Step 802, inputting the encoder input sequence into the encoder of the initial Informer model to obtain hidden-layer features.
The initial Informer model code defines the input and output formats, so the encoder input sequence $X_{feed\_en}$ is fed directly into the encoder of the initial Informer model, and the hidden-layer features are obtained through the self-attention module and the self-attention distilling module.
Step 804, inputting the decoder input sequence and the hidden-layer features into the decoder of the initial Informer model to obtain the prediction data corresponding to each labeling time.
The decoder of the initial Informer model receives the decoder input sequence $X_{feed\_de}$ and, through the interaction of multi-head self-attention with the hidden-layer feature representation, predicts the target output directly in a single pass. The target output may include the prediction data corresponding to each labeling time. The output may satisfy the following Formula (6).
$$Y = \{y_1, \ldots, y_L \mid y_i \in \mathbb{R}^{d_y}\} \quad \text{Formula (6)}$$
where L is the number of vectors in the decoder input sequence, $y_i$ is the i-th row vector output by the decoder, and $d_y$ is the dimension spanned by the items of attribute data and behavior data in the vector. For example, when the encoder receives the sequence of training times 1-8 as the encoder input sequence $X_{feed\_en}$ and the decoder receives the sequence 5-10 as the decoder input sequence $X_{feed\_de}$, the predicted sequence 5-10 is produced as the output, and the portion 9-10 is the prediction data corresponding to the labeling times.
Step 806, training the initial Informer model according to the difference between the prediction data corresponding to each labeling time and the sample attribute data of the target cluster and sample behavior data of the target application at each labeling time, to obtain a target Informer model.
After the prediction data corresponding to each labeling time is obtained, it can be compared with the previously obtained sample attribute data of the target cluster and sample behavior data of the target application at each labeling time. The initial Informer model is trained iteratively according to the difference, and training of the target Informer model is regarded as complete when the difference reaches a set threshold.
In the embodiment of the disclosure, the effectiveness of training the target Informer model is ensured by having the encoder and the decoder receive sequences of different lengths.
To facilitate further understanding of the embodiments of the present application, as shown in fig. 9, a complete embodiment is provided here. The application is based on the open-source long-sequence time-series prediction model Informer. The Informer model part is implemented in Python (version 3.6), using the PyTorch framework as the supporting deep learning library. The method mainly comprises the following steps: 1) The ELK architecture is used to collect attribute data of the target cluster (such as resource utilization and network traffic) and behavior data of the applications running on the cluster (such as the number of SQL accesses, the total volume of data accessed by SQL, and SQL execution time); the data is gathered in a collection center for unified management and consolidated into a data set. Specifically, the ELK framework is used to collect the operation data of the cluster's servers, the operation data of the database physical machines, and the behavior data of the multiple applications deployed on the cluster. The data is integrated into timestamped aggregation results whose finest granularity is one minute and organized into data rows, such as <2022-09-01 19:08:13, database operation index, server operation index, application behavior data index>. Apart from the timestamp, every index in a data row is the average of the operation data within that minute. 2) The data set is restructured, and the relationship among the time series, application behavior, and cluster attribute data is established based on the Informer model. Specifically, according to the structural design of the Informer model, each input/output pair of sample row data can satisfy the following Formula (7) and Formula (8).
$$Pair = \langle x, y \rangle = \langle [time\_stamp,\ x_{t\_app},\ x_{t\_data}],\ x_{t+1\_data} \rangle \quad \text{Formula (7)}$$
$$y_t = x_{t+1\_data} = \mathrm{Informer}(time\_stamp,\ x_{t\_app},\ x_{t\_data}) \quad \text{Formula (8)}$$
where Pair is an input-output pair of training data, x is the input of the Informer model, y is the output of the Informer model, time_stamp is the timestamp, $x_{t\_app}$ is the behavior data of the target application at time t, $x_{t\_data}$ is the attribute data of the target cluster at time t, $x_{t+1\_data}$ is the attribute data of the target cluster at time t+1, and $y_t$ is the output of the Informer model at time t. That is, the attribute data of the target cluster at time t+1 is computed from the timestamp, the behavior data, and the attribute data at time t.
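The construction of input-output pairs from consecutive data rows can be sketched as follows. The dictionary keys `time_stamp`, `app`, and `data`, the function name, and the values are assumptions for illustration; the timestamp is reduced to an integer for brevity.

```python
def make_training_pairs(rows):
    """Pair each row's [time_stamp, app behavior, attribute data] with the next row's attribute data."""
    pairs = []
    # zip(rows, rows[1:]) walks over consecutive (time t, time t+1) rows.
    for current, following in zip(rows, rows[1:]):
        x = [current["time_stamp"]] + current["app"] + current["data"]
        y = following["data"]
        pairs.append((x, y))
    return pairs


# Chronologically ordered rows with hypothetical values.
rows = [
    {"time_stamp": 1, "app": [120.0], "data": [0.6]},
    {"time_stamp": 2, "app": [130.0], "data": [0.7]},
    {"time_stamp": 3, "app": [150.0], "data": [0.8]},
]
pairs = make_training_pairs(rows)
for x, y in pairs:
    print(x, "->", y)
```

With T rows this produces T - 1 pairs, since the last row has no following row from which to take a target.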
3) As shown in fig. 10, the operating state is pre-judged and emergency plans are simulated: predefined rules (including but not limited to conventional logic rules or neural networks) are used to judge whether the future operating state is normal, and feedback or emergency plan simulation is then performed according to the model and the judgment result. Specifically, the first a prediction vectors of the simulation prediction matrix are spliced with the last K-a data vectors of the input matrix, the application behavior data in the first K time units is used, and the vectors of the last N time units are all zero vectors. This step amounts to asking the model, under the current application behavior pattern, for a prediction of the cluster performance attributes over the next N time units after the emergency plan has been adopted. The new prediction is then input into the cluster operating state judgment model. If the model judges that the cluster will not become abnormal in the future, the simulation stops and the plans already enqueued in the plan queue are executed in order. Otherwise the simulation is repeated with another emergency plan; if the cluster still cannot be restored to normal after B self-repair attempts, an alarm is sent to operation and maintenance personnel to prompt manual intervention.
The big data cluster performance prediction method provided by the embodiments of the present application can be applied to the relatively common big data clusters built on the ELK framework. The modeling flow is short, the modeling method is simple, and a reliable cluster self-repair mechanism can be established on the basis of a reliable Informer model. The embodiments of the present application address problems of conventional modeling schemes for intelligent monitoring of big data clusters, such as numerous hyperparameters, weak correlation among sub-models, and many modeling steps, and provide a modeling method that has few modeling steps, correlates multiple prediction indexes, supports multiple prediction targets, and includes a self-repair attempt mechanism.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are also not necessarily performed sequentially, but may be performed in turn or alternately with at least some of the other steps, sub-steps, or stages.
Based on the same inventive concept, the embodiment of the application also provides a big data cluster performance prediction device for implementing the big data cluster performance prediction method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the device embodiments provided below, reference may be made to the limitations of the big data cluster performance prediction method above, which are not repeated here.
In one embodiment, as shown in FIG. 11, a large data cluster performance prediction apparatus 1100 is provided. The big data cluster performance prediction apparatus 1100 includes:
the data acquisition module 1102 is configured to acquire attribute data of a target cluster and behavior data of a target application at each target moment, where the target application runs on the target cluster;
a matrix construction module 1104, configured to construct an input matrix according to attribute data of the target cluster and behavior data of the target application at each target moment;
the prediction module 1106 is configured to determine a performance prediction matrix of the target cluster based on the input matrix, where the performance prediction matrix includes attribute data of the target cluster and behavior data of the target application in N time units in the future, and N is a positive integer;
the judging module 1108 is configured to perform abnormality judgment on the performance prediction matrix according to a preset threshold matrix and determine a judgment result, where the judgment result is used to characterize whether the attribute data of the target cluster is abnormal in N time units in the future;
the simulation repair module 1110 is configured to process the performance prediction matrix based on a preset repair policy to obtain a simulation repair result for the target cluster when the judgment result indicates that the attribute data of the target cluster is abnormal in N time units in the future.
According to the big data cluster performance prediction device provided by the embodiment of the application, the attribute data of the target cluster and the behavior data of the target application in N future time units are predicted to obtain the performance prediction matrix; whether the future running state of the target cluster is normal is judged using the predefined threshold matrix; and simulated repair of the target cluster is then carried out according to the judgment result and the preset repair strategy, thereby realizing intelligent monitoring of the big data cluster with a self-repair attempt mechanism.
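The way the five modules hand data to one another can be sketched as a small pipeline. The class name `ClusterPerformancePredictor`, its field names, and the use of plain callables as module stand-ins are assumptions made for illustration; the application does not prescribe any particular software structure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]


@dataclass
class ClusterPerformancePredictor:
    """Hypothetical wiring of modules 1102-1110; each callable is a stand-in."""

    acquire: Callable[[], Matrix]                # data acquisition module 1102
    build_input: Callable[[Matrix], Matrix]      # matrix construction module 1104
    predict: Callable[[Matrix], Matrix]          # prediction module 1106
    is_abnormal: Callable[[Matrix], bool]        # judging module 1108
    simulate_repair: Callable[[Matrix], Matrix]  # simulation repair module 1110

    def run(self) -> Tuple[Matrix, bool, Optional[Matrix]]:
        data_vectors = self.acquire()
        input_matrix = self.build_input(data_vectors)
        prediction = self.predict(input_matrix)
        abnormal = self.is_abnormal(prediction)
        # Simulated repair is attempted only when an abnormality is predicted.
        repair = self.simulate_repair(prediction) if abnormal else None
        return prediction, abnormal, repair
```

Each stand-in can later be replaced by a concrete implementation (for example, a trained model behind `predict`) without changing the control flow.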
In one embodiment, the matrix construction module 1104 is further configured to construct, for any target time, a data vector corresponding to the target time according to the target time, the attribute data of the target cluster at the target time, and the behavior data of the target application; and to arrange the K rows of data vectors corresponding to the target times in chronological order and splice them with N rows of zero vectors to obtain the input matrix, where K is a positive integer.
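A minimal sketch of this construction is given below. The tuple layout `(time, attributes, behaviors)` and the function name `build_input_matrix` are assumptions; the application only specifies that each data vector combines the target time, the cluster attribute data, and the application behavior data.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, Sequence[float], Sequence[float]]


def build_input_matrix(samples: Sequence[Sample], n: int) -> List[List[float]]:
    """Stack K time-ordered data vectors on top of N zero vectors."""
    # One data vector per target time: [time, attributes..., behaviors...].
    rows = [
        [float(t), *attrs, *behaviors]
        for t, attrs, behaviors in sorted(samples, key=lambda s: s[0])
    ]
    width = len(rows[0])
    # The N zero rows are placeholders for the time units to be predicted.
    zeros = [[0.0] * width for _ in range(n)]
    return rows + zeros


# Two target times (K = 2), one attribute and one behavior value each, N = 2.
matrix = build_input_matrix([(2, [0.6], [12.0]), (1, [0.5], [10.0])], n=2)
```

The resulting matrix has K + N rows, with the zero rows occupying the positions of the N future time units.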
In one embodiment, the prediction module 1106 is further configured to process the input matrix using a target Informer model to obtain a performance prediction matrix for the target cluster.
In one embodiment, the judging module 1108 is further configured to perform a scalar calculation on the elements at corresponding positions of the preset threshold matrix and the performance prediction matrix to determine a state vector; and, when the norm (modulus) of the state vector is greater than zero, to obtain a judgment result indicating that the attribute data of the target cluster is abnormal in N time units in the future.
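One possible reading of this judgment is sketched below. The application does not spell out the element-wise calculation, so the comparison used here (a predicted value counts as a violation when it exceeds its threshold) and the per-row violation count used as the state vector are assumptions, as is the function name `judge`.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def judge(prediction: Matrix, thresholds: Matrix) -> Tuple[List[float], bool]:
    """Return the state vector and whether any abnormality is predicted."""
    # Assumed rule: state[i] counts the values in row i above their threshold.
    state = [
        float(sum(1 for p, t in zip(p_row, t_row) if p > t))
        for p_row, t_row in zip(prediction, thresholds)
    ]
    # Euclidean norm of the state vector; zero only if no row has a violation.
    norm = math.sqrt(sum(s * s for s in state))
    return state, norm > 0


state, abnormal = judge(
    prediction=[[0.5, 100.0], [0.95, 100.0]],
    thresholds=[[0.9, 200.0], [0.9, 200.0]],
)
```

Because every entry of the state vector is non-negative, testing the norm against zero is equivalent to asking whether any predicted time unit violates a threshold.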
In one embodiment, the performance prediction matrix includes N rows of prediction vectors, where each prediction vector includes the attribute data of the target cluster and the behavior data of the target application corresponding to one prediction time within the N future time units. The simulation repair module 1110 is further configured to change any numerical value in the prediction vectors of the performance prediction matrix based on a preset repair policy to obtain a simulation prediction matrix; to select the first a rows of prediction vectors from the simulation prediction matrix and splice them with the last K-a rows of data vectors of the input matrix and N rows of zero vectors to obtain a simulation input matrix, where a is a positive integer; and to process the simulation input matrix using the target Informer model to obtain a simulation repair result for the target cluster.
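The splicing step can be sketched as follows. Two assumptions are made here: the repair policy is modeled as scaling a single column (the hypothetical `apply_policy`), and the rows are placed in chronological order, that is, the last K-a historical rows first, then the first a simulated rows, then the N zero rows. The application does not state the row order explicitly.

```python
from typing import List

Matrix = List[List[float]]


def apply_policy(prediction: Matrix, column: int, factor: float) -> Matrix:
    """Hypothetical repair policy: scale one column of every prediction vector."""
    return [
        [v * factor if j == column else v for j, v in enumerate(row)]
        for row in prediction
    ]


def build_simulation_input(input_matrix: Matrix, simulated: Matrix, k: int, a: int) -> Matrix:
    """Splice K-a historical rows, a simulated rows, and N zero rows."""
    n = len(input_matrix) - k              # the input matrix has K + N rows
    width = len(input_matrix[0])
    history_tail = input_matrix[a:k]       # last K-a data vectors
    zeros = [[0.0] * width for _ in range(n)]
    return history_tail + simulated[:a] + zeros


# K = 3 historical rows and N = 2 zero rows; columns are [time, load].
input_matrix = [[1.0, 0.5], [2.0, 0.75], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
prediction = [[4.0, 1.0], [5.0, 1.0]]
simulated = apply_policy(prediction, column=1, factor=0.5)
sim_input = build_simulation_input(input_matrix, simulated, k=3, a=1)
```

The simulation input matrix keeps the same K + N shape as the original input matrix, so it can be passed to the same model without modification.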
In one embodiment, the big data cluster performance prediction apparatus 1100 further includes a training module, where the training module is configured to obtain sample attribute data of the target cluster and sample behavior data of the target application at each sample time, taking the last N sample times as labeling times and the remaining sample times as training times; to construct a sample input matrix according to the sample attribute data of the target cluster and the sample behavior data of the target application at each training time, where the sample input matrix includes a plurality of rows of sample data vectors arranged in chronological order, and each sample data vector includes the training time, the sample attribute data of the target cluster at the training time, and the sample behavior data of the target application; and to train the initial Informer model based on the sample input matrix to obtain the target Informer model.
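The split between training times and labeling times can be illustrated with a short sketch. It assumes that each sample row starts with its timestamp; the function name `split_sample_times` is hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def split_sample_times(samples: Matrix, n: int) -> Tuple[Matrix, Matrix]:
    """Use the last N sample times as labels and the rest for training."""
    if not 0 < n < len(samples):
        raise ValueError("n must be between 1 and len(samples) - 1")
    # Assumption: the first element of each row is the sample time.
    ordered = sorted(samples, key=lambda row: row[0])
    return ordered[:-n], ordered[-n:]


training_rows, label_rows = split_sample_times(
    [[3.0, 0.9], [1.0, 0.5], [2.0, 0.7], [4.0, 0.95]], n=1
)
```

The training rows form the sample input matrix, while the label rows provide the ground truth for the N time units that the model is asked to predict.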
In one embodiment, the training module is further configured to use the rows of sample data vectors in the sample input matrix as an encoder input sequence; to splice the last K rows of sample data vectors of the sample input matrix with N rows of zero vectors to obtain a decoder input sequence; and to train the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain the target Informer model.
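A minimal sketch of how the two input sequences might be derived from the sample input matrix is shown below; the function name `make_sequences` and the plain-list representation are assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def make_sequences(sample_input: Matrix, k: int, n: int) -> Tuple[Matrix, Matrix]:
    """Derive encoder and decoder input sequences from the sample input matrix."""
    if not 0 < k <= len(sample_input):
        raise ValueError("k must be between 1 and the number of sample rows")
    width = len(sample_input[0])
    # Encoder input: every row of the sample input matrix.
    encoder_input = [list(row) for row in sample_input]
    # Decoder input: the last K rows followed by N zero placeholder rows.
    decoder_input = [list(row) for row in sample_input[-k:]]
    decoder_input += [[0.0] * width for _ in range(n)]
    return encoder_input, decoder_input


enc, dec = make_sequences([[1.0, 0.5], [2.0, 0.7], [3.0, 0.9]], k=2, n=2)
```

The zero rows at the end of the decoder input mark the N positions whose values the decoder is expected to generate.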
In one embodiment, the training module is further configured to input the encoder input sequence into an encoder of the initial Informer model to obtain hidden layer features; to input the decoder input sequence and the hidden layer features into a decoder of the initial Informer model to obtain prediction data corresponding to each labeling time; and to train the initial Informer model according to the difference between the prediction data corresponding to each labeling time and the sample attribute data of the target cluster and sample behavior data of the target application at that labeling time, to obtain the target Informer model.
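The data flow of one training step can be sketched as follows. The application only speaks of a "difference" between predictions and labels; mean squared error is used here as an assumed choice. The `encoder` and `decoder` arguments are stand-in callables, not an actual Informer implementation, and the parameter update itself is omitted.

```python
from typing import Any, Callable, List

Matrix = List[List[float]]


def training_loss(
    encoder: Callable[[Matrix], Any],
    decoder: Callable[[Matrix, Any], Matrix],
    encoder_input: Matrix,
    decoder_input: Matrix,
    labels: Matrix,
) -> float:
    """Compute an assumed mean squared error over the N labeling times."""
    hidden = encoder(encoder_input)            # hidden layer features
    output = decoder(decoder_input, hidden)    # one output row per decoder row
    predicted = output[-len(labels):]          # rows at the N labeling times
    squared = [
        (p - y) ** 2
        for p_row, y_row in zip(predicted, labels)
        for p, y in zip(p_row, y_row)
    ]
    return sum(squared) / len(squared)


# Toy stand-ins: the "encoder" keeps the last row, the "decoder" repeats it.
loss = training_loss(
    encoder=lambda seq: seq[-1],
    decoder=lambda dec, hidden: [list(hidden) for _ in dec],
    encoder_input=[[1.0, 2.0], [3.0, 4.0]],
    decoder_input=[[3.0, 4.0], [0.0, 0.0]],
    labels=[[3.0, 6.0]],
)
```

In an actual training loop this loss would be minimized with respect to the model parameters, for example by gradient descent.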
The modules in the big data cluster performance prediction device can be implemented in whole or in part by software, hardware and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing big data cluster attribute data and application behavior data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a big data cluster performance prediction method.
It will be appreciated by those skilled in the art that the structure shown in fig. 12 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to fall within the scope of this description.
The above examples represent only a few embodiments of the present application. They are described in relative detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art could make various modifications and improvements without departing from the spirit of the present application, and these would fall within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims (12)

1. A method for predicting performance of a big data cluster, the method comprising:
acquiring attribute data of a target cluster and behavior data of a target application at each target moment, wherein the target application runs on the target cluster;
constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target time;
determining a performance prediction matrix of the target cluster based on the input matrix, wherein the performance prediction matrix comprises attribute data of the target cluster and behavior data of the target application under N time units in the future, and N is a positive integer;
performing abnormality judgment on the performance prediction matrix according to a preset threshold matrix, and determining a judgment result, wherein the judgment result is used for representing whether attribute data of the target cluster are abnormal or not under N time units in the future;
and under the condition that the judging result represents that the attribute data of the target cluster is abnormal in N time units in the future, processing the performance prediction matrix based on a preset repair strategy to obtain a simulation repair result aiming at the target cluster.
2. The method according to claim 1, wherein the constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each of the target time instants comprises:
aiming at any target moment, constructing a data vector corresponding to the target moment according to the target moment, attribute data of the target cluster under the target moment and behavior data of the target application;
and arranging the K rows of data vectors corresponding to the target moments in chronological order, and splicing them with N rows of zero vectors to obtain the input matrix, wherein K is a positive integer.
3. The method of claim 1, wherein the determining a performance prediction matrix for the target cluster based on the input matrix comprises:
and processing the input matrix by adopting a target Informer model to obtain the performance prediction matrix of the target cluster.
4. The method according to claim 1, wherein the performing anomaly determination on the performance prediction matrix according to a preset threshold matrix, and determining a determination result includes:
performing a scalar calculation on the elements at corresponding positions of the preset threshold matrix and the performance prediction matrix, and determining a state vector;
and under the condition that the norm (modulus) of the state vector is greater than zero, obtaining a judgment result representing that the attribute data of the target cluster is abnormal under N time units in the future.
5. The method according to claim 1, wherein the performance prediction matrix includes N rows of prediction vectors, each prediction vector includes attribute data of the target cluster and behavior data of the target application corresponding to one prediction time within the N future time units, and the processing the performance prediction matrix based on a preset repair policy to obtain a simulated repair result for the target cluster includes:
changing any numerical value in the prediction vectors in the performance prediction matrix based on the preset repair strategy to obtain a simulation prediction matrix;
selecting the first a rows of prediction vectors from the simulation prediction matrix, and splicing them with the last K-a rows of data vectors of the input matrix and N rows of zero vectors to obtain a simulation input matrix, wherein a is a positive integer;
and processing the simulation input matrix by adopting a target Informer model to obtain the simulation repair result aiming at the target cluster.
6. The method according to claim 3, characterized in that the method further comprises:
acquiring sample attribute data of the target cluster and sample behavior data of the target application at each sample time, wherein the last N sample times among the sample times are taken as labeling times, and the remaining sample times are taken as training times;
according to sample attribute data of the target cluster at each training moment and sample behavior data of the target application, a sample input matrix is constructed, wherein the sample input matrix comprises a plurality of rows of sample data vectors which are arranged in time sequence, and the sample data vectors comprise the training moment, the sample attribute data of the target cluster at the training moment and the sample behavior data of the target application;
and training an initial Informer model based on the sample input matrix to obtain the target Informer model.
7. The method of claim 6, wherein training an initial Informer model based on the sample input matrix to obtain the target Informer model comprises:
taking each row of the sample data vectors in the sample input matrix as an encoder input sequence;
splicing the last K rows of sample data vectors of the sample input matrix with N rows of zero vectors to obtain a decoder input sequence;
training the initial Informer model based on the encoder input sequence and the decoder input sequence to obtain the target Informer model.
8. The method of claim 7, wherein training an initial Informer model based on the encoder input sequence and the decoder input sequence to obtain the target Informer model comprises:
inputting the encoder input sequence into an encoder of the initial Informer model to obtain hidden layer characteristics;
inputting the decoder input sequence and the hidden layer characteristics into a decoder of the initial Informer model to obtain prediction data corresponding to each labeling moment;
and training the initial Informer model according to the difference between the prediction data corresponding to each labeling moment and the sample attribute data of the target cluster and sample behavior data of the target application at that labeling moment, to obtain the target Informer model.
9. A big data cluster performance prediction apparatus, the apparatus comprising:
the data acquisition module is used for acquiring attribute data of a target cluster and behavior data of a target application at each target moment, wherein the target application runs on the target cluster;
the matrix construction module is used for constructing an input matrix according to the attribute data of the target cluster and the behavior data of the target application at each target time;
the prediction module is used for determining a performance prediction matrix of the target cluster based on the input matrix, wherein the performance prediction matrix comprises attribute data of the target cluster and behavior data of the target application under N time units in the future, and N is a positive integer;
the judging module is used for carrying out abnormal judgment on the performance prediction matrix according to a preset threshold matrix, and determining a judging result, wherein the judging result is used for representing whether the attribute data of the target cluster is abnormal or not under N time units in the future;
and the simulation repair module is used for processing the performance prediction matrix based on a preset repair strategy under the condition that the judging result represents that the attribute data of the target cluster is abnormal in N time units in the future, so as to obtain a simulation repair result aiming at the target cluster.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
CN202310198721.7A 2023-02-24 2023-02-24 Big data cluster performance prediction method and device and computer equipment Pending CN116149895A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310198721.7A CN116149895A (en) 2023-02-24 2023-02-24 Big data cluster performance prediction method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN116149895A true CN116149895A (en) 2023-05-23

Family

ID=86350605

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310198721.7A Pending CN116149895A (en) 2023-02-24 2023-02-24 Big data cluster performance prediction method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN116149895A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116679890A (en) * 2023-08-02 2023-09-01 湖南惟储信息技术有限公司 Storage device security management system and method thereof
CN116679890B (en) * 2023-08-02 2023-09-29 湖南惟储信息技术有限公司 Storage device security management system and method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination