CN116820883A - Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning - Google Patents
- Publication number: CN116820883A (application number CN202310783248.9A)
- Authority
- CN
- China
- Prior art keywords
- disk
- action
- strategy
- network
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F11/3034—Monitoring arrangements specially adapted to the computing system or component being monitored, where the component is a storage system, e.g. DASD based or network based
- G06F11/3409—Recording or statistical evaluation of computer activity for performance assessment
- G06F11/3447—Performance evaluation by modeling
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/047—Probabilistic or stochastic networks
- G06N3/048—Activation functions
- G06N3/092—Reinforcement learning
Abstract
The invention discloses an intelligent disk monitoring and optimizing system and method based on deep reinforcement learning. The system comprises a health evaluation module, a strategy adjustment module and an optimizer; the strategy adjustment module comprises a strategy network, a target network and an experience replay buffer. The health evaluation module acquires the overall health level of the disks; the strategy network acquires the corresponding actions and states according to that health level; the target network acquires, in the training stage, the behavior values corresponding to the target state and target action; the experience replay buffer stores the training-stage data; the optimizer computes the loss function and updates the parameters of the strategy network based on it. Because the reinforcement learning method trains the optimal redundancy strategy and the disk cleaning period simultaneously, the adaptability and reliability of the system are strengthened, data loss becomes less likely, and management becomes easier; the health condition of the disks is evaluated by deep learning, and the system is trained by reinforcement learning, which improves accuracy.
Description
Technical Field
The invention relates to the technical field of data center management and disk health monitoring, and in particular to an intelligent disk monitoring and optimizing system and method based on deep reinforcement learning.
Background
Disk health prediction is an important means of improving disk reliability and avoiding data loss; many studies use machine learning techniques to predict disk failure from features extracted from SMART (Self-Monitoring, Analysis and Reporting Technology) data.
Because it reflects the current condition and future trend of a disk, disk health prediction can be used to improve reliability by adjusting redundancy settings and disk cleaning. Disk-adaptive redundancy is a technique for dynamically adjusting redundancy settings based on disk reliability in a clustered storage system; current implementations adjust redundancy with a standard window-based change-point detection algorithm, which has poorer timeliness and lower prediction accuracy than proactive prediction methods. Disk cleaning (also called scrubbing) is the process of periodically reading disks to detect latent sector errors and repair them where possible; setting a different cleaning rate for each disk, or even for different regions of each disk, can make the storage system harder to manage and can cause data inconsistencies during cleaning, leading to data loss or other problems. Moreover, existing schemes treat disk redundancy and disk cleaning as independent parts rather than as one integrated system, so the accuracy and reliability of disk health monitoring are low, data is easily lost, and storage is difficult to manage.
Disclosure of Invention
Aiming at the above defects in the prior art, the intelligent disk monitoring and optimizing system and method based on deep reinforcement learning provided by the invention solve the prior-art problems of low accuracy and reliability in monitoring disk health, easy data loss and difficult storage management.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
The intelligent disk monitoring and optimizing system based on deep reinforcement learning comprises a health evaluation module, a strategy adjustment module and an optimizer. The strategy adjustment module adopts a deep Q-network (DQN) model; the deep Q-network model includes a strategy network, a target network and an experience replay buffer;
the health evaluation module is used for acquiring the overall health level of disks of different brands;
the strategy network is used for acquiring corresponding actions and states according to the overall health level of disks of different brands; the actions are used for the intelligent monitoring and optimization of the disks;
the target network is used for acquiring, in the training stage, the behavior values corresponding to the target state and the target action;
the experience replay buffer is used for storing the states, actions and rewards of the training stage;
the optimizer is used for computing a loss function from the outputs of the strategy network and the target network, and for updating the parameters of the strategy network based on that loss function.
There is provided a monitoring and optimization method comprising the steps of:
s1, obtaining health scores of magnetic discs of different brands through a health evaluation module;
s2, constructing reinforcement learning intelligent agents and rewarding functions; initializing an environment, a strategy adjustment module, an optimizer, a reward list and the current training times;
s3, setting the ep reward to 0;
s4, obtaining the overall health level of the brand according to the health score of the magnetic disk of the specific brand; simulating a damaged portion of the disk according to the overall health level of the brand; initializing the current step number;
s5, generating a random number and judging whether the random number is smaller than the exploration rate, if so, entering a step S6; otherwise, entering step S7;
s6, randomly selecting an action; step S8 is entered;
s7, selecting the action with the maximum action value in the current state through a strategy network; step S8 is entered;
s8, executing actions in the environment to obtain the next state and rewards; storing the current state, the executed action and the acquired rewards to an experience playback buffer;
s9, judging whether the size of the experience playback buffer area is larger than or equal to a set value, if so, entering a step S10; otherwise, enter step S11;
s10, randomly extracting a batch of experience data in an experience playback buffer area, and respectively obtaining a target value and a predicted value through a target network and a strategy network; calculating a loss function between the target value and the predicted value according to the target value and the predicted value; parameter updating of the strategy network is carried out through the minimized loss function of the optimizer, and an updated strategy network is obtained;
s11, updating the current state and information of the reinforcement learning agent and the epicode rewards according to the next state and rewards obtained in the step S8;
s12, judging whether the current step number reaches the maximum step number, if so, ending a round of training, adding the epoode rewards to a rewards list and entering a step S13; otherwise, adding 1 to the current step number and returning to the step S5;
s13, judging whether the current training times reach the maximum training times, if so, obtaining a trained intelligent disk monitoring and optimizing system and entering a step S14; otherwise, resetting the environment and the initial state, adding 1 to the current training times and returning to the step S3;
s14, deploying the trained intelligent disk monitoring and optimizing system to a data center system; obtaining the overall health level of the brand through a trained disk intelligent monitoring and optimizing system; and dynamically adjusting the corresponding disk redundancy strategy and the disk cleaning rate according to the overall health level to finish monitoring and optimizing.
Further, the health evaluation module adopts an LSTM neural network model; the LSTM neural network model comprises two LSTM layers and a fully connected layer connected in series; each LSTM layer includes 128 LSTM cells; the fully connected layer comprises 4 neurons; the LSTM layers adopt the ReLU function as activation function; the fully connected layer adopts the softmax function as activation function;
the strategy network comprises an input layer, a hidden layer and an output layer; the target network comprises an input layer, a hidden layer and an output layer; the deep Q network model is trained by adopting a Q-Learning algorithm, and environment exploration is carried out by adopting an epsilon-greedy strategy.
Further, the specific steps of step S1 are as follows:
s1-1, acquiring original SMART data of a disk to be monitored in a period of time through a monitoring acquisition system; wherein the original SMART data includes negative sample data and positive sample data;
s1-2, processing original SMART data based on feature selection and feature processing to obtain SMART features and building a training set;
s1-3, classifying disk data of different brands and models; training the training set based on deep learning to obtain the health degree of the single disk;
s1-4, according to the formula:
obtaining health scores H of specific brands and models; wherein n represents the number of hard disks in the brand, i represents the label, and w i Representing the weight assigned to tag i, p i Representing the proportion of the hard disk of tag i in the brand.
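As a minimal sketch, the weighted health score can be computed as follows; the four labels and their weight values are assumed for illustration, since the patent only specifies that normal disks receive a lower weight and failed disks a higher one.

```python
def brand_health_score(label_weights, label_proportions):
    """H = sum over labels i of w_i * p_i, where p_i is the fraction of the
    brand's hard disks carrying label i."""
    assert abs(sum(label_proportions) - 1.0) < 1e-9, "proportions must sum to 1"
    return sum(w * p for w, p in zip(label_weights, label_proportions))

# Hypothetical 4 health labels: normal, warning, degraded, failed.
H = brand_health_score([0.1, 0.4, 0.7, 1.0], [0.90, 0.05, 0.03, 0.02])
# H is approximately 0.151
```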
Further, the reward function of step S2 is:

Reward = C_1·(H/H_a)·MTTDL_diff + C_2·(H_a/H)·MTTD_diff + C_3·Space_save + C_4·Cost_diff

wherein λ represents the disk failure rate, μ the disk repair rate, k the number of redundant blocks obtained by encoding each block of data, n the total number of blocks, MTTDL the mean time to data loss, n_1 and k_1 the number of coded blocks and the number of data blocks of the new redundancy method, n_2 and k_2 the number of coded blocks and the number of data blocks of the old redundancy method, Space_save the saved storage space, Z_i the acceleration factor, N_i the number of disks, r the normal scrubbing rate, MTTD the mean time to detection, T the time span, Σ(·) the summation function, Cost the cost, and Reward the reward function (λ, μ, k, n, n_1, k_1, n_2, k_2, Z_i, N_i, r and T enter through the definitions of MTTDL, Space_save, MTTD and Cost); C_1, C_2, C_3 and C_4 are hyperparameters, H is the health score, H_a the health score of a disk at the set alert level, MTTDL_diff the change in mean time to data loss, MTTD_diff the change in mean time to detection, and Cost_diff the change in cost.
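The reward is a weighted sum of the four terms; a sketch follows, where the hyperparameter defaults C1..C4 = 1.0 are placeholders and the diff terms are assumed to be supplied by the environment.

```python
def reward(H, H_a, mttdl_diff, mttd_diff, space_save, cost_diff,
           C1=1.0, C2=1.0, C3=1.0, C4=1.0):
    """Reward = C1*(H/H_a)*MTTDL_diff + C2*(H_a/H)*MTTD_diff
              + C3*Space_save + C4*Cost_diff."""
    return (C1 * (H / H_a) * mttdl_diff
            + C2 * (H_a / H) * mttd_diff
            + C3 * space_save
            + C4 * cost_diff)
```

The two health ratios weight the reliability term and the detection-time term in opposite directions as the brand's score moves relative to the alert-level score H_a.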
Further, the initial value of the exploration rate in step S5 is set to 1.0 and gradually decreases as the reinforcement learning agent keeps interacting with the environment; the exploration rate changes according to:

ε = ε_final + (ε_start − ε_final) · e^(−steps_done / ε_decay)

wherein ε represents the exploration rate, ε_final the final exploration rate, ε_start the initial exploration rate, ε_decay the decay speed of the exploration rate, steps_done the number of steps the reinforcement learning agent has performed, and e the base of the natural logarithm.
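A sketch of this decay schedule; the final rate and decay constant below are illustrative values, since the patent only fixes the initial rate at 1.0.

```python
import math

def exploration_rate(steps_done, eps_start=1.0, eps_final=0.05, eps_decay=500.0):
    """Exponentially decaying exploration rate:
    eps = eps_final + (eps_start - eps_final) * e^(-steps_done / eps_decay)."""
    return eps_final + (eps_start - eps_final) * math.exp(-steps_done / eps_decay)
```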
Further, in step S5 the random number generation adopts the ε-greedy strategy, specifically:

according to the formula

π_k(a|s) = 1 − ε + ε/|A|,  if a = argmax_{a'} Q_k(s, a')
π_k(a|s) = ε/|A|,          otherwise

the probability π_k(a|s) of selecting action a in the current state s is acquired, and π_k(a|s) serves as the probability that the generated random number is smaller than the exploration rate; wherein A represents the action set, a the action corresponding to the maximum immediate reward, a' an action, s a state, k the time-step number, Q_k(s, a') the immediate reward for performing action a' in state s at time step k, and max_{a'} Q_k(s, a') the maximum immediate reward value over actions a' in state s at time step k.
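The per-action selection probabilities can be computed directly from the Q-values; in this minimal sketch, ties are broken toward the first maximal action.

```python
def egreedy_probs(q_values, eps):
    """pi_k(a|s): the greedy action receives 1 - eps + eps/|A|,
    every other action receives eps/|A|."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [eps / n] * n
    probs[greedy] += 1.0 - eps
    return probs
```

For example, with eps = 0.3 and three actions, the greedy action is selected with probability 0.8 and each of the other two with probability 0.1.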
Further, in step S7 the values of the different actions in the current state are calculated as:

Q(s, a) = E[ r + γ·max_{a'} Q(s', a') | s_t = s, a_t = a ]

wherein s represents the current state, a the current action, r the reward obtained from the environment, γ the discount factor, Q(s, a) the expected total reward for taking action a in state s, s' the next state, a' the next action, max_{a'} Q(s', a') the maximum expected total reward in the next state, and E[·] the expectation in the optimal Bellman equation.
Further, the target value and the loss function minimized by the optimizer in step S10 are given by:

L = (1/N) · Σ_{j=1}^{N} ( r_j + γ·max_{a'} Q_target(s'_j, a') − Q(s_j, a_j) )^2

wherein L represents the loss function, N the number of samples in the batch, j the sample index, r_j the immediate return of sample j, γ the discount factor, s'_j the next state and a' the next action, s_j and a_j the state and action of sample j, Q(s_j, a_j) the action value predicted by the strategy network for state s_j and action a_j, Q_target(s', a') the action value given by the target network in state s', and max_{a'} Q_target(s'_j, a') the future reward value expected from selecting the optimal action in the next state.
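The S10 loss can be sketched with tabular stand-ins for the two networks (dictionaries mapping (state, action) to a value); the patent uses neural networks, and the gamma value here is a placeholder.

```python
def dqn_loss(batch, q_strategy, q_target, actions, gamma=0.9):
    """Mean squared TD error over a sampled batch.

    batch items: (s_j, a_j, r_j, s_next_j, done_j). q_strategy supplies the
    predicted value Q(s_j, a_j); q_target supplies max_a' Q_target(s'_j, a')."""
    total = 0.0
    for s, a, r, s_next, done in batch:
        target = r if done else r + gamma * max(
            q_target[(s_next, a2)] for a2 in actions)
        total += (target - q_strategy[(s, a)]) ** 2
    return total / len(batch)

# Tiny example: 2 states, 2 actions, all values zero except one target entry.
qs = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
qt = dict(qs)
qt[(1, 1)] = 1.0
loss = dqn_loss([(0, 0, 1.0, 1, False)], qs, qt, actions=(0, 1), gamma=0.9)
# target = 1 + 0.9 * 1.0 = 1.9, prediction = 0, so loss is about 3.61
```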
Further, step S1-2 compensates for the original SMART data imbalance using an undersampling method.
The beneficial effects of the invention are as follows: the intelligent disk monitoring and optimizing system brings the redundancy strategy and the cleaning period into one system through reinforcement learning, so that the optimal redundancy strategy and disk cleaning period can be trained simultaneously, strengthening the adaptability and reliability of the system and making data loss less likely and management easier; the monitoring and optimizing method evaluates the health condition of the disks through deep learning and trains the intelligent disk monitoring and optimizing system through reinforcement learning, improving accuracy.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following description of embodiments of the present invention is provided to facilitate understanding by those skilled in the art; however, the invention is not limited to the scope of these embodiments, and all inventions that make use of the inventive concept fall within the protection scope of the invention as defined by the appended claims.
As shown in FIG. 1, a specific embodiment applies the system and the monitoring and optimizing method exactly as set out above: the same module structure (health evaluation module, strategy adjustment module with strategy network, target network and experience replay buffer, and optimizer), the same steps S1 to S14, and the same health-score, reward-function, exploration-rate and loss-function formulas.
In one embodiment of the invention, the target network of the intelligent disk monitoring and optimizing system is initialized with the same parameters as the strategy network, but its parameters are not updated while the system is trained. The size of the input layer of the strategy network is set to 10000, i.e. the set number of disks; the hidden layer size of the strategy network is set to 32; the size of the output layer equals the size of the action space, namely nine actions numbered 0 to 8, where 0 indicates no operation and 1 to 8 indicate different actions. When the health score is calculated, normal disks receive a lower weight and failed disks a higher weight. In addition, the accuracy of the disk health score is judged with the indexes Precision, Recall, the comprehensive evaluation index F-measure and the Matthews correlation coefficient MCC, with the corresponding formulas:
where TP represents the number of true positives, i.e. the number of positive samples correctly classified by the model as positive; TN represents the number of true negatives, i.e. the number of negative samples correctly classified as negative; FP represents the number of false positives, i.e. the number of negative samples wrongly classified as positive; FN represents the number of false negatives, i.e. the number of positive samples wrongly classified as negative.
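These four indexes follow the standard confusion-matrix definitions and can be computed directly from the four counts. A brief sketch (`classification_metrics` is an illustrative name, not from the patent):

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Precision, recall, F-measure and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F-measure: harmonic mean of precision and recall
    f_measure = 2 * precision * recall / (precision + recall)
    # Matthews correlation coefficient: balanced even on skewed classes
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, f_measure, mcc
```

MCC is the index best suited to the class imbalance noted in step S1-2, since it uses all four cells of the confusion matrix.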
The four indexes of the reward function are the mean time to data loss (MTTDL), the space saving (Space_save), the average detection time (MTTD) and the cost (Cost). The mean time to data loss (MTTDL) refers to the mean time the system can run normally before data loss occurs; it reflects the reliability of the system, and the greater the value, the higher the reliability. The space saving (Space_save) is the proportion of saved space relative to the original data space; it reflects the storage efficiency of the system, and the higher the value, the higher the storage efficiency. The average detection time (MTTD) refers to the average time the system takes to discover a disk failure after the disk cleaning period is modified; it reflects the failure detection capability of the system, and the lower the value, the stronger the detection capability. The cost (Cost) refers to the system consumption after modifying the disk cleaning period; it reflects the running cost of the system, and the lower the value, the lower the running cost. The system consumption refers to the cost of performing disk cleaning, including disk life, energy consumption, performance loss, and the like.
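Combining these four indicators as in the reward formula of claim 5 can be sketched as follows. The coefficients C1 to C4 and all input values are illustrative placeholders, and the sign conventions for the difference terms (lower MTTD and Cost being better) are left to the training setup:

```python
def reward(h, h_a, mttdl_diff, mttd_diff, space_save, cost_diff,
           c1=1.0, c2=1.0, c3=1.0, c4=1.0):
    """Reward = C1*(H/Ha)*MTTDL_diff + C2*(Ha/H)*MTTD_diff
              + C3*Space_save + C4*Cost_diff

    h        -- health score of the brand
    h_a      -- health score at the set alert level
    *_diff   -- changes in MTTDL, MTTD and Cost after the action
    """
    return (c1 * h / h_a * mttdl_diff
            + c2 * h_a / h * mttd_diff
            + c3 * space_save
            + c4 * cost_diff)
```

The H/H_a and H_a/H ratios weight the reliability and detection terms oppositely: a less healthy fleet (low H) shifts emphasis toward fast detection.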
The rewards list records the total reward earned by the reinforcement learning agent for learning and performing tasks in different environments. The exploration rate refers to the tendency of the agent to try new actions during learning; gradually reducing the exploration rate prevents the performance degradation caused by excessive exploration. The disk intelligent monitoring and optimizing system based on deep reinforcement learning adjusts its behavior strategy by tracking the performance of the agent in different states and recording the reward and the executed action for each state.
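The gradual reduction of the exploration rate can be sketched with the usual exponential decay schedule (a minimal illustration consistent with the formula in claim 6; the start, final and decay values are assumed placeholders):

```python
import math

def epsilon(steps_done, eps_start=1.0, eps_final=0.05, eps_decay=1000.0):
    """Exploration rate decaying from eps_start toward eps_final.

    eps = eps_final + (eps_start - eps_final) * exp(-steps_done / eps_decay)
    """
    return eps_final + (eps_start - eps_final) * math.exp(-steps_done / eps_decay)
```

At step 0 the agent explores with probability eps_start; as steps_done grows the rate approaches eps_final, so late training is dominated by the learned policy.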
In summary, the redundancy strategy and the cleaning period are integrated into one system through the reinforcement learning method, so that the optimal redundancy strategy and disk cleaning period can be trained simultaneously, enhancing the self-adaptability and reliability of the system and making data less likely to be lost and easier to manage; according to the monitoring and optimizing method, the health condition of the disk is evaluated through deep-learning-based detection, and the disk intelligent monitoring and optimizing system is trained through reinforcement learning, thereby improving accuracy.
Claims (10)
1. A disk intelligent monitoring and optimizing system based on deep reinforcement learning is characterized in that: the system comprises a health evaluation module, a strategy adjustment module and an optimizer; the strategy adjustment module adopts a deep Q network model; the deep Q network model includes a policy network, a target network, and an experience playback buffer;
the health evaluation module is used for acquiring the overall health level of the magnetic discs of different brands;
the strategy network is used for acquiring corresponding actions and states according to the overall health level of the magnetic disks of different brands; the actions are used for intelligent monitoring and optimizing of the magnetic disk;
the target network is used for acquiring a target state and a behavior value corresponding to a target action in a training stage;
the experience playback buffer area is used for storing the state, action and rewards of the training stage;
and the optimizer is used for acquiring a loss function according to the output of the strategy network and the target network and updating the parameters of the strategy network based on the loss function.
2. The intelligent monitoring and optimization system for a disk based on deep reinforcement learning of claim 1, wherein: the health evaluation module adopts an LSTM neural network model; the LSTM neural network model comprises two LSTM layers and a full connection layer which are connected in series; each LSTM layer comprises 128 LSTM units; the fully-connected layer comprises 4 neurons; the LSTM layer adopts a ReLU function as an activation function; the fully connected layer adopts a softmax function as an activation function;
the strategy network comprises an input layer, a hidden layer and an output layer; the target network comprises an input layer, a hidden layer and an output layer; the deep Q network model is trained by adopting a Q-Learning algorithm, and environment exploration is carried out by adopting an epsilon-greedy strategy.
3. A method for monitoring and optimizing by the disk intelligent monitoring and optimizing system based on deep reinforcement learning as set forth in claim 1 or 2, characterized in that: the method comprises the following steps:
s1, obtaining health scores of magnetic discs of different brands through a health evaluation module;
s2, constructing reinforcement learning intelligent agents and rewarding functions; initializing an environment, a strategy adjustment module, an optimizer, a reward list and the current training times;
s3, setting the ep reward to 0;
s4, obtaining the overall health level of the brand according to the health score of the magnetic disk of the specific brand; simulating a damaged portion of the disk according to the overall health level of the brand; initializing the current step number;
s5, generating a random number and judging whether the random number is smaller than the exploration rate, if so, entering a step S6; otherwise, entering step S7;
s6, randomly selecting an action; step S8 is entered;
s7, selecting the action with the maximum action value in the current state through a strategy network; step S8 is entered;
s8, executing actions in the environment to obtain the next state and rewards; storing the current state, the executed action and the acquired rewards to an experience playback buffer;
s9, judging whether the size of the experience playback buffer area is larger than or equal to a set value, if so, entering a step S10; otherwise, enter step S11;
s10, randomly extracting a batch of experience data in an experience playback buffer area, and respectively obtaining a target value and a predicted value through a target network and a strategy network; calculating a loss function between the target value and the predicted value according to the target value and the predicted value; parameter updating of the strategy network is carried out through the minimized loss function of the optimizer, and an updated strategy network is obtained;
s11, updating the current state and information of the reinforcement learning agent and the epicode rewards according to the next state and rewards obtained in the step S8;
s12, judging whether the current step number reaches the maximum step number, if so, ending a round of training, adding the epoode rewards to a rewards list and entering a step S13; otherwise, adding 1 to the current step number and returning to the step S5;
s13, judging whether the current training times reach the maximum training times, if so, obtaining a trained intelligent disk monitoring and optimizing system and entering a step S14; otherwise, resetting the environment and the initial state, adding 1 to the current training times and returning to the step S3;
s14, deploying the trained intelligent disk monitoring and optimizing system to a data center system; obtaining the overall health level of the brand through a trained disk intelligent monitoring and optimizing system; and dynamically adjusting the corresponding disk redundancy strategy and the disk cleaning rate according to the overall health level to finish monitoring and optimizing.
4. A method of monitoring and optimizing as claimed in claim 3, wherein: the specific steps of the step S1 are as follows:
s1-1, acquiring original SMART data of a disk to be monitored in a period of time through a monitoring acquisition system; wherein the original SMART data includes negative sample data and positive sample data;
s1-2, processing the original SMART data based on feature selection and feature processing to obtain SMART features and building a training set;
s1-3, classifying disk data of different brands and models; training the training set based on deep learning to obtain the health degree of the single disk;
s1-4, according to the formula:
H = Σ_{i} w_i · p_i

obtaining the health score H of a specific brand and model; wherein n represents the number of hard disks of the brand, i represents the label, w_i represents the weight assigned to label i, and p_i represents the proportion of hard disks with label i in the brand.
5. A method of monitoring and optimizing as claimed in claim 3, wherein: the formula of the reward function of the step S2 is:
Reward = C_1·(H/H_a)·MTTDL_diff + C_2·(H_a/H)·MTTD_diff + C_3·Space_save + C_4·Cost_diff

wherein λ represents the disk failure rate, μ represents the disk repair rate, k represents the k redundant blocks obtained by encoding each block of data, n represents the total number of blocks, MTTDL represents the mean time to data loss, n_1 and k_1 respectively represent the number of coded blocks and the number of data blocks of the new redundancy method, n_2 and k_2 respectively represent the number of coded blocks and the number of data blocks of the old redundancy method, Space_save represents the saved storage space, Z_i represents an acceleration factor, N_i represents the number of disks, r represents the normal scrubbing rate, MTTD represents the average detection time, T represents the time span, Σ(·) represents the summing function, Cost represents the cost, Reward represents the reward function, C_1, C_2, C_3 and C_4 represent hyper-parameters, H represents the health score, H_a represents the health score of a disk at the set alert level, MTTDL_diff represents the amount of change in mean time to data loss, MTTD_diff represents the amount of change in average detection time, and Cost_diff represents the amount of change in cost.
6. A method of monitoring and optimizing as claimed in claim 3, wherein: the initial value of the exploration rate in the step S5 is set to be 1.0, and gradually decreases along with continuous interaction between the reinforcement learning intelligent agent and the environment; the change process of the exploration rate is shown in the following formula:
ε = ε_final + (ε_start − ε_final)·e^(−steps_done/ε_decay)

wherein ε represents the exploration rate, ε_final represents the final exploration rate, ε_start represents the initial exploration rate, ε_decay represents the decay speed of the exploration rate, steps_done represents the number of steps the reinforcement learning agent has performed, and e represents the natural constant.
7. A method of monitoring and optimizing as claimed in claim 3, wherein: the random number generation in the step S5 adopts an epsilon-greedy strategy, and the specific method is as follows:
according to the formula:
π_k(a|s) = 1 − ε + ε/|A|, if a = argmax_{a′} Q_k(s, a′); otherwise π_k(a|s) = ε/|A|

acquiring the probability π_k(a|s) of selecting action a in the current state s, and using π_k(a|s) as the probability against which the generated random number is compared with the exploration rate; wherein A represents the action set and |A| its size, a represents the action corresponding to the maximum immediate reward, a′ represents an action, s represents a state, k represents the time step number, Q_k(s, a′) represents the immediate reward corresponding to performing action a′ in state s at time step k, and max_{a′} Q_k(s, a′) represents the maximum immediate reward over actions a′ in state s at time step k.
8. A method of monitoring and optimizing as claimed in claim 3, wherein: the calculation formula of the values of the different actions in the current state in the step S7 is as follows:
Q(s, a) = E[ r + γ·max_{a′} Q(s′, a′) | s, a ]

where s represents the current state, a represents the current action, r represents the reward obtained from the environment, γ represents the discount factor, Q(s, a) represents the expected total reward for taking action a in state s, s′ represents the next state, a′ represents the next action, max_{a′} Q(s′, a′) represents the maximum expected total reward in the next state, and E[·] represents the expectation in the optimal Bellman equation.
9. A method of monitoring and optimizing as claimed in claim 3, wherein: the formula of the target value and the minimum loss function obtained by the optimizer in the step S10 is as follows:
y_j = r_j + γ·max_{a′} Q′(s′, a′)
L = (1/N)·Σ_{j=1}^{N} (y_j − Q(s_j, a_j))²

wherein L represents the loss function to be minimized, N represents the number of samples, j represents the current step number, r_j represents the immediate reward at step j, γ represents the discount factor, s′ represents the next state, a′ represents the next action, s_j and a_j represent the state and action at step j, Q(s_j, a_j) represents the action value output by the policy network for state s_j and action a_j, Q′(s′, a′) represents the action value of the target network in state s′, π(a′|s′) represents the probability that the policy network selects action a′ in state s′, and max_{a′} Q′(s′, a′) represents the future reward expected from selecting the optimal action in the next state.
10. A method of monitoring and optimizing as claimed in claim 3, wherein: the step S1-2 adopts an undersampling method to compensate for the imbalance of the original SMART data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310783248.9A CN116820883A (en) | 2023-06-28 | 2023-06-28 | Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116820883A true CN116820883A (en) | 2023-09-29 |
Family
ID=88125431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310783248.9A Pending CN116820883A (en) | 2023-06-28 | 2023-06-28 | Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116820883A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117575174A (en) * | 2024-01-15 | 2024-02-20 | 山东环球软件股份有限公司 | Intelligent agricultural monitoring and management system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110413227B (en) | Method and system for predicting remaining service life of hard disk device on line | |
JP7105932B2 (en) | Anomaly detection using deep learning on time series data related to application information | |
CN110399238B (en) | Disk fault early warning method, device, equipment and readable storage medium | |
US7493300B2 (en) | Model and system for reasoning with N-step lookahead in policy-based system management | |
CN108647136A (en) | Hard disk corruptions prediction technique and device based on SMART information and deep learning | |
CN111178553A (en) | Industrial equipment health trend analysis method and system based on ARIMA and LSTM algorithms | |
CN116820883A (en) | Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning | |
CN108446734A (en) | Disk failure automatic prediction method based on artificial intelligence | |
Blouw et al. | Event-driven signal processing with neuromorphic computing systems | |
CN112433896B (en) | Method, device, equipment and storage medium for predicting server disk faults | |
CN116683588B (en) | Lithium ion battery charge and discharge control method and system | |
CN112446557B (en) | Disk failure prediction evasion method and system based on deep learning | |
KR20210082349A (en) | Method and apparatus for determining storage load of application | |
CN112988550A (en) | Server failure prediction method, device and computer readable medium | |
Li et al. | Prediction of HDD failures by ensemble learning | |
Wang et al. | Evaluation and prediction method of rolling bearing performance degradation based on attention-LSTM | |
KR102480518B1 (en) | Method for credit evaluation model update or replacement and apparatus performing the method | |
CN115617604A (en) | Disk failure prediction method and system based on image pattern matching | |
KR20080087571A (en) | Context prediction system and method thereof | |
CN114227701A (en) | Robot fault prediction method based on production data | |
CN113268782A (en) | Machine account identification and camouflage countermeasure method based on graph neural network | |
CN112395167A (en) | Operation fault prediction method and device and electronic equipment | |
CN117473445B (en) | Extreme learning machine-based equipment abnormality analysis method and device | |
CN117556221B (en) | Data analysis method and system based on intelligent electrical control interaction session | |
CN116894658A (en) | Method for predicting faults of internal and external equipment in warranty period based on attribute characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||