CN116820883A - Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning - Google Patents

Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning

Info

Publication number
CN116820883A
CN116820883A (application CN202310783248.9A)
Authority
CN
China
Prior art keywords: disk, action, strategy, network, representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310783248.9A
Other languages
Chinese (zh)
Inventor
邵杰
苏薄
付骏峰
何鸿才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Original Assignee
Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Higher Research Institute Of University Of Electronic Science And Technology Shenzhen filed Critical Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Priority to CN202310783248.9A priority Critical patent/CN116820883A/en
Publication of CN116820883A publication Critical patent/CN116820883A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3034Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a storage system, e.g. DASD based or network based
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3447Performance evaluation by modeling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • G06N3/0442Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/092Reinforcement learning

Abstract

The invention discloses an intelligent disk monitoring and optimizing system and method based on deep reinforcement learning, wherein the system comprises a health evaluation module, a strategy adjustment module and an optimizer; the strategy adjustment module comprises a strategy network, a target network and an experience playback buffer; the health evaluation module is used for acquiring the overall health level of the disk; the strategy network is used for acquiring corresponding actions and states according to the overall health level of the disk; the target network is used for acquiring a target state and the behavior value corresponding to a target action in the training stage; the experience playback buffer is used for storing data of the training stage; the optimizer is used for acquiring the loss function and updating the parameters of the strategy network based on the loss function. According to the invention, the optimal redundancy strategy and the disk cleaning period can be trained simultaneously by the reinforcement learning method, so that the self-adaptability and reliability of the system are enhanced, data is not easily lost and is easy to manage; the health condition of the disk is evaluated through deep learning, and the system is trained through reinforcement learning, so that the accuracy is improved.

Description

Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning
Technical Field
The invention relates to the technical field of data center management and disk health monitoring, in particular to a disk intelligent monitoring and optimizing system and method based on deep reinforcement learning.
Background
Disk health prediction is an important means of improving disk reliability and avoiding data loss, and many studies use machine learning techniques to predict disk failure from various features extracted from SMART (Self-Monitoring, Analysis and Reporting Technology) data.
Disk health prediction can be used to improve disk reliability by adjusting redundancy settings and disk cleaning, since it reflects the health condition and future trend of a disk. Disk adaptive redundancy is a technique for dynamically adjusting redundancy settings based on disk reliability in a clustered storage system. The current implementation uses a standard window-based change-point detection algorithm to adjust the redundancy settings and, compared with an active prediction method, suffers from poor timeliness and low prediction accuracy. Disk cleaning is the process of periodically reading disks to detect latent sector errors and repair them as far as possible. The current approach of setting a different cleaning rate for each disk, or even for different areas of each disk, can make the storage system harder to manage and can cause data inconsistencies during cleaning, leading to data loss or other problems. Moreover, existing schemes treat disk redundancy and disk cleaning as independent parts rather than as one integral system, so the accuracy and reliability of monitoring the disk health condition are low, and the data is easy to lose and difficult to manage.
Disclosure of Invention
Aiming at the defects in the prior art, the intelligent disk monitoring and optimizing system and method based on deep reinforcement learning provided by the invention solve the problems of low accuracy and reliability of monitoring the health condition of a disk, easy loss of data and difficult management of storage in the prior art.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the intelligent disk monitoring and optimizing system based on deep reinforcement learning comprises a health evaluation module, a strategy adjustment module and an optimizer; the strategy adjustment module adopts a deep Q network model; the deep Q network model includes a policy network, a target network, and an experience playback buffer;
the health evaluation module is used for acquiring the overall health level of the magnetic discs of different brands;
the policy network is used for acquiring corresponding actions and states according to the overall health level of the magnetic disks of different brands; the actions are used for intelligent monitoring and optimizing of the magnetic disk;
the target network is used for acquiring a target state and a behavior value corresponding to a target action in a training stage;
an experience playback buffer for storing the status, actions, and rewards of the training phase;
and the optimizer is used for acquiring a loss function according to the outputs of the strategy network and the target network and updating parameters of the strategy network based on the loss function.
There is provided a monitoring and optimization method comprising the steps of:
s1, obtaining health scores of magnetic discs of different brands through a health evaluation module;
s2, constructing reinforcement learning intelligent agents and rewarding functions; initializing an environment, a strategy adjustment module, an optimizer, a reward list and the current training times;
s3, setting the episode reward to 0;
s4, obtaining the overall health level of the brand according to the health score of the magnetic disk of the specific brand; simulating a damaged portion of the disk according to the overall health level of the brand; initializing the current step number;
s5, generating a random number and judging whether the random number is smaller than the exploration rate, if so, entering a step S6; otherwise, entering step S7;
s6, randomly selecting an action; step S8 is entered;
s7, selecting the action with the maximum action value in the current state through a strategy network; step S8 is entered;
s8, executing actions in the environment to obtain the next state and rewards; storing the current state, the executed action and the acquired rewards to an experience playback buffer;
s9, judging whether the size of the experience playback buffer area is larger than or equal to a set value, if so, entering a step S10; otherwise, enter step S11;
s10, randomly extracting a batch of experience data in an experience playback buffer area, and respectively obtaining a target value and a predicted value through a target network and a strategy network; calculating a loss function between the target value and the predicted value according to the target value and the predicted value; parameter updating of the strategy network is carried out through the minimized loss function of the optimizer, and an updated strategy network is obtained;
s11, updating the current state and information of the reinforcement learning agent and the episode reward according to the next state and reward obtained in the step S8;
s12, judging whether the current step number reaches the maximum step number, if so, ending a round of training, adding the episode reward to the rewards list and entering step S13; otherwise, adding 1 to the current step number and returning to step S5;
s13, judging whether the current training times reach the maximum training times, if so, obtaining a trained intelligent disk monitoring and optimizing system and entering a step S14; otherwise, resetting the environment and the initial state, adding 1 to the current training times and returning to the step S3;
s14, deploying the trained intelligent disk monitoring and optimizing system to a data center system; obtaining the overall health level of the brand through a trained disk intelligent monitoring and optimizing system; and dynamically adjusting the corresponding disk redundancy strategy and the disk cleaning rate according to the overall health level to finish monitoring and optimizing.
Further, the health evaluation module adopts an LSTM neural network model; the LSTM neural network model comprises two LSTM layers and a full connection layer which are connected in series; each LSTM layer includes 128 LSTM cells; the fully connected layer comprises 4 neurons; the LSTM layer adopts a ReLU function as an activation function; the fully connected layer adopts a softmax function as an activation function;
the strategy network comprises an input layer, a hidden layer and an output layer; the target network comprises an input layer, a hidden layer and an output layer; the deep Q network model is trained by adopting a Q-Learning algorithm, and environment exploration is carried out by adopting an epsilon-greedy strategy.
Further, the specific steps of step S1 are as follows:
s1-1, acquiring original SMART data of a disk to be monitored in a period of time through a monitoring acquisition system; wherein the original SMART data includes negative sample data and positive sample data;
s1-2, processing original SMART data based on feature selection and feature processing to obtain SMART features and building a training set;
s1-3, classifying disk data of different brands and models; training the training set based on deep learning to obtain the health degree of the single disk;
s1-4, according to the formula:
obtaining the health score H of a specific brand and model; wherein n represents the number of hard disks of the brand, i represents the label, w_i represents the weight assigned to label i, and p_i represents the proportion of hard disks with label i in the brand.
Further, the formula of the reward function of step S2 is:
Reward = C_1 * (H / H_a) * MTTDL_diff + C_2 * (H_a / H) * MTTD_diff + C_3 * Space_save + C_4 * Cost_diff
wherein λ represents the disk failure rate, μ represents the disk repair rate, k represents the k redundant blocks obtained by encoding each block of data, n represents the total number of blocks, MTTDL represents the mean time to data loss, n_1 and k_1 respectively represent the number of coded blocks and the number of data blocks of the new redundancy method, n_2 and k_2 respectively represent the number of coded blocks and the number of data blocks of the old redundancy method, Space_save represents the saved storage space, Z_i represents an acceleration factor, N_i represents the number of disks, r represents the normal scrubbing rate, MTTD represents the mean detection time, T represents the time span, Σ(·) represents the summation function, Cost represents the cost, Reward represents the reward function, C_1, C_2, C_3 and C_4 represent hyper-parameters, H represents the health score, H_a represents the health score of a disk at the set alert level, MTTDL_diff represents the change in mean time to data loss, MTTD_diff represents the change in mean detection time, and Cost_diff represents the change in cost.
Further, the initial value of the exploration rate in the step S5 is set to be 1.0, and gradually decreases along with the continuous interaction between the reinforcement learning agent and the environment; the change process of the exploration rate is shown in the following formula:
wherein ε represents the exploration rate, ε_final represents the final exploration rate, ε_start represents the initial exploration rate, ε_decay represents the decay speed of the exploration rate, steps_done represents the number of steps the reinforcement learning agent has performed, and e represents a constant.
Further, in step S5, the generation of the random number adopts an epsilon-greedy strategy, and the specific method is as follows:
according to the formula:
acquiring the probability π_k(a|s) of selecting action a in the current state s, and generating the random number with π_k(a|s) as the probability that the generated random number is smaller than the exploration rate; wherein A represents the action set, a represents the action corresponding to the maximum immediate reward, a' represents an action, s represents a state, k represents the time step number, Q_k(s, a') represents the immediate reward for performing action a' in state s at time step k, and max_{a'∈A} Q_k(s, a') represents the maximum immediate reward value over actions a' in state s at time step k.
Further, in step S7, the calculation formulas of the values of the different actions in the current state are as follows:
Q(s, a) = E[ r + γ max_{a'} Q(s', a') | s, a ]
where s represents the current state, a represents the current action, r represents the reward obtained from the environment, γ represents the discount factor, Q(s, a) represents the expected total reward for taking action a in state s, s' represents the next state, a' represents the next action, max_{a'} Q(s', a') represents the maximum expected total reward, and E[·] represents the expectation in the Bellman optimality equation.
Further, the formula of the target value and the minimum loss function obtained by the optimizer in step S10 is as follows:
wherein L represents the loss function, N represents the number of samples, j represents the current step number, r_j represents the immediate return at step j, γ represents the discount factor, s' represents the next state, a' represents the next action, s_j represents the state at step j, a_j represents the action taken at step j, Q(s_j, a_j) represents the action value of the policy network for state s_j (the predicted value), Q(s', a') represents the action value of the target network for state s', π(a'|s') represents the probability that the policy network selects action a' in state s', and max_{a'} Q(s', a') represents the future reward value expected to be obtained by selecting the optimal action in the next state.
Further, step S1-2 compensates for the original SMART data imbalance using an undersampling method.
The beneficial effects of the invention are as follows: the intelligent disk monitoring and optimizing system brings the redundancy strategy and the cleaning period into one system through the reinforcement learning method, so that the optimal redundancy strategy and the disk cleaning period can be trained simultaneously, the self-adaptability and the reliability of the system are enhanced, and data are not easy to lose and easy to manage; according to the monitoring and optimizing method, the health condition of the magnetic disk is estimated through deep learning detection, and the magnetic disk intelligent monitoring and optimizing system is trained through reinforcement learning, so that the accuracy is improved.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding of the present invention by those skilled in the art, but it should be understood that the present invention is not limited to the scope of these embodiments; for those skilled in the art, any invention that makes use of the inventive concept falls within the spirit and scope of the present invention as defined by the appended claims.
The intelligent disk monitoring and optimizing system based on deep reinforcement learning comprises a health evaluation module, a strategy adjustment module and an optimizer; the strategy adjustment module adopts a deep Q network model; the deep Q network model includes a policy network, a target network, and an experience playback buffer;
the health evaluation module is used for acquiring the overall health level of the magnetic discs of different brands;
the policy network is used for acquiring corresponding actions and states according to the overall health level of the magnetic disks of different brands; the actions are used for intelligent monitoring and optimizing of the magnetic disk;
the target network is used for acquiring a target state and a behavior value corresponding to a target action in a training stage;
an experience playback buffer for storing the status, actions, and rewards of the training phase;
and the optimizer is used for acquiring a loss function according to the outputs of the strategy network and the target network and updating parameters of the strategy network based on the loss function.
As shown in fig. 1, a monitoring and optimizing method includes the following steps:
s1, obtaining health scores of magnetic discs of different brands through a health evaluation module;
s2, constructing reinforcement learning intelligent agents and rewarding functions; initializing an environment, a strategy adjustment module, an optimizer, a reward list and the current training times;
s3, setting the episode reward to 0;
s4, obtaining the overall health level of the brand according to the health score of the magnetic disk of the specific brand; simulating a damaged portion of the disk according to the overall health level of the brand; initializing the current step number;
s5, generating a random number and judging whether the random number is smaller than the exploration rate, if so, entering a step S6; otherwise, entering step S7;
s6, randomly selecting an action; step S8 is entered;
s7, selecting the action with the maximum action value in the current state through a strategy network; step S8 is entered;
s8, executing actions in the environment to obtain the next state and rewards; storing the current state, the executed action and the acquired rewards to an experience playback buffer;
s9, judging whether the size of the experience playback buffer area is larger than or equal to a set value, if so, entering a step S10; otherwise, enter step S11;
s10, randomly extracting a batch of experience data in an experience playback buffer area, and respectively obtaining a target value and a predicted value through a target network and a strategy network; calculating a loss function between the target value and the predicted value according to the target value and the predicted value; parameter updating of the strategy network is carried out through the minimized loss function of the optimizer, and an updated strategy network is obtained;
s11, updating the current state and information of the reinforcement learning agent and the episode reward according to the next state and reward obtained in the step S8;
s12, judging whether the current step number reaches the maximum step number, if so, ending a round of training, adding the episode reward to the rewards list and entering step S13; otherwise, adding 1 to the current step number and returning to step S5;
s13, judging whether the current training times reach the maximum training times, if so, obtaining a trained intelligent disk monitoring and optimizing system and entering a step S14; otherwise, resetting the environment and the initial state, adding 1 to the current training times and returning to the step S3;
s14, deploying the trained intelligent disk monitoring and optimizing system to a data center system; obtaining the overall health level of the brand through a trained disk intelligent monitoring and optimizing system; and dynamically adjusting the corresponding disk redundancy strategy and the disk cleaning rate according to the overall health level to finish monitoring and optimizing.
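The procedure of steps S3 to S13 can be illustrated with the following minimal Python sketch. It is a simplified, hypothetical implementation rather than the patented method itself: the environment, the network dimensions (taken from the embodiment described later: an input of 10000, a hidden layer of 32, nine actions), the buffer size, batch size, learning rate and exploration schedule are all illustrative assumptions.

```python
import math
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 10000, 9      # sizes taken from the embodiment (10000 disks, actions 0-8)
GAMMA, BATCH, MAX_EPISODES, MAX_STEPS = 0.99, 64, 10, 50   # illustrative hyper-parameters


class DummyDiskEnv:
    """Stand-in environment: random health states and rewards, for illustration only."""

    def reset(self):
        return torch.rand(STATE_DIM)

    def step(self, action):
        # A real system would compute the reward with the Reward formula of step S2.
        return torch.rand(STATE_DIM), random.uniform(-1.0, 1.0)


def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 32), nn.ReLU(), nn.Linear(32, N_ACTIONS))


env = DummyDiskEnv()
policy_net, target_net = make_net(), make_net()
target_net.load_state_dict(policy_net.state_dict())       # target starts with the same parameters
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)
buffer, reward_list, steps_done = deque(maxlen=10000), [], 0

for episode in range(MAX_EPISODES):                        # S13: outer loop over training rounds
    state, episode_reward = env.reset(), 0.0               # S3/S4: reset and zero the episode reward
    for step in range(MAX_STEPS):                          # S12: inner loop over steps
        epsilon = 0.05 + 0.95 * math.exp(-steps_done / 500)   # decaying exploration rate
        steps_done += 1
        if random.random() < epsilon:                       # S5/S6: explore with a random action
            action = random.randrange(N_ACTIONS)
        else:                                               # S7: greedy action from the policy network
            with torch.no_grad():
                action = int(policy_net(state).argmax())
        next_state, reward = env.step(action)               # S8: act and observe
        buffer.append((state, action, reward, next_state))
        if len(buffer) >= BATCH:                            # S9/S10: sample a batch and update
            batch = random.sample(buffer, BATCH)
            s = torch.stack([b[0] for b in batch])
            a = torch.tensor([b[1] for b in batch])
            r = torch.tensor([b[2] for b in batch])
            s2 = torch.stack([b[3] for b in batch])
            q_pred = policy_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():                            # target value from the target network
                q_target = r + GAMMA * target_net(s2).max(1).values
            loss = nn.functional.mse_loss(q_pred, q_target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        state, episode_reward = next_state, episode_reward + reward   # S11: advance the state
    reward_list.append(episode_reward)                      # S12: record the episode reward
```

Note that, consistent with the embodiment described below, the target network in this sketch keeps its initial parameters and is never updated during training; only the policy network is optimized.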
The health evaluation module adopts an LSTM neural network model; the LSTM neural network model comprises two LSTM layers and a full connection layer which are connected in series; each LSTM layer includes 128 LSTM cells; the fully connected layer comprises 4 neurons; the LSTM layer adopts a ReLU function as an activation function; the fully connected layer adopts a softmax function as an activation function;
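A minimal PyTorch sketch of a network matching this description is given below (two stacked LSTM layers of 128 units, a ReLU activation, and a 4-neuron fully connected softmax output); the SMART feature width, sequence length, and class semantics are assumptions used only for illustration.

```python
import torch
import torch.nn as nn


class DiskHealthLSTM(nn.Module):
    """Health evaluation network: 2 LSTM layers x 128 units, 4-class softmax head."""

    def __init__(self, n_smart_features: int = 12):   # input width is an assumption
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_smart_features, hidden_size=128,
                            num_layers=2, batch_first=True)
        self.relu = nn.ReLU()                          # ReLU as the LSTM-layer activation
        self.fc = nn.Linear(128, 4)                    # 4 neurons in the fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time steps, SMART features)
        out, _ = self.lstm(x)
        out = self.relu(out[:, -1, :])                 # use the last time step
        return torch.softmax(self.fc(out), dim=-1)     # softmax over 4 health classes


# Example: score a batch of 8 disks, each with 30 days of SMART features
scores = DiskHealthLSTM()(torch.randn(8, 30, 12))
print(scores.shape)   # torch.Size([8, 4])
```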
the strategy network comprises an input layer, a hidden layer and an output layer; the target network comprises an input layer, a hidden layer and an output layer; the deep Q network model is trained by adopting a Q-Learning algorithm, and environment exploration is carried out by adopting an epsilon-greedy strategy.
The specific steps of the step S1 are as follows:
s1-1, acquiring original SMART data of a disk to be monitored in a period of time through a monitoring acquisition system; wherein the original SMART data includes negative sample data and positive sample data;
s1-2, processing original SMART data based on feature selection and feature processing to obtain SMART features and building a training set;
s1-3, classifying disk data of different brands and models; training the training set based on deep learning to obtain the health degree of the single disk;
s1-4, according to the formula:
obtaining the health score H of a specific brand and model; wherein n represents the number of hard disks of the brand, i represents the label, w_i represents the weight assigned to label i, and p_i represents the proportion of hard disks with label i in the brand.
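The formula image itself is not reproduced in this text. Based on the symbol definitions above, a weighted sum over the labels is one plausible reading; this reconstruction is an assumption, not the text of the patent:

```latex
H = \sum_{i} w_i \, p_i
```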
The formula of the bonus function of step S2 is:
Reward = C_1 * (H / H_a) * MTTDL_diff + C_2 * (H_a / H) * MTTD_diff + C_3 * Space_save + C_4 * Cost_diff
wherein λ represents the disk failure rate, μ represents the disk repair rate, k represents the k redundant blocks obtained by encoding each block of data, n represents the total number of blocks, MTTDL represents the mean time to data loss, n_1 and k_1 respectively represent the number of coded blocks and the number of data blocks of the new redundancy method, n_2 and k_2 respectively represent the number of coded blocks and the number of data blocks of the old redundancy method, Space_save represents the saved storage space, Z_i represents an acceleration factor, N_i represents the number of disks, r represents the normal scrubbing rate, MTTD represents the mean detection time, T represents the time span, Σ(·) represents the summation function, Cost represents the cost, Reward represents the reward function, C_1, C_2, C_3 and C_4 represent hyper-parameters, H represents the health score, H_a represents the health score of a disk at the set alert level, MTTDL_diff represents the change in mean time to data loss, MTTD_diff represents the change in mean detection time, and Cost_diff represents the change in cost.
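As a sketch, the reward of step S2 can be computed directly from the quantities named above; the function name, argument layout and default hyper-parameter values below are assumptions supplied for illustration.

```python
def compute_reward(h: float, h_alert: float,
                   mttdl_diff: float, mttd_diff: float,
                   space_save: float, cost_diff: float,
                   c1: float = 1.0, c2: float = 1.0,
                   c3: float = 1.0, c4: float = 1.0) -> float:
    """Reward = C1*(H/Ha)*MTTDL_diff + C2*(Ha/H)*MTTD_diff + C3*Space_save + C4*Cost_diff."""
    return (c1 * (h / h_alert) * mttdl_diff
            + c2 * (h_alert / h) * mttd_diff
            + c3 * space_save
            + c4 * cost_diff)


# Example with placeholder values
print(compute_reward(h=0.8, h_alert=0.5, mttdl_diff=0.2,
                     mttd_diff=-0.1, space_save=0.05, cost_diff=-0.02))
```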
Setting the initial value of the exploration rate in the step S5 to be 1.0, and gradually reducing along with continuous interaction between the reinforcement learning intelligent agent and the environment; the change process of the exploration rate is shown in the following formula:
wherein ε represents the exploration rate, ε_final represents the final exploration rate, ε_start represents the initial exploration rate, ε_decay represents the decay speed of the exploration rate, steps_done represents the number of steps the reinforcement learning agent has performed, and e represents a constant.
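The decay formula itself did not survive extraction; given the symbols above (ε_start, ε_final, ε_decay, steps_done and the constant e), a standard exponential decay of the following form is the likely intent. This reconstruction is an assumption:

```latex
\varepsilon = \varepsilon_{\mathrm{final}} + (\varepsilon_{\mathrm{start}} - \varepsilon_{\mathrm{final}})
\, e^{-\,\mathrm{steps\_done} / \varepsilon_{\mathrm{decay}}}
```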
In the step S5, the generation of the random number adopts an epsilon-greedy strategy, and the specific method is as follows:
according to the formula:
acquiring the probability π_k(a|s) of selecting action a in the current state s, and generating the random number with π_k(a|s) as the probability that the generated random number is smaller than the exploration rate; wherein A represents the action set, a represents the action corresponding to the maximum immediate reward, a' represents an action, s represents a state, k represents the time step number, Q_k(s, a') represents the immediate reward for performing action a' in state s at time step k, and max_{a'∈A} Q_k(s, a') represents the maximum immediate reward value over actions a' in state s at time step k.
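The formula referenced above is not reproduced in this text; the standard ε-greedy policy that the symbol definitions describe would assign the following probabilities (an assumed reconstruction):

```latex
\pi_k(a \mid s) =
\begin{cases}
1 - \varepsilon + \dfrac{\varepsilon}{|A|}, & a = \arg\max_{a' \in A} Q_k(s, a') \\[4pt]
\dfrac{\varepsilon}{|A|}, & \text{otherwise}
\end{cases}
```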
In step S7, the calculation formulas of the values of the different actions in the current state are as follows:
Q(s, a) = E[ r + γ max_{a'} Q(s', a') | s, a ]
where s represents the current state, a represents the current action, r represents the reward obtained from the environment, γ represents the discount factor, Q(s, a) represents the expected total reward for taking action a in state s, s' represents the next state, a' represents the next action, max_{a'} Q(s', a') represents the maximum expected total reward, and E[·] represents the expectation in the Bellman optimality equation.
The formula of the target value and the minimum loss function obtained by the optimizer in step S10 is as follows:
wherein L represents the loss function, N represents the number of samples, j represents the current step number, r_j represents the immediate return at step j, γ represents the discount factor, s' represents the next state, a' represents the next action, s_j represents the state at step j, a_j represents the action taken at step j, Q(s_j, a_j) represents the action value of the policy network for state s_j (the predicted value), Q(s', a') represents the action value of the target network for state s', π(a'|s') represents the probability that the policy network selects action a' in state s', and max_{a'} Q(s', a') represents the future reward value expected to be obtained by selecting the optimal action in the next state.
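The target-value and loss formulas are not reproduced in this text. The standard deep Q-network forms consistent with the symbols defined above would be as follows; this is an assumed reconstruction, and the patent's exact expression (which also references the policy network's selection probability π(a'|s')) may differ in detail:

```latex
y_j = r_j + \gamma \max_{a'} Q_{\text{target}}(s', a'), \qquad
L = \frac{1}{N} \sum_{j=1}^{N} \bigl( y_j - Q(s_j, a_j) \bigr)^2
```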
Step S1-2 adopts an undersampling method to compensate for the imbalance of the original SMART data.
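Such undersampling can be sketched as follows; the DataFrame layout, label column name and sampling ratio are illustrative assumptions only.

```python
import pandas as pd


def undersample(smart_df: pd.DataFrame, label_col: str = "failure",
                ratio: float = 1.0, seed: int = 0) -> pd.DataFrame:
    """Randomly drop healthy-disk rows so that healthy:failed is at most `ratio`:1."""
    failed = smart_df[smart_df[label_col] == 1]
    healthy = smart_df[smart_df[label_col] == 0]
    n_keep = min(len(healthy), int(len(failed) * ratio))
    healthy_kept = healthy.sample(n=n_keep, random_state=seed)
    # Shuffle the balanced set before returning it as the training data
    return pd.concat([failed, healthy_kept]).sample(frac=1.0, random_state=seed)
```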
In one embodiment of the invention, the target network of the disk intelligent monitoring and optimizing system based on deep reinforcement learning is initialized with the same parameters as the policy network, but the target network's parameters are not updated while the system is trained. The size of the input layer of the policy network is set to 10000, i.e., the set number of disks; the hidden layer size of the policy network is set to 32; the size of the output layer of the policy network is the size of the action space, namely nine actions numbered 0 to 8, where 0 indicates that no operation is performed and 1 to 8 indicate different actions respectively. When the health score is calculated, a normal disk is given a lower weight and a failed disk a higher weight. In addition, there are indexes for judging the accuracy of the disk health score, namely precision (Precision), recall (Recall), the comprehensive evaluation index F-measure and the Matthews correlation coefficient MCC, and the corresponding formulas are as follows:
where TP represents true positives, i.e., the number of positive samples correctly classified by the model as the positive class; TN represents true negatives, i.e., the number of negative samples correctly classified as the negative class; FP represents false positives, i.e., the number of negative samples wrongly classified as the positive class; and FN represents false negatives, i.e., the number of positive samples wrongly classified as the negative class.
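The metric formulas themselves are not reproduced in this text; the standard definitions of the four named indexes in terms of TP, TN, FP and FN are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{F\text{-}measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},
\qquad
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}
```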
The four indexes of the reward function are the mean time to data loss (MTTDL), the saved storage space (Space_save), the mean detection time (MTTD) and the cost (Cost). The mean time to data loss (MTTDL) refers to the mean time the system can run normally before data loss occurs; it reflects the reliability of the system, and the greater the value, the higher the reliability of the system. The saved storage space (Space_save) is the proportion of the original data space that is saved; it reflects the storage efficiency of the system, and the higher the value, the higher the storage efficiency of the system. The mean detection time (MTTD) refers to the mean time spent by the system before a disk failure is found after the disk cleaning period is modified; it reflects the failure detection capability of the system, and the lower the value, the stronger the failure detection capability of the system. The cost (Cost) refers to the system consumption after modifying the disk cleaning cycle; it reflects the running cost of the system, and the lower the value, the lower the running cost of the system. The system consumption refers to the cost of performing disk cleaning, including disk life, energy consumption, performance loss, and the like.
The rewards list records the total reward earned by the reinforcement learning agent while learning and performing tasks in different environments. The exploration rate refers to the tendency of the agent to try new actions during learning; reducing the exploration rate over time prevents the performance degradation caused by excessive exploration. The intelligent disk monitoring and optimizing system based on deep reinforcement learning adjusts its behavior strategy by tracking the agent's performance in different states and recording the reward and the executed action for each state.
In summary, the redundancy strategy and the cleaning period are integrated into a system through the reinforcement learning method, so that the optimal redundancy strategy and the disk cleaning period can be trained simultaneously, the self-adaptability and the reliability of the system are enhanced, and the data are not easy to lose and are easy to manage; according to the monitoring and optimizing method, the health condition of the magnetic disk is estimated through deep learning detection, and the magnetic disk intelligent monitoring and optimizing system is trained through reinforcement learning, so that the accuracy is improved.

Claims (10)

1. A disk intelligent monitoring and optimizing system based on deep reinforcement learning is characterized in that: the system comprises a health evaluation module, a strategy adjustment module and an optimizer; the strategy adjustment module adopts a deep Q network model; the deep Q network model includes a policy network, a target network, and an experience playback buffer;
the health evaluation module is used for acquiring the overall health level of the magnetic discs of different brands;
the strategy network is used for acquiring corresponding actions and states according to the overall health level of the magnetic disks of different brands; the actions are used for intelligent monitoring and optimizing of the magnetic disk;
the target network is used for acquiring a target state and a behavior value corresponding to a target action in a training stage;
the experience playback buffer area is used for storing the state, action and rewards of the training stage;
and the optimizer is used for acquiring a loss function according to the output of the strategy network and the target network and updating the parameters of the strategy network based on the loss function.
2. The intelligent monitoring and optimization system for a disk based on deep reinforcement learning of claim 1, wherein: the health evaluation module adopts an LSTM neural network model; the LSTM neural network model comprises two LSTM layers and a full connection layer which are connected in series; each LSTM layer comprises 128 LSTM units; the fully-connected layer comprises 4 neurons; the LSTM layer adopts a ReLU function as an activation function; the fully connected layer adopts a softmax function as an activation function;
the strategy network comprises an input layer, a hidden layer and an output layer; the target network comprises an input layer, a hidden layer and an output layer; the deep Q network model is trained by adopting a Q-Learning algorithm, and environment exploration is carried out by adopting an epsilon-greedy strategy.
3. A method for monitoring and optimizing a disk intelligent monitoring and optimizing system based on deep reinforcement learning as set forth in any one of claims 1 or 2, characterized in that: the method comprises the following steps:
s1, obtaining health scores of magnetic discs of different brands through a health evaluation module;
s2, constructing reinforcement learning intelligent agents and rewarding functions; initializing an environment, a strategy adjustment module, an optimizer, a reward list and the current training times;
s3, setting the episode reward to 0;
s4, obtaining the overall health level of the brand according to the health score of the magnetic disk of the specific brand; simulating a damaged portion of the disk according to the overall health level of the brand; initializing the current step number;
s5, generating a random number and judging whether the random number is smaller than the exploration rate, if so, entering a step S6; otherwise, entering step S7;
s6, randomly selecting an action; step S8 is entered;
s7, selecting the action with the maximum action value in the current state through a strategy network; step S8 is entered;
s8, executing actions in the environment to obtain the next state and rewards; storing the current state, the executed action and the acquired rewards to an experience playback buffer;
s9, judging whether the size of the experience playback buffer area is larger than or equal to a set value, if so, entering a step S10; otherwise, enter step S11;
s10, randomly extracting a batch of experience data in an experience playback buffer area, and respectively obtaining a target value and a predicted value through a target network and a strategy network; calculating a loss function between the target value and the predicted value according to the target value and the predicted value; parameter updating of the strategy network is carried out through the minimized loss function of the optimizer, and an updated strategy network is obtained;
s11, updating the current state and information of the reinforcement learning agent and the episode reward according to the next state and reward obtained in the step S8;
s12, judging whether the current step number reaches the maximum step number, if so, ending a round of training, adding the episode reward to the rewards list and entering step S13; otherwise, adding 1 to the current step number and returning to step S5;
s13, judging whether the current training times reach the maximum training times, if so, obtaining a trained intelligent disk monitoring and optimizing system and entering a step S14; otherwise, resetting the environment and the initial state, adding 1 to the current training times and returning to the step S3;
s14, deploying the trained intelligent disk monitoring and optimizing system to a data center system; obtaining the overall health level of the brand through a trained disk intelligent monitoring and optimizing system; and dynamically adjusting the corresponding disk redundancy strategy and the disk cleaning rate according to the overall health level to finish monitoring and optimizing.
4. A method of monitoring and optimizing as claimed in claim 3, wherein: the specific steps of the step S1 are as follows:
s1-1, acquiring original SMART data of a disk to be monitored in a period of time through a monitoring acquisition system; wherein the original SMART data includes negative sample data and positive sample data;
s1-2, processing the original SMART data based on feature selection and feature processing to obtain SMART features and building a training set;
s1-3, classifying disk data of different brands and models; training the training set based on deep learning to obtain the health degree of the single disk;
s1-4, according to the formula:
obtaining the health score H of a specific brand and model; wherein n represents the number of hard disks of the brand, i represents the label, w_i represents the weight assigned to label i, and p_i represents the proportion of hard disks with label i in the brand.
5. A method of monitoring and optimizing as claimed in claim 3, wherein: the formula of the reward function of the step S2 is:
Reward = C_1 * (H / H_a) * MTTDL_diff + C_2 * (H_a / H) * MTTD_diff + C_3 * Space_save + C_4 * Cost_diff
wherein λ represents the disk failure rate, μ represents the disk repair rate, k represents the k redundant blocks obtained by encoding each block of data, n represents the total number of blocks, MTTDL represents the mean time to data loss, n_1 and k_1 respectively represent the number of coded blocks and the number of data blocks of the new redundancy method, n_2 and k_2 respectively represent the number of coded blocks and the number of data blocks of the old redundancy method, Space_save represents the saved storage space, Z_i represents an acceleration factor, N_i represents the number of disks, r represents the normal scrubbing rate, MTTD represents the mean detection time, T represents the time span, Σ(·) represents the summation function, Cost represents the cost, Reward represents the reward function, C_1, C_2, C_3 and C_4 represent hyper-parameters, H represents the health score, H_a represents the health score of a disk at the set alert level, MTTDL_diff represents the change in mean time to data loss, MTTD_diff represents the change in mean detection time, and Cost_diff represents the change in cost.
6. A method of monitoring and optimizing as claimed in claim 3, wherein: the initial value of the exploration rate in the step S5 is set to be 1.0, and gradually decreases along with continuous interaction between the reinforcement learning intelligent agent and the environment; the change process of the exploration rate is shown in the following formula:
wherein ε represents the exploration rate, ε_final represents the final exploration rate, ε_start represents the initial exploration rate, ε_decay represents the decay speed of the exploration rate, steps_done represents the number of steps the reinforcement learning agent has performed, and e represents a constant.
7. A method of monitoring and optimizing as claimed in claim 3, wherein: the random number generation in the step S5 adopts an epsilon-greedy strategy, and the specific method is as follows:
according to the formula:
acquiring the probability π_k(a|s) of selecting action a in the current state s, and generating the random number with π_k(a|s) as the probability that the generated random number is smaller than the exploration rate; wherein A represents the action set, a represents the action corresponding to the maximum immediate reward, a' represents an action, s represents a state, k represents the time step number, Q_k(s, a') represents the immediate reward for performing action a' in state s at time step k, and max_{a'∈A} Q_k(s, a') represents the maximum immediate reward value over actions a' in state s at time step k.
8. A method of monitoring and optimizing as claimed in claim 3, wherein: the calculation formula of the values of the different actions in the current state in the step S7 is as follows:
Q(s, a) = E[ r + γ max_{a'} Q(s', a') | s, a ]
where s represents the current state, a represents the current action, r represents the reward obtained from the environment, γ represents the discount factor, Q(s, a) represents the expected total reward for taking action a in state s, s' represents the next state, a' represents the next action, max_{a'} Q(s', a') represents the maximum expected total reward, and E[·] represents the expectation in the Bellman optimality equation.
9. A method of monitoring and optimizing as claimed in claim 3, wherein: the formula of the target value and the minimum loss function obtained by the optimizer in the step S10 is as follows:
wherein L represents the loss function, N represents the number of samples, j represents the current step number, r_j represents the immediate return at step j, γ represents the discount factor, s' represents the next state, a' represents the next action, s_j represents the state at step j, a_j represents the action taken at step j, Q(s_j, a_j) represents the action value of the policy network for state s_j (the predicted value), Q(s', a') represents the action value of the target network for state s', π(a'|s') represents the probability that the policy network selects action a' in state s', and max_{a'} Q(s', a') represents the future reward value expected to be obtained by selecting the optimal action in the next state.
10. A method of monitoring and optimizing as claimed in claim 3, wherein: the step S1-2 adopts an undersampling method to compensate the unbalance of the original SMART data.
CN202310783248.9A 2023-06-28 2023-06-28 Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning Pending CN116820883A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310783248.9A CN116820883A (en) 2023-06-28 2023-06-28 Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310783248.9A CN116820883A (en) 2023-06-28 2023-06-28 Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
CN116820883A true CN116820883A (en) 2023-09-29

Family

ID=88125431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310783248.9A Pending CN116820883A (en) 2023-06-28 2023-06-28 Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN116820883A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117575174A (en) * 2024-01-15 2024-02-20 山东环球软件股份有限公司 Intelligent agricultural monitoring and management system
CN117575174B (en) * 2024-01-15 2024-04-02 山东环球软件股份有限公司 Intelligent agricultural monitoring and management system

Similar Documents

Publication Publication Date Title
CN110413227B (en) Method and system for predicting remaining service life of hard disk device on line
JP7105932B2 (en) Anomaly detection using deep learning on time series data related to application information
CN110399238B (en) Disk fault early warning method, device, equipment and readable storage medium
US7493300B2 (en) Model and system for reasoning with N-step lookahead in policy-based system management
CN108647136A (en) Hard disk corruptions prediction technique and device based on SMART information and deep learning
CN111178553A (en) Industrial equipment health trend analysis method and system based on ARIMA and LSTM algorithms
CN116820883A (en) Intelligent disk monitoring and optimizing system and method based on deep reinforcement learning
CN108446734A (en) Disk failure automatic prediction method based on artificial intelligence
Blouw et al. Event-driven signal processing with neuromorphic computing systems
CN112433896B (en) Method, device, equipment and storage medium for predicting server disk faults
CN116683588B (en) Lithium ion battery charge and discharge control method and system
CN112446557B (en) Disk failure prediction evasion method and system based on deep learning
KR20210082349A (en) Method and apparatus for determining storage load of application
CN112988550A (en) Server failure prediction method, device and computer readable medium
Li et al. Prediction of HDD failures by ensemble learning
Wang et al. Evaluation and prediction method of rolling bearing performance degradation based on attention-LSTM
KR102480518B1 (en) Method for credit evaluation model update or replacement and apparatus performing the method
CN115617604A (en) Disk failure prediction method and system based on image pattern matching
KR20080087571A (en) Context prediction system and method thereof
CN114227701A (en) Robot fault prediction method based on production data
CN113268782A (en) Machine account identification and camouflage countermeasure method based on graph neural network
CN112395167A (en) Operation fault prediction method and device and electronic equipment
CN117473445B (en) Extreme learning machine-based equipment abnormality analysis method and device
CN117556221B (en) Data analysis method and system based on intelligent electrical control interaction session
CN116894658A (en) Method for predicting faults of internal and external equipment in warranty period based on attribute characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination