CN117667519A - Data backup performance optimization method based on machine learning - Google Patents

Data backup performance optimization method based on machine learning

Info

Publication number
CN117667519A
Authority
CN
China
Prior art keywords
backup
data
backed
information
storage capacity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311721170.4A
Other languages
Chinese (zh)
Inventor
方彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Credible Cloud Technology Co ltd
Original Assignee
Shenzhen Credible Cloud Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Credible Cloud Technology Co ltd filed Critical Shenzhen Credible Cloud Technology Co ltd
Priority to CN202311721170.4A
Publication of CN117667519A
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data backup performance optimization method based on machine learning, and particularly relates to the technical field of data backup. The method comprises the following steps: step one, collecting backup information of a target backup system; step two, collating the collected backup information and building a neural network model; step three, collecting data information to be backed up; step four, analyzing the data information to be backed up against the backup information and determining, according to the analysis result, whether the data to be backed up are backed up as a whole or separately. By collecting the backup information of the target backup system and analyzing it together with the data information to be backed up, the invention can determine whether the data to be backed up should be backed up as a whole or separately, thereby avoiding unnecessary data transmission and reducing backup time; according to the established neural network model, paths that may cause data backup loss are removed during path selection, thereby realizing the optimization of data backup performance.

Description

Data backup performance optimization method based on machine learning
Technical Field
The invention relates to the technical field of data backup, in particular to a data backup performance optimization method based on machine learning.
Background
With the popularization of computers and the development of information technology, information security and data storage are attracting ever more attention from large enterprises, and backing up data can effectively prevent data loss, so the importance of data backup and recovery is increasingly highlighted. However, conventional data backup methods cannot always meet large-scale, high-concurrency requirements: when the data to be backed up are too large, more time is required to screen backup paths, and as the amount of backup data increases, confusion among backup paths easily reduces efficiency and destabilizes channels, ultimately causing the backup data to be disturbed and the backup to fail. How to improve data backup performance, reduce backup time and improve backup efficiency has therefore become a problem to be solved urgently.
Summary of the invention
In order to achieve the above purpose, the present invention provides the following technical solutions:
the data backup performance optimization method based on machine learning comprises the following steps:
step one, collecting backup information of a target backup system;
step two, collating the collected backup information and building a neural network model;
step three, collecting data information to be backed up;
step four, analyzing the data information to be backed up against the backup information, and determining according to the analysis result whether the data to be backed up are backed up as a whole or separately;
step five, inputting the data to be backed up into the neural network model according to the analysis result of step four, selecting the optimal backup path according to the output result of the neural network model, and backing up the data to be backed up;
and step six, acquiring current backup performance statistics when the backup is completed, and optimizing the data backup performance when the current backup performance statistics are greater than the historical backup performance statistics.
The backup information of the target backup system comprises the backup path of the target backup system, the total storage capacity corresponding to the backup path, the existing storage capacity and the path distance; the backup path is marked as KBi, the total storage capacity corresponding to the backup path as ZCi-m, the existing storage capacity as XCi-m and the path distance as LJi-m.
In a preferred embodiment, the collating operation on the collected backup information means:
step 1, obtaining the backup path KBi, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m from the backup information of the target backup system;
and step 2, respectively carrying out unit unification processing on the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths, and summarizing the backup paths KBi, the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths into a backup path collating set.
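For illustration only, a minimal sketch of such a collating step is given below. The `BackupPath` record, the field names, the dictionary keys of the raw input and the byte-based unit conversion are all assumptions introduced for the example; the patent does not fix concrete units or data structures.

```python
from dataclasses import dataclass

@dataclass
class BackupPath:
    kb: str        # backup path identifier (KBi)
    zc_bytes: int  # total storage capacity of the path (ZCi-m), unified to bytes
    xc_bytes: int  # existing (already used) storage capacity (XCi-m), unified to bytes
    lj_m: float    # path distance (LJi-m), unified to metres

_UNIT = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def to_bytes(value: float, unit: str) -> int:
    """Unit-unification step: convert a capacity figure to bytes."""
    return int(value * _UNIT[unit.upper()])

def collate(raw_paths: list[dict]) -> list[BackupPath]:
    """Summarize raw backup information into the backup path collating set."""
    return [
        BackupPath(
            kb=p["path"],
            zc_bytes=to_bytes(p["total"], p["total_unit"]),
            xc_bytes=to_bytes(p["used"], p["used_unit"]),
            lj_m=float(p["distance_m"]),
        )
        for p in raw_paths
    ]
```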
In a preferred embodiment, in the second step, building a neural network model means:
step one, simulating a backup operation: during the simulation, data with a data size of SJn are selected as the backup data and then backed up using the backup paths KBi; for each backup path KBi, k groups of data sets are collected, each group comprising the backup path KBi, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m, and the total time taken to complete the simulated backup is marked as MNT;
step two, transmitting the k groups of data sets from step one to the neural network model, taking the data size SJn, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m as the input data for model training and the total time MNT for completing the simulated backup as the output data for model training, dividing the data into a training set, a validation set and a test set in a 6:2:2 ratio, and considering training finished when the performance of the model on the validation set no longer improves;
and step three, on the basis of step two, assigning the total time MNT for completing the simulated backup a value of 0 or 1 for training the model, where 0 indicates that MNT is less than or equal to a set threshold Y1 and 1 indicates that MNT is greater than the set threshold Y1.
In a preferred embodiment, in step three, the data information to be backed up comprises the total storage capacity the data to be backed up need to occupy and the storage capacity each piece of separated backup data needs to occupy in the case of separable backup; the total storage capacity needed by the data to be backed up is denoted ZC, and the storage capacity needed by each piece of separated backup data in the case of separable backup is denoted FCp.
In a preferred embodiment, in step four, analyzing the data information to be backed up against the backup information and determining according to the analysis result whether the data to be backed up are backed up as a whole or separately refers to:
comparing the total storage capacity ZC that the data to be backed up need to occupy with a preset threshold: if the total storage capacity ZC is larger than the preset threshold Y2, separate backup is used, and if it is smaller than the preset threshold Y2, secondary analysis processing is performed.
The logic of the secondary analysis process is as follows:
s1, substituting the total storage capacity ZC occupied by the data to be backed up, the total storage capacity ZC i-m corresponding to the backup path, the existing storage capacity XC i-m and the path distance LJ i-m into a calculation formula of a determination value PD, alpha and beta are specific proportionality coefficients, when the judging value PD is more than or equal to a preset threshold value Y3, and +.>When the result of the data information to be backed up is larger than or equal to a preset threshold Y4, the data information to be backed up is backed up in whole, otherwise, the data information to be backed up is backed up separately.
In a preferred embodiment, in step five, inputting the data to be backed up into the neural network model according to the analysis result of step four, selecting the optimal backup path according to the output result of the neural network model and backing up the data to be backed up means:
when the data to be backed up are backed up as a whole, inputting the total storage capacity ZC needed by the data to be backed up, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m into the neural network model, sorting the backup paths KBi whose results are assigned the value 0 in ascending order of the total time MNT for completing the simulated backup, and taking the backup path KBi ranked first as the optimal backup path for backing up the data to be backed up;
when the data to be backed up are backed up separately, sorting the pieces of separated backup data in descending order of the storage capacity FCp each piece needs to occupy, inputting the storage capacity FCp needed by each piece of separated backup data, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m into the neural network model in that order, sorting the backup paths KBi whose results are assigned the value 0 in ascending order of the total time MNT for completing the simulated backup, taking the backup path KBi ranked first as the optimal backup path for the current piece of separated backup data and then removing that backup path KBi from the candidates, until no result assigned the value 0 remains; all backup paths KBi are then returned to the neural network model for the next batch of matching, with the existing storage capacity XCi-m of every backup path KBi that has completed a backup updated, until every piece of separated backup data needing storage capacity FCp has been matched with a backup path, after which the data to be backed up are backed up in the order of the matched batches.
The invention has the technical effects and advantages that:
the invention can improve the backup efficiency, can more accurately determine the optimal backup path by collecting the backup information of the target backup system, sorting the information and establishing the neural network model, thereby improving the backup efficiency, and can determine whether the data to be backed up is backup in whole or separate according to the analysis result of the data information to be backed up and the backup information, thereby avoiding unnecessary data transmission and reducing the backup time.
The invention can predict the possible problems in the data transmission process, such as easy breaking of balance, etc., by carrying out simulated backup on the data to be backed up and training the neural network model, thereby eliminating the path possibly causing data backup missing during path selection, avoiding the problem of data backup missing, and realizing the optimization of the data backup performance by acquiring the current backup performance statistics and comparing with the historical backup performance statistics, further improving the backup efficiency and reducing the backup time.
Drawings
For the convenience of those skilled in the art, the present invention will be further described with reference to the accompanying drawings;
fig. 1 is a schematic diagram of a machine learning-based data backup performance optimization method in the present invention.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment 1, a machine learning-based data backup performance optimization method, comprising the steps of:
Step one, collecting backup information of a target backup system; the backup information of the target backup system comprises the backup path of the target backup system, the total storage capacity corresponding to the backup path, the existing storage capacity and the path distance; the backup path is marked as KBi, the total storage capacity corresponding to the backup path as ZCi-m, the existing storage capacity as XCi-m and the path distance as LJi-m. The subscripts i, m, n, p, t used herein indicate only the sequence number or type of an item and carry no specific meaning, which is not repeated below; i, m, n, p, t take positive integer values.
Step two, collating the collected backup information and building a neural network model, which comprises the following steps: step 1, obtaining the backup path KBi, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m from the backup information of the target backup system;
step 2, respectively carrying out unit unification processing on the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths, where unit unification processing refers to unit conversion according to international standards so that the units used in subsequent calculations are uniform, and summarizing the backup paths KBi, the total storage capacity ZCi-m corresponding to the backup paths, the existing storage capacity XCi-m and the path distance LJi-m into a backup path collating set to facilitate subsequent lookup of backup paths KBi. Establishing the neural network model refers to:
step one, simulating a backup operation: during the simulation, data with a data size of SJn are selected as the backup data and then backed up using the backup paths KBi; for each backup path KBi, k groups of data sets are collected, each group comprising the backup path KBi, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m, and the total time taken to complete the simulated backup is marked as MNT;
step two, transmitting the k groups of data sets from step one to the neural network model, taking the data size SJn, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m as the input data for model training and the total time MNT for completing the simulated backup as the output data for model training, dividing the data into a training set, a validation set and a test set in a 6:2:2 ratio, and considering training finished when the performance of the model on the validation set no longer improves;
and step three, on the basis of step two, assigning the total time MNT for completing the simulated backup a value of 0 or 1 for training the model, where 0 indicates that MNT is less than or equal to a set threshold Y1 and 1 indicates that MNT is greater than the set threshold Y1.
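For illustration, a minimal sketch of how such training could be set up in Keras is given below: the collected samples are split 6:2:2, the total simulated-backup time MNT is binarized against the threshold Y1, and an early-stopping callback ends training once validation performance stops improving. The file names, the network shape, the optimizer and the example value of Y1 are assumptions; the patent itself names a convolutional neural network, while a small dense network is used here purely to keep the sketch short.

```python
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# One row per collected sample: [SJn, ZCi-m, XCi-m, LJi-m]; file names are assumed.
X = np.load("simulated_backup_features.npy")
mnt = np.load("simulated_backup_mnt.npy")      # total time to complete the simulated backup

Y1 = 300.0                                      # assumed threshold (seconds)
y = (mnt > Y1).astype("float32")                # 0: MNT <= Y1, 1: MNT > Y1

# 6:2:2 split into training, validation and test sets
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training is considered finished once validation performance no longer improves.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=200, callbacks=[early_stop], verbose=0)

print("test accuracy:", model.evaluate(X_test, y_test, verbose=0)[1])
```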
Step three, collecting data information to be backed up; the data information to be backed up comprises the total storage capacity the data to be backed up need to occupy and the storage capacity each piece of separated backup data needs to occupy in the case of separable backup; the total storage capacity needed by the data to be backed up is marked as ZC, and the storage capacity needed by each piece of separated backup data in the case of separable backup is marked as FCp.
Step four, analyzing the data information to be backed up against the backup information and determining according to the analysis result whether the data to be backed up are backed up as a whole or separately. This comprises the following steps: comparing the total storage capacity ZC that the data to be backed up need to occupy with a preset threshold; if the total storage capacity ZC is larger than the preset threshold Y2, separate backup is used, and if it is smaller than the preset threshold Y2, secondary analysis processing is performed.
The logic of the secondary analysis process is as follows:
s1, substituting the total storage capacity ZC occupied by the data to be backed up, the total storage capacity ZC i-m corresponding to the backup path, the existing storage capacity XC i-m and the path distance LJ i-m into a calculation formula of a determination value PD, alpha and beta are specific proportionality coefficients, when the judging value PD is more than or equal to a preset threshold value Y3, and +.>When the result of the data information to be backed up is larger than or equal to a preset threshold Y4, the data information to be backed up is backed up in whole, otherwise, the data information to be backed up is backed up separately.
Step five, inputting the data to be backed up into the neural network model according to the analysis result of step four, selecting the optimal backup path according to the output result of the neural network model, and backing up the data to be backed up. In step five this means:
when the data to be backed up are backed up as a whole, backing them up according to preset backup strategy one;
when the data to be backed up are backed up separately, backing them up according to preset backup strategy two.
Step six, acquiring current backup performance statistics when the backup is completed, and optimizing the data backup performance when the current backup performance statistics are greater than the historical backup performance statistics. In this embodiment, the backup performance statistic may be taken as the total time spent from the start of the backup to its completion; the shorter the time spent, the more the data backup performance is considered to be optimized.
Embodiment 2, a machine learning-based data backup performance optimization method, comprising the steps of:
step one, collecting backup information of a target backup system;
step two, collating the collected backup information and building a neural network model;
step three, collecting data information to be backed up;
step four, analyzing the data information to be backed up against the backup information, and determining according to the analysis result whether the data to be backed up are backed up as a whole or separately;
step five, inputting the data to be backed up into the neural network model according to the analysis result of step four, selecting the optimal backup path according to the output result of the neural network model, and backing up the data to be backed up;
and step six, acquiring current backup performance statistics when the backup is completed, and optimizing the data backup performance when the current backup performance statistics are greater than the historical backup performance statistics.
The backup information of the target backup system comprises the backup path of the target backup system, the total storage capacity corresponding to the backup path, the existing storage capacity and the path distance; the backup path is marked as KBi, the total storage capacity corresponding to the backup path as ZCi-m, the existing storage capacity as XCi-m and the path distance as LJi-m.
The collating operation on the collected backup information is as follows:
step 1, obtaining the backup path KBi, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m from the backup information of the target backup system;
and step 2, respectively carrying out unit unification processing on the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths, and summarizing the backup paths KBi, the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths into a backup path collating set.
In the second step, building a neural network model refers to:
step one, simulating a backup operation: during the simulation, data with a data size of SJn are selected as the backup data and then backed up using the backup paths KBi; for each backup path KBi, k groups of data sets are collected, each group comprising the backup path KBi, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m, and the total time taken to complete the simulated backup is marked as MNT;
step two, transmitting the k groups of data sets from step one to the neural network model, taking the data size SJn, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m as the input data for model training and the total time MNT for completing the simulated backup as the output data for model training, dividing the data into a training set, a validation set and a test set in a 6:2:2 ratio, and considering training finished when the performance of the model on the validation set no longer improves;
and step three, on the basis of step two, assigning the total time MNT for completing the simulated backup a value of 0 or 1 for training the model, where 0 indicates that MNT is less than or equal to a set threshold Y1 and 1 indicates that MNT is greater than the set threshold Y1. To confirm that training is completed, an early-stopping mechanism may be set, that is, training is considered finished when the performance of the model on the validation set no longer improves; alternatively, a maximum training period may be set, stopping training after a certain number of iterations regardless of whether the model is still improving. Here the early-stopping mechanism is selected, the neural network in the neural network model is chosen as a convolutional neural network, and the neural network model may be deployed on a learning machine. With the neural network model built in the above manner, a backup path whose total time MNT for completing the simulated backup is greater than the set threshold Y1 is considered liable to have its balance broken during data transmission, which prolongs the overall data transmission time and may even cause the problem of missing backup data; such paths are therefore eliminated during path selection.
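Both termination options mentioned here can be expressed directly in Keras; a brief sketch under the same assumptions as the training example in Embodiment 1 (the patience value and epoch counts are illustrative):

```python
import tensorflow as tf

# Option 1 (chosen here): early stopping once validation performance stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=5,
                                              restore_best_weights=True)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=200, callbacks=[early_stop])

# Option 2: a fixed maximum training period; training stops after the set number
# of epochs regardless of whether the model is still improving.
# model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=50)
```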
In the third step, the data information to be backed up includes the total storage capacity the data to be backed up need to occupy and the storage capacity each piece of separated backup data needs to occupy in the case of separable backup; the total storage capacity needed by the data to be backed up is marked as ZC, and the storage capacity needed by each piece of separated backup data in the case of separable backup is marked as FCp. The way each piece of separated backup data is split in the case of separable backup follows a preset data splitting strategy, for example splitting by lowest-level folders, with each folder subdivided down to the lowest level and then split.
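One possible reading of such a folder-based splitting strategy is sketched below: the data set is decomposed into its lowest-level folders and each folder becomes one piece of separated backup data whose required storage capacity FCp is its on-disk size. The use of `os.walk` and this exact granularity are assumptions made for the example.

```python
import os

def split_by_lowest_level_folders(root: str) -> list[tuple[str, int]]:
    """Split data to be backed up into lowest-level folders.

    Returns (folder_path, FCp) pairs, where FCp is the storage capacity the
    separated backup piece needs, in bytes. This is only one plausible
    splitting strategy consistent with the description above.
    """
    pieces = []
    for dirpath, dirnames, filenames in os.walk(root):
        if not dirnames:  # a lowest-level folder: no sub-folders remain
            size = sum(os.path.getsize(os.path.join(dirpath, f)) for f in filenames)
            pieces.append((dirpath, size))
    return pieces
```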
In the fourth step, analyzing the data information to be backed up against the backup information and determining according to the analysis result whether the data to be backed up are backed up as a whole or separately refers to:
comparing the total storage capacity ZC that the data to be backed up need to occupy with a preset threshold: if the total storage capacity ZC is larger than the preset threshold Y2, separate backup is used, and if it is smaller than the preset threshold Y2, secondary analysis processing is performed.
The logic of the secondary analysis process is as follows:
s1, occupying total storage ZC required by data to be backed up, corresponding total storage ZC i-m of a backup path, existing storage XC i-m and a pathThe radial distance LJ i-m is substituted into the calculation formula of the determination value PD, alpha and beta are specific proportionality coefficients, when the judging value PD is more than or equal to a preset threshold value Y3, and +.>When the result of the data information to be backed up is larger than or equal to a preset threshold Y4, otherwise, the data information to be backed up is backed up separately, and the judging value PD is smaller than the preset threshold Y3, and the balance is determined to be easy to break in the data transmission process, so that the whole data transmission time is prolonged, and even the problem of data backup missing is easy to occur.
In the fifth step, inputting the data to be backed up into the neural network model according to the analysis result of step four, selecting the optimal backup path according to the output result of the neural network model and backing up the data to be backed up means:
when the data to be backed up are backed up as a whole, backing them up according to preset backup strategy one, specifically: inputting the total storage capacity ZC occupied by the data to be backed up, the total storage capacity ZCi-m corresponding to the backup paths, the existing storage capacity XCi-m and the path distance LJi-m into the neural network model, sorting the backup paths KBi whose results are assigned the value 0 in ascending order of the total time MNT for completing the simulated backup, and taking the backup path KBi ranked first as the optimal backup path for backing up the data to be backed up; the smaller the total time MNT for completing the simulated backup, the more stable the transmission performance of the backup path KBi is considered to be.
When the data to be backed up are backed up separately, backing them up according to preset backup strategy two, specifically: sorting the pieces of separated backup data in descending order of the storage capacity FCp each piece needs to occupy; inputting the storage capacity FCp needed by each piece of separated backup data, the total storage capacity ZCi-m corresponding to the backup paths, the existing storage capacity XCi-m and the path distance LJi-m into the neural network model in that order; sorting the backup paths KBi whose results are assigned the value 0 in ascending order of the total time MNT for completing the simulated backup, taking the backup path KBi ranked first as the optimal backup path for the current piece of separated backup data, and then removing that backup path KBi from the candidates, until no result assigned the value 0 remains; all backup paths KBi are then returned to the neural network model for the next batch of matching, with the existing storage capacity XCi-m of every backup path KBi that has completed a backup updated, until every piece of separated backup data needing storage capacity FCp has been matched with a backup path. Within the same batch, only one piece of separated backup data needing storage capacity FCp can be successfully matched with any one backup path KBi. All matching results are listed in the time order in which the matching was completed, and the data to be backed up are then backed up in the order of the matching batches: first, a backup channel is established between the storage capacity FCp needed by the separated backup data successfully matched in the earlier batch and its corresponding backup path KBi; after the backup on that backup path KBi in the earlier batch is completed, the storage capacity FCp needed by the separated backup data corresponding to the same backup path KBi in the next batch is taken and a backup channel is established between the two, until all data to be backed up have been transmitted; the current backup performance statistics are then acquired, and when the current backup performance statistics are greater than the historical backup performance statistics, the data backup performance is optimized.
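For illustration, two hedged sketches of these two strategies follow; they reuse the `BackupPath` record and the trained Keras model from the earlier sketches, and the use of the predicted class-1 probability as a stand-in for the simulated MNT (the patent ranks by MNT itself), the 0.5 cutoff and the helper names are assumptions.

```python
import numpy as np

def pick_whole_backup_path(model, paths, size_needed: float):
    """Pick the optimal backup path for one (whole or separated) piece of data.

    paths       -- iterable of BackupPath records (see the collating sketch).
    size_needed -- storage capacity the piece needs to occupy (ZC or FCp).
    Returns the chosen path, or None if every path is predicted to fall in
    class 1 (simulated-backup time MNT above the threshold Y1).
    """
    feats = np.array([[size_needed, p.zc_bytes, p.xc_bytes, p.lj_m] for p in paths])
    scores = model.predict(feats, verbose=0).ravel()          # P(MNT > Y1) per path
    candidates = [(s, p) for s, p in zip(scores, paths) if s < 0.5]  # keep class 0 only
    if not candidates:
        return None
    candidates.sort(key=lambda sp: sp[0])   # ascending: most stable path first
    return candidates[0][1]
```

The separate-backup matching loop could then be sketched as follows: pieces are handled in descending FCp order, each piece claims the best remaining class-0 path in its batch (one piece per path per batch), the matched path's existing storage capacity XCi-m is updated, and when a batch is exhausted all paths are restored for the next batch, after which the batches are backed up in order.

```python
def match_separated_pieces(model, paths, pieces):
    """Greedy batch matching of separated backup pieces to backup paths.

    pieces -- list of (name, FCp) pairs; paths -- list of BackupPath records.
    Returns a list of batches, each a list of (piece, chosen_path) pairs, in
    the order in which the batches should be backed up.
    """
    pieces = sorted(pieces, key=lambda p: p[1], reverse=True)   # descending FCp
    batches, current, available = [], [], list(paths)
    for piece in pieces:
        chosen = pick_whole_backup_path(model, available, piece[1]) if available else None
        if chosen is None:
            # batch exhausted (or no class-0 path left): close it, restore all paths
            if current:
                batches.append(current)
            current, available = [], list(paths)
            chosen = pick_whole_backup_path(model, available, piece[1])
            if chosen is None:
                raise RuntimeError("no suitable backup path predicted for this piece")
        current.append((piece, chosen))
        available.remove(chosen)          # one piece per backup path within a batch
        chosen.xc_bytes += piece[1]       # update the path's existing storage XCi-m
    if current:
        batches.append(current)
    return batches
```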
All of the above formulas are dimensionless formulas operating on numerical values; each formula is obtained by software simulation on a large amount of collected data so as to approximate the real situation as closely as possible, and the preset parameters in the formulas are set by those skilled in the art according to the actual situation.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. The data backup performance optimization method based on machine learning is characterized by comprising the following steps:
step one, collecting backup information of a target backup system;
step two, collating the collected backup information and building a neural network model;
step three, collecting data information to be backed up;
step four, analyzing the data information to be backed up against the backup information, and determining according to the analysis result whether the data to be backed up are backed up as a whole or separately;
step five, inputting the data to be backed up into the neural network model according to the analysis result of step four, selecting the optimal backup path according to the output result of the neural network model, and backing up the data to be backed up;
and step six, acquiring current backup performance statistics when the backup is completed, and optimizing the data backup performance when the current backup performance statistics are greater than the historical backup performance statistics.
2. The machine learning based data backup performance optimization method of claim 1, wherein the backup information of the target backup system comprises a backup path of the target backup system, a total storage amount corresponding to the backup path, an existing storage amount, and a path distance, and the backup path is marked as KBi, the total storage amount corresponding to the backup path is marked as ZCi-m, the existing storage amount is marked as XCi-m, and the path distance is marked as LJi-m, respectively.
3. The machine learning based data backup performance optimization method of claim 2, wherein the collating operation on the collected backup information is:
step 1, obtaining a backup path KBi, a total storage capacity ZCi-m corresponding to the backup path, an existing storage capacity XCi-m and a path distance LJi-m from backup information of a target backup system;
and 2, respectively carrying out unit unification processing on the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths, and summarizing the backup paths KBi, the total storage capacity ZCi-m, the existing storage capacity XCi-m and the path distance LJi-m corresponding to the backup paths into a backup path collating set.
4. The machine learning based data backup performance optimization method of claim 3, wherein in the second step, building a neural network model means:
step one, simulating backup operation, namely selecting data with the data size of SJn as backup data during simulation, then using backup paths KBi to backup, wherein each backup path KBi collects k groups of data sets, each group of data sets comprises backup paths KBi, total storage capacity ZCi-m corresponding to the backup paths, existing storage capacity XCi-m and path distance LJi-m, and marking the total time for completing the simulation backup as MNT when the simulation backup is completed;
step two, transmitting the k groups of data sets from step one to the neural network model, taking the data size SJn, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m as the input data of model training and the total time MNT for completing the simulation backup as the output data of model training, dividing the data into a training set, a validation set and a test set in a 6:2:2 ratio, and considering training finished when the performance of the model on the validation set no longer improves;
and thirdly, assigning MNT for completing the simulation backup to be 0 or 1 for training the model on the basis of the second step, wherein 0 represents that MNT for completing the simulation backup is smaller than or equal to a set threshold Y1, and 1 represents that MNT for completing the simulation backup is larger than the set threshold Y1.
5. The machine learning based data backup performance optimization method according to claim 4, wherein in step three, the data information to be backed up includes a total occupied storage amount of the data to be backed up, an occupied storage amount of each piece of divided backup data required in the case of divided backup, and the total occupied storage amount of the data to be backed up is denoted as ZC, and the occupied storage amount of each piece of divided backup data required in the case of divided backup is denoted as FCp.
6. The machine learning based data backup performance optimization method according to claim 5, wherein in the fourth step, analysis is performed according to the data information to be backed up and the backup information, and determining that the data information to be backed up is backed up in whole or separately according to the analysis result means that:
comparing the total storage capacity ZC required to be occupied by the data to be backed up with a preset threshold value, if the total storage capacity ZC required to be occupied by the data to be backed up is larger than the preset threshold value Y2, separating backup, and if the total storage capacity ZC required to be occupied by the data to be backed up is smaller than the preset threshold value Y2, performing secondary analysis processing.
7. The machine learning based data backup performance optimization method of claim 6 wherein the logic of the secondary analysis process is as follows:
s1, substituting the total storage capacity ZC occupied by the data to be backed up, the total storage capacity ZCi-m corresponding to the backup path, the existing storage capacity XCi-m and the path distance LJi-m into a calculation formula of a determination value PD, alpha and beta are specific proportionality coefficients, when the judging value PD is more than or equal to a preset threshold value Y3, and +.>When the result of the data information to be backed up is larger than or equal to a preset threshold Y4, the data information to be backed up is backed up in whole, otherwise, the data information to be backed up is backed up separately.
8. The machine learning based data backup performance optimization method of claim 7, wherein in step five, inputting the data to be backed up into the neural network model according to the analysis result in step four, selecting the best backup path according to the output result of the neural network model, and backing up the data to be backed up means:
when the data information to be backed up is backed up in whole, backing up the data to be backed up according to a preset backup strategy;
when the data information to be backed up is backed up separately, backing up the data to be backed up according to a preset backup strategy II.
CN202311721170.4A 2023-12-14 2023-12-14 Data backup performance optimization method based on machine learning Pending CN117667519A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311721170.4A CN117667519A (en) 2023-12-14 2023-12-14 Data backup performance optimization method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311721170.4A CN117667519A (en) 2023-12-14 2023-12-14 Data backup performance optimization method based on machine learning

Publications (1)

Publication Number Publication Date
CN117667519A true CN117667519A (en) 2024-03-08

Family

ID=90071238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311721170.4A Pending CN117667519A (en) 2023-12-14 2023-12-14 Data backup performance optimization method based on machine learning

Country Status (1)

Country Link
CN (1) CN117667519A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination