CN118152124A - Data processing method and system based on cloud computing - Google Patents

Data processing method and system based on cloud computing

Info

Publication number
CN118152124A
Authority
CN
China
Prior art keywords
cloud computing
data
computing resource
data processing
resource quantity
Legal status
Pending
Application number
CN202410296530.9A
Other languages
Chinese (zh)
Inventor
马明星
Current Assignee
Juming Data Nanjing Co ltd
Original Assignee
Juming Data Nanjing Co ltd
Application filed by Juming Data Nanjing Co ltd
Priority to CN202410296530.9A
Publication of CN118152124A
Current legal status: Pending

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a data processing method and system based on cloud computing, relating to the technical field of data processing. The method comprises: preprocessing data and uploading it to a cloud computing platform; analyzing the amount of cloud computing resources required for data processing and formulating a preliminary cloud computing resource allocation strategy; and having the cloud computing platform process the data according to the preliminary strategy while monitoring and adjusting it in real time. The invention uses an LSTM-based time-series prediction model and a cloud computing resource allocation formula, combined with a real-time monitoring system, to adjust the allocated resource amount dynamically, and retrains the LSTM model when the adjusted allocation still fails to relieve a resource shortage. This realizes dynamic resource allocation, improves the efficiency of data processing and resource use, improves data reliability and security, and is significant for processing large-scale data sets efficiently and flexibly.

Description

Data processing method and system based on cloud computing
Technical Field
The invention relates to the technical field of data processing, in particular to a data processing method and system based on cloud computing.
Background
With the rapid development and wide application of cloud computing technology, the demand for processing large-scale data keeps growing, and managing and optimizing cloud computing resources efficiently has become important. Conventional resource allocation methods often rely on static allocation policies, which lead to uneven resource utilization under different conditions, causing resource waste or resource shortage and degrading data processing performance. It is therefore important to develop a more flexible and intelligent data processing method.
Disclosure of Invention
The present invention has been made in view of the above-mentioned problems occurring in the conventional cloud computing-based data processing method and system.
Therefore, the problem to be solved by the present invention is that conventional resource allocation methods often depend on a static allocation policy, which leads to uneven resource utilization under different conditions and thus to resource waste or resource shortage.
In order to solve the above technical problem, the invention provides the following technical solution: a data processing method based on cloud computing, comprising: preprocessing data; analyzing the amount of cloud computing resources required for data processing and formulating a preliminary cloud computing resource allocation strategy; processing, by the cloud computing platform, the data according to the preliminary strategy with real-time monitoring and adjustment; and designing backup and disaster recovery schemes after data processing is completed.
As a preferred scheme of the cloud computing-based data processing method of the present invention: preprocessing the data comprises collecting the data to be processed, removing irrelevant and duplicate data, standardizing and format-converting the data, performing preliminary analysis and statistics on the data, and uploading the preprocessed data to the cloud computing platform.
As a preferred scheme of the cloud computing-based data processing method of the present invention: analyzing the amount of cloud computing resources required for data processing comprises:
determining the number of layers of the LSTM model and the number of units in each layer, and determining the model's inputs and outputs;
dividing the preprocessed data into a training set and a test set, and training the LSTM model on the training set;
iteratively updating the model parameters to optimize model performance;
evaluating model performance on the test set at the end of each training epoch;
stopping training when the validation error no longer changes noticeably after an epoch, which yields the trained LSTM model;
predicting resource demand over a future period with the trained LSTM model used as a time-series prediction model, according to the following formula:
$$\hat{x}_{t+1} = \mathrm{LSTM}\left(x_{t-n}, \ldots, x_{t-1}, x_t\right)$$
where $\hat{x}_{t+1}$ is the predicted future resource amount and $x_{t-n}, \ldots, x_t$ are the historical resource amount data;
inputting the processed data into the LSTM model, which processes the input automatically and outputs the prediction result.
As a preferred scheme of the cloud computing-based data processing method of the present invention: formulating the preliminary cloud computing resource allocation strategy comprises formulating it according to the prediction of the LSTM model. If the prediction indicates that the required resource amount will remain stable, the existing allocation is maintained and monitoring continues; if it indicates that the required resource amount will increase, the allocated resource amount is increased; and if it indicates that the required resource amount will decrease, the allocated resource amount is reduced. Resources are then allocated using the following cloud computing resource allocation formula:
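where $R_{\mathrm{alloc}}$ is the amount of allocated cloud computing resources; $\hat{x}_{t+1}^{\gamma}$ is the predicted future resource amount $\hat{x}_{t+1}$ adjusted by the nonlinear adjustment exponent $\gamma$, which makes the prediction term nonlinear; $\alpha$ is the load adjustment coefficient; $\beta$ is the current load factor; and $L_t$ is the current actual load.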
As a preferred scheme of the cloud computing-based data processing method of the present invention: processing the data according to the preliminary strategy with real-time monitoring and adjustment comprises the following. The cloud computing platform processes the data according to the preliminary cloud computing resource allocation strategy and uses the Prometheus monitoring system to monitor, in real time, the data processing process and the usage of the allocated cloud computing resources. The platform collects the average performance utilization of the cloud computing resources, defined as the mean of the performance utilization of all allocated resources, and sends it back to the Prometheus monitoring system for analysis:
If all of the allocated resources are in use, all data are being processed, and the average performance utilization is above 70% within 5 minutes, the allocation strategy does not need to be adjusted. Monitoring of the average performance utilization continues; when unprocessed data are found, processing is accelerated, and resources released by finished processing are given to the unprocessed data;
If the allocated resources are in use, part of the data is still unprocessed, and the average performance utilization exceeds 70% for more than 5 minutes, the allocated resources are insufficient and the allocation strategy must be adjusted immediately. The platform checks the other allocated resources: if some of them are not in use, they are preferentially reassigned to the part that still has unprocessed data; if the other allocated resources are also in use, the platform re-analyzes the required resource amount based on the remaining unprocessed data, generates a resource reallocation strategy to process the remaining data, and marks and stores this data as historical data, to be used as a reference when updating the allocation strategy. If the reallocation strategy still fails to resolve the shortage, an alarm is sent to notify an administrator, who handles the data immediately; afterwards the processed data are marked and stored as the latest historical data, which replaces the LSTM model's training data for retraining;
If the allocated resources are not in use while all data are being processed, and the average performance utilization stays below 50% for 3 minutes, the platform checks the other allocated resources. If other allocations are short of resources, the unused resources are preferentially reassigned to them; if the other allocated resources are also idle, a resource recycling mechanism is triggered automatically: the performance utilization of each resource is inspected, and resources whose utilization is below 30% are recycled step by step until the average performance utilization rises above 70%.
As a preferred scheme of the cloud computing-based data processing method of the present invention: backing up after data processing is completed comprises determining the key data to be backed up, including databases, important files and system configuration, and setting the backup frequency according to the importance and update frequency of the data.
As a preferred scheme of the cloud computing-based data processing method of the present invention: designing the disaster recovery scheme comprises formulating recovery strategies for different types of disasters such as hardware failures and network attacks, periodically checking and updating backup data and recovery plans, backing up data to multiple data centers in different geographic locations, and deploying an automatic backup system that backs up data periodically.
A data processing system based on cloud computing, characterized by comprising a data preprocessing module, a cloud computing resource allocation strategy module, a real-time monitoring module and a data backup and recovery module;
the data preprocessing module is used for preprocessing data and uploading the preprocessed data to the cloud computing platform;
the cloud computing resource allocation strategy module is used for analyzing the amount of resources required for data processing, formulating the preliminary cloud computing resource allocation strategy, and processing data according to that strategy;
the real-time monitoring module is used for monitoring the data processing process in real time and automatically adjusting resource allocation accordingly;
The data backup and recovery module is used for backing up data and designing a disaster recovery scheme.
A computer device, comprising: a memory and a processor; the memory stores a computer program characterized in that: the processor, when executing the computer program, implements the steps of a data processing method based on cloud computing.
A computer-readable storage medium having stored thereon a computer program, characterized by: the computer program when executed by a processor implements the steps of a cloud computing based data processing method.
The beneficial effects of the invention are as follows: by using an LSTM-based time-series prediction model and a cloud computing resource allocation formula, combined with a real-time monitoring system, the invention adjusts the allocated resource amount dynamically in real time, and retrains the LSTM model when the adjusted allocation fails to relieve a resource shortage. This realizes dynamic resource allocation, improves data processing efficiency, keeps data processing efficient and timely, and improves data reliability and security, which is significant for processing large-scale data sets efficiently and flexibly.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a data processing method based on cloud computing.
Fig. 2 is a schematic diagram of a data processing system based on cloud computing.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
Further, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic can be included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
Example 1
Referring to fig. 1, a first embodiment of the present invention provides a data processing method based on cloud computing, where the data processing method based on cloud computing includes the following steps:
S1, preprocessing data;
Specifically, preprocessing the data includes collecting the data to be processed, removing irrelevant and duplicate data, standardizing and format-converting the data, performing preliminary analysis and statistics on the data, and uploading the preprocessed data to the cloud computing platform.
Data irrelevant to the current analysis target are identified and removed with filters or queries; duplicate data are detected with a data processing tool and deleted with a command; and a consistency check ensures that the remaining data are accurate and consistent. Basic statistical analysis is then performed by computing the number of records, the number of unique values and the number of missing values, counting the total data and how they are distributed across categories, and calculating the mean, median, standard deviation, variance, maximum and minimum. These statistics give a quick picture of the basic characteristics of the data, help identify problems in the data quickly, and serve as a reference for evaluating the performance of the LSTM model. After preprocessing, the data are stored in a storage service provided by the cloud platform and uploaded with the graphical interface and command-line tools that the platform provides. This ensures that the data uploaded to the cloud computing platform are accurate, clean and useful, which reduces the complexity and difficulty of subsequent processing and improves processing speed and efficiency.
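A minimal sketch of these preprocessing steps in plain Python, assuming records arrive as dictionaries and that a single numeric field (a hypothetical resource-usage value) is the one being converted and summarized. The field names and the exact-match deduplication rule are illustrative assumptions, not the patent's implementation.

```python
import statistics


def preprocess(records, relevant_fields, value_field):
    """Keep relevant fields, convert the value field to float, drop exact
    duplicates, and return the cleaned rows plus basic statistics."""
    seen = set()
    cleaned = []
    for rec in records:
        # Filter: keep only fields relevant to the current analysis target.
        row = {f: rec.get(f) for f in relevant_fields}
        # Format conversion: numeric strings such as "42.5" become floats;
        # empty strings and None are treated as missing values.
        raw = row[value_field]
        row[value_field] = None if raw in (None, "") else float(raw)
        # Deduplicate on the normalized row contents.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)

    values = [r[value_field] for r in cleaned if r[value_field] is not None]
    stats = {
        "records": len(cleaned),
        "unique_values": len(set(values)),
        "missing_values": len(cleaned) - len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "variance": statistics.variance(values),
        "max": max(values),
        "min": min(values),
    }
    return cleaned, stats
```

Two rows that differ only in a field dropped by the filter collapse into one, which is why filtering runs before deduplication here.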
S2, analyzing the cloud computing resource quantity required by data processing, and making a preliminary cloud computing resource quantity allocation strategy;
Specifically, analyzing the amount of cloud computing resources required for data processing comprises:
determining the number of layers of the LSTM model and the number of units in each layer, and determining the model's inputs and outputs;
dividing the preprocessed data into a training set and a test set, and training the LSTM model on the training set;
iteratively updating the model parameters to optimize model performance;
evaluating model performance on the test set at the end of each training epoch;
stopping training when the validation error no longer changes noticeably after an epoch, which yields the trained LSTM model;
predicting resource demand over a future period with the trained LSTM model used as a time-series prediction model, according to the following formula:
$$\hat{x}_{t+1} = \mathrm{LSTM}\left(x_{t-n}, \ldots, x_{t-1}, x_t\right)$$
where $\hat{x}_{t+1}$ is the predicted future resource amount and $x_{t-n}, \ldots, x_t$ are the historical resource amount data;
inputting the processed data into the LSTM model, which processes the input automatically and outputs the prediction result.
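The data-preparation and stopping steps above can be sketched in plain Python without committing to a particular deep-learning framework. The sketch assumes the model sees the last n + 1 historical values $x_{t-n}, \ldots, x_t$ and predicts $x_{t+1}$, splits the windows chronologically (no shuffling, so the test set lies after the training set), and reads "the validation error no longer changes noticeably" as patience-based early stopping. The window length, split ratio, `patience` and `min_delta` are assumptions; the patent does not state them.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], n: int) -> List[Tuple[List[float], float]]:
    """Turn a resource-usage series into (x_{t-n..t}, x_{t+1}) pairs."""
    # Each input holds n + 1 consecutive historical values; the target is the next one.
    return [(list(series[i:i + n + 1]), series[i + n + 1])
            for i in range(len(series) - n - 1)]


def chronological_split(pairs, train_fraction: float = 0.8):
    """Split without shuffling so the test windows come after the training windows."""
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]


class EarlyStopping:
    """Stop when validation error has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience: int = 3, min_delta: float = 1e-3):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_error: float) -> bool:
        """Record one epoch's validation error; return True when training should stop."""
        if val_error < self.best - self.min_delta:
            self.best = val_error
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

In a training loop, `stopper.step(val_error)` would be called once per epoch after evaluating on the held-out windows, and training ends the first time it returns `True`.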
The LSTM (long short-term memory) model is a special type of recurrent neural network for processing and predicting time-series data. It can predict future demand for cloud computing resources accurately, so that resources can be allocated more precisely according to the prediction; this improves data processing efficiency, adapts quickly to changes in processing demand, enables timely resource adjustment, and reduces processing delays and system outages caused by insufficient resources. Based on the data complexity and the required prediction performance, a single-layer LSTM with 64 or 128 units is initially chosen as the starting point. If this initial model performs poorly, more LSTM layers are added and the number of units per layer is adjusted according to the training results to find the optimal model complexity. The model input is the historical data sequence, comprising timestamps and the corresponding resource usage, and the model output is the predicted resource demand over a future period.
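The "start with one layer, then grow" search described above can be expressed as a loop over candidate depths and widths. In this sketch `evaluate(layers, units)` is a hypothetical callback that would build, train and validate one LSTM (for example with an early-stopping loop) and return its validation error; the target error and the maximum depth are assumed values, not figures from the patent.

```python
from typing import Callable, Tuple


def select_lstm_config(
    evaluate: Callable[[int, int], float],
    unit_options: Tuple[int, ...] = (64, 128),
    max_layers: int = 3,
    target_error: float = 0.05,
) -> Tuple[float, int, int]:
    """Return (validation_error, layers, units) of the best candidate found.

    Starts with a single LSTM layer and adds layers only while the best
    validation error is still above target_error.
    """
    best = None
    for layers in range(1, max_layers + 1):
        for units in unit_options:
            error = evaluate(layers, units)  # train + validate one candidate
            if best is None or error < best[0]:
                best = (error, layers, units)
        if best[0] <= target_error:
            break  # good enough: stop adding layers
    return best
```

Keeping the search shallow first matches the text's preference for the simplest model that performs adequately; deeper models are only tried when the shallower ones miss the target.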
Further, formulating the preliminary cloud computing resource allocation strategy comprises formulating it according to the prediction of the LSTM model. If the prediction indicates that the required resource amount will remain stable, the existing allocation is maintained and monitoring continues; if it indicates that the required resource amount will increase, the allocated resource amount is increased; and if it indicates that the required resource amount will decrease, the allocated resource amount is reduced. Resources are then allocated using the following cloud computing resource allocation formula:
where $R_{\mathrm{alloc}}$ is the amount of allocated cloud computing resources; $\hat{x}_{t+1}^{\gamma}$ is the predicted future resource amount $\hat{x}_{t+1}$ adjusted by the nonlinear adjustment exponent $\gamma$, which makes the prediction term nonlinear; $\alpha$ is the load adjustment coefficient; $\beta$ is the current load factor; and $L_t$ is the current actual load.
$\alpha$ and $\beta$ can be determined through experiments and analysis of historical load data, and $L_t$ can be obtained by reading the current resource usage. The nonlinear adjustment exponent and the load adjustment coefficient let resource allocation adapt flexibly to different load conditions: during high-demand periods they ensure that sufficient resources are available, maintaining system performance and response speed and avoiding service interruptions or performance degradation caused by insufficient resources. This improves the accuracy and efficiency of resource allocation and the performance and cost-effectiveness of the whole cloud computing platform.
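The three-way rule (maintain, increase, decrease) can be sketched as follows. Because the text does not reproduce how $\alpha$, $\beta$ and $L_t$ combine in the allocation formula, this sketch applies only the nonlinear exponent $\gamma$ to the forecast and leaves the load terms out; the tolerance band used to decide that demand will "remain stable" is likewise a hypothetical parameter.

```python
def plan_allocation(current_alloc: float, predicted: float,
                    gamma: float = 1.0, tolerance: float = 0.05):
    """Return (action, target_alloc) from an LSTM demand forecast.

    `predicted` is the forecast x_hat_{t+1}; it is raised to the nonlinear
    adjustment exponent gamma before comparison. A forecast within
    +/- tolerance (as a fraction of the current allocation) counts as stable.
    The load terms (alpha, beta, L_t) are deliberately omitted here.
    """
    adjusted = predicted ** gamma
    if abs(adjusted - current_alloc) <= tolerance * current_alloc:
        return "maintain", current_alloc  # keep allocation, keep monitoring
    if adjusted > current_alloc:
        return "increase", adjusted
    return "decrease", adjusted
```

With `gamma` above 1 the adjusted demand grows faster than the raw forecast, so the policy scales up more aggressively as predicted demand rises; values below 1 damp it.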
S3, the cloud computing platform processes data according to the preliminary cloud computing resource amount allocation strategy and performs real-time monitoring and adjustment;
Specifically, the cloud computing platform processes the data according to the preliminary cloud computing resource allocation strategy and uses the Prometheus monitoring system to monitor, in real time, the data processing process and the usage of the allocated cloud computing resources. The platform collects the average performance utilization of the cloud computing resources, defined as the mean of the performance utilization of all allocated resources, and sends it back to the Prometheus monitoring system for analysis:
If all of the allocated resources are in use, all data are being processed, and the average performance utilization is above 70% within 5 minutes, the allocation strategy does not need to be adjusted. Monitoring of the average performance utilization continues; when unprocessed data are found, processing is accelerated, and resources released by finished processing are given to the unprocessed data;
If the allocated resources are in use, part of the data is still unprocessed, and the average performance utilization exceeds 70% for more than 5 minutes, the allocated resources are insufficient and the allocation strategy must be adjusted immediately. The platform checks the other allocated resources: if some of them are not in use, they are preferentially reassigned to the part that still has unprocessed data; if the other allocated resources are also in use, the platform re-analyzes the required resource amount based on the remaining unprocessed data, generates a resource reallocation strategy to process the remaining data, and marks and stores this data as historical data, to be used as a reference when updating the allocation strategy. If the reallocation strategy still fails to resolve the shortage, an alarm is sent to notify an administrator, who handles the data immediately; afterwards the processed data are marked and stored as the latest historical data, which replaces the LSTM model's training data for retraining;
If the allocated resources are not in use while all data are being processed, and the average performance utilization stays below 50% for 3 minutes, the platform checks the other allocated resources. If other allocations are short of resources, the unused resources are preferentially reassigned to them; if the other allocated resources are also idle, a resource recycling mechanism is triggered automatically: the performance utilization of each resource is inspected, and resources whose utilization is below 30% are recycled step by step until the average performance utilization rises above 70%.
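A sketch of the three monitoring rules and the recycling step, assuming utilization samples (in percent) arrive in time order with timestamps in minutes and that the unprocessed backlog is a simple count. The `Sample` type, the function names, and the simplification that releasing a resource just removes its utilization figure (rather than migrating its load to the remaining resources) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Sample:
    minute: float       # sample time, in minutes
    utilization: float  # average performance utilization, in percent


def sustained(samples: List[Sample], predicate: Callable[[float], bool],
              window: float) -> bool:
    """True if predicate holds for every sample in the trailing window and
    the (time-ordered) samples cover the whole window."""
    if not samples:
        return False
    end = samples[-1].minute
    recent = [s for s in samples if s.minute >= end - window]
    covers_window = recent[0].minute <= end - window
    return covers_window and all(predicate(s.utilization) for s in recent)


def classify(samples: List[Sample], backlog: int) -> str:
    """Map monitoring data onto the three cases described above."""
    if backlog > 0 and sustained(samples, lambda u: u > 70.0, 5):
        return "shortage"   # reassign idle capacity, re-plan, alert, retrain
    if sustained(samples, lambda u: u < 50.0, 3):
        return "underused"  # reassign unused capacity or trigger recycling
    return "normal"         # keep the current strategy and keep monitoring


def recycle(per_resource: Dict[str, float], low: float = 30.0,
            target: float = 70.0) -> Tuple[List[str], Dict[str, float]]:
    """Release resources below `low` percent, least-used first, until the
    mean utilization of the remaining resources exceeds `target`."""
    pool = dict(per_resource)
    released = []
    for name, util in sorted(per_resource.items(), key=lambda kv: kv[1]):
        mean = sum(pool.values()) / len(pool)
        if mean > target or util >= low or len(pool) == 1:
            break
        del pool[name]
        released.append(name)
    return released, pool
```

Requiring the samples to cover the whole window keeps a brief spike from being read as a sustained shortage, which is the purpose of the 5-minute and 3-minute windows.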
Prometheus is an open-source systems monitoring and alerting toolkit designed for highly dynamic cloud service architectures, which makes it well suited to monitoring in a cloud environment. The performance utilization is computed in real time by the Prometheus monitoring system from statistics on the real-time usage of resources such as CPU, memory, storage and network; averaging this usage data over a time window yields a comprehensive performance utilization index. Utilization within 70% is regarded as healthy, with neither a shortage nor a surplus of resources; utilization above 70% indicates that resources are saturated and a shortage may occur; and utilization that stays below 50% in the short term indicates under-used resources. The 5-minute and 3-minute windows are used to recognize and confirm short-term stable trends in utilization quickly, ensuring that the observed utilization is not an accidental fluctuation. These thresholds and time windows are based on practical operating experience and best practice in cloud resource management, and aim to balance resource utilization, cost and system performance. By using real-time monitoring data, the cloud computing platform keeps resource utilization efficient and ensures that sufficient performance is always available for data processing.
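The composite index described above (average each resource's usage over a window, then average across CPU, memory, storage and network) is sketched below, together with how an instant query could be addressed to Prometheus' HTTP API at `/api/v1/query`. The server address is a placeholder, and the example PromQL expression assumes CPU metrics are scraped from node_exporter (`node_cpu_seconds_total`); neither is specified by the patent.

```python
from statistics import fmean
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Example PromQL: average CPU utilization (%) over the last 5 minutes,
# assuming node_exporter metrics are being scraped.
CPU_QUERY = '100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])))'


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def composite_utilization(window: Dict[str, List[float]]) -> Tuple[float, Dict[str, float]]:
    """Average each resource's samples over the window, then average the
    per-resource means into one comprehensive utilization index (percent)."""
    per_resource = {name: fmean(vals) for name, vals in window.items() if vals}
    return fmean(per_resource.values()), per_resource
```

Fetching the URL (for example with `urllib.request.urlopen`) returns JSON whose `data.result` field holds the query result; the per-resource figures obtained this way would feed `composite_utilization`.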
S4, carrying out backup and disaster recovery scheme design after the data processing is completed;
Specifically, backing up after data processing is completed comprises determining the key data to be backed up, including databases, important files and system configuration, and setting the backup frequency according to the importance and update frequency of the data.
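One way to turn "importance and update frequency" into a concrete schedule is sketched below: each importance tier gets a base interval, data that changes more often than that is backed up roughly once per update, and a minimum interval caps the frequency. The tier names and interval values are hypothetical.

```python
from datetime import timedelta

# Hypothetical base backup intervals per importance tier.
BASE_INTERVAL = {
    "critical": timedelta(hours=1),
    "important": timedelta(hours=6),
    "normal": timedelta(days=1),
}


def backup_interval(importance: str, updates_per_day: float,
                    floor: timedelta = timedelta(minutes=15)) -> timedelta:
    """Shorter of the tier's base interval and the typical gap between
    updates, but never shorter than `floor`."""
    base = BASE_INTERVAL[importance]
    if updates_per_day <= 0:
        return base  # static data: back up at the base cadence only
    gap_between_updates = timedelta(days=1) / updates_per_day
    return max(floor, min(base, gap_between_updates))
```

The floor keeps frequently updated data from triggering backups so often that they crowd out storage and I/O, in line with the text's point about optimizing storage resources.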
Compared with traditional backup methods, the cloud computing platform can provide a more efficient and automated backup solution, reducing the risk of data loss. Timely backups significantly reduce service downtime caused by data loss, and regular backups improve the overall reliability of, and user trust in, the cloud computing platform. Setting the backup frequency according to data importance and update frequency optimizes the use and management of storage resources, and in a cloud computing environment the flexibility and scalability of the backup strategy help it cope with continuously changing service requirements and data volumes.
Further, the disaster recovery scheme design comprises formulating recovery strategies for different types of disasters such as hardware failures and network attacks, periodically checking and updating backup data and recovery plans, backing up data to multiple data centers in different geographic locations, and deploying an automatic backup system that backs up data periodically.
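Two of these measures, spreading copies across geographic regions and periodically checking that stored backups are still intact, can be sketched as follows. The site and region names are hypothetical, and recording a SHA-256 digest at backup time is one common integrity check chosen here for illustration; the patent does not name a checking method.

```python
import hashlib
from typing import Iterable, List, Tuple


def pick_backup_sites(sites: Iterable[Tuple[str, str]], copies: int = 2) -> List[str]:
    """Choose up to `copies` data centers, at most one per geographic region,
    so that a single regional disaster cannot destroy every copy."""
    chosen, regions = [], set()
    for name, region in sites:
        if region not in regions:
            chosen.append(name)
            regions.add(region)
            if len(chosen) == copies:
                break
    return chosen


def digest(data: bytes) -> str:
    """SHA-256 fingerprint recorded when the backup is taken."""
    return hashlib.sha256(data).hexdigest()


def backup_is_intact(backup: bytes, recorded_digest: str) -> bool:
    """Periodic check: the stored backup must still match its recorded digest."""
    return digest(backup) == recorded_digest
```

A scheduled job could call `backup_is_intact` for each stored copy and fall back to a copy in another region when one fails the check.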
The disaster recovery scheme minimizes the impact of potential disasters so that the organization can continue to operate, protects key data and assets, and enables the cloud computing platform to cope effectively with sudden events such as hardware failures and network attacks. This reduces the negative impact of such events on services, significantly enhances the robustness of data processing, and helps keep services running normally even under extreme conditions such as hardware failures and network attacks.
Example 2
Referring to Fig. 2, a second embodiment of the present invention, which differs from the first embodiment, provides a data processing system based on cloud computing comprising a data preprocessing module, a cloud computing resource allocation strategy module, a real-time monitoring module and a data backup and recovery module;
the data preprocessing module is used for preprocessing data and uploading the preprocessed data to the cloud computing platform;
the cloud computing resource quantity allocation strategy module is used for analyzing the amount of resources required for data processing, formulating the preliminary cloud computing resource quantity allocation strategy, and processing data according to that strategy;
the real-time monitoring module is used for monitoring the data processing process in real time and automatically adjusting resource allocation accordingly;
the data backup and recovery module is used for backing up data and designing a disaster recovery scheme.
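The four modules of the second embodiment form a linear pipeline. The Python skeleton below models each module as a plain callable so that the data flow between them is explicit. The class name, field names, signatures, and the toy callables in the demo are hypothetical; the demo is not a real sizing or monitoring rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CloudDataProcessingSystem:
    """Skeleton wiring the four modules of Example 2 in processing order.

    Each module is modelled as a plain callable. The class name, the field
    names, and the signatures are hypothetical; they only show how data
    flows from one module to the next.
    """
    preprocess: Callable[[List[float]], List[float]]        # data preprocessing module
    plan_allocation: Callable[[List[float]], int]           # allocation strategy module
    monitor_and_adjust: Callable[[int, List[float]], int]   # real-time monitoring module
    backup: Callable[[List[float]], None]                   # data backup and recovery module
    log: List[str] = field(default_factory=list)

    def run(self, raw: List[float]) -> int:
        data = self.preprocess(raw)
        self.log.append("preprocessed")
        units = self.plan_allocation(data)
        self.log.append(f"planned {units} units")
        units = self.monitor_and_adjust(units, data)
        self.log.append(f"adjusted to {units} units")
        self.backup(data)
        self.log.append("backed up")
        return units


if __name__ == "__main__":
    stored: List[List[float]] = []
    system = CloudDataProcessingSystem(
        preprocess=lambda xs: sorted({x for x in xs if x >= 0}),  # dedupe, drop invalid
        plan_allocation=lambda xs: max(1, len(xs) // 2),          # toy sizing rule
        monitor_and_adjust=lambda units, xs: units + 1,           # toy adjustment
        backup=stored.append,
    )
    print(system.run([3.0, 1.0, 3.0, -1.0, 2.0]), system.log)
```

Injecting the modules as callables keeps each one independently replaceable. For example, the toy `plan_allocation` could be swapped for an LSTM-based forecaster without touching the pipeline.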
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and comprises several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and various other media capable of storing program code.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by, or in connection with, the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be captured electronically, for instance by optically scanning the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Example 3
In the third example of the present invention, which differs from the first two examples, the method of the present invention is compared with the prior art in order to verify its beneficial effects. The comparison results are shown in Table 1.
Table 1: Experimental comparison between the method of the present invention and the prior art
In summary, by using the LSTM model and the cloud computing resource allocation formula, the present invention provides a more comprehensive and dynamic data processing scheme. The scheme significantly improves data processing speed and resource utilization while reducing both the long-term operating cost and the response time of automatic resource adjustment. The invention therefore offers clear advantages in making data processing more intelligent and automated.
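Claim 4 turns the LSTM forecast into one of three actions: keep the allocation if demand stays stable, increase it if demand grows, and reduce it if demand shrinks. The Python sketch below encodes only that three-way rule. It does not implement the allocation formula with $\alpha$, $\beta$, $\gamma$ and $L_t$. The relative tolerance that defines "stable," the `Action` enum, and rounding up to whole resource units are assumptions.

```python
import math
from enum import Enum
from typing import Tuple


class Action(Enum):
    MAINTAIN = "maintain"
    SCALE_UP = "scale_up"
    SCALE_DOWN = "scale_down"


def preliminary_policy(current_units: int, predicted_demand: float,
                       tolerance: float = 0.1) -> Tuple[Action, int]:
    """Map an LSTM demand forecast onto the three cases of claim 4.

    A forecast within +/- `tolerance` (relative) of the current allocation is
    treated as "stable"; the tolerance value and rounding up to whole units
    are assumptions, not values from the source.
    """
    if current_units <= 0:
        raise ValueError("current_units must be positive")
    change = (predicted_demand - current_units) / current_units
    if abs(change) <= tolerance:
        return Action.MAINTAIN, current_units      # keep allocation, keep monitoring
    target = max(1, math.ceil(predicted_demand))   # whole units, at least one
    if change > 0:
        return Action.SCALE_UP, target
    return Action.SCALE_DOWN, target
```

The tolerance band prevents small forecast fluctuations from causing constant scaling. In a deployment, its width would have to be tuned against the forecast error of the trained model.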
It should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution may be modified or equivalently substituted without departing from its spirit and scope, and all such modifications and substitutions shall fall within the scope of the claims of the present invention.

Claims (10)

1. A cloud computing-based data processing method, characterized by comprising the following steps:
preprocessing data;
analyzing the amount of cloud computing resources required for data processing, and formulating a preliminary cloud computing resource quantity allocation strategy;
the cloud computing platform processing data according to the preliminary cloud computing resource quantity allocation strategy and performing real-time monitoring and adjustment;
and performing backup and disaster recovery scheme design after the data processing is completed.
2. The cloud computing-based data processing method of claim 1, wherein the preprocessing of the data comprises: collecting the data to be processed; removing irrelevant and duplicate data; standardizing the data and converting its format; performing preliminary analysis and statistics on the data; and uploading the preprocessed data to the cloud computing platform.
3. The cloud computing-based data processing method of claim 2, wherein analyzing the amount of cloud computing resources required for data processing comprises:
determining the number of layers of the LSTM model and the number of units in each layer, and determining the inputs and outputs of the model;
dividing the preprocessed data into a training set and a test set, and training the LSTM model with the training set;
performing iterative computation on the LSTM model to adjust its parameters and optimize its performance;
evaluating the performance of the model on the test set after each training epoch;
if the validation difference does not change significantly after a training epoch ends, stopping training to obtain the trained LSTM model;
predicting the resource demand for a future period by using the LSTM model as a time-series prediction model, according to the following formula:
where the formula yields the predicted future resource amount, and $x_{t-n}$ denotes the historical resource amount data;
and inputting the processed data into the LSTM model, which automatically processes the input data and outputs the prediction result.
4. The cloud computing-based data processing method of claim 3, wherein formulating the preliminary cloud computing resource allocation strategy comprises formulating it according to the prediction result of the LSTM model: if the prediction indicates that the required resource amount will remain stable, the existing cloud computing resource amount is maintained and continuously monitored; if the prediction indicates that the required resource amount will increase, the allocated cloud computing resource amount is increased; if the prediction indicates that the required resource amount will decrease, the allocated cloud computing resource amount is reduced; and cloud computing resources are allocated according to the following cloud computing resource allocation formula:
where the predicted future resource amount is adjusted by the nonlinear adjustment index $\gamma$ so that it enters the formula nonlinearly, $\alpha$ is the load adjustment factor, $\beta$ is the current load factor, and $L_t$ is the current actual load.
5. The cloud computing-based data processing method of claim 4, wherein the cloud computing platform processing data according to the preliminary cloud computing resource quantity allocation strategy and performing real-time monitoring and adjustment comprises: the cloud computing platform processes the data according to the preliminary strategy and uses the Prometheus monitoring system to monitor, in real time, the data processing process and the usage of the allocated cloud computing resources; the platform collects the average performance utilization of the cloud computing resources, defined as the mean of the performance utilization rates of all cloud computing resources, and sends it back to the Prometheus monitoring system for analysis:
if all of the allocated cloud computing resources are in use, all data is being processed, and the average performance utilization exceeds 70% within 5 minutes, the allocation strategy does not need to be adjusted; the average performance utilization continues to be monitored and checked, the data processing rate is increased when unprocessed data is found, and the allocated cloud computing resources are released once data processing is completed;
if the allocated cloud computing resources are in use but part of the data remains unprocessed, and the average performance utilization exceeds 70% for more than 5 minutes, the allocated resources are insufficient and the allocation strategy must be adjusted immediately. The cloud computing platform checks the other allocated cloud computing resources. If any of them are not in use, they are preferentially assigned to the part that still has unprocessed data. If the other allocated resources are also in use, the platform re-analyzes the required amount of cloud computing resources based on the remaining unprocessed data and generates a cloud computing resource reallocation strategy to process the remaining data. It also marks and stores that data as historical data, to be used as a reference when updating the allocation strategy. If the reallocation strategy still fails to resolve the resource shortage, an alarm is sent to notify an administrator to handle the data immediately. After handling, the processed data is marked and stored as the latest historical data, which replaces the training data of the LSTM model for retraining;
if the allocated cloud computing resources are not in use while all data is being processed, and the average performance utilization stays below 50% continuously for 3 minutes, the cloud computing platform checks the other allocated cloud computing resources. If they are in use, the unused resources are preferentially assigned to them. If they are not in use either, a resource reclamation mechanism is triggered automatically: the performance utilization of each resource is examined individually, and resources with a performance utilization below 30% are reclaimed step by step until the average performance utilization exceeds 70%.
6. The cloud computing-based data processing method of claim 5, wherein the backing up after the data processing is completed comprises: storing the data in a database after processing; determining the key data to be backed up, including databases, important files, and system configuration; and setting the backup frequency according to the importance and update frequency of the data.
7. The cloud computing-based data processing method of claim 6, wherein the disaster recovery scheme design comprises: formulating recovery strategies for different types of disasters, such as hardware faults and network attacks; periodically checking and updating the backup data and recovery plans; selecting multiple data centers in different geographic locations for data backup; and deploying an automated backup system that backs up data periodically.
8. A cloud computing-based data processing system, characterized by comprising a data preprocessing module, a cloud computing resource quantity allocation strategy module, a real-time monitoring module, and a data backup and recovery module;
the data preprocessing module is used for preprocessing data and uploading the preprocessed data to the cloud computing platform;
the cloud computing resource quantity allocation strategy module is used for analyzing the amount of resources required for data processing, formulating the preliminary cloud computing resource quantity allocation strategy, and processing data according to that strategy;
the real-time monitoring module is used for monitoring the data processing process in real time and automatically adjusting resource allocation accordingly;
the data backup and recovery module is used for backing up data and designing a disaster recovery scheme.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
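Claim 5 gives concrete thresholds. A shortage is a utilization above 70% for more than 5 minutes with data still unprocessed. Underuse is a utilization below 50% for 3 minutes. Reclamation releases resources below 30% until the average exceeds 70%. The Python sketch below encodes those rules. The sample format, the function names, and the assumption that all resources have equal capacity (so a released resource's load spreads evenly over the rest) are illustrative. In the claimed method, the utilization data comes from the Prometheus monitoring system, which this sketch does not call.

```python
from typing import Callable, Dict, List, Tuple

HIGH = 0.70               # thresholds stated in claim 5
LOW = 0.50
RECLAIM_BELOW = 0.30
SHORTAGE_WINDOW_S = 300   # "more than 5 minutes"
IDLE_WINDOW_S = 180       # "within 3 minutes"

Sample = Tuple[float, float]  # (timestamp in seconds, average utilization 0..1)


def sustained(samples: List[Sample], window_s: float,
              predicate: Callable[[float], bool]) -> bool:
    """True if the samples cover the last `window_s` seconds and all satisfy `predicate`."""
    if not samples:
        return False
    now = samples[-1][0]
    start = now - window_s
    if samples[0][0] > start:
        return False  # not enough history to cover the whole window
    return all(predicate(u) for t, u in samples if t >= start)


def classify(samples: List[Sample], unprocessed_items: int) -> str:
    """Return which branch of claim 5 applies to the latest utilization history."""
    if unprocessed_items > 0 and sustained(samples, SHORTAGE_WINDOW_S, lambda u: u > HIGH):
        return "shortage"   # reassign idle units, else re-plan, else alert and retrain
    if sustained(samples, IDLE_WINDOW_S, lambda u: u < LOW):
        return "underused"  # reassign idle units or trigger reclamation
    return "normal"         # keep the current strategy and keep monitoring


def reclaim(per_resource: Dict[str, float]) -> List[str]:
    """Release resources below RECLAIM_BELOW, least utilized first, until the
    average utilization of the remaining resources exceeds HIGH.

    Assumes equal-capacity resources whose load is spread evenly over the
    survivors when one is released, so the new average is total_load / count.
    """
    total_load = sum(per_resource.values())
    count = len(per_resource)
    released: List[str] = []
    for name, util in sorted(per_resource.items(), key=lambda kv: kv[1]):
        if util >= RECLAIM_BELOW or count <= 1 or total_load / count > HIGH:
            break
        released.append(name)
        count -= 1
    return released
```

Requiring the samples to span the whole window keeps a single short spike from being reported as a sustained shortage. It also means a freshly started monitor reports "normal" until enough history has accumulated.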
CN202410296530.9A 2024-03-15 2024-03-15 Data processing method and system based on cloud computing Pending CN118152124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410296530.9A CN118152124A (en) 2024-03-15 2024-03-15 Data processing method and system based on cloud computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410296530.9A CN118152124A (en) 2024-03-15 2024-03-15 Data processing method and system based on cloud computing

Publications (1)

Publication Number Publication Date
CN118152124A (en) 2024-06-07

Family

ID=91292401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410296530.9A Pending CN118152124A (en) 2024-03-15 2024-03-15 Data processing method and system based on cloud computing

Country Status (1)

Country Link
CN (1) CN118152124A (en)

Similar Documents

Publication Publication Date Title
US11003641B2 (en) Automatic database troubleshooting
US9280436B2 (en) Modeling a computing entity
CN111381970B (en) Cluster task resource allocation method and device, computer device and storage medium
CN112148561A (en) Service system running state prediction method and device and server
CN113238714A (en) Disk capacity prediction method and system based on historical monitoring data and storage medium
US20230034061A1 (en) Method for managing proper operation of base station and system applying the method
CN117439256A (en) Power station equipment management method and system based on Internet of things
CN117971488A (en) Storage management method and related device for distributed database cluster
CN115114124A (en) Host risk assessment method and device
CN116539994A (en) Substation main equipment operation state detection method based on multi-source time sequence data
CN113703974B (en) Method and device for predicting server capacity
CN118152124A (en) Data processing method and system based on cloud computing
CN116049765A (en) Data analysis processing method, device and equipment
CN116107854A (en) Method, system, equipment and medium for predicting operation maintenance index of computer
CN116185797A (en) Method, device and storage medium for predicting server resource saturation
CN114389962A (en) Broadband loss user determination method and device, electronic equipment and storage medium
CN114168409A (en) Service system running state monitoring and early warning method and system
CN113065234A (en) Batch reliability risk level assessment method and system for intelligent electric meters
CN112395167A (en) Operation fault prediction method and device and electronic equipment
CN118101421B (en) Intelligent alarm threshold self-adaption method based on machine learning
RU2809254C1 (en) Method and system for monitoring automated systems
CN115297018B (en) Operation and maintenance system load prediction method based on active detection
CN117150032B (en) Intelligent maintenance system and method for hydropower station generator set
CN118409929A (en) Operation and maintenance method and device of database, computer storage medium and electronic equipment
CN116737784A (en) Method, apparatus, device, medium and program product for detecting periodic fluctuation of data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination