CN117851122A - Disaster recovery backup and recovery system of power information system in cloud environment

Publication number
CN117851122A
Authority
CN
China
Prior art keywords
backup
disaster recovery
data
information
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311718958.XA
Other languages
Chinese (zh)
Inventor
汤铭
何金陵
程昕云
夏飞
王鹏飞
李亚乔
王智慷
刘喆
宋浒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202311718958.XA
Publication of CN117851122A
Legal status: Pending

Abstract

The invention discloses a disaster recovery backup and recovery system for a power information system in a cloud environment. The system comprises a disaster recovery center and a DevOps pipeline module established on a private cloud, together with production servers, backup servers and disaster recovery servers distributed across public clouds. The disaster recovery center comprises a monitoring center, a dispatching center, a storage center and a disaster recovery backup vertical model generated by training on the basis of a large language model. When the monitoring center detects that a production server on one public cloud is abnormal, it notifies the disaster recovery scheduling module to trigger the DevOps pipeline module; the production pipeline information is obtained automatically, the address information of the disaster recovery server and the disaster recovery backup data are obtained from the disaster recovery service slave node, the deployment host information is changed and the deployment is carried out, and the DNS server is notified to change the IP information corresponding to the domain name of the production service to that of the disaster recovery server. The invention greatly improves the disaster recovery backup and recovery capability of the power information system.

Description

Disaster recovery backup and recovery system of power information system in cloud environment
Technical Field
The invention relates to the technical field of disaster recovery of power information systems, in particular to a disaster recovery backup recovery system of a power information system in a cloud environment.
Background
A single public cloud or private cloud is no longer sufficient to support increasingly complex user needs, and enterprises increasingly need to deploy services across platforms and regions. A hybrid cloud fuses public and private clouds, balancing the security of the private cloud with the economy of the public cloud, and has become the main direction of cloud computing development. With a hybrid cloud, the power industry can better cope with continuously changing power demand and load fluctuation: the core power systems and key applications are deployed in a private cloud environment, while the elastic resources of the public cloud are used to meet temporary or sudden demands, such as load increases in peak hours or temporary computing tasks.
In the power industry, the use of hybrid cloud deployments is increasing. A hybrid cloud is a deployment model that combines a private cloud (on-premises) with a public cloud, retaining critical business applications and sensitive data in the private cloud while taking advantage of the public cloud's elasticity and flexibility to handle temporary workloads and bursty business needs.
In the power industry, hybrid cloud deployment may offer the following advantages:
(1) Elastic expansion capability: the power industry may face temporary peak loads, such as increased data processing demands caused by power peaks or emergencies. By taking advantage of the elastic expansion capabilities of the public cloud, these temporary workloads can be quickly provided with sufficient computing and storage resources without having to invest in and maintain additional private cloud infrastructure for this purpose.
(2) Disaster preparedness and disaster recovery: critical business applications in the power industry require high availability and disaster recovery protection. Hybrid cloud deployments can use public clouds as backup and disaster recovery locations to ensure that services can be quickly restored when the private cloud fails or suffers a disaster. The cross-regional disaster recovery capability provided by the public cloud can provide more reliable protection for key applications in the power industry.
The backup data of a hybrid cloud includes deployment packages, deployment scripts, configuration files, data scripts and the like, and traditional NLP analysis is insufficient to check the integrity and configuration consistency of such file data. In some scenarios the disaster recovery administrator can manually confirm and compare part of the backed-up deployment packages, deployment scripts, configuration files and data scripts, but this takes a long time and cannot cope with frequent production backup demands. Moreover, because the disaster recovery administrator is unfamiliar with the production business, only information such as the file name or the file's MD5 checksum can be confirmed, and possible omissions in the data or configuration file content cannot be identified. The invention of patent application number 202011359464.3 provides a solution for data transmission delay when the bandwidth between the master and slave ends is low; it can only ensure that data transmission is not delayed and cannot identify data loss or configuration file inconsistency.
Disclosure of Invention
The invention aims to provide a disaster recovery backup and recovery system for a power information system in a cloud environment, addressing the problem that power information systems are deployed on hybrid clouds while their disaster recovery backup and recovery methods remain incomplete. A disaster recovery center is established with the private cloud as the master node, and each public cloud environment serves as a slave node whose production servers are protected. The production servers register with the disaster recovery center, and the disaster recovery center, according to the registration information and on the basis of DevOps pipelines, performs mutual backup of environment and data (deployment packages, deployment scripts and production data) among the production servers of multiple public clouds in real time. When a public cloud becomes abnormal, the production pipeline is automatically copied, the deployment host information is changed, and the service is rapidly deployed and switched to another public cloud, so that the production service continues to be provided normally. Meanwhile, a disaster recovery backup vertical model is generated through training, which solves the problem that traditional NLP analysis is insufficient to check the integrity and configuration consistency of backup file data; during disaster recovery backup, a dialogue prompt function helps disaster recovery operators analyze the rationality and standardization of backup strategies, greatly improving the disaster recovery backup and recovery capability of the power information system.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
the disaster recovery backup and recovery system of the power information system in the cloud environment comprises a disaster recovery center and a DevOps pipeline module which are built on a private cloud, and production servers, backup servers and disaster recovery servers which are distributed on public clouds; the private cloud serves as the master node and each public cloud as a slave node, the master node is connected with each slave node, and the production server and the disaster recovery server in slave nodes having a disaster recovery backup relationship are connected with each other;
the disaster recovery center comprises a monitoring center, a dispatching center, a storage center and a disaster recovery backup vertical model which is generated based on large language model training;
the storage center divides a plurality of areas according to different storage clouds and respectively stores production data, cache data and configuration files provided by backup servers in various cloud environments, and the data of the plurality of areas are assembled together to form integral disaster-tolerant backup data;
the monitoring center is used for monitoring the states of various components, systems and services in the disaster recovery environment in real time;
the dispatching center comprises a registration module, a disaster recovery scheduling module and a scheduler module; the production servers on each public cloud register with the disaster recovery center through the registration module; the scheduler module, according to the registration information, manages and controls the scheduling nodes in each public cloud to execute backup tasks, performs mutual backup of environment and data among the production servers of multiple public clouds in real time on the basis of DevOps pipelines, invokes the disaster recovery backup vertical model to correlate key data points in the production data, cache data and configuration files with context information, analyzes and supplements missing content of the backup data, checks the data integrity and configuration consistency of the supplemented disaster recovery backup data, judges the validity of the disaster recovery backup data in combination with the type and level of the disaster recovery backup strategy, confirms whether the disaster recovery backup data meets the requirements of the backup strategy, and, after verification, generates a scheduling task for the disaster recovery service slave node to call;
when the monitoring center detects that a production server on one public cloud is abnormal, it immediately notifies the disaster recovery scheduling module to trigger the DevOps pipeline module; the corresponding production pipeline information in DevOps is automatically obtained according to the information of the abnormal production server, the address information of the disaster recovery server and the disaster recovery backup data are obtained from the disaster recovery service slave node, the deployment host information is changed and the deployment is carried out, and the DNS server is notified to change the IP information corresponding to the domain name of the production service to that of the disaster recovery server, so that users accessing from the public network are switched directly to the disaster recovery server.
Further, the storage center also stores disaster recovery backup process data and approval process data;
the disaster recovery backup process data comprises backup data, backup time stamps, backup places, backup file paths, backup state information and backup strategies; the backup strategy comprises full backup, incremental backup, backup period and backup data volume;
the approval process data is approval process information including sponsors, approvers, approval time and approval opinions.
Further, the disaster recovery backup recovery system comprises a model management module;
The model management module comprises a data preprocessing component, a model construction component, a consistency check training component and a strategy check training component;
the data preprocessing component is used for collecting text data in the power industry, including data backup specifications, technical documents, the operation and maintenance knowledge base and the disaster recovery knowledge base; cleaning the text data by removing special characters and punctuation marks, segmenting the text into words and normalizing letter case; and unifying the terms contained in the text data into specific power industry vocabulary on the basis of a domain vocabulary;
the model construction component is used for selecting an open-source pre-trained language model, GPT-3.5 or BERT, as the base model, and fine-tuning the parameters of that model with the preprocessed power industry text data output by the data preprocessing component, so as to construct the disaster recovery backup vertical model;
the consistency check training component trains the disaster recovery backup vertical model for the consistency check scenario and guides the model to analyze the consistency of data file or configuration file content; the analyzed content includes the configuration of the Redis cluster in the configuration files of a service application backup package, and on this basis the model is trained to judge whether the configuration files of different microservices under the same application system are consistent in their configuration of the Redis middleware;
the strategy check training component trains the disaster recovery backup vertical model for the disaster recovery strategy check scenario and guides the model to judge the timeliness of the disaster recovery backup data according to the type or level of the disaster recovery backup system.
Further, the model management module also comprises a question-answer type guidance training component; the question-answer type guidance training component trains the disaster recovery backup vertical model aiming at the question-answer type guidance scene; the training process comprises the following steps:
scenario data is collected from users for different disaster recovery backup scenarios, and the content of the power industry's backup strategy guidance documents is input to the disaster recovery backup vertical model in advance, so that the model asks the user questions about the input information and collects the corresponding key information: the system level, and whether the backup data is business data or operation and maintenance data; if the user does not know the system level, the user is guided to provide the system's classified protection rating; likewise, for different scenarios, the model is guided to obtain the corresponding preconditions from the user and then return the answer to the user.
The content of the power industry's backup strategy guidance documents comprises the backup strategy, backup frequency and backup mode for the business data and the operation and maintenance data of systems at three levels: core, important and ordinary.
Further, the model management module further comprises an evaluation component;
the evaluation component selects evaluation metrics, including perplexity and BLEU score, to evaluate the performance of the disaster recovery backup vertical model, adjusts the model's hyperparameters (learning rate and batch size) according to the evaluation results, and finds the optimal parameter combination by cross-validation.
Further, the disaster recovery scheduling module comprises a disaster recovery plan management component, a disaster recovery strategy making component, a task priority and scheduling component and an automatic task scheduling component;
the disaster recovery plan management component is responsible for managing and executing a disaster recovery plan, and the disaster recovery plan comprises a disaster recovery strategy, steps and a schedule;
the disaster recovery strategy making component is used for setting a disaster recovery strategy;
the task priority and scheduling component is used for determining the execution sequence and the resource allocation priority of the disaster recovery scheduling task according to the task priority and the scheduling strategy;
the automatic task scheduling component is used for coordinating and managing backup, data synchronization, service recovery and test tasks in a disaster recovery environment according to a disaster recovery plan and a disaster recovery strategy which are defined in advance.
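The ordering performed by the task priority and scheduling component can be illustrated with a minimal sketch. It assumes a simple numeric priority in which a smaller number runs first and tasks of equal priority run in submission order; the names `ScheduledTask` and `TaskQueue` and the priority values are hypothetical and are not taken from the patent.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class ScheduledTask:
    """A disaster recovery task ordered by (priority, submission sequence)."""
    priority: int                      # assumption: smaller value = more urgent
    sequence: int                      # tie-breaker that preserves submission order
    name: str = field(compare=False)   # task label, ignored when ordering


class TaskQueue:
    """Hypothetical priority queue for disaster recovery scheduling tasks."""

    def __init__(self) -> None:
        self._heap: list[ScheduledTask] = []
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        task = ScheduledTask(priority, next(self._counter), name)
        heapq.heappush(self._heap, task)

    def next_task(self) -> str:
        """Remove and return the name of the most urgent task."""
        return heapq.heappop(self._heap).name


queue = TaskQueue()
queue.submit("test drill", priority=3)
queue.submit("service recovery", priority=0)
queue.submit("data synchronization", priority=1)
queue.submit("backup", priority=1)
order = [queue.next_task() for _ in range(4)]
```

Here service recovery is taken first, the two priority-1 tasks follow in the order they were submitted, and the test drill runs last. A real scheduler would also weigh resource allocation, which this sketch leaves out.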
Further, the scheduler module comprises a scheduler setting component, a backup task executing component, a monitoring and reporting component and an exception handling component;
the scheduler setting component triggers backup tasks at the specified schedule and priority according to a predefined strategy;
the backup task execution component notifies the scheduling nodes in each public cloud to execute the backup task according to the availability and resource condition of the equipment;
the monitoring and reporting component is used for monitoring the execution condition of the backup task in the scheduling node and generating a corresponding monitoring report;
the exception handling component passes information about the abnormal backup task to the disaster recovery backup vertical model, which comprehensively analyzes the cause of the abnormality and the subsequent handling method on the basis of the abnormal content and the historical operation and maintenance experience it has learned.
Further, the backup task execution component, according to the middleware service information corresponding to the production application service, copies the latest MySQL data on the production side to the storage center, creates a real-time MySQL data update task that monitors the production-side binlog, and continuously updates and stores newly generated production data; each backup and update operation generates a scheduling task to be processed by the disaster recovery slave node.
Further, the disaster recovery strategy type is a cross-cloud disaster recovery mode, and the backup type comprises a cold backup mode, a hot backup mode and a multi-activity mode.
Further, the registration information of the production server on each public cloud comprises production server information of the public cloud to be protected, parameter information of the production service and middleware parameter information corresponding to the production service;
the public cloud production server information to be protected comprises public cloud information, an IP address, an account password, an operating system, a storage type, network configuration and a security group;
the parameter information of the production service comprises a deployed pipeline address, a pipeline name and pipeline account information;
the middleware parameter information corresponding to the production service comprises the Kafka message middleware and the MySQL database on which the service depends.
Compared with the prior art, the invention has the following beneficial effects:
First, the disaster recovery backup and recovery system for the power information system in the cloud environment establishes a disaster recovery center with the private cloud as the master node and takes each public cloud environment as a slave node to protect the production servers in the public cloud environments. The production servers register with the disaster recovery center, and the disaster recovery center, according to the registration information and on the basis of DevOps pipelines, performs mutual backup of environment and data (deployment packages, deployment scripts and production data) among the production servers of multiple public clouds in real time. When a public cloud becomes abnormal, the production pipeline is automatically copied, the deployment host information is changed, and the service is rapidly deployed and switched to another public cloud, ensuring that the production service continues to be provided normally.
Second, the system uses a self-trained disaster recovery backup vertical model during disaster recovery backup, empowers the backup process with the operation and maintenance knowledge base, disaster recovery backup records and operation and maintenance monitoring data, outputs disaster recovery backup data in dialogue form, and helps analyze the rationality of the disaster recovery backup strategy, the integrity of the backup data and the like.
Third, by using the self-trained disaster recovery backup vertical model, the system can provide question-and-answer guidance, configuration file consistency checking, disaster recovery strategy checking and other functions during disaster recovery backup.
Drawings
FIG. 1 is a schematic diagram of a disaster recovery backup recovery system of an electric power information system in a cloud environment according to the present invention;
FIG. 2 is a schematic diagram of a disaster recovery backup vertical model.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Terms used in this embodiment: DevOps is a collective term for a set of processes, methods and systems that facilitate communication, collaboration and integration between development (application/software engineering), technical operations and quality assurance (QA) departments.
Referring to FIG. 1, this embodiment discloses a disaster recovery backup and recovery system for a power information system in a cloud environment, comprising a disaster recovery center and a DevOps pipeline module which are built on a private cloud, and production servers, backup servers and disaster recovery servers which are distributed on public clouds; the private cloud serves as the master node and each public cloud as a slave node, the master node is connected with each slave node, and the production server and the disaster recovery server in slave nodes having a disaster recovery backup relationship are connected with each other;
the disaster recovery center comprises a monitoring center, a dispatching center, a storage center and a disaster recovery backup vertical model which is generated based on large language model training;
the storage center divides a plurality of areas according to different storage clouds and respectively stores production data, cache data and configuration files provided by backup servers in various cloud environments, and the data of the plurality of areas are assembled together to form integral disaster-tolerant backup data;
the monitoring center is used for monitoring the states of various components, systems and services in the disaster recovery environment in real time;
the dispatching center comprises a registration module, a disaster recovery scheduling module and a scheduler module; the production servers on each public cloud register with the disaster recovery center through the registration module; the scheduler module, according to the registration information, manages and controls the scheduling nodes in each public cloud to execute backup tasks, performs mutual backup of environment and data among the production servers of multiple public clouds in real time on the basis of DevOps pipelines, invokes the disaster recovery backup vertical model to correlate key data points in the production data, cache data and configuration files with context information, analyzes and supplements missing content of the backup data, checks the data integrity and configuration consistency of the supplemented disaster recovery backup data, judges the validity of the disaster recovery backup data in combination with the type and level of the disaster recovery backup strategy, confirms whether the disaster recovery backup data meets the requirements of the backup strategy, and, after verification, generates a scheduling task for the disaster recovery service slave node to call;
when the monitoring center detects that a production server on one public cloud is abnormal, it immediately notifies the disaster recovery scheduling module to trigger the DevOps pipeline module; the corresponding production pipeline information in DevOps is automatically obtained according to the information of the abnormal production server, the address information of the disaster recovery server and the disaster recovery backup data are obtained from the disaster recovery service slave node, the deployment host information is changed and the deployment is carried out, and the DNS server is notified to change the IP information corresponding to the domain name of the production service to that of the disaster recovery server, so that users accessing from the public network are switched directly to the disaster recovery server.
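The failover sequence described above (copy the production pipeline, change the deployment host, repoint DNS) can be sketched as follows. This is only an illustration under stated assumptions: the classes `Pipeline` and `DisasterRecoveryCenter`, the in-memory dictionaries standing in for the pipeline registry and the DNS server, and all addresses and domain names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """A DevOps pipeline definition (hypothetical structure)."""
    name: str
    deploy_host: str


@dataclass
class DisasterRecoveryCenter:
    # production host -> pipeline that deploys the production service
    pipelines: dict[str, Pipeline]
    # production host -> address of its paired disaster recovery server
    dr_servers: dict[str, str]
    # domain name -> IP address currently served by DNS (stand-in for a DNS server)
    dns_records: dict[str, str] = field(default_factory=dict)

    def fail_over(self, failed_host: str, domain: str) -> Pipeline:
        """Redeploy the failed host's pipeline on its DR server and repoint DNS."""
        source = self.pipelines[failed_host]
        dr_address = self.dr_servers[failed_host]
        # Copy the production pipeline, changing only the deployment host.
        copy = Pipeline(name=f"{source.name}-dr", deploy_host=dr_address)
        self.pipelines[dr_address] = copy
        # Point the production domain at the disaster recovery server.
        self.dns_records[domain] = dr_address
        return copy


center = DisasterRecoveryCenter(
    pipelines={"10.0.1.10": Pipeline("billing", "10.0.1.10")},
    dr_servers={"10.0.1.10": "10.0.2.20"},
    dns_records={"billing.example.com": "10.0.1.10"},
)
recovered = center.fail_over("10.0.1.10", "billing.example.com")
```

After the call, the copied pipeline targets the disaster recovery server and the domain resolves to that server. Restoring the disaster recovery backup data and running the actual deployment are outside the scope of this sketch.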
1. Disaster recovery center
Building a disaster recovery center in private cloud of the power industry, wherein the disaster recovery center comprises the following modules:
(1) Communication module: the disaster recovery center establishes stable network connections and communication equipment to ensure real-time data synchronization and communication with the main data center. This includes network devices, firewalls, routers, switches and other communication devices;
(2) Storage center: the disaster recovery center ensures the reliability and integrity of key data through backup and storage mechanisms. This includes periodically backing up key data, configuration files and databases and storing them in the storage center for recovery when a disaster occurs. Unlike a traditional storage center, the data sources of a hybrid cloud are distributed: core data comes from the production services in the private cloud, while real-time production data is stored in the respective public clouds. The storage center is divided into a plurality of areas according to the cloud in which the data is stored, each area storing the production data, cache data, configuration information and the like of one cloud environment, and the data of these areas is assembled to form the overall disaster recovery data.
In addition to the backed-up key data, disaster recovery backup process data and approval process data are also stored in the storage center. The disaster recovery backup process data generally comprises the backup data, backup timestamps, backup locations, backup file paths, backup states and other information, together with the backup strategy, which specifies full or incremental backup, the backup period, the backup data volume and other information. The approval process data comprises the details of the approval process, including sponsors, approvers, approval times, approval opinions and the like. The disaster recovery backup process data and the approval process data are important for tracing the disaster recovery backup approval process; they can also serve as disaster recovery backup knowledge base data for personnel to review, and subsequently as a data foundation for the disaster recovery backup vertical model.
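One possible way to model the process records kept in the storage center is shown below. The field list follows the description; the class names, field names, units and sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class BackupMode(Enum):
    FULL = "full"
    INCREMENTAL = "incremental"


@dataclass(frozen=True)
class BackupPolicy:
    mode: BackupMode
    period_hours: int        # assumed unit for the backup period
    data_volume_bytes: int   # assumed unit for the backup data volume


@dataclass(frozen=True)
class BackupProcessRecord:
    timestamp: datetime
    location: str            # backup place, e.g. which cloud or region
    file_path: str
    status: str
    policy: BackupPolicy


@dataclass(frozen=True)
class ApprovalRecord:
    sponsor: str
    approver: str
    approved_at: datetime
    opinion: str


# Hypothetical sample values.
record = BackupProcessRecord(
    timestamp=datetime(2023, 12, 14, 2, 0),
    location="public-cloud-a",
    file_path="/backups/billing/2023-12-14.tar.gz",
    status="succeeded",
    policy=BackupPolicy(BackupMode.FULL, period_hours=12, data_volume_bytes=5_000_000),
)
approval = ApprovalRecord("operator-1", "manager-1", datetime(2023, 12, 14, 1, 30), "approved")
```

Keeping the records immutable (`frozen=True`) suits their use as an audit trail for tracing the approval process, though that is a design choice of this sketch rather than something the description requires.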
(3) Disaster recovery backup vertical model: unlike traditional NLP text comparison or keyword matching methods, the disaster recovery backup vertical model can correlate key data points with context information in disaster recovery backup key data, configuration files, database scripts and other data, and effectively identify missing content in the backup data so as to ensure its integrity. To avoid data leakage and model leakage, the power industry disaster recovery backup vertical model is trained on the power industry disaster recovery backup knowledge base and deployed in a network-isolated intranet environment. Referring to FIG. 2, the training process of the disaster recovery backup vertical model includes the following steps:
First, data collection and preprocessing are carried out. Text data in the power industry, including data backup specifications, technical documents, the operation and maintenance knowledge base, the disaster recovery knowledge base and the like, is collected. The text data is cleaned, which includes removing special characters and punctuation marks, performing word segmentation, normalizing letter case and the like. A domain vocabulary is used to unify terms into specific words so that the model better understands the terminology of the power industry.
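The cleaning steps just listed can be sketched in a few lines. The sketch assumes whitespace-delimited text (Chinese text would need a dedicated word segmenter instead of `split`), and the vocabulary entries in `DOMAIN_VOCABULARY` are hypothetical examples, not terms from the patent.

```python
import re

# Hypothetical domain vocabulary: variant spelling -> canonical term.
DOMAIN_VOCABULARY = {
    "dr": "disaster recovery",
    "bkp": "backup",
}


def preprocess(text: str) -> list[str]:
    """Lowercase, strip punctuation, split into tokens, and unify domain terms."""
    lowered = text.lower()
    # Replace every character that is neither a word character nor whitespace.
    cleaned = re.sub(r"[^\w\s]", " ", lowered)
    tokens = cleaned.split()
    return [DOMAIN_VOCABULARY.get(token, token) for token in tokens]


tokens = preprocess("The DR plan requires a full BKP, daily!")
```

For the sample sentence, the abbreviations are expanded to their canonical terms and the punctuation is dropped.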
A general-purpose large language model is a natural language processing model trained on a large-scale corpus, which acquires an understanding of language patterns and semantics through unsupervised learning. When technical documents are fed into large language model training, terms specific to the power industry, such as "forward active" and "reverse active", require additional term descriptions and context information to be generated as prompts to aid the model's understanding. Many such terms, for example "quadrant I reactive", "quadrant II reactive", "quadrant III reactive" and "quadrant IV reactive", require this method to adjust the model's understanding.
An open-source pre-trained language model, such as GPT-3.5 or BERT, which has good language understanding in general contexts, is chosen as the base model. The pre-trained model is fine-tuned in the intranet with the collected and preprocessed power industry data to construct the disaster recovery backup vertical model. During fine-tuning, a corpus specific to the power industry, such as the classification and grading data of the service components of the power cloud platform, system backup strategies and backup management methods, is used to adjust the model parameters so that the model better fits the context and terminology of the power industry.
The disaster recovery backup vertical model is trained for the question-and-answer guidance scenario, in which detailed scenario data is collected from users for different disaster recovery backup scenarios. In the "backup policy" scenario, the content of the power industry's backup strategy guidance documents must be input to the model in advance; it includes the backup strategy, backup frequency, backup mode and similar data for the business data and the operation and maintenance data of core, important and ordinary level systems. The model must also ask the user questions about this information and collect the corresponding key information: the system level, whether the backup data is business data or operation and maintenance data, and so on. If the user does not know the system level, the user can be guided to provide the system's classified protection rating, from which the system level is deduced. Similarly, for other scenarios, the model is guided to obtain reasonably complete preconditions from the user before returning the answer.
The disaster recovery backup vertical model is trained for the consistency check scenario, in which the large language model is guided to analyze and understand the content of data files or configuration files in depth. Take the Redis cluster configuration in the configuration files of a microservice application backup package as an example. Besides recognizing the Redis-related configuration in a configuration file (through the spring: redis keywords), the model must be further guided to determine, from the sentinel keyword information, whether the current Redis is configured as a cluster, how many Redis nodes it has, and similar information. The configuration files of different microservices under the same application system should configure the Redis middleware consistently; when they do not, the configuration file backup is probably erroneous. The model is therefore guided to check the disaster recovery backup files of the same application system for consistency and to report any inconsistent information as risk information.
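The kind of result the model is expected to reach can be illustrated with a deterministic check over already-parsed configuration files. This is a minimal sketch, not the patented method: it assumes the files follow the usual Spring Boot key layout (`spring.redis.sentinel.nodes`, `spring.redis.cluster.nodes`, `spring.redis.host`) and have been loaded into nested dictionaries; the function and service names are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def _node_count(nodes: Any) -> int:
    """Count nodes given either a comma-separated string or a list."""
    if isinstance(nodes, str):
        return len([n for n in nodes.split(",") if n.strip()])
    return len(nodes or [])


def redis_profile(config: Dict[str, Any]) -> Tuple[str, int]:
    """Return (mode, node_count) for the spring.redis section of one config."""
    redis = config.get("spring", {}).get("redis", {})
    for mode in ("sentinel", "cluster"):
        if mode in redis:
            return (mode, _node_count(redis[mode].get("nodes")))
    return ("single", 1 if "host" in redis else 0)


def find_inconsistent(configs: Dict[str, Dict[str, Any]]) -> List[str]:
    """List services whose Redis profile differs from the most common one."""
    profiles = {name: redis_profile(cfg) for name, cfg in configs.items()}
    if not profiles:
        return []
    majority, _ = Counter(profiles.values()).most_common(1)[0]
    return sorted(name for name, p in profiles.items() if p != majority)


# Hypothetical backup package: two services use Sentinel, one a single node.
configs = {
    "order": {"spring": {"redis": {"sentinel": {
        "master": "m", "nodes": "a:26379,b:26379,c:26379"}}}},
    "user": {"spring": {"redis": {"sentinel": {
        "master": "m", "nodes": ["a:26379", "b:26379", "c:26379"]}}}},
    "billing": {"spring": {"redis": {"host": "10.0.0.5", "port": 6379}}},
}
risky = find_inconsistent(configs)
print(risky)
```

Treating the most common profile as the reference is a design choice of this sketch; a real check might instead compare against the production configuration.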
For disaster recovery policy verification, the disaster recovery backup vertical model is further guided to judge information such as the timeliness of backup data according to the type or level of the system being backed up. For example, service data in a core-level service system in the power industry must be backed up once every 12 hours. While analyzing the backup data, the large language model can determine the interval between the latest data storage time in the service data and the backup trigger time; if no valid service data within the 12 hours is identified, the risk prompt "backup data does not meet the backup strategy requirement" is issued automatically.
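The 12-hour rule reduces to a comparison of two timestamps. The sketch below shows that comparison under stated assumptions: the function name `check_timeliness` and its parameters are hypothetical, and in the described system the two timestamps would be extracted from the backup data by the model rather than passed in directly.

```python
from datetime import datetime, timedelta
from typing import Optional

RISK_MESSAGE = "backup data does not meet the backup strategy requirement"


def check_timeliness(
    latest_record: datetime,
    backup_triggered: datetime,
    max_gap: timedelta = timedelta(hours=12),
) -> Optional[str]:
    """Return a risk message when the newest record is older than `max_gap`
    relative to the backup trigger time, otherwise None."""
    gap = backup_triggered - latest_record
    if gap > max_gap:
        return RISK_MESSAGE
    return None


trigger = datetime(2023, 12, 13, 20, 0)
fresh = check_timeliness(datetime(2023, 12, 13, 10, 0), trigger)   # 10 h gap
stale = check_timeliness(datetime(2023, 12, 13, 7, 59), trigger)   # > 12 h gap
print(fresh, stale)
```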
After the disaster recovery backup vertical model is trained, suitable evaluation metrics, such as perplexity, BLEU score or domain-specific metrics (for evaluating generation tasks), are selected to evaluate its performance. Meanwhile, the hyperparameters of the model, such as learning rate and batch size, are adjusted continually, and the best parameter combination is found through methods such as cross-validation, thereby improving the model's performance.
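Perplexity, the standard intrinsic metric for language models, is the exponential of the average negative log-probability the model assigns to each token. A minimal sketch follows; it assumes the per-token natural-log probabilities have already been obtained from the model, and the function name is illustrative.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Compute exp(-mean(log p)) over the natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token is as uncertain as a
# uniform choice among four options.
value = perplexity([math.log(0.25)] * 4)
print(value)
```

Lower values indicate that the fine-tuned model predicts held-out power-industry text more confidently.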
Finally, the performance of the disaster recovery backup vertical model is monitored periodically, including the quality, accuracy and practicality of the generated text. If problems are found, the model is fine-tuned further or its data is updated to improve it. As the operation and maintenance and disaster recovery backup knowledge bases of the power industry keep growing, new data can be added regularly and the model retrained and fine-tuned regularly to keep it up to date.
In the disaster recovery backup and restoration scenario of the power information system, the disaster recovery backup vertical model lets backup managers and backup operators click the AI dialog box directly on a page of the disaster recovery backup system and obtain useful information from the large model by asking questions. For example, when asked what the backup policy for a core-level system requires, the model outputs the disaster recovery backup knowledge obtained from the knowledge base. In addition, the model can analyze file content in real time when a backup operator uploads backup files manually: the disaster recovery system automatically converts the real-time file analysis event into model input that carries the file content, together with an instruction to analyze that content, and thereby helps the backup operator check data integrity. For example, it checks whether the configuration files of several application services all contain Redis Sentinel configuration while the configuration file of one service is configured for a single-node Redis. Traditional detection methods can only check data integrity and struggle to verify configuration differences.
(4) Monitoring center: monitors in real time the states of the components, systems and services in the disaster recovery environment. This includes monitoring hardware equipment, network connections, data synchronization status and the like. The monitoring center detects faults and raises alarms promptly so that corresponding measures can be taken to repair them and keep the disaster recovery environment running stably.
(5) Dispatching center: the dispatching center comprises a disaster recovery scheduling module and a scheduler module.
(5.1) The disaster recovery scheduling module comprises a disaster recovery plan management component, a disaster recovery strategy formulation component, a task priority and scheduling component and an automated task scheduling component.
Disaster recovery plan management component: responsible for managing and executing disaster recovery plans. It stores and maintains the details of each disaster recovery plan, including the disaster recovery strategies, steps and schedules. Through the dispatching center, an administrator can view and update the disaster recovery plan and ensure that disaster recovery operations are carried out according to plan;
Disaster recovery strategy formulation component: the administrator uses this component to set the disaster recovery strategy of the system. When the system fails, the fault is repaired according to the set strategy and the corresponding measures, ensuring stable operation of the disaster recovery environment;
Task priority and scheduling component: decides the execution order and resource allocation of tasks according to their priorities and the scheduling policy. It can adjust task priorities dynamically and allocate resources reasonably according to factors such as service requirements, disaster recovery targets and resource availability, so that key services are recovered first and the available resources are used to the fullest;
Automated task scheduling component: responsible for coordinating and managing the automated tasks in the disaster recovery environment. It automatically triggers and executes tasks such as backup, data synchronization, service recovery and testing according to predefined plans and strategies. Automated task scheduling improves the efficiency and accuracy of the disaster recovery process;
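The ordering behaviour of the task priority and scheduling component can be sketched with a binary heap. This is an illustration under assumptions: the class `TaskQueue`, the convention that a lower number means higher priority, and the task names are all hypothetical and not taken from the described system.

```python
import heapq
import itertools
from typing import List, Tuple


class TaskQueue:
    """Priority queue of disaster recovery tasks (lower number runs first)."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # The counter breaks ties so equal-priority tasks keep submit order.
        self._counter = itertools.count()

    def submit(self, name: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def run_order(self) -> List[str]:
        """Pop every task and return the names in execution order."""
        order = []
        while self._heap:
            _, _, name = heapq.heappop(self._heap)
            order.append(name)
        return order


queue = TaskQueue()
queue.submit("sync-ops-logs", priority=5)
queue.submit("recover-core-billing", priority=1)
queue.submit("backup-config", priority=5)
order = queue.run_order()
print(order)
```

Dynamic priority adjustment could be layered on top by resubmitting a task with a new priority, which this sketch leaves out.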
(5.2) scheduler Module
The scheduler module comprises a scheduler setting component, a backup task executing component, a monitoring and reporting component and an exception handling component.
Scheduler setting component: manages and controls the scheduling nodes in each public cloud to execute backup tasks. It triggers backup tasks at the specified schedule and priority according to predefined policies.
Backup task execution component: when the scheduler setting component triggers a backup task, the backup task execution component automatically assigns it to the corresponding scheduling node or service. It decides which server executes the backup task according to the availability and resource condition of the devices.
Monitoring and reporting component: monitors the execution of backup tasks on the scheduling nodes and generates the corresponding monitoring reports. It records information such as the start time, end time and execution state of each backup task and raises an alarm when an abnormal condition or error occurs.
Exception handling component: if an error or exception occurs while a backup task is executing, the exception handling component passes the exception information to the disaster recovery backup vertical model, which analyzes the cause of the exception and the follow-up handling method based on the exception content and the historical operation and maintenance experience it has learned. Finally, the backup manager is notified of the backup exception and of the cause analyzed by the large language model, and can handle the problem according to the suggestions the large language model gives.
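How the backup task execution component might choose a node "according to availability and resource condition" can be sketched as a filter followed by a ranking. The `Node` fields and the rule of preferring the node with the most idle CPU are assumptions for illustration; the source does not specify the selection criteria.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    available: bool
    free_cpu: float       # fraction of CPU currently idle, 0.0 to 1.0
    free_disk_gb: float


def pick_node(nodes: List[Node], required_disk_gb: float) -> Optional[Node]:
    """Pick the available node with enough disk and the most idle CPU."""
    candidates = [
        n for n in nodes
        if n.available and n.free_disk_gb >= required_disk_gb
    ]
    if not candidates:
        return None  # no node can take the task; the caller may raise an alarm
    return max(candidates, key=lambda n: n.free_cpu)


nodes = [
    Node("node-a", available=True, free_cpu=0.2, free_disk_gb=500),
    Node("node-b", available=True, free_cpu=0.7, free_disk_gb=80),
    Node("node-c", available=False, free_cpu=0.9, free_disk_gb=900),
    Node("node-d", available=True, free_cpu=0.5, free_disk_gb=300),
]
chosen = pick_node(nodes, required_disk_gb=100)
print(chosen.name if chosen else None)
```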
2. Disaster recovery process description:
In the disaster recovery strategy list, the disaster recovery strategy type is set to the cross-cloud disaster recovery mode, and a backup mode such as cold backup (default), hot backup or multi-active is selected.
The public cloud servers to be protected are added to the production service list of the disaster recovery center, with information including the public cloud, IP address, account and password, operating system, storage type, network configuration, security group and so on. Parameter information of the production service itself is added, such as the deployment pipeline address, pipeline name and pipeline account information. Middleware parameter information corresponding to the production service is also added, such as the Kafka message middleware and the MySQL database that the service depends on.
Based on the disaster recovery strategy and the public cloud information configured in the production service list, the disaster recovery center automatically sets up mutual backup between different public clouds and records the corresponding disaster recovery backup relationships; for example, production server A on Alibaba Cloud is backed up for disaster recovery on server B on Tencent Cloud. The disaster recovery center automatically applies, through the communication center, for a disaster recovery server in the public cloud and creates a disaster recovery service slave node on that server.
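The mutual backup relationship can be sketched as a mapping from each production server to a disaster recovery cloud other than the one hosting it. The round-robin rule, the function name and the cloud labels below are assumptions for illustration; the source does not say how the target cloud is chosen.

```python
from typing import Dict, List


def assign_dr_clouds(servers: Dict[str, str], clouds: List[str]) -> Dict[str, str]:
    """Map each server (name -> hosting cloud) to a different cloud for DR.

    Each server is backed up on the next cloud in `clouds`, wrapping around,
    so no server is ever backed up on its own cloud.
    """
    if len(set(clouds)) < 2:
        raise ValueError("cross-cloud disaster recovery needs at least two clouds")
    plan = {}
    for server, cloud in servers.items():
        idx = clouds.index(cloud)  # raises ValueError for an unknown cloud
        plan[server] = clouds[(idx + 1) % len(clouds)]
    return plan


servers = {"server-A": "cloud-1", "server-B": "cloud-2", "server-C": "cloud-1"}
plan = assign_dr_clouds(servers, ["cloud-1", "cloud-2"])
print(plan)
```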
Many power information systems are released and deployed through a DevOps pipeline. Based on the production service information, the dispatching center automatically obtains the corresponding pipeline information from DevOps, obtains from the pipeline the latest deployment mode of the application service (Docker, k8s) together with the deployment package, deployment script, configuration files, data scripts and so on, stores this information in the storage center automatically, and records the storage time. The dispatching center also subscribes to the pipeline: when the pipeline changes, the pipeline information (deployment mode, deployment package, deployment script, configuration files, data scripts and so on) is sent to the dispatching center automatically and updated in the storage center. Each backup and update operation generates a scheduling task for the disaster recovery service slave node to process.
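One way to decide which pipeline artifacts need updating in the storage center is to compare content digests. The sketch below assumes artifacts are available as bytes keyed by name and uses SHA-256 from the standard library; the function names and artifact names are hypothetical, and the source does not state how changes are detected.

```python
import hashlib
from typing import Dict, List


def fingerprint(artifacts: Dict[str, bytes]) -> Dict[str, str]:
    """Compute a SHA-256 digest for every artifact."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in artifacts.items()}


def changed_artifacts(stored: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """Return names of artifacts that are new or whose content changed."""
    now = fingerprint(current)
    return sorted(name for name, digest in now.items() if stored.get(name) != digest)


# Digests recorded at the previous backup.
stored = fingerprint({
    "deploy.sh": b"kubectl apply -f app.yaml\n",
    "application.yml": b"spring:\n  redis:\n    host: 10.0.0.5\n",
})
# Artifacts delivered by the latest pipeline change notification.
current = {
    "deploy.sh": b"kubectl apply -f app.yaml\n",
    "application.yml": b"spring:\n  redis:\n    host: 10.0.0.9\n",
    "init.sql": b"CREATE TABLE t (id INT);\n",
}
to_update = changed_artifacts(stored, current)
print(to_update)
```

Each name in the result would correspond to one update operation, and therefore to one scheduling task for the slave node.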
According to the middleware service information of the production application service, such as its MySQL database, the dispatching center copies the latest production-side MySQL data to the storage center, then creates a real-time MySQL data update task in the dispatching center that monitors the production-side binlog data and continuously updates and stores newly generated production data. Each backup and update operation generates a scheduling task for the disaster recovery service slave node to process. The dispatching center calls the data model identification module, which uses the disaster recovery backup large language model to check the integrity of the backup data in the storage center and determine whether problems exist such as inconsistent configuration file information, missing table data fields or missing multi-table associated field data; if so, it flags them in real time and provides an inspection method.
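The "missing table data fields" check can be illustrated with a deterministic comparison of column sets. This is a stand-in for the model-based analysis described above, not a reproduction of it: it assumes the expected schema and the columns found in the backup are already known as sets, and all table and column names are hypothetical.

```python
from typing import Dict, List, Set


def missing_fields(
    expected: Dict[str, Set[str]], backup: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Report, per table, the expected columns absent from the backup.

    A table missing from the backup entirely is reported with all its columns.
    """
    report = {}
    for table, columns in expected.items():
        missing = columns - backup.get(table, set())
        if missing:
            report[table] = sorted(missing)
    return report


expected = {
    "meter_reading": {"id", "meter_id", "reading", "read_at"},
    "meter": {"id", "location"},
}
backup = {
    "meter_reading": {"id", "reading", "read_at"},  # meter_id column missing
    "meter": {"id", "location"},
}
report = missing_fields(expected, backup)
print(report)
```

In this example the missing `meter_id` column is also the field that associates the two tables, so the same report would flag a multi-table association problem.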
The disaster recovery service slave node first applies for server resources according to the production service information obtained from the dispatching center, keeping them consistent with those of the protected production server, and stores the information of the standby server. The dispatching center communicates with the disaster recovery service slave node in real time and notifies it when a new scheduling task exists. According to the scheduling task, the disaster recovery service slave node synchronizes the deployment packages, scripts and data to the target location on the backup cloud host (consistent with the data storage location of the protected production service).
In this embodiment, a disaster recovery backup vertical model specially used for disaster recovery backup and built in a disaster recovery center is deeply integrated with an existing DevOps platform, the disaster recovery center is deployed in a private cloud, and business services are deployed in different public clouds, so that the situation that the business cannot provide services due to large-area faults in a public cloud environment is avoided.
As a preferred example, the disaster recovery backup and recovery system further comprises a task management module. Through it, disaster recovery operators can create, schedule and execute tasks, select the production service information to be backed up, and submit it to the disaster recovery manager for audit. After the disaster recovery manager approves, the scheduling task triggers the scheduler to obtain the corresponding pipeline information from DevOps automatically according to the production service information, and to obtain from the pipeline the latest deployment mode of the production service (Docker, k8s) together with the deployment package, deployment script, configuration files, data scripts and so on; the scheduling task stores this information in the storage center automatically and records the storage time. The scheduling task then enters the audit stage automatically and waits for the disaster recovery manager's audit, after which the disaster recovery operator executes the backup operation. On the audit page, the auditor invokes the AI identification function with one click; the disaster recovery backup vertical model helps the auditor check for file consistency, missing data files and configuration file consistency problems and displays the analysis result to the auditor, giving the abnormal files and their locations if problems exist. After confirming that there are no errors, the auditor is responsible for collecting the service-side data backup requirements and evaluating their feasibility, submits the backup application once the feasibility evaluation is complete, and waits for the disaster recovery manager's audit.
After the disaster recovery manager approves the application, the disaster recovery operator is responsible for executing the approved backup operation.
3. Disaster recovery monitoring and switching process
When the monitoring center detects that a protected production service has failed and cannot provide service, it immediately notifies the dispatching center to trigger the DevOps production deployment pipeline. The pipeline obtains the address information of the standby server from the disaster recovery service slave node, deploys the production service in another public cloud, and provides service once deployment completes, thereby ensuring service continuity. The deployment pipeline starts the middleware, runs the application and so on, and loads from the designated location the data files, configuration files and so on already backed up by the disaster recovery service slave node. The master node of the disaster recovery center notifies the DNS server to change the information, such as the IP address, corresponding to the domain name of the production service to that of the backup server, so that users accessing from the public network are switched directly to the backup server.
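The switching sequence above can be sketched as a single function whose external interactions are injected as callables. Everything here is an assumption made for illustration: the function and parameter names are hypothetical, and requiring several consecutive failed probes before switching is a common precaution that the source does not mention.

```python
from typing import Callable


def failover(
    service: str,
    is_healthy: Callable[[str], bool],
    get_standby_address: Callable[[str], str],
    deploy: Callable[[str, str], None],
    update_dns: Callable[[str, str], None],
    checks: int = 3,
) -> bool:
    """Switch `service` to its standby server if every health probe fails.

    Returns True when a switch was performed, False otherwise.
    """
    # Any successful probe means the service is still up: do nothing.
    for _ in range(checks):
        if is_healthy(service):
            return False
    address = get_standby_address(service)  # asked of the DR slave node
    deploy(service, address)                # pipeline deploys on the standby
    update_dns(service, address)            # domain name now points to standby
    return True


calls = []
switched = failover(
    "billing",
    is_healthy=lambda s: False,
    get_standby_address=lambda s: "10.1.2.3",
    deploy=lambda s, a: calls.append(("deploy", s, a)),
    update_dns=lambda s, a: calls.append(("dns", s, a)),
)
print(switched, calls)
```

DNS is updated only after deployment returns, so users are never directed to a standby server that is not yet serving.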
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The solutions in the embodiments of the present application may be implemented in various computer languages, for example, the object-oriented programming language Java and the interpreted scripting language JavaScript.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. The disaster recovery backup and recovery system of the power information system in the cloud environment is characterized by comprising a disaster recovery center and a DevOps pipeline module which are built on a private cloud, and a production server, a backup server and a disaster recovery server which are distributed on each public cloud; the private cloud is used as a master node, the public cloud is used as a slave node, the master node and each slave node are connected with each other, and a production server and a disaster recovery server in the slave nodes with disaster recovery backup relationship are connected with each other;
The disaster recovery center comprises a monitoring center, a dispatching center, a storage center and a disaster recovery backup vertical model which is generated based on large language model training;
the storage center divides a plurality of areas according to different storage clouds and respectively stores production data, cache data and configuration files provided by backup servers in various cloud environments, and the data of the plurality of areas are assembled together to form integral disaster-tolerant backup data;
the monitoring center is used for monitoring the states of various components, systems and services in the disaster recovery environment in real time;
the dispatching center comprises a registration module, a disaster recovery scheduling module and a scheduler module; the production servers on each public cloud register in the disaster recovery center by means of the registration module; the scheduler module manages and controls each scheduling node in each public cloud to execute backup tasks according to the registration information, carries out mutual backup of environment and data among a plurality of public cloud production servers in real time based on the DevOps pipeline, invokes the disaster recovery backup vertical model to correlate key data points in the production data, cache data and configuration files with context information, analyzes and supplements missing content of the backup data, carries out data integrity and configuration consistency checks on the supplemented disaster recovery backup data, judges the validity of the disaster recovery backup data by combining the type and the level of the disaster recovery backup strategy, confirms whether the disaster recovery backup data meets the requirement of the backup strategy, and generates a scheduling task for the disaster recovery service slave node to call after verification;
when the monitoring center detects that a public cloud production server is abnormal, it immediately notifies the disaster recovery scheduling module to trigger the DevOps pipeline module, the production pipeline information in the corresponding DevOps is automatically obtained according to the information of the abnormal production server, the address information of the disaster recovery server and the disaster recovery backup data are obtained from the disaster recovery service slave node, the deployment host information is changed and deployment is carried out, and the DNS server is notified to change the IP information corresponding to the domain name of the production service to that of the disaster recovery server, so that users are switched directly to the disaster recovery server when accessing from the public network.
2. The disaster recovery backup and recovery system for power information system in cloud environment as claimed in claim 1, wherein said storage center further stores disaster recovery backup process data and approval process data;
the disaster recovery backup process data comprises backup data, backup time stamps, backup places, backup file paths, backup state information and backup strategies; the backup strategy comprises full backup, incremental backup, backup period and backup data volume;
the approval process data is approval process information including sponsors, approvers, approval time and approval opinions.
3. The disaster recovery backup and restoration system for an electric power information system in a cloud environment as set forth in claim 1, wherein said disaster recovery backup and restoration system comprises a model management module;
the model management module comprises a data preprocessing component, a model construction component, a consistency check training component and a strategy check training component;
the data preprocessing component is used for collecting text data of the power industry, including data backup specifications, technical documents, an operation and maintenance knowledge base and a disaster recovery knowledge base, cleaning the text data, removing special characters and punctuation marks contained in the text data, segmenting the text data into words, normalizing letter case and formats, and unifying terms contained in the text data into specific power industry vocabulary based on a domain vocabulary;
the model construction component is used for selecting an open source pre-training language model of GPT-3.5 or BERT as a basic model, and performing fine adjustment on model parameters of the open source pre-training language model by using the preprocessed text data of the power industry output by the data preprocessing component to construct a disaster recovery backup vertical model;
the consistency verification training component trains the disaster recovery backup vertical model aiming at a consistency verification scene, guides the disaster recovery backup vertical model to analyze consistency of data files or configuration file contents, wherein the analysis contents comprise configuration of Redis clusters in the configuration file contents in a service application backup package, and the consistency of the configuration files in different micro services to the Redis middleware under the same application system is judged by the training of the disaster recovery backup vertical model on the basis of the analysis contents;
The strategy verification training component trains the disaster recovery backup vertical model aiming at a disaster recovery strategy verification scene, and guides the disaster recovery backup vertical model to judge timeliness information of disaster recovery backup data according to the type or the level of the disaster recovery backup system.
4. The disaster recovery backup and recovery system for power information systems in cloud environment as claimed in claim 3, wherein said model management module further comprises a question-answer instruction training component; the question-answer type guidance training component trains the disaster recovery backup vertical model aiming at the question-answer type guidance scene; the training process comprises the following steps:
collecting scenario data of the user for different disaster recovery backup scenarios, inputting the backup strategy guidance file content of the power industry to the disaster recovery backup vertical model in advance, and enabling the disaster recovery backup vertical model to ask the user questions about the input information and collect the corresponding key information: the system level, and whether the backup data is service data or operation and maintenance data; guiding the user to provide the classified protection rating of the system if the user does not know the system level; meanwhile, for different scenarios, guiding the disaster recovery backup vertical model to obtain the corresponding preconditions from the user and then return an answer to the user.
The backup strategy guidance file content of the power industry comprises system service data of three levels, namely core, important and common, backup strategy, backup frequency and backup mode data of operation and maintenance data.
5. The disaster recovery backup and recovery system for power information systems in cloud environment as claimed in claim 3, wherein said model management module further comprises an evaluation component;
the evaluation component selects evaluation metrics including perplexity and BLEU score, evaluates the performance of the disaster recovery backup vertical model, and adjusts the hyperparameters of the disaster recovery backup vertical model, namely learning rate and batch size, according to the evaluation result, finding the optimal parameter combination by a cross-validation method.
6. The disaster recovery backup and recovery system of an electric power information system in a cloud environment according to claim 1, wherein the disaster recovery scheduling module comprises a disaster recovery plan management component, a disaster recovery policy making component, a task priority and scheduling component and an automated task scheduling component;
the disaster recovery plan management component is responsible for managing and executing a disaster recovery plan, and the disaster recovery plan comprises a disaster recovery strategy, steps and a schedule;
the disaster recovery strategy making component is used for setting a disaster recovery strategy;
The task priority and scheduling component is used for determining the execution sequence and the resource allocation priority of the disaster recovery scheduling task according to the task priority and the scheduling strategy;
the automatic task scheduling component is used for coordinating and managing backup, data synchronization, service recovery and test tasks in a disaster recovery environment according to a disaster recovery plan and a disaster recovery strategy which are defined in advance.
7. The disaster recovery backup and recovery system for power information systems in cloud environment as claimed in claim 1, wherein said scheduler module comprises a scheduler setting component, a backup task executing component, a monitoring and reporting component and an exception handling component;
the scheduler setting component triggers the backup task according to a specified schedule and priority according to a predefined strategy;
the backup task execution component notifies the dispatching nodes in each public cloud to execute the backup task according to the availability and resource condition of the equipment;
the monitoring and reporting component is used for monitoring the execution condition of the backup task in the scheduling node and generating a corresponding monitoring report;
the abnormal processing component transmits the abnormal backup task information to the disaster recovery backup vertical model, and the disaster recovery backup vertical model comprehensively analyzes the reasons of the abnormality and the subsequent processing method based on the abnormal content and the learned history operation and maintenance experience.
8. The disaster recovery backup and restoration system for power information system in cloud environment according to claim 7, wherein said backup task execution component copies the latest mysql production side data content to a storage center according to middleware service information corresponding to production application service, creates real-time mysql data update task monitoring production side binlog data, and continuously updates and stores newly generated production data; each backup and update operation generates a scheduling task for processing by the disaster recovery slave node.
9. The system according to claim 1, wherein the disaster recovery policy type is a cross-cloud disaster recovery mode, and the backup types include a cold backup, a hot backup and a multi-active mode.
10. The disaster recovery backup and recovery system for power information system in cloud environment according to claim 1, wherein the registration information of the production server on each public cloud includes production server information of the public cloud to be protected, parameter information of the production service itself and middleware parameter information corresponding to the production service;
the public cloud production server information to be protected comprises public cloud information, an IP address, an account password, an operating system, a storage type, network configuration and a security group;
the parameter information of the production service itself comprises the deployment pipeline address, the pipeline name and the pipeline account information;
the middleware parameter information corresponding to the production service comprises the Kafka message middleware on which the production service depends and the MySQL database.
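The registration information enumerated in claim 10 could be modeled as a nested record, for example as below. This is only a sketch: the field names are assumptions derived from the claim wording, "account password" is split into two fields for illustration, and the middleware fields are assumed to hold connection strings.

```python
from dataclasses import dataclass, fields


@dataclass
class ServerInfo:
    public_cloud: str
    ip_address: str
    account: str
    password: str
    operating_system: str
    storage_type: str
    network_config: str
    security_group: str


@dataclass
class ServiceParams:
    pipeline_address: str
    pipeline_name: str
    pipeline_account: str


@dataclass
class MiddlewareParams:
    kafka: str               # assumed: connection string of the Kafka message middleware
    mysql: str               # assumed: connection string of the MySQL database


@dataclass
class Registration:
    server: ServerInfo
    service: ServiceParams
    middleware: MiddlewareParams


def missing_fields(registration):
    # List every empty field as "part.field" so an incomplete registration can be rejected.
    missing = []
    for part_name in ("server", "service", "middleware"):
        part = getattr(registration, part_name)
        for f in fields(part):
            if not getattr(part, f.name):
                missing.append(part_name + "." + f.name)
    return missing
```

A registration whose `missing_fields` result is empty carries all three groups of information that the claim lists for a production server.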
CN202311718958.XA 2023-12-13 2023-12-13 Disaster recovery backup and recovery system of power information system in cloud environment Pending CN117851122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311718958.XA CN117851122A (en) 2023-12-13 2023-12-13 Disaster recovery backup and recovery system of power information system in cloud environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311718958.XA CN117851122A (en) 2023-12-13 2023-12-13 Disaster recovery backup and recovery system of power information system in cloud environment

Publications (1)

Publication Number Publication Date
CN117851122A true CN117851122A (en) 2024-04-09

Family

ID=90542786

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311718958.XA Pending CN117851122A (en) 2023-12-13 2023-12-13 Disaster recovery backup and recovery system of power information system in cloud environment

Country Status (1)

Country Link
CN (1) CN117851122A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination