CN118132333A - Disaster recovery method based on edge computing and related device

Disaster recovery method based on edge computing and related device

Info

Publication number
CN118132333A
CN118132333A (application CN202410138396.XA)
Authority
CN
China
Prior art keywords: node, data, target, service, edge
Prior art date
Legal status
Pending
Application number
CN202410138396.XA
Other languages
Chinese (zh)
Inventor
赵惊
Current Assignee
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Application filed by Agricultural Bank of China
Priority to CN202410138396.XA
Publication of CN118132333A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a disaster recovery method based on edge computing and a related device, which transfer a service to be processed and data to be processed from a master node to edge nodes in advance; evaluate the risk of a target service based on a terminal intelligent risk control model to obtain an evaluation result; if the evaluation result belongs to a first level, acquire a first target node for processing the target service and check it, and select a second target node if the first target node has failed; and check the second target node and, if it has also failed, enable a standby node to process the target service. Because the services and data to be processed are transferred from the master node to the edge nodes in advance rather than being stored entirely on the master node, they can be handed to other nodes for processing when a node fails. This avoids the problem in the prior art that, when the current node fails, no service can continue to be processed and everything must be switched to the standby node.

Description

Disaster recovery method based on edge computing and related device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a disaster recovery method based on edge computing and a related device.
Background
In today's digital age, the continuity of information systems and the durability of data have become critical to businesses and organizations. The continued availability of data is essential whether in finance, healthcare, education or any other industry. Disaster recovery and disaster recovery planning therefore occupy a core place in enterprise IT strategies.
Current mainstream disaster recovery techniques include regular data backup, multi-active data centers, hot standby, cold standby, remote disaster recovery backup, and so on. Although these methods can ensure data security and system continuity in most cases, when the current node fails, none of the services can continue to be processed on it and everything must be switched to the standby node.
Disclosure of Invention
In view of this, the present invention provides a disaster recovery method based on edge computing and a related device, which are used to solve the problem that, when the current node fails, none of the services can continue to be processed and all of them must be switched to a standby node. The specific scheme is as follows:
A disaster recovery method based on edge computing comprises the following steps:
transferring a service to be processed and data to be processed from a master node to edge nodes in advance, wherein there is at least one edge node;
when a processing request for a target service is received, evaluating the risk of the target service based on an intelligent risk control model to obtain an evaluation result;
if the evaluation result belongs to a first level, acquiring a first target node for processing the target service and checking the first target node, and if the first target node has failed, selecting a second target node;
and checking the second target node, and if the second target node has failed, enabling a standby node to process the target service.
Optionally, the disaster recovery method based on edge computing further includes:
if the evaluation result belongs to a second level, selecting the master node to execute the target service, and if the master node fails, enabling a standby node to process the target service; or
if the evaluation result belongs to a third level and the first target node fails, processing the target service in an offline manner, and synchronizing the data generated during the processing of the target service after the first target node returns to normal.
Optionally, in the disaster recovery method based on edge computing, transferring the service to be processed and the data to be processed from the master node to the edge nodes in advance includes:
obtaining in advance a first classification of each service to be processed and a second classification of each edge node, and establishing an association between the first classifications and the second classifications;
acquiring the target first classification to which the current service to be processed belongs, and determining the target second classification corresponding to the target first classification based on the association;
determining the target edge node corresponding to the target second classification, acquiring the current data to be processed corresponding to the current service to be processed, and sharding the current data to be processed to obtain the data shards;
and storing the current service to be processed and the data shards on the target edge node.
Optionally, in the disaster recovery method based on edge computing, when a processing request for a target service is received, evaluating the risk of the target service based on the intelligent risk control model to obtain an evaluation result includes:
acquiring user behavior data, device information and context information associated with the target service, wherein the context information comprises: network environment, geographic location and time information;
inputting the user behavior data, the device information and the context information into a risk scoring function to obtain a risk score;
and comparing the risk score with a preset risk score threshold to obtain the evaluation result.
Optionally, the disaster recovery method based on edge computing further includes:
acquiring the service data stored on the current edge node, and generating check data from the service data based on a preset encoding rule;
and storing the check data on another edge node.
Optionally, the disaster recovery method based on edge computing further includes:
acquiring the service data of the current edge node, traversing the other edge nodes, and searching for another edge node containing the same service data;
and carrying out a consistency check between the service data and the service data in the other edge node.
A disaster recovery device based on edge computing comprises:
a transfer module, configured to transfer a service to be processed and data to be processed from a master node to edge nodes in advance, wherein there is at least one edge node;
an evaluation module, configured to, when a processing request for a target service is received, evaluate the risk of the target service based on a terminal intelligent risk control model to obtain an evaluation result;
a checking and selecting module, configured to, if the evaluation result belongs to a first level, acquire a first target node for processing the target service, check the first target node, and select a second target node if the first target node has failed;
and a checking and enabling module, configured to check the second target node and, if the second target node has failed, enable a standby node to process the target service.
Optionally, the disaster recovery device based on edge computing further includes:
a selecting and enabling module, configured to select the master node to execute the target service if the evaluation result belongs to a second level, and to enable a standby node to process the target service if the master node fails; or
an offline processing and synchronizing module, configured to, if the evaluation result belongs to a third level and the first target node fails, process the target service in an offline manner and synchronize the data generated during the processing of the target service after the first target node returns to normal.
A storage medium comprising a stored program, wherein the program performs the steps of the disaster recovery method based on edge computing described above.
A computer program product comprising a computer program which, when executed by a processor, performs the steps of the disaster recovery method based on edge computing described above.
Compared with the prior art, the invention has the following advantages:
The invention discloses a disaster recovery method based on edge computing and a related device. The disaster recovery method comprises: transferring a service to be processed and data to be processed from a master node to edge nodes in advance, wherein there is at least one edge node; when a processing request for a target service is received, evaluating the risk of the target service based on an intelligent risk control model to obtain an evaluation result; if the evaluation result belongs to a first level, acquiring a first target node for processing the target service and checking it, and selecting a second target node if the first target node has failed; and checking the second target node and, if it has also failed, enabling a standby node to process the target service. Because the services and data to be processed are transferred from the master node to the edge nodes in advance rather than being stored entirely on the master node, they can be handed to other nodes for processing when a node fails. This avoids the problem in the prior art that, when the current node fails, no service can continue to be processed and everything must be switched to the standby node.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required for the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a disaster recovery method based on edge calculation according to an embodiment of the present invention;
FIG. 2 is a block diagram of a disaster recovery device based on edge calculation according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The invention discloses a disaster recovery method based on edge computing and a related device, which are applied to disaster recovery processing. Disaster recovery refers to the strategies and methods that enable an information system to recover its data and applications as soon as possible after a large-scale failure caused by natural or human causes. The main objective is to ensure that critical business functions can be restored and continue to run as soon as possible after a disaster occurs. Edge computing is a distributed computing paradigm that moves data processing tasks from the central server of a data center to the edge of the network, i.e., close to the data source. This helps reduce the delay and bandwidth usage of data transmission while also enhancing the privacy and security of the data. Existing disaster recovery processing methods include the following:
Traditional data backup: this is the most basic disaster recovery method, in which data are periodically backed up to another physical location. The problem with this approach is that data recovery may take a long time, and data created between the last backup and the disaster may be lost.
Hot Standby (Hot Standby): this method involves two running systems, one operating normally and the other on standby. If the primary system fails, the backup system can take over immediately, but this approach is costly.
Cold Standby (Cold Standby): this is a fully configured but not running backup system. In the event of a failure of the primary system, it may start up and take over, but with longer start up and recovery times.
Multi-active data centers: multiple data centers are deployed, each of which can process requests. If one data center fails, the other data centers can continue processing.
Although these methods can ensure data security and system continuity in most cases, when the current node fails, none of the services can continue to be processed on it and everything must be switched to the standby node. Further, the prior art has the following problems:
(1) Problem of centralization of data centers
Conventional disaster recovery schemes typically rely on a centralized data center. With this centralized architecture, when a data center fails, the system may take a long time to switch to another data center, thereby affecting the user experience.
(2) Data synchronization and consistency issues
Synchronizing data between multiple data centers may lead to data inconsistencies, which are unacceptable in the financial field because the accuracy and consistency of data are critical to financial transactions.
(3) Data privacy and security issues
Existing edge computing typically does not take into account the privacy and security of data. Edge nodes may be more vulnerable to attacks due to their decentralized and open nature, whereas traditional data protection methods may not be suitable for this new computing environment.
(4) Flexibility and extensibility issues
Conventional disaster recovery schemes are typically static, lacking in flexibility and extensibility. When the traffic demand or data volume changes, the entire system needs to be redesigned and deployed.
In order to solve the above problems, the invention provides a disaster recovery method based on edge computing. The core idea is to place the data and computing tasks of core businesses on edge nodes to ensure the real-time performance, consistency and security of the data. Data and tasks can be placed on different edge nodes in tiers depending on the importance of the service. Specific technical means enhance the data security of the nodes and ensure the security of the data during transmission, storage and processing. The execution flow of the method is shown in FIG. 1 and comprises the following steps:
S101: transfer a service to be processed and data to be processed from a master node to edge nodes in advance, wherein there is at least one edge node.
In the embodiment of the present invention, all services and data are initially stored on the master node. To ensure that data are processed effectively and quickly on the edge nodes, while reducing the data volume transmitted to the central server, speeding up data processing and lowering bandwidth cost, the services to be processed and the data to be processed on the master node need to be transferred to the corresponding edge nodes in advance. The selection of the services and data to be transferred depends on the priority of the services and the specific application scenario, and is not specifically limited in the embodiment of the present invention.
For the services to be processed, a first classification of each service is obtained in advance. The first classification can be determined based on the importance (criticality), sensitivity and complexity of the service. Criticality and sensitivity are the two key factors for evaluating the importance level and risk level of a business. Criticality mainly concerns the importance of the business: a business is highly critical if it is vital to the core goals, core processes or core customers of the enterprise. Such businesses usually need to be handled and protected with the highest priority, as they are decisive for the success and survival of the enterprise. Sensitivity concerns the risk level of the business: a business is highly sensitive if it involves sensitive data, critical operations or high-risk decisions. Such businesses require special attention to security and risk control to ensure data security, operational accuracy and service continuity. When assessing the criticality and sensitivity of a business, multiple aspects such as business processes, data security, legal and regulatory requirements, customer requirements and the competitive environment need to be considered comprehensively. For highly critical, highly sensitive businesses, stricter protection measures and risk control mechanisms need to be formulated to ensure stable and safe operation.
The first classification thus divides services into different classes according to their importance, sensitivity and complexity, such as core services, medium-core services and non-core services. Different classes of services may have different processing and storage requirements.
For core traffic, because of its high importance and sensitivity, it is necessary to select edge nodes with high availability and high reliability for deployment. These nodes should have powerful computing, storage and communication capabilities to ensure continuity and stability of traffic.
For medium core traffic, the importance is relatively low, but a certain reliability and stability guarantee is still required. Some edge nodes with certain computing and storage capacity can be selected for deployment so as to ensure the normal operation of the service.
For non-core services, due to the low importance and sensitivity, some edge nodes with low cost and high flexibility can be selected for deployment. These nodes can be used to handle some lightweight traffic demands to increase the flexibility and scalability of the overall system.
The storage and processing requirements corresponding to the different classes of services are described below for the static classification into core services, medium-core services and non-core services:
(1) Core services
Importance: core business is the most critical, least indispensable part of enterprise operation, and is directly related to the survival and long-term development of enterprises.
Sensitivity: because of the importance and sensitivity of core business, it typically involves core assets, critical data, and high value transactions for the enterprise.
Complexity: core business often involves the cooperation of multiple systems, multiple flows, and multiple departments, and thus is relatively complex to operate and process.
Storage requirements:
High availability: the storage system must ensure extremely high availability so that core data can be accessed quickly and accurately at any time.
High redundancy: multiple backups of the data are required to prevent data loss or corruption.
Encrypted storage: all data should be encrypted at rest to prevent unauthorized access.
Processing requirements:
Real-time processing: data and transactions of the core business need to be processed in real time to ensure continuity and efficiency of the business.
High-concurrency processing: the system needs to handle a large number of concurrent requests without performance degradation or crashes.
Strict security control: strict security measures such as authentication, access control and audit trails need to be enforced when handling core business.
(2) Medium-core services
Importance: medium core services, while less critical than core services, are an integral part of the enterprise's daily operations.
Sensitivity: the data and transactions of such businesses are somewhat sensitive, but may not involve the core assets of the enterprise as the core business does.
Complexity: the complexity is typically lower than the core traffic, but still requires the cooperation of multiple systems and processes.
Storage requirements:
Good availability: the storage system needs to guarantee good availability so that data can be accessed quickly in most cases.
Moderate redundancy: the data need to be backed up, but the backup frequency and number of copies may be lower than for core services.
Encrypted storage: sensitive data should be encrypted at rest.
Processing requirements:
Near-real-time processing: data and transactions of medium-core services require near-real-time processing, with some delay allowed.
Good concurrency processing: the system needs to handle a moderate number of concurrent requests while maintaining good performance.
Appropriate security control: appropriate security measures need to be enforced, though they may be less strict than for core services.
(3) Non-core services
Importance: non-core services are typically ancillary services in enterprise operations that do not directly impact the long-term evolution of the enterprise.
Sensitivity: the data and transaction sensitivity of such traffic is relatively low.
Complexity: the operations and processes are relatively simple and may involve only a single system or process.
Storage requirements:
Basic availability: the storage system only needs to guarantee basic availability, and some access delay is allowed in certain situations.
Limited redundancy: the data may be backed up only in a limited way, or not at all.
Optional encrypted storage: depending on the sensitivity of the data, encrypted storage may or may not be used.
Processing requirements:
Batch processing: data and transactions of non-core services may be processed in batch mode, with some processing delay allowed.
Basic concurrency processing: the system only needs to handle a limited number of concurrent requests while maintaining basic performance.
Basic security control: basic security measures need to be enforced, though they may be less strict than for core and medium-core services.
Correspondingly, a second classification needs to be preset for each edge node. The second classification is determined based on factors such as computing capability, storage capability, communication capability or flexibility; the specific determination process depends on the application scenario and is not limited in the embodiments of the present invention. Further, the association between the first classifications and the second classifications is established based on the specific application scenario.
The target first classification to which the current service to be processed belongs is acquired, the target second classification corresponding to the target first classification is determined based on the association, and the target edge node corresponding to the target second classification is determined. The current data to be processed corresponding to the current service are acquired and sharded to obtain the data shards. The specific sharding process is as follows: a large dataset is broken down into smaller, more manageable parts (i.e. "shards"), each of which contains a portion of the dataset. For example, transaction data may be sharded by day or by hour. Through data sharding, it is ensured that each edge node only processes its associated data subset, thereby increasing the processing speed. The current service to be processed and its data shards are then stored on the target edge node.
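As a minimal illustration of this pre-transfer step, the following sketch assumes a simple day-based sharding rule and a hypothetical association table between service classifications and edge-node classifications; the names ASSOCIATION, EDGE_NODES and shard_by_day are illustrative and not part of the invention.

from collections import defaultdict
from datetime import date

# Hypothetical association between service classifications (first classification)
# and edge-node classifications (second classification).
ASSOCIATION = {"core": "high_capability", "medium": "medium_capability", "non_core": "low_cost"}

# Hypothetical registry of edge nodes, keyed by their second classification.
EDGE_NODES = {
    "high_capability": ["edge-A"],
    "medium_capability": ["edge-B"],
    "low_cost": ["edge-C", "edge-D"],
}

def shard_by_day(records):
    # One possible sharding rule: split transaction records into daily shards.
    shards = defaultdict(list)
    for rec in records:
        shards[rec["date"]].append(rec)
    return shards

def transfer_service(service_name, first_classification, records):
    # Assign a pending service and its sharded data to an edge node of the matching classification.
    second_classification = ASSOCIATION[first_classification]
    target_node = EDGE_NODES[second_classification][0]  # simplest choice: first matching node
    shards = shard_by_day(records)
    # In a real system the shards would be pushed over the network; here we only return the plan.
    return {"service": service_name, "node": target_node, "shards": dict(shards)}

plan = transfer_service("payment_clearing", "core",
                        [{"date": date(2024, 1, 1), "amount": 100},
                         {"date": date(2024, 1, 2), "amount": 150}])
print(plan["node"], sorted(plan["shards"]))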
S102: when a processing request for a target service is received, evaluate the risk of the target service based on the intelligent risk control model to obtain an evaluation result.
In the embodiment of the invention, the intelligent risk control model is a method for performing risk assessment and decision making on the terminal device. It uses data and context information on the device to evaluate the risk level of a transaction or operation in real time and decides whether to allow or deny the operation according to a predetermined policy. The inputs to the model mainly include data on the device and context information, including user behavior patterns, device information and other relevant context. By analyzing these data in real time, the model can assess the risk level of the transaction or operation.
The choice of intelligent risk control model depends on the application scenario, the data characteristics and the risk control requirements. Several commonly used models are as follows:
rule-based model: such a model evaluates risk by predefining a set of rules. For example, if the transaction amount exceeds a certain threshold, or the transaction occurs at an unusual time or place, the model may mark the transaction as high risk. Such models are straightforward, but may have difficulty coping with complex or unknown risk patterns.
Statistics-based models: such models utilize historical data to identify risk patterns. They may employ various statistical techniques, such as regression analysis, time series analysis or anomaly detection algorithms, to identify anomalous behavior. For example, by analyzing the historical transaction behavior of a user, the model can learn the user's normal transaction pattern and identify transactions that deviate significantly from it.
Machine learning models: such models automatically discover risk patterns by learning from large amounts of data. Machine learning algorithms, such as random forests, gradient-boosted trees or neural networks, may be used to construct them. These models are typically more complex than rule-based models, but they are also more flexible and powerful and can automatically adapt to changes in the data.
Deep learning model: for data with complex patterns and relationships, a deep learning model may be an effective choice. These models use neural network structures, such as Convolutional Neural Networks (CNNs) or Recurrent Neural Networks (RNNs), to process high-dimensional data and sequence data to find deeper risk patterns.
Ensemble models: to improve accuracy and robustness, multiple models are sometimes combined. Ensemble methods, such as voting, stacking or blending, can combine the predictions of different models to produce a more reliable risk assessment.
Factors such as availability of data, complexity of the model, cost of training and maintenance, and risk tolerance need to be considered in selecting a particular model. In general, selecting a model is an iterative process that may require experimentation and verification to find a model that best suits the needs of a particular application.
In the embodiment of the invention, a decision tree is selected as the terminal intelligent risk control model for risk scoring. The decision tree risk control model is a tree-based classification model that learns decision rules by recursively partitioning the dataset into smaller subsets. In the context of risk assessment, decision trees may be used to predict whether a transaction is fraudulent or to assign transactions to different risk levels.
The following is a basic step of constructing a decision tree risk control model:
Data preparation: first, historical transaction data is collected and pre-processed, including cleaning, conversion, and feature engineering. It is ensured that the dataset contains enough samples and that each sample contains features that are meaningful for risk assessment.
Feature selection: features from the dataset that have an impact on the risk assessment, such as transaction amount, transaction location, transaction time, user behavior, etc., are selected. These features will serve as input variables for the decision tree.
Model training: the model is trained on historical data using decision tree algorithms (e.g., ID3, C4.5, CART, etc.). In the training process, the algorithm can select the optimal node splitting rule based on indexes such as information gain, gain rate or genie non-purity of the features, so as to construct a decision tree.
Pruning: to prevent the decision tree from overfitting, it may be pruned. Pruning may be achieved by pre-pruning (stopping growth during tree construction) or post-pruning (simplifying the tree after it has been constructed).
Model evaluation: test datasets are used to evaluate the performance of decision tree models, common evaluation metrics including accuracy, recall, F1 score, and the like.
Once the decision tree model training is complete, it can be used to score the risk of the new transaction. The risk score is typically a probability value or class label output by the model. To translate the risk score into a risk level, a mapping rule may be defined, such as:
Risk score between 0 and 0.2: low risk
Risk score between 0.2 and 0.5: medium risk
Risk score between 0.5 and 0.8: high risk
Risk score between 0.8 and 1: very high risk
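A small helper implementing this mapping, using exactly the boundaries listed above (the function name is illustrative).

def risk_level(score: float) -> str:
    # Map a model risk score in [0, 1] to one of the four risk levels listed above.
    if score < 0.2:
        return "low risk"
    if score < 0.5:
        return "medium risk"
    if score < 0.8:
        return "high risk"
    return "very high risk"

print(risk_level(0.35))  # -> medium risk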
Model training: the model is trained using historical data and known risk events to obtain a risk scoring function.
The risk scoring function typically produces a numerical value calculated from a plurality of features to quantify the risk level of a transaction or event. The function may be linear or non-linear, depending on the model and the features used; it need not take any particular form as long as it can be evaluated on the input. In the embodiment of the invention, the following features are assumed:
X1: transaction amount
X2: distance between the transaction location and the user's usual location
X3: transaction time (which may be a categorical variable such as "day" or "night")
For a decision tree, each internal node divides the sample into child nodes based on a characteristic decision rule. It is assumed that a decision tree model has been trained and that it is desired to calculate a risk score for a new transaction. One possible approach is to define a risk scoring function based on the proportion of fraudulent transactions in the leaf nodes.
For example, assuming that a leaf node L of the decision tree contains both fraudulent and non-fraudulent transactions, the proportion pL of fraudulent transactions in that leaf node may be calculated. A risk scoring function may then be defined that maps the transaction to the leaf node where it ultimately resides and uses the fraud proportion for that leaf node as a score:
RiskScore(x) = pL, where x is the feature vector of the transaction and L is the leaf node to which x corresponds in the decision tree.
However, one disadvantage of this approach is that it gives only a discrete score (i.e., the fraud proportion of the leaf node). If a more continuous score is desired, this proportion may be further transformed, for example using a log-odds (logit) transformation:
RiskScore(x) = log(pL / (1 − pL))
This gives a continuous number with the log-odds as the score. If the fraud proportion pL approaches 1, the score approaches positive infinity; if it approaches 0, the score approaches negative infinity. The risk can then be judged according to the magnitude of the score.
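Continuing the training sketch above (and reusing model, X_train, y_train and X_test from it), the following sketch shows both scoring variants: the raw leaf fraud proportion and its log-odds transform. The small epsilon that keeps the logarithm finite for completely pure leaves is an added numerical safeguard, not something stated in the text.

import math
import numpy as np

def leaf_fraud_proportions(tree_model, X_train, y_train):
    # Proportion pL of fraudulent (label 1) training samples falling into each leaf of the fitted tree.
    leaves = tree_model.apply(X_train)
    props = {}
    for leaf in np.unique(leaves):
        mask = leaves == leaf
        props[int(leaf)] = float(y_train[mask].mean())
    return props

def risk_score(tree_model, x, props, continuous=True, eps=1e-6):
    # RiskScore(x) = pL, or its log-odds log(pL / (1 - pL)) for a continuous score.
    leaf = int(tree_model.apply(x.reshape(1, -1))[0])
    p = min(max(props[leaf], eps), 1 - eps)  # clamp to keep the logarithm finite
    return math.log(p / (1 - p)) if continuous else p

props = leaf_fraud_proportions(model, X_train, y_train)
print(risk_score(model, X_test[0], props))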
Real-time scoring: on the edge nodes, each transaction is scored for risk in real time using the trained model.
The formula is:
risk score = f (transaction data, user behavior, historical transaction record …)
Where f is a trained risk scoring function.
Since the transaction environment and user behavior patterns may change over time, a fixed risk score threshold may no longer be applicable. Thus, there is a need to dynamically adjust the risk score threshold to accommodate the current transaction environment.
The main flow is as follows:
(1) Environmental factors: the influence of external factors such as time, holidays, large activities, etc. on the risk score threshold is considered.
User behavior pattern change: analysis of the user's historical transaction behavior, if a significant change in the user's transaction pattern is found, may adjust the risk score threshold accordingly.
The adjustment principle of the risk score threshold should be based on deep analysis of the user behavior pattern change, and comprehensively determined by combining multiple factors such as business risk tolerance, supervision requirements, model performance and the like. The following are some key principles that can be used to guide the adjustment of risk score thresholds:
Data driving principle: the adjustment of the risk score threshold should be based on an analysis of historical transaction data of the user, in particular a statistical analysis of changes in the user's behavior patterns. If the data shows that the user's transaction activity has changed significantly, such as increased transaction frequency, increased transaction amount, abnormal changes in the location or object of the transaction, etc., these may be trigger points for adjusting the risk score threshold.
Business risk tolerance: financial institutions have different tolerances for different types of risk. The risk score threshold should be set in line with the institution's business objectives and risk preferences. For example, if a bank is more concerned with customer experience and willing to take on a certain amount of risk, it may set a relatively high risk score threshold; conversely, if an institution is risk-averse, it may set a lower risk score threshold.
Regulatory requirements: financial institutions must adhere to relevant legal regulations and regulatory requirements during operation. The adjustment of the risk score threshold should ensure that any regulatory criteria are not violated.
Model performance monitoring: as the user's behavior changes, the performance of the risk scoring model may also be affected. The adjustment of the risk score threshold should be combined with continuous monitoring of the model performance to ensure that the model can accurately identify new risk patterns. If the model performance decreases, it may be necessary to retrain the model or adjust the model parameters while adjusting the risk score threshold accordingly.
Stepwise adjustment and test: the adjustment of the risk score threshold should not be abrupt and extensive, but should take a stepwise adjustment strategy and test thoroughly after each adjustment to assess the impact of the adjustment on system performance and risk identification. This can be achieved by setting up experimental and control groups, or performing retrospective tests using historical data.
Timely response and periodic review: once a significant change in the user behavior pattern is detected, the adjustment of the risk score threshold should be done in time in order to quickly respond to the potential risk. In addition, even if there is no significant change, the risk score threshold should be reviewed regularly to accommodate the continuing changes in market environment, business development, and regulatory requirements.
In summary, the adjustment of the risk score threshold is a process that comprehensively considers multiple factors, and requires close cooperation among the risk management department, the data analysis team and the business team to ensure that the adjustment can effectively cope with risks and does not cause unnecessary interference to normal business.
(3) Real-time feedback adjustment: the adjustment parameters are updated dynamically in real time based on feedback from user behavior.
The preset risk score threshold adjustment formula may be expressed as:
preset risk score threshold = original risk score threshold + α × (environmental factor + user behavior pattern change + real-time feedback adjustment)
Where α is an adjustment coefficient.
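A direct transcription of this adjustment formula as a function; the three adjustment terms are assumed to be already-normalized numeric quantities supplied by upstream analysis (how they are obtained is described below), and the value of α is illustrative.

def adjusted_threshold(original_threshold, env_factor, behavior_change, feedback_adjustment, alpha=0.1):
    # preset risk score threshold = original threshold + alpha * (env + behavior change + feedback)
    return original_threshold + alpha * (env_factor + behavior_change + feedback_adjustment)

# Example: a volatile market (+0.5) and a mild behavior shift (+0.2) shift the threshold.
print(adjusted_threshold(0.5, env_factor=0.5, behavior_change=0.2, feedback_adjustment=0.0))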
In general, edge-side decision making and risk assessment ensure that each transaction is subject to strict risk control and decision making during the transaction, thereby ensuring that the dynamic transaction classification is scientifically sound and reliable.
Further details of the three adjustment inputs are as follows. (1) Environmental factors: these generally refer to external factors that affect the user's transaction behavior but are not directly controlled by the user. They may include market conditions (e.g., volatility of the stock market), economic indicators (e.g., inflation rate, interest rate level), geographic location (e.g., political stability of the country/region where the transaction occurs), industry dynamics (e.g., the impact of new technologies on the industry), etc. To obtain these environmental factors, the institution may:
Subscribe to market data services to acquire real-time or historical market and economic data.
Use public data sources such as government-published statistics, industry reports and news.
Cooperate with third-party data providers to obtain more specialized and customized data.
(2) User behavior pattern change:
A user behavior pattern change refers to a new pattern or deviation compared with the user's historical transaction behavior. These changes may include increases or decreases in transaction frequency, fluctuations in transaction amount, switches of transaction channels, unusual transaction times or places, and the like. To capture these changes, the institution should:
Track and analyze the user's transaction history over the long term and establish a user behavior baseline.
Identify statistical regularities and abnormal patterns in user behavior using data mining and machine learning techniques.
Monitor user activity in real time and compare it with the baseline to detect any significant changes.
(3) Real-time feedback adjustment:
Real-time feedback adjustment is an adjustment made based on the system's real-time assessment of the user's current transaction activity. This typically involves risk assessment of ongoing transactions and dynamic adjustment of the preset risk score threshold based on the outcome of the assessment. To achieve this, the institution needs to:
Establish a system architecture capable of processing and analyzing transaction data in real time.
Assess the risk of each transaction in real time using a risk scoring model or other algorithmic tools.
Automatically or semi-automatically adjust risk parameters such as thresholds according to the risk assessment results and preset business rules.
The adjustment coefficient α acts as a weight that determines the extent to which the respective parameters affect the preset risk score threshold. The value of α should be set according to the business needs and risk preferences of the institution and may be determined by experimentation or historical data analysis.
Further, the input data of the intelligent risk control model are illustrated by example. The input data include user behavior data, device information and context information, where the user behavior data include:
Historical transaction data: the user's past transaction records, including transaction time, transaction amount, transaction type (transfer, purchase, recharge, etc.).
Logging behavior: the login frequency, login time, login location, login device type, etc. of the user.
Operation habit: conventional operations of the user on the application or platform, such as click frequency, browse path, dwell time, etc.
The device information includes:
device type: a mobile phone, a tablet computer, a notebook computer, etc.
Operating system: iOS, android, windows, etc.
Device unique identifier: IMEI, advertisement ID, device serial number, etc.
Device security status: whether the device has been jailbroken or rooted, whether security software or malware is installed, and the like.
The context information includes:
network environment: network type (Wi-Fi, 4G, 5G, etc.), network speed, network stability, etc. of the user connection.
Geographic location: the geographic location of the user at the time of the transaction or operation may be obtained by IP address, GPS or mobile network location.
Time information: current system time, date, and the specific time period (e.g., day, night, etc.) that the user is operating on.
The output of the intelligent risk control model is an evaluation result, and whether to allow or reject the operation is determined according to a preset strategy. The evaluation may be based on risk levels, security rules, or other business logic.
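The following sketch shows one way these three input groups might be assembled into a single feature record for the model; all field names and values are illustrative placeholders rather than an interface defined by the invention.

from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class RiskInput:
    user_behavior: Dict[str, Any]   # historical transactions, login behavior, operating habits
    device_info: Dict[str, Any]     # device type, OS, unique identifier, security status
    context: Dict[str, Any]         # network environment, geographic location, time information

    def as_features(self) -> Dict[str, Any]:
        # Flatten the three groups into one feature dictionary for the risk model.
        return {**self.user_behavior, **self.device_info, **self.context}

sample = RiskInput(
    user_behavior={"txn_amount": 1500.0, "login_hour": 2, "txn_type": "transfer"},
    device_info={"device_type": "phone", "os": "Android", "rooted": False},
    context={"network": "4G", "geo_distance_km": 320.0, "period": "night"},
)
print(sample.as_features())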
S103: if the evaluation result belongs to a first level, acquire a first target node for processing the target service and check the first target node; if the first target node has failed, select a second target node.
In the embodiment of the invention, after the evaluation result is obtained, its risk level is determined, and the risk levels can be adjusted according to actual requirements. Generally, risk levels may be divided into three levels (high, medium, low) or, in more detail, into five or more levels. Different risk levels may trigger different countermeasures, such as refusing the transaction, requiring additional verification or reducing the transaction limit.
In defining the risk level, a variety of factors may be considered, such as transaction amount, transaction type, user behavior patterns, device information, and the like. These factors may be quantified by weights or probabilities to help the model evaluate risk levels more accurately.
In the case that the evaluation result belongs to the first level (for example, a given risk level), a first target node for processing the target service is determined based on the static service classification in S101, and the first target node is checked. If the first target node is normal, the target service is processed on it; preferably, the first target node may be an edge node. If the first target node has failed, a second target node is selected, where the selection principle for the second target node depends on the node selection rules of the specific application scenario.
S104: check the second target node, and if the second target node has failed, enable a standby node to process the target service.
In the embodiment of the invention, after the second target node is selected, it is checked. If the second target node is normal, the target service is processed on it; if it has failed, the standby node is started to process the target service.
The invention thus discloses a disaster recovery method based on edge computing, which comprises: transferring a service to be processed and data to be processed from a master node to edge nodes in advance, wherein there is at least one edge node; when a processing request for a target service is received, evaluating the risk of the target service based on an intelligent risk control model to obtain an evaluation result; if the evaluation result belongs to a first level, acquiring a first target node for processing the target service and checking it, and selecting a second target node if the first target node has failed; and checking the second target node and, if it has also failed, enabling a standby node to process the target service. Because the services and data to be processed are transferred from the master node to the edge nodes in advance rather than being stored entirely on the master node, they can be handed to other nodes for processing when a node fails. This avoids the problem in the prior art that, when the current node fails, no service can continue to be processed and everything must be switched to the standby node.
In the embodiment of the present invention, if the evaluation result belongs to a second level (preferably, the second level may be a high risk level), the master node is selected to execute the target service, and if the master node fails, a standby node is started to process the target service. If the evaluation result belongs to a third level and the first target node fails, the target service is processed in an offline manner, and after the first target node returns to normal, the data generated during the processing of the target service are synchronized.
The goal of the disaster recovery strategy is to ensure that critical business functions can quickly recover and continue to operate when the primary system or data center suffers a failure or disaster. Conventional disaster recovery strategies rely on remote backup data centers, which typically involve high costs and possible data delays. Edge computing, in contrast, provides a new and more efficient solution for disaster recovery.
To ensure continued availability of data, edge computing may store redundant copies of data on edge nodes at multiple geographic locations.
The main flow is as follows:
data synchronization: when the data of the primary data center is updated, these changes are automatically synchronized to all edge nodes.
Fast failover: when the primary data center fails, the system automatically routes transaction traffic to the nearest healthy edge node.
To ensure continued operation of the edge nodes, the system needs to periodically check the health status of each node and issue an early warning when a potential failure is detected.
The main flow is as follows:
health check protocol: a standard procedure is defined how to check the health status of a node. The health check protocol (Health CheckProtocol) is a standardized set of procedures and methods for monitoring and assessing the health of edge nodes (or other system components). The protocol ensures that the system is able to periodically, accurately, and comprehensively collect information about the status of each node in order to discover and handle potential faults or performance problems in a timely manner.
Health check protocols generally involve the following key aspects:
(1) Checking frequency and timing:
the frequency at which the system performs health checks is defined, such as every few seconds, minutes or hours.
A determination is made when to trigger a health check, such as at system start-up, when a particular resource utilization is reached, or after a particular event has occurred.
(2) Check content and criteria:
Specific items to be checked are explicitly required, such as memory usage of the node, CPU load, disk space, network connectivity, etc.
Health criteria for each check, such as memory usage not exceeding 80%, CPU load below a certain threshold, etc., are set.
(3) The checking mode is as follows:
Specifying how to perform a health check may include sending a heartbeat signal, executing a diagnostic script, calling a specific API interface, etc.
It is determined whether the check is initiated from the central node or the edge node reports its status information itself.
(4) Response mechanism:
Defining how to process results after the health check is completed, including normal, alert and fault conditions.
Ensuring that the system can take timely action when unhealthy nodes are discovered, such as raising an alarm, logging, attempting to automatically repair or isolate a failed node, etc.
(5) Security and privacy:
Ensuring that data transmitted during health checks is secure may involve security measures such as encrypted communications, access control, etc.
Privacy protection is considered to ensure that sensitive information is not accessed or compromised by unauthorized access.
(6) Extensibility and configurability:
The protocol should be designed with possible future expansion requirements in mind, such as adding new inspection items, adjusting inspection criteria, etc.
Allowing an administrator or operator to configure the parameters of the health check according to the actual situation.
Fault detection algorithm: the health status data returned from each node are analyzed to identify possible failure modes. A fault detection algorithm is an algorithm for automatically identifying and diagnosing faults or abnormal conditions in the system. Such algorithms are typically based on monitoring the operating state and performance metrics of the system and use these data to detect potential problems. Specific fault detection algorithms are described as follows:
(1) Threshold-based fault detection algorithm:
Principle of: thresholds for one or more performance indicators are set, and a fault alert is triggered when monitored data exceeds or falls below these thresholds.
The implementation mode is as follows: performance metrics (e.g., CPU utilization, memory occupancy, network latency, etc.) of the nodes are periodically collected and compared to a preset threshold.
(2) Fault detection algorithm based on statistical model:
Principle of: and establishing a statistical model (such as a Gaussian distribution model) by utilizing the historical data, and judging whether faults occur or not according to the coincidence degree of the current data and the model.
The implementation mode is as follows: and collecting historical data of the nodes in a normal running state, calculating statistical parameters such as mean value and variance of the nodes, and updating the model in real time. Comparing the current data with the model, and calculating the abnormal score or probability.
Automatic fault recovery: when a node failure is detected, the system automatically initiates a predetermined recovery process.
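The two detection styles described above can be sketched as follows: fixed thresholds on current metrics plus a z-score test against the historical mean and standard deviation. The 80% memory limit matches the example criterion given earlier; the other numbers are illustrative.

import statistics

THRESHOLDS = {"memory_pct": 80.0, "cpu_load": 0.9}  # threshold-based rules

def threshold_alerts(metrics):
    # Return the metrics that exceed their fixed thresholds.
    return [k for k, limit in THRESHOLDS.items() if metrics.get(k, 0.0) > limit]

def zscore_alert(history, current, z_limit=3.0):
    # Statistical check: flag the node if the current value deviates strongly from its history.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history) or 1e-9
    return abs(current - mean) / stdev > z_limit

node_metrics = {"memory_pct": 86.5, "cpu_load": 0.55}
latency_history = [12, 14, 11, 13, 12, 15, 13]
print(threshold_alerts(node_metrics))              # -> ['memory_pct']
print(zscore_alert(latency_history, current=48))   # -> True (latency spike)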
In a multi-node environment, maintaining data integrity and consistency is a great challenge. To cope with this problem, specific technical means are required.
The technical core flow comprises the following steps:
transaction log: whenever changes occur to the data, the system will record those changes in the transaction log.
Data checksum: data consistency is ensured by calculating a checksum of the data and comparing it with the checksums held by the other nodes.
The process of calculating the checksum is as follows:
When a node receives data or changes data, the system calculates a checksum for the data. The checksum is a value calculated by a particular algorithm (e.g., CRC32, MD5, SHA-256, etc.) that represents a particular digest or fingerprint of the data. The calculation of the checksum is typically based on the content of the data, and the checksum may be quite different even if there is a small change in the data.
And sending a checksum: after calculating the checksum, the node will send this value along with other relevant data (e.g., timestamp, data identifier, etc.) to other nodes in the network. This process may be real-time or may be performed on a periodic basis, depending on the requirements and performance considerations of the system.
Receiving and comparing: after receiving the data containing the checksum, other nodes execute the same checksum calculation by using own local data. These nodes will then compare their own calculated checksum with the received checksum.
Consistency judgment: if the calculated checksum matches the received checksum, then the data may be considered consistent. If there is no match, indicating that the data has been changed or corrupted at a node, further measures need to be taken to solve the inconsistency problem.
Handling inconsistencies: upon detecting an inconsistency, the system may trigger various conflict resolution policies, such as using the latest version of the data, requesting an update of the original data source, or arbitrating according to preset rules, etc. This process may require manual intervention or may be fully automated, depending on the design and requirements of the system.
Assume a distributed system comprising three nodes: A. b and C. These nodes are responsible for storing and updating user account information. Attention is now directed to a particular user account whose account balance is a critical data point.
Initial state: the user account balances on nodes A, B and C are 100 yuan.
Updating data: the user initiates a transaction with an account balance being increased by 50 yuan. This update is first processed at node a. Node a calculates a checksum of the new balance (e.g., using the MD5 algorithm). Assume that the MD5 checksum of the new balance 150 is checksum_A.
And sending a checksum: node a sends the new balance (150 elements) and the corresponding checksum checksum_a to nodes B and C.
Receiving and comparing: nodes B and C each receive an update from node a. They first compute a 150-element MD5 checksum on their own local data.
It is assumed that the checksum calculated by nodes B and C is also checksum_a (since the data is identical, the checksum should also be identical).
Consistency judgment: nodes B and C compare their own calculated checksum checksum_a with the checksum received from node a. Because the two match, node B and C acknowledge data are identical.
Nodes B and C update their own local data and set the user account balance to 150 yuan.
Handling inconsistencies (hypothetical scenario): if at some point node B fails to receive the update from node A correctly for some reason (e.g., network delay, hardware failure), it still considers the account balance to be 100 yuan. When node B synchronizes data with node A or C, it may find that its data are inconsistent with the other nodes (because the checksum it computes over 100 yuan differs from checksum_A). At this point, the system may trigger a conflict resolution policy. For example, node B may request that node A or C send the latest account balance and checksum, and then update itself. Alternatively, the system may have a centralized coordinator to resolve the inconsistency.
The checksum is used as a lightweight mechanism to help the system detect and resolve data inconsistencies in a distributed environment.
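A minimal sketch of this checksum exchange using Python's standard hashlib module; MD5 is used only because the example above uses it, and SHA-256 would work identically.

import hashlib

def checksum(value) -> str:
    # Digest of the data content; any small change produces a very different value.
    return hashlib.md5(str(value).encode("utf-8")).hexdigest()

# Node A updates the balance and sends (value, checksum) to the other nodes.
new_balance = 150
checksum_A = checksum(new_balance)

# Nodes B and C recompute the checksum over the received value and compare.
def is_consistent(received_value, received_checksum) -> bool:
    return checksum(received_value) == received_checksum

print(is_consistent(150, checksum_A))  # True: data consistent, B and C apply the update
print(is_consistent(100, checksum_A))  # False: stale copy detected, trigger conflict resolution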
The above description is directed to a verification method in the case that the stored data are just consistent, if all the data stored by the edge nodes are inconsistent, the situation that the data overlap exists, and if the data overlap exists, the problem can be solved in the above manner. By combining distributed storage and verification techniques for cases where the data is misaligned, the integrity and consistency of the data may still be ensured. The specific treatment process is as follows:
Data slicing and storing: The raw data is divided into a plurality of slices, each of which is assigned to a particular edge node for storage. The size and number of slices may be adjusted based on the nature of the data and the storage requirements.
Generating check data:
Check data is generated for each slice using error checking and correction (ECC) techniques. The check data may be additional parity bits, checksums, or other forms of redundant information.
The ECC check data is stored on nodes other than the edge node that stores the slice. In this way, even if the data of a certain slice is lost or corrupted, it can be recovered using the ECC check data.
Distributed verification: When the data of a certain slice needs to be read or verified, the slice is obtained from the edge node that stores it, and the corresponding check data is obtained from the node that stores the ECC check data. The slice data and the check data are then verified with an ECC algorithm, either locally or on a centralized verification server. If the data is intact, the verification succeeds; otherwise, the corrupted slice data can be repaired using the ECC check data.
Redundancy and fault tolerance: To further increase the fault tolerance of the system, Redundant Array of Independent Disks (RAID) verification or erasure code (EC) verification techniques may be used. RAID or EC verification combines multiple data slices and/or parity data to generate additional parity data or more complex redundancy codes. This additional verification data is stored on additional edge nodes. If multiple data slices or parity data are lost or corrupted, the RAID or EC parity data may be used for recovery (a combined sketch of the slicing, verification and recovery steps is given after the monitoring step below).
Consistency guarantees: when data is written, it is ensured that all relevant fragmented data and check data are successfully written into their corresponding edge nodes. Upon data reading and verification, it is ensured that data is acquired from the correct node and that a data repair or recovery mechanism can be triggered upon verification failure.
Monitoring and maintaining: The data integrity of each edge node is checked periodically, for example by recalculating and comparing checksums. If data inconsistencies or corruption are detected, an automatic repair process is triggered or an administrator is notified to intervene manually.
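The following Python sketch, offered only as an illustration, strings these steps together in simplified form: the data is sliced across hypothetical edge nodes, plain SHA-256 digests stand in for the richer ECC check data and are kept on a separate node, and a single XOR parity slice (the simplest RAID-4/5-style erasure code) allows one lost or corrupted slice to be rebuilt. The node names, in-memory stores and fixed slice count are assumptions made for the sketch.

```python
import hashlib

def digest(data: bytes) -> str:
    """Check data in its simplest form: a SHA-256 digest of a slice."""
    return hashlib.sha256(data).hexdigest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_slices(data: bytes, n: int) -> list:
    """Data slicing: split the raw data into n equal-size slices (zero-padded)."""
    size = -(-len(data) // n)                       # ceiling division
    data = data.ljust(size * n, b"\x00")
    return [data[i * size:(i + 1) * size] for i in range(n)]

record = b"account:12345 balance:150 yuan"
slices = make_slices(record, 3)

# Slices live on three edge nodes; their digests live on a fourth node,
# and a single XOR parity slice lives on a fifth.
slice_nodes = {"edge-1": slices[0], "edge-2": slices[1], "edge-3": slices[2]}
check_node = {name: digest(data) for name, data in slice_nodes.items()}
parity_node = slices[0]
for s in slices[1:]:
    parity_node = xor_bytes(parity_node, s)

def verify(name: str) -> bool:
    """Distributed verification: compare a slice against its remotely stored digest."""
    return digest(slice_nodes[name]) == check_node[name]

def rebuild(lost: str) -> bytes:
    """Recover a single lost or corrupted slice from the other slices plus the parity."""
    acc = parity_node
    for name, data in slice_nodes.items():
        if name != lost:
            acc = xor_bytes(acc, data)
    return acc

# Monitoring and maintaining: a periodic scan detects a corrupted slice and repairs it.
slice_nodes["edge-2"] = b"X" * len(slices[1])       # simulate corruption on edge-2
bad = [name for name in slice_nodes if not verify(name)]
for name in bad:
    slice_nodes[name] = rebuild(name)
assert all(verify(name) for name in slice_nodes)
assert b"".join(slice_nodes[n] for n in ("edge-1", "edge-2", "edge-3")).rstrip(b"\x00") == record
```

A production system would replace the single XOR parity with a stronger erasure code tolerating multiple simultaneous failures, as the redundancy step above envisages.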
The disaster recovery method of the invention has the following advantages:
(1) Improved response speed: Because data processing and analysis are done at local or near-user edge nodes, user requests can be responded to quickly, greatly reducing network delay and data transmission time.
(2) Improved system availability: Edge nodes are deployed in multiple geographic locations, so the system has higher redundancy. When a certain node or the central server fails, user requests can be quickly switched to other healthy nodes, ensuring service continuity.
(3) Enhanced data privacy and security: Processing, analysis and storage of data are performed locally, reducing data transmission over the network and thereby reducing the risk of data theft or tampering. Meanwhile, the introduced intelligent risk control model can evaluate and respond to various security risks in real time.
(4) Bandwidth and resource savings: Only necessary data needs to be transmitted to the central server, greatly saving network bandwidth. In addition, dispersing data processing tasks to the edge nodes also reduces the load on the central server.
(5) Flexible offline transaction processing: On the premise of ensuring security, users are allowed to conduct offline transactions in certain scenarios, improving the flexibility of the system.
(6) Dynamic risk assessment: the intelligent risk control model based on the user context can adjust the risk threshold in real time, so that the risk control is more accurate, and the transaction safety is improved.
(7) Simplified disaster recovery preparation and recovery: conventional disaster recovery schemes typically require complex backup, migration, and restoration processes. In the invention, by arranging the edge nodes in a plurality of places, the processes are simplified, and disaster recovery is faster and simpler.
In summary, the invention provides an efficient and safe disaster recovery solution, which meets the requirements of modern financial services on high availability, high response speed and high safety.
Based on the above-mentioned disaster recovery method based on edge calculation, in the embodiment of the present invention, there is provided a disaster recovery device based on edge calculation, where a structural block diagram of the device is shown in fig. 2, and the disaster recovery device includes:
A transfer module 201, an evaluation module 202, an identification and selection module 203, and an identification and enablement module 204.
Wherein,
The transferring module 201 is configured to transfer, in advance, a service to be processed and data to be processed from a main node to an edge node, where the number of edge nodes is at least one;
The evaluation module 202 is configured to evaluate, based on an intelligent risk control model, a risk of a target service to obtain an evaluation result when a processing request for the target service is received;
the identifying and selecting module 203 is configured to obtain a first target node for processing the target service if the evaluation result belongs to a first level, identify the first target node, and select a second target node if the first target node fails;
The identifying and enabling module 204 is configured to identify the second target node, and enable a standby node to process the target service if the second target node fails.
The invention discloses a disaster recovery device based on edge calculation, which comprises: the method comprises the steps that a service to be processed and data to be processed are transferred from a main node to edge nodes in advance, wherein the number of the edge nodes is at least one; under the condition that a processing request for a target service is received, evaluating the risk of the target service based on an intelligent risk control model to obtain an evaluation result; if the evaluation result belongs to a first level, acquiring a first target node for processing the target service, identifying the first target node, and if the first target node fails, selecting a second target node; and identifying the second target node, and if the second target node fails, enabling a standby node to process the target service. The process transfers the service to be processed and the data to be processed from the main node to the edge node in advance, and the service and the data are not all stored in the main node, so that the service and the data can be transferred to other nodes for processing under the condition that the related node fails, and the problem that all the services cannot be continuously processed and must be switched to the standby node under the condition that the current node fails in the prior art is avoided.
In an embodiment of the present invention, the disaster recovery device further includes:
The selection and enabling module and the off-line processing and synchronization module.
Wherein,
The selecting and starting module is used for selecting a main node to execute the target service if the evaluation result belongs to a second level, and starting a standby node to process the target service if the main node fails, or;
And the offline processing and synchronizing module is used for processing the target service in an offline mode if the evaluation result belongs to a third level and the first target node fails, and synchronizing the data generated in the processing process of the target service after the first target node returns to normal.
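To show how the three evaluation levels and the failover chain described above might fit together, here is a schematic Python sketch; the node records, the is_healthy probe and the offline queue are hypothetical stand-ins rather than the concrete implementation of the modules.

```python
from collections import deque

offline_queue = deque()              # transactions processed offline, synchronized later

def is_healthy(node) -> bool:
    """Hypothetical health probe (heartbeat, ping, etc.)."""
    return node is not None and node.get("up", False)

def handle_request(level, first_target, second_target, main_node, standby_node, request):
    """Route a target-service request according to its evaluation level."""
    if level == 1:                                   # first level: first target, then second, then standby
        for node in (first_target, second_target, standby_node):
            if is_healthy(node):
                return f"processed on {node['name']}"
    elif level == 2:                                 # second level: main node, then standby node
        for node in (main_node, standby_node):
            if is_healthy(node):
                return f"processed on {node['name']}"
    elif level == 3:                                 # third level: offline mode if the first target fails
        if is_healthy(first_target):
            return f"processed on {first_target['name']}"
        offline_queue.append(request)                # data synchronized once the node recovers
        return "processed offline, pending synchronization"
    return "no node available"

edge_a = {"name": "edge-A", "up": False}
edge_b = {"name": "edge-B", "up": True}
print(handle_request(1, edge_a, edge_b, None, None, {"op": "transfer", "amount": 50}))
# -> processed on edge-B (first target failed, second target took over)
```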
In an embodiment of the present invention, the transferring module 201 includes:
the device comprises an acquisition and establishment unit, an acquisition and determination unit, an acquisition and fragmentation unit and a storage unit.
Wherein,
The acquiring and establishing unit is used for acquiring each first grade of the service to be processed and each second grade of each edge node in advance and establishing an association relation between the first grade and the second grade;
the acquiring and determining unit is used for acquiring a target first grade to which the current service to be processed belongs and determining a target second grade corresponding to the target first grade based on the association relation;
The obtaining and slicing unit is configured to determine a target edge node corresponding to the target second grade, obtain current to-be-processed data corresponding to the current to-be-processed service, and slice the current to-be-processed data to obtain current to-be-processed data after each slice;
And the storage unit is used for storing the current service to be processed and the current data to be processed after the slicing to the target edge node.
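As a sketch of the flow implemented by the transferring module 201, under assumed data structures (the grade tables, node registry, slice size and service names below are hypothetical, used only to make the association and slicing steps concrete):

```python
# Association between service grades (first grade) and node grades (second grade).
grade_association = {"high": "tier-1", "medium": "tier-2", "low": "tier-3"}

# Edge nodes registered under their second grade.
edge_nodes = {"tier-1": ["edge-01"], "tier-2": ["edge-02", "edge-03"], "tier-3": ["edge-04"]}

def slice_data(data: bytes, slice_size: int = 8) -> list:
    """Split the pending data into fixed-size slices."""
    return [data[i:i + slice_size] for i in range(0, len(data), slice_size)]

def transfer(service_name: str, service_grade: str, pending_data: bytes, store: dict):
    """Transfer a pending service and its sliced data from the main node to a target edge node."""
    target_second_grade = grade_association[service_grade]      # look up the association
    target_node = edge_nodes[target_second_grade][0]            # pick a node of that grade
    store.setdefault(target_node, {})[service_name] = slice_data(pending_data)
    return target_node

edge_store = {}
node = transfer("payment-clearing", "high", b"pending transaction batch ...", edge_store)
print(node, len(edge_store[node]["payment-clearing"]), "slices stored")
```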
In an embodiment of the present invention, the evaluation module 202 includes:
The device comprises an acquisition unit, a scoring unit and a comparison unit.
Wherein,
The acquiring unit is configured to acquire user behavior data, device information, and context information associated with the target service, where the context information includes: network environment, geographic location, and time information;
The scoring unit is used for transmitting the user behavior data, the equipment information and the context information to a risk scoring function to obtain a risk score;
And the comparison unit is used for comparing the risk score with a preset risk score threshold value to obtain an evaluation result.
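A minimal sketch of the evaluation flow under assumed weights and thresholds; the feature names, weight values, thresholds and the mapping of score bands to evaluation levels are illustrative assumptions, not values specified by the invention.

```python
def risk_score(user_behavior: dict, device_info: dict, context: dict) -> float:
    """Hypothetical weighted scoring over behavior, device and context features."""
    score = 0.0
    score += 30.0 if user_behavior.get("unusual_amount") else 0.0
    score += 20.0 if device_info.get("new_device") else 0.0
    score += 25.0 if context.get("network") == "public-wifi" else 0.0
    score += 15.0 if context.get("geo_mismatch") else 0.0
    score += 10.0 if context.get("odd_hour") else 0.0
    return score

def evaluate(user_behavior: dict, device_info: dict, context: dict) -> int:
    """Compare the risk score with preset thresholds and return the evaluation level."""
    score = risk_score(user_behavior, device_info, context)
    if score < 30:
        return 1        # first level (threshold bands assumed for illustration)
    if score < 60:
        return 2        # second level
    return 3            # third level

level = evaluate({"unusual_amount": False},
                 {"new_device": True},
                 {"network": "home", "geo_mismatch": False, "odd_hour": True})
print("evaluation level:", level)   # 20 + 10 = 30, which falls in the second band -> 2
```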
The disaster recovery device comprises a processor and a memory, wherein the transfer module, the evaluation module, the identification and selection module, the identification and starting module and the like are all stored in the memory as program units, and the processor executes the program units stored in the memory to realize corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the problem in the prior art that all services cannot be continuously processed and must be switched to the standby node when the current node fails is solved.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
The embodiment of the invention provides a computer storage medium, on which a program is stored, which when being executed by a processor, realizes the disaster recovery method based on edge calculation.
The embodiment of the application provides a processor, which is used for running a program, wherein the program executes the steps of any disaster recovery method based on edge calculation.
The embodiment of the application provides a computer program product, which comprises a computer program, wherein the computer program realizes the steps of any disaster recovery method based on edge calculation provided by the embodiment of the application when being executed by a processor.
An embodiment of the present invention provides an apparatus, where a structural block diagram of the apparatus is shown in fig. 3, and the apparatus includes: a processor 301, a storage medium 302, and a program stored on the storage medium 302 and executable on the processor 301, the processor 301 implementing the following steps when executing the program:
the method comprises the steps that a service to be processed and data to be processed are transferred from a main node to edge nodes in advance, wherein the number of the edge nodes is at least one;
under the condition that a processing request for a target service is received, evaluating the risk of the target service based on an intelligent risk control model to obtain an evaluation result;
If the evaluation result belongs to a first level, acquiring a first target node for processing the target service, identifying the first target node, and if the first target node fails, selecting a second target node;
And identifying the second target node, and if the second target node fails, enabling a standby node to process the target service.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, Random Access Memory (RAM) and/or nonvolatile memory, etc., such as Read Only Memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises an element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are to be included in the scope of the claims of the present application.

Claims (10)

1. A disaster recovery method based on edge calculation, characterized by comprising the following steps:
the method comprises the steps that a service to be processed and data to be processed are transferred from a main node to edge nodes in advance, wherein the number of the edge nodes is at least one;
under the condition that a processing request for a target service is received, evaluating the risk of the target service based on an intelligent risk control model to obtain an evaluation result;
If the evaluation result belongs to a first level, acquiring a first target node for processing the target service, identifying the first target node, and if the first target node fails, selecting a second target node;
And identifying the second target node, and if the second target node fails, enabling a standby node to process the target service.
2. The edge-computing-based disaster recovery method of claim 1, further comprising:
If the evaluation result belongs to the second level, selecting a main node to execute the target service, and if the main node fails, enabling a standby node to process the target service, or;
And if the evaluation result belongs to a third level and the first target node fails, processing the target service in an offline mode, and synchronizing the data generated in the processing process of the target service after the first target node is recovered to be normal.
3. The edge computation-based disaster recovery method of claim 1, wherein transferring the service to be processed and the data to be processed from the main node to the edge node in advance comprises:
the method comprises the steps of obtaining each first grade of a service to be processed and each second grade of each edge node in advance, and establishing an association relation between the first grade and the second grade;
Acquiring a target first grade to which a current service to be processed belongs, and determining a target second grade corresponding to the target first grade based on the association relation;
Determining a target edge node corresponding to the target second grade, acquiring current to-be-processed data corresponding to the current to-be-processed service, and slicing the current to-be-processed data to obtain current to-be-processed data after each slicing;
and storing the current service to be processed and the current data to be processed after the slicing to the target edge node.
4. The disaster recovery method based on edge computing according to claim 1, wherein, in the case of receiving a processing request for a target service, evaluating risk of the target service based on an end-intelligent risk control model to obtain an evaluation result, comprising:
Acquiring user behavior data, equipment information and context information associated with the target service, wherein the context information comprises: network environment, geographic location, and time information;
Transmitting the user behavior data, the equipment information and the context information to a risk scoring function to obtain a risk score;
and comparing the risk score with a preset risk score threshold value to obtain an evaluation result.
5. The edge-computing-based disaster recovery method of claim 1, further comprising:
Acquiring service data stored by a current edge node, and generating check data from the service data based on a preset coding rule;
And storing the check data in another edge node.
6. The edge-computing-based disaster recovery method of claim 1, further comprising:
Acquiring service data of a current edge node, traversing other edge nodes, and searching for another edge node containing the service data;
And carrying out consistency check on the service data and the service data in the other edge node.
7. A disaster recovery device based on edge computation, comprising:
the transfer module is used for transferring the service to be processed and the data to be processed from the main node to the edge node in advance, wherein the number of the edge nodes is at least one;
the evaluation module is used for evaluating the risk of the target service based on the terminal intelligent risk control model under the condition of receiving the processing request of the target service, so as to obtain an evaluation result;
The identifying and selecting module is used for acquiring a first target node for processing the target service if the evaluation result belongs to a first level, identifying the first target node, and selecting a second target node if the first target node fails;
and the identifying and starting module is used for identifying the second target node, and starting the standby node to process the target service if the second target node fails.
8. The edge-computing-based disaster recovery device of claim 7, further comprising:
The selecting and starting module is used for selecting a main node to execute the target service if the evaluation result belongs to a second level, and starting a standby node to process the target service if the main node fails, or;
And the offline processing and synchronizing module is used for processing the target service in an offline mode if the evaluation result belongs to a third level and the first target node fails, and synchronizing the data generated in the processing process of the target service after the first target node returns to be normal.
9. A storage medium comprising a stored program, wherein the program performs the steps of the edge-calculation-based disaster recovery method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program when executed by a processor implements the steps of the edge-based disaster recovery method as claimed in any one of claims 1-6.