CN115297124A - System operation and maintenance management method and device and electronic equipment - Google Patents

Info

Publication number
CN115297124A
Authority
CN
China
Prior art keywords
edge node
data
preset
edge
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210877828.XA
Other languages
Chinese (zh)
Other versions
CN115297124B (en
Inventor
吴文峰
林洁琬
黄鹄
毛廷鸿
沈聪
全树强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd
Priority to CN202210877828.XA (patent CN115297124B)
Publication of CN115297124A
Priority to PCT/CN2022/141396 (WO2024021469A1)
Application granted
Publication of CN115297124B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034 Reaction to server failures by a load balancer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1031 Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Abstract

A system operation and maintenance management method, a device and electronic equipment are provided. The method comprises the following steps: a proxy server is newly added to a system, and an edge node state monitoring task is deployed based on the newly added proxy server; when an abnormal edge node is monitored, mirror image data corresponding to the abnormal edge node is determined from a preset mirror image database, and recovery of the abnormal edge node is executed based on the mirror image data; and/or when a service traffic fluctuation trend within a preset time is predicted based on an adaptive scheduling system, the number of edge nodes in the system is adjusted from a first quantity value to a second quantity value according to a preset list, and the second quantity of edge nodes is controlled to process the service traffic. With this method, edge node resources are adaptively adjusted according to the short-term traffic trend, so that the resources are used more effectively; high-availability applications, state monitoring, and scheduled mirror image backup and recovery tasks are configured on the edge cloud, and the central cloud and the edge cloud are operated and maintained cooperatively, so that the degree of operation and maintenance automation is improved and the continuity and reliability of services are ensured.

Description

System operation and maintenance management method and device and electronic equipment
Technical Field
The application relates to the technical field of cloud computing, in particular to a system operation and maintenance management method and device and electronic equipment.
Background
With the development of cloud computing technology, cloud computing is applied more and more widely in daily life. For example, when a user live-streams video over the network, the support of a central cloud and an edge cloud is usually needed. The edge cloud is a new network architecture and open platform. As an extension of the central cloud, it expands part of the services and/or capabilities of the central cloud to the edge infrastructure; these capabilities include storage, computing, networking, big data, security and the like. The open platform integrates core networking, computing, storage and application capabilities at the edge side of the network. The edge cloud changes the working mode of the traditional centralized cloud and can provide users with more flexible services and faster response.
Specifically, the central cloud and the edge cloud have different functions, different hardware platforms to which the central cloud and the edge cloud belong, and different services to be deployed, and in the actual deployment and upgrade processes of the central cloud and the edge cloud, the central cloud and the edge cloud need to be deployed independently.
At present, because the central cloud and the edge cloud need to be managed separately, edge node resources cannot be managed or released adaptively when service traffic fluctuates greatly: when service traffic is heavy, edge node resources cannot be used effectively, and when service traffic is light, edge node resources are not released in time. Furthermore, because the edge cloud and the central cloud are operated and maintained separately, high-availability applications are not integrated; when a new node is accessed or released, some high-availability applications must be connected to the system manually, and the installed and deployed images lack effective management and maintenance. As a result, operation and maintenance are difficult and reliability is insufficient.
Disclosure of Invention
The application provides a system operation and maintenance management method, a system operation and maintenance management device and electronic equipment, which are used for improving the high availability and reliability of a system and realizing adaptive scheduling of the edge nodes in the system.
In a first aspect, the present application provides a system operation and maintenance management method, where the method includes:
adding a new proxy server in a system, and deploying an edge node state monitoring task based on the new proxy server, wherein the edge node state monitoring task is used for monitoring the state of each edge node in the system;
when an abnormal edge node is monitored, determining mirror image data corresponding to the abnormal edge node from a preset mirror image database, and executing recovery of the abnormal edge node based on the mirror image data, wherein backup mirror image data used for recovery of each edge node in the system is stored in the preset mirror image database; and/or
when a service traffic fluctuation trend within a preset time is predicted based on an adaptive scheduling system, adjusting the number of edge nodes in the system from a first quantity value to a second quantity value according to a preset list, and controlling the second quantity of edge nodes to process the service traffic.
In one possible design, a proxy server is newly added to the system, including:
deploying high-availability applications from the newly-added proxy server, and determining a main server and a standby server from the high-availability applications;
generating a virtual IP address based on the main server and the standby server, and binding the virtual IP address with the main server;
and when the main server is detected to be abnormal, unbinding the virtual IP address and the main server, and binding the standby server and the virtual IP address until the main server is recovered.
In one possible design, when the abnormal edge node is monitored, the method further includes:
determining a first edge node which does not respond to heartbeat information, and taking the first edge node as an abnormal edge node, wherein the heartbeat information is used for determining the abnormal edge node; and/or
when the running state of a second edge node is determined to be a waiting state, and the waiting time corresponding to the waiting state exceeds a preset time, taking the second edge node as an abnormal edge node; and/or
determining a third edge node whose running state is a recovering state, and taking the third edge node as an abnormal edge node.
In one possible design, adjusting a quantity value of an edge node in the system from a first quantity value to a second quantity value based on the traffic flow includes:
processing the edge node data according to a preset mode to obtain a parameter value corresponding to the edge node data;
obtaining a flow trend of the service flow within a preset time based on the parameter value and a preset flow prediction module, wherein the flow trend represents a fluctuation range of the service flow within the preset time;
adjusting the quantity value of edge nodes in the system from a first quantity value to a second quantity value based on the traffic trend.
In a possible design, processing the edge node data according to a preset manner to obtain a parameter value corresponding to the edge node data includes:
extracting edge data of each type in the edge node data, deleting invalid data in each edge data based on a first preset method, and generating each group of de-noising data corresponding to the edge node data, wherein the invalid data are repeated data in the edge node data and data with larger deviation with other data in the edge node data;
processing each group of de-noising data based on the second preset method to obtain training data corresponding to each group of de-noising data;
and inputting the groups of training data into a preset model for training to obtain respective corresponding parameter values of the groups of training data.
In one possible design, adjusting the number value of the edge node in the system from a first number value to a second number value according to a preset list, and controlling the edge node of the second number value to process the traffic flow includes:
obtaining a first quantity value of a current edge node and obtaining the total amount of service flow;
determining, from the preset list and based on the total amount, a second quantity value of edge nodes corresponding to the service traffic;
and adjusting the first quantity value to the second quantity value, and controlling the second quantity of edge nodes to process the service traffic.
In a second aspect, the present application provides a system operation and maintenance management apparatus, including:
the monitoring module is used for adding a new proxy server in the system and deploying an edge node state monitoring task based on the new proxy server;
the recovery module is used for determining mirror image data corresponding to the abnormal edge node from a preset mirror image database when the abnormal edge node is monitored, and executing recovery of the abnormal edge node based on the mirror image data;
and the adjusting module is used for adjusting the edge node quantity value in the system from a first quantity value to a second quantity value according to a preset list when the business flow fluctuation trend in the preset time is predicted based on the adaptive scheduling system, and controlling the edge node of the second quantity value to process the business flow.
In one possible design, the monitoring module is specifically configured to deploy a high-availability application from the newly-added proxy server, determine a main server and a standby server from the high-availability application, generate a virtual IP address based on the main server and the standby server, bind the virtual IP address to the main server, unbind the virtual IP address from the main server when the abnormality of the main server is detected, and bind the standby server to the virtual IP address until the main server recovers.
In a possible design, the recovery module is specifically configured to, when monitoring for an abnormal edge node: determine a first edge node which does not respond to heartbeat information, and take the first edge node as an abnormal edge node; and/or, when the running state of a second edge node is determined to be a waiting state and the waiting time corresponding to the waiting state exceeds a preset time, take the second edge node as an abnormal edge node; and/or determine a third edge node whose running state is a recovering state, and take the third edge node as an abnormal edge node.
In a possible design, the adjusting module is specifically configured to process the edge node data according to a preset manner, obtain a parameter value corresponding to the edge node data, obtain a traffic trend of the service traffic within a preset time based on the parameter value and a preset traffic prediction module, and adjust a number value of an edge node in the system from a first number value to a second number value based on the traffic trend.
In a possible design, the adjusting module is further configured to extract edge data of each type in the edge node data, delete invalid data in each edge data based on a first preset method, generate each set of denoising data corresponding to the edge node data, process each set of denoising data based on the second preset method, obtain training data corresponding to each set of denoising data, input each set of training data into a preset model for training, and obtain parameter values corresponding to each set of training data.
In a possible design, the adjusting module is further configured to obtain a first quantity value of the current edge node, obtain a total amount of the traffic flow, determine, from the preset list, a second quantity value of the edge node corresponding to the traffic flow based on the total amount, adjust the first quantity value to the second quantity value, and control the edge node of the second quantity value to process the traffic flow.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for realizing the steps of the system operation and maintenance management method when executing the computer program stored in the memory.
In a fourth aspect, the present application provides a computer-readable storage medium in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above system operation and maintenance management method.
For the technical effects that may be achieved by each of the first to fourth aspects, please refer to the above description of the technical effects achievable by the first aspect or the various possible designs of the first aspect; details are not repeated here.
Drawings
FIG. 1 is a flowchart illustrating steps of a system operation and maintenance management method provided in the present application;
FIG. 2 is a schematic structural diagram of a system operation and maintenance management apparatus provided in the present application;
FIG. 3 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings. The particular methods of operation in the method embodiments may also be applied in device embodiments or system embodiments. It should be noted that "a plurality" is understood as "at least two" in the description of the present application. "And/or" describes the association relationship of the associated objects and means that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. "A is connected with B" may mean: A and B are directly connected, or A and B are connected through C. In addition, in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or order.
At present, in a system of a central cloud and an edge cloud, due to the fact that the central cloud and the edge cloud have different functions and different hardware platforms and different deployed services, the central cloud and the edge cloud need to be separately managed, the central cloud and the edge cloud are not integrated together, when a new node is accessed or released, some highly available applications need to be manually accessed into the system, and an installed and deployed mirror image lacks effective management and maintenance, so that operation and maintenance of the system are difficult, and reliability is insufficient.
In order to solve the above-described problem, an embodiment of the present application provides a system operation and maintenance management method, which is used for implementing adaptive scheduling of edge cloud resources in a system and improving high availability and stability of the system. The method and the device in the embodiment of the application are based on the same technical concept, and because the principles of the problems solved by the method and the device are similar, the device and the embodiment of the method can be mutually referred, and repeated parts are not repeated.
The embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Referring to fig. 1, the present application provides a method for system operation and maintenance management, which can improve high availability and stability of a system, and can implement adaptive scheduling for edge cloud resources in the system, and improve utilization rate of the edge cloud resources, and an implementation process of the method is as follows:
Step S1: a proxy server is newly added to the system, and an edge node state monitoring task is deployed based on the newly added proxy server.
To improve the high availability and stability of the system and make its operation and maintenance management more convenient, the embodiment of the present application adds two proxy servers and an adaptive scheduling system to a system comprising a central cloud and an edge cloud. The proxy servers implement operation and maintenance management of the edge cloud by configuring high-availability applications. A high-availability application may be an edge node state monitoring task, a scheduled mirror image backup task, a load balancing task, a traffic forwarding task and the like, and can be adjusted according to the actual operation and maintenance conditions of the system, which is not described in detail here.
The adaptive scheduling system predicts the upcoming service traffic in the system based on the edge node data sent by the proxy server, thereby implementing adaptive scheduling of edge node resources. Through the cooperative operation and maintenance of the proxy servers, the high-availability applications and the adaptive scheduling system, the central cloud and the edge cloud can be managed in a unified manner. During real-time operation and maintenance management of the system, services are automatically connected to the operation and maintenance management system by dynamic access, which improves the cloud-edge collaborative operation and maintenance capability.
It should be further noted that the proxy servers newly added to the system include at least a main server and a standby server. In the embodiment of the present application there is one main server and one standby server; the numbers can be adjusted according to actual conditions and are not described further here.
Specifically, a virtual IP address exists between the main server and the standby server, the virtual IP address is bound with the main server, when the main server is detected to be abnormal, in order to not influence the operation and maintenance of the system and ensure that each node in the system works normally, the standby server replaces the main server to work, and the virtual IP address is unbound with the main server and bound with the standby server until the main server recovers.
The new proxy server described above is used to deploy services such as service proxy, traffic forwarding, load balancing, node state monitoring, multipath, and the like in the system, and the new proxy server deploys an edge node state monitoring task, which is used to monitor the state of each edge node in the system.
Based on the above description, when the main server is abnormal, the standby server works instead of the main server, thereby realizing uninterrupted service in the system and ensuring the stability and high availability of the system.
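The binding and rebinding of the virtual IP address described above can be sketched as follows. This is a minimal illustration only: the `Server` and `VirtualIP` classes, the server names and the address `192.0.2.10` are hypothetical, and rebinding to the main server once it recovers is one reading of "until the main server recovers", not something the application spells out.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    healthy: bool = True


class VirtualIP:
    """Tracks which server currently holds the virtual IP address."""

    def __init__(self, address: str, main: Server, standby: Server):
        self.address = address
        self.main = main
        self.standby = standby
        self.holder = main  # the virtual IP is bound to the main server first

    def reconcile(self) -> Server:
        """Rebind the virtual IP according to the main server's health."""
        if self.main.healthy:
            self.holder = self.main      # main server is normal (or has recovered)
        elif self.standby.healthy:
            self.holder = self.standby   # unbind from main, bind to standby
        return self.holder


main = Server("proxy-main")
standby = Server("proxy-standby")
vip = VirtualIP("192.0.2.10", main, standby)

main.healthy = False                      # main server becomes abnormal
holder_during_failure = vip.reconcile().name
main.healthy = True                       # main server recovers
holder_after_recovery = vip.reconcile().name
```

Clients keep addressing the same virtual IP throughout, which is why the switch between main and standby server does not interrupt the service.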
Step S2: when the abnormal edge node is monitored, determining mirror image data corresponding to the abnormal edge node from a preset mirror image database, and executing recovery of the abnormal edge node based on the mirror image data.
The edge node state monitoring task described above sends heartbeat information to each edge node, and when an edge node in the system is in a normal working state, the edge node responds to the received heartbeat information, so that the edge node state monitoring task can determine whether the edge node is abnormal or not based on whether the edge node responds to the heartbeat information or not. In addition, the edge node state monitoring task can also determine whether the edge node is abnormal or not based on the running state of the edge node.
An edge node in the system is determined to be abnormal in the following specific cases:
When a first edge node in the system does not respond to the heartbeat information, the first edge node is taken as an abnormal edge node. And/or
When the running state of a second edge node in the system is a waiting state, and the waiting time corresponding to the waiting state of the second edge node exceeds the preset time, the second edge node is taken as an abnormal edge node. And/or
When the running state of a third edge node in the system is a recovering state, the third edge node is taken as an abnormal edge node, because an edge node that is recovering is an edge node that has failed.
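The three cases above can be sketched as a single check. The sketch below is illustrative only: the `EdgeNodeStatus` record, the `RunState` names and the 30-second limit are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass
from enum import Enum


class RunState(Enum):
    RUNNING = "running"
    WAITING = "waiting"
    RECOVERING = "recovering"


@dataclass
class EdgeNodeStatus:
    name: str
    heartbeat_ok: bool            # did the node respond to the heartbeat?
    state: RunState
    waiting_seconds: float = 0.0  # time spent in the waiting state


def is_abnormal(node: EdgeNodeStatus, max_wait_seconds: float) -> bool:
    """Apply the three abnormality cases; any one of them is sufficient."""
    if not node.heartbeat_ok:
        return True   # case 1: no response to heartbeat information
    if node.state is RunState.WAITING and node.waiting_seconds > max_wait_seconds:
        return True   # case 2: waiting longer than the preset time
    if node.state is RunState.RECOVERING:
        return True   # case 3: a recovering node has failed
    return False


nodes = [
    EdgeNodeStatus("edge-1", True, RunState.RUNNING),
    EdgeNodeStatus("edge-2", False, RunState.RUNNING),
    EdgeNodeStatus("edge-3", True, RunState.WAITING, waiting_seconds=45.0),
    EdgeNodeStatus("edge-4", True, RunState.WAITING, waiting_seconds=5.0),
    EdgeNodeStatus("edge-5", True, RunState.RECOVERING),
]
abnormal = [n.name for n in nodes if is_abnormal(n, max_wait_seconds=30.0)]
```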
After the abnormal edge node is determined, the mirror image data corresponding to the abnormal edge node needs to be determined in a preset mirror image database, in which the images of all modules in the system are stored. The preset mirror image database may be, without limitation, a Docker image database. After the mirror image data corresponding to the abnormal edge node is determined, the abnormal edge node is recovered using that mirror image data.
The edge node state monitoring task in the embodiment of the present application may be a timed daemon task scheduled with crontab. The crontab command, commonly found in Unix and Unix-like operating systems, is used to set up instructions that are executed periodically. The edge node state monitoring task may check the edge nodes in the system at a preset period, and the running state of an edge node may be obtained with an automated operation and maintenance tool, for example the Ansible tool. When the state of an edge node is abnormal, the system displays the state of that edge node as ERROR, the service is quickly switched to a preset server (which may be a BACKUP server) through the multipath service, and a message is sent to the edge node mirror image recovery task.
After receiving the message of the edge node state monitoring task, the edge node image recovery task analyzes the received message, finds image data of a corresponding node from a preset image library, and performs image recovery by using a preset command, wherein the preset command can be a docker command.
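As a rough sketch of this step, the recovery task could parse the message, look up the backed-up image and assemble a docker command. Everything specific here is an assumption: the JSON message format, the `IMAGE_DATABASE` mapping, the image names, and the choice of `docker run -d --name` as the preset command (the application only says the preset command can be a docker command). The command is built but not executed.

```python
import json

# Hypothetical mapping from edge node name to its backed-up image reference.
IMAGE_DATABASE = {
    "edge-node-1": "registry.example.com/edge/node1:backup",
    "edge-node-2": "registry.example.com/edge/node2:backup",
}


def build_recovery_command(message: str) -> list:
    """Parse a monitoring message and build the recovery command for the node."""
    payload = json.loads(message)
    node = payload["node"]
    image = IMAGE_DATABASE[node]  # raises KeyError if the node has no backup
    # Assumed preset command: start a detached container from the backup image.
    return ["docker", "run", "-d", "--name", node, image]


command = build_recovery_command('{"node": "edge-node-1", "state": "ERROR"}')
```

In a real deployment the resulting list could be passed to a process runner such as `subprocess.run`; that step is omitted here so the sketch has no side effects.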
It should be noted that, to ensure that an abnormal edge node in the system can be recovered in real time, a scheduled mirror image backup task in the system backs up the mirror image data of each edge node. Specifically, at suitable scheduled times, the task checks the containers on each cluster node and selects images to back up.
Based on the method, when the abnormal edge nodes appear in the system, the abnormal edge nodes are quickly recovered through mirror image backup data of all the edge nodes, the abnormal edge nodes in the system are updated, the normal state of the edge nodes in the system is ensured, the system is more flexibly deployed, and the convenience of operation and maintenance of the system is improved.
Step S3: when the service traffic fluctuation trend within the preset time is predicted based on the adaptive scheduling system, the number of edge nodes in the system is adjusted from a first quantity value to a second quantity value according to a preset list, and the second quantity of edge nodes is controlled to process the service traffic.
In the system operation and maintenance management method in the embodiment of the present application, the method further includes adaptive scheduling of edge node resources, which is used to implement high utilization rate of the edge node resources, and the specific process of the adaptive scheduling of the edge node resources is as follows:
When the service traffic fluctuation trend within a preset time is predicted based on the adaptive scheduling system, edge node data of the edge nodes needs to be acquired in real time. The edge node data includes multiple types of edge data, and each edge data contains invalid data, namely duplicate data in the edge data and data that deviates greatly from the other data in the edge data, so the invalid data needs to be deleted from each edge data.
Specifically, the invalid data in each edge data is deleted based on a first preset method, namely denoising each edge data with the k-nearest neighbor method. Processing data with the k-nearest neighbor method is well known to those skilled in the art, so the detailed denoising process is not described here.
Based on the k-nearest neighbor method, data in each edge data whose distance to its 5 nearest neighbors exceeds a threshold can be deleted as abnormal data. This generates the denoising data corresponding to each edge data, and thus each group of denoising data corresponding to the edge node data.
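A minimal sketch of this denoising step follows. It assumes one-dimensional values, removes duplicates first, and uses the mean distance to the 5 nearest neighbors as the outlier score; the application does not say how the distance to the 5 neighbors is aggregated, and the sample values and the threshold of 1.0 are hypothetical.

```python
def knn_denoise(values, k=5, threshold=1.0):
    """Drop duplicates, then drop points far from their k nearest neighbors."""
    unique = list(dict.fromkeys(values))  # remove repeated data, keep order
    if len(unique) < 2:
        return unique
    kept = []
    for i, v in enumerate(unique):
        # Distances from this point to every other point, smallest first.
        distances = sorted(abs(v - other) for j, other in enumerate(unique) if j != i)
        nearest = distances[:k]
        score = sum(nearest) / len(nearest)  # assumed aggregate: mean distance
        if score <= threshold:
            kept.append(v)
    return kept


samples = [1.0, 1.1, 1.1, 0.9, 1.2, 1.0, 9.0, 1.3, 0.8]
cleaned = knn_denoise(samples, k=5, threshold=1.0)
```

Here the repeated 1.1 and 1.0 are removed as duplicates, and 9.0 is removed because its nearest neighbors are all far away.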
After each group of denoising data of the service traffic is obtained, each group of denoising data needs to be normalized so that the parameters of all groups fall within substantially the same value interval; the normalized data serve as the training data.
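One common way to bring each group into the same interval is min-max scaling to [0, 1]. The sketch below assumes that choice; the application only requires that the groups end up in substantially the same value interval and does not name a normalization method.

```python
def min_max_normalize(values):
    """Scale a group of values linearly into the interval [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: map them to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


normalized = min_max_normalize([2.0, 4.0, 6.0])
```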
After the training data is obtained, it needs to be input into a preset model for training. The training methods in the preset model include at least multi-parameter fitting, correlation coefficients, a linear regression algorithm and the like, and training yields the parameter values corresponding to each group of training data.
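Of the training methods listed, linear regression is the simplest to illustrate. The sketch below fits a straight line to one group of values by ordinary least squares, using the sample index as the x-axis. Treating the fitted slope and intercept as the "parameter values" is an assumption; the application does not define what the parameter values are.

```python
def fit_line(values):
    """Least-squares fit of values[i] ~ slope * i + intercept."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two points are needed to fit a line")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([1.0, 3.0, 5.0, 7.0])
```

A positive slope would indicate rising values over the sampled period and a negative slope falling values.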
After the system detects a series of parameter values corresponding to the service flow, the system can determine the flow trend of the service flow within the preset time based on each parameter value and the preset flow prediction module, wherein the flow trend represents the fluctuation range of the service flow within the preset time.
For example, the edge node data is shown in Table 1:
Edge node data    Edge data
a                 {1.2, 2.3, 2.4, 5.7, 2.3, 1.9, 3.2, 6, 1.2}
b                 {2.1, 1.9, 3.3, 2.2, 2.8, 1.6, 1.2, 3, 3.2}
c                 {1.7, 2.6, 1.4, 3.6, 1.3, 1.7, 1.2, 3, 2.2}
......            ......
Table 1
Table 1 records part of the edge data in the edge node data. Only 3 sets of edge data are listed, each with 9 parameters; each set of edge data is data after deduplication, and the 9 parameters are then normalized and trained.
After each set of edge data in Table 1 is normalized and trained, Table 2 below is obtained:
Edge node data    Parameter value
a                 2.6
b                 3.1
c                 1.8
......            ......
Table 2
Table 2 illustrates the parameter values corresponding to 3 sets of edge data: the parameter value corresponding to edge data a is 2.6, that corresponding to edge data b is 3.1, and that corresponding to edge data c is 1.8. Based on these parameter values, the traffic trend of the service traffic can be determined. Other edge data and their parameter values follow the pattern of Table 2 and are not described further here.
After the traffic trend of the traffic flow is determined, the number value of the edge node in the system is adjusted from a first number value to a second number value based on the traffic trend.
The number of edge nodes is adjusted as follows: the first quantity value of the current edge nodes is obtained; to account for the traffic trend of the service traffic, the total amount of the service traffic is obtained; and the second quantity value of edge nodes corresponding to that total amount is determined from a preset list. The preset list records the correspondence between the total amount of service traffic and the second quantity value of edge nodes, as follows:
Total amount of service traffic    Second quantity value of edge nodes
90                                 10
100                                20
110                                30
......                             ......
Table 3
In Table 3 above, each total amount of service traffic corresponds to a second quantity value of edge nodes. Table 3 lists only 3 examples to explain the correspondence between the total amount of service traffic and the second quantity value; the second quantity values corresponding to other total amounts follow the examples in Table 3 and are not listed here.
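The lookup in the preset list can be sketched as below, using the three rows of Table 3. How totals that fall between or outside the listed rows are handled is not stated in the application; this sketch assumes the first row whose total is at least the observed total is used, and that totals above the last row fall back to the last row.

```python
from bisect import bisect_left

# Rows of Table 3: (total amount of service traffic, second quantity value).
PRESET_LIST = [(90, 10), (100, 20), (110, 30)]


def second_quantity_value(total_traffic):
    """Return the edge node count for a traffic total (assumed ceiling lookup)."""
    totals = [total for total, _ in PRESET_LIST]
    # First row whose total is >= the observed total, clamped to the last row.
    index = min(bisect_left(totals, total_traffic), len(PRESET_LIST) - 1)
    return PRESET_LIST[index][1]


exact = second_quantity_value(100)
between = second_quantity_value(95)
above = second_quantity_value(500)
```

Rounding up to the next row errs on the side of having more edge nodes than strictly needed; rounding down would be an equally valid reading of the table.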
Through the table 3, the first number value of the edge node in the current system is determined, and then the first number value is adjusted to the second number value based on the traffic momentum of the traffic flow, so that the number of the edge node is determined before the traffic flow is processed, and further, the effective scheduling of the edge node resource is realized.
When the flow trend is large, the system can accommodate a new edge node, so that the aim of reducing the pressure of the edge node is fulfilled; when the traffic trend is small, the system can release the edge node and forward the released service traffic of the edge node, so that the resources of the edge node can be effectively utilized.
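The lookup in the preset list can be illustrated with a minimal Python sketch. It uses only the three rows shown in Table 3. The function names and the rule of rounding up to the next listed total when the traffic falls between rows are assumptions made for illustration; the text itself gives only exact correspondences.

```python
from bisect import bisect_left

# Preset list from Table 3: (total amount of service traffic, second quantity value).
# Only the three rows shown in the text are included.
PRESET_LIST = [(90, 10), (100, 20), (110, 30)]


def second_quantity_value(total_traffic, preset=PRESET_LIST):
    """Return the node count for the smallest listed total >= total_traffic.

    Rounding up to the next listed row, and clamping to the last row when the
    traffic exceeds every listed total, are assumptions of this sketch.
    """
    totals = [total for total, _ in preset]
    index = min(bisect_left(totals, total_traffic), len(preset) - 1)
    return preset[index][1]


def plan_adjustment(first_quantity_value, total_traffic):
    """Return (second quantity value, number of nodes to add or release)."""
    target = second_quantity_value(total_traffic)
    return target, target - first_quantity_value
```

A positive second element of the result means edge nodes are added; a negative one means edge nodes are released and their service traffic is forwarded.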
In summary, when the fluctuation trend of the service traffic within a preset time is predicted by the adaptive scheduling system, the number of edge nodes can be adjusted according to the load of the edge nodes, which avoids overloading any edge node while it processes the service traffic. After the service traffic is processed, the parameter values corresponding to the service traffic are obtained and the traffic trend is derived from them, so that the number of edge nodes in the system is adjusted based on the traffic trend. This keeps the number of edge nodes adaptive and improves the utilization of the edge nodes.
Based on the same inventive concept, an embodiment of the present application further provides a system operation and maintenance management apparatus configured to implement the functions of the system operation and maintenance management method. Referring to Fig. 2, the apparatus includes:
a monitoring module 201, configured to add a new proxy server in a system, and deploy an edge node state monitoring task based on the new proxy server;
a recovery module 202, configured to determine, when an abnormal edge node is monitored, mirror image data corresponding to the abnormal edge node from a preset mirror image database, and perform recovery of the abnormal edge node based on the mirror image data;
an adjusting module 203, configured to, when a fluctuation trend of the service traffic within a preset time is predicted based on the adaptive scheduling system, adjust the quantity value of edge nodes in the system from a first quantity value to a second quantity value according to a preset list, and control the second quantity value of edge nodes to process the service traffic.
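As a rough illustration of the recovery module, the following Python sketch looks up the mirror image data of an abnormal edge node and hands it to a restore routine. The node identifiers, image paths, and the `restore` callback are hypothetical; the text does not specify how the preset mirror image database is stored or how recovery is executed.

```python
# Hypothetical preset mirror image database: edge node id -> backup image.
MIRROR_DATABASE = {
    "edge-1": "images/edge-1.img",
    "edge-2": "images/edge-2.img",
}


def recover_abnormal_node(node_id, restore, mirror_database=MIRROR_DATABASE):
    """Restore an abnormal edge node from its backup mirror image.

    `restore` is a caller-supplied function taking (node_id, image); it stands
    in for whatever mechanism actually re-images the node. Returns False when
    no mirror image data is recorded for the node.
    """
    image = mirror_database.get(node_id)
    if image is None:
        return False
    restore(node_id, image)
    return True
```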
In one possible design, the monitoring module 201 is specifically configured to: deploy a high-availability application from the newly added proxy server and determine a main server and a standby server from the high-availability application; generate a virtual IP address based on the main server and the standby server and bind the virtual IP address to the main server; and, when an abnormality of the main server is detected, unbind the virtual IP address from the main server and bind it to the standby server until the main server recovers.
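The binding behaviour of the virtual IP address can be sketched as a small state holder in Python. The class name, the server labels, and the health flag passed to `update` are assumptions for illustration; how the abnormality of the main server is detected is outside the scope of this sketch.

```python
class VirtualIpBinding:
    """Tracks which server currently holds the virtual IP address."""

    def __init__(self, virtual_ip, main, standby):
        self.virtual_ip = virtual_ip
        self.main = main
        self.standby = standby
        self.holder = main  # the virtual IP is bound to the main server first

    def update(self, main_healthy):
        """Rebind the virtual IP according to the main server's health.

        While the main server is abnormal the standby server holds the
        address; once the main server recovers the address moves back.
        """
        self.holder = self.main if main_healthy else self.standby
        return self.holder
```

Clients always address the virtual IP, so a switch between the main server and the standby server is transparent to them.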
In a possible design, the recovery module 202 is specifically configured to identify an abnormal edge node in any of the following ways: determine a first edge node that does not respond to heartbeat information and take the first edge node as an abnormal edge node; and/or, when the running state of a second edge node is determined to be a waiting state and the waiting time corresponding to the waiting state exceeds a preset time, take the second edge node as an abnormal edge node; and/or determine a third edge node whose running state is a recovering state and take the third edge node as an abnormal edge node.
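The three conditions can be expressed as a single check, shown here as a minimal Python sketch. The `EdgeNodeStatus` fields, the state labels, and the default 30-second limit are hypothetical; the text only states that a preset time exists.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EdgeNodeStatus:
    node_id: str
    heartbeat_ok: bool            # whether the node responded to heartbeat information
    state: str                    # hypothetical labels: "running", "waiting", "recovering"
    waiting_seconds: float = 0.0  # time spent in the waiting state


def abnormal_reason(status, max_wait_seconds=30.0) -> Optional[str]:
    """Return why the node counts as abnormal, or None if it is normal."""
    if not status.heartbeat_ok:
        return "no heartbeat response"
    if status.state == "waiting" and status.waiting_seconds > max_wait_seconds:
        return "waiting timeout"
    if status.state == "recovering":
        return "recovering"
    return None
```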
In a possible design, the adjusting module 203 is specifically configured to process the edge node data in a preset manner to obtain parameter values corresponding to the edge node data, obtain the traffic trend of the service traffic within a preset time based on the parameter values and a preset traffic prediction module, and adjust the quantity value of edge nodes in the system from a first quantity value to a second quantity value based on the traffic trend.
In a possible design, the adjusting module 203 is further configured to extract the edge data of each type from the edge node data, delete invalid data from each set of edge data based on a first preset method to generate the sets of de-noised data corresponding to the edge node data, process each set of de-noised data based on a second preset method to obtain the training data corresponding to each set of de-noised data, and input each set of training data into a preset model for training, so as to obtain the parameter value corresponding to each set of training data.
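The two preprocessing steps can be sketched in Python as follows. The text does not name the first or second preset method, so this sketch assumes a median-absolute-deviation filter for removing repeated and strongly deviating data, and min-max scaling for producing training data; both choices and the threshold of 3 are assumptions, and the training step itself is omitted because the preset model is not described.

```python
from statistics import median


def denoise(values, max_deviations=3.0):
    """Drop repeated values, then values that deviate strongly from the rest.

    Deviation is measured against the median absolute deviation (MAD), an
    assumed stand-in for the unspecified first preset method.
    """
    unique = list(dict.fromkeys(values))  # keeps first-occurrence order
    if len(unique) < 3:
        return unique
    center = median(unique)
    mad = median(abs(v - center) for v in unique)
    if mad == 0:
        return unique
    return [v for v in unique if abs(v - center) <= max_deviations * mad]


def normalize(values):
    """Min-max scale a non-empty list into [0, 1] (assumed second preset method)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]
```

For example, `denoise([10, 10, 11, 12, 13, 500])` removes the repeated 10 and the outlier 500, and the remaining values can then be passed to `normalize` before training.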
In a possible design, the adjusting module 203 is further configured to obtain the first quantity value of the current edge nodes, obtain the total amount of the service traffic, determine, from the preset list and based on the total amount, the second quantity value of edge nodes corresponding to the service traffic, adjust the first quantity value to the second quantity value, and control the second quantity value of edge nodes to process the service traffic.
Based on the same inventive concept, an embodiment of the present application further provides an electronic device, where the electronic device may implement the function of the foregoing system operation and maintenance management apparatus, and with reference to fig. 3, the electronic device includes:
at least one processor 301 and a memory 302 connected to the at least one processor 301. The specific connection medium between the processor 301 and the memory 302 is not limited in this embodiment; Fig. 3 illustrates an example in which the processor 301 and the memory 302 are connected through a bus 300. The bus 300 is shown in Fig. 3 by a thick line, and the connections between other components are merely illustrative and not limiting. The bus 300 may be divided into an address bus, a data bus, a control bus, and so on; it is drawn as a single thick line in Fig. 3 for ease of illustration, which does not mean that there is only one bus or one type of bus. The processor 301 may also be referred to as a controller; the name is not limited.
In the embodiment of the present application, the memory 302 stores instructions executable by the at least one processor 301, and the at least one processor 301 may execute the system operation and maintenance management method discussed above by executing the instructions stored in the memory 302. The processor 301 may implement the functions of the various modules in the apparatus shown in fig. 2.
The processor 301 is a control center of the apparatus, and may be connected to various parts of the entire control device by using various interfaces and lines, and perform various functions and process data of the apparatus by operating or executing instructions stored in the memory 302 and calling data stored in the memory 302, thereby performing overall monitoring of the apparatus.
In one possible design, processor 301 may include one or more processing units, and processor 301 may integrate an application processor that primarily handles operating systems, user interfaces, application programs, and the like, and a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 301. In some embodiments, the processor 301 and the memory 302 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The processor 301 may be a general-purpose processor such as a Central Processing Unit (CPU), or a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the system operation and maintenance management method disclosed in the embodiments of the present application may be performed directly by a hardware processor, or by a combination of hardware and software modules in the processor.
The memory 302, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 302 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 302 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 302 in the embodiments of the present application may also be a circuit or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the processor 301, the code corresponding to the system operation and maintenance management method described in the foregoing embodiment may be solidified into a chip, so that the chip can execute the steps of the system operation and maintenance management method of the embodiment shown in Fig. 1 when running. How to program the processor 301 is well known to those skilled in the art and is not described here.
Based on the same inventive concept, the present application further provides a storage medium storing computer instructions, which when executed on a computer, cause the computer to perform the system operation and maintenance management method discussed above.
In some possible embodiments, aspects of the system operation and maintenance management method provided by the present application may also be implemented in the form of a program product that includes program code. When the program product runs on an apparatus, the program code causes the control device to perform the steps of the system operation and maintenance management method according to the various exemplary embodiments described above in this specification.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A system operation and maintenance management method is characterized by comprising the following steps:
adding a new proxy server in a system, and deploying an edge node state monitoring task based on the new proxy server, wherein the edge node state monitoring task is used for monitoring the state of each edge node in the system;
when an abnormal edge node is monitored, determining mirror image data corresponding to the abnormal edge node from a preset mirror image database, and executing recovery of the abnormal edge node based on the mirror image data, wherein backup mirror image data used for recovery of each edge node in the system is stored in the preset mirror image database; and/or
when a fluctuation trend of the service traffic within a preset time is predicted based on an adaptive scheduling system, adjusting the quantity value of edge nodes in the system from a first quantity value to a second quantity value according to a preset list, and controlling the second quantity value of edge nodes to process the service traffic, wherein the preset list records the correspondence between the service traffic and the number of edge nodes.
2. The method of claim 1, wherein adding a proxy server in the system comprises:
deploying high-availability applications from the newly-added proxy server, and determining a main server and a standby server from the high-availability applications;
generating a virtual IP address based on the main server and the standby server, and binding the virtual IP address with the main server;
and when the abnormality of the main server is detected, unbinding the virtual IP address and the main server, and binding the standby server and the virtual IP address until the main server recovers.
3. The method of claim 1, wherein when an abnormal edge node is monitored, further comprising:
determining a first edge node which does not respond to the heartbeat information, and taking the first edge node as an abnormal edge node, wherein the heartbeat information is used for determining the abnormal edge node; and/or
when the running state of a second edge node is determined to be a waiting state, and in response to the waiting time corresponding to the waiting state exceeding a preset time, taking the second edge node as an abnormal edge node; and/or
determining a third edge node whose running state is a recovering state, and taking the third edge node as an abnormal edge node.
4. The method of claim 1, wherein adjusting the number of edge nodes in the system from a first number to a second number according to a predetermined list comprises:
processing the edge node data according to a preset mode to obtain a parameter value corresponding to the edge node data;
obtaining the flow trend of the service flow within preset time based on the parameter value and a preset flow prediction module, wherein the flow trend represents the fluctuation range of the service flow within the preset time;
adjusting the quantity value of edge nodes in the system from the first quantity value to the second quantity value based on the traffic trend.
5. The method of claim 4, wherein processing the edge node data in a preset manner to obtain parameter values corresponding to the edge node data comprises:
extracting the edge data of each type from the edge node data, deleting invalid data from each set of edge data based on a first preset method, and generating the sets of de-noised data corresponding to the edge node data, wherein the invalid data are repeated data in the edge node data and data that deviate substantially from the other data in the edge node data;
processing each set of de-noised data based on a second preset method to obtain training data corresponding to each set of de-noised data;
and inputting the groups of training data into a preset model for training to obtain respective corresponding parameter values of the groups of training data.
6. The method of claim 1, wherein adjusting the number of edge nodes in the system from a first number to a second number according to a predetermined list, and controlling the edge nodes of the second number to process the traffic flow comprises:
obtaining a first quantity value of a current edge node and obtaining the total amount of service flow;
determining, from the preset list and based on the total amount, a second quantity value of edge nodes corresponding to the service traffic;
and adjusting the first quantity value to the second quantity value, and controlling the second quantity value of edge nodes to process the service traffic.
7. A system operation and maintenance management apparatus, comprising:
the monitoring module is used for newly adding a proxy server in the system and deploying an edge node state monitoring task based on the newly added proxy server;
the recovery module is used for determining mirror image data corresponding to the abnormal edge node from a preset mirror image database when the abnormal edge node is monitored, and executing recovery of the abnormal edge node based on the mirror image data;
and the adjusting module is used for adjusting the edge node quantity value in the system from a first quantity value to a second quantity value according to a preset list when the business flow fluctuation trend in preset time is predicted based on the adaptive scheduling system, and controlling the edge node of the second quantity value to process the business flow.
8. The apparatus of claim 7, wherein the monitoring module is specifically configured to deploy a highly available application from the newly added proxy server, determine a primary server and a backup server from the highly available application, generate a virtual IP address based on the primary server and the backup server, bind the virtual IP address to the primary server, unbind the virtual IP address from the primary server when the primary server is detected as abnormal, and bind the backup server to the virtual IP address until the primary server is recovered.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1-6 when executing the computer program stored on the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-6.
CN202210877828.XA 2022-07-25 2022-07-25 System operation and maintenance management method and device and electronic equipment Active CN115297124B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210877828.XA CN115297124B (en) 2022-07-25 2022-07-25 System operation and maintenance management method and device and electronic equipment
PCT/CN2022/141396 WO2024021469A1 (en) 2022-07-25 2022-12-23 System operation and maintenance management method and apparatus, and electronic device

Publications (2)

Publication Number Publication Date
CN115297124A true CN115297124A (en) 2022-11-04
CN115297124B CN115297124B (en) 2023-08-04

Family

ID=83824243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210877828.XA Active CN115297124B (en) 2022-07-25 2022-07-25 System operation and maintenance management method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN115297124B (en)
WO (1) WO2024021469A1 (en)

Also Published As

Publication number Publication date
CN115297124B (en) 2023-08-04
WO2024021469A1 (en) 2024-02-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant