WO2021218645A1 - Method, system and device for node control - Google Patents

Method, system and device for node control

Info

Publication number
WO2021218645A1
WO2021218645A1 PCT/CN2021/087365 CN2021087365W WO2021218645A1 WO 2021218645 A1 WO2021218645 A1 WO 2021218645A1 CN 2021087365 W CN2021087365 W CN 2021087365W WO 2021218645 A1 WO2021218645 A1 WO 2021218645A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
working mode
master
central system
connection
Prior art date
Application number
PCT/CN2021/087365
Other languages
English (en)
French (fr)
Inventor
刘辉勇
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP21797439.3A (EP4132065A4)
Publication of WO2021218645A1
Priority to US17/974,911 (US20230039817A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/08 Testing, supervising or monitoring using real traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/02 Arrangements for optimising operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0805 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability
    • H04L43/0811 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters by checking availability by checking connectivity
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/58 Association of routers
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W24/00 Supervisory, monitoring or testing arrangements
    • H04W24/04 Arrangements for maintaining operational condition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/20 Arrangements for monitoring or testing data switching networks the monitoring system or the monitored elements being virtualised, abstracted or software-defined entities, e.g. SDN or NFV

Definitions

  • This application relates to the field of communications, and in particular to a method, system and device for node control.
  • ETSI: European Telecommunications Standards Institute
  • MEC: mobile edge computing
  • 5G: 5th generation
  • MEC technology can improve user experience and save bandwidth resources.
  • by moving computing power down to mobile edge nodes, MEC technology supports third-party application integration and opens up room for service innovation at the mobile edge.
  • MEC technology depends on the orchestration capability of the virtual network function manager (VNFM). For example, MEC technology also needs to account for the "dual-master" and "no-master" problems of the VNFM.
  • the "dual-master" problem is the phenomenon in which two machines or nodes in a master/standby relationship both turn themselves into master nodes when triggered by certain conditions. Because both nodes then process the same services as master, their processing results may become inconsistent. A master/standby pair needs to avoid this situation as far as possible.
  • the "no-master" problem is the phenomenon in which two machines or nodes in a master/standby relationship both turn themselves into standby nodes when triggered by certain conditions. Because both nodes then have the standby identity, services are interrupted.
  • the VNFM dual-master and no-master problems therefore urgently need to be resolved.
  • the present application provides a method, system and device for node control, which can solve the dual-master problem at a lower cost and also avoid the no-master problem.
  • a method for node control is provided, which is applied to a node control system.
  • the node control system includes: a first node, a second node, and a central system.
  • the first node and the second node are located at the edge.
  • the method includes: the first node monitors the connection state with the central system and, when the connection between the first node and the central system is interrupted, switches to the standby node working mode; the central system monitors the connection state with the first node and, when the connection between the central system and the first node is interrupted, sends a master promotion command to the second node, where the master promotion command is used to notify a switch from the standby node working mode to the master node working mode; and the second node switches to the master node working mode according to the master promotion command.
  • "the connection between the first node and the central system is interrupted" and "the connection between the central system and the first node is interrupted" both mean that the connection between the central system and the first node is in an interrupted state.
  • that the current working mode of the first node is the master node working mode and the current working mode of the second node is the standby node working mode is merely an example. It indicates that, at a given time, among the multiple nodes (two or more) in the edge system, one node is in the master node working mode and another node is in the standby node working mode.
  • after receiving the master promotion command from the central system, the node in the standby node working mode performs the master promotion operation. Specifically, on the one hand, the connectivity between the node in the master node working mode and the central system is detected to determine whether that node needs to be demoted (that is, the original master node is demoted to a standby node), which avoids the dual-master problem.
  • on the other hand, the central system may issue a master promotion command to a node in the standby node working mode, so that the original standby node switches to the master node working mode. No new arbitration node therefore needs to be added; integrating the switching capability into the central system both reduces additional resource consumption and avoids the no-master problem.
  • the central system can decide whether to issue a master promotion command according to its connection with the node in the master node working mode, which can reduce unnecessary switching and save switching time.
  • the embodiments of the present application can also be used in MEC technology, so as to solve the dual-master problem and the no-master problem that may occur in MEC technology.
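The interaction described in the first aspect can be sketched as follows. This is a minimal in-memory illustration only: the class names `Node` and `CentralSystem`, the method names, and the way link state is passed in as a boolean are all assumptions made for the example, not part of the application.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


@dataclass
class Node:
    """Hypothetical edge node holding only its current working mode."""
    name: str
    mode: Mode

    def on_central_link(self, connected: bool) -> None:
        # A master that finds its link to the central system interrupted
        # demotes itself; a standby node keeps its mode.
        if self.mode is Mode.MASTER and not connected:
            self.mode = Mode.STANDBY

    def on_promotion_command(self) -> None:
        # A standby node becomes master only when the central system says so.
        if self.mode is Mode.STANDBY:
            self.mode = Mode.MASTER


class CentralSystem:
    """Hypothetical central system that arbitrates promotion."""

    def on_master_link(self, connected: bool, standby: Node) -> None:
        # If the link to the current master is interrupted, promote the standby.
        if not connected:
            standby.on_promotion_command()


first = Node("node-1", Mode.MASTER)
second = Node("node-2", Mode.STANDBY)
central = CentralSystem()

# The link between node-1 and the central system breaks; both ends observe it.
first.on_central_link(connected=False)
central.on_master_link(connected=False, standby=second)
print(first.mode.value, second.mode.value)
```

Because demotion is decided by the master itself and promotion only by the central system, the sketch never leaves both nodes as master or both as standby after a single link failure.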
  • the first node monitors the connection state with the second node, and/or the second node monitors the connection state with the first node; when it is determined that the connection between the first node and the second node is interrupted or connected, the current working mode of the first node and/or the second node is maintained.
  • detection between nodes (such as heartbeat detection) and detection between a node and the central system jointly determine whether the node in the master node working mode needs to be demoted (that is, become a standby node), so that the dual-master problem can be avoided.
  • the node in the master node working mode decides whether to demote itself according to the connection between itself and the central system. So even if there is a hardware failure (such as a physical host network card failure, a physical host failure, or a network equipment failure), or the master node or the standby node cannot detect the heartbeat of the peer because of an internal software failure (a failure caused by factors other than network cards or networks), no misjudgment occurs that could lead to the dual-master problem.
  • the monitoring of the connection state between the first node and the central system includes: monitoring the connection state between the first node and a gateway, where the gateway is between the first node and the central system.
  • the gateway is located in the edge system.
  • a method for node control is provided, which is applied to a first node in an edge system, where the first node is connected to a central system.
  • the method includes: in the case where it is determined that the current working mode of the first node is the master node working mode, the first node monitors the connection state between itself and the central system and, when it is determined that the connection between the first node and the central system is interrupted, switches to the standby node working mode; in the case where it is determined that the current working mode of the first node is the standby node working mode, the first node listens for commands from the central system and, when it receives the master promotion command sent by the central system, switches to the master node working mode, where the master promotion command is used to notify a switch from the standby node working mode to the master node working mode.
  • the master promotion command may be received when the connection between the central system and a second node currently in the master node working mode is interrupted; the edge system further includes the second node, and the second node is connected to the central system.
  • when the first node determines that it is in the master node working mode and detects that the connection between itself and the central system is interrupted, it switches to the standby node working mode. In other words, a node in the master node working mode demotes itself when it detects that its connection with the central system is interrupted.
  • the standby node working mode is maintained.
  • when the first node determines that it is in the standby node working mode, it listens for commands from the central system and, when it receives the master promotion command from the central system, switches to the master node working mode. In other words, a node in the standby node working mode performs the master promotion operation after receiving the master promotion command from the central system.
  • listening for commands from the central system means that the node is in a listening state or a receiving state.
  • after receiving the master promotion command from the central system, the node in the standby node working mode performs the master promotion operation. Specifically, on the one hand, the connectivity between the node in the master node working mode and the central system is detected to determine whether that node needs to be demoted (that is, the original master node is demoted to a standby node), which avoids the dual-master problem.
  • on the other hand, the central system may issue a master promotion command to a node in the standby node working mode, so that the original standby node switches to the master node working mode. No new arbitration node therefore needs to be added; integrating the switching capability into the central system both reduces additional resource consumption and avoids the no-master problem.
  • the central system can decide whether to issue a master promotion command according to its connection with the node in the master node working mode, which can reduce unnecessary switching and save switching time.
  • the embodiments of the present application can also be used in MEC technology, so as to solve the dual-master problem and the no-master problem that may occur in MEC technology.
  • the edge system further includes a second node, the second node is connected to the central system, and the first node monitors the connection state between itself and the second node.
  • if the first node detects that its connection with the second node is interrupted and that its connection with the central system is also interrupted, it switches to the standby node working mode if it is in the master node working mode, and maintains the standby node working mode if it is in the standby node working mode; if the first node detects that its connection with the second node is interrupted but does not detect an interruption of its connection with the central system, it performs no processing, regardless of whether it is in the master node working mode or the standby node working mode.
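The local decision rule described above can be written as a small pure function. This is an illustrative sketch under assumptions: the function name `next_mode` and the string constants are hypothetical, and promotion is deliberately absent because the text says it happens only on a command from the central system.

```python
from itertools import product

MASTER, STANDBY = "master", "standby"


def next_mode(mode: str, peer_link_ok: bool, central_link_ok: bool) -> str:
    """Return the working mode a node should hold after one round of checks.

    peer_link_ok is accepted but deliberately unused: per the text, the state
    of the node-to-node link alone never changes the working mode.
    """
    if not central_link_ok:
        # A master demotes itself; a standby node simply stays standby.
        return STANDBY
    # Link to the central system is fine: keep the current mode.
    return mode


# Enumerate every combination to show the full decision table.
table = {
    (mode, peer, central): next_mode(mode, peer, central)
    for mode, peer, central in product((MASTER, STANDBY), (True, False), (True, False))
}
for key, result in table.items():
    print(key, "->", result)
```

Printing the table makes it easy to check that a lost heartbeat with a healthy link to the central system leaves both roles untouched, which is what prevents a standby node from promoting itself.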
  • the first node may periodically detect the connection with the second node, for example starting once the system has started.
  • the first node detects whether the connection with the second node is normal by pinging the IP address of the gateway, or sending a message (such as a heartbeat signal).
  • the first node monitoring the connection state between itself and the central system includes: the first node monitoring the connection state between itself and a gateway, where the gateway is between the first node and the central system.
  • the first node may periodically detect the connection with the gateway, for example starting once the system has started.
  • the gateway is in the edge system.
  • the method further includes: when it is detected that the first node is in an active state, determining that the current working mode of the first node is the master node working mode, or, when it is detected that the first node is in an inactive state, determining that the current working mode of the first node is the standby node working mode; or, when it is detected that the service application on the first node is in a running state, determining that the current working mode of the first node is the master node working mode, or, when it is detected that the service application on the first node is in a stopped state, determining that the current working mode of the first node is the standby node working mode; or, when it is detected that there is no status identifier on the first node, determining that the current working mode of the first node is the master node working mode, or, when a status identifier is detected on the first node, determining that the current working mode of the first node is the standby node working mode.
  • the active state may indicate, for example, whether the node can provide services externally.
  • the master node provides services externally. For example, when it is detected that the first node provides services externally, it is determined that the current working mode of the first node is the master node working mode; when it is detected that the first node cannot provide services externally, it is determined that the current working mode of the first node is the standby node working mode.
  • the status of the node, or the status of the business application on the node, or identification, etc. can be used to determine whether it is a master node or a backup node.
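The three alternative ways of determining the current working mode can be sketched as three small functions. The function names are hypothetical. Note one assumption in particular: the source sentence about the status identifier is cut off, and the sketch assumes that a present identifier means the standby node working mode, mirroring the master/standby alternation of the other two rules.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


def mode_from_active_state(node_active: bool) -> Mode:
    # Active (able to provide services externally) -> master; inactive -> standby.
    return Mode.MASTER if node_active else Mode.STANDBY


def mode_from_service_state(service_running: bool) -> Mode:
    # Service application running -> master; stopped -> standby.
    return Mode.MASTER if service_running else Mode.STANDBY


def mode_from_status_identifier(identifier_present: bool) -> Mode:
    # No status identifier -> master.
    # ASSUMPTION: identifier present -> standby (the source text is truncated here).
    return Mode.STANDBY if identifier_present else Mode.MASTER


print(mode_from_active_state(True).value)
print(mode_from_service_state(False).value)
print(mode_from_status_identifier(False).value)
```

Only the third rule is inverted: there the absence of a marker, not its presence, indicates the master role.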
  • a method for node control is provided. The method includes: a central system monitors the connection state with a first node, where the current working mode of the first node is the master node working mode; when the connection between the central system and the first node is interrupted, the central system sends a master promotion command to a second node, where the current working mode of the second node is the standby node working mode, and the master promotion command is used to notify a switch from the standby node working mode to the master node working mode.
  • the arbitration capability can be integrated into, and provided by, the central system.
  • the central system can decide whether the standby node should switch to the master node working mode according to the failure condition of the service channel, instead of according to connectivity detection between a newly added arbitration node and the edge device.
  • by integrating the switching capability into the central system, additional resource consumption can be reduced.
  • determining whether to switch between the master and standby nodes according to the failure of the service channel can reduce unnecessary switching and save switching time.
  • the embodiments of the present application can also be used in MEC technology, so as to solve the "dual-master" problem and the "no-master" problem that may occur in MEC technology.
  • the central system monitoring the connection state with the first node includes: the central system uses a secure shell (SSH) command to log in to the node remotely, and detects whether the connection between the central system and the first node is normal.
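The central system's side of the third aspect can be sketched as below. The text describes an SSH remote login as the connectivity check; the sketch swaps in a plain TCP connection attempt to the SSH port as a simpler stand-in, which is an assumption of this example and not the method of the application. The function names, the default port 22, the timeout, and the `send_promotion` callback are likewise hypothetical.

```python
import socket
from typing import Callable, Tuple


def link_is_up(host: str, port: int = 22, timeout: float = 2.0) -> bool:
    """Stand-in connectivity probe: can a TCP connection be opened at all?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_master(
    master_addr: Tuple[str, int],
    send_promotion: Callable[[], None],
    probe: Callable[[str, int], bool] = link_is_up,
) -> bool:
    """Probe the current master once; promote the standby if it is unreachable.

    Returns True when a master promotion command was sent.
    """
    host, port = master_addr
    if probe(host, port):
        return False  # master still reachable: no switching needed
    send_promotion()  # e.g. deliver the command to the standby node
    return True
```

Passing the probe as a parameter keeps the decision logic separate from the transport, so an actual SSH-based check could be substituted without changing `check_master`.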
  • a method for node control is provided, which is applied to a first node in an edge system, and the first node is connected to a central system.
  • the method includes: the first node detects the connection between itself and the central system; in the case where it determines that it is in the master node working mode, it switches to the standby node working mode when it detects that the connection between itself and the central system is interrupted; in the case where it determines that it is in the standby node working mode, it maintains the standby node working mode when it detects that the connection between itself and the central system is interrupted.
  • the connectivity between the node and the central system is detected to determine whether the node needs to be demoted (that is, the original master node is demoted to a standby node), so as to avoid the dual-master problem.
  • the node performs the connectivity detection with the central system.
  • the master node can actively demote itself to a standby node. This not only avoids dual masters, but also avoids the situation in which a network fault is discovered only after a switchover, and so reduces time spent on ineffective switchovers.
  • the embodiments of the present application can also be used in MEC technology, so as to solve the "dual-master" problem that may occur in MEC technology.
  • the edge system further includes a second node, and the method further includes: detecting the connection between the first node and the second node; when the connection between the first node and the second node is interrupted, no processing is performed, regardless of whether the first node is in the master node working mode or the standby node working mode.
  • the detection of the connection between the first node and the central system includes: detection of the connection between the first node and a gateway, where the gateway is between the first node and the central system.
  • the gateway is located in the edge system.
  • a method for node control is provided, which is applied to a first node in an edge system. The method includes: receiving a master promotion command from a central system, where the master promotion command is used to notify a standby node to switch to the master node working mode; and, when it is determined that the first node is in the standby node working mode, switching to the master node working mode.
  • when it is determined that the first node is in the master node working mode, the first node switches to the standby node working mode.
  • when the original master and standby nodes need to be switched, the central system, which can be connected to both nodes, issues a master promotion command to both of them; the node that was originally in the master node working mode switches to the standby node working mode, and the node that was originally in the standby node working mode switches to the master node working mode.
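The role swap described in this aspect, where the same command is issued to both nodes and each reacts according to its own current mode, can be sketched as follows. The function name `on_promotion_command` and the node names are hypothetical.

```python
MASTER, STANDBY = "master", "standby"


def on_promotion_command(current_mode: str) -> str:
    """Return the mode a node takes after receiving the command."""
    if current_mode == STANDBY:
        return MASTER   # the original standby node is promoted
    if current_mode == MASTER:
        return STANDBY  # the original master node is demoted
    raise ValueError(f"unknown working mode: {current_mode!r}")


# The central system issues the command to both nodes.
modes = {"node-1": MASTER, "node-2": STANDBY}
modes = {name: on_promotion_command(mode) for name, mode in modes.items()}
print(modes)
```

Since each node flips its own role, issuing the command to both nodes always leaves exactly one master, provided the pair started with exactly one.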
  • the master promotion operation is performed.
  • service linkage detection can be performed, and the arbitration capability can be integrated into, and provided by, the central system.
  • the central system can decide whether the original standby node should switch to the master node working mode according to the failure condition of the service channel, instead of according to connectivity detection between an arbitration node and the edge device.
  • additional waste of resources can be reduced.
  • determining whether to switch between the active and standby nodes according to the failure of the service channel can reduce unnecessary switching and save switching time.
  • the embodiments of the present application can also be used in MEC technology, so as to solve the "dual-master" problem and the "no-master" problem that may occur in MEC technology.
  • in a sixth aspect, a node control system is provided. The system includes a first node, a second node, and a central system. The first node and the second node are located in an edge system and are both connected to the central system; the current working mode of the first node is the master node working mode, and the current working mode of the second node is the standby node working mode. The first node is used to monitor the connection state with the central system and to switch to the standby node working mode when the connection between the first node and the central system is interrupted; the central system is used to monitor the connection state with the first node and, when the connection between the central system and the first node is interrupted, to send a master promotion command to the second node, where the master promotion command is used to notify a standby node to switch to the master node working mode; the second node is used to switch to the master node working mode according to the master promotion command.
  • the first node is also used to monitor the connection state with the second node and, when it determines that the connection between the first node and the second node is interrupted or connected, to maintain the current working mode of the first node and/or the second node.
  • the first node is specifically configured to monitor the connection state with a gateway, and the gateway is between the first node and the central system.
  • the gateway is located in the edge system.
  • a device for node control is provided, which includes modules or units for executing the method in any one of the foregoing first to fifth aspects.
  • a device for node control is provided, including a processor.
  • the processor is coupled with a memory and can be used to execute instructions in the memory to implement the method in any one of the foregoing first to fifth aspects or in any possible implementation of the first to fifth aspects.
  • the device further includes a memory.
  • the device further includes a communication interface, the processor is coupled with the communication interface, and the communication interface is used to input and/or output information.
  • the information includes at least one of instructions and data.
  • the device is a node, or the device is a central system.
  • the communication interface may be a transceiver, or an input/output interface.
  • the device is a chip or a chip system.
  • the communication interface may be an input/output interface, interface circuit, output circuit, input circuit, pin, or related circuit on the chip or chip system.
  • the processor may also be embodied as a processing circuit or a logic circuit.
  • the device is a chip or a chip system configured in a node, or the device is a chip or a chip system configured in a central system.
  • the transceiver may be a transceiver circuit.
  • the input/output interface may be an input/output circuit.
  • a processor including: an input circuit, an output circuit, and a processing circuit.
  • the processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor executes the method in any one of the possible implementation manners of the first aspect to the fifth aspect.
  • the above-mentioned processor may be a chip
  • the input circuit may be an input pin
  • the output circuit may be an output pin
  • the processing circuit may be a transistor, a gate circuit, a flip-flop, and various logic circuits.
  • the input signal received by the input circuit can be received and input by, for example but not limited to, an input interface; the signal output by the output circuit can be output to, for example but not limited to, an output interface and transmitted by it; and the input circuit and the output circuit can be the same circuit, used as the input circuit and the output circuit at different times.
  • the embodiments of the present application do not limit the specific implementation manners of the processor and various circuits.
  • a processing device including a processor and a memory.
  • the processor is configured to read instructions stored in the memory, receive signals through the input interface, and transmit signals through the output interface, so as to execute the method in any one of the possible implementation manners of the first aspect to the fifth aspect.
  • the output interface and the input interface can be collectively referred to as a communication interface.
  • there are one or more processors, and one or more memories.
  • the memory may be integrated with the processor, or the memory and the processor may be provided separately.
  • the processing device in the above tenth aspect may be a chip, and the processor may be implemented by hardware or software.
  • when implemented by hardware, the processor may be a logic circuit, an integrated circuit, etc.; when implemented by software, the processor may be a general-purpose processor that is implemented by reading software code stored in the memory.
  • the memory may be integrated in the processor, or may be located outside the processor and exist independently.
  • an eleventh aspect provides a computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a communication device, the communication device implements the method in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects.
  • a computer program product containing instructions which when executed by a computer, causes an apparatus to implement the methods provided in the first to fifth aspects.
  • a node control system which includes the aforementioned first node and second node; or, includes the aforementioned first node, second node, and a central system.
  • FIG. 1 is a schematic diagram of a network architecture applicable to an embodiment of the present application
  • FIG. 2 is a schematic diagram of an application scenario applicable to an embodiment of the present application
  • FIG. 3 is a schematic block diagram of a method for node control according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a method for node control according to still another embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a method for node control according to another embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a method for node control according to another embodiment of the present application.
  • FIG. 7 shows a schematic diagram of a method for node control applicable to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a node control device provided by an embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a device for node control according to an embodiment of the present application.
  • the technical solutions of the embodiments of the present application can be applied to various communication systems, such as a fifth generation (5G) system or new radio (NR), a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, an LTE time division duplex (TDD) system, a universal mobile telecommunication system (UMTS), etc.
  • Fig. 1 is a schematic diagram of a network architecture applicable to an embodiment of the present application.
  • the network architecture can belong to a two-tier architecture.
  • the upper layer is the central system located in the central area
  • the lower layer is the edge system located in each city.
  • the central system can act as the central controller, and the creation and deletion of services can be initiated by the central system.
  • the central system can issue requests for service creation and deletion to each edge system, and the edge system can create or delete the service.
  • the edge system also has certain business processing capabilities, such as query business, fault handling, and so on.
  • the edge system also includes edge nodes.
  • an edge node is a service platform built at the network edge close to the user. It provides storage, computing, network and other resources, and moves some key service applications down to the edge of the access network to reduce the bandwidth and latency losses caused by network transmission and multi-level forwarding.
  • an edge node may include one or more nodes.
  • a node is a virtual machine, including the system, application, database, etc. running in the virtual machine.
  • the edge nodes may include node 1 and node 2, for example.
  • Node 1 and node 2 can be two nodes in a master and backup relationship.
  • Node 1 and node 2 are usually deployed on different physical hosts and different storage. The failure of any one node will not affect the other node, and the other node can still continue to run.
  • each physical host has only two physical network cards, one for hardware management (ie, network card 0, not shown in the figure), and one for virtual machines running on the physical host (ie, network card 1).
  • See network card 1-1 and network card 1-2 in FIG. 1, where network card 1-1 represents network card 1 on physical host 1 where node 1 is located, and network card 1-2 represents network card 1 on physical host 2 where node 2 is located.
  • Through network card 1, a heartbeat channel can be configured between node 1 and node 2, shown as heartbeat 1 in FIG. 1.
  • Node 1 and node 2 can detect the heartbeat channel between nodes, such as heartbeat 1, to determine whether the connection between nodes is interrupted.
  • the central system can issue instructions to the edge node through the management channel.
  • the central system can issue instructions to edge nodes through gateways, switches, etc.
  • Fig. 2 is a schematic diagram of an application scenario suitable for an embodiment of the present application.
  • The embodiments of this application can be applied to mobile edge computing (MEC) scenarios.
  • the architecture shown in Figure 1 can be applied to MEC scenarios.
  • The central system is the central virtual network function manager (VNFM) device, and the edge system is the edge VNFM device.
  • The central VNFM device and the edge VNFM device can collaborate to create a virtual network function (VNF) network element.
  • The VNF can obtain virtual resource information from the edge VNFM device, which may include, but is not limited to, virtual machine information and network information. If the VNF is faulty, the VNF can cooperate with the edge VNFM device to repair the VNF.
  • The VNF may include a user plane function (UPF) network element.
  • Each network element in FIG. 2 may be an independent device, or several of them may be integrated in the same device to implement different functions, which is not limited in this application.
  • FIG. 1 and FIG. 2 are only exemplary illustrations, and the network architecture applicable to the embodiments of the present application is not limited thereto, and any network architecture capable of realizing the functions of the foregoing various network elements is applicable to the embodiments of the present application.
  • the foregoing edge system may include a larger number of backup nodes.
  • FIG. 1 and FIG. 2 take the gateway included in the edge system as an example for exemplary description, which is not limited.
  • the gateway can also be included in the central system.
  • With MEC, as defined by the European Telecommunications Standards Institute (ETSI), a carrier-class service environment with high performance, low latency, and high bandwidth can be created to accelerate the distribution and download of content, services, and applications in the network, so that consumers enjoy a higher-quality network experience.
  • The "dual master" problem refers to a phenomenon in which two machines or two nodes in a master and backup relationship both turn themselves into master nodes when triggered by certain conditions. When this occurs, both nodes process the same services as master nodes, so the service processing results on the two nodes may become inconsistent. A two-node system in active/standby mode needs to avoid this situation as far as possible. Correspondingly, the "no master" problem means that two machines or two nodes in a master and backup relationship both turn themselves into backup nodes when triggered by certain conditions.
  • Take node 1 and node 2 in FIG. 1 or FIG. 2 as an example. Node 1 and node 2 in active/standby mode are usually deployed on different physical hosts and different storage. The failure of either node does not affect the other node, which can continue to run.
  • One way is to avoid the "dual master" problem by detecting the heartbeat channel between nodes.
  • multiple heartbeat channels are configured between node 1 and node 2.
  • As long as at least one heartbeat channel is connected, the master node remains the master node and the backup node remains the backup node; when no heartbeat channel is connected, the backup node becomes the master node.
  • With this approach, a hardware failure can cause a "dual master" problem: in a virtualization scenario, the virtual network cards used by the multiple heartbeat channels of the same node actually correspond to the same physical network card in the physical host (such as network card 1 in FIG. 2). Assuming that node 1 is the master node and node 2 is the backup node, a physical host network card failure, a physical host failure, or a network equipment failure causes node 2 to fail to detect the heartbeat of node 1, and node 2 promotes itself to master node, resulting in a "dual master" phenomenon.
  • A software failure can also cause a "dual master" problem: when the master node or the backup node cannot detect the heartbeat of the peer because of an internal software failure (a failure caused by factors other than the network card or the network), a "dual master" situation occurs. Configuring additional heartbeat channels cannot solve the dual-master problem in this scenario.
  • Another way is to avoid the "dual master" problem by adding arbitration nodes.
  • this application provides a way that not only can solve the "dual master” problem at a lower cost, but can also avoid the "no master” problem.
  • FIG. 3 is a schematic block diagram of a method 300 for node control according to an embodiment of the present application.
  • the method 300 can be applied to a node control system.
  • the node control system includes a first node, a second node, and a central system.
  • the first node and the second node are located in the edge system, and both the first node and the second node are connected to the central system.
  • the central system refers to the equipment that can control the edge system where the first node and the second node are located.
  • the central system can act as the central controller to initiate business creation and deletion.
  • the central system can issue requests for service creation and deletion to each edge system, and the edge system can create or delete the service.
  • the central system may be, for example, the central system as described in FIG. 1, or the central system may also be the central VNFM device as described in FIG. 2.
  • the specific form of the central system is not limited in the embodiment of this application.
  • the method 300 may include the following steps.
  • 310: The first node monitors the connection state with the central system, and switches to the standby node working mode when the connection between the first node and the central system is interrupted.
  • 320: The central system monitors the connection state with the first node. When the connection between the central system and the first node is interrupted, it sends a master upgrade command to the second node. The master upgrade command is used to notify the standby node to switch to the master node working mode.
  • 330: The second node switches to the master node working mode according to the master upgrade command.
  • In the embodiments of the present application, a node in the master node working mode performs the downgrade operation when it detects that the connection between itself and the central system is interrupted; a node in the standby node working mode performs the master upgrade operation after receiving the master upgrade command from the central system. Specifically, on the one hand, the connectivity between the node in the master node working mode and the central system is used to judge whether that node needs to be downgraded (that is, the original master node is reduced to a standby node), thereby avoiding the dual-master problem.
  • On the other hand, the central system may issue a master upgrade command to a node in the standby node working mode, so that the original standby node switches to the master node working mode. Therefore, no new arbitration node needs to be added; integrating the switching capability into the central system both reduces the waste of additional resources and avoids the no-master problem.
  • the central system can decide whether to issue a master upgrade command according to the connection with the node in the master node working mode, which can reduce unnecessary switching and save switching time.
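  • The three steps of method 300 can be sketched as follows. This is a minimal in-memory illustration only, not the implementation of this application: the class and method names (`Node`, `CentralSystem`, `on_central_link_check`, and so on) are hypothetical, and link states are passed in as booleans rather than measured.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


@dataclass
class Node:
    name: str
    mode: Mode

    def on_central_link_check(self, link_up: bool) -> None:
        # Step 310: a master that loses the central system downgrades itself.
        if self.mode is Mode.MASTER and not link_up:
            self.mode = Mode.STANDBY

    def on_master_upgrade_command(self) -> None:
        # Step 330: a standby node switches to master only when commanded.
        if self.mode is Mode.STANDBY:
            self.mode = Mode.MASTER


class CentralSystem:
    def on_node_link_check(self, link_up: bool, standby: Node) -> bool:
        # Step 320: if the link to the monitored (master) node is interrupted,
        # send the master upgrade command to the standby node.
        if link_up:
            return False
        standby.on_master_upgrade_command()
        return True


first = Node("first", Mode.MASTER)
second = Node("second", Mode.STANDBY)
central = CentralSystem()

# Scenario: the connection between the first node and the central system breaks.
first.on_central_link_check(link_up=False)
command_sent = central.on_node_link_check(link_up=False, standby=second)
print(first.mode.value, second.mode.value, command_sent)  # standby master True
```

  • In this sketch the two sides act on the same observed interruption independently, so the roles swap without any arbitration node.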
  • The first node and the second node are nodes in an edge device or an edge system, and the edge device or edge system may include multiple nodes.
  • For example, an edge device or an edge system may include two nodes, denoted as the first node and the second node.
  • Alternatively, the edge device or the edge system may include more than two nodes, for example, one node in the master node working mode and the other nodes in the standby node working mode, which is not limited.
  • The method 300 is mainly described by taking as an example the case where the first node is in the master node working mode and the second node is in the standby node working mode.
  • the specific form of the node is not limited in the embodiment of the present application.
  • the node may be a virtual machine, which includes the system, application, database, etc. running in the virtual machine.
  • the application may be composed of management applications and business applications, for example.
  • the backup node is relative to the master node.
  • The standby node may also be called a backup node or a non-master node. The naming does not limit the protection scope of the embodiments of this application, and any name used in future protocols to denote the same function falls within the protection scope of the embodiments of this application.
  • the embodiment of the present application does not limit the state of the node.
  • For example, the first node is currently a master node but may be a standby node in another period of time; the first node is not limited to being the master node only.
  • Herein, "master node", "a certain node is a master node", and "a certain node is in the master node working mode" are all used to express the same meaning, which those skilled in the art should understand.
  • Likewise, "standby node", "a certain node is a standby node", and "a certain node is in the standby node working mode" are all used to express the same meaning, which those skilled in the art should understand.
  • For consistency, the terms master node and standby node are used in the description below.
  • Whether a node is a master node or a standby node can be determined according to the state of the node.
  • The master node refers to a node in an active state, and the standby node refers to a node in an inactive state. It can be understood that if a node is in the active state, the node is determined to be the master node; if a node is in the inactive state, the node is determined to be the standby node.
  • The active state may indicate whether the node can provide services externally.
  • the master node provides external services.
  • For example, when the edge system is installed, one node can be set as the master node by default and the other node as the standby node, and the master node provides external services.
  • the system, management application, business application, database, etc. in the primary node can all be in the running state; the system, management application, database, etc. of the standby node can be in the running state, while the business application is in the stopped state. It can be understood that if the business application on the node is in the running state, the node is determined to be the master node; if the business application on the node is in the non-running state or stopped state, the node is determined to be the standby node.
  • the embodiment of the application does not limit the business application on the standby node to be in a stopped state.
  • For example, the system, management application, business application, database, etc. on both the master node and the standby node can all be in the running state.
  • whether the node is the master node or the backup node can be determined according to the identifier.
  • A recorded flag bit can be used to identify the master and standby nodes. For example, during installation of the edge system, an identification file (that is, a file containing the standby status flag bit, such as standby_flag) can be created under a preset storage path on a node to identify that node as a standby node; a node without the identification file is the master node.
  • It can be understood that if a file containing the flag bit exists on the node, the node is determined to be a standby node; if there is no file containing the flag bit on the node, the node is determined to be the master node. It should be understood that there are many ways to identify the master node or the standby node through an identifier, which are not enumerated here, and any such way falls within the protection scope of the embodiments of the present application.
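  • The flag-file identification described above might look like the following sketch. The file name `standby_flag` comes from the text; the storage directory, the function names, and the use of an empty file are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path

FLAG_NAME = "standby_flag"  # name from the text; its location is an assumption


def role_of(flag_dir: Path) -> str:
    # A node with the flag file is a standby node; without it, the master node.
    return "standby" if (flag_dir / FLAG_NAME).exists() else "master"


def mark_standby(flag_dir: Path) -> None:
    # Record the downgrade by creating the flag file.
    (flag_dir / FLAG_NAME).touch()


def mark_master(flag_dir: Path) -> None:
    # Record the master upgrade by removing the flag file if present.
    (flag_dir / FLAG_NAME).unlink(missing_ok=True)


def demo() -> list:
    roles = []
    with tempfile.TemporaryDirectory() as tmp:
        flag_dir = Path(tmp)  # stands in for the preset storage path
        roles.append(role_of(flag_dir))
        mark_standby(flag_dir)
        roles.append(role_of(flag_dir))
        mark_master(flag_dir)
        roles.append(role_of(flag_dir))
    return roles


print(demo())  # ['master', 'standby', 'master']
```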
  • In the embodiments of the present application, "switching to the standby node working mode", "downgrading from the master node to the standby node", and "downgrading" are mentioned many times, and those skilled in the art should understand their meaning. They all indicate that a node switches from the original master node mode to the standby node mode; in other words, the node starts to operate as a standby node. It can also be understood that, in the edge system, the node starts to exist as a standby node; for example, the node temporarily no longer processes services (for example, the node stops all applications on the node). This is not repeated below.
  • Similarly, "switching to the master node working mode", "upgrading from a standby node to a master node", and "master upgrade" are mentioned many times, and those skilled in the art should understand their meaning. They all indicate that a node switches from the original standby node mode to the master node mode; in other words, the node starts to process services as the master node. It can also be understood that, in the edge system, the node starts to exist as the master node; for example, the node starts to process services. This is not repeated below.
  • The management application of each node controls the communication channels for external access to the edge system, can record the active/standby mode, and updates the recorded mode when performing operations such as master upgrade or downgrade.
  • the first node may detect the connection with the gateway to determine the connection status with the central system.
  • the first node when the first node detects that the connection with the gateway is interrupted, it determines that the connection status with the central system is interrupted; when the first node detects that the connection with the gateway is connected, it determines that the connection status with the central system is connected.
  • the following is mainly an example of detecting the connection with the gateway. It should be understood that any method that can determine the connectivity between the node and the central system is applicable to the embodiments of the present application.
  • the gateway may be located in the edge system, such as shown in FIG. 1 or FIG. 2; or, the gateway may also be located in the central system, which is not limited.
  • the gateway can be a network connector or a network device in a switch.
  • the gateway may also be an internet connector in a routing device or a network device.
  • the connection between the gateway and the node means that the node connects to the gateway so that the node can run.
  • the node is connected to the gateway so that applications such as programs or databases can be run.
  • the timing of the detection between the node and the central system is not limited in the embodiment of the present application.
  • The first node may periodically detect the connection with the central system, for example, after the system is started.
  • For example, the first node periodically detects the connection with the central system according to a first preset time.
  • The first preset time may be a configured duration; or a pre-defined duration, such as a duration pre-defined by a protocol or by the central system; or a duration determined according to historical detection results. This is not limited.
  • For example, the first preset time may be one minute; that is, the first node checks the connection with the central system every minute.
  • The first node may also detect the connection with the central system aperiodically, which is not limited in the embodiment of the present application.
  • the detection method between the node and the central system is not limited in the embodiment of the present application.
  • The first node may determine whether its connection with the central system is normal by sending a message. For example, the first node sends a message (such as a heartbeat message or a heartbeat signal) to the central system. If the first node receives a reply message (such as an acknowledgement (ACK) message) from the central system, it can determine that the connection with the central system is normal; if the first node does not receive a reply message from the central system, it can determine that the connection with the central system is abnormal.
  • the first node can detect whether the connection with the central system (such as the gateway) is normal by pinging the IP address of the gateway.
  • If the ping command returns 0, the connection between the first node and the central system (such as the gateway) is normal; if the ping command returns any other value, the connection between the first node and the central system (such as the gateway) is faulty (or interrupted).
  • The relevant parameters of the ping command may include, but are not limited to: the number of consecutive executions, the interval between executions, and the execution timeout period.
  • The number of consecutive executions indicates how many times in a row the detection (such as pinging the gateway IP address) is performed. It can be configured or pre-defined, which is not limited. For example, it can be set to A, where A is an integer greater than or equal to 1; for example, A is 5, that is, 5 consecutive executions.
  • The interval between executions indicates the time between two adjacent executions of ping (such as two adjacent detections by pinging the gateway IP address). It can be configured or pre-defined, which is not limited. For example, the default interval is T1, where T1 is a number greater than 0; for example, T1 is 1 second.
  • The execution timeout period indicates the timeout for performing ping. It can be configured or pre-defined, which is not limited. For example, the default execution timeout period is T2, where T2 is a number greater than 0; for example, T2 is 4 seconds.
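  • As a hedged illustration of the parameters above, the sketch below builds a ping invocation with A = 5, T1 = 1 second, and T2 = 4 seconds and treats exit status 0 as a normal connection. It assumes the Linux iputils `ping` options `-c` (count), `-i` (interval), and `-W` (wait time per reply); mapping T2 to `-W` is an assumption, since the text does not say whether the timeout applies per reply or to the whole run. The function names and the address `192.0.2.1` (a documentation address) are hypothetical.

```python
import subprocess


def build_ping_command(gateway_ip, count=5, interval_s=1, timeout_s=4):
    # count -> A, interval_s -> T1, timeout_s -> T2 (assumed to map to -W).
    return ["ping", "-c", str(count), "-i", str(interval_s),
            "-W", str(timeout_s), gateway_ip]


def gateway_reachable(gateway_ip, count=5, interval_s=1, timeout_s=4):
    cmd = build_ping_command(gateway_ip, count, interval_s, timeout_s)
    # Safety cap so the probe itself can never hang indefinitely.
    hard_limit_s = count * (interval_s + timeout_s) + 1
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, timeout=hard_limit_s)
    except (OSError, subprocess.TimeoutExpired):
        return False  # ping missing or stuck: treat as interrupted
    return result.returncode == 0  # 0 means connected; anything else is a fault


print(" ".join(build_ping_command("192.0.2.1")))
# ping -c 5 -i 1 -W 4 192.0.2.1
```

  • A node could call `gateway_reachable` once per first preset time (for example, every minute) and downgrade itself when it returns `False` while in the master node working mode.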
  • the central system can also detect the connection status with the first node.
  • the embodiment of the present application does not limit it.
  • The central system can detect whether the connection with the edge node (such as the first node) is normal by remotely logging in to the node through a secure shell (SSH) command. If the SSH command returns 0, the connection between the central system and the edge node (such as the first node) is normal; if the SSH command returns any other value, the connection between the central system and the edge node (such as the first node) is faulty.
  • The embodiment of the present application does not limit the manner in which the central system detects whether the connection with the edge node (such as the first node) is normal, and any method that enables the central system to make this detection falls within the protection scope of the embodiments of the present application.
  • the central system can use the ping method described above to detect whether the connection with the edge node is normal.
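  • The SSH-based check on the central system side could be sketched as below. It assumes the OpenSSH client with its `BatchMode` and `ConnectTimeout` options and runs the remote command `true`, so that exit status 0 means the login succeeded. The user name, host address, timeout value, and function names are hypothetical, and key-based authentication is assumed to be already configured.

```python
import subprocess


def build_ssh_probe(host, user="admin", connect_timeout_s=5):
    # BatchMode=yes disables password prompts so the probe cannot block on input.
    return ["ssh", "-o", "BatchMode=yes",
            "-o", f"ConnectTimeout={connect_timeout_s}",
            f"{user}@{host}", "true"]


def edge_node_reachable(host, user="admin", connect_timeout_s=5):
    cmd = build_ssh_probe(host, user, connect_timeout_s)
    try:
        result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=connect_timeout_s + 5)
    except (OSError, subprocess.TimeoutExpired):
        return False  # client missing or stuck: treat as a connection fault
    return result.returncode == 0  # 0 means normal; any other value is a fault


print(" ".join(build_ssh_probe("192.0.2.10")))
# ssh -o BatchMode=yes -o ConnectTimeout=5 admin@192.0.2.10 true
```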
  • the timing of the detection between the central system and the edge node is not limited in the embodiment of the present application.
  • the central system can detect whether the connection with the edge node (such as the first node) is normal.
  • When the edge device enters the "no-master state", all the programs on the nodes are stopped. This forcibly interrupts the channel already established between the central system and the edge device, such as a channel built on gRPC, Google's remote procedure call (RPC) framework, and the central system receives an "unavailable" error code.
  • The "no-master state", mentioned many times herein, means that the nodes in the edge system are all standby nodes, or are all in the standby node working mode, and there is no master node. For example, the programs on all nodes in the edge system are stopped.
  • The gRPC channel means that the communication between the central system and the edge device uses the gRPC framework over hypertext transfer protocol (HTTP) version 2 (HTTP/2), and the communication channel is kept open. It should be understood that the gRPC channel is only an example, and the embodiment of the present application does not limit the communication between the central system and the edge device to the gRPC channel.
  • When the central system detects that the service connection channel with the edge device (or edge system) where the first node is located is interrupted, the central system can issue a master upgrade command to the edge device. Correspondingly, the second node receives the master upgrade command.
  • the central system may send a master upgrade command to any one of the nodes.
  • The master upgrade command is used to notify a node to switch from the standby node working mode to the master node working mode; in other words, it instructs the original standby node to switch to the master node working mode.
  • The term master upgrade command is only a name used to distinguish different functions, and the name does not limit the protection scope of the embodiments of the present application.
  • For example, the master upgrade command may also be referred to as a switching command, a forced master upgrade command, or a forced command; or, for the master node, the master upgrade command may be replaced with a downgrade command.
  • Any name used to denote the same function falls within the protection scope of the embodiments of the present application. For consistency, the term master upgrade command is used below.
  • the central system issues a master upgrade command to the second node, or in other words, the central system issues a command to the second node to switch itself from the standby node to the master node.
  • the second node switches from the standby node to the master node according to the master upgrade command.
  • the edge device can actively establish a gRPC connection with the central system. After the gRPC channel is successfully established, the upgrade process ends.
  • detection may also be performed between nodes.
  • the first node monitors the connection state with the second node; when it is determined that the connection between the first node and the second node is interrupted or connected, the current working mode of the first node is maintained.
  • the second node monitors the connection state with the first node; when it is determined that the connection between the second node and the first node is interrupted or connected, the current working mode of the second node is maintained.
  • The detection between nodes is combined with the detection between the node and the central system (such as the gateway) to jointly determine whether the master node should be downgraded to standby, thereby avoiding the dual-master problem.
  • The node determines whether to downgrade according to its connection with the central system. Therefore, even if a hardware failure occurs (such as a physical host network card failure, a physical host failure, or a network device failure), or the master node or the standby node cannot detect the peer's heartbeat because of an internal software failure (a failure caused by factors other than the network card or the network), no misjudgment occurs that would cause the dual-master problem.
  • By contrast, if the standby node upgraded itself after detecting a faulty connection between the nodes, a dual-master problem might occur: when a hardware failure (such as a physical host network card failure, a physical host failure, or a network equipment failure) or a software failure (such as an internal software failure caused by factors other than the network card or the network) occurs, there may be two master nodes (that is, the dual-master problem), namely the node that was already the master node and the node that has been upgraded from standby node to master node. Therefore, in the embodiment of the present application, after a node detects that the connection between the nodes is faulty, it takes no action regardless of whether it is the master node or the standby node, which avoids the dual-master problem.
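  • To make the contrast concrete, the sketch below compares the heartbeat-only approach with the rule described here in one scenario: the heartbeat channel between the two nodes fails while both nodes can still reach the central system. The function names and the string-valued modes are hypothetical, and the policies are reduced to pure functions for illustration.

```python
def heartbeat_only_policy(mode, peer_heartbeat_ok):
    # Earlier approach: a standby node upgrades itself when the heartbeat is lost.
    if mode == "standby" and not peer_heartbeat_ok:
        return "master"
    return mode


def central_link_policy(mode, central_link_ok, peer_heartbeat_ok, upgrade_command):
    # peer_heartbeat_ok is accepted but deliberately ignored: a change in the
    # inter-node connection never changes the working mode.
    if mode == "master" and not central_link_ok:
        return "standby"  # downgrade only when the central system is unreachable
    if mode == "standby" and upgrade_command:
        return "master"   # upgrade only on command from the central system
    return mode


# Scenario: heartbeat channel broken, both nodes still connected to the central
# system, so the central system sends no master upgrade command.
old = [heartbeat_only_policy("master", False),
       heartbeat_only_policy("standby", False)]
new = [central_link_policy("master", True, False, False),
       central_link_policy("standby", True, False, False)]
print(old)  # ['master', 'master']
print(new)  # ['master', 'standby']
```

  • Under the heartbeat-only rule both nodes end up as master nodes, whereas under the rule of this embodiment exactly one master node remains.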
  • the timing of detection between nodes is not limited in the embodiment of the present application.
  • the following mainly takes the detection of the first node as an example for exemplification.
  • The first node may periodically detect the connection with the second node.
  • For example, the first node periodically detects the connection with the second node according to a second preset time.
  • The second preset time may be, for example, a configured duration; or a pre-defined duration, such as a duration pre-defined by a protocol or by the central system; or a duration determined according to historical detection results. This is not limited. The first preset time and the second preset time may be the same or different, and may or may not be related to each other, which is not limited.
  • For example, the second preset time may be one minute; that is, the first node detects the connection with the second node every minute.
  • The first node may also detect the connection with the second node aperiodically, which is not limited in the embodiment of the present application.
  • the method of detecting between nodes is not limited in the embodiment of the present application.
  • A possible implementation is to detect the connection between nodes by sending a message (such as a heartbeat signal). For example, as long as the first node can detect the heartbeat of the second node, the connection between the first node and the second node can be considered normal; when no heartbeat channel is connected, the connection between the first node and the second node can be considered abnormal.
  • the first node can detect whether the connection with the second node is normal by pinging the gateway IP address.
  • If the ping command returns 0, the connection between the first node and the second node is normal; if the ping command returns any other value, the connection between the first node and the second node is faulty (or interrupted).
  • The relevant parameters of the ping command may include, but are not limited to: the number of consecutive executions, the interval between executions, and the execution timeout period.
  • The number of consecutive executions indicates how many times in a row the detection (such as pinging the gateway IP address) is performed. It can be configured or pre-defined, which is not limited. For example, it can be set to B, where B is an integer greater than or equal to 1; for example, B is 5, that is, 5 consecutive executions. B and A may be the same or different, and may or may not be related, which is not limited.
  • The interval between executions indicates the time between two adjacent executions of ping (such as two adjacent detections by pinging the gateway IP address). It can be configured or pre-defined, which is not limited. For example, the default interval is t1, where t1 is a number greater than 0; for example, t1 is 1 second. t1 and T1 may be the same or different, and may or may not be related, which is not limited.
  • The execution timeout period indicates the timeout for performing ping. It can be configured or pre-defined, which is not limited. For example, the default execution timeout period is t2, where t2 is a number greater than 0; for example, t2 is 4 seconds. t2 and T2 may be the same or different, and may or may not be related, which is not limited.
  • the nodes can also detect the heartbeat channel between the nodes to determine whether the connection between the nodes is normal.
  • the heartbeat channel between the nodes can be detected by sending a heartbeat signal.
  • step 320 and step 330 are before step 310, that is, the second node may first receive the master upgrade command from the central system, and perform corresponding processing according to the master upgrade command.
  • step 310 is before step 320 and step 330, that is, the first node may also perform corresponding processing according to the connection with the central system.
  • FIG. 4 is a schematic block diagram of a method 400 for node control according to an embodiment of the present application.
  • the method 400 may include the following steps.
  • Node #1 detects the connection with the central system.
  • the node #1 may be the master node, in other words, the node #1 is in the active node working mode; or, the node #1 may be the standby node, or in other words, the node #1 is in the standby node working mode.
  • the node #1 when the node #1 is a standby node, the node #1 may not need to detect the connection with the central system, which is not limited.
  • node #1 detects the connection with the gateway. Specifically, reference may be made to the description of the first node detecting the connection with the gateway in the method 300 above, which will not be repeated here.
  • When node #1 determines that it is in the master node working mode, it switches to the standby node working mode upon detecting that the connection between itself and the central system is interrupted.
  • In one case, node #1 is the master node. If node #1 detects a connection failure with the central system, or in other words, node #1 detects that the connection with the central system is interrupted, then node #1 switches from the master node working mode to the standby node working mode, or in other words, node #1 is switched to the standby node. For example, node #1 can stop applications such as programs and databases on node #1.
  • In another case, node #1 is a standby node. If node #1 detects a connection failure with the central system, or in other words, node #1 detects that the connection with the central system is interrupted, then node #1 keeps the standby node working mode, or in other words, node #1 remains a standby node.
  • When node #1 determines that it is in the standby node working mode, it listens for commands from the central system, and when it receives the master upgrade command from the central system, it switches to the master node working mode.
  • the master upgrade command is used to instruct a switch from the standby node working mode to the master node working mode.
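  • The behaviour of node #1 described above can be sketched as a two-state machine: a master node demotes itself when its connection to the central system is interrupted, and a standby node is promoted only by the master upgrade command. The class, method, and constant names below are hypothetical, and tying `apps_running` to the working mode is an assumption based on the remark that a demoted node can stop its programs and databases.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


# Hypothetical identifier for the master upgrade command.
UPGRADE_TO_MASTER = "upgrade_to_master"


class Node:
    def __init__(self, mode: Mode) -> None:
        self.mode = mode
        # ASSUMPTION: service applications run only in master node working mode.
        self.apps_running = mode is Mode.MASTER

    def on_central_connection_checked(self, connected: bool) -> None:
        """Connectivity detection: a master demotes itself on interruption."""
        if not connected and self.mode is Mode.MASTER:
            self.apps_running = False  # stop programs, databases, etc.
            self.mode = Mode.STANDBY
        # A standby node keeps the standby node working mode in either case.

    def on_command(self, command: str) -> None:
        """Service linkage: a standby is promoted only by the central system."""
        if self.mode is Mode.STANDBY and command == UPGRADE_TO_MASTER:
            self.mode = Mode.MASTER
            self.apps_running = True
```

  • In this sketch no transition is triggered by anything other than the link to the central system or its command, which reflects the idea that promotion decisions are made by the central system rather than by the nodes themselves.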
  • node #1 may receive the following information from the central system.
  • In a possible implementation, when the central system detects that the service connection channel with the edge device (or edge system) where node #1 is located is interrupted, the central system can issue a master upgrade command to the edge device through the management channel.
  • the central system can determine whether to send a master upgrade command according to the business situation.
  • the central system can detect whether the connection with the node of the edge device (such as the node currently in the working mode of the master node) is normal. In the case that the central system detects that the connection with the node currently in the working mode of the primary node is interrupted, the central system may issue the main upgrade command to the node currently in the working mode of the standby node.
  • When the central system detects that the service connection channel with the edge device (or edge system) where node #1 is located is interrupted, the central system can issue a master upgrade command to the edge device. Correspondingly, the node of the edge device receives the master upgrade command.
  • the node #1 is the master node, or in other words, the node #1 is in the master node working mode.
  • the service connection channel of the edge device (or edge system) where node #1 is located has been interrupted, that is, the programs on node #1 have all been stopped, or it can be understood that node #1 is already in the standby node working mode, so node #1 will not receive the master upgrade command.
  • the node #1 is a standby node, or in other words, the node #1 is in the standby node working mode.
  • the central system issues a master upgrade command to the node #1, or in other words, the central system issues a command to the node #1: switching itself from the standby node to the master node.
  • node #1 After node #1 receives the master upgrade command, it switches from the standby node to the master node according to the master upgrade command.
  • the edge device can actively establish a gRPC connection with the central system. After the gRPC channel is successfully established, the upgrade process ends.
  • connectivity detection can be performed: based on the connectivity between the node and the central system (for example, via the gateway), it is determined whether the node needs to be downgraded (that is, the original master node is reduced to a standby node).
  • business linkage detection can be performed.
  • the central system may issue a master upgrade command to a node in the standby node working mode, so that the original standby node switches to the master node working mode. Therefore, no new arbitration node needs to be added; integrating the switching capability into the central system both reduces the waste of additional resources and avoids the non-master problem.
  • When the current working mode of node #1 is the standby node working mode, only step 430 may be performed.
  • When the current working mode of node #1 is the master node working mode, only steps 410 and 420 may be performed.
  • There is no strict sequence between node #1 determining whether it is in the master node working mode or the standby node working mode and node #1 detecting whether the connection with the central system is interrupted. For example, node #1 can first determine whether it is in the master node working mode or the standby node working mode, and then detect whether the connection with the central system is interrupted; or node #1 can first detect whether the connection with the central system is interrupted, and then determine whether it is in the master node working mode or the standby node working mode.
  • the method 400 may further include step 401.
  • Node #1 detects the connection with node #2.
  • node #1 is the master node and node #2 is the backup node; or node #1 is the backup node and node #2 is the master node.
  • Whether the connection between node #1 and node #2 is normal or interrupted, no processing is performed on node #1 and node #2, or in other words, they remain in their current working modes.
  • There is no sequence relationship between step 401 and step 410.
  • node #1 can periodically detect the connection with node #2, and node #1 can periodically detect the connection with the gateway.
  • the solution of connectivity detection (such as step 410 and step 420) combined with service linkage detection (such as step 430) is introduced.
  • This solution can solve both the non-master problem and the dual-master problem.
  • the following describes the connectivity detection scheme and the service linkage detection scheme with reference to FIG. 5 and FIG. 6 respectively. It should be understood that the solutions described in the method 500 and the method 600 can be used in combination (as described in the method 300 or the method 400), or can be used separately, which is not limited.
  • FIG. 5 is a schematic block diagram of a method 500 for node control according to an embodiment of the present application.
  • the method 500 may include the following steps.
  • Node #1 detects the connection with the central system.
  • the node #1 may be the master node, in other words, the node #1 is in the active node working mode; or, the node #1 may be the standby node, or in other words, the node #1 is in the standby node working mode.
  • node #1 can detect the connection with the gateway. That is to say, when node #1 detects that the connection with the gateway is interrupted, it switches to the standby node working mode if it determines that it is in the master node working mode, and keeps the standby node working mode if it determines that it is in the standby node working mode.
  • In one case, node #1 is the master node, or in other words, node #1 is in the master node working mode. If node #1 detects a connection failure with the central system, or in other words, node #1 detects that the connection with the central system is interrupted, then node #1 switches from the master node working mode to the standby node working mode, or in other words, node #1 is switched to the standby node. For example, node #1 can stop applications such as programs and databases on node #1.
  • In another case, node #1 is a standby node, or in other words, node #1 is in the standby node working mode. If node #1 detects a connection failure with the central system, or in other words, node #1 detects that the connection with the central system is interrupted, then node #1 keeps the standby node working mode, or in other words, node #1 remains a standby node.
  • the connectivity between the node and the central system is used to determine whether the node needs to be downgraded (that is, the original primary node is reduced to a standby node), so as to avoid the dual-master problem.
  • When the connection between the master node and the central system is faulty or interrupted, the node can be actively downgraded, that is, the master node can be actively reduced to a standby node. In this way, not only is the dual-master situation avoided, but the situation in which the network fault is discovered only after a switchover is also avoided, which reduces the time spent on ineffective switchovers.
  • the method 500 may further include step 501.
  • Node #1 detects the connection with node #2.
  • step 501 there is no sequence relationship between step 501 and step 510.
  • node #1 can periodically detect the connection with node #2, and node #1 can periodically detect the connection with the central system.
  • The trigger conditions for promotion to master, such as when the standby node is promoted to the master node, are not limited.
  • Any solution that can make a master node exist in the edge system after the master node is reduced to a backup node (for example, the backup node is promoted to be a master node) is applicable to the embodiments of the present application.
  • the connectivity between the node and the central system is detected to determine whether the node needs to be downgraded (that is, the original primary node is reduced to a backup node), so as to avoid the dual-master problem.
  • the node can be actively downgraded, that is, the master node can be actively reduced to a standby node.
  • the embodiments of the present application can also be used in MEC technology, so as to solve the "dual master" problem that may occur in MEC technology.
  • FIG. 6 is a schematic block diagram of a method 600 for node control according to an embodiment of the present application.
  • the method 600 may include the following steps.
  • Node #1 receives a master upgrade command from the central system, and the master upgrade command is used to notify the switch from the standby node working mode to the master node working mode.
  • the node #1 may be the master node, in other words, the node #1 is in the active node working mode; or, the node #1 may be the standby node, or in other words, the node #1 is in the standby node working mode.
  • In a possible implementation, when the central system detects that the service connection channel with the edge device (or edge system) where node #1 is located is interrupted, the central system can issue a master upgrade command to the edge device through the management channel.
  • the central system can determine whether to send a master upgrade command according to the business situation.
  • the central system can detect whether the connection with the node of the edge device (such as node #1) is normal.
  • When node #1 determines that it is in the master node working mode, it switches to the standby node working mode.
  • the central system may decide to switch between the master and standby nodes in the edge system. For example, in the case of a software failure, the original master and standby nodes need to be switched; that is, the central system can be connected to two nodes, and the central system issues a master upgrade command (which may also be called a switch command) to the two nodes, so that the node that was originally in the master node working mode is switched to the standby node working mode, and the node that was originally in the standby node working mode is switched to the master node working mode.
  • In one case, the central system can be connected to only one node (for example, node #1, where node #1 is in the standby node working mode).
  • the central system issues a master upgrade command to the node, or in other words, the central system issues a command to the node to switch itself from the standby node to the master node.
  • the edge device can actively establish a gRPC connection with the central system. After the gRPC channel is successfully established, the upgrade process ends.
  • the central system can be connected to two nodes (such as node #1 and node #2).
  • the central system issues the master upgrade command (or switch command) to the two nodes, or in other words, the central system issues to the two nodes a command to switch the original master node to the standby node and the original standby node to the master node.
  • the dual-node monitoring on each node can act according to the node's own master or standby status; for example, if the current node is the master node, it switches to the standby node working mode, and if the current node is the standby node, it switches to the master node working mode.
  • the edge device can actively establish a gRPC connection with the central system. After the gRPC channel is successfully established, the forced switch ends.
  • a possible scenario is that the software fails and the original active and standby nodes need to be switched, that is, the central system can be connected to two nodes, and the central system issues an upgrade command to the two nodes, so that the original The master node is switched to a backup node, and the original backup node is switched to the master node.
  • the arbitration capability can be integrated into, and provided by, the central system.
  • the central system can decide whether to switch according to the failure condition of the service channel (that is, the original master node is switched to the standby node, and the original standby node is switched to the master node), instead of relying on connectivity detection between an arbitration node and the edge device.
  • By integrating the switching capability into the central system, additional waste of resources can be reduced.
  • determining whether to switch between the active and standby nodes according to the failure of the service channel can reduce unnecessary switching and save switching time.
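  • The decision made by the central system in the two cases above (only a standby node is reachable, or both nodes are reachable) can be sketched as a small planning function. This is an illustrative sketch under stated assumptions: the function name `plan_commands`, the dictionary representation of reachable nodes, and the choice of the first standby node in sorted order are all hypothetical and not taken from the application.

```python
from typing import Dict


def plan_commands(service_channel_ok: bool, reachable: Dict[str, str]) -> Dict[str, str]:
    """Return {node name: target working mode} for the nodes to be commanded.

    `reachable` maps each node the central system can still reach over the
    management channel to its current mode, "master" or "standby".
    """
    # No service-channel failure: no switchover, so unnecessary switching is avoided.
    if service_channel_ok:
        return {}
    masters = sorted(n for n, m in reachable.items() if m == "master")
    standbys = sorted(n for n, m in reachable.items() if m == "standby")
    if masters and standbys:
        # Both roles reachable: swap the original master and standby nodes.
        return {masters[0]: "standby", standbys[0]: "master"}
    if standbys:
        # Only standby node(s) reachable: promote one of them.
        # ASSUMPTION: the first name in sorted order is picked.
        return {standbys[0]: "master"}
    return {}
```

  • For instance, `plan_commands(False, {"node1": "standby"})` yields a promotion of `node1`, while a healthy service channel yields no command at all.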
  • the embodiments of the present application can also be used in MEC technology, so as to solve the "dual master" problem and the "non-master" problem that may occur in the MEC technology.
  • the method 600 may further include step 601.
  • Node #1 detects the connection with node #2.
  • service linkage detection can be performed. That is, when the heartbeat between nodes is interrupted and the nodes cannot perceive the status of the opposite end, the switchover can be determined according to the master upgrade command issued by the central system (that is, the original master node is switched to the standby node, and the original standby node is switched to the master node).
  • the heartbeat between nodes is interrupted, and the nodes cannot perceive the status of each other.
  • the central system can issue the main upgrade command to the edge device through the management channel.
  • The above describes, with reference to methods 300 and 400, the solution in which connectivity detection is combined with service linkage detection;
  • with reference to method 500, the solution in which connectivity detection is used alone;
  • and with reference to method 600, the solution in which service linkage detection is used alone.
  • The following takes as an example the case in which the central system is a central VNFM device, the edge device is an edge VNFM device, and the node detects the connection with the gateway. It should be understood that, for the details that are not described in the method 700, reference may be made to the descriptions in the method 300 to the method 600.
  • FIG. 7 shows a schematic diagram of a method 700 for node control applicable to an embodiment of the present application. It is assumed that the edge VNFM device includes node #1 and node #2, wherein, of node #1 and node #2, one is the master node or is in the working mode of the master node, and the other is the backup node or is in the working mode of the backup node.
  • the method 700 may include the following steps.
  • Node #1 detects whether the connection with the gateway is normal.
  • Node #1 can detect the connection with the gateway to determine whether the connection with the central system is normal.
  • node #1 can periodically detect the connection with the gateway. For example, after the system is started, node #1 can periodically check the connection with the gateway.
  • the node #1 periodically detects the connection with the gateway according to the first preset time.
  • node #1 may also detect the connection with the gateway from time to time, which is not limited in the embodiment of the present application.
  • node #1 can detect whether the connection with the gateway is normal by pinging the IP address of the gateway.
  • If the ping command returns 0, the connection between node #1 and the gateway is normal; if the ping command returns any other value, the connection between node #1 and the gateway is faulty (or the connection is interrupted).
  • the relevant parameters of the ping command may include, but are not limited to: specifying the number of consecutive executions, the interval time of each execution of the ping, and the execution timeout period.
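  • As an illustration of how the three ping parameters might map onto a concrete command, the following sketch builds a ping invocation and interprets its exit status. It assumes the Linux iputils `ping` options `-c` (count), `-i` (interval in seconds), and `-W` (reply timeout in seconds); other platforms use different flags. Treating exit status 0 as "connection normal" is likewise an assumption based on common ping behaviour. The function names and the example address are hypothetical.

```python
import subprocess
from typing import List


def build_ping_command(gateway_ip: str, count: int = 5,
                       interval_s: float = 1.0, timeout_s: int = 4) -> List[str]:
    """Build a ping command line (ASSUMPTION: Linux iputils flag names)."""
    return ["ping", "-c", str(count), "-i", str(interval_s),
            "-W", str(timeout_s), gateway_ip]


def gateway_connection_is_normal(return_code: int) -> bool:
    """Exit status 0 is read as normal; any other value as faulty."""
    return return_code == 0


def check_gateway(gateway_ip: str) -> bool:
    """Run the ping command and interpret its exit status."""
    completed = subprocess.run(
        build_ping_command(gateway_ip),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # a non-zero exit status is an expected outcome here
    )
    return gateway_connection_is_normal(completed.returncode)
```

  • Separating command construction from result interpretation keeps the parts that do not touch the network easy to check in isolation.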
  • node #1 can detect whether the connection with the gateway is normal by sending a message.
  • node #1 detects a connection failure (or connection interruption) between the node and the gateway.
  • the node #1 itself is a standby node, or in other words, the node #1 is in the standby node working mode. In this case, if the node #1 detects a connection failure between the node and the gateway, the node #1 keeps the standby node working mode unchanged.
  • the node #1 itself is the master node, or in other words, the node #1 is in the master node working mode.
  • In this case, if node #1 detects a connection failure between itself and the gateway, node #1 switches to the standby node working mode.
  • For example, node #1 stops the programs, databases, and other applications on node #1.
  • In a possible case, node #1 and node #2 are both standby nodes.
  • In this case, the edge VNFM device is in a "non-master" state.
  • the central VNFM device can then force a promotion to master according to service conditions, for example by promoting node #2 to the master node.
  • Node #1 detects whether the connection with node #2 is normal.
  • node #1 can periodically detect the connection with node #2. For example, after the system is started, node #1 can periodically detect the connection with node #2.
  • the node #1 periodically detects the connection with the node #2 according to the second preset time.
  • node #1 can check whether the connection with node #2 is normal by pinging the IP address of node #2.
  • If the ping command returns 0, the connection between node #1 and node #2 is normal; if the ping command returns any other value, the connection between node #1 and node #2 is faulty.
  • the relevant parameters of the ping command may include, but are not limited to: specifying the number of consecutive executions, the interval time of each execution of the ping, and the execution timeout period.
  • Node #1 detects a connection failure between node #1 and node #2, and node #1 remains unchanged.
  • That is, no processing is performed. Conversely, assuming that node #1 is in the standby node working mode, if node #1 were promoted from the standby node to the master node because of the connection failure between node #1 and node #2, a dual-master problem might occur.
  • For example, node #1 may detect that the connection between node #1 and node #2 is faulty while node #2 is itself in the master node working mode. In this case, if node #1 is promoted from the standby node to the master node, there may be two master nodes (that is, the dual-master problem): node #2, which is itself in the master node working mode, and node #1, which has been promoted from the standby node to the master node.
  • When node #1 detects the connection failure between node #1 and node #2, it performs no processing regardless of whether it is in the master node working mode or the standby node working mode, which avoids the dual-master problem. Furthermore, instructions from the central VNFM device can be used to avoid the non-master problem.
  • In a possible case, node #1 and node #2 are both standby nodes, and the edge VNFM device is in a "non-master" state at this time.
  • the central VNFM device can then force a promotion to master according to service conditions, for example by promoting node #2 to the master node.
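  • The reasoning above can be illustrated with a tiny worked example that compares two policies for handling a lost inter-node heartbeat. The function names and the dictionary representation are hypothetical; the sketch only shows why promoting a standby node on heartbeat loss can produce two master nodes, whereas keeping the current working mode cannot.

```python
from typing import Dict


def after_heartbeat_loss(modes: Dict[str, str], promote_on_loss: bool) -> Dict[str, str]:
    """Return each node's mode after every node sees the peer heartbeat fail.

    promote_on_loss=True models the policy the text warns against
    (a standby promotes itself); False models keeping the current mode.
    """
    result = dict(modes)
    if promote_on_loss:
        for name, mode in modes.items():
            if mode == "standby":
                result[name] = "master"
    return result


def count_masters(modes: Dict[str, str]) -> int:
    """Count the nodes currently in master node working mode."""
    return sum(1 for mode in modes.values() if mode == "master")


# Only the link between the nodes fails; node2 is still a healthy master.
before = {"node1": "standby", "node2": "master"}
risky = after_heartbeat_loss(before, promote_on_loss=True)
safe = after_heartbeat_loss(before, promote_on_loss=False)
```

  • With the self-promotion policy `count_masters(risky)` is 2, the dual-master situation; with the keep-current-mode policy `count_masters(safe)` remains 1.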
  • the central VNFM device detects whether the service channel connection with the edge VNFM device is normal.
  • When the programs on all nodes are stopped, the gRPC channel established between the central VNFM device and the edge VNFM device is forcibly interrupted, and the central VNFM device receives an "unavailable" error code.
  • the central VNFM device can detect whether the service channel connection with the edge VNFM device is normal.
  • For example, the central VNFM device uses the SSH command to remotely log in to the node to detect whether the connection with node #1 or node #2 is normal. If the SSH command returns 0, the connection between the central VNFM device and the node of the edge VNFM device is normal; if the SSH command returns any other value, the connection between the central VNFM device and the node of the edge VNFM device is faulty.
  • Any manner in which the central VNFM device can detect whether the connection with the node is normal falls within the protection scope of the embodiments of the present application.
  • the central VNFM device can also detect whether the connection with node #1 or node #2 is normal through ping or other methods.
  • If the central VNFM device detects a failure in the service channel connection with the edge VNFM device, the central VNFM device can issue the master upgrade command as described below.
  • In one case, the central VNFM device can be connected to only one node (for example, node #1, where node #1 is in the standby node working mode).
  • the central VNFM device issues a master upgrade command to the node, or in other words, the central VNFM device issues a command to the node to switch itself from the standby node to the master node.
  • the edge VNFM device can actively establish a gRPC connection with the central VNFM device. After the gRPC channel is successfully established, the upgrade process ends.
  • the central VNFM device can be connected to two nodes (such as node #1 and node #2).
  • the central VNFM device issues a master upgrade command (or a forced switch command) to the two nodes, or in other words, the central VNFM device issues to the two nodes a command to switch the original master node to the standby node and the original standby node to the master node.
  • the dual-node monitoring on each node can act according to the node's own master or standby status; for example, if the current node is the master node, it switches to the standby node working mode, and if the current node is the standby node, it switches to the master node working mode.
  • the edge VNFM device can actively establish a gRPC connection with the central VNFM device, and after the gRPC channel is successfully established, the forced switching ends.
  • There is no sequence relationship between steps 710 and 720 and steps 730 and 740. For example, after the system is started, node #1 periodically performs detection with the gateway and periodically performs detection with node #2. In another example, there is no sequence relationship between steps 710 and 720 and steps 750 and 760.
  • the method 700 is mainly illustrated by taking the node #1 as an example, and the node #2 may also perform the steps described in the method 700. In other words, each node in the edge device can execute the steps of the method 700.
  • the method 700 takes a central VNFM device and an edge VNFM device as examples for exemplification, and the embodiment of the present application is not limited thereto.
  • Neither the central system nor the edge system is limited to VNFM devices.
  • The above takes an edge system that includes two nodes as an example, and the application is not limited thereto.
  • the edge system may also include more than two nodes.
  • In that case, one of the standby nodes can be selected, and the master upgrade command is sent to it.
  • The above takes as an example the case in which the node detects whether the connection with the gateway is normal in order to determine whether the connection between the node and the central system is normal, and the application is not limited thereto.
  • Any method that makes it possible to determine whether the connection between the node and the central system is normal is applicable to the embodiments of the present application.
  • The operation of demoting to standby is performed. That is to say, connectivity detection between the node and the central system is used to judge whether the node should be downgraded (that is, the original master node is reduced to a standby node), so as to avoid the dual-master problem. In other words, connectivity detection is added on top of heartbeat detection; if the connectivity detection fails, the node actively demotes itself to standby to avoid the dual-master problem.
  • In the case of a connectivity detection failure, that is, when the connection between the master node and the central system is faulty or interrupted, the node can be actively downgraded, that is, the master node can be actively reduced to a standby node. In this way, not only is the dual-master situation avoided, but the situation in which the network fault is discovered only after a switchover is also avoided, which reduces the time spent on ineffective switchovers.
  • the master promotion operation is performed for the node in the standby node working mode.
  • service linkage detection can be performed, and the arbitration capability can be integrated into, and provided by, the central system.
  • the central system can decide whether the original standby node should switch to the master node according to the failure condition of the service channel, instead of relying on connectivity detection between an arbitration node and the edge device.
  • additional waste of resources can be reduced.
  • determining whether to switch between the active and standby nodes according to the failure of the service channel can reduce unnecessary switching and save switching time.
  • In the foregoing method embodiments, the methods and operations implemented by the edge device (or node) can also be implemented by components (such as chips or circuits) that can be used in the edge device (or node), and the methods and operations implemented by the central system (such as the central device) can also be implemented by components (such as chips or circuits) that can be used in the central system (such as the central device).
  • the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or computer software-driven hardware depends on the specific application and design constraint conditions of the technical solution. Professionals and technicians can use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of this application.
  • the embodiments of the present application can divide the edge device (or node) or the central system (such as the central device) into functional modules according to the foregoing method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated in one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or software functional modules. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation. The following is an example of dividing each function module corresponding to each function.
  • FIG. 8 is a schematic block diagram of a node control device provided by an embodiment of the present application.
  • the device 800 includes a transceiver unit 810 and a processing unit 820.
  • the transceiver unit 810 can implement corresponding communication functions, and the processing unit 820 is used for data processing.
  • the transceiving unit 810 may also be referred to as a communication interface or a communication unit.
  • the device 800 may further include a storage unit, which may be used to store instructions and/or data, and the processing unit 820 may read the instructions and/or data in the storage unit, so that the device implements the foregoing method embodiments.
  • the device 800 can be used to perform the actions performed by the node in the master node working mode in the above method embodiments.
  • the device 800 can be a node or a component that can be configured on a node.
  • the transceiver unit 810 is configured to perform the transmission- and reception-related operations of the node in the master node working mode in the above method embodiments, and the processing unit 820 is configured to perform the processing-related operations of the node in the master node working mode in the above method embodiments.
  • the device 800 may be used to perform the actions performed by the node in the standby node working mode in the above method embodiment.
  • the device 800 may be a node or a component that can be configured on a node.
  • the transceiver unit 810 is configured to perform the transmission- and reception-related operations of the node in the standby node working mode in the above method embodiments, and the processing unit 820 is configured to perform the processing-related operations of the node in the standby node working mode in the above method embodiments.
  • the device 800 can be used to perform the actions performed by the central system in the above method embodiments.
  • the device 800 can be a central system or a component that can be configured in the central system.
  • the transceiving unit 810 is configured to perform operations related to transmission and reception on the central system side in the above method embodiment
  • the processing unit 820 is configured to perform processing related operations on the central system side in the above method embodiment.
  • the device 800 is used to perform the actions performed by the nodes in the above embodiments.
  • the processing unit 820 is configured to monitor the connection state between the device 800 and the central system when the current working mode of the device 800 is determined to be the master node working mode, and to switch to the standby node working mode when it is determined that the connection between the device 800 and the central system is interrupted;
  • the transceiver unit 810 is configured to listen for commands from the central system when it is determined that the current working mode of the device 800 is the standby node working mode; when the master upgrade command sent by the central system is received, the processing unit 820 is further configured to switch to the master node working mode, where the master upgrade command is used to notify the switch from the standby node working mode to the master node working mode.
  • the edge system further includes a second node, and the processing unit 820 is further configured to monitor the connection state with the second node; whether the connection between the device 800 and the second node is determined to be interrupted or connected, the current working mode of the device 800 is maintained.
  • the processing unit 820 is specifically configured to monitor the connection state between the device 800 and the gateway, and the gateway is between the device 800 and the central system.
  • the gateway is within the edge system.
  • the processing unit 820 is further configured to: determine that the current working mode of the device 800 is the master node working mode when it is detected that the device 800 is in an active state, or determine that the current working mode of the device 800 is the standby node working mode when it is detected that the device 800 is in an inactive state; or, determine that the current working mode of the device 800 is the master node working mode when it is detected that the business application on the device 800 is running, or determine that the current working mode of the device 800 is the standby node working mode when it is detected that the business application on the device 800 is in a stopped state; or, determine that the current working mode of the device 800 is the master node working mode when it is detected that no status identifier exists on the device 800, or determine that the current working mode of the device 800 is the standby node working mode when it is detected that a status identifier exists on the device 800.
  • the device 800 can implement the steps or processes executed by the nodes in the method 300 to the method 700 according to the embodiments of the present application, and the device 800 can include units for executing the methods performed by the node in the method 300 in FIG. 3 to the method 700 in FIG. 7.
  • each unit in the device 800 and other operations and/or functions described above are used to implement the corresponding processes of the method 300 in FIG. 3 to the method 700 in FIG. 7 respectively.
  • the transceiver unit 810 can be used to execute step 320 in the method 300
  • the processing unit 820 can be used to execute steps 310 and 330 in the method 300.
  • the transceiving unit 810 can be used to execute step 430 in the method 400, and the processing unit 820 can be used to execute steps 410, 420, and 401 in the method 400.
  • the processing unit 820 can be used to execute steps 510, 520, and 501 in the method 500.
  • the transceiving unit 810 can be used to execute step 610 in the method 600, and the processing unit 820 can be used to execute steps 620 and 601 in the method 600.
  • the transceiving unit 810 can be used to execute step 760 in the method 700
  • the processing unit 820 can be used to execute steps 710, 720, 730, and 740 in the method 700.
  • the device 800 is used to perform the actions performed by the central system of the above embodiment.
  • the processing unit 820 is used to detect the service channel of the edge system.
  • the edge system includes a first node and a second node; the central system sends a master upgrade command to the second node, where the master upgrade command is used to notify the switch from the standby node working mode to the master node working mode.
  • the processing unit 820 is specifically configured to: detect whether the connection with the first node or the second node is normal; and, when it detects that the connection with the first node or the second node is interrupted, determine that the service channel of the edge system has failed.
  • the processing unit 820 is specifically configured to: remotely log in to the node by using a Secure Shell (SSH) command, and thereby detect whether the connection with the first node or the second node is normal.
  • the device 800 can implement the steps or processes executed by the central system in the method 300 to the method 700 according to the embodiments of the present application, and the device 800 can include units for executing the methods performed by the central system in the method 300 in FIG. 3 to the method 700 in FIG. 7.
  • each unit in the device 800 and other operations and/or functions described above are used to implement the corresponding processes of the method 300 in FIG. 3 to the method 700 in FIG. 7 respectively.
  • the transceiving unit 810 can be used to execute step 320 in the method 300
  • the processing unit 820 can be used to execute step 320 in the method 300.
  • the transceiving unit 810 can be used to execute step 430 in the method 400, and the processing unit 820 can be used to execute step 410 in the method 400.
  • the transceiver unit 810 may be used to execute step 610 in the method 600.
  • the transceiving unit 810 can be used to execute step 760 in the method 700, and the processing unit 820 can be used to execute step 750 in the method 700.
  • the device 800 is used to perform the actions performed by the node control system of the above embodiment.
  • the node control system includes: a first node, a second node, and a central system.
  • the first node and the second node are located in the edge system, and both the first node and the second node are connected to the central system.
  • the current working mode of the first node is the master node working mode
  • the current working mode of the second node is the standby node working mode
  • the processing unit 820 is used to monitor the connection state between the first node and the central system, and when the connection between the first node and the central system is interrupted, make the first node switch to the standby node working mode;
  • the transceiver unit 810 is used to send a master upgrade command to the second node when the connection between the central system and the first node is interrupted, where the master upgrade command is used to notify the switch from the standby node working mode to the master node working mode; the processing unit 820 is also used to make the second node switch to the master node working mode according to the master upgrade command.
  • the processing unit 820 is further configured to monitor the connection state between the first node and the second node; whether the connection between the first node and the second node is determined to be interrupted or connected, the current working mode of the first node and/or the second node is maintained.
  • processing unit 820 is specifically configured to monitor the connection state between the first node and the gateway, and the gateway is between the first node and the central system.
  • the gateway is located in the edge system.
  • the device 800 may implement the steps or processes executed by the node control system in the method 300 to the method 700 according to the embodiments of the present application.
  • the device 800 may include units for executing the methods performed by the node control system in the method 300 in FIG. 3 to the method 700 in FIG. 7.
  • each unit in the device 800 and other operations and/or functions described above are used to implement the corresponding processes of the method 300 in FIG. 3 to the method 700 in FIG. 7 respectively.
  • the processing unit 820 in the above embodiment may be implemented by at least one processor or processor-related circuit.
  • the transceiver unit 810 may be implemented by a transceiver or a transceiver-related circuit.
  • the transceiving unit 810 may also be referred to as a communication unit or a communication interface.
  • the storage unit may be realized by at least one memory.
  • the transceiving unit 810 in the device 800 may correspond to the transceiver 930 in the device 900 shown in FIG. 9, and the processing unit 820 in the device 800 may correspond to the processor 910 in the device 900 shown in FIG. 9.
  • FIG. 9 is a schematic block diagram of a node control device 900 provided by an embodiment of the present application.
  • the device 900 includes a processor 910 coupled to a memory 920; the memory 920 is used to store computer programs or instructions and/or data, and the processor 910 is used to execute the computer programs or instructions and/or data stored in the memory 920, so that the method in the above method embodiment is executed.
  • the device 900 includes one or more processors 910.
  • the communication device 900 may further include a memory 920.
  • the memory 920 included in the device 900 may be one or more.
  • the memory 920 may be integrated with the processor 910 or provided separately.
  • the foregoing processor 910 and the memory 920 may be combined into one processing device, and the processor 910 is configured to execute the program code stored in the memory 920 to implement the foregoing functions.
  • the memory 920 may also be integrated in the processor 910 or independent of the processor 910.
  • the device 900 may further include a transceiver 930, and the transceiver 930 is used for signal reception and/or transmission.
  • the processor 910 is configured to control the transceiver 930 to receive and/or send signals.
  • the transceiver 930 may include an input interface (or referred to as a receiver) and an output interface (or referred to as a transmitter).
  • the transceiver may also be referred to as a communication interface.
  • the transceiver may further include an antenna, and the number of antennas may be one or more.
  • the device 900 is used to implement the operations performed by the nodes in the above method embodiments.
  • the device 900 may be the node in the above method embodiment, or may be a chip for implementing the function of the node in the above method embodiment.
  • the processor 910 is used to implement the processing-related operations performed by the node in the above method embodiment
  • the transceiver 930 is used to implement the transceiving-related operations performed by the node in the above method embodiment.
  • the device 900 may correspond to the node in FIG. 3 to FIG. 7 according to an embodiment of the present application, and the device 900 may include units for executing the methods performed by the node in the method 300 in FIG. 3 to the method 700 in FIG. 7.
  • each unit in the device 900 and other operations and/or functions described above are used to implement the corresponding processes of the method 300 in FIG. 3 to the method 700 in FIG. 7 respectively. It should be understood that the specific process for each unit to execute the foregoing corresponding steps has been described in detail in the foregoing method embodiment, and is not repeated here for brevity.
  • the communication device 900 is used to implement the operations performed by the central system in the above method embodiments.
  • the device 900 may be the central system in the above method embodiment, or may be a chip used to implement the function of the central system in the above method embodiment.
  • the processor 910 is used to implement the processing-related operations performed by the central system in the above method embodiment
  • the transceiver 930 is used to implement the transceiving-related operations performed by the central system in the above method embodiment.
  • the device 900 may correspond to the central system (or central device) in FIG. 3 to FIG. 7 according to an embodiment of the present application, and the device 900 may include units for executing the methods performed by the central system in the method 300 in FIG. 3 to the method 700 in FIG. 7.
  • each unit in the device 900 and other operations and/or functions described above are used to implement the corresponding processes of the method 300 in FIG. 3 to the method 700 in FIG. 7 respectively. It should be understood that the specific process for each unit to execute the foregoing corresponding steps has been described in detail in the foregoing method embodiment, and is not repeated here for brevity.
  • the communication device 900 is used to implement the operations performed by the node control system in the above method embodiments.
  • the device 900 may be the node control system in the above method embodiment, or may be a chip used to implement the function of the node control system in the above method embodiment.
  • the processor 910 is used to implement the processing-related operations performed by the node control system in the above method embodiment
  • the transceiver 930 is used to implement the transceiver-related operations performed by the node control system in the above method embodiment.
  • the device 900 may correspond to the node control system in FIG. 3 to FIG. 7 (for example, including the first node, the second node, and the central system) according to an embodiment of the present application, and the device 900 may include units for executing the methods performed by the node control system in the method 300 in FIG. 3 to the method 700 in FIG. 7.
  • each unit in the device 900 and other operations and/or functions described above are used to implement the corresponding processes of the method 300 in FIG. 3 to the method 700 in FIG. 7 respectively. It should be understood that the specific process for each unit to execute the foregoing corresponding steps has been described in detail in the foregoing method embodiment, and is not repeated here for brevity.
  • the embodiment of the present application also provides a computer-readable storage medium on which is stored computer instructions for implementing the method executed by the node in the foregoing method embodiment or the method executed by the central system.
  • when the computer program is executed by a computer, the computer can implement the method executed by the node in the foregoing method embodiment or the method executed by the central system.
  • the embodiments of the present application also provide a computer program product containing instructions that, when executed by a computer, enable the computer to implement the method executed by the node in the foregoing method embodiments or the method executed by the central system.
  • An embodiment of the present application also provides an edge system, which includes the nodes in the above embodiments, such as a first node and a second node (or node #1 and node #2).
  • the embodiments of the present application also provide a node control system, which includes the nodes (such as the first node and/or the second node) and the central system in the above embodiments.
  • the terminal device or the network device may include a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system layer.
  • the hardware layer may include hardware such as a central processing unit (CPU), a memory management unit (MMU), and memory (also referred to as main memory).
  • the operating system at the operating system layer can be any one or more computer operating systems that implement business processing through processes, such as a Linux, Unix, Android, iOS, or Windows operating system.
  • the application layer can include applications such as browsers, address books, word processing software, and instant messaging software.
  • the embodiments of this application do not specifically limit the structure of the execution subject of the method provided in the embodiments of this application, as long as it can run a program that records the code of that method and thereby carry out the method.
  • the execution subject of the method provided in the embodiments of the present application may be a terminal device or a network device, or a functional module in the terminal device or the network device that can call and execute the program.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • Usable media may include, but are not limited to: magnetic media or magnetic storage devices (for example, floppy disks, hard disks (such as mobile hard disks), and magnetic tapes), optical media (for example, optical discs such as compact discs (CD) and digital versatile discs (DVD)), smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives), or semiconductor media (for example, solid state disks (SSD), USB flash drives, read-only memory (ROM), and random access memory (RAM)).
  • the various storage media described herein may represent one or more devices and/or other machine-readable media for storing information.
  • the term "machine-readable medium” may include, but is not limited to, wireless channels and various other media capable of storing, containing, and/or carrying instructions and/or data.
  • the processors mentioned in the embodiments of this application may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory mentioned in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (programmable ROM, PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or flash memory.
  • the volatile memory may be random access memory (RAM).
  • RAM can be used as an external cache.
  • RAM may take various forms, including static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct Rambus random access memory (direct rambus RAM, DR RAM).
  • when the processor is a general-purpose processor, DSP, ASIC, FPGA or other programmable logic device, discrete gate or transistor logic device, or discrete hardware component, the memory (storage module) may be integrated in the processor.
  • memories described herein are intended to include, but are not limited to, these and any other suitable types of memories.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the above-mentioned units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described above as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to implement the solution provided in this application.
  • the functional units in the various embodiments of the present application may be integrated into one unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the computer program product includes one or more computer instructions.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer can be a personal computer, a server, or a network device.
  • Computer instructions can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or a wireless manner (such as infrared, radio, or microwave).

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Hardware Redundancy (AREA)

Abstract

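This application provides a node control method, system, and apparatus. The method is applied to a node control system that includes a first node, a second node, and a central system. The first node and the second node are located in an edge system and are both connected to the central system; the first node is in the master node working mode and the second node is in the standby node working mode. The method includes: the first node monitors the state of its connection to the central system and, when that connection is interrupted, switches to the standby node working mode; the central system monitors the state of its connection to the first node and, when that connection is interrupted, sends a master upgrade command to the second node; and the second node switches to the master node working mode according to the master upgrade command. In this application, connectivity detection between a node and the central system determines whether the master node is demoted to standby, and an instruction from the central system determines whether the standby node is promoted to master, which avoids both the dual-master problem and the no-master problem.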

Description

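Node control method, system, and apparatus
This application claims priority to Chinese Patent Application No. 202010352126.0, filed with the China National Intellectual Property Administration on April 28, 2020 and entitled "Node control method, system, and apparatus", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the communications field, and in particular to a node control method, system, and apparatus.
Background
The European Telecommunications Standards Institute (ETSI) has proposed a mobile edge computing (MEC) technology based on an evolved fifth generation (5G) architecture. On the one hand, MEC can improve user experience and save bandwidth resources; on the other hand, by moving computing capability down to mobile edge nodes and providing third-party application integration, MEC opens up unlimited possibilities for service innovation at the mobile edge entrance.
MEC depends on the orchestration capability of the virtualized network function manager (VNFM). For example, MEC also needs to take the VNFM "dual-master" and "no-master" problems into account.
The "dual-master" problem is the situation in which two machines or two nodes in a master/standby relationship both turn themselves into the master node under some triggering condition. When this happens, both nodes process the same services as the master, so the service processing results on the two nodes may become inconsistent; a master/standby pair should avoid this situation as far as possible. Similarly, the "no-master" problem is the situation in which two machines or two nodes in a master/standby relationship both turn themselves into standby nodes under some triggering condition. When this happens, both nodes are standby nodes, so the service is interrupted.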
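The two failure conditions can be stated compactly in code. The following is a minimal sketch, not part of the application: the `Mode` enum and the `classify` helper are illustrative names, and the sketch only assumes that each node reports one of two working modes.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


def classify(modes):
    """Classify the working modes of a master/standby group (illustrative helper)."""
    masters = sum(1 for m in modes if m is Mode.MASTER)
    if masters == 0:
        return "no-master"    # every node is standby: the service is interrupted
    if masters > 1:
        return "dual-master"  # several nodes process the same services as master
    return "healthy"          # exactly one master
```

For example, `classify([Mode.MASTER, Mode.MASTER])` reports the dual-master condition, and `classify([Mode.STANDBY, Mode.STANDBY])` reports the no-master condition.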
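Therefore, the VNFM "dual-master" and "no-master" problems urgently need to be solved.
Summary
This application provides a node control method, system, and apparatus that can solve the dual-master problem at relatively low cost and can also avoid the no-master problem.
According to a first aspect, a node control method is provided, applied to a node control system. The node control system includes a first node, a second node, and a central system. The first node and the second node are located in an edge system and are both connected to the central system. The current working mode of the first node is the master node working mode, and the current working mode of the second node is the standby node working mode. The method includes: the first node monitors the state of the connection to the central system and, when the connection between the first node and the central system is interrupted, switches to the standby node working mode; the central system monitors the state of the connection to the first node and, when the connection between the central system and the first node is interrupted, sends a master upgrade command to the second node, where the master upgrade command is used to notify a switch from the standby node working mode to the master node working mode; and the second node switches to the master node working mode according to the master upgrade command.
For example, "the connection between the first node and the central system is interrupted" and "the connection between the central system and the first node is interrupted" both mean that the connection between the central system and the first node is in the interrupted state.
It should be understood that the first node being in the master node working mode and the second node being in the standby node working mode is only an example. It indicates that, at a given time, among the multiple nodes in the edge system (for example two nodes, or more than two nodes), one node is in the master node working mode and the other nodes are in the standby node working mode.
Based on the foregoing technical solution, a node in the master node working mode demotes itself to standby when it detects that its connection to the central system is interrupted, and a node in the standby node working mode promotes itself to master after it receives a master upgrade command from the central system. Specifically, on the one hand, connectivity detection between the node in the master node working mode and the central system determines whether that node should be demoted (that is, whether the original master node becomes a standby node), which avoids the dual-master problem. This approach not only avoids dual masters but also avoids discovering only after a switchover that the network is unusable, which reduces the time spent on ineffective switchovers. On the other hand, integrating the arbitration capability into the central system avoids the no-master problem. For example, the central system can issue a master upgrade command to the node in the standby node working mode so that the original standby node switches to the master node working mode. No new arbitration node needs to be added; because the switchover capability is integrated into the central system, additional resource waste is reduced and the no-master problem is avoided. In addition, the central system can decide whether to issue a master upgrade command based on the state of its connection to the node in the master node working mode, which reduces unnecessary switchovers and saves switchover time. For example, the embodiments of this application can also be used in MEC, to solve the dual-master and no-master problems that may arise there.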
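The interaction in the first aspect can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the application's implementation: the class names `Node` and `CentralSystem`, their method names, and the way link checks are passed in as booleans are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


class Node:
    """Edge node; the structure is an illustrative assumption."""

    def __init__(self, name, mode):
        self.name = name
        self.mode = mode

    def on_central_link_check(self, link_up):
        # A master that loses its connection to the central system demotes itself.
        if self.mode is Mode.MASTER and not link_up:
            self.mode = Mode.STANDBY

    def on_master_upgrade_command(self):
        # A standby switches to master only on the central system's command.
        # (A node that is already master leaves the command unhandled in this sketch.)
        if self.mode is Mode.STANDBY:
            self.mode = Mode.MASTER


class CentralSystem:
    """Central system acting as arbiter, so no separate arbitration node is needed."""

    def on_master_link_check(self, link_up, standby):
        # When the link to the current master is interrupted, promote the standby.
        if not link_up:
            standby.on_master_upgrade_command()


def simulate_link_failure():
    node1 = Node("node1", Mode.MASTER)
    node2 = Node("node2", Mode.STANDBY)
    central = CentralSystem()
    # The node1 <-> central link fails; both ends observe the interruption.
    node1.on_central_link_check(link_up=False)
    central.on_master_link_check(link_up=False, standby=node2)
    return node1.mode, node2.mode
```

After `simulate_link_failure()`, node1 is in standby mode and node2 is in master mode, so exactly one master remains.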
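With reference to the first aspect, in some implementations of the first aspect, the first node monitors the state of the connection to the second node, and/or the second node monitors the state of the connection to the first node; whether the connection between the first node and the second node is determined to be interrupted or connected, the current working mode of the first node and/or the second node is maintained.
Based on the foregoing technical solution, detection between nodes, such as heartbeat detection, is combined with detection between a node and the central system to determine whether the node in the master node working mode should be demoted (that is, become a standby node), which avoids the dual-master problem. In addition, because the node in the master node working mode decides whether to demote itself based on its own connection to the central system, no misjudgment leading to a dual-master problem occurs even if a hardware fault occurs (such as a fault in a physical host's network interface card, a physical host, or a network device), or if the master node or the standby node cannot detect the peer's heartbeat because of an internal software fault (a fault not caused by the network interface card, the network, or similar factors).
With reference to the first aspect, in some implementations of the first aspect, the first node monitoring the state of the connection to the central system includes: the first node monitors the state of the connection to a gateway, where the gateway is between the first node and the central system.
Based on the foregoing technical solution, whether the node in the master node working mode should be demoted can be determined from the connectivity between the node and the gateway.
With reference to the first aspect, in some implementations of the first aspect, the gateway is located in the edge system.
According to a second aspect, a node control method is provided, applied to a first node in an edge system, where the first node is connected to a central system. The method includes: when the current working mode of the first node is determined to be the master node working mode, the first node monitors the state of its connection to the central system and, when it determines that the connection between the first node and the central system is interrupted, switches to the standby node working mode; when the current working mode of the first node is determined to be the standby node working mode, the first node listens for commands from the central system and, on receiving a master upgrade command sent by the central system, switches to the master node working mode, where the master upgrade command is used to notify a switch from the standby node working mode to the master node working mode.
For example, the master upgrade command may be received when the connection between a second node that is currently in the master node working mode and the central system is interrupted; the edge system further includes the second node, and the second node is connected to the central system.
For example, when the node determines that it is in the master node working mode and detects that its connection to the central system is interrupted, it switches to the standby node working mode. In other words, if the first node is in the master node working mode, it switches to the standby node working mode when it detects that its connection to the central system is interrupted. It can be understood that a node in the master node working mode demotes itself to standby when it detects that its connection to the central system is interrupted.
For example, when the node determines that it is in the standby node working mode and detects that its connection to the central system is interrupted, it remains in the standby node working mode.
For example, when the node determines that it is in the standby node working mode, it listens for commands from the central system and switches to the master node working mode when it receives a master upgrade command from the central system. In other words, if the first node is in the standby node working mode, it switches to the master node working mode when it receives a master upgrade command from the central system. It can be understood that a node in the standby node working mode promotes itself to master after it receives a master upgrade command from the central system.
For example, listening for commands from the central system means being in a listening state or a receiving state.
Based on the foregoing technical solution, a node in the master node working mode demotes itself to standby when it detects that its connection to the central system is interrupted, and a node in the standby node working mode promotes itself to master after it receives a master upgrade command from the central system. Specifically, on the one hand, connectivity detection between the node in the master node working mode and the central system determines whether that node should be demoted (that is, whether the original master node becomes a standby node), which avoids the dual-master problem. This approach not only avoids dual masters but also avoids discovering only after a switchover that the network is unusable, which reduces the time spent on ineffective switchovers. On the other hand, integrating the arbitration capability into the central system avoids the no-master problem. For example, the central system can issue a master upgrade command to the node in the standby node working mode so that the original standby node switches to the master node working mode. No new arbitration node needs to be added; because the switchover capability is integrated into the central system, additional resource waste is reduced and the no-master problem is avoided. In addition, the central system can decide whether to issue a master upgrade command based on the state of its connection to the node in the master node working mode, which reduces unnecessary switchovers and saves switchover time. For example, the embodiments of this application can also be used in MEC, to solve the dual-master and no-master problems that may arise there.
With reference to the second aspect, in some implementations of the second aspect, the edge system further includes a second node connected to the central system, and the first node monitors the state of its connection to the second node; whether the connection between the first node and the second node is determined to be interrupted or connected, the current working mode of the first node is maintained.
For example, if the node detects that the connection to the second node is interrupted and also detects that the connection to the central system is interrupted, it switches to the standby node working mode if it is in the master node working mode, and remains in the standby node working mode if it is in the standby node working mode. If the node detects that the connection to the second node is interrupted but does not detect an interruption of the connection to the central system, it takes no action, whether it is in the master node working mode or the standby node working mode.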
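The decision rule in this example can be written as a single function. The following is a minimal sketch; `Mode` and `next_mode` are illustrative names that do not appear in the application. The peer-link result is accepted as an argument to make the point explicit that it never changes the outcome.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


def next_mode(current, peer_link_up, central_link_up):
    """Return the node's next working mode from the two link checks.

    The peer-link result is accepted but deliberately never changes the
    outcome: only the link to the central system can trigger a demotion.
    """
    if current is Mode.MASTER and not central_link_up:
        return Mode.STANDBY
    return current
```

A lost heartbeat alone therefore leaves a master as master, which is what prevents a standby node from promoting itself on a peer-link fault.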
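For example, the first node may periodically detect the connection to the second node. For example, after the system starts, the first node may periodically detect the connection to the second node.
For example, the first node detects whether the connection to the second node is normal by pinging the gateway IP address or by sending a message (such as a heartbeat signal).
With reference to the second aspect, in some implementations of the second aspect, the first node monitoring the state of its connection to the central system includes: the first node monitors the state of its connection to a gateway, where the gateway is between the first node and the central system.
For example, the first node periodically detects the connection to the gateway. For example, after the system starts, the first node may periodically detect the connection to the gateway.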
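A periodic gateway check might look like the following sketch. It is written under several assumptions that the application does not state: the probe is injected as a callable (for instance one that pings the gateway IP address), the link is treated as interrupted only after `max_failures` consecutive failed probes, and `max_checks` exists purely so the loop can be bounded in tests. The function and parameter names are hypothetical.

```python
import time


def monitor_gateway(probe, on_interrupted, interval_s=1.0, max_failures=3, max_checks=None):
    """Periodically call probe(); report an interruption after repeated failures.

    probe          -- callable returning True when the gateway answered
    on_interrupted -- callable invoked once the link is considered interrupted
    max_failures   -- consecutive failures tolerated (an assumed debounce value)
    max_checks     -- optional cap on iterations so the loop can be tested
    """
    failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if probe():
            failures = 0  # any successful probe resets the counter
        else:
            failures += 1
            if failures >= max_failures:
                on_interrupted()
                return True
        time.sleep(interval_s)
    return False
```

A node in the master node working mode could pass its own demotion routine as `on_interrupted`; requiring several consecutive failures is a design choice of this sketch that trades detection latency for fewer spurious demotions.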
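With reference to the second aspect, in some implementations of the second aspect, the gateway is in the edge system.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: when it is detected that the first node is in an active state, determining that the current working mode of the first node is the master node working mode, or, when it is detected that the first node is in an inactive state, determining that the current working mode of the first node is the standby node working mode; or, when it is detected that the service application on the first node is running, determining that the current working mode of the first node is the master node working mode, or, when it is detected that the service application on the first node is stopped, determining that the current working mode of the first node is the standby node working mode; or, when it is detected that no status identifier exists on the first node, determining that the current working mode of the first node is the master node working mode, or, when it is detected that a status identifier exists on the first node, determining that the current working mode of the first node is the standby node working mode.
For example, the active state may indicate whether the node can provide services externally. That is, the master node provides services externally. For example, when it is detected that the first node provides services externally, it is determined that the current working mode of the first node is the master node working mode; or, when it is detected that the first node cannot provide services externally, it is determined that the current working mode of the first node is the standby node working mode.
Based on the foregoing technical solution, a node can determine whether it is the master node or a standby node from the state of the node, the state of the service application on the node, or an identifier.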
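The three alternative criteria can be sketched as three small functions. The names are illustrative, and how each boolean input is obtained (for example, how the status identifier is stored) is left open because the application does not specify it.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


def mode_from_activity(is_active):
    # Active (able to provide services externally) means master.
    return Mode.MASTER if is_active else Mode.STANDBY


def mode_from_service_app(app_running):
    # A running service application means master; a stopped one means standby.
    return Mode.MASTER if app_running else Mode.STANDBY


def mode_from_status_identifier(identifier_present):
    # Note the inverted sense: the identifier marks a standby node,
    # so its absence means master.
    return Mode.STANDBY if identifier_present else Mode.MASTER
```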
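According to a third aspect, a node control method is provided. The method includes: a central system monitors the state of the connection to a first node, where the current working mode of the first node is the master node working mode; and when the connection between the central system and the first node is interrupted, the central system sends a master upgrade command to a second node, where the current working mode of the second node is the standby node working mode, and the master upgrade command is used to notify a switch from the standby node working mode to the master node working mode.
Based on the foregoing technical solution, the arbitration capability can be integrated into and provided by the central system. For example, the central system can decide whether the standby node should switch to the master node working mode based on the fault status of the service channel, rather than based on the connectivity between a newly added arbitration node and the edge device. Integrating the switchover capability into the central system therefore reduces additional resource waste. In addition, deciding whether to switch the master and standby nodes based on service channel faults reduces unnecessary switchovers and saves switchover time. For example, the embodiments of this application can also be used in MEC, to solve the "dual-master" and "no-master" problems that may arise there.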
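One arbitration round on the central system side can be sketched as follows. The function `arbitrate`, the command string `"MASTER_UPGRADE"`, and the injected `is_reachable` and `send_command` callables are all hypothetical. The branch that handles an unreachable standby is an assumed policy, since the application does not describe that case.

```python
def arbitrate(master, standby, is_reachable, send_command):
    """One arbitration round run by the central system.

    master, standby -- identifiers of the nodes currently in each mode
    is_reachable    -- callable(node_id) -> bool, the connectivity check
    send_command    -- callable(node_id, command), the command channel
    Returns the (master, standby) pair the central system assumes afterwards.
    """
    if is_reachable(master):
        return master, standby  # service channel healthy: no switchover
    if not is_reachable(standby):
        return master, standby  # assumed policy: nobody to promote, retry later
    send_command(standby, "MASTER_UPGRADE")
    return standby, master
```

Because the command is issued only when the link to the current master is interrupted, a healthy master is never displaced, which is how the sketch reflects the goal of avoiding unnecessary switchovers.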
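With reference to the third aspect, in some implementations of the third aspect, the central system monitoring the state of the connection to the first node includes: the central system detects whether the connection to the first node is normal by remotely logging in to the node using a Secure Shell (SSH) command.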
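An SSH-based reachability check could be sketched as below, assuming the OpenSSH client is installed on the central system and key-based login is configured. The user name `admin`, the remote no-op command `true`, the 30-second overall timeout, and the function names are assumptions of this sketch; the application only states that an SSH command is used to log in remotely.

```python
import subprocess


def build_ssh_probe(host, user="admin", timeout_s=5):
    """Build an OpenSSH command that logs in and runs the no-op `true`."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # never prompt for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up if the host is silent
        f"{user}@{host}",
        "true",
    ]


def node_reachable(host, run=subprocess.run, **kwargs):
    """Return True when the remote login succeeds (exit status 0)."""
    try:
        result = run(build_ssh_probe(host, **kwargs), capture_output=True, timeout=30)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

The `run` parameter defaults to `subprocess.run` and exists so the check can be exercised without a real network connection.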
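According to a fourth aspect, a node control method is provided, applied to a first node in an edge system, where the first node is connected to a central system. The method includes: the first node detects the connection to the central system; when the first node determines that it is in the master node working mode and detects that its connection to the central system is interrupted, it switches to the standby node working mode; when the first node determines that it is in the standby node working mode and detects that its connection to the central system is interrupted, it remains in the standby node working mode.
Based on the foregoing technical solution, connectivity detection between the node and the central system determines whether the node should be demoted (that is, whether the original master node becomes a standby node), which avoids the dual-master problem. For example, the node performs connectivity detection toward the central system; if that detection fails and the node is the master node, the master node can actively demote itself to a standby node. This approach not only avoids dual masters but also avoids discovering only after a switchover that the network is unusable, which reduces the time spent on ineffective switchovers. For example, the embodiments of this application can also be used in MEC, to solve the "dual-master" problem that may arise there.
With reference to the fourth aspect, in some implementations of the fourth aspect, the edge system further includes a second node, and the method further includes: the first node detects the connection to the second node; when it is detected that the connection between the first node and the second node is interrupted, no action is taken, whether the first node is in the master node working mode or the standby node working mode.
With reference to the fourth aspect, in some implementations of the fourth aspect, the first node detecting the connection to the central system includes: the first node detects the connection to a gateway, where the gateway is between the first node and the central system.
With reference to the fourth aspect, in some implementations of the fourth aspect, the gateway is located in the edge system.
According to a fifth aspect, a node control method is provided, applied to a first node in an edge system. The method includes: receiving a master upgrade command from a central system, where the master upgrade command is used to notify a switch from the standby node working mode to the master node working mode; and, when the first node determines that it is in the standby node working mode, switching to the master node working mode.
For example, when the node determines that it is in the master node working mode, it switches to the standby node working mode. For example, when a software fault occurs, the original master and standby nodes need to be switched over: the central system can connect to both nodes and issues the master upgrade command to both, so that the node originally in the master node working mode switches to the standby node working mode and the node originally in the standby node working mode switches to the master node working mode.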
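The role swap in this software-fault example can be sketched as follows. The function names and the dictionary of node modes are illustrative assumptions; the sketch follows the example above, in which a node that receives the command while in the master node working mode switches to standby.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


def on_master_upgrade_command(current):
    """Mode a node adopts after receiving the command, following the example above."""
    return Mode.MASTER if current is Mode.STANDBY else Mode.STANDBY


def switch_over(modes):
    """The central system reaches both nodes and sends the command to each."""
    return {name: on_master_upgrade_command(mode) for name, mode in modes.items()}
```

Applied to `{"node1": Mode.MASTER, "node2": Mode.STANDBY}`, `switch_over` returns the two roles exchanged, so exactly one master remains after the switchover.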
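Based on the foregoing technical solution, a node in the standby node working mode promotes itself to master after it receives a master upgrade command from the central system. In other words, service-linked detection can be performed, and the arbitration capability can be integrated into and provided by the central system. For example, the central system can decide whether the original standby node should switch to master based on the fault status of the service channel, rather than based on the connectivity between an arbiter and the edge device. Integrating the switchover capability into the central system therefore reduces additional resource waste. In addition, deciding whether to switch the master and standby nodes based on service channel faults reduces unnecessary switchovers and saves switchover time. For example, the embodiments of this application can also be used in MEC, to solve the "dual-master" and "no-master" problems that may arise there.
According to a sixth aspect, a node control system is provided. The node control system includes a first node, a second node, and a central system. The first node and the second node are located in an edge system and are both connected to the central system. The current working mode of the first node is the master node working mode, and the current working mode of the second node is the standby node working mode. The first node is configured to monitor the state of the connection to the central system and, when the connection between the first node and the central system is interrupted, switch to the standby node working mode. The central system is configured to monitor the state of the connection to the first node and, when the connection between the central system and the first node is interrupted, send a master upgrade command to the second node, where the master upgrade command is used to notify a switch from the standby node working mode to the master node working mode. The second node is configured to switch to the master node working mode according to the master upgrade command.
With reference to the sixth aspect, in some implementations of the sixth aspect, the first node is further configured to monitor the state of the connection to the second node; the first node is further configured to maintain the current working mode of the first node and/or the second node whether the connection between the first node and the second node is determined to be interrupted or connected.
With reference to the sixth aspect, in some implementations of the sixth aspect, the first node is specifically configured to monitor the state of the connection to a gateway, where the gateway is between the first node and the central system.
With reference to the sixth aspect, in some implementations of the sixth aspect, the gateway is located in the edge system.
According to a seventh aspect, a node control apparatus is provided, including modules or units for performing the method in any possible implementation of the first to fifth aspects.
According to an eighth aspect, a node control device is provided, including a processor. The processor is coupled to a memory and can be used to execute instructions in the memory to implement the method in the first to fifth aspects or in any possible implementation of the first to fifth aspects. Optionally, the device further includes the memory. Optionally, the device further includes a communication interface, the processor is coupled to the communication interface, and the communication interface is used to input and/or output information. The information includes at least one of instructions and data.
In one implementation, the device is a node, or the device is a central system. When the device is a node or a central system, the communication interface may be a transceiver or an input/output interface.
In another implementation, the device is a chip or a chip system. When the device is a chip or a chip system, the communication interface may be an input/output interface, an interface circuit, an output circuit, an input circuit, a pin, or a related circuit on the chip or chip system. The processor may also be embodied as a processing circuit or a logic circuit.
In another implementation, the device is a chip or chip system configured in a node, or the device is a chip or chip system configured in a central system.
Optionally, the transceiver may be a transceiver circuit. Optionally, the input/output interface may be an input/output circuit.
According to a ninth aspect, a processor is provided, including an input circuit, an output circuit, and a processing circuit. The processing circuit is configured to receive signals through the input circuit and transmit signals through the output circuit, so that the processor performs the method in any possible implementation of the first to fifth aspects.
In a specific implementation, the processor may be a chip, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be transistors, gate circuits, flip-flops, and various logic circuits. The input signal received by the input circuit may be received and input by, for example but not limited to, an input interface; the signal output by the output circuit may be output to and transmitted by, for example but not limited to, an output interface; and the input circuit and the output circuit may be the same circuit, used as the input circuit and the output circuit at different times. The embodiments of this application do not limit the specific implementation of the processor and the various circuits.
According to a tenth aspect, a processing apparatus is provided, including a processor and a memory. The processor is configured to read instructions stored in the memory, and can receive signals through an input interface and transmit signals through an output interface, to perform the method in any possible implementation of the first to fifth aspects. The output interface and the input interface may be collectively referred to as a communication interface.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory and the processor may be provided separately.
The processing apparatus in the tenth aspect may be a chip, and the processor may be implemented in hardware or in software. When implemented in hardware, the processor may be a logic circuit, an integrated circuit, or the like. When implemented in software, the processor may be a general-purpose processor that works by reading software code stored in a memory; the memory may be integrated in the processor, or may be located outside the processor and exist independently.
According to an eleventh aspect, a computer-readable storage medium is provided, storing a computer program that, when executed by a communication apparatus, causes the communication apparatus to implement the method in the first to fifth aspects or in any possible implementation of the first to fifth aspects.
According to a twelfth aspect, a computer program product containing instructions is provided; when the instructions are executed by a computer, an apparatus implements the method provided in the first to fifth aspects.
According to a thirteenth aspect, a node control system is provided, including the foregoing first node and second node; or including the foregoing first node, second node, and central system.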
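Brief Description of the Drawings
FIG. 1 is a schematic diagram of a network architecture applicable to the embodiments of this application;
FIG. 2 is a schematic diagram of an application scenario applicable to the embodiments of this application;
FIG. 3 is a schematic block diagram of a node control method provided by an embodiment of this application;
FIG. 4 is a schematic block diagram of a node control method provided by still another embodiment of this application;
FIG. 5 is a schematic block diagram of a node control method provided by yet another embodiment of this application;
FIG. 6 is a schematic block diagram of a node control method provided by another embodiment of this application;
FIG. 7 is a schematic diagram of a node control method applicable to the embodiments of this application;
FIG. 8 is a schematic block diagram of a node control apparatus provided by an embodiment of this application;
FIG. 9 is a schematic block diagram of a node control device provided by an embodiment of this application.
Detailed Description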
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例的技术方案可以应用于各种通信系统,例如:第五代(5th generation,5G)系统或新无线(new radio,NR)、长期演进(long term evolution,LTE)系统、LTE频分双工(frequency division duplex,FDD)系统、LTE时分双工(time division duplex,TDD)、通用移动通信系统(universal mobile telecommunication system,UMTS)等。
为便于理解本申请实施例,首先结合图1和图2详细说明适用于本申请实施例的网络架构。
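FIG. 1 is a schematic diagram of a network architecture applicable to the embodiments of this application. As shown in FIG. 1, the network architecture may be a two-layer architecture. The upper layer is a central system located in a central region, and the lower layer consists of edge systems located in individual cities. The central system may act as the central controller: the creation and deletion of services may be initiated by the central system, which may deliver service creation and deletion requests to each edge system, and the edge system then creates or deletes the service. In addition, the edge system also has certain service-processing capabilities, such as service query and fault handling.
The edge system further includes an edge node. An edge node is a service platform built at the network edge close to users. It provides storage, computing, network, and other resources, and moves some key service applications down to the edge of the access network, to reduce the bandwidth and latency losses caused by network transmission and multi-level forwarding.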
边缘节点可以包括一个或多个节点(node),节点,是虚拟机,包含了虚拟机里面所运行的系统、应用、数据库等。如图1所示,边缘节点,例如可以包括节点1和节点2。节点1和节点2可以是处于主、备关系的两个节点。节点1和节点2通常都会部署在不同的物理主机、不同的存储上,任何一个节点故障均不影响另外一个节点,另外一个节点仍然可继续运行。大部分情况下每台物理主机只有2个物理网卡,一个用于硬件管理(即网卡0,图中未标识),一个提供给运行在物理主机上的虚拟机使用(即网卡1)。例如图1中的网卡1-1和网卡1-2,其中,网卡1-1表示该节点1所在的物理主机1上的网卡1,网卡1-2表示该节点2所在的物理主机2上的网卡1。在节点1和节点2之间可以配置心跳通道,如图1中的心跳1,节点1和节点2可以通过检测之间的心跳通道,如心跳1,确定节点之间的连接是否中断。
在本申请实施例中,边缘节点和中心系统之间存在业务连接通道,从而中心系统可以通过管理通道来对边缘节点下发指令。如图1或图2,中心系统可以通过网关、交换机等对边缘节点下发指令。
图2是适用于本申请实施例的应用场景的一示意图。
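The embodiments of this application may be applied to a mobile edge computing (MEC) scenario. For example, the architecture shown in FIG. 1 may be applied to the MEC scenario. As shown in FIG. 2, the central system is a central virtualized network function manager (VNFM) device, and the edge system is an edge VNFM device.
The central VNFM device and the edge VNFM device may cooperate to create a virtualized network function (VNF). The VNF may also obtain virtual-resource information from the edge VNFM device, for example, including but not limited to virtual machine information and network information. If the VNF is faulty, the VNF may cooperate with the edge VNFM device to repair the VNF. For example, as shown in FIG. 2, the VNF may include a user plane function (UPF).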
应理解,图2中所示的VNF(UPF网元)可以为独立的设备,也可以集成于同一设备中实现不同的功能,本申请对此不做限定。
还应理解,上述命名仅为用于区分不同的功能,并不代表这些网元分别为独立的物理设备,本申请对于上述网元的具体形态不作限定,例如,可以集成在同一个物理设备中,也可以分别是不同的物理设备。此外,上述命名仅为便于区分不同的功能,而不应对本申请构成任何限定,本申请并不排除在5G网络以及未来其它的网络中采用其他命名的可能。例如,在6G网络中,上述各个网元中的部分或全部可以沿用5G中的术语,也可能采用其他名称等。在此进行统一说明,以下不再赘述。
还应理解,上述图1和图2仅是示例性说明,适用本申请实施例的网络架构并不局限于此,任何能够实现上述各个网元的功能的网络架构都适用于本申请实施例。例如,上述边缘系统中可以包括更多数量的备用节点。
还应理解,图1和图2以网关包含于边缘系统中为例,进行示例性说明,对此不作限定。例如,网关也可以包含于中心系统中。
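To keep the mobile bearer network from being reduced to a mere pipe, to integrate it deeply with mobile Internet and Internet of Things services, and thereby to increase the value of mobile network bandwidth, the European Telecommunications Standards Institute (ETSI) proposed MEC technology based on the 5G evolution architecture. On the one hand, MEC can improve user experience and save bandwidth resources; on the other hand, by moving computing capability down to mobile edge nodes and providing third-party application integration, it opens up broad possibilities for service innovation at the mobile edge.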
将计算能力下沉到移动边缘节点后,可以创造出一个具备高性能、低延迟与高带宽的电信级服务环境,加速网络中各项内容、服务及应用的分发和下载,让消费者享有更高质量网络体验。
Whether for MEC in fourth-generation (4G) mobile communication technology or for MEC in 5G, the orchestration capability of the VNFM is indispensable; the VNFM in turn needs to adapt to the service requirements of MEC.
以VNFM“双主”问题为例。“双主”问题表示,处于主、备关系的两台机器或两个节点,在某种条件触发下,均将自身变为主节点的现象。在该现象发生时,由于两个节点均以主节点身份在处理相同的业务,从而可能出现两个节点上的业务处理结果不一致的问题,主备模式双机需要尽量避免该情况的出现。可以理解,“无主”问题,即表示处于主、备关系的两台机器或两个节点,在某种条件触发下,均将自身变为备用节点的现象。
以图1或图2中的节点1和节点2为例,主备模式双机中的节点1和节点2通常都会部署在不同的物理主机、不同的存储上,任何一个节点故障均不影响另外一个节点,另外一个节点仍然可继续运行。
现有技术中,解决“双主”问题包括以下两种方式。
一种方式,通过检测节点之间的心跳通道,来避免出现“双主”问题。以图2所示的架构为例,在节点1和节点2之间配置多条心跳通道。具体地,只要有一条心跳通道还能连通,就不会出现双主现象,即主节点还是主节点,备用节点还是备用节点;当没有心跳通道能够连通的情况下,备用节点升为主节点。
通过该方式,硬件故障或软件故障,均将引起“双主”问题。例如,硬件故障将引起“双主”问题:虚拟化场景中同一个节点的多条心跳通道所通过的虚拟网卡实际会对应物理主机中同一个物理网卡(如图2的网卡1)。假设节点1为主节点、节点2为备用节点,物理主机网卡故障或者物理主机故障或者网络设备故障等,均会导致节点2无法检测到节点1的心跳,节点2将自身升为主节点,从而出现“双主”现象。又如,软件故障将引起“双主”问题:当主节点或者备用节点因为内部软件故障(非网卡、网络等因素引起的故障)无法检测到对端心跳,则会出现“双主”情况;即使增加多条心跳通道配置,也不能解决该场景下的双主问题。
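In another approach, the “dual-master” problem is avoided by adding an arbitration node. An arbitration node can be added in two ways. One is to add an arbitration node to each edge system and arbitrate locally, that is, the arbitration node added to each edge system decides which node of that edge system is the master. The other is to add an arbitration node at a position equivalent to that of the central VNFM, and this arbitration node decides which node in each edge system is the master.
This approach requires extra resources: additional computing, storage, and network resources are all needed. Moreover, if an arbitration node is on the same physical board as one of the nodes, a single point of failure on that board brings down both the arbitration node and that node. The remaining node then cannot be promoted to master because arbitration is unavailable, a “no-master” situation arises, and the service as a whole becomes unavailable.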
有鉴于此,本申请提供一种方式,不仅可以使用较低的成本解决“双主”问题,还可以避免“无主”问题的产生。
下面将结合附图详细说明本申请提供的各个实施例。
图3是本申请实施例提供的一种节点控制的方法300的示意性框图。方法300可以应用于节点控制系统,节点控制系统包括:第一节点、第二节点、中心系统,第一节点和第二节点位于边缘系统,且第一节点和第二节点均与中心系统连接。
其中,中心系统,即表示可以控制第一节点和第二节点所处的边缘系统的设备。如该中心系统可作为中心控制者,发起业务的创建和删除。且该中心系统可以将业务的创建和删除的请求下发到各个边缘系统,由边缘系统创建或者删除该业务。
示例地,该中心系统例如可以为如图1所述的中心系统,或者,该中心系统也可以为如图2所述的中心VNFM设备。对于中心系统的具体形式,本申请实施例不作限定。
应理解,中心系统仅是为区分不同功能做的命名,其命名不对本申请实施例的保护范围造成限定,在未来协议中,用于表示相同功能的命名,都落入本申请实施例的保护范围。
假设第一节点的当前工作模式为主节点工作模式,第二节点的当前工作模式为备用节点工作模式,方法300可以包括如下步骤。
310,第一节点监视与中心系统之间的连接状态,在第一节点与中心系统的连接中断时,切换到备用节点工作模式;
320,中心系统监视与第一节点之间的连接状态,在中心系统与第一节点的连接中断时,向第二节点发送升主命令,升主命令用于通知从备用节点工作模式切换到主节点工作模式;
330,第二节点根据升主命令切换到主节点工作模式。
通过本申请实施例,对于处于主节点工作模式的节点来说,在检测到自身与中心系统之间的连接中断的情况下,进行降备操作;对于处于备用节点工作模式的节点来说,在接收到中心系统的升主命令后,进行升主操作。具体地,一方面,通过处于主节点工作模式的节点与中心系统的连通性检测,来判断处于主节点工作模式的节点是否要降备(即原来的主节点降为备用节点),从而避免出现双主问题。通过该方式,不仅可以避免双主的出现,还可以避免倒换以后才发现网络无法使用的情况,减少了无效的倒换时间消耗。又一方面,通过将仲裁能力集成到中心系统,从而可以避免出现无主问题。例如,中心系统可以向处于备用节点工作模式的节点下发升主命令,使得原来的备用节点切换为主节点工作模式。从而,不需要增加新的仲裁节点,而是通过将倒换能力集成到中心系统,不仅可以减少额外的资源浪费,还可以避免出现无主问题。此外,中心系统可以根据与处于主节点工作模式的节点之间的连接情况来决策是否下发升主命令,可以减少不必要的倒换,节省倒换时间。
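The interaction of steps 310 to 330 can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the implementation described in this application: the class names `Node` and `CentralSystem`, the method names, and the function `simulate_master_link_failure` are all hypothetical, and link failures are modelled as plain method calls rather than real network events.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


@dataclass
class Node:
    """Hypothetical edge node that tracks only its working mode."""
    name: str
    mode: Mode

    def on_central_link_down(self) -> None:
        # Step 310: a master that loses its connection to the central
        # system switches itself to the standby node working mode.
        if self.mode is Mode.MASTER:
            self.mode = Mode.STANDBY

    def on_promote_command(self) -> None:
        # Step 330: a standby switches to the master node working mode
        # when it receives the promote-to-master command.
        if self.mode is Mode.STANDBY:
            self.mode = Mode.MASTER


class CentralSystem:
    """Hypothetical central system that acts as the arbiter."""

    def monitor(self, master_link_up: bool, standby: Node) -> None:
        # Step 320: if the connection to the current master is
        # interrupted, send the promote command to the standby node.
        if not master_link_up:
            standby.on_promote_command()


def simulate_master_link_failure() -> Tuple[Mode, Mode]:
    first = Node("first", Mode.MASTER)
    second = Node("second", Mode.STANDBY)
    central = CentralSystem()

    # The link between the first node and the central system fails;
    # both ends observe the failure independently.
    first.on_central_link_down()
    central.monitor(master_link_up=False, standby=second)
    return first.mode, second.mode
```

In this sketch the two sides never talk to each other directly: the old master demotes itself and the central system promotes the standby, so exactly one master remains after the failure.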
第一节点和第二节点属于边缘设备或者说边缘系统中的节点,该边缘设备或者说边缘系统中可以包括多个节点。如边缘设备或者说边缘系统中可以包括两个节点,例如记作第一节点和第二节点。或者,边缘设备或者说边缘系统中可以包括两个以上的节点,例如一个节点处于主节点工作模式,另外多个节点处于备用节点工作模式,对此不作限定。方法300主要以假设第一节点处于主节点工作模式,第二节点处于备用节点工作模式为例进行示例性说明。
应理解,关于节点的具体形式,如第一节点和第二节点,本申请实施例不作限定。例如,节点可以为虚拟机,其包含了虚拟机里面所运行的系统、应用、数据库等。其中,应用例如可以由管理应用和业务应用组成。
应理解,备用节点,是相对于主节点来说的,备用节点也可以称为备节点或者非主节点,其命名不对本申请实施例的保护范围造成限定,在未来协议中,用于表示相同功能的命名,都落入本申请实施例的保护范围。
还应理解,本申请实施例对节点的状态不作限定。以第一节点为例,该第一节点在当前处于主节点状态,在某个时段可能处于备用节点状态,并不限定该第一节点只能为主节点。
关于主节点,或者说,某个节点为主节点,或者说某个节点处于主节点工作模式,其均用于表示相同的含义,本领域技术人员应理解其含义。关于备用节点,或者说,某个节点为备用节点,或者说某个节点处于备用节点工作模式,其均用于表示相同的含义,本领域技术人员应理解其含义。下文统一用主节点和备用节点表述。
一示例,可以根据节点所处的状态,确定节点是主节点还是备用节点。例如,主节点表示处于活跃状态的节点,备用节点表示处于非活跃状态的节点。可以理解,如果节点处于活跃状态,则确定该节点为主节点;如果节点处于非活跃状态,则确定该节点为备用节点。
其中,活跃状态,例如可以表示是否能够对外提供服务。也就是说,主节点对外提供服务。例如,边缘系统安装时可以默认设置一节点为主节点,另一节点为备用节点,由主节点对外提供服务。
又一示例,可以根据节点上业务应用的状态,确定节点是主节点还是备用节点。例如,主节点中的系统、管理应用、业务应用、数据库等都可以处于运行状态;备用节点的系统、管理应用、数据库等可以处于运行状态,业务应用是停止状态。可以理解,如果节点上的业务应用处于运行状态,则确定该节点为主节点;如果节点上的业务应用处于不运行状态或者说停止状态,则确定该节点为备用节点。应理解,本申请实施例并未限定备用节点上的业务应用一定处于停止状态,例如,在热备模式下,主节点和备用节点上的系统、管理应用、业务应用、数据库等都可以处于运行状态。
又一示例,可以根据标识确定节点是主节点还是备用节点。可以采用记录标志位的方式来识别主备节点。例如,边缘系统安装过程中,可以在某一节点上预设存储路径下创建标识文件(即包含备用状态标志位,如standby_flag,的文件),以标识该节点为备用节点,而无该标识文件的节点则为主节点。可以理解,以标志位为standby_flag为例,如果节点上存在包含standby_flag标志位的文件,则确定该节点为备用节点;如果节点上不存在包含该标志位的文件,则确定该节点为主节点。应理解,关于通过标识识别主节点还是备用节点的方式很多,此处不再赘述,任何其他通过标识的方式识别主节点还是备用节点的方式,都落入本申请实施例的保护范围。
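The flag-based identification described above can be sketched as follows. The sketch assumes that the presence of a file named `standby_flag` in a preset directory is the marker; the text only says that the file contains the standby flag, so treating the file name as the flag, as well as the helper names `current_mode`, `set_mode`, and `demo`, are illustrative assumptions.

```python
import os
import tempfile
from typing import List

# Flag name taken from the example in the text; using it as a file name
# is an assumption made for this sketch.
STANDBY_FLAG = "standby_flag"


def current_mode(flag_dir: str) -> str:
    """Return "standby" if the flag file exists, otherwise "master"."""
    flag_path = os.path.join(flag_dir, STANDBY_FLAG)
    return "standby" if os.path.exists(flag_path) else "master"


def set_mode(flag_dir: str, mode: str) -> None:
    """Record the mode by creating or removing the flag file."""
    flag_path = os.path.join(flag_dir, STANDBY_FLAG)
    if mode == "standby":
        # Create an empty flag file (demotion to standby).
        with open(flag_path, "w", encoding="utf-8"):
            pass
    elif os.path.exists(flag_path):
        # Remove the flag file (promotion to master).
        os.remove(flag_path)


def demo() -> List[str]:
    """Walk through master -> standby -> master in a temporary directory."""
    with tempfile.TemporaryDirectory() as flag_dir:
        observed = [current_mode(flag_dir)]
        set_mode(flag_dir, "standby")
        observed.append(current_mode(flag_dir))
        set_mode(flag_dir, "master")
        observed.append(current_mode(flag_dir))
    return observed
```

A node without the flag file is treated as the master, which matches the default described for installation: only the standby node gets the flag file.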
应理解,上述示例仅是示例性说明,任何可以实现判断自身是主节点还是备节点的方式,都落入本申请实施例的保护范围。
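The embodiments of this application repeatedly mention “switching to the standby node working mode”, “being demoted from master node to standby node”, or “demotion to standby”; a person skilled in the art should understand their meaning. They all indicate that a node switches from its former master node mode to the standby node mode; or that the node starts to process services as a standby node; or, equivalently, that the node starts to exist in the edge system as a standby node, for example, the node temporarily stops processing services (for example, the node stops all applications on it). This is not repeated below.
The embodiments of this application repeatedly mention “switching to the master node working mode”, “being promoted from standby node to master node”, or “promotion to master”; a person skilled in the art should understand their meaning. They all indicate that a node switches from its former standby node mode to the master node mode; or that the node starts to process services as the master node; or, equivalently, that the node starts to exist in the edge system as the master node, for example, the node starts to process services. This is not repeated below.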
可以理解,各个节点管理应用控制外部访问边缘系统的通信通道,并可以记录主备模式,当进行升主或者降备等操作时会更新主备模式。
可选地,第一节点可以检测与网关的连接,以便确定与中心系统的连接状态。
例如,第一节点检测与网关的连接为中断时,确定与中心系统的连接状态为中断;第一节点检测与网关的连接为连通时,确定与中心系统的连接状态为连通。下文主要以检测与网关的连接为例进行示例性说明,应理解,任何可以确定节点与中心系统的连通性的方式,都适用于本申请实施例。
示例地,网关可以位于边缘系统中,例如图1或图2所示;或者,网关也可以位于中心系统,对此不作限定。
应理解,关于网关的具体形式,本申请实施例不作限定。例如,网关可以为交换机中的网间连接器或者说网络设备。或者,网关也可以为路由设备中的网间连接器或者说网络设备。网关与节点之间的连接,即可以表示节点通过连接网关,以便节点可以运行。如节点通过连接到网关,以便可以运行程序或数据库等应用。
可选地,关于节点与中心系统(如网关)之间检测的时机,本申请实施例不作限定。
一种可能的实现方式,第一节点可以周期性地检测与中心系统之间的连接。或者说,第一节点可以定期检测与中心系统之间的连接。例如,在系统启动后,第一节点可以周期性地检测与中心系统之间的连接。
示例地,第一节点按照第一预设时间,周期性地检测与中心系统之间的连接。
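The first preset time may be, for example, a configured duration; or it may be a predefined duration, for example, one predefined by a protocol or specified in advance by the central system; or it may be a duration determined from historical detection results. This is not limited.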
例如,该第一预设时间可以是一分钟。即第一节点可以每隔一分钟检测与中心系统之间的连接。
应理解,第一节点也可以不定期地检测与中心系统之间的连接,对此本申请实施例不作限定。
可选地,关于节点与中心系统(如网关)之间检测的方式,本申请实施例不作限定。
一种可能的实现方式,第一节点可以通过发送消息的方式,确定该第一节点与中心系统连接是否正常。例如,第一节点向中心系统发送消息(如心跳消息或者心跳信号等),如果第一节点接收到中心系统回复的消息(如确认(acknowledge,ACK)消息),则可以确定与中心系统之间的连接正常;如果第一节点未接收到中心系统回复的消息,则可以确定与中心系统之间的连接不正常。
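In another possible implementation, the first node may detect whether its connection to the central system (for example, via the gateway) is normal by pinging the IP address of the gateway.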
例如,如果ping命令结果返回0,则第一节点与中心系统(如网关)之间连接正常;如果ping命令结果返回了其他数值,则第一节点与中心系统(如网关)之间连接故障(或者说连接中断)。
示例地,ping命令的相关参数可以包括但不限于:指定连续执行次数、每次执行ping的间隔时间、执行超时时间。
其中,指定连续执行次数,例如可以表示连续执行的次数,如连续执行通过ping网关IP地址的方式检测的次数。指定连续执行次数,可以是配置的,也可以是预先定义的,对此不作限定。例如,可以设置连续执行次数为A,其中,A为大于1或等于1的整数。例如A为5,即连续执行次数为5次。
其中,每次执行ping的间隔时间,例如可以表示相邻两次执行ping的时间间隔,如相邻两次执行通过ping网关IP地址的方式检测的时间间隔。每次执行ping的间隔时间,可以是配置的,也可以是预先定义的,对此不作限定。例如,可以默认每次执行ping的间隔时间为T1,其中,T1为大于0的数。例如T1为1秒,即每次执行ping的间隔时间为1秒。
其中,执行超时时间,例如可以表示执行ping的超时时间。执行超时时间,可以是配置的,也可以是预先定义的,对此不作限定。例如,可以默认执行超时时间为T2,其中,T2为大于0的数。例如T2为4秒,即执行超时时间为4秒。
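The ping-based check with its three parameters can be sketched as below. This is a hedged illustration that assumes the Linux iputils `ping` utility, whose `-c`, `-i`, and `-W` options set the packet count, the interval between packets, and the per-reply wait time in seconds. Mapping the "execution timeout" T2 to `-W` is an assumption, and the function names and the example address are hypothetical.

```python
import subprocess
from typing import List


def build_ping_command(gateway_ip: str, count: int = 5,
                       interval_s: float = 1.0, timeout_s: int = 4) -> List[str]:
    """Build a ping command from the parameters A, T1, and T2 in the text."""
    if count < 1:
        raise ValueError("count (A) must be an integer >= 1")
    if interval_s <= 0 or timeout_s <= 0:
        raise ValueError("interval (T1) and timeout (T2) must be > 0")
    return ["ping",
            "-c", str(count),        # A: number of consecutive pings
            "-i", str(interval_s),   # T1: interval between pings, in seconds
            "-W", str(timeout_s),    # T2: time to wait for a reply, in seconds
            gateway_ip]


def gateway_reachable(gateway_ip: str) -> bool:
    """Return True if ping exits with 0, i.e. the connection is normal."""
    result = subprocess.run(build_ping_command(gateway_ip),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    # Exit code 0 means normal; any other value is treated as a fault.
    return result.returncode == 0
```

With the example values from the text (A = 5, T1 = 1 second, T2 = 4 seconds), `build_ping_command("192.0.2.1")` yields the argument list for `ping -c 5 -i 1.0 -W 4 192.0.2.1`.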
应理解,上述方式仅是示例性说明,任何可以使得节点检测与中心系统之间的连接的方式,都落入本申请实施例的保护范围。
如前所述,在本申请实施例中,中心系统也可以检测与第一节点的连接状态。关于中心系统与节点之间检测的方式,本申请实施例不作限定。
一种可能的实现方式,中心系统可以通过安全外壳协议(secure shell,SSH)命令远程登录节点方式检测与边缘节点(如第一节点)之间的连接是否正常。如果SSH命令结果返回0,则表示中心系统与边缘节点(如第一节点)之间的连接正常;如果SSH命令返回了其他数字,则表示中心系统与边缘节点(如第一节点)之间的连接故障。
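The SSH-based check can be sketched in the same way. The sketch assumes the OpenSSH client and uses its `BatchMode` and `ConnectTimeout` options so that the probe never blocks on a password prompt; the user name `probe`, the remote command `true`, and the function names are hypothetical choices, not part of this application.

```python
import subprocess
from typing import List


def build_ssh_probe(host: str, user: str = "probe",
                    connect_timeout_s: int = 5) -> List[str]:
    """Build an SSH command that logs in and runs a no-op command."""
    return ["ssh",
            "-o", "BatchMode=yes",                        # never prompt interactively
            "-o", f"ConnectTimeout={connect_timeout_s}",  # bound the connection attempt
            f"{user}@{host}",
            "true"]                                       # remote no-op; exits with 0


def node_reachable(host: str) -> bool:
    """Return True if the SSH command exits with 0, i.e. the connection is normal."""
    result = subprocess.run(build_ssh_probe(host),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    # Exit code 0 means normal; any other number is treated as a fault.
    return result.returncode == 0
```

Because the remote command is a no-op, a non-zero exit status here reflects a login or connection failure rather than a failure of the command itself.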
应理解,本申请实施例对中心系统检测与边缘节点(如第一节点)之间的连接是否正常的方式,不作限定,任何可以实现中心系统检测与边缘节点之间的连接是否正常的方式,都落入本申请实施例的保护范围。例如,中心系统可以采用如上文所述的ping方式检测与边缘节点之间的连接是否正常。又如,也可以通过向边缘节点发送消息的方式确定连接是否正常。
关于中心系统与边缘节点(如第一节点)之间检测的时机,本申请实施例不作限定。
例如,当中心系统收到“unavailable”的错误码后,中心系统可以检测与边缘节点(如第一节点)之间的连接是否正常。一种可能的情况,当边缘设备出现“无主状态”时,所有的节点上面的程序都被停止,会导致中心系统和边缘设备已建立的通道,如已建立的基于谷歌(google)远程过程调用(remote procedure calls,RPC)(gRPC)框架开发的通道被迫中断,从而中心系统会收到“unavailable”的错误码。
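It should be understood that the “no-master state” mentioned repeatedly in the embodiments of this application means that all of the nodes are standby nodes, in other words all are in the standby node working mode, and there is no master node. Put differently, every node in the edge system is a standby node and no master node exists; for example, the programs on all nodes in the edge system have been stopped.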
其中,gRPC通道,表示中心系统和边缘设备之间通信采用了gRPC框架,协议为HTTP/2(超文本传输协议(Hypertext Transfer Protocol,HTTP)),中心系统和边缘设备建立通信以后,该通信通道一直存在。应理解,gRPC通道仅是一种示例说明,本申请实施例并未限定中心系统和边缘设备之间只能使用gRPC通道通信。
中心系统检测到与第一节点所处的边缘设备(或者说边缘系统)的业务连接通道中断的情况下,则中心系统可以对边缘设备下发升主命令。相应地,第二节点接收该升主命令。
应理解,当边缘系统中处于备用节点工作模式的节点有多个时,中心系统可以向其中任一节点发送升主命令。
升主命令用于通知从备用节点工作模式切换到主节点工作模式,即升主命令用于指示切换到主节点工作模式,或者说,升主命令用于指示原来的备用节点切换到主节点工作模式。应理解,升主命令仅是为区分不同功能做的命名,其命名不对本申请实施例的保护范围造成限定。升主命令例如也可以称为倒换指令或者强制升主命令或者强制命令,或者对于主节点来说,升主命令也可以替换为降备命令。在未来协议中,用于表示相同功能的命名,都落入本申请实施例的保护范围。下文为统一,用升主命令表述。
中心系统下发升主命令到第二节点,或者说,中心系统向该第二节点下发:将自身由备用节点切换为主节点的命令。第二节点接收到该升主命令后,根据该升主命令,从备用节点切换为主节点。示例地,应用启动以后,边缘设备可以主动建立与中心系统的gRPC连接,gRPC通道建立成功后,升主过程结束。
可选地,在本申请实施例中,节点之间也可以进行检测。
例如,第一节点监视与第二节点之间的连接状态;确定第一节点与第二节点之间的连接中断或连通时,均维持第一节点的当前工作模式。
又如,第二节点监视与第一节点之间的连接状态;确定第二节点与第一节点之间的连接中断或连通时,均维持第二节点的当前工作模式。
应理解,不管第一节点与第二节点之间的连接是否发生故障,无论是处于主节点工作模式的第一节点,还是处于备用节点工作模式的第二节点,均不作处理,或者说均保持当前的模式。
在本申请实施例中,节点之间的检测,如心跳检测,结合节点与中心系统(如网关)之间的检测,共同来确定主节点是否要进行降备操作,从而可以避免出现双主问题。此外,节点根据节点与中心系统之间的连接,来确定是否要降备,所以即使出现硬件故障(如物理主机网卡故障或者物理主机故障或者网络设备故障等),或者出现,主节点或者备用节点因为内部软件故障(非网卡、网络等因素引起的故障)无法检测到对端心跳,也不会出现误判使得出现双主问题。
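Conversely, if a standby node were promoted to master whenever a node detected a failure of the connection between the nodes, the dual-master problem could occur. For example, a hardware fault (such as a fault of the physical host's network interface card, of the physical host itself, or of a network device) or a software fault (such as an internal software fault not caused by the network interface card or the network) can each cause a node to detect that the inter-node connection has failed. If the standby node were promoted in that situation, two master nodes could exist (the dual-master problem): the node that was already master, and the node promoted from standby. Therefore, in the embodiments of this application, after a node detects that the inter-node connection has failed, the node takes no action regardless of whether it is the master node or a standby node, so as to avoid the dual-master problem.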
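The contrast between the two reactions to a lost heartbeat can be made concrete with a short sketch. Both policy functions and the helper `masters_after_heartbeat_loss` are hypothetical names introduced for illustration; the scenario assumes that only the heartbeat channel fails while both nodes keep running.

```python
from typing import Callable


def promote_on_heartbeat_loss(mode: str, peer_heartbeat_ok: bool) -> str:
    """Heartbeat-only policy: a standby promotes itself when the heartbeat is lost."""
    if mode == "standby" and not peer_heartbeat_ok:
        return "master"
    return mode


def keep_mode_on_heartbeat_loss(mode: str, peer_heartbeat_ok: bool) -> str:
    """Policy of the embodiments: the heartbeat result never changes the mode."""
    return mode


def masters_after_heartbeat_loss(policy: Callable[[str, bool], str]) -> int:
    """Count the masters after the heartbeat between a master and a standby is lost."""
    modes = [policy("master", False), policy("standby", False)]
    return modes.count("master")
```

Under the heartbeat-only policy the count is 2 (dual-master), whereas keeping the current mode leaves exactly one master; promotion is then left to the command from the central system.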
可选地,关于节点之间检测的时机,本申请实施例不作限定。下面主要以第一节点检测为例进行示例性说明。
一种可能的实现方式,第一节点可以周期性地检测与第二节点之间的连接。或者说,第一节点可以定期检测与第二节点之间的连接。例如,在系统启动后,第一节点可以周期性地检测与第二节点之间的连接。
示例地,第一节点按照第二预设时间,周期性地检测与第二节点之间的连接。
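The second preset time may be, for example, a configured duration; or it may be a predefined duration, for example, one predefined by a protocol or specified in advance by the central system; or it may be a duration determined from historical detection results. This is not limited. The first preset time and the second preset time may be the same or different, and may or may not be related to each other; this is not limited.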
例如,该第二预设时间可以是一分钟。即第一节点可以每隔一分钟检测与第二节点之间的连接。
应理解,第一节点也可以不定期地检测与第二节点之间的连接,对此本申请实施例不作限定。
可选地,关于节点之间检测的方式,本申请实施例不作限定。
一种可能的实现方式,通过发送消息(如心跳信号)的方式来检测节点之间的连接。例如,只要第一节点可以检测到第二节点的心跳,就可以认为第一节点与第二节点之间的连接正常,当没有心跳通道能够连通的情况下,可以认为第一节点与第二节点之间的连接不正常。
又一种可能的实现方式,第一节点可以通过ping网关IP地址的方式,检测与第二节点连接是否正常。
例如,如果ping命令结果返回0,则第一节点与第二节点之间连接正常;如果ping命令结果返回了其他数值,则第一节点与第二节点之间连接故障(或者说连接中断)。
示例地,ping命令的相关参数可以包括但不限于:指定连续执行次数、每次执行ping的间隔时间、执行超时时间。
其中,指定连续执行次数,例如可以表示连续执行的次数,如连续执行通过ping网关IP地址的方式检测的次数。指定连续执行次数,可以是配置的,也可以是预先定义的,对此不作限定。例如,可以设置连续执行次数为B,其中,B为大于1或等于1的整数。例如B为5,即连续执行次数为5次。其中,B和A可以相同也可以不同,两者可以有关系,也可以没有关系,对此不作限定。
其中,每次执行ping的间隔时间,例如可以表示相邻两次执行ping的时间间隔,如相邻两次执行通过ping网关IP地址的方式检测的时间间隔。每次执行ping的间隔时间,可以是配置的,也可以是预先定义的,对此不作限定。例如,可以默认每次执行ping的间隔时间为t1,其中,t1为大于0的数。例如t1为1秒,即每次执行ping的间隔时间为1秒。其中,t1和T1可以相同也可以不同,两者可以有关系,也可以没有关系,对此不作限定。
其中,执行超时时间,例如可以表示执行ping的超时时间。执行超时时间,可以是配置的,也可以是预先定义的,对此不作限定。例如,可以默认执行超时时间为t2,其中,t2为大于0的数。例如t2为4秒,即执行超时时间为4秒。其中,t2和T2可以相同也可以不同,两者可以有关系,也可以没有关系,对此不作限定。
应理解,上述方式仅是示例性说明,任何可以使得节点检测与节点之间的连接的方式,都落入本申请实施例的保护范围。例如,节点之间也可以通过检测节点之间的心跳通道来确定节点之间的连接是否正常,如可以通过发送心跳信号来检测节点之间的心跳通道。
还应理解,各个步骤之间,没有严格的先后顺序。例如,步骤320和步骤330在步骤310之前,即第二节点可以先接收来自中心系统的升主命令,并根据该升主命令进行相应的处理。或者,步骤310在步骤320和步骤330之前,即第一节点也可以先根据与中心系统之间的连接情况进行相应的处理。
上文方法300中,假设第一节点和第二节点当前的工作模式。为不失一般性,下文以节点#1为例进行说明。
图4是本申请实施例提供的一种节点控制的方法400的示意性框图。方法400可以包括如下步骤。
410,节点#1检测与中心系统之间的连接。
示例地,节点#1可以为主节点,或者说,节点#1处于主节点工作模式;或者,节点#1可以为备用节点,或者说,节点#1处于备用节点工作模式。
应理解,在节点#1为备用节点时,节点#1也可以不用检测与中心系统的连接,对此不作限定。
一种可能的实现方式,节点#1检测与网关之间的连接。具体地,可以参考上文方法300中第一节点检测与网关的连接的描述,此处不再赘述。
420,节点#1确定自身处于主节点工作模式的情况下,在检测到自身与中心系统之间的连接中断时,切换到备用节点工作模式。
一种情况,节点#1为主节点。如果节点#1检测到与中心系统之间的连接故障,或者说,节点#1检测到与中心系统之间的连接中断,那么节点#1从主节点工作模式切换到备用节点工作模式,或者说节点#1切换为备用节点。例如,节点#1可以停止节点#1上面的程序、数据库等应用。
又一种情况,节点#1为备用节点。如果节点#1检测到与中心系统之间的连接故障,或者说,节点#1检测到与中心系统之间的连接中断,那么节点#1保持备用节点工作模式,或者说节点#1保持备用节点不变。
430,节点#1确定自身处于备用节点工作模式的情况下,监听来自中心系统的命令,并在接收到来自中心系统的升主命令时,切换到主节点工作模式,升主命令用于通知从备用节点工作模式切换到主节点工作模式。
在节点#1未检测到与中心系统之间的连接故障的情况下,或者说,在节点#1与中心系统之间的连接没有断开的情况下,节点#1可能会收到中心系统下发的升主命令。
一种可能的实现方式,中心系统检测到与节点#1所处的边缘设备(或者说边缘系统)的业务连接通道中断的情况下,则中心系统可以通过管理通道对边缘设备下发升主命令。
可以理解,中心系统可以根据业务情况确定是否要发送升主命令。
示例地,中心系统可以检测与边缘设备的节点(如当前处于主节点工作模式的节点)之间的连接是否正常。在中心系统检测到与当前处于主节点工作模式的节点之间的连接中断的情况下,中心系统可以向当前处于备用节点工作模式的节点下发升主命令。
关于中心系统与节点之间检测的方式,可以参考方法300的描述,此处不再赘述。
中心系统检测到与节点#1所处的边缘设备(或者说边缘系统)的业务连接通道中断的情况下,则中心系统可以对边缘设备下发升主命令。相应地,边缘设备的节点接收该升主命令。
一种情况,节点#1为主节点,或者说,节点#1处于主节点工作模式。在该情况下,由于节点#1所处的边缘设备(或者说边缘系统)的业务连接通道已中断,即节点#1上面的程序都被停止或者可以理解为节点#1已处于备用节点工作模式,故节点#1不会接收到升主命令。
又一种情况,节点#1为备用节点,或者说,节点#1处于备用节点工作模式。在该情况下,中心系统下发升主命令到该节点#1,或者说,中心系统向该节点#1下发:将自身由备用节点切换为主节点的命令。节点#1接收到该升主命令后,根据该升主命令,从备用节点切换为主节点。示例地,应用启动以后,边缘设备可以主动建立与中心系统的gRPC连接,gRPC通道建立成功后,升主过程结束。
在本申请实施例中,一方面,可以进行连通性检测。通过节点与中心系统(如网关)的连通性检测,来判断节点是否要降备(即原来的主节点降为备用节点),从而避免出现双主问题。通过该方式,不仅可以避免双主的出现,还可以避免倒换以后才发现网络无法使用的情况,减少了无效的倒换时间消耗。又一方面,可以进行业务联动检测。通过将仲裁能力集成到中心系统,从而可以避免出现无主问题。例如,中心系统可以向处于备用节点工作模式的节点下发升主命令,使得原来的备用节点切换为主节点工作模式。从而,不需要增加新的仲裁节点,而是通过将倒换能力集成到中心系统,不仅可以减少额外的资源浪费,还可以避免出现无主问题。
应理解,上述各个步骤仅是示例说明,本申请实施例并未限定于此。例如,在节点#1当前的工作模式为备用节点工作模式时,可以仅执行步骤430。又如,在节点#1当前的工作模式为主节点工作模式时,可以仅执行步骤410和420。
还应理解,节点#1判断自身处于主节点工作模式还是备用节点工作模式,与节点#1检测与中心系统的连接是否中断之间,没有严格的先后顺序。例如,节点#1可以先判断自身处于主节点工作模式还是备用节点工作模式,然后再检测与中心系统的连接是否中断;或者,节点#1也可以先检测与中心系统的连接是否中断,然后再判断自身处于主节点工作模式还是备用节点工作模式。
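Steps 410 to 430 amount to a small decision rule that each node can evaluate on every detection cycle. The pure function below is a minimal sketch of that rule; the name `next_mode` and the string encoding of the modes are assumptions made for illustration.

```python
def next_mode(mode: str, central_link_up: bool, promote_received: bool) -> str:
    """Return the node's next working mode for one detection cycle."""
    if mode == "master":
        # Steps 410 and 420: a master demotes itself only when its
        # connection to the central system is interrupted.
        return "master" if central_link_up else "standby"
    # Step 430: a standby changes mode only on the promote-to-master
    # command; the state of its own link never promotes it.
    if promote_received:
        return "master"
    return "standby"
```

For example, `next_mode("master", central_link_up=False, promote_received=False)` gives `"standby"`, while a standby whose link is down simply stays `"standby"`, which matches the two cases described for step 420.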
可选地,方法400还可以包括步骤401。
401,节点#1检测与节点#2之间的连接。
示例地,节点#1为主节点、节点#2为备用节点;或者,节点#1为备用节点、节点#2为主节点。
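It should be understood that, regardless of whether the connection between node #1 and node #2 fails, neither node #1 nor node #2 takes any action; in other words, both keep their current mode.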
关于节点之间的检测,可以参考上文方法300的描述,此处不再赘述。
应理解,步骤401和步骤410之间没有先后顺序的关系。例如,在系统启动后,节点#1可以定期地检测与节点#2之间的连接,节点#1可以定期地检测与网关之间的连接。
上文结合方法300和方法400,介绍了连通性检测(如步骤410和步骤420)结合业务联动检测(如步骤430)的方案,通过该方案既可以解决无主问题,也可以解决双主问题。下面结合图5和图6分别介绍连通性检测的方案和业务联动检测的方案。应理解,方法500和方法600所述的方案可以结合使用(如方法300或方法400所述),也可以各自单独使用,对此不作限定。
图5是本申请实施例提供的一种节点控制的方法500的示意性框图。方法500可以包括如下步骤。
510,节点#1检测与中心系统之间的连接。
示例地,节点#1可以为主节点,或者说,节点#1处于主节点工作模式;或者,节点#1可以为备用节点,或者说,节点#1处于备用节点工作模式。
节点与中心系统之间检测的时机以及方式,均可以参考方法300的描述,此处不再赘述。
520,在节点#1检测到与中心系统之间的连接中断的情况下,确定节点#1处于主节点工作模式时,切换到备用节点工作模式;确定节点#1处于备用节点工作模式时,保持备用节点工作模式。
一种可能的实现方式,节点#1可以检测与网关之间的连接。也就是说,在节点#1检测到与网关之间的连接中断的情况下,判断节点#1处于主节点工作模式时,切换到备用节点工作模式;判断节点#1处于备用节点工作模式的情况下,保持备用节点工作模式。
关于节点与网关之间检测的方案,可以参考方法300中的描述,此处不再赘述。
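In one case, node #1 is the master node, that is, node #1 is in the master node working mode. If node #1 detects that its connection to the central system is faulty, in other words interrupted, node #1 switches from the master node working mode to the standby node working mode, that is, node #1 becomes a standby node. For example, node #1 may stop the programs, databases, and other applications running on it.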
又一种情况,节点#1为备用节点,或者说,节点#1处于备用节点工作模式。如果节点#1检测到与中心系统之间的连接故障,或者说,节点#1检测到与中心系统之间的连接中断,那么节点#1保持备用节点工作模式,或者说节点#1保持备用节点不变。
在本申请实施例中,通过节点与中心系统(如网关)的连通性检测,来判断节点是否要降备(即原来的主节点降为备用节点),从而避免出现双主问题。例如,在连通检测故障的情况下,即主节点检测与中心系统之间的连接出现故障或者说中断的情况下,可以主动对节点进行降备操作,即主节点可以主动降为备用节点。通过该方式,不仅可以避免双主的出现,还可以避免倒换以后才发现网络无法使用的情况,减少了无效的倒换时间消耗。
可选地,方法500还可以包括步骤501。
501,节点#1检测与节点#2之间的连接。
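It should be understood that, regardless of whether the connection between node #1 and node #2 fails, neither node #1 nor node #2 takes any action; in other words, both keep their current mode.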
关于节点之间检测的方案,可以参考方法300中的描述,此处不再赘述。
应理解,步骤501和步骤510之间没有先后顺序的关系。例如,在系统启动后,节点#1可以定期地检测与节点#2之间的连接,节点#1可以定期地检测与中心系统之间的连接。
还应理解,在方法500中,关于升主的触发条件,如备用节点何时升为主节点,不作限定。任何可以使得主节点降为备用节点后、边缘系统中存在一个主节点(如备用节点升为主节点)的方案,均适用于本申请实施例。
基于上述技术方案,通过节点与中心系统的连通性检测,来判断节点是否要降备(即原来的主节点降为备用节点),从而避免出现双主问题。或者,结合节点之间的检测与节点和中心系统之间的检测,共同来判断节点是否要降备,从而避免出现双主问题。例如,在中心系统连通检测故障的情况下,即主节点检测与中心系统之间的连接出现故障或者说中断的情况下,可以主动对节点进行降备操作,即主节点可以主动降为备用节点。通过该方式,不仅可以避免双主的出现,还可以避免倒换以后才发现网络无法使用的情况,减少了无效的倒换时间消耗。示例地,本申请实施例还可以用于MEC技术中,从而可以解决MEC技术中可能出现的“双主”问题。
上文结合方法500介绍了连通性检测机制,即节点与中心系统之间连通性的检测,下文结合方法600介绍业务联动检测。应理解,方法500和方法600所述的方案可以结合使用,也可以各自单独使用,对此本申请实施例不作限定。
图6是本申请实施例提供的一种节点控制的方法600的示意性框图。方法600可以包括如下步骤。
610,节点#1接收来自中心系统的升主命令,升主命令用于通知从备用节点工作模式切换到主节点工作模式。
示例地,节点#1可以为主节点,或者说,节点#1处于主节点工作模式;或者,节点#1可以为备用节点,或者说,节点#1处于备用节点工作模式。
关于中心系统和升主命令可以参考方法300中的描述,此处不再赘述。
一种可能的实现方式,中心系统检测到与节点#1所处的边缘设备(或者说边缘系统)的业务连接通道中断的情况下,则中心系统可以通过管理通道对边缘设备下发升主命令。
可以理解,中心系统可以根据业务情况确定是否要发送升主命令。
示例地,中心系统可以检测与边缘设备的节点(如节点#1)之间的连接是否正常。
关于中心系统与节点之间检测的时机以及方式,均可以参考方法300中的描述,此处不再赘述。
620,节点#1确定自身处于备用节点工作模式的情况下,切换到主节点工作模式。
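For example, when node #1 determines that it is in the master node working mode, it switches to the standby node working mode. It can be understood that, in some cases, the central system may decide on a master/standby switchover in the edge system. For example, when a software fault occurs, the original master and standby nodes need to be switched over: the central system can connect to both nodes and delivers the promote-to-master command (which may also be called a switchover command) to both, so that the node originally in the master node working mode switches to the standby node working mode and the node originally in the standby node working mode switches to the master node working mode.
In one possible case, the central system can connect to only one node (for example, node #1, which is in the standby node working mode). In this case, the central system delivers the promote-to-master command to that node; in other words, it delivers to that node a command to switch itself from standby node to master node. For example, after the applications start, the edge device may actively establish a gRPC connection to the central system; once the gRPC channel is established successfully, the promotion process ends.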
又一种可能的情况,中心系统能连接到两个节点(如节点#1和节点#2)。在该情况下,中心系统下发升主命令(或者称为倒换命令)到两个节点,或者说,中心系统向该两个节点下发:将原有主节点切换为备用节点、原有备用节点切换为主节点的命令。节点上的双机监控可以根据自身主备情况进行操作,例如,如果本节点是主节点,则本节点切换到备用节点工作模式;如果本节点是备用节点,则本节点切换到主节点工作模式。示例地,边缘设备强制倒换成功后,边缘设备可以主动建立与中心系统的gRPC连接,gRPC通道建立成功后,强制倒换结束。关于该情况,一种可能的场景是,软件发生故障,原来的主备节点需要倒换,即中心系统能够连接到两个节点,且中心系统向两个节点下发升主命令,使得原有的主节点切换为备用节点,原有的备用节点切换为主节点。
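The node-side handling of the command in the two cases above can be sketched as follows. The sketch assumes that each node that receives the command simply flips its own mode, as described for the dual-node monitor; the function names and the dictionary representation of the edge system are hypothetical.

```python
from typing import Dict


def apply_switch_command(mode: str) -> str:
    """Flip the local mode: a master becomes standby, a standby becomes master."""
    return "standby" if mode == "master" else "master"


def forced_switchover(reachable_nodes: Dict[str, str]) -> Dict[str, str]:
    """Apply the command on every node that the central system can reach.

    `reachable_nodes` maps a node name to its current working mode.
    """
    return {name: apply_switch_command(mode)
            for name, mode in reachable_nodes.items()}
```

When only a standby node is reachable, the result is a plain promotion of that node; when a master and a standby are both reachable, the two roles are exchanged.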
在本申请实施例中,仲裁能力可以集成到中心系统提供。例如,中心系统可以根据业务通道的故障情况来决策是否进行倒换(即原来的主节点切换为备用节点,原来的备用节点切换为主节点),而不是根据仲裁与边缘设备的连通性来检测。从而,通过将倒换能力集成到中心系统,可以减少额外的资源浪费。此外,根据业务通道故障情况来决策主备节点是否倒换,可以减少不必要的倒换,节省倒换时间。示例地,本申请实施例还可以用于MEC技术中,从而可以解决MEC技术中可能出现的“双主”问题和“无主”问题。
可选地,方法600还可以包括步骤601。
601,节点#1检测与节点#2之间的连接。
关于节点之间检测的时机和方式,可以参考方法300中的描述,此处不再赘述。
基于本申请实施例,可以进行业务联动检测,即节点之间心跳中断,节点互相之间无法感知对端状态,那么可以根据中心系统下发的升主命令确定进行倒换(即原来的主节点切换为备用节点,原来的备用节点切换为主节点)。也就是说,节点之间心跳中断,节点互相之间无法感知对端状态,如果中心系统与边缘设备的业务连接通道中断,则中心系统可以通过管理通道对边缘设备下发升主命令。
上文结合方法300和400,介绍了连通性检测结合业务联动检测的方案,结合方法500介绍了单独使用连通性检测的方案,结合方法600介绍了单独使用业务联动检测的方案。为便于理解,下文结合图7,以中心系统为中心VNFM设备、边缘设备为边缘VNFM设备、节点检测与网关的连接为例,进行示例性说明。应理解,方法700中未详细描述的,可以参考方法300至方法600中的描述。
图7示出了适用于本申请实施例的节点控制的方法700的示意图。假设边缘VNFM设备包括节点#1和节点#2,其中,节点#1和节点#2中,一个为主节点或者说处于主节点工作模式,另一个为备用节点或者说处于备用节点工作模式。方法700可以包括如下步骤。
710,节点#1检测与网关之间的连接是否正常。
节点#1可以检测与网关的连接,以便确定与中心系统的连接是否正常。
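In a possible implementation, node #1 may periodically detect its connection to the gateway; in other words, node #1 may check the connection at regular intervals. For example, after the system starts, node #1 may periodically detect its connection to the gateway.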
示例地,节点#1按照第一预设时间,周期性地检测与网关之间的连接。
应理解,节点#1也可以不定期地检测与网关之间的连接,对此本申请实施例不作限定。
关于第一预设时间,可以参考方法300中的描述,此处不再赘述。
一种可能的实现方式,节点#1可以通过ping网关IP地址的方式,检测与网关连接是否正常。
例如,如果ping命令结果返回0,则节点#1与网关之间连接正常;如果ping命令结果返回了其他数值,则节点#1与网关之间连接故障(或者说连接中断)。
示例地,ping命令的相关参数可以包括但不限于:指定连续执行次数、每次执行ping的间隔时间、执行超时时间。
又一种可能的实现方式,节点#1可以通过发送消息的方式,检测与网关连接是否正常。
关于节点与网关之间的检测,可以参考方法300中的描述,此处不再赘述。
720,如果节点#1与网关之间的连接故障,则对节点#1进行降备操作。
假设节点#1检测到节点与网关的连接故障(或者说连接中断)。
一种可能的情况,节点#1本身为备用节点,或者说,节点#1处于备用节点工作模式。在该情况下,节点#1检测到节点与网关的连接故障,则节点#1保持备用节点工作模式不变。
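In another possible case, node #1 is itself the master node, that is, it is in the master node working mode. In this case, when node #1 detects that its connection to the gateway is faulty, node #1 switches to the standby node working mode; for example, node #1 stops all of the programs, databases, and other applications running on it. In this case, node #1 and node #2 are both standby nodes and the edge VNFM device is in the “no-master” state; the central VNFM device may then force a promotion according to the service situation, promoting node #2 to master node.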
730,节点#1检测与节点#2之间的连接是否正常。
一种可能的实现方式,节点#1可以周期性地检测与节点#2之间的连接。或者说,节点#1可以定期检测与节点#2之间的连接。例如,在系统启动后,节点#1可以周期性地检测与节点#2之间的连接。
示例地,节点#1按照第二预设时间,周期性地检测与节点#2之间的连接。
关于第二预设时间,可以参考方法300中的描述,此处不再赘述。
一种可能的实现方式,节点#1可以通过ping网关IP地址的方式,检测与节点#2连接是否正常。
例如,如果ping命令结果返回0,则节点#1与节点#2之间连接正常;如果ping命令结果返回了其他数值,则节点#1与节点#2之间连接故障。
示例地,ping命令的相关参数可以包括但不限于:指定连续执行次数、每次执行ping的间隔时间、执行超时时间。
关于ping命令的相关参数,可以参考方法300中的描述,此处不再赘述。
740,节点#1检测到节点#1与节点#2的连接故障,节点#1保持不变。
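Regardless of whether node #1 is in the master node working mode or the standby node working mode, it takes no action. Conversely, suppose node #1 is in the standby node working mode: if node #1 promoted itself from standby to master because the connection between node #1 and node #2 had failed, the dual-master problem could occur. For example, a hardware fault (such as a fault of the physical host's network interface card, of the physical host itself, or of a network device) or a software fault (such as an internal software fault not caused by the network interface card or the network) can each cause node #1 to detect that its connection to node #2 has failed. If node #1 were promoted from standby to master in that situation, two master nodes could exist (the dual-master problem): node #2, which is already in the master node working mode, and node #1, promoted from standby. Therefore, in the embodiments of this application, after node #1 detects that its connection to node #2 has failed, node #1 takes no action regardless of whether it is in the master node working mode or the standby node working mode, so as to avoid the dual-master problem. Further, the no-master problem can be avoided through the instruction of the central VNFM device.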
示例地,在方法700中所述的情况下,节点#1和节点#2均为备用节点,此时边缘VNFM设备形成“无主”状态,在该情况下,可以由中心VNFM设备根据业务情况进行强制升主,进而将节点#2升为主节点。
750,中心VNFM设备检测与边缘VNFM设备的业务通道连接是否正常。
当边缘VNFM设备出现“无主状态”时,例如步骤720或步骤740中所述的“无主状态”,所有的节点上面的程序都被停止,会导致中心VNFM设备和边缘VNFM设备已建立的gRPC通道被迫中断,中心VNFM设备会收到“unavailable”的错误码。
当中心VNFM设备收到“unavailable”的错误码后,中心VNFM设备可以检测与边缘VNFM设备的业务通道连接是否正常。
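In a possible implementation, the central VNFM device detects whether its connection to node #1 or node #2 is normal by remotely logging in to the node with an SSH command. If the SSH command returns 0, the connection between the central VNFM device and the node of the edge VNFM device is normal; if the SSH command returns any other number, the connection between the central VNFM device and the node of the edge VNFM device is faulty.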
应理解,任何可以实现中心VNFM设备检测与节点连接是否正常的方式,都落入本申请实施例的保护范围。例如,中心VNFM设备还可以通过ping方式或者其它方式,检测与节点#1或者节点#2连接是否正常。
中心VNFM设备检测与边缘VNFM设备的业务通道连接故障的情况下,中心VNFM设备还可以下发升主命令。
760,中心VNFM设备检测与边缘VNFM设备的业务通道连接故障的情况下,下发升主命令。
一种可能的情况,中心VNFM设备只能连接到某个节点(如节点#1,且该节点#1处于备用节点工作模式)。在该情况下,中心VNFM设备下发升主命令到该节点,或者说,中心VNFM设备向该节点下发:将自身由备用节点切换为主节点的命令。示例地,应用启动以后,边缘VNFM设备可以主动建立与中心VNFM设备的gRPC连接,gRPC通道建立成功后,升主过程结束。
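In another possible case, the central VNFM device can connect to both nodes (for example, node #1 and node #2). In this case, the central VNFM device delivers the promote-to-master command (also called a forced command) to both nodes; in other words, it delivers to the two nodes a command to switch the original master node to standby and the original standby node to master. The dual-node monitor on each node may act according to its own master/standby role: if the local node is the master node, it switches to the standby node working mode; if the local node is a standby node, it switches to the master node working mode. For example, after the forced switchover of the edge VNFM device succeeds, the edge VNFM device may actively establish a gRPC connection to the central VNFM device; once the gRPC channel is established successfully, the forced switchover ends.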
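On the central side, steps 750 and 760 can be summarized as: react to the “unavailable” error code, probe the nodes, and deliver the command to whichever nodes can be reached. The function below is a minimal sketch of that selection only; the name `promote_targets`, the string error code, and the injected `reachable` probe (which could be an SSH or ping check) are assumptions for illustration and do not use any real gRPC API.

```python
from typing import Callable, List

# Error code named in the text; representing it as a plain string is an
# assumption of this sketch.
UNAVAILABLE = "unavailable"


def promote_targets(error_code: str, nodes: List[str],
                    reachable: Callable[[str], bool]) -> List[str]:
    """Return the nodes that should receive the promote-to-master command."""
    if error_code != UNAVAILABLE:
        # The service channel is fine; no command is delivered.
        return []
    # Step 760: deliver the command to every node that can still be reached
    # (one node in the first case, both nodes in the second case).
    return [node for node in nodes if reachable(node)]
```

Passing the probe in as a parameter keeps the selection logic independent of how reachability is actually measured.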
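It should be understood that there is no strict order among the steps of method 700. In one example, there is no ordering relationship between steps 710 and 720 on the one hand and steps 730 and 740 on the other. For example, after the system starts, node #1 periodically performs detection with the gateway and periodically performs detection with node #2. In another example, there is no ordering relationship between steps 710 and 720 and steps 750 and 760.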
还应理解,方法700主要以节点#1为例进行示例性说明,节点#2也可以执行如方法700所述的步骤。换句话说,边缘设备中的各个节点均可以执行如方法700的步骤。
还应理解,方法700以中心VNFM设备、边缘VNFM设备为例进行示例性说明,本申请实施例并未限定于此。例如,在实际通信中,中心系统和边缘系统均不限于VNFM设备。
在上述一些实施例中,以边缘系统包括两个节点为例进行示例性说明,本申请并未限定于此。例如,边缘系统中也可以包括两个以上的节点。此外,边缘系统中包括多个备节点的情况下,可以选择其中一个备节点发送升主命令。
在上述一些实施例中,以节点检测与网关之间的连接是否正常,以确定节点与中心系统的连接是否正常为例进行了示例性说明,本申请并未限定于此。例如,任何可以使得确定节点与中心系统的连接是否正常的方式都适用于本申请实施例。
基于上述技术方案,对于处于主节点工作模式的节点来说,在检测到自身与中心系统(如网关)之间的连接中断的情况下,进行降备操作。也就是说,通过节点与中心系统的连通性检测,来判断节点是否要降备(即原来的主节点降为备用节点),从而避免出现双主问题。或者,在心跳检测之外增加了连通性检测,如果连通性检测不通,则主动进行降备操作,从而避免出现双主问题。例如,在连通检测故障的情况下,即主节点检测与中心系统之间的连接出现故障或者说中断的情况下,可以主动对节点进行降备操作,即主节点可以主动降为备用节点。通过该方式,不仅可以避免双主的出现,还可以避免倒换以后才发现网络无法使用的情况,减少了无效的倒换时间消耗。
此外,基于本申请实施例,对于处于备用节点工作模式的节点来说,在接收到中心系统的升主命令后,进行升主操作。也就是说,可以进行业务联动检测,仲裁能力可以集成到中心系统提供。例如,中心系统可以根据业务通道的故障情况来决策原来的备用节点是否要切换到主节点,而不是根据仲裁与边缘设备的连通性来检测。从而,通过将倒换能力集成到中心系统,可以减少额外的资源浪费。此外,根据业务通道故障情况来决策主备节点是否倒换,可以减少不必要的倒换,节省倒换时间。
本文中描述的各个实施例可以为独立的方案,也可以根据内在逻辑进行组合,这些方案都落入本申请的保护范围中。例如,方法500所述的方案和方法600所述的方案可以单独使用,也可以结合使用,对此不作限定。
可以理解的是,上述各个方法实施例中,由边缘设备(或者节点)实现的方法和操作,也可以由可用于边缘设备(或者节点)的部件(例如芯片或者电路)实现,由中心系统(如中心设备)实现的方法和操作,也可以由可用于中心系统(如中心设备)的部件(例如芯片或者电路)实现。
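The methods provided in the embodiments of this application have been described in detail above with reference to FIG. 3 to FIG. 7. The apparatuses provided in the embodiments of this application are described in detail below with reference to FIG. 8 and FIG. 9. It should be understood that the description of the apparatus embodiments corresponds to that of the method embodiments; therefore, for content not described in detail, refer to the method embodiments above. For brevity, details are not repeated here.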
本领域技术人员应该可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,本申请能够以硬件或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对边缘设备(或者节点)或者中心系统(如中心设备)进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。下面以采用对应各个功能划分各个功能模块为例进行说明。
图8是本申请实施例提供的节点控制的装置的示意性框图。该装置800包括收发单元810和处理单元820。收发单元810可以实现相应的通信功能,处理单元820用于进行数据处理。收发单元810还可以称为通信接口或通信单元。
可选地,该装置800还可以包括存储单元,该存储单元可以用于存储指令和/或数据,处理单元820可以读取存储单元中的指令和/或数据,以使得通信装置实现前述方法实施例。
该装置800可以用于执行上文方法实施例中处于主节点工作模式的节点所执行的动作,这时,该装置800可以为节点或者可配置于节点的部件。收发单元810用于执行上文方法实施例中处于主节点工作模式的节点侧的收发相关的操作,处理单元820用于执行上文方法实施例中处于主节点工作模式的节点侧的处理相关的操作。
或者,该装置800可以用于执行上文方法实施例中处于备用节点工作模式的节点所执行的动作,这时,该装置800可以为节点或者可配置于节点的部件。收发单元810用于执行上文方法实施例中处于备用节点工作模式的节点侧的收发相关的操作,处理单元820用于执行上文方法实施例中处于备用节点工作模式的节点侧的处理相关的操作。
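Alternatively, the apparatus 800 may be configured to perform the actions performed by the central system in the method embodiments above. In this case, the apparatus 800 may be the central system or a component that can be configured in the central system. The transceiver unit 810 is configured to perform the transmit/receive operations on the central system side in the method embodiments above, and the processing unit 820 is configured to perform the processing operations on the central system side in the method embodiments above.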
作为一种设计,该装置800用于执行上文实施例中节点所执行的动作。处理单元820,用于在确定装置800的当前工作模式为主节点工作模式的情况下,监视装置800与中心系统之间的连接状态,在确定装置800与中心系统的连接中断时,切换到备用节点工作模式;收发单元810,用于在确定装置800的当前工作模式为备用节点工作模式的情况下,监听来自中心系统的命令,在接收到中心系统发送的升主命令时,处理单元820还用于切换到主节点工作模式,其中,升主命令用于通知从备用节点工作模式切换到主节点工作模式。
作为一示例,边缘系统还包括第二节点,处理单元820,还用于监视与第二节点之间的连接状态;在确定装置800与第二节点之间的连接中断或连通时,均维持装置800的当前工作模式。
作为又一示例,处理单元820,具体用于监视装置800与网关之间的连接状态,网关在装置800与中心系统之间。
作为又一示例,网关在边缘系统内。
作为又一示例,处理单元820,还用于检测到装置800处于活跃状态时,确定装置800的当前工作模式为主节点工作模式,或者,检测到装置800处于非活跃状态时,确定装置800的当前工作模式为备用节点工作模式;或者,检测到装置800上的业务应用处于运行状态时,确定装置800的当前工作模式为主节点工作模式,或者,检测到装置800上的业务应用处于停止状态时,确定装置800的当前工作模式为备用节点工作模式;或者,检测到装置800上不存在状态标识时,确定装置800的当前工作模式为主节点工作模式,或者,检测到装置800上存在状态标识时,确定装置800的当前工作模式为备用节点工作模式。
该装置800可实现对应于根据本申请实施例的方法300至方法700中的节点执行的步骤或者流程,该装置800可以包括用于执行图3中的方法300至图7中的方法700中的节点执行的方法的单元。并且,该装置800中的各单元和上述其他操作和/或功能分别为了实现图3中的方法300至图7中的方法700的相应流程。
其中,当该装置800用于执行图3中的方法300时,收发单元810可用于执行方法300中的步骤320,处理单元820可用于执行方法300中的步骤310、330。
当该装置800用于执行图4中的方法400时,收发单元810可用于执行方法400中的步骤430,处理单元820可用于执行方法400中的步骤410、420、401。
当该装置800用于执行图5中的方法500时,处理单元820可用于执行方法500中的步骤510、520、501。
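When the apparatus 800 is used to perform method 600 in FIG. 6, the transceiver unit 810 may be configured to perform step 610 of method 600, and the processing unit 820 may be configured to perform steps 620 and 601 of method 600.
When the apparatus 800 is used to perform method 700 in FIG. 7, the transceiver unit 810 may be configured to perform step 760 of method 700, and the processing unit 820 may be configured to perform steps 710, 720, 730, and 740 of method 700.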
应理解,各单元执行上述相应步骤的具体过程在上述方法实施例中已经详细说明,为了简洁,在此不再赘述。
作为又一种设计,该装置800用于执行上文实施例中心系统所执行的动作。处理单元820,用于检测边缘系统的业务通道,边缘系统包括第一节点和第二节点;收发单元810,用于在边缘系统的业务通道发生故障的情况下,向第一节点和/或第二节点发送升主命令,其中,升主命令用于通知从备用节点工作模式切换到主节点工作模式。
作为一示例,处理单元820,具体用于:检测与第一节点或者第二节点之间的连接是否正常;在检测与第一节点或第二节点之间的连接中断的情况下,确定边缘系统的业务通道发生故障。
作为又一示例,处理单元820,具体用于:通过安全外壳协议SSH命令远程登录节点方式,检测与第一节点或者第二节点之间的连接是否正常。
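The apparatus 800 can implement the steps or procedures performed by the central system in methods 300 to 700 according to the embodiments of this application, and may include units for performing the methods performed by the central system in method 300 in FIG. 3 through method 700 in FIG. 7. The units in the apparatus 800 and the other operations and/or functions described above are respectively intended to implement the corresponding procedures of method 300 in FIG. 3 through method 700 in FIG. 7.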
其中,当该装置800用于执行图3中的方法300时,收发单元810可用于执行方法300中的步骤320,处理单元820可用于执行方法300中的步骤320。
当该装置800用于执行图4中的方法400时,收发单元810可用于执行方法400中的步骤430,处理单元820可用于执行方法400中的步骤410。
当该装置800用于执行图6中的方法600时,收发单元810可用于执行方法600中的步骤610。
当该装置800用于执行图7中的方法700时,收发单元810可用于执行方法700中的步骤760,处理单元820可用于执行方法700中的步骤750。
应理解,各单元执行上述相应步骤的具体过程在上述方法实施例中已经详细说明,为了简洁,在此不再赘述。
作为另一种设计,该装置800用于执行上文实施例节点控制系统所执行的动作。节点控制系统包括:第一节点、第二节点、中心系统,第一节点和第二节点位于边缘系统,且第一节点和第二节点均与中心系统连接,第一节点的当前工作模式为主节点工作模式,第二节点的当前工作模式为备用节点工作模式,处理单元820,用于监视第一节点与中心系统之间的连接状态,在第一节点与中心系统的连接中断时,使得第一节点切换到备用节点工作模式;收发单元810,用于在中心系统与第一节点的连接中断时,向第二节点发送升主命令,升主命令用于通知从备用节点工作模式切换到主节点工作模式;处理单元820,还用于使得第二节点根据升主命令切换到主节点工作模式。
作为一示例,处理单元820,还用于监视第一节点与第二节点之间的连接状态;确定第一节点与第二节点之间的连接中断或连通时,均维持第一节点和/或第二节点的当前工作模式。
作为又一示例,处理单元820,具体用于监视第一节点与网关之间的连接状态,网关在第一节点与中心系统之间。
作为又一示例,网关位于边缘系统内。
该装置800可实现对应于根据本申请实施例的方法300至方法700中的节点控制系统执行的步骤或者流程,该装置800可以包括用于执行图3中的方法300至图7中的方法700中的节点控制系统执行的方法的单元。并且,该装置800中的各单元和上述其他操作和/或功能分别为了实现图3中的方法300至图7中的方法700的相应流程。
上文实施例中的处理单元820可以由至少一个处理器或处理器相关电路实现。收发单元810可以由收发器或收发器相关电路实现。收发单元810还可称为通信单元或通信接口。存储单元可以通过至少一个存储器实现。
应理解,该装置800中的收发单元810可对应于图9中示出的设备900中的收发器930,该装置800中的处理单元820可对应于图9中示出的设备900中的处理器910。
图9是本申请实施例提供的节点控制的设备900的示意性框图。如图所示,该设备900包括处理器910,处理器910与存储器920耦合,存储器920用于存储计算机程序或指令和/或数据,处理器910用于执行存储器920存储的计算机程序或指令和/或数据,使得上文方法实施例中的方法被执行。
可选地,该设备900包括的处理器910为一个或多个。
Optionally, as shown in FIG. 9, the device 900 may further include the memory 920.
可选地,该设备900包括的存储器920可以为一个或多个。
可选地,该存储器920可以与该处理器910集成在一起,或者分离设置。也就是说,上述处理器910和存储器920可以合成一个处理装置,处理器910用于执行存储器920中存储的程序代码来实现上述功能。具体实现时,该存储器920也可以集成在处理器910中,或者独立于处理器910。
可选地,如图9所示,该设备900还可以包括收发器930,收发器930用于信号的接收和/或发送。例如,处理器910用于控制收发器930进行信号的接收和/或发送。收发器930可以包括输入接口(或者称,接收机)和输出接口(或者称,发射机)。收发器还可以称为通信接口。收发器还可以进一步包括天线,天线的数量可以为一个或多个。
作为一种方案,该设备900用于实现上文方法实施例中由节点执行的操作。在一种可能的设计中,该设备900可以是上文方法实施例中的节点,也可以是用于实现上文方法实施例中节点的功能的芯片。
例如,处理器910用于实现上文方法实施例中由节点执行的处理相关的操作,收发器930用于实现上文方法实施例中由节点执行的收发相关的操作。
具体地,该设备900可对应于根据本申请实施例的图3至图7中的节点,该设备900可以包括用于执行图3中的方法300至图7中的方法700中的节点执行的方法的单元。并且,该设备900中的各单元和上述其他操作和/或功能分别为了实现图3中的方法300至图7中的方法700的相应流程。应理解,各单元执行上述相应步骤的具体过程在上述方法实施例中已经详细说明,为了简洁,在此不再赘述。
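In another solution, the device 900 is configured to implement the operations performed by the central system in the method embodiments above. In a possible design, the device 900 may be the central system in the method embodiments above, or a chip that implements the functions of the central system in those embodiments.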
例如,处理器910用于实现上文方法实施例中由中心系统执行的处理相关的操作,收发器930用于实现上文方法实施例中由中心系统执行的收发相关的操作。
具体地,该设备900可对应于根据本申请实施例的图3至图7中的中心系统(或者说中心设备),该设备900可以包括用于执行图3中的方法300至图7中的方法700中的中心系统执行的方法的单元。并且,该设备900中的各单元和上述其他操作和/或功能分别为了实现图3中的方法300至图7中的方法700的相应流程。应理解,各单元执行上述相应步骤的具体过程在上述方法实施例中已经详细说明,为了简洁,在此不再赘述。
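In yet another solution, the device 900 is configured to implement the operations performed by the node control system in the method embodiments above. In a possible design, the device 900 may be the node control system in the method embodiments above, or a chip that implements the functions of the node control system in those embodiments.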
例如,处理器910用于实现上文方法实施例中由节点控制系统执行的处理相关的操作,收发器930用于实现上文方法实施例中由节点控制系统执行的收发相关的操作。
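Specifically, the device 900 may correspond to the node control system in FIG. 3 to FIG. 7 according to the embodiments of this application (for example, one including the first node, the second node, and the central system), and may include units for performing the methods performed by the node control system in method 300 in FIG. 3 through method 700 in FIG. 7. The units in the device 900 and the other operations and/or functions described above are respectively intended to implement the corresponding procedures of method 300 in FIG. 3 through method 700 in FIG. 7. It should be understood that the specific process by which each unit performs the corresponding steps has been described in detail in the method embodiments above; for brevity, details are not repeated here.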
本申请实施例还提供一种计算机可读存储介质,其上存储有用于实现上述方法实施例中由节点执行的方法,或由中心系统执行的方法的计算机指令。
例如,该计算机程序被计算机执行时,使得该计算机可以实现上述方法实施例中由节点执行的方法,或由中心系统执行的方法。
本申请实施例还提供一种包含指令的计算机程序产品,该指令被计算机执行时使得该计算机实现上述方法实施例中由节点执行的方法,或由中心系统执行的方法。
本申请实施例还提供一种边缘系统,该边缘系统包括上文实施例中的节点,如第一节点和第二节点(或者节点#1和节点#2)。
本申请实施例还提供一种节点控制系统,该系统包括上文实施例中的节点(如第一节点和/或第二节点)和中心系统。
所属领域的技术人员可以清楚地了解到,为描述方便和简洁,上述提供的任一种通信装置中相关内容的解释及有益效果均可参考上文提供的对应的方法实施例,此处不再赘述。
在本申请实施例中,终端设备或网络设备可以包括硬件层、运行在硬件层之上的操作系统层,以及运行在操作系统层上的应用层。其中,硬件层可以包括中央处理器(central processing unit,CPU)、内存管理单元(memory management unit,MMU)和内存(也称为主存)等硬件。操作系统层的操作系统可以是任意一种或多种通过进程(process)实现业务处理的计算机操作系统,例如,Linux操作系统、Unix操作系统、Android操作系统、iOS操作系统或windows操作系统等。应用层可以包含浏览器、通讯录、文字处理软件、即时通信软件等应用。
本申请实施例并未对本申请实施例提供的方法的执行主体的具体结构进行特别限定,只要能够通过运行记录有本申请实施例提供的方法的代码的程序,以根据本申请实施例提供的方法进行通信即可。例如,本申请实施例提供的方法的执行主体可以是终端设备或网络设备,或者,是终端设备或网络设备中能够调用程序并执行程序的功能模块。
本申请的各个方面或特征可以实现成方法、装置或使用标准编程和/或工程技术的制品。本文中使用的术语“制品”可以涵盖可从任何计算机可读器件、载体或介质访问的计算机程序。
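A computer-readable storage medium may be any usable medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more usable media. Usable media (or computer-readable media) may include, for example but without limitation: magnetic media or magnetic storage devices (for example, floppy disks, hard disks (such as removable hard disks), and magnetic tapes); optical media (for example, optical discs, compact discs (CD), and digital versatile discs (DVD)); smart cards and flash memory devices (for example, erasable programmable read-only memory (EPROM), cards, sticks, or key drives); and semiconductor media (for example, solid-state disks (SSD)), as well as USB flash drives, read-only memory (ROM), random access memory (RAM), and other media that can store program code.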
本文描述的各种存储介质可代表用于存储信息的一个或多个设备和/或其它机器可读介质。术语“机器可读介质”可以包括但不限于:无线信道和能够存储、包含和/或承载指令和/或数据的各种其它介质。
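It should be understood that the processor mentioned in the embodiments of this application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.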
还应理解,本申请实施例中提及的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(read-only memory,ROM)、可编程只读存储器(programmable ROM,PROM)、可擦除可编程只读存储器(erasable PROM,EPROM)、电可擦除可编程只读存储器(electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(random access memory,RAM)。例如,RAM可以用作外部高速缓存。作为示例而非限定,RAM可以包括如下多种形式:静态随机存取存储器(static RAM,SRAM)、动态随机存取存储器(dynamic RAM,DRAM)、同步动态随机存取存储器(synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(double data rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(direct rambus RAM,DR RAM)。
需要说明的是,当处理器为通用处理器、DSP、ASIC、FPGA或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件时,存储器(存储模块)可以集成在处理器中。
还需要说明的是,本文描述的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅是示意性的,例如,上述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。此外,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
上述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元实现本申请提供的方案。
另外,在本申请各个实施例中的各功能单元可以集成在一个单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。
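When software is used, the embodiments may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the embodiments of this application are produced wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus; for example, it may be a personal computer, a server, or a network device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (for example, over coaxial cable, optical fiber, or a digital subscriber line (DSL)) or a wireless manner (for example, by infrared, radio, or microwave). For the computer-readable storage medium, refer to the description above.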
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求和说明书的保护范围为准。

Claims (20)

  1. 一种节点控制的方法,其特征在于,应用于节点控制系统,所述节点控制系统包括:第一节点、第二节点、中心系统,所述第一节点和所述第二节点位于边缘系统,且所述第一节点和所述第二节点均与所述中心系统连接,所述第一节点的当前工作模式为主节点工作模式,所述第二节点的当前工作模式为备用节点工作模式,
    所述方法包括:
    所述第一节点监视与所述中心系统之间的连接状态,在所述第一节点与所述中心系统的连接中断时,切换到备用节点工作模式;
    所述中心系统监视与所述第一节点之间的连接状态,在所述中心系统与所述第一节点的连接中断时,向所述第二节点发送升主命令,所述升主命令用于通知从备用节点工作模式切换到主节点工作模式;
    所述第二节点根据所述升主命令切换到主节点工作模式。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    所述第一节点监视与所述第二节点之间的连接状态,和/或,所述第二节点监视与所述第一节点之间的连接状态;
    确定所述第一节点与所述第二节点之间的连接中断或连通时,均维持所述第一节点和/或所述第二节点的当前工作模式。
  3. 根据权利要求1或2所述的方法,其特征在于,
    所述第一节点监视与所述中心系统之间的连接状态,包括:
    所述第一节点监视与网关之间的连接状态,所述网关在所述第一节点与所述中心系统之间。
  4. 根据权利要求3所述的方法,其特征在于,所述网关位于所述边缘系统内。
  5. 一种节点控制的方法,其特征在于,应用于边缘系统中的第一节点,所述第一节点与中心系统连接,所述方法包括:
    在确定所述第一节点的当前工作模式为主节点工作模式的情况下,所述第一节点监视自身与所述中心系统之间的连接状态,在确定所述第一节点与所述中心系统的连接中断时,切换到备用节点工作模式;
    在确定所述第一节点的当前工作模式为备用节点工作模式的情况下,所述第一节点监听来自所述中心系统的命令,在接收到所述中心系统发送的升主命令时,切换到主节点工作模式,其中,所述升主命令用于通知从备用节点工作模式切换到主节点工作模式。
  6. 根据权利要求5所述的方法,其特征在于,所述边缘系统还包含第二节点,所述第二节点与所述中心系统连接,所述方法还包括:
    所述第一节点监视与所述第二节点之间的连接状态;
    在确定所述第一节点与所述第二节点之间的连接中断或连通时,均维持所述第一节点的当前工作模式。
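  7. The method according to claim 5 or 6, wherein
    the first node monitoring the connection status between itself and the central system comprises:
    the first node monitoring the connection status between itself and a gateway, the gateway being located between the first node and the central system.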
  8. 根据权利要求7所述的方法,其特征在于,
    所述网关在所述边缘系统内。
  9. 根据权利要求5至8中任一项所述的方法,其特征在于,所述方法还包括:
    检测到所述第一节点处于活跃状态时,确定所述第一节点的当前工作模式为主节点工作模式,或者,检测到所述第一节点处于非活跃状态时,确定所述第一节点的当前工作模式为备用节点工作模式;
    或者,
    检测到所述第一节点上的业务应用处于运行状态时,确定所述第一节点的当前工作模式为主节点工作模式,或者,检测到所述第一节点上的业务应用处于停止状态时,确定所述第一节点的当前工作模式为备用节点工作模式;
    或者,
    检测到所述第一节点上不存在状态标识时,确定所述第一节点的当前工作模式为主节点工作模式,或者,检测到所述第一节点上存在状态标识时,确定所述第一节点的当前工作模式为备用节点工作模式。
  10. 一种节点控制系统,其特征在于,所述节点控制系统包括:第一节点、第二节点、中心系统,所述第一节点和所述第二节点位于边缘系统,且所述第一节点和所述第二节点均与所述中心系统连接,所述第一节点的当前工作模式为主节点工作模式,所述第二节点的当前工作模式为备用节点工作模式,
    所述第一节点,用于监视与所述中心系统之间的连接状态,在所述第一节点与所述中心系统的连接中断时,切换到备用节点工作模式;
    所述中心系统,用于监视与所述第一节点之间的连接状态,在所述中心系统与所述第一节点的连接中断时,向所述第二节点发送升主命令,所述升主命令用于通知从备用节点工作模式切换到主节点工作模式;
    所述第二节点,用于根据所述升主命令切换到主节点工作模式。
  11. 根据权利要求10所述的节点控制系统,其特征在于,
    所述第一节点,还用于监视与所述第二节点之间的连接状态;
    所述第一节点,还用于确定所述第一节点与所述第二节点之间的连接中断或连通,均维持所述第一节点和/或所述第二节点的当前工作模式。
  12. 根据权利要求10或11所述的节点控制系统,其特征在于,
    所述第一节点,具体用于监视与网关之间的连接状态,所述网关在所述第一节点与所述中心系统之间。
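  13. A node, applied in an edge system, the node being connected to a central system, wherein the node comprises:
    a processing unit, configured to: when determining that the current working mode of the node is the master node working mode, monitor the connection status between the node and the central system, and when determining that the connection between the node and the central system is interrupted, switch to the standby node working mode; and
    a transceiver unit, configured to: when it is determined that the current working mode of the node is the standby node working mode, listen for commands from the central system, wherein when the promote-to-master command sent by the central system is received, the processing unit is further configured to switch to the master node working mode, and the promote-to-master command is used to notify a switch from the standby node working mode to the master node working mode.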
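  14. The node according to claim 13, wherein the edge system further comprises a second node, the second node is connected to the central system, and
    the processing unit is further configured to:
    monitor a connection status with the second node; and
    when it is determined that the connection between the node and the second node is interrupted or connected, maintain the current working mode of the node in either case.
  15. The node according to claim 13 or 14, wherein
    the processing unit is specifically configured to monitor a connection status between the node and a gateway, wherein the gateway is located between the node and the central system.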
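  16. The node according to any one of claims 13 to 15, wherein the processing unit is further configured to:
    when it is detected that the node is in an active state, determine that the current working mode of the node is the master node working mode; or, when it is detected that the node is in an inactive state, determine that the current working mode of the node is the standby node working mode;
    or
    when it is detected that a service application on the node is in a running state, determine that the current working mode of the node is the master node working mode; or, when it is detected that the service application on the node is in a stopped state, determine that the current working mode of the node is the standby node working mode;
    or
    when it is detected that no status identifier exists on the node, determine that the current working mode of the node is the master node working mode; or, when it is detected that a status identifier exists on the node, determine that the current working mode of the node is the standby node working mode.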
  17. The node control system according to claim 12 or the node according to claim 15, wherein
    the gateway is located in the edge system.
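  18. A node control apparatus, comprising at least one processor, wherein the at least one processor is configured to execute a computer program stored in a memory, so that the apparatus implements the method according to any one of claims 1 to 4, or so that the apparatus implements the method according to any one of claims 5 to 9.
  19. A computer-readable storage medium, comprising a computer program, wherein when the computer program is run on a computer, the computer is enabled to perform the method according to any one of claims 1 to 4, or the computer is enabled to perform the method according to any one of claims 5 to 9.
  20. A computer program product, comprising computer program code, wherein when the computer program code is run on a computer, the computer is enabled to implement the method according to any one of claims 1 to 4, or the computer is enabled to implement the method according to any one of claims 5 to 9.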
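
The mode-switching behaviour recited in the claims above can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `Mode`, `EdgeNode`, `CentralSystem`, `detect_mode`, `simulate_failover` and the command string `PROMOTE_TO_MASTER` are hypothetical, connection monitoring is reduced to callbacks that receive a boolean, and the command is delivered by a direct method call rather than over a network.

```python
from enum import Enum


class Mode(Enum):
    MASTER = "master"
    STANDBY = "standby"


# Hypothetical name for the promote-to-master command; the claims do not fix a wire format.
PROMOTE_TO_MASTER = "promote-to-master"


def detect_mode(status_identifier_present: bool) -> Mode:
    """Third alternative of claims 9 and 16: no status identifier means master mode."""
    return Mode.STANDBY if status_identifier_present else Mode.MASTER


class EdgeNode:
    def __init__(self, name: str, mode: Mode) -> None:
        self.name = name
        self.mode = mode

    def on_central_link(self, connected: bool) -> None:
        # Claim 5: a master that loses its connection to the central system demotes itself.
        if self.mode is Mode.MASTER and not connected:
            self.mode = Mode.STANDBY

    def on_peer_link(self, connected: bool) -> None:
        # Claims 2 and 6: the node-to-node link never changes the working mode.
        return None

    def on_command(self, command: str) -> None:
        # Claim 5: a standby switches to master only on the central system's command.
        if self.mode is Mode.STANDBY and command == PROMOTE_TO_MASTER:
            self.mode = Mode.MASTER


class CentralSystem:
    def __init__(self, master: EdgeNode, standby: EdgeNode) -> None:
        self.master = master
        self.standby = standby

    def on_master_link(self, connected: bool) -> None:
        # Claim 10: on losing the master, command the standby to take over.
        if not connected:
            self.standby.on_command(PROMOTE_TO_MASTER)
            # Bookkeeping added for this sketch only: remember the new roles.
            self.master, self.standby = self.standby, self.master


def simulate_failover():
    node1 = EdgeNode("node-1", Mode.MASTER)
    node2 = EdgeNode("node-2", Mode.STANDBY)
    central = CentralSystem(master=node1, standby=node2)

    # The link between the two edge nodes drops: neither node changes mode.
    node1.on_peer_link(False)
    node2.on_peer_link(False)
    before = (node1.mode, node2.mode)

    # The link between node-1 and the central system drops; both ends observe it.
    node1.on_central_link(False)
    central.on_master_link(False)
    return before, (node1.mode, node2.mode)
```

In this sketch the central system is the only party that can promote a node, and a break in the link between the two edge nodes is deliberately ignored, so a partition inside the edge system cannot by itself leave two nodes in master mode.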
PCT/CN2021/087365 2020-04-28 2021-04-15 Node control method, system, and apparatus WO2021218645A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP21797439.3A EP4132065A4 (en) 2020-04-28 2021-04-15 NODE CONTROL METHOD, SYSTEM AND DEVICE
US17/974,911 US20230039817A1 (en) 2020-04-28 2022-10-27 Node Control Method, System, and Apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010352126.0A CN113573329A (zh) 2020-04-28 2020-04-28 Node control method, system, and apparatus
CN202010352126.0 2020-04-28

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/974,911 Continuation US20230039817A1 (en) 2020-04-28 2022-10-27 Node Control Method, System, and Apparatus

Publications (1)

Publication Number Publication Date
WO2021218645A1 true WO2021218645A1 (zh) 2021-11-04

Family

ID=78158198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/087365 WO2021218645A1 (zh) 2020-04-28 2021-04-15 Node control method, system, and apparatus

Country Status (4)

Country Link
US (1) US20230039817A1 (zh)
EP (1) EP4132065A4 (zh)
CN (1) CN113573329A (zh)
WO (1) WO2021218645A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108282801A (zh) * 2018-01-26 2018-07-13 Chongqing University of Posts and Telecommunications Handover management method based on mobile edge computing
US20180376380A1 (en) * 2017-06-23 2018-12-27 Huawei Technologies Co., Ltd. Exposure of capabilities of central units and distributed units in base station entities for admission control
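CN109495938A (zh) * 2018-12-21 2019-03-19 Xidian University Network handover method based on multi-access edge computing
CN109861867A (zh) * 2019-02-28 2019-06-07 New H3C Technologies Co., Ltd. MEC service processing method and apparatus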

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3300298B1 (en) * 2015-06-30 2020-11-25 Huawei Technologies Co., Ltd. Method and apparatus for switching vnf
CN109005045B (zh) * 2017-06-06 2022-01-25 Beijing Kingsoft Cloud Network Technology Co., Ltd. Active/standby service system and master node failure recovery method
US10411948B2 (en) * 2017-08-14 2019-09-10 Nicira, Inc. Cooperative active-standby failover between network systems
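CN110740072B (zh) * 2018-07-20 2023-03-10 Huawei Technologies Co., Ltd. Fault detection method, apparatus, and related device
CN110417600B (zh) * 2019-08-02 2022-10-25 Miaozhen Information Technology Co., Ltd. Node switching method and apparatus for a distributed system, and computer storage medium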

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180376380A1 (en) * 2017-06-23 2018-12-27 Huawei Technologies Co., Ltd. Exposure of capabilities of central units and distributed units in base station entities for admission control
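CN108282801A (zh) * 2018-01-26 2018-07-13 Chongqing University of Posts and Telecommunications Handover management method based on mobile edge computing
CN109495938A (zh) * 2018-12-21 2019-03-19 Xidian University Network handover method based on multi-access edge computing
CN109861867A (zh) * 2019-02-28 2019-06-07 New H3C Technologies Co., Ltd. MEC service processing method and apparatus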

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG LINGYAN; WANG SHANGGUANG; CHANG RONG N: "QCSS: A QoE-Aware Control Plane for Adaptive Streaming Service over Mobile Edge Computing Infrastructures", 2018 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (ICWS), 2 July 2018 (2018-07-02), pages 139 - 146, XP033399106, DOI: 10.1109/ICWS.2018.00025 *

Also Published As

Publication number Publication date
CN113573329A (zh) 2021-10-29
US20230039817A1 (en) 2023-02-09
EP4132065A1 (en) 2023-02-08
EP4132065A4 (en) 2023-08-30

Similar Documents

Publication Publication Date Title
WO2016184175A1 (zh) Database processing method and apparatus
US11403227B2 (en) Data storage method and apparatus, and server
US11330071B2 (en) Inter-process communication fault detection and recovery system
US10609123B2 (en) Hybrid quorum policies for durable consensus in distributed systems
US7734948B2 (en) Recovery of a redundant node controller in a computer system
US11184435B2 (en) Message transmission method and apparatus in cluster file system
US11231983B2 (en) Fault tolerance processing method, apparatus, and server
US20230262572A1 (en) Communication method and related device
US20200067810A1 (en) State detection of netconf session
US9489281B2 (en) Access point group controller failure notification system
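CN106789279B (zh) Gateway control method, and control method and apparatus for a remote control terminal
WO2021218645A1 (zh) Node control method, system, and apparatus
CN109445984B (zh) Service recovery method and apparatus, arbitration server, and storage system
CN111511041B (zh) Remote connection method and apparatus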
US11314670B2 (en) Method, apparatus, and device for transmitting file based on BMC, and medium
US11812487B2 (en) Method, device, extender, and computer medium for automatically restoring connection
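CN115640169A (zh) Method, system, device, and storage medium for ensuring that a primary cluster stops providing services
CN114553936A (zh) Connection method, apparatus, electronic device, and computer-readable storage medium
CN113391759B (zh) Communication method and device
CN110990313B (zh) Method, device, and storage medium for handling clock stretching on an I3C bus
CN110362386B (zh) Network interface card processing method, apparatus, electronic device, and storage medium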
US11989105B2 (en) Storage system and control method for adjusting timeout based on network conditions
CN112039941B (zh) Data transmission method, device, and medium
US20240028723A1 (en) Suspicious workspace instantiation detection
KR101902075B1 (ko) Server and network system using the same

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21797439

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021797439

Country of ref document: EP

Effective date: 20221026

NENP Non-entry into the national phase

Ref country code: DE