CN116346587A - Service grid disaster recovery method, equipment and medium - Google Patents

Service grid disaster recovery method, equipment and medium

Info

Publication number
CN116346587A
Authority
CN
China
Prior art keywords
service
service grid
disaster recovery
cluster
recovery method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310398903.9A
Other languages
Chinese (zh)
Inventor
铁锦程
李虎
曾毅峰
刘佳利
刘冉
吕刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Pudong Development Bank Co Ltd
Original Assignee
Shanghai Pudong Development Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Pudong Development Bank Co Ltd
Priority to CN202310398903.9A
Publication of CN116346587A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0654 Management of faults, events, alarms or notifications using network fault recovery
    • H04L41/0663 Performing the actions predefined by failover planning, e.g. switching to standby network elements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0894 Policy-based network configuration management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/10 Active monitoring, e.g. heartbeat, ping or trace-route
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L61/00 Network arrangements, protocols or services for addressing or naming
    • H04L61/45 Network directories; Name-to-address mapping
    • H04L61/4505 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols
    • H04L61/4511 Network directories; Name-to-address mapping using standardised directories; using standardised directory access protocols using domain name system [DNS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Cardiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a service grid disaster recovery method, equipment and medium, applied to a service grid cluster. The method comprises the following steps: setting a global gateway in the service grid cluster and resolving requests sent by the service grid cluster to the global gateway; forwarding request information from the service grid cluster to a target service by configuring IPTABLES; and, when the SIDECAR is abnormal, deleting the IPTABLES policy that redirects requests sent by the service grid cluster to the SIDECAR, thereby achieving disaster recovery of the service grid. Compared with the prior art, requests sent by the cluster are resolved to the IP of the GLOBAL-SIDECAR through hardware DNS configuration rules, so disaster recovery switching can be performed when the SIDECAR is abnormal without restarting the POD, which reduces the impact on services.

Description

Service grid disaster recovery method, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a service grid disaster recovery method, device, and medium.
Background
The service grid (SERVICEMESH) is an infrastructure layer dedicated to handling service-to-service traffic and is regarded as the next generation of micro-service architecture. Its job is to deliver requests reliably across the complex service topologies that make up cloud-native applications. It is implemented as a set of lightweight network proxies deployed alongside the application services and transparent to them. The overall architecture of a service grid consists of a data plane and a control plane.
The management component is called the control plane (CONTROL PLANE); it is responsible for communicating with the agents in the data plane and issuing policies and configuration.
The agents are called the SIDECAR, or data plane (DATA PLANE), of the service grid. They directly process inbound and outbound packets and handle forwarding, routing, health checks, load balancing, authentication, generation of monitoring data, and so on.
ISTIO is one implementation of Service Mesh. Existing service grids are generally built on open-source ISTIO and extend its functionality. In Kubernetes clusters, the POD is the basis for all workload types. In current service grid platforms, the SIDECAR is supposed to act as a transparent proxy for business services, yet problems in the SIDECAR itself sometimes cause business failures. The following problems arise when using a service grid platform:
(1) When the POD fails to start, the service container restarts continuously because of its health-check configuration. At startup the service needs to call other services; if such a call fails, the service exits, because it has no retry or disaster recovery logic. As a result, traffic sent after the service container starts cannot be processed.
(2) In some cases DNS resolution fails, so traffic sent by the service container cannot be processed. With the intelligent DNS of ISTIO enabled, the format of the DNS response packet differs from that of ordinary DNS, which can easily cause resolution failures when other DNS clients are used.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a service grid disaster recovery method, equipment and medium, so as to solve the problem that existing methods lack a disaster recovery mechanism when the service grid SIDECAR is abnormal.
The aim of the invention can be achieved by the following technical scheme:
In one aspect of the present invention, a service grid disaster recovery method is provided, applied to a service grid cluster, and the method includes the following steps:
setting a global gateway in the service grid cluster, and resolving requests sent by the service grid cluster to the global gateway;
forwarding request information from the service grid cluster to a target service by configuring IPTABLES;
when the SIDECAR is abnormal, deleting the IPTABLES policy that redirects requests sent by the service grid cluster to the SIDECAR, and routing through the global gateway instead, so as to realize disaster recovery of the service grid.
As a preferred solution, the SIDECAR includes a first unit for acquiring XDS rules and performing availability checks, and a second unit for forwarding request information from the service grid cluster to a target service based on the XDS rules.
As a preferable technical solution, the DNS is configured to resolve the request sent by the service grid cluster into the IP address of the global gateway.
As a preferred solution, the service grid cluster is configured by starting POD and injecting a SIDECAR.
As a preferable technical scheme, the condition for judging that the SIDECAR is abnormal is:
the unified registry does not receive the heartbeat packet, or receives an exception.
As a preferable technical scheme, the service grid cluster is a K8S cluster.
As a preferable technical scheme, the method further comprises the following steps:
and acquiring the availability information of each container in the service grid cluster and sending a heartbeat packet to an external unified registry.
As a preferred solution, the service grid cluster includes a plurality of containers.
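The heartbeat-based abnormality condition described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names `Heartbeat` and `UnifiedRegistry`, the `healthy` flag used to model a heartbeat that reports an exception, and the 10-second timeout are all assumptions introduced for illustration; the text does not specify a packet format or a threshold.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Heartbeat:
    sidecar_id: str   # hypothetical identifier of the reporting SIDECAR
    healthy: bool     # False models a heartbeat that carries an exception
    sent_at: float    # timestamp in seconds, supplied by the caller


@dataclass
class UnifiedRegistry:
    timeout_s: float = 10.0  # assumed threshold, not taken from the text
    last: Dict[str, Heartbeat] = field(default_factory=dict)

    def receive(self, hb: Heartbeat) -> None:
        """Record the most recent heartbeat for a SIDECAR."""
        self.last[hb.sidecar_id] = hb

    def is_abnormal(self, sidecar_id: str, now: float) -> bool:
        """Abnormal if no heartbeat was received, the last one reported
        an exception, or the last one is older than the timeout."""
        hb = self.last.get(sidecar_id)
        if hb is None or not hb.healthy:
            return True
        return now - hb.sent_at > self.timeout_s
```

Passing `now` explicitly instead of reading the system clock keeps the check deterministic; a registry that never heard from a SIDECAR treats it as abnormal, which is one possible reading of "does not receive the heartbeat packet".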
In another aspect of the present invention, there is provided an electronic apparatus including: one or more processors and a memory, wherein the memory stores one or more programs, and the one or more programs comprise instructions for executing the service grid disaster recovery method.
In another aspect of the invention, a computer-readable storage medium is provided that includes one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing the service grid disaster recovery method described above.
Compared with the prior art, the invention resolves requests sent by the service grid cluster to the global gateway through hardware DNS configuration rules and, at disaster recovery switchover, deletes the IPTABLES policy that redirects traffic to the SIDECAR. Disaster recovery switching can therefore be performed when the SIDECAR is abnormal without restarting the POD, which reduces the impact on services and solves, or partially solves, the problem that existing methods lack a disaster recovery mechanism when the service grid SIDECAR is abnormal.
Drawings
Fig. 1 is a flowchart of a service grid disaster recovery method in embodiment 1.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the present invention.
Example 1
The service grid (SERVICEMESH) is an infrastructure layer dedicated to handling service-to-service traffic and is regarded as the next generation of micro-service architecture. Its job is to deliver requests reliably across the complex service topologies that make up cloud-native applications. It is implemented as a set of lightweight network proxies deployed alongside the application services and transparent to them. The overall architecture of a service grid consists of a data plane and a control plane.
The management component is called the control plane (CONTROL PLANE); it is responsible for communicating with the agents in the data plane and issuing policies and configuration.
The agents are called the SIDECAR, or data plane (DATA PLANE), of the service grid. They directly process inbound and outbound packets and handle forwarding, routing, health checks, load balancing, authentication, generation of monitoring data, and so on.
The traffic hijacking procedure of the SIDECAR is as follows. IPTABLES is the management tool for NETFILTER, the firewall software in the LINUX kernel; it runs in user space and is also part of NETFILTER. NETFILTER resides in kernel space and provides not only network address translation but also firewall functions such as packet content modification and packet filtering. Inbound and outbound traffic of the service grid is hijacked to the SIDECAR by technical means such as IPTABLES.
The specific interception process is as follows:
In the NETWORK NAMESPACE where the POD (container) is located, IPTABLES rules intercept both inbound and outbound traffic, except for traffic sent by ENVOY itself, and redirect it through NAT REDIRECT to port 15001, on which ENVOY listens.
ENVOY forwards the traffic according to the XDS rules obtained from PILOT.
The ENVOY LISTENER 0.0.0.0:15001 receives all traffic entering and leaving the POD and then hands each request over to the corresponding VIRTUAL LISTENER.
For the service of the POD itself, an HTTP LISTENER on POD IP + port receives INBOUND traffic.
For each SERVICE + non-HTTP port, there is a LISTENER that handles OUTBOUND non-HTTP traffic.
For each SERVICE + HTTP port, there is an HTTP LISTENER on 0.0.0.0 + port that receives OUTBOUND traffic.
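The interception and hand-over steps listed above can be modelled in a few lines of Python. This is a simplified sketch of the decision logic only, not real IPTABLES or ENVOY code: the UID value used to recognise traffic sent by ENVOY, the function names, the listener labels, and the `"passthrough"` fallback for unmatched destinations are assumptions made for illustration.

```python
from typing import Dict, Tuple

ENVOY_PORT = 15001   # port named in the text
ENVOY_UID = 1337     # assumed UID under which the proxy process runs


def redirect_port(sender_uid: int, dst_port: int) -> int:
    """Model of the NAT REDIRECT rule: traffic from ENVOY itself is left
    alone, everything else is delivered to the ENVOY listener port."""
    if sender_uid == ENVOY_UID:
        return dst_port
    return ENVOY_PORT


def pick_listener(listeners: Dict[Tuple[str, int], str],
                  dst_ip: str, dst_port: int) -> str:
    """Model of the 15001 listener handing a connection to the listener
    that matches its original destination: exact IP + port first, then
    the 0.0.0.0 + port wildcard."""
    for key in ((dst_ip, dst_port), ("0.0.0.0", dst_port)):
        if key in listeners:
            return listeners[key]
    return "passthrough"  # assumed fallback, not described in the text
```

For example, with an inbound listener registered on `("10.0.0.5", 8080)` and an outbound HTTP listener on `("0.0.0.0", 9080)`, a connection originally addressed to any IP on port 9080 is handed to the outbound listener, while a connection to 10.0.0.5:8080 reaches the inbound one.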
The whole interception and forwarding process is transparent to the service container. The service container still communicates using the SERVICE domain name and port, and the SERVICE domain name is still resolved to the SERVICE IP; however, the SERVICE IP is converted directly to a POD IP inside the SIDECAR, so traffic leaving the container already carries the POD IP and is forwarded straight to the corresponding POD. In the traditional Kubernetes SERVICE mechanism, by contrast, the conversion from SERVICE IP to POD IP takes place on the NODE and is implemented by IPTABLES rules maintained by KUBE-PROXY.
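The conversion of a SERVICE IP into a POD IP inside the SIDECAR can be pictured with the following Python sketch. The class name `SidecarEndpoints`, the example addresses, and the round-robin selection policy are assumptions; the text only states that the conversion happens in the SIDECAR rather than on the NODE, and does not say how a POD is chosen.

```python
from itertools import cycle
from typing import Dict, Iterator, List


class SidecarEndpoints:
    """Hypothetical endpoint table held by a SIDECAR: SERVICE IP -> POD IPs."""

    def __init__(self, endpoints: Dict[str, List[str]]) -> None:
        # One round-robin cursor per SERVICE IP that has at least one POD.
        self._cursors: Dict[str, Iterator[str]] = {
            vip: cycle(pods) for vip, pods in endpoints.items() if pods
        }

    def resolve(self, service_ip: str) -> str:
        """Return the POD IP to use for this SERVICE IP; an unknown
        address is returned unchanged."""
        cursor = self._cursors.get(service_ip)
        if cursor is None:
            return service_ip
        return next(cursor)
```

With this table in the proxy, the packet that leaves the container already carries a POD IP as its destination, so no further translation on the NODE is needed for it.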
The service discovery of the SIDECAR is as follows. DNS resolution is an important component of any application infrastructure. When application code attempts to access another service in the cluster, or even a service on the Internet, it must first look up the IP address corresponding to the service hostname before initiating a connection to the service. This name lookup process is commonly referred to as service discovery. In Kubernetes, for a service of type CLUSTERIP, the cluster DNS server resolves the service hostname to a unique, non-routable virtual IP (VIP). The ISTIO SIDECAR agent introduces an intelligent DNS proxy that takes over the DNS resolution of the application and returns a random IP; the request is then intercepted to the SIDECAR by IPTABLES in the POD, and the SIDECAR forwards it to the target service.
Most existing service grids are built on open-source ISTIO and extend its functionality. The SIDECAR, which should be a transparent proxy for business services, sometimes causes business failures because of its own problems. The following problems easily arise when using a service grid platform:
(1) When the POD fails to start, the service container restarts continuously because of its health-check configuration. At startup the service needs to call other services (for example, to pull configuration from a configuration center); if the call fails, the service exits, since it has no retry or disaster recovery logic. The call fails because the SIDECAR is not yet ready (it needs time to pull its configuration from the control plane), so traffic sent after the service container starts cannot be processed.
(2) After the intelligent DNS of ISTIO is enabled, DNS resolution fails in some cases, so traffic sent by the service container cannot be processed. The cause is a flaw in the intelligent DNS implementation: the format of its DNS response packets differs from that of ordinary DNS. Resolution through the underlying GLIBC library works, but other DNS clients may fail.
As shown in Fig. 1, this embodiment provides a service grid disaster recovery method that uses the domain name suffix as part of the underlying service name in the service grid, disables the intelligent DNS service in ISTIO, and uses hardware DNS for resolution.
A set of GLOBAL-SIDECAR instances is deployed in each service grid cluster. The GLOBAL-SIDECAR is the global gateway; it uses the same technology as the SIDECAR, but with the service governance functions of the original SIDECAR removed so that only east-west service discovery remains. The hardware DNS resolution server is configured so that all requests sent by the KUBERNETES cluster are resolved to the IP of the GLOBAL-SIDECAR.
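The hardware DNS rule described above, in which names are resolved to the GLOBAL-SIDECAR IP based on their domain name suffix, might be sketched in Python as follows. The function name `resolve_by_suffix`, the example suffixes, and the IP addresses are hypothetical; the actual rule syntax of the hardware DNS server is not given in the text.

```python
from typing import Dict, Optional


def resolve_by_suffix(hostname: str,
                      suffix_rules: Dict[str, str]) -> Optional[str]:
    """Return the GLOBAL-SIDECAR IP configured for the longest domain
    suffix that matches the hostname, or None if no rule applies."""
    name = hostname.rstrip(".").lower()
    best_suffix = ""
    best_ip: Optional[str] = None
    for suffix, ip in suffix_rules.items():
        s = suffix.strip(".").lower()
        # Match on whole labels only, so "xmesh-a..." does not match "mesh-a...".
        if name == s or name.endswith("." + s):
            if best_ip is None or len(s) > len(best_suffix):
                best_suffix, best_ip = s, ip
    return best_ip


# Hypothetical rules: one suffix per service grid cluster.
RULES = {
    "mesh-a.example.internal": "10.0.0.100",
    "mesh-b.example.internal": "10.0.1.100",
}
```

Because every service name under a cluster's suffix maps to the same address, a request that is not intercepted by the SIDECAR still arrives at the GLOBAL-SIDECAR of that cluster, which can then route it.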
Under normal conditions, the SIDECAR is injected automatically when the POD starts. The SIDECAR then starts an initialization container to initialize IPTABLES; requests are intercepted to the SIDECAR by IPTABLES, and the SIDECAR proxies them and forwards them to the target service.
To ensure that services are not affected when the SIDECAR is abnormal, this embodiment provides SIDECAR disaster recovery switching. When the SIDECAR container is abnormal, the service grid controls the IPTABLES maintained by KUBE-PROXY and removes the IPTABLES policy that redirects requests to the SIDECAR, so that all traffic sent by the service is forwarded to the GLOBAL-SIDECAR, which routes and forwards it.
At disaster recovery switchover, the method uses the control plane to make KUBE-PROXY delete the IPTABLES policy that redirects to the SIDECAR, and it uses the domain name suffix as the underlying service name in the service grid. Requests sent by the Kubernetes cluster are resolved to the IP of the GLOBAL-SIDECAR through hardware DNS configuration rules, so disaster recovery switching can be performed when the SIDECAR is abnormal without restarting the POD, which reduces the impact on services.
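The switchover described in this embodiment can be summarised as a small state model in Python. It is a sketch under stated assumptions, not the control-plane code: the rule label `REDIRECT_RULE`, the `PodNetwork` class, and the loopback address used for the local SIDECAR are hypothetical, and the actual IPTABLES commands issued through KUBE-PROXY are not shown in the text.

```python
from dataclasses import dataclass, field
from typing import List

REDIRECT_RULE = "redirect-outbound-to-sidecar"  # hypothetical rule label
SIDECAR_ADDR = "127.0.0.1:15001"                # local SIDECAR listener


@dataclass
class PodNetwork:
    """Hypothetical view of one POD's redirect policy."""
    name: str
    global_sidecar_ip: str  # address that hardware DNS returns for requests
    rules: List[str] = field(default_factory=lambda: [REDIRECT_RULE])

    def next_hop(self) -> str:
        """While the redirect rule exists, traffic is intercepted to the
        local SIDECAR; once it is removed, traffic goes to the address
        DNS resolved, which is the GLOBAL-SIDECAR."""
        if REDIRECT_RULE in self.rules:
            return SIDECAR_ADDR
        return self.global_sidecar_ip


def switch_to_disaster_recovery(pod: PodNetwork,
                                sidecar_abnormal: bool) -> bool:
    """Delete the redirect policy when the SIDECAR is abnormal.
    Returns True if a switchover happened; the POD is not restarted."""
    if sidecar_abnormal and REDIRECT_RULE in pod.rules:
        pod.rules.remove(REDIRECT_RULE)
        return True
    return False
```

The function is idempotent: calling it again after the rule has been removed changes nothing, which matters if the abnormality signal is reported repeatedly.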
Example 2
The present embodiment provides an electronic device, including: one or more processors and a memory, the memory having stored therein one or more programs including instructions for performing the service grid disaster recovery method as described in embodiment 1.
Example 3
The present embodiment provides a computer-readable storage medium including one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing the service grid disaster recovery method as described in embodiment 1.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A service grid disaster recovery method, applied to a service grid cluster, comprising the steps of:
setting a global gateway in the service grid cluster, and resolving requests sent by the service grid cluster to the global gateway;
forwarding request information from the service grid cluster to a target service by configuring IPTABLES;
when the SIDECAR is abnormal, deleting the IPTABLES policy that redirects requests sent by the service grid cluster to the SIDECAR, and routing through the global gateway instead, so as to realize disaster recovery of the service grid.
2. The service grid disaster recovery method according to claim 1, wherein a request sent by the service grid cluster is resolved into an IP address of the global gateway by configuring DNS.
3. The service grid disaster recovery method according to claim 1, wherein said service grid cluster is configured by starting a POD and injecting a SIDECAR.
4. The service grid disaster recovery method of claim 1, wherein said SIDECAR comprises a first unit for retrieving XDS rules and performing an availability check and a second unit for forwarding request information from said service grid cluster to a target service based on said XDS rules.
5. The service grid disaster recovery method of claim 1 wherein the service grid cluster is a K8S cluster.
6. The method of claim 1, wherein the service grid cluster comprises a plurality of containers.
7. The service grid disaster recovery method of claim 1, further comprising the steps of:
and acquiring the availability information of each container in the service grid cluster and sending a heartbeat packet to an external unified registry.
8. The service grid disaster recovery method according to claim 7, wherein the condition for judging that the SIDECAR is abnormal is:
the unified registry does not receive the heartbeat packet, or receives an exception.
9. An electronic device, comprising: one or more processors and memory, the memory having stored therein one or more programs, the one or more programs comprising instructions for performing the service grid disaster recovery method of any of claims 1-8.
10. A computer readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising instructions for performing the service grid disaster recovery method of any of claims 1-8.
CN202310398903.9A 2023-04-14 2023-04-14 Service grid disaster recovery method, equipment and medium Pending CN116346587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310398903.9A CN116346587A (en) 2023-04-14 2023-04-14 Service grid disaster recovery method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310398903.9A CN116346587A (en) 2023-04-14 2023-04-14 Service grid disaster recovery method, equipment and medium

Publications (1)

Publication Number Publication Date
CN116346587A (en) 2023-06-27

Family

ID=86889379

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310398903.9A Pending CN116346587A (en) 2023-04-14 2023-04-14 Service grid disaster recovery method, equipment and medium

Country Status (1)

Country Link
CN (1) CN116346587A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116980480A (en) * 2023-09-25 2023-10-31 上海伊邦医药信息科技股份有限公司 Method and system for processing fusing information based on micro-service network model
CN116980480B (en) * 2023-09-25 2024-02-27 上海伊邦医药信息科技股份有限公司 Method and system for processing fusing information based on micro-service network model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination