CN116346587A

CN116346587A - Service grid disaster recovery method, equipment and medium

Info

Publication number: CN116346587A
Application number: CN202310398903.9A
Authority: CN
Inventors: 铁锦程; 李虎; 曾毅峰; 刘佳利; 刘冉; 吕刚
Original assignee: Shanghai Pudong Development Bank Co Ltd
Current assignee: Shanghai Pudong Development Bank Co Ltd
Priority date: 2023-04-14
Filing date: 2023-04-14
Publication date: 2023-06-27

Abstract

The invention relates to a service grid disaster recovery method, device and medium, applied to a service grid cluster. The method comprises the following steps: setting a global gateway in the service grid cluster, and resolving requests sent by the service grid cluster to the global gateway; forwarding request information from the service grid cluster to a target service by configuring IPTABLES; and, when the SIDECAR is abnormal, deleting the IPTABLES policy that redirects requests sent by the service grid cluster to the SIDECAR, thereby realizing disaster recovery of the service grid. Compared with the prior art, requests sent by the cluster are resolved to the IP of the GLOBAL-SIDECAR through hardware DNS configuration rules, so disaster recovery switching can be performed when the SIDECAR is abnormal without restarting the POD, which reduces the service impact.

Description

Service grid disaster recovery method, equipment and medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a service grid disaster recovery method, device, and medium.

Background

The service grid (SERVICE MESH) is an infrastructure layer dedicated to handling service traffic and is regarded as the next generation of micro-service architecture. Its duty is to deliver requests reliably across the complex service topologies of cloud-native applications. It is implemented as a group of lightweight network proxies that are deployed together with the application services and are transparent to them. The overall service grid architecture consists of a data plane and a control plane.

The management component is called the control plane (CONTROL PLANE) and is responsible for communicating with the agents in the data plane and issuing policies and configurations.

The agents are referred to as SIDECARs, or the data plane (DATA PLANE), in the service grid. They directly process inbound and outbound packets and perform forwarding, routing, health checks, load balancing, authentication, generation of monitoring data, and so on.

ISTIO is one of the Service Mesh technologies. Existing service grids are generally developed on the basis of open-source ISTIO, with functions extended on top of it. In Kubernetes clusters, the POD is the basis for all traffic types. In current service grid platforms the SIDECAR is supposed to act as a transparent proxy for the business service, but problems in the SIDECAR itself sometimes cause business failures. The following problems exist when a service grid platform is used:

(1) When the POD fails to start, the service container restarts continually because of the health check configuration: the service needs to call other services when it starts; if such a call fails, the service exits, and because the service has no retry logic or disaster recovery logic, the traffic sent after the service container starts cannot be processed.

(2) In some cases DNS resolution fails, so that traffic sent by a service container cannot be processed: when the intelligent DNS of ISTIO is configured, the format of the DNS response packet differs from that of normal DNS, which can easily cause resolution failures when other DNS clients are used.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide a service grid disaster recovery method, device and medium, so as to solve the problem that existing methods lack a disaster recovery mechanism when the service grid SIDECAR is abnormal.

The aim of the invention can be achieved by the following technical scheme:

In one aspect of the present invention, a service grid disaster recovery method is provided, applied to a service grid cluster, and the method includes the following steps:

setting a global gateway in the service grid cluster, and resolving requests sent by the service grid cluster to the global gateway;

forwarding request information from the service grid cluster to a target service by configuring IPTABLES;

when the SIDECAR is abnormal, deleting the IPTABLES policy that redirects requests sent by the service grid cluster to the SIDECAR, and using the global gateway for routing, so as to realize disaster recovery of the service grid.
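By way of non-limiting illustration, the three steps above can be modelled as a small state change on the outbound path of a POD. The sketch below is a simulation only and does not touch DNS or IPTABLES; the class `Pod`, the functions `resolve`, `fail_over` and `next_hop`, and the address `10.0.0.100` are hypothetical names chosen for this example and do not appear in the source.

```python
from dataclasses import dataclass

GLOBAL_GATEWAY_IP = "10.0.0.100"  # hypothetical address of the global gateway


@dataclass
class Pod:
    """Minimal model of a POD's outbound path (illustrative only)."""
    name: str
    sidecar_healthy: bool = True
    redirect_rule_present: bool = True  # stands in for the IPTABLES redirect policy


def resolve(hostname: str) -> str:
    """Step 1: DNS resolves every request sent by the cluster to the gateway."""
    return GLOBAL_GATEWAY_IP


def fail_over(pod: Pod) -> None:
    """Step 3: delete the redirect policy when the SIDECAR is abnormal.

    The POD object is mutated in place; nothing is restarted.
    """
    if not pod.sidecar_healthy:
        pod.redirect_rule_present = False


def next_hop(pod: Pod, hostname: str) -> str:
    """Return which component handles an outbound request from the POD."""
    destination_ip = resolve(hostname)
    if pod.redirect_rule_present:
        # Step 2: the redirect policy intercepts the packet before it
        # reaches the resolved address, so the SIDECAR proxies it.
        return "sidecar"
    # Without the redirect policy the packet goes to the resolved address.
    return f"global-gateway@{destination_ip}"
```

In this model the resolved address is the same before and after switchover; only the presence of the redirect policy decides whether the SIDECAR or the global gateway handles the request, which is why no POD restart is involved.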

As a preferred solution, the SIDECAR includes a first unit for acquiring XDS rules and performing availability checks, and a second unit for forwarding request information from the service grid cluster to a target service based on the XDS rules.

As a preferred solution, DNS is configured to resolve requests sent by the service grid cluster to the IP address of the global gateway.

As a preferred solution, the service grid cluster is configured by starting POD and injecting a SIDECAR.

As a preferred solution, the condition for judging that the SIDECAR is abnormal is as follows:

the unified registry does not receive the heartbeat packet, or receives an exception.
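The judging condition above can be sketched, purely as an illustration, as a registry that records the time of the last heartbeat from each SIDECAR. The class `UnifiedRegistry`, its methods, and the 15-second timeout are assumptions made for this example; the source specifies neither a timeout value nor a heartbeat format, and "receives an exception" is interpreted here as a heartbeat that reports an unhealthy state.

```python
import time
from typing import Dict, Optional, Set

HEARTBEAT_TIMEOUT_S = 15.0  # hypothetical threshold; the source gives no value


class UnifiedRegistry:
    """Tracks heartbeats and flags SIDECARs that are silent or report errors."""

    def __init__(self, timeout_s: float = HEARTBEAT_TIMEOUT_S) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}
        self._reported_abnormal: Set[str] = set()

    def heartbeat(self, sidecar_id: str, healthy: bool = True,
                  now: Optional[float] = None) -> None:
        """Record a heartbeat; `healthy=False` models a reported exception."""
        now = time.monotonic() if now is None else now
        self._last_seen[sidecar_id] = now
        if healthy:
            self._reported_abnormal.discard(sidecar_id)
        else:
            self._reported_abnormal.add(sidecar_id)

    def is_abnormal(self, sidecar_id: str, now: Optional[float] = None) -> bool:
        """True if no heartbeat arrived in time or an exception was reported."""
        now = time.monotonic() if now is None else now
        last = self._last_seen.get(sidecar_id)
        if last is None:
            return True  # never heard from this SIDECAR
        if sidecar_id in self._reported_abnormal:
            return True  # the latest heartbeat carried an exception
        return now - last > self.timeout_s
```

The optional `now` argument exists only so that the timing behaviour can be exercised deterministically; a real registry would rely on its own clock.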

As a preferable technical scheme, the service grid cluster is a K8S cluster.

As a preferable technical scheme, the method further comprises the following steps:

acquiring the availability information of each container in the service grid cluster and sending a heartbeat packet to an external unified registry.

As a preferred solution, the service grid cluster includes a plurality of containers.

In another aspect of the present invention, there is provided an electronic apparatus including: one or more processors and a memory, wherein the memory stores one or more programs, and the one or more programs comprise instructions for executing the service grid disaster recovery method.

In another aspect of the invention, a computer-readable storage medium is provided that includes one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing the service grid disaster recovery method described above.

Compared with the prior art, the invention resolves requests sent by the service grid cluster to the global gateway through hardware DNS configuration rules and, at disaster recovery switchover, deletes the IPTABLES policy that redirects traffic to the SIDECAR. Disaster recovery switching can therefore be performed when the SIDECAR is abnormal, without restarting the POD, which reduces the service impact and solves or partially solves the problem that existing methods lack a disaster recovery mechanism when the service grid SIDECAR is abnormal.

Drawings

Fig. 1 is a flowchart of a service grid disaster recovery method in embodiment 1.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort shall fall within the scope of protection of the present invention.

Example 1

The service grid (SERVICE MESH) is an infrastructure layer dedicated to handling service traffic and is regarded as the next generation of micro-service architecture. Its duty is to deliver requests reliably across the complex service topologies of cloud-native applications. It is implemented as a group of lightweight network proxies that are deployed together with the application services and are transparent to them. The overall service grid architecture consists of a data plane and a control plane.

The agents are referred to as SIDECARs, or the data plane (DATA PLANE), in the service grid. They directly process inbound and outbound packets and perform forwarding, routing, health checks, load balancing, authentication, generation of monitoring data, and so on.

The traffic hijacking procedure of the SIDECAR is as follows. IPTABLES is the management tool for NETFILTER, the firewall software in the LINUX kernel; it runs in user space and is also part of NETFILTER. NETFILTER runs in kernel space and provides not only network address translation but also firewall functions such as packet content modification and packet filtering. The inbound and outbound traffic of the service grid is hijacked to the SIDECAR by technical means such as IPTABLES.

The specific interception process is as follows:

In the NETWORK NAMESPACE where the POD (container) is located, IPTABLES rules intercept both incoming and outgoing traffic, except for the traffic sent by ENVOY itself, and redirect it through NAT REDIRECT to port 15001, on which ENVOY listens.

ENVOY forwards the traffic according to the XDS rules obtained from PILOT.
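As a hedged illustration of the interception just described, the sketch below builds, without executing them, the kind of `iptables` NAT rules that redirect TCP traffic to port 15001 while exempting traffic that the proxy itself sends. The chain name `MESH_REDIRECT` is hypothetical, and identifying ENVOY's own traffic by the user ID `1337` is an assumption (it is the default proxy UID in ISTIO); the rule set actually installed by a given mesh version may differ.

```python
from typing import List

ENVOY_PORT = 15001        # listening port named in the description
PROXY_UID = "1337"        # assumption: user ID that the ENVOY process runs as
CHAIN = "MESH_REDIRECT"   # hypothetical chain name, not taken from the source


def build_redirect_rules() -> List[List[str]]:
    """Return iptables argument lists; nothing is executed here."""
    nat = ["iptables", "-t", "nat"]
    return [
        # Create a dedicated chain that performs the NAT REDIRECT.
        nat + ["-N", CHAIN],
        nat + ["-A", CHAIN, "-p", "tcp",
               "-j", "REDIRECT", "--to-ports", str(ENVOY_PORT)],
        # Incoming traffic: send everything arriving at the POD to the chain.
        nat + ["-A", "PREROUTING", "-p", "tcp", "-j", CHAIN],
        # Outgoing traffic: skip packets sent by the proxy itself, otherwise
        # they would loop back into the proxy.
        nat + ["-A", "OUTPUT", "-p", "tcp",
               "-m", "owner", "--uid-owner", PROXY_UID, "-j", "RETURN"],
        # Outgoing traffic from the service container goes to the chain.
        nat + ["-A", "OUTPUT", "-p", "tcp", "-j", CHAIN],
    ]
```

Joining each list with spaces yields a command line that would have to be run, with sufficient privileges, inside the network namespace of the POD; the order of the two `OUTPUT` rules matters because the exemption must be evaluated before the redirect.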

The ENVOY LISTENER on 0.0.0.0:15001 receives all traffic entering and leaving the POD and then hands each request over to the corresponding VIRTUAL LISTENER.

For each service port of the POD, an HTTP LISTENER on POD IP + port receives INBOUND traffic.

For each SERVICE + non-HTTP port, a LISTENER matches the OUTBOUND non-HTTP traffic.

For each SERVICE + HTTP port, an HTTP LISTENER on 0.0.0.0 + port receives OUTBOUND traffic.
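The hand-over from the 15001 listener to a virtual listener can be pictured as a lookup on the original destination of the connection. The following is a simplified sketch under stated assumptions: the lookup order (exact address first, then the `0.0.0.0` wildcard for the same port), the function `pick_listener`, and all addresses and listener names in `LISTENERS` are illustrative and are not taken from the source.

```python
from typing import Dict, Optional, Tuple

Address = Tuple[str, int]  # (IP, port) of the original destination
WILDCARD = "0.0.0.0"


def pick_listener(listeners: Dict[Address, str],
                  original_dst: Address) -> Optional[str]:
    """Choose the virtual listener for a connection redirected to 15001."""
    ip, port = original_dst
    exact = listeners.get((ip, port))
    if exact is not None:
        return exact  # POD IP + port, or SERVICE IP + non-HTTP port
    # Fall back to the wildcard HTTP listener bound to the same port.
    return listeners.get((WILDCARD, port))


# Hypothetical listener table covering the three cases listed above.
LISTENERS: Dict[Address, str] = {
    ("10.1.2.3", 8080): "inbound-http",     # POD IP + port, INBOUND
    ("10.96.0.20", 3306): "outbound-tcp",   # SERVICE IP + non-HTTP port
    (WILDCARD, 9080): "outbound-http",      # 0.0.0.0 + HTTP port, OUTBOUND
}
```

A destination that matches no entry returns `None` in this sketch; how such traffic is treated in practice depends on the mesh configuration.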

The whole interception and forwarding process is transparent to the SERVICE container. The SERVICE container still communicates using the SERVICE domain name and port, and the SERVICE domain name is still resolved to a SERVICE IP, but inside the SIDECAR the SERVICE IP is converted directly into a POD IP, so that traffic leaving the container is forwarded directly to the corresponding POD. By contrast, in the traditional Kubernetes SERVICE mechanism the conversion of SERVICE IP into POD IP is performed on the NODE by IPTABLES rules maintained by KUBE-PROXY.

The service discovery of the SIDECAR is as follows. DNS resolution is an important component of any application infrastructure. When application code attempts to access another service in the cluster, or even a service on the internet, it must first look up the IP address corresponding to the service hostname before initiating a connection to the service. This name lookup process is commonly referred to as service discovery. In Kubernetes, for a service of type CLUSTERIP, the cluster DNS server resolves the hostname of the service to a unique, non-routable virtual IP (VIP). ISTIO introduces an intelligent DNS agent into the SIDECAR agent, which takes over the DNS resolution of the application and returns a random IP; the request is then intercepted to the SIDECAR through IPTABLES in the POD and forwarded to the target service by the SIDECAR.

Existing service grids are mostly developed on the basis of open-source ISTIO, with functions extended on top of it. The SIDECAR, which should be a transparent proxy for business services, sometimes causes business failures because of its own problems. The following problems arise easily when a service grid platform is used:

(1) When the POD fails to start, the service container restarts continually because of the health check configuration. The service needs to call other services when it starts (for example, to pull its configuration from the configuration center), and if the call fails, the service exits because it has no retry logic or disaster recovery logic. The cause of the failure is that the SIDECAR is not yet ready (it needs time to pull its configuration from the control plane), so the traffic sent after the service container starts cannot be handled.

(2) After the intelligent DNS of ISTIO is enabled, DNS resolution fails in some cases, so that traffic sent by the service container cannot be handled. The cause is a defect in the intelligent DNS implementation: the format of its DNS response packet differs from that of normal DNS. Resolution through the underlying GLIBC library works, but other DNS clients may fail.

As shown in Fig. 1, the present embodiment provides a service grid disaster recovery method that uses a domain name suffix as part of the underlying service name of the service grid, disables the intelligent DNS service in ISTIO, and uses hardware DNS for resolution.

A set of GLOBAL-SIDECARs (global gateways built on the same technology as the SIDECAR, with the service governance functions of the SIDECAR removed so that only east-west service discovery remains) is deployed in each service grid cluster, and the hardware DNS resolution server is configured so that all requests sent by the KUBERNETES cluster are resolved to the IP of the GLOBAL-SIDECAR.
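The DNS configuration rule of this embodiment can be illustrated, in a non-limiting way, as a suffix match that sends every mesh service name to one address. The suffix `.mesh.example.internal`, the address `10.0.0.100`, and the function `resolve` are hypothetical; the source does not disclose the actual suffix, the gateway address, or the rule syntax of the hardware DNS server.

```python
from typing import Dict, Optional

MESH_SUFFIX = ".mesh.example.internal"  # hypothetical domain name suffix
GLOBAL_SIDECAR_IP = "10.0.0.100"        # hypothetical GLOBAL-SIDECAR address


def resolve(hostname: str, upstream: Dict[str, str]) -> Optional[str]:
    """Resolve a hostname the way the suffix rule is assumed to behave.

    Names carrying the mesh suffix always map to the GLOBAL-SIDECAR;
    every other name is looked up in `upstream`, which stands in for
    ordinary DNS records.
    """
    name = hostname.rstrip(".").lower()  # drop a trailing root dot, ignore case
    if name.endswith(MESH_SUFFIX):
        return GLOBAL_SIDECAR_IP
    return upstream.get(name)
```

Because the answer for a mesh name never depends on the state of the SIDECAR, the service container needs no new DNS lookup at switchover; the address it already holds is valid on both the normal path and the disaster recovery path.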

Under normal conditions, the SIDECAR is injected automatically when the POD starts. The SIDECAR then starts an initialization container to initialize IPTABLES; requests are intercepted to the SIDECAR through IPTABLES, and the SIDECAR proxies them and forwards them to the target service.

To ensure that services are not affected when the SIDECAR is abnormal, the embodiment provides SIDECAR disaster recovery switching: when the SIDECAR container is abnormal, the service grid control plane directs KUBE-PROXY, which maintains IPTABLES, to remove the IPTABLES policy that redirects requests to the SIDECAR, so that all traffic sent by the service is forwarded to the GLOBAL-SIDECAR, which routes and forwards the traffic.

At disaster recovery switchover, the control plane directs KUBE-PROXY to delete the IPTABLES policy that redirects to the SIDECAR, and the domain name suffix is used as the underlying service name of the service grid. Requests sent by the Kubernetes cluster are resolved to the IP of the GLOBAL-SIDECAR through the hardware DNS configuration rules, so disaster recovery switching can be performed when the SIDECAR is abnormal without restarting the POD, which reduces the service impact.
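As a minimal sketch of the switchover step, the code below derives the `iptables` deletion command for an outbound redirect rule by turning its append flag (`-A`) into a delete flag (`-D`), which is how a rule previously added with a given specification is removed. The chain name `MESH_REDIRECT` and the exact rule specification are hypothetical. In the embodiment the deletion is driven by the control plane through KUBE-PROXY; this example only shows what such a deletion could look like and executes nothing.

```python
from typing import List

CHAIN = "MESH_REDIRECT"  # hypothetical chain that redirects to the SIDECAR


def deletion_for(append_rule: List[str]) -> List[str]:
    """Turn an `iptables ... -A ...` rule into the matching `-D` command."""
    if append_rule.count("-A") != 1:
        raise ValueError("expected exactly one -A flag in the rule")
    return ["-D" if arg == "-A" else arg for arg in append_rule]


def build_failover_command() -> List[str]:
    """Return the command that stops outbound traffic being sent to the SIDECAR.

    Assumption: the redirect was installed as a single jump from the
    OUTPUT chain of the nat table to CHAIN.
    """
    outbound_jump = ["iptables", "-t", "nat",
                     "-A", "OUTPUT", "-p", "tcp", "-j", CHAIN]
    return deletion_for(outbound_jump)
```

Once the jump is removed, packets sent by the service container are no longer rewritten and travel to the address returned by DNS, that is, to the GLOBAL-SIDECAR. Connections that had already been redirected before the deletion are not covered by this sketch.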

Example 2

The present embodiment provides an electronic device, including: one or more processors and a memory, the memory having stored therein one or more programs including instructions for performing the service grid disaster recovery method as described in embodiment 1.

Example 3

The present embodiment provides a computer-readable storage medium including one or more programs for execution by one or more processors of an electronic device, the one or more programs including instructions for performing the service grid disaster recovery method as described in embodiment 1.

While the invention has been described with reference to certain preferred embodiments, those skilled in the art will understand that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims

1. A service grid disaster recovery method, applied to a service grid cluster, comprising the steps of:

2. The service grid disaster recovery method according to claim 1, wherein a request sent by the service grid cluster is resolved into an IP address of the global gateway by configuring DNS.

3. The service grid disaster recovery method according to claim 1, wherein said service grid cluster is configured by starting a POD and injecting a SIDECAR.

4. The service grid disaster recovery method of claim 1, wherein said SIDECAR comprises a first unit for retrieving XDS rules and performing an availability check and a second unit for forwarding request information from said service grid cluster to a target service based on said XDS rules.

5. The service grid disaster recovery method of claim 1 wherein the service grid cluster is a K8S cluster.

6. The method of claim 1, wherein the service grid cluster comprises a plurality of containers.

7. The service grid disaster recovery method of claim 1, further comprising the steps of:

8. The service grid disaster recovery method according to claim 7, wherein the condition for judging that the SIDECAR is abnormal is:

9. An electronic device, comprising: one or more processors and memory, the memory having stored therein one or more programs, the one or more programs comprising instructions for performing the service grid disaster recovery method of any of claims 1-8.

10. A computer readable storage medium comprising one or more programs for execution by one or more processors of an electronic device, the one or more programs comprising instructions for performing the service grid disaster recovery method of any of claims 1-8.