CN113946450A

CN113946450A - Self-adaptive authorized polling load balancing system for K8S micro service framework

Info

Publication number: CN113946450A
Application number: CN202111293069.4A
Authority: CN
Inventors: 沃天宇; 谢一凡
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2021-11-03
Filing date: 2021-11-03
Publication date: 2022-01-18

Abstract

The invention realizes a self-adaptive authorized polling load balancing system and a self-adaptive authorized polling load balancing system for a K8S micro service framework by a method in the field of neural network parameter updating. The method comprises the following steps: the system comprises a service load layer, a service control layer and a user interaction layer, wherein the service load layer is of a three-layer structure from bottom to top, provides weight information acquired from the service control layer, and accesses downstream services by using a weighted polling algorithm; the service control layer calculates the weight proportion which is adopted by each group of micro-services when calling the downstream services by using an algorithm strategy, and sends the proportion to the micro-service example; the user interaction layer provides a web interaction interface. The invention realizes a load balancing system which calculates the optimal weight proportion of the request quantity which should be received between each instance of each group of services under the current state according to indexes such as physical resource load condition, historical service call response time condition and the like in the micro-service cluster, and solves the problems that the traditional load balancing method is empirical, difficult to migrate and depends on architecture optimization.

Description

Self-adaptive authorized polling load balancing system for K8S micro service framework

Technical Field

The invention relates to the technical field of neural network parameter updating, in particular to a self-adaptive authorized polling load balancing system for a K8S micro-service framework.

Background

Most of the early internet backend service architectures are single architectures and are deployed on a single machine. This architecture is limited by the bottleneck of stand-alone performance and cannot cope with increasing access traffic, and program code becomes difficult to maintain as the amount of service code and the degree of coupling between modules increase. Therefore, the backend architecture gradually develops towards distributed and decoupling, and the micro-service architecture is an answer obtained after practical experience. The micro-service architecture is to decouple single software integrated by multiple modules into multiple modules to be respectively and independently deployed, and the modules communicate with each other in a remote network calling mode. Therefore, each module is maintained independently, the capability of the distributed cluster is fully utilized, and the problem of access flow bottleneck is greatly relieved. Kubernetes (hereinafter K8S) is the most widely used microservice management framework in the present day.

The K8S is essentially a container arrangement framework, a Pod is a basic unit managed by the K8S, and a Pod can contain a plurality of containers, and the containers are often cooperated together to complete a business unit, namely, the Pod can be understood as a basic service unit instance. K8S in turn abstracts multiple identical Pod Service instances into Service resources, so-called microservices. Therefore, when calling the downstream Service, the upstream Service does not need to know the details of the specific instance of the Service, and only needs to uniformly request the Service resource of K8S, and then the K8S is responsible for forwarding the request sent to the Service to the specific Pod instance. The mode of the design of K8S can embody the advantages of the micro service architecture, each service instance can be deployed in a distributed mode and in multiple copies, and meanwhile, when the load of the service is increased, the capacity can be rapidly expanded by increasing the copies of the Pod instances.

However, there are some problems with the microservice architecture. Because the logic of each service of the micro-service architecture is too basic and light, a complete business logic needs to be completed by cooperation of a plurality of micro-service program instances, which means that executing a request often generates a call chain or even a call network, thereby naturally causing the increase of the business response time, and the corresponding time of the business directly affects the user experience. Reducing the total response time of a service invocation is therefore a valuable point of research. The prior art is mainly based on a container level, that is, a reasonable placement strategy and a capacity expansion and contraction strategy are designed for an instance of a service, so that each instance of a group of services keeps a low load as much as possible, and the group of services can continuously provide high-quality services. In addition, some software starts from a load balancing strategy of the flow, and provides an interface for modifying the weight for the operation and maintenance personnel of the micro service, so that the access weight of the low-quality container instances in the service is reduced, and the access quantity of the upstream service to the instances with the overhigh load is reduced.

Two feasible ideas are provided for reducing service time delay and improving service quality. One is from the perspective of service instances, and for services with high load, the number of instances of the services is increased, and such schemes usually have good effects, but waste of cluster resources is inevitably caused. In addition, the cluster physical resources are costly, and under the condition that the cluster physical resources are limited, the effect of the instance capacity expansion strategy can be trapped in a bottleneck, and the resources of other instances can be preempted. Therefore, a second class of thinking for improving the service quality is introduced, namely starting from load balancing under the condition of a certain number of service instances, flow access weight proportion among a group of micro service instances is adjusted, and the service quality is further improved. The related prior art solutions starting from the second category of ideas will be briefly described below.

There are several service management systems applied to micro service architecture, such as Istio, Traetik, Kong, etc. that compare the heat in the industry. Since microservice architecture itself is a concept emerging from the industry, it is often of interest to researchers how to make an architecture better serve business products. Some of the above mentioned products are open sources and some have commercial versions, but their functions are all more focused on designing for the concept of "service" in the micro service architecture, for example, they all provide gateway authentication, gray release, etc. But for the "traffic" management implied under the "service" management, especially the load balancing capability of the traffic provides only simple functions, such as a basic polling algorithm and a weighted polling load balancing algorithm by manually configuring weights.

There are some fundamental research works that are more apt to serve business functions than mature products. Hao Zhou et al measures the load of the micro-service according to the average waiting time of the requests in the queue, and preferentially processes the service requests with high priority and discards the service requests with low priority according to the indexes of the service priority and the like which are subjected to fine design, thereby ensuring the service quality of the important service. Ding Z et al focuses on the service invocation link, calculates the deadline of each task by analyzing instance processing speed, network transmission speed and task concurrency to obtain the urgency of the task, and then uses an algorithm based on list scheduling to process the more urgent request to ensure the quality of service, which then obtains good results on the simulator.

The existing technology adopting the weighted polling load balancing algorithm usually needs to manually match the weights of the service instances, and the method based on experience is not timely enough and brings extra workload to development and maintenance personnel. In addition, some techniques for improving the service quality rely on optimizing the conventional K8S micro service architecture, which is not universal and is not convenient for service maintenance and migration. Still other studies on load balancing algorithm strategies have only stayed in the laboratory and have not given a complete system that could be applied to actual production.

Disclosure of Invention

Therefore, the invention firstly provides a self-adaptive right-carrying polling load balancing system for a K8S micro-service framework, which comprises a three-layer structure of a service load layer, a service control layer and a user interaction layer from bottom to top, wherein the service load layer provides weight information acquired from the service control layer for each service instance of micro-service and accesses downstream services by using a right-carrying polling algorithm; the service control layer calculates the weight proportion which is adopted by each group of micro-services when calling the downstream services according to the monitoring data of the service load layer collected by the monitoring module and by using an algorithm strategy, and sends the proportion to the micro-service example; the user interaction layer provides a web interaction interface for monitoring flow calling and weight information of the micro-services in the cluster and service quality conditions, and the usability of the system is improved.

The specific implementation manner of the service load layer is as follows: when the K8S framework accesses the downstream Service through Service IP, the IP address is intercepted by the physical machine where the upstream container is located, and is converted into an actual instance IP according to an IPtables routing policy table configured by K8S; the scheme needs to complete two parts of work, the first part of work is to control the flow of the service outflow to the service agent. This part can be realized by an Iptables tool provided by Linux, and the second part is that the service agent sends traffic to a specific instance downstream, and selects the cloud native network agent Envoy to realize.

The service control layer is composed of a monitoring module and a decision module.

The monitoring module monitors three dimensional indexes: the first dimension is cluster resource monitoring, and physical machine information, service information and instance information contained in service in a cluster are acquired through an Api-server interface provided by K8S; the second dimension is physical resource information, after the information of the first dimension is obtained, the conditions of physical machines and container examples in the cluster are mastered, and then a MetricsServer plug-in provided by a K8S community is used for collecting the CPU load conditions of the physical machines and the examples; the third dimension is service calling condition, a service calling relation directed graph and time consumption of service calling are constructed by monitoring the flow direction condition and calling time of service flow, monitoring is carried out by Envoy network agent software deployed in a service container, and data collection and historical data storage are carried out by Prometheus software.

The decision module calculates the weight proportion, and the specific method comprises the following steps: one service instance A for an upstream service A_iInstance B of invoking downstream service B_jIs expressed as

The type is an integer, where the upstream instance calls the total sum of the weights of all instances of the downstream service to 100, i.e.

Upstream instance A_iInvoking downstream instance B_jResponse time of

Formed of two parts, the processing time of the downstream service itself

And time consumption of downstream services to continue invoking lower level service chains

Namely, it is

Time consuming for the downstream service to continue invoking a lower level service chain

Using softmax function to schedule calls to various instances of downstream services

Converting into a weight factor, then adjusting the previous round of weight by a factor of:

wherein k is B_jThe number of the physical machine is located, and alpha is a hyper-parameter: CPU occupancy P of a machine_iAnd degree of idleness A_iThe relationship of (1):

where threshold is an empirical threshold, then the idleness of the machine is paired with instance B of service B_jThe impact factor of (c) can be simply modeled as:

since the sum of the results of the Softmax function is 1 and the sum of the weights is 100, the result is here scaled up by a factor of 100 and the result of the Softmax function scaled up by a factor of 100 is subtracted from the theoretical average of the set of weights, where | B | is the number of instances of service B. Each service completing a request has an inherent CPU consumption and excessive accesses have an additional CPU consumption, which can be modeled by the softmax function:

the operation of normalization is not described herein.

The expression for the final weight update according to the above is:

the method of (2) assigns an initial value to the parameter:

and finally, obtaining the weight matching information of each group of services, and providing the weight matching information to a service load layer, thereby realizing the self-adaptive weighted polling load balancing system.

The user interaction layer provides a relevant command line tool and a visual interface, and the K8S service deployment can realize the Yaml file deployment of the capability of realizing the network agent by additionally injecting scripts into the service container through modification.

The technical effects to be realized by the invention are as follows:

the invention provides an algorithm strategy, which is used for calculating the optimal weight ratio of the request quantity which is required to be received among all instances of each group of services under the current state according to indexes such as physical resource load conditions, historical service call response time conditions and the like in a micro service cluster.

The invention provides a self-adaptive weighted polling load balancing system applied to a K8S framework, which comprises a service load layer, a service control layer and a user interaction layer. And the network agent container injected into the Pod where the service container is located in the service load layer completes the process of accessing the downstream flow by using the authorized polling algorithm. And the service control layer calculates the optimal access weight ratio of each service instance in the micro-service cluster according to the monitoring index. The user interaction layer provides a command line tool and a visual interface, and the usability of the system is improved.

Drawings

FIG. 1 is a schematic diagram of the overall architecture of the system;

FIG. 2 is a schematic diagram of service load layer traffic flow;

FIG. 3 is a schematic diagram of a monitoring sub-module of a service control layer

Detailed Description

The following is a preferred embodiment of the present invention and is further described with reference to the accompanying drawings, but the present invention is not limited to this embodiment.

The invention provides an adaptive weighted polling load balancing system for a K8S micro service framework. The structure of the system is shown in fig. 1, and the system comprises a service load layer, a service control layer and a user interaction layer. In the service load layer, each service instance of the micro service accesses downstream services by using a weighted polling algorithm according to the weight information acquired from the service control layer; the service control layer generates weight information used by a weighted polling algorithm for each group of micro-services by using an algorithm strategy according to the monitoring data of the service load layer collected by the monitoring module; the user interaction layer provides a web interaction interface for monitoring the flow calling and weight information of the micro-service in the cluster and the service quality condition, and improves the usability of the system.

The system calculates the weight proportion that each group of service should adopt when calling the downstream service according to the load condition of the service instance in the current cluster and the algorithm strategy, and sends the proportion to the service instance. And each service instance is matched according to the received weight and uses a weighted polling algorithm to call the service, so that the aim of reducing the total time consumption of service response is fulfilled. Therefore, a layer of guarantee can be provided on the basis that the cluster adopts a container expansion and contraction strategy, and meanwhile, the defect of manual weight setting is overcome by the self-adaptive weight adjustment strategy.

The system realized by the invention is essentially a load balancer, and after the load balancer is deployed into a K8S cluster, service instances in the cluster access downstream services by adopting a self-adaptive weighted polling algorithm, thereby finally achieving the effect of reducing the response time of service call. The system needs to be deployed into the K8S microservice framework and become part of the K8S framework. In an actual production environment, micro-service developers do not need to perform additional operation, and all used auxiliary plug-ins such as auxiliary containers for monitoring service instances and the like can be automatically deployed.

Service load layer:

the service load layer is a place where the micro-service container is deployed and is also a place where the access traffic really forms a load. The service load layer will use the weighted polling algorithm to access the downstream services it needs to access according to the weight ratio received from the service control layer.

K8S provides an abstract Pod concept, where multiple containers (containers) under a Pod are under a network stack, so that the network stack of the Pod where a normal service Container is located can be modified by an extra injected Container, the network access traffic flow direction of the service Container is controlled, and a network proxy is deployed for the Pod, so that all traffic flow of a service can be controlled to the proxy Container, and then the traffic flow is controlled to a specific instance of the downstream service through the proxy Container. As in fig. 2, a container within an upstream pod will access the container service in a downstream instance, with the service having three instances. The load balancing capability provided by the K8S framework itself is that when an upstream container accesses a downstream Service through Service IP, the IP address will be intercepted by the physical machine where the container is located, and converted into an actual instance IP according to the Iptables routing policy table configured by K8S. Compared with a native method of a K8S framework, the scheme adopted by the system transfers the step to the inside of the Pod of K8S for processing, and as the Pod provides a virtualized Linux environment with more perfect functions, the conversion from Service IP to Pod IP can be completed inside the Pod.

This solution requires two parts of work to be done. The first part of the work is to control the flow of service outgoing traffic to the service agents. This may be accomplished through the Linux-provided Iptables tool, which is a user-mode command line tool provided by the Linux kernel for allowing a user to operate the Netfilter firewall in the kernel space. The project uses the Iptables to hijack all traffic flowing out of the space of a specific user to the monitoring port of the agent. The second part of the work is that the service agent sends traffic downstream to a particular instance. Here, first, the type selection of the service agent needs to be considered, and the service agent needs to satisfy the following characteristics: first, the weight is light, and each service Pod has a service agent therein, so that the agent needs to be as light as possible; secondly, the method is configurable, and the forwarding rules of the agent can be conveniently configured outside; and thirdly, the proxy service is native to the cloud, and as the proxy service runs under a micro-service framework, the functions of the proxy service are realized by depending on a network interface instead of a configuration file and the like. In conclusion, a cloud native network agent Envoy is finally selected. Envoy can obtain the traffic forwarding strategy by sending a request to the server, thereby realizing the purpose of specifically forwarding the traffic of the service container according to the rules.

In conclusion, the scheme hijacks the traffic of the service container to the Envoy network proxy container belonging to the service container Pod, and then implements the access policy with right polling in the Envoy container. In addition, the weight matching information of the downstream service is accessed and is also acquired by the service request of the Envoy container to the service control layer. The container traffic hijacking scheme under the micro-service is also a relatively mature solution in the industry.

Service control layer:

the service control layer in the architecture diagram of fig. 1 is basically composed of two parts in terms of functional logic, namely a monitoring part responsible for acquiring cluster state data and a decision part for making a decision according to monitoring data of the monitoring part.

The architecture diagram of the monitoring submodule is shown in fig. 3. The monitoring module needs to monitor three dimensional indicators. The first dimension is cluster resource monitoring, and physical machine information, service information and instance information contained in a service in a cluster are acquired through an Api-server interface provided by K8S. The second dimension is physical resource information, after the information of the first dimension is obtained, the conditions of physical machines and container instances in the cluster can be mastered, and then a MetricsServer plug-in provided by a K8S community is used for collecting the CPU load conditions of the physical machines and the instances. The third dimension is service calling condition, a service calling relation directed graph and time consumption of service calling can be constructed by monitoring the flow direction condition and calling time of service traffic, the index is monitored by Envoy network agent software deployed in a service container, and data collection and historical data storage are performed by Prometheus software. Prometheus is also an open source, sophisticated monitoring assistance component that can access the metrics data interface at intervals and save the metrics data in its own time-series database, while providing sql-like language to query for relevant metrics across time periods. Prometheus also features cloud-native, which is compatible with the K8S framework and allows for automated incorporation of newly created services into the monitoring through simple configuration. And finally, the monitoring submodule of the service control layer can obtain stable cluster state monitoring index data by using the three types of index information, so that a decision module is assisted to make a decision.

And the other part of the decision module of the service control layer is responsible for calculating a reasonable weight ratio for the process of calling the downstream service for each group of services according to the cluster state collected by the monitoring part at the moment. One service instance A for an upstream service A_iIn other words, it invokes instance B of downstream service B_jIs expressed as

The types are integers. Where the upstream instance calls the total sum of the weights of all instances of the downstream service to 100, i.e.

Upstream instance A_iInvoking downstream instance B_jResponse time of

Formed of two parts, the processing time of the downstream service itself

Namely, it is

The update of the weight parameters also needs to take these two parts into account.

First is the time consumption of reading downstream services to continue invoking lower-level service chains

Using a thought process of dynamic programming, i.e. consider instance B_jIs already optimal, then only need to be based on

Modeling is performed. Here, the call time of each instance of the downstream service is taken using the softmax function

Converted into a weighting factor. The softmax function was chosen to take advantage of its constant invariant property:

softmax(X)＝softmax(X+C)

since each service will have an inherent processing response time

This time assumption is considered notIn addition, the response time increment caused by the excessive extra request

Then for each instance downstream call time of the service, the result of modeling it using the softmax function can be considered as the time consumption caused by the influence of excessive access to this instance, and further, a factor for adjusting the previous round of weight is derived:

wherein k is B_jThe physical machine in which it is located.

The second part is the modeling of the downstream service's own processing time. The largest impact on processing time, except for external IO time, is the number of CPU slots. In an environment where multiple services compete for resources, the more time slices a service can be divided into, the shorter it will be to process a request. The number of time slices a service can be divided into is related to how strongly the physical machines compete, and therefore traffic should be distributed to machines where the physical machine at the downstream instance is relatively idle. A simple function is used to describe the CPU occupation P of a certain machine_iAnd degree of idleness A_iThe relationship of (1):

where threshold is an empirical threshold. The above equation can be simply understood as how idle is dependent on the CPU footprint P_iIncreases and decreases linearly, and after a threshold is reached, the complete machine is considered fully busy. Then the machine's idleness is matched to instance B of service B_jThe impact factor of (c) can be simply modeled as:

since the sum of the results of the Softmax function is 1 and the sum of the weights is 100, the result is here scaled up by a factor of 100 and the result of the Softmax function scaled up by a factor of 100 is subtracted from the average of the set of weights, where | B | is the number of instances of service B. However, it should be noted that distributing traffic to instances where the machine in the downstream service is idle may result in overloading the instance and thus affecting processing time, and an additional penalty is added to avoid this. This penalty term can be modeled by using softmax for the CPU occupancy that services all services. As with request time, there is an inherent CPU consumption per service completion request and an excess of accesses brings an additional CPU consumption, which can be modeled by the softmax function:

wherein P is_BIs the CPU consumption value of all instances of service B, furthermore

Formula (I) is carried out

The normalization operations with similar formulas are not repeated herein

The expression for the final weight update according to the above is:

learning, an initial value needs to be given to the parameter, and the initial value is simply an average distribution method:

finally, according to the formula, the weight matching information of each group of services can be obtained and provided to a service load layer, so that the self-adaptive weighted polling load balancing system is realized.

User interaction layer:

in view of the ease of use of the system, the proposed system also provides a related command line tool and a visualization interface. The command line tool may help facilitate deployment of the present system into the K8S microservice framework, as part of the K8S framework. In addition, the command line tool also provides a service deployment tool, and since the K8S service deployment needs to be deployed through the Yaml file and describe the detailed attributes of the service in the file, the system needs to additionally inject some scripts into the service container to realize the capability of the network proxy, the original Yaml file needs to be modified. By using the command line tool, the microservice developer does not need to perform additional operation, the tool can convert the original Yaml file into the file required by the system, and all the used plug-ins and auxiliary containers can be automatically deployed. The visual web interface provides resource information in the K8S cluster, calling relation of the service and the like.

Claims

1. An adaptive weighted polling load balancing system for a K8S microservice framework, comprising: the micro-service access system comprises a three-layer structure of a service load layer, a service control layer and a user interaction layer from bottom to top, wherein the service load layer provides weight information acquired from the service control layer for each service instance of the micro-service and accesses downstream services by using a weighted polling algorithm; the service control layer calculates the weight proportion which is adopted by each group of micro-services when calling the downstream services according to the monitoring data of the service load layer collected by the monitoring module and by using an algorithm strategy, and sends the proportion to the micro-service example; the user interaction layer provides a web interaction interface.

2. The adaptive weighted polling load balancing system for a K8S microservice framework of claim 1, wherein: the specific implementation manner of the service load layer is as follows: when the K8S framework accesses the downstream Service through Service IP, the IP address is intercepted by the physical machine where the upstream container is located, and is converted into an actual instance IP according to an IPtables routing policy table configured by K8S; the scheme needs to complete two parts of work, the first part of work is to control the flow of the service outflow to the service agent. This part can be realized by an Iptables tool provided by Linux, and the second part is that the service agent sends traffic to a specific instance downstream, and selects the cloud native network agent Envoy to realize.

3. The adaptive weighted polling load balancing system for the K8S microservice framework of claim 2, wherein: the service control layer is composed of a monitoring module and a decision module.

4. The adaptive weighted polling load balancing system for the K8S microservice framework of claim 3, wherein: the monitoring module monitors three dimensional indexes: the first dimension is cluster resource monitoring, and physical machine information, service information and instance information contained in service in a cluster are acquired through an Api-server interface provided by K8S; the second dimension is physical resource information, after the information of the first dimension is obtained, the conditions of physical machines and container examples in the cluster are mastered, and then a MetricsServer plug-in provided by a K8S community is used for collecting the CPU load conditions of the physical machines and the examples; the third dimension is service calling condition, a service calling relation directed graph and time consumption of service calling are constructed by monitoring the flow direction condition and calling time of service flow, monitoring is carried out by Envoy network agent software deployed in a service container, and data collection and historical data storage are carried out by Prometheus software.

5. The adaptive weighted polling load balancing system for the K8S microservice framework of claim 4, wherein: the decision module calculates the weight proportion, and the specific method comprises the following steps: for upstream servicesOne service instance of A_iInstance B of invoking downstream service B_jIs expressed as

Upstream instance A_iInvoking downstream instance B_jResponse time of

Formed of two parts, the processing time of the downstream service itself

Namely, it is

Represents modeling of service invocation time, where B is the downstream invocation service of a.

6. The adaptive weighted polling load balancing system for the K8S microservice framework of claim 5, wherein: time consuming for the downstream service to continue invoking a lower level service chain

wherein k is B_jThe physical machine is located at the same time as the request time;

the processing time of the service itself

The generated weight factor is calculated by the following method: CPU occupancy P of a machine_iAnd degree of idleness A_iThe relationship of (1):

| B | is the number of instances of service B; each service completing a request has an inherent CPU consumption and excessive accesses have an additional CPU consumption, which can be modeled by the softmax function:

wherein P is_BCPU consumption values for all instances of service B;

the expression for the final weight update according to the above is:

and carrying out normalization operation on weight updating, keeping the sum of the weights about 100, and assigning an initial value to the parameters by using an average distribution method:

7. The adaptive weighted polling load balancing system for the K8S microservice framework of claim 6, wherein: the user interaction layer provides a relevant command line tool and a visual interface, and the K8S service deployment can realize the Yaml file deployment of the capability of realizing the network agent by additionally injecting scripts into the service container through modification.