CN118245196A - Application switching method, device, equipment, storage medium and computer program product - Google Patents
Application switching method, device, equipment, storage medium and computer program product Download PDFInfo
- Publication number
- CN118245196A CN118245196A CN202410649874.3A CN202410649874A CN118245196A CN 118245196 A CN118245196 A CN 118245196A CN 202410649874 A CN202410649874 A CN 202410649874A CN 118245196 A CN118245196 A CN 118245196A
- Authority
- CN
- China
- Prior art keywords
- application
- target
- switching
- node
- target application
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
- G06F9/4856—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
- G06F9/4862—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration the task being a mobile agent, i.e. specifically designed to migrate
- G06F9/4875—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration the task being a mobile agent, i.e. specifically designed to migrate with migration policy, e.g. auction, contract negotiation
Landscapes
- Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the invention provide an application switching method, apparatus, device, storage medium and computer program product, relating to the field of communications technology, and comprising the following steps: when an application switching request initiated by a target application for cross-node switching is detected, determining a first target switching node according to node resource information acquired in advance, and determining the target copy number for the target application after switching; distributing traffic across the application copies of the target application for stateless workloads according to a traffic splitting strategy; synchronizing the data replication of the application copies of the target application for stateful workloads across different nodes; and when it is detected that the service level data of the target application is consistent before and after switching, that all service requests on the initial node have been processed, and that the application copy data of the target application is consistent before and after switching, the cross-node switching of the target application is complete. The method and apparatus achieve equivalent and smooth automatic scheduling and migration of cloud native applications in a one-cloud-multi-core (heterogeneous processor) scenario.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an application switching method, apparatus, device, storage medium, and computer program product.
Background
China's cloud computing software and hardware industry chain is developing vigorously, and application scenarios are becoming markedly diverse and heterogeneous. Customers adopt a one-cloud-multi-core strategy to compensate for the weaknesses of individual platforms, hoping that their services can achieve equivalent service quality, i.e., equivalent and smooth switching, across the multiple heterogeneous processor architectures of different infrastructures.
In the related art, the main research direction of most vendors is hybrid multi-architecture deployment: automatic scheduling and cross-architecture migration are realized through node-selection scheduling strategies and multi-architecture image manifests, but application-centered cross-architecture equivalent operation and smooth switching still show a considerable gap. With the development of cloud native technology, more and more services are gradually being converted to microservices and containers; however, the preferred microservice containerization stack, Kubernetes plus Docker, faces the heterogeneous and complex environment of a one-cloud-multi-core deployment and cannot achieve cross-architecture computing power equivalence and smooth switching of service applications.
Disclosure of Invention
The embodiments of the invention aim to provide an application switching method, apparatus, device, storage medium and computer program product. The specific technical solutions are as follows:
in a first aspect of the present invention, there is provided an application switching method, including:
when an application switching request initiated by a target application for cross-node switching is detected, determining a first target switching node according to node resource information acquired in advance, and determining the target copy number for the target application after switching according to the specification computing power of the first target switching node; distributing traffic across the application copies of the target application for stateless workloads according to a traffic splitting strategy, to obtain the traffic proportion of each application copy; and synchronizing the data replication of the application copies of the target application for stateful workloads across different nodes;
and when it is detected that the service level data of the target application is consistent before and after switching, that all service requests on the initial node of the target application have been processed, and that the application copy data of the target application is consistent before and after switching, placing the target application in a cross-node-switching-complete state.
Optionally, before the step of detecting that the service level data of the target application is consistent before and after switching, that all service requests on the initial node of the target application have been processed, and that the application copy data of the target application is consistent before and after switching, the method includes:
monitoring the target application according to a preset monitoring component to obtain monitoring data;
and determining whether the service level data corresponding to the target application before and after switching is consistent or not according to the monitoring data.
Optionally, the determining whether the service level data corresponding to the target application before and after switching is consistent according to the monitoring data includes:
and if not consistent, correcting the target copy number of the target application and the first target switching node according to the monitoring data, until the service level data of the target application is consistent before and after switching.
Optionally, the correcting the target copy number corresponding to the target application and the first target switching node according to the monitoring data includes:
acquiring, according to the monitoring data, a difference value between first service level data of the initial node before the target application is switched and second service level data of the first target switching node after the target application is switched;
and if the difference value is larger than a first threshold, adjusting the target copy number according to a preset step size, or determining a second target switching node according to the specification computing power and the mapping matrix of the switching nodes.
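A minimal sketch of this correction rule, assuming an additive step adjustment with an illustrative threshold, step size, and copy cap (none of which are fixed by the source text):

```python
def correct_copies(diff: float, threshold: float, copies: int,
                   step: int, max_copies: int) -> int:
    """One correction iteration: if the observed service-level difference
    exceeds the first threshold, adjust the copy number by a preset step.
    Threshold, step and cap values are illustrative assumptions."""
    if diff > threshold:
        return min(copies + step, max_copies)
    return copies

# difference above threshold -> one step added; below -> unchanged
n_adjusted = correct_copies(diff=0.15, threshold=0.1, copies=3, step=1, max_copies=10)
n_kept = correct_copies(diff=0.05, threshold=0.1, copies=3, step=1, max_copies=10)
```

In practice the loop would re-measure the difference after each adjustment and stop once it falls under the threshold, or fall back to choosing a second target switching node.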
Optionally, the obtaining, according to the monitoring data, a difference between the first service level data corresponding to the initial node before the target application is switched and the second service level data corresponding to the first target switching node after the target application is switched includes:
normalizing the throughput value, the delay value and the error count of the initial node before the target application is switched, normalizing the throughput value, the delay value and the error count of the first target switching node after the target application is switched, and combining the weights respectively assigned to the throughput value, the delay value and the error count to obtain the difference value.
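As a hedged illustration, the normalization-and-weighting step might look like the following sketch; the metric names, the max-based normalization, and the weight values are assumptions, not the patent's exact formulas:

```python
def sla_difference(before: dict, after: dict, weights: dict) -> float:
    """Weighted difference between service-level data before and after
    switching: each metric's absolute change is normalized into [0, 1]
    and combined with its weight. Illustrative only."""
    score = 0.0
    for metric, w in weights.items():
        a, b = before[metric], after[metric]
        denom = max(abs(a), abs(b), 1e-9)  # scale-free normalization
        score += w * abs(a - b) / denom
    return score

before = {"throughput": 1000.0, "latency_ms": 20.0, "errors": 2.0}
after = {"throughput": 950.0, "latency_ms": 22.0, "errors": 2.0}
weights = {"throughput": 0.5, "latency_ms": 0.3, "errors": 0.2}
diff = sla_difference(before, after, weights)  # small -> levels are close
```

A difference of zero means identical service level data before and after switching; the first threshold from the correction step would be compared against this score.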
Optionally, the mapping matrix of the specification computing power and the switching nodes is a computing power distribution matrix table based on CPU model, the table recording the mapping between CPU models and specification computing power on the different nodes.
Optionally, before the step of detecting that the service level data of the target application is consistent before and after switching, that all service requests on the initial node of the target application have been processed, and that the application copy data of the target application is consistent before and after switching, the method includes:
And judging whether the service requests corresponding to the initial nodes corresponding to the target application are processed completely or not according to a second preset monitoring component.
Optionally, after the step of determining, according to the second preset monitoring component, whether the service request corresponding to the initial node corresponding to the target application is all processed, the method includes:
If not, optimizing the flow segmentation strategy according to the information aiming at the service request recorded by a preset log system.
Optionally, after the step of determining, according to the second preset monitoring component, whether the service request corresponding to the initial node corresponding to the target application is all processed, the method includes:
if yes, the flow proportion corresponding to the application copy is used as the flow proportion corresponding to the target application cross-node switching.
Optionally, before the step of detecting that the service level data of the target application is consistent before and after switching, that all service requests on the initial node of the target application have been processed, and that the application copy data of the target application is consistent before and after switching, the method includes:
verifying, through a preset database query statement, whether the data of the application copy of the stateful-workload target application on the first target switching node is consistent with the data of the application copy of the stateful-workload target application on the initial node.
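The consistency check via database query statements could be sketched as follows, using SQLite as a stand-in for whatever database the deployment actually uses; the table schema and the count/sum queries are illustrative assumptions:

```python
import sqlite3

def tables_consistent(conn_a, conn_b, table: str) -> bool:
    """Check replica consistency with plain SQL: compare row counts and an
    order-independent aggregate over the rows. Illustrative queries only."""
    q_count = f"SELECT COUNT(*) FROM {table}"
    q_sum = f"SELECT COALESCE(SUM(id), 0) FROM {table}"
    for q in (q_count, q_sum):
        if conn_a.execute(q).fetchone() != conn_b.execute(q).fetchone():
            return False
    return True

# stand-ins for the copy on the initial node and the copy on the target node
src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for c in (src, dst):
    c.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
src.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,), (3,)])
dst.executemany("INSERT INTO orders VALUES (?)", [(1,), (2,)])

ok_before = tables_consistent(src, dst, "orders")  # replicas diverge
dst.execute("INSERT INTO orders VALUES (3)")       # re-run data replication
ok_after = tables_consistent(src, dst, "orders")
```

When the check fails, the method above re-runs the data replication synchronization until both copies agree, which the last two lines mimic.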
Optionally, after the step of verifying, by a preset database query statement, whether the data between the application copy of the target application of the stateful workload corresponding to the first target switching node and the application copy of the target application of the stateful workload corresponding to the initial node are consistent, the method includes:
and if not, carrying out data replication synchronization on the application copy of the target application with the state workload among different nodes again until the corresponding application copy data of the target application before and after switching is consistent.
Optionally, the determining a first target switching node according to node resource information acquired in advance, and determining the target copy number for the target application after switching according to the specification computing power of the first target switching node, includes:
determining a first target switching node according to node resource information acquired in advance;
acquiring a first specification computing power of the first target switching node and an initial specification computing power of the initial node;
and generating the target copy number for the target application on the first target switching node according to the first specification computing power, the initial specification computing power and the initial copy number.
Optionally, the acquiring the first specification computing power of the first target switching node and the initial specification computing power of the initial node includes:
acquiring, through evaluation with a benchmark test tool, the integer operation capability and the floating-point operation capability of the first target switching node and of the initial node respectively;
and quantifying, from the integer operation capability and the floating-point operation capability, the first specification computing power of the first target switching node and the initial specification computing power of the initial node.
Optionally, the performing, according to the traffic splitting policy, traffic distribution across the application copies of the target application for stateless workloads, and obtaining the traffic proportion of each application copy, includes:
distributing traffic across the application copies of the target application for stateless workloads according to a preset traffic draining mechanism, a preset traffic warm-up mechanism, a readiness probe, and a configured retry mechanism, to obtain the traffic proportion of each application copy.
Optionally, the distributing traffic across the application copies of the target application for stateless workloads according to a preset traffic draining mechanism, a preset traffic warm-up mechanism, a readiness probe, and a configured retry mechanism, and obtaining the traffic proportion of each application copy, includes:
distributing traffic across the application copies of the target application for stateless workloads in the order: preset traffic warm-up mechanism, readiness probe, preset traffic draining mechanism, and configured retry mechanism, to obtain the traffic proportion of each application copy.
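One way to picture the gradual shift as a toy sketch: the source fixes only the order of the mechanisms (warm-up, readiness probe, draining, retry); the step size, the readiness gate, and the clamping below are assumptions:

```python
def shift_traffic(current_ratio: float, step: float, ready: bool) -> float:
    """One warm-up iteration of moving traffic onto the new application
    copies. The readiness probe gates each increment; the old copies keep
    the remainder until draining completes. Step size is an assumption."""
    if not ready:  # readiness probe failed: keep traffic on the old copies
        return current_ratio
    return min(1.0, current_ratio + step)  # never exceed full traffic

ratio = 0.0
for probe_ok in [False, True, True, True, True]:
    ratio = shift_traffic(ratio, step=0.25, ready=probe_ok)
# after the loop, all traffic targets the new copies; the draining
# mechanism then waits for in-flight requests on the old copies, and the
# retry mechanism re-sends any request that failed mid-shift
```

The final ratio is the traffic proportion recorded for the application copies once the cross-node switch completes.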
Optionally, the synchronizing data replication of application copies of the target application of the stateful workload between different nodes includes:
and synchronously replicating, according to a preset distributed consistency algorithm, the copy data of the application copy of the stateful-workload target application from the initial node to the first target switching node.
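A toy majority-acknowledgement step in the spirit of such a distributed consistency algorithm (e.g. a Raft-style commit on majority); the ack counting and quorum arithmetic below are illustrative, not the patent's protocol:

```python
def replicate(entry: str, followers: list, log: list) -> bool:
    """Leader sends an entry to follower replicas and commits it to the
    local log only if a majority of the cluster (leader included)
    acknowledges. Followers are modeled as callables returning ack/True."""
    acks = sum(1 for follower in followers if follower(entry))
    cluster_size = len(followers) + 1           # leader counts itself
    if acks + 1 > cluster_size // 2:            # strict majority reached
        log.append(entry)                       # safe to commit
        return True
    return False

# three followers: two reachable, one partitioned away
followers = [lambda e: True, lambda e: False, lambda e: True]
log: list = []
committed = replicate("orders:id=3", followers, log)
```

With 2 of 3 followers acknowledging plus the leader, 3 of 4 nodes agree, so the entry commits; an unreachable majority would leave the log unchanged.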
In a second aspect of the present invention, there is also provided an application switching apparatus, the apparatus including:
The static switching module is configured to: when an application switching request initiated by a target application for cross-node switching is detected, determine a first target switching node according to node resource information acquired in advance, and determine the target copy number for the target application after switching according to the specification computing power of the first target switching node; distribute traffic across the application copies of the target application for stateless workloads according to a traffic splitting strategy, to obtain the traffic proportion of each application copy; and synchronize the data replication of the application copies of the target application for stateful workloads across different nodes;
and the dynamic optimization switching module is configured to place the target application in a cross-node-switching-complete state when it is detected that the service level data of the target application is consistent before and after switching, that all service requests on the initial node of the target application have been processed, and that the application copy data of the target application is consistent before and after switching.
In a third aspect of the present invention, there is also provided a communication device comprising: a transceiver, a memory, a processor, and a program stored on the memory and executable on the processor;
The processor is configured to read a program in a memory to implement the application switching method according to any one of the first aspect.
In a fourth aspect of the present invention, there is also provided a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to implement the application switching method according to any one of the first aspects.
In a fifth aspect of the invention, there is also provided a computer program product comprising a computer program/instructions which, when executed by a processor, implement the application switching method according to any one of the first aspects.
According to the application switching method provided by the embodiments of the invention, when an application switching request initiated by a target application for cross-node switching is detected, a first target switching node is determined according to node resource information acquired in advance, and the target copy number for the target application after switching is determined according to the specification computing power of the first target switching node; traffic is distributed across the application copies of the target application for stateless workloads according to a traffic splitting strategy, to obtain the traffic proportion of each application copy; the data replication of the application copies of the target application for stateful workloads is synchronized across different nodes; and when it is detected that the service level data of the target application is consistent before and after switching, that all service requests on the initial node of the target application have been processed, and that the application copy data of the target application is consistent before and after switching, the target application is placed in a cross-node-switching-complete state.
In the embodiments of the invention, balanced scheduling based on specification computing power takes the application's runtime environment into account and ensures computing power consistency before and after application switching; splitting service traffic of stateless applications proportionally at a gateway or load balancer ensures service quality consistency before and after switching; and data synchronization for stateful applications, based on efficient negotiation and data synchronization mechanisms, ensures functional consistency before and after switching. By sensing service and computing power indicators in real time, the equivalence of service quality before and after the cross-architecture switch is continuously adjusted and optimized, forming an observation-based compensation feedback design.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a flowchart illustrating steps of an application switching method according to an embodiment of the present invention;
fig. 2 is a second flowchart of a step of an application switching method according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an application switching device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a communication device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a copy termination process according to an embodiment of the present invention;
Fig. 6 is an overall schematic diagram of an application switching method according to an embodiment of the present invention;
fig. 7 is a schematic diagram of an application switching system architecture according to an embodiment of the present invention;
Fig. 8 is a flowchart of an application switching method according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the embodiments of the present application are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will understand that numerous technical details are set forth in the various embodiments in order to provide a better understanding of the present application; the claimed application may nevertheless be practiced without these specific details, and with various changes and modifications based on the following embodiments. The embodiments are divided for convenience of description only, which should not be construed as limiting the specific implementation of the present application, and the embodiments can be combined with and referred to each other where no contradiction arises.
Referring to fig. 1, a first step flowchart of an application switching method provided by an embodiment of the present invention is shown, where the method may include:
Step 101, under the condition that an application switching request initiated when a target application performs cross-node switching is detected, determining a first target switching node according to node resource information acquired in advance, and determining the number of target copies corresponding to the target application after switching according to a specification calculation force corresponding to the first target switching node; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes;
It should be noted that the overall architecture of the present application is shown in fig. 7. Based on Operator and service governance technology, the application realizes automatic scheduling and migration of cloud native applications in a one-cloud-multi-core scenario through three major parts — node scheduling, traffic splitting and data synchronization — and ensures equivalence and smoothness throughout the process. In fig. 7, a Pod is the minimum scheduling unit in Kubernetes; one Pod encapsulates one container (it may also encapsulate multiple containers), and the containers in a Pod share storage, networking, etc. That is, the entire Pod can be considered a virtual machine, and each container then corresponds to a process running on that virtual machine; all containers in the same Pod are arranged and scheduled as a unit. Kubernetes, also known as k8s, is an open-source container orchestration platform.
It should be noted that, in the embodiments of the present application, in order to achieve smooth switching of a target application between heterogeneous nodes, referring to fig. 6, in the static stage, when an application switching request is initiated, it is necessary to calculate the copy number that is equivalent, under the specification computing power of the different nodes, to the current deployment of the target application; to determine a traffic splitting policy according to the performance differences of the different processor architectures; and, when the application is stateful, to ensure consistent state synchronization. This enables online use and a quick response to the application switching request, and can be realized in three aspects.
First, regarding the application's runtime environment, balanced scheduling based on specification computing power is the key to ensuring computing power consistency before and after application switching.
Therefore, the determining a first target switching node according to node resource information acquired in advance, and determining the target copy number for the target application after switching according to the specification computing power of the first target switching node, includes: determining a first target switching node according to node resource information acquired in advance; acquiring a first specification computing power of the first target switching node and an initial specification computing power of the initial node; and generating the target copy number for the target application on the first target switching node according to the first specification computing power, the initial specification computing power and the initial copy number.
Further, the acquiring the first specification computing power of the first target switching node and the initial specification computing power of the initial node includes: acquiring, through evaluation with a benchmark test tool, the integer operation capability and the floating-point operation capability of the first target switching node and of the initial node respectively; and quantifying, from the integer operation capability and the floating-point operation capability, the first specification computing power of the first target switching node and the initial specification computing power of the initial node.
Specifically, the specification computing power is calculated as follows: based on the SPEC CPU benchmark test tool, the integer and floating-point operation capabilities of the node CPU are evaluated, the individual scores are combined by geometric averaging, and the computing power of heterogeneous CPUs is thereby quantified. The calculation formula based on SPEC CPU is as follows:
P_spec = (A0/100) × [(B0/100) × R_int + (B1/100) × S_int] + (A1/100) × [(B0/100) × R_fp + (B1/100) × S_fp] (Equation 1)
In Equation 1, R_int is the integer multi-task throughput (rate) base value, R_fp is the floating-point multi-task throughput base value, S_int is the integer single-task speed base value, and S_fp is the floating-point single-task speed base value. A0 + A1 = 100 expresses the ratio between the integer and floating-point weights (A0, A1), with default A0 = 50; B0 + B1 = 100 expresses the ratio between the two test modes (B0, B1), with default B0 = 50. The SPEC CPU 2017 test has two evaluation modes, speed and rate: speed tests the time required to complete a task, while rate tests how many tasks can be completed per unit time. Both the speed and rate tests include integer (int) and floating-point scores, as well as Base and Peak scores. The meaning of each index in Equation 1 is shown in Table 1.
TABLE 1: Meaning of each index in the exemplary specification calculation force formula
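As a hedged illustration, the quantization described by Equation 1 can be sketched in Python. The weighting shown (A0/A1 splitting integer versus floating point, B0/B1 splitting the speed and rate mode scores) is one plausible reading of the surrounding description, not a verified reproduction of the patent's exact formula:

```python
def spec_computing_power(int_rate, fp_rate, int_speed, fp_speed,
                         a0=50, a1=50, b0=50, b1=50):
    """Quantify a node's specification calculation force from SPEC CPU
    base scores (which are themselves geometric means of sub-benchmark
    ratios). a0 + a1 = 100 weights integer vs. floating point;
    b0 + b1 = 100 weights the speed vs. rate evaluation modes."""
    int_part = b0 / 100 * int_speed + b1 / 100 * int_rate
    fp_part = b0 / 100 * fp_speed + b1 / 100 * fp_rate
    return a0 / 100 * int_part + a1 / 100 * fp_part
```

With the default 50/50 weights, a node whose four base scores are 8, 10, 6 and 4 would be quantized to 7.0.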
The copy number of the application at switching time is then calculated based on the specification calculation force result; specifically, the scheduled application copy number and the node selection are computed. Node selection prefers a target with sufficient resources, and the calculation formula for the target copy number is designed as follows:
N_target = ceil(N_init × C_init / C_target) (Equation 2)

In the above Equation 2, ceil is the upward rounding (ceiling) function, N_init is the initial copy number, C_init is the initial specification calculation force of the initial node, and C_target is the first specification calculation force of the first target switching node.
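Assuming Equation 2 takes the proportional form implied by the surrounding description (the target copy number scales the initial copy number by the ratio of the two specification calculation forces, rounded up so that aggregate capacity is never under-provisioned), a minimal sketch is:

```python
import math

def target_replicas(initial_replicas, initial_power, target_power):
    """A weaker target node (lower specification calculation force) needs
    proportionally more replicas to deliver the same aggregate computing
    power; ceil rounds up so capacity is never under-provisioned."""
    return math.ceil(initial_replicas * initial_power / target_power)
```

For example, 4 replicas on a node of force 10 moving to a node of force 8 would become 5 replicas.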
It should be noted that, in the embodiment of the present application, the specification calculation force of all available nodes in the cluster is obtained by default before the first target switching node is determined, because the specification calculation force of the target node must already be known when the switching occurs so that the copy number can be calculated.
However, a new node may appear during dynamic adjustment, which allows the target node selection to be revisited: dynamic fine-tuning is possible when the first target switching node fails, or when a new, more suitable node appears.
Therefore, in the embodiment of the present application, the specification calculation force of all available nodes in the cluster needs to be obtained. Specifically, a resident program can track the cluster state according to the life cycle of the nodes: when a newly added node is found, the specification calculation force measurement job is triggered and the specification calculation force mapping table (key-value pairs of node and specification calculation force) is updated; when a node is deleted, the mapping table is triggered to be updated again, and further a new target switching node is determined.
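The maintenance of the node-to-specification-calculation-force mapping table can be sketched as follows; the event names and the `benchmark_node` measurement callback are illustrative stand-ins, not the patent's actual interfaces:

```python
# Sketch of the resident program that keeps the node -> specification
# calculation force mapping table current. Lifecycle events are assumed
# to arrive from a cluster watch API; benchmark_node() stands in for the
# SPEC-based measurement job and is hypothetical.
spec_power_table = {}  # key: node name, value: specification calculation force

def on_node_event(event_type, node_name, benchmark_node):
    if event_type == "ADDED":
        # New node: trigger the calculation force measurement job once.
        spec_power_table[node_name] = benchmark_node(node_name)
    elif event_type == "DELETED":
        # Removed node: drop it so it can no longer be chosen as a target.
        spec_power_table.pop(node_name, None)
    return spec_power_table
```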
Secondly, for the service traffic of stateless applications, on-demand segmentation of the traffic based on a gateway or a load balancer is the key to ensuring consistent service quality before and after application switching;
Therefore, according to a preset traffic drainage mechanism, a preset traffic preheating mechanism, a ready probe, and a configured retry mechanism, traffic distribution is performed on the application copies of the target application of the stateless workload, and the traffic proportion corresponding to the application copies is obtained.
Specifically, in the cross-architecture switching process, the application Pod instances of the original architecture are terminated while new application Pod instances are created on the destination architecture node. In this process, it is necessary to ensure that user requests are received and processed normally, so that the cross-architecture switching does not affect the application service level (SLA). After analyzing the flow and the critical path of cross-architecture switching, the smooth switching method based on traffic segmentation comprises the following four strategies:
First, the preset traffic drainage mechanism. To wind down the service smoothly, graceful exit of the application must be achieved; "graceful" essentially means slowing down the termination of the original service and the traffic switching process so that user service requests are not interrupted. The application combines the traffic drain (Traffic Drain) mechanism with the Pod termination flow to achieve the effect of graceful application exit. Traffic drainage means that, during a service gray-scale upgrade, traffic is gradually directed to the new service node while the old service node is removed from the load balancer traffic pool only after it has finished processing the requests it has already received, thereby avoiding service unavailability and slow responses during the change. A key issue to consider in this process is how to determine that the old service node has processed its existing requests.
Referring to fig. 5, fig. 5 shows the effect of realizing traffic drainage by attaching a stop hook (preStop Hook) to the container in the termination flow of the application Pod. First, the running Pod receives the delete command from the API server and enters the terminating process; a delayed exit time, i.e., the preStop time, is set here, indicating the waiting time before the Pod is deleted. Next, upon expiration of the preStop time, or completion of the hook function, kubelet sends the program termination signal (SIGTERM) to the container, and the container continues to complete all requests being processed, including requests that arrived after the Pod received the delete command. Finally, Kubernetes sends the forced termination signal (SIGKILL), which forcibly terminates the container; note that if not all requests in the receive queue are completed within the preStop time, requests that were received but not yet processed may be lost.
As can be seen from the above Pod termination procedure, the preStop time set during traffic drainage is critical; the default value is 30s. On the other hand, whether the old service node has processed all requests can also be judged by monitoring: the monitoring indexes generally include the request count, processing time, error rate, etc. of the old service node, and a tool checks the change of these indexes in real time during switching to judge the request processing status of the old service node. Meanwhile, logs can be used for the same judgment: during switching, the logs of the new and old service versions are compared to check whether any request is repeatedly processed or missed. Based on the monitoring and logging methods, the threshold setting of the drain time (determined by the times actually recorded in the logs) is further optimized.
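A minimal Pod spec fragment illustrating the drain pattern above; the sleep duration mirrors the default 30s drain time discussed in the text, and the container name is hypothetical:

```yaml
# Illustrative sketch: a preStop hook that sleeps so in-flight requests
# can drain before kubelet sends SIGTERM.
spec:
  terminationGracePeriodSeconds: 45   # must exceed the preStop delay
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sh", "-c", "sleep 30"]
```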
Second, the preset traffic preheating mechanism. Because the system is in the initialization phase, key resources, including the connection pool, the thread pool, etc., are at a low water level. If traffic suddenly increases, the system water level may rise instantaneously; response time grows while key resources are being created, further blocking more requests, heavily consuming memory, sockets, and the like, and finally crashing the system. Traffic preheating (WarmUp) lets the passing traffic increase slowly, ramping up to the threshold upper limit within a certain time, giving the cold system time to pre-create key resources and thereby avoiding a cold-system collapse. A warm-up start function, such as the one provided by the Guava rate limiting algorithm, can adjust the slope over time so that traffic ramps slowly to a specific threshold within a specified time.
In the application, simulated-request preheating and small-traffic preheating are performed by writing a pressure test script. First, weights are calculated at the service consumer side according to the start time of each service provider instance. Second, combined with the load balancing algorithm, the traffic of the newly started application is controlled to increase gradually to the normal level as the start time elapses, thereby assisting the warm-up of the just-started application.
The calculation of the small-traffic service preheating model realized by this method is shown in Equation 3. The application traffic segmentation proportion f(x) of the target architecture changes linearly with the calling moment x; startTime is the switching start time, warmupTime is the application preheating duration configured by the user, and k is a constant, which can be taken as 1 and identifies the proportion of the target-architecture application copies to the total copies.

f(x) = min(k, k × (x − startTime) / warmupTime) (Equation 3)
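Assuming the linear ramp form of the preheating model described above (traffic share grows from 0 at startTime to k after warmupTime), a sketch is:

```python
def warmup_split(x, start_time, warmup_time, k=1.0):
    """Share of traffic routed to the target architecture at call moment x:
    grows linearly from 0 at start_time to k once warmup_time has elapsed.
    k = 1 for a full switch; k < 1 for a partial scale-out scenario."""
    if x <= start_time:
        return 0.0
    return min(k, k * (x - start_time) / warmup_time)
```

Halfway through the warm-up window the target architecture would receive half of its final share.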
Third, a ready probe (Readiness Probe) is used to detect whether a Pod is ready, i.e., able to receive traffic and process requests. The primary role of the ready probe is to let Kubernetes ensure that the application inside the Pod is initialized and ready before routing traffic to the Pod. If the ready probe detects that the Pod is not ready, Kubernetes pauses routing traffic to that Pod until it is ready. The principle of the ready probe is to periodically perform an HTTP request or TCP check in the Pod; if the request or check returns successfully, the Pod is ready to receive traffic. If it fails, the Pod is not ready, and Kubernetes continues to wait for the Pod to become ready. The application introduces the ready probe as one of the smooth switching strategies, mainly comprising three kinds of detection, namely:
For HTTP requests, the ready probe sends GET requests to the application-exposed HTTP endpoint and checks the returned status code. If the status code is 200-399, then the application is deemed ready to process traffic. Otherwise, the ready probe will consider the application not ready.
For TCP detection, the ready probe checks whether the TCP port of the application is in an open state. If the TCP port is in an open state, the application is deemed ready to process traffic. Otherwise, the ready probe will consider the application not ready.
For command detection, the ready probe will execute a command and if the command returns a status code of 0, the application is considered ready to process traffic. Otherwise, the ready probe will consider the application not ready.
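The three detection types above map to three probe configurations; the following fragments are illustrative sketches (paths, ports and the command are assumptions), and only one would be used per container:

```yaml
# HTTP GET: ready when the returned status code is 200-399
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
---
# TCP: ready when the port accepts connections
readinessProbe:
  tcpSocket:
    port: 8080
---
# Command: ready when the exit code is 0
readinessProbe:
  exec:
    command: ["cat", "/tmp/ready"]
```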
Fourth, when the service provider releases a new version, or other unpredictable service-access timeouts occur, client requests or response returns may be lost; therefore, the embodiment of the present application can compensate through a retry mechanism, improving the request success rate and service availability.
The application is based on Istio service grid technology, the retry mechanism adopts random retry time interval (variable jitter), limits the retry times and ensures the smoothness of service call to the maximum extent. Istio supports configuration service retry strategies with fine granularity, supports setting parameters such as retry timeout, retry times and the like, and can better achieve the target effect of retry.
Specifically, for example, a retry mechanism is set for the httpbin service: if the service does not return a correct value within 2 seconds, a retry is triggered; the retry condition is a 5xx return code, and at most 3 retries are performed. The configuration example code is as follows:
- match:
  - uri:
      prefix: /httpbin
  mirror:
    host: httpbin
    subset: v2
  mirror_percent: 100
  name: httpbin-route
  retries:
    attempts: 3
    perTryTimeout: 2s
    retryOn: 5xx
Further, the performing flow distribution on the application copy of the target application of the stateless workload according to the flow splitting policy, and obtaining the flow proportion corresponding to the application copy includes:
performing traffic distribution on the application copies of the target application of the stateless workload according to a preset traffic drainage mechanism, a preset traffic preheating mechanism, a ready probe, and a configured retry mechanism, to obtain the traffic proportion corresponding to the application copies.
Specifically, traffic distribution is performed on the application copies of the target application of the stateless workload in sequence according to the preset traffic preheating mechanism, the ready probe, the preset traffic drainage mechanism and the configured retry mechanism, so that the traffic proportion corresponding to the application copies is obtained.
Specifically, in the traffic smoothing portion, traffic preheating is performed first in the sequence, so that the related systems of the target node can pass through the initialization stage via the preheating model. The preheating model is the aforementioned small-traffic service preheating model of Equation 3: warmupTime in Equation 3 is the application preheating duration configured by the user, which can be understood as the time the application needs after the switching request is received; startTime is determined by the ready probe. Here k can be taken as 1, because in the application switching scenario the application copies finally realized on the target node account for 100%.
It should be noted that, when the application scenario is a capacity-expansion scenario, that is, part of the copies are expanded to a new target node while the initial node also retains part of the copies, the coefficient k is provided for compatibility with that scenario.
Secondly, a ready probe is introduced to judge whether the application Pod on the target node is ready to receive traffic and process requests. This moment also determines startTime in Equation 3, i.e., the time at which the switching starts.
Thirdly, traffic drainage (i.e., the application graceful-exit mechanism) is configured: after switching starts, the application Pod of the initial node does not exit immediately; the default drainage time is 30s, and it is judged and adjusted according to specific log information or monitoring information.
Finally, the retry mechanism is configured to ensure that, in the event of unpredictable errors making the service unavailable, compensating retries can be attempted, improving the stability and availability of traffic smoothing.
It should be noted that, for the application in the switching process, application Pods are simultaneously present on both the initial node and the first target switching node to provide services, and the traffic proportions of the different nodes are adjusted step by step: the initial node is gradually reduced from 100% to 0%, and the first target switching node is gradually increased from 0% to 100%.
The whole process is completed according to the above four strategies, which together realize the goal of traffic smoothing during switching.
Thirdly, for the data synchronization of stateful applications, an efficient negotiation and data synchronization mechanism is the key to ensuring functional consistency before and after application switching.
Therefore, the copy data of the application copy of the target application of the stateful workload at the initial node is synchronously copied to the first target switching node according to a preset distributed consistency algorithm.
Election (leader selection) process: the master node (leader) periodically sends heartbeats to all the slave nodes (followers) to maintain its leader status; when a follower does not receive a heartbeat within a timeout period, the node converts into a candidate and participates in election. Because of differences in processing capacity, network conditions, and the like among the nodes in a cloud multi-core system, the influence of the timeout period is differentiated: an adaptive method based on maximum likelihood estimation can be adopted to avoid frequently triggering elections by nodes with large heartbeat delay and weak processing capacity, while ensuring that nodes with strong processing capacity can initiate elections quickly. For the voting strategy, a mechanism of setting node priorities or reducing the random timeout value range is adopted, so that strong nodes can obtain the majority of votes more easily.
Replication (log replication) process: using a quorum write mechanism, the leader receives requests from clients, initiates write proposals to the followers, and receives feedback votes; each proposal must obtain more than half of the votes to commit the write. That is, if the system has n nodes and node w feeds back vote v_w (1 if the proposal is accepted, otherwise 0), the minimum number of write votes should satisfy the following Equation 4.

Σ_{w=1}^{n} v_w ≥ ⌊n/2⌋ + 1 (Equation 4)

Wherein, in the above Equation 4, v_w is the voting feedback corresponding to node w, and n is the number of nodes.
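The majority condition of the quorum write described above can be checked in a few lines; this is a sketch of the vote-counting rule only, not of the full replication protocol:

```python
def quorum_reached(votes, n):
    """A write proposal commits only when the positive vote feedback from
    the n nodes reaches a strict majority: sum(votes) >= floor(n/2) + 1."""
    return sum(votes) >= n // 2 + 1
```

For example, 2 positive votes out of 3 nodes commit, but 2 out of 4 do not.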
Step 102, when it is detected that the service level data corresponding to the target application before and after switching is consistent, all the service requests corresponding to the initial nodes corresponding to the target application are processed, and the application copy data corresponding to the target application before and after switching is consistent, the target application is in a cross-node switching completion state.
It should be noted that, in the embodiment of the present application, during the dynamic phase, as the application runs normally after the cross-architecture switching, indexes related to service quality, such as throughput, delay, and TPS (Transactions Per Second), are obtained through service computing power monitoring; a dynamic optimization policy is then formulated according to the difference from the expected service quality and issued in time, until the service level (SLA, Service Level Agreement) before and after the application switching converges.
Specifically, dynamic optimization of the three smoothing modes in step 101 is required, and specific reference is made to the following discussion.
According to the application switching method provided by the embodiment of the invention, under the condition that an application switching request initiated when a target application is subjected to cross-node switching is detected, a first target switching node is determined according to node resource information obtained in advance, and the corresponding target copy number after target application switching is determined according to the specification calculation force corresponding to the first target switching node; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes; and when the service level data corresponding to the target application before and after switching is detected to be consistent, all the service requests corresponding to the initial nodes corresponding to the target application are processed, and the application copy data corresponding to the target application before and after switching is consistent, the target application is in a cross-node switching completion state. 
In the embodiment of the invention, balanced scheduling based on the specification calculation force, which considers the environment in which the application runs, can ensure computing power consistency before and after application switching; splitting the service traffic of stateless applications on demand based on a gateway or load balancer can ensure service quality consistency before and after switching; data synchronization for stateful applications based on efficient negotiation and data synchronization mechanisms can ensure functional consistency before and after switching; and by sensing service computing power indexes in real time, the system continuously adjusts and optimizes toward equivalent service quality before and after the cross-architecture switching, forming an observational compensation feedback design.
Referring to fig. 2, a second step flowchart of an application switching method provided by an embodiment of the present invention is shown, where the method may include:
Step 201, under the condition that an application switching request initiated when a target application performs cross-node switching is detected, determining a first target switching node according to node resource information acquired in advance, and determining the number of target copies corresponding to the target application after switching according to a specification calculation force corresponding to the first target switching node; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes;
step 202, monitoring the target application according to a preset monitoring component to obtain monitoring data;
Step 203, determining whether the service level data corresponding to the target application before and after switching is consistent according to the monitoring data;
further, step 203 may include:
and if not, correcting the target copy number corresponding to the target application and the first target switching node according to the monitoring data until the service level data corresponding to the target application before and after switching are consistent.
Further, the correcting the target copy number and the first target switching node corresponding to the target application according to the monitoring data includes:
Acquiring a difference value between first service level data corresponding to an initial node before the target application is switched and second service level data corresponding to a first target switching node after the target application is switched according to the monitoring data;
And if the difference value is larger than a first threshold value, the target copy number is adjusted according to a preset step length, or a second target switching node is determined according to the specification calculation force and the mapping matrix of the switching node.
Further, the obtaining, according to the monitoring data, a difference value between the first service level data corresponding to the initial node before the target application is switched and the second service level data corresponding to the first target switching node after the target application is switched includes:
And carrying out normalization processing on the throughput value, the delay value and the error number corresponding to the initial node before the target application is switched, and carrying out normalization processing on the throughput value, the delay value and the error number corresponding to the first target switching node after the target application is switched, and combining the weight values respectively corresponding to the throughput value, the delay value and the error number to obtain a difference value.
Further, the mapping matrix of the specification calculation force and the switching node is a calculation force distribution matrix table based on CPU models, and the table records the mapping between CPU models and specification calculation forces under different nodes.
It should be noted that, in the embodiment of the present application, referring to fig. 8, since the measurement standard of the computing power is difficult to be quantified by a certain fixed model due to the diversity of the service, the present application directly adopts the monitoring mode to describe the service level of the computing power, and uses the index system of throughput TPS, delay and error number to measure the computing power of a certain application.
The service computing power indexes are measured by writing a JMeter script.
First, correction of SLA inconsistencies is achieved by increasing or decreasing the number of application copies.
Specifically, the service level (SLA) before and after the application's cross-architecture switching is obtained through the service computing power analysis formula; the core optimization and adjustment strategy is executed according to the difference of the service computing power indexes before and after switching, by adding or reducing the number of application copies, changing to target nodes with consistent specification calculation force according to the computing power distribution of CPU models, and the like. The specific process is as follows:
If the data obtained by pressure measurement before and after the cross-architecture switching differ greatly, the SLA can be abstracted into a triplet: throughput, delay, and error number, which can be quantitatively evaluated according to the following Equation 5; further, increasing or decreasing the number of scheduled copies is considered according to the evaluation result.
results = a × |f(T_pre) − f(T_post)| + b × |f(D_pre) − f(D_post)| + c × |f(E_pre) − f(E_post)| (Equation 5)

Wherein, in the above Equation 5, f(x) denotes normalization processing; the application adopts maximum-minimum normalization, f(x) = (x − x_min) / (x_max − x_min), whose purpose is to unify dimensions by mapping the data values (monitoring data sets over a period of time) into the interval [0, 1]. In the formula, a, b and c respectively represent the weight values of the throughput, delay and error-number indexes, defaulting to 1; the weights can be adjusted appropriately according to the specific service type (such as IO-intensive or computation-intensive) to adapt to specific service scenarios and improve flexibility. T_pre and T_post represent the throughput values before and after the period, D_pre and D_post represent the delay values before and after the period, and E_pre and E_post represent the error numbers before and after the period.
Therefore, if the value of results is smaller than the first threshold, for example 2%, the SLAs are considered consistent; if it is larger than 2%, the number of application copies is adjusted with a step size of 1, the application SLA values before and after the adjustment are pressure-measured again, the difference (results) between the SLAs before and after the adjustment is observed and compared with the target value (2%), and the cycle is repeated until the SLAs are consistent.
The value of results is the difference between the first service level data corresponding to the initial node before the target application is switched and the second service level data corresponding to the first target switching node after the target application is switched.
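Assuming the weighted, min-max-normalized triplet difference described above, the evaluation can be sketched as follows; the dictionary layout and window bounds are illustrative:

```python
def min_max(x, lo, hi):
    """Maximum-minimum normalization, mapping x into [0, 1] over the
    monitoring window bounds (lo, hi)."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def sla_difference(pre, post, bounds, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of |f(pre) - f(post)| over the (throughput, delay,
    errors) triplet, with f the min-max normalization computed from the
    monitoring-window bounds. Weights default to 1 as in the text."""
    keys = ("throughput", "delay", "errors")
    return sum(
        w * abs(min_max(pre[k], *bounds[k]) - min_max(post[k], *bounds[k]))
        for w, k in zip(weights, keys)
    )
```

If the returned value exceeds the first threshold (e.g. 0.02), the copy count would be stepped by 1 and the measurement repeated, per the loop described above.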
Secondly, according to the CPU model calculation force distribution condition, the target nodes with consistent specification are changed to realize correction when the SLAs are inconsistent.
Specifically, besides the strategy of changing the number of application copies, another optimization strategy depends on the aforementioned specification calculation force measurement process. This process yields a CPU-model calculation force distribution matrix table, which records the mapping matrix of CPU models and calculation forces (specification calculation forces) under different architectures; the measurement formula is as described above, and the measurement process is supplemented as follows:
First, from the cluster's perspective, when a node joins the cluster, the computing power measurement task is automatically scheduled through the nfd-worker daemon (on the computing node); the task can be assembled with different computing power benchmark programs as required;
and secondly, storing the measured result in a text format in a key value pair mode, and organizing the result by a structural body of a hash table.
Thirdly, automatically reporting a node calculation force value in a periodical detection mode and detecting whether a new node is added;
Fourthly, the reported measuring and calculating result is further processed through nfd-master process (management node);
fifthly, establishing a mapping relation between calculation force and nodes, and adding labels;
And sixthly, adding the label obtained before to metadata of the node and saving the metadata in an etcd module (database).
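Steps 2, 5 and 6 above can be sketched as follows; the label key and data layout are assumptions for illustration, not the patent's exact format:

```python
def build_node_label(node_name, spec_power):
    """Store the measured result as key-value pairs (step 2), map the
    calculation force to a label (step 5), and attach it to the node
    metadata to be saved in etcd (step 6)."""
    results = {node_name: spec_power}                            # step 2
    label = {"node.kubernetes.io/spec-power": str(spec_power)}   # step 5
    metadata = {"name": node_name, "labels": label}              # step 6
    return results, metadata
```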
Therefore, when the monitoring method finds that the SLAs before and after the application's cross-architecture switching are inconsistent, correction can also be made by reselecting a new scheduling node: the new node is selected according to the CPU model and specification calculation force mapping matrix obtained in the measurement process, the application SLA values before and after the adjustment are pressure-measured again, the difference (results) between the SLAs before and after adjustment is observed and compared with the target value (2%), and the cycle is repeated until consistency is achieved.
Step 204, judging whether the service request corresponding to the initial node corresponding to the target application is processed completely according to a second preset monitoring component;
Further, after the step of determining, according to the second preset monitoring component, whether the service request corresponding to the initial node corresponding to the target application is completely processed, the following two cases may be included:
If not, optimizing the flow segmentation strategy according to the information aiming at the service request recorded by a preset log system.
If yes, the flow proportion corresponding to the application copy is used as the flow proportion corresponding to the target application cross-node switching.
Step 205, verifying whether data between an application copy of the target application of the stateful workload corresponding to the first target switching node and an application copy of the target application of the stateful workload corresponding to the initial node are consistent or not through a preset database query statement;
Further, after step 205, if the application copy data of the target application of the stateful workload is inconsistent, data replication synchronization is performed again between the different nodes until the application copy data corresponding to the target application before and after switching is consistent.
Step 206, when it is detected that the service level data corresponding to the target application before and after switching is consistent, all the service requests corresponding to the initial nodes corresponding to the target application are processed, and the application copy data corresponding to the target application before and after switching is consistent, the target application is in a cross-node switching completion state.
It should be noted that, in the embodiment of the present application, referring to fig. 8, before step 204, a correction is already performed for whether the SLA is consistent, so in steps 204-205, a process of correcting for flow smoothness and a process of correcting for data synchronization are included.
Specifically, for correction of the data synchronization part, verifying whether data between the application copy of the target application of the stateful workload corresponding to the first target switching node and the application copy of the target application of the stateful workload corresponding to the initial node is consistent or not through a preset database query statement.
For the flow smoothing correction, whether the old service node has processed all requests can also be judged by monitoring. The monitoring indexes generally include the request count, processing time and error rate of the old service node; during the switching process, a tool (monitoring Agent) checks changes in these indexes in real time to judge the request processing condition of the old service node. Logs can be used as well: during switching, the logs of the old and new service versions are compared to check whether any request has been processed twice or missed entirely. Based on these monitoring and logging methods, the threshold setting of the drain time (flow drain) is further optimized.
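The monitoring-and-log-based judgment above can be sketched as follows; the metric names, the error-rate threshold, and the percentile-plus-margin rule for tuning the drain-time threshold are illustrative assumptions, not the patent's implementation:

```python
import statistics

def drain_complete(metrics):
    """Old node is considered drained when no in-flight requests remain
    and the error rate has not spiked during the switch."""
    return metrics["in_flight"] == 0 and metrics["error_rate"] < 0.01

def tune_drain_timeout(observed_drain_secs, margin=1.5):
    """Derive the next drain-time threshold from logged drain durations:
    take a high percentile and apply a safety margin."""
    p95 = statistics.quantiles(observed_drain_secs, n=20)[-1]
    return p95 * margin

# Poll-style usage: the monitoring Agent reports these values in real time.
assert drain_complete({"in_flight": 0, "error_rate": 0.0})
assert not drain_complete({"in_flight": 3, "error_rate": 0.0})
timeout = tune_drain_timeout([1.0, 1.0, 1.0, 2.0])  # >= the worst observed drain
```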
In the embodiment of the invention, balanced scheduling based on the specification calculation force takes the application's running environment into account and can ensure consistency of calculation force before and after application switching; splitting the service flow of a stateless application by flow proportion based on a gateway or load balancer can ensure consistency of service quality before and after application switching; and data synchronization for stateful applications based on efficient negotiation and data synchronization mechanisms can ensure consistency of functions before and after application switching. By sensing service calculation force indexes in real time, the equivalence of service quality before and after the cross-architecture application switch is continuously adjusted and optimized, forming an observation-based compensation feedback design.
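The real-time comparison of service level data (see also the normalization of throughput, delay and error count with per-metric weights in claim 5) might be sketched as a weighted normalized difference; the weight values and the scaling rule here are assumptions for illustration:

```python
def sla_difference(before, after, weights=(0.4, 0.4, 0.2)):
    """Weighted difference between normalized (throughput, delay, errors)
    of the initial node and the target node; 0.0 means identical SLA."""
    diff = 0.0
    for b, a, w in zip(before, after, weights):
        scale = max(abs(b), abs(a), 1e-9)   # per-metric normalization
        diff += w * abs(b - a) / scale
    return diff

before = (1000.0, 20.0, 2.0)   # throughput req/s, delay ms, error count
after = (1000.0, 20.0, 2.0)
assert sla_difference(before, after) == 0.0
degraded = (800.0, 30.0, 5.0)
# A difference above a first threshold would trigger correction: adjust the
# copy number by a preset step, or pick a second target switching node.
assert sla_difference(before, degraded) > 0.0
```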
Referring to fig. 3, a schematic structural diagram of an application switching device according to an embodiment of the present invention is shown, where the device may include:
The static switching module 301 is configured to determine, when an application switching request initiated when a target application performs cross-node switching is detected, a first target switching node according to node resource information obtained in advance, and determine, according to a specification calculation force corresponding to the first target switching node, a target copy number corresponding to the target application after switching; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes;
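The copy-number determination performed by the static switching module (scaling the copy count by the ratio of specification calculation forces, per claims 12-13) can be sketched as follows; the rounding rule and the unit of "calculation force" are assumptions for illustration:

```python
import math

def target_copy_count(initial_copies, initial_force, target_force):
    """Scale the copy count so that total specification calculation force
    is preserved: copies_new * force_new >= copies_old * force_old."""
    return max(1, math.ceil(initial_copies * initial_force / target_force))

# e.g. 4 copies on nodes rated 100 "force units" moved to nodes rated 160:
assert target_copy_count(4, 100.0, 160.0) == 3   # ceil(400 / 160) = 3
assert target_copy_count(2, 100.0, 400.0) == 1   # never drops below one copy
```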
And the dynamic optimization switching module 302 is configured to set the target application to a cross-node switching completion state when it is detected that the service level data corresponding to the target application before and after switching is consistent, all the service requests on the initial node corresponding to the target application have been processed, and the application copy data corresponding to the target application before and after switching is consistent.
In the embodiment of the invention, balanced scheduling based on the specification calculation force takes the application's running environment into account and can ensure consistency of calculation force before and after application switching; splitting the service flow of a stateless application by flow proportion based on a gateway or load balancer can ensure consistency of service quality before and after application switching; and data synchronization for stateful applications based on efficient negotiation and data synchronization mechanisms can ensure consistency of functions before and after application switching. By sensing service calculation force indexes in real time, the equivalence of service quality before and after the cross-architecture application switch is continuously adjusted and optimized, forming an observation-based compensation feedback design.
The embodiment of the invention also provides a communication device, as shown in fig. 4, comprising a processor 401, a communication interface 402, a memory 403 and a communication bus 404, wherein the processor 401, the communication interface 402 and the memory 403 communicate with each other through the communication bus 404,
A memory 403 for storing a computer program;
The processor 401, when executing the program stored in the memory 403, may implement the following steps:
Under the condition that an application switching request initiated when a target application performs cross-node switching is detected, determining a first target switching node according to node resource information acquired in advance, and determining the corresponding target copy number after target application switching according to a specification calculation force corresponding to the first target switching node; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes;
and when it is detected that the service level data corresponding to the target application before and after switching is consistent, all the service requests on the initial node corresponding to the target application have been processed, and the application copy data corresponding to the target application before and after switching is consistent, the target application is in a cross-node switching completion state.
In yet another embodiment of the present invention, a computer readable storage medium is provided, in which instructions are stored, which when run on a computer, cause the computer to perform the application switching method according to any one of the above embodiments.
In a further embodiment of the present invention, a computer program product comprising instructions which, when run on a computer, cause the computer to perform the application switching method of any of the above embodiments is also provided.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.
Claims (20)
1. An application switching method, the method comprising:
Under the condition that an application switching request initiated when a target application performs cross-node switching is detected, determining a first target switching node according to node resource information acquired in advance, and determining the corresponding target copy number after target application switching according to a specification calculation force corresponding to the first target switching node; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes;
and when it is detected that the service level data corresponding to the target application before and after switching is consistent, all the service requests on the initial node corresponding to the target application have been processed, and the application copy data corresponding to the target application before and after switching is consistent, the target application is in a cross-node switching completion state.
2. The method according to claim 1, wherein, in the case that the service level data corresponding to the target application before and after the handover is detected to be consistent, all the processing of the service request corresponding to the initial node corresponding to the target application is completed, and the application copy data corresponding to the target application before and after the handover is consistent, the method includes:
monitoring the target application according to a preset monitoring component to obtain monitoring data;
and determining whether the service level data corresponding to the target application before and after switching is consistent or not according to the monitoring data.
3. The method according to claim 2, wherein determining whether the service level data corresponding to the target application before and after switching is consistent according to the monitoring data comprises:
and if not, correcting the target copy number corresponding to the target application and the first target switching node according to the monitoring data until the service level data corresponding to the target application before and after switching are consistent.
4. A method according to claim 3, wherein said modifying the target number of copies and the first target switching node corresponding to the target application according to the monitoring data comprises:
Acquiring a difference value between first service level data corresponding to an initial node before the target application is switched and second service level data corresponding to a first target switching node after the target application is switched according to the monitoring data;
And if the difference value is larger than a first threshold value, the target copy number is adjusted according to a preset step length, or a second target switching node is determined according to the mapping matrix of the specification calculation force and the switching node.
5. The method of claim 4, wherein the obtaining, according to the monitoring data, a difference between the first service level data corresponding to the initial node before the target application is switched and the second service level data corresponding to the first target switching node after the target application is switched includes:
And carrying out normalization processing on the throughput value, the delay value and the error number corresponding to the initial node before the target application is switched, and carrying out normalization processing on the throughput value, the delay value and the error number corresponding to the first target switching node after the target application is switched, and combining the weight values respectively corresponding to the throughput value, the delay value and the error number to obtain a difference value.
6. The method of claim 4, wherein the mapping matrix of the specification calculation force and the switching node is a calculation force distribution matrix table based on CPU models, and the calculation force distribution matrix table is used for recording the mapping between CPU models and specification calculation forces under different nodes.
7. The method according to claim 1, wherein, in the case that the service level data corresponding to the target application before and after the handover is detected to be consistent, all the processing of the service request corresponding to the initial node corresponding to the target application is completed, and the application copy data corresponding to the target application before and after the handover is consistent, the method includes:
And judging whether the service requests corresponding to the initial nodes corresponding to the target application are processed completely or not according to a second preset monitoring component.
8. The method according to claim 7, wherein after the step of determining, according to a second preset monitoring component, whether the corresponding service request on the initial node corresponding to the target application is fully processed, the method comprises:
If not, optimizing the flow segmentation strategy according to the information aiming at the service request recorded by a preset log system.
9. The method according to claim 7, wherein after the step of determining, according to a second preset monitoring component, whether the corresponding service request on the initial node corresponding to the target application is fully processed, the method comprises:
if yes, the flow proportion corresponding to the application copy is used as the flow proportion corresponding to the target application cross-node switching.
10. The method according to claim 1, wherein, in the case that the service level data corresponding to the target application before and after the handover is detected to be consistent, all the processing of the service request corresponding to the initial node corresponding to the target application is completed, and the application copy data corresponding to the target application before and after the handover is consistent, the method includes:
And verifying whether data between the application copy of the target application of the stateful workload corresponding to the first target switching node and the application copy of the target application of the stateful workload corresponding to the initial node are consistent or not through a preset database query statement.
11. The method of claim 10, wherein after the step of verifying, by a preset database query statement, whether data between the application copy of the target application of the stateful workload corresponding to the first target switching node and the application copy of the target application of the stateful workload corresponding to the initial node are consistent, the method comprises:
and if not, carrying out data replication synchronization on the application copy of the target application with the state workload among different nodes again until the corresponding application copy data of the target application before and after switching is consistent.
12. The method of claim 1, wherein determining the first target switching node according to the pre-acquired node resource information, and determining the number of target copies corresponding to the target application after switching according to the specification calculation force corresponding to the first target switching node comprises:
Determining a first target switching node according to node resource information acquired in advance;
acquiring a first specification calculation force corresponding to the first target switching node and an initial specification calculation force corresponding to an initial node;
and generating target copy numbers corresponding to the target application on the first target switching node according to the first specification calculation force, the initial specification calculation force and the initial copy numbers.
13. The method of claim 12, wherein the obtaining the first specification computation force corresponding to the first target handover node and the initial specification computation force corresponding to the initial node comprises:
Respectively acquiring the integer operation capability and the floating point operation capability corresponding to the first target switching node and the initial node according to evaluation by a benchmark test tool;
and quantizing the first specification calculation force corresponding to the first target switching node and the initial specification calculation force corresponding to the initial node according to the integer operation capability and the floating point operation capability.
14. The method of claim 1, wherein the performing traffic distribution on the application copy of the target application of the stateless workload according to the traffic splitting policy to obtain the traffic proportion corresponding to the application copy comprises:
and performing flow distribution on the application copy of the target application of the stateless workload according to a preset flow drainage mechanism, a preset flow preheating mechanism, a ready probe element and a configuration retry mechanism to obtain a flow proportion corresponding to the application copy.
15. The method of claim 14, wherein the performing traffic distribution on the application copy of the target application of the stateless workload according to a preset traffic draining mechanism, a preset traffic preheating mechanism, a ready probe element, and a configuration retry mechanism to obtain a traffic proportion corresponding to the application copy comprises:
And carrying out flow distribution on the application copy of the target application of the stateless workload according to the sequence of a preset flow preheating mechanism, a ready probe, a preset flow drainage mechanism and a configuration retry mechanism in sequence to obtain the flow proportion corresponding to the application copy.
16. The method of claim 1, wherein synchronizing data replication of application copies of the target application of the stateful workload between different nodes comprises:
And synchronously copying the copy data of the application copy of the target application of the stateful workload at the initial node to the first target switching node according to a preset distributed consistency algorithm.
17. An application switching apparatus, the apparatus comprising:
The static switching module is used for determining a first target switching node according to node resource information obtained in advance under the condition that an application switching request initiated when the target application is subjected to cross-node switching is detected, and determining the corresponding target copy number after the target application is switched according to the specification calculation force corresponding to the first target switching node; and performing flow distribution on the application copy of the target application of the stateless workload according to a flow segmentation strategy to obtain a flow proportion corresponding to the application copy; and synchronizing data replication of application copies of the target application of the stateful workload among different nodes;
And the dynamic optimization switching module is used for setting the target application to a cross-node switching completion state when it is detected that the service level data corresponding to the target application before and after switching is consistent, all the service requests on the initial node corresponding to the target application have been processed, and the application copy data corresponding to the target application before and after switching is consistent.
18. A communication device, comprising: a transceiver, a memory, a processor, and a program stored on the memory and executable on the processor;
the processor being configured to read a program in a memory to implement an application switching method as claimed in any one of claims 1 to 16.
19. A readable storage medium storing a program, wherein the program when executed by a processor implements the application switching method according to any one of claims 1-16.
20. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the application switching method of any of claims 1-16.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410649874.3A CN118245196B (en) | 2024-05-24 | 2024-05-24 | Application switching method, device, equipment, storage medium and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410649874.3A CN118245196B (en) | 2024-05-24 | 2024-05-24 | Application switching method, device, equipment, storage medium and computer program product |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118245196A true CN118245196A (en) | 2024-06-25 |
CN118245196B CN118245196B (en) | 2024-09-17 |
Family
ID=91553542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410649874.3A Active CN118245196B (en) | 2024-05-24 | 2024-05-24 | Application switching method, device, equipment, storage medium and computer program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118245196B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111431740A (en) * | 2020-03-16 | 2020-07-17 | 深信服科技股份有限公司 | Data transmission method, device, equipment and computer readable storage medium |
CN113050978A (en) * | 2019-12-26 | 2021-06-29 | 中国移动通信集团北京有限公司 | Application gray release control method, device and equipment and computer storage medium |
US11281492B1 (en) * | 2019-05-31 | 2022-03-22 | Juniper Networks, Inc. | Moving application containers across compute nodes |
CN115543543A (en) * | 2022-11-04 | 2022-12-30 | 济南浪潮数据技术有限公司 | Application service processing method, device, equipment and medium |
CN116743762A (en) * | 2023-06-16 | 2023-09-12 | 招商银行股份有限公司 | Service registration cluster flow switching method, flow switching device and storage medium |
CN117201278A (en) * | 2023-08-29 | 2023-12-08 | 浪潮云信息技术股份公司 | Method for realizing disaster recovery high-availability scene of primary and backup cloud primary application in information creation environment |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11281492B1 (en) * | 2019-05-31 | 2022-03-22 | Juniper Networks, Inc. | Moving application containers across compute nodes |
CN113050978A (en) * | 2019-12-26 | 2021-06-29 | 中国移动通信集团北京有限公司 | Application gray release control method, device and equipment and computer storage medium |
CN111431740A (en) * | 2020-03-16 | 2020-07-17 | 深信服科技股份有限公司 | Data transmission method, device, equipment and computer readable storage medium |
CN115543543A (en) * | 2022-11-04 | 2022-12-30 | 济南浪潮数据技术有限公司 | Application service processing method, device, equipment and medium |
CN116743762A (en) * | 2023-06-16 | 2023-09-12 | 招商银行股份有限公司 | Service registration cluster flow switching method, flow switching device and storage medium |
CN117201278A (en) * | 2023-08-29 | 2023-12-08 | 浪潮云信息技术股份公司 | Method for realizing disaster recovery high-availability scene of primary and backup cloud primary application in information creation environment |
Non-Patent Citations (2)
Title |
---|
Liu Yitian; Cao Yiming: "Micro-frontend-based micro-application management console", Computer Systems & Applications, no. 09, 15 September 2020 (2020-09-15) *
Gao Ziyan; Wang Yong: "Load balancing strategy of distributed message systems for cloud services", Computer Science, no. 1, 15 June 2020 (2020-06-15) *
Also Published As
Publication number | Publication date |
---|---|
CN118245196B (en) | 2024-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107592345B (en) | Transaction current limiting device, method and transaction system | |
CN106528266B (en) | Method and device for dynamically adjusting resources in cloud computing system | |
US10282279B2 (en) | System and method for diagnosing information technology systems in multiple virtual parallel universes | |
US9479416B2 (en) | System and method for diagnosing information technology systems in multiple virtual parallel universes | |
CN109684228B (en) | Performance test method, device, system and storage medium | |
JP4970939B2 (en) | Hierarchical management of dynamic resource allocation in multi-node systems | |
CN108897627B (en) | Docker dynamic scheduling method for typical container | |
US20200012602A1 (en) | Cache allocation method, and apparatus | |
CN108595250B (en) | IaaS cloud platform-oriented resource scheduling efficiency optimization method and system | |
CN111026553B (en) | Resource scheduling method and server system for offline mixed part operation | |
KR20160130493A (en) | Dynamically modifying durability properties for individual data volumes | |
CN110569252B (en) | Data processing system and method | |
CN111597048B (en) | Micro-service scheduling method and system based on service quality and electronic equipment | |
CN112068934B (en) | Control system and method for shrinking container cloud service instance | |
WO2023066304A1 (en) | Job running parameter optimization method applied to super-computing cluster scheduling | |
US11934665B2 (en) | Systems and methods for ephemeral storage snapshotting | |
CN116974874A (en) | Database testing method and device, electronic equipment and readable storage medium | |
US8910189B2 (en) | Methods and systems for automatically determining configuration parameters | |
Samir et al. | Autoscaling recovery actions for container‐based clusters | |
CN118245196B (en) | Application switching method, device, equipment, storage medium and computer program product | |
CN113886012A (en) | Method, device and equipment for automatically selecting virtual machine thermal migration acceleration scheme | |
CN112130927B (en) | Reliability-enhanced mobile edge computing task unloading method | |
CN106775942B (en) | Cloud application-oriented solid-state disk cache management system and method | |
CN108228323B (en) | Hadoop task scheduling method and device based on data locality | |
CN113867923B (en) | Method and device for migrating container application across CPU (Central processing Unit) architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||