WO2018137254A1 - Method, apparatus, and control node for call-chain-based concurrency control - Google Patents

Method, apparatus, and control node for call-chain-based concurrency control

Publication number
WO2018137254A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
node
service node
concurrency
threshold
Prior art date
Application number
PCT/CN2017/072781
Other languages
English (en)
French (fr)
Inventor
辛华
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP17893715.7A priority Critical patent/EP3564816A4/en
Priority to CN201780000205.1A priority patent/CN108633311B/zh
Priority to PCT/CN2017/072781 priority patent/WO2018137254A1/zh
Publication of WO2018137254A1 publication Critical patent/WO2018137254A1/zh
Priority to US16/523,480 priority patent/US10873622B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1008Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1029Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers using data related to the state of servers by a load balancer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1031Controlling of the operation of servers by a load balancer, e.g. adding or removing servers that serve requests
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1034Reaction to server failures by a load balancer
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/104Peer-to-peer [P2P] networks
    • H04L67/1042Peer-to-peer [P2P] networks using topology management mechanisms
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/60Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/61Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements

Definitions

  • The present invention relates to the field of Internet and cloud computing, and in particular to a method, an apparatus, and a control node for call-chain-based concurrency control.
  • Concurrency control underpins application performance and reliability.
  • The common practice is to set a separate concurrency threshold for each service node; adjusting a concurrency threshold must take the performance of the system into account.
  • When a service node in the call chain system fails or is scaled out, its concurrency threshold may no longer meet the service requirements of its upper-level or lower-level service nodes. Application performance and reliability then cannot be guaranteed, which affects the reliability and performance of the entire system.
  • The present application provides a method, an apparatus, and a control node for call-chain-based concurrency control that can adjust the concurrency threshold of a service node in a concurrency control system so as to ensure the performance and reliability of that system.
  • The present application provides a method for call-chain-based concurrency control. The method is applied to a call-chain-based concurrency control system comprising a control node and a plurality of service nodes, each of which can be used
  • The control node may adjust the concurrency threshold of a service node in the concurrency control system as follows. First, the control node acquires the analysis statistics of the service nodes in the concurrency control system and determines a target service node according to the analysis statistics.
  • Next, the control node determines the updated concurrency threshold of the target service node according to the target service node's current concurrency threshold, the analysis statistics, and the concurrency thresholds and weight information of the target service node's neighboring service nodes. The control node then sends a concurrency threshold adjustment request carrying the updated concurrency threshold to the target service node.
  • The call-chain-based concurrency control system uses a distributed service call relationship: processing a service request passes through the service nodes in sequence, in a preset order.
  • A service node that must complete its part of the service request processing before a given service node is called an upper-level service node of that service node, and a service node that must complete its part after the given service node is called a lower-level service node of that service node. Upper-level and lower-level are concepts relative to a specified service node and change as the specified service node changes.
  • The upper-level service nodes include service nodes that must complete service request processing before the given service node and have a direct invocation relationship with it, as well as service nodes that must complete processing before it but have no direct invocation relationship with it.
  • Likewise, the lower-level service nodes include service nodes that must complete service request processing after the given service node and have a direct invocation relationship with it, as well as service nodes that must complete processing after it but have no direct invocation relationship with it. For convenience of description, upper-level and lower-level service nodes are collectively referred to as neighboring service nodes.
  • The weight information can be specified by the user or set to a system default value.
  • The weight information is the proportional relationship between the concurrency numbers of two or more upper-level or lower-level service nodes of the same service node. When a service node has only one upper-level or lower-level service node, the weight is 1. For example, if service node B has only one lower-level service node D, the weight of service node D is 1. When the concurrency threshold of service node D equals that of service node B, service node D can use all the resources of service node B to process service requests.
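  • The role of the weights can be illustrated with a minimal sketch. The proportional split used here is an assumption made for illustration: the text only states that weights express a ratio between the concurrency numbers of sibling service nodes and that a sole neighbor has weight 1. The function name `split_threshold` and the 1:3 example are hypothetical.

```python
def split_threshold(parent_threshold: int, weights: dict) -> dict:
    """Share a service node's concurrency threshold among its neighboring
    nodes in proportion to their weights (assumed rule, for illustration)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Integer floor division keeps the shares from exceeding the parent's threshold.
    return {node: parent_threshold * w // total for node, w in weights.items()}


# Service node B has a single lower-level node D, so D's weight is 1
# and D's share equals B's whole threshold.
single = split_threshold(100, {"D": 1})

# Hypothetical case: two lower-level nodes weighted 1:3.
pair = split_threshold(100, {"C": 1, "D": 3})
```

  • With a single neighbor the share is the full threshold, which matches the B and D example above; with several neighbors the shares follow the weight ratio.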
  • the analysis statistics may further include identifiers and quantity information of each service node processing the service request.
  • The control node can obtain the analysis statistics, the concurrency thresholds, and the weight information periodically according to service requirements, or collect them in real time, so as to better adjust and control the concurrency thresholds of the service nodes in the call-chain-based concurrency control system.
  • The control node may monitor the status of the service nodes and of service calls in the concurrency control system and adjust the concurrency threshold of the target service node so that the updated concurrency threshold satisfies the concurrency capability invocation relationship. This solves the prior-art problem that unreasonable concurrency thresholds degrade the processing performance of a call-chain-based concurrency control system. It also safeguards the stability and reliability of the system, reduces the latency of service request processing, and improves the performance of processing service requests.
  • The target service node is a service node whose processing result for a service request, as recorded in the analysis statistics, is a failure.
  • When the control node determines from the processing results in the analysis statistics that a service node has a failed processing result, it determines that service node to be a target service node whose concurrency threshold is unreasonable. A service node whose concurrency threshold is reasonable can process service requests normally, so when processing of a service request fails on a service node, that service node may be faulty. Identifying it as the target service node therefore makes the determination accurate.
  • A service node may fail to process a service request in any of the following cases:
  • Case 1: The concurrency threshold of the target service node is set too small, so the target service node cannot satisfy the calling relationship of its upper-level or lower-level service nodes.
  • Case 2: A partial hardware failure occurs in the target service node and its processing capability decreases. For example, some CPUs in the target service node are faulty.
  • Case 3: The network of the target service node fails, for example through a transient network interruption.
  • The control node may determine the target service node whose concurrency threshold is unreasonable by analyzing a single service request in the statistics, or by detecting that multiple service requests passing through the same service node are interrupted.
  • A service request may be interrupted by a transient network interruption and resume once the network recovers. Analyzing the interruption of multiple service requests therefore makes the analysis result more accurate.
  • When the control node determines, according to the analysis statistics, that the processing delay of a service request on a service node is greater than or equal to a preset threshold, it determines that service node to be a target service node whose concurrency threshold is unreasonable.
  • For a service node whose concurrency threshold is reasonable, the processing delay of each service request should be less than the preset threshold. When the processing delay of a service request on a service node is greater than or equal to the preset threshold, the service node may be faulty (for example, a network fault caused by failure of the service node's network card), so it is determined to be a target service node with an unreasonable concurrency threshold, which makes the determination accurate.
  • The control node may also determine the target service node by detecting the delays of multiple service requests that pass through the same service node. Because a transient network interruption or another self-healing fault (such as an abnormal service process) can delay a single service request, analyzing the delays of multiple service requests makes the analysis more accurate.
  • Because the target service node with an unreasonable concurrency threshold can be determined in the several different ways described above, the determination is flexible.
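  • The detection criteria above (failed processing results, delays at or above a preset threshold, and confirmation across multiple service requests) can be sketched as follows. This is an illustrative sketch, not the patented procedure: the record keys `node`, `result`, and `delay_ms`, the `min_hits` parameter, and the sample values are assumptions. The encoding of 0 for success and 1 for failure follows the Result identifier described elsewhere in the text.

```python
from collections import defaultdict


def find_target_nodes(records, delay_threshold_ms, min_hits=2):
    """Return the service nodes flagged as having an unreasonable threshold.

    records: iterable of dicts with hypothetical keys
             'node', 'result' (0 = success, 1 = failure), 'delay_ms'.
    A node is flagged once at least `min_hits` of its requests either failed
    or reached the delay threshold, which filters out one-off glitches such
    as a transient network interruption.
    """
    hits = defaultdict(int)
    for rec in records:
        if rec["result"] == 1 or rec["delay_ms"] >= delay_threshold_ms:
            hits[rec["node"]] += 1
    return sorted(node for node, count in hits.items() if count >= min_hits)


sample = [
    {"node": "B", "result": 0, "delay_ms": 12},
    {"node": "D", "result": 1, "delay_ms": 40},   # failed request
    {"node": "D", "result": 0, "delay_ms": 250},  # slow request
    {"node": "E", "result": 0, "delay_ms": 300},  # a single slow request only
]
targets = find_target_nodes(sample, delay_threshold_ms=200)
```

  • Setting `min_hits=1` corresponds to judging from a single service request, while larger values correspond to the multi-request variants described above.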
  • The control node may determine the updated concurrency threshold of the target service node as follows. First, the control node acquires the service call topology relationship of the concurrency control system. It then determines the updated concurrency threshold of the target service node according to the target service node's concurrency threshold, the service call topology relationship, and the concurrency thresholds and weight information of the upper-level or lower-level service nodes.
  • The updated concurrency threshold satisfies the preset calling relationship in the topology relationship, which ensures the stability and reliability of the call chain system, reduces the delay of service request processing, and improves the performance of the call chain system.
  • The service call topology relationship is relatively fixed. After obtaining it for the first time, the control node can save it and use it as a reference the next time it adjusts and controls the concurrency thresholds of the service nodes in the call-chain-based concurrency control system. This avoids the waste of resources that would result from obtaining the service call topology relationship in every adjustment.
  • The call-chain-based concurrency control system further includes an information processing node. The information processing node is configured to acquire the analysis statistics of the plurality of service nodes and to derive the service call topology relationship of the concurrency control system from the call relationships between each service node and other service nodes recorded in the analysis statistics.
  • the analysis statistics obtained by the control node from the information processing node further include the service call topology relationship.
  • The control node may also derive the service call topology relationship of the concurrency control system itself, based on the calling relationships between each service node and other service nodes in the analysis statistics.
  • The control node may also determine the target service node with an unreasonable concurrency threshold according to the service call topology relationship. Compared with analyzing the statistics directly, the service call topology relationship lets the control node learn the service call relationships of the entire concurrency control system and so determine the target service node more quickly.
  • The control node may then determine a service node adjacent to the target service node to be a new target service node and adjust its concurrency threshold in the same way. By repeating this process, the control node can eventually adjust the concurrency thresholds of all service nodes in the entire concurrency control system.
  • After determining the updated concurrency threshold of the target service node, the control node adjusts the neighboring service nodes of the target service node according to the service call topology relationship, the updated concurrency threshold of the target service node, and the concurrency thresholds and weight information of those neighboring service nodes, obtaining an updated concurrency threshold for each neighboring service node. The control node then sends each neighboring service node its updated concurrency threshold, thereby adjusting the concurrency thresholds of the target service node's neighboring service nodes.
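  • The step-by-step adjustment of neighboring service nodes can be sketched as a walk over the service call topology. The update rule below (a node's threshold is the weighted sum of its upper-level nodes' thresholds) is an assumption chosen for illustration; the text does not give a formula. For brevity the sketch only propagates toward lower-level nodes, and the function name, weights, and threshold values are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter  # Python 3.9+


def propagate(upstream, thresholds, weights, target, new_threshold):
    """Set `target` to `new_threshold`, then recompute every other non-root
    node as the weighted sum of its upper-level nodes' thresholds
    (assumed rule, for illustration only)."""
    updated = dict(thresholds)
    updated[target] = new_threshold
    # static_order() yields each node only after all of its upper-level nodes.
    for node in TopologicalSorter(upstream).static_order():
        parents = upstream.get(node, [])
        if node == target or not parents:
            continue
        updated[node] = int(sum(updated[p] * weights[(p, node)] for p in parents))
    return updated


# Topology with two call chains: A->B->D->E->F and A->C->E->F.
upstream = {"B": ["A"], "C": ["A"], "D": ["B"], "E": ["C", "D"], "F": ["E"]}
weights = {("A", "B"): 0.5, ("A", "C"): 0.5, ("B", "D"): 1.0,
           ("C", "E"): 1.0, ("D", "E"): 1.0, ("E", "F"): 1.0}
thresholds = {"A": 100, "B": 50, "C": 50, "D": 50, "E": 100, "F": 100}

# Suppose the control node raises A's threshold to 200.
result = propagate(upstream, thresholds, weights, target="A", new_threshold=200)
```

  • Processing the nodes in topological order ensures that a node such as E, which has two upper-level nodes, is recomputed only after both of them have been updated.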
  • The present application provides an apparatus for call-chain-based concurrency control. The apparatus includes modules for implementing the call-chain-based concurrency control method of the first aspect or of any possible implementation of the first aspect.
  • The modules can be implemented in hardware, or in hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the functions described above.
  • the present application provides a control node, which includes a processor, a memory, a communication interface, and a bus.
  • the processor, the memory, and the communication interface are connected by a bus and complete communication with each other.
  • The processor executes computer-executable instructions stored in the memory so as to perform, using hardware resources in the control node, the operational steps of the call-chain-based concurrency control method of the first aspect or of any possible implementation of the first aspect.
  • the present application provides a computer readable storage medium having instructions stored therein that, when executed on a computer, cause the computer to perform the methods described in the above aspects.
  • By acquiring the analysis statistics, concurrency thresholds, and weight information of the service nodes, the control node can monitor the service calling relationships in the call-chain-based concurrency control system and adjust the concurrency thresholds of the service nodes so that each updated concurrency threshold satisfies the concurrency capability invocation relationship. This solves the prior-art problem of performance degradation caused by unreasonable concurrency thresholds in a call-chain-based concurrency control system, ensures the stability and reliability of the system, reduces the delay of service request processing, and improves the overall performance of processing service requests.
  • The implementations provided in the above aspects may be further combined to provide more implementations.
  • FIG. 1 is a schematic structural diagram of a concurrency control system based on a call chain according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a distributed service call relationship in a concurrency control system based on a call chain according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a threshold value of a control node itself controlling a number of concurrent calls in a call chain based concurrency control system provided in the prior art
  • FIG. 4 is a schematic flowchart of a call chain-based concurrency control method according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a service call topology relationship of a call chain based control system according to an embodiment of the present invention
  • FIG. 6 is a schematic structural diagram of a control node according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of another control node according to an embodiment of the present invention.
  • FIG. 1 is a schematic structural diagram of a call chain-based concurrency control system 100 according to an embodiment of the present invention.
  • The concurrency control system includes at least one service node 110, a control node 120, and an information processing node 130.
  • The service node 110 is used to deploy application services. Each application service may be deployed on a single service node 110 or on a cluster of multiple service nodes 110. An application service is a service required to run a given service application.
  • For example, the Taobao shopping application includes three application services: a Web service, an order service, and a payment service.
  • When a user makes a purchase through Taobao, the user first selects the desired product through the Web service, then adds the product to an order and configures the shipping address through the order service, and finally completes the purchase through the payment service.
  • In a cluster, each service node deploys the same application service, and a load balancing policy is applied across the service nodes to ensure the reliability and processing efficiency of the application service.
  • For example, application service A is deployed on a single service node A, whereas application service H is a high-load application service that needs to be deployed on a cluster of multiple service nodes. Service node H1, service node H2, and service node H3 all deploy application service H and share its processing under a load balancing policy, which ensures the reliability and processing efficiency of application service H.
  • An application service may also be deployed in a distributed manner: the application service that processes a service request is split into sub-services deployed on different service nodes, and a high-load sub-service may itself be deployed on a cluster of service nodes, which better decouples and combines the call relationships between different application services. For example, an application service can be split into service A, service B, and service C.
  • Each sub-service can be deployed on one or more service nodes 110: on a single service node, on a cluster of multiple service nodes, or on both at the same time.
  • Any service node can be called simultaneously by multiple adjacent upper-level or lower-level service nodes; this is called concurrency.
  • The number of simultaneous calls made to a service node by multiple service nodes is called its concurrency number, and the maximum number of simultaneous calls that a service node can accept from its upper-level or lower-level service nodes is called the concurrency threshold of that service node.
  • For example, the calling relationship of an application service is service A->service B, and service A is deployed on a cluster of three service nodes.
  • The three service nodes simultaneously invoking the service node that deploys service B to complete service requests is called concurrency. The number of calls that the service nodes of service A are currently making to the service node that deploys service B at the same time is called the concurrency number of the service node that deploys service B.
  • The maximum number of simultaneous calls that a service node deploying service A can accept is called the concurrency threshold of that service node.
  • The concurrency threshold of service A is the sum of the concurrency thresholds of the service nodes that deploy service A.
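  • The distinction between concurrency number and concurrency threshold can be made concrete with a small sketch. The class and function names, the node names A1 to A3, and the numeric thresholds are hypothetical and serve only to illustrate the definitions above.

```python
class ServiceNode:
    """Tracks in-flight calls against a concurrency threshold (illustrative)."""

    def __init__(self, name: str, threshold: int):
        self.name = name
        self.threshold = threshold  # maximum simultaneous calls accepted
        self.concurrency = 0        # concurrency number: calls currently in flight

    def try_call(self) -> bool:
        """Admit a call only while the concurrency number is below the threshold."""
        if self.concurrency >= self.threshold:
            return False
        self.concurrency += 1
        return True

    def finish_call(self) -> None:
        """Mark one in-flight call as completed."""
        if self.concurrency > 0:
            self.concurrency -= 1


def service_threshold(nodes) -> int:
    """Concurrency threshold of a service: the sum over the nodes deploying it."""
    return sum(node.threshold for node in nodes)


# Service A deployed on a cluster of three service nodes.
cluster_a = [ServiceNode("A1", 10), ServiceNode("A2", 10), ServiceNode("A3", 5)]
total_a = service_threshold(cluster_a)

# A node with threshold 2 rejects a third simultaneous call.
node_b = ServiceNode("B", 2)
admitted = [node_b.try_call() for _ in range(3)]
```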
  • FIG. 2 is a schematic diagram of a distributed service call relationship in a concurrency control system based on a call chain according to an embodiment of the present invention.
  • The concurrency control system includes two call chains. Service request 1 passes through service A, service B, service D, service E, and service F in sequence, so the call chain corresponding to service request 1 is service A->service B->service D->service E->service F.
  • Service request 2 passes through service A, service C, service E, and service F in turn, so the call chain corresponding to service request 2 is service A->service C->service E->service F.
  • different service requests complete the processing of the application service through different call chains.
  • Each service node can deploy only one application service or one sub-service of an application service, or it can deploy multiple application services, multiple sub-services of one application service, or multiple sub-services of multiple application services; the present invention does not limit this.
  • In the following description, the case in which each service node deploys only one sub-service of an application service is taken as an example.
  • The control node 120 is configured to dynamically adjust the concurrency thresholds of the service nodes in the call-chain-based concurrency control system 100.
  • The control node 120 may be deployed on a separate physical server or virtual machine, or it may be co-located, that is, deployed on the same physical server as a service node that deploys an application service.
  • The information processing node 130 is configured to collect the logs of the service nodes 110 and to generate analysis statistics from the log content. The analysis statistics include, for each service node 110 processing service requests, the delay, the processing result (success or failure, that is, whether each service request is interrupted on each service node 110), and the calling relationships with other service nodes.
  • the information processing node 130 can be deployed in a separate physical server or virtual machine.
  • Each service node may record the response time, input parameters, and similar details of every service request in a log that follows a unified log management model.
  • The log content can be customized; example fields include TraceId and CalledNodeId.
  • The initial service node of a service request is also referred to as the root node. CalledNodeId identifies the service nodes that the service request passes through, that is, the call relationship between the current service node and other service nodes while the service request is processed, so that all service nodes traversed by a given TraceId can be analyzed to determine the topological relationship.
  • For example, suppose the logs record that the root node of the service request with TraceId 1 is service node A, and that service node A calls service node B, service node B calls service node D, service node D calls service node E, and service node E calls service node F; one call chain of the service request is thereby obtained. If the logs also record the root node of another TraceId, with service node A calling service node C, service node C calling service node E, and service node E calling service node F, another call chain is obtained. Integrating the different call chains yields the service call topology relationship between the service nodes. Span identifies the delay of a service request at each service node, so the processing delay of each service request on each service node can be obtained.
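  • The reconstruction of call chains and of the service call topology from log records can be sketched as follows. The record layout is an assumption: `TraceId` and `CalledNodeId` are the field names mentioned in the text, while `NodeId` (the node that wrote the record) and the function name are hypothetical. The sketch also assumes that, within one TraceId, each service node calls at most one other node, as in the two example chains.

```python
from collections import defaultdict


def build_topology(records):
    """Merge per-request call edges into call chains and a service call topology."""
    edges_by_trace = defaultdict(dict)  # TraceId -> {caller: callee}
    topology = defaultdict(set)         # caller -> set of callees
    for rec in records:
        edges_by_trace[rec["TraceId"]][rec["NodeId"]] = rec["CalledNodeId"]
        topology[rec["NodeId"]].add(rec["CalledNodeId"])

    chains = {}
    for trace_id, nxt in edges_by_trace.items():
        callees = set(nxt.values())
        # The root node is the one caller that no other node calls.
        node = next(n for n in nxt if n not in callees)
        chain = [node]
        while node in nxt:
            node = nxt[node]
            chain.append(node)
        chains[trace_id] = chain
    return chains, {node: sorted(c) for node, c in topology.items()}


# Log records may arrive in any order.
records = [
    {"TraceId": 1, "NodeId": "B", "CalledNodeId": "D"},
    {"TraceId": 1, "NodeId": "A", "CalledNodeId": "B"},
    {"TraceId": 1, "NodeId": "E", "CalledNodeId": "F"},
    {"TraceId": 1, "NodeId": "D", "CalledNodeId": "E"},
    {"TraceId": 2, "NodeId": "A", "CalledNodeId": "C"},
    {"TraceId": 2, "NodeId": "C", "CalledNodeId": "E"},
    {"TraceId": 2, "NodeId": "E", "CalledNodeId": "F"},
]
chains, topology = build_topology(records)
```

  • Merging the edges of both TraceIds gives service node A two lower-level nodes (B and C) and service node E two upper-level nodes (C and D), which is the integrated topology described above.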
  • The deployment form of a sub-service may also be recorded in the log of each service node that deploys it.
  • For example, service E is deployed on a cluster consisting of service node E1, service node E2, and service node E3. Each of these service nodes records in its log that the service is deployed in cluster form, together with the identifiers of the other service nodes in the same cluster.
  • A service node identifier may be the service node's name or IP address.
  • Other service call parameters may also be recorded according to service requirements, such as the processing result of each service request at each service node. For example, adding a Result identifier makes it possible to record whether the call for each TraceId succeeded at each service node: Result0 indicates that the call for the TraceId succeeded, and Result1 indicates that it failed.
  • Each service request must carry a TraceId while it is processed.
  • The information processing node collects and processes the logs recorded by the service nodes and derives the complete call chain information of each service request according to its TraceId. With the call chain information, the performance of service request processing can be readily analyzed.
  • Table 1 shows an example of the analysis statistics of the distributed service call relationships in the call chain-based concurrency control system. Span AB indicates the time that elapses between the service request arriving at service node A and arriving at service node B, that is, the delay of service node A; Span AC, Span BD, Span CE, Span DE, and Span EF are defined analogously. TraceId 1 denotes one service request whose call chain passes through service node A -> service node B -> service node D -> service node E -> service node F; TraceId 2 denotes one service request whose call chain passes through service node A -> service node C -> service node E -> service node F.
  • Similarly, for any other TraceId, the information processing node may determine the service nodes invoked by the corresponding service request.
  • By collecting the service nodes that the same TraceId passes through and the data associated with the TraceId (such as SpanId), the information processing node determines, for each service node, the number of service requests, the service request delay, the processing result, and the call relationships with other service nodes. It then analyzes the call relationships between each service node and the other service nodes to determine the service call topology in the concurrency control system.
  • The information processing node may collect the logs of the service nodes periodically or in real time.
  • The analysis statistics may further include the identifiers and the number of the service nodes that process the service request.
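  • The per-node statistics described above can be sketched as a simple aggregation. The span record layout `(trace_id, node, delay_seconds, result)` and the sample values are assumptions made for illustration; only the convention that Result 0 means success and Result 1 means failure follows the text.

```python
from collections import defaultdict

# Hypothetical span records: (trace_id, node, delay_seconds, result),
# where result 0 means success and result 1 means failure.
SPANS = [
    (1, "A", 0.01, 0), (1, "B", 0.02, 0), (1, "E", 0.10, 1),
    (2, "A", 0.03, 0), (2, "C", 0.02, 0), (2, "E", 0.02, 0),
]

def summarize(spans):
    """Return request count, failure count and average delay per service node."""
    totals = defaultdict(lambda: {"requests": 0, "failures": 0, "delay_sum": 0.0})
    for _trace_id, node, delay, result in spans:
        entry = totals[node]
        entry["requests"] += 1
        entry["failures"] += 1 if result != 0 else 0
        entry["delay_sum"] += delay
    return {
        node: {
            "requests": e["requests"],
            "failures": e["failures"],
            "avg_delay": e["delay_sum"] / e["requests"],
        }
        for node, e in totals.items()
    }

stats = summarize(SPANS)
```

  • A control node could consume such a summary directly, or the raw records could be kept so that the call relationships remain available for topology analysis.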
  • The control node 120 and the information processing node 130 in FIG. 1 may also be integrated in one physical server or virtual machine, which is then used both to adjust the concurrency thresholds of the service nodes in the concurrency control system and to collect the delay with which each service node processes service requests, the processing results, and the call relationships with other service nodes.
  • For the concurrency thresholds of the service nodes in the call chain-based concurrency control system shown in FIG. 2, there are usually two control methods. In the first method, a unified concurrency threshold is set, at the start of the service request call, for the service nodes 110 of every sub-service deploying the same application service. For example, the concurrency threshold of each service node 110 is set to 100 through a unified interface or a web interface.
  • The problem with this method is that the top-level service cannot perceive the underlying services. The top-level service is the sub-service that initiates the application service process, such as service A. An underlying service is a sub-service that has an indirect call relationship with the top-level service during the application service process; for example, service D, service E, and service F are underlying services of service A. To prevent the failure of one sub-service from degrading the performance of the entire system, more resources are reserved, which wastes resources.
  • In the second method, the service nodes 110 of each sub-service deploying the application service set different concurrency thresholds; for example, each service node sets the maximum concurrency it can support as its concurrency threshold.
  • FIG. 3 is a schematic diagram of the concurrency thresholds of the service nodes in the call chain-based concurrency control system corresponding to the call chain system shown in FIG. 2. As shown in FIG. 3, service A is provided by service node A;
  • service B is provided by service node B
  • service C is provided by service node C
  • service D is provided by service node D
  • service E and service F are deployed in cluster form: service E is jointly deployed on service node E1, service node E2, and service node E3, and service F is jointly deployed on service node F1, service node F2, and service node F3.
  • The service nodes in each cluster use a load balancing policy to process each call chain.
  • Each service node sets its concurrency threshold according to the maximum concurrency it can support. For example, service node E1, service node E2, and service node E3, which provide service E, each support a maximum concurrency of 50, so the maximum concurrent processing capability of the cluster providing service E is 150. Likewise, service node F1, service node F2, and service node F3, which provide service F, each support a maximum concurrency of 50, so the maximum concurrent processing capability of the cluster providing service F is 150. Service node C and service node D, the upper-level service nodes of service E, have concurrency thresholds of 100 and 50, respectively.
  • Upper-level and lower-level service nodes are concepts relative to a specified service node. As the specified service node changes, the correspondence between upper-level and lower-level service nodes changes as well.
  • The upper-level service nodes of a service node include the service nodes that must complete service request processing before that service node and have a direct call relationship with it, as well as the other service nodes that must complete service request processing before that service node and have an indirect call relationship with it. The lower-level service nodes include the service nodes that must complete service request processing after that service node and have a direct call relationship with it, as well as the other service nodes that must complete service request processing after that service node and have an indirect call relationship with it. For convenience of description, upper-level and lower-level service nodes are collectively referred to as adjacent service nodes.
  • For example, for service node C, service node A and service node B are its upper-level service nodes: service node B completes service processing before service node C and has a direct call relationship with service node C, while service node A completes service processing before service node C but has an indirect call relationship with service node C. Service node D is the lower-level service node of service node C.
  • For service node B, service node A is its upper-level service node, and service node C and service node D are its lower-level service nodes: service node C completes service processing after service node B and has a direct call relationship with service node B, while service node D completes service processing after service node B and has an indirect call relationship with service node B.
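  • The definition of upper-level and lower-level service nodes amounts to following call edges backward and forward. The sketch below assumes the linear chain A -> B -> C -> D used in this example; the adjacency-map representation and the function names are illustrative assumptions.

```python
def lower_level_nodes(topology, node):
    """Return all direct and indirect callees of `node`."""
    found = set()
    stack = list(topology.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(topology.get(current, []))
    return sorted(found)

def upper_level_nodes(topology, node):
    """Return all direct and indirect callers of `node`."""
    reverse = {}
    for caller, callees in topology.items():
        for callee in callees:
            reverse.setdefault(callee, []).append(caller)
    # Walking the reversed edges forward is the same as walking callers.
    return lower_level_nodes(reverse, node)

# Assumed chain for this example: A calls B, B calls C, C calls D.
CHAIN = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}

upper_of_c = upper_level_nodes(CHAIN, "C")
lower_of_c = lower_level_nodes(CHAIN, "C")
lower_of_b = lower_level_nodes(CHAIN, "B")
```

  • The adjacent service nodes of a node are then the union of the two results.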
  • When the concurrency thresholds of the service nodes in the concurrency control system shown in FIG. 3 are controlled by the above method and a service node providing service E fails, for example service node E1, the maximum concurrent capability of the cluster providing service E changes from 150 to 100.
  • Service requests of service A and service B may then fail, but the failure only occurs once service processing reaches service E through service C or service D. Because such a service request is not completed, the upper-level service nodes of service node E1 perform useless work and waste resources.
  • In addition, as the total concurrent capability of the service nodes providing service E decreases, the load on the lower-level service nodes decreases, but the downstream service F of service E still reserves resources for three service nodes, which also wastes resources.
  • Furthermore, if the concurrency threshold of service node A is greater than the sum of the concurrency thresholds of service node B and service node C, service node A reserves resources to support its concurrency threshold of 250, but the reserved resources are redundant when service requests are actually processed, and resources are wasted.
  • An embodiment of the present invention provides a call chain-based concurrency control method. After the control node 120 obtains the analysis statistics of the plurality of service nodes, it determines the target service node whose concurrency threshold is unreasonable and then dynamically adjusts the concurrency threshold of that target service node, so that the reliability of the call chain system composed of the service nodes can be guaranteed in real time.
  • FIG. 4 is a schematic flowchart diagram of a call chain-based concurrency control method according to an embodiment of the present invention. The method is applied to the call chain-based concurrency control system shown in FIG. 1. As shown in FIG. 4, the method may include the following steps:
  • S401. The control node acquires the analysis statistics of each of the plurality of service nodes.
  • The analysis statistics include, for each service node in the call chain-based concurrency control system, the delay of processing a service request, the processing result, and the call relationships with other service nodes.
  • The processing result indicates whether the service request was processed successfully, for example, success or failure. A processing result of failure indicates that the service request was interrupted while being processed at the service node.
  • The analysis statistics are obtained by the information processing node in the call chain system 100 shown in FIG. 1 by collecting and analyzing the logs of the service nodes.
  • To acquire the analysis statistics, the control node may send a request message to the information processing node, and the information processing node sends the analysis statistics to the control node in response to the request message.
  • The control node may obtain the analysis statistics periodically or in real time, which is not limited in the present invention.
  • The analysis statistics that the control node obtains from the information processing node may consist of the delay with which each service node processes service requests, the processing results, and the call relationships with other service nodes. Optionally, the information processing node may further analyze the analysis statistics to obtain the service call topology among the service nodes, in which case the analysis statistics also include the service call topology and the control node can analyze the service calls of the call chain system more intuitively.
  • To derive the topology, the information processing node traces, by TraceId, the service nodes that each call chain passes through and then merges all call chains into the call relationships among the service nodes, that is, the service call topology of the application service. If the control node obtains analysis statistics that have not been further processed, it can derive the service call topology itself by the same method, which makes the processing more flexible.
  • S402. The control node determines, according to the analysis statistics, a target service node whose concurrency threshold is unreasonable.
  • The control node identifies the target service node whose concurrency threshold is unreasonable according to the analysis statistics acquired in S401. The control node may determine the target service node in at least one of the following manners:
  • The control node may determine the service node whose concurrency threshold is unreasonable by analyzing the processing results in the analysis statistics.
  • If the processing result in the analysis statistics of any service node is a failure, that service node is determined to be a target service node whose concurrency threshold is unreasonable.
  • A service node may fail to process a service request in any of the following cases:
  • Case 1: The concurrency threshold of the target service node is set too small, so the target service node cannot meet the call requirements of its upper-level or lower-level service nodes.
  • Case 2: Part of the hardware of the target service node fails and the processing capability of the target service node decreases, for example, some CPUs in the target service node are faulty.
  • Case 3: The network of the target service node fails, for example, a network flash (a transient network interruption).
  • The control node may determine whether a service node is a target service node whose concurrency threshold is unreasonable by detecting whether a service request fails at that service node.
  • Optionally, the target service node whose concurrency threshold is unreasonable may be determined by detecting the failures of multiple service requests that pass through the same service node. A single service request may fail because of a transient network interruption and continue to be processed once the network recovers, so analyzing the failures of multiple service requests makes the result more accurate.
  • For example, assume that each service is deployed on one service node. The control node first obtains the analysis statistics of each service node and determines from them how the service request was processed.
  • If the information acquired by the control node contains no analysis statistics for service node E, it can be determined that service node E is faulty, that is, service node E is determined to be a target service node whose concurrency threshold is unreasonable.
  • In the call chain-based concurrency control system shown in FIG. 2, the target service node whose concurrency threshold is unreasonable may also be determined by analyzing the failures of multiple service requests at the same time.
  • Assume that each service is deployed on one service node. As can be seen from FIG. 2, the concurrency control system includes two call chains, each of which can process a service request. If service node E fails, the control node can determine from the analysis statistics that the service request passing through service node A -> service node B -> service node C -> service node E failed at service node E, and also that the service request passing through service node A -> service node B -> service node D -> service node E failed at service node E. Service node E can then be determined more accurately to be a target service node whose concurrency threshold is unreasonable.
  • the method for determining the service node that fails the service in each service request is the same as that in the foregoing embodiment, and details are not described herein again.
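  • Detection based on failures across multiple service requests might look like the following sketch. The record layout `(trace_id, node, result)` and the `min_failures` parameter are hypothetical; the text only states that several failing requests at the same node give a more reliable signal than a single one.

```python
from collections import Counter

def failing_nodes(records, min_failures=2):
    """Return nodes at which at least `min_failures` requests failed.

    records: iterable of (trace_id, node, result), result 0 = success, 1 = failure.
    """
    failures = Counter(node for _trace_id, node, result in records if result != 0)
    return sorted(node for node, count in failures.items() if count >= min_failures)

# Two service requests on different call chains, both failing at service node E.
RECORDS = [
    (1, "A", 0), (1, "B", 0), (1, "D", 0), (1, "E", 1),
    (2, "A", 0), (2, "C", 0), (2, "E", 1),
]

targets = failing_nodes(RECORDS)
```

  • Raising `min_failures` makes the check less sensitive to one-off failures such as a transient network interruption.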
  • The control node may determine a service node whose concurrency threshold is unreasonable according to whether the delay with which each service node processes service requests, as recorded in the analysis statistics, exceeds a preset threshold.
  • The preset threshold may be a service request delay threshold chosen from experience, or it may be determined by observing the service request delays of the service nodes in the analysis statistics.
  • Normally, the service request delay of each service node is less than the preset threshold. If the delay of a service request on a service node is greater than or equal to the preset threshold, the service node may be faulty (for example, because of a network failure), and the service node is determined to be a target service node whose concurrency threshold is unreasonable.
  • The control node may also determine the target service node whose concurrency threshold is unreasonable by comparing the delays of multiple service nodes in the same call chain with the preset threshold.
  • The control node may also determine the target service node whose concurrency threshold is unreasonable by detecting the delays of multiple service requests. For a single service request, the delay may increase because of a transient network interruption or another self-recoverable fault (such as a process exception in the service), so analyzing the delays of multiple service requests makes the result more accurate.
  • For example, assume that each service is deployed on one service node and that the preset threshold is 0.05 s. Suppose the processing of a service request passes through service node A, service node B, and service node E, and the analysis statistics show that the service request delay at service node A and at service node B is 0.01 s while the delay at service node E is 0.1 s. The delay with which service node E processes the service request exceeds the preset threshold, so service node E is determined to be the target service node whose concurrency threshold is unreasonable.
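  • The delay check in this example reduces to a comparison against the preset threshold. The sketch below uses the values from the example; the function name and the dictionary layout are assumptions.

```python
def nodes_over_delay(delays, threshold):
    """Return nodes whose service request delay is >= the preset threshold."""
    return sorted(node for node, delay in delays.items() if delay >= threshold)

# Delays in seconds for one service request, as in the example above.
DELAYS = {"A": 0.01, "B": 0.01, "E": 0.1}
PRESET_THRESHOLD = 0.05

suspects = nodes_over_delay(DELAYS, PRESET_THRESHOLD)
```

  • Here only service node E is reported, since 0.1 s is above the 0.05 s threshold.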
  • The concurrency threshold of the target service node may be unreasonable because of its initial setting, for example, the initial concurrency threshold is set too large or too small.
  • Alternatively, the initial setting may be reasonable, but after the control node adjusts a faulty service node while the concurrency control system is running, the concurrency thresholds of the upper-level or lower-level service nodes of the faulty service node become unreasonable.
  • The control node uses the above three manners to monitor whether the concurrency control system contains a target service node whose concurrency threshold is unreasonable. Whether the cause is an unreasonable initial setting, a service node failure, or an adjustment by the control node that leaves the concurrency threshold of an upper-level or lower-level service node unreasonable, the control node identifies the target service node in time and adjusts its concurrency threshold to ensure the performance of the concurrency control system.
  • S403. The control node acquires the concurrency threshold of the target service node, and the concurrency thresholds and weight information of the adjacent service nodes of the target service node.
  • The concurrency threshold is the maximum number of concurrent calls that a service node can accept in the call chain-based concurrency control system.
  • The weight information identifies the proportional relationship among the numbers of concurrent calls between the target service node and its adjacent service nodes. Because the adjacent service nodes of the target service node include its upper-level and lower-level service nodes, the weight information may identify the proportional relationship of the concurrency thresholds among the lower-level service nodes called by the same service node in the call chain, or the proportional relationship of the concurrency thresholds among the upper-level service nodes of a service node.
  • For example, service node A has two lower-level service nodes, service node B and service node C. Assume that the ratio of the concurrency thresholds of service node B and service node C is 2:3 and that the concurrency threshold of service node A is 100. The concurrency thresholds of service node B and service node C are then 40 and 60, respectively. In this case, service node A, service node B, and service node C meet each other's call requirements during service processing, the resources of the service nodes are fully utilized, and the concurrency threshold of service node A matches the maximum processing capability of service node B and service node C.
  • Likewise, the weight information may be the proportional relationship of the concurrency thresholds among multiple upper-level service nodes that simultaneously call one service node. After the concurrency threshold of a service node is determined, the concurrency thresholds of its upper-level service nodes can be determined from the weight information in the same way as the concurrency thresholds of the lower-level service nodes B and C above.
  • The weight information can be specified by the user or set to a system default value.
  • The weight information is a proportional relationship among the concurrency thresholds of two or more adjacent service nodes of the same service node. When a service node has only one such adjacent service node, the weight information is 1.
  • For example, service node B has only one lower-level service node, service node D, and the weight of service node D is 1. The concurrency threshold of service node D is then equal to the concurrency threshold of service node B, and service node D may call all resources of service node B to process the service request.
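  • The weight-based split can be written as a short helper. This is a sketch assuming integer weights and floor division; the rounding rule is not specified in this document, and the function name is hypothetical.

```python
def split_by_weight(total, weights):
    """Split a concurrency threshold across nodes in proportion to their weights.

    Floor division is an assumption made here; the rounding rule is not
    specified in the source text.
    """
    weight_sum = sum(weights.values())
    return {node: total * weight // weight_sum for node, weight in weights.items()}

# Service node A has threshold 100; B and C have a 2:3 weight ratio.
lower_of_a = split_by_weight(100, {"B": 2, "C": 3})

# Service node B has a single lower-level node D with weight 1.
lower_of_b = split_by_weight(50, {"D": 1})
```

  • With a single adjacent node of weight 1, the helper simply passes the threshold through unchanged.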
  • The control node may obtain the concurrency threshold and weight information from each service node in real time or periodically, so that it can monitor and adjust the service calls of the service nodes according to the latest concurrency thresholds and weight information.
  • A service node may send its concurrency threshold and weight information directly to the control node. Alternatively, a client proxy module is installed on each service node, and the client proxy module sends the concurrency threshold and weight information of the service node to the control node.
  • S404. The control node determines the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the analysis statistics, and the concurrency thresholds and weight information of the adjacent service nodes of the target service node.
  • After determining the target service node whose concurrency threshold is unreasonable, the control node first derives the service call topology among the service nodes in the concurrency control system from the call relationships between each service node and the other service nodes in the analysis statistics. The control node then determines the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the service call topology, and the concurrency thresholds and weight information of the adjacent service nodes obtained in S403.
  • With the updated concurrency threshold, the service nodes in the call chain meet the call requirements of their upper-level and/or lower-level service nodes in the service call topology, and no service node reserves redundant resources that would be wasted.
  • The analysis statistics obtained by the control node may also include the service call topology.
  • The service call topology may be determined by the information processing node from the logs of the service nodes and then sent to the control node, or it may be determined by the control node directly from the call relationships between each service node and the other service nodes in the analysis statistics, which makes the processing more flexible.
  • The service call topology is relatively fixed. After obtaining it for the first time, the control node can save the service call topology and use it when adjusting and controlling the concurrency thresholds of the service nodes in the call chain-based concurrency control system, which avoids the resource waste of deriving the service call topology again in every adjustment.
  • Either of the following two methods may be used to adjust the concurrency thresholds of the other service nodes in the concurrency control system.
  • Step S402 can be used to re-determine the upper-level and/or lower-level service nodes of the target service node as new target service nodes, and the concurrency thresholds of the newly determined target service nodes are adjusted in the same manner. This process is repeated until the concurrency thresholds of all service nodes in the entire concurrency control system have been adjusted.
  • When the current concurrency threshold of the target service node is set too small, for example, because the initial concurrency threshold was set too small, the concurrency threshold is unreasonable.
  • After the control node acquires and analyzes the analysis statistics of the service nodes, it can determine that service node A is a target service node whose concurrency threshold is unreasonable. If the maximum concurrency that service node A can support is greater than or equal to 100, the control node directly adjusts the concurrency threshold of service node A. Otherwise, the control node adjusts the concurrency thresholds of service node B and service node C to 25 each. At this point, one adjustment of the concurrency threshold of the target service node is complete.
  • If the concurrency threshold of service node A is 100, that of service node B is 50, that of service node C is 50, and that of service node D is 100, the method of step S402 can be used to determine that service node D is a target service node whose concurrency threshold is unreasonable. The concurrency threshold of service node D is then also adjusted to 50, so that service node D and service node B meet each other's call requirements in the distributed service call relationship shown in FIG. 2.
  • If the concurrency control system contains other target service nodes whose concurrency thresholds are unreasonable, their concurrency thresholds can be adjusted in the same manner until every service node meets the call requirements of the other service nodes in the distributed service call relationship and no service node reserves redundant resources that would be wasted.
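  • The repeated adjustment of lower-level service nodes can be sketched as a walk down the call chain. The trimmed topology (A calls B and C, B calls D), the 1:1 weights, and both function names are assumptions for illustration; a node with several upper-level nodes would need the contributions combined rather than overwritten as done here.

```python
def split_by_weight(total, weights):
    """Split `total` in proportion to integer weights (floor division assumed)."""
    weight_sum = sum(weights.values())
    return {node: total * weight // weight_sum for node, weight in weights.items()}

def propagate_down(thresholds, topology, weights, start):
    """Align every lower-level threshold with its caller, starting at `start`."""
    updated = dict(thresholds)
    queue = [start]
    while queue:
        node = queue.pop(0)
        callees = topology.get(node, [])
        if not callees:
            continue
        shares = split_by_weight(updated[node], {c: weights[c] for c in callees})
        for callee, share in shares.items():
            updated[callee] = share
            queue.append(callee)
    return updated

# Thresholds from the example: D (100) exceeds what its caller B (50) can issue.
THRESHOLDS = {"A": 100, "B": 50, "C": 50, "D": 100}
TOPOLOGY = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
WEIGHTS = {"B": 1, "C": 1, "D": 1}

adjusted = propagate_down(THRESHOLDS, TOPOLOGY, WEIGHTS, "A")
```

  • After the walk, service node D is reduced to 50 while service nodes A, B, and C keep their thresholds.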
  • When the concurrency threshold of the target service node is set too large, for example, because the initial concurrency threshold was set too large, the concurrency threshold is unreasonable.
  • For example, the concurrency threshold of service node A in FIG. 2 is 200, the concurrency thresholds of service node B and service node C are 50 each, and the ratio of the concurrency thresholds of service node B and service node C is 1:1. The sum of the concurrency thresholds of service node B and service node C is 100, that is, service node B and service node C can only process 100 service calls delivered simultaneously by service node A. After the control node acquires and analyzes the analysis statistics of the service nodes, it can determine that the concurrency threshold of service node A is set too large. The control node then adjusts the concurrency threshold of service node A to 100 according to the concurrency threshold of service node A and the concurrency thresholds and weight information of the lower-level service nodes of service node A.
  • The concurrency thresholds of the other service nodes of the concurrency control system can then be adjusted in the same manner.
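  • For the case of a threshold that is set too large, one plausible reading is that the upper-level node is capped at the combined threshold of its lower-level nodes. The sketch below encodes that reading with a hypothetical helper; it is not the only rule the text would permit.

```python
def cap_to_lower_capacity(threshold, lower_thresholds):
    """Cap a node's concurrency threshold at the sum of its callees' thresholds."""
    return min(threshold, sum(lower_thresholds.values()))

# Service node A is configured with 200, but B and C can absorb only 50 each.
updated_a = cap_to_lower_capacity(200, {"B": 50, "C": 50})
```

  • A threshold that is already at or below the combined capacity is left unchanged by this helper.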
  • After the target service node is determined in S402, the maximum number of service requests that can run concurrently on the target service node is determined from the analysis statistics, and that maximum number is used as the updated concurrency threshold of the target service node.
  • For example, service E and service F are deployed in cluster form, and both service node C and service node D call the service nodes deploying service E for service processing. The concurrency threshold of service node C is 100, that of service node D is 50, and the ratio of the concurrency thresholds of service node C and service node D is 2:1. The sum of the concurrency thresholds of the cluster deploying service E should therefore be 150, with service node E1, service node E2, and service node E3 each having a concurrency threshold of 50. The cluster deploying service E can then simultaneously process the requests of service node C and service node D, whose concurrency thresholds sum to 150.
  • When one service node of service E fails, the sum of the concurrency thresholds of the cluster deploying service E becomes 100. If service node C and service node D continue to issue service calls according to their currently configured concurrency thresholds, some service requests fail to be processed. The control node can determine from the analysis statistics that the faulty node is a target service node whose concurrency threshold is unreasonable, and can directly determine from the analysis statistics that the maximum number of service requests that can run concurrently on the target is 100. The sum of the updated concurrency thresholds of the service nodes deploying service E is therefore adjusted to 100.
  • Step S402 can then be used to determine that the upper-level and/or lower-level service nodes of the service nodes deploying service E are target service nodes whose concurrency thresholds are unreasonable.
  • For example, service node C can be re-determined as a target service node whose concurrency threshold is unreasonable. The updated concurrency threshold of service node C is then adjusted according to the concurrency threshold of its lower-level service node E and the weight information, so that service node E can meet the call requirements of its upper-level service node. These steps are repeated to adjust the concurrency thresholds of the other service nodes in the concurrency control system in turn, until every service node meets the call requirements of the other service nodes in the distributed service call relationship and no service node reserves redundant resources that would be wasted.
  • When adjusting the concurrency thresholds of service nodes in the concurrency control system, the control node may, within a single adjustment task, first adjust the target service node, then adjust the concurrency thresholds of its upper-level and/or lower-level service nodes in turn, and finally send the updated concurrency thresholds to every service node that needs adjusting. Alternatively, the control node may confirm only one target service node per adjustment task, adjust its concurrency threshold, and then apply the same method to the upper-level or lower-level service nodes of that target service node.
  • After the control node determines the updated concurrency threshold of the target service node,
  • the control node further adjusts the upper-level service node of the target service node according to the service call topology, the updated concurrency threshold of the target service node, and the concurrency thresholds and weight information of the neighboring service nodes of the target service node, to obtain the updated concurrency threshold of the upper-level service node.
  • The principle used to adjust the concurrency thresholds of the upper-level and/or lower-level service nodes of the target service node is the same as the principle used to adjust the concurrency threshold of the target service node itself. Therefore, once the concurrency threshold of the target service node has been adjusted and the concurrency thresholds of its upper-level and/or lower-level service nodes have been adjusted in turn, every service node in the concurrency control system satisfies the calling demand of the other service nodes in the distributed service call relationship, and no service node wastes resources by reserving redundant capacity.
  • The concurrency threshold of service node B is 100.
  • The concurrency threshold of service node C is 50.
  • The concurrency threshold of service node E is 50.
  • The ratio of the concurrency thresholds of service node B and service node C is 1:1.
  • The maximum concurrency of the service node can be 120.
  • Service node C cannot meet the calling demand of service node A, so service node C is first determined to be a target service node whose concurrency threshold is unreasonable.
  • Based on the concurrency threshold of service node C and the concurrency threshold and weight information of its upper-level service node A, the control node determines that the updated concurrency threshold of service node C is 100. Then, according to the service call topology shown in FIG. 2, the updated concurrency threshold (100) of service node C, and the concurrency threshold and weight information of its lower-level service node E, the control node adjusts the concurrency threshold of service node E and obtains an updated concurrency threshold of 100 for service node E, so that service node E satisfies the service call relationship with service node C.
  • The service nodes to be adjusted include any one of the following cases:
  • Case 1: the upper-level service node and the lower-level service node of the target service node whose concurrency threshold is unreasonable.
  • When the control node adjusts the concurrency threshold of service node C, the adjustment may affect the ability of its lower-level service node to process service requests; that is, the concurrency threshold of service node E also needs to be adjusted.
  • The upper-level and lower-level service nodes of the target service node include both the adjacent upper-level and lower-level service nodes that have a call relationship with the target service node, and the non-adjacent upper-level and lower-level service nodes that have a call relationship with it.
  • Suppose each service is deployed on one service node. If service node E is a target service node whose concurrency threshold is unreasonable, the control node determines that the concurrency thresholds of service node C and service node F need to be adjusted. Once those thresholds have been updated, the current concurrency threshold of service node A becomes unreasonable, so the control node also needs to adjust the concurrency threshold of service node A.
  • The control node may determine, according to the concurrency threshold of the target service node and the concurrency threshold and weight information of its upper-level or lower-level service node, that the concurrency threshold of the target service node itself does not need to be adjusted, for example when
  • the maximum number of concurrent calls that the target service node can support is smaller than the concurrency threshold of its upper-level or lower-level service node.
  • In that case, the concurrency threshold of the upper-level or lower-level service node of the target service node needs to be adjusted to ensure the performance of the whole concurrency control system.
  • The control node may also adjust the concurrency threshold of the call chain system by shutting down a service node, thereby reducing resource consumption.
  • Suppose that, when the control node adjusts concurrency thresholds,
  • the sum of the concurrency thresholds of the service nodes deploying service E needs to be adjusted from 150 to 100.
  • Service E is provided jointly by service node E1, service node E2, and service node E3,
  • and each of these service nodes provides a concurrency of 50, so one of them can be shut down.
  • Service node E1, for instance, can be shut down, thereby reducing the resource consumption of service node E1.
  • The control node sends a concurrency threshold adjustment request to the target service node; the request includes the updated concurrency threshold.
  • When the control node determines a new target service node, it sends a concurrency threshold adjustment request containing the updated concurrency threshold to the new target service node. This step is repeated, thereby updating the concurrency thresholds of all service nodes in the concurrency control system.
  • Alternatively, the concurrency thresholds of the upper-level and/or lower-level service nodes of the target service node are adjusted in turn to obtain their updated concurrency thresholds, and the control node then sends each of these service nodes a concurrency threshold adjustment request containing its updated concurrency threshold, so that each service node can adjust its own concurrency threshold. This finally updates the concurrency thresholds of all service nodes in the concurrency control system. By sending the updated concurrency threshold to each target service node, the control node lets each target service node adjust its threshold accordingly, so that the concurrency thresholds across service nodes satisfy the call relationship.
  • The control node obtains the analysis statistics of each service node and determines the target service node from them. It then obtains the concurrency threshold and weight information of the upper-level or lower-level service node of the target service node, and determines the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the analysis statistics, and the concurrency threshold and weight information of the upper-level or lower-level service node. Finally, the control node sends the target service node a concurrency threshold adjustment request carrying the updated concurrency threshold.
  • Because the control node can identify target nodes with unreasonable concurrency thresholds by analyzing the statistics periodically or in real time and can adjust their thresholds, it adjusts the concurrency thresholds of service nodes dynamically so that the updated thresholds satisfy the concurrency capability call relationship. This addresses the problem in the prior art that a call chain-based concurrency control system cannot promptly sense a service node failure or an unreasonable concurrency threshold, which degrades its processing performance. It also safeguards the stability and reliability of the concurrency control system, reduces the latency of service request processing, and improves the overall performance of service request processing.
  • The information processing node collects the logs recorded by each service node in real time, derives the analysis statistics from the logs, and then sends the analysis statistics to the control node.
  • The control node determines the service call topology of the concurrency control system according to the concurrency threshold of each service node, the weight information, and the analysis statistics.
  • FIG. 5 is a schematic diagram of the service call topology of a call chain-based concurrency control system according to an embodiment of the present invention.
  • Service A is provided by service node A; service B by service node B; service E by service node E; service F by service node F; service G is deployed as a cluster comprising service node G1 and service node G2; and service H is deployed as a cluster comprising service node H1 and service node H2.
  • Initially, the concurrency threshold of service node A is set to 80, that of service node B to 40, that of service node E to 80, and that of service node F to 40.
  • The sum of the concurrency thresholds of the cluster deploying service G is 120, with service node G1 and service node G2 at 60 each; the sum of the concurrency thresholds of the cluster deploying service H is 120, with service node H1 and service node H2 at 60 each.
  • If the control node determines from the analysis statistics obtained at some moment that service node G1 has failed, service G is provided only by service node G2; that is, the current concurrent processing capability of service G becomes 60, and the concurrency thresholds of the upper-level and lower-level service nodes of service G need to be adjusted.
  • From the service call topology, the control node obtains the two upstream call chains related to service node G1: service node E -> service node A, and service node F -> service node B. The sum of the concurrency thresholds of service node E and service node F is 120, which exceeds the capability of service G (service node G2).
  • Because the ratio of the concurrency thresholds of service node E and service node F is 2:1, their concurrency thresholds should be adjusted to 40 and 20, respectively.
  • The concurrency threshold of service node A, the upper-level service node of service node E, is 80, which is greater than the updated concurrency threshold (40) of service node E, so it is adjusted as well. Because service node E has only one upper-level service node, with a weight of 1, the concurrency threshold of service node A is kept consistent with that of service node E, that is, 40. In the same way, the concurrency threshold of service node B is adjusted to 20, which completes the adjustment of the upper-level service nodes.
  • Service G also has a downstream call chain to service H. Service H is deployed as a cluster whose concurrency thresholds sum to 120, and this sum can be adjusted to 60.
  • The concurrency thresholds of service node H1 and service node H2 can each be adjusted to 30, or one of the service nodes providing service H can be shut down, for example service node H2, to save service node resources. This completes the adjustment of the concurrency thresholds of the lower-level service nodes of service G.
  • If the control node detects that service node A is a target node with an unreasonable concurrency threshold (for example, the threshold is too small), then because service node A has no upper-level service node, only the concurrency thresholds of its lower-level service nodes need to be adjusted, in the same way as in the example above.
  • If the control node detects that service node H is a target node with an unreasonable concurrency threshold (for example, because of a brief network interruption), then because service node H has no lower-level service node, only the concurrency thresholds of its upper-level calling service nodes need to be adjusted, in the same way as in the example above.
  • The adjustment method above can also be used to adjust the concurrency thresholds of other service nodes.
  • The general principle is that the new concurrency threshold obtained for each service node satisfies the calling capability of its upper-level and lower-level service nodes; that is, the new thresholds make the concurrent calling capabilities of the service nodes match one another without leaving idle, redundant capacity.
  • The adjustment methods above apply to scenarios in which a service node is faulty or a concurrency threshold is unreasonable.
  • Individual service nodes may also be adjusted one at a time using this method until the whole adjustment process is complete.
  • In this way the concurrency thresholds of the service nodes are adjusted so that the updated thresholds satisfy the call relationship and the performance of the call chain system is stabilized.
  • The call chain-based concurrency control method of the embodiments of the present invention is not limited to the scenarios mentioned in the foregoing embodiments, namely a service node failure or a service node whose processing delay exceeds the threshold. Any scenario in which the concurrency threshold of a service node is unreasonable relative to other service nodes, so that the calling capabilities of the service nodes do not match, can use the solution described in the embodiments of the present invention.
  • Examples include a service cluster whose concurrency thresholds sum to too large a value because service nodes were added through capacity expansion, and a cluster whose concurrency thresholds are reduced because service nodes deploying the service were removed through capacity reduction. Both can use the corresponding adjustment scheme described in the embodiments of the present invention.
  • The call chain-based concurrency control method according to the embodiments of the present invention is described in detail above with reference to FIG. 1 to FIG. 5. The following describes the apparatus and the control node provided by the embodiments of the present invention with reference to FIG. 6 and FIG. 7.
  • FIG. 6 is a schematic diagram of an apparatus 600 for call chain-based concurrency control according to an embodiment of the present invention.
  • The apparatus 600 includes an obtaining module 610, a processing module 620, and a sending module 630.
  • The obtaining module 610 is configured to obtain analysis statistics of each of the multiple service nodes;
  • the processing module 620 is configured to determine, according to the analysis statistics of each of the multiple service nodes, a target service node among the multiple service nodes whose concurrency threshold is unreasonable;
  • the obtaining module 610 is further configured to obtain the concurrency threshold of the target service node, and the concurrency thresholds and weight information of the neighboring service nodes of the target service node, where the concurrency threshold identifies the maximum number of times a service node can be called concurrently in the concurrency control system, and the weight information identifies the proportional relationship between the numbers of concurrent calls of the neighboring service nodes of the target service node;
  • the processing module 620 is further configured to determine the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the analysis statistics, and the concurrency thresholds and weight information of the neighboring service nodes of the target service node;
  • the sending module 630 is configured to send a concurrency threshold adjustment request to the target service node, where the concurrency threshold adjustment request includes the updated concurrency threshold.
  • Each module in the apparatus 600 of the embodiment of the present invention may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), or generic array logic (GAL).
  • The apparatus 600 and its modules can also be software modules.
  • The analysis statistics include the result of a service node processing a service request, and the target service node is a service node whose processing result is a failure to process a service request.
  • The analysis statistics include the delay of a service node in processing a service request, and the target service node is a service node whose delay in processing a service request is greater than or equal to a preset threshold.
  • The analysis statistics include the call relationships between a service node and other service nodes; the obtaining module 610 is further configured to obtain the service call topology of the concurrency control system;
  • the processing module 620 is further configured to adjust the target service node according to the concurrency threshold of the target service node, the service call topology, and the concurrency thresholds and weight information of the neighboring service nodes of the target service node, to obtain the updated concurrency threshold.
  • The obtaining module 610 obtains the service call topology of the concurrency control system by deriving it from the call relationships between each service node and the other service nodes recorded in the analysis statistics.
  • The processing module 620 is further configured to adjust the neighboring service nodes of the target service node according to the service call topology, the updated concurrency threshold of the target service node, and the concurrency thresholds and weight information of the neighboring service nodes,
  • to obtain the updated concurrency thresholds of the neighboring service nodes of the target service node;
  • the sending module 630 is further configured to send, to each neighboring service node of the target service node, that neighboring service node's updated concurrency threshold.
  • The apparatus 600 obtains the analysis statistics of each service node, determines the target service node according to the analysis statistics, and obtains the concurrency threshold and weight information of the upper-level or lower-level service node adjacent to the target service node. It determines the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the analysis statistics, and the concurrency threshold and weight information of the upper-level or lower-level service node, and sends the target service node a concurrency threshold adjustment request carrying the updated concurrency threshold.
  • The apparatus 600 thus adjusts the concurrency thresholds of service nodes dynamically so that the updated thresholds satisfy the concurrency capability call relationship. This addresses the problem in the prior art that a call chain-based concurrency control system cannot promptly sense a service node failure or an unreasonable concurrency threshold,
  • which degrades the processing performance of the concurrency control system. It also safeguards the stability and reliability of the concurrency control system, reduces the latency of service request processing, and improves the overall performance of service request processing.
  • Apparatus 600 in accordance with an embodiment of the present invention may correspond to performing the methods described in the embodiments of the present invention, and the above and other operations and/or functions of the various units in apparatus 600 are respectively performed to implement the control node in the method of FIG. The corresponding process, for the sake of brevity, will not be described here.
  • FIG. 7 is a schematic structural diagram of another control node 700 according to an embodiment of the present invention.
  • The control node 700 includes a processor 702, a communication interface 703, a memory 701, and a bus 704.
  • The communication interface 703, the processor 702, and the memory 701 are connected to one another through the bus 704 and communicate with one another over it.
  • The memory 701 is configured to store computer-executable instructions. When the control node 700 is running, the processor 702 executes the computer-executable instructions in the memory 701
  • to perform the following operations using the hardware resources in the control node 700:
  • The processor 702 may be a CPU, or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor or any conventional processor.
  • The memory 701 can include read-only memory and random access memory, and provides instructions and data to the processor 702.
  • A portion of the memory 701 may also include non-volatile random access memory.
  • The memory 701 can also store device type information.
  • The bus 704 may include a power bus, a control bus, a status signal bus, and the like in addition to the data bus. For clarity of description, however, the various buses are all labeled as bus 704 in the figure.
  • The control node 700 may correspond to the control node 120 in the concurrency control system shown in FIG. 1 and to the apparatus 600 shown in FIG. 6, and may correspond to the control node that performs the method of FIG. 4 according to the embodiments of the present invention. The above and other operations and/or functions of the modules in the control node 700 respectively implement the corresponding processes of the methods in FIG. 4 to FIG. 5; for brevity, they are not repeated here.
  • The computer program product includes one or more computer instructions.
  • When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part.
  • The computer can be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device.
  • The computer instructions can be stored in a computer-readable storage medium or transferred from one computer-readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means (e.g., infrared, wireless, microwave).
  • The computer-readable storage medium can be any available medium that the computer can access, or a data storage device, such as a server or data center, that contains one or more available media.
  • The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
  • The disclosed apparatus may be implemented in other ways.
  • The apparatus embodiments described above are merely illustrative.
  • The division into units is only a division by logical function.
  • In an actual implementation there may be other ways of dividing; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, apparatus, or unit, and may be electrical or take another form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.

Abstract

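This application discloses a call chain-based concurrency control method, apparatus, and control node. The method is applied to a call chain-based concurrency control system that includes a control node and multiple service nodes. The method includes: the control node obtains analysis statistics of each of the multiple service nodes; determines, according to the analysis statistics of each service node, a target service node whose concurrency threshold is unreasonable; obtains the concurrency threshold of the target service node, as well as the concurrency thresholds and weight information of the neighboring service nodes of the target service node; determines the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the analysis statistics, and the concurrency thresholds and weight information of the neighboring service nodes of the target service node; and sends a concurrency threshold adjustment request to the target service node. This safeguards the stability and reliability of the system, reduces the latency of service request processing, and improves the performance of the call chain system.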

Description

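Call chain-based concurrency control method, apparatus, and control node
Technical Field
The present invention relates to the field of the Internet and cloud computing, and in particular to a call chain-based concurrency control method, apparatus, and control node.
Background
With the development of the Internet and cloud computing, more and more companies have adopted distributed architectures, splitting applications into microservices so that they can be decoupled and combined more easily to keep up with fast-changing business needs. As services are split ever more finely, a single service request may involve calls to a large number of services. To monitor these calls, many distributed applications use call chain technology: the response time and parameter information of every service call are recorded as log entries along the call chain, and the call chain logs are used to monitor service performance.
For large distributed applications, concurrency control is what guarantees application performance and reliability. The common practice today is for each service node to set its own concurrency threshold independently. However, when a service node failure or a capacity expansion leaves the concurrency threshold of some service node in the call chain system unable to meet the service demand of its upper-level or lower-level service nodes, the performance and reliability of the application can no longer be guaranteed, which affects the reliability and performance of the whole system.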
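Summary of the Invention
This application provides a call chain-based concurrency control method, apparatus, and control node that can adjust the concurrency thresholds of the service nodes in a concurrency control system, so as to safeguard the performance and reliability of the concurrency control system.
In a first aspect, this application provides a call chain-based concurrency control method. The method is applied to a call chain-based concurrency control system that includes a control node and multiple service nodes, where each service node can be used to deploy an application service. The control node adjusts the concurrency thresholds of the service nodes in the concurrency control system as follows. First, the control node obtains the analysis statistics of the service nodes in the concurrency control system and, according to the analysis statistics, determines a target service node whose concurrency threshold is unreasonable; the analysis statistics include, for each service node, the delay in processing service requests, the processing results, and the call relationships with other service nodes. Next, the control node obtains the concurrency threshold of the target service node and the concurrency thresholds and weight information of the neighboring service nodes of the target service node, where the concurrency threshold identifies the maximum number of times a service node can be called concurrently in the concurrency control system, and the weight information identifies the proportional relationship between the concurrency counts of the neighboring service nodes of the target service node. The control node then determines the updated concurrency threshold of the target service node according to the concurrency threshold of the target service node, the analysis statistics, and the concurrency thresholds and weight information of the neighboring service nodes of the target service node. Finally, the control node sends the target service node a concurrency threshold adjustment request carrying the updated concurrency threshold.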
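During service request processing, neighboring service nodes have call relationships with one another. The process in which a service node is called by several neighboring upper-level or lower-level service nodes at the same time is called concurrency; the number of times a service node is being called concurrently by several service nodes is called its concurrency count; and the maximum number of simultaneous calls a service node can accept from its upper-level or lower-level service nodes is called the concurrency threshold of that service node.
A call chain-based concurrency control system uses distributed service call relationships, and a service request passes through the service nodes one after another in a preset order. For any given service node, the service nodes that must finish processing the service request before it are called its upper-level service nodes, and the service nodes that must finish processing the service request after it are called its lower-level service nodes. Upper-level and lower-level service nodes are defined relative to a designated service node, and the correspondence changes as the designated service node changes. The upper-level service nodes include those that must finish processing before the designated service node and have a direct call relationship with it, as well as those that must finish processing before it and have an indirect call relationship with it. The lower-level service nodes include those that must finish processing after the designated service node and have a direct call relationship with it, as well as those that must finish processing after it and have an indirect call relationship with it. For ease of description, upper-level and lower-level service nodes may be collectively referred to as neighboring service nodes.
Note that the weight information may be specified by the user or set to a system default. The weight information is the proportional relationship between the concurrency thresholds of two or more service nodes that are neighbors of the same service node. When a service node has only one upper-level or lower-level service node, the weight of that neighbor is 1. Suppose service node B has only one lower-level service node, service node D; the weight of service node D is then 1, and when the concurrency threshold of service node D equals the concurrency threshold of service node B, service node D can call all the resources of service node B to process service requests.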
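As a hedged illustration of how weight information could be applied, the sketch below divides a capacity among neighboring service nodes in proportion to their weights. The function name `split_by_weight` and the largest-remainder rounding rule are assumptions made for this example; the source only states that the weights express a ratio between concurrency thresholds.

```python
from fractions import Fraction


def split_by_weight(capacity: int, weights: dict[str, int]) -> dict[str, int]:
    """Divide an integer capacity among nodes in proportion to their weights.

    Rounding uses the largest-remainder rule (an assumption of this sketch)
    so that the shares always sum to exactly `capacity`.
    """
    if capacity < 0 or not weights or any(w <= 0 for w in weights.values()):
        raise ValueError("capacity must be >= 0 and weights must be positive")
    total = sum(weights.values())
    exact = {n: Fraction(capacity * w, total) for n, w in weights.items()}
    shares = {n: int(q) for n, q in exact.items()}  # floor of each exact share
    leftover = capacity - sum(shares.values())
    # Hand the remaining units to the nodes with the largest fractional parts.
    by_remainder = sorted(weights, key=lambda n: exact[n] - shares[n], reverse=True)
    for n in by_remainder[:leftover]:
        shares[n] += 1
    return shares


# A single neighbor has weight 1 and receives the whole capacity.
print(split_by_weight(40, {"D": 1}))          # {'D': 40}
# Two callers in a 2:1 ratio sharing a capacity of 60.
print(split_by_weight(60, {"E": 2, "F": 1}))  # {'E': 40, 'F': 20}
```

Exact rational arithmetic is used only to keep the rounding deterministic; any rule that preserves the total would serve the same purpose.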
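Optionally, the analysis statistics may also include the identifier and the number of the service requests processed by each service node.
Note that the control node may obtain the analysis statistics, concurrency thresholds, and weight information periodically according to service needs, or collect this information in real time, so as to better adjust and control the concurrency thresholds of the service nodes in the call chain-based concurrency control system.
In the possible technical solutions provided by this application, the control node can monitor the state of the service nodes and the service calls in the concurrency control system and adjust the concurrency threshold of the target service node, so that the updated concurrency threshold of the target service node satisfies the concurrency capability call relationship. This addresses the problem in the prior art that unreasonable concurrency thresholds degrade the processing performance of a call chain-based concurrency control system. It also safeguards the stability and reliability of the call chain-based concurrency control system, reduces the latency of service request processing, and improves the performance of processing service requests.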
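In a possible implementation of the first aspect, the target service node is a service node that, according to the processing results in the analysis statistics, failed to process a service request. When the control node determines from the processing results in the analysis statistics that a service node has a processing result of failure, it determines that service node to be a target service node whose concurrency threshold is unreasonable. Service nodes with normal concurrency thresholds can process service requests normally, so once the processing of a service request is detected to have failed on some service node, that service node may be faulty; it is therefore determined to be a target service node whose concurrency threshold is unreasonable, which makes the identification of unreasonable service nodes accurate.
Further, a service node may fail to process a service request in any of the following cases:
Case 1: the concurrency threshold of the target service node is set too low, so the target service node cannot satisfy the call relationship of its upper-level or lower-level service nodes.
Case 2: part of the hardware of the target service node is faulty, so the processing capability of the target service node is reduced, for example when some of its CPUs fail.
Case 3: the network of the target service node has a fault, for example a brief network interruption.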
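In another possible implementation of the first aspect, the control node may determine a target service node with an unreasonable concurrency threshold from a single service request in the analysis statistics, or by examining the interruptions of multiple service requests that pass through the same service node. A single service request may be interrupted by a brief network interruption and continue once the network recovers, so analyzing the interruptions of multiple service requests makes the result more accurate.
In another possible implementation of the first aspect, when the control node determines from the analysis statistics that a service node has a service request processing delay greater than or equal to a preset threshold, it can determine that service node to be a target service node whose concurrency threshold is unreasonable. For service nodes with normal concurrency thresholds, the service request delay should be below the preset threshold, so once the processing delay of a service request on some service node is detected to be greater than or equal to the preset threshold, that service node may be faulty (for example, a network fault caused by a failed network interface card); it is therefore determined to be a target service node whose concurrency threshold is unreasonable, which makes the identification of unreasonable service nodes accurate.
In another possible implementation of the first aspect, the control node may determine a target service node with an unreasonable concurrency threshold by examining the delays of multiple service requests that pass through the same service node. For a single service request, a brief network interruption or another self-recovering fault (such as an abnormal service process) may push the processing delay to or above the preset threshold, so analyzing the delays of multiple service requests makes the analysis more accurate. Determining the target service node in these different ways makes the identification of service nodes with unreasonable concurrency flexible.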
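The two detection criteria described above (failed processing results and delays at or above a preset threshold, evaluated over several requests) can be sketched as follows. This is a minimal sketch under stated assumptions: the record fields `node`, `ok`, and `latency_ms`, the `min_requests` parameter, and the sample values are all hypothetical, since the source does not specify a log format or how many requests must agree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestRecord:
    """One analysed log entry; the field names are hypothetical."""
    node: str
    ok: bool           # processing result: True if the request succeeded
    latency_ms: float  # processing delay on this node


def find_target_nodes(records, latency_threshold_ms, min_requests=2):
    """Return nodes flagged by at least `min_requests` bad requests.

    A request is "bad" if it failed or if its delay is greater than or
    equal to the preset threshold. Requiring several bad requests filters
    out one-off events such as a brief network interruption.
    """
    bad = defaultdict(int)
    for r in records:
        if not r.ok or r.latency_ms >= latency_threshold_ms:
            bad[r.node] += 1
    return sorted(node for node, count in bad.items() if count >= min_requests)


records = [
    RequestRecord("C", ok=True, latency_ms=12.0),
    RequestRecord("E", ok=False, latency_ms=5.0),    # failed request
    RequestRecord("E", ok=True, latency_ms=250.0),   # too slow
    RequestRecord("F", ok=True, latency_ms=300.0),   # single slow request only
]
print(find_target_nodes(records, latency_threshold_ms=200.0))  # ['E']
```

Setting `min_requests=1` reproduces the single-request variant, in which node F would be flagged as well.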
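In another possible implementation of the first aspect, the control node may determine the updated concurrency threshold of the target service node as follows. First, the control node obtains the service call topology of the concurrency control system. It then adjusts the target service node according to the concurrency threshold of the target service node, the service call topology, and the concurrency thresholds and weight information of the upper-level or lower-level service nodes, to obtain the updated concurrency threshold. Because the updated concurrency threshold satisfies the preset call relationship within the topology, the stability and reliability of the call chain system are safeguarded, the latency of service request processing is reduced, and the performance of the call chain system is improved.
Note that the service call topology of a given concurrency control system is relatively fixed. After obtaining the service call topology for the first time, the control node can save it and use it as the reference in subsequent concurrency threshold adjustments, which avoids the waste of resources that would result from obtaining the service call topology during every adjustment.
In another possible implementation of the first aspect, the call chain-based concurrency control system further includes an information processing node. The information processing node obtains the analysis statistics of the multiple service nodes and uses the call relationships between each service node and the other service nodes in the analysis statistics to derive the service call topology of the concurrency control system; in this case, the analysis statistics that the control node obtains from the information processing node also include the service call topology. Alternatively, the control node itself may derive the service call topology of the concurrency control system from the call relationships between each service node and the other service nodes in the analysis statistics.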
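Deriving a service call topology from per-node call relationships can be sketched as below. The `(caller, callee)` pair representation and the function names are assumptions of this example, and the sample pairs only loosely follow the FIG. 5 example (A calls E, B calls F, E and F call G, G calls H); the source does not prescribe a data structure.

```python
from collections import defaultdict


def build_topology(call_pairs):
    """Build direct-call adjacency maps from (caller, callee) pairs."""
    downstream = defaultdict(set)  # node -> nodes it calls directly
    upstream = defaultdict(set)    # node -> nodes that call it directly
    for caller, callee in call_pairs:
        downstream[caller].add(callee)
        upstream[callee].add(caller)
    return upstream, downstream


def all_reachable(start, edges):
    """Collect every node reachable from `start`, direct or indirect."""
    seen, stack = set(), [start]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Call pairs as they might be extracted from call chain logs (illustrative).
pairs = [("A", "E"), ("B", "F"), ("E", "G"), ("F", "G"), ("G", "H")]
upstream, downstream = build_topology(pairs)
print(sorted(all_reachable("G", upstream)))    # ['A', 'B', 'E', 'F']
print(sorted(all_reachable("G", downstream)))  # ['H']
```

The traversal returns both directly and indirectly related nodes, matching the definition of upper-level and lower-level service nodes given earlier in the description.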
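Further, the control node can determine target service nodes with unreasonable concurrency thresholds based on the service call topology. Compared with identifying them directly from the analysis statistics, the service call topology gives the control node a view of the service call relationships across the whole concurrency control system, so it can identify target service nodes with unreasonable concurrency thresholds more quickly.
In another possible implementation of the first aspect, after determining the updated concurrency threshold of the target service node, the control node may determine a service node adjacent to the target service node as a new target service node and adjust its concurrency threshold in the same way. Repeating this process eventually adjusts the concurrency thresholds of all service nodes in the whole concurrency control system.
In another possible implementation of the first aspect, after determining the updated concurrency threshold of the target service node, the control node adjusts the neighboring service nodes of the target service node according to the service call topology, the updated concurrency threshold of the target service node, and the concurrency thresholds and weight information of the neighboring service nodes, to obtain the updated concurrency thresholds of those neighboring service nodes. The control node then sends each neighboring service node its updated concurrency threshold, thereby adjusting the concurrency thresholds of the neighboring service nodes of the target service node.
Furthermore, by repeatedly applying this approach to the adjacent upper-level and lower-level service nodes of the target service node, the concurrency thresholds of the service nodes of the whole concurrency control system can eventually be adjusted.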
第二方面,本申请提供一种基于调用链的并发控制的装置,该装置具有实现上述第一方面及第一方面的任意一种可能的实现方式中基于调用链的并发控制方法的各个模块,所述模块可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个与上述功能相对应的模块。
第三方面,本申请提供一种控制节点,该控制节点包括处理器、存储器、通信接口、总线,所述处理器、存储器和通信接口之间通过总线连接并完成相互间的通信,所述存储器中用于存储计算机执行指令,所述控制节点运行时,所述处理器执行所述存储器中的计算机执行指令以利用所述控制节点中的硬件资源执行第一方面及第一方面的任意一种可能的实现方式中基于调用链的并发控制方法的操作步骤。
第四方面,本申请提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行上述各方面所述的方法。
相较于现有技术,本发明实施例的技术方案中,控制节点通过获取服务节点的分析统计数据、并发数阈值和权重信息,可以监控基于调用链的并发控制系统中服务的调用关系,调整服务节点的并发数阈值,使得更新后的并发数阈值能满足并发能力调用关系,以此解决现有技术中基于调用链的并发控制系统中并发数阈值不合理所导致的并发控制系统处理性能下降的问题,保障基于调用链的并发控制系统的稳定与可靠性,降低业务请求处理的时延,提升处理业务请求的整体性能。
本申请在上述各方面提供的实现方式的基础上,还可以进行进一步组合以提供更多实现方式。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面结合附图对本发明的技术方案进行详细介绍。
图1是本发明实施例提供的一种基于调用链的并发控制系统的架构示意图;
图2是本发明实施例提供的一种基于调用链的并发控制系统中分布式服务调用关系示意图;
图3是现有技术中提供的一种基于调用链的并发控制系统中服务节点自身控制并发数阈值示意图;
图4是本发明实施例提供的一种基于调用链的并发控制方法的流程示意图;
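FIG. 5 is a schematic diagram of the service call topology of a call chain-based concurrency control system according to an embodiment of the present invention;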
图6是本发明实施例提供的一种控制节点的结构示意图;
图7是本发明实施例提供的另一种控制节点的结构示意图。
具体实施方式
为了使本技术领域的人员更好地理解本发明方案,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述。
参见图1,图1是本发明实施例提供的一种基于调用链的并发控制系统100的架构示意图,如图1所示,该并发控制系统包括至少一个服务节点110、控制节点120以及信息处理节点130。
服务节点110用于部署应用服务,每个应用服务可以利用一个单独的服务节点110部署,也可以利用多个服务节点110组成的集群部署,其中,应用服务是指运行同一业务应用所需的服务,如淘宝购物应用包括三个应用服务:Web服务、订单服务、支付服务,当用户通过淘宝购物时,需要先通过Web服务选择所需商品;再通过订单服务将所需商品添加到订单中并配置寄送地址;然后通过支付服务完成所需商品的购买。集群形式部署的应用服务中,每个服务节点部署相同的业务应用,不同服务节点之间采用负载均衡策略,以保证应用服务的可靠性和处理效率。
示例地,如图1所示,应用服务A需要利用一个单独的服务节点A部署,应用服务H为高负载的应用服务,需要利用多个服务节点组成的集群部署,服务节点H1、服务节点H2、服务节点H3同时部署应用服务H,其中,服务节点H1、服务节点H2、服务节点H3之间采用负载均衡策略共同承担应用服务H的处理过程,以此保证该应用服务H的可靠性和处理效率。
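Optionally, an application service may also be deployed in distributed form: the application service that processes service requests is split into sub-services deployed on different service nodes, and a heavily loaded sub-service may itself be deployed on a cluster of multiple service nodes, which better decouples and composes the call relationships between application services. For example, an application service may be split into service A, service B, and service C. Each sub-service may be deployed on a single service node, on a cluster of multiple service nodes, or on both at once. During service request processing, adjacent service nodes call one another. The situation in which a service node can be called simultaneously by multiple adjacent upper-level or lower-level service nodes is called concurrency; the number of simultaneous calls a service node receives is called its concurrency number; and the maximum number of simultaneous calls a service node can accept from its upper-level or lower-level service nodes is called its concurrency threshold. For example, suppose the call relationship of an application service is service A -> service B, service A is deployed on a cluster of 3 service nodes, and service B is deployed on a single service node. Then the 3 service nodes deploying service A simultaneously calling the service node deploying service B to process service requests is concurrency; the number of simultaneous calls that the service nodes of service A make to the service node deploying service B at the current moment is the concurrency number of the service node deploying service B; and the maximum number of times the 3 service nodes deploying service A can be called simultaneously by the service node deploying service B at the current moment is the concurrency threshold of the service nodes deploying service A, where the concurrency threshold of service A is the sum of the concurrency thresholds of the service nodes deploying service A.
The service nodes form call chains according to how the application service is processed. One concurrency control system 100 may include at least one call chain, and each call chain identifies the trajectory of application service processing within one business application. For example, FIG. 2 is a schematic diagram of the distributed service call relationships in a call chain-based concurrency control system according to an embodiment of the present invention. As shown in FIG. 2, the concurrency control system includes 2 call chains. Service request 1 must be processed in turn by service A, service B, service D, service E, and service F, so the call chain for service request 1 is service A -> service B -> service D -> service E -> service F. Service request 2 must be processed in turn by service A, service C, service E, and service F, so the call chain for service request 2 is service A -> service C -> service E -> service F. When user service requests are received, different service requests pass through different call chains to complete application service processing.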
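The definitions above can be made concrete with a minimal sketch of how a single service node might enforce its concurrency threshold, and how a clustered service's threshold is the sum of its nodes' thresholds. The class and function names are hypothetical and not part of the original disclosure; a non-blocking semaphore is just one assumed way to reject calls beyond the threshold.

```python
import threading


class ConcurrencyLimiter:
    """Admits at most `threshold` calls at the same time (hypothetical helper)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self._slots = threading.BoundedSemaphore(threshold)

    def try_acquire(self):
        # Non-blocking: returns False once the concurrency threshold is reached.
        return self._slots.acquire(blocking=False)

    def release(self):
        self._slots.release()


def cluster_threshold(node_thresholds):
    """Threshold of a clustered service = sum of its nodes' thresholds."""
    return sum(node_thresholds.values())
```

A caller would pair each successful `try_acquire()` with a `release()` once the call finishes; the number of slots currently held corresponds to the node's concurrency number.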
值得说明的是,对于同一个服务节点而言,每个服务节点可以仅部署一种应用服务或一个应用服务的一个子服务,也可以同时部署多种应用服务或一个应用服务的多个子服务或多个应用服务的多个子服务,本发明不作限制,为便于描述,本发明的以下描述中,以每个服务节点仅部署一个应用服务的一个子服务为例进行具体描述。
控制节点120用于对基于调用链的并发控制系统100中服务节点的并发数阈值进行动态调整。具体地,该控制节点120可以采用单个物理服务器或虚拟机部署,也可以采用合布方式,即控制节点与部署其它应用服务的服务节点共同部署在同一个物理服务器中实现。
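The information processing node 130 collects the logs of each service node 110 and generates analysis statistics from the logged content. The analysis statistics include, for each service node 110, the latency of processing service requests, the processing result (for example, success or failure, that is, whether a service request was interrupted on that service node 110), and the call relationships with other service nodes. Specifically, the information processing node 130 may be deployed on a standalone physical server or virtual machine.
Specifically, based on the distributed call chain relationships shown in FIG. 2, a service node may log the response time, input parameters, and other information for each service request call. The logs follow a unified log-point model, and each service request may customize the logged content, for example TraceId|CalledNodeId|…|Span. TraceId uniquely identifies each service request; it is generated by the starting service node of the application service (also called the root node) and passed along the call chain. CalledNodeId identifies the service nodes that the service request passes through, that is, the call relationships between the current service node and other service nodes, so the topology can be determined by analyzing in order all the service nodes that a given TraceId passes through. For example, the root node of the service request with TraceId 1 is service node A; service node A calls service node B, service node B calls service node D, service node D calls service node E, and service node E calls service node F, which yields the call chain of one service request. If the logs also record another TraceId whose root node is likewise service node A, where service node A calls service node C, service node C calls service node E, and service node E calls service node F, this yields the call chain of another service request. Merging the different call chains then gives the service call topology among the service nodes. Span identifies the latency of the service request at each service node it passes through, from which the processing latency of each service request on each service node can be obtained.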
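As an illustration of how call chains and the service call topology might be recovered from such logs, the following sketch groups log records by TraceId and merges the resulting edges. The record layout `(trace_id, caller, called, span_seconds)` and the latency values are assumptions made for this example; the source only specifies the fields TraceId, CalledNodeId, and Span.

```python
from collections import defaultdict

# Hypothetical log records: (trace_id, caller_node, called_node, span_seconds).
LOGS = [
    (1, "A", "B", 0.01), (1, "B", "D", 0.01), (1, "D", "E", 0.02), (1, "E", "F", 0.01),
    (2, "A", "C", 0.01), (2, "C", "E", 0.02), (2, "E", "F", 0.01),
]


def chains_by_trace(logs):
    """Rebuild each request's call chain, assuming edges are logged in call order."""
    chains = defaultdict(list)
    for trace_id, caller, called, _span in logs:
        if not chains[trace_id]:
            chains[trace_id].append(caller)  # first caller is the root node
        chains[trace_id].append(called)
    return dict(chains)


def build_topology(logs):
    """Merge the edges of all call chains into one caller -> callees map."""
    topology = defaultdict(set)
    for _trace_id, caller, called, _span in logs:
        topology[caller].add(called)
    return {node: sorted(callees) for node, callees in topology.items()}
```

With the two traces above, the merged map reproduces the two call chains of FIG. 2 sharing nodes A, E, and F.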
可选地,当某个子服务以集群形式部署时,在部署子服务的每个服务节点的日志中,还可以记录有该子服务部署形式信息,例如,服务E由服务节点E1、服务节点E2、服务节点E3组成,在服务节点E1、服务节点E2、服务节点E3的日志中均记录该服务以集群形式部署,以及同一集群中其他服务节点标识,其中,服务节点的标识可以为服务节点的名称或IP。
可选地,在服务节点的日志中,还可以根据业务需要记录其它服务调用参数,例如,记录各次业务请求在各服务节点的处理结果。示例地,可以通过添加Result标识,记录每个TraceId在每个服务节点的调用是否成功,例如,通过Result0标识业务请求TraceId的调用成功,通过Result1标识业务请求TraceId的调用失败,从而可得到各个业务请求在各个服务节点上的处理结果。TraceId在用户入口调用的时候生成,然后在部署服务A、服务B、服务C、服务D、服务E、服务F的各服务节点中逐个传递,每个业务请求在处理时都需要传递TraceId,信息处理节点会收集并对服务节点所记录的日志进行处理,根据TraceId统计出一次业务请求的完整调用链信息,根据调用链信息可以很容易分析出业务请求处理过程中的性能情况。表1示出了一种基于调用链的并发控制系统中分布式服务调用关系的分析统计数据的示例,其中,Span AB用于表示业务请求从服务节点A到达服务节点B所经历的时间,也即代表服务节点A的时延,依此可知Span AC,Span BD、Span CE、Span DE、Span EF的含义;假设当TraceId为1时,表示调用链经过服务节点A->服务节点B->服务节点D->服务节点E->服务节点F的一次业务请求;TraceId为2时,表示调用链经过服务节点A->服务节点C->服务节点E->服务节点F的一次业务请求。信息处理节点可以根据其他TraceId确定该TraceId所对应的业务请求所调用的各个服务节点的情况。
更进一步地,信息处理节点在获取各个服务节点在业务请求处理时记录的日志后,可以根据TraceId确定同一个业务请求处理时所经过的所有服务节点,如表1所示数据,信息处理节点可以通过统计同一TraceId所经过的各个服务节点,以及该TraceId所对应的数据(如SpanId),由此确定各服务节点的业务请求个数、业务请求时延、处理结果、和其他服务节点之间的调用关系,并进一步分析各服务节点和其他服务节点之间的调用关系确定并发控制系统中服务调用拓扑关系。
Table 1
TraceId   Span AB    Span AC    Span BD    Span CE    Span DE    Span EF
1         <data 1>   -          <data 2>   -          <data 3>   <data 4>
2         -          <data 5>   -          <data 6>   -          <data 7>
   
可选地,信息处理节点收集服务节点的日志的形式可以是周期性收集,也可以是实时收集。
可选地,分析统计数据中还可以包括每个服务节点处理业务请求的标识、数量信息。
可选地,图1中控制节点120和信息处理节点130也可以合布在一个物理服务器或虚拟机中,该物理服务器或虚拟机不仅用于调整并发控制系统中服务节点的并发数阈值,还用于收集各个服务节点处理业务请求的时延、处理结果和其他服务节点之间的调用关系。
为了对图2所示的基于调用链的并发控制系统中服务节点的并发数阈值进行管理和控制,通常有两种控制方法,一种是在业务请求的调用起始处对部署同一应用服务的各子服务的服务节点110设置一个统一的并发数阈值,例如,通过统一接口或Web界面将各服务节点110的并发数阈值均设置为100,这种方式的问题在于顶层服务无法感知底层众多服务状态,其中,顶层服务是应用服务处理过程中起始的子服务,如服务A;底层服务是应用服务处理过程中与顶层服务存在非直接调用关系的子服务,如服务D、服务E、服务F是服务A的底层服务,但为了防止底层某子服务故障引起整个系统性能下降,预留了较多的资源,导致资源浪费。另一种是部署应用服务的各子服务的服务节点110各自设置不同的并发数阈值,如将自身最大可支持的并发数设置为该服务节点的并发数阈值。参见图3,图3是对应图2所示调用链系统的一种基于调用链的并发控制系统中各服务节点分别控制自身的并发数阈值示意图,如图3所示,服务A由服务节点A提供,服务B由服务节点B提供,服务C由服务节点C提供,服务D由服务节点D提供;服务E和服务F采用集群形式部署,服务E由服务节点E1、服务节点E2、服务节点E3共同部署;服务F由服务节点F1、服务节点F2、服务节点F3共同部署,集群中各个服务节点采用负载均衡策略完成每个调用链的处理过程。由各服务节点根据自身的最大可支持的并发数设置不同的并发数阈值,例如提供服务E的服务节点E1、服务节点E2以及服务节点E3均支持最大50的并发数,那么,提供服务E的集群的最大并发处理能力为150;提供服务F的服务节点F1、服务节点F2以及服务节点F3均最大支持50并发数,那么,提供服务F的集群的最大并发处理能力为150;服务节点C和服务节点D作为服务E的上级服务节点,并发数阈值分别为100和50。
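A call chain-based concurrency control system uses distributed service call relationships, and a service request is processed by the service nodes in a preset order. For any given service node, the service nodes that must finish processing the service request before it are called its upper-level service nodes, and the service nodes that must finish processing the service request after it are called its lower-level service nodes. Upper-level and lower-level are defined relative to a designated service node, so the correspondence changes as the designated service node changes. The upper-level service nodes include both the service node that processes the request earlier and calls the designated node directly, and the other service nodes that process the request earlier but are related to it only indirectly. Likewise, the lower-level service nodes include both the service node that processes the request later and is directly called, and the other service nodes that process the request later but are related only indirectly. For convenience, upper-level and lower-level service nodes are collectively called adjacent service nodes. For example, in the call relationship service node A -> service node B -> service node C -> service node D, service nodes A and B are upper-level service nodes of service node C: service node B processes the request before service node C and has a direct call relationship with it, while service node A processes the request before service node C but has only an indirect call relationship with it; service node D is the lower-level service node of service node C. For service node B, service node A is its upper-level service node, and service nodes C and D are its lower-level service nodes: service node C processes the request after service node B and has a direct call relationship with it, while service node D processes the request after service node B but has only an indirect call relationship with it.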
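The upper-level and lower-level relationships in the A -> B -> C -> D example can be computed mechanically from a call chain. This is a minimal sketch with a hypothetical function name; it assumes the chain is given as an ordered list in which each node appears once.

```python
def upper_and_lower(chain, node):
    """Return (upper-level nodes, lower-level nodes) of `node` in an ordered call chain."""
    i = chain.index(node)
    return chain[:i], chain[i + 1:]


chain = ["A", "B", "C", "D"]
upper, lower = upper_and_lower(chain, "C")
# The last upper-level node and the first lower-level node are the ones
# with a direct call relationship; the rest are related only indirectly.
direct_upper = upper[-1] if upper else None
direct_lower = lower[0] if lower else None
```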
但是利用上述方法对如图3所示的并发控制系统中服务节点的并发数阈值进行控制时,当提供服务E的某个服务节点故障,例如服务节点E1故障,提供服务E的集群的最大并发能力由150变成100,此时,若调用链系统中服务C和服务D对服务E的并发调用的并发数之和大于100,A和B的业务请求调用有可能会失败,但失败是在业务处理调用服务C或服务D时才感知服务E故障,此时,业务请求并未完成处理,服务节点E1的上级服务节点做了无用功,浪费资源。并且提供服务E的服务节点的并发能力之和下降,其下级调用节点的业务压力降低,但服务E的下行链路服务F依然提供3个服务节点规模的预留资源也导致资源浪费。
值得说明的是,当业务请求个数在服务E的并发能力之内时,此时依然能正常调用。或者,调用链系统中局部服务的并发控制可以做的较好,但无法保证整体并发控制效果,例如图3中,在服务A->服务C->服务E->服务F的调用链中,若服务E最大支持150并发数阈值,对于单服务节点而言并发数阈值的设置无问题,而服务节点A最多支撑250并发数,服务节点A的下级服务节点包括服务节点B和服务节点C,而服务节点B和服务节点C的并发数阈值之和为150,此时,服务节点A的并发数阈值大于服务节点B和服务节点C的并发数阈值之和,此时,服务节点A为支持250的并发数阈值会预留相应的资源,但实际处理业务请求时,预留资源存在冗余而导致资源浪费。
针对上述问题,本发明实施例提供了一种基于调用链的并发控制方法,通过由上述控制节点120获取到多个服务节点的分析统计数据确定并发数阈值不合理的目标服务节点后,再由控制节点120对目标服务节点的并发数阈值进行动态调整,从而可以实时保证由各服务节点组成的调用链系统的可靠性。
下面结合附图详细描述本发明的实施例,以便本领域技术人员理解。
参见图4,图4示出了本发明实施例提供的一种基于调用链的并发控制方法的流程示意图,该方法应用于图1所示的基于调用链的并发控制系统。如图4所示,该方法可以包括以下步骤:
S401. The control node obtains the analysis statistics of each of the multiple service nodes.
其中,所述分析统计数据包括基于调用链的并发控制系统中各个服务节点处理业务请求的时延、处理结果和其他服务节点之间的调用关系。其中,处理结果用于标识处理业务请求成败状况,例如成功或失败,当处理结果为失败时,表示业务请求在服务节点处理过程中有中断的情况。该分析统计数据由如图1所示的调用链系统100中的信息处理节点根据其收集的各个服务节点的日志并分析获得。
进一步地,控制节点获取分析统计数据的方式,可以是控制节点向信息处理节点发送获取分析统计数据的请求消息,信息处理节点根据该请求消息将分析统计数据发送给控制节点。其中,控制节点获取分析统计数据可以是周期性操作,也可以是实时操作,本发明不作限制。
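The analysis statistics that the control node obtains from the information processing node may take the form of the latency, processing result, and call relationships with other service nodes for each service node. Optionally, the information processing node may analyze the statistics further to obtain the service call topology among the service nodes; in that case the analysis statistics also include the service call topology, which lets the control node analyze the service calls of the call chain system more directly.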
示例地,信息处理节点对分析统计数据进行进一步分析,通过TraceId追踪到各个调用链所经过的服务节点,然后再组合所有调用链形成各个服务节点之间的调用关系,也即该应用服务的服务调用拓扑关系。若控制节点获取到的是未经过进一步处理的分析统计数据,控制节点也可以通过该方式,进一步得到服务调用拓扑关系。使得处理方式更为灵活。
S402、控制节点根据分析统计数据确定并发数阈值不合理的目标服务节点。
具体地,控制节点根据S401获取的分析统计数据,分析并发数阈值不合理的目标服务节点,其中,控制节点确定并发数阈值不合理的目标服务节点的方式包括以下方式中的至少一种:
方式一:可以在检测到业务请求在某个服务节点失败时,确定该服务节点的并发数阈值不合理。
具体地,控制节点可以通过分析统计数据中的处理结果确定并发数阈值不合理的服务节点。当任一服务节点的分析统计数据中处理结果为失败时,确定该服务节点为并发数阈值不合理的目标服务节点。其中,服务节点可能在以下任意一种情况下出现处理业务请求的结果为失败:
情况一:目标服务节点的并发数阈值设置过小,目标服务节点无法满足上级服务节点或下级服务节点的调用关系。
情况二:目标服务节点出现部分硬件故障,目标服务节点的处理能力下降。例如,目标服务节点中部分CPU故障。
情况三:目标服务节点的网络存在网络故障,例如网络闪断。
可选地,控制节点可以通过检测某一个业务请求在某个服务节点是否失败来确定该服务节点是否为并发数阈值不合理的目标服务节点。
可选地,也可以是通过检测多个具有相同服务节点的业务请求的业务请求失败情况来确定并发数阈值不合理的目标服务节点。可以理解,由于对于单业务请求来说,可能由于网络闪断导致业务请求失败,当网络闪断恢复后业务请求又可以继续处理,故分析多个业务请求的失败情况,将使得分析结果更为准确。
示例地,在本发明的一个可能的实施例中,对于图2所示的调用链系统来说,假设每个服务由一个服务节点部署,控制节点首先获取各个服务节点的分析统计数据,根据每个服务节点的分析统计数据可以确定业务请求的处理过程,当根据分析统计数据获知某个服务节点的下级服务节点无分析统计数据或分析统计数据中处理结果为失败时,例如,控制节点获取的信息中无服务节点E的分析统计数据,此时,可以确定服务节点E出现故障,也即确定服务节点E为并发数阈值不合理的目标服务节点。
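As an example, in another possible embodiment of the present invention, for the call chain-based concurrency control system shown in FIG. 2, the target service node with an unreasonable concurrency threshold may also be determined by analyzing the failures of multiple service requests together. Assume each service is deployed on one service node. As FIG. 2 shows, the concurrency control system includes two call chains, each handling one kind of service request. If service node E fails, the control node can determine from the analysis statistics that the service request passing through service node A -> service node C -> service node E fails at service node E, and can also determine that the service request passing through service node A -> service node B -> service node D -> service node E fails at service node E. Together these identify service node E as the target service node with an unreasonable concurrency threshold more accurately. The method of determining the failing service node within each service request is the same as in the embodiment above and is not repeated here.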
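The multi-request check described above might be sketched as follows. The record layout `(trace_id, node, result)` is an assumption for this example, with `result` following the Result0/Result1 convention mentioned earlier (0 for success, 1 for failure); the `min_failed_traces` parameter is hypothetical and simply filters out nodes that failed in only one request, for instance because of a transient network interruption.

```python
from collections import defaultdict


def failing_nodes(records, min_failed_traces=2):
    """Nodes at which at least `min_failed_traces` distinct requests failed."""
    failed = defaultdict(set)
    for trace_id, node, result in records:
        if result == 1:  # 1 marks a failed call, 0 a successful one
            failed[node].add(trace_id)
    return sorted(n for n, traces in failed.items() if len(traces) >= min_failed_traces)


records = [
    (1, "A", 0), (1, "B", 0), (1, "D", 0), (1, "E", 1),  # chain A->B->D->E fails at E
    (2, "A", 0), (2, "C", 0), (2, "E", 1),               # chain A->C->E fails at E
    (3, "A", 0), (3, "B", 1),                            # isolated failure at B
]
```

Setting `min_failed_traces=1` reduces this to the single-request check of the preceding embodiment.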
值得说明的是,对于某子服务以单服务节点形式部署的情况,当该服务节点整体故障(如服务节点断电导致整体故障)时,调用链无法继续完成处理业务请求的处理过程,控制节点获取的分析统计数据中仅能根据该故障服务节点的上级服务节点确定调用关系,但无法获取到该故障服务节点的分析统计数据,此时,控制节点也可以因无法获取该服务节点的分析统计数据确定该故障节点为目标服务节点,需要由维护人员进一步确认故障原因,并进行故障恢复。
方式二:可以在检测到业务请求在某个服务节点的处理时延超过预设阈值时,确定该服务节点的并发数阈值不合理。
具体地,控制节点可以根据分析统计数据中每个服务节点处理业务请求的时延是否超过预设阈值确定不合理的服务节点。其中,该预设阈值可以为根据经验设置的一个可以标识不合理服务节点的业务请求时延阈值,也可以为根据分析统计数据观察各服务节点的业务请求时延,从而确定的一个可以用于标识不合理服务节点的业务请求时延阈值。
可以理解,对于并发数阈值正常的服务节点来说,各服务节点的业务请求时延小于预设阈值,而一旦检测到某个业务请求在某个服务节点上的业务请求时延大于或等于预设阈值,则说明该服务节点可能发生故障(如网络故障),从而确定该服务节点为并发数阈值不合理的目标服务节点。
方式三:可以是在检测到业务请求在多个服务节点的时延超过预设阈值时,确定该多个服务节点的并发数阈值不合理。
具体地,控制节点也可以通过同一调用链中多个服务节点的时延超过预设阈值确定并发数阈值不合理的目标服务节点。
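Optionally, the control node may also determine the target service node with an unreasonable concurrency threshold by examining the latencies of multiple service requests. For a single service request, a transient network interruption or another self-recoverable fault (for example, an exception in the process providing the service) may push the processing latency to or above the preset threshold, so analyzing the latencies of multiple service requests makes the analysis more accurate.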
示例地,在本发明的一个可能的实施例中,对于图2所示的调用链系统来说,假设每个服务由一个服务节点部署,预设阈值为0.05s,若某个业务请求的处理过程需要经过服务节点A、服务节点B、服务节点E,而通过分析统计数据检测到通过服务节点A、服务节点B的业务请求时延为0.01s,而通过服务节点E的业务请求的处理时延为0.1s,服务节点E处理业务请求的时延大于预设阈值,此时可确定服务节点E为并发数阈值不合理的目标服务节点。
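The latency check in this example can be sketched as a simple filter over per-node latencies. The record layout `(trace_id, node, latency_seconds)` and the `min_slow_requests` parameter are assumptions for illustration; the 0.05 s threshold and the 0.01 s / 0.1 s latencies come from the example above.

```python
from collections import Counter


def slow_nodes(records, threshold_s=0.05, min_slow_requests=1):
    """Nodes whose latency reached the preset threshold in enough requests."""
    slow = Counter()
    for _trace_id, node, latency_s in records:
        if latency_s >= threshold_s:  # "greater than or equal to" the preset threshold
            slow[node] += 1
    return sorted(n for n, count in slow.items() if count >= min_slow_requests)


records = [(1, "A", 0.01), (1, "B", 0.01), (1, "E", 0.1)]
```

Raising `min_slow_requests` corresponds to judging a node by several service requests rather than one, which the text notes is more robust against transient faults.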
可选地,在本发明的一个可能的实施例中,该并发数阈值不合理的目标服务节点可以为并发数阈值初始设置不合理,例如并发数阈值初始设置过大,或并发数阈值初始设置过小。
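Optionally, in another possible embodiment of the present invention, the target service node may have had a reasonable initial concurrency threshold, but while the concurrency control system was running, the control node adjusted a faulty service node, and that adjustment left the concurrency threshold of the faulty node's upper-level or lower-level service node unreasonable.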
值得说明的是,控制节点将同时使用上述三种方式监控并发控制系统中是否存在并发数阈值不合理的目标服务节点,以在服务节点的并发数阈值初始设置不合理、服务节点出现故障,或者控制节点对某个服务节点的并发数阈值进行调整后导致其上级服务节点或下级服务节点的并发数阈值不合理的情况下,及时确定并发数阈值不合理的目标服务节点,并对该目标服务节点的并发数阈值进行调整,以保证并发控制系统的性能。
S403、控制节点获取目标服务节点的并发数阈值,以及目标服务节点的相邻服务节点的并发数阈值和权重信息。
其中,并发数阈值是指该服务节点在基于调用链的并发控制系统中能够被并发调用的最大值。所述权重信息用于标识所述相邻服务节点调用所述目标服务节点的并发数的比例关系,其中目标服务节点的相邻服务节点包括目标服务节点的上级服务节点和下级服务节点,那么,权重信息可以用于标识调用链中同时调用同一服务节点的下级服务节点之间并发数阈值的比例关系,或同时调用该服务节点的上级服务节点之间并发数阈值的比例关系。参见图3,对于服务节点A来说,存在2个下级服务节点,即服务节点B和服务节点C,假设服务节点B和服务节点C的并发数阈值比例为2:3,且服务节点A的并发数阈值为100,则可以确定服务节点B和服务节点C的并发数阈值分别为40和60时,此时,服务节点A、服务节点B和服务节点C之间的调用关系满足彼此在业务请求处理时最优配比,服务节点间的资源可以得到充分利用,服务节点A的并发数阈值能够满足服务节点B和服务节点C在业务处理过程中最大处理能力。或者,权重信息也是指同时调用某个服务节点的多个上级服务节点之间的并发数阈值的比例关系,当某个服务节点的并发数阈值确定后,可以根据该权重信息确定该服务节点的上级服务节点的并发数阈值以及下级服务节点的并发数阈值,其并发数阈值确定情况与上述服务节点A存在服务节点B和服务节点C的处理过程相同,在此不再赘述。
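It is worth noting that the weight information may be specified by the user or set to a system default. Weight information is the ratio between the concurrency thresholds of two or more upper-level (or lower-level) service nodes of the same service node; when a service node has only one upper-level or lower-level service node, the weight is 1. In FIG. 2, for example, service node B has only one lower-level service node D, so the weight of service node D is 1. In that case, when the concurrency threshold of service node D equals that of service node B, service node D can use all the resources of service node B to process service requests.
Optionally, the control node may obtain the concurrency threshold and weight information from each service node in real time or periodically, so that it can monitor and adjust the service calls of the service nodes based on the latest values.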
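The weight-based split described for service node A and its lower-level service nodes B and C can be written as a short helper. The function name is hypothetical, and rounding down to whole calls is an assumption, since the source does not say how fractional shares are handled.

```python
def split_by_weight(total_threshold, weights):
    """Distribute a concurrency threshold among nodes in proportion to their weights."""
    weight_sum = sum(weights.values())
    # Integer division: rounding down is an assumption, not stated in the source.
    return {node: total_threshold * w // weight_sum for node, w in weights.items()}


# Service node A has threshold 100; B and C share it at a ratio of 2:3.
shares = split_by_weight(100, {"B": 2, "C": 3})
```

With a single neighbor of weight 1, the helper returns the full threshold, matching the B and D case in FIG. 2.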
具体地,可以由各服务节点直接将并发数阈值和权重信息发送给控制节点,例如,在各个服务节点上安装客户端代理模块,由客户端代理模块将各服务节点的并发数阈值和权重信息发送至控制节点。
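S404. The control node determines the updated concurrency threshold of the target service node based on the concurrency threshold of the target service node, the analysis statistics, and the concurrency thresholds and weight information of the adjacent service nodes of the target service node.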
具体地,控制节点在确定并发数阈值不合理的目标服务节点后,首先基于分析统计数据中每个服务节点和其它服务节点之间的调用关系分析得到并发控制系统中各服务节点之间的服务调用拓扑关系。然后控制节点再根据该目标服务节点的并发数阈值,该服务调用拓扑关系,以及S403中目标服务节点的相邻服务节点的并发数阈值和权重信息,确定目标服务节点更新后的并发数阈值,更新后的并发数阈值使得调用链中各服务节点调用时,在服务调用拓扑关系中,满足其上级服务节点和/或下级服务节点的调用需求,并且也不存在服务节点预留多余资源导致资源浪费的问题。
可选地,控制节点获取的分析统计数据中也可以包括服务调用拓扑关系。
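Specifically, the service call topology may be determined by the information processing node from the service node logs and then sent to the control node, or it may be determined by the control node directly from the call relationships between each service node and other service nodes in the analysis statistics it has obtained, which makes the processing more flexible.
For a given concurrency control system, the service call topology is relatively fixed. After obtaining the service call topology for the first time, the control node may save it and use it as the reference in subsequent concurrency threshold adjustments of the service nodes in the call chain-based concurrency control system, avoiding the resource waste of obtaining the topology anew for every adjustment.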
更进一步地,当调整完目标服务节点后,若并发控制系统中还存在其它并发数阈值不合理的目标服务节点,可以使用以下两种方式中的任意一种来实现对并发控制系统中其他服务节点的并发数阈值进行调整。
方式一:可继续使用步骤S402重新确定该目标服务节点的上级服务节点和/或下级服务节点为新的目标服务节点,并使用同样的方式对重新确定的新的目标服务节点的并发数阈值进行调整。重复该过程,以完成对整个并发控制系统中所有服务节点的并发数阈值的调整。
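As an example, in one possible embodiment of the present invention, the current concurrency threshold of the target service node is too small, for instance because it was initially set too small. Referring to FIG. 2, suppose each service is deployed on one service node, the concurrency thresholds of service nodes A, B, and C are all set to 50, and the ratio between the concurrency thresholds of service nodes B and C is 1:1. The sum of the concurrency thresholds of service nodes B and C then exceeds the concurrency threshold of service node A, so after obtaining the analysis statistics of the service nodes, the control node can determine that service node A is the target service node with an unreasonable concurrency threshold. If the maximum concurrency threshold that service node A can support is greater than or equal to 100, the control node directly adjusts the concurrency threshold of service node A to 100; if the maximum that service node A can support is 50, the control node instead adjusts the concurrency thresholds of service nodes B and C to 25 each. This completes one adjustment of the target service node's concurrency threshold. If the concurrency threshold of service node A is 100, that of service node B is 50, that of service node C is 50, and that of service node D is 100, then applying step S402 again identifies service node D as the target service node with an unreasonable concurrency threshold. The concurrency threshold of service node D is then adjusted to 50 as well, according to the threshold of 50 of its upper-level service node B, so that service nodes D and B satisfy each other's call requirements in the distributed service call relationships shown in FIG. 2. If other target service nodes with unreasonable concurrency thresholds remain in the concurrency control system, their thresholds can be adjusted in the same way until every service node satisfies the call requirements of the others in the distributed service call relationships and no service node reserves surplus resources that would be wasted.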
示例地,在本发明的另一个可能的实施例中,当目标服务节点为并发数阈值设置过大时,例如,目标服务节点为初始设置并发阈值过大导致并发数阈值不合理。假设图2中服务节点A的并发数阈值为200,服务节点B和服务节点C的并发数阈值分别为50,服务节点B和服务节点C的并发数阈值的比例关系为1:1,此时,服务节点B和服务节点C的并发数阈值之和为100,即服务节点B和服务节点C最大仅能处理服务节点A同时下发的100个服务调用关系,控制节点在获取各个服务节点的分析统计数据后,可以确定服务节点A的并发数阈值设置过大,此时,控制节点根据服务节点A的并发数阈值、服务节点A的下级服务节点的并发数阈值和权重信息,将服务节点A的并发数阈值调整为100。同样地,可继续使用同样的方式对该并发控制系统的其它服务节点的并发数阈值进行调整。
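The two examples above (threshold too small and threshold too large) follow one rule: align service node A with the total demand of its lower-level service nodes when A can support it, and otherwise scale the lower-level nodes down to A's capacity by weight. The sketch below is one assumed reading of that rule; the function name, the `parent_max` parameter (the maximum concurrency the upper-level node can support), and the round-down behavior are hypothetical.

```python
def reconcile(parent_max, child_thresholds, weights):
    """Return (new parent threshold, new child thresholds) for one parent node."""
    demand = sum(child_thresholds.values())
    if demand <= parent_max:
        # The parent can match its children exactly: no shortfall, no surplus.
        return demand, dict(child_thresholds)
    # The parent cannot support the demand: shrink the children by weight instead.
    weight_sum = sum(weights.values())
    scaled = {n: parent_max * weights[n] // weight_sum for n in child_thresholds}
    return parent_max, scaled


children = {"B": 50, "C": 50}
ratio = {"B": 1, "C": 1}
```

For A initially at 200 or at 50 with enough capacity, the result is A = 100 with B and C unchanged; for A capped at 50, B and C become 25 each.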
值得说明的是,当服务由多个服务节点组成的集群部署,且其中一个服务节点发生故障时,提供该服务的多个服务节点的并发数阈值之和发生改变,从而导致无法与该服务的上级服务节点和/或下级服务节点的并发数阈值匹配,影响调用链系统的整体性能,首先由S402确定出来该目标服务节点,然后再根据分析统计数据得到该目标服务节点上可同时并发运行的最大业务请求个数,即可确定该最大业务请求个数为该目标服务节点更新后的并发数阈值。
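As an example, in yet another possible embodiment of the present invention, referring to FIG. 3, service E and service F are deployed as clusters, and service nodes C and D both call the service nodes deploying service E to process requests. Suppose that at the current moment the concurrency threshold of service node C is 100, that of service node D is 50, and the ratio between them is 2:1. The total concurrency threshold of the cluster deploying service E should then be 150, with service nodes E1, E2, and E3 each having a concurrency threshold of 50, so the cluster can simultaneously handle requests from service nodes C and D whose concurrency thresholds total 150. When one service node deploying service E fails, the total concurrency threshold of the cluster drops to 100. If service nodes C and D keep issuing calls under their currently configured concurrency thresholds, some service requests will fail. The control node can then determine from the analysis statistics that the faulty node is the unreasonable target service node, determine directly from the analysis statistics that the maximum number of service requests that can run concurrently is 100, and therefore adjust the total updated concurrency threshold of the service nodes deploying service E to 100. After this adjustment, the concurrency thresholds of the upper-level and lower-level service nodes of service E no longer match those of the service nodes deploying service E, so step S402 can be applied again to identify the upper-level and/or lower-level service nodes of service E as target service nodes with unreasonable concurrency thresholds. For example, service node C may be identified as such a target service node, and its updated concurrency threshold is then adjusted to 50 according to the concurrency threshold of its lower-level service node E and the weight data, so that service node E can satisfy the call requirements of its upper-level service nodes. Repeating these steps adjusts the concurrency thresholds of the other service nodes in the concurrency control system in turn, until every service node satisfies the call requirements of the others in the distributed service call relationships and no service node reserves surplus resources that would be wasted.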
方式二:也可以在调整完目标服务节点后,再分别对该目标服务节点的上级服务节点和/或下级服务节点进行调整,以完成对并发控制系统中所有服务节点的并发数阈值的调整。具体调整方式如下:
控制节点对并发控制系统中服务节点的并发数阈值调整过程,可以是通过一次调整任务,先确认一个目标服务节点后,然后依次调整目标服务节点的上级服务节点和/或下级服务节点的并发数阈值,最后再将更新后的并发数阈值发送给需要调整的服务节点。也可以是每次调整任务仅确认一个目标服务节点,调整完该目标服务节点的并发数阈值后,再利用相同的方法,对该目标服务节点的上级服务节点或下级服务节点的并发数阈值进行调整。
对于目标服务节点的上级服务节点来说,在控制节点确定目标服务节点更新后的并发数阈值后,控制节点再根据服务调用拓扑关系、所述目标服务节点更新后的并发数阈值、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息对所述目标服务节点的上级服务节点进行调整以得到所述上级服务节点更新后的并发数阈值。
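It is worth noting that the principle used to adjust the concurrency thresholds of the upper-level and/or lower-level service nodes of the target service node in this way is the same as the principle used to adjust the target service node itself. After the target service node's concurrency threshold is adjusted and the thresholds of its upper-level and/or lower-level service nodes are adjusted in turn, every service node in the concurrency control system satisfies the call requirements of the others in the distributed service call relationships, and no service node reserves surplus resources that would be wasted.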
示例地,在本发明的一个可能的实施例中,参见图2,若每个服务利用一个服务节点部署,且设置服务节点A的并发数阈值为200,服务节点B的并发数阈值为100,服务节点C的并发数阈值为50,服务节点E的并发数阈值为50,服务节点B和服务节点C的并发数阈值的比例关系为1:1,且服务节点最大可支持120的并发数阈值,此时,服务节点C无法满足服务节点A的调用需求,所以首先可确定服务节点C为并发数阈值不合理的目标服务节点,控制节点根据服务节点C的并发数阈值、上级服务节点A的并发数阈值以及权重信息,确定服务节点C更新后的并发数阈值为100,接下来再根据图2所示的服务调用拓扑关系,服务节点C的更新后的并发数阈值100,以及服务节点C的下级服务节点E的并发数阈值和权重信息,对服务节点E的并发数阈值进行调整,得到服务节点E的更新后的并发数阈值为100,以满足服务节点C的服务调用关系。
控制节点在确定并发数阈值不合理的目标服务节点后,利用方式二依次对该目标服务节点的上级服务节点以及下级服务节点进行调整时,待调整的服务节点包括以下情况中的任意一种:
情况一:并发数阈值不合理的目标服务节点的上级服务节点和下级服务节点。
例如,如图2所示,若服务节点C为并发数阈值不合理的目标服务节点,控制节点在调整服务节点C的并发数阈值时,会影响其下级服务节点处理业务请求的能力,即服务节点E的并发数阈值也需要进行调整。
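Here, the upper-level and lower-level service nodes of the target service node with an unreasonable concurrency threshold include both those that have a call relationship with the target service node and are adjacent to it, and those that have a call relationship with the target service node but are not adjacent to it.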
例如,如图2所示,假设每个服务由一个服务节点部署,若服务节点E是并发数阈值不合理的目标服务节点,控制节点确定服务节点C和服务节点F需要调整并发数阈值后,服务节点C和服务节点F更新后的并发数阈值导致服务节点A的当前并发数阈值不合理,此时,控制节点还需要调整服务节点A的并发数阈值。
情况二:并发数阈值不合理的目标服务节点的上级服务节点或下级服务节点。
控制节点确定目标服务节点后,根据目标服务节点的并发数阈值、目标服务节点的上级服务节点或下级服务节点的并发数阈值和权重信息,确定目标服务节点的并发数阈值不需要调整,例如,目标服务节点最大可支持的并发调用个数小于其上级服务节点或下级服务节点的并发数阈值,需要对目标服务节点的上级服务节点或下级服务节点的并发数阈值进行调整,以保证整个并发控制系统的性能。
可选地,控制节点在调整并发数阈值的过程中,也可以通过关闭服务节点的方式调整调用链系统的并发数阈值,以此降低资源损耗。
示例地,如图3所示,若控制节点在调整并发数阈值的过程中,若部署服务E的服务节点的并发数阈值需要从150调整到100,而服务E由服务节点E1、服务节点E2以及服务节点E3共同提供,各服务节点分别提供50并发数,所以此时可以关闭其中一个服务节点,例如,可以关闭服务节点E1,从而降低了服务节点E1的资源损耗。
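The idea of lowering a cluster's total threshold by shutting down nodes can be sketched as follows. The function name is hypothetical, and the choice of which node to close is arbitrary here (the text picks E1 only as an example); the sketch simply keeps nodes in listed order until the required total is covered.

```python
def nodes_to_close(node_thresholds, required_total):
    """List cluster nodes that can be shut down while still covering `required_total`."""
    kept_capacity = 0
    closable = []
    for node, threshold in node_thresholds.items():
        if kept_capacity >= required_total:
            closable.append(node)  # remaining capacity is already sufficient
        else:
            kept_capacity += threshold
    return closable


cluster_e = {"E1": 50, "E2": 50, "E3": 50}
```

For service E going from 150 to 100 with three nodes of 50 each, exactly one node can be closed.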
S405、所述控制节点向所述目标服务节点发送并发数阈值调整请求,并发数阈值调整请求包括更新后的并发数阈值。
可选地,在本发明的一个可能的实施例中,当控制节点再次将目标服务节点的上级服务节点或下级服务节点确定为新的目标服务节点后,控制节点在确定完该新的目标服务节点的更新后的并发数阈值后,向该新的目标服务节点发送并发数阈值调整请求,该并发数阈值调整请求中包括该新的目标服务节点的更新后的并发数阈值,重复该步骤,从而实现对并发控制系统中所有服务节点的并发数阈值的更新。
可选地,在本发明的另一个可能的实施例中,当控制节点完成对目标服务节点的并发数阈值的调整后,再依次对该目标服务节点的上级服务节点和/或下级服务节点的并发数阈值的调整,以得到该目标服务节点的上级服务节点和/或下级服务节点的更新后的并发数阈值,然后控制节点再向该目标节点的上级服务节点和/或下级服务节点发送并发数阈值调整请求,该并发数阈值调整请求中包括与上级服务节点和/或下级服务节点对应的更新后的并发数阈值,将使各服务节点能对自身的并发数阈值进行调整,从而最终实现对并发控制系统中所有服务节点的并发数阈值的更新。可以理解,通过将更新后的并发数阈值发送至各目标服务节点,使各目标服务节点可以根据更新后的并发数阈值进行并发数阈值调整,使得各服务节点之间的并发数阈值满足调用关系。
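A concurrency threshold adjustment request only needs to carry the updated threshold for its recipient. The source does not define a message format, so the JSON encoding and the field names `node` and `new_threshold` below are purely illustrative assumptions.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ThresholdAdjustRequest:
    node: str           # identifier of the service node to adjust (hypothetical field)
    new_threshold: int  # updated concurrency threshold (hypothetical field)


def build_adjust_requests(updated_thresholds):
    """Serialize one adjustment request per service node whose threshold changed."""
    return [
        json.dumps(asdict(ThresholdAdjustRequest(node, threshold)))
        for node, threshold in updated_thresholds.items()
    ]


messages = build_adjust_requests({"E": 100, "C": 50})
```

Each receiving service node would then apply the threshold it finds in its own message.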
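As described above, the control node obtains the analysis statistics of each service node and determines the target service node from them. It then obtains the concurrency thresholds and weight information of the upper-level or lower-level service nodes of the target service node, and determines the updated concurrency threshold of the target service node based on the target service node's concurrency threshold, the analysis statistics, and the concurrency thresholds and weight information of the upper-level or lower-level service nodes. Finally, the control node sends the target service node a concurrency threshold adjustment request carrying the updated concurrency threshold. Because the control node can identify target nodes with unreasonable concurrency thresholds from the analysis statistics periodically or in real time and adjust their thresholds, it can adjust the concurrency thresholds of service nodes dynamically so that the updated thresholds satisfy the concurrency call relationships. This addresses the problem in the prior art that a call chain-based concurrency control system cannot promptly detect service node faults or unreasonable concurrency thresholds and therefore suffers degraded processing performance; it ensures the stability and reliability of the concurrency control system, reduces service request latency, and improves the overall performance of service request processing.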
下面以具体的示例来说明控制节点对各服务节点的并发数阈值进行调整的流程,详述如下。
在本发明的一个可能的实施例中,若信息处理节点实时采集各服务节点所记录的日志,并通过日志确定分析统计数据,然后再将该分析统计数据发送给控制节点。控制节点根据各服务节点的并发数阈值、权重数据以及分析统计数据确定并发控制系统的服务调用拓扑关系如图5所示,图5示出了本发明实施例提供的一种基于调用链的并发控制系统的服务调用拓扑关系示意图。其中,服务A由服务节点A提供;服务B由服务节点B提供;服务E由服务节点E提供;服务F由服务节点F提供;服务G以集群形式部署,包括服务节点G1和服务节点G2;服务H以集群形式部署,包括服务节点H1和服务节点H2。初始设置服务节点A的并发数阈值为80;服务节点B的并发数阈值为40;服务节点E的并发数阈值为80;服务节点F的并发数阈值为40;部署服务G的集群的并发数阈值之和为120,其中,服务节点G1和服务节点G2的并发数阈值分别为60;部署服务H的集群的并发数阈值之和为120,其中,服务节点H1和服务节点H2的并发数阈值分别为60,此时各服务节点之间调用时,在图5所示的服务调用拓扑关系中满足调用链中各个服务节点的调用需求。
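In one possible embodiment of the present invention, suppose the analysis statistics obtained by the control node at some moment show that service node G1 has failed entirely. Service G is then provided by service node G2 alone, so the current concurrent processing capacity of service G becomes 60, and the concurrency thresholds of its upper-level and lower-level service nodes must be adjusted accordingly. From the service call topology, the control node finds 2 upstream call chains involving service node G1: service node E -> service node A, and service node F -> service node B. The concurrency thresholds of service nodes E and F total 120, which exceeds the capacity of service G (service node G2). Given the 2:1 ratio between the concurrency thresholds of service nodes E and F, their thresholds should be adjusted to 40 and 20 respectively. The concurrency threshold of service node A, the upper-level service node of service node E, is 80, which is greater than the new threshold of 40 for service node E, so it must be adjusted too; because service node E has only one upper-level service node, the weight is 1, and the concurrency threshold of service node A is set equal to that of service node E. In the same way, the concurrency threshold of service node B is adjusted to 20, which completes the adjustment of the upper-level service nodes. Downstream, service G has one lower-level service, service H, which is deployed as a cluster with a total concurrency threshold of 120. The total concurrency threshold of service H can be adjusted to 60, either by adjusting the concurrency thresholds of service nodes H1 and H2 to 30 each, or by shutting down one of the service nodes providing service H, for example service node H2, which saves service node resources. This completes the adjustment of the lower-level service nodes of service G.
In another possible embodiment of the present invention, if the control node detects that service node A is the target node with an unreasonable concurrency threshold (for example, the threshold is set too small), then because service node A has no upper-level service node, only the concurrency thresholds of the lower-level service nodes of service node A need to be adjusted; the adjustment is carried out in the same way as in the example above.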
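The arithmetic of this FIG. 5 example can be replayed in a few lines. The sketch hard-codes the topology described in the text (A calls E, B calls F, E and F call service G, and service G calls service H) and the stated 2:1 weight between E and F; the function names and the even split across H1 and H2 are assumptions for illustration.

```python
def split_by_weight(total, weights):
    """Distribute `total` among nodes in proportion to their weights (rounded down)."""
    weight_sum = sum(weights.values())
    return {node: total * w // weight_sum for node, w in weights.items()}


def rebalance_after_failure(g_thresholds, failed_node):
    """Recompute thresholds around service G after one of its nodes fails."""
    remaining = sum(t for n, t in g_thresholds.items() if n != failed_node)
    updated = split_by_weight(remaining, {"E": 2, "F": 1})  # callers of G, ratio 2:1
    updated["A"] = updated["E"]  # A is the only caller of E, so weight 1
    updated["B"] = updated["F"]  # B is the only caller of F, so weight 1
    updated.update(split_by_weight(remaining, {"H1": 1, "H2": 1}))  # service H cluster
    return updated


result = rebalance_after_failure({"G1": 60, "G2": 60}, "G1")
```

The result matches the values in the text: E 40, F 20, A 40, B 20, and H1 and H2 at 30 each. Shutting down H2 instead would leave H1 with the full 60.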
在本发明的另一个可能的实施例中,若控制节点检测到服务节点H为并发数阈值不合理的目标节点(如存在网络闪断),由于服务节点H不存在下级调用服务节点,所以此时只需要对H的上级调用服务节点的并发数阈值进行调整,具体调整方式与上述示例相同。
值得说明的是,对于任一服务调用拓扑关系中的任意一个服务节点的并发数阈值设置不合理时,均可以采用上述调整方式对其它服务节点的并发数阈值进行调整,总的原则是使得调整后得到的各服务节点的新的并发数阈值能满足该服务节点的上级服务节点和下级服务节点的调用能力,即新的并发数阈值使得各服务节点的并发调用能力匹配,并且不会空闲多余的并发数调用能力。对于服务节点故障以及并发数阈值设置不合理的场景下,上述调整方式均适合。
更进一步地,当多个服务节点的并发数阈值设置不合理时,可以利用上述调整方式依次分别对单个服务节点进行调整以完成整个调整过程。
可以理解,通过上述方式,即可实现对服务节点的并发数阈值进行调整,从而使得更新后的并发数阈值能满足调用关系,保障调用链系统的性能稳定。
需要说明,本发明实施例中所提及的基于调用链的并发控制方法不限于上述实施例中所提及的服务节点故障以及服务节点的处理时延超过阈值的场景,对于其它服务节点的并发数阈值不合理的场景导致的各服务节点的调用能力不匹配的场景均可适用于本发明实施例所提及的方案,例如,对于采用集群形式部署的服务,通过扩容的方式增加部署该服务的服务节点所导致的提供该服务集群的并发数阈值之和过大,以及通过减容的方式减少部署该服务的服务节点所导致的提供该服务的集群的并发数阈值之和过小的场景,均可适用于本发明实施例中所提及的相应调整方案。
上文中结合图1至图5,详细描述了根据本发明实施例所提供的一种基于调用链的并发控制的方法,下面将结合图6至图7,描述根据本发明实施例所提供的基于调用链的并发控制的装置和控制节点。
参见图6,图6是本发明实施例提供的一种基于调用链的并发控制的装置600示意图,该装置600包括获取模块610、处理模块620和发送模块630;
所述获取模块610,用于获取所述多个服务节点中每个服务节点的分析统计数据;
所述处理模块620,用于根据所述多个服务节点中每个服务节点的分析统计数据确定所述多个服务节点中并发数阈值不合理的目标服务节点;
所述获取模块610,还用于获取所述目标服务节点的并发数阈值,以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息,其中,所述并发数阈值用于标识服务节点在所述并发控制系统中能够被并发调用的最大值;所述权重信息用于标识所述相邻服务节点调用所述目标服务节点的并发数的比例关系;
所述处理模块620,还用于根据所述目标服务节点的并发数阈值、所述分析统计数据、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息确定所述目标服务节点更新后的并发数阈值;
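The sending module 630 is configured to send a concurrency threshold adjustment request to the target service node, where the concurrency threshold adjustment request includes the updated concurrency threshold.
It should be understood that each module of the apparatus 600 in this embodiment of the present invention may be implemented by an application-specific integrated circuit (ASIC) or by a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the call chain-based concurrency control method shown in FIG. 4 to FIG. 5 is implemented in software, the apparatus 600 and its modules may also be software modules.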
可选地,所述分析统计数据包括所述服务节点处理业务请求的处理结果,所述目标服务节点为所述处理结果为处理业务请求失败的服务节点。
可选地,所述分析统计数据包括所述服务节点处理业务请求的时延,所述目标服务节点为所述处理业务请求的时延大于或等于预设阈值的服务节点。
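Optionally, the analysis statistics include the call relationships between the service node and other service nodes; the obtaining module 610 is further configured to obtain the service call topology of the concurrency control system;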
所述处理模块620,还用于根据所述目标服务节点的并发数阈值,所述服务调用拓扑关系、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息对所述目标服务节点进行调整以得到更新后的并发数阈值。
可选地,所述获取模块610获取所述并发控制系统中服务调用拓扑关系,包括基于所述分析统计数据中每个服务节点和其他服务节点之间的调用关系获得所述并发控制系统的服务调用拓扑关系。
可选地,所述处理模块620,还用于根据服务调用拓扑关系、所述目标服务节点更新后的并发数阈值、以及所述相邻服务节点的并发数阈值和权重信息对所述目标服务节点的相邻服务节点进行调整以得到所述目标服务节点的相邻服务节点更新后的并发数阈值;
所述发送模块630,还用于向所述目标服务节点的相邻服务节点发送所述目标服务节点的相邻服务节点更新后的并发数阈值。
通过上述内容的描述,装置600获取各个服务节点的分析统计数据并根据分析统计数据确定目标服务节点,再获取目标服务节点相邻的上级服务节点或下级服务节点的并发数阈值和权重信息后,根据目标服务节点的并发数阈值、分析统计数据,以及上级服务节点或下级服务节点的并发数阈值和权重信息确定目标服务节点更新后的并发数阈值;再向目标服务节点发送携带更新后并发数阈值的并发数阈值调整请求。装置600通过动态调整服务节点的并发数阈值,使得更新后的并发数阈值能满足并发能力调用关系,以此解决现有技术中基于调用链的并发控制系统无法及时感知服务节点故障或并发数阈值不合理所导致的并发控制系统处理性能下降的问题,保障并发控制系统的稳定与可靠性,降低业务请求处理的时延,提升业务请求处理过程中的整体性能。
根据本发明实施例的装置600可对应于执行本发明实施例中描述的方法,并且装置600中的各个单元的上述和其它操作和/或功能分别为了实现图4所述方法中控制节点执行的相应流程,为了简洁,在此不再赘述。
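It is worth noting that the division into modules in this embodiment of the present invention is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation.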
参见图7,图7是本发明实施例提供的另一种控制节点700的结构示意图,如图7所示,该控制节点700包括:处理器702、通信接口703、存储器701和总线704。其中,通信接口703、处理器702以及存储器701可以通过总线704相互连接并完成相互间的通信;存储器701用于存储计算机执行指令,控制节点700运行时,处理器702执行所述存储器701中的计算机执行指令以利用控制节点700中的硬件资源执行以下操作:
获取所述多个服务节点中每个服务节点的分析统计数据;
根据所述多个服务节点中每个服务节点的分析统计数据确定所述多个服务节点中并发数阈值不合理的目标服务节点;
获取所述目标服务节点的并发数阈值,以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息,其中,所述并发数阈值用于标识服务节点在所述并发控制系统中能够被并发调用的最大值;所述权重信息用于标识所述相邻服务节点调用所述目标服务节点的并发数的比例关系;
根据所述目标服务节点的并发数阈值、所述分析统计数据、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息确定所述目标服务节点更新后的并发数阈值;
向所述目标服务节点发送并发数阈值调整请求,所述并发数阈值调整请求包括所述更新后的并发数阈值。
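It should be understood that, in this embodiment of the present invention, the processor 702 may be a CPU, or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor.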
该存储器701可以包括只读存储器和随机存取存储器,并向处理器702提供指令和数据。存储器701的一部分还可以包括非易失性随机存取存储器。例如,存储器701还可以存储设备类型的信息。
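In addition to a data bus, the bus 704 may include a power bus, a control bus, a status signal bus, and the like. For clarity, however, the various buses are all labeled as the bus 704 in the figure.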
应理解,根据本发明实施例的控制节点700可对应于本发明实施例中图1所示的并发控制系统中的控制节点120,以及图6所示的装置600,并可以对应于执行根据本发明实施例的图4中的控制节点,并且控制节点700中的各个模块的上述和其它操作和/或功能分别为了实现图4至图5中的各个方法的相应流程,为了简洁,在此不再赘述。
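The foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, the implementation may take the form, in whole or in part, of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium that a computer can access, or a data storage device such as a server or data center that contains one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk, SSD).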
值得说明的是,对于前述的各方法实施例,为了简单描述,故将其都表述为一系列的动作组合,但是本领域技术人员应该知悉,本发明并不受所描述的动作顺序的限制,因为依据本发明,某些步骤可以采用其他顺序或者同时进行。其次,本领域技术人员也应该知悉,说明书中所描述的实施例均属于优选实施例,所涉及的动作和模块并不一定是本发明所必须的。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述的部分,可以参见其他实施例的相关描述。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置,可通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
以上所述,以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的范围。

Claims (14)

  1. 一种基于调用链的并发控制的方法,其特征在于,所述方法应用于基于调用链的并发控制系统,所述并发控制系统包括控制节点、多个服务节点,所述多个服务节点用于部署应用服务,所述方法包括:
    所述控制节点获取所述多个服务节点中每个服务节点的分析统计数据;
    所述控制节点根据所述多个服务节点中每个服务节点的分析统计数据确定所述多个服务节点中并发数阈值不合理的目标服务节点;
    所述控制节点获取所述目标服务节点的并发数阈值,以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息,其中,所述并发数阈值用于标识服务节点在所述并发控制系统中能够被并发调用的最大值;所述权重信息用于标识所述相邻服务节点调用所述目标服务节点的并发数的比例关系;
    所述控制节点根据所述目标服务节点的并发数阈值、所述分析统计数据、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息确定所述目标服务节点更新后的并发数阈值;
    所述控制节点向所述目标服务节点发送并发数阈值调整请求,所述并发数阈值调整请求包括所述更新后的并发数阈值。
  2. 根据权利要求1所述的方法,其特征在于,所述分析统计数据包括所述服务节点处理业务请求的处理结果,所述目标服务节点为所述处理结果为处理业务请求失败的服务节点。
  3. 根据权利要求1所述的方法,其特征在于,所述分析统计数据包括所述服务节点处理业务请求的时延,所述目标服务节点为所述处理业务请求的时延大于或等于预设阈值的服务节点。
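  4. The method according to any one of claims 1 to 3, wherein the analysis statistics include the call relationships between the service node and other service nodes, and the control node determining the updated concurrency threshold of the target service node based on the concurrency threshold of the target service node, the analysis statistics, and the weight information comprises: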
    所述控制节点获取所述并发控制系统中服务调用拓扑关系;
    所述控制节点根据所述目标服务节点的并发数阈值,所述服务调用拓扑关系、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息对所述目标服务节点进行调整以得到更新后的并发数阈值。
  5. 根据权利要求4所述的方法,其特征在于,所述控制节点获取所述并发控制系统中服务调用拓扑关系,包括:
    所述控制节点根据所述多个服务节点中每个服务节点的分析统计数据中所述服务节点和其他服务节点之间的调用关系获得所述并发控制系统的服务调用拓扑关系。
  6. 根据权利要求1至5中任一所述方法,其特征在于,所述方法还包括:所述控制节点根据服务调用拓扑关系、所述目标服务节点更新后的并发数阈值、以及所述相邻服务节点的并发数阈值和权重信息对所述目标服务节点的相邻服务节点进行调整以得到所述目标服务节点的相邻服务节点更新后的并发数阈值;
    所述控制节点向所述目标服务节点的相邻服务节点发送所述目标服务节点的相邻服务节点更新后的并发数阈值。
  7. 一种基于调用链的并发控制的装置,其特征在于,所述装置包括获取模块、处理模块和发送模块:
    所述获取模块,用于获取所述多个服务节点中每个服务节点的分析统计数据;
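    the processing module is configured to determine, based on the analysis statistics of each of the multiple service nodes, a target service node with an unreasonable concurrency threshold among the service nodes;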
    所述获取模块,还用于获取所述目标服务节点的并发数阈值,以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息,所述并发数阈值用于标识服务节点在所述并发控制系统中能够被并发调用的最大值;所述权重信息用于标识所述相邻服务节点调用所述目标服务节点的并发数的比例关系;
    所述处理模块,还用于根据所述目标服务节点的并发数阈值,所述分析统计数据、以及所述目标服务节点的相邻服务节点的并发数阈值和权重信息确定所述目标服务节点更新后的并发数阈值;
    所述发送模块,用于向所述目标服务节点发送并发数阈值调整请求,所述并发数阈值调整请求包括所述更新后的并发数阈值。
  8. 根据权利要求7所述的装置,其特征在于,所述分析统计数据包括所述服务节点处理业务请求的处理结果,所述目标服务节点为所述处理结果为处理业务请求失败的服务节点。
  9. 根据权利要求7所述的装置,其特征在于,所述分析统计数据包括所述服务节点处理业务请求的时延,所述目标服务节点为所述处理业务请求的时延大于或等于预设阈值的服务节点。
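  10. The apparatus according to any one of claims 7 to 9, wherein the analysis statistics include the call relationships between the service node and other service nodes;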
    所述获取模块,还用于获取所述并发控制系统中服务调用拓扑关系;
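    the processing module is further configured to adjust the target service node based on the concurrency threshold of the target service node, the service call topology, and the concurrency thresholds and weight information of the adjacent service nodes of the target service node, to obtain the updated concurrency threshold.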
  11. 根据权利要求10所述的装置,其特征在于,所述获取模块获取所述并发控制系统中服务调用拓扑关系,包括:
    根据所述多个服务节点中每个服务节点的分析统计数据中所述服务节点和其他服务节点之间的调用关系获得所述并发控制系统的服务调用拓扑关系。
  12. 根据权利要求7至11任一项所述的装置,其特征在于,
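    the processing module is further configured to adjust the adjacent service nodes of the target service node based on the service call topology, the updated concurrency threshold of the target service node, and the concurrency thresholds and weight information of the adjacent service nodes, to obtain the updated concurrency thresholds of the adjacent service nodes of the target service node;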
    所述发送模块,还用于向所述目标服务节点的相邻服务节点发送所述目标服务节点的相邻服务节点更新后的并发数阈值。
  13. 一种控制节点,其特征在于,所述控制节点包括处理器、存储器、通信接口、总线,所述处理器、存储器和通信接口之间通过总线连接并完成相互间的通信,所述存储器中用于存储计算机执行指令,所述控制节点运行时,所述处理器执行所述存储器中的计算机执行指令以利用所述控制节点中的硬件资源执行权利要求1至6中任一所述的方法。
  14. 一种计算机可读存储介质,所述计算机可读存储介质中包括指令,当其在计算机上运行时,使得计算机执行权利要求1至6中任一所述的方法。
PCT/CN2017/072781 2017-01-26 2017-01-26 一种基于调用链的并发控制的方法、装置及控制节点 WO2018137254A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
EP17893715.7A EP3564816A4 (en) 2017-01-26 2017-01-26 CALL-BASED SIMULTANEITY CONTROL PROCEDURES, DEVICE AND CONTROL NODES
CN201780000205.1A CN108633311B (zh) 2017-01-26 2017-01-26 一种基于调用链的并发控制的方法、装置及控制节点
PCT/CN2017/072781 WO2018137254A1 (zh) 2017-01-26 2017-01-26 一种基于调用链的并发控制的方法、装置及控制节点
US16/523,480 US10873622B2 (en) 2017-01-26 2019-07-26 Call chain-based concurrency control method and apparatus, and control node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/072781 WO2018137254A1 (zh) 2017-01-26 2017-01-26 一种基于调用链的并发控制的方法、装置及控制节点

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/523,480 Continuation US10873622B2 (en) 2017-01-26 2019-07-26 Call chain-based concurrency control method and apparatus, and control node

Publications (1)

Publication Number Publication Date
WO2018137254A1 true WO2018137254A1 (zh) 2018-08-02

Family

ID=62978930

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/072781 WO2018137254A1 (zh) 2017-01-26 2017-01-26 一种基于调用链的并发控制的方法、装置及控制节点

Country Status (4)

Country Link
US (1) US10873622B2 (zh)
EP (1) EP3564816A4 (zh)
CN (1) CN108633311B (zh)
WO (1) WO2018137254A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415261A (zh) * 2020-03-27 2020-07-14 中国建设银行股份有限公司 银行系统的流控阈值动态更新的控制方法、系统和装置
CN112835717A (zh) * 2021-02-05 2021-05-25 远光软件股份有限公司 一种用于集群的集成应用处理方法和装置
CN113760652A (zh) * 2021-08-13 2021-12-07 济南浪潮数据技术有限公司 基于应用的全链路监控的方法、系统、设备和存储介质

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106656536B (zh) * 2015-11-03 2020-02-18 阿里巴巴集团控股有限公司 一种用于处理服务调用信息的方法与设备
US10984412B2 (en) * 2018-09-20 2021-04-20 Coinbase, Inc. System and method for management of cryptocurrency systems
CN110727518B (zh) * 2019-10-14 2022-05-27 北京奇艺世纪科技有限公司 一种数据处理方法及相关设备
CN110809062B (zh) * 2019-11-14 2022-03-25 思必驰科技股份有限公司 公有云语音识别资源调用控制方法和装置
CN111124731A (zh) * 2019-12-20 2020-05-08 浪潮电子信息产业股份有限公司 一种文件系统异常监测方法、装置、设备、介质
CN111343240B (zh) * 2020-02-12 2022-08-16 北京字节跳动网络技术有限公司 一种服务请求的处理方法、装置、电子设备及存储介质
CN112181498B (zh) * 2020-10-09 2024-01-30 中国工商银行股份有限公司 并发控制方法、装置和设备
CN112866055A (zh) * 2021-01-05 2021-05-28 广州品唯软件有限公司 业务流量评估方法、装置、计算机设备和存储介质
CN113315718B (zh) * 2021-03-31 2023-11-14 阿里巴巴新加坡控股有限公司 自适应限流的系统、方法和装置
CN113590261B (zh) * 2021-06-30 2022-05-06 济南浪潮数据技术有限公司 一种分布式服务的部署方法及系统
CN116016524B (zh) * 2023-03-24 2023-07-07 湖南智芯微科技有限公司 一种应用于机动式指挥平台的数据处理方法及装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104090823A (zh) * 2014-06-09 2014-10-08 中国建设银行股份有限公司 一种用于计算机系统的流量控制方法和装置
CN104850431A (zh) * 2015-04-29 2015-08-19 努比亚技术有限公司 基于fota升级的稳定处理方法和装置
US20160217003A1 (en) * 2013-06-24 2016-07-28 Sap Se Task Scheduling for Highly Concurrent Analytical and Transaction Workloads
CN106250199A (zh) * 2016-07-26 2016-12-21 北京北森云计算股份有限公司 一种多语言云编译的动态微服务调用方法及装置

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9152464B2 (en) * 2010-09-03 2015-10-06 Ianywhere Solutions, Inc. Adjusting a server multiprogramming level based on collected throughput values
CN103379040B (zh) * 2012-04-24 2016-08-31 阿里巴巴集团控股有限公司 一种高并发系统中控制并发数的装置和方法
CN102681889B (zh) * 2012-04-27 2015-01-07 电子科技大学 一种云计算开放平台的调度方法
US8869148B2 (en) * 2012-09-21 2014-10-21 International Business Machines Corporation Concurrency identification for processing of multistage workflows
US9979674B1 (en) * 2014-07-08 2018-05-22 Avi Networks Capacity-based server selection
US9760406B2 (en) * 2014-09-02 2017-09-12 Ab Initio Technology Llc Controlling data processing tasks
CN104932898B (zh) * 2015-06-30 2018-03-23 东北大学 一种基于改进多目标粒子群优化算法的待增组件选择方法
US10659371B1 (en) * 2017-12-11 2020-05-19 Amazon Technologies, Inc. Managing throttling limits in a distributed system
US10817497B2 (en) * 2018-01-29 2020-10-27 Salesforce.Com, Inc. Migration flow control
US20200092395A1 (en) * 2018-09-19 2020-03-19 International Business Machines Corporation Overload management of a transaction processing server

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160217003A1 (en) * 2013-06-24 2016-07-28 Sap Se Task Scheduling for Highly Concurrent Analytical and Transaction Workloads
CN104090823A (zh) * 2014-06-09 2014-10-08 中国建设银行股份有限公司 一种用于计算机系统的流量控制方法和装置
CN104850431A (zh) * 2015-04-29 2015-08-19 努比亚技术有限公司 基于fota升级的稳定处理方法和装置
CN106250199A (zh) * 2016-07-26 2016-12-21 北京北森云计算股份有限公司 一种多语言云编译的动态微服务调用方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3564816A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111415261A (zh) * 2020-03-27 2020-07-14 中国建设银行股份有限公司 银行系统的流控阈值动态更新的控制方法、系统和装置
CN111415261B (zh) * 2020-03-27 2023-10-24 中国建设银行股份有限公司 银行系统的流控阈值动态更新的控制方法、系统和装置
CN112835717A (zh) * 2021-02-05 2021-05-25 远光软件股份有限公司 一种用于集群的集成应用处理方法和装置
CN113760652A (zh) * 2021-08-13 2021-12-07 济南浪潮数据技术有限公司 基于应用的全链路监控的方法、系统、设备和存储介质
CN113760652B (zh) * 2021-08-13 2023-12-26 济南浪潮数据技术有限公司 基于应用的全链路监控的方法、系统、设备和存储介质

Also Published As

Publication number Publication date
EP3564816A4 (en) 2019-11-20
US10873622B2 (en) 2020-12-22
CN108633311B (zh) 2021-12-21
CN108633311A (zh) 2018-10-09
EP3564816A1 (en) 2019-11-06
US20190349423A1 (en) 2019-11-14


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17893715

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017893715

Country of ref document: EP

Effective date: 20190801