CN116841797A - Service retry management method and device, electronic equipment and readable storage medium - Google Patents

Service retry management method and device, electronic equipment and readable storage medium

Info

Publication number
CN116841797A
Authority
CN
China
Prior art keywords
service node
request
retry
time
service
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310673959.0A
Other languages
Chinese (zh)
Inventor
李俊
项连志
谢良辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Apollo Zhilian Beijing Technology Co Ltd
Original Assignee
Apollo Zhilian Beijing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Apollo Zhilian Beijing Technology Co Ltd filed Critical Apollo Zhilian Beijing Technology Co Ltd
Priority to CN202310673959.0A priority Critical patent/CN116841797A/en
Publication of CN116841797A publication Critical patent/CN116841797A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking
    • G06F11/1482Generic software techniques for error detection or fault masking by means of middleware or OS functionality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The disclosure provides a service retry management method and apparatus, an electronic device and a readable storage medium, and relates to the field of computer technology, in particular to cloud computing and cloud services. The specific scheme is as follows: acquiring the request time consumption of a call request sent by a first service node to a second service node within a first time window, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, and the request time consumption is the duration between a first moment, at which the first service node sends the call request, and a second moment, at which the first service node receives the response result of the call request; and determining a retry duration based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request. The scheme can prevent service retries from overloading the downstream service node.

Description

Service retry management method and device, electronic equipment and readable storage medium
Technical Field
The disclosure relates to the field of computer technology, in particular to cloud computing and cloud services, and specifically to a service retry management method and apparatus, an electronic device and a readable storage medium.
Background
In a service call link, service call failures are often caused by transient faults such as service node failures and network jitter. Retrying the failed call request largely avoids these problems and improves the availability of the system.
In the prior art, however, retrying call requests places additional load on the service nodes and increases the risk of system failure.
Disclosure of Invention
To address at least one of the above deficiencies, the present disclosure provides a service retry management method, apparatus, electronic device and readable storage medium.
According to a first aspect of the present disclosure, there is provided a method of managing service retries, the method comprising:
acquiring the request time consumption of a call request sent by a first service node to a second service node within a first time window, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, the request time consumption is the duration between a first moment and a second moment, the first moment is the moment at which the first service node sends the call request, and the second moment is the moment at which the first service node receives the response result of the call request returned by the second service node;
and determining a retry duration based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request.
According to a second aspect of the present disclosure, there is provided a management apparatus for service retry, the apparatus comprising:
a request time consumption acquisition module, configured to acquire the request time consumption of a call request sent by a first service node to a second service node within a first time window, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, the request time consumption is the duration between a first moment and a second moment, the first moment is the moment at which the first service node sends the call request, and the second moment is the moment at which the first service node receives the response result of the call request returned by the second service node;
a retry duration determining module, configured to determine a retry duration based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the above service retry management method.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the above-described management method of service retry.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the above-described method of managing service retries.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a schematic diagram of a case where retries cause a system failure in a service call link;
FIG. 2 is a flow chart of a method for managing service retries according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the change in request time consumption after retry management is applied to a service call link according to the service retry management method provided by an embodiment of the present disclosure;
FIG. 4 is a flow chart of a specific implementation of a method for managing service retries provided by an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a service retry management apparatus according to an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device for implementing a method of managing service retries of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The service call link generally includes multiple stages of service nodes, and an upstream service node may send a call request to a downstream node to call the downstream node to complete a specific service function. A service node may suffer from node failure, network jitter, performance bottlenecks caused by resource contention, and similar problems, resulting in temporary failures of the service node; these problems can be avoided by retrying the failed call request, thereby improving the availability of the system.
Retrying a call request places additional load on the service node, so retry requests need to be managed effectively to avoid overloading the downstream service node.
Some retry strategies are provided in the related art for controlling retry requests, but these strategies do not fully consider the running state of the called downstream service node. When the downstream service node is already under high load, retry requests can easily overload it, causing a system failure of the service call link.
As an example, a schematic diagram of a case where a service call link causes a system failure due to retry is shown in fig. 1.
As shown in fig. 1, the service call link includes a service node A, a service node B and a service node C, where service node B is located upstream of service node C, service node A is located upstream of service node B, service node B may call service node C, and service node A may call service node B.
Fig. 1 contains four sub-graphs of the A-B-C group, arranged in sequence from top to bottom; each sub-graph represents the load of each service node in the service call link as time passes. The first sub-graph from the top shows the service call link in a normal load state, with service node C running at full load. The second sub-graph shows service node C in the service call link becoming overloaded: due to network jitter or similar causes, service node B sends a small number of retry requests to service node C, and since service node C is already running at full load, the added retry requests from service node B cause service node C to become overloaded. The third sub-graph shows both service node B and service node C overloaded: after service node C is overloaded, call requests queue up at service node B and degrade its performance, so a large fraction of the call requests from service node A to service node B fail, and service node A keeps sending retry requests to service node B until service node B is also overloaded. The fourth sub-graph shows service node A, service node B and service node C all overloaded: the failure of service node C is conducted along the service call link to the upstream service node B, and likewise the failure of service node B is conducted along the service call link to the upstream service node A, until the entire system fails.
The step-by-step fault conduction arrow in fig. 1 indicates that, over time, a fault is conducted along the service call link from a downstream service node to an upstream service node. The label "retry traffic increases exponentially" means that as the fault condition of the downstream service node worsens, the number of retry requests sent by the upstream node to the downstream node grows rapidly.
As the example in fig. 1 shows, when a service node on the service call link fails, the fault is propagated step by step upstream and eventually causes a failure of the overall system. The number of retry requests at each stage of the service call link therefore needs to be controlled effectively to avoid node overload caused by sending retry requests, thereby avoiding system failures caused by multi-stage propagation of service node faults.
The embodiment of the disclosure provides a service retry management method, a device, an electronic apparatus and a readable storage medium, which aim to solve at least one of the above technical problems in the prior art.
Fig. 2 is a flow chart illustrating a method for managing service retry according to an embodiment of the disclosure, where, as shown in fig. 2, the method may mainly include:
step S210: acquiring the time consumption of a request of a call request sent by a first service node to a second service node in a first time window, wherein the first service node and the second service node are adjacent nodes in a service call link, the first service node is positioned at the upstream of the second service node in the service call link, the time consumption of the request is the duration between a first moment and a second moment, the first moment is the moment when the first service node sends the call request, and the second moment is the moment when the first service node receives a response result of the call request returned by the second service node;
Step S220: and determining a retry time length based on the request time consumption, wherein the retry time length is the time length between the sending time of the call request and the sending time of the retry request corresponding to the call request.
The service call link generally includes multiple stages of service nodes, the first service node and the second service node are adjacent nodes in the service call link, and the first service node is located upstream of the second service node in the service call link, i.e. the first service node may send a call request to the second service node to call the downstream node to complete a specific service function.
The request time consumption of a call request is the time period from when the call request is issued to when the response result of the call request is received.
After the first service node sends a call request to the second service node, the call request may fail, the call request needs to be retransmitted, and the retransmitted call request is a retry request.
For any call request, the retry request corresponding to the call request is the call request which is resent when the call request fails.
In the embodiment of the disclosure, after a call request is sent, the first service node may wait for the retry duration and then send the retry request corresponding to the call request.
In actual use, if the retry duration is set too low, a large number of retry requests are generated, which puts heavy pressure on the downstream service node and easily overloads it; if the retry duration is set too high, the response speed of the service suffers. Setting the retry duration reasonably is therefore important for the stability of the service call link.
The length of the first time window may be set according to actual needs, for example to 10 s. To keep the statistics up to date, the first time window may slide over time, i.e. the end time of the first time window is always the current moment. Using a dynamically sliding first time window enables dynamic configuration of the retry duration and ensures that updates to the retry duration are timely.
The request time consumption of a call request reflects how quickly the second service node processes the call request: a call request is processed noticeably faster when the second service node is running normally than when it is under high load. Therefore, by collecting the request time consumption of all call requests within the first time window, the running state of the second service node can be inferred. Determining the retry duration based on the request time consumption of call requests within the first time window is thus equivalent to determining the retry duration based on the running state of the second service node; that is, the running state of the downstream service node is fully considered when setting the retry duration. This allows the retry duration to be configured reasonably, i.e. retry requests are effectively managed by reasonably configuring the moment at which a retry request is initiated, so overload of the downstream service node caused by retry requests can be avoided and system failures of the service call link prevented.
According to the method provided by the embodiment of the disclosure, the request time consumption of a call request sent by a first service node to a second service node within a first time window is acquired, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, and the request time consumption is the duration between the moment at which the first service node sends the call request and the moment at which the first service node receives the response result of the call request returned by the second service node; and a retry duration is determined based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request. In this scheme, the retry duration is configured based on the request time consumption of call requests within the first time window; by configuring the retry duration reasonably, retry requests are effectively managed and overload of the downstream service node can be avoided.
In an alternative manner of the present disclosure, determining a retry duration based on a request time consumption includes:
determining the request time consumption at a designated quantile of the request time consumptions of the call requests as the retry duration.
In the embodiment of the disclosure, the distribution of the request time consumptions of the call requests within the first time window can be collected, and the request time consumption at the designated quantile of that distribution is determined as the retry duration.
For example, the designated quantile may be the 95th percentile, i.e. the 95th-percentile request time consumption is taken as the retry duration. In this example, only call requests whose request time consumption exceeds the 95th percentile go on to initiate a corresponding retry request, so the proportion of retry requests among all call requests is controlled to about 5%, achieving effective control of the number of retry requests.
In the embodiment of the disclosure, the retry duration may be determined from the request time consumptions within the first time window at the current moment, and the determined retry duration is applied to retry requests after the current moment. For example, as the load of the downstream service node rises and request time consumption increases, the request time consumption statistics within the first time window increase accordingly, and so does the determined retry duration; this keeps the proportion of retry requests among all call requests at a stable level, for example 5% when the designated quantile is the 95th percentile.
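As a minimal illustration of this mechanism (a sketch, not the patent's implementation), the following Python snippet keeps the request time consumptions observed in a sliding first time window and derives the retry duration from a designated quantile. The 10 s window and 95th percentile follow the examples above; the class and method names are hypothetical.

```python
import time
from collections import deque

class RetryDurationEstimator:
    """Derives the retry duration from a designated quantile of the request
    time consumptions observed in a sliding first time window (sketch)."""

    def __init__(self, window_seconds=10.0, quantile=0.95):
        self.window_seconds = window_seconds   # length of the first time window
        self.quantile = quantile               # designated quantile, e.g. 95th percentile
        self._samples = deque()                # (timestamp, request_time_consumption)

    def record(self, request_time_consumption, now=None):
        """Record the request time consumption of one call request."""
        now = time.monotonic() if now is None else now
        self._samples.append((now, request_time_consumption))
        self._evict(now)

    def retry_duration(self, now=None):
        """Return the request time consumption at the designated quantile,
        used as the retry duration; None if the window holds no samples."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._samples:
            return None
        values = sorted(v for _, v in self._samples)
        index = min(int(len(values) * self.quantile), len(values) - 1)
        return values[index]

    def _evict(self, now):
        # Slide the window so that its end time is always the current moment.
        while self._samples and now - self._samples[0][0] > self.window_seconds:
            self._samples.popleft()
```

With a 0.95 quantile, roughly 5% of call requests take longer than the returned value, so only about 5% of calls would go on to initiate a retry, matching the proportion discussed above.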
In an optional manner of the disclosure, the method further includes:
and responding to the response condition of the second service node to the call request to meet the preset call failure condition, and generating a retry request corresponding to the call request.
In the embodiment of the disclosure, a call failure condition may be preset, and when the response condition of the call request meets the call failure condition, the call request is indicated to fail, and at this time, a retry request corresponding to the call request may be generated.
As one example, the call failure condition may be that no response result has been received within a specified duration after the call request is issued. The specified duration may be equal to or less than the retry duration.
In the embodiment of the present disclosure, generating a retry request may be understood as taking a call request that fails to call as a retry request.
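A small sketch of how a retry request could be produced under the call failure condition described above, with the specified duration acting as a response timeout; send_call, CallTimeout and the waiting strategy are assumptions made only for illustration.

```python
import time

class CallTimeout(Exception):
    """Raised when no response result arrives within the specified duration."""

def call_with_retry(send_call, request, specified_duration, retry_duration):
    """Send a call request; if the call failure condition is met (no response
    within the specified duration), resend it as a retry request."""
    sent_at = time.monotonic()
    try:
        return send_call(request, timeout=specified_duration)
    except CallTimeout:
        # The retry duration is measured from the sending time of the original
        # call request, so wait out whatever part of it is still left.
        remaining = retry_duration - (time.monotonic() - sent_at)
        if remaining > 0:
            time.sleep(remaining)
        return send_call(request, timeout=specified_duration)  # the retry request
```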
In an optional manner of the disclosure, the method further includes:
acquiring the current duty ratio of retry requests among the call requests sent by a first service node to a second service node within a second time window, where the end time of the second time window is the current moment and the second time window has a specified length;
and controlling the number of retry requests sent by the first service node to the second service node based on the current duty ratio and a preset duty ratio threshold.
In the embodiment of the disclosure, a duty ratio threshold may be preset, and the current duty ratio of retry requests among the call requests sent by the first service node to the second service node within the second time window is monitored. When the current duty ratio is not less than the duty ratio threshold, the proportion of retry requests among the current call requests is too high and easily places excessive load pressure on the downstream service node, so the number of retry requests may be controlled. When the current duty ratio is less than the duty ratio threshold, the retry requests among the current call requests do not place excessive load pressure on the downstream service node, and the number of retry requests need not be controlled for the time being.
The length of the second time window may be set according to actual needs, for example to 10 s. To keep the statistics up to date, the second time window may slide over time, i.e. its end time is always the current moment. A dynamically sliding second time window enables dynamic monitoring of the current duty ratio of retry requests among the call requests, so that the number of retry requests can be controlled in time.
In an optional manner of the disclosure, controlling the number of retry requests sent by the first service node to the second service node based on the current duty ratio and a preset duty ratio threshold includes:
in response to the current duty ratio being not less than the preset duty ratio threshold, reducing the number of retry requests sent by the first service node to the second service node according to a preset current limiting (throttling) policy until the current duty ratio falls below the preset duty ratio threshold.
In the embodiment of the disclosure, when the current duty ratio is not less than the duty ratio threshold, the proportion of retry requests among the current call requests is too high and easily places excessive load pressure on the downstream service node, so the number of retry requests sent by the first service node to the second service node is reduced according to the preset current limiting policy. As the second time window slides forward, the current limiting policy is lifted once the current duty ratio counted within the second time window falls below the preset duty ratio threshold.
In the embodiment of the disclosure, the period during which the current limiting policy is triggered can be recorded as a retry fusing (circuit-breaking) state.
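The sketch below, again only an assumed illustration rather than the patent's implementation, tracks the duty ratio of retry requests within a sliding second time window and reports a retry fusing state while the ratio is not less than the threshold; the 10 s window, the 5% threshold and all names are assumptions.

```python
import time
from collections import deque

class RetryFuse:
    """Monitors the duty ratio of retry requests in a sliding second time window
    and fuses (blocks retries) while the ratio reaches the threshold (sketch)."""

    def __init__(self, window_seconds=10.0, ratio_threshold=0.05):
        self.window_seconds = window_seconds
        self.ratio_threshold = ratio_threshold   # preset duty ratio threshold
        self._events = deque()                   # (timestamp, is_retry)

    def record(self, is_retry, now=None):
        """Record one call request sent to the second service node."""
        now = time.monotonic() if now is None else now
        self._events.append((now, is_retry))
        self._evict(now)

    def current_ratio(self, now=None):
        """Current duty ratio of retry requests within the second time window."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._events:
            return 0.0
        retries = sum(1 for _, is_retry in self._events if is_retry)
        return retries / len(self._events)

    def retry_allowed(self, now=None):
        """False while in the retry fusing state, i.e. while the current duty
        ratio is not less than the threshold; True once it drops back below."""
        return self.current_ratio(now) < self.ratio_threshold

    def _evict(self, now):
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
```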
In an alternative manner of the present disclosure, reducing the number of retry requests sent by a first service node to a second service node according to a preset throttling policy includes:
and controlling the first service node to stop sending the retry request to the second service node.
In this embodiment of the present disclosure, the throttling policy may be to control the first service node to stop sending the retry request to the second service node, where the load pressure of the second service node may be greatly reduced, and after a period of recovery, the load pressure of the second service node may return to a reasonable range.
In an alternative manner of the present disclosure, reducing the number of retry requests sent by a first service node to a second service node according to a preset throttling policy includes:
determining a load state of the second service node based on the request time consumption of the call requests sent by the first service node to the second service node within the second time window;
determining a retry request quantity threshold based on the load status;
and controlling the number of retry requests sent by the first service node to the second service node to be not more than a retry request number threshold value in each third time window after the current moment.
In an embodiment of the disclosure, the throttling policy may also determine the load state of the second service node based on the request time consumption within the second time window, and determine a retry request number threshold based on that load state. The retry request number threshold is the upper limit on the number of retry requests within each third time window. Setting a retry request number threshold achieves effective throttling of retry requests.
In the embodiment of the disclosure, the threshold value of the number of retry requests is determined by the load state of the second service node, so that reasonable current limiting of the retry requests based on the load state is realized.
A plurality of third time windows of equal length may be arranged consecutively after the current moment. To allow precise control of the number of retry requests, the third time window may be short, e.g. 1 s.
In an optional manner of the disclosure, determining the current load state of the second service node based on the request time consumption of the call requests sent by the first service node to the second service node within the second time window includes:
determining the load state of the second service node based on a preset first correspondence between request time consumption and load state, together with the request time consumption of the call requests sent by the first service node to the second service node within the second time window.
In the embodiment of the disclosure, the load state may be represented by a load value, and the load value represents the load pressure.
The first corresponding relation between the request time consumption and the load state can be preset because the request time consumption can intuitively reflect the load state of the second service node. In the first correspondence, a higher request time corresponds to a higher load value, and a lower request time corresponds to a lower load value.
The load state of the second service node can be effectively determined based on the first corresponding relation, and a basis is provided for determining the retry request quantity threshold based on the load state.
In the embodiment of the disclosure, the load state may also be determined jointly from a performance parameter of the second service node and the request time consumption of the call requests sent by the first service node to the second service node within the second time window.
Specifically, a first value may be determined based on the performance parameter of the second service node, a second value may be determined based on the request time consumption of the call requests sent by the first service node to the second service node within the second time window, and the two values may be combined by weighting to obtain a load value that represents the comprehensive load state of the second service node.
In an alternative manner of the present disclosure, determining a retry request number threshold based on a load status includes:
and determining a retry request quantity threshold corresponding to the load state of the second service node based on a second corresponding relation between the preset load state and the retry request quantity threshold.
In the embodiment of the disclosure, there is a certain correlation between the load state and the retry request number threshold, specifically, when the load pressure is higher, the number of retry requests should be greatly reduced to reduce the load pressure of the second service node, and at this time, a lower retry request number threshold may be set; accordingly, the lower the load pressure, the smaller the number of retry requests can be reduced, and a higher threshold of the number of retry requests can be set at this time. Based on the correlation, a second corresponding relation between the load state and the request quantity threshold value can be preset, so as to determine the current retry request quantity threshold value of the second service node based on the second corresponding relation
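A hedged sketch of one way the two correspondences could be encoded: a first mapping from the request time consumption observed in the second time window to a load value, and a second mapping from load value to the retry request number threshold applied in each third time window. All boundary values below are invented for illustration; the patent does not fix concrete numbers.

```python
# First correspondence (assumed values): average request time consumption in the
# second time window -> load value; higher time consumption means higher load.
FIRST_CORRESPONDENCE = [      # (upper bound of avg time consumption in seconds, load value)
    (0.05, 1),
    (0.20, 2),
    (0.50, 3),
    (float("inf"), 4),
]

# Second correspondence (assumed values): load value -> retry request number
# threshold per third time window (e.g. per 1 s); the higher the load, the fewer retries.
SECOND_CORRESPONDENCE = {1: 20, 2: 10, 3: 3, 4: 0}

def load_value(avg_request_time_consumption):
    """Map the request time consumption observed in the second time window
    to a load value via the first correspondence."""
    for upper_bound, value in FIRST_CORRESPONDENCE:
        if avg_request_time_consumption <= upper_bound:
            return value
    return FIRST_CORRESPONDENCE[-1][1]

def retry_request_threshold(avg_request_time_consumption):
    """Retry request number threshold for each third time window after the current moment."""
    return SECOND_CORRESPONDENCE[load_value(avg_request_time_consumption)]

# Example: with an average of 0.3 s the load value is 3, so at most 3 retry
# requests would be sent to the second service node in each third time window.
```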
In an optional manner of the disclosure, the method further includes:
allocating tokens to the call request based on whether tokens remain in a preset token bucket, wherein a preset number of tokens are allocated in the token bucket in a fourth time window;
in response to the call request being assigned a token, the call request is sent to the second service node.
In the embodiment of the disclosure, to avoid excessive load pressure on the downstream service node caused by sending a large number of call requests to it within a short time, a token bucket can be used to control the number of call requests sent to the downstream service node per unit time.
Specifically, a specified number of tokens may be placed into the token bucket within the fourth time window and allocated to call requests until the tokens in the bucket are used up. A call request that is allocated a token may be sent to the second service node; a call request that is not allocated a token is not sent to the second service node, and the request may be determined to have failed.
By setting the token bucket, it can be ensured that call requests are sent at an even rate.
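A minimal token bucket sketch consistent with the description above: a preset number of tokens is placed into the bucket for each fourth time window, a call request that obtains a token may be sent, and one that does not is treated as failed. The refill granularity (one whole-window refill) and the names are assumptions.

```python
import time

class TokenBucket:
    """Allocates a preset number of tokens per fourth time window so that call
    requests are sent to the downstream service node at an even rate (sketch)."""

    def __init__(self, tokens_per_window, window_seconds=1.0):
        self.tokens_per_window = tokens_per_window   # preset number of tokens
        self.window_seconds = window_seconds         # length of the fourth time window
        self._tokens = tokens_per_window
        self._window_start = time.monotonic()

    def try_acquire(self, now=None):
        """Return True if a token could be allocated to the call request;
        a request that gets no token is not sent and is considered failed."""
        now = time.monotonic() if now is None else now
        if now - self._window_start >= self.window_seconds:
            # A new fourth time window begins: restore the preset token count.
            self._tokens = self.tokens_per_window
            self._window_start = now
        if self._tokens > 0:
            self._tokens -= 1
            return True
        return False
```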
In the embodiment of the disclosure, the first service node itself controls the retry duration and the current duty ratio of retry traffic, which enables effective management of retry requests and avoids node overload caused by sending retry requests, thereby preventing system failures caused by multi-stage propagation of service node faults.
In the related art, a control center node may be set, and service states of service nodes in a service call link are collected by the control center node, so as to generate a retry strategy and send the retry strategy to the service nodes. The method depends on the data interaction between the control center node and each service node, when faults such as network delay exist, the retry strategy cannot be issued in time, so that the retry strategy of each service node cannot be updated in time, and management of the retry request is affected. Meanwhile, the process of collecting the service states of all the service nodes by the control center node has certain time consumption, so that hysteresis is caused in the process of managing the retry request. In addition, once the control center node fails (e.g., is down), management of retry requests for the entire service call link may fail.
In the embodiment of the disclosure, the retry duration is determined from the request time consumption of the call requests that each service node itself sends, and each service node monitors its own retry request duty ratio to keep it from exceeding the duty ratio threshold. This achieves effective and timely management of each service node's retry requests without relying on a control center node.
In an optional implementation of the disclosed embodiment, each service node in the service call link may periodically send a record of its retry request duty ratio (i.e. the proportion of retry requests among the call requests it sends) to its upstream node, and a downstream service node may send both its own retry request duty ratio and the duty ratio records it has received on to its upstream service node.
Referring to the example in fig. 1, service node C may periodically send its retry request duty ratio record to service node B; after receiving it, service node B sends its own retry request duty ratio together with service node C's to service node A. At that point service node A holds the retry request duty ratio records of every service node in the entire service call link.
The service node at the top of the service call link can count, from the duty ratio records, how many times each service node has triggered the retry fusing state, so as to manage the downstream service nodes accordingly. Specifically, a service node that triggers the retry fusing state more often indicates that its downstream service node has weaker processing capability, and measures that improve the downstream service node's processing capability, such as scaling out or upgrading its task processing module, can be taken.
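One possible shape for the periodically propagated duty ratio records is sketched below; the record fields and the aggregation done at the topmost node are assumptions made only to illustrate the reporting chain C -> B -> A.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RetryRatioRecord:
    node: str              # service node that produced the record
    retry_ratio: float     # duty ratio of retry requests among its call requests
    fuse_triggered: bool   # whether the retry fusing state was triggered this period

def forward_upstream(own_record: RetryRatioRecord,
                     received: List[RetryRatioRecord]) -> List[RetryRatioRecord]:
    """A downstream node forwards its own record together with those it received."""
    return [own_record, *received]

def count_fuse_triggers(records: List[RetryRatioRecord]) -> Dict[str, int]:
    """At the topmost node: count how often each node triggered the fusing state,
    to decide which downstream nodes need scaling out or upgrading."""
    counts: Dict[str, int] = {}
    for record in records:
        if record.fuse_triggered:
            counts[record.node] = counts.get(record.node, 0) + 1
    return counts
```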
As an example, fig. 3 is a schematic diagram of the change in request time consumption after retry management is applied to a service call link according to the service retry management method provided in an embodiment of the present disclosure.
As shown in fig. 3, the number of retry requests varies over four sampling periods, where S1, S2, S3, ..., Sx identify different moments within each sampling period. The retry fusing ratio is the duty ratio threshold.
The initial retry time is the retry duration in effect when the first sampling period begins; in the first sampling period there are two retry requests. On entering the second sampling period, the retry duration is reset based on the request time consumption in the first sampling period and therefore increases; in the second sampling period there are three retry requests. At that point the duty ratio of retry requests exceeds the duty ratio threshold, so no retry requests are sent in the third sampling period. The duty ratio of retry requests then gradually drops back below the threshold, and retry requests can be sent again in the fourth sampling period.
As can be seen from the example in fig. 3, the service retry management method provided by the embodiment of the present disclosure can implement effective management of retry requests, so as to avoid overload of service nodes.
As an example, a flowchart of a specific implementation of a method for managing service retry provided by an embodiment of the disclosure is shown in fig. 4.
In the example shown in fig. 4, communication between service nodes may be implemented with remote procedure calls (Remote Procedure Call, RPC), i.e. each service node corresponds to an RPC client component. The component's dynamic retry strategy is implemented independently of the network communication part, so as to achieve dynamic management of retry requests. The service retry management method in this example can be applied to interaction scenarios under various network protocols, such as Hypertext Transfer Protocol (HTTP) communication, the Remote Dictionary Server (Redis) protocol, and the like.
In fig. 4, br_ms represents the retry waiting interval set for the current downstream access. br_ms = -1 indicates that no retry request is sent, which is the case when the retry is in a fused state, or when it is not in a fused state but no token has been acquired; br_ms = xx ms indicates that the retry duration set for the current downstream access is xx milliseconds.
A downstream request event, i.e. a call request to be sent to a downstream service node.
Acquiring a token means allocating tokens to call requests from the token bucket; a call request that is allocated a token proceeds to the judgment of whether a retry is allowed, while a call request that is not allocated a token is directly refused and the request is considered to have failed.
Whether a retry is allowed means judging whether the retry fusing state is currently active; in this example, the current limiting policy in the retry fusing state is to stop sending retry requests. If the retry fusing state has not been triggered, the call request may be sent to the downstream service node (i.e. downstream is accessed); if the retry fusing state has been triggered, the call request may be directly refused and the request considered to have failed.
A fusing event, i.e. an event in which a call request is refused because the retry fusing state is triggered, may be classified as an access failure event.
Both access success events and access failure events are collected and used for fusing/retry decisions, i.e. for the management of service retries.
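Tying the pieces together, the sketch below mirrors the per-request flow described for fig. 4: acquire a token, check the retry fusing state, access downstream or refuse, and feed success/failure observations back into the retry decision. It reuses the hypothetical helpers sketched earlier (TokenBucket, RetryFuse, RetryDurationEstimator) and is not the patent's actual implementation.

```python
import time

def handle_downstream_request(request, bucket, fuse, estimator, send_call):
    """Sketch of the fig. 4 flow for one downstream request event, using the
    hypothetical TokenBucket, RetryFuse and RetryDurationEstimator helpers."""
    # Acquire a token; a request that gets no token is refused and treated as failed.
    if not bucket.try_acquire():
        return None                                   # access failure event (br_ms = -1)

    # Decide whether a retry may follow: not while in the retry fusing state.
    retry_duration = estimator.retry_duration() if fuse.retry_allowed() else None
    attempts = 2 if retry_duration is not None else 1

    first_sent_at = time.monotonic()
    for attempt in range(attempts):
        is_retry = attempt > 0
        if is_retry:
            # Wait until the retry duration has elapsed since the original send.
            time.sleep(max(0.0, retry_duration - (time.monotonic() - first_sent_at)))
        fuse.record(is_retry)                         # counts toward the duty ratio
        started = time.monotonic()
        try:
            response = send_call(request)             # access downstream
            estimator.record(time.monotonic() - started)  # access success event
            return response
        except Exception:
            continue                                  # access failure event
    return None                                       # all attempts failed
```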
Based on the same principle as the method shown in fig. 2, fig. 5 shows a schematic structural diagram of a service retry management apparatus provided in an embodiment of the present disclosure, and as shown in fig. 5, the service retry management apparatus 50 may include:
a request time consumption acquisition module 510, configured to acquire the request time consumption of a call request sent by a first service node to a second service node within a first time window, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, the request time consumption is the duration between a first moment and a second moment, the first moment is the moment at which the first service node sends the call request, and the second moment is the moment at which the first service node receives the response result of the call request returned by the second service node;
a retry duration determining module 520, configured to determine a retry duration based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request.
According to the apparatus provided by the embodiment of the disclosure, the request time consumption of a call request sent by a first service node to a second service node within a first time window is acquired, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, and the request time consumption is the duration between the moment at which the first service node sends the call request and the moment at which the first service node receives the response result of the call request returned by the second service node; and a retry duration is determined based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request. In this scheme, the retry duration is configured based on the request time consumption of call requests within the first time window; by configuring the retry duration reasonably, retry requests are effectively managed and overload of the downstream service node can be avoided.
Optionally, the retry duration determination module is specifically configured to, when determining the retry duration based on the request time consumption:
and determining the request time consumption of the designated quantile in the request time consumption of each call request as a retry time length.
Optionally, the apparatus further includes:
the retry request generation module is used for responding to the response condition of the second service node to the call request to meet the preset call failure condition and generating a retry request corresponding to the call request.
Optionally, the apparatus further includes:
the retry request quantity control module is used for:
acquiring the current duty ratio of retry requests among the call requests sent by a first service node to a second service node within a second time window, where the end time of the second time window is the current moment and the second time window has a specified length;
and controlling the number of retry requests sent by the first service node to the second service node based on the current duty ratio and a preset duty ratio threshold.
Optionally, the retry request quantity control module is specifically configured to, when controlling the number of retry requests sent by the first service node to the second service node based on the current duty ratio and a preset duty ratio threshold:
in response to the current duty ratio being not less than the preset duty ratio threshold, reduce the number of retry requests sent by the first service node to the second service node according to a preset current limiting policy until the current duty ratio falls below the preset duty ratio threshold.
Optionally, the retry request number control module is specifically configured to, when reducing the number of retry requests sent by the first service node to the second service node according to a preset throttling policy:
and controlling the first service node to stop sending the retry request to the second service node.
Optionally, the retry request number control module is specifically configured to, when reducing the number of retry requests sent by the first service node to the second service node according to a preset throttling policy:
determining a load state of the second service node based on the time consumption of a request of a call request sent by the first service node to the second service node in the second time window;
determining a retry request quantity threshold based on the load status;
and controlling the number of retry requests sent by the first service node to the second service node to be not more than a retry request number threshold value in each third time window after the current moment.
Optionally, the retry request quantity control module is specifically configured to, when determining the current load status of the second service node based on the time consumption of the request of the call request sent by the first service node to the second service node in the second time window:
and determining the load state of the second service node based on a first corresponding relation between the preset request time consumption and the load state and based on the request time consumption of the call request sent by the first service node to the second service node in the second time window.
Optionally, the retry request number control module is specifically configured to, when determining the retry request number threshold based on the load status:
and determining a retry request quantity threshold corresponding to the load state of the second service node based on a second corresponding relation between the preset load state and the retry request quantity threshold.
Optionally, the apparatus further includes a token limiting module, where the token limiting module is configured to:
allocating tokens to the call request based on whether the tokens remain in a preset token bucket in a fourth time window, wherein a preset number of tokens are allocated in the token bucket in the fourth time window;
in response to the call request being assigned a token, the call request is sent to the second service node.
It can be understood that the above-described modules of the service retry management apparatus in the embodiment of the present disclosure have functions of implementing the respective steps of the service retry management method in the embodiment shown in fig. 2. The functions can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the functions described above. The modules may be software and/or hardware, and each module may be implemented separately or may be implemented by integrating multiple modules. The functional description of each module of the service retry management apparatus may be specifically referred to the corresponding description of the service retry management method in the embodiment shown in fig. 2, and will not be repeated here.
In the technical scheme of the disclosure, the related processes of collecting, storing, using, processing, transmitting, providing, disclosing and the like of the personal information of the user accord with the regulations of related laws and regulations, and the public order colloquial is not violated.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
The electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of managing service retries as provided by embodiments of the present disclosure.
Compared with the prior art, the electronic device acquires the request time consumption of a call request sent by a first service node to a second service node within a first time window, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, and the request time consumption is the duration between the moment at which the first service node sends the call request and the moment at which the first service node receives the response result of the call request returned by the second service node; and determines a retry duration based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request. In this scheme, the retry duration is configured based on the request time consumption of call requests within the first time window; by configuring the retry duration reasonably, retry requests are effectively managed and overload of the downstream service node can be avoided.
The readable storage medium is a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method of managing service retries as provided by embodiments of the present disclosure.
Compared with the prior art, with the readable storage medium, the request time consumption of a call request sent by a first service node to a second service node within a first time window is acquired, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, and the request time consumption is the duration between the moment at which the first service node sends the call request and the moment at which the first service node receives the response result of the call request returned by the second service node; and a retry duration is determined based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request. In this scheme, the retry duration is configured based on the request time consumption of call requests within the first time window; by configuring the retry duration reasonably, retry requests are effectively managed and overload of the downstream service node can be avoided.
The computer program product comprises a computer program which, when executed by a processor, implements a method of managing service retries as provided by embodiments of the present disclosure.
Compared with the prior art, with the computer program product, the request time consumption of a call request sent by a first service node to a second service node within a first time window is acquired, where the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, and the request time consumption is the duration between the moment at which the first service node sends the call request and the moment at which the first service node receives the response result of the call request returned by the second service node; and a retry duration is determined based on the request time consumption, where the retry duration is the duration between the sending time of the call request and the sending time of the retry request corresponding to the call request. In this scheme, the retry duration is configured based on the request time consumption of call requests within the first time window; by configuring the retry duration reasonably, retry requests are effectively managed and overload of the downstream service node can be avoided.
Fig. 6 shows a schematic block diagram of an example electronic device 60 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the electronic device 60 includes a computing unit 610 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 620 or a computer program loaded from a storage unit 680 into a Random Access Memory (RAM) 630. In RAM 630, various programs and data required for the operation of device 60 may also be stored. The computing unit 610, ROM 620, and RAM 630 are connected to each other by a bus 640. An input/output (I/O) interface 650 is also connected to bus 640.
Various components in device 60 are connected to I/O interface 650, including: an input unit 660 such as a keyboard, a mouse, etc.; an output unit 670 such as various types of displays, speakers, and the like; a storage unit 680 such as a magnetic disk, an optical disk, or the like; and a communication unit 690 such as a network card, modem, wireless communication transceiver, etc. The communication unit 690 allows the device 60 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 610 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 610 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 610 performs a management method of service retries provided in the embodiments of the present disclosure. For example, in some embodiments, the management method of performing the service retry provided in the embodiments of the present disclosure may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 680. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 60 via ROM 620 and/or communication unit 690. When the computer program is loaded into the RAM 630 and executed by the computing unit 610, one or more steps of a method of managing service retries provided in embodiments of the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 610 may be configured by any other suitable means (e.g., by means of firmware) to perform the method of managing service retries provided in the embodiments of the present disclosure.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. A method of managing service retries, comprising:
acquiring a request time consumption of a call request sent by a first service node to a second service node within a first time window, wherein the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, the request time consumption is the duration between a first moment and a second moment, the first moment is the moment at which the first service node sends the call request, and the second moment is the moment at which the first service node receives a response result of the call request returned by the second service node; and
determining a retry duration based on the request time consumption, wherein the retry duration is the duration between the sending moment of the call request and the sending moment of the retry request corresponding to the call request.
2. The method of claim 1, wherein the determining a retry duration based on the request time consumption comprises:
determining, as the retry duration, the request time consumption at a designated quantile among the request time consumptions of the call requests.
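As a purely illustrative sketch of the quantile-based retry duration in claims 1 and 2 (the Python class name, the 10-second window, and the 0.95 quantile below are assumptions chosen for the example, not values taken from the claims), the first service node could track recent request latencies in a sliding window and read off the designated quantile:

```python
import time
from collections import deque


class RetryDurationEstimator:
    """Tracks request latencies in a sliding window and returns a quantile of them."""

    def __init__(self, window_seconds=10.0, quantile=0.95):
        self.window_seconds = window_seconds  # length of the "first time window" (assumed)
        self.quantile = quantile              # the designated quantile (assumed)
        self._samples = deque()               # (completion_time, latency) pairs

    def record(self, sent_at, received_at):
        """Record one completed call: latency = response moment minus send moment."""
        self._samples.append((received_at, received_at - sent_at))
        self._evict(received_at)

    def retry_duration(self, now=None):
        """Latency at the designated quantile within the window, or None if no data yet."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._samples:
            return None
        latencies = sorted(latency for _, latency in self._samples)
        index = min(int(self.quantile * len(latencies)), len(latencies) - 1)
        return latencies[index]

    def _evict(self, now):
        # Drop samples that fall outside the sliding window.
        while self._samples and now - self._samples[0][0] > self.window_seconds:
            self._samples.popleft()
```

Under these assumptions, the first service node would call record() for each completed call and wait retry_duration() after sending a call request before issuing the corresponding retry request.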
3. The method of claim 1 or 2, further comprising:
generating, in response to a response condition of the second service node to the call request meeting a preset call failure condition, a retry request corresponding to the call request.
4. A method according to any one of claims 1-3, further comprising:
acquiring a current proportion of retry requests among the call requests sent by the first service node to the second service node within a second time window, wherein the ending moment of the second time window is the current moment and the length of the second time window is a specified duration; and
controlling the number of retry requests sent by the first service node to the second service node based on the current proportion and a preset proportion threshold.
5. The method of claim 4, wherein the controlling the number of retry requests sent by the first service node to the second service node based on the current proportion and a preset proportion threshold comprises:
reducing, in response to the current proportion being not smaller than the preset proportion threshold, the number of retry requests sent by the first service node to the second service node according to a preset throttling policy until the current proportion is smaller than the preset proportion threshold.
6. The method of claim 5, wherein the reducing the number of retry requests sent by the first service node to the second service node according to the preset throttling policy comprises:
controlling the first service node to stop sending retry requests to the second service node.
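Similarly, a minimal sketch of the proportion-based control in claims 4 through 6 might keep a sliding record of calls and retries and gate further retries on the observed share; the window length and the 0.1 proportion threshold below are illustrative assumptions:

```python
import time
from collections import deque


class RetryBudget:
    """Tracks the share of retries among recent calls and gates further retries."""

    def __init__(self, window_seconds=10.0, max_retry_proportion=0.1):
        self.window_seconds = window_seconds              # the "second time window" (assumed)
        self.max_retry_proportion = max_retry_proportion  # preset proportion threshold (assumed)
        self._events = deque()                            # (timestamp, is_retry) pairs

    def record_call(self, is_retry, now=None):
        """Record every call (original or retry) sent to the downstream node."""
        now = time.monotonic() if now is None else now
        self._events.append((now, is_retry))
        self._evict(now)

    def retry_allowed(self, now=None):
        """Permit a retry only while the current proportion stays below the threshold."""
        now = time.monotonic() if now is None else now
        self._evict(now)
        if not self._events:
            return True
        retries = sum(1 for _, is_retry in self._events if is_retry)
        return retries / len(self._events) < self.max_retry_proportion

    def _evict(self, now):
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
```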
7. The method of claim 5, wherein the reducing the number of retry requests sent by the first service node to the second service node according to the preset throttling policy comprises:
determining a load state of the second service node based on the request time consumption of the call requests sent by the first service node to the second service node within the second time window;
determining a retry request quantity threshold based on the load state; and
controlling, in each third time window after the current moment, the number of retry requests sent by the first service node to the second service node to be not greater than the retry request quantity threshold.
8. The method of claim 7, wherein the determining a load state of the second service node based on the request time consumption of the call requests sent by the first service node to the second service node within the second time window comprises:
determining the load state of the second service node based on a preset first correspondence between request time consumption and load state, and on the request time consumption of the call requests sent by the first service node to the second service node within the second time window.
9. The method of claim 7 or 8, wherein the determining a retry request quantity threshold based on the load state comprises:
determining, based on a preset second correspondence between load state and retry request quantity threshold, the retry request quantity threshold corresponding to the load state of the second service node.
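One possible sketch of the two correspondences in claims 7 through 9 (request time consumption to load state, and load state to retry-count threshold per third time window) follows; the latency bands, state names, and limits are illustrative assumptions only:

```python
LATENCY_BANDS = [                 # assumed first correspondence: latency band -> load state
    (0.10, "light"),              # e.g. observed p95 latency below 100 ms
    (0.50, "moderate"),           # e.g. observed p95 latency below 500 ms
    (float("inf"), "heavy"),
]

RETRY_LIMITS = {                  # assumed second correspondence: load state -> retry threshold
    "light": 50,
    "moderate": 10,
    "heavy": 0,                   # effectively stop retrying while the downstream is overloaded
}


def load_state(p95_latency_seconds):
    """Map an observed latency (e.g. the window's 95th percentile) to a load state."""
    for upper_bound, state in LATENCY_BANDS:
        if p95_latency_seconds < upper_bound:
            return state
    return "heavy"


def retry_limit(p95_latency_seconds):
    """Retry-count ceiling to enforce in each third time window after the current moment."""
    return RETRY_LIMITS[load_state(p95_latency_seconds)]
```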
10. The method according to any one of claims 1-9, further comprising:
allocating a token to the call request based on whether any tokens remain in a preset token bucket within a fourth time window, wherein a preset number of tokens are allocated to the token bucket within the fourth time window; and
sending the call request to the second service node in response to a token being allocated to the call request.
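For the token-bucket gating of claim 10, one illustrative sketch allocates a preset number of tokens per window and forwards a call request only when a token can be assigned to it; the capacity and window length are assumptions for the example:

```python
import time


class WindowedTokenBucket:
    """Grants up to a preset number of tokens per fixed window; one token per call request."""

    def __init__(self, tokens_per_window=100, window_seconds=1.0):
        self.tokens_per_window = tokens_per_window  # preset number of tokens (assumed)
        self.window_seconds = window_seconds        # the "fourth time window" (assumed)
        self._tokens = tokens_per_window
        self._window_start = time.monotonic()

    def try_acquire(self, now=None):
        """Assign a token to a call request if any remain in the current window."""
        now = time.monotonic() if now is None else now
        if now - self._window_start >= self.window_seconds:
            # A new fourth time window begins: replenish the preset number of tokens.
            self._tokens = self.tokens_per_window
            self._window_start = now
        if self._tokens > 0:
            self._tokens -= 1
            return True   # token assigned: the call request may be sent downstream
        return False      # no token left: hold, queue, or reject the request
```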
11. A management apparatus for service retry, comprising:
a request time consumption acquiring module, configured to acquire a request time consumption of a call request sent by a first service node to a second service node within a first time window, wherein the first service node and the second service node are adjacent nodes in a service call link, the first service node is located upstream of the second service node in the service call link, the request time consumption is the duration between a first moment and a second moment, the first moment is the moment at which the first service node sends the call request, and the second moment is the moment at which the first service node receives a response result of the call request returned by the second service node; and
a retry duration determining module, configured to determine a retry duration based on the request time consumption, wherein the retry duration is the duration between the sending moment of the call request and the sending moment of the retry request corresponding to the call request.
12. The apparatus of claim 11, wherein the retry duration determining module, when determining a retry duration based on the request time consumption, is specifically configured to:
determine, as the retry duration, the request time consumption at a designated quantile among the request time consumptions of the call requests.
13. The apparatus of claim 11 or 12, further comprising:
a retry request generating module, configured to generate a retry request corresponding to the call request in response to a response condition of the second service node to the call request meeting a preset call failure condition.
14. The apparatus of any one of claims 11-13, further comprising a retry request quantity control module configured to:
acquire a current proportion of retry requests among the call requests sent by the first service node to the second service node within a second time window, wherein the ending moment of the second time window is the current moment and the length of the second time window is a specified duration; and
control the number of retry requests sent by the first service node to the second service node based on the current proportion and a preset proportion threshold.
15. The apparatus of claim 14, wherein, when controlling the number of retry requests sent by the first service node to the second service node based on the current proportion and the preset proportion threshold, the retry request quantity control module is configured to:
reduce, in response to the current proportion being not smaller than the preset proportion threshold, the number of retry requests sent by the first service node to the second service node according to a preset throttling policy until the current proportion is smaller than the preset proportion threshold.
16. The apparatus of claim 15, wherein, when reducing the number of retry requests sent by the first service node to the second service node according to the preset throttling policy, the retry request quantity control module is configured to:
control the first service node to stop sending retry requests to the second service node.
17. The apparatus of claim 15, wherein, when reducing the number of retry requests sent by the first service node to the second service node according to the preset throttling policy, the retry request quantity control module is configured to:
determine a load state of the second service node based on the request time consumption of the call requests sent by the first service node to the second service node within the second time window;
determine a retry request quantity threshold based on the load state; and
control, in each third time window after the current moment, the number of retry requests sent by the first service node to the second service node to be not greater than the retry request quantity threshold.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
19. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-10.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-10.
CN202310673959.0A 2023-06-07 2023-06-07 Service retry management method and device, electronic equipment and readable storage medium Pending CN116841797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310673959.0A CN116841797A (en) 2023-06-07 2023-06-07 Service retry management method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310673959.0A CN116841797A (en) 2023-06-07 2023-06-07 Service retry management method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN116841797A true CN116841797A (en) 2023-10-03

Family

ID=88168026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310673959.0A Pending CN116841797A (en) 2023-06-07 2023-06-07 Service retry management method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN116841797A (en)

Similar Documents

Publication Publication Date Title
US9027025B2 (en) Real-time database exception monitoring tool using instance eviction data
EP3264723B1 (en) Method, related apparatus and system for processing service request
CN113381944B (en) System current limiting method, apparatus, electronic device, medium, and program product
CN112965823B (en) Control method and device for call request, electronic equipment and storage medium
CN112508768B (en) Single-operator multi-model pipeline reasoning method, system, electronic equipment and medium
JP7037066B2 (en) Evaluation device, evaluation method and evaluation program
CN112506619A (en) Job processing method, apparatus, electronic device, storage medium, and program product
CN111538572A (en) Task processing method, device, scheduling server and medium
CN108415765B (en) Task scheduling method and device and intelligent terminal
CN116567077A (en) Bare metal instruction sending method, device, equipment and storage medium
CN116661960A (en) Batch task processing method, device, equipment and storage medium
CN111400045A (en) Load balancing method and device
CN109104334B (en) Management method and device for nodes in monitoring system
CN116841797A (en) Service retry management method and device, electronic equipment and readable storage medium
CN113486229B (en) Control method and device for grabbing pressure, electronic equipment and readable storage medium
CN113747506A (en) Resource scheduling method, device and network system
CN112988417A (en) Message processing method and device, electronic equipment and computer readable medium
CN113342463B (en) Capacity adjustment method, device, equipment and medium of computer program module
CN117421331A (en) Data query optimization method, device, equipment and storage medium
CN117640729A (en) Flow control method, configuration information sending method, device and electronic equipment
CN114610575B (en) Method, apparatus, device and medium for calculating updated peak value of branch
US20230267060A1 (en) Performance testing method and apparatus, and storage medium
CN114598705B (en) Message load balancing method, device, equipment and medium
CN112532450B (en) Dynamic updating method and system for data stream distribution process configuration
CN117539719A (en) Application operation monitoring method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination