WO2018228323A1 - Service level control method and system for on-line service system, and readable storage medium - Google Patents

Publication number
WO2018228323A1
Authority
WO
WIPO (PCT)
Prior art keywords
service
quality
feature value
current
system state
Application number
PCT/CN2018/090613
Other languages
French (fr)
Chinese (zh)
Inventor
刘东辉
王俊杰
褚建辉
Original Assignee
广东神马搜索科技有限公司
Application filed by 广东神马搜索科技有限公司
Publication of WO2018228323A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5003 Managing SLA; Interaction between SLA and QoS
    • H04L41/5019 Ensuring fulfilment of SLA
    • H04L41/5032 Generating service level reports

Definitions

  • the present invention relates to the field of online services, and in particular, to a service level control method and corresponding system for an online service system, and a readable storage medium.
  • a typical feature of an online service system that provides services over the Internet is unstable traffic. Besides the obvious peak-and-valley pattern over time, the access traffic of an online service system is also affected by trending or special events. To keep its services stable, sufficient service capacity is usually reserved for the online service system. However, the reserved resources are sized only for the expected traffic peaks, so the system still faces a critical risk when part of the service system fails (for example, the cache module becomes unavailable), when an upstream or downstream service fails, or under a malicious attack. The usual way to cope with these situations is service degradation: releasing resources by sacrificing some quality of service, thereby preserving the basic service capability of the system.
  • existing degradation schemes usually set a threshold in advance. When the health state of the service system falls below the threshold, the system switches to degraded mode and provides only the most basic service; when the health state rises back above the threshold, the system returns to normal service.
  • such schemes are simple to implement, but because the state of the service system is unstable, the system is prone to jitter (repeatedly switching modes) near the threshold and may remain under high service pressure for a long time.
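As a rough illustration of this jitter, the sketch below models the existing single-threshold scheme. It assumes, for illustration only, that the health indicator is a pending-queue length (larger means worse) and uses made-up samples hovering around a hypothetical threshold of 100; every sample that crosses the threshold flips the mode.

```python
def single_threshold_modes(queue_lengths, threshold):
    """Existing scheme: degraded whenever the indicator crosses one threshold.

    The indicator here is a hypothetical pending-queue length (larger is worse),
    so the system is degraded whenever the queue length exceeds the threshold.
    """
    return ["degraded" if q > threshold else "normal" for q in queue_lengths]


def count_switches(modes):
    """Number of mode changes between consecutive samples."""
    return sum(1 for prev, cur in zip(modes, modes[1:]) if prev != cur)


# Made-up samples oscillating around a hypothetical threshold of 100.
samples = [90, 105, 98, 102, 97, 104, 99, 101]
modes = single_threshold_modes(samples, threshold=100)
print(count_switches(modes))  # 7
```

Eight samples that alternate across the threshold produce seven mode switches, which is the kind of oscillation the fast-degradation, slow-recovery strategy is meant to suppress.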
  • the main object of the present invention is to provide a service level control method and system for an online service system, and a readable storage medium, that keep the online service system relatively stable.
  • a service level control method for an online service system comprises: a service degradation step of, when it is determined that the system state of the online service system has declined, lowering the quality of service of the online service system to the level corresponding to the current system state; and a service recovery step of, when it is determined that the system state has risen, deferring restoration of the quality of service to the level corresponding to the current system state.
  • the service recovery step may include: when determining that the system status is rising, restoring the quality of service to a level lower than a level corresponding to the current system state. Therefore, while improving the quality of service, a certain service capability can be reserved for the online service system to continue to restore its system state.
  • the system state is characterized by a feature value of the current service capability, and the quality of service is divided into a plurality of levels corresponding to a plurality of feature value thresholds. The service degradation step may include: when it is determined that the system state has declined such that the current feature value satisfies a first feature value threshold, immediately lowering the quality of service of the online service system to the level corresponding to the first feature value threshold. The service recovery step may include: when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, deferring restoration of the quality of service to the level corresponding to the second feature value threshold.
  • the service degradation step and the service recovery step can be performed simply based on the relationship between the current feature value and the feature value threshold.
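A minimal sketch of mapping the current feature value to a quality of service level, assuming (hypothetically) that the feature value is a queue length where larger means worse, that level 0 is the best quality, and that reaching a threshold counts as satisfying it. The function name `level_for` and the threshold values are illustrative, not taken from the source.

```python
from bisect import bisect_right


def level_for(queue_length, thresholds):
    """Map a queue length to a QoS level (0 = best quality).

    `thresholds` must be sorted ascending. A queue length below thresholds[0]
    maps to level 0; reaching thresholds[i] but not thresholds[i + 1] maps to
    level i + 1; reaching the last threshold maps to the most degraded level.
    """
    return bisect_right(thresholds, queue_length)


# Hypothetical thresholds: level 1 from 50 pending requests, level 2 from 100.
THRESHOLDS = [50, 100]
print([level_for(q, THRESHOLDS) for q in (30, 50, 75, 120)])  # [0, 1, 1, 2]
```

`bisect_right` places a value equal to a threshold after it, which implements the "reaching the threshold degrades" convention assumed here.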
  • the service recovery step may include restoring the quality of service to a level corresponding to the second feature value threshold when determining that the system state is rising such that the current feature value satisfies the second feature value threshold for a predetermined length of time.
  • that is, the current feature value continuing to satisfy the second feature value threshold for a predetermined length of time is used as an additional condition. This prevents the online service system from remaining under high stress and gives it additional recovery time, which helps the system state recover quickly.
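The dwell-time condition can be sketched as a small state machine: degrade as soon as the feature value indicates a worse level, but restore only after the feature value has indicated a better level continuously for `hold_seconds`. The class name, the queue-length feature (larger is worse), the threshold values, and the choice to restore to whichever better level is indicated when the hold time elapses are all assumptions made for illustration.

```python
from bisect import bisect_right


class DwellTimeController:
    """Fast degradation; recovery deferred until a better state persists."""

    def __init__(self, thresholds, hold_seconds):
        self.thresholds = thresholds      # ascending queue-length thresholds
        self.hold_seconds = hold_seconds  # hypothetical predetermined duration
        self.level = 0                    # 0 = best quality of service
        self._better_since = None         # when a better state was first seen

    def update(self, queue_length, now):
        state_level = bisect_right(self.thresholds, queue_length)
        if state_level > self.level:
            # System state declined: degrade immediately.
            self.level = state_level
            self._better_since = None
        elif state_level < self.level:
            # System state rose: start (or keep running) the dwell timer.
            if self._better_since is None:
                self._better_since = now
            if now - self._better_since >= self.hold_seconds:
                self.level = state_level
                self._better_since = None
        else:
            # Back at the current level: the better state did not persist.
            self._better_since = None
        return self.level


ctrl = DwellTimeController(thresholds=[50, 100], hold_seconds=10)
samples = [(120, 0), (80, 1), (60, 5), (70, 11), (30, 12), (110, 13)]
print([ctrl.update(q, t) for q, t in samples])  # [2, 2, 2, 1, 1, 2]
```

The controller stays at level 2 until the queue has been below 100 for ten seconds, yet a single spike at t = 13 degrades it again at once.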
  • the service recovery step may include: when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold representing a higher quality level than the second feature value threshold, restoring the quality of service to the level corresponding to the second feature value threshold. In this way, while the quality of service improves, some service capacity is kept in reserve so that the online service system can continue to restore its system state.
  • the service recovery step may include: when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, restoring the quality of service to the best level only when the current feature value further satisfies a best service regression value, which represents better service than the second feature value threshold.
  • this ensures that the quality of service is restored to the best level only once the system state of the online service system has been determined to be stable in a state suitable for providing the best quality of service.
  • the feature value of the current service capability may preferably be the queue length currently required to be processed by the online service system.
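If, hypothetically, pending requests were buffered in a Python `queue.Queue`, the feature value could be sampled with `Queue.qsize()`. The standard library documents this size as approximate while other threads are putting or getting items, which is acceptable for a coarse health signal. The helper name is illustrative.

```python
import queue


def sample_feature_value(pending):
    """Return the number of requests currently waiting to be processed.

    `pending` is assumed to be a queue.Queue of incoming requests; qsize() is
    only approximate while other threads are adding or removing items.
    """
    return pending.qsize()


pending_requests = queue.Queue()
for request_id in range(3):
    pending_requests.put(request_id)
print(sample_feature_value(pending_requests))  # 3
```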
  • an online quality of service control system comprises a distributed service system for providing an online service and one or more quality control servers. The one or more quality control servers are configured to: obtain a feature value characterizing the current service capability of the distributed service system; when the current feature value indicates that the system state of the distributed service system has declined, immediately lower the quality of service of the distributed service system to the level corresponding to the current system state; and when the current feature value indicates that the system state has risen, defer restoring the quality of service to the level corresponding to the current system state.
  • the one or more quality control servers are further configured to: when the current feature value indicates that the system state rises, restore the quality of service to a level lower than a level corresponding to the current system state.
  • the quality of service is divided into a plurality of levels corresponding to a plurality of feature value thresholds, and the one or more quality control servers are further configured to: when it is determined that the system state has declined such that the current feature value satisfies a first feature value threshold, immediately lower the quality of service of the online service system to the level corresponding to the first feature value threshold; and when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, defer restoring the quality of service to the level corresponding to the second feature value threshold.
  • the one or more quality control servers are further configured to restore the quality of service to a level corresponding to the second feature value threshold when it is determined that the system state is raised such that the current feature value satisfies the second feature value threshold for a predetermined length of time.
  • the one or more quality control servers are further configured to restore the quality of service to the level corresponding to the second feature value threshold when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold representing a higher quality level than the second feature value threshold.
  • the one or more quality control servers are further configured to: when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, restore the quality of service to the best level only when the current feature value further satisfies a best service regression value representing better service than the second feature value threshold.
  • the feature value of the current service capability is the queue length currently required to be processed by the distributed service system.
  • the feature value of the current service capability is the length of the queue currently pending at one or more service modules of the distributed service system, and the one or more quality control servers adjust the quality of service level of the one or more service modules according to the feature value.
  • the service level control method and system and the readable storage medium of the present invention mainly involve a service degradation step and a service recovery step: the service degradation step is performed immediately in response to a decline in the system state of the online service system, whereas the service recovery step defers recovery in response to a rise in the system state. This "fast degradation, slow recovery" level control strategy prevents the adverse effects caused by jitter and repeated swings in the system state of the online service system. Moreover, the slow recovery gives the online service system more time to further restore its system state.
  • FIG. 1 is a schematic flowchart showing a service level control method of an online service system according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram showing a downgrade trigger and upgrade recovery using a specific embodiment of the present invention
  • FIG. 3 is a functional block diagram showing the structure of an online quality of service control system 300 in accordance with an embodiment of the present invention
  • FIG. 4 is a functional block diagram showing the structure of an online quality of service control system 300 in accordance with another embodiment of the present invention.
  • the online service system described in the present invention mainly refers to an Internet application system that provides online services, in particular a distributed online service system that includes a plurality of servers and is divided into a plurality of modules, such as an online information recommendation service system, an online search service system, and the like.
  • the quality of service provided by the online service system may be divided into multiple (two or more) levels according to the state of the system, wherein the system state of the online service system may indicate the service capability or health status of the online service system.
  • the quality of service at each of the divided levels matches the level of service that the online service system can provide or is adapted to provide in the system state corresponding to that level.
  • an increase in the state of the system indicates an increase in the service capability or health state of the online service system
  • a decrease in the state of the system indicates a decline in the service capability or health state of the online service system.
  • FIG. 1 is a schematic flowchart showing a service level control method of an online service system according to an embodiment of the present invention.
  • the service level control method of the online service system of the present invention mainly includes a service downgrading step (step S110) and a service recovery step (step S120).
  • step S110 when it is determined that the system state of the online service system is degraded, the quality of service of the online service system is lowered to a level corresponding to the current system state.
  • the system status may indicate the service capability or the health status of the online service system.
  • when the system state rises, this may indicate that the service capability of the online service system has increased or its health state has improved; when the system state declines, this may indicate that its service capability has decreased or its health state has deteriorated. Therefore, the rise or fall of the system state can be judged from changes in the health state or service capability of the online service system.
  • the rise or fall of the system state may be determined according to various manners such as the service capability of the online service system or the change trend and degree of change of the health state.
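  • for example, a feature value that characterizes the service capability of the online service system (e.g., the length of the queue to be processed) may be selected.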
  • the system state can be directly determined by comparing the current feature value with a preset or real-time adjusted threshold.
  • depending on the feature value chosen, an increase in the feature value may indicate that the system state is rising (for example, when the feature value is the CPU idle rate), or a decrease in the feature value may indicate that the system state is rising (for example, when the feature value is the length of the pending queue mentioned above).
  • the present invention does not limit whether the feature value is positively or negatively related to the system state, as long as changes in the feature value reflect changes in the system state.
  • a combination of multiple feature values can be selected to determine the state of the system from more dimensions.
  • the corresponding threshold(s) can be set for the CPU idle rate and the queue length to be processed, respectively, and the rise or fall of the system state is determined as a whole according to the relationship between the current values of the two and their respective thresholds.
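One way to combine the two dimensions, sketched under assumptions: each feature value is mapped to its own level using its own thresholds (the CPU idle rate, where larger is better, and the pending queue length, where larger is worse), and the overall level is taken as the worse of the two. The worst-of combination rule and all threshold values are hypothetical; the source leaves the combination method open.

```python
def level_from_queue(queue_length, queue_thresholds):
    # Larger queue length = worse state: count the thresholds already reached.
    return sum(1 for t in queue_thresholds if queue_length >= t)


def level_from_cpu_idle(idle_ratio, idle_thresholds):
    # Larger CPU idle ratio = better state: count the thresholds it is below.
    return sum(1 for t in idle_thresholds if idle_ratio < t)


def combined_level(idle_ratio, queue_length, idle_thresholds, queue_thresholds):
    # Hypothetical combination rule: the overall state is the worse of the two.
    return max(
        level_from_cpu_idle(idle_ratio, idle_thresholds),
        level_from_queue(queue_length, queue_thresholds),
    )


# Hypothetical thresholds: CPU idle 50% / 20%, queue length 50 / 100.
IDLE, QUEUE = [0.5, 0.2], [50, 100]
print(combined_level(0.7, 10, IDLE, QUEUE))   # 0
print(combined_level(0.3, 10, IDLE, QUEUE))   # 1
print(combined_level(0.7, 120, IDLE, QUEUE))  # 2
```

Mapping both dimensions onto the same level scale first makes the opposite directions of the two feature values (noted above) irrelevant to the combination step.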
  • more or other dimensions may also be introduced; the present invention is not limited in this respect.
  • in step S120, when it is determined that the system state has risen, restoration of the quality of service to the level corresponding to the current system state is deferred.
  • the system state of an online service system often fluctuates up and down. If the quality of service were restored to the level corresponding to the current system state as soon as the system state was judged to have risen, the online service system could remain under high service pressure, which would hinder the recovery of its system state.
  • a deferred recovery operation can therefore be performed, postponing restoration of the quality of service to the level corresponding to the current system state, which effectively avoids the adverse effects of jitter and repeated swings in the system state.
  • the deferred recovery operation described here may have certain additional conditions; that is, the quality of service may be restored to the level corresponding to the current system state only after the system state has risen and those additional conditions are met.
  • the additional condition may be that a predetermined time threshold is exceeded; that is, the quality of service may be restored to the level corresponding to the current system state after the system state has risen and the predetermined time threshold has elapsed.
  • the deferred recovery operation may also have other additional conditions, which are detailed below.
  • the quality of service may be restored to the level corresponding to the system state of the online service system as it was before the deferred recovery operation, or to the level corresponding to the system state as it is after the deferred recovery operation is completed.
  • to let the system state of the online service system recover well, after the system state is judged to have risen and the deferred recovery operation is performed, the quality of service may be restored to a level one or several levels lower than the level corresponding to the current system state. In this way, the quality of service of the online service system improves while the subsequent recovery of its system state is also supported.
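The "one or several levels lower" rule reduces to a one-line calculation. In this sketch, level 0 is the best quality (larger numbers mean lower quality), and `margin` is a hypothetical parameter for how many levels are held in reserve. With this rule alone, level 0 is never reached from level 1, so the sketch presumes that returning to the best level is handled by a separate condition such as the best service regression value.

```python
def recovery_target(current_level, state_level, margin=1):
    """Level to restore to after the system state has risen (0 = best).

    Quality is restored only to `margin` levels below what the current system
    state would allow (a larger number means lower quality), and the result is
    never worse than the current level.
    """
    return min(current_level, state_level + margin)


print(recovery_target(current_level=3, state_level=1))  # 2
print(recovery_target(current_level=2, state_level=1))  # 2 (no change yet)
```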
  • the current system state here may refer to the system state of the online service system before the deferred recovery operation, or to its system state after the deferred recovery operation is completed.
  • the service level control method of the present invention mainly includes a service degradation step, performed immediately in response to a decline in the system state of the online service system, and a service recovery step, which defers recovery in response to a rise in the system state. This "fast degradation, slow recovery" level control strategy prevents the adverse effects caused by jitter and repeated swings in the system state, and the slow recovery also gives the online service system more time to further restore its system state.
  • although the service degradation step S110 is described before the service recovery step S120, it should be understood that the numbering and order of description of these steps do not limit the order in which degradation and recovery occur.
  • the system state of the online service system may change in many ways, such as rising then falling, falling then rising, rising by several levels in succession, falling by several levels in succession, or fluctuating up and down. Accordingly, when the service level control method of the present invention is used to control the service level of the online service system, the corresponding degradation or recovery operations are performed according to the actual changes in the system state.
  • the system state of the online service system may be characterized by the feature value of the current service capability, and accordingly, the quality of service may be divided into multiple levels corresponding to the plurality of feature value thresholds.
  • the online service system can be any of various types of Internet application systems, and the feature value used to characterize its service capability may differ depending on the system.
  • for example, the length of the queue to be processed may be used as the feature value characterizing the system state; the currently available resources of the system, the number of idle servers, and the like may also be used.
  • the service degradation step and the service recovery step may be performed according to the relationship between the current feature value and the feature value threshold.
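  • when it is determined that the system state has declined such that the current feature value satisfies a first feature value threshold, the quality of service of the online service system may be immediately lowered to the level corresponding to the first feature value threshold.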
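  • when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, restoration of the quality of service to the level corresponding to the second feature value threshold is deferred.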
  • the feature value and the system state may be positively or negatively correlated; that is, either a larger feature value or a smaller feature value may indicate a better system state.
  • the first feature value threshold and the second feature value threshold may be the same or different.
  • the deferred recovery operation in the service recovery step may have certain additional conditions.
  • the following additional conditions can be set.
  • the quality of service is restored to a level corresponding to the second feature value threshold when it is determined that the system state is raised such that the current feature value satisfies the second feature value threshold for a predetermined length of time.
  • that is, the current feature value continuing to satisfy the second feature value threshold for a predetermined length of time serves as an additional condition. This keeps the online service system from remaining under high stress and also helps its system state recover quickly.
  • the third feature value threshold may be a feature value threshold one or more quality levels higher than the second feature value threshold.
  • this leaves the online service system spare service capacity with which to improve its state. For example, when the feature value is the length of the queue currently pending at the online service system, the system can use idle resources to process the tasks in the queue and thus reduce the queue length.
  • when the second feature value threshold corresponds to the best quality of service level, the best service regression value is not used to divide quality levels; it serves only as a threshold indicating that the online service system can be restored to the best quality.
  • FIG. 2 is a schematic diagram showing a downgrade trigger and upgrade recovery using a particular embodiment of the present invention.
  • the abscissa is the time axis and the ordinate is the feature value.
  • three feature value thresholds a, b, and c can be set.
  • the feature value may characterize the length of the request queue that the online service system currently needs to process.
  • when the request queue length is less than feature value threshold b, the online service system is in a normal service state and can provide the best quality of service externally.
  • the service level control method in this embodiment uses the request queue length as the indicator of system service capability, defines multi-level thresholds, and uses different thresholds for triggering and for leaving the degraded state. This achieves fast triggering and slow recovery of degradation, keeps the system from lingering near high service pressure, and reduces system risk.
  • thresholds b and c divide the service levels, whereas threshold a is not used for the actual division of system service levels: when the queue length drops to b, the service capability of the system could already be considered restored, but a serves as the best service regression value. In other words, only after the queue length has fallen back to a can the service capability of the system be considered "safely" fully restored.
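The FIG. 2 behavior can be sketched as follows. The numeric values of a, b and c are hypothetical, and the recovery rule (leave a level only after the queue length crosses the next better threshold, with a acting as the best service regression value) is an interpretation assembled from the third-threshold and regression-value descriptions above, not a reproduction of the figure.

```python
class Fig2Controller:
    """Fast-trigger, slow-recovery control with queue-length thresholds a < b < c.

    Levels: 0 = best quality, 1 = degraded (queue >= b), 2 = further degraded
    (queue >= c). Recovery uses the next better threshold: level 2 is left only
    once the queue drops below b, and level 1 only once it drops below a.
    """

    def __init__(self, a, b, c):
        if not a < b < c:
            raise ValueError("expected a < b < c")
        self.trigger = (b, c)  # thresholds that divide the service levels
        self.recover = (a, b)  # thresholds that must be crossed to recover
        self.level = 0

    def update(self, queue_length):
        trigger_level = sum(1 for t in self.trigger if queue_length >= t)
        if trigger_level > self.level:
            self.level = trigger_level  # degrade immediately
        else:
            recover_level = sum(1 for t in self.recover if queue_length >= t)
            self.level = min(self.level, recover_level)  # recover slowly
        return self.level


ctrl = Fig2Controller(a=20, b=50, c=100)
trace = [ctrl.update(q) for q in (10, 60, 120, 80, 45, 30, 15)]
print(trace)  # [0, 1, 2, 2, 1, 1, 0]
```

Queue lengths of 45 and 30 keep the system at level 1 even though they are already below b; a single-threshold scheme would have restored the best quality at that point.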
  • FIG. 3 is a functional block diagram showing the structure of an online quality of service control system 300 in accordance with an embodiment of the present invention.
  • the functional modules of the online service quality control system 300 can be implemented by hardware, software or a combination of hardware and software that implements the principles of the present invention.
  • the functional modules described in FIG. 3 can be combined or divided into sub-modules to implement the principles of the above invention. Accordingly, the description herein may support any possible combination, or division, or further limitation of the functional modules described herein.
  • the online quality of service control system 300 shown in FIG. 3 can be used to implement the service level control method described with reference to FIG. 1 and FIG. 2. Only the functional modules that system 300 may have and the operations each functional module may perform are briefly described here; for details, refer to the description above in conjunction with FIG. 1 and FIG. 2, which is not repeated.
  • the online quality of service control system 300 includes a distributed service system 310 for providing online services and a quality control server 320.
  • the distributed service system 310 described herein can be equivalent to the online service system described above.
  • the quality control server 320 can obtain the feature value characterizing the current service capability of the distributed service system 310. When the current feature value indicates that the system state of the distributed service system 310 has declined, it immediately lowers the quality of service of the distributed service system 310 to the level corresponding to the current system state; when the current feature value indicates that the system state has risen, it defers restoring the quality of service to the level corresponding to the current system state.
  • the online quality of service control system 300 incorporating the quality control server 320 can also be viewed as a distributed online service system with quality control functionality.
  • one or more quality control servers 320 may also restore the quality of service to a level lower than the level corresponding to the current system state when the current feature value characterizes the system state rise.
  • the quality of service may be divided into a plurality of levels corresponding to a plurality of feature value thresholds, and the one or more quality control servers 320 may be further configured to: when it is determined that the system state has declined such that the current feature value satisfies a first feature value threshold, immediately lower the quality of service of the online service system to the level corresponding to the first feature value threshold; and when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, defer restoring the quality of service to the level corresponding to the second feature value threshold.
  • the one or more quality control servers 320 may be further configured to restore the quality of service to the level corresponding to the second feature value threshold when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold for a predetermined length of time.
  • the one or more quality control servers 320 may be further configured to restore the quality of service to the level corresponding to the second feature value threshold when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold representing a higher quality level than the second feature value threshold.
  • the one or more quality control servers 320 may be further configured to: when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, restore the quality of service to the best level only when the current feature value further satisfies a best service regression value representing better service than the second feature value threshold.
  • the feature value of the current service capability may be the queue length that the distributed service system 310 currently needs to process.
  • FIG. 4 is a schematic block diagram showing the structure of an online quality of service control system in accordance with another embodiment of the present invention.
  • the distributed service system 310 can be refined to include a plurality of service modules, and the quality control server 320 can be connected to the service modules in the distributed service system 310. FIG. 4 shows the case in which the online quality of service control system 300 includes one quality control server 320.
  • the online quality of service control system 300 can also include a plurality of quality control servers 320, which can be associated with multiple service modules in the distributed service system 310.
  • each quality control server 320 can adjust its quality of service level according to the current system state of its corresponding distributed service module.
  • the feature value of the current service capability may be the length of the queue currently pending at one or more service modules of the distributed service system 310, and each quality control server 320 may adjust the quality of service level of its corresponding service module according to the feature value.
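For the multi-module arrangement, a quality control server could keep one independent controller per service module. In this sketch the module names, the threshold values, and the per-module rule (degrade at a trigger threshold, recover only below a lower recovery threshold) are hypothetical illustrations rather than details given in the source.

```python
class ModuleController:
    """Per-module fast-degrade / slow-recover controller (hypothetical sketch)."""

    def __init__(self, trigger, recover):
        # trigger[i]: queue length that degrades the module to level i + 1.
        # recover[i]: queue length below which the module may leave level i + 1.
        if len(trigger) != len(recover) or any(r >= t for r, t in zip(recover, trigger)):
            raise ValueError("each recovery threshold must be below its trigger")
        self.trigger, self.recover, self.level = trigger, recover, 0

    def update(self, queue_length):
        down = sum(1 for t in self.trigger if queue_length >= t)
        up = sum(1 for r in self.recover if queue_length >= r)
        # Degrade immediately; otherwise improve only as far as `up` allows.
        self.level = down if down > self.level else min(self.level, up)
        return self.level


class QualityControlServer:
    """Holds one controller per service module and adjusts each independently."""

    def __init__(self, module_thresholds):
        self.controllers = {
            name: ModuleController(trigger, recover)
            for name, (trigger, recover) in module_thresholds.items()
        }

    def report(self, module, queue_length):
        return self.controllers[module].update(queue_length)


server = QualityControlServer({
    "retrieval": ([50, 100], [20, 50]),  # hypothetical module and thresholds
    "ranking": ([200, 400], [100, 200]),
})
print(server.report("retrieval", 120))  # 2
print(server.report("ranking", 150))    # 0
print(server.report("retrieval", 45))   # 1
```

Each module's level depends only on its own queue length, so a backlog in one module does not force degradation of another.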
  • the technology in the embodiments of the present invention can be implemented by software plus the necessary general-purpose hardware, including general-purpose integrated circuits, general-purpose CPUs, general-purpose memories, general-purpose components, and the like. It can also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like, but in many cases the former is the better implementation. Based on this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product, which may be stored in a storage medium such as a read-only memory (ROM), a random access memory (RAM), or a compact disc (CD).
  • the above technical concept of the present invention can also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium) on which executable code (or computer program/computer instruction code) is stored.
  • when the executable code (or computer program/computer instruction code) is executed by a processor, the processor is caused to perform the service level control method described above.
  • the above technical concept of the present invention can also be embodied as a computing device including a processor and a non-transitory machine readable storage medium (or computer readable storage medium).
  • the non-transitory machine readable storage medium stores executable code (or computer program/computer instruction code).
  • when the executable code (or computer program/computer instruction code) is executed by the processor, the processor is caused to perform the service level control method described above.
  • the method according to the invention can also be implemented as a computer program comprising computer program code instructions for performing the various steps defined above in the above method of the invention.
  • the method according to the invention may also be embodied as a computer program product comprising a computer-readable medium on which is stored a computer program for performing the above-described functions defined in the above method of the invention.
  • the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
  • each block of the flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions.
  • the functions noted in the blocks may also occur in an order different from that shown in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed in the present invention are a service level control method and system for an online service system, and a readable storage medium. The service level control method comprises: a service degradation step of, when it is determined that the system state of the online service system has dropped, lowering the quality of service of the online service system to the level corresponding to the current system state; and a service recovery step of, when it is determined that the system state has risen, deferring restoration of the quality of service to the level corresponding to the current system state. By adopting a fast-degradation, slow-recovery level control policy, the present invention avoids adverse effects caused by jitter and repeated fluctuation of the system state of the online service system, and also gives the online service system more time to further recover its system state.

Description

Service level control method and system for an online service system, and readable storage medium
Technical field
The present invention relates to the field of online services, and in particular to a service level control method for an online service system, a corresponding system, and a readable storage medium.
Background
A typical feature of online service systems that provide services over the Internet is unstable traffic. Besides a pronounced peak-and-valley pattern over time, the access traffic of an online service system is also affected by trending events and special promotions. To keep its services stable, an online service system usually needs to reserve sufficient service capacity. However, the reserved resources are sized only for the expected traffic peaks, so the system still faces a fatal risk in unexpected situations such as the failure of a local module (for example, the cache module becoming invalid), a failure of an upstream or downstream service, or a malicious attack. The usual way to cope with these situations is service degradation: sacrificing some quality of service to free resources and thereby preserve the system's basic service capability.
Existing degradation schemes usually preset a single threshold. When the health of the service system falls below the threshold, the system switches to a degraded mode and provides only the most basic service; when the health rises back above the threshold, the system returns to normal service. Such schemes are simple to implement, but because the service system is itself unstable, they tend to make the system oscillate around the threshold, keeping it under high service pressure for a long time.
Therefore, a new service level control method and system for online service systems is needed to solve at least one of the above problems.
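The oscillation described above can be reproduced with a toy model of the single-threshold scheme. This is a minimal sketch, not taken from the patent: the health score, the 0.5 threshold, and the sample values are made-up assumptions chosen to jitter around the threshold.

```python
from typing import List, Tuple

def single_threshold_mode(health: float, threshold: float) -> str:
    # Existing scheme: degrade as soon as health is below the threshold,
    # restore as soon as it is back at or above it.
    return "degraded" if health < threshold else "normal"

def run_single_threshold(samples: List[float], threshold: float) -> Tuple[List[str], int]:
    """Return the mode chosen for each health sample and the number of mode switches."""
    modes = [single_threshold_mode(h, threshold) for h in samples]
    switches = sum(1 for prev, curr in zip(modes, modes[1:]) if prev != curr)
    return modes, switches

# Hypothetical health samples jittering around a threshold of 0.5.
modes, switches = run_single_threshold([0.62, 0.48, 0.52, 0.49, 0.51, 0.47, 0.53], 0.5)
```

With these samples every sample after the first flips the mode, giving six switches in seven samples; each return to normal service immediately re-exposes the system to full load.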
Summary of the invention
The main object of the present invention is to provide a service level control method and system for an online service system, and a readable storage medium, that control the service level relatively smoothly.
According to one aspect of the present invention, a service level control method for an online service system is provided, in which the quality of service provided by the online service system is divided into a plurality of levels according to the system state. The method includes: a service degradation step of, when it is determined that the system state of the online service system has dropped, lowering the quality of service of the online service system to the level corresponding to the current system state; and a service recovery step of, when it is determined that the system state has risen, deferring restoration of the quality of service to the level corresponding to the current system state.
Thus, by adopting a fast-degradation, slow-recovery level control policy, adverse effects caused by jitter and repeated fluctuation of the system state of the online service system can be avoided, and the online service system also gains more time to further recover its system state.
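The asymmetry between the two steps can be sketched as a small controller. This is a minimal illustration under assumptions the source does not state: levels are integers where a larger number means better quality, the target level for the current system state is computed elsewhere, and the deferral condition is a pluggable, hypothetical `may_recover` callback.

```python
from typing import Callable

class ServiceLevelController:
    """Fast degradation, deferred recovery. Higher level = better quality of service."""

    def __init__(self, initial_level: int, may_recover: Callable[[int, int], bool]):
        self.level = initial_level
        # Hypothetical hook: decides whether a recovery from `current` to `target` may happen now.
        self.may_recover = may_recover

    def update(self, target_level: int) -> int:
        """Apply the level that matches the current system state, asymmetrically."""
        if target_level < self.level:
            # System state dropped: lower the quality of service at once, no extra conditions.
            self.level = target_level
        elif target_level > self.level and self.may_recover(self.level, target_level):
            # System state rose and the deferral condition is satisfied: restore.
            self.level = target_level
        return self.level
```

For example, with `may_recover=lambda current, target: False` a controller starting at level 3 drops to level 1 on `update(1)` and stays there on `update(3)`; the conditions described below would be plugged in through the callback.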
Preferably, the service recovery step may include: when it is determined that the system state has risen, restoring the quality of service to a level lower than the level corresponding to the current system state. In this way, while the quality of service is raised, some service capacity is still held in reserve so that the online service system can continue to recover its system state.
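A minimal sketch of "recover, but to a lower level than the current state would allow", assuming integer levels (larger means better) and a hypothetical fixed `margin` of levels kept in reserve:

```python
def recovery_level(current_level: int, target_level: int, margin: int = 1) -> int:
    """Level to apply once a recovery is allowed (higher number = better quality).

    Instead of jumping to `target_level`, stop `margin` levels short of it so that
    some capacity stays in reserve, but never go below the level already in effect.
    """
    if target_level <= current_level:
        return target_level  # not a recovery: degradation is applied as-is
    return max(current_level, target_level - margin)
```

Note that a fixed margin alone would never let the service climb back to the best level; some other rule, such as the best-service regression value condition in this document, is needed for the final step.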
Preferably, the system state is characterized by a feature value of the current service capability, and the quality of service is divided into a plurality of levels corresponding to a plurality of feature value thresholds. The service degradation step may include: when it is determined that the system state has dropped such that the current feature value satisfies a first feature value threshold, immediately lowering the quality of service of the online service system to the level corresponding to the first feature value threshold. The service recovery step may include: when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, deferring restoration of the quality of service to the level corresponding to the second feature value threshold. Because the system state is characterized by a feature value, both steps can be performed simply by comparing the current feature value with the feature value thresholds.
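A minimal sketch of the threshold mapping, assuming the feature value is a queue length (lower is healthier, as in the embodiment described later), three hypothetical thresholds, and four levels numbered so that a larger number means better quality. It only classifies each observation; when a deferred recovery is actually carried out depends on the additional conditions that follow.

```python
from typing import Tuple

# Hypothetical thresholds: maximum queue length allowed for levels 3, 2 and 1.
# Level 3 is the best quality of service; level 0 is the most basic service.
QUEUE_BOUNDS = (100, 300, 600)

def level_for(queue_len: int, bounds: Tuple[int, ...] = QUEUE_BOUNDS) -> int:
    """Map the current feature value to the best level whose threshold it satisfies."""
    top = len(bounds)
    for i, bound in enumerate(bounds):
        if queue_len <= bound:
            return top - i
    return 0

def classify(current_level: int, queue_len: int) -> Tuple[str, int]:
    """Decide what the current feature value implies for the level in effect."""
    target = level_for(queue_len)
    if target < current_level:
        return "degrade_now", target       # state dropped to a lower level's threshold: act at once
    if target > current_level:
        return "recover_deferred", target  # state reached a higher level's threshold: wait
    return "hold", target
```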
Preferably, the service recovery step may include: restoring the quality of service to the level corresponding to the second feature value threshold only after it is determined that the system state has risen such that the current feature value has satisfied the second feature value threshold for a predetermined length of time.
Thus, to counter the up-and-down jitter of the system state, the condition that the current feature value satisfies the second feature value threshold for a predetermined length of time can be used as an additional condition. This keeps the online service system from remaining under high pressure and gives it extra recovery time, which helps its system state recover quickly.
Preferably, the service recovery step may include: restoring the quality of service to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold, which corresponds to a higher quality level than the second feature value threshold. In this way, while the quality of service is raised, some service capacity is still held in reserve so that the online service system can continue to recover its system state.
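The third-threshold condition can be sketched by requiring, before recovering to a level, that the feature value already meet the threshold of the next better level. This assumes hypothetical queue-length thresholds, integer levels where a larger number is better, and "next better level" as the choice of third threshold; the source only says the third threshold corresponds to a higher quality level.

```python
# Hypothetical thresholds: maximum queue length allowed for levels 3, 2 and 1
# (level 3 = best quality of service, level 0 = most basic service).
QUEUE_BOUNDS = (100, 300, 600)

def threshold_of(level: int, bounds=QUEUE_BOUNDS) -> float:
    """Queue-length threshold of a level; level 0 has no limit."""
    if level == 0:
        return float("inf")
    return bounds[len(bounds) - level]

def may_recover_to(target_level: int, queue_len: int, bounds=QUEUE_BOUNDS) -> bool:
    """Allow recovery to `target_level` only if the queue length already meets the
    stricter threshold of the next better level (used here as the third threshold)."""
    better = target_level + 1
    if better > len(bounds):
        return False  # no better level exists; the best level needs a separate rule
    return queue_len <= threshold_of(better, bounds)
```

With these numbers, a queue of 250 satisfies level 2's own threshold, but recovery to level 2 is only allowed once the queue is at most 100.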
Preferably, the service recovery step may include: when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, restoring the quality of service to the best quality of service level only if the current feature value further satisfies a best-service regression value, which represents better service quality than the second feature value threshold.
Thus, by checking whether the current feature value satisfies the best-service regression value, it can be determined whether the system state of the online service system has stabilized in a state suitable for providing the best quality of service level; the quality of service is restored to the best level only once the system state is judged to have stabilized there.
The feature value of the current service capability may preferably be the length of the queue that the online service system currently needs to process.
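A minimal sketch of the best-service regression check, assuming the feature value is a queue length (lower is healthier) and using made-up numbers for the best level's threshold and the stricter regression value. It covers only recovery into the best level; degradation and recovery to lower levels are handled by the other rules.

```python
TOP_LEVEL = 3           # best quality of service level (larger = better)
BEST_BOUND = 100        # hypothetical second threshold for the best level (max queue length)
BEST_REGRESSION = 40    # hypothetical best-service regression value, stricter than BEST_BOUND

def restore_to_best(current_level: int, queue_len: int) -> int:
    """Recovery check for the best level only (queue length: lower is healthier)."""
    if current_level >= TOP_LEVEL or queue_len > BEST_BOUND:
        return current_level  # already at best, or the best-level threshold is not met
    if queue_len <= BEST_REGRESSION:
        return TOP_LEVEL      # state has settled well inside the best-service region
    return current_level      # threshold met but regression value not yet: keep waiting
```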
According to another aspect of the present invention, an online quality of service control system is also provided, including a distributed service system for providing online services and one or more quality control servers. The one or more quality control servers are configured to: obtain a feature value characterizing the current service capability of the distributed service system; when the current feature value indicates that the system state of the distributed service system has dropped, immediately lower the quality of service of the distributed service system to the level corresponding to the current system state; and when the current feature value indicates that the system state has risen, defer restoration of the quality of service to the level corresponding to the current system state.
Preferably, the one or more quality control servers are further configured to: when the current feature value indicates that the system state has risen, restore the quality of service to a level lower than the level corresponding to the current system state.
Preferably, the quality of service is divided into a plurality of levels corresponding to a plurality of feature value thresholds, and the one or more quality control servers are further configured to: when it is determined that the system state has dropped such that the current feature value satisfies a first feature value threshold, immediately lower the quality of service of the online service system to the level corresponding to the first feature value threshold; and when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, defer restoration of the quality of service to the level corresponding to the second feature value threshold.
Preferably, the one or more quality control servers are further configured to restore the quality of service to the level corresponding to the second feature value threshold only after it is determined that the system state has risen such that the current feature value has satisfied the second feature value threshold for a predetermined length of time.
Preferably, the one or more quality control servers are further configured to restore the quality of service to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold, which corresponds to a higher quality level than the second feature value threshold.
Preferably, the one or more quality control servers are further configured to: when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, restore the quality of service to the best quality of service level only if the current feature value further satisfies a best-service regression value, which represents better service quality than the second feature value threshold.
Preferably, the feature value of the current service capability is the length of the queue that the distributed service system currently needs to process.
Preferably, the feature value of the current service capability is the length of the queue that one or more service modules of the distributed service system currently need to process, and the one or more quality control servers adjust the quality of service levels of the one or more service modules according to the feature value.
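A minimal sketch of a quality control server that keeps a separate level per service module. The assumptions are all hypothetical: each module reports its own pending-queue length, the same thresholds apply to every module, and recovery is deferred until a better state has been observed for a fixed number of consecutive samples. The module names in the usage are invented.

```python
from dataclasses import dataclass, field
from typing import Dict

BOUNDS = (100, 300, 600)  # hypothetical max queue length for levels 3, 2, 1 (3 = best)

def level_for(queue_len: int) -> int:
    top = len(BOUNDS)
    for i, bound in enumerate(BOUNDS):
        if queue_len <= bound:
            return top - i
    return 0

@dataclass
class ModuleState:
    level: int = len(BOUNDS)  # start at the best quality of service
    pending: int = 0          # consecutive samples seen with a better target level

@dataclass
class QualityControlServer:
    """Adjusts each service module's level independently from its own queue length."""
    recover_after: int = 3    # samples a better state must persist before recovering
    modules: Dict[str, ModuleState] = field(default_factory=dict)

    def observe(self, module: str, queue_len: int) -> int:
        state = self.modules.setdefault(module, ModuleState())
        target = level_for(queue_len)
        if target < state.level:
            state.level, state.pending = target, 0      # degrade this module immediately
        elif target > state.level:
            state.pending += 1
            if state.pending >= self.recover_after:
                state.level, state.pending = target, 0  # deferred recovery
        else:
            state.pending = 0
        return state.level
```

For example, after `observe("recall", 700)` degrades the hypothetical `recall` module to level 0, three further samples of 50 are needed before it returns to level 3, while a `rank` module reporting 50 stays at level 3 throughout.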
In summary, the service level control method and system and the readable storage medium of the present invention mainly involve a service degradation step and a service recovery step. The service degradation step may be performed immediately in response to a drop in the system state of the online service system, while the service recovery step defers recovery in response to a rise in the system state. By adopting this fast-degradation, slow-recovery level control policy, adverse effects caused by jitter and repeated fluctuation of the system state of the online service system can be avoided. Slow recovery also gives the online service system more time to further recover its system state.
Brief description of the drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent from the following more detailed description of exemplary embodiments of the present disclosure with reference to the accompanying drawings, in which the same reference numerals generally denote the same components.
FIG. 1 is a schematic flowchart of a service level control method for an online service system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of degradation triggering and upgrade recovery in a specific embodiment of the present invention;
FIG. 3 is a functional block diagram of the structure of an online quality of service control system 300 according to an embodiment of the present invention;
FIG. 4 is a functional block diagram of the structure of an online quality of service control system 300 according to another embodiment of the present invention.
Detailed description
Preferred embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show preferred embodiments of the present disclosure, it should be understood that the present disclosure may be implemented in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The online service system referred to in the present invention mainly means an Internet application system that provides online services, in particular a distributed online service system that includes many servers and is divided into several modules, such as an online information recommendation service system or an online search service system.
The quality of service provided by the online service system can be divided into multiple (two or more) levels according to the system state, where the system state of the online service system indicates its service capability or health status. The quality of service at each level matches the service level that the online service system can provide, or is suited to provide, in the system state corresponding to that level. In the present invention, a rise in the system state indicates an increase in the service capability or health of the online service system, and a drop in the system state indicates a decrease.
FIG. 1 is a schematic flowchart of a service level control method for an online service system according to an embodiment of the present invention.
As shown in FIG. 1, the service level control method of the present invention mainly includes a service degradation step (step S110) and a service recovery step (step S120).
In step S110, when it is determined that the system state of the online service system has dropped, the quality of service of the online service system is lowered to the level corresponding to the current system state.
As described above, the system state can indicate the service capability or health status of the online service system. A rise in the system state can indicate that the service capability has increased or the health status has improved, and a drop can indicate that the service capability has decreased or the health status has deteriorated. Therefore, whether the system state has risen or dropped can be judged from changes in the health status or service capability of the online service system.
In a specific implementation, whether the system state has risen or dropped can be judged in various ways, for example from the trend or magnitude of change in the service capability or health status of the online service system. In one embodiment, a single feature value that characterizes the system state (for example, the length of the queue to be processed) can simply be selected, and the system state is judged directly by comparing the current feature value with a threshold that is either preset or adjusted in real time. Depending on the feature value chosen, either an increase in the feature value indicates that the system state has risen (for example, when the feature value is the CPU idle rate), or a decrease does (for example, the queue length mentioned above). The present invention does not restrict whether the feature value is directly or inversely related to the system state, as long as its changes reflect changes in the system state.
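Because the direction of the relationship is left open, an implementation has to know, for each feature, which side of a threshold is healthy. A minimal sketch, assuming the direction is supplied as a hypothetical `higher_is_better` flag (true for a CPU idle rate, false for a pending-queue length):

```python
def meets(value: float, threshold: float, higher_is_better: bool) -> bool:
    """True if the feature value is on the healthy side of the threshold."""
    return value >= threshold if higher_is_better else value <= threshold

def state_rose(previous: float, current: float, higher_is_better: bool) -> bool:
    """True if the change in the feature value means the system state improved."""
    return current > previous if higher_is_better else current < previous
```

Keeping the direction as data rather than hard-coding it lets the same threshold logic serve either kind of feature value.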
In another embodiment, changes in a combination of several feature values can be used to judge the system state along more dimensions. For example, one or more thresholds can be set separately for the CPU idle rate and for the length of the queue to be processed, and whether the system state has risen or dropped overall is judged from how the current value of each compares with its own thresholds. Other embodiments may introduce more or different dimensions, which the present invention does not restrict.
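One way to combine the two signals is sketched below. The thresholds are made up, and the combination rule (the weaker dimension decides the overall level, so either signal alone can trigger degradation) is an assumption, not the rule given in the source.

```python
CPU_IDLE_FLOORS = (0.5, 0.3, 0.1)  # hypothetical min CPU idle ratio for levels 3, 2, 1 (higher is better)
QUEUE_BOUNDS = (100, 300, 600)     # hypothetical max queue length for levels 3, 2, 1 (lower is better)

def level_from(value: float, limits, higher_is_better: bool) -> int:
    """Best level (3 = best, 0 = basic) whose limit the value satisfies."""
    top = len(limits)
    for i, limit in enumerate(limits):
        ok = value >= limit if higher_is_better else value <= limit
        if ok:
            return top - i
    return 0

def combined_level(cpu_idle: float, queue_len: int) -> int:
    # Assumption: the more pessimistic dimension determines the overall level.
    return min(level_from(cpu_idle, CPU_IDLE_FLOORS, True),
               level_from(queue_len, QUEUE_BOUNDS, False))
```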
When it is determined that the system state of the online service system has dropped, the current service capability of the online service system has decreased, and the quality of service can immediately be lowered to the level corresponding to the current system state. Here, "immediately" means degradation without any additional conditions.
In step S120, when it is determined that the system state has risen, restoration of the quality of service to the level corresponding to the current system state is deferred.
As described above, because the traffic of an online service system is unstable, its system state often jitters up and down. If the quality of service were restored to the level corresponding to the current system state as soon as a rise is detected, the online service system could remain under high service pressure, which would hinder the recovery of its system state.
For this reason, when it is determined that the system state has risen, a deferred recovery operation can be performed, postponing restoration of the quality of service to the level corresponding to the current system state. This effectively avoids the adverse effects of jitter and repeated fluctuation of the system state.
The deferred recovery operation described here can carry certain additional conditions; that is, the quality of service is restored to the level corresponding to the current system state only after the system state has risen and specific additional conditions are met. For example, the additional condition can be that a predetermined time threshold has been exceeded, so that the quality of service is restored only after the system state has risen and the predetermined time threshold has been exceeded. The deferred recovery operation can also carry other additional conditions, which are described in detail below.
It should be noted that, after the deferred recovery operation completes, the quality of service can be restored either to the level corresponding to the current system state of the online service system before the deferred recovery operation was performed, or to the level corresponding to its current system state after the deferred recovery operation completes.
In addition, to relieve the service pressure on the online service system so that its system state can recover well, after it is determined that the system state has risen and the deferred recovery operation has been performed, the quality of service can be restored to a level one or more levels below the level corresponding to the current system state. This raises the quality of service of the online service system while also helping its system state recover further. The current system state here can refer either to the state of the online service system before the deferred recovery operation or to its state after the operation completes.
As the description above with reference to FIG. 1 shows, the service level control method of the present invention mainly includes a service degradation step and a service recovery step. The service degradation step may be performed immediately in response to a drop in the system state of the online service system, while the service recovery step defers recovery in response to a rise in the system state. By adopting a fast-degradation, slow-recovery level control policy, adverse effects caused by jitter and repeated fluctuation of the system state can be avoided, and slow recovery also gives the online service system more time to further recover its system state.
It should be noted that although the service degradation step S110 is described above before the service recovery step S120, the numbering and order of description do not limit the order in which degradation and recovery occur. In actual operation, the system state of the online service system may change in many patterns, such as rising then falling, falling then rising, rising several levels in a row, falling several levels in a row, or alternating between rises and falls. Accordingly, when the service level control method of the present invention is used to control the service level of the online service system, the corresponding degradation or recovery operation is performed according to the actual change of the system state.
So far, the principle and process of the service level control method of the present invention have been briefly explained with reference to FIG. 1. The method is described in further detail below with reference to specific embodiments.
Embodiment 1
In this embodiment, the system state of the online service system may be characterized by a feature value of its current service capability, and the quality of service may accordingly be divided into multiple levels corresponding to multiple feature value thresholds.
As described above, the online service system may be any of various types of Internet application systems, and the feature value used to characterize its service capability differs from system to system. For example, for an online information recommendation service, the length of the queue currently awaiting processing may be used as the feature value characterizing the system state. For other distributed task systems, the system's currently available resources, idle servers, and the like may be used instead.
When the system state is characterized by a feature value, the service degradation step and the service recovery step may be performed according to the relationship between the current feature value and the feature value thresholds.
Specifically, in the service degradation step, when it is determined that the system state has dropped such that the current feature value satisfies a first feature value threshold, the quality of service of the online service system may be immediately lowered to the level corresponding to the first feature value threshold.
In the service recovery step, when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, restoration of the quality of service to the level corresponding to the second feature value threshold may be deferred.
In this embodiment, the feature value may be either positively or negatively correlated with the system state: a larger feature value may indicate a better system state, or a smaller feature value may indicate a better system state. The first and second feature value thresholds may be the same or different.
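The asymmetric rule above (degrade at once, defer recovery) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names, the convention that level 0 is the best quality, the one-level-per-evaluation recovery step, and the `may_recover` callback are all hypothetical. The callback stands in for whichever deferral condition is chosen. The `higher_is_better` flag covers both directions mentioned above: queue length (smaller is better) and idle servers (larger is better).

```python
from typing import Callable, Sequence


def implied_level(value: float, thresholds: Sequence[float],
                  higher_is_better: bool = False) -> int:
    """Quality level implied by the current feature value (0 = best quality).

    thresholds[i] is the boundary for entering degraded level i + 1, listed
    from the mildest to the most severe degradation.
    """
    if higher_is_better:  # e.g. idle servers: reaching a low value is bad
        return sum(1 for t in thresholds if value <= t)
    return sum(1 for t in thresholds if value >= t)  # e.g. queue length


def next_level(current: int, value: float, thresholds: Sequence[float],
               may_recover: Callable[[int, float], bool],
               higher_is_better: bool = False) -> int:
    """Degrade immediately; recover only when the hypothetical deferral check allows."""
    target = implied_level(value, thresholds, higher_is_better)
    if target > current:
        return target  # state dropped: lower quality at once, even by several levels
    if target < current and may_recover(current, value):
        return current - 1  # state rose and deferral satisfied: restore one level
    return current  # otherwise keep the current level


# Usage: queue-length thresholds 100 and 500, with a deferral check not yet met.
never = lambda level, value: False
print(next_level(0, 600, [100, 500], never))  # 2
print(next_level(2, 50, [100, 500], never))   # 2 (recovery deferred)
```

Keeping the deferral check as a separate callback lets conditions 1.1 to 1.3 below be tried without changing the degradation path.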
As mentioned above, the deferred recovery in the service recovery step may be subject to certain additional conditions. In this embodiment, the following additional conditions may be set.
1.1. The quality of service is restored to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value has satisfied the second feature value threshold for a predetermined length of time.
To counter up-and-down jitter in the system state, the requirement that the current feature value satisfy the second feature value threshold for a predetermined length of time can thus be set as an additional condition. This keeps the online service system from remaining under high pressure and also helps its system state recover quickly.
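Condition 1.1 amounts to a dwell-time gate. The sketch below is minimal and rests on assumptions. The class name `DwellGate` and the `hold_s` parameter are hypothetical. Timestamps in seconds are passed in explicitly; in production they might come from `time.monotonic()`. Any sample that fails the condition resets the timer, which is what filters out jitter.

```python
from typing import Optional


class DwellGate:
    """Hypothetical helper for condition 1.1: allow recovery only after the
    recovery condition has held continuously for hold_s seconds."""

    def __init__(self, hold_s: float) -> None:
        self.hold_s = hold_s
        self.since: Optional[float] = None  # when the condition last became true

    def check(self, condition_met: bool, now: float) -> bool:
        if not condition_met:
            self.since = None  # a relapse (jitter) restarts the timer
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold_s


# Usage: recover only after queue length < 200 (assumed second threshold) for 30 s.
gate = DwellGate(hold_s=30)
samples = [(0, 150), (20, 180), (25, 230), (26, 170), (56, 160)]  # (time s, queue length)
for t, q in samples:
    print(t, gate.check(q < 200, now=t))  # only the last sample (t=56) prints True
```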
1.2. The quality of service is restored to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold, which corresponds to a higher quality level than the second feature value threshold.
The third feature value threshold may be one level above the second feature value threshold, or several levels above it. In this way, the online service system has spare service capacity with which to improve its operating state. For example, when the feature value is the length of the queue the online service system currently needs to process, the system will have idle resources to process the tasks in the queue and thereby shorten the queue.
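Condition 1.2 can be read as a requirement that recovery clear a threshold one or more levels better than the one nominally reached. The sketch below rests on several assumptions. The feature is smaller-is-better, such as queue length, and level 0 is the best quality. The function name and the `margin` parameter are hypothetical; `margin` sets how many levels better the third threshold lies. When no such better threshold exists, as on the step back to level 0, the sketch holds the current level. Condition 1.3 fills that gap.

```python
from typing import Sequence


def deferred_recovery_level(current: int, value: float,
                            degrade_at: Sequence[float], margin: int = 1) -> int:
    """Level to restore to under condition 1.2 (smaller feature value = better).

    degrade_at[i] is the feature value at which the system enters level i + 1
    (ascending). Stepping from level L to L - 1 normally needs
    value < degrade_at[L - 1]; with a margin of k levels it needs
    value < degrade_at[L - 1 - k] instead.
    """
    level = current
    while level > 0:
        idx = level - 1 - margin
        if idx < 0 or value >= degrade_at[idx]:
            break  # no stricter threshold exists, or it is not yet satisfied
        level -= 1
    return level


# Usage: hypothetical thresholds b = 100, c = 500; the system is at level 2.
print(deferred_recovery_level(2, 300, [100, 500]))            # 2: below c, not yet below b
print(deferred_recovery_level(2, 80, [100, 500]))             # 1
print(deferred_recovery_level(2, 300, [100, 500], margin=0))  # 1: no extra margin
```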
1.3. When it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, the quality of service is restored to the best quality of service level only when the current feature value further satisfies a best service regression value, which represents better service quality than the second feature value threshold.
Here the second feature value threshold corresponds to the best quality of service level. The best service regression value is not used to divide quality levels; it is only a threshold indicating that the online service system may return to the best quality. Suppose the system state has risen until the current feature value satisfies the second feature value threshold and further satisfies the best service regression value. This indicates that the system state has stabilized at a level suitable for providing the best quality of service, and the quality of service can then be restored to that level.
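With only two levels, condition 1.3 reduces to a classic hysteresis (Schmitt-trigger) switch. The system degrades when the feature reaches threshold b and returns to the best level only below the best service regression value a. The sketch below compares the number of level switches on a jittery queue-length trace with and without the regression value. Its function names and sample data are illustrative assumptions.

```python
from typing import Iterable, List


def two_level_trace(samples: Iterable[float], enter_degraded: float,
                    best_regression: float) -> List[int]:
    """Level per sample (0 = best, 1 = degraded) for a smaller-is-better feature."""
    level, out = 0, []
    for q in samples:
        if q >= enter_degraded:
            level = 1  # threshold b reached: degrade at once
        elif level == 1 and q < best_regression:
            level = 0  # below regression value a: return to best quality
        out.append(level)
    return out


def switches(levels: List[int]) -> int:
    """Number of level changes between consecutive samples."""
    return sum(1 for x, y in zip(levels, levels[1:]) if x != y)


trace = [80, 105, 95, 110, 98, 60, 40]  # hypothetical queue lengths hovering near b = 100
with_a = two_level_trace(trace, enter_degraded=100, best_regression=50)
single = two_level_trace(trace, enter_degraded=100, best_regression=100)
print(with_a, switches(with_a))  # [0, 1, 1, 1, 1, 1, 0] 2
print(single, switches(single))  # [0, 1, 0, 1, 0, 0, 0] 4
```

Setting `best_regression` equal to `enter_degraded` reproduces a single-threshold scheme, which in this trace flips the level twice as often.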
Embodiment 2
FIG. 2 is a schematic diagram of degradation triggering and upgrade recovery according to a specific embodiment of the present invention. The horizontal axis is time and the vertical axis is the feature value. As shown in FIG. 2, three feature value thresholds a, b, and c can be set. In this embodiment, the feature value characterizes the length of the request queue that the online service system currently needs to process.
1) Before time t2, the request queue length is below threshold b. The online service system is in its normal service state and can provide the best-quality service.
2) As traffic increases, the system can no longer sustain high-quality service and the request queue keeps growing. At time t2 the queue length reaches threshold b. The online service system may then enter the first-level degraded state and provide relatively lower-quality service.
3) As traffic continues to increase, the online service system still cannot sustain even the lower-quality service, and the request queue keeps growing. At time t3 the queue length reaches threshold c. The online service system may then enter the second-level degraded state and provide only the most basic services.
4) From t3 to t5, the online system provides only the most basic services. A large amount of resources is therefore released, service capability is strengthened, and the request queue length improves. Nevertheless, until the queue length drops below threshold b, the online service system may continue to provide only the most basic services. The system does not switch back to the first-level degraded state as soon as the queue length falls below c. This avoids repeated switching, prevents the online service system from lingering in a high-pressure state, and lets it return to a healthy state as quickly as possible.
5) At time t5, as the request queue length drops further below b, the online service system returns to the first-level degraded state.
6) At time t6, the service capability of the online system is fully restored; the system exits the first-level degraded state and resumes high-quality service.
In summary, the service level control method of this embodiment uses the request queue length as the indicator of system service capability and defines multi-level thresholds. It applies different thresholds to triggering degraded states and to recovering from them. Degradation is therefore triggered fast and recovered from slowly. This avoids the jitter that a single-threshold degradation scheme exhibits when the system hovers near high service pressure, and it reduces system risk. In this embodiment, thresholds b and c are the feature value thresholds used to divide service levels. Threshold a is not used for the actual division of system service levels, because the system's service capability can already be considered restored once the queue length drops to b. It can, however, be regarded as the best service regression value. In other words, only after the system state returns to a can the system's service capability be considered "safely" and fully restored.
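The Embodiment 2 timeline can be replayed with a small state machine over queue-length samples. This is a sketch under stated assumptions. FIG. 2 gives no numbers, so the class name, the numeric values of a, b, and c, and the sampled trace are hypothetical. The sketch also assumes the controller may jump straight from the second-level degraded state to full service if a single sample already falls below a.

```python
from typing import List


class QueueLengthController:
    """Three service levels driven by request-queue length (Embodiment 2 sketch).

    Level 0: best quality; level 1: first-level degradation;
    level 2: only the most basic services. Requires a < b < c; b and c
    divide the levels, while a is only the best service regression value.
    """

    def __init__(self, a: float, b: float, c: float) -> None:
        if not a < b < c:
            raise ValueError("expected a < b < c")
        self.a, self.b, self.c = a, b, c
        self.level = 0

    def update(self, queue_len: float) -> int:
        if queue_len >= self.c:
            self.level = 2  # like t3: drop straight to basic service
        elif queue_len >= self.b:
            self.level = max(self.level, 1)  # like t2; from level 2, hold until below b
        else:
            if self.level == 2:
                self.level = 1  # like t5: below b, back to first-level degradation
            if self.level == 1 and queue_len < self.a:
                self.level = 0  # like t6: below a, full quality restored
        return self.level


ctl = QueueLengthController(a=50, b=100, c=200)  # hypothetical values
trace = [30, 80, 120, 150, 220, 180, 120, 90, 70, 40]
levels: List[int] = [ctl.update(q) for q in trace]
print(levels)  # [0, 0, 1, 1, 2, 2, 2, 1, 1, 0]
```

The samples at 180 and 120 fall below c yet keep level 2, matching the t3 to t5 phase in which only basic service is offered until the queue drops below b.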
FIG. 3 is a functional block diagram showing the structure of an online quality of service control system 300 according to an embodiment of the present invention. The functional modules of the online quality of service control system 300 may be implemented in hardware, software, or a combination of hardware and software that implements the principles of the present invention. Those skilled in the art will understand that the functional modules described in FIG. 3 may be combined or divided into sub-modules to implement the principles of the invention described above. Therefore, the description herein supports any possible combination, division, or further definition of the functional modules described herein.
The online quality of service control system 300 shown in FIG. 3 may be used to implement the service level control method described with reference to FIG. 1 and FIG. 2. The following briefly describes the functional modules that the online quality of service control system 300 may have and the operations each module may perform. For details, refer to the description above in conjunction with FIG. 1 and FIG. 2, which is not repeated here.
As shown in FIG. 3, the online quality of service control system 300 includes a distributed service system 310 for providing online services and a quality control server 320. The distributed service system 310 described here may be equivalent to the online service system described above.
The quality control server 320 may obtain a feature value characterizing the current service capability of the distributed service system 310. When the current feature value indicates that the system state of the distributed service system 310 has dropped, the server immediately lowers the quality of service of the distributed service system 310 to the level corresponding to the current system state. When the current feature value indicates that the system state has risen, the server defers restoring the quality of service to the level corresponding to the current system state. In other words, the online quality of service control system 300, which incorporates the quality control server 320, can also be regarded as a distributed online service system with a built-in quality control function.
As an optional embodiment of the present invention, when the current feature value indicates that the system state has risen, the one or more quality control servers 320 may also restore the quality of service to a level lower than the level corresponding to the current system state.
As another optional embodiment of the present invention, the quality of service may be divided into multiple levels corresponding to multiple feature value thresholds, and the one or more quality control servers 320 may further be configured as follows. When it is determined that the system state has dropped such that the current feature value satisfies a first feature value threshold, the servers immediately lower the quality of service of the online service system to the level corresponding to the first feature value threshold. When it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, the servers defer restoring the quality of service to the level corresponding to the second feature value threshold.
As another optional embodiment of the present invention, the one or more quality control servers 320 may further be configured to restore the quality of service to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value has satisfied the second feature value threshold for a predetermined length of time.
As another optional embodiment of the present invention, the one or more quality control servers 320 may further be configured to restore the quality of service to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold corresponding to a higher quality level than the second feature value threshold.
As another optional embodiment of the present invention, the one or more quality control servers 320 may further be configured to handle the case in which the system state rises such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level. In that case, the servers restore the quality of service to the best quality of service level only when the current feature value further satisfies a best service regression value, which represents better service quality than the second feature value threshold.
As another optional embodiment of the present invention, the feature value of the current service capability may be the length of the queue that the distributed service system 310 currently needs to process.
FIG. 4 is a schematic block diagram showing the structure of an online quality of service control system according to another embodiment of the present invention.
As shown in FIG. 4, the distributed service system 310 may be further broken down into multiple service modules, and the quality control server 320 may be connected to the service modules in the distributed service system 310. FIG. 4 shows the case in which the online quality of service control system 300 includes a single quality control server 320. The system 300 may also include multiple quality control servers 320, which may correspond one-to-one to the service modules in the distributed service system 310. Each quality control server 320 may then adjust the quality of service level of its corresponding service module according to that module's current system state.
For example, the feature value of the current service capability may be the length of the queue that one or more service modules of the distributed service system 310 currently need to process. Each quality control server 320 may then adjust the quality of service level of its corresponding service module according to the feature value. The process by which the quality control server 320 adjusts a service module's quality of service level is described above and is not repeated here.
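The one-to-one arrangement in FIG. 4 can be sketched as one hysteresis controller per service module, each fed only its own queue length. The module names (`recall`, `ranking`), thresholds, and class names below are hypothetical. The recovery rule combines conditions 1.2 and 1.3: a module leaves level L only below the threshold of level L - 1 and returns to level 0 only below a best service regression value. This is one of several options the text allows.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModuleController:
    """Hysteresis controller for one service module (smaller queue = healthier)."""
    degrade_at: List[float]  # queue length entering level i + 1, ascending
    best_regression: float   # stricter value needed to return to level 0
    level: int = 0

    def update(self, queue_len: float) -> int:
        target = sum(1 for t in self.degrade_at if queue_len >= t)
        if target > self.level:
            self.level = target  # degrade immediately
            return self.level
        # Leave level L only below the threshold of level L - 1 (condition 1.2),
        # or below best_regression when returning to level 0 (condition 1.3).
        while self.level > 0:
            gate = self.degrade_at[self.level - 2] if self.level >= 2 else self.best_regression
            if queue_len >= gate:
                break
            self.level -= 1
        return self.level


@dataclass
class QualityControlServer:
    """Hypothetical quality control server holding one controller per service module."""
    controllers: Dict[str, ModuleController] = field(default_factory=dict)

    def adjust(self, queue_lengths: Dict[str, float]) -> Dict[str, int]:
        # Each module's level depends only on its own queue length.
        return {name: self.controllers[name].update(q) for name, q in queue_lengths.items()}


server = QualityControlServer({
    "recall": ModuleController(degrade_at=[100, 200], best_regression=50),
    "ranking": ModuleController(degrade_at=[40, 80], best_regression=20),
})
print(server.adjust({"recall": 250, "ranking": 10}))  # {'recall': 2, 'ranking': 0}
print(server.adjust({"recall": 120, "ranking": 45}))  # {'recall': 2, 'ranking': 1}
print(server.adjust({"recall": 30, "ranking": 30}))   # {'recall': 0, 'ranking': 1}
```

Because each controller keeps its own state, a congested module can be degraded without affecting healthy ones. Whether one server hosts all controllers or each module gets its own server is a deployment choice.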
The service level control method and system for an online service system according to the present invention have been described in detail above with reference to the accompanying drawings.
Those skilled in the art will clearly understand that the techniques in the embodiments of the present invention may be implemented by software plus the necessary general-purpose hardware. Such hardware includes general-purpose integrated circuits, general-purpose CPUs, general-purpose memory, general-purpose components, and the like. The techniques may of course also be implemented by dedicated hardware, including application-specific integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and the like, but in many cases the former is the better implementation. Based on this understanding, the technical solutions in the embodiments of the present invention, in essence or in the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium such as a read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc. It includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
Accordingly, the above technical concept of the present invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium) storing executable code (or a computer program or computer instruction code). When the executable code is executed by a processor, it causes the processor to perform the service level control method described above.
In another aspect, the above technical concept of the present invention may also be embodied as a computing device including a processor and a non-transitory machine-readable storage medium (or computer-readable storage medium). The non-transitory machine-readable storage medium stores executable code (or a computer program or computer instruction code). When this code is executed by the processor, it causes the processor to perform the service level control method described above.
Furthermore, the method according to the present invention may also be implemented as a computer program including computer program code instructions for performing the steps defined in the above method of the present invention. Alternatively, the method may be implemented as a computer program product including a computer-readable medium that stores a computer program for performing the functions defined in the above method of the present invention. Those skilled in the art will also appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to multiple embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code containing one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the figures. For example, two consecutive blocks may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functionality involved. Each block of the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations. It may also be implemented by a combination of dedicated hardware and computer instructions.
Embodiments of the present invention have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terms used herein were chosen to best explain the principles of the embodiments, their practical application, or improvements over technologies in the marketplace. They were also chosen to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (16)

  1. A service level control method for an online service system, characterized in that the quality of service provided by the online service system is divided into multiple levels according to a system state, and the method comprises:
    a service degradation step of, when it is determined that the system state of the online service system has dropped, lowering the quality of service of the online service system to the level corresponding to the current system state; and
    a service recovery step of, when it is determined that the system state has risen, deferring restoration of the quality of service to the level corresponding to the current system state.
  2. The method according to claim 1, characterized in that the service recovery step comprises:
    when it is determined that the system state has risen, restoring the quality of service to a level lower than the level corresponding to the current system state.
  3. The method according to claim 1, characterized in that the system state is characterized by a feature value of current service capability, and the quality of service is divided into multiple levels corresponding to multiple feature value thresholds,
    and wherein:
    the service degradation step comprises:
    when it is determined that the system state has dropped such that the current feature value satisfies a first feature value threshold, immediately lowering the quality of service of the online service system to the level corresponding to the first feature value threshold; and
    the service recovery step comprises:
    when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, deferring restoration of the quality of service to the level corresponding to the second feature value threshold.
  4. The method according to claim 3, characterized in that the service recovery step comprises:
    restoring the quality of service to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value has satisfied the second feature value threshold for a predetermined length of time.
  5. The method according to claim 3, characterized in that the service recovery step comprises:
    restoring the quality of service to the level corresponding to the second feature value threshold only when it is determined that the system state has risen such that the current feature value satisfies a third feature value threshold corresponding to a higher quality level than the second feature value threshold.
  6. The method according to claim 3, characterized in that the service recovery step comprises:
    when it is determined that the system state has risen such that the current feature value satisfies the second feature value threshold corresponding to the best quality of service level, restoring the quality of service to the best quality of service level only when the current feature value further satisfies a best service regression value representing better quality of service than the second feature value threshold.
  7. The method according to claim 3, characterized in that the feature value of the current service capability is the length of the queue currently awaiting processing by the online service system, wherein a decrease in that queue length indicates that the system state has risen, and an increase in that queue length indicates that the system state has dropped.
  8. An online quality of service control system, characterized by comprising a distributed service system for providing online services and one or more quality control servers, the one or more quality control servers being configured to:
    obtain a feature value characterizing the current service capability of the distributed service system;
    when the current feature value indicates that the system state of the distributed service system has dropped, immediately lower the quality of service of the distributed service system to the level corresponding to the current system state; and
    when the current feature value indicates that the system state has risen, defer restoring the quality of service to the level corresponding to the current system state.
  9. The control system according to claim 8, characterized in that the one or more quality control servers are further configured to:
    when the current feature value indicates that the system state has risen, restore the quality of service to a level lower than the level corresponding to the current system state.
  10. The control system according to claim 8, characterized in that the quality of service is divided into multiple levels corresponding to multiple feature value thresholds, and the one or more quality control servers are further configured to:
    when it is determined that the system state has dropped such that the current feature value satisfies a first feature value threshold, immediately lower the quality of service of the online service system to the level corresponding to the first feature value threshold; and
    when it is determined that the system state has risen such that the current feature value satisfies a second feature value threshold, defer restoring the quality of service to the level corresponding to the second feature value threshold.
  11. 如权利要求10所述的控制系统,其特征在于,所述一个或多个质量控制服务器进一步用于:The control system of claim 10 wherein said one or more quality control servers are further for:
    在判断所述系统状态上升使得当前特征值满足第二特征值阈值达预定 时长时,才将所述服务质量恢复至与所述第二特征值阈值对应的级别。The quality of service is restored to a level corresponding to the second feature value threshold when it is determined that the system state rises such that the current feature value satisfies the second feature value threshold for a predetermined length of time.
  12. 如权利要求10所述的控制系统,其特征在于,所述一个或多个质量控制服务器进一步用于:The control system of claim 10 wherein said one or more quality control servers are further for:
    在判断所述系统状态上升使得当前特征值满足比所述第二特征值阈值质量级别更高的第三特征值阈值时,才将所述服务质量恢复至与所述第二特征值阈值对应的级别。Restoring the quality of service to a second eigenvalue threshold corresponding to determining that the current state value is such that the current eigenvalue satisfies a third eigenvalue threshold that is higher than the second eigenvalue threshold quality level level.
  13. 如权利要求10所述的控制系统,其特征在于,一个或多个质量控制服务器进一步用于:The control system of claim 10 wherein the one or more quality control servers are further configured to:
    在判断所述系统状态上升使得当前特征值满足对应于最佳服务质量级别的第二特征值阈值时,只有在当前特征值进一步满足比所述第二特征值表征更佳服务质量的最佳服务回归值时,才将所述服务质量恢复至所述最佳服务质量级别。When it is determined that the system state rises such that the current feature value satisfies the second eigenvalue threshold corresponding to the optimal QoS level, only the current eigenvalue further satisfies the best service that represents better service quality than the second eigenvalue. The quality of service is restored to the optimal quality of service level upon regression.
  14. 如权利要求10所述的控制系统,其特征在于,所述当前服务能力的特征值是所述分布式服务系统当前所需处理的队列长度;其中,所述当前所需处理的队列长度减小,表征所述系统状态上升;所述当前所需处理的队列长度增加,表征所述系统状态下降。The control system according to claim 10, wherein said feature value of said current service capability is a queue length currently required to be processed by said distributed service system; wherein said currently required queue length is reduced Representing the rise of the system state; the queue length of the current required processing is increased, indicating that the state of the system is degraded.
  15. 如权利要求10所述的控制系统,其特征在于,所述当前服务能力的特征值是所述分布式服务系统的一个或多个服务模块当前所需处理的队列长度;其中,所述当前所需处理的队列长度减小,表征所述系统状态上升;所述当前所需处理的队列长度增加,表征所述系统状态下降;The control system according to claim 10, wherein the feature value of the current service capability is a queue length currently required by one or more service modules of the distributed service system; wherein the current location The length of the queue to be processed is decreased, and the state of the system is marked to rise; the queue length of the current required processing is increased, and the state of the system is degraded;
    并且所述一个或多个质量控制服务器根据所述特征值调整所述一个或多个服务模块的服务质量级别。And the one or more quality control servers adjust the quality of service level of the one or more service modules according to the feature value.
  16. A readable storage medium having a computer program stored thereon,
    wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.
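Claims 12 and 13 together describe a hysteresis rule: the service is degraded as soon as the feature value crosses a level's threshold, but it is restored only once a stricter bar is met (the threshold of the next better level, or, for the best level, a dedicated best-service regression value). The Python below is a minimal sketch of one possible reading, not the applicant's implementation. It assumes the feature value is the pending queue length of claim 14 (smaller is healthier) and that restores climb one level at a time; the class name, parameter names and numbers are hypothetical.

```python
from typing import List, Sequence


class QosLevelController:
    """Hypothetical multi-level reading of claims 12 and 13.

    Level 0 is the best quality of service; higher numbers are more degraded.
    thresholds[k] is the largest queue length allowed at level k (strictly
    increasing). best_regression is the stricter bar for returning to level 0.
    """

    def __init__(self, thresholds: Sequence[int], best_regression: int) -> None:
        if not thresholds:
            raise ValueError("at least one threshold is required")
        if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
            raise ValueError("thresholds must be strictly increasing")
        if best_regression >= thresholds[0]:
            raise ValueError("best_regression must be below thresholds[0]")
        self.thresholds: List[int] = list(thresholds)
        self.best_regression = best_regression
        self.level = 0

    def _restore_bar(self, target: int) -> int:
        # Restoring to `target` requires meeting the threshold of the next
        # better level (claim 12); level 0 has no better level, so it uses the
        # dedicated best-service regression value (claim 13).
        return self.best_regression if target == 0 else self.thresholds[target - 1]

    def update(self, queue_len: int) -> int:
        worst = len(self.thresholds)  # most degraded level, no upper bound
        # Degrade immediately while the current level's threshold is exceeded.
        while self.level < worst and queue_len > self.thresholds[self.level]:
            self.level += 1
        # Restore one level at a time, only past the stricter bar.
        while self.level > 0 and queue_len <= self._restore_bar(self.level - 1):
            self.level -= 1
        return self.level


ctl = QosLevelController(thresholds=[50, 100, 200], best_regression=30)
print([ctl.update(q) for q in [20, 80, 250, 150, 90, 40, 30]])
# [0, 1, 3, 3, 2, 1, 0]
```

In this trace the samples 150, 90 and 40 already satisfy the threshold of a better level, yet the controller climbs back only when each step's stricter bar is met. This keeps the service from flapping between levels while the queue length hovers near a threshold.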
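Claim 14 takes the length of the queue awaiting processing as the feature value and reads its direction of change as the system state. A minimal Python sketch, assuming an in-process `collections.deque` of pending requests stands in for the system's queue; the class and enum names are hypothetical.

```python
from collections import deque
from enum import Enum
from typing import Deque, Tuple


class Trend(Enum):
    RISING = "rising"    # queue shrank: system state rises
    FALLING = "falling"  # queue grew: system state falls
    STEADY = "steady"


class QueueLengthMonitor:
    """Hypothetical monitor reading the claim 14 feature value from a queue."""

    def __init__(self, pending: Deque[str]) -> None:
        self.pending = pending
        self.last = len(pending)

    def sample(self) -> Tuple[int, Trend]:
        """Return the current queue length and the state trend since last sample."""
        current = len(self.pending)
        if current < self.last:
            trend = Trend.RISING
        elif current > self.last:
            trend = Trend.FALLING
        else:
            trend = Trend.STEADY
        self.last = current
        return current, trend


pending: Deque[str] = deque(["r1", "r2", "r3"])
mon = QueueLengthMonitor(pending)
pending.append("r4")
print(mon.sample())  # (4, <Trend.FALLING: 'falling'>)
pending.popleft()
pending.popleft()
print(mon.sample())  # (2, <Trend.RISING: 'rising'>)
```

The sampled length can then be fed to a level controller, while the trend tells the controller whether it is evaluating a degrade or a restore.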
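Claim 15 applies the same idea per service module, so a congested module can be degraded while the others keep full service. The sketch below is hypothetical: the module names and thresholds are invented, and for brevity it maps each module's queue length straight to a level without the restore hysteresis of claims 12 and 13.

```python
from bisect import bisect_left
from typing import Dict, List


def module_level(queue_len: int, thresholds: List[int]) -> int:
    """Level 0 is best; level k means queue_len exceeds the first k thresholds.

    thresholds must be sorted ascending (largest queue allowed per level).
    """
    return bisect_left(thresholds, queue_len)


def adjust_modules(queue_lengths: Dict[str, int],
                   thresholds: Dict[str, List[int]]) -> Dict[str, int]:
    """Pick a QoS level for each module from that module's own queue length."""
    return {name: module_level(q, thresholds[name])
            for name, q in queue_lengths.items()}


# Hypothetical module names and per-module thresholds.
limits = {"retrieval": [50, 100, 200], "ranking": [20, 40]}
print(adjust_modules({"retrieval": 30, "ranking": 45}, limits))
# {'retrieval': 0, 'ranking': 2}
```

`bisect_left` returns the number of thresholds strictly below the queue length, so a queue exactly at a threshold still counts as satisfying that level.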
PCT/CN2018/090613 2017-06-13 2018-06-11 Service level control method and system for on-line service system, and readable storage medium WO2018228323A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710443789.1 2017-06-13
CN201710443789.1A CN107317702A (en) 2017-06-13 2017-06-13 The service class control method and system of online service system

Publications (1)

Publication Number Publication Date
WO2018228323A1 true WO2018228323A1 (en) 2018-12-20

Family

ID=60181902

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090613 WO2018228323A1 (en) 2017-06-13 2018-06-11 Service level control method and system for on-line service system, and readable storage medium

Country Status (2)

Country Link
CN (1) CN107317702A (en)
WO (1) WO2018228323A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107317702A (en) * 2017-06-13 2017-11-03 广东神马搜索科技有限公司 The service class control method and system of online service system
CN110034946A (en) * 2019-01-03 2019-07-19 阿里巴巴集团控股有限公司 Adaptive service degradation method and apparatus
CN109976935B (en) * 2019-03-14 2020-09-04 北京三快在线科技有限公司 Micro service architecture, micro service node and fusing recovery method and device thereof

Citations (7)

Publication number Priority date Publication date Assignee Title
CN1348287A (en) * 2000-09-29 2002-05-08 三星电子株式会社 Apparatus and method for responsing heat transfer service grade produced by terminal
US20020083169A1 (en) * 2000-12-21 2002-06-27 Fujitsu Limited Network monitoring system
CN102869046A (en) * 2011-07-08 2013-01-09 杭州海康威视数字技术股份有限公司 Video transmission method and device for wireless network
CN104394484A (en) * 2014-11-12 2015-03-04 海信集团有限公司 Wireless live streaming media transmission method
CN104506609A (en) * 2014-12-22 2015-04-08 合一网络技术(北京)有限公司 Method and device for automatically monitoring server state and self-adaptively adjusting services
CN106487779A (en) * 2015-08-28 2017-03-08 想象技术有限公司 Bandwidth Management
CN107317702A (en) * 2017-06-13 2017-11-03 广东神马搜索科技有限公司 The service class control method and system of online service system

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7000013B2 (en) * 2001-05-21 2006-02-14 International Business Machines Corporation System for providing gracefully degraded services on the internet under overloaded conditions responsive to HTTP cookies of user requests
CN101965007A (en) * 2009-07-21 2011-02-02 中兴通讯股份有限公司 Congestion control method and device for base station
CN103023938B (en) * 2011-09-26 2015-11-25 阿里巴巴集团控股有限公司 A kind of service capability control method of server cluster and system
US9071631B2 (en) * 2012-08-09 2015-06-30 International Business Machines Corporation Service management roles of processor nodes in distributed node service management
US9369525B2 (en) * 2013-06-26 2016-06-14 International Business Machines Corporation Highly resilient protocol servicing in network-attached storage
CN104636213A (en) * 2013-11-15 2015-05-20 上海信游网络科技有限公司 Degradation service replacing technology enhancing SOA survivability

Also Published As

Publication number Publication date
CN107317702A (en) 2017-11-03

Similar Documents

Publication Publication Date Title
WO2018228323A1 (en) Service level control method and system for on-line service system, and readable storage medium
US20170230297A1 (en) Automatic Detection And Prevention Of Network Overload Conditions Using SDN
US8145950B2 (en) Execution of a plugin according to plugin stability level
CN111614746B (en) Load balancing method and device of cloud host cluster and server
CN108600005A (en) A method of defence micro services avalanche effect
CN111694633A (en) Cluster node load balancing method and device and computer storage medium
CN109525500B (en) Information processing method and information processing device capable of automatically adjusting threshold
CN107145388B (en) Task scheduling method and system under multi-task environment
CN106713028B (en) Service degradation method and device and distributed task scheduling system
WO2011079467A1 (en) Method, device and system for scheduling distributed buffer resources
KR950023101A (en) Higher Processor Overload Control Method in a Distributed Exchange System with a Hierarchical Structure
CN112398945A (en) Service processing method and device based on backpressure
CN111245732A (en) Flow control method, device and equipment
CN115277577B (en) Data processing method, apparatus, computer device, and computer readable storage medium
CN111083062A (en) Weight mechanism-based current limiting method and device, computer equipment and storage medium
CN113391910A (en) Task processing method and device, computer equipment and storage medium
CN114598658A (en) Flow limiting method and device
CN106936926B (en) Method and system for accessing data node
US9135064B2 (en) Fine grained adaptive throttling of background processes
CN114143327A (en) Cluster resource quota allocation method and device and electronic equipment
US9459929B2 (en) Configurable dynamic load shedding method in distributed stream computing system
US20180309686A1 (en) Reducing rate limits of rate limiters
CN111338803B (en) Thread processing method and device
CN106547609B (en) Event processing method and device
CN109582460B (en) Redis memory data elimination method and device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18818661

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 18818661

Country of ref document: EP

Kind code of ref document: A1