WO2023082765A1 - Server state control method, system, and storage medium - Google Patents

Server state control method, system, and storage medium

Info

Publication number
WO2023082765A1
Authority
WO
WIPO (PCT)
Prior art keywords
server
weight
service
health
preset
Prior art date
Application number
PCT/CN2022/114281
Other languages
English (en)
French (fr)
Inventor
王林翰
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023082765A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/3003Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/34Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
    • G06F11/3433Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation ; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment for load management

Definitions

  • The present application relates to the field of content distribution network traffic scheduling, for example, to a server state control method, system, and storage medium.
  • The isolation recovery mechanism calculates the server load from bandwidth utilization and other parameters and compares it with a set threshold to determine whether the service quality of the server is up to standard, so as to isolate or restore the server.
  • In the recovery mechanism in some cases, when specific indicators of an isolated server reach the threshold, the isolated server is considered to have resumed normal service functions. It is set directly to the normal service state and is allocated business volume according to its pre-configured business weight. This is prone to the problem that a server that has just returned to the normal service state enters the isolation state again because too much traffic is transferred to it, resulting in low server resource utilization.
  • Embodiments of the present application provide a server state control method, system, and storage medium, which can reduce the problem that a server that has just returned to a normal service state enters an isolated state again, thereby improving the utilization rate of server resources.
  • The health value mentioned in this application is an index that characterizes the service quality of a server. If the health value corresponding to a server is greater than the preset health threshold, the server is considered to be in a service-quality-compliant (healthy) state and able to provide high-quality service to connected users.
  • An embodiment of the present application provides a server state control method, applied to a server state control system in a content distribution network. The method includes: acquiring a health value of a first server in an isolated state, where the health value characterizes the service quality of the server; determining the first server whose health value is greater than a preset health threshold as a second server, where the second server is a server in a recovery state; and gradually increasing the service weight of the second server until the service weight reaches a preset weight and, if the health values remain greater than the preset health threshold throughout the process of gradually increasing the service weight, determining the second server as a third server, where the third server is a server in a normal service state.
  • An embodiment of the present application provides a server state control system, including: a memory, a processor, and a computer program stored in the memory and operable on the processor, where the processor, when executing the computer program, implements the server state control method described in any one of the embodiments of the first aspect.
  • An embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions, where the computer-executable instructions are used to execute the server state control method described in any one of the embodiments of the first aspect.
  • FIG. 1 is a block diagram of a server state control system provided by an embodiment of the present application
  • FIG. 2 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 3 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 4 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 5 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 6 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 7 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 8 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 9 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • FIG. 10 is a flow chart of steps of a server state control method provided in another embodiment of the present application.
  • Fig. 11 is an example diagram of performing server state control in a method for performing server state control in a content distribution network provided by another embodiment of the present application;
  • Fig. 12 is an example diagram of performing server state control in a method for performing server state control in a content distribution network provided by another embodiment of the present application;
  • Fig. 13 is a schematic diagram of a server status control system provided by another embodiment of the present application.
  • Embodiments of the present application include a server state control method, system, and storage medium. The server state control method is applied to a server state control system in a content distribution network and includes: obtaining a health value of a first server in an isolated state, where the health value represents the service quality of the server; determining the first server whose health value is greater than the preset health threshold as a second server, where the second server is a server in the recovery state; and gradually increasing the business weight of the second server until the business weight reaches the preset weight and, if the health values are all greater than the preset health threshold during the process of gradually increasing the business weight, determining the second server as a third server, where the third server is a server in a normal service state.
  • In the process of restoring one or more servers in the recovery state, the corresponding weight of each server can be gradually increased and users can be assigned to the recovering server in multiple batches, so that the recovering server also stays in a service-quality-compliant state. This increases the proportion of service-quality-compliant servers in the entire content distribution network and improves network service quality.
  • the server state control method proposed in this application is essentially to enable the content distribution network to better perform traffic scheduling.
  • Influenced by the resources occupied by the server itself, other emergencies affecting the server, and so on, a server may enter an "unhealthy state" in which its service quality is poor. In contrast, a server state that can provide better service quality is called a "healthy state".
  • an embodiment of the present application provides a specific architecture of a content distribution network-based server status control system.
  • the server state control system includes: hardware layer, data layer, data preprocessing layer, algorithm execution layer and master control layer.
  • The server state control system processes all steps from data generation to prediction results. Each level of the server state control system has an independent data format verification algorithm, data integrity verification algorithm, and data completion algorithm, which can greatly improve system robustness.
  • the hardware layer is composed of servers in the content distribution network.
  • The servers in the entire scheduling system network can be divided into a normal service group, an isolation group, and a recovery group.
  • The normal service group includes servers in the normal service state, the isolation group includes servers in the isolation state, and the recovery group includes servers in the recovery state.
  • The servers in the normal service group undertake normal service functions in the content distribution network and provide traffic for users.
  • The servers in the isolation group are servers whose health value is less than the preset health threshold. After a server is set to the isolation state, the user terminals connected to it are assigned to servers in the normal service state in the normal service group.
  • The servers in the recovery group are servers from the isolation group whose health value is greater than the preset health threshold. It is conceivable that, after a server is isolated, the server state control system regularly checks the health value of the servers in the isolation group; when the health value of a server in the isolation group is greater than the preset health threshold, the server is moved into the recovery group for monitoring.
  • the data layer includes: server status data interface, network status data interface and scheduling data delivery interface.
  • the server status data interface is used to obtain the underlying data of the server from each server, such as the CPU load rate, memory load rate, disk load rate and other data of each server in the content distribution network;
  • The network status data interface is used to obtain the network condition parameters between the server and the user from the network ports of each server, such as the proportion of 5XX error services among user services, the bandwidth utilization rate of the server, the return-to-source rate of the server, and other data;
  • The scheduling data delivery interface is used to send the isolation and recovery commands of the group isolation and group recovery modules, and the weight parameters of each server generated by the intelligent scheduling module, to each server, so that each server performs the corresponding operation after receiving the corresponding command.
  • The scheduling data delivery interface issues a group isolation command to the server.
  • After the server receives the group isolation command, the server automatically stops the service, and the GSLB system of the content distribution network dispatches the user requests on the server to other servers.
  • The GSLB system is a commonly used technology in related fields and is not described further here. It is conceivable that, after the server status data interface and the network status data interface collect data, the collected data is transmitted to the data preprocessing layer, in preparation for the preprocessing operations that the data preprocessing layer subsequently performs on it.
  • The data preprocessing layer includes a data formatting module and a data cleaning module. It is conceivable that, since the content distribution network is a distributed network system, the hardware facilities of each device in the system are different and the types of data uploaded through the interfaces are different. Therefore, it is necessary to preprocess the parameters uploaded through the interfaces and extract the valid data from them.
  • The data formatting module is used to extract the data corresponding to the items required by the detection module from the data of different items obtained by the server status data interface and the network status data interface, and to re-splice the data corresponding to those required items into a data format that can be processed by the detection module;
  • The data cleaning module is used to verify the authenticity and reliability of the data obtained from the server status data interface and the network status data interface, clean the data with obvious errors, and send a re-collection instruction to the data layer, so that the data layer collects the data with obvious errors again and updates it in time.
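  • The formatting and cleaning steps described above can be illustrated with a minimal Python sketch. The field names, the choice of required items, and the rule that every metric must be a number between 0 and 1 are all assumptions made for illustration; the source does not specify the data format or the validation rules.

```python
# Hypothetical list of items the detection module needs (not given in the source).
REQUIRED_FIELDS = ("cpu_load", "memory_load", "disk_load", "bandwidth_utilization")


def format_record(raw):
    """Keep only the required items; missing items become None."""
    return {field: raw.get(field) for field in REQUIRED_FIELDS}


def fields_to_recollect(record):
    """Return the items that look obviously wrong and should be collected again.

    Assumption: every metric is a ratio, so anything that is not a number
    in the range [0, 1] is treated as an obvious error.
    """
    bad = []
    for field, value in record.items():
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            bad.append(field)
    return bad
```

In this sketch, the list returned by `fields_to_recollect` plays the role of the re-collection instruction sent back to the data layer.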
  • The algorithm execution layer includes: a detection module, a group isolation module, a group recovery module, a capacity decision module, and an intelligent scheduling module.
  • the detection module is used to obtain the health value of each server in the content distribution network
  • the group isolation module is used to isolate the server
  • the group recovery module is used to control the recovery operation of the isolated server
  • the capacity decision module is used to keep the servers in a stable service state
  • the intelligent scheduling module is used to perform intelligent scheduling operations among the servers in the normal service state in the content distribution network, so that the servers are in a balanced state.
  • group recovery instructions, weight setting instructions and normal service group setting instructions are all issued by the group recovery module and sent to the target server through the data delivery interface.
  • The main control layer includes a main control module, which is used for global setting and control of the server state control system.
  • The server state control system is also provided with a main control server and an impedance server. The main control server is used to control the isolation of a server according to the QoS parameter information generated by the server itself, and the impedance server is used to control the distribution of user terminals to servers other than the isolated servers.
  • FIG. 2 is a flowchart of steps of a server state control method provided by an embodiment of the present application.
  • The server state control method is applied to a server state control system in a content distribution network.
  • The server state control method includes but is not limited to the following steps:
  • Step S210 acquiring the health value of the first server in the isolation state, where the health value represents the service quality of the server.
  • the solution proposed in this application judges the status of the server by comparing the health value with the preset health threshold.
  • When the health value of a server is greater than the preset health threshold, the server is in a healthy state and can provide the connected user clients with the expected quality of service.
  • When the health value of a server is less than the preset health threshold, the server is in an unhealthy state and cannot provide the connected user clients with the expected quality of service.
  • A server in the isolated state is an unhealthy server and does not undertake any traffic load function in the content distribution network, so that it does not give users a bad experience.
  • The specific value of the preset health threshold depends on the service quality that technicians in the relevant field expect from each server in the content distribution network during actual operation.
  • The preset health threshold may be set according to the specific operating data of each server in the service state.
  • step S220 the first server whose health value is greater than the preset health threshold is determined as the second server, and the second server is a server in recovery state.
  • The first server in the isolated state does not undertake any traffic load function. As time goes by, it gradually regains its consumed server resources such as memory and bandwidth, which increases its health value. While any first server exists, the server state control system periodically detects the health value of each first server. When the health value of a first server is greater than the preset health threshold, the first server is considered able to process part of the user service data while meeting the quality of service requirements; it is then determined as a second server, ready for subsequent recovery processing.
  • The second server is a server in a recovery state. Being in a recovery state means that the server already has part of the ability to process user service data while meeting the quality of service requirements, but this ability may not yet reach the standard of a server in the normal service state. It is therefore necessary to observe whether the health value of the server reaches the preset health threshold while users are gradually transferred in according to the recovery strategy, which is why there is a recovery process.
  • Step S230, gradually increase the business weight of the second server until the business weight reaches the preset weight and, if the health values are all greater than the preset health threshold during the process of gradually increasing the business weight, determine the second server as the third server, where the third server is a server in the normal service state.
  • the business weight is used to determine the business volume allocated to the server corresponding to the business weight or the connection volume of the user client.
  • As the business weight increases, the server corresponding to the business weight is allocated more services, and the scale of the allocated services grows with the business weight. The additional services increase the load of the server, and the health value of the server fluctuates under the influence of load changes; generally, the health value of the server decreases when the load increases.
  • It is therefore necessary to ensure that the health value of the second server stays greater than the preset health threshold while the business weight is gradually increased, so that the second server can still provide service quality that meets expectations to the user clients connected to it while in the recovery state. This reduces the occurrence of the problem that a server that has just returned to the normal service state enters the isolation state again because too much business volume is transferred to it, and improves server utilization.
  • The third server is a server in a normal service state. This type of server undertakes normal service functions in the content distribution network and provides traffic for users. Being in a normal service state means that the health value of the server is greater than the preset health threshold, so this type of server is a healthy server.
  • The subject of this application is mainly a method for controlling the state of the various servers in the content distribution network. Through more effective control of server state changes, customer business, services, or traffic can be scheduled in a more timely manner, reducing the occurrence of network congestion and improving the robustness of the entire content distribution network.
  • The first server, the second server, and the third server are servers in different service states in the content distribution network. This classification does not constitute a limitation on the servers themselves or on the content distribution network itself.
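  • The three server states and the transitions between them can be summarized in a short Python sketch. The names `State` and `next_state` are hypothetical, and treating a health value exactly equal to the threshold as unhealthy is an assumption; the source only says "greater than" and "less than".

```python
from enum import Enum


class State(Enum):
    ISOLATED = "first server (isolated)"
    RECOVERING = "second server (recovering)"
    NORMAL = "third server (normal service)"


def next_state(state, health, health_threshold, weight, preset_weight):
    """Return the state a server moves to after one health check."""
    if health <= health_threshold:
        # An unhealthy server is isolated whatever its current state.
        # (Assumption: equality with the threshold counts as unhealthy.)
        return State.ISOLATED
    if state is State.ISOLATED:
        # Step S220: a healthy isolated server enters the recovery state.
        return State.RECOVERING
    if state is State.RECOVERING and weight >= preset_weight:
        # Step S230: preset weight reached while still healthy.
        return State.NORMAL
    return state
```

A recovering server therefore only becomes a normal-service server once its weight has reached the preset weight while its health value is still above the threshold.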
  • The service weight is obtained from the scheduling weight set by the content distribution network and a built-in weight coefficient.
  • The scheduling weight is the weight determined by the content distribution network through its own intelligent scheduling function, and the weight coefficient represents the recovery status and recovery progress of the server during recovery.
  • The business weight is increased by increasing the weight coefficient, which gradually increases the server's pending business volume.
  • The product of the scheduling weight and the weight coefficient is determined as the business weight, and the service volume to be processed by the server is determined from the total service volume to be processed in the content distribution network and the business weight.
  • Those skilled in the relevant fields can set the range of the weight coefficient according to the actual situation to control the expected recovery effect of the server.
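  • As a minimal sketch of this calculation in Python: the business weight is the product of the scheduling weight and the weight coefficient, as stated above. How the total volume is split is not spelled out in the source, so the proportional split in `pending_volume` is an assumption, and both function names are hypothetical.

```python
def service_weight(scheduling_weight, weight_coefficient):
    """Business (service) weight = scheduling weight x weight coefficient."""
    return scheduling_weight * weight_coefficient


def pending_volume(total_volume, weights):
    """Split total_volume across servers in proportion to their weights.

    Assumption: a proportional split; the source only says the volume is
    determined from the total volume and the business weight.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {name: 0.0 for name in weights}
    return {name: total_volume * w / total_weight for name, w in weights.items()}
```

For example, a recovering server with scheduling weight 2.0 and weight coefficient 0.5 has a business weight of 1.0; next to a normal server with weight 3.0 it would receive a quarter of the total volume under this split.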
  • step S230 of the embodiment shown in FIG. 2 includes but is not limited to the following steps:
  • Step S310, gradually increase the service weight of the second server according to the preset weight increase rule until the service weight reaches the preset weight and, if the health values are greater than the preset health threshold during the process of gradually increasing the service weight, determine the second server as the third server.
  • The service weight of the second server is gradually increased according to the preset weight increase rule until the service weight reaches the preset weight. Under the preset weight increase rule, the service weight can be increased linearly, by the same unit weight each time, for example: 0.1, 0.2, 0.3, 0.4;
  • The business weight can also be increased in an exponential-like manner, where the increment changes as the number of increases grows, for example: 0.1, 0.2, 0.4, 0.7;
  • The specific weight increase rules are not limited to these examples, and those skilled in the relevant fields can adjust them according to the actual situation.
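  • The two example rules can be reproduced with a short Python sketch. The function names are hypothetical, and the growing rule shown here (the k-th increment after the first value is k times the first step) is only one assumption that happens to match the example sequence 0.1, 0.2, 0.4, 0.7; the source does not define the rule precisely. `Fraction` is used so that the listed weights are not distorted by floating-point accumulation.

```python
from fractions import Fraction


def linear_schedule(step, preset_weight):
    """Weights after each increase when the same unit weight is added every time."""
    step, target = Fraction(str(step)), Fraction(str(preset_weight))
    if step <= 0:
        raise ValueError("step must be positive")
    weights, w = [], Fraction(0)
    while w < target:
        w = min(w + step, target)  # never overshoot the preset weight
        weights.append(float(w))
    return weights


def growing_schedule(first_step, preset_weight):
    """Weights when the increment grows with the number of increases."""
    first, target = Fraction(str(first_step)), Fraction(str(preset_weight))
    if first <= 0:
        raise ValueError("first_step must be positive")
    weights, w, k = [], first, 1
    while True:
        w = min(w, target)
        weights.append(float(w))
        if w >= target:
            return weights
        w += k * first  # assumed rule: increments of 1x, 2x, 3x, ... the first step
        k += 1
```

Under these assumptions, `linear_schedule(0.1, 0.4)` yields 0.1, 0.2, 0.3, 0.4, and `growing_schedule(0.1, 1.0)` yields 0.1, 0.2, 0.4, 0.7, 1.0.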
  • The health value corresponding to the second server should be detected to ensure that it is greater than the preset health threshold. When the business weight reaches the preset weight, the second server is completely restored; the second server is then determined as the third server and assumes normal service functions in the content distribution network.
  • step S310 of the embodiment shown in FIG. 3 includes but is not limited to the following steps:
  • Step S410 adding a first unit of weight to the service weight of the second server according to a preset weight increasing rule, and acquiring the health value of the second server.
  • Step S420, when the health value is greater than the preset health threshold, execute again the step of adding the first unit weight to the business weight of the second server according to the preset weight increase rule and obtaining the health value of the second server, until the business weight reaches the preset weight and the health value is greater than the preset health threshold, and then determine the second server as the third server.
  • Each time the first unit weight is added to the business weight of the second server, the health value of the second server is obtained. The interval between successive increases of the first unit weight is defined as a recovery cycle. In each recovery cycle, the business weight of the server is increased and its health value is continuously detected; once the server's scheduling weight reaches the normal weight and the health value in every recovery cycle has been higher than the health threshold, the second server is reset as the third server. The normal weight is the weight that the content distribution network determines through the intelligent scheduling module for a server in the normal service state.
  • The first unit weight set in this step is not limited to a fixed value; it can change according to the weight increase rule, so that the server in the recovery state recovers more stably and gradually, reducing the need for adjustments to the server.
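  • Steps S410 and S420 amount to a loop that raises the weight one step per recovery cycle and checks the health value after each step. The Python sketch below assumes two caller-supplied callbacks, `apply_weight` and `get_health`, which are hypothetical stand-ins for the scheduling data delivery interface and the detection module; waiting for the recovery cycle to elapse is left to the caller.

```python
def recover(get_health, apply_weight, weight_steps, health_threshold):
    """Walk a recovering server through weight_steps.

    weight_steps lists the business weight after each increase; its last
    entry is taken to be the preset weight. Returns True when the preset
    weight is reached with the health value above the threshold after
    every increase, otherwise False.
    """
    for weight in weight_steps:
        apply_weight(weight)  # S410: raise the weight by one unit
        if get_health() <= health_threshold:
            # Health dropped during recovery: stop increasing the weight.
            return False
    return True  # S420: the server can be determined as a third server
```

Returning `False` leaves the decision about what happens next (for example, isolating the server again) to the caller.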
  • the embodiment shown in FIG. 2 also includes but not limited to the following steps:
  • Step S510, acquiring the total service traffic to be processed and the service bandwidth of each third server.
  • Step S520, obtaining the load index according to the total service traffic and each service bandwidth.
  • Step S530 when the load index is greater than or equal to a preset dangerous threshold, determine several first servers and/or several second servers as third servers.
  • The total business traffic to be processed refers to the total traffic that needs to be processed by the entire content distribution network. The main service function of the content distribution network is undertaken by the third servers in the normal service state, so it is necessary to obtain the service bandwidth of each third server to determine the overall load capacity of the content distribution network.
  • The service bandwidth is only one of the indicators representing the server load capacity.
  • The main object to be protected in the scheme proposed in this embodiment is the content distribution network.
  • When the third servers in the normal service state cannot undertake the main service functions of the content distribution network, the number of third servers is increased in order to quickly improve the overall load capacity of the content distribution network.
  • The load index can be obtained by comparing the total traffic in the content distribution network with the total service bandwidth of the normal service servers. The load index represents the load status of the current content distribution network and is used to determine whether the load of the content distribution network is too high.
  • The ratio of the total traffic in the content distribution network to the total service bandwidth of the normal service servers is used as the load index. If the ratio reaches a preset dangerous threshold, several first servers and/or several second servers are determined as third servers to reduce the load of the servers in the normal service state in the current content distribution network and ensure that those servers remain in a stable service state.
  • Each preset danger threshold corresponds to a different scheduling policy for the first servers and the second servers. For example, a first danger threshold A and a second danger threshold B are set. When the load index reaches the first danger threshold A, only the server state of second servers is changed: several second servers are determined as third servers to relieve the load pressure on the content distribution network. When the load index reaches the second danger threshold B, the server states of first servers and second servers are changed at the same time: several first servers and several second servers are determined as third servers to relieve the load pressure on the content distribution network. It is conceivable that, under normal circumstances, the current load capacity of a second server in the recovery state is generally higher than that of a first server in the isolation state, so second servers are preferentially determined as third servers.
  • the first server is determined as the third server, so as to alleviate the load pressure of the servers in the normal service state
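  • The load index and the two-threshold policy can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that threshold A is lower than threshold B, which the source implies but does not state.

```python
def load_index(total_traffic, normal_bandwidths):
    """Ratio of total traffic to the total bandwidth of normal-service servers."""
    total_bandwidth = sum(normal_bandwidths)
    if total_bandwidth <= 0:
        return float("inf")  # no normal-service capacity at all
    return total_traffic / total_bandwidth


def groups_to_promote(index, threshold_a, threshold_b):
    """Which server groups may be promoted to normal service.

    Assumption: threshold_a < threshold_b. Second (recovering) servers are
    listed first because they are preferred over first (isolated) servers.
    """
    if index >= threshold_b:
        return ("second", "first")
    if index >= threshold_a:
        return ("second",)
    return ()
```

With 900 units of traffic and 1000 units of normal-service bandwidth, the load index is 0.9; with A = 0.8 and B = 0.95 only second servers would be promoted.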
  • step S530 of the embodiment shown in FIG. 5 further includes but not limited to the following steps:
  • Step S610 when the load index is greater than or equal to the preset dangerous threshold, obtain the service weight of each second server.
  • Step S620, determining, according to the service weights, the second server corresponding to the highest service weight as the third server.
  • The second server corresponding to the highest service weight is prioritized as the third server.
  • Such a server can share more of the load pressure of the servers in the normal service state, reduce the load index, and enable the content distribution network to provide more stable services.
  • After the step of determining the second server corresponding to the highest service weight as the third server, if the load index is still greater than or equal to the preset dangerous threshold, that step is performed again, until the load index is less than the preset dangerous threshold, so as to ensure that the content distribution network can provide services whose service quality meets the expected demand.
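  • Steps S610 and S620, repeated until the load index falls below the dangerous threshold, can be sketched in Python as below. The tuple layout and the function name are hypothetical, and the sketch assumes that a promoted server contributes its full service bandwidth to the normal service group, which the source does not state.

```python
def promote_until_safe(total_traffic, normal_bandwidth, recovering, danger_threshold):
    """Promote recovering servers, highest service weight first.

    recovering: list of (name, service_weight, bandwidth) tuples.
    Returns the promoted names and the resulting normal-service bandwidth.
    """
    promoted = []
    candidates = sorted(recovering, key=lambda s: s[1], reverse=True)
    for name, _weight, bandwidth in candidates:
        if normal_bandwidth > 0 and total_traffic / normal_bandwidth < danger_threshold:
            break  # load index is already below the dangerous threshold
        normal_bandwidth += bandwidth  # assumption: full bandwidth is added
        promoted.append(name)
    return promoted, normal_bandwidth
```

If every recovering server has been promoted and the load index is still too high, the sketch simply stops; handling that case (for example, by promoting first servers) is outside its scope.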
  • the embodiment shown in FIG. 2 also includes but not limited to the following steps:
  • Step S710 acquiring the health value of the second server.
  • Step S720 determining the second server whose health value is less than the preset health threshold as the first server.
  • The health value of the second server is obtained, and a second server whose health value is less than the preset health threshold is determined as a first server. This guarantees that the health value of every second server in the recovery state is greater than the preset health threshold, so that every second server in the recovery state is a healthy server. It is conceivable that the second servers represent the recovery group in the content distribution network.
  • The unhealthy server is isolated, and the second server is determined as the first server.
  • The user terminals connected to the isolated second server are allocated to the third servers in the normal service state.
  • The embodiment shown in FIG. 2 also includes, but is not limited to, the following steps:
  • Step S810: obtain the health value of the third server.
  • Step S820: determine a third server whose health value is less than the preset health threshold as a first server.
  • The health value of each third server is obtained, and any third server whose health value is less than the preset health threshold is determined as a first server. This ensures that every third server in the normal service state has a health value greater than the preset health threshold, that is, is a healthy server.
  • The third servers represent the normal service group of the content distribution network. To keep every server in that group healthy and meet the quality requirements of normal service, unhealthy servers are isolated according to their health values: such a third server is determined as a first server, and, through the traffic scheduling function of the content distribution network itself, the user terminals connected to it are reassigned to the third servers in the normal service state, so that users continue to receive service whose quality meets expectations.
  • The method for obtaining the health value in the embodiment shown in FIG. 2 includes, but is not limited to, the following steps:
  • Step S910: obtain the user service data of each server, where the servers include at least one of the following: a first server, a second server, and a third server.
  • Step S920: obtain a service quality evaluation result from the user service data.
  • Step S930: obtain the health value from the service quality evaluation result.
  • The user service data of each server in the content distribution network is obtained. The first, second, and third servers are only a classification by server state and place no limit on the servers themselves, so the method of obtaining a server's health value does not change with the server's state.
  • In an embodiment, the underlying data of each server is obtained through the underlying data interface, and the network condition parameters between each server and its users are obtained through the network state data interface.
  • The user service data is obtained by applying data formatting and data cleaning to the underlying data and the network condition parameters.
  • Analyzing the user service data to obtain a service quality evaluation result, and generating from that result a health value representing the server's service quality, is existing technology.
  • Those skilled in the art can choose different algorithms for this analysis according to the actual situation, so it is not described further here.
  • The server state control method of any of the embodiments of FIG. 2 to FIG. 9 includes, but is not limited to, the following steps:
  • Step S1010: obtain the load rate of each third server.
  • Step S1020: balance the third servers according to their load rates, so that the third servers are in a balanced state.
  • The load rate of each third server in the normal service state is obtained from the underlying server data and the network condition parameters. Using the traffic scheduling function of the content distribution network itself, the server state control system provides scheduling for the third servers in the normal service state, distributes traffic among them according to their load rates, and balances the traffic assigned to each one. Every third server is thus kept in a balanced state, which reduces the cases in which a sudden traffic surge overloads some third servers and brings down the content distribution network.
  • FIG. 11 is an example diagram of server state control performed with the server state control method in a content distribution network, provided by another embodiment of the present application, including but not limited to the following steps:
  • Step S1110: obtain the underlying data of each server in the content distribution network;
  • Step S1111: apply data formatting and data cleaning to the underlying data of each server to obtain user service data;
  • Step S1112: judge whether all servers in the content distribution network have been traversed; if not, execute step S1113; if yes, end this round of server state control;
  • Step S1113: obtain the health value of the server from its user service data;
  • Step S1114: judge whether the health value of the server is greater than the preset health threshold; if yes, execute step S1112; if not, execute step S1115;
  • Step S1115: determine the server as a first server in the isolated state;
  • Step S1116: judge whether the health value of the first server is greater than the preset health threshold; if yes, execute step S1117; if not, execute step S1115;
  • Step S1117: determine the first server as a second server in the recovery state;
  • Step S1118: gradually increase the service weight of the second server;
  • Step S1119: judge whether the health value of the second server is greater than the preset health threshold; if yes, execute step S1120; if not, execute step S1115;
  • Step S1120: judge whether the service weight of the second server has reached the preset weight; if yes, execute step S1121; if not, execute step S1118;
  • Step S1121: determine the second server as a third server in the normal service state.
  • FIG. 12 is an example diagram of server state control performed with the server state control method in a content distribution network, provided by another embodiment of the present application, including but not limited to the following steps:
  • Step S1210: obtain the total service traffic to be processed and the service bandwidth of each first server;
  • Step S1211: obtain the load index from the total service traffic and each service bandwidth;
  • Step S1212: judge whether the load index is greater than or equal to the preset danger threshold; if yes, execute step S1213; if not, execute step S1210;
  • Step S1213: according to the service weights, determine the second server with the highest service weight as the third server;
  • Step S1214: judge whether the load index is still greater than or equal to the preset danger threshold; if yes, execute step S1213; if not, end this round of server state control.
  • An embodiment of the present application also provides a server state control system 1300, including a memory 1320, a processor 1310, and a computer program stored in the memory 1320 and executable on the processor 1310.
  • When the processor 1310 executes the computer program, the server state control method of any of the foregoing embodiments is implemented, for example method steps S210 to S230 in FIG. 2, method step S310 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S530 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, method steps S810 to S820 in FIG. 8, method steps S910 to S930 in FIG. 9, method steps S1010 to S1020 in FIG. 10, method steps S1110 to S1121 in FIG. 11, and method steps S1210 to S1214 in FIG. 12.
  • An embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors, for example to perform method steps S210 to S230 in FIG. 2, method step S310 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S530 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, method steps S810 to S820 in FIG. 8, method steps S910 to S930 in FIG. 9, method steps S1010 to S1020 in FIG. 10, method steps S1110 to S1121 in FIG. 11, and method steps S1210 to S1214 in FIG. 12.
  • Embodiments of the present application include a server state control method, a system, and a storage medium. The server state control method is applied to a server state control system in a content distribution network and includes: obtaining the health value of a first server in an isolated state, the health value representing the service quality of the server; determining a first server whose health value is greater than the preset health threshold as a second server, the second server being a server in a recovery state; and gradually increasing the service weight of the second server until the service weight reaches the preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase, determining the second server as a third server, the third server being a server in a normal service state.
  • According to the solution provided by the embodiments, while one or more servers in the recovery state are being recovered, their weights are raised gradually and traffic is allocated to them according to the gradually assigned weights. This reduces the cases in which a server that has just returned to the normal service state enters the isolated state again because too much traffic was transferred to it, and thereby improves the utilization of server resources.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
  • Communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer And Data Communications (AREA)
  • Hardware Redundancy (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

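The present application discloses a server state control method, a system, and a storage medium. The server state control method is applied to a server state control system in a content distribution network and includes: obtaining the health value of a first server in an isolated state, the health value representing the service quality of the server (S210); determining a first server whose health value is greater than a preset health threshold as a second server in a recovery state (S220); and gradually increasing the service weight of the second server until the service weight reaches a preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase, determining the second server as a third server in a normal service state (S230).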

Description

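Server state control method, system, and storage medium
Cross-Reference to Related Applications
This application is based on, and claims priority to, Chinese patent application No. 202111342245.9, filed on November 12, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of traffic scheduling in content distribution networks, and for example to a server state control method, a system, and a storage medium.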
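Background
In a content distribution network, most faults are caused by temporary failures of the network environment, or by insufficient response speed under an excessive load rate. An existing isolation and recovery mechanism calculates the server load from parameters such as bandwidth utilization, compares it with a set threshold to judge whether the server's service quality is up to standard, and isolates or recovers the server accordingly.
However, in some recovery mechanisms, once a specific indicator of an isolated server reaches the threshold, the isolated server is considered to have resumed normal service, is set directly to the normal service state, and is allocated traffic according to its preconfigured service weight. Too much traffic is then easily transferred to the server, so that a server that has just returned to the normal service state enters the isolated state again, which lowers the utilization of server resources.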
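Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of protection of the claims.
Embodiments of the present application provide a server state control method, a system, and a storage medium that reduce the cases in which a server that has just returned to the normal service state enters the isolated state again, thereby improving the utilization of server resources.
It should be noted that the health value referred to in this application is an indicator of a server's service quality. If the health value of a server is greater than the preset health threshold, the server is considered to be in a state where its service quality is up to standard, that is, a healthy state, and can provide high-quality service to the connected users.
In a first aspect, an embodiment of the present application provides a server state control method applied to a server state control system in a content distribution network, the method including: obtaining the health value of a first server in an isolated state, the health value representing the service quality of the server; determining the first server whose health value is greater than a preset health threshold as a second server, the second server being a server in a recovery state; and gradually increasing the service weight of the second server until the service weight reaches a preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase of the service weight, determining the second server as a third server, the third server being a server in a normal service state.
In a second aspect, an embodiment of the present application provides a server state control system, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the server state control method of any embodiment of the first aspect.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions for performing the server state control method of any embodiment of the first aspect.
Other features and advantages of the present application will be set forth in the description that follows, and the objectives and other advantages of the present application may be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.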
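Brief Description of the Drawings
The drawings are provided for a further understanding of the technical solution of the present application and form part of the description. Together with the embodiments of the present application, they serve to explain the technical solution and do not limit it.
FIG. 1 is an architectural block diagram of a server state control system provided by an embodiment of the present application;
FIG. 2 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 3 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 4 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 5 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 6 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 7 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 8 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 9 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 10 is a flowchart of the steps of a server state control method provided by another embodiment of the present application;
FIG. 11 is an example diagram of server state control performed with the server state control method in a content distribution network, provided by another embodiment of the present application;
FIG. 12 is an example diagram of server state control performed with the server state control method in a content distribution network, provided by another embodiment of the present application;
FIG. 13 is a schematic diagram of a server state control system provided by another embodiment of the present application.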
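Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
It should be noted that although functional modules are divided in the system diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the system, or in an order different from that of the flowcharts. The terms "first", "second", and the like in the description, the claims, and the drawings are used to distinguish similar objects and do not necessarily describe a particular order or sequence.
Embodiments of the present application include a server state control method, a system, and a storage medium. The server state control method is applied to a server state control system in a content distribution network and includes: obtaining the health value of a first server in an isolated state, the health value representing the service quality of the server; determining a first server whose health value is greater than the preset health threshold as a second server, the second server being a server in a recovery state; and gradually increasing the service weight of the second server until the service weight reaches the preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase, determining the second server as a third server, the third server being a server in a normal service state. According to the solution provided by the embodiments, based on the health value and weight of each server, the weight of one or more servers in the recovery state can be raised gradually during recovery, and users are allocated to the recovering servers in several batches, so that the recovering servers also stay in a state where their service quality is up to standard. This raises the proportion of servers in the content distribution network whose service quality is up to standard and improves the service quality of the network.
It should be noted that the server state control method proposed in this application is essentially intended to let the content distribution network schedule traffic better. During normal operation of the content distribution network, the local network environment of a server, the occupancy of its own resources, and other unexpected conditions can put the server in an "unhealthy state" with poor service quality; by contrast, the state of a server that can provide good service quality is called a "healthy state". To ensure that the servers in the content distribution network are all healthy and thus provide good content service to users, the service quality of the servers across the network needs to be evaluated and traffic scheduled according to the evaluation result: unhealthy servers are isolated and servers that have regained health are returned to service, which keeps the network running smoothly with good service quality.
The embodiments of the present application are further described below with reference to the drawings.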
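Referring to FIG. 1, an embodiment of the present application provides a specific architecture of a server state control system based on a content distribution network. The server state control system includes a hardware layer, a data layer, a data preprocessing layer, an algorithm execution layer, and a master control layer. The system handles every step from data generation to the prediction result, and each layer has its own data format check, data integrity check, and data completion algorithms, which greatly improves the robustness of the system.
The hardware layer consists of the servers in the content distribution network. According to how each server is serving in the network, the servers in the scheduling system network are divided into a normal service group, an isolation group, and a recovery group: the normal service group contains the servers in the normal service state, the isolation group contains the servers in the isolated state, and the recovery group contains the servers in the recovery state.
It should be noted that servers in the normal service group perform the normal service function in the content distribution network and provide traffic to users. Servers in the isolation group are servers whose health value is less than the preset health threshold; once a server is set to the isolated state, the user terminals connected to it are reassigned to servers in the normal service state in the normal service group. Servers in the recovery group are servers from the isolation group whose health value is greater than the preset health threshold. After a server is isolated, the server state control system periodically checks the health values of the servers in the isolation group, and when the health value of such a server is greater than the preset health threshold, the server is moved into the recovery group for monitoring.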
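The data layer includes a server state data interface, a network state data interface, and a scheduling data delivery interface.
It should be noted that the server state data interface obtains the underlying data of each server, such as the CPU load rate, memory load rate, and disk load rate of each server in the content distribution network. The network state data interface obtains, from the network ports of each server, the network condition parameters between the server and its users, such as the proportion of user requests answered with 5XX errors, the server's bandwidth utilization, and the server's back-to-source rate. The scheduling data delivery interface delivers the isolation and recovery commands of the group isolation and group recovery modules, and the per-server weight parameters generated by the intelligent scheduling module, to the servers, so that each server performs the corresponding operation on receiving a command. For example, when the scheduling data delivery interface delivers a group isolation command to a server, the server stops serving automatically and the GSLB (global server load balancing) system of the content distribution network schedules the user requests on that server to other servers. Those skilled in the art will understand that GSLB is a commonly used technology in the field, so it is not described further here. After collecting data, the server state data interface and the network state data interface pass it to the data preprocessing layer for preprocessing.
The data preprocessing layer includes a data formatting module and a data cleaning module. Because the content distribution network is a distributed network system whose devices have different hardware, the types of data uploaded through the interfaces differ, so the uploaded parameters need to be preprocessed to extract the valid data from them.
It should be noted that the data formatting module extracts, from the differently itemized data obtained through the server state data interface and the network state data interface, the data for the items required by the detection module, and reassembles that data into a format the detection module can process. The data cleaning module checks the authenticity and reliability of the data obtained through the two interfaces, cleans out data that is obviously wrong, and sends a re-collection instruction to the data layer so that the data layer collects that data again and the obviously wrong data is updated promptly.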
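The algorithm execution layer includes a detection module, a group isolation module, a group recovery module, a capacity decision module, and an intelligent scheduling module. The detection module obtains the health value of each server in the content distribution network; the group isolation module isolates servers; the group recovery module controls the recovery of isolated servers; the capacity decision module keeps the servers in a stable service state; and the intelligent scheduling module performs intelligent scheduling among the servers in the normal service state so that they are in a balanced state.
It should be noted that, during server state control, the group recovery instruction, the weight setting instruction, and the normal service group setting instruction are all issued by the group recovery module and delivered to the target server through the scheduling data delivery interface.
The master control layer includes a master control module, which performs global configuration and control of the server state control system.
It should be noted that, in an embodiment, the server state control system is further provided with a master control server and an impedance server. The master control server controls the isolation of a server according to the QoS parameter information generated by the server itself, and the impedance server controls the assignment of user terminals to servers other than the isolated server.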
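In addition, referring to FIG. 2, FIG. 2 is a flowchart of the steps of a server state control method provided by an embodiment of the present application. The method is applied to a server state control system in a content distribution network and includes, but is not limited to, the following steps:
Step S210: obtain the health value of a first server in an isolated state, the health value representing the service quality of the server.
It should be noted that the solution proposed in this application judges the state of a server by comparing its health value with the preset health threshold. When the health value is greater than the preset health threshold, the server is healthy and can provide the connected user clients with service whose quality meets expectations. Conversely, when the health value is less than the preset health threshold, the server is unhealthy and cannot provide such service. When the health value equals the preset health threshold, those skilled in the art may, according to how the health threshold is actually defined, classify the server as either healthy or unhealthy and handle it accordingly.
It should be noted that, in this embodiment, a server in the isolated state is an unhealthy server. It carries no traffic load in the content distribution network, so that it does not give users a poor experience.
It is worth noting that the specific value of the preset health threshold depends on the service quality that those skilled in the art expect, in practice, from the servers in the content distribution network. It can be set according to the actual operating data of the servers that are in the normal service state.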
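Step S220: determine a first server whose health value is greater than the preset health threshold as a second server, the second server being a server in a recovery state.
It should be noted that a first server in the isolated state carries no traffic load. Over time it gradually regains the server resources it had consumed, such as memory and bandwidth, and its health value rises. Whenever first servers exist, the server state control system periodically checks the health value of each of them. When the health value of a first server is greater than the preset health threshold, that server is considered able to process part of the user service data while still meeting the service quality requirement; it is determined as a second server and prepared for the subsequent recovery processing.
It should be noted that a second server is a server in the recovery state. Being in the recovery state means that the server can already process some user service data while meeting the service quality requirement, but this capacity may not yet match that of a server in the normal service state. Users are therefore transferred to it gradually, in batches, according to the recovery strategy, while its health value is checked against the preset health threshold, which gives the server a recovery process.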
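Step S230: gradually increase the service weight of the second server until the service weight reaches the preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase of the service weight, determine the second server as a third server, the third server being a server in a normal service state.
The service weight determines the amount of traffic, or the number of user client connections, allocated to the corresponding server. Increasing the service weight allocates more traffic to the server, and the allocated traffic grows with the weight. More traffic raises the server's load, and the health value fluctuates with the load; in general, the health value falls as the load rises. Therefore, in this step, if the health value is still greater than the preset health threshold when the gradually increased service weight reaches the preset weight, the second server is able to handle the traffic it should carry in the normal service state. To keep the second server healthy, that is, with service quality up to standard, throughout recovery, its health value must be checked periodically or continuously while the service weight is increased, ensuring that the health value stays greater than the preset health threshold during the whole increase. The second server can then still provide its connected user clients with service whose quality meets expectations while recovering. This reduces the cases in which a server that has just returned to the normal service state enters the isolated state again because too much traffic was transferred to it, and improves server utilization.
It can be understood that a third server is a server in the normal service state. Such servers perform the normal service function in the content distribution network and provide traffic to users. Being in the normal service state means that the health value of the server is greater than the preset health threshold, so the server is healthy.
It can be understood that this application mainly concerns a method for controlling the states of the various servers in a content distribution network. By controlling server state changes more effectively, customer traffic and services can be scheduled in a more timely manner, network congestion is reduced, and the robustness of the whole content distribution network is improved. The first server, second server, and third server are servers in different service states in the content distribution network; this classification does not limit the servers themselves or the content distribution network.
In an embodiment, the service weight is obtained from the scheduling weight set by the content distribution network and a built-in weight coefficient. The scheduling weight is the weight that the content distribution network, through its own intelligent scheduling function, determines a server should be allocated in the normal service state; the weight coefficient represents the recovery state and recovery progress of a recovering server. During recovery, the service weight is increased by increasing the weight coefficient, which gradually increases the traffic the server has to process. For example, the product of the scheduling weight and the weight coefficient is taken as the service weight, and the traffic to be processed by a server is determined from the total pending service volume of the content distribution network and the service weight. Those skilled in the art can set the range of the weight coefficient according to the actual situation to control the expected recovery behavior.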
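The weight calculation in this embodiment can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the coefficient range of 0 to 1, and the proportional split of pending requests are all assumptions made for the example.

```python
def service_weight(scheduling_weight: float, coefficient: float) -> float:
    """Service weight = scheduling weight x weight coefficient.

    Assumption: the coefficient lies in [0, 1], where 1 means fully recovered.
    """
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must be within [0, 1]")
    return scheduling_weight * coefficient


def allocate(total_requests: int, weights: dict[str, float]) -> dict[str, int]:
    """Split pending requests across servers in proportion to their service weights.

    Assumption: a proportional split; the source only says the pending traffic is
    determined from the total service volume and the service weight.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        return {name: 0 for name in weights}
    return {name: int(total_requests * w / total_weight) for name, w in weights.items()}


# A normal server (coefficient 1.0) next to a recovering one (coefficient 0.25).
weights = {
    "normal": service_weight(4.0, 1.0),
    "recovering": service_weight(4.0, 0.25),
}
shares = allocate(1000, weights)
```

With these hypothetical numbers the recovering server receives a fifth of the requests, and its share grows as its coefficient is raised.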
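In addition, referring to FIG. 3, in an embodiment, step S230 of the embodiment shown in FIG. 2 includes, but is not limited to, the following step:
Step S310: gradually increase the service weight of the second server according to a preset weight increase rule until the service weight reaches the preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase of the service weight, determine the second server as the third server.
It should be noted that the service weight of the second server is increased gradually according to the preset weight increase rule until it reaches the preset weight. Under the rule, the service weight may increase linearly, by the same unit weight each time, for example 0.1, 0.2, 0.3, 0.4; or it may increase exponentially, with the added weight changing with the number of increases, for example 0.1, 0.2, 0.4, 0.7. The specific rule is not limited to these examples, and those skilled in the art can adjust it according to the actual situation.
It is worth noting that, while the service weight of the second server is increased according to the preset weight increase rule, the health value of the second server should be checked after every increase to ensure that it is greater than the preset health threshold. When the service weight reaches the preset weight, the second server has fully recovered; it is then determined as a third server and takes on the normal service function in the content distribution network.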
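The two example sequences can be reproduced with a short sketch. The function names are hypothetical, and the second rule is an assumption inferred from the example 0.1, 0.2, 0.4, 0.7, where the increment itself grows by one step each time (0.1, 0.2, 0.3); the source does not give a formula.

```python
def linear_schedule(step: float, target: float) -> list[float]:
    """Weights that grow by the same step each time, capped at the target."""
    weights, current = [], step
    while current < target:
        weights.append(current)
        current = round(current + step, 9)  # rounding hides float noise
    weights.append(target)
    return weights


def accelerating_schedule(step: float, target: float) -> list[float]:
    """Weights whose increment grows by one step per increase (assumed rule)."""
    weights, current, k = [], step, 1
    while current < target:
        weights.append(current)
        current = round(current + step * k, 9)
        k += 1
    weights.append(target)
    return weights


linear = linear_schedule(0.1, 0.4)
accelerating = accelerating_schedule(0.1, 0.7)
```

Either list can be fed to a recovery loop one weight per recovery period; the last element is always the preset weight.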
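In addition, referring to FIG. 4, in an embodiment, step S310 of the embodiment shown in FIG. 3 includes, but is not limited to, the following steps:
Step S410: increase the service weight of the second server by a first unit weight according to the preset weight increase rule, and obtain the health value of the second server.
Step S420: when the health value is greater than the preset health threshold, perform again the step of increasing the service weight of the second server by the first unit weight according to the preset weight increase rule and obtaining the health value of the second server, until the service weight reaches the preset weight and the health value is greater than the preset health threshold, and then determine the second server as the third server.
It should be noted that the service weight of the second server is increased by the first unit weight according to the preset weight increase rule and the health value of the second server is obtained, and the interval between successive increases of the first unit weight is taken as a recovery period. In each recovery period, the service weight of the server is increased and its health value is monitored continuously. Once the scheduling weight of the server has reached the normal weight and the health value has stayed above the health threshold in every recovery period, the second server is set as a third server. The normal weight is the weight that the content distribution network, through the intelligent scheduling module, assigns to the server in the normal service state.
It should be noted that the first unit weight set in this step is not limited to a fixed value; it can vary with the weight increase rule, which gives the recovering server a smoother, more gradual recovery and reduces the cases in which a server that has just returned to the normal service state enters the isolated state again because too much traffic was transferred to it.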
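Steps S410 and S420 amount to a loop over recovery periods. The sketch below assumes a hypothetical `health_at` probe that returns the health value observed under a given weight, and it treats a health value equal to the threshold as unhealthy, which the source leaves to the implementer.

```python
from typing import Callable, Sequence


def recover(schedule: Sequence[float],
            health_at: Callable[[float], float],
            health_threshold: float) -> tuple[str, float]:
    """Walk the weight schedule; return the resulting state and the last weight applied."""
    if not schedule:
        raise ValueError("schedule must contain at least the preset weight")
    weight = schedule[0]
    for weight in schedule:
        # One recovery period: apply the weight (S410), then check the health value (S420).
        if health_at(weight) <= health_threshold:
            return "isolated", weight  # unhealthy during ramp-up: back to first server
    return "normal", weight  # preset weight reached while healthy: third server


# Hypothetical probes: health falls as the weight (and therefore the load) rises.
steady = recover([0.25, 0.5, 1.0], lambda w: 90 - 20 * w, 60)
fragile = recover([0.25, 0.5, 1.0], lambda w: 90 - 80 * w, 60)
```

In this example the first server stays healthy through the whole schedule and becomes a normal server, while the second drops below the threshold at weight 0.5 and is isolated again before it ever receives its full share.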
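In addition, referring to FIG. 5, in an embodiment, the embodiment shown in FIG. 2 further includes, but is not limited to, the following steps:
Step S510: obtain the total service traffic to be processed and the service bandwidth of each first server.
Step S520: obtain a load index from the total service traffic and each service bandwidth.
Step S530: when the load index is greater than or equal to a preset danger threshold, determine several first servers and/or several second servers as third servers.
The total service traffic to be processed is the total traffic that the whole content distribution network needs to handle. The main service function of the content distribution network is carried by the third servers in the normal service state, so the service bandwidth of each third server needs to be obtained to determine the overall load capacity of the network. Service bandwidth is only one indicator of a server's load capacity; those skilled in the art can choose the type of indicator according to the actual situation, and this does not limit the solution of this embodiment. What this embodiment mainly seeks to protect is a method of increasing the number of third servers, so as to raise the overall load capacity of the content distribution network quickly, when under special circumstances the third servers in the normal service state cannot carry the main service function of the network.
The load index is obtained by comparing the total traffic in the content distribution network with the total service bandwidth of the servers in the normal service state. It characterizes the current load of the content distribution network, and comparing it with the preset danger threshold shows whether that load is too high.
In an embodiment, the ratio of the total traffic in the content distribution network to the total service bandwidth of the servers in the normal service state is used as the load index. If this ratio reaches the preset danger threshold, several first servers and/or several second servers are determined as third servers to reduce the load on each server currently in the normal service state and keep those servers in a stable service state.
In an embodiment, multiple preset danger thresholds are provided, each corresponding to a different scheduling strategy for the first and second servers. For example, with a first danger threshold A and a second danger threshold B: when the load index reaches A, only the state of second servers is changed, and several second servers are determined as third servers to relieve the load on the content distribution network; when the load index reaches B, the states of both first and second servers are changed, and several first servers and several second servers are determined as third servers. Under normal circumstances, a second server in the recovery state can generally carry more load than a first server in the isolated state, so second servers are promoted first; when the load index is greater than or equal to the preset danger threshold and there are not enough second servers, first servers are determined as third servers to relieve the load on the servers in the normal service state.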
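The ratio-based load index and the two-threshold policy can be sketched as follows. The function names and the group labels are hypothetical, and the sketch assumes that threshold A is lower than threshold B.

```python
def load_index(total_traffic: float, normal_bandwidths: list[float]) -> float:
    """Ratio of total traffic to the total bandwidth of the normal-state servers."""
    capacity = sum(normal_bandwidths)
    return float("inf") if capacity == 0 else total_traffic / capacity


def promotion_pool(index: float, threshold_a: float, threshold_b: float) -> list[str]:
    """Which server groups may be promoted to normal service (assumes A < B)."""
    if index >= threshold_b:
        return ["recovering", "isolated"]  # second servers first, then first servers
    if index >= threshold_a:
        return ["recovering"]  # only second servers
    return []  # load is acceptable, nothing to promote


index = load_index(900, [500, 500])
pool = promotion_pool(index, threshold_a=0.8, threshold_b=0.95)
```

Here 900 units of traffic against 1000 units of bandwidth give an index of 0.9, which is above A but below B, so only recovering servers are candidates for promotion.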
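In addition, referring to FIG. 6, in an embodiment, step S530 of the embodiment shown in FIG. 5 further includes, but is not limited to, the following steps:
Step S610: when the load index is greater than or equal to the preset danger threshold, obtain the service weight of each second server.
Step S620: according to the service weights, determine the second server with the highest service weight as the third server.
After the load index representing the overall load of the content distribution network is obtained, and when it is greater than or equal to the preset danger threshold, the service weight of each second server is obtained and the second server with the highest service weight is determined as the third server. The higher a server's service weight, the further its recovery has progressed and the more normal service it can carry. Promoting the second server with the highest service weight first therefore relieves, for the same number of promoted servers, more of the load on the servers in the normal service state, lowers the load index, and lets the content distribution network provide more stable service.
In an embodiment, if the load index is still greater than or equal to the preset danger threshold after one execution of the step of determining the second server with the highest service weight as the third server, that step is performed again, and this repeats until the load index is less than the preset danger threshold, so that the content distribution network can provide service whose quality meets expectations.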
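The repeated promotion in steps S610 and S620 can be sketched as a loop. The function name and data layout are hypothetical, and the sketch assumes that a promoted server immediately contributes its full service bandwidth to the normal group, which the source does not state.

```python
def relieve_overload(total_traffic: float,
                     normal_bandwidths: list[float],
                     recovering: dict[str, tuple[float, float]],
                     danger_threshold: float) -> list[str]:
    """Promote recovering servers, highest service weight first, until the load index drops.

    `recovering` maps a server name to (service_weight, service_bandwidth).
    Returns the names of the promoted servers in promotion order.
    """
    normal = list(normal_bandwidths)
    pool = dict(recovering)
    promoted = []

    def load_index() -> float:
        capacity = sum(normal)
        return float("inf") if capacity == 0 else total_traffic / capacity

    while pool and load_index() >= danger_threshold:
        name = max(pool, key=lambda n: pool[n][0])  # highest service weight (S620)
        _, bandwidth = pool.pop(name)
        normal.append(bandwidth)  # assumption: full bandwidth becomes available
        promoted.append(name)
    return promoted


candidates = {"r1": (0.4, 200.0), "r2": (0.7, 300.0), "r3": (0.2, 400.0)}
one_step = relieve_overload(1000.0, [500.0, 500.0], candidates, 0.8)
two_steps = relieve_overload(1000.0, [500.0, 500.0], candidates, 0.7)
```

With a threshold of 0.8 a single promotion is enough; with the stricter threshold of 0.7 the loop promotes a second server before the index falls below the threshold.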
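In addition, referring to FIG. 7, in an embodiment, the embodiment shown in FIG. 2 further includes, but is not limited to, the following steps:
Step S710: obtain the health value of the second server.
Step S720: determine a second server whose health value is less than the preset health threshold as a first server.
The health value of each second server is obtained, and any second server whose health value is less than the preset health threshold is determined as a first server. This ensures that every second server in the recovery state has a health value greater than the preset health threshold, that is, is a healthy server. The second servers represent the recovery group of the content distribution network. To reduce the cases in which a server that has just returned to the normal service state enters the isolated state again, unhealthy servers among them need to be isolated according to the health value: such a second server is determined as a first server and, through the traffic scheduling function of the content distribution network itself, the user terminals connected to the isolated second server are reassigned to the third servers in the normal service state.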
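In addition, referring to FIG. 8, in an embodiment, the embodiment shown in FIG. 2 further includes, but is not limited to, the following steps:
Step S810: obtain the health value of the third server.
Step S820: determine a third server whose health value is less than the preset health threshold as a first server.
The health value of each third server is obtained, and any third server whose health value is less than the preset health threshold is determined as a first server. This ensures that every third server in the normal service state has a health value greater than the preset health threshold, that is, is a healthy server. The third servers represent the normal service group of the content distribution network. To keep every server in that group healthy and meet the quality requirements of normal service, unhealthy servers among them need to be isolated according to the health value: such a third server is determined as a first server and, through the traffic scheduling function of the content distribution network itself, the user terminals connected to the isolated third server are reassigned to the third servers in the normal service state, so that users continue to receive service whose quality meets expectations.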
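In addition, referring to FIG. 9, in an embodiment, the method for obtaining the health value in the embodiment shown in FIG. 2 includes, but is not limited to, the following steps:
Step S910: obtain the user service data of each server, where the servers include at least one of the following: a first server, a second server, and a third server.
Step S920: obtain a service quality evaluation result from the user service data.
Step S930: obtain the health value from the service quality evaluation result.
The user service data of each server in the content distribution network is obtained, where the servers include at least one of the following: a first server, a second server, and a third server. The first, second, and third servers are only a classification by server state and place no limit on the servers themselves, so the method of obtaining a server's health value does not change with the server's state.
In an embodiment, the underlying data of each server is obtained through the underlying data interface, the network condition parameters between each server and its users are obtained through the network state data interface, and the user service data is obtained by applying data formatting and data cleaning to the underlying data and the network condition parameters.
It can be understood that analyzing the user service data to obtain a service quality evaluation result, and generating from that result a health value representing the server's service quality, is existing technology. Those skilled in the art can choose different algorithms for this analysis according to the actual situation, so it is not described further here.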
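Because the scoring algorithm is left open, the following is only one possible sketch of steps S920 and S930. The metric names follow the examples given for the data layer (CPU load rate, 5XX error ratio, bandwidth utilization), but the weighted-penalty formula, the weights, and the 0 to 100 scale are assumptions chosen for illustration.

```python
# Assumed penalty weights for each metric; they sum to 1 so the score spans 0..100.
ASSUMED_WEIGHTS = {
    "cpu_load": 0.2,
    "error_5xx_ratio": 0.5,
    "bandwidth_utilization": 0.3,
}


def health_value(metrics: dict[str, float]) -> float:
    """Map metrics in [0, 1] to a health value in [0, 100]; higher is healthier."""
    penalty = 0.0
    for name, weight in ASSUMED_WEIGHTS.items():
        value = min(max(metrics[name], 0.0), 1.0)  # clamp out-of-range samples
        penalty += weight * value
    return round(100.0 * (1.0 - penalty), 2)


score = health_value({
    "cpu_load": 0.5,
    "error_5xx_ratio": 0.1,
    "bandwidth_utilization": 0.5,
})
```

The resulting score would then be compared with the preset health threshold; a deployment would tune both the weights and the threshold against operating data from servers in the normal service state.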
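In addition, referring to FIG. 10, in an embodiment, the server state control method of any of the embodiments of FIG. 2 to FIG. 9 includes, but is not limited to, the following steps:
Step S1010: obtain the load rate of each third server;
Step S1020: balance the third servers according to their load rates, so that the third servers are in a balanced state.
The load rate of each third server in the normal service state is obtained from the underlying server data and the network condition parameters. Using the traffic scheduling function of the content distribution network itself, the server state control system provides scheduling for the third servers in the normal service state, distributes traffic among them according to their load rates, and balances the traffic assigned to each one. Every third server is thus kept in a balanced state, which reduces the cases in which a sudden traffic surge overloads some third servers and brings down the content distribution network.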
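One simple reading of "balanced state" is that every third server ends up with the same load rate. The sketch below implements that reading; the function name and the choice of equal load rates as the balancing goal are assumptions, since the source does not define the balancing algorithm.

```python
def balance(loads: dict[str, float], capacities: dict[str, float]) -> dict[str, float]:
    """Return per-server target loads that give every server the same load rate.

    Load rate is taken as load / capacity; the total load is preserved.
    """
    total_load = sum(loads.values())
    total_capacity = sum(capacities[name] for name in loads)
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return {name: total_load * capacities[name] / total_capacity for name in loads}


targets = balance({"a": 90.0, "b": 10.0}, {"a": 100.0, "b": 100.0})
rates = {name: targets[name] / 100.0 for name in targets}
```

In the example, a server running at 90% and one at 10% are both steered toward 50%, so a later surge is absorbed by the group rather than by the busiest server.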
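In addition, referring to FIG. 11, FIG. 11 is an example diagram of server state control performed with the server state control method in a content distribution network, provided by another embodiment of the present application, including but not limited to the following steps:
Step S1110: obtain the underlying data of each server in the content distribution network;
Step S1111: apply data formatting and data cleaning to the underlying data of each server to obtain user service data;
Step S1112: judge whether all servers in the content distribution network have been traversed; if not, execute step S1113; if yes, end this round of server state control;
Step S1113: obtain the health value of the server from its user service data;
Step S1114: judge whether the health value of the server is greater than the preset health threshold; if yes, execute step S1112; if not, execute step S1115;
Step S1115: determine the server as a first server in the isolated state;
Step S1116: judge whether the health value of the first server is greater than the preset health threshold; if yes, execute step S1117; if not, execute step S1115;
Step S1117: determine the first server as a second server in the recovery state;
Step S1118: gradually increase the service weight of the second server;
Step S1119: judge whether the health value of the second server is greater than the preset health threshold; if yes, execute step S1120; if not, execute step S1115;
Step S1120: judge whether the service weight of the second server has reached the preset weight; if yes, execute step S1121; if not, execute step S1118;
Step S1121: determine the second server as a third server in the normal service state.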
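The decisions in steps S1114 to S1121 can be condensed into a single transition function that is evaluated once per server per check. This is a sketch: the `State` enum and function name are hypothetical, and a health value equal to the threshold is treated as unhealthy, which the source leaves to the implementer.

```python
from enum import Enum


class State(Enum):
    ISOLATED = "first server"
    RECOVERING = "second server"
    NORMAL = "third server"


def next_state(state: State, health: float, health_threshold: float,
               weight: float, preset_weight: float) -> State:
    """Return the state a server moves to after one health check."""
    if health <= health_threshold:
        return State.ISOLATED  # S1114/S1116/S1119 "no" branch leads to S1115
    if state is State.ISOLATED:
        return State.RECOVERING  # S1117: healthy again, start recovery
    if state is State.RECOVERING and weight >= preset_weight:
        return State.NORMAL  # S1120 "yes" branch leads to S1121
    return state  # healthy but still ramping up, or already normal


after_check = next_state(State.RECOVERING, health=75, health_threshold=60,
                         weight=1.0, preset_weight=1.0)
```

Keeping the transition in one pure function makes the isolation and recovery rules easy to test independently of how health values and weights are collected.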
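In addition, referring to FIG. 12, FIG. 12 is an example diagram of server state control performed with the server state control method in a content distribution network, provided by another embodiment of the present application, including but not limited to the following steps:
Step S1210: obtain the total service traffic to be processed and the service bandwidth of each first server;
Step S1211: obtain the load index from the total service traffic and each service bandwidth;
Step S1212: judge whether the load index is greater than or equal to the preset danger threshold; if yes, execute step S1213; if not, execute step S1210;
Step S1213: according to the service weights, determine the second server with the highest service weight as the third server;
Step S1214: judge whether the load index is still greater than or equal to the preset danger threshold; if yes, execute step S1213; if not, end this round of server state control.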
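In addition, referring to FIG. 13, an embodiment of the present application also provides a server state control system 1300, including a memory 1320, a processor 1310, and a computer program stored in the memory 1320 and executable on the processor 1310. When the processor 1310 executes the computer program, the server state control method of any of the foregoing embodiments is implemented, for example method steps S210 to S230 in FIG. 2, method step S310 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S530 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, method steps S810 to S820 in FIG. 8, method steps S910 to S930 in FIG. 9, method steps S1010 to S1020 in FIG. 10, method steps S1110 to S1121 in FIG. 11, and method steps S1210 to S1214 in FIG. 12.
Furthermore, an embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions that are executed by one or more control processors, for example to perform method steps S210 to S230 in FIG. 2, method step S310 in FIG. 3, method steps S410 to S420 in FIG. 4, method steps S510 to S530 in FIG. 5, method steps S610 to S620 in FIG. 6, method steps S710 to S720 in FIG. 7, method steps S810 to S820 in FIG. 8, method steps S910 to S930 in FIG. 9, method steps S1010 to S1020 in FIG. 10, method steps S1110 to S1121 in FIG. 11, and method steps S1210 to S1214 in FIG. 12.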
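Embodiments of the present application include a server state control method, a system, and a storage medium. The server state control method is applied to a server state control system in a content distribution network and includes: obtaining the health value of a first server in an isolated state, the health value representing the service quality of the server; determining a first server whose health value is greater than the preset health threshold as a second server, the second server being a server in a recovery state; and gradually increasing the service weight of the second server until the service weight reaches the preset weight and, where the health value remains greater than the preset health threshold throughout the gradual increase, determining the second server as a third server, the third server being a server in a normal service state. According to the solution provided by the embodiments, while one or more servers in the recovery state are being recovered, their weights are raised gradually and traffic is allocated to them according to the gradually assigned weights. This reduces the cases in which a server that has just returned to the normal service state enters the isolated state again because too much traffic was transferred to it, and thereby improves the utilization of server resources.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information (such as computer-readable instructions, data structures, program modules, or other data). Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Several implementations of the present application have been described in detail above, but the application is not limited to these implementations. Those skilled in the art can make various equivalent modifications or substitutions without departing from the essence of the application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the application.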

Claims (11)

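  1. A server state control method, applied to a server state control system in a content distribution network, the method comprising:
    obtaining a health value of a first server in an isolated state, the health value representing the service quality of the server;
    determining the first server whose health value is greater than a preset health threshold as a second server, the second server being a server in a recovery state; and
    gradually increasing a service weight of the second server until the service weight reaches a preset weight and, where the health value is greater than the preset health threshold throughout the gradual increase of the service weight, determining the second server as a third server, the third server being a server in a normal service state.
  2. The method according to claim 1, wherein gradually increasing the service weight of the second server until the service weight reaches the preset weight and, where the health value is greater than the preset health threshold throughout the gradual increase of the service weight, determining the second server as the third server comprises:
    gradually increasing the service weight of the second server according to a preset weight increase rule until the service weight reaches the preset weight and, where the health value is greater than the preset health threshold throughout the gradual increase of the service weight, determining the second server as the third server.
  3. The method according to claim 2, wherein gradually increasing the service weight of the second server according to the preset weight increase rule until the service weight reaches the preset weight and, where the health value is greater than the preset health threshold throughout the gradual increase of the service weight, determining the second server as the third server comprises:
    increasing the service weight of the second server by a first unit weight according to the preset weight increase rule, and obtaining the health value of the second server; and
    when the health value is greater than the preset health threshold, performing again the step of increasing the service weight of the second server by the first unit weight according to the preset weight increase rule and obtaining the health value of the second server, until the service weight reaches the preset weight and the health value is greater than the preset health threshold, and determining the second server as the third server.
  4. The method according to claim 1, further comprising:
    obtaining a total service traffic to be processed and a service bandwidth of each first server;
    obtaining a load index from the total service traffic and each service bandwidth; and
    when the load index is greater than or equal to a preset danger threshold, determining several first servers and/or several second servers as third servers.
  5. The method according to claim 4, wherein, when the load index is not lower than the preset danger threshold, determining several first servers and/or several second servers as third servers comprises:
    when the load index is not lower than the preset danger threshold, obtaining the service weight of each second server; and
    according to the service weights, determining the second server with the highest service weight as the third server.
  6. The method according to claim 1, further comprising:
    obtaining the health value of the second server; and
    determining the second server whose health value is less than the preset health threshold as the first server.
  7. The method according to claim 1, further comprising:
    obtaining the health value of the third server; and
    determining the third server whose health value is less than the preset health threshold as the first server.
  8. The method according to claim 1, wherein the method for obtaining the health value comprises:
    obtaining user service data of each server, the servers including at least one of the following: the first server, the second server, and the third server;
    obtaining a service quality evaluation result from the user service data; and
    obtaining the health value from the service quality evaluation result.
  9. The method according to any one of claims 1 to 8, further comprising:
    obtaining a load rate of each third server; and
    balancing the third servers according to the load rates, so that the third servers are in a balanced state.
  10. A server state control system, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the server state control method according to any one of claims 1 to 9.
  11. A computer-readable storage medium storing computer-executable instructions for performing the server state control method according to any one of claims 1 to 9.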
PCT/CN2022/114281 2021-11-12 2022-08-23 Server state control method, system, and storage medium WO2023082765A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111342245.9A CN116126618A (zh) 2021-11-12 2021-11-12 Server state control method, system, and storage medium
CN202111342245.9 2021-11-12

Publications (1)

Publication Number Publication Date
WO2023082765A1 true WO2023082765A1 (zh) 2023-05-19

Family

ID=86296039

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/114281 WO2023082765A1 (zh) 2021-11-12 2022-08-23 Server state control method, system, and storage medium

Country Status (2)

Country Link
CN (1) CN116126618A (zh)
WO (1) WO2023082765A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116366657B (zh) * 2023-05-31 2023-08-04 天翼云科技有限公司 Data request scheduling method and system for a cache server

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100235195A1 (en) * 2009-03-10 2010-09-16 Searete Llc, A Limited Liability Corporation Of The State Of Delaware Computational systems and methods for health services planning and matching
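CN106339296A (zh) * 2016-08-31 2017-01-18 虎扑(上海)文化传播股份有限公司 Service state monitoring method and apparatus
CN108156091A (zh) * 2016-12-02 2018-06-12 阿里巴巴集团控股有限公司 Traffic control method and system
CN110351311A (zh) * 2018-04-02 2019-10-18 亿度慧达教育科技(北京)有限公司 Load balancing method and computer storage medium
CN111666170A (zh) * 2020-05-29 2020-09-15 中国工商银行股份有限公司 Faulty node processing method and apparatus based on a distributed framework
CN113127201A (zh) * 2021-04-23 2021-07-16 中国工商银行股份有限公司 Faulty application server isolation method and apparatus, electronic device, and storage medium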

Also Published As

Publication number Publication date
CN116126618A (zh) 2023-05-16

Similar Documents

Publication Publication Date Title
US11546644B2 (en) Bandwidth control method and apparatus, and device
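CN106933650B (zh) Load management method and system for a cloud application system
CN111614570B (zh) Traffic control system and method for a service mesh
JP6559670B2 (ja) Methods, systems, and computer-readable media for a network function virtualization information concentrator
JP4856760B2 (ja) Method, apparatus, and computer program for controlling the distribution of network traffic
EP2515504B1 (en) Content delivery method, system and schedule server
US20090234908A1 (en) Data transmission queuing using fault prediction
US20120084788A1 (en) Complex event distributing apparatus, complex event distributing method, and complex event distributing program
JP2010204876A (ja) Distributed system
CN108809848A (zh) Load balancing method and apparatus, electronic device, and storage medium
WO2023082765A1 (zh) Server state control method, system, and storage medium
CN103227754A (zh) Dynamic load balancing method and node device for a high-availability cluster system
CN112165436A (zh) Traffic control method, apparatus, and system
CN112272217B (zh) Kafka cluster load balancing method, system, device, and medium
CN113568756B (zh) Method and system for coordinated dynamic scheduling of cryptographic resources
US20210026341A1 (en) Network analysis program, network analysis device, and network analysis method
CN112711479A (zh) Load balancing system, method, and apparatus for a server cluster, and storage medium
CN110636109B (zh) Node scheduling optimization method, server, and computer-readable storage medium
US20170206125A1 (en) Monitoring system, monitoring device, and monitoring program
CN107426012B (zh) Fault recovery method and apparatus based on a hyper-converged architecture
CN109510730B (zh) Distributed system and monitoring method and apparatus therefor, electronic device, and storage medium
CN116723154A (zh) Routing distribution method and system based on load balancing
CN116382892A (zh) Load balancing method and apparatus based on multi-cloud convergence and cloud services
Gromoll et al. Fluid model for a data network with α-fair bandwidth sharing and general document size distributions: two examples of stability
CN114237910A (zh) Method and apparatus for implementing client-side load balancing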

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891572

Country of ref document: EP

Kind code of ref document: A1