WO2024016092A1 - Multi-level network traffic management - Google Patents

Multi-level network traffic management

Info

Publication number
WO2024016092A1
WO2024016092A1 PCT/CN2022/106137
Authority
WO
WIPO (PCT)
Prior art keywords
data
network
target object
bandwidth usage
target
Prior art date
Application number
PCT/CN2022/106137
Other languages
French (fr)
Inventor
Boyang ZHENG
Yehan WANG
Yu Chen
Jinyang ZHOU
Yuchao Dai
Zhenguo Yang
Bradley RUTKOWSKI
Original Assignee
Microsoft Technology Licensing, Llc
Priority date
Filing date
Publication date
Application filed by Microsoft Technology Licensing, Llc
Priority to PCT/CN2022/106137
Publication of WO2024016092A1

Classifications

    • H ELECTRICITY
      • H04 ELECTRIC COMMUNICATION TECHNIQUE
        • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
            • H04L41/08 Configuration management of networks or network elements
              • H04L41/0896 Bandwidth or capacity management, i.e. automatically increasing or decreasing capacities
            • H04L41/14 Network analysis or design
              • H04L41/147 Network analysis or design for predicting network behaviour
            • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks, using machine learning or artificial intelligence
          • H04L47/00 Traffic control in data switching networks
            • H04L47/10 Flow control; Congestion control
              • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
              • H04L47/25 Flow control; Congestion control with rate being modified by the source upon detecting a change of network conditions
            • H04L47/70 Admission control; Resource allocation
              • H04L47/82 Miscellaneous aspects
                • H04L47/822 Collecting or measuring resource availability data
              • H04L47/83 Admission control; Resource allocation based on usage prediction

Definitions

  • the cloud services may refer to a wide range of services delivered on demand over the Internet.
  • the cloud services are managed by cloud service providers and made available to customers from the providers' machines, e.g., cloud servers, so there's no need for customers to host applications or resources on their own on-premises servers.
  • Embodiments of the present disclosure propose a method, apparatus and computer program product for multi-level network traffic management.
  • a plurality of network events in a machine may be collected.
  • Network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events, the target object being a target process or a target component in a process. It may be detected whether the network bandwidth usage data is abnormal. In response to detecting that the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned.
  • the embodiments of the present disclosure also propose a multi-level network traffic management system.
  • the multi-level network traffic management system may comprise: a collector, configured to: collect a plurality of network events in a machine, generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; a monitor, configured to: detect whether the network bandwidth usage data is abnormal, and in response to detecting that the network bandwidth usage data is abnormal, generate a network traffic tuning request for the target object; and an executor, configured to: tune network traffic to be transmitted from the target object based on the network traffic tuning request.
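The collector/monitor/executor split above can be sketched in Python. All class names, field names, and the fixed threshold below are illustrative stand-ins, not the disclosure's implementation; in particular, the monitor here uses a simple threshold where the disclosure describes model-based anomaly detection:

```python
from dataclasses import dataclass

@dataclass
class NetworkEvent:
    process_name: str   # illustrative fields, not the actual event schema
    bytes_sent: int
    timestamp: float

class Collector:
    """Collects network events and derives per-object bandwidth usage data."""
    def __init__(self):
        self.events = []

    def collect(self, event):
        self.events.append(event)

    def usage_data(self, target):
        # one data point per event for simplicity; the disclosure
        # aggregates into time-series data, e.g., a data point per minute
        return [e.bytes_sent for e in self.events if e.process_name == target]

class Monitor:
    """Flags abnormal usage and emits a tuning request with a match condition."""
    def __init__(self, threshold):
        self.threshold = threshold

    def tuning_request(self, target, usage):
        if usage and all(point > self.threshold for point in usage):
            return {"match_condition": target, "action": "throttle"}
        return None

class Executor:
    """Turns a tuning request into an applied traffic tuning policy."""
    def apply(self, request):
        return {"match": request["match_condition"],
                "action": request["action"]}
```

A machine-level throttle would act on every process on the machine; here the match condition confines the action to the single offending object.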
  • FIG. 1 illustrates an exemplary multi-level network traffic management system according to an embodiment of the present disclosure.
  • FIG. 2 illustrates an exemplary process for obtaining network bandwidth usage data of a target process according to an embodiment of the present disclosure.
  • FIG. 3 illustrates an exemplary process for obtaining network bandwidth usage data of a target component according to an embodiment of the present disclosure.
  • FIG. 4 illustrates an exemplary process for detecting whether network bandwidth usage data of a target object is abnormal according to an embodiment of the present disclosure.
  • FIG. 5 illustrates an exemplary process for training a rate range predicting model according to an embodiment of the present disclosure.
  • FIG. 6 illustrates an exemplary process for using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure.
  • FIG. 7 illustrates an example of using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of an exemplary method for multi-level network traffic management according to an embodiment of the present disclosure.
  • FIG. 9 illustrates an exemplary multi-level network traffic management system according to an embodiment of the present disclosure.
  • FIG. 10 illustrates an exemplary apparatus for multi-level network traffic management according to an embodiment of the present disclosure.
  • In order to provide reliable cloud services, cloud service providers usually deploy a number of machines at various datacenters. These machines transmit tremendous network traffic across datacenters. Network traffic transmitted from a machine may be produced by various elements in the machine, such as services, processes, threads, etc. At present, many elements produce and transmit network traffic without restraint, which brings considerable network bandwidth costs. Moreover, faults inevitably occur in some elements from time to time. Even a small fault in a single element may lead to a widespread internal traffic flood. Such a traffic flood may occupy the entire network bandwidth and impact normal traffic. Network traffic can be managed so as to control the amount of traffic sent over the network. At present, the network traffic can be managed at a machine level.
  • network traffic transmitted from a machine can be throttled.
  • network traffic transmitted from a machine may involve multiple elements, such as multiple processes, etc. If all the network traffic transmitted from the machine is throttled, then important network traffic, such as customer-related network traffic, is also throttled. This may decrease the availability of the cloud service.
  • Embodiments of the present disclosure propose multi-level network traffic management.
  • the multi-level network traffic management may be implemented on e.g., a machine providing a cloud service.
  • a plurality of network events in a machine may be collected.
  • a network event may include at least the number of bytes transmitted at a corresponding moment.
  • Network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events.
  • a target object may refer to an object whose network traffic is to be monitored and/or tuned.
  • the target object may be a target process or a target component in a process.
  • a component in a process may be an application pool.
  • An application pool may be in a process, and may be responsible for isolating one or more applications into their own process. It may be detected whether the network bandwidth usage data is abnormal. If the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned.
  • the proposed network traffic management is fine-grained, and can be based on multiple levels, such as a process level, a component level, etc.
  • a number of processes are running on a machine at the same time.
  • Some processes may be high-priority processes, such as customer-related and latency-sensitive processes, while some processes may be low-priority processes, such as customer-unrelated or latency-insensitive processes.
  • a process may involve multiple components, e.g., multiple application pools. Some components may be high-priority components, while some components may be low-priority components.
  • Managing the network traffic at a process level or at a component level may prioritize network traffic of the high-priority processes or components, and deprioritize network traffic of the low-priority processes or components. In this way, the high-priority network traffic can be allocated with more bandwidth, and can be delivered with low latency.
  • the network bandwidth usage data of the target object can be generated and detected in real time. If it is detected that the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned immediately. For example, a traffic tuning policy may be applied on the network traffic.
  • the traffic tuning policy may specify a rate range and/or a priority for the network traffic.
  • the traffic tuning policy may take effect within one minute or even a few seconds, so that the network traffic may be tuned in time. In this way, a high-performing network can be maintained, and a cloud service with high reliability and availability can be guaranteed.
  • the traffic tuning policy may include a match condition. If a target object is the target process, the match condition in the traffic tuning policy may be a process name of the target process. If the target object is the target component in a process, the match condition may be a port corresponding to the target component. However, since all components in a process usually share the same source port and/or destination port, an original port corresponding to the target component cannot be used as the match condition for the target component.
  • the embodiments of the present disclosure propose to enable a port to be used as a match condition for a target component through leveraging a port proxy.
  • a destination port corresponding to a target component may be changed to a new destination port different from its original destination port through a port proxy.
  • a traffic tuning policy may be set for the target component.
  • the traffic tuning policy may include the new destination port.
  • the new destination port may be converted back to the original destination port at a receiving machine through a port proxy.
  • a port may be used as a match condition for setting a traffic tuning policy for a target component.
  • a component level may also be referred to as a port level.
  • This approach is lightweight, agile, and easy to achieve. With the above approach, there is no need for service owners to duplicate receiver-side code to listen to a new port, which greatly reduces the complexity of workloads.
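Since all components in a process usually share the same ports, the port-proxy approach above gives the target component a distinguishable destination port. A minimal sketch, with hypothetical port tables and component names standing in for a real port proxy:

```python
# Hypothetical mapping tables; a real deployment would use an actual port
# proxy (a local forwarder), not an in-process dict.
SENDER_PORT_MAP = {("AppPoolB", 443): 8443}   # target component -> new port
RECEIVER_PORT_MAP = {8443: 443}               # convert back on the receiver

def rewrite_dest_port(component, dest_port):
    """Sender side: give the target component its own destination port so
    a port-based match condition can single it out."""
    return SENDER_PORT_MAP.get((component, dest_port), dest_port)

def restore_dest_port(dest_port):
    """Receiver side: convert the proxied port back to the original one so
    the listening service is unchanged."""
    return RECEIVER_PORT_MAP.get(dest_port, dest_port)

def match_policy(dest_port, policy_port):
    """A traffic tuning policy matched purely on destination port."""
    return dest_port == policy_port
```

Only the target component's traffic carries the new port, so a policy matching on that port leaves sibling components in the same process untouched.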
  • FIG. 1 illustrates an exemplary multi-level network traffic management system 100 according to an embodiment of the present disclosure.
  • the multi-level network traffic management system 100 may be deployed on a local machine that is to transmit network traffic to a network.
  • the machine may be a machine providing a cloud service
  • the network may be a cloud service network.
  • the multi-level network traffic management system 100 may manage network traffic transmitted from the machine at multiple levels, such as at a process level, at a component level or a port level, etc.
  • the multi-level network traffic management system 100 may comprise, e.g., a collector 110, a monitor 120, an executor 130, etc.
  • the collector 110 may collect a plurality of network events 102 in the machine.
  • a network event may include multiple network statistics, e.g., a process identifier (ID), the number of bytes received, the number of bytes transmitted, a destination address, a source address, a destination port, a source port, etc.
  • the collector 110 may generate network bandwidth usage data 112 of a target object running on the machine based on the plurality of network events 102.
  • the network bandwidth usage data 112 may be time-series data, and may indicate network bandwidth usage of a target object at a predetermined time interval, e.g., a data point per minute.
  • the target object may be a target process running on the machine.
  • a process may involve a plurality of components, e.g., a plurality of application pools.
  • the target object may be a target component in a process, e.g., a target application pool in a process. That is, the collector 110 may generate network bandwidth usage data of a target process or a target component.
  • Network bandwidth usage data of a target process and network bandwidth usage data of a target component may be obtained through different processes.
  • An exemplary process for obtaining network bandwidth usage data of a target process will be described later in conjunction with FIG. 2.
  • An exemplary process for obtaining network bandwidth usage data of a target component will be described later in conjunction with FIG. 3.
  • the collector 110 may provide the network bandwidth usage data 112 to the monitor 120.
  • the monitor 120 may detect whether the network bandwidth usage data 112 is abnormal.
  • the monitor 120 may detect whether the network bandwidth usage data 112 is abnormal through performing anomaly detection on the network bandwidth usage data 112. An exemplary process for detecting whether network bandwidth usage data is abnormal will be described later in conjunction with FIG. 4.
  • if the monitor 120 detects that the network bandwidth usage data 112 of the target object is abnormal, network traffic to be transmitted from the target object may be tuned. For example, the monitor 120 may generate a network traffic tuning request 122 for the target object. In order to specify which object the network traffic tuning request 122 targets, the network traffic tuning request 122 may include a match condition. If the target object is the target process, the match condition may be a process name of the target process. If the target object is the target component, the match condition may be a port corresponding to the target component. However, since all components in a process usually share the same source port and/or destination port, an original port corresponding to the target component cannot be used as the match condition for the target component. An exemplary process for using a port as a match condition for a target component will be described later in conjunction with FIG. 6.
  • the network traffic tuning request 122 may include at least one action to be performed for the target object.
  • Actions to be performed for the target object may include, e.g., changing a priority of network traffic of the target object, throttling or even blocking the network traffic of the target object, etc.
  • the network traffic tuning request 122 may be provided to the executor 130.
  • the executor 130 may perform a network traffic tuning action 132 for the target object based on the network traffic tuning request 122.
  • the network traffic tuning action 132 may tune network traffic to be transmitted from the target object in a specific period.
  • the network traffic tuning action 132 may be to apply a traffic tuning policy on the network traffic to be transmitted from the target object.
  • the traffic tuning policy may be NetQosPolicy.
  • the traffic tuning policy may be generated based on the network traffic tuning request 122.
  • the traffic tuning policy may provide differential treatment to specific network traffic. In order to specify to which object the traffic tuning policy will target, the traffic tuning policy may include a match condition.
  • the match condition in the traffic tuning policy may be the same as that in the network traffic tuning request 122.
  • the match condition in the traffic tuning policy may be a process name of the target process.
  • the match condition in the traffic tuning policy may be a port corresponding to the target component. The process for using a port as a match condition for a target component may be learned from FIG. 6.
  • the traffic tuning policy may specify, e.g., a rate range, a priority, etc. for the network traffic. For example, if an action in the network traffic tuning request 122 is to throttle the network traffic, the traffic tuning policy may specify a rate range for the network traffic.
  • the rate range may include an upper rate limit and/or a lower rate limit for the network traffic.
  • the rate range may be provided by a rate range predicting model which is previously trained for the target object.
  • a Differentiated Services Code Point (DSCP) value may be used to indicate a specific priority for the network traffic.
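On a POSIX socket, a DSCP value can be carried in the top six bits of the IP TOS byte. The sketch below is one way to apply such a marking, not the mechanism specified by the disclosure (Windows deployments typically set DSCP through QoS policy rather than `IP_TOS`); whether routers honor the marking depends on network configuration:

```python
import socket

def dscp_to_tos(dscp):
    """DSCP occupies the upper 6 bits of the IP TOS / Traffic Class byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value")
    return dscp << 2

def mark_socket(sock, dscp):
    """Mark a socket's outgoing packets with the given DSCP value."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))

# e.g., 46 (Expedited Forwarding) for high-priority traffic,
# 8 (CS1) as a common low-priority marking
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
mark_socket(sock, 46)
sock.close()
```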
  • the proposed network traffic management system enables fine-grained network traffic management which is based on multiple levels, such as a process level, a component level or a port level, etc.
  • a number of processes are running on a machine at the same time.
  • Some processes may be high-priority processes, such as customer-related and latency-sensitive processes, while some processes may be low-priority processes, such as customer-unrelated or latency-insensitive processes.
  • a process may involve multiple components, e.g., multiple application pools. Some components may be high-priority components, while some components may be low-priority components.
  • Managing the network traffic at a process level or at a component level may prioritize network traffic of the high-priority processes or components, and deprioritize network traffic of the low-priority processes or components. In this way, the high-priority network traffic can be allocated with more bandwidth, and can be delivered with low latency.
  • the network bandwidth usage data 112 of the target object can be generated and detected in real time. If it is detected that the network bandwidth usage data is abnormal, the network traffic tuning action 132 can be performed for the target object immediately.
  • the traffic tuning policy for the target object may take effect within one minute or even a few seconds, so that the network traffic may be tuned in time. In this way, a high-performing network can be maintained, and a cloud service with high reliability and availability can be guaranteed.
  • the multi-level network traffic management system 100 illustrated in FIG. 1 is merely one example. Depending on actual application requirements, the multi-level network traffic management system 100 may have any other structure and may include more or fewer elements. Moreover, various modules in the multi-level network traffic management system 100 may perform other operations in addition to the operations described above. For example, when the collector 110 generates the network bandwidth usage data 112 of the target object, it may also upload the network bandwidth usage data 112 to a presenting platform for presenting through a dashboard, so that related technical persons may make a further analysis on the network bandwidth usage data 112. Also, in addition to the network traffic tuning request 122 from the monitor 120, the executor 130 may receive an on-demand network traffic tuning request which is manually triggered.
  • FIG. 2 illustrates an exemplary process 200 for obtaining network bandwidth usage data of a target process according to an embodiment of the present disclosure.
  • the process 200 may be performed through a collector in a multi-level network traffic management system, such as the collector 110 in FIG. 1.
  • a plurality of network events in a machine may be collected.
  • the plurality of network events may be collected through a network event tracing tool running on the machine.
  • the network event tracing tool may be Event Tracing for Windows (ETW).
  • a network event may include multiple network statistics, e.g., a process ID, the number of bytes received, the number of bytes transmitted, a destination address, a source address, a destination port, a source port, etc.
  • a network event set corresponding to a target process may be selected from the plurality of network events through performing process mapping.
  • a network event may include a process ID. Every time a process starts, it is assigned a process ID. Thus, the process ID of a process may change when the process restarts, and therefore cannot persistently identify the process.
  • a process name of the process may be used to uniquely and persistently identify the process.
  • a process name of a process which is transmitting network traffic at various moments may be obtained through a system Application Programming Interface (API) of the machine.
  • the process mapping may be performed with this system API, so that the network event set corresponding to the target process may be selected from the plurality of network events.
  • network bandwidth usage data of the target process may be generated based on the network event set.
  • a network event in the network event set may include the number of bytes transmitted from the target process.
  • the network bandwidth usage data may be time-series data.
  • the network bandwidth usage data of the target process may be generated based on, e.g., the number of bytes in each network event in the network event set and a timestamp of the network event.
  • the process 200 in FIG. 2 is merely an example of the process for obtaining the network bandwidth usage data of the target process.
  • the steps in the process for obtaining the network bandwidth usage data of the target process may be replaced or modified in any manner, and the process may comprise more or fewer steps.
  • FIG. 3 illustrates an exemplary process 300 for obtaining network bandwidth usage data of a target component according to an embodiment of the present disclosure.
  • the process 300 may be performed through a collector in a multi-level network traffic management system, such as the collector 110 in FIG. 1.
  • a plurality of network events in a machine may be collected.
  • the step 302 may be similar to the step 202 in FIG. 2.
  • a process corresponding to a target component may be determined.
  • a network event set corresponding to the determined process may be selected from the plurality of network events through performing process mapping.
  • the step 306 may be similar to the step 204 in FIG. 2.
  • a network event subset corresponding to the target component may be selected from the network event set through performing component mapping.
  • a component name of the target component may be used to uniquely and persistently identify the target component.
  • a component name of a component which is transmitting network traffic at various moments may be obtained through a system API of the machine. Taking a machine running the Windows operating system as an example, this system API may be Windows Management Instrumentation (WMI).
  • the component mapping may be performed with this system API, so that the network event subset corresponding to the target component may be selected from the network event set.
  • network bandwidth usage data of the target component may be generated based on the network event subset.
  • the step 310 may be similar to the step 206 in FIG. 2.
  • process 300 in FIG. 3 is merely an example of the process for obtaining the network bandwidth usage data of the target component.
  • the steps in the process for obtaining the network bandwidth usage data of the target component may be replaced or modified in any manner, and the process may comprise more or fewer steps.
  • FIG. 4 illustrates an exemplary process 400 for detecting whether network bandwidth usage data of a target object is abnormal according to an embodiment of the present disclosure.
  • the process 400 may be performed through a monitor in a multi-level network traffic management system, such as the monitor 120 in FIG. 1.
  • Network bandwidth usage data of a target object may be time-series data.
  • the time-series data may correspond to a predetermined time period, such as fifteen minutes.
  • the network bandwidth usage data may be divided into a plurality of data windows. For example, the network bandwidth usage data may be divided into three data windows, each data windows being five minutes. The number of data windows and the length of each data window may be determined based on, e.g., a service scenario, tolerance to the traffic anomaly, etc.
  • If all data windows in the plurality of data windows are abnormal data windows, it may be detected that the network bandwidth usage data is abnormal. Whether a data window is an abnormal data window may be determined through determining whether the data window includes at least one abnormal data point. If the data window includes at least one abnormal data point, it may be determined that the data window is an abnormal data window.
  • Each data window in the plurality of data windows includes one or more data points.
  • a predicted rate range of at least one data point in a data window may be obtained.
  • the predicted rate range of the data point may be defined by an upper limit and/or a lower limit for a rate value of the data point.
  • the predicted rate range may be intelligently and dynamically predicted through a rate range predicting model.
  • the rate range predicting model may be a machine learning model which is previously trained for the target object. The training of the machine learning model may take at least a priority of the target object into account. An exemplary process for training a rate range predicting model will be described later in conjunction with FIG. 5.
  • the rate range predicting model may be retrained at a predetermined time interval.
  • the predicted rate range may be predicted a predetermined time in advance. For example, assuming that the data point corresponds to a moment t, the predicted rate range of the data point may be predicted at a moment t-Δt, wherein Δt may be, e.g., three minutes, five minutes, etc.
  • it may be determined whether a real rate value of the at least one data point is outside the predicted rate range.
  • the real rate value of the data point is a value indicated by the network bandwidth usage data.
  • the process 400 may proceed to a step 408.
  • the process 400 may proceed to a step 414.
  • at step 418, it may be determined whether all data windows in the plurality of data windows have been traversed. If it is determined that not all data windows have been traversed, the process 400 may return to the step 404. The step 404 and the following steps may be performed for a next data window.
  • the process 400 may proceed to a step 420.
  • at 420, it may be determined that all data windows in the plurality of data windows are abnormal data windows. Then, at 422, it may be detected that the network bandwidth usage data is abnormal.
  • the process 400 in FIG. 4 is merely an example of the process for detecting whether network bandwidth usage data of a target object is abnormal.
  • the steps in the process for detecting whether network bandwidth usage data is abnormal may be replaced or modified in any manner, and the process may comprise more or fewer steps.
  • the rate range may also be predicted through a rule-based approach.
  • the specific order or hierarchy of the steps in the process 400 is merely exemplary, and the process for detecting whether network bandwidth usage data is abnormal may be performed in an order different from the described order.
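The all-windows rule of process 400 can be sketched as below, assuming each data point is paired with a predicted (lower, upper) rate range. Requiring every window to be abnormal keeps a short spike confined to one window from triggering tuning:

```python
def split_windows(points, window_len):
    """Divide the time series into consecutive data windows."""
    return [points[i:i + window_len] for i in range(0, len(points), window_len)]

def window_is_abnormal(window):
    """A window is abnormal if it includes at least one abnormal data point,
    i.e., a real rate value outside its predicted range."""
    return any(not (lo <= rate <= hi) for rate, (lo, hi) in window)

def usage_is_abnormal(points, window_len=5):
    """Detect abnormal usage only if all data windows are abnormal."""
    windows = split_windows(points, window_len)
    return bool(windows) and all(window_is_abnormal(w) for w in windows)
```

With fifteen one-minute data points and a window length of five, this matches the example of three five-minute windows over a fifteen-minute period.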
  • FIG. 5 illustrates an exemplary process 500 for training a rate range predicting model according to an embodiment of the present disclosure.
  • the rate range predicting model trained through the process 500, when deployed, may predict a rate range of each data point in network bandwidth usage data of a target object.
  • a category corresponding to a target object may be identified from a set of categories.
  • the set of categories may include at least a customer-related and latency-sensitive object and a customer-unrelated or latency-insensitive object.
  • the customer-related and latency-sensitive object may be, e.g., customer-facing requests/responses, etc.
  • the customer-unrelated or latency-insensitive object may be, e.g., replication traffic for site resilience, cache refresh for latency improvement, etc.
  • a priority of the target object may be determined based on the category. For example, if the category corresponding to the target object is the customer-related and latency-sensitive object, the priority of the target object may be determined as a high priority; while if the category corresponding to the target object is the customer-unrelated or latency-insensitive object, the priority of the target object may be determined as a low priority.
  • historical network bandwidth usage data of the target object may be obtained.
  • the historical network bandwidth usage data may be data corresponding to a time period in the past.
  • the time period may be, e.g., a day, a week, a month, etc.
  • the historical network bandwidth usage data may be optimized based at least on the priority of the target object.
  • the rate range predicting model may be trained with the optimized historical network bandwidth usage data.
  • the historical network bandwidth usage data may include a plurality of data points.
  • an acceptable rate range of a data point in the historical network bandwidth usage data may be determined based at least on the priority. For example, if the priority of the target object is a high priority, the acceptable rate range of the data point may be large, e.g., the upper limit for a rate value of the data point may be high. If the priority of the target object is a low priority, the acceptable rate range of the data point may be small, e.g., the upper limit for a rate value of the data point may be low.
  • the acceptable rate range of the data point may also be determined based on other factors, such as time of day, service type of the target object, acceptable latency, etc.
  • it may be determined whether the real rate value of the data point is outside the acceptable rate range. Herein, the real rate value of the data point is a value indicated by the historical network bandwidth usage data.
  • if it is determined that the real rate value is within the acceptable rate range, the process 500 may proceed to a step 512.
  • at the step 512, the data point may be maintained.
  • if it is determined that the real rate value is outside the acceptable rate range, the process 500 may proceed to a step 514.
  • at the step 514, the data point may be moved, e.g., raised or lowered, so that a rate value of the data point is within the acceptable rate range.
  • at a step 516, it may be determined whether all data points in the historical network bandwidth usage data have been traversed. If it is determined that not all data points have been traversed, the process 500 may return to the step 508. The step 508 and the following steps may be performed for a next data point.
  • if it is determined that all data points have been traversed, the process 500 may proceed to a step 518.
  • at the step 518, the optimized historical network bandwidth usage data may be obtained. Compared with the original historical network bandwidth usage data, the optimized historical network bandwidth usage data may be flatter.
  • the rate range predicting model may be trained with the optimized historical network bandwidth usage data.
  • the process 500 in FIG. 5 is merely an example of the process for training the rate range predicting model.
  • the steps in the process for training the rate range predicting model may be replaced or modified in any manner, and the process may comprise more or fewer steps.
  • the specific order or hierarchy of the steps in the process 500 is merely exemplary, and the process for training the rate range predicting model may be performed in an order different from the described order.
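The data-point optimization loop in process 500 (determine an acceptable rate range, then maintain or move each point) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation; the priority caps are invented placeholder values.

```python
# Hypothetical sketch of the data-point optimization in process 500.
# The priority caps below are illustrative assumptions, not values from the disclosure.

HIGH_PRIORITY_CAP_MBPS = 100.0  # assumed upper rate limit for high-priority objects
LOW_PRIORITY_CAP_MBPS = 10.0    # assumed upper rate limit for low-priority objects

def optimize_history(data_points, priority):
    """Traverse all data points; any real rate value outside the acceptable
    range is moved (raised or lowered) into the range, which flattens the
    series before it is used to train the rate range predicting model."""
    cap = HIGH_PRIORITY_CAP_MBPS if priority == "high" else LOW_PRIORITY_CAP_MBPS
    optimized = []
    for rate in data_points:
        if rate > cap:
            optimized.append(cap)    # lower the data point to the upper limit
        elif rate < 0.0:
            optimized.append(0.0)    # raise an invalid negative point to zero
        else:
            optimized.append(rate)   # within the acceptable range: maintain it
    return optimized
```

In a real system the acceptable range could also vary per data point with time of day, service type, and acceptable latency, as noted above.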
  • FIG. 6 illustrates an exemplary process 600 for using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure.
  • the process 600 may be performed for network traffic transmitted from a transmitting machine to a receiving machine.
  • the target component may be in a process. All components in the process may initially share an original destination port.
  • a destination port corresponding to a target component may be changed to a new destination port different from its original destination port.
  • the new destination port may be a port that is not in use.
  • a traffic tuning policy may be set for the target component.
  • the traffic tuning policy may be NetQosPolicy.
  • the traffic tuning policy may include the new destination port.
  • the new destination port may be converted back to the original destination port at a receiving machine.
  • the new destination port may be converted back to the original destination port through leveraging a feature named Portproxy. Through the Portproxy feature, a port proxy may be made to the original destination port when the receiving machine receives data packets to the new destination port.
  • a port may be used as a match condition for setting a traffic tuning policy for a target component.
  • the traffic tuning policy may be applied on at least one of a transmitting machine, a receiving machine, and a network between the transmitting machine and the receiving machine.
  • network traffic of the target component may be tuned, e.g., prioritized or deprioritized, throttled or even blocked, etc., through the traffic tuning policy.
  • the network traffic may be tuned based on two-level traffic classification, such as machine-level traffic classification and network-level traffic classification. This approach is lightweight, agile, and easy to achieve. With the above approach, there is no need for service owners to duplicate receiver-side code to listen to a new port, which greatly reduces the complexity of workloads.
  • the process 600 in FIG. 6 is merely an example of the process for using a port as a match condition for a target component.
  • the steps in the process for using a port as a match condition for the target component may be replaced or modified in any manner, and the process may comprise more or fewer steps.
  • a source port of the target component may also be used as a match condition for the target component in a similar manner.
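The port-change and port-conversion steps of process 600 can be illustrated with a toy packet model. This is a hedged sketch: packets are modeled as plain dictionaries, the component names are hypothetical, and the mechanism in the disclosure is an OS-level feature such as Portproxy rather than application code.

```python
# Toy model of the sender-side port change and receiver-side conversion.
# Packets are plain dictionaries; component names are hypothetical.

ORIGINAL_PORT = 444   # original destination port shared by all components in the process
PROXY_PORT = 1444     # new, unused destination port assigned to the target component

def retarget_packet(packet, component, target_component):
    """Transmitting machine: change the destination port of the target
    component's packets so a port-based traffic tuning policy can match them."""
    if component == target_component:
        return dict(packet, dst_port=PROXY_PORT)
    return packet

def restore_packet(packet):
    """Receiving machine: convert the new destination port back to the
    original one, as a Portproxy-style feature would."""
    if packet["dst_port"] == PROXY_PORT:
        return dict(packet, dst_port=ORIGINAL_PORT)
    return packet
```

Because only the target component's packets carry the proxy port in transit, a policy matching on that port tunes exactly that component's traffic.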
  • FIG. 7 illustrates an example 700 of using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure.
  • a target component running on a transmitting machine 710 transmits network traffic to a receiving machine 720 via a network 730.
  • a data packet 702 may belong to the network traffic from the target component.
  • a destination port of the data packet 702 may be an original destination port, i.e., a port 444. All components in a process which involves the target component may share the port 444.
  • a destination port corresponding to the target component may be changed to a new destination port which is different from its original destination port and is not in use.
  • the destination port 444 may be changed to a destination port 1444.
  • the data packet having the destination port 1444 may be denoted as a data packet 704.
  • the data packet 704 may be transmitted to the receiving machine 720 via the network 730.
  • a traffic tuning policy 708 may be set for the target component.
  • the traffic tuning policy 708 may include the destination port 1444.
  • the traffic tuning policy 708 may be applied on at least one of the transmitting machine 710, the receiving machine 720, and the network 730. Accordingly, network traffic of the target component, e.g., the data packet 704, may be tuned through the traffic tuning policy 708.
  • the destination port 1444 may be converted back to the original destination port, i.e., the destination port 444.
  • the data packet having the destination port 444 may be re-denoted as the data packet 702.
  • example 700 in FIG. 7 is merely one example of using a port as a match condition for a target component.
  • the destination port corresponding to the target component may be changed to any other port that is different from its original destination port and is not in use.
  • FIG. 8 is a flowchart of an exemplary method 800 for multi-level network traffic management according to an embodiment of the present disclosure.
  • a plurality of network events in a machine may be collected.
  • network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events, the target object being a target process or a target component in a process.
  • network traffic to be transmitted from the target object may be tuned.
  • the target object may be the target process.
  • the generating network bandwidth usage data of a target object may comprise: selecting, from the plurality of network events, a network event set corresponding to the target process through performing process mapping; and generating the network bandwidth usage data of the target process based on the network event set.
  • the target object may be the target component.
  • the generating network bandwidth usage data of a target object may comprise: determining a process corresponding to the target component; selecting, from the plurality of network events, a network event set corresponding to the determined process through performing process mapping; selecting, from the network event set, a network event subset corresponding to the target component through performing component mapping; and generating the network bandwidth usage data of the target component based on the network event subset.
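The generation of network bandwidth usage data through process mapping can be sketched roughly as follows; the event field names (`pid`, `timestamp`, `bytes_tx`) are assumptions for illustration, not taken from the disclosure.

```python
from collections import defaultdict

def bandwidth_usage(events, target_pid, interval=60):
    """Aggregate collected network events into time-series bandwidth usage
    for one process: select the event set for the target process (process
    mapping), then sum transmitted bytes per time interval."""
    buckets = defaultdict(int)
    for ev in events:                     # ev: {"pid": ..., "timestamp": ..., "bytes_tx": ...}
        if ev["pid"] == target_pid:       # process mapping
            buckets[int(ev["timestamp"] // interval)] += ev["bytes_tx"]
    return dict(buckets)                  # one data point per interval
```

Component-level data would add a second filtering step (component mapping) over the selected event set before aggregation.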
  • the network bandwidth usage data may be time-series data.
  • the detecting whether the network bandwidth usage data is abnormal may comprise: dividing the network bandwidth usage data into a plurality of data windows; determining whether all data windows in the plurality of data windows are abnormal data windows; and in response to determining that all data windows in the plurality of data windows are abnormal data windows, detecting that the network bandwidth usage data is abnormal.
  • Each data window may include one or more data points.
  • the determining whether all data windows in the plurality of data windows are abnormal data windows may comprise, for each data window: determining whether the data window includes at least one abnormal data point; and in response to determining that the data window includes at least one abnormal data point, determining that the data window is an abnormal data window.
  • the determining whether the data window includes at least one abnormal data point may comprise: obtaining a predicted rate range of at least one data point in the data window; determining whether a real rate value of the at least one data point is outside the predicted rate range; and in response to determining that the real rate value of the at least one data point is outside the predicted rate range, determining that the data window includes at least one abnormal data point.
  • the predicted rate range may be predicted through a rate range predicting model.
  • the rate range predicting model may be previously trained for the target object.
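The window-based detection described above — a data window is abnormal if it contains at least one data point outside its predicted rate range, and the usage data is abnormal only if all windows are abnormal — can be sketched as:

```python
# Minimal sketch of the window-based anomaly check; the predicted ranges
# would come from the rate range predicting model in a real system.

def is_abnormal_window(window, ranges):
    """A data window is abnormal if at least one of its data points falls
    outside that point's predicted rate range."""
    return any(not (lo <= rate <= hi) for rate, (lo, hi) in zip(window, ranges))

def is_abnormal(data, ranges, window_size):
    """Usage data is detected as abnormal only when every window is abnormal."""
    windows = [(data[i:i + window_size], ranges[i:i + window_size])
               for i in range(0, len(data), window_size)]
    return all(is_abnormal_window(w, r) for w, r in windows)
```

Requiring every window to be abnormal keeps a short, transient spike in one window from triggering traffic tuning.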
  • Training of the rate range predicting model may comprise: identifying a category corresponding to the target object from a set of categories; determining a priority of the target object based on the category; obtaining historical network bandwidth usage data of the target object; optimizing the historical network bandwidth usage data based at least on the priority; and training the rate range predicting model with the optimized historical network bandwidth usage data.
  • the set of categories may include at least a customer-related and latency-sensitive object and a customer-unrelated or latency-insensitive object.
  • the determining a priority of the target object may comprise: determining the priority of the target object as a high priority if the category corresponding to the target object is the customer-related and latency-sensitive object; or determining the priority of the target object as a low priority if the category corresponding to the target object is the customer-unrelated or latency-insensitive object.
  • the historical network bandwidth usage data may include a plurality of data points.
  • the optimizing the historical network bandwidth usage data may comprise, for each data point: determining an acceptable rate range of the data point based at least on the priority; determining whether a real rate value of the data point is outside the acceptable rate range; and in response to determining that the real rate value of the data point is outside the acceptable rate range, moving the data point so that a rate value of the data point is within the acceptable rate range.
  • the tuning network traffic to be transmitted from the target object may comprise: tuning the network traffic to be transmitted from the target object through applying a traffic tuning policy on the network traffic, the traffic tuning policy specifying a rate range and/or a priority for the network traffic.
  • the target object may be the target process.
  • the traffic tuning policy may include a process name of the target process.
  • the target object may be the target component in a process. All components in the process initially share an original destination port.
  • the method 800 may further comprise: changing, at the machine, a destination port corresponding to the target component to a new destination port different from the original destination port.
  • the traffic tuning policy may include the new destination port.
  • the new destination port may be converted back to the original destination port at a receiving machine.
  • the method 800 may further comprise any steps/processes for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
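To illustrate what throttling network traffic to a rate range can mean in practice, the following is a generic token-bucket sketch. It is an illustrative mechanism only; the disclosure applies a traffic tuning policy such as NetQosPolicy and does not prescribe this particular algorithm.

```python
import time

class TokenBucket:
    """Illustrative throttle: allows at most rate_bps bytes per second."""
    def __init__(self, rate_bps, now=time.monotonic):
        self.rate_bps = rate_bps
        self.tokens = rate_bps        # start with one second's worth of budget
        self.now = now
        self.last = now()

    def try_send(self, nbytes):
        t = self.now()
        # refill tokens for the elapsed time, capped at one second's budget
        self.tokens = min(self.rate_bps, self.tokens + (t - self.last) * self.rate_bps)
        self.last = t
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True               # within the rate range: transmit
        return False                  # exceeds the rate range: hold or drop
```

An executor could apply such a limiter only to traffic matching the policy's match condition (process name or port), leaving other traffic untouched.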
  • FIG. 9 illustrates an exemplary multi-level network traffic management system 900 according to an embodiment of the present disclosure.
  • the multi-level network traffic management system 900 may comprise: a collector 910, configured to: collect a plurality of network events in a machine, generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; a monitor 920, configured to: detect whether the network bandwidth usage data is abnormal, and in response to detecting that the network bandwidth usage data is abnormal, generate a network traffic tuning request for the target object; and an executor 930, configured to: tune network traffic to be transmitted from the target object based on the network traffic tuning request.
  • the network bandwidth usage data may be time-series data.
  • the detecting whether the network bandwidth usage data is abnormal may comprise: dividing the network bandwidth usage data into a plurality of data windows; determining whether all data windows in the plurality of data windows are abnormal data windows; and in response to determining that all data windows in the plurality of data windows are abnormal data windows, detecting that the network bandwidth usage data is abnormal.
  • the tuning network traffic to be transmitted from the target object may comprise: tuning the network traffic to be transmitted from the target object through applying a traffic tuning policy on the network traffic, the traffic tuning policy specifying a rate range and/or a priority for the network traffic.
  • the target object may be the target component in a process. All components in the process may initially share an original destination port.
  • the system may further comprise: a port proxy, configured to change a destination port corresponding to the target component to a new destination port different from the original destination port.
  • multi-level network traffic management system 900 may further comprise any other modules configured for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
  • FIG. 10 illustrates an exemplary apparatus 1000 for multi-level network traffic management according to an embodiment of the present disclosure.
  • the apparatus 1000 may comprise: at least one processor 1010; and a memory 1020 storing computer-executable instructions.
  • the computer-executable instructions when executed, may cause the at least one processor 1010 to: collect a plurality of network events in a machine; generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; detect whether the network bandwidth usage data is abnormal; and in response to detecting that the network bandwidth usage data is abnormal, tune network traffic to be transmitted from the target object.
  • processor 1010 may further perform any steps/processes for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
  • the embodiments of the present disclosure propose a computer program product for multi-level network traffic management, comprising a computer program that is executed by at least one processor for: collecting a plurality of network events in a machine; generating network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; detecting whether the network bandwidth usage data is abnormal; and in response to detecting that the network bandwidth usage data is abnormal, tuning network traffic to be transmitted from the target object.
  • the computer program may be further executed for implementing any other steps/processes of the methods for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
  • the embodiments of the present disclosure may be embodied in a non- transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
  • processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system.
  • a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured for performing the various functions described throughout the present disclosure.
  • the functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable processing component.
  • a computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk.


Abstract

The present disclosure proposes a method, apparatus and computer program product for multi-level network traffic management. A plurality of network events in a machine may be collected. Network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events, the target object being a target process or a target component in a process. It may be detected whether the network bandwidth usage data is abnormal. In response to detecting that the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned. The present disclosure also proposes a multi-level network traffic management system comprising a collector, a monitor and an executor.

Description

MULTI-LEVEL NETWORK TRAFFIC MANAGEMENT

BACKGROUND
With the development of technologies such as cloud storage and cloud computing, more and more enterprises and institutions leverage cloud services for daily operations and management. The cloud services may refer to a wide range of services delivered on demand over the Internet. The cloud services are managed by cloud service providers and made available to customers from the providers' machines, e.g., cloud servers, so there's no need for customers to host applications or resources on their own on-premises servers.
SUMMARY
This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Embodiments of the present disclosure propose a method, apparatus and computer program product for multi-level network traffic management. A plurality of network events in a machine may be collected. Network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events, the target object being a target process or a target component in a process. It may be detected whether the network bandwidth usage data is abnormal. In response to detecting that the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned.
Furthermore, the embodiments of the present disclosure also propose a multi-level network traffic management system. The multi-level network traffic management system may comprise: a collector, configured to: collect a plurality of network events in a machine, generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; a monitor, configured to: detect whether the network bandwidth usage data is abnormal, and in response to detecting that the network bandwidth usage data is abnormal, generate a network traffic tuning request for the target object; and an executor, configured to: tune  network traffic to be transmitted from the target object based on the network traffic tuning request.
It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.
BRIEF DESCRIPTION OF THE DRAWINGS
The disclosed aspects will hereinafter be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
FIG. 1 illustrates an exemplary multi-level network traffic management system according to an embodiment of the present disclosure.
FIG. 2 illustrates an exemplary process for obtaining network bandwidth usage data of a target process according to an embodiment of the present disclosure.
FIG. 3 illustrates an exemplary process for obtaining network bandwidth usage data of a target component according to an embodiment of the present disclosure.
FIG. 4 illustrates an exemplary process for detecting whether network bandwidth usage data of a target object is abnormal according to an embodiment of the present disclosure.
FIG. 5 illustrates an exemplary process for training a rate range predicting model according to an embodiment of the present disclosure.
FIG. 6 illustrates an exemplary process for using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure.
FIG. 7 illustrates an example of using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure.
FIG. 8 is a flowchart of an exemplary method for multi-level network traffic management according to an embodiment of the present disclosure.
FIG. 9 illustrates an exemplary multi-level network traffic management system according to an embodiment of the present disclosure.
FIG. 10 illustrates an exemplary apparatus for multi-level network traffic management according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
In order to provide reliable cloud services, cloud service providers usually deploy a number of machines at various datacenters. These machines transmit tremendous amounts of network traffic across datacenters. Network traffic transmitted from a machine may be produced by various elements, such as services, processes, threads, etc., in the machine. At present, many elements produce and transmit network traffic without restraint, which brings considerable network bandwidth costs. Moreover, faults inevitably occur with some elements from time to time. Any small fault in a single element may lead to a widespread internal traffic flood. Such a traffic flood may occupy the entire network bandwidth and impact normal traffic. Network traffic can be managed so as to control the amount of traffic sent over the network. At present, network traffic can be managed at a machine level. That is, network traffic transmitted from a machine can be throttled. However, network traffic transmitted from a machine may involve multiple elements, such as multiple processes, etc. If all the network traffic transmitted from the machine is throttled, then important network traffic, such as customer-related network traffic, is also throttled. This may decrease the availability of the cloud service.
Embodiments of the present disclosure propose multi-level network traffic management. The multi-level network traffic management may be implemented on, e.g., a machine providing a cloud service. Firstly, a plurality of network events in a machine may be collected. A network event may include at least the number of bytes transmitted at a corresponding moment. Network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events. Herein, a target object may refer to an object from which network traffic is to be monitored and/or tuned. The target object may be a target process or a target component in a process. Taking the machine being an Internet Information Services (IIS) server as an example, a component in a process may be an application pool. An application pool may be in a process, and be responsible for isolating one or more applications into their own process. It may be detected whether the network bandwidth usage data is abnormal. If the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned.
Compared with the existing network traffic management, the proposed network traffic management is fine-grained, and can be based on multiple levels, such as a process level, a component level, etc. Usually, a number of processes are running on a machine at the same time. Some processes may be high-priority processes, such as customer-related and latency-sensitive processes, while some processes may be low-priority processes, such as customer-unrelated or latency-insensitive processes. Furthermore, a process may involve multiple components, e.g., multiple application pools. Some components may be high-priority components, while some components may be low-priority components. Managing the network traffic at a process level or at a component level may prioritize network traffic of the high-priority processes or components, and deprioritize network traffic of the low-priority processes or components. In this way, the high-priority network traffic can be allocated with more bandwidth, and can be delivered with low latency.
The network bandwidth usage data of the target object can be generated and detected in real time. If it is detected that the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned immediately. For example, a traffic tuning policy may be applied on the network traffic. The traffic tuning policy may specify a rate range and/or a priority for the network traffic. The traffic tuning policy may take effect within one minute or even a few seconds, so that the network traffic may be tuned in time. In this way, a high-performing network can be maintained, and a cloud service with high reliability and availability can be guaranteed.
In order to specify which object the traffic tuning policy targets, the traffic tuning policy may include a match condition. If a target object is the target process, the match condition in the traffic tuning policy may be a process name of the target process. If the target object is the target component in a process, the match condition may be a port corresponding to the target component. However, since all components in a process usually share the same source port and/or destination port, an original port corresponding to the target component cannot be used as the match condition for the target component. The embodiments of the present disclosure propose to enable a port to be used as a match condition for a target component through leveraging a port proxy. At a transmitting machine, a destination port corresponding to a target component may be changed to a new destination port different from its original destination port through a port proxy. A traffic tuning policy may be set for the target component. The traffic tuning policy may include the new destination port. The new destination port may be converted back to the original destination port at a receiving machine through a port proxy. Through the above approach, a port may be used as a match condition for setting a traffic tuning policy for a target component. In this regard, a component level may also be referred to as a port level. This approach is lightweight, agile, and easy to achieve. With the above approach, there is no need for service owners to duplicate receiver-side code to listen to a new port, which greatly reduces the complexity of workloads.
It should be appreciated that although the foregoing discussion and the following discussion may involve examples of managing network traffic at a process level, at a component level or at a port level, the embodiments of the present disclosure are not limited to this. Depending on actual application requirements, the proposed network traffic management may be applicable at any other levels in a similar manner.
FIG. 1 illustrates an exemplary multi-level network traffic management system 100 according to an embodiment of the present disclosure. The multi-level network traffic management system 100 may be deployed on a local machine that is to transmit network traffic to a network. As an example, the machine may be a machine providing a cloud service, and the network may be a cloud service network. Such a deployment does not require a dependency on a centralized service. This improves service reliability and saves data transfers. The multi-level network traffic management system 100 may manage network traffic transmitted from the machine at multiple levels, such as at a process level, at a component level or a port level, etc. The multi-level network traffic management system 100 may comprise, e.g., a collector 110, a monitor 120, an executor 130, etc.
Firstly, the collector 110 may collect a plurality of network events 102 in the machine. A network event may include multiple network statistics, e.g., a process identifier (ID) , the number of bytes received, the number of bytes transmitted, a destination address, a source address, a destination port, a source port, etc. Then, the collector 110 may generate network bandwidth usage data 112 of a target object running on the machine based on the plurality of network events 102. The network bandwidth usage data 112 may be time-series data, and may indicate network bandwidth usage of a target object at a predetermined time interval, e.g., a data point per minute. The target object may be a target process running on the machine. Furthermore, a process may involve a plurality of components, e.g., a plurality of application pools. The target object may be a target component in a process, e.g., a target application pool in a process. That is, the collector 110 may generate network bandwidth usage data of a target process or a target component. Network bandwidth usage data of a target process and network bandwidth usage data of a target component may be obtained through different processes. An exemplary process for obtaining network bandwidth usage data of a target process will be described later in conjunction with FIG. 2. An exemplary process for obtaining network bandwidth usage data of a target component will be described later in conjunction with FIG. 3.
After the collector 110 generates the network bandwidth usage data 112 of the target object, the collector 110 may provide the network bandwidth usage data 112 to the monitor 120. The monitor 120 may detect whether the network bandwidth usage data 112 is abnormal through performing anomaly detection on the network bandwidth usage data 112. An exemplary process for detecting whether network bandwidth usage data is abnormal will be described later in conjunction with FIG. 4.
If the monitor 120 detects that the network bandwidth usage data 112 of the target object is abnormal, network traffic to be transmitted from the target object may be tuned. For example, the monitor 120 may generate a network traffic tuning request 122 for the target object. In order to specify which object the network traffic tuning request 122 targets, the network traffic tuning request 122 may include a match condition. If the target object is the target process, the match condition may be a process name of the target process. If the target object is the target component, the match condition may be a port corresponding to the target component. However, since all components in a process usually share the same source port and/or destination port, an original port corresponding to the target component cannot be used as the match condition for the target component. An exemplary process for using a port as a match condition for a target component will be described later in conjunction with FIG. 6.
Additionally, the network traffic tuning request 122 may include at least one action to be performed for the target object. Actions to be performed for the target object may include, e.g., changing a priority of network traffic of the target object, throttling or even blocking the network traffic of the target object, etc.
The network traffic tuning request 122 may be provided to the executor 130. The executor 130 may perform a network traffic tuning action 132 for the target object based on the network traffic tuning request 122. The network traffic tuning action 132 may tune network traffic to be transmitted from the target object in a specific period. In an implementation, the network traffic tuning action 132 may be to apply a traffic tuning policy on the network traffic to be transmitted from the target object. Taking the machine running the Windows operating system as an example, the traffic tuning policy may be NetQosPolicy. The traffic tuning policy may be generated based on the network traffic tuning request 122. The traffic tuning policy may provide differential treatment to specific network traffic. In order to specify which object the traffic tuning policy targets, the traffic tuning policy may include a match condition. The match condition in the traffic tuning policy may be the same as that in the network traffic tuning request 122. For example, if the target object is the target process, the match condition in the traffic tuning policy may be a process name of the target process. If the target object is the target component, the match condition in the traffic tuning policy may be a port corresponding to the target component. The process for using a port as a match condition for a target component may be learned from FIG. 6.
Additionally, depending on the network traffic tuning request 122, the traffic tuning policy may specify, e.g., a rate range, a priority, etc. for the network traffic. For example, if an action in the network traffic tuning request 122 is to throttle the network traffic, the traffic tuning policy may specify a rate range for the network traffic. The rate range may include an upper rate limit and/or a lower rate limit for the network traffic. In an implementation, the rate range may be provided by a rate range predicting model which is previously trained for the target object. Additionally, a Differentiated Services Code Point (DSCP) value may be used to indicate a specific priority for the network traffic.
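The fields of such a policy (match condition, throttle rate, DSCP priority) can be sketched as a simple record. This is an illustrative structure only, not the actual Windows NetQosPolicy API; the function and field names are assumptions:

```python
def make_tuning_policy(match_condition, action, upper_kbps=None, dscp=None):
    """Build an illustrative traffic tuning policy record.

    Mirrors the fields discussed above: a match condition (process
    name or port), an action, an optional upper rate limit, and an
    optional DSCP value. Not the real NetQosPolicy interface.
    """
    policy = {"match": match_condition, "action": action}
    if action == "throttle" and upper_kbps is not None:
        policy["rate_limit_kbps"] = upper_kbps
    if dscp is not None:
        # DSCP values range from 0 to 63; higher values typically
        # request more favorable per-hop treatment.
        assert 0 <= dscp <= 63
        policy["dscp"] = dscp
    return policy

# Throttle a hypothetical low-priority process by name and
# deprioritize its traffic with a low DSCP value.
policy = make_tuning_policy({"process_name": "replicator.exe"},
                            "throttle", upper_kbps=5000, dscp=8)
```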
Compared with existing network traffic management systems, the proposed network traffic management system enables fine-grained network traffic management which is based on multiple levels, such as a process level, a component level or a port level, etc. Usually, a number of processes are running on a machine at the same time. Some processes may be high-priority processes, such as customer-related and latency-sensitive processes, while some processes may be low-priority processes, such as customer-unrelated or latency-insensitive processes. Furthermore, a process may involve multiple components, e.g., multiple application pools. Some components may be high-priority components, while some components may be low-priority components. Managing the network traffic at a process level or at a component level may prioritize network traffic of the high-priority processes or components, and deprioritize network traffic of the low-priority processes or components. In this way, the high-priority network traffic can be allocated with more bandwidth, and can be delivered with low latency.
Through the multi-level network traffic management system 100, the network bandwidth usage data 112 of the target object can be generated and detected in real time. If it is detected that the network bandwidth usage data is abnormal, the network traffic tuning action 132 can be performed for the target object immediately. For example, the traffic tuning policy for the target object may take effect within one minute or even a few seconds, so that the network traffic may be tuned in time. In this way, a high-performing network can be maintained, and a cloud service with high reliability and availability can be guaranteed.
It should be appreciated that the multi-level network traffic management system 100 illustrated in FIG. 1 is merely one example. Depending on actual application requirements, the multi-level network traffic management system 100 may have any other structure and may include more or fewer elements. Moreover, various modules in the multi-level network traffic management system 100 may perform other operations in addition to the operations described above. For example, when the collector 110 generates the network bandwidth usage data 112 of the target object, it may also upload the network bandwidth usage data 112 to a presenting platform for presentation through a dashboard, so that relevant technical personnel may further analyze the network bandwidth usage data 112. Also, in addition to the network traffic tuning request 122 from the monitor 120, the executor 130 may receive an on-demand network traffic tuning request which is manually triggered.
FIG. 2 illustrates an exemplary process 200 for obtaining network bandwidth usage data of a target process according to an embodiment of the present disclosure. The process 200 may be performed through a collector in a multi-level network traffic management system, such as the collector 110 in FIG. 1.
At 202, a plurality of network events in a machine may be collected. For example, the plurality of network events may be collected through a network event tracing tool running on the machine. Taking the machine running the Windows operating system as an example, the network event tracing tool may be Event Tracing for Windows (ETW) . A network event may include multiple network statistics, e.g., a process ID, the number of bytes received, the number of bytes transmitted, a destination address, a source address, a destination port, a source port, etc.
At 204, a network event set corresponding to a target process may be selected from the plurality of network events through performing process mapping. A network event may include a process ID. Every time a process starts, it is assigned a process ID; the process ID may therefore change when the process restarts, and thus cannot persistently identify the process. In an implementation, a process name of the process may be used to uniquely and persistently identify the process. A process name of a process which is transmitting network traffic at various moments may be obtained through a system Application Programming Interface (API) of the machine. The process mapping may be performed with this system API, so that the network event set corresponding to the target process may be selected from the plurality of network events.
At 206, network bandwidth usage data of the target process may be generated based on the network event set. A network event in the network event set may include the number of bytes transmitted from the target process. The network bandwidth usage data may be time-series data. The network bandwidth usage data of the target process may be generated based on, e.g., the number of bytes in each network event in the network event set and a timestamp of the network event.
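The process mapping of steps 204-206 can be sketched as a filter that resolves transient process IDs to persistent process names. All event fields, process names, and the PID-to-name mapping below are hypothetical; in practice the mapping would come from a system API:

```python
def select_process_events(events, pid_to_name, target_name):
    """Select the network event set for a target process.

    Process IDs change across restarts, so events are matched by
    the persistent process name via a PID-to-name mapping assumed
    to be supplied by a system API.
    """
    return [e for e in events if pid_to_name.get(e["pid"]) == target_name]

events = [
    {"pid": 101, "bytes_sent": 512},
    {"pid": 202, "bytes_sent": 1024},
    {"pid": 101, "bytes_sent": 256},
]
pid_to_name = {101: "w3wp.exe", 202: "svchost.exe"}
subset = select_process_events(events, pid_to_name, "w3wp.exe")
# Two events belonging to the target process, 768 bytes in total
```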
It should be appreciated that the process 200 in FIG. 2 is merely an example of the process for obtaining the network bandwidth usage data of the target process. Depending on actual application requirements, the steps in the process for obtaining the network bandwidth usage data of the target process may be replaced or modified in any manner, and the process may comprise more or fewer steps.
FIG. 3 illustrates an exemplary process 300 for obtaining network bandwidth usage data of a target component according to an embodiment of the present disclosure. The process 300 may be performed through a collector in a multi-level network traffic management system, such as the collector 110 in FIG. 1.
At 302, a plurality of network events in a machine may be collected. The step 302 may be similar to the step 202 in FIG. 2.
At 304, a process corresponding to a target component may be determined.
At 306, a network event set corresponding to the determined process may be selected from the plurality of network events through performing process mapping. The step 306 may be similar to the step 204 in FIG. 2.
At 308, a network event subset corresponding to the target component may be selected from the network event set through performing component mapping. A component name of the target component may be used to uniquely and persistently identify the target component. A component name of a component which is transmitting network traffic at various moments may be obtained through a system API of the machine. Taking the machine running the Windows operating system as an example, this system API may be Windows Management Instrumentation (WMI) . The component mapping may be performed with this system API, so that the network event subset corresponding to the target component may be selected from the network event set.
At 310, network bandwidth usage data of the target component may be generated based on the network event subset. The step 310 may be similar to the step 206 in FIG. 2.
It should be appreciated that the process 300 in FIG. 3 is merely an example of the process for obtaining the network bandwidth usage data of the target component. Depending on actual application requirements, the steps in the process for obtaining the network bandwidth usage data of the target component may be replaced or modified in any manner, and the process may comprise more or fewer steps.
FIG. 4 illustrates an exemplary process 400 for detecting whether network bandwidth usage data of a target object is abnormal according to an embodiment of  the present disclosure. The process 400 may be performed through a monitor in a multi-level network traffic management system, such as the monitor 120 in FIG. 1.
Network bandwidth usage data of a target object may be time-series data. The time-series data may correspond to a predetermined time period, such as fifteen minutes. At 402, the network bandwidth usage data may be divided into a plurality of data windows. For example, the network bandwidth usage data may be divided into three data windows, each data window being five minutes. The number of data windows and the length of each data window may be determined based on, e.g., a service scenario, tolerance to traffic anomalies, etc.
Subsequently, it may be determined whether all data windows in the plurality of data windows are abnormal data windows. If all data windows in the plurality of data windows are abnormal data windows, it may be detected that the network bandwidth usage data is abnormal. Whether a data window is an abnormal data window may be determined through determining whether the data window includes at least one abnormal data point. If the data window includes at least one abnormal data point, it may be determined that the data window is an abnormal data window.
Each data window in the plurality of data windows includes one or more data points. At 404, a predicted rate range of at least one data point in a data window may be obtained. The predicted rate range of the data point may be defined by an upper limit and/or a lower limit for a rate value of the data point. In an implementation, the predicted rate range may be intelligently and dynamically predicted through a rate range predicting model. The rate range predicting model may be a machine learning model which is previously trained for the target object. The training of the machine learning model may take at least a priority of the target object into account. An exemplary process for training a rate range predicting model will be described later in conjunction with FIG. 5. Preferably, the rate range predicting model may be retrained at a predetermined time interval. Additionally, the predicted rate range may be predicted a predetermined time in advance. For example, assuming that the data point corresponds to a moment t, the predicted rate range of the data point may be predicted at a moment t-Δt, wherein Δt may be, e.g., three minutes, five minutes, etc.
At 406, it may be determined whether a real rate value of the at least one  data point is outside the predicted rate range. The real rate value of the data point is a value indicated by the network bandwidth usage data.
If it is determined at 406 that the real rate value of the at least one data point is not outside the predicted rate range, that is, real rate values of all data points are within corresponding predicted rate ranges, the process 400 may proceed to a step 408. At 408, it may be determined that the data window does not include abnormal data points. Then, at 410, it may be determined that the data window is not an abnormal data window. At 412, it may be determined that the network bandwidth usage data is not abnormal.
If it is determined at 406 that the real rate value of the at least one data point is outside the predicted rate range, the process 400 may proceed to a step 414. At 414, it may be determined that the data window includes at least one abnormal data point. Then, at 416, it may be determined that the data window is an abnormal data window.
At 418, it may be determined whether all data windows in the plurality of data windows have been traversed. If it is determined that not all data windows have been traversed, the process 400 may return to the step 404. The step 404 and the following steps may be performed for a next data window.
If it is determined that all data windows have been traversed, the process 400 may proceed to a step 420. At 420, it may be determined that all data windows in the plurality of data windows are abnormal data windows. Then, at 422, it may be detected that the network bandwidth usage data is abnormal.
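The window-based decision of process 400 might be sketched as follows, under the simplifying assumption that each data point already has a predicted (lower, upper) rate range available. Requiring every window to contain an out-of-range point is what filters out short-lived spikes:

```python
def is_abnormal(series, predicted_ranges, window_size):
    """Flag the data as abnormal only if every window is abnormal.

    `series` holds real rate values; `predicted_ranges` holds the
    (lower, upper) bound predicted for each corresponding point.
    A window is abnormal if it contains at least one point whose
    real rate falls outside its predicted range.
    """
    pairs = list(zip(series, predicted_ranges))
    windows = [pairs[i:i + window_size]
               for i in range(0, len(pairs), window_size)]

    def window_abnormal(window):
        return any(not (lo <= rate <= hi) for rate, (lo, hi) in window)

    return all(window_abnormal(w) for w in windows)

# Three windows of three points each; every window contains a spike
# above the predicted upper bound of 40, so the data is abnormal.
rates = [50, 10, 10, 55, 10, 10, 60, 10, 10]
ranges = [(0, 40)] * 9
assert is_abnormal(rates, ranges, window_size=3)
```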
It should be appreciated that the process 400 in FIG. 4 is merely an example of the process for detecting whether network bandwidth usage data of a target object is abnormal. Depending on actual application requirements, the steps in the process for detecting whether network bandwidth usage data is abnormal may be replaced or modified in any manner, and the process may comprise more or fewer steps. For example, in addition to predicting a rate range through a rate range predicting model, the rate range may also be predicted through a rule-based approach. In addition, the specific order or hierarchy of the steps in the process 400 is merely exemplary, and the process for detecting whether network bandwidth usage data is abnormal may be performed in an order different from the described order.
FIG. 5 illustrates an exemplary process 500 for training a rate range  predicting model according to an embodiment of the present disclosure. The rate range predicting model trained through the process 500, when deployed, may predict a rate range of each data point in network bandwidth usage data of a target object.
At 502, a category corresponding to a target object may be identified from a set of categories. For example, the set of categories may include at least a customer-related and latency-sensitive object and a customer-unrelated or latency-insensitive object. The customer-related and latency-sensitive object may be, e.g., customer-facing requests/responses, etc. The customer-unrelated or latency-insensitive object may be, e.g., replication traffic for site resilience, cache refresh for latency improvement, etc.
At 504, a priority of the target object may be determined based on the category. For example, if the category corresponding to the target object is the customer-related and latency-sensitive object, the priority of the target object may be determined as a high priority; while if the category corresponding to the target object is the customer-unrelated or latency-insensitive object, the priority of the target object may be determined as a low priority.
At 506, historical network bandwidth usage data of the target object may be obtained. The historical network bandwidth usage data may be data corresponding to a time period in the past. The time period may be, e.g., a day, a week, a month, etc.
Subsequently, the historical network bandwidth usage data may be optimized based at least on the priority of the target object. The rate range predicting model may be trained with the optimized historical network bandwidth usage data.
The historical network bandwidth usage data may include a plurality of data points. At 508, an acceptable rate range of a data point in the historical network bandwidth usage data may be determined based at least on the priority. For example, if the priority of the target object is a high priority, the acceptable rate range of the data point may be large, e.g., the upper limit for a rate value of the data point may be high. If the priority of the target object is a low priority, the acceptable rate range of the data point may be small, e.g., the upper limit for a rate value of the data point may be low. The acceptable rate range of the data point may also be determined based on other factors, such as time of day, service type of the target object, acceptable latency, etc.
At 510, it may be determined whether a real rate value of the data point is  outside the acceptable rate range. The real rate value of the data point is a value indicated by the historical network bandwidth usage data.
If it is determined that the real rate value of the data point is not outside the acceptable rate range, that is, the real rate value of the data point is within the acceptable rate range, the process 500 may proceed to a step 512. At 512, the data point may be maintained.
If it is determined that the real rate value of the data point is outside the acceptable rate range, the process 500 may proceed to a step 514. At 514, the data point may be moved, e.g., raised or lowered, so that a rate value of the data point is within the acceptable rate range.
At 516, it may be determined whether all data points in the historical network bandwidth usage data have been traversed. If it is determined that not all data points have been traversed, the process 500 may return to the step 508. The step 508 and the following steps may be performed for a next data point.
If it is determined that all data points have been traversed, the process 500 may proceed to a step 518. At 518, the optimized historical network bandwidth usage data may be obtained. Compared with the original historical network bandwidth usage data, the optimized historical network bandwidth usage data may be flatter.
At 520, the rate range predicting model may be trained with the optimized historical network bandwidth usage data.
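The optimization loop of steps 508-516 amounts to clamping out-of-range points to the nearest acceptable bound, which flattens the training series. A minimal sketch, with an assumed fixed acceptable range (in practice the range would vary per point based on priority, time of day, etc.):

```python
def optimize_history(points, acceptable_range):
    """Clamp historical rate values into the acceptable rate range.

    Points below the lower bound are raised to it; points above the
    upper bound are lowered to it; in-range points are maintained.
    """
    lo, hi = acceptable_range
    return [min(max(p, lo), hi) for p in points]

history = [5, 120, 30, 200, 45]
optimized = optimize_history(history, acceptable_range=(10, 100))
# → [10, 100, 30, 100, 45]
```

A high-priority object would simply be given a wider `acceptable_range` (a higher upper limit), so fewer of its points get moved.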
It should be appreciated that the process 500 in FIG. 5 is merely an example of the process for training the rate range predicting model. Depending on actual application requirements, the steps in the process for training the rate range predicting model may be replaced or modified in any manner, and the process may comprise more or fewer steps. In addition, the specific order or hierarchy of the steps in the process 500 is merely exemplary, and the process for training the rate range predicting model may be performed in an order different from the described order.
FIG. 6 illustrates an exemplary process 600 for using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of the present disclosure. The process 600 may be performed for network traffic transmitted from a transmitting machine to a receiving machine. The target component may be in a process. All components in the process may initially share an original destination port.
At 602, at a transmitting machine, a destination port corresponding to a target component may be changed to a new destination port different from its original destination port. The new destination port may be a port that is not in use.
At 604, a traffic tuning policy may be set for the target component. Taking the machine running the Windows operating system as an example, the traffic tuning policy may be NetQosPolicy. The traffic tuning policy may include the new destination port.
At 606, the new destination port may be converted back to the original destination port at a receiving machine. In an implementation, the new destination port may be converted back to the original destination port through leveraging a feature named Portproxy. Through the Portproxy feature, a port proxy may be made to the original destination port when the receiving machine receives data packets to the new destination port.
Through the process 600, a port may be used as a match condition for setting a traffic tuning policy for a target component. Further, the traffic tuning policy may be applied on at least one of a transmitting machine, a receiving machine, and a network between the transmitting machine and the receiving machine. Accordingly, network traffic of the target component may be tuned, e.g., prioritized or deprioritized, throttled or even blocked, etc., through the traffic tuning policy. With the above approach, the network traffic may be tuned based on two-level traffic classification, such as machine-level traffic classification and network-level traffic classification. This approach is lightweight, agile, and easy to implement. With the above approach, there is no need for service owners to duplicate receiver-side code to listen to a new port, which greatly reduces the complexity of workloads.
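Conceptually, the sender-side port change and the receiver-side conversion form a pair of inverse port mappings. The sketch below models that idea only; it is not the actual Windows Portproxy feature, and the port numbers simply echo the example in FIG. 7:

```python
# Sender side: give the target component a dedicated destination port
# so a port-based policy can single out its traffic in transit.
PORT_MAP = {444: 1444}
# Receiver side: the inverse mapping restores the original port.
PROXY_BACK = {new: orig for orig, new in PORT_MAP.items()}

def rewrite_for_send(packet):
    """Rewrite the destination port before the packet leaves the sender."""
    packet = dict(packet)
    packet["dst_port"] = PORT_MAP.get(packet["dst_port"], packet["dst_port"])
    return packet

def proxy_on_receive(packet):
    """Convert the dedicated port back so the service listens unchanged."""
    packet = dict(packet)
    packet["dst_port"] = PROXY_BACK.get(packet["dst_port"], packet["dst_port"])
    return packet

sent = rewrite_for_send({"dst_port": 444, "payload": b"req"})
# In transit, a policy can now match on dst_port 1444.
received = proxy_on_receive(sent)
# The receiver sees dst_port 444 again, with the payload untouched.
```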
It should be appreciated that the process 600 in FIG. 6 is merely an example of the process for using a port as a match condition for a target component. Depending on actual application requirements, the steps in the process for using a port as a match condition for the target component may be replaced or modified in any manner, and the process may comprise more or fewer steps. For example, in addition to a destination port of a target component, a source port of the target component may also be used as a match condition for the target component in a similar manner.
FIG. 7 illustrates an example 700 of using a port as a match condition for setting a traffic tuning policy for a target component according to an embodiment of  the present disclosure.
A target component running on a transmitting machine 710 transmits network traffic to a receiving machine 720 via a network 730. For example, a data packet 702 may belong to the network traffic from the target component. Initially, a destination port of the data packet 702 may be an original destination port, i.e., a port 444. All components in a process which involves the target component may share the port 444.
In order to differentiate the target component, at the transmitting machine 710, a destination port corresponding to the target component may be changed to a new destination port which is different from its original destination port and is not in use. For example, the destination port 444 may be changed to a destination port 1444. For ease of indication, the data packet having the destination port 1444 may be denoted as a data packet 704. The data packet 704 may be transmitted to the receiving machine 720 via the network 730.
A traffic tuning policy 708 may be set for the target component. The traffic tuning policy 708 may include the destination port 1444. The traffic tuning policy 708 may be applied on at least one of the transmitting machine 710, the receiving machine 720, and the network 730. Accordingly, network traffic of the target component, e.g., the data packet 704, may be tuned through the traffic tuning policy 708.
At the receiving machine 720, the destination port 1444 may be converted back to the original destination port, i.e., the destination port 444. The data packet having the destination port 444 may be re-denoted as the data packet 702.
It should be appreciated that the example 700 in FIG. 7 is merely one example of using a port as a match condition for a target component. For example, in addition to the port 1444, the destination port corresponding to the target component may be changed to any other port that is different from its original destination port and is not in use.
FIG. 8 is a flowchart of an exemplary method 800 for multi-level network traffic management according to an embodiment of the present disclosure.
At 810, a plurality of network events in a machine may be collected.
At 820, network bandwidth usage data of a target object running on the machine may be generated based on the plurality of network events, the target object being a target process or a target component in a process.
At 830, it may be detected whether the network bandwidth usage data is abnormal.
At 840, in response to detecting that the network bandwidth usage data is abnormal, network traffic to be transmitted from the target object may be tuned.
In an implementation, the target object may be the target process. The generating network bandwidth usage data of a target object may comprise: selecting, from the plurality of network events, a network event set corresponding to the target process through performing process mapping; and generating the network bandwidth usage data of the target process based on the network event set.
In an implementation, the target object may be the target component. The generating network bandwidth usage data of a target object may comprise: determining a process corresponding to the target component; selecting, from the plurality of network events, a network event set corresponding to the determined process through performing process mapping; selecting, from the network event set, a network event subset corresponding to the target component through performing component mapping; and generating the network bandwidth usage data of the target component based on the network event subset.
In an implementation, the network bandwidth usage data may be time-series data. The detecting whether the network bandwidth usage data is abnormal may comprise: dividing the network bandwidth usage data into a plurality of data windows; determining whether all data windows in the plurality of data windows are abnormal data windows; and in response to determining that all data windows in the plurality of data windows are abnormal data windows, detecting that the network bandwidth usage data is abnormal.
Each data window may include one or more data points. The determining whether all data windows in the plurality of data windows are abnormal data windows may comprise, for each data window: determining whether the data window includes at least one abnormal data point; and in response to determining that the data window includes at least one abnormal data point, determining that the data window is an abnormal data window.
The determining whether the data window includes at least one abnormal data point may comprise: obtaining a predicted rate range of at least one data point in the data window; determining whether a real rate value of the at least one data point is  outside the predicted rate range; and in response to determining that the real rate value of the at least one data point is outside the predicted rate range, determining that the data window includes at least one abnormal data point.
The predicted rate range may be predicted through a rate range predicting model. The rate range predicting model may be previously trained for the target object.
Training of the rate range predicting model may comprise: identifying a category corresponding to the target object from a set of categories; determining a priority of the target object based on the category; obtaining historical network bandwidth usage data of the target object; optimizing the historical network bandwidth usage data based at least on the priority; and training the rate range predicting model with the optimized historical network bandwidth usage data.
The set of categories may include at least a customer-related and latency-sensitive object and a customer-unrelated or latency-insensitive object. The determining a priority of the target object may comprise: determining the priority of the target object as a high priority if the category corresponding to the target object is the customer-related and latency-sensitive object; or determining the priority of the target object as a low priority if the category corresponding to the target object is the customer-unrelated or latency-insensitive object.
The historical network bandwidth usage data may include a plurality of data points. The optimizing the historical network bandwidth usage data may comprise, for each data point: determining an acceptable rate range of the data point based at least on the priority; determining whether a real rate value of the data point is outside the acceptable rate range; and in response to determining that the real rate value of the data point is outside the acceptable rate range, moving the data point so that a rate value of the data point is within the acceptable rate range.
In an implementation, the tuning network traffic to be transmitted from the target object may comprise: tuning the network traffic to be transmitted from the target object through applying a traffic tuning policy on the network traffic, the traffic tuning policy specifying a rate range and/or a priority for the network traffic.
The target object may be the target process. The traffic tuning policy may include a process name of the target process.
The target object may be the target component in a process. All components in the process initially share an original destination port. The method 800 may further comprise: changing, at the machine, a destination port corresponding to the target component to a new destination port different from the original destination port.
The traffic tuning policy may include the new destination port.
The new destination port may be converted back to the original destination port at a receiving machine.
It should be appreciated that the method 800 may further comprise any steps/processes for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
FIG. 9 illustrates an exemplary multi-level network traffic management system 900 according to an embodiment of the present disclosure.
The multi-level network traffic management system 900 may comprise: a collector 910, configured to: collect a plurality of network events in a machine, generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; a monitor 920, configured to: detect whether the network bandwidth usage data is abnormal, and in response to detecting that the network bandwidth usage data is abnormal, generate a network traffic tuning request for the target object; and an executor 930, configured to: tune network traffic to be transmitted from the target object based on the network traffic tuning request.
In an implementation, the network bandwidth usage data may be time-series data. The detecting whether the network bandwidth usage data is abnormal may comprise: dividing the network bandwidth usage data into a plurality of data windows; determining whether all data windows in the plurality of data windows are abnormal data windows; and in response to determining that all data windows in the plurality of data windows are abnormal data windows, detecting that the network bandwidth usage data is abnormal.
In an implementation, the tuning network traffic to be transmitted from the target object may comprise: tuning the network traffic to be transmitted from the target object through applying a traffic tuning policy on the network traffic, the traffic tuning policy specifying a rate range and/or a priority for the network traffic.
The target object may be the target component in a process. All components in the process may initially share an original destination port. The system may further comprise: a port proxy, configured to change a destination port corresponding to the target component to a new destination port different from the original destination port.
It should be appreciated that the multi-level network traffic management system 900 may further comprise any other modules configured for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
FIG. 10 illustrates an exemplary apparatus 1000 for multi-level network traffic management according to an embodiment of the present disclosure.
The apparatus 1000 may comprise: at least one processor 1010; and a memory 1020 storing computer-executable instructions. The computer-executable instructions, when executed, may cause the at least one processor 1010 to: collect a plurality of network events in a machine; generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; detect whether the network bandwidth usage data is abnormal; and in response to detecting that the network bandwidth usage data is abnormal, tune network traffic to be transmitted from the target object.
It should be appreciated that the processor 1010 may further perform any steps/processes for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
The embodiments of the present disclosure propose a computer program product for multi-level network traffic management, comprising a computer program that is executed by at least one processor for: collecting a plurality of network events in a machine; generating network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process; detecting whether the network bandwidth usage data is abnormal; and in response to detecting that the network bandwidth usage data is abnormal, tuning network traffic to be transmitted from the target object. Furthermore, the computer program may be further executed for implementing any other steps/processes of the methods for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
The embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for multi-level network traffic management according to the embodiments of the present disclosure as mentioned above.
It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts. In addition, the articles “a” and “an” as used in this specification and the appended claims should generally be construed to mean “one” or “one or more” unless specified otherwise or clear from the context to be directed to a singular form.
It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
Processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured for performing the various functions described throughout the present disclosure. The functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable platform.
Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, threads of execution, procedures, functions, etc. The software may reside on a computer-readable medium. A computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk. Although memory is shown separate from the processors in the various aspects presented throughout the present disclosure, the memory may be internal to the processors, e.g., cache or register.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein and intended to be encompassed by the claims.

Claims (20)

  1. A method for multi-level network traffic management, comprising:
    collecting a plurality of network events in a machine;
    generating network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process;
    detecting whether the network bandwidth usage data is abnormal; and
    in response to detecting that the network bandwidth usage data is abnormal, tuning network traffic to be transmitted from the target object.
  2. The method of claim 1, wherein the target object is the target process, and the generating network bandwidth usage data of a target object comprises:
    selecting, from the plurality of network events, a network event set corresponding to the target process through performing process mapping; and
    generating the network bandwidth usage data of the target process based on the network event set.
  3. The method of claim 1, wherein the target object is the target component, and the generating network bandwidth usage data of a target object comprises:
    determining a process corresponding to the target component;
    selecting, from the plurality of network events, a network event set corresponding to the determined process through performing process mapping;
    selecting, from the network event set, a network event subset corresponding to the target component through performing component mapping; and
    generating the network bandwidth usage data of the target component based on the network event subset.
  4. The method of claim 1, wherein the network bandwidth usage data is time-series data, and the detecting whether the network bandwidth usage data is abnormal comprises:
    dividing the network bandwidth usage data into a plurality of data windows;
    determining whether all data windows in the plurality of data windows are abnormal data windows; and
    in response to determining that all data windows in the plurality of data windows are abnormal data windows, detecting that the network bandwidth usage data is abnormal.
  5. The method of claim 4, wherein each data window includes one or more data points, and the determining whether all data windows in the plurality of data windows are abnormal data windows comprises, for each data window:
    determining whether the data window includes at least one abnormal data point; and
    in response to determining that the data window includes at least one abnormal data point, determining that the data window is an abnormal data window.
  6. The method of claim 5, wherein the determining whether the data window includes at least one abnormal data point comprises:
    obtaining a predicted rate range of at least one data point in the data window;
    determining whether a real rate value of the at least one data point is outside the predicted rate range; and
    in response to determining that the real rate value of the at least one data point is outside the predicted rate range, determining that the data window includes at least one abnormal data point.
  7. The method of claim 6, wherein the predicted rate range is predicted through a rate range predicting model, and the rate range predicting model is previously trained for the target object.
  8. The method of claim 7, wherein training of the rate range predicting model comprises:
    identifying a category corresponding to the target object from a set of categories;
    determining a priority of the target object based on the category;
    obtaining a historical network bandwidth usage data of the target object;
    optimizing the historical network bandwidth usage data based at least on the priority; and
    training the rate range predicting model with the optimized historical network bandwidth usage data.
  9. The method of claim 8, wherein the set of categories includes at least a customer-related and latency-sensitive object and a customer-unrelated or latency-insensitive object, and the determining a priority of the target object comprises:
    determining the priority of the target object as a high priority if the category corresponding to the target object is the customer-related and latency-sensitive object; or
    determining the priority of the target object as a low priority if the category corresponding to the target object is the customer-unrelated or latency-insensitive object.
  10. The method of claim 8, wherein the historical network bandwidth usage data includes a plurality of data points, and the optimizing the historical network bandwidth usage data comprises, for each data point:
    determining an acceptable rate range of the data point based at least on the priority;
    determining whether a real rate value of the data point is outside the acceptable rate range; and
    in response to determining that the real rate value of the data point is outside the acceptable rate range, moving the data point so that a rate value of the data point is within the acceptable rate range.
  11. The method of claim 1, wherein the tuning network traffic to be transmitted from the target object comprises:
    tuning the network traffic to be transmitted from the target object through applying a traffic tuning policy on the network traffic, the traffic tuning policy specifying a rate range and/or a priority for the network traffic.
  12. The method of claim 11, wherein the target object is the target process, and the traffic tuning policy includes a process name of the target process.
  13. The method of claim 11, wherein the target object is the target component in a process, all components in the process initially share an original destination port, and the method further comprises:
    changing, at the machine, a destination port corresponding to the target component to a new destination port different from the original destination port.
  14. The method of claim 13, wherein the traffic tuning policy includes the new destination port.
  15. The method of claim 13, wherein the new destination port is converted back to the original destination port at a receiving machine.
  16. A multi-level network traffic management system, comprising:
    a collector, configured to:
    collect a plurality of network events in a machine,
    generate network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process;
    a monitor, configured to:
    detect whether the network bandwidth usage data is abnormal, and
    in response to detecting that the network bandwidth usage data is abnormal, generate a network traffic tuning request for the target object; and
    an executor, configured to:
    tune network traffic to be transmitted from the target object based on the network traffic tuning request.
  17. The multi-level network traffic management system of claim 16, wherein the network bandwidth usage data is time-series data, and the detecting whether the network bandwidth usage data is abnormal comprises:
    dividing the network bandwidth usage data into a plurality of data windows;
    determining whether all data windows in the plurality of data windows are abnormal data windows; and
    in response to determining that all data windows in the plurality of data windows are abnormal data windows, detecting that the network bandwidth usage data is abnormal.
  18. The multi-level network traffic management system of claim 16, wherein the tuning network traffic to be transmitted from the target object comprises:
    tuning the network traffic to be transmitted from the target object through applying a traffic tuning policy on the network traffic, the traffic tuning policy specifying a rate range and/or a priority for the network traffic.
  19. The multi-level network traffic management system of claim 18, wherein the target object is the target component in a process, all components in the process initially share an original destination port, and the system further comprises:
    a port proxy, configured to change a destination port corresponding to the target component to a new destination port different from the original destination port.
  20. A computer program product for multi-level network traffic management, comprising a computer program that is executed by at least one processor for:
    collecting a plurality of network events in a machine;
    generating network bandwidth usage data of a target object running on the machine based on the plurality of network events, the target object being a target process or a target component in a process;
    detecting whether the network bandwidth usage data is abnormal; and
    in response to detecting that the network bandwidth usage data is abnormal, tuning network traffic to be transmitted from the target object.
PCT/CN2022/106137 2022-07-16 2022-07-16 Multi-level network traffic management WO2024016092A1 (en)

Publications (1)

Publication Number Publication Date
WO2024016092A1 2024-01-25

Family

ID=82838900

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050083973A1 * 1996-07-02 2005-04-21 Microsoft Corporation Adaptive bandwidth throttling for network services
JP2017126975A * 2016-01-11 2017-07-20 Beijing Baidu Netcom Science and Technology Company Limited Network traffic scheduling method and device of data center
Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22751261

Country of ref document: EP

Kind code of ref document: A1