WO2015199366A1

WO2015199366A1 - Method for scheduling in multiprocessing environment and device therefor

Info

Publication number: WO2015199366A1
Application number: PCT/KR2015/005914
Authority: WO
Inventors: 정기웅
Original assignee: 정기웅; (주) 구버넷
Priority date: 2014-06-26
Filing date: 2015-06-12
Publication date: 2015-12-30

Abstract

Disclosed are a network interface device for processing packets, and a method therefor. The network interface device comprises multiple queues. Upon receiving a packet through a physical network, the network interface device identifies the flow of the packet, stores the packet in the multiple queues on a per-flow basis, and then processes the packets in parallel through a multiprocessor.

Description

Scheduling method and apparatus therefor in multiprocessing environment

The present invention relates to a multiple scheduling method and apparatus, and more particularly, to a method and apparatus for scheduling in a multiprocessing environment that processes packets in parallel.

A multiprocessing system processes a plurality of processes in parallel using a plurality of central processing unit (CPU) cores. However, multiprocessing raises problems such as uneven load distribution among CPU cores, resource sharing among CPU cores, and degraded cache efficiency.

In particular, in a multiprocessing system that processes packets in parallel, it is desirable to process packets belonging to the same flow on the same CPU and thereby maintain flow affinity for the CPU in order to improve packet processing efficiency. In this case, however, a heavy load on a particular CPU may cause a load imbalance across the CPUs, which may lower the overall processing efficiency of the multiprocessing system.

To solve this problem, load balancing between CPUs should be performed periodically. In that case, however, the CPU that processes a flow changes during load balancing, which lowers flow affinity, and a packet re-ordering process becomes necessary, which lowers the packet processing efficiency of the multiprocessing system.

Thus, increasing the processing efficiency of a multiprocessing system requires both flow affinity and appropriate load balancing, but the two conflict with each other, so the conflict must be appropriately mitigated.

In addition, the volume of communication over the Internet has recently been increasing rapidly, and server capacity and speed are increasing rapidly as a result. Server virtualization is accelerating in order to curb the growth in physical volume that accompanies larger server capacity and to reduce cost. With servers becoming larger, faster, and virtualized, it is essential to increase the efficiency of parallel processing of large volumes of data received from the physical network, including data packets generated in the virtualization environment. Because performance degrades as the load increases, a technical approach is needed that transfers the server load attributable to the virtual switch function to the physical network interface device.

A conventional NIC that supports a virtualization environment attempts to reduce the bottleneck between the network interface device and the virtual switch of the server by creating and managing queues in units of virtual machines within the physical network interface device. In the conventional case, however, processor allocation for parallel processing of received data packets and redistribution of queues are performed in units of virtual machines. In other words, processor allocation takes into account only the physical layer of the virtualization environment. Therefore, processor affinity, which is one of the most important factors in increasing the efficiency of parallel processing, cannot be considered, and processor allocation and queue redistribution take place only in consideration of processor usage load. This may reduce the efficiency of parallel processing.

SUMMARY OF THE INVENTION

The present invention provides a scheduling method, and an apparatus therefor, that mitigate the trade-off between flow affinity and load balancing in a multiprocessing environment that performs parallel packet processing and that increase the utilization efficiency of all processors.

In accordance with an aspect of the present invention, there is provided a scheduling method comprising: grouping all or some of a plurality of processors into at least one processor group; if a processor group or processor is already designated for the flow of a received packet, assigning the flow to the designated processor group or processor; and if no processor group or processor is designated for the flow of the received packet, creating a new processor group for the flow and assigning the flow to it, or assigning the flow to a processor that does not belong to any processor group.

Another example of the scheduling method according to the present invention for achieving the above technical object is a scheduling method in a multiprocessing apparatus, comprising: determining single scheduling or multiple scheduling based on the load status or processing capacity of a plurality of processors or the attributes of received packets; in the case of single scheduling, designating one of the plurality of processors as a scheduler; and in the case of multiple scheduling, grouping the plurality of processors into at least two processor groups and designating one of the processors in each processor group as the scheduler of that processor group.

Another example of the scheduling method according to the present invention for achieving the above technical object comprises: obtaining, from a packet received through a physical network, a deep packet that is encapsulated in a physical network frame and includes virtualization environment network layer information; identifying the deep packet as a deep flow based on the virtualization environment network layer information included in the deep packet; and dividing deep packets by identified deep flow and assigning each deep packet to a corresponding queue.

One example of a network interface device according to the present invention for achieving the above technical object comprises: a packet receiver configured to obtain, from a packet received through a physical network, a deep packet that is encapsulated in a physical network frame and includes virtualization environment network layer information; a packet analyzer configured to identify the deep packet as a deep flow based on the virtualization environment network layer information included in the deep packet; and a scheduler configured to divide deep packets by identified deep flow and assign each deep packet to a corresponding queue.

According to the present invention, the trade-off between flow affinity and load balancing can be alleviated to improve the performance of parallel processing. In addition, by using a plurality of dynamically assigned schedulers, latency due to packet scheduling and queuing may be reduced. In addition, various scheduling algorithms suited to traffic attributes may be easily applied through the plurality of schedulers. In addition, the load on a server having a virtualization environment that includes a plurality of virtual machines is reduced. By processing packets on a deep-flow basis, the affinity between deep packets and processors increases, which increases the efficiency of parallel processing. In addition, the load on the virtual switch can be distributed to the network interface card to increase the efficiency of virtual network processing. In addition, by queuing and processing packets per deep flow, scalable communication processing that guarantees QoS per end-to-end deep flow between virtual machines can be implemented.

FIG. 1 illustrates an example of a multiple processing apparatus for performing single scheduling according to the present invention;

FIG. 2 illustrates an example of a multiple processing method for performing single scheduling according to the present invention;

FIG. 3 is a diagram illustrating an example of a classification policy for multiple scheduling according to the present invention;

FIG. 4 is a diagram illustrating an example configuration of a multiple processing apparatus to which a multiple scheduling method is applied according to the present invention;

FIG. 5 is a view showing an example of a detailed configuration of a multi-scheduling unit according to the present invention;

FIG. 6 is a flowchart illustrating an example of a multiple scheduling method in a multiple processing environment according to the present invention;

FIG. 7 is a flowchart illustrating another example of a multiple scheduling method in a multiple processing environment according to the present invention;

FIG. 8 is a flowchart illustrating an example of a scheduling method of a processor designated as a scheduler in a processor group in a multiple processing environment according to the present invention;

FIG. 9 illustrates an example of processor grouping for multiple scheduling according to the present invention;

FIG. 10 illustrates an example of a method for dynamically grouping processors for multiple scheduling according to the present invention;

FIG. 11 illustrates an example of a schematic structure of a system including a network interface device for multiple processing according to the present invention;

FIG. 12 illustrates an example of a method for dynamically setting resources of a NIC according to the present invention;

FIG. 13 illustrates a configuration of an embodiment of a NIC according to the present invention;

FIG. 14 illustrates an example of deep-flow-based queue allocation of a NIC according to the present invention;

FIG. 15 illustrates another example of deep-flow-based queue allocation of a NIC according to the present invention;

FIG. 16 is a view showing an example of a deep packet used in the present invention; and

FIG. 17 is a flowchart illustrating an example of a packet processing method for a virtual machine environment according to the present invention.

Hereinafter, a multi-scheduling method and a device in a multi-processing environment according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating an example of a multiple processing apparatus for performing single scheduling according to the present invention.

Referring to FIG. 1, the multiprocessing apparatus 100 includes a packet identification unit 105, a packet transfer unit 140, a packet allocation table 130, a memory 104, a plurality of queues 110, 112, and 114, a plurality of processors 120, 122, and 124, and a controller 150.

The packet identification unit 105 receives a packet from a wired or wireless network or another device, and identifies a flow of the received packet. In addition, the packet identification unit 105 determines whether there is a processor allocated to the received packet flow with reference to the packet allocation table 130.

The packet allocation table 130 contains information on the processor allocated to each packet flow. For example, the packet allocation table 130 may include information indicating that a first processor is allocated to process a first flow and a second flow, and that a second processor is allocated to process a third flow. The information stored in the packet allocation table 130 is generated and updated by the scheduler described later.
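The table described above can be pictured as a simple mapping from flow to processor. The following is a minimal sketch, not the disclosed implementation; the class name `PacketAllocationTable`, its methods, and the string flow identifiers are hypothetical names introduced only for illustration.

```python
from typing import Dict, Optional


class PacketAllocationTable:
    """Hypothetical in-memory stand-in for the packet allocation table 130."""

    def __init__(self) -> None:
        # Maps a flow identifier to the number of the processor handling it.
        self._flow_to_processor: Dict[str, int] = {}

    def lookup(self, flow_id: str) -> Optional[int]:
        # Returns None when no processor has been allocated to the flow yet.
        return self._flow_to_processor.get(flow_id)

    def assign(self, flow_id: str, processor_id: int) -> None:
        # Called by whichever processor is acting as the scheduler.
        self._flow_to_processor[flow_id] = processor_id


# The example from the text: flows 1 and 2 go to processor 1, flow 3 to processor 2.
table = PacketAllocationTable()
table.assign("flow-1", 1)
table.assign("flow-2", 1)
table.assign("flow-3", 2)
```

A lookup that returns `None` corresponds to the case in which the flow is new and a scheduling step is required.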

The memory 104 stores the packet received by the packet identification unit 105. In this case, the memory 104 may store the flow information of the packet identified by the packet identification unit 105, the processor information obtained by referring to the packet allocation table 130, and the like.

The packet transfer unit 140 transfers the packets stored in the memory 104 to the queue of the corresponding processor. The packet transfer unit 140 may deliver the packets stored in the memory to the processor queues in order, or may deliver them out of order in consideration of various conditions such as quality of service (QoS) and priority.

The queues 110, 112, and 114 receive the packets to be processed by each processor from the memory 104 and store them. In the present embodiment, one queue 110, 112, or 114 exists for each of the processors 120, 122, and 124, but the present invention is not limited thereto: two or more queues may exist for one processor, or two or more processors may share one queue. Alternatively, the queues 110, 112, and 114 may be grouped through the method disclosed in FIGS. 11 through 17.

In addition, the queues 110, 112, and 114 have a first-in first-out (FIFO) structure, but are not necessarily limited thereto, and may be implemented in any of various structures capable of storing packets, such as last-in first-out (LIFO) or priority-based output.
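The three queue disciplines named above differ only in the order in which stored packets are released. The following sketch contrasts them using Python standard-library containers; the packet labels and the priority values are made-up sample data, and a lower number is assumed to mean higher priority.

```python
from collections import deque
from itertools import count
import heapq

packets = ["p1", "p2", "p3"]  # arrival order (sample data)

# FIFO: packets leave in the order they arrived.
fifo = deque(packets)
fifo_order = [fifo.popleft() for _ in range(len(packets))]

# LIFO: the most recently stored packet leaves first.
lifo = list(packets)
lifo_order = [lifo.pop() for _ in range(len(packets))]

# Priority-based output: lower number = higher priority (an assumption).
# The arrival counter breaks ties so equal priorities stay in arrival order.
priorities = {"p1": 2, "p2": 0, "p3": 1}
arrival = count()
heap = []
for packet in packets:
    heapq.heappush(heap, (priorities[packet], next(arrival), packet))
priority_order = [heapq.heappop(heap)[2] for _ in range(len(packets))]
```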

When packet flow information does not exist in the packet allocation table 130, the controller 150 designates one of the plurality of processors as a scheduler and transmits an interrupt request signal to it. Upon receiving the interrupt request signal, the designated processor selects a processor to process the packet and stores the related information in the packet allocation table 130.

The plurality of processors 120, 122, and 124 each process packets. In addition, in consideration of the efficiency of the multiprocessing system and reduced manufacturing cost, one of the plurality of processors (for example, processor 1 120) may be used as the scheduler instead of providing a separate scheduler for packet scheduling. A method of using one of the plurality of processors as a scheduler will be described with reference to FIG. 2. Of course, in this embodiment, a separate scheduler may also be provided in addition to the plurality of processors.

FIG. 2 is a diagram illustrating an example of a multiple processing method for performing single scheduling according to the present invention.

Referring to FIGS. 1 and 2 together, the packet identification unit 105 analyzes a received packet to identify its packet flow (S200, S210). Here, the flow identification method analyzes the traffic attributes of all layers of the received packet and classifies the packet according to a predetermined network communication policy. For example, a flow may be distinguished by a communication policy defined over attributes such as the source node address, destination address, session, and application layer of the received packet. The packet identification unit 105 refers to the packet allocation table 130 to determine whether information on a processor to process the flow exists (S220). The packet, the flow information of the packet, the processor information, and the like are stored in the memory 104.
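As an illustration of flow identification, the sketch below derives a flow key from the attributes mentioned above. It assumes, purely for illustration, that a packet is represented as a dictionary with `src`, `dst`, `session`, and `application` fields; the `FlowKey` type and `identify_flow` function are hypothetical, and an actual communication policy may use other attributes.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Hypothetical flow identifier built from packet attributes."""
    src: str
    dst: str
    session: int
    application: str


def identify_flow(packet: dict) -> FlowKey:
    # Packets that agree on all four attributes belong to the same flow;
    # the payload plays no part in the classification.
    return FlowKey(packet["src"], packet["dst"],
                   packet["session"], packet["application"])


key_a = identify_flow({"src": "10.0.0.1", "dst": "10.0.0.2",
                       "session": 7, "application": "http", "payload": b"a"})
key_b = identify_flow({"src": "10.0.0.1", "dst": "10.0.0.2",
                       "session": 7, "application": "http", "payload": b"b"})
key_c = identify_flow({"src": "10.0.0.1", "dst": "10.0.0.3",
                       "session": 7, "application": "http", "payload": b"a"})
```

Because `FlowKey` is hashable, it can serve directly as a lookup key in a table that maps flows to processors.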

If information on the processor to process the flow exists in the packet allocation table 130 (S230), the packet transfer unit 140 delivers the packet to the queue of that processor (S260). For example, if the packet identification unit 105 identifies the received packet as belonging to the first flow, and the packet allocation table 130 assigns the second processor to process the first flow, the packet transfer unit 140 delivers the packet to the queue 112 of the second processor 122.

On the other hand, if the packet allocation table 130 contains no information on a processor to process the flow (that is, the flow is new) (S230), the controller 150 sends an interrupt request signal to the processor designated as the scheduler among the plurality of processors (S240). The controller 150 may designate the processor with the least current load as the scheduler, may designate the scheduler through a preset scheduler determination algorithm, or may designate a preset processor as the scheduler. In the present embodiment, processor 1 120 is designated as the scheduler.

Upon receiving the interrupt request signal, the processor 120 suspends the work it was performing and performs a scheduling operation (S250). For example, the processor 120 designated as the scheduler selects a processor to process the new flow (S250) and stores information on the selected processor in the packet allocation table 130 (S260). Once an interrupt request has been delivered to the processor designated as the scheduler, no further interrupt request for processing a newly input packet is allowed until the interrupt is released.
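The single-scheduling path described above (look up the flow, schedule only if the flow is new, then enqueue) can be sketched as follows. This is an illustrative sketch under stated assumptions: the `dispatch` function is hypothetical, queue length stands in for processor load, and the shortest-queue rule is only one possible choice for the scheduler's selection, which the description leaves open.

```python
from collections import deque
from typing import Deque, Dict


def dispatch(packet: str, flow_id: str,
             table: Dict[str, int],
             queues: Dict[int, Deque[str]]) -> int:
    """Enqueue a packet and return the processor that will handle it."""
    processor = table.get(flow_id)
    if processor is None:
        # New flow: this is the scheduling step. The shortest-queue rule
        # below is an assumed selection policy, with ties broken by the
        # lower processor number.
        processor = min(queues, key=lambda p: (len(queues[p]), p))
        table[flow_id] = processor
    # Known flow: affinity is preserved because the table entry is reused.
    queues[processor].append(packet)
    return processor


table: Dict[str, int] = {}
queues: Dict[int, Deque[str]] = {1: deque(), 2: deque(), 3: deque()}
first = dispatch("a1", "flow-A", table, queues)
second = dispatch("b1", "flow-B", table, queues)
third = dispatch("a2", "flow-A", table, queues)
```

In this run the second packet of flow A follows the first one to the same queue even though other queues are empty, which is the flow affinity the description aims to preserve.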

In addition, the processor 120 designated as the scheduler may apply various conventional load balancing algorithms to rebalance the load among the processors (re-balancing), either periodically or when a specific event occurs, such as the load imbalance exceeding a predetermined level.

In the case of FIGS. 1 and 2, a new interrupt is not allowed from the system while one scheduler 120 is selected to perform the task, and processing for another new flow is delayed until the requested interrupt is released. In addition, since load redistribution is performed for all the processors to solve the load imbalance, there is a problem in that the conflict between the flow affinity and the load distribution becomes more severe. This can be mitigated through the multiple scheduling of FIG.

FIG. 3 is a diagram illustrating an example of a classification policy for multiple scheduling according to the present invention.

Referring to FIG. 3, the classification policy includes a policy for dividing a plurality of processors into groups. As shown in FIG. 4, in the case of multiple scheduling, a plurality of processors are divided into at least two groups to perform scheduling for each group. This requires a policy for dividing a plurality of processors into groups.

An example of a classification policy is a packet flow based policy, as shown in FIG. 3. The flow can be divided into two groups A and B based on attributes that can divide the packet flow hierarchically. In this case, the plurality of processors may be divided into two groups according to which group the flow currently being processed belongs to.

Another example is a classification policy based on the load of each processor. Processors can be divided according to a predetermined number of groups so that the load distribution of each group can be even.

There may be a plurality of classification policies for dividing the plurality of processors into groups. For example, a first policy may divide the processors into two groups based on flow, a second policy may divide the processors into three groups based on flow, and a third policy may divide the processors into at least two groups according to load level.

The present invention is not limited to the embodiment of FIG. 3, and various classification policies for dividing the processors may be applied. The classification policy is set in advance, and the user may update it through a separate input/output interface. In FIG. 4, it is assumed that the criterion for dividing the plurality of processors, that is, the classification policy, has been set in advance for multiple scheduling.
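A load-based classification policy of the kind mentioned above can be sketched as a partitioning problem. The example below uses a greedy heuristic (place the most heavily loaded processor into the currently lightest group) as one assumed way to keep group loads roughly even; the description does not prescribe a particular algorithm, and the function name and load figures are hypothetical.

```python
from typing import Dict, List


def group_by_load(loads: Dict[int, int], n_groups: int) -> List[List[int]]:
    """Split processors into n_groups with roughly even total load."""
    groups: List[List[int]] = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    # Visit processors from heaviest to lightest; ties go to the lower number.
    for proc in sorted(loads, key=lambda p: (-loads[p], p)):
        lightest = totals.index(min(totals))
        groups[lightest].append(proc)
        totals[lightest] += loads[proc]
    return [sorted(g) for g in groups]


# Sample load percentages for four processors (made-up values).
sample_loads = {1: 90, 2: 10, 3: 50, 4: 50}
two_groups = group_by_load(sample_loads, 2)
```

With these sample values each of the two resulting groups carries a total load of 100.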

FIG. 4 is a diagram illustrating an example of a configuration of a multiple processing apparatus to which a multiple scheduling method according to the present invention is applied.

Referring to FIG. 4, the multiprocessing apparatus 400 includes a packet identification unit 410, a packet transfer unit 480, a packet allocation table 420, a multi-scheduling unit 430, a memory 440, a plurality of queues 450, and a plurality of processors 460, 462, 464, 466, 470, 472, and 474.

The packet identification unit 410, the packet transfer unit 480, the packet allocation table 420, the memory 440, the plurality of queues 450, and the processors 460, 462, 464, 466, 470, 472, and 474 include all the configurations and functions described with reference to FIG. 1. Therefore, descriptions of configurations and functions identical to those of FIG. 1 are not repeated, and the present embodiment is described with a focus on the configurations and functions needed for multi-scheduling.

The multi-scheduling unit 430 determines whether to perform a single scheduling or multiple scheduling based on the state information of the multiple processing system such as load distribution status, traffic attributes, traffic processing capacity, and the like. For example, the multi-scheduling unit 430 may change to multi-scheduling while performing a single scheduling, or vice versa.

When there are a plurality of classification policies as shown in FIG. 3, the multi-scheduling unit 430 may determine which classification policy to apply based on the state information. When the multi-scheduling unit 430 determines to perform the multi-scheduling, the multi-scheduling unit 430 classifies the plurality of processors into at least two groups according to the classification policy, and designates a scheduler to perform the scheduling for each group. Detailed configuration of the multi-scheduling unit is shown in FIG.

For example, as shown in FIG. 4, the multi-scheduling unit 430 divides seven processors into two processor groups (first group: processors 1 to 4; second group: processors 5 to 7), and one processor in each group (466, 474) is designated as a scheduler by a predetermined scheduler determination algorithm. When the multi-scheduling unit 430 divides the processors into groups, information about the processor groups may be stored in the packet allocation table 420.

For example, when grouping processors on a flow basis as shown in FIG. 3, the multi-scheduling unit 430 stores information on which group each flow belongs to in the packet allocation table 420. When information on a newly received packet flow does not exist in the packet allocation table 420, the packet identification unit 410 determines which group the new flow belongs to and stores the packet and identification information on the packet in the memory 440. The multi-scheduling unit 430 designates a scheduler to handle the packet within the group according to the load level or a predetermined scheduler determination algorithm, and transmits an interrupt request signal to the designated scheduler so that it processes the packet. As described with reference to FIG. 1, the processor designated as the scheduler performs a scheduling operation such as selecting a processor in the group to process the flow.

For example, assume that the first processor group 490 including processors 1 to 4 is allocated to flow group A 300 according to the grouping policy of FIG. 3, and the second processor group 495 including processors 5 to 7 is allocated to flow group B 310. If the flow of the received packet belongs to flow group A 300, the multi-scheduling unit 430 designates one of the processors belonging to the first processor group 490 as a scheduler so that the scheduling operation is performed within the group 490. In this case, the second processor group 495 may process new packets or perform scheduling regardless of whether the first group 490 is performing scheduling, which improves the processing efficiency of the processors as a whole. In other words, the scheduling operations of the first processor group 490 and the second processor group 495 may be performed in parallel.

As another example, in multi-scheduling, the grouping of processors may be changed at any time according to the processor load or various policies, and the ungrouped processors may be newly grouped into a processor group for a flow of a new received packet. This will be described again with reference to FIGS. 9 and 10.

FIG. 5 is a diagram illustrating an example of a detailed configuration of a multi-scheduling unit according to the present invention.

Referring to FIG. 5, the multi-scheduling unit 430 includes a policy decision unit 500, a group divider 510, and a scheduler unit 520.

The policy decision unit 500 determines a single scheduling or multiple scheduling using various state information of a multiprocessing environment, for example, load distribution state, traffic attributes, traffic processing capacity, and the like. In addition, the policy decision unit 500 determines how to divide a plurality of processors when performing multiple scheduling, or which policy for each divided group to apply.

For example, the policy decision unit 500 may determine to perform multi-scheduling when the total traffic processing capacity of the multiprocessing apparatus is less than or equal to a predetermined level, and may select the flow-based policy shown in FIG. 3 as the classification policy.

The group divider 510 divides the plurality of processors into at least two groups according to the classification policy determined by the policy decision unit 500.

The scheduler unit 520 designates one of the processors in each group as a scheduler according to a load level or a preset scheduler selection algorithm. For example, the scheduler unit 520 may designate the processor with the least load in each group as the scheduler. Alternatively, a specific processor may be fixed as the scheduler, or the scheduler may be designated dynamically by various other methods.
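The least-load rule for designating a scheduler in each group can be sketched in a few lines. The function name, group labels, and load figures below are hypothetical sample values chosen only so that the result matches the grouping of FIG. 4 (processors 1 to 4 and processors 5 to 7); they are not taken from the disclosure.

```python
from typing import Dict, List


def designate_schedulers(groups: Dict[str, List[int]],
                         loads: Dict[int, int]) -> Dict[str, int]:
    """Pick the least-loaded processor of each group as its scheduler."""
    # Ties are broken by the lower processor number (an assumption).
    return {name: min(members, key=lambda p: (loads[p], p))
            for name, members in groups.items()}


groups = {"first": [1, 2, 3, 4], "second": [5, 6, 7]}
loads = {1: 70, 2: 40, 3: 55, 4: 10, 5: 30, 6: 80, 7: 20}  # made-up loads
schedulers = designate_schedulers(groups, loads)
```

Fixing a specific processor as the scheduler, the other option mentioned above, would amount to replacing the `min` call with a constant lookup.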

FIG. 6 is a flowchart illustrating an example of a multiple scheduling method in a multiple processing environment according to the present invention.

Referring to FIGS. 4 and 6 together, the multi-scheduling unit 430 collects various state information such as traffic capacity, flow attributes, and load distribution state (S600). Based on the state information, the multi-scheduling unit 430 determines whether to perform single scheduling or multi-scheduling (S610). When performing single scheduling, the multi-scheduling unit 430 designates one of the plurality of processors as a scheduler. When performing multi-scheduling, the multi-scheduling unit 430 divides the plurality of processors into at least two groups according to the classification policy (S620) and designates a processor to operate as the scheduler of each group (S630). The processor designated for each group performs scheduling on the corresponding packet in response to an interrupt request signal, as in FIG. 1, but it schedules only the processors in the group to which it belongs rather than all processors. Therefore, the two or more groups may be scheduled independently and simultaneously by the scheduler designated for each group. In other words, each group scheduler receives an interrupt signal from the multi-scheduling unit 430 and may perform a scheduling operation, such as selecting a processor for packet processing, regardless of whether the interrupts to other schedulers have been released. Different policies or algorithms may be applied independently to each group as necessary.

FIG. 7 is a flowchart illustrating another example of a multiple scheduling method in a multiple processing environment according to the present invention. FIG. 7 assumes that the plurality of processors have already been grouped by the multi-scheduling unit described above.

Referring to FIGS. 4 and 7, upon receiving a packet (S700), the packet identification unit 410 analyzes the packet to identify a flow (S710). When information on the flow exists in the packet allocation table 420 (S730), the packet identification unit 410 determines which group the flow belongs to and stores the packet and identification information about the packet in the memory 440. The multi-scheduling unit 430 transmits an interrupt request signal to the corresponding scheduler so that it processes the packet.

If the information on the flow does not exist in the packet allocation table 420 (S730), the multi-scheduling unit 430 refers to the classification policy to determine the group to which the flow belongs (S740). For example, when the processor group is divided based on the flow as shown in FIG. 3, the multi-scheduling unit 430 determines which group the newly recognized flow belongs to based on an upper attribute that classifies the flow hierarchically. As another example, when processor groups are divided according to load distribution, the multischeduling unit 430 may select a group having a relatively low load as a group to which a newly recognized flow belongs.

After determining which group the new flow belongs to, the multi-scheduling unit 430 designates a scheduler to handle the packet within that group, using the load level or a predetermined scheduler determination algorithm, and sends the designated scheduler an interrupt request signal to process the packet.

Upon receiving the interrupt signal, the processor operates as a scheduler, selects a processor to process a new flow, and stores related information in the packet allocation table 420 (S750 and S760).

For example, referring back to FIG. 4, when a new flow is allocated to a group consisting of processors 1 to 4, the multischeduling unit 430 transmits an interrupt signal to the processor 4 466 designated as a scheduler. Processor 4 466 selects processor 1 460 as a processor to process the packet according to a predetermined processor determination algorithm and stores the corresponding information in packet allocation table 420. Then, when a packet of the same flow is received, the packet transfer unit 480 allocates the packet to processor 1 (460).
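The per-group handling of a new flow described above can be sketched as follows. This is a simplified model under stated assumptions: the `MultiScheduler` class is hypothetical, the number of packets assigned so far stands in for processor load, interrupts are not modeled, and the least-loaded choice within the group is only one possible processor determination algorithm.

```python
from typing import Dict, List


class MultiScheduler:
    """Hypothetical model of group-restricted flow scheduling."""

    def __init__(self, groups: Dict[str, List[int]]) -> None:
        self.groups = groups
        self.table: Dict[str, int] = {}  # flow -> processor
        self.load = {p: 0 for members in groups.values() for p in members}

    def dispatch(self, flow_id: str, flow_group: str) -> int:
        processor = self.table.get(flow_id)
        if processor is None:
            # Only processors of the flow's own group are candidates, so
            # scheduling in one group never touches the other groups.
            members = self.groups[flow_group]
            processor = min(members, key=lambda p: (self.load[p], p))
            self.table[flow_id] = processor
        self.load[processor] += 1
        return processor


ms = MultiScheduler({"A": [1, 2, 3, 4], "B": [5, 6, 7]})
results = [ms.dispatch("f1", "A"), ms.dispatch("f2", "B"),
           ms.dispatch("f3", "A"), ms.dispatch("f1", "A")]
```

The last call shows that a later packet of the same flow is sent to the processor already recorded in the table, without a new scheduling step.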

FIG. 8 is a flowchart illustrating an example of a scheduling method of a processor designated as a scheduler in a processor group in a multiple processing environment according to the present invention.

Referring to FIG. 8, the plurality of processors perform ordinary flow processing and the like before receiving an interrupt signal (S800). If a processor group or processor is already allocated to the flow of a newly received packet, the multi-scheduling unit 430 allocates the flow to that previously designated processor group or processor. If no processor group or processor is allocated to the flow of the newly received packet, the multi-scheduling unit 430 creates a processor group for the flow of the new packet and then transmits an interrupt signal to one of the processors in the newly created processor group (S810).

Upon receiving the interrupt signal, the processor suspends the operation it was performing (S820), performs a scheduling operation that determines which processor the new packet flow is allocated to (S830), and then resumes the operation it was performing before the interrupt signal was received (S840). The multi-scheduling unit 430 may transmit an interrupt signal to each processor group separately, so the processor groups may perform their scheduling operations simultaneously.

FIG. 9 illustrates an example of processor grouping for multiple scheduling according to the present invention.

Referring to FIG. 9, in the case of single scheduling the multi-scheduling unit 430 designates any one of the processors 900 as a scheduler, and in the case of multi-scheduling it groups all or some of the processors into at least one group 910, 920 and designates a scheduler for each group.

The multi-scheduling unit 430 may create a new processor group or update an existing processor group according to the load state of the processors or various preset policies.

For example, in the case of multi-scheduling, the multi-scheduling unit 430 groups a plurality of processors into two groups, a first group 910 of processors 1 to 3 and a second group 920 of processors 4 to 5, and the rest of the plurality of processors. The processors may not be grouped.

If a new processor group is needed during the multi-scheduling operation, the multi-scheduling unit 430 groups all or some of the processors that are not currently grouped to create a new processor group or change existing processor groups to create a new processor group. Can be generated.

As another example, during the multi-scheduling operation, the multi-scheduling unit 430 may update groups that have already been formed, for example the first group 910 of processors 1 to 3 and the second group 920 of processors 4 to 5, by adding a new processor to a group or removing some of the processors from a group.

FIG. 10 illustrates an example of a method for dynamically grouping processors for multiple scheduling according to the present invention.

Referring to FIG. 10, the multi-scheduling unit 430 creates a new processor group or updates an existing processor group when a new processor group is needed for the flow of a received packet or when an existing processor group needs to be changed (S1000). For example, when no processor group or processor is allocated to a newly received packet flow, the multi-scheduling unit 430 forms a new processor group from all or some of the ungrouped processors. As another example, when the load within a processor group or the total processor load reaches a certain level, or according to a preset policy, the multi-scheduling unit 430 may reconfigure all of the processor groups, add a new processor to a specific processor group, or remove at least one of the existing processors from a group.

The multi-scheduling unit 430 designates any one processor in the created or updated processor group as a scheduler (S1010). The designated processor does not operate as a scheduler at all times; it operates as a scheduler only when an interrupt signal is received, as described with reference to FIG. 8, and performs the same operations as the other general processors once the scheduling operation is complete.
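The group update of S1000 and the scheduler designation of S1010 might look like the following sketch. The function name `update_groups`, the mean-load trigger, the `high` threshold of 0.8, and the least-loaded scheduler choice are all assumptions made for illustration; the description only states that groups are updated according to load or a preset policy and that any one processor is designated as scheduler.

```python
def update_groups(groups, ungrouped, load, high=0.8):
    """Return (updated groups, remaining ungrouped processors, schedulers).

    groups:    dict of group name -> list of processor ids
    ungrouped: list of processor ids not in any group
    load:      dict of processor id -> load in the range 0.0 to 1.0
    """
    groups = {name: list(members) for name, members in groups.items()}
    spare = list(ungrouped)

    # S1000: add a spare processor to any group whose mean load is too high.
    for members in groups.values():
        mean_load = sum(load[p] for p in members) / len(members)
        if mean_load > high and spare:
            members.append(spare.pop(0))

    # S1010: designate one processor per group as scheduler
    # (assumed policy: the least-loaded member).
    schedulers = {
        name: min(members, key=lambda p: load[p])
        for name, members in groups.items()
    }
    return groups, spare, schedulers
```

With the grouping of FIG. 9 (processors 1 to 3 and 4 to 5) and two ungrouped processors, an overloaded first group would receive one extra processor while the second group is left unchanged.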

Up to now, a method of scheduling by grouping a plurality of processors has been described. Hereinafter, a method of grouping and scheduling queues connected to a plurality of processors will be described. The technical configuration of FIGS. 1 to 10 may be added to the technical configuration of FIGS. 11 to 17, or conversely, the configuration of FIGS. 11 to 17 may be added to the configuration of FIGS. 1 to 10. In other words, the grouping of the plurality of processors described with reference to FIGS. 1 to 10 and the grouping of the plurality of queues described with reference to FIGS. 11 to 17 may be simultaneously performed. The multiple processing device may be implemented as a network interface device.

FIG. 11 is a diagram illustrating an example of a schematic structure of a system including a network interface device for multiple processing according to the present invention.

Referring to FIG. 11, a network interface device is implemented with a network interface card (NIC) 1100. However, the network interface device is not necessarily limited to the network interface card 1100, and may be implemented in various forms such as hardware or software within and outside the server. For convenience of explanation, hereinafter, a network interface device is referred to as a NIC.

The server 1120 includes a plurality of virtual machines 1150, 1152, and 1154, a virtual switch 1140, and a connection slot 1130. The virtual switch 1140 forwards the packet received through the NIC 1100 to the destination virtual machine. The connection slot 1130 is an interface connecting the NIC 1100 and the server 1120 and may be implemented as, for example, a Peripheral Component Interconnect Express (PCIe). In this case, the NIC 1100 may be attached to or detached from the PCIe slot.

The NIC 1100 analyzes traffic characteristics of upper layers of packets received from the network 1110 to identify flows, and processes the identified flows in parallel through multiple processors. Here, the packet refers to a packet that encapsulates virtualization environment network layer information using various techniques such as conventional tunneling so that it can be delivered to a plurality of virtual machines. The virtualization environment network layer refers to a network layer formed by virtual machines, and the virtualization environment network layer information refers to the information of that network layer, encapsulated in a physical network frame so that packets can be transmitted within the network layer formed by the virtual machines. Hereinafter, a packet identified based on the virtualization environment network layer information used in the present embodiment is referred to as a deep packet. The deep packet is encapsulated in a physical network frame so that it is recognized by general communication protocols in the physical network and can be transmitted smoothly. In addition, a flow classified using the virtualization environment network layer information of the deep packet is called a deep flow. A deep flow can be described as a flow of the service endpoint created in a virtual machine within the communication service structure.

A deep flow can be defined as specific traffic in the virtualization environment network, classified according to upper-layer (vL3 or higher) traffic attributes in the virtualization environment network frame obtained by removing the physical network frame from the deep packet. Deep flows can be classified and identified according to a number of preset policies. For example, the packet analyzer 1310 may identify a TCP flow of a virtual machine as a deep flow. The structure of the deep packet will be described with reference to FIG. 16.

The NIC 1100 includes a plurality of queues and a plurality of processors for parallel processing of the received deep packets. The size and number of the queues may be fixed, or may be changed dynamically according to information about the deep flows, virtualization environment information of the server, the load of the processors, and the like.

FIG. 12 is a diagram illustrating an example of a method for dynamically setting resources of a NIC according to the present invention.

Referring to FIGS. 11 and 12, when the NIC 1100 is attached to the connection slot 1130 of the server 1120 and thereby connected to the server 1120 (S1200), the NIC 1100 receives from the server 1120 virtualization environment information including the number of virtual machines (S1210). The NIC 1100 dynamically sets resources, such as the size and number of queues and the creation of queue groups, according to the information on the deep flows, the received virtualization environment information, or the load distribution of the processors (S1220).

For example, when the NIC 1100 receives from the server 1120 virtualization environment information indicating that there are four virtual machines, the NIC 1100 may allocate three queues to each virtual machine. As another example, the NIC 1100 may divide the deep flows into two groups based on the information on the deep flows, and allocate six queues to each group. The number of queues allocated to each virtual machine and the size of each queue may be set in various ways according to a preset rule.
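The first allocation example above (four virtual machines, three queues each) can be written out as a short sketch. The function name `allocate_queues_per_vm` and the use of consecutive integer queue ids are assumptions for illustration only.

```python
def allocate_queues_per_vm(vm_ids, queues_per_vm):
    """Give each virtual machine its own block of consecutive queue ids."""
    allocation = {}
    next_queue = 0
    for vm in vm_ids:
        allocation[vm] = list(range(next_queue, next_queue + queues_per_vm))
        next_queue += queues_per_vm
    return allocation


# Four virtual machines reported by the server, three queues for each.
allocation = allocate_queues_per_vm(["vm1", "vm2", "vm3", "vm4"], 3)
```

The same twelve queues could instead be split into two groups of six, as in the second example, by calling the function with two group names and `queues_per_vm=6`.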

FIG. 13 is a diagram illustrating a configuration of an embodiment of a NIC according to the present invention.

Referring to FIG. 13, the NIC 1100 includes a packet receiver 1300, a packet analyzer 1310, a memory 1320, a plurality of queues 1330, a plurality of processors 1340, a scheduler 1350, a monitoring unit 1360, and a queue manager 1370. The connection lines between the components, including the packet receiver 1300, are only an example to aid understanding of the present invention; various other connections, such as between the queue manager 1370 and the monitoring unit 1360 or between the scheduler 1350 and the plurality of queues 1330, may of course be established.

When the packet receiver 1300 receives a packet in which a deep packet has been encapsulated by conventional tunneling or a similar method so as to be recognized as a general Ethernet frame in the external network, it decapsulates the packet, removing the header portion corresponding to the physical network and restoring the data packet frame of the virtualization environment.

The packet analyzer 1310 identifies the deep flow of the restored deep packet. To identify the deep flow, not only the data link layer (vL2 layer) but also the network layer (vL3 layer) and higher layers of the virtualization environment must be interpreted. To this end, the packet analyzer 1310 analyzes the decapsulated deep packet from the virtual data link layer (vL2 layer) to the virtual application layer (vL7 layer) through a DPI (Deep Packet Inspection) process to identify the deep flow. The analysis of the deep packet for deep flow identification is not limited to analyzing every layer from the virtual data link layer to the virtual application layer, and the scope of the analysis may vary according to the deep flow identification policy.

The memory 1320 stores the deep packet and the deep flow information identified by the packet analyzer 1310, and stores and manages a flow table indicating a mapping relationship between the deep flow and the queue.

According to an embodiment, the packet receiver 1300 stores the decapsulated deep packet in the memory 1320 and notifies the packet analyzer 1310 that the deep packet has been stored. The packet analyzer 1310 then performs deep flow identification on the deep packet stored in the memory 1320. That is, having learned that a new deep packet has been received, the packet analyzer 1310 identifies the deep flow characteristics of that deep packet according to a preset policy, stores the information, and notifies the scheduler 1350.

The scheduler 1350 assigns the identified deep flows to their corresponding queues and assigns the queues in parallel to the multiprocessor 1340. More specifically, the scheduler 1350 refers to the flow table stored in the memory 1320 to find the queue to which the deep flow of a deep packet is mapped, and delivers the deep packet stored in the memory 1320 to that queue. If the flow table contains no mapping information for the received deep packet, the scheduler 1350 allocates the deep flow to a specific queue through any of various conventional methods and stores the mapping between the deep flow and the queue in the flow table.

The scheduler 1350 may queue the deep packets destined for each virtual machine in units of deep flows. For example, when establishing the mapping between deep flows and queues, a first flow destined for the first virtual machine and a second flow destined for the second virtual machine that have the same nature (e.g., the same QoS priority) may be assigned to the same queue. Although the present invention does not exclude such a case, in order to increase the efficiency of parallel processing it is preferable to allocate deep flows to a different group of queues for each virtual machine. In other words, when the scheduler 1350 groups the queues by virtual machine as shown in FIG. 14, the first flow for the first virtual machine is allocated, in units of deep flows, to the queues of the first group 1400, and the second flow for the second virtual machine is allocated, in units of deep flows, to the queues of the second group 1410.

For example, upon being notified that a new deep packet has been loaded into the memory 1320, together with the deep flow information of that deep packet, the scheduler 1350 searches the flow table to find which queue the deep flow is allocated to, and loads the deep packet from the memory 1320 into the queue that was found. If the information on the identified deep flow cannot be found in the flow table, the scheduler 1350 may allocate the deep packet to one of the queues belonging to the corresponding virtual machine according to a preset policy. The preset policy may vary according to embodiments. Examples include a policy of selecting a queue in consideration of flow affinity, a policy of selecting the least-loaded queue among the queues of the virtual machine to which the deep packet is to be sent, and a policy of selecting a queue assigned to the processor with the lowest utilization.
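The flow-table lookup with a policy-based fallback could be sketched as below. The class name `FlowTable`, its fields, and the use of queue depth as the load measure are assumptions; of the policies listed above, only the "least-loaded queue of the destination virtual machine" policy is shown.

```python
class FlowTable:
    """Maps deep flows to queues; falls back to the least-loaded queue."""

    def __init__(self, vm_queues):
        self.vm_queues = vm_queues                 # vm id -> list of queue ids
        self.table = {}                            # deep flow id -> queue id
        self.depth = {q: 0 for qs in vm_queues.values() for q in qs}

    def enqueue(self, vm, flow_id):
        """Return the queue chosen for one deep packet of the given flow."""
        queue = self.table.get(flow_id)
        if queue is None:
            # Unknown flow: pick the least-loaded queue of the destination VM
            # and remember the mapping so later packets follow the same path.
            queue = min(self.vm_queues[vm], key=lambda q: self.depth[q])
            self.table[flow_id] = queue
        self.depth[queue] += 1
        return queue
```

Recording the mapping on first sight is what keeps every later packet of the same deep flow in the same queue, and therefore on the same processor.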

Each of the plurality of queues 1330 is mapped to at least one deep flow. Queuing on a deep-flow basis increases processor affinity, which increases the efficiency of parallel processing. The plurality of queues 1330 may be divided into groups including at least one queue for each virtual machine. In addition, the plurality of queues 1330 may be divided into at least two partitions as shown in FIG. 15.

The scheduler 1350 may be a processor selected from the plurality of processors. For example, a specific processor 1350 among all the processors 1380 may be designated as the scheduler, or the load of each processor may be determined through the monitoring unit 1360 and the processor having the least load may then be selected as the scheduler 1350. Various other methods for selecting a scheduler may also be applied. When a scheduler is designated among the processors, a controller (not shown) generates an interrupt signal whenever scheduling is needed and transmits it to the processor designated as the scheduler, in the interrupt-driven manner described earlier. The processor that receives the interrupt signal stops the task it was performing, completes its operation as a scheduler, and then resumes the previous task.

The plurality of processors 1340 process the deep packets stored in the queues in parallel and transmit them to the virtual machines of the server. The single scheduling or multiple scheduling method described above with reference to FIGS. 1 to 10 may be applied to the plurality of processors. That is, the plurality of processors may be grouped and scheduled group by group. Each of the plurality of processors 1340 is connected to at least one queue.

For example, the plurality of processors 1340 are connected to the queues in consideration of flow affinity. In other words, queues that store deep packets with the same or similar deep flow attributes are tied to the same processor.

As another example, the plurality of processors 1340 may be connected to the queues on a per-virtual-machine basis. Referring to FIG. 14, a first processor may be connected to the first to third queues 1400 allocated to the first virtual machine, a second processor may be connected to the fourth to sixth queues 1410 allocated to the second virtual machine, and a third processor may be connected to the seventh and eighth queues 1420 allocated to the third virtual machine.

As another example, the first processor may be connected to the fourth queue allocated to the second virtual machine in addition to the first to third queues allocated to the first virtual machine, in which case the second processor may be connected to the fifth and sixth queues allocated to the second virtual machine. In other words, a processor may be connected to all or some of the queues allocated to at least two virtual machines.

The monitoring unit 1360 monitors various states including loads of the processor 1340 and the queue 1330.

According to the monitoring result, the queue manager 1370 divides the queues into a plurality of partitions and assigns a scheduler to each partition as shown in FIG. 15, merges a plurality of queues into one or splits a queue, or adjusts the size and number of queues, for example by increasing or decreasing the number of queues allocated to a virtual machine. The queue manager may also dynamically set the number and size of queues for each virtual machine according to the virtualization environment of the server identified through the process of FIG. 12.

FIG. 14 illustrates an example of deep flow based queue allocation of a NIC according to the present invention.

Referring to FIG. 14, the queues 1330 are divided by virtual machine. For example, the first to third queues 1400 are assigned to the first virtual machine, the fourth to sixth queues 1410 are assigned to the second virtual machine, and the seventh and eighth queues 1420 are assigned to the third virtual machine. The scheduler performs queuing with reference to the deep flows of each virtual machine.

For example, when the deep flows directed to the first virtual machine are identified by priority, the scheduler 1350 classifies the deep packets by priority and stores them in the first to third queues 1400 assigned to the first virtual machine. That is, among the deep flows destined for the first virtual machine, the highest-priority deep flow is stored in the first queue, the next-priority deep flow is stored in the second queue, and the deep flows of all remaining priorities are stored in the third queue.
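The priority-based placement just described can be illustrated with a small sketch. The function name `queue_for_priority` and the convention that priority rank 0 is the highest are assumptions made for this example.

```python
def queue_for_priority(priority_rank, vm_queues):
    """Map a priority rank to one of the queues of a virtual machine.

    Rank 0 (assumed highest) goes to the first queue, rank 1 to the second,
    and every lower priority shares the last queue.
    """
    if priority_rank < 0:
        raise ValueError("priority rank must be non-negative")
    index = min(priority_rank, len(vm_queues) - 1)
    return vm_queues[index]


first_vm_queues = [1, 2, 3]   # first to third queues 1400
```

Calling `queue_for_priority(0, first_vm_queues)` selects the first queue, while any rank of 2 or more falls through to the third queue.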

FIG. 15 is a diagram illustrating another example of deep flow based queue allocation of a NIC according to the present invention.

Referring to FIG. 15, the queues 1330 are divided into at least two partitions 1520 and 1530. Schedulers 1500 and 1510 are allocated to the partitions 1520 and 1530, respectively. For example, the first scheduler 1500 is assigned to the first partition 1520, and the second scheduler 1510 is assigned to the second partition 1530. The schedulers 1500 and 1510 perform scheduling for their assigned partitions independently and in parallel. Although the present embodiment shows an example in which the queues 1330 are grouped into two groups 1520 and 1530, the queues may be grouped into three or more groups depending on the embodiment, and the multiple scheduling method described above may be applied to the scheduling of each group. As described above, each scheduler may be a processor selected by a predetermined method from among the plurality of processors 1380.

For example, while scheduling is performed by a single scheduler as shown in FIG. 13, redistribution of the queues or reallocation of the processors may be decided if the load distribution of the queues measured by the monitoring unit falls below a preset threshold. Alternatively, redistribution of the queues or reallocation of the processors may be decided if the processor load, calculated from the statistical volume of deep packets received from the network and the processing capacity of all the processors in the NIC, is below a certain threshold. If, upon such redistribution or reallocation, the queues are divided into a plurality of partitions as shown in FIG. 15 and an additional scheduler must be designated, the processor having the least load may be designated as the additional scheduler.

The queues belonging to each partition may be grouped 1540 on a virtual machine basis, and the queues in each group 1540 may be classified on a deep flow basis. In this case, a three-level hierarchy is formed: partition, per-virtual-machine group, and per-flow queue within each group.
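One way to picture the partition, group, and queue hierarchy is as nested containers. In the sketch below, the function name `build_hierarchy` and the rule of cutting partitions on virtual machine boundaries (`vms_per_partition`) are assumptions; the description does not state how queues are divided among partitions.

```python
def build_hierarchy(vm_queues, vms_per_partition):
    """Return a list of partitions; each partition maps vm id -> queue ids.

    vm_queues: dict of vm id -> list of per-flow queue ids (the VM's group)
    """
    vms = list(vm_queues)
    partitions = []
    for start in range(0, len(vms), vms_per_partition):
        chunk = vms[start:start + vms_per_partition]
        partitions.append({vm: list(vm_queues[vm]) for vm in chunk})
    return partitions


hierarchy = build_hierarchy(
    {"vm1": [1, 2, 3], "vm2": [4, 5, 6], "vm3": [7, 8]},
    vms_per_partition=2,
)
```

Each element of `hierarchy` would then be handled by its own scheduler, each inner key corresponds to one virtual machine group, and each queue id inside a group holds one or more deep flows.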

FIG. 16 illustrates an example of a deep packet used in the present invention.

Referring to FIG. 16, a deep packet includes a physical network frame 1610, a tunneling field 1620, a network frame 1630 between virtual machines, and a data field 1600.

The physical network frame 1610 includes information representing a layer of a conventional physical network such as L2, IP, TCP, and the like. The tunneling field 1620 represents tunneling information and the like. The virtual machine network frame 1630 includes information on each layer (vL2 to vL7, etc.) in the network environment between virtual machines. The data field 1600 contains data.

The structure of the deep packet of FIG. 16 is just one example to help the understanding of the present invention, and the present invention is not limited thereto. The structure of the deep packet may be defined and used in various forms for the virtual machine environment.
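The four fields of FIG. 16 can be modelled as a simple record for illustration. The class name `DeepPacket`, the representation of each field as raw bytes, and the choice to strip the tunneling field together with the physical network frame during decapsulation are assumptions; the description only states that the header portion corresponding to the physical network is removed.

```python
from dataclasses import dataclass


@dataclass
class DeepPacket:
    physical_frame: bytes   # 1610: physical network layers (L2, IP, TCP, ...)
    tunneling: bytes        # 1620: tunneling information
    vm_frame: bytes         # 1630: inter-VM network layers (vL2 to vL7)
    data: bytes             # 1600: payload

    def decapsulate(self) -> bytes:
        """Return the virtualization environment frame and its payload.

        Assumption: both the physical frame and the tunneling field are
        dropped, leaving only what the virtual machine network needs.
        """
        return self.vm_frame + self.data


packet = DeepPacket(b"PHY", b"TUN", b"VM", b"DATA")
inner = packet.decapsulate()
```

A real implementation would parse protocol headers rather than concatenate opaque byte strings; the record only shows which parts survive decapsulation under the stated assumption.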

In addition, the structure of the deep packet stored in the memory and the structure of the deep packet stored in the queue may be the same or different depending on the embodiment. For example, various design changes are possible before the deep packet is stored in the queue: the deep packet restored by decapsulating the packet of FIG. 16 received from the network may be converted into a structure better suited to processing in the virtual machine environment, or some or all of the fields that are unnecessary in the virtual machine environment may be deleted from the deep packet.

Referring to FIG. 17, when receiving a deep packet (S1700), the network interface device analyzes the deep packet through a DPI process to identify a destination virtual machine and a deep flow to which the deep packet is to be delivered (S1710). The network interface device stores deep packets in deep flow units for at least one queue allocated to each virtual machine (S1720). In operation S1730, the network interface device processes a deep packet stored in each queue through a plurality of processors and transmits the deep packet to the virtual machine.
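Steps S1700 to S1730 can be strung together in a compact sketch. Here the function name `run_pipeline`, the dictionary packet format, and the thread pool standing in for the plurality of processors are all assumptions, and the DPI result (destination virtual machine and deep flow) is taken as already attached to each packet.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(packets, workers=2):
    """Queue packets per (virtual machine, deep flow), then drain in parallel."""
    queues = defaultdict(deque)
    for pkt in packets:                          # S1700: packet received
        key = (pkt["vm"], pkt["flow"])           # S1710: DPI result (assumed given)
        queues[key].append(pkt["payload"])       # S1720: store per deep flow

    def drain(item):
        key, queue = item
        # Each queue is drained by a single worker, so the order of packets
        # within one deep flow is preserved.
        return key, list(queue)

    with ThreadPoolExecutor(max_workers=workers) as pool:   # S1730
        return dict(pool.map(drain, queues.items()))


delivered = run_pipeline([
    {"vm": "vm1", "flow": "tcp-1", "payload": "a"},
    {"vm": "vm2", "flow": "tcp-9", "payload": "b"},
    {"vm": "vm1", "flow": "tcp-1", "payload": "c"},
])
```

Because a queue is never shared between workers, no packet re-ordering step is needed within a flow, which is the benefit the description attributes to queuing in deep flow units.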

The invention can also be embodied as computer readable code on a computer readable recording medium. The computer-readable recording medium includes all kinds of recording devices in which data that can be read by a computer system is stored. Examples of computer-readable recording media include various types of ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, and the like. The computer readable recording medium can also be distributed over network coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.

The present invention has been described above with a focus on its preferred embodiments. Those skilled in the art will appreciate that the present invention can be implemented in a modified form without departing from its essential features. Therefore, the disclosed embodiments should be considered in a descriptive sense only and not for purposes of limitation. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within that scope will be construed as being included in the present invention.

Claims

A scheduling method in a multiple processing apparatus, the method comprising:

grouping all or some of a plurality of processors into at least one processor group;

if a processor group or processor designated for the flow of a received packet exists, allocating the flow to the designated processor group or processor; and

if no processor group or processor designated for the flow of the received packet exists, creating a new processor group for the flow and allocating the flow to it, or allocating the flow to a processor that does not belong to any processor group.
The method of claim 1, wherein the creating and allocating of the new processor group comprises:

grouping all or some of the ungrouped processors among the plurality of processors into a new processor group; and

allocating the flow of the received packet to the new processor group.
The method of claim 1, further comprising:

transmitting, for each processor group, an interrupt signal to one of the processors in the processor group; and

stopping, by the processor that receives the interrupt signal, a previously performed task, performing a scheduling operation, and then resuming the previously performed task.
A scheduling method in a multiple processing apparatus, the method comprising:

determining single scheduling or multi-scheduling based on the load status or processing capacity of a plurality of processors or on the nature of a received packet;

in the case of the single scheduling, designating one of the plurality of processors as a scheduler; and

in the case of the multi-scheduling, grouping the plurality of processors into at least two processor groups and designating one of the processors in each processor group as the scheduler of that processor group.
The method of claim 4, further comprising:

in the case of the multi-scheduling, creating a new processor group for the flow of a received packet if no processor group or processor is designated for the flow.
The method of claim 5, wherein the creating of the new processor group comprises:

forming the new processor group from all or some of the ungrouped processors among the plurality of processors.
A scheduling method comprising:

obtaining, from a packet received through a physical network, a deep packet including virtualization environment network layer information encapsulated in a physical network frame;

identifying a deep flow of the deep packet based on the virtualization environment network layer information included in the deep packet; and

dividing deep packets by identified deep flow and allocating each deep packet to a corresponding queue.
The method of claim 7, further comprising:

grouping one or more queues by virtual machine.
The method of claim 7, wherein

the identifying comprises identifying a destination virtual machine of the deep packet, and

the allocating comprises allocating the deep packet to a corresponding queue based on the destination virtual machine and the deep flow.
The method of claim 7, wherein

the virtualization environment network layer information refers to information of a network layer formed by virtual machines, encapsulated in a physical network frame for packet transmission in the network layer formed by the virtual machines, and

the deep packet is encapsulated in a physical network frame so as to be recognized by a general communication protocol in the physical network and transmitted smoothly.
The method of claim 7, wherein the allocating comprises:

selecting, based on a destination virtual machine, a queue group from among a plurality of queue groups each including at least one queue; and

selecting, based on the deep flow, a queue in the selected queue group in which to store the packet.
The method of claim 7, wherein

the number of queues allocated to each virtual machine is dynamically determined based on virtualization environment information, including information about the virtual machines, or on the load distribution.
The method of claim 7, wherein

a plurality of queue groups are provided, each allocated to a virtual machine and including at least one queue, and

a scheduler selects one of the plurality of queue groups for the deep packet based on a destination virtual machine of the deep packet, and allocates a queue in the selected queue group based on the deep flow.
A packet receiver configured to obtain, from a packet received through a physical network, a deep packet including virtualization environment network layer information encapsulated in a physical network frame;

a packet analyzer configured to identify a deep flow of the deep packet based on the virtualization environment network layer information included in the deep packet; and

a scheduler configured to divide deep packets by identified deep flow and allocate each deep packet to a corresponding queue.
The device of claim 14, wherein the scheduler

selects, based on a destination virtual machine, a queue group from among a plurality of queue groups each including at least one queue, and selects, based on the deep flow, a queue in the selected queue group in which to store the packet.