CN104281493A - Method for improving performance of multiprocess programs of application delivery communication platforms - Google Patents
- Publication number
- CN104281493A (application number CN201410510222.8A)
- Authority
- CN
- China
- Prior art keywords
- network interface
- interface card
- queue
- cpu core
- packet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses a method for improving the performance of multi-process programs on application delivery communication platforms. The method includes: hashing data packets to network interface card (NIC) queues according to source IP; binding the packets in each NIC queue to a corresponding CPU core; binding the packets received by each CPU core to a corresponding process for handling; creating a service program for each process, setting the REUSEPORT option on each service process's socket, and binding the IP and port; running the modified service programs and adjusting the number of queues enabled on the multi-queue NIC according to the number of service processes; and binding each service process to one CPU core. With this method, the NIC's hard and soft interrupts are balanced across cores, and the CPU core that receives a packet is the same core that sends the reply, which increases the CPU cache hit rate.
Description
Technical field
The present invention relates to computer networking technology, and in particular to a method for improving the performance of multi-process programs on an application delivery communication platform.
Background technology
The CPU cache is temporary storage between the CPU and main memory, used mainly to bridge the mismatch between CPU computation speed and memory read/write speed. According to how tightly it is coupled to the CPU in the data-access path, the CPU cache can be divided into a level-1 (L1) cache and a level-2 (L2) cache. The L1 cache can be further divided into a data cache and an instruction cache, which hold data and the decoded instructions that will operate on that data.
On a typical multi-core processor, each core has its own small L1 cache, while all cores share a larger L2 cache. The speed at which a program accesses data is as follows:
If the data accessed by the program is in the L1 cache of the executing core, access is fastest.
If the data is in the shared L2 cache, access is still fast.
If the data is in neither cache level, the CPU must fetch it from main memory; because the CPU is much faster than memory, this takes a long time, and access is slow.
When the same data is held in the L1 caches of multiple cores and one core modifies it, the cached copies on the other cores are invalidated.
At present, in multi-core processor environments, single-queue network interface cards (NICs) and multi-queue NICs each have mechanisms that improve the CPU cache hit rate to varying degrees.
Improving the CPU cache hit rate with a single-queue NIC
With a single-queue NIC, data flows are optimized using RPS (Receive Packet Steering) combined with RFS (Receive Flow Steering). RPS hashes and classifies data flows, balancing the soft-interrupt load across CPUs. However, because RPS simply delivers packets of the same flow to the same CPU core, the core that handles a flow's soft interrupts may not be the core running the application that consumes that flow; in that case packets are balanced across CPUs while the application runs on a different CPU from the one performing soft-interrupt processing, which severely hurts CPU cache efficiency. RFS solves this problem by delivering received packets to the CPU on which the consuming application runs, improving the CPU cache hit rate. RPS and RFS are illustrated in Fig. 1a and Fig. 1b.
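For concreteness, a minimal sketch of how RPS/RFS would be enabled on a Linux system (illustrative only, not part of the prior-art description itself; the device name eth0, the CPU set, and the flow-count value are assumed, while the sysfs paths follow the standard kernel layout):

```python
def cpu_bitmask(cpus):
    """Hex bitmask string, as written to a receive queue's rps_cpus file."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return format(mask, "x")

def rps_config(dev, queue, cpus, flow_cnt=4096):
    """(path, value) pairs a root shell would write to enable RPS/RFS
    on one receive queue."""
    base = f"/sys/class/net/{dev}/queues/rx-{queue}"
    return [
        (f"{base}/rps_cpus", cpu_bitmask(cpus)),
        (f"{base}/rps_flow_cnt", str(flow_cnt)),
    ]

# Example: steer soft-interrupt processing for eth0's rx-0 to cores 0-3.
for path, value in rps_config("eth0", 0, [0, 1, 2, 3]):
    print(path, "<-", value)
```

A full RFS setup would also size the global table in /proc/sys/net/core/rps_sock_flow_entries; that knob is omitted here for brevity.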
Although RPS and RFS can balance a single-queue NIC's soft-interrupt processing, the hard interrupt can still become a system performance bottleneck.
Improving the CPU cache hit rate with a multi-queue NIC
With a multi-queue NIC, data flows are optimized using methods such as RSS (Receive-Side Scaling), XPS (Transmit Packet Steering), and REUSEPORT (port reuse). A multi-queue NIC has multiple transmit and receive queues. After the NIC receives a packet, a network-layer RSS hash (based on IP addresses) or a transport-layer RSS hash (based on ports) distributes the packet to one of the queues; an interrupt signal is then sent to the CPU core that handles that queue's interrupts, causing that core to execute the driver code and collect the packet into the system. The RSS principle is shown in Fig. 2.
In the XPS mechanism, a transmit-queue mapping table is prepared for each CPU; packets sent by that CPU may only use the queues listed in its mapping table, and the user can configure the table for each CPU. Limiting the transmit queues a CPU may use is equivalent to binding CPUs to transmit queues, which improves CPU cache efficiency. Fig. 3 is an example of XPS in which each processor sends packets through only one queue.
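As a hedged illustration of the Fig. 3 layout (one transmit queue per processor; the helper below is hypothetical, not a kernel API), the per-queue xps_cpus masks for a one-to-one CPU-to-queue binding can be computed as:

```python
def xps_masks(n):
    """For n CPUs and n transmit queues bound one-to-one (the Fig. 3 layout),
    return the hex mask to write to each
    /sys/class/net/<dev>/queues/tx-<q>/xps_cpus file."""
    return {q: format(1 << q, "x") for q in range(n)}

print(xps_masks(4))  # {0: '1', 1: '2', 2: '4', 3: '8'}: queue q usable only by CPU q
```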
In the traditional multi-process model, the processes share the server socket. When a client request arrives, the processes compete for it, so it is indeterminate which process will handle the request. To address this, the REUSEPORT (port reuse) option is set on the created sockets, allowing multiple processes to bind the same IP and port simultaneously; client requests are then hashed to a server process according to the tuple (source IP, source port, destination IP, destination port). This eliminates the accept() thundering-herd problem and improves system efficiency. Fig. 4 shows a REUSEPORT example in which four processes bind the same IP and port to serve external clients.
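A minimal sketch of the REUSEPORT pattern using the standard socket API (in a real deployment each of the four processes of Fig. 4 would create its own listener; here one process opens two listeners on the same address purely to show that the second bind succeeds; SO_REUSEPORT requires Linux 3.9 or later):

```python
import socket

def reuseport_listener(host, port):
    """Create a TCP listening socket with SO_REUSEPORT set, so that several
    processes can bind the same (IP, port) and the kernel hashes incoming
    connections among them by (source IP, source port, dest IP, dest port)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT exists on Linux >= 3.9; 15 is the Linux constant, used as
    # a fallback where the Python build does not expose the symbol.
    s.setsockopt(socket.SOL_SOCKET, getattr(socket, "SO_REUSEPORT", 15), 1)
    s.bind((host, port))
    s.listen(128)
    return s

first = reuseport_listener("127.0.0.1", 0)      # kernel picks a free port
port = first.getsockname()[1]
second = reuseport_listener("127.0.0.1", port)  # second bind to the same port succeeds
first.close()
second.close()
```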
In a multi-core, multi-queue-NIC environment, the methods above improve program processing efficiency, but they cannot guarantee that the CPU core receiving a packet is the same core that sends the reply; the resulting low CPU cache efficiency degrades system performance.
Summary of the invention
To solve the technical problems described above, the object of the present invention is to provide a method for improving the performance of multi-process programs on an application delivery communication platform in a multi-core, multi-queue-NIC environment, such that the NIC queue's hard interrupt, soft interrupt, protocol-stack packet transmission and reception, and application processing are all completed on one core. By localizing packet processing in this way, the method improves system CPU cache efficiency and ultimately the performance of the whole system.
The object of the present invention is achieved by the following technical scheme:
A method for improving the performance of multi-process programs on an application delivery communication platform, the method comprising:
hashing data packets to NIC queues according to source IP;
binding the packets in each NIC queue to a corresponding CPU core;
binding the packets received by each CPU core to a corresponding process for handling;
creating a service program for each process, setting the REUSEPORT option on each service process's socket, and binding the IP and port;
running the modified service programs and adjusting the number of queues enabled on the multi-queue NIC according to the number of service processes, each service process being bound to one CPU core.
Compared with the prior art, one or more embodiments of the present invention can have the following advantages:
The NIC's hard and soft interrupts are balanced, and the CPU core that receives a packet is guaranteed to be the same core that sends it, which improves the CPU cache hit rate.
Accompanying drawing explanation
Fig. 1a and Fig. 1b are schematic diagrams of RPS and RFS as provided in the prior art;
Fig. 2 is a schematic diagram of the RSS principle as provided in the prior art;
Fig. 3 is an XPS example diagram from the prior art;
Fig. 4 is a REUSEPORT example diagram from the prior art;
Fig. 5 is a flowchart of the method provided by the present invention;
Fig. 6 is a schematic diagram of the present invention while a connection is being established;
Fig. 7 is a schematic diagram of the present invention after the connection is established.
Embodiment
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with the embodiments and accompanying drawings.
As shown in Fig. 5, the method for improving the performance of multi-process programs on an application delivery communication platform comprises:
Step 10: hash data packets to NIC queues according to source IP.
To ensure that the hard interrupt, soft interrupt, and all subsequent processing for packets from the same source IP are handled by the same processor core, the default RSS hash in the Linux kernel NIC driver module, which hashes on source IP, source port, destination IP, and destination port, is replaced with a hash over the source IP address alone.
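The actual modification of Step 10 is made inside the kernel NIC driver; the stand-in below (a simple modulo hash, whereas real RSS hardware typically uses a Toeplitz hash) only illustrates the key property that queue selection depends on the source IP alone:

```python
import ipaddress

def rss_queue(src_ip, n_queues):
    """Pick a NIC queue from the source IP alone, so every packet of a
    given client lands in the same queue regardless of ports."""
    return int(ipaddress.ip_address(src_ip)) % n_queues

# Packets from the same client always hit the same queue,
# whatever their source/destination ports.
print(rss_queue("192.0.2.7", 4), rss_queue("192.0.2.8", 4))  # 3 0
```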
Step 20: bind the packets in each NIC queue to a corresponding CPU core.
The hard and soft interrupts of the N NIC queues are handled by N CPUs: the hard interrupt of each of the NIC's N queues is affined to one of N CPU cores, and by default the soft interrupt runs on the same core as the hard interrupt.
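On Linux, the binding of Step 20 is performed by writing a one-bit CPU mask to each queue interrupt's /proc/irq/&lt;n&gt;/smp_affinity entry. A sketch of the mask computation (the IRQ numbers 64-67 are assumed for illustration; on a real system they are read from /proc/interrupts):

```python
def irq_affinity_plan(queue_irqs):
    """Pin each NIC queue's interrupt to its own CPU core (queue i -> core i),
    returning (path, hex mask) pairs for /proc/irq/<irq>/smp_affinity."""
    return [
        (f"/proc/irq/{irq}/smp_affinity", format(1 << core, "x"))
        for core, irq in enumerate(queue_irqs)
    ]

# Hypothetical IRQ numbers 64-67 for queues 0-3 of a 4-queue NIC:
for path, mask in irq_affinity_plan([64, 65, 66, 67]):
    print(path, "<-", mask)
```

Writing each mask (as root) pins queue i's hard interrupt to core i; the soft interrupt then runs on the same core by default, as the embodiment notes.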
Step 30: bind the packets received by each CPU core to a corresponding process for handling.
The packets received by the N CPUs are handed to N processes. The REUSEPORT hash algorithm is modified so that its hash result matches the hash result of the algorithm in Step 10.
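The essential property of Step 30 is that the REUSEPORT socket-selection hash agrees with the queue-selection hash of Step 10, so that queue index and process index coincide. A sketch under the same illustrative modulo hash (all function names are hypothetical):

```python
import ipaddress

def flow_hash(src_ip, n):
    """Shared source-IP hash, used both for NIC queue selection (Step 10)
    and for REUSEPORT socket selection (Step 30)."""
    return int(ipaddress.ip_address(src_ip)) % n

def select_queue(src_ip, n_queues):
    return flow_hash(src_ip, n_queues)

def select_process(src_ip, n_procs):
    return flow_hash(src_ip, n_procs)

# With N queues and N processes, a client's packets reach the process
# pinned to the very core that received them.
ip, n = "198.51.100.42", 4
print(select_queue(ip, n) == select_process(ip, n))  # True: the two hashes agree
```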
Step 40: create a service program for each process, set the REUSEPORT option on each service process's socket, and bind the IP and port; then call accept() to wait for client connections.
Step 50: run the modified service programs and adjust the number of queues enabled on the multi-queue NIC according to the number of service processes, binding each service process to one CPU core.
On the modified Linux system, the modified service programs are run. Assuming the number of processes is N, the number of queues enabled on the multi-queue NIC is adjusted to N, and each server process is bound to its own core.
The processing while a connection is being established is shown in Fig. 6: the system has N cores, each core runs a server process, and the multi-queue NIC enables N queues. When a client request arrives at the NIC, the NIC hashes it to a core according to the client's source IP; after the protocol stack on that core processes the request, the REUSEPORT hash algorithm, which uses the same hash as the NIC, assigns the client's request, again by source IP, to the server process on that core.
The processing after the connection is established is shown in Fig. 7: once the connection is set up, the client's packets are hashed by the NIC to the same core and handed directly to the server process running on that core.
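The end-to-end path of Fig. 6 and Fig. 7 can be sketched as a small simulation (the modulo hash and all values are illustrative): a packet's queue index, the index of the core handling its interrupts, and the index of the process serving it all coincide.

```python
import ipaddress

N = 4  # cores == queues == processes

def hash_ip(src_ip):
    return int(ipaddress.ip_address(src_ip)) % N

def dispatch(src_ip):
    """Follow one packet: NIC queue -> bound core -> bound process."""
    queue = hash_ip(src_ip)    # Step 10: source-IP RSS hash picks the queue
    core = queue               # Step 20: queue i's interrupts pinned to core i
    process = hash_ip(src_ip)  # Step 30: REUSEPORT reuses the same hash
    return queue, core, process

for ip in ("203.0.113.5", "203.0.113.6", "203.0.113.9"):
    q, c, p = dispatch(ip)
    assert q == c == p         # everything stays on one core
print("all packets stay on their core")
```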
Although embodiments of the present invention are disclosed above, the described content is provided only to facilitate understanding of the present invention and is not intended to limit it. Any person skilled in the art to which the present invention pertains may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the patent protection scope of the present invention is still defined by the appended claims.
Claims (5)
1. A method for improving the performance of multi-process programs on an application delivery communication platform, characterized in that the method comprises:
hashing data packets to network interface card queues according to source IP;
binding the packets in each network interface card queue to a corresponding CPU core;
binding the packets received by each CPU core to a corresponding process for handling;
creating a service program for each process, setting the REUSEPORT option on each service process's socket, and binding the IP and port; and
running the modified service programs and adjusting the number of queues enabled on the multi-queue network interface card according to the number of service processes, each service process being bound to one CPU core.
2. the method for lifting application delivery communication platform multi-process program feature according to claim 1, it is characterized in that, each communication data processing queue binds an independent CPU core; Each process binds an independent CPU core.
3. the method for lifting application delivery communication platform multi-process program feature according to claim 1, it is characterized in that, the packet that network interface card receives is after source address hash algorithm, Hash enters certain communication data processing queue, and transfer to the process of binding on the CPU core of this communication data processing queue to process, through returning to client by the transmit queue that this CPU core is bound after data processing is complete.
4. the method for lifting application delivery communication platform multi-process program feature according to claim 1, it is characterized in that, after client mails to certain queue of packet arrival network interface card of application delivery communication platform program, triggering network interface card interrupts, and network interface card interruption, packet delivery, protocol stack and application program are bundled on same CPU core.
5. the method for lifting application delivery communication platform multi-process program feature according to claim 1, it is characterized in that, the application delivery communication platform process operated on N number of CPU core uses N number of receipts queue and N number of transmit queue transceiving data bag of network interface card.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410510222.8A CN104281493A (en) | 2014-09-28 | 2014-09-28 | Method for improving performance of multiprocess programs of application delivery communication platforms |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104281493A (en) | 2015-01-14
Family
ID=52256394
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410510222.8A Pending CN104281493A (en) | 2014-09-28 | 2014-09-28 | Method for improving performance of multiprocess programs of application delivery communication platforms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104281493A (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100322076A1 (en) * | 2009-06-22 | 2010-12-23 | Deepak Goel | Systems and methods for retaining source ip in a load balancing multi-core environment |
CN103049336A (en) * | 2013-01-06 | 2013-04-17 | 浪潮电子信息产业股份有限公司 | Hash-based network card soft interrupt and load balancing method |
- 2014-09-28: application CN201410510222.8A filed; published as CN104281493A, status Pending
Non-Patent Citations (1)
Title |
---|
ZHANG Yingnan et al.: "A kernel-level multi-process load-balancing session persistence method" (一种内核级多进程负载均衡会话保持方法), Computer Engineering (《计算机工程》) *
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105357203A (en) * | 2015-11-13 | 2016-02-24 | 汉柏科技有限公司 | Message processing method for firewall equipment, processor and firewall equipment |
CN107977267A (en) * | 2016-10-25 | 2018-05-01 | 航天信息股份有限公司 | For improving the method and apparatus of one process network program performance |
CN106789152A (en) * | 2016-11-17 | 2017-05-31 | 东软集团股份有限公司 | Processor extended method and device based on many queue network interface cards |
CN108718268A (en) * | 2017-04-07 | 2018-10-30 | 格尔软件股份有限公司 | A method of improving VPN service terminal concurrent processing performance |
CN109218226A (en) * | 2017-07-03 | 2019-01-15 | 迈普通信技术股份有限公司 | Message processing method and the network equipment |
CN109327405A (en) * | 2017-07-31 | 2019-02-12 | 迈普通信技术股份有限公司 | Message order-preserving method and the network equipment |
CN109660495A (en) * | 2017-10-12 | 2019-04-19 | 网宿科技股份有限公司 | A kind of document transmission method and device |
CN109995828A (en) * | 2017-12-30 | 2019-07-09 | 中国移动通信集团河北有限公司 | IPOIB performance optimization method, device, equipment and medium |
CN110022330B (en) * | 2018-01-09 | 2022-01-21 | 阿里巴巴集团控股有限公司 | Processing method and device for network data packet and electronic equipment |
CN110022267A (en) * | 2018-01-09 | 2019-07-16 | 阿里巴巴集团控股有限公司 | Processing method of network data packets and device |
CN110022330A (en) * | 2018-01-09 | 2019-07-16 | 阿里巴巴集团控股有限公司 | For the processing method of network packet, device and electronic equipment |
CN111447155B (en) * | 2020-03-24 | 2023-09-19 | 广州市百果园信息技术有限公司 | Data transmission method, device, equipment and storage medium |
CN111447155A (en) * | 2020-03-24 | 2020-07-24 | 广州市百果园信息技术有限公司 | Data transmission method, device, equipment and storage medium |
CN111741014A (en) * | 2020-07-21 | 2020-10-02 | 平安国际智慧城市科技股份有限公司 | Message sending method, device, server and storage medium |
CN112261094A (en) * | 2020-10-10 | 2021-01-22 | 厦门网宿有限公司 | Message processing method and proxy server |
CN113076205A (en) * | 2021-03-31 | 2021-07-06 | 福建星瑞格软件有限公司 | RSS-based concurrent process data processing method |
CN113076205B (en) * | 2021-03-31 | 2024-09-24 | 福建星瑞格软件有限公司 | RSS-based concurrent process data processing method |
CN113849238A (en) * | 2021-09-29 | 2021-12-28 | 浪潮电子信息产业股份有限公司 | Data communication method, device, electronic equipment and readable storage medium |
CN113849238B (en) * | 2021-09-29 | 2024-02-09 | 浪潮电子信息产业股份有限公司 | Data communication method, device, electronic equipment and readable storage medium |
CN113992425A (en) * | 2021-11-12 | 2022-01-28 | 北京天融信网络安全技术有限公司 | Method for receiving and transmitting network data packet, network equipment and communication system |
CN113992425B (en) * | 2021-11-12 | 2022-09-23 | 北京天融信网络安全技术有限公司 | Method for receiving and transmitting network data packet, network equipment and communication system |
WO2024021984A1 (en) * | 2022-07-28 | 2024-02-01 | 华为技术有限公司 | Message processing method and server |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104281493A (en) | Method for improving performance of multiprocess programs of application delivery communication platforms | |
US20240121181A1 (en) | System and method for facilitating efficient message matching in a network interface controller (nic) | |
EP1868093B1 (en) | Method and system for a user space TCP offload engine (TOE) | |
US9069722B2 (en) | NUMA-aware scaling for network devices | |
US9258171B2 (en) | Method and system for an OS virtualization-aware network interface card | |
WO2016187813A1 (en) | Data transmission method and device for photoelectric hybrid network | |
CN101217493B (en) | TCP data package transmission method | |
CN101217464B (en) | UDP data package transmission method | |
US10110518B2 (en) | Handling transport layer operations received out of order | |
CN113490927B (en) | RDMA transport with hardware integration and out-of-order placement | |
CN102882810B (en) | A kind of packet fast forwarding method and device | |
CN103176780B (en) | A kind of multi-network interface binding system and method | |
CN102497322A (en) | High-speed packet filtering device and method realized based on shunting network card and multi-core CPU (Central Processing Unit) | |
US20080273532A1 (en) | Direct Assembly Of A Data Payload In An Application Memory | |
EP1977571A2 (en) | Method and system for protocol offload and direct i/o with i/o sharing in a virtualized network environment | |
CN102014067A (en) | Message fragment sending method, device and network equipment | |
US7573895B2 (en) | Software assisted RDMA | |
CN103049336A (en) | Hash-based network card soft interrupt and load balancing method | |
CN103532876A (en) | Processing method and system of data stream | |
US8990422B1 (en) | TCP segmentation offload (TSO) using a hybrid approach of manipulating memory pointers and actual packet data | |
CN106790162B (en) | Virtual network optimization method and system | |
WO2023027854A1 (en) | System for storage of received messages | |
KR20140125311A (en) | Apparatus and method for processing traffic using network interface card with multi-core | |
US20220217098A1 (en) | Streaming communication between devices | |
KR101683818B1 (en) | Packet processing apparatus and method for cpu load balancing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C41 | Transfer of patent application or patent right or utility model | |
TA01 | Transfer of patent application right | Effective date of registration: 2016-12-14. Applicant after: Solid (Beijing) Network Technology Co., Ltd., Beijing 100085. Applicant before: BANGGOO TECHNOLOGY CO., LTD., Haidian District, Beijing 100085. |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 2015-01-14 |