Method and device for implementing processor load balancing under an AMP architecture
Technical field
The invention belongs to the field of communication technology, and in particular relates to a method and a device for implementing CPU (Central Processing Unit; hereinafter referred to simply as "processor") load balancing under an AMP (Asymmetric Multiprocessing) architecture in a multi-core parallel computing environment.
Background Art
In recent years, IPS (Intrusion Prevention System) products have become a new focus of the security product market, not only maintaining an annual market growth rate of more than 100%, but also continuously expanding their scope of application while the underlying technology becomes more widespread. Unlike the bypass deployment of traditional IDS (Intrusion Detection System) products, an IPS works inline: it inspects the data it receives and then forwards it toward its destination, much like security gateway products such as firewalls and VPNs (Virtual Private Network). This working mode requires an IPS product to have not only accurate detection capability but also performance that matches the network in which it is deployed.
In fact, ever since IPS products were born, adopting mature technologies such as protocol identification and attack-signature pattern matching, it is the performance requirement that has mainly limited their range of application. It is now common for firewalls to reach gigabit wire speed and 4G or even 10G forwarding capability, but achieving such performance in an IPS is by no means easy. An IPS must inspect not only the header of each data packet but also its content against the specific application protocol, so packets sharing the same five-tuple cannot be given "accelerated processing" in an IPS: there is no shortcut anywhere in the IPS processing path, and the IPS must inspect every packet flowing through it one by one. The IPS thus becomes a major consumer of CPU resources, and its performance depends to a great extent on the processing capability of the hardware processors.
The development of multi-core processors in recent years has provided broad room for using parallel processing techniques to improve IPS product performance. Because any increase in computing capability directly benefits the full-path detection performed by an IPS, performance should in theory scale in proportion to the number of cores. Theory, however, is not practice: the actual performance gain depends primarily on how evenly the IPS utilizes each processor, that is, on bringing out the maximum computing capability of every core.
Two processor working architectures are generally used in multi-core parallel computing environments. The first is the SMP (Symmetric Multiprocessing) mode, also called the homogeneous mode. As the name suggests, SMP treats all cores equally: each core performs identical work and runs a complete copy of the IPS system, so data reception, connection establishment, data detection, and data transmission are all executed concurrently, as if multiple IPS systems were running at the same time. This architecture is simple and the load on each core is naturally balanced, but because all cores perform the same work, heavy contention inevitably arises over shared resources (memory data, file descriptors, I/O devices, and so on). The large number of locks needed to handle this concurrency and synchronization severely restricts performance; worse, as the number of cores grows, the cost of concurrency and synchronization reaches a magnitude at which performance not only stops increasing but actually declines.
The second is the AMP (Asymmetric Multiprocessing) mode, also called the heterogeneous mode. AMP treats the cores differently: they may run different operating systems, or run different tasks within the same operating system. Each core performs its own role according to a division of tasks, avoiding competition for shared resources and thereby improving overall IPS performance. A complete operating system is often large, resource-hungry, and relatively inefficient; by taking several physical cores and building a lightweight runtime environment on them (sometimes directly called a "bare-metal" environment), a single task (such as packet transmission and reception, or pattern matching) running in this "clean space" can often achieve high performance. This is both the characteristic and the advantage of the AMP mode. Although the AMP architecture is more complex, it is widely used at present because its performance gain is substantial.
The difficulty of the AMP architecture is that the division of tasks among the cores must be balanced carefully; otherwise the core loads become unbalanced and performance suffers. The method generally adopted at present is to divide the processor cores into two types: network processors, which receive and transmit network data packets, and detection processors, which perform IPS detection. After a network processor receives a data packet, it establishes a connection (data flow) according to the packet's five-tuple, and then uses a hash algorithm to map the connection evenly onto a unique detection processor. Load balancing is thus achieved by distributing the data flows evenly across the detection processors, while guaranteeing that the same flow is always assigned to the same detection processor, so that one data flow is handled by one detection processor throughout. Fig. 1 is a schematic diagram of this prior-art method for implementing processor load balancing under the AMP architecture.
The defect of this method is that, although the data flows are distributed relatively evenly across the detection processors, different flows differ widely in packet count, packet size, and packet content, which directly causes the detection processors to process the flows at very different speeds. A flow of small packets containing little or no application-layer data requires no IPS detection and can be disposed of quickly, whereas a flow of HTTP (HyperText Transfer Protocol) packets rich in URI (Uniform Resource Identifier) information must be matched one by one against a large number of IPS rules, which inevitably takes much longer. As a result the load among the detection processors is in fact unbalanced, which degrades performance. On the other hand, the tasks assigned to the network processors and the detection processors are fixed, and it is rare for the network-processing work and the detection work to happen to be exactly in balance, which also limits the improvement of overall performance.
Other products that must inspect data content, such as AV (anti-virus) and DPI (deep packet inspection) products, likewise divide their processors into network processors and detection processors, and suffer from the same problem that the network-processing work and the detection work cannot be well balanced.
Summary of the invention
The present invention provides a method and a device for implementing processor load balancing under an AMP architecture, to solve the prior-art problem that the loads of the network processor and the detection processors under an AMP architecture cannot be balanced effectively.

The present invention provides a method for implementing processor load balancing under an AMP architecture, comprising:

establishing one ring work queue for each detection processor;

after the network processor receives a data packet, looking up the detection processor corresponding to the packet; if the ring work queue of that detection processor is not full, handing the packet to that detection processor for detection; and if the ring work queue of that detection processor is full, handing the packet to a detection processor whose ring work queue is not full for detection.
Further, looking up the detection processor corresponding to the packet after the network processor receives it comprises the following steps:

after the network processor receives the data packet, if a corresponding connection is found, finding the corresponding detection processor in the record of that connection; if no corresponding connection is found, establishing a connection according to the five-tuple of the packet, and then determining the corresponding detection processor from the connection.
Further, the corresponding connection is found by calculating a hash value from the five-tuple of the packet and then locating the connection by the hash value.

Further, the five-tuple of the packet comprises the source address, destination address, source port, destination port, and protocol.

Further, determining the corresponding detection processor from the connection is implemented with a hash algorithm.
Further, the method for implementing processor load balancing under the AMP architecture further comprises:

dynamically adjusting tasks between the network processor and the detection processors.

Further, dynamically adjusting tasks between the network processor and the detection processors comprises:

when the ring work queue of every detection processor is empty, transferring work originally handled by the network processor to the detection processors.

Still further, the work originally handled by the network processor refers to the work of sending packets.
The present invention also provides a device for implementing processor load balancing under an AMP architecture, comprising:

a ring work queue establishing module, configured to establish one ring work queue for each detection processor;

a detection processor load balancing module, configured to, after the network processor receives a data packet, look up the detection processor corresponding to the packet; if the ring work queue of that detection processor is not full, hand the packet to that detection processor for detection; and if the ring work queue of that detection processor is full, hand the packet to a detection processor whose ring work queue is not full for detection.

Further, the device for implementing processor load balancing under the AMP architecture further comprises a task adjusting module, configured to dynamically adjust tasks between the network processor and the detection processors.
The beneficial effects of the present invention are as follows:

the present invention proposes establishing a ring (i.e. circular) work queue for each detection processor, so that the dynamic load of each detection processor can be perceived;

the present invention proposes a load balancing method for the detection processors, which helps bring out their performance;

the present invention proposes a load balancing method between the network processor and the detection processors, thereby solving the prior-art problem that the load between the network processor and the detection processors cannot be balanced, and improving the overall performance of a data detection system under an AMP architecture.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of a prior-art method for implementing processor load balancing under an AMP architecture;

Fig. 2 is a flow chart of the method for implementing processor load balancing under an AMP architecture according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the method for implementing processor load balancing under an AMP architecture according to an embodiment of the present invention;

Fig. 4 is a structural diagram of the device for implementing processor load balancing under an AMP architecture according to an embodiment of the present invention.
Detailed Description of the Embodiments
The present invention is further described below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the present invention and not to limit it.
Method embodiment
According to an embodiment of the present invention, a method for implementing processor load balancing under an AMP architecture is provided. In the following embodiment an IPS system is taken as an example for detailed description; suppose that this IPS system has one network processor and three detection processors, the detection processors being numbered 0, 1, and 2 in sequence. Fig. 2 is a flow chart of the method for implementing processor load balancing under the AMP architecture according to the embodiment of the present invention, and Fig. 3 is a schematic diagram of the same method. As can be seen from Fig. 2 and Fig. 3, the method of the embodiment of the present invention comprises the following processing:
Step 201: establishing the ring work queues.

In this embodiment the queue length is 512, i.e. at most 512 packet information entries can be buffered; a queue head pointer and a queue tail pointer are also established for each queue.

The ring work queue works as follows:

After the network processor receives a data packet, it looks up the corresponding connection according to the packet's source address, destination address, source port, and destination port, and if none is found a new connection structure must be established. The network processor then assembles a packet information structure from information such as the packet protocol type, packet size, packet data address, and connection handle (a pointer to the connection structure), and appends it to the tail of the ring work queue of the corresponding detection processor. The head and tail of the ring work queue both move dynamically, and a distance is maintained between them: a distance of zero means the queue is empty, and a distance equal to the queue length means the queue is full. The distance increases by 1 when a data structure is added at the tail and decreases by 1 when one is taken from the head; when the distance is zero the queue is empty and no further data structure can be taken out, and likewise when the distance equals the queue length the queue is full and no further data structure can be added. The detection processor takes packet information entries from the queue head in order, performs IPS detection on the packet content according to each entry, and advances the queue head after each entry is processed.
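The empty/full discipline of the ring work queue described above can be modeled with a minimal Python sketch. The 512-entry default length, the head/tail distance test, and the enqueue-at-tail/dequeue-at-head roles follow the embodiment; the class and field names are illustrative only:

```python
class RingWorkQueue:
    """Ring work queue shared by one network processor (producer)
    and one detection processor (consumer)."""

    def __init__(self, length=512):
        self.length = length
        self.slots = [None] * length
        self.head = 0  # next entry the detection processor takes
        self.tail = 0  # next slot the network processor fills

    def distance(self):
        # Number of buffered packet information entries.
        return self.tail - self.head

    def is_empty(self):
        return self.distance() == 0          # head has caught up with tail

    def is_full(self):
        return self.distance() == self.length

    def enqueue(self, pkt_info):
        """Append a packet information structure at the queue tail."""
        if self.is_full():
            return False                     # caller must pick another queue
        self.slots[self.tail % self.length] = pkt_info
        self.tail += 1
        return True

    def dequeue(self):
        """Take the next packet information structure from the queue head."""
        if self.is_empty():
            return None
        pkt_info = self.slots[self.head % self.length]
        self.head += 1
        return pkt_info
```

Because the single producer only ever advances `tail` and the single consumer only ever advances `head`, a real implementation of this scheme can run lock-free between the two cores; the sketch ignores memory-ordering concerns.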
Step 202: redirecting data flows between detection processors. This step specifically comprises:

1) The network processor receives a data packet and performs an initial analysis. A non-TCP (Transmission Control Protocol)/UDP (User Datagram Protocol) packet is forwarded directly without further processing. For a TCP/UDP packet, a hash value is calculated from its five-tuple (source address, destination address, source port, destination port, protocol), and the connection is then looked up by the hash value. All connection structures are recorded in a hash array whose index is the hash value and whose members are pointers to connection structures; when looking up, the hash value is used as the array index to take out the array member, which is the required connection structure, and a null pointer means the connection has not yet been established. Typically this hash array has one million members, meaning the system can support at most one million connections. If a connection is found, step 4) is executed directly.
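Step 1) can be sketched in Python as follows. The one-million-member array of connection pointers and the null-pointer convention follow the embodiment; the concrete hash function and the connection fields are assumptions, and collision handling (which a real implementation would need) is omitted:

```python
ARRAY_SIZE = 1_000_000                 # at most one million connections
conn_table = [None] * ARRAY_SIZE       # hash array: each member is a
                                       # connection structure, or None
                                       # (the "null pointer" case)

def five_tuple_hash(src, dst, sport, dport, proto):
    # Any function works as long as the same five-tuple always yields
    # the same index; Python's built-in tuple hash is illustrative only.
    return hash((src, dst, sport, dport, proto)) % ARRAY_SIZE

def lookup_connection(src, dst, sport, dport, proto):
    """Return (hash value, connection); the connection is None when the
    flow has not been seen yet and a new structure must be created."""
    h = five_tuple_hash(src, dst, sport, dport, proto)
    return h, conn_table[h]
```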
2) If no connection is found, a new connection structure is established according to the five-tuple of the packet; a connection structure in fact corresponds to one data flow.

3) The hash value of the connection is divided by the number of CPUs minus one and the remainder is taken; the resulting value lies between 0 and 2 and is exactly the number of the detection processor corresponding to this connection. The number is recorded in the connection structure so that it need not be recalculated every time. Of course, this step uses a hash algorithm to obtain the detection processor number corresponding to the connection, and the concrete algorithm is not limited to the example given here; any other algorithm that maps the hash value of a connection to a detection processor number may be used.
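Under the embodiment's configuration of four CPUs in total (one network processor plus detection processors 0 to 2), the mapping of step 3) reduces to a single remainder operation; this sketch simply restates that arithmetic:

```python
NUM_CPUS = 4  # 1 network processor + 3 detection processors, per the embodiment

def detector_for(conn_hash):
    """Map a connection's hash value to a detection processor number
    in 0..2 by taking the remainder modulo (CPU count - 1)."""
    return conn_hash % (NUM_CPUS - 1)
```

The result is stored in the connection structure when the connection is created, so later packets of the same flow reuse it without recomputation.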
4) The detection processor number recorded in the connection is taken out and the ring work queue corresponding one-to-one to that processor is found; the head and tail pointers of the ring work queue are checked, and if the gap between them is less than 512 the queue is not full and step 6) is executed directly.
5) If the queue is full, the ring work queue of the next detection processor is checked; if it is not full, the detection processor number in the connection is changed, redirecting the connection to this new detection processor; otherwise it is likewise judged whether the head-to-tail pointer distance of the next detection processor's queue is less than 512, and so on until a detection processor whose ring work queue is not full is found. If the ring work queues of all detection processors are full, detection is abandoned and the packets of this connection are forwarded directly.
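Steps 4) and 5) together form a dispatch routine: try the flow's pinned detection processor first, then scan the remaining queues in order, and give up only when every queue is full. A sketch in Python, assuming each queue object offers an `is_full()` check and each connection records its processor number in a `detector` field:

```python
def dispatch(conn, queues):
    """Choose a detection processor for one packet of `conn`.
    Returns the processor number whose ring queue can accept the packet,
    or None when all queues are full (detection is abandoned and the
    packet is forwarded directly)."""
    n = len(queues)
    start = conn["detector"]           # processor recorded in step 3)
    for step in range(n):
        cpu = (start + step) % n       # own queue first, then the next ones
        if not queues[cpu].is_full():
            conn["detector"] = cpu     # redirect the flow if cpu != start
            return cpu
    return None
```

Because the redirected processor number is written back into the connection structure, every later packet of the flow goes straight to the new processor, preserving the one-flow-one-processor guarantee.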
6) A packet information structure is generated, comprising information such as the packet protocol type, packet size, packet data address, and connection handle (a pointer to the connection structure), and is added at the tail of the ring work queue to await detection by the detection processor.
Step 203: dynamically adjusting tasks between the network processor and the detection processors. The work of sending packets is taken as an example below; of course, the task adjustment between the network processor and the detection processors is not limited to adjusting the packet-sending task. This step specifically comprises:

1) Making the packet-sending work independent.

The packet-sending part of the program is made into an independent module that the network processor can call and the detection processors can also call. When the network processor calls the packet-sending module, the code runs on the network processor and occupies network processor load; when a detection processor calls the packet-sending module, the code runs on that detection processor and occupies detection processor load.

A switch is provided, which is closed under normal conditions. When the switch is closed, the packet-sending module is called by the network processor and never by the detection processors; when the switch is open, the opposite holds.
2) Dynamically adjusting tasks between the network processor and the detection processors according to the load of the detection processors.

A timer is set to periodically check the ring work queue of each detection processor.

If the queue head equals the queue tail in every ring work queue, i.e. all the ring work queues are empty, the switch is opened so that the detection processors take over the packet-sending work, increasing the detection processor load and at the same time relieving the network processor, which no longer handles packet sending; tasks are thus dynamically adjusted between the two kinds of processors.

Of course, the design may also be such that the switch is opened, and tasks are dynamically adjusted between the network processor and the detection processors, when the average number of unprocessed packets across all the ring work queues is less than a preset threshold.
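The timer check of step 203 can be sketched as a predicate that decides the switch position. The all-queues-empty rule and the average-below-threshold variant both follow the embodiment; the `distance()` accessor on each queue object is an assumption:

```python
def send_switch_open(queues, threshold=None):
    """Return True when the switch should be opened, i.e. when the
    detection processors should take over the packet-sending work.
    With no threshold, open only when every ring work queue is empty;
    otherwise, open when the average number of unprocessed entries
    across the queues is below the threshold."""
    if threshold is None:
        return all(q.distance() == 0 for q in queues)
    avg = sum(q.distance() for q in queues) / len(queues)
    return avg < threshold
```

A periodic timer evaluates this predicate on the detection processors' ring work queues and sets the switch accordingly.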
The above takes an IPS system as an example to describe in detail the method for implementing processor load balancing under an AMP architecture of the present invention. The present invention is not limited to application in IPS systems and can equally be applied to other data detection systems such as AV and DPI products.
Device embodiment
According to an embodiment of the present invention, a device for implementing processor load balancing under an AMP architecture is provided. Fig. 4 is a structural schematic diagram of the device for implementing processor load balancing under the AMP architecture according to the embodiment of the present invention. As shown in Fig. 4, the device comprises: a ring work queue establishing module 401, a detection processor load balancing module 402, and a task adjusting module 403. Each module of the embodiment of the present invention is explained in detail below.
Specifically, the ring work queue establishing module 401 is configured to establish one ring work queue for each detection processor.

The detection processor load balancing module 402 is configured to, after the network processor receives a data packet, look up the detection processor corresponding to the packet; if the ring work queue of that detection processor is not full, hand the packet to that detection processor for detection; and if the ring work queue of that detection processor is full, hand the packet to a detection processor whose ring work queue is not full for detection.

The task adjusting module 403 is configured to dynamically adjust tasks between the network processor and the detection processors.
For details of the embodiment of the device for implementing processor load balancing under the AMP architecture of the present invention, reference may be made to the description in the method embodiment part, which is not repeated here.
Although preferred embodiments of the present invention have been disclosed for purposes of example, those skilled in the art will recognize that various improvements, additions, and substitutions are also possible; therefore, the scope of the present invention should not be limited to the above embodiments.