CN1272724C - No.7 layer load equalization method based on socket butt joint in kernel - Google Patents

Info

Publication number
CN1272724C
Authority
CN
China
Prior art keywords
request
message
node
client
service node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 02159493
Other languages
Chinese (zh)
Other versions
CN1512377A (en)
Inventor
李电森
冯锐
许正华
肖利民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN 02159493
Publication of CN1512377A
Application granted
Publication of CN1272724C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Computer And Data Communications (AREA)

Abstract

The present invention relates to a layer-7 load balancing method based on in-kernel socket docking. A front-end node receives a client request, modifies the source address, destination address and port numbers of the request message, and sends the message directly to a service node; it then receives the response of the service node, modifies the source address, destination address and port numbers of the response message, and sends the message directly to the client. If the request is complete, the two sockets are reclaimed and the conversion mapping table is updated; otherwise the front-end node continues to receive client requests. The invention combines the advantages of transport-layer forwarding and application-layer forwarding: it improves message forwarding efficiency while retaining great flexibility, so that client requests are distributed over the service nodes more quickly and more evenly. The mechanism can also be used to design general-purpose middleware that forwards messages on all ports, to perform complex analysis of messages and apply specific filtering rules so as to implement a full-featured firewall, and even to migrate existing TCP connections dynamically.

Description

Layer-7 load balancing method based on in-kernel socket docking
Technical field
The present invention relates to load balancing in server clusters, and in particular to a layer-7 load balancing method based on in-kernel socket docking. It belongs to the technical field of computer networks.
Background
As computer network applications become widespread, the number of network users keeps growing, which poses a greater challenge to service providers: providing service from a single server in the conventional way falls far short of the volume of user requests, so multiple servers are now commonly used. How to distribute all user requests evenly across the back-end servers is the problem that load balancing must solve.
Classified by where the component that implements the load balancing policy resides, load balancing falls into two classes: load balancing based on the domain name server (DNS), and load balancing based on the server-side application layer and/or the Internet Protocol (IP) layer. DNS-based load balancing usually relies on the DNS round-robin mechanism: one name is configured with multiple IP addresses on the DNS server, each pointing to a back-end server that actually provides the service. The mechanism is simple but inflexible and subject to many restrictions; for example, the back-end servers must use external addresses and must be able to communicate with clients directly. Load balancing based on the server-side application layer and/or IP layer is different: following some policy, it can analyze the content of the request data, modify the request message as needed, and send it to a suitable service node. Examples include dedicated load balancer hardware and the Hypertext Transfer Protocol (HTTP) redirection function of World Wide Web (WWW) server programs, which implement load balancing on the server side according to the application request.
Cluster technology can group multiple servers into a single server aggregate that presents a unified service to the outside. This has many advantages, such as good scalability and hiding internal details from the outside. A cluster usually provides users with a single entry point (called the front-end node) that stays connected to all servers in the cluster; this characteristic makes it possible to implement a dedicated load balancing mechanism within the cluster.
The front-end node of a cluster is usually the single entry point provided to users. It relays the data exchanged between clients and servers, distributes client requests evenly over the service nodes according to some policy, and forwards the responses of the service nodes to the clients, thereby transparently giving users a powerful service processing capability.
In fact, the front-end node has two main tasks: message forwarding and load balancing. Message forwarding can be carried out at two levels: the transport layer and the application layer. Transport-layer forwarding can use techniques such as network address translation (NAT), destination address translation, IP encapsulation, and direct packet forwarding. These either modify the source and destination addresses of the IP datagram directly, or prepend an additional IP header, or rewrite the Ethernet address of the datagram, so that the message is eventually delivered to the destination address.
Because this process only needs to modify the Transmission Control Protocol (TCP) header or the IP header and leaves the transmitted data unchanged, messages can be forwarded quickly; its drawback is a lack of flexibility. Application-layer forwarding is different: it parses the received datagram layer by layer, stripping the IP header and the TCP header until the data reaches the application; it then selects a suitable destination address according to the application data, re-encapsulates the data with TCP and IP headers, and forwards it through the network driver to the destination address on the internal network. Its advantages are that the destination node can be selected according to the content of the request, request data can be buffered, and the details and faults of internal nodes are hidden, all transparently to the end user. Because the application layer is layer 7 of the Open Systems Interconnection (OSI) reference model, this forwarding technique is usually called layer-7 forwarding.
Because the front-end node must handle all kinds of user requests, which involve extensive message analysis, database queries and similar operations that cannot be completed entirely in kernel space, front-end nodes generally implement message forwarding and load balancing in an application-layer program.
Take the Post Office Protocol version 3 (POP3) service, which is used to receive e-mail, as an example. If only a single POP3 server is exposed and it can communicate with clients directly, the situation is simple: the client only needs to establish a TCP connection to port 110 of the POP3 server and can then request data from it. For a cluster that provides POP3 service, the situation is much more complex: in a large mail system the users are usually distributed, so the data files of the users may be spread over many mail servers, and the client is unaware of this. Suppose a client needs to read the mail of "username@domain.com"; the process is shown in Figure 1.
The detailed steps are as follows:
1. The client sends a request to port 110 of the front-end node (domain.com), asking to read the mail of user "username@domain.com";
2. After the network interface card of the front-end node receives the user request, the operating system copies it from kernel space to user space and hands it to the application-layer program;
3. The application-layer program analyzes the user request, determines that it is a POP3 request, and queries the mail user database (LDAP/DB);
4. The query result is returned, showing that the data file of user "username" is stored on mail server Mail Server1;
5. The data in user space is copied back to kernel space and the message is re-encapsulated;
6. A socket connection is established with mail server Mail Server1 and the request data is sent;
7. After mail server Mail Server1 receives the request, it reads the mail data of user "username" and sends it back to the front-end node as the response message; the response message returns along the reverse path until it reaches the client.
If the request of the client is to create a new user "username@domain.com", step 3 above is no longer a simple query of the mail user database. The application-layer program can collect information such as the load and disk space of all mail servers and, using a suitable policy, assign the request to the most suitable mail server; a data file is created for the user on that mail server, and the result is returned to the front-end node, which forwards it to the client. This keeps the load balanced across all mail servers.
TCP is a connection-oriented protocol, and establishing an ordinary TCP connection requires a three-way handshake:
First, the requesting end (usually called the client) sends a SYN message to the other end (the server), indicating the port it wants to connect to; the message contains the initial sequence number of the client, ISNC. Then the server replies with an acknowledgment (ACK) message whose acknowledgment number equals ISNC+1, and at the same time sends a SYN message to the client containing the initial sequence number of the server, ISNS. Finally, the client replies to the server with an ACK message whose acknowledgment number is ISNS+1, and may send request data at the same time.
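The sequence and acknowledgment numbers exchanged in the handshake can be sketched as follows. This is a minimal illustration, not part of the patent: the function name `three_way_handshake` and the sample values 1000 and 5000 are assumptions chosen for the example.

```python
MOD = 2**32  # TCP sequence numbers are 32-bit values and wrap around


def three_way_handshake(isnc, isns):
    """Return (flags, seq, ack) for the three handshake segments.

    isnc: initial sequence number chosen by the client
    isns: initial sequence number chosen by the server
    """
    syn = ("SYN", isnc, None)                           # client -> server
    syn_ack = ("SYN+ACK", isns, (isnc + 1) % MOD)       # server -> client
    ack = ("ACK", (isnc + 1) % MOD, (isns + 1) % MOD)   # client -> server
    return [syn, syn_ack, ack]


# Hypothetical initial sequence numbers, for illustration only.
segments = three_way_handshake(isnc=1000, isns=5000)
for flags, seq, ack in segments:
    print(flags, seq, ack)
```

The modulo operation reflects the fact that an initial sequence number near the top of the 32-bit range wraps to zero when incremented.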
From then on, the client and server exchange data over this TCP connection, which constitutes a complete session.
A session can be uniquely identified by its source IP address, source port number, destination IP address and destination port number; tracking the progress of each session additionally requires two values: the sequence number (SEQ) and the acknowledgment number (ACK). The source and destination IP addresses are in the header of the IP datagram, while the source port number, destination port number, sequence number and acknowledgment number are in the header of the TCP message.
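One way to picture this identification scheme is a table keyed by the four-tuple, with the current sequence and acknowledgment numbers as the value. The sketch below is illustrative only; the names `SessionKey`, `SessionState` and `lookup`, and the addresses (taken from documentation ranges), are assumptions rather than anything defined in the patent.

```python
from typing import NamedTuple


class SessionKey(NamedTuple):
    """The four values that uniquely identify a TCP session."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


class SessionState(NamedTuple):
    """The two values that track progress within a session."""
    seq: int
    ack: int


sessions = {}

# Hypothetical client talking to port 110 (POP3) of a front-end node.
key = SessionKey("192.0.2.10", 40000, "198.51.100.1", 110)
sessions[key] = SessionState(seq=1001, ack=5001)


def lookup(src_ip, src_port, dst_ip, dst_port):
    """Return the state of a session, or None if it is unknown."""
    return sessions.get(SessionKey(src_ip, src_port, dst_ip, dst_port))
```

Changing any one of the four fields, for instance the source port, yields a different key and therefore a different session.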
Compared with transport-layer forwarding, layer-7 forwarding has one critical drawback: very low efficiency. It copies the transmitted application data from kernel space to user space and, without modifying it in any way, copies it back to kernel space, incurring the cost of the corresponding context switches as well. FreeBSD provides a divert socket mechanism that allows TCP and User Datagram Protocol (UDP) messages to be processed directly at user level; this simplifies application-layer forwarding, but the cost of moving application data in and out of kernel space remains unavoidable.
In fact, the two TCP connections that the forwarding node establishes with the client and with the server are not unrelated. Once both connections are established, the source address, source port number, destination address and destination port number of each can be obtained in kernel space, and a fixed correspondence also exists between the sequence numbers of the requests and of the replies. Let SEQ_{C-D}, SEQ_{D-S}, SEQ_{S-D} and SEQ_{D-C} denote the sequence numbers in messages from client to forwarding node, forwarding node to server, server to forwarding node, and forwarding node to client, respectively; let ACK_{C-D}, ACK_{D-S}, ACK_{S-D} and ACK_{D-C} denote the acknowledgment numbers in messages in the same four directions; and define:
Δ_R = SEQ_{D-S} - SEQ_{C-D} = ACK_{D-S} - ACK_{C-D}
Δ_A = SEQ_{D-C} - SEQ_{S-D} = ACK_{D-C} - ACK_{S-D}
After forwarding the first request and response between the client and the server, the forwarding node can compute these two values, and they remain constant for the lifetime of the session. From then on, the forwarding node can compute the corresponding SEQ_{D-S} and SEQ_{D-C} from the SEQ_{C-D} in each client message and the SEQ_{S-D} in each server message it receives.
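The sequence-number translation described above can be sketched with modular arithmetic. This is a hedged illustration covering only the SEQ fields; the helper names `delta` and `translate` and all numeric values are assumptions made for the example, and acknowledgment-number handling is left out.

```python
MOD = 2**32  # sequence numbers are 32-bit and wrap around


def delta(seq_after, seq_before):
    """Difference between the sequence number after and before forwarding."""
    return (seq_after - seq_before) % MOD


def translate(seq, d):
    """Apply a stored difference to the sequence number of a later message."""
    return (seq + d) % MOD


# Values observed while forwarding the first request and response
# (hypothetical numbers).
delta_r = delta(seq_after=7001, seq_before=1001)  # SEQ_{D-S} - SEQ_{C-D}
delta_a = delta(seq_after=9001, seq_before=5001)  # SEQ_{D-C} - SEQ_{S-D}

# Later messages in the same session reuse the stored differences.
seq_d_s = translate(1101, delta_r)  # from SEQ_{C-D} of a client message
seq_d_c = translate(5301, delta_a)  # from SEQ_{S-D} of a server message
```

Working modulo 2^32 keeps the translation valid when either connection wraps past the top of the sequence space during the session.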
Figure 2 is a schematic diagram of message forwarding using the in-kernel socket docking mechanism. The process by which a client completes one request via the forwarding node is as follows:
1. The client sends the first SYN message to the forwarding node, containing the 32-bit client IP address (SA), the 32-bit IP address of the forwarding node (DA), the 16-bit client port number (SP), the 16-bit forwarding node port number (DP), and the initial sequence number (ISNC);
2. After receiving the SYN message at the transport layer, the forwarding node creates a record holding SA, DA, SP, DP, ISNC and related values, and passes the message up to the application-layer program that is waiting to receive. The application-layer program generates an initial sequence number (ISND) and sends a connection acknowledgment to the client, in which ACK=ISNC+1 and SEQ=ISND;
3. The client begins sending request data to the forwarding node, with SEQ=ISNC+1 and ACK=ISND+1;
4. After the application-layer program on the forwarding node receives the data, it analyzes the content of the request and determines the destination server node;
5. The forwarding node repeats steps 1 to 3 to establish a connection with the service node that actually provides the service, and sends the request data to that service node;
6. After the service node receives the request, it sends the response data to the forwarding node;
7. After the forwarding node receives the response data from the service node, it reassembles the response data and sends it to the client; at the same time it looks up the data in the conversion mapping table, computes the sequence-number differences Δ_R and Δ_A between the messages before and after forwarding, and completes the construction of the conversion mapping table;
8. Once the application-layer program on the forwarding node has determined that two-way forwarding can proceed, it notifies the kernel through an input/output control (ioctl) system call to merge the two sockets, and at the same time releases its control over them;
9. For subsequent data, according to the mechanism shown in Figure 2, the corresponding fields of the TCP header and IP datagram header are modified in the kernel and the data is forwarded directly; the forwarded data no longer enters or leaves user space, which greatly improves forwarding efficiency;
10. If either side of the connection is interrupted, the whole connection is released.
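The header rewriting of step 9 can be simulated in user-space Python to show what the conversion mapping table has to contain. This is a sketch under stated assumptions, not kernel code: `Packet`, `Rewrite`, `forward`, the addresses and the delta values are all hypothetical, and only the sequence number is translated (acknowledgment numbers and checksums are omitted).

```python
from dataclasses import dataclass, replace

MOD = 2**32


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    seq: int
    payload: bytes = b""


@dataclass(frozen=True)
class Rewrite:
    """One conversion mapping table entry: new addresses plus a SEQ delta."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    seq_delta: int


def forward(packet, table):
    """Rewrite the headers of a packet; return None if no entry matches."""
    rule = table.get((packet.src_ip, packet.src_port,
                      packet.dst_ip, packet.dst_port))
    if rule is None:
        return None  # not a docked session: hand over to the application layer
    return replace(packet,
                   src_ip=rule.src_ip, src_port=rule.src_port,
                   dst_ip=rule.dst_ip, dst_port=rule.dst_port,
                   seq=(packet.seq + rule.seq_delta) % MOD)


# Hypothetical table: one entry per direction of the docked session.
table = {
    ("192.0.2.10", 40000, "198.51.100.1", 110):
        Rewrite("10.0.0.1", 50000, "10.0.0.2", 110, 6000),     # client -> server
    ("10.0.0.2", 110, "10.0.0.1", 50000):
        Rewrite("198.51.100.1", 110, "192.0.2.10", 40000, 4000),  # server -> client
}

request = Packet("192.0.2.10", 40000, "198.51.100.1", 110, 1101, b"RETR 1\r\n")
to_server = forward(request, table)
```

The payload passes through untouched; only the addressing fields and the sequence number change, which is why the data never needs to be copied to user space.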
Summary of the invention
The main purpose of the present invention is to provide a layer-7 load balancing method based on in-kernel socket docking, which reduces the system overhead that a front-end node or server incurs when forwarding data, namely the copying of data between kernel space and user space and the corresponding context switches, thereby lowering the load on the front-end node and shortening the response time of user requests.
The object of the present invention is achieved as follows:
A layer-7 load balancing method based on in-kernel socket docking comprises at least:
Step 10: the front-end node receives a request from the client;
Step 20: the source address, destination address and port numbers of the request message are modified, and the message is sent directly to the service node;
Step 30: the response of the service node is received;
Step 40: the source address, destination address and port numbers of the response message are modified, and the message is sent directly to the client;
Step 50: if the request is complete, the two sockets are reclaimed and the conversion mapping table is updated; otherwise, step 10 is executed.
Before step 20 above, the method further comprises:
Step 11: if the front-end node is receiving the request of the client for the first time, step 12 is executed; otherwise, step 20 is executed;
Step 12: the request data is copied from kernel space to user space and handed to the application-layer program, which analyzes the request data and, according to the load balancing policy and the state of the service nodes, assigns the service request to the corresponding service node;
Step 13: the application-layer program establishes a socket connection with the selected service node, re-encapsulates the request data of the user, and forwards it to the selected service node;
Step 14: the service node processes the user request and sends a response message to the front-end node;
Step 15: the application-layer program on the front-end node receives the response message, builds the conversion mapping table, notifies the kernel to merge the two sockets, and at the same time relinquishes control over the two sockets; step 50 is then executed.
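The control flow of steps 10 to 50, including the first-request branch of steps 11 to 15, can be sketched as a small simulation. Everything here is hypothetical scaffolding: the class `FrontNode`, its counter `user_space_copies`, and the string standing in for a response are assumptions used only to show that the application layer is involved once per session.

```python
class FrontNode:
    """Toy model of the front-end node: slow path once, fast path afterwards."""

    def __init__(self, choose_node):
        self.choose_node = choose_node  # load balancing policy used in step 12
        self.mapping = {}               # conversion mapping table: client -> node
        self.user_space_copies = 0      # requests handled by the application layer

    def handle(self, client, request):
        node = self.mapping.get(client)
        if node is None:                      # step 11: first request of a session
            self.user_space_copies += 1       # step 12: copy to user space, analyze
            node = self.choose_node(request)  # steps 12-13: pick and contact a node
            self.mapping[client] = node       # step 15: build table, merge sockets
        # Steps 20-40 (header rewriting and forwarding) are simulated by a string.
        return f"{node} handled {request!r}"

    def finish(self, client):
        """Step 50: reclaim the sockets and update the conversion mapping table."""
        self.mapping.pop(client, None)


front = FrontNode(choose_node=lambda request: "node-A")
client = ("192.0.2.10", 40000)
first = front.handle(client, "STAT")
second = front.handle(client, "LIST")
```

In this model the second request finds a record in the mapping table and never touches the policy function, which mirrors the branch taken at step 11.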
The load balancing policy in step 12 is at least one of the following:
Round-robin algorithm;
Or weighted round-robin algorithm;
Or least-connections algorithm;
Or weighted least-connections algorithm;
Or request-location-based least-connections algorithm, that is: requests with the same client IP are sent to the same service node for processing;
Or pre-allocated task method, that is: the most lightly loaded node is selected according to the tasks each node is carrying;
Or weighted pre-allocated task method, that is: the most lightly loaded node is selected by jointly considering the tasks each node is carrying and the performance of the node;
Or client IP address partitioning method, that is: the IP addresses of different clients are divided into several zones, and all requests from one zone are dispatched to one designated node.
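Three of the listed policies can be sketched in a few lines each. These are illustrative implementations under simple assumptions (static node lists, connection counts supplied by the caller); the function names and sample data are hypothetical and the patent does not prescribe any particular implementation.

```python
import ipaddress
import itertools


def round_robin(nodes):
    """Round-robin: return a function that yields nodes in turn."""
    cycle = itertools.cycle(nodes)
    return lambda: next(cycle)


def least_connections(active, weights=None):
    """(Weighted) least connections.

    active:  dict mapping node -> number of active connections
    weights: optional dict mapping node -> relative capacity (default 1)
    """
    weights = weights or {}
    return min(active, key=lambda n: active[n] / weights.get(n, 1))


def ip_partition(client_ip, zones):
    """Client IP address partitioning: the first matching zone wins."""
    addr = ipaddress.ip_address(client_ip)
    for network, node in zones:
        if addr in ipaddress.ip_network(network):
            return node
    return None


pick = round_robin(["a", "b", "c"])
order = [pick() for _ in range(4)]
```

With weights, a node of capacity 2 carrying 4 connections counts as lighter than a node of capacity 1 carrying 3, which is the intent of the weighted variant.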
The layer-7 load balancing method based on in-kernel socket docking provided by the present invention combines the advantages of transport-layer forwarding and application-layer forwarding: it improves message forwarding efficiency while retaining great flexibility, so that client requests are distributed over the service nodes more quickly and more evenly.
The mechanism can also be used to design general-purpose middleware that implements message forwarding on all ports; to perform complex analysis of messages and apply special filtering rules, implementing a full-featured firewall; and even to migrate existing TCP connections dynamically.
Description of drawings
Fig. 1 shows the prior-art process of handling a POP3 service request in a cluster with an application-layer program;
Fig. 2 is a schematic diagram of message forwarding using the in-kernel socket docking mechanism of the present invention;
Fig. 3 is a flowchart of layer-7 load balancing implemented with the in-kernel socket docking technique;
Fig. 4 is a flowchart of the communication process after the sockets are docked.
Embodiment
The present invention is further described below with reference to a specific embodiment:
Referring to Fig. 3, the method of the present invention first accepts the connection request of the client and completes initialization, then receives the data request of the client. Based on this request, it checks whether a corresponding record exists in the conversion mapping table. If a record exists, the source address, destination address, port numbers and related fields of the request message are modified and the message is forwarded directly to the service node; the response of the service node is then received, the source address, destination address, port numbers and related fields of the response message are modified, and the message is forwarded directly to the client. It is then determined whether the request is complete. If not, the method returns to receiving the data request of the client and continues the loop; if so, the two sockets are reclaimed, the conversion mapping table is updated, and the process ends.
If no record exists in the conversion mapping table, the request is handed to the application-layer program, which selects a service node according to the load balancing policy, establishes a connection with that service node, and sends the data request. It then receives the response of the service node and collects related information such as load. After the response has been received, the kernel is notified to merge the two sockets and the response message is forwarded to the client. It is then determined whether the request is complete. If not, the method returns to receiving the data request of the client and continues the loop; if so, the two sockets are reclaimed, the conversion mapping table is updated, and the process ends.
As shown in Fig. 3, once the application program on the front-end node has received the response message from mail server Mail Server1, it can determine that messages may be forwarded between the client and the service node in both directions, so it notifies the kernel to merge the two sockets. From then on, the front-end node forwards the communication between the client and mail server Mail Server1 entirely in kernel space. After the two sockets are merged, the communication between the client and the service node proceeds as shown in Fig. 4, which significantly reduces transmission time and greatly improves efficiency.
Finally, it should be noted that the above embodiment is intended only to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the invention may still be modified or equivalently substituted, and that any modification or partial substitution that does not depart from the spirit and scope of the invention shall fall within the scope of its claims.

Claims (3)

1. A layer-7 load balancing method based on in-kernel socket docking, characterized in that it comprises at least:
Step 10: the front-end node receives a request from the client;
Step 20: the source address, destination address and port numbers of the request message are modified, and the message is sent directly to the service node;
Step 30: the response of the service node is received;
Step 40: the source address, destination address and port numbers of the response message are modified, and the message is sent directly to the client;
Step 50: if the request is complete, the two sockets are reclaimed and the conversion mapping table is updated; otherwise, step 10 is executed.
2. The layer-7 load balancing method based on in-kernel socket docking according to claim 1, characterized in that, before step 20, it further comprises:
Step 11: if the front-end node is receiving the request of the client for the first time, step 12 is executed; otherwise, step 20 is executed;
Step 12: the request data is copied from kernel space to user space and handed to the application-layer program, which analyzes the request data and, according to the load balancing policy and the state of the service nodes, assigns the service request to the corresponding service node;
Step 13: the application-layer program establishes a socket connection with the selected service node, re-encapsulates the request data of the user, and forwards it to the selected service node;
Step 14: the service node processes the user request and sends a response message to the front-end node;
Step 15: the application-layer program on the front-end node receives the response message, builds the conversion mapping table, notifies the kernel to merge the two sockets, and at the same time relinquishes control over the two sockets; step 50 is then executed.
3. The layer-7 load balancing method based on in-kernel socket docking according to claim 2, characterized in that the load balancing policy is at least one of the following: round-robin algorithm; or weighted round-robin algorithm; or least-connections algorithm; or weighted least-connections algorithm; or request-location-based least-connections algorithm, that is: requests with the same client IP are sent to the same service node for processing; or pre-allocated task method, that is: the most lightly loaded node is selected according to the tasks each node is carrying; or weighted pre-allocated task method, that is: the most lightly loaded node is selected by jointly considering the tasks each node is carrying and the performance of the node; or client IP address partitioning method, that is: the IP addresses of different clients are divided into several zones, and all requests from one zone are dispatched to one designated node.
CN 02159493 2002-12-31 2002-12-31 No.7 layer load equalization method based on socket butt joint in kernel Expired - Fee Related CN1272724C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 02159493 CN1272724C (en) 2002-12-31 2002-12-31 No.7 layer load equalization method based on socket butt joint in kernel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 02159493 CN1272724C (en) 2002-12-31 2002-12-31 No.7 layer load equalization method based on socket butt joint in kernel

Publications (2)

Publication Number Publication Date
CN1512377A CN1512377A (en) 2004-07-14
CN1272724C true CN1272724C (en) 2006-08-30

Family

ID=34237501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 02159493 Expired - Fee Related CN1272724C (en) 2002-12-31 2002-12-31 No.7 layer load equalization method based on socket butt joint in kernel

Country Status (1)

Country Link
CN (1) CN1272724C (en)

Also Published As

Publication number Publication date
CN1512377A (en) 2004-07-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20060830

Termination date: 20201231