Summary of the invention
The present invention proposes a data center network system. The system reduces the networking cost of a data center and at the same time solves the problem of insufficient core-layer link bandwidth. In this system the core layer does not need expensive high-speed equipment; like the access layer, it uses cost-effective general-purpose switches. For the topology of this network system, we define a network addressing rule and propose a fast routing method.
The present invention adopts the following technical scheme. The system includes two kinds of equipment, switches and servers. The switches comprise row-head switches, column-head switches, and access switches arranged in a matrix. At least one row-head switch is deployed at the head of each row of the matrix, and at least one column-head switch at the head of each column. Servers are connected to access switches, and each access switch is connected to all row-head switches of its row and all column-head switches of its column. Each row-head switch is connected to all access switches in its row, and each column-head switch is connected to all access switches in its column. There is no direct link between any row-head switch and any column-head switch, nor between any two access switches. We call this connection pattern the switch matrix topology.
The switches and servers use internal network IP addresses, assigned according to the following rules: the IP address of a row-head switch is 10.Row.0.X; the IP address of a column-head switch is 10.0.Col.X; the IP address of an access switch is 10.Row.Col.1; the IP address of a server is 10.Row.Col.X. Here Row is the number of the row in which the switch or server is located, and Col is the number of its column. For row-head and column-head switches, 0<X≤255; for servers, 1<X≤255. The subnet mask of all devices is set to 255.255.255.0.
Compared with existing data center networks, the present invention has the following advantages:
1) Compared with existing network architectures based on Clos networks, fat-tree networks, or server-centric designs, the switch matrix topology proposed by the invention achieves the same bisection bandwidth, and the oversubscription ratio of each layer of routing/switching equipment can likewise reach 1:1.
2) The switch matrix topology proposed by the invention is simpler and clearer, which makes deployment of network equipment and cabling more convenient. A large data center network contains a large number of network devices connected by a large number of cables, so a simple and clear topology greatly eases network implementation and maintenance.
3) The invention imposes no strict ordering on the connections between device ports, which also simplifies network maintenance.
4) The invention scales well. The network can be expanded by adding routing/switching equipment without adjusting the existing topology, and the supported network size ranges from tens of servers to tens of thousands of servers.
5) The invention is flexible. By adjusting the number of row-head and column-head switches according to application needs, the oversubscription ratio between access switches and row-head switches, and between access switches and column-head switches, can be changed freely.
6) The routing/switching algorithm adopted by the invention is simple and easy to understand and implement. In addition, the routing algorithm supports equal-cost multipath routing (ECMP) and therefore provides load balancing.
Embodiment
To make the above objects, features, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments.
In the data center network system provided by the invention, the network uses a regular topology (the switch matrix topology) and is addressed according to fixed rules. The system consists of a group of row-head switches (10.Row.0.X), column-head switches (10.0.Col.X), access switches (10.Row.Col.1), and the servers (10.Row.Col.X) attached to the access switches. Here Row is the number of the row in which the switch or server is located, and Col is the number of its column. For row-head and column-head switches, 0<X≤255; for servers, 1<X≤255. The subnet mask of all devices is set to 255.255.255.0.
Communication between servers is carried out by the route forwarding functions of the access switches, row-head switches, and column-head switches. A row-head switch connects all access switches of its row, and a column-head switch connects all access switches of its column. Each access switch is connected to all row-head switches of its row and all column-head switches of its column. There is no direct connection between row-head switches and column-head switches, nor between access switches. Each server is connected to an access switch. To perform route forwarding, every row-head switch, column-head switch, and access switch maintains a routing table, and packets are forwarded according to it. Connected switches learn their mutual connection relationships by exchanging information (their own IP address and MAC address) and generate their routing tables from those relationships.
1. Switch matrix topology
In the present embodiment, the data center network system is composed of two kinds of equipment, switches and servers. Switches provide layer-2 (link layer) and layer-3 (network layer) switching functions, and servers provide computing and storage services. Switches are further divided into three types: row-head switches, column-head switches, and access switches. Row-head and column-head switches belong to the network core layer, have layer-3 switching/routing capability, and are responsible for connecting access switches together. Access switches belong to the network access layer, have layer-2 switching and layer-3 switching/routing capability, and are responsible for connecting servers to the network. The row-head, column-head, and access switches in the present embodiment can all be cost-effective general-purpose switches. Connected together, they form the switch matrix topology.
The switch matrix topology proposed by the invention preferably requires that row-head switches, column-head switches, and access switches have the same number of ports participating in switching/routing (active ports); let this number be N (N>3). The ports of an access switch are divided into three parts: the first part connects servers, the second part connects row-head switches, and the remaining part connects column-head switches. A complete topology has N rows × N columns of access switches, with multiple row-head switches deployed at the head of each row and multiple column-head switches at the head of each column. Every access switch must connect to all row-head switches of its row and all column-head switches of its column. The invention allows any port to be used for the connections between servers and access switches, and between access switches and their row-head or column-head switches. There is one independent link between each access switch and each row-head switch of its row, and likewise one independent link between each access switch and each column-head switch of its column. Fig. 1 shows an example switch matrix with switch port count N=6. For clarity, servers are not drawn. A dashed horizontal line represents the connections between all access switches in a row and all row-head switches of that row, and a dashed vertical line represents the connections between all access switches in a column and all column-head switches of that column.
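As a rough illustration of the connection rules above, the following Python sketch enumerates the switch-to-switch links of a switch matrix. The function name `build_links`, the tuple node labels, and the parameters `rows`, `cols`, `heads_per_row`, and `heads_per_col` are hypothetical names chosen for this example and are not part of the invention.

```python
from itertools import product

def build_links(rows, cols, heads_per_row, heads_per_col):
    """Enumerate switch-to-switch links of a switch matrix (servers omitted)."""
    links = set()
    for r, c in product(range(1, rows + 1), range(1, cols + 1)):
        access = ("access", r, c)
        # One independent link to every row-head switch of row r.
        for i in range(1, heads_per_row + 1):
            links.add((access, ("row_head", r, i)))
        # One independent link to every column-head switch of column c.
        for j in range(1, heads_per_col + 1):
            links.add((access, ("col_head", c, j)))
    # By construction there are no head-to-head or access-to-access links.
    return links

# Assumed example: N = 6 with two head switches per row and per column.
links = build_links(rows=6, cols=6, heads_per_row=2, heads_per_col=2)
print(len(links))
```

Every link in the result has an access switch at one end and a head switch at the other, which mirrors the rule that head switches and access switches are never connected among themselves.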
The ports of an access switch can be allocated according to actual needs. A typical allocation divides the ports into three equal parts: 1/3 of the ports connect row-head switches, 1/3 connect column-head switches, and the remaining 1/3 connect servers. The number of row-head switches per row and of column-head switches per column is then N/3. This configuration guarantees that the oversubscription ratio of each layer of equipment reaches 1:1. If the traffic that row-head or column-head switches must forward is light, the number of row-head and column-head switches can be reduced to lower the networking cost. For example, 1/2 of the access switch ports can be assigned to servers, 1/4 to row-head switches, and the remaining 1/4 to column-head switches. The number of row-head switches per row and of column-head switches per column is then reduced to N/4, and the oversubscription ratio of the row-head and column-head switches is 1:2.
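The arithmetic behind these allocations can be sketched in a few lines of Python. The function `port_plan` and its result keys are hypothetical. The sketch assumes the ratio quoted in the text is the number of row-head (or column-head) uplinks per server port, which is an interpretation rather than something the text states explicitly.

```python
from fractions import Fraction

def port_plan(n_ports, server_share, row_share, col_share):
    """Split the N active ports of an access switch among the three roles."""
    shares = [Fraction(s) for s in (server_share, row_share, col_share)]
    if sum(shares) != 1:
        raise ValueError("the three shares must sum to 1")
    counts = [n_ports * s for s in shares]
    if any(c.denominator != 1 for c in counts):
        raise ValueError("shares must divide the port count evenly")
    server_ports, row_uplinks, col_uplinks = (int(c) for c in counts)
    return {
        "server_ports": server_ports,
        # One uplink per head switch, so uplink count equals head-switch count.
        "row_heads_per_row": row_uplinks,
        "col_heads_per_col": col_uplinks,
        # Assumed reading of the ratio in the text: uplinks per server port.
        "row_uplinks_per_server_port": Fraction(row_uplinks, server_ports),
    }

print(port_plan(12, "1/3", "1/3", "1/3"))
print(port_plan(12, "1/2", "1/4", "1/4"))
```

With N=12, the first call corresponds to the equal three-way split and the second to the reduced-cost split described above.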
The above describes a complete switch matrix topology. In some cases the network topology can be adjusted to actual needs. For example, a data center with a small network and few servers can build an incomplete switch matrix network. A complete switch matrix network has N rows × N columns of access switches. If the number of servers does not reach N^3/3 (N × N access switches, each with N/3 server ports under the one-third allocation), the number of access switches can be reduced in right-to-left, bottom-to-top order. For incomplete rows or columns, the number of row-head or column-head switches can be reduced in proportion. The surplus idle ports on the row-head and column-head switches are merged onto the other ports by link aggregation (trunking). For example, if the switch port count is 12, the largest network topology that can be formed is 12 rows × 12 columns, and the number of row-head switches per row and of column-head switches per column is 12/3=4. In an incomplete topology, for example one with only 6 rows × 12 columns of access switches, the number of column-head switches can be halved to 2. The column-head switches are then connected as follows: each column-head switch first uses 6 ports to connect the 6 access switches in its column, and the remaining ports are then aggregated onto these links evenly, one by one in order.
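The sizing figures in this passage can be checked with a short Python sketch. Both function names are hypothetical. `max_servers` assumes the one-third port allocation, and `trunk_widths` assumes that any ports left over after an even split go to the first access switches in order, which is one possible reading of "one by one in order".

```python
def max_servers(n_ports):
    """Servers in a complete N x N matrix, assuming N/3 server ports per access switch."""
    return n_ports * n_ports * (n_ports // 3)

def trunk_widths(head_ports, n_access):
    """Ports of one head switch aggregated toward each access switch in its row/column."""
    base, extra = divmod(head_ports, n_access)
    # Assumption: leftover ports are handed out to the first switches in order.
    return [base + (1 if k < extra else 0) for k in range(n_access)]

print(max_servers(12))
print(trunk_widths(12, 6))
```

For the 12-port example with 6 rows, each column-head switch ends up with a two-port aggregated link to every access switch in its column.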
2. Network addressing scheme
The switches and servers in this data center network system are addressed with internal network IP addresses 10.X.X.X (0<X≤255). When communication with an external network is needed, network address translation (NAT) is used to convert them to external network addresses.
The IP address of a row-head switch is 10.Row.0.X, where Row is the number of the row in which the row-head switch is located, 0<Row≤N, and 0<X≤255. Within this range the address of a row-head switch can be configured freely (X can be chosen arbitrarily).
The IP address of a column-head switch is 10.0.Col.X, where Col is the number of the column in which the column-head switch is located, 0<Col≤N, and 0<X≤255. Within this range the address of a column-head switch can be configured freely (X can be chosen arbitrarily).
The IP address of an access switch is 10.Row.Col.1, where Row is the row number and Col is the column number of the access switch, 0<Row≤N, 0<Col≤N.
The IP address of a server is 10.Row.Col.X, where Row is the row number and Col is the column number of the server, 0<Row≤N, 0<Col≤N, and 1<X≤255. Within this range the address of a server can be configured freely (X can be chosen arbitrarily).
With this addressing scheme, the type of a device and its position in the network can be determined from its IP address, which helps to determine device connection relationships and simplifies the routing scheme. From the addressing scheme and the connection relationships it follows that each access switch and all servers connected to it form one physical subnet with subnet mask 255.255.255.0. Although the row-head switches of the same row, or the column-head switches of the same column, share the same network address prefix, they have no direct connection to one another.
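The way device type and position can be read off an address is illustrated by the following Python sketch. The function `classify` and the labels it returns are hypothetical names used only for this example; the octet rules follow the addressing scheme above.

```python
import ipaddress

def classify(ip):
    """Return (device_type, row, col) for an address of the form 10.a.b.x."""
    first, row, col, host = ipaddress.IPv4Address(ip).packed
    if first != 10 or host == 0 or (row == 0 and col == 0):
        raise ValueError(f"{ip} does not fit the addressing scheme")
    if col == 0:
        return ("row_head", row, None)   # 10.Row.0.X
    if row == 0:
        return ("col_head", None, col)   # 10.0.Col.X
    if host == 1:
        return ("access", row, col)      # 10.Row.Col.1
    return ("server", row, col)          # 10.Row.Col.X with X > 1

for addr in ("10.2.0.7", "10.0.3.1", "10.4.4.1", "10.4.4.2"):
    print(addr, classify(addr))
```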
3. Routing method
In the switch matrix network proposed in this embodiment, rows and columns form a symmetric structure. Communication between servers in different rows and columns can therefore pass first through a row-head switch and then through a column-head switch, or first through a column-head switch and then through a row-head switch. For example, if device 10.2.2.X communicates with 10.4.4.X, the path that passes through the column-head switch first is as follows:
10.2.2.X→10.2.2.1→10.0.2.X→10.4.2.1→10.4.0.X→10.4.4.1→10.4.4.X
The path that passes through the row-head switch first is as follows:
10.2.2.X→10.2.2.1→10.2.0.X→10.2.4.1→10.0.4.X→10.4.4.1→10.4.4.X
The present embodiment stipulates that communication between devices in the same row is forwarded only by row-head switches, and communication between devices in the same column is forwarded only by column-head switches. Communication between devices in different rows and columns takes the path that passes through the column-head switch first.
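The path rule stipulated here can be written down as a small Python sketch. The function `plan_path` is a hypothetical helper; it keeps the literal `X` as a placeholder for whichever head switch is chosen and does not model routing tables.

```python
def plan_path(src, dst):
    """Hop sequence between two servers under the stipulated path rule."""
    _, sr, sc, _ = (int(p) for p in src.split("."))
    _, dr, dc, _ = (int(p) for p in dst.split("."))

    def access(r, c):
        return f"10.{r}.{c}.1"

    path = [src, access(sr, sc)]
    if (sr, sc) != (dr, dc):
        if sc == dc:
            # Same column: forwarded only by a column-head switch.
            path += [f"10.0.{sc}.X", access(dr, dc)]
        elif sr == dr:
            # Same row: forwarded only by a row-head switch.
            path += [f"10.{sr}.0.X", access(dr, dc)]
        else:
            # Different row and column: column-head switch first, then row-head.
            path += [f"10.0.{sc}.X", access(dr, sc),
                     f"10.{dr}.0.X", access(dr, dc)]
    path.append(dst)
    return path

print("→".join(plan_path("10.2.2.2", "10.4.4.2")))
```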
3.1 Structure of the routing table
The data center network system proposed by the invention has a regular topology, so the routing method can be kept very simple. For flexibility and extensibility, the invention adopts a routing method based on routing tables. The structure of the routing table is as follows:
Target subnet | Subnet mask | Next-hop IP address | Next-hop MAC address | Outbound port | Timestamp
10.1.0.0 | 255.255.0.0 | 10.1.0.1 | XX-XX-XX-XX-XX-XX | 1 |
10.2.0.0 | 255.255.0.0 | 10.2.0.1 | XX-XX-XX-XX-XX-XX | 2 |
Notes:
(1) The timestamp records the creation or update time of the routing table entry.
(2) In the routing tables described below, the next-hop IP address, next-hop MAC address, and timestamp are omitted.
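A routing table entry with these six fields might be represented as in the following Python sketch. The class `RouteEntry`, its field names, and the `matches` method are hypothetical. The mask comparison is done bitwise so that it also works for masks whose set bits are not contiguous.

```python
from dataclasses import dataclass
import ipaddress

def to_int(addr):
    """Convert a dotted-quad IPv4 address to an integer."""
    return int(ipaddress.IPv4Address(addr))

@dataclass
class RouteEntry:
    target_subnet: str
    subnet_mask: str
    next_hop_ip: str
    next_hop_mac: str
    out_port: int
    timestamp: float = 0.0  # creation or last update time

    def matches(self, dst):
        """True if dst falls in the target subnet under the (possibly non-contiguous) mask."""
        mask = to_int(self.subnet_mask)
        return (to_int(dst) & mask) == (to_int(self.target_subnet) & mask)

entry = RouteEntry("10.1.0.0", "255.255.0.0", "10.1.0.1", "XX-XX-XX-XX-XX-XX", 1)
print(entry.matches("10.1.5.2"), entry.matches("10.2.5.2"))
```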
Routing table of row-head switch 10.Row.0.X (using non-contiguous subnet masks)
Routing table of column-head switch 10.0.Col.X
Routing table of access switch 10.Row.Col.1
An access switch follows three route forwarding rules: 1) traffic within its own subnet is forwarded by traditional layer-2 switching, which is not described here; 2) traffic whose destination is a server in the same row (but in a different subnet) is forwarded to a row-head switch; 3) traffic whose destination is a server in a different row (in a different subnet) is forwarded to a column-head switch. Cases 2) and 3) require layer-3 route forwarding, and the routing table is as follows:
Notes:
1) 10.Row.0.X_i is the IP address of the i-th row-head switch of row Row (1≤i≤K, where K is the number of row-head switches in row Row), and 10.0.Col.X_j is the IP address of the j-th column-head switch of column Col (1≤j≤M, where M is the number of column-head switches in column Col).
2) For the same target subnet, the routing table of an access switch contains multiple equal-cost routes. The invention uses equal-cost multipath routing (ECMP) to select one path at random from these equal-cost paths.
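Random selection among equal-cost entries can be sketched as follows. The function `pick_next_hop`, the dictionary keys, and the sample head-switch addresses are hypothetical. The sketch follows the random choice described in the text and does not model per-flow hashing or any other selection policy.

```python
import random

def pick_next_hop(entries, target_subnet, rng=random):
    """Pick one entry at random among those that share the same target subnet."""
    candidates = [e for e in entries if e["target"] == target_subnet]
    if not candidates:
        return None
    return rng.choice(candidates)

# Assumed example: an access switch in row 2 with two row-head switches.
table = [
    {"target": "10.2.0.0", "mask": "255.255.0.0", "next_hop": "10.2.0.1", "port": 1},
    {"target": "10.2.0.0", "mask": "255.255.0.0", "next_hop": "10.2.0.2", "port": 2},
    {"target": "10.0.0.0", "mask": "255.0.0.0", "next_hop": "10.0.3.1", "port": 3},
]
print(pick_next_hop(table, "10.2.0.0")["next_hop"])
```

Over many packets the two row-head switches receive roughly equal shares of the traffic, which is the load-balancing effect the text attributes to ECMP.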
3.2 Routing table construction method
The routing tables can be constructed easily by automatically learning the connection relationships between row-head or column-head switches and access switches. To learn these relationships, every switch periodically sends a PDU (protocol data unit) on all of its active ports; the PDU contains the switch's own IP address and MAC address. For any switch, each port corresponds to at most one routing table entry, so the number of routing table entries is at most the number of ports of the switch.
(1) Construction of the row-head switch routing table
Row-head switch 10.Row.0.X constructs its routing table according to the following rules:
If a PDU sent by 10.Row.Col.1 is received on port Port, add or update the following routing table entry:
10.0.Col.0/255.0.255.0/10.Row.Col.1/MAC address/Port/timestamp
If no update PDU is received within the specified time, delete the corresponding route entry (expired).
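A minimal Python sketch of this learning rule for a row-head switch follows. The names `on_pdu_row_head`, `expire`, and `max_age` are hypothetical, the table is modeled as a dictionary keyed by port (at most one entry per port), and the expiry interval is an assumed parameter because the text does not give a value.

```python
import time

def on_pdu_row_head(table, port, sender_ip, sender_mac, now=None):
    """Add or refresh the entry learned from an access switch PDU (10.Row.Col.1)."""
    now = time.time() if now is None else now
    _, _row, col, _ = (int(p) for p in sender_ip.split("."))
    table[port] = {
        "target": f"10.0.{col}.0",   # column of the sending access switch
        "mask": "255.0.255.0",       # non-contiguous mask: first and third octets
        "next_hop_ip": sender_ip,
        "next_hop_mac": sender_mac,
        "port": port,
        "timestamp": now,
    }

def expire(table, max_age, now=None):
    """Delete entries whose PDU has not been refreshed within max_age seconds."""
    now = time.time() if now is None else now
    for port in [p for p, e in table.items() if now - e["timestamp"] > max_age]:
        del table[port]

table = {}
on_pdu_row_head(table, 3, "10.2.5.1", "XX-XX-XX-XX-XX-XX", now=100.0)
print(table[3]["target"], table[3]["mask"])
```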
(2) Construction of the column-head switch routing table
Column-head switch 10.0.Col.X constructs its routing table according to the following rules:
If a PDU sent by 10.Row.Col.1 is received on port Port, add or update the following routing table entry:
10.Row.0.0/255.255.0.0/10.Row.Col.1/MAC address/Port/timestamp
If no update PDU is received within the specified time, delete the corresponding route entry (expired).
(3) Construction of the access switch routing table
Access switch 10.Row.Col.1 constructs its routing table according to the following rules:
A) If a PDU from a row-head switch 10.Row.0.X of its own row is received on port Port, add or update the following routing table entry:
10.Row.0.0/255.255.0.0/10.Row.0.X/MAC address/Port/timestamp
B) If a PDU from a column-head switch 10.0.Col.X of its own column is received on port Port, add or update the following routing table entry:
10.0.0.0/255.0.0.0/10.0.Col.X/MAC address/Port/timestamp
C) If no update PDU is received within the specified time, delete the corresponding route entry (expired).
Note: data communication within the same subnet is forwarded by traditional layer-2 switching; the construction of the layer-2 address forwarding table (AFT) is not described here.
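Rules A) and B) for an access switch can be sketched in Python as follows. The function `access_entry_from_pdu` is a hypothetical helper that returns the entry as a tuple in the field order target subnet, subnet mask, next-hop IP, next-hop MAC, port, timestamp. Ignoring PDUs from other senders is an assumption of the sketch.

```python
def access_entry_from_pdu(own_ip, port, sender_ip, sender_mac, now):
    """Derive the routing table entry an access switch learns from one PDU."""
    _, row, col, _ = (int(p) for p in own_ip.split("."))
    _, s_row, s_col, _ = (int(p) for p in sender_ip.split("."))
    if s_col == 0 and s_row == row:
        # Rule A: row-head switch of this row covers destinations in the same row.
        target, mask = f"10.{row}.0.0", "255.255.0.0"
    elif s_row == 0 and s_col == col:
        # Rule B: column-head switch of this column covers all other 10.x destinations.
        target, mask = "10.0.0.0", "255.0.0.0"
    else:
        # Assumption: PDUs from any other sender are ignored.
        return None
    return (target, mask, sender_ip, sender_mac, port, now)

print(access_entry_from_pdu("10.2.3.1", 5, "10.2.0.4", "XX-XX-XX-XX-XX-XX", 0.0))
print(access_entry_from_pdu("10.2.3.1", 6, "10.0.3.2", "XX-XX-XX-XX-XX-XX", 0.0))
```

Because a same-row destination matches both kinds of entry, a lookup has to prefer the more specific row entry; the text implies this through its forwarding rules but does not spell out the tie-break.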
3.3 Routing process examples
(1) Communication between devices in the same subnet. Suppose two servers have the IP addresses 10.1.1.2 (source) and 10.1.1.3 (destination). Their traffic is forwarded directly by access switch 10.1.1.1.
(2) Communication between devices in the same row. Suppose two servers have the IP addresses 10.1.3.2 and 10.1.5.2. A packet sent from 10.1.3.2 to 10.1.5.2 first goes to access switch 10.1.3.1; then, according to the routing table of each switch, the routing process is as follows:
10.1.3.2→10.1.3.1→10.1.0.X→10.1.5.1→10.1.5.2
(3) Communication between devices in the same column. Suppose two servers have the IP addresses 10.2.2.2 and 10.4.2.2. A packet sent from 10.2.2.2 to 10.4.2.2 first goes to access switch 10.2.2.1; then, according to the routing table of each switch, the routing process is as follows:
10.2.2.2→10.2.2.1→10.0.2.X→10.4.2.1→10.4.2.2
(4) Communication between devices in different rows and columns. Suppose two servers have the IP addresses 10.2.2.2 and 10.4.4.2. A packet sent from 10.2.2.2 to 10.4.4.2 first goes to access switch 10.2.2.1; then, according to the routing table of each switch, the routing process is as follows:
10.2.2.2→10.2.2.1→10.0.2.X→10.4.2.1→10.4.0.X→10.4.4.1→10.4.4.2
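The hop sequences above can be reproduced by a small table-driven simulation in Python. Everything in this sketch is an assumption made for illustration: the helper names, a single head switch per row and per column with X=1, an n × n matrix, and the tie-break that prefers the matching entry whose mask has the most set bits.

```python
import ipaddress

def to_int(addr):
    return int(ipaddress.IPv4Address(addr))

def build_tables(n):
    """Routing tables (target subnet, mask, next hop) for an n x n matrix."""
    cells = range(1, n + 1)
    tables = {}
    for r in cells:  # row-head switches route by column (non-contiguous mask)
        tables[f"10.{r}.0.1"] = [(f"10.0.{c}.0", "255.0.255.0", f"10.{r}.{c}.1") for c in cells]
    for c in cells:  # column-head switches route by row
        tables[f"10.0.{c}.1"] = [(f"10.{r}.0.0", "255.255.0.0", f"10.{r}.{c}.1") for r in cells]
    for r in cells:  # access switches: own row via row head, everything else via column head
        for c in cells:
            tables[f"10.{r}.{c}.1"] = [(f"10.{r}.0.0", "255.255.0.0", f"10.{r}.0.1"),
                                       ("10.0.0.0", "255.0.0.0", f"10.0.{c}.1")]
    return tables

def lookup(table, dst):
    """Next hop of the matching entry whose mask has the most set bits (assumed tie-break)."""
    hits = [(bin(to_int(mask)).count("1"), hop) for subnet, mask, hop in table
            if (to_int(dst) & to_int(mask)) == to_int(subnet)]
    return max(hits)[1]

def route(src, dst, n=6):
    tables = build_tables(n)
    octets = src.split(".")
    node = f"10.{octets[1]}.{octets[2]}.1"  # access switch of the source server
    path = [src, node]
    # Layer-3 forwarding until the access switch of the destination subnet is reached.
    while node.rsplit(".", 1)[0] != dst.rsplit(".", 1)[0]:
        node = lookup(tables[node], dst)
        path.append(node)
    path.append(dst)  # final delivery by layer-2 switching
    return path

print("→".join(route("10.2.2.2", "10.4.4.2")))
```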
The data center network system provided by the present invention has been described in detail above. Specific embodiments are used herein to explain the principles and implementation of the invention, and the description of these embodiments is intended only to help in understanding the method of the invention and its core idea. A person of ordinary skill in the art may, following the idea of the invention, make changes to the specific implementation and the scope of application. In summary, the content of this description should not be construed as limiting the invention.