CN102970150A - Extensible multicast forwarding method and device for data center (DC) - Google Patents
- Publication number
- CN102970150A CN2011102571687A CN201110257168A
- Authority
- CN
- China
- Prior art keywords
- multicast
- data center
- address
- index value
- address section
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Data Exchanges In Wide-Area Networks (AREA)
- Small-Scale Networks (AREA)
Abstract
The invention provides an extensible multicast forwarding method and device for a data center (DC). The multicast addresses of the servers in the DC are represented by addresses in a contiguous multicast address segment. The method includes: receiving an input multicast data packet and extracting the destination MAC address from its header; performing a logical AND of the extracted destination MAC address with a predetermined mask to obtain an index value, which implements the route lookup engine function; determining an output port sequence from the index value, which implements the forwarding engine function; and finally scheduling the line card through a switching matrix to forward the packet, that is, outputting the input multicast data packet from the ports indicated by the determined output port sequence. The device and method are simple in design and extensible.
Description
Technical field
The present invention relates to data centers and, more specifically, to an extensible multicast forwarding scheme for data centers.
Background
With the growth of the Internet and of applications such as cloud computing services, data centers keep growing in scale, and the number of machines in a data center keeps rising (on the order of tens of thousands). Compared with the Internet, data communication between servers inside a data center is dense and concentrated, and the network-layer topology is comparatively regular, so the network architecture inside the data center has become a focus of current research.
Both applications (such as online video) and the data center itself (for example, for data backup) increasingly require multicast. In contrast to one-to-one unicast routing, multicast routing mainly serves one-to-many services: a multicast source provides a service (such as video or file replication), and every member that joins the multicast group can receive it. A data center interconnects a large number of servers and provides efficient, fault-tolerant routing and forwarding services to upper-layer applications, which places higher demands on the multicast forwarding mechanism. However, multicast forwarding inside current data centers still uses the approach designed for Internet applications, which has inherent drawbacks such as oversized forwarding tables, poor switch cost-effectiveness and poor extensibility. To address these problems, a new method is needed that solves the scalability problem of multicast forwarding in the data center by adopting a new route lookup mechanism and improving forwarding table storage.
On the Internet, unicast is used far more widely than multicast, so the routing modules of most vendors' network equipment are designed mainly for unicast routing, with multicast routing as an additional function. Because IPv4 addresses can be aggregated in unicast routing, routers perform routing table lookups by longest prefix match. For example, when the destination address of a packet is 1.2.3.4 and the current routing table contains three matching entries, 1.0.0.0/8, 1.2.0.0/16 and 1.2.3.0/24, the longest match, 1.2.3.0/24, is taken as the hit. This longest-match requirement makes route lookup the most time-consuming step in packet forwarding.
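The longest-match rule in the example above can be sketched in a few lines of Python. This is only an illustration of the rule, not of how a router implements it in hardware; the function name longest_prefix_match and the variable routes are hypothetical.

```python
import ipaddress

def longest_prefix_match(dst, prefixes):
    """Return the most specific prefix that contains dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        # Keep the matching network with the longest prefix length.
        if addr in net and (best is None or net.prefixlen > best.prefixlen):
            best = net
    return best

# The three entries from the example in the text.
routes = ["1.0.0.0/8", "1.2.0.0/16", "1.2.3.0/24"]
print(longest_prefix_match("1.2.3.4", routes))  # 1.2.3.0/24
```

Every candidate prefix has to be compared before the winner is known, which is why this step dominates the lookup cost in a software implementation.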
The time complexity of a routing table lookup depends mainly on how the routing table is stored. Table 1 shows a routing table with unordered addresses. Each entry is indexed by an Ethernet address (MAC address), and its content is a bitmap of forwarding ports (interface bitmap). When a lookup hits an entry, the packet is sent out of the ports whose bits are set to 1 in the stored port bitmap. Because the index of Table 1 is unordered, a lookup by packet destination address has to traverse the whole routing table, so the time complexity is O(N).
Table 1
Table 2 shows an ordered routing table, that is, the routing table sorted by MAC address. A lookup can therefore use binary search on the index, and the time complexity is O(log_2 N).
Table 2
However, neither of these two storage schemes, nor the various tree-based storage schemes, meets the lookup requirement of the Internet and of large data centers, which is approximately O(1).
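The two storage schemes compared above can be contrasted with a short sketch. The table contents below are hypothetical (the original Tables 1 and 2 are not reproduced in this text), and the function names are illustrative only.

```python
from bisect import bisect_left

def lookup_unordered(table, mac):
    """Linear scan over (mac, bitmap) pairs, as for Table 1: O(N)."""
    for key, bitmap in table:
        if key == mac:
            return bitmap
    return None

def lookup_sorted(keys, bitmaps, mac):
    """Binary search over ascending keys, as for Table 2: O(log_2 N)."""
    i = bisect_left(keys, mac)
    if i < len(keys) and keys[i] == mac:
        return bitmaps[i]
    return None

# Hypothetical entries: MAC address as an integer, port bitmap as an integer.
unordered = [
    (0x01005E000003, 0b0101),
    (0x01005E000001, 0b0011),
    (0x01005E000002, 0b1000),
]
ordered = sorted(unordered)          # sort by MAC address
keys = [k for k, _ in ordered]
bitmaps = [b for _, b in ordered]
```

Both functions return the same bitmap for the same key; only the number of comparisons differs, and neither reaches the O(1) target.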
At present, most network equipment vendors use ternary content-addressable memory (TCAM) for route lookup. TCAM is a hardware method that searches the routing table in parallel and achieves approximately O(1) time complexity: a single read of the routing table yields the lookup result and completes the longest match. TCAM was developed from CAM. Each bit in an ordinary CAM has only two states, "0" and "1", whereas each bit in a TCAM has three: besides "0" and "1" there is a "don't care" state, hence the name "ternary". The third state is implemented with a mask, and it is this state that lets a TCAM perform both exact-match and fuzzy-match lookups. However, TCAM has poor cost-effectiveness and poor extensibility; as Table 3 below shows, TCAM is far more expensive and power-hungry than SRAM.
Table 3
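The ternary matching described above can be emulated in software to show what the "don't care" state does. This is a sequential sketch under stated assumptions: a real TCAM compares all entries in parallel in hardware, and the entry layout (value, care mask, result) and the names tcam_lookup and ip are hypothetical.

```python
def tcam_lookup(entries, key):
    """Return the result of the first matching entry, or None.

    entries: list of (value, care_mask, result), highest priority first.
    A bit position whose care_mask bit is 0 is in the "don't care" state.
    """
    for value, care_mask, result in entries:
        if (key & care_mask) == (value & care_mask):
            return result
    return None

def ip(a, b, c, d):
    """Pack four octets into a 32-bit integer."""
    return (a << 24) | (b << 16) | (c << 8) | d

# Longest prefixes are placed first so that the first hit is the longest match.
entries = [
    (ip(1, 2, 3, 0), 0xFFFFFF00, "1.2.3.0/24"),
    (ip(1, 2, 0, 0), 0xFFFF0000, "1.2.0.0/16"),
    (ip(1, 0, 0, 0), 0xFF000000, "1.0.0.0/8"),
]
```

With the entries ordered by priority, the first hit is also the longest match, which is how a ternary memory can resolve a longest-match lookup in a single access.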
Therefore, replacing TCAM with DRAM or SRAM while achieving routing table lookup with close to O(1) time complexity has become a main research direction for network equipment vendors. DRAM access (10 to 20 ns) is much slower than SRAM access (about 2 ns), and its power consumption is comparable to that of SRAM, so current research concentrates on replacing TCAM with SRAM.
Multicast addresses do not have the longest-match problem, and in a data center network they are concentrated in a small address range. There are about 4.3 billion (2^32) IPv4 addresses in total, and the multicast range 224.0.0.0 to 239.255.255.255 contains more than 200 million of them (16 * 2^24). On the Internet, the routing and forwarding mechanism for multicast therefore has to handle an address range of more than 200 million addresses whose contiguity cannot be controlled. In a data center network, by contrast, the number of multicast addresses depends on the number of servers. Current large data centers have between 10,000 and 100,000 servers, typically about 50,000, so one contiguous multicast address segment (for example, 239.0.*.*/16, which has 65,536 addresses) is enough to meet the multicast demand of a current data center network. Given that the addresses are few, concentrated and controllable, performing multicast routing table lookup with TCAM, a method designed for Internet applications, offers poor cost-effectiveness and extensibility; replacing TCAM with a simple gate circuit plus SRAM is an efficient alternative for data center networks. A technical solution is therefore needed that, without degrading performance, reduces the cost of multicast routing table lookup and forwarding in large data center networks and provides better extensibility.
Summary of the invention
The present invention proposes an extensible multicast forwarding scheme for data center networks. The method replaces TCAM with a simple gate circuit for routing table lookup and guarantees O(1) time complexity. The lookup does not need to store a routing table; it obtains the forwarding table index value directly by ANDing the address with a mask. The mask in this circuit can be changed according to the number of servers in the data center network.
In addition, for forwarding table storage, the present invention also proposes a forwarding table compression method: identical entries in the existing forwarding table are stored only once, and before this duplicate-free forwarding table is read, one redirection is performed through a separate index table.
According to a first aspect of the present invention, a multicast forwarding device for a data center is proposed, in which the multicast addresses of the servers in the data center are represented by addresses in a contiguous multicast address segment. The multicast forwarding device comprises: a line card, configured to receive an input multicast data packet and extract the destination MAC address from the header of the input multicast data packet; a route lookup unit, configured to perform a logical AND of the destination MAC address extracted by the line card with a predetermined mask entry to obtain an index value; a forwarding unit, configured to determine an output port sequence using the index value obtained by the route lookup unit; and a switching matrix unit, configured to schedule the line card to perform multicast forwarding of the input multicast data packet, that is, to output the input multicast data packet from the ports indicated by the output port sequence determined by the forwarding unit.
Preferably, the forwarding unit uses the index value obtained by the route lookup unit as the forwarding table index value and directly hits the corresponding entry in the forwarding table to determine the output port sequence. Alternatively, the forwarding unit comprises an index sequence table and a forwarding table: it first uses the index value obtained by the route lookup unit as the index into the index sequence table and directly hits the corresponding entry there to obtain a forwarding table index value, and then uses that forwarding table index value to directly hit the corresponding entry in the forwarding table and determine the output port sequence.
Preferably, the predetermined mask entry is set according to the total number of servers in the data center.
More preferably: when the total number of servers in the data center is not greater than 2^16, the contiguous multicast address segment is a contiguous IP address segment in which only the 16-bit suffix differs, and the predetermined mask entry is set to 0xffff; when the total is not greater than 2^17, only the 17-bit suffix differs and the mask entry is set to 0x1ffff; when the total is not greater than 2^18, only the 18-bit suffix differs and the mask entry is set to 0x3ffff; when the total is not greater than 2^19, only the 19-bit suffix differs and the mask entry is set to 0x7ffff; and when the total is not greater than 2^20, only the 20-bit suffix differs and the mask entry is set to 0xfffff.
Preferably, the forwarding unit stores the forwarding table in SRAM; or the forwarding unit stores both the index sequence table and the forwarding table in SRAM.
According to a second aspect of the present invention, a multicast forwarding method for a data center is proposed, in which the multicast addresses of the servers in the data center are represented by addresses in a contiguous multicast address segment. The multicast forwarding method comprises: receiving an input multicast data packet; extracting the destination MAC address from the header of the input multicast data packet; performing a logical AND of the extracted destination MAC address with a predetermined mask entry to obtain an index value; determining an output port sequence using the obtained index value; and outputting the input multicast data packet from the ports indicated by the determined output port sequence.
Preferably, in the step of determining the output port sequence using the obtained index value, the obtained index value is used as the forwarding table index value to directly hit the corresponding entry in the forwarding table and determine the output port sequence. Alternatively, the obtained index value is first used as the index into an index sequence table to directly hit the corresponding entry there and obtain a forwarding table index value; that forwarding table index value is then used to directly hit the corresponding entry in the forwarding table and determine the output port sequence.
Preferably, the predetermined mask entry is set according to the total number of servers in the data center.
More preferably: when the total number of servers in the data center is not greater than 2^16, the contiguous multicast address segment is a contiguous IP address segment in which only the 16-bit suffix differs, and the predetermined mask entry is set to 0xffff; when the total is not greater than 2^17, only the 17-bit suffix differs and the mask entry is set to 0x1ffff; when the total is not greater than 2^18, only the 18-bit suffix differs and the mask entry is set to 0x3ffff; when the total is not greater than 2^19, only the 19-bit suffix differs and the mask entry is set to 0x7ffff; and when the total is not greater than 2^20, only the 20-bit suffix differs and the mask entry is set to 0xfffff.
Preferably, the forwarding table is stored in SRAM; or the index sequence table and the forwarding table are both stored in SRAM.
According to a third aspect of the present invention, an OpenFlow switch is proposed that comprises the multicast forwarding device according to the first aspect of the present invention.
According to a fourth aspect of the present invention, an Ethernet switch is proposed that comprises the multicast forwarding device according to the first aspect of the present invention.
In the extensible multicast forwarding scheme for data centers proposed by the present invention, a simple gate circuit performs the mask operation and replaces the complex TCAM. First, because this implementation does not need to handle longest match or store a routing table, its cost and resource overhead (power consumption, circuit complexity, and so on) are far below those of TCAM. Second, because the operation is a simple mask AND, the number of mask bits can change with the number of servers in the data center network, which provides extensibility. Third, because the design is simple, the time cost of a lookup equals one SRAM memory access (about 2 ns), which is lower than that of TCAM (3 to 5 ns). Finally, the forwarding table compression scheme saves storage space through redirection (to 1/2 to 1/4 of the original space). This means that, for the same storage space, the present invention supports 2 to 4 times as many addresses as the original storage scheme, which gives the data center network good extensibility.
Description of drawings
The above and other objects, features and advantages of the present invention will become clearer from the following description of preferred embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows a schematic block diagram of the extensible multicast forwarding device 1000 for a data center according to the present invention.
Fig. 2 is a schematic diagram of forwarding table 132.
Fig. 3 shows a schematic flow diagram of the extensible multicast forwarding method 3000 for a data center according to the present invention.
Fig. 4 shows a schematic block diagram of the extensible multicast forwarding device 4000 for a data center according to the present invention.
Fig. 5 is a schematic diagram illustrating how forwarding table redirection compresses storage capacity.
Fig. 6 is a schematic diagram of implementing the present invention with OpenFlow.
Fig. 7 is a schematic diagram of implementing the present invention in an existing Ethernet environment.
Throughout the drawings, the same or similar structures and steps are denoted by the same or similar reference numerals.
Embodiment
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. Details and functions that are unnecessary for the present invention are omitted from the description so that they do not obscure the understanding of the invention.
As introduced above, the biggest difference from the Internet is that, in a data center network, the number of multicast addresses depends on the number of servers. Current large data centers have between 10,000 and 100,000 servers, typically about 50,000, so one contiguous multicast address segment (for example, 239.0.*.*/16, which has 65,536 addresses) is enough to meet the multicast demand of a current data center network. Based on this characteristic, the present invention proposes representing the multicast addresses of the servers in the data center by IP addresses in a contiguous IP address segment, for example, addresses in 239.0.*.*/16 or 192.1.*.*/16.
Because the IP addresses are chosen from a contiguous IP address segment (for example, the segment 239.0.*.*/16), the multicast MAC addresses have the same property as the multicast IP addresses. In Ethernet, 01:00:5E:00:00:00 to 01:00:5E:7F:FF:FF is the MAC address range reserved for IP multicast: the first 25 bits are fixed (01:00:5E followed by a 0 bit), and the last 23 bits are identical to the last 23 bits of the corresponding IP address. In other words, the address mapping copies the last 23 bits of the upper-layer IP multicast address directly into the last 23 bits of the multicast MAC address. In the Internet environment, the first 4 bits of the IP address are fixed (1110) by the class D address definition, so 5 bits in the middle are not mapped into the MAC address. One multicast MAC address may therefore correspond to 2^5 = 32 different IP addresses, and the uniqueness of a multicast address cannot be judged from the MAC address; it can only be judged from the multicast IP address. In a data center network, however, as introduced above, the number of servers is limited (below 100,000) and the addresses are contiguous and controllable, so each multicast address can be distinguished entirely by the last 23 bits, because the number of servers in the data center < 100,000 < 2^23. The 23 bits therefore uniquely determine the multicast address of a server. (In most examples in this specification the maximum number of data center servers is 65,536, so only the last 16 bits are used; in the extended case up to 23 bits can be used, which addresses 2^23 servers.) Finally, an example illustrates the address mapping: the multicast address 224.193.16.2 maps to the MAC-layer address 01:00:5E:41:10:02. The first 25 bits of the MAC address are fixed and begin with 01:00:5E. The binary form of 193 is 11000001, which becomes 1000001 after the highest bit is removed, that is, 0x41; 16 is 0x10 in hexadecimal and 2 is 0x02. This determines the last 23 bits and gives the final MAC address 01:00:5E:41:10:02.
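The mapping from a multicast IP address to a multicast MAC address described above can be checked with a minimal sketch. The function name multicast_ip_to_mac is hypothetical; the fixed prefix and the 23-bit copy follow the description in the text.

```python
import ipaddress

def multicast_ip_to_mac(ip):
    """Map an IPv4 multicast address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # last 23 bits of the IP address
    mac = 0x01005E000000 | low23          # fixed 25-bit prefix + 23 bits
    # Format the 48-bit value as six colon-separated hexadecimal octets.
    return ":".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))

print(multicast_ip_to_mac("224.193.16.2"))  # 01:00:5E:41:10:02
print(multicast_ip_to_mac("225.65.16.2"))   # 01:00:5E:41:10:02 (same MAC)
```

The second call shows the 32-to-1 ambiguity mentioned in the text: the two IP addresses differ only in the 5 bits that are not mapped, so they share one MAC address. Restricting the data center to one contiguous segment avoids this collision.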
Fig. 1 shows a schematic block diagram of the extensible multicast forwarding device 1000 for a data center according to the present invention.
As shown in Fig. 1, the extensible multicast forwarding device 1000 for a data center according to the present invention comprises: line card 110, route lookup engine 120, forwarding engine 130 and switching matrix 140.
Line card 110 is the gateway through which packets enter and leave the router/switch. It contains multiple network interface cards (not shown) and consists mainly of I/O queue 112 and packet processor 114.
I/O queue 112: after a packet enters through a network interface card, it is first buffered in the input queue to await processing. If the input queue is full, newly arriving packets are simply discarded. The order in which packets are processed depends on the priority that the quality of service (QoS) policy assigns to them; by default, packets are processed first in, first out (FIFO). The output queue is similar to the input queue.
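The default behavior of the input queue described above (FIFO order, discard when full) can be sketched as follows. The class name InputQueue, its capacity parameter and the dropped counter are hypothetical, and QoS priorities are deliberately left out.

```python
from collections import deque

class InputQueue:
    """Bounded FIFO queue that discards packets arriving while it is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.packets = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Buffer a packet; return False if it had to be discarded."""
        if len(self.packets) >= self.capacity:
            self.dropped += 1
            return False
        self.packets.append(packet)
        return True

    def dequeue(self):
        """Return the oldest buffered packet, or None if the queue is empty."""
        return self.packets.popleft() if self.packets else None
```

A QoS-aware variant would replace the single deque with one queue per priority class; the text only states that the processing order then follows the priority requirement.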
Packet processor 114: mainly extracts from the packet header the data link layer destination address (destination MAC address) and the routing layer destination address (destination IP address). These two addresses serve as the lookup keys of the data link layer and routing layer lookup engines, respectively.
Because the inventors recognized that, in a data center, the multicast addresses of the servers can be represented by addresses in a contiguous IP/MAC address segment, route lookup engine 120 of the present invention has a simple design and good extensibility. Taking a data center network with no more than 65,536 (that is, 2^16) servers as an example, the multicast addresses can be represented by a contiguous address block with a prefix length of 16. Route lookup engine 120 performs the lookup by a logical AND, using a simple AND gate circuit 122 and a correspondingly configured mask entry 124.
AND gate circuit 122: after the destination address of the packet is obtained from packet processor 114 of line card 110, AND gate circuit 122 ANDs the destination MAC address with the mask stored in mask entry 124 (because there are no more than 2^16 multicast addresses, the mask has 16 significant bits). The result is the forwarding table index value, which is output to forwarding engine 130. Because the addresses in a data center network are contiguous, forwarding is mainly layer 2 once the network topology has been discovered, so AND gate circuit 122 processes layer-2 MAC addresses.
Mask entry 124: stores the mask value used for the logical AND. Since the data center network has no more than 2^16 servers, a 16-bit mask value suffices. The simplest choice is the 16-bit mask 0xffff, in which case the result of every AND is simply the low-order 16 bits of the destination address itself. Because the addresses in the data center are concentrated and controllable, the last 16 bits of the MAC address used by each server can be guaranteed to differ.
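The lookup performed by AND gate circuit 122 and mask entry 124 amounts to a single bitwise AND. The sketch below assumes the MAC address is given as a colon-separated string; the helper names mac_to_int and lookup_index are hypothetical.

```python
def mac_to_int(mac):
    """Convert a colon-separated MAC address string to a 48-bit integer."""
    return int(mac.replace(":", ""), 16)

def lookup_index(dst_mac, mask=0xFFFF):
    """AND the destination MAC address with the mask to get the table index."""
    return mac_to_int(dst_mac) & mask

# Group 239.0.1.2 maps to MAC 01:00:5E:00:01:02; its index is the last 16 bits.
print(hex(lookup_index("01:00:5E:00:01:02")))  # 0x102
```

No table is consulted to produce the index, which is why the lookup cost does not grow with the number of multicast groups.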
Note that the mask value can change with the number of servers in the data center network. Still taking the simplest masks as an example: for 65,536 (2^16) multicast addresses the mask value is set to 0xffff; for 131,072 (2^17) multicast addresses it is set to 0x1ffff; and for 1,048,576 (2^20) multicast addresses it is set to 0xfffff. The mask value is thus easy to adjust: in hardware it is a single rewrite of mask entry 124. This approach is not only simpler to implement than TCAM, but also far cheaper and less power-hungry, and it gives the data center network good extensibility.
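The relation between the number of multicast addresses and the mask value can be written as a small helper. This is a sketch: the function name mask_for_group_count is hypothetical, and the 23-bit upper limit reflects the 23 mappable MAC address bits discussed earlier in the description.

```python
def mask_for_group_count(count):
    """Return the smallest all-ones mask that can index `count` groups."""
    if count < 1:
        raise ValueError("count must be at least 1")
    bits = max(1, (count - 1).bit_length())
    if bits > 23:
        raise ValueError("only 23 bits of the MAC address carry the group")
    return (1 << bits) - 1

for count in (65_536, 131_072, 1_048_576):
    print(count, hex(mask_for_group_count(count)))
# 65536 0xffff
# 131072 0x1ffff
# 1048576 0xfffff
```

Growing the data center only changes the value returned here, which corresponds to the single rewrite of mask entry 124 mentioned in the text.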
Fig. 2 is a schematic diagram of forwarding table 132 (G2I). The layer-2 forwarding table for the data center network is implemented with bitmaps. The number of entries G equals the number of multicast groups: in this scenario there are 65,536 multicast addresses (each address represents one multicast group), so there are 65,536 entries. Each entry is a bitmap over the I ports of the switch. Current data center networks mainly use 64-port switches, so the bitmap has 64 bits, each representing one port number. For example, 10110101 01100000 00000000 00000000 00000000 00000000 00000000 00000000 means forwarding from ports 1, 3, 4, 6, 8, 10 and 11. In the present invention the forwarding table can be stored in SRAM, which gives better cost-effectiveness. With 65,536 (64K, 1K = 1024) multicast addresses and 64-port switches, the size of each forwarding table is 64K * 64 bits / 8 = 512 KB. According to Table 3, the SRAM on a single chipset in current switches averages about 4 MB, so 512 KB of SRAM is not an excessive cost. Conversely, if 4 MB of SRAM is used to store the forwarding table and the switch still has 64 ports, 4 MB / 512 KB * 64K = 512K multicast addresses can be supported. That is, even when the number of servers in the data center network rises to 512K, the forwarding engine of the present invention still achieves O(1) lookup speed and gives the data center network good extensibility.
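The bitmap example and the size arithmetic above can be reproduced with a short sketch. The bitmap is represented here as a string whose leftmost character is port 1, matching the way the example is written; the function names are hypothetical.

```python
def ports_from_bitmap(bitmap):
    """List the port numbers whose bit is '1'; the leftmost bit is port 1."""
    return [i + 1 for i, bit in enumerate(bitmap) if bit == "1"]

def table_size_bytes(groups, ports):
    """Size of a forwarding table with one `ports`-bit bitmap per group."""
    return groups * ports // 8

def groups_supported(sram_bytes, ports):
    """Number of bitmaps of `ports` bits that fit in `sram_bytes` of SRAM."""
    return sram_bytes * 8 // ports

bitmap = "10110101011" + "0" * 53            # the 64-bit example from the text
print(ports_from_bitmap(bitmap))             # [1, 3, 4, 6, 8, 10, 11]
print(table_size_bytes(64 * 1024, 64))       # 524288 bytes = 512 KB
print(groups_supported(4 * 1024 * 1024, 64)) # 524288 groups = 512K
```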
Switching matrix 140 schedules the output path of packets. Forwarding between network interface cards on the same line card 110 is scheduled by packet processor 114 inside that line card, whereas forwarding between different line cards is scheduled by switching matrix 140, which also schedules the sharing of the other resources of the network device.
Fig. 3 shows a schematic flow diagram of the extensible multicast forwarding method 3000 for a data center according to the present invention.
As shown in Fig. 3, at step S3010, line card 110 receives an input multicast data packet, which enters the input queue of I/O queue 112. At step S3020, packet processor 114 in line card 110 processes the input multicast data packet, extracts the destination MAC address from its header, and sends the extracted destination MAC address to route lookup engine 120. At step S3030, route lookup engine 120 performs a logical AND of the received destination MAC address with the mask entry to obtain the index value for forwarding engine 130. At step S3040, forwarding engine 130 uses the index value output by route lookup engine 120 directly as the forwarding table index value and directly hits the corresponding entry in forwarding table 132 to obtain the output port sequence. At step S3050, switching matrix 140 schedules line card 110 according to the output port sequence determined by forwarding engine 130 to perform multicast forwarding of the input multicast data packet (that is, to output it from the ports indicated by the output port sequence).
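Steps S3020 to S3050 can be traced in software with a minimal sketch. It assumes a raw Ethernet frame whose first six bytes are the destination MAC address, a forwarding table held in a Python list, and port bitmaps stored as integers whose least significant bit stands for port 1; these representations and the function name forward are assumptions made for illustration, not part of the described hardware.

```python
def forward(frame, mask, forwarding_table):
    """Return the list of output ports for one input multicast frame."""
    dst_mac = int.from_bytes(frame[0:6], "big")   # S3020: extract destination MAC
    index = dst_mac & mask                        # S3030: logical AND with the mask
    bitmap = forwarding_table[index]              # S3040: direct hit in the table
    # S3050: the set bits name the output ports (bit 0 is port 1 here).
    return [p + 1 for p in range(bitmap.bit_length()) if (bitmap >> p) & 1]

# Hypothetical table: 2^16 entries, group index 0x0102 forwards to ports 1 and 3.
table = [0] * 65536
table[0x0102] = 0b101

# Destination MAC, source MAC (zeros), EtherType, payload.
frame = bytes.fromhex("01005E000102") + bytes(6) + b"\x08\x00" + b"payload"
print(forward(frame, 0xFFFF, table))  # [1, 3]
```

The whole lookup is one AND and one array read, independent of the table size.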
Improvement to the previous embodiment
Although forwarding table 132 shown in Fig. 2 is easy to read (the content is read directly by index value), the comparatively regular network topology of a data center produces a large number of identical entries. Many middle-layer switches (the non-leaf switches in a tree topology) have the same forwarding port sequence for different destination addresses. The present invention therefore further proposes compressing the forwarding table by redirection (bitmap sharing).
Fig. 4 shows a schematic block diagram of the extensible multicast forwarding device 4000 for a data center according to the present invention. Compared with Fig. 1, only forwarding engine 130 is improved and is labeled forwarding engine 430. The other processing units are the same as in Fig. 1, and their detailed description is omitted here to avoid unnecessary repetition.
In forwarding engine 430, index sequence table 434 is added, and forwarding table 132 is correspondingly changed to forwarding table 432, to implement hierarchical indexing.
In operation (replacing step S3040 in Fig. 3), forwarding engine 430 uses the index value output by route lookup engine 120 as the index into index sequence table 434 (each entry of index sequence table 434 stores a forwarding table index value) and directly hits the corresponding entry in index sequence table 434 to obtain a forwarding table index value. Forwarding engine 430 then uses this forwarding table index value to directly hit the corresponding entry in forwarding table 432 and obtain the output port sequence. The output port sequence is thus reached through a two-level index, which at the same time compresses the storage capacity.
Fig. 5 is a schematic diagram illustrating how forwarding table redirection compresses storage capacity.
In Fig. 5, G2M is index sequence table 434 and M2I is forwarding table 432. The length of G2M equals the number of multicast groups, that is, 65,536 elements in this scenario, and each element stores an index into M2I. Hitting a forwarding table entry therefore takes two index operations: first into G2M, then into M2I. Index sequence table 434 is introduced because the original forwarding table 132 (Fig. 2) contains many identical entries and can be stored in compressed form. In the compressed forwarding table 432 (M2I), every entry is a unique bitmap; as in the original forwarding table, the number of bits in the bitmap equals the number of switch ports. The arrows in Fig. 5 show the redirection process: when route lookup engine 120 looks up the MAC addresses of two different packets, the mask AND operation yields two different index values, for multicast group 1 and multicast group 2 respectively. Reading the two corresponding elements of index sequence table 434 with these index values returns the same forwarding table index value, which shows that, before compression, the entries of forwarding table 132 for these two multicast groups were identical.
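The redirection described above can be sketched as a deduplication step followed by a two-level lookup. The names compress, lookup, g2m and m2i follow the G2M/M2I labels of Fig. 5 but are otherwise hypothetical, and the small table is made-up data.

```python
def compress(forwarding_table):
    """Split a direct table (G2I) into an index table (G2M) and unique bitmaps (M2I)."""
    g2m, m2i, position = [], [], {}
    for bitmap in forwarding_table:
        if bitmap not in position:
            position[bitmap] = len(m2i)   # first time this bitmap is seen
            m2i.append(bitmap)
        g2m.append(position[bitmap])      # redirect the group to the shared entry
    return g2m, m2i

def lookup(g2m, m2i, index):
    """Two-level lookup: first G2M, then M2I."""
    return m2i[g2m[index]]

# Hypothetical direct table in which groups 0, 2 and 3 share one bitmap.
g2i = [0b0011, 0b0101, 0b0011, 0b0011]
g2m, m2i = compress(g2i)
print(g2m)  # [0, 1, 0, 0]
print(m2i)  # [3, 5]
```

Groups 0 and 2 are redirected to the same M2I entry, which mirrors the two arrows in Fig. 5 that end at one bitmap.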
To let the CPU fetch an element in a single read operation, the element width L of the index sequence table G2M is restricted by an alignment rule: L may only be 2 bytes or 4 bytes. A 32-bit CPU reads 4 bytes per memory access, so with L = 2 bytes one access reads two elements and with L = 4 bytes one access reads one element, which improves memory access efficiency (with L = 3 bytes, no read could be aligned). In addition, as Fig. 5 shows, the maximum value M of the number of entries N in the forwarding table 432 depends on the width L: $M = 2^{L}$, with L counted in bits. If L is 2 bytes, the forwarding table 432 can hold at most $M = 2^{16}$ = 64K entries; if L is 4 bytes, it can hold at most $M = 2^{32}$ = 4G entries. That is, 2-byte and 4-byte elements support 64K and 4G data center servers respectively. The alignment rule for the index sequence table therefore not only simplifies CPU memory access but also provides good scalability.
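The arithmetic behind the alignment rule can be checked with a short sketch. The helper names are hypothetical, and the 4-byte word size reflects the 32-bit CPU assumed in the text.

```python
def max_forwarding_entries(element_bytes: int) -> int:
    """Number of distinct forwarding-table indices an element of this width can hold."""
    if element_bytes not in (2, 4):
        raise ValueError("alignment rule allows only 2-byte or 4-byte elements")
    return 2 ** (8 * element_bytes)


def elements_per_read(element_bytes: int, word_bytes: int = 4) -> int:
    """How many whole elements one aligned CPU word read returns."""
    return word_bytes // element_bytes
```

With 2-byte elements this gives 65,536 addressable entries and two elements per read; with 4-byte elements it gives 4,294,967,296 entries and one element per read.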
Because of the redirection, the two-level index operation needs one more memory access than the original directly indexed forwarding table. Since the present invention uses SRAM for storage, this costs an additional 1.5~2 ns (nanoseconds). This overhead is still lower than the time overhead of TCAM, and the present invention also saves forwarding table storage space.
In Fig. 2, the original, uncompressed forwarding table 132 has G entries, where G equals the number of multicast groups (for example, 64K), and contains identical entries. Each entry is a bitmap whose width I equals the number of switch ports (for example, 64 ports). In Fig. 5, after compression into the index sequence table 434 and the forwarding table 432, the index sequence table 434 has G entries, again equal to the number of multicast groups (for example, 64K), with element width L (for example, L = 2 bytes under the CPU alignment rule, since $2^{16}$ = 64K). The compressed forwarding table 432 has N entries, each a bitmap whose width I is likewise the number of switch ports (for example, 64 ports). The ratio of the space overhead after compression to the original space overhead can then be calculated by equation (1):

$$\frac{G \cdot L + N \cdot I}{G \cdot I} = \frac{L}{I} + \frac{N}{G} \qquad (1)$$
In a typical application of the present invention, L = 2 B and I = 64 bits = 8 B, so L/I = 1/4. The compression ratio N/G depends on how often identical entries occur in the forwarding table 132 (if on average every two entries are identical, the frequency of occurrence is 2; if every four entries are identical, it is 4). Given the tree structure of practical data center network topologies, the frequency of occurrence of identical entries averages between 4 and 8, so N/G lies between 1/8 and 1/4, and the value of expression (1) lies in the range [1/4, 1/2]. Compared with the original forwarding table 132, the compressed index sequence table 434 and forwarding table 432 therefore need only 1/4~1/2 of the original storage space. Under the same storage budget, this improvement can support 2 to 4 times the multicast address space of the previous embodiment, which greatly improves the scalability of the data center network.
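The space ratio can be evaluated numerically. The sketch assumes that the compressed storage consists of G index elements of width L plus N bitmaps of width I, compared against G bitmaps of width I in the uncompressed table, so the ratio is L/I + N/G; the function name is hypothetical.

```python
from fractions import Fraction


def space_ratio(groups: int, unique_entries: int, element_bytes: int, bitmap_bytes: int) -> Fraction:
    """(G*L + N*I) / (G*I), which equals L/I + N/G."""
    compressed = groups * element_bytes + unique_entries * bitmap_bytes
    uncompressed = groups * bitmap_bytes
    return Fraction(compressed, uncompressed)


G = 65536                                   # 64K multicast groups
ratio_freq_4 = space_ratio(G, G // 4, 2, 8)  # identical entries occur 4 times on average
ratio_freq_8 = space_ratio(G, G // 8, 2, 8)  # identical entries occur 8 times on average
```

For L = 2 B and I = 8 B this yields 1/2 when N/G = 1/4 and 3/8 when N/G = 1/8, both inside the stated range [1/4, 1/2].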
Finally, two specific implementations of the present invention are introduced: one uses the currently popular OpenFlow technology, and the other can be deployed in an existing Ethernet environment.
(1) OpenFlow is an open-source technology that allows researchers to implement and deploy new network protocols or network experiments according to their needs. The network devices supplied by current vendors are all "black boxes": users can only configure them (for example, IP addresses and VLANs) and cannot program them for experiments, so most researchers urgently need an open programming interface to network devices. OpenFlow is exactly such a technology. It is an open-source product that implements routing and forwarding functions in software, developed by Stanford University. OpenFlow can be deployed directly on a Linux host, turning a PC with multiple network cards into a switch, and it has also been loaded into commercial switches (such as products from NEC and Juniper). Users can program it according to their research needs, which makes it convenient for researchers to carry out network experiments.
Fig. 6 is a schematic diagram of implementing the present invention with OpenFlow.
The forwarding rules and the forwarding table inside the OpenFlow switch 610 are all configured by the remote controller 620. New routing protocols and the like are implemented by programming the controller 620. The two exchange control plane information over the encrypted SSL security protocol, and the OpenFlow switch 610 itself is responsible only for forwarding data packets. The sender 630 sends a multicast packet whose destination address, as shown in Fig. 6, is 239.0.0.6. According to the multicast addressing rule, the corresponding destination MAC address is 01:00:5E:00:00:06, so the ARP entry shown in Fig. 6 is set up. After the OpenFlow switch 610 receives this packet, the forwarding rule is implemented by programming, using the features of OpenFlow: the last 16 bits of the MAC address are used directly as the forwarding table index value. This is equivalent to ANDing the mask value 0xffff with the destination MAC address, which keeps its last 16 bits. Here, the last 16 bits of the destination MAC address (circled in the figure) are 0x0006, so the index value is 6, and the packet is forwarded directly to the ports indicated by entry 6 of the forwarding table. The OpenFlow switch 610, the controller 620 and the sender (PC) 630 in Fig. 6 perform the following functions respectively:
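The address arithmetic in this example can be reproduced in a few lines. The sketch uses the standard IPv4 multicast mapping (prefix 01:00:5E followed by the low 23 bits of the group address) and the 0xffff mask from the text; the function names are hypothetical.

```python
import ipaddress


def multicast_ip_to_mac(ip: str) -> str:
    """Map an IPv4 multicast address to its Ethernet MAC (01:00:5E + low 23 bits)."""
    addr = ipaddress.IPv4Address(ip)
    if not addr.is_multicast:
        raise ValueError(f"{ip} is not an IPv4 multicast address")
    mac = 0x01005E000000 | (int(addr) & 0x7FFFFF)
    return ":".join(f"{(mac >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


def forwarding_index(mac: str, mask: int = 0xFFFF) -> int:
    """AND the destination MAC with the mask to get the forwarding table index."""
    return int(mac.replace(":", ""), 16) & mask


mac = multicast_ip_to_mac("239.0.0.6")
index = forwarding_index(mac)
```

For the packet in Fig. 6 this gives the MAC address 01:00:5E:00:00:06 and the index value 6.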
OpenFlow switch 610:
A field-programmable gate array (FPGA) can be deployed to implement the function of the routing inquiry engine. Because OpenFlow supports programming on FPGA hardware, the data plane functions of reading the forwarding table directly or through redirection, and of forwarding directly to the ports indicated by the bitmap, can be implemented in hardware by programming in a high-level language (C).
The forwarding table is stored in the OpenFlow switch 610 but is controlled by the controller 620 over the SSL protocol.
Controller 620:
Through programming or administrator configuration, performs add/delete/modify/query operations on the forwarding table stored in the OpenFlow switch 610.
Allocates multicast addresses to the servers in the data center network and avoids address conflicts, so that the multicast address of each server is unique and the corresponding switch forwarding rules never produce multiple matches (for example, the last 16 bits of each destination MAC address are unique).
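The address allocation role of the controller might look like the following sketch. The class, its sequential allocation policy and the 239.0.0.0 base address are assumptions for illustration; the source only requires that every server receives a unique address from one contiguous segment.

```python
import ipaddress


class MulticastAddressAllocator:
    """Hands out unique addresses from one contiguous multicast block."""

    def __init__(self, base: str, suffix_bits: int = 16):
        self.base = int(ipaddress.IPv4Address(base))
        self.size = 1 << suffix_bits
        if self.base % self.size:
            raise ValueError("base must be aligned to the block size")
        self.assigned = {}    # server name -> suffix within the block
        self.next_suffix = 0

    def allocate(self, server: str) -> str:
        """Return the server's address, assigning the next free suffix if needed."""
        if server not in self.assigned:
            if self.next_suffix >= self.size:
                raise RuntimeError("address block exhausted")
            self.assigned[server] = self.next_suffix
            self.next_suffix += 1
        return str(ipaddress.IPv4Address(self.base + self.assigned[server]))


allocator = MulticastAddressAllocator("239.0.0.0")
first = allocator.allocate("server-a")
second = allocator.allocate("server-b")
```

Because each suffix is issued once, the last 16 bits of the resulting MAC addresses never collide, which is what keeps the masked index unambiguous.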
PC 630:
Before communicating, first sends a multicast request message to ensure that the intermediate switch node 610 sets up the forwarding table entry.
The user does not need to modify the service development on the server side in any way; the original multicast socket programming still applies:
setsockopt(socket,IPPROTO_IP,IP_ADD_MEMBERSHIP,...)
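For reference, the same group-join call through Python's standard socket module is sketched below. The helper names are hypothetical, and the choice of 0.0.0.0 (let the operating system pick the interface) is an assumption; the setsockopt arguments mirror the C call above.

```python
import socket


def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack ip_mreq: the group address followed by the local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def join_group(sock: socket.socket, group: str) -> None:
    """Join the multicast group on an already created UDP socket."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))


def open_receiver(group: str, port: int) -> socket.socket:
    """Create a UDP socket bound to the port and subscribed to the group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    join_group(sock, group)
    return sock
```

A receiver would call `open_receiver("239.0.0.6", port)` with a port of its choosing and then read datagrams with `recvfrom`.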
(2) Fig. 7 is a schematic diagram of implementing the present invention in an existing Ethernet environment. The hosts 730-1~730-n are responsible for probing the multicast addresses already in use, to prevent address conflicts. Because there is no device like the controller 620 in OpenFlow to carry out address allocation and forwarding table control, the multicast forwarding process is completed mainly by the switches 710-1~710-m and the PCs 730-1~730-n.
Switch 710-1~710-m:
Likewise, a field-programmable gate array (FPGA) can be used to implement the function of the routing inquiry engine. Because the implementation is a commercial switch with an added FPGA, a dedicated hardware description language (such as Verilog or VHDL) is used to implement the data plane functions of reading the forwarding table directly or through redirection, and of forwarding directly to the ports indicated by the bitmap.
PC 730-1~730-n:
Because there is no controller to allocate addresses, the hosts 730-1~730-n must search the multicast addresses themselves to make sure that their own multicast address is not already in use.
The user does not need to modify the service development on the server side in any way; the original multicast socket programming still applies:
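The host-side address search could be sketched as follows. The source does not specify how a host learns which addresses are taken, so the `in_use` set is assumed to have been gathered beforehand by some probing step; the function name and the lowest-free-address policy are also assumptions.

```python
import ipaddress


def pick_unused_address(base: str, suffix_bits: int, in_use: set) -> str:
    """Return the lowest address in the contiguous block that is not in `in_use`."""
    start = int(ipaddress.IPv4Address(base))
    for suffix in range(1 << suffix_bits):
        candidate = str(ipaddress.IPv4Address(start + suffix))
        if candidate not in in_use:
            return candidate
    raise RuntimeError("no free multicast address in the block")


# Hypothetical probing result: two addresses are already taken.
chosen = pick_unused_address("239.0.0.0", 16, {"239.0.0.0", "239.0.0.1"})
```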
setsockopt(socket,IPPROTO_IP,IP_ADD_MEMBERSHIP,...)
Other arrangements of the embodiments of the invention disclosed herein include software programs that perform the steps and operations of the method embodiments summarized above and described in detail below. More specifically, a computer program product is one embodiment that has a computer-readable medium with computer program logic encoded on it; when executed on a computing device, the computer program logic provides the relevant operations and thereby provides the scalable multicast forwarding scheme described above. When executed on at least one processor of a computing system, the computer program logic causes the processor to perform the operations (methods) described in the embodiments of the invention. Such arrangements of the invention are typically provided as software, code and/or other data structures arranged or encoded on a computer-readable medium such as an optical medium (for example, a CD-ROM), a floppy disk or a hard disk, or as firmware or microcode on one or more ROM, RAM or PROM chips, or as an application-specific integrated circuit (ASIC), or as downloadable software images or shared databases in one or more modules. The software, firmware or other such configurations can be installed on a computing device so that one or more processors in the computing device perform the techniques described in the embodiments of the invention. Software processes operating in conjunction with computing devices, such as a group of data communication devices or other entities, can also provide the nodes and hosts according to the invention. The nodes and hosts according to the invention can also be distributed among multiple software processes on multiple data communication devices, among all software processes running on a group of small dedicated computers, or among all software processes running on a single computer.
It should be understood that, strictly speaking, embodiments of the invention can be implemented as a software program on a data processing device, as software and hardware, or as software alone and/or circuitry alone.
The invention has thus been described in conjunction with the preferred embodiments. It should be understood that those skilled in the art can make various other changes, substitutions and additions without departing from the spirit and scope of the invention. The scope of the invention is therefore not limited to the specific embodiments above and should be defined by the claims.
Claims (12)
1. A multicast forwarding device for a data center, wherein the multicast addresses of the servers in the data center are represented by addresses in a contiguous multicast address segment, the multicast forwarding device comprising:
A line card, configured to receive an input multicast data packet and to extract a destination MAC address from the header of the input multicast data packet;
A routing inquiry unit, configured to perform a logical AND operation on the destination MAC address extracted by the line card and a predetermined mask to obtain an index value;
A forwarding unit, configured to determine an output port sequence using the index value obtained by the routing inquiry unit; and
A switching matrix unit, configured to schedule the line card to carry out multicast forwarding of the input multicast data packet, that is, to output the input multicast data packet from the ports indicated by the output port sequence determined by the forwarding unit.
2. The multicast forwarding device according to claim 1, characterized in that the forwarding unit uses the index value obtained by the routing inquiry unit as a forwarding table index value and directly hits the corresponding entry in the forwarding table to determine the output port sequence; or
The forwarding unit comprises an index sequence table and a forwarding table; it first uses the index value obtained by the routing inquiry unit as an index sequence table index value and directly hits the corresponding entry in the index sequence table to obtain a forwarding table index value, and then uses the forwarding table index value hit in the index sequence table to directly hit the corresponding entry in the forwarding table and determine the output port sequence.
3. The multicast forwarding device according to claim 1 or 2, characterized in that the predetermined mask is set according to the total number of servers in the data center.
4. The multicast forwarding device according to claim 3, characterized in that
When the total number of servers in the data center is not greater than $2^{16}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 16-bit suffix differs, and the predetermined mask is set to 0xffff;
When the total number of servers in the data center is not greater than $2^{17}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 17-bit suffix differs, and the predetermined mask is set to 0x1ffff;
When the total number of servers in the data center is not greater than $2^{18}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 18-bit suffix differs, and the predetermined mask is set to 0x3ffff;
When the total number of servers in the data center is not greater than $2^{19}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 19-bit suffix differs, and the predetermined mask is set to 0x7ffff; and
When the total number of servers in the data center is not greater than $2^{20}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 20-bit suffix differs, and the predetermined mask is set to 0xfffff.
5. The multicast forwarding device according to any one of claims 1 to 4, characterized in that
The forwarding unit stores the forwarding table in SRAM; or
The forwarding unit stores the index sequence table and the forwarding table in SRAM.
6. A multicast forwarding method for a data center, wherein the multicast addresses of the servers in the data center are represented by addresses in a contiguous multicast address segment, the multicast forwarding method comprising:
Receiving an input multicast data packet;
Extracting a destination MAC address from the header of the input multicast data packet;
Performing a logical AND operation on the extracted destination MAC address and a predetermined mask to obtain an index value;
Determining an output port sequence using the obtained index value; and
Outputting the input multicast data packet from the ports indicated by the determined output port sequence.
7. The multicast forwarding method according to claim 6, characterized in that, in the step of determining the output port sequence using the obtained index value,
The obtained index value is used as a forwarding table index value to directly hit the corresponding entry in the forwarding table and determine the output port sequence; or
First, the obtained index value is used as an index sequence table index value to directly hit the corresponding entry in the index sequence table and obtain a forwarding table index value; then, the forwarding table index value hit in the index sequence table is used to directly hit the corresponding entry in the forwarding table and determine the output port sequence.
8. The multicast forwarding method according to claim 6 or 7, characterized in that the predetermined mask is set according to the total number of servers in the data center.
9. The multicast forwarding method according to claim 8, characterized in that
When the total number of servers in the data center is not greater than $2^{16}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 16-bit suffix differs, and the predetermined mask is set to 0xffff;
When the total number of servers in the data center is not greater than $2^{17}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 17-bit suffix differs, and the predetermined mask is set to 0x1ffff;
When the total number of servers in the data center is not greater than $2^{18}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 18-bit suffix differs, and the predetermined mask is set to 0x3ffff;
When the total number of servers in the data center is not greater than $2^{19}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 19-bit suffix differs, and the predetermined mask is set to 0x7ffff; and
When the total number of servers in the data center is not greater than $2^{20}$, the contiguous multicast address segment is a contiguous IP address segment in which only the 20-bit suffix differs, and the predetermined mask is set to 0xfffff.
10. The multicast forwarding method according to any one of claims 6 to 9, characterized in that
The forwarding table is stored in SRAM; or
The index sequence table and the forwarding table are stored in SRAM.
11. An OpenFlow switch comprising the multicast forwarding device according to any one of claims 1 to 5.
12. An Ethernet switch comprising the multicast forwarding device according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011102571687A CN102970150A (en) | 2011-09-01 | 2011-09-01 | Extensible multicast forwarding method and device for data center (DC) |
JP2012152239A JP5518135B2 (en) | 2011-09-01 | 2012-07-06 | Extensible multicast forwarding method and apparatus for data center |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011102571687A CN102970150A (en) | 2011-09-01 | 2011-09-01 | Extensible multicast forwarding method and device for data center (DC) |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102970150A true CN102970150A (en) | 2013-03-13 |
Family
ID=47800042
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011102571687A Pending CN102970150A (en) | 2011-09-01 | 2011-09-01 | Extensible multicast forwarding method and device for data center (DC) |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP5518135B2 (en) |
CN (1) | CN102970150A (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9825884B2 (en) | 2013-12-30 | 2017-11-21 | Cavium, Inc. | Protocol independent programmable switch (PIPS) software defined data center networks |
US9438703B2 (en) * | 2014-06-19 | 2016-09-06 | Cavium, Inc. | Method of forming a hash input from packet contents and an apparatus thereof |
US9742694B2 (en) | 2014-06-19 | 2017-08-22 | Cavium, Inc. | Method of dynamically renumbering ports and an apparatus thereof |
US10616380B2 (en) | 2014-06-19 | 2020-04-07 | Cavium, Llc | Method of handling large protocol layers for configurable extraction of layer information and an apparatus thereof |
US9628385B2 (en) | 2014-06-19 | 2017-04-18 | Cavium, Inc. | Method of identifying internal destinations of networks packets and an apparatus thereof |
US9635146B2 (en) | 2014-06-19 | 2017-04-25 | Cavium, Inc. | Method of using bit vectors to allow expansion and collapse of header layers within packets for enabling flexible modifications and an apparatus thereof |
CN105471609B (en) | 2014-09-05 | 2019-04-05 | 华为技术有限公司 | A kind of method and apparatus for configuration service |
WO2016105445A1 (en) * | 2014-12-27 | 2016-06-30 | Intel Corporation | Technologies for scalable local addressing in high-performance network fabrics |
CN113709272B (en) * | 2021-08-26 | 2024-01-19 | 无锡思朗电子科技有限公司 | Method for improving image switching speed |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1809019A (en) * | 2005-01-18 | 2006-07-26 | 北京大学 | Method of implementing quick network message distribution based on adaptive cache mechanism |
CN101083622A (en) * | 2006-06-01 | 2007-12-05 | 富士通株式会社 | System and method for managing forwarding database resources in a switching environment |
CN102055641A (en) * | 2010-12-28 | 2011-05-11 | 华为技术有限公司 | Distribution method for virtual local area network and related device |
- 2011-09-01: CN application CN2011102571687A (CN102970150A), status: Pending
- 2012-07-06: JP application JP2012152239A (JP5518135B2), status: Expired - Fee Related
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103346969A (en) * | 2013-07-05 | 2013-10-09 | 中国科学院计算机网络信息中心 | Method for achieving dynamic multicast spanning tree path adjustment based on OpenFlow |
CN105791126B (en) * | 2014-12-26 | 2020-04-24 | 中兴通讯股份有限公司 | Ternary Content Addressable Memory (TCAM) table look-up method and device |
CN105791126A (en) * | 2014-12-26 | 2016-07-20 | 中兴通讯股份有限公司 | Ternary content addressable memory (TCAM) table search method and device |
CN106170956B (en) * | 2014-12-29 | 2019-04-12 | 华为技术有限公司 | A kind of method for routing and equipment |
CN106170956A (en) * | 2014-12-29 | 2016-11-30 | 华为技术有限公司 | A kind of method for routing and equipment |
WO2016106506A1 (en) * | 2014-12-29 | 2016-07-07 | 华为技术有限公司 | Routing method and device |
CN106789727A (en) * | 2016-12-27 | 2017-05-31 | 锐捷网络股份有限公司 | Packet classification method and device |
CN106656799A (en) * | 2017-02-14 | 2017-05-10 | 湖南基石通信技术有限公司 | Message forwarding method and device based on wireless mesh network |
CN106656799B (en) * | 2017-02-14 | 2019-12-03 | 湖南基石通信技术有限公司 | A kind of message forwarding method and device based on wireless mesh network |
WO2019128740A1 (en) * | 2017-12-27 | 2019-07-04 | 华为技术有限公司 | Message processing method and device |
US11134129B2 (en) | 2017-12-27 | 2021-09-28 | Huawei Technologies Co., Ltd. | System for determining whether to forward packet based on bit string within the packet |
CN110808910A (en) * | 2019-10-29 | 2020-02-18 | 长沙理工大学 | OpenFlow flow table energy-saving storage framework supporting QoS and application thereof |
CN110808910B (en) * | 2019-10-29 | 2021-09-21 | 长沙理工大学 | OpenFlow flow table energy-saving storage framework supporting QoS and method thereof |
CN112235198A (en) * | 2020-10-15 | 2021-01-15 | 东莞飞思凌通信技术有限公司 | Multi-user TCAM mask matching algorithm realized based on FPGA |
Also Published As
Publication number | Publication date |
---|---|
JP2013055642A (en) | 2013-03-21 |
JP5518135B2 (en) | 2014-06-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20130313 |