WO2017219719A1 - Data transmission method, apparatus, and network element - Google Patents

Data transmission method, apparatus, and network element

Info

Publication number
WO2017219719A1
Authority
WO
WIPO (PCT)
Prior art keywords
link
member link
group member
links
group
Application number
PCT/CN2017/076966
Inventor
彭晓澎 (Peng Xiaopeng)
郑合文 (Zheng Hewen)
潘灏涛 (Pan Haotao)
沈利 (Shen Li)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to EP17814458.0A (EP3477893B1)
Publication of WO2017219719A1
Priority to US16/225,385 (US10904139B2)


Classifications

    • H04L45/28 Routing or path finding of packets in data switching networks using route fault recovery
    • G06F11/1004 Adding special bits or symbols to the coded information, e.g. parity check, casting out 9's or 11's to protect a block of data words, e.g. CRC or checksum
    • H04L1/0061 Error detection codes
    • H04L41/0668 Management of faults, events, alarms or notifications using network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure
    • H04L43/0823 Errors, e.g. transmission errors
    • H04L45/02 Topology update or discovery
    • H04L45/16 Multipoint routing
    • H04L45/22 Alternate routing
    • H04L45/24 Multipath
    • H04L45/7453 Address table lookup; Address filtering using hashing
    • H04L47/10 Flow control; Congestion control
    • H04L47/125 Avoiding congestion; Recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H04L47/41 Flow control; Congestion control by acting on aggregated flows or links
    • H03M13/09 Error detection only, e.g. using cyclic redundancy check [CRC] codes or single parity bit
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a data transmission method, apparatus, and network element.
  • network elements are connected to each other through optical modules.
  • the price difference between optical modules of different speeds is large.
  • for example, a 10 Gigabit small form-factor pluggable transceiver (SFP) costs only 3.8% of the price of a 40 Gigabit SFP.
  • therefore, multiple low-speed links are usually used to connect two network elements. As a result, multiple links are widely deployed in existing networks, which brings with it the need for load balancing across multiple links and the technology to achieve it.
  • a network element typically uses the following load balancing technique.
  • the NE extracts the source address and the destination address from the packet header.
  • for an Internet Protocol (IP) packet, the source IP address is extracted from the IP packet header.
  • the destination IP address is extracted from the IP packet header.
  • the transport layer protocol header may also be parsed to extract the source port number and the destination port number.
  • the transport layer protocol is, for example, the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP).
  • these fields form a bit sequence that describes the data flow to which the packet belongs, and the bit sequence is used as the input parameter of the load balancing algorithm.
  • a cyclic redundancy check (CRC) algorithm is used to calculate a hash value of the bit sequence, and the hash value is then taken modulo the number of all available egress ports of the network element, so that the packet is mapped to a particular physical port, which is selected as the egress port for forwarding the packet.
  • in the last step, the hash value is taken modulo the number of available links, that is, the number of available ports, and the result identifies the final egress port.
  • a failed link, or a link that is no longer used for administrative reasons, is not counted in the number of egress ports.
  • this design ensures that the finally selected egress port is always available; however, when a link fails, packet flows that were carried on the other, non-failed links before the failure can also be redistributed.
  • for example, before any failure, packet flow 2 and packet flow 7 are directed to link 1, and packet flow 1, packet flow 3, and packet flow 6 are directed to link 4.
  • if link 4 suddenly fails, packet flow 2 is directed to link 3 instead of link 1, to which it was originally directed, leaving only packet flow 7 on link 1. The flows distributed on link 1 and link 3, neither of which failed, therefore differ from those before link 4 failed.
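The prior-art behaviour described above can be reproduced with a short Python sketch. It assumes the hash is taken modulo the number of available links only. The per-flow hash values are hypothetical, picked so that the placement matches the example (flows 2 and 7 on link 1; flows 1, 3 and 6 on link 4). They are not data from the patent.

```python
# Sketch of the prior-art scheme: the flow hash is taken modulo the number
# of *available* links only, so any failure changes the divisor for every flow.
def prior_art_select(hash_value: int, available_links: list[int]) -> int:
    return available_links[hash_value % len(available_links)]


# Hypothetical hash values for flows 1..7, chosen only to match the example.
flow_hashes = {1: 19, 2: 8, 3: 7, 4: 5, 5: 2, 6: 11, 7: 12}

before = {f: prior_art_select(h, [1, 2, 3, 4]) for f, h in flow_hashes.items()}
after = {f: prior_art_select(h, [1, 2, 3]) for f, h in flow_hashes.items()}  # link 4 failed

# Flows that were NOT on the failed link 4 but were still moved to another link.
disturbed = [f for f in flow_hashes if before[f] != 4 and after[f] != before[f]]
print(before)                        # {1: 4, 2: 1, 3: 4, 4: 2, 5: 3, 6: 4, 7: 1}
print(after)                         # {1: 2, 2: 3, 3: 2, 4: 3, 5: 3, 6: 3, 7: 1}
print("disturbed flows:", disturbed) # disturbed flows: [2, 4]
```

Flow 2 moves from link 1 to link 3, as in the text. Flow 4 is also moved, even though link 2 never failed.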
  • embodiments of the present invention provide a data transmission method, apparatus, and network element. They address a prior-art problem: because only available links are considered in load balancing, the failure of one link causes the packet flows carried on the other, non-failed links before the failure to be redistributed.
  • an embodiment of the present invention provides a data transmission method, including:
  • receiving a packet; determining a first member link in a first group of member links according to a first decision mode; if the first member link is unavailable, determining a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available and the first group of member links includes the second group of member links and the unavailable member links; and sending the packet through the second member link.
  • because unavailable member links are also taken into account in the first decision, a second decision is made only when the first decision yields an unavailable first member link. That second decision selects the second member link from the second group of member links, which contains only available links.
  • on the one hand, the packet can still be forwarded; on the other hand, when a link fails, the flows on the other links are not redistributed.
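Here is a minimal Python sketch of the two-decision scheme. It rests on several assumptions. The first decision uses CRC-32 (`zlib.crc32`) and the second uses a different CRC, CRC-CCITT (`binascii.crc_hqx`); these stand in for the unspecified first and second CRC algorithms. The first group is modelled as a hypothetical list of `(link_id, available)` pairs, and `select_member_link` is an illustrative name, not an interface defined by the patent.

```python
import binascii
import zlib


def first_decision(flow_info: bytes, n_all: int) -> int:
    # First decision (assumed CRC-32): modulo the size of the FIRST group,
    # which counts available and unavailable member links alike.
    return zlib.crc32(flow_info) % n_all


def second_decision(flow_info: bytes, n_available: int) -> int:
    # Second decision (assumed CRC-CCITT, a different CRC): modulo the size
    # of the SECOND group, which holds only available member links.
    return binascii.crc_hqx(flow_info, 0) % n_available


def select_member_link(flow_info: bytes, first_group: list[tuple[int, bool]]) -> int:
    """Return the id of the member link the packet is sent on."""
    link_id, available = first_group[first_decision(flow_info, len(first_group))]
    if available:
        return link_id  # first member link is usable
    second_group = [lid for lid, ok in first_group if ok]
    if not second_group:
        raise RuntimeError("no available member link")
    return second_group[second_decision(flow_info, len(second_group))]


links = [(1, True), (2, True), (3, True), (4, False)]  # link 4 has failed
print(select_member_link(bytes(12), links))  # some available link: 1, 2 or 3
```

The first decision always divides by the size of the whole first group. As a result, a flow whose first member link is available gets the same link whether or not some other link has failed. Only flows that land on an unavailable link are re-hashed.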
  • determining the first member link in the first group of member links according to the first decision mode includes: acquiring flow information of the packet, where the flow information describes the packet; performing a hash calculation on the flow information; and determining the first member link in the first group of member links according to the calculated hash value.
  • performing the hash calculation on the flow information includes: performing the hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
  • determining the second member link in the second group of member links according to the second decision mode includes: acquiring the flow information; performing a hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determining the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
  • determining the first member link in the first group of member links according to the calculated hash value includes: determining the first member link according to the calculated hash value and the total number of links in the first group of member links.
  • an embodiment of the present invention provides a network element, including:
  • a receiver; a processor, configured to receive a packet through the receiver or generate a packet, and configured to determine a first member link in a first group of member links according to a first decision mode and, if the first member link is unavailable, to determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, the first group of member links includes the second group of member links and the unavailable member links, and each member link corresponds to a port; and a transmitter, configured to send the packet through the port corresponding to the second member link.
  • the processor is configured to: obtain flow information of the packet, where the flow information describes the packet; perform a hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
  • the processor is configured to perform the hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
  • the processor is configured to: acquire the flow information; perform a hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
  • the processor is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.
  • an embodiment of the present invention provides a data transmission apparatus, where the data transmission apparatus includes a functional module for implementing the method described in the first aspect.
  • an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores program code, where the program code includes instructions for implementing any possible implementation of the method of the first aspect.
  • the flow information includes the source Internet Protocol (IP) address, destination IP address, source port, and destination port of the packet.
  • a member link is available only when it is both physically available and administratively available. Physically available means that the physical state of the link is normal and the link can be used to transmit data. Administratively available means that the administrator has configured the link to participate in operation.
  • FIG. 1a-1b are structural diagrams of a communication system according to an embodiment of the present invention.
  • FIG. 2 is a structural diagram of a network element according to an embodiment of the present invention.
  • FIG. 3 is a flowchart of a data transmission method according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of performing hash calculation according to a CRC algorithm according to an embodiment of the present invention.
  • FIG. 5 is a functional block diagram of a data transmission apparatus according to an embodiment of the present invention.
  • FIG. 1a and FIG. 1b are structural diagrams of a possible communication network system according to an embodiment of the present invention.
  • the communication network system includes network element 1 and network element 2, connected by n links, link 1 to link n, where n is an integer greater than 1.
  • the transmission bandwidth or transmission rate of each link may be the same or different.
  • the n links are formed, for example, by optical modules connected through optical fibers, although other forms of links are also possible.
  • in this structure, network element 1 sends each received packet to network element 2 through one of the n links for further processing by network element 2. Load balancing is therefore needed across the n links so that their loads are balanced.
  • in one scenario, network element 1 is, for example, a server, network element 2 is, for example, a top-of-rack (ToR) switch, and the n links form a link aggregation group (LAG).
  • in another scenario, network element 1 is a spine switch, network element 2 is, for example, a ToR switch, and the n links are equal-cost multi-path (ECMP) links.
  • alternatively, one end of the n links is connected to network element 1, and the other ends are connected to different network elements, for example network element 21 to network element 2n.
  • any two of network element 21 to network element 2n may have the same or different functions.
  • in this case, load balancing across the n links can also be understood as load balancing across the n network elements, so that the loads of the n network elements are balanced.
  • the communication system may further include network-side or user-side devices other than the illustrated network elements and links; this is not limited in the embodiments of the present invention.
  • FIG. 2 is a schematic structural diagram of a network element according to an embodiment of the present invention.
  • the network element is, for example, network element 1, network element 2, or any of network element 21 to network element 2n.
  • the network element includes a processor 10, a transmitter 20, a receiver 30, a memory 40, and a port 50.
  • the memory 40, the transmitter 20 and the receiver 30 and the processor 10 can be connected via a bus.
  • alternatively, the memory 40, the transmitter 20, the receiver 30, and the processor 10 may be connected in a structure other than a bus, such as a star structure; this is not specifically limited herein.
  • the processor 10 may be a general-purpose central processing unit (CPU), an application-specific integrated circuit (ASIC), one or more integrated circuits for controlling program execution, a hardware circuit developed using a field-programmable gate array (FPGA), or a baseband processor.
  • processor 10 may include at least one processing core.
  • the memory 40 may include one or more of a read-only memory (ROM), a random access memory (RAM), and disk storage.
  • Memory 40 is used to store data and/or instructions needed by processor 10 to operate.
  • the number of memories 40 may be one or more.
  • the port 50 includes at least one input port and at least two egress ports.
  • the transmitter 20 and the receiver 30 may be physically independent of each other or integrated.
  • Transmitter 20 can transmit data through an egress port in port 50.
  • Receiver 30 can receive data via an input port in port 50.
  • each egress port corresponds to one link.
  • FIG. 3 is a flowchart of a data transmission method in an embodiment of the present invention. As shown in FIG. 3, the method includes:
  • Step 101: Receive a packet.
  • Step 102: Determine a first member link in a first group of member links according to a first decision mode.
  • Step 103: If the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, and the first group of member links includes the second group of member links and the unavailable member links.
  • Step 104: Send the packet through the second member link.
  • in step 101, network element 1 receives, through an input port of the port 50, a packet sent by another network element.
  • alternatively, the packet may be one generated by network element 1 itself.
  • step 102 is performed to determine the first member link in the first group member link according to the first decision mode.
  • the first decision mode can be implemented in multiple ways; examples are described below.
  • the step 102 includes: acquiring flow information of the packet; performing hash calculation on the flow information; and determining a first member link according to the calculated hash value.
  • the flow information of the packet is used to describe the packet.
  • the flow information is, for example, a source IP address and a destination IP address.
  • the source port and the destination port may also be included. For example, after the packet is obtained in step 101, it can be parsed, and the source IP address, destination IP address, source port, and destination port are extracted from the packet headers to form the flow information, for example a 96-bit sequence.
  • the flow information may also be other information, as long as it causes all packets of one packet flow to be directed to the same member link.
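One way to form the 96-bit flow information for an IPv4 packet is to concatenate the 32-bit source and destination addresses with the 16-bit source and destination ports. This sketch assumes that field order, and `flow_info` is a hypothetical helper name used only for illustration.

```python
import ipaddress
import struct


def flow_info(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack IPv4 addresses and ports into a 96-bit (12-byte) sequence (assumed order)."""
    return struct.pack(
        "!4s4sHH",  # network byte order: 4 + 4 + 2 + 2 bytes, no padding
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        src_port,
        dst_port,
    )


key = flow_info("192.0.2.1", "198.51.100.7", 49152, 443)
print(len(key) * 8)  # 96
```

Every packet of one flow carries the same four fields, so it produces the same key and hashes to the same member link.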
  • the flow information is hashed.
  • a plurality of hash algorithms may be used for calculation.
  • the CRC algorithm is taken as an example.
  • the flow information is treated as a binary sequence g, and the value of the kth bit of g is the coefficient of $x^{k}$ in the polynomial g(x).
  • g(x) is multiplied by $x^{m}$, that is, m zeros are appended to g, and the product is divided by a generator polynomial h(x) of degree m. The binary code of the remainder r(x), whose degree is at most $m-1$, is the CRC result: $r(x) = x^{m} g(x) \bmod h(x)$.
  • in this division, coefficients are computed modulo 2, so each subtraction of h from g is a bitwise exclusive OR (XOR). For example, 11001 XOR 10101 = 01100.
  • in the example, m = 8, so 8 zeros are appended to g, giving $g_0$ = 10100111010000100000000. The first XOR step of dividing $g_0$ by h gives $g_1$ = 1001101110000100000000, and the division continues in the same way until the 8-bit remainder r = 10001100 is obtained.
  • the first member link is determined according to the calculated hash value and the number of links in the first group of member links. For example, if the first group contains 7 member links, r is taken modulo 7, that is, the CRC result is divided by 7 and the remainder is kept. Because r = 10001100 (decimal 140) and 140 mod 7 = 0, the selected first member link is the first link, which can correspond to port 0 of the port 50.
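The CRC division in this example can be checked with a short mod-2 long division in Python. The input g = 101001110100001 is $g_0$ with its 8 appended zeros removed. The divisor h = 111010101, that is $h(x) = x^8 + x^7 + x^6 + x^4 + x^2 + 1$, is inferred from the first XOR step that turns $g_0$ into $g_1$. Neither value is stated explicitly in the text. The sketch then applies the modulo-7 link selection.

```python
def crc_remainder(g: str, h: str) -> str:
    """Remainder of g(x) * x^m divided by h(x) over GF(2); bit strings, MSB first."""
    m = len(h) - 1                               # degree of h(x)
    bits = [int(b) for b in g + "0" * m]         # g(x) * x^m, i.e. g_0
    divisor = [int(b) for b in h]
    for i in range(len(g)):                      # one division step per bit of g
        if bits[i]:                              # leading 1: XOR h in at this position
            for j, d in enumerate(divisor):
                bits[i + j] ^= d
    return "".join(str(b) for b in bits[-m:])    # last m bits are the remainder r


g = "101001110100001"  # inferred from g_0 in the example
h = "111010101"        # inferred from the g_0 -> g_1 step
r = crc_remainder(g, h)
print(r, int(r, 2), int(r, 2) % 7)  # 10001100 140 0
```

The first loop iteration produces exactly $g_1$, and the final remainder matches the r = 10001100 quoted in the text.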
  • another possible implementation is to determine the port by table lookup.
  • a correspondence table between hash values and egress ports (or links) is pre-configured on the network element, and the table itself may be built by taking each hash value modulo the number of links. After the hash value is calculated, the corresponding egress port or link is found by looking up the table.
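The table-lookup variant can be sketched by precomputing the modulo result for every possible hash value. The example assumes an 8-bit CRC result and 7 member links, both illustrative choices, and `build_port_table` is a hypothetical helper.

```python
def build_port_table(hash_bits: int, n_links: int) -> list[int]:
    """Precompute hash value -> egress port index by taking each value modulo n_links."""
    return [value % n_links for value in range(2 ** hash_bits)]


table = build_port_table(8, 7)  # one entry per possible 8-bit CRC result (assumed)
print(table[0b10001100])        # 0, the same port as 140 mod 7
```

At forwarding time this replaces a division with a single indexed read. The table must be rebuilt only when the number of links in the group changes.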
  • h(x) can be chosen freely or taken from an international standard.
  • a CRC algorithm is named CRC-m after the degree m of h(x), for example CRC-32 or CRC-64.
  • "available" in the embodiments of the present invention means both physically and administratively available. Specifically, physically available means that the physical state of the link is normal and the link can be used to transmit data. Administratively available means that the administrator has configured the link to participate in operation.
  • each member link in the first group of member links may be either available or unavailable.
  • for example, all links pre-configured as uplinks or downlinks of the network element, whether available or unavailable, form the first group of member links.
  • for example, the downlinks of network element 1 include link 1 to link 4, where link 4 is faulty, that is, unavailable, and the remaining three links are available.
  • the first group of member links may be stored on the network element as a member link table, which may also record whether each link is available.
  • whether the first member link is available can then be determined by querying its status in the table. For example, if the first member link is link 1, the table shows that link 1 is available; if the first member link is link 4, the table shows that link 4 is unavailable.
  • if the first member link is available, the packet may be sent through the first member link.
  • if the first member link is unavailable, step 103 is performed, that is, the second member link is determined in the second group of member links according to the second decision mode.
  • the second group of member links consists of exactly the available links in the first group of member links.
  • in the above example, the second group of member links includes link 1 to link 3. In other words, the second group is what remains after the unavailable links are removed from the first group.
  • the second group of member links may change over time, both in the number of links and in which specific links it contains.
  • for example, if link 4 recovers from its failure and link 2 then fails, the second group becomes link 1, link 3, and link 4.
  • if link 4 is still faulty and link 2 also fails, the second group becomes link 1 and link 3.
  • the link state of the first group of member links changes over time. For example, when link 4 recovers from a failure, the state of link 4 is modified to be available. For another example, if link 2 fails, then the state of link 2 is modified to be unavailable.
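The member link table, with its availability rule and the derived second group, might be modelled as follows. The class and field names are hypothetical. A link counts as available only when it is both physically and administratively available, as defined above. The usage lines replay the link 4 / link 2 example.

```python
from dataclasses import dataclass


@dataclass
class MemberLink:
    link_id: int
    physical_up: bool = True      # physical state normal, can transmit data
    admin_enabled: bool = True    # administrator lets the link participate

    @property
    def available(self) -> bool:
        # Available = physically available AND administratively available.
        return self.physical_up and self.admin_enabled


class MemberLinkTable:
    """First group of member links, unavailable ones included."""

    def __init__(self, link_ids: list[int]) -> None:
        self.links = [MemberLink(link_id) for link_id in link_ids]

    def _find(self, link_id: int) -> MemberLink:
        return next(link for link in self.links if link.link_id == link_id)

    def set_physical(self, link_id: int, up: bool) -> None:
        self._find(link_id).physical_up = up

    def set_admin(self, link_id: int, enabled: bool) -> None:
        self._find(link_id).admin_enabled = enabled

    def second_group(self) -> list[int]:
        # Second group = the available links of the first group, in table order.
        return [link.link_id for link in self.links if link.available]


table = MemberLinkTable([1, 2, 3, 4])
table.set_physical(4, False)   # link 4 fails
print(table.second_group())    # [1, 2, 3]
table.set_physical(4, True)    # link 4 recovers ...
table.set_physical(2, False)   # ... and link 2 fails
print(table.second_group())    # [1, 3, 4]
```

The first group never shrinks; only the recorded states change. This keeps the divisor of the first decision stable.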
  • step 103 can also have multiple implementations.
  • in one implementation, the second decision mode is the same as the first decision mode.
  • for example, the CRC algorithm described above is also used, with either the same generator polynomial or a different one. Using a different generator polynomial avoids hash polarization. Polarization can occur because the flows reaching the second decision were all selected by the same hash, so reusing that hash can spread them unevenly over the remaining links.
  • in another implementation, the second decision mode differs from the first decision mode.
  • for example, the first decision mode uses the CRC algorithm described above, and the second decision mode uses an exclusive OR operation or a direct modulo calculation.
  • because all member links in the second group are available, the determined second member link is also available, so step 104 can be performed, that is, the packet is sent through the second member link.
  • suppose link 4 fails. With the prior-art method, the number of available links becomes 3, so the egress links of all packet flows are recalculated from their CRC results; the recalculated link numbers are shown in column 6 of Table 1.
  • Table 2 shows the flow distribution on each link before the link fails
  • Table 3 shows the flow distribution on each link after the link fails.
  • with the transmission method in the embodiments of the present invention, the first decision is still made over the first group of member links when link 4 fails, that is, the CRC result is taken modulo the number of all member links, both available and unavailable.
  • in this example the CRC result is still taken modulo 4, so the result is the same as before link 4 failed. Only if the first decision selects the failed link 4 is a second decision made over the remaining available member links, that is, the CRC result is taken modulo 3 to obtain a new result. Table 4 compares the results calculated by the transmission method of the embodiments after the link failure with the results before the failure.
  • for example, the first CRC result of a flow is 7; 7 mod 4 = 3, so the link number is 3 (link numbers start from 0), that is, link 4.
  • the CRC result is then recalculated and taken modulo 3, giving link number 0, that is, link 1. In other words, the packet flow originally directed to link 4 is redirected to link 1. Similarly, the other packet flows originally directed to link 4 are redirected to other non-failed links by the second decision.
  • with the method in the embodiments of the present invention, neither the failure of a link nor its recovery affects the distribution of the flows that the first decision maps to other links, so services on those links continue without loss.
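The sketch below illustrates the comparison by applying the two-decision method to seven flows. The first-decision hash values are hypothetical; flow 3 is given a first CRC result of 7, as in the example. The second-decision values are also hypothetical, and flow 3's value is chosen so that 6 mod 3 = 0 selects link 1. None of these numbers come from Tables 1 to 4.

```python
def two_stage(h1: int, h2: int, first_group: list[int], available: set[int]) -> int:
    """Pick a member link: first decision over all links, second over available ones."""
    link = first_group[h1 % len(first_group)]     # first decision: all member links
    if link in available:
        return link
    second_group = [lid for lid in first_group if lid in available]
    return second_group[h2 % len(second_group)]   # second decision: available links only


# Hypothetical first- and second-decision hash values per flow (illustrative only).
first_hash = {1: 19, 2: 8, 3: 7, 4: 5, 5: 2, 6: 11, 7: 12}
second_hash = {1: 4, 2: 1, 3: 6, 4: 9, 5: 3, 6: 2, 7: 5}
links = [1, 2, 3, 4]

before = {f: two_stage(first_hash[f], second_hash[f], links, {1, 2, 3, 4}) for f in first_hash}
after = {f: two_stage(first_hash[f], second_hash[f], links, {1, 2, 3}) for f in first_hash}
print(before)  # {1: 4, 2: 1, 3: 4, 4: 2, 5: 3, 6: 4, 7: 1}
print(after)   # {1: 2, 2: 1, 3: 1, 4: 2, 5: 3, 6: 3, 7: 1}
```

Only flows 1, 3 and 6, which the first decision maps to the failed link 4, change links. Making link 4 available again returns them to link 4 without touching any other flow.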
  • an embodiment of the present invention further provides a network element (shown in FIG. 2), where the network element is used to implement any one of the foregoing methods.
  • the processor 10 is configured to receive a packet through the receiver 30 or to generate a packet, and is further configured to determine a first member link in a first group of member links according to the first decision mode and, if the first member link is unavailable, to determine a second member link in a second group of member links according to the second decision mode. All member links in the second group of member links are available, the first group of member links includes the second group of member links and the unavailable member links, and each member link corresponds to a port of the port 50. The transmitter 20 is configured to send the packet through the port 50 corresponding to the second member link.
  • the processor 10 is configured to: obtain flow information of the packet, where the flow information describes the packet; perform a hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
  • the processor 10 is configured to perform hash calculation on the flow information according to the first cyclic redundancy check CRC algorithm.
  • the processor 10 is configured to: acquire the flow information; perform a hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
  • the processor 10 is configured to determine the first member link according to the calculated hash value and the total number of links of the first group member link.
  • an embodiment of the present invention further provides a data transmission apparatus.
  • the data transmission apparatus includes: an obtaining unit 201, configured to obtain a packet; a processing unit 202, configured to determine a first member link in a first group of member links according to a first decision mode and, if the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available and the first group of member links includes the second group of member links and the unavailable member links; and a sending unit 203, configured to send the packet through the second member link.
  • the processing unit 202 is configured to: obtain flow information of the packet, where the flow information describes the packet; perform hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
  • the processing unit 202 is configured to perform the hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
  • the processing unit 202 is configured to: obtain the flow information; perform hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links
  • according to the hash value calculated with the second CRC algorithm.
  • the processing unit 202 is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.
  • embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, the present invention can take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
  • These computer program instructions can also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
  • These computer program instructions can also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device
  • provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

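A data transmission method and apparatus, and a network element. The method includes: receiving a packet; determining a first member link in a first group of member links according to a first decision mode; if the first member link is unavailable, determining a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, and the first group of member links includes the second group of member links and the unavailable member links; and sending the packet through the second member link. With this method, after a link fails, the flows that were distributed on the other, non-failed links before the failure are not redistributed.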

Description

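Data transmission method and apparatus, and network element
This application claims priority to Chinese Patent Application No. 201610459884.6, filed with the Chinese Patent Office on June 22, 2016 and entitled "Data Transmission Method and Apparatus, and Network Element", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to the field of communications technologies, and in particular, to a data transmission method and apparatus, and a network element.
BACKGROUND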
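Currently, network elements are interconnected by optical modules connected to optical fibers. Optical modules of different rates differ greatly in price; for example, a 10 Gigabit small form-factor pluggable transceiver (SFP) costs only 3.8% as much as a 40 Gigabit SFP. Driven by this price gap, two network elements are usually connected by multiple low-speed links. As a result, multiple links are widely deployed in existing networks, which in turn creates demand for multi-link load balancing and the techniques that implement it.
In the prior art, a network element uses the following load balancing technique. After receiving a packet, the network element extracts the source address and the destination address from the packet header. If the packet is, for example, an Internet Protocol (IP) packet, the source IP address and the destination IP address are extracted from the IP header. Further, the source port number and the destination port number can be extracted from the transport-layer protocol header, where the transport-layer protocol is, for example, the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP). The network element then combines the extracted source address, destination address, transport-layer source port, and transport-layer destination port into a bit sequence. This bit sequence describes the data flow to which the packet belongs and also serves as the input parameter of the load balancing algorithm. Specifically, a hash value of the bit sequence is calculated with a cyclic redundancy check (CRC) algorithm, and the hash value is then taken modulo the number of available egress ports of the network element. This maps the packet to a definite physical port, which is selected as the egress port for forwarding the packet.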
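A minimal Python sketch of this prior-art selection, for illustration only. It assumes the flow key has already been packed into bytes, and it uses the standard library's `zlib.crc32` as a stand-in for the CRC, which the text leaves unspecified. The function name `select_port_prior_art` is hypothetical. The usage lines count the flows that change port when one of four ports fails even though their own port stayed up. That is the redistribution weakness of dividing by the number of available ports.

```python
from __future__ import annotations

import zlib


def select_port_prior_art(flow_key: bytes, available_ports: list[int]) -> int:
    """Prior-art style selection: hash the flow key, then reduce the hash
    modulo the number of currently available egress ports only."""
    if not available_ports:
        raise ValueError("no available egress port")
    h = zlib.crc32(flow_key)  # CRC-32 stands in for the unspecified CRC
    return available_ports[h % len(available_ports)]


# A failed port shrinks the divisor from 4 to 3, so h % n changes for many
# flows even though their original port is still up.
keys = [i.to_bytes(4, "big") for i in range(1000)]
moved = [
    k for k in keys
    if select_port_prior_art(k, [0, 1, 2, 3]) != 3  # not on the failed port
    and select_port_prior_art(k, [0, 1, 2, 3]) != select_port_prior_art(k, [0, 1, 2])
]
print(f"{len(moved)} of {len(keys)} flows moved although their port stayed up")
```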
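In the above load balancing technique, the last step takes the hash value modulo the number of available links (egress ports), and the result determines the final egress port. In other words, links that have failed, or that are no longer used for administrative reasons, are not counted in the number of egress ports. This design guarantees that the selected egress port is always available. However, it causes a technical problem: when a link fails, the packet flows that were on the other, non-failed links before the failure are redistributed. For example, under the above technique, packet flows 2 and 7 are directed to link 1, and packet flows 1, 3, and 6 are directed to link 4. If link 4 suddenly fails, packet flow 2 is redirected to link 3 instead of staying on link 1, and only packet flow 7 remains on link 1. Compared with the situation before link 4 failed, the flows distributed on link 1 and link 3 have changed.
SUMMARY
Embodiments of the present invention provide a data transmission method and apparatus, and a network element, to solve the technical problem in the prior art that, because load balancing considers only available links, a link failure causes the packet flows on the other, non-failed links to be redistributed.
According to a first aspect, an embodiment of the present invention provides a data transmission method, including:
receiving a packet; determining a first member link in a first group of member links according to a first decision mode; if the first member link is unavailable, determining a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, and the first group of member links includes the second group of member links and the unavailable member links; and sending the packet through the second member link.
In the solution of this embodiment, the first decision takes unavailable member links into account. When the first decision yields a first member link that is unavailable, an available second member link is determined in the second group of member links, which contains only available links. Therefore, on one hand, the packet is guaranteed to be forwarded; on the other hand, the failure of one member link does not cause the flows on the other, non-failed links to be redistributed.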
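With reference to the first aspect, in a first possible implementation of the first aspect, the determining a first member link in a first group of member links according to a first decision mode includes: obtaining flow information of the packet, where the flow information describes the packet; performing hash calculation on the flow information; and determining the first member link in the first group of member links according to the calculated hash value. This ensures that, as long as no link fails, packets with the same flow information are assigned to the same link for forwarding, which avoids packet loss and preserves session consistency.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the performing hash calculation on the flow information includes: performing hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the determining a second member link in a second group of member links according to a second decision mode includes: obtaining the flow information; performing hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determining the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm. Using different CRC algorithms for the two decisions avoids load imbalance such as hash polarization.
With reference to the first possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the determining the first member link in the first group of member links according to the calculated hash value includes: determining the first member link according to the calculated hash value and the total number of links in the first group of member links.
According to a second aspect, an embodiment of the present invention provides a network element, including:
a receiver; ports; a processor, configured to receive a packet through the receiver or generate a packet, and further configured to: determine a first member link in a first group of member links according to a first decision mode; and if the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, the first group of member links includes the second group of member links and the unavailable member links, and each member link corresponds to a port; and a transmitter, configured to send the packet through the port corresponding to the second member link.
With reference to the second aspect, in a first possible implementation of the second aspect, the processor is configured to: obtain flow information of the packet, where the flow information describes the packet; perform hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the processor is configured to perform hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the processor is configured to: obtain the flow information; perform hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
With reference to the first possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the processor is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.
According to a third aspect, an embodiment of the present invention provides a data transmission apparatus, which includes functional modules configured to implement the method according to the first aspect.
According to a fourth aspect, an embodiment of the present invention further provides a computer storage medium storing program code, where the program code includes instructions for implementing any possible implementation of the method according to the first aspect.
In some of the foregoing possible implementations, the flow information includes the source Internet Protocol (IP) address, the destination IP address, the source port, and the destination port of the packet.
In some of the foregoing possible implementations, a member link being available means that it is available both physically and administratively. Being physically available means the physical state of the link is normal and the link can be used to transmit data. Being administratively available means an administrator has configured the link as enabled to participate in forwarding.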
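BRIEF DESCRIPTION OF DRAWINGS
FIG. 1a and FIG. 1b are structural diagrams of a communications system according to an embodiment of the present invention;
FIG. 2 is a structural diagram of a network element according to an embodiment of the present invention;
FIG. 3 is a flowchart of a data transmission method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of hash calculation with a CRC algorithm according to an embodiment of the present invention; and
FIG. 5 is a functional block diagram of a data transmission apparatus according to an embodiment of the present invention.
DESCRIPTION OF EMBODIMENTS
Embodiments of the present invention provide a data transmission method and apparatus, and a network element, to solve the technical problem in the prior art that, because load balancing considers only available links, a link failure causes the packet flows on the other, non-failed links to be redistributed.
The following describes in detail the implementation process and purpose of the solutions in the embodiments of the present invention.
A data transmission method provided in an embodiment of the present invention can be applied to a communications network system. FIG. 1a and FIG. 1b show possible structures of the communications network system according to embodiments of the present invention. In the structure shown in FIG. 1a, the communications network system includes network element 1 and network element 2. Between network element 1 and network element 2 there are n transmission channels, link 1 to link n, where n is an integer greater than 1. The transmission bandwidths or rates of the links may be the same or different. The n links are formed, for example, by optical modules connected to optical fibers, but they may also be links of other forms. Further, in the structure shown in FIG. 1a, network element 1 sends each received packet to network element 2 over one of the n links for further processing by network element 2; therefore, load balancing is required across the n links so that their loads are balanced.
Optionally, network element 1 is, for example, a server, and network element 2 is, for example, a top-of-rack (TOR) switch. The n links form a link aggregation group (LAG).
Optionally, network element 1 is a spine switch, and network element 2 is, for example, a TOR switch. The n links are equal-cost multi-path (ECMP) paths.
The structure shown in FIG. 1b differs from that in FIG. 1a in that one end of each of the n links connects to network element 1, while the other ends connect to different network elements, for example, network element 21 to network element 2n. Any two of network elements 21 to 2n may have the same or different functions. In this case, load balancing across the n links can also be understood as load balancing across the n network elements, so that the loads of the n network elements are balanced.
It should be understood that the communications systems shown in FIG. 1a and FIG. 1b show only two types of network elements and links, but the present invention is not limited thereto. The communications system may further include network-side or user-side devices other than the network elements and links shown; this is not limited in the embodiments of the present invention.
In addition, the term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate: only A exists, both A and B exist, or only B exists. In addition, the character "/" in this specification generally indicates an "or" relationship between the associated objects.
Refer next to FIG. 2, which is a possible structural diagram of a network element according to an embodiment of the present invention. The network element is, for example, network element 1, network element 2, or any of network elements 21 to 2n described above. As shown in FIG. 2, the network element includes a processor 10, a transmitter 20, a receiver 30, a memory 40, and ports 50. The memory 40, the transmitter 20, the receiver 30, and the processor 10 may be connected by a bus. Of course, in practice, they need not be connected by a bus and may use another structure, for example, a star structure; this is not specifically limited in this application.
Optionally, the processor 10 may be a general-purpose central processing unit (CPU) or an application-specific integrated circuit (ASIC), one or more integrated circuits for controlling program execution, a hardware circuit developed with a field programmable gate array (FPGA), or a baseband processor.
Optionally, the processor 10 may include at least one processing core.
Optionally, the memory 40 may include one or more of a read-only memory (ROM), a random access memory (RAM), and a disk storage. The memory 40 stores data and/or instructions required when the processor 10 runs. There may be one or more memories 40.
Optionally, the ports 50 include at least one input port and at least two egress ports.
Optionally, the transmitter 20 and the receiver 30 may be physically independent of each other or integrated together. The transmitter 20 may send data through the egress ports among the ports 50, and the receiver 30 may receive data through the input ports among the ports 50.
Optionally, each egress port corresponds to one link.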
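Refer next to FIG. 3, which is a flowchart of the data transmission method according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
Step 101: Receive a packet.
Step 102: Determine a first member link in a first group of member links according to a first decision mode.
Step 103: If the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, and the first group of member links includes the second group of member links and the unavailable member links.
Step 104: Send the packet through the second member link.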
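Steps 101 to 104 can be sketched in Python as below. This is an illustrative assumption, not the patented implementation. `select_member_link` is a hypothetical name. `zlib.crc32` stands in for the first decision's hash, and `binascii.crc_hqx` (16-bit CRC-CCITT) stands in for a different second-decision hash. The usage lines check the key property: flows whose first choice is still available never move when another link fails.

```python
from __future__ import annotations

import binascii
import zlib


def select_member_link(flow_key: bytes, first_group: list[str], available: set[str]) -> str:
    """Steps 102-104: pick a member link for one packet's flow key."""
    # Step 102: first decision over the whole first group (available or not),
    # so the divisor does not change when a link goes down or comes back.
    first = first_group[zlib.crc32(flow_key) % len(first_group)]
    if first in available:
        return first
    # Step 103: second decision over the second group (available links only),
    # using a different hash to reduce correlation with the first decision.
    second_group = [link for link in first_group if link in available]
    if not second_group:
        raise RuntimeError("no available member link")
    return second_group[binascii.crc_hqx(flow_key, 0) % len(second_group)]


links = ["link1", "link2", "link3", "link4"]
keys = [i.to_bytes(4, "big") for i in range(500)]
before = {k: select_member_link(k, links, set(links)) for k in keys}
after = {k: select_member_link(k, links, {"link1", "link2", "link3"}) for k in keys}
moved = [k for k in keys if before[k] != "link4" and after[k] != before[k]]
print(len(moved))  # 0: flows on surviving links keep their link
```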
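Optionally, in step 101, network element 1 receives, through an input port of the ports 50, a packet sent by another network element.
Optionally, in step 101, the packet may instead be generated by network element 1 itself.
Next, step 102 is performed: determine a first member link in the first group of member links according to the first decision mode. The first decision mode can be implemented in multiple ways, as illustrated by the following examples.
In one possible implementation, step 102 includes: obtaining flow information of the packet; performing hash calculation on the flow information; and determining the first member link according to the calculated hash value. The flow information of a packet describes the packet.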
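Optionally, for an IP packet, the flow information is, for example, the source IP address and the destination IP address, and may further include the source port and the destination port. Therefore, after the packet is obtained in step 101, the packet can be parsed, and the source IP address and destination IP address (from the IP header) and the source port and destination port (from the transport-layer header) can be extracted to form the flow information, for example, a 96-bit sequence.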
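For IPv4, the four fields above make exactly 96 bits (32 + 32 + 16 + 16). The sketch below builds that bit sequence. It assumes the fields are concatenated in the listed order in network byte order, because the text does not fix a layout. `flow_info` is a hypothetical helper.

```python
import ipaddress
import struct


def flow_info(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack source IP, destination IP, source port and destination port
    into a 12-byte (96-bit) big-endian key. IPv4 only."""
    return struct.pack(
        "!4s4sHH",
        ipaddress.IPv4Address(src_ip).packed,  # 32 bits
        ipaddress.IPv4Address(dst_ip).packed,  # 32 bits
        src_port,                              # 16 bits
        dst_port,                              # 16 bits
    )


key = flow_info("10.0.0.1", "10.0.0.2", 12345, 80)
print(len(key) * 8)  # 96
```

All packets of one flow produce the same key, which is the property the text requires of flow information.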
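Of course, in practice, the flow information may also be other information, as long as all packets of one packet flow can be directed to the same member link based on it.
Next, hash calculation is performed on the flow information. In practice, various hash algorithms can be used; the embodiments of the present invention use the CRC algorithm as an example.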
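Assume the obtained flow information is the 15-bit binary string g = 101001110100001. This binary string can be represented as the polynomial $g(x) = x^{14} + x^{12} + x^{9} + x^{8} + x^{7} + x^{5} + 1$, where the value of bit k of g (counting from the rightmost bit, k = 0) is the coefficient of $x^{k}$ in $g(x)$. Multiply $g(x)$ by $x^{m}$, that is, append m zeros to g, and divide the result by the generator polynomial $h(x)$ of degree m. The binary string r corresponding to the remainder $r(x)$, of degree at most m-1, is the CRC result. The division of $g(x)$ by $h(x)$ is carried out with XOR (exclusive OR) operations between g and h; for example, 11001 XOR 10101 = 01100.
The following example computes the CRC of the binary string 101001110100001 with the CRC-8 algorithm (m = 8). Here CRC-8 uses the generator polynomial $h(x) = x^{8} + x^{7} + x^{6} + x^{4} + x^{2} + 1$, so h is the 9-bit binary string 111010101. As shown in FIG. 4, first append m zeros to g (8 zeros in this example) to obtain 10100111010000100000000 (hereinafter g0). Then divide g0 by h to obtain g1 = 1001101110000100000000. Dividing g1 by h gives g2 = 111000100000100000000. The iteration continues until the remainder r, of degree at most 7, is reached: r = 10001100, which is the CRC result, that is, the hash value.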
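Next, the first member link is determined according to the calculated hash value and the number of links in the first group of member links. For example, if the first group of member links contains 7 links, r is taken modulo 7, that is, the CRC result is divided by 7 and the remainder is kept. Binary 10001100 (decimal 140) modulo 7 is 0, meaning the first member link is the first link, which can correspond to port 0 of the ports 50.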
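The long division of FIG. 4 and the modulo step can be checked with a short Python sketch. It works directly on bit strings to mirror the text, rather than using an optimized table-driven CRC. `crc_remainder` is a hypothetical helper. It reproduces r = 10001100 (decimal 140) and 140 mod 7 = 0.

```python
def crc_remainder(message: str, poly: str) -> str:
    """CRC as mod-2 polynomial long division on bit strings (MSB first).

    message: data bits g, highest-degree coefficient first.
    poly:    generator h(x) as m + 1 bits, e.g. "111010101" for m = 8.
    Returns the m-bit remainder r.
    """
    m = len(poly) - 1
    bits = [int(b) for b in message + "0" * m]  # g(x) * x^m: append m zeros
    divisor = [int(b) for b in poly]
    for i in range(len(message)):
        if bits[i]:                            # leading 1: XOR (mod-2 subtract) h
            for j, d in enumerate(divisor):
                bits[i + j] ^= d
    return "".join(str(b) for b in bits[-m:])


g = "101001110100001"  # flow information from the example
h = "111010101"        # h(x) = x^8 + x^7 + x^6 + x^4 + x^2 + 1
r = crc_remainder(g, h)
print(r)               # 10001100
print(int(r, 2))       # 140
print(int(r, 2) % 7)   # 0 -> first link (port 0) in a 7-link group
```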
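After the hash value is obtained, another possible implementation determines the egress port by table lookup. Specifically, a mapping table between hash values and egress ports (or links) is preconfigured on the network element; the table itself may be generated by taking each hash value modulo the number of links. After the hash value is calculated, the corresponding egress port or link is determined by looking up this table.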
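A sketch of the lookup-table variant. It assumes an 8-bit hash, as in the CRC-8 example, and a table filled by the modulo rule. Real devices may program the table differently. `build_hash_table` is a hypothetical name.

```python
from __future__ import annotations


def build_hash_table(num_links: int, hash_bits: int = 8) -> list[int]:
    """Precompute the link index for every possible hash value (hash mod num_links)."""
    return [h % num_links for h in range(1 << hash_bits)]


table = build_hash_table(num_links=7)  # 256 entries for an 8-bit CRC result
print(table[0b10001100])               # 0, the same as 140 % 7
```

The division moves off the per-packet path: forwarding a packet only needs an index into the table.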
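It should be noted that h(x) can be chosen freely or taken from an international standard. A CRC algorithm is generally named CRC-m after the degree m of h(x), for example, CRC-32 or CRC-64.
The above describes one possible first decision mode. In practice, other decision modes can also be used, for example, performing hash calculation on the flow information by XOR or modulo operations; this is not specifically limited in the embodiments of the present invention.
After the first member link is determined in step 102, it can be determined whether the first member link is currently available. In the embodiments of the present invention, available means available both physically and administratively. Specifically, being physically available means the physical state of the link is normal and the link can be used to transmit data; being administratively available means an administrator has configured the link as enabled to participate in forwarding.
In addition, each member link in the first group of member links may be available or unavailable. Generally, all uplink or downlink links of a network element can be preconfigured as the first group of member links, including both available and unavailable links. For example, the downlinks of network element 1 include link 1 to link 4, where link 4 is faulty, that is, unavailable, and the other three links are available.
Optionally, the first group of member links may be stored on the network element as a member link table, which can also record whether each link is available. Correspondingly, after the first member link is determined in step 102, whether it is available can be determined by querying the link's state in this table. For example, if the first member link is link 1, the table lookup shows that link 1 is available; if the first member link is link 4, the table lookup shows that link 4 is unavailable.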
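One possible shape for the member link table, sketched in Python under stated assumptions. The class `MemberLink`, its field names, and the port numbers are hypothetical. Availability is modeled as physical state AND administrative state, following the definition above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MemberLink:
    name: str
    port: int
    phys_up: bool = True   # physical state normal, can carry data
    admin_up: bool = True  # administrator has enabled the link

    @property
    def available(self) -> bool:
        # Available = physically available AND administratively available.
        return self.phys_up and self.admin_up


# First group of member links: downlinks 1-4 of network element 1, link 4 faulty.
member_table = [
    MemberLink("link1", port=0),
    MemberLink("link2", port=1),
    MemberLink("link3", port=2),
    MemberLink("link4", port=3, phys_up=False),
]


def is_available(table: list[MemberLink], index: int) -> bool:
    """Look up the state of the member link chosen by the first decision."""
    return table[index].available


print(is_available(member_table, 0))  # True  (link 1)
print(is_available(member_table, 3))  # False (link 4)
```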
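Of course, in practice, the available or unavailable state of each link may also be recorded in other ways; this is not specifically limited in the embodiments of the present invention.
Optionally, if the first member link is currently available, the packet is sent through the first member link.
Optionally, if the first member link is unavailable, step 103 is performed: determine a second member link in the second group of member links according to the second decision mode. All member links in the second group of member links are available, and the second group consists of all the available links in the first group of member links. In the above example, the second group of member links includes link 1 to link 3. In other words, removing the unavailable links from the first group of member links leaves the second group of member links.
It should be noted that the second group of member links may change over time, both in the number of links and in which links it contains. For example, if over time link 4 recovers from its fault while link 2 fails, the second group of member links becomes link 1, link 3, and link 4. As another example, if link 4 is still faulty and link 2 also fails, the second group of member links becomes link 1 and link 3.
Similarly, the link states in the first group of member links change over time. For example, when link 4 recovers from its fault, its state is changed to available; if link 2 fails, its state is changed to unavailable.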
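The second group can be derived on demand from the fixed first group plus the current link states. This is what lets the second group change over time while the first group stays the same. A minimal sketch, using the hypothetical helper `second_group`, that reproduces the scenarios above:

```python
from __future__ import annotations


def second_group(first_group: list[str], state: dict[str, bool]) -> list[str]:
    """Second group = first group minus the unavailable links (order preserved)."""
    return [link for link in first_group if state[link]]


first_group = ["link1", "link2", "link3", "link4"]  # fixed; includes faulty links
state = {"link1": True, "link2": True, "link3": True, "link4": False}
print(second_group(first_group, state))  # ['link1', 'link2', 'link3']

# Link 4 recovers while link 2 fails: only the state table changes.
state.update(link4=True, link2=False)
print(second_group(first_group, state))  # ['link1', 'link3', 'link4']

# Link 4 still faulty and link 2 also fails.
state.update(link4=False)
print(second_group(first_group, state))  # ['link1', 'link3']
```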
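Step 103 can also be implemented in multiple ways. In one possible case, the second decision mode is the same as the first decision mode; for example, it also uses the CRC algorithm described above, with either the same generator polynomial or a different one. Using a different generator polynomial avoids hash polarization. In another possible case, the second decision mode differs from the first. For example, the first decision mode uses the CRC algorithm described above, while the second decision mode uses an XOR operation or a direct modulo calculation.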
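The text does not define polarization precisely. The sketch below shows one way it can appear when both decisions reuse the same hash. The scenario is an assumption: link 2 and link 4 of a 4-link first group have both failed. Flows whose first choice (hash mod 4) is link 2 or link 4 have an odd hash. Reusing that hash mod 2 therefore always picks the second available link, link 3. A second CRC-8 with a different generator polynomial spreads those flows over link 1 and link 3. 0xD5 is the h(x) of the worked example. The second polynomial 0x07 is an assumption chosen for illustration.

```python
from __future__ import annotations


def crc8(data: bytes, poly: int) -> int:
    """MSB-first CRC-8 with initial value 0, no reflection and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


POLY_FIRST = 0xD5   # x^8+x^7+x^6+x^4+x^2+1, the h(x) of the worked example
POLY_SECOND = 0x07  # x^8+x^2+x+1, an assumed "different" generator polynomial

LINKS = ["link1", "link2", "link3", "link4"]
UP = ["link1", "link3"]  # hypothetical state: link2 and link4 have both failed


def pick_link(key: bytes, second_poly: int) -> str:
    first = LINKS[crc8(key, POLY_FIRST) % len(LINKS)]  # first decision, all links
    if first in UP:
        return first
    return UP[crc8(key, second_poly) % len(UP)]        # second decision, available links


def rehashed_targets(second_poly: int) -> set[str]:
    """Links that receive the flows whose first choice was a failed link."""
    keys = [i.to_bytes(4, "big") for i in range(1000)]
    return {
        pick_link(k, second_poly)
        for k in keys
        if LINKS[crc8(k, POLY_FIRST) % len(LINKS)] not in UP
    }


print(sorted(rehashed_targets(POLY_FIRST)))   # ['link3']
print(sorted(rehashed_targets(POLY_SECOND)))  # ['link1', 'link3']
```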
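Because all links in the second group of member links are available, the determined second member link is also available, and step 104 can be performed: send the packet through the second member link.
The following specific example illustrates the implementation of the data transmission method in the embodiments of the present invention and compares it with the prior-art data transmission method.
Assume that initially all 4 links of the network element are available. In the prior art, links are selected according to the above CRC algorithm; the selection results are shown in Table 1.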
(Table 1 is available only as an image in the original publication.)
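Table 1
In the prior art, the CRC result is taken modulo the total number of available links. Initially, no member link has failed, so the total number of available member links is 4. The results of taking the CRC results of the 8 packet flows modulo the number of available member links are the link numbers in the 4th column of Table 1, where link numbers 0 to 3 correspond to link 1 to link 4, respectively.
At a later moment, link 4 fails. With the prior-art method, the total number of available links becomes 3, so all packet flows are recalculated by taking the CRC result modulo 3; the results are the link numbers in the 6th column of Table 1.
To compare the flow distribution on each link before and after the link failure more clearly, refer to Table 2 and Table 3. Table 2 shows the flow distribution on each link before the failure, and Table 3 shows the flow distribution on each link after the failure.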
Link 1 | Link 2 | Link 3 | Link 4
Flow 2 | Flow 4 | Flow 5 | Flow 1
Flow 7 | Flow 8 |        | Flow 3
       |        |        | Flow 6
Table 2
Link 1 | Link 2 | Link 3 | Link 4
Flow 7 | Flow 1 | Flow 2 |
       | Flow 5 | Flow 3 |
       | Flow 6 | Flow 4 |
       | Flow 8 |        |
Table 3
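Table 3 and Table 2 show that before link 4 fails, packets of flow 2 are forwarded over link 1, whereas after link 4 fails, subsequent packets of flow 2 are forwarded over link 3. Likewise, before link 4 fails, packets of flow 5 are forwarded over link 3, whereas after the failure, subsequent packets of flow 5 are forwarded over link 2. That is, after a link fails, the flows that were on the other, non-failed links before the failure are redistributed. Similarly, when link 4 recovers to the available state, the flow distribution changes back from the state in Table 3 to that in Table 2, and the flows on the non-failed links are redistributed again. This can easily affect services, for example, through congestion, packet loss during convergence, or broken session consistency.
With the transmission method in the embodiments of the present invention, when link 4 fails, the first decision is still made in the first group of member links: the CRC result is taken modulo the number of all member links, both available and unavailable, for example, still modulo 4. The result is therefore the same as before link 4 failed. Only if the first decision yields the failed link 4 is the calculation repeated over the remaining available member links, that is, the CRC result is taken modulo 3, to obtain a new result. Table 4 compares the results calculated with the transmission method in the embodiments of the present invention after the link failure with the results before the failure.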
(Table 4 is available only as an image in the original publication.)
Table 4
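For example, as shown in Table 4, the first CRC result is 7; 7 modulo 4 gives link number 3, that is, link 4. Because link 4 is found to have failed, the CRC result is recalculated and taken modulo 3, giving link number 0, that is, link 1. In other words, the packet flow originally directed to link 4 is redirected to link 1. Likewise, the other packet flows originally directed to link 4 are redirected to other non-failed links by the second decision.
For a flow such as flow 2, whose CRC result is 8, the first decision already selects link 1 for forwarding. Because link 1 is available, no second decision is needed, so even with link 4 failed the result is still link 1, that is, link number 0. Likewise, all other flows that were distributed on non-failed links before link 4 failed remain on their original links after the failure. To see the decision results more clearly, refer to Table 2 and Table 5, where Table 5 shows the flow distribution on each link after link 4 fails.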
Link 1 | Link 2 | Link 3 | Link 4
Flow 2 | Flow 4 | Flow 5 |
Flow 7 | Flow 8 | Flow 6 |
Flow 1 | Flow 3 |        |
Table 5
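Comparing Table 5 with Table 2 shows that after link 4 fails, the flows originally on link 4 are redistributed to the other, non-failed links, while the flows that were already on the non-failed links before link 4 failed remain on their original links. Similarly, when link 4 recovers from the failed state to the available state, the distribution simply returns to the state before link 4 failed, that is, the state in Table 2, without redistributing the original flows on the other, non-failed links.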
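The comparison can be replayed with a small simulation. The hash values below are hypothetical. They were chosen only so that their residues mod 4 and mod 3 reproduce the assignments in Tables 2, 3, and 5. The values 7 and 8 also appear in the text. They are not the values of Table 1, which survives only as an image. `prior_art` and `two_stage` are illustrative names.

```python
from __future__ import annotations

from typing import Callable

LINKS = ["link1", "link2", "link3", "link4"]
# Hypothetical hash values: residues mod 4 / mod 3 match Tables 2, 3 and 5.
FIRST = {"flow1": 7, "flow2": 8, "flow3": 11, "flow4": 5,
         "flow5": 10, "flow6": 19, "flow7": 12, "flow8": 13}
SECOND = {"flow1": 3, "flow3": 4, "flow6": 5}  # only flows whose first choice is link4


def prior_art(flow: str, up: list[str]) -> str:
    return up[FIRST[flow] % len(up)]         # modulo over available links only


def two_stage(flow: str, up: list[str]) -> str:
    first = LINKS[FIRST[flow] % len(LINKS)]  # modulo over ALL configured links
    if first in up:
        return first
    return up[SECOND[flow] % len(up)]        # second decision, available links only


def distribution(select: Callable[[str, list[str]], str], up: list[str]) -> dict[str, list[str]]:
    table: dict[str, list[str]] = {link: [] for link in LINKS}
    for flow in FIRST:
        table[select(flow, up)].append(flow)
    return table


all_up, link4_down = LINKS, LINKS[:3]
print(distribution(prior_art, all_up))      # Table 2
print(distribution(prior_art, link4_down))  # Table 3: flows 2, 5, 7 ... move
print(distribution(two_stage, link4_down))  # Table 5: only link4's flows move
print(distribution(two_stage, all_up))      # after recovery: Table 2 again
```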
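As the above description shows, with the method in the embodiments of the present invention, when a link fails or recovers from a failure, the entire process does not affect the distribution of the flows that the first decision places on other links, so services are not impaired.
Based on the same inventive concept, an embodiment of the present invention further provides a network element (as shown in FIG. 2), configured to implement any one of the foregoing methods.
Specifically, the processor 10 is configured to receive a packet through the receiver 30 or generate a packet, and further configured to: determine a first member link in a first group of member links according to a first decision mode; and if the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, the first group of member links includes the second group of member links and the unavailable member links, and each member link corresponds to a port 50. The transmitter 20 is configured to send the packet through the port 50 corresponding to the second member link.
Optionally, the processor 10 is configured to: obtain flow information of the packet, where the flow information describes the packet; perform hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
Optionally, the processor 10 is configured to perform hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
Optionally, the processor 10 is configured to: obtain the flow information; perform hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
Optionally, the processor 10 is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.
Based on the same inventive concept, an embodiment of the present invention further provides a data transmission apparatus. As shown in FIG. 5, the data transmission apparatus includes: an obtaining unit 201, configured to obtain a packet; a processing unit 202, configured to determine a first member link in a first group of member links according to a first decision mode, and if the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, where all member links in the second group of member links are available, and the first group of member links includes the second group of member links and the unavailable member links; and a sending unit 203, configured to send the packet through the second member link.
Optionally, the processing unit 202 is configured to: obtain flow information of the packet, where the flow information describes the packet; perform hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
Optionally, the processing unit 202 is configured to perform hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
Optionally, the processing unit 202 is configured to: obtain the flow information; perform hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
Optionally, the processing unit 202 is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.
The variations and specific examples of the data transmission method in the foregoing embodiments also apply to the data transmission apparatus in this embodiment and to the network element in FIG. 2. From the foregoing detailed description of the data transmission method, a person skilled in the art can clearly understand how to implement the data transmission apparatus in this embodiment and the network element in FIG. 2; for brevity, details are not repeated here.
A person skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a hardware-only embodiment, a software-only embodiment, or an embodiment combining software and hardware. Moreover, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present invention is described with reference to the flowcharts and/or block diagrams of the method, the device (system), and the computer program product according to the embodiments of the present invention. It should be understood that computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, a person skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention. The present invention is intended to cover these modifications and variations provided that they fall within the scope of the claims of the present invention and their equivalent technologies.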

Claims (15)

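  1. A data transmission method, comprising:
    receiving a packet;
    determining a first member link in a first group of member links according to a first decision mode;
    if the first member link is unavailable, determining a second member link in a second group of member links according to a second decision mode, wherein all member links in the second group of member links are available, and the first group of member links comprises the second group of member links and the unavailable member links; and
    sending the packet through the second member link.
  2. The method according to claim 1, wherein the determining a first member link in a first group of member links according to a first decision mode comprises:
    obtaining flow information of the packet, wherein the flow information describes the packet;
    performing hash calculation on the flow information; and
    determining the first member link in the first group of member links according to the calculated hash value.
  3. The method according to claim 2, wherein the performing hash calculation on the flow information comprises:
    performing hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
  4. The method according to claim 3, wherein the determining a second member link in a second group of member links according to a second decision mode comprises:
    obtaining the flow information;
    performing hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and
    determining the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
  5. The method according to claim 2, wherein the determining the first member link in the first group of member links according to the calculated hash value comprises:
    determining the first member link according to the calculated hash value and the total number of links in the first group of member links.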
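  6. A network element, comprising:
    a receiver;
    ports;
    a processor, configured to receive a packet through the receiver or generate a packet, and further configured to: determine a first member link in a first group of member links according to a first decision mode; and if the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, wherein all member links in the second group of member links are available, the first group of member links comprises the second group of member links and the unavailable member links, and each member link corresponds to one of the ports; and
    a transmitter, configured to send the packet through the port corresponding to the second member link.
  7. The network element according to claim 6, wherein the processor is configured to: obtain flow information of the packet, wherein the flow information describes the packet; perform hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
  8. The network element according to claim 7, wherein the processor is configured to perform hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
  9. The network element according to claim 8, wherein the processor is configured to: obtain the flow information; perform hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
  10. The network element according to claim 7, wherein the processor is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.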
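  11. A data transmission apparatus, comprising:
    an obtaining unit, configured to obtain a packet;
    a processing unit, configured to determine a first member link in a first group of member links according to a first decision mode, and if the first member link is unavailable, determine a second member link in a second group of member links according to a second decision mode, wherein all member links in the second group of member links are available, and the first group of member links comprises the second group of member links and the unavailable member links; and
    a sending unit, configured to send the packet through the second member link.
  12. The apparatus according to claim 11, wherein the processing unit is configured to: obtain flow information of the packet, wherein the flow information describes the packet; perform hash calculation on the flow information; and determine the first member link in the first group of member links according to the calculated hash value.
  13. The apparatus according to claim 12, wherein the processing unit is configured to perform hash calculation on the flow information according to a first cyclic redundancy check (CRC) algorithm.
  14. The apparatus according to claim 13, wherein the processing unit is configured to: obtain the flow information; perform hash calculation on the flow information according to a second CRC algorithm different from the first CRC algorithm; and determine the second member link in the second group of member links according to the hash value calculated with the second CRC algorithm.
  15. The apparatus according to claim 12, wherein the processing unit is configured to determine the first member link according to the calculated hash value and the total number of links in the first group of member links.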
PCT/CN2017/076966 2016-06-22 2017-03-16 Data transmission method and apparatus, and network element WO2017219719A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP17814458.0A EP3477893B1 (en) 2016-06-22 2017-03-16 A data transmission method and device, and network element
US16/225,385 US10904139B2 (en) 2016-06-22 2018-12-19 Data transmission method and apparatus and network element

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610459884.6A CN107528711B (zh) 2016-06-22 2016-06-22 Data transmission method and apparatus, and network element
CN201610459884.6 2016-06-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/225,385 Continuation US10904139B2 (en) 2016-06-22 2018-12-19 Data transmission method and apparatus and network element

Publications (1)

Publication Number Publication Date
WO2017219719A1 true WO2017219719A1 (zh) 2017-12-28

Family

ID=60735407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/076966 WO2017219719A1 (zh) 2016-06-22 2017-03-16 Data transmission method and apparatus, and network element

Country Status (4)

Country Link
US (1) US10904139B2 (zh)
EP (1) EP3477893B1 (zh)
CN (2) CN107528711B (zh)
WO (1) WO2017219719A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220255847A1 (en) * 2019-10-28 2022-08-11 Huawei Technologies Co., Ltd. Packet Sending Method and First Network Device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11750531B2 (en) * 2019-01-17 2023-09-05 Ciena Corporation FPGA-based virtual fabric for data center computing
CN113472700B (zh) * 2021-09-01 2022-02-25 Alibaba Cloud Computing Co., Ltd. Packet processing method, device, storage medium, and network card

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
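CN102098740A (zh) * 2011-02-15 2011-06-15 ZTE Corporation Link aggregation route selection method and apparatus
CN102263697A (zh) * 2011-08-03 2011-11-30 Hangzhou H3C Technologies Co., Ltd. Aggregated link traffic sharing method and apparatus
US20120087372A1 (en) * 2009-10-08 2012-04-12 Force 10 Networks, Inc. Link Aggregation Based on Port and Protocol Combination
CN102685007A (zh) * 2012-05-04 2012-09-19 Huawei Technologies Co., Ltd. Method and apparatus for processing a member link in a multi-link bundle link group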
US20140198656A1 (en) * 2013-01-15 2014-07-17 Brocade Communications Systems, Inc. Adaptive link aggregation and virtual link aggregation

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8264953B2 (en) * 2007-09-06 2012-09-11 Harris Stratex Networks, Inc. Resilient data communications with physical layer link aggregation, extended failure detection and load balancing
US9298732B2 (en) * 2010-09-29 2016-03-29 Red Hat, Inc. Searching cloud-based distributed storage resources using a set of expendable probes
CN102158398A (zh) * 2011-02-25 2011-08-17 Hangzhou H3C Technologies Co., Ltd. Packet forwarding method and apparatus
US20120230194A1 (en) * 2011-03-11 2012-09-13 Broadcom Corporation Hash-Based Load Balancing in Large Multi-Hop Networks with Randomized Seed Selection
US20130003549A1 (en) 2011-06-30 2013-01-03 Broadcom Corporation Resilient Hashing for Load Balancing of Traffic Flows
CN102577280B (zh) * 2011-11-28 2014-05-21 Huawei Technologies Co., Ltd. Method, apparatus, and system for sending packets
US8553552B2 (en) * 2012-02-08 2013-10-08 Radisys Corporation Stateless load balancer in a multi-node system for transparent processing with packet preservation
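CN102857419B (zh) * 2012-10-12 2015-07-22 Huawei Technologies Co., Ltd. Method and apparatus for handling link aggregation port faults
CN103236975B (zh) * 2013-05-09 2017-02-08 Hangzhou H3C Technologies Co., Ltd. Packet forwarding method and apparatus
CN103401801A (zh) * 2013-08-07 2013-11-20 Centec Networks (Suzhou) Co., Ltd. Method and apparatus for implementing dynamic load balancing
CN104703222B (zh) * 2013-12-10 2018-06-15 Huawei Technologies Co., Ltd. Data transmission method and router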


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3477893A4 *


Also Published As

Publication number Publication date
CN107528711B (zh) 2021-08-20
EP3477893B1 (en) 2021-08-18
US20190123997A1 (en) 2019-04-25
US10904139B2 (en) 2021-01-26
CN107528711A (zh) 2017-12-29
CN113852561A (zh) 2021-12-28
EP3477893A4 (en) 2019-05-01
EP3477893A1 (en) 2019-05-01

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17814458

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2017814458

Country of ref document: EP

Effective date: 20190122