WO2015149374A1 - Method and system for distributing network data in a many-core environment - Google Patents

Method and system for distributing network data in a many-core environment

Info

Publication number
WO2015149374A1
WO2015149374A1 PCT/CN2014/074868
Authority
WO
WIPO (PCT)
Prior art keywords
core
host system
state
flow table
memory
Prior art date
Application number
PCT/CN2014/074868
Other languages
English (en)
French (fr)
Inventor
唐继元
王伟
蔡毅
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2014/074868 priority Critical patent/WO2015149374A1/zh
Priority to CN201480000856.7A priority patent/CN105164980B/zh
Publication of WO2015149374A1 publication Critical patent/WO2015149374A1/zh

Links

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 — Data switching networks
    • H04L 12/64 — Hybrid switching systems
    • H04L 12/6418 — Hybrid transport

Definitions

  • the invention belongs to the field of communication technologies, and in particular relates to a method and system for distributing network data under a multi-core.
  • RSS has the following disadvantages: it cannot guarantee that data is distributed to the core where the application is located; it cannot cooperate with software partitioning techniques; and it cannot cooperate with OS (Operating System) scheduling, load balancing, and so on.
  • to compensate, multi-queue NICs introduce the Flow Director Table, a guide flow table dedicated to recording the correspondence between a network flow and the CPU where its application is located; the multi-queue NIC first performs a flow-table match and then performs RSS distribution.
  • the guide flow table resides in NIC memory as a hash table; each entry corresponds to a filter, and a matched filter delivers the RX packet to the corresponding hardware queue; the hash table can hold 8k to 32k entries.
  • add, delete, and update operations on flow-table entries are all performed by the network card: the host system (an application or the OS) sends a request to the NIC through the multi-queue NIC's register interface, and the NIC completes the operation and returns the result to the host system.
  • the interaction between the host system and the NIC is expensive. For example, the NIC driver cannot delete a single entry in the flow table because it cannot tell whether that entry's connection is active, so the driver must periodically flush the entire flow table. Test results show that scheduling the kernel to run the flush operation takes 80,000 cycles (about 40 µs), and flushing the entire flow table takes a further 70,000 cycles (about 35 µs). If the driver does not adopt a policy of periodically flushing the whole table, it must instead spend considerable time periodically checking whether each entry is active, so that filters for long-inactive entries can be removed from the flow table.
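As a sanity check on the figures above: the quoted cycle counts and times are mutually consistent if one assumes a 2 GHz core clock (the 2 GHz figure is our inference, not stated in the text).

```python
def cycles_to_us(cycles: int, clock_ghz: float) -> float:
    """Convert a CPU cycle count to microseconds at the given clock rate."""
    return cycles / (clock_ghz * 1000.0)  # cycles per microsecond = GHz * 1000

# Both quoted pairs (80,000 cycles ~ 40 us; 70,000 cycles ~ 35 us)
# line up with a 2 GHz clock.
print(cycles_to_us(80_000, 2.0))  # 40.0
print(cycles_to_us(70_000, 2.0))  # 35.0
```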
  • the embodiment of the invention provides a method for distributing network data under a multi-core to solve the problem of large interaction cost between the host system and the network card in the prior art.
  • a first aspect: a method for distributing network data in a many-core environment, comprising:
  • the host system maintains a flow table of the network card, and the flow table of the network card is stored in an independent memory block in the network card;
  • a memory area of the flow table in the host system and the independent memory block are mapped by a direct memory access DMA controller in the host system;
  • after the DMA controller detects that the flow table in host system memory has changed, it maps the changed flow table into the NIC's independent memory to complete maintenance of the NIC's flow table; the host system is prohibited from sending flow-table maintenance commands to the network card.
  • the table entry (Table Entry) of the flow table includes three fields: the flow state Flow State, the delay time Relay Time, and the pending packet pointer Pending Packets Ptr;
  • the field Flow State indicates the current state of the flow; the states include a transmission state, a migration state, and a closed state;
  • the field Relay Time indicates the time by which distribution must be delayed when the flow is in the migration state;
  • the field Pending Packets Ptr indicates, when the flow is in the migration state, the order of the received packets in the pending-packet pointer list.
  • the host system maintaining the shared flow table includes:
  • when an application owning a flow in the network migrates, the host system modifies the table entry of the corresponding flow in the flow table in host system memory: it saves the core identifier of the new core in the table entry and changes the state of the migrated flow to the migration state;
  • the host system determines whether the current queue load of the old core is smaller than that of the new core; when it is, the host system delays for a time t and then distributes the received packets of the migrated flow to the new core for processing, and saves t in the Relay Time field;
  • the old core is a core that executes the application before the migration occurs
  • the new core is a core that executes the application after the migration occurs.
  • the calculation of t is:
  • t = (old core current load − new core current load) / (number of packets processed by the protocol stack per unit time).
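The formula above can be sketched directly; the function name and the load values below are illustrative assumptions, not from the patent.

```python
def migration_delay(old_core_load: int, new_core_load: int,
                    packets_per_unit_time: int) -> float:
    """Delay t before redirecting a migrated flow's packets to the new core:
    t = (old core current load - new core current load)
        / packets processed by the protocol stack per unit time.
    """
    return (old_core_load - new_core_load) / packets_per_unit_time

# Hypothetical queue depths: old core has 600 queued packets, new core 100,
# and the protocol stack drains 50 packets per unit time.
t = migration_delay(600, 100, 50)
print(t)  # 10.0
```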
  • a second aspect: a method for sending network data in a many-core environment, comprising:
  • the network card receives the network data, and distributes the network data according to the flow table stored in the independent memory block of the network card;
  • the independent memory block is mapped to a memory area of a flow table of the host system by a direct memory access DMA controller in the host system, and the flow table is maintained by the host system.
  • a third aspect is a host system, the host system comprising:
  • a maintenance unit configured to maintain a flow table of the network card, where the flow table of the network card is stored in an independent memory block in the network card;
  • a memory area of the flow table in the host system and the independent memory block are mapped by a direct memory access DMA controller in the host system;
  • a DMA controller configured to: after detecting a change in a flow table of the host system memory, mapping a flow table of the changed host system memory to the independent memory of the network card to complete maintenance of the flow table of the network card;
  • the prohibition unit is configured to prohibit sending a flow table maintenance command to the network card.
  • the table entry (Table Entry) of the flow table includes three fields: the flow state Flow State, the delay time Relay Time, and the pending packet pointer Pending Packets Ptr;
  • the field Flow State indicates the current state of the flow; the states include a transmission state, a migration state, and a closed state;
  • the field Relay Time indicates the time by which distribution must be delayed when the flow is in the migration state;
  • the field Pending Packets Ptr indicates, when the flow is in the migration state, the order of the received packets in the pending-packet pointer list.
  • the maintenance unit includes:
  • a migration modification module, configured to modify the table entry of the corresponding flow in the flow table in host system memory when an application owning a flow in the network migrates;
  • the core identifier of the new core is saved in the table entry, and the state of the migrated flow is changed to the migration state;
  • a judgment delay module, configured to determine whether the current queue load of the old core is smaller than that of the new core; when it is, after a delay of time t, the received packets of the migrated flow are distributed to the new core for processing, and t is saved in the Relay Time field;
  • the old core is a core that executes the application before the migration occurs
  • the new core is a core that executes the application after the migration occurs.
  • the t is:
  • t = (old core current load − new core current load) / (number of packets processed by the protocol stack per unit time).
  • a fourth aspect: a network card, the network card comprising:
  • a receiving unit configured to receive network data
  • a flow forwarding unit configured to distribute the network data according to a flow table stored in the independent memory block of the network card
  • the independent memory block is mapped to a memory area of a flow table of the host system by a direct memory access DMA controller in the host system, and the flow table is maintained by the host system.
  • a fifth aspect: a host system comprising a processor, a memory, a communication port, a bus, and a direct memory access DMA controller, wherein the processor, the memory, the communication port, and the DMA controller are all connected through the bus;
  • the communication port is configured to receive network data
  • the memory is configured to store a flow table; a memory area where the flow table stored by the memory is located and a network card independent memory block form a mapping by the DMA controller;
  • the processor is configured to maintain a flow table of the network card, where the flow table of the network card is stored in an independent memory block in the network card;
  • the DMA controller is configured to: after detecting a change in the flow table stored in the memory, map the changed flow table into the NIC's independent memory to complete maintenance of the NIC's flow table;
  • the processor is configured to prohibit sending a flow table maintenance command to the network card.
  • the table entry (Table Entry) of the flow table includes three fields: the flow state Flow State, the delay time Relay Time, and the pending packet pointer Pending Packets Ptr;
  • the field Flow State indicates the current state of the flow; the states include a transmission state, a migration state, and a closed state;
  • the field Relay Time indicates the time by which distribution must be delayed when the flow is in the migration state;
  • the field Pending Packets Ptr indicates, when the flow is in the migration state, the order of the received packets in the pending-packet pointer list.
  • when an application owning a flow in the network migrates, the processor modifies the table entry of the corresponding flow in the flow table in the memory: it saves the core identifier of the new core in the table entry, changes the state of the migrated flow to the migration state, and determines whether the current queue load of the old core is smaller than that of the new core;
  • when the current queue load of the old core is smaller than that of the new core, after a delay of time t the received packets of the migrated flow are distributed to the new core for processing, and t is saved in the Relay Time field.
  • the old core is a core that executes the application before the migration occurs
  • the new core is a core that executes the application after the migration occurs.
  • t = (old core current load − new core current load) / (number of packets processed by the protocol stack per unit time).
  • a sixth aspect: a network card, comprising: a logic processing module, a storage, a memory, a communication port, and a bus, wherein the logic processing module, the storage, the communication port, and the memory are connected through the bus;
  • the communication port is configured to receive network data
  • the memory is used to store a flow table
  • the logic processing module is configured to distribute the network data according to a flow table stored in the NIC memory;
  • the memory is mapped to a memory area of a flow table of the host system by a direct memory access DMA controller in the host system, and the flow table is maintained by the host system.
  • the embodiment of the present invention has the beneficial effects that the technical solution of the present invention improves the network parallel processing capability and increases the network throughput.
  • the NIC flow table is maintained entirely by the host system, which reduces maintenance time overhead, improves the flexibility and efficiency of NIC flow-table maintenance, and lays the groundwork for fully exploiting the NIC flow-table function; the host system can update a connection's filter according to the protocol connection state;
  • this improves the flow-table (filter) hit rate and makes it convenient for network applications to use the flow table to steer flow data to the core running the application, which helps reduce contention for shared resources, minimizes software synchronization overhead, increases the cache hit ratio, and provides a transparent flow-steering mechanism for upper-layer network applications; when an application migrates, the real-time load of the two cores' receive queues is used to calculate the timeout period, reducing the delayed-distribution time of data packets and further improving network throughput.
  • FIG. 1 is a flowchart of a method for distributing network data under a plurality of cores according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a TCP packet provided by an embodiment of the present invention.
  • FIG. 3 is a structural block diagram of a host system according to an embodiment of the present invention.
  • FIG. 4 is a structural block diagram of a network card according to an embodiment of the present invention.
  • FIG. 5 is a hardware structural diagram of a host system according to an embodiment of the present invention.
  • FIG. 6 is a hardware structural diagram of a network card according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram of TCP packet out-of-order arrival according to an embodiment of the present invention.
  • DMA (Direct Memory Access): the DMA controller is a device dedicated to transferring data inside the system; it can be thought of as a controller that connects internal and external memory to each DMA-capable peripheral through a dedicated bus.
  • the DMA controller can be set up separately or integrated into the host system or network card.
  • a specific embodiment of the present invention provides a method for distributing network data under a plurality of cores.
  • the method is as shown in FIG. 1 and includes the following steps:
  • the host system maintains a flow table of the network card; the flow table of the network card is stored in an independent memory block in the network card;
  • the memory area of the flow table in the host system and the independent memory block are mapped by a direct memory access DMA controller in the host system;
  • after the DMA controller in the host system detects a change in the flow table in host system memory, it maps the changed flow table into the NIC's independent memory to complete maintenance of the NIC's flow table; the host system is prohibited from sending flow-table maintenance commands to the network card.
  • the technical solution provided by this embodiment shares the flow table in the NIC's independent memory block with the host system, so that the host system maintains the NIC's flow table entirely by itself; this reduces the time overhead of flow-table maintenance, improves the flexibility and efficiency of NIC flow-table maintenance, and lays the groundwork for fully exploiting the NIC flow-table function. Because the flow table is not stored in the NIC's data memory area but in a separate memory block, that block can be independently memory-mapped into the host system's memory area, so the flow table in the independent memory block can be shared on its own.
  • the host system can then maintain the flow table directly (including filter lookup, addition, deletion, and refresh operations) without NIC involvement, so the method provided by the present invention keeps the interaction between the host system and the NIC small.
  • because the host system can operate on the flow table directly, the flow table can be updated according to the state of connections in the protocol stack; the technical solution provided by the present invention therefore has the advantage of low host–NIC interaction overhead.
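The maintenance path above can be sketched as a toy model: the host edits only its own copy of the flow table, and a DMA-like mirror step propagates the change to the NIC's independent memory block, with no command ever sent to the NIC. All names here are illustrative; in a real system the DMA mapping is set up by the driver, not by application code.

```python
host_flow_table = {}   # flow table in host system memory
nic_flow_table = {}    # stands in for the NIC's independent memory block

def dma_mirror():
    """Models the DMA controller detecting a change and re-mapping the
    host-side table into the NIC's independent memory."""
    nic_flow_table.clear()
    nic_flow_table.update(host_flow_table)

def host_update(flow_id, core_id):
    """Host-side add/update of a filter: operates only on host memory."""
    host_flow_table[flow_id] = {"core": core_id, "state": "NORMAL"}
    dma_mirror()

def host_delete(flow_id):
    """Host-side delete, e.g. when a connection reaches a closed state."""
    host_flow_table.pop(flow_id, None)
    dma_mirror()

host_update(flow_id=42, core_id=3)   # NIC copy now holds flow 42
host_delete(flow_id=42)              # NIC copy is empty again
print(nic_flow_table)  # {}
```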
  • the table entry (English: Table Entry) of the above flow table may include the following fields:
  • Flow table identification field (English: Flow ID Field); used to identify the flow;
  • Filter operation (English: Filter Action); used to record the identity of the core that operates the stream;
  • Collision Flag (English: Collision Flag); used to identify whether a stream conflicts;
  • Flow state (English: Flow State); records the current state of the flow; specifically, the following three states can be set (in practice, more states can be defined):
  • CLOSED: closed state;
  • NORMAL: normal (transmission) state;
  • TRANSITIONAL: migration state, entered while the application owning the flow migrates between cores (the old core being the core that executed the application before the migration, the new core the core that executes it after);
  • Delay time (English: Relay Time); records the time t by which distribution must be delayed when Flow State is TRANSITIONAL;
  • Pending packet pointer (English: Pending Packets Ptr); records, when Flow State is TRANSITIONAL, the order of the received packets (English: Receive Packets, abbreviated RX Packets) in the pending-packet pointer list; the RX Packets in Pending Packets Ptr keep their original sequence numbers;
  • Next filter pointer (English: Next Filter PTR); records the identifier of the next filter.
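The entry layout above can be sketched as a data structure. The field types and the enum encoding are assumptions for illustration, since the text only names the fields:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional

class FlowState(Enum):
    CLOSED = 0        # connection closed
    NORMAL = 1        # normal transmission
    TRANSITIONAL = 2  # flow is being migrated to another core

@dataclass
class TableEntry:
    flow_id: int                       # Flow ID Field: identifies the flow
    filter_action: int                 # identifier of the core handling the flow
    collision_flag: bool = False      # Collision Flag: whether the flow collides
    flow_state: FlowState = FlowState.NORMAL
    relay_time: float = 0.0            # Relay Time: delay t while TRANSITIONAL
    pending_packets: List[int] = field(default_factory=list)  # Pending Packets Ptr
    next_filter: Optional[int] = None  # Next Filter PTR: next filter identifier

entry = TableEntry(flow_id=1, filter_action=3)
print(entry.flow_state.name)  # NORMAL
```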
  • the original application data is divided into 4 TCP (Transmission Control Protocol) packets, whose TCP-header sequence numbers increase monotonically.
  • depending on the routing algorithm (for example, shortest path first), the four packets may reach the destination host over different paths, which can cause them to arrive out of order.
  • for example, intermediate slice 1 and intermediate slice 2 may arrive before the first slice, and intermediate slice 2 before intermediate slice 1, so the first slice, intermediate slice 1, and intermediate slice 2 are out of order.
  • the technical solution for maintaining the shared flow table may include:
  • the host system determines whether the current queue load of the old core is smaller than that of the new core; when it is, the host system delays for a time t and then distributes the migrated flow's RX Packets to the new core for processing, and saves t in Relay Time.
  • the flow table shared with the host system as described above can also support two further functions: Filter addition and Filter deletion; their implementations are described below.
  • the host system can create a filter for a connection according to a given policy and add the filter to the flow table.
  • one addition policy counts the RX Packets received on a connection; when the count reaches a set threshold, the new connection's filter is added to the flow table.
  • this first addition scheme suits long-lived links; for short-lived links it would cause unnecessary flow-table maintenance, so a different filter-addition scheme applies to short-lived links.
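The packet-count addition policy can be sketched as follows; the threshold value and all names are illustrative assumptions (the text only says "a set threshold"):

```python
ADD_THRESHOLD = 16  # assumed value; the text only specifies "a set threshold"

rx_counts = {}      # per-connection received-packet counters
flow_table = set()  # connections whose filters have been added

def on_rx_packet(conn_id):
    """Count a received packet; install the connection's filter once it
    has received enough packets to look long-lived."""
    rx_counts[conn_id] = rx_counts.get(conn_id, 0) + 1
    if rx_counts[conn_id] == ADD_THRESHOLD:
        flow_table.add(conn_id)

for _ in range(20):
    on_rx_packet("conn-A")   # crosses the threshold: filter installed
for _ in range(3):
    on_rx_packet("conn-B")   # short-lived: never installed
print(flow_table)  # {'conn-A'}
```

This matches the rationale above: short-lived connections never cross the threshold, so they never cost a flow-table update.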
  • the host system can delete the Filter from the flow table according to the deletion policy.
  • the deletion policy may specifically be: when the connection state of a Filter changes from the normal state to a closed state (the closed state may be TIME_WAIT, CLOSE_WAIT, or CLOSED), the host system deletes the Filter from the flow table; alternatively, the host system periodically checks each Filter's connection state.
  • if the connection corresponding to a Filter is abnormally interrupted, the Filter's connection state is changed to the closed state and the Filter is deleted from the flow table.
  • This deletion policy is applicable to the case where the link is abnormally interrupted.
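The two deletion policies can be sketched together: event-driven removal when a connection enters a closed TCP state, and a periodic sweep for abnormally interrupted connections. The state strings mirror the TCP states named above; the sweep mechanics and all names are assumptions.

```python
CLOSED_STATES = {"TIME_WAIT", "CLOSE_WAIT", "CLOSED"}

def on_state_change(flow_table, conn_id, new_state):
    """Event-driven deletion: drop the filter when the connection's
    state moves from normal to a closed state."""
    if new_state in CLOSED_STATES:
        flow_table.pop(conn_id, None)
    else:
        flow_table[conn_id] = new_state

def periodic_sweep(flow_table, probe_alive):
    """Periodic deletion: probe each connection and drop dead ones.
    probe_alive(conn_id) -> bool stands in for the host's liveness check."""
    for conn_id in list(flow_table):
        if not probe_alive(conn_id):
            flow_table.pop(conn_id)

table = {"c1": "ESTABLISHED", "c2": "ESTABLISHED"}
on_state_change(table, "c1", "TIME_WAIT")   # c1 closed normally
periodic_sweep(table, lambda c: c != "c2")  # c2 was abnormally interrupted
print(table)  # {}
```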
  • the technical solution of the above host system maintenance flow table improves the network parallel processing capability and increases the network throughput.
  • the host system maintains the NIC's flow table entirely by itself, which reduces maintenance time overhead, improves the flexibility and efficiency of NIC flow-table maintenance, and lays the groundwork for fully exploiting the NIC flow-table function; the host system can update the filter corresponding to a connection according to the protocol connection state;
  • this improves the flow-table (filter) hit rate and makes it convenient for network applications to use the flow table to steer flow data to the core running the application, which helps reduce contention for shared resources, minimizes software synchronization overhead, increases the cache hit ratio, and provides a transparent flow-steering mechanism for upper-layer network applications;
  • when an application migrates, the real-time load of the two cores' receive queues is used to calculate the timeout period, reducing the delayed-distribution time of data packets and further improving network throughput.
  • a specific embodiment of the present invention further provides a method for transmitting network data under a multi-core, the method comprising:
  • the network card receives the network data, and distributes the network data according to the flow table stored in the independent memory of the network card;
  • the independent memory block is mapped to a memory area of a flow table of the host system by a direct memory access DMA controller in the host system, and the flow table is maintained by the host system.
  • the method shares the flow table data with the host system and maintains the flow table, which can reduce the interaction overhead.
  • the embodiment of the present invention further provides a host system.
  • the host system 300, as shown in FIG. 3, includes:
  • the maintenance unit 301 is configured to maintain a flow table of the network card, where the flow table of the network card is stored in an independent memory block in the network card;
  • the memory area of the flow table in the host system 300 and the independent memory block are mapped by a direct memory access DMA controller in the host system;
  • the DMA controller 302 is configured to: after detecting a change in the flow table in host system memory, map the changed flow table into the NIC's independent memory to complete maintenance of the NIC's flow table;
  • the prohibiting unit 303 is configured to prohibit sending a flow table maintenance command to the network card.
  • the table entry (Table Entry) of the flow table includes three fields: the flow state Flow State, the delay time Relay Time, and the pending packet pointer Pending Packets Ptr;
  • the field Flow State indicates the current state of the flow; the states include a transmission state, a migration state, and a closed state;
  • the field Relay Time indicates the time by which distribution must be delayed when the flow is in the migration state;
  • the field Pending Packets Ptr indicates, when the flow is in the migration state, the order of the received packets in the pending-packet pointer list.
  • the maintenance unit 301 can include:
  • the migration modification module 3011 is configured to modify the table entry of the corresponding flow in the flow table in host system memory when an application owning a flow in the network migrates;
  • the core identifier of the new core is saved in the table entry, and the state of the migrated flow is changed to the migration state;
  • the determining delay module 3012 is configured to determine whether the current queue load of the old core is smaller than that of the new core; when it is, after the host system delays for a time t, the received packets of the migrated flow are distributed to the new core for processing, and t is stored in the Relay Time field;
  • the above-mentioned old core is a core that executes the application before the migration occurs
  • the new core is a core that executes the application after the migration occurs.
  • t = (old core current load − new core current load) / (number of packets processed by the protocol stack per unit time).
  • a specific embodiment of the present invention provides a network card.
  • the network card 400 includes:
  • the receiving unit 401 is configured to receive network data.
  • the flow forwarding unit 402 is configured to distribute the network data according to a flow table stored in the NIC's independent memory block.
  • the independent memory block is mapped to a memory area of a flow table of the host system by a direct memory access DMA controller in the host system, and the flow table is maintained by the host system.
  • a specific embodiment of the present invention provides a host system, as shown in FIG. 5, including: a processor 501, a memory 502, a communication port 503, a bus 504, and a DMA controller 505, wherein the processor 501, the memory 502, the communication port 503, and the DMA controller 505 are all connected through the bus 504; the DMA controller 505 can also be integrated into the processor 501;
  • a communication port 503, configured to receive network data
  • the memory 502 is configured to store the flow table; the memory area of the flow table stored by the memory 502 and the network card independent memory block are mapped by the DMA controller 505;
  • the processor 501 is configured to maintain a flow table of the network card, where the flow table of the network card is stored in an independent memory block in the network card;
  • the DMA controller 505 is configured to: after detecting a change in the flow table stored in the memory 502, map the changed flow table into the NIC's independent memory to complete maintenance of the NIC's flow table;
  • the processor 501 is configured to prohibit sending a flow table maintenance command to the network card.
  • the table entry (Table Entry) of the foregoing flow table includes three fields: the flow state Flow State, the delay time Relay Time, and the pending packet pointer Pending Packets Ptr;
  • the field Flow State indicates the current state of the flow; the states include a transmission state, a migration state, and a closed state;
  • the field Relay Time indicates the time by which distribution must be delayed when the flow is in the migration state;
  • the field Pending Packets Ptr indicates, when the flow is in the migration state, the order of the received packets in the pending-packet pointer list.
  • when an application owning a flow in the network migrates, the processor 501 modifies the table entry of the corresponding flow in the flow table in the memory 502: it saves the core identifier of the new core in the table entry, changes the state of the migrated flow to the migration state, and determines whether the current queue load of the old core is smaller than that of the new core;
  • when the current queue load of the old core is smaller than that of the new core, after a delay of time t the received packets of the migrated flow are distributed to the new core for processing, and t is saved in the Relay Time field.
  • the old core is a core that executes the application before the migration occurs
  • the new core is a core that executes the application after the migration occurs.
  • the above t = (old core current load − new core current load) / (number of packets processed by the protocol stack per unit time).
  • a specific embodiment of the present invention provides a network card, as shown in FIG. 6, including: a logic processing module 601, a storage 602, a memory 605, a communication port 603, and a bus 604, wherein the logic processing module 601, the storage 602, the communication port 603, and the memory 605 are connected through the bus 604;
  • a communication port 603, configured to receive network data
  • a memory 605 for storing a flow table
  • the logic processing module 601 is configured to distribute the network data according to the flow table stored in the NIC memory;
  • the memory 605 forms a mapping with a memory area of the flow table of the host system through a DMA controller within the host system, the flow table being maintained by the host system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The present invention is applicable to the field of communication technologies and provides a method and apparatus for distributing network data in a many-core environment. The method includes: a host system maintains a flow table of a network card, the flow table being stored in an independent memory block inside the network card; the memory area of the flow table in the host system and the independent memory block are mapped to each other by a direct memory access (DMA) controller in the host system; after the DMA controller detects that the flow table in host system memory has changed, it maps the changed flow table into the network card's independent memory to complete maintenance of the network card's flow table; and the host system is prohibited from sending flow-table maintenance commands to the network card. The technical solution provided by the present invention has the advantages of low interaction overhead and improved network throughput.

Description

Method and system for distributing network data in a many-core environment
Technical Field
The present invention belongs to the field of communication technologies, and in particular relates to a method and system for distributing network data in a many-core environment.
Background Art
The rapid development of network communication technology has driven continual growth in network bandwidth and traffic. A rule of thumb states that 1 Mbps of network data requires 1 MHz of CPU processing power; once gigabit and 10-gigabit networks appeared, network data protocol processing began to consume a large amount of CPU time, and the processing capacity of the network protocol stack gradually became the bottleneck of overall system performance. With the arrival of 10 Gbps network adapters, a single-core CPU can no longer fully satisfy the NIC's demands, so CPUs have evolved toward multi-core and many-core designs.
For many-core systems, exploiting CPU parallelism is natural; however, using many cores to raise throughput through a single shared queue requires a software locking mechanism, which can be inefficient. To improve efficiency, RSS (Receive Side Scaling), commonly known as multi-queue NIC technology, was introduced: with support from a multi-queue NIC driver, each queue can be bound via interrupts to a different core, meeting the NIC's needs. RSS, however, has the following drawbacks: it cannot guarantee that data is distributed to the core where the application is located; it cannot cooperate with software partitioning techniques; and it cannot cooperate with OS (Operating System) scheduling, load balancing, and so on. To compensate for these drawbacks, the Flow Director Table was introduced into multi-queue NICs. This guide flow table is dedicated to recording the correspondence between a network flow and the CPU where its application runs; the multi-queue NIC first performs a Flow Director Table match and then performs RSS distribution. The Flow Director Table resides in NIC memory as a hash table; each entry corresponds to a filter, and a matched filter delivers the RX Packet to the corresponding hardware queue. The hash table can hold 8k to 32k entries. Add, delete, and update operations on flow-table entries are all handled by the NIC: the host system (an application or the OS) issues a request through the multi-queue NIC's register interface, and the NIC completes the operation and returns the result to the host system.
In implementing the prior-art technical solutions, the following technical problems were found in the prior art:
The interaction between the host system and the NIC is expensive. For example, the NIC driver cannot delete a single entry in the flow table, because the driver cannot tell whether that entry's connection is active; the driver must therefore periodically flush the entire flow table. Test results show that scheduling the kernel to run the flush operation costs 80,000 cycles (about 40 µs), and flushing the entire flow table costs a further 70,000 cycles (about 35 µs). If the NIC driver does not adopt a policy of periodically flushing the whole table, it must instead spend considerable time periodically checking whether each entry is active, so that filters for long-inactive entries can be removed from the flow table.
Technical Problem
The embodiments of the present invention provide a method for distributing network data in a many-core environment, to solve the prior-art problem of the high cost of interaction between the host system and the network card.
Technical Solution
In a first aspect, a method for distributing network data in a many-core environment includes:
a host system maintains a flow table of a network card, the flow table being stored in an independent memory block inside the network card;
the memory area of the flow table in the host system and the independent memory block are mapped to each other by a direct memory access (DMA) controller in the host system;
after the DMA controller detects that the flow table in host system memory has changed, it maps the changed flow table into the network card's independent memory to complete maintenance of the network card's flow table; the host system is prohibited from sending flow-table maintenance commands to the network card.
In a first possible implementation of the first aspect, the table entry (Table Entry) of the flow table includes three fields: the flow state Flow State, the delay time Relay Time, and the pending packet pointer Pending Packets Ptr;
the field Flow State indicates the current state of the flow; the states include a transmission state, a migration state, and a closed state;
the field Relay Time indicates the time by which distribution must be delayed when Flow State is the migration state;
the field Pending Packets Ptr indicates, when Flow State is the migration state, the order of received packets in the pending-packet pointer list.
With reference to the first possible implementation of the first aspect, in a second possible implementation, the host system maintaining the shared flow table includes:
when an application owning a flow in the network migrates, the host system modifies the table entry of the corresponding flow in the flow table in host system memory: it saves the core identifier of the new core in the table entry and changes the state of the migrated flow to the migration state;
the host system determines whether the current queue load of the old core is smaller than that of the new core; when it is, the host system delays for a time t and then distributes the received packets of the migrated flow to the new core for processing, saving t in the Relay Time field;
the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
With reference to the second possible implementation of the first aspect, in a third possible implementation, t is calculated as:
t = (old core current load − new core current load) / (number of packets processed by the protocol stack per unit time).
第二方面,一种众核下网络数据的发送方法,所述方法包括:
网卡接收网络数据,依据网卡独立内存块内存储的流表对所述网络数据进行流分发;
所述独立内存块通过主机系统内的直接内存访问DMA控制器与所述主机系统的流表的内存区域形成映射,所述流表由所述主机系统维护。
In a third aspect, a host system is provided, the host system including:
a maintenance unit, configured to maintain a NIC's flow table, where the NIC's flow table is stored in a dedicated memory block inside the NIC;
the memory region of the flow table in the host system is mapped to the dedicated memory block through a direct memory access (DMA) controller in the host system;
a DMA controller, configured to, after detecting a change in the flow table in host memory, map the changed flow table in host memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table;
a prohibition unit, configured to prohibit sending flow table maintenance commands to the NIC.
In a first possible implementation of the third aspect, a table entry (Table Entry) of the flow table includes three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
where the Flow State field indicates the state the flow is currently in, the states including a transmitting state, a transitional state, and a closed state;
the Relay Time field indicates the time to delay when the Flow State is the transitional state;
the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
In a second possible implementation of the third aspect, the maintenance unit includes:
a migration modification module, configured to, when the application of a flow in the network migrates, modify the table entry of the migrated flow in the flow table in host memory, store the core identifier of the new core in the table entry, and change the state of the migrated flow to the transitional state;
a determination and delay module, configured to determine whether the current queue load of the old core is smaller than that of the new core, and when the old core's current queue load is smaller than the new core's, delay for a time t before dispatching the migrated flow's received packets to the new core for processing, and store t in the Relay Time field;
the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
In a third possible implementation of the third aspect, t is:
t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
In a fourth aspect, a NIC is provided, the NIC including:
a receiving unit, configured to receive network data;
a flow forwarding unit, configured to perform flow dispatch of the network data according to a flow table stored in the NIC's dedicated memory block;
the dedicated memory block is mapped, through a DMA controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
In a fifth aspect, a host system is provided, the host system including: a processor, a memory, a communication port, a bus, and a direct memory access (DMA) controller, where the processor, the memory, the communication port, and the DMA controller are all connected through the bus;
the communication port, configured to receive network data;
the memory, configured to store a flow table; the memory region where the flow table stored in the memory resides is mapped to the NIC's dedicated memory block through the DMA controller;
the processor, configured to maintain the NIC's flow table, where the NIC's flow table is stored in a dedicated memory block inside the NIC;
the DMA controller, configured to, after detecting a change in the flow table stored in the memory, map the changed flow table stored in the memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table;
the processor, further configured to prohibit sending flow table maintenance commands to the NIC.
In a first possible implementation of the fifth aspect, a table entry (Table Entry) of the flow table includes three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
where the Flow State field indicates the state the flow is currently in, the states including a transmitting state, a transitional state, and a closed state;
the Relay Time field indicates the time to delay when the Flow State is the transitional state;
the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
With reference to the first possible implementation of the fifth aspect, in a second possible implementation, when the application of a flow in the network migrates, the processor modifies the table entry of the migrated flow in the flow table in the memory, stores the new core's identifier in the table entry, and changes the state of the migrated flow to the transitional state; determines whether the old core's current queue load is smaller than the new core's; and when the old core's current queue load is smaller than the new core's, delays for a time t before dispatching the migrated flow's received packets to the new core for processing, and stores t in the Relay Time field;
the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
With reference to the second possible implementation of the fifth aspect, in a third possible implementation, t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
In a sixth aspect, a NIC is provided, the NIC including: a logic processing module, a storage, a memory, a communication port, and a bus, where the logic processing module, the storage, the communication port, and the memory are all connected through the bus;
the communication port, configured to receive network data;
the memory, configured to store a flow table;
the logic processing module, configured to perform flow dispatch of the network data according to the flow table stored in the NIC's memory;
the memory is mapped, through a direct memory access (DMA) controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
Beneficial Effects
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The technical solution of the present invention improves network parallel-processing capability and increases network throughput. Having the host system fully maintain the NIC flow table reduces maintenance time overhead, improves the flexibility and efficiency of NIC flow table maintenance, and lays the groundwork for making full use of the NIC flow table. The host system can promptly update a connection's filter according to the protocol connection state, improving the hit rate of the filters table and making it easy for network applications to use the filters table to direct flow data to the core where the application resides; this helps reduce contention for shared resources, minimizes software synchronization overhead, increases the cache hit rate, and provides a transparent flow-steering mechanism for upper-layer network applications. When an application migrates, the timeout is computed from the real-time loads of the two cores' receive queues, reducing the delayed dispatch time of packets and further increasing network throughput.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for distributing network data on many-core systems according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of TCP segmentation according to an embodiment of the present invention;
Fig. 3 is a structural block diagram of a host system according to an embodiment of the present invention;
Fig. 4 is a structural block diagram of a NIC according to an embodiment of the present invention;
Fig. 5 is a hardware structure diagram of a host system according to an embodiment of the present invention;
Fig. 6 is a hardware structure diagram of a NIC according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of TCP packet reordering according to an embodiment of the present invention.
Embodiments of the Invention
In the following description, specific details such as particular system structures, interfaces, and techniques are set forth for the purpose of illustration rather than limitation, to provide a thorough understanding of the embodiments of the present invention. However, it should be clear to those skilled in the art that the present invention may also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
A DMA (Direct Memory Access) controller is a dedicated device for transferring data within a system; it can be viewed as a controller capable of connecting internal and external memory with every DMA-capable peripheral through a set of dedicated buses. A DMA controller may be provided separately, or integrated into the host system or the NIC.
To illustrate the technical solution of the present invention, specific embodiments are described below.
An embodiment of the present invention provides a method for distributing network data on many-core systems; as shown in Fig. 1, the method includes the following steps:
101. A host system maintains a NIC's flow table, where the NIC's flow table is stored in a dedicated memory block inside the NIC;
the memory region of the flow table in the host system is mapped to the dedicated memory block through a direct memory access (DMA) controller in the host system;
102. After detecting a change in the flow table in host memory, the DMA controller in the host system maps the changed flow table in host memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table; the host system is prohibited from sending flow table maintenance commands to the NIC.
The technical solution provided by this embodiment shares the flow table in the NIC's dedicated memory block with the host system, so the host system fully maintains the flow table in the NIC. This reduces the time overhead of flow table maintenance, improves the flexibility and efficiency of NIC flow table maintenance, and lays the groundwork for making full use of the NIC flow table. Because the flow table is not stored in the NIC's data memory area but separately in its own memory block, which can be a dedicated memory mapped into the host system's memory region, the flow table in the dedicated memory block can be shared with the host system on its own, enabling the host system to maintain the flow table directly (including filter lookup, addition, deletion, flushing, and other operations) without the NIC's involvement. The method provided by the present invention therefore incurs little interaction overhead between the host system and the NIC. Since the host system can operate on the flow table directly, it can update the flow table according to the state of connections in the protocol stack. The technical solution provided by the present invention thus has the advantage of low interaction overhead between the host system and the NIC.
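The host-maintained, DMA-mirrored flow table described above can be sketched as follows. This is an illustrative sketch only, not the patented implementation: all names (`HostFlowTable`, `DmaMirror`) and the version-counter change detection are assumptions introduced for illustration; a real DMA controller would copy memory without software involvement.

```python
# Sketch: the host keeps the authoritative flow table in its own memory; a
# DMA-like mirror propagates every change into a NIC-side memory block, so the
# host never sends flow table maintenance commands to the NIC.
import copy


class HostFlowTable:
    """Flow table living in host memory; bumps a version on every change."""

    def __init__(self):
        self.entries = {}   # flow_id -> identifier of the core handling the flow
        self.version = 0

    def add_filter(self, flow_id, core_id):
        self.entries[flow_id] = core_id
        self.version += 1

    def delete_filter(self, flow_id):
        self.entries.pop(flow_id, None)
        self.version += 1


class DmaMirror:
    """Stands in for the DMA controller: on detecting a change in the host
    table, copies it into the NIC's dedicated memory block."""

    def __init__(self, host_table):
        self.host_table = host_table
        self.nic_memory = {}      # models the NIC's dedicated memory block
        self.synced_version = -1

    def sync_if_changed(self):
        if self.host_table.version != self.synced_version:
            self.nic_memory = copy.deepcopy(self.host_table.entries)
            self.synced_version = self.host_table.version
            return True
        return False


table = HostFlowTable()
dma = DmaMirror(table)
table.add_filter(flow_id=0x1234, core_id=3)
assert dma.sync_if_changed()          # change detected, mirrored to the NIC
assert dma.nic_memory[0x1234] == 3    # NIC now dispatches this flow to core 3
assert not dma.sync_if_changed()      # no further change, no copy
```

The point of the sketch is the direction of authority: the host mutates its own copy freely (no register-interface round trip), and the mirror alone keeps the NIC consistent.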
Optionally, a table entry (Table Entry) in the flow table may include the following fields:
Flow ID Field: identifies the flow;
Filter Action: records the identifier of the core that handles the flow;
Collision Flag: indicates whether the flow collides;
Flow State: records the state the flow is currently in; specifically, the following three states may be defined (more states may of course be defined in practice):
CLOSED: the flow has been closed and is about to be removed from the flow table;
NORMAL: the flow is in the normal state, which may also be called the transmitting state;
TRANSITIONAL: the flow's application is migrating, i.e., moving from the old core to the new core.
The old core may be the core that executed the application before the migration, and the new core may be the core that executes the application after the migration.
Relay Time: records the delay time t when the Flow State is TRANSITIONAL;
Pending Packets Ptr: records, when the Flow State is TRANSITIONAL, the order of received packets (Receive Packets, RX Packets) in the pending packets pointer, where the order of RX Packets in the Pending Packets Ptr matches the sequence numbers of the RX Packets;
Next Filter Ptr: records the identifier of the next filter.
Flow State, Relay Time, and Pending Packets Ptr are the fields newly added to the flow entry by this embodiment of the present invention.
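The table entry layout above can be sketched as a simple data structure. Field names follow the text; the concrete types, the enum encoding of the three states, and the use of a list of sequence numbers for the pending packets pointer are assumptions made for illustration.

```python
# Sketch of a flow table entry with the three fields this embodiment adds
# (Flow State, Relay Time, Pending Packets Ptr) alongside the existing ones.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class FlowState(Enum):
    CLOSED = 0        # flow closed, about to be removed from the table
    NORMAL = 1        # normal (transmitting) state
    TRANSITIONAL = 2  # application migrating from the old core to the new core


@dataclass
class TableEntry:
    flow_id: int                        # Flow ID Field: identifies the flow
    filter_action: int                  # id of the core that handles the flow
    collision_flag: bool = False        # whether the flow collides in the hash table
    flow_state: FlowState = FlowState.NORMAL
    relay_time: float = 0.0             # delay t while TRANSITIONAL
    # RX packet sequence numbers, kept in sequence-number order:
    pending_packets: List[int] = field(default_factory=list)
    next_filter_ptr: Optional[int] = None  # identifier of the next filter


entry = TableEntry(flow_id=42, filter_action=1)
entry.flow_state = FlowState.TRANSITIONAL
entry.pending_packets.extend([101, 102, 103])  # ordered like their sequence numbers
assert entry.pending_packets == sorted(entry.pending_packets)
```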
Regarding the packet reordering problem caused by process migration in the network: as shown in Fig. 2, the original application data is split into four TCP (Transmission Control Protocol) segments, whose TCP-header sequence numbers increase monotonically. The four segments, routed by a routing algorithm (for example, shortest path first), may reach the destination host over different paths and thus arrive out of order. As shown in Fig. 7, middle segments 1 and 2 arrive before the first segment, and middle segment 2 arrives before middle segment 1, so the first segment and middle segments 1 and 2 are out of order. To solve the reordering problem, this embodiment has the host system maintain the shared flow table; specifically, maintaining the shared flow table may include the following.
When a network application migrates (migration in a network can have many causes, for example application migration triggered by OS (Operating System) load balancing), the current core (i.e., the pre-migration core) is usually called the old core, and the core to which the about-to-migrate application's data will immediately be dispatched (i.e., the post-migration core) is called the new core. In the solution provided by this embodiment, when a network application migrates, the host system modifies the Table Entry of the migrated flow in the flow table, stores the new core's identifier in that Table Entry, and changes the state of the migrated flow to the transitional state. Concretely, this may be implemented by setting the Filter Action in the Table Entry to the new core's identifier and setting the Flow State in the Table Entry to TRANSITIONAL.
The host system determines whether the old core's current queue load is smaller than the new core's; when it is, the host system delays for a time t before dispatching the migrated RX Packets to the new core for processing, and stores t in the Relay Time field.
This ensures that the old core has finished processing its pending RX Packets before the new core processes RX Packets, so the processing order of RX Packets matches their sequence numbers and no packet reordering occurs.
The above t may specifically be calculated as: t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time (the unit may be seconds, milliseconds, microseconds, etc.).
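As a worked sketch of the formula with invented numbers: if one core's queue holds 500 packets, the other's holds 200, and the protocol stack processes 100 packets per unit time, the difference of 300 packets drains in t = 300 / 100 = 3 time units. Note that, read literally, the stated condition (old core load smaller than new core load) combined with the stated difference would yield a negative t, so the sketch below takes the magnitude of the difference; this is an interpretation for illustration, not a statement of the patented formula.

```python
# Sketch of t = (old core current load - new core current load)
#               / packets processed by the protocol stack per unit time.
def relay_time(old_core_load, new_core_load, pkts_per_unit_time):
    """Delay before dispatching the migrated flow's RX packets to the new core.

    The magnitude of the load difference is taken here (an interpretation:
    it is treated as the backlog that must drain before dispatch resumes)."""
    return abs(old_core_load - new_core_load) / pkts_per_unit_time


# 300 packets of backlog at 100 packets per unit time -> delay 3 units.
assert relay_time(200, 500, 100) == 3.0
```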
Optionally, the host system maintaining the shared flow table may also implement the following two functions: filter addition and filter deletion. Their implementations are described below.
Filter addition:
When a network connection is established, i.e., after a successful three-way handshake, the connection state becomes normal, and the host system may create a filter for the connection according to a certain policy and add the filter to the flow table. There are two policies for adding filters to the flow table. The first is to add the new connection's filter to the flow table directly when the new connection appears. The other is to count the RX Packets received on the new connection and add its filter to the flow table only when the count reaches a set threshold. The first policy suits long-lived connections; for short-lived connections it incurs unnecessary flow table maintenance overhead, so the second policy is more suitable for short-lived connections.
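The two addition policies can be sketched as follows. The class name, the threshold value, and the flag selecting between policies are assumptions for illustration, not part of the patented solution.

```python
# Sketch of the two filter-addition policies:
#   (a) add the filter immediately when a new connection appears, or
#   (b) add it only after the connection has received a threshold number
#       of RX packets (avoids churn for short-lived connections).
class FilterTable:
    def __init__(self, add_immediately=False, packet_threshold=32):
        self.filters = set()
        self.rx_counts = {}
        self.add_immediately = add_immediately
        self.packet_threshold = packet_threshold

    def on_new_connection(self, conn_id):
        self.rx_counts[conn_id] = 0
        if self.add_immediately:   # policy (a): good for long-lived connections
            self.filters.add(conn_id)

    def on_rx_packet(self, conn_id):
        self.rx_counts[conn_id] = self.rx_counts.get(conn_id, 0) + 1
        # policy (b): defer the table update until the flow proves long-lived
        if (not self.add_immediately
                and self.rx_counts[conn_id] >= self.packet_threshold):
            self.filters.add(conn_id)


t = FilterTable(add_immediately=False, packet_threshold=3)
t.on_new_connection("flow-a")
for _ in range(2):
    t.on_rx_packet("flow-a")
assert "flow-a" not in t.filters   # below threshold: no filter yet
t.on_rx_packet("flow-a")
assert "flow-a" in t.filters       # threshold reached: filter added
```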
Filter deletion:
When a filter's connection state changes (from the normal state to a closing state), the host system may remove the filter from the flow table according to a deletion policy. The deletion policy may specifically be: when the filter's connection state changes from the normal state to a closing state, which may specifically be TIME_WAIT, CLOSE_WAIT, or CLOSED, the host system removes the filter from the flow table; or the host system periodically checks the filters' connection states, and when a filter's connection is abnormally interrupted, changes the filter's connection state to the closed state and removes the filter from the flow table. The latter policy suits connections that are abnormally interrupted.
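The deletion policy can be sketched as follows; the function names, the liveness callback, and the periodic-scan shape are assumptions for illustration.

```python
# Sketch of the deletion policy: when a filter's connection moves from the
# normal state into a closing state (TIME_WAIT, CLOSE_WAIT or CLOSED), the
# host removes the filter; a periodic scan demotes abnormally interrupted
# connections to CLOSED and removes them too.
CLOSING_STATES = {"TIME_WAIT", "CLOSE_WAIT", "CLOSED"}


def on_state_change(filters, conn_states, conn_id, new_state):
    conn_states[conn_id] = new_state
    if new_state in CLOSING_STATES:
        filters.discard(conn_id)   # host deletes the filter directly


def periodic_scan(filters, conn_states, is_alive):
    # For connections that broke abnormally, mark them closed and delete.
    for conn_id in list(filters):
        if not is_alive(conn_id):
            on_state_change(filters, conn_states, conn_id, "CLOSED")


filters = {"flow-a", "flow-b"}
states = {"flow-a": "ESTABLISHED", "flow-b": "ESTABLISHED"}
on_state_change(filters, states, "flow-a", "TIME_WAIT")
assert "flow-a" not in filters                            # closed normally
periodic_scan(filters, states, is_alive=lambda c: False)  # all links broke
assert filters == set()                                   # swept by the scan
```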
The above flow-table-maintenance solution improves network parallel-processing capability and increases network throughput. Having the host system fully maintain the flow table in the NIC reduces maintenance time overhead, improves the flexibility and efficiency of NIC flow table maintenance, and lays the groundwork for making full use of the NIC flow table. The host system can promptly update a connection's filter according to the protocol connection state, improving the filters table hit rate and making it easy for network applications to use the filters table to direct flow data to the core where the application resides; this helps reduce contention for shared resources, minimizes software synchronization overhead, increases the cache hit rate, and provides a transparent flow-steering mechanism for upper-layer network applications. When an application migrates, the timeout is computed from the real-time loads of the two cores' receive queues, reducing the delayed dispatch time of packets and further increasing network throughput.
An embodiment of the present invention further provides a method for sending network data on many-core systems, the method including:
a NIC receives network data and performs flow dispatch of the network data according to the flow table stored in the NIC's dedicated memory;
the dedicated memory block is mapped, through a DMA controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
This method shares the flow table data with the host system, which maintains the flow table, thereby reducing interaction overhead.
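The receive path above can be sketched as a lookup-then-enqueue step: the NIC matches the packet's flow key against the flow table in its dedicated memory block and enqueues the packet on the matched core's queue. The fall-through to a simple hash stands in for RSS dispatch as described in the background; all names and the flow-key layout are assumptions for illustration.

```python
# Sketch of the NIC receive path: Flow Director-style table match first,
# RSS-style hash fallback if no filter matches.
def dispatch(packet, flow_table, core_queues):
    flow_key = (packet["src"], packet["dst"], packet["sport"], packet["dport"])
    core = flow_table.get(flow_key)                # flow table match first
    if core is None:
        core = hash(flow_key) % len(core_queues)   # RSS-style fallback
    core_queues[core].append(packet)
    return core


queues = {0: [], 1: [], 2: [], 3: []}
table = {("10.0.0.1", "10.0.0.2", 5000, 80): 2}    # host-maintained entry
pkt = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 80}
assert dispatch(pkt, table, queues) == 2           # matched: goes to core 2
assert len(queues[2]) == 1
```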
An embodiment of the present invention further provides a host system; as shown in Fig. 3, host system 300 includes:
a maintenance unit 301, configured to maintain a NIC's flow table, where the NIC's flow table is stored in a dedicated memory block inside the NIC;
the memory region of the flow table in host system 300 is mapped to the dedicated memory block through a direct memory access (DMA) controller in the host system;
a DMA controller 302, configured to, after detecting a change in the flow table in host memory, map the changed flow table in host memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table;
a prohibition unit 303, configured to prohibit sending flow table maintenance commands to the NIC.
Optionally, the Table Entry of the flow table includes three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
where the Flow State field indicates the state the flow is currently in, the states including a transmitting state, a transitional state, and a closed state;
the Relay Time field indicates the time to delay when the Flow State is the transitional state;
the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
Optionally, maintenance unit 301 may include:
a migration modification module 3011, configured to, when the application of a flow in the network migrates, modify the table entry of the migrated flow in the flow table in host memory, store the core identifier of the new core in the table entry, and change the state of the migrated flow to the transitional state;
a determination and delay module 3012, configured to determine whether the current queue load of the old core is smaller than that of the new core, and when the old core's current queue load is smaller than the new core's, delay for a time t before dispatching the migrated flow's received packets to the new core for processing, and store t in the Relay Time field;
the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
Optionally, t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
An embodiment of the present invention provides a NIC; as shown in Fig. 4, NIC 400 includes:
a receiving unit 401, configured to receive network data;
a flow forwarding unit 402, configured to perform flow dispatch of the network data according to the flow table stored in the NIC's dedicated memory block;
the dedicated memory block is mapped, through a DMA controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
An embodiment of the present invention provides a host system; as shown in Fig. 5, the host system includes: a processor 501, a memory 502, a communication port 503, a bus 504, and a DMA controller 505, where processor 501, memory 502, communication port 503, and DMA controller 505 are all connected through bus 504; DMA controller 505 may also be integrated into processor 501;
communication port 503, configured to receive network data;
memory 502, configured to store a flow table; the memory region of the flow table stored in memory 502 is mapped to the NIC's dedicated memory block through DMA controller 505;
processor 501, configured to maintain the NIC's flow table, where the NIC's flow table is stored in a dedicated memory block inside the NIC;
DMA controller 505, configured to, after detecting a change in the flow table stored in memory 502, map the changed flow table stored in memory 502 into the NIC's dedicated memory to complete maintenance of the NIC's flow table;
processor 501, further configured to prohibit sending flow table maintenance commands to the NIC.
Optionally, the Table Entry of the flow table includes three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
where the Flow State field indicates the state the flow is currently in, the states including a transmitting state, a transitional state, and a closed state;
the Relay Time field indicates the time to delay when the Flow State is the transitional state;
the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
Optionally, when the application of a flow in the network migrates, processor 501 modifies the table entry of the migrated flow in the flow table in memory 502, stores the new core's identifier in the table entry, and changes the state of the migrated flow to the transitional state; determines whether the old core's current queue load is smaller than the new core's; and when the old core's current queue load is smaller than the new core's, delays for a time t before dispatching the migrated flow's received packets to the new core for processing, and stores t in the Relay Time field;
the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
Optionally, t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
An embodiment of the present invention provides a NIC; as shown in Fig. 6, the NIC includes: a logic processing module 601, a storage 602, a memory 605, a communication port 603, and a bus 604, where logic processing module 601, storage 602, communication port 603, and memory 605 are all connected through bus 604;
communication port 603, configured to receive network data;
memory 605, configured to store a flow table;
logic processing module 601, configured to perform flow dispatch of the network data according to the flow table stored in the NIC's memory;
memory 605 is mapped, through a DMA controller in the host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (15)

  1. A method for distributing network data on many-core systems, characterized in that the method comprises:
    a host system maintaining a NIC's flow table, wherein the NIC's flow table is stored in a dedicated memory block inside the NIC;
    the memory region of the flow table in the host system being mapped to the dedicated memory block through a direct memory access (DMA) controller in the host system;
    after detecting a change in the flow table in host memory, the DMA controller mapping the changed flow table in host memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table; wherein the host system is prohibited from sending flow table maintenance commands to the NIC.
  2. The method according to claim 1, characterized in that a table entry (Table Entry) of the flow table comprises three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
    wherein the Flow State field indicates the state the flow is currently in, the states comprising a transmitting state, a transitional state, and a closed state;
    the Relay Time field indicates the time to delay when the Flow State is the transitional state;
    the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
  3. The method according to claim 2, characterized in that the host system maintaining the shared flow table comprises:
    when the application of a flow in the network migrates, the host system modifying the table entry of the migrated flow in the flow table in host memory, storing the core identifier of the new core in the table entry, and changing the state of the migrated flow to the transitional state;
    the host system determining whether the current queue load of the old core is smaller than that of the new core; when the old core's current queue load is smaller than the new core's, the host system delaying for a time t before dispatching the migrated flow's received packets to the new core for processing, and storing t in the Relay Time field;
    wherein the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
  4. The method according to claim 3, characterized in that t is calculated as:
    t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
  5. A method for sending network data on many-core systems, characterized in that the method comprises:
    a NIC receiving network data and performing flow dispatch of the network data according to a flow table stored in the NIC's dedicated memory block;
    wherein the dedicated memory block is mapped, through a direct memory access (DMA) controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
  6. A host system, characterized in that the host system comprises:
    a maintenance unit, configured to maintain a NIC's flow table, wherein the NIC's flow table is stored in a dedicated memory block inside the NIC;
    the memory region of the flow table in the host system being mapped to the dedicated memory block through a direct memory access (DMA) controller in the host system;
    a DMA controller, configured to, after detecting a change in the flow table in host memory, map the changed flow table in host memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table;
    a prohibition unit, configured to prohibit sending flow table maintenance commands to the NIC.
  7. The host system according to claim 6, characterized in that a table entry (Table Entry) of the flow table comprises three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
    wherein the Flow State field indicates the state the flow is currently in, the states comprising a transmitting state, a transitional state, and a closed state;
    the Relay Time field indicates the time to delay when the Flow State is the transitional state;
    the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
  8. The host system according to claim 6, characterized in that the maintenance unit comprises:
    a migration modification module, configured to, when the application of a flow in the network migrates, modify the table entry of the migrated flow in the flow table in host memory, store the core identifier of the new core in the table entry, and change the state of the migrated flow to the transitional state;
    a determination and delay module, configured to determine whether the current queue load of the old core is smaller than that of the new core, and when the old core's current queue load is smaller than the new core's, delay for a time t before dispatching the migrated flow's received packets to the new core for processing, and store t in the Relay Time field;
    wherein the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
  9. The host system according to claim 6, characterized in that t is:
    t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
  10. A NIC, characterized in that the NIC comprises:
    a receiving unit, configured to receive network data;
    a flow forwarding unit, configured to perform flow dispatch of the network data according to a flow table stored in the NIC's dedicated memory block;
    wherein the dedicated memory block is mapped, through a direct memory access (DMA) controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
  11. A host system, characterized in that the host system comprises: a processor, a memory, a communication port, a bus, and a direct memory access (DMA) controller, wherein the processor, the memory, the communication port, and the DMA controller are all connected through the bus;
    the communication port, configured to receive network data;
    the memory, configured to store a flow table, wherein the memory region where the flow table stored in the memory resides is mapped to a NIC's dedicated memory block through the DMA controller;
    the processor, configured to maintain the NIC's flow table, wherein the NIC's flow table is stored in a dedicated memory block inside the NIC;
    the DMA controller, configured to, after detecting a change in the flow table stored in the memory, map the changed flow table stored in the memory into the NIC's dedicated memory to complete maintenance of the NIC's flow table;
    the processor, further configured to prohibit sending flow table maintenance commands to the NIC.
  12. The host system according to claim 11, characterized in that a table entry (Table Entry) of the flow table comprises three fields: a flow state (Flow State), a relay time (Relay Time), and a pending packets pointer (Pending Packets Ptr);
    wherein the Flow State field indicates the state the flow is currently in, the states comprising a transmitting state, a transitional state, and a closed state;
    the Relay Time field indicates the time to delay when the Flow State is the transitional state;
    the Pending Packets Ptr field indicates, when the Flow State is the transitional state, the order of received packets in the pending packets pointer.
  13. The host system according to claim 12, characterized in that
    when the application of a flow in the network migrates, the processor modifies the table entry of the migrated flow in the flow table in the memory, stores the new core's identifier in the table entry, and changes the state of the migrated flow to the transitional state; determines whether the old core's current queue load is smaller than the new core's; and when the old core's current queue load is smaller than the new core's, delays for a time t before dispatching the migrated flow's received packets to the new core for processing, and stores t in the Relay Time field;
    wherein the old core is the core that executed the application before the migration, and the new core is the core that executes the application after the migration.
  14. The host system according to claim 13, characterized in that
    t = (old core's current load − new core's current load) / number of packets processed by the protocol stack per unit time.
  15. A NIC, characterized in that the NIC comprises: a logic processing module, a storage, a memory, a communication port, and a bus, wherein the logic processing module, the storage, the communication port, and the memory are all connected through the bus;
    the communication port, configured to receive network data;
    the memory, configured to store a flow table;
    the logic processing module, configured to perform flow dispatch of the network data according to the flow table stored in the NIC's memory;
    wherein the memory is mapped, through a direct memory access (DMA) controller in a host system, to the memory region of the host system's flow table, the flow table being maintained by the host system.
PCT/CN2014/074868 2014-04-04 2014-04-04 Method and system for distributing network data on many-core systems WO2015149374A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2014/074868 WO2015149374A1 (zh) 2014-04-04 2014-04-04 Method and system for distributing network data on many-core systems
CN201480000856.7A CN105164980B (zh) 2014-04-04 2014-04-04 Method and system for distributing network data on many-core systems


Publications (1)

Publication Number Publication Date
WO2015149374A1 true WO2015149374A1 (zh) 2015-10-08



Also Published As

Publication number Publication date
CN105164980B (zh) 2019-01-08
CN105164980A (zh) 2015-12-16
