CN113411262B - Method and apparatus for setting a large receive offload function - Google Patents

Method and apparatus for setting a large receive offload (LRO) function

Info

Publication number
CN113411262B
CN113411262B CN202110527602.2A
Authority
CN
China
Prior art keywords
receiving queue
lro
message
queue
function
Prior art date
Legal status
Active
Application number
CN202110527602.2A
Other languages
Chinese (zh)
Other versions
CN113411262A (en)
Inventor
曲会春
徐成
程韬
武雪平
Current Assignee
XFusion Digital Technologies Co Ltd
Original Assignee
XFusion Digital Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by XFusion Digital Technologies Co Ltd filed Critical XFusion Digital Technologies Co Ltd
Priority to CN202110527602.2A priority Critical patent/CN113411262B/en
Publication of CN113411262A publication Critical patent/CN113411262A/en
Application granted granted Critical
Publication of CN113411262B publication Critical patent/CN113411262B/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/19 Flow control; Congestion control at layers above the network layer
    • H04L47/193 Flow control; Congestion control at layers above the network layer at the transport layer, e.g. TCP related

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a method and an apparatus for setting a large receive offload (LRO) function. The method includes the following steps: determining start-stop information of the LRO function of a receive queue, where the start-stop information indicates whether to start or stop the LRO function of the receive queue; and setting the LRO function of the receive queue according to the start-stop information, thereby improving the overall performance of the system.

Description

Method and apparatus for setting a large receive offload function
Technical Field
The present application relates to the field of communications technologies, and in particular, to a method and an apparatus for setting a large receive offload (LRO) function.
Background
LRO is an offloading technique implemented by a network interface card (NIC); for example, the aggregation of Transmission Control Protocol (TCP) packet slices (segments) is offloaded from the processor to the NIC. Specifically, when the network card starts or stops the LRO function, all receive queues start or stop the function at the same time. When the network card starts the LRO function, it aggregates received TCP message slices belonging to the same data stream into one TCP message or one large TCP message slice. Taking Linux as an example of the operating system on the server where the network card driver runs, the network card driver converts the aggregated TCP message or large TCP message slice into data in the socket buffer (SKB) structure, sends the SKB-structured data to the processor, and the processor then completes the subsequent processing of the protocol stack (e.g., the TCP/IP protocol stack). In this way, neither the aggregation of message slices nor the conversion to the SKB structure needs to be executed by the processor, which reduces the processing overhead of the processor.
However, if aggregation is ineffective, starting the LRO function lengthens the processing time of TCP messages, because the network card must still perform TCP message slice aggregation. Therefore, how to set the starting and stopping of the LRO function so as to improve the overall performance of the system is a technical problem to be solved.
Disclosure of Invention
The application provides a method and an apparatus for setting an LRO function, which help improve the overall performance of a system.
In a first aspect, the present application provides a method for setting an LRO function. The method may include: determining start-stop information of an LRO function of a receive queue, where the start-stop information indicates whether to start or stop the LRO function of the receive queue; and setting the LRO function of the receive queue according to the start-stop information. Specifically, when the start-stop information indicates that the LRO function of the receive queue is to be started, the LRO function of the receive queue is started. When the start-stop information indicates that the LRO function of the receive queue is to be stopped, the LRO function of the receive queue is stopped. In this technical solution, the LRO function is set at the granularity of a receive queue. By reasonably setting the conditions for starting or stopping the LRO function of a receive queue, the aggregation effect of message slices can be balanced against their processing time, which improves processing efficiency and the overall performance of the system.
Wherein, starting the LRO function of the receive queue includes: starting to aggregate the message slices belonging to the same data stream in the receive queue. It can be understood that whether message slices can be aggregated also depends on whether the message slices belonging to the same data stream satisfy the aggregation condition.
Optionally, if a plurality of consecutive message slices in the receive queue belong to the same data stream and their sequence numbers are consecutive, the message slices satisfy the aggregation condition; otherwise, they do not.
In one possible implementation, determining the start-stop information of the LRO function of a receive queue includes: when the LRO function of the receive queue is started, counting the probability that the aggregation process of the receive queue is interrupted. The target objects obtained after the LRO function is executed for the receive queue include: messages in the receive queue, message slices in the receive queue that cannot participate in aggregation, and/or message slices or messages obtained by aggregating a plurality of message slices in the receive queue. If one of the target objects is a message slice that is not the last message slice of its message and whose length is less than or equal to a first threshold, the aggregation process of the receive queue is counted as interrupted once. When the probability that the aggregation process of the receive queue is interrupted is greater than or equal to a second threshold, the start-stop information is determined to indicate stopping the LRO function of the receive queue. When this probability is greater than or equal to the second threshold, the aggregation process of the receive queue can be considered to be frequently interrupted, from which severe interleaving among the message slices of multiple data streams in the receive queue can be inferred.
In this case, because interruptions are frequent, few message slices are aggregated. If the LRO function continues to be executed for the receive queue, the reduction in CPU occupancy brought by LRO cannot be realized; instead, executing LRO forces the network card to check, one by one, whether received message slices satisfy the aggregation condition, which slows data processing and degrades the overall performance of the system. Therefore, at this point the LRO function of the receive queue may be stopped.
It will be appreciated that which messages and/or message slices the target objects specifically include is determined by the objects in the receive queue. Optionally, at least one of the first threshold and the second threshold is configurable. The first threshold may be determined according to the aggregation capability of the network card of the receiving end server, and the second threshold may be determined according to the aggregation effect of the message slices in the receive queue and the time the network card of the receiving end server takes to process messages/message slices.
In one possible implementation, counting the probability that the aggregation process of the receive queue is interrupted includes: counting the number of times the aggregation process of the receive queue is interrupted within a first preset time period (denoted x), and counting the number of target objects obtained after the LRO function is executed for the receive queue within the first preset time period (denoted y); and determining the probability that the aggregation process of the receive queue is interrupted according to x and y. For example, the probability may be determined as x divided by y. Of course, embodiments of the present application are not limited thereto.
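For illustration only, the counting scheme above can be sketched as follows. The class and field names, and the use of a plain ratio x/y with a threshold comparison, are assumptions made for the sketch, not the claimed implementation:

```python
class LroInterruptStats:
    """Illustrative counter for the interruption statistic described above."""

    def __init__(self, first_threshold: int, second_threshold: float):
        self.first_threshold = first_threshold    # max length of a "short" non-final slice
        self.second_threshold = second_threshold  # interruption-probability cutoff
        self.interruptions = 0                    # x: aggregation-process interruptions
        self.target_objects = 0                   # y: target objects produced in the window

    def record_target_object(self, is_slice: bool, is_last_slice: bool, length: int) -> None:
        """Call once per target object produced while the LRO function is on."""
        self.target_objects += 1
        # Interruption: the target object is a message slice, is not the last
        # slice of its message, and is no longer than the first threshold.
        if is_slice and not is_last_slice and length <= self.first_threshold:
            self.interruptions += 1

    def should_stop_lro(self) -> bool:
        """Stop LRO when x / y reaches the second threshold."""
        if self.target_objects == 0:
            return False
        return self.interruptions / self.target_objects >= self.second_threshold
```

After the first preset time period elapses, `should_stop_lro()` yields the decision, and the counters would be reset for the next window.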
In one possible implementation, the method further includes: selecting a threshold group from a plurality of threshold groups, each threshold group including a third threshold and a fourth threshold; and using the third threshold of the selected group as the first threshold and the fourth threshold of the selected group as the second threshold. In this way, different thresholds can be selected for different services to be processed (that is, the services to which the objects in the receive queue belong), which helps improve the overall performance of the system.
In one possible implementation, determining the start-stop information of the LRO function of a receive queue includes: when the LRO function of the receive queue is stopped, counting the probability that the data stream to which the objects in the receive queue belong is multi-stream, where the objects include messages and/or message slices; and if the counted probability is less than or equal to a fifth threshold, determining that the start-stop information indicates starting the LRO function of the receive queue. Optionally, the fifth threshold is configurable. When the probability that the data stream to which the objects in a receive queue belong is multi-stream is less than or equal to the fifth threshold, interleaving among the data streams in the receive queue can be considered light. In this case, when the LRO function is executed for the receive queue, aggregation of message slices belonging to the same data stream is rarely interrupted and many message slices are aggregated, so the reduction in CPU occupancy brought by executing the LRO function can be fully realized, thereby improving the overall performance of the system.
In one possible implementation, counting the probability that the data stream to which the objects in the receive queue belong is multi-stream includes: counting the number of times, within a second preset time period, that the data stream to which the objects in the receive queue belong is multi-stream (denoted c), and counting the number of objects in the receive queue within the second preset time period (denoted d); if the hash values of two adjacent objects in the receive queue differ, c is incremented by 1; and determining the probability that the data stream to which the objects in the receive queue belong is multi-stream according to c and d. For example, the probability may be determined as c divided by d, although embodiments of the application are not limited in this regard.
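The c/d statistic above can be sketched as follows. This is an illustrative sketch only; the function names and the use of a simple list of flow-hash values are assumptions:

```python
def multi_stream_probability(hash_values):
    """Probability that the queue's data stream is multi-stream: c (pairs of
    adjacent objects with different flow hashes) divided by d (object count)."""
    d = len(hash_values)
    if d == 0:
        return 0.0
    c = sum(1 for prev, cur in zip(hash_values, hash_values[1:]) if prev != cur)
    return c / d


def should_start_lro(hash_values, fifth_threshold):
    """Start the LRO function when interleaving between data streams is light."""
    return multi_stream_probability(hash_values) <= fifth_threshold
```

For example, four objects with flow hashes [1, 1, 1, 2] give c = 1 and d = 4, a probability of 0.25.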
In one possible implementation, the method may further include: selecting one threshold from a plurality of thresholds, and using the selected threshold as the fifth threshold. In this way, different thresholds can be selected for different services to be processed (that is, the services to which the objects in the receive queue belong), which helps improve the overall performance of the system.
The method in the first aspect or any possible implementation thereof may be executed by a receiving end server or by the network card of a receiving end server. The receiving end server is a server that receives data (including messages and message slices). For a given server, when it is used to send data it is referred to as a sending end server, and when it is used to receive data it is referred to as a receiving end server.
In a second aspect, the present application provides a large receive offload (LRO) function setting apparatus, the apparatus including modules for executing the LRO function setting method in the first aspect or any possible implementation of the first aspect.
In a third aspect, the present application provides a large receive offload (LRO) function setting apparatus, the apparatus including a memory and a processor. The memory is configured to store computer-executable instructions; when the apparatus runs, the processor executes the computer-executable instructions in the memory, using hardware resources in the apparatus, to perform the steps of the method of the first aspect or any possible implementation of the first aspect. The apparatus may specifically be a receiving end server, or a network card of a receiving end server.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the steps of the method of the first aspect or any of the possible implementations of the first aspect.
In a fifth aspect, the application also provides a computer program product which, when run on a computer, causes the computer to perform the steps of the method of the first aspect or any one of the possible implementations of the first aspect.
It should be appreciated that any of the apparatuses, computer-readable storage media, or computer program products provided above is used to perform the corresponding method provided above. Therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding method; details are not described herein again.
Drawings
Fig. 1 is a schematic structural diagram of a communication system according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a buffer and a buffer descriptor, a receiving queue and a corresponding relationship between objects according to an embodiment of the present application;
fig. 3 is a schematic diagram of a correspondence relationship between a network card, a port and a receiving queue according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a relationship between an object and a target object according to an embodiment of the present application;
fig. 5 is a flow chart of a method for setting LRO function according to an embodiment of the present application;
fig. 6 is a flowchart illustrating another LRO function setting method according to an embodiment of the present application;
FIG. 7 is a schematic flow chart of a data processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an LRO function setting device according to an embodiment of the present application;
fig. 9 is a schematic diagram of a hardware structure of a network card according to an embodiment of the present application;
fig. 10 is a schematic hardware structure of a server according to an embodiment of the present application.
Detailed Description
The technical scheme provided by the application is further described below with reference to the accompanying drawings.
Fig. 1 is a schematic structural diagram of a communication system according to an embodiment of the present application. As shown, the communication system includes a server 100 and a server 200. Messages between server 100 and server 200 may be communicated via network 300, which includes an Ethernet network; that is, the servers communicate using TCP/IP protocols. The server that sends data (including messages and message slices) is also called a sending end server, and the server that receives data is also called a receiving end server. For a given server, when it is used to send data it is referred to as a sending end server, and when it is used to receive data it is referred to as a receiving end server.
Each of the servers 100 and 200 includes a hardware layer and a software layer; only the structure of the server 100 is illustrated in fig. 1. The hardware layer of the server 100 includes a network card, a memory, and one or more processors, such as a central processing unit (CPU). The software layer is program code running on the hardware layer. In particular, the software layer may be further divided into several layers that communicate with each other through software interfaces. From top to bottom, the software layer comprises an application layer, an operating system layer, and a driver layer. The application layer comprises a series of program code for running application programs. The operating system layer includes operating system program code and a protocol stack. The operating system may be Linux, Windows, VxWorks, or the like. A protocol stack refers to a collection of program code that is partitioned according to the different levels involved in the communication protocol and handles data processing at the corresponding level. For convenience of description, in the following description of the embodiments of the present application, the operating system is the Linux operating system, the protocol stack is the TCP/IP protocol stack, and the data structure processed by the TCP/IP protocol stack is the SKB structure. The driver layer is used for realizing message interaction between the hardware layer and the software layer. The driver layer comprises the network card driver and the like.
In order to better understand the technical solution provided by the embodiments of the present application, first, terms and techniques related to the embodiments of the present application are briefly described.
1) Message, message slice, object
The message involved in the embodiment of the application can be a TCP message or a user datagram protocol (user datagram protocol, UDP) message, etc. The lengths of different messages can be the same or different.
After a message is split into segments, each segment is referred to as a message slice. Specifically, after the network card of the sending end server receives a data instruction from the processor, if it determines that the length of a message carrying the data instruction would exceed a threshold, the network card divides the data instruction into a plurality of data segments according to the maximum transmission unit (MTU), e.g., 1500 B. Each data segment is transmitted in one message, and a message transmitting one data segment may be called a message slice. The network card assigns a sequence number to each message slice in order, so that the sequence numbers of two adjacent message slices are consecutive; it then sends each message slice to the receiving end server.
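For illustration only, the sender-side segmentation described above can be sketched as follows. The function name, the dict representation, and the simplified per-slice sequence numbering are assumptions made for the sketch:

```python
def split_into_slices(payload: bytes, mtu: int = 1500):
    """Split a data instruction into MTU-sized segments, one message slice per
    segment, assigning consecutive sequence numbers to adjacent slices."""
    return [
        {"seq": seq, "data": payload[off:off + mtu]}
        for seq, off in enumerate(range(0, len(payload), mtu))
    ]
```

For example, a 3100-byte payload with an MTU of 1500 bytes yields three message slices with consecutive sequence numbers 0, 1, and 2.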
For convenience of description, in the embodiment of the present application, a message and a message slice sent by a network card in a sending end server are collectively referred to as an "object", or a message and a message slice received by a network card in a receiving end server are collectively referred to as an "object".
2) Data flow
Data flow refers to a set of messages and/or message slices with the same five-tuple. The five-tuple refers to the source Internet Protocol (IP) address, destination IP address, source port number, destination port number, and protocol number. That is, the five-tuple information of every message and/or message slice in the same data stream is identical, and the five-tuple information of different data streams differs. Two five-tuples being different can be understood as at least one item of information in the two five-tuples being different.
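As a minimal illustration of the five-tuple (the type and field names are assumed for the sketch, not taken from the claims):

```python
from collections import namedtuple

# One possible representation of the five-tuple that identifies a data flow.
FiveTuple = namedtuple(
    "FiveTuple", ["src_ip", "dst_ip", "src_port", "dst_port", "protocol"]
)

def same_data_flow(a: FiveTuple, b: FiveTuple) -> bool:
    """Two objects belong to the same data flow iff all five fields match."""
    return a == b
```

Changing any single field, for example the source port number, yields a different data flow.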
3) Buffer and buffer descriptor (buffer descriptor)
The buffer area is a storage space for buffering the object. Alternatively, the buffer may be a storage space in the network card. Alternatively, the buffer may be a storage space divided in other memories of the server, for implementing the function of buffering the object.
The buffer descriptor is information describing a buffer, for example, whether the buffer is free, the size of the buffer, and the like.
4) Receiving Queue (RQ)
The receive queue is a sequence formed by buffer descriptors, and the receiving end server can store received objects into the buffers described by the buffer descriptors of the receive queue. For ease of description, in embodiments of the present application, these objects are referred to as objects assigned to, or in, a receive queue. One or more objects may be stored in one buffer, and one object may be stored across a plurality of buffers.
Fig. 2 is a schematic diagram of a corresponding relationship between a buffer and a buffer descriptor, a receiving queue and an object according to an embodiment of the present application. As shown, a buffer descriptor connected by a dashed line with double arrows is used to describe the buffer to which the dashed line is connected. The arrow to the right above the receive queue 1 indicates the order of buffer descriptors in the receive queue. In fig. 2, the objects belonging to the receive queue 1 are illustrated as including objects 1 to w, and each object is stored in a buffer, where w is an integer greater than or equal to 1.
The data streams to which different objects in a receive queue belong may be the same or different. In one possible implementation, the network card of the receiving end server may apply a preset hash algorithm to the five-tuple of each object to obtain a hash value, and use the hash value to determine whether two adjacent objects in a receive queue belong to the same data stream. If the hash values of two adjacent objects in one receive queue are the same, the two objects are considered to belong to the same data stream; if the hash values of the two objects differ, the two objects are considered not to belong to the same data stream. As another possible implementation, the network card of the receiving end server may determine in other ways whether two adjacent objects in the same receive queue belong to the same data stream; for example, each object may carry a field identifying its data stream. Embodiments of the present application are not limited in this regard.
5) Network card and physical function (physical function, PF)
The network card, which may also be referred to as a network interface card, has as its main function connecting multiple servers to a network so that the servers can communicate with each other through the network. The network card may be connected to the network via an external optical fiber, cable, etc. The network card may be plugged into a peripheral component interconnect express (PCIe) slot of the computer and connected to the server through PCIe. Alternatively, the network card may be connected to the server via some specific (or proprietary) bus, which is not limited by the embodiments of the present application. It will be appreciated that, in a physical implementation, the network card may be part of the server or may be a device/apparatus independent of the server. For ease of description, network cards are hereinafter described as network cards of servers.
A network card may include one or more ports, specifically ports for receiving data. Typically, each port corresponds to a PF. Of course, there may be scenarios where one port corresponds to multiple PFs or where multiple ports correspond to one PF. A PF is understood to be a logical network card, which is able to perform all the logical functions of a network card.
One PF may support one or more receive queues. Where which receive queues are supported by a PF may be predefined. A receive queue supported by a PF may be considered a receive queue managed by the PF.
Fig. 3 is a schematic diagram of a correspondence relationship between a network card, ports, and receive queues according to an embodiment of the present application. As shown, the network card comprises ports 1-n, where n is an integer greater than or equal to 1, and each port corresponds to one PF. PF1 supports m receive queues (labeled receive queues 11-1m), PF2 supports k receive queues (labeled receive queues 21-2k), … …, and PFn supports t receive queues (labeled receive queues n1-nt), where m, k, and t are integers greater than or equal to 1.
Each PF can correspond to one hash algorithm, and the hash algorithms corresponding to different PFs can be the same or different. The network card can carry out hash operation on the object received from the port corresponding to the PF according to a hash algorithm corresponding to the PF to obtain a hash value of the object; then, according to the mapping relation between the preset hash values and the receiving queues (i.e. the receiving queues corresponding to the PF), determining the receiving queue corresponding to the object, and taking the object as the object in the receiving queue.
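The dispatch described above can be sketched as follows. CRC32 is an arbitrary stand-in for the PF's hash algorithm, and hash-modulo-queue-count is an assumed form of the mapping between hash values and receive queues:

```python
import zlib

def assign_receive_queue(five_tuple_bytes: bytes, num_queues: int) -> int:
    """Hash the object's five-tuple with the PF's hash algorithm and map the
    result to one of the PF's receive queues."""
    h = zlib.crc32(five_tuple_bytes)
    return h % num_queues  # assumed mapping from hash value to receive queue
```

Because the mapping is deterministic, all objects of one data flow land in the same receive queue.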
As one possible embodiment, when the server supports single-root I/O virtualization (SR-IOV) functions, each PF may also correspond to multiple Virtual Functions (VFs), where each VF corresponds to one or more receive queues into which different data streams may be stored separately.
It should be noted that, in the embodiment of the present application, the processing procedures for the cases of PF and VF are similar, and for convenience of description, description will be given by taking an example in which one PF supports one or more receive queues.
6) LRO and target object
LRO is a network card acceleration technology, and specifically, aggregation of message slices is realized through a network card of a receiving end server. After the LRO function of the network card is started, the CPU does not need to execute the aggregation operation of the message slices, so that the occupancy rate of the CPU can be reduced.
The network card executes the LRO function for one receiving queue, which comprises the following steps:
step 1: the network card judges whether the objects in the receiving queue need to be aggregated or not.
Specifically, if the object is a message, the network card does not need to perform aggregation processing on it; that is, if a message received by the network card of the receiving end server contains a complete data instruction, no aggregation operation is needed. If the object received by the network card of the receiving end server is a message slice, the network card needs to aggregate multiple message slices. In other words, only message slices need to be aggregated.
Step 2: if the object needs to execute LRO flow processing, it is determined whether the object can participate in the aggregation.
Specifically, first, it is determined whether a plurality of message slices in a receiving queue satisfy an aggregation condition, and if so, the plurality of message slices can participate in aggregation. If the plurality of message slices are continuous message slices in the receiving queue, belong to the same data stream and have continuous sequence numbers, the aggregation condition is satisfied. More specifically, whether two adjacent message slices meet the aggregation condition is judged, and if so, the two message slices can participate in aggregation. The previous message slice in the two message slices can be a message slice belonging to the receiving queue, or can be a message slice obtained after aggregation of a plurality of message slices belonging to the receiving queue; the subsequent message slice is the message slice attributed to the receive queue. If the two message slices belong to the same data stream and the sequence numbers are continuous, the aggregation condition is satisfied, otherwise, the aggregation condition is not satisfied.
For example, assume that the objects belonging to one receive queue, received sequentially by the network card of the receiving end server, are message slice 11, message slice 12, and message slice 13 of the same data stream with consecutive sequence numbers. When the LRO function is executed, the network card may first aggregate message slice 11 and message slice 12 to obtain message slice 11+12, and then aggregate message slice 11+12 with message slice 13.
Optionally, the aggregation conditions further comprise: the length of the message slice or message obtained after aggregation does not exceed the maximum aggregation length supportable by the network card. The maximum aggregation length supportable by the network card may be the size of a buffer (e.g., 64 KB), or another preset length may be used as the maximum aggregation length supportable by the network card; embodiments of the present application are not limited in this regard. For example, if a message 128 KB in length is split into 8 message slices of 16 KB each, then when 8 consecutive objects in one receive queue are these 8 message slices and the sequence numbers of the 8 message slices received by the network card of the receiving end server are consecutive, the network card may aggregate the first 4 message slices and the last 4 message slices separately. For convenience of description, the following takes as an example the case where the length of the message slice or message obtained after aggregation does not exceed the maximum aggregation length supportable by the network card; this is stated here once and not repeated below.
Optionally, for a plurality of message slices meeting the aggregation condition, aggregation may be performed on only two message slices at a time, on more than two message slices at a time, or on all of the message slices at once to obtain one large message slice.
In one possible implementation, a message slice cannot participate in aggregation if neither the object before it nor the object after it belongs to the same data stream as the message slice.
In another possible implementation, although the objects adjacent to a message slice on both sides belong to the same data stream as the message slice, the adjacent objects are messages, or are message slices whose sequence numbers are not consecutive with that of the message slice, or one adjacent object is a message and the other is a message slice whose sequence number is not consecutive with that of the message slice; in all these cases, the message slice cannot participate in aggregation.
Step 3: the network card aggregates objects capable of participating in aggregation.
The target object is obtained after the LRO function is performed on one or more objects in the receive queue. The target object includes: a message in the receiving queue, a message slice in the receiving queue that cannot participate in aggregation, and/or a message slice or message obtained after aggregation of a plurality of message slices in the receiving queue. That is, if an object does not need an aggregation operation, the object itself is taken as a target object. If an object needs to be aggregated but cannot participate in aggregation, the object itself is taken as a target object. If an object can participate in aggregation, the message slice or message obtained by aggregating the object with all message slices that can be aggregated with it is taken as one target object.
Fig. 4 is a schematic diagram of a relationship between an object and a target object according to an embodiment of the present application. In fig. 4, the objects in the receive queue 1 sequentially include: message slice 11, message slice 12, message slice 21, message slice 22, message slice 23, message slice 13, message 3. The message slices 11-13 are obtained by slicing the message 1, and the message slice 13 is the last message slice of the message 1; the message slices 21-23 are obtained by slicing the message 2, and the message slice 23 is the last message slice of the message 2. After the LRO function is performed on the receive queue 1, the obtained target objects are, in order: message slice 11+12, message 2, message slice 13, message 3. The message slice 11+12 is the message slice obtained by aggregating the message slice 11 and the message slice 12.
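The mapping from objects to target objects in fig. 4 can be reproduced with a small simulation. This is a sketch under assumptions not fixed by the embodiment: slice sequence numbers start at 1 within each message, the `last` flag mirrors the end mark in the slice header, and a merged run becomes a complete message only when it starts at the first slice of its message and ends with the last one.

```python
def run_lro(objects):
    """Walk a receive queue in order, merging runs of same-flow,
    sequence-consecutive slices; emit (kind, flow) target objects."""
    targets = []
    run = None  # current aggregate: (flow, first_seq, next_seq, saw_last)

    def flush():
        nonlocal run
        if run is None:
            return
        flow, first_seq, _, saw_last = run
        # a run covering slice 1 .. last slice reassembles a whole message
        kind = "message" if (first_seq == 1 and saw_last) else "slice"
        targets.append((kind, flow))
        run = None

    for obj in objects:
        if obj["kind"] == "message":
            flush()  # a full message always interrupts the current run
            targets.append(("message", obj["flow"]))
            continue
        if run and run[0] == obj["flow"] and run[2] == obj["seq"]:
            run = (run[0], run[1], obj["seq"] + 1, obj["last"])  # extend run
        else:
            flush()  # flow change or sequence gap: start a new run
            run = (obj["flow"], obj["seq"], obj["seq"] + 1, obj["last"])
    flush()
    return targets
```

Fed the fig. 4 queue, the simulation yields slice 11+12, message 2, slice 13 (a lone slice, since its siblings were already emitted), and message 3, in that order.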
7) TCP segmentation offload (TSO)
TSO is a process in which the sender server segments a message into message slices. The TSO function cooperates with the LRO function to jointly implement service acceleration of the server system shown in fig. 1. Specifically:
For the sender server, when the processor finishes processing in the TCP/IP protocol stack and sends a message, because the TSO function is turned on, the processor can send the TCP message in SKB form to the network card driver; the network card driver sequentially stores the TCP message to be sent into the buffers described by the buffer descriptors of a send queue (SQ) and notifies the network card to send it. After the network card receives the notification, if it determines that the length of the TCP message is greater than the threshold, it segments the TCP message into a plurality of TCP message slices, encapsulates a new message header for each TCP message slice, and sends them out. The send queue is a sequence formed by descriptors of non-idle buffers and is used by the sending end server to send data; for example, the server can sequentially send the data stored in the buffers described by the buffer descriptors of the queue.
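The sender-side segmentation step described above can be sketched on payload lengths alone. The threshold value and the (length, is_final_piece) representation are assumptions for illustration; real TSO also rewrites the TCP/IP headers of each slice, which is omitted here.

```python
TSO_THRESHOLD = 1460  # assumed slice-size threshold; the embodiment does not fix a value

def tso_segment(payload_len, threshold=TSO_THRESHOLD):
    """Split a message longer than the threshold into slice lengths.
    The boolean marks the final piece, mirroring the end mark the
    receiving side uses to recognize a message's last slice."""
    if payload_len <= threshold:
        return [(payload_len, True)]  # short enough: sent as a whole message
    slices = []
    offset = 0
    while offset < payload_len:
        n = min(threshold, payload_len - offset)
        offset += n
        slices.append((n, offset == payload_len))
    return slices
```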
For the receiving end server, after the network card receives the TCP message slices, because the LRO function is turned on, the network card aggregates the TCP message slices, then stores the aggregated TCP message or large TCP message slice into the buffer described by a buffer descriptor of the receive queue using direct memory access (DMA); after one or more TCP messages or TCP message slices have been written into the buffer by DMA, the network card can notify the network card driver through an interrupt mechanism. The network card driver can assemble the TCP messages or TCP message slices in the buffers described by the buffer descriptors of the receive queue into SKB structures and send them to the processor, and the processor completes the remaining processing of the TCP/IP protocol stack.
8) Other terms
The term "at least one" in the embodiments of the present application includes one or more. "A plurality" means two or more. For example, at least one of A, B and C includes: A alone, B alone, C alone, A and B together, A and C together, B and C together, and A, B and C together. In the description of the present application, "/" means "or" unless otherwise indicated; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, A and B together, or B alone. In addition, to facilitate clear description of the technical solutions of the embodiments of the present application, the words "first", "second", etc. are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", etc. do not limit quantity or execution order, and do not necessarily indicate a difference.
In the conventional technology, the LRO function is started or stopped uniformly for all receive queues at the granularity of the network card; that is, if one network card supports a plurality of receive queues, the network card can only start or stop the LRO function of all of those receive queues at the same time. Assuming one network card supports multiple receive queues, the following may occur: for some of the receive queues, the aggregation process is frequently interrupted because message slices belonging to different data streams are interleaved, so a good aggregation effect is not achieved; for the other receive queues, aggregation works well. With the conventional technology of starting or stopping the LRO function of the whole network card, the overall performance of the system cannot be effectively improved. An example of the aggregation process being frequently interrupted by interleaved message slices of different data streams is as follows: in an extreme case, no two adjacent message slices in a receive queue meet the aggregation condition; in this case, because every message slice still enters the aggregation flow, the time each message slice spends in the network card is prolonged, which affects the overall performance of the system.
The technical solution provided by the embodiment of the present application starts or stops the LRO function at the granularity of the receive queue. For any receive queue of the same network card, the LRO function of that queue can be dynamically set according to how aggregatable the message slices in the queue are. At the same time, the LRO function may be turned on for only part of the receive queues and turned off for the others. By dynamically and adaptively setting whether each receive queue performs the LRO function, the aggregation effect of the receiving end network card can be effectively ensured, the problem that the aggregation process is frequently interrupted by interleaved message slices of different data streams is avoided, the aggregation effect and data processing efficiency of the network card of the receiving end server are improved, and the performance of the whole system is further improved.
Next, a detailed description will be given of a method for setting LRO functions according to an embodiment of the present application with reference to the accompanying drawings.
Fig. 5 is a schematic flow chart of a method for setting the LRO function according to an embodiment of the present application. The method illustrated in fig. 5 may be performed by a receiving end server. The embodiment of the present application can start or stop the LRO function separately for each receive queue of the network card of the receiving end server; the start/stop method is the same for each receive queue, so for convenience of description one receive queue that processes a plurality of data streams is taken as an example. The method shown in fig. 5 includes the following steps:
S101: under the condition that an LRO function of a receiving queue is started, counting the times x of aggregation process interruption of the receiving queue in a first preset time period and the number y of target objects obtained by executing the LRO function on the receiving queue in the first preset time period. The target object includes: the message in the receiving queue, the message slice which can not participate in aggregation in the receiving queue, and/or the message slice or the message obtained after the aggregation of a plurality of message slices in the receiving queue.
If one target object in the receive queue is a message slice that is not the last message slice of its message and whose length is less than or equal to the first threshold, the aggregation process of the receive queue is counted as interrupted once. Accordingly, the number x of times the aggregation process of the receive queue is interrupted within the first preset time period can be determined. A message slice that interrupts the aggregation process once may be a message slice in the receive queue that did not participate in aggregation, or a message slice obtained after aggregation whose length is still less than or equal to the first threshold.
S102: and determining the probability of the aggregation process interruption of the receiving queue according to the times x of the aggregation process interruption of the receiving queue in the first preset time period and the number y of target objects obtained by executing the LRO function on the receiving queue in the first preset time period. For example, the probability of an aggregate process interruption for the receive queue may be obtained by dividing x by y.
S103: when the probability of the aggregation process of the receiving queue being interrupted is greater than or equal to a second threshold value, determining start-stop information is used for indicating to stop the LRO function of the receiving queue.
When the probability of the aggregation process of one receive queue being interrupted is greater than or equal to the second threshold, the aggregation process of the queue can be considered frequently interrupted, and it can further be inferred that message slices belonging to a plurality of data streams are severely interleaved in the queue. In this case, because the number of interruptions is large and the number of aggregated message slices is small, continuing to execute the LRO function for the receive queue cannot deliver the reduction in CPU occupancy that the LRO function is meant to provide; instead, the network card must judge one by one whether the received message slices meet the aggregation condition, which slows data processing and thus affects the overall performance of the system. Therefore, the LRO function of the receive queue may be stopped at this time.
It is appreciated that when the probability of an aggregate procedure interruption for a receive queue is less than a second threshold, LRO functionality may continue to be performed for the receive queue. At this time, the timing may be restarted, the probability of the aggregation process of the receive queue being interrupted in the first preset period of time may be counted again, and so on, and when the probability of the aggregation process of the receive queue being interrupted obtained in a certain counting period is greater than or equal to the second threshold value, the LRO function of the receive queue is stopped.
The above S101 to S102 can be regarded as one possible implementation of counting the probability of the aggregation process of the receive queue being interrupted. S101 to S103 can be regarded as one possible implementation of determining the start-stop information of the LRO function of the receive queue. S104 below is one possible implementation of setting the LRO function of the receive queue according to the start-stop information.
S104: and stopping the LRO function of the receiving queue according to the start-stop information.
Fig. 6 is a schematic flow chart of a method for setting LRO function according to an embodiment of the present application. The method shown in fig. 6 may be performed by a receiving end server. The method shown in fig. 6 includes the steps of:
s201: in the case that the LRO function of the receiving queue is stopped, counting the number c of times that the data stream to which the object in the receiving queue belongs is multi-stream in the second preset time period, and the number d of the objects belonging to the receiving queue in the second preset time period. If the hash values of two adjacent objects in the receiving queue are different, which indicates that the data flows to which the two objects belong are different, the number of times that the data flow to which the object of the receiving queue belongs is multiple flows is increased by 1. Wherein, the object is a generic term of a message slice and a message.
For example, assume that the objects in the receive queue within the second preset time period are: message slice 11, message slice 21, message slice 12, and message slice 22, where message slices 11-12 belong to data stream 1 and message slices 21-22 belong to data stream 2. Then the number of times that the data streams to which the objects in the receive queue belong are found to be multiple streams within the second preset time period is 3.
S202: and determining the probability that the data flow of the object in the receiving queue is multiflow according to the times c of multiflow of the object in the receiving queue in the second preset time period and the number d of the objects in the receiving queue in the second preset time period. For example, the probability that the data stream to which the object in the receive queue belongs is multi-stream may be obtained by dividing c by d.
S203: if the probability that the data stream to which the object in the receiving queue belongs is multi-stream is smaller than or equal to a fifth threshold value, determining start-stop information is used for indicating to start the LRO function of the receiving queue.
When the probability that the data streams to which the objects in one receive queue belong are multiple streams is less than or equal to the fifth threshold, the interleaving among the data streams in the receive queue can be considered not severe. In this case, executing the LRO function for the receive queue means that aggregation of message slices belonging to the same data stream is interrupted with low probability, and the number of aggregated message slices may be large, so the reduction in CPU occupancy brought by the LRO function can be well realized and the overall performance of the system improved.
It will be appreciated that interleaving between the data streams of a receive queue may be considered severe when the probability that the data stream to which the object in the receive queue belongs is multi-stream is greater than a fifth threshold. At this time, the timing may be restarted, and the probability that the data stream to which the object in the receiving queue belongs is multi-stream in the second preset period of time may be counted again, and so on, and when the probability that the data stream to which the object in the receiving queue belongs is multi-stream obtained in a certain counting period is less than or equal to the fifth threshold value, the LRO function of the receiving queue is determined to be turned on.
S204: and starting the LRO function of the receiving queue according to the start-stop information.
S201 to S202 can be regarded as one possible implementation of counting the probability that the data stream to which the object in the receive queue belongs is multi-stream. The above-mentioned S201 to S203 may be regarded as another possible implementation manner of determining start-stop information of the LRO function of the receive queue, and S204 is another possible implementation manner of setting the LRO function of the receive queue according to the start-stop information.
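Steps S201 to S204 admit a similarly small sketch, with each object represented only by its hash value; the default fifth threshold is an assumed value.

```python
FIFTH_THRESHOLD = 0.10  # assumed value for the fifth threshold

def should_start_lro(object_hashes, fifth_threshold=FIFTH_THRESHOLD):
    """S201-S204 sketch: count adjacent pairs whose hash values differ (c),
    divide by the number of objects (d), and start LRO when the
    multi-stream probability c / d is at most the fifth threshold."""
    d = len(object_hashes)
    c = sum(1 for prev, curr in zip(object_hashes, object_hashes[1:]) if prev != curr)
    return d > 0 and c / d <= fifth_threshold
```

For the four-slice example above (hashes alternating between two data streams), c/d = 3/4, which exceeds the threshold, so the LRO function stays off.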
As can be seen from the embodiments shown in fig. 5 and fig. 6, the LRO function setting method provided by the embodiment of the present application sets the LRO function at the granularity of the receive queue. Therefore, by reasonably setting the conditions for starting or stopping the LRO function of a receive queue, the aggregation effect of message slices and the processing duration of message slices are balanced, and the overall performance of the system is improved.
Optionally, the first threshold, the second threshold, and the fifth threshold above are all configurable, for example, configured according to the characteristics of the service to be processed, the CPU occupancy, and the aggregation effect of the LRO function. Of course, one or more of these thresholds may also be predefined. The embodiment of the present application is not limited thereto.
As an alternative implementation, a plurality of threshold groups are predefined, each threshold group including a third threshold and a fourth threshold. Based on this, before performing S101, the method may further include: selecting one threshold group from the plurality of threshold groups, taking the third threshold included in the selected threshold group as the first threshold, and taking the fourth threshold included in the selected threshold group as the second threshold.
As another possible implementation, a plurality of thresholds are predefined. Based on this, before performing S201, the method may further include: one of the plurality of threshold values is selected as a fifth threshold value.
As another possible implementation, a plurality of threshold groups may be predefined, each threshold group comprising a third threshold, a fourth threshold and a sixth threshold. Based on this, the method may further comprise: selecting a threshold value group from a plurality of threshold value groups; and the third threshold value included in the selected threshold value group is used as the first threshold value, the fourth threshold value included in the threshold value group is used as the second threshold value, and the sixth threshold value included in the threshold value group is used as the fifth threshold value.
For example, assume that the first threshold is labeled pkt.len.threshold, the second threshold is labeled lro.interrupt.rate, and the fifth threshold is labeled multiple.flow.rate. The predefined threshold groups in the receiving end server may then be as shown in Table 1:
TABLE 1
Threshold group      pkt.len.threshold   lro.interrupt.rate   multiple.flow.rate
Threshold group 1    32K                 50%                  5%
Threshold group 2    16K                 50%                  15%
……                   ……                  ……                   ……
Threshold group N    8K                  50%                  10%
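Table 1 can be encoded as a simple lookup, as sketched below. The numeric values come from the table; the group keys and the tuple ordering (first, second, fifth threshold) are illustrative assumptions.

```python
# Table 1 as a lookup: pkt.len.threshold, lro.interrupt.rate, multiple.flow.rate
THRESHOLD_GROUPS = {
    "group1": {"pkt.len.threshold": 32 * 1024, "lro.interrupt.rate": 0.50, "multiple.flow.rate": 0.05},
    "group2": {"pkt.len.threshold": 16 * 1024, "lro.interrupt.rate": 0.50, "multiple.flow.rate": 0.15},
    "groupN": {"pkt.len.threshold": 8 * 1024, "lro.interrupt.rate": 0.50, "multiple.flow.rate": 0.10},
}

def select_thresholds(group_name):
    """Map a chosen threshold group onto the first, second, and fifth
    thresholds used by the setting methods of figs. 5 and 6."""
    g = THRESHOLD_GROUPS[group_name]
    return g["pkt.len.threshold"], g["lro.interrupt.rate"], g["multiple.flow.rate"]
```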
It can be understood that, when the network card works, it can sense the bandwidth capability of its ports and can obtain the configuration information of the timer (such as the sizes of the first preset time period and the second preset time period); when the services to be processed differ, the information sensed or obtained by the network card may differ. Therefore, different threshold groups can be predefined according to the characteristics of different services; when setting the LRO function of a receive queue, an appropriate threshold group is selected from the plurality of threshold groups according to the characteristics of the service to which the objects in the receive queue belong, and the LRO function of the receive queue is set according to the selected threshold group, so as to maximize system performance.
Optionally, the server may refer to, but is not limited to, at least one of the following factors when predefining the size of the thresholds in each threshold group: port rate (e.g., 10GE, 25GE, 40GE, 50GE, 100GE, or 200GE, etc.), service typical IO size, LRO aggregation parameters (e.g., buffer size and configuration information of timers).
The above steps of predefining a plurality of thresholds and selecting one of them as the first threshold (or the second or fifth threshold) help different services to be processed (i.e., the services to which the objects in a receive queue belong) select different thresholds; that is, they provide customized handling for different services to be processed. On this basis, predefining reasonable thresholds and selecting among them reasonably benefits the overall performance of the system.
Next, by way of a specific example, the method of setting the LRO function described above is specifically explained.
Fig. 7 is a schematic diagram of a data processing method according to an embodiment of the present application. The data processing method includes a setting method of LRO function. The method shown in fig. 7 includes the steps of:
s300: when the processor of the transmitting end server needs to transmit the message to the receiving end server, the processor of the transmitting end server transmits the message to the network card of the transmitting end server.
S301: and after the network card of the transmitting end server receives the message, generating an object.
Specifically, if the length of the message is determined to be greater than the threshold, the message is segmented into a plurality of message slices such that the length of each message slice is smaller than the threshold; a message header is encapsulated for each message slice, and the message slice with the encapsulated header is taken as an object. If the length of the message is less than or equal to the threshold, the message itself is taken as an object.
Each object includes header information, which contains description information of the object, such as information indicating whether the object is a message slice or a message. In addition, for the last message slice of a message, the header information of that message slice may further include an end mark indicating that it is the last message slice of the message to which it belongs. Subsequently, the network card of the receiving end server can identify, according to the header information of an object, whether the object is a message slice or a message, and whether a message slice is the last message slice of its message.
S302: the network card of the transmitting end server transmits the object to the receiving end server.
S303: the network card of the receiving end server receives the object through N ports. N is more than or equal to 1, and N is an integer.
S304: for each object received through the port corresponding to the target PF, the network card of the receiving end server calculates the hash value of the object according to the hash algorithm corresponding to the target PF, and determines the receiving queue corresponding to the hash value of the object according to the mapping relation between the hash values and the receiving queues. The object is then considered to be assigned to the receive queue.
The target PF may be any PF supported by the network card of the receiving end server. One port may support one or more PFs, or multiple ports may support one PF. The hash algorithms corresponding to different PFs may be the same or different.
The mapping relationship between the plurality of hash values and the plurality of receive queues may be predefined. The number of receive queues may be the number of receive queues corresponding to the target PF. The number of hash values may be the number of hash values of the hash algorithm to which the target PF corresponds. For example, assuming that the receive queues corresponding to the target PF are receive queues 1 to 16, and the hash values of the hash algorithm corresponding to the target PF include hash values 1 to 32, the correspondence between hash values and receive queues may be that hash value a and hash value 16+a correspond to receive queue a, where 1 ≤ a ≤ 16 and a is an integer.
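The example correspondence (hash value a and hash value 16+a both map to receive queue a) is a modulo mapping, sketched here; the function name is an illustrative assumption.

```python
def queue_for_hash(h, num_queues=16):
    """Map hash values 1..32 onto receive queues 1..16 so that hash a
    and hash 16 + a land in the same queue a, as in the example above."""
    return ((h - 1) % num_queues) + 1
```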
S305: for the target receive queue, the network card of the receive end server may store each object belonging to the target receive queue in turn into the buffer described by the buffer descriptor included in the receive queue. One or more objects may be stored in one buffer area, and one object may be stored in a plurality of buffer areas.
If the LRO function of the target receive queue is already on, S306 is performed.
If the LRO function of the target receive queue has stopped, S313 is performed.
S306: for the object belonging to the target receiving queue, the network card of the receiving end server executes the LRO function.
For example, for a consecutive plurality of message slices belonging to the target receive queue, aggregation is performed if the aggregation condition is satisfied, otherwise aggregation is not performed. The target receiving queue may be any receiving queue corresponding to the target PF. Of course, there may be cases where each slice belonging to the target receive queue does not satisfy the aggregation condition.
The embodiment of the application does not limit the sequence of S305 and S306. For example, S306 may be executed in the process of executing S305, where, for example, each time the network card determines a receiving queue to which an object belongs, the network card stores the object in a buffer described by a buffer descriptor included in the receiving queue, and determines whether the object needs to perform an aggregation operation at the same time or sequentially.
S307: in performing LRO functions for the target receive queue, the network card of the receiving end device generates completion queue elements (complete queue element, CQEs) for one or more target objects.
The network card of the receiving end device generates a CQE for each target object after determining the target object. One target object is a message in a target receiving queue, or a message slice or a message obtained by aggregating a plurality of message slices in the target receiving queue, or a message slice which cannot participate in aggregation in the target receiving queue.
The network card of the receiving end device can judge and record whether each target object needs an aggregation operation. If the current target object is a message, no aggregation operation is needed. If the current target object is a message slice, an aggregation operation is needed.
The network card of the receiving end device can judge and record whether the current message slice is the last message slice in the message to which the current message slice belongs. For example, whether the current message slice is the last message slice in the message to which the current message slice belongs is determined according to whether the header information of the current message slice contains an end mark.
The network card of the receiving end device can judge and record the length of each target object.
Optionally, the CQE of the current target object may include: information indicating whether the current target object needs an aggregation operation; for example, this information may be marked lro_flag. Specifically, lro_flag = 1 if the current target object needs an aggregation operation; otherwise lro_flag = 0.
Further, if the current target object needs an aggregation operation, the CQE may further include information indicating whether the current target object is the last message slice of the message to which it belongs. For example, this information may be marked push_flag; specifically, push_flag = 1 if the target object is the last message slice of its message, otherwise push_flag = 0.
Still further, if the current target object is not the last message slice of its message, the CQE may also include information indicating the length of the current target object. This information may be the length of the current target object itself, or an indication of whether the length of the current target object is less than or equal to the first preset threshold. For example, the information indicating the length of the current target object may be marked lro_length. When lro_length indicates whether the length of the current target object is less than or equal to the first preset threshold, lro_length = 1 if the length of the current target object is less than or equal to the first preset threshold; otherwise lro_length = 0.
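The three CQE fields just described (lro_flag, push_flag, and lro_length in its boolean form) can be assembled as sketched below; the dictionary representation and the default threshold value are assumptions for illustration.

```python
FIRST_PRESET_THRESHOLD = 16 * 1024  # assumed value of the first preset threshold

def build_cqe(is_slice, is_last_slice, length, threshold=FIRST_PRESET_THRESHOLD):
    """Encode a target object into the CQE fields described above:
    push_flag is present only for slices, and lro_length only for
    slices that are not the last slice of their message."""
    cqe = {"lro_flag": 1 if is_slice else 0}
    if is_slice:
        cqe["push_flag"] = 1 if is_last_slice else 0
        if not is_last_slice:
            cqe["lro_length"] = 1 if length <= threshold else 0
    return cqe
```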
S308: after the network card of the receiving end equipment generates CQEs of a preset number of target objects, a notification message is sent to a processor of the receiving end equipment. Wherein the specific value of the preset number may be predefined, for example by a protocol.
S309: after the processor of the receiving end device receives the notification message, the CQEs of the preset number of target objects are read from the network card of the receiving end device.
S310: the processor of the receiving end device analyzes the CQE of the target object read in the first preset time period, and according to the information obtained by analysis, the probability of the aggregation process interruption of the target receiving queue is counted. If the probability is greater than or equal to the second threshold, the processor determines to stop the LRO function of the target receive queue.
The probability is obtained by dividing the number of times the aggregation process of the receiving queue is interrupted in the first preset time period by the number of target objects indicated by the CQEs of the target objects read in the first preset time period (i.e., the number of CQEs of the target objects read in the first preset time period).
Based on the example in S307, the processor of the receiving end device may obtain the number of times that the aggregation process of the receive queue is interrupted within the first preset time period in the following mode 1 or mode 2:
Mode 1: if lro_length is the length of the current target object, when the processor of the receiving end device parses the CQE of the current target object to obtain lro_flag=1, push_flag=0, and the value of lro_length is less than or equal to the first preset threshold, it is determined that the current target object causes the aggregation process of the target receive queue to be interrupted. Otherwise, it is determined that the current target object does not cause an interruption of the aggregation process of the target receive queue.
Mode 2: if lro_length indicates whether the length of the current target object is less than or equal to the first preset threshold, then when the processor parses the CQE of the current target object and obtains lro_flag=1, push_flag=0, and lro_length=1, it determines that the current target object causes the aggregation process of the target receive queue to be interrupted. Otherwise, it determines that the current target object does not cause an interruption of the aggregation process of the target receive queue.
According to the above mode 1 or mode 2, it may be determined whether the target object indicated by each CQE acquired in the first preset period of time causes interruption of the aggregation process of the target receive queue, so as to calculate the probability of interruption of the aggregation process of the target receive queue in the first preset period of time.
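Modes 1 and 2 can be sketched as follows; the CQE field names lro_flag, push_flag, and lro_length come from the patent text, while the function names and parameter types are hypothetical:

```python
def causes_interruption_mode1(lro_flag: int, push_flag: int,
                              lro_length: int, first_threshold: int) -> bool:
    """Mode 1: lro_length carries the actual length of the current target
    object; compare it against the first preset threshold directly."""
    return lro_flag == 1 and push_flag == 0 and lro_length <= first_threshold


def causes_interruption_mode2(lro_flag: int, push_flag: int,
                              lro_length: int) -> bool:
    """Mode 2: lro_length is a one-bit indication that is 1 when the length
    is less than or equal to the first preset threshold."""
    return lro_flag == 1 and push_flag == 0 and lro_length == 1
```

Counting the CQEs for which either function returns True over the first preset time period yields the numerator of the interruption probability.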
S311: the processor of the receiving end device sends first indication information to the network card, wherein the first indication information is used for indicating to stop the LRO function of the target receiving queue.
S312: and the network card of the receiving end equipment stops the LRO function of the target receiving queue according to the first indication information.
After S312 is performed, the procedure for how to stop the LRO function of the target receive queue ends.
S313: the network card of the receiving end device generates a CQE for one or more objects in the target receive queue.
The CQE of the current target object may include: hash value information of the current object.
In one implementation, the hash value information for the current object may be the hash value of the current object. In another implementation, the hash value information of the current object may be an indication of whether the hash value of the current object differs from the hash value of the previous object.
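The two encodings can be sketched together; the function name and the flag encoding (1 for "changed", 0 for "unchanged") are assumptions:

```python
def hash_value_information(current_hash: int, previous_hash: int,
                           as_change_flag: bool = False):
    """Two possible encodings of the hash value information carried in the
    CQE (S313): either the raw hash of the current object, or a one-bit
    flag saying whether it differs from the previous object's hash."""
    if as_change_flag:
        return 1 if current_hash != previous_hash else 0
    return current_hash
```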
S314: after the network card of the receiving end device generates CQEs for a preset number of target objects, it sends a notification message to the processor of the receiving end device.
Wherein the specific value of the preset number may be predefined, for example by a protocol.
S315: after the processor of the receiving end device receives the notification message, the CQEs of the preset number of target objects are read from the network card of the receiving end device.
S316: the processor of the receiving end device parses the CQEs of the target objects read within the second preset time period and, based on the parsed information, counts the probability that the data streams to which the objects in the receive queue belong are multiple streams. If the probability is less than or equal to the fifth threshold, the processor of the receiving end device determines to start the LRO function of the target receive queue. The probability is the number of times, counted within the second preset time period, that the data stream to which an object in the target receive queue belongs is a different stream, divided by the number of objects in the target receive queue within the second preset time period.
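The statistic in S316 can be sketched as follows, assuming the hash value of each object is available from its CQE; the function names and the example fifth threshold are illustrative:

```python
def multistream_probability(hashes: list) -> float:
    """S316: each pair of adjacent objects with differing hash values
    counts once toward 'the data stream is multi-stream'; the probability
    divides that count by the number of objects in the period."""
    if not hashes:
        return 0.0
    changes = sum(1 for a, b in zip(hashes, hashes[1:]) if a != b)
    return changes / len(hashes)


def should_start_lro(hashes: list, fifth_threshold: float = 0.2) -> bool:
    """Restart LRO when the traffic looks mostly single-stream again
    (the 0.2 default is illustrative, not from the patent)."""
    return multistream_probability(hashes) <= fifth_threshold
```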
S317: the processor of the receiving end device sends second indication information to the network card of the receiving end device, and the second indication information is used for indicating to start the LRO function of the target receiving queue.
S318: and the network card of the receiving end equipment starts an LRO function of the target receiving queue according to the second indication information.
After S318 is performed, the procedure for how to turn on the LRO function of the target receive queue ends.
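Taken together, S303 to S318 amount to a small hysteresis loop: stop LRO when aggregation is interrupted too often, and restart it once the traffic looks single-stream again. A hedged sketch, with illustrative thresholds and class/method names not taken from the patent:

```python
class LroController:
    """Sketch of the overall S303-S318 decision loop. When LRO is on, only
    the interruption probability is checked; when it is off, only the
    multi-stream probability is checked."""

    def __init__(self, second_threshold: float = 0.5,
                 fifth_threshold: float = 0.2):
        self.lro_enabled = True
        self.second_threshold = second_threshold
        self.fifth_threshold = fifth_threshold

    def on_period_end(self, interrupt_prob: float = 0.0,
                      multistream_prob: float = 1.0) -> bool:
        if self.lro_enabled:
            # S310-S312: stop LRO if interruptions reach the second threshold
            if interrupt_prob >= self.second_threshold:
                self.lro_enabled = False
        else:
            # S316-S318: restart LRO if multi-stream probability is low enough
            if multistream_prob <= self.fifth_threshold:
                self.lro_enabled = True
        return self.lro_enabled
```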
It should be noted that, in a specific implementation, the steps performed by the processor of the receiving end device in S303 to S318 may instead be performed by the network card of the receiving end device. In this case, the steps of interaction between the network card and the processor in S303 to S318 need not be performed, which yields a new embodiment that is not described here for brevity.
The foregoing description of the solutions provided by the embodiments of the present application has been presented mainly in terms of the method. To achieve the above functions, the device includes corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or as combinations of hardware and computer software. Whether a function is implemented as hardware or as computer-software-driven hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The embodiment of the application can divide the function modules of the LRO function setting device according to the method example, for example, each function module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
The method for setting the LRO function provided according to the embodiment of the present application is described in detail above with reference to fig. 5 to 7, and the device, the network card, and the server for setting the LRO function provided according to the embodiment of the present application will be described below with reference to fig. 8 to 10.
Fig. 8 is a schematic structural diagram of an LRO function setting device 80 according to an embodiment of the present application. The apparatus 80 may be used to perform the LRO function setting method shown in any one of fig. 5 to 7. The apparatus 80 may include: a determination unit 801 and a setting unit 802. The determining unit 801 is configured to determine start-stop information of an LRO function of a receive queue, where the start-stop information is used to instruct to start or stop the LRO function of the receive queue. A setting unit 802, configured to set an LRO function of the receive queue according to the start-stop information. For example, in connection with fig. 5, the determination unit 801 may be used to perform S101 to S102, and the setting unit 802 may be used to perform S103. As another example, in connection with fig. 6, the determining unit 801 may be used to perform S201 to S202, and the setting unit 802 may be used to perform S203.
In one possible implementation manner, the setting unit 802 is specifically configured to start the LRO function of the receive queue according to the start-stop information; wherein, starting the LRO function of the receiving queue includes: and starting to aggregate a plurality of message slices belonging to the same data stream in the receiving queue.
In one possible implementation manner, the determining unit 801 is specifically configured to: under the condition that the LRO function of the receive queue is started, count the probability that the aggregation process of the receive queue is interrupted; the target object obtained after the LRO function is executed for the receive queue includes: a message in the receive queue, a message slice in the receive queue that cannot participate in aggregation, and/or a message slice or message obtained after aggregation of a plurality of message slices in the receive queue; if one target object is a message slice, is not the last message slice in the message to which it belongs, and its length is less than or equal to a first threshold, the aggregation process of the receive queue is interrupted once; when the probability that the aggregation process of the receive queue is interrupted is greater than or equal to a second threshold, determine that the start-stop information is used for indicating to stop the LRO function of the receive queue. For example, in connection with fig. 5, the determination unit 801 may be used to perform S101 to S102.
In one possible implementation manner, the determining unit 801 is specifically configured to: counting the times of interruption of the aggregation process of the receiving queue in a first preset time period, and counting the number of target objects obtained after the LRO function is executed on the receiving queue in the first preset time period; and determining the probability of the aggregation process interruption of the receiving queue according to the times of the aggregation process interruption of the receiving queue in the first preset time period and the number of target objects obtained after the LRO function is executed on the receiving queue in the first preset time period. For example, in connection with fig. 5, the determination unit 801 may be used to perform S101.
In a possible implementation, the apparatus 80 further comprises a selection unit 803 for: selecting a threshold set from a plurality of threshold sets, each threshold set comprising a third threshold and a fourth threshold; the third threshold value included in the selected threshold value group is used as a first threshold value, and the fourth threshold value included in the selected threshold value group is used as a second threshold value.
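A minimal sketch of the threshold-group selection performed by the selection unit; the number of groups and all values are made up for illustration:

```python
# Hypothetical threshold groups; each entry is a (third, fourth) pair.
THRESHOLD_GROUPS = [
    (64, 0.5),    # third threshold (bytes), fourth threshold (probability)
    (128, 0.6),
    (256, 0.7),
]


def select_thresholds(index: int):
    """Pick one group; its third threshold is used as the first threshold
    and its fourth threshold as the second threshold."""
    third, fourth = THRESHOLD_GROUPS[index]
    first_threshold, second_threshold = third, fourth
    return first_threshold, second_threshold
```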
In one possible implementation manner, the determining unit 801 is specifically configured to: under the condition that the LRO function of the receiving queue is stopped, counting the probability that the data flow to which the object in the receiving queue belongs is multiflow, wherein the object comprises a message and/or a message slice; if the counted probability that the data stream to which the object in the receiving queue belongs is multi-stream is smaller than or equal to a fifth threshold value, determining start-stop information to be used for indicating to start the LRO function of the receiving queue. For example, in connection with fig. 6, the determination unit 801 may be used to perform S201 to S202.
In one possible implementation manner, the determining unit 801 is specifically configured to: counting the number of times that the data stream to which the object in the receiving queue belongs is multi-stream in a second preset time period and the number of the objects in the receiving queue in the second preset time period; if the hash values of two adjacent objects in the receiving queue are different, adding 1 to the number of times that the data stream to which the object of the receiving queue belongs is multi-stream; and determining the probability that the data flow of the object in the receiving queue is multi-flow according to the times that the data flow of the object in the receiving queue is multi-flow in the second preset time period and the number of the objects in the receiving queue in the second preset time period. As another example, in connection with fig. 6, the determining unit 801 may be used to perform S201.
In a possible implementation, the apparatus 80 further comprises a selection unit 803 for selecting one threshold value from the plurality of threshold values and regarding the selected threshold value as the fifth threshold value.
It should be appreciated that the apparatus 80 of embodiments of the present application may be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD), where the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. When the methods shown in figs. 5 and 6 are implemented by software, the apparatus 80 and its respective modules may also be software modules.
The explanation of the relevant contents, the description of the advantageous effects, and the like in this embodiment can refer to the above-described method embodiments.
Fig. 9 is a schematic hardware structure of a network card 90 according to an embodiment of the present application. As shown, the network card 90 includes: at least one processor 901, a communication line 902, a memory 903, and a communication interface 904. The communication line 902 may include a path for communicating information between the at least one processor 901, the memory 903, and the communication interface 904. The communication interface 904 is used by the network card 90 to communicate with other devices or apparatuses. The communication interface 904 may include a wired transceiver or a wireless transceiver, and the wireless transceiver may include a communication chip. The at least one processor 901 and the communication chip may be integrated together or provided separately. The memory 903 is used to store the computer-executable instructions for performing the aspects of the present application, and their execution is controlled by the processor 901. The processor 901 is configured to execute the computer-executable instructions stored in the memory 903, thereby implementing the LRO function setting method provided in the above embodiments of the present application. For explanations of the relevant contents and descriptions of the advantageous effects in this embodiment, refer to the above method embodiments.
Fig. 10 is a schematic structural diagram of a server 1000 according to an embodiment of the present application. As shown, the server 1000 includes at least one processor 1001, communication lines 1002, memory 1003, a network card 1004, and a communication interface 1005. Communication interface 1005 may include a wired transceiver or a wireless transceiver. The wireless transceiver may include a communication chip. The at least one processor 1001 and the communication chip may be integrated together or may be separately provided.
The processor 1001 may be a general-purpose CPU, or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor or any conventional processor. The processor 1001 may also be a graphics processing unit (GPU), a neural network processing unit (NPU), a microprocessor, or one or more integrated circuits for controlling the execution of the programs of the present application.
Communication line 1002 may include a path for communicating information between components such as processor 1001, memory 1003, network card 1004, and communication interface 1005.
The memory 1003 may be, but is not limited to, a read-only memory (ROM) or other type of static storage device that can store static information and instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disk storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1003 may be separate and coupled to the processor 1001 by the communication line 1002, or may be integrated with the processor 1001. The memory 1003 provided by embodiments of the present application may generally be non-volatile. The memory 1003 is used to store the computer-executable instructions for performing the aspects of the present application, and their execution is controlled by the processor 1001. The processor 1001 is configured to execute the computer-executable instructions stored in the memory 1003, thereby implementing the LRO function setting method provided in the above embodiments of the present application.
The structure of the network card 1004 may refer to fig. 9, which is not described herein.
The communication interface 1005 may be any transceiver-like device for the server 1000 to communicate with other devices.
Computer-executable instructions in embodiments of the present application may alternatively be referred to as application code.
As one example, the processor 1001 may include one or more CPUs. As one example, the server 1000 may include a plurality of processors. Each of these processors may be a single-core (single-CPU) processor or may be a multi-core (multi-CPU) processor. A processor herein may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
The server 1000 may be a general-purpose device or a special-purpose device. For example, the server 1000 may be an X86- or ARM-based server, or another dedicated server such as a policy control and charging (PCC) server. The embodiment of the present application does not limit the type of the server 1000. ARM stands for advanced RISC machines; RISC stands for reduced instruction set computer.
The embodiment of the application further provides a communication system, which may include the server 1000 acting as a receiving end server. The communication system further includes a transmitting end server for transmitting objects to the receiving end server, so that the receiving end server performs the LRO function setting method described above.
In the above embodiments, the implementation may be wholly or partly by software, hardware, firmware, or any combination thereof. When implemented using a software program, it may be implemented wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from a website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
The foregoing is only a specific embodiment of the present application. Variations and alternatives will occur to those skilled in the art based on the detailed description provided herein and are intended to be included within the scope of the application.

Claims (17)

1. A method for setting a large-scale reception offload LRO function, the method comprising:
acquiring the interruption probability of an aggregation process of a first receiving queue in a network card, wherein the first receiving queue is any one of a plurality of receiving queues; determining start-stop information of the first receiving queue according to an LRO start condition and the interruption probability, and counting the interruption probability of the aggregation process of the receiving queue under the condition that the LRO function of the receiving queue is started; the target object obtained after the LRO function is executed for the receiving queue comprises: the message in the receiving queue, the message slice in the receiving queue that cannot participate in aggregation, and/or the message slice or the message obtained after the aggregation of a plurality of message slices in the receiving queue; if one target object is a message slice, is not the last message slice in the message to which it belongs, and the length is smaller than or equal to a first threshold value, the aggregation process of the receiving queue is interrupted once; when the interruption probability of the aggregation process of the receiving queue is greater than or equal to a second threshold value, determining that the start-stop information is used for indicating to stop the LRO function of the receiving queue; the LRO opening condition is used for indicating a rule for opening or closing an LRO function of a receiving queue in the network card;
And setting an LRO function of the network card according to the start-stop information.
2. The method of claim 1, wherein the network card includes a plurality of receive queues, each receive queue being capable of setting the LRO open condition.
3. The method according to claim 1 or 2, wherein the start-stop information includes information for indicating to start an LRO function of aggregating a plurality of message slices belonging to a same data flow in the first receiving queue, or information for indicating to stop the LRO function of aggregating a plurality of message slices belonging to a same data flow.
4. The method of claim 1, wherein said counting the probability of an aggregate process interruption for the receive queue comprises:
counting the times of the interruption of the aggregation process of the receiving queue in a first preset time period and the number of target objects obtained after the LRO function is executed on the receiving queue in the first preset time period;
and determining the probability of the aggregation process interruption of the receiving queue according to the times of the aggregation process interruption of the receiving queue in the first preset time period and the number of target objects obtained after the LRO function is executed on the receiving queue in the first preset time period.
5. The method according to claim 1 or 4, characterized in that the method further comprises:
selecting a threshold set from a plurality of threshold sets, each threshold set comprising a third threshold and a fourth threshold;
and taking a third threshold value included in the selected threshold value group as the first threshold value, and taking a fourth threshold value included in the selected threshold value group as the second threshold value.
6. The method of claim 1, wherein the determining start-stop information for the first receive queue based on the LRO start condition and the outage probability comprises:
under the condition that the LRO function of the receiving queue is stopped, counting the probability that the data flow to which the object in the receiving queue belongs is multiflow, wherein the object comprises a message and/or a message slice;
and if the counted probability that the data flow to which the object in the receiving queue belongs is multi-flow is smaller than or equal to a fifth threshold value, determining that the start-stop information is used for indicating to start an LRO function of the receiving queue.
7. The method of claim 6, wherein said counting the probability that the data stream to which the object in the receive queue belongs is multi-stream comprises:
counting the number of times that the data stream to which the object in the receiving queue belongs is multi-stream in a second preset time period and the number of the objects in the receiving queue in the second preset time period; if the hash values of two adjacent objects in the receiving queue are different, adding 1 to the number of times that the data stream to which the object of the receiving queue belongs is multi-stream;
And determining the probability that the data flow of the object in the receiving queue is multi-flow according to the times that the data flow of the object in the receiving queue is multi-flow in the second preset time period and the number of the objects in the receiving queue in the second preset time period.
8. The method according to claim 6 or 7, characterized in that the method further comprises:
one threshold value is selected from a plurality of threshold values, and the selected threshold value is taken as the fifth threshold value.
9. A setting apparatus for large-scale reception offload LRO function, the apparatus comprising:
a determining unit, configured to acquire the interruption probability of an aggregation process of a first receiving queue in a network card, wherein the first receiving queue is any one of a plurality of receiving queues; determine start-stop information of the first receiving queue according to an LRO start condition and the interruption probability, and count the interruption probability of the aggregation process of the receiving queue under the condition that the LRO function of the receiving queue is started; the target object obtained after the LRO function is executed for the receiving queue comprises: the message in the receiving queue, the message slice in the receiving queue that cannot participate in aggregation, and/or the message slice or the message obtained after the aggregation of a plurality of message slices in the receiving queue; if one target object is a message slice, is not the last message slice in the message to which it belongs, and the length is smaller than or equal to a first threshold value, the aggregation process of the receiving queue is interrupted once; and when the interruption probability of the aggregation process of the receiving queue is greater than or equal to a second threshold value, determine that the start-stop information is used for indicating to stop the LRO function of the receiving queue; the LRO opening condition is used for indicating a rule for opening or closing an LRO function of a receiving queue in the network card;
And the setting unit is used for setting the LRO function of the network card according to the start-stop information.
10. The apparatus of claim 9, wherein the network card includes a plurality of receive queues, each receive queue capable of setting the LRO open condition.
11. The apparatus according to claim 9 or 10, wherein the start-stop information includes information for indicating to start an LRO function of aggregating a plurality of message slices belonging to a same data flow in the first receiving queue, or information for indicating to stop the LRO function of aggregating a plurality of message slices belonging to a same data flow.
12. The apparatus of claim 9, wherein
the determining unit is further configured to count the number of times of interruption of the aggregation process of the receive queue in a first preset time period, and the number of target objects obtained after the LRO function is performed on the receive queue in the first preset time period; and determining the probability of the interruption of the aggregation process of the receiving queue according to the interruption times of the aggregation process of the receiving queue in the first preset time period and the number of target objects obtained after the LRO function is executed on the receiving queue in the first preset time period.
13. The apparatus according to claim 9 or 10, wherein
the determining unit is further configured to select a threshold group from a plurality of threshold groups, each threshold group including a third threshold and a fourth threshold; and taking a third threshold value included in the selected threshold value group as the first threshold value, and taking a fourth threshold value included in the selected threshold value group as the second threshold value.
14. The apparatus of claim 9, wherein
the determining unit is further configured to, in a case where an LRO function of the receiving queue is stopped, count a probability that a data stream to which an object in the receiving queue belongs is multi-stream, where the object includes a packet and/or a packet slice; and if the counted probability that the data flow to which the object in the receiving queue belongs is multi-flow is smaller than or equal to a fifth threshold value, determining that the start-stop information is used for indicating to start an LRO function of the receiving queue.
15. The apparatus of claim 14, wherein
the determining unit is further configured to count the number of times that the data stream to which the object in the receiving queue belongs is multi-stream in a second preset time period and the number of objects in the receiving queue in the second preset time period; if the hash values of two adjacent objects in the receiving queue are different, adding 1 to the number of times that the data stream to which the object of the receiving queue belongs is multi-stream; and determining the probability that the data flow of the object in the receiving queue is multi-flow according to the times that the data flow of the object in the receiving queue is multi-flow in the second preset time period and the number of the objects in the receiving queue in the second preset time period.
16. The apparatus according to claim 14 or 15, further comprising a selection unit for selecting one threshold value from a plurality of threshold values, and taking the selected threshold value as the fifth threshold value.
17. A setting device for large-scale reception of an offloaded LRO function, characterized in that the device comprises a memory for storing computer-executable instructions and a processor for invoking the computer-executable instructions such that the device, when running, executes the computer-executable instructions to perform the operational steps of the method according to any of claims 1 to 8.
CN202110527602.2A 2018-11-14 2018-11-14 Method and device for setting large-scale receiving and unloading functions Active CN113411262B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110527602.2A CN113411262B (en) 2018-11-14 2018-11-14 Method and device for setting large-scale receiving and unloading functions

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110527602.2A CN113411262B (en) 2018-11-14 2018-11-14 Method and device for setting large-scale receiving and unloading functions
CN201811356753.0A CN109688063B (en) 2018-11-14 2018-11-14 Method and device for setting large receiving and unloading function

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811356753.0A Division CN109688063B (en) 2018-11-14 2018-11-14 Method and device for setting large receiving and unloading function

Publications (2)

Publication Number Publication Date
CN113411262A CN113411262A (en) 2021-09-17
CN113411262B true CN113411262B (en) 2023-09-05

Family

ID=66184666

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201811356753.0A Active CN109688063B (en) 2018-11-14 2018-11-14 Method and device for setting large receiving and unloading function
CN202110527602.2A Active CN113411262B (en) 2018-11-14 2018-11-14 Method and device for setting large-scale receiving and unloading functions

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201811356753.0A Active CN109688063B (en) 2018-11-14 2018-11-14 Method and device for setting large receiving and unloading function

Country Status (1)

Country Link
CN (2) CN109688063B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694781A (en) * 2020-04-21 2020-09-22 恒信大友(北京)科技有限公司 ARM main control board based on data acquisition system
CN112214968A (en) * 2020-10-12 2021-01-12 中国民航信息网络股份有限公司 Message conversion method and device and electronic equipment
CN115733897A (en) * 2021-08-27 2023-03-03 华为技术有限公司 Data processing method and device
CN114448916A (en) * 2021-12-24 2022-05-06 锐捷网络股份有限公司 TIPC message processing method, device, equipment and storage medium
CN115665073B (en) * 2022-12-06 2023-04-07 江苏为是科技有限公司 Message processing method and device
CN117354254B (en) * 2023-10-17 2024-04-02 无锡众星微系统技术有限公司 Combined interrupt control method and device based on LRO timeout and interrupt ITR timeout

Citations (6)

Publication number Priority date Publication date Assignee Title
US6529911B1 (en) * 1998-05-27 2003-03-04 Thomas C. Mielenhausen Data processing system and method for organizing, analyzing, recording, storing and reporting research results
CN101841545A (en) * 2010-05-14 2010-09-22 中国科学院计算技术研究所 TCP stream restructuring and/or packetizing method and device
US8386644B1 (en) * 2010-10-11 2013-02-26 Qlogic, Corporation Systems and methods for efficiently processing large data segments
EP2843891A1 (en) * 2013-08-26 2015-03-04 VMWare, Inc. Traffic and load aware dynamic queue management
JP2016012801A (en) * 2014-06-27 2016-01-21 富士通株式会社 Communication apparatus, communication system, and communication apparatus control method
CN108337188A (en) * 2013-08-26 2018-07-27 Vm维尔股份有限公司 The traffic and the management of Load-aware dynamic queue

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US20090232137A1 (en) * 2008-03-12 2009-09-17 Dell Products L.P. System and Method for Enhancing TCP Large Send and Large Receive Offload Performance
US9384033B2 (en) * 2014-03-11 2016-07-05 Vmware, Inc. Large receive offload for virtual machines
US9742682B2 (en) * 2014-03-11 2017-08-22 Vmware, Inc. Large receive offload for virtual machines

Also Published As

Publication number Publication date
CN113411262A (en) 2021-09-17
CN109688063A (en) 2019-04-26
CN109688063B (en) 2021-05-18

Similar Documents

Publication Publication Date Title
CN113411262B (en) Method and device for setting large-scale receiving and unloading functions
US9965441B2 (en) Adaptive coalescing of remote direct memory access acknowledgements based on I/O characteristics
CN107003905B (en) Techniques to dynamically allocate resources for local service chains of configurable computing resources
US9313047B2 (en) Handling high throughput and low latency network data packets in a traffic management device
US20210152675A1 (en) Message segmentation
CN111641566B (en) Data processing method, network card and server
CN115349121A (en) Method and device for processing stateful service
CN113014508A (en) Message processing method and device
CN105978821B (en) The method and device that network congestion avoids
US11876859B2 (en) Controlling packet delivery based on application level information
CN117157957A (en) Switch-induced congestion messages
WO2023107208A1 (en) Congestion control
US11303571B2 (en) Data communication method and data communications network
US20220103479A1 (en) Transmit rate based on detected available bandwidth
CN115242726A (en) Queue scheduling method and device and electronic equipment
CN113328953A (en) Method, device and storage medium for network congestion adjustment
CN116868553A (en) Dynamic network receiver driven data scheduling on a data center network for managing endpoint resources and congestion relief
WO2015032430A1 (en) Scheduling of virtual machines
EP3709164A1 (en) Software assisted hashing to improve distribution of a load balancer
US20230029796A1 (en) Stateful service processing method and apparatus
CN115514708B (en) Congestion control method and device
US20220179805A1 (en) Adaptive pipeline selection for accelerating memory copy operations
CN117579543B (en) Data stream segmentation method, device, equipment and computer readable storage medium
US20190007317A1 (en) Technologies for efficiently determining a root of congestion with a multi-stage network switch
EP3016333B1 (en) Handling high throughput and low latency network data packets in a traffic management device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211223

Address after: 450046 Floor 9, building 1, Zhengshang Boya Plaza, Longzihu wisdom Island, Zhengdong New Area, Zhengzhou City, Henan Province

Applicant after: Super fusion Digital Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant