CN112799840B - Method, apparatus, device and storage medium for transmitting data

Method, apparatus, device and storage medium for transmitting data

Info

Publication number
CN112799840B
CN112799840B (application CN202110125964.9A)
Authority
CN
China
Prior art keywords
processing result
message
data transmission
determining
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110125964.9A
Other languages
Chinese (zh)
Other versions
CN112799840A (en)
Inventor
赵罡
陈方耀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110125964.9A priority Critical patent/CN112799840B/en
Publication of CN112799840A publication Critical patent/CN112799840A/en
Application granted granted Critical
Publication of CN112799840B publication Critical patent/CN112799840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application discloses a method, apparatus, device and storage medium for transmitting data, relating to the field of artificial intelligence and in particular to virtualization in cloud computing. The specific implementation scheme is as follows: a target message is acquired through a field programmable gate array; in response to determining that the category of the target message includes a control message, the control message is parsed to obtain a first processing result; in response to determining that the category of the target message includes a data message, the data message is parsed to obtain a second processing result; and data transmission is performed through the system-on-chip according to a third processing result, wherein the third processing result includes any one of the first processing result and the second processing result. In this implementation, the acquired target message is parsed and, according to the parsing result, distributed to the corresponding processing unit in the network card for specialized processing, which makes the back end compatible with bare metal servers, improves data input/output throughput, and reduces overall material cost.

Description

Method, apparatus, device and storage medium for transmitting data
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to the field of virtualization in cloud computing, and more particularly, to a method, apparatus, device, and storage medium for transmitting data.
Background
VirtIO is a paravirtualized IO protocol and is currently the most widely used virtualized device implementation.
Bare metal servers are maturing as public cloud products. To share images with virtual machines, a bare metal server needs to support IO devices with a VirtIO back end. However, VirtIO was originally designed to solve virtualized IO, so existing back-end implementations are based on a virtual machine monitor and a bare-hardware implementation is lacking, which greatly limits the functional characteristics of public cloud bare metal servers.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for transmitting data.
According to an aspect of the present disclosure, there is provided a method for transmitting data, applied to a network card, wherein the network card includes a field programmable gate array and a system-on-chip, the method including: acquiring a target message through the field programmable gate array; in response to determining that the category of the target message includes a control message, parsing the control message to obtain a first processing result; in response to determining that the category of the target message includes a data message, parsing the data message to obtain a second processing result; and performing data transmission through the system-on-chip according to a third processing result, wherein the third processing result includes any one of the first processing result and the second processing result.
According to another aspect of the present disclosure, there is provided an apparatus for transmitting data, applied to a network card, wherein the network card includes a field programmable gate array and a system-on-chip, the apparatus including: an acquisition unit configured to acquire a target message through the field programmable gate array; a first parsing unit configured to parse the control message to obtain a first processing result in response to determining that the category of the target message includes a control message; a second parsing unit configured to parse the data message to obtain a second processing result in response to determining that the category of the target message includes a data message; and a control unit configured to perform data transmission through the system-on-chip according to a third processing result, wherein the third processing result includes any one of the first processing result and the second processing result.
According to still another aspect of the present disclosure, there is provided an electronic device for transmitting data, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method for transmitting data as described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method for transmitting data as described above.
According to yet another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements a method for transmitting data as described above.
According to this technology, the problem that applying the VirtIO back end to a bare metal server greatly limits the functional characteristics of a public cloud bare metal server is solved. The acquired target message is parsed, and according to the parsing result the message is distributed to the corresponding processing unit in the network card (for example, the field programmable gate array on the hardware side and the system-on-chip on the software side) for specialized processing. The entire back end is independent of the Host, so it is compatible with bare metal servers and improves data input/output throughput. Thousands of paravirtualized IO protocol devices and queues can be supported with a small hardware device capacity, queue management and scheduling are implemented by the software driver, and when a large number of devices and queues exist, complex scheduling logic and heavy resource consumption are shifted from the field programmable gate array to the system-on-chip that specializes in such processing logic. This improves stability and efficiency while reducing overall material cost.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are included to provide a better understanding of the present application and are not to be construed as limiting the application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which an embodiment of the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for transmitting data in accordance with the present application;
fig. 3 is a schematic diagram of an application scenario of a method for transmitting data according to the present application;
FIG. 4 is a flow chart of another embodiment of a method for transmitting data according to the present application;
FIG. 5 is a schematic diagram of an embodiment of an apparatus for transmitting data in accordance with the present application;
Fig. 6 is a block diagram of an electronic device for implementing a method for transmitting data according to an embodiment of the application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other. The application will be described in detail below with reference to the drawings in connection with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which an embodiment of a method for transmitting data or an apparatus for transmitting data of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a client 101, a field programmable gate array 102, and a system-on-chip 103. The client 101 may be communicatively connected with the field programmable gate array 102, and the field programmable gate array 102 with the system-on-chip 103.
Specifically, the client 101 may include a Guest OS, which may include a front-end driver (VirtIO frontend driver), i.e., a standard network card or block device driver in the guest kernel.
In the present application, the software-hardware cooperative implementation of the VirtIO back end is generally based on IO device hardware that has a hardware Field Programmable Gate Array (FPGA) and an independent software system-on-chip (SOC), such as a smart network card. The FPGA externally emulates VirtIO PCIe devices, but needs to implement IO DMA and queue management/scheduling in concert with the drivers in the SOC.
Specifically, the FPGA mainly implements a TLP control module, a VirtIO device simulation module, and a VirtIO DMA module, while the driver on the SOC needs to implement back-end queues corresponding one to one to the front-end queues.
In the present application, the field programmable gate array 102 may be a Field Programmable Gate Array (FPGA). Specifically, the field programmable gate array 102 may include a TLP control module (TLP adapter) for processing packets according to their type, a VirtIO DMA module for copying and transmitting data packets between the front-end and back-end queues, and a VirtIO device simulation module (VirtIO PF Emulator) for parsing and responding to control packets.
Specifically, the system-on-chip (SoC) 103 may include a back-end driver (VirtIO backend driver), virtual network switch (Network vSwitch) service software for implementing communication and isolation of virtual networks, and storage front-end service software for implementing cloud storage forwarding.
The field programmable gate array 102 may receive a packet sent by the client 101, determine the type of the packet using the TLP control module, and forward the packet to the corresponding processing module according to its type: for example, control packets are forwarded to the VirtIO device simulation module for parsing and responding, and data packets are forwarded to the VirtIO DMA module for copying and transmission between the front-end and back-end queues.
The system-on-chip 103 can control data transmission according to the data processed by the field programmable gate array 102 and forward the data to other physical networks.
It will be appreciated that the number of individual modules in the field programmable gate array 102 and the system on a chip 103 may be one or more, and the application is not limited in detail.
With continued reference to fig. 2, a flow 200 of one embodiment of a method for transmitting data in accordance with the present application is shown. The method for transmitting data of the present embodiment includes the steps of:
step 201, obtaining a target message through a field programmable gate array.
In this embodiment, the execution body of the method for transmitting data (for example, the system-on-chip 103) may obtain the target message from the user side (Host) through the field programmable gate array. Specifically, the target message may be an operation instruction or data. The Field Programmable Gate Array (FPGA) may include a TLP control module, a VirtIO DMA module, and a VirtIO PF Emulator device simulation module.
Step 202, in response to determining that the category of the target message includes the control message, analyzing the control message to obtain a first processing result.
After the execution body acquires the target message, it may determine the category of the target message through the TLP control module in the field programmable gate array. Specifically, the TLP control module may input the target message into a pre-trained classification model to obtain the category of the target message, where the pre-trained classification model is used to characterize the correspondence between messages and message categories. Alternatively, the TLP control module may first obtain a pre-stored correspondence between messages and message categories, and determine the category of the target message according to that correspondence. The present application does not specifically limit the method for determining the category of the target message. Specifically, the category of the target message may be a control message or a data message. A control message is used to initialize and set the attributes of the VirtIO PCIe device externally emulated by the FPGA. A data message is used for copying and transmission between the front-end and back-end queues.
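To make the dispatch step concrete, the following is a minimal C sketch of the classify-and-forward logic described above; the enum values, struct fields and handler names are assumptions made for illustration and are not taken from the patent.

```c
/* Illustrative sketch only: classify-and-dispatch of a received message.
 * The enum values, struct layout and handler names are assumptions. */
#include <stdint.h>
#include <stddef.h>

enum msg_class { MSG_CONTROL, MSG_DATA, MSG_UNKNOWN };

struct target_msg {
    enum msg_class cls;       /* control message or data message */
    const uint8_t *payload;   /* raw message bytes */
    size_t len;
};

/* Stub for the VirtIO PF Emulator path (control messages). */
static int emulator_handle_control(const struct target_msg *m)
{
    (void)m;
    return 0; /* parse and respond to the control message */
}

/* Stub for the VirtIO DMA path (data messages). */
static int dma_handle_data(const struct target_msg *m)
{
    (void)m;
    return 0; /* copy between front-end and back-end queues */
}

/* Classification-and-forwarding step performed by the TLP control module. */
static int tlp_dispatch(const struct target_msg *m)
{
    switch (m->cls) {
    case MSG_CONTROL:
        return emulator_handle_control(m); /* first processing result path */
    case MSG_DATA:
        return dma_handle_data(m);         /* second processing result path */
    default:
        return -1;                         /* drop or log unexpected messages */
    }
}

int main(void)
{
    struct target_msg m = { .cls = MSG_DATA, .payload = NULL, .len = 0 };
    return tlp_dispatch(&m);
}
```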
After determining that the category of the target message is a control message, the field programmable gate array of the execution body may parse the control message to obtain a first processing result. For example, the first processing result may be a control instruction that performs a status query, Bar space access, or ExpansionRom access on the VirtIO PCIe device externally emulated by the FPGA.
The execution body may distribute the first processing result to the system-on-chip through the independent interface between the VirtIO PF Emulator device simulation module and the SOC, so that the system-on-chip can perform the corresponding initialization settings on the attributes of the VirtIO PCIe device externally emulated by the FPGA according to the first processing result.
And 203, in response to determining that the category of the target message comprises the data message, analyzing the data message to obtain a second processing result.
The TLP control module may distinguish whether the target packet is a control packet or a data packet.
In response to determining that the category of the target message is a data message, the VirtIO DMA module in the field programmable gate array of the execution body may parse the data message to obtain a second processing result. The second processing result may include the storage address of the memory corresponding to the data message, the destination address of the data transmission, and so on; the present application does not limit the specific content of the second processing result.
Specifically, the VirtIO DMA module in the field programmable gate array of the execution body may obtain the memory corresponding to the data message based on the second processing result, and copy and transmit that memory between the front-end and back-end queues based on the data transmission address in the second processing result.
Specifically, in the design of the back-end driver on the SOC side of the system-on-chip, the back-end driver queues and the front-end driver queues are in one-to-one correspondence. According to the VirtIO protocol and its implementation, each front-end driver queue relies on two ring queues, the available ring and the used ring, to carry out full-duplex data communication with the back end (the system-on-chip (SOC) side, here and below). Full duplex means that data can be received while data is being transmitted, with both directions proceeding simultaneously. The same data structures and queue design are also used for the back-end driver in the SOC, so the FPGA can reuse the VirtIO device simulation module and the VirtIO DMA module to read, parse and write both the front-end and the back-end data transmission queues.
For example, for an IO flow, taking the sending direction (i.e., from the Host, the front end, to the SmartNIC, the back end) as an example, the FPGA VirtIO DMA module takes an available descriptor from the available ring of a back-end queue in the SmartNIC (smart network card), copies the memory corresponding to the data packet from the Host front end into the available buffer indicated by the descriptor, and then fills the descriptor into the used ring. The back-end driver (VirtIO backend driver) in the system-on-chip SOC of the SmartNIC can read the IO data from the synchronized used ring and, according to the service requirement, forward the memory corresponding to the data packet to the physical network through the smart network card chip, completing service network forwarding or service storage reading and writing.
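As a concrete illustration of the queue design and the send-direction flow described above, the sketch below uses the split-virtqueue layout from the VirtIO 1.x specification (descriptor table, available ring, used ring) and replaces the hardware DMA copy with a plain memcpy; the backend_queue bookkeeping structure, the helper names and the demo in main are assumptions for this example, and memory barriers and endianness handling are omitted for brevity.

```c
/* Split-virtqueue layout per the VirtIO 1.x specification plus a simplified
 * send-direction step: take an available descriptor, copy the packet into its
 * buffer, publish it as used. Helper names and the demo are assumptions. */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct virtq_desc {                /* one buffer descriptor */
    uint64_t addr;                 /* physical address of the buffer */
    uint32_t len;                  /* buffer length in bytes */
    uint16_t flags;                /* e.g. NEXT / WRITE flags */
    uint16_t next;                 /* chaining index when NEXT is set */
};

struct virtq_avail {               /* driver -> device ring */
    uint16_t flags;
    uint16_t idx;                  /* next slot the driver will fill */
    uint16_t ring[];               /* descriptor indices offered to the device */
};

struct virtq_used_elem {
    uint32_t id;                   /* head of the consumed descriptor chain */
    uint32_t len;                  /* bytes written into the buffer */
};

struct virtq_used {                /* device -> driver ring */
    uint16_t flags;
    uint16_t idx;
    struct virtq_used_elem ring[];
};

struct backend_queue {             /* assumed back-end bookkeeping */
    struct virtq_desc  *desc;
    struct virtq_avail *avail;
    struct virtq_used  *used;
    uint16_t size;                 /* number of ring entries */
    uint16_t last_avail;           /* next available index to consume */
};

/* Send direction (Host front end -> SmartNIC back end): one packet. */
static int tx_one_packet(struct backend_queue *q, const void *pkt, uint32_t len)
{
    if (q->last_avail == q->avail->idx)
        return -1;                                 /* nothing available */

    uint16_t head = q->avail->ring[q->last_avail++ % q->size];
    struct virtq_desc *d = &q->desc[head];
    if (len > d->len)
        return -1;                                 /* buffer too small */

    memcpy((void *)(uintptr_t)d->addr, pkt, len);  /* done by DMA in hardware */

    struct virtq_used_elem *e = &q->used->ring[q->used->idx % q->size];
    e->id  = head;
    e->len = len;
    q->used->idx++;                                /* publish to the driver */
    return 0;
}

int main(void)                                     /* tiny self-test */
{
    enum { QSZ = 4, BUFSZ = 64 };
    static uint8_t bufs[QSZ][BUFSZ];
    struct virtq_desc desc[QSZ];
    struct virtq_avail *avail = calloc(1, sizeof(*avail) + QSZ * sizeof(uint16_t));
    struct virtq_used  *used  = calloc(1, sizeof(*used) + QSZ * sizeof(struct virtq_used_elem));
    if (!avail || !used)
        return 1;
    for (uint16_t i = 0; i < QSZ; i++)
        desc[i] = (struct virtq_desc){ .addr = (uintptr_t)bufs[i], .len = BUFSZ };
    avail->ring[0] = 0;                            /* front end offers descriptor 0 */
    avail->idx = 1;
    struct backend_queue q = { desc, avail, used, QSZ, 0 };
    const char pkt[] = "hello";
    int rc = tx_one_packet(&q, pkt, sizeof(pkt));
    free(avail);
    free(used);
    return rc;
}
```

The sketch also shows why the back end can reuse the same design as the front end: both sides only need to agree on the three ring structures.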
And step 204, according to the third processing result, performing data transmission through the system-in-chip.
In this embodiment, the third processing result includes any one of the first processing result and the second processing result. After the execution body obtains the third processing result, the execution body may perform data transmission based on the third processing result. Specifically, the third processing result may be only the first processing result, may be only the second processing result, or may include both the first processing result and the second processing result.
Specifically, if the third processing result includes the first processing result, the third processing result may be a control instruction for initializing the attributes of the VirtIO PCIe device externally emulated by the FPGA. If the third processing result includes the second processing result, the third processing result may be the memory location of the data to be transferred and the destination address, and the execution body may copy the data to be transferred between the front-end and back-end queues based on that address, using the VirtIO DMA module and the back-end driver (VirtIO backend driver) communicatively connected to it.
In the general service implementation, the whole process is zero-copy, and very good throughput performance can be obtained. After the data has been forwarded through the physical network, the back-end driver in the SmartNIC fills the corresponding descriptor back into the available ring for reuse by the VirtIO DMA module of the field programmable gate array in the SmartNIC back end.
With continued reference to fig. 3, a schematic diagram of one application scenario of the method for transmitting data according to the present application is shown. In the application scenario of fig. 3, the field programmable gate array 302 obtains the target message 305 from the client 301. The field programmable gate array 302, in response to determining that the category of the target message 305 includes the control message 306, parses the control message 306 to obtain a first processing result 307. In response to determining that the category of the target message 305 includes the data message 308, the field programmable gate array 302 parses the data message 308 to obtain a second processing result 309. The execution body, the network card 311, performs data transmission through the system-on-chip 303 according to the third processing result 310 and transmits the data to the external physical network 304, where the third processing result 310 includes any one of the first processing result 307 and the second processing result 309.
In the present application, the acquired target message is parsed, and according to the parsing result the message is distributed to the corresponding processing unit in the network card (for example, the field programmable gate array on the hardware side and the system-on-chip on the software side) for specialized processing in its respective field of expertise. The entire back end can be implemented independently of the Host, is compatible with bare metal servers, and improves data input/output throughput. Thousands of paravirtualized IO protocol devices and queues can be supported with a small hardware device capacity, queue management and scheduling are implemented by the software driver, and when a large number of devices and queues exist, complex scheduling logic and heavy resource consumption are shifted from the field programmable gate array to the system-on-chip that specializes in such processing logic, improving stability and efficiency while reducing overall material cost.
With continued reference to fig. 4, a flow 400 of another embodiment of a method for transmitting data in accordance with the present application is shown. As shown in fig. 4, the method for transmitting data of the present embodiment may include the steps of:
Step 401, obtaining a target message through a field programmable gate array.
In step 402, in response to determining that the class of the target message includes the control message, the control message is parsed to obtain a first processing result.
Step 403, in response to determining that the class of the target message includes the data message, analyzing the data message to obtain a second processing result.
And step 404, according to the third processing result, performing data transmission through the system-in-chip.
Wherein the third processing result includes any one of the first processing result and the second processing result.
The principle of steps 401 to 404 is similar to that of steps 201 to 204, and will not be described here again.
Specifically, step 404 may also be implemented by steps 4041 to 4047:
step 4041, determining configuration information by the system on chip based on the third processing result.
After the execution body obtains the third processing result, in response to determining that the third processing result is the result of processing a control message, configuration information of the VirtIO PCIe device externally emulated by the FPGA may be determined through the system-on-chip based on the third processing result. The configuration information may be, for example, an upper-limit threshold on the number of data transmission queues of the VirtIO PCIe device, the Bar access space capacity, and similar information.
In addition, in some optional implementations of this embodiment, the third processing result may further include query instructions. For example, when the third processing result is the result of processing a control message, the execution body may further obtain a query/access instruction from the third processing result, such as a status query on the VirtIO PCIe device, a Bar space access, or an ExpansionRom access.
Step 4042, data transmission is performed based on the configuration information.
After the execution body obtains the configuration information, the configuration information may be distributed to the system-on-chip through the independent interface between the VirtIO PF Emulator device simulation module and the SOC, so that the system-on-chip can perform the initialization settings on the attributes of the VirtIO PCIe device externally emulated by the FPGA according to the configuration information.
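A minimal sketch of this step is shown below, assuming a hypothetical configuration structure and control-interface function; neither the field names nor the values are taken from the patent.

```c
/* Illustrative only: pushing parsed configuration to the emulated VirtIO PCIe
 * device over the emulator's control interface. All names are assumptions. */
#include <stdint.h>
#include <stdio.h>

struct virtio_pcie_cfg {
    uint16_t max_queues;      /* upper limit on data transmission queues */
    uint32_t bar_space_bytes; /* Bar access space capacity */
};

/* Stand-in for the independent control interface between the SOC back-end
 * driver and the VirtIO PF Emulator in the FPGA. */
static int emulator_apply_cfg(const struct virtio_pcie_cfg *cfg)
{
    printf("init emulated device: %u queues, %u-byte Bar space\n",
           (unsigned)cfg->max_queues, (unsigned)cfg->bar_space_bytes);
    return 0;
}

int main(void)
{
    /* Values derived from the first processing result (control message). */
    struct virtio_pcie_cfg cfg = { .max_queues = 256, .bar_space_bytes = 16384 };
    return emulator_apply_cfg(&cfg);
}
```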
In this embodiment, the target message is parsed according to its type to obtain a parsing result, and the field programmable gate array can respond to routine query/access actions based on that result. The division between software and hardware is clear: by handing logical work such as data-transmission control and attribute configuration to the system-on-chip for processing, the field programmable gate array on the hardware side of the network card improves the speed and accuracy of data transmission.
Step 4043, determining the number of the current data transmission queues according to the third processing result.
In this embodiment, when the third processing result is obtained by analyzing the data packet, the third processing result may include information such as an address of a memory corresponding to the data packet, an address for forwarding the data packet, and the number of current data transmission queues corresponding to the data packet. When the execution body determines that the third processing result is obtained by analyzing the data message, the number of the current data transmission queues can be determined and obtained from the third processing result.
In this embodiment, after acquiring the memory corresponding to the data packet of the front-end control command, the system-on-chip of the execution body may adjust the number of data transmission queues based on that memory to control data transmission. The memory corresponding to the data message may include a preset data transmission queue threshold.
Specifically, the paravirtualized IO protocol (VirtIO protocol) specifies a control queue for a virtualized VirtIO PCIe device implementation, used to pass certain device control commands. The data structure of the control queue is defined in the same way as the data queues. To simplify the implementation, the back end of the control queue can reuse the data-queue implementation in the SOC back-end driver, that is, the back-end driver on the SOC also has control queues corresponding one to one to the front-end control queues. Some VirtIO PCIe device control instructions need to affect the VirtIO device simulation module, so the SOC back-end driver and the VirtIO device simulation module have a separate control interface.
In this embodiment, after a front-end control command is issued from the control queue, the FPGA copies and sends the memory corresponding to the front-end control command to the corresponding queue of the SOC through the VirtIO DMA module. The SOC back-end driver needs to be able to parse all types of control commands. If a control command affects the device simulation module, for example setting the network card MAC address, the back-end driver needs to pass it to the FPGA VirtIO device simulation module through the independent control interface for further processing; if the control command affects only the back-end data plane, such as adjusting the number of queues, the back-end driver acts on the command directly to control the number of queues, thereby controlling data transmission.
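The sketch below illustrates this split in a hedged form: commands that affect the device model are forwarded over the separate control interface, while data-plane commands are applied directly by the back-end driver. The command codes, structure layout and helper names are assumptions for this example only.

```c
/* Sketch of SOC back-end handling of a control-queue command: device-model
 * commands go to the FPGA emulator over its separate control interface,
 * data-plane commands are applied directly. All names are assumptions. */
#include <stdint.h>

enum ctrl_cmd { CTRL_SET_MAC, CTRL_SET_QUEUE_NUM };

struct ctrl_msg {
    enum ctrl_cmd cmd;
    union {
        uint8_t  mac[6];      /* for CTRL_SET_MAC */
        uint16_t queue_num;   /* for CTRL_SET_QUEUE_NUM */
    } u;
};

static int emulator_ctrl_set_mac(const uint8_t mac[6]) { (void)mac; return 0; }
static int dataplane_set_queue_num(uint16_t n)          { (void)n;  return 0; }

static int backend_handle_ctrl(const struct ctrl_msg *m)
{
    switch (m->cmd) {
    case CTRL_SET_MAC:
        /* Affects the device simulation module: forward over the
         * independent control interface to the FPGA. */
        return emulator_ctrl_set_mac(m->u.mac);
    case CTRL_SET_QUEUE_NUM:
        /* Affects only the back-end data plane: act on it directly. */
        return dataplane_set_queue_num(m->u.queue_num);
    default:
        return -1;
    }
}

int main(void)
{
    struct ctrl_msg m = { .cmd = CTRL_SET_QUEUE_NUM, .u.queue_num = 8 };
    return backend_handle_ctrl(&m);
}
```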
Step 4044, in response to determining that the number of current data transfer queues exceeds the preset threshold, slowing the rate of supplementing the descriptors.
Wherein the descriptor is used for indicating the position of the target message in the data transmission queue.
Step 4045, responsive to determining that the number of current data transfer queues is below a preset threshold, accelerating the rate of supplementing the descriptors.
Wherein the descriptor is used for indicating the position of the target message in the data transmission queue.
The execution body may also perform IO control and scheduling. Specifically, for each data transmission queue in the system-on-chip, the execution body may adjust the rate of consuming descriptors that indicate target messages, or adjust the rate of supplementing such descriptors, according to the data transmission rate of the queue or according to the number of current data transmission queues, so as to control data transmission.
The execution body may slow down the rate of supplementing the descriptor when the data transmission rate of the data transmission queue exceeds a transmission rate threshold; the execution body may accelerate the rate of supplementing the descriptor when the data transmission rate of the data transmission queue is below a transmission rate threshold.
Also, when the execution body determines that the number of current data transmission queues is below the preset threshold, it may accelerate the rate of supplementing descriptors; when it determines that the number of current data transmission queues exceeds the preset threshold, it may slow down the rate of supplementing descriptors. In this way the number of data transmission queues is adjusted dynamically and the data transmission speed is improved. To reduce the cost and complexity of the FPGA, back pressure, rate limiting and similar controls within a single queue, as well as the scheduling logic between queues, are implemented by the back-end driver on the SOC side of the system-on-chip.
For example, consider back pressure and rate limiting within a single queue. Since the FPGA VirtIO DMA module needs the front-end and back-end drivers to have prepared available descriptors and buffers at the same time before data transmission can proceed, the back-end driver can control the IO rate of the whole VirtIO back end by controlling the rate at which recycled descriptors are returned to the available ring. Specifically, the SOC-side back-end driver maintains per-queue statistics of received or transmitted traffic, counted in frames or bytes, and stops supplementing recycled descriptors into the available ring when the current data transmission rate of a queue reaches a preset threshold. The FPGA VirtIO DMA module then cannot obtain available descriptors from the back-end queue and pauses the DMA process with the front-end queue, achieving the back-pressure or rate-limiting effect without any active packet loss.
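The following sketch shows one way such descriptor-replenishment back pressure could look on the SOC side, assuming a per-queue byte counter and a fixed measurement interval; the structure, names and policy details are illustrative assumptions, not the patent's implementation.

```c
/* Sketch of per-queue back pressure: the back-end driver only returns
 * recycled descriptors to the available ring while the measured rate stays
 * under the configured limit, which stalls the FPGA DMA without dropping
 * packets. All names and fields are assumptions for illustration. */
#include <stdint.h>

struct queue_rate_ctrl {
    uint64_t bytes_this_period;   /* bytes moved in the current interval */
    uint64_t byte_limit;          /* preset threshold per interval */
    uint32_t pending_descs;       /* recycled descriptors not yet returned */
};

/* Called on each completion: record traffic and queue the descriptor for
 * later replenishment instead of returning it immediately. */
static void on_descriptor_recycled(struct queue_rate_ctrl *rc, uint32_t len)
{
    rc->bytes_this_period += len;
    rc->pending_descs++;
}

/* Decide whether to top up the available ring. Returning 0 descriptors is
 * what applies back pressure to the FPGA DMA pipeline. */
static uint32_t descriptors_to_replenish(struct queue_rate_ctrl *rc)
{
    if (rc->bytes_this_period >= rc->byte_limit)
        return 0;                 /* over the limit: stall the DMA pipeline */
    uint32_t n = rc->pending_descs;
    rc->pending_descs = 0;
    return n;                     /* under the limit: return everything */
}

/* Called at the end of each rate-measurement interval. */
static void rate_period_reset(struct queue_rate_ctrl *rc)
{
    rc->bytes_this_period = 0;
}

int main(void)
{
    struct queue_rate_ctrl rc = { .byte_limit = 1500 * 64 };
    on_descriptor_recycled(&rc, 1500);
    uint32_t n = descriptors_to_replenish(&rc);  /* under the limit: returns 1 */
    rate_period_reset(&rc);
    return n == 1 ? 0 : 1;
}
```

The key point the sketch captures is that no packet is ever dropped: the FPGA simply stalls until descriptors reappear in the available ring.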
Through the cooperation of the system-on-chip and the field programmable gate array, this embodiment places back pressure, rate limiting and similar controls within a single queue, as well as the scheduling logic between queues, in the back-end driver on the SOC side of the system-on-chip, which reduces the cost and complexity of the field programmable gate array and produces no active packet loss. Queue management and scheduling are implemented by the SOC back-end driver, and when a large number of devices and queues exist, the complex scheduling logic and heavy resource consumption are shifted from the FPGA to the SOC driver software, improving stability while reducing overall material cost. In addition, dynamically adjusting the number of data transmission queues keeps the length of the data queue at the back end of the network card (the system-on-chip side) consistent with the length of the front-end data queue, so that when the FPGA performs DMA on the descriptors and buffers in the front-end and back-end queues, the DMA pipeline is not interrupted by insufficient memory at either end, improving DMA efficiency and overall throughput.
In some optional implementations of this embodiment, the FPGA of the execution body may prefetch descriptors and buffers at the front and back ends to improve VirtIO DMA efficiency inside the FPGA. To cooperate with the FPGA prefetching, the system-on-chip SOC can consume and replenish the queues in batches, so that the FPGA can prefetch as completely as possible and the front-end/back-end pipeline keeps running at full load.
Step 4046, determining the difference between the number of current data transmission queues and the preset threshold.
In step 4047, the supplemental descriptor is stopped in response to determining that the difference is less than a preset difference threshold.
The execution body may further determine the difference between the number of current data transmission queues and the preset threshold, and when the difference is smaller than the preset difference threshold, it may stop supplementing the descriptors indicating the addresses of target messages. This keeps the length of the back-end data queue consistent with that of the front-end data queue, avoids DMA pipeline interruptions caused by insufficient memory at either end, and improves DMA efficiency and overall throughput.
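The sketch below combines the threshold comparison of steps 4044-4045 with the difference check of steps 4046-4047 into one decision function, under the assumption that both checks apply to the same counter; all names and the combination itself are illustrative.

```c
/* Sketch of the replenishment decision: stop when the gap to the preset
 * threshold is small, otherwise slow down or speed up the supplement of
 * descriptors. All names and the exact policy are assumptions. */
#include <stdint.h>

struct replenish_policy {
    uint32_t preset_threshold;   /* target number of queues/entries */
    uint32_t min_difference;     /* preset difference threshold */
};

enum replenish_action { REPLENISH_FASTER, REPLENISH_SLOWER, REPLENISH_STOP };

static enum replenish_action
choose_action(const struct replenish_policy *p, uint32_t current_count)
{
    uint32_t diff = current_count > p->preset_threshold
                        ? current_count - p->preset_threshold
                        : p->preset_threshold - current_count;

    if (diff < p->min_difference)
        return REPLENISH_STOP;       /* queues are balanced: stop topping up */
    if (current_count > p->preset_threshold)
        return REPLENISH_SLOWER;     /* too many queued: slow the supplement */
    return REPLENISH_FASTER;         /* too few queued: speed it up */
}

int main(void)
{
    struct replenish_policy p = { .preset_threshold = 64, .min_difference = 8 };
    return choose_action(&p, 66) == REPLENISH_STOP ? 0 : 1;
}
```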
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for transmitting data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for transmitting data of the present embodiment includes: an acquisition unit 501, a first analysis unit 502, a second analysis unit 503, and a control unit 504.
The obtaining unit 501 is configured to obtain the target message through the field programmable gate array.
The first parsing unit 502 is configured to parse the control message in response to determining that the class of the target message includes the control message, so as to obtain a first processing result.
The second parsing unit 503 is configured to parse the data packet in response to determining that the class of the target packet includes the data packet, to obtain a second processing result.
And a control unit 504 configured to perform data transmission through the system-in-chip according to a third processing result, wherein the third processing result includes any one of the first processing result and the second processing result.
In some optional implementations of the present embodiment, the control unit 504 is further configured to: determining configuration information through the system-in-chip based on the third processing result; and carrying out data transmission based on the configuration information.
In some optional implementations of the present embodiment, the control unit 504 is further configured to: determining the number of the current data transmission queues according to the third processing result; in response to determining that the number of current data transmission queues exceeds a preset threshold, slowing down a rate of supplementing descriptors, wherein the descriptors are used for indicating positions of target messages in the data transmission queues; and in response to determining that the number of current data transmission queues is below a preset threshold, accelerating the rate of supplementing the descriptors.
In some optional implementations of the present embodiment, the control unit 504 is further configured to: determining a difference value between the number of the current data transmission queues and a preset threshold value; in response to determining that the difference is less than the preset difference threshold, the supplemental descriptor is stopped.
It should be understood that the units 501 to 504 described in the apparatus 500 for transmitting data correspond to the respective steps in the method described with reference to fig. 2. Thus, the operations and features described above with respect to the method for transmitting data are equally applicable to the apparatus 500 and the units contained therein, and are not described in detail herein.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the various methods and processes described above, such as a method for transmitting data. For example, in some embodiments, the method for transmitting data may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the method for transmitting data described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the method for transmitting data as described above, in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
A computer program product is also provided, comprising a computer program which, when executed by a processor, implements the method for transmitting data as described above.
According to the technical solution of the embodiments of the present application, front-end and back-end data transmission is realized through the cooperation of software and hardware, which is compatible with bare metal servers and improves data input/output throughput. Thousands of paravirtualized IO protocol devices and queues can be supported with a small hardware device capacity, queue management and scheduling are implemented by the software driver, and when a large number of devices and queues exist, complex scheduling logic and heavy resource consumption are moved from the hardware side to the software side, improving stability while reducing overall material cost.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (7)

1. A method for transmitting data, applied to a network card, wherein the network card comprises a field programmable gate array and a system-on-chip, and the field programmable gate array externally emulates a VirtIO PCIe device, the method comprising:
Acquiring a target message through the field programmable gate array;
responding to the determination that the category of the target message comprises a control message, analyzing the control message to obtain a first processing result, and determining configuration information through the system-on-chip based on the first processing result; configuring the VirtIO PCIe device based on the configuration information;
Responding to the determination that the category of the target message comprises a data message, analyzing the data message to obtain a second processing result, and determining the number of current data transmission queues according to the second processing result; in response to determining that the number of current data transmission queues exceeds a preset threshold, slowing down a rate of supplementing descriptors indicating locations of target messages in the data transmission queues; and in response to determining that the number of current data transmission queues is below the preset threshold, accelerating the rate of supplementing the descriptors.
2. The method of claim 1, wherein the method further comprises:
determining a difference value between the number of the current data transmission queues and a preset threshold value;
And stopping supplementing the descriptor in response to determining that the difference is less than a preset difference threshold.
3. An apparatus for transmitting data, applied to a network card, wherein the network card comprises a field programmable gate array and a system-on-chip, and the field programmable gate array externally emulates a VirtIO PCIe device, the apparatus comprising:
The acquisition unit is configured to acquire a target message through the field programmable gate array;
The first analyzing unit is configured to analyze the control message to obtain a first processing result in response to determining that the category of the target message comprises the control message;
the second analyzing unit is configured to analyze the data message to obtain a second processing result in response to determining that the category of the target message comprises the data message;
A control unit configured to determine, through the system-on-chip, configuration information based on the first processing result, configure the VirtIO PCIe device based on the configuration information, and determine the number of current data transmission queues according to the second processing result; in response to determining that the number of current data transmission queues exceeds a preset threshold, slow down a rate of supplementing descriptors, wherein the descriptors are used for indicating positions of target messages in the data transmission queues; and in response to determining that the number of current data transmission queues is below the preset threshold, accelerate the rate of supplementing the descriptors.
4. The apparatus of claim 3, wherein the control unit is further configured to:
determining a difference value between the number of the current data transmission queues and a preset threshold value;
And stopping supplementing the descriptor in response to determining that the difference is less than a preset difference threshold.
5. An electronic device for transmitting data, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-2.
6. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-4.
7. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-2.
CN202110125964.9A 2021-01-29 2021-01-29 Method, apparatus, device and storage medium for transmitting data Active CN112799840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110125964.9A CN112799840B (en) 2021-01-29 2021-01-29 Method, apparatus, device and storage medium for transmitting data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110125964.9A CN112799840B (en) 2021-01-29 2021-01-29 Method, apparatus, device and storage medium for transmitting data

Publications (2)

Publication Number Publication Date
CN112799840A CN112799840A (en) 2021-05-14
CN112799840B true CN112799840B (en) 2024-07-05

Family

ID=75812839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110125964.9A Active CN112799840B (en) 2021-01-29 2021-01-29 Method, apparatus, device and storage medium for transmitting data

Country Status (1)

Country Link
CN (1) CN112799840B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553635B (en) * 2022-02-18 2023-03-24 珠海星云智联科技有限公司 Data processing method, data interaction method and product in DPU network equipment
CN116668375B (en) * 2023-07-31 2023-11-21 新华三技术有限公司 Message distribution method, device, network equipment and storage medium
CN117032040B (en) * 2023-08-31 2024-06-07 中科驭数(北京)科技有限公司 Sensor data acquisition method, device, equipment and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109729021A (en) * 2018-12-27 2019-05-07 北京天融信网络安全技术有限公司 A kind of message processing method and electronic equipment

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106533954A (en) * 2015-09-15 2017-03-22 深圳市中兴微电子技术有限公司 Message scheduling method and device
CN105871739B (en) * 2016-06-17 2018-12-07 华为技术有限公司 A kind of method and calculating equipment of processing message
CN107204939A (en) * 2017-05-27 2017-09-26 南京南瑞继保电气有限公司 A kind of message processing method based on two-level cache
CN108462642B (en) * 2018-03-16 2020-06-30 西安电子科技大学 UDP/IP hardware protocol stack based on FPGA and implementation method
CN110837488B (en) * 2019-07-15 2022-10-11 华为技术有限公司 Message transmission method and device
CN111726201B (en) * 2020-06-15 2023-09-12 合肥哈工轩辕智能科技有限公司 AIRT-ROS virtual network card packet loss solving method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109729021A (en) * 2018-12-27 2019-05-07 北京天融信网络安全技术有限公司 A kind of message processing method and electronic equipment

Also Published As

Publication number Publication date
CN112799840A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN112799840B (en) Method, apparatus, device and storage medium for transmitting data
US11290392B2 (en) Technologies for pooling accelerator over fabric
CN109690512B (en) GPU remote communication with trigger operation
US20180123880A1 (en) Method for Configuring Network, Network System, and Device
CN105245456B (en) A kind of method and system of the interior unloading SDN virtual network function of Cloud Server
US9697024B2 (en) Interrupt management method, and computer implementing the interrupt management method
US8850090B2 (en) USB redirection for read transactions
CN109726005B (en) Method, server system and computer readable medium for managing resources
US10320886B2 (en) Image display method and apparatus
US8856407B2 (en) USB redirection for write streams
CN116069711B (en) Direct memory access controller, heterogeneous device, memory access method and medium
US10621124B2 (en) Method, device and computer program product for enabling SR-IOV functions in endpoint device
US8392629B1 (en) System and methods for using a DMA module for a plurality of virtual machines
CN114691286A (en) Server system, virtual machine creation method and device
US20120166585A1 (en) Apparatus and method for accelerating virtual desktop
CN103092676A (en) Analog input output method, device and system of virtual machine cluster
US8619558B1 (en) Memory management in a network adapter
US8996737B1 (en) Method for emulating communication standards of transceiver modules for native host devices
CN216623148U (en) Embedded server, intelligent network card, SOC module and electronic equipment
CN116010307A (en) Server resource allocation system, method and device
CN112131011B (en) Method, computing device, and computer-readable storage medium for managing resources
Zhang et al. NVMe-over-RPMsg: A Virtual Storage Device Model Applied to Heterogeneous Multi-Core SoCs
CN115686748B (en) Service request response method, device, equipment and medium under virtualization management
CN118170499B (en) Virtual cloud disk segmentation method and device, electronic equipment and storage medium
CN118467453B (en) Data transmission method, device, equipment, medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant