CN112799840A - Method, device, equipment and storage medium for transmitting data - Google Patents


Publication number
CN112799840A
Authority
CN
China
Prior art keywords
processing result
data transmission
message
determining
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110125964.9A
Other languages
Chinese (zh)
Inventor
赵罡
陈方耀
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110125964.9A
Publication of CN112799840A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/22Parsing or analysis of headers

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer And Data Communications (AREA)

Abstract

The application discloses a method, apparatus, device, and storage medium for data transmission, relating to the field of artificial intelligence and in particular to virtualization in cloud computing. The specific implementation scheme is as follows: a target packet is acquired through a field programmable gate array; in response to determining that the type of the target packet includes a control packet, the control packet is parsed to obtain a first processing result; in response to determining that the type of the target packet includes a data packet, the data packet is parsed to obtain a second processing result; and data transmission is performed through the system-on-chip according to a third processing result, where the third processing result includes either the first processing result or the second processing result. In this implementation, the acquired target packet is parsed and then dispatched, according to the parsing result, to the corresponding processing unit in the network card for the processing each unit is best suited for. The scheme is compatible with bare metal servers, improves data input/output throughput, and reduces overall material cost.

Description

Method, device, equipment and storage medium for transmitting data
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to the field of virtualization in cloud computing, and in particular, to a method, an apparatus, a device, and a storage medium for transmitting data.
Background
VirtIO is a paravirtualized IO protocol and is currently the most widely used virtualized device implementation.
Bare metal servers are gradually maturing as public cloud products. To share images with virtual machines, bare metal servers need to support IO devices with a VirtIO backend. However, VirtIO was originally designed to solve virtualized IO, so current backend implementations are built on a virtual machine monitor and lack a bare-hardware implementation, which greatly limits the functional characteristics of public cloud bare metal servers.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, and storage medium for transmitting data.
According to an aspect of the present disclosure, there is provided a method for transmitting data, applied to a network card, where the network card includes a field programmable gate array and a system-on-chip. The method includes: acquiring a target packet through the field programmable gate array; in response to determining that the type of the target packet includes a control packet, parsing the control packet to obtain a first processing result; in response to determining that the type of the target packet includes a data packet, parsing the data packet to obtain a second processing result; and performing data transmission through the system-on-chip according to a third processing result, where the third processing result includes either the first processing result or the second processing result.
According to another aspect of the present disclosure, there is provided an apparatus for transmitting data, applied to a network card, where the network card includes a field programmable gate array and a system-on-chip. The apparatus includes: an acquisition unit configured to acquire a target packet through the field programmable gate array; a first parsing unit configured to, in response to determining that the type of the target packet includes a control packet, parse the control packet to obtain a first processing result; a second parsing unit configured to, in response to determining that the type of the target packet includes a data packet, parse the data packet to obtain a second processing result; and a control unit configured to perform data transmission through the system-on-chip according to a third processing result, where the third processing result includes either the first processing result or the second processing result.
According to yet another aspect of the present disclosure, there is provided an electronic device for transmitting data, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for transmitting data as described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to execute the method for transmitting data as described above.
According to yet another aspect of the disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method for transmitting data as described above.
The technology of the application solves the problem that applying a VirtIO backend to a bare metal server limits the functional characteristics of public cloud bare metal servers. The acquired target packet is parsed, and according to the parsing result the packet is dispatched to the corresponding processing unit in the network card (for example, the hardware-side field programmable gate array or the software-side system-on-chip), each handling the work it is best suited for. The entire backend is thus independent of the Host and compatible with bare metal servers, and data input/output throughput is improved. Thousands of paravirtualized IO protocol devices and queues can be supported with a small hardware footprint, with queue management and scheduling implemented in software drivers. With a large number of devices and queues, the complex scheduling logic and its heavy resource consumption are pushed from the field programmable gate array to the system-on-chip, which is better suited to such processing logic; this improves stability and efficiency while reducing overall material cost.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a method for transmitting data according to the present application;
FIG. 3 is a schematic diagram of an application scenario of a method for transmitting data according to the present application;
FIG. 4 is a flow diagram of another embodiment of a method for transmitting data according to the present application;
FIG. 5 is a schematic block diagram illustrating one embodiment of an apparatus for transmitting data according to the present application;
fig. 6 is a block diagram of an electronic device for implementing a method for transmitting data according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the method for transmitting data or the apparatus for transmitting data of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include a user terminal 101, a field programmable gate array 102, and a system-on-chip 103. The user terminal 101 can be communicatively connected with the field programmable gate array 102, and the field programmable gate array 102 with the system-on-chip 103.
Specifically, the user terminal 101 may include a Guest OS, which may include a front-end driver (VirtIO front driver), a standard network card or block device driver carried by the guest kernel.
In this application, the software-hardware cooperative implementation of the VirtIO backend is generally built on IO device hardware, such as a smart network card, that has both a hardware field programmable gate array (FPGA) and an independent software system-on-chip (SOC). The FPGA emulates a VirtIO PCIe device externally, but the FPGA and a driver in the SOC must jointly implement IO DMA and queue management and scheduling.
Specifically, the FPGA mainly implements a TLP control module, a VirtIO device simulation module, and a VirtIO DMA module, and the driver on the SOC needs to implement back-end queues in one-to-one correspondence with the front-end queues.
In this application, the field programmable gate array 102 may be an FPGA. Specifically, the field programmable gate array 102 may include a TLP control module (TLP adapter) configured to process packets according to packet type, a VirtIO DMA module configured to copy and transmit data packets between the front-end and back-end queues, and a VirtIO device simulation module (VirtIO PF Emulator) configured to parse and respond to control packets.
Specifically, the system-on-chip (SoC) 103 may include a back-end driver (VirtIO backend driver); a virtual network switch (Network vSwitch), service software implementing communication and isolation of the virtual network; and a cloud storage agent front end (Storage Proxy), front-end service software implementing cloud storage forwarding.
The field programmable gate array 102 may receive a packet sent by the user terminal 101, determine the type of the packet by the TLP control module, and forward the packet to the corresponding processing module according to the type of the packet, for example, forward the control packet to the VirtIO device simulation module for analysis and response, and forward the data packet to the VirtIO DMA module, so that the data packet is copied and transmitted in the front-end queue and the back-end queue.
The system-on-chip 103 may control data transmission according to the data processed by the field programmable gate array 102 and forward the data to other physical networks.
It is understood that the number of the modules in the field programmable gate array 102 and the system on chip 103 may be one or more, which is not specifically limited in this application.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for transmitting data in accordance with the present application is shown. The method for transmitting data of the embodiment comprises the following steps:
step 201, a target message is obtained through a field programmable gate array.
In this embodiment, the execution body of the method for transmitting data (for example, the system-on-chip 103) may obtain the target packet from the user-side Host through the field programmable gate array. Specifically, the target packet may be an operation instruction or data. The field programmable gate array (FPGA) may include a TLP control module, a VirtIO DMA module, and a VirtIO PF Emulator device simulation module.
Step 202, in response to determining that the category of the target packet includes the control packet, analyzing the control packet to obtain a first processing result.
After acquiring the target packet, the execution body may determine the type of the target packet through the TLP control module in the field programmable gate array. Specifically, the TLP control module may input the target packet into a pre-trained classification model to obtain the class of the target packet, where the pre-trained classification model represents a correspondence between packets and packet classes. Alternatively, the TLP control module may obtain a pre-stored correspondence between packets and packet types and determine the type of the target packet from the target packet and that correspondence. The method for determining the packet type corresponding to the target packet is not particularly limited in this application. Specifically, the category of the target packet may be control packet or data packet. A control packet is used to initialize the attributes of the VirtIO PCIe device externally emulated by the FPGA. A data packet is copied and transmitted between the front-end and back-end queues.
After the field programmable gate array of the execution body determines that the type of the target packet is a control packet, it may parse the control packet to obtain a first processing result. For example, the first processing result may be a control instruction for performing a status query, Bar space access, or expansion ROM access on the VirtIO PCIe device externally emulated by the FPGA.
The execution body may distribute the first processing result to the system-on-chip over an independent interface between the VirtIO PF Emulator device simulation module and the SOC, so that the system-on-chip performs the corresponding initialization of the attributes of the VirtIO PCIe device externally emulated by the FPGA according to the first processing result.
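The control/data split performed by the TLP control module can be sketched in C. This is an illustrative model only: the address-window rule, the constants, and the field names are invented for the sketch and are not the PCIe TLP wire format or the patent's actual classification logic.

```c
#include <stdint.h>

/* Hypothetical packet categories, mirroring the patent's control/data split. */
typedef enum { PKT_CONTROL, PKT_DATA } pkt_class_t;

/* Minimal stand-in for a parsed TLP header; fields are illustrative. */
struct tlp_hdr {
    uint8_t  fmt_type;   /* format/type field (unused in this sketch) */
    uint64_t address;    /* target address of the access */
};

/* Classify by address range: accesses into the emulated device's
 * configuration window count as control traffic; everything else is
 * treated as data traffic. The window bounds are made-up constants. */
#define CTRL_WINDOW_BASE 0xFE000000ULL
#define CTRL_WINDOW_END  0xFE100000ULL

pkt_class_t classify_tlp(const struct tlp_hdr *h)
{
    if (h->address >= CTRL_WINDOW_BASE && h->address < CTRL_WINDOW_END)
        return PKT_CONTROL;  /* route to the VirtIO device simulation module */
    return PKT_DATA;         /* route to the VirtIO DMA module */
}
```

A control packet would then be parsed and answered by the device simulation module, while a data packet proceeds to the DMA path described below.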
Step 203, in response to determining that the type of the target packet includes a data packet, analyzing the data packet to obtain a second processing result.
The TLP control module may distinguish the target packet as a control packet or a data packet.
After the execution body determines that the type of the target packet is a data packet, the VirtIO DMA module in its field programmable gate array may parse the data packet to obtain a second processing result. The second processing result may include the storage address of the memory corresponding to the data packet, the destination address of the data transmission, and the like; the specific content of the second processing result is not limited in this application.
Specifically, the VirtIO DMA module in the field programmable gate array may obtain the memory corresponding to the data packet based on the second processing result, and copy and transmit it between the front-end and back-end queues based on the data transmission address in the second processing result.
Specifically, in the design of the back-end driver on the SOC side, the back-end driver queues maintain a one-to-one correspondence with the front-end driver queues. According to the VirtIO protocol and its implementations, each front-end driver queue relies on two circular rings, the available ring and the used ring, for full-duplex data communication with the back end (the SOC side, throughout this text). Full duplex means that data can be transmitted and received simultaneously and the two directions are synchronized. The back-end driver in the SOC uses the same data structures and queue design, so the FPGA can reuse the VirtIO device simulation module and the VirtIO DMA module to read, parse, and write both the front-end and back-end data transmission queues.
For example, for an IO flow in the sending direction (from the Host, i.e., the front end, to the SmartNIC, i.e., the back end), the FPGA VirtIO DMA module takes an available descriptor from the available ring of the back-end queue in the SmartNIC (smart network card), copies the memory corresponding to the data packet from the Host front end into the available buffer indicated by the descriptor, and then fills the descriptor into the used ring. The back-end driver (VirtIO backend driver) in the SOC of the SmartNIC can read the IO data from the synchronized used ring, and the memory corresponding to the data packet is forwarded to a physical network through the smart network card chip according to the service requirement, completing service network forwarding or service storage reads and writes.
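The available/used ring mechanics above can be modeled in a few lines of C. This is a simplified sketch, not the VirtIO wire layout: descriptors carry an inline buffer rather than a guest-physical address, index wrap-around is simplified, and `QUEUE_SIZE` is an invented constant.

```c
#include <stdint.h>
#include <string.h>

#define QUEUE_SIZE 8      /* illustrative; real virtqueues are larger */
#define BUF_SIZE   64

/* Simplified descriptor: an inline buffer plus the length written into it. */
struct vq_desc { uint8_t buf[BUF_SIZE]; uint32_t len; };

struct virtqueue {
    struct vq_desc desc[QUEUE_SIZE];
    uint16_t avail_ring[QUEUE_SIZE]; uint16_t avail_idx; /* driver publishes */
    uint16_t used_ring[QUEUE_SIZE];  uint16_t used_idx;  /* device completes */
    uint16_t last_avail;  /* device-side cursor into avail_ring */
};

/* Driver side: publish an empty descriptor for the device to fill. */
void vq_publish(struct virtqueue *vq, uint16_t d)
{
    vq->avail_ring[vq->avail_idx++ % QUEUE_SIZE] = d;
}

/* Device side (the DMA module's role): take an available descriptor, copy
 * the payload into its buffer, then report it back on the used ring. */
int vq_dma_copy(struct virtqueue *vq, const void *payload, uint32_t len)
{
    if (vq->last_avail == vq->avail_idx || len > BUF_SIZE)
        return -1;  /* no descriptor available: this is the backpressure point */
    uint16_t d = vq->avail_ring[vq->last_avail++ % QUEUE_SIZE];
    memcpy(vq->desc[d].buf, payload, len);
    vq->desc[d].len = len;
    vq->used_ring[vq->used_idx++ % QUEUE_SIZE] = d;
    return d;
}
```

In the real design the copy is a DMA from Host memory and the SOC back-end driver consumes the used ring; the `-1` path is exactly where the DMA module pauses when the back-end driver withholds descriptors.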
And step 204, carrying out data transmission through the system-on-chip according to the third processing result.
In this embodiment, the third processing result includes either the first processing result or the second processing result. After the execution body obtains the third processing result, data transmission may be performed based on it. Specifically, the third processing result may contain only the first processing result, only the second processing result, or both; the content specifically included in the third processing result is not limited in this application.
Specifically, if the third processing result includes the first processing result, it may be a control instruction, and the attributes of the VirtIO PCIe device externally emulated by the FPGA are initialized accordingly. If the third processing result includes the second processing result, it may be the memory location of the data to be transmitted and the destination address, and the execution body may copy and transmit the data between the front-end and back-end queues, based on the destination address, via the VirtIO DMA module and the back-end driver communicatively connected to it.
In this embodiment, in a common service implementation the whole process is zero-copy, so very good throughput performance can be obtained. After the data is forwarded from the physical network, the back-end driver of the SmartNIC fills the corresponding descriptor back into the available ring for reuse by the FPGA VirtIO DMA module at the back end of the SmartNIC.
With continued reference to fig. 3, a schematic diagram of one application scenario of a method for transmitting data according to the present application is shown. In the application scenario of fig. 3, the field programmable gate array 302 obtains the target packet 305 from the user terminal 301. In response to determining that the type of the target packet 305 includes the control packet 306, the field programmable gate array 302 parses the control packet 306 to obtain a first processing result 307. In response to determining that the type of the target packet 305 includes the data packet 308, the field programmable gate array 302 parses the data packet 308 to obtain a second processing result 309. The network card 311 performs data transmission through the system-on-chip 303 according to a third processing result 310 and transmits the data to the external physical network 304, where the third processing result 310 includes either the first processing result 307 or the second processing result 309.
In this embodiment, by parsing the acquired target packet and dispatching it, according to the parsing result, to the corresponding processing unit in the network card (e.g., the hardware-side field programmable gate array or the software-side system-on-chip) for the processing each is best suited for, the implementation of the whole back end is independent of the Host and compatible with bare metal servers. Data input/output throughput is improved, thousands of paravirtualized IO protocol devices and queues can be supported with a small hardware footprint, and queue management and scheduling are implemented through software drivers. With a large number of devices and queues, the complex scheduling logic and its heavy resource consumption are pushed from the field programmable gate array to the system-on-chip dedicated to processing such logic, improving stability and efficiency while reducing overall material cost.
With continued reference to fig. 4, a flow 400 of another embodiment of a method for transmitting data in accordance with the present application is shown. As shown in fig. 4, the method for transmitting data of the present embodiment may include the following steps:
step 401, a target message is obtained through a field programmable gate array.
Step 402, in response to determining that the category of the target packet includes a control packet, analyzing the control packet to obtain a first processing result.
Step 403, in response to determining that the type of the target packet includes a data packet, analyzing the data packet to obtain a second processing result.
And step 404, performing data transmission through the system on chip according to the third processing result.
Wherein the third processing result includes any one of the first processing result and the second processing result.
The principle of step 401 to step 404 is similar to that of step 201 to step 204, and is not described here again.
Specifically, step 404 can also be implemented by steps 4041 to 4047:
step 4041, based on the third processing result, determines configuration information by the system on chip.
After obtaining the third processing result, and in response to determining that it is the result of processing the control packet, the execution body may determine, through the system-on-chip, the configuration information of the VirtIO PCIe device externally emulated by the FPGA. The configuration information may be, for example, the upper threshold of the number of data transmission queues passing through the VirtIO PCIe device at a time, the Bar access space capacity, and the like.
In addition, in some optional implementations of this embodiment, the third processing result may further include query instructions. For example, when the third processing result is the result of processing the control packet, the execution body may obtain the query/access instruction from it, such as a status query, Bar space access, or expansion ROM access for the VirtIO PCIe device.
Step 4042, data transmission is performed based on the configuration information.
After the execution body obtains the configuration information, it may distribute the configuration information to the system-on-chip over the independent interface between the VirtIO PF Emulator device simulation module and the SOC, so that the system-on-chip can initialize the attributes of the VirtIO PCIe device externally emulated by the FPGA according to the configuration information.
In this embodiment, the parsing result is obtained by parsing the target packet according to its type, and the field programmable gate array can respond to routine query/access actions based on that result. The division of labor between software and hardware is thus clear: by assigning logical operations such as data transmission and attribute configuration to the system-on-chip for logical processing, the field programmable gate array at the hardware end of the network card can improve the rate and accuracy of data transmission.
Step 4043, determining the number of the current data transmission queues according to the third processing result.
In this embodiment, when the third processing result is obtained by analyzing the data packet, the third processing result may include information such as an address of a memory corresponding to the data packet, an address forwarded by the data packet, and the number of current data transmission queues corresponding to the data packet. When the execution subject determines that the third processing result is obtained by analyzing the data packet, the number of the current data transmission queues may be determined and obtained from the third processing result.
In this embodiment, after the system-on-chip of the execution body obtains the memory corresponding to the data packet of the front-end control command, the number of data transmission queues may be adjusted based on that memory to control data transmission. The memory corresponding to the data packet may include a preset data transmission queue threshold.
Specifically, the paravirtualized IO protocol (VirtIO protocol) used to implement the virtualized VirtIO PCIe device specifies a control data transmission queue for transferring device control commands. The data structure of the control queue is defined the same as that of the data queues. To simplify the implementation, the back end of the control queue may reuse the data queue implementation in the SOC back-end driver; that is, the back-end driver on the SOC also has control queues in one-to-one correspondence with the front end. VirtIO PCIe device control instructions need to affect the VirtIO device simulation module, and the SOC back-end driver and the VirtIO device simulation module have separate control interfaces.
In this embodiment, after a front-end control command is issued from the control queue, the FPGA copies the memory corresponding to the command through the VirtIO DMA module and sends it to the corresponding queue of the SOC. The SOC back-end driver needs to be able to interpret all types of control commands. If a control command affects the device simulation module, such as setting the network card MAC address, the back-end driver needs to push the setting to the FPGA VirtIO device simulation module through the independent control interface for further processing; if the control command affects the back-end data plane, such as adjusting the number of queues, the back-end driver acts directly according to the control command to control the number of queues, thereby controlling data transmission.
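The two-way dispatch described here, emulation-affecting commands forwarded to the FPGA versus data-plane commands applied locally, can be sketched as follows. The command set, struct fields, and the `emu_update_pending` flag are hypothetical; they stand in for the patent's independent control interface.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical control-command types from the front-end control queue. */
enum ctrl_cmd { CTRL_SET_MAC, CTRL_SET_QUEUE_NUM };

struct ctrl_msg {
    enum ctrl_cmd cmd;
    uint8_t  mac[6];     /* valid for CTRL_SET_MAC */
    uint16_t queue_num;  /* valid for CTRL_SET_QUEUE_NUM */
};

/* Illustrative SOC back-end driver state. */
struct backend {
    uint16_t active_queues;
    uint8_t  emu_mac[6];         /* value to push to the FPGA device
                                  * simulation module */
    bool     emu_update_pending; /* models the hand-off over the separate
                                  * control interface */
};

/* Dispatch: commands affecting device emulation (e.g. MAC address) are
 * forwarded to the FPGA module; commands affecting the data plane
 * (e.g. queue count) are applied directly by the back-end driver. */
void handle_ctrl(struct backend *be, const struct ctrl_msg *m)
{
    switch (m->cmd) {
    case CTRL_SET_MAC:
        for (int i = 0; i < 6; i++) be->emu_mac[i] = m->mac[i];
        be->emu_update_pending = true;   /* hand off to the FPGA */
        break;
    case CTRL_SET_QUEUE_NUM:
        be->active_queues = m->queue_num; /* local data-plane action */
        break;
    }
}
```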
Step 4044, in response to determining that the number of current data transmission queues exceeds a preset threshold, slowing down the rate of the supplemental descriptors.
The descriptor is used for indicating the position of a target message in the data transmission queue.
Step 4045, in response to determining that the number of current data transmission queues is below a preset threshold, speeding up the rate of supplementing descriptors.
The descriptor is used for indicating the position of a target message in the data transmission queue.
The execution body may also perform IO control and scheduling. Specifically, for each data transmission queue in the system-on-chip, the execution body may adjust the rate at which descriptors indicating target packets are consumed, or the rate at which they are supplemented, according to the data transmission rate of the queue or the number of current data transmission queues, thereby controlling data transmission.
When the data transmission rate of a queue exceeds the transmission rate threshold, the execution body may slow down the rate of supplementing descriptors; when it is below the threshold, the execution body may speed that rate up.
And, when the execution body determines that the number of the current data transmission queues is lower than the preset threshold, the execution body may speed up the rate of the supplementary descriptor. The rate of the supplemental descriptors may be slowed down when the execution body determines that the number of current data transmission queues exceeds a preset threshold. Therefore, the number of the data transmission queues is dynamically adjusted, and the data transmission speed is improved. In order to reduce the cost and complexity of the FPGA, the back pressure, speed limit and other controls in the single queue and the scheduling logic between the queues are all realized by the back end drive of the SOC side of the system level chip.
For example, for back pressure and rate limiting within a single queue: when the FPGA VirtIO DMA module performs data transmission, the front-end and back-end drivers must both have an available descriptor and buffer prepared, so the back-end driver can control the data IO speed of the entire VirtIO back end by controlling the rate at which recycled descriptors are supplemented into the available ring. Specifically, the back-end driver on the SOC side maintains per-queue receive or transmit statistics, counted in frames or bytes, and stops supplementing recycled descriptors into the available ring when the current data transmission rate of the queue reaches a preset threshold. The FPGA VirtIO DMA module then cannot obtain available descriptors from the back-end queue and suspends the DMA process with the front-end queue, achieving the back pressure or rate-limiting effect without any active packet loss.
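The single-queue back-pressure mechanism described above can be sketched as follows. The structure layout, byte-based accounting, and function names are assumptions for illustration; the key point from the text is only that withholding descriptors from the available ring stalls the FPGA DMA engine without dropping packets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of single-queue back pressure: the back-end driver
 * only returns recycled descriptors to the VirtIO available ring while the
 * queue's measured traffic stays under its per-window budget. */

struct queue_stats {
    uint64_t bytes_this_window;  /* bytes moved in the current window */
    uint64_t rate_limit_bytes;   /* per-window byte budget (preset threshold) */
    unsigned int recycled;       /* descriptors recovered from the used ring */
    unsigned int avail;          /* descriptors currently in the available ring */
};

/* Move recycled descriptors back into the available ring, unless the queue
 * already hit its rate limit; withholding descriptors suspends the FPGA DMA
 * process without active packet loss. Returns how many were moved. */
static unsigned int replenish_available_ring(struct queue_stats *q)
{
    if (q->bytes_this_window >= q->rate_limit_bytes)
        return 0;                /* back pressure: FPGA finds no descriptors */
    unsigned int moved = q->recycled;
    q->avail += moved;
    q->recycled = 0;
    return moved;
}
```

At the start of each accounting window, the driver would reset `bytes_this_window` so the queue resumes at the configured rate.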
In this embodiment, through the division of work between the system-on-chip and the field programmable gate array, controls such as back pressure and rate limiting within a single queue, as well as the scheduling logic between queues, are assigned to the back-end driver on the SOC side, which reduces the cost and complexity of the FPGA and avoids active packet loss. Implementing queue management and scheduling in the SOC back-end driver pushes the complex scheduling logic and heavy resource consumption that arise with large numbers of devices and queues from the FPGA into SOC driver software, improving stability while reducing overall material cost. In addition, by dynamically adjusting the number of data transmission queues, the length of the data queues at the back end of the network card (the system-on-chip side) can be kept consistent with those at the front end, so that when the FPGA performs DMA operations on the descriptors and buffers of the front- and back-end queues, DMA pipeline stalls caused by insufficient memory at either end are avoided, improving DMA efficiency and overall throughput.
In some optional implementations of this embodiment, the field programmable gate array (FPGA) of the execution body may prefetch descriptors and buffers at both the front and back ends to improve VirtIO DMA efficiency inside the FPGA. To cooperate with FPGA prefetching, the system-on-chip (SOC) may consume and supplement the queues in batch mode, so that the FPGA can prefetch complete batches as far as possible and the front-end and back-end pipelines can run at full load.
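A minimal sketch of the batch mode mentioned above: instead of supplementing descriptors one by one, the SOC driver rounds the pending work down to whole batches so the FPGA's prefetch pipeline always sees full batches. The batch size and the function name are assumptions, not values from this application.

```c
#include <assert.h>

/* Hypothetical batch-mode replenishment: post only whole batches so the
 * FPGA prefetcher can always fetch a complete batch. */

#define BATCH_SIZE 32u  /* assumed batch granularity */

/* Given the number of descriptors ready to be returned, report how many to
 * post now (whole batches only); the remainder waits for the next pass. */
static unsigned int batched_replenish(unsigned int ready)
{
    return (ready / BATCH_SIZE) * BATCH_SIZE;
}
```

Deferring the sub-batch remainder trades a little latency for keeping the front-end and back-end DMA pipelines fully loaded.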
Step 4046, determining a difference between the number of current data transmission queues and a preset threshold.
Step 4047, in response to determining that the difference is less than a preset difference threshold, stopping supplementing the descriptors.
The execution body may further determine the difference between the number of current data transmission queues and the preset threshold, and when the difference is smaller than a preset difference threshold, may stop supplementing the descriptors indicating the addresses of target messages. This keeps the lengths of the back-end and front-end data queues consistent, avoiding DMA pipeline stalls caused by insufficient memory at either end and improving DMA efficiency and overall throughput.
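Steps 4046 and 4047 can be sketched as a simple predicate over the queue count. The function name and example values are hypothetical; the text only specifies "difference below a preset difference threshold stops supplementing".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch of steps 4046/4047: stop supplementing descriptors
 * once the gap between the current queue count and the preset threshold
 * shrinks below a difference threshold, keeping the front-end and back-end
 * queue lengths aligned. */
static bool should_stop_supplementing(unsigned int current_queues,
                                      unsigned int preset_threshold,
                                      unsigned int diff_threshold)
{
    /* absolute difference, avoiding unsigned underflow */
    unsigned int diff = current_queues > preset_threshold
                      ? current_queues - preset_threshold
                      : preset_threshold - current_queues;
    return diff < diff_threshold;
}
```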
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, the present application provides an embodiment of an apparatus for transmitting data, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 5, the apparatus 500 for transmitting data of the present embodiment includes: an acquisition unit 501, a first parsing unit 502, a second parsing unit 503, and a control unit 504.
The acquisition unit 501 is configured to acquire the target packet through the field programmable gate array.
The first parsing unit 502 is configured to, in response to determining that the category of the target packet includes the control packet, parse the control packet to obtain a first processing result.
The second parsing unit 503 is configured to, in response to determining that the category of the target packet includes a data packet, parse the data packet to obtain a second processing result.
A control unit 504 configured to perform data transmission through the system-on-chip according to a third processing result, where the third processing result includes any one of the first processing result and the second processing result.
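The pipeline formed by units 501 to 504 can be illustrated end to end as follows: classify the target message, parse it into a first or second processing result, and hand the resulting "third processing result" to the transmission step. The enum names and the integer stand-ins for processing results are assumptions for illustration only.

```c
#include <assert.h>

/* Hypothetical stand-in for the unit 501-504 pipeline: a message is either
 * a control message or a data message; parsing yields the first or second
 * processing result, either of which serves as the third processing result. */

enum msg_category { MSG_CONTROL, MSG_DATA };

struct message {
    enum msg_category category;
    int payload;                 /* stand-in for real message contents */
};

/* Parse a message according to its category; the returned value plays the
 * role of the first (control) or second (data) processing result. */
static int process_message(const struct message *m)
{
    if (m->category == MSG_CONTROL)
        return m->payload + 1000;  /* "first processing result" (assumed encoding) */
    return m->payload + 2000;      /* "second processing result" (assumed encoding) */
}
```

In the apparatus, either branch's output would then drive the control unit 504 to configure and perform data transmission through the system-on-chip.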
In some optional implementations of this embodiment, the control unit 504 is further configured to: determining, by the system-on-chip, configuration information based on the third processing result; and carrying out data transmission based on the configuration information.
In some optional implementations of this embodiment, the control unit 504 is further configured to: determining the number of the current data transmission queues according to the third processing result; in response to determining that the number of the current data transmission queues exceeds a preset threshold, slowing down the rate of supplementing descriptors, wherein the descriptors are used for indicating the positions of target messages in the data transmission queues; and in response to determining that the number of the current data transmission queues is lower than a preset threshold, accelerating the rate of supplementing the descriptors, wherein the descriptors are used for indicating the positions of the target messages in the data transmission queues.
In some optional implementations of this embodiment, the control unit 504 is further configured to: determining the difference value between the number of the current data transmission queues and a preset threshold value; in response to determining that the difference is less than the preset difference threshold, ceasing supplementing the descriptors.
It should be understood that the units 501 to 504 described in the apparatus 500 for transmitting data correspond to the respective steps in the method described with reference to fig. 2. Thus, the operations and features described above for the method for transmitting data are equally applicable to the apparatus 500 and the units included therein and will not be described again here.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 performs the respective methods and processes described above, such as the method for transmitting data. For example, in some embodiments, the method for transmitting data may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into RAM 603 and executed by the computing unit 601, one or more steps of the method for transmitting data described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g. by means of firmware) to perform the method for transmitting data as described above.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and addresses the drawbacks of difficult management and weak service scalability in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
A computer program product is also provided, comprising a computer program which, when executed by a processor, carries out the method for transmitting data described above.
According to the technical solution of the embodiments of the present application, data transmission between the front end and the back end is achieved through software and hardware cooperation. The solution is compatible with bare metal servers, improves data input/output throughput, and supports para-virtualized IO protocol devices and total queue counts at the thousand scale with smaller hardware device capacity. By implementing queue management and scheduling in software driving, the complex scheduling logic and heavy resource consumption that arise with large numbers of devices and queues are pushed from the hardware side to the software side, improving stability while reducing overall material cost.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (11)

1. A method for transmitting data is applied to a network card, wherein the network card comprises a field programmable gate array and a system-on-chip, and the method comprises the following steps:
acquiring a target message through the field programmable logic gate array;
in response to determining that the type of the target message comprises a control message, parsing the control message to obtain a first processing result;
in response to determining that the type of the target message comprises a data message, parsing the data message to obtain a second processing result;
and performing data transmission through the system-on-chip according to a third processing result, wherein the third processing result includes any one of the first processing result and the second processing result.
2. The method of claim 1, wherein the transmitting data through the system-on-chip according to the third processing result comprises:
determining, by the system-on-chip, configuration information based on the third processing result;
and carrying out data transmission based on the configuration information.
3. The method of claim 2, wherein the transmitting data through the system-on-chip according to the third processing result further comprises:
determining the number of the current data transmission queues according to the third processing result;
in response to determining that the number of the current data transmission queues exceeds a preset threshold, slowing down the rate of supplementing descriptors, wherein the descriptors are used for indicating the positions of target messages in the data transmission queues;
and in response to determining that the number of the current data transmission queues is lower than a preset threshold, accelerating the rate of supplementing descriptors, wherein the descriptors are used for indicating the positions of the target messages in the data transmission queues.
4. The method of claim 3, wherein the transmitting data through the system-on-chip according to the third processing result further comprises:
determining the difference value between the number of the current data transmission queues and a preset threshold value;
in response to determining that the difference is less than a preset difference threshold, ceasing supplementing the descriptors.
5. A device for transmitting data is applied to a network card, wherein the network card comprises a field programmable gate array and a system-on-chip, and the device comprises:
the acquisition unit is configured to acquire a target message through the field programmable logic gate array;
a first parsing unit configured to, in response to determining that the type of the target message comprises a control message, parse the control message to obtain a first processing result;
a second parsing unit configured to, in response to determining that the type of the target message comprises a data message, parse the data message to obtain a second processing result;
a control unit configured to perform data transmission through the system-on-chip according to a third processing result, wherein the third processing result includes any one of the first processing result and the second processing result.
6. The apparatus of claim 5, wherein the control unit is further configured to:
determining, by the system-on-chip, configuration information based on the third processing result;
and carrying out data transmission based on the configuration information.
7. The apparatus of claim 6, wherein the control unit is further configured to:
determining the number of the current data transmission queues according to the third processing result;
in response to determining that the number of the current data transmission queues exceeds a preset threshold, slowing down the rate of supplementing descriptors, wherein the descriptors are used for indicating the positions of target messages in the data transmission queues;
and in response to determining that the number of the current data transmission queues is lower than a preset threshold, accelerating the rate of supplementing descriptors, wherein the descriptors are used for indicating the positions of the target messages in the data transmission queues.
8. The apparatus of claim 7, wherein the control unit is further configured to:
determining the difference value between the number of the current data transmission queues and a preset threshold value;
in response to determining that the difference is less than a preset difference threshold, ceasing supplementing the descriptors.
9. An electronic device for transmitting data, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-4.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-4.
CN202110125964.9A 2021-01-29 2021-01-29 Method, device, equipment and storage medium for transmitting data Pending CN112799840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110125964.9A CN112799840A (en) 2021-01-29 2021-01-29 Method, device, equipment and storage medium for transmitting data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110125964.9A CN112799840A (en) 2021-01-29 2021-01-29 Method, device, equipment and storage medium for transmitting data

Publications (1)

Publication Number Publication Date
CN112799840A true CN112799840A (en) 2021-05-14

Family

ID=75812839

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110125964.9A Pending CN112799840A (en) 2021-01-29 2021-01-29 Method, device, equipment and storage medium for transmitting data

Country Status (1)

Country Link
CN (1) CN112799840A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553635A (en) * 2022-02-18 2022-05-27 珠海星云智联科技有限公司 Data processing method, data interaction method and product in DPU network equipment
CN116668375A (en) * 2023-07-31 2023-08-29 新华三技术有限公司 Message distribution method, device, network equipment and storage medium
CN117032040A (en) * 2023-08-31 2023-11-10 中科驭数(北京)科技有限公司 Sensor data acquisition method, device, equipment and readable storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105871739A (en) * 2016-06-17 2016-08-17 华为技术有限公司 Method for processing messages and computing equipment
WO2017045501A1 (en) * 2015-09-15 2017-03-23 深圳市中兴微电子技术有限公司 Packet scheduling method and apparatus, and storage medium
CN107204939A (en) * 2017-05-27 2017-09-26 南京南瑞继保电气有限公司 A kind of message processing method based on two-level cache
CN108462642A (en) * 2018-03-16 2018-08-28 西安电子科技大学 UDP/IP hardware protocol stacks based on FPGA and implementation method
CN109729021A (en) * 2018-12-27 2019-05-07 北京天融信网络安全技术有限公司 A kind of message processing method and electronic equipment
CN110837488A (en) * 2019-07-15 2020-02-25 华为技术有限公司 Message transmission method and device


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553635A (en) * 2022-02-18 2022-05-27 珠海星云智联科技有限公司 Data processing method, data interaction method and product in DPU network equipment
CN116668375A (en) * 2023-07-31 2023-08-29 新华三技术有限公司 Message distribution method, device, network equipment and storage medium
CN116668375B (en) * 2023-07-31 2023-11-21 新华三技术有限公司 Message distribution method, device, network equipment and storage medium
CN117032040A (en) * 2023-08-31 2023-11-10 中科驭数(北京)科技有限公司 Sensor data acquisition method, device, equipment and readable storage medium

Similar Documents

Publication Publication Date Title
JP7141902B2 (en) Bridge device, storage adjacency calculation method using bridge device
CN112799840A (en) Method, device, equipment and storage medium for transmitting data
US10554485B2 (en) Method for configuring network, network system, and device
US11290392B2 (en) Technologies for pooling accelerator over fabric
CN108958157B (en) Control program control scheduling method, control program control scheduling device, computer equipment and storage medium
US20210165675A1 (en) Live migration for hardware accelerated para-virtualized io device
TW201543225A (en) Systems and methods for supporting hot plugging of remote storage devices accessed over a network via NVMe controller
US20150026678A1 (en) Control method for computer, and computer
CN113296884B (en) Virtualization method, virtualization device, electronic equipment, virtualization medium and resource virtualization system
CN103282881A (en) Direct sharing of smart devices through virtualization
CN113312143B (en) Cloud computing system, command processing method and virtualization simulation device
US8856407B2 (en) USB redirection for write streams
CN114691286A (en) Server system, virtual machine creation method and device
US10621124B2 (en) Method, device and computer program product for enabling SR-IOV functions in endpoint device
CN115657553A (en) PCIE topology and PCIE equipment simulation method, device, equipment and medium
CN113419845A (en) Calculation acceleration method and device, calculation system, electronic equipment and computer readable storage medium
CN109656646A (en) A kind of long- distance tabletop control method, apparatus, equipment and virtualization chip
US8860740B2 (en) Method and apparatus for processing a display driver in virture desktop infrastructure
CN116069711A (en) Direct memory access controller, heterogeneous device, memory access method and medium
CN115202827A (en) Method for processing virtualized interrupt, interrupt controller, electronic device and chip
CN103092676A (en) Analog input output method, device and system of virtual machine cluster
CN114397999A (en) Communication method, device and equipment based on nonvolatile memory interface-remote processing message transmission
CN216623148U (en) Embedded server, intelligent network card, SOC module and electronic equipment
US20220050795A1 (en) Data processing method, apparatus, and device
CN112131011B (en) Method, computing device, and computer-readable storage medium for managing resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination