WO2021042782A1 - Network card information processing method and chip


Info

Publication number
WO2021042782A1
Authority
WIPO (PCT)
Prior art keywords
queue
information
queue information
network card
frequency
Application number
PCT/CN2020/094031
Other languages
French (fr)
Chinese (zh)
Inventor
Hu Tianchi (胡天驰)
Lin Weibin (林伟彬)
Hou Xinyu (侯新宇)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2021042782A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 - Addressing or allocation; Relocation
    • G06F12/08 - Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 - Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 - Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0853 - Cache with multiport tag or data arrays
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs

Definitions

  • The present application relates to the field of servers, and more specifically, to an information processing method for a network card and a chip for processing information of a network card.
  • The network card can obtain queue information (for example, the queue pair context (QPC)) to process pending IO commands.
  • The queue information is stored in the memory of the server.
  • Part of the queue information can be cached in the cache space of the network card. Because the cache space in the network card is small, the queue information cached there has to be updated over time. How to improve the hit rate of the network card's cache space, and how to reduce the latency and the waste of bus bandwidth caused by the network card frequently reading and writing the queue information stored in memory over the bus, have therefore become problems that urgently need to be solved.
  • the present application provides an information processing method and chip for a network card, which can improve the hit rate of the cache space.
  • an information processing method for a network card is provided.
  • The local server and the remote server perform data transmission through remote direct memory access (RDMA).
  • The memory of the local server stores at least one queue, and each queue is used to store input/output (IO) commands.
  • The IO commands instruct the local server to access data on the remote server.
  • The method includes: the network card predicts high-frequency queue information, which is more likely to be accessed than other queue information; and the network card stores the high-frequency queue information in a cache space, where the cache space is in the network card, the queue information stored in the cache space corresponds one-to-one to the queues in the memory of the local server, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
  • In this way, the high-frequency queue information can be predicted and stored in the cache space of the network card, thereby improving the hit rate of the network card's cache space, reducing the bus read and write accesses generated by the network card, reducing the waste of bus bandwidth, reducing the processing delay of the pending IO commands on the network card, and improving the transmission performance of the server.
  • In one implementation, the network card predicting the high-frequency queue information specifically includes: the network card searches the queue information cached in the cache space for the queue information currently being read, and determines the currently read queue information as high-frequency queue information.
  • Since read and write accesses to a piece of queue information in the cache space occur in pairs, if a read request hits a certain piece of queue information, that queue information is likely to be accessed again within a short period of time.
  • Determining the currently read queue information as high-frequency queue information can therefore improve the hit rate of the cache space.
  • In another implementation, where the queue information is the queue pair context (QPC), the network card predicting the high-frequency queue information specifically includes: the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space, and determines a QPC whose read-pointer/write-pointer difference is greater than a preset threshold as high-frequency queue information.
  • The number of pending IO commands in the queue corresponding to a QPC can be determined from the difference between the read pointer and the write pointer in the QPC, and the likelihood that the QPC will be accessed subsequently can thus be estimated, improving the hit rate of the cache space; a minimal sketch follows.
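  • As an illustration only, the following minimal C sketch (not from the patent; all names are hypothetical) shows how such a predictor might flag a cached QPC as high-frequency when the distance between its write pointer (PI) and read pointer (CI) exceeds a preset threshold:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached QPC: PI and CI are monotonically increasing
 * counters, as described for the producer and consumer indexes. */
struct qpc {
    uint32_t pi;  /* write pointer: IO commands enqueued */
    uint32_t ci;  /* read pointer: IO commands processed */
};

/* A QPC with many pending IO commands (PI - CI large) is likely to
 * be accessed again soon, so treat it as high-frequency. */
static bool qpc_is_high_frequency(const struct qpc *q, uint32_t threshold)
{
    uint32_t pending = q->pi - q->ci;  /* unsigned wrap-around is safe */
    return pending > threshold;
}
```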
  • In another implementation, the network card predicting high-frequency queue information specifically includes: the network card obtains the IO command scheduling sequence, which records the pending IO commands; the network card determines, according to the IO command scheduling sequence, the queue information corresponding to the queues to which the pending IO commands belong, and determines that queue information as high-frequency queue information.
  • That is, the unprocessed IO commands can be pre-read, and the queue information required to process each IO command is determined from the queue to which that unprocessed IO command belongs.
  • Such queue information is more likely to be accessed later.
  • It can therefore be determined as high-frequency queue information, improving the hit rate of the cache space (see the sketch below).
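  • A minimal sketch of this pre-read, assuming a hypothetical scheduling-sequence entry that records only the owning queue's index (names are illustrative, not from the patent):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUES 64  /* illustrative number of queues */

/* Hypothetical entry of the IO command scheduling sequence. */
struct sched_entry {
    uint32_t queue_index;  /* queue the pending IO command belongs to */
};

/* Pre-read the scheduling sequence and mark the queue information of
 * every queue that still owns a pending IO command as high-frequency. */
static void mark_high_frequency(const struct sched_entry *seq, size_t n,
                                bool hot[MAX_QUEUES])
{
    for (size_t i = 0; i < n; i++)
        if (seq[i].queue_index < MAX_QUEUES)
            hot[seq[i].queue_index] = true;
}
```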
  • the network card in this application can also predict high-frequency queue information according to any two or three prediction methods described above, and store the predicted high-frequency queue information in the cache space of the network card.
  • Alternatively, the network card predicting the high-frequency queue information specifically includes: the network card searches the queue information cached in the cache space for the queue information currently being read and determines it as the high-frequency queue information; and, when the queue information is the queue pair context (QPC), the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space and determines a QPC whose read-pointer/write-pointer difference is greater than the preset threshold as the high-frequency queue information.
  • Alternatively, the network card predicting the high-frequency queue information specifically includes: the network card searches the queue information cached in the cache space for the queue information currently being read and determines it as the high-frequency queue information; and, the network card obtains the IO command scheduling sequence.
  • The IO command scheduling sequence records the pending IO commands; the network card determines, according to the IO command scheduling sequence, the queue information corresponding to the queues to which the pending IO commands belong.
  • That queue information is determined to be high-frequency queue information.
  • Alternatively, where the queue information is the queue pair context (QPC), the network card predicting the high-frequency queue information specifically includes: the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space and determines a QPC whose difference is greater than the preset threshold as high-frequency queue information; and, the network card obtains the IO command scheduling sequence, which records the pending IO commands.
  • The network card determines, according to the IO command scheduling sequence, the queue information corresponding to the queues to which the pending IO commands belong, and determines that queue information as high-frequency queue information.
  • Alternatively, all three methods may be combined: where the queue information is the queue pair context (QPC), the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space and determines a QPC whose difference is greater than the preset threshold as high-frequency queue information; the network card obtains the IO command scheduling sequence, which records the pending IO commands, determines the queue information corresponding to the queues to which the pending IO commands belong, and determines that queue information as high-frequency queue information; and the network card searches the queue information cached in the cache space for the queue information currently being read and determines it as high-frequency queue information.
  • In a possible implementation, the network card reads the high-frequency queue information from the memory and saves it in the cache space.
  • In this way, the high-frequency queue information determined by the network card can be fetched from the memory and stored in the network card's cache space in advance, avoiding the large processing delay that would be caused by fetching the required queue information from the memory while processing IO commands.
  • In another possible implementation, the network card determines that the high-frequency queue information is already stored in the cache space, and the network card sets the state information of the high-frequency queue information in the cache space, where the state information is used to indicate that the high-frequency queue information continues to be stored in the cache space.
  • The state information includes priority information or lock information, where the priority information is used to indicate the priority with which the high-frequency queue information is updated in the cache space.
  • The lock information is used to indicate that the high-frequency queue information is in a locked, non-updated state in the cache space.
  • the priority information or the lock information may be represented by a status flag bit.
  • When the status flag bit of a piece of queue information is set, it indicates that the queue information is temporarily in a locked state in the cache space and is not updated.
  • The priority information of a piece of queue information may be represented by multiple status flag bits, each status flag bit indicating the result obtained by one method of predicting high-frequency queue information.
  • That is, each prediction method contributes one flag bit per piece of queue information.
  • The more status flag bits of a piece of queue information are set, the lower the priority of that queue information being updated in the cache space; the sketch below illustrates this.
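  • A minimal sketch of this rule, assuming one hypothetical flag bit per prediction method (the bit names are illustrative, not from the patent):

```c
#include <stdint.h>

/* Hypothetical status flag bits: one per prediction method. */
#define FLAG_READ_HIT  (1u << 0)  /* currently being read               */
#define FLAG_PTR_DIFF  (1u << 1)  /* PI - CI greater than the threshold */
#define FLAG_SCHED_SEQ (1u << 2)  /* appears in IO scheduling sequence  */

/* The more prediction flag bits are set, the lower the priority of
 * this entry being updated (replaced) in the cache space. */
static unsigned update_priority(uint32_t status)
{
    unsigned set_bits = 0;
    for (uint32_t f = status & 7u; f != 0; f >>= 1)
        set_bits += f & 1u;
    return 3u - set_bits;  /* 3 = replace first, 0 = replace last */
}
```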
  • the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, and event queue context EQC.
  • the method further includes: the network card may also update or replace part or all of the queue information stored in the buffer space.
  • the method further includes: the network card may also update or replace part or all of the queue information stored in the buffer space according to the priority information or the lock information.
  • Specifically, the network card divides the queue information in the cache space into levels according to the state information in the cache space, where queue information with a higher level is more likely to be accessed, and preferentially updates the low-level queue information stored in the cache space.
  • A chip is further provided. The chip is applied to a server system, where the local server and the remote server in the server system perform data transmission through remote direct memory access (RDMA), and the memory of the local server stores at least one queue, each queue being used to store input/output (IO) commands.
  • The IO commands instruct the local server to access data on the remote server. The chip includes a prediction unit, configured to predict high-frequency queue information, where the high-frequency queue information is more likely to be accessed than other queue information.
  • The chip further includes a processing unit, configured to store the high-frequency queue information in a cache space, where the cache space is in the network card of the local server, the queue information stored in the cache space corresponds one-to-one to the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
  • the processing unit is further configured to: update or replace part or all of the queue information stored in the buffer space.
  • The prediction unit is specifically configured to: search the queue information cached in the cache space for the queue information currently being read, and determine the currently read queue information as the high-frequency queue information.
  • When the queue information is the queue pair context (QPC), the prediction unit is specifically configured to: calculate the difference between the read pointer and the write pointer in each QPC cached in the cache space, and determine a QPC whose read-pointer/write-pointer difference is greater than a preset threshold as the high-frequency queue information.
  • The prediction unit is specifically configured to: obtain an IO command scheduling sequence, which records the pending IO commands; determine, according to the IO command scheduling sequence, the queue information corresponding to the queues to which the pending IO commands belong; and determine that queue information as the high-frequency queue information.
  • the network card in this application can also predict high-frequency queue information according to any two or three prediction methods described above, and store the predicted high-frequency queue information in the cache space of the network card.
  • The processing unit is specifically configured to: read the high-frequency queue information from the memory, and store the high-frequency queue information in the cache space.
  • The processing unit is specifically configured to: determine that the high-frequency queue information is already stored in the cache space, and set the state information of the high-frequency queue information in the cache space, where the state information is used to indicate that the high-frequency queue information continues to be stored in the cache space.
  • The state information includes priority information or lock information.
  • The priority information is used to indicate the priority with which the high-frequency queue information is updated in the cache space.
  • The lock information is used to indicate that the high-frequency queue information is in a locked, non-updated state in the cache space.
  • the priority information or the lock information may be represented by a status flag bit.
  • the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, and event queue context EQC.
  • the processing unit is further configured to: update or replace part or all of the queue information stored in the buffer space according to priority information or lock information.
  • The processing unit is specifically configured to: divide the queue information in the cache space into levels according to the state information in the cache space, where queue information with a higher level is more likely to be accessed, and preferentially update the low-level queue information stored in the cache space.
  • a network card including: the chip in the second aspect or any one of the possible implementation manners of the second aspect.
  • a server including a memory and a network card as in the third aspect.
  • Fig. 1 is a schematic diagram of a possible server system provided by an embodiment of the present application.
  • Fig. 2 is a schematic flowchart of a method for a server to process an IO command according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a possible server 110 provided by an embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of queue information stored in a buffer space on a network card provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of a method for updating queue information in a buffer space provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of QPC priorities divided in a buffer space on a network card according to an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a chip 700 provided by an embodiment of the present application.
  • IO commands can be divided into read commands and write commands, and refer to commands issued by an application program running on the server to instruct reading data from or writing data to a remote device.
  • the processor of the server can receive the IO command and store it in the memory so that the IO command is waiting to be processed. Specifically, the IO command can be stored in a queue of the memory.
  • IO commands mentioned in the embodiments of this application can be understood as IO commands to be processed.
  • IO commands can be processed by the network card.
  • A queue is a special linear table: delete operations can only be performed at the front of the table, and insert operations can only be performed at the back of the table (the rear).
  • The end at which insertions are performed is called the tail of the queue, and the end at which deletions are performed is called the head of the queue.
  • The data elements of a queue are also called queue elements. Inserting a queue element into the queue is called enqueuing, and deleting a queue element from the queue is called dequeuing.
  • Because a queue can only be inserted at one end and deleted at the other end, a queue is also called a first in first out (FIFO) linear table; a minimal sketch follows.
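  • A minimal C sketch of such a FIFO (an illustrative fixed-depth ring, not from the patent; a zero-initialized struct is an empty queue):

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_DEPTH 8  /* illustrative depth, power of two */

/* Minimal FIFO ring: enqueue at the tail, dequeue at the head. */
struct fifo {
    uint32_t buf[QUEUE_DEPTH];
    uint32_t head;  /* next element to dequeue         */
    uint32_t tail;  /* next free slot to enqueue into  */
};

static bool fifo_enqueue(struct fifo *q, uint32_t v)
{
    if (q->tail - q->head == QUEUE_DEPTH)
        return false;                     /* full */
    q->buf[q->tail++ % QUEUE_DEPTH] = v;  /* insert at the tail */
    return true;
}

static bool fifo_dequeue(struct fifo *q, uint32_t *v)
{
    if (q->head == q->tail)
        return false;                     /* empty */
    *v = q->buf[q->head++ % QUEUE_DEPTH]; /* delete at the head */
    return true;
}
```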
  • The queues may include a send queue (SQ), a receive queue (RQ), a completion queue (CQ), and an event queue (EQ).
  • The sending queue of the local server and the receiving queue of the opposite server can be called a queue pair (QP).
  • the sending queue can be used to store IO commands to be processed; the receiving queue is used to store memory information required for processing IO commands.
  • The sending queue in the local server is used to store the IO commands issued by the local server to instruct reading data from or writing data to the peer server.
  • If the IO command is a read command for reading data stored in the peer server, the network card in the local server can send the processed read command to the peer server.
  • The read command can include the address and length of the data stored in the peer server that needs to be read, so that the opposite server can send the data to be read to the local server according to the processed read command.
  • If the IO command is a write command for writing data to the peer server, after the network card in the local server processes the write command, it sends the processed write command to the peer server.
  • The peer server can then find the memory information in the receiving queue according to the processed write command, and store the data in the processed write command into the memory of the opposite server according to the found memory information.
  • Similarly, the sending queue in the opposite server can also store the IO commands issued by the opposite server to instruct reading or writing data to the local server, which are sent to the local server through the network card in the opposite server.
  • the completion queue is used to store completion commands.
  • the completion information can be stored in the completion queue, and the completion information can also be referred to as a completion command.
  • the event queue is used to store event commands.
  • An event command can be generated when the completion commands in the completion queue reach a certain number.
  • The event command may include an event type and a completion queue index, in order to trigger the processor to process one or more completion commands stored in the completion queue.
  • Storing event commands in the event queue prevents the network card in the server from triggering the processor every time a single completion command is stored in the completion queue.
  • the context information of the queue may also be referred to as queue information.
  • the queue information corresponds to the queue one-to-one and is used to process the IO commands in the queue corresponding to the queue information.
  • The possibility of at least one piece of queue information stored in the server being accessed can be predicted. If the possibility of a certain piece of queue information being accessed is greater than that of other queue information, the queue information can be called high-frequency queue information. In other words, high-frequency queue information is more likely to be accessed.
  • For specific methods of determining high-frequency queue information, refer to the methods described below; details are not given here.
  • queue information may include one or more of the following information: queue pair context (queue pair context, QPC), completion queue context (complete queue context, CQC), event queue context (event queue context, EQC) .
  • The QPC is used by the network card in the server to process the IO commands stored in the QP corresponding to the QPC.
  • The CQC is used by the processor to process the completion commands stored in the CQ corresponding to the CQC.
  • The EQC is used by the processor to process the event commands stored in the EQ corresponding to the EQC.
  • the server may also store state information corresponding to each queue information in at least one queue information, and the state information is used to indicate the possibility that the queue information corresponding to the state information is accessed.
  • The QPC is the context of the queue pair, and corresponds one-to-one to the QP storing the pending IO commands.
  • When the network card processes a pending IO command according to the QPC, it needs to determine the QPC corresponding to the queue to which the IO command belongs, and process the pending IO command according to that QPC.
  • RDMA is a memory access technology that quickly transfers data stored in the memory of one device to the memory of other devices without the intervention of the operating systems of both parties.
  • RDMA technology can be suitable for high-throughput, low-latency network communication, and is especially suitable for use in large-scale parallel computer clusters.
  • The local server in the communication can transmit the data that needs to be transmitted to the network card of the remote server through its own network card, and the network card of the remote server can then deliver the data to the remote server.
  • Fig. 1 is a schematic diagram of a possible server system provided by an embodiment of the present application.
  • the server system may include at least two servers.
  • the server 110 and the server 120 are described as examples in FIG. 1.
  • the server 110 and the server 120 may also include other devices such as a communication interface and a magnetic disk as an external memory, which is not limited here.
  • the server 110 may include a memory 111, a processor 112, and a network card 113.
  • the server 110 may further include a bus 114.
  • the memory 111, the processor 112, and the network card 113 may be connected through the bus 114.
  • the bus 114 may be a peripheral component interconnect express (PCIE) bus or an extended industry standard architecture (EISA) bus.
  • the bus 114 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 1, but it does not mean that there is only one bus or one type of bus.
  • the processor 112 is the computing core and control unit of the server 110.
  • the processor 112 may include multiple processor cores (cores).
  • the processor 112 may be a very large scale integrated circuit.
  • An operating system and other software programs run on the processor 112, so that the processor 112 can access the memory 111, the cache, the disk, and the network card 113.
  • The core in the processor 112 may be, for example, a central processing unit (CPU), a graphics processing unit (GPU), or another application-specific integrated circuit (ASIC).
  • The processor 112 in the embodiments of the present application may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The memory 111 is the main memory of the server 110.
  • the memory 111 is generally used to store various running software programs in the operating system, input output (IO) commands issued by upper-level applications, and information exchanged with external memory.
  • the memory 111 needs to have the advantage of fast access speed.
  • dynamic random access memory (DRAM) is used as the memory 111.
  • the processor 112 can access the memory 111 at a high speed through a memory controller (not shown in FIG. 1), and perform a read operation and a write operation on any storage unit in the memory 111.
  • the memory 111 in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory may be random access memory (RAM), which is used as an external cache.
  • By way of example rather than limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the network card 113 is used to enable the server 110 to communicate with other servers in the communication network.
  • the network card may be built into the server 110, or may also be used as an external device of the server 110, and connected to the server 110 through an interface.
  • The interface may be a network interface or a PCIE interface.
  • the server 120 may include a memory 121, a processor 122, and a network card 123.
  • the server 120 may further include a bus 124.
  • The internal structure of the server 120 may be similar to that of the server 110; for example, the memory 121, the processor 122, and the network card 123 are respectively similar to the memory 111, the processor 112, and the network card 113 in the server 110 described above.
  • For details, refer to the above description of the server 110; the description of each part is not repeated here.
  • a communication network including the server 110 and the server 120 is taken as an example.
  • the server 110 and the server 120 respectively run applications (application, APP), for example, the "weather forecast" APP.
  • For example, the processor 112 in the server 110 is responsible for part of the calculation and stores its calculation result in the memory 111, while the processor 122 in the server 120 is responsible for the other part of the calculation and stores its calculation result in the memory 121.
  • The processor 122 in the server 120 may need to process the calculation results stored in the memory 111 together with those stored in the memory 121.
  • the “weather forecast” APP running in the server 110 will issue an IO command, which is a write command and is used to instruct the calculation result stored in the memory 111 to be written into the memory 121.
  • the processor 112 in the server 110 may send the above-mentioned IO command to the memory 111 through the bus 114.
  • the network card 113 can obtain the above-mentioned IO command from the memory 111 through the bus 114 and process the above-mentioned IO command.
  • the IO command is a write command.
  • the network card 113 obtains from the memory 111 the data to be written to the memory 121 indicated by the write command.
  • the network card 113 transmits the data to the network card 123 of the server 120 through the network.
  • the network card 123 can receive the data sent by the network card 113 and write the data into the memory 121 via the bus 124.
  • the network card 123 can also send the processing result of writing data to the memory 121 to the network card 113 via the network, and the network card 113 stores the result to the memory 111 via the bus 114.
  • The result is used to indicate to the processor 112 that the data the network card 123 requested to write to the memory 121 has been successfully stored in the memory 121.
  • Fig. 2 is a schematic flowchart of a method for a server to process an IO command according to an embodiment of the present application. As shown in FIG. 2, the method may include steps 210-240, and steps 210-240 will be described in detail below.
  • Step 210 The processor 112 in the server 110 sends the to-be-processed IO command to the memory 111.
  • the IO commands to be processed may be stored in the sending queue of the memory 111, and the sending queue may store one or more IO commands to be processed.
  • the information in the IO command may include, but is not limited to: the queue index of the queue to which the IO command belongs, the producer index (PI), and information related to the IO command.
  • PI represents the position of the pending IO command in the queue.
  • PI can also be understood as the number of IO commands to be processed in the queue.
  • The pending IO commands in the queue may be counted in order from the head of the queue to the tail of the queue; each time a pending IO command is added to the QP, the PI count is increased by 1.
  • PI may also be called a write pointer (WT-POINTER), and is maintained by an upper-layer application that issues an IO command in the server 110.
  • the IO command related information may include, but is not limited to: a key for permission verification, and the type of the IO command.
  • the key is used to verify the authority of the IO command
  • the type of the IO command can be, for example, a read command or a write command.
  • the IO command to be processed may also include: the length of the data (length), the virtual address (virtual address) of the data, and so on.
  • Step 220 The processor 112 sends a prompt message to the network card 113.
  • the processor 112 may send a prompt message to the network card 113, where the prompt message is used to indicate that there is an IO command to be processed in the memory 111.
  • The prompt message may be a doorbell (DB).
  • the DB sent by the processor 112 to the network card 113 may include: the queue index of the queue to which the IO command to be processed belongs, and the position (for example, PI) of the IO command to be processed in the queue.
  • The DB may also include the IO-command-related information included in the pending IO command; a hypothetical layout is sketched below.
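  • The exact layouts are not given in the text; the following C sketch merely collects the fields named above into hypothetical structures:

```c
#include <stdint.h>

/* Hypothetical layout of a pending IO command, with the fields the
 * description names: queue index, PI, key, type, length, address. */
struct io_command {
    uint32_t queue_index;  /* queue the command belongs to    */
    uint32_t pi;           /* producer index (write pointer)  */
    uint32_t key;          /* permission-verification key     */
    uint8_t  opcode;       /* e.g. read or write              */
    uint32_t length;       /* data length                     */
    uint64_t virt_addr;    /* virtual address of the data     */
};

/* Hypothetical doorbell (DB) the processor writes to the network
 * card: which queue has new work and where its PI now points. */
struct doorbell {
    uint32_t queue_index;
    uint32_t pi;
};
```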
  • Step 230 The network card 113 processes the IO command to be processed.
  • the network card 113 can process the IO commands to be processed according to the queue information.
  • The queue information used when processing the pending IO command in the embodiments of the application may be one or any combination of the following: queue pair context (QPC), completion queue context (CQC), event queue context (EQC).
  • the QPC corresponds to the QP one-to-one, and the QP stores the pending IO commands issued by the processor.
  • the network card 113 determines that the IO command to be processed is stored in the sending queue according to the prompt message issued by the processor 112, determines the corresponding QPC according to the sending queue storing the IO command to be processed, and processes the IO command to be processed according to the QPC. For example, the network card 113 performs permission verification on the read command or the write command in the IO command to be processed, or performs virtual address conversion of the read command or the write command.
  • the CQC corresponds to the CQ one-to-one
  • the completion command issued by the network card 113 is stored in the CQ.
  • the network card 113 may store the completed command in the CQ after processing the IO command in the sending queue.
  • the network card 113 may determine the corresponding CQC according to the CQ storing the completed command, and determine the address of the CQ according to the address information in the CQC, so that the network card 113 stores the completed command in the CQ according to the address of the CQ.
  • the EQC corresponds to the EQ one-to-one
  • the event command issued by the network card 113 is stored in the EQ.
  • The network card 113 may store the completion command in the CQ, and after the completion commands in the CQ reach a certain number, an event command is generated, and the network card 113 stores the event command in the EQ.
  • the network card 113 may determine the corresponding EQC according to the EQ storing the event command, and determine the address of the EQ according to the address information in the EQC, so that the network card 113 can store the event command in the EQ according to the address of the EQ.
  • QPC can include, but is not limited to: PI, consumer index (CI).
  • CI represents the number of IO commands that the network card 113 has processed in the sending queue corresponding to the QPC, which can also be called a read pointer (RD-POINTER), and is maintained by the network card 113.
  • PI is the number of IO commands to be processed in the QP.
  • For details about the PI, refer to the above description; they are not repeated here.
  • the QPC may also include: queue status, key, and physical base address of the queue.
  • the network card 113 when the network card 113 processes the IO command to be processed, it can determine whether the queue to which the IO command to be processed belongs is available or normal according to the queue status.
  • the network card 113 can also verify the authority of the IO command to be processed when processing the IO command to be processed according to the key.
  • The network card 113 can also convert, according to the physical base address of the queue to which the pending IO command belongs, the virtual address of the read command or write command among the pending IO commands stored in the queue, to obtain the physical address of the read command or write command.
  • The QPC may also include other related data, such as one or more combinations of the following: the data length of the queue in the memory 111, credits, the working mode, and so on.
  • the network card 113 can process the IO commands to be processed according to the QPC. For ease of description, assume that the pending IO commands are stored in queue A, and queue A corresponds to QPC A. For example, the network card 113 may determine whether the queue A is available or normal according to the queue status included in the QPC A. For another example, the network card 113 may verify the authority of the IO command to be processed stored in the queue A according to the key included in the QPC A. For another example, the network card 113 can also convert the virtual address of the read command or write command among the pending IO commands stored in the queue A according to the physical base address of the queue included in QPC A, to obtain the pending IO command The physical address of the read command or write command.
  • the CQC may include: the physical base address, PI, and CI of the queue.
  • the physical base address of the queue indicates the physical base address of the CQ in the memory, so that the network card 113 can store the completion command in the CQ according to the physical base address.
  • PI represents the number of completed commands stored in the network card 113 in the CQ corresponding to the CQC.
  • the network card 113 stores a completion command and modifies the PI in the CQC corresponding to the CQ storing the completion command, and the PI count is increased by one.
  • CI represents the number of completed commands that the processor has processed in the CQ corresponding to the CQC.
  • the processor processes a completion command and modifies the CI in the CQC corresponding to the CQ storing the completion command, and the CI count is increased by one.
  • the EQC can include: the physical base address of the queue, PI, and CI.
  • the physical base address of the queue indicates the physical base address of the EQ in the memory, so that the network card 113 stores the event command in the EQ according to the physical base address.
  • PI represents the number of event commands stored in the network card 113 in the EQ corresponding to the EQC.
  • Each time the network card 113 stores an event command, it modifies the PI in the EQC corresponding to the EQ storing the event command, increasing the PI count by one.
  • CI represents the number of event commands that the processor has processed in the EQ corresponding to the EQC.
  • the processor processes an event command and modifies the CI in the EQC corresponding to the EQ storing the event command, and the CI count is increased by one.
  • Step 240 The network card 113 sends the processed IO command to the network card 123 of the server 120 via the network.
  • The above steps 210 to 240 describe the process in which the local server processes pending IO commands after the upper-layer application running on the local server issues them. As can be seen from this process, the network card needs to process IO commands according to queue information, so how quickly the corresponding queue information can be obtained has a large impact on the processing speed of IO commands.
  • The network card of the server may also include a cache space, for example, a cache memory.
  • The cache space can be used to store part of the queue information.
  • When the network card processes pending IO commands, it can directly obtain the stored queue information from the cache space of the network card and process the pending IO commands according to that queue information, thereby avoiding frequent reads and writes of the queue information in the memory over the bus.
  • The ratio, within a period of time, of accesses that hit the queue information cached in the cache space to all accesses to the cache space is called the hit rate of the cache space.
  • The larger the ratio, the higher the hit rate of the cache space; the smaller the ratio, the lower the hit rate (a minimal sketch follows).
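  • A minimal helper expressing this definition (illustrative only, not from the patent):

```c
#include <stdint.h>

/* Hit rate of the cache space over a measurement window: the fraction
 * of queue-information accesses that were served from the cache. */
static double cache_hit_rate(uint64_t hits, uint64_t total_accesses)
{
    return total_accesses ? (double)hits / (double)total_accesses : 0.0;
}
```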
  • The technical solutions provided by the embodiments of the present application can increase the hit rate of the cache space in the network card, thereby reducing the delay caused by the network card frequently reading and writing queue information stored in the memory through the bus, reducing the waste of bus bandwidth, reducing the processing delay of IO commands on the network card, and improving the transmission performance.
  • FIG. 3 is a schematic structural diagram of a possible server 110 provided by an embodiment of the present application.
  • the server 110 may include a memory 111, a processor 112, and a network card 113.
  • the server 110 may further include a bus 114.
  • the memory 111, the processor 112, and the network card 113 may be connected through the bus 114.
  • the network card 113 may be in the server 110, or may also be used as an external device of the server 110, and connected to the server 110 through an interface.
  • the network card 113 is in the server 110 as an example for description.
  • the memory 111 may include multiple queues, for example, queue A, queue B, ..., queue N.
  • One or more commands can be stored in each queue. Taking the queue as the sending queue in the QP as an example, one or more pending IO commands can be stored in the QP. Taking the queue as a CQ as an example, one or more completion commands can be stored in the CQ. Taking the queue as an EQ as an example, one or more event commands can be stored in the EQ.
  • the network card 113 may include: an input processing unit 310, a buffer space 320, a prediction unit 330, and a processing unit 340. Each of the above-mentioned units will be described in detail below.
  • the input processing unit 310 is mainly used to receive a prompt message (for example, a DB) sent by the processor 112, and the DB is used to indicate that there are IO commands to be processed in the queue of the memory 111.
  • the input processing unit 310 may process the IO command to be processed. Specifically, in a possible implementation manner, the input processing unit 310 may obtain the IO command to be processed and the QPC corresponding to the queue from the queue of the memory 111, and process the IO command to be processed according to the QPC. In another possible implementation manner, the input processing unit 310 may also generate a completion command after processing the IO command. And according to the CQC corresponding to the CQ storing the completion command, the completion command is processed.
  • the input processing unit 310 may also generate an event command after processing the IO command. And according to the EQC corresponding to the EQ storing the event command, the event command is processed.
  • For the specific processing process, refer to the above description; it is not repeated here.
  • Cache space 320: used to store queue information. For example, queue information A, queue information B, ..., queue information M are stored in the cache space 320, where M is less than N.
  • the queue information A corresponds to the queue A and is used to process the commands stored in the queue A
  • the queue information B corresponds to the queue B and is used to process the commands stored in the queue B, and so on.
  • a piece of queue information may include a queue information entry (entry) and corresponding status information.
  • DATA in the cache space 320 is the space for storing the queue information entries of the queue information, and CTRL_DATA is the space for storing the corresponding status information of the queue information held in DATA.
  • queue information A may include queue information entry A and corresponding status information A.
  • the status information stores related status information of the queue information in the buffer space 320.
  • The status information can be represented by a flag bit, or by a field, which is not specifically limited in this application; a hypothetical layout is sketched below.
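  • A hypothetical layout of one such cache line, pairing a DATA entry with its CTRL_DATA status word (sizes and names are assumptions, not from the patent):

```c
#include <stdint.h>

#define QI_ENTRY_BYTES 64  /* illustrative size of one queue information entry */

/* One line of the cache space 320: DATA holds the queue information
 * entry, CTRL_DATA holds the corresponding status information. */
struct cache_line {
    uint8_t  data[QI_ENTRY_BYTES];  /* DATA: queue information entry        */
    uint32_t ctrl_data;             /* CTRL_DATA: status flag bits or field */
    uint32_t queue_index;           /* queue this entry corresponds to      */
};
```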
  • the queue information in the embodiment of the present application may include one or more of the following: QPC, CQC, EQC.
  • the prediction unit 330 is mainly responsible for predicting the possibility that the queue information stored in the buffer space 320 will be subsequently accessed. If a certain queue information is more likely to be accessed than other queue information, the queue information can be called high-frequency queue information.
  • Processing unit 340: mainly responsible for storing the high-frequency queue information determined by the prediction unit 330 in the cache space, and for updating and replacing the queue information stored in the cache space 320 according to the prediction result obtained by the prediction unit 330; the storage space occupied by the replaced queue information can be used to store other, new queue information.
  • a “unit” herein can be implemented in the form of software and/or hardware, which is not specifically limited.
  • a “unit” may be a software program, a hardware circuit, or a combination of the two that realize the above-mentioned functions.
  • the software exists in the form of computer program instructions and is stored in the memory of the network card, and the processor of the network card can be used to execute the program instructions to implement the above method flow.
  • The processor may include, but is not limited to, at least one of the following: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller (MCU), or an artificial intelligence processor or other computing device that runs software.
  • Each computing device may include one or more cores for executing software instructions for calculation or processing.
  • The processor can be a single semiconductor chip, or it can be integrated with other circuits into a semiconductor chip; for example, it can form a system on chip (SoC) with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits), or it can be integrated into an application-specific integrated circuit (ASIC) as a built-in processor of the ASIC, and the ASIC integrating the processor can be packaged separately or packaged together with other circuits.
  • The processor may further include necessary hardware accelerators, such as a field programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
  • The hardware circuit may be implemented as a general-purpose central processing unit (CPU), a microcontroller (MCU), a microprocessor (MPU), a digital signal processor (DSP), or a system on chip (SoC); of course, it may also be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
  • The above PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof, and may run necessary software or execute the above method flow without relying on software.
  • FIG. 5 is a schematic flowchart of a method for information processing of a network card according to an embodiment of the present application. As shown in FIG. 5, the method may include steps 510-530, and steps 510-530 will be described in detail below.
  • Step 510 The prediction unit 330 predicts high-frequency queue information, which is more likely to be accessed than other queue information.
  • The prediction unit 330 may predict the probability of at least one piece of queue information stored in the cache space 320 being accessed, and determine queue information whose access probability is greater than that of other queue information as high-frequency queue information.
  • the at least one piece of queue information stored in the buffer space 320 in the embodiment of the present application may be one or a combination of any of the following: QPC, CQC, EQC.
  • the prediction unit 330 may predict the possibility that the QPC will be subsequently accessed according to whether the QPC is hit by the read request sent by the input processing unit 310.
  • The prediction unit 330 may also pre-read the pending IO commands (for example, obtain the IO command scheduling sequence) and obtain the related information of the queues to which the pending IO commands belong, so as to predict, according to the related information of the queues, the possibility that at least one QPC stored in the cache space 320 will be subsequently accessed.
  • the prediction unit 330 may also predict the possibility that the QPC will be subsequently accessed based on the difference between the values of the read pointer (for example, CI) and the write pointer (for example, PI) in the QPC.
  • the method of the prediction unit 330 for predicting the possibility of at least one CQC being accessed is similar to the method of predicting the possibility of at least one QPC being accessed.
  • the prediction unit 330 may predict the possibility that the CQC will be subsequently accessed according to whether the CQC is hit by the read request sent by the input processing unit 310.
  • The prediction unit 330 may also pre-read the multiple completion commands to be processed (for example, obtain the IO command scheduling sequence) and obtain the related information of the CQs to which the multiple completion commands belong, so as to predict, according to the related information of the CQs, the possibility that at least one CQC stored in the cache space 320 will be subsequently accessed.
  • the method of the prediction unit 330 for predicting the possibility that at least one EQC is accessed is similar to the method for predicting the possibility that at least one QPC is accessed.
  • the prediction unit 330 may predict the possibility that the EQC will be subsequently accessed according to whether the EQC is hit by the read request sent by the input processing unit 310.
  • The prediction unit 330 may also pre-read the multiple event commands to be processed and obtain the related information of the EQs to which they belong, so as to predict, according to the related information of the EQs, the possibility that at least one EQC stored in the cache space 320 will be subsequently accessed.
  • The prediction unit 330 may predict that at least one piece of queue information stored in the cache space 320 is high-frequency queue information according to any one of the foregoing methods, or based on a superposition of any two of them, or based on a superposition of all three, which is not specifically limited in this application; a sketch of such a combination follows.
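  • A minimal sketch of such a superposition, treating each method as a hypothetical boolean prediction and flagging an entry when any of the methods in use fires:

```c
#include <stdbool.h>

/* Hypothetical per-entry results of the three prediction methods. */
struct prediction {
    bool read_hit;   /* currently being read                         */
    bool ptr_diff;   /* PI - CI greater than the preset threshold    */
    bool sched_seq;  /* owning queue seen in the scheduling sequence */
};

/* Superposition of the methods: one, two, or all three may be used. */
static bool is_high_frequency(const struct prediction *p)
{
    return p->read_hit || p->ptr_diff || p->sched_seq;
}
```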
  • Step 520 The processing unit 340 saves the high-frequency queue information in the buffer space.
  • the queue information stored in the buffer space corresponds to the queue in the memory one-to-one, and each queue information is used by the network card to process the IO commands in the queue corresponding to the queue information.
  • the network card may read the high-frequency queue information from the memory, and store the high-frequency queue information in the buffer space.
  • Alternatively, the network card determines that the high-frequency queue information is already stored in the cache space, and sets the state information of the high-frequency queue information in the cache space, where the state information is used to indicate that the high-frequency queue information continues to be stored in the cache space.
  • the status information is not specifically limited, and may be priority information or lock information.
  • the priority information is used to indicate the priority at which the high-frequency queue information is updated in the cache space
  • the lock information is used to indicate that the high-frequency queue information is in a non-updated locked state in the cache space.
  • The processing unit 340 adjusts the state information corresponding to the high-frequency queue information according to the prediction result.
  • Specifically, the prediction unit 330 may predict the possibility that at least one piece of queue information stored in the cache space 320 will be subsequently accessed, and one or more pieces of queue information whose probability of being accessed is greater than that of the other queue information in the cache space are determined to be high-frequency queue information; the processing unit 340 then sets the state information corresponding to the predicted high-frequency queue information so that the high-frequency queue information continues to be stored in the cache space.
  • for example, a flag bit corresponding to the high-frequency queue information can be set (that is, a set operation is performed on the flag bit), or a field corresponding to one or more pieces of the at least one queue information may be modified.
  • the flag bits corresponding to the high-frequency queue information may include one or any combination of the following: valid, dirty, lock, etc. Among them, the valid flag bit indicates whether the queue information entry is valid, the dirty flag bit indicates whether the queue information entry holds dirty data, and the lock flag bit indicates whether the queue information entry is locked.
  • there may be one or more lock flag bits in the embodiments of this application; the specific number of lock flag bits is not limited in this application.
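  • For illustration only, the following minimal C sketch shows one possible layout of a cached queue-information entry carrying the valid, dirty, and lock flag bits described above; every name, field, and size here is a hypothetical assumption and is not taken from the disclosure.

```c
/* Minimal sketch of a cached queue-information entry with the flag
 * bits described above. All names and sizes are assumptions made
 * for illustration only. */
#include <stdint.h>

typedef struct {
    uint32_t qn;          /* number of the queue this entry describes  */
    uint8_t  valid : 1;   /* entry holds valid queue information       */
    uint8_t  dirty : 1;   /* entry differs from the copy in memory 111 */
    uint8_t  lock1 : 1;   /* set when the entry has just been read     */
    uint8_t  lock2 : 1;   /* set when a scheduled IO command needs it  */
    uint8_t  lock3 : 1;   /* set when PI - CI exceeds a threshold      */
    uint8_t  ctx[64];     /* the cached QPC/CQC/EQC payload (assumed)  */
} qinfo_entry_t;
```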
  • Step 530 The processing unit 340 updates part or all of the queue information stored in the buffer space 320 according to the state information corresponding to the queue information in the buffer space.
  • the processing unit 340 may divide the queue information in the cache space into levels according to the priority information of the queue information stored in the cache space, where queue information at a higher level is more likely to be accessed, and may preferentially update the low-level queue information stored in the cache space.
  • in this way, the high-frequency queue information can be determined by predicting the possibility that the queue information stored in the network card's cache space will be accessed, and the replacement or update strategy for the queue information in the cache space can be optimized to avoid replacing high-frequency queue information that will subsequently be used. This improves the hit rate of the network card's cache space, reduces the number of bus read and write accesses generated by the network card, reduces the waste of bus bandwidth, lowers the processing delay of the IO commands to be processed by the network card, and improves transmission performance.
  • the following takes the case where the queue information stored in the cache space is QPC as an example to describe different implementations of predicting, in the foregoing steps 510 and 520, the possibility that at least one QPC stored in the cache space will be accessed, and of setting the state information of the high-frequency queue information.
  • when the prediction unit 330 determines that a certain QPC stored in the cache space 320 is being read, it determines that the possibility of subsequent access to that QPC is relatively high, and determines the QPC to be high-frequency queue information. That is to say, the network card looks up the currently read QPC among the QPCs cached in the cache space, and determines the currently read QPC as the high-frequency queue information.
  • for example, the input processing unit 310 determines that a pending IO command needs to be processed according to QPC A, so the input processing unit 310 needs to read QPC A. After the pending IO command is processed, the read pointer (for example, the CI) in QPC A needs to be modified, so QPC A needs to be written subsequently. Reads and writes to a QPC therefore occur in pairs, which is why a QPC that has just been read is likely to be accessed again soon.
  • the processing unit 340 may set the state information corresponding to these QPCs.
  • a flag bit corresponding to QPC can be set.
  • a flag bit in the embodiment of the present application may include lock1.
  • the processing unit 340 may set the lock1 flag bit corresponding to the QPC, for example, set the lock1 flag bit to 1; a lock1 flag bit of 1 indicates that the QPC is more likely to be accessed subsequently.
  • correspondingly, the processing unit 340 may clear the lock1 flag bit corresponding to the QPC, for example, set the lock1 flag bit to 0; a lock1 flag bit of 0 means that the QPC is less likely to be accessed subsequently.
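  • The read-hit prediction just described might be realized as in the following sketch, which continues the hypothetical qinfo_entry_t above; the lookup loop is an illustrative assumption, not the patented implementation.

```c
#include <stddef.h>

/* On a read request from the input processing unit, a hit marks the
 * entry with lock1: reads and writes come in pairs, so a write-back
 * of this entry is expected shortly. */
qinfo_entry_t *on_read_request(qinfo_entry_t *cache, size_t n, uint32_t qn)
{
    for (size_t i = 0; i < n; i++) {
        if (cache[i].valid && cache[i].qn == qn) {
            cache[i].lock1 = 1;   /* likely to be written back soon */
            return &cache[i];     /* hit */
        }
    }
    return NULL;                  /* miss: fetch from memory 111 */
}
```

  • One natural policy, consistent with the clearing behavior described above, would clear lock1 again once the paired write to the QPC has completed.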
  • the prediction unit 330 may also obtain the scheduling sequence of the IO commands to be processed, determine, according to that scheduling sequence, the QPC corresponding to the queue to which each pending IO command belongs, and determine that QPC as the high-frequency queue information. It should be understood that the scheduling sequence of the IO commands records the IO commands to be processed. That is to say, before a pending IO command is processed, the prediction unit 330 may predict, according to the pre-reading result of the pending IO command, the possibility that a QPC stored in the cache space 320 will be read in the short term.
  • the input processing unit 310 receives multiple prompt messages (for example, DBs) sent by the processor 112, and the multiple DBs indicate that there are multiple IO commands to be processed in the server 110.
  • the input processing unit 310 may pre-read multiple DBs before processing the multiple pending IO commands, and send the pre-read results to the prediction unit 330.
  • the input processing unit 310 may determine the queue information of the queue to which a pending IO command belongs according to the queue index of that queue and the position (for example, the PI) of the IO command in the queue, both of which are carried in the DB.
  • the prediction unit 330 may determine the corresponding QPCs according to the queue information, sent by the input processing unit 310, of the queues to which the multiple pending IO commands belong, and determine that these QPCs are high-frequency queue information.
  • the processing unit 340 sets the state information corresponding to the multiple QPCs.
  • the flag bits corresponding to these QPCs can be set.
  • such a flag bit in the embodiment of the present application may include lock2.
  • the prediction unit 330 may set the lock2 flag bit corresponding to QPC A, for example, set the lock2 flag bit to 1.
  • the network card may obtain the foregoing QPCs from the memory 111 in advance and store them in the cache space 320, thereby avoiding the large IO processing delay that would be caused by fetching these QPCs from the memory 111 only when the IO commands are processed.
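  • A sketch of this doorbell pre-read path under the same assumptions follows; the doorbell_t record and the prefetch_qpc helper are hypothetical stand-ins for interfaces the description leaves abstract.

```c
/* Fields a doorbell might carry, per the description above. */
typedef struct {
    uint32_t queue_index;  /* queue to which the pending IO belongs */
    uint32_t pi;           /* position (PI) of the IO in the queue  */
} doorbell_t;

/* Assumed helper: load the QPC for `qn` from memory 111 into the
 * cache (evicting if needed) and return the entry. */
qinfo_entry_t *prefetch_qpc(qinfo_entry_t *cache, size_t n, uint32_t qn);

/* Plain lookup; unlike on_read_request above, it does not touch lock1. */
static qinfo_entry_t *find_entry(qinfo_entry_t *cache, size_t n, uint32_t qn)
{
    for (size_t i = 0; i < n; i++)
        if (cache[i].valid && cache[i].qn == qn)
            return &cache[i];
    return NULL;
}

/* Pre-read pending doorbells and pin the QPCs they will need. */
void on_doorbell_preread(qinfo_entry_t *cache, size_t n,
                         const doorbell_t *db, size_t ndb)
{
    for (size_t i = 0; i < ndb; i++) {
        qinfo_entry_t *e = find_entry(cache, n, db[i].queue_index);
        if (e == NULL)
            e = prefetch_qpc(cache, n, db[i].queue_index);
        if (e != NULL)
            e->lock2 = 1;  /* a scheduled IO command will need this QPC */
    }
}
```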
  • the prediction unit 330 may also read the PI and the CI of a QPC stored in the cache space 320, and predict subsequent read and write accesses to the QPC based on the difference between the PI and the CI. If the difference between the write pointer and the read pointer is greater than a preset threshold, it indicates that the QPC is more likely to be accessed subsequently, and the QPC can be determined to be high-frequency queue information.
  • PI represents the number of IO commands in the queue corresponding to the QPC stored in the buffer space 320, and the order of the IO commands in the queue may be counted from the head of the queue to the end of the queue.
  • the CI represents the number of IO commands that have been processed in the queue corresponding to the QPC; each time an IO command is processed, the CI count is increased by 1. Therefore, the difference PI - CI is the number of pending IO commands still stored in the queue: the larger the PI and the smaller the CI, the more pending IO commands remain, and the greater the probability that the QPC corresponding to the queue will be subsequently accessed. For example, if the PI is 120 and the CI is 100, the queue still holds 20 pending IO commands, each of which will require the QPC when processed.
  • the prediction unit 330 may compare the difference between the PI and the CI in the QPC in real time, and when the difference exceeds a certain preset threshold, the processing unit 340 may set the state information corresponding to the QPC.
  • the flag bit corresponding to QPC can be set.
  • such a flag bit in the embodiment of the present application may include lock3.
  • the processing unit 340 may set the lock3 flag bit corresponding to QPC A, for example, set the lock3 flag bit to 1.
  • correspondingly, the processing unit 340 may clear the lock3 flag bit corresponding to the QPC, for example, set the lock3 flag bit to 0.
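  • The PI/CI check might look like the following sketch; the concrete threshold value is an assumption, since the application only speaks of a "preset threshold".

```c
/* Mark a QPC with lock3 when its queue still holds many pending IO
 * commands (PI - CI above a threshold). */
#define PENDING_THRESHOLD 8u   /* example value; left open by the application */

void update_lock3(qinfo_entry_t *e, uint32_t pi, uint32_t ci)
{
    uint32_t pending = pi - ci;   /* unsigned math tolerates PI wrap-around */
    e->lock3 = (pending > PENDING_THRESHOLD);
}
```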
  • any one of the three prediction methods listed above, or a combination of any two or all three of them, can be used to determine the high-frequency queue information in the cache space.
  • the above description takes the case where the setting of status information is realized by setting flag bits as an example; one or more flag bits can be used.
  • the setting of the number of flag bits is related to the selected prediction method, which is not specifically limited in this application. For example, if the prediction result is obtained by one of these methods, a flag bit, for example, a lock flag bit, can be set. For another example, if the prediction result is obtained through a combination of any of the above two methods, two flag bits can be set, for example, the lock 1 flag and the lock 2 flag. For another example, if the prediction result is obtained through the combination of the above three methods, three flags can be set, for example, lock 1 flag, lock 2 flag, and lock 3 flag.
  • the method by which the prediction unit 330 predicts the probability of at least one CQC or EQC being accessed is similar to the method for predicting the probability of at least one QPC being accessed. For example, since the read and write accesses of the input processing unit 310 in the network card 113 to a CQC or EQC stored in the cache space 320 occur in pairs, the network card 113 will modify the PI pointer in the CQC or EQC after reading it from the cache space 320. Therefore, when the prediction unit 330 determines that a certain CQC or EQC stored in the cache space 320 is read, it determines that subsequent access to that CQC or EQC is more likely, and the processing unit 340 can adjust the state information corresponding to the CQC or EQC.
  • the prediction unit 330 may also obtain the scheduling sequence of the pending completion commands or event commands and, according to the pre-reading result of those commands, predict the possibility that the CQCs or EQCs stored in the cache space 320 will be read in the short term.
  • for the specific prediction process and the process of setting status information according to the prediction result, refer to the QPC prediction and status information setting methods described above; details are not repeated here.
  • the following describes in detail the specific implementation process of queue information replacement by the processing unit 340, taking as an example the case where the prediction result is obtained by combining the above three prediction methods and the flag bits corresponding to at least one piece of queue information are set according to the prediction result.
  • the processing unit 340 may divide the at least one QPC entry stored in the cache space 320 into several priority levels according to the lock flag bits, set by the prediction unit 330, that correspond to each QPC.
  • FIG. 6 takes the division of at least one QPC entry into four levels as an example for description.
  • Level 1: QPC entries in the cache space 320 that are likely to be accessed in a short period of time; the lock1 flag corresponding to the QPC entry is set.
  • The other lock flag bits, for example, the lock2 flag bit and the lock3 flag bit, may be 1 or 0, which is not specifically limited in this application.
  • Level 2: QPC entries in the cache space 320 that are likely to be accessed in the short term; the lock2 flag corresponding to the QPC entry is set, and the lock1 flag is not set. In other words, the lock2 flag bit corresponding to the QPC entry is 1, and the lock1 flag bit is 0.
  • The other lock flag bits, for example, the lock3 flag bit, may be 1 or 0, which is not specifically limited in this application.
  • Level 3: QPC entries in the cache space 320 that are likely to be accessed in the long term; the lock3 flag corresponding to the QPC entry is set, and the lock1 flag and the lock2 flag are not set. In other words, the lock3 flag bit corresponding to the QPC entry is 1, and the lock1 and lock2 flag bits are 0.
  • Level 4: QPC entries in the cache space 320 that are unlikely to be accessed for a long time; the lock1, lock2, and lock3 flags corresponding to the QPC entry are not set. In other words, the lock1, lock2, and lock3 flag bits corresponding to the QPC entry are all 0.
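  • As a minimal illustration (continuing the hypothetical qinfo_entry_t sketch above, and not part of the original disclosure), the four levels of FIG. 6 can be derived from the lock flags as follows:

```c
/* Derive the four levels of FIG. 6 from the lock flags; a higher
 * level means the entry is a better candidate for replacement. */
int qpc_level(const qinfo_entry_t *e)
{
    if (e->lock1) return 1;  /* just read; write-back imminent        */
    if (e->lock2) return 2;  /* needed by a scheduled IO command      */
    if (e->lock3) return 3;  /* many pending commands in its queue    */
    return 4;                /* no sign of upcoming access            */
}
```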
  • when a new QPC needs to be cached, the processing unit 340 may store the new QPC in unoccupied space. If there is no unoccupied space in the cache space 320, replacement occurs: the processing unit 340 needs to delete some QPC and store the new QPC in the storage space that the deleted QPC occupied.
  • the priority of the processing unit 340 when performing replacement is from high to low: level 4-level 3-level 2-level 1.
  • the processing unit 340 in the embodiment of the present application can perform replacement according to the meaning of the lock flag bits.
  • for QPCs whose three lock flags are all unset, for example, the QPCs corresponding to level 4, the possibility of being accessed subsequently is low, so the processing unit 340 may preferentially consider replacing the QPCs corresponding to level 4. As another example, if there is no QPC corresponding to level 4, the processing unit 340 may consider replacing the QPCs corresponding to level 3: these QPCs will see a large number of read and write accesses in the future, but their IO commands have not been scheduled in the short term, so they may also be considered for replacement. As another example, if there is no QPC corresponding to level 3, the processing unit 340 may consider replacing the QPCs corresponding to level 2. As another example, if there is no QPC corresponding to level 2, the processing unit 340 may finally consider replacing the QPCs corresponding to level 1.
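  • Under the same assumptions, the replacement order just described (level 4 first, level 1 last) could be sketched as:

```c
/* Scan for a victim, preferring level 4 and falling back toward
 * level 1 only when no higher level is present. */
qinfo_entry_t *pick_victim(qinfo_entry_t *cache, size_t n)
{
    for (int lvl = 4; lvl >= 1; lvl--)
        for (size_t i = 0; i < n; i++)
            if (cache[i].valid && qpc_level(&cache[i]) == lvl)
                return &cache[i];  /* evict the least-needed entry */
    return NULL;  /* only reachable if the cache has no valid entries */
}
```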
  • in this way, the network card can predict in advance the possibility that the QPCs stored in the cache space will be accessed, thereby avoiding, when the QPCs in the cache space need to be updated, replacing QPCs that will be used later; this optimizes the replacement strategy of the cache space and improves the hit rate of the network card's cache space.
  • the processing unit 340 replaces a stored CQC or EQC according to the prediction result of the CQC being accessed or the prediction result of the EQC being accessed; the specific implementation process is similar to the above method of replacing QPCs according to the prediction result, and will not be repeated here.
  • FIG. 7 is a schematic structural diagram of a chip 700 provided by an embodiment of the present application.
  • the chip 700 is applied to a server system.
  • the local server and the remote server in the server system perform data transmission through remote direct memory access RDMA.
  • At least one queue is stored in the memory of the local server, and each queue is used to store input and output IO commands.
  • the IO commands instruct the local server to perform data access to the remote server.
  • the chip 700 includes:
  • the prediction unit 330 is configured to predict high-frequency queue information, which is more likely to be accessed than other queue information;
  • the processing unit 340 is configured to store the high-frequency queue information in a cache space, where the cache space is in the network card of the local server, and the queue information stored in the cache space corresponds one-to-one to the queues in the memory.
  • each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
  • processing unit 340 is further configured to: update or replace part or all of the queue information stored in the buffer space.
  • the prediction unit 330 is specifically configured to: search for the currently read queue information in the queue information cached in the buffer space, and determine the currently read queue information as the high-frequency queue information.
  • when the queue information is the queue pair context QPC, the prediction unit 330 is specifically configured to: calculate the difference between the read pointer and the write pointer in the QPCs cached in the cache space, and determine a QPC whose difference between the read pointer and the write pointer is greater than a preset threshold as the high-frequency queue information.
  • the prediction unit 330 is specifically configured to: obtain the IO command scheduling sequence, where the IO command scheduling sequence records the IO commands to be processed; determine, according to the IO command scheduling sequence, the queue information corresponding to the queue to which a to-be-processed IO command belongs; and determine the queue information corresponding to the queue to which the to-be-processed IO command belongs as the high-frequency queue information.
  • the network card in this application can also predict high-frequency queue information according to any two or three prediction methods described above, and store the predicted high-frequency queue information in the cache space of the network card.
  • the processing unit 340 is specifically configured to: read the high-frequency queue information from the memory; and store the high-frequency queue information in the buffer space.
  • the processing unit 340 is specifically configured to: determine that the high-frequency queue information has been stored in the cache space; and set the state information of the high-frequency queue information in the cache space, where the state information is used to indicate that the high-frequency queue information continues to be stored in the cache space.
  • the status information includes priority information or lock information, where the priority information is used to indicate the priority at which the high-frequency queue information is updated in the cache space, and the lock information is used to indicate that the high-frequency queue information is in a non-updated locked state in the cache space.
  • priority information or lock information may be represented by status flag bits.
  • the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, and event queue context EQC.
  • processing unit 340 is further configured to: update or replace part or all of the queue information stored in the buffer space according to priority information or lock information.
  • the processing unit 340 is specifically configured to: divide the queue information in the cache space into levels according to the status information in the network card's cache space, where queue information at a higher level is more likely to be accessed; and preferentially update the low-level queue information stored in the cache space.
  • it should be understood that the size of the sequence numbers of the above processes does not imply their order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of this application.
  • the embodiment of the present application also provides a network card, which includes the chip 700 described in any one of the above.
  • for details about the network card, refer to FIG. 3 and the description of the network card 113; details are not repeated here.
  • the embodiment of the present application also provides a server, which includes a memory, a processor, a network card, and the like.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative. For example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • if the function is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical discs, and other media that can store program code.


Abstract

The present application provides a network card information processing method. A local server and a remote server perform data transmission through RDMA; at least one queue is stored in a memory of the local server, each queue is used to store IO commands, and the IO commands instruct the local server to perform data access to the remote server. The method comprises: a network card predicts high-frequency queue information, where the possibility of the high-frequency queue information being accessed is greater than that of other queue information; and the network card stores the high-frequency queue information in a cache space, where the cache space is in the network card, the queue information stored in the cache space is in one-to-one correspondence with the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information. The present application can improve the hit rate of the cache space.

Description

网卡的信息处理方法及芯片Information processing method and chip of network card 技术领域Technical field
本申请涉及服务器领域,并且更具体地,涉及一种网卡的信息处理方法及用于进行网卡的信息处理的芯片。The present application relates to the field of servers, and more specifically, to an information processing method of a network card and a chip for information processing of the network card.
背景技术Background technique
在数据中心的网络系统中,为了避免网络传输中服务器端数据处理的延迟,可以使用远程直接内存存取(remote direct memory access,RDMA)技术。在使用RDMA技术时,通信的本端服务器可以通过网卡将需要传输的数据传输至远端服务器的网卡中,并由该远端服务器的网卡将该需要传输的数据传输至远端服务器的内存中。In the network system of the data center, in order to avoid the delay of server-side data processing in network transmission, remote direct memory access (RDMA) technology can be used. When using RDMA technology, the local server of communication can transmit the data to be transmitted to the network card of the remote server through the network card, and the network card of the remote server can transfer the data to be transmitted to the memory of the remote server .
在实际业务中,本端服务器接收到待处理的输入输出(input output,IO)命令后,需要通过网卡对该待处理的IO命令进行处理,将处理后的IO命令发送至对端服务器的网卡中。网卡可以获取队列信息(例如,队列对的上下文(queue pair context,QPC))对待处理的IO命令进行处理,通常这些队列信息存储在服务器的存储器中,为了避免网卡频繁通过总线读写存储器中的队列信息,可以在网卡的缓存空间中缓存部分队列信息。由于网卡中缓存空间的规模较小,因此,需要对网卡的缓存空间中缓存的队列信息进行更新,如何提高网卡的缓存空间的命中率,减少由于网卡频繁通过总线读写存储器中存储的队列信息所造成的延时以及总线带宽的浪费成为当前亟需要解决的问题。In actual business, after the local server receives the input output (IO) command to be processed, it needs to process the pending IO command through the network card, and send the processed IO command to the network card of the opposite server in. The network card can obtain queue information (for example, the queue pair context (QPC)) to process the IO commands to be processed. Usually, the queue information is stored in the memory of the server. In order to avoid the network card from frequently reading and writing data in the memory through the bus Queue information, part of the queue information can be buffered in the buffer space of the network card. Due to the small scale of the cache space in the network card, it is necessary to update the queue information cached in the cache space of the network card. How to improve the hit rate of the cache space of the network card and reduce the frequent read and write of the queue information stored in the memory due to the network card through the bus The resulting delay and the waste of bus bandwidth have become problems that need to be resolved urgently.
发明内容Summary of the invention
本申请提供了一种网卡的信息处理方法、芯片,可以提高缓存空间的命中率。The present application provides an information processing method and chip for a network card, which can improve the hit rate of the cache space.
第一方面,提供了一种网卡的信息处理方法,本端服务器与远端服务器通过远程直接内存存取RDMA进行数据传输,本端服务器的存储器中保存有至少一个队列,每个队列用于保存输入输出IO命令,IO命令指示所述本端服务器向远端服务器进行数据访问,该方法包括:网卡预测高频队列信息,高频队列信息被访问的可能性大于其他队列信息;网卡在缓存空间中保存所述高频队列信息,其中,缓存空间在该网卡中,缓存空间保存的队列信息与本端服务器的存储器中的队列一一对应,每个队列信息用于该网卡对该队列信息对应的队列中的IO命令进行处理。In the first aspect, an information processing method for a network card is provided. The local server and the remote server perform data transmission through remote direct memory access RDMA. The memory of the local server stores at least one queue, and each queue is used for storage. Input and output IO commands. The IO command instructs the local server to access data to the remote server. The method includes: the network card predicts high-frequency queue information, and the high-frequency queue information is more likely to be accessed than other queue information; the network card is in the cache space The high-frequency queue information is stored in the network card, where the buffer space is in the network card, the queue information stored in the buffer space corresponds to the queue in the memory of the local server, and each queue information is used for the network card to correspond to the queue information IO commands in the queue are processed.
上述技术方案中,可以对高频队列信息进行预测,并将高频队列信息保存在网卡的缓存空间中,从而提高网卡的缓存空间的命中率,进一步地可以减少该网卡产生的总线读写访问次数,减少总线带宽浪费,降低网卡对待处理的IO命令的处理延时,提高服务器的传输性能。In the above technical solution, the high-frequency queue information can be predicted, and the high-frequency queue information can be stored in the cache space of the network card, thereby improving the hit rate of the cache space of the network card, and further reducing the bus read and write access generated by the network card Reduce the waste of bus bandwidth, reduce the processing delay of the IO commands to be processed by the network card, and improve the transmission performance of the server.
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:网卡在缓存空间中缓存的队列信息中查找当前被读取的队列信息,将当前被读取的队列信息确 定为高频队列信息。In another possible implementation of the first aspect, the network card predicting the high-frequency queue information specifically includes: the network card searches the queue information currently being read in the queue information cached in the buffer space, and replaces the currently read queue information Determined as high-frequency queue information.
上述技术方案中,由于缓存空间中队列信息的读写访问是成对出现的,如果某一个队列信息接收到读请求,那么在短期内该队列信息被访问到的可能性较大,可以将该当前被读取的队列信息确定为高频队列信息,可以提高缓存空间的命中率。In the above technical solution, since the read and write access to the queue information in the buffer space occurs in pairs, if a read request is received for a certain queue information, then the queue information is more likely to be accessed in a short period of time. The currently read queue information is determined to be high-frequency queue information, which can improve the hit rate of the cache space.
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:队列信息为队列对上下文QPC,网卡计算缓存空间中缓存的QPC中的读指针和写指针的差值,将读指针和写指针的差值大于预设阈值的QPC确定为高频队列信息。In another possible implementation of the first aspect, the network card predicts the high-frequency queue information specifically including: the queue information is the queue pair context QPC, and the network card calculates the difference between the read pointer and the write pointer in the QPC cached in the cache space, The QPC with the difference between the read pointer and the write pointer greater than the preset threshold is determined as the high-frequency queue information.
上述技术方案中,可以根据QPC中的读指针和写指针的差值,确定该QPC对应的队列中待处理的IO命令的数量,从而根据待处理的IO命令的数量确定该QPC后续被访问的可能性,从而提高缓存空间的命中率。In the above technical solution, the number of IO commands to be processed in the queue corresponding to the QPC can be determined according to the difference between the read pointer and the write pointer in the QPC, so as to determine the number of IO commands to be processed subsequently to be accessed by the QPC. Possibility to improve the hit rate of the cache space.
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:网卡获取IO命令调度顺序IO命令调度顺序记录了待处理的IO命令;网卡根据IO命令调度顺序,确定待处理的IO命令所属的队列对应的队列信息,将待处理的IO命令所属的队列对应的队列信息确定为高频队列信息。In another possible implementation of the first aspect, the network card predicting high-frequency queue information specifically includes: the network card obtains the IO command scheduling sequence, the IO command scheduling sequence records the pending IO commands; the network card determines the pending IO commands according to the IO command scheduling sequence The queue information corresponding to the queue to which the processed IO command belongs, and the queue information corresponding to the queue to which the IO command to be processed belongs is determined as high frequency queue information.
上述技术方案中,可以先对未处理的IO命令进行预读,根据未处理的IO命令所属的队列信息确定处理该IO命令需要的队列信息,该队列信息后续被访问到的可能性较大,可以将该队列信息确定为高频队列信息,从而提高缓存空间的命中率。In the above technical solution, the unprocessed IO command can be pre-read first, and the queue information required to process the IO command is determined according to the queue information to which the unprocessed IO command belongs. The queue information is more likely to be accessed later. The queue information can be determined as high-frequency queue information, thereby improving the hit rate of the cache space.
需要说明的是,本申请中所述网卡还可以根据上述任意两种或三种预测方法预测高频队列信息,并将预测的高频队列信息保存在网卡的缓存空间中。It should be noted that the network card in this application can also predict high-frequency queue information according to any two or three prediction methods described above, and store the predicted high-frequency queue information in the cache space of the network card.
即:which is:
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:网卡在缓存空间中缓存的队列信息中查找当前被读取的队列信息,将当前被读取的队列信息确定为高频队列信息,和,队列信息为队列对上下文QPC时,网卡计算缓存空间中缓存的QPC中的读指针和写指针的差值,将读指针和写指针的差值大于预设阈值的QPC确定为高频队列信息。In another possible implementation of the first aspect, the network card predicting the high-frequency queue information specifically includes: the network card searches the queue information currently being read in the queue information cached in the buffer space, and replaces the currently read queue information When it is determined to be the high-frequency queue information, and, when the queue information is the queue pair context QPC, the network card calculates the difference between the read pointer and the write pointer in the QPC cached in the cache space, and the difference between the read pointer and the write pointer is greater than the preset threshold The QPC is determined as the high-frequency queue information.
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:网卡在缓存空间中缓存的队列信息中查找当前被读取的队列信息,将当前被读取的队列信息确定为高频队列信息,和,网卡获取IO命令调度顺序IO命令调度顺序记录了待处理的IO命令;网卡根据IO命令调度顺序,确定待处理的IO命令所属的队列对应的队列信息,将待处理的IO命令所属的队列对应的队列信息确定为高频队列信息。In another possible implementation of the first aspect, the network card predicting the high-frequency queue information specifically includes: the network card searches the queue information currently being read in the queue information cached in the buffer space, and replaces the currently read queue information Determined as the high-frequency queue information, and, the network card obtains the IO command scheduling sequence. The IO command scheduling sequence records the IO commands to be processed; the network card determines the queue information corresponding to the queue to which the IO commands to be processed belongs according to the IO command scheduling sequence. The queue information corresponding to the queue to which the processed IO command belongs is determined to be high-frequency queue information.
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:队列信息为队列对上下文QPC,网卡计算缓存空间中缓存的QPC中的读指针和写指针的差值,将读指针和写指针的差值大于预设阈值的QPC确定为高频队列信息,和,网卡获取IO命令调度顺序IO命令调度顺序记录了待处理的IO命令;网卡根据IO命令调度顺序,确定待处理的IO命令所属的队列对应的队列信息,将待处理的IO命令所属的队列对应的队列信息确定为高频队列信息。In another possible implementation of the first aspect, the network card predicts the high-frequency queue information specifically including: the queue information is the queue pair context QPC, and the network card calculates the difference between the read pointer and the write pointer in the QPC cached in the cache space, The QPC where the difference between the read pointer and the write pointer is greater than the preset threshold is determined as high-frequency queue information, and the network card obtains the IO command scheduling sequence. The IO command scheduling sequence records the pending IO commands; the network card determines the IO command scheduling sequence The queue information corresponding to the queue to which the to-be-processed IO command belongs, and the queue information corresponding to the queue to which the to-be-processed IO command belongs is determined as high-frequency queue information.
在第一方面的另一种可能的实现方式中,网卡预测高频队列信息具体包括:队列信息为队列对上下文QPC,网卡计算缓存空间中缓存的QPC中的读指针和写指针的差值,将读指针和写指针的差值大于预设阈值的QPC确定为高频队列信息,和,网卡获取IO 命令调度顺序IO命令调度顺序记录了待处理的IO命令;网卡根据IO命令调度顺序,确定待处理的IO命令所属的队列对应的队列信息,将待处理的IO命令所属的队列对应的队列信息确定为高频队列信息,和,网卡在缓存空间中缓存的队列信息中查找当前被读取的队列信息,将当前被读取的队列信息确定为高频队列信息。In another possible implementation of the first aspect, the network card predicts the high-frequency queue information specifically including: the queue information is the queue pair context QPC, and the network card calculates the difference between the read pointer and the write pointer in the QPC cached in the cache space, The QPC where the difference between the read pointer and the write pointer is greater than the preset threshold is determined as high-frequency queue information, and the network card obtains the IO command scheduling sequence. The IO command scheduling sequence records the pending IO commands; the network card determines the IO command scheduling sequence The queue information corresponding to the queue to which the pending IO command belongs, the queue information corresponding to the queue to which the pending IO command belongs is determined as high-frequency queue information, and the network card searches the queue information cached in the buffer space for the current read The queue information of, determines the queue information currently being read as high-frequency queue information.
在第一方面的另一种可能的实现方式中,所述网卡从所述存储器中读取所述高频队列信息;将所述高频队列信息保存在所述缓存空间中。In another possible implementation manner of the first aspect, the network card reads the high-frequency queue information from the memory; and saves the high-frequency queue information in the buffer space.
上述技术方案中,可以从存储器中获取网卡确定的高频队列信息并保存在网卡的缓存空间中,避免在处理IO命令时去存储器中获取所需的队列信息所造成的较大的处理延时。In the above technical solution, the high-frequency queue information determined by the network card can be obtained from the memory and stored in the cache space of the network card, avoiding the large processing delay caused by obtaining the required queue information from the memory when processing IO commands. .
在第一方面的另一种可能的实现方式中,所述网卡确定所述高频队列信息已经保存在所述缓存空间中;网卡设置缓存空间中的所述高频队列信息的状态信息,所述状态信息用于指示所述高频队列信息继续被保存在所述缓存空间中。In another possible implementation of the first aspect, the network card determines that the high-frequency queue information has been stored in the cache space; the network card sets the state information of the high-frequency queue information in the cache space, so The status information is used to indicate that the high-frequency queue information continues to be stored in the buffer space.
在第一方面的另一种可能的实现方式中,所述状态信息包括优先级信息或锁定信息,其中,优先级信息用于指示所述高频队列信息在所述缓存空间中被更新的优先级,所述锁定信息用于指示所述高频队列信息在所述缓存空间中处于不更新的锁定状态。In another possible implementation manner of the first aspect, the status information includes priority information or lock information, where the priority information is used to indicate the priority of the high-frequency queue information being updated in the buffer space. Level, the locking information is used to indicate that the high-frequency queue information is in a non-updated locked state in the buffer space.
在第一方面的另一种可能的实现方式中,优先级信息或锁定信息可以由状态标志位表示。In another possible implementation manner of the first aspect, the priority information or the lock information may be represented by a status flag bit.
在第一方面的另一种可能的实现方式中,队列信息的状态标志位被置位,则表示该队列信息在缓存空间中处于暂不更新的锁定状态。In another possible implementation manner of the first aspect, the status flag bit of the queue information is set, which indicates that the queue information is in a locked state that is not updated temporarily in the buffer space.
在第一方面的另一种可能的实现方式中,队列信息的优先级信息由多个状态标志位表示,每个状态标志位用于表示一种预测高频队列信息的方法获得的结果,当采用一种预测高频队列信息的方法对一个队列信息进行预测,获得的结果为该队列信息为高频队列信息时,则将该队列信息对应的状态标志位进行置位,结合每个队列信息的多个状态标志位,可以确定队列信息的被更新的优先级。In another possible implementation of the first aspect, the priority information of the queue information is represented by multiple status flag bits, and each status flag bit is used to indicate the result obtained by a method of predicting high-frequency queue information. A method of predicting high-frequency queue information is used to predict a queue information. When the result obtained is that the queue information is high-frequency queue information, the status flag corresponding to the queue information is set, and each queue information is combined The multiple status flags can determine the priority of the queue information to be updated.
在第一方面的另一种可能的实现方式中,队列信息对应的状态标志位被置位的个数越多,该队列信息在缓存空间中被更新的优先级越低。In another possible implementation manner of the first aspect, the more the status flag bits corresponding to the queue information are set, the lower the priority of the queue information being updated in the buffer space.
在第一方面的另一种可能的实现方式中,队列信息包括以下信息中的一种或多种:队列对上下文QPC,完成队列上下文CQC,事件队列上下文EQC。In another possible implementation of the first aspect, the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, and event queue context EQC.
在第一方面的一种可能的实现方式中,所述方法还包括:网卡还可以对缓存空间保存的部分或全部队列信息进行更新或替换。In a possible implementation of the first aspect, the method further includes: the network card may also update or replace part or all of the queue information stored in the buffer space.
在第一方面的另一种可能的实现方式中,所述方法还包括:网卡还可以根据优先级信息或锁定信息对缓存空间中存储的部分或全部队列信息进行更新或替换。In another possible implementation manner of the first aspect, the method further includes: the network card may also update or replace part or all of the queue information stored in the buffer space according to the priority information or the lock information.
在第一方面的另一种可能的实现方式中,网卡根据所述缓存空间中的状态信息,将所述缓存空间中的队列信息按等级划分,其中,等级越高的队列信息被访问的可能性越大;优先更新所述缓存空间中存储的等级低的队列信息。In another possible implementation of the first aspect, the network card divides the queue information in the cache space into levels according to the state information in the cache space, wherein the queue information with a higher level may be accessed The higher the performance, the priority is to update the low-level queue information stored in the buffer space.
第二方面,提供了一种芯片,所述芯片应用于服务器系统,所述服务器系统中的本端服务器与远端服务器通过远程直接内存存取RDMA进行数据传输,所述本端服务器的存储器中保存有至少一个队列,每个队列用于保存输入输出IO命令,所述IO命令指示所述本端服务器向所述远端服务器进行数据访问,所述芯片包括:In a second aspect, a chip is provided, the chip is applied to a server system, the local server and the remote server in the server system perform data transmission through remote direct memory access RDMA, and the memory of the local server At least one queue is stored, and each queue is used to store input and output IO commands. The IO commands instruct the local server to access data to the remote server, and the chip includes:
预测单元,用于预测高频队列信息,所述高频队列信息被访问的可能性大于其他队列信息;A prediction unit for predicting high-frequency queue information, which is more likely to be accessed than other queue information;
处理单元,用于在缓存空间中保存所述高频队列信息,其中,所述缓存空间在所述本端服务器的网卡中,所述缓存空间保存的队列信息与所述存储器中的队列一一对应,每个队列信息用于所述网卡对该队列信息对应的队列中的IO命令进行处理。The processing unit is configured to store the high-frequency queue information in a cache space, wherein the cache space is in the network card of the local server, and the queue information stored in the cache space is one-to-one with the queue in the memory Correspondingly, each queue information is used by the network card to process the IO commands in the queue corresponding to the queue information.
在第二方面的一种可能的实现方式中,所述处理单元还用于:对缓存空间保存的部分或全部队列信息进行更新或替换。In a possible implementation manner of the second aspect, the processing unit is further configured to: update or replace part or all of the queue information stored in the buffer space.
在第二方面的另一种可能的实现方式中,所述预测单元具体用于:在所述缓存空间中缓存的队列信息中查找当前被读取的队列信息,将所述当前被读取的队列信息确定为所述高频队列信息。In another possible implementation manner of the second aspect, the prediction unit is specifically configured to: search for the currently read queue information in the queue information cached in the buffer space, and compare the currently read queue information The queue information is determined to be the high-frequency queue information.
在第二方面的另一种可能的实现方式中,所述队列信息为队列对上下文QPC,所述预测单元具体用于:计算所述缓存空间中缓存的QPC中的读指针和写指针的差值,将所述读指针和所述写指针的差值大于预设阈值的QPC确定为所述高频队列信息。In another possible implementation manner of the second aspect, the queue information is the queue pair context QPC, and the prediction unit is specifically configured to: calculate the difference between the read pointer and the write pointer in the QPC buffered in the buffer space Value, the QPC whose difference between the read pointer and the write pointer is greater than a preset threshold is determined as the high-frequency queue information.
在第二方面的另一种可能的实现方式中,所述预测单元具体用于:获取IO命令调度顺序,所述IO命令调度顺序记录了待处理的IO命令;根据所述IO命令调度顺序,确定所述待处理的IO命令所属的队列对应的队列信息,将所述待处理的IO命令所属的队列对应的队列信息确定为所述高频队列信息。In another possible implementation manner of the second aspect, the prediction unit is specifically configured to: obtain an IO command scheduling sequence, the IO command scheduling sequence records the IO commands to be processed; according to the IO command scheduling sequence, The queue information corresponding to the queue to which the to-be-processed IO command belongs is determined, and the queue information corresponding to the queue to which the to-be-processed IO command belongs is determined as the high-frequency queue information.
需要说明的是,本申请中所述网卡还可以根据上述任意两种或三种预测方法预测高频队列信息,并将预测的高频队列信息保存在网卡的缓存空间中。It should be noted that the network card in this application can also predict high-frequency queue information according to any two or three prediction methods described above, and store the predicted high-frequency queue information in the cache space of the network card.
在第二方面的另一种可能的实现方式中,所述处理单元具体用于:从所述存储器中读取所述高频队列信息;将所述高频队列信息保存在所述缓存空间中。In another possible implementation manner of the second aspect, the processing unit is specifically configured to: read the high-frequency queue information from the memory; store the high-frequency queue information in the buffer space .
在第二方面的另一种可能的实现方式中,所述处理单元具体用于:确定所述高频队列信息已经保存在所述缓存空间中;设置所述缓存空间中的所述高频队列信息的状态信息,所述状态信息用于指示所述高频队列信息继续被保存在所述缓存空间中。In another possible implementation manner of the second aspect, the processing unit is specifically configured to: determine that the high-frequency queue information has been stored in the cache space; and set the high-frequency queue in the cache space State information of the information, where the state information is used to indicate that the high-frequency queue information continues to be stored in the buffer space.
在第二方面的另一种可能的实现方式中,所述状态信息包括优先级信息或锁定信息,所述优先级信息用于指示所述高频队列信息在所述缓存空间中被更新的优先级,所述锁定信息用于指示所述高频队列信息在所述缓存空间中处于不更新的锁定状态。In another possible implementation manner of the second aspect, the status information includes priority information or lock information, and the priority information is used to indicate the priority of the high-frequency queue information being updated in the buffer space. Level, the locking information is used to indicate that the high-frequency queue information is in a non-updated locked state in the buffer space.
在第二方面的另一种可能的实现方式中,优先级信息或锁定信息可以由状态标志位表示。In another possible implementation manner of the second aspect, the priority information or the lock information may be represented by a status flag bit.
在第二方面的另一种可能的实现方式中,所述队列信息包括以下信息中的一种或多种:队列对上下文QPC,完成队列上下文CQC,事件队列上下文EQC。In another possible implementation of the second aspect, the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, and event queue context EQC.
在第二方面的另一种可能的实现方式中,所述处理单元还用于:根据优先级信息或锁定信息对缓存空间中存储的部分或全部队列信息进行更新或替换。In another possible implementation manner of the second aspect, the processing unit is further configured to: update or replace part or all of the queue information stored in the buffer space according to priority information or lock information.
在第二方面的另一种可能的实现方式中,所述处理单元具体用于:网卡所述缓存空间中的状态信息,将所述缓存空间中的队列信息按等级划分,其中,等级越高的队列信息被访问的可能性越大;优先更新所述缓存空间中存储的等级低的队列信息。In another possible implementation of the second aspect, the processing unit is specifically configured to: state information in the buffer space of the network card, and divide the queue information in the buffer space into levels, where the higher the level The queue information of is more likely to be accessed; priority is given to updating the queue information of low level stored in the buffer space.
第二方面和第二方面的任意一个可能的实现方式的有益效果和第一方面以及第一方面的任意一个可能的实现方式的有益效果是对应的,对此,不再赘述。The beneficial effects of any possible implementation manner of the second aspect and the second aspect correspond to the beneficial effects of the first aspect and any possible implementation manner of the first aspect, and details are not described herein again.
第三方面,提供了一种网卡,包括:如第二方面或第二方面的任意一个可能的实现 方式中的芯片。In a third aspect, a network card is provided, including: the chip in the second aspect or any one of the possible implementation manners of the second aspect.
第四方面,提供了一种服务器,包括存储器和如第三方面中的网卡。In a fourth aspect, a server is provided, including a memory and a network card as in the third aspect.
附图说明Description of the drawings
为了更清楚地说明本申请实施例的技术方法,下面将对实施例中所需使用的附图作以简单地介绍。In order to more clearly illustrate the technical methods of the embodiments of the present application, the following will briefly introduce the drawings needed in the embodiments.
图1是本申请实施例提供的一种可能的服务器系统示意图。Fig. 1 is a schematic diagram of a possible server system provided by an embodiment of the present application.
图2是本申请实施例提供的一种服务器处理IO命令的方法的示意性流程图。Fig. 2 is a schematic flowchart of a method for a server to process an IO command according to an embodiment of the present application.
图3是本申请实施例提供的一种可能的服务器110的示意性结构图。FIG. 3 is a schematic structural diagram of a possible server 110 provided by an embodiment of the present application.
图4是本申请实施例提供的一种网卡上的缓存空间中存储的队列信息的示意性结构图。FIG. 4 is a schematic structural diagram of queue information stored in a buffer space on a network card provided by an embodiment of the present application.
图5是本申请实施例提供的一种更新缓存空间中队列信息的方法的示意性流程图。FIG. 5 is a schematic flowchart of a method for updating queue information in a buffer space provided by an embodiment of the present application.
图6是本申请实施例提供的一种网卡上的缓存空间中划分的QPC优先级示意图。FIG. 6 is a schematic diagram of QPC priorities divided in a buffer space on a network card according to an embodiment of the present application.
图7是本申请实施例提供的一种芯片700的示意性结构图。FIG. 7 is a schematic structural diagram of a chip 700 provided by an embodiment of the present application.
具体实施方式detailed description
下面将结合本申请中的附图,对本申请提供的实施例中的方案进行描述。The solutions in the embodiments provided in this application will be described below in conjunction with the drawings in this application.
为了便于描述,下面先对本申请实施例涉及的几个概念进行说明。For ease of description, several concepts involved in the embodiments of the present application will be described below.
(1)输入输出(input output,IO)命令(1) Input and output (input output, IO) commands
IO命令可以分为读命令和写命令,是指由服务器上运行的应用程序下发的用于指示向远端设备读数据或向远端设备写数据的命令。服务器的处理器可以接收IO命令,并将其存储在存储器中,以使得IO命令等待被处理。具体的,可以将IO命令存储在存储器的队列中。IO commands can be divided into read commands and write commands, which refer to commands issued by an application program running on the server to instruct to read data to or write data to a remote device. The processor of the server can receive the IO command and store it in the memory so that the IO command is waiting to be processed. Specifically, the IO command can be stored in a queue of the memory.
需要说明的是,本申请实施例中提及的IO命令可以理解为待处理的IO命令。IO命令可以由网卡进行处理。It should be noted that the IO commands mentioned in the embodiments of this application can be understood as IO commands to be processed. IO commands can be processed by the network card.
(2)队列(2) Queue
队列是一种特殊的线性表,可以在表的前端(front)进行删除操作,而在表的后端(rear)进行插入操作。进行插入操作的端称为队尾,进行删除操作的端称为队头。队列中没有元素时,称为空队列。队列的数据元素又称为队列元素。在队列中插入一个队列元素称为入队,从队列中删除一个队列元素称为出队。队列可以在一端插入,在另一端删除,队列又可以被称为先进先出(first in first out,FIFO)线性表。A queue is a special linear table. Delete operations can be performed at the front of the table (front) and insert operations can be performed at the back of the table (rear). The end that performs the insertion operation is called the end of the team, and the end that performs the deletion operation is called the head of the team. When there are no elements in the queue, it is called an empty queue. The data elements of the queue are also called queue elements. Inserting a queue element into the queue is called enqueuing, and deleting a queue element from the queue is called dequeuing. The queue can be inserted at one end and deleted at the other end. The queue can also be called a first in first out (FIFO) linear table.
队列的类型可以有多种,例如,发送队列(send queue,SQ),接收队列(receive queue,RQ),完成队列(complete queue,CQ),事件队列(event queue,EQ)。There can be multiple types of queues, for example, send queue (SQ), receive queue (RQ), complete queue (CQ), and event queue (event queue, EQ).
本端服务器的发送队列和对端服务器的接收队列可以称为队列对(queue pair,QP)。The sending queue of the local server and the receiving queue of the opposite server can be called a queue pair (QP).
发送队列可以用于存储待处理的IO命令;接收队列用于存储处理IO命令所需的内存信息。The sending queue can be used to store IO commands to be processed; the receiving queue is used to store memory information required for processing IO commands.
作为一个示例,本端服务器中的发送队列用于存储该本端服务器下发的用于指示向对端服务器读数据或写数据的IO命令,例如,该IO命令为读取对端服务器中存储的数据的读命令,本端服务器中的网卡可以将处理后的读命令发送至对端服务器,该读命令 中可以包括需要读取的对端服务器中存储的数据的地址和长度等,以使得对端服务器可以根据该处理后的读命令,向本端服务器发送其需要读取的数据。As an example, the sending queue in the local server is used to store the IO command issued by the local server to instruct to read or write data to the peer server. For example, the IO command is to read the storage in the peer server. The network card in the local server can send the processed read command to the peer server. The read command can include the address and length of the data stored in the peer server that need to be read, so that The opposite server can send the data it needs to read to the local server according to the processed read command.
又如,该IO命令为向对端服务器中写入数据的写命令,本端服务器中的网卡可以对写命令进行处理之后,将处理后的写命令发送至对端服务器,以使得对端服务器可以根据处理后的写命令查找接收队列中的内存信息,根据查找到的内存信息将处理后的写命令中的数据存储至对端服务器的内存中。For another example, the IO command is a write command to write data to the peer server. After the network card in the local server processes the write command, it sends the processed write command to the peer server, so that the peer server The memory information in the receiving queue can be found according to the processed write command, and the data in the processed write command can be stored in the memory of the opposite server according to the found memory information.
同样的,对端服务器中的发送队列也可以存储该对端服务器下发的用于指示向本端服务器读数据或写数据的IO命令,并通过对端服务器中的网卡发送至本端服务器。具体的请参考上文中的描述,此处不再赘述。Similarly, the sending queue in the opposite server can also store the IO commands issued by the opposite server to instruct to read or write data to the local server, and send it to the local server through the network card in the opposite server. For details, please refer to the above description, which will not be repeated here.
完成队列用于存储完成命令。也就是说,服务器中的网卡对IO命令处理完成之后,可以将完成信息存储在完成队列中,该完成信息也可以称为完成命令。The completion queue is used to store completion commands. In other words, after the network card in the server completes the processing of the IO command, the completion information can be stored in the completion queue, and the completion information can also be referred to as a completion command.
事件队列用于存储事件命令。也就是说,服务器中的网卡在完成队列中存储完成命令后,当完成队列中的完成命令达到一定数量之后,可以产生事件命令。作为示例,事件命令中可以包括:事件类型,完成队列索引。以便于触发处理器对完成队列中存储的一个或多个完成命令进行处理。The event queue is used to store event commands. In other words, after the network card in the server stores the completion command in the completion queue, an event command can be generated when the completion command in the completion queue reaches a certain number. As an example, the event command may include: event type, completion queue index. In order to trigger the processor to process one or more completion commands stored in the completion queue.
应理解,在事件队列中存储事件命令可以避免服务器中的网卡在完成队列中存储一个完成命令后频繁地触发处理器对完成队列中存储的完成命令进行处理。It should be understood that storing event commands in the event queue can prevent the network card in the server from frequently triggering the processor to process the completion commands stored in the completion queue after storing a completion command in the completion queue.
(3)队列信息(3) Queue information
The context information of a queue may also be referred to as queue information. Queue information is in one-to-one correspondence with queues and is used to process the IO commands in the queue corresponding to that queue information.
In the embodiments of this application, the probability that each piece of queue information stored in the server will be accessed can be predicted. If a piece of queue information is more likely to be accessed than other queue information, that queue information may be referred to as high-frequency queue information. In other words, high-frequency queue information has a relatively high probability of being accessed. For the specific method of determining high-frequency queue information, refer to the methods described below; details are not provided here.
In this application, queue information may include one or more of the following: a queue pair context (QPC), a completion queue context (CQC), and an event queue context (EQC).
Specifically, as an example, the QPC is used by the network card in the server to process the IO commands stored in the QP corresponding to the QPC, the CQC is used by the processor to process the completion commands stored in the CQ corresponding to the CQC, and the EQC is used by the processor to process the event commands stored in the EQ corresponding to the EQC.
In the embodiments of this application, the server may further store state information corresponding to each piece of queue information, where the state information indicates the probability that the corresponding queue information will be accessed.
(4) Queue pair context (QPC)
When processing a to-be-processed IO command stored in the send queue, processing such as permission verification and virtual address translation needs to be performed on the IO command according to the QPC corresponding to the QP.
It should be understood that the QPC is the context of a queue pair and is in one-to-one correspondence with the QP storing the to-be-processed IO commands. When processing a to-be-processed IO command according to a QPC, the network card needs to determine, according to the queue to which the IO command belongs, the QPC corresponding to that queue, and then process the IO command according to that QPC.
In the network system of a data center, remote direct memory access (RDMA) technology can be used to avoid the delay of server-side data processing during network transmission. It should be understood that RDMA is a memory access technology that quickly transfers data stored in the memory of one device to the memory of another device without the intervention of either operating system. RDMA is suitable for high-throughput, low-latency network communication, and is especially suitable for large-scale parallel computer clusters.
As an example, when RDMA is used, the local server may transmit the data to be transmitted to the network card of the remote server through its own network card, and the network card of the remote server then transfers the data into the memory of the remote server.
The following describes in detail, with reference to FIG. 1, a network system applicable to the embodiments of this application.
FIG. 1 is a schematic diagram of a possible server system according to an embodiment of this application. The server system may include at least two servers; FIG. 1 uses a server 110 and a server 120 as an example for description.
It should be noted that, in addition to the components shown in FIG. 1, the server 110 and the server 120 may further include other components, such as a communication interface and a magnetic disk serving as external storage, which is not limited herein.
The server 110 is used as an example. The server 110 may include a memory 111, a processor 112, and a network card 113. Optionally, the server 110 may further include a bus 114, through which the memory 111, the processor 112, and the network card 113 may be connected. The bus 114 may be a peripheral component interconnect express (PCIE) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 114 may be classified into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 1, but this does not mean that there is only one bus or one type of bus.
The processor 112 is the computing core and control unit of the server 110. The processor 112 may include multiple processor cores. The processor 112 may be a very-large-scale integrated circuit. An operating system and other software programs are installed on the processor 112, so that the processor 112 can access the memory 111, the cache, the disk, and the network card 113. It can be understood that, in the embodiments of this application, a core of the processor 112 may be, for example, a central processing unit (CPU), a graphics processing unit (GPU), or another application-specific integrated circuit (ASIC).
It should be understood that the processor 112 in the embodiments of this application may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 111 is the main memory of the server 110. The memory 111 is generally used to store the running software programs of the operating system, the input/output (IO) commands issued by upper-layer applications, the information exchanged with external storage, and the like. To increase the access speed of the processor 112, the memory 111 needs to provide fast access. In some computer system architectures, a dynamic random access memory (DRAM) is used as the memory 111. The processor 112 can access the memory 111 at high speed through a memory controller (not shown in FIG. 1) and perform read and write operations on any storage unit in the memory 111.
It should also be understood that the memory 111 in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM).
The network card 113 is configured to enable the server 110 to communicate with other servers in the communication network. The network card may be built into the server 110, or may serve as an external device of the server 110 and be connected to the server 110 through an interface. The interface may be a network interface, for example, a PCIE interface. In FIG. 1, the network card 113 built into the server 110 is used as an example for description.
The server 120 may include a memory 121, a processor 122, and a network card 123. Optionally, the server 120 may further include a bus 124. The internal structure of the server 120 may be similar to that of the server 110; for example, the memory 121, the processor 122, and the network card 123 are respectively similar to the memory 111, the processor 112, and the network card 113 in the server 110. For details, refer to the foregoing description of the server 110; the description of each part is not repeated here.
In a data processing and communication scenario, a communication network including the server 110 and the server 120 is used as an example. An application (APP), for example, a 'weather forecast' APP, runs on each of the server 110 and the server 120. In a specific example, to implement high-performance computing, the processor 112 in the server 110 performs one part of the computation and stores the computation result in the memory 111, and the processor 122 in the server 120 performs the other part of the computation and stores the computation result in the memory 121. The computation results stored in the memory 111 and in the memory 121 then need to be aggregated, for example, by the processor 122 in the server 120. The 'weather forecast' APP running on the server 110 therefore issues an IO command, where the IO command is a write command instructing that the computation result stored in the memory 111 be written into the memory 121.
The processor 112 in the server 110 may send the foregoing IO command to the memory 111 through the bus 114. The network card 113 may obtain the IO command from the memory 111 through the bus 114 and process it. Because the IO command is a write command, the network card 113 obtains from the memory 111 the data that the write command indicates is to be written into the memory 121, and sends the data to the network card 123 of the server 120 through the network. The network card 123 receives the data sent by the network card 113 and writes the data into the memory 121 through the bus 124. The network card 123 may further send the processing result of writing the data into the memory 121 to the network card 113 through the network, and the network card 113 stores the result in the memory 111 through the bus 114, where the result indicates that the network card 123 has successfully stored, in the memory 121, the data that the processor 112 needs to write into the memory 121.
The following describes, with reference to FIG. 2, an implementation process in which the server 110 processes a to-be-processed IO command.
FIG. 2 is a schematic flowchart of a method for a server to process an IO command according to an embodiment of this application. As shown in FIG. 2, the method may include steps 210 to 240, which are described in detail below.
Step 210: The processor 112 in the server 110 sends a to-be-processed IO command to the memory 111.
Specifically, in the embodiments of this application, the to-be-processed IO command may be stored in a send queue of the memory 111, and the send queue may store one or more to-be-processed IO commands.
The information in the IO command may include, but is not limited to: the queue index of the queue to which the IO command belongs, a producer index (PI), and IO-command-related information.
The PI indicates the position of the to-be-processed IO command in the queue. The PI can also be understood as the number of to-be-processed IO commands in the queue. The to-be-processed IO commands may be ordered in the queue by counting from 1 from the head of the queue to the tail; each time a to-be-processed IO command is added to the QP, the PI count is incremented by 1.
The PI may also be referred to as a write pointer (WT-POINTER) and is maintained by the upper-layer application in the server 110 that issues the IO command.
The IO-command-related information may include, but is not limited to: a key for permission verification and the type of the IO command. The key is used to verify the permission of the IO command, and the type of the IO command may be, for example, a read command or a write command. The to-be-processed IO command may further include: the length of the data, the virtual address of the data, and so on.
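Taken together, the fields above can be pictured as a plain C structure. The following is a minimal sketch; the structure name, field names, and field widths are assumptions for illustration and do not describe the actual command layout of any particular network card.

```c
#include <stdint.h>

/* Hypothetical layout of a to-be-processed IO command (sketch only). */
enum io_opcode { IO_READ = 0, IO_WRITE = 1 };

struct io_command {
    uint32_t queue_index;  /* index of the queue (QP) this command belongs to */
    uint32_t pi;           /* producer index: position of this command in the queue */
    uint32_t key;          /* key used for permission verification */
    uint8_t  opcode;       /* command type, e.g. IO_READ or IO_WRITE */
    uint32_t length;       /* length of the data to transfer */
    uint64_t virt_addr;    /* virtual address of the data */
};
```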
Step 220: The processor 112 sends a prompt message to the network card 113.
After sending the to-be-processed IO command to the memory 111, the processor 112 may send a prompt message to the network card 113, where the prompt message indicates that there is a to-be-processed IO command in the memory 111. As an example, the prompt message may be a doorbell (DB).
In some embodiments, the DB sent by the processor 112 to the network card 113 may include: the queue index of the queue to which the to-be-processed IO command belongs, and the position of the to-be-processed IO command in that queue (for example, the PI). Optionally, the DB may further include the IO-command-related information included in the to-be-processed IO command.
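Under the same illustrative assumptions as the sketch above, a doorbell can be modeled as a small record that only identifies where the new work is located:

```c
#include <stdint.h>

/* Hypothetical doorbell record (sketch only; not a real register map). */
struct doorbell {
    uint32_t queue_index;  /* queue that received a new to-be-processed IO command */
    uint32_t pi;           /* producer index after the command was enqueued */
};
```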
Step 230: The network card 113 processes the to-be-processed IO command.
The network card 113 may process the to-be-processed IO command according to queue information. In the embodiments of this application, the queue information used to process the to-be-processed IO command may be one of or any combination of the following: a queue pair context (QPC), a completion queue context (CQC), and an event queue context (EQC).
Taking the queue information being a QPC as an example, the QPC is in one-to-one correspondence with the QP, and the QP stores the to-be-processed IO commands issued by the processor. The network card 113 determines, according to the prompt message issued by the processor 112, that a to-be-processed IO command is stored in the send queue, determines the corresponding QPC according to the send queue storing the to-be-processed IO command, and processes the to-be-processed IO command according to that QPC. For example, the network card 113 performs permission verification on a read or write command in the to-be-processed IO command, or performs virtual address translation for the read or write command.
Taking the queue information being a CQC as an example, the CQC is in one-to-one correspondence with the CQ, and the CQ stores the completion commands issued by the network card 113. For example, after processing an IO command in the send queue, the network card 113 may store a completion command in the CQ. Specifically, the network card 113 may determine the corresponding CQC according to the CQ storing the completion command, and determine the address of the CQ according to the address information in the CQC, so that the network card 113 stores the completion command in the CQ according to the address of the CQ.
Taking the queue information being an EQC as an example, the EQC is in one-to-one correspondence with the EQ, and the EQ stores the event commands issued by the network card 113. For example, after processing the IO commands in the QP, the network card 113 may store the completion commands in the CQ, and after the completion commands in the CQ reach a certain quantity, generate an event command, which the network card 113 stores in the EQ. Specifically, the network card 113 may determine the corresponding EQC according to the EQ storing the event command, and determine the address of the EQ according to the address information in the EQC, so that the network card 113 stores the event command in the EQ according to the address of the EQ.
The following describes the data structure of the QPC.
The QPC may include, but is not limited to: a PI and a consumer index (CI). The CI indicates the number of IO commands in the send queue corresponding to the QPC that the network card 113 has already processed; it may also be referred to as a read pointer (RD-POINTER) and is maintained by the network card 113. For example, each time the network card 113 finishes processing an IO command, it modifies the CI in the QPC corresponding to the QP storing that IO command, incrementing the CI count by 1. The PI is the number of to-be-processed IO commands in the QP; for a description of the PI, refer to the foregoing description, which is not repeated here.
The QPC may further include: a queue state, a key, and the physical base address of the queue. When processing a to-be-processed IO command, the network card 113 may determine, according to the queue state, whether the queue to which the command belongs is available or normal. When processing the to-be-processed IO command, the network card 113 may also verify the permission of the command according to the key. The network card 113 may further translate, according to the physical base address of the queue to which the to-be-processed IO command belongs, the virtual address of a read or write command in the to-be-processed IO commands stored in that queue, to obtain the physical address of the read or write command.
Optionally, the QPC may further include other related data, for example, one or more of the following: the data length of the queue in the memory 111, credits, a working mode, and so on.
The network card 113 may process the to-be-processed IO command according to the QPC. For ease of description, assume that the to-be-processed IO command is stored in queue A, and that queue A corresponds to QPC A. For example, the network card 113 may determine, according to the queue state included in QPC A, whether queue A is available or normal. As another example, the network card 113 may verify, according to the key included in QPC A, the permission of the to-be-processed IO command stored in queue A. As yet another example, the network card 113 may translate, according to the physical base address of the queue included in QPC A, the virtual address of a read or write command in the to-be-processed IO command stored in queue A, to obtain the physical address of the read or write command.
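Collecting the fields described above, the QPC can be sketched as a C structure. The names and widths below are assumptions for illustration; a real QPC layout is device-specific.

```c
#include <stdint.h>

/* Hypothetical QPC layout (sketch only). */
struct qpc {
    uint32_t pi;           /* producer index: commands enqueued by software */
    uint32_t ci;           /* consumer index: commands already processed by the NIC */
    uint8_t  queue_state;  /* whether the QP is available/normal */
    uint32_t key;          /* key for permission verification */
    uint64_t phys_base;    /* physical base address of the queue in memory */
    uint32_t queue_len;    /* optional: data length of the queue in the memory */
    uint16_t credits;      /* optional: flow-control credits */
    uint8_t  work_mode;    /* optional: working mode */
};
```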
The following describes the data structure of the CQC.
The CQC may include: the physical base address of the queue, a PI, and a CI. The physical base address of the queue indicates the physical base address of the CQ in memory, so that the network card 113 can store completion commands in the CQ according to that physical base address. The PI indicates the number of completion commands that the network card 113 has stored in the CQ corresponding to the CQC. For example, each time the network card 113 stores a completion command, it modifies the PI in the CQC corresponding to the CQ storing that completion command, incrementing the PI count by 1. The CI indicates the number of completion commands in the CQ corresponding to the CQC that the processor has already processed. For example, each time the processor processes a completion command, it modifies the CI in the CQC corresponding to the CQ storing that completion command, incrementing the CI count by 1.
The following describes the data structure of the EQC.
The EQC may include: the physical base address of the queue, a PI, and a CI. The physical base address of the queue indicates the physical base address of the EQ in memory, so that the network card 113 can store event commands in the EQ according to that physical base address. The PI indicates the number of event commands that the network card 113 has stored in the EQ corresponding to the EQC. For example, each time the network card 113 stores an event command, it modifies the PI in the EQC corresponding to the EQ storing that event command, incrementing the PI count by 1. The CI indicates the number of event commands in the EQ corresponding to the EQC that the processor has already processed. For example, each time the processor processes an event command, it modifies the CI in the EQC corresponding to the EQ storing that event command, incrementing the CI count by 1.
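As described above, the CQC and the EQC carry the same three fields and differ only in which queue and which producer/consumer roles they describe. A minimal illustrative sketch:

```c
#include <stdint.h>

/* Hypothetical CQC/EQC layout (sketch only): the CQC describes a CQ and
 * the EQC describes an EQ, but both carry the same three fields. */
struct cqc_eqc {
    uint64_t phys_base;  /* physical base address of the CQ/EQ in memory */
    uint32_t pi;         /* commands stored by the network card (producer) */
    uint32_t ci;         /* commands already processed by the processor (consumer) */
};
```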
Step 240: The network card 113 sends the processed IO command to the network card 123 of the server 120 through the network.
The foregoing steps 210 to 240 describe the process in which the local server processes a to-be-processed IO command after an upper-layer application running on the local server issues that command. It can be seen from the foregoing procedure that the network card needs to process the IO command according to queue information. Therefore, how quickly the corresponding queue information can be obtained has a considerable impact on the processing speed of IO commands.
The network card of the server may further include a cache space, for example, a cache memory. The cache space may be used to store part of the queue information. When processing a to-be-processed IO command, the network card can obtain the stored queue information directly from its cache space and process the command according to that queue information, thereby preventing the network card from frequently reading and writing the queue information in the memory through the bus.
Because the cache space in the network card is small, only part of the queue information can be cached, and the queue information cached in the cache space therefore needs to be updated. However, because the queues storing the to-be-processed IO commands have a certain randomness and each queue corresponds to one piece of queue information, it is difficult to evict the infrequently used queue information from the cache space. As a result, the hit rate of the cache space decreases, which in turn causes the network card to initiate unnecessary bus read/write accesses and increases the processing delay of the to-be-processed IO commands.
It should be understood that the ratio, within a period of time, of the accessed queue information cached in the cache space to all the queue information cached in the cache space is referred to as the hit rate of the cache space. A larger ratio means a higher hit rate of the cache space, and a smaller ratio means a lower hit rate.
The technical solutions provided in the embodiments of this application can increase the hit rate of the cache space in the network card, thereby reducing the delay and the waste of bus bandwidth caused by the network card frequently reading and writing, through the bus, the queue information stored in the memory, reducing the processing delay of the to-be-processed IO commands, and improving transmission performance.
FIG. 3 is a schematic structural diagram of a possible server 110 according to an embodiment of this application. As shown in FIG. 3, the server 110 may include a memory 111, a processor 112, and a network card 113. Optionally, the server 110 may further include a bus 114, through which the memory 111, the processor 112, and the network card 113 may be connected.
It should be understood that the network card 113 may be inside the server 110, or may serve as an external device of the server 110 and be connected to the server 110 through an interface. FIG. 3 uses the network card 113 being inside the server 110 as an example for description.
The memory 111 may include multiple queues, for example, queue A, queue B, ..., queue N. Each queue may store one or more commands. Taking the queue being a send queue in a QP as an example, the QP may store one or more to-be-processed IO commands. Taking the queue being a CQ as an example, the CQ may store one or more completion commands. Taking the queue being an EQ as an example, the EQ may store one or more event commands.
The network card 113 may include: an input processing unit 310, a cache space 320, a prediction unit 330, and a processing unit 340. Each of these units is described in detail below.
Input processing unit 310: mainly configured to receive a prompt message (for example, a DB) sent by the processor 112, where the DB indicates that a to-be-processed IO command is stored in a queue of the memory 111. The input processing unit 310 may process the to-be-processed IO command. Specifically, in one possible implementation, the input processing unit 310 may obtain, from a queue of the memory 111, the to-be-processed IO command and the QPC corresponding to that queue, and process the to-be-processed IO command according to that QPC. In another possible implementation, the input processing unit 310 may further generate a completion command after the IO command has been processed, and process the completion command according to the CQC corresponding to the CQ storing that completion command. In yet another possible implementation, the input processing unit 310 may further generate an event command after the IO command has been processed, and process the event command according to the EQC corresponding to the EQ storing that event command. For the specific processing procedure, refer to the foregoing description; details are not repeated here.
Cache space 320: used to store queue information. For example, the cache space 320 stores queue information A, queue information B, ..., queue information M, where M is less than N. Queue information A corresponds to queue A and is used to process the commands stored in queue A, queue information B corresponds to queue B and is used to process the commands stored in queue B, and so on.
A piece of queue information may include a queue information entry and corresponding state information. Referring to FIG. 4, DATA in the cache space 320 is the space used to store the queue information entries, and CTRL_DATA is used to store the state information, in the cache space 320, corresponding to the queue information in DATA. For example, as shown in FIG. 4, queue information A may include queue information entry A and corresponding state information A.
The state information holds the related state of the queue information in the cache space 320. The state information may be represented by flag bits or by a field, which is not specifically limited in this application.
The queue information in the embodiments of this application may include one or more of the following: a QPC, a CQC, and an EQC.
Prediction unit 330: mainly responsible for predicting the probability that the queue information stored in the cache space 320 will be subsequently accessed. If a piece of queue information is more likely to be accessed than other queue information, that queue information may be referred to as high-frequency queue information.
Processing unit 340: mainly responsible for keeping, in the cache space, the high-frequency queue information determined by the prediction unit 330. It may also update and replace the queue information stored in the cache space 320 according to the prediction result obtained by the prediction unit 330; the storage space occupied by the evicted queue information can then be used to store other, new queue information.
It should be understood that the term 'unit' herein may be implemented in software and/or hardware, which is not specifically limited. For example, a 'unit' may be a software program, a hardware circuit, or a combination of the two that implements the foregoing functions. When any of the foregoing units is implemented in software, the software exists in the form of computer program instructions stored in the memory of the network card, and the processor of the network card may execute the program instructions to implement the foregoing method procedures. The processor may include, but is not limited to, at least one of the following computing devices that run software: a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a microcontroller unit (MCU), an artificial intelligence processor, or the like, and each computing device may include one or more cores for executing software instructions to perform operations or processing. The processor may be a standalone semiconductor chip, or may be integrated with other circuits into one semiconductor chip; for example, it may form a system on chip (SoC) together with other circuits (such as codec circuits, hardware acceleration circuits, or various bus and interface circuits), or may be integrated into an application-specific integrated circuit (ASIC) as a built-in processor of the ASIC, and the ASIC integrating the processor may be packaged separately or packaged together with other circuits. In addition to the cores for executing software instructions to perform operations or processing, the processor may further include necessary hardware accelerators, such as a field-programmable gate array (FPGA), a programmable logic device (PLD), or a logic circuit that implements dedicated logic operations.
When the foregoing units are implemented in hardware circuits, the hardware circuits may be implemented by a general-purpose central processing unit (CPU), a microcontroller unit (MCU), a microprocessing unit (MPU), a digital signal processor (DSP), or a system on chip (SoC); certainly, they may also be implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD). The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof, and may run the necessary software or operate independently of software to perform the foregoing method procedures.
The following uses the server shown in FIG. 3 as an example and describes in detail, with reference to FIG. 5, a network card information processing method provided in an embodiment of this application.
FIG. 5 is a schematic flowchart of a network card information processing method according to an embodiment of this application. As shown in FIG. 5, the method may include steps 510 to 530, which are described in detail below.
Step 510: The prediction unit 330 predicts high-frequency queue information, where the high-frequency queue information is more likely to be accessed than other queue information.
In the embodiments of this application, the prediction unit 330 may predict the probability that each piece of queue information stored in the cache space 320 will be accessed, and determine, as the high-frequency queue information, the queue information whose probability of being accessed is greater than that of the other queue information.
The queue information stored in the cache space 320 in the embodiments of this application may be one of or any combination of the following: a QPC, a CQC, and an EQC. For details, refer to the foregoing description; details are not repeated here.
Taking the queue information being a QPC as an example: In one possible implementation, the prediction unit 330 may predict the probability that a QPC will be subsequently accessed according to whether the QPC is hit by a read request sent by the input processing unit 310. In another possible implementation, the prediction unit 330 may also pre-read the to-be-processed IO commands (for example, obtain the IO command scheduling sequence) and obtain information about the queues to which the to-be-processed IO commands belong, so as to predict, according to that queue information, the probability that each QPC stored in the cache space 320 will be subsequently accessed. In yet another possible implementation, the prediction unit 330 may further predict the probability that a QPC will be subsequently accessed according to the difference between the values of the read pointer (for example, the CI) and the write pointer (for example, the PI) in the QPC.
Taking the queue information being a CQC as an example: The method by which the prediction unit 330 predicts the probability that a CQC will be accessed is similar to the method for predicting the probability that a QPC will be accessed. For example, the prediction unit 330 may predict the probability that a CQC will be subsequently accessed according to whether the CQC is hit by a read request sent by the input processing unit 310. As another example, the prediction unit 330 may also pre-read multiple to-be-processed completion commands (for example, obtain the IO command scheduling sequence) and obtain information about the CQs to which those completion commands belong, so as to predict, according to that CQ information, the probability that each CQC stored in the cache space 320 will be subsequently accessed.
Taking the queue information being an EQC as an example: The method by which the prediction unit 330 predicts the probability that an EQC will be accessed is similar to the method for predicting the probability that a QPC will be accessed. For example, the prediction unit 330 may predict the probability that an EQC will be subsequently accessed according to whether the EQC is hit by a read request sent by the input processing unit 310. As another example, the prediction unit 330 may also pre-read multiple to-be-processed event commands and obtain information about the EQs to which those event commands belong, so as to predict, according to that EQ information, the probability that each EQC stored in the cache space 320 will be subsequently accessed.
The foregoing possible implementations are described in detail below with reference to specific examples; details are not provided here.
It should be understood that the prediction unit 330 may determine that queue information stored in the cache space 320 is high-frequency queue information according to any one of the foregoing methods, or according to a combination of any two of them, or according to a combination of all three; this application does not specifically limit this.
Step 520: The processing unit 340 keeps the high-frequency queue information in the cache space.
In the embodiments of this application, the queue information kept in the cache space is in one-to-one correspondence with the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
As an example: In one possible implementation, the network card may read the high-frequency queue information from the memory and store the high-frequency queue information in the cache space. In another possible implementation, the network card determines that the high-frequency queue information is already stored in the cache space, and sets, in the cache space, state information of the high-frequency queue information, where the state information indicates that the high-frequency queue information is to remain stored in the cache space.
The state information is not specifically limited in the embodiments of this application; it may be priority information or lock information. The priority information indicates the priority with which the high-frequency queue information is updated in the cache space, and the lock information indicates that the high-frequency queue information is in a locked state in the cache space and is not to be updated.
Specifically, the processing unit 340 adjusts the state information corresponding to the high-frequency queue information according to the prediction result. The prediction unit 330 may predict the probability that each piece of queue information stored in the cache space 320 will be subsequently accessed, and the processing unit 340 sets the state information corresponding to the predicted high-frequency queue information. The high-frequency queue information is one or more pieces of the queue information stored in the cache space, and the probability that queue information determined as high-frequency queue information will be accessed is greater than the probability that the other queue information in the cache space will be accessed.
There are multiple specific implementations. In one possible implementation, a flag bit corresponding to the high-frequency queue information may be set, for example, by performing a set operation on the flag bit. In another possible implementation, a field corresponding to one or more pieces of the queue information may be modified.
Taking the setting of the flag bits corresponding to the high-frequency queue information as an example: The flag bits may include one of or any combination of the following: valid, dirty, lock, and so on. The valid flag bit indicates whether the queue information entry is valid, the dirty flag bit indicates whether the queue information entry contains dirty data, and the lock flag bit indicates whether the queue information entry is locked.
There may be one or more lock flag bits in the embodiments of this application; the specific number of lock flag bits is not specifically limited in this application.
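As an illustration, the per-entry state information could be laid out as bit flags stored alongside each cached entry. The following sketch uses assumed names; the actual CTRL_DATA encoding is not specified here. Three lock bits are shown because the three prediction methods described below each set their own lock bit.

```c
/* Hypothetical per-entry state information (the CTRL_DATA part of an
 * entry in the cache space 320); names and layout are illustrative. */
struct cache_entry_state {
    unsigned valid : 1;  /* entry holds valid queue information */
    unsigned dirty : 1;  /* entry differs from the copy in memory */
    unsigned lock1 : 1;  /* set by prediction method 1 (read hit, paired write pending) */
    unsigned lock2 : 1;  /* set by prediction method 2 (doorbell pre-read) */
    unsigned lock3 : 1;  /* set by prediction method 3 (PI/CI difference) */
};
```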
Step 530: The processing unit 340 updates part or all of the queue information stored in the cache space 320 according to the state information corresponding to the queue information in the cache space.
Taking the state information corresponding to the queue information being priority information as an example, the processing unit 340 may classify the queue information in the cache space into levels according to the priority information of the queue information stored in the cache space, where queue information at a higher level is more likely to be accessed, and may preferentially update the queue information at a lower level stored in the cache space. A minimal sketch of such a replacement policy is given below.
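The sketch below assumes the illustrative cache_entry_state structure above, a parallel array of priority levels, and a simple linear scan; a real implementation would use the cache's own lookup structure and replacement machinery.

```c
#include <stddef.h>
#include <stdint.h>

/* Pick a victim entry to replace: skip locked (high-frequency) entries and
 * prefer the lowest-level unlocked entry. Returns -1 if nothing is evictable. */
static int pick_victim(const struct cache_entry_state *state,
                       const uint8_t *level, size_t n_entries)
{
    int victim = -1;
    uint8_t best_level = 0xFF;

    for (size_t i = 0; i < n_entries; i++) {
        if (!state[i].valid)
            return (int)i;            /* free slot: use it immediately */
        if (state[i].lock1 || state[i].lock2 || state[i].lock3)
            continue;                 /* predicted high-frequency: keep it cached */
        if (level[i] < best_level) {  /* a lower level means less likely accessed */
            best_level = level[i];
            victim = (int)i;
        }
    }
    return victim;
}
```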
In the embodiments of this application, by predicting the probability that the queue information stored in the cache space of the network card will be accessed, the high-frequency queue information can be determined and the policy for replacing or updating queue information in the cache space can be optimized, so that the high-frequency queue information that will be used subsequently is not evicted. This increases the hit rate of the cache space of the network card, reduces the number of bus read/write accesses initiated by the network card, reduces the waste of bus bandwidth, reduces the processing delay of the to-be-processed IO commands, and improves transmission performance.
The following uses the queue information stored in the cache space being a QPC as an example to describe in detail different implementations of predicting, in the foregoing steps 510 and 520, the probability that a QPC stored in the cache space will be accessed and of setting the state information of the high-frequency queue information.
1. In one possible implementation, because the read and write accesses of the input processing unit 310 in the network card 113 to a QPC stored in the cache space 320 occur in pairs, the prediction unit 330 can determine, by checking whether a QPC stored in the cache space 320 has been read, that a subsequent access to that QPC is relatively likely, and determine that QPC to be high-frequency queue information. In other words, the network card looks up, among the QPCs cached in the cache space, the QPC that is currently being read, and determines the currently read QPC as the high-frequency queue information.
For example, the input processing unit 310 determines, according to queue A to which a to-be-processed IO command belongs, that the to-be-processed IO command needs to be processed according to QPC A, so the input processing unit 310 needs to read QPC A. After the to-be-processed IO command has been processed, the read pointer (for example, the CI) in QPC A needs to be modified; therefore, a write operation on QPC A is still required subsequently.
For the QPCs in the cache space that are currently hit by read requests, the processing unit 340 may set the state information corresponding to these QPCs. Specifically, as an example, the flag bits corresponding to the QPCs may be set. For instance, in the embodiments of this application, one of the flag bits may be lock 1. The processing unit 340 may set the lock 1 flag bit corresponding to a QPC, for example, setting the lock 1 flag bit to 1, where a lock 1 flag bit of 1 indicates that the QPC is relatively likely to be accessed subsequently.
Optionally, when a write access request sent by the input processing unit 310 is subsequently received for the previously read QPC, the processing unit 340 may clear the lock 1 flag bit corresponding to the QPC, for example, setting the lock 1 flag bit to 0, where a lock 1 flag bit of 0 indicates that the QPC is less likely to be accessed subsequently.
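This read-then-write pairing can be sketched as two hooks on the cache, reusing the illustrative cache_entry_state structure above (the function names are assumptions):

```c
/* Method 1 sketch: a read hit pins the entry until the paired write arrives. */
static void on_qpc_read_hit(struct cache_entry_state *s)
{
    s->lock1 = 1;  /* a matching write access is still expected: keep the entry */
}

static void on_qpc_write_access(struct cache_entry_state *s)
{
    s->lock1 = 0;  /* the read/write pair has completed: entry may be replaced */
}
```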
2. In another possible implementation, the prediction unit 330 may also obtain the scheduling sequence of the to-be-processed IO commands, determine, according to the IO command scheduling sequence, the QPCs corresponding to the queues to which the to-be-processed IO commands belong, and determine those QPCs as the high-frequency queue information. It should be understood that the IO command scheduling sequence records the to-be-processed IO commands. In other words, before the to-be-processed IO commands are processed, the prediction unit 330 can predict, according to the pre-read result of the to-be-processed IO commands, the probability that the QPCs stored in the cache space 320 will be read in the short term.
For example, the input processing unit 310 receives multiple prompt messages (for example, DBs) sent by the processor 112, where the multiple DBs indicate that there are multiple to-be-processed IO commands in the server 110. Before the multiple to-be-processed IO commands are processed, the input processing unit 310 may pre-read the multiple DBs and send the pre-read results to the prediction unit 330. For example, the input processing unit 310 may determine the queue to which each to-be-processed IO command belongs according to the information included in the DB, such as the queue index of the queue to which the command belongs and the position of the command in that queue (for example, the PI). The prediction unit 330 may determine the corresponding multiple QPCs according to the queue information, sent by the input processing unit 310, of the multiple to-be-processed IO commands, and determine those QPCs to be high-frequency queue information. The processing unit 340 then sets the state information corresponding to the multiple QPCs. Specifically, as an example, the flag bits corresponding to these QPCs may be set. For instance, in the embodiments of this application, one of the flag bits may be lock 2. The processing unit 340 may set the lock 2 flag bit corresponding to QPC A, for example, setting the lock 2 flag bit to 1.
Optionally, in some embodiments, if the multiple QPCs determined according to the foregoing pre-read results are not stored in the cache space 320, the network card may obtain those QPCs from the memory 111 in advance and store them in the cache space 320. In this way, the multiple QPCs are fetched from the memory 111 ahead of time, avoiding the large IO processing delay that would be caused by fetching them from the memory 111 while the IO commands are being processed.
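A sketch of this doorbell pre-read path follows, reusing the illustrative doorbell and cache_entry_state structures above; cache_lookup() and prefetch_qpc() are hypothetical helpers, not APIs of any real device or library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers (assumed for illustration only). */
struct cache_entry_state *cache_lookup(uint32_t queue_index);
struct cache_entry_state *prefetch_qpc(uint32_t queue_index);

/* Method 2 sketch: pre-read the pending doorbells, then lock (and, if
 * needed, prefetch) the QPCs that the upcoming IO commands will use. */
static void pre_read_doorbells(const struct doorbell *db, size_t n_db)
{
    for (size_t i = 0; i < n_db; i++) {
        struct cache_entry_state *s = cache_lookup(db[i].queue_index);
        if (s == NULL)
            s = prefetch_qpc(db[i].queue_index);  /* fetch from memory ahead of time */
        s->lock2 = 1;  /* this QPC will be read shortly: keep it cached */
    }
}
```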
3. In another possible implementation, the prediction unit 330 may also read the PI and the consumer index (CI) of a QPC stored in the cache space 320 and predict subsequent read/write accesses to that QPC from the difference between the PI and the CI. When the pointer difference is greater than a preset threshold, the QPC is likely to be accessed subsequently and can be determined as high-frequency queue information.

It should be understood that the PI indicates the number of IO commands that have been placed in the queue corresponding to a QPC stored in the cache space 320; the commands may be counted from 1, from the head of the queue to its tail. The CI indicates the number of IO commands in that queue that have already been processed; each time an IO command is processed, the CI is incremented by 1. Therefore, when the difference between the PI and the CI is large (a large PI and a small CI), the queue holds a large number of pending IO commands, and the QPC corresponding to that queue is accordingly more likely to be accessed subsequently.

For example, the prediction unit 330 may compare the difference between the PI and the CI of a QPC in real time. When the difference exceeds a preset threshold, the processing unit 340 may set the state information corresponding to that QPC. Specifically, as an example, the flag bit corresponding to the QPC can be set; in this embodiment of the present application, a flag bit may include lock 3. The processing unit 340 may set the lock 3 flag bit corresponding to QPC A, for example, to 1.
Optionally, when the difference between the PI and the CI of the QPC is less than the preset threshold, the processing unit 340 may clear the lock 3 flag bit corresponding to the QPC, for example, set it to 0.
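A minimal C sketch of this PI/CI heuristic follows, assuming the counter fields and threshold value shown; none of these names or values come from the patent itself.

```c
#include <stdint.h>
#include <stdbool.h>

/* PI/CI counters of a cached QPC; the field names and the threshold
 * are assumptions made for this sketch.                              */
typedef struct {
    uint32_t pi;     /* number of IO commands placed in the queue */
    uint32_t ci;     /* number of IO commands already processed   */
    bool     lock3;  /* flag bit set when the backlog is deep     */
} qpc_pointers_t;

#define PI_CI_THRESHOLD 8u  /* hypothetical preset threshold */

/* Set or clear the lock 3 flag bit from the current PI/CI difference. */
void update_lock3(qpc_pointers_t *q)
{
    uint32_t backlog = q->pi - q->ci;  /* pending IO commands in the queue */
    if (backlog > PI_CI_THRESHOLD)
        q->lock3 = true;   /* likely to be accessed soon */
    else
        q->lock3 = false;  /* below the threshold: clear the flag */
}
```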
It should be noted that the high-frequency queue information in the cache space may be determined by any one of the three prediction methods listed above, or by a combination of any two or all three of them.
It should also be noted that the above takes setting flag bits as an example of setting the state information. One flag bit or multiple flag bits may be set; the number of flag bits depends on the prediction methods selected, which is not specifically limited in this application. For example, if the prediction result is obtained by one of the methods, one flag bit may be set, for example, a lock flag bit. If the prediction result is obtained by a combination of any two of the methods, two flag bits may be set, for example, a lock 1 flag bit and a lock 2 flag bit. If the prediction result is obtained by a combination of all three methods, three flag bits may be set, for example, a lock 1 flag bit, a lock 2 flag bit, and a lock 3 flag bit.
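By way of a hedged illustration, the three lock flag bits could be packed into a single status byte per cache entry, as sketched below in C; the bit positions and helper names are assumptions of this sketch, since the patent only requires that each prediction method have its own flag bit.

```c
#include <stdint.h>

/* Hypothetical bit assignments for the per-entry status flags. */
#define LOCK1_BIT (1u << 0)  /* method 1: entry currently being read  */
#define LOCK2_BIT (1u << 1)  /* method 2: doorbell pre-read hit       */
#define LOCK3_BIT (1u << 2)  /* method 3: PI/CI difference over limit */

static inline void set_flag(uint8_t *status, uint8_t bit)
{
    *status |= bit;
}

static inline void clear_flag(uint8_t *status, uint8_t bit)
{
    *status &= (uint8_t)~bit;
}

static inline int flag_is_set(uint8_t status, uint8_t bit)
{
    return (status & bit) != 0;
}
```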
In this embodiment of the present application, the method by which the prediction unit 330 predicts how likely at least one CQC or EQC is to be accessed is similar to the method used for at least one QPC. For example, the read and write accesses of the input processing unit 310 in the network card 113 to a CQC or EQC stored in the cache space 320 occur in pairs: after reading a CQC or EQC from the cache space 320, the network card 113 also modifies the PI pointer in that CQC or EQC. Therefore, the prediction unit 330 can determine, by checking whether a given CQC or EQC stored in the cache space 320 has been read, that subsequent access to that CQC or EQC is likely, and the processing unit 340 can adjust the corresponding state information. As another example, the prediction unit 330 may also use the scheduling order of pending completion commands or event commands and, before those commands are processed, predict from their pre-read results how likely the CQCs or EQCs stored in the cache space 320 are to be read in the near term. For the specific prediction process and the process of setting state information according to the prediction results, refer to the QPC prediction and state-information setting methods described above; details are not repeated here.

The following takes, as an example, prediction results obtained by combining the above three prediction methods, with the flag bits corresponding to at least one piece of queue information set according to those results, and describes in detail the specific process by which the processing unit 340 replaces queue information.

For ease of description, the QPC prediction results are used as an example below.

The processing unit 340 may divide the at least one QPC entry stored in the cache space 320 into several priority levels according to the lock flag bits that the prediction unit 330 set for each QPC. For ease of description, FIG. 6 takes the division of at least one QPC entry into four levels as an example.
Level 1: a QPC entry in the cache space 320 that is likely to be accessed in the short term; the lock 1 flag bit corresponding to the QPC entry is set. The remaining lock flag bits, for example, the lock 2 flag bit and the lock 3 flag bit, may each be 1 or 0, which is not specifically limited in this application.

Level 2: a QPC entry in the cache space 320 that is likely to be accessed in the short term; the lock 2 flag bit corresponding to the QPC entry is set, and the lock 1 flag bit is not set. In other words, the lock 2 flag bit of the QPC entry is 1 and its lock 1 flag bit is 0. The remaining lock flag bit, for example, the lock 3 flag bit, may be 1 or 0, which is not specifically limited in this application.

Level 3: a QPC entry in the cache space 320 that is likely to be accessed in the long term; the lock 3 flag bit corresponding to the QPC entry is set, and neither the lock 1 flag bit nor the lock 2 flag bit is set. In other words, the lock 3 flag bit of the QPC entry is 1, and its lock 1 and lock 2 flag bits are 0.

Level 4: a QPC entry in the cache space 320 that is unlikely to be accessed in the long term; none of the lock 1, lock 2, and lock 3 flag bits corresponding to the QPC entry is set. In other words, the lock 1, lock 2, and lock 3 flag bits of the QPC entry are all 0.
Referring to FIG. 6, when a new QPC needs to be cached in the cache space 320, if the cache space 320 has unoccupied space (level 5), the processing unit 340 may store the new QPC in the unoccupied space. If the cache space 320 has no unoccupied space, a replacement occurs: the processing unit 340 needs to delete some QPC and store the new QPC in the storage space occupied by the deleted QPC.

The priority that the processing unit 340 follows when performing replacement is, from high to low: level 4, level 3, level 2, level 1. In other words, in this embodiment of the present application, the processing unit 340 can perform replacement according to the meaning of the lock flag bits.
As an example, a QPC whose three lock flag bits (lock 1, lock 2, and lock 3) are all unset, that is, a QPC at level 4, is unlikely to be accessed in the subsequent access process, so the processing unit 340 may preferentially consider replacing the QPCs at level 4. As another example, if there is no QPC at level 4, the processing unit 340 may consider replacing the QPCs at level 3; these QPCs will receive a large number of read and write accesses later, but the corresponding IO commands have not yet been scheduled in the short term, so they may also be considered for replacement. As another example, if there is no QPC at level 3, the processing unit 340 may consider replacing the QPCs at level 2. As another example, if there is no QPC at level 2, the processing unit 340 may finally consider replacing the QPCs at level 1.
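The level classification and eviction order described above can be summarized in the following C sketch, which reuses the hypothetical bit masks from the earlier flag-bit sketch; the scan-for-highest-level strategy is one illustrative way to realize the stated priority, not the patent's mandated implementation.

```c
#include <stdint.h>

/* Same hypothetical bit assignments as the earlier flag-bit sketch. */
#define LOCK1_BIT (1u << 0)
#define LOCK2_BIT (1u << 1)
#define LOCK3_BIT (1u << 2)

/* Map a status byte to the levels of FIG. 6 (level 4 is evicted first). */
static int entry_level(uint8_t status)
{
    if (status & LOCK1_BIT) return 1;  /* short term, currently read    */
    if (status & LOCK2_BIT) return 2;  /* short term, pre-read doorbell */
    if (status & LOCK3_BIT) return 3;  /* long term, deep backlog       */
    return 4;                          /* unlikely to be accessed       */
}

/* Choose the victim slot among n_entries valid entries: prefer level 4,
 * then level 3, then level 2, and only replace a level 1 entry last.   */
int pick_victim(const uint8_t *status, int n_entries)
{
    int victim = -1;
    int victim_level = 0;
    for (int i = 0; i < n_entries; i++) {
        int lvl = entry_level(status[i]);
        if (lvl > victim_level) {  /* a higher level is cheaper to evict */
            victim = i;
            victim_level = lvl;
            if (lvl == 4)
                break;  /* no cheaper victim exists than level 4 */
        }
    }
    return victim;  /* index of the entry whose QPC is replaced */
}
```

With this ordering, a level 4 entry ends the scan immediately, while a level 1 entry is replaced only when nothing less valuable remains in the cache.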
In the above technical solution, the network card can predict in advance how likely a QPC stored in the cache space is to be accessed, thereby avoiding, when the QPCs in the cache space need to be updated, evicting a QPC that will be used later. This optimizes the replacement policy of the cache space and thus improves the hit rate of the network card's cache space.

In this embodiment of the present application, the specific process by which the processing unit 340 replaces a stored CQC or EQC according to the prediction results of CQC or EQC accesses is similar to the above method. For details, refer to the method described above for replacing QPCs according to the QPC prediction results; details are not repeated here.
FIG. 7 is a schematic structural diagram of a chip 700 provided by an embodiment of the present application. The chip 700 is applied to a server system in which a local server and a remote server perform data transmission through remote direct memory access (RDMA). The memory of the local server stores at least one queue, and each queue is used to store input/output (IO) commands, the IO commands instructing the local server to access data on the remote server. The chip 700 includes:

a prediction unit 330, configured to predict high-frequency queue information, the high-frequency queue information being more likely to be accessed than other queue information; and

a processing unit 340, configured to store the high-frequency queue information in a cache space, where the cache space is in the network card of the local server, the queue information stored in the cache space is in one-to-one correspondence with the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
Optionally, the processing unit 340 is further configured to update or replace part or all of the queue information stored in the cache space.

Optionally, the prediction unit 330 is specifically configured to search the queue information cached in the cache space for the queue information currently being read, and determine the currently read queue information as the high-frequency queue information.

Optionally, the queue information is a queue pair context (QPC), and the prediction unit 330 is specifically configured to calculate the difference between the read pointer and the write pointer in a QPC cached in the cache space, and determine a QPC for which the difference between the read pointer and the write pointer is greater than a preset threshold as the high-frequency queue information.

Optionally, the prediction unit 330 is specifically configured to obtain an IO command scheduling order, the IO command scheduling order recording pending IO commands; and determine, according to the IO command scheduling order, the queue information corresponding to the queue to which a pending IO command belongs, and determine that queue information as the high-frequency queue information.

It should be noted that the network card in this application may also predict high-frequency queue information according to a combination of any two or all three of the above prediction methods, and store the predicted high-frequency queue information in the cache space of the network card.

Optionally, the processing unit 340 is specifically configured to read the high-frequency queue information from the memory and store it in the cache space.

Optionally, the processing unit 340 is specifically configured to determine that the high-frequency queue information is already stored in the cache space, and set state information of the high-frequency queue information in the cache space, the state information indicating that the high-frequency queue information continues to be stored in the cache space.

Optionally, the state information includes priority information or lock information, where the priority information indicates the priority with which the high-frequency queue information is updated in the cache space, and the lock information indicates that the high-frequency queue information is in a locked state in which it is not updated in the cache space.

Optionally, the priority information or the lock information may be represented by state flag bits.

Optionally, the queue information includes one or more of the following: a queue pair context (QPC), a completion queue context (CQC), and an event queue context (EQC).

Optionally, the processing unit 340 is further configured to update or replace part or all of the queue information stored in the cache space according to the priority information or the lock information.

Optionally, the processing unit 340 is specifically configured to divide the queue information in the cache space into levels according to the state information in the cache space of the network card, where queue information at a higher level is more likely to be accessed, and to preferentially update the queue information at lower levels stored in the cache space.
It should be understood that, in the various embodiments of this application, the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and shall not constitute any limitation on the implementation of the embodiments of this application.

An embodiment of this application further provides a network card, which includes the chip 700 described in any of the foregoing. For a specific description of the network card, refer to FIG. 3 and the description of the network card 113; details are not repeated here.

An embodiment of this application further provides a server, which includes a memory, a processor, a network card, and the like. For a specific description of the server, refer to the descriptions of FIG. 1 and FIG. 3; details are not repeated here.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or by software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered to go beyond the scope of this application.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing is merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (18)

  1. An information processing method for a network card, wherein a local server and a remote server perform data transmission through remote direct memory access (RDMA), a memory of the local server stores at least one queue, each queue is used to store input/output (IO) commands, and the IO commands instruct the local server to access data on the remote server, the method comprising:
    predicting, by the network card, high-frequency queue information, wherein the high-frequency queue information is more likely to be accessed than other queue information; and
    storing, by the network card, the high-frequency queue information in a cache space, wherein the cache space is in the network card, the queue information stored in the cache space is in one-to-one correspondence with the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
  2. The method according to claim 1, wherein the predicting, by the network card, high-frequency queue information comprises:
    searching, by the network card, the queue information cached in the cache space for the queue information currently being read, and determining the currently read queue information as the high-frequency queue information.
  3. The method according to claim 1, wherein the queue information is a queue pair context (QPC), and the predicting, by the network card, high-frequency queue information comprises:
    calculating, by the network card, a difference between a read pointer and a write pointer in a QPC cached in the cache space, and determining a QPC for which the difference between the read pointer and the write pointer is greater than a preset threshold as the high-frequency queue information.
  4. The method according to claim 1, wherein the predicting, by the network card, high-frequency queue information comprises:
    obtaining, by the network card, an IO command scheduling order, wherein the IO command scheduling order records pending IO commands; and
    determining, by the network card according to the IO command scheduling order, queue information corresponding to a queue to which a pending IO command belongs, and determining the queue information corresponding to the queue to which the pending IO command belongs as the high-frequency queue information.
  5. The method according to any one of claims 1 to 4, wherein the storing, by the network card, of the high-frequency queue information in a cache space comprises:
    reading, by the network card, the high-frequency queue information from the memory; and
    storing the high-frequency queue information in the cache space.
  6. The method according to any one of claims 1 to 4, wherein the storing, by the network card, of the high-frequency queue information in a cache space comprises:
    determining, by the network card, that the high-frequency queue information is already stored in the cache space; and
    setting, by the network card, state information of the high-frequency queue information in the cache space, wherein the state information indicates that the high-frequency queue information continues to be stored in the cache space.
  7. The method according to claim 6, wherein the state information comprises priority information or lock information, the priority information indicates a priority with which the high-frequency queue information is updated in the cache space, and the lock information indicates that the high-frequency queue information is in a locked state in which it is not updated in the cache space.
  8. The method according to any one of claims 1 to 7, wherein the queue information comprises one or more of the following: a queue pair context (QPC), a completion queue context (CQC), and an event queue context (EQC).
  9. A chip, wherein the chip is applied to a server system, a local server and a remote server in the server system perform data transmission through remote direct memory access (RDMA), a memory of the local server stores at least one queue, each queue is used to store input/output (IO) commands, and the IO commands instruct the local server to access data on the remote server, the chip comprising:
    a prediction unit, configured to predict high-frequency queue information, wherein the high-frequency queue information is more likely to be accessed than other queue information; and
    a processing unit, configured to store the high-frequency queue information in a cache space, wherein the cache space is in a network card of the local server, the queue information stored in the cache space is in one-to-one correspondence with the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information.
  10. The chip according to claim 9, wherein the prediction unit is specifically configured to: search the queue information cached in the cache space for the queue information currently being read, and determine the currently read queue information as the high-frequency queue information.
  11. The chip according to claim 9, wherein the queue information is a queue pair context (QPC), and the prediction unit is specifically configured to: calculate a difference between a read pointer and a write pointer in a QPC cached in the cache space, and determine a QPC for which the difference between the read pointer and the write pointer is greater than a preset threshold as the high-frequency queue information.
  12. The chip according to claim 9, wherein the prediction unit is specifically configured to:
    obtain an IO command scheduling order, wherein the IO command scheduling order records pending IO commands; and
    determine, according to the IO command scheduling order, queue information corresponding to a queue to which a pending IO command belongs, and determine the queue information corresponding to the queue to which the pending IO command belongs as the high-frequency queue information.
  13. The chip according to any one of claims 9 to 12, wherein the processing unit is specifically configured to:
    read the high-frequency queue information from the memory; and
    store the high-frequency queue information in the cache space.
  14. The chip according to any one of claims 9 to 12, wherein the processing unit is specifically configured to:
    determine that the high-frequency queue information is already stored in the cache space; and
    set state information of the high-frequency queue information in the cache space, wherein the state information indicates that the high-frequency queue information continues to be stored in the cache space.
  15. The chip according to claim 14, wherein the state information comprises priority information or lock information, the priority information indicates a priority with which the high-frequency queue information is updated in the cache space, and the lock information indicates that the high-frequency queue information is in a locked state in which it is not updated in the cache space.
  16. The chip according to any one of claims 9 to 15, wherein the queue information comprises one or more of the following: a queue pair context (QPC), a completion queue context (CQC), and an event queue context (EQC).
  17. A network card, comprising the chip according to any one of claims 9 to 16.
  18. A server, comprising a memory and the network card according to claim 17.