CN112463654A - Cache implementation method with prediction mechanism

Info

Publication number
CN112463654A
Authority
CN
China
Prior art keywords
queue
queue information
information
network card
command
Legal status
Pending
Application number
CN202010073159.1A
Other languages
Chinese (zh)
Inventor
胡天驰
林伟彬
侯新宇
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to PCT/CN2020/094031 (published as WO2021042782A1)
Publication of CN112463654A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00: Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02: Addressing or allocation; Relocation
    • G06F 12/08: Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802: Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0844: Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F 12/0853: Cache with multiport tag or data arrays
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 15/00: Digital computers in general; Data processing equipment in general
    • G06F 15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a cache implementation method with a prediction mechanism. A local server and a remote server transmit data through remote direct memory access (RDMA). At least one queue is stored in the memory of the local server, each queue is used to store input/output (IO) commands, and an IO command instructs the local server to perform data access to the remote server. The cache implementation method includes: the network card predicts high-frequency queue information, where the possibility that the high-frequency queue information will be accessed is greater than that of other queue information, and stores the high-frequency queue information in a cache space located in the network card. The queue information stored in the cache space corresponds one-to-one to the queues in the memory, and each piece of queue information is used by the network card to process the IO commands in the queue corresponding to that queue information. The method can improve the hit rate of the cache space.

Description

Cache implementation method with prediction mechanism
The present application claims priority to Chinese patent application No. 201910843922.1, entitled "Cache implementation method with prediction mechanism", filed with the Chinese Patent Office on September 6, 2019, which is incorporated herein by reference in its entirety.
Technical Field
The present application relates to the field of servers, and more particularly, to a cache implementation method with a prediction mechanism.
Background
In a network system of a data center, Remote Direct Memory Access (RDMA) technology may be used to avoid the delay of server-side data processing during network transmission. When RDMA is used, the local server in a communication may transmit the data to be transmitted to the network card of the remote server through its own network card, and the network card of the remote server then writes the data into the memory of the remote server.
In an actual service, after receiving an Input Output (IO) command to be processed, a local server needs to process the IO command through its network card and send the processed IO command to the network card of the peer server. To process the IO command, the network card needs to acquire queue information (e.g., a Queue Pair Context (QPC)), which is usually stored in the memory of the server. To avoid the network card frequently reading and writing the queue information in the memory over the bus, part of the queue information may be cached in a cache space of the network card. Because the cache space in the network card is small, the queue information cached there needs to be updated over time. How to improve the hit rate of the network card's cache space, and how to reduce the delay and bus-bandwidth waste caused by the network card frequently reading and writing queue information stored in memory over the bus, have therefore become problems that urgently need to be solved.
Disclosure of Invention
The application provides an information processing method and a chip of a network card, which can improve the hit rate of a cache space.
In a first aspect, an information processing method for a network card is provided, where a local server and a remote server perform data transmission via RDMA, a memory of the local server stores at least one queue, each queue is used to store an input/output IO command, and the IO command instructs the local server to perform data access to the remote server, where the method includes: the network card predicts high-frequency queue information, and the possibility of accessing the high-frequency queue information is higher than that of other queue information; the network card stores the high-frequency queue information in a cache space, wherein the cache space is in the network card, the queue information stored in the cache space corresponds to the queues in the memory of the local server one by one, and each queue information is used for the network card to process IO commands in the queue corresponding to the queue information.
In the above technical solution, high-frequency queue information can be predicted and stored in the cache space of the network card. This improves the hit rate of the cache space of the network card, which in turn reduces the number of bus read/write accesses initiated by the network card, reduces bus-bandwidth waste, reduces the network card's processing delay for IO commands to be processed, and improves the transmission performance of the server.
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the network card searches the currently read queue information in the queue information cached in the cache space, and determines the currently read queue information as high-frequency queue information.
In the above technical solution, because read and write accesses to queue information in the cache space occur in pairs, a piece of queue information that has just received a read request is highly likely to be accessed again in the short term. The currently read queue information can therefore be determined as high-frequency queue information, which improves the hit rate of the cache space.
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the queue information is a queue pair context (QPC); the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space, and determines a QPC whose read-pointer/write-pointer difference is greater than a preset threshold as high-frequency queue information.
In the above technical solution, the number of IO commands to be processed in the queue corresponding to the QPC may be determined according to a difference between a read pointer and a write pointer in the QPC, so that the possibility of subsequent accesses of the QPC is determined according to the number of IO commands to be processed, and thus the hit rate of the cache space is improved.
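A minimal sketch of this pointer-difference check is shown below in C; the structure, the field names pi and ci, and the threshold parameter are assumptions for illustration, not the application's actual implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical QPC fields: pi is the write pointer (producer index),
 * ci is the read pointer (consumer index). Names are assumptions. */
typedef struct {
    uint32_t pi;   /* IO commands posted to the queue   */
    uint32_t ci;   /* IO commands already processed     */
} qpc_t;

/* A QPC is treated as high-frequency when the backlog of pending
 * IO commands (write pointer minus read pointer) exceeds a threshold. */
static bool qpc_is_high_frequency(const qpc_t *qpc, uint32_t threshold)
{
    uint32_t pending = qpc->pi - qpc->ci;  /* unsigned arithmetic tolerates wrap */
    return pending > threshold;
}
```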
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the network card acquires an IO command scheduling sequence, and the IO command scheduling sequence records the IO commands to be processed; the network card determines queue information corresponding to a queue to which the IO command to be processed belongs according to the IO command scheduling sequence, and determines the queue information corresponding to the queue to which the IO command to be processed belongs as high-frequency queue information.
In the above technical solution, unprocessed IO commands can be pre-read, and the queue information required by an IO command is determined according to the queue to which that unprocessed IO command belongs. Because such queue information is highly likely to be accessed subsequently, it can be determined as high-frequency queue information, which improves the hit rate of the cache space.
It should be noted that, the network card in the present application may also predict the high-frequency queue information according to any two or three prediction methods, and store the predicted high-frequency queue information in a cache space of the network card.
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the network card searches the currently read queue information in the queue information cached in the cache space, determines the currently read queue information as high-frequency queue information, and when the queue information is a queue pair context QPC, the network card calculates a difference value of a read pointer and a write pointer in the QPC cached in the cache space, and determines the QPC with the difference value of the read pointer and the write pointer larger than a preset threshold value as the high-frequency queue information.
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the network card searches the currently read queue information in the queue information cached in the cache space, determines the currently read queue information as high-frequency queue information, and acquires an IO command scheduling sequence, wherein the IO command scheduling sequence records the IO command to be processed; the network card determines queue information corresponding to a queue to which the IO command to be processed belongs according to the IO command scheduling sequence, and determines the queue information corresponding to the queue to which the IO command to be processed belongs as high-frequency queue information.
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the queue information is a queue pair context (QPC); the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space and determines a QPC whose difference is greater than a preset threshold as high-frequency queue information; the network card also acquires an IO command scheduling sequence, which records the IO commands to be processed, determines, according to the IO command scheduling sequence, the queue information corresponding to the queue to which an IO command to be processed belongs, and determines that queue information as high-frequency queue information.
In another possible implementation manner of the first aspect, the predicting, by the network card, the high-frequency queue information specifically includes: the queue information is a queue pair context (QPC); the network card calculates the difference between the read pointer and the write pointer in each QPC cached in the cache space and determines a QPC whose difference is greater than a preset threshold as high-frequency queue information; the network card acquires an IO command scheduling sequence, which records the IO commands to be processed, determines, according to the IO command scheduling sequence, the queue information corresponding to the queue to which an IO command to be processed belongs, and determines that queue information as high-frequency queue information; in addition, the network card searches, among the queue information cached in the cache space, for the currently read queue information and determines the currently read queue information as high-frequency queue information.
In another possible implementation manner of the first aspect, the network card reads the high-frequency queue information from the memory and stores the high-frequency queue information in the cache space.
In the technical scheme, the high-frequency queue information determined by the network card can be acquired from the memory and stored in the cache space of the network card, so that the larger processing delay caused by acquiring the required queue information from the memory when the IO command is processed is avoided.
In another possible implementation manner of the first aspect, the network card determines that the high-frequency queue information is already stored in the cache space; and the network card sets the state information of the high-frequency queue information in the cache space, wherein the state information is used for indicating that the high-frequency queue information is continuously stored in the cache space.
In another possible implementation manner of the first aspect, the state information includes priority information or lock information, where the priority information is used to indicate the priority with which the high-frequency queue information is updated in the cache space, and the lock information is used to indicate that the high-frequency queue information is in a locked state in which it is not updated in the cache space.
In another possible implementation manner of the first aspect, the priority information or the locking information may be represented by a status flag bit.
In another possible implementation manner of the first aspect, a state flag bit of the queue information is set to indicate that the queue information is in a locked state and is temporarily not updated in the cache space.
In another possible implementation manner of the first aspect, the priority information of the queue information is represented by a plurality of state flag bits, and each state flag bit represents the result of one method for predicting high-frequency queue information. When one piece of queue information is predicted using one of the prediction methods and the result is that the queue information is high-frequency queue information, the state flag bit corresponding to that method is set for that queue information. The priority with which the queue information is updated can then be determined by combining the plurality of state flag bits of each piece of queue information.
In another possible implementation manner of the first aspect, the more state flag bits of a piece of queue information are set, the lower the priority with which that queue information is updated (replaced) in the cache space.
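Sketched below is one way such flag bits and their combination could look in C; the macro names, the bit layout, and the scoring function are assumptions for illustration only:

```c
#include <stdint.h>

/* Hypothetical per-entry state flag bits: one bit per prediction method,
 * plus a lock bit. Names and layout are assumptions for illustration. */
#define QI_FLAG_READ_HIT    (1u << 0)  /* method: entry is currently being read     */
#define QI_FLAG_PTR_DIFF    (1u << 1)  /* method: read/write pointer gap over limit */
#define QI_FLAG_PREFETCHED  (1u << 2)  /* method: referenced by pending IO commands */
#define QI_FLAG_LOCKED      (1u << 3)  /* entry temporarily must not be replaced    */

/* The more prediction methods that voted for an entry, the lower its
 * priority of being updated (replaced); a higher return value here simply
 * means "keep this entry cached longer". */
static unsigned qi_keep_score(uint8_t flags)
{
    unsigned votes = 0;
    if (flags & QI_FLAG_READ_HIT)   votes++;
    if (flags & QI_FLAG_PTR_DIFF)   votes++;
    if (flags & QI_FLAG_PREFETCHED) votes++;
    return votes;
}
```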
In another possible implementation manner of the first aspect, the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, event queue context EQC.
In a possible implementation manner of the first aspect, the method further includes: the network card can also update or replace part or all of the queue information stored in the cache space.
In another possible implementation manner of the first aspect, the method further includes: the network card can also update or replace part or all of the queue information stored in the cache space according to the priority information or the locking information.
In another possible implementation manner of the first aspect, the network card grades the queue information in the cache space into levels according to the state information in the cache space, where a higher level indicates a higher possibility that the queue information will be accessed, and preferentially updates the queue information with a low level stored in the cache space.
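Continuing the sketch above, and still assuming the hypothetical flag bits and scoring function, a level-based replacement choice could look like this:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache slot: which queue it caches and its state flags
 * (QI_FLAG_* bits from the previous sketch). Layout is an assumption. */
typedef struct {
    uint32_t qid;     /* queue whose information is cached here */
    uint8_t  flags;   /* per-entry state information            */
    bool     valid;
} qi_slot_t;

/* Choose a slot to replace: free slots first, never locked slots, and
 * otherwise the slot with the lowest level (fewest prediction votes). */
static int pick_victim(const qi_slot_t *slots, size_t n)
{
    int victim = -1;
    unsigned lowest = ~0u;
    for (size_t i = 0; i < n; i++) {
        if (!slots[i].valid)
            return (int)i;                  /* free slot: use it directly */
        if (slots[i].flags & QI_FLAG_LOCKED)
            continue;                       /* locked entries stay cached */
        unsigned level = qi_keep_score(slots[i].flags);
        if (level < lowest) {
            lowest = level;
            victim = (int)i;
        }
    }
    return victim;                          /* -1: every entry is locked  */
}
```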
In a second aspect, a chip is provided, where the chip is applied to a server system, a local server and a remote server in the server system perform data transmission via RDMA, at least one queue is stored in a memory of the local server, each queue is used to store an input/output IO command, and the IO command instructs the local server to perform data access to the remote server, and the chip includes:
the prediction unit is used for predicting high-frequency queue information, and the possibility of accessing the high-frequency queue information is greater than that of other queue information;
and the processing unit is used for storing the high-frequency queue information in a cache space, wherein the cache space is in a network card of the local server, the queue information stored in the cache space corresponds to the queues in the memory one by one, and each queue information is used for the network card to process the IO commands in the queue corresponding to the queue information.
In a possible implementation manner of the second aspect, the processing unit is further configured to: update or replace part or all of the queue information stored in the cache space.
In another possible implementation manner of the second aspect, the prediction unit is specifically configured to: and searching the currently read queue information in the queue information cached in the cache space, and determining the currently read queue information as the high-frequency queue information.
In another possible implementation manner of the second aspect, the queue information is a queue pair context QPC, and the prediction unit is specifically configured to: and calculating the difference value of a read pointer and a write pointer in the QPC of the cache in the cache space, and determining the QPC with the difference value of the read pointer and the write pointer larger than a preset threshold as the high-frequency queue information.
In another possible implementation manner of the second aspect, the prediction unit is specifically configured to: obtaining an IO command scheduling sequence, wherein the IO command scheduling sequence records IO commands to be processed; and determining queue information corresponding to a queue to which the IO command to be processed belongs according to the IO command scheduling sequence, and determining the queue information corresponding to the queue to which the IO command to be processed belongs as the high-frequency queue information.
It should be noted that, the network card in the present application may also predict the high-frequency queue information according to any two or three prediction methods, and store the predicted high-frequency queue information in a cache space of the network card.
In another possible implementation manner of the second aspect, the processing unit is specifically configured to: reading the high frequency queue information from the memory; and storing the high-frequency queue information in the cache space.
In another possible implementation manner of the second aspect, the processing unit is specifically configured to: determining that the high frequency queue information has been stored in the cache space; setting state information of the high-frequency queue information in the cache space, wherein the state information is used for indicating that the high-frequency queue information is continuously stored in the cache space.
In another possible implementation manner of the second aspect, the state information includes priority information or lock information, where the priority information is used to indicate the priority with which the high-frequency queue information is updated in the cache space, and the lock information is used to indicate that the high-frequency queue information is in a locked state in which it is not updated in the cache space.
In another possible implementation of the second aspect, the priority information or the lock information may be represented by a status flag bit.
In another possible implementation manner of the second aspect, the queue information includes one or more of the following information: queue pair context QPC, completion queue context CQC, event queue context EQC.
In another possible implementation manner of the second aspect, the processing unit is further configured to: and updating or replacing part or all of the queue information stored in the cache space according to the priority information or the locking information.
In another possible implementation manner of the second aspect, the processing unit is specifically configured to: grade the queue information in the cache space into levels according to the state information in the cache space of the network card, where a higher level indicates a higher possibility that the queue information will be accessed; and preferentially update the queue information with a low level stored in the cache space.
The beneficial effects of the second aspect and any one of the possible implementation manners of the second aspect correspond to the beneficial effects of the first aspect and any one of the possible implementation manners of the first aspect, and therefore, the detailed description is omitted here.
In a third aspect, a network card is provided, including the chip of the second aspect or any one of its possible implementations.
In a fourth aspect, there is provided a server comprising a memory and a network card as in the third aspect.
Drawings
Fig. 1 is a schematic diagram of a possible server system provided in an embodiment of the present application.
Fig. 2 is a schematic flowchart of a method for processing an IO command by a server according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a possible server 110 provided in an embodiment of the present application.
Fig. 4 is a schematic structural diagram of queue information stored in a cache space on a network card according to an embodiment of the present application.
Fig. 5 is a schematic flow chart of a method for updating queue information in a buffer space according to an embodiment of the present application.
Fig. 6 is a schematic diagram of QPC priorities divided in a cache space on a network card according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of a chip 700 provided in an embodiment of the present application.
Detailed Description
The technical solution in the present application will be described below with reference to the accompanying drawings.
For convenience of description, several concepts related to the embodiments of the present application will be explained below.
(1) Input Output (IO) command
The IO command may be divided into a read command and a write command, and refers to a command issued by an application running on a server and used for instructing to read data from or write data to a remote device. The processor of the server may receive the IO command and store it in memory such that the IO command waits to be processed. In particular, the IO commands may be stored in a queue of memory.
It should be noted that the IO command mentioned in the embodiment of the present application may be understood as an IO command to be processed. The IO commands may be processed by the network card.
(2) Queue
A queue is a special linear table that can perform delete operations at the front end (front) of the table and insert operations at the back end (rear) of the table. The end performing the insert operation is called the tail of the queue, and the end performing the delete operation is called the head of the queue. When there are no elements in the queue, it is called an empty queue. The data elements of the queue are also referred to as queue elements. Inserting a queue element into a queue is called enqueuing, and removing a queue element from a queue is called dequeuing. Queues, which may be inserted at one end and deleted at the other, may also be referred to as First In First Out (FIFO) linear tables.
The type of the queue may be various, for example, a Send Queue (SQ), a Receive Queue (RQ), a Completion Queue (CQ), and an Event Queue (EQ).
The send queue of the local server and the receive queue of the peer server may be referred to as a Queue Pair (QP).
The send queue may be used to store IO commands to be processed; the receiving queue is used for storing memory information required by IO command processing.
As an example, the send queue in the local server is used to store an IO command issued by the local server to instruct the peer server to read or write data. For example, if the IO command is a read command for reading data stored in the peer server, the network card in the local server may send the processed read command to the peer server. The read command may include the address and length of the data stored in the peer server that needs to be read, so that the peer server can send the requested data to the local server according to the processed read command.
For another example, if the IO command is a write command for writing data into the peer server, the network card in the local server processes the write command and sends the processed write command to the peer server, so that the peer server can look up the memory information in its receive queue according to the processed write command and store the data carried in the processed write command into the memory of the peer server according to the found memory information.
Similarly, the sending queue in the peer server may also store an IO command issued by the peer server to instruct the local server to read or write data, and send the IO command to the local server through the network card in the peer server. For details, please refer to the above description, which is not repeated herein.
The completion queue is used to store completion commands. That is, after the network card in the server completes the processing of the IO command, the completion information, which may also be referred to as a completion command, may be stored in a completion queue.
The event queue is used to store event commands. That is, after the network card in the server stores completion commands in the completion queue, an event command may be generated when the completion commands in the completion queue reach a certain number. As an example, the event command may include an event type and a completion queue index, and is used to trigger the processor to process one or more completion commands stored in the completion queue.
It should be appreciated that storing event commands in the event queue can prevent the network card in the server from triggering the processor every time a completion command is stored in the completion queue, i.e., from frequently triggering the processor to process the completion commands stored in the completion queue.
(3) Queue information
The context information of the queue may also be referred to as queue information, where the queue information corresponds to the queue one-to-one and is used to process the IO command in the queue corresponding to the queue information.
In the embodiment of the application, the possibility that at least one queue information stored in a server is accessed can be predicted, and if the possibility that a certain queue information is accessed is higher than that of other queue information, the queue information can be called high-frequency queue information. That is, high frequency queue information is more likely to be accessed. For a specific method for determining the high frequency queue information, please refer to the method described below, which will not be described in detail herein.
In the present application, the queue information may include one or more of the following information: queue Pair Context (QPC), Completion Queue Context (CQC), Event Queue Context (EQC).
Specifically, as an example, the QPC is used by the network card in the server to process the IO commands stored in the QP corresponding to the QPC, the CQC is used by the processor to process the completion commands stored in the CQ corresponding to the CQC, and the EQC is used by the processor to process the event commands stored in the EQ corresponding to the EQC.
In this embodiment of the application, the server may further store state information corresponding to each of the at least one queue information, where the state information is used to indicate a possibility that the queue information corresponding to the state information is accessed.
(4) Queue Pair Context (QPC)
When an IO command to be processed stored in a transmission queue is processed, it is necessary to perform processing such as permission check and virtual address conversion on the IO command to be processed stored in a QP according to the QPC corresponding to the QP.
It should be understood that QPC is the context of a queue pair, one-to-one with QP storing IO commands to be processed. When the network card processes the IO command to be processed according to the QPC, it needs to determine a QPC corresponding to a queue to which the IO command to be processed belongs according to the queue, and process the IO command to be processed according to the QPC.
In a network system of a data center, remote direct memory access (RDMA) technology may be used to avoid the delay of server-side data processing during network transmission. It should be appreciated that RDMA is a memory access technique that quickly transfers data from the memory of one device to the memory of another device without the involvement of either operating system. RDMA technology is suitable for high-throughput, low-latency network communications and is particularly suitable for use in massively parallel computer clusters.
As an example, when using RDMA technology, a local server in communication may transmit data to be transmitted to a network card of a remote server through the network card, and the network card of the remote server transmits the data to be transmitted to a memory of the remote server.
A network system suitable for use in the embodiments of the present application is described in detail below with reference to fig. 1.
Fig. 1 is a schematic diagram of a possible server system provided in an embodiment of the present application. At least two servers may be included in the server system, and server 110 and server 120 are illustrated in FIG. 1 as examples.
It should be noted that, in addition to the components shown in fig. 1, the server 110 and the server 120 may further include other components such as a communication interface and a magnetic disk as an external storage, and the components are not limited herein.
Take server 110 as an example. The server 110 may include a memory 111, a processor 112, and a network card 113. Optionally, server 110 may also include bus 114. The memory 111, the processor 112, and the network card 113 may be connected via a bus 114. The bus 114 may be a Peripheral Component Interconnect Express (PCIE) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 114 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 1, but it is not intended that there be only one bus or one type of bus.
The processor 112 is a control unit (control unit) and an arithmetic core of the server 110. Multiple processor cores (cores) may be included in processor 112. Processor 112 may be an ultra-large scale integrated circuit. An operating system and other software programs are installed in the processor 112 to enable the processor 112 to access the memory 111, cache, disk and network cards 113. It is understood that, in the embodiment of the present application, the core in the processor 112 may be, for example, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other Application Specific Integrated Circuit (ASIC).
It should be understood that the processor 112 in the embodiments of the present application may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
Memory 111 is the main memory of server 110. The memory 111 is generally used for storing various running software programs in the operating system, Input Output (IO) commands issued by upper-layer applications, information exchanged with external memory, and the like. In order to increase the access speed of the processor 112, the memory 111 needs to have an advantage of high access speed. In some computer system architectures, Dynamic Random Access Memory (DRAM) is used as the memory 111. The processor 112 can access the memory 111 at a high speed through a memory controller (not shown in fig. 1) to perform a read operation and a write operation on any one of the memory cells in the memory 111.
It will also be appreciated that the memory 111 in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. By way of example, but not limitation, many forms of Random Access Memory (RAM) are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct bus RAM (DR RAM).
Network card 113 is used to communicate server 110 with other servers in the communication network. The network card may be built in the server 110, or may also serve as an external device to the server 110, and is connected to the server 110 through an interface. The interface may be a network interface. Such as a PCIE interface. Fig. 1 illustrates an example in which the network card 113 is embedded in the server 110.
For the server 120, a memory 121, a processor 122, and a network card 123 may be included. Optionally, the server 120 may also include a bus 124. The internal structure of server 120 may be similar to server 110, for example: the memory 121, the processor 122, and the network card 123 are respectively similar to the memory 111, the processor 112, and the network card 113 in the server 110, and for details, reference is made to the above description of each part in the server 110, and details are not repeated here.
In the context of data processing and communication, server 110 and server 120 are included in a communication network as an example. An Application (APP), such as a "weather forecast" APP, runs on server 110 and server 120, respectively. In a specific example, to achieve high performance computing, processor 112 in server 110 undertakes a portion of the computing, stores the computing results in memory 111, and processor 122 in server 120 undertakes another portion of the computing, and stores the computing results in memory 121. In this case, it is necessary to perform a process of summarizing the calculation results stored in the memory 111 and the calculation results stored in the memory 121, and for example, the processor 122 in the server 120 performs a process of summarizing the calculation results stored in the memory 111 and the calculation results stored in the memory 121. The "weather forecast" APP running in the server 110 issues an IO command, which is a write command for instructing to write the calculation result stored in the memory 111 into the memory 121.
The processor 112 in the server 110 may send the IO command to the memory 111 through the bus 114. The network card 113 may obtain the IO command from the memory 111 through the bus 114, and process the IO command, where the IO command is a write command, and the network card 113 obtains data, which is indicated by the write command and needs to be written into the memory 121, from the memory 111. The network card 113 transmits the data to the network card 123 of the server 120 via the network. The network card 123 may receive data transmitted by the network card 113 and write the data into the memory 121 through the bus 124. The network card 123 may also send the processing result of writing data into the memory 121 to the network card 113 through the network, and the network card 113 stores the result into the memory 111 through the bus 114, where the result is used to indicate that the network card 123 has successfully stored the data that the processor 112 needs to write into the memory 121.
An implementation process of the server 110 for processing the IO command to be processed is described below with reference to fig. 2.
Fig. 2 is a schematic flowchart of a method for processing an IO command by a server according to an embodiment of the present application. As shown in FIG. 2, the method may include steps 210 to 240, which are described in detail below.
Step 210: the processor 112 in the server 110 sends the IO command to be processed to the memory 111.
Specifically, in the embodiment of the present application, the IO command to be processed may be stored in a sending queue of the memory 111, where one or more IO commands to be processed may be stored in the sending queue.
The information in the IO command may include, but is not limited to: queue index of the queue to which the IO command belongs, Producer Index (PI), IO command related information.
The PI indicates the position of the IO command to be processed in the queue. The PI may also be understood as the number of IO commands pending in the queue. The IO commands to be processed are ordered in the queue by counting from 1, from the head of the queue to the tail; each time an IO command to be processed is added to the QP, the PI count is incremented by 1.
The PI may also be referred to as a write POINTER (WT-POINTER), maintained by upper layer applications in server 110 that issue IO commands.
The IO command related information may include, but is not limited to: key for permission verification, type of IO command. The key is used to verify the authority of the IO command, and the type of the IO command may be, for example, a read command or a write command. The IO command to be processed may further include: length of data (length), virtual address of data (virtual address), etc.
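Purely as an illustrative sketch (the field names and widths below are assumptions, not the application's actual format), the information listed above could be laid out like this:

```c
#include <stdint.h>

/* Hypothetical layout of the information carried by a pending IO command,
 * as described above; names and widths are assumptions for illustration. */
typedef enum { IO_CMD_READ, IO_CMD_WRITE } io_cmd_type_t;

typedef struct {
    uint32_t      queue_index;  /* queue to which the IO command belongs  */
    uint32_t      pi;           /* producer index: position in the queue  */
    uint32_t      key;          /* key used for permission verification   */
    io_cmd_type_t type;         /* read command or write command          */
    uint64_t      virt_addr;    /* virtual address of the data            */
    uint32_t      length;       /* length of the data                     */
} io_command_t;
```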
Step 220: the processor 112 sends a prompt message to the network card 113.
After the processor 112 sends the IO command to be processed to the memory 111, a prompt message may be sent to the network card 113, where the prompt message is used to indicate the IO command to be processed in the memory 111. As an example, the prompt message may be a Doorbell (DB).
In some embodiments, the DB sent by the processor 112 to the network card 113 may include: the queue index of the queue to which the pending IO command belongs, the location (e.g., PI) of the pending IO command in the queue. Optionally, the DB may further include IO command related information included in the IO command to be processed.
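A correspondingly minimal sketch of such a doorbell record, under the same naming assumptions:

```c
#include <stdint.h>

/* Hypothetical doorbell record sent by the processor to the network card;
 * names are assumptions for illustration. */
typedef struct {
    uint32_t queue_index;  /* queue holding the pending IO command        */
    uint32_t pi;           /* position (PI) of the pending IO command     */
    /* optionally, IO-command-related information could be carried too   */
} doorbell_t;
```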
Step 230: the network card 113 processes the IO command to be processed.
The network card 113 may process the IO command to be processed according to the queue information. In this embodiment of the present application, queue information used when an IO command to be processed is processed may be one of the following or a combination of any multiple of the following: queue Pair Context (QPC), Completion Queue Context (CQC), Event Queue Context (EQC).
Taking the queue information as QPC as an example, the QPC corresponds to QPs one by one, and the QPs store IO commands to be processed issued by the processor. The network card 113 determines, according to the prompt message issued by the processor 112, that the IO command to be processed is stored in the transmission queue, determines a corresponding QPC according to the transmission queue storing the IO command to be processed, and processes the IO command to be processed according to the QPC. For example, the network card 113 performs authority check on a read command or a write command in the IO commands to be processed, or performs virtual address translation on the read command or the write command.
Taking the queue information as CQC as an example, the CQC corresponds to the CQ one-to-one, and the CQ stores the completion command issued by the network card 113. For example, after the network card 113 finishes processing the IO command in the transmission queue, the completion command may be stored into the CQ. Specifically, the network card 113 may determine a corresponding CQC according to the CQ storing the completion command, and determine an address of the CQ according to the address information in the CQC, so that the network card 113 stores the completion command into the CQ according to the address of the CQ.
Taking the queue information as EQC as an example, the EQC corresponds to the EQ one-to-one, and the EQ stores the event command sent by the network card 113. For example, after the network card 113 has processed the IO command in the QP, the completion command may be stored in the CQ. And after the completion commands in the CQ reach a certain number, an event command is generated and the network card 113 stores the event command in the EQ. Specifically, the network card 113 may determine a corresponding EQC according to the EQ storing the event command, and determine an address of the EQ according to the address information in the EQC, so that the network card 113 stores the event command in the EQ according to the address of the EQ.
The data structure of QPC is explained below.
A QPC may include, but is not limited to: a PI and a Consumer Index (CI). The CI represents the number of IO commands that the network card 113 has already processed in the send queue corresponding to the QPC; it may also be referred to as a read POINTER (RD-POINTER) and is maintained by the network card 113. For example, after the network card 113 finishes processing an IO command, it modifies the CI in the QPC corresponding to the QP storing that IO command, incrementing the CI count by 1. The PI is the number of IO commands to be processed in the QP; for the description of the PI, please refer to the above description, which is not repeated here.
QPC may also include: queue status, key, physical base address of the queue. When the IO command to be processed is processed, the network card 113 may determine whether a queue to which the IO command to be processed belongs is available or normal according to the queue state. The network card 113 may also verify the authority of the IO command to be processed when the IO command to be processed is processed according to the key. The network card 113 may also convert a virtual address of a read command or a write command in the IO command to be processed stored in the queue according to the physical base address of the queue to which the IO command to be processed belongs, to obtain a physical address of the read command or the write command in the IO command to be processed.
Optionally, other relevant data may also be included in QPC, such as one or more combinations of the following: length of data queued in memory 111, credits, operating mode, etc.
The network card 113 may process the IO command to be processed according to QPC. For ease of description, assume that the IO command to be processed is stored in queue a, which corresponds to QPC a. For example, the network card 113 may determine whether queue a is available or normal according to the queue status included in QPC a. For another example, the network card 113 may verify the authority of the IO command to be processed stored in the queue a according to the key included in the QPC a. For another example, the network card 113 may also convert a virtual address of a read command or a write command in the IO command to be processed stored in the queue a according to the physical base address of the queue included in the QPC a, to obtain a physical address of the read command or the write command in the IO command to be processed.
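As an illustrative sketch only, the record below restates the QPC fields just described; all names, widths, and the deliberately simplified address translation are assumptions, not the application's actual layout:

```c
#include <stdint.h>

/* Hypothetical QPC record combining the fields described above; names,
 * widths, and the state values are assumptions for illustration. */
typedef enum { Q_STATE_NORMAL, Q_STATE_UNAVAILABLE } queue_state_t;

typedef struct {
    uint32_t      pi;          /* write pointer: IO commands posted       */
    uint32_t      ci;          /* read pointer: IO commands processed     */
    queue_state_t state;       /* whether the queue is available/normal   */
    uint32_t      key;         /* key used for permission verification    */
    uint64_t      phys_base;   /* physical base address of the queue      */
    /* optionally: queued data length, credits, operating mode, ...       */
} qpc_record_t;

/* Example use: derive a physical address from the queue's physical base
 * address (a simplified, flat translation purely for illustration). */
static uint64_t qpc_translate(const qpc_record_t *qpc, uint64_t offset)
{
    return qpc->phys_base + offset;
}
```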
The following explains the data structure of the CQC.
CQCs may include: physical base address of queue, PI, CI. The physical base address of the queue indicates the physical base address of the CQ in the memory, so that the network card 113 stores the completion command in the CQ according to the physical base address. PI indicates the number of completion commands stored by the network card 113 in the CQ corresponding to the CQC. For example, the network card 113 stores a completion command, modifies the PI in the CQC corresponding to the CQ storing the completion command, and adds 1 to the PI count. CI represents the number of completion commands that the processor has processed in the CQ corresponding to the CQC. For example, the processor processes a completion command, modifies the CI in the CQC corresponding to the CQ storing the completion command, and increments the CI count by 1.
The data structure of the EQC is explained below.
The EQC may include: the physical base address of the queue, a PI, and a CI. The physical base address of the queue represents the physical base address of the EQ in the memory, so that the network card 113 can store event commands in the EQ according to the physical base address. The PI indicates the number of event commands stored by the network card 113 in the EQ corresponding to the EQC. For example, the network card 113 stores an event command, modifies the PI in the EQC corresponding to the EQ storing the event command, and increments the PI count by 1. The CI denotes the number of event commands that the processor has processed in the EQ corresponding to the EQC. For example, the processor processes an event command, modifies the CI in the EQC corresponding to the EQ storing the event command, and increments the CI count by 1.
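For comparison, equally hypothetical sketches of the CQC and EQC records described above (names and widths are assumptions):

```c
#include <stdint.h>

/* Hypothetical CQC record: phys_base locates the CQ in memory, PI counts
 * completion commands stored by the network card, CI counts completion
 * commands already processed by the processor. */
typedef struct {
    uint64_t phys_base;
    uint32_t pi;
    uint32_t ci;
} cqc_record_t;

/* Hypothetical EQC record with the same shape: PI counts event commands
 * stored by the network card, CI counts event commands processed. */
typedef struct {
    uint64_t phys_base;
    uint32_t pi;
    uint32_t ci;
} eqc_record_t;
```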
Step 240: the network card 113 sends the processed IO command to the network card 123 of the server 120 through the network.
The above steps 210 to 240 describe a process in which the local server processes the IO command to be processed after the upper layer application running on the local server sends the IO command to be processed. According to the above process, the network card needs to process the IO command according to the queue information. Therefore, how to quickly acquire the corresponding queue information has a great influence on the processing speed of the IO command.
The network card of the server may further include a cache space, for example, a cache memory. The cache space may be used to store part of the queue information. When the network card processes an IO command to be processed, it can obtain the stored queue information directly from its cache space and process the IO command according to that queue information, which prevents the network card from frequently reading and writing the queue information in the memory over the bus.
Because the cache space in the network card is small, only part of the queue information can be cached, and the queue information cached in the cache space therefore needs to be updated. However, because the queues storing the IO commands to be processed have a certain randomness, and each queue corresponds to one piece of queue information, it is difficult to replace only the rarely used queue information in the cache space. This lowers the hit rate of the cache space, causes the network card to initiate unnecessary bus read/write accesses, and increases the network card's processing delay for the IO commands to be processed.
It should be understood that, over a period of time, the proportion of accesses to queue information that hit queue information already cached in the cache space is referred to as the hit rate of the cache space. The larger this proportion, the higher the hit rate of the cache space; the smaller this proportion, the lower the hit rate of the cache space.
According to the technical scheme provided by the embodiment of the application, the hit rate of the cache space in the network card can be improved, the delay caused by the fact that the network card frequently reads and writes the queue information stored in the memory through the bus and the waste of the bus bandwidth are further reduced, the processing delay of the network card on the IO command to be processed is reduced, and the transmission performance is improved.
Fig. 3 is a schematic structural diagram of a possible server 110 provided in an embodiment of the present application. As shown in fig. 3, the server 110 may include a memory 111, a processor 112, and a network card 113. Optionally, server 110 may also include bus 114. The memory 111, the processor 112, and the network card 113 may be connected via a bus 114.
It should be understood that the network card 113 may be in the server 110, or may also be an external device to the server 110, and is connected to the server 110 through an interface. Fig. 3 illustrates the network card 113 in the server 110 as an example.
A plurality of queues, e.g., queue A, queue B,. and queue N, may be included in memory 111. One or more commands may be stored in each queue. Taking the queue as a sending queue in the QP, one or more IO commands to be processed may be stored in the QP. Taking the queue as a CQ for example, one or more completion commands may be stored in the CQ. Taking the queue as an EQ for example, the EQ may store one or more event commands therein.
The network card 113 may include: input processing unit 310, buffer space 320, prediction unit 330, processing unit 340. Each of the above units will be described in detail below.
The input processing unit 310: it is mainly used to receive the hint message (e.g., DB) sent by the processor 112, and the DB is used to instruct the queue of the memory 111 to store the IO commands to be processed. The input processing unit 310 may process an IO command to be processed. Specifically, in a possible implementation manner, the input processing unit 310 may obtain the IO command to be processed and the QPC corresponding to the queue from the queue of the memory 111, and process the IO command to be processed according to the QPC. In another possible implementation, the input processing unit 310 may also generate a completion command after the IO command processing is completed. And processing the completion command according to the CQC corresponding to the CQ storing the completion command. In another possible implementation, the input processing unit 310 may also generate an event command after the IO command processing is completed. And processing the event command according to the EQC corresponding to the EQ storing the event command. For the specific processing procedure, please refer to the above description, which is not repeated herein.
The buffer space 320: for storing queue information, for example, queue information a, queue information B, ·, and queue information M are stored in the buffer space 320. Wherein M is less than N. The queue information a corresponds to the queue a and is used for processing the commands stored in the queue a, and the queue information B corresponds to the queue B and is used for processing the commands stored in the queue B, and so on.
One piece of queue information may include a queue information entry and corresponding state information. Referring to FIG. 4, DATA in the cache space 320 is the space used to store the queue information entries, and CTRL_DATA is used to store the state information corresponding to the queue information entries in DATA. For example, as shown in FIG. 4, queue information A may include queue information entry A and the corresponding state information A.
The state information records the state of the corresponding queue information in the cache space 320. The state information may be represented by flag bits or by a field, which is not specifically limited in this application.
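A minimal sketch of this DATA/CTRL_DATA organization, assuming two parallel arrays indexed by cache slot (the slot count, entry size, and names are assumptions for illustration):

```c
#include <stdint.h>

#define QI_CACHE_SLOTS 64            /* cache size chosen here is an assumption */

/* Hypothetical queue information entry kept in the DATA area (a QPC, CQC,
 * or EQC); contents abbreviated, names are assumptions for illustration. */
typedef struct {
    uint32_t qid;                    /* queue this entry belongs to         */
    uint8_t  body[60];               /* the cached queue information itself */
} qi_entry_t;

/* Cache space 320 as in FIG. 4: DATA stores the queue information entries,
 * CTRL_DATA stores the matching per-entry state information, slot by slot. */
struct qi_cache_space {
    qi_entry_t data[QI_CACHE_SLOTS];       /* DATA area      */
    uint8_t    ctrl_data[QI_CACHE_SLOTS];  /* CTRL_DATA area */
};
```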
The queue information in the embodiment of the present application may include one or more of the following: QPC, CQC, EQC.
The prediction unit 330: it is mainly responsible for implementing the prediction of the possibility of subsequent access to the queue information stored in the cache space 320. If the probability that certain queue information is accessed is greater than the probability that other queue information is accessed, the queue information may be referred to as high frequency queue information.
The processing unit 340: it is mainly responsible for storing the high-frequency queue information determined by the prediction unit 330 in the cache space, and it can also update and replace the queue information stored in the cache space 320 according to the prediction results obtained by the prediction unit 330; the storage space occupied by the replaced queue information can then be used to store other, new queue information.
It should be understood that the term "unit" herein may be implemented in software and/or hardware, and is not particularly limited thereto. For example, a "unit" may be a software program, a hardware circuit, or a combination of both that implement the above-described functions. When any of the above units is implemented in software, the software exists in the form of computer program instructions and is stored in the memory of the network card, and the processor of the network card may be configured to execute the program instructions to implement the above method flow. The processor may include, but is not limited to, at least one of: various computing devices that run software, such as a Central Processing Unit (CPU), a microprocessor, a Digital Signal Processor (DSP), a Microcontroller (MCU), or an artificial intelligence processor, may each include one or more cores for executing software instructions to perform operations or processing. The processor may be a single semiconductor chip, or may be integrated with other circuits to form a system on chip (SoC), or may be integrated as a built-in processor of an application-specific integrated circuit (ASIC), which may be packaged separately or packaged together with other circuits. The processor may further include necessary hardware accelerators such as Field Programmable Gate Arrays (FPGAs), Programmable Logic Devices (PLDs), or logic circuits implementing dedicated logic operations, in addition to cores for executing software instructions to perform operations or processes.
When the above units are implemented as hardware circuits, the hardware circuits may be implemented as a general purpose Central Processing Unit (CPU), a Microcontroller (MCU), a Microprocessor (MPU), a Digital Signal Processor (DSP), a system on chip (SoC), or may be implemented as an application-specific integrated circuit (ASIC), or may be implemented as a Programmable Logic Device (PLD), which may be a Complex Programmable Logic Device (CPLD), a Field Programmable Gate Array (FPGA), a General Array Logic (GAL), or any combination thereof, and may run or be independent of software to perform the above processes.
The following describes in detail a method for processing information of a network card provided in an embodiment of the present application, with reference to fig. 5, by taking the server shown in fig. 3 as an example.
Fig. 5 is a schematic flowchart of a method for processing information of a network card according to an embodiment of the present application. As shown in FIG. 5, the method may include steps 510 to 530, which are described in detail below.
Step 510: the prediction unit 330 predicts high frequency queue information that is more likely to be accessed than other queue information.
In this embodiment, the prediction unit 330 may predict the possibility that at least one piece of queue information stored in the buffer space 320 is accessed, and determine, as the high-frequency queue information, the queue information whose possibility of being accessed is greater than that of other queue information.
In the embodiment of the present application, the at least one piece of queue information stored in the buffer space 320 may be one or a combination of any of the following: QPC, CQC, EQC. For details, refer to the above description, which is not repeated herein.
Take queue information as QPC as an example. In one possible implementation, the prediction unit 330 may predict the possibility of subsequent accesses to the QPC according to whether the QPC is hit by a read request sent by the input processing unit 310. In another possible implementation manner, the prediction unit 330 may also perform pre-reading on the IO command to be processed (for example, obtain a scheduling order of the IO command), and obtain related information of a queue to which the IO command to be processed belongs, so as to predict a possibility that at least one QPC stored in the cache space 320 is subsequently accessed according to the related information of the queue. In another possible implementation, the prediction unit 330 may further predict the possibility of a subsequent access of the QPC according to a difference between values of a read pointer (e.g., CI) and a write pointer (e.g., PI) in the QPC.
Take queue information as CQC as an example. The prediction unit 330 predicts the likelihood of the at least one CQC being accessed similarly to the prediction of the likelihood of the at least one QPC being accessed. For example, the prediction unit 330 may predict the possibility that the CQC is subsequently accessed according to whether the CQC is hit by the read request sent by the input processing unit 310. For another example, the prediction unit 330 may also pre-read (for example, obtain the IO command scheduling order) the multiple completion commands to be processed, and obtain the relevant information of the CQ to which the multiple completion commands to be processed belong, so as to predict the possibility that at least one CQC stored in the cache space 320 is subsequently accessed according to the relevant information of the CQ.
Take queue information as EQC as an example. The prediction unit 330 predicts the likelihood that the at least one EQC is accessed similarly to the prediction of the likelihood that the at least one QPC is accessed. For example, the prediction unit 330 may predict the likelihood that an EQC will be subsequently accessed based on whether the EQC is hit by a read request sent by the input processing unit 310. For another example, the prediction unit 330 may also pre-read a plurality of event commands to be processed, and acquire relevant information of an EQ to which the plurality of event commands to be processed belong, so as to predict the possibility that at least one EQC stored in the cache space 320 is subsequently accessed according to the relevant information of the EQ.
These possible implementations are described in detail below with reference to specific examples, and are therefore not expanded upon here.
It should be understood that the prediction unit 330 may determine the high-frequency queue information among the queue information stored in the buffer space 320 according to any one of the above methods, or according to a combination of any two or all three of the above methods; this is not particularly limited in this application.
Step 520: the processing unit 340 holds the high frequency queue information in a buffer space.
In the embodiment of the application, the queue information stored in the cache space corresponds to the queues in the memory one by one, and each queue information is used for the network card to process the IO command in the queue corresponding to the queue information.
As an example, in a possible implementation manner, the network card may read the high-frequency queue information from the memory and store it in the buffer space. In another possible implementation manner, the network card determines that the high-frequency queue information is already stored in the cache space, and sets state information for the high-frequency queue information in the cache space, where the state information is used to indicate that the high-frequency queue information continues to be stored in the cache space.
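A minimal sketch of these two implementation manners is given below, assuming the cache_entry layout sketched earlier; lookup_entry(), allocate_or_replace_entry(), dma_read_queue_info() and mark_keep_in_cache() are hypothetical helpers introduced only for illustration and are shown here as prototypes.

#include <stddef.h>

struct cache_entry *lookup_entry(uint32_t queue_index);               /* assumed helper */
struct cache_entry *allocate_or_replace_entry(uint32_t queue_index);  /* assumed helper */
void dma_read_queue_info(uint32_t queue_index, struct queue_info_entry *dst); /* assumed */
void mark_keep_in_cache(struct queue_info_state *st);                 /* sets state info */

struct cache_entry *ensure_high_freq_cached(uint32_t queue_index)
{
    struct cache_entry *e = lookup_entry(queue_index);
    if (e == NULL) {
        /* First manner: the entry is absent, so read it from memory into the cache. */
        e = allocate_or_replace_entry(queue_index);
        dma_read_queue_info(queue_index, &e->data);
    }
    /* Second manner: the entry is (now) cached, so set state information indicating
     * that it should continue to be stored in the cache space. */
    mark_keep_in_cache(&e->ctrl);
    return e;
}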
The status information is not specifically limited in the embodiment of the present application, and may be priority information or locking information. The priority information is used to indicate the priority with which the high-frequency queue information is updated in the cache space, and the locking information is used to indicate that the high-frequency queue information is in a locked state and is not updated in the cache space.
Specifically, the processing unit 340 adjusts the state information corresponding to the high-frequency queue information according to the prediction result. The prediction unit 330 may predict the possibility that at least one piece of queue information stored in the buffer space 320 is subsequently accessed, and the processing unit 340 then sets the state information corresponding to the predicted high-frequency queue information. The high-frequency queue information is one or more of the pieces of queue information stored in the buffer space, and the probability of accessing queue information determined as high-frequency queue information is greater than the probability of accessing other queue information in the buffer space.
There are various specific implementations. In one possible implementation, a flag bit corresponding to the high-frequency queue information may be set. In another possible implementation, a field corresponding to one or more pieces of the at least one piece of queue information may be modified.
Take setting the flag bit corresponding to the high-frequency queue information as an example. The flag bits corresponding to the high-frequency queue information may include one or any combination of the following: valid, dirty, lock, and the like. The valid flag bit indicates whether the queue information entry is valid, the dirty flag bit indicates whether the queue information entry holds dirty data, and the lock flag bit indicates whether the queue information entry is locked.
In the embodiment of the present application, there may be one or more lock flag bits; the number of lock flag bits is related to the selected prediction method and is not specifically limited in the present application.
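One possible flag-bit encoding of the state information, corresponding to the valid, dirty, and lock bits described above, is sketched below. The bit positions and helper functions are assumptions for illustration, not the patent's definitions.

#define QI_FLAG_VALID  (1u << 0)   /* queue information entry is valid               */
#define QI_FLAG_DIRTY  (1u << 1)   /* entry holds dirty data not yet written back     */
#define QI_FLAG_LOCK1  (1u << 2)   /* hot: hit by a read request (first method)       */
#define QI_FLAG_LOCK2  (1u << 3)   /* hot: pending IO commands pre-read (second method) */
#define QI_FLAG_LOCK3  (1u << 4)   /* hot: large PI - CI difference (third method)    */

static inline void qi_set_flag(struct queue_info_state *st, uint32_t bit)
{
    st->flags |= bit;               /* e.g. set a lock flag bit to 1 */
}

static inline void qi_clear_flag(struct queue_info_state *st, uint32_t bit)
{
    st->flags &= ~bit;              /* e.g. set a lock flag bit to 0 */
}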
Step 530: the processing unit 340 updates part or all of the queue information stored in the buffer space 320 according to the state information corresponding to the queue information in the buffer space.
Take the state information corresponding to the queue information being priority information as an example. The processing unit 340 may classify the queue information in the cache space into levels according to the priority information, where a higher level indicates a higher possibility that the queue information is accessed, and preferentially update the low-level queue information stored in the buffer space.
In the embodiment of the application, by predicting the possibility that the queue information stored in the cache space of the network card is accessed, the high-frequency queue information can be determined and the strategy for replacing or updating queue information in the cache space optimized, so that high-frequency queue information that will be used subsequently is not replaced. This improves the hit rate of the cache space of the network card, reduces the number of bus read-write accesses generated by the network card and the resulting waste of bus bandwidth, reduces the processing delay of the network card for pending IO commands, and improves transmission performance.
Different implementations of predicting the likelihood that at least one QPC stored in the cache space is accessed and setting the state information of the high frequency queue information in the foregoing steps 510 and 520 are described in detail below, taking the queue information stored in the cache space as QPC as an example.
In a possible implementation manner, since the input processing unit 310 in the network card 113 accesses a QPC stored in the cache space 320 with paired read and write operations, the prediction unit 330 may determine whether a QPC stored in the cache space 320 has been read; a QPC that has just been read is very likely to be accessed again and can therefore be determined as high-frequency queue information. That is to say, the network card searches the QPCs cached in the cache space for the currently read QPC and determines the currently read QPC as the high-frequency queue information.
For example, according to queue A to which a pending IO command belongs, the input processing unit 310 determines that the IO command needs to be processed using QPC A, and therefore needs to read QPC A. After the pending IO command is processed, the read pointer (e.g., CI) in QPC A needs to be modified, so a write operation on QPC A follows.
For QPCs in the cache space that have been hit by a read request, the processing unit 340 may set the corresponding state information. Specifically, as an example, a flag bit corresponding to the QPC may be set. For example, in the embodiment of the present application, the flag bits may include a lock1 flag bit. The processing unit 340 may set the lock1 flag bit corresponding to the QPC, for example, set it to 1; a lock1 flag bit of 1 indicates that the QPC is more likely to be accessed subsequently.
Optionally, when the read QPC subsequently receives the write access request sent by the input processing unit 310, the processing unit 340 may clear the lock1 flag bit corresponding to the QPC, for example, set it to 0; a lock1 flag bit of 0 indicates that the QPC is less likely to be accessed subsequently.
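A sketch of this first prediction path, reusing the flag helpers sketched earlier, is shown below: a read hit sets lock1 because the paired write access is still expected, and the later write access clears it. The function names are illustrative assumptions.

void on_qpc_read_hit(struct queue_info_state *st)
{
    /* The QPC was just read; the paired write (updating CI) is still to come,
     * so this QPC is likely to be accessed again shortly. */
    qi_set_flag(st, QI_FLAG_LOCK1);
}

void on_qpc_write_access(struct queue_info_state *st)
{
    /* The paired write access has arrived; the short-term need has passed. */
    qi_clear_flag(st, QI_FLAG_LOCK1);
}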
In another possible implementation manner, the prediction unit 330 may further obtain a scheduling order of the IO commands to be processed, determine, according to the scheduling order of the IO commands, a QPC corresponding to a queue to which the IO commands to be processed belong, and determine, as the high-frequency queue information, a QPC corresponding to a queue to which the IO commands to be processed belong. It should be understood that the scheduling order of the IO commands records the IO commands to be processed. That is, before the IO command to be processed is processed, the prediction unit 330 may predict the possibility that the QPC stored in the cache space 320 is read in a short term according to the read-ahead result of the IO command to be processed.
For example, the input processing unit 310 receives a plurality of hint messages (e.g., DBs) sent by the processor 112, which indicate that a plurality of IO commands are pending in the server 110. The input processing unit 310 may pre-read the DBs before the pending IO commands are processed and transmit the pre-read result to the prediction unit 330. For example, the input processing unit 310 may determine the queue information to which a pending IO command belongs according to information carried in the DB, such as the queue index of the queue to which the command belongs and the position (e.g., PI) of the command in that queue. The prediction unit 330 may then determine, according to the queue information to which the pending IO commands sent by the input processing unit 310 belong, the corresponding QPCs, and determine those QPCs as high-frequency queue information. The processing unit 340 sets the state information corresponding to these QPCs; specifically, as an example, their flag bits may be set. For example, in the embodiment of the present application, the flag bits may include a lock2 flag bit, and the processing unit 340 may set the lock2 flag bit corresponding to such a QPC (for example, QPC A), e.g., set the lock2 flag bit to 1.
Optionally, in some embodiments, if QPCs determined according to the read-ahead result are not stored in the cache space 320, the network card may obtain them from the memory 111 in advance and store them in the cache space 320. Obtaining the QPCs from the memory 111 ahead of time avoids the large IO processing delay that would otherwise be incurred by fetching them from the memory 111 only when the IO command is processed.
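A sketch of this second prediction path is given below, assuming a doorbell that carries the queue index and the write position as described above, and reusing the hypothetical helpers and flag bits from the earlier sketches; none of these names come from the patent itself.

#include <stddef.h>

struct doorbell {
    uint32_t queue_index;   /* queue to which the pending IO command belongs */
    uint32_t pi;            /* position (e.g. PI) of the pending IO command  */
};

void prescan_doorbells(const struct doorbell *db, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct cache_entry *e = lookup_entry(db[i].queue_index);
        if (e == NULL) {
            /* QPC not cached: fetch it from memory ahead of time to avoid a
             * large IO processing delay when the command is scheduled. */
            e = allocate_or_replace_entry(db[i].queue_index);
            dma_read_queue_info(db[i].queue_index, &e->data);
        }
        /* Mark the QPC as likely to be read in the short term. */
        qi_set_flag(&e->ctrl, QI_FLAG_LOCK2);
    }
}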
In another possible implementation manner, the prediction unit 330 may also read the PI and the CI of a QPC stored in the cache space 320 and predict subsequent read and write accesses to the QPC according to the difference between the PI and the CI. When the difference between the write pointer and the read pointer is greater than a preset threshold, the QPC is likely to be accessed subsequently and can be determined as high-frequency queue information.
It should be understood that the PI represents the number of IO commands placed in the queue corresponding to the QPC stored in the cache space 320 (the IO commands are ordered from head of queue to tail of queue, starting from 1), and the CI represents the number of IO commands in that queue that have already been processed; each time one IO command is processed, the CI is incremented by 1. Therefore, the larger the difference between the PI and the CI, the more pending IO commands are stored in the queue, and the greater the probability that the QPC corresponding to the queue is subsequently accessed.
For example, the prediction unit 330 may compare the difference between the PI and the CI in the QPC in real time, and when the difference exceeds a certain preset threshold, the processing unit 340 may set the state information corresponding to the QPC. Specifically, as an example, a flag bit corresponding to the QPC may be set. For example, in the embodiment of the present application, the flag bits may include a lock3 flag bit. The processing unit 340 may set the lock3 flag bit corresponding to QPC A, e.g., set the lock3 flag bit to 1.
Optionally, when the difference between the PI and the CI in the QPC is smaller than a certain preset threshold, the processing unit 340 may clear the lock3 flag bit corresponding to the QPC, for example, set the lock3 flag bit to 0.
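This backlog test reduces to simple pointer arithmetic. The sketch below reuses the flag helpers from the earlier sketches and assumes free-running 32-bit pointers (so unsigned subtraction tolerates wrap-around) and an illustrative threshold value; neither assumption is specified by the patent.

#define QPC_HOT_THRESHOLD 16u          /* assumed preset threshold */

void update_lock3(struct queue_info_state *st, uint32_t pi, uint32_t ci)
{
    uint32_t pending = pi - ci;        /* IO commands still to be processed */

    if (pending > QPC_HOT_THRESHOLD)
        qi_set_flag(st, QI_FLAG_LOCK3);    /* large backlog: likely accessed soon */
    else
        qi_clear_flag(st, QI_FLAG_LOCK3);  /* small backlog: clear the hint */
}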
It should be noted that the high-frequency queue information in the buffer space may be determined by any one of the three prediction methods listed above, or by a combination of any two or all three of them.
It should be noted that, taking the case where the status information is set by setting flag bits as an example, one or more flag bits may be used; the number of flag bits is related to the selected prediction method and is not specifically limited in this application. For example, if the prediction is obtained by one of the methods, a single lock flag bit may be set. If the prediction is obtained by combining any two of the above methods, two flag bits may be set, for example, a lock1 flag bit and a lock2 flag bit. If the prediction result is obtained by combining all three methods, three flag bits may be set, for example, the lock1, lock2, and lock3 flag bits.
In the embodiment of the present application, the method for predicting the possibility that at least one CQC or EQC is accessed by the prediction unit 330 is similar to the method for predicting the possibility that at least one QPC is accessed. For example, since the input processing unit 310 in the network card 113 has paired read and write accesses to the CQC or the EQC stored in the cache space 320, the network card 113 also modifies the PI pointer in the CQC or the EQC after reading the CQC or the EQC in the cache space 320. Therefore, the prediction unit 330 may determine that the probability of subsequent access to a CQC or an EQC is high by determining whether a CQC or an EQC stored in the buffer space 320 is read, and the processing unit 340 may adjust the state information corresponding to the CQC or the EQC. For another example, the prediction unit 330 may also predict the possibility that the CQC or the EQC stored in the cache space 320 is read in a short period according to the scheduling order of the to-be-processed completion command or the event command before the to-be-processed completion command or the event command is processed. For the specific prediction process and the process of setting state information according to the prediction result, reference is made to the above prediction method for QPC and the setting method for state information, which are not described herein again.
In the following, a prediction result obtained by combining the above three prediction methods, with flag bits set for at least one piece of queue information according to that result, is taken as an example to describe in detail the specific implementation process of queue information replacement performed by the processing unit 340.
For convenience of description, the prediction result of QPC is explained below as an example.
The processing unit 340 may divide at least one QPC entry stored in the cache space 320 into several priority levels according to the lock flag bit corresponding to the QPC set by the prediction unit 330. For convenience of description, the division of at least one QPC entry into four levels is described in fig. 6 as an example.
Grade 1: the QPC entry in cache space 320, which is likely to be accessed in a short period of time, has lock1 flag bit set. The remaining lock flags, for example, the lock2 flag and the lock3 flag, may be 1 or 0, which is not specifically limited in this application.
Grade 2: a QPC entry in the cache space 320 with a high possibility of being accessed in the short term; its lock2 flag bit is set and its lock1 flag bit is not set. That is, the lock2 flag bit corresponding to the QPC entry is 1 and the lock1 flag bit is 0. The remaining lock flag bits, for example the lock3 flag bit, may be 1 or 0, which is not specifically limited in this application.
Grade 3: the QPC entry with a high possibility of being accessed in the cache space 320 for a long time has the lock3 flag bit set, and the lock1 flag bit and the lock2 flag bit not set. That is, the lock3 flag bit and the lock1 flag bits and the lock2 flag bit of the QPC entry are 1 and 0, respectively.
Grade 4: a QPC entry in the cache space 320 that is unlikely to be accessed over the long term; none of its lock1, lock2, or lock3 flag bits is set. That is, the lock1, lock2, and lock3 flag bits corresponding to the QPC entry are all 0.
Referring to FIG. 6, when a new QPC needs to be cached in the cache space 320, if there is unoccupied space (level 5) in the cache space 320, the processing unit 340 may store the new QPC in the unoccupied space. If the cache space 320 has no unoccupied space, a replacement occurs: the processing unit 340 needs to delete some QPC and store the new QPC in the storage space that the deleted QPC occupied.
The order in which the processing unit 340 selects entries for replacement is, from first replaced to last replaced: level 4, level 3, level 2, level 1. That is, in the embodiment of the present application, the processing unit 340 performs the replacement according to the meaning of the lock flag bits.
As an example, a QPC in which none of the three lock flag bits (lock1, lock2, lock3) is set, i.e., a level-4 QPC, is unlikely to be accessed subsequently, so the processing unit 340 preferentially replaces level-4 QPCs. If there is no level-4 QPC, the processing unit 340 may consider the level-3 QPCs: these may see a large number of subsequent read and write accesses, but the corresponding IO commands have not been scheduled in the short term, so they can be replaced next. If there is no level-3 QPC, the processing unit 340 may consider replacing a level-2 QPC. If there is no level-2 QPC either, the processing unit 340 finally considers replacing a level-1 QPC.
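A sketch of this replacement order under the flag-bit assumptions above is given below: the grade is derived from the lock bits (grade 1 hottest, grade 4 coldest), unoccupied space is used first, and otherwise the entry with the coldest grade is replaced. The types, flag names, and helper names are the illustrative ones introduced in the earlier sketches.

#include <stddef.h>

static int qpc_grade(uint32_t flags)
{
    if (flags & QI_FLAG_LOCK1) return 1;   /* read hit, paired write still expected */
    if (flags & QI_FLAG_LOCK2) return 2;   /* pending IO commands already pre-read  */
    if (flags & QI_FLAG_LOCK3) return 3;   /* large backlog, not yet scheduled      */
    return 4;                              /* no lock bit set: replace first        */
}

struct cache_entry *pick_victim(struct cache_entry *entries, size_t n)
{
    struct cache_entry *victim = NULL;
    int worst = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(entries[i].ctrl.flags & QI_FLAG_VALID))
            return &entries[i];            /* unoccupied space: use it directly */

        int g = qpc_grade(entries[i].ctrl.flags);
        if (g > worst) {                   /* prefer level 4, then 3, then 2, then 1 */
            worst = g;
            victim = &entries[i];
        }
    }
    return victim;
}

In this sketch a level-1 entry is returned only when every cached entry is level 1, which matches the order level 4, level 3, level 2, level 1 described above.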
In the above technical scheme, the network card may predict in advance the possibility that a QPC stored in the cache space is accessed, thereby avoiding replacing a QPC that will be used subsequently when the QPCs in the cache space need to be updated, optimizing the replacement policy of the cache space, and improving the hit rate of the cache space of the network card.
In this embodiment of the application, the specific process by which the processing unit 340 replaces a stored CQC or EQC according to the prediction result of that CQC or EQC being accessed is similar to the method described above; refer to the method of replacing QPCs according to the QPC prediction result, and details are not described here again.
Fig. 7 is a schematic structural diagram of a chip 700 according to an embodiment of the present application, where the chip 700 is applied to a server system, a local server and a remote server in the server system perform data transmission through RDMA, at least one queue is stored in a memory of the local server, each queue is used to store an input/output IO command, and the IO command instructs the local server to perform data access to the remote server, and the chip 700 includes:
the prediction unit 330 is configured to predict high-frequency queue information, where the possibility of accessing the high-frequency queue information is greater than that of other queue information;
the processing unit 340 is configured to store the high-frequency queue information in a cache space, where the cache space is in a network card of the local server, the queue information stored in the cache space corresponds to the queues in the memory one to one, and each queue information is used for the network card to process an IO command in the queue corresponding to the queue information.
Optionally, the processing unit 340 is further configured to: update or replace part or all of the queue information stored in the buffer space.
Optionally, the prediction unit 330 is specifically configured to: search the queue information cached in the cache space for the currently read queue information, and determine the currently read queue information as the high-frequency queue information.
Optionally, the queue information is a queue pair context QPC, and the prediction unit 330 is specifically configured to: calculate the difference between the read pointer and the write pointer of a QPC cached in the cache space, and determine a QPC whose difference between the read pointer and the write pointer is greater than a preset threshold as the high-frequency queue information.
Optionally, the prediction unit 330 is specifically configured to: obtain an IO command scheduling sequence, where the IO command scheduling sequence records IO commands to be processed; and determine, according to the IO command scheduling sequence, the queue information corresponding to the queue to which the IO commands to be processed belong, and determine that queue information as the high-frequency queue information.
It should be noted that, the network card in the present application may also predict the high-frequency queue information according to any two or three prediction methods, and store the predicted high-frequency queue information in a cache space of the network card.
Optionally, the processing unit 340 is specifically configured to: read the high-frequency queue information from the memory; and store the high-frequency queue information in the cache space.
Optionally, the processing unit 340 is specifically configured to: determine that the high-frequency queue information has already been stored in the cache space; and set state information of the high-frequency queue information in the cache space, where the state information is used to indicate that the high-frequency queue information continues to be stored in the cache space.
Optionally, the state information includes priority information or locking information, where the priority information is used to indicate the priority with which the high-frequency queue information is updated in the cache space, and the locking information is used to indicate that the high-frequency queue information is in a locked state and is not updated in the cache space.
Alternatively, the priority information or the lock information may be represented by a status flag bit.
Optionally, the queue information comprises one or more of the following information: queue pair context QPC, completion queue context CQC, event queue context EQC.
Optionally, the processing unit 340 is further configured to: update or replace part or all of the queue information stored in the cache space according to the priority information or the locking information.
Optionally, the processing unit 340 is specifically configured to: classify the queue information in the cache space into levels according to the state information in the cache space of the network card, where a higher level indicates a higher possibility that the queue information is accessed; and preferentially update the low-level queue information stored in the cache space.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
The embodiment of the present application further provides a network card, which includes the chip 700 described in any one of the above. For a detailed description of the network card, please refer to fig. 3 and the description of the network card 113, which are not described herein again.
An embodiment of the present application further provides a server, where the server includes: memory, processor, network card, etc. For a detailed description of the server, please refer to the description of fig. 1 and fig. 3, which is not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (18)

1. An information processing method of a network card, characterized in that a local server and a remote server perform data transmission through remote direct memory access RDMA, at least one queue is stored in a memory of the local server, each queue is used for storing input/output (IO) commands, and the IO commands instruct the local server to perform data access to the remote server, the method comprising:
the network card predicts high-frequency queue information, and the possibility of accessing the high-frequency queue information is higher than that of other queue information;
the network card stores the high-frequency queue information in a cache space, wherein the cache space is in the network card, the queue information stored in the cache space corresponds to the queues in the memory one by one, and each queue information is used for the network card to process IO commands in the queue corresponding to the queue information.
2. The method of claim 1, wherein the network card predicting high frequency queue information comprises:
and the network card searches the currently read queue information in the queue information cached in the cache space, and determines the currently read queue information as the high-frequency queue information.
3. The method of claim 1, wherein the queue information is a queue pair context QPC, and wherein the network card predicting high frequency queue information comprises:
and the network card calculates the difference value of a read pointer and a write pointer in the QPC of the cache in the cache space, and determines the QPC with the difference value of the read pointer and the write pointer larger than a preset threshold as the high-frequency queue information.
4. The method of claim 1, wherein the network card predicting high frequency queue information comprises:
the network card acquires an IO command scheduling sequence, and the IO command scheduling sequence records IO commands to be processed;
and the network card determines the queue information corresponding to the queue to which the IO command to be processed belongs according to the IO command scheduling sequence, and determines the queue information corresponding to the queue to which the IO command to be processed belongs as the high-frequency queue information.
5. The method according to any one of claims 1 to 4, wherein the network card stores the high frequency queue information in a buffer space, comprising:
the network card reads the high-frequency queue information from the memory;
and storing the high-frequency queue information in the cache space.
6. The method according to any one of claims 1 to 4, wherein the network card stores the high frequency queue information in a buffer space, comprising:
the network card determines that the high-frequency queue information is stored in the cache space;
the network card sets state information of the high-frequency queue information in the cache space, and the state information is used for indicating that the high-frequency queue information is continuously stored in the cache space.
7. The method according to claim 6, wherein the status information comprises priority information indicating a priority of the high frequency queue information being updated in the buffer space or lock information indicating that the high frequency queue information is in a lock state of not being updated in the buffer space.
8. The method according to any one of claims 1 to 7, wherein the queue information comprises one or more of the following information: queue pair context QPC, completion queue context CQC, event queue context EQC.
9. A chip, applied to a server system, in which a local server and a remote server in the server system perform data transmission via RDMA, at least one queue is stored in a memory of the local server, each queue is used to store an input/output (IO) command, and the IO command instructs the local server to perform data access to the remote server, and the chip includes:
the prediction unit is used for predicting high-frequency queue information, and the possibility of accessing the high-frequency queue information is greater than that of other queue information;
and the processing unit is used for storing the high-frequency queue information in a cache space, wherein the cache space is in a network card of the local server, the queue information stored in the cache space corresponds to the queues in the memory one by one, and each queue information is used for the network card to process the IO commands in the queue corresponding to the queue information.
10. The chip of claim 9, wherein the prediction unit is specifically configured to: and searching the currently read queue information in the queue information cached in the cache space, and determining the currently read queue information as the high-frequency queue information.
11. The chip according to claim 9, wherein the queue information is a queue pair context QPC, and the prediction unit is specifically configured to: and calculating the difference value of a read pointer and a write pointer in the QPC of the cache in the cache space, and determining the QPC with the difference value of the read pointer and the write pointer larger than a preset threshold as the high-frequency queue information.
12. The chip of claim 9, wherein the prediction unit is specifically configured to:
obtaining an IO command scheduling sequence, wherein the IO command scheduling sequence records IO commands to be processed;
and determining queue information corresponding to a queue to which the IO command to be processed belongs according to the IO command scheduling sequence, and determining the queue information corresponding to the queue to which the IO command to be processed belongs as the high-frequency queue information.
13. The chip according to any one of claims 9 to 12, wherein the processing unit is specifically configured to:
reading the high frequency queue information from the memory;
and storing the high-frequency queue information in the cache space.
14. The chip according to any one of claims 9 to 12, wherein the processing unit is specifically configured to:
determining that the high frequency queue information has been stored in the cache space;
setting state information of the high-frequency queue information in the cache space, wherein the state information is used for indicating that the high-frequency queue information is continuously stored in the cache space.
15. The chip of claim 14, wherein the status information comprises priority information indicating a priority of the high frequency queue information being updated in the buffer space or lock information indicating that the high frequency queue information is in a lock state of not being updated in the buffer space.
16. The chip of any of claims 9 to 15, wherein the queue information comprises one or more of the following: queue pair context QPC, completion queue context CQC, event queue context EQC.
17. A network card, comprising: the chip of any one of claims 9 to 16.
18. A server comprising a memory and the network card of claim 17.
CN202010073159.1A 2019-09-06 2020-01-22 Cache implementation method with prediction mechanism Pending CN112463654A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/094031 WO2021042782A1 (en) 2019-09-06 2020-06-02 Network card information processing method and chip

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019108439221 2019-09-06
CN201910843922 2019-09-06

Publications (1)

Publication Number Publication Date
CN112463654A true CN112463654A (en) 2021-03-09

Family

ID=74832776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010073159.1A Pending CN112463654A (en) 2019-09-06 2020-01-22 Cache implementation method with prediction mechanism

Country Status (2)

Country Link
CN (1) CN112463654A (en)
WO (1) WO2021042782A1 (en)

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN116303173A (en) * 2023-05-19 2023-06-23 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip
CN117112044A (en) * 2023-10-23 2023-11-24 腾讯科技(深圳)有限公司 Instruction processing method, device, equipment and medium based on network card
CN117573602A (en) * 2024-01-16 2024-02-20 珠海星云智联科技有限公司 Method and computer device for remote direct memory access message transmission

Families Citing this family (2)

Publication number Priority date Publication date Assignee Title
CN114024915B (en) * 2021-10-28 2023-06-16 北京锐安科技有限公司 Traffic migration method, device and system, electronic equipment and storage medium
CN117749739B (en) * 2024-02-18 2024-06-04 北京火山引擎科技有限公司 Data transmission method, data reception method, device, equipment and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US7631106B2 (en) * 2002-08-15 2009-12-08 Mellanox Technologies Ltd. Prefetching of receive queue descriptors
US7466716B2 (en) * 2004-07-13 2008-12-16 International Business Machines Corporation Reducing latency in a channel adapter by accelerated I/O control block processing
CN103647807B (en) * 2013-11-27 2017-12-15 华为技术有限公司 A kind of method for caching information, device and communication equipment
US9311044B2 (en) * 2013-12-04 2016-04-12 Oracle International Corporation System and method for supporting efficient buffer usage with a single external memory interface
CN105468494A (en) * 2015-11-19 2016-04-06 上海天玑数据技术有限公司 I/O intensive application identification method
CN109117270A (en) * 2018-08-01 2019-01-01 湖北微源卓越科技有限公司 The method for improving network packet treatment effeciency

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN116303173A (en) * 2023-05-19 2023-06-23 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip
CN116303173B (en) * 2023-05-19 2023-08-08 深圳云豹智能有限公司 Method, device and system for reducing RDMA engine on-chip cache and chip
CN117112044A (en) * 2023-10-23 2023-11-24 腾讯科技(深圳)有限公司 Instruction processing method, device, equipment and medium based on network card
CN117112044B (en) * 2023-10-23 2024-02-06 腾讯科技(深圳)有限公司 Instruction processing method, device, equipment and medium based on network card
CN117573602A (en) * 2024-01-16 2024-02-20 珠海星云智联科技有限公司 Method and computer device for remote direct memory access message transmission
CN117573602B (en) * 2024-01-16 2024-05-14 珠海星云智联科技有限公司 Method and computer device for remote direct memory access message transmission

Also Published As

Publication number Publication date
WO2021042782A1 (en) 2021-03-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination