CN116932273B - Function level reset processing method and device used in RDMA and storage medium - Google Patents
- Publication number
- CN116932273B (CN202311195508.7A)
- Authority
- CN
- China
- Prior art keywords
- function
- time
- flr
- time node
- host
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0727—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0793—Remedial or corrective actions
Abstract
The application discloses a function level reset processing method for use in RDMA, comprising the following steps: receiving a Function level reset (FLR) command message sent by a host for a Function, and returning an FLR response message to the host after the RDMA device releases resources; marking the Function as being in the FLR processing state, and obtaining the end timestamp corresponding to the FLR processing; reclaiming the time nodes corresponding to the Function in the host memory according to the mark of the Function and the end timestamp; and marking the Function as having finished FLR processing once the time nodes corresponding to the Function have been reclaimed. The application also provides a corresponding device and storage medium. Implementing the application can improve FLR processing efficiency and reduce the resources required.
Description
Technical Field
The present application relates to the field of data storage technologies, and in particular to a function level reset processing method and device for use in RDMA, and a storage medium.
Background
In applications of Remote Direct Memory Access (RDMA) technology, each queue pair (QP) in each Function mounts a time node (timenode), which wakes the RDMA device on a timer so that it performs the corresponding send or receive processing.
The timing function of the time node is generally implemented with time wheels. Taking the acknowledgement-type (ACK) timer as an example, in one application scenario the RDMA engine provides multiple time wheels (e.g. 4) that represent durations of 2ms/32ms/512ms/8s respectively; each time wheel contains 16 scales, and each scale represents a duration of 128us/2ms/32ms/512ms respectively. Because the number of QPs supported by the RDMA engine is very large, typically on the order of millions (M), the RDMA engine may need up to 1M ACK-type time nodes. The RDMA engine manages time nodes as linked lists, providing one linked list per time wheel scale. When a scale times out, the RDMA engine reads all time nodes in the linked list for that scale and performs timeout judgment on them. In the extreme case, 1M time nodes may be chained into the linked list of a single scale of a single time wheel; the linked-list nodes may be stored in the host's external storage space (e.g. DDR).
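The relationship between scale duration and wheel span in the example above can be checked with a little arithmetic. The following is a minimal sketch under stated assumptions: the constants are the nominal values quoted in the text, while the names `TICK_US`, `wheel_span_us` and `place_timeout`, and the rule of choosing the finest wheel that covers a timeout, are hypothetical illustrations rather than anything the application specifies.

```python
# Hypothetical sketch: constants come from the text, names and placement rule are illustrative.
SCALES_PER_WHEEL = 16
TICK_US = [128, 2_000, 32_000, 512_000]  # nominal duration of one scale per wheel, in microseconds


def wheel_span_us(tick_us: int) -> int:
    """Duration covered by one full wheel: 16 scales of tick_us each."""
    return SCALES_PER_WHEEL * tick_us


def place_timeout(timeout_us: int) -> tuple[int, int]:
    """Assumed rule: pick the finest wheel whose span covers the timeout.

    Returns (wheel index, scale offset from the wheel's current position).
    """
    for wheel, tick_us in enumerate(TICK_US):
        if timeout_us < wheel_span_us(tick_us):
            return wheel, timeout_us // tick_us
    raise ValueError("timeout exceeds the largest wheel")


spans_us = [wheel_span_us(t) for t in TICK_US]
```

With these nominal values the spans come out as roughly 2ms, 32ms, 512ms and 8s, which matches the rounded figures in the text.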
In RDMA communication, when certain situations occur (for example, a Function becomes abnormal or needs to be reset), Function Level Reset (FLR) processing must be performed on the corresponding Function. In a PCIe device that supports multiple Functions, FLR resets only the specified Functions without affecting the normal operation of the others. To achieve this, the data streams and time nodes associated with the Function being reset must be cleared so that no residual data of that Function remains and no stale or erroneous data transmission results.
Since each queue pair (QP) in the Function mounts a time node, the time nodes of the Function must be cleared during FLR processing; specifically, all time nodes within the timing period must be read and the valid bit in each of them set to 0.
Existing solutions use one of two approaches, each with drawbacks:
The first approach waits for each timing period to end; when a time node is read and found to belong to a Function undergoing FLR processing, it is discarded directly. This approach handles only the time nodes within the current timing period and is relatively simple to operate, but it may have to wait for one or more timing periods before all time nodes of the Function are cleared. For example, when the whole timing period is 10ms, completing the FLR of a single Function takes at least 10ms.
The second approach directly reads the time nodes of all timing periods, writes 0 to the valid bit of each, and writes them back to the external storage space. This involves actively scanning and writing back the external memory, which generates a large number of read and write requests. These requests occupy the resources and bandwidth of the RDMA device (e.g. a PCIe device), interfere with normal read/write data traffic, increase latency, and reduce performance.
Disclosure of Invention
The application aims to provide a function level reset processing method, device and storage medium for use in RDMA, which can improve the efficiency of FLR processing and reduce the resources required.
To solve the above technical problem, as one aspect of the present application, there is provided a function level reset processing method for use in RDMA, which at least includes the following steps:
receiving a Function level reset (FLR) command message sent by a host for a Function, and returning an FLR response message to the host after the RDMA device releases resources;
marking the Function as being in the FLR processing state, and obtaining the end timestamp corresponding to the FLR processing;
reclaiming the time nodes corresponding to the Function in the host memory according to the mark of the Function and the end timestamp;
and marking the Function as having finished FLR processing once the time nodes corresponding to the Function have been reclaimed.
The method further includes:
allocating at least one time node for each Function, classifying the time nodes by time wheel type and scale, and storing them in the external storage space of the host as linked lists;
wherein the data structure of each time node comprises: a spare field, an FLRset bit, a valid bit, a type bit, a channel identification field, a host identification field, a Function identification field, and a QP number field.
Marking the Function as being in the FLR processing state and obtaining the end timestamp corresponding to the FLR processing further includes:
setting the corresponding bit in the Function identification bitmap of the channel where the Function is located, the set bit indicating that the current FLR processing has started; and writing the end timestamp corresponding to the Function into a channel timestamp table; wherein the Function identification bitmap contains one bit for each Function identifier.
Reclaiming the time nodes corresponding to the Function in the host memory according to the mark of the Function and the end timestamp further includes:
while the bit corresponding to the Function in the bitmap is set, if a time node corresponding to the Function expires before the end timestamp arrives, discarding that time node; and if a new time node allocated by the host to the Function is received, discarding it directly.
Marking the Function as having finished FLR processing once the time nodes corresponding to the Function have been reclaimed further includes:
resetting the corresponding bit in the Function identification bitmap when the end timestamp arrives, indicating that the current FLR processing has ended; or
before the end timestamp arrives, upon detecting that the number of time nodes corresponding to the Function is zero, resetting the corresponding bit in the Function identification bitmap.
The method further includes:
before the end timestamp arrives, if it is detected that the Function has been re-enabled, then upon receiving a new time node allocated by the host for the Function, setting the FLRset bit in the new time node to 1 and mounting the new time node to the external storage space of the host.
The method further includes:
copying the Function identification bitmap of the channel where the Function is located at the start of each time wheel, forming an original Function identification bitmap and a copied Function identification bitmap, the original Function identification bitmap being used to poll the timestamp table of the current channel;
when a time node allocated by the host is received and the corresponding bit in the original Function identification bitmap is detected to be set, determining, if the Function identifier is valid, that the Function has been re-enabled; setting the FLRset bit of the received time node to 1 to mark that this and subsequent time nodes are valid, whereas earlier time nodes must be discarded after being read out; and resetting the corresponding bit in the copied Function identification bitmap.
As another aspect of the present application, there is also provided a function level reset processing device for use in RDMA, which at least includes:
an FLR command processing unit, configured to receive a Function level reset (FLR) command message sent by the host for a Function, and to return an FLR response message to the host after the RDMA device releases resources;
an FLR state marking processing unit, configured to mark the Function as being in the FLR processing state and to obtain the end timestamp corresponding to the FLR processing;
a time node recovery unit, configured to reclaim the time nodes corresponding to the Function in the host memory according to the mark of the Function and the end timestamp;
and a recovery end processing unit, configured to mark the Function as having finished FLR processing once the time nodes corresponding to the Function have been reclaimed.
The device further includes:
a time node storage processing unit, configured to allocate at least one time node for each Function, classify the time nodes by time wheel type and scale, and store them in the external storage space of the host as linked lists;
wherein the data structure of each time node comprises: a spare field, an FLRset bit, a valid bit, a type bit, a channel identification field, a host identification field, a Function identification field, and a QP number field.
The FLR state marking processing unit further includes:
a setting processing unit, configured to set the corresponding bit in the Function identification bitmap of the channel where the Function is located, the set bit indicating that the current FLR processing has started;
an end timestamp writing unit, configured to write the end timestamp corresponding to the Function into a channel timestamp table; wherein the Function identification bitmap contains one bit for each Function identifier.
The time node recovery unit further includes:
a time node discarding unit, configured to, while the bit corresponding to the Function in the bitmap is set, discard a time node corresponding to the Function if it expires before the end timestamp arrives; and, if a new time node allocated by the host to the Function is received, discard it directly.
The recovery end processing unit further includes:
a first ending unit, configured to reset the corresponding bit in the Function identification bitmap when the end timestamp arrives, indicating that the current FLR processing has ended;
and a second ending unit, configured to reset the corresponding bit in the Function identification bitmap upon detecting, before the end timestamp arrives, that the number of time nodes corresponding to the Function is zero.
The device further includes:
a Function restart processing unit, configured to, before the end timestamp arrives, if it is detected that the Function has been re-enabled, upon receiving a new time node allocated by the host for the Function, set the FLRset bit in the new time node to 1 and mount the new time node to the external storage space of the host.
The device further includes:
a copying unit, configured to copy the Function identification bitmap of the channel where the Function is located at the start of each time wheel, forming an original Function identification bitmap and a copied Function identification bitmap, the original Function identification bitmap being used to poll the timestamp table of the current channel;
a copied identification bitmap processing unit, configured to use the copied Function identification bitmap for the current time wheel; and, when a time node allocated by the host is received and the corresponding bit in the original Function identification bitmap is detected to be set, to determine, if the Function identifier is valid, that the Function has been re-enabled, set the FLRset bit of the received time node to 1 to mark that this and subsequent time nodes are valid, whereas earlier time nodes must be discarded after being read out, and reset the corresponding bit in the copied Function identification bitmap.
Accordingly, in a further aspect of the present application, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method as described above.
The embodiment of the application has the following beneficial effects:
The application provides a function level reset processing method, device and storage medium for use in RDMA. By recording the time wheel end time of the FLR processing of the corresponding Function and setting the Function identification bitmap, it can be determined quickly that time nodes expiring before the end time can be discarded. The FLR response can be returned immediately, so that the host finishes the FLR flow early and releases the Function's resources early; resources need not be held until the hardware has read and invalidated the time nodes, which improves FLR processing efficiency;
By implementing the scheme of the application, the external storage space need not be scanned or accessed for reads and writes during FLR processing, and time nodes can be reclaimed quickly at the cost of only a few added resources such as the Function identification bitmap and the channel timestamp table, which reduces the resources required for FLR processing;
By adding an FLRset bit to the time node, time nodes allocated before FLR processing can be distinguished quickly from those allocated after it, so that time nodes allocated after the Function is re-enabled can be used normally.
In addition, by counting the number of time nodes of each Function, the FLR processing flow can be ended early once the count drops to zero, without waiting for the time wheel to expire.
Drawings
In order to illustrate the embodiments of the application or the technical solutions of the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the application; a person skilled in the art can obtain other drawings from them without inventive effort, and such drawings remain within the scope of the application.
FIG. 1 is a schematic diagram of the main flow of an embodiment of a function level reset processing method for use in RDMA according to the present application;
FIG. 2 is a schematic view of an application environment to which the application relates;
FIG. 3 is a schematic diagram of a storage principle of a time node corresponding to a time wheel scale according to the present application;
FIG. 4 is a schematic diagram of the structure of a Function identification bitmap according to the present application;
FIG. 5 is a schematic diagram of a channel timestamp table according to the present application;
FIG. 6 is a schematic diagram of a Function identification bitmap replication according to the present application;
FIG. 7 is a schematic diagram of the structure of an embodiment of a function level reset processing device for use in RDMA according to the present application;
FIG. 8 is a schematic diagram of the structure of the FLR state marking processing unit in FIG. 7;
FIG. 9 is a schematic diagram of the structure of the time node recovery unit in FIG. 7;
FIG. 10 is a schematic diagram of the structure of the recovery end processing unit in FIG. 7;
FIG. 11 is a schematic diagram of the structure of the Function restart processing unit in FIG. 7.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present application more apparent.
FIG. 1 is a schematic diagram of the main flow of an embodiment of a function level reset processing method for use in RDMA according to the present application. Referring also to FIG. 2 to FIG. 6, in this embodiment the method is applied to an RDMA communication system. As shown in FIG. 2, an RDMA communication system generally includes at least a local host, a local external storage space (DDR), a local RDMA engine and a local PCIe device at the local end, and a remote PCIe device, a remote RDMA engine, a remote host and a remote external storage space (DDR) at the remote end. The local and remote PCIe devices may be, for example, network adapters, storage controllers, accelerator cards, hardware encryption cards, and the like.
Fig. 3 is a schematic diagram showing a storage principle of a time node corresponding to a time wheel scale according to the present application.
In the application, the host at the home terminal needs to allocate at least one time node for each Function in advance, classify according to the time wheel type and the scale, and store the time nodes in an external storage space (such as DDR) of the host in a linked list mode.
In one example of the present application, a scale of each time wheel is associated with each Function's time node using a linked list. A link schematic of time wheel 1-scale 1 for the ACK class is shown in fig. 3.
For the time wheel scale in the figure, the structures shown are the list head (head), list node_0, list node_10, list node_100 and the list tail. Each List NODE corresponds to a 4KB space in the external storage space (DDR); each time node is 64 bits, i.e. 8 bytes, so each 4KB List NODE can store 512 time nodes. Once a 4KB space is filled with time nodes, a new List NODE must be applied for and chained onto the list.
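The capacity figures above are simple divisions, sketched below. The constant and function names are hypothetical, and the worst-case figure simply applies the same arithmetic to the 1M-node extreme case mentioned in the Background section.

```python
# Hypothetical sketch of the List NODE capacity arithmetic; names are illustrative.
LIST_NODE_BYTES = 4 * 1024   # one List NODE occupies 4 KB of external memory
TIME_NODE_BYTES = 8          # one time node is 64 bits

NODES_PER_LIST_NODE = LIST_NODE_BYTES // TIME_NODE_BYTES


def list_nodes_needed(time_nodes: int) -> int:
    """Number of 4 KB List NODEs needed to hold the given number of time nodes."""
    return -(-time_nodes // NODES_PER_LIST_NODE)  # ceiling division


# Extreme case from the Background: 1M time nodes chained on a single scale.
worst_case_list_nodes = list_nodes_needed(1 << 20)
```

Under these assumptions the extreme case needs 2048 List NODEs, i.e. 8 MB of external memory for a single scale.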
Here head is the head node of the linked list and points to the time node of the earliest-arriving Function; tail is the end node of the linked list and points to the time node of the latest-arriving Function. Through the linked list, the time wheel can traverse and manage the time nodes of all Functions in time order. Each linked list belongs to one time wheel scale (time wheel 1-scale 1 in FIG. 3); the specific time wheel and scale are determined by the queue type fields Wheel_type and scale_type. For example, in one example there are three types of time wheel, each with 10 scales, so all Functions can be managed with 30 such linked lists.
Further, in FIG. 3, node_id is assigned by a component called the CAM (Content-Addressable Memory). The CAM assigns each Node a unique identifier node_id, typically a 12-bit binary value ranging from 0 to 4095; the notation "[11:0]" means that node_id occupies bits 11 down to 0, i.e. all 12 bits.
Regarding "queue type: {Wheel_type[2:0], scale_type[4:0]}": this is a combined type containing two fields, Wheel_type and scale_type, which occupy 3 bits and 5 bits of the combined type respectively. The numbers in square brackets give the bit range occupied by each field.
The Wheel_type field occupies 3 bits and ranges from 0 to 7 (binary 000 to 111), indicating the type of time wheel.
The scale_type field occupies 5 bits and ranges from 0 to 31 (binary 00000 to 11111), indicating the type of scale.
Combining these two fields gives the specific queue type of one time wheel scale, where Wheel_type specifies the type of time wheel and scale_type specifies the type of scale. Such a combined type helps identify the attributes of a particular queue in queue management or configuration.
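As an illustration of the combined queue type, the sketch below packs and unpacks the two fields into one 8-bit value. It assumes the braces denote a hardware-style concatenation in which the leftmost field (Wheel_type) occupies the most significant bits; the text does not state the bit order, and the function names are hypothetical.

```python
# Hypothetical sketch: assumes {Wheel_type[2:0], scale_type[4:0]} is an MSB-first concatenation.
WHEEL_BITS = 3
SCALE_BITS = 5


def pack_queue_type(wheel_type: int, scale_type: int) -> int:
    """Combine the two fields into one 8-bit queue type."""
    if not 0 <= wheel_type < (1 << WHEEL_BITS):
        raise ValueError("Wheel_type must fit in 3 bits")
    if not 0 <= scale_type < (1 << SCALE_BITS):
        raise ValueError("scale_type must fit in 5 bits")
    return (wheel_type << SCALE_BITS) | scale_type


def unpack_queue_type(queue_type: int) -> tuple[int, int]:
    """Split an 8-bit queue type back into (Wheel_type, scale_type)."""
    return queue_type >> SCALE_BITS, queue_type & ((1 << SCALE_BITS) - 1)
```

For example, time wheel 1, scale 1 would be encoded as binary 001_00001 under this assumed layout.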
Further, in the present application, the data structure of each time node includes: a spare field (rvs-0), an FLRset bit, a valid bit (vld), a type bit (type), a channel identification field (channel_id), a host identification field (host_id), a Function identification field (function_id), and a QP number field (QPN).
The following shows the data structure of one time node:
Timenode data structure (8 Bytes): {rvs-0 (11 bits), FLRset (1 bit), vld (1 bit), type (1 bit), channel_id (10 bits), host_id (3 bits), function_id (13 bits), QPN (24 bits)}
QPN/function_id/host_id/channel_id: provided by the request module;
type: 1-RNR, 0-ACK;
vld: whether the node is valid; vld=0 indicates that the node has been deregistered, and timer_doorbell is not initiated.
rvs-0 (11 bits): a spare field (reserved bits and status bits) that may be used for specific purposes in practical applications, such as status flags or other reserved bits.
FLRset (1 bit): the FLR (Function Level Reset) set bit. When the FLRset bit is set (1), it indicates that FLR processing for the corresponding Function has been started; when the FLRset bit is reset (0), it indicates that FLR processing for the corresponding Function has completed.
vld (1 bit): a flag bit indicating whether the time node is valid. When vld=0, the time node has been deregistered or marked invalid, and timer_doorbell is not initiated. When vld=1, the time node is valid and the corresponding operation or event is triggered. "doorbell" refers to a notification mechanism that informs a target device or process that an event has occurred or an operation has completed.
type (1 bit): the type of the time node. type=1 denotes RNR (Receiver Not Ready), and type=0 denotes ACK (Acknowledgement). RNR and ACK are two types commonly used in RDMA communication: RNR indicates that the receiver is temporarily unable to receive data, and ACK acknowledges received data.
Channel_id (10 bits): representing the identifier of the channel. In RDMA systems, channels are used to transfer data from a sender to a receiver. The channel_id field is used to identify a particular channel for differentiation and selection in the processing and scheduling of the time node.
Host_id (3 bit): representing the identifier of the host. In RDMA systems, the host_id field is used to identify the different hosts or nodes. It may be used to associate a particular time node with a particular host for identification and routing in the processing and scheduling in the system.
Function_id (13 bit): the identifier of the Function. The function_id field identifies the Function to which the time node belongs, so that the processing of the time node can be associated with that Function.
QPN (24 bit): the number indicating QP. QP is used to establish and manage transmit and receive queues for data transmissions. The QPN field is used to identify the particular QP to associate the time node with the correct queue.
Wherein, QPN, function_id, host_id and channel_id are provided by the request module of the host end.
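The field widths listed above add up to exactly 64 bits, which can be illustrated by packing a time node into a single integer. This is a sketch only: it assumes the fields are concatenated in the listed order with rvs-0 in the most significant bits, which the text does not state explicitly, and the helper names are hypothetical.

```python
# Hypothetical sketch: assumes MSB-first concatenation in the order given in the text.
FIELDS = [
    ("rvs", 11), ("flrset", 1), ("vld", 1), ("type", 1),
    ("channel_id", 10), ("host_id", 3), ("function_id", 13), ("qpn", 24),
]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 64 bits, i.e. 8 bytes


def pack_time_node(**values: int) -> int:
    """Pack named fields into one 64-bit word; omitted fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_time_node(word: int) -> dict[str, int]:
    """Split a 64-bit word back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

A node for an ACK timer might be built as `pack_time_node(vld=1, type=0, channel_id=1, host_id=2, function_id=5, qpn=0x123456)`.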
Referring again to FIG. 1, in an embodiment of the present application, the function level reset processing method for use in RDMA further includes the following steps:
Step S10, receiving a Function level reset (FLR) command message sent by the host for a Function, and returning an FLR response message to the host after the RDMA device (such as a PCIe device) releases resources;
Specifically, after receiving the FLR command message sent by the host, the PCIe device cleans up and releases the resources occupied by the Function or the device, for example by disconnecting connection resources and resetting device state. Once this is complete, an FLR response message can be returned to the host.
Step S11, marking the Function as being in an FLR processing state, and obtaining a corresponding ending time stamp of the FLR processing;
More specifically, in a specific example, step S11 further includes:
Setting (to 1) the corresponding bit in the Function identification bitmap of the channel where the Function is located, the set bit indicating that the current FLR processing has started; and writing the end timestamp corresponding to the Function into a channel timestamp table; wherein the Function identification bitmap contains one bit for each Function identifier.
FIG. 4 shows the Function identification bitmap of one channel, where Ch1id 0 to Ch1idn represent the bits corresponding to the n Functions in channel 1. A set bit (1) indicates that the corresponding Function is undergoing FLR processing (FLR state); a reset bit (0) indicates that the corresponding Function has completed FLR processing (non-FLR state).
FIG. 5 shows the structure of a channel timestamp table. The table stores, for each of a plurality of Functions, the time wheel end point (TIMESTAMPN) of that Function's FLR, which indicates the expected completion time of the FLR processing. TIMESTAMPN is used to determine which Functions' time nodes need to be reclaimed.
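The bookkeeping in step S11 can be sketched as a small per-channel structure holding the bitmap and the timestamp table. The class and attribute names are hypothetical, the timestamp table is modelled as a dictionary for brevity, and the end timestamp is passed in by the caller because the text does not specify how it is computed.

```python
# Hypothetical sketch of step S11; names and the dictionary representation are assumptions.
class ChannelFlrState:
    """Per-channel FLR bookkeeping: one bitmap bit and one timestamp entry per Function."""

    def __init__(self) -> None:
        self.bitmap = 0                            # bit i set => Function i is in the FLR state
        self.end_timestamps: dict[int, int] = {}   # stand-in for the channel timestamp table

    def start_flr(self, function_id: int, end_timestamp: int) -> None:
        """Mark the Function as being in FLR and record its end timestamp."""
        self.bitmap |= 1 << function_id
        self.end_timestamps[function_id] = end_timestamp

    def in_flr(self, function_id: int) -> bool:
        return bool((self.bitmap >> function_id) & 1)
```

A hardware implementation would presumably use a fixed-size table indexed by Function identifier; the dictionary here only keeps the sketch short.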
Step S12, reclaiming the time nodes corresponding to the Function in the host memory according to the mark of the Function and the end timestamp;
More specifically, step S12 further includes:
While the bit corresponding to the Function in the bitmap is set (1), if a time node corresponding to the Function expires before the end timestamp arrives, discarding that time node; and if a new time node allocated by the host to the Function is received, discarding it directly.
It will be appreciated that the time wheel is shared by multiple Functions and tracks a common reference time that keeps rolling forward. When a scale of a time wheel is reached, the time nodes in its linked list are polled; those whose expected time point has been reached are read out for processing. If the Function corresponding to a read time node is in the FLR state, the time node is discarded. If it is in the non-FLR state, the QP task in the time node is read and the corresponding subsequent processing is carried out.
This step therefore starts the cleanup and reclamation of the time nodes associated with the Function. A time node that has already been written into the host's external storage space (DDR) can be discarded directly when it is read out, because its corresponding resources have already been reclaimed by software. Likewise, time nodes newly allocated to the Function can be discarded directly, which prevents invalid time nodes from occupying resources.
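The two discard rules of step S12 can be sketched as follows. Time nodes are modelled as dictionaries with a `function_id` key, and the function names are hypothetical; the sketch covers only the basic case and ignores the re-enable handling described later.

```python
# Hypothetical sketch of the step S12 discard rules; data model and names are assumptions.
def in_flr(flr_bitmap: int, function_id: int) -> bool:
    """True if the Function's bit is set in the Function identification bitmap."""
    return bool((flr_bitmap >> function_id) & 1)


def on_scale_reached(nodes: list[dict], flr_bitmap: int) -> tuple[list[dict], list[dict]]:
    """Split nodes read out from one scale's linked list into (to_process, discarded)."""
    to_process, discarded = [], []
    for node in nodes:
        if in_flr(flr_bitmap, node["function_id"]):
            discarded.append(node)    # Function is in the FLR state: drop the node
        else:
            to_process.append(node)   # normal case: handle the QP task
    return to_process, discarded


def accept_new_node(node: dict, flr_bitmap: int) -> bool:
    """A newly allocated node is dropped while its Function is in the FLR state."""
    return not in_flr(flr_bitmap, node["function_id"])
```

Note that neither rule touches external memory beyond the reads the time wheel performs anyway, which is the point of the scheme.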
And step S13, marking the Function as a finished FLR processing state when the time node corresponding to the Function is recovered.
More specifically, in the step S13, further including:
Step S130, when the ending time stamp arrives, resetting the corresponding bit in the Function identification bitmap to indicate that the current FLR processing is ended;
The channel timestamp table of each channel is polled by Function identification. If a stored timestamp matches the timestamp of the current time wheel, the time wheel of the previous FLR has expired, and the corresponding bit in the Function identification bitmap can be reset (set to 0).
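A minimal sketch of this polling step is shown below. It assumes the channel timestamp table is a list indexed by Function identification, with `None` for Functions that have no FLR recorded; the function name `expire_flr_bits` is hypothetical.

```python
def expire_flr_bits(flr_bitmap: int, end_timestamps: list, current_tick: int) -> int:
    """Clear the bit of every Function whose recorded end timestamp equals the current tick."""
    for function_id, end_tick in enumerate(end_timestamps):
        bit = 1 << function_id
        if (flr_bitmap & bit) and end_tick == current_tick:
            flr_bitmap &= ~bit   # FLR time wheel for this Function has expired
    return flr_bitmap
```

For example, with Functions 1 and 3 in FLR and end timestamps 7 and 9, polling at tick 7 clears only the bit of Function 1.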
In other examples, step S13 may be implemented by the following steps:
Step S131, before the end timestamp arrives, the number of time nodes corresponding to the Function is detected to be zero, and the corresponding bit in the Function identification bitmap is reset (set to 0).
In a specific application, the number of time nodes of each Function can be recorded: when a time node is written into the external storage space, the node count is increased by 1, and after a time node is read out from the external storage space, the node count is decreased by 1. If the number of time nodes corresponding to the Function is 0, the FLR procedure of the Function may be ended early, and the corresponding bit in the Function identification bitmap is reset (set to 0).
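The counting scheme can be sketched as follows, under the assumption that one counter per Function is kept next to the bitmap. The class and method names are hypothetical, and ending the FLR immediately when the count is already zero at the start of FLR is an interpretation of the text rather than something it states explicitly.

```python
class FlrNodeCounter:
    """Per-Function count of time nodes held in external storage (illustrative model)."""

    def __init__(self, num_functions: int) -> None:
        self.node_count = [0] * num_functions
        self.flr_bitmap = 0

    def on_node_written(self, function_id: int) -> None:
        """A time node of this Function was written to external storage."""
        self.node_count[function_id] += 1

    def on_node_read(self, function_id: int) -> None:
        """A time node of this Function was read out of external storage."""
        self.node_count[function_id] -= 1
        self._finish_if_drained(function_id)

    def begin_flr(self, function_id: int) -> None:
        """Mark the Function as in FLR; end at once if it owns no nodes (assumption)."""
        self.flr_bitmap |= 1 << function_id
        self._finish_if_drained(function_id)

    def _finish_if_drained(self, function_id: int) -> None:
        # Early end: no outstanding nodes, so there is nothing left to discard.
        if self.node_count[function_id] == 0:
            self.flr_bitmap &= ~(1 << function_id)
```

The counter lets the FLR state be cleared as soon as the last outstanding node has been read, without waiting for the end timestamp.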
Still further, the method comprises the steps of:
Step S132, before the end timestamp arrives, if it is detected that the Function has been re-enabled, then when a new time node allocated by the host for the Function is received, the FLRset bit in the new time node is set to 1, and the new time node is mounted to the external storage space of the host.
Specifically, the Function may be restarted during FLR, and the host may then allocate time nodes to it again. If the corresponding bit in the Function identification bitmap still indicates the FLR state at that moment, the earlier time nodes of the Function have not all been discarded yet. The time nodes allocated after the Function has returned to normal can nevertheless be mounted to the external storage space as usual. However, the FLRset bit in the newly allocated time node needs to be set to 1, marking that the time nodes from this node onward are valid and must not be discarded.
In a specific example, the above step S132 may be implemented by the following scheme:
Step S1320, at the start of each time wheel round, the Function identification bitmap of the channel where the Function is located is copied, forming an original Function identification bitmap and a copied Function identification bitmap. The original Function identification bitmap is used to poll the timestamp table of each channel: if a timestamp in the table matches the current time wheel, the time wheel of the previous FLR has expired, and the corresponding bit of the original Function identification bitmap may be reset (set to 0).
Step S1322, the copied Function identification bitmap is used for the current time wheel round. When a time node allocated by the host is received and the corresponding bit in the original Function identification bitmap is detected to be set, then, if the Function identification is valid, it is judged that the Function has been re-enabled. The FLRset bit of the received time node is set to 1, marking that the time nodes from this node onward are valid while the time nodes read out before it are to be discarded, and the corresponding bit in the copied Function identification bitmap is reset (set to 0). This indicates that a time node has already been marked with FLRset, so no further marking is required afterwards.
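One possible reading of steps S1320 and S1322 is sketched below. It is an interpretation, not a definitive implementation: the copied bitmap is treated as "FLR in progress and no boundary node marked yet in this round", so only the first node received after re-enabling gets FLRset. The names `ReenableTracker`, `start_round`, and the `function_enabled` flag (standing in for "the Function identification is valid") are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TimeNode:
    function_id: int
    qp_number: int
    flr_set: bool = False   # FLRset bit: nodes from here on are valid


class ReenableTracker:
    def __init__(self) -> None:
        self.original_bitmap = 0   # bit set: Function is in FLR processing
        self.copied_bitmap = 0     # bit set: no FLRset boundary marked yet this round

    def start_round(self) -> None:
        """At the start of a time wheel round, copy the original bitmap (step S1320)."""
        self.copied_bitmap = self.original_bitmap

    def on_node_received(self, node: TimeNode, function_enabled: bool) -> str:
        """Handle a node allocated by the host (step S1322, as interpreted here)."""
        bit = 1 << node.function_id
        if not (self.original_bitmap & bit):
            return "mount"                  # Function is not in FLR: normal path
        if not function_enabled:
            return "discard"                # still in FLR and not re-enabled
        if self.copied_bitmap & bit:
            node.flr_set = True             # first node after re-enable marks the boundary
            self.copied_bitmap &= ~bit      # later nodes need no further marking
        return "mount"
```

Under this reading, a reader of the linked list can discard every node of the Function that precedes the FLRset-marked node and treat the marked node and its successors as valid.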
In some practical applications, the situation that FLR is performed multiple times for one Function is also faced, and the following manner may be adopted to process:
In one example, if the bit corresponding to a Function in the Function identification bitmap of a channel is 1 and an FLR with the same Function identification arrives, the previous FLR has not finished and the host has initiated FLR again. In this case the previous FLR is maintained and the new FLR is not executed.
In other examples, multiple FLRs may also be supported. Doing so requires additional resources such as Function identification bitmaps and channel timestamp tables. For example, multiple Function identification bitmaps and channel timestamp tables may be used, each corresponding to a different FLR. In addition, identification bits (owner bits) can be added to support multiple FLRs for a given Function. Different FLRs are assigned different identification bits; when the time nodes between two identification bits are reclaimed, the node count of the corresponding FLR is decremented, and when the node count of an FLR returns to zero, the Function identification bitmap corresponding to that FLR is reset.
By implementing the method of the application, the time wheel end time of the FLR processing of a Function is recorded and the Function identification bitmap is set, so that time nodes expiring before the end time can be quickly identified as discardable. The FLR response can be returned immediately, so that the host finishes the FLR flow early and releases the Function resources early; resources do not have to be held until the hardware has read out each time node and found it invalid, which improves FLR processing efficiency;
By implementing the scheme of the application, the external storage space does not need to be scanned or accessed for reading and writing during FLR processing, and time nodes can be reclaimed quickly at the cost of only a few additional resources such as the Function identification bitmap and the channel timestamp table, which reduces the resources required for FLR processing;
By adding FLRset bits to the time nodes, it can be quickly identified which time nodes are before FLR processing and which are after FLR processing, and the time nodes allocated after Function restart can be used normally.
In addition, by counting the number of time nodes, the FLR processing flow of a Function whose node count has returned to zero can be finished early, without waiting for the time wheel to expire.
By extending the scheme of the application, FLR processing can be performed multiple times on the same Function.
FIG. 7 is a schematic diagram of one embodiment of a Function level reset processing apparatus for RDMA according to the present application. The apparatus 1 comprises at least:
The FLR command processing unit 10 is used for receiving a Function level reset (FLR) command message sent by the host for a Function, and returning an FLR response message to the host after the RDMA device releases resources;
the FLR state marking processing unit 11 is configured to mark the Function as having entered an FLR processing state, and obtain an end timestamp corresponding to the FLR processing;
A time node recovery unit 12, configured to perform recovery processing on a time node corresponding to the Function in the host memory according to the indication of the Function and the end timestamp;
And the recovery end processing unit 13 is configured to mark the Function as having finished FLR processing when the time nodes corresponding to the Function have been reclaimed.
The apparatus further includes:
The time node storage processing unit 14 is configured to allocate at least one time node for each Function, classify according to a time wheel type and a scale, and store the time node in an external storage space of the host in a linked list manner;
wherein the data structure of each of the time nodes comprises: spare field, FLRset bits, valid bit, type bit, channel identification field, host identification field, function identification field, and QP number field.
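To make the listed fields concrete, the sketch below packs them into a single 32-bit word and unpacks them again. The text does not give field widths, bit order, or total size, so every width here (and the names `FIELDS`, `pack_node`, `unpack_node`) is a hypothetical choice for illustration only.

```python
# (name, width in bits), most significant field first. All widths are assumptions.
FIELDS = [
    ("spare", 3),
    ("flr_set", 1),
    ("valid", 1),
    ("type", 1),
    ("channel_id", 2),
    ("host_id", 4),
    ("function_id", 8),
    ("qp_number", 12),
]


def pack_node(**values: int) -> int:
    """Pack the named fields into one integer; unspecified fields default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if value >> width:   # nonzero for values too large or negative
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_node(word: int) -> dict:
    """Split a packed word back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields
```

Keeping FLRset and valid as single bits next to the Function identification means the discard decision can be made from the node header alone, before the QP task is examined.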
As shown in fig. 8, the FLR state marking processing unit 11 further includes:
a setting processing unit 110, configured to set the corresponding bit in the Function identification bitmap of the channel where the Function is located, to indicate that the current FLR processing has started;
An ending time stamp writing unit 111, configured to write the ending time stamp corresponding to the Function into the channel time stamp table; the Function identification bitmap contains a bit corresponding to each Function identification.
As shown in fig. 9, the time node reclamation unit 12 further includes:
The condition judging unit 120 is configured to judge whether the bit corresponding to the Function in the bitmap is currently set and whether the ending timestamp has not yet arrived;
A time node discarding unit 121, configured to, while the bit corresponding to the Function in the bitmap is set, discard a time node corresponding to the Function if the time of that time node arrives before the end timestamp arrives; and, if a new time node allocated to the Function by the host is received, discard that time node directly.
As shown in fig. 10, the recovery end processing unit 13 further includes:
a first ending unit 130, configured to reset a corresponding bit in the Function identification bitmap when the ending timestamp arrives, to indicate that current FLR processing has ended;
And the second ending unit 131 is configured to reset a corresponding bit in the Function identification bitmap after detecting that the number of time nodes corresponding to the Function is zero before the ending timestamp arrives.
It will be appreciated that the apparatus 1 of the present application further comprises:
The Function restart processing unit 15, configured to, before the end timestamp arrives, if it is detected that the Function has been re-enabled, set the FLRset bit in a new time node to 1 when the new time node allocated by the host for the Function is received, and mount the new time node to the external storage space of the host.
Still further, the Function restart processing unit 15 further includes:
The copying unit 150 is configured to copy the Function identification bitmap in the channel where the Function is located when each time round starts, so as to form an original Function identification bitmap and a copied Function identification bitmap; the original Function identification bitmap is used for polling a timestamp table of the current channel;
The copy identification bitmap processing unit 151 is configured to use the copied Function identification bitmap for the current time wheel round; when a time node allocated by the host is received and the corresponding bit in the original Function identification bitmap is detected to be set, to judge, if the Function identification is valid, that the Function has been re-enabled; to set the FLRset bit of the received time node to 1, marking that the time nodes from this node onward are valid while the time nodes read out before it are to be discarded; and to reset the corresponding bit in the copied Function identification bitmap.
Accordingly, in a further aspect of the present application, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described above with reference to figures 1 to 6. For more details, refer to the foregoing description of figs. 1 to 6, which is not repeated here.
The embodiment of the application has the following beneficial effects:
The application provides a Function level reset (FLR) processing method, device and storage medium for RDMA (remote direct memory access). The time wheel end time of the FLR processing of a Function is recorded and the Function identification bitmap is set, so that time nodes expiring before the end time can be quickly identified as discardable. The FLR response can be returned immediately, so that the host finishes the FLR flow early and releases the Function resources early; resources do not have to be held until the hardware has read out each time node and found it invalid, which improves FLR processing efficiency;
By implementing the scheme of the application, the external storage space does not need to be scanned or accessed for reading and writing during FLR processing, and time nodes can be reclaimed quickly at the cost of only a few additional resources such as the Function identification bitmap and the channel timestamp table, which reduces the resources required for FLR processing;
By adding FLRset bits to the time nodes, it can be quickly identified which time nodes are before FLR processing and which are after FLR processing, and the time nodes allocated after Function restart can be used normally.
In addition, by counting the number of time nodes, the FLR processing flow of a Function whose node count has returned to zero can be finished early, without waiting for the time wheel to expire.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above disclosure is only a preferred embodiment of the present application, and it is needless to say that the scope of the application is not limited thereto, and therefore, the equivalent changes according to the claims of the present application still fall within the scope of the present application.
Claims (11)
1. A Function level reset processing method for RDMA, comprising at least the steps of:
receiving a Function level reset (FLR) command message sent by a host for a Function, and returning an FLR response message to the host after the RDMA device releases resources;
Marking the Function as having entered the FLR processing state, and obtaining an end timestamp corresponding to the FLR processing, including: setting the corresponding bit in the Function identification bitmap of the channel where the Function is located, to indicate that the current FLR processing has started; writing the end timestamp corresponding to the Function into a channel timestamp table; wherein the Function identification bitmap contains a bit corresponding to each Function identification;
According to the mark of the Function and the end timestamp, reclaiming the time nodes corresponding to the Function in the host memory, including: while the bit corresponding to the Function in the bitmap is set, if the time of a time node corresponding to the Function arrives before the end timestamp arrives, discarding the time node; and if a new time node allocated by the host for the Function is received, discarding that time node directly;
and marking the Function as having finished FLR processing when the time nodes corresponding to the Function have been reclaimed.
2. The method of processing according to claim 1, further comprising:
At least one time node is allocated to each Function in advance, classified according to the time wheel type and the scale, and stored in an external storage space of a host in a linked list mode;
wherein the data structure of each of the time nodes comprises: spare field, FLRset bits, valid bit, type bit, channel identification field, host identification field, function identification field, and QP number field.
3. The method of claim 2, wherein marking the Function as having finished FLR processing when the time nodes corresponding to the Function have been reclaimed further comprises:
resetting the corresponding bit in the Function identification bitmap when the end timestamp arrives, to indicate that the current FLR processing has ended; or
before the end timestamp arrives, upon detecting that the number of time nodes corresponding to the Function is zero, resetting the corresponding bit in the Function identification bitmap.
4. The method as recited in claim 3, further comprising:
Before the end timestamp arrives, if it is detected that the Function has been re-enabled, then when a new time node allocated by the host for the Function is received, setting the FLRset bit in the new time node to 1, and mounting the new time node to the external storage space of the host.
5. The method as recited in claim 4, further comprising:
Copying the Function identification bitmap in the channel where the Function is located when each time round starts to form an original Function identification bitmap and a copied Function identification bitmap; the original Function identification bitmap is used for polling a timestamp table of the current channel;
Using the copied Function identification bitmap for the current time wheel round; when a time node allocated by the host is received and the corresponding bit in the original Function identification bitmap is detected to be set, judging, if the Function identification is valid, that the Function has been re-enabled; setting the FLRset bit of the received time node to 1, marking that the time nodes from this node onward are valid while the time nodes read out before it are to be discarded; and resetting the corresponding bit in the copied Function identification bitmap.
6. A Function level reset processing apparatus for use in RDMA, comprising at least:
the FLR command processing unit is used for receiving a Function level reset (FLR) command message sent by the host for a Function, and returning an FLR response message to the host after the RDMA device releases resources;
the FLR state marking processing unit is used for marking the Function as being in an FLR processing state and obtaining an ending time stamp corresponding to the FLR processing;
the time node recovery unit is used for carrying out recovery processing on the time node corresponding to the Function in the host memory according to the mark of the Function and the ending time stamp;
The recovery end processing unit is used for marking the Function as a finished FLR processing state when the time node corresponding to the Function is recovered;
wherein the FLR state marking processing unit further includes:
the setting processing unit is used for setting the corresponding bit in the Function identification bitmap of the channel where the Function is located, to indicate that the current FLR processing has started;
the ending time stamp writing unit is used for writing the ending time stamp corresponding to the Function into a channel time stamp table; the Function identification bitmap contains a bit corresponding to each Function identification;
The time node reclamation unit further includes:
A time node discarding unit, configured to, while the bit corresponding to the Function in the bitmap is set, discard a time node corresponding to the Function if the time of that time node arrives before the end timestamp arrives; and, if a new time node allocated to the Function by the host is received, discard that time node directly.
7. The apparatus as recited in claim 6, further comprising:
The time node storage processing unit is used for distributing at least one time node for each Function, classifying according to the time wheel type and the scales and storing the time nodes into an external storage space of the host in a linked list mode;
wherein the data structure of each of the time nodes comprises: spare field, FLRset bits, valid bit, type bit, channel identification field, host identification field, function identification field, and QP number field.
8. The apparatus of claim 7, wherein the recovery end processing unit further comprises:
the first ending unit is used for resetting the corresponding bit in the Function identification bitmap when the ending time stamp arrives, and indicating that the current FLR processing is ended;
and the second ending unit is used for resetting the corresponding bit in the Function identification bitmap after detecting that the number of the time nodes corresponding to the Function is zero before the ending time stamp arrives.
9. The apparatus as recited in claim 8, further comprising:
A Function restart processing unit, configured to, before the end timestamp arrives, if it is detected that the Function has been re-enabled, set the FLRset bit in a new time node to 1 when the new time node allocated by the host for the Function is received, and mount the new time node to the external storage space of the host.
10. The apparatus as recited in claim 9, further comprising:
The copying unit is used for copying the Function identification bitmap in the channel where the Function is located when each time round starts to form an original Function identification bitmap and a copied Function identification bitmap; the original Function identification bitmap is used for polling a timestamp table of the current channel;
The copy identification bitmap processing unit is configured to use the copied Function identification bitmap for the current time wheel round; when a time node allocated by the host is received and the corresponding bit in the original Function identification bitmap is detected to be set, to judge, if the Function identification is valid, that the Function has been re-enabled; to set the FLRset bit of the received time node to 1, marking that the time nodes from this node onward are valid while the time nodes read out before it are to be discarded; and to reset the corresponding bit in the copied Function identification bitmap.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311195508.7A CN116932273B (en) | 2023-09-18 | 2023-09-18 | Function level reset processing method and device used in RDMA and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116932273A CN116932273A (en) | 2023-10-24 |
CN116932273B true CN116932273B (en) | 2024-06-11 |
Family
ID=88388231
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311195508.7A Active CN116932273B (en) | 2023-09-18 | 2023-09-18 | Function level reset processing method and device used in RDMA and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116932273B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117956054B (en) * | 2024-03-26 | 2024-06-11 | 上海云豹创芯智能科技有限公司 | Method, system, chip and storage medium for realizing timer processing in RDMA |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104753816A (en) * | 2015-03-27 | 2015-07-01 | 华为技术有限公司 | RDMA (remote direct memory access) connection message processing method and related device |
CN107147722A (en) * | 2017-05-19 | 2017-09-08 | 郑州云海信息技术有限公司 | A kind of IB RTI methods based on RDMA communication mechanisms |
CN111277616A (en) * | 2018-12-04 | 2020-06-12 | 中兴通讯股份有限公司 | RDMA (remote direct memory Access) -based data transmission method and distributed shared memory system |
CN113326155A (en) * | 2021-06-28 | 2021-08-31 | 深信服科技股份有限公司 | Information processing method, device, system and storage medium |
CN113873008A (en) * | 2021-08-30 | 2021-12-31 | 浪潮电子信息产业股份有限公司 | Connection reconfiguration method, device, system and medium for RDMA network node |
WO2022028456A1 (en) * | 2020-08-07 | 2022-02-10 | 中兴通讯股份有限公司 | Congestion control method and apparatus, network node device and computer-readable storage medium |
CN114827234A (en) * | 2022-04-29 | 2022-07-29 | 广东浪潮智慧计算技术有限公司 | Data transmission method, system, device and storage medium |
CN115801642A (en) * | 2023-02-13 | 2023-03-14 | 深圳市泛联信息科技有限公司 | RDMA communication management module, method, device and medium based on state control |
CN116366317A (en) * | 2023-03-17 | 2023-06-30 | 清华大学 | Remote memory access protection mechanism construction method, remote memory node and equipment |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8880935B2 (en) * | 2012-06-12 | 2014-11-04 | International Business Machines Corporation | Redundancy and load balancing in remote direct memory access communications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||