CN115543219A - Method, device, equipment and medium for optimizing host IO processing - Google Patents

Method, device, equipment and medium for optimizing host IO processing

Info

Publication number
CN115543219A
CN115543219A
Authority
CN
China
Prior art keywords
control
control block
page table
host
ring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211508344.4A
Other languages
Chinese (zh)
Other versions
CN115543219B (en)
Inventor
崔健
王江
李树青
李幸远
孙华锦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202211508344.4A priority Critical patent/CN115543219B/en
Publication of CN115543219A publication Critical patent/CN115543219A/en
Application granted granted Critical
Publication of CN115543219B publication Critical patent/CN115543219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • G06F3/0611Improving I/O performance in relation to response time
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604Improving or facilitating administration, e.g. storage management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629Configuration or reconfiguration of storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/064Management of blocks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to the field of storage and discloses a method, a device, equipment and a medium for optimizing host IO processing. The method comprises the following steps: acquiring the space size and the execution order of the control blocks required by a host IO, and establishing a plurality of control block rings according to the space size and the execution order; filling the control blocks required by the host IO onto the plurality of control block rings in batches according to the execution order; and starting execution of the control blocks required by the host IO in response to the first batch fill completing. The disclosed method improves storage space utilization, allows IO to balance performance against efficiency, and reduces IO processing latency.

Description

Method, device, equipment and medium for optimizing host IO processing
Technical Field
The present invention relates to the field of storage, and in particular, to a method, an apparatus, a device, and a medium for optimizing host IO processing.
Background
With the rise of Computational Storage technology, a computational storage architecture reduces the corresponding data movement by migrating data computation from the host CPU to a data processing acceleration unit close to the storage unit, thereby releasing system performance to a greater degree.
In the prior art, a microcode-driven general computing acceleration chip architecture, the UAA (Universal Acceleration Architecture), is connected to the host through a PCIe (Peripheral Component Interconnect Express) interface. The UAA is divided into two parts, a control plane and a data plane; the control plane implements acceleration tasks on the microcode-driven architecture, with work flowing step by step between acceleration engine modules. The main CP (Control Page table)/CB (Control Block) mode under the UAA architecture suffers from efficiency and performance problems. Specifically, as storage functions grow more complex, the number of CBs required by an IO increases significantly, the demand on the storage space of a CP or CP chain grows, and the number of CPs and CP chains that can be accommodated simultaneously in an on-chip storage space of a given size decreases, reducing parallel IO processing capability. CB processing times also vary widely: a considerable portion of the CBs in the CP page table and CP chain of an IO perform DDR (Double Data Rate) memory access, hard disk access, or complex operations that take a long time to complete, during which all already-executed preceding CBs occupy memory to no purpose; conversely, for CBs waiting to execute later, the closer to the tail a CB is, the longer it waits, further reducing memory utilization. From the perspective of IO processing latency, under the scheme where the first CB is handed to the UAA only after all CBs have been created, latency also grows as the number of CBs to create increases.
Disclosure of Invention
In view of this, the present invention provides a method, a device, equipment, and a medium for optimizing host IO processing. To improve the utilization of CP storage space and reduce IO processing latency, a ring control page table (Ring CP) mode is defined on top of the single-CP and chained-CP definitions. When a CP or CP chain uses the ring control page table mode, all CB storage space of one host IO is organized as a ring, i.e., a control block ring (CB Ring). In CB Ring mode, all CBs required by an IO are generated dynamically in batches according to the execution order and filled onto the CB Ring, and their space is reclaimed dynamically after execution, until the complete IO flow has been processed. The storage space of the CB Ring can be smaller than the total space a complete CB sequence of one IO would require, reducing space occupation; after a CB executes, its space can be reclaimed, improving storage space utilization; and because CBs are generated and consumed dynamically, execution can start once only part of the CBs have been generated, reducing host IO latency.
Based on the above objectives, an aspect of the embodiments of the present invention provides a method for optimizing host IO processing, including the following steps: acquiring the space size and the execution sequence of a control block required by host IO, and establishing a plurality of control block rings according to the space size and the execution sequence; filling control blocks required by the host IO to the plurality of control block rings in batches according to the execution sequence; and starting execution of the control blocks required by the host IO in response to the control blocks required by the host IO completing the first batch filling.
In some embodiments, the method further comprises: and adding the judgment of the ring mode and the parameter information required by generating the corresponding control block ring by the control page table of the ring mode in the definition of the control page table to obtain the definition of the updated control page table.
In some embodiments, the adding, in the definition of the control page table, the judgment of the ring mode and parameter information required for generating the corresponding control block ring by the control page table of the ring mode, and obtaining the updated definition of the control page table includes: updating the definition of a head region of a control page table based on judging whether the current control page table is in a ring mode or not and acquiring the position of the current control page table in a control page linked list; responding to the current control page table being a ring-shaped control page table, acquiring parameter information required by the ring-shaped control page table for generating a corresponding control block ring, setting corresponding parameter information in an additional parameter table based on the parameter information, and setting an additional parameter table offset address in the ring-shaped control page table to obtain a starting position of the additional parameter table of the ring-shaped control page table.
In some embodiments, said updating the definition of the header region of the control page table based on determining whether the current control page table is in the ring mode and obtaining the location of the current control page table in the control page linked list comprises: updating the attribute judgment of the control page table based on judging whether the current control page table is in the ring mode; responding to the current control page table being a control page table in a ring mode, obtaining the position of the control page table in the ring mode in the control page linked list, and setting a pointer pointing to the address of the first control block of the next control page table in the control page linked list in the ring mode.
In some embodiments, the obtaining the space size and the execution order of the control blocks required by the host IO, and the establishing a plurality of control block rings according to the space size and the execution order includes: and establishing a plurality of ring-mode control page tables based on the definition of the updated control page tables, the acquired space size of the control block required by the host IO and the execution sequence, and generating a plurality of corresponding control block rings through the plurality of ring-mode control page tables.
In some embodiments, the obtaining the space size and the execution order of the control blocks required by the host IO, and the establishing a plurality of control block rings according to the space size and the execution order further includes: and setting an additional parameter table in a control block storage area of the last control page table of the control page linked list according to the concatenation sequence, wherein the additional parameter table is used for storing parameter information required by a corresponding control block ring generated by the control page table in the ring mode.
In some embodiments, the setting an additional parameter table in a control block storage area of a last control page table of the control page linked list in the concatenation order for storing parameter information required by a corresponding control block ring generated by the ring mode control page table includes: and in response to the fact that the parameter volume of the additional parameter table is larger than the control block storage area of the last control page table, sequentially occupying the control block storage area of the previous control page table forward in a reverse order according to the concatenation order.
In some embodiments, the obtaining the space size and the execution order of the control blocks required by the host IO, and the establishing the plurality of control block rings according to the space size and the execution order further includes: setting a control block type corresponding to a mode of a control page table; status flags for the control blocks are established in the control block headers of the control blocks to perform different processing logic depending on the status flags for the different control blocks.
In some embodiments, the establishing status flags for the control blocks in the control block headers, to perform different processing logic according to the status flags of different control blocks, comprises: setting a control block valid flag, a completion flag, and an end flag, wherein the completion flag and the end flag are meaningful only when the valid flag is set.
In some embodiments, the setting a control block type corresponding to the mode of the control page table comprises: setting a loopback control block flag to indicate that the current control block is located at the tail of the linear address range of the control block ring's storage space; and setting an actively-generated control block flag to indicate that the control blocks following the current control block have not yet been generated, and, in response to the control block carrying the actively-generated control block flag being called, handing it to a generation engine to generate the following control blocks and fill them onto the corresponding control block ring.
In some embodiments, the batching the control blocks needed by the host IO onto the plurality of control block rings according to the execution order includes: according to the analysis of the firmware on the command of the host IO, creating a first batch of control page tables for the host IO, and sequentially filling control blocks onto control block rings corresponding to the first batch of control page tables according to an execution sequence; and setting the firmware to transmit the address of the first filled control block to a work queue of a corresponding engine to perform queuing.
In some embodiments, the starting execution of the required control blocks of the host IO in response to the required control blocks of the host IO completing the first bulk fill includes: and in response to the detection of the actively generated control block mark, sending a notification to the firmware through a work queue management engine to complete the first filling of the control block, recovering a space corresponding to a control page table which is filled in the first batch, and starting the execution of the control block required by the host IO.
In another aspect of the embodiments of the present invention, an apparatus for optimizing host IO processing is further provided, including the following modules: the system comprises a first module, a second module and a third module, wherein the first module is configured to acquire the space size and the execution sequence of a control block required by host IO and establish a plurality of control block rings according to the space size and the execution sequence; a second module configured to batch fill the control blocks required by the host IO onto the plurality of control block rings according to the execution order; and the third module is configured to respond to the completion of the first batch filling of the control blocks required by the host IO and start the execution of the control blocks required by the host IO.
In another aspect of the embodiments of the present invention, there is also provided a computer device, including at least one processor; and a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of any of the methods described above.
In yet another aspect of the embodiments of the present invention, a computer-readable storage medium is further provided, in which a computer program for implementing any one of the above method steps when executed by a processor is stored.
The invention has at least the following beneficial effects. By establishing control block rings, the storage space of a control block ring is allowed to be smaller than the total space required by the complete control block sequence of a host IO, reducing the storage space demanded by control page tables. The control block ring generated under a ring-mode control page table is built at control page table granularity and is therefore flexible, allowing host IO to balance performance against efficiency. Control blocks running on the ring of a ring-mode control page table have their space reclaimed after execution completes, improving storage space utilization. Because control blocks are generated and reclaimed dynamically, a host IO can start as soon as only part of its control blocks have been generated, reducing IO processing latency. Furthermore, the loopback control block flag and the actively-generated control block flag allow CB execution and CB generation to proceed in parallel, i.e., the time consumed by CB generation is hidden within the execution of other CBs.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained according to the drawings without creative efforts.
FIG. 1 illustrates a schematic diagram of a general computing architecture;
FIG. 2 is a diagram illustrating an embodiment of a method for host IO processing in the prior art;
FIG. 3 is a diagram showing the structure of a single control page table in the prior art;
FIG. 4 is a diagram illustrating the structure of a chain controlled page table in the prior art;
FIG. 5 is a schematic diagram illustrating an embodiment of a method for optimizing host IO processing according to the present invention;
FIG. 6 is a schematic diagram illustrating another embodiment of a method for optimizing host IO processing according to the present invention;
FIG. 7 is a schematic diagram illustrating an active trigger and a trigger scenario in an optimization method for host IO processing according to the present invention;
FIG. 8 is a schematic diagram illustrating an embodiment of an apparatus for optimizing host IO processing according to the present invention;
FIG. 9 is a schematic diagram illustrating one embodiment of a computer device provided by the present invention;
FIG. 10 is a schematic diagram illustrating one embodiment of a computer-readable storage medium provided by the present invention.
Detailed Description
Embodiments of the present invention are described below. However, it is to be understood that the disclosed embodiments are merely examples and that other embodiments may take various and alternative forms.
In addition, it should be noted that all expressions using "first" and "second" in the embodiments of the present invention are used for distinguishing two entities with the same name but different names or different parameters, and it should be noted that "first" and "second" are only used for convenience of expression and should not be construed as a limitation to the embodiments of the present invention, and they are not described in any further embodiments. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
One or more embodiments of the present application will be described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, fig. 1 is a schematic diagram of the general computing architecture (UAA), and fig. 2 is a schematic diagram of a prior-art method for processing host IO. In the UAA, a control page table (CP) may be 512, 1024, or 2048 bytes in size; apart from the CP header, the data cache, and the backup of the original host IO instruction, this space is used to store control blocks (CBs). All CPs are placed in a contiguous memory space, forming a CP resource pool; for ease of resource pool management, the CP granularity within a single pool must be uniform, i.e., only one CP size can be selected. A CB may be 16, 32, 64, or 128 bytes depending on the application engine type, so the number of CBs a single CP can carry is determined by the CP size and the CB sizes. Consequently, in complex application scenarios the CB chain lengths of different IO types differ too widely, making the CP size difficult to choose: if the chosen size is too small, long CB chains cannot be carried; conversely, if the CP granularity is too large, CP resources are wasted.
A typical host IO follows this UAA flow. First, the AEM (Acceleration Engine host interface Management engine) fetches the original IO request according to the interface protocol and notifies firmware through a hardware event queue managed by the WQS (Work Queue Scheduler engine). Once notified, the firmware parses the IO command, creates a CP for the IO operation, and fills the CBs for each step into the CP as required. The firmware then passes the address of the first CB, CB1 (4 bytes wide), to the WQS, which places the CB1 address in the work queue of the corresponding engine. When that engine executes CB1, it performs the corresponding acceleration operation according to CB1's configuration. On completion, the execution status is returned to the WQS (optionally notifying firmware) along with the entry address of the next CB, CB2, which likewise enters the work queue of its engine to wait, and so on. The whole flow runs under WQS hardware control without necessarily involving firmware; when the last CB completes, the host is responded to and the firmware is notified to reclaim the CP space.
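The hand-off loop above can be sketched as a small Python simulation; the class and attribute names are illustrative, not taken from the patent:

```python
# Minimal simulation of the WQS hand-off loop: firmware submits only the
# first CB, and each completed CB returns its successor to the scheduler.
from collections import deque

class ControlBlock:
    def __init__(self, name, next_cb=None):
        self.name = name
        self.next_cb = next_cb   # entry address of the next CB, if any
        self.done = False

class WorkQueueScheduler:
    """Queues CB addresses and routes each completed CB's successor back in."""
    def __init__(self):
        self.queue = deque()
        self.executed = []

    def submit(self, cb):
        self.queue.append(cb)

    def run(self):
        while self.queue:
            cb = self.queue.popleft()
            cb.done = True                 # the engine performs its step here
            self.executed.append(cb.name)
            if cb.next_cb is not None:     # the engine hands back the next CB
                self.queue.append(cb.next_cb)

# Firmware fills CB1..CB3 and passes only the first CB's address to the WQS.
cb3 = ControlBlock("CB3")
cb2 = ControlBlock("CB2", next_cb=cb3)
cb1 = ControlBlock("CB1", next_cb=cb2)
wqs = WorkQueueScheduler()
wqs.submit(cb1)
wqs.run()
```

Note that, as in the described flow, the scheduler never sees more than one pending CB per IO at a time; the chain unrolls as each step completes.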
On the basis of fig. 1 and 2, refer to fig. 3 and 4: fig. 3 is a schematic diagram of the structure of a single control page table in the prior art, and fig. 4 of a chained control page table in the prior art. As shown in fig. 3 and 4, CPs are divided into normal CPs and chained CPs, the chained CP being introduced on top of the single CP. A normal CP adds, to the single-CP definition, an 8-byte address pointer to the next chained CP. A chained CP is the same size as a normal CP; apart from the CP header, its remaining space stores CBs. Three 8-byte chain pointers are added to the chained CP header: one pointing to the next chained CP (if any), one to the previous CP, and one directly to the first normal CP (for convenient lookup of the data cache and the like). Each CB records its position within the whole CP chain, so that an engine can quickly locate the information it needs during processing.
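The three chain pointers of a chained CP can be sketched as follows; the field and function names are illustrative assumptions:

```python
# Sketch of the chained-CP link structure: next, previous, and a direct
# shortcut to the first normal CP of the chain.
class ChainedCP:
    def __init__(self, cbs):
        self.next_cp = None    # pointer to the next chained CP, if any
        self.prev_cp = None    # pointer to the previous CP
        self.first_cp = None   # direct pointer to the first normal CP
        self.cbs = cbs         # CB storage area

def link_chain(cps):
    """Wire a list of CPs into a CP chain and return its head."""
    head = cps[0]
    for i, cp in enumerate(cps):
        cp.first_cp = head
        cp.prev_cp = cps[i - 1] if i > 0 else None
        cp.next_cp = cps[i + 1] if i < len(cps) - 1 else None
    return head

chain = [ChainedCP([f"CB{i}"]) for i in range(3)]
head = link_chain(chain)
```

The direct `first_cp` shortcut mirrors the patent's stated motivation: any CP in the chain can reach the data cache region of the first CP without walking the chain backwards.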
However, as the UAA architecture has developed, the single CP and chained CP suffer from efficiency and performance problems. Specifically, with increasingly complex storage functions, the number of CBs an IO requires tends to grow significantly; as the demand on CP or CP chain storage space grows, the number of CPs and CP chains that a given on-chip storage space can hold simultaneously shrinks, reducing parallel IO processing capability. CB processing times vary widely: a considerable portion of the CBs in an IO's CP page table and CP chain perform DDR (Double Data Rate) memory access, hard disk access, or complex operations that take a long time to complete, during which the memory held by all already-executed preceding CBs serves no purpose, while for CBs waiting to execute later, the closer to the tail a CB is, the longer it waits, further reducing memory utilization. As for IO processing latency, under the scheme where the first CB is handed to the UAA only after all CBs are created, latency grows with the number of CBs to create.
Based on the above objectives, a first aspect of the embodiments of the present invention provides an embodiment of a method for optimizing host IO processing. Fig. 5 is a schematic diagram illustrating an embodiment of a method for optimizing host IO processing according to the present invention. As shown in fig. 5, a method for optimizing host IO processing according to an embodiment of the present invention includes the following steps:
s1, acquiring the space size and the execution sequence of a control block required by host IO, and establishing a plurality of control block rings according to the space size and the execution sequence;
s2, filling control blocks required by the host IO to the plurality of control block rings in batches according to the execution sequence;
S3, in response to the control blocks required by the host IO completing the first batch fill, starting execution of the control blocks required by the host IO.
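Steps S1-S3 can be sketched as a small simulation. The specific capacities (a 3-slot ring, 8 CBs) and all names are illustrative assumptions, not values from the patent:

```python
# Sketch of S1-S3: a ring smaller than the full CB sequence of one IO is
# filled in batches in execution order; execution starts after the first
# batch, and each completed CB's slot is recycled for a later batch.
from collections import deque

class ControlBlockRing:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = deque()

    def fill(self, pending):
        """Move CBs from the pending list into free ring slots (one batch)."""
        while pending and len(self.slots) < self.capacity:
            self.slots.append(pending.pop(0))

    def execute_one(self):
        return self.slots.popleft()   # the slot is recycled on completion

pending = [f"CB{i}" for i in range(1, 9)]   # all CBs one host IO needs (S1)
ring = ControlBlockRing(capacity=3)          # ring smaller than the sequence
ring.fill(pending)                           # first batch fill (S2)
executed = []                                # execution starts now (S3)
while ring.slots:
    executed.append(ring.execute_one())
    ring.fill(pending)                       # later batches use recycled slots
```

At no point do more than three CB slots exist, even though the IO needs eight CBs, which is the space saving the method claims.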
Based on the above objectives, a first aspect of the embodiments of the present invention provides an embodiment of a method for optimizing host IO processing. Fig. 6 is a schematic diagram illustrating another embodiment of the method for optimizing host IO processing according to the present invention. As shown in fig. 6, a ring CP mode is proposed on top of the normal CP and chained CP, and definitions for the ring CP mode are added to the original CP and CB definitions. Specifically, to indicate whether a CP is in ring CP mode, a ring CP mode flag is added to the CP attributes; second, dynamically generating CBs in ring CP mode may require further parameters, so the definition of an additional parameter table is added to the CP page definition. The modified single CP page table may be 512 bytes, 1 kilobyte, or 2 kilobytes, and consists mainly of the following regions.
1. A 128-byte CP header area, comprising:
(1) The CP attributes: the type of the CP, whether it is in ring CP page mode, and its position in the CP linked list;
(2) The sequence number of the current CP, assigned when the CP is created;
(3) NVMe (Non-Volatile Memory Express) queue information: a 2-byte completion queue ID, a 2-byte submission queue ID, and 4 bytes of submission queue head information;
(4) If a CP linked list is formed, an 8-byte address pointer to the first CB of the next chained CP;
(5) Four 64-bit timestamps recording key time points during CP execution;
(6) 64 bytes of space reserved for firmware use.
2. A 32-byte CB Ring control header, present when the CP attribute indicates ring CP page mode:
(1) A 4-byte offset address of the additional parameter table, giving its start address within the single CP, i.e., the offset relative to the address of the first CP;
(2) 28 reserved bytes.
3. A CB storage area. For CP sizes of 512 bytes/1 KB/2 KB, the CB storage area is 192/704/1728 bytes respectively when the CP is a ring CP page table, and 224/736/1760 bytes respectively in other modes.
4. A 64-byte data buffer pointer region pointing to the common data buffer region.
5. A 64-byte backup area for the original NVMe management and IO instructions, so that firmware can intervene to recover from errors when an exception occurs.
Specifically, as shown in table 1 below:
TABLE 1
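The relationship between the CP page size and the CB storage area sizes listed above can be sketched as a small lookup helper. This is an illustrative sketch, not the patented format: the function name is hypothetical, and every ring-mode CB area is taken to be exactly 32 bytes smaller than its non-ring counterpart, which is the space consumed by the CB ring control header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length of the CB storage area for a given CP
 * page size.  A ring CP page reserves a 32-byte CB ring control
 * header, so its CB storage area is 32 bytes smaller than that of a
 * CP page in other modes. */
static size_t cb_storage_bytes(size_t cp_page_bytes, int is_ring_cp)
{
    size_t base;
    switch (cp_page_bytes) {
    case 512:  base = 224;  break;
    case 1024: base = 736;  break;
    case 2048: base = 1760; break;
    default:   return 0;    /* unsupported CP page size */
    }
    return is_ring_cp ? base - 32 : base;
}
```

For instance, a 1 KB ring CP page would leave 704 bytes for CBs under this assumption.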
In the chained CP case, the additional parameter table is located in the CB storage area of the last CP. If the additional parameter table is large, it occupies the CB storage areas of the preceding CPs in reverse order, working backwards from the last CP, and the remaining CB storage areas in the CP chain form the CB ring.
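Under the stated layout, the number of CPs at the tail of the chain whose CB storage areas the additional parameter table consumes is a ceiling division. A minimal sketch, with a hypothetical function name and the simplifying assumption that every CP in the chain contributes a CB storage area of the same size:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: the additional parameter table sits at the tail
 * of the CP chain and grows backwards, consuming whole CB storage
 * areas from the last CP toward the first.  Returns how many CPs at
 * the end of the chain lose (part of) their CB storage area; the CB
 * areas of the remaining CPs form the CB ring. */
static size_t cps_consumed_by_params(size_t param_bytes, size_t cb_area_bytes)
{
    if (param_bytes == 0 || cb_area_bytes == 0)
        return 0;
    return (param_bytes + cb_area_bytes - 1) / cb_area_bytes; /* ceil */
}
```

With 192-byte CB areas (ring-mode 512 B pages), a 385-byte parameter table would consume the CB areas of the last three CPs.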
Modifications to the CB include adding CB header flag bit definitions and two new CB types. Specifically:
A CB valid flag: 1 indicates a legal CB and 0 an illegal CB;
A CB completion flag: 1 indicates a CB that has been completed and 0 one that has not;
A Last CB flag: 1 indicates the last CB of the CP, or of the CP chain, that processes an IO flow; 0 indicates it is not the last CB.
The CB completion flag and the Last CB flag are meaningful only when the CB valid flag is 1.
In addition to the above modifications to the CB definition, to support the ring CP mode, the CBs in a ring include two new CB types besides the original ones:
The loop-back CB, CB Return (CB_R), indicates that the current CB is already at the end of the linear address range of the CB ring storage space, so the next CB must be fetched from the beginning of the CB storage space of the first CP. CB_R is mandatory for the ring CP mode.
The actively generated CB, CB Builder (CB_B): when this CB appears in the CB Ring, it indicates that its successor CBs have not yet been generated. When the WQS (Work Queue Scheduler) schedules this CB, it must be handed to the CB generation engine (implemented in software or hardware), which generates the successor CBs and fills them into the CB Ring. CB_B is optional for the ring CP mode.
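The flag bits and CB types described above might be encoded as follows. The bit positions and enum values are assumptions for illustration; the source only fixes their meanings, including the rule that the completion and Last CB flags are meaningful only when the valid flag is set:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CB header flag bits; the exact encoding is an
 * assumption, only the semantics come from the description above. */
enum cb_flags {
    CB_FLAG_VALID    = 1u << 0, /* 1 = legal CB, 0 = illegal CB      */
    CB_FLAG_COMPLETE = 1u << 1, /* 1 = CB has already been completed */
    CB_FLAG_LAST     = 1u << 2, /* 1 = last CB of the IO flow        */
};

/* Hypothetical CB type codes. */
enum cb_type {
    CB_NORMAL,  /* an ordinary control block                        */
    CB_RETURN,  /* CB_R: wrap to the CB area of the first CP        */
    CB_BUILDER, /* CB_B: successor CBs have not yet been generated  */
};

/* The completion and Last CB flags are meaningful only when the valid
 * flag is 1, so mask them out for an invalid CB. */
static uint32_t effective_flags(uint32_t flags)
{
    return (flags & CB_FLAG_VALID) ? flags : 0u;
}
```

This masking mirrors the rule that an engine must ignore the other two flags in an illegal CB slot.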
Fig. 7 is a schematic diagram illustrating the active and passive trigger scenarios in the method for optimizing host IO processing according to the present invention. Under the UAA framework, the WQS is responsible for distributing CBs; a dedicated software or hardware engine processes each CB and returns the processing result and the address of the next CB to the WQS. In the CB Ring mode, when the engine processing a CB submits the next CB, it inspects the CB valid flag, CB completion flag, and Last CB flag in the header of the next CB to carry out the different logics of CB filling, CB processing, and so on. The specific processing logic is as follows:
1. The WQS receives the processing result of the current CB and first determines whether the current CB is the last CB, i.e., whether its Last CB flag is 1. If so, the IO flow ends and the CP is reclaimed; otherwise, the WQS checks whether the next CB is valid.
2. Whether the next CB is valid is determined by its valid flag. If the valid flag is 0, the IO flow is not yet finished but the successor CBs are missing, so the firmware is notified to fill them in; the firmware fills the successor CBs, updates the CB Ring, and hands the next CB to the WQS for scheduling. This is the passively triggered construction and recovery of CBs. If the valid flag is 1, the WQS checks whether the next CB has already completed.
3. Whether the next CB has completed is determined by its completion flag. If the completion flag is 1, the IO flow is likewise not finished but a fresh successor CB is missing, so the firmware is notified to fill one in, update the CB Ring, and hand the next CB to the WQS for scheduling; this is again the passively triggered construction and recovery of CBs. If the completion flag is 0, the WQS checks whether the next CB is a CB_B.
4. If the next CB is a CB_B, the WQS distributes it to the CB generation engine (SW/HW), which fills in the successor CBs, replies to the WQS with a completion status, and hands the successor CBs to the WQS for scheduling; this is the actively triggered construction and recovery of CBs. If the next CB is not a CB_B, the WQS schedules it to the corresponding engine according to its content.
When the CB engine finds that the next CB is a CB_R, additional processing is required: the first CB of the first CP is located and submitted to the WQS as the next CB. Dynamic generation of CBs can be implemented by firmware under the original UAA architecture, simply by generating in batches the CBs that used to be generated all at once; to improve real-time performance and reduce the timing uncertainty of software execution, a dedicated hardware engine can also be designed for dynamically creating and reclaiming CBs.
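Steps 1 to 4 above amount to a small decision procedure run by the WQS on each returned CB. The sketch below is a hypothetical software model of that logic (the struct fields and action names are invented for illustration), not the hardware implementation:

```c
#include <assert.h>

/* Possible outcomes of the WQS decision logic of steps 1-4 above. */
enum wqs_action {
    ACT_RECLAIM_CP,    /* IO flow finished: respond and reclaim CPs   */
    ACT_FIRMWARE_FILL, /* passive trigger: firmware fills next CBs    */
    ACT_CB_GENERATOR,  /* active trigger: hand CB_B to the generator  */
    ACT_DISPATCH_NEXT, /* schedule the next CB to its engine          */
};

/* Hypothetical snapshot of the flags the WQS inspects. */
struct cb_state {
    int cur_is_last;     /* Last CB flag of the current CB            */
    int next_valid;      /* valid flag of the next CB                 */
    int next_complete;   /* completion flag of the next CB            */
    int next_is_builder; /* the next CB is a CB_B                     */
};

static enum wqs_action wqs_decide(const struct cb_state *s)
{
    if (s->cur_is_last)
        return ACT_RECLAIM_CP;    /* step 1: IO flow ends             */
    if (!s->next_valid)
        return ACT_FIRMWARE_FILL; /* step 2: passive trigger          */
    if (s->next_complete)
        return ACT_FIRMWARE_FILL; /* step 3: stale slot, refill       */
    if (s->next_is_builder)
        return ACT_CB_GENERATOR;  /* step 4: active trigger           */
    return ACT_DISPATCH_NEXT;     /* ordinary CB: dispatch to engine  */
}
```

The ordering of the checks matters: the Last CB test short-circuits everything else, matching step 1.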
An exemplary IO command processing flow in the ring CP mode is as follows. First, the AEM fetches the original IO request according to some interface protocol and notifies the firmware through the hardware event queue managed by the WQS. Once notified, the firmware parses the IO command, creates a first batch of CPs for the IO operation, and fills the step-by-step CBs into those CPs as required. After these steps, the firmware passes the address of the first CB (CB1, 4 bytes wide) to the WQS, which places the CB1 address in the work queue of the corresponding engine. When an engine processing a CB detects a CB_B, the CB is submitted through the WQS to the CB generation engine for processing. When an engine detects a passive CB filling trigger scenario, a message is sent to the firmware through the WQS and the firmware completes the CB filling; after filling, the CB is again handed to the WQS for distribution. When the last CB (CBX) completes, the host must be responded to and the firmware notified to reclaim the CP space. The active and passive trigger scenarios may occur in sequence within one IO flow, or only one of them may occur.
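The CB_R wrap-around in this flow can be modeled as a ring index advance: the CB slots are walked in linear address order, and when the walk would land on the slot holding CB_R, the engine instead submits the first CB of the first CP. A minimal sketch under that assumption (index layout is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of CB_R handling: the ring has `slots` CB slots
 * in linear address order (slots >= 2) and the last slot holds the
 * loop-back CB (CB_R).  When the linearly next CB would be the CB_R,
 * the engine instead submits the first CB (index 0) to the WQS. */
static size_t next_cb_index(size_t cur, size_t slots)
{
    size_t next = cur + 1;
    return (next == slots - 1) ? 0 : next; /* slot slots-1 is CB_R */
}
```

Walking a four-slot ring this way visits slots 0, 1, 2 and then returns to 0, never executing the CB_R slot itself.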
By the above method, the storage space of the control block ring is allowed to be smaller than the total space required by the complete control block sequence of one host IO, reducing the storage demand of the control page tables. The control block ring generated under a ring-mode control page table is built at control-page-table granularity, so the ring is elastic; that is, each host IO can be balanced between performance and efficiency. Control blocks running on the ring have their space reclaimed after execution, improving storage utilization. Because control blocks are generated and reclaimed dynamically, a host IO can begin service after only part of its control blocks have been generated, reducing IO processing latency. Furthermore, by providing the loop-back control block flag and the actively generated control block flag, CB execution and CB generation are allowed to proceed concurrently; that is, the time consumed by CB generation is hidden within the execution of other CBs.
In a second aspect of the embodiments of the present invention, an apparatus for optimizing host IO processing is provided. Fig. 8 is a schematic diagram illustrating an embodiment of an apparatus for optimizing host IO processing according to the present invention. As shown in fig. 8, the apparatus for optimizing host IO processing according to the present invention includes: a first module 011, configured to obtain the space size and execution order of the control blocks required by a host IO and establish a plurality of control block rings according to the space size and execution order; a second module 012, configured to batch fill the control blocks required by the host IO onto the plurality of control block rings according to the execution order; and a third module 013, configured to start execution of the control blocks required by the host IO in response to those control blocks completing the first batch filling.
In some embodiments, the apparatus for optimizing host IO processing further comprises a processor configured to: add, to the definition of the control page table, the ring mode judgment and the parameter information required by a ring-mode control page table to generate the corresponding control block ring, obtaining the updated definition of the control page table.
In some embodiments, the apparatus for optimizing host IO processing further comprises a processor configured to: update the definition of the header region of the control page table based on judging whether the current control page table is in ring mode and obtaining the position of the current control page table in the control page linked list; and, in response to the current control page table being a ring-mode control page table, obtain the parameter information required by the ring-mode control page table to generate the corresponding control block ring, set the corresponding parameter information in an additional parameter table based on that information, and set the offset address of the additional parameter table in the ring-mode control page table to obtain the start position of the additional parameter table of the ring-mode control page table.
In some embodiments, the apparatus for optimizing host IO processing further comprises a processor configured to: update the attribute judgment of the control page table based on judging whether the current control page table is in ring mode; and, in response to the current control page table being a ring-mode control page table, obtain the position of the ring-mode control page table in the control page linked list and set a pointer to the address of the first control block of the next control page table in the linked list.
In some embodiments, the first module 011 is further configured for: and establishing a plurality of ring-mode control page tables based on the definition of the updated control page tables and the acquired space size and execution sequence of the control blocks required by the host IO, and generating a plurality of corresponding control block rings through the plurality of ring-mode control page tables.
In some embodiments, the first module 011 is further configured for: and setting an additional parameter table in a control block storage area of the last control page table of the control page linked list according to the concatenation sequence, wherein the additional parameter table is used for storing parameter information required by a corresponding control block ring generated by the control page table in the ring mode.
In some embodiments, the first module 011 is further configured for: and responding to the fact that the parameter volume of the additional parameter table is larger than the control block storage area of the last control page table, and sequentially occupying the control block storage area of the previous control page table forward in the reverse order of the concatenation order.
In some embodiments, the first module 011 is further configured for: setting a control block type corresponding to a mode of a control page table; the status flags for the control blocks are established in the control block headers for the control blocks to perform different processing logic depending on the status flags for the different control blocks.
In some embodiments, the first module 011 is further configured for: a control block valid flag, a completion flag, and an end flag are set, and in response to the valid flag being valid, the completion flag and the end flag are present.
In some embodiments, the first module 011 is further configured for: setting a loop back control block mark to indicate that the current control block is positioned at the tail part of the linear address of the storage space of the control block loop; setting an actively-generated control block flag to indicate that a control block located behind the current control block is not generated, and in response to calling the control block with the actively-generated control flag, setting a generation engine for the control block with the actively-generated control flag to generate and populate a corresponding control block ring for the following control block.
In some embodiments, the second module 012 is further configured to: according to the analysis of the command of the host IO by the firmware, creating a first batch of control page tables for the host IO, and sequentially filling control blocks onto control block rings corresponding to the first batch of control page tables according to an execution sequence; and setting the firmware to transmit the address of the first filled control block to a work queue of a corresponding engine to perform queuing.
In some embodiments, the third module 013 is further configured to: and in response to the detection of the actively generated control block mark, sending a notification to the firmware through a work queue management engine to complete the first filling of the control block, recovering a space corresponding to a control page table which is filled in the first batch, and starting the execution of the control block required by the host IO.
In view of the above object, a third aspect of the embodiments of the present invention provides a computer device, and fig. 9 is a schematic diagram illustrating an embodiment of a computer device provided by the present invention. As shown in fig. 9, an embodiment of a computer device provided by the present invention includes the following modules: at least one processor 021; and a memory 022, the memory 022 storing computer instructions 023 executable on the processor 021, the computer instructions 023, when executed by the processor 021, implementing the steps of the method as described above.
The invention also provides a computer readable storage medium. FIG. 10 is a schematic diagram illustrating an embodiment of a computer-readable storage medium provided by the present invention. As shown in fig. 10, the computer readable storage medium 031 stores a computer program 032 which, when executed by a processor, performs the method as described above.
Finally, it should be noted that, as one of ordinary skill in the art can appreciate, all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing related hardware. The program of the above method can be stored in a computer-readable storage medium, and when executed, the program can include the processes of the embodiments of the methods described above. The storage medium of the program may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like. The embodiments of the computer program may achieve the same or similar effects as any of the corresponding method embodiments described above.
Furthermore, the methods disclosed according to embodiments of the invention may also be implemented as a computer program executed by a processor, which may be stored in a computer-readable storage medium. Which when executed by a processor performs the above-described functions as defined in the method disclosed by an embodiment of the invention.
Further, the above method steps and system elements may also be implemented using a controller and a computer readable storage medium for storing a computer program for causing the controller to implement the functions of the above steps or elements.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the disclosed embodiments of the present invention.
In one or more exemplary designs, the functions may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code. Computer-readable media include both computer storage media and communication media, including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer. By way of example, and not limitation, such computer-readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a general-purpose or special-purpose computer or processor. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The numbers of the embodiments disclosed in the above embodiments of the present invention are merely for description, and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will understand that the discussion of any embodiment above is meant to be exemplary only and is not intended to suggest that the scope of the present disclosure, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, the technical features of the above embodiments or of different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit or scope of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims (15)

1. A method for optimizing host IO processing, comprising:
acquiring the space size and the execution sequence of a control block required by host IO, and establishing a plurality of control block rings according to the space size and the execution sequence;
filling control blocks required by the host IO to the plurality of control block rings in batches according to the execution sequence;
and starting the execution of the required control blocks of the host IO in response to the first batch filling of the control blocks required by the host IO.
2. The method of claim 1, further comprising:
and adding the judgment of the annular mode and the parameter information required by generating the corresponding control block ring by the control page table of the annular mode in the definition of the control page table to obtain the definition of the updated control page table.
3. The method of claim 2, wherein adding the ring mode judgment and the parameter information required for generating the corresponding control block ring from the ring mode control page table to the definition of the control page table, and obtaining the updated definition of the control page table comprises:
updating the definition of a head region of a control page table based on judging whether the current control page table is in a ring mode or not and acquiring the position of the current control page table in a control page linked list;
responding to the current control page table being a ring-shaped control page table, acquiring parameter information required by the ring-shaped control page table for generating a corresponding control block ring, setting corresponding parameter information in an additional parameter table based on the parameter information, and setting an additional parameter table offset address in the ring-shaped control page table to obtain a starting position of the additional parameter table of the ring-shaped control page table.
4. The method of claim 3, wherein updating the definition of the header region of the control page table based on determining whether the current control page table is in ring mode and obtaining the location of the current control page table in the control page list comprises:
updating the attribute judgment of the control page table based on judging whether the current control page table is in the ring mode;
responding to the current control page table being a control page table in a ring mode, obtaining the position of the control page table in the ring mode in the control page linked list, and setting a pointer pointing to the address of the first control block of the next control page table in the control page linked list in the ring mode.
5. The method of claim 2, wherein obtaining a space size and an execution order of the control blocks required by the host IO, and wherein building a plurality of control block rings according to the space size and the execution order comprises:
and establishing a plurality of ring-mode control page tables based on the definition of the updated control page tables and the acquired space size and execution sequence of the control blocks required by the host IO, and generating a plurality of corresponding control block rings through the plurality of ring-mode control page tables.
6. The method of claim 5, wherein obtaining the space size and execution order of the control blocks required by the host IO, and establishing the plurality of control block rings according to the space size and execution order further comprises:
and setting an additional parameter table in a control block storage area of the last control page table of the control page linked list according to the concatenation sequence, wherein the additional parameter table is used for storing parameter information required by a corresponding control block ring generated by the control page table in the ring mode.
7. The method of claim 6, wherein setting the additional parameter table in the control block storage area of the last control page table of the control page linked list in the concatenation order, for storing the parameter information required by the corresponding control block ring generated by the control page table in ring mode, comprises:
and in response to the fact that the parameter volume of the additional parameter table is larger than the control block storage area of the last control page table, sequentially occupying the control block storage area of the previous control page table forward in a reverse order according to the concatenation order.
8. The method of claim 2, wherein obtaining a space size and an execution order of the control blocks required by the host IO, and establishing a plurality of control block rings according to the space size and the execution order further comprises:
setting a control block type corresponding to a mode of a control page table;
status flags for the control blocks are established in the control block headers of the control blocks to perform different processing logic depending on the status flags for the different control blocks.
9. The method of claim 8, wherein establishing the status flags for the control blocks in the control block headers for the control blocks to perform different processing logic according to the status flags for the different control blocks comprises:
a control block valid flag, a completion flag, and an end flag are set, and in response to the valid flag being valid, the completion flag and the end flag are present.
10. The method of claim 9, wherein setting a control block type corresponding to a mode of a control page table comprises:
setting a loop back control block mark to indicate that the current control block is positioned at the tail part of the linear address of the storage space of the control block loop;
setting an actively-generated control block flag to indicate that a control block located behind the current control block is not generated, and in response to calling the control block with the actively-generated control flag, setting a generation engine for the control block with the actively-generated control flag to generate and fill the following control block onto the corresponding control block ring.
11. The method of claim 10, wherein the batching the control blocks needed by the host IO onto the plurality of control block rings in the execution order comprises:
according to the analysis of the command of the host IO by the firmware, creating a first batch of control page tables for the host IO, and sequentially filling control blocks onto control block rings corresponding to the first batch of control page tables according to an execution sequence;
and setting the firmware to transmit the address of the first filled control block to a work queue of a corresponding engine to perform queuing.
12. The method of claim 11, wherein the opening execution of the required control blocks of the host IO in response to the required control blocks of the host IO completing a first fill comprises:
and in response to the detection of the actively generated control block mark, sending a notification to the firmware through a work queue management engine to complete the first filling of the control block, recovering a space corresponding to a control page table which is filled in the first batch, and starting the execution of the control block required by the host IO.
13. An apparatus for optimizing IO processing of a host, comprising:
the system comprises a first module, a second module and a third module, wherein the first module is configured to acquire the space size and the execution sequence of a control block required by a host IO and establish a plurality of control block rings according to the space size and the execution sequence;
a second module configured to batch fill the control blocks required by the host IO onto the plurality of control block rings according to the execution order;
and a third module, configured to respond to that the control block required by the host IO completes first batch padding, and start execution of the control block required by the host IO.
14. A computer device, comprising:
at least one processor; and
a memory storing computer instructions executable on the processor, the instructions when executed by the processor implementing the steps of the method of any one of claims 1 to 12.
15. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 12.
CN202211508344.4A 2022-11-29 2022-11-29 Method, device, equipment and medium for optimizing host IO processing Active CN115543219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211508344.4A CN115543219B (en) 2022-11-29 2022-11-29 Method, device, equipment and medium for optimizing host IO processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211508344.4A CN115543219B (en) 2022-11-29 2022-11-29 Method, device, equipment and medium for optimizing host IO processing

Publications (2)

Publication Number Publication Date
CN115543219A true CN115543219A (en) 2022-12-30
CN115543219B CN115543219B (en) 2023-04-18

Family

ID=84722749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211508344.4A Active CN115543219B (en) 2022-11-29 2022-11-29 Method, device, equipment and medium for optimizing host IO processing

Country Status (1)

Country Link
CN (1) CN115543219B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117008843A (en) * 2023-09-26 2023-11-07 苏州元脑智能科技有限公司 Control page linked list construction device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102681952A (en) * 2012-05-12 2012-09-19 北京忆恒创源科技有限公司 Method for writing data into memory equipment and memory equipment
CN107145311A (en) * 2017-06-12 2017-09-08 郑州云海信息技术有限公司 A kind of I/O data processing method and system
CN111737002A (en) * 2020-06-24 2020-10-02 苏州浪潮智能科技有限公司 Method, device and equipment for processing chained storage request and readable medium
CN113885945A (en) * 2021-08-30 2022-01-04 山东云海国创云计算装备产业创新中心有限公司 Calculation acceleration method, equipment and medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117008843A (en) * 2023-09-26 2023-11-07 苏州元脑智能科技有限公司 Control page linked list construction device and electronic equipment
CN117008843B (en) * 2023-09-26 2024-01-19 苏州元脑智能科技有限公司 Control page linked list construction device and electronic equipment

Also Published As

Publication number Publication date
CN115543219B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
JP5011311B2 (en) Method and mechanism for loading an XML document into memory
CN115543219B (en) Method, device, equipment and medium for optimizing host IO processing
CN113885945B (en) Calculation acceleration method, equipment and medium
US9088537B2 (en) Apparatus and method for executing agent
US20220045948A1 (en) Path creation method and device for network on chip and electronic apparatus
CN108008950A (en) The implementation method and device of a kind of user interface updating
US20230011298A1 (en) Smart contract-based data processing method, apparatus, device, and storage medium
CN113094125B (en) Business process processing method, device, server and storage medium
CN116820527B (en) Program upgrading method, device, computer equipment and storage medium
CN112506676B (en) Inter-process data transmission method, computer device and storage medium
CN116521096A (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116414527A (en) Method and system for greatly improving performance of distributed transaction coordinator
CN111831408A (en) Asynchronous task processing method and device, electronic equipment and medium
US20220365822A1 (en) Data Processing Method and Computer Device
WO2024113996A1 (en) Optimization method and apparatus for host io processing, device, and nonvolatile readable storage medium
CN114253694B (en) Asynchronous processing method and device based on neural network accelerator
WO2011023106A1 (en) Scheduling method and scheduler for multi-core processor messages
CN113867796A (en) Protocol conversion bridge for improving reading performance by using multi-state machine and implementation method
WO2020220272A1 (en) Method and system for changing resource state, terminal, and storage medium
CN113836177B (en) Cache management of consumable business data
US11537625B1 (en) Using structured data templates and invocation statements to dynamically define values for efficient data encoding
JPS61136132A (en) Information processor
EP4276611A1 (en) Instruction prediction method and system, and computer-readable storage medium
CN112416539B (en) Multi-task parallel scheduling method for heterogeneous many-core processor
US11892972B2 (en) Synchronization mechanisms for a multi-core processor using wait commands having either a blocking or a non-blocking state

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant