CN116360670A - Method for optimizing control parameters of IO command scheduling and related products - Google Patents

Method for optimizing control parameters of IO command scheduling and related products

Info

Publication number
CN116360670A
CN116360670A (application CN202111599105.XA)
Authority
CN
China
Prior art keywords
host
training
command
data
storage device
Prior art date
Legal status
Pending
Application number
CN202111599105.XA
Other languages
Chinese (zh)
Inventor
贾舒
程雪
Current Assignee
Chengdu Starblaze Technology Co ltd
Original Assignee
Chengdu Starblaze Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Chengdu Starblaze Technology Co ltd filed Critical Chengdu Starblaze Technology Co ltd
Priority to CN202111599105.XA
Publication of CN116360670A
Legal status: Pending

Classifications

    • G06F3/061 Improving I/O performance (G06F3/0601: interfaces specially adapted for storage systems)
    • G06F3/0659 Command handling arrangements, e.g. command buffers, queues, command scheduling
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
    • G06N3/084 Backpropagation, e.g. using gradient descent (G06N3/02: neural networks; G06N3/08: learning methods)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to a method for optimizing control parameters of IO command scheduling, and to related products. The method comprises: in response to obtaining operating data of a storage device, constructing a training sample set, wherein the training sample set comprises a plurality of paired input samples and output samples, the input samples comprising operating parameters related to host IO command processing, and the output samples comprising target values of one or more control parameters that control host IO command scheduling; and training a neural network with the training sample set until it outputs training results matching the output samples, so that the control parameters for IO command scheduling can be determined from the training results. The method can optimize the control parameters for IO command scheduling and ensure the quality of service of the storage device.

Description

Method for optimizing control parameters of IO command scheduling and related products
Technical Field
The present application relates generally to the field of solid state storage devices. More particularly, the present application relates to a method, a computer device, a storage device and a computer readable storage medium for optimizing control parameters for IO command scheduling.
Background
FIG. 1A illustrates a block diagram of a solid state storage device. The solid state storage device 102 is coupled to a host to provide storage capability for the host. The host and the solid state storage device 102 may be coupled in a variety of ways, including but not limited to SATA (Serial Advanced Technology Attachment), SCSI (Small Computer System Interface), SAS (Serial Attached SCSI), IDE (Integrated Drive Electronics), USB (Universal Serial Bus), PCIe (Peripheral Component Interconnect Express), NVMe (NVM Express), Ethernet, Fibre Channel, a wireless communication network, and the like. The host may be any information processing device capable of communicating with the storage device in the manners described above, such as a personal computer, tablet, server, portable computer, network switch, router, cellular telephone, or personal digital assistant. The storage device 102 (hereinafter, the solid state storage device is simply referred to as the storage device) includes an interface 103, a control component 104, one or more NVM chips 105, and a DRAM (Dynamic Random Access Memory) 110.
The NVM chip 105 includes common storage media such as NAND flash memory, phase change memory, FeRAM (Ferroelectric RAM), MRAM (Magnetoresistive Random Access Memory), and RRAM (Resistive Random Access Memory).
The interface 103 may be adapted to exchange data with the host by way of, for example, SATA, IDE, USB, PCIe, NVMe, SAS, Ethernet, or Fibre Channel.
The control component 104 controls data transfer among the interface 103, the NVM chips 105, and the DRAM 110, and also handles storage management, mapping of host logical addresses to flash physical addresses, wear leveling, bad block management, and so on. The control component 104 can be implemented in various forms of software, hardware, firmware, or a combination thereof; for example, it can take the form of an FPGA (Field-Programmable Gate Array), an ASIC (Application-Specific Integrated Circuit), or a combination thereof. The control component 104 may also include a processor or controller that executes software to manipulate the hardware of the control component 104 and process IO (Input/Output) commands. The control component 104 may also be coupled to the DRAM 110 and access its data; FTL tables and/or cached data of IO commands may be stored in the DRAM.
The control component 104 issues commands to the NVM chip 105 in a manner conforming to the interface protocol of the NVM chip 105 in order to operate it, and receives the command execution results output from the NVM chip 105. Known NVM chip interface protocols include "Toggle", "ONFI", and the like.
A storage Target is one or more Logical Units (LUNs) that share a CE (Chip Enable) signal within a NAND flash package. One or more dies (Die) may be included within the NAND flash package. Typically, a logical unit corresponds to a single die. A logical unit may include multiple planes (Planes). Multiple planes within a logical unit may be accessed in parallel, while multiple logical units within a NAND flash chip may execute commands and report status independently of each other.
Data is typically stored on and read from the storage medium in units of pages, while data is erased in units of blocks. A block (also called a physical block) contains a plurality of pages. A physical page has a fixed size, for example 17664 bytes, although other page sizes are also possible.
The FTL (Flash Translation Layer) is utilized in the storage device 102 to maintain mapping information from logical addresses (LBAs) to physical addresses. The logical addresses constitute the storage space of the solid state storage device as perceived by upper-level software such as the operating system. A physical address is an address used to access a physical storage unit of the solid state storage device. In the related art, address mapping may also be implemented using an intermediate address form: logical addresses are mapped to intermediate addresses, which in turn are further mapped to physical addresses. The table structure storing the mapping information from logical addresses to physical addresses is called the FTL table. FTL tables are important metadata in a storage device; the entries of the FTL table record address mappings in units of the storage device's data units.
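The LBA-to-physical mapping described above can be pictured as a lookup table that is remapped on every rewrite. The following minimal sketch is a hypothetical illustration (class and field names are assumptions, not the patent's implementation): rewriting a logical page allocates a new physical page and leaves the old one as garbage.

```python
# Minimal FTL sketch (hypothetical): a dict maps logical page addresses (LBAs)
# to physical addresses; rewriting an LBA remaps it, and the old location
# becomes "garbage" until garbage collection reclaims it.

class SimpleFTL:
    def __init__(self):
        self.table = {}        # LBA -> physical page address
        self.next_phys = 0     # next free physical page (append-only sketch)
        self.garbage = set()   # physical pages no longer referenced

    def write(self, lba):
        """Allocate a new physical page and remap the LBA to it."""
        old = self.table.get(lba)
        if old is not None:
            self.garbage.add(old)   # previous copy is now dirty data
        self.table[lba] = self.next_phys
        self.next_phys += 1
        return self.table[lba]

    def read(self, lba):
        """Translate an LBA to its current physical address."""
        return self.table[lba]

ftl = SimpleFTL()
ftl.write(7)       # first write of LBA 7 lands on physical page 0
ftl.write(7)       # rewrite: remapped to page 1, page 0 becomes garbage
assert ftl.read(7) == 1
assert ftl.garbage == {0}
```

The sketch makes the "out-of-place update" property visible: the FTL never overwrites a physical page in place, which is exactly why dirty data accumulates and garbage collection becomes necessary.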
The host accesses the storage device with IO commands that follow a storage protocol. The control component generates one or more media interface commands from each host IO command and provides them to the media interface controller. The media interface controller, in turn, generates storage medium access commands (e.g., program commands, read commands, erase commands) that follow the interface protocol of the NVM chip. The control component also tracks the execution of all media interface commands generated from one IO command and indicates the processing result of that IO command to the host.
Referring to FIG. 1B, the control component includes a host interface 1041, a host command processing unit 1042, a storage command processing unit 1043, a media interface controller 1044, and a storage medium management unit 1045. The host interface 1041 acquires the IO commands provided by the host. The host command processing unit 1042 generates storage commands from each IO command and supplies them to the storage command processing unit 1043. Each storage command may access a memory space of the same size, e.g., 4 KB. The data unit recorded in the NVM chip that corresponds to the data accessed by one storage command is referred to as a data frame. A physical page records one or more data frames. For example, if a physical page is 17664 bytes and a data frame is 4 KB, one physical page can store 4 data frames.
The storage medium management unit 1045 maintains the logical-address-to-physical-address translation for each storage command. For example, the storage medium management unit 1045 includes the FTL table. For a read command, the storage medium management unit 1045 outputs the physical address corresponding to the logical address (LBA) accessed by the storage command. For a write command, the storage medium management unit 1045 allocates an available physical address to it and records the mapping from the logical address (LBA) it accesses to the allocated physical address. The storage medium management unit 1045 also maintains the functions required to manage NVM chips, such as garbage collection and wear leveling.
The storage command processing unit 1043 operates the medium interface controller 1044 to issue a storage medium access command to the NVM chip 105 according to the physical address supplied from the storage medium management unit 1045.
For the sake of clarity, the command sent by the host to the storage device 102 is referred to as an IO command, the command sent by the host command processing unit 1042 to the storage command processing unit 1043 is referred to as a storage command, the command sent by the storage command processing unit 1043 to the media interface controller 1044 is referred to as a media interface command, and the command sent by the media interface controller 1044 to the NVM chip 105 is referred to as a storage media access command. The storage medium access command follows the interface protocol of the NVM chip.
An FTL table includes a plurality of FTL entries. For example, each FTL entry records the correspondence between a logical page address and a physical page. As described above, a solid state storage device includes a plurality of NVM chips. Each NVM chip includes one or more dies (DIE) or Logical Units (LUNs). Multiple dies or logical units can respond to read and write operations in parallel, whereas multiple read, write, or erase operations on the same die or logical unit are performed sequentially.
Fig. 2 shows a schematic diagram of a large block. A large block includes one physical block from each of a plurality of logical units (referred to as a logical unit group). Preferably, each logical unit contributes one physical block to the large block. By way of example, a large block is constructed over every 16 Logical Units (LUNs), so that each large block includes 16 physical blocks, one from each of the 16 LUNs. In the example of FIG. 2, large block 0 includes physical block 0 from each of the 16 LUNs, and large block 1 includes physical block 1 from each LUN. Large blocks can also be constructed in a variety of other ways.
Optionally, page stripes are constructed within a large block: the physical pages having the same physical address in each Logical Unit (LUN) constitute a "page stripe". In FIG. 2, physical pages P0-0, P0-1, … and P0-x form page stripe 0, where physical pages P0-0, P0-1, … store user data and physical page P0-15 stores parity data calculated from all the user data within the stripe. Similarly, physical pages P2-0, P2-1, … and P2-x constitute page stripe 2. Alternatively, the physical page used to store the parity data may be located anywhere in the page stripe.
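Stripe parity of this kind is typically the byte-wise XOR of the user-data pages, which lets any single lost page be rebuilt from the survivors. A toy sketch (page sizes and contents are hypothetical, purely for illustration):

```python
# Page-stripe parity sketch (hypothetical): one parity page per stripe is the
# byte-wise XOR of all user-data pages, so any single lost page can be
# reconstructed by XOR-ing the surviving pages with the parity page.

def xor_pages(pages):
    """Byte-wise XOR of equally sized pages."""
    out = bytes(len(pages[0]))
    for p in pages:
        out = bytes(a ^ b for a, b in zip(out, p))
    return out

user_pages = [bytes([i] * 8) for i in range(1, 4)]   # three toy 8-byte "pages"
parity = xor_pages(user_pages)

# Recover page 1 from the other user pages plus the parity page.
recovered = xor_pages([user_pages[0], user_pages[2], parity])
assert recovered == user_pages[1]
```

This also illustrates why the parity page's position within the stripe is arbitrary: XOR is commutative, so recovery works the same wherever the parity page sits.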
When a logical page is repeatedly written with data, the FTL entry records the correspondence between the logical page address and the latest physical address, and the data at a physical address that was previously written but is no longer referenced (e.g., has no record in the FTL table) becomes "garbage" data. Data that has been written and is still referenced (e.g., has a record in the FTL table) is called valid data, while "garbage" is called dirty data. The data unit in a physical page corresponding to a logical page is referred to as a transfer unit. A transfer unit holding valid data is called a valid transfer unit, and one holding dirty data is called a dirty transfer unit. A physical block containing dirty transfer units is referred to as a "dirty physical block", and a physical block to which no data has been written is referred to as a "free physical block".
Since dirty data occupies storage resources, in order to maintain storage performance the solid state storage device periodically performs garbage collection (GC) to reclaim garbage data and free up storage space. Garbage collection in a solid state storage device is performed in units of physical blocks. Fig. 3 shows a schematic diagram of the garbage collection process. Physical block 0 and physical block 1 have been written with data. The data on the physical pages indicated by grid boxes (e.g., pages 310, 312, 314, 316 of physical block 0 and pages 320, 322, 324, 326 of physical block 1) have no records in the FTL table and are dirty data. The data on the physical pages indicated by blank boxes (e.g., pages 330, 332, 334, 336 of physical block 0 and pages 342, 344, 346, 348 of physical block 1) have records in the FTL table and are valid data. For garbage collection, the dirty physical blocks (e.g., physical block 0 and physical block 1) are scanned, the valid data in them is read out and written to free physical block 2, and the changed physical page addresses of the valid data are recorded in the FTL table. After all the valid data has been moved to physical block 2, the scanned physical blocks 0 and 1 are erased, making them free physical blocks. The solid state storage device also implements wear leveling so that the physical blocks of its NVM chips experience substantially the same number of erasures.
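The scan-relocate-erase cycle just described can be condensed into a few lines. The sketch below is hypothetical (the data layout and function name are assumptions): valid pages of dirty blocks are copied to a free block, the FTL is updated, and the source blocks are erased.

```python
# Garbage-collection sketch (hypothetical): scan dirty blocks, relocate valid
# pages into a free block while updating the FTL, then erase the sources.

def garbage_collect(blocks, ftl, free_block):
    """blocks: dict block_id -> {page_no: lba, or None for a dirty page}."""
    dest_page = 0
    for blk_id, pages in blocks.items():
        for page_no, lba in pages.items():
            if lba is not None:                      # valid data: relocate it
                ftl[lba] = (free_block, dest_page)   # record the new mapping
                dest_page += 1
        pages.clear()                                # erase the scanned block
    return dest_page                                 # number of pages moved

ftl = {10: (0, 0), 11: (1, 1)}                       # LBA -> (block, page)
blocks = {0: {0: 10, 1: None}, 1: {0: None, 1: 11}}  # None marks dirty pages
moved = garbage_collect(blocks, ftl, free_block=2)
assert moved == 2
assert ftl == {10: (2, 0), 11: (2, 1)}
assert blocks == {0: {}, 1: {}}
```

Note how the cost scales with the number of valid pages to be moved, which is precisely the processing-resource consumption that competes with host IO in the Disclosure below.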
Disclosure of Invention
Solid state storage devices perform garbage collection operations (i.e., GC processing) in units of physical blocks, not in units of dirty transfer units. A physical block may include multiple transfer units (DTUs), of which only some may be dirty while the others are valid. Therefore, to reclaim a physical block containing dirty transfer units, the data in its valid transfer units must first be read out and then stored in a free physical block or a valid physical block (a physical block containing no dirty transfer units). In other words, the solid state storage device consumes processing resources or capabilities, such as memory resources, CPU resources, and credit resources, during GC processing. Because of GC, the device must process not only host IO commands but also GC work; and because the processing resources or capabilities in the device are limited, host IO processing and GC processing compete with each other. This competition causes the number of commands the device processes per second to fluctuate or jitter, especially when the host IO load increases or bad physical blocks exist, and thereby affects the stability or consistency of the device's capability to process IO commands. That stability or consistency, in turn, characterizes the quality of service (QoS) of the solid state storage device.
To improve the quality of service of the solid state storage device and the user's experience, QoS fluctuation can be suppressed by adjusting the control parameters (such as a credit limit) that the device uses to control IO command scheduling, thereby controlling the number of IOs processed per unit time. For example, the numbers of host IOs and GC IOs processed per unit time may be allocated so as to maintain a dynamic balance between write data capacity and garbage collection capacity. However, since the actual working environment of a solid state storage device is very complex, and different working environments may require different control parameters, how to set the control parameters to achieve good quality of service (QoS) is a problem to be solved.
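One way such a credit-style control parameter can arbitrate between host IO and GC IO per unit time is sketched below. The budget split, per-command costs, and function names are illustrative assumptions, not the patent's actual scheduler:

```python
# Credit-based scheduling sketch (hypothetical): a per-window credit budget is
# split between host IO and GC IO; each command type consumes its own cost,
# and a command is admitted only while its pool still has credit left.

def schedule(window_credits, host_share, host_queue, gc_queue,
             host_cost=1, gc_cost=2):
    host_pool = int(window_credits * host_share)   # credits for host IO
    gc_pool = window_credits - host_pool           # remainder for GC IO
    done_host = done_gc = 0
    while host_queue and host_pool >= host_cost:
        host_queue.pop(0); host_pool -= host_cost; done_host += 1
    while gc_queue and gc_pool >= gc_cost:
        gc_queue.pop(0); gc_pool -= gc_cost; done_gc += 1
    return done_host, done_gc

# With 10 credits and a 60% host share: 6 host commands, 2 GC commands pass.
assert schedule(10, 0.6, list(range(8)), list(range(8))) == (6, 2)
```

Tuning `host_share` (or the per-type costs) is exactly the kind of control-parameter choice the application proposes to learn with a neural network instead of fixing by hand.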
To solve this problem, the present application collects actual operating parameters of the solid state storage device, constructs a training sample set for a neural network from those operating parameters, and trains the neural network on the sample set until a training result meeting the requirements is obtained; the control parameters corresponding to different operating parameters are then determined from the training result. In this way, no matter how complex the actual working environment of the storage device is, only a suitable neural network needs to be constructed to obtain suitable control parameters, ensuring quality of service (QoS) in any working environment.
Further, the present application can collect the operating parameters of the storage device in its real working environment and construct the neural network from those parameters, so that the control parameters finally obtained better match the actual working environment.
Furthermore, when constructing the training sample set, the present application can, according to the variation pattern of certain operating parameters (such as iops), selectively collect only the operating parameters observed in a steady or ideal state for training the neural network. Since the stability or consistency of the device's IO command processing capability is higher in a steady or ideal state, the control parameters trained from these operating parameters enable the solid state storage device to work in a steady or ideal state as much as possible.
According to a first aspect of the present application, there is provided a first method for optimizing control parameters for IO command scheduling according to the first aspect, comprising: in response to obtaining operational data of a storage device, constructing a training sample set, wherein the training sample set comprises a plurality of paired input samples and output samples, the input samples comprising operational parameters related to host IO command processing, the output samples comprising target values of one or more control parameters controlling host IO command scheduling; and training the neural network by using the training sample set, and outputting a training result matched with the output sample so as to determine the control parameters of the IO command scheduling according to the training result.
According to a first method for optimizing control parameters of IO command scheduling of a first aspect of the present application, there is provided a second method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, constructing a training sample set includes: collecting a plurality of data points from the working data, and taking the data corresponding to each data point as an input sample; and obtaining an output sample corresponding to each input sample in a steady state or ideal state.
According to the second method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a third method according to the first aspect of the present application, wherein collecting a plurality of data points from the working data includes: selecting a working period from the working data and collecting a plurality of data points within that working period, wherein a working period refers to the time corresponding to one balance cycle between the write data capacity and the garbage collection capacity of the storage device, and the working data includes a plurality of working periods; or collecting a plurality of data points from data corresponding to a plurality of working periods.
According to the third method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a fourth method according to the first aspect of the present application, wherein collecting a plurality of data points within the working period includes: collecting a plurality of data points from the working period at set time intervals.
According to the third method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a fifth method according to the first aspect of the present application, wherein collecting a plurality of data points from data corresponding to a plurality of working periods includes: collecting a group of data points from each of the plurality of working periods to obtain multiple groups of data points; or selecting a plurality of ideal working data points from the plurality of working periods and taking each ideal working data point as one of the data points, wherein an ideal working data point is a data point whose corresponding working parameters meet a preset requirement.
According to the fifth method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a sixth method according to the first aspect of the present application, wherein the preset requirements include: at a specified number of valid transfer units (VTC), the number of host IO commands processed per second (host-iops) or the number of garbage collection commands processed per second (gc-iops) is maximal; or, at a specified number of valid transfer units (VTC), the jitter of the number of host IO commands processed per second (host-iops) and/or of the number of garbage collection commands processed per second (gc-iops) is minimal.
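Selecting "ideal working data points" by the criteria above (maximal iops, or minimal jitter, at a given VTC) might look like the following sketch; the data layout and field names are assumptions for illustration only:

```python
# Ideal-point selection sketch (hypothetical): among data points sharing a VTC
# value, keep the one with maximal host-iops, or the one with minimal jitter.

def select_ideal(points, vtc, criterion="max_iops"):
    candidates = [p for p in points if p["vtc"] == vtc]
    if criterion == "max_iops":
        return max(candidates, key=lambda p: p["host_iops"])
    return min(candidates, key=lambda p: p["jitter"])   # "min_jitter"

pts = [
    {"vtc": 100, "host_iops": 4800, "jitter": 0.10},
    {"vtc": 100, "host_iops": 5000, "jitter": 0.25},
    {"vtc": 200, "host_iops": 4000, "jitter": 0.05},
]
assert select_ideal(pts, 100)["host_iops"] == 5000
assert select_ideal(pts, 100, "min_jitter")["jitter"] == 0.10
```

Note that the two criteria can pick different points at the same VTC, which is why the claim states them as alternatives.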
According to the third method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a seventh method according to the first aspect of the present application, wherein the input samples include: the number of valid transfer units (VTC) and the number of host IO commands processed per second (host-iops).
According to the seventh method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided an eighth method according to the first aspect of the present application, wherein the input samples further comprise working parameters related to garbage collection IO command processing, and the output samples further comprise target values of one or more control parameters controlling garbage collection IO command scheduling.
According to an eighth method for optimizing control parameters of an IO command schedule of the first aspect of the present application, there is provided a ninth method for optimizing control parameters of an IO command schedule of the first aspect of the present application, the input samples further comprising: one or more of the number of garbage collection commands per second (gc-iops), the number of free blocks, and the scene parameters.
According to the ninth method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a tenth method according to the first aspect of the present application, wherein the storage device has a multi-core processor; the number of host IO commands processed per second (host-iops) comprises the number of host IO commands processed per second by each core; and the number of garbage collection commands processed per second (gc-iops) comprises the number of garbage collection IO commands processed per second by each core.
According to the ninth method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided an eleventh method according to the first aspect of the present application, wherein the scene parameters include a mixed read-write scene parameter, a high-low temperature scene parameter, a read error scene parameter, and an alarm temperature scene parameter; the mixed read-write scene parameter represents the ratio of read commands to write commands processed in the mixed read-write state; the high-low temperature scene parameter includes one or more temperature values in a high temperature scenario, or one or more temperature values in a low temperature scenario; the alarm temperature scene parameter includes one or more temperature values greater than a preset alarm temperature; and the read error scene parameter includes the number of read data error bits.
According to one of the second to eleventh methods for optimizing control parameters of an IO command schedule according to the first aspect of the present application, there is provided a twelfth method for optimizing control parameters of an IO command schedule according to the first aspect of the present application, the output samples include credit (credit) of an IO command, and the credit (credit) of the IO command is used for controlling the IO command schedule and processing.
According to the twelfth method for optimizing control parameters of IO command scheduling of the first aspect of the present application, there is provided a thirteenth method according to the first aspect of the present application, wherein different types of IO commands consume different credits, the types of IO commands being divided into host IO commands and garbage collection IO commands.
According to a twelfth or thirteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, there is provided the fourteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, wherein the output samples further comprise resource parameters for read commands, write commands, the host and/or garbage collection; wherein the resource parameters include: one or more of cache number, time slice parameter, memory consumption and CPU processing resource amount.
According to a fourteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, there is provided the fifteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, wherein the determining an output sample corresponding to each input sample in an ideal state includes: acquiring a write bandwidth proportion or a write bandwidth value at a specified number of valid transmission units (VTC) in the ideal state; and determining the credit (credit) corresponding to the specified VTC according to the write bandwidth proportion or the write bandwidth value.
According to one of the first to fifteenth methods for optimizing control parameters of IO command scheduling according to the first aspect of the present application, there is provided the sixteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, wherein training a neural network with the training sample set includes: inputting the input sample of each pair of input samples and output samples in the training sample set into the neural network to obtain a training output value of the neural network, so as to complete one forward training; in response to the error value between the training output value and the corresponding output sample being greater than a set error threshold, or the number of forward trainings being less than a set number, updating the weight parameters of the neural network to complete one reverse training; repeating the forward training and the reverse training; and finishing the training in response to the error between the training output value and the output sample being not greater than the set error threshold, or the number of forward trainings being not less than the set number.
According to a sixteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, there is provided the seventeenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, further comprising: in response to completion of training on the data in the training sample set, outputting only a training result matching the output samples.
According to a sixteenth or seventeenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, there is provided the eighteenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, further comprising: storing the training result in response to the training being completed in a storage device; and providing the training result to a storage device in response to the training not being completed in the storage device.
According to an eighteenth method for optimizing control parameters of an IO command schedule according to the first aspect of the present application, there is provided a method for optimizing control parameters of an IO command schedule according to the nineteenth method of the first aspect of the present application, the training result comprises a control parameter lookup table or weight parameters of a trained neural network, wherein the control parameter lookup table comprises one or more entries, each entry comprising an input sample and its corresponding training output value.
According to a sixteenth or seventeenth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, there is provided the twentieth method for optimizing control parameters of IO command scheduling according to the first aspect of the present application, further comprising: collecting real-time working parameters of the storage device in response to the training being completed in the storage device; and determining the control parameters for IO command scheduling according to the real-time working parameters and the training result.
According to a second aspect of the present application, there is also provided a computer device comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements one of the first to twentieth methods for optimizing control parameters of IO command scheduling according to the first aspect of the present application.
According to a third aspect of the present application, there is also provided a first storage device according to the third aspect of the present application, comprising a controller and a memory, the memory storing a computer program, the controller implementing one of the first to twentieth methods for optimizing control parameters of IO command scheduling according to the first aspect of the present application when the computer program is executed.
According to a first storage device of a third aspect of the present application, there is provided a second storage device of the third aspect of the present application, the controller comprising a host command processing unit and a storage command processing unit, the host command processing unit or the storage command processing unit executing the computer program.
According to a fourth aspect of the present application, there is also provided a computer readable storage medium having stored therein a computer program which, when executed, implements one of the first to twentieth methods for optimizing control parameters of IO command scheduling according to the first aspect of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for the embodiments or the description of the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments described in the present application, and a person having ordinary skill in the art may obtain other drawings from these drawings.
FIG. 1A is a block diagram of a prior art solid state storage device;
FIG. 1B is a schematic diagram of a control unit of the prior art;
FIG. 2 is a schematic diagram of a large block;
FIG. 3 is a schematic diagram of a garbage collection process;
FIG. 4 is a flow chart of a method for optimizing control parameters according to an embodiment of the present application;
FIG. 5 is a schematic diagram of working data in a random write state according to an embodiment of the present application;
FIG. 6 is a flowchart of a neural network training method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a method of optimizing control parameters according to one embodiment of the present application;
FIG. 8 is a schematic diagram of a method of optimizing control parameters according to another embodiment of the present application;
fig. 9 is a flowchart of a scheduling method of an IO command according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application, taken in conjunction with the accompanying drawings, clearly and completely describes the technical solutions of the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The conception of the application is as follows: a suitable training result is obtained by training a neural network, and control parameters of the storage device are obtained according to the training result, thereby optimizing the control parameters. Fig. 4 shows a flow chart of a method for optimizing control parameters for IO command scheduling, the method comprising step S1 and step S2. The following is a detailed description.
Step S1: in response to obtaining the working data of the storage device, a training sample set is constructed, wherein the training sample set comprises a plurality of paired input samples and output samples, the input samples comprise working parameters related to host IO command processing, and the output samples comprise target values of one or more control parameters for controlling host IO command scheduling.
Step S2: train the neural network with the training sample set, and output a training result matching the output samples, so as to determine the control parameters for IO command scheduling according to the training result.
For neural network applications, the choice of input samples and output samples is critical. In order to implement the inventive concept of the present application, first, sample data for training of the neural network needs to be acquired, where the sample data refers to a training sample set in step S1. In the present application, the training sample set is obtained through working data of a storage device, where the working data of the storage device refers to data collected during an actual working process of the storage device, where the data may be obtained by statistics, measurement or calculation of the storage device, or may be obtained through other manners by the storage device. For example, the operating data of the storage device may include physical characteristics such as bandwidth, number of IO commands processed per second (iops), or average latency. In an embodiment of the present application, the training sample set includes a plurality of paired input samples and output samples, wherein the input samples are used as inputs to the neural network to train the neural network, and the output samples are used as reference values of output values of the neural network to determine training effects of the neural network. By way of example, the input samples include operating parameters related to host IO command processing and the output samples include target values for one or more control parameters controlling host IO command scheduling. 
By way of example, the operating parameter may be a physical characteristic parameter such as bandwidth, number of IO commands processed per second (iops) or average delay, and the control parameter may be a parameter associated with IO processing resources, such as the number of cache units required for IO processing, the time taken or the memory consumption, etc.; the control parameter may also be a parameter associated with IO processing capability, such as a credit value for scheduling IO commands (e.g., may be set such that the greater the credit value, the more likely an IO command is to be responded to).
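As a minimal sketch (the field names and numeric values here are hypothetical, not taken from the application), one paired training sample might be represented as:

```python
# Hypothetical sketch of one paired training sample. The input sample holds
# operating parameters observed on the storage device; the output sample
# holds target values of the control parameters for IO command scheduling.
input_sample = {
    "vtc": 20,           # number of valid transmission units
    "host_iops": 45000,  # host IO commands processed per second
}
output_sample = {
    "host_credit": 120,  # credit value for scheduling host IO commands
    "gc_credit": 40,     # credit value for garbage-collection IO commands
}
training_pair = (input_sample, output_sample)
```

A training sample set is then simply a collection of such pairs, the input half feeding the neural network and the output half serving as the reference value for its output.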
The training sample set constructed in the present application contains two kinds of data: input samples and output samples. For ease of understanding, the process of constructing the input samples and the process of constructing the output samples are described separately below.
By way of example, a plurality of data points may be collected from the acquired working data, each data point being taken as an input sample. FIG. 5 illustrates a schematic diagram of the working data when the storage device is in a random write state. The horizontal axis represents time t and the vertical axis represents the number of host IO commands processed per second (host-iops); the waveform represents how host-iops changes back and forth over time during random writing to the storage device. By way of example, fig. 5 illustrates three duty cycles I, II and III, where each duty cycle represents a balancing process between the storage device's write data capability and its garbage collection capability. As can be seen from fig. 5, the waveform of each duty cycle appears spike-like, i.e., in one duty cycle host-iops increases gradually from a valley, then drops back to a valley after reaching a peak. The reason host-iops exhibits this change is as follows. As the random write process deepens, the number of dirty physical blocks increases and the number of valid physical blocks decreases, so the storage device's resources and ability to process host IO commands decrease. The storage device then performs garbage collection operations (GC operations) to reclaim the data in the dirty physical blocks, so that the number of valid physical blocks increases; as it increases, the storage device's resources and ability to process host IO commands also increase, and host-iops gradually rises from the valley. As host IO commands continue to be processed, the number of dirty physical blocks increases again, so host-iops falls back after reaching the peak. Due to the GC operations, the storage device exhibits a cycle-by-cycle process in handling host IO commands, resulting in multiple duty cycles.
It should be noted that the waveforms of each working period are similar, but their various parameters are not identical, such as the time or working parameters of each working period are different (that is, the working period is not a strict time period). For example, in fig. 5, the peak value of duty cycle II is lower, while the time of duty cycle III is longer. It should be understood that only host-iops are shown in fig. 5, and that in practice the working data may also include other various types of working parameters, such as the number of valid transmission units (VTCs), the number of IO commands per second (GC-iops) to process GC, without limitation.
Further, there are various ways of collecting a plurality of data points from the acquired working data, for example, a working period may be selected from a plurality of working periods in the working data, and then the data points are collected at set time intervals in the working period, for example, one data point is collected every 100 ms. The data for a plurality of input samples taken from duty cycle I in fig. 5 is given with reference to table 1.
TABLE 1
VTC(20) host-iops(I-1)
VTC(30) host-iops(I-2)
VTC(40) host-iops(I-3)
VTC(50) host-iops(I-4)
Four entries are included in Table 1, each representing a data point and corresponding to an input sample. Each data point includes two operating parameters, namely VTC and host-iops. VTC(20), VTC(30), VTC(40) and VTC(50) represent the number of valid transmission units, i.e., VTC data, the number in brackets being the specific VTC value. host-iops(I-1), host-iops(I-2), host-iops(I-3) and host-iops(I-4) represent host-iops data. The data points in Table 1 all come from duty cycle I, so the "I" in the brackets of host-iops(I-1) through host-iops(I-4) denotes duty cycle I.
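The interval-based collection described above can be sketched as follows; the 100 ms interval comes from the text, while the toy telemetry function standing in for real device measurements is an assumption:

```python
def collect_data_points(read_telemetry, cycle_start_ms, cycle_end_ms,
                        interval_ms=100):
    """Sample (VTC, host-iops) data points at a fixed interval within one
    selected working cycle; each data point becomes one input sample."""
    samples = []
    t = cycle_start_ms
    while t <= cycle_end_ms:
        vtc, host_iops = read_telemetry(t)  # device measurement at time t
        samples.append((vtc, host_iops))
        t += interval_ms
    return samples

# Toy telemetry standing in for real working data of the storage device.
points = collect_data_points(lambda t: (20 + (t // 100) * 10, 40000 + t),
                             cycle_start_ms=0, cycle_end_ms=300)
# points covers VTC(20), VTC(30), VTC(40), VTC(50), mirroring Table 1.
```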
Similarly, data may be collected from other working periods. Referring to Table 2, a plurality of data points are collected from the working data corresponding to working period II, and an input sample corresponding to each data point is obtained; thus, in Table 2, the "II" in the brackets of host-iops(II-1), host-iops(II-2), host-iops(II-3) and host-iops(II-4) denotes working period II. Referring to Table 3, a plurality of data points are collected from the working data corresponding to working period III, and an input sample corresponding to each data point is obtained; thus, in Table 3, the "III" in the brackets of host-iops(III-1), host-iops(III-2), host-iops(III-3) and host-iops(III-4) denotes working period III.
TABLE 2
VTC(20) host-iops(II-1)
VTC(30) host-iops(II-2)
VTC(40) host-iops(II-3)
VTC(50) host-iops(II-4)
TABLE 3
VTC(20) host-iops(III-1)
VTC(30) host-iops(III-2)
VTC(40) host-iops(III-3)
VTC(50) host-iops(III-4)
As another example, since the working data includes a plurality of working cycles, instead of selecting only one working cycle from which to collect data points, data points may be collected from the working data corresponding to each of the plurality of working cycles. For instance, the plurality of data points collected from working period I in Table 1 above serve as a first group of data points, the plurality of data points collected from working period II in Table 2 serve as a second group of data points, and the plurality of data points collected from working period III in Table 3 serve as a third group of data points; the first, second and third groups of data points are together used as the input samples, i.e., the input samples contain multiple groups of data points.
As yet another example, instead of collecting data points within one or more working cycles to obtain the input samples, a plurality of ideal working data points may be selected from across the plurality of working cycles and used as the input samples, where an ideal working data point refers to a data point whose corresponding working parameters meet preset requirements. The data of a plurality of input samples collected by this method are shown in Table 4.
TABLE 4
VTC(20) host-iops(I-1)
VTC(30) host-iops(I-2)
VTC(40) host-iops(II-3)
VTC(50) host-iops(III-4)
Each entry shown in Table 4 corresponds to a data point. The data points corresponding to the first and second entries come from working period I, the data point of the third entry comes from working period II, and the data point of the fourth entry comes from working period III. That is, two data points [VTC(20), host-iops(I-1)] and [VTC(30), host-iops(I-2)] may be selected during duty cycle I, one data point [VTC(40), host-iops(II-3)] during duty cycle II, and one data point [VTC(50), host-iops(III-4)] during duty cycle III. The four data points are consecutive in the VTC dimension, so the data point selected by this method can be understood as the ideal host-iops data (e.g., the maximum host-iops corresponding to the specified VTC value) for each specified VTC. In addition, it should be appreciated that these specified VTC values lie within the VTC range covered by one duty cycle, e.g., a duty cycle covering the VTC range (0, 80).
In some cases, such as the working data shown in fig. 5, the working data of some working cycles is ideal, so the working data of those working cycles can be used to construct the input samples, and the method shown in Tables 1 to 3 applies. In other cases, the working data of no single working cycle may be ideal throughout; ideal working data points can instead be selected from across the plurality of working cycles to form the input samples, to which the manner shown in Table 4 applies.
By way of example, the preset requirements for an ideal working data point include, but are not limited to: at the specified VTC, host-iops reaches a maximum; at the specified VTC, the number of garbage collection commands processed per second (gc-iops) reaches a maximum; at the specified VTC, host-iops jitter is minimal, or at the specified VTC, gc-iops jitter is minimal. The jitter refers to the variation in the number of host IO commands or GC IO commands the storage device processes over a period of time, or to the variation in the latency of the IO commands processed at each data point. For example, in Table 4, the reason for selecting the working data point [VTC(20), host-iops(I-1)] may be that host-iops(I-1) is the largest among [VTC(20), host-iops(I-1)], [VTC(20), host-iops(II-1)] and [VTC(20), host-iops(III-1)]; or that the change in the working data point [VTC(20), host-iops(I-1)] relative to the surrounding working data points is minimal, while the changes in [VTC(20), host-iops(II-1)] and [VTC(20), host-iops(III-1)] are greater.
Also by way of example, the ideal work data points described above may be stitched together to form an ideal work cycle, and a plurality of data points (which may be ideal work data points or newly fitted data points) are collected by fitting during the ideal work cycle to form an input sample.
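The selection of ideal working data points across cycles can be sketched as follows, using the "maximum host-iops at each specified VTC" criterion named above; the data values are toy assumptions:

```python
def select_ideal_points(cycles, specified_vtcs):
    """For each specified VTC, pick the (vtc, host_iops) data point with
    the highest host-iops across all working cycles -- one of the preset
    requirements for an 'ideal' working data point."""
    ideal = []
    for vtc in specified_vtcs:
        candidates = [p for cycle in cycles for p in cycle if p[0] == vtc]
        ideal.append(max(candidates, key=lambda p: p[1]))
    return ideal

# Toy data: two working cycles, each sampled at VTC 20 and VTC 30.
cycle_i = [(20, 41000), (30, 42000)]
cycle_ii = [(20, 39000), (30, 45000)]
input_samples = select_ideal_points([cycle_i, cycle_ii], [20, 30])
# Picks (20, 41000) from cycle I and (30, 45000) from cycle II.
```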
In the above embodiment, the input samples include VTC and host-iops. In other embodiments, the input samples may include more types of operating parameters. In one embodiment, the input samples further comprise: one or more of the number of garbage collection commands per second (gc-iops), the number of free blocks (freeblock count), and the scene parameters (scene).
For example, the input samples may be expressed as [VTC, host-iops, gc-iops, freeblock count], i.e., one input sample includes the four operating parameters VTC, host-iops, gc-iops and freeblock count. As another example, the input samples may be expressed as [VTC, host-iops, gc-iops, freeblock count, scene], i.e., one input sample includes the five operating parameters VTC, host-iops, gc-iops, freeblock count and scene. As yet another example, the input samples may be expressed as [VTC, host-iops, gc-iops], i.e., one input sample includes the three operating parameters VTC, host-iops and gc-iops.
Further, the storage device may have a single-core processor or a multi-core processor; therefore, host-iops can be divided into the number of host IO commands processed per second corresponding to each single core, and gc-iops can be divided into the number of garbage collection IO commands processed per second corresponding to each single core. For example, for a processor with two cores, the input samples may be represented as [VTC, core1_host-iops, core2_host-iops, core1_gc-iops, core2_gc-iops, freeblock count], where core1_host-iops represents the number of host IO commands processed per second by the first core, core2_host-iops represents the number of host IO commands processed per second by the second core, core1_gc-iops represents the number of garbage collection IO commands processed per second by the first core, and core2_gc-iops represents the number of garbage collection IO commands processed per second by the second core.
The scene parameters represent the states of the storage device; working data of the storage device in different working states can be collected for training, so as to obtain control parameters for those different working states. For example, fig. 5 shows working data collected in the random write state. In addition, the storage device may be in states such as a mixed read-write state, a high-temperature state, a low-temperature state, a read error state, and an alarm temperature state, which respectively correspond to the mixed read-write scene parameter, the high-temperature scene parameter, the low-temperature scene parameter, the read error scene parameter, and the alarm temperature scene parameter.
The hybrid read-write scenario refers to that the host sends a read command in addition to a write command to the storage device, and the storage device may process the write command and also may process the read command. The hybrid read-write scene parameters may be expressed as: the ratio of memory device processing read commands to processing write commands in the hybrid read-write state. A high temperature scenario indicates that the storage device operates in a high temperature environment, where a high temperature environment refers to an environment having a temperature higher than normal temperature, for example, an environment of 80 degrees, 100 degrees, or 125 degrees. A low temperature scenario indicates that the storage device operates in a low temperature environment, where the low temperature environment refers to an environment having a temperature lower than normal temperature, for example, an environment of 0 degrees or 40 degrees below zero. Thus, the high temperature scene parameters can be expressed as: one temperature value in a high temperature scenario, or a plurality of temperature values in a high temperature scenario. The low temperature scene parameters may be expressed as: one temperature value in a low temperature scenario, or a plurality of temperature values in a low temperature scenario. The alarm temperature scene indicates that the storage device works above a preset alarm temperature, so the alarm temperature scene parameters can be expressed as: one or more temperature values greater than a preset alarm temperature. The read error scene indicates a situation in which read data is in error, and thus the read error scene parameter can be expressed as: number of read data error bits.
In other embodiments, the storage device may even be under a scene that fuses multiple states, for example, a scene fusing the mixed read-write state and the high-temperature state, where the mixed read-write scene parameter and the high-temperature scene parameter may be used together to describe the corresponding scene.
For example, since various scene parameters can be added to the input sample, working data similar to fig. 5 (the random write state) can be obtained in various different scenes, and the method of the present application can thus be applied to various scenes of the storage device. That is, regardless of the scene, the input samples can be constructed by acquiring the corresponding working data of the storage device under that scene.
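Under these representations, an input sample for a two-core processor in a mixed read-write scene could be assembled as below; all numeric values are hypothetical:

```python
# Hypothetical input-sample vector following the forms given in the text:
# [VTC, core1_host-iops, core2_host-iops, core1_gc-iops, core2_gc-iops,
#  freeblock count, scene].
input_sample = [
    20,      # VTC: number of valid transmission units
    30000,   # core1_host-iops
    28000,   # core2_host-iops
    5000,    # core1_gc-iops
    5200,    # core2_gc-iops
    12,      # freeblock count
    0.7,     # scene: ratio of read commands to write commands (mixed R/W)
]
```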
The input samples have been described in detail above; the output samples are described in detail below. In this application, the input samples and the output samples come in pairs, that is, each input sample has its corresponding output sample, where the output sample refers to the control parameters corresponding to that input sample in a steady state or an ideal state; if an input sample includes a VTC value, its corresponding output sample is the control parameters of the steady or ideal state at that specified VTC value. By way of example, the steady state may refer to a state in which the storage device's (SSD's) write data capability and garbage collection capability reach a dynamic balance; when the storage device is in the steady state, the random write performance remains substantially unchanged and the storage device tends to be stable. The ideal state is a situation in which the quality of service (QoS) of the storage device is ideal; for example, the jitter of host-iops and/or gc-iops may be used as the standard for measuring QoS, i.e., a state in which the jitter of host-iops and/or gc-iops meets expectations is taken as the ideal state. In other embodiments, the ideal state may be measured by other factors, such as bandwidth or average latency, or by the average performance of the storage device after these factors are combined.
As mentioned above, the control parameter in the output sample may be a parameter associated with IO processing resources, for example, the number of cache units required for IO processing, the time occupied, or the memory consumption; the control parameter may also be a parameter associated with IO processing capability, such as a credit value for scheduling IO commands, and the credit consumed by different IO commands may differ. For example, in storage technology, IO commands can be divided into two categories: host IO commands and garbage collection IO commands, and the credit consumed by processing a host IO command can be set to differ from that consumed by processing a garbage collection IO command. In addition, credit can be allocated per time slice, for example once every 100 ms. After the storage device receives an IO command within the 100 ms slice, it determines the type of the IO command, then determines the credit required to process it according to the type, and checks whether the remaining credit is sufficient to process the IO command; if not, processing of the IO command is suspended until enough credit remains.
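A minimal sketch of the time-slice credit mechanism just described; the per-command costs and the per-slice budget are hypothetical values, not figures from the application:

```python
class CreditScheduler:
    """Time-slice credit scheduling: credit is replenished at the start of
    each slice (e.g. every 100 ms); host and garbage-collection IO commands
    consume different amounts, and a command is suspended when the
    remaining credit is insufficient."""
    COST = {"host": 2, "gc": 1}  # hypothetical per-command credit costs

    def __init__(self, credit_per_slice):
        self.credit_per_slice = credit_per_slice
        self.remaining = credit_per_slice

    def new_time_slice(self):
        # Called once per time slice, e.g. every 100 ms.
        self.remaining = self.credit_per_slice

    def try_schedule(self, cmd_type):
        cost = self.COST[cmd_type]       # credit required for this type
        if self.remaining >= cost:
            self.remaining -= cost
            return True                  # command is dispatched
        return False                     # suspended until credit suffices

sched = CreditScheduler(credit_per_slice=5)
assert sched.try_schedule("host")        # remaining: 3
assert sched.try_schedule("host")        # remaining: 1
assert not sched.try_schedule("host")    # host cost 2 > 1: suspended
assert sched.try_schedule("gc")          # gc cost 1 fits: remaining 0
sched.new_time_slice()                   # next slice replenishes credit
assert sched.remaining == 5
```

Tuning `COST` and `credit_per_slice` is exactly where the trained control parameters (the credit targets in the output samples) would plug in.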
In this application, there are various ways to construct the output samples from the acquired working data. For example, when the control parameter is the credit (credit) of an IO command, the write bandwidth proportion or write bandwidth value of the steady or ideal state at a specified VTC is acquired, and the credit of the IO command at that VTC is then calculated from the write bandwidth proportion or write bandwidth value. A specific calculation method may be: design a positive correlation function, taking the write bandwidth proportion or write bandwidth value as the independent variable of the function and the credit (credit) of the IO command at the specified VTC as the dependent variable.
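One hypothetical positive correlation function of the kind described above, here a simple linear map; any monotonically increasing function would satisfy the text, and `max_credit` is an assumed scaling constant:

```python
def credit_for_vtc(write_bw_ratio, max_credit=256):
    """Map the ideal-state write-bandwidth ratio at a specified VTC to the
    credit of the IO command: the higher the ratio, the more credit."""
    if not 0.0 <= write_bw_ratio <= 1.0:
        raise ValueError("write bandwidth ratio must lie in [0, 1]")
    return round(max_credit * write_bw_ratio)

assert credit_for_vtc(0.5) == 128
assert credit_for_vtc(0.25) < credit_for_vtc(0.75)  # positive correlation
```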
By way of example, in addition to the credit (credit) for the IO command described above, the output samples also include resource parameters for read commands, write commands, host and/or garbage collection; wherein the resource parameters include: one or more of the number of caches, time slice parameters (described in detail below), memory consumption, and CPU processing resources.
By way of example, the output samples may be represented as [ host-credit ], where host-credit represents the credit of the host IO command. In another embodiment, the output samples may also be denoted as [ host-credit, gc-credit ], where gc-credit represents the credit of the garbage collection IO command. In yet another embodiment, the output samples may be expressed as [ host-credit, gc-credit, buffer-allocated ], where buffer-allocated represents the number of buffers.
The foregoing describes a detailed method of constructing a training sample set, and a method of training a neural network is described below.
In step S2, a Neural Network (NN) is a mathematical or computational model that mimics the structure and function of a biological neural network. A neural network is made up of a large number of nodes and the interconnections between them. The link between every two nodes carries a weight (also called a weight value or weight parameter) applied to the signal passing through the connection. The output of the network differs according to the connection mode, the weight values and the nodes of the network. For example, a simple neural network model may include an input layer, a hidden layer, and an output layer. The training process of the neural network continuously adjusts the weight values of the network connections, based on a preset network model, until the difference between the network's output and the output samples is small enough. Training the neural network is a continuously iterated process of forward training (forward propagation) and reverse training (back propagation): in forward training, the output value of the neural network is calculated from an input sample and the current network weights; in reverse training, the network weights are adjusted according to the error between the network's output value and the output sample (for example, using a gradient descent algorithm); the forward and reverse processes are then repeated, with the network weights adjusted to refine the neural network, until the error between the network output and the output samples is sufficiently small. For the purposes of this application, the neural network model may use various types of architectures, such as feedforward neural networks, deep neural networks, recurrent neural networks, or symmetrically connected networks.
Regarding the selection of the neural network model, a person skilled in the art can choose freely according to the actual situation; since the architecture, principle, and configuration of the various network models belong to the prior art, a detailed description is omitted here.
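As a minimal sketch of the iterated forward/backward training described above (not taken from the application), the following trains a one-hidden-layer network by gradient descent; the layer sizes, learning rate, and sample shapes are illustrative choices only, with the input/output dimensions loosely mirroring the [VTC, host-iops, gc-iops, free block count] and [host-credit, gc-credit] sample forms used later.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden weight parameters
W2 = rng.normal(scale=0.5, size=(8, 2))   # hidden -> output weight parameters

def forward(x):
    h = np.tanh(x @ W1)                   # hidden layer activation
    return h, h @ W2                      # network output value

def train_step(x, y, lr=0.05):
    """One forward training pass plus one reverse (weight-adjusting) pass."""
    global W1, W2
    h, out = forward(x)                   # forward propagation
    err = out - y                         # error vs. the output sample
    grad_W2 = h.T @ err / len(x)          # reverse propagation: output-layer gradient
    grad_h = err @ W2.T * (1.0 - h**2)    # backpropagate through tanh
    grad_W1 = x.T @ grad_h / len(x)
    W2 -= lr * grad_W2                    # adjust the weight parameters
    W1 -= lr * grad_W1
    return float((err**2).mean())         # mean squared error of this pass

x = rng.normal(size=(16, 4))              # stand-in input samples
y = rng.normal(size=(16, 2))              # stand-in output samples
errors = [train_step(x, y) for _ in range(300)]
```

Repeating the forward and reverse passes drives the error downward, which is the sense in which the weights are adjusted "until the error is sufficiently small".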
Fig. 6 shows a flowchart of a neural network training method according to an embodiment of the present application; the flowchart may be understood as the detailed steps of step S2 in fig. 4, and includes steps S21 to S25, described below.
In step S21, after the training sample set is constructed, a test data set may be constructed by selecting some or all of the pairs of input samples and output samples in the training sample set.
Step S22: train the neural network with each pair of input samples and corresponding output samples in the test data set. For example, the input samples and output samples may take the form [VTC, host-iops, gc-iops, free block count] and [host-credit, gc-credit]. An input sample is fed to the neural network to obtain a training output value, completing one forward training pass.
Step S23: judge whether the error value between each training output value and the corresponding output sample is not greater than a set error threshold, or whether the number of forward training passes is not less than a set count. Step S23 determines whether training is complete; both the error value between the training output value and the corresponding output sample and the number of forward training passes serve as completion criteria. In an actual application scenario, the error threshold and the count may be predetermined.
In response to the error value between the training output value and the corresponding output sample being greater than the set error threshold and the number of forward training passes being less than the set count, the flow jumps to step S24, where the weight parameters of the neural network are updated to complete one reverse training pass. In response to the error value being not greater than the set error threshold, or the number of forward training passes being not less than the set count, the training target has been reached; the flow jumps to step S25, the training process of the neural network is complete, and a training result is obtained; for example, the training result includes the trained neural network or its weight parameters.
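The S21-S25 control flow can be sketched as follows. This is a hedged illustration with a single scalar weight standing in for the network; the sample values, error threshold, pass limit, and learning rate are all invented for the example.

```python
import random

random.seed(0)
samples = [(x, 2.0 * x) for x in range(1, 11)]   # paired input/output samples
test_set = samples[::2]                          # S21: select part of the sample set

w = 0.0                                          # the network's "weight parameter"
ERROR_THRESHOLD = 1e-4                           # set error threshold
MAX_PASSES = 10_000                              # set count of forward passes

passes = 0
while True:
    x, y = random.choice(test_set)
    out = w * x                                  # S22: one forward training pass
    passes += 1
    err = (out - y) ** 2                         # S23: error vs. the output sample
    if err <= ERROR_THRESHOLD or passes >= MAX_PASSES:
        break                                    # S25: training complete
    w -= 0.01 * 2 * (out - y) * x                # S24: reverse training (gradient step)

print(round(w, 2))
```

Either stopping criterion ends the loop: a sufficiently small error, or exhausting the allowed number of forward passes.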
By way of example, since the training sample set includes a plurality of pairs of input samples and output samples, not every training output value will fall within the set error threshold of its corresponding output sample. Only training results that match the output samples are output (these may be used to form a control parameter lookup table, described below), while training results that do not match are not output. It should be understood that, herein, matching means that the error value between the training output value and the corresponding output sample is within the set error threshold.
As an example, in step S21, a verification data set may be constructed at the same time as the test data set. After step S25, the trained neural network may also be validated using the validation data set.
As a further example, in the present application, the above scheme for optimizing the control parameters of IO command scheduling may be executed by the storage device itself or by another computer device other than the storage device. FIG. 7 illustrates a schematic diagram of a method for optimizing control parameters (i.e., an offline training method) performed by a computer device according to an embodiment of the present application.
As shown in fig. 7, to optimize the control parameters, the computer device first obtains working data from the storage device; it should be understood that the working data obtained by the computer device is the working data of the storage device in actual operation. A training sample set is then constructed from the acquired working data (the specific construction process is described above and is not repeated here). The computer device then trains the neural network on the training sample set, evaluates the training result, and continuously adjusts the weight parameters of the neural network, finally obtaining values of one or more control parameters that match the output samples. Since the whole training process is performed on a computer device, such as a PC, rather than on the storage device (i.e., the training is offline), the training results (weight parameters or a control parameter lookup table) obtained on the computer device need to be imported into the storage device in order to control the storage device with them. For example, the storage device generates control parameters by means of firmware to schedule the various types of IO commands; the firmware of the storage device can then be upgraded with a burning program so that the training result is solidified into the storage device. The storage device can thus conveniently configure the values of the control parameters during IO command processing according to the training result, keeping the data writing capability and garbage collection capability of the storage device in a steady or ideal state as far as possible, thereby improving the quality of service of the storage device.
For example, to enable the storage device to configure the values of the control parameters during IO command processing according to a training result, the training result may be a control parameter lookup table, where the lookup table includes a plurality of entries, and each entry contains a training output value matched with an output sample together with its corresponding input sample. To obtain the control parameter lookup table, the training output values that match the output samples during forward training need to be recorded and output. For example, the input sample, training output value, and error value (between the training output value and the output sample) of each forward training pass are recorded, and the input sample and corresponding training output value with the minimum error value (for example, an error within a set range) are saved to construct the control parameter lookup table. An exemplary control parameter lookup table is shown in table 5.
TABLE 5

| Operating parameter 1 | Operating parameter 2 | Control parameter (training output value) |
| --------------------- | --------------------- | ----------------------------------------- |
| VTC(1)                | host-iops(1)          | host-credit(1)                            |
| VTC(2)                | host-iops(2)          | host-credit(2)                            |
| VTC(3)                | host-iops(3)          | host-credit(3)                            |
Three entries are included in table 5, each containing an input sample and its corresponding training output value. For example, the first entry contains two operating parameters, VTC(1) and host-iops(1), and the corresponding control parameter (training output value) host-credit(1). When the storage device uses the control parameter lookup table, it performs a lookup in table 5 according to the current actual VTC(x) and host-iops(x); for example, if VTC(x) = VTC(2) and host-iops(x) = host-iops(2) are found, the current control parameter is determined to be host-credit(x) = host-credit(2). In another example, no entry with the same VTC(x) and host-iops(x) exists in the control parameter lookup table, but it can be determined that VTC(1) < VTC(x) < VTC(2) and host-iops(1) < host-iops(x) < host-iops(2); the corresponding control parameter host-credit(x) can then be determined by fitting, such that host-credit(1) < host-credit(x) < host-credit(2).
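A hypothetical sketch of using a control parameter lookup table like table 5: an exact (VTC, host-iops) match returns the stored host-credit, while a value falling between two entries is fitted by linear interpolation on host-iops. All numeric values below are invented for illustration.

```python
table = [
    # (VTC, host-iops): host-credit
    ((100, 1000), 60),
    ((200, 2000), 80),
    ((300, 3000), 90),
]

def lookup_credit(vtc, host_iops):
    for (t_vtc, t_iops), credit in table:
        if (t_vtc, t_iops) == (vtc, host_iops):
            return credit                               # exact entry found
    # No exact entry: bracket the query between neighbouring entries.
    lower = [e for e in table if e[0][0] <= vtc and e[0][1] <= host_iops]
    upper = [e for e in table if e[0][0] >= vtc and e[0][1] >= host_iops]
    (_, lo_iops), lo_credit = max(lower, key=lambda e: e[0])
    (_, hi_iops), hi_credit = min(upper, key=lambda e: e[0])
    frac = (host_iops - lo_iops) / (hi_iops - lo_iops)  # fit between the two entries
    return lo_credit + frac * (hi_credit - lo_credit)

print(lookup_credit(200, 2000))   # exact entry -> 80
print(lookup_credit(150, 1500))   # interpolated -> 70.0
```

The interpolated result lies strictly between the credits of the bracketing entries, matching the host-credit(1) < host-credit(x) < host-credit(2) relation described above.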
As another example, to enable the storage device to optimize the control parameters during IO command processing according to the training result, the training result may be the weight parameters of the trained neural network, which can be extracted directly from the trained network. For this, the storage device should be provided with the same or a similar neural network as the computer device. In operation, the storage device receives the trained neural network weight parameters and configures its own neural network with them, then feeds the current actual working parameters to its neural network as input, outputs control parameters, and controls the processing of IO commands according to those control parameters.
FIG. 8 illustrates a schematic diagram of a method of optimizing control parameters (i.e., an online training method) for an embodiment of a storage device of the present application.
As shown in fig. 8, to optimize the control parameters, the storage device collects its working parameters during operation, then constructs a training sample set from the collected working parameters and the target values of the control parameters (the specific construction process is described above and not repeated here). It then trains the neural network on the training sample set, evaluates the training result, and continuously adjusts the weight parameters of the neural network, finally obtaining the trained neural network, which the storage device stores. While processing IO commands, the storage device can directly determine the corresponding control parameters from the current actual working parameters and the trained neural network. In addition, the neural network used to infer the control parameters may be updated autonomously during use of the storage device. For example, an update period may be set in which the storage device uses a first neural network to infer the control parameters while continuously collecting training samples and training a second neural network; at the end of the update period, the first neural network is replaced with the trained second neural network.
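The update period described above can be sketched as follows: inferences are served from the "first" model while a "second" model trains in the background on collected samples, and the two are swapped at the end of the period. The models are reduced here to a single scalar weight, and all class/parameter names are illustrative, not from the application.

```python
class OnlineUpdater:
    def __init__(self, model):
        self.active = model              # first network: used for inference
        self.shadow = dict(model)        # second network: being trained
        self.samples = []

    def infer(self, x):
        return self.active["w"] * x      # control parameter from the active model

    def collect(self, x, y):
        self.samples.append((x, y))      # keep collecting training samples

    def train_shadow(self):
        for x, y in self.samples:        # toy gradient steps on the shadow model
            out = self.shadow["w"] * x
            self.shadow["w"] -= 0.05 * (out - y) * x
        self.samples.clear()

    def end_of_period(self):
        self.train_shadow()
        self.active = dict(self.shadow)  # replace first network with trained second

updater = OnlineUpdater({"w": 1.0})
for x in range(1, 6):
    updater.collect(x, 3.0 * x)          # observed (working parameter, target) pairs
before = updater.infer(2)
updater.end_of_period()
after = updater.infer(2)
```

Because inference always reads the active model, the swap at `end_of_period` updates the deployed control parameters without interrupting IO command processing.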
As can be seen from fig. 7 and 8, the training results of step S25 include, but are not limited to: the weight parameters of a trained neural network, a control parameter lookup table, or a trained neural network itself.
Fig. 9 illustrates a method of performing IO command scheduling in actual operation of a storage device, including steps S91 to S95.
In step S91, the storage device collects the current actual operating parameters.
In step S92, the storage device obtains the corresponding control parameters for IO command scheduling according to the training result (e.g., the trained neural network obtained in step S25). As an example, according to the current actual working parameters and the training result, the storage device may obtain a credit of 80 for host IO commands (host-credit) and a credit of 70 for garbage collection IO commands (gc-credit).
In step S93, IO commands are scheduled according to the credit of the current time slice. By way of example, host IO commands and garbage collection IO commands in a storage device need to be scheduled. A recurring time slice may be set, with each time slice allocated a credit, so that IO commands are scheduled within each time slice. The number of credits allocated is determined by an algorithm internal to the storage device (such algorithms belong to the prior art and are not detailed here). For example, a fixed credit may be allocated for each time slice, set here to 100.
In step S94, in response to receiving an IO command, it is determined whether the credit of the current time slice is sufficient; if so, the process goes to step S95, and if not, the command is rescheduled to the next time slice.
For example, within one time slice, a host IO command is received; since the credit of the host IO command (host-credit) is less than the credit of the current time slice, i.e., 80 < 100, the credit of the current time slice is sufficient and the storage device processes the host IO command. If a garbage collection IO command is then received within the same time slice, the already-processed host IO command has consumed 80 credits, leaving 20; the credit of the garbage collection IO command exceeds the remaining credit of the time slice, i.e., 70 > 20, so the garbage collection IO command is not processed. At the next time slice, the credit is restored to 100, and the garbage collection IO command can then be processed.
For example, when the storage device receives a host IO command and a garbage collection IO command at the same time, it may determine which IO command to respond to first according to the credit of the host IO command (host-credit) and the credit of the garbage collection IO command (gc-credit). For example, since the host IO command carries a greater credit than the garbage collection IO command (80 > 70), the host IO command is responded to first.
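The credit-based scheduling of steps S93-S95 can be sketched with the example figures above: each time slice starts with 100 credits, a host IO command consumes 80 (host-credit) and a garbage collection command 70 (gc-credit), and a command exceeding the remaining credit waits for the next slice. The credit values follow the walkthrough above; the function shape is an illustrative sketch.

```python
TIME_SLICE_CREDIT = 100
COST = {"host": 80, "gc": 70}      # host-credit / gc-credit from the training result

def schedule(commands):
    """Process commands within one time slice under its credit budget."""
    remaining = TIME_SLICE_CREDIT
    processed, deferred = [], []
    for cmd in commands:
        cost = COST[cmd]
        if cost <= remaining:       # S94: credit of the current slice is sufficient
            remaining -= cost       # S95: process the command, consume its credit
            processed.append(cmd)
        else:
            deferred.append(cmd)    # insufficient credit: wait for the next slice
    return processed, deferred

# Within one slice: the host IO fits (80 <= 100), but the following GC
# command then exceeds the 20 remaining credits (70 > 20).
done, waiting = schedule(["host", "gc"])
print(done, waiting)
```

At the start of the next slice, `remaining` resets to 100 and the deferred garbage collection command can be processed.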
In the above embodiments, the credits corresponding to the host IO command and the garbage collection IO command are fixed; that is, the credit consumed by each command type is fixed. In other embodiments, the credit consumed by the host IO command and the garbage collection IO command may be variable; for example, a host IO command may determine its consumed credit (host-credit) from the data length it accesses, the accessed data length being positively correlated with the consumed credit.
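A hypothetical variable-credit variant of the scheme just described, in which the credit consumed by a host IO command grows with the data length it accesses; the 4 KiB unit size and per-unit cost are invented for illustration.

```python
def host_credit(data_length_bytes, credit_per_4k=10):
    """Credit consumed by a host IO command, positively correlated with data length."""
    units = -(-data_length_bytes // 4096)    # ceiling division into 4 KiB units
    return units * credit_per_4k             # longer access -> more credit consumed

print(host_credit(4096), host_credit(16384))   # -> 10 40
```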
In the above embodiment, each time slice is allocated the same credit; in other embodiments, time slices may be allocated different credits, for example by cycling through a set of time slices in which each slice is allocated a different credit. The time slice may be set according to the processing power of the processor, for example to 100 ms or 120 ms. The configuration mode and duration of the time slice constitute the time slice parameters; these parameters can be added to the output samples so that they become variable. For example, the time slice length can be obtained from the current actual working parameters of the storage device and the training result, realizing adaptive variation of the time slice length.
Referring to FIG. 1B, in one embodiment, the method shown in fig. 9 may be implemented in the host command processing unit 1042; for example, the host command processing unit may determine the credit consumed by a host IO command based on the data length accessed by that command. The method shown in fig. 9 may also be implemented in the storage command processing unit 1043; for example, the storage command processing unit may set each storage command to consume one unit of credit.
According to another aspect of the present application, there is also provided a computer device comprising a processor and a memory, the processor executing a computer program stored in the memory; when executed, the computer program implements the methods shown in figs. 4, 6 and 7. The result of executing the methods is to generate the training result and provide it to a storage device.
According to yet another aspect of the present application, there is also provided a storage device including a controller and a memory, the memory storing a computer program; when executing the computer program, the controller implements the methods shown in figs. 4, 6, 8 and 9, with the result that the control parameters are derived to optimize them (e.g., fig. 8), or the control parameters are derived and IO commands are scheduled with them (e.g., fig. 9). The structure of the storage device is shown in fig. 1A; the controller of the storage device corresponds to the control unit 104 in fig. 1B, and the computer program may be executed by the host command processing unit or by the storage command processing unit.
According to yet another aspect of the present application, there is also provided a computer-readable storage medium storing a computer program. In one embodiment, the computer program, when executed, implements the methods shown in fig. 4, 6 and 7. The result of performing the method is to generate the training result and to provide the training result to a storage device. That is, in this embodiment, the computer program is executed by a computer device other than the storage device.
In another embodiment, the computer program, when executed, implements the methods shown in figs. 4, 6, 8 and 9, with the result that the control parameters are derived to optimize them (e.g., fig. 8), or the control parameters are derived and IO commands are scheduled (e.g., fig. 9). That is, in this embodiment, the computer program is executed by the storage device itself, and no computer device other than the storage device is required.
The foregoing computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, the computer-readable storage medium may be any suitable magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC), or any other medium that can store the desired information and be accessed by an application, a module, or both. Any such computer storage medium may be part of, accessible by, or connectable to the device.
It should be noted that, for the sake of brevity, some methods and embodiments thereof are described in the present application as a series of actions and combinations thereof, but those skilled in the art will understand that the aspects of the present application are not limited by the order of the described actions. Thus, one of ordinary skill in the art will appreciate, in light of the present disclosure or teachings, that certain steps may be performed in other sequences or concurrently. Further, those skilled in the art will appreciate that the embodiments described herein may be considered alternative embodiments, i.e., the acts or modules involved are not necessarily required for the implementation of some or all aspects of the present application. In addition, each of the embodiments described in the present application has its own emphasis. In view of this, those skilled in the art will appreciate that portions of one embodiment not described in detail may be understood with reference to the other embodiments.
In particular implementations, based on the disclosure and teachings of the present application, one of ordinary skill in the art will appreciate that several embodiments disclosed herein may also be implemented in other ways not disclosed herein. For example, in terms of the foregoing embodiments of the electronic device or apparatus, the units are split in consideration of the logic function, and there may be another splitting manner when actually implemented. For another example, multiple units or components may be combined or integrated into another system, or some features or functions in the units or components may be selectively disabled. In terms of the connection relationship between different units or components, the connections discussed above in connection with the figures may be direct or indirect couplings between the units or components. In some scenarios, the foregoing direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustical, magnetic, or other forms of signal transmission.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A method for optimizing control parameters for IO command scheduling, comprising:
in response to obtaining operational data of a storage device, constructing a training sample set, wherein the training sample set comprises a plurality of paired input samples and output samples, the input samples comprising operational parameters related to host IO command processing, the output samples comprising target values of one or more control parameters controlling host IO command scheduling;
and training the neural network by using the training sample set, and outputting a training result matched with the output sample so as to determine the control parameters of the IO command scheduling according to the training result.
2. The method of claim 1, wherein constructing a training sample set comprises:
collecting a plurality of data points from the working data, and taking the data corresponding to each data point as an input sample;
and obtaining an output sample corresponding to each input sample in a steady state or ideal state.
3. The method of claim 2, wherein collecting a plurality of data points from the operational data comprises:
selecting a working period from the working data, and collecting a plurality of data points in the working period, wherein the working period refers to the time corresponding to one balance cycle between the writing data capacity and the garbage recycling capacity of the storage equipment, and the working data comprises a plurality of working periods; or
A plurality of data points are collected from data corresponding to a plurality of duty cycles.
4. A method according to any one of claims 1 to 3, wherein the input samples comprise: the number of valid transfer units (VTC) and the number of host IO commands processed per second (host-iops).
5. The method of claim 4, wherein the input samples further comprise:
one or more of the number of garbage collection commands per second (gc-iops), the number of free blocks, and the scene parameters.
6. The method of claim 5, wherein the storage device has a multi-core processor;
the number of host IO commands processed per second (host-iops) comprises the number of host IO commands processed per second corresponding to each single core;
the number of garbage collection commands per second (gc-iops) includes the number of garbage collection IO commands per second corresponding to each single core.
7. The method of any of claims 1 to 6, wherein the output samples include a credit (credit) for the IO command, the credit (credit) for the IO command being used to control IO command scheduling and processing.
8. The method according to any one of claims 1 to 6, further comprising:
storing the training results in response to the training being completed in a storage device;
and providing the training results to a storage device in response to the training not being completed in the storage device.
9. The method of claim 8, wherein the training results comprise a control parameter lookup table or weight parameters of the trained neural network, wherein the control parameter lookup table comprises one or more entries, each entry containing an input sample and its corresponding training output value.
10. A computer device comprising a processor and a memory, the memory storing a computer program, the processor implementing the method of any one of claims 1 to 9 when executing the computer program.