CN115079936A - Data writing method and device - Google Patents

Info

Publication number
CN115079936A
Authority
CN
China
Prior art keywords
request
read
data
storage device
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110281305.4A
Other languages
Chinese (zh)
Inventor
鲁鹏
金季焜
刘金虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110281305.4A priority Critical patent/CN115079936A/en
Publication of CN115079936A publication Critical patent/CN115079936A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/064 Management of blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this application provide a data writing method and device, relating to the field of data storage. The method includes: a storage device obtains a first read IO request and, according to a characteristic value of the first read IO request and a data feature analysis model, determines at least one second IO request that has an association relationship of being read consecutively with the first read IO request, where the second IO request is stored in a memory of the storage device; the storage device then writes the first read IO request and the at least one second IO request together into a mechanical hard disk (HDD) of the storage device. For a storage device that includes multiple storage media (such as an HDD and an SSD), the storage device writes the first read IO request and all second IO requests into the HDD together. IO requests that are read consecutively are thus written to the HDD as a group, highly random IO is kept off the HDD, and the data read/write speed of the HDD, and in turn of the storage device, is improved.

Description

Data writing method and device
Technical Field
The present application relates to the field of data storage, and in particular, to a data writing method and apparatus.
Background
A Redundant Array of Independent Disks (RAID) combines multiple disks into a redundant array that can be used as a single large storage device. RAID exploits the advantages of multiple hard disks, for example by increasing read/write speed and by providing fault tolerance to protect data. A RAID may include a variety of storage media, such as hard disk drives (HDDs) and solid state drives (SSDs).
Generally, when a processor writes input/output (IO) data into a RAID, it chooses between the HDD and the SSD according to the remaining space in the RAID and writes the IO data to whichever disk has more free space. Because an HDD reads and writes IO data by moving a magnetic head across the disk, its data read/write speed is low. If the processor writes highly random IO data (for example, IO data that may be rewritten many times) into an HDD in the RAID, the HDD becomes slower still, which in turn reduces the data read/write speed of the RAID.
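As an illustration only, the cost of random access on an HDD can be sketched with a deliberately simplified model in which the head movement between two accesses is proportional to the difference between their LBAs. The function name `head_travel` and the sample addresses are assumptions made for this sketch and do not come from the application.

```python
def head_travel(lbas, start=0):
    """Total head movement when the LBAs are visited in the given order.

    Simplifying assumption: seek cost is proportional to |LBA difference|.
    Rotational latency and caching are ignored.
    """
    travel = 0
    position = start
    for lba in lbas:
        travel += abs(lba - position)
        position = lba
    return travel


random_order = [900, 10, 500, 20, 880]   # hypothetical, highly random accesses
sequential_order = sorted(random_order)  # the same blocks, visited in LBA order

print(head_travel(random_order))      # 3620
print(head_travel(sequential_order))  # 900
```

Under this model the same five blocks cost roughly four times as much head movement when visited in random order, which is the effect the method below tries to avoid.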
Therefore, how to place IO data in a RAID appropriately and improve the data read/write speed of the RAID is a problem that urgently needs to be solved.
Disclosure of Invention
This application provides a data writing method and device that address the low read/write speed of an HDD, and of a RAID containing it, caused by writing highly random IO to the HDD.
To achieve this, the application adopts the following technical solutions.
In a first aspect, an embodiment of this application provides a data writing method, which may be performed by a storage device or by a data storage system that includes the storage device. The method includes: the storage device obtains a first read IO request and, according to a characteristic value of the first read IO request and a data feature analysis model, determines at least one second IO request that has an association relationship of being read consecutively with the first read IO request, where the second IO request is stored in a memory of the storage device; the storage device then writes the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device. With this method, the storage device identifies the IO requests that are read consecutively with the first read IO request. For a storage device that includes multiple storage media (such as an HDD and an SSD), the storage device writes the first read IO request and all second IO requests into the HDD together, so that IO requests read consecutively are written to the HDD as a group and highly random IO requests are kept off the HDD. This improves the data read/write speed of the HDD and, in turn, of the storage device.
In an optional implementation, the characteristic value of the first read IO request includes a first logical block address (LBA) of the data to be written, a data length, and a timestamp. The first LBA indicates the storage location of the data to be written in the storage device, the data length indicates the number of bytes occupied by the data to be written, and the timestamp may indicate the time of the most recent data change performed on the data to be written, where the data change may be at least one of a data read, a data write, or a data rewrite.
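A minimal sketch of how such a characteristic value might be represented is given below. The class name `ReadIOFeature`, the helper `end_lba`, and the 512-byte block size are assumptions for illustration; the application only names the three fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadIOFeature:
    lba: int          # first LBA: storage location of the data to be written
    length: int       # data length: number of bytes occupied by the data
    timestamp: float  # time of the most recent read, write, or rewrite

    def end_lba(self, block_size: int = 512) -> int:
        """First LBA after this request (block_size is an assumed value)."""
        blocks = -(-self.length // block_size)  # ceiling division
        return self.lba + blocks


feature = ReadIOFeature(lba=100, length=4096, timestamp=0.0)
print(feature.end_lba())  # 108
```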
In one possible example, the storage device determines the at least one second IO request according to the characteristic value of the first read IO request and the data feature analysis model as follows: the storage device uses the characteristic value of the first read IO request and the data feature analysis model to output, for each of a plurality of IO requests, a probability value that the IO request is read consecutively with the first read IO request, and takes each IO request whose probability value reaches a probability threshold as a second IO request. These probability values let the storage device determine the association relationships among the IO requests and therefore their data access characteristics, and store the IO requests accordingly. A group of highly random IO requests is kept off the HDD, the mechanical hard disk stores IO requests with strong accessibility, and the data read/write speed of the HDD, and in turn of the storage device, is improved.
In an optional implementation, the data writing method further includes: the storage device determines, according to the characteristic value of the first read IO request and the data feature analysis model, at least one third IO request that does not have the association relationship of being read consecutively with the first read IO request, and writes the third IO request into a solid state disk of the storage device. Using the data feature analysis model and the characteristic value of the first read IO request, the storage device can determine whether a candidate IO request has the association relationship with the first read IO request, and write a third IO request that lacks it into the solid state disk. Because the storage device writes IO to different types of disk according to its data access characteristics (such as strong accessibility or strong randomness), highly random IO is kept off the HDD, and the data read/write speed of the HDD, and in turn of the storage device, is improved.
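The selection of second IO requests and the routing of third IO requests described above can be sketched as follows. This is an illustrative sketch under stated assumptions: `plan_writes`, `toy_model`, the threshold value 0.8, and the representation of a characteristic value as an `(LBA, length in blocks, timestamp)` tuple are all hypothetical, and `toy_model` is only a stand-in for the data feature analysis model, whose internals the application does not fix here.

```python
from typing import Callable, Dict, List, Tuple

Feature = Tuple[int, int, float]  # (LBA, length in blocks, timestamp)


def toy_model(first: Feature, other: Feature) -> float:
    """Stand-in model: 1.0 when `other` starts where `first` ends,
    decaying toward 0 as the LBA gap grows."""
    gap = abs(other[0] - (first[0] + first[1]))
    return 1.0 / (1.0 + gap)


def plan_writes(
    first: Feature,
    candidates: Dict[str, Feature],
    model: Callable[[Feature, Feature], float],
    threshold: float = 0.8,  # assumed value; the source gives no number
) -> Dict[str, List[str]]:
    """Group requests by target medium.

    Requests whose probability reaches the threshold (second IO requests)
    join the first read IO request on the HDD; the rest (third IO requests)
    go to the SSD.
    """
    plan: Dict[str, List[str]] = {"hdd": ["first"], "ssd": []}
    for request_id, feature in candidates.items():
        target = "hdd" if model(first, feature) >= threshold else "ssd"
        plan[target].append(request_id)
    return plan


first = (0, 8, 0.0)
candidates = {"a": (8, 8, 0.1), "b": (5000, 8, 0.2)}
print(plan_writes(first, candidates, toy_model))
# {'hdd': ['first', 'a'], 'ssd': ['b']}
```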
In an optional implementation, before the storage device determines the at least one second IO request according to the characteristic value of the first read IO request and the data feature analysis model, the data writing method further includes: the storage device obtains the characteristic value of the first read IO request from the memory. The characteristic value is stored in the memory of the storage device rather than in the mechanical hard disk or the solid state disk. Because memory is read faster than either kind of disk, the controller of the storage device spends less time reading the characteristic value, which improves the data writing speed of the storage device.
In an optional implementation, the data writing method further includes: the storage device obtains the data feature analysis model. For example, the storage device obtains an IO training set, inputs the IO training set into a first model to obtain association information, and, if the association information meets a model convergence condition, takes the first model as the data feature analysis model. The IO training set includes a plurality of test IOs, and the characteristic value of each test IO includes its read/write type, timestamp, second LBA, and data length; the association information includes a probability value that any two of the test IOs are read consecutively. With the data feature analysis model, the storage device can determine the association relationships among IO requests and, from them, the data access mode of the IO requests. This reduces the manual work of determining those relationships and increases the correlation among the data written together. During data reading, the storage device can then read the IO requests that have the association relationship sequentially, which improves the read performance of the HDD and the read/write performance of the storage device that contains it.
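The training step can be illustrated with a small stand-in. The sketch below is not the model of the application: it swaps in a plain logistic regression trained by gradient descent, and it assumes that the model convergence condition is "the loss changes by less than a tolerance between epochs". The names `TraceIO`, `pair_features`, `build_samples`, `train`, and `predict`, the pair features, and all numeric settings are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceIO:
    rw_type: str      # "read" or "write"
    timestamp: float
    lba: int          # the "second LBA" of a test IO
    length: int       # data length, in blocks here for simplicity


def pair_features(a, b):
    """Bias term, LBA closeness, and time closeness for the pair (a, b)."""
    gap = abs(b.lba - (a.lba + a.length))
    dt = abs(b.timestamp - a.timestamp)
    return [1.0, 1.0 / (1.0 + gap), 1.0 / (1.0 + dt)]


def predict(weights, x):
    """Probability that the pair described by x is read consecutively."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def build_samples(trace):
    """Label a pair 1.0 if b is the read immediately after a, else 0.0."""
    reads = sorted((io for io in trace if io.rw_type == "read"),
                   key=lambda io: io.timestamp)
    samples = []
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i != j:
                samples.append((pair_features(a, b), 1.0 if j == i + 1 else 0.0))
    return samples


def train(samples, lr=0.5, tol=1e-6, max_epochs=5000):
    """Gradient descent; stops when the loss change drops below tol
    (the assumed convergence condition) or after max_epochs."""
    weights = [0.0, 0.0, 0.0]
    prev_loss = float("inf")
    n = len(samples)
    for _ in range(max_epochs):
        grad = [0.0, 0.0, 0.0]
        loss = 0.0
        for x, y in samples:
            p = predict(weights, x)
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            for k in range(3):
                grad[k] += (p - y) * x[k]
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
        loss /= n
        if abs(prev_loss - loss) < tol:
            break
        prev_loss = loss
    return weights


trace = [
    TraceIO("read", 0.0, 0, 8),
    TraceIO("write", 0.5, 4000, 8),
    TraceIO("read", 1.0, 8, 8),
    TraceIO("read", 2.0, 16, 8),
]
weights = train(build_samples(trace))
```

After training, `predict(weights, pair_features(a, b))` plays the role of the association information: a probability that `b` is read immediately after `a`.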
In some examples, the data access modes described above include a random mode and an accessibility mode. For example, the storage device uses the data feature analysis model to determine a probability value that any two IO requests are read consecutively and compares the probability value with a probability threshold. If the probability value is greater than or equal to the probability threshold, the storage device determines that the two IO requests have the association relationship of being read consecutively, and their data access mode is the accessibility mode. If the probability value is smaller than the probability threshold, the storage device determines that the two IO requests do not have the association relationship, and their data access mode is the random mode.
In a second aspect, an embodiment of this application provides a data writing method, which may be performed by a computing device connected to a storage device. The method includes: the computing device obtains a first read IO request and, according to a characteristic value of the first read IO request and a data feature analysis model, determines at least one second IO request that has an association relationship of being read consecutively with the first read IO request; the computing device obtains a first storage message for each second IO request and sends the first read IO request and all the first storage messages to the storage device. The storage device then writes the second IO request from the memory of the storage device into the mechanical hard disk of the storage device according to the first storage message. Using the characteristic value of the first read IO request and the data feature analysis model, the computing device can determine the second IO requests that are read consecutively with the first read IO request. For a storage device that includes multiple storage media, the computing device sends the first read IO request and the first storage messages of the second IO requests to the storage device, and the storage device writes the first read IO request and all the second IO requests into the mechanical hard disk together, so that IO requests that are read consecutively are written sequentially.
In an optional implementation manner, the characteristic value of the first read IO request includes a first LBA of data to be written, a data length, and a timestamp.
In one possible example, the computing device determines the at least one second IO request according to the characteristic value of the first read IO request and the data feature analysis model as follows: the computing device uses the characteristic value of the first read IO request and the data feature analysis model to output probability values that the plurality of IO requests are read consecutively, and takes each IO request whose probability value reaches a probability threshold as a second IO request.
In an optional implementation, the data writing method further includes: the computing device determines, according to the characteristic value of the first read IO request and the data feature analysis model, at least one third IO request that does not have the association relationship of being read consecutively with the first read IO request, obtains a second storage message for each third IO request, and sends all the second storage messages to the storage device. The storage device then writes the third IO request into the solid state disk of the storage device according to the second storage message. Using the data feature analysis model and the characteristic value of the first read IO request, the computing device can determine whether a candidate IO request has the association relationship with the first read IO request, and the IO requests are stored in the mechanical hard disk or the solid state disk of the storage device accordingly. Highly random data is thus kept off the mechanical hard disk, and the data read/write speed of the storage device is improved.
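The division of work between the computing device and the storage device in the second aspect can be sketched as a small message exchange. Everything named here is hypothetical: `StorageMessage`, `build_messages`, `StorageDevice`, the `"hdd"`/`"ssd"` target strings, and the threshold 0.8 are illustration choices, and the sketch additionally assumes that third IO requests are also held in the storage device's memory before being written.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StorageMessage:
    request_id: str
    target: str  # "hdd": first storage message; "ssd": second storage message


def build_messages(probabilities: Dict[str, float],
                   threshold: float = 0.8) -> List[StorageMessage]:
    """Computing-device side: one message per candidate IO request."""
    return [
        StorageMessage(rid, "hdd" if p >= threshold else "ssd")
        for rid, p in probabilities.items()
    ]


class StorageDevice:
    """Storage-device side: moves requests from memory to the named medium."""

    def __init__(self, memory: Dict[str, bytes]) -> None:
        self.memory = dict(memory)  # request id -> buffered payload
        self.hdd: List[bytes] = []
        self.ssd: List[bytes] = []

    def handle(self, first_request: bytes,
               messages: List[StorageMessage]) -> None:
        hdd_batch = [first_request]
        for message in messages:
            payload = self.memory.pop(message.request_id)
            if message.target == "hdd":
                hdd_batch.append(payload)
            else:
                self.ssd.append(payload)
        # The first read IO request and all second IO requests are
        # written to the HDD together, as one batch.
        self.hdd.extend(hdd_batch)


device = StorageDevice({"a": b"A", "b": b"B", "c": b"C"})
device.handle(b"FIRST", build_messages({"a": 0.95, "b": 0.10, "c": 0.80}))
print(device.hdd)  # [b'FIRST', b'A', b'C']
print(device.ssd)  # [b'B']
```

Keeping the model on the computing device means the storage device only has to act on the messages, which matches the motivation given for the second aspect.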
In an optional implementation, the data writing method further includes: the computing device obtains the data feature analysis model. For example, the computing device obtains an IO training set, inputs the IO training set into a first model to obtain association information, and, if the association information meets a model convergence condition, takes the first model as the data feature analysis model. The IO training set includes a plurality of test IOs, and the characteristic value of each test IO includes its read/write type, timestamp, second LBA, and data length; the association information includes a probability value that any two of the test IOs are read consecutively. Because the processing capacity of a computing device is generally greater than that of a storage device, having the computing device obtain the data feature analysis model shortens training time and improves training efficiency.
In a third aspect, an embodiment of this application provides a data writing apparatus; for its beneficial effects, refer to the description of any implementation of the first aspect, which is not repeated here. The data writing apparatus has the function of implementing the behavior in the method examples of the first aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. In one possible design, the data writing apparatus is applied to a storage device and includes: a transceiver unit, configured to obtain a first read IO request; a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data feature analysis model, where the second IO request has an association relationship of being read consecutively with the first read IO request and is stored in a memory of the storage device; and a storage unit, configured to write the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device.
In an alternative embodiment, the characteristic value of the first read IO request includes a first LBA of the data to be written, a data length, and a time stamp.
In another optional embodiment, the processing unit is further configured to determine at least one third IO request according to the feature value of the first read IO request and the data feature analysis model, where the third IO request does not have an association relationship with the first read IO request that is read continuously; and the storage unit is further used for writing the third IO request into the solid state disk of the storage device.
In another optional embodiment, the processing unit is specifically configured to output, according to the characteristic value of the first read IO request and the data feature analysis model, probability values that the plurality of IO requests are read consecutively, and to take each IO request whose probability value reaches the probability threshold as a second IO request.
In another optional implementation manner, the transceiver unit is further configured to obtain a characteristic value of the first read IO request in the memory.
In another optional embodiment, the data writing apparatus further comprises: and the model acquisition unit is used for acquiring the data characteristic analysis model. The model obtaining unit is specifically configured to obtain an IO training set, where the IO training set includes multiple test IOs, and the characteristic values of the test IOs include read-write types, timestamps, second LBAs, and data lengths of the test IOs; the model obtaining unit is specifically configured to input the IO training set to the first model to obtain associated information, where the associated information includes a probability value that any two test IOs of the plurality of test IOs are continuously read; the model obtaining unit is specifically configured to take the first model as a data feature analysis model if the correlation information meets a model convergence condition.
In a fourth aspect, an embodiment of this application provides another data writing apparatus; for its beneficial effects, refer to the description of any implementation of the second aspect, which is not repeated here. The data writing apparatus has the function of implementing the behavior in the method examples of the second aspect. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function. In one possible design, the data writing apparatus is applied to a computing device connected to a storage device and includes: a transceiver unit, configured to obtain a first read IO request; and a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data feature analysis model, where the second IO request has an association relationship of being read consecutively with the first read IO request and is stored in a memory of the storage device. The processing unit is further configured to obtain a first storage message for each second IO request, where the first storage message instructs the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device. The transceiver unit is further configured to send the first read IO request and all the first storage messages to the storage device.
In an alternative embodiment, the characteristic value of the first read IO request includes a first LBA of the data to be written, a data length, and a time stamp.
In another optional implementation manner, the processing unit is further configured to determine at least one third IO request according to the feature value of the first read IO request and the data feature analysis model, where the third IO request does not have an association relationship with the first read IO request, and the association relationship is read continuously; the processing unit is further configured to obtain a second storage message of each third IO request, where the second storage message indicates the storage device to write the third IO request into the solid state disk of the storage device; and the transceiving unit is also used for sending all the second storage messages to the storage device.
In another optional implementation manner, the processing unit is specifically configured to output probability values that the plurality of IO requests are read continuously according to the feature values of the first read IO request and the data feature analysis model; and the processing unit is specifically configured to use the IO request with the probability value reaching the probability threshold as the second IO request.
In another optional embodiment, the data writing apparatus further comprises: and the model acquisition unit is used for acquiring the data characteristic analysis model. The model obtaining unit is specifically configured to obtain an IO training set, where the IO training set includes multiple test IOs, and the characteristic values of the test IOs include read-write types, timestamps, second LBAs, and data lengths of the test IOs; the model obtaining unit is specifically configured to input the IO training set to the first model to obtain associated information, where the associated information includes a probability value that any two test IOs of the plurality of test IOs are continuously read; the model obtaining unit is specifically configured to take the first model as a data feature analysis model if the associated information meets a model convergence condition.
In a fifth aspect, an embodiment of this application provides a storage device that includes a processor, a mechanical hard disk, and a solid state disk, where the processor implements, through logic circuits or by executing code instructions, the operation steps of the method according to the first aspect or any possible implementation of the first aspect.
In a sixth aspect, an embodiment of this application provides a computing device that includes a processor and an interface circuit. The interface circuit is configured to receive signals from other computing devices and transmit them to the processor, or to send signals from the processor to other computing devices. The processor implements, through logic circuits or by executing code instructions, the operation steps of the method according to the second aspect or any possible implementation of the second aspect.
In a seventh aspect, an embodiment of this application provides a data storage system that includes a computing device and a storage device connected to it. The computing device is configured to: obtain a first read IO request; determine at least one second IO request according to a characteristic value of the first read IO request and a data feature analysis model, where the second IO request has an association relationship of being read consecutively with the first read IO request and is stored in a memory of the storage device; obtain a first storage message for each second IO request, where the first storage message instructs the storage device to write the second IO request from the memory into a mechanical hard disk of the storage device; and send the first read IO request and all the first storage messages to the storage device. The storage device is configured to: receive the first read IO request and all the first storage messages; obtain from the memory the second IO request corresponding to each first storage message; and write the first read IO request and all the second IO requests into the mechanical hard disk together. Using the characteristic value of the first read IO request and the data feature analysis model, the computing device can determine the second IO requests that are read consecutively with the first read IO request.
For a storage device that includes multiple storage media, the computing device sends the first read IO request and the first storage messages of the second IO requests to the storage device, and the storage device writes the first read IO request and all the second IO requests into the mechanical hard disk together, so that IO requests that are read consecutively are written sequentially.
Furthermore, during data reading, the mechanical hard disk reads the data sequentially in the order of the IO requests. The IO requests are thus read sequentially from the mechanical hard disk, which increases the data reading speed of the mechanical hard disk and, in turn, of the storage device.
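The application does not state how the batch is laid out on the mechanical hard disk. One way to picture "written together, then read sequentially" is to assume that the requests of a batch receive back-to-back block extents, as in the following sketch; `layout_batch` and the block counts are hypothetical.

```python
from typing import Dict, List, Tuple


def layout_batch(batch: List[Tuple[str, int]],
                 start_block: int) -> Dict[str, Tuple[int, int]]:
    """Assign contiguous extents to a batch of (request id, block count).

    Returns request id -> (first block, block count). Assumption: the HDD
    has a free region of sufficient size starting at start_block.
    """
    extents: Dict[str, Tuple[int, int]] = {}
    cursor = start_block
    for request_id, blocks in batch:
        extents[request_id] = (cursor, blocks)
        cursor += blocks
    return extents


extents = layout_batch([("first", 8), ("a", 4), ("b", 16)], start_block=1000)
print(extents)
# {'first': (1000, 8), 'a': (1008, 4), 'b': (1012, 16)}
```

With such a layout, reading the batch back in request order touches blocks 1000 through 1027 in a single forward pass.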
In an eighth aspect, an embodiment of this application provides a computer-readable storage medium that stores a computer program or instructions. When executed by a storage device, the computer program or instructions implement the operation steps of the method according to the first aspect or any possible implementation of the first aspect; when executed by a computing device, they implement the operation steps of the method according to the second aspect or any possible implementation of the second aspect.
In a ninth aspect, an embodiment of this application provides a computer program product which, when run on a storage device, causes the storage device to perform the operation steps of the method according to the first aspect or any possible implementation of the first aspect, or which, when run on a computing device, causes the computing device to perform the operation steps of the method according to the second aspect or any possible implementation of the second aspect.
In a tenth aspect, an embodiment of this application provides a chip that includes a memory and a processor, where the memory stores computer instructions and the processor calls and executes the computer instructions from the memory to perform the operation steps of the method according to the first aspect or any possible implementation of the first aspect, or of the method according to the second aspect or any possible implementation of the second aspect.
The present application may further combine to provide more implementation manners on the basis of the implementation manners provided by the above aspects.
Drawings
FIG. 1A is a schematic diagram of a data storage system provided herein;
FIG. 1B is a schematic view of another data storage system provided herein;
FIG. 2 is a schematic diagram of a data writing method provided in the present application;
FIG. 3 is a schematic diagram of data reading and writing provided by the present application;
FIG. 4 is a schematic diagram of a data feature analysis model provided in the present application;
FIG. 5 is a schematic diagram of another data writing method provided in the present application;
FIG. 6 is a schematic diagram of another data writing method provided in the present application;
FIG. 7 is a schematic diagram of a data writing apparatus provided in the present application;
FIG. 8 is a schematic structural diagram of a computing device provided in the present application.
Detailed Description
For clarity and conciseness of the following description of various embodiments, a brief introduction to the related art is first given:
RAID requirements vary from customer to customer. For example, if a RAID is composed entirely of flash chips, its performance is high, but its cost is also very high. For another example, if a RAID is composed of a full HDD array, its cost is low, but its performance may not meet the customer's requirements. Therefore, to balance the performance and cost of RAID, storage devices that include multiple types of storage media have emerged. For example, the storage medium may be, but is not limited to, one or more of a storage class memory (SCM), an SSD, and an HDD. SCM is a hybrid storage technology that combines the characteristics of both traditional storage devices and memory: storage class memory provides faster read and write speeds than hard disks, but is slower and cheaper than dynamic random access memory (DRAM).
Generally, for a storage device including a plurality of storage media, the read-write performance of the storage device depends on the storage medium with the lowest read-write speed in the storage device. For example, for a storage device comprising an SSD and an HDD, the read-write speed of the storage device depends on the read-write speed of the HDD. For example, the HDD may write new data by a copy-on-write (COW) method (writing new data after erasing the original data of the disk). If the SSD wrote data by the COW method, all existing data in the SSD might be erased, so the SSD generally writes new data by a redirect-on-write (ROW) method (if part of the original data in the SSD is rewritten, the rewritten data is written to the SSD again, and a pointer is reconfigured for the rewritten data). For the detailed processes of COW and ROW, refer to the related description in the prior art; they are not described herein.
ROW may cause the storage device, or a computing device connected to the storage device, to perform garbage collection (GC) according to the remaining space of the storage device and release the data blocks (blocks) in which useless data is written, so that new data can be written to those blocks after GC. As the ROW policy and GC are executed continuously, the data distribution in the HDD may become scattered. Because the HDD scans the disk with a magnetic head when reading and writing IO data, the read-write speed of IO data in the HDD is low. If the average seek time of the HDD is unchanged and the data distribution in the HDD becomes more and more scattered, the number of seeks increases, so that the total data read-write time of the HDD increases, the data read-write speed of the HDD decreases, and in turn the data read-write speed of the storage device decreases.
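The relationship between data scatter and seek count described above can be illustrated with a minimal Python sketch. It assumes a simplified cost model that is not part of the application: reading an LBA that directly follows the previous one costs no seek, and any other access costs one seek. The function names and the 8 ms and 0.1 ms figures are hypothetical values chosen only for illustration.

```python
def count_seeks(lbas):
    """Count head repositionings needed to read the given LBAs in order.

    Assumption: reading LBA n right after LBA n-1 needs no seek; the first
    access and every other jump cost one seek each.
    """
    seeks = 0
    previous = None
    for lba in lbas:
        if previous is None or lba != previous + 1:
            seeks += 1
        previous = lba
    return seeks


def total_read_time_ms(lbas, avg_seek_ms=8.0, transfer_ms_per_page=0.1):
    """Estimate read time as seeks * average seek time + transfer time."""
    return count_seeks(lbas) * avg_seek_ms + len(lbas) * transfer_ms_per_page


scattered = [0, 4, 2, 7, 9, 14, 19]   # seven pages spread over three blocks
contiguous = [0, 1, 2, 3, 4, 5, 6]    # seven pages stored back to back

print(count_seeks(scattered), count_seeks(contiguous))
print(total_read_time_ms(scattered), total_read_time_ms(contiguous))
```

With the average seek time held constant, the scattered layout needs one seek per page, while the contiguous layout needs a single seek for the same amount of data.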
In order to solve the above problem, an embodiment of the present application provides a data writing method, including: the storage device obtains a first read IO request, determines at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, and writes the first read IO request and all the second IO requests into a mechanical hard disk of the storage device. The second IO request and the first read IO request have an association relationship of being read continuously, and the second IO request is stored in a memory of the storage device. With the data writing method provided in the embodiment of the present application, the storage device determines the IO requests that have an association relationship of being read continuously with the first read IO request. For a storage device comprising multiple storage media, the storage device writes the first read IO request and all the second IO requests into the HDD together, so that IO requests having the association relationship of being read continuously are written into the HDD and highly random IO is not. This improves the data read-write speed of the HDD and, in turn, of the storage device.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1A is a schematic diagram of a data storage system provided herein, which includes a computing device 100 and a storage device 120. In the application scenario shown in FIG. 1A, a user accesses data through an application. The computer running these applications may be referred to as a "computing device". The computing device 100 may be a physical machine or a virtual machine. Physical computing devices include, but are not limited to, desktop computers, servers, notebook computers, and mobile devices. In one possible example, computing device 100 accesses storage device 120 over a network to access data, e.g., the network may include switch 110. In another possible example, the computing device 100 may also communicate with the storage device 120 through a wired connection, such as a Universal Serial Bus (USB).
The storage device 120 shown in FIG. 1A may be a centralized storage system. The centralized storage system is characterized by a uniform entry through which all data from external devices pass, which is the engine 121 of the centralized storage system. The engine 121 is the most central component in a centralized storage system, in which the high-level functions of many storage systems are implemented.
The storage device 120 shown in fig. 1A may also be a distributed storage system including a computing device cluster and a storage device cluster. The computing device cluster includes one or more computing devices, and the computing devices may communicate with each other. A computing device may be, for example, a server, a desktop computer, or a controller of a storage array. In hardware, the computing device may include a processor, a memory, a network card, and the like. The processor is a central processing unit (CPU) for processing data access requests from outside the computing device or requests generated inside the computing device. For example, when the processor receives a write data request sent by a user, the data in the write data request is temporarily stored in the memory. When the total amount of data in the memory reaches a certain threshold, the processor sends the data stored in the memory to the storage device for persistent storage. In addition, the processor is used for data computation or processing, such as metadata management, deduplication, data compression, storage space virtualization, and address translation.
In one example, any one computing device may access any one of the storage devices in the storage device cluster over a network. The storage device cluster includes a plurality of storage devices. A storage device includes one or more controllers, a network card for communicating with a computing device, and a plurality of hard disks.
The data writing method provided by the present application may be executed by the computing device 100, or may be executed by the storage device 120, for example, the storage device 120 may be a centralized storage system or a distributed storage system.
As shown in fig. 1A, there may be one or more controllers in the engine 121, and fig. 1A illustrates an example in which the engine 121 includes one controller. In a possible example, if the engine 121 has multiple controllers, a mirror channel may be provided between any two controllers, so as to implement a function that any two controllers backup each other, thereby avoiding unavailability of the entire storage device 120 due to a hardware failure.
The engine 121 also includes a front-end interface 1211 and a back-end interface 1214. The front-end interface 1211 is used to communicate with the computing device 100 to provide storage services for the computing device 100. The back-end interface 1214 is used to communicate with the hard disks to expand the capacity of the storage device 120. Through the back-end interface 1214, the engine 121 can connect more hard disks, thereby forming a very large pool of storage resources.
In hardware, as shown in fig. 1A, the controller includes at least a processor 1212 and a memory 1213. The processor 1212 is a CPU that processes data access requests from outside the storage device 120 (from servers or other storage systems) as well as requests generated inside the storage device 120. For example, when the processor 1212 receives a write data request sent by the computing device 100 through the front-end interface 1211, the data in the write data request is temporarily stored in the memory 1213. When the total amount of data in the memory 1213 reaches a certain threshold, the processor 1212 sends the data stored in the memory 1213 through the back-end interface 1214 to at least one of the mechanical hard disk 1221, the mechanical hard disk 1222, the solid-state hard disk 1223, or the other hard disk 1224 for persistent storage.
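The buffering behaviour described above, in which write data is held in memory and sent to a hard disk once the buffered total reaches a threshold, can be sketched as follows. This is only an illustration: the class name `WriteBuffer`, the `persist` callback, and the 8-byte threshold are hypothetical and are not taken from the application.

```python
class WriteBuffer:
    """Hold write data in memory and flush it once a size threshold is reached."""

    def __init__(self, flush_threshold_bytes, persist):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.persist = persist          # callable that receives a list of payloads
        self.pending = []
        self.pending_bytes = 0

    def write(self, data):
        # Temporarily store the payload of a write data request in memory.
        self.pending.append(data)
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        # Send everything buffered so far to persistent storage.
        if self.pending:
            self.persist(self.pending)
        self.pending = []
        self.pending_bytes = 0


disk = []                               # stands in for a hard disk
buffer = WriteBuffer(flush_threshold_bytes=8, persist=disk.extend)
buffer.write(b"abc")
buffer.write(b"de")                     # 5 bytes buffered, below the threshold
buffer.write(b"fgh")                    # 8 bytes buffered, flush is triggered
print(disk)
```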
The memory 1213 is an internal memory that exchanges data directly with the processor. It can read and write data at any time and at high speed, and serves as temporary data storage for the operating system or other running programs. The memory includes at least two types of memory; for example, the memory may be a random access memory (RAM) or a read-only memory (ROM). The random access memory is, for example, a DRAM or an SCM. DRAM is a semiconductor memory and, like most random access memories, is a volatile memory device. However, DRAM and SCM are only examples in this embodiment, and the memory may also include other random access memories, such as static random access memory (SRAM). The read-only memory may be, for example, a programmable read-only memory (PROM) or an erasable programmable read-only memory (EPROM). In addition, the memory 1213 may also be a dual in-line memory module (DIMM), that is, a module composed of DRAM, or an SSD. In practice, the controller may be configured with a plurality of memories 1213 of different types. The number and type of the memories 1213 are not limited in this embodiment. In addition, the memory 1213 may be configured to have a power retention function, which means that the data stored in the memory 1213 is not lost when the system is powered off and powered on again. A memory having a power retention function is called a nonvolatile memory.
The memory 1213 stores software programs, and the processor 1212 runs the software programs in the memory 1213 to manage the hard disk. For example, a hard disk is abstracted into a storage resource pool, and the storage resource pool is provided to a server for use in the form of a Logical Unit Number (LUN). The LUN here is in fact the hard disk seen on the server. Of course, some centralized storage systems are themselves file servers, and may provide shared file services for the servers.
As an alternative implementation, fig. 1B is a schematic diagram of another data storage system provided in the present application. In this system, the engine 121 may have no hard disk slot; the hard disks are placed in the hard disk frame 130, and the back-end interface 1214 communicates with the hard disk frame 130. The back-end interface 1214 exists in the engine 121 in the form of an adapter card, and two or more back-end interfaces 1214 may be used simultaneously on one engine 121 to connect a plurality of hard disk frames. Alternatively, the adapter card may be integrated on the motherboard, in which case the adapter card may communicate with the processor 1212 through a PCIe bus.
It should be noted that only one engine 121 is shown in fig. 1B, however, in practical applications, two or more engines 121 may be included in the storage system, and redundancy or load balancing is performed among the engines 121.
The hard disk frame 130 includes a control unit 1225 and several hard disks. The control unit 1225 may take various forms. In one case, the hard disk frame 130 is an intelligent disk frame and, as shown in fig. 1B, the control unit 1225 includes a CPU and a memory. The CPU performs operations such as address translation and reading and writing data. The memory temporarily stores data to be written to the hard disk, or data read from the hard disk to be sent to the controller. In another case, the control unit 1225 is a programmable electronic component, such as a data processing unit (DPU). A DPU has the generality and programmability of a CPU, but is more specialized and can run efficiently on network packets, storage requests, or analysis requests. DPUs are distinguished from CPUs by a greater degree of parallelism (they need to process large numbers of requests). Optionally, the DPU may be replaced with a graphics processing unit (GPU), an embedded neural-network processing unit (NPU), or another processing chip. In general, there may be one control unit 1225, or two or more. The functions of the control unit 1225 may also be offloaded to the network card 1226. In that case, the hard disk frame 130 has no control unit 1225, and the network card 1226 performs data reading and writing, address translation, and other computing functions. The network card 1226 is then an intelligent network card, which may contain a CPU and a memory. The CPU performs operations such as address translation and reading and writing data. The memory temporarily stores data to be written to the hard disk, or data read from the hard disk to be sent to the controller. The network card 1226 may also be a programmable electronic component such as a DPU.
There is no affiliation between the network card 1226 and the hard disk in the hard disk frame 130, and the network card 1226 can access any one of the hard disks in the hard disk frame 130 (such as the mechanical hard disk 1221, the mechanical hard disk 1222, the solid state hard disk 1223, and the other hard disks 1224 shown in fig. 1A), so it is convenient to expand the hard disks when the storage space is insufficient.
Depending on the type of communication protocol between the engine 121 and the hard disk frame 130, the hard disk frame 130 may be a serial attached small computer system interface (SAS) hard disk frame, a non-volatile memory express (NVMe) hard disk frame, or another type of hard disk frame. A SAS hard disk frame adopts the SAS 3.0 protocol, and each frame supports 25 SAS hard disks. The engine 121 connects to the hard disk frame 130 through an onboard SAS interface or a SAS interface module. An NVMe hard disk frame is more like a complete computer system, with the NVMe hard disks inserted into the frame. The NVMe hard disk frame is in turn connected to the engine 121 through an RDMA port.
In order to improve the data read-write speed of RAID while ensuring that IO data is written into RAID in a reasonable manner, the following description takes as an example the storage device 120 executing the data writing method provided in this embodiment. Fig. 2 is a schematic flow chart of a data writing method provided in this application, and the data writing method includes the following steps.
S210, the storage device 120 obtains the first read IO request.
In one possible implementation, as shown in fig. 1A, the first read IO request may be a read IO request sent by the computing device 100 to the storage device 120. For example, the first read IO request may be a read IO request generated when the computing device runs an application.
In another possible implementation manner, as shown in fig. 1B, the first read IO request may be an IO request generated when the storage device 120 performs a service. For example, the service may be a GC process executed by the storage device 120, in which case the first read IO request includes the valid data remaining after GC. The first read IO request may be stored in the memory 1213 of the storage device 120; for example, the processor 1212 in the storage device 120 may obtain the first read IO request from the memory 1213.
Taking GC of the solid state disk 1223 by the storage device 120 shown in fig. 2 as an example, the storage device 120 deletes invalid data (also called garbage data) in the solid state disk 1223 and writes the valid data of the solid state disk 1223 to the mechanical hard disk 1222. Invalid data refers to data that will not be read in subsequent processes of the storage device 120 after GC; valid data refers to data that may be read in subsequent processes of the storage device 120 after GC. Fig. 3 is a schematic diagram of data reading and writing provided by this application, in which the solid state disk 1223 and the mechanical hard disk 1222 each include multiple storage regions. After GC is performed on one storage region in the solid state disk 1223, the storage region has multiple data blocks (blocks) that include valid data, and each block includes 8 data pages (pages). The data pages in the data block 311 shown in fig. 3 are "A0C0B00D", the data pages in the data block 312 are "0E0000F0", and the data pages in the data block 313 are "000G0000". Taking as an example the controller reading the data pages in the data block 311, the first read IO request includes the valid data "ACBD".
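The extraction of valid data from a block after GC can be sketched in Python. The sketch assumes that each block is written as an 8-character string and that the character "0" marks a page without valid data, which is how the fig. 3 example appears to be encoded; the function name `valid_pages` is hypothetical.

```python
INVALID = "0"  # assumption: "0" marks a page that holds no valid data


def valid_pages(block_layout):
    """Return the valid pages of one block, in their on-disk order."""
    return [page for page in block_layout if page != INVALID]


# Data blocks 311 to 313 from the fig. 3 example, 8 pages per block.
blocks = {311: "A0C0B00D", 312: "0E0000F0", 313: "000G0000"}

first_read_io = "".join(valid_pages(blocks[311]))
print(first_read_io)
```

The valid pages of block 311 come out in storage order ("ACBD"), not in the order in which they are later read.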
In this document, the read-write process shown in fig. 3 is described by taking the granularity of data (the minimum unit of data read-write) of the data blocks 311 to 313 as block and the data included in the data blocks as page as an example. However, in some possible examples, the data granularity of data blocks 311-313 may also be chunk (the data granularity of chunk is typically greater than block), and the data each chunk includes may be block. In other possible examples, the granularity of data blocks 311-313 may also be pages, and each page may include data that is smaller than the page.
The examples provided in fig. 2 and 3 are illustrated with data included in IO requests being stored on solid state disk 1223, but in some possible examples, the data included in the IO requests may also be stored on a mechanical hard disk (e.g., mechanical hard disk 1221 shown in fig. 1A) or other hard disk.
S220, the storage device 120 obtains the characteristic value of the first read IO request in the memory 1213.
In one possible example, if the storage device 120 is a centralized storage system, as shown in fig. 1B, the processor 1212 in the storage device 120 may read the characteristic value of the first read IO request from the memory 1213.
In another possible example, if the storage device 120 is a distributed storage system, the storage device 120 may also read the characteristic value of the first read IO request from the memories of other storage devices of the distributed storage system.
The characteristic value of the first read IO request includes a first LBA of the data to be written, a data length, and a timestamp. The first LBA indicates the storage location of the data to be written in the storage device 120. LBAs may be numbered from 0 to locate the block in which the data to be written resides in the storage device 120; for example, the first block has LBA 0, the second block has LBA 1, and so on.
The data length indicates the number of bytes occupied by the data to be written, for example, 8 kilobytes (kB). The timestamp may indicate the time of the last data change of the data to be written, where the data change may be at least one of data reading, data writing, or data rewriting.
In some possible embodiments, the characteristic value of the first read IO request may further include a read-write type of the first read IO request, for example, the first read IO request is a request with the read-write type being read IO.
In other possible embodiments, if the first read IO request is a group of IO data streams, the characteristic value of the first read IO request may further include at least one of a read-write ratio of the data, an IO size distribution, a read-read interval distribution, a write-write interval distribution, a sequential stream characteristic, an interval stream characteristic, and an association stream characteristic. The read-write ratio refers to the proportion of read IOs to write IOs in an IO data stream. The IO size distribution refers to the data length of each IO in the IO data stream. The read-read interval distribution refers to the interval time between two adjacent read IOs, and the write-write interval distribution refers to the interval time between two adjacent write IOs. The sequential stream characteristic refers to the association information of two adjacent IOs that are written or read continuously, the interval stream characteristic refers to the association information of two non-adjacent IOs that are written or read continuously, and the association stream characteristic refers to the association information of any two IOs that are written or read continuously. The above characteristic values are only possible cases provided by the embodiments of the present application and should not be construed as limiting the present application; in some possible examples, the characteristic value of the first read IO request may include more or fewer features.
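Some of the characteristic values listed above can be sketched as a small Python record plus helper functions. The field and function names are hypothetical, and treating "two adjacent read IOs" as consecutive reads within the stream is an interpretation made for this sketch, not something the application specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IORecord:
    lba: int             # logical block address of the data
    length_bytes: int    # data length
    timestamp: float     # time of the last data change, in seconds
    io_type: str         # read-write type: "read" or "write"


def read_write_ratio(stream):
    """Return (number of read IOs, number of write IOs) in the stream."""
    reads = sum(1 for io in stream if io.io_type == "read")
    return reads, len(stream) - reads


def io_sizes(stream):
    """Return the data length of each IO, i.e. the raw IO size distribution."""
    return [io.length_bytes for io in stream]


def intervals(stream, io_type):
    """Return the time gaps between consecutive IOs of the given type."""
    times = [io.timestamp for io in stream if io.io_type == io_type]
    return [later - earlier for earlier, later in zip(times, times[1:])]


stream = [
    IORecord(lba=0, length_bytes=8192, timestamp=0.0, io_type="read"),
    IORecord(lba=1, length_bytes=8192, timestamp=2.0, io_type="read"),
    IORecord(lba=9, length_bytes=4096, timestamp=3.0, io_type="write"),
    IORecord(lba=2, length_bytes=8192, timestamp=5.0, io_type="read"),
]

print(read_write_ratio(stream))
print(intervals(stream, "read"))
```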
As an alternative implementation manner, as shown in fig. 2, a characteristic value of the first read IO request may be stored in the memory 1213, and the memory 1213 may further store a plurality of other IO requests. In a possible example, if the storage capacity of the memory 1213 is large, the memory 1213 may also hold characteristic values of a plurality of IO requests.
In the embodiment of the present application, the characteristic value of the first read IO request is stored in the memory of the storage device rather than in the mechanical hard disk or the solid state disk. Because the data reading speed of the memory is higher than that of the mechanical hard disk and the solid state disk, the time for the controller of the storage device to read the characteristic value is reduced, and the speed at which the storage device writes data is increased.
As an alternative implementation, if many IO requests are stored in the memory 1213 of the storage device 120, the characteristic value of the first read IO request may also be stored in a hard disk of the storage device 120, such as the solid state disk 1223 shown in fig. 2. When the memory 1213 of the storage device 120 is not sufficient for storing the characteristic value of the first read IO request, the storage device 120 may use a partial area in the solid state disk 1223 as a memory to implement a function of the memory 1213 (e.g., cache the characteristic value of the first read IO request).
S230, the storage device 120 determines at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model.
The second IO request has an association relationship of being read continuously with the first read IO request, and the second IO request may be stored in the memory 1213 of the storage device 120. "Being read continuously" means that the data included in the first read IO request and the data included in the second IO request are read continuously in the same process, or are read in a plurality of consecutive processes. As shown in fig. 3, if the first read IO request includes "page ACBD", the 2 second IO requests that have an association relationship of being read continuously with the first read IO request include "page EF" and "page G", respectively. If these data are read in the same process, the controller may rearrange and combine the data included in the IO requests to obtain a block including "ABCDEFG".
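The rearranging and combining step can be sketched as follows. The sketch assumes that the expected read order is already known; `read_order` is a hypothetical input standing in for whatever order the analysis model predicts, and `merge_pages` is a hypothetical helper name.

```python
def merge_pages(requests, read_order):
    """Combine the pages of several IO requests into one block laid out
    in the order in which the pages are expected to be read."""
    pages = [page for request in requests for page in request]
    rank = {page: position for position, page in enumerate(read_order)}
    return "".join(sorted(pages, key=rank.__getitem__))


first_read_io = "ACBD"            # valid pages of data block 311
second_ios = ["EF", "G"]          # second IO requests held in memory

block = merge_pages([first_read_io, *second_ios], read_order="ABCDEFG")
print(block)
```

Writing the combined block in one pass places pages that are read one after another next to each other on the mechanical hard disk.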
With the data writing method provided by the embodiment of the application, the storage device determines at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model. In other words, the storage device determines a plurality of IO requests that have an association relationship of being read continuously, and may then determine the data access characteristics of the plurality of IO requests according to whether they have that association relationship. For example, if a plurality of IO requests have an association relationship of being read continuously, the storage device determines that the plurality of IO requests need to be read sequentially, and further determines that they are a group of IO requests with strong accessibility. If the plurality of IO requests do not have an association relationship of being read continuously, the storage device determines that the plurality of IO requests are read randomly, and further determines that they are a group of IO requests with strong randomness.
As an optional implementation manner, the foregoing S230 specifically includes: the storage device 120 outputs, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read continuously, and takes each IO request whose probability value reaches the probability threshold as a second IO request. In some examples, "the probability value reaches the probability threshold" means that the probability value is greater than or equal to the probability threshold. The probability threshold is used for determining whether the plurality of IO requests have an association relationship of being read continuously, and the value of the probability threshold may be determined according to the performance of the storage device, a requirement of a service executed by the storage device, or a requirement of a service executed by the computing device. Taking pages A to G shown in fig. 3 as an example, table 1 shows the probability values at which any two of pages A to G are read consecutively.
TABLE 1
Page   A      B      C      D      E      F      G
A      -      90%    4%     3%     2%     1%     0.4%
B      0.1%   -      85%    3%     2%     1%     0.4%
C      0.1%   0.2%   -      87%    4%     3%     2%
D      0.1%   0.2%   0.3%   -      79%    2%     1%
E      0.1%   0.2%   0.1%   0.2%   -      84%    4%
F      0.1%   0.2%   0.1%   0.2%   0.5%   -      74%
G      0.1%   0.2%   0.1%   0.2%   0.1%   0.2%   -
Wherein after page A is read, page B is read with a probability of 90%.
After page B is read, page C is read with a probability of 85%.
After page C is read, page D is read with a probability of 87%.
After page D is read, page E is read with a probability of 79%.
After page E is read, page F is read with a probability of 84%.
After page F is read, page G is read with a 74% probability.
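One way to work with Table 1 is to hold it as a nested mapping and look up the most probable successor of each page. The Python sketch below does this under two assumptions that are not stated in the application: each row is read as "page just read" and each column as "page read next", with no entry for a page following itself, and the read order is built greedily from page A. The names `TABLE_1`, `most_likely_next`, and `greedy_read_order` are hypothetical.

```python
# Table 1 as a nested mapping: TABLE_1[current][following] = probability.
TABLE_1 = {
    "A": {"B": 0.90, "C": 0.04, "D": 0.03, "E": 0.02, "F": 0.01, "G": 0.004},
    "B": {"A": 0.001, "C": 0.85, "D": 0.03, "E": 0.02, "F": 0.01, "G": 0.004},
    "C": {"A": 0.001, "B": 0.002, "D": 0.87, "E": 0.04, "F": 0.03, "G": 0.02},
    "D": {"A": 0.001, "B": 0.002, "C": 0.003, "E": 0.79, "F": 0.02, "G": 0.01},
    "E": {"A": 0.001, "B": 0.002, "C": 0.001, "D": 0.002, "F": 0.84, "G": 0.04},
    "F": {"A": 0.001, "B": 0.002, "C": 0.001, "D": 0.002, "E": 0.005, "G": 0.74},
    "G": {"A": 0.001, "B": 0.002, "C": 0.001, "D": 0.002, "E": 0.001, "F": 0.002},
}


def most_likely_next(table, page, visited):
    """Return the not-yet-visited page most likely to be read after `page`."""
    candidates = {p: prob for p, prob in table[page].items() if p not in visited}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def greedy_read_order(table, start):
    """Follow the most probable successor from `start` until no page is left."""
    order = [start]
    while True:
        following = most_likely_next(table, order[-1], set(order))
        if following is None:
            return order
        order.append(following)


print(greedy_read_order(TABLE_1, "A"))
```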
In one possible implementation manner, each page has a different LBA. Taking the LBA of page A being 0 as an example, as shown in fig. 3, the data block 311 includes the data pages "A0C0B00D" and occupies LBAs 0 to 7, so the LBA of page B is 4, the LBA of page C is 2, and the LBA of page D is 7; the data block 312 includes the data pages "0E0000F0" and occupies LBAs 8 to 15, so the LBA of page E is 9 and the LBA of page F is 14; the data block 313 includes the data pages "000G0000" and occupies LBAs 16 to 23, so the LBA of page G is 19. Table 1 illustrates the probability values for pages that include valid data; the probability values of continuous reading between pages may also be represented by the probability values of continuous reading between any two LBAs, which is not limited in the present application.
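The LBA of each page in this example follows from the block index and the page offset within the block. A minimal Python sketch, assuming 8 pages per block, consecutive blocks, and "0" marking a page without valid data (the function name `page_lbas` is hypothetical):

```python
PAGES_PER_BLOCK = 8


def page_lbas(block_layouts, first_lba=0):
    """Map each valid page to its LBA, given consecutive block layouts."""
    lbas = {}
    for block_index, layout in enumerate(block_layouts):
        for offset, page in enumerate(layout):
            if page != "0":  # "0" is assumed to mark a page without valid data
                lbas[page] = first_lba + block_index * PAGES_PER_BLOCK + offset
    return lbas


# Data blocks 311, 312 and 313 from the fig. 3 example.
lbas = page_lbas(["A0C0B00D", "0E0000F0", "000G0000"])
print(lbas)
```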
In a first possible example, the storage device determines only one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model. For example, if the probability threshold is 70%, the first read IO request includes page A, and another IO request includes page B, then the probability that page A and page B are read continuously, shown in table 1, is 90% > 70%. It is therefore determined that the first read IO request and that IO request have an association relationship of being read continuously, and the storage device takes that IO request as the second IO request.
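The threshold comparison in this example can be sketched in Python. The mapping below holds only the probabilities of each page being read right after page A, as given in Table 1; the function name `select_second_requests` and the representation of each candidate IO request by a single page are simplifications made for this sketch.

```python
def select_second_requests(first_page, candidates, next_read_prob, threshold):
    """Return the candidate pages whose probability of being read right
    after `first_page` is greater than or equal to the threshold."""
    return [
        page
        for page in candidates
        if next_read_prob.get((first_page, page), 0.0) >= threshold
    ]


# Probabilities of each page being read right after page A (Table 1).
next_read_prob = {
    ("A", "B"): 0.90, ("A", "C"): 0.04, ("A", "D"): 0.03,
    ("A", "E"): 0.02, ("A", "F"): 0.01, ("A", "G"): 0.004,
}

second_requests = select_second_requests(
    "A", ["B", "C", "D", "E", "F", "G"], next_read_prob, threshold=0.70
)
print(second_requests)
```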
In a second possible example, the storage device determines a plurality of second IO requests according to the characteristic value of the first read IO request and the data characteristic analysis model. For example, if the probability threshold is 70%, the first read IO request includes page A, one IO request includes page B, and another IO request includes page C, then, as shown in table 1, the probability that page A and page B are read continuously is 90% and the probability that page B and page C are read continuously is 85%, so the probability that pages A to C are read continuously is determined to be 90% × 85% = 76.5% > 70%. It is therefore determined that the first read IO request and the 2 IO requests have an association relationship of being read continuously, and the storage device takes the 2 IO requests as second IO requests. The number of second IO requests being 2 is only an example; in some possible examples, there may be more second IO requests having an association relationship of being read continuously with the first read IO request.
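The product rule used in this example can be written as a short Python sketch. The names `PAIR_PROB` and `chain_probability` are hypothetical; only the two probabilities needed for pages A to C are included.

```python
import math

# Probability that the second page is read right after the first (Table 1).
PAIR_PROB = {("A", "B"): 0.90, ("B", "C"): 0.85}


def chain_probability(pages, pair_prob):
    """Multiply the adjacent-pair probabilities along a sequence of pages."""
    return math.prod(pair_prob[(a, b)] for a, b in zip(pages, pages[1:]))


PROBABILITY_THRESHOLD = 0.70
probability = chain_probability(["A", "B", "C"], PAIR_PROB)
read_continuously = probability >= PROBABILITY_THRESHOLD
print(round(probability, 3), read_continuously)
```

Because every factor is at most 1, the product shrinks as the sequence grows, so longer groups need higher pairwise probabilities to stay above the same threshold.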
In addition, in the above embodiment, the probability values that a plurality of IO requests are read continuously are an example of a product of probability values that 2 adjacent pages are read continuously, and in some possible examples, to obtain the probability values that a plurality of IO requests are read continuously, the storage device may further perform curve fitting, weighting processing, and the like on the plurality of probability values, which is not limited in this application. For example, regarding the above-mentioned "page a-page B-page C-page F", probability values read consecutively between adjacent 2 pages may be compared with a probability threshold value to determine whether IO requests have an association relationship read consecutively. For example, if the probability threshold is 65%, since the probability values of continuously reading pages a-page B, B-page C, and C-page F are 90%, 85%, 3%, and 90% and 85% are greater than the probability threshold 65%, respectively, it is determined that page a-page B-page C has an association relationship of being continuously read, and 3% is less than the probability threshold 65%, it is determined that page a-page B-page C and page F do not have an association relationship of being continuously read.
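The pairwise comparison described for "page A - page B - page C - page F" can be sketched in Python as follows. The function name `continuous_prefix` is hypothetical, and stopping at the first pair that falls below the threshold is one straightforward reading of the example, not a procedure the application spells out.

```python
# Probability that the second page is read right after the first (Table 1).
PAIR_PROB = {("A", "B"): 0.90, ("B", "C"): 0.85, ("C", "F"): 0.03}


def continuous_prefix(pages, pair_prob, threshold):
    """Return the leading pages whose adjacent-pair probabilities all
    reach the threshold; stop at the first pair that does not."""
    group = [pages[0]]
    for a, b in zip(pages, pages[1:]):
        if pair_prob.get((a, b), 0.0) < threshold:
            break
        group.append(b)
    return group


group = continuous_prefix(["A", "B", "C", "F"], PAIR_PROB, threshold=0.65)
print(group)
```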
According to the data writing method provided by the embodiment of the application, the storage device can determine the association relationship among a plurality of IO requests by using the probability value that the plurality of IO requests are read continuously, and thereby determine the data access characteristics of the plurality of IO requests. This helps the storage device store the IO requests according to their data access characteristics: a group of IO requests with strong randomness is prevented from being written into the HDD, and IO requests with strong accessibility are stored in the mechanical hard disk, which improves the data read/write speed of the HDD and, in turn, of the storage device.
As an optional implementation manner, before the foregoing S230, the data writing method provided in this embodiment of the present application may further include: the storage device obtains a data feature analysis model. For example, the data feature analysis model may be a neural network model, such as a Deep Neural Network (DNN) model, a Recurrent Neural Network (RNN) model, or a Convolutional Neural Network (CNN) model. For another example, the data feature analysis model may also be an index table in which different types of IO are recorded: the feature value of an IO request is input to the index table, and the data feature analysis model outputs, for each IO request in the plurality of IO requests, a probability value that it is read continuously with the input IO request.
Fig. 4 is a schematic diagram illustrating the acquisition process of the data feature analysis model provided in the present application. The acquisition process may include the following steps S410 to S440.
S410, obtaining an IO training set.
The IO training set comprises a plurality of test IOs, and the characteristic values of each test IO comprise the read-write type, timestamp, second LBA, and data length of the test IO. For IO characteristic values, reference may be made to the above description of S220, which is not repeated herein.
S420, inputting the IO training set into the first model to obtain the association information.
The association information includes a probability value that any two test IOs of the plurality of test IOs are read continuously. For the probability values that any two test IOs are read continuously, reference may be made to the related description of table 1, which is not repeated herein.
S430, judging whether the association information meets the model convergence condition.
The model convergence condition may be that the number of times the first model has been trained reaches a threshold number of times (e.g., 50,000 times), or that the similarity between the association information and the actual association of each test IO in the IO training set reaches a similarity threshold (e.g., 80%), which is not limited in this application.
If the association information meets the model convergence condition, S440 is executed; if the association information does not meet the model convergence condition, the first model continues to be trained with the IO training set until the model convergence condition is met.
S440, taking the first model as the data feature analysis model.
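The control flow of S410 to S440 can be sketched as a loop with the two convergence conditions mentioned above. This is a hedged illustration, not the training procedure of the application: `obtain_model`, `toy_train_step`, and `toy_similarity` are hypothetical names, the "model" is a toy counter rather than a neural network, and treating either condition as sufficient to stop is an assumption.

```python
def obtain_model(model, training_set, train_step, similarity,
                 max_rounds=50_000, similarity_threshold=0.80):
    """Train until the similarity threshold or the round limit is reached."""
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        association = train_step(model, training_set)            # S420
        if similarity(association, training_set) >= similarity_threshold:
            break                                                # S430 satisfied
    return model, rounds                                         # S440

# Toy stand-ins (assumptions): each round "learns" one more test IO correctly.
def toy_train_step(model, training_set):
    model["hits"] = min(model["hits"] + 1, len(training_set))
    return model["hits"]

def toy_similarity(association, training_set):
    return association / len(training_set)

model, rounds = obtain_model({"hits": 0}, list(range(10)),
                             toy_train_step, toy_similarity)
print(rounds)  # 8, because 8 of 10 test IOs reaches the 80% threshold
```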
It should be noted that, in the method provided in the embodiment of the present application, the process of training the data feature analysis model may be executed by a storage device or a computing device, and the present application is not limited thereto. In some possible examples, the data feature analysis model may also be obtained by other computing devices that send the trained data feature analysis model to the computing device or storage device provided by embodiments of the present application.
By way of example, the other computing device may include at least one processor, which may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, including a CPU, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. For example, when the other computing device has only one processor, the processor may implement the above S410 to S440 and possible sub-steps thereof. For another example, when the other computing device includes a plurality of processors, the plurality of processors may cooperatively implement the above S410 to S440 and possible sub-steps thereof; for instance, if the plurality of processors includes a first processor and a second processor, the first processor may implement S410 and S420, and the second processor may implement S430 and S440.
Generally, the processing power of a computing device is stronger than that of a storage device. Compared with training on the storage device, if the training process of the data feature analysis model is executed by the computing device, the training time is shorter and the training efficiency is higher.
The storage device determines the association relationship among the IO requests by using the data feature analysis model, and then determines the data access mode of the IO requests according to the association relationship. This reduces the need to determine the association relationship among the IO requests manually and improves the degree of association of the written data. Further, in the data reading process, the storage device can sequentially read the IO requests that have the association relationship of being read continuously, which improves the data reading performance of the HDD and the data read/write performance of the storage device where the HDD is located.
In one possible example, the data access modes described above include a random mode and an accessibility mode. For example, the storage device determines a probability value that any two IO requests are read continuously by using the data feature analysis model and compares the probability value with a probability threshold. If the probability value is greater than or equal to the probability threshold, the storage device determines that the two IO requests have an association relationship of being read continuously, and the data access mode of the two IO requests is the accessibility mode. If the probability value is smaller than the probability threshold, the storage device determines that the two IO requests do not have the association relationship of being read continuously, and the data access mode of the two IO requests is the random mode.
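The threshold comparison that selects the data access mode can be written as a small function. The enum and function names below are hypothetical and only illustrate the rule stated above.

```python
from enum import Enum

class AccessMode(Enum):
    RANDOM = "random"
    ACCESSIBILITY = "accessibility"

def access_mode(probability, threshold):
    """Accessibility mode if the pair reaches the threshold, otherwise random mode."""
    if probability >= threshold:
        return AccessMode.ACCESSIBILITY
    return AccessMode.RANDOM

print(access_mode(0.90, 0.65).value, access_mode(0.03, 0.65).value)
```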
With continued reference to fig. 2, after S230, in order to increase the data read/write speed of the storage device, the data writing method provided in the embodiment of the present application further includes the following steps.
S240, the storage device 120 writes the first read IO request and all the second IO requests together into the mechanical hard disk of the storage device 120.
As shown in fig. 2, the storage device 120 may write the first read IO request and all second IO requests to the mechanical hard disk 1222 or the mechanical hard disk 1221.
In one possible example, the storage device 120 sets the storage logic for hard disks of the same type to a balance policy, for example, writing data into the hard disk of that type with the larger remaining space. In this case, if the remaining space of the mechanical hard disk 1222 is larger than that of the mechanical hard disk 1221, the storage device 120 may write the first read IO request and all the second IO requests into the mechanical hard disk 1222.
In another possible example, the storage device 120 sets the storage logic for hard disks of the same type to a resource optimal utilization policy, for example, writing to another hard disk of the same type only after one hard disk is full. In this case, if the remaining space of the mechanical hard disk 1222 is larger than that of the mechanical hard disk 1221, the storage device 120 may write the first read IO request and all the second IO requests into the mechanical hard disk 1221.
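The two placement policies can be contrasted in a short sketch. The `Disk` type, the policy strings, and the free-space numbers are assumptions made for illustration; the resource optimal utilization policy is modelled here as "choose the fullest disk that still has room".

```python
from dataclasses import dataclass

@dataclass
class Disk:
    name: str
    free: int  # remaining space in arbitrary units

def pick_disk(disks, policy, needed):
    """Choose a target disk of one type for a write of size `needed`."""
    candidates = [d for d in disks if d.free >= needed]
    if not candidates:
        raise ValueError("no disk has enough remaining space")
    if policy == "balance":
        # Balance policy: prefer the disk with the larger remaining space.
        return max(candidates, key=lambda d: d.free)
    if policy == "fill_first":
        # Resource optimal utilization: fill one disk before moving to the next.
        return min(candidates, key=lambda d: d.free)
    raise ValueError(f"unknown policy: {policy}")

hdds = [Disk("hdd1221", free=100), Disk("hdd1222", free=300)]
print(pick_disk(hdds, "balance", 10).name)     # hdd1222
print(pick_disk(hdds, "fill_first", 10).name)  # hdd1221
```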
As an optional implementation manner, if the data to be written included in the first read IO request and the second IO requests is smaller than the minimum data granularity at which data is written in the storage device, "write together" in S240 may mean that the controller in the storage device 120 combines the first read IO request and all the second IO requests according to logic information and writes the combined data into the mechanical hard disk, where the logic information indicates the sequence in which the first read IO request and all the second IO requests are read. For example, with reference to the contents shown in fig. 3 and table 1, suppose the first read IO request includes pages "ACBD", one second IO request includes page E and page F, another second IO request includes page G, and the block obtained by directly combining the pages is "ACBDEFG". According to the probability value information shown in table 1, the controller obtains the probability value that the pages are read in the order "ABCDEFG" as 90% × 85% × 87% × 79% × 84% × 74% ≈ 32.68%. If the probability threshold for a plurality of IO requests being read continuously is 25%, then because 32.68% > 25%, the controller rearranges and combines the above pages to obtain a block comprising "ABCDEFG".
In the GC process in the prior art, the storage device moves the valid data obtained by the GC. As shown by the data block 311 in fig. 3, the data block 311 includes 8 pages, "A0C0B00D", and the storage device writes the valid data in the data block 311 to the mechanical hard disk 1222 in the order "page A-page C-page B-page D". However, the 4 pages are read in the order "page A-page B-page C-page D", so during data reading the magnetic head in the mechanical hard disk 1222 has to be repositioned repeatedly to read pages A to D (the magnetic head needs to move 5 unit distances from page A to page D). In other words, in the GC process of the prior art, the storage device does not rearrange the read data, so the data in the HDD becomes more scattered as the number of GC operations increases, the data reading speed of the HDD becomes slower and slower, and the data reading speed of the storage device where the HDD is located decreases.
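Combining pages according to the logic information can be sketched as a flatten-and-sort step. The function name `merge_in_read_order` is hypothetical, and the logic information is assumed to be available as a simple string giving the order in which the pages are read.

```python
def merge_in_read_order(requests, read_order):
    """Flatten the pages of several IO requests and arrange them in the
    order given by the logic information (the order in which they are read)."""
    rank = {page: i for i, page in enumerate(read_order)}
    pages = [page for request in requests for page in request]
    return sorted(pages, key=rank.__getitem__)

# First read IO request "ACBD", then two further requests "EF" and "G".
requests = [["A", "C", "B", "D"], ["E", "F"], ["G"]]
block = "".join(merge_in_read_order(requests, "ABCDEFG"))
print(block)  # ABCDEFG
```

Writing the merged block in this order places pages that are read one after another next to each other on the mechanical hard disk.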
In contrast, according to the data writing method provided by the embodiment of the application, the storage device may determine, by using the feature value of the first read IO request and the data feature analysis model, the second IO request having the continuously read association relationship with the first read IO request, and for the storage device including multiple storage media, the storage device writes the first read IO request and the multiple second IO requests into the mechanical hard disk together, so that sequential writing of the multiple continuously read IO requests is realized, and further, in the data reading process, the storage device reads the mechanical hard disk according to the logical sequence of the multiple IO requests, so that sequential reading of the multiple IO requests in the mechanical hard disk is realized, the data reading speed of the mechanical hard disk is increased, and the data reading speed of the storage device is increased.
In one possible example, with reference to fig. 3 and table 1, if the first read IO request includes page A and page B, the second IO request includes page C and page D, and the probability threshold is 45%, the probability value of page A-page B-page C-page D being read continuously is 90% × 85% × 87% × 79% ≈ 52.58% > 45%, and the storage device writes the first read IO request and the second IO request to the HDD in the order "page A-page B-page C-page D". In the data reading process, because the first read IO request and the second IO request were written in sequence, the storage device reads the 4 pages in the order "page A-page B-page C-page D", which reduces the number of seeks of the magnetic head of the HDD (the magnetic head needs to move only 3 unit distances from page A to page D), improves the data reading speed of the HDD, and in turn improves the data reading speed of the storage device.
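The two head-movement figures quoted in this comparison (5 unit distances for the prior-art layout, 3 for the sequential layout) can be reproduced with a simplified model. The model is an assumption: pages sit at consecutive positions on a line, one unit apart, and the head moves directly from each page to the next page that is read; `head_travel` is a hypothetical name.

```python
def head_travel(layout, read_order):
    """Total distance the head moves when pages stored as `layout`
    are read in `read_order` (one unit per page position)."""
    position = {page: i for i, page in enumerate(layout)}
    return sum(abs(position[b] - position[a])
               for a, b in zip(read_order, read_order[1:]))

print(head_travel("ACBD", "ABCD"))  # 5: layout written without rearrangement
print(head_travel("ABCD", "ABCD"))  # 3: layout written in read order
```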
As an alternative implementation, in order to fully utilize the hard disk formed by various storage media in the storage device 120, data stored in the solid state disk 1223 (e.g., data with stronger accessibility) may be written into the mechanical hard disk 1221 or the mechanical hard disk 1222. As shown in fig. 1A, data included in the first read IO request and the second IO request are stored in the solid state disk 1223, the controller reads the data from the solid state disk 1223 to the memory 1213, and further, the controller writes the data to the mechanical hard disk 1221 or the mechanical hard disk 1222 according to the read order by using the data writing method provided in the embodiment of the present application. By writing a group of data with strong accessibility into the mechanical hard disk from the solid state disk, the utilization rate of various different storage medium layers in the storage equipment can be improved, and the balance between the performance and the cost of the storage equipment is realized.
With reference to fig. 2, in order to store IO requests that do not have a continuously read association relationship with the first read IO request and improve resource utilization of each storage medium in the storage device, the data writing method provided in the embodiment of the present application may further include the following steps.
S250, the storage device 120 determines at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model.
The third IO request has no association with the first read IO request to be read continuously.
With reference to the contents shown in fig. 2, fig. 3, and table 1, if the block included in the first read IO request is "page ACBD", the block included in the IO request to be tested is "page G", the probability threshold is 40%, and the probability that any one of the pages included in the first read IO request and the page G included in the IO request to be tested are read continuously is smaller than the probability threshold, the storage device determines that the first read IO request and the IO request to be tested do not have the continuously read association relationship, and uses the IO request to be tested as the third IO request.
The characteristic value of the first read IO request is stored in the memory of the storage device, so that when the incidence relation of a plurality of IO requests to be tested is confirmed, only the characteristic value of the first read IO request needs to be obtained from the memory.
S260, the storage device 120 writes all the third IO requests into the solid state disk 1223 of the storage device 120.
As shown in fig. 2, storage device 120 may write the third IO request to solid state disk 1223. In one example, the third IO request may also be saved in the memory 1213 of the storage device 120, so as to increase the data writing speed of the storage device.
By using the data writing method provided by the embodiment of the application, the storage device at least comprises two storage media, namely the HDD and the SSD, so that the performance and the cost of the storage device can be balanced, and the rationality of the storage device is ensured. In addition, the storage device may determine whether the first read IO request and the IO request to be tested have a continuously read association relationship by using the data characteristic analysis model and the characteristic value of the first read IO request, and then the storage device stores the plurality of IO requests into a mechanical hard disk or a solid state hard disk of the storage device. For example, the storage device writes a second IO request having an association relationship with the first read IO request being read continuously into the mechanical hard disk, and the storage device writes a third IO request having no association relationship with the first read IO request being read continuously into the solid state hard disk.
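The split between second IO requests (written to the mechanical hard disk) and third IO requests (written to the solid state disk) can be sketched as follows. All names are hypothetical, and the probability values in the table excerpt are placeholders chosen for illustration, not values taken from table 1. A candidate is treated as associated when at least one pair formed by a page of the first read IO request and a page of the candidate reaches the threshold.

```python
def route_requests(first_pages, candidates, pair_prob, threshold):
    """Return (names routed to the HDD, names routed to the SSD)."""
    to_hdd, to_ssd = [], []
    for name, pages in candidates.items():
        best = max(pair_prob.get((a, b), 0.0)
                   for a in first_pages for b in pages)
        (to_hdd if best >= threshold else to_ssd).append(name)
    return to_hdd, to_ssd

# Placeholder probabilities (assumptions for this sketch).
PAIR_PROB = {("D", "E"): 0.79, ("D", "G"): 0.10}
candidates = {"req_EF": "EF", "req_G": "G"}

hdd, ssd = route_requests("ACBD", candidates, PAIR_PROB, 0.40)
print(hdd, ssd)  # ['req_EF'] ['req_G']
```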
In the foregoing embodiment of the present application, the data writing method is executed by a controller in the storage device 120, and as an optional implementation manner, the data writing method provided in the present application may also be executed by a hard disk frame 122 in the storage device 120, which is not described herein again.
In the foregoing embodiment of the present application, the data writing method is executed by the storage device 120, as an alternative implementation manner, the data writing method provided by the present application may also be executed by the computing device 100, and fig. 5 is a schematic flow chart of another data writing method provided by the present application, where the data writing method includes the following steps.
S510, the computing device 100 obtains the first read IO request.
As shown in fig. 1A, the first read IO request may be an IO request generated when the computing device 100 runs an application program, or an IO request sent by another device and received by the computing device 100 through the switch 110.
S520, the computing device 100 determines at least one second IO request according to the characteristic value of the first read IO request and the data characteristic analysis model.
The characteristic value of the first read IO request includes a first LBA of data to be written, a data length, and a timestamp, the second IO request has an association relationship with the first read IO request that is read continuously, and the second IO request is stored in a memory of the storage device. For the characteristic value of the first read IO request, reference may be made to the related description of S210, which is not described herein again. For the data characteristic analysis model provided in the embodiment of the present application, reference may be made to the related description of fig. 4, which is not repeated herein.
As an optional implementation manner, the characteristic value of the first read IO request may be stored in a memory of the computing device, and since the data reading speed of the memory is greater than the data reading speed of the mechanical hard disk and the solid state disk, the time for the computing device to read the characteristic value is reduced, and the data writing speed is increased.
As an optional implementation manner, the foregoing S520 specifically includes: outputting, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are read continuously with the first read IO request; and taking each IO request whose probability value reaches the probability threshold as a second IO request. For the process and the beneficial effect of the computing device determining at least one second IO request according to the feature value of the first read IO request and the data feature analysis model, reference may be made to the above description of S230, which is not repeated herein.
S530, the computing device 100 obtains the first storage message of each second IO request.
In some examples, "storing a message" may also be referred to as storing an instruction, storing information, or storing an identification, etc. The first storage message in S530 instructs the storage device to write the second IO request from the memory to the mechanical hard disk of the storage device. The first storage message may include the LBA, data length, or LUN of the data to be written in the second IO request. In the data writing method provided by the embodiment of the application, the LUN may be used to determine a target storage resource pool of the data to be written included in the second IO request in the storage device. The LBA and the data length can be referred to in relation to S220, and are not described herein.
S540, the computing device 100 sends the first read IO request and all the first store messages to the storage device 120.
As shown in fig. 1A, the computing device 100 may send the first read IO request and all the first storage messages to the storage device 120 through the switch 110 to implement the process of accessing data. In some possible examples, the switch 110 is an optional device: the computing device 100 may communicate with the storage device 120 through a network, or the computing device 100 may communicate with the storage device 120 using a wired connection. The present application does not limit the manner in which the computing device and the storage device communicate.
S550, the storage device 120 writes the second IO request from the memory into the mechanical hard disk of the storage device 120 according to the first storage message.
For example, the first storage message includes an LBA and a data length of the second IO request, where the LBA indicates a start address of the data to be written stored in the storage device, and the storage device obtains an end address of the data to be written according to the start address and the data length, and further, the storage device may read the data to be written corresponding to the second IO request according to the start address and the end address.
As an alternative implementation manner, fig. 6 is a schematic flow chart of another data writing method provided in the present application, where the foregoing S550 specifically includes S610 to S630.
S610, the storage device 120 receives the first read IO request and all the first storage messages from the computing device 100.
S620, the storage device 120 determines a second IO request corresponding to each first storage message.
For example, the first storage message includes the LBA, the data length, and the LUN of the second IO request, for example, "LUN 1, LBA 3, 3 kB", which indicates that the second IO request is located in the 4th logical block of the LUN1 disk in the storage device and that the data length of the second IO request is 3 kB.
S630, the storage device 120 writes the first read IO request and all the second IO requests into the mechanical hard disk.
For the specific implementation process of S630, reference may be made to the related description of S240 above, and details are not described here.
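How a storage message could be turned into a start address and an end address, as described for S550, is sketched below. The `StorageMessage` type, its field names, and the 4096-byte logical block size are assumptions; the example values correspond to a request in the 4th logical block of LUN1 with a data length of 3 kB.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StorageMessage:
    lun: int     # logical unit that holds the data
    lba: int     # zero-based index of the starting logical block
    length: int  # data length in bytes

def byte_range(msg, block_size=4096):
    """Start and end byte offsets within the LUN (end is exclusive)."""
    start = msg.lba * block_size
    return start, start + msg.length

msg = StorageMessage(lun=1, lba=3, length=3 * 1024)
print(byte_range(msg))  # (12288, 15360)
```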
In the data writing method provided by the embodiment of the application, the computing device may determine, by using the feature value of the first read IO request and the data feature analysis model, the second IO requests having the continuously read association relationship with the first read IO request. For a storage device including a plurality of storage media, the computing device sends the first read IO request and the first storage messages of the plurality of second IO requests to the storage device, and the storage device writes the first read IO request and all the second IO requests into the mechanical hard disk together, thereby implementing sequential writing of a plurality of IO requests that are read continuously. Further, in the data reading process, the mechanical hard disk reads the data sequentially according to the sequence of the plurality of IO requests, so that sequential reading of the plurality of IO requests in the mechanical hard disk is realized, which increases the data reading speed of the mechanical hard disk and of the storage device.
With reference to fig. 5, in order to store IO requests that do not have a continuously read association relationship with the first read IO request and improve resource utilization of each storage medium in the storage device, the data writing method provided in the embodiment of the present application may further include the following steps.
S561, the computing device 100 determines at least one third IO request according to the feature value of the first read IO request and the data feature analysis model.
The third IO request has no association relationship of being read continuously with the first read IO request. The computing device may determine, according to the probability value, whether an IO request to be tested has the continuously read association relationship with the first read IO request. For example, if the block included in the first read IO request is "page A", the block included in the IO request to be tested is "page D", and the probability threshold is 30%, the computing device determines, by using the feature value of the first read IO request, that the probability value of page D being read after page A is read is 25%. Because 25% < 30%, the computing device determines that page A and page D do not have the continuously read association relationship, further determines that the IO request to be tested and the first read IO request do not have the continuously read association relationship, and uses the IO request to be tested as the third IO request.
S562, the computing device 100 obtains the second storage message of each third IO request.
The second storage message instructs the storage device to write the third IO request into the solid state disk of the storage device. The second storage message may include the LBA, data length, or LUN of the data to be written in the third IO request. In the data writing method provided by the embodiment of the application, the LUN may be used to determine a target storage resource pool, in the storage device, of the data to be written included in the third IO request. For the LBA and the data length, reference may be made to the related description of S220, which is not repeated herein.
S563, the computing device 100 sends all the second storage messages to the storage device 120.
S564, the storage device 120 writes the third IO request from the memory into the solid state disk of the storage device 120 according to the second storage message.
For example, the second storage message includes an LBA and a data length of the third IO request, where the LBA indicates a start address of the data to be written (e.g., page D) stored in the storage device. The storage device obtains an end address of the data to be written according to the start address and the data length (e.g., 3 bytes), and may then read the data to be written corresponding to the third IO request according to the start address and the end address.
As an alternative implementation manner, as shown in fig. 6, the above S564 specifically includes S640 to S660.
S640, the storage device 120 receives all second storage messages from the computing device.
S650, the storage device 120 determines a third IO request corresponding to each second storage message.
For example, the second storage message includes the LBA, the data length, and the LUN of the third IO request, for example, "LUN 3, LBA 5, 2 kB", which indicates that the third IO request is located in the 6th logical block of the LUN3 disk in the storage device and that the data length of the third IO request is 2 kB.
S660, the storage device 120 writes all the third IO requests into the solid state disk.
For the specific implementation process of S660, reference may be made to the related description of S260 above, and details are not described here.
By using the data writing method provided by the embodiment of the application, the storage device at least comprises two storage media, namely the HDD and the SSD, so that the performance and the cost of the storage device can be balanced, and the reasonability of the storage device is ensured. In addition, the computing device may determine whether the first read IO request and the IO request to be detected have a continuously read association relationship by using the data characteristic analysis model and the characteristic value of the first read IO request, and then, the computing device stores the plurality of IO requests into a mechanical hard disk or a solid state hard disk of the storage device according to whether the plurality of IO requests have a continuously read association relationship, so that data with strong randomness is prevented from being written into the mechanical hard disk, and the data read-write speed of the storage device is improved.
For example, the computing device sends a first storage message of a second IO request having an association relationship with a first read IO request that is continuously read to the storage device, and the storage device writes the first read IO request and a second IO request corresponding to the first storage message to the mechanical hard disk.
For another example, the computing device sends a second storage message of a third IO request that does not have an association relationship with the first read IO request and is continuously read to the storage device, and the storage device writes the third IO request corresponding to the second storage message into the solid state disk.
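On the computing device side, the two examples above amount to splitting the IO requests to be tested into first storage messages (for the mechanical hard disk) and second storage messages (for the solid state disk). The sketch below assumes that each candidate already carries the probability output by the data characteristic analysis model; the dictionary layout, the `target` field, and the function name are hypothetical.

```python
def build_storage_messages(candidates, threshold):
    """Return (first storage messages, second storage messages)."""
    first, second = [], []
    for c in candidates:
        message = {"lun": c["lun"], "lba": c["lba"], "length": c["length"]}
        if c["probability"] >= threshold:
            first.append({**message, "target": "hdd"})   # second IO request
        else:
            second.append({**message, "target": "ssd"})  # third IO request
    return first, second

candidates = [
    {"lun": 1, "lba": 3, "length": 3072, "probability": 0.85},
    {"lun": 3, "lba": 5, "length": 2048, "probability": 0.25},
]
first_msgs, second_msgs = build_storage_messages(candidates, threshold=0.30)
print(first_msgs)
print(second_msgs)
```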
It is understood that, in order to implement the functions of the above embodiments, the computing device and the storage device include corresponding hardware structures and/or software modules for performing the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and method steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is performed as hardware or computer software driven hardware depends on the particular application scenario and design constraints imposed on the solution.
The data writing method provided according to the present embodiment is described in detail above with reference to fig. 1A to 6, and the data writing apparatus and the computing device provided according to the present embodiment are described below with reference to fig. 7 and 8. For the storage device provided in the embodiment of the present application, reference may be made to relevant contents in fig. 1A or fig. 1B, which are not described herein again.
Fig. 7 is a schematic diagram of a data writing apparatus provided in the present application. The data writing device can be used for realizing the functions of the storage device and the computing device in the method embodiment, so that the beneficial effects of the method embodiment can be realized. In this embodiment, the data writing apparatus may be the storage device 120 or the computing device 100 shown in fig. 1A or fig. 1B.
The structure and function of the data writing apparatus 700 are described below with reference to fig. 7, and the data writing apparatus 700 may implement the functions of the storage device or the computing device shown in fig. 2 and fig. 4 to 6. It should be understood that the present embodiment only exemplarily divides the structure and the functional modules of the data writing device 700, and the present application does not limit the specific division thereof.
As shown in fig. 7, the data writing apparatus 700 includes a transceiving unit 710, a processing unit 720, a storage unit 730, and a model obtaining unit 740, which may be used to implement the methods corresponding to the operation steps executed by the storage device or the computing device shown in fig. 2 to fig. 6.
When the data writing apparatus 700 is used to implement the functions of the storage device in the method embodiment shown in fig. 2, the transceiving unit 710 is configured to perform S210, the processing unit 720 is configured to perform S220, S230, and S250, and the storage unit 730 is configured to perform S240 and S260.
Optionally, when the data writing apparatus 700 is configured to implement the functions in the method embodiment shown in fig. 4, the model obtaining unit 740 is configured to execute S410 to S440.
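The model acquisition described in the claims (obtain an IO training set, feed it to a first model to get pairwise probabilities of being continuously read, and accept the first model once a convergence condition is met) can be sketched as below. The application does not specify the model type, the pair feature, the convergence condition, or what happens when the condition is not met. The logistic model, the capped LBA-gap feature, the log-loss tolerance, and the gradient update used here are therefore all assumptions chosen for illustration, as are the names `TraceIO`, `gap_feature`, `predict`, and `train`.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TraceIO:
    """One test IO from the training set (fields follow the listed features)."""
    is_read: bool     # read-write type
    timestamp: float
    lba: int
    length: int


def gap_feature(a: TraceIO, b: TraceIO, cap: int = 64) -> float:
    """Assumed pair feature: LBA gap from the end of a to the start of b, scaled to [0, 1]."""
    gap = abs(b.lba - (a.lba + a.length))
    return min(gap, cap) / cap


def predict(params: Tuple[float, float], a: TraceIO, b: TraceIO) -> float:
    """Probability that a and b are read consecutively (assumed logistic form)."""
    weight, bias = params
    z = bias - weight * gap_feature(a, b)
    return 1.0 / (1.0 + math.exp(-z))


# (io_a, io_b, label) where label is 1 if the pair was read consecutively, else 0
Sample = Tuple[TraceIO, TraceIO, int]


def train(samples: List[Sample], lr: float = 0.5, tol: float = 0.2,
          max_epochs: int = 5000) -> Optional[Tuple[float, float]]:
    """Return model parameters once the mean log loss drops below tol, else None."""
    params = (0.0, 0.0)
    n = len(samples)
    for _ in range(max_epochs):
        loss = grad_w = grad_b = 0.0
        for a, b, label in samples:
            p = min(max(predict(params, a, b), 1e-12), 1.0 - 1e-12)
            loss -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
            grad_w += (p - label) * -gap_feature(a, b)
            grad_b += p - label
        if loss / n < tol:  # assumed model convergence condition
            return params
        # Assumed update step when the condition is not yet met.
        params = (params[0] - lr * grad_w / n, params[1] - lr * grad_b / n)
    return None
```

In this sketch the returned parameters play the role of the data characteristic analysis model, and `predict` is what a placement routine would call at write time.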
When the data writing apparatus 700 is used to implement the functions of the computing device in the method embodiment shown in fig. 5, the transceiving unit 710 is configured to execute S510, S540, S561, and S564, and the processing unit 720 is configured to execute S520 and S562.
When the data writing apparatus 700 is used to implement the function of the storage device in the method embodiment shown in fig. 5, the storage unit 730 is used to execute S550 and S564.
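When the computing device makes the placement decision, the claims describe it sending the first read IO request together with storage messages that tell the storage device where to write each request held in its memory. The sketch below illustrates one possible message shape and the storage-side handling. The message fields, the integer request identifiers, and the in-memory lists standing in for the mechanical hard disk and the solid state disk are assumptions made for illustration; the application does not define a message format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Target(Enum):
    HDD = "hdd"  # mechanical hard disk
    SSD = "ssd"  # solid state disk


@dataclass(frozen=True)
class StorageMessage:
    request_id: int  # hypothetical key of an IO request held in device memory
    target: Target


def build_messages(associated_ids: List[int],
                   unassociated_ids: List[int]) -> List[StorageMessage]:
    """Computing-device side: one first storage message per associated request,
    one second storage message per unassociated request."""
    first_msgs = [StorageMessage(i, Target.HDD) for i in associated_ids]
    second_msgs = [StorageMessage(i, Target.SSD) for i in unassociated_ids]
    return first_msgs + second_msgs


class StorageDevice:
    """Storage-device side, with lists standing in for the physical media."""

    def __init__(self, memory: Dict[int, bytes]) -> None:
        self.memory = dict(memory)
        self.hdd: List[List[bytes]] = []  # each entry is one combined write
        self.ssd: List[bytes] = []

    def handle(self, first_payload: bytes, messages: List[StorageMessage]) -> None:
        batch = [first_payload]
        for msg in messages:
            data = self.memory.pop(msg.request_id)  # fetch from device memory
            if msg.target is Target.HDD:
                batch.append(data)
            else:
                self.ssd.append(data)
        # The first read IO request and all associated requests are written together.
        self.hdd.append(batch)
```

A short usage example: `StorageDevice({1: b"a"}).handle(b"first", build_messages([1], []))` records a single combined write containing both payloads.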
When the data writing apparatus 700 is used to implement the functions of the storage device in the method embodiment shown in fig. 6, the transceiving unit 710 is configured to execute S610 and S640, the processing unit 720 is configured to execute S620 and S650, and the storage unit 730 is configured to execute S630 and S660.
More detailed descriptions about the data writing apparatus 700 can be obtained directly by referring to the related descriptions in the method embodiments shown in fig. 2 and fig. 4 to fig. 6, which are not repeated herein.
Fig. 8 is a schematic structural diagram of a computing device provided in the present application, where the computing device 800 includes a processor 810 and a communication interface 820. Processor 810 and communication interface 820 are coupled to one another. It is to be appreciated that the communication interface 820 can be a transceiver or an input-output interface. Optionally, the computing device 800 may also include a memory 830 for storing instructions to be executed by the processor 810 or for storing input data required by the processor 810 to execute the instructions or for storing data generated by the processor 810 after executing the instructions.
As a possible implementation manner, the processor 810 may generate data to be compressed in a tree structure according to the original data, and determine data occupancy information in the tree structure by using a cyclic network layer included in the data compression model. The data occupancy information is used to indicate the data distribution of the original data in the tree structure. Further, the processor 810 compresses the data to be compressed according to the data occupancy information to obtain compressed data.
When the computing device 800 is used to implement the methods shown in fig. 4-6, the processor 810, the communication interface 820 and the memory 830 may also cooperatively implement various operation steps in the data processing method performed by the transmitting end and the receiving end. The computing device 800 may also perform the functions of the data writing apparatus 700 shown in fig. 7, which are not described herein.
The specific connection medium among the communication interface 820, the processor 810, and the memory 830 is not limited in the embodiments of the present application. In fig. 8, the communication interface 820, the processor 810, and the memory 830 are connected by a bus 840, which is represented by a thick line. The connection manner among other components is illustrated only schematically and is not limiting. The bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 8, but this does not mean that there is only one bus or only one type of bus.
The memory 830 can be used for storing software programs and modules, such as program instructions/modules corresponding to the data processing method provided in the embodiments of the present application, and the processor 810 executes the software programs and modules stored in the memory 830, so as to execute various functional applications and data processing. The communication interface 820 may be used for signaling or data communication with other devices. The computing device 800 may have multiple communication interfaces 820 in this application.
The memory may be, but is not limited to, RAM, ROM, PROM, EPROM, EEPROM, and the like.
The processor may be an integrated circuit chip having signal processing capabilities. The processor may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The method steps in the embodiments of the present application may be implemented by hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in RAM, flash memory, ROM, PROM, EPROM, EEPROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. The storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC, and the ASIC may reside in a computing device. Alternatively, the processor and the storage medium may reside as discrete components in a computing device.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer programs or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, user equipment, or another programmable apparatus. The computer programs or instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium, such as a floppy disk, a hard disk, or a magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as an SSD.
In the embodiments of the present application, unless otherwise stated or logically contradictory, the terms and descriptions in different embodiments are consistent and may be cross-referenced, and technical features in different embodiments may be combined according to their inherent logical relationship to form a new embodiment.
The terms "first," "second," and "third," etc. in the description and claims of this application and the above-described drawings are used for distinguishing between different objects and not for limiting a particular order.
In the embodiments of the present application, words such as "exemplary" or "for example" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, use of the word "exemplary" or "for example" is intended to present related concepts in a concrete manner.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone, where A and B may be singular or plural. In the text of the present application, the character "/" generally indicates an "or" relationship between the preceding and following associated objects; in the formulas of the present application, the character "/" indicates a "division" relationship between the preceding and following associated objects.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for descriptive convenience and are not intended to limit the scope of the embodiments of the present application. The sequence numbers of the above processes do not mean the execution sequence, and the execution sequence of the processes should be determined by the functions and the inherent logic.

Claims (25)

1. A method of writing data, the method being performed by a storage device, the method comprising:
acquiring a first read IO request;
determining at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being continuously read, and the second IO request is stored in a memory of the storage device;
writing the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device.
2. The method of claim 1, wherein the characteristic values of the first read IO request comprise a first Logical Block Address (LBA) of data to be written, a data length, and a timestamp.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
determining at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, wherein the third IO request does not have the association relationship of being continuously read with the first read IO request;
and writing the third IO request into a solid state disk of the storage device.
4. The method according to any one of claims 1 to 3, wherein determining at least one second IO request according to the feature values of the first read IO request and a data feature analysis model comprises:
outputting, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are continuously read;
and taking an IO request whose probability value reaches a probability threshold as the second IO request.
5. The method according to any one of claims 1 to 4, wherein before determining at least one second IO request according to a feature value of the first read IO request and a data feature analysis model, the method further comprises:
and acquiring the characteristic value of the first read IO request in the memory.
6. The method according to any one of claims 1-5, further comprising: acquiring the data characteristic analysis model;
the acquiring the data characteristic analysis model specifically includes:
obtaining an IO training set, wherein the IO training set comprises a plurality of test IOs, and the characteristic values of the test IOs comprise read-write types, timestamps, second LBAs and data lengths of the test IOs;
inputting the IO training set into a first model to obtain associated information, wherein the associated information comprises probability values of any two test IOs in the plurality of test IOs which are continuously read;
and if the associated information meets the model convergence condition, taking the first model as the data feature analysis model.
7. A method of writing data, the method being performed by a computing device, the computing device being coupled to a storage device, the method comprising:
acquiring a first read IO request;
determining at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being continuously read, and the second IO request is stored in a memory of the storage device;
acquiring a first storage message of each second IO request, wherein the first storage message indicates the storage device to write the second IO request into a mechanical hard disk of the storage device from the memory;
and sending the first read IO request and all the first storage messages to the storage device.
8. The method of claim 7, wherein the characteristic values of the first read IO request comprise a first Logical Block Address (LBA) of data to be written, a data length, and a timestamp.
9. The method according to claim 7 or 8, characterized in that the method further comprises:
determining at least one third IO request according to the characteristic value of the first read IO request and the data characteristic analysis model, wherein the third IO request does not have the association relationship of being continuously read with the first read IO request;
obtaining a second storage message of each third IO request, where the second storage message indicates that the storage device writes the third IO request into a solid state disk of the storage device;
sending all of the second storage messages to the storage device.
10. The method according to any one of claims 7 to 9, wherein determining at least one second IO request according to the characteristic value of the first read IO request and a data characteristic analysis model comprises:
outputting, according to the characteristic value of the first read IO request and the data characteristic analysis model, probability values that a plurality of IO requests are continuously read;
and taking an IO request whose probability value reaches a probability threshold as the second IO request.
11. The method according to any one of claims 7-10, further comprising: acquiring the data characteristic analysis model;
the acquiring the data characteristic analysis model specifically includes:
obtaining an IO training set, wherein the IO training set comprises a plurality of test IOs, and the characteristic values of the test IOs comprise read-write types, timestamps, second LBAs and data lengths of the test IOs;
inputting the IO training set into a first model to obtain associated information, wherein the associated information comprises probability values of any two test IOs in the plurality of test IOs which are continuously read;
and if the associated information meets the model convergence condition, taking the first model as the data feature analysis model.
12. A data writing apparatus, wherein the apparatus is applied to a storage device, the apparatus comprising:
a transceiving unit, configured to acquire a first read IO request;
a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being continuously read, and the second IO request is stored in a memory of the storage device;
and a storage unit, configured to write the first read IO request and the at least one second IO request together into a mechanical hard disk of the storage device.
13. The apparatus of claim 12, wherein the characteristic values of the first read IO request comprise a first logical block address LBA of data to be written, a data length, and a timestamp.
14. The apparatus according to claim 12 or 13, wherein the processing unit is further configured to determine at least one third IO request according to the feature value of the first read IO request and the data feature analysis model, where the third IO request does not have the association relationship of being continuously read with the first read IO request;
the storage unit is further configured to write the third IO request into a solid state disk of the storage device.
15. The apparatus according to any one of claims 12 to 14, wherein the processing unit is specifically configured to output, according to the feature value of the first read IO request and the data feature analysis model, a probability value that a plurality of read IO requests are read consecutively, and to use a read IO request with a probability value reaching a probability threshold as the second IO request.
16. The apparatus according to any one of claims 12 to 15, wherein the transceiver unit is further configured to obtain a characteristic value of the first read IO request in the memory.
17. The apparatus according to any one of claims 12-16, further comprising a model obtaining unit configured to obtain the data characteristic analysis model;
the model obtaining unit is specifically configured to obtain an IO training set, where the IO training set includes a plurality of test IOs, and the characteristic values of the test IOs include read-write types, timestamps, second LBAs, and data lengths of the test IOs;
the model obtaining unit is specifically configured to input the IO training set to a first model to obtain associated information, where the associated information includes a probability value that any two test IOs of the plurality of test IOs are continuously read;
the model obtaining unit is specifically configured to take the first model as the data feature analysis model if the associated information meets a model convergence condition.
18. A data writing apparatus, applied to a computing device, the computing device being connected to a storage device, the apparatus comprising:
a transceiving unit, configured to acquire a first read IO request;
a processing unit, configured to determine at least one second IO request according to a characteristic value of the first read IO request and a data characteristic analysis model, wherein the second IO request and the first read IO request have an association relationship of being continuously read, and the second IO request is stored in a memory of the storage device;
the processing unit is further configured to obtain a first storage message of each second IO request, where the first storage message indicates that the storage device writes the second IO request from the memory into a mechanical hard disk of the storage device;
the transceiver unit is further configured to send the first read IO request and all the first storage messages to the storage device.
19. The apparatus of claim 18, wherein the characteristic values of the first read IO request comprise a first logical block address LBA of the data to be written, a data length, and a time stamp.
20. The apparatus according to claim 18 or 19, wherein the processing unit is further configured to determine at least one third IO request according to the feature value of the first read IO request and the data feature analysis model, where the third IO request does not have the association relationship of being continuously read with the first read IO request;
the processing unit is further configured to obtain a second storage message of each third IO request, where the second storage message indicates that the storage device writes the third IO request into a solid state disk of the storage device;
the transceiver unit is further configured to send all the second storage messages to the storage device.
21. The apparatus according to any one of claims 18 to 20, wherein the processing unit is specifically configured to output, according to the feature value of the first read IO request and the data feature analysis model, a probability value that a plurality of read IO requests are read consecutively;
the processing unit is specifically configured to use the read IO request with the probability value reaching the probability threshold as the second IO request.
22. The apparatus according to any one of claims 18-21, further comprising a model obtaining unit configured to obtain the data characteristic analysis model;
the model obtaining unit is specifically configured to obtain an IO training set, where the IO training set includes a plurality of test IOs, and the characteristic values of the test IOs include read-write types, timestamps, second LBAs, and data lengths of the test IOs;
the model obtaining unit is specifically configured to input the IO training set to a first model to obtain associated information, where the associated information includes a probability value that any two test IOs of the plurality of test IOs are continuously read;
the model obtaining unit is specifically configured to, if the associated information meets a model convergence condition, use the first model as the data feature analysis model.
23. A storage device, comprising a processor, a mechanical hard disk, and a solid state disk, wherein the processor is configured to implement the method of any one of claims 1 to 6 by using logic circuits or by executing code instructions.
24. A data storage system comprising a computing device and a storage device, the computing device being connected to the storage device;
the computing device is used for acquiring a first read IO request;
the computing device is further configured to determine at least one second IO request according to the feature value of the first read IO request and a data feature analysis model, where the second IO request has a continuously read association relationship with the first read IO request, and the second IO request is stored in a memory of the storage device;
the computing device is further configured to obtain a first storage message of each second IO request, where the first storage message indicates that the storage device writes the second IO request from the memory into a mechanical hard disk of the storage device;
the computing device is further configured to send the first read IO request and all the first storage messages to the storage device;
the storage device is configured to receive the first read IO request and all the first storage messages;
the storage device is further configured to obtain a second IO request corresponding to each first storage message from the memory, and write the first read IO request and all the second IO requests into the mechanical hard disk together.
25. A computer storage medium, in which a computer program or instructions are stored, which, when executed by a storage device, implements the method of any one of claims 1 to 6, and which, when executed by a computing device, implements the method of any one of claims 7 to 11.
CN202110281305.4A 2021-03-16 2021-03-16 Data writing method and device Pending CN115079936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110281305.4A CN115079936A (en) 2021-03-16 2021-03-16 Data writing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110281305.4A CN115079936A (en) 2021-03-16 2021-03-16 Data writing method and device

Publications (1)

Publication Number Publication Date
CN115079936A true CN115079936A (en) 2022-09-20

Family

ID=83246022

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110281305.4A Pending CN115079936A (en) 2021-03-16 2021-03-16 Data writing method and device

Country Status (1)

Country Link
CN (1) CN115079936A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115865803A (en) * 2023-03-03 2023-03-28 浪潮电子信息产业股份有限公司 IO request processing method, device, equipment and readable storage medium
CN115865803B (en) * 2023-03-03 2023-08-22 浪潮电子信息产业股份有限公司 IO request processing method, device, equipment and readable storage medium
CN117391149A (en) * 2023-11-30 2024-01-12 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data
CN117391149B (en) * 2023-11-30 2024-03-26 爱芯元智半导体(宁波)有限公司 Processing method, device and chip for neural network output data

Similar Documents

Publication Publication Date Title
WO2022017002A1 (en) Garbage collection method and device
US11360705B2 (en) Method and device for queuing and executing operation commands on a hard disk
CN115079936A (en) Data writing method and device
WO2017132797A1 (en) Data arrangement method, storage apparatus, storage controller and storage array
WO2023045483A1 (en) Storage device and data storage method and storage system
US11334280B2 (en) Storage device feature extraction optimization
CN114300032A (en) Method and device for checking failure of storage medium and solid state disk
US20230236971A1 (en) Memory management method and apparatus
WO2023000770A1 (en) Method and apparatus for processing access request, and storage device and storage medium
Koh et al. Faster than flash: An in-depth study of system challenges for emerging ultra-low latency SSDs
CN113687977B (en) Data processing device for improving computing performance based on RAID controller
CN109375868B (en) Data storage method, scheduling device, system, equipment and storage medium
US20240104014A1 (en) Data management method, and storage space management method and apparatus
CN113687978A (en) Data processing method for storage array controller
CN108877862B (en) Data organization of page stripes and method and device for writing data into page stripes
WO2023020136A1 (en) Data storage method and apparatus in storage system
EP4321981A1 (en) Data processing method and apparatus
EP4325367A1 (en) Method and device for data caching
US20150212759A1 (en) Storage device with multiple processing units and data processing method
CN103064926B (en) Data processing method and device
CN116560560A (en) Method for storing data and related device
CN112000289B (en) Data management method for full flash storage server system and related components
CN115793957A (en) Method and device for writing data and computer storage medium
CN115963977A (en) Solid state disk, data operation method and device thereof, and electronic device
CN212341857U (en) Intelligent storage device, system and hard disk cartridge

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination