CN110837479B - Data processing method, related equipment and computer storage medium

Info

Publication number
CN110837479B
Authority
CN
China
Prior art keywords
data
read
written
information
address
Legal status
Active
Application number
CN201810940730.8A
Other languages
Chinese (zh)
Other versions
CN110837479A (en
Inventor
饶蓉
魏明昌
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Application filed by Huawei Cloud Computing Technologies Co Ltd filed Critical Huawei Cloud Computing Technologies Co Ltd
Priority to CN201810940730.8A
Priority to PCT/CN2019/089582 (published as WO2020034729A1)
Publication of CN110837479A
Application granted
Publication of CN110837479B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches, for peripheral storage systems, e.g. disk cache
    • G06F12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Abstract

The embodiment of the invention discloses a data processing method, which comprises the following steps: the object storage device receives a data read or write IO request, where the request includes a data integrity field (DIF) carrying attribute information of the data to be operated; this may specifically be attribute information of data to be read or attribute information of data to be written. Further, the object storage device processes the data to be operated according to the attribute information of the data to be operated. By adopting the embodiment of the invention, the performance of the disk can be improved and the waste of storage resources can be avoided.

Description

Data processing method, related equipment and computer storage medium
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a data processing method, a related device, and a computer storage medium.
Background
With the development and progress of the internet, the global data volume is increasing at a very fast rate every day. The requirements for data storage systems are particularly important in the face of such vast amounts of data. Currently, to meet the storage requirement of mass data, a distributed storage system is generally adopted to store the data.
However, in practice it is found that current distributed storage systems use a uniform storage policy and storage layout for all data, which brings about a loss of disk performance and a waste of storage resources.
Disclosure of Invention
The embodiment of the invention discloses a data processing method, related equipment and a computer storage medium, which reduce the performance loss of a magnetic disk and the waste of storage resources.
In a first aspect, an embodiment of the present invention discloses a data processing method, where the method includes: the object storage equipment receives a data read IO request sent by a host, wherein the data read IO request comprises a data integrity field DIF, and the DIF carries attribute information of data to be read, such as the size of the data to be read, the reading granularity of the data to be read, the reading information of an address to be read corresponding to the data to be read and the like. Accordingly, the object storage device may process the data to be read according to the attribute information of the data to be read.
By implementing the embodiment of the invention, the data to be read can be stored and managed according to the attribute information of the data to be read, and personalized storage is performed by utilizing the data characteristics, so that the problems of lower disk performance, storage resource waste and the like are avoided.
In some possible embodiments, the attribute information of the data to be read includes a read granularity of the data to be read, the read granularity being used to indicate a size of the pre-read data including the data to be read. Correspondingly, if the size of the read-ahead data is greater than or equal to the first threshold, the object storage device can read the read-ahead data including the data to be read and send the read-ahead data to the client so as to cache the read-ahead data in the memory of the client. When the host sends the same data read IO request next time, corresponding data to be read is directly read from the cache of the client, so that the data reading time is saved, and the data processing efficiency is improved.
If the size of the pre-read data is smaller than the first threshold, the object storage device can directly read the data to be read and send the data to be read to the host.
By implementing the embodiment of the invention, whether the pre-read data is extracted or not can be determined according to the size of the pre-read data, so that the data can be conveniently read from the memory of the client directly at the next time, the data reading time is saved, and the data processing efficiency is improved.
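For illustration, a minimal C sketch of this decision follows. The structure and threshold names (read_io_t, FIRST_THRESHOLD) are assumptions, not from the application; the threshold value itself is left user- or system-defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed threshold value; the application leaves it user-defined. */
#define FIRST_THRESHOLD (64 * 1024)

typedef struct {
    size_t data_len;       /* size of the data to be read */
    size_t readahead_len;  /* pre-read size from the DIF read granularity field */
} read_io_t;

/* true:  read the pre-read data and push it to the client's memory cache;
 * false: read only the requested data and return it to the host. */
bool should_read_ahead(const read_io_t *req) {
    return req->readahead_len >= FIRST_THRESHOLD;
}
```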
In some possible embodiments, the attribute information of the data to be read includes read information of the address to be read corresponding to the data to be read. The read information may include, but is not limited to, a read frequency or a read type, and is used to indicate the frequency at which data reads occur at the address to be read in a unit time. Correspondingly, when the read information is the first read information, the object storage device writes the data to be read into the nonvolatile cache of the object storage device. When the read information is the second read information, the object storage device writes the data to be read into the nonvolatile cache of the object storage device and sets an expiration time in the nonvolatile cache, so that data stored in the nonvolatile cache for longer than the expiration time is cleared. The frequency indicated by the first read information is greater than the frequency indicated by the second read information.
In some possible embodiments, the attribute information of the data to be read includes read information of the address to be read corresponding to the data to be read, used to indicate the frequency at which data reads occur at the address to be read in a unit time. When the frequency indicated by the read information is greater than or equal to a second threshold, the data to be read is sent to the client to be cached in the memory of the client, so that next time the data can be read directly from the client's memory, improving data reading efficiency. Correspondingly, when the frequency indicated by the read information is smaller than the second threshold, the data to be read is sent to the client and cached in the hard disk of the client.
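A minimal sketch combining the two placement rules above. The enum for the read information, the expiry value, and the second threshold are all placeholders; the application leaves the concrete values user-defined.

```c
typedef enum { READ_INFO_FIRST, READ_INFO_SECOND } read_info_t;

typedef struct {
    int osd_nv_cache;   /* 1: keep the data in the OSD's nonvolatile cache */
    int expire_secs;    /* 0: no expiration; >0: cleared after this long   */
    int client_memory;  /* 1: client caches in memory, 0: on its hard disk */
} read_placement_t;

/* reads_per_unit_time is the frequency carried by the read information. */
read_placement_t place_read_data(read_info_t info, double reads_per_unit_time) {
    const double second_threshold = 100.0;  /* placeholder value */
    read_placement_t p;
    p.osd_nv_cache = 1;
    /* Second read information: lower frequency, so attach an expiration
     * time; data cached longer than this is cleared. */
    p.expire_secs = (info == READ_INFO_SECOND) ? 300 : 0;
    /* Frequently read addresses are cached in the client's memory,
     * others on the client's hard disk. */
    p.client_memory = (reads_per_unit_time >= second_threshold) ? 1 : 0;
    return p;
}
```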
In a second aspect, an embodiment of the present invention discloses a data processing method, where the method includes: the object storage device receives a data writing IO request forwarded by the client, wherein the data writing IO request comprises a data integrity field DIF, and the DIF carries attribute information of data to be written, such as the size of the data to be written, the writing granularity of the data to be written, the writing information of an address to be written corresponding to the data to be written and the like. Correspondingly, the object storage device processes the data to be written according to the attribute information of the data to be written. By implementing the embodiment of the invention, the attribute information of the data to be written can be reasonably utilized to carry out storage management on the data to be written, so as to solve the problems of disk performance loss, storage resource waste and the like in the existing distributed storage system.
In some possible embodiments, the attribute information of the data to be written includes writing information of an address to be written corresponding to the data to be written, such as a writing frequency or a writing type. The write information is used to indicate how often data writing occurs at the address to be written per unit time. Accordingly, when the write information is the first write information, the object storage device may store the data to be written into the nonvolatile cache of the object storage device. When the write information is the second write information, the object storage device may store the data to be written into the hard disk of the object storage device. Wherein the frequency indicated by the first writing information is greater than the frequency indicated by the second writing information. The read-write rate of the nonvolatile cache is greater than the read-write rate of the hard disk.
By implementing the steps, the data to be written can be stored in the corresponding cache or hard disk according to the attribute characteristics of the data to be written, so that the high efficiency of data processing is facilitated.
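The write-side counterpart can be sketched the same way; the enum names below are assumptions, and the rule follows the paragraph above (frequently rewritten addresses go to the faster nonvolatile cache, others to the hard disk):

```c
typedef enum { WRITE_INFO_FIRST, WRITE_INFO_SECOND } write_info_t;
typedef enum { TARGET_NV_CACHE, TARGET_HDD } write_target_t;

write_target_t pick_write_target(write_info_t info) {
    /* The first write information indicates the higher write frequency, and
     * the nonvolatile cache has the higher read/write rate, so frequently
     * rewritten data lands there; everything else goes to the hard disk. */
    return (info == WRITE_INFO_FIRST) ? TARGET_NV_CACHE : TARGET_HDD;
}
```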
In some possible embodiments, the attribute information of the data to be written includes a write granularity of the data to be written, the write granularity being used to indicate the storage granularity employed when storing the data to be written. After receiving the data write IO request sent by the host, the client can allocate corresponding stripe information for the data to be written according to its attribute information, where the stripe information includes one or more stripe units used when storing the data to be written and the physical address of each stripe unit. Each stripe unit is used to store a small part of the data to be written, which may also be referred to as data to be stored. Further, the client sends each stripe unit used when storing the data to be written to the corresponding object storage device according to the stripe information, where the stripe unit carries its own physical address and the data to be stored. Accordingly, the object storage device receives the stripe unit sent by the client and writes the corresponding part of the data to be written (the data to be stored) into its hard disk according to the physical address of the stripe unit.
By implementing the above steps, the client can configure the corresponding stripe for the data to be written according to the attribute information thereof, wherein the stripe is composed of one or more stripe units, and each stripe unit is mapped with the hard disk in the object storage device. Accordingly, after receiving the stripe unit, the object storage device writes or stores the data to be written to the hard disk according to the stripe unit. The method is beneficial to improving the performance of the hard disk and the utilization rate of the hard disk.
In some possible embodiments, after configuring the stripe information, the client may create a corresponding storage mapping relationship for the data to be written according to the writing granularity of the data to be written, that is, the storage mapping relationship of the data to be written. Correspondingly, the object storage device receives the storage mapping relation of the data to be written sent by the client, wherein the storage mapping relation of the data to be written comprises the mapping relation between the address to be written and the physical address when the data to be written is stored.
In some possible embodiments, the object storage device may further create a storage mapping relationship of the data to be stored according to the writing granularity of the data to be written, where the storage mapping relationship includes a mapping relationship between a logical address and a physical address when the data to be stored is stored, and the logical address is related to the address to be written of the data to be written. Specifically, the logical address may be obtained according to the address to be written and the writing granularity of the data to be written.
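One plausible reading of "obtained according to the address to be written and the writing granularity" is granularity alignment; the sketch below assumes that rule, which the application does not spell out, and its structure names are illustrative.

```c
#include <stdint.h>

typedef struct {
    uint64_t logical;   /* logical address derived from the write LBA */
    uint64_t physical;  /* physical address of the stripe unit        */
} mapping_entry_t;

/* Derive the logical address by aligning the to-be-written LBA down to the
 * write granularity (granularity must be nonzero), then record the
 * logical-to-physical mapping. */
mapping_entry_t make_mapping(uint64_t write_lba, uint64_t granularity,
                             uint64_t phys_addr) {
    mapping_entry_t e;
    e.logical  = (write_lba / granularity) * granularity;
    e.physical = phys_addr;
    return e;
}
```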
In a third aspect, an embodiment of the present invention discloses a data processing method, where the method includes: the host acquires attribute information of the data to be operated, and generates a data operation request according to the attribute information of the data to be operated. Further, the host sends a data operation request to the client, the data operation request including the data integrity field DIF. The DIF is used for carrying attribute information of data to be operated.
By implementing the embodiment of the invention, the host can obtain the attribute information of the data to be operated according to the actual demand, so as to generate a data operation request and send the data operation request to the client or forward the data operation request to the object storage device through the client. The method is convenient for the client or the object storage device to store and manage the data to be operated according to the attribute information of the data to be operated, so that the data to be operated is stored according to the attribute characteristics of the data, the performance loss of a disk is reduced, the waste of storage resources is avoided, and the like.
In some possible implementations, the data operation request may be a data read IO request. Correspondingly, the attribute information of the data to be operated may include the reading granularity of the data to be read and/or the reading information of the reading address corresponding to the data to be read, and the reading information may be specifically the reading frequency or the reading type. Wherein the read granularity is used to indicate the size of the pre-read data including the data to be read. The read information is used for indicating the frequency of data reading of the address to be read in unit time.
In some possible implementations, the data operation request may be a data write IO request. Correspondingly, the attribute information of the data to be operated includes the write granularity of the data to be written and/or the write information of the address to be written corresponding to the data to be written; the write information may specifically be a write frequency or a write type. The write granularity is used to indicate the storage granularity adopted when the data to be written is stored. The write information is used to indicate the frequency at which data writing occurs at the address to be written in a unit time.
In a fourth aspect, an embodiment of the present invention discloses a data processing method, where the method includes: the client receives a data writing IO request sent by the host, wherein the data writing IO request comprises a data integrity field DIF, and attribute information of data to be written is carried in the DIF. Correspondingly, the client configures corresponding stripe information for the data to be written according to the attribute information of the data to be written, wherein the stripe information comprises at least one stripe unit and a physical address of each stripe unit, and the at least one stripe unit is used when the data to be written is stored. And the client sends the stripe unit carrying the partial data to be written and the physical address to the corresponding object storage device according to the stripe information.
In some possible embodiments, the attribute information of the data to be written includes a write granularity of the data to be written. Accordingly, the client may configure, for the data to be written, a stripe whose stripe unit size matches the write granularity; the stripe information includes at least one stripe unit used when storing the data to be written and the physical address of each stripe unit. Each stripe unit is used to store a portion of the data to be written. In other words, the client selects a stripe whose stripe unit size equals the write granularity to store the data to be written, which helps improve disk utilization.
In some possible embodiments, the attribute information of the data to be written includes write information of the address to be written corresponding to the data to be written, the write information being used to indicate the frequency of data writing at the address to be written in a unit time; it may specifically be a write frequency or a write type. Accordingly, the client may configure, for the data to be written, a stripe matching the frequency indicated by the write information. In other words, the client may store data to be written of the same write type, or within the same write frequency range, into the same stripe.
In a fifth aspect, an embodiment of the present invention provides a data processing apparatus (specifically may be an object storage device), including a communication unit and a processing unit, where:
the communication unit is configured to receive a data read IO request, where the data read IO request includes a data integrity field DIF, where the DIF is used to carry attribute information of data to be read;
the processing unit is used for processing the data to be read according to the attribute information of the data to be read.
In some possible embodiments, the attribute information of the data to be read includes a read granularity of the data to be read, used to indicate the size of the pre-read data including the data to be read. The processing unit is specifically configured to read the pre-read data including the data to be read when the size of the pre-read data is greater than or equal to a first threshold, and send the pre-read data to a client to be cached in the memory of the client; or, when the size of the pre-read data is smaller than the first threshold, to read the data to be read and send it to a host.
In some possible embodiments, the attribute information of the data to be read includes read information of an address to be read corresponding to the data to be read, where the read information is used to indicate a frequency of data reading of the address to be read in a unit time. The processing unit is specifically configured to write the data to be read into a nonvolatile cache of the object storage device when the read information is first read information; or when the read information is the second read information, writing the data to be read into a nonvolatile cache of the object storage device, and setting an expiration time in the nonvolatile cache of the object storage device so as to clear data stored in the nonvolatile cache for a time longer than the expiration time; wherein the frequency indicated by the first read information is greater than the frequency indicated by the second read information.
Regarding what is not shown or described in the embodiments of the present invention, reference may be made to the related descriptions in the foregoing embodiments of the first aspect, which are not described herein.
In a sixth aspect, an embodiment of the present invention provides another data processing apparatus (specifically may be an object storage device), including a communication unit and a processing unit, where:
the communication unit is configured to receive a data write IO request, where the data write IO request includes a data integrity field DIF, where the DIF is configured to carry attribute information of data to be written;
the processing unit is used for processing the data to be written according to the attribute information of the data to be written.
In some possible embodiments, the attribute information of the data to be written includes writing information of an address to be written corresponding to the data to be written, where the writing information is used to indicate a frequency of data writing of the address to be written in a unit time. The processing unit is specifically configured to store the data to be written into a nonvolatile cache of the object storage device when the write information is first write information; or when the writing information is the second writing information, the object storage device stores the data to be written into a hard disk of the object storage device; wherein the frequency indicated by the first writing information is greater than the frequency indicated by the second writing information.
In some possible embodiments, the attribute information of the data to be written includes a write granularity of the data to be written, which is used to indicate a storage granularity adopted when the data to be written is stored. The client receives the data write IO request, and configures corresponding stripe information for the data to be written according to the attribute information of the data to be written, wherein the stripe information comprises stripe units used when the data to be written is stored and physical addresses of the stripe units; the client sends the stripe unit carrying the data to be written and the physical address to the communication unit according to the stripe information; and the communication unit receives the stripe unit which is sent by the client and carries the data to be written and the physical address. Correspondingly, the processing unit is specifically configured to store the data to be written into the hard disk of the object storage device according to the physical address of the stripe unit.
In some possible implementations, the processing unit is further configured to create a storage mapping relationship of the data to be written according to a writing granularity of the data to be written, where the storage mapping relationship includes a mapping relationship between the address to be written and the physical address when the data to be written is stored.
Regarding what is not shown or described in the embodiments of the present invention, reference may be made to the related descriptions in the embodiments of the foregoing second aspect, which are not repeated herein.
In a seventh aspect, an embodiment of the present invention provides another data processing apparatus (specifically, may be a host), including a communication unit and a processing unit, where:
the processing unit is used for acquiring attribute information of data to be operated;
the communication unit is configured to send a data operation request to a client, where the data operation request includes a data integrity field DIF, and the DIF is configured to carry attribute information of the data to be operated.
In some possible embodiments, when the data operation request is a data read IO request, the attribute information of the data to be operated includes a read granularity of the data to be read and/or read information of an address to be read corresponding to the data to be read. The reading granularity of the data to be read is used for indicating the size of the pre-read data including the data to be read; the read information of the address to be read is used for indicating the frequency of data reading of the address to be read in unit time.
In some possible embodiments, when the data operation request is a data write IO request, the attribute information of the data to be operated includes write granularity of the data to be written and/or write information of an address to be written corresponding to the data to be written. The write granularity of the data to be written is used for indicating the storage granularity adopted when the data to be written is stored; the writing information of the address to be written is used for indicating the frequency of data writing of the address to be written in unit time.
Regarding what is not shown or described in the embodiments of the present invention, reference may be made to the related descriptions in the embodiments of the foregoing third aspect, which are not described herein.
In an eighth aspect, an embodiment of the present invention provides an object storage device, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the first aspect or any possible implementation of the first aspect.
In a ninth aspect, an embodiment of the present invention provides still another object storage device, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the second aspect or any possible implementation of the second aspect.
In a tenth aspect, an embodiment of the present invention provides a host, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the third aspect or any possible implementation of the third aspect.
In an eleventh aspect, an embodiment of the present invention provides a client, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the fourth aspect or any possible implementation of the fourth aspect.
In a twelfth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for performing the method described in the first aspect or any possible implementation of the first aspect.
In a thirteenth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for performing the method described in the second aspect or any possible implementation of the second aspect.
In a fourteenth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for performing the method described in the third aspect or any possible implementation of the third aspect.
In a fifteenth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for performing the method described in the fourth aspect or any possible implementation of the fourth aspect.
The above storage medium may be nonvolatile.
In a sixteenth aspect, a chip product is provided for performing the method of the first aspect or any of the possible implementations of the first aspect.
In a seventeenth aspect, a chip product is provided for carrying out the method of the second aspect or any of the possible embodiments of the second aspect.
In an eighteenth aspect, a chip product is provided for carrying out the method of the third aspect or any possible implementation of the third aspect.
In a nineteenth aspect, a chip product is provided for carrying out the method of the fourth aspect or any possible implementation of the fourth aspect.
By implementing the embodiment of the invention, the problems of resource waste and performance loss in the existing distributed storage system can be solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic diagram of a network architecture of a data processing system according to an embodiment of the present invention.
Fig. 2 is a flow chart of a data processing method according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a format of a data operation request according to an embodiment of the present application.
Fig. 4 is a flowchart of another data processing method according to an embodiment of the present application.
Fig. 5 is a flowchart of another data processing method according to an embodiment of the present application.
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Fig. 7 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present application.
Detailed Description
The following describes the technical scheme in the embodiment of the present application in detail with reference to the drawings.
First, some technical terms related to the present application will be described.
Logical block address (logical block address, LBA), also known as a logical address or relative address, refers to the address by which a block of data stored on a disk or tape is retrieved or overwritten. In a computing device with address translation functionality, it may also refer to the address (or operand) given by an access instruction.
Physical block address (physical block address, PBA), also known as a physical address, refers to an address placed on the address bus of the central processing unit (central processing unit, CPU) for addressing. Physical addresses are typically mapped onto memory or storage. In a computing device with address translation functionality, a logical address may be translated, by calculation or address translation, into the actual effective address in memory (i.e., the physical address).
A stripe (stripe) is a method of dividing continuous data into blocks of the same size and writing each block of data to a different disk in the array. Two parameters affect the striping effect: stripe depth and stripe size. Stripe depth refers to the number of stripes that can be read or written in parallel at the same time. Stripe size (stripe size) refers to the size of the data block written to each disk. These terms are illustrated later in this application.
In order to solve the problems of performance loss, resource waste and the like in the existing data storage scheme, the application provides a data processing method, a network framework and related equipment, wherein the network framework is suitable for the data processing method. Referring first to FIG. 1, a schematic diagram of a network architecture of a data processing system according to an embodiment of the present application is shown. The data processing system 100 as shown in fig. 1 includes application software 102, a virtual disk interface 104, a client 106, a metadata controller 108 (meta data controller, MDC), and an object storage device 110 (object-based storage device, OSD). Wherein,
the application software 102 may issue a data read or write input/output (IO) request to the virtual disk according to its actual requirement, where the data read/write IO request carries information such as a size of data to be read/written and a logical block address (logical block address, LBA) of the data storage, which is hereinafter abbreviated as a logical address.
The virtual disk interface 104 may be an interface provided by a virtual block storage (virtual block storage, VBS) management component for accessing virtual disks. Specifically, the application software 102 sends data read/write IO requests to the corresponding virtual disk via the virtual disk interface 104 to read data from, or write data to, the corresponding disk. In practical applications, the virtual disk interface 104 and the application software 102 are typically deployed on the same physical device (e.g., a host or server). The metadata controller 108 is responsible for managing the object storage devices 110; there may be one or more object storage devices 110, and the figure shows n of them as an example, where n is a positive integer. In a distributed data storage system, there are typically multiple object storage devices 110.
Data processing system 100 may also be deployed in two parts, application software 102 and virtual disk interface 104 being deployed on the client's devices, client 106, metadata controller 108, and object storage device 110 being deployed within the cloud service provider's data center. The cloud service provider provides object storage services to clients through clients 106, metadata controllers 108, and object storage devices 110. Clients access the object storage services through application software 102 and virtual disk interface 104.
In particular, the metadata controller 108 may be responsible for maintaining the connection state between the object storage devices 110 and the client 106, including online and offline states. The online state means that the object storage device 110 and the client 106 can communicate normally and a communication connection has been established. The offline state means that there is no communication connection between the object storage device 110 and the client 106, i.e., in the offline state the client cannot store data in the object storage device. Optionally, the metadata controller 108 may further deploy the object storage devices 110 according to a certain deployment policy, such as a partition deployment policy or a load-balancing deployment policy, which is not limited herein. Accordingly, the metadata controller 108 may learn information about each object storage device OSD currently in communication with the client 106, such as the Internet Protocol (IP) address of the OSD and the identity of the OSD. Optionally, the metadata controller 108 may send the information of the object storage devices 110 to the client 106 in advance, so that after the client 106 receives a data read/write IO request, it can compute the corresponding object storage device 110 from the LBA logical address in the request, which is not described in detail herein. Further, the client 106 may establish communication with the object storage device 110 based on the information of the OSD (e.g., its identification or IP address), for example to forward data read/write IO requests to the object storage device.
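The application does not specify how the client computes the target OSD from the LBA. The sketch below uses a simple modulo placement over the OSD list pushed down by the metadata controller, purely as a hypothetical illustration:

```c
#include <stdint.h>

typedef struct {
    uint32_t osd_id;  /* identity of the OSD                         */
    char     ip[16];  /* IP address of the OSD, as learned from MDC  */
} osd_info_t;

/* Map an LBA to one of n_osd OSDs (n_osd > 0): find which stripe unit
 * the LBA falls in, then place stripe units round-robin across OSDs. */
const osd_info_t *pick_osd(const osd_info_t *osds, uint32_t n_osd,
                           uint64_t lba, uint64_t stripe_unit_size) {
    uint64_t unit_index = lba / stripe_unit_size;
    return &osds[unit_index % n_osd];
}
```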
In practical applications, the metadata controller 108 may be deployed on a physical device alone, or distributed on one or more physical devices, and is not limited thereto.
The client 106 may be a software module configured to receive data read/write IO requests issued by the application software 102 via the virtual disk interface 104. Further, the client 106 may configure a corresponding data stripe for the data to be read/written according to the data read/write IO request, so as to write or store the data into the data stripe. In other words, data is stored in a striped manner in the present application. Optionally, the client 106 may also calculate an erasure coding (EC) check code corresponding to the data using a preset algorithm. The preset algorithm is set by a user or the system in a user-defined manner, such as an EC check code algorithm. How to calculate the EC check code is not described in detail here; how to configure the data stripes is described in detail below.
Optionally, the client 106 may also send the data stripe in which the data is to be stored, the EC check code, and the received data read/write IO request, together or separately, to the corresponding object storage device OSD. For example, after the client 106 obtains the data stripe and the EC check code for the data to be stored, it may add these two pieces of information to the data read/write IO request, repackage it, and send the repackaged data read/write IO request to the corresponding object storage device.
Accordingly, the object storage device 110 may receive the data read/write IO request and perform the reading or writing of the corresponding data according to the request. When the object storage device 110 receives a data read IO request, a data block, metadata, and the like in the corresponding virtual disk are read according to the LBA logical address in the request. Metadata herein may refer to data describing attributes of a data block, such as one or more of a logical address, a physical address, a size of the data block, where the data block is stored. Accordingly, when the object storage device 110 receives a data write IO request, data may be written to the corresponding virtual disk according to the LBA logical address in the request. How the object storage device performs the corresponding data operation according to the data read/write IO request is described in detail below.
In practical applications, application software 102, virtual disk interface 104, and client 106 are typically deployed on a physical device. The object storage device 110 is deployed to another physical device. Alternatively, the object storage device 110 and application software 102, as well as the client 106, etc. may be deployed on the same physical device. The metadata controller 108 is deployed separately to another physical device. The application is not limited with respect to the deployment of the various parts in fig. 1.
In an alternative embodiment, an object storage device 110 includes a cache and a disk. The number of caches and disks is not limited; 1 cache and 1 disk are illustrated as an example. The disk includes, but is not limited to, a virtual or physical disk. In order to avoid the problem of disk conflict caused by multiple processes accessing the same disk, the application adopts a striping technique to balance the IO load (data) across the disks of the object storage devices. The following description uses n object storage devices as an example. That is, the disk of each of the n object storage devices provides a corresponding storage space (stripe unit) for storing a small portion of the data. As shown, the data may be divided into n shares, resulting in n data blocks, each data block comprising a small portion of the data. Accordingly, each of the n object storage devices may provide one stripe unit to store one data block, i.e., n stripe units are obtained to store the n data blocks correspondingly. As shown, object storage device 1 may provide stripe unit 1 to store the first data block (data block 1), and object storage device 2 may provide stripe unit 2 to store the second data block (data block 2). By analogy, object storage device n may provide stripe unit n to store the nth data block (data block n). In other words, the stripe used to store the entire piece of data is composed of n stripe units from the n object storage devices, i.e., one stripe unit provided by each of the n disks.
Accordingly, when a certain object storage device learns a stripe unit carrying a data block, the data block can be stored on the stripe unit according to the physical address of the stripe unit, that is, the data block is written or stored in a disk of the object storage device according to the physical address of the stripe unit. For example, taking the object storage device 1 as an example, after knowing that the stripe unit 1 needs to store the data block 1, the data block 1 can be written to the stripe unit 1 according to the physical address of the stripe unit 1, that is, written to its own disk.
Similarly, when n object storage devices each learn the data blocks to be stored in the stripe units, the corresponding data blocks can be stored according to the physical addresses of the corresponding stripe units. When the n object storage devices all finish the storage of the respective data blocks, the storage of the whole data can be finished.
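The division of one piece of data into n blocks, one stripe unit per object storage device, can be sketched as follows; the structures are illustrative, not from the application:

```c
#include <stddef.h>

typedef struct {
    unsigned osd_index;  /* object storage device whose disk holds this unit */
    size_t   offset;     /* offset of this data block within the whole data  */
    size_t   len;        /* length of this data block                        */
} stripe_unit_t;

/* Split total_len bytes into n equal-size blocks; block i is stored in the
 * stripe unit provided by object storage device i. */
void plan_stripe(stripe_unit_t *units, unsigned n, size_t total_len) {
    size_t block = (total_len + n - 1) / n;  /* round up */
    for (unsigned i = 0; i < n; i++) {
        size_t off = (size_t)i * block;
        units[i].osd_index = i;
        units[i].offset = off;
        units[i].len = (off >= total_len) ? 0
                     : (total_len - off < block ? total_len - off : block);
    }
}
```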
Wherein the cache in the object storage device is used to temporarily or transiently store data. Optionally, a timer may be further set in the cache, and when the data stored in the cache exceeds a preset expiration period, the data in the cache may be processed, for example, the data in the cache is cleared, or the data in the cache is written to a hard disk, or the like.
Next, please refer to fig. 2, which is a flowchart illustrating a data processing method according to an embodiment of the present application. The data processing method as shown in fig. 2 includes the following implementation steps:
step S202, the host acquires attribute information of the data to be operated, and generates a data operation request according to the attribute information of the data to be operated.
Step S204, the host sends a data operation request to the client. Accordingly, the client receives the data manipulation request and forwards it to the object storage device.
In the present application, the host refers to the physical device running the application software. Specifically, the application software in the host can obtain attribute information of the data to be operated according to actual requirements. Further, the host may generate a data operation request and encapsulate the attribute information of the data to be operated into a data integrity field (data integrity field, DIF) in the data operation request. It then sends the data operation request to the client, to be forwarded to the object storage device via the client.
The data to be operated herein may include, but is not limited to, data to be read or data to be written, etc. Accordingly, the attribute information of the data to be operated may specifically refer to attribute information of the data to be written, or attribute information of the data to be read, and the like. The attribute information about the data to be operated is specifically set forth below.
In step S206, the object storage device obtains a data operation request, where the data operation request includes attribute information of data to be operated, where the attribute information is used to describe an attribute of the data to be operated.
Step S208, the object storage device processes the data to be operated according to the attribute information of the data to be operated.
In the present application, the data operation request may specifically be a data read IO request or a data write IO request. When the data operation request is a data read IO request, the data to be operated may specifically be data to be read. When the data operation request is a data write IO request, the data to be operated may specifically be data to be written.
The data operation request includes a flag field set by a user or a system in a user-defined manner, where the flag field is used to indicate attribute information of the data to be operated, such as any one or more of a size of the data to be operated, a granularity of data IO, a logical address of the data to be operated, read information (such as a read frequency or a read type) related to the logical address, and write information (such as a write frequency or a write type), which will be described in detail below.
Specifically, a schematic diagram of the format of a data operation request is shown in fig. 3. The data operation request includes a data field and a data integrity field (data integrity field, DIF). The data field is used to carry the data to be transmitted, which may specifically be the data to be operated (data to be read or data to be written). The DIF field is used to detect the integrity of the data to be transmitted; as shown in the figure, the DIF field includes a check area, a version area, an application software area, and an address area. The sizes of the data field and the DIF field may be set by a user or the system in a user-defined manner, for example using layouts such as 512+8 bytes, 4096+8 bytes, or 4096+64 bytes, which is not limited by the present application. Wherein,
the check region is used to carry check data, such as cyclic redundancy check (cyclic redundancy check, CRC) data, and the like. The size of the bytes occupied by the check area may be user or system custom set, e.g., 2 bytes.
The version area is used to indicate the version number of the DIF, and the size occupied by the version area may be set for user or system customization, for example, 1 byte.
The address area is used for indicating a logical address LBA where the data to be operated corresponds. The size of its occupation may be specifically set for user or system customization, for example 4 bytes.
The application software area is composed of a reserved field and an LBA valid indication field; the size it occupies may be set by a user or the system in a custom manner, and 1 byte is taken as an example in the figure. The LBA valid indication field is used to indicate whether the LBA logical address carried in the address area is valid. Specifically, when the LBA valid indication field is a first preset character (e.g., 1), the LBA logical address carried in the address area is valid; when the LBA valid indication field is a second preset character (e.g., 0), the LBA logical address carried in the address area is invalid. The size occupied by the LBA valid indication field may be set by a user or the system, for example, 1 bit.
The reserved field is a field to be defined, and the occupied size of the reserved field can be set for user or system customization, such as 7 bits. The application redefines the reserved field, namely, the attribute information of the data to be operated is carried in the reserved field. Specifically, the reserved field includes, but is not limited to, any one or more of a data IO granularity field, a data write field, and a data read field, where the size and the position occupied by each of the reserved fields may be defined according to actual requirements, and the present application is not limited thereto. Wherein,
The meaning of the data IO granularity field differs across application scenarios. Specifically, in a data-writing scenario (i.e., the data operation request is a data write IO request), the data IO granularity field may specifically be a write granularity field, used to indicate the minimum unit size adopted when the data to be operated is stored. In other words, the data to be stored is divided at this minimum unit granularity when stored. The size and position occupied by the write granularity field may be set by a user or the system. Illustratively, if the write granularity field occupies 3 bits, the eight write granularities it can represent are: 512 bytes, 1 KByte, 4 KByte, 8 KByte, 16 KByte, 32 KByte, 64 KByte, and 128 KByte.
In a data-reading scenario (i.e., the data operation request is a data read IO request), the data IO granularity field may specifically be a read granularity field, used to indicate the size of the pre-read data, where the pre-read data includes the data to be operated, i.e., data that is about to be read. For example, assume the upper-layer application software needs to read 1M of data (i.e., the size of the pre-read data is 1M), while each data read IO request it issues can request at most 1K of data. To complete the reading of the 1M of data, the application software would then need to issue 1000 data read IO requests. With the read granularity field indicating the size of the pre-read data, however, the pre-read data can be fetched in advance with a single data read IO request, saving time and improving the efficiency of data reading.
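A sketch of decoding the 3-bit granularity field into a byte count. The table assumes the eight values listed above in ascending order (with the last entry read as 128 KByte); the actual encoding order is user- or system-defined.

```c
#include <stdint.h>

static const uint32_t granularity_bytes[8] = {
    512,   1024,  4096,  8192,    /* 512 B,  1 KB,  4 KB,  8 KB  */
    16384, 32768, 65536, 131072   /* 16 KB, 32 KB, 64 KB, 128 KB */
};

/* Decode the 3-bit data IO granularity field of the DIF reserved area. */
uint32_t decode_granularity(uint8_t field3) {
    return granularity_bytes[field3 & 0x7u];
}
```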
The data write field is used to indicate write information of the address to be written, where the address to be written is the address at which the data to be operated is written, and may specifically be an LBA logical address. The write information includes, but is not limited to, a write type or a write frequency. In other words, the data write field indicates the write type or write frequency of the address to be written, i.e., it reflects how often data is written to the address to be written in a unit time.
Optionally, the device (in particular, the client or the object storage device) in the present application may further obtain a write type of the address to be written according to the write frequency of the address to be written. For example, when the writing frequency of the address to be written is within the first threshold range, the corresponding writing type may be considered or determined to be the first writing type, such as being quickly overwritten or being frequently overwritten. When the write frequency of the address to be written is within the second threshold range, the corresponding write type may be considered or determined to be the second write type, e.g., rarely overwritten, etc. The upper limit value and the lower limit value of each of the first threshold range and the second threshold range may be specifically set by a user or a system in a user-defined manner, and the lower limit value of the first threshold range is greater than or equal to the upper limit value of the second threshold range.
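Deriving the write type from the write frequency can be sketched as a threshold check; the bound below is a placeholder, since the application leaves both threshold ranges user-defined.

```c
typedef enum {
    WRITE_TYPE_FREQUENTLY_OVERWRITTEN,  /* first write type  */
    WRITE_TYPE_RARELY_OVERWRITTEN       /* second write type */
} write_type_t;

/* Classify an address by how often data writes occur at it per unit time.
 * first_range_lower >= the upper bound of the second range, per the text. */
write_type_t classify_write(double writes_per_unit_time) {
    const double first_range_lower = 10.0;  /* placeholder bound */
    return (writes_per_unit_time >= first_range_lower)
         ? WRITE_TYPE_FREQUENTLY_OVERWRITTEN
         : WRITE_TYPE_RARELY_OVERWRITTEN;
}
```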
The size occupied by the data write field may be set by a user or the system according to actual requirements, for example, 2 bits. When data is written multiple times to the same address to be written, the data is overwritten, i.e., data written later overwrites data written earlier. Therefore, the write frequency or write type of the address to be written also indicates the frequency or type of data overwrites occurring at that address.
And the data reading field is used for indicating the reading information of the address to be read. The address to be read is an address when the data to be operated is read, and may specifically be an LBA logical address. The read information includes, but is not limited to, a read type or a read frequency of an address to be read. Specifically, the data reading field may be used to reflect a reading frequency of data reading of the address to be read in a unit time.
Optionally, the device (specifically, the client or the object storage device) of the present application may further obtain a read type of the address to be read according to the read frequency of the address to be read. For example, when the reading frequency of the address to be read is within the third threshold range, the corresponding reading type may be considered or determined to be the first reading type, such as regular reading. When the reading frequency of the address to be read is in the fourth threshold range, the corresponding reading type can be considered or determined to be the second reading type, for example, the address to be read is about to be read for a plurality of times or sequentially read. When the read frequency of the address to be read is within the fifth threshold range, the corresponding read type may be considered or determined to be the third type, such as rarely read or read only once. The third threshold range, the fourth threshold range and the fifth threshold range may be specifically set by a user or a system in a custom manner. Wherein the lower limit value of the third threshold range is greater than or equal to the upper limit value of the fourth threshold range, and the lower limit value of the fourth threshold range is greater than or equal to the upper limit value of the fifth threshold range.
The size occupied by the data read field may be specifically set by a user or system, for example, 2 bits, etc.
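Putting the field descriptions together, the 8-byte DIF of fig. 3 can be sketched as a C struct. The application fixes only the field sizes (2-byte check area, 1-byte version area, 1-byte application software area, 4-byte address area, and a 3+2+2-bit split of the redefined reserved bits); the field order and bit positions below are assumptions.

```c
#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint16_t crc;      /* check area: e.g. CRC over the data field        */
    uint8_t  version;  /* version area: version number of the DIF         */
    uint8_t  app;      /* application software area: see the masks below  */
    uint32_t lba;      /* address area: LBA of the data to be operated    */
} dif_t;               /* 8 bytes, matching e.g. the 512+8 layout         */
#pragma pack(pop)

/* Assumed bit layout of the application software area (1 byte):
 * 7 redefined reserved bits plus the 1-bit LBA valid indication field. */
#define DIF_APP_GRANULARITY 0x70u  /* 3 bits: data IO granularity field */
#define DIF_APP_WRITE_INFO  0x0Cu  /* 2 bits: data write field          */
#define DIF_APP_READ_INFO   0x03u  /* 2 bits: data read field           */
#define DIF_APP_LBA_VALID   0x80u  /* 1 bit:  address area LBA is valid */
```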
The following describes embodiments of the data processing method of the present application in conjunction with two application scenarios to which the application is applicable. First, for the data-writing scenario, fig. 4 is a schematic flow chart of another data processing method according to an embodiment of the present application. The method shown in fig. 4 includes the following implementation steps:
step S402, the application software acquires attribute information of the data to be written.
In step S404, the application software sends a data write IO request to the client, where the data write IO request includes a data integrity field DIF, where the DIF is used to carry attribute information of the data to be written. Accordingly, the client receives the data write IO request.
In the application, the application software (specifically, the host running the application software) can issue a data write IO request to the client according to actual requirements, where the data write IO request includes attribute information of the data to be written. Specifically, the attribute information of the data to be written is carried in the DIF field of the data write IO request and may include, but is not limited to, any one or more of the following: the address to be written (specifically, an LBA logical address) corresponding to the data to be written, the size of the data to be written, the write information of the address to be written, and the data IO granularity. Accordingly, the client receives the data write IO request.
In step S406, the client configures corresponding stripe information for the data to be written according to the attribute information of the data to be written, where the stripe information includes at least one stripe unit used when the data to be written is stored and a physical address of each stripe unit in the at least one stripe unit.
Step S408, the client sends the at least one stripe unit and the physical address of each stripe unit to the corresponding object storage devices according to the stripe information, where each stripe unit is used to store a portion of the data to be written. Correspondingly, an object storage device receives a stripe unit and the physical address of the stripe unit, where the stripe unit carries data to be stored, and the data to be stored is a portion of the data to be written. Optionally, the stripe unit itself may also carry the physical address of the stripe unit.
Step S410, the object storage device processes the data to be stored in the data to be written according to the attribute information of the data to be written.
In step S406, after receiving the data write IO request, the client may determine, according to the attribute information of the data to be written in the data write IO request, whether to configure a corresponding data stripe for storing the data to be written. Specifically, the attribute information of the data to be written may include the writing information of the address to be written corresponding to the data to be written. The writing information may specifically be a writing frequency or a writing type, and indicates the frequency, or frequency range, at which data writes occur to the address to be written in a unit time. When the writing information is first writing information, indicating that the frequency of data writes to the address to be written in a unit time is within the first threshold range, or that the writing type indicated by the first writing information is soon overwritten or frequently overwritten, the client may give up configuring corresponding stripe information for the data to be written. Conversely, when the writing information is second writing information, indicating that the frequency of data writes to the address to be written in a unit time is within the second threshold range, or that the writing type indicated by the second writing information is rarely overwritten, the client may configure corresponding stripe information for the data to be written, as described in detail below. The lower limit of the first threshold range is greater than or equal to the upper limit of the second threshold range.
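A minimal sketch of this client-side decision follows, assuming an illustrative threshold value and a numeric write frequency carried in the DIF; neither is fixed by the application.

```python
# Assumed boundary between the first and second threshold ranges (writes per
# unit time). The lower limit of the first range is >= the upper limit of the
# second range, as stated above.
FIRST_RANGE_LOWER = 50

def should_configure_stripe(write_frequency: float) -> bool:
    """Step S406 decision: skip striping for addresses that will soon or
    frequently be overwritten (first writing information); configure stripe
    information otherwise (second writing information)."""
    return write_frequency < FIRST_RANGE_LOWER
```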
Optionally, after receiving the data write IO request, the client may instead directly configure corresponding stripe information for the data to be written according to the attribute information of the data to be written in the data write IO request, and issue the stripe information to the corresponding object storage devices. The object storage device side then decides, again according to the attribute information of the data to be written, whether to store the data at the physical address of the corresponding stripe unit in the stripe information; this application is not limited in this respect.
The client configures corresponding stripe information for the data to be written according to the attribute information of the data to be written in any of the following three implementations:
first, when the attribute information of the data to be written includes the writing granularity of the data to be written, the client may select, according to the writing granularity, stripe units whose size matches the writing granularity for storing the data to be written. Specifically, to reduce the amount of data storage, stripe units whose stripe size equals the writing granularity are typically selected to form a stripe for storing the data to be written. After each stripe unit in the stripe is obtained, the physical address corresponding to each stripe unit can be obtained; the stripe units are in one-to-one correspondence with their physical addresses.
For example, assume that the size of the data write IO request is 32K and the writing granularity is 8K. Accordingly, when configuring a stripe for the data to be written, the client may select 4 stripe units of 8K size and combine them to form the stripe storing the data to be written. In other words, the stripe here comprises 4 stripe units of 8K size. Accordingly, during data storage, the 32K of data to be written can be divided into 4 pieces of data to be stored, each 8K in size and stored in one stripe unit. These 4 stripe units may come from 4 different object storage devices, which is not further discussed here.
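The split in this example can be sketched as follows; the helper name is hypothetical and only illustrates dividing the data to be written into granularity-sized pieces, one per stripe unit.

```python
def split_into_stripe_units(data: bytes, write_granularity: int) -> list[bytes]:
    """Divide the data to be written into pieces of data to be stored,
    one piece per stripe unit of the configured stripe."""
    return [data[off:off + write_granularity]
            for off in range(0, len(data), write_granularity)]

# The 32K / 8K example above yields 4 stripe units.
pieces = split_into_stripe_units(bytes(32 * 1024), 8 * 1024)
assert len(pieces) == 4 and all(len(p) == 8 * 1024 for p in pieces)
```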
Second, when the attribute information of the data to be written includes the writing information of the address to be written corresponding to the data to be written, the writing information may specifically be a writing frequency or a writing type. Correspondingly, the client may allocate corresponding stripe information for the data to be written according to the writing information so as to store the data to be written. The stripe information includes one or more stripe units, each used to store a portion of the data to be written (i.e., data to be stored), and the physical address of each stripe unit.
Specifically, when the writing information is a writing frequency, the client may write or store data to be written whose frequencies fall within the same frequency range into the same stripe or stripe unit. The frequency ranges may be custom-set by a user or the system according to actual requirements, for example 10-15 times per minute. When the writing information is a writing type, data to be written of the same writing type may be stored into the same stripe or stripe unit, so as to realize classified storage of the data.
Third, when the attribute information of the data to be written includes both the writing granularity of the data to be written and the writing information of the address to be written corresponding to the data to be written, the client may configure corresponding stripe information for the data to be written according to the writing granularity and the writing information of the address to be written, where the stripe information includes one or more stripe units for storing the data to be written and the physical address of each stripe unit. Each stripe unit is used to store a portion of the data to be written (i.e., data to be stored). Illustratively, the client selects stripe units whose stripe size equals the writing granularity and which support carrying or storing data matching the writing information of the address to be written. In other words, the client comprehensively considers the writing granularity of the data to be written and the writing information (writing frequency or writing type) of the address to be written when configuring the corresponding stripe that stores the data to be written.
In an alternative embodiment, after receiving the data write IO request, the client may further calculate a corresponding EC check code for the data to be written, for example by using a preset algorithm. Correspondingly, the client may send the calculated EC check code to the object storage device together with the data to be written, so that the object storage device stores the data to be written and the EC check code together.
Optionally, when data is stored in stripes, for an EC check code computed over a stripe that is not yet full, the writing type of the address to be written corresponding to the stripe may be marked as a preset type, for example to-be-overwritten. This makes it convenient for the system background to recalculate a new EC check code once subsequent data to be written fills the stripe, overwriting the originally calculated EC check code with one computed over the full stripe, thereby improving the utilization of disk storage space.
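The application leaves the EC algorithm unspecified ("a preset algorithm"), so the sketch below uses plain XOR parity as a stand-in check code over the stripe units; a real system would more likely use a Reed-Solomon erasure code.

```python
def ec_check_block(stripe_units: list[bytes]) -> bytes:
    """Stand-in EC check code: XOR parity over equal-sized stripe units.
    Recomputing this once a partially filled stripe becomes full replaces
    the originally calculated check code, as described above."""
    parity = bytearray(len(stripe_units[0]))
    for unit in stripe_units:
        assert len(unit) == len(parity), "equal-sized stripe units expected"
        for i, byte in enumerate(unit):
            parity[i] ^= byte
    return bytes(parity)
```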
After configuring the stripe information for the data to be written, the client distributes the one or more stripe units in the stripe information to the corresponding object storage devices, where each stripe unit may carry its physical address and the data to be stored, the data to be stored being a portion of the data to be written. For the mapping relationship between stripe units and object storage devices, reference may be made to the related description in the foregoing embodiments, which is not repeated here. In other words, after the client obtains a stripe unit for storing data to be stored, it may obtain the physical address of the stripe unit and, further, the object storage device, or the hard disk (magnetic disk) of the object storage device, to which the stripe unit corresponds; this is not limited or described in detail here.
Accordingly, the object storage device may then write the data to be stored to the hard disk of the object storage device according to the received stripe unit and the storage address of the stripe unit.
In step S410, after receiving the data write IO request sent by the client and the stripe unit carrying the data to be stored and the physical address, the object storage device may store the data to be stored according to the attribute information of the data to be written in the data write IO request.
Specifically, the attribute information of the data to be written includes the writing information (for example, a writing frequency or a writing type) of the address to be written. When the writing information is first writing information, it indicates the frequency of data writes to the address to be written; specifically, it may indicate that the writing frequency of the address to be written is within the first threshold range, or that the writing type of the address to be written is the first type (for example, frequently overwritten or soon overwritten). The object storage device may accordingly write or store the data to be written into the first nonvolatile cache (typically set up inside the object storage device). Optionally, after a period of time has elapsed, the object storage device may write the data stored in the first nonvolatile cache into a nonvolatile hard disk of the object storage device, so that the storage medium for the data to be written is arranged reasonably based on the characteristics of the data, improving the efficiency of data reading and writing. Because the writing frequency of the address to be written is high, the data to be written is held in the cache for a period of time before being written to the hard disk. If the object storage device receives other new write IO requests for the same address to be written during this period, the old data held in the cache is overwritten by the data carried in the new write IO requests, and only the data of the latest write IO request for that address is written to the hard disk when the period ends. This avoids the inefficiency of writing the data carried by every write IO request for the address to be written to the hard disk.
When the writing information is second writing information, it indicates that the writing frequency of the address to be written is within the second threshold range, or that the writing type of the address to be written is the second type (for example, rarely overwritten). Accordingly, the object storage device may directly write or store the data to be written into the nonvolatile hard disk of the object storage device. Specifically, the object storage device writes or stores the data to be stored from the data to be written into its nonvolatile hard disk according to the physical address of the stripe unit.
The first threshold range and the second threshold range are preset by a user or a system, and the lower limit value of the first threshold range is larger than or equal to the upper limit value of the second threshold range. The read-write rate of the nonvolatile cache is greater than the read-write rate of the nonvolatile hard disk.
In other words, the object storage device may store or write data to be written into different nonvolatile memories according to different writing frequencies or writing types. For example, if the writing type of the address to be written corresponding to the data to be written is soon overwritten, the object storage device may write the data to be written into the nonvolatile cache and not into the disk or hard disk. Optionally, the object storage device may also store data to be written whose writing types are frequently overwritten and rarely overwritten separately, to reduce data access interference.
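A minimal sketch of this routing on the object storage device side follows; the class shape, the string tags for the writing information, and the dict-backed media are assumptions made for illustration.

```python
class ObjectStorageDevice:
    """Sketch of step S410: route data to be written by its writing information."""

    def __init__(self) -> None:
        self.nv_cache: dict[str, bytes] = {}    # first nonvolatile cache
        self.hard_disk: dict[str, bytes] = {}   # nonvolatile hard disk

    def handle_write(self, phys_addr: str, data: bytes, writing_info: str) -> None:
        if writing_info == "first":
            # Soon/frequently overwritten: hold in the cache so later writes
            # to the same address overwrite the cached copy only.
            self.nv_cache[phys_addr] = data
        else:
            # Rarely overwritten (second writing information): write through
            # to the hard disk at the stripe unit's physical address.
            self.hard_disk[phys_addr] = data

    def flush_after_period(self) -> None:
        """After the dwell period, persist only the newest cached version."""
        self.hard_disk.update(self.nv_cache)
        self.nv_cache.clear()
```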
In an alternative embodiment, the attribute information of the data to be written includes the writing granularity of the data to be written. After the object storage device receives the data write IO request and the stripe unit carrying the data to be stored, it can construct a storage mapping relationship for the data to be stored according to the writing granularity of the data to be written. The storage mapping relationship comprises the mapping between the logical address and the physical address at which the data to be stored is stored, where the logical address is determined according to the address to be written and the writing granularity.
Specifically, the data write IO request received by the client carries the address to be written (typically an LBA logical address) of the data to be written. Accordingly, after the client configures the stripe information (one or more stripe units) for the data to be written, the address to be written may be divided according to the writing granularity of the data to be written (or the number of stripe units) to obtain the logical address corresponding to each stripe unit. Further, the logical address, the physical address of each stripe unit, and the data to be stored by that stripe unit are sent to the corresponding object storage device. Accordingly, after the object storage device receives this information, it can create a corresponding storage mapping relationship for the data to be stored, i.e., the storage mapping relationship of the data to be stored, which includes the mapping between the logical address and the physical address at which the data to be stored (the stripe unit) is stored.
Alternatively, after the client sends the stripe unit and the data write IO request to the object storage device, the object storage device can derive the logical address corresponding to the stripe unit, i.e., the logical address at which the data to be stored in the stripe unit is stored, from the address to be written in the data write IO request and the writing granularity of the data to be written. This application does not limit how the logical addresses of stripe units are obtained. Illustratively, assume the client configures 4 stripe units, stripe unit 1 to stripe unit 4, for the data to be written, located in object storage device 1 to object storage device 4, respectively. When object storage device 4 receives stripe unit 4 sent by the client, the stripe unit carries its own physical address and the data to be stored. Object storage device 4 then determines the logical address of stripe unit 4, here 30-40, in combination with the address to be written (assumed to be 0-40) in the received data write IO request. Finally, it creates a storage mapping relationship for the data to be stored, here the mapping between the logical address and the physical address of the data to be stored in stripe unit 4.
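The address arithmetic of this example can be sketched as below, assuming the address range divides evenly across the stripe units; the helper name and the 1-based unit index are illustrative.

```python
def stripe_unit_logical_range(write_start: int, write_end: int,
                              num_units: int, unit_index: int) -> tuple[int, int]:
    """Derive the logical address range of one stripe unit from the address
    to be written, assuming the range splits evenly over num_units units.
    unit_index counts from 1."""
    span = (write_end - write_start) // num_units
    lower = write_start + (unit_index - 1) * span
    return lower, lower + span

# The example above: address to be written 0-40, 4 stripe units,
# stripe unit 4 covers logical addresses 30-40.
assert stripe_unit_logical_range(0, 40, 4, 4) == (30, 40)
```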
In an alternative embodiment, after configuring the stripe information for the data to be written, the client may itself create a corresponding storage mapping relationship for the data to be written according to the writing granularity, i.e., the storage mapping relationship of the data to be written. The storage mapping relationship comprises the mapping between the logical address (i.e., the address to be written) and the physical address at which the data to be written is stored.
Specifically, the data write IO request includes the LBA logical address of the data to be written, but this logical address is not associated with the physical address of the stripe unit where the data to be written is actually stored. Accordingly, after the data to be written is stored, the system would be unable to determine from the logical address the physical address at which the data is actually stored, and thus unable to look the data up. Therefore, the storage mapping relationship of the data to be written also needs to be established and stored: it is divided or customized according to the writing granularity, and is obtained by associating and binding the logical address and the physical address at which the data to be written is stored.
For example, assume that the size of the data write IO request (specifically, the size of the data to be written in it) is 32K bytes, the LBA logical address of the data to be written in the request is 0-200, and the writing granularity of the data to be written is 8K. The stripe configured by the client for the data to be written comprises 4 stripe units, whose physical addresses are 0X01-0X04, respectively. Accordingly, the object storage device creates a storage mapping relationship table as shown in Table 1 below, reflecting the storage mapping relationship of the data to be written.
TABLE 1

Stripe unit      Logical address (LBA)    Physical address
Stripe unit 1    0-50                     0X01
Stripe unit 2    50-100                   0X02
Stripe unit 3    100-150                  0X03
Stripe unit 4    150-200                  0X04
As can be seen from Table 1 above, the stripe storing the data to be written consists of 4 stripe units, each 8K in size, and each stripe unit has its own logical address and physical address. Correspondingly, after receiving a data read IO request, the object storage device can look up the physical address corresponding to the logical address carried in the request (e.g., the physical address 0X01 of stripe unit 1) in the storage mapping table recorded in Table 1, and then read the data of stripe unit 1 from position 0X01 as the data to be read.
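The lookup can be sketched as follows, using the Table 1 values reconstructed above; the range-keyed dict and the function name are illustrative assumptions.

```python
# Storage mapping relationship of Table 1: logical address range -> physical address.
storage_mapping = {
    (0, 50): "0X01",     # stripe unit 1
    (50, 100): "0X02",   # stripe unit 2
    (100, 150): "0X03",  # stripe unit 3
    (150, 200): "0X04",  # stripe unit 4
}

def resolve_physical(logical_addr: int) -> str:
    """Find the physical address of the stripe unit holding a logical address."""
    for (lower, upper), phys in storage_mapping.items():
        if lower <= logical_addr < upper:
            return phys
    raise KeyError(f"logical address {logical_addr} is not mapped")

assert resolve_physical(10) == "0X01"   # a read at LBA 10 hits stripe unit 1
```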
In an alternative embodiment, the client may further send the storage mapping relationship of the data to be written to each object storage device, so that the object storage devices learn the mapping between the logical addresses and the physical addresses of the stripe units storing the data. After an object storage device receives the logical address of data to be read carried in a data read IO request, it obtains the data to be read from the corresponding physical address according to that logical address.
By implementing this embodiment of the invention, the fields of the DIF domain are redefined so as to optimize the storage layout of data; compared with the prior art, this can improve the storage performance and the utilization rate of the storage.
Next, in an application scenario of reading data, please refer to fig. 5, which is a schematic flow chart of another data processing method according to an embodiment of the present application. The method shown in fig. 5 includes the following implementation steps:
in step S502, the application software (specifically, the host running the application software) obtains attribute information of the data to be read.
In step S504, the application software sends a data read IO request to the client, where the data read IO request includes a DIF field used to carry the attribute information of the data to be read. Accordingly, the client receives the data read IO request. For the data read IO request, reference may be made to the related description of the data write IO request above, which is not repeated here.
Step S506, the client sends the data read IO request to the object storage device. Accordingly, the object storage device receives the data read IO request.
In the present application, after the client receives the data read IO request, if the attribute information of the data to be read in the request includes the size of the pre-read data and that size is greater than the first threshold, the client may issue the data read IO request to the object storage device in advance, so that the pre-read data can be read quickly. There are various ways of issuing in advance, and this application is not limited in this respect. For example, the client may allocate more transmission resources to the data read IO request to increase its sending rate, thereby issuing it to the object storage device ahead of time. When the size of the pre-read data is less than or equal to the first threshold, the data read IO request is issued to the object storage device at the normal issuing rate.
The pre-read data includes the data to be read and can be understood as the total amount of data the application software will read over a period of time. Understandably, limited by the performance of the application software itself, there is an upper bound on the size of each data read IO request it sends. When the pre-read data is large, the application software completes the reading of the pre-read data by issuing data read IO requests multiple times. For example, if the size of the pre-read data is 1M and the size of each data read IO request is 1K (i.e., the size of the data to be read in each request is 1K), the application software completes the reading of the pre-read data through 1000 data read IO requests.
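A minimal client-side sketch of this expedited issuing follows; the queue mechanics, the dict-shaped request, and the threshold value are assumptions, since the application only requires that large pre-reads be issued ahead of the normal rate.

```python
from collections import deque

FIRST_THRESHOLD = 512 * 1024   # bytes; illustrative value for the first threshold

class Client:
    """Sketch of step S506: issue reads with large pre-read sizes in advance."""

    def __init__(self) -> None:
        self.send_queue: deque[dict] = deque()

    def submit_read(self, io_request: dict) -> None:
        if io_request["pre_read_size"] > FIRST_THRESHOLD:
            # Issue in advance: jump the queue toward the object storage device.
            self.send_queue.appendleft(io_request)
        else:
            # Normal issuing rate.
            self.send_queue.append(io_request)
```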
Step S508, the object storage device reads the data to be read and processes it according to the attribute information of the data to be read.
Step S508 may specifically be implemented in the following ways. First, the attribute information of the data to be read includes the size of the pre-read data, which includes the data to be read. If the size of the pre-read data is greater than the first threshold, the object storage device may first read the pre-read data and then send it to the client to be cached on the client side. When the application software subsequently issues a data read IO request for the same data, the corresponding data to be read is read directly from the pre-read data at the client, saving data reading time.
Specifically, the data read IO request includes the LBA logical address of the pre-read data. The object storage device may obtain, from its stored storage mapping relationship and according to the logical address of the pre-read data, the corresponding physical address, and then read the pre-read data from that physical address. Further, the object storage device may send the pre-read data to the client. Accordingly, the client may write the pre-read data into its memory, or the client may store the pre-read data according to the attribute information of the data to be read in the data read IO request, as described in detail below.
If the size of the pre-read data is less than or equal to the first threshold, the object storage device may read the data to be read and then store it according to the attribute information of the data to be read. Optionally, after the object storage device reads the data to be read, it may send the data to other object storage devices so that they can store it correspondingly according to the attribute information of the data to be read. In this case, the data to be read need not be stored in the memory of the client.
Specifically, when the size of the pre-read data is less than or equal to the first threshold, the data read IO request includes the LBA logical address of the data to be read. Correspondingly, the object storage device obtains, from its stored storage mapping relationship and according to the logical address of the data to be read, the corresponding physical address, then reads the data to be read from that physical address, and further sends the data to be read to the upper-layer application software (i.e., the host where the application software is deployed).
Second, when the attribute information of the data to be read includes the read information of the address to be read corresponding to the data to be read, the object storage device may process the data to be read according to that attribute information.
Specifically, the attribute information of the data to be read includes the read information of the address to be read corresponding to the data to be read, where the read information includes, but is not limited to, a read type or a read frequency. When the read information indicates the frequency of data reads of the address to be read in a unit time and that frequency is greater than or equal to the second threshold, the object storage device may send the data to be read to the client to be cached in the client's memory. This makes it convenient to read the data directly from the client's memory next time, improving the data reading rate. Accordingly, when the frequency is less than the second threshold, the object storage device may store the data to be read in a hard disk of the object storage device. Optionally, the object storage device may also send the data to be read to the client for storage in a hard disk of the client; this application is not limited in this respect.
Optionally, when the read information is first read information, it indicates that the read frequency of the address to be read is within the third threshold range, or that the read type of the address to be read is the third type (e.g., frequently read). Correspondingly, the object storage device may write the data to be read into the second nonvolatile cache of the object storage device, so that next time the data can be read directly from the second nonvolatile cache.
When the read information is second read information, it indicates that the read frequency of the address to be read corresponding to the data to be read is within the fourth threshold range, or that the read type of the address to be read is the fourth type (for example, about to be read multiple times). The object storage device may accordingly store the data to be read in a third nonvolatile cache of the object storage device. Optionally, the object storage device may further set a corresponding expiration time for the third nonvolatile cache, so that data whose storage time exceeds the expiration time is cleared from it. The expiration time may be custom-set by a user or the system, for example, 1 day.
When the read information is third read information, it indicates that the read frequency of the address to be read corresponding to the data to be read is within the fifth threshold range, or that the read type of the address to be read is the fifth type (such as rarely read or read only once). The object storage device may write or store the data to be read into its hard disk, and/or clear the data stored at a preset storage address. Specifically, the object storage device may determine, according to the data read IO request, the preset storage address (or nonvolatile memory) where the indicated data to be cleared is located, which may be an LBA logical address. Correspondingly, the object storage device may clear the data stored at the preset storage address so as to release the corresponding space in the nonvolatile memory, or directly release all data stored in the nonvolatile memory.
The third, fourth, and fifth threshold ranges may be custom-set by a user or the system according to actual requirements; the lower limit of the third threshold range is greater than or equal to the upper limit of the fourth threshold range, and the lower limit of the fourth threshold range is greater than or equal to the upper limit of the fifth threshold range. The read-write rate of the nonvolatile caches is greater than that of the nonvolatile hard disk. The first nonvolatile cache and the second nonvolatile cache in this application may be the same cache deployed in the object storage device or different caches, which may be determined according to actual requirements and is not limited here.
In other words, the object storage device may store or write data to be read whose addresses have different read frequencies or read types into different nonvolatile memories. For example, when the read type of the address to be read corresponding to the data to be read is frequently read, the object storage device may write the data to be read into the cache. When the read type is about to be read multiple times (sequential read), the object storage device may write the data to be read into the same memory (such as a memory, a cache, or a magnetic disk) to improve the performance and efficiency of data reading; optionally, an expiration time may also be set in that memory to limit how long the data is stored there. When the read type of the data to be read is read only once, the object storage device may skip caching and write the data to be read directly to the disk. When the read type of the address to be read indicates releasing the cache, the object storage device may obtain the preset storage address (such as a logical address) of the data to be cleared from the data read IO request, and clear the data stored at that address so as to release the cache.
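The read-side placement just described can be sketched as follows; the string tags, the dict-backed media, and the expiration period are assumptions, standing in for the second and third nonvolatile caches and the hard disk.

```python
import time

EXPIRATION_SECONDS = 24 * 3600   # e.g. 1 day; user/system configurable

class ReadPlacement:
    """Sketch of step S508: place data to be read by its read information."""

    def __init__(self) -> None:
        self.cache = {}       # address -> (data, expiration deadline or None)
        self.hard_disk = {}   # address -> data

    def place(self, addr: str, data: bytes, read_info: str) -> None:
        if read_info == "first":            # frequently read
            self.cache[addr] = (data, None)
        elif read_info == "second":         # about to be read multiple times
            self.cache[addr] = (data, time.time() + EXPIRATION_SECONDS)
        else:                               # rarely read / read only once
            self.hard_disk[addr] = data     # bypass the cache

    def clear_expired(self) -> None:
        """Clear data whose storage time exceeds the expiration time."""
        now = time.time()
        expired = [a for a, (_, dl) in self.cache.items()
                   if dl is not None and dl < now]
        for addr in expired:
            del self.cache[addr]

    def release(self, addr: str) -> None:
        """Read type indicating cache release: clear the preset storage address."""
        self.cache.pop(addr, None)
```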
By implementing this embodiment of the invention, the fields of the DIF domain are redefined so as to optimize the storage layout of data; compared with the prior art, this can improve the storage performance of the storage system and its working efficiency.
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. The apparatus 600 shown in fig. 6 includes a communication module 602 and a processing module 604.
In one possible implementation, the data processing apparatus 600 is a host. Specifically, the processing module 604 may be configured to control and manage the actions of the data processing apparatus 600. Illustratively, the processing module 604 is configured to support the host in performing step S202 in fig. 2, step S402 in fig. 4, step S502 in fig. 5, and/or other steps of the techniques described herein. The communication module 602 is configured to support communication of the data processing apparatus 600 with other devices; for example, it supports the host in performing step S204 in fig. 2, step S404 in fig. 4, step S504 in fig. 5, and/or other steps of the techniques described herein.
In another possible implementation, the data processing apparatus 600 is an object storage device. Specifically, the processing module 604 may be configured to control and manage the actions of the data processing apparatus 600. Illustratively, the processing module 604 is configured to support the object storage device in performing steps S206 and S208 in fig. 2, step S410 in fig. 4, step S508 in fig. 5, and/or other steps of the techniques described herein. The communication module 602 is configured to support communication of the data processing apparatus 600 with other devices; for example, it receives data read IO requests or data write IO requests sent by a client and/or performs other steps of the techniques described herein.
In another possible implementation, the data processing apparatus 600 is a client. In particular, the processing module 604 may be configured to control and manage actions of the data processing apparatus 600. Illustratively, the processing module 604 is for supporting the client to perform step S406 of fig. 4, and/or for performing other steps of the techniques described herein. The communication module 602 is used to support communication of the data processing apparatus 600 with other devices, e.g., the communication module 602 is used to perform step S408 in fig. 4, step S506 in fig. 5, and/or to perform other steps of the techniques described herein.
Optionally, the data processing apparatus 600 may further comprise a storage module 606 for storing program code and data of the data processing apparatus 600.
The processing module 604 may be a processor or controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination performing computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 602 may be a communication interface, a transceiver circuit, or the like, where "communication interface" is a general term and may include one or more interfaces, such as an interface between the communication module and the processing module, or an interface between the object storage device and other devices (e.g., a host or a client). The storage module 606 may be a memory, or another service or module for providing a storage function.
When the processing module 604 is a processor, the communication module 602 is a communication interface, and the storage module 606 is a memory, the data processing apparatus according to the embodiment of the present invention may be the data processing device shown in fig. 7. The processing module 604, the communication module 602, and the storage module 606 may also be implemented in software.
Referring to fig. 7, the data processing device 700 includes one or more processors 701, a communication interface 702, and a memory 703. Optionally, the data processing device 700 may also include a bus 704, by which the processor 701, the communication interface 702, and the memory 703 may be interconnected. The bus 704 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like, and may be classified as an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 7, but this does not mean there is only one bus or only one type of bus. Wherein:
the processor 701 may consist of one or more general-purpose processors, such as central processing units (CPUs). The processor 701 may be configured to execute the relevant program code to implement processing functions; that is, execution of the program code by the processor 701 can implement the functions of the processing module. For the processing module, reference may be made to the description in the foregoing embodiments.
In a possible implementation, when the data processing device 700 is a host, the processor 701 of the host is configured to execute the relevant program code to implement the functions of the processing module of the present application, or to implement step S202 in fig. 2, step S402 in fig. 4, step S502 in fig. 5, and/or other steps of the techniques described herein; this application is not limited in this respect.
In another possible implementation, when the data processing device 700 is an object storage device, the processor 701 of the object storage device is configured to execute the relevant program code to implement the functions of the processing module of the present application, or to implement steps S206 and S208 in fig. 2, step S410 in fig. 4, step S508 in fig. 5, and/or other steps of the techniques described herein; this application is not limited in this respect.
In another possible implementation, when the data processing device 700 is a client, the processor 701 of the client is configured to execute the relevant program code to implement the functions of the processing module of the present application, or to implement step S406 in fig. 4 described above, and/or other steps of the techniques described herein; this application is not limited in this respect. The communication interface 702 may be a wired interface (e.g., an Ethernet interface) or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other modules/devices. For example, the communication interface 702 in this embodiment of the application may specifically be configured to obtain the attribute information of the data to be operated, such as the attribute information of the data to be read or the data to be written.
The memory 703 may include volatile memory, such as random access memory (RAM); it may also include nonvolatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 703 may also include a combination of the above types of memory. The memory 703 may be used to store a set of program code, so that the processor 701 can invoke the program code stored in the memory 703 to implement the functions of the communication module and/or the processing module in this embodiment of the application.
It should be noted that fig. 7 is only one possible implementation of the embodiment of the present application, and in practical applications, the data processing apparatus may further include more or fewer components, which is not limited herein. For details not shown or described in the embodiments of the present application, reference may be made to the related descriptions in any of the foregoing embodiments of fig. 2 to 5, which are not repeated here.
Embodiments of the present application also provide a computer non-transitory storage medium having instructions stored therein that, when executed on a processor, implement the method flow shown in any of the embodiments of fig. 2-5.
Embodiments of the present invention also provide a computer program product which, when run on a processor, implements the method flow shown in any of the embodiments of fig. 2-5.
The steps of a method or algorithm described in connection with this disclosure may be implemented in hardware or in software instructions executed by a processor. The software instructions may consist of corresponding software modules, which may be stored in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC, which in turn may reside in a computing device. The processor and the storage medium may also reside as discrete components in a computing device.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium; when executed, the program may include the flows of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (19)

1. A method of data processing, the method comprising:
the method comprises the steps that an object storage device receives a data read IO request, wherein the data read IO request comprises a data integrity field DIF, and the DIF is used for bearing attribute information of data to be read;
and processing the data to be read according to the attribute information of the data to be read.
2. The method according to claim 1, wherein the attribute information of the data to be read includes a read granularity of the data to be read, for indicating a size of pre-read data including the data to be read;
the processing the data to be read according to the attribute information of the data to be read includes:
when the size of the pre-read data is larger than or equal to a first threshold value, the pre-read data including the data to be read is read, and the pre-read data is sent to a client side to be cached in a memory of the client side; or,
And when the size of the pre-read data is smaller than a first threshold value, reading the data to be read, and sending the data to be read to a host.
3. The method according to claim 1 or 2, wherein the attribute information of the data to be read includes read information of an address to be read corresponding to the data to be read, the read information being used to indicate a frequency of data reading of the address to be read per unit time,
the processing the data to be read according to the attribute information of the data to be read includes:
when the read information is first read information, writing the data to be read into a cache of the object storage device; or,
when the read information is second read information, writing the data to be read into a cache of the object storage device, and setting an expiration time in the cache of the object storage device so as to clear data stored in the cache for a time longer than the expiration time;
wherein the frequency indicated by the first read information is greater than the frequency indicated by the second read information.
4. A method of data processing, the method comprising:
The method comprises the steps that an object storage device receives a data write IO request, wherein the data write IO request comprises a data integrity field DIF, and the DIF is used for bearing attribute information of data to be written;
and processing the data to be written according to the attribute information of the data to be written.
5. The method of claim 4, wherein the attribute information of the data to be written includes write information of an address to be written corresponding to the data to be written, the write information being used to indicate a frequency of data writing at the address to be written in a unit time,
the processing the data to be written according to the attribute information of the data to be written includes:
when the writing information is first writing information, the object storage device stores the data to be written into a cache of the object storage device; or,
when the writing information is second writing information, the object storage device stores the data to be written into a hard disk of the object storage device;
wherein the frequency indicated by the first writing information is greater than the frequency indicated by the second writing information.
6. The method according to claim 4 or 5, wherein the attribute information of the data to be written includes a write granularity of the data to be written, and the write granularity is used for indicating a storage granularity adopted when the data to be written is stored; the method further comprises the steps of:
The client receives the data write IO request, and configures corresponding stripe information for the data to be written according to the attribute information of the data to be written, wherein the stripe information comprises stripe units used when the data to be written is stored and physical addresses of the stripe units;
the client sends stripe units carrying data to be stored and the physical address to the object storage equipment according to the stripe information, wherein the data to be stored is the data in the data to be written;
the object storage equipment receives the stripe unit which is sent by the client and carries the data to be stored and the physical address;
the processing the data to be written according to the attribute information of the data to be written includes:
and the object storage device stores the data to be stored into a hard disk of the object storage device according to the physical address of the stripe unit.
7. The method of claim 6, wherein the method further comprises: and the object storage equipment creates a storage mapping relation of the data to be stored according to the writing granularity of the data to be written, wherein the storage mapping relation comprises a mapping relation between a logical address and the physical address when the data to be stored is stored, and the logical address is associated with the address to be written.
8. A method of data processing, the method comprising:
the method comprises the steps that a host acquires attribute information of data to be operated, and a data operation request is generated according to the attribute information of the data to be operated;
and sending the data operation request to a client, wherein the data operation request comprises a data integrity field DIF, and the DIF is used for bearing attribute information of the data to be operated.
9. The method according to claim 8, wherein when the data operation request is a data read IO request, the attribute information of the data to be operated includes read granularity of the data to be read and/or read information of an address to be read corresponding to the data to be read;
the reading granularity of the data to be read is used for indicating the size of the pre-read data including the data to be read;
the read information of the address to be read is used for indicating the frequency of data reading of the address to be read in unit time.
10. The method according to claim 8, wherein when the data operation request is a data write IO request, the attribute information of the data to be operated includes write granularity of the data to be written and/or write information of an address to be written corresponding to the data to be written;
The writing granularity of the data to be written is used for indicating the storage granularity adopted when the data to be written is stored;
the writing information of the address to be written is used for indicating the frequency of data writing of the address to be written in unit time.
11. A data processing device, which is characterized by comprising a communication module and a processing module; wherein,
the communication module is used for receiving a data read IO request, wherein the data read IO request comprises a data integrity field DIF, and the DIF is used for bearing attribute information of data to be read;
the processing module is used for processing the data to be read according to the attribute information of the data to be read.
12. A data processing device, which is characterized by comprising a communication module and a processing module; wherein,
the communication module is used for receiving a data writing IO request, wherein the data writing IO request comprises a data integrity field DIF, and the DIF is used for bearing attribute information of data to be written;
the processing module is used for processing the data to be written according to the attribute information of the data to be written.
13. A data processing device, characterized by comprising a communication unit and a processing unit; wherein,
The processing unit is used for acquiring attribute information of data to be operated;
the communication unit is configured to send a data operation request to a client, where the data operation request includes a data integrity field DIF, and the DIF is configured to carry attribute information of the data to be operated.
14. An object storage device, comprising a processor and a memory; the processor and the memory establish communication; wherein the memory is used for storing instructions; the processor is configured to invoke instructions in the memory to perform the method of any of the preceding claims 1-3.
15. An object storage device, comprising a processor and a memory; the processor and the memory establish communication; wherein the memory is used for storing instructions; the processor is configured to invoke instructions in the memory to perform the method of any of the preceding claims 4-7.
16. A host, which is characterized by comprising a processor and a memory; the processor and the memory establish communication; wherein the memory is used for storing instructions; the processor is configured to invoke instructions in the memory to perform the method of any of the preceding claims 8-10.
17. A computer non-transitory storage medium storing a computer program, which when executed by a computing device implements the method of any one of claims 1 to 3.
18. A computer non-transitory storage medium storing a computer program which, when executed by a computing device, implements the method of any of claims 4 to 7.
19. A computer non-transitory storage medium storing a computer program, which when executed by a computing device implements the method of any of claims 8 to 10.
CN201810940730.8A 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium Active CN110837479B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810940730.8A CN110837479B (en) 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium
PCT/CN2019/089582 WO2020034729A1 (en) 2018-08-17 2019-05-31 Data processing method, related device, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810940730.8A CN110837479B (en) 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN110837479A CN110837479A (en) 2020-02-25
CN110837479B true CN110837479B (en) 2023-09-01

Family

ID=69525077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810940730.8A Active CN110837479B (en) 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium

Country Status (2)

Country Link
CN (1) CN110837479B (en)
WO (1) WO2020034729A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538461B (en) * 2020-04-21 2023-04-07 招商局金融科技有限公司 Data reading and writing method and device based on solid state disk cache and storage medium
CN111796776A (en) * 2020-07-08 2020-10-20 深圳忆联信息系统有限公司 Storage method, device, equipment and medium based on user customization or demand analysis
CN111796876A (en) * 2020-07-08 2020-10-20 深圳忆联信息系统有限公司 Method, device, equipment and medium for defining requirements on storage equipment based on user
CN113297203A (en) * 2020-07-15 2021-08-24 阿里巴巴集团控股有限公司 Data query and write-in method and device, computer storage medium and electronic equipment
CN114115697B (en) * 2020-08-26 2024-03-22 浙江宇视科技有限公司 Cloud storage data processing method and device, electronic equipment and storage medium
CN116483737B (en) * 2023-04-18 2024-03-15 深圳市金玺智控技术有限公司 Data processing method and device, electronic equipment and storage medium

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101622596A (en) * 2006-12-06 2010-01-06 Fusion Multisystems, Inc. (dba Fusion-io) Apparatus, system, and method for managing data in a storage device with an empty data token directive
CN102307221A (en) * 2011-03-25 2012-01-04 G-Cloud Technology Co., Ltd. Cloud storage system and implementation method thereof
CN102405460A (en) * 2009-02-11 2012-04-04 Infinidat Ltd. Virtualized storage system and method of operating it
CN103197899A (en) * 2012-04-02 2013-07-10 Microsoft Corp. Life and performance enhancement of storage based on flash memory
CN103797770A (en) * 2012-12-31 2014-05-14 Huawei Technologies Co., Ltd. Method and system for sharing storage resources
CN105094691A (en) * 2014-05-21 2015-11-25 Huawei Technologies Co., Ltd. Data manipulation methods and system, and devices
CN105993013A (en) * 2014-12-27 2016-10-05 Huawei Technologies Co., Ltd. Data processing method, apparatus and system
CN107179878A (en) * 2016-03-11 2017-09-19 EMC Corp. The method and apparatus of data storage based on optimizing application
CN107209666A (en) * 2014-12-12 2017-09-26 Microsoft Technology Licensing, LLC Computer system
CN107590395A (en) * 2017-08-15 2018-01-16 State Grid Corporation of China Suitable for multi-layer data encryption method, device, equipment and the system of cloud environment
CN107636759A (en) * 2015-03-20 2018-01-26 Burlywood LLC Storage emulation in storage control

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012055966A1 (en) * 2010-10-27 2012-05-03 MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. Protecting data integrity with storage leases
US8650359B2 (en) * 2011-08-26 2014-02-11 Vmware, Inc. Computer system accessing object storage system
US9225780B2 (en) * 2012-02-24 2015-12-29 Xyratex Technology Limited Data integrity in a networked storage system
CN102915340A (en) * 2012-02-29 2013-02-06 浙江工商大学 Expanded B+ tree-based object file system
CN103064639B (en) * 2012-12-28 2016-08-03 华为技术有限公司 Date storage method and device
CN104063344B (en) * 2014-06-20 2018-06-26 华为技术有限公司 A kind of method and network interface card for storing data
US10481833B2 (en) * 2014-10-29 2019-11-19 Pure Storage, Inc. Transferring data encoding functions in a distributed storage network
US9870322B2 (en) * 2015-11-12 2018-01-16 International Business Machines Corporation Memory mapping for object-based storage devices
CN106713250B (en) * 2015-11-18 2019-08-20 杭州华为数字技术有限公司 Data access method and device based on distributed system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BW-netRAID: a network RAID system with back-end centralized redundancy management; Na Wenwu; Ke Jian; Zhu Xudong; Meng Xiaoxuan; Bu Qingzhong; Xu Lu; Chinese Journal of Computers (Issue 05); pp. 912-923 *

Also Published As

Publication number Publication date
WO2020034729A1 (en) 2020-02-20
CN110837479A (en) 2020-02-25

Similar Documents

Publication Publication Date Title
CN110837479B (en) Data processing method, related equipment and computer storage medium
US20230384963A1 (en) Efficient Creation And Management Of Snapshots
US9798655B2 (en) Managing a cache on storage devices supporting compression
US10976932B2 (en) Method for providing a client device access to a plurality of remote storage devices
US9417811B2 (en) Efficient inline data de-duplication on a storage system
US9223609B2 (en) Input/output operations at a virtual block device of a storage server
US9639459B2 (en) I/O latency and IOPs performance in thin provisioned volumes
RU2651219C2 (en) Computer, control device and data processing method
CN109144406B (en) Metadata storage method, system and storage medium in distributed storage system
US20160070475A1 (en) Memory Management Method, Apparatus, and System
US20230306010A1 (en) Optimizing Storage System Performance Using Data Characteristics
US20170168956A1 (en) Block cache staging in content delivery network caching system
KR20130126257A (en) File cache system and method using allocation table and system and method for distributing file cache application
CN109582592B (en) Resource management method and device
US11221770B2 (en) Providing a dynamic random-access memory cache as second type memory
CN109088913B (en) Method for requesting data and load balancing server
US20150356011A1 (en) Electronic device and data writing method
US20230385250A1 (en) Full allocation volume to deduplication volume migration in a storage system
EP3485362B1 (en) Maintaining data associated with a storage device related applications
CN114489465A (en) Method for processing data by using network card, network equipment and computer system
CN111190550A (en) Metadata acceleration method and device and storage equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20220207

Address after: 550025 Huawei cloud data center, jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant
GR01 Patent grant